Suggested citation: "Executive Summary." National Research Council. 1999. Uncommon Measures: Equivalence and Linkage Among Educational Tests. Washington, DC: The National Academies Press. doi: 10.17226/6332.

Executive Summary

The issues surrounding comparability and equivalency of educational assessments, although not new to the measurement and student testing literature, received broader public attention during congressional debate over the Voluntary National Tests (VNT) proposed by President Clinton in his 1997 State of the Union address. If there is any common ground shared by the advocates and opponents of national testing, it is the potential merits of bringing greater uniformity to Americans' understanding of the educational performance of their children. Advocates of the VNT argue that this is only possible through the development of a new test, while opponents have suggested that statistical linkages among existing tests might provide a basis for comparability.

To help inform this debate, Congress asked the National Research Council (NRC) to study the feasibility of developing a scale to compare, or link, scores from existing commercial and state tests to each other and to the National Assessment of Educational Progress (NAEP). This question, stated in Public Law 105-78 (November 1997), was one of three questions stemming from the debate over the VNT that the NRC was asked to study. Under the auspices of the Board on Testing and Assessment, the NRC appointed the Committee on Equivalency and Linkage of Educational Tests in January 1998.


Key Issues

The committee faced a relatively straightforward question: Is it feasible to establish an equivalency scale that would enable commercial and state tests to be linked to one another and to the National Assessment of Educational Progress (NAEP)? The committee has reviewed research literature on the statistical and technical aspects of creating valid links between tests and on how the content, use, and purposes of educational testing in the United States influence the quality and meaning of those links. We issued an interim report in June 1998.

Testing experts have long used various statistical calculations, or linking procedures, to connect the scores from one test with those of another; that is, to interpret a student's score on one test in terms of the scores on a test the student has not taken. A common analogy for linking tests is the formula used to convert Celsius temperatures to the Fahrenheit scale: for Americans traveling to Europe, it pays to know that 30 degrees is quite warm, not 2 degrees below freezing. Indeed, in some tightly circumscribed cases, linkage across tests is not very different from such a conversion. For example, equating is used to make alternate forms of the Scholastic Assessment Test (SAT) equivalent, so that college admissions officers can be sure that a score of 600 means much the same thing regardless of which form of the SAT a student took (because a different form of the SAT is given at each major test administration).
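
To make the analogy concrete, the sketch below pairs the exact temperature conversion with a simple linear (mean-and-standard-deviation) linking function, one of the most elementary members of the family of linking procedures this report discusses. It is an illustration only, not the committee's method: the function names, the 0-100 and 200-800 score scales, and all of the numbers are hypothetical assumptions, not values drawn from any test mentioned in this report.

```python
# Illustrative sketch only: a textbook linear "mean/sigma" link, not the
# committee's procedure. All score distributions below are hypothetical.

def celsius_to_fahrenheit(c: float) -> float:
    """Exact conversion between two temperature scales: F = (9/5)C + 32."""
    return 9.0 / 5.0 * c + 32.0

def linear_link(x: float, mean_x: float, sd_x: float,
                mean_y: float, sd_y: float) -> float:
    """Map a score x from test X onto the scale of test Y by matching the
    two score distributions' means and standard deviations."""
    return mean_y + (sd_y / sd_x) * (x - mean_x)

print(celsius_to_fahrenheit(30))   # 86.0 -- 30 degrees Celsius is quite warm
# Hypothetical distributions: test X scored 0-100, test Y reported on a 200-800 scale.
print(linear_link(60, mean_x=50, sd_x=10, mean_y=500, sd_y=100))   # 600.0
```

The second call shows how easy the arithmetic is; the harder question, taken up below, is whether the resulting 600 supports any valid interpretation.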

But in most cases, especially those that motivate this report, linking test scores in a useful way involves more complex considerations than converting temperatures or equating nearly identical tests across their multiple forms. For example, clusters of states are exploring possible linkages to achieve greater comparability among scores on their state tests and between those scores and NAEP. These situations require linking tests that do not meet the strict requirements for equating, and they must take into account an array of complicated and complicating factors, such as the definition of educational goals, the uses of tests, and the varied emphasis placed on the multiplicity of skills and knowledge that constitute mastery in different subject areas.

In evaluating the feasibility of linkages, the committee focused on the linkage of various 4th-grade reading tests and the linkage of various 8th-grade mathematics tests (the topics and grades designated in the VNT proposal). We concentrated on factors that affect the validity of the inferences about student performance that users would draw from the linked test scores. We note that it is often possible to calculate arithmetic linkages that create misleading interpretations of student performance. To cite an extreme case, one could create a formula to link a reading test and a mathematics test, but the resulting scores would be ambiguous, since mathematics performance cannot be interpreted in terms of the skills used in reading. Even in less extreme situations, links between tests that differ in less dramatic ways can produce scores that are substantially misleading. Moreover, a link between two specific tests may be appropriate for one purpose, but unacceptable for others. Thus, linkage between tests involves factors that are not apparent in the analogy with linking temperature scales. These factors might be relevant whether 2 tests or 200 are being linked. A difference between tests on any one of these factors, though not always sufficient to disqualify the proposed linkage, signals a warning about misinterpretations that may result.

Assumptions

In approaching its charge, the committee made three key assumptions. First, the question motivating the study is predictable and sensible. It reflects a long-standing tension in American education between the belief that curriculum, instruction, and assessment are best designed and managed at the state and local levels and the desire to bring greater uniformity to the reporting of information about student achievement across the nation's diverse educational system.

Second, though Congress was not explicit about the purposes of linkage, we recognize that the study originated in the debate over President Clinton's proposal for national tests of reading and mathematics. But the committee's charge is a narrowly defined and technical one, namely, to evaluate the feasibility of developing a scale to compare individual scores on existing tests to one another and to NAEP. Some of our findings are directly relevant to technical aspects of the VNT, for example, the requirement that it be linked to NAEP. And the committee acknowledges that a key underlying issue in the debate over the VNT is the utility of nationally comparable information on individual student achievement. However, the committee has no position on the overall merits of the VNT, and in drawing conclusions about the feasibility of linking existing tests we do not intend to suggest either that the nation should or should not have national tests. Neither policy decision follows inevitably from our basic conclusions about linkage and equivalency.

Third, we adopted a definition of "feasibility" that combines validity and practicality. Validity is the central criterion for evaluating any inferences based on tests and is applied in this report to inferences based on linkages among tests. By practicality we mean not only whether linkages can be calculated, in the arithmetic sense, but whether the costs of carrying out the linkages are reasonable and manageable.

Conclusions

In drawing our conclusions, the committee acknowledges that, ultimately, policy makers and educators must take responsibility for determining the degree to which they can tolerate imprecision in testing and linking. In other words: test-based decisions involve error, linkage can add to the error, and we realize that responsible people may reach different conclusions about the minimally acceptable level of precision in linkages that are intended to serve various goals. Our role is to provide science-based information on the possible sources and magnitude of the imprecision, in the hope that alerting educators and policy makers to the possibility of errors and their consequences will prove useful.

In the committee's interim report, we reached two basic conclusions:

  1. Comparing the full array of currently administered commercial and state achievement tests to one another, through the development of a single equivalency or linking scale, is not feasible.

  2. Reporting individual student scores from the full array of state and commercial achievement tests on the NAEP scale and transforming individual scores on these various tests and assessments into the NAEP achievement levels are not feasible.

We reached these conclusions despite our appreciation of the potential value of a technical solution to the dual challenge of maintaining diversity and innovation in testing while satisfying growing demands for nationally benchmarked data on individual student performance.

We have now considered two additional issues relevant to the committee's charge. First, we have examined whether it is feasible to link smaller subsets of tests, other than the existing "full array," and to use these linkages to make meaningful comparisons of student performance. Second, we have studied in greater depth the questions involved in reporting individual scores from any test on the NAEP scale and in terms of the NAEP achievement levels.

On these questions, we are not much more optimistic. We find that simply reducing the number of tests under consideration does not necessarily increase the feasibility of linkage unless the tests to be linked are very similar in a number of important ways. We also find that interpreting the scores on any test in terms of the NAEP achievement levels poses formidable technical and interpretive challenges.

Therefore, the committee has reached the following two additional conclusions:

  3. Under limited conditions it may be possible to calculate a linkage between two tests, but multiple factors affect the validity of inferences drawn from the linked scores. These factors include the content, format, and margins of error of the tests; the intended and actual uses of the tests; and the consequences attached to the results of the tests. When tests differ on any of these factors, some limited interpretations of the linked results may be defensible while others would not be.

  4. Links between most existing tests and NAEP, for the purpose of reporting individual students' scores on the NAEP scale and in terms of the NAEP achievement levels, will be problematic. Unless the test to be linked to NAEP is very similar to NAEP in content, format, and uses, the resulting linkage is likely to be unstable and potentially misleading. (The committee notes that it is theoretically possible to develop an expanded version of NAEP that could be used in conducting linkage experiments, which would make it possible to establish a basis for reporting achievement test scores in terms of the NAEP achievement levels. However, the few such efforts that have been made thus far have yielded limited and mixed results.)

