National Research Council. "2 Accumulation of Scientific Knowledge." Scientific Research in Education. Washington, DC: The National Academies Press, 2002. 1. Print.
individuals on two tests. However, in working with the Kuder-Richardson formulas, Cronbach (1989) found that at times they produced numbers that were not believable. For example, the estimated reliability was sometimes negative, even though reliability, as a proportion of score variance, should lie between 0 and 1. In response, he (Cronbach, 1951) extended this work by providing a general formula, now known as coefficient alpha, that fit a very wide class of situations, not just dichotomously scored test questions.
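The chapter does not reproduce the formulas themselves. As a minimal sketch, assuming the standard textbook forms, Kuder-Richardson formula 20 (KR-20) and Cronbach's coefficient alpha can be written as below. The function names and the small data sets are hypothetical illustrations, and population variances (dividing by n) are used throughout. Alpha replaces KR-20's item term p(1 − p), which is only valid for 0/1 items, with a general item variance, so it also applies to items scored on a rating scale. Both estimates turn negative when items covary negatively, because the total-score variance then falls below the sum of the item variances.

```python
from statistics import pvariance


def coefficient_alpha(scores):
    """Cronbach's coefficient alpha (textbook form, population variances).

    scores: one row per examinee, each row holding k item scores
    on any numeric scale (e.g. 0/1 or 1-5 ratings).
    """
    k = len(scores[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def kr20(scores):
    """Kuder-Richardson formula 20; every item must be scored 0 or 1."""
    n, k = len(scores), len(scores[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    # Proportion answering each item correctly; p * (1 - p) is that item's variance.
    p = [sum(row[j] for row in scores) / n for j in range(k)]
    sum_pq = sum(pj * (1 - pj) for pj in p)
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return (k / (k - 1)) * (1 - sum_pq / total_var)


if __name__ == "__main__":
    # Hypothetical 0/1 responses: 4 examinees x 3 items with a consistent pattern.
    consistent = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    # Hypothetical 0/1 responses in which the two items covary negatively.
    conflicting = [[1, 0], [0, 1], [1, 1]]
    # Hypothetical 1-5 ratings: alpha applies here, KR-20 does not.
    ratings = [[4, 5, 4], [2, 3, 2], [3, 3, 4], [1, 2, 1]]
    print(kr20(consistent), coefficient_alpha(consistent))
    print(kr20(conflicting))
    print(coefficient_alpha(ratings))
```

On the consistent 0/1 data the two functions agree, as they must when every item is dichotomous. On the conflicting data the sum of the item variances (4/9) is twice the total-score variance (2/9), which drives the estimate to −2, the kind of unbelievable value the text describes.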
Once easily usable formulas were available for computing a test's reliability, they could be used to study the factors that affect reliability. This led to improved test development and to the gradual recognition that different test uses required different measures of reliability. In the 1960s, Cronbach, Rajaratnam, and Gleser (1963), drawing on advances in statistical theory (especially Fisher's partitioning of variance and the theory of random components of variance), incorporated this understanding into a framework that accounted simultaneously for multiple sources of measurement error. Generalizability theory (Cronbach, Gleser, Nanda, and Rajaratnam, 1972) now provides a systematic analysis of the many facets that affect test score consistency and measurement error.
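To make the idea of partitioning measurement error concrete, the following is a minimal sketch of a one-facet generalizability study, not a reproduction of the authors' own procedures. It assumes a crossed persons × items random-effects design with one score per cell and estimates variance components from the usual ANOVA mean squares. The names `g_study_pxi`, `relative_coefficient`, and `absolute_coefficient` are hypothetical, and a full application would add further facets such as raters or occasions.

```python
from dataclasses import dataclass


@dataclass
class GStudy:
    var_p: float    # person (universe-score) variance
    var_i: float    # item variance, i.e. differences in item difficulty
    var_res: float  # person-by-item interaction confounded with residual error


def g_study_pxi(scores):
    """Variance components for a crossed persons x items design,
    estimated from ANOVA expected mean squares (one score per cell)."""
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in scores]
    i_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]
    ss_p = n_i * sum((m - grand) ** 2 for m in p_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in i_means)
    ss_res = sum(
        (scores[p][i] - p_means[p] - i_means[i] + grand) ** 2
        for p in range(n_p)
        for i in range(n_i)
    )
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = ss_res / ((n_p - 1) * (n_i - 1))
    # Negative component estimates are conventionally truncated to zero.
    return GStudy(
        var_p=max((ms_p - ms_res) / n_i, 0.0),
        var_i=max((ms_i - ms_res) / n_p, 0.0),
        var_res=ms_res,
    )


def relative_coefficient(g, n_items):
    """Generalizability coefficient for rank-ordering examinees."""
    return g.var_p / (g.var_p + g.var_res / n_items)


def absolute_coefficient(g, n_items):
    """Dependability (phi) coefficient for decisions against a fixed standard."""
    return g.var_p / (g.var_p + (g.var_i + g.var_res) / n_items)


if __name__ == "__main__":
    # Hypothetical 0/1 responses: 4 examinees x 3 items.
    g = g_study_pxi([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]])
    print(g)
    print(relative_coefficient(g, 3), absolute_coefficient(g, 3))
    print(relative_coefficient(g, 6))  # projected reliability of a 6-item test
```

For this example the relative coefficient with three items equals coefficient alpha for the same data (0.75), as it does in general for this one-facet design when no component is truncated. The absolute coefficient is lower (5/7) because differences in item difficulty count as error when scores are compared with a fixed standard rather than with one another. This is one concrete sense in which different test uses call for different reliability measures. Passing a larger `n_items` projects the reliability of a longer test built from similar items.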
Test Validity
In a similar manner, the concept of test validity, initially conceived as the relation between test scores and later performance, has evolved as straightforward mathematical equations have given way to a growing understanding of human behavior. At first, validity was viewed as a characteristic of the test. It was then recognized that a test might be put to multiple uses and that a given test might be valid for some uses but not for others. That is, validity came to be understood as a characteristic of the interpretation and use of test scores, and not of the test itself, because the very same test (e.g., a reading test) could be used to predict academic performance, to estimate an individual's level of proficiency, and to diagnose problems. Today, validity theory incorporates both test interpretation and use, including intended and unintended social consequences.
While the problem of relating test results to later performance is quite old, Wissler (1901) was the first to make extensive use of the correlation coefficient, developed a decade earlier, to measure the strength of this relationship. He showed that the relationship between various physical and