at a single, summary statistic for student performance…. After all, it is critical to know that a student can arrive at an idea but cannot organize her or his writing or cannot use the resources of language in any but the most conventional and boring ways…. Finally, we have to consider different units of analysis…because so much of learning occurs either in social situations or in conjunction with tools or resources, we need to consider what performance looks like in those more complex units. (pp. 63–64)

Although a single statement could not be expected to outline all possible needs, the list provided here is challenging and instructive. Much of this agenda could be accomplished now, with the measurement tools already available. For example, the second call—for the “valuing of diversity of opinions among appraisers”—could be incorporated through the use of rater facets in the observations model (see Figure 4–4, above). And the third call—for something beyond “a single, summary statistic for student performance”—could be addressed using multidimensional item response models or, more broadly, the range of multiattribute models (as in Figure 4–7, above). Other parts of this agenda are less easily satisfied. Below we discuss some ways in which statistical approaches have been augmented to address the types of issues raised in the Wolf et al. agenda.
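To make the idea of rater facets concrete, the following sketch shows one standard way a rater facet enters a Rasch-family observations model: a rater severity parameter is subtracted from the examinee's ability alongside the item difficulty. This is a minimal illustration, not a reproduction of the model in Figure 4–4; the function name and all parameter values are hypothetical.

```python
import math

def facets_prob(theta, b, r):
    """Dichotomous many-facet Rasch-style model:
    P(correct) = 1 / (1 + exp(-(theta - b - r))),
    where theta is examinee ability, b is item difficulty,
    and r is rater severity, all on a common logit scale."""
    return 1.0 / (1.0 + math.exp(-(theta - b - r)))

# Hypothetical values: the same performance on the same item,
# judged once by a lenient rater and once by a severe one.
theta, b = 0.5, 0.0
lenient, severe = -0.5, 0.5
print(facets_prob(theta, b, lenient))  # higher with the lenient rater
print(facets_prob(theta, b, severe))   # lower with the severe rater
```

Because each rater's severity is estimated explicitly, differences of opinion among appraisers are modeled rather than averaged away, which is one way to honor the "diversity of opinions" called for above.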

Progress Maps: Making Numbers More Interpretable

Wolf et al. call for a “developmentally ordered series of accomplishments,” which could be regarded as a prompt to apply an ordered latent class approach; indeed, some regard that as the only natural interpretation. Yet although that approach has been explicated, it has not been widely used. More often, continuous approaches have been enhanced to incorporate a developmental perspective. This approach, dubbed developmental assessment by Masters, Adams, and Wilson (1990), is based on the seminal work of Wright using the Rasch model and its extensions (Wright and Masters, 1982). In this approach, an attempt is made to construct a framework for describing and monitoring progress that is larger and more important than any particular test or method of collecting evidence of student achievement. A simple analogy is the scale for measuring weight. This scale, marked out in ounces and pounds, is a framework that is more important and “true” than any particular measuring instrument. Different instruments (e.g., bathroom scales, kitchen scales) can be constructed and used to measure weight against this more general reporting framework. This is done by developing a “criterion-referenced” interpretation of the scale (Glaser, 1963).
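The weight-scale analogy can be sketched numerically with the Rasch model itself: person ability and item difficulty are expressed on one common logit scale, so different "instruments" (different item sets) yield scores interpretable against the same framework. The item difficulties below are invented for illustration, not calibrated values from the text.

```python
import math

def rasch_prob(theta, b):
    """Rasch model probability of a correct response:
    P(X = 1 | theta, b) = exp(theta - b) / (1 + exp(theta - b)),
    with ability theta and difficulty b on the same logit scale."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def expected_score(theta, items):
    """Expected number-correct score on a set of calibrated items."""
    return sum(rasch_prob(theta, b) for b in items)

# Two hypothetical instruments calibrated to the same scale,
# like a bathroom scale and a kitchen scale both marked in pounds.
easy_test = [-1.5, -1.0, -0.5]   # item difficulties (logits)
hard_test = [0.5, 1.0, 1.5]

theta = 0.0  # one person of average ability
print(expected_score(theta, easy_test))  # higher on the easier instrument
print(expected_score(theta, hard_test))  # lower, but same underlying theta
```

The raw scores differ across instruments, but both map back to the same estimated ability; it is that shared scale, annotated with descriptions of what performance at each level looks like, that supports a criterion-referenced interpretation.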

Under the “norm-referenced” testing tradition, each test instrument has a special importance. Students’ performances are interpreted in terms of the

The National Academies of Sciences, Engineering, and Medicine
500 Fifth St. N.W. | Washington, D.C. 20001

Copyright © National Academy of Sciences. All rights reserved.