9 Using Student Assessments for Educational Accountability
Pages 171-196

From page 171...
... is one example of a reform approach that would rely heavily on holding educators accountable for improving students' performance, although it differs in some ways from many current assessment-based accountability systems. The PEER proposal focuses on three elements: the efficient use of resources, performance incentives for educators based on assessments of student performance, and continuous adaptation.
From page 172...
... A second source is the structure of the data in which test scores are typically embedded; assessment databases are rarely of a form that would permit accurate measurement of value added or of program effectiveness. A third source is behavioral responses to accountability: holding educators accountable for students' test scores can create undesirable practices that inflate scores and may undermine learning.
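The data-structure point can be made concrete with a small worked sketch. The fragment below uses entirely fabricated data and hypothetical variable names (prior, ses, school), not anything reported in the chapter, to illustrate why measuring value added requires student-level records that link current scores to prior scores and background covariates; a raw comparison of school averages confounds the school's contribution with who attends it.

```python
# A minimal sketch with fabricated data: why value-added estimation needs
# student-level records linking current scores to prior scores and
# background covariates. All names and numbers are illustrative.
import numpy as np

rng = np.random.default_rng(0)
n = 500
ses = rng.normal(0, 1, n)                                 # background covariate
prior = 50 + 6 * ses + rng.normal(0, 8, n)                # prior-year score
school = (ses + rng.normal(0, 1, n) > 0).astype(float)    # higher-SES students cluster in school 1
true_effect = 2.0                                         # the "value added" we hope to recover
current = 5 + 0.9 * prior + 3 * ses + true_effect * school + rng.normal(0, 5, n)

# Adjusted estimate (uses the linked prior score and covariate) vs. raw gap.
X_adj = np.column_stack([np.ones(n), prior, ses, school])
X_raw = np.column_stack([np.ones(n), school])
beta_adj, *_ = np.linalg.lstsq(X_adj, current, rcond=None)
beta_raw, *_ = np.linalg.lstsq(X_raw, current, rcond=None)

print("adjusted school effect:", round(beta_adj[3], 2))   # close to 2.0
print("raw score gap:", round(beta_raw[1], 2))            # far larger; confounded with background
```

Without the linked prior score and covariate, only the raw gap is available, and it mostly reflects differences in the students rather than in the schools.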
From page 173...
... Confidence in the reforms was so high at the outset that few programs were evaluated realistically. By the end of the decade, however, confidence in the reforms was supplanted by widespread suspicion that they had often degraded instruction and inflated test scores by inappropriate teaching to the test.
From page 174...
... A performance assessment that assigns students identical performance tasks administered under uniform conditions and scored according to uniform rules also is "standardized." Current reforms employ both standardized and unstandardized performance assessments. For example, the Vermont portfolio program is unstandardized in terms of both task selection and administrative conditions.
From page 175...
... At the other extreme, many current assessments, such as NAEP and many statewide assessments, are intended to support inferences about far broader domains, such as the cumulative mastery of mathematics by eighth-grade students. Truly narrow domains that might be tested more fully, such as "using the distributive law to simplify simple algebraic expressions," are the focus of pop quizzes, not of the assessments debated by the press and policymakers.
From page 176...
... The extent to which they are successful in this later learning may depend in substantial part on the body of knowledge and skills that students have at graduation, much of which can be tested. But it is also likely to depend on attitudes and habits that are not typically measured by achievement tests -- an attitude that mathematical problems are interesting and tractable, for example, or an interest in and willingness to weigh carefully conflicting evidence and competing positions underlying political arguments.
From page 177...
... Second, most assessment databases include only limited information on the background factors that exert powerful influences on test scores. For example, parental education, income, and ethnicity are all known to be very powerful predictors of performance on tests.
From page 178...
... (1990) showed that between the mid-1970s and mid-1980s, trends in third-grade mathematics scores varied substantially across the norming samples for commercial standardized tests, with rates of change at the median ranging from –1.0 to +2.2 national percentile ranks per year.
From page 179...
... Third, school effectiveness indices are often unstable over time, a critical limitation in accountability systems that depend on measures of change. For example, Rowan and his colleagues (Rowan and Denk, 1983; Rowan et al., 1983)
From page 180...
... This is particularly problematic in the case of elementary schools, which typically have far fewer students per grade than do secondary schools and therefore have averages that are more influenced by a few particularly good or bad students. In a set of schools that vary markedly in terms of background characteristics of students, the stability of those characteristics will often induce some stability of rankings in terms of raw scores.
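The small-school problem described here is a matter of sampling variability: the standard error of a grade-level mean shrinks with the square root of the number of students, so a grade of 30 fluctuates far more from cohort to cohort than a grade of 300. The simulation below uses illustrative numbers only, not data from the chapter, to make that comparison directly.

```python
# A minimal sketch, assuming fabricated cohorts: the mean of a small grade
# bounces around much more than the mean of a large one, so year-to-year
# changes at small schools are dominated by sampling noise.
import numpy as np

rng = np.random.default_rng(1)
sd_within = 15.0          # within-school spread of student scores
years = 1000              # number of simulated cohorts

def cohort_means(n_students):
    """School means for successive cohorts of size n_students."""
    return rng.normal(100.0, sd_within, size=(years, n_students)).mean(axis=1)

small = cohort_means(30)    # typical elementary grade
large = cohort_means(300)   # typical secondary grade

print("SD of small-school means:", round(small.std(), 2))   # about 15/sqrt(30),  roughly 2.7
print("SD of large-school means:", round(large.std(), 2))   # about 15/sqrt(300), roughly 0.9
print("SD of one-year changes (small):", round(np.diff(small).std(), 2))
print("SD of one-year changes (large):", round(np.diff(large).std(), 2))
```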
From page 181...
... Although this pattern might have a number of explanations, a likely one is that the NAEP, unlike many state and local tests, was immune to the corrupting influence of teaching to the test. A second suggestion of inflated test scores from aggregate data is the so-called Lake Wobegon phenomenon: the fact that most states with statewide data and an implausible proportion of districts in other states reported themselves to be "above the national average." This pattern was first reported by a family practitioner in West Virginia who was skeptical of his own state's scores and called around the country to get information from other states and districts (Cannell, 1987)
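The Lake Wobegon check is itself simple arithmetic: if a test's national norms were accurate and current, roughly half of a representative group of students would score at or above the 50th national percentile rank. The sketch below shows the comparison implied by Cannell's survey; the reported percentile ranks and their distribution are invented for illustration.

```python
# A minimal sketch with hypothetical percentile-rank data: under accurate,
# up-to-date norms roughly 50% of students should fall at or above the
# 50th national percentile rank; a much larger share suggests inflation
# or outdated norms rather than genuinely superior performance everywhere.
import numpy as np

rng = np.random.default_rng(2)
# Pretend district results reported as national percentile ranks (1-99).
reported_pr = np.clip(rng.normal(62, 20, 5000).round(), 1, 99)

share_above_median = (reported_pr >= 50).mean()
print(f"Share at or above the national median: {share_above_median:.0%}")
# Shares far above ~50% across most states and districts are the
# "Lake Wobegon" pattern described in the text.
```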
From page 182...
... As with most commercial norm-referenced tests, the results of both tests were reported using the same metrics, such as national percentile ranks and grade equivalents, which simplifies comparisons between them. In the fourth year in which test B was administered, Koretz et al.
From page 183...
... When the district first switched from test C to test B, the median score in the district (the median of school medians) fell about half an academic year, from a grade equivalent (GE)
From page 184...
... These teachers have an additional option for responding to test-based accountability, that is, to reallocate time across subject areas or other activities to maximize score gains. This is particularly problematic in the case of the many testing programs that assess only a small number of subject areas.
From page 185...
... For example, a number of companies (including but not limited to publishers of achievement tests) sell materials designed to help students prepare for specific achievement tests, and some teachers devote considerable time to using them.
From page 186...
... The Educational Testing Service traditionally has kept most SAT items secure and changed many for each new administration of the test, a procedure followed by the NAEP. Publishers of conventional achievement tests, on the other hand, tend to prepare only two highly similar forms of each edition of a test and to issue new editions only every six or seven years.
From page 187...
... In practice, however, it has been politically difficult to maintain an expensive and time-consuming sampling-based testing program that does not provide reliable scores for individual students. For example, Governor Wilson cited the lack of scores for individual students as one reason for terminating California's well-known assessment program, and both Kentucky and Maryland are now wrestling with the question of how to respond to pressure to report student-level scores from their matrix-sampled assessments.
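The trade-off behind matrix sampling can be shown with a short simulation. In the sketch below, where student counts, item counts, and abilities are all invented, each student answers only a small random block of a large item pool: the school-level average is recovered precisely, but any individual student's score rests on too few items to be reliable.

```python
# A minimal sketch, assuming invented numbers: matrix sampling gives each
# student a small random block of items, which supports precise group-level
# estimates but noisy individual scores.
import numpy as np

rng = np.random.default_rng(3)
n_students, n_items, items_per_student = 400, 120, 12

ability = rng.normal(0.6, 0.15, n_students)           # each student's true proportion correct
responses = np.full((n_students, n_items), np.nan)    # NaN marks items a student never saw
for s in range(n_students):
    taken = rng.choice(n_items, items_per_student, replace=False)
    responses[s, taken] = rng.random(items_per_student) < ability[s]

student_scores = np.nanmean(responses, axis=1)        # each based on only 12 items
school_estimate = student_scores.mean()

print("true school mean:", round(ability.mean(), 3))
print("estimated school mean:", round(school_estimate, 3))      # very close
print("RMSE of individual scores:",
      round(np.sqrt(np.mean((student_scores - ability) ** 2)), 3))  # large relative to spread of ability
```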
From page 188...
... Nevertheless, evidence about the instructional effects of performance assessment programs remains scarce. It is not clear under what circumstances these programs are conducive to improved teaching or what the effects are on student achievement.
From page 189...
... It is neither realistic nor desirable to avoid the use of achievement tests in accountability systems. Meeting more of the goals of an accountability system and minimizing undesirable effects, however, is likely to be far more complex and difficult than many advocates of test-based accountability contemplate.
From page 190...
... State-level NAEP scores, if available often enough in the same subjects and grades, might also serve as a mechanism for gauging inflation at the state level, although the NAEP is not designed to provide usable estimates at the level of individual schools and therefore would not be useful for determining whether there are specific schools in which inflation is particularly severe or modest. Finally, the impact, often technically termed "consequential validity," of accountability-oriented testing programs should be evaluated directly.
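One way to operationalize the NAEP audit suggested here is to express gains on the high-stakes test and on the lower-stakes audit test in standard-deviation units and compare them. The numbers in the sketch below are invented for illustration; the chapter does not report these figures.

```python
# A minimal sketch of the audit comparison suggested in the text, with
# invented numbers: express gains on the high-stakes test and on a
# lower-stakes audit test (e.g., state NAEP) in standard-deviation units
# for the same grades, subjects, and years, then inspect the discrepancy.
state_gain_points, state_sd = 12.0, 30.0     # hypothetical high-stakes test
naep_gain_points, naep_sd = 3.0, 35.0        # hypothetical audit test

state_gain = state_gain_points / state_sd    # 0.40 SD
naep_gain = naep_gain_points / naep_sd       # about 0.09 SD

print(f"high-stakes gain: {state_gain:.2f} SD, audit gain: {naep_gain:.2f} SD")
print(f"unconfirmed portion of the gain: {state_gain - naep_gain:.2f} SD")
# A large unconfirmed portion is consistent with score inflation, though
# other explanations (content differences, sampling) must be ruled out.
```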
From page 191...
... Embedding Assessments in an Accountability System
The larger question is the role of achievement tests in an educational accountability system. Achievement tests are an important but insufficient basis for holding schools accountable.
From page 192...
... Indeed, actions that are successful in terms of such proximal outcomes may even have negative effects on scores. For example, a teacher may decide to devote considerable time to a topic of unusual interest to her students to increase motivation, at the cost of reducing time devoted to other tested topics.
From page 193...
... " Invited debate, annual meeting of the American Educational Research Association, New Orleans, April. Cannell, J
From page 194...
... , The Effects of High-Stakes Testing. Symposium presented at the annual meetings of the American Educational Research Association and the National Council on Measurement in Education, Chicago, Ill.
From page 195...
... The Effects of High-Stakes Testing. Symposium presented at the annual meetings of the American Educational Research Association and the National Council on Measurement in Education, Chicago, Ill.