Suggested Citation: "5 How Do We Know That What We Are Doing Is Working?" National Research Council. 2001. Improving Mathematics Education: Resources for Decision Making. Washington, DC: The National Academies Press. doi: 10.17226/10268.

5 How Do We Know That What We Are Doing Is Working?

Assessment has always been a critical element in education, used by classroom teachers, schools, districts, and (increasingly) states to determine what students know and what they are able to do. Its pervasiveness, political importance, and potential influence over student learning make it a powerful tool for change. Understanding how assessment operates, however, requires considering the entire range of assessments (formative classroom assessment, classroom tests, state and local tests, college entrance and placement practices, and tests for teacher certification) and the context in which each is used. The interpretation and application of specific assessment tools are of critical concern for educators and researchers, as well as for families and the community. The following questions should drive decisions made about assessment:

How can we assess the extent to which what we are doing is working?
How can student assessment yield accurate information regarding student achievement in mathematics?
How can large-scale student assessments be used fairly and appropriately?
How can critical decisions about tracking, promotion, and graduation be made on the basis of student assessments?

RESOURCE AVAILABLE

High Stakes: Testing for Tracking, Promotion, and Graduation, developed by the National Research Council's Committee on Appropriate Test Use, 1999.

OVERVIEW OF THE RESOURCE

High Stakes is a report by the National Research Council developed in response to a congressional request for such a study and for recommendations “on appropriate methods, practices, and safeguards to ensure that (a) existing and new tests that are used to assess student performance are not used in a discriminatory manner or inappropriately for student promotion, tracking, or graduation; and (b) existing and new tests adequately assess student reading and mathematics comprehension” (p. 1).

The report serves as a primer for the sensible use of high-stakes tests, capitalizing on their positive characteristics and minimizing their negative aspects. As noted in the introduction:

Most people seem to agree that America's public schools are in need of repair. How to fix them has become a favorite topic of policymakers, and for many the remedy includes increased reliance on the testing of students. The standards-based reform movement, for example, is premised on the idea of setting clear, high standards for what children are supposed to learn and then holding students—and often educators and schools—to those standards.

The logic seems clear: Unless we test students' knowledge, how will we know if they have met the standards? And the idea of accountability, which is also central to this theory of school reform, requires that the test results have direct and immediate consequences: A student who does not meet the standard should not be promoted, or awarded a high school diploma. This report is about the appropriate use of tests in making such high-stakes decisions about individual students. (p. 13)

High Stakes considers what constitutes appropriate use of tests in making tracking, promotion, and graduation decisions affecting individual students and emphasizes three criteria for judging the appropriateness of a particular test (p. 23):

“Measurement validity. Is the test appropriate for a particular purpose? Is there evidence that the constructs to be measured are relevant in making a decision? Does the test measure those constructs? Is it confounded with other constructs that are not relevant to the decision? Is the test reliable and accurate?
Attribution of cause. Does a student's performance on a test reflect knowledge and skills based on appropriate instruction, or is it attributable to poor instruction? Or is it attributable to factors such as language barriers or disabilities that are irrelevant to the construct being measured?

Effectiveness of treatment. Does performance on the test lead to placements or other decisions that are educationally beneficial and well matched to the student's needs?”

Based on these criteria, the reader is reminded that “blanket criticism of tests is not justified.” However, “it is also a mistake to accept observed test scores as either infallible or immutable” (p. 276).

In addition, the report helps to frame the dilemmas that arise from asking the same test to serve multiple functions and identifies seven distinct purposes of student assessment as a policy instrument (pp. 33–37):

Aid in making instructional decisions about individual students.
Provide information about the status of the educational system.
Motivate change by “shaking people up.”
Evaluate programs.
Hold schools and educators accountable for student performance.
Leverage change in classroom instruction.
Certify individual students as having attained specified levels of achievement.

The report seeks to clarify the relationship between the types and forms of assessment used and the purposes for which the assessment is given. It argues that standards-based approaches and accountability approaches can be compatible or incompatible, depending on what the tests measure, how they are used, and the regulations that govern their implementation and influence.

In making its recommendations, the report provides a clear picture of the tensions that abound on the assessment landscape. The reader is reminded of the tension between the enthusiasm of policymakers and the caution of experts that results in the twin dilemmas that (1) policy and public expectations of testing generally exceed the technical capacity of the tests themselves, and (2) the desire for more fairness and efficiency often conflicts with the impulse to sort and classify students (pp. 30–31).

The committee indicated a “strong need for better evidence on the intended benefits and unintended negative consequences of using high-stakes tests to make decisions about individuals” (p. 8).

The report concludes that “large-scale assessments, used properly, can improve teaching, learning, and equality of educational opportunity” (p. 9). However, “when test use is inappropriate, especially in the case of high-stakes decisions about individuals, it can undermine the quality of education and equality of opportunity” (p. 276). Assessments therefore have the potential to both help and harm, which should motivate action to ensure that educational tests are used fairly and effectively.

RECOMMENDATIONS MADE IN THE REPORT

The following recommendations represent a selection of the findings and recommendations presented in the report (pp. 275–290).

Decisions regarding appropriate use of tests should be based on the following principles:

– “First, the important thing about a test is not its validity in general, but its validity when used for a specific purpose.

– Second, tests are not perfect. Test questions are a sample of possible questions that could be asked in a given area. Moreover, a test score is not an exact measure of a student's knowledge or skills.

– Third, an educational decision that will have a major impact on a test taker should not solely or automatically be made on the basis of a single test score.

– Finally, neither a test score nor any other kind of information can justify a bad decision.” (p. 275)

“Accountability for educational outcomes should be a shared responsibility of states, school districts, public officials, educators, parents, and students. High standards cannot be established and maintained merely by imposing them on students.” (p. 278)
“As tracking is currently practiced, low-track classes are typically characterized by an exclusive focus on basic skills, low expectations, and the least-qualified teachers. Students assigned to low-track classes are worse off than they would be in other placements. This form of tracking should be eliminated. Neither test scores nor other information should be used to place students in such classes.” (p. 282)
“Scores from large-scale assessment should never be the only sources of information used to make a promotion or retention decision. No single source of information—whether test scores, course grades, or teacher judgments—should stand alone in making promotion decisions. Test scores should always be used in combination with other sources of information about student achievement.” (p. 286)
“The quality of the process of setting a cutscore on a graduation test should be documented and evaluated—including the qualifications of the judges employed, the method or methods employed, and the degree of consensus reached.” (p. 290)
“Students who fail should have opportunities to retake any test used in making promotion decisions. This implies that tests used in making promotion decisions should have alternate forms.” (p. 287)

“All students are entitled to sufficient test preparation so their performance will not be adversely affected by unfamiliarity with item format or by ignorance of appropriate test-taking strategies.” (p. 290)

“In general, large-scale assessments should not be used to make high-stakes decisions about students who are less than 8 years old or enrolled below grade 3.” (p. 279)

Recommendations related to assessing mathematical understanding can be found in Principles and Standards (p. 11), Adding It Up (pp. 423–424), and How People Learn (p. 24).

ACTIONS EDUCATORS MIGHT CONSIDER

Based on the discussions, findings, and recommendations in High Stakes, educators and policymakers concerned with making critical decisions about tracking, promotion, and graduation might:

Examine all assessment policies and procedures currently in place to ensure that the spirit of fair and appropriate uses of student assessments permeates practice.
Analyze the purposes for which any given student assessment was developed, and ensure that these intents match the actual uses of the assessment results.
Ensure that no high-stakes decision about an individual student is ever made on the basis of a single measure.
