Measuring What Counts: A Conceptual Guide for Mathematics Assessment
3
ASSESSING IMPORTANT MATHEMATICAL CONTENT
High-quality mathematics assessment must be shaped and defined by important mathematical content. This fundamental concept is embodied in the first of three educational principles to guide assessment.
THE CONTENT PRINCIPLE
Assessment should reflect the mathematics that is most important for students to learn.
The content principle has profound implications for designing, developing, and scoring mathematics assessments as well as reporting their results. Some form of the content principle may have always implicitly guided assessment development, but in the past the notion of content has been construed in the narrow topic-coverage sense. Now content must be viewed much more broadly, incorporating the processes of mathematical thinking, the habits of mathematical problem solving, and an array of mathematical topics and applications, and this view must be made explicit. What follows is, nonetheless, a beginning description; much remains to be learned from research and from the wisdom of expert practice.
DESIGNING NEW ASSESSMENT FRAMEWORKS
Many of the assessments in use today, such as standardized achievement tests in mathematics, have reinforced the view that the mathematics curriculum is built from lists of narrow, isolated skills that can easily be decomposed for appraisal. The new vision of mathematics requires that assessment reinforce a new conceptualization that is both broader and more integrated.

Tests have traditionally been built from test blueprints, which have often been two-dimensional arrays with topics to be covered along one axis and types of skills (or processes) along the other.1 The assessment is then created by developing questions that fit into one cell or another of this matrix. But important mathematics is not always amenable to this cell-by-cell analysis.2 Assessments need to involve more than one mathematical topic if students are to make appropriate connections among the mathematical ideas they have learned. Moreover, challenging assessments are usually open to a variety of approaches, typically using varied and multiple processes. Indeed, they can and often should be designed so that students are rewarded for finding alternative solutions. Designing tasks to fit a single topic and process distorts the kinds of assessments students should be able to do.
BEYOND TOPIC-BY-PROCESS FORMATS
Assessment developers need characterizations of the important mathematical knowledge to be assessed that reflect both the necessary coverage of content and the interconnectedness of topics and process. Interesting assessment tasks that do not elicit important mathematical thinking and problem solving are of no use. To avoid this, preliminary efforts have been made on several fronts to seek new ways to characterize the learning domain and the corresponding assessment. For example, lattice structures have recently been proposed as an improvement over matrix classifications of tasks.3 Such structures provide a different and perhaps more interconnected view of mathematical understanding that should be reflected in assessment.
The approach taken by the National Assessment of Educational Progress (NAEP) to develop its assessments is an example of the effort to move beyond topic-by-process formats. Since its inception, NAEP has used a matrix design for developing its mathematics assessments. The dimensions of these designs have varied over the years, with a 35-cell design used in 1986 and the design below for the 1990 and 1992 assessments. Although classical test theory strongly encouraged the use of matrices to structure and provide balance to examinations, the designs also were often the root cause of the decontextualizing of assessments. If 35 percent of the items on the assessment were to be from the area of measurement and 40 percent of those were to assess students' procedural knowledge, then 14 percent of the items would measure procedural knowledge in the content domain of measurement. These items were developed to suit one cell of the matrix, without adequate consideration of the context and of connections to other parts of mathematics.
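The 14 percent figure is the product of the two marginal shares. A minimal sketch of that arithmetic follows; the function name `cell_share` and the 100-item test length are assumptions made for illustration and are not part of any NAEP specification.

```python
from fractions import Fraction

def cell_share(content_share: Fraction, ability_share: Fraction) -> Fraction:
    """Share of all items that falls in one content-by-ability cell."""
    return content_share * ability_share

measurement = Fraction(35, 100)  # 35 percent of items come from measurement
procedural = Fraction(40, 100)   # 40 percent of those assess procedural knowledge

share = cell_share(measurement, procedural)
items_in_cell = share * 100      # hypothetical 100-item assessment

print(float(share), int(items_in_cell))
```

Exact fractions are used here only to keep the percentages free of floating-point noise.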
NAEP 1990-1992 Matrix
Content: Numbers and Operations; Measurement; Geometry; Data Analysis, Probability, and Statistics; Algebra and Functions
Mathematical Ability: Conceptual Understanding; Procedural Knowledge; Problem Solving

Starting with the 1995 NAEP mathematics assessment, the use of matrices as a design feature has been discontinued. Percentages of items will be specified for each of the five major content areas, but some of these items will be double-coded because they measure content in more than one of the domains. Mathematical abilities categories (conceptual understanding, procedural knowledge, and problem solving) will come into play only at the final stage of development to ensure that there is balance among the three categories over the entire assessment (although not necessarily by each content area) at each grade level. This change, along with the continued use of items requiring students to construct their own responses, has helped provide a new basis for the NAEP mathematics examination.4
One promising approach to assessment frameworks is being developed by the Balanced Assessment Project, which is a National Science Foundation-supported effort to create a set of assessment packages, at various grade levels, that provide students, teachers, and administrators with a fair and deep characterization of student attainment in mathematics.5 The seven main dimensions of the framework are sketched below:
content (which is very broadly construed to include concepts, senses, procedures and techniques, representations, and connections),
thinking processes (conjecturing, organizing, explaining, proving, etc.),
products (plans, models, reports, etc.),
mathematical point of view (real-world modeling, for example),
diversity (accessibility, sensitivity to language and culture, etc.),
circumstances of performance (amount of time allowed, whether the task is to be done individually or in groups, etc.), and
pedagogics-aesthetics (the extent to which a task or assessment is believable, engaging, etc.).
The first four dimensions describe aspects of the mathematical competency that the students are asked to demonstrate, whereas the last three dimensions pertain to characteristics of the assessment itself and the circumstances or conditions under which the assessment is undertaken.
One noteworthy feature of the framework from the Balanced Assessment Project is that it can be used at two different levels: at the level of the individual task and at the level of the assessment as a whole. When applied to an individual task, the framework can be used as more than a categorizing mechanism: it can be used to enrich or extend tasks by suggesting other thinking processes that might be involved, for example, or additional products that students might be asked to create. Just as important, the framework provides a way of examining the balance of a set of tasks that goes beyond checking off cells in a matrix. Any sufficiently rich task will involve aspects of several dimensions and hence will strengthen the overall balance of the entire assessment by contributing to several areas. Given a set of tasks, one can then examine the extent to which each aspect of the framework is represented, and this can be done without limiting oneself to tasks that fit entirely inside a particular cell in a matrix.
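One way to picture this kind of balance check is to tag each task with every dimension it touches and then count how often each dimension appears across the set. The sketch below is only an illustration: the task names and their tags are hypothetical, and the Balanced Assessment Project's own procedure is not described in this chapter.

```python
from collections import Counter

# The seven dimensions named in the framework.
DIMENSIONS = [
    "content", "thinking processes", "products", "mathematical point of view",
    "diversity", "circumstances of performance", "pedagogics-aesthetics",
]

# Hypothetical tasks, each tagged with every dimension it is judged to involve.
tasks = {
    "task A": {"content", "thinking processes", "products"},
    "task B": {"content", "mathematical point of view"},
    "task C": {"content", "products", "circumstances of performance"},
}

def coverage(tagged_tasks):
    """Count, for each dimension, how many tasks contribute to it."""
    counts = Counter({dimension: 0 for dimension in DIMENSIONS})
    for tags in tagged_tasks.values():
        counts.update(tags)
    return counts

counts = coverage(tasks)
gaps = [d for d in DIMENSIONS if counts[d] == 0]  # dimensions no task addresses
print(dict(counts))
print("unrepresented:", gaps)
```

Because a task may carry several tags, a rich task raises several counts at once, which is the contrast with assigning each item to exactly one matrix cell.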
As these and other efforts demonstrate, researchers are attempting to take account of the fact that assessment should do much more than test discrete procedural skills.6 The goal ought to be schemes for assessment that go beyond matrix classification to assessment that elicits student work on the meaning, process, and uses of mathematics. Although the goal is clearly defined, methods to achieve it are still being explored by researchers and practitioners alike.
SPECIFYING ASSESSMENT FRAMEWORKS
Assessment frameworks provide test developers with the guidance they need for creating new assessments. Embedded in the framework should be information to answer the following kinds of questions: What mathematics should students know before undertaking an assessment? What mathematics might they learn from the assessment? What might the assessment reveal about their understanding and their mathematical power? What mathematical background are they assumed to have? What information will they be given before, during, and after the assessment? How might the tasks be varied, extended, and incorporated into current instruction?
Developers also need criteria for determining appropriate student behavior on the assessment: Will students be expected to come up with conjectures on their own, for example, or will they be given some guidance, perhaps identification of a faulty conjecture, which can then be replaced by a better one? Will they be asked to write a convincing argument? Will they be expected to explain their conjecture to a colleague or to the teacher? What level of conjecture and argument will be deemed satisfactory for these tasks? A complete framework might also include standards for student performance (i.e., standards in harmony with the desired curriculum).
Very few examples of such assessment frameworks currently exist. Until there are more, educators are turning to curriculum frameworks, such as those developed by state departments of education across the country, and adapting them for assessment purposes. The state of California, for example, has a curriculum framework that asserts the primacy of developing mathematical power for all students: "Mathematically powerful students think and communicate, drawing on mathematical ideas and using mathematical tools and techniques."7 The framework portrays the content of mathematics in three ways:
Strands (such as number, measurement, and geometry) run throughout the curriculum from kindergarten through grade 12. They describe the range of mathematics to be represented in the curriculum and provide a way to assess its balance.
Unifying ideas (such as proportional relationships, patterns, and algorithms) are major mathematical ideas that cut across strands and grades. They represent central goals for learning and set priorities for study, bringing depth and connectedness to the student's mathematical experience.
Units of instruction (such as dealing with data, visualizing shapes, and measuring inaccessible distances) provide a means of organizing teaching. Strands are commingled in instruction, and unifying ideas give too big a picture to be useful day to day. Instruction is organized into coherent, manageable units consisting of investigations, problems, and other learning activities.
Through the California Learning Assessment System, researchers at the state department of education are working to create new forms of assessment and new assessment tasks to match the curriculum framework.8
Further exploration is needed to learn more about the development and appropriate use of assessment frameworks in mathematics education. Frameworks that depict the complexity of mathematics enhance assessment by providing teachers with better targets for teaching and by clearly communicating what is valued to students, their parents, and the general public.9 Although an individual assessment may not treat all facets of the framework, the collection of assessments needed to evaluate what students are learning should be comprehensive. Such completeness is necessary if assessments are to provide the right kind of leadership for educational change. If an assessment represents a significant but small fraction of important mathematical knowledge and performance, then the same assessment should not be used over and over again. Repeated use could inappropriately narrow the curriculum.
DEVELOPING NEW ASSESSMENT TASKS
Several desired characteristics of assessment tasks can be deduced from the content principle and should guide the development of new assessment tasks.
TASKS REFLECTING MATHEMATICAL CONNECTIONS
Current mathematics education reform literature emphasizes the importance of the interconnections among mathematical topics and the connections of mathematics to other domains and disciplines. Much assessment tradition is based, however, on an atomistic approach that in practice, if not in theory, hides the connections among aspects of mathematics and between mathematics and other domains. Assessment developers will need to find new ways to reflect these connections in the assessment tasks posed for students.
One way to help ensure the interconnectedness is to create tasks that ask students to bring to bear a variety of aspects of mathematics. An example involving topics from arithmetic, geometry, and measurement appears on the following page.10 Similarly, tasks may ask students to draw connections across various disciplines. Such tasks may provide some structure or hints for the students in finding the connections or may be more open-ended, leaving responsibility for finding connections to the students. Each strategy has its proper role in assessment, depending on the students' experience and accomplishment.
Another approach to reflecting important connections is to set tasks in a real-world context. Such tasks will more likely capture students' interest and enthusiasm and may also suggest new ways of understanding the world through mathematical models so that the assessment becomes part of the learning process. Moreover, the "situated cognition" literature11 suggests that the specific settings and contexts in which a mathematical situation is embedded are critical determinants of problem solvers' responses to that situation. Developers should not assume, however, that just because a mathematical task is interesting to students, it therefore contains important mathematics. The mathematics in the task may be rather trivial and therefore inappropriate.

Lightning Strikes Again!
One way to estimate the distance from where lightning strikes to you is to count the number of seconds until you hear the thunder and then divide by five. The number you get is the approximate distance in miles.
One person is standing at each of the four points A, B, C, and D. They saw lightning strike at E. Because sound travels more slowly than light, they did not hear the thunder right away.
1. Who heard the thunder first? _____ Why?
Who heard it last? _____ Why?
Who heard it after 17 seconds? _____ Explain your answer.
2. How long did the person at B have to wait to hear the thunder?
3. Now suppose lightning strikes again at a different place. The person at A and the person at C both hear the thunder after the same amount of time. Show on the map below where the lightning might have struck.
4. In question 3, are there other places where the lightning could have struck? Explain your answer.
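The rule of thumb stated in the Lightning Strikes Again! task can be written as a one-line function. This is a minimal sketch of the rule only; the function name is an assumption, and the distances from A, B, C, and D to E depend on a map that is not reproduced here.

```python
def distance_miles(seconds_until_thunder: float) -> float:
    """Rule of thumb from the task: seconds divided by five gives miles."""
    return seconds_until_thunder / 5

# The person who heard the thunder after 17 seconds:
print(distance_miles(17))
```

Answering the task's questions still requires reading distances off the map and reasoning about which points are equidistant from the strike.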
Test items that assess one isolated fragment of a student's mathematical knowledge may take very little time and may yield reliable scores when added together. However, because they are set in no reasonable context, they do not provide a full picture of the student's reasoning. They cannot show how the student connects mathematical ideas, and they seldom allow the student an opportunity to explain or justify a line of thinking.
Students should be clear about the context in which a question is being asked. Either the assumptions necessary for students to use mathematics in a problem situation should be made clear in the instructions or students should be given credit for correct reasoning under various assumptions. The context of a task, of course, need not be derived from mathematics. The example at right contains a task from a Kentucky statewide assessment for twelfth-graders that is based on the notion of planning a budget within certain practical restrictions.12
Budget Planning Task
You graduated from Fairdale High School 2 years ago, and although you did not attend college, you have been attending night school to learn skills to repair video cassette recorders while you worked for minimum wages at a video center by day. Now you have been fortunate to find an excellent job that requires the special skills you have developed. Your salary will be $18,000.
This new job excites you because for some time you have been wanting to move out of your parents' home to your own apartment. During the past 2 years you have been able to buy your own bedroom set, a television, a stereo, and some of your own dishes and utensils.
To move to your own apartment, you will need to develop a budget. Your assignment is to develop a monthly budget showing how you will live on the income from your new job. To guide you, read the list below. (A packet of resource materials is provided, including a newspaper and brochures with consumer information.)
Estimate your monthly take-home pay. Remember that you must allow for city, state, federal, social security, and property taxes. Assume that city, state, federal, and social security taxes are 25% of your gross pay.
Using the newspaper provided, investigate various apartments and decide which one you will rent.
You will need a car on your new job. Price several cars and decide how much money you will need to borrow to buy the car you select; estimate the monthly payment. Use the newspaper and other consumer materials provided to make your estimate. Property taxes will be $10 per $1,000 assessed value.
You will do your own cooking. Figure how much you will spend on food, cooking and eating out.
As you plan your budget, don't forget about clothing, savings, entertainment and other living expenses.
Your budget for this project should be presented as a one-page, two-column display. Supporting this one-page budget summary, you should submit an explanation for each budget figure, telling how/where you got the information.
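The two fixed figures in the Budget Planning Task lend themselves to a short worked calculation. The sketch below assumes that the 25 percent rate applies to the full $18,000 salary and that pay is spread evenly over 12 months; the assessed value of the car is a hypothetical number, and the task does not state the period over which the property tax is levied.

```python
GROSS_ANNUAL_SALARY = 18_000
COMBINED_TAX_RATE = 0.25  # city, state, federal, and social security, per the task

def monthly_take_home(gross_annual: float, tax_rate: float) -> float:
    """Take-home pay per month after the combined tax rate is withheld."""
    return gross_annual * (1 - tax_rate) / 12

def property_tax(assessed_value: float) -> float:
    """Property tax at $10 per $1,000 of assessed value."""
    return assessed_value / 1_000 * 10

take_home = monthly_take_home(GROSS_ANNUAL_SALARY, COMBINED_TAX_RATE)
car_tax = property_tax(6_000)  # 6,000 is a hypothetical assessed value

print(take_home, car_tax)
```

Everything else in the budget (rent, car payment, food, and so on) depends on choices the student makes from the resource packet.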

Other examples of age-appropriate contexts can be found in the fourth-grade assessments developed by the New Standards Project (NSP), a working partnership of researchers and state and local school districts formed to develop new systems of performance-based assessments. One such problem includes a fairly complex task in which children are given a table of information about various kinds of tropical fish (their lengths, habits, prices, etc.) and are asked to propose how to spend a fixed amount of money to buy a variety of fish for an aquarium of limited capacity, under certain realistic constraints.13 The child must develop a solution that takes the various constraints into account. The task offers ample possibilities for students to display reasoning that connects mathematics with the underlying content.
THE CHALLENGES IN MAKING CONNECTIONS
The need to reflect mathematical connections pushes task development in new directions, each presenting challenges that require attention.
Differential Familiarity. Whatever the context of a mathematical task, some students will be more familiar with it than other students, possibly giving some an unfair advantage. One compensating approach is to spend time acquainting all students with the context. The NSP, for example, introduces the context of a problem in an assessment exercise in a separate lesson, taught before the assessment is administered.14 Presumably the lesson reduces the variability among the students in their familiarity with the task setting. The same idea can be found in some of the assessment prototypes in Measuring Up: Prototypes for Mathematics Assessment. In one prototype, for instance, a script of a videotaped introduction was suggested;15 playing such a videotape immediately before students work on the assessment task helps to ensure that everyone is equally familiar with the underlying context.
Another approach is to make the setting unusual, yet realistic, so that everyone will be starting with a minimum of prior knowledge. This technique was used in a study of children's problem solving conducted through extended individual task-based interviews.16 The context used as the basis of the problem situation, a complex game involving probability, was deliberately constructed so that it would be unfamiliar to everyone. After extensive pilot testing of many variations, an abstract version of the game was devised in which children's prior feelings and intuitive knowledge about winning and losing (and about competitions generally) could be kept separate from their mathematical analyses of the situation.
Clarifying Assumptions. Task developers must consider seriously the impact of assumptions on any task, particularly as the assumptions affect the mathematics that is called for in solution of the problem. An example of the need to clarify assumptions is a performance assessment17 that involves four tasks, all in the setting of an industrial arts class and all involving measuring and cutting wood. As written, the tasks ignore an important idea from the realm of wood shop: When one cuts wood with a saw, a small but significant amount of wood is turned into sawdust. This narrow band of wood, called the saw's kerf, must always be taken into account, for otherwise the measurements will be off. The tasks contain many instances of this oversight: If, for example, a 16-inch piece is cut from a board that is 64 inches long, the remaining piece is not 48 inches long. Thus students who are fully familiar with the realities of wood shop could be at a disadvantage, since the problems posed are considerably more difficult when kerf is taken into account. Any scoring guide should provide an array of plausible answers for such tasks to ensure that students who answer the questions more accurately in real-world settings are given ample credit for their work. Better yet, the task should be designed so that assumptions about kerf (in this case) are immaterial to a solution.
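The effect of kerf on the 64-inch board can be made concrete with a small calculation. In this sketch the kerf width of 1/8 inch is an assumption chosen for illustration; the source gives no specific width.

```python
from fractions import Fraction

def remaining_length(board: Fraction, cut_piece: Fraction, kerf: Fraction) -> Fraction:
    """Length left after one cut, allowing for the wood lost to the saw."""
    return board - cut_piece - kerf

# Answer the tasks appear to expect: no wood is lost in the cut.
ideal = remaining_length(Fraction(64), Fraction(16), Fraction(0))

# Answer from a student who allows for an assumed 1/8-inch kerf.
realistic = remaining_length(Fraction(64), Fraction(16), Fraction(1, 8))

print(ideal, realistic)
```

A scoring guide that lists only the first value would mark the second, more realistic answer as wrong, which is the problem the paragraph above describes.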
Another assessment item18 that has been widely discussed19 also shows the need to clarify assumptions. In 1982, this item appeared in the third NAEP mathematics assessment: "An army bus holds 36 soldiers. If 1128 soldiers are being bussed to their training site, how many buses are needed?" The responses have been taken as evidence of U.S. students' weak understanding of mathematics, because only 33 percent of the 13-year-old students surveyed gave 32 as the answer, whereas 29 percent gave the quotient 31 with a remainder, and 18 percent gave just the quotient 31. There are of course many possible explanations as to why students who performed the division failed to give the expected whole-number answer. One plausible explanation may be that some students did not see a need to use one more bus to transport the remaining 12 soldiers. They could squeeze into the other buses; they could go by car. Asked about their answers in interviews or in writing, some
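The three kinds of response to the bus item correspond to three readings of the same division. A minimal sketch, with variable names chosen for illustration:

```python
soldiers, seats_per_bus = 1128, 36

# Quotient and remainder of the division most students carried out.
full_buses, left_over = divmod(soldiers, seats_per_bus)

# The expected answer assumes every remaining soldier needs a seat on a bus,
# so any nonzero remainder calls for one more bus.
buses_needed = full_buses + (1 if left_over else 0)

print(full_buses, left_over, buses_needed)
```

The step from the quotient to the expected answer rests on the assumption in the comment, which is the assumption the item leaves unstated.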

MATHEMATICAL EXPERTISE
New kinds of assessments call for new kinds of expertise among those who develop the tasks. The special features of the mathematics content and the special challenges faced in constructing assessment tasks illustrate a need for additional types of expertise in developing assessment tasks and evaluation schema. Task developers need to have a high level of understanding of children, how they think about things mathematical and how they learn mathematics, well beyond the levels assumed to be required to develop assessment tasks in the past. Developers must also have a deep understanding of mathematics and its applications. We can no longer rely on task developers with superficial understanding of mathematics to develop assessment tasks that will elicit creative and novel mathematical thinking.
SCORING NEW ASSESSMENTS
The content principle also has implications for the mathematical expertise of those who score assessments and the scoring approaches that they use.
JOINING TASK DEVELOPMENT TO STUDENT RESPONSES
A multiple-choice question is developed with identification of the correct answer. Similarly, an open-ended task is incomplete without a scoring rubric—a scoring guide—as to how the response will be evaluated. Joining the two processes is critical because the basis on which the response will be evaluated has many implications for the way the task is designed, and the way the task is designed has implications for its evaluation.
Just as there is a need to try out multiple-choice test questions prior to administration, so there is a need to try out the combination of task and its scoring rubric for open-ended questions. Students' responses give information about the design of both the task and the rubric. Feedback loops, where assessment tasks are modified and sharpened in response to student work, are especially important, in part because of the variety of possible responses.

EVALUATING RESPONSES TO REFLECT THE CONTENT PRINCIPLE
The key to evaluating responses to new kinds of assessment tasks is having a scoring rubric that is tied to the prevailing vision of mathematics education. If an assessment consists of multiple-choice items, the job of determining which responses are correct is straightforward, although assessment designers have little information to go on in trying to decide why students have made certain choices. They can interview students after a pilot administration of the test to try to understand why they chose the answers they did. The designers can then revise the item so that the erroneous choices may be more interpretable. If ambiguity remains and students approach the item with sound interpretations that differ from those of the designers, the response evaluation cannot help matters much. The item is almost always scored either right or wrong.25
Designers of open-ended tasks, on the other hand, ordinarily describe the kinds of responses expected in a more general way. Unanticipated responses can be dealt with by judges who discuss how those responses fit into the scoring scheme. The standard-setting process used to train judges to evaluate open-ended responses, including portfolios, in the Advanced Placement (AP) program of the College Board, for example, alternates between the verbal rubrics laid out in advance and samples of student work from the assessment itself.26 Portfolios in the AP Studio Art evaluation are graded by judges who first hold a standard-setting session at which sample portfolios representing all the possible scores are examined and discussed. The samples are used during the judging of the remaining portfolios as references for the readers to use in place of a general scoring rubric. Multiple readings and moderation by more experienced graders help to hold the scores to the agreed standard.27 Together, graders create a shared understanding of the rubrics they are to use on the students' work. Examination boards in Britain follow a similar procedure in marking students' examination papers in subjects such as mathematics, except that a rubric is used along with sample examinations discussed by the group to help examiners agree on marks.28
The development of high-quality scoring guides to match new assessment is a fairly recent undertaking. One approach has been first to identify in general terms the levels of desired performance and then to create task-specific rubrics. An example from a New Jersey eighth-grade "Early Warning" assessment appears on the following page.29
A general rubric can be used to support a holistic scoring system, as New Jersey has done, in which the student's response is examined and scored as a whole. Alternatively, a much more refined analytic scheme could be devised in which specific features or qualities of a student's response are identified, according to predetermined criteria, and given separate scores. In the example from New Jersey, one can imagine a rubric that yields two independent scores: one for the accuracy of the numerical answer and one for the adequacy of the explanation.
Assessors are experimenting with both analytic and holistic approaches, as well as an amalgam of the two. For example, in the Mathematics Performance Assessment developed by The Psychological Corporation,30 responses are scored along the dimensions of reasoning, conceptual knowledge, communication, and procedures, with a separate rubric for each dimension. In contrast, QUASAR, a project to improve the mathematics instruction of middle school students in economically disadvantaged communities,31 uses an approach that blends task-specific rubrics with a more general rubric, resulting in scoring in which mathematical knowledge, strategic knowledge, and communication are considered interrelated components. These components are not rated separately but rather are to be considered in arriving at a holistic rating.32 Another approach is through so-called protorubrics, which were developed for the tasks in Measuring Up.33 The protorubrics can be adapted for either holistic or analytic approaches and are designed to give only selected characteristics and examples of high, medium, and low responses.
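The difference between the two approaches shows up in the shape of the record a scorer produces. The sketch below is an illustration only: the class names, the rating values, and the note text are hypothetical, and the four analytic dimensions are simply those named above for the Mathematics Performance Assessment.

```python
from dataclasses import dataclass

ANALYTIC_DIMENSIONS = ("reasoning", "conceptual knowledge", "communication", "procedures")

@dataclass
class AnalyticScore:
    """One separate rating per dimension, each judged against its own rubric."""
    scores: dict  # dimension name -> rating on a hypothetical scale

    def __post_init__(self):
        missing = set(ANALYTIC_DIMENSIONS) - set(self.scores)
        if missing:
            raise ValueError(f"unscored dimensions: {sorted(missing)}")

@dataclass
class HolisticScore:
    """A single overall rating for the response taken as a whole."""
    rating: int
    notes: str = ""

analytic = AnalyticScore({
    "reasoning": 3, "conceptual knowledge": 4, "communication": 2, "procedures": 4,
})
holistic = HolisticScore(rating=3, notes="sound strategy, explanation incomplete")

print(analytic.scores, holistic.rating)
```

The analytic record keeps a profile that is never collapsed into one number, whereas the holistic record weighs the same qualities together in a single judgment.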
Profound challenges confront the developer of a rating scheme regardless of the system of scoring or the type of rubric used. If a rubric is developed to deal with a single task or a type of task, the important mathematical ideas and processes involved in the task can be specified so that the student can be judged on how well those appear to have been mastered, perhaps sacrificing some degree of interconnectedness among tasks. On the other hand, general rubrics may not allow scorers to capture some important qualities of students' thinking about a particular task. Instead,

From a Generalized Holistic Scoring Guide to a Specific Annotated Item Scoring Guide
Generalized Scoring Guide:
Student demonstrates proficiency — Score Point = 3.
The student provides a satisfactory response with explanations that are plausible, reasonably clear, and reasonably correct, e.g., includes appropriate diagram(s), uses appropriate symbols or language to communicate effectively, exhibits an understanding of the mathematics of the problem, uses appropriate processes and/or descriptions to answer the question, and presents sensible supporting arguments. Any flaws in the response are minor.
Student demonstrates minimal proficiency — Score Point = 2
The student provides a nearly satisfactory response which contains some flaws, e.g., begins to answer the question correctly but fails to answer all of its parts or omits appropriate explanation, draws diagram(s) with minor flaws, makes some errors in computation, misuses mathematical language, or uses inappropriate strategies to answer the question.
Student demonstrates a lack of proficiency — Score Point = 1
The student provides a less than satisfactory response that only begins to answer the question, but fails to answer it completely, e.g., provides little or no appropriate explanation, draws diagram(s) which are unclear, exhibits little or no understanding of the question being asked, or makes major computational errors.
Student demonstrates no proficiency — Score Point = 0
The student provides an unsatisfactory response that answers the question inappropriately, e.g., uses algorithms which do not reflect any understanding of the question, makes drawings which are inappropriate to the question, provides a copy of the question without an appropriate answer, fails to provide any information which is appropriate to the question, or fails to attempt to answer the question.
Specific Problem:
What digit is in the fiftieth decimal place of the decimal form of 3/11? Explain your answer.
Annotated Scoring Guide:
3 points The student provides a satisfactory response; e.g., indicates that the digit in the fiftieth place is 7 and shows that the digits 2 and 7 in the quotient (.272727 …) alternate; the explanation of why 7 is the digit in the fiftieth place is either based on some counting procedure or on the pattern of how the digits are positioned after the decimal point. (The student could read fiftieth as fifteenth or fifth, identify 2 as the digit, and provide an explanation similar to the ones above.)
2 points The student provides a nearly satisfactory response which contains some flaws, e.g., identifies the pattern of the digits 2 and 7 (.272727 …) and provides either a weak or no explanation of why 7 is the digit in the fiftieth place OR converts 3/11 incorrectly to 3.666 … and provides some explanation of why 6 is the digit in the fiftieth place.
1 point The student provides a less than satisfactory response that only begins to answer the question; e.g., begins to divide correctly (minor flaws in division are allowed) but fails to identify "the digit" OR identifies 7 as the correct digit with no explanation or work shown.
0 points The student provides an unsatisfactory response; e.g., either answers the question inappropriately or fails to attempt to answer the question.
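As a check on the arithmetic behind the annotated guide above, the following minimal sketch carries out the long division of 3 by 11 one decimal place at a time. It is an illustration only and is not part of the New Jersey scoring materials; the function name `decimal_digit` and its parameters are hypothetical.

```python
def decimal_digit(numerator: int, denominator: int, place: int) -> int:
    """Return the digit in the given decimal place (1-based) of numerator/denominator.

    Hypothetical helper: it repeats the long-division step (multiply the
    remainder by 10, then divide) once for each decimal place.
    """
    if numerator < 0 or denominator <= 0 or place < 1:
        raise ValueError("expected numerator >= 0, denominator > 0, place >= 1")
    remainder = numerator % denominator  # drop the whole-number part
    digit = 0
    for _ in range(place):
        remainder *= 10
        digit, remainder = divmod(remainder, denominator)
    return digit


# The first six decimal digits show the alternating pattern 2, 7, 2, 7, ...
first_six = [decimal_digit(3, 11, k) for k in range(1, 7)]
print(first_six)

# Digit in the fiftieth place, plus the two misreadings the guide mentions.
print(decimal_digit(3, 11, 50), decimal_digit(3, 11, 15), decimal_digit(3, 11, 5))
```

Because the digits alternate with 2 in every odd place and 7 in every even place, the counting argument the guide accepts amounts to noting that 50 is even. The same reasoning explains why a student who reads "fiftieth" as "fifteenth" or "fifth" would correctly arrive at 2.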

anecdotal evidence suggests that students may be given credit for verbal fluency or for elegance of presentation rather than mathematical acumen. The student who mentions everything possible about the problem posed in the task and rambles on about minor points the teacher has mentioned in class may receive more credit than a student who has deeper insights into the problem but produces only a terse, minimalist solution. The beautiful but prosaic presentation with elaborate drawings may inappropriately outweigh the unexpected but elegant solution. Such difficulties are bound to arise when communication with others is emphasized as part of mathematical thinking, but they can be dealt with more successfully when assessors include those with expertise in mathematics.
Unanticipated responses require knowledgeable graders who can recognize and evaluate them.
In any case, regardless of the type of rubric, graders must be alert to the unconventional, unexpected answer, which, in fact, may contain insights that the assessor had not anticipated. The likelihood of unanticipated responses will depend in part upon the mathematical richness and complexity of the task. Of course, the greater the chances of unanticipated responses, the greater the mathematical sophistication needed by the persons grading the tasks: the graders must be sufficiently knowledgeable to recognize kernels of mathematical insight when they occur. Similarly, graders must sharpen their listening skills for those instances in which task results are communicated orally. Teachers are uniquely positioned to interpret their students' work on internal and external assessments. Personal knowledge of the students enhances their ability to be good listeners and to recognize the direction of their students' thinking.
There may also be a need for somewhat different rubrics even on the same task because judgment of draft work should be different from judgment of polished work. With problem solving a main thrust of mathematics education, there is a place for both kinds of judgments. Some efforts are under way, for example, to establish iterative processes of assessment: Students work on tasks, handing them in to teachers to receive comments about their work in progress. With these comments in hand, students may revise and extend their work. Again, the work goes to the teacher for comment. This back-and-forth process may continue several times, optimizing the opportunity for students to learn from the assessment. Such a model will require appropriate rubrics for teachers and students alike to judge progress at different points.

REPORTING ASSESSMENT RESULTS
Issues about the dissemination of results are often not confronted until after an assessment has been administered. This represents a missed opportunity, particularly from the perspective of the content principle. Serious attention to what kind of information is needed from the assessment and who needs it should influence the design of the assessment and can help prevent some of the common misuses of assessment data by educators, researchers, and the public. The reporting framework itself must relate to the mathematics content that is important for all students to learn.
There has been a long tradition in external assessment of providing a single overall summary score, coupled in some cases with subscores that provide a more fine-grained analysis. The most typical basis for a summary score has been a student's relative standing among his or her group of peers. There have been numerous efforts to move to other information in a summary score, such as percent mastery in the criterion-related measurement framework. One innovative approach has been taken by the Western Australia Monitoring Standards in Education program. For each of five strands (number; measurement; space; chance and data; algebra) a student's performances on perhaps 20 assessment tasks are arrayed in such a way that overall achievement is readily apparent while at the same time some detailed diagnostic information is conveyed.34 NAEP developed an alternative approach to try to give meaning to summary scores beyond relative standing. NAEP used statistical techniques to put all mathematics items on the same mathematics proficiency scale so that sets of items can be used to describe the level of proficiency a particular score represents.35 Although these scales have been criticized for yielding misinterpretations about what students know and can do in mathematics,36 they represent one attempt to make score information more meaningful.
Similarly, some teachers focus only on the correctness of the final answer on teacher-made tests with insufficient attention to the mathematical problem solving that preceded it. Implementation of the content principle supports a reexamination of this approach. Problem solving legitimately may involve some false starts or blind alleys; students whose work includes such things are doing important mathematics.

Rather than forcing mathematics to fit assessment, assessment must be tailored to whatever mathematics is important to learn.
Along with the efforts to develop national standards in various fields, there is a push to provide assessment information in ways that relate to progress toward those national standards. Precisely how such scores would be designed to relate to national standards and what they would actually mean are unanswered questions. Nonetheless, this push also is toward reporting methods that tell people directly about the important mathematics students have learned. This is the approach that NAEP takes when it illustrates what basic, proficient, and advanced mean by giving specific examples of tasks at these levels.
An assessment framework that is used as the foundation for the development of an assessment may provide, at least in part, a lead to how results of the assessment might be reported. In particular, the major themes or components of a framework will give some guidance with regard to the appropriate categories for reporting. For example, the first four dimensions of the Balanced Assessment Project's framework suggest that attention be paid to describing students' performance in terms of thinking processes used and products produced as well as in terms of the various components of content. In any case, whether or not a direct connection between aspects of the framework and reporting categories is made, a determination of reporting categories should affect and be affected by the categories of an assessment framework.
The mathematics in an assessment should never be distorted or trivialized for the convenience of assessment. Design, development, scoring, and reporting of assessments must take into account the mathematics that is important for students to learn.
In summary, rather than forcing mathematics to fit assessment, assessment must be tailored to whatever mathematics is important to assess.

ENDNOTES
1
For examples of such matrices, see Edward G. Begle and James W. Wilson, "Evaluation of Mathematics Programs," in Edward G. Begle, ed., Mathematics Education, 69th Yearbook of the National Society for the Study of Education, pt. 1 (Chicago, IL: University of Chicago Press, 1970), 367-404; for a critique of this approach to content, see Thomas A. Romberg, E. Anne Zarinnia, and Kevin F. Collis, "A New World View of Assessment in Mathematics," in Gerald Kulm, ed., Assessing Higher Order Thinking in Mathematics (Washington, D.C.: American Association for the Advancement of Science, 1990), 24-27.
2
Edward A. Silver, Patricia Ann Kenney, and Leslie Salmon-Cox, The Content and Curricular Validity of the 1990 NAEP Mathematics Items; A Retrospective Analysis (Pittsburgh, PA: Learning Research and Development Center, University of Pittsburgh, 1991).
3
Edward Haertel and David E. Wiley, "Representations of Ability Structures: Implications for Testing," in Norman Fredericksen, Robert J. Mislevy, and Isaac I. Bejar, eds., Test Theory for a New Generation of Tests (Hillsdale, NJ: Lawrence Erlbaum Associates, 1992).
4
John Dossey, personal communication, 24 June 1993.
5
Alan H. Schoenfeld, Balanced Assessment for the Mathematics Curriculum: Progress Report to the National Science Foundation (Berkeley, CA: University of California, June 1993).
6
See also Suzanne P. Lajoie, "A Framework for Authentic Assessment in Mathematics," in Thomas A. Romberg, ed., Reform in School Mathematics and Authentic Assessment, in press.
7
California Department of Education, Mathematics Framework for California Public Schools: Kindergarten Through Grade 12 (Sacramento, CA: Author, 1992), 20.
8
E. Anne Zarinnia and Thomas A. Romberg, "A Framework for the California Assessment Program to Report Students' Achievement in Mathematics," in Thomas A. Romberg, ed., Mathematics Assessment and Evaluation: Imperatives for Mathematics Education (Albany, NY: State University of New York Press, 1992), 242-284.
9
Lauren B. Resnick and Daniel P. Resnick, "Assessing the Thinking Curriculum: New Tools for Educational Reform," in Bernard R. Gifford and Mary Catherine O'Connor, eds., Changing Assessments: Alternative Views of Aptitude, Achievement and Instruction (Boston, MA: Kluwer Academic Publishers, 1992), 37-75.
10
National Research Council, Mathematical Sciences Education Board, Measuring Up: Prototypes for Mathematics Assessment (Washington, D.C.: National Academy Press, 1993), 117-119.
11
John S. Brown, Allan Collins, and P. Duguid, "Situated Cognition and the Culture of Learning," Educational Researcher 18:1 (1989), 32-42; Ralph T. Putnam, Magdalene Lampert, and Penelope Peterson, "Alternative Perspectives on Knowing Mathematics in Elementary Schools," Review of Research in Education 16 (1990), 57-150; James G. Greeno, "A Perspective on Thinking," American Psychologist 44:2 (1989), 134-141; James Hiebert and Thomas P. Carpenter, "Learning and Teaching with Understanding," in Douglas A. Grouws, ed., Handbook of Research on Mathematics Teaching and Learning (New York, NY: Macmillan Publishing Company, 1992), 65-97.

12
Kentucky Department of Education, "All About Assessment," EdNews, Special Section, Jan/Feb 1992, 7.
13
Lauren B. Resnick, Diane Briars, and Sharon Lesgold, "Certifying Accomplishments in Mathematics: The New Standards Examining System," in Izaak Wirszup and Robert Strait, eds., Developments in School Mathematics Education Around the World, vol. 3 (Reston, VA: National Council of Teachers of Mathematics, 1992), 196-200.
14
Learning Research and Development Center, University of Pittsburgh and National Center on Education and the Economy, New Standards Project (Pittsburgh, PA: Author, 1993).
15
Measuring Up, 101-106.
16
Eve R. Hall, Edward T. Esty, and Shalom M. Fisch, "Television and Children's Problem-Solving Behavior: A Synopsis of an Evaluation of the Effects of Square One TV," Journal of Mathematical Behavior 9:2 (1990), 161-174.
17
The Riverside Publishing Company, Riverside Student Performance Assessment, Grade 8 Mathematics Sample Assessment (Riverside, CA: Author, 1991), 2-6.
18
Thomas P. Carpenter et al., "Results of the Third NAEP Mathematics Assessment: Secondary School," Mathematics Teacher, 76:9 (1983), 656.
19
See, for example, Alan H. Schoenfeld, "When Good Teaching Leads to Bad Results: The Disasters of 'Well Taught' Mathematics Classes," Educational Psychologist 23:2 (1988), 145-166; Mary M. Lindquist, "Reflections on the Mathematics Assessments of the National Assessment of Educational Progress," in Developments in School Mathematics Education Around the World, vol. 3.
20
Edward A. Silver, Lora J. Shapiro, and Adam Deutsch, "Sense Making and the Solution of Division Problems Involving Remainders: An Examination of Middle School Students' Solution Processes and Their Interpretations of Solutions," Journal for Research in Mathematics Education 24:2 (1993), 117-135.
21
California Assessment Program, Question E (Sacramento, CA: California State Department of Education, 1987).
22
Nancy S. Cole, Changing Assessment Practice in Mathematics Education: Reclaiming Assessment for Teaching and Learning (Draft version, 1992).
23
Adapted from Kirsten Hermann and Bent Hirsberg, "Assessment in Upper Secondary Mathematics in Denmark," in Mogens Niss, ed., Cases of Assessment in Mathematics Education (Dordrecht, The Netherlands: Kluwer Academic Publishers, 1993), 133.
24
Second International Mathematics Study, "Technical Report 4: Instrument Book," booklet 2LB, problem 26, (Urbana, IL: International Association for the Evaluation of Educational Achievement, 8, November 1985), 8. This was the only item, of those given to eighth graders in the U.S., that was later judged to have involved problem solving as specified in the NCTM Standards.
25
Peter Hilton, "The Tyranny of Tests," American Mathematical Monthly 100:4 (1993), 365-369.

26
"Representations of Ability Structures"; Robert J. Mislevy, "Test Theory Reconceived," Research Report, in press.
27
Ruth Mitchell, Testing for Learning: How New Approaches to Evaluation Can Improve American Schools (New York: Free Press, 1992).
28
Alan Bell, Hugh Burkhardt, and Malcolm Swan, "Assessment of Extended Tasks," in Richard Lesh and Susan J. Lamon, eds., Assessment of Authentic Performance in School Mathematics (Washington, D.C.: American Association for the Advancement of Science, 1992), 182.
29
New Jersey Department of Education, "Grade 8 Early Warning Test," Guide to Procedures for Scoring the Mathematics Constructed-Response Items (Trenton, NJ: Author, 1991), 4-6.
30
Marilyn Rindfuss, ed., Integrated Assessment System: Mathematics Performance Assessment Tasks Scoring Guides (San Antonio, TX: The Psychological Corporation, 1991).
31
Edward A. Silver, "QUASAR," Ford Foundation Letter, 20:3 (1989), 1-3.
32
Edward A. Silver and Suzanne Lane, "Assessment in the Context of Mathematics Instruction Reform: The Design of Assessment in the QUASAR Project," in Cases of Assessment in Mathematics Education: An ICMI Study.
33
Measuring Up, 14-16.
34
Geoff N. Masters, Inferring Levels of Achievement on Profile Strands (Hawthorn, Australia: Australian Council for Educational Research, 1993).
35
John A. Dossey et al., The Mathematical Report Card: Are We Measuring Up? (Princeton, NJ: Educational Testing Service, 1988).
36
Robert A. Forsyth, "Do NAEP Scales Yield Valid Criterion-Referenced Interpretations?" Educational Measurement: Issues and Practice 10:3 (1991), 3-9, 16. For a more recent critique of the procedures that the National Assessment Governing Board has used in setting and interpreting performance standards in the 1992 mathematics NAEP, see Educational Achievement Standards: NAGB's Approach Yields Misleading Interpretations (Washington, D.C.: General Accounting Office, 1993).
