Copyright © 2009. National Academy of Sciences. All rights reserved.
Assessment to Improve Learning
The U.S. programs presented at the workshop were selected based on an
informal review of efforts in states and districts to put into practice the goal of
aligning their assessment systems with standards and curricula. The committee
acknowledges that there are many more states and districts that are working to
bridge the gap between classroom and large-scale assessments, but exploring
more than a few at the present workshop was beyond its charge. The selected
programs are, however, exemplary in that they are making progress towards
goals of the kind identified by the committee. Not all of the programs have
articulated their goals in the same terms the committee had identified, but all
share a commitment to using assessments to improve learning, and were seen as
evidently meeting at least one of the criteria (see summary in Chapter 2, and
Boxes 2-1 and 2-2).
NEBRASKA: SCHOOL-BASED TEACHER-LED ASSESSMENT
RECORDING SYSTEM
Nebraska is an interesting state to consider first because it had no statewide
assessment program at all until 2000, and thus had the benefit of many years to
observe the efforts of other states before initiating its own program. As Patricia
Roschewski, director of assessment for the Nebraska Department of Education,
explained, the state had first developed academic standards in 1998, and had
decided that it needed an assessment program for two reasons. First, the state
wanted to collect information about student performance that could be used to
improve instruction. Second, it wanted accountability data that could be shared
with the public. Nebraska was clear in wanting the primary stakeholders in the
system to be students and teachers, rather than policy makers, and this decision
led it to give teachers a key role in the assessment program.
The Nebraska program's title, the School-Based Teacher-Led Assessment
Recording System (STARS), is a very brief summary of the goals the state had
developed, and, as Patricia Roschewski explained, the focus on teachers led it to
devote a considerable proportion of the available resources to professional devel-
opment and support. Many Nebraska districts had developed their own assess-
ment systems, mostly criterion-referenced and classroom-based, but the state
perceived that teachers and administrators generally had had very little training in
assessment issues. STARS is essentially a way of building on existing local
assessments to meet the new statewide goals.
Under STARS, goals based on the state standards are set for each district and
school, and clearly articulated so that students, parents, and everyone else
concerned understand the expectations for learning. For accreditation purposes,
each district gives a norm-referenced test, such as the Terra Nova or the SAT 9,
which typically covers some 35-40 percent of the standards. The remaining
60-65 percent is measured using classroom-based assessments developed by
teachers; local teachers and administrators can blend these with activities and
assessments dictated by district curricula in whatever ways they choose. The
state monitors these assessments using a national advisory panel made up of
assessment experts and Nebraska educators. This panel reviews and rates assess-
ment portfolios prepared by the districts over a period of several months each
summer. Districts whose methods are not successful receive further support and
training; exemplary methods are shared around the state.
The system, Roschewski explained, works in part because the local curricula
and state standards are closely aligned and clearly understood, and in part because
intensive training has built the "assessment literacy" of the educators who are
responsible for the bulk of the assessment. Nebraska teachers who once had little
reason to think about issues such as validity and reliability are now responsible
for ensuring that they assess their students in ways that stand up to professional
scrutiny. The state had not thought in terms of bridging a particular gap, said
Roschewski, but rather had sought to focus on balancing and integrating a new
element (the desire for more feedback that teachers and students could use to
improve learning and for information that could be used for accountability) into
a system without disrupting the balance it had already achieved.
DELAWARE: COMPREHENSIVE SCIENCE ASSESSMENT
Delaware provides another example of a program that involved a significant
amount of teacher training to increase assessment literacy. Its comprehensive
science assessment grew out of the state's commitment to improve science
learning. Rachel Wood, education associate at the state's Science Resource Center,
ASSESSMENT IN SUPPORT OF INSTRUCTION AND LEARNING
described a process that began in 1992 with the development of state standards in
science. A needs assessment revealed that science was indeed getting short
shrift; in the elementary grades it was often taught for as little as forty-five
minutes a week. Curriculum materials were developed, and the state focused on
identifying explicit learning goals for the topics outlined in the standards. For
example, a requirement that fourth graders study electricity was broken down
into precise descriptions of the key concepts related to electricity that were to be
mastered. Attention was paid at the same time to both cognitive and practical
factors that would affect articulation so that the prerequisites for meeting the
curricular objectives were accomplished grade by grade.
Once the state was pleased with its curricular units, it took a look at the
accompanying end-of-unit assessments, and was not satisfied. In particular, it
found that the scoring rubrics were generic and provided little useful diagnostic
information. The state wanted to obtain summative information that could be
used for accountability purposes from assessments that were closely linked to the
curriculum, but also wanted the assessments to give teachers clear feedback they
could use to improve their instruction. Delaware wanted specific data about how
students were faring with particular elements of the curriculum, and it wanted
assessments that would be part of a continuous loop of feedback and improve-
ment, thus fostering a community in which teachers and students shared a sense
of the purpose of and expectations for science learning. The state made the
decision that teachers should be heavily involved in the assessment process, and
that a significant investment in professional development was needed.
One of the innovations Delaware instituted was in direct response to the need
for diagnostic assessment data. Using a system of double-digit scoring rubrics,
modeled after a strategy used in the performance component of the Third Inter-
national Mathematics and Science Study, educators could collect not only data
showing how well students did with particular items, but also data on the kinds of
misconceptions that kept them from complete understanding. In this system, the
first digit works in the same way many rubrics do, indicating that a response is
completely or partially correct. The second digit indicates the nature of miscon-
ceptions expressed in the answer (and raters are trained to recognize and code
these) so that teachers can see what is missing in their students' understanding.
Moreover, widespread misconceptions can often be traced to areas of the curricu-
lum that are not adequately addressed, or to ambiguities in texts or materials.
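The two-digit logic described above can be sketched in a few lines of Python. The sketch is illustrative only: the code values, the misconception labels, and the function names are assumptions made for this example and are not Delaware's actual rubric.

```python
from collections import Counter

# Hypothetical first-digit meanings (level of correctness).
CORRECTNESS = {2: "completely correct", 1: "partially correct", 0: "incorrect"}

# Hypothetical second-digit meanings for a single electricity item.
MISCONCEPTIONS = {
    0: "none noted",
    1: "treats a single wire as a complete circuit",
    2: "believes current is used up by the bulb",
}


def decode(code):
    """Split a two-digit rubric code into (correctness, misconception) labels."""
    first, second = divmod(code, 10)
    return CORRECTNESS[first], MISCONCEPTIONS[second]


def misconception_tally(codes):
    """Count how often each misconception label appears in a set of codes."""
    return Counter(decode(code)[1] for code in codes)


# Invented codes for one class; a code of 2 is read as first digit 0, second digit 2.
class_codes = [20, 11, 12, 12, 2, 20, 12]
tally = misconception_tally(class_codes)
```

A tally of this kind is what would let a teacher see that one misconception dominates a class, which in turn could point to a part of the curriculum or a text that needs attention.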
The state recognized that few teachers had sufficient background in assess-
ment issues to meet the emerging needs. With some outside resources, the state
provided intensive professional development for a cadre of teachers, who could
then branch out and work with other teachers. Not only did teachers undergo
training to improve their understanding of assessment issues as well as their
capacities for making use of both formative and summative assessments, they
were also increasingly linked together in the kind of community of learners
referred to earlier by Dylan Wiliam. Using shared materials available online,
including assessments, rubrics, and student work, as well as professional devel-
opment activities, teachers were encouraged to share ideas about specific goals
for student learning and ways to help their students meet them.
VERMONT: THE VERMONT ASSESSMENT SYSTEM AND
THE PARTNERSHIP FOR THE ASSESSMENT OF
STANDARDS-BASED SCIENCE
Vermont was the subject of national attention in the spring of 2002, when it
announced that it was considering forgoing public education funds so that it
would not have to comply with all of the assessment requirements of the No
Child Left Behind Act. The state subsequently decided to accept the funds and is
now trying to work out a way to satisfy the new federal requirements using
locally designed assessments as well as statewide, large-scale assessments, as it
has been doing for a number of years. Vermont's existing assessment program
was designed to rely in part on a formal set of assessment tools developed or
selected by districts, or, in some cases, by schools, to meet their specific needs.
As described by Bud Myers and David White, assessment coordinators in the
Vermont Department of Education, the state's goals for its local assessments are
very clear. Assessments are to
• be linked to state and local content standards,
• provide information that is valued at the local level,
• support teaching and learning,
• meet tough standards of reliability and validity, and
• be part of a continuum of assessment strategies that serve a range of
  purposes at the national, state, district, school, and program level,
  including both evaluation and feedback to students.
Vermont has developed an infrastructure both to support teachers and
administrators in carrying out assessments and to ensure that the local assess-
ments meet quality standards. Technical advisory panels oversee the quality of
local assessments. Materials are provided to guide the development of local
assessments, and exemplary assessment tools, item banks, and other resources
are posted on a website accessible throughout the state. Review panels continu-
ously evaluate assessment tools, and summer institutes help teachers keep up to
date on assessment strategies. The state has built professional development for
both teachers and administrators into the system, and has developed master's
degree programs for teachers with an incomplete command of the mathematics
and science knowledge needed to teach the content outlined in the standards.
A part of Vermont's assessment system is the Partnership for the Assessment
of Standards-Based Science (PASS) program, which is a commercially available
standards-based science assessment developed by WestEd. Kathy Comfort, principal
investigator and director of PASS, described how the program fits Vermont's
goals and dovetails with the larger question of integrating large-scale and class-
room assessments. PASS was originally developed as a large-scale assessment
that states and districts could use to measure their students' performance and
growth in science against national standards and learning goals. PASS also meets
the science assessment requirements of the No Child Left Behind Act. The PASS
assessment is aligned with the content recommendations of the National Science
Education Standards (NRC, 1996) and the American Association for the
Advancement of Science's Benchmarks for Science Literacy (1993). It incorporates
multiple measures (enhanced multiple-choice questions, hands-on performance
tasks, constructed-response investigations, and open-ended questions) to get at
different kinds of knowledge and skills. WestEd staff worked closely with Ver-
mont officials to customize the assessment to Vermont's standards and learning
goals.
In response to feedback from PASS users, WestEd is developing ways that
the program could also be used to help inform instruction and guide professional
development. WestEd is using PASS to conduct research on the relationship
among different assessment components, instructional practices, and student
achievement, and on teachers' understanding of large-scale assessment results
and the uses they make of the results in their classroom practice. Vermont
teachers develop school and classroom science assessments using the methodol-
ogy and learning goals of the PASS assessment. Teachers are also involved in
developing items and in scoring, which provides an opportunity for large numbers
of them to focus on specific performance expectations, and to share information
and ideas.
While Vermont is proud of what it has done to make local assessments an
integral part of its system, Bud Myers discussed some of the issues that are still of
concern. Questions have arisen about how to keep the local assessments secure,
and also about ways to make sure all the stakeholders find them credible. Perhaps
foremost, however, is the question of resources. A significant degree of profes-
sional development, in both content and assessment issues, has been required to
achieve current levels of competence. Myers raised concerns about both the
funding and time that will be required to keep the program moving forward. He
also cited the requirements of the No Child Left Behind Act, noting that they are
not readily compatible with a system that relies as heavily as Vermont does on
local assessments. Adding further assessments to meet those requirements would
substantially increase the assessment costs the state will have to bear.
WYOMING: BODY OF EVIDENCE SYSTEM
Wyoming's newly approved system grew out of the desire to make sure that
graduating students had mastered the content specified in the state standards.
Scott Marion, former director of assessment for the Wyoming Department of
Education, described how, in lieu of an end-of-school exit exam, the state decided
to develop the Body of Evidence System (BOE). Under the system, students will,
over time, establish that they have mastered the material required for gradua-
tion—performance standards in nine content areas. They will be able to meet
these requirements as early as eighth grade, and typically will complete most by
the end of tenth grade. Multiple sources of evidence will be acceptable.
An important goal for the BOE system was to improve teaching, learning,
and classroom assessment; at the same time, Wyoming hoped to avoid some of
the negative consequences other states had encountered using single high-stakes
exams to make sure students had mastered graduation requirements. The state
has asked local districts to design the measures by which students would demon-
strate their mastery, based on a set of five assessment design principles arrived at
through a deliberative process. Each district's program will be evaluated in terms of:
• alignment with the state's content and performance standards;
• consistent and reliable application;
• fairness, in that it is not biased against any subgroups and uses
  accommodations and alternate assessments appropriately, and in that it
  provides students with multiple opportunities, using different formats, to
  demonstrate their knowledge and skills;
• standard-setting, as revealed in the strength of its rationale for its method
  of choosing cut scores, and how closely they are linked to performance
  standards; and
• comparability, through evidence that requirements are applied in
  comparable ways across classrooms, programs, schools, and the district.
  (Wyoming decided not to evaluate comparability from district to district,
  since each would be meeting minimum requirements.)
Recognizing that in most cases local educators lack the expertise to design
the innovative measures Wyoming wanted to see in use, the state has begun
providing considerable professional development and technical support for this
endeavor. Moreover, it decided to use peer review to evaluate local systems, in
part because of the many opportunities this would provide for professional devel-
opment; reviewers are drawn from every one of Wyoming's districts and some
serve as team leaders throughout the state. The reviewers work with national
experts, Marion explained, and the review process has already helped those in-
volved grapple with the real meaning of alignment, coherence, and other assess-
ment design principles. In addition, to address the sometimes poor quality of
locally developed assessments, the state formed the Body of Evidence
Consortium, a partnership of almost all of the districts, Wyoming's Department of
Education, and national assessment experts, which disseminates assessment
knowledge and skills through workshops and other activities.
1 A cut score is a score point below which performance is deemed unacceptable for a particular
purpose.
Marion discussed what he perceives as the most difficult challenges the state
has faced in implementing the BOE system. As noted, the state was initially
disappointed with the quality of many local assessments, and efforts to address
that problem have led in many cases to a deeper conversation about theories of
learning and modes of teaching. While this is an ongoing challenge, Marion was
pleased to find veteran teachers seeking guidance on how to modify their teach-
ing in light of what they had learned through the BOE process. On a more
practical note, the state has found that aggregating the various kinds of evidence
to make fair decisions about students across districts has been a challenge, as has
setting standards.
Reflecting on how Wyoming's system looks in light of the criteria presented
by the committee, Marion concluded that the BOE system has focused on finding
a variety of workable summative assessments. Consequently, it places relatively
little emphasis on classroom assessment: the state hopes that the BOE system
will foster classroom discourse and the kinds of ongoing feedback that teachers
and students need, but it has not made that a requirement. He suggested that
while a system can try to address all of the criteria the committee identified, and
perhaps come close on many of them, there is a fundamental choice that needs to
be made in the end between the unique characteristics and demands of large-scale
assessment and those of classroom assessment. Marion expressed concern that
there is a contradiction between the goal of assessing the few, carefully chosen,
big ideas and the goal of assessing in a way that provides frequent and unobtru-
sive feedback. As he affirmed, "You can't assess big ideas very frequently unless
you are assessing parts of the big ideas, and then are they still big ideas?"
MAINE: COMPREHENSIVE ASSESSMENT SYSTEM
Like many states, Maine developed a new assessment system after new
standards were put into place. Jill Rosenblum and Pam Rolfe, assessment coor-
dinators at the Maine Mathematics and Science Alliance and the Maine Depart-
ment of Education, respectively, described the state's efforts. Maine had three
principal goals for its assessment program, as outlined in 1997 legislation, and
it placed first the goal of producing "high quality information about student
performance that will inform teaching and learning." The other two goals are
monitoring schools and administrative units and holding them accountable for
their success at making sure students meet the state standards, and certifying that students
have met the content standards.
Maine was determined to meet those goals with a system that delegated a
considerable amount of the assessment work to schools and districts. The state
administers a large-scale assessment in six subjects at grades four, eight, and
eleven, and participates in the National Assessment of Educational Progress.
While the state expects that it will need to further modify its system to meet the
requirements of the No Child Left Behind Act, it currently relies on local educa-
tors to devise their own strategies for all the remaining assessments required to
meet Maine's three goals. Table 5-1, provided by Rolfe and Rosenblum, summa-
rizes the basic structure of the system.
To unify its system, Maine developed a very specific "alignment protocol,"
which spells out in detail the relationship between the assessments at all levels
and the state standards. All assessments are to be linked to learning targets
described in the standards documents, and they are conducted at the classroom,
school, district, and state levels, as well as at all grades. It is left to the discretion
of local educators to determine when they think their students have mastered a
particular body of material and are ready to be assessed on it. Students are
assessed using a wide variety of methods, and are given multiple opportunities to
demonstrate their knowledge, understanding, and developing skills. The assess-
ments are in many cases common instruments but are tailored to fit local curricu-
lum and instruction, and provide immediate feedback to teachers and students.
The state is now completing the pilot testing of its assessment plan, which
uses a combination of anchor tasks, common tasks, and assessments developed
and selected at the local level. Thus, in Rosenblum's view, Maine avoided the
need to make the basic choice between large-scale and classroom objectives that
Scott Marion identified in Wyoming. Maine, she argued, has taken a middle
path: the school and district assessments have shared features but are firmly
grounded in the curriculum.

TABLE 5-1 Characteristics of Maine's Assessment System

                      Primary Purpose              Selected or Developed by    Scored by
Classroom             Informing teaching           Individual teacher          Individual teacher
assessment            and learning
School or district    Informing and                Groups of teachers          Groups of teachers
assessment            monitoring                   and administrators          (and others)
State assessment      Monitoring and               Groups of                   Scorers outside
                      evaluating programs to       administrators,             the district
                      ensure accountability        and/or policy makers
Assessment system     Informing teaching,          District assessment         Both internal and
                      monitoring and               leadership                  external
                      evaluating, certification

SOURCE: Maine Department of Education (2003).
Professional development has been a key to making the system work, accord-
ing to Rosenblum and Rolfe. For teachers to succeed with this new kind of
responsibility, Rosenblum explained, they need to make assessment concepts
such as validity and reliability a part of their day-to-day thinking. They need to
internalize the links between the content in the standards, the local curriculum,
their own instructional models, and the purposes and nature of the assessments
they are carrying out.
Maine bolstered teachers' capacity to do this through a series of regional
seminars that tackled assessment issues and presented the details of the way the
system was to operate. At summer institutes for assessment development, educa-
tors had many opportunities to build their base of knowledge, share ideas, and
participate in scoring sessions that helped them focus on performance expecta-
tions. Maine considers the work it has done in professional development to be
one of the key successes of the program, and cites not only improved assessment
literacy, but also improved instruction and a broad-based sense of shared
responsibility for the program's success.
WASHINGTON: ADAPTING A TRADITIONAL ASSESSMENT
Greg Hall, assistant superintendent of assessment and research in the Office
of the Superintendent of Public Instruction, Washington State Department of
Education, explained that the principal purpose of Washington's assessment sys-
tem is to provide the state, districts, schools, parents, and other stakeholders with
evidence of how well students are meeting state standards. The state made the
decision to use an assessment program (it is using a criterion-referenced test
developed jointly with a commercial testing company) to lead an effort to reform
and improve its system. Articulated as an effort to make Washington competitive
internationally, the reform goal was not initially popular in a state that had previ-
ously been characterized by strong local control of education. Many initially saw
the assessment program that was to drive the reform as secretive and out of touch
with classroom needs.
The state identified professional development as the potential bridge that
could link teachers and classrooms to the benefits of the new assessment
system, and has found a number of ways to involve teachers in the process.
First, they are participants in all stages of test development. The test contractor
was asked to conduct all item-writing workshops in the state and to involve only
Washington teachers. Teachers also pilot the assessments and are involved in
review of the pilot data; they have also conducted the scoring, which has pro-
vided ongoing opportunities for them to focus on performance benchmarks.
Through regional learning and assessment centers, national assessment experts
provide training in assessment issues and methods of interpreting data. Teacher
assessment leadership teams help disseminate the knowledge they gain at the
centers, and provide support to other teachers in their home districts and schools.
Washington also strives to help its teachers make use of the data they can
obtain from the large-scale assessment. Reports that are provided to every school
and district include data linked to each learning target and strand in the state
standards, as well as item analyses by school, district, and state. A companion
document contains the language of the learning target, so that educators can track
patterns in performance on different elements of the standards. The supporting
document also provides guidance on how to analyze the data and how to use the
released items that are included.
Hall told the workshop that Washington expects that now that teachers are
developing competence with large-scale assessment issues, and becoming more
comfortable with the data that such assessments can provide, the state will be able to further
develop teachers' assessment literacy and, in turn, improve their classroom assess-
ment skills.
BERKELEY EVALUATION AND ASSESSMENT RESEARCH SYSTEM
The Berkeley Evaluation and Assessment Research (BEAR) Center has de-
veloped a science assessment system, BEAR, that is based on close links between
assessment and curriculum. Indeed, explained Mark Wilson of the University of
California at Berkeley, one of the system's contributing researchers, the idea
guiding BEAR is that a large-scale assessment that is not coherent with classroom
assessment cannot effectively improve instruction because any gains students
make on it will be superficial. At the same time, he added, if classroom assess-
ments are not linked to large-scale assessments, teachers will be faced with the
need to teach two curricula, another recipe for failure.
Developed in tandem with a middle school science curriculum, the Issues,
Evidence, and You (IEY) program, BEAR is based on a developmental perspec-
tive on students' science learning. It is structured around what Wilson calls
"progress variables," definitions of the steps students take as they develop higher
levels of competence and deeper understanding of the material they are studying.
The teacher uses the progress variables to guide instruction and to provide direct
feedback to students. The assessment component consists of opportunities to
observe student performance, through tasks that are embedded in the instruc-
tional program and linked to particular progress variables, and through "link
tests," which assess similar skills in different contexts. Thus link tests provide a
kind of check on the information gained through the embedded assessments;
teachers evaluate both using common, generic scoring guides and examples of
student work.
These different sorts of items are then scaled so that student progress on the
multiple progress variables that define the curriculum can be monitored. These
results are used to establish that the assessments achieve high standards of
reliability and validity (for example, that the classroom-based IEY assessments
have reliabilities similar to those achieved on standardized tests). The results can
be displayed in a variety of ways that can help teachers with planning and
instructional activities, for example, by showing an individual's progress over a year,
the state of a class at a particular time, or detailed results on each item for a
particular student.
Scoring sessions, in which teachers collaborate to calibrate their expecta-
tions, have been a crucial part of the program. The teachers not only learn from
one another about performance standards and ways of working with students,
they also use the opportunity to have deeper conversations about the educational
implications of the assessments and other issues related to teaching. At the same
time, Wilson explained, these sessions have been the principal way teachers have
made the system their own and internalized its goals and overall approach. Teach-
ers have also conducted similar moderating sessions in their classrooms to help
students understand the performance expectations and enter into the goals of the
program.
In describing the genesis of the BEAR program (it was developed primarily
by graduate students in measurement working with curriculum developers),
Wilson noted the ways in which that process encapsulated the gaps the present
workshop attempted to address. He observed that the curriculum developers
functioned in a sense as artists do, working to assemble a set of experiences that
would provoke thinking and have effects on the participants. They had little
instinct for the prime concern of the measurement specialists, who focused on
finding valid and reliable evidence of particular outcomes. Yet these two groups
were able to find common ground using concrete notions of what students would
be doing in the form of the progress variables. Using that common framework,
they were able to combine their disparate goals into a coherent system.
NORTHERN CALIFORNIA MATHEMATICS
ASSESSMENT COLLABORATIVE
The Mathematics Assessment Collaborative (MAC), an initiative of the
California-based Noyce Foundation, is made up of thirty school districts in the
San Francisco Bay area that share the goal of using high-quality mathematics
performance assessments to improve both instruction and student learning.2
Participating districts assess 65,000 students every year in grades three through
ten. Linda Fisher, who directs MAC, and David Foster, mathematics program
director of the Noyce Foundation, described the way the collaborative's
assessment program works and provided a detailed look at the kinds of feedback
teachers get about their students from the assessments.
2 The MAC is one of several related projects designed to support mathematics instruction that have
been sponsored by the Noyce Foundation. It is considered a component of the Silicon Valley
Mathematics Initiative, which addresses all aspects of mathematics instruction and learning.
The assessments used by the collaborative are produced by a commercial test
publishing company (CTB/McGraw-Hill), together with the Mathematics Assess-
ment Resource Service (MARS), which is a joint endeavor of a number of univer-
sities to write performance exams, scoring guides, and score reports that are
aligned with the national standards produced by the National Council of Teachers
of Mathematics. The collaborative has been administering a performance-based
assessment system since 1998; it provides both formative and summative data.
Foster began by setting the collaborative's use of MARS in the context of
California's assessment program. He noted that the performance of California
students on the SAT 9, a commercially available, norm-referenced test, had
increased steadily from 1998 to 2002, but that there were significant discrepancies
between student performance on that test and on the MARS. A comparison of the
results showed that although both assessments were based on the same standards,
students who performed well on the SAT 9 did not necessarily perform well on
MARS, the performance-based assessments. The findings for seventh graders,
for example, showed that half of the students who performed well on the norm-
referenced test did not meet national standards for seventh graders according to
the MARS results. These results, Foster explained, demonstrate the critical
importance of using multiple measures to assess student performance; without
them, educators and administrators can be seriously misled about their students'
learning.
The MARS assessment program was designed not only to provide multiple
measures of achievement, but also to provide tools teachers can use to target their
instruction. The focus on teachers meant both that significant opportunities for
professional development were incorporated into the program, and also that the
assessment results were produced in a way that was meaningful for teachers in
the classroom as well as for more summative purposes. Fisher presented a
number of assessment tasks, and some of the data produced from them, to illus-
trate the "Tools for Teachers" that the MARS program includes.
Box 5-1 is a sample of the results teachers get for each task; it shows results
for point four on a ten-point scale. The goal in providing this kind of detail is to
encourage teachers to be "reflective about their practice," Fisher explained. What
the organizers of the collaborative have found is that as teachers work with such
feedback, and consider ways to use it with their students, they become curious
about research that might help them understand the misconceptions their students
showed and suggest techniques to help them in addressing these problems.
Sessions with teachers to go over the assessment data also yielded broader
insights about the kinds of professional development that might best help teach-
ers improve instruction. Fisher explained, for example, that in sessions focused
on the textbooks students were using, teachers quickly identified links between
the way many of them oriented the information they presented and some of the
student misconceptions they had discovered through the assessments. They
brainstormed ways to use the textbooks differently so they could anticipate and
forestall the misconceptions.
Teachers involved in the collaborative have a variety of other sources of
support and development. Summer workshops as well as training sessions during
the school year, supporting materials (the "Tools for Teachers," which include
targeted questions for them to use in evaluating their test results and lesson
plans), opportunities to participate in scoring the assessments, opportunities for
one-on-one coaching and classroom observations, and schoolwide debriefing
sessions, are all part of the program. Both Fisher and Foster stressed that the
various ways in which teachers are involved and encouraged to learn and change
are key elements of the program.
FACET-BASED ASSESSMENT
Jim Minstrell, a former high school physics teacher in Washington state,
described a system he has created for teaching physics according to a model of
students' developing understanding. The facet-based system is based on the
cognitive principle that students come to physics with ideas and preconceptions
that teachers need to identify and build on. To describe the basic units of thought,
Minstrell chose the word "facets" (meaning pieces of knowledge, reasoning, or
beliefs that students have) because he wanted to include both correct ideas and
the incorrect, naive, or incomplete ideas that students typically have along the
way to complete understanding. He chose not to use the word "misconceptions"
for the incorrect or incomplete ideas because these ideas often reflect important
steps along the way to full understanding that teachers can use to advantage.
Facet clusters, then, are sets of facets related to a particular topic that include
both the learning target and a complete and accurate understanding of a complex
principle or other topic, as well as students' evolving notions, arranged in the
approximate order that developing understanding usually follows. The facets and
clusters have been identified through research, teacher observations, and analysis
of student work. Using this means of organizing the content, Minstrell and his
colleagues developed a set of tools with which teachers can structure instruction
and assessment.
The system provides teachers with tasks, activities, preassessments, and scoring
procedures that help them discover which facets their students are using, and
then guide students toward complete understanding. All of the activities and
assessment tools are linked to some part of the facet cluster for a particular topic
and are also coded so that they can be easily analyzed. The codes work with
multiple-choice as well as short-answer questions: distracters (incorrect choices)
and other student-generated responses are linked to the naive or incomplete facets
identified for the topic. Thus, when a teacher sees that a group of students
misunderstand, for example, the effect of ambient air on weight, he or she is
prepared: the facet-based system will likely supply a "prescriptive activity" the
teacher can use to address this shared misunderstanding in the classroom.
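
The coding idea described above can be made concrete with a minimal sketch. It assumes a single multiple-choice item whose answer choices are each coded to one facet. The facet labels, the activity descriptions, the function names, and the 25 percent threshold are all hypothetical and chosen only for illustration; none of them is taken from Minstrell's actual facet clusters or tools.

```python
from collections import Counter

# Hypothetical coding of one multiple-choice item: each answer choice maps
# to a facet. "target" stands for the learning target; the other labels are
# invented examples of naive or incomplete facets.
ITEM_FACETS = {
    "A": "target",
    "B": "air-pushes-down",
    "C": "air-holds-up",
    "D": "no-air-no-weight",
}

# Hypothetical lookup from a naive facet to a prescriptive activity.
PRESCRIPTIVE_ACTIVITIES = {
    "air-pushes-down": "Weigh an object inside and outside a vacuum jar.",
    "air-holds-up": "Compare scale readings in air and under water.",
}


def tally_facets(responses):
    """Count how many students gave a response coded to each facet."""
    return Counter(ITEM_FACETS[choice] for choice in responses)


def shared_naive_facets(responses, threshold=0.25):
    """Return non-target facets used by at least `threshold` of the class."""
    total = len(responses)
    if total == 0:
        return []
    counts = tally_facets(responses)
    return [
        facet
        for facet, n in counts.most_common()
        if facet != "target" and n / total >= threshold
    ]


def suggest_activities(responses, threshold=0.25):
    """Pair each widely shared naive facet with an activity, if one is on file."""
    return {
        facet: PRESCRIPTIVE_ACTIVITIES.get(facet, "no activity on file")
        for facet in shared_naive_facets(responses, threshold)
    }


if __name__ == "__main__":
    class_responses = list("AABBBBCD")  # eight students' answer choices
    print(tally_facets(class_responses))
    print(suggest_activities(class_responses))
```

In this sketch a teacher would see at a glance which incomplete idea is most common in the class and which activity is on file for it. A real system would presumably also need to code short-answer responses, which this example does not attempt.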
To make the system accessible to more than just a handful of teachers,
Minstrell and his colleagues developed a website for Washington teachers and
their students. Teachers can find elements such as preinstruction activities for
eliciting naive understandings, "checkout" questions to monitor students' devel-
opment, tools for interpreting and using assessment results, and other resources
and support. Students can also log on to do activities and get feedback about their
progress.
Teachers who have used the system have shown measurable improvements
in results for individual units, but Minstrell has found it difficult to involve
teachers as extensively as he had hoped. Web access in schools has presented a
practical obstacle: many schools have outdated systems that are slow or cannot
navigate the site, and in many schools students have only limited web access.
A perhaps larger problem has been that many teachers who were intrigued by
facet-based assessment were not sure they could manage to incorporate it and still
cover all the material their students would need to meet state requirements. While
the facet clusters are linked to Washington performance benchmarks for physics,
Minstrell recognizes that teachers will need more support if they are to make full
use of the program. He and his colleagues are currently conducting research to
better understand what kinds of professional development and teacher and dis-
trict support will be needed to make the program more readily accessible.
MODEL-BASED ASSESSMENT
In her presentation on the Los Angeles Unified School District's application
of a National Center for Research on Evaluation, Standards, and Student Testing
(CRESST) program, Eva Baker discussed some ideas she believes are critical to
the goal of using assessments to support learning. For Baker, professor in the
School of Education, University of California, Los Angeles, the goal of assess-
ment is to produce both usable and useful knowledge, and she explained what she
meant by the distinction. Usable knowledge is in a form that can be understood
and applied, it is timed appropriately, and it may cause rethinking of the problem.
Useful knowledge yields a new solution, based on rethinking of the problem. It
is adapted to the situation, it is sufficient to provide a solution, and it can yield an
improved outcome.
Some schools are much more successful than others at using assessment
knowledge for several reasons. They focus on the learning of both students and
adults. They make constant use of appropriate information, drawn from both
formal and informal assessments, and they focus on feedback and change. Learn-
ing and change are publicized and the entire learning community takes pride in its
achievements. The CRESST program, called Model-Based Assessment (MBA),
is rooted in this understanding of the ways in which assessments can benefit a
learning community.
MBA takes research-based understanding of thinking skills and applies it to
different content areas. MBA's key elements of learning are
· content understanding,
· problem solving,
· metacognition (consciousness about one's thought processes),
· communication, and
· teamwork and collaboration.
With MBA, these basic principles were intended to guide both the design of
assessments and instruction. Models were developed that could be used as tem-
plates and transferred to many subject areas, and were designed so that new
teachers can easily be trained to use and score them; they are also reusable and
thus relatively inexpensive and easy to adapt. The models, or templates, include
tasks, formats, prompts, scoring guides, directions, and samples.
The scoring and performance expectations are based on a research-based
model of the way experts in particular domains think and work in their area of
expertise. Experts make use of principles or themes in organizing their existing
knowledge as well as new information. They draw on prior knowledge, identify
explicit relationships among ideas or pieces of information, and avoid miscon-
ceptions.3 Baker illustrated the application of this understanding of expertise
with several sample templates, showing how the prompts were derived from an
understanding of expertise in particular domains, such as using primary docu-
ments to organize an essay.
3 The expert model is discussed more fully in Knowing What Students Know (NRC, 2001c).
Despite the challenges it presented, the opportunity to try out MBA in Los
Angeles was welcome, as the assessment's creators were very eager to find out
how well the program could operate on a large scale. Initially the plan was to use
MBA in four subjects at three grade levels and in two languages. The program is
currently being administered in grades two through nine. CRESST staff have
trained a large cadre of teachers to score the assessments and to train other
teachers. Despite pressures to provide more concrete accountability and to
address mandated curriculum packages, Baker has hopes that the program will
continue.
CRESST has been conducting validation studies and pursuing a number of
research efforts to help it refine the program. Baker cited several elements key to
its success in running MBA on such a large scale. Because of the vital importance
of cost and time factors, CRESST worked from the start of the program to
maintain a low cost per student, and thus benefited from the crucial support of
both the school board and teachers' union. Finally, because MBA was designed
to be easily transferable, responsibility for the program could be shifted relatively
easily to the school district staff, which had many important benefits. Los Angeles
educators were much better able to implement the knowledge gained from the
assessments because they felt responsible for the program. Moreover, teachers
learned and benefited from their participation, and the MBA was more easily
meshed with other educational mandates by those within the system than it could
have been by CRESST staff.