Copyright © 2009. National Academy of Sciences. All rights reserved.
The Criteria in Context
The steering committee began the process of planning the workshop by
considering the characteristics of an assessment system in which classroom and
large-scale assessments work together to support learning. It agreed with earlier
committees that, to be effective, assessment systems must do more than provide
valid data. They must also be designed so that the information produced can be
used to improve both the educational system and the teaching and learning
process. In such a system a single assessment does not function in isolation but
rather within a coordinated system in which the state, the district, the school, and
the classroom each play a role.
The specific criteria the committee identified are listed and briefly described
here. They are elaborated further in the discussion of workshop presentations
later in this report. The steering committee made no attempt to evaluate the
relative importance of each of the criteria, nor did it use the criteria to evaluate
programs. Rather, the intent was to use the experiences of workshop presenters
as a vehicle for thinking about the ways in which each of the criteria can
contribute to the establishment of a coherent system.
The following are the ideal characteristics of assessment systems that the
committee identified:1
· Comprehensive: A comprehensive system is one in which a range of
measurement approaches are used to provide a variety of evidence to
support educational decision-making. A well-designed system includes
both formative (to support students' ongoing learning and help teachers
make instructional decisions) and summative (to evaluate students' level
of achievement at the completion of a phase of learning) assessments that
move students toward a manageable and clearly articulated set of
outcomes. Measures might also include, for example, those that assess the
quality of instruction and those that provide evidence that improvements in
tested achievement represent actual gains in learning as opposed to
improved test-taking skills.
· Coherent: A coherent system is one in which the conceptual base or
models of learning underlying the assessments used at all levels
(large-scale or classroom) are compatible. Furthermore, the content, processes,
and skills measured by different assessments across the system are
compatible. For a system to be coherent, alignment is needed among
standards, curriculum, instruction, and professional development so that each
element contributes to a common set of learning goals.
· Continuous: In a coordinated system, assessments measure student
progress over time, for example, over a school year, over several grades,
or over a student's school career. Assessments are ongoing and seamlessly
integrated into instruction.
· Integrated: An assessment system is integrated if it is carefully designed
to fit into a larger, coherent educational system that provides resources
and professional development to ensure that teachers have the capacity to
do what is expected of them based on the standards in place.
· Includes High-Quality Assessments: All of the assessments included in
the system should be of high quality, by which is meant, first, that they
must adhere to relevant professional standards. To further illustrate what
high quality means, the committee has identified a set of specific
characteristics that large-scale and classroom assessments can exhibit, which
are summarized in Boxes 2-1 and 2-2.
1 The first three criteria are adapted from Knowing What Students Know (NRC, 2001c); the last two
were distilled from other reports listed in the Introduction.
These criteria address the educational assessment environment as a whole,
and certainly it is not possible to talk about the relative effectiveness of large-
scale or classroom systems without considering the contexts in which they are
designed to operate. Nevertheless, there are many choices of approach for
assessing students, and the workshop began with an overview of current thinking
about both large-scale and classroom assessments. The discussion was grounded
in professional thinking on the purposes that each kind of assessment serves best,
and offered an overview of their potential, as well as their limitations.
THE IDEAL
While no current assessment programs have been identified that satisfy all of
the attributes described above, some can be seen as making significant progress
in implementing specific features of a high-quality program. To explore what it
might be like to teach and learn in a coherent and balanced assessment
environment, where assessments, curriculum, instruction, and professional
development are fully aligned with standards, the committee invited Gail Burrill,
a teacher and teacher educator at Michigan State University and former president
of the National Council of Teachers of Mathematics, to inaugurate the workshop by
simulating such a situation for the workshop audience.
Describing an array of embedded, formative assessment techniques, Burrill
illustrated for the workshop participants how assessment can help to shape
learning and direct instruction. Examples from Japan, the Netherlands, and China
helped to illustrate the ways in which assessments can circumscribe both what is
taught and how it is learned. Burrill used these international examples to make
the point that educators in the United States are often leery of expecting students
to transfer their knowledge to new contexts. In the examples she discussed,
assessments were more challenging in that they called on students to use
cognitive processes on unfamiliar material, but she argued that U.S. students could
handle this kind of challenge.
To be sure that there is correspondence between what is taught and what is
valued, Burrill suggests, input from many sources is necessary. Subject-area
experts, curriculum developers, researchers, teachers, cognitive scientists, and
assessment developers need to work together to develop the standards and the
assessments that will be used to measure student mastery of the specified
competencies. Key for Burrill is that teachers be able to make choices as they
implement a curriculum, and that assessments serve as an appropriate guide to what
is taught. Coherent assessments will foster coherent curriculum and effective
instruction; lack of coherence leads to unfocused learning and shallow
understanding.
LARGE-SCALE ASSESSMENTS
While large-scale assessments can be controversial, and are easily misused,
they are an important way of obtaining certain kinds of extremely valuable
information about students. Large-scale assessments, those that are designed to
provide evidence about large numbers of students, are the primary means by which
accountability evidence is obtained in the United States. Indeed, there is little
dispute that accountability (the provisions made for those who use, fund, and
oversee public education to review and evaluate its effectiveness) is a crucial
element in the continued success of public education.
As Lorrie Shepard of the School of Education, University of Colorado,
Boulder, outlined at the workshop, there are three particular uses for which
large-scale tests are essential. The first is program diagnosis. Assessments that
make it possible to compare the performance of a large number of students can be
used to identify patterns of strengths and weaknesses that are in turn critical for
identifying any needed improvements in curriculum or instruction. Second,
assessments developed for large-scale use, to provide evidence about district- or
statewide performance, can also exemplify, as Shepard termed it, the educational
goals described in standards and curriculum documents. In other words, assessment
tasks and examples of student work make concrete just what students will actually
know or be able to do if they meet defined standards. Third, large-scale
assessments are useful for one-time certification or screening; for example, to
identify students who are not ready for grade-level work in reading and who need
follow-up targeted assessment to determine their specific needs for remediation.
Shepard also noted that large-scale assessments often provide teachers an
opportunity for effective professional development. Test development, scoring,
curriculum development, and standards-based professional development are
all occasions when efforts to improve classroom assessment strategies can be
woven into the program. Shepard argues that more could be gained through these
opportunities if teachers had improved access to materials that model teaching for
understanding, such as extended instructional activities, formative assessment
tasks, and scoring rubrics with summative assessments built into them.
While the value of large-scale assessments for these purposes is clear, it is
equally clear that they are not useful for many other important educational
purposes, particularly that of providing detailed understanding of individual
students' performance. Professional standards are firm on the point that it is not
a test itself that can be established as valid, but particular inferences that may
be made from the test data (see National Science Education Standards [NSES],
Standard 13.2; NRC, 1996). Evidence that supports one use of a test therefore does
not establish its validity for a different use.
Nevertheless, administrators who are pressed for both time and resources are
often tempted to find tests that can serve more than one purpose. While this can
be done, it necessarily entails compromises. Noting, "Ironically, the questions
that are of most use to the state officer are of the least use to the teacher" (NRC,
2001c, p. 224), the Committee on the Foundations of Assessment framed the
problem as a trade-off in assessment design between supporting accountability
for schools and systems and supporting the need for specific guidance about
individual students.
As Shepard stated, "The best way to help policy makers understand the
limitations of an external, once-per-year test for instruction is to recognize that
good teachers should already know so much about their students that they could
fill out the test booklet for them." Shepard listed some of the contrasts, shown in
Box 2-3, between large-scale and classroom assessments that make clear why
different instruments are usually needed for different purposes.
Many large-scale assessments are what psychometricians call "norm-referenced,"
which means that one of their functions is to provide evidence of
how students compare to one another. The resulting scores can be used to spread
students' performance out along a scale. The SAT is a good example of such a
test: it is designed not to assess particular knowledge or content, but to provide
college and university admissions officials with a means of ranking students
based on their potential to succeed at college-level work. The questions are
carefully selected, based on pretesting results, to present a range of difficulty, so
that very few students are likely to succeed at either all or none of them, and so
that the students will be spread out along the scale. Performance on such tests is
often expressed in terms of percentiles, with a particular score reflecting
performance that is better than that of a certain percentage of other test takers.
Other assessments are called "criterion-referenced" because their scoring
"refers" not to the past performance of other students but to a fixed body of
knowledge. Good examples of this kind of testing include professional licensure
tests, which often identify minimum acceptable levels of mastery. With such
tests, it does not matter how well other students have done; it matters only that a
prospective airline pilot or surgeon has mastered a particular body of knowledge
deemed essential. Assessments used with K-12 students can be of either type,
and in some cases may blend the two. For example, states that use tests
developed by national companies, which are often norm-referenced and offer the state
the opportunity to determine how its students compare to those of the same age
across the country, may also wish to assess their students' knowledge of
particular aspects of their standards. A state may add sections to the norm-referenced
portion or make other modifications to adapt the test to the multiple purposes it
has identified, though, as noted above, such an approach entails compromise.
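To make the difference between the two reporting frames concrete, the following is a minimal Python sketch. The cohort scores, the cut score of 70, and the function names are hypothetical and are not taken from any actual test. Percentile rank is computed here as the percentage of a comparison group scoring strictly below a given score, which is only one of several conventions in use.

```python
from typing import Sequence

# Hypothetical score data for illustration only; not from any real test.
TYPICAL_COHORT = [52, 61, 67, 70, 74, 78, 83, 88, 91, 95]
STRONG_COHORT = [75, 80, 82, 85, 88, 90, 92, 94, 96, 98]
CUT_SCORE = 70  # hypothetical minimum acceptable level of mastery


def percentile_rank(score: float, cohort: Sequence[float]) -> float:
    """Norm-referenced view: percentage of the cohort scoring strictly below `score`."""
    if not cohort:
        raise ValueError("cohort must not be empty")
    below = sum(1 for s in cohort if s < score)
    return 100.0 * below / len(cohort)


def meets_criterion(score: float, cut_score: float = CUT_SCORE) -> bool:
    """Criterion-referenced view: compare the score to a fixed standard only;
    how other test takers performed plays no role."""
    return score >= cut_score


if __name__ == "__main__":
    score = 78
    for name, cohort in [("typical", TYPICAL_COHORT), ("strong", STRONG_COHORT)]:
        print(f"{name}: percentile rank {percentile_rank(score, cohort):.0f}, "
              f"meets criterion: {meets_criterion(score)}")
```

In this sketch the same score of 78 has a percentile rank of 50 against the first hypothetical cohort but only 10 against the stronger one, while the criterion-referenced decision (meeting the cut score of 70) is the same in both cases, because it does not depend on how other test takers performed.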
CLASSROOM ASSESSMENTS
Discussion of classroom assessments has been somewhat less tidy, in part
because the definition of such assessments is less precise; the breadth of what
the term covers was evident at the workshop. Teachers assess their students'
learning every day, for example, by noting the misconceptions or insights that
underlie a question or by observing the way a student makes use of materials
provided for a task. They also assess students more formally, with particular
questions in mind. Presenter Dylan Wiliam, professor of education at King's
College in London, defines classroom assessment, or, in his phrase, assessment
for learning, by the teacher's aim in assessing. That is, if the aim of the
assessment is to improve the student's learning in some direct way, rather
than to rank, evaluate, or certify some aspect of performance, then it is properly
in the realm of classroom assessment.
For Wiliam, it is the feedback provided to the student that is critical to the
success of this enterprise, and he describes it as a three-part process. First, the
teacher must find out where the student is in relation to the goals for the class;
next, he or she must clearly convey to the student what those goals are. Perhaps
most important, the teacher must then help the student in concrete ways to move
toward those goals. Assessments that are intended primarily to provide feedback
to students and to shape their learning are often called formative assessments, and
are distinguished from summative ones, which are intended primarily to evaluate
students. This way of categorizing assessments overlaps in part with the
distinction between classroom and large-scale assessments that is the subject of
this report, but it is important to remember that a large-scale assessment could
serve formative purposes, just as a classroom assessment can serve summative
purposes.
Presenter Jan de Lange, professor and director of the Freudenthal Institute at
the University of Utrecht, The Netherlands, addressed the issue of classroom
assessments used in teaching mathematics, using a description of a project
carried out in Philadelphia and Milwaukee by the Freudenthal Institute to highlight
several points. The project's goal was to influence the quality of learning and
instruction by changing classroom assessment methods, and it used an Assessment
Pyramid to depict the different levels of mathematical competencies that
students display. In the pyramid, level 1 covers reproduction and facts, level 2 is
making connections and simple problem solving, and level 3 is complex problem
solving and mathematical reasoning.
Teachers involved in the project were given a variety of supports, including
both assessment materials and training, through which they could help their
students think more deeply about mathematics. At the same time, teachers'
thinking about what constitutes effective classroom assessment, scoring, and
other issues was expanded. The pyramid was the basis for defining expectations
for student performance, for structuring instruction, and for giving students
useful feedback in relation to learning goals and competency levels.
The pyramid was derived from the framework used in the Organisation for
Economic Co-operation and Development's Programme for International Student
Assessment (PISA; its assessment program is described in Chapter 4). De Lange
argued that the alignment between the pyramid used in the classroom and the
large-scale PISA demonstrated for teachers that a comprehensive, coherent, and
continuous assessment is possible. At the same time, by working with the pyramid
the teachers became skilled at recognizing and analyzing quality assessment.
Through the two-year study, de Lange explained, teachers changed their approaches
to both classroom assessment and the teaching of mathematics in significant ways.
For committee chair J. Myron Atkin, professor at the Center for Educational
Research, Stanford University, the key is the teacher's unique capacity to monitor
students' progress over time. In his presentation, which focused on the way
classroom assessment functions in science education, Atkin asked workshop
participants to consider the many different opportunities a teacher has to assess what
students know and can do in the course of a project that takes place over several
weeks or months.
As an example, Atkin cited a project in which a group of students monitored
the state of a pond near their school and investigated the nature and possible
causes of an algal bloom that occurred in the course of their study. Not only
were they conducting original research, in the sense that no scientists had
previously studied that particular pond, but they were also able to respond to
unpredictable events. The project afforded them many opportunities to
demonstrate their capacity to bring prior knowledge and experience to bear on a problem,
their proficiency with available methods and tools, and their resourcefulness in
drawing on available sources of data and interpretation. Their teacher was able to
monitor their progress through formal output, such as field notes and reports, as
well as in countless informal interchanges that revealed the students' thinking
and their development over time.
This project exemplified for Atkin how a teacher can develop an "assessment
culture," in which the focus is on inquiry, a key element of both the content and
skill standards included in the NSES (NRC, 1996). The teacher was able to
assess students on skills and knowledge that are deemed essential by the NSES, and
yet are impossible to measure using a one-time performance assessment. The
challenge Atkin identified is to find more ways to make systematic use, for
purposes of accountability beyond the classroom, of the information about
students that teachers are in a unique position to obtain.