Classroom assessments can take many forms, including observations of student performance during instructional activities; interviews; formal performance tasks; portfolios; investigative projects; written reports; and multiple-choice, short-answer, and essay examinations. The relationship of some of these forms of assessment to the goals of science education is less obvious than that of others. For instance, a student's ability to obtain and evaluate scientific information might be measured using a short-answer test that asks the student to identify the sources of high-quality scientific information about toxic waste. An alternative and more authentic method is to ask the student to locate such information and develop an annotated bibliography and a judgment about the scientific quality of the information.

AN INDIVIDUAL STUDENT'S PERFORMANCE IS SIMILAR ON TWO OR MORE TASKS THAT CLAIM TO MEASURE THE SAME ASPECT OF STUDENT ACHIEVEMENT. This is one aspect of reliability. Suppose that the purpose of an assessment is to measure a student's ability to pose appropriate questions. A student might be asked to pose questions in a situation set in the physical sciences. The student's performance and the task are consistent if the performance is the same when the task is set in the context of the life sciences, assuming the student has had equal opportunities to learn physical and life sciences.
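One common way to quantify this kind of consistency is to correlate the scores the same students earn on the two tasks. The sketch below is only an illustration under stated assumptions: the 0-4 rubric scores, the variable names, and the choice of the Pearson correlation are hypothetical and are not prescribed by the standard.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary on both tasks")
    return cov / sqrt(var_x * var_y)


# Hypothetical rubric scores (0-4) for the same five students on a
# question-posing task set in each context.
physical_science_scores = [3, 2, 4, 1, 3]
life_science_scores = [3, 2, 4, 2, 3]

r = pearson_r(physical_science_scores, life_science_scores)
print(f"Cross-task consistency (Pearson r): {r:.2f}")
```

A value near 1 suggests that the two tasks rank students similarly; a low value would prompt a closer look at the tasks or at the students' opportunities to learn in each context.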

[See Teaching Standard C]

STUDENTS HAVE ADEQUATE OPPORTUNITIES TO DEMONSTRATE THEIR ACHIEVEMENTS. For decision makers to have confidence in assessment data, they need assurance that students have had the opportunity to demonstrate their full understanding and ability. Assessment tasks must be developmentally appropriate, must be set in contexts that are familiar to the students, must not require reading skills or vocabulary that are inappropriate to the students' grade level, and must be as free from bias as possible.

ASSESSMENT TASKS AND THE METHODS OF PRESENTING THEM PROVIDE DATA THAT ARE SUFFICIENTLY STABLE TO LEAD TO THE SAME DECISIONS IF USED AT DIFFERENT TIMES. This is another aspect of reliability, and it is especially important for large-scale assessments, where changes in the performance of groups are of interest. Only with stable measures can valid inferences about changes in group performance be made.
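Stability in this sense can be illustrated by checking how often two administrations of the same task lead to the same decision about a student. The following is a minimal sketch, assuming a simple pass/fail decision at a cut score; the scores, the cut score, and the function name are hypothetical.

```python
def decision_agreement(scores_first, scores_second, cut_score):
    """Fraction of students who receive the same pass/fail decision
    on two administrations of the same assessment."""
    if len(scores_first) != len(scores_second) or not scores_first:
        raise ValueError("need two non-empty, equal-length score lists")
    same = sum(
        (a >= cut_score) == (b >= cut_score)
        for a, b in zip(scores_first, scores_second)
    )
    return same / len(scores_first)


# Hypothetical scores for six students on two administrations
# given a short time apart.
first_administration = [12, 18, 9, 15, 20, 11]
second_administration = [13, 17, 10, 14, 19, 15]
CUT_SCORE = 14  # hypothetical threshold for "meets expectations"

agreement = decision_agreement(
    first_administration, second_administration, CUT_SCORE
)
print(f"Same decision on both occasions: {agreement:.0%}")
```

Raw agreement does not correct for agreement expected by chance; a chance-corrected statistic such as Cohen's kappa is often reported alongside it.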

Although the confidence indicators discussed above focus on student achievement data, an analogous set of confidence indicators can be generated for opportunity to learn. For instance, teacher quality is an indicator of opportunity to learn. Authenticity is obtained if teacher quality is measured by systematic observation of teaching performance by qualified observers. Confidence in the measure is


