and scored externally from the school. Although such external tests are not subject to the risks of bias at a personal, one-on-one level, this advantage may be offset: a teacher might see that a student does not understand a question and rephrase it to overcome the obstacle, whereas an external grader or machine cannot.

Some people caution against complications associated with the multiple roles that teachers play in assessment, including that of both judge and jury. They see this subjectivity as a threat to the validity of the assessment. They point to a study that examined the effects of expectations on human judgment (Rosenthal & Jacobson, 1968). Teachers were given contrived information that a handful of students showed exceptional promise, when in actuality those students were no different from the others. When questioned several months later about those students' progress, the teachers reported that they had excelled and progressed more than their classmates. One of the basic claims made by the researchers in this study was that the teachers fulfilled the “exceptional-promise” expectation. To overcome, or at least abate, inherent bias that results in inequitable treatment, teachers and all those working with students need to examine and keep a check on the bias that enters into their own questioning, thinking, and responses.


To some, issues of validity and reliability are at the heart of assessment discussions. Although these considerations come into play most often in connection with large-scale assessment activities, technical issues are important to consider for all assessments, including those that occur each day in the classroom (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999). Though the principles stay the same, in practice they mean and look different for the formative and summative purposes of assessment.

Issues of validity center on whether an assessment measures or captures what it is intended to measure or capture. Validity has many dimensions, three of which are content validity, construct validity, and instructional validity. Content validity concerns the degree to which an assessment measures the intended content area. Construct validity refers to the degree to which an assessment measures the intended construct or ability. For example, the Standards outline the abilities and understandings necessary to do scientific inquiry. For an assessment to support valid claims about a student's ability to conduct inquiry, it would need to assess the range of abilities and understandings that make up the construct of inquiry.
