The comparative results obtained on an assessment depend on the degree to which the assessment reflects the curriculum and instruction of the groups of students whose performance is being compared (Linn, 1988; Linn & Baker, 1995; Porter, 1991). In any evaluation of educational programs, “if a test does not correspond to important program goals, the evaluation will be considered unfair” (Linn, 1987, p. 6). This is true for assessments within a nation, but it becomes critically important when comparing the performance of nations, because countries differ so greatly in curriculum and instructional emphases. For an individual country, the fairness of the assessment necessarily varies with the degree of correspondence between that country’s curriculum and the assessment’s content boundaries and the relative emphasis it gives to the topics covered.
The particulars of the definition of the domain can have a significant impact on the relative position of nations on the assessment. Heavy weight given to one subdomain can advantage some nations and disadvantage others. Multiple-choice formats familiar to students in some nations may be less so to students in others. Conversely, extended-answer problems are standard fare for students in some nations, but not for students in all nations participating in the study. As Mislevy (1995, p. 423) has noted, “The validity of comparing students’ capabilities from their performance on standard tasks erodes when the tasks are less related to the experience of some of the students.” Because the relative performance of nations is sensitive to the details of the assessment specifications, considerable effort must go into negotiating those details and into reviewing and signing off on the actual items administered.
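The sensitivity of national rankings to subdomain weighting can be illustrated with a small numerical sketch. Everything in it is hypothetical: the two nations, the two subdomains, the percent-correct values, and the weights are invented for illustration and are not drawn from any actual assessment. The composite score is assumed here to be a simple weighted mean of subdomain scores, which is only one of several ways a total score might be formed.

```python
# Hypothetical illustration only: nations, subdomains, scores, and weights
# are all invented and do not come from any real assessment.

def composite(scores, weights):
    """Weighted mean of subdomain scores; weights are normalized to sum to 1."""
    total = sum(weights.values())
    return sum(scores[d] * weights[d] / total for d in weights)

def rank(nations, weights):
    """Nation names ordered from highest to lowest composite score."""
    return sorted(nations, key=lambda n: composite(nations[n], weights), reverse=True)

# Mean percent correct by subdomain (invented numbers).
nations = {
    "Nation A": {"algebra": 70.0, "geometry": 50.0},
    "Nation B": {"algebra": 55.0, "geometry": 65.0},
}

# Two alternative test specifications that differ only in subdomain weight.
algebra_heavy = {"algebra": 0.7, "geometry": 0.3}
geometry_heavy = {"algebra": 0.3, "geometry": 0.7}

for label, weights in [("algebra-heavy", algebra_heavy),
                       ("geometry-heavy", geometry_heavy)]:
    print(label, rank(nations, weights))
```

Under the algebra-heavy specification the hypothetical Nation A ranks first, and under the geometry-heavy specification Nation B does, even though neither nation's subdomain performance has changed. Only the specification differs.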
Messick (1989, p. 65) has noted that
[I]ssues of content relevance and representativeness arise in connection with both the construction and the application of tests. In the former instance, content relevance and representativeness are central to the delineation of test specifications as a blueprint to guide test development. In the latter instance, they are critical to the evaluation of a test for its appropriateness for a specific applied purpose.
Details of the approaches used to develop specifications for the assessments have varied somewhat in previous international assessments, but the approaches have had a great deal in common. Generally, the approach has been to define a two-way table of specifications, beginning with one dimension defined by content. The topic and subtopic grain size has varied considerably, due in part to the subject