. "1. Introduction." Learning and Understanding: Improving Advanced Study of Mathematics and Science in U.S. High Schools: Report of the Content Panel for Mathematics. Washington, DC: The National Academies Press, 2002.
Schools, 2000.[5] The panel found surprisingly few other data on AP and IB on which to base its evaluation.[6]
For example, little is known, except anecdotally, about how either program is implemented in U.S. high schools, including the instructional strategies and resources used in individual classrooms, the structure of the syllabi in different schools, the quantity and quality of the facilities available, the preparation of teachers who teach the courses, and the ways in which students are prepared prior to enrolling in AP calculus or advanced IB mathematics courses. Information about the AP and IB assessments is also limited. The IBO currently conducts no systematic research addressing the validity,[7] reliability,[8] or comparability[9] of its assessments across administrations (Pook, 2001). The senior examiners, not psychometricians, make determinations about the degree to which each administration is a valid and reliable measure of student achievement. The College Board, on the other hand, has gathered considerable data to demonstrate the reliability and comparability of student scores from one administration to the next and from one student to another. However, neither program has a strong program of validity research, and neither has gathered data to document that the test items on its examinations measure the skills and cognitive processes they purport to measure. To fully analyze or evaluate the AP and IB assessments, it is necessary to know, for example, that test items intended to measure problem solving do in fact tap those skills and do not just elicit memorized solutions or procedures.
Further, little evidence is available for evaluating the long-term effects of the AP and IB programs. For instance, the panel could not find systematic data on how students who participate in AP and IB fare in college mathematics relative to other students. Nor could the panel find studies examining the effects on postsecondary mathematics programs of the ever-increasing numbers of students entering college with credit or advanced standing in mathematics. While the College Board and a few colleges that receive IB students have conducted some isolated studies of how AP or IB students perform in college (see, for example, Morgan and Ramist, 1998), the inferences that can accurately be drawn from the findings of these studies are ambiguous (see Chapter 10 of the parent committee’s report).
Because empirical evidence about the programs’ quality and effectiveness is lacking, the panel focused its analysis on what the programs say they do, using available program materials.
6. Although few data currently exist, the panel notes that both programs have circulated requests for proposals to conduct research on the ways in which their respective programs are implemented in schools and classrooms and the effects of these different implementations on student learning and achievement. No data from any of these studies were available at the time this report was prepared.
7. Validity addresses what a test is measuring and what meaning can be drawn from the test scores and the actions that follow. It should be clear that what is being validated is not the test itself, but each inference drawn from the test score for each specific use to which the test results are put.
8. Reliability generally refers to the stability of results. For example, the term denotes the likelihood that a particular student or group of students would earn the same score if they took the same test again or took a different form of the same test. Reliability also encompasses the consistency with which students perform on different questions or sections of a test that measure the same underlying concept, for example, energy transfer.
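The internal-consistency sense of reliability described above is commonly estimated with Cronbach's alpha. The following sketch, with entirely hypothetical scores (nothing here is drawn from AP or IB data), shows how alpha rises when students who do well on one item also tend to do well on the other items measuring the same concept:

```python
# Illustrative sketch only: Cronbach's alpha, a standard estimate of
# internal-consistency reliability. All scores are hypothetical.

def cronbach_alpha(item_scores):
    """item_scores: one inner list per test item, aligned by student.
    Returns Cronbach's alpha = k/(k-1) * (1 - sum(item variances)/total variance)."""
    k = len(item_scores)                      # number of items
    n = len(item_scores[0])                   # number of students

    def variance(xs):                         # sample variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_var_sum = sum(variance(item) for item in item_scores)
    totals = [sum(item_scores[i][s] for i in range(k)) for s in range(n)]
    return (k / (k - 1)) * (1 - item_var_sum / variance(totals))

# Five students' scores on three items intended to tap the same concept.
items = [
    [4, 5, 2, 3, 5],
    [4, 4, 2, 3, 5],
    [3, 5, 1, 3, 4],
]
print(round(cronbach_alpha(items), 2))  # high alpha: items rank students consistently
```

Because the three hypothetical items rank students in nearly the same order, alpha here is close to 1; inconsistent items would drive it toward 0.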
9. Comparability generally means that the same inferences can be supported accurately by test scores earned by different students, in different years, on different forms of the test. That is, a particular score, such as a 4 on an AP examination, represents the same level of achievement over time and across administrations.
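One classical technique for achieving this kind of comparability is score equating, which places raw scores from different test forms on a common scale. The sketch below uses linear equating with invented numbers (it does not describe how the College Board or IBO actually equate their examinations) to show why the same raw score can represent different achievement levels on forms of different difficulty:

```python
# Illustrative sketch only: linear equating, one classical way to put
# scores from two test forms on a comparable scale. Numbers are hypothetical.

def linear_equate(x, mean_x, sd_x, mean_y, sd_y):
    """Map raw score x from form X onto form Y's scale by matching
    standardized scores: y = sd_y * (x - mean_x) / sd_x + mean_y."""
    return sd_y * (x - mean_x) / sd_x + mean_y

# Suppose form X was harder (mean 60, sd 10) than form Y (mean 65, sd 12).
# A raw 70 on the harder form X corresponds to 77 on form Y's scale,
# so the two scores can support the same inference about achievement.
print(linear_equate(70, mean_x=60, sd_x=10, mean_y=65, sd_y=12))  # 77.0
```

Operational programs use more elaborate equating designs (anchor items, equipercentile methods), but the underlying goal is the one defined above: equal reported scores should mean equal achievement.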