provide evidence of improvement in student learning in real classrooms across different curricula. Yet without the kind of complementary evidence provided in a content analysis, nothing will be known about the quality or comprehensiveness of the content in the curriculum that produced better outcomes. Furthermore, neither content analyses nor comparative studies typically provide information about the quality of the implementation of a particular curriculum. A case study provides deep insight into issues of implementation; by itself, though, it cannot establish representativeness or causality.

This conclusion, that multiple methods of evaluation strengthen the determination of effectiveness, led the committee to recommend that a curricular program’s effectiveness be ascertained through multiple methods of evaluation, each of which is a scientifically valid study. The results across evaluation studies should also be synthesized periodically.

This is a general principle for the conduct of evaluations. It recognizes that curricular effectiveness is an integrated judgment that evolves continually and rests on scientifically valid evaluations. The committee further recognized, however, that agencies, curriculum developers, and evaluators need an explicit standard for deciding when federally funded curricula (or curricula from other sources whose adoption and use may be supported by federal monies) can be considered effective enough to adopt. The committee therefore proposes a rigorous standard that programs must meet to be considered scientifically established as effective.

In this standard, the committee recommends that a curricular program be designated as scientifically established as effective only when it is supported by a collection of scientifically valid evaluation studies that establish that the implemented program produces valid improvements in student learning and that convincingly demonstrate that these improvements are due to the curricular intervention. The collection of studies should use a combination of methodologies that meet these specified criteria: (1) content analyses by at least two qualified experts (a Ph.D.-level mathematical scientist and a Ph.D.-level mathematics educator) (required); (2) comparative studies using experimental or quasi-experimental designs, identifying the comparative curriculum (required); (3) one or more case studies to investigate the relationships among the implementation of the curricular program and the program components (highly desirable); and (4) a final report, to be made publicly available, that links the analyses, specifies what they convey about the effectiveness of the curriculum, and stipulates the extent to which the program’s effectiveness can be generalized (required). This standard relies on the primary methodologies identified in our review, but we acknowledge the possibility of other configurations, provided they draw on the framework and the

