On Evaluating Curricular Effectiveness: Judging the Quality of K-12 Mathematics Evaluations
6
Case Studies and Synthesis Studies
CASE STUDIES
The committee drew a distinction between two types of studies: comparative studies that investigate a curriculum’s effectiveness (as indicated by measures of student performance) and case studies, which typically examine the mechanism or means of obtaining those effects, although some case studies do attend to outcome measures. Case studies typically document “what happened” differently than do comparative studies. Case studies provide insight into mechanisms at play that are hidden from a comparison of student achievement. This is an important distinction for program evaluation and curriculum development, as the actual treatment in a large-scale comparative study is often ill defined. As discussed in a report by the National Research Council (2002, p. 117):
In many situations, finding that a causal agent (x) leads to the outcome (y) is not sufficient. Important questions remain about how x causes y. Questions about how things work demand attention to the processes and mechanisms by which the causes produced their effects.
Case study research is appropriate “when the inquirer seeks answers to ‘how’ or ‘why’ questions, when the inquirer has little control over events being studied, when the object of study is a contemporary phenomenon in a real-life context, when boundaries between the phenomenon and the context are not clear, and when it is desirable to use multiple sources of evidence” (Schwandt, 2001, p. 23).
Although studies that investigate the details of “what happened” frequently do not provide sufficient experimental evidence to permit causal inference about a curriculum’s effectiveness as measured by student achievement, they may indicate why a curriculum had the effect it did and may highlight aspects of implementation or design that were instrumental in producing that effect. Such studies, generally referred to as case studies, can provide useful information along a number of dimensions that emanate from a careful description of the connections among a curriculum’s program theory, its implementation theory, and its actualization in particular settings (Bickman, 1987). The generalizations from a well-designed comparative evaluation may not provide sufficient information to permit decision makers to know whether the experimental treatment (new curriculum) will be appropriate for their particular setting. Case studies may provide additional specificity that is necessary and helpful to practitioners in assessing the probability of successful use in their settings. As written by Easley (1977, p. 6):
Experimentalists feel that they can generalize their findings from an experiment to the population as a whole because they have drawn an adequate random sample from the population about which a hypothesis speaks. Clinical researchers feel that they can generalize from a study of a single case to some other individual cases because they have seen a given phenomenon in one situation in sufficient detail and know its essential workings to be able to recognize it when they encounter it in another situation.
Criteria for Inclusion in Our Study
Forty-five articles, dissertations, and unpublished manuscripts were originally classified as case studies. We considered case studies, ethnographies, descriptive studies, and research studies that inform us about what happens in the implementation of specific curricula, classifying them all as “case studies” for simplicity. To be classified as a case study, a study had to examine the implementation of significant parts of the curriculum materials (more than one unit) over a significant duration (more than one semester), had to show evidence of systematic data collection, and had to report on the effectiveness of the materials in its conclusions. For our purposes the study also had to focus on 1 of the 13 mathematics curricula supported by the National Science Foundation (NSF), the University of Chicago School Mathematics Project (UCSMP) curriculum, or one of the five other commercially generated mathematics curricula included in our review.
After the initial categorizing, we refined our criteria for inclusion to stipulate that a case study must have been published, be a dissertation, or have a draft date of 2000 or later. We assumed that manuscripts with a draft date prior to 2000 were written with the intent to publish; therefore we decided not to consider them if they remained unpublished in 2003. Unpublished manuscripts written prior to 2000 will probably not be published even if that was the intention. On the other hand, manuscripts with draft dates of 2000 or later may be in the pipeline for publication and were included. Thirty-two studies met our criteria.
The Studies
From the original 45 studies, we excluded 12 draft manuscripts with dates prior to 2000 and 1 manuscript dated in 2000 because it was simply a compilation of the author’s dissertation results. Thus we included 32 studies: 9 unpublished manuscripts, 13 dissertations, and 10 published articles. Therefore, the remainder of the section will report only on the 32 included studies. Table 6-1 reports the number of case studies on each NSF-supported curriculum.
TABLE 6-1 Distribution of Case Studies by Curricula

                                                             Number of Studies
NSF-Supported Elementary Curriculum Materials
  Everyday Mathematics (EM)                                          4
  Investigations in Number, Data and Space/TERC                      1
  Math Trailblazers                                                  1
NSF-Supported Middle School Curriculum Materials
  Connected Mathematics Project (CMP)                               14
  Mathematics in Context (MiC)                                       7
  Math Thematics (STEM)                                              4
  MathScape                                                          2
  MS Mathematics Through Applications Project (MMAP)                 0
NSF-Supported High School Curriculum Materials
  Interactive Mathematics Program (IMP)                              1
  Mathematics: Modeling Our World (MMOW or ARISE)                    0
  Contemporary Mathematics in Context (Core-Plus)                    5
  Math Connections                                                   0
  SIMMS                                                              1
Commercially Generated Elementary Curriculum Materials
  Addison Wesley: Math, 2002                                         0
  Harcourt Brace: Harcourt Math K-6                                  0
Commercially Generated Middle School Curriculum Materials
  McGraw-Hill/Glencoe: Applications and Connections, 2001            0
  Saxon: An Incremental Development                                  0
Commercially Generated High School Curriculum Materials
  Houghton Mifflin/McDougal Littell: Larson Series, 2002             0
  Prentice Hall: UCSMP Integrated Mathematics, 2002                  0

NOTE: Some reports addressed more than one curriculum, so the number of curricula addressed is larger than the number of included studies.

Method
We judged each included study according to how well it met the following criteria:
1. Defined the case.
A report defined its case well if it made clear the category to which the case belonged. In other words, a well-defined case allowed us to make statements that clarified, “This study is about x,” where x was defined with enough specificity and clarity that an equivalent case could be replicated at a later time with assurance of studying a similar phenomenon. “This study is about two middle school teachers teaching a reform curriculum” is not sufficiently clear about the subjects or the setting to assure an equivalent case in another study. In terms of our framework, a well-defined case also presented a clearly articulated program theory.
2. Backed its claims by evidence and argument.
Authors backed their claims when they used a methodology that included data, a way to analyze data systematically, and a form of argument that could support a reader’s reaching a similar or contrary conclusion. This criterion permitted us to distinguish a case study from an anecdotal report that told a story, but did not indicate how the data were systematically collected, linked to program theory, and analyzed and evaluated.
3. Was based on a replicable design.
A central feature of a scientific experiment is that the conditions under which it was conducted, the procedures used in conducting it, and the methods for collecting and analyzing data are described explicitly. The purpose of explicit and veridical descriptions is so that other researchers can perform “the same” experiment, or a variation of it, in order to compare its results with those of the experiment being replicated. Thus, replicability of an experiment does not refer to the experiment’s results being repeated; rather, it refers to repeating the experiment itself so that the replication’s results can be compared with the original’s. Clearly a case cannot be precisely repeated. But the method of constructing a case can be repeated if it is conducted appropriately and described sufficiently, and where it is not repeated precisely the differences can be noted and taken into account. Thus, a case must provide sufficient delineation of case events, behaviors, perceptions, and the methods of data collection associated with them to permit another evaluator to design a parallel evaluation in another setting and to conduct a related study. A replicable design, therefore, is one that allows another person to “repeat” the study methodologically, to the extent feasible, using similar data collection techniques and similar analytic methods.
4. Revealed something about the mechanisms at play during the implementation of a curriculum.
A case study should develop clear explanatory constructs that coherently link together the mechanisms involved in curricular use with the program theory, the conditions of implementation, and the documented events, behaviors, and perceptions of the case.
The studies included in our review are most valuable for generating explanations about a curriculum’s program theory or implementation theory. This stance is in line with Campbell’s (1994) thinking that generating and addressing rival explanations of a phenomenon is the heart of scientific inquiry. As pointed out in the section on comparative studies, typical experimental and quasi-experimental designs produce results subject to refutation by rival hypotheses. Case studies can be useful in shedding light on which of these hypotheses, or others, are most promising to pursue.
Each included study was read by at least two committee members and discussed with regard to each criterion; a consensus score of 1 (poor), 2 (acceptable), or 3 (well done) was then assigned on each criterion.
Findings
Case studies in this review were found only for the NSF-supported mathematics curriculum materials. Therefore, the generalization of results must be restricted to NSF-supported curricula; case studies of commercially generated curricula would be needed to draw broader conclusions. Table 6-2 provides an overview of how many studies received each rating on each of the four criteria.
Only 11 studies received ratings of 2 or 3 on all criteria. Surprisingly, there was little correlation between quality rankings and type of report. Table 6-3 shows the breakdown of level by type of report. Dissertations tended to back claims better than articles or manuscripts. Dissertations and articles tended to have higher ranks than unpublished manuscripts, which were notably poor at providing a sense of mechanism by which a curriculum’s effects might be realized. However, on any criterion the majority of reports were at best acceptable.

TABLE 6-2 Number of Studies by Rating on Each Criterion

Quality Ranking         Defined Case   Backed Claims   Replicable Design   Insight Into Mechanism
Level 1 (poor)               16              15               16                    17
Level 2 (acceptable)          7               5                6                    10
Level 3 (well done)           9              12               10                     5
Patterns in Findings
Despite the relatively small number of high-quality studies, four recurrent issues were raised broadly in these studies and bear on the design of future evaluations. It is important to note that our purpose for identifying patterns in case study results is methodological. Case studies can provide useful information on how program components interact with implementation factors at the level of classroom practices, and therefore can provide insight into the reason for whatever level of curricular effectiveness occurred. They can thus inform future evaluators about potential explanatory variables to include in the conduct of future evaluations.
Recurrent issues among the case studies were as follows:
Design features affect student subpopulations differentially;
Common practices, beliefs, and understandings among teachers and students interact in unanticipated ways with characteristics of these curricula;
Professional development is an essential consideration; and
Time and resource allocations must be carefully managed.
A fifth issue that ultimately could be important for curriculum adopters, and thus potentially important to be addressed in evaluations, is that of students’ transitions from reform to nonreform curricula (or vice versa). One study (de Groot, 2000) followed three female students as they transitioned from the Connected Mathematics Project (CMP) to standard 9th-grade algebra, identifying interesting issues that might generalize to the larger population. However, there were no other studies of sufficient quality addressing this issue to report any secure or robust patterns.
Differential Impact on Different Student Populations
Many new curricula anticipate that instruction will be highly interactive, involving students in patterns resembling reflective discourse in which students and teacher interact around substantive ideas and take their understanding of mathematics and their forms of representing it as objects of discussion (Cobb et al., 1997). Although it has been documented that these practices can offer advantages to students who participate in them (Nicholls et al., 1990; Cobb et al., 1991; Lehrer et al., 1999), it is possible that these practices are not easily implemented widely without greater attention to changes in classroom culture and teachers’ expectations of why these practices might be fruitful.

TABLE 6-3 Type of Studies by Rating on Each Criterion

                         Defined Case   Backed Claims   Replicable Design   Insight Into Mechanism
Criterion Rating          1   2   3      1   2   3        1   2   3           1   2   3
Unpublished manuscript    6   3   0      7   1   1        7   0   2           8   0   1
Dissertation              6   2   5      3   2   8        4   4   5           5   6   2
Published article         4   2   4      5   2   3        5   2   3           4   4   2

NOTE: Criterion Rating: (1) poor; (2) acceptable; (3) well done.
Baxter et al. (2001) and Murphy (1998) suggest the importance of giving special attention to low-achieving students when implementing instructional practices that emphasize public displays of knowledge, such as working in small groups or participating in whole-class discussions. Woodward and Baxter (1997), in a comparison of Everyday Mathematics and Heath Mathematics, found that while average- and high-ability students seemed to benefit from using Everyday Mathematics relative to the comparison groups, low-achieving students in both groups performed at comparable levels and showed only modest improvement over time. In a follow-up qualitative study of why low-achieving students benefited less than higher achieving students in Everyday Mathematics, Baxter et al. (2001) found that low-achieving students often were disengaged during whole-class discussion. Sometimes they were not able to follow other students’ often poorly constructed and fragmentary contributions. At other times the nature of the discussion seemed to assume levels of prior knowledge that many low-ability students lacked.
An exception to this finding occurred during small-group work. Low-achieving students were more engaged during small-group work than in whole-class instruction. However, the nature of their engagement was typically low level (e.g., copying results, collecting resources). Baxter et al. (2001) noted that one teacher was successful using Everyday Mathematics with low-achieving students; the difference was that this teacher provided many conceptual entry points to conversations, a point elaborated further in the next section. Murphy (1998) suggested an additional reason why low-achieving students failed to participate in discussions, noting that they felt greater exposure to ridicule because they had to display their lack of understanding or achievement in front of others.
The Baxter et al. and Murphy studies do not provide comparative assessments of curricula that place a premium on group work or class discussion. Rather, they are most useful in generating hypotheses about what kinds of practices associated with such curricula may need to be modified or supplemented to ensure a fair distribution of opportunities to learn among all ability levels. They suggest a need for curricular evaluations to examine whether implementation and program theories provide sufficient attention to necessary changes in existing classroom norms and practices for various subgroups, and to study relationships between actualization of those theories and student achievement.

In a dissertation focusing on the CMP curriculum, Lubienski (2000) argued that differences in socioeconomic status (SES) among students can be linked to clashes between curriculum designers’ intent to empower students mathematically and cultural values internalized by the students. In particular, she focused on the responses of low-SES students to open-ended, ill-defined problems:
Hence, in contrast with the reformers’ rhetoric of “mathematical empowerment,” some of my students reacted to the more open, challenging mathematics problems by becoming overly frustrated and feeling increasingly mathematically disempowered. The lower SES students seemed to prefer more external direction from the textbook and the teacher. The lower SES students, particularly the females, seemed to internalize their struggles and “shut down,” preferring a more traditional, directive role from the teacher and text. These students longed to return to the days in which they could see more direct results for their efforts (e.g., 48 out of 50 correct on the day’s worksheet). (p. 476)
Lubienski emphasized that readers should not generalize her observations to all implementations of problem-based mathematics curricula. Instead she stressed that the strongest use of her results is to alert designers and users of problem-centered curricula to the possibility that they may be insensitive to cultural values designed into the curricula, values that certain student subpopulations may not initially or subsequently understand or share. We add to Lubienski’s caveat that evaluations should be designed with an awareness of possible unintended interactions between program design and subgroup characteristics.
In a similar vein, Hetherington (2000) documented a clash between the emphasis that Core-Plus Mathematics places on group work and public discourse and the habitual lack of intellectual engagement that students in her study had developed in prior years. Late, sloppy work interfered with progress because curriculum designers anticipated that later tasks and assignments would build on previous, solid work that students had not accomplished.
Taken together, these examples illustrate the complex interactions among key features of a program’s design, existing instructional practices, and characteristics of particular student subgroups that program evaluations should consider. Attending to these interactions in program evaluations may provide more precise understandings of a curriculum’s differential impact among student subgroups and of its differential impact among implementation sites. It might also lead to a deeper understanding about how cognitive and conceptual accomplishments are produced through the interactions among curricular tasks and student and teacher participation patterns (Greeno and Goldman, 1998).

Interactions Among Curricula and Common Practices, Beliefs, and Understandings
The prior section focused on design features that assume students benefit automatically from public discussion as a means of support for collective reflection and student engagement. Those studies elucidate unexpected interactions between public discourse and student characteristics that can result in their disengagement rather than engagement. Complicating the framework further, several studies also document that these design characteristics can interact with teacher characteristics in ways that diminish curricular effects. Teachers who express mathematical ideas primarily in terms of numbers, symbols, and operations, and who encourage students to do the same, can create ways of talking that make the ideas being discussed accessible only to those who already understand the ideas—and therefore inaccessible to students who do not already understand them (Thompson et al., 1994).
Several studies (Fuson et al., no date; Herbel-Eisenmann, 2000; Manouchehri and Goodman, 1998, 2000) provided excerpts of classroom dialog suggesting that the degree to which teachers and students speak calculationally could be an important factor in how successfully they implement curricula that place a premium on public discourse in the service of teaching for understanding. Other studies suggested that the degree to which teachers are oriented to making sense of mathematical ideas for students can be important factors both in using public discussion productively (Kett, 1997; Smith, 1998) and in implementing the curriculum according to its designers’ intent (Manouchehri and Goodman, 1998, 2000).
We found one study particularly informative in illustrating the evaluators’ view of the importance of classroom discourse that draws on students’ ideas. Fuson et al. (no date) analyzed 1st-grade Everyday Mathematics materials to discern the social and sociomathematical norms (Yackel and Cobb, 1996) assumed by the curriculum designers. Three particularly important norms were:
Extend students’ thinking;
Use errors as opportunities for learning; and
Foster student-to-student discussion of mathematical thinking.
They investigated the degree to which these norms were implemented in 19 1st-grade classrooms in the Chicago area. In only one of the classrooms did the authors witness all three norms addressed. In attempting to understand why so few teachers implemented “extend students’ thinking,” they deduced that teachers needed to shift from talking about “their” and “the text’s” mathematics to talking about children’s mathematics.

In summary, studies highlighted in this section point to potentially significant sources of variation in the impact of NSF-supported mathematics curricula that should be addressed in program evaluations. A curriculum’s program theory may presume a certain instructional discourse style that requires significant changes in teachers’ beliefs and practices. It may also be designed with the anticipation that teachers will foster certain social and sociomathematical norms when it may be uncommon that they do.
Professional Development
The most compelling pattern among included case studies was the importance of professional development. Schoen et al. (2003) found that teachers’ engagement in professional development, comfort with class management, and high performance expectations for their students were the best predictors of student achievement among a sample of teachers implementing the Core-Plus Mathematics Project. Collins (2002), in a comparative case study of one implementation of the Connected Mathematics Project, examined student achievement in relation to the level of teacher professional development in three Boston schools. Collins found that “students in schools whose teachers received sustained professional development designed to meet the needs of the participating teachers performed significantly higher on both the Massachusetts Comprehensive Assessment System (MCAS) and a nationally normed achievement test, TerraNova, than did those students whose teachers had not participated in consistent professional development” (p. 8).
Bay (1999) studied the effects of teacher collaboration on curricular implementation and determined that a lack of collaboration among teachers at an implementation site appeared to allow individual teachers’ frustrations to foment, sometimes leading to a return to old routines. On the other hand, collaboration among teachers at an implementation site appeared to sustain excitement and commitment to change. Dapples (1994) reported that teachers who implemented the Systemic Initiative for Montana Mathematics and Science (SIMMS) curriculum considered professional development instrumental in the implementation. However, these same teachers were also teaching “traditional” courses and found few entry points to use the new routines learned in professional development.
These case studies show clearly that the level and quality of professional development entailed in a curriculum implementation are important factors in its effectiveness, especially when the curriculum demands changes in teachers’ beliefs, understandings, and practices. Therefore, evaluators should document and measure the types and frequency of opportunities for teachers to participate in professional development, as well as the opportunities for teachers to collaborate (e.g., on curricular decision making and curricular implementation issues).
Time Management
While many studies pointed to the disruption in well-established routines that teachers had developed with other curricula, two studies in particular suggested processes that might systematically create time management problems. Hetherington (2000) provided a rich description of her mathematics department’s earnest and well-intentioned attempt to implement Core-Plus Mathematics Project materials. Teachers experienced significant problems in managing the flow of instruction in the context of Core-Plus’ emphasis on group work and, as already mentioned, students’ predilections to be minimally engaged with instruction. Keiser and Lambdin (2001) documented the difficulty teachers had in organizing their instruction into coherent chunks that had educationally appropriate beginning and ending points and yet fit into fixed time blocks in the school day. They pointed out the difficulty of parsing a curriculum organized around conceptual themes into predetermined time blocks in comparison to doing the same with a curriculum that is organized by topics, facts, and procedures that typically are presented in smaller units.
Kramer and Keller (2003) suggested that a conceptually organized curriculum used in schools with block scheduling can work better than a procedurally organized curriculum used in schools with traditional scheduling. In a finding reminiscent of the preceding professional development section, Kett (1997) found that teachers had persistent difficulty implementing new assessment procedures that involved a greater volume of student work, work that often came in forms placing higher demands on teachers’ abilities to interpret students’ work and in-class contributions.
Comments on Case Study Evaluations
It is worthwhile to note that case studies often reveal aspects of program components, implementation components, and their interactions that work differently than intended by program designers. This is one reason why case studies are a valuable tool in an evaluator’s methodological toolkit. We note again that our sample of case studies was limited to studies of NSF-supported curricula and hence no broader generalizations can be drawn.
Although the case studies were valuable in pointing to important variables that should be included in future curriculum evaluations, the committee noted several aspects of the case studies as a group that deserve comment. Overall, the case studies displayed an inattention to theory and a disconnection from other research on learning and teaching mathematics, thereby limiting the ability of results to become cumulative across studies. The majority of studies told a story instead of developing a theory. At times, the documentation of contextual detail overpowered the building of theoretical, testable, and generalizable constructs, and hence limited the potential of these studies to contribute to an aggregated knowledge base on effectiveness.
Many studies would have been strengthened considerably if they had attempted to explain their observations by drawing on pertinent theories in the creation of constructs that pointed to mechanisms presented in the case that may be in play generally. Explaining observations requires systematic data collection and analysis, and it requires the investigator to entertain competing interpretations of what happened and competing explanations of why things happened as described. Often studies could have been strengthened by some degree of quantification of observations—even a simple count of how many times something happened. Baxter et al. (2001) illustrated this point by reporting the percentage of times they observed a particular behavior out of the total number of observations of that class of behaviors. A significant aspect of Baxter et al.’s study is that they developed a construct in a way that allowed them to measure it, and thereby gave readers a fairly refined sense of the intensity of the phenomenon. The better studies tended to quantify their observations or to embed them within a theoretical framework.
Prior comments notwithstanding, the case studies examined by the committee provided valuable information about variables that program evaluations should include and about the roles that case studies can play in those evaluations. The variables identified by examining case study results arose primarily because people wondered about what it meant for a particular curriculum to be effective. They wondered why particular curricula—each with its own theories of what students need to learn and of how to support students in learning it, and each implemented in settings that posed constraints on how those theories could be actualized in practice—had the effect they did. Therefore, the variables identified by the committee are potential explanatory variables in an evaluation—explaining why a curriculum had the effect it did and helping to answer the question of whether it was effective in fostering students’ mathematics learning.
Moreover, the committee believes that if program evaluations systematically included explanatory variables in their study of curriculum effectiveness, the gap between research and evaluation would be largely erased, and evaluation studies would become far more valuable to the educational field. In addition, the inclusion of explanatory variables would give program adopters more precise information about whether the conditions for effectiveness demanded by a particular curriculum coincide with their own local conditions, commitments, and resources. Evaluation studies would thus be a valuable resource for both stakeholders and researchers.
Finally, the committee believes that evaluation studies should include case studies as a matter of design. Few of the case studies examined in this study were planned and executed as part of a larger program evaluation. Instead, faculty and graduate students who were somehow connected to curriculum projects conducted them as independent research rather than as components of an overall evaluation. If case studies were included by design in program evaluations, or even planned as a systematic set of cases, we would anticipate a greater aggregation of insights into why some programs are effective under certain conditions and not effective under others. Over time, principles for designing curricula to achieve results under specific conditions could then be established. Many studies would also have been strengthened considerably if the investigators had quantified some of their observations, even at the level of simple coding of frequencies of outcomes. Descriptions of how the primary constructs were identified and verified would likewise be helpful.
SYNTHESIS STUDIES
For the purposes of this study, a synthesis study summarizes several evaluation studies of a particular curriculum, discusses the results drawn from these data, and draws conclusions based on the data and the discussion. The evaluations used in synthesis studies may employ their own quantitative analyses, or they may refer to quantitative analyses in the studies they summarize; synthesis studies may also draw on qualitative results. The studies used as data for a particular synthesis might base their conclusions on, among other sources, standardized tests, items from national and international assessments, college entrance examinations, specially designed assessments, measures of the performance of students involved in the study, observations of teachers and classrooms, or survey instruments.
In all, the committee found and analyzed 16 synthesis studies of the curricula discussed in this report. Fifteen were NSF supported and one was a UCSMP study. Eleven of the 16 appeared in one source (Senk and Thompson, 2002), 10 of which were about different NSF-funded curricula and 1 about UCSMP. The Senk and Thompson book1 itself is counted among the synthesis studies because it offers synthesis across some or all of these 11 curriculum studies in its introductory and concluding chapters. Most of the studies in Senk and Thompson include a brief statement of the historical background and theoretical basis for the development of the curriculum, the content covered in the curriculum, a discussion of student outcomes, and a discussion of possible explanations for those outcomes (e.g., Carroll and Isaacs, 2002).

1 Senk and Thompson (2002) was funded by the National Science Foundation, ESI-9729228.
Three of the synthesis studies summarize those aspects of evaluation related to teacher involvement (Romberg, 1997, 2000; Shafer, in press). The remaining study examined the political ramifications of a new curriculum (Schoen, Fey, Hirsch, and Coxford, 1999). Each of the 16 synthesis studies is authored by a senior writer of the curriculum materials or by a person closely allied with the curriculum. There is therefore a need for researchers not connected with the curriculum materials to conduct this type of research.
Examples of Synthesis Studies
Example from Everyday Mathematics
Carroll and Isaacs (2002) summarized each of six quantitative studies measuring student outcomes. These studies, one of which is a longitudinal study, compared outcomes of students using the Everyday Mathematics (EM) curriculum with those of students who used other curricula. Data were gathered from standardized and specialized tests and survey instruments, and were drawn mainly from suburban students. References to the original reports were given. Carroll and Isaacs then synthesized data across these studies to conclude:
Generally, results indicate the following. First, on more traditional topics, such as fact knowledge and paper-and-pencil computation, EM students perform as well as students in more traditional programs. However, EM students use a greater variety of computation solution methods. Students are especially strong on mental computation. Second, on topics that have been underrepresented in the elementary curriculum—geometry, measurement, data, and so on—EM students score substantially higher than do students in more traditional programs. EM students also generally perform better on questions that assess problem solving, reasoning, and communication. Third, although some districts report a decline in computation, especially in the first year or two of implementation, this is usually offset by gains in other areas. Many districts, moreover, report gains in all areas. On tests that are aligned with the National Council of Teachers of Mathematics (NCTM) Standards, such as the Illinois Goal Assessment Program, EM students nearly always show significant improvement over scores before the curriculum was adopted. (pp. 103-104)

Example from the Systemic Initiative for Montana Mathematics and Science
Lott et al. (2002) offer an example of presenting the historical background and basis for the development of a curriculum, as well as the content it covers. They begin with a brief introduction describing the context for the creation of the SIMMS curriculum as part of the NSF-funded State Systemic Initiative in Montana. They then summarize the history of the curriculum as growing out of a 1989 national survey, “Integrated Mathematics Project,” funded by the Exxon Education Foundation. The article describes the development of the curriculum and its philosophical underpinning, as well as the aims and goals of the various curriculum levels. The authors discuss assessments that have been conducted in Montana, Cincinnati, and El Paso, and follow-up surveys with certain college students who had passed three or more full years of SIMMS Integrated Mathematics (IM). The authors then state the following conclusions:
Evidence from most facets of the evaluation shows that study with the SIMMS IM curriculum does not limit students’ abilities on such standardized tests as the mathematics portion of the Preliminary Scholastic Aptitude Test (PSAT). Teachers of the SIMMS IM curriculum are preparing students very well in the areas of problem solving, reasoning, applications, communication, and use of technology. Students do at least as well overall in collegiate classes, especially the nondevelopmental classes. Students who must take developmental classes in college are at a disadvantage when compared with students who studied a more traditional curriculum, though fewer SIMMS IM students appeared in those courses when given the option of not taking them.
The collegiate student interviews suggest that the view of collegiate mathematics is not changing as rapidly, specifically in Montana, as the secondary curriculum is changing…. The student interviews also suggest that teachers at the secondary level need to continue their learning if they are to implement reform curricula. Use of technology, an integrated mathematics curriculum, and new forms of pedagogy provide a basis for needed inservice for current teachers at all levels. (pp. 421-422)
Example from Mathematics in Context
Romberg (1997) synthesizes several studies of the impact of the Mathematics in Context (MiC) curriculum on teachers. Many of these are case study analyses and are dissertations from Romberg’s home institution, the University of Wisconsin, Madison. In general, these studies trace the impact of using MiC materials on the practices of fully certified, experienced, mainly suburban teachers. The MiC materials presented many challenges to teachers familiar with traditional instructional practices: authority for gaining knowledge was transferred from teacher to student, organizational and management strategies became a problem for some, and views of students and their capabilities were challenged. Romberg concludes:
This approach to mathematics teaching “represents, on the whole, a substantial departure from teachers’ prior experience, established beliefs, and present practice. Indeed, they hold out an image of conditions of learning for children that their teachers have themselves rarely experienced.” Such departures from traditional practices were evident in every classroom in these studies. Clearly these departures are nonroutine forms of teaching new to mathematics teachers, and this should lead to new organizational relationship. (p. 377)
Although this synthesis study addresses only one curriculum program, synthesis studies that cut across programs may help to expand the field and shed light on shared questions.
Summary
As Senk and Thompson (2002) point out:
Researchers investigating the effects of curriculum face many issues, including the following: what questions to ask, what type of research design to employ, how to ensure that students using various curricula are comparable at the start of their experience, how to determine the extent to which teachers implement the curriculum, and what measures to use to determine the effects of the curriculum. (p. 17)
The considerable variation in research design and evaluation methods across studies may pose serious challenges to identifying common themes; however, conclusions drawn from such collective evidence can be compelling. A problem with the studies reviewed is that because the syntheses were all written by senior authors of the curricula, the credibility of the results may be questioned. Although these syntheses provided important sources of integrated data on the programs, we found that they tended to lack critical scrutiny and thus may not convince readers that the authors had sought out and included competing interpretations. A common set of variables collected in all evaluation studies could assist researchers doing synthesis studies and could provide additional reader confidence in the findings.
Furthermore, the syntheses lacked comparison and contrast across programs, so they did not discuss how contrasting and complementary findings around a common research interest might inform one another. Finally, judging by the evidence presented in this report, there is a need to pay much more attention to the adequacy of the design of curricular evaluations. The final review chapter by Kilpatrick in Senk and Thompson (2002) provides a more balanced and challenging representation of what is needed to demonstrate curricular effectiveness.
Nonetheless, the committee encourages synthesis evaluations, and funding agencies should consider supporting them, as a means to build on previous knowledge, to provide a summary of existing studies, to enhance understanding of the effectiveness of the various curricula, to build scientific consensus on certain aspects of education research, and to contribute to theory building.