A Collaborative Agenda for Improving International Comparative Studies in Education

HOW CAN INTERNATIONAL COMPARATIVE STUDIES BE IMPROVED?

It is important to conserve and accumulate whatever knowledge and experience is needed for the betterment of future comparative international studies. In particular, since advancing the science and technology of cross-national comparative education research is a legitimate end in itself, both theoretical and empirical research aimed directly at methodological improvement should be encouraged.

To improve the way such knowledge is gathered, interpreted, and used, a number of topics that require attention are discussed in the sections that follow: research design, data analysis, and dissemination; methods for assessing current and desired outcomes of education; comparability across nations and ways of interpreting differences within varying contexts; use of ethnographic and historical studies to strengthen investigations using statistical analysis and to provide in general a deeper, richer sense of what education is, can, and should be; quality control and monitoring; and accumulation, synthesis, dissemination, and use of cross-national knowledge about education.

Research Design, Analysis, and Dissemination

Current approaches to the design and analysis of international studies are inadequate to show how much various educational inputs and processes contribute to students' learning and later occupational endeavors and to estimate the effectiveness of these inputs and processes in different environments. In such analyses, it remains important to find better ways to take into account factors not under the control of education decision makers, such as students' family background and neighborhood setting. More generally, there is a need to be thoughtful and explicit about the underlying theories for comparative education research.

It is also important, in both the design and dissemination phases of a study, to consider researchers who may wish to carry out secondary analyses of the data. Secondary analyses of large education studies frequently make major contributions to the literature and to policy concerns. For example, using data from the public-access data base developed at the University of Illinois at Urbana-Champaign for the Second International Mathematics Study (SIMS), Westbury (1992) compared mathematics achievement in the United States and Japan. He examined curriculum and achievement in grade 7/8 algebra and grade 12 elementary functions and analysis (calculus). He found that the curricula in the United States are not as well matched to the SIMS tests as are the curricula of Japan, a situation that results in lower achievement in the United States. In grade 8 algebra classes, however, the United States curriculum is comparable to the "curriculum" of the test and the Japanese curriculum, and U.S. achievement is similar to that of Japan. His finding that the difference between Japanese and U.S. achievement is a consequence of different curricula contrasts with earlier analyses of SIMS, which concluded that U.S. students performed poorly in every grade and in every aspect of mathematics tested when compared with students in the other 19 countries in the study (Crosswhite et al., 1986). He concludes that analysis focused on the undifferentiated variable of "country" as a unit of analysis, which has been used in studies like those of the IEA or the IAEP, needs to be rethought to provide more emphasis "on analytic variables defining the properties of school systems that are common across countries but that might be distributed in different ways in different places" (p. 23).

Therefore, in designing studies, consideration should be given to fostering ways to increase the opportunities for the research community to do secondary analyses by (1) using well-established methods of analysis or (2) clarifying methodological innovations to enhance their understanding and use. It is also crucial for comparative data to be made available to the education research community in a timely and effective manner, since most researchers may not be affiliated with the organization collecting the comparative data and so will not have early access to the data. It should be possible to establish a data bank in which all published data from international education studies are entered and to make the data bank widely available to anyone with a computer and a modem.

It is equally important to make available to a wide audience discussions in nontechnical terms of the primary research questions addressed by cross-national studies. These presentations should spell out the advantages and disadvantages of alternative designs to provide answers to the research and policy questions posed. The topics covered in such a public discussion should also include the validity of the measures used in the designs and the possibilities for including outcomes other than student achievement. In clarifying issues that have arisen in previous studies, it is important to distinguish between the objectives and value of cross-sectional studies and longitudinal studies and to clarify the advantages and disadvantages of examining cumulative achievement levels over examining change in achievement levels within a given time period (such as a school year). Publishing methodologically oriented reports that have a common theme, namely the design and analysis of cross-national educational studies, or sponsoring training institutes on this subject, would contribute both to better understanding of their value to the countries concerned and to influencing the design of future studies.

Methods for Assessing Current and Desired Outcomes of Education

The research questions investigated in a country are often shaped by the methods used. Just as the global community provides a natural laboratory for comparing different education goals, systems, and methods, so it provides a variety of traditions and approaches to educational measurement. In the United States, since researchers have historically made much use of standardized multiple-choice tests of cognitive outcomes, research has emphasized ways to improve the kinds of learning that can be measured in that way. Some European countries have made greater use of essay examinations or performance assessments and have established research traditions that have placed less emphasis on reliability and objectivity as defined in the United States. In still other parts of the world, education scholars and policy makers have placed more emphasis on less tangible schooling outcomes, including personal and social values, character, and the ability to communicate or cooperate.

Cross-national studies are under pressure to draw on this plurality of methods and traditions, and at the same time they have been challenged to discover ways to quantify those kinds of learning for which good measures have not yet been found. Specific areas in need of investigation include: (1) reliable and valid performance-based measurement of cognitive learning outcomes across subject areas and grade levels; (2) measurement of personal values and other affective goals of schooling; (3) measurement of the abilities to communicate and cooperate as well as other schooling outcomes that are manifested only in a social context; (4) measurement of context variables at the level of class, school, neighborhood, and society, rather than the individual student; (5) measurement of schooling processes, including in particular a student's opportunity to learn not only content knowledge but also strategies for learning.

In planning a cross-national study, consideration should be given to whether it will serve its intended purpose. In particular, evaluation is required of its intended and likely unintended social consequences. Although it may not be possible to identify all the social consequences of an assessment, Messick (1988) has suggested contrasting the potential social consequences of a proposed assessment with those of alternative procedures, including not testing at all. This type of appraisal contributes to a consequential basis of test validity and should preclude some of the problems in the use of international data.

Comparability Across Nations

In making cross-cultural comparisons, in addition to value and cultural issues, both logical and measurement problems arise in defining and implementing fair and meaningful practices. Two subcategories are important. The first consists of primarily technical questions, for which different languages and practices complicate the problem of producing valid comparative information but do not, in principle, complicate the definition of sensible comparison groups or of the variable to be measured. As an example of a primarily technical problem, consider the comparison of per-pupil expenditures. Different education funding policies, accounting conventions, and currencies may complicate the development of comparable cross-national statistics. Economists have devoted considerable effort to this problem and have addressed it in great detail. Following the lead of quantitative economists, it should be possible to reach consensus on what should or should not be included as an education expenditure. In resolving such questions, useful models from other fields may include such international classification codes as the Standard International Trade Classification and the International Standard Industrial Classification of All Economic Activities. Effort should be directed to further development of such codes with a common language to describe and discuss educational organizations, processes, and outcomes so that cross-national measures and analyses of these elements can be improved.
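The per-pupil expenditure example can be made concrete with a minimal sketch. It assumes that spending is converted to a common unit with a purchasing-power-parity (PPP) factor, which is one convention economists use; the report itself does not prescribe a method. The country names, figures, and the `CountrySpending` and `per_pupil_ppp` names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CountrySpending:
    name: str
    total_expenditure_local: float  # annual education spending, local currency
    enrolled_pupils: int
    ppp_factor: float  # assumed: local currency units per one common unit


def per_pupil_ppp(c: CountrySpending) -> float:
    """Return per-pupil spending expressed in common (PPP-adjusted) units."""
    if c.enrolled_pupils <= 0 or c.ppp_factor <= 0:
        raise ValueError("pupil count and PPP factor must be positive")
    return c.total_expenditure_local / c.enrolled_pupils / c.ppp_factor


# Invented figures, not real statistics.
countries = [
    CountrySpending("Country A", 50_000_000.0, 10_000, 1.0),
    CountrySpending("Country B", 3_000_000_000.0, 20_000, 100.0),
]

results = {c.name: per_pupil_ppp(c) for c in countries}
for name, value in results.items():
    print(f"{name}: {value:,.0f} per pupil (common units)")
```

The conversion is the easy part. The sketch takes for granted that both totals cover the same categories of spending, which is exactly the question on which consensus would first have to be reached.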

At a second level, there are deeper substantive questions, for which different languages and practices may lead to incommensurabilities, making it difficult even to conceptualize variables representing corresponding attributes of students, practices, and institutions in different parts of the world. A problem in this category is the comparison of writing proficiency across languages. Problems of translations per se are difficult enough, but, beyond translation, languages have different conventions for organizing prose and presenting ideas.

Comparisons among countries can be difficult to analyze because of the large number of differences in the countries being compared. The large number of interactions between variables makes it difficult to identify which variables influence outcomes and which are covarying with those that influence the outcomes. Both the problem of covariation and the problem of interaction cause difficulties in making interpretations in comparative studies. These difficulties in drawing conclusions from comparisons do not mean that comparisons should be ignored -- there are many useful advantages afforded by such investigations as discussed earlier in this report.
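The covariation problem can be illustrated with a small constructed example. In the sketch below, all data are invented: the outcome is generated from one variable only (instructional hours), yet a second variable (years of teacher training) that happens to covary with the first across countries shows an identical correlation with the outcome. The variable names and the `pearson` helper are hypothetical; interaction effects are not modeled here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Six hypothetical countries; the numbers are invented for illustration.
hours = [600, 700, 800, 900, 1000, 1100]     # yearly instructional hours
training = [2.0, 2.5, 3.0, 3.5, 4.0, 4.5]    # covaries exactly with hours
score = [0.5 * h for h in hours]             # driven by hours only, by construction

r_hours = pearson(hours, score)
r_training = pearson(training, score)
print(f"correlation with hours:    {r_hours:.3f}")
print(f"correlation with training: {r_training:.3f}")
```

Because the two correlations are indistinguishable, the country-level data alone cannot show which variable influences the outcome. Restricting a study to more similar countries, or adding contextual evidence, is one way to narrow the candidates.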

In cross-national research, such issues of what constitutes an intelligent comparison could be more readily addressed if some studies were focused on specific topics within a few participating countries rather than designed as omnibus studies welcoming as many countries as wish to join. For example, limiting the countries in a study to a few developed countries in Europe and North America (or to groups of developing countries, African countries, Asian countries, or South American countries) would reduce the number of differences between countries being compared, making it easier to identify which variables influence outcomes. Conceptualizing variables representing attributes of students, practices, and institutions would also be easier for a limited study than for a worldwide study. Finally, if the study were focused on a few topics, for example, teaching practices and student achievement in secondary school chemistry, the topics could be explored in greater depth than in a study with countries having very disparate education systems.

In addition, a limited study would have fewer problems with languages, culture, and value systems; would have fewer communication problems in study administration; would require less training; would be less costly; and could be completed on a more timely basis. Such a study could also be designed to meet the policy needs of the participating countries.

Qualitative Studies and Large-Scale Surveys

Many significant questions in comparative education are best addressed by small, focused studies, which may draw on a broad range of techniques and provide a deeper, richer sense of what education is, can, and should be. Thus, in addition to large-scale surveys, there is a need for a wide range of other cross-national research, such as ethnographic studies, case studies, small-scale focused quantitative and qualitative studies, and historical studies that would allow us to understand what it means to be educated in diverse settings around the world. Such studies go beyond the exploratory and the descriptive. They have become essential parts of the explanatory repertoire. These studies provide ways of analyzing and explaining a variety of processes, conditions, and contexts. They help to uncover patterns of interaction and to interpret complex situations both in the classroom and in the larger community. There is a great need for small, in-depth studies of local situations that would permit cross-cultural comparisons capable of identifying the myriad of causal variables that are not recognized in large-scale surveys. In fact, much survey data would remain difficult to interpret and explain without the deep understanding of society that other kinds of studies provide. Given that research in cross-national contexts benefits from increased documentation of related contextual information, it would be useful to combine large-scale surveys and qualitative methods.

Ethnographic and other qualitative studies are especially important in clarifying the perspectives of many diverse groups of learners and their teachers. Such groups include gifted and talented students, individuals with disabilities, racial and ethnic minorities, women, and religious groups that reject the state's secular systems of education.

But there is also a role for the large-scale surveys. About the only way to obtain a simple numerical comparison of a large number of countries on a common set of measures is with a large-scale survey. Large-scale studies also permit curriculum analysis on a scale that could not possibly be done adequately by a few independent researchers, because such studies require a high degree of international organization and structure. For example, the preliminary curriculum analyses of data collected by IEA in the Third International Mathematics and Science Study (TIMSS) (Schmidt, 1993) reveal a large and interesting variation in mathematics and science textbooks across a large number of countries--variation that can be helpful in better understanding the role of the teacher in various countries. Textbooks can have a major influence on teachers' curricular decisions (Schmidt et al., 1987). Teachers are more likely to teach the topics included in the textbook than others not included. Large-scale studies such as TIMSS or the OECD indicators project will also provide trend data, and independent researchers are unlikely to have the commitment, longevity, or resources to produce such data in adequate fashion.

Improving Quality Control and Monitoring

Due to the lack of continuity in funding and personnel as well as the lack of an organizational structure capable of maintaining rigorous quality control, the overall quality of some large-scale cross-national studies of educational achievement has suffered. In many countries, the quality of the sampling, the translation of instruments into the national language(s), and the collection and management of data have been impeccable, but in other countries errors have been made in one or more of these processes. The results have been damaging in two respects: on one hand, suspect conclusions have been accepted, and on the other hand, critics of large-scale studies have overgeneralized and exaggerated these faults and asserted that nothing in the whole enterprise is valid or to be believed. For example, whereas in some countries, exclusions from target populations may have been substantial enough to mislead the user about the effects of a given country's schooling policies and practices, such exclusions can be clearly identified in reports and, in many cases, shown not to be so extensive as to affect conclusions drawn about the country as a whole.

Hence increased attention to controlling all sources of errors is important. This includes such matters as defining comparative populations, constructing sampling frames and selecting samples, developing instruments and maintaining cross-national equivalency of instrumentation, administering instruments and recording data, entering and editing data, coding and scoring, and weighting and analysis. Documentation on the steps taken to control errors is essential at all stages: at the planning stage, during data collection as a guide to study personnel in all locations, and as a necessary part of reporting what was done.
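Weighting is one of the error sources listed above that lends itself to a short illustration. The sketch below assumes a simple design in which each sampled student carries a weight equal to the number of students he or she represents in the population; the scores, weights, and the `weighted_mean` helper are hypothetical, and real surveys typically use more elaborate weighting schemes.

```python
def weighted_mean(scores, weights):
    """Mean of scores in which each score counts in proportion to its weight."""
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(scores, weights)) / total_weight


# Invented data: three sampled students and the number of students in the
# population that each one is assumed to represent.
scores = [400.0, 500.0, 600.0]
weights = [300.0, 100.0, 100.0]

unweighted = sum(scores) / len(scores)
weighted = weighted_mean(scores, weights)
print(f"unweighted mean: {unweighted:.1f}")
print(f"weighted mean:   {weighted:.1f}")
```

The two estimates differ because the lowest-scoring student stands in for a larger share of the population. Omitting or misapplying weights can therefore shift a country's reported mean, which is why the weighting procedure belongs in the documentation of what was done.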

Fostering Accumulation, Synthesis, and Dissemination

Dissemination and utilization of international education data remain weak links in the information-reform connection within education. Public discussion of cross-national education research is frequently superficial or incorrect or both. The use of international data in the United States needs close scrutiny. In the past, test results were often extracted from a study and reported independently of other variables that provide essential context for the results. Sometimes the limitations, such as sampling problems, were glossed over as if they did not matter, and at other times they were exaggerated as if they were sufficient to rebut everything that could be learned from such studies. In the future, researchers and organizations responsible for cross-national studies need to begin early in project cycles to plan for public and professional discussion of the results and to improve the presentation of findings to all concerned. Press conferences should be planned that respond to the needs of journalists and at the same time present a more in-depth background on what has been done. These meetings should clarify the importance of considering variables such as demographics, expenditures, and teacher training in conjunction with achievement scores in presenting the results of the increasingly sophisticated research on education that is under way or already available.

In addition to a better understanding of how and what to communicate, consideration is needed of the potential contributions of new communication technologies and data bases to the planning of high-quality cross-national studies and to the dissemination and utilization of results. The rapidity and simplicity of communication by facsimile and by electronic networks (e.g., BITNET and Internet) have revolutionized international research in the last five years. As cross-national studies produce items and instruments, it will be important to consider how all the information generated by such studies might be made more accessible through electronic communication networks.

The presentation and interpretation of current results could be improved if there were an easily accessible repository of cross-national research. Students of education, from undergraduate prospective teachers to senior researchers in highly specialized subfields, ought to have readily at hand a literature (good research published in reasonable places) and associated data bases that would give them access not just to a description of the structure of other education systems, but to an understanding of how such systems work, with particular emphasis on how context influences school organization, teaching practices, and learning outcomes.
