Copyright © 2009. National Academy of Sciences. All rights reserved.
RECOMMENDED PRINCIPLES FOR
APPRAISING PROPOSALS FOR
INTERNATIONAL COMPARATIVE STUDIES
IN EDUCATION
This section presents the principles recommended by the board
for the appraisal of proposals to conduct international educa-
tion studies. These criteria do not constitute a precise set of
standards to be applied rigidly in assessing all proposals. Rather,
they are the dimensions that the board believes should be
considered in reviewing plans for international comparative education
studies in which the United States is a prospective participant
or contributor. Comparative studies that exclude the United
States are obviously also important in the larger, global educational
context of which the United States is a part, but the board is
unlikely to review proposals for such studies. These principles
have been adopted both to guide the board's own appraisal of
planned activities and for consideration by all those who are
involved or interested in international comparative studies.
Introduction
The board encourages the conduct of international compara-
tive studies across a wide range of research strategies, formats,
and procedures and a broad range of nations. In the past, many
of the most widely publicized research efforts have been rooted
in cross-national comparisons of student academic achievement.
The dominant method has been item and student sampling,
that is, collection of responses from each student for a sample
of items from a pool and careful scientific sampling of schools
or classes. Where appropriately conducted, this is a productive
line of research and the board encourages similar efforts in the
future. However, there are other research models, some highly
quantitative, others relying on rigorous qualitative techniques,
that also can enhance knowledge. The board also encourages
international studies using qualitative techniques, especially when
they enrich or parallel previous or contemporaneous quantita-
tive studies.
Explanatory and Descriptive Studies
Comparative education studies may be more or less directly
grounded in educational models or theories. At one end of a
continuum are theoretically based or explanatory studies in-
tended to build or test complex models linking educational
resources, practices, and outcomes. At the other end are descriptive
studies, intended only to monitor or document critical facets of
educational systems, practices, or outcomes.
More theoretically grounded studies often probe the relationships
among variables in an effort to seek evidence for causality.
For example, they might be designed to study the educational
effects of cultural and other large contextual differences among
countries or to determine the degree to which teacher charac-
teristics, family expectations, textbooks, or funding levels are
correlated with and might explain educational achievement.
They might relate the education levels of different nations'
populations to their financial support for schooling or to voter
participation. They may also be designed to compare peda-
gogical approaches and their effects on students' learning by
including longitudinal item-level data. Less theoretically oriented
studies might include collection and compilation of data on
student achievement, teacher salaries, curricula, or enrollments.
They might map the range of variation, determine trends over
time, or chart the progress of reforms. These studies are of
increasing interest to policy makers as nations intensify their
investments in human capital because they provide information
that can assist in shaping and selecting from broad policy options.
We caution, however, that the comparability of the results of
such studies depends on the degree of similarity between the
country contexts, and therefore the results must be placed in a
clearly identified context.
In discussing the board's principles for appraising compara-
tive education studies, we refer to less theoretically oriented
studies as descriptive, and those that are explicitly grounded
in particular theories as explanatory. We use the term explanatory
because explanation is the goal. However, it needs to be em-
phasized that correlations are not necessarily and often are
not indicators of cause and effect. In addition, there is no
sharp division between these two categories of studies, and
any particular study is likely to partake of both purposes in
some degree.
Quantitative and Qualitative Studies
Comparative studies also vary in their reliance on objective
measurement, quantification, and narrative description and on
use of statistical methods or systematic observation. There is
no sharp division between these latter two research approaches,
but we refer to the first approach as quantitative and the second
as qualitative. Some studies use both quantitative and qualitative
methods; in fact, qualitative strategies can be embedded in
quantitative studies to illuminate relationships.
Quantitative studies most often rely on scientific samples
from carefully framed populations that are usually defined at
the level of individual students, although primary and intermediate
sampling units may be at some other level of aggregation.
Numerically quantifiable data are collected, usually with tests
or questionnaires, and these sample data are used to support
statistical inferences to the population. Quantitative methods
can also be used to study resources, activities, and outcomes at
the classroom or school level.
Qualitative studies are more likely to use samples defined at
the level of classrooms, schools, or school systems, rather than
individual students. The number of units sampled is typically
much smaller than for quantitative studies, but they are investigated
much more intensively. The sites investigated are usually chosen
systematically to represent a range of demographic characteristics,
organizational arrangements, or other features relevant to the
questions to be addressed. Observations and interviews will
be conducted over a period of time, sometimes by an investigator
who participates in the ongoing activities of the school or other
setting studied. Case studies can be used initially to document
relationships that, once understood, can then be translated to
survey formats; and survey results, in turn, can stimulate in-
depth case studies. A special type of qualitative study is docu-
mentation relating to the history of education systems. His-
torical studies are very important for understanding the conditions
that account for particular structures of schooling and achievement
levels and can aid in developing realistic policy alternatives.
The fundamental principles of sound research apply equally
to qualitative and quantitative studies, but there are different
canons of systematic inquiry for each which entail different
warrants for generalization. Thus, proposals for qualitative or
historical studies and those for quantitative studies must be
evaluated by somewhat different criteria.
In characterizing studies, other distinctions can also be made.
Many studies are cross-sectional, obtaining data for only one
point in time. Others are longitudinal, obtaining information
on the same sample at various points in time, for example, at
the beginning and end of the school year. Other contrasting
approaches are large-scale, randomized surveys of entire nations
versus smaller, localized, but intensive observational studies.
The board believes there is value in all these different varieties
of inquiry and does not hold any particular research strategy,
descriptive or explanatory, quantitative or qualitative, longitudinal
or cross-sectional, to be uniformly superior. Rather, the overriding
concerns are that the methods used be appropriate to the ques-
tions posed and that, regardless of topic or technique, a pro-
posed study adhere to appropriate canons of systematic inquiry,
consistent with the principles enunciated below.
These principles are to be regarded as a set of basic stan-
dards to which proposed studies should aspire. Rather than
suggesting what ought to be studied or which proposed studies
would be of greatest significance, these criteria only suggest
how a study ought to be conducted or what questions most
proposals should address. In practice, of course, discussions
about "how" will be shaped by views about what ought to be
studied and the significance of the issues.
Finally, it will be clear that not all of these principles are
relevant to all studies. Many pertain only to particular purposes
or methods of inquiry. Moreover, many of the principles describe
ideals that may sometimes be difficult or impossible to attain.
Because of practical constraints imposed by time, resources,
knowledge, and the sometimes competing values and interests
of study participants, the design of every study must embody
compromises. Depth may be traded for breadth, sample sizes
may be smaller and instruments shorter than ideal, and so on.
There is probably no perfect proposal or perfect study. Conse-
quently, researchers are encouraged to consider which principles
are most relevant to their own investigations and to view these
principles as ideals to strive for as they inevitably balance
competing demands and practical constraints. Certainly all
principles should be carefully considered in the design of any
study.
Relation to Education
The board interprets "education" broadly. In addition to for-
mal instruction delivered through various institutions to indi-
viduals of all ages (including adults), the term is intended to
include activities, whether formal or informal, that directly re-
late to education and educational agencies and institutions. Areas
within the purview of the board include studies or surveys of
student performance or other educational outcomes; educational
requirements; planning processes; curricula; instructional ma-
terials, resources, and practices; structural arrangements; pro-
fessional preparation; parents', pupils', and professional edu-
cators' attitudes; enrollment and dropout rates; and those that
analyze education as part of the political agenda or the economy.
Even this list is only illustrative; it is by no means exhaustive.
By way of contrast, proposed international comparative studies
or surveys of the effects of nutrition, housing, or health
on schooling, however significant and useful, would probably
not be construed primarily as studies of educational activities,
agencies, or institutions.
Relation to Other Studies and Information Sources
The value of achievement scores and other educational data
or findings may be enhanced when they can be compared directly
with information collected in the past or from other populations.
Thus, the board supports the idea of studies that provide for
linkages to earlier comparative studies or surveys in the same
subject area, even though it recognizes that most international
studies to date have not been so designed. Because of the
technical difficulties associated with monitoring trends over
time, an appropriate statistical model should be a key design
feature of such studies. When appropriate and feasible, the
value of a proposed study may also be enhanced by the use of
test items and data collection strategies that permit linkage to
planned or ongoing national or regional data collections. Such
linkages might be accomplished by providing for a core data
collection with options for national augmentation. However,
any such scheme should strive to ensure that augmentation
does not compromise the validity of the international comparisons.
Relation to Policy, Practice, or Understanding
in the United States
A proposal for an international comparative education study
or survey should be appraised first and foremost on its likelihood
of informing educational policies, practices, or the scholarly
understanding of professional educators and researchers. Or-
ganizations and individuals planning such studies should not
assume that the utility of what they propose is automatically
evident. Thus, a proposal should include a list of the questions
the proposers expect to answer, and it should include a de-
scription of its significance for informing policy makers, im-
proving practice, or systematically adding to knowledge. In
documenting how a critical issue will be addressed, the proposal
should show inputs that can be manipulated by policy makers.
It should show sensitivity to questions important to policy makers,
administrators, teachers, researchers, and other stakeholders,
and it should specify the means by which the analysis and
study conclusions will be disseminated to relevant audiences
in participating nations.
The board notes that studies narrowly limited to comparing
highly aggregated mean levels of educational achievement for
participating nations, assessed at a single point in time, are
likely to be somewhat more difficult to justify in terms of their
relevance to policy, practice, or understanding than are studies
with the potential to illuminate the role of educational factors
(e.g., organization of the curriculum or teacher training) in
promoting achievement. They do, however, provide impor-
tant contextual information for policy makers, particularly on
macrolevel and alterable variables. Clearly, the board has
specific and particular concerns with the utility of cross-national
studies to audiences within its own nation and therefore en-
courages proposals for studies of potential value to educational
practice, policy, and research in the United States.
Every country's curriculum is rooted in its culture. Some-
times, in the interests of expanding a study to make it a wide-
ranging cross-national comparison of achievement, data relevant
to national understanding and national policy may be compro-
mised. More detailed and purposeful studies of a small num-
ber of comparable countries may be more useful in these cases
than large-scale cross-national studies.
Attention to Educational Influences and Cultural Context
The cultural context for learning may contribute to differ-
ences in expectations that affect not only what is taught but
when it is taught. The fundamental problem of cross-cultural
comparisons is the need for a strong theory explaining the
contextual differences among the nations.
A proposed international study should display sensitivity to
the cultural contexts (e.g., language spoken, religion, laws,
implements used, values held) for the education dimensions to
be assessed. The study plan should be reviewed by an individual
in each participating country who understands how educational
influences and cultural context shape and are shaped by policy.
Also of concern are demographic and economic trends disag-
gregated by occupational divisions or rural-urban residence,
for example, to permit examining the educational attainment
of various subpopulations across nations. Among other concerns,
the utility and interpretation of the study should be considered
in the light of participating nations' resources, curricula, graduation
requirements, and school-going populations. Even descriptive
surveys, intended to chronicle the conditions of two or more
nations on one or a few dimensions (e.g., teacher salaries or 12-
year-olds' mathematics knowledge) should strive to provide
information regarding the context (country wealth, value placed
on technology, and so on) in which such conditions are
embedded in each of the nations included in the sample. Although
much of this information is available, organizing it into a com-
mon framework with interpretive usefulness can be very diffi-
cult.
Conceptual Coherence of the Research
Another underlying principle in considering proposals, par-
ticularly those for explanatory studies, is the degree to which
the prospective study represents a conceptually cohesive research
endeavor. This means that a proposal that is technically sound
but that largely ignores past studies or is disconnected from
existing bodies of knowledge in the study area, or in which
intellectual elements of the research are fragmented or contra-
dictory, may be inadequate. Descriptive studies should likewise
demonstrate awareness of any recent closely related studies.
Research Neutrality and Involvement
An international comparative education study must avoid
political, national, religious, racial, gender, or ideological bias.
It is particularly important to make certain that, if western
paradigms are used, they are relevant to other geographic areas.
Therefore, it is essential that all nations to be included in a
study participate in the study design, and mechanisms for fa-
cilitating such participation should be described in the proposal.
Although it is important to safeguard against biases, actual
differences (political, ideological, gender, and even religious)
present challenges in comparative research that must be recognized.
Such differences are often meaningful sources of cultural variation.
International Scope
Prospective studies submitted should have a clear cross-na-
tional scope, and the United States, either in toto or in appropriate
states and regions, should be included among the nations pro-
posed to be studied. The United States and at least one other
nation should be involved, unless a study has already been
done in the United States and the same study is being repeated
in other countries to obtain relevant comparisons. In general,
there should be no upper limit on the number of international
comparisons to be undertaken, although for reasons of resources
and manageability it may be important to limit the number of
countries participating in any given study. Involvement of
developing countries in international studies contributes to the
development of local research capacity and also broadens the
sample of participating countries. Third-world participation
improves North-South dialogue as well as East-West linkages.
Education research studies are good vehicles for building trust
and cooperation. The important consideration is that the pro-
posed study be clearly cross-national in its scope and intent.
Conditions under which countries (or national data) will be
excluded from a given study, which are usually associated
with data quality or failure to meet deadlines, should be made
explicit.
Personnel, Institutional, and Financial Capacity
Organizations and individuals proposing a comparative in-
ternational study should have qualifications and credentials
appropriate for the proposed undertaking. The institution pro-
posing the study or serving as the international center should
demonstrate that it has a good research record, preferably in
international research. The institution must show that it pos-
sesses among its staff the necessary organizational, language,
psychometric, statistical, probability sampling, data management,
and specific subject-matter skills, as well as staff who have a
thorough knowledge of the principal ideas behind the educational
systems that are included in the study and experience working
with researchers in different countries and cultures. The in-
dividuals who coordinate the study within individual countries
are also key for success of the study. They should have a very
thorough knowledge of their own educational systems and of
the subject areas under study, and they should have some ex-
perience with survey research. To participate effectively in the
international planning meetings they need to speak the
international common language, which currently is English. Cross-
national study organizers need to ensure that participating nations
have available sufficient expertise to enable them to fulfill their
obligations.
In addition to ensuring that the researchers involved possess
the appropriate background and training, evidence should be
provided that financial resources being sought for the proposed
study (or, occasionally, already available) are sufficient to con-
duct the study in a technically valid manner. The matter of
sufficient resources is particularly significant. Past experience
suggests that proposed studies are frequently well conceived,
but that they later develop operational flaws due to debilitating
compromises necessitated by inadequate resources. International
studies cost more than national studies, but without realistic
funding neither the quality of the work nor adherence to time
schedules can be guaranteed. The board encourages organizations
that are planning international studies and researchers who
undertake responsibility for a country's participation in a study
to avoid such situations by ensuring from the outset, to the
extent reasonable, that adequate resources exist or will be obtained.
Prior to undertaking a study, the organization responsible for
the international aspects of the study should have firm funding
commitments for international planning (both theoretical and
operative); coordination; instrument development; training; data
cleaning; analysis; and data documentation, preservation, and
dissemination.
The study plan should demonstrate that the steps of the study
are well integrated and mapped out in advance. Provision
should be made for an initial task force to secure pertinent
expert advice, and sufficient time should be provided to secure
funding from multiple sources. Schedules and budgets should
be realistic and should cover data analysis, reporting, and dis-
semination as well as study design and data collection. Finally,
it is important to ascertain whether a proposed study is overly
ambitious. Would participating countries have the personnel
and financial capacity and endurance to complete a study with
large numbers of instruments and questions, which would take
up to 7 years, or would a more modest study be more productive
in the long run?
Technical Validity
A complex education study may serve a variety of descrip-
tive or explanatory needs, but its primary justification is likely
to rest on the few central questions or issues it is designed to
address. For any proposed international study, these key
questions or issues should be explicit. In an explanatory study, the
relationship of the issues to existing knowledge should be clear,
and the study should be technically capable of addressing those
issues. The proposed methodology, design, and statistics should
fit the underlying model. The more specific guidelines that follow
are subordinate to this general principle. Their importance to
any particular study will depend on the major purposes the
study is intended to serve. They are directed primarily to cross-
national student achievement studies, which have been the fo-
cus of most of the board's early activity. The board's scope of
activity is expanding and later revisions of the principles will
include specific guidelines for other kinds of comparative studies
of education, for example, studies that attempt to explain how
differences in attainment are produced or those that focus on
more culture-bound factors.
Sampling and Access to Schools
Nearly all quantitative studies, both descriptive and explanatory,
as well as some qualitative studies, necessitate drawing a sample
from the full population of all respondents, that is, all teachers,
all administrators, all students at an age or grade level, or all
policy makers. Valid estimation of population parameters from
sample data depends critically on rigorous adherence to an
explicit sample design. Whenever statistical inference from a
sample to a population is intended, proposals for international
comparative studies should describe in appropriate detail their
plans for framing and selecting samples in participating coun-
tries as well as for exclusion of particular subgroups (e.g., persons
who are developmentally disabled or who do not speak the
language of the test). Subgroups should not be excluded solely
for convenience in administering a test: for example, students
not in the modal grade for the target population should not be
excluded. Whenever a subgroup is excluded, information should
be provided on the portion of the target population excluded
and the extent and direction of bias introduced by the exclusion.
Potential differences in student demographics among countries
must also be considered. The population of students in coun-
tries in which the rate of participation in education is low may
be very different from the population sampled in a country
where the participation rate is high.
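The reporting requirement for excluded subgroups can be illustrated with a small calculation. The sketch below is only an illustration under stated assumptions: the function name, the counts, and the scores are hypothetical, and it assumes the mean for the excluded subgroup is known or can be estimated from another source, which in practice is often the hard part.

```python
def exclusion_summary(included_n, included_mean, excluded_n, excluded_mean):
    """Return (excluded share of the target population, bias of the included-only mean).

    All inputs are hypothetical illustration values; excluded_mean would
    normally have to be estimated, since excluded students are not tested.
    """
    total_n = included_n + excluded_n
    excluded_share = excluded_n / total_n
    # Mean for the full target population, weighting each group by its size.
    full_mean = (included_n * included_mean + excluded_n * excluded_mean) / total_n
    # A positive value means the included-only mean overstates the population mean.
    bias = included_mean - full_mean
    return excluded_share, bias


# Hypothetical country: 9,000 students tested, 1,000 excluded.
share, bias = exclusion_summary(9000, 52.0, 1000, 42.0)
print(f"excluded share: {share:.1%}, bias in mean score: {bias:+.2f}")
```

The sign of the bias gives its direction and the magnitude gives its extent, which are the two quantities the principle asks proposals to report.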
Each sample should be designed so as to support reasonably
accurate inferences about an age or grade cohort, and the pro-
portion of each cohort covered should be carefully estimated
and reported. The sample should be designed to ensure it
captures the range of individual, school, or classroom variation
that exists in the nation sampled. Explicit delineation of the
populations and subpopulations to be sampled is critical. Within-
country samples may be defined according to geographic regions,
language groups, school systems or sectors (e.g., public versus
private), or other relevant stratification variables.
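One common way to use such stratification variables is to allocate the sample across strata in proportion to their size. The following is a minimal sketch, not a method prescribed by the board: the strata names, school counts, and sample size are hypothetical, and proportional allocation is only one of several possible allocation rules.

```python
def proportional_allocation(stratum_sizes, total_sample):
    """Split total_sample across strata in proportion to stratum size.

    Uses the largest-remainder rule so the allocations sum exactly
    to total_sample.
    """
    population = sum(stratum_sizes.values())
    exact = {s: total_sample * n / population for s, n in stratum_sizes.items()}
    # Start from the whole-number part of each exact share.
    alloc = {s: int(x) for s, x in exact.items()}
    leftover = total_sample - sum(alloc.values())
    # Give the remaining units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda s: exact[s] - alloc[s], reverse=True)
    for s in by_remainder[:leftover]:
        alloc[s] += 1
    return alloc


# Hypothetical sampling frame: number of schools in each stratum.
schools = {"public_urban": 1200, "public_rural": 600, "private": 200}
allocation = proportional_allocation(schools, 150)
print(allocation)
```

A design that needs precise estimates for a small stratum (for example, private schools) might instead oversample that stratum and apply weights at the analysis stage.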
The board recognizes the difficulty of defining comparable
samples across different nations' school systems and curricula.
Nonetheless, corresponding national samples should be defined
in such a way that valid and informative cross-national comparisons
are possible. To facilitate the sample selection, an international
sampling manual is essential. In view of the complexities in
this area, the board encourages the appointment of an experienced
and expert sampling consultant to scrutinize sampling plans in
all participating countries. Individual country samples should
be approved by the international sampling consultant before
testing takes place.
Well in advance of the date for test administration, arrange-
ments should be made with the appropriate organization or
individuals (ministry, state, district, school, teachers) to ensure
high participation rates in the study. While the principle of
strict adherence to an explicit sample design is sound, the achieved
sample in actual international studies is usually different from
the designed sample, especially so in countries in which response
rates are low. The sampling manual should include a maxi-
mum acceptable nonresponse rate for inclusion of a country's
data in the international analyses.
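A maximum acceptable nonresponse rate can be applied mechanically once the designed and achieved sample sizes are known. The sketch below assumes a hypothetical threshold of 15 percent and hypothetical country labels and counts; the source does not state a specific threshold.

```python
def check_nonresponse(sampled, responded, max_nonresponse):
    """Return {country: (nonresponse rate, meets_threshold)}.

    sampled and responded map each country to the designed and achieved
    number of units; max_nonresponse is the manual's cutoff (assumed here).
    """
    result = {}
    for country, n_sampled in sampled.items():
        nonresponse = 1 - responded[country] / n_sampled
        result[country] = (nonresponse, nonresponse <= max_nonresponse)
    return result


# Hypothetical designed and achieved school samples.
designed = {"A": 200, "B": 200}
achieved = {"A": 180, "B": 150}
status = check_nonresponse(designed, achieved, max_nonresponse=0.15)
for country, (rate, ok) in status.items():
    print(country, f"nonresponse {rate:.0%}", "include" if ok else "exclude or flag")
```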
Subnational or regional units smaller than a nation should
be allowed to participate in international studies if they have
separate autonomous school systems. However, study results
for such units should be reported in separate tables from the
data for whole nations.
Even though the sample designs for large-scale studies
satisfy the criteria described above, typically they cannot afford
the close direct observations that qualitative educational re-
searchers want. Smaller in-depth studies of relatively small,
localized samples in a small number of sites can also play an
important role in comparative education research and policy
development.
Content Sampling and Design of Achievement Items
Achievement items in an international comparative study may
be used to support inferences about broad curriculum areas.
Thus, it is critical that they be chosen according to an explicit
and justified plan. The curricula of all participating nations
should be considered in formulating such a plan, and content
specifications should be developed through a consensual process
involving representatives from all of the nations involved. Ample
time should be allowed for meetings on content sampling and
design of achievement items. At these meetings, information
should be available on the purpose of each item, to assist the
country representatives in selecting those that will evaluate
the most important knowledge and skills. In general, coverage
should be broadly inclusive. It will probably be desirable to
assess a core of learning objectives common to most participating
nations, but if there is general agreement on the importance of
relevant, measurable learning outcomes that do not appear in
participating nations' curricula, they may be included. It may
also be desirable to include objectives in other domains, for
example, student attitudes, values, and creativity. Matrix sampling
(i.e., dividing the items to be included into subsamples to be
administered to different students) might be considered as a
means to increase the number and diversity of test questions
included without unduly burdening individual survey respon-
dents. The validity of test items should be reviewed by teams
of experts that include cognitive scientists, educational psy-
chologists, and curriculum or methods specialists in the rel-
evant disciplines. The board recognizes the complexity of sampling
curriculum content and the intractable problems of interpreta-
tion when comparing student outcomes for countries with very
different learning objectives.
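The matrix sampling idea mentioned above can be sketched in a few lines. This is a simplified illustration with hypothetical item and student identifiers: it splits the item pool into non-overlapping booklets and rotates them across students, whereas operational designs often use overlapping item blocks and randomized assignment.

```python
def build_booklets(item_ids, n_booklets):
    """Divide the item pool into n_booklets subsamples of near-equal size."""
    return [item_ids[i::n_booklets] for i in range(n_booklets)]


def assign_booklets(student_ids, n_booklets):
    """Rotate booklets across students so each booklet is used about equally."""
    return {s: i % n_booklets for i, s in enumerate(student_ids)}


# Hypothetical pool of 12 items and 6 students, with 3 booklets.
items = list(range(1, 13))
students = ["s1", "s2", "s3", "s4", "s5", "s6"]
booklets = build_booklets(items, 3)
assignment = assign_booklets(students, 3)

for student, b in assignment.items():
    print(student, "answers items", booklets[b])
```

Each student answers only 4 of the 12 items, yet every item is answered by some students, so the pool as a whole is covered without lengthening any individual test.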
Coverage of Performance and Higher-Order Skills
When assessing student performance, objective questions can
offer considerable assessment efficiencies relative to free-re-
sponse items (such as open-ended questions), and multiple choice
paper-and-pencil items can be designed to measure some higher-
order skills. Nonetheless, consideration should also be given
to the inclusion of test items and other data collection formats
offering opportunities for students to display their performance
abilities. Increased emphasis should be placed on writing, speaking,
and interacting in both practical and school tasks. For example,
reading, writing, and problem solving might be assessed in the
context of particular subject areas. When feasible, complex,
conceptual knowledge, process skills, and higher-order thinking
should be assessed, as well as important factual knowledge, basic
skills, and other outcomes usually achieved earlier and considered
prerequisite for higher-level learning. Of course, there are economic
considerations that must be taken into account in any study
that uses "hands-on" assessment activities, but in most cases
time and resources should be reserved to make some open-
ended tasks possible.
Instrument Construction
Test Instruments
There may be sound reasons to use existing test instruments
in international comparative studies, including continuity with
earlier studies and linkage to other ongoing studies, as well as
economy and efficiency. When new instruments are developed,
however, they should adhere to high standards. Test content
should represent a reasoned balance among the curricula and
the information needs of all nations to be included in a study.
The test development process should allow for participation
by representatives of the various nations involved and should
be informed by expertise in the curriculum area assessed, in
the cognate academic discipline, and in educational measure-
ment. Care should be taken to avoid redundancy among the
questions. If new measures are proposed, there should be evidence
that the measure works in at least one country before it is
included in an international study.
Whenever corresponding tests in more than one language
must be prepared, the test should include some items origi-
nating in each of the languages represented. Consideration
should be given to the development of parallel text materials
that are constructed simultaneously within the cultural context
of the different nations, rather than simply translated. If this is
not feasible economically, and translation is used, all exercises
should be back-translated to enhance accuracy and compara-
bility. In addition, qualified bilingual experts should scrutinize
pairs of tests, item by item, for unintended differences in emphasis
or levels of abstraction. Care must be taken to ensure the
equivalence of meaning of an item in the different languages.
New or substantially revised tests should be pilot-tested to
ensure the quality of individual items and instructions to examiners,
as well as the appropriateness of time limits for the questionnaire.
Following the pilot test, a check should be made for item bias,
including cultural bias or translation bias, by examining the
difficulty of an item relative to other items in a subtest or domain.
A check should also be made of the appropriateness of any
statistical model used for scaling to ensure that it can cover the
total range of scaled scores from all countries before the tests
are used in any main testing.
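The item-bias check described above might be sketched as follows. This is
only an illustration under stated assumptions, not a procedure prescribed
by the board: the data layout (proportion correct per item per country),
the function names, and the 0.15 threshold are all hypothetical.

```python
from statistics import mean

def relative_difficulty(p_correct):
    """Each item's proportion correct minus the country's mean over all items."""
    avg = mean(p_correct.values())
    return {item: p - avg for item, p in p_correct.items()}

def flag_item_bias(by_country, threshold=0.15):
    """Flag (country, item) pairs whose relative difficulty departs from the
    cross-country average relative difficulty by more than `threshold`.
    The threshold is an illustrative assumption, not a recognized standard."""
    rel = {c: relative_difficulty(p) for c, p in by_country.items()}
    items = list(next(iter(by_country.values())))
    flags = []
    for item in items:
        pooled = mean(rel[c][item] for c in rel)
        for c in rel:
            if abs(rel[c][item] - pooled) > threshold:
                flags.append((c, item))
    return sorted(flags)

# Hypothetical pilot-test results: proportion of students answering correctly.
pilot = {
    "A": {"q1": 0.80, "q2": 0.60, "q3": 0.40},
    "B": {"q1": 0.70, "q2": 0.50, "q3": 0.30},
    "C": {"q1": 0.75, "q2": 0.15, "q3": 0.45},
}
flags = flag_item_bias(pilot)
```

A flagged item is a candidate for review by bilingual experts; the flag
alone does not establish that the item is biased.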
A standardized research design across countries is essential,
although national or international options can be added. Other
modifications of the standardized design should not be per-
mitted, since they can have serious consequences for validity
or comparability.
Background Questionnaires
Educational achievement data cannot be appropriately inter-
preted in the absence of information about responding students,
their backgrounds, their motivations, and their educational
experiences. For cross-national studies of achievement test scores,
it is especially critical that such information be collected. Back-
ground questions should be selected judiciously, and particular
attention should be given to matters such as variables (a) relevant
to the interpretation of achievement patterns, (b) plausibly related
to school achievement (including locally available educational
resources), or (c) reflecting additional schooling outcomes val-
ued in their own right.
Explanatory studies that rely on quantitative data should
generally not rely exclusively on students' own reports of such
factors. Such studies should also include instruments directed
to teachers, administrators, and parents. For example, teachers
or curriculum coordinators might be asked about the availability
and use of particular instructional materials, local curriculum,
or specific instructional practices.
A structural model that postulates cause-effect relationships
to account for variation in student achievement should be used
in selecting background questions. The model can also guide
the analyses directed to identifying the sources of individual
and group differences in achievement and the relative impact
of these sources. Background variables about students seek to
explore the relationship between students' background and home
environments and achievement and attitudes. For example,
information might be requested about the students (age, gender,
race or ethnicity), indicators of family environment, parental
encouragement, and attitudes toward school assignments in
the subject matter being assessed. Information sought from
teachers might include information about their teaching experience,
availability and use of particular instructional materials, local
curriculum, and classroom environment. School administrators
might be asked for data on school factors believed to influence
student achievement, such as instructional time, student enrollment
and attendance, and programs in the subject area.
Background information collected from students, teachers,
and school administrators can be supplemented by data from
other sources that provide economic and social indicators for
the various nations participating in the study. Economic and
social indicators can be related to student achievement in various
sectors of the population (e.g., rural or urban) and can also be
used to explore the relationship of student achievement to eco-
nomic development, resource development, industrialization,
political stability, and the like across nations.
Representatives of all the countries participating in a study
should be involved in developing background questionnaires
as they are for the test instruments. Similarly, care should be
given to translation, back-translation, and scrutiny of background
questions to ensure the equivalence of meaning of a question
in different languages. The background questionnaires should
be pilot-tested.
Because background data become more valuable if they can
be compared over time and across populations, the same wording
should be retained from study to study. Although it is difficult,
effort should be made to ensure that background variables are
defined similarly in the languages of all participating nations.
Similar effort is required to ensure the comparability of social
and economic indicators for all participating nations. All variant
definitions should be documented.
Test Administration
Whenever achievement results are to be compared from one
test administration to another, it is imperative that administrative
procedures be controlled to be as nearly identical as possible.
Maintenance of standard test administration procedures over
time and from one nation to another is of paramount importance.
Standardized procedures for instructing students and establishing
conditions for testing should be developed, based on a pilot
test of the instructions in each participant country. Time should
be allotted at an international meeting of study coordinators to
listen to their complaints and suggestions following the pilot
test and to agree to standard administrative procedures. Test-
ing materials should be clearly understandable. The testing
environment should be comparable from one setting to another
both within and across nations and should be free from
distractions.
Each study design should address plans to control and standardize
conditions of test administration. Ideally, to ensure
adequate quality control, suitably trained people from outside
the schools should be in charge of the test administration. In
addition, people from different countries should supervise the
implementation of the procedures to be followed (previously
agreed on by the countries involved) by being present on site
when the field work is conducted. Such quality control procedures
would assure more uniform test administration, particularly in
countries with little experience in assessment. Each design
also should address the level of student motivation to try to
minimize any plausible systematic differences from one nation
(and from one test administration within a nation) to another
in incentive to perform well in response to test questions. Each
country report should carry a description of test administra-
tion conditions.
Plans for Analysis, Reporting, and Dissemination
Plans for analysis, reporting, and dissemination of interna-
tional comparative study findings should be described at the
time the study is proposed and should indicate how the critical
questions to be informed by that study will be addressed. These
plans should provide for balanced reporting of cross-national
comparisons and may also involve separate analysis and reporting
of data from each participating nation or subsets of them. The
board discourages exclusive, or even heavy, reliance on overall
national rankings. Very often differences in educational systems
render such comparisons invalid; a more productive approach
is to find out the reasons for observed differences in pupil
achievement. Prior to the release of any cross-national report,
opportunities should be provided to all nations for review of
the analysis and interpretations.
Without dwelling on them too much, reports should give
prominent place to a discussion of the known and surmised
limitations. Reporting should be sensitive to contextual factors
that might affect test validity, for example, the relative familiarity
of children in different countries with testing in general or with
the particular item formats used in a comparative study. The
possibility might also be considered that children who are exposed
to a great deal of testing may expend less effort on "low stakes"
tests they know do not matter for their own educational futures.
Reporting should also be sensitive to technical limitations on
a study's interpretability. Limitations might include caveats about
the comparability of national samples, the limited number of
test items or range of content on which comparisons are based,
differences in administration conditions from place to place,
the match of tests to different curricula, the difficulty of trans-
lating exercises from one language to another, the limited pre-
cision of sample statistics, or other qualifications on study findings.
Analysis Plan
For various reasons, data analysis plans may change or evolve
from the time a study is designed to the time it is completed
and reported. Unforeseen difficulties in data collection or
limitations of data quality may preclude some planned analyses.
New questions or insights that occur in the course of data col-
lection and analysis may open productive new lines of inquiry.
Data already collected may be pressed into service to address
emergent policy issues. Even when such evolution is anticipated,
however, every proposal for an international educational study
should include an analysis plan. The correspondence between
the analyses proposed and the questions they are intended to
answer, if not obvious, should be made explicit. In both explanatory
and descriptive studies, it should be clear how theoretically
central variables are to be measured and how relationships
among critical variables are to be assessed. In qualitative studies,
methods of examining and relating alternative data sources should
be indicated, and anticipated procedures for developing conceptual
or explanatory frameworks should be described.
Level of Detail in Reporting
In any complex study, there is a tension between the level of
detail and the precision of the reported results. At one extreme,
an average score over a large number of test items for an entire
nation may be estimated quite precisely, but it conveys little
information. At the other extreme, reports of numerous quantiles
of the score distributions for narrow student subpopulations
on individual items may be so poorly estimated that they also
convey little information. However this tension is resolved, it
is crucial that standard errors be calculated and reported with
all reported statistics. Calculation of standard errors is technically
complex, and the board encourages the use of a recognized
expert consultant in this and other analysis stages, as it does
for sampling.
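As a rough illustration of why the calculation of standard errors is
technically complex, the sketch below estimates the standard error of a
mean score when students are sampled in whole schools. The text does not
name a method; the delete-one-school jackknife shown here, the equal
weighting of students, and all names and scores are assumptions made for
illustration. An actual study would also need to account for
stratification and sampling weights.

```python
from math import sqrt

def jackknife_se_mean(schools):
    """Delete-one-cluster jackknife standard error of the pooled student mean.

    `schools` is a list of lists of student scores, one inner list per school.
    Assumes an unweighted, unstratified sample of schools (a simplification).
    """
    n = len(schools)
    if n < 2:
        raise ValueError("need at least two schools")
    total = sum(sum(s) for s in schools)
    count = sum(len(s) for s in schools)
    full_mean = total / count
    # Re-estimate the mean with each school left out in turn.
    replicates = [(total - sum(s)) / (count - len(s)) for s in schools]
    rep_mean = sum(replicates) / n
    variance = (n - 1) / n * sum((r - rep_mean) ** 2 for r in replicates)
    return full_mean, sqrt(variance)

# Hypothetical scores for four sampled schools.
sample = [[52, 48, 50], [61, 59, 60], [40, 42, 41], [55, 53, 54]]
estimate, se = jackknife_se_mean(sample)
```

Because students within a school tend to resemble one another, a formula
that treats all twelve students as independent would understate the
standard error in this example.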
The first issue to be resolved with respect to the appropriate
level of detail in reporting is the number and size of subpopu-
lations to be distinguished. Performance may be reported for
major subgroups of student cohorts, defined by geographic region,
language background, gender, race and ethnicity, or other variables,
if such reporting advances the purposes of the study. When
achievement is reported, the utility of multiple scores should
be considered. In many cases, interpretive emphasis is prop-
erly given to major content and process categories rather than
to total scores. Finally, within the limits on precision imposed
by the design and size of a study, distributional summaries
should be given and not just means and standard deviations.
Reporting of quantiles (e.g., deciles or quartiles) is one method
that is readily explained and understood, and graphics such as
box plots are easily understood and of potential value. Con-
sideration may also be given to reporting at multiple levels of
aggregation if that is appropriate to the design and intent of
the study. In addition to presenting the student-level score
distribution, for example, distributions of classroom or school
means might also be reported.
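The distributional summaries discussed above can be sketched briefly. The
example below is illustrative only: the function names and scores are
hypothetical, and it relies on the default ("exclusive") method of
Python's `statistics.quantiles`, which is one of several conventions for
computing cut points.

```python
from statistics import mean, quantiles

def distribution_summary(scores):
    """Return quartile and decile cut points for a list of student scores."""
    return {
        "quartiles": quantiles(scores, n=4),   # 3 cut points
        "deciles": quantiles(scores, n=10),    # 9 cut points
    }

def school_means(scores_by_school):
    """Aggregate student scores to one mean per school."""
    return {school: mean(s) for school, s in scores_by_school.items()}

# Hypothetical student scores grouped by school.
scores_by_school = {
    "s1": [40, 50, 60],
    "s2": [55, 65, 75],
    "s3": [30, 45, 60],
}
all_scores = [x for s in scores_by_school.values() for x in s]
summary = distribution_summary(all_scores)   # student-level distribution
means = school_means(scores_by_school)       # school-level aggregation
```

With samples this small the cut points would be very imprecise, which is
the tension between detail and precision noted earlier.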
Standards and Criterion Levels
Studies concerned with student achievement data can be en-
hanced considerably by reporting outcomes in terms of performance
standards, for example, the percentage of students who know
everyday science facts or who use scientific procedures and
analyze scientific data. This can be difficult to accomplish,
however, and there is a risk that arbitrarily established stan-
dards will lead to serious misinterpretations of achievement
levels. If results are reported relative to specified performance
levels (e.g., functional literacy), the basis for establishing these
levels must be explicit, defensible, and responsive to the needs
and contexts of all the nations involved. This might imply the
use of different criterion levels for cross-national reporting than
for national reporting. Alternatively, a graduated series of
proficiency levels might be defined, labeled with appropriate
descriptors, and illustrated with representative test items.
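Reporting against a graduated series of proficiency levels might look
like the following sketch. The level labels, cut scores, and student
scores are hypothetical; as the text stresses, real cut scores would need
an explicit and defensible basis.

```python
from bisect import bisect_right
from collections import Counter

def percent_at_levels(scores, cut_scores, labels):
    """Percentage of students at each proficiency level.

    `cut_scores` are ascending lower bounds for every level above the
    lowest, so len(labels) must equal len(cut_scores) + 1.
    """
    if len(labels) != len(cut_scores) + 1:
        raise ValueError("need exactly one more label than cut scores")
    # A score equal to a cut score is counted in the higher level.
    counts = Counter(labels[bisect_right(cut_scores, s)] for s in scores)
    return {label: 100 * counts[label] / len(scores) for label in labels}

# Hypothetical levels and cut scores, for illustration only.
levels = ["below basic", "basic", "proficient", "advanced"]
cuts = [200, 250, 300]
result = percent_at_levels([180, 200, 240, 250, 260, 310, 290, 199], cuts, levels)
```

Running the same function with a different list of cut scores shows how
sensitive the reported percentages are to where the levels are set.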
Special Reports for Nontechnical Audiences
Special reports should be prepared for nontechnical audi-
ences, including the press, politicians, and policy makers. These
reports, which are designed to serve political purposes, differ
from the more detailed reports intended for research and edu-
cational purposes. They should be designed so that the infor-
mation is easily assimilated. Useful analytic tools for such
reports include simple graphs, percentiles, and a graduated
series of proficiency levels with illustrative test items.
Preparation of this type of report plays a role in institutional
capacity building by forging links between the research and
policy making communities. It also augments the dissemination
of the latest information and techniques and will enhance long-
term funding prospects. Study proposals should provide for
mechanisms to disseminate results widely among public and
private organizations. Such dissemination stimulates debate,
which makes it more likely that study findings will be put into
practice.
Data Audit and Evaluation
Experience has shown that national researchers make many
changes in background questionnaires from the intent of the
international questions. This leads to nonconformity of data to
the international code book, which requires extensive work by
the international coordinators to clean the data. In some cases
it is desirable to produce a data-entry program and a data-
cleaning program for the use of national research coordinators.
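A data-cleaning program of the kind mentioned above could, at its
simplest, check each national record against the international code
book. The sketch below is a hypothetical illustration: the field names,
the permitted codes, and the function `check_record` are assumptions, not
part of any actual study's code book.

```python
def check_record(record, codebook):
    """Return a list of (field, problem) pairs for one questionnaire record."""
    problems = []
    for field, allowed in codebook.items():
        if field not in record:
            problems.append((field, "missing"))
        elif record[field] not in allowed:
            problems.append((field, "invalid code"))
    # Fields added nationally that the international code book does not define.
    for field in record:
        if field not in codebook:
            problems.append((field, "not in code book"))
    return problems

# Hypothetical international code book: field -> set of permitted codes.
codebook = {
    "gender": {1, 2, 9},
    "grade": {7, 8, 9},
    "books_at_home": {1, 2, 3, 4, 5, 9},
}
record = {"gender": 1, "grade": 10, "homework_hours": 3}
problems = check_record(record, codebook)
```

Running such a check at data entry, in each country, would surface
departures from the international questions before the files reach the
international coordinators.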
The technical features of any international comparative study
should be clearly documented. It is desirable that at least a
summary of the methods involved be included in the principal
reports, along with estimates of sampling precision. More de-
tailed documentation, which might be published in a separate
volume from the main report of the study, should address such
matters as maintenance of the security of test materials before
the actual testing; sampling adequacy (participation rate, attri-
tion, absentee follow-up); comparability of administration con-
ditions; procedures for audit of data collection; data checking,
cleaning, and scoring; procedures for review of study reports
prior to publication; and other procedural matters that may
condition the confidence placed in study findings.
Public Use of Data
Countries participating in studies should be authorized to
release their own findings as soon as the national data file is
cleaned, merged into the international file, and ready for analysis.
Provisions should be made to ensure that, when appropriate
and within a reasonable period after analysis and reporting by
project sponsors, data are placed in the public domain in a
form accessible for secondary analysis. Special attention should
be paid to making the data accessible to researchers in
third-world countries. Clear and complete data documentation is
crucial. When feasible, consideration should be given to using
existing archives.
The importance of making international data easily accessible
for secondary analysis should not be underestimated. More
extensive use of the data at the national policy level can help
in understanding the weaknesses and strengths of the U.S.
educational system as well as those of other countries.