Appendix: Scientific Evidence

THE SCIENTIFIC METHODS

METHODS USED BY SOCIAL SCIENTISTS to gain knowledge are very diverse. Especially in the field of education, a field that calls on several social sciences in order to constitute its knowledge base, a variety of methods are relevant and useful. Each method has its strengths and is best suited to a particular set of questions, and less well suited to a different set of questions. Despite the relative merits of all social science methods, their application in the service of research requires that several basic standards be met if the answers they yield are to be considered valid and valuable.

This note on evidence makes explicit those standards of the scientific communities to which we are accountable in conducting research on young children, and to which we have held researchers accountable in conducting our review. We identify and briefly describe those standards, highlighting areas in which the consensus is not as strong, as well as areas in which important advances have been made in recent decades.

Empiricism: Theory Building

Scientists pose hypotheses based on their observations in the world and in the laboratory. In order to test their hypotheses and
refine theories, they design research studies that entail the collection of data in some form. Those data are then analyzed, results or findings are arrived at, and interpretations of those results are made. These interpretations then can be used to frame future research and guide policy making and program design and implementation.

Replicability and Falsifiability

All theories must be falsifiable. In other words, any theory derived from a study must be sufficiently elaborated so that other scientists can replicate the study and collect additional empirical data to either corroborate or contradict the original theory. It is this willingness to abandon or modify a theory in the face of new evidence that is one of the most central defining features of the scientific method.

The extent to which one particular theory can be viewed as uniquely supported by a particular study depends on the extent to which alternative explanations have been ruled out. A particular research result is never equally relevant to all competing theoretical explanations. A given experiment may be a very strong test of one or two alternative theories but a weak test of others.

Validity and Generalizability

Validity is defined as the extent to which the instrument is actually measuring what the researcher intends it to measure. External validity concerns the generalizability of the conclusions to the larger population and setting of interest. Internal and external validity are often traded off across different methodologies.

The alleged trade-off between internal and external validity presents some interesting questions. In what sense can a biased estimate (one that is inaccurate for the whole population) be said to be generalizable? What we mean is that we are willing to risk a small amount of bias for a large increase in confidence that the estimate generalizes to a much larger set of children and programs. Willingness to take that risk requires some confidence that the size of the bias introduced by lack of experimental control is small relative to the bias introduced by applying an unbiased estimate obtained from a narrow set of children and programs to a broader set of programs and children.

Convergence

Scientists and those who apply scientific knowledge must often make a judgment about where the preponderance of evidence points. When this is the case, the principle of converging evidence is an important tool, both for evaluating the state of the research evidence and also for deciding how future research should be designed.

Research is highly convergent when a series of studies consistently supports a given theory while collectively eliminating the most important competing explanations. Although no single study can rule out all alternative explanations, taken collectively, a series of partially diagnostic studies can lead to a strong conclusion if the data converge. This aspect of the convergence principle implies that we should expect to see many different methods employed in all areas of educational research. A relative balance among the methodologies used to arrive at a given conclusion is desirable because the various classes of research techniques have different strengths and weaknesses. The results from many different types of investigation are usually weighed to derive a general conclusion, and the basis for the conclusion rests on the convergence observed from the variety of methods used. This is particularly true in the domains of classroom and curriculum research.

Types and Uses of Empirical Methods

There are several ways to categorize the empirical methods used in research on early childhood development and education. They may be classified according to:

• the purpose of the study (e.g., evaluation of a program, open-ended inquiry for hypothesis or theory building, hypothesis testing, comparison of groups or of individuals),
• the design aspects of the study (e.g., the number of times data are collected: longitudinal, cross-sectional; the type of data that are collected: quantifiable, qualitative), and
• the data analysis aspects and the unit of analysis used when the data are analyzed (e.g., univariate, bivariate, and multivariate analyses, qualitative analyses).

Across these groups of studies, methodological rigor can be defined and ensured through attention to the standards outlined above (replicability, generalizability, convergence).

Purposes of Research

Open-ended Inquiry: Qualitative, Ethnographic Research

In order to record and collect data in a naturalistic setting, social scientists conduct various types of ethnographic or qualitative research. These include case studies of individual learners or teachers, classroom ethnographic observations, open-ended and introspective interviews, and combinations of these methods. Qualitative research is most useful for in-depth descriptions of complex processes, such as teaching and learning. It may be important, for example, to assess the beliefs and attitudes of the adults involved in an intervention in order to evaluate the role of those adults in the implementation of a particular educational intervention.

The strengths of qualitative inquiry include a focus on depth, attention to the meaning of phenomena to the people being studied, and a quality of openness that enables new questions and perspectives to be uncovered throughout the research process. In most cases, however, qualitative studies sacrifice breadth for depth, and it is difficult to judge if the results are applicable or generalizable to a different population.

Identifying Causal Relationships: Experimental and Quasi-Experimental Design

If the purpose of the research is to identify cause-and-effect relationships between variables, then experimental and quasi-experimental studies are useful. An experimental study is one in
which the researchers randomly assign participants to a control group and a treatment group, administer an intervention (such as an educational program) to the treatment group, and then compare the results by measuring before-and-after treatment variables on both the control and treatment groups. A true experiment is one in which all extraneous variables are controlled and only the single variable of interest is allowed to vary, so that the effect of that variable on the outcome variable can be clearly measured. This pure experimental design is the strongest inferential tool for statistical analysis.

In social science research, and especially in the education field, it is often difficult and even unethical to ensure that the control group remains a true control throughout the duration of the intervention. There are several reasons why this may be the case, and these justify quasi-experimental or other types of research. First, there are logistical difficulties associated with carrying out classroom and curriculum research that may preclude true experimental designs. For example, members of control groups may engage in an alternative program that differs from that of the treatment group but still has some effect on those members. In some cases, ensuring a true control group would be unethical, as it would require withholding treatment from children even though the purpose of the research may be to gain knowledge that will help those same children in the future. Also, variables such as birth order, sex, and age cannot be manipulated, and therefore the relationships among these can only be correlational. By collecting observational and interview data from all participants, and by using statistical control mechanisms to neutralize the effects of the alternative programs on the control group members, researchers can overcome these limitations.

Researchers can also plan the study so as to minimize such problems. For example, the research plan may require providing a treatment that is of much higher quality and intensity than ordinary child care or even public preschool education and Head Start where these are provided. When service availability varies geographically, study locations might be chosen based on the lack of close substitutes for the treatment.

In any case, it is vital that researchers document all of the potentially significant educational activities that both the treatment and the control groups experience. The Abecedarian study provides a good example of a successful experiment in which much of the control group attended other early childhood programs. In this study, the difference in quality and intensity was so large that program effects were apparent. Moreover, an estimate of the diminishment of group differences due to control group experiences was produced. However, it may well be that the critical public policy issue is the effect of a program without taking into account the child care and preschool education experiences of the control group. If the research is to investigate the impact of providing a particular program, as it is currently implemented, given what is already available, then what the control group receives is irrelevant (assuming appropriate sampling procedures), and experimental studies produce good answers. Thus, whether a true experiment is useful depends on (a) the expected difference between the treatment and what occurs naturally and (b) the precise question being asked.

Quasi-experimental studies often suffer some of the same problems in assessing treatment effects as experimental studies. An example of this is when a comparison group is not examined carefully enough to determine which interventions its members have received. Some correlational studies of Head Start and other preschool programs have failed to take into account the attendance of children in child care centers, despite the fact that these may not be particularly different from the “treatment” in terms of the child’s educational experiences and given the fact that children tend to spend longer hours in child care.

Evidence can be combined across studies looking at different parts of causal chains that might not be completely encompassed by very many studies. For example, studies that link smoking and cancer need not follow subjects all the way to premature death, when there are many studies linking the kinds of cancer caused by smoking to premature death.

Identifying Relationships and Patterns: Correlational Studies

Although experimental studies represent a most powerful design for drawing causal inferences, their limitations must be
recognized. A not uncommon misconception is that correlational (i.e., nonexperimental) studies cannot contribute to knowledge. This is false for a number of reasons.

First, many scientific hypotheses are stated in terms of correlation or lack of correlation, so that such studies are directly relevant to these hypotheses. Second, although correlation does not imply causation, causation does imply correlation. That is, although a correlational study cannot definitively prove a causal hypothesis, it may rule one out. Third, correlational studies are more useful than they once were due to more recently developed correlational designs. For example, the technique of partial correlation, widely used in studies cited in this report, makes possible a test of whether a particular third variable is accounting for a relationship.

RESEARCH IN EARLY CHILDHOOD EDUCATION

Researchers in early childhood education study a vast number of questions. For example: What are the processes through which knowledge is transmitted to young children? What are the effects of educational experiences and of different types of programs on young children? How do factors such as gender, social class, culture, and ethnicity affect the development and education of young children? Given this wide scope of investigation and the inherent complexity of studying young children’s development, the design of precise and accurate measurements is a challenging task. Below we elaborate on a number of questions that should be addressed both in designing research studies and in evaluating the quality of research results.

Precision of the Questions Being Asked in the Research

Did the researcher have a defined purpose for comparing results? Are the questions being asked too broad in nature and are inappropriate measures used? Are we clearly specifying the multiple variables that might underlie the expected change? Are we defining relevant dimensions that may or may not be factors within the child? How do measurement indexes relate to the goals of programs?
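As an illustration of the partial correlation technique mentioned above in connection with correlational studies, the following Python sketch tests whether a third variable accounts for a relationship between two others. It is a minimal example under stated assumptions, not the procedure used in any study cited in this report: the data are simulated, the variable labels (family background, program attendance, test score) are hypothetical, and only the linear effect of the third variable is removed.

```python
import numpy as np


def residuals(y, z):
    """Return what is left of y after a least-squares linear fit on z (with intercept)."""
    design = np.column_stack([np.ones_like(z), z])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return y - design @ coef


def partial_correlation(x, y, z):
    """Correlation between x and y after removing the linear effect of z from both."""
    return np.corrcoef(residuals(x, z), residuals(y, z))[0, 1]


# Simulated, hypothetical data: z influences both x and y,
# and x and y have no other connection to each other.
rng = np.random.default_rng(0)
n = 2000
z = rng.normal(size=n)        # hypothetical third variable, e.g. a family-background index
x = z + rng.normal(size=n)    # hypothetical measure, e.g. hours of program attendance
y = z + rng.normal(size=n)    # hypothetical outcome, e.g. a test score

raw_r = np.corrcoef(x, y)[0, 1]
partial_r = partial_correlation(x, y, z)

print(f"raw correlation between x and y:      {raw_r:.2f}")
print(f"partial correlation, controlling z:   {partial_r:.2f}")
```

In this simulated case the raw correlation is clearly positive while the partial correlation is close to zero, which is the pattern one would expect when the third variable accounts for the relationship. A partial correlation that stayed well away from zero would instead suggest that the third variable does not fully explain it.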

The Variability of Young Children’s Performance

Variability in any sample of living organisms should initially be examined in terms of the phenomenon or phenomena under study before attributing variability to measurement error (Farran, 2000). The consequence of ignoring within-group variability or of neglecting the potential significance of outliers in an aggregated data base is a missed opportunity. By focusing so narrowly on the “normal,” a great deal of potentially useful information is overlooked, and understanding of the phenomena under study is thereby greatly handicapped. Within-group variability is not necessarily a random, inconsequential event. An inadequate understanding of the sources of variation should not automatically lead to an interpretation of random error. The argument for randomization is based on the assumption that randomness ensures that the phenomenon under study has an equal chance of being distributed in the entire population. The samples employed in many studies are too small, however, to uphold the validity of this assumption.

Use of Common Measures Versus Trying Innovative Measures

Using measures that are commonly used by others in similar studies allows communication and comparison among different research groups and studies. A persistent use of measures known to have serious limitations, however, may allow these measures to gain acceptance and “incremental validity” simply by the fact that everyone uses these measures to answer a particular set of research questions. In other words, measures often become institutionalized, or part of a research culture.

Designing innovative measures, however, also has potentially negative consequences. If these measures are entirely new and therefore still under question within the scientific community, it may be difficult to interpret the results that they yield to the satisfaction of all. In addition, new measures present difficulties when it comes to training those who will administer them.

We suggest that the solution to this dilemma lies in the use of multiple measures. For example, measuring verbal intelligence among young children would include administration of a commonly used measure such as the Peabody Picture Vocabulary Test in combination with conducting clinical interviews of at least a subsample of children.

Triadic Nature of Early Childhood Education

In a recent comparative study of preschool programs, it was found that children in classrooms in which teachers strongly believed in the curriculum model they were implementing did better on standardized measures of development than children whose teachers were torn between conflicting models. This finding is supported in the literature showing how belief systems create environments in which particular beliefs are resistant to change even when the data support alternative points of view. The work of Shepard and Smith from the University of Colorado is an example of such research. Evaluations of the effects of preschool education on children should therefore take account of the mode of implementation of the programs being evaluated. In other words, the unit of analysis in such assessments is not only the program, or the child, but rather a triad composed of teacher, child or group of children, and program in the context of the classroom.

The factors that constrain or facilitate the interactions among these three factors include the social characteristics of the children and of the teacher and the target and/or goal of the program relative to the transactions in the classroom. The social characteristics of the child are for the most part characteristics of the household, race and/or ethnicity, income and access to other economic resources, and even the level of parental education that influences the processes that take place in the home. These factors are important for a number of reasons: first, they shape the processes that occur in the home. Second, they shape the interactions between the parents and children and their environments (including access to and choice of nonparental care and education arrangements). Finally, the social characteristics of the child shape the perceptions (or interpretations) of the experiences of the child and parents in all of their environments.

Conceptual Orientation of the Investigator

In addition to taking full account of this triad, the orientation of the investigator must also be considered when examining evaluation or other types of early education studies. Research scientists approach their studies from a particular perspective with particular assumptions and understandings that guide their investigations. In evaluating their research, we feel it is important to ask such questions as: What is the ideological or conceptual orientation of the investigator? Is he or she studying children in context, or in isolation from the natural social environment? Is the perspective dominated by a search for universals, or rather by a search for differences between groups and/or cultures? Is the researcher interested in describing a dynamic model of the processes involved, or is the researcher instead interested in capturing a more static picture? Is the researcher more interested in endogenous or exogenous variables? Finally, does the researcher hold an individualist orientation, focusing on the child as the center of the model, or a more interactionist perspective, in which systems including the child, his or her family, and the school interact to shape development?