go away,” said Nehm. “Whatever instruction is happening at early levels, it’s not ameliorating the problems that we have.”

In education, the only way to make robust causal claims is through a randomized controlled trial (RCT), but no such trials have been conducted for evolution education. “If you want to make causal claims,” Nehm said, “there is no causal literature to refer to.”

Fortunately, other research tools can be used with educational interventions to draw conclusions that can guide policy. A group receiving an intervention can be compared with a group not receiving it. An intervention can also be evaluated without a comparison group, for example by measuring the same group before and after the intervention. Survey research can reveal associations, although it cannot determine whether those associations are causal. Finally, case studies, interviews, and other forms of qualitative research can reveal new variables and possible associations.

Nehm’s 2006 review of the literature found no intervention studies with randomized control groups, 6 intervention studies with comparison groups, and 24 other studies that employed various intervention techniques. Also, some of the interventions were quite brief—just one to three weeks—a period during which substantial changes are unlikely to occur, given the difficulties of teaching evolution. One conclusion is obvious, Nehm said: “We need to do some randomized controlled trials to see what works causally in terms of evolution education.”

Nehm also pointed out that documenting learning outcomes is critically important in education research. According to the report Knowing What Students Know: The Science and Design of Educational Assessment (National Research Council, 2001), “assessments need to examine how well students engage in communicative practices appropriate to a domain of knowledge and skill, what they understand about those practices, and how well they use the tools appropriate to that domain.” Yet most tests today, including those that dominate biology curricula, assess isolated fragments of knowledge with multiple-choice questions. Students may be learning about evolution, “but if we can’t measure that progress, we can’t show that what we’re doing has any positive effect. So we need assessments that can measure the way people actually think.”

The problems caused by inadequate metrics are particularly obvious in the literature on teacher knowledge of evolution, Nehm said. Only five intervention studies exist, and three of them assess teachers’ knowledge of evolution using a multiple-choice or Likert-scale test (Baldwin et al., 2012). This lack of careful metrics “is really concerning,” said Nehm. Evolution assessments must be developed that meet the quality control standards established by the educational measurement community; otherwise robust claims, causal or not, cannot be made.

The National Academies of Sciences, Engineering, and Medicine
500 Fifth St. N.W. | Washington, D.C. 20001

Copyright © National Academy of Sciences. All rights reserved.