6
Evaluating Evidence

KEY MESSAGES

  • What constitutes the best-quality evidence varies with the question being asked, which should be aligned with user needs and interests.

  • Evidence should be evaluated against outcomes that may be short-term, intermediate, or long-term. These outcomes may be related to systems considerations (see Chapter 4) and not directly to the problem of interest.

  • In evaluating the evidence for an intervention, both the level of certainty of the causal relationship between the intervention and its observed outcomes and the generalizability of the evidence to other individuals, settings, contexts, and time frames should be considered.

  • The quality of each type of evidence needed to answer a particular question should be evaluated according to established criteria for that type of evidence.

  • Evaluating the evidence gathered to address a particular population-level health problem will help identify gaps in knowledge that require further research.

The previous chapter describes an expanded perspective on the types of evidence that can be used in decision making for interventions addressing obesity and other complex, systems-level population health problems. It presents a detailed typology of evidence that goes beyond the traditional simple evidence hierarchies that have been used in clinical practice and less complex public health interventions. This chapter focuses on the question of how one judges the quality of different types of evidence in making decisions about what interventions to undertake. The question is an important one not only because many of the interventions required to address obesity are complex, but also because the available evidence for such interventions comes from studies and program
evaluations that often are purposely excluded from systematic reviews and practice guidelines, in which studies are selected on the basis of the conventional hierarchies.

In the L.E.A.D. framework (Figure 6-1), one begins with a practical question to be answered rather than a theory to be tested or a particular study design (Green and Kreuter, 2005; Sackett and Wennberg, 1997). A decision maker, say, a busy health department director or staff member, will have recognized a certain problem or opportunity and asked, “What should I do?” or “What is our status on this issue?” Either of these questions may be of interest only to this decision maker for the particular social, cultural, political, economic, and physical context in which he/she works, and the answer may have limited generalizability. This lack of generalizability may lead some in the academic community to value such evidence less than that from randomized controlled trials (RCTs). However, data that are contextually relevant to one setting are often more, not less, relevant and useful to decision makers in other settings than highly controlled trial data drawn from unrepresentative samples of unrepresentative populations, with highly trained personnel conducting the interventions under tightly supervised protocols (see Chapter 3 for further discussion).

FIGURE 6-1 The Locate Evidence, Evaluate Evidence, Assemble Evidence, Inform Decisions (L.E.A.D.) framework for obesity prevention decision making. The figure shows the following steps, framed by “Opportunities to Generate Evidence” and a “Systems Perspective”:

  • Specify Questions
  • Locate Evidence: Identify and gather the types of evidence that are potentially relevant to the questions
  • Evaluate Evidence: Apply standards of quality as relevant to different types of evidence
  • Assemble Evidence: Select and summarize the relevant evidence according to considerations for its use
  • Inform Decisions: Use evidence in the decision-making process

NOTE: The element of the framework addressed in this chapter is highlighted.

Bridging the Evidence Gap in Obesity Prevention

The types of evidence that are used in local decision making, including the policy process, extend beyond research to encompass politics, economics, stakeholder ideas and interests, and general knowledge and information (see Chapter 3), and the decision maker needs to take a practical approach to incorporating this evidence into real-life challenges. Working from this expanded view of what constitutes relevant evidence and where to find it (Chapter 5), this chapter describes an approach for evaluating these different types of evidence that is dependent on the question being asked and the context in which it arises.

Before proceeding, it is worth emphasizing that the L.E.A.D. framework is useful not only for decision makers and their intermediaries but also for those who generate evidence (e.g., scientists, researchers, funders, publishers), a point captured by the phrase “opportunities to generate evidence” surrounding the steps in the framework (Figure 6-1). In fact, a key premise of the L.E.A.D. framework is that research generators need to give higher priority to the needs of decision makers in their research designs and data collection efforts. To this end, the use of the framework and the evaluation of evidence in the appropriate context will identify gaps in knowledge that require further investigation and research.

This chapter begins by reviewing several key aspects of the evaluation of evidence: the importance of the user perspective, the need to identify appropriate outcomes, and the essential role of generalizability and contextual considerations. After summarizing existing approaches to evaluating the quality of evidence, the chapter describes the general approach proposed by the committee. Finally, the chapter addresses the issue of the trade-offs that have to be made when the available evidence has limitations for answering the question(s) at hand, a particular concern for those who must make decisions about complex, multilevel public health interventions such as obesity prevention.

A USER’S PERSPECTIVE

The approach of “horses for courses” (Petticrew and Roberts, 2003) emphasizes that what constitutes best evidence varies with the question being addressed and that there is no value in forcing the same type of evidence to fit all uses. Once the question being asked is clear, users of the L.E.A.D. framework must either search for or generate (see Chapter 8) the kinds of evidence that will be helpful in answering that question. The next chapter describes how to assemble the evidence to inform decisions. For situations in which the evidence is inadequate, incomplete, and/or inconsistent, this chapter suggests ways to blend the best available evidence with less formal sources that can bring tacit knowledge and the experience of professionals and other stakeholders to bear.

A large number of individual questions can, of course, be raised by those undertaking efforts to address obesity or other complex public health challenges. Petticrew and Roberts (2003) place such questions into eight broad categories: effectiveness
(Does this work?), process of delivery (How does it work?), salience (Does it matter?), safety (Will it do more good than harm?), acceptability (Will people be willing to use the intervention?), cost-effectiveness (Is it worth buying this service?), appropriateness (Is this the right service/intervention for this group?), and satisfaction (Are stakeholders satisfied with the service?). To this categorization the committee has added such questions as “How many and which people are affected?” and “What is the seriousness of the problem?” In Chapter 5, the committee adopts this approach but places these questions in the broad categories of “Why,” “What,” and “How” and gives a number of examples for each category (Tables 5-1 through 5-3).

Certain types of evidence derived from various study designs could be used to answer some of these questions but not others (Flay et al., 2005). For example, to ascertain the prevalence and severity of a condition and thus the population burden, one needs survey or other surveillance data, not an RCT. To ascertain efficacy, effectiveness, or cost-effectiveness, an RCT may be the best design. To understand how an intervention works, qualitative designs may be the most valuable and appropriate (MacKinnon, 2008). To assess the organizational adoption and practitioner implementation and maintenance of a practice, longitudinal studies of organizational policies and their implementation and enforcement (i.e., studies of quality improvement) may be needed. As discussed in previous chapters, to assess interventions designed to control obesity at the community level or in real-world settings, RCTs may not be feasible or even possible, and other types of evidence are more appropriate (Mercer et al., 2007; Sanson-Fisher et al., 2007; Swinburn et al., 2005). To apply the terminology adopted for this report (Chapter 5) (Rychetnik et al., 2004), for “Why” (e.g., burden of obesity) or in some cases “How” (e.g., translation of an intervention) questions, RCTs are not the appropriate study design. The same may be true even for some “What” questions (e.g., effectiveness of an intervention) that lend themselves more to formal intervention studies.

Also as discussed in previous chapters, decision makers need to recognize the interrelated nature of factors having an impact on the desired outcome of complex public health interventions. They should view an intervention in the context in which it will be implemented, taking a systems perspective (see Chapter 4). Such a perspective, which evolved from an appreciation of the importance of effectiveness in real-world conditions or natural settings (Flay, 1986), is clearly needed when decision makers evaluate generalizability, as well as level of certainty, in judging the quality of evidence (Green and Glasgow, 2006; Rychetnik et al., 2004; Swinburn et al., 2005).

IDENTIFICATION OF APPROPRIATE OUTCOMES

Appropriate outcomes may be multiple and may be short-term, intermediate, or long-term in nature. Regardless, they should be aligned with user needs and interests. For policy makers, for example, the outcomes of interest may be those for which they
will be held accountable, which may or may not be directly related to reductions in obesity. In a political context, policy makers may want to know how voters will react, how parents will react, what the costs will be, or whether the ranking of the city or state on body mass index (BMI) levels will change. Health plan directors may want evidence of comparative effectiveness (i.e., comparing the benefits and harms of competing interventions in real-world settings) to make decisions on coverage. In any situation with multiple outcomes, which is the usual case, trade-offs may have to be made between these outcomes. For example, an intervention may be cost-effective but not politically popular or feasible. Further discussion of trade-offs can be found later in the chapter.

Logic models are helpful in defining appropriate evaluation outcomes and providing a framework for evaluation. For a long-term outcome, a logic model is useful in defining the short-term and intermediate steps that will lead to that outcome. Outcomes can be goals related to the health of the population (e.g., reduced mortality from diabetes), structural change (e.g., establishment of a new recreation center), a new policy (e.g., access to fresh fruits and vegetables in the Special Supplemental Nutrition Program for Women, Infants, and Children [WIC]), or others. A recent report by the Institute of Medicine (IOM) (2007) introduces a general logic model for evaluating obesity prevention interventions (for children) (see Chapter 2, Figure 2-2) and applies it specifically to distinct end users, such as government and industry (see Figures 6-2 and 6-3, respectively).
This model takes into account the interconnected factors that influence the potential impact of an intervention. It facilitates the identification of resources (e.g., funding), strategies and actions (e.g., education, programs), outcomes (e.g., environmental, health), and other cross-cutting factors (e.g., age, culture, psychosocial status) that are important to obesity prevention for particular users.

FIGURE 6-2 Evaluation framework for government efforts to support capacity development for preventing childhood obesity. Panels: Sectors (Government, Industry, Communities, Schools, Home); Resources and Inputs (Adequate Funding; Capacity Development); Strategies and Actions (Program Coordination, Surveillance, and Training); Outcomes (Structural and Institutional Outcomes; Health Outcomes); Cross-cutting Factors.
SOURCE: IOM, 2007.

FIGURE 6-3 Evaluation framework for industry efforts to develop low-calorie and nutrient-dense beverages and promote their consumption by children and youth. Panels: Sectors (Government, Industry, Communities, Schools, Home); Resources and Inputs (Leadership and Commitment; Strategic Planning; Adequate Funding); Strategies and Actions (Programs and Policies; Marketing and Promotion; Education; Collaboration and Coalitions); Outcomes (Structural Outcomes; Environmental Outcomes; Cognitive, Social, and Behavioral Outcomes; Health Outcomes); Cross-cutting Factors.
SOURCE: IOM, 2007.

GENERALIZABILITY AND CONTEXTUAL CONSIDERATIONS

Existing standards of evidence formulate the issue of generalizability in terms of efficacy, effectiveness, and readiness for dissemination (Flay et al., 2005). From this perspective, among the questions to be answered in evaluating whether studies are more or less useful as a source of evidence are the following: How representative were the setting, population, and circumstances in which the studies were conducted? Can the evidence from a study or group of studies be generalized to the multiple settings, populations, and contexts in which the evidence would be applied? Are the interventions studied affordable and scalable in the wide variety of settings where they might
be needed, given the resources and personnel available in those settings? For decision makers, the generalizability of evidence is what they might refer to as “relevance”: Is the evidence, they ask, relevant to our population and context? Answering this question requires comparing the generalizability of the studies providing the evidence and the context (setting, population, and circumstances) in which the evidence would be applied.

Glasgow and others have called for criteria with which to judge the generalizability of studies in reporting evidence, similar to the Consolidated Standards of Reporting Trials (CONSORT) reporting criteria for RCTs and the Transparent Reporting of Evaluations with Nonrandomized Designs (TREND) quality rating scales for nonrandomized trials (Glasgow et al., 2006a). Box 6-1 details four dimensions of generalizability (using the term “external validity”) in the reporting of evidence in most efficacy trials and many effectiveness trials and the specific indicators or questions that warrant consideration in judging the quality of the research (Green and Glasgow, 2006).

Box 6-1
Quality Rating Criteria for External Validity

Reach and Representativeness

  • Participation: Are there analyses of the participation rate among potential (1) settings, (2) delivery staff, and (3) patients (consumers)?
  • Target Audience: Is the intended target audience stated for adoption (at the intended settings, such as worksites or medical offices) and application (at the individual level)?
  • Representativeness (settings): Are comparisons made of the similarity of settings in the study to the intended target audience of program settings, or to those settings that decline to participate?
  • Representativeness (individuals): Are analyses conducted of the similarity and differences between patients, consumers, or other subjects who participate vs. either those who decline or the intended target audience?

Program or Policy Implementation and Adaptation

  • Consistent Implementation: Are data presented on level and quality of implementation of different program components?
  • Staff Expertise: Are data presented on the level of training or experience required to deliver the program or quality of implementation by different types of staff?
  • Program Adaptation: Is information reported on the extent to which different settings modified or adapted the program to fit their setting?
  • Mechanisms: Are data reported on the process(es) or mediating variables through which the program or policy achieved its effects?

Outcomes for Decision Making

  • Significance: Are outcomes reported in a way that can be compared to either clinical guidelines or public health goals?
  • Adverse Consequences: Do the outcomes reported include quality of life or potential negative outcomes?
  • Moderators: Are there any analyses of moderator effects (including of different subgroups of participants and types of intervention staff) to assess robustness vs. specificity of effects?
  • Sensitivity: Are there any sensitivity analyses to assess dose-response effects, threshold level, or point of diminishing returns on the resources expended?
  • Costs: Are data on the costs presented? If so, are standard economic or accounting methods used to fully account for costs?

Maintenance and Institutionalization

  • Long-term Effects: Are data reported on longer-term effects, at least 12 months following treatment?
  • Institutionalization: Are data reported on the sustainability (or reinvention or evolution) of program implementation at least 12 months after the formal evaluation?
  • Attrition: Are data on attrition by condition reported, and are analyses conducted of the representativeness of those who drop out?

SOURCE: Green, L. W., and R. E. Glasgow, Evaluation and the Health Professions 29(1), pp. 126-153, Copyright © 2006 by SAGE Publications. Reprinted by permission of SAGE Publications.

EXISTING APPROACHES TO EVALUATING EVIDENCE

The most widely acknowledged approach for evaluating evidence, one that underlies much of what is considered evidence of causation in the health sciences, is the classic nine criteria or “considerations” of Bradford Hill (Hill, 1965): strength of association, consistency, specificity, temporality, biological gradient, plausibility, coherence, experiment, and analogy. All but one of these criteria emphasize the level of causality, largely because the phenomena under study were organisms whose biology was relatively uniform within species, so the generalizability of causal relationships could be assumed with relative certainty.

The rating scheme of the Canadian Task Force on the Periodic Health Examination (Canadian Task Force on the Periodic Health Examination, 1979) was adopted in the late 1980s by the U.S. Preventive Services Task Force (USPSTF) (which systematically reviews evidence for effectiveness and develops recommendations for clinical preventive services) (USPSTF, 1989, 1996). These criteria establish a hierarchy for the quality of studies that places professional judgment and cross-sectional observation at the bottom and RCTs at the top. As described by Green and Glasgow (2006), these criteria also concern themselves almost exclusively with the level of certainty. “The greater weight given to evidence based on multiple studies than a single study was the main . . . [concession] to external validity (or generalizability), . . . [but] even that was justified more on grounds of replicating the results in similar populations and settings than of representing different populations, settings, and circumstances for the interventions and outcomes” (Green and Glasgow, 2006, p. 128). The Cochrane Collaboration has followed this line of evidence evaluation in its systematic reviews, as has the evidence-based medicine movement (Sackett et al., 1996) more generally in its almost exclusive favoring of RCTs (see Chapter 5). As the Cochrane methods have been extended to nonmedical applications, greater acceptability of other types of evidence has been granted, but reluctantly (see below). More recently, the Campbell Collaboration (see Sweet and Moynihan, 2007) attempted to take a related but necessarily distinctive approach to systematic reviews of more complex interventions addressing social problems beyond health, in the arenas of education, crime and justice, and social welfare. The focus was on improving the usefulness of systematic reviews for researchers, policy makers, the media, interest groups, and the broader community of decision makers. The Society for Prevention Research has extended efforts to establish standards for identifying effective prevention programs and policies by issuing standards for efficacy (level of certainty), effectiveness (generalizability), and dissemination (Flay et al., 2005).

The criteria of the USPSTF mentioned above were adapted by the Community Preventive Services Task Force, with greater concern for generalizability in recognition of the more varied public health circumstances of practice beyond clinical settings (Briss et al., 2000, 2004; Green and Kreuter, 2000). The Community Preventive Services Task Force, which is overseeing systematic reviews of interventions designed to promote population health, is giving increasing attention to generalizability in a standardized section on “applicability.” Numerous textbooks on research quality have tended to concern themselves primarily with designs for efficacy rather than effectiveness studies, although the growing field of evaluation has increasingly focused on issues of practice-based, real-time, ordinary settings (Glasgow et al., 2006b; Green and Lewis, 1986, 1987; Green et al., 1980).

Finally, in the field of epidemiology, Rothman and Greenland (2005) offer a widely cited model that describes causality in terms of sufficient causes and their component causes.
This model illuminates important principles such as multicausality, the dependence of the strength of component causes on the prevalence of other component causes, and the interactions among component causes.

The foregoing rules or frameworks for evaluating evidence have increasingly been taken up by the social service professions, building not just on biomedical traditions but also on agricultural and educational research in which experimentation predated much of the action research in the social and behavioral sciences. The social service and education fields have increasingly utilized RCTs, but have faced growing resistance to their limitations and the “simplistic distinction between strong and weak evidence [that] hinged on the use of randomized controlled trials . . .” (Chatterji, 2007, p. 239; see also Hawkins et al., 2007; Mercer et al., 2007; Sanson-Fisher et al., 2007), especially when applied to complex community interventions.

Campbell and Stanley’s (1963) widely used set of “threats to internal validity (level of certainty)” for experimental and quasi-experimental designs was accompanied by their seldom referenced “threats to external validity (generalizability).” “The focus on internal validity (level of certainty) was justified on the grounds that without internal validity, external validity or generalizability would be irrelevant or misleading, if not impossible” (Green and Glasgow, 2006, p. 128). These and other issues
concerning the level of certainty and generalizability are discussed in greater detail in Chapter 8.

A PROPOSED APPROACH TO EVALUATING THE QUALITY OF SCIENTIFIC EVIDENCE

Scientists have always used criteria or guidelines to organize their thinking about the nature of evidence. Much of what we think we know about the causes of obesity and the current obesity epidemic, for example, is based on the evaluation of evidence using existing criteria. In thinking about the development of a contemporary framework to guide decision making in the complex settings of public health, however, the committee decided to advance a broader view of appropriate evaluation criteria. As described in 2005 in a seminal report from the Institute of Medicine (IOM), these decisions need to be made with the “best available evidence” and cannot wait for the “best possible evidence” or all the desirable evidence to be at hand (IOM, 2005, p. 3). The L.E.A.D. framework should serve the needs of decision makers focused on the obesity epidemic, but can also provide guidance for those making decisions about complex, multifactorial public health challenges more generally.

The starting point for explaining the committee’s approach to evaluating the quality of evidence for obesity prevention is the seven categories of study designs and different sources of evidence presented in Chapter 5. In Table 6-1, this typology is linked to criteria for judging the quality of evidence, drawing on the concept of “critical appraisal criteria” of Rychetnik and colleagues (Rychetnik et al., 2002, 2004). Generally speaking, different types of evidence from different types of study designs are evaluated by different criteria, all of which can be found in the literature on evaluating the quality of each type of evidence. In all cases, high-quality evidence avoids bias, confounding, measurement error, and other threats to validity whenever possible; however, other aspects of quality come into play within the broader scope of evidence advanced by the L.E.A.D. framework.

Users of the L.E.A.D. framework can refer to any of the various criteria for high-quality evidence depending on the source of evidence they have located, following the guidance provided in Chapter 5 as well as the references cited in Table 6-1. This process requires some time and effort by an individual or multidisciplinary group with some expertise in evaluating evidence. Despite the availability of the criteria listed in Table 6-1, making judgments about the quality of evidence can still be challenging. One recommended approach is the eight-step process advanced by Liddle and colleagues (1996):

  1. “Select reviewer(s) and agree on details of the review procedure.
  2. Specify the objective of the review of evidence.
  3. Identify strategies to locate the full range of evidence including unpublished results and work in progress.
  4. Classify the literature according to general purpose and study type.

OCR for page 115
TABLE 6-1 A Typology of Study Designs and Quality Criteria Sources of Evidence (research designs, tools, and methods for evidence gathering) Existing Criteria for Assessing Quality of Evidence Nonexperimental or Can be assessed by criteria grouped by Liddle et al. (1996): Observational Studies • Descriptive information about the review or study (e.g., type of intervention) • Study design, implementation, and analysis • Overall assessment of the study Transparent Reporting of Evaluations with Nonrandomized Designs (TREND) is a new attempt to establish criteria for nonrandomized intervention studies that has produced a preliminary statement of criteria for judging such studies (Des Jarlais et al., 2004). Experimental and Can be graded by assessment of study design, selection bias, confounding; blinding; data collection Quasi-experimental and classification of outcomes, follow-up, withdrawal and drop-out, and analysis (Rychetnik et al., Studies 2002, as outlined by the Oxford-based Public Health Resource Unit). Quality of an RCT is based on (Higgins and Green, 2009): • Assignment to treatment and control groups and blinding • Degree of potential confounding • Classification of outcomes and follow-up • Appropriate analysis (e.g., “intention to treat”) Study design is evaluated by levels of evidence (as in those of the Canadian Task Force on the Periodic Health Examination or the Task Force on Community Preventive Services and the U.S. Preventive Services Task Force [USPSTF]). Criteria for the USPSTF are summarized by Harris et al. (2001) and updated by Pettiti et al. (2009). Qualitative Research Standardized quality criteria have not been agreed upon, but should reflect the distinctive goals of the research. 
As an example of criteria, quality may be determined by the audit trail of processes and decisions made and the credibility of the study methods (Rychetnik et al., 2002, Table 3): • Clarity of objectives and research questions • Appropriate selection of method to meet aims • Clear rationale for sampling strategy • Appropriate use of triangulation • Audit trail in data collection and analysis • Explicit research position and role • Clear basis for findings • Transferability of findings • Relevance, usefulness, importance of findings Mixed-Method Quality criteria for mixed-method research derive from the quality criteria used for quantitative Experimental Studies and qualitative designs separately. A 15-point checklist of criteria for mixed-method research and mixed studies reviews is presented by Pluye et al. (2009). Three points on which mixed-method research can be judged are: • Justification of the mixed-method design • Combination of qualitative and quantitative data collection−analysis techniques or procedures • Integration of qualitative and quantitative data or results continued  Evaluating Evidence

OCR for page 115
TABLE 6-1 Continued Sources of Evidence (research designs, tools, and methods for evidence gathering) Existing Criteria for Assessing Quality of Evidence Evidence Synthesis Questions to consider when appraising a systematic review include (Public Health Resource Unit, Methods 2006): • Did the review address a clearly focused question? • Did the review include the right type of study? • Did the reviewers try to identify all relevant studies? • Did the reviewers assess the quality of all the studies included? • If the results of the study were combined, was it reasonable to do so? • How are the results presented, and what are the main results? • How precise are these results? • Can the results be applied to the local population? • Were all important outcomes considered? • Should practice or policy change as a result of the evidence contained in this review? Parallel Evidence Quality is determined by the underlying study designs of the parallel evidence sources in the same way that it is determined for the primary evidence. Expert Knowledge Questions to consider when appraising expert knowledge include (World Cancer Research Fund and American Institute for Cancer Research, 2007): • Were methods of review and development of recommendations described? • Was expert knowledge (1) derived from an expert panel, (2) derived from an original review of the literature, and (3) based on published peer-reviewed literature specified in a bibliography? A description of the computer-based Delphi Method for utilizing expert knowledge reliably is provided by Turoff and Hiltz (1996). A description of procedures used to quantify expert opinion (using specialized software) is in Garthwaite et al. (2008). 5. Retrieve the full version of evidence available. 6. Assess the quality of the evidence. 7. Quantify the strength of the evidence. 8. Express the evidence in a standard way.” (pp. 6-7). 
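Steps 4 and 6 of this process pair naturally with Table 6-1: once a piece of evidence has been classified by type, the criteria listed for that type can be applied as a checklist. A minimal Python sketch of that lookup follows. The names CRITERIA, Appraisal, and appraise are hypothetical, the criteria shown are an abbreviated subset of Table 6-1, and the met/unmet tally is an illustrative convenience rather than a scoring method proposed by the committee or by Liddle and colleagues.

```python
from dataclasses import dataclass

# Abbreviated, illustrative subset of Table 6-1. The dictionary layout and
# the short keys are assumptions made for this sketch only.
CRITERIA = {
    "rct": [
        "assignment to treatment and control groups and blinding",
        "degree of potential confounding",
        "classification of outcomes and follow-up",
        "appropriate analysis",
    ],
    "mixed_method": [
        "justification of the mixed-method design",
        "combination of qualitative and quantitative techniques",
        "integration of qualitative and quantitative data or results",
    ],
}


@dataclass
class Appraisal:
    study_type: str
    met: list
    unmet: list


def appraise(study_type, judgments):
    """Split the criteria for one study type into met and unmet.

    `judgments` maps a criterion to True/False as decided by the reviewers.
    A criterion with no recorded judgment is treated as unmet.
    """
    if study_type not in CRITERIA:
        raise KeyError(f"no criteria listed for study type: {study_type}")
    met, unmet = [], []
    for criterion in CRITERIA[study_type]:
        (met if judgments.get(criterion, False) else unmet).append(criterion)
    return Appraisal(study_type, met, unmet)


# Hypothetical reviewer judgments for one mixed-method study.
result = appraise(
    "mixed_method",
    {
        "justification of the mixed-method design": True,
        "integration of qualitative and quantitative data or results": True,
    },
)
total = len(result.met) + len(result.unmet)
print(f"{len(result.met)} of {total} criteria met")
for criterion in result.unmet:
    print("not met:", criterion)
```

Keeping the unmet criteria as a list, rather than collapsing the appraisal to a single number, leaves the judgment about overall quality with the reviewers, which is closer to how the checklists are described here.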
Step 6 includes checklists for assessing the quality of studies depending on their design and purpose (Liddle et al., 1996). Most biomedical researchers are familiar with the quality criteria that have been used for experimental and observational epidemiological research, but less so with those used for qualitative studies. Quality is not addressed for qualitative research in the checklists offered by Liddle and colleagues (1996), but can be assessed using the same broad concepts of validity and relevance used for quantitative research. However, these concepts need to be applied differently to account for the distinctive goals of such research, so defining a single method for evaluation is not suggested (Cohen and Crabtree, 2008; Patton, 1999). Mays and Pope (2000) summarize “relativist” criteria for quality, similar to the criteria of Rychetnik and colleagues (2002) (see Table 6-1), that are common to both qualitative and quantitative studies. Others have since reported on criteria that can be used to assess qualitative research (Cohen and Crabtree, 2008; Popay et al., 1998; Reis et al., 2007). In addition, guidance on the description and implementation of qualitative (and mixed-method) research, along with a checklist, has been provided by the National Institutes of Health (Office of Behavioral and Social Sciences Research, 2000).

Criteria also exist for evaluating the quality of systematic reviews themselves, whether they are of quantitative or qualitative studies (Goldsmith et al., 2007). In addition to the criteria of the Public Health Resource Unit (2006) listed in Table 6-1, a detailed set of criteria has been compiled by the Milbank Memorial Fund and the Centers for Disease Control and Prevention (CDC) (Sweet and Moynihan, 2007).

As noted earlier, expert knowledge is frequently considered to be at the bottom of traditional hierarchies that focus on level of certainty, such as that used by the USPSTF. However, expert knowledge can be of value in evaluating evidence and can also be viewed with certain quality criteria in mind (Garthwaite et al., 2008; Harris et al., 2001; Petitti et al., 2009; Turoff and Hiltz, 1996; World Cancer Research Fund and American Institute for Cancer Research, 2007).
The Delphi Method was developed to utilize expert knowledge in a reliable and creative way that is suitable for decision making and has been found to be effective in social policy and public health decision making (Linstone and Turoff, 1975); it is a “structured process for collecting and distilling knowledge from a group of experts” through questionnaires interspersed with controlled feedback (Adler and Ziglio, 1996, p. 3). If these quality criteria are taken into account and conflicts of interest are identified and minimized, decision making can benefit substantially from the considered opinion of experts in a particular field or of practitioners, stakeholders, and policy makers capable of making informed judgments on implementation issues (e.g., doctors, lawyers, scientists, or academics able to interpret the scientific literature or specialized forms of data).

Finally, in addition to the main sources of evidence included in Table 6-1, other sources may be of value in decision making. Many are not independent sources, but closer to a surveillance mechanism or a tool for dissemination of evidence. They include simulation models, health impact assessments, program or policy evaluations, policy scans, and legal opinions. For instance, health impact assessments (described in more detail in Chapter 5, under “What” questions) formally examine the potential health effects of a proposed intervention (Cole and Fielding, 2007). An example is Health Forecasting (University of California–Los Angeles School of Public Health, 2009), which uses a web-based simulation model that allows users to view evidence-based descriptions of populations and subpopulations (disparities) to assess the potential effects of policies and practices on future health outcomes. Another such source, policy evaluations, allows studies of various aspects of a problem to be driven by a clear conceptual model. An example is the International Tobacco Control Policy Evaluation Project, a multidisciplinary, multisite, international endeavor that aims to evaluate and understand the impact of tobacco control policies as they are implemented in countries around the world (Fong et al., 2006). These sources may provide evidence for which there are quality criteria to consider, but are not addressed in detail here.

WHEN SCIENTIFIC EVIDENCE IS NOT A PERFECT FIT: TRADE-OFFS TO CONSIDER

Trade-offs may be involved in considering the quality of various types of evidence available to answer questions about complex, multilevel public health interventions (Mercer et al., 2007). Randomization at the individual level and experimental controls may remain the gold standard, but as pointed out above, these methods are not always possible in population health settings, and they are sometimes counterproductive because of the artificial conditions needed to implement randomization and control procedures. Therefore, some of the advantages of RCTs may have to be traded off to obtain the best available evidence for decision making. Because no one study is usually sufficient to support decisions on public health interventions, the use of multiple types of evidence (all of good quality for their design) may be the best approach (Mercer et al., 2007), a point further elaborated upon in Chapter 8.

REFERENCES

Adler, M., and E. Ziglio. 1996. Gazing into the oracle: The Delphi method and its application to social policy and public health. London, UK: Jessica Kingsley.
Briss, P. A., S. Zaza, M. Pappaioanou, J. Fielding, L. Wright-De Aguero, B. I. Truman, D. P. Hopkins, P. D. Mullen, R. S. Thompson, S. H. Woolf, V. G. Carande-Kulis, L. Anderson, A. R. Hinman, D. V. McQueen, S. M. Teutsch, and J. R. Harris. 2000. Developing an evidence-based Guide to Community Preventive Services—methods. American Journal of Preventive Medicine 18(1, Supplement 1):35-43.
Briss, P. A., R. C. Brownson, J. E. Fielding, and S. Zaza. 2004. Developing and using the Guide to Community Preventive Services: Lessons learned about evidence-based public health. Annual Review of Public Health 25:281-302.
Campbell, D. T., and J. C. Stanley. 1963. Experimental and quasi-experimental designs for research. Chicago: Rand McNally.
Canadian Task Force on the Periodic Health Examination. 1979. The periodic health examination. Canadian Medical Association Journal 121(9):1193-1254.
Chatterji, M. 2007. Grades of evidence: Variability in quality of findings in effectiveness studies of complex field interventions. American Journal of Evaluation 28(3):239-255.
Cohen, D. J., and B. F. Crabtree. 2008. Evaluative criteria for qualitative research in health care: Controversies and recommendations. Annals of Family Medicine 6(4):331-339.

Cole, B. L., and J. E. Fielding. 2007. Health impact assessment: A tool to help policy makers understand health beyond health care. Annual Review of Public Health 28:393-412.
Des Jarlais, D. C., C. Lyles, and N. Crepaz. 2004. Improving the reporting quality of nonrandomized evaluations of behavioral and public health interventions: The TREND statement. American Journal of Public Health 94(3):361-366.
Flay, B. R. 1986. Efficacy and effectiveness trials (and other phases of research) in the development of health promotion programs. Preventive Medicine 15(5):451-474.
Flay, B. R., A. Biglan, R. F. Boruch, F. G. Castro, D. Gottfredson, S. Kellam, E. K. Moscicki, S. Schinke, J. C. Valentine, and P. Ji. 2005. Standards of evidence: Criteria for efficacy, effectiveness and dissemination. Prevention Science 6(3):151-175.
Fong, G. T., K. M. Cummings, R. Borland, G. Hastings, A. Hyland, G. A. Giovino, D. Hammond, and M. E. Thompson. 2006. The conceptual framework of the International Tobacco Control (ITC) Policy Evaluation Project. Tobacco Control 15(Supplement 3):iii1-iii2.
Garthwaite, P. H., J. B. Chilcott, D. J. Jenkinson, and P. Tappenden. 2008. Use of expert knowledge in evaluating costs and benefits of alternative service provisions: A case study. International Journal of Technology Assessment in Health Care 24(3):350-357.
Glasgow, R., L. Green, L. Klesges, D. Abrams, E. Fisher, M. Goldstein, L. Hayman, J. Ockene, and C. Orleans. 2006a. External validity: We need to do more. Annals of Behavioral Medicine 31(2):105-108.
Glasgow, R. E., L. M. Klesges, D. A. Dzewaltowski, P. A. Estabrooks, and T. M. Vogt. 2006b. Evaluating the impact of health promotion programs: Using the RE-AIM framework to form summary measures for decision making involving complex issues. Health Education Research 21(5):688-694.
Goldsmith, M. R., C. R. Bankhead, and J. Austoker. 2007. Synthesising quantitative and qualitative research in evidence-based patient information. Journal of Epidemiology and Community Health 61(3):262-270.
Green, L. W., and R. E. Glasgow. 2006. Evaluating the relevance, generalization, and applicability of research: Issues in external validation and translation methodology. Evaluation & the Health Professions 29(1):126-153.
Green, L. W., and M. W. Kreuter. 2000. Commentary on the emerging Guide to Community Preventive Services from a health promotion perspective. American Journal of Preventive Medicine 18(1, Supplement 1):7-9.
Green, L. W., and M. W. Kreuter. 2005. Health program planning: An educational and ecological approach. 4th ed. New York: McGraw-Hill.
Green, L. W., and F. M. Lewis. 1986. Measurement and evaluation in health education and health promotion. Palo Alto, CA: Mayfield Publishing Company.
Green, L., and F. M. Lewis. 1987. Data analysis in evaluation of health education: Towards standardization of procedures and terminology. Health Education Research 2(3):215-221.
Green, L. W., F. M. Lewis, and D. M. Levine. 1980. Balancing statistical data and clinician judgments in the diagnosis of patient educational needs. Journal of Community Health 6(2):79-91.
Harris, R. P., M. Helfand, S. H. Woolf, K. N. Lohr, C. D. Mulrow, S. M. Teutsch, and D. Atkins. 2001. Current methods of the U.S. Preventive Services Task Force: A review of the process. American Journal of Preventive Medicine 20(3, Supplement):21-35.

Hawkins, N. G., R. W. Sanson-Fisher, A. Shakeshaft, C. D’Este, and L. W. Green. 2007. The multiple baseline design for evaluating population-based research. American Journal of Preventive Medicine 33(2):162-168.
Higgins, J. P. T., and S. Green (editors). 2009. Cochrane handbook for systematic reviews of interventions, Version 5.0.2 [updated September 2009]. The Cochrane Collaboration, 2008. http://www.cochrane-handbook.org (accessed December 13, 2009).
Hill, A. B. 1965. The environment and disease: Association or causation. Proceedings of the Royal Society of Medicine 58:295-300.
IOM (Institute of Medicine). 2005. Preventing childhood obesity: Health in the balance. Edited by J. Koplan, C. T. Liverman, and V. I. Kraak. Washington, DC: The National Academies Press.
IOM. 2007. Progress in preventing childhood obesity: How do we measure up? Edited by J. Koplan, C. T. Liverman, V. I. Kraak, and S. L. Wisham. Washington, DC: The National Academies Press.
Liddle, J., M. Williamson, and L. Irwig. 1996. Method for evaluating research and guideline evidence. Sydney: NSW Health Department.
Linstone, H. L., and M. Turoff. 1975. The Delphi method: Techniques and applications. Reading, MA: Addison-Wesley.
MacKinnon, D. P. 2008. An introduction to statistical mediation analysis. New York: Lawrence Erlbaum Associates.
Mays, N., and C. Pope. 2000. Qualitative research in health care: Assessing quality in qualitative research. British Medical Journal 320(7226):50-52.
Mercer, S. L., B. J. DeVinney, L. J. Fine, L. W. Green, and D. Dougherty. 2007. Study designs for effectiveness and translation research: Identifying trade-offs. American Journal of Preventive Medicine 33(2):139-154.
Office of Behavioral and Social Sciences Research. 2000. Qualitative methods in health research: Opportunities and considerations in application and review. Produced by the NIH Culture and Qualitative Research Interest Group, based on discussions and written comments from the expert working group at a workshop sponsored by the Office of Behavioral and Social Sciences Research. Bethesda, MD: Office of Behavioral and Social Sciences Research.
Patton, M. Q. 1999. Enhancing the quality and credibility of qualitative analysis. Health Services Research 34(5, Part 2):1189-1208.
Petitti, D. B., S. M. Teutsch, M. B. Barton, G. F. Sawaya, J. K. Ockene, and T. Dewitt. 2009. Update on the methods of the U.S. Preventive Services Task Force: Insufficient evidence. Annals of Internal Medicine 150(3):199-205.
Petticrew, M., and H. Roberts. 2003. Evidence, hierarchies, and typologies: Horses for courses. Journal of Epidemiology and Community Health 57(7):527-529.
Pluye, P., M. P. Gagnon, F. Griffiths, and J. Johnson-Lafleur. 2009. A scoring system for appraising mixed methods research, and concomitantly appraising qualitative, quantitative and mixed methods primary studies in mixed studies reviews. International Journal of Nursing Studies 46(4):529-546.
Popay, J., A. Rogers, and G. Williams. 1998. Rationale and standards for the systematic review of qualitative literature in health services research. Qualitative Health Research 8(3):341-351.

Public Health Resource Unit. 2006. Critical appraisal skills programme (CASP). Making sense of evidence. http://www.phru.nhs.uk/Doc_Links/S.Reviews%20Appraisal%20Tool.pdf (accessed December 17, 2009).
Reis, S., D. Hermoni, R. Van-Raalte, R. Dahan, and J. M. Borkan. 2007. Aggregation of qualitative studies—from theory to practice: Patient priorities and family medicine/general practice evaluations. Patient Education and Counseling 65(2):214-222.
Rothman, K. J., and S. Greenland. 2005. Causation and causal inference in epidemiology. American Journal of Public Health 95(Supplement 1):S144-S150.
Rychetnik, L., M. Frommer, P. Hawe, and A. Shiell. 2002. Criteria for evaluating evidence on public health interventions. Journal of Epidemiology and Community Health 56(2):119-127.
Rychetnik, L., P. Hawe, E. Waters, A. Barratt, and M. Frommer. 2004. A glossary for evidence based public health. Journal of Epidemiology and Community Health 58(7):538-545.
Sackett, D. L., and J. E. Wennberg. 1997. Choosing the best research design for each question. British Medical Journal 315(7123):1633-1640.
Sackett, D. L., W. M. C. Rosenberg, J. A. M. Gray, R. B. Haynes, and W. S. Richardson. 1996. Evidence based medicine: What it is and what it isn’t. British Medical Journal 312(7023):71-72.
Sanson-Fisher, R. W., B. Bonevski, L. W. Green, and C. D’Este. 2007. Limitations of the randomized controlled trial in evaluating population-based health interventions. American Journal of Preventive Medicine 33(2):155-161.
Sweet, M., and R. Moynihan. 2007. Improving population health: The uses of systematic reviews. New York: Milbank Memorial Fund and Centers for Disease Control and Prevention.
Swinburn, B., T. Gill, and S. Kumanyika. 2005. Obesity prevention: A proposed framework for translating evidence into action. Obesity Reviews 6(1):23-33.
Turoff, M., and S. R. Hiltz. 1996. Computer-based Delphi process. In Gazing into the oracle: The Delphi method and its application to social policy and public health, edited by M. Adler and E. Ziglio. London, UK: Jessica Kingsley. Pp. 56-85.
University of California–Los Angeles School of Public Health. 2009. Health forecasting. http://www.health.forcasting.org (accessed November 9, 2009).
USPSTF (U.S. Preventive Services Task Force). 1989. Guide to clinical preventive services. Baltimore, MD: Lippincott Williams & Wilkins.
USPSTF. 1996. Guide to clinical preventive services. 2nd ed. Baltimore, MD: Williams & Wilkins.
World Cancer Research Fund and American Institute for Cancer Research. 2007. Food, nutrition, physical activity, and the prevention of cancer: A global perspective. Washington, DC: American Institute for Cancer Research.
