Read "Knowing What Works in Health Care: A Roadmap for the Nation" at NAP.edu


Suggested Citation:"4 Systematic Reviews: The Central Link Between Evidence and Clinical Decision Making." Institute of Medicine. 2008. Knowing What Works in Health Care: A Roadmap for the Nation. Washington, DC: The National Academies Press. doi: 10.17226/12038.


4 Systematic Reviews: The Central Link Between Evidence and Clinical Decision Making

If, as is sometimes supposed, science consisted in nothing but the laborious accumulation of facts, it would soon come to a standstill, crushed, as it were, under its own weight. Two processes are thus at work side by side, the reception of new material and the digestion and assimilation of the old. . . . The work which deserves, but I am afraid does not always receive, the most credit is that in which discovery and explanation go hand in hand, in which not only are new facts presented, but their relation to old ones is pointed out.
J. W. Strutt Lord Rayleigh
Address to the British Association for the Advancement of Science (Rayleigh, 1884, p. 1)

More than a decade has passed since it was first shown that patients have been harmed by failure to prepare scientifically defensible reviews of existing research evidence. There are now many examples of the dangers of this continuing scientific sloppiness. Organizations and individuals concerned about improving the effectiveness and safety of health care now look to systematic reviews of research, not individual studies, to inform their judgments.
Iain Chalmers
Academia's Failure to Support Systematic Reviews (Chalmers, 2005)

Abstract: This chapter provides the committee's findings and recommendations for conducting systematic evidence reviews under the aegis of a proposed national clinical effectiveness assessment program ("the Program"). The chapter reviews the origins of systematic review methods and describes the fundamental components of systematic reviews and the shortcomings of current efforts. Under the status quo, the quality of the reviews is variable, methods are poorly documented, and findings are often unreliable.
The committee recommends that the Program establish evidence-based, methodological standards for systematic reviews, including standard terminology for characterizing the strength of evidence and a standard reporting format for systematic reviews. Once Program standards are established, the Program should fund only those reviewers who commit to and consistently meet the standards. The committee found that the new science of systematic reviews has made great strides, but more methodological research is needed. Investing in the science of research synthesis will increase the quality and the value of the evidence provided in systematic reviews. It is not clear whether there are sufficient numbers of qualified researchers to conduct high-quality reviews. The capacity of the workforce should be assessed and expanded, if needed.

Systematic reviews are central to scientific inquiry into what is known and not known about what works in health care (Glasziou and Haynes, 2005; Helfand, 2005; Mulrow and Lohr, 2001; Steinberg and Luce, 2005). In 1884, J. W. Strutt Lord Rayleigh, who later won a Nobel prize in physics, observed that the synthesis and explanation of past discoveries are integral to future progress (Rayleigh, 1884). Yet, more than a century later, Antman and colleagues (1992) and Lau and colleagues (1992) clearly demonstrated that this message was still largely ignored, with the potential for great harm to patients. In a series of meta-analyses examining the treatment of myocardial infarction, the researchers concluded that clinicians need better access to syntheses of the results of existing studies to formulate clinical recommendations. Today, systematic reviews of the available evidence remain an often undervalued scientific discipline.

This chapter has three principal objectives: (1) to describe the fundamental components of a systematic review, (2) to present the committee's recommendations for conducting systematic evidence reviews under the aegis of a proposed national clinical effectiveness assessment program ("the Program"), and (3) to highlight the key challenges in producing high-quality systematic reviews.

BACKGROUND

What Is a Systematic Review?
A systematic review is a scientific investigation that focuses on a specific question and uses explicit, preplanned scientific methods to identify, select, assess, and summarize similar but separate studies (Haynes et al., 2006; West et al., 2002). It may or may not include a quantitative synthesis of the results from separate studies (meta-analysis). A meta-analysis quantitatively combines the results of similar studies in an attempt to allow inference from the sample of studies included to the population of interest. This report uses the term "systematic review" to describe reviews that incorporate meta-analyses as well as reviews that present the study data descriptively rather than inferentially.
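As a rough illustration of what "quantitatively combines the results of similar studies" can mean in practice, the sketch below pools study results with inverse-variance (fixed-effect) weighting. This is one common approach, chosen here as an assumption; the chapter does not prescribe a particular statistical method. The function name and all effect estimates and standard errors are hypothetical.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error.

    Each study is weighted by 1 / SE^2, so more precise studies count more.
    """
    if not estimates or len(estimates) != len(std_errors):
        raise ValueError("need one standard error per estimate")
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical log risk ratios and standard errors from three small trials.
log_risk_ratios = [-0.30, -0.10, -0.20]
standard_errors = [0.20, 0.10, 0.20]

pooled, pooled_se = pool_fixed_effect(log_risk_ratios, standard_errors)
ci_low = pooled - 1.96 * pooled_se   # approximate 95% confidence limits
ci_high = pooled + 1.96 * pooled_se

print(f"pooled log RR = {pooled:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
print(f"pooled RR     = {math.exp(pooled):.3f}")
```

A fixed-effect model assumes the studies estimate a single common effect; when the studies differ materially in populations or interventions, a reviewer might prefer a random-effects model or a purely descriptive summary.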

Individual studies rarely provide definitive answers to clinical effectiveness questions (Cook et al., 1997). If it is conducted properly, a systematic review should make obvious the gap between what is known about the effectiveness of a particular service and what clinicians and patients want to know (Helfand, 2005). As such, systematic reviews are also critical to the development of an agenda for further primary research because they reveal where the evidence is insufficient and new information is needed (Neumann, 2006). Without systematic reviews, researchers may miss promising leads or pursue questions that have already been answered (Mulrow et al., 1997). In addition, systematic reviews provide an essential bridge between the body of research evidence and the development of clinical guidance.

Key U.S. Producers and Users of Systematic Reviews

This section briefly describes the variety of contexts in which key U.S. organizations produce or use systematic reviews (Table 4-1). The ultimate purposes of systematic reviews vary and include health coverage decisions, practice guidelines, regulatory approval of new pharmaceuticals or medical devices, and clinical research or program planning. Within the federal government, the users include the Agency for Healthcare Research and Quality (AHRQ), the Centers for Medicare & Medicaid Services (CMS), the Medicare Evidence Development and Coverage Advisory Committee (MedCAC), the Centers for Disease Control and Prevention (CDC), the U.S. Food and Drug Administration (FDA), the Substance Abuse and Mental Health Services Administration (SAMHSA), the U.S. Preventive Services Task Force (USPSTF), and the Veterans Health Administration (VHA).

AHRQ plays a lead role in producing systematic reviews through its program of Evidence-based Practice Centers (EPCs) as a part of its Effective Health Care Program.
EPCs produce systematic reviews for professional medical societies and several federal agencies, including CMS and the National Institutes of Health (NIH) Consensus Development Conferences, as well as a variety of other public and private requestors, such as the USPSTF and the American Heart Association. The reviews cover a broad range of topics, including the effectiveness and safety of health care interventions, emergency preparedness, research methods, and approaches to improving the quality and delivery of health care. The AHRQ Effective Health Care Program produces comparative effectiveness studies on surgical procedures, medical devices, and medical therapies in 10 priority areas (Slutsky, 2007). The CDC conducts or sponsors systematic effectiveness reviews to evaluate and make recommendations on population-based and public health interventions and to improve the underlying research methods (CDC, 2007).
[Footnote: See Table 3-3 in Chapter 3 for a list of recent EPC studies.]

The Blue Cross and Blue Shield Association (BCBSA) Technology Evaluation Center (TEC) produces systematic reviews that assess medical technologies for decision makers in its member plans but also provides the results of these reviews to the public for free. Many other health plans look to private research organizations, such as the ECRI Institute and Hayes, Inc., that produce systematic evidence assessments available by subscription or for purchase (ECRI, 2006a,b; Hayes, Inc., 2007). Because the reviews are proprietary, they are not free to the public and the subscription fees are considerable. At Hayes, Inc., for example, subscriptions range from $10,000 to $300,000, depending on the size of the subscribing organizations and the types of products licensed.
[Footnote: See http://www.bcbs.com/betterknowledge/tec/.]
[Footnote: Personal communication, W. S. Hayes, Hayes, Inc., August 29, 2007.]

The Cochrane Collaboration is an international effort that produces systematic reviews of health interventions; 11 percent (nearly 1,700 individuals) of its active contributors are in the United States (Allen and Clarke, 2007). Cochrane reviews are available by subscription to The Cochrane Library, and abstracts are available for free through PubMed or www.cochrane.org.

TABLE 4-1 Key U.S. Producers and Users of Systematic Reviews
Columns (Government Agencies): AHRQ, CDC, CMS MedCAC, FDA, SAMHSA, USPSTF, VHA
Rows (Activity): Produces reviews; Sponsors or purchases reviews
Rows (Principal use): Development of practice guidelines and recommendations; Decisions regarding health coverage; Regulatory approval
[Cell check marks are not recoverable from the extracted text.]
NOTE: BCBSA TEC = Blue Cross and Blue Shield Association Technology Evaluation Center.

TABLE 4-1 (continued)
Columns (Private Research Firms): ECRI Institute, BCBSA TEC, Hayes, Inc.
Columns (Other Entities): Cochrane Collaboration, Health Plans, Specialty Societies, Manufacturers
[Cell check marks are not recoverable from the extracted text.]

Professional medical societies often sponsor or conduct evidence reviews as the first step in developing a practice guideline. These include, for example, the American College of Physicians, several cardiology groups (the American College of Cardiology, the American College of Chest Physicians, and the American Heart Association), the American Academy of Neurology, and the American Society of Clinical Oncology.

Origins of Systematic Review Methods

The term "meta-analysis" was first used by social scientists in the 1970s to describe the process of identifying a representative set of studies of a given topic and summarizing their results quantitatively. In a groundbreaking 1976 assessment of treatment for depression, Glass (1976) first used the term "meta-analysis" to describe what is now referred to as systematic review. Textbooks describing the concept and methods of systematic reviews (Cooper and Rosenthal, 1980; Glass et al., 1981; Hedges and Olkin, 1985; Light and Pillemer, 1984; Rosenthal, 1978; Sutton et al., 2000) and research articles exploring issues such as publication bias followed during that decade and the next.

Subsequently, as quantitative syntheses started to include qualitative summaries and medical scientists adopted the methods, a new terminology emerged. Richard Peto and colleagues used the term "overview" for the new combined approach (Early Breast Cancer Trialists' Collaborative Group, 1988). Chalmers and Altman (1995) appear to have introduced the term "systematic review" in their book Systematic Reviews. They also suggested that the term "meta-analysis" be restricted to the statistical summary of the results of studies identified as a product of the review process (Chalmers and Altman, 1995). Confusion over terminology persists today, perhaps because the methods grew up in the social sciences and only later were embraced by the medical sciences.

The statistical methods underlying the quantitative aspects of systematic review (i.e., meta-analysis) date to the early 20th century, when statisticians started developing methods for combining the findings from separate but similar studies. In 1904, using new statistical methods, Karl Pearson (1904) combined research on the impact of inoculation against enteric fever on mortality in five communities. In a 1907 study on the prevalence of typhoid, Goldberger (1907) again used quantitative synthesis.

Social scientists were the first to use methods to critically synthesize results to allow statistical inference from a sample to a population. As early as 1940, Pratt and colleagues (1940) at Duke University published a critical synthesis of more than 60 years of research on extrasensory perception.

Systematic reviews in the health care arena were comparatively slow to catch on, and the growth in their development and use coincided with the general rise of evidence-based medicine (Guyatt, 1991). The early implementers of systematic reviews were those who conducted clinical trials and who saw the need to summarize data from multiple effectiveness trials, many of them with very small sample sizes (Yusuf et al., 1985).
In the 1970s, Iain Chalmers organized the first major collaborative effort to develop a clinical trials evidence base, beginning with the Oxford Database of Perinatal Trials (Chalmers et al., 1986). This subsequently led to two major compilations of systematic reviews of clinical trials, one of pregnancy and childbirth (Chalmers et al., 1989) and one of the newborn period (Sinclair and Bracken, 1992). The growth of bioinformatics, specifically, electronic communication, data storage, and improved indexing and retrieval of publications, allowed this collaborative effort in the perinatal field to expand further. In 1993, the Cochrane Collaboration was formed (Dickersin and Manheimer, 1998) with the aim of synthesizing information from studies of interventions on all health topics.

Up to this time, literature reviews were often used to assess the effectiveness of health care interventions, but empiric research also began to reveal problems in their execution. The methods underlying the reviews were often neither objective nor transparent (Mulrow, 1987; Oxman and Guyatt, 1988), and they did not routinely use scientific methods to identify, assess, and synthesize information. The approach to deciding which literature should be included and which findings should be presented was subjective and nonsystematic. The reviews may have provided thoughtful, readable discussions of a topic, but the conclusions were generally not credible.

The following sections of the chapter describe the fundamentals of conducting a scientifically rigorous systematic review and then provide the committee's findings on current efforts.

FUNDAMENTALS OF A SYSTEMATIC REVIEW

Although researchers use a variety of terms to describe the building blocks of a systematic review, the fundamentals are well established (AHRQ EPC Program, 2007; Counsell, 1997; EPC Coordinating Center, 2005; Haynes et al., 2006; Higgins and Green, 2006; Khan and Kleijnen, 2001; Khan et al., 2001a,b; West et al., 2002). Five basic steps (listed below) should be followed, and the key decisions that make up each step of the review should be clearly documented.
[Footnote: Unless otherwise noted, this section draws from these references.]

Step 1: Formulate the research question.
Step 2: Construct an analytic (or logic) framework.
Step 3: Conduct a comprehensive search for evidence.
Step 4: Critically appraise the evidence.
Step 5: Synthesize the body of evidence.

The following sections briefly describe each of these steps in the process.

Step 1: Formulate the Research Question

The foundation of a good systematic review is a well-formulated, clearly defined, answerable question. As such, it guides the analytic (or logic) framework for the review, the overall research protocol (i.e., the search for relevant evidence, decisions about which types of evidence should be used, and how best to identify the evidence), and the critical appraisal of the relevant evidence. The objective, in this first step, is to define a precise, unambiguous, answerable research question.

Richardson and colleagues (1995) coined the mnemonic PICO (population, intervention, comparison, and outcome of interest) to help ensure that explicit attention is paid to the four key elements of an evidence question.
[Footnote: Personal communication, W. S. Richardson, Boonshoft School of Medicine, Wright State University, October 3, 2007.]
[Footnote: A recent draft version of an AHRQ comparative effectiveness methods manual proposes expanding the PICO format to PICOTS, adding "t" for timing and "s" for settings (AHRQ, 2007a).]

Table 4-2 shows examples of how the PICO format can guide the building of a research question.

The characteristics of the study population, such as age, sex, severity of illness, and presence of comorbidities, usually vary among studies and can be important factors in the effect of an intervention. Health care interventions may have numerous outcomes of interest. The research question should be formulated so that it addresses all outcomes, beneficial and adverse, that matter to patients, clinicians, payers, developers of practice guidelines, and others who may be affected (Schünemann et al., 2006). For example, treatments for prostate cancer may affect mortality, but patients are also interested in learning about potential harmful treatment effects, such as urinary incontinence and impotence. Imaging tests for Alzheimer's disease may lead to the early diagnosis of the condition, but patients and the patients' caregivers may be particularly interested in whether an early diagnosis improves cognitive outcomes or quality of life.

Many researchers suggest that decision makers be directly involved in formulating the question to ensure that the systematic review is relevant and can inform decision making (Lavis et al., 2005; Schünemann et al., 2006). The questions posed by end users must sometimes be reframed to be answerable by clinical research studies.

TABLE 4-2 PICO Format for Formulating an Evidence Question

PICO Component | Tips for Building Question | Example
Patient population or problem | "How would I describe this group of patients?" Balance precision with brevity | "In patients with heart failure from dilated cardiomyopathy who are in sinus rhythm . . ."
Intervention (a cause, prognostic factor, treatment, etc.) | "Which main intervention is of interest?" Be specific | ". . . would adding anticoagulation with warfarin to standard heart failure therapy . . ."
Comparison intervention (if necessary) | "What is the main alternative to be compared with the intervention?" Be specific | ". . . when compared with standard therapy alone . . ."
Outcomes | "What do I hope the intervention will accomplish?" "What could this exposure really affect?" Be specific | ". . . lead to lower mortality or morbidity from thromboembolism? Is this enough to be worth the increased risk of bleeding?"

SOURCE: Adapted from the Evidence-based Practice Center Partner's Guide (EPC Coordinating Center, 2005).
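The four PICO elements can also be treated as a small structured record, which makes it easy to check that none of them has been left blank before a protocol is written. The sketch below is only an illustration: the class name, method names, and sentence template are hypothetical, and the example values are taken from the heart failure question in Table 4-2.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class PicoQuestion:
    """Hypothetical container for the four PICO elements of an evidence question."""
    population: str
    intervention: str
    comparison: str
    outcome: str

    def missing_elements(self):
        """Names of the elements that are empty or only whitespace."""
        return [name for name, value in asdict(self).items() if not value.strip()]

    def as_sentence(self):
        """Assemble the elements into one answerable question."""
        return (
            f"In {self.population}, would {self.intervention}, "
            f"when compared with {self.comparison}, lead to {self.outcome}?"
        )


# Example values adapted from Table 4-2.
question = PicoQuestion(
    population="patients with heart failure from dilated cardiomyopathy who are in sinus rhythm",
    intervention="adding anticoagulation with warfarin to standard heart failure therapy",
    comparison="standard therapy alone",
    outcome="lower mortality or morbidity from thromboembolism",
)

# A deliberately incomplete question, to show the completeness check.
draft = PicoQuestion(population="adults", intervention="drug X", comparison="", outcome=" ")

print(question.as_sentence())
print("Missing in draft:", draft.missing_elements())
```

The PICOTS extension mentioned in the footnote could be modeled the same way by adding timing and setting fields.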

Step 2: Construct an Analytic Framework

Once the research question is established, it should be articulated in an analytic framework that clearly lays out the chain of logic underlying the case for the health intervention of interest. The complexity of the analysis will vary depending on the number of linkages between the intervention and the outcomes of interest. For preventive services, there may be multiple steps between, for example, screening for a disease and reductions in morbidity and mortality. Figure 4-1 shows the generic analytic framework that the USPSTF uses to assess screening interventions. It makes explicit the population at risk (left side of the figure), preventive services, diagnostic or therapeutic interventions, and intermediate and health outcomes to be considered (Harris et al., 2001). It also illustrates the chain of logic that the evidence must support to link the service to potential health outcomes: the arrows (linkages), labeled with a service or treatment, represent the questions that the evidence must answer; dotted lines represent associations; and rectangles represent the intermediate outcomes (rounded corners) or the health states (square corners) by which those linkages are measured.

FIGURE 4-1 Analytic framework used by the U.S. Preventive Services Task Force.
[Diagram elements: Persons at Risk; Screening; Early Detection of Target Condition; Treatment; Intermediate Outcome; Association; Reduced Morbidity and/or Mortality; Adverse Effects of Screening; Adverse Effects of Treatment; linkages numbered 1 to 8.]
NOTE: Generic analytic framework for screening topics. Numbers refer to key questions as follows:
(1) Is there direct evidence that screening reduces morbidity and/or mortality?
(2) What is the prevalence of disease in the target groups? Can a high-risk group be reliably identified?
(3) Can the screening test accurately detect the target condition? (a) What are the sensitivity and specificity of the test? (b) Is there significant variation between examiners in how the test is performed? (c) In actual screening programs, how much earlier are patients identified and treated?
(4) Does treatment reduce the incidence of the intermediate outcome? (a) Does treatment work under ideal, clinical trial conditions? (b) How do the efficacy and effectiveness of treatments compare in community settings?
(5) Does treatment improve health outcomes for people diagnosed clinically? (a) How similar are people diagnosed clinically to those diagnosed by screening? (b) Are there reasons to expect people diagnosed by screening to have even better health outcomes than those diagnosed clinically?
(6) Is the intermediate outcome reliably associated with reduced morbidity and/or mortality?
(7) Does screening result in adverse effects? (a) Is the test acceptable to patients? (b) What are the potential harms, and how often do they occur?
(8) Does treatment result in adverse effects?
SOURCE: Reprinted from the American Journal of Preventive Medicine, 20(3), Harris, R. P., M. Helfand, S. H. Woolf, K. N. Lohr, C. D. Mulrow, S. M. Teutsch, and D. Atkins, Current methods of the US Preventive Services Task Force: A review of the process, 21-35, Copyright 2007, with permission from Elsevier.

The overarching linkage (Arrow 1) above the primary framework represents evidence that directly links screening to changes in health outcomes. For example, a randomized controlled trial (RCT) of screening for Chlamydia established a direct, causal connection between screening and reductions in the incidence of pelvic inflammatory disease (Meyers et al., 2007; Scholes et al., 1996). That is, a single body of evidence established the connection between the preventive service (screening) and the health outcome (reduced morbidity).

When direct evidence is lacking or is of insufficient quality to be convincing, the USPSTF relies on a chain of linkages to assess the likely effectiveness of a service. These linkages correspond to key questions about the screening test accuracy (Arrow 3), the efficacy of treatment (Arrows 4 and 5 for intermediate and health outcomes, respectively), and the association between intermediate measures and health outcomes (Dotted Line 6). A similar analytic framework can be constructed for questions of drug treatment, devices, behavior change, procedures, health care delivery, or any type of health intervention used in a population or in individuals.
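The distinction between direct evidence (Arrow 1) and a chain of linkages can be made concrete with a small sketch. The rule coded below is an illustrative simplification and an assumption, not the USPSTF's actual decision procedure: it treats the indirect chain as complete when test accuracy (3) is supported and treatment is supported either directly on health outcomes (5) or through an intermediate outcome (4) that is itself associated with health outcomes (6). The function names and labels are hypothetical.

```python
# Linkages from the generic screening framework in Figure 4-1,
# keyed by key-question number (descriptions paraphrase the figure note).
LINKAGES = {
    1: "screening -> reduced morbidity and/or mortality (direct evidence)",
    2: "persons at risk (prevalence; identifying a high-risk group)",
    3: "screening -> early detection of target condition (test accuracy)",
    4: "treatment -> intermediate outcome",
    5: "treatment -> reduced morbidity and/or mortality",
    6: "intermediate outcome -> reduced morbidity and/or mortality (association)",
    7: "screening -> adverse effects of screening",
    8: "treatment -> adverse effects of treatment",
}


def benefit_evidence_path(adequate):
    """Classify how the supported linkages connect screening to health outcomes.

    `adequate` holds the key-question numbers judged to have convincing
    evidence. Assumed rule (illustration only): direct if 1 is supported;
    indirect if 3 and either 5 or both 4 and 6 are supported.
    """
    adequate = set(adequate)
    if 1 in adequate:
        return "direct"
    if 3 in adequate and (5 in adequate or {4, 6} <= adequate):
        return "indirect"
    return "incomplete"


def unassessed_harms(adequate):
    """Harm-related key questions (7 and 8) that still lack evidence."""
    return sorted({7, 8} - set(adequate))


print(benefit_evidence_path({1, 7, 8}))  # direct trial evidence exists
print(benefit_evidence_path({3, 4, 6}))  # chain through an intermediate outcome
print(benefit_evidence_path({3, 4}))     # association (6) is the missing link
print(unassessed_harms({3, 7}))
```

Listing the linkages explicitly also shows at a glance which key question is the gap when the chain is incomplete.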
Deciding Which Evidence to Use: Study Selection Criteria

What constitutes evidence that a health care service is highly effective? As noted in Chapter 1, scientists view evidence as knowledge that is explicit, systematic, and replicable. However, patients, clinicians, payers, and other decision makers have different perspectives on what constitutes evidence of effectiveness. For example, some may view the scientific evidence as demonstrating what works under ideal circumstances but not necessarily under a particular set of real-world circumstances. A variety of factors can affect the applicability of a particular RCT to individual clinical decisions or circumstances, including patient factors, such as comorbidities, underlying risk, adherence to therapies, disease stage and severity, health insurance coverage, and demographics; intervention factors, such as care setting, level of training, and timing and quality of the intervention; and an array of other factors (Atkins, 2007).

The choice of study designs to be included in a systematic review should be based on the type of research question being asked and should have the goal of minimizing bias (Glasziou et al., 2004; Oxman et al., 2006). Table 4-3 provides examples of research questions and the types of evidence that are the most appropriate for addressing them. RCTs can answer questions about the efficacy of screening, preventive, and therapeutic interventions. Although RCTs can best answer questions about the potential harms from interventions, observational study designs, such as cohort studies, case series, or case-control studies, may be all that are available or possible for the evaluation of rare or long-term outcomes. In fact, because harms from interventions are often rare or occur far in the future, a systematic review of observational research may be the best approach to identifying reliable evidence on potential rare harms (or benefits).

Observational studies are generally the most appropriate for answering questions related to prognosis, diagnostic accuracy, incidence, prevalence, and etiology (Chou and Helfand, 2005; Tatsioni et al., 2005). Cohort studies and case series are useful for examining long-term outcomes, because RCTs may not monitor patients beyond the primary outcome of interest, and for examining rare outcomes, because RCTs generally have small numbers of participants. Case series are often used, for example, to identify the potential long-term harms of new types of radiotherapy. Similarly, the best evidence on potential harms related to oral contraceptive use (e.g., an increased risk of thromboembolism) may be from nonrandomized cohort studies or case-control studies (Glasziou et al., 2004).

Many systematic reviews use a best evidence approach that allows the use of broader inclusion criteria when higher-quality evidence is lacking (Atkins et al., 2005).
In these cases, the systematic reviews consider observational studies because, at a minimum, noting the available evidence helps to delineate what is known and what is not known about the effectiveness of the intervention in question. By highlighting the gaps in knowledge, the review establishes the need for better quality evidence and helps to prioritize research topics.

For intervention effectiveness questions for which RCTs form the highest level of evidence, it is essential to fully document the rationale for including nonrandomized evidence in a review. Current practice does not meet this standard, however. Researchers have found, for example, that 30 of 49 EPC reports that included observational studies did not disclose the rationale for doing so (Norris and Atkins, 2005).
[Footnote: See Chapter 1 for the definitions of the types of experimental and observational studies.]
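The best evidence approach and the documentation requirement described above can be sketched as a selection rule that broadens its inclusion criteria only when the preferred designs are absent and that always returns the rationale alongside the included studies. This is a simplified illustration under stated assumptions: it ranks studies by design label alone, whereas a real review would also appraise study quality. The class, the function, the design labels, and the study identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    study_id: str
    design: str  # e.g. "RCT", "cohort", "case-control", "case series"


def select_best_evidence(
    studies,
    preferred=("RCT",),
    fallback=("cohort", "case-control", "case series"),
):
    """Return (included_studies, rationale) under a simple best-evidence rule."""
    top = [s for s in studies if s.design in preferred]
    if top:
        return top, "Preferred designs available: " + ", ".join(preferred)

    broader = [s for s in studies if s.design in fallback]
    if broader:
        designs_used = ", ".join(sorted({s.design for s in broader}))
        rationale = (
            "No studies with preferred designs (" + ", ".join(preferred) + "); "
            "inclusion broadened to: " + designs_used
        )
        return broader, rationale

    return [], "No eligible studies identified"


# Hypothetical candidate studies for a question about a rare, long-term harm.
candidates = [
    Study("study-A", "cohort"),
    Study("study-B", "case series"),
    Study("study-C", "editorial"),
]

included, rationale = select_best_evidence(candidates)
print([s.study_id for s in included])
print(rationale)
```

Returning the rationale as part of the result, rather than leaving it implicit, is the design point: the reason for admitting nonrandomized evidence is recorded at the moment the decision is made.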

TABLE 4-3 Matching the Clinical Question with the Appropriate Evidence

Type of Question | Example of Question | Type of Evidence (a)
Screening or early diagnosis | Is prostate-specific antigen screening for the detection of prostate cancer in low-risk populations effective in reducing mortality? | RCTs
Screening or early diagnosis | Does early diagnosis by use of a PET (b) scan result in improved cognitive ability for patients with Alzheimer's disease? | RCTs
Etiology | Does smoking cause lung cancer? | Cohort studies, case-control studies
Diagnostic accuracy | Does a PET scan diagnose Alzheimer's disease more accurately than a standard clinical evaluation? | Case series (RCTs desirable but unlikely)
Prognosis | What is the likelihood for fertility loss in a premenopausal woman receiving chemotherapy for breast cancer? | RCTs, cohort studies
Prognosis | How long do patients remain insulin independent after pancreatic islet cell transplantation for Type I diabetes mellitus? | Cohort studies, case series
Preventive or therapeutic effectiveness (c) | Is bevacizumab (Avastin) as effective as ranibizumab (Lucentis) in delaying the progression of acute macular degeneration? | RCTs
Preventive or therapeutic effectiveness (c) | How does surgical implantation of an artificial lumbar disc compare with lumbar spinal fusion for pain reduction in patients with degenerative disc disease? | RCTs
Preventive or therapeutic effectiveness (c) | Is external beam radiation more effective than watchful waiting in reducing mortality from prostate cancer? | RCTs
Safety or potential harm | What proportion of postmenopausal women receiving calcium and vitamin D supplements develop kidney stones? | RCTs, cohort studies, case-control studies
Safety or potential harm | Is robotic-assisted radical prostatectomy more likely to lead to urinary incontinence than laparoscopic-assisted radical prostatectomy? | RCTs, cohort studies, case-control studies

(a) Systematic reviews of the "best" evidence are more reliable than evidence from a single study, regardless of the clinical question being asked.
(b) PET = positron emission tomography.
(c) Includes drugs, devices, procedures, physical therapy, counseling, behavior change, and systems change in head-to-head comparisons and comparisons with standard interventions, placebo or sham treatments, or no intervention.
SOURCE: Adapted from the work of Dickersin (2007).

Dearth of Evidence

For surgical procedures, population-based public health measures, quality improvement strategies, and many other health care interventions, relevant, randomized evidence is frequently unavailable (Norris and Atkins, 2005). Indeed, the evidence base on the effectiveness of most health services is sparse (BCBSA, 2007; Congressional Budget Office, 2007; The Health Industry Forum, 2006; IOM, 2007; Medicare Payment Advisory Commission, 2007; Wilensky, 2006). Well-designed, well-conducted studies of the effectiveness of most health care services are the exception, and the available research evidence falls far short of answering many questions that are important to patients and providers (Tunis, 2006). Although the FDA reviews prescription drugs for their short-term safety and efficacy, medical devices, surgical procedures and implants, diagnostic tests, common off-label uses of pharmaceuticals, and new combinations of approved uses of pharmaceuticals do not receive comparable reviews. Moreover, the FDA reviews do not consider evidence on whether the benefits of using a drug or a device outweigh the potential harms in individual patients or population groups. Effectiveness data for major subpopulations, including children, elderly people, African Americans, and Hispanics, are rarely available.

Commonly, researchers carefully review hundreds of references from the literature, only to conclude that no eligible study that directly addresses the question of interest exists. For example, in a review of the evidence on how best to determine if acute conjunctivitis is viral or bacterial in origin, the investigators were unable to identify evidence of the diagnostic validity of clinical signs, symptoms, or both in distinguishing bacterial conjunctivitis from viral conjunctivitis (Rietveld et al., 2003).
Neumann and colleagues (2005) reviewed the availability and quality of evidence for 69 medical devices, surgical procedures, and other medical therapies that were subject to national Medicare coverage determinations from 1998 to 2003. The researchers found good evidence on health outcomes for only 11 of the 69 technologies (16 percent) (Table 4-4). For 29 technologies, there was either no evidence at all (6 technologies) or poor-quality evidence (23 technologies) because of a limited number of studies, the weak power of the studies, flaws in the design or the conduct of the studies, or missing information on important health outcomes. The evidence was considered "fair" for 29 technologies (42 percent). See Box 4-1 for a list of the technologies with poor or no evidence.

Footnote: Excluding 13 coverage decisions that were omitted because they involved minor coding or language changes (n = 7), exceptional circumstances (n = 3), or incomplete Centers for Medicare & Medicaid Services decision memoranda (n = 3).

The Medicare experience closely mirrors that at the USPSTF. The

USPSTF currently has 114 recommendations on the use of clinical preventive services by specific population groups (e.g., men ages 50 to 70 years or women older than age 65 years). For almost 40 percent (44 of 114) of the recommendations, the USPSTF concluded that the evidence was insufficient to determine if the service had an effect on health outcomes for the specified population because of a limited number of studies or the weak power of the studies, important flaws in the design or conduct of the studies, gaps in the chain of evidence, or a lack of information on important health outcomes (Barton, 2007; USPSTF, 2007). Box 4-2 lists prevention topics with insufficient evidence for one or more population subgroups. These include, for example, routine use of testing for human papillomavirus as a primary screening test for cervical cancer; screening of asymptomatic individuals for lung cancer by the use of low-dose computerized tomography, chest X-ray, sputum cytology, or a combination of these tests; and routine screening for prostate cancer by prostate-specific antigen testing or digital rectal examination.

TABLE 4-4 Quality of Evidence for Technologies Subject to Medicare National Coverage Determinations, 1998-2003

Rating              Number of Technologies    Percent
All technologies    69                        100
Good                11                        16
Fair                29                        42
Poor                23                        33
Unavailable         6                         9

NOTE: The ratings are based on USPSTF criteria. "Good" indicates consistent results from well-designed, well-conducted studies with representative populations. "Fair" indicates sufficient evidence to determine effect on health outcomes but the evidence is limited by the number, quality, or consistency of the individual studies. "Poor" indicates insufficient evidence on effects on health outcomes because of a limited number of studies or the weak power of the studies, flaws in study design or conduct, or lack of information on important health outcomes.
SOURCE: Neumann et al. (2007).
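The percentages reported in Table 4-4 and the USPSTF figure of 44 of 114 recommendations can be checked with simple arithmetic. The following is a minimal sketch in Python; the counts are taken from the text, while the variable and function names are illustrative assumptions, not part of the report.

```python
# Counts from Table 4-4 (Medicare national coverage determinations, 1998-2003).
ratings = {"Good": 11, "Fair": 29, "Poor": 23, "Unavailable": 6}
total = sum(ratings.values())  # all technologies reviewed


def percent(count, whole):
    """Return count/whole as a percentage rounded to the nearest whole number."""
    return round(100 * count / whole)


# Percentage of technologies in each rating category.
table = {label: percent(count, total) for label, count in ratings.items()}

# Technologies with poor-quality evidence or no evidence at all.
no_or_poor = ratings["Poor"] + ratings["Unavailable"]

# USPSTF recommendations with insufficient evidence (44 of 114).
uspstf_insufficient = percent(44, 114)

print(total, table)
print(no_or_poor, percent(no_or_poor, total))
print(uspstf_insufficient)
```

The poor and unavailable categories together account for 29 of the 69 technologies, the same number that were rated fair, and 44 of 114 rounds to 39 percent, which the text describes as almost 40 percent.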
New Sources of Evidence

There is growing interest in using sources of evidence such as large clinical and administrative databases based on electronic health records, registries, and other sources (AHRQ, 2007b; Perlin and Kupersmith, 2007). As health information technology advances, these sources of evidence will grow richer and the information contained in them should be mined as appropriate. Large data sets are especially useful for examining questions of incidence, prognosis, diagnosis, harms, related risks, effects of complex

patterns of comorbidities, and the effects of genetic variation (Francis and Perlin, 2006; IOM, 2007; Stewart et al., 2007). Mathematical modeling, Bayesian statistics, and decision modeling have also been heralded as having great future potential in better understanding health care effectiveness and risks (Claxton et al., 2005; Eddy, 2007).

BOX 4-1 Medicare National Coverage Decisions with Poor Evidence

• Air-fluidized beds for pressure ulcers
• Autologous stem cell transplantation for AL amyloidosis
• Biofeedback for urinary incontinence
• Cardiac pacemakers
• Cryosurgical salvage therapy for recurrent prostate cancer
• Electrical bioimpedance for cardiac output monitoring
• Electrical stimulation for fracture healing
• Electrodiagnostic sensory nerve conduction threshold
• Home biofeedback for urinary incontinence
• Liver transplantation for malignancies other than hepatocellular carcinoma
• Noninvasive positive-pressure respiratory-assist devices for chronic obstructive pulmonary disease
• Ocular photodynamic therapy with verteporfin for macular degeneration
• Pneumatic compression pumps for venous insufficiency
• Positron emission tomography fluorodeoxyglucose (FDG) for Alzheimer's disease/dementia
• Positron emission tomography (FDG) for breast cancer
• Positron emission tomography (FDG) for soft tissue sarcoma
• Positron emission tomography scanner technology
• Prolotherapy for chronic low back pain
• Transmyocardial revascularization for severe angina
• Warm-Up Wound Therapy (noncontact normothermic wound therapy)

NOTE: Evidence was considered "poor" if it was insufficient to assess the effects on health outcomes because of the limited number of studies or weak power of the studies, flaws in study design or conduct, or lack of information on important health outcomes.
SOURCE: Neumann et al. (2007).
These types of evidence will pose significant challenges, but are likely to prove essential to understanding and improving health and health care systems.

Step 3: Conduct a Comprehensive Search for Evidence

The search for the evidence is arguably the most important step in conducting a high-quality systematic review. In a human research study, selection of the appropriate group to be studied is widely understood to be

BOX 4-2 Prevention Topics with Insufficient Evidence for One or More Population Subgroups

• Behavioral counseling in primary care to promote a healthy diet
• Behavioral counseling in primary care to promote physical activity
• Breast-feeding
• Counseling to prevent skin cancer
• Counseling to prevent tobacco use and tobacco-caused disease
• Interventions in primary care to reduce alcohol misuse
• Lung cancer screening
• Newborn hearing screening
• Prevention of dental caries in preschool-age children
• Primary care interventions to prevent low back pain in adults
• Routine vitamin supplementation to prevent cancer and cardiovascular disease
• Screening and behavioral counseling
• Screening and interventions for overweight in children and adolescents
• Screening for bacterial vaginosis in pregnancy
• Screening for breast cancer
• Screening for cervical cancer
• Screening for chlamydial infection
• Screening for coronary heart disease
• Screening for dementia
• Screening for depression
• Screening for family and intimate partner violence
• Screening for gestational diabetes mellitus
• Screening for glaucoma
• Screening for gonorrhea
• Screening for hepatitis C in adults
• Screening for high blood pressure
• Screening for lipid disorders in adults
• Screening for obesity in adults
• Screening for oral cancer
• Screening for prostate cancer
• Screening for skin cancer
• Screening for suicide risk
• Screening for thyroid disease
• Screening for Type II diabetes mellitus in adults

NOTE: Each clinical topic or preventive service that the USPSTF has reviewed may lead to one or more separate population-specific recommendations. The USPSTF rates the strength of its recommendations as "I" for "insufficient" when evidence on whether the service is effective is lacking, of poor quality, or conflicting and the balance of benefits and harms cannot be determined. In such cases, the USPSTF does not recommend either for or against the routine provision of the service. For the topics listed here, there was at least one population subgroup with an "I" rating.
SOURCE: AHRQ (2006).

critical to obtaining valid findings. The comparable step in a systematic review is the identification of all relevant studies meeting the eligibility criteria for the review. A comprehensive search is necessary because there is no way of knowing whether the missing studies are missing at random or missing for a reason critical to understanding current knowledge.

Minimizing Bias

Bias, which is the tendency for a study to produce results that depart systematically from the truth, is the biggest threat to the validity of a review. Box 4-3 describes the potential sources of bias in the individual studies identified during the search for evidence and in the review itself. Without the use of systematic methods to guard against bias in the review, useless or harmful interventions may appear to be worthwhile and beneficial interventions may appear to be useless (Chalmers, 2003).

Reporting biases have important implications during the search for evidence. For example, it is now well established that positive results are more likely to be published than null or negative results both for entire studies (Dickersin, 2005; Dickersin and Min, 1993) and for selected outcomes (Chan et al., 2004). Furthermore, a growing literature indicates that industry-sponsored research is more likely to favor the industry sponsor's product than non-industry-sponsored research (Als-Nielsen et al., 2003; Bekelman et al., 2003; Heres et al., 2006; Jorgensen et al., 2006; Lexchin et al., 2003; Peppercorn et al., 2007). Studies have also found that the direction of the results (i.e., positive or negative) can be associated with the language of publication (Egger et al., 1997), the impact factor [10] of the journal (Easterbrook et al., 1991), and publication in the "gray literature" (Hopewell et al., 2007b), for example, research abstracts, government reports, and theses.
Publication biases also relate to where a study is published, as some sources are more accessible than others. Some systematic reviewers find it difficult to readily identify studies published in non-English-language journals, the gray literature, and certain specialty journals.

One favorable development is that the rate of universal registration of RCTs is growing. This development may help address the publication bias related to studies of this design (World Health Organization, 2007). Unfortunately, there is no similar organized effort to promote the registration of observational studies.

Footnotes:
Elsewhere in this report, the term "bias" is used to refer to bias due to conflicts of interest.
10 The "impact factor" is a commonly used ratio developed to estimate the relative impact or influence of biomedical journals (Garfield, 2006).

BOX 4-3 Sources of Bias in Individual Studies and Systematic Reviews

Biases can lead to under-estimation or over-estimation of a true intervention effect. Systematic reviews should be based on the best evidence available to answer the questions posed, controlling against systematic bias both in the individual studies and in the review itself.

The key types of bias that can affect the internal validity of individual studies are as follows:

• Selection bias: systematic differences between comparison groups in a study, for example, in a clinical trial if patients assigned to the treatment group have a better prognosis than those assigned to the placebo group.
• Attrition bias: systematic differences in withdrawals from a study or exclusions from the study results between the study's comparison groups.
• Performance bias: systematic differences in care, apart from the intervention being evaluated or the measurement of exposure, provided to different comparison groups in a study.
• Detection bias: systematic differences in outcome assessment or verification in comparison groups (also called "ascertainment bias").
• Within-study reporting bias: systematic differences between reported and unreported findings.

The key types of bias that may affect the validity of a systematic review are as follows:

• Reporting bias: systematic differences may exist between reported and nonreported studies (e.g., a higher proportion of studies with positive findings than studies with null or negative findings may be published ["publication bias"]). Systematic differences in findings may also exist between MEDLINE-indexed and non-MEDLINE-indexed journals, English-language and non-English-language publications (language bias), easier-to-access and harder-to-access literature (e.g., null or negative findings are published in journals with less of an impact or in the "gray literature"), and studies with commercial funding sources.
• Information bias: key details about the study may be missing, particularly for studies that appear in the literature only as abstracts, which are subject to reporting bias.

SOURCES: Dickersin (2002); Higgins and Green (2006); West et al. (2002).

Sources of Evidence

Most systematic reviewers limit their searches to electronic databases, for reasons of time, convenience, expense, and their own limitations in knowledge and understanding of the appropriate review methodology.

In the United States, most reviews include a search of the MEDLINE [11] database; fewer include searches of the Cochrane Central Register of Controlled Trials, EMBASE [12], CINAHL [13], the Web of Science, the Latin American and Caribbean Health Sciences Literature (LILACS), and other databases.

A search of just one electronic database is likely to identify only a subset of all relevant studies for inclusion in a review. Early research showed that searches of the MEDLINE database for clinical trials identified only about 50 percent of all relevant trials (Dickersin et al., 1985, 1994). This led to a modification of the MEDLINE indexing system to include methodology indexing terms, and the Cochrane Collaboration further enhanced the ability to retrieve relevant information by contributing trials that it had identified to a central repository (Dickersin et al., 2002).

Researchers at McMaster University and elsewhere have extensively tested search strategies to determine those strategies that are optimal for detecting reports on RCTs and other types of studies used in systematic reviews (Wieland and Dickersin, 2005; Wilczynski et al., 2005). However, more research is needed to determine the best search strategy for identifying adverse effects, for example, when evidence from nonrandomized studies is used to examine them. Some studies suggest that because highly sensitive searches tend to yield large numbers of irrelevant studies, there should be a greater emphasis on improving both reporting and indexing to facilitate the conduct of systematic reviews (Golder et al., 2006; Wieland and Dickersin, 2005).

Footnotes:
11 MEDLINE is the United States National Library of Medicine's bibliographic database of the literature from medicine, nursing, dentistry, veterinary medicine, allied health, and preclinical sciences. See http://www.nlm.nih.gov for more information.
12 Excerpta Medica (EMBASE) is a biomedical and pharmaceutical database indexing over 3,500 international journals in drug research, pharmacology, pharmaceutics, toxicology, clinical and experimental human medicine, health policy and management, public health, occupational health, environmental health, drug dependence and abuse, psychiatry, forensic medicine, and biomedical engineering/instrumentation. See http://www.embase.com/ for more information.
13 Cumulative Index to Nursing & Allied Health Literature (CINAHL) covers literature related to nursing and allied health from 1982 to the present. See http://www.cinahl.com/prodsvcs/cinahldb.htm for more information.
14 A hand search is a manual review of each page of selected individual journals published during a specified period.

Hand searches. Although many reviewers also conduct a hand search [14] of reference lists and other review articles, few hand searches include conference proceedings or recent issues of key journals. Because only about half of all results reported in conference proceedings are ultimately reported in key journals (Scherer et al., 2007) and only full publication is associated with

positive findings, systematic reviewers who do not include abstracts from conference proceedings or conduct hand searches may conclude that a treatment is successful when it actually is not. Adding to this potential bias is the fact that the information in abstracts is limited at best, making it difficult to judge the validity of study methods and results.

In a recent systematic review of 34 studies comparing the sensitivity of hand searches with that of electronic searches, Hopewell and colleagues (2007a) found that hand searches identified 92 to 100 percent of the total number of reports of randomized trials. In contrast, electronic searches had a lower yield; a search of the MEDLINE database retrieved 55 percent of the total reports, a search of EMBASE retrieved 49 percent, and a search of PsycINFO retrieved 67 percent.

Step 4: Critically Appraise the Evidence

A properly conducted systematic review systematically scrutinizes and documents the quality, strength, and consistency of the studies that make up the relevant body of evidence (Box 4-4). The quality of an individual study relates to all aspects of its design and execution, including the extent to which bias is avoided or minimized. Each individual study, including past systematic reviews (if they are available), should be meticulously examined to identify whether the study incorporated methods that protect against bias and how the various types of bias may have affected the results (Khan and Kleijnen, 2001). Both experimental and observational studies must also be judged for their external validity or for their applicability to the population of interest. Without a thorough analysis of the body of research, the review will not meet decision makers' need to know which evidence is valid, for whom it is valid, and under what circumstances it is valid.
Despite the imperative for the use of standardized methods in systematic reviews, current practices appear to fall short of expectations. This is particularly worrisome because end users (patients, clinicians, and others) may accept the findings in published reviews at face value. Deficiencies are commonplace; for example, the methods may be poorly documented or poorly executed, the quality of individual studies may not be assessed or described, inappropriate statistical methods may have been used, and errors in the analyses may not be identified (Bhandari et al., 2001; Delaney et al., 2005; Glenny et al., 2003; Hayden et al., 2006; Jadad and McQuay, 1996; Jadad et al., 2000; Mallen et al., 2006; Moher et al., 2007; Shea et al., 2002; Whiting et al., 2005). The following describes examples of recent findings.

BOX 4-4 Key Concepts in Appraising Evidence

Assessing the effectiveness of a health intervention requires careful scrutiny of the quality, strength, and consistency of the individual studies and systematic reviews that make up the relevant body of evidence. These and other related concepts are defined below:

• Study quality: For an individual study, study quality refers to all aspects of a study's design and execution and the extent to which bias is avoided or minimized. A related concept is internal validity, that is, the degree to which the results of a study are likely to be true and free of bias.
• Strength of findings: The strength of the findings can refer to those of a single study or a body of evidence. The term can be used to refer to the numbers of participants and events observed (greater strength for greater numbers), as well as to the magnitude of the effect, either beneficial or harmful.
• Consistency: Consistency refers to a body of evidence in which individual studies report similar findings, even though there might be some variations in the populations studied or the forms or dosages of the interventions.
• External validity: External validity (or applicability) refers to the extent to which the effects observed in a research study can be applied to a real-life population and setting.
• Estimate of effect: The estimate of the effect is the relationship observed between an intervention and an outcome. In intervention studies, the estimate of effect may be expressed as the study effect size, relative risk, risk difference, an odds ratio, the number needed to treat, or some other measure of effect or association.

SOURCES: GRADE Working Group (2004); Ioannidis and Lau (2004); Khan et al. (2001a,b); Treadwell et al. (2006); West et al. (2002).

Moher and colleagues (2007) assessed the quality of 300 systematic reviews identified through a MEDLINE search for English-language reviews. Most of the reviews (213 of 300) concerned therapeutic or preventive interventions and were published in specialty journals (272 of 300).
The authors found that only 11 percent of the reviews were based on a standard protocol, less than one-quarter (23 percent) considered or assessed publication bias, and 41 percent did not report their funding sources. The reviews searched a median of three electronic databases and two other sources. There was little consistency in how the electronic searches were documented in the reviews; only 69 percent of the reviews reported the years of publication searched.

Mallen and colleagues (2006) examined how 78 English-language systematic reviews analyzed the quality of the original observational studies. All the reviews were published in peer-reviewed journals from 2003 to 2004. The reviews of the Cochrane Collaboration and United Kingdom

National Health Service R&D Health Technology Assessment Programme (HTA) [15] were excluded because they were known to include formal quality assessment procedures. In 36 of the 78 reviews, the quality of the individual studies was not assessed. Although the quality of the studies was reported in 39 reviews, the reviews used 10 different quality assessment techniques, making it difficult to compare them. It was unknown whether quality was assessed for three of the reviews.

Investigators have also identified data extraction errors in many systematic reviews, including Cochrane Collaboration and other standards-based reviews (Gøtzsche et al., 2007; Jones et al., 2005). Some errors could be averted by using a second extractor. For example, in preparation for a systematic review of the use of melatonin for the management of sleep disorders, a Canadian team found that more errors were made by using single extraction with verification by a second person than with double data extraction (Buscemi et al., 2006).

Hierarchies of Evidence

Organizations that develop clinical guidelines, as well as other reviewers of evidence, often look to hierarchies of evidence to gauge the relative strength of individual studies. The hierarchies provide frameworks that assign types of evidence (e.g., RCTs, controlled trials without randomization, and well-designed case series or cohort studies) to various levels, each with a corresponding grade. Numerous hierarchies and typologies have proliferated, each with its own system of letters, codes, and symbols (Schünemann et al., 2003). As Table 4-5 illustrates, the end result is greater confusion rather than clarification [16]. Hierarchies that include systematic reviews typically place them above single studies of the same design.
Montori and colleagues (2003) explained that systematic reviews of RCTs should be at the top of the hierarchy for intervention questions because of their emphasis on methodological quality and, if a meta-analysis is employed, the availability of more precise estimates of the association or treatment effect.

Footnotes:
15 HTA reviews are systematic reviews conducted under the auspices of the United Kingdom National Health Service R&D Health Technology Assessment Programme.
16 Table 5-2 in Chapter 5 illustrates the confusion in evidence hierarchies and recommendation grades in cardiology.

Evidence hierarchies have helped raise awareness that some study designs are less subject to bias than others (Glasziou et al., 2004). Hierarchies, however, consider just the type of research study (e.g., RCTs or prospective observational studies) and not the quality of the individual studies (Poolman et al., 2006). Findings from a poorly conducted trial should not

TABLE 4-5 Selected Examples of Evidence Hierarchies for Three Cardiology Interventions

Columns: Intervention and Organization | Quality of the Evidence | Type of Evidence

Oral anticoagulation therapy in patients with atrial fibrillation and rheumatic mitral valve disease
American Heart Association | Level B | Single randomized trial or nonrandomized studies
Scottish Intercollegiate Guidelines Network | Level 4 | Expert opinion
American College of Chest Physicians | Grade C+ | No RCTs (but strong RCT results can be unequivocally extrapolated) or overwhelming evidence from observational studies

Implantable cardioverter-defibrillator for cardiac arrest due to sustained ventricular fibrillation or ventricular tachycardia
American College of Cardiology/American Heart Association | Level A | Multiple RCTs or meta-analyses
Scottish Intercollegiate Guidelines Network | Level 3/4 | Nonanalytic studies, e.g., case reports and case series
European Society of Cardiology | Level B | Single RCT or large nonrandomized studies

Carotid endarterectomy for internal carotid artery stenosis or symptomatic stenosis
American College of Cardiology/American Heart Association | Level C | Consensus opinion of experts, results of case studies, or standard of care
American Academy of Neurology | Class I/II | Class I = prospective RCT with masked outcome assessment, in a representative population*; Class II = prospective matched group cohort study in a representative population with masked outcome assessment that meets all four Class I criteria (a to d) or an RCT in a representative population that lacks one of the Class I criteria
Veterans Health Administration | Level I | At least one properly conducted randomized controlled trial

*The following are also required: (a) primary outcome(s) clearly defined; (b) exclusion and inclusion criteria clearly defined; (c) adequate accounting for dropouts and crossovers with numbers sufficiently low to have a minimal potential for bias; and (d) relevant baseline characteristics are presented and are substantially equivalent among treatment groups or there is an appropriate statistical adjustment for the differences.
SOURCE: NGC (2007); Schünemann et al. (2003).

necessarily trump evidence from a nonrandomized study. All the evidence that is found should be clearly described and scrutinized and not just assigned to a level of a hierarchy (Glasziou et al., 2004).

Step 5: Synthesize the Body of Evidence

The core of a systematic review is a concise and transparent synthesis of the results of the studies included in the review. The language of the review should be simple and clear so that it is usable and accessible to decision makers. The synthesis may be purely qualitative; quantitative but only descriptive, in that study results are presented in a common metric but not combined; or it may be complemented by a meta-analysis that combines the individual study results and allows statistical inference.

There are no standard guidelines for conducting or presenting the synthesis. However, the Cochrane Collaboration produces and regularly updates a methods handbook for Cochrane reviews of clinical trials that is available on the Internet (Higgins and Green, 2006). The AHRQ Effective Health Care Program is currently developing a methods manual for systematic reviews that focuses on comparative effectiveness (AHRQ, 2007a).
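Presenting study results in a common metric usually means computing one of the measures of effect named in Box 4-4 (relative risk, risk difference, odds ratio, or number needed to treat) from each study's event counts. The following Python sketch shows the standard formulas for a two-arm study. The class and function names and the example counts are hypothetical and are not drawn from any study cited in this chapter.

```python
from dataclasses import dataclass


@dataclass
class TwoArmResult:
    """Event counts for one hypothetical two-arm study."""
    events_treated: int
    n_treated: int
    events_control: int
    n_control: int


def effect_measures(r):
    """Compute common effect measures from a 2x2 table of event counts.

    Assumes each arm has at least one event and at least one non-event,
    so that no ratio divides by zero.
    """
    risk_t = r.events_treated / r.n_treated
    risk_c = r.events_control / r.n_control
    risk_difference = risk_t - risk_c
    odds_t = r.events_treated / (r.n_treated - r.events_treated)
    odds_c = r.events_control / (r.n_control - r.events_control)
    return {
        "relative_risk": risk_t / risk_c,
        "risk_difference": risk_difference,
        "odds_ratio": odds_t / odds_c,
        # Number needed to treat is the reciprocal of the absolute risk difference.
        "number_needed_to_treat": (
            1 / abs(risk_difference) if risk_difference != 0 else float("inf")
        ),
    }


# Illustrative counts only: 15 of 100 events with treatment, 25 of 100 with control.
example = effect_measures(TwoArmResult(15, 100, 25, 100))
for name, value in example.items():
    print(f"{name}: {value:.3f}")
```

Expressing every included study in the same measure makes the descriptive comparison across studies possible even when the results are not pooled.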
The synthesis should collate, describe, and summarize the following key features of the individual studies that could have a bearing on the findings:

• Characteristics of the patient population, the care setting, and type of provider
• Intervention (route, dose, timing, duration)
• Comparison group
• Outcome measures and timing of assessments
• Quality of the evidence (i.e., risk of bias) from individual studies and possible influence on findings
• Sample sizes
• Quantitative results and analyses including examination of whether the study estimates of effect are consistent across studies
• Examination of potential sources of study heterogeneity, if relevant

The investigators should consider carefully if a meta-analysis is appropriate and should combine clinical judgment and a thorough understanding of the individual studies with the aggregated result. A summary estimate has the potential to mislead and lead to spurious conclusions (Editors, 2005). A detailed description of meta-analysis is beyond the scope of this report; however, an excellent review of the analytic considerations in conducting meta-analyses can be found in the text Methods for Meta-Analysis in Medical Research (Sutton et al., 2000).
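To make concrete what combining individual study results and examining consistency across studies can involve, the sketch below pools study estimates with a fixed-effect, inverse-variance weighted average and reports Cochran's Q and the I-squared statistic as rough indicators of heterogeneity. This is one common textbook approach, shown under stated assumptions; it is not a method prescribed by this report, and the function name and the input values are hypothetical.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Fixed-effect inverse-variance pooling of study estimates.

    `estimates` are per-study effects on an additive scale (for example,
    log relative risks); `std_errors` are their standard errors.
    """
    weights = [1.0 / se ** 2 for se in std_errors]  # more precise studies weigh more
    total_weight = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    # Cochran's Q: weighted squared deviations of each study from the pooled value.
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    # I-squared: share of variation beyond what chance alone would produce.
    i_squared = max(0.0, (q - df) / q) if q > 0 else 0.0
    return {
        "pooled": pooled,
        "std_error": pooled_se,
        "ci_95": (pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se),
        "q": q,
        "df": df,
        "i_squared": i_squared,
    }


# Hypothetical log relative risks and standard errors for three studies.
log_rr = [math.log(0.80), math.log(0.65), math.log(0.90)]
se = [0.10, 0.20, 0.15]
result = pool_fixed_effect(log_rr, se)
print("pooled relative risk:", round(math.exp(result["pooled"]), 3))
print("I-squared:", round(result["i_squared"], 3))
```

A fixed-effect model assumes that all studies estimate the same underlying effect. When the heterogeneity statistics are large, that assumption is doubtful, which is one reason the text cautions that a summary estimate can mislead.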

The synthesis should not include policy recommendations. If the systematic review is both scientific and transparent, decision makers should be able to interpret the evidence, to know what is not known, and to describe the extent to which the evidence is applicable to clinical practice and particular subgroups of patients (Santaguida et al., 2005). Making evidence-based decisions, such as when a guideline developer recommends what should and should not be done in specific clinical circumstances, is a distinct and separate process from conducting a systematic review and is the subject of the next chapter.

Journal Standards for Reporting Systematic Reviews

In the past decade, researchers, clinicians, epidemiologists, statisticians, and editors have collaborated to develop standards for the reporting of findings from clinical trials and meta-analyses of randomized and nonrandomized studies in journals. The collaboration arose from concerns that study quality was poorly reflected in the manuscripts that present study findings. Table 4-6 describes the basic requirements of these three standardized reporting formats: Consolidated Standards of Reporting Trials (CONSORT), Quality of Reporting of Meta-analyses (QUOROM) [17] for RCTs, and Meta-analysis Of Observational Studies in Epidemiology (MOOSE). Other approaches to standardized reporting include Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) and Standards for Reporting of Diagnostic Accuracy (STARD) (Bossuyt et al., 2003; Ebrahim and Clarke, 2007; von Elm et al., 2007).

CONSORT uses standardized checklists and a flow diagram to ensure the proper and consistent reporting of the benefits and harms reported from RCTs (Ioannidis et al., 2004; Moher et al., 2001b). Since CONSORT was published in 1996, many journals have adopted it, and as a result, the quality of reporting of the findings from RCTs has improved substantially (Moher et al., 2001a,b, 2007).
Footnote:
17 QUOROM standards are currently being updated under the name PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses).

After the release of CONSORT, two standard formats for reporting on meta-analyses were developed: QUOROM for meta-analyses of RCTs and MOOSE for meta-analyses of observational studies (Moher et al., 1999; Stroup et al., 2000). Like CONSORT, QUOROM and MOOSE use checklists to ensure that meta-analyses include sections describing the background, search strategy, methods, results, a discussion, and conclusions. However, as Table 4-7 indicates, the use of QUOROM and MOOSE is not widely required by most prominent journals, according to the instructions

TABLE 4-6â Comparison of Reporting Standards in CONSORT, QUOROM, and MOOSE 106 Required Information System, Study Type Title and Abstract Introduction Methods Results Discussion CONSORT, For the abstract, identify Scientific Participants, interventions, Participant flow, Interpretation, trials the method of selection background and objectives, outcomes, sample recruitment, baseline generalizability, overall of participants explanation of size, randomization, blinding, data, numbers evidence rationale and statistical methods analyzed, outcomes and estimation, ancillary analyses, and adverse events QUOROM, For the title, identify Clinical problem, Searching, selection, validity Trial flow, study Key findings, meta-analyses the report as a meta- biological assessment, data abstraction, characteristics, and clinical inferences, of trials analysis (or a systematic rationale for the study characteristics, quantitative data interpretation of review) of RCTs; for the intervention, quantitative data, and synthesis synthesis results, potential abstract, use a structured and rationale for biases, and future abstract format that review research agenda describes the objectives, data sources, review methods, results, and conclusions MOOSE, meta- For the abstract, identify Clinical problem, Search strategy, qualification Graph and table Assessment of bias, analyses of the type of study and use hypothesis, of searchers, software and summarizing the justification of observational structured abstract outcome of databases used, potential results, sensitivity exclusion, quality studies study, exposure biases, rationale for studies testing, and of studies included, or intervention and selection of data used, indication of alternative explanation, used, study assessment of confounding, the statistical generalization, future design used, and study quality and heterogeneity, uncertainty of the research, and funding study population and statistical methods findings source

SYSTEMATIC REVIEWS 107 TABLE 4-7â Use of Reporting Standards in Leading Biomedical Journals ICMJE Uniform Standards Are Specifically Mentioned Requirements Are in the Journalâs Instructions to Included in the Journalâs Authors Instructions to Authors Selected CONSORT QUOROM MOOSE Entire sections of Journal (1996) (1999) (2000) manuscript manuscript Annals of Internal â â â â Medicine Archives of General â â Psychiatry British Medical â â â â Journal CANCER â â CHEST â â Circulation â â â Diabetes â Hypertension â â JAMA â â â â Journal of Clinical â â Oncology Journal of the â â National Cancer Institute Lancet â â New England â â Journal of Medicine Obstetrics and â â â Gynecology Pediatrics â â Radiology â â â Reviews in Clinical Gerontology Spine â â NOTE: For systematic reviews, the International Committee of Medical Journal Editors (ICMJE) encourages but does not require authors to consult the QUOROM and MOOSE reporting guidelines; for RCTs, the use of the CONSORT reporting guidelines is required. SOURCES: Annals of Internal Medicine (2007); Archives of General Psychiatry (2007); Brit- ish Medical Journal (2007); CANCER (2007); CHEST (2006); Circulation (2007); Diabetes (2007); Hypertension (2007); ICMJE (2006); JAMA (2007); Journal of Clinical Oncology (2007); Journal of the National Cancer Institute (2007); Lancet (2007); New England Journal of Medicine (2007); Obstetrics and Gynecology (2007); Pediatrics (2007); Radiology (2007); Reviews in Clinical Gerontology (2007); Spine (2007).

108 KNOWING WHAT WORKS IN HEALTH CARE for authors that they publish. Appendix D provides the QUOROM and MOOSE formats. RECOMMENDATIONS As noted earlier, this chapterâs recommendations are intended to guide the conduct of systematic reviews produced under the aegis of a national clinical effectiveness assessment program (âthe Programâ). The recommen- dations draw from the research examined in this chapter and are based on the consensus of the committee. The Program must address three critical challenges: (1) the development or endorsement of standards to ensure high-quality and usable evidence reviews, (2) methods research to find so- lutions to the technical challenges of systematic review, and (3) a research workforce that is sufficient to meet the Programâs demands. Standards Systematic reviews of evidence on the effectiveness of health care ser- vices provide a central link between the generation of research and clinical decision making. Systematic review is itself a science and, in fact, is a new and dynamic science with evolving methods. In the United States, much can be gained by beginning to systemize and standardize the generation of systematic reviews as soon as possible. At a minimum, this should include standard reporting formats and common terminology for characterizing the strength of the evidence in order to ensure that evidence reviews are acces- sible and usable for all types of decision makers. Under the status quo, the quality of published reviews is variable and often unreliable. Judging the quality of reviews is difficult at best because the methods used to produce the reviews are so frequently poorly docu- mented. The numerous grading schemes and hierarchies that are used are confusing. An overreliance on hierarchies is also inappropriate because such hierarchies fail to account for the quality of the underlying research. Reporting standards exist, but are often not used or enforced. 
Recommendation: The Program should develop evidence-based, methodologic standards for systematic reviews, including a common language for characterizing the strength of evidence. The Program should fund reviewers only if they commit to and consistently meet these standards.
• The Program should invest in advancing the scientific methods underlying the conduct of systematic reviews and, when appropriate, update the standards for the reviews it funds.

Methods

Investing in the science of research synthesis will increase the quality and the value of the evidence provided in systematic reviews. Because systematic review is a new field, attention to the methods used to conduct reviews and to improving the existing methods is critically important. About two decades of research underpins the methods that are being used to search for, identify, appraise, and interpret the evidence presented in a systematic review (Egger et al., 2001; Mulrow and Lohr, 2001). Much remains to be learned, and numerous methodological issues are unresolved (Helfand, 2005; Neumann, 2006). Research is needed on methods for identifying observational studies, using observational evidence in the absence of randomized data, and better understanding the impact of potential biases (Egger et al., 2003; Gluud, 2006; Hopewell et al., 2007a,b; Kunz et al., 2007; Song et al., 2000). Box 4-5 lists some of the most pressing methodological issues.

Recommendation: The Program should assess the capacity of the research workforce to meet the Program's needs, and, if deemed appropriate, it should expand training opportunities in systematic review and comparative effectiveness research methods.

Research Workforce

It is not known how many researchers in the United States are adequately trained and qualified to conduct systematic reviews on the effectiveness of health care services. At present, AHRQ provides predoctoral and postdoctoral educational and career development grants in health services research (Medicare Payment Advisory Commission, 2007). The agency also provides institution-level grants to support the planning and development of health services research in certain types of institutions. The NIH also supports a wide range of research training opportunities. However, it is not known to what extent the AHRQ and NIH training programs focus on systematic reviews.

Thus, although it is not known for certain, the nation likely has insufficient human capacity to support an expanded national effort to generate systematic reviews of clinical effectiveness. The Program should assess the research workforce to see if it is adequate. If necessary, the Program should provide more opportunities for training in the conduct of systematic reviews and comparative effectiveness research. A field can grow and produce high-quality work only if it attracts and retains creative investigators. There must be opportunities to learn and grow professionally. To be attractive to the best and the brightest individuals, the field must adhere to high standards of research quality and scientific integrity, be open to new ideas and people, and provide excitement about the potential to contribute to health research and to health care practice overall. Moreover, the academic community must recognize the scientific scholarship that is required to conduct high-quality systematic reviews.

BOX 4-5 Unresolved Methodological Issues in Conducting Systematic Reviews

Locating and Selecting Studies
• How best to identify all relevant published studies
• Whether to include and how best to identify non-English-language studies
• Whether to include and how best to identify unpublished studies and studies in the gray literature (e.g., abstracts)
• Search strategies for identifying observational studies in MEDLINE, EMBASE, and other databases
• Search strategies for identifying studies of diagnostic accuracy in MEDLINE, EMBASE, and other databases

Assessing Study Quality
• Understanding the sources of reporting deficiencies in studies being synthesized
• Understanding and identifying potential biases and conflicts of interest
• Quality thresholds for study inclusion and the management of individual study quality in the context of a review

Collecting Data
• Identifying and selecting information to assess treatment harms
• Obtaining important unpublished data from relevant studies
• Methods used for data abstraction

Analyzing and Presenting Results
• Use of qualitative data in systematic reviews
• Use of economic data in systematic reviews
• Methods for combining results of diagnostic test accuracy
• Statistical Methods (e.g., statistical heterogeneity, fixed versus random effects, and meta-regression)
• Inclusion of interstudy variability into displays of results
• How best to display findings and their reliability for users
• Methods and validity of indirect comparisons

Interpreting Results
• Understanding why reviews on similar topics may yield different results
• Updating systematic reviews
• Frequency of updates

SOURCE: Cochrane Collaboration (2007); Higgins and Green (2006).

OTHER PROGRAM CHALLENGES

Keeping Reviews Up-to-Date

Systematic reviews are not only difficult and time consuming to produce; they must also be kept up-to-date to ensure patient safety. Having an organization that exercises oversight on the production of systematic reviews, for example, the Cochrane Collaboration or professional societies that produce clinical practice guidelines, provides an infrastructure and chain of responsibility for the updating of reviews. There has been little research on updating, and the research that does exist indicates that not all organizations have mechanisms for systematically updating their reviews.

Shekelle and colleagues (2001) examined how quickly the AHRQ guidelines went out of date. They classified only 3 of the 17 guidelines then in circulation as still valid. About half of the guidelines were out of date within 5.8 years of their release, and at least 10 percent were out of date at 3.6 years. A more recent report examining a sample of 100 high-quality systematic reviews of interventions found that within 5.5 years, half of the reviews had new evidence that would substantively change the conclusions about the effectiveness of interventions, and within 2 years almost 25 percent had such evidence (Shojania et al., 2007). The frequency of updating was associated with the clinical topic area and the initial heterogeneity of the results.

Thus, it appears that the failure to update systematic reviews and guidelines within a few years could easily result in patient care that is not evidence based and, worse, care that is not as effective as possible or is potentially dangerous.

New and Emerging Technologies

Although this chapter has focused on comprehensive, systematic reviews, the committee recognizes that some decision makers have a legitimate need for objective advisories on new and emerging technologies in order to respond to coverage requests when few, if any, high-quality studies or systematic reviews exist. In addition, patients and providers want information on new health care services as soon as the services become known, often because manufacturers are pressing them to adopt a product or because patients have read direct-to-consumer advertising and want answers from their physicians and other health care providers.

Private technology assessment organizations, such as the ECRI Institute and Hayes, Inc., have responded to the market demand for early reviews of new technologies (ECRI, 2006b; Hayes, Inc., 2007). These firms and other private, proprietary organizations offer clients brief reviews based on readily available sources of information. Two examples are provided in Appendix E (as proprietary products, they are not in the public domain). The reviews aggregate what little is known from searches of electronic databases (e.g., MEDLINE, EMBASE, or the Cochrane Central Register of Controlled Trials) and published conference abstracts. Other easily obtained information, such as reports from FDA advisory committee meetings, may also be included. Typically, the reviews include a brief description of an intervention; its relevance to clinical care; a short, preliminary list of the relevant research citations that have been identified; two- to three-paragraph summaries of selected research abstracts; and details on the methods used to search the literature.

The Program should consider producing brief advisories on new and emerging technologies in addition to full systematic reviews. If it does so, the advisories produced under the aegis of the Program should, like the ECRI Institute and Hayes, Inc., products, clearly highlight the limitations of the information so that no one will misinterpret them as an adequate substitute for substantive assessments of evidence on effectiveness.

REFERENCES

AHRQ (Agency for Healthcare Research and Quality). 2006. The guide to clinical preventive services 2006: Recommendations of the U.S. Preventive Services Task Force. AHRQ Pub. No. 06-0588. Rockville, MD: AHRQ.
AHRQ. 2007a. Guide for conducting comparative effectiveness reviews (Draft for public comment) http://effectivehealthcare.ahrq.gov/getInvolved/commentFormMethodsGuide.cfm?DocID=1 (accessed October 10, 2007).
AHRQ. 2007b. User's guide to registries evaluating patient outcomes: Summary. AHRQ Pub. No. 07-EHC001-2. Rockville, MD: AHRQ.
AHRQ EPC Program (Evidence-based Practice Center Program). 2007. Template for submissions of topics for AHRQ evidence reports or technology assessments http://www.ahrq.gov/clinic/epcpartner/epcesubtempl.doc (accessed January 17, 2007).
Allen, C., and M. Clarke. 2007 (unpublished). International activity in Cochrane Review Groups with a focus on the USA. Cochrane Collaboration.
Als-Nielsen, B., W. Chen, C. Gluud, and L. L. Kjaergard. 2003. Association of funding and conclusions in randomized drug trials: A reflection of treatment effect or adverse events? JAMA 290(7):921-928.
Annals of Internal Medicine. 2007. Information for authors http://www.annals.org/shared/author_info.html (accessed July 11, 2007).
Antman, E. M., J. Lau, B. Kupelnick, F. Mosteller, and T. C. Chalmers. 1992. A comparison of meta-analyses of randomized control trials and recommendations of clinical experts. Treatments for myocardial infarction. JAMA 268(2):240-248.
Archives of General Psychiatry. 2007. Instructions for authors http://archpsyc.ama-assn.org/misc/ifora.dtl (accessed July 12, 2007).
Atkins, D. 2007. Creating and synthesizing evidence with decision makers in mind: Integrating evidence from clinical trials and other study designs. Medical Care 45(10 Suppl 2):S16-S22.

Atkins, D., K. Fink, and J. Slutsky. 2005. Better information for better health care: The Evidence-based Practice Center program and the Agency for Healthcare Research and Quality. Annals of Internal Medicine 142(12 Part 2):1035-1041.
Barton, M. 2007. Using systematic reviews to develop clinical recommendations (Submitted responses to the IOM HECS committee meeting, January 25, 2007). Washington, DC.
BCBSA (Blue Cross and Blue Shield Association). 2007. Blue Cross and Blue Shield Association proposes payer-funded institute to evaluate what medical treatments work best http://www.bcbs.com/news/bcbsa/blue-cross-and-blue-shield-association-proposes-payer-funded-institute.html (accessed May 2007).
Bekelman, J. E., Y. Li, and C. P. Gross. 2003. Scope and impact of financial conflicts of interest in biomedical research: A systematic review. JAMA 289(4):454-465.
Bhandari, M., F. Morrow, A. V. Kulkarni, and P. Tornetta. 2001. Meta-analyses in orthopaedic surgery: A systematic review of their methodologies. Journal of Bone and Joint Surgery 83A:15-24.
Bossuyt, P. M., J. B. Reitsma, D. E. Bruns, C. A. Gatsonis, P. P. Glasziou, L. M. Irwig, D. Moher, D. Rennie, H. C. de Vet, and J. G. Lijmer. 2003. The STARD statement for reporting studies of diagnostic accuracy: Explanation and elaboration. Clinical Chemistry 49:7-18.
British Medical Journal. 2007. Resources for authors: Article requirements http://resources.bmj.com/bmj/authors/article-submission/article-requirements (accessed July 11, 2007).
Buscemi, N., L. Hartling, B. Vandermeer, L. Tjosvold, and T. P. Klassen. 2006. Single data extraction generated more errors than double data extraction in systematic reviews. Journal of Clinical Epidemiology 59(7):697-703.
CANCER. 2007. CANCER instructions for authors http://www.interscience.wiley.com/cancer/ (accessed July 11, 2007).
CDC (Centers for Disease Control and Prevention). 2007. Community preventive services: Methods: Effectiveness evaluation http://www.thecommunityguide.org/methods/ (accessed October 3, 2007).
Chalmers, I. 2003. Trying to do more good than harm in policy and practice: The role of rigorous, transparent, up-to-date evaluations. Annals of the American Academy of Political and Social Science 589(1):22-40.
Chalmers, I. 2005. Academia's failure to support systematic reviews. Lancet 365(9458):469.
Chalmers, I., and D. G. Altman. 1995. Systematic reviews. London: BMJ Publications.
Chalmers, I., J. Hetherington, and M. Newdick. 1986. The Oxford database of perinatal trials: Developing a register of published reports of controlled trials. Controlled Clinical Trials 7(4):306-324.
Chalmers, I., M. Enkin, and M. Keirse. 1989. Effective care in pregnancy and childbirth. Oxford, UK: Oxford University Press.
Chan, A. W., A. Hrobjartsson, M. T. Haahr, P. C. Gøtzsche, and D. G. Altman. 2004. Empirical evidence for selective reporting of outcomes in randomized trials: Comparison of protocols to published articles. JAMA 291:2457-2465.
CHEST. 2006. CHEST instructions to authors and statement of CHEST policies http://www.chestjournal.org/misc/PolicyInstruxA.pdf (accessed July 11, 2007).
Chou, R., and M. Helfand. 2005. Challenges in systematic reviews that assess treatment harms. Annals of Internal Medicine 142(12 Part 2):1090-1099.
Circulation. 2007. Instructions for authors http://circ.ahajournals.org/misc/ifora.shtml (accessed July 12, 2007).
Claxton, K., J. T. Cohen, and P. J. Neumann. 2005. When is evidence sufficient? Health Affairs 24(1):93-101.
Cochrane Collaboration. 2007. Methods groups newsletter Vol. 11. Oxford, UK.

Congressional Budget Office. 2007. Research on the comparative effectiveness of medical treatments: Options for an expanded federal role. Testimony by Director Peter R. Orszag before House Ways and Means Subcommittee on Health http://www.cbo.gov/ftpdocs/82xx/doc8209/Comparative_Testimony.pdf (accessed June 12, 2007).
Cook, D. J., C. D. Mulrow, and R. B. Haynes. 1997. Systematic reviews: Synthesis of best evidence for clinical decisions. Annals of Internal Medicine 126(5):376-380.
Cooper, H. M., and R. Rosenthal. 1980. A comparison of statistical and traditional procedures for summarizing research. Psychological Bulletin 87:442-449.
Counsell, C. 1997. Formulating questions and locating primary studies for inclusion in systematic reviews. Annals of Internal Medicine 127(5):380-387.
Delaney, A., S. M. Bagshaw, A. Ferland, B. Manns, and K. B. Laupland. 2005. A systematic evaluation of the quality of meta-analyses in the critical care literature. Critical Care 9:R575-R582.
Diabetes. 2007. Diabetes instructions for authors http://care.diabetesjournals.org/misc/ifora.shtml (accessed July 30, 2007).
Dickersin, K. 2002. Systematic reviews in epidemiology: Why are we so far behind? International Journal of Epidemiology 31(1):6-12.
Dickersin, K. 2005. Publication bias: Recognizing the problem, understanding its origins and scope, and preventing harm. In Publication bias in meta-analysis: Prevention, assessment, and adjustments. Edited by Rothstein, H., A. Sutton, and M. Borenstein. London, UK: John Wiley and Sons, Ltd.
Dickersin, K. 2007 (unpublished). Steps in evidence-based healthcare. PowerPoint Presentation. Baltimore, MD.
Dickersin, K., and Y.-I. Min. 1993. Publication bias: The problem that won't go away. In Doing more good than harm: The evaluation of health care interventions. Edited by Warren, K. S., and F. Mosteller. New York: New York Academy of Sciences. Pp. 135-148.
Dickersin, K., and E. Manheimer. 1998. The Cochrane Collaboration: Evaluation of health care and services using systematic reviews of the results of randomized controlled trials. Clinical Obstetrics and Gynecology 41(2):315-331.
Dickersin, K., P. Hewitt, and L. Mutch. 1985. Perusing the literature: Comparison of MEDLINE searching with a perinatal trials database. Controlled Clinical Trials 6(4):306-317.
Dickersin, K., R. Scherer, and C. Lefebvre. 1994. Identifying relevant studies for systematic reviews. BMJ 309(6964):1286-1291.
Dickersin, K., E. Manheimer, S. Wieland, K. A. Robinson, C. Lefebvre, S. McDonald, and the CENTRAL Development Group. 2002. Development of the Cochrane Collaboration's CENTRAL register of controlled clinical trials. Evaluation and the Health Professions 25:38-64.
Early Breast Cancer Trialists' Collaborative Group. 1988. Effects of adjuvant tamoxifen and of cytotoxic therapy on mortality in early breast cancer: An overview of 61 randomised trials among 28 896 women. New England Journal of Medicine 319:1681-1692.
Easterbrook, P. J., J. A. Berlin, R. Gopalan, and D. R. Matthews. 1991. Publication bias in clinical research. Lancet 337:867-872.
Ebrahim, S., and M. Clarke. 2007. STROBE: New standards for reporting observational epidemiology, a chance to improve. International Journal of Epidemiology 36(5):945-948.
ECRI. 2006a (unpublished). 2006 ECRI price list. ECRI Health Technology Assessment Information Service.
ECRI. 2006b. About ECRI http://www.ecri.org/About_ECRI/About_ECRI.aspx (accessed January 31, 2007).
Eddy, D. M. 2007. Linking electronic medical records to large-scale simulation models: Can we put rapid learning on turbo? Health Affairs 26(2):w125-w136.

Editors. 2005. Reviews: Making sense of an often tangled skein of evidence. Annals of Internal Medicine 142(12 Part 1):1019-1020.
Egger, M., T. Zellweger-Zähner, M. Schneider, C. Junker, C. Lengeler, and G. Antes. 1997. Language bias in randomised controlled trials published in English and German. Lancet 350(9074):326-329.
Egger, M., G. Davey Smith, and K. O'Rourke. 2001. Rationale, potentials, and promise of systematic reviews. In Systematic Reviews in Health Care: Meta-Analysis in Context. Edited by Egger, M. London, UK: BMJ Publishing Group. Pp. 3-19.
Egger, M., P. Juni, C. Bartlett, F. Holenstein, and J. Sterne. 2003. How important are comprehensive literature searches and the assessment of trial quality in systematic reviews? Health Technology Assessment 7(1):76.
EPC Coordinating Center. 2005. Evidence-based practice centers partner's guide http://www.ahrq.gov/clinic/epcpartner/epcpartner.pdf (accessed January 25, 2007).
Francis, J., and J. B. Perlin. 2006. Improving performance through knowledge translation in the Veterans Health Administration. Journal of Continuing Education in the Health Professions 26(1):63-71.
Garfield, E. 2006. The history and meaning of the journal impact factor. JAMA 295(1):90-93.
Glass, G. V. 1976. Primary, secondary and meta-analysis. Educational Researcher 5(10):3-8.
Glass, G. V., B. McGaw, and M. L. Smith. 1981. Meta-analysis in social research. Newbury Park, CA: Sage Publications.
Glasziou, P., and B. Haynes. 2005. The paths from research to improved health outcomes. ACP Journal Club 142(2):A8-A10.
Glasziou, P., J. Vandenbroucke, and I. Chalmers. 2004. Assessing the quality of research. BMJ 328(7430):39-41.
Glenny, A. M., M. Esposito, P. Coulthard, and H. V. Worthington. 2003. The assessment of systematic reviews in dentistry. European Journal of Oral Sciences 111:85-92.
Gluud, L. L. 2006. Bias in clinical intervention research. American Journal of Epidemiology 163(6):493-501.
Goldberger, J. 1907. Typhoid bacillus carriers. Edited by Rosenau, M. J., L. L. Lumsden, and J. H. Kastle. Report on the origin and prevalence of typhoid fever in the District of Columbia. Hygienic Laboratory Bulletin No. 35 167-174.
Golder, S., H. M. McIntosh, S. Duffy, and J. Glanville. 2006. Developing efficient search strategies to identify reports of adverse effects in Medline and Embase. Health Information and Libraries Journal 23(1):3-12.
Gøtzsche, P. C., A. Hrobjartsson, K. Maric, and B. Tendal. 2007. Data extraction errors in meta-analyses that use standardized mean differences. JAMA 298(4):430-437.
GRADE Working Group. 2004. Grading quality of evidence and strength of recommendations. BMJ 328(7454):1490.
Guyatt, G. H. 1991. Evidence-based medicine. ACP Journal Club 114:A-16.
Harris, R. P., M. Helfand, S. H. Woolf, K. N. Lohr, C. D. Mulrow, S. M. Teutsch, and D. Atkins. 2001. Current methods of the U.S. Preventive Services Task Force: A review of the process. American Journal of Preventive Medicine 20(3 Suppl):21-35.
Hayden, J. A., P. Cote, and C. Bombardier. 2006. Evaluation of the quality of prognosis studies in systematic reviews. Annals of Internal Medicine 144:427-437.
Hayes, Inc. 2007. Welcome to Hayes http://hayesinc.com (accessed May 8, 2007).
Haynes, R. B., D. L. Sackett, G. H. Guyatt, and P. Tugwell. 2006. Clinical epidemiology: How to do clinical practice research. 3rd ed. Philadelphia, PA: Lippincott Williams & Wilkins.
The Health Industry Forum. 2006. Comparative effectiveness forum: Key themes. Washington, DC: The Health Industry Forum.

Hedges, L. V., and I. Olkin. 1985. Statistical methods for meta-analysis. Orlando, FL: Academic Press.
Helfand, M. 2005. Using evidence reports: Progress and challenges in evidence-based decision making. Health Affairs 24(1):123-127.
Heres, S., J. Davis, K. Maino, E. Jetzinger, W. Kissling, and S. Leucht. 2006. Why olanzapine beats risperidone, risperidone beats quetiapine, and quetiapine beats olanzapine: An exploratory analysis of head-to-head comparison studies of second-generation antipsychotics. American Journal of Psychiatry 163(2):185-194.
Higgins, J. T., and S. Green. 2006. Cochrane handbook for systematic reviews of interventions 4.2.6 [updated September 2006], The Cochrane Library, Issue 4, 2006. Chichester, UK: John Wiley & Sons, Ltd.
Hopewell, S., M. Clarke, C. Lefebvre, and R. Scherer. 2007a. Handsearching versus electronic searching to identify reports of randomized trials. Cochrane Database of Systematic Reviews (2).
Hopewell, S., S. McDonald, M. Clarke, and M. Egger. 2007b. Grey literature in meta-analyses of randomized trials of health care interventions. Cochrane Database of Systematic Reviews (2).
Hypertension. 2007. Instructions to authors http://hyper.ahajournals.org/misc/ifora.shtml (accessed July 30, 2007).
ICMJE (International Committee of Medical Journal Editors). 2006. Uniform requirements for manuscripts submitted to biomedical journals http://www.icmje.org (accessed September 5, 2007).
Ioannidis, J. P., and J. Lau. 2004. Systematic review of medical evidence. Journal of Law and Policy 12(2):509-535.
Ioannidis, J. P., J. W. Evans, P. C. Gøtzsche, R. T. O'Neill, D. Altman, K. Schulz, and D. Moher. 2004. Better reporting of harms in randomized trials: An extension of the CONSORT Statement. Annals of Internal Medicine 141:781-788.
IOM (Institute of Medicine). 2007. Learning what works best: The nation's need for evidence on comparative effectiveness in health care http://www.iom.edu/ebm-effectiveness (accessed April 2007).
Jadad, A. R., and H. J. McQuay. 1996. Meta-analyses to evaluate analgesic interventions: A systematic qualitative review of their methodology. Journal of Clinical Epidemiology 49:235-243.
Jadad, A. R., M. Moher, G. P. Browman, L. Booker, C. Sigouin, M. Fuentes, and R. Stevens. 2000. Systematic reviews and meta-analyses on treatment of asthma: Critical evaluation. BMJ 320:537-540.
JAMA. 2007. Instructions for authors http://jama.ama-assn.org/misc/ifora.dtl (accessed July 12, 2007).
Jones, A. P., T. Remmington, P. R. Williamson, D. Ashby, and R. L. Smyth. 2005. High prevalence but low impact of data extraction and reporting errors were found in Cochrane systematic reviews. Journal of Clinical Epidemiology 58:741-742.
Jorgensen, A. W., J. Hilden, and P. Gøtzsche. 2006. Cochrane reviews compared with industry supported meta-analyses and other meta-analyses of the same drugs: Systematic review. BMJ Online 333:782-786.
Journal of Clinical Oncology. 2007. Information for contributors http://jco.ascopubs.org/misc/ifora.shtml (accessed July 12, 2007).
Journal of the National Cancer Institute. 2007. Instructions to authors http://www.oxfordjournals.org/our_journals/jnci/for_authors/index.html (accessed July 30, 2007).
Khan, K. S., and J. Kleijnen. 2001. Stage II conducting the review: Phase 4 selection of studies. In CRD Report Number 4. Edited by Khan, K. S., G. ter Riet, H. Glanville, A. J. Sowden, and J. Kleijnen. York, UK: NHS Centre for Reviews and Dissemination, University of York.

Khan, K. S., J. Popay, and J. Kleijnen. 2001a. Stage I planning the review: Phase 2 development of a review protocol. In CRD Report Number 4. Edited by Khan, K. S., G. ter Riet, H. Glanville, A. J. Sowden, and J. Kleijnen. York, UK: NHS Centre for Reviews and Dissemination, University of York.
Khan, K. S., G. ter Riet, J. Popay, J. Nixon, and J. Kleijnen. 2001b. Stage II conducting the review: Phase 5 study quality assessment. In CRD Report Number 4. Edited by Khan, K. S., G. ter Riet, H. Glanville, A. J. Sowden, and J. Kleijnen. York, UK: NHS Centre for Reviews and Dissemination, University of York.
Kunz, R., G. Vist, and A. D. Oxman. 2007. Randomisation to protect against selection bias in healthcare trials. Cochrane Database of Systematic Reviews (2).
Lancet. 2007. Information for authors http://www.thelancet.com/authors/lancet/authorinfo/ (accessed July 30, 2007).
Lau, J., E. M. Antman, J. Jimenez-Silva, B. Kupelnick, F. Mosteller, and T. C. Chalmers. 1992. Cumulative meta-analysis of therapeutic trials for myocardial infarction. New England Journal of Medicine 327(4):248-254.
Lavis, J., H. Davies, A. Oxman, J. Denis, K. Golden-Biddle, and E. Ferlie. 2005. Towards systematic reviews that inform health care management and policy-making. Journal of Health Services Research and Policy 10(Suppl 1):35-48.
Lexchin, J., L. A. Bero, B. Djulbegovic, and O. Clark. 2003. Pharmaceutical industry sponsorship and research outcome and quality: Systematic review. BMJ 326:1167-1170.
Light, R. J., and D. B. Pillemer. 1984. Summing up. Cambridge, MA: Harvard University Press.
Mallen, C., G. Peat, and P. Croft. 2006. Quality assessment of observational studies is not commonplace in systematic reviews. Journal of Clinical Epidemiology 59:765-769.
Medicare Payment Advisory Commission. 2007. Chapter 2: Producing comparative effectiveness information. In Report to the Congress: Promoting greater efficiency in Medicare http://www.medpac.gov/documents/Jun07_EntireReport.pdf (accessed June 2007).
Meyers, D., H. Halvorson, and S. Luckhaupt. 2007. Evidence synthesis number 48. Screening for chlamydial infection: A focused evidence update for the U.S. Preventive Services Task Force. Gaithersburg, MD: Agency for Healthcare Research and Quality.
Moher, D., D. J. Cook, S. Eastwood, I. Olkin, D. Rennie, D. F. Stroup, and the QUOROM Group. 1999. Improving the quality of reports of meta-analyses of randomized controlled trials: The QUOROM statement. Lancet 354:1896-1900.
Moher, D., A. Jones, L. Lepage, and the CONSORT Group. 2001a. Use of the CONSORT Statement and quality of reports of randomized trials. JAMA 285(15):1992-1995.
Moher, D., K. F. Schulz, D. Altman, and the CONSORT Group. 2001b. The CONSORT statement: Revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA 285(15):1987-1991.
Moher, D., J. Tetzlaff, A. C. Tricco, M. Sampson, and D. G. Altman. 2007. Epidemiology and reporting characteristics of systematic reviews. PLoS Medicine 4(3):447-455.
Montori, V., N. Wilczynski, D. Morgan, and R. B. Haynes, for the Hedges Team. 2003. Systematic reviews: A cross-sectional study of location and citation counts. BMC Medicine 1(1):2.
Mulrow, C. 1987. The medical review article: State of the science. Annals of Internal Medicine 106:485-488.
Mulrow, C., and K. Lohr. 2001. Proof and policy from medical research evidence. Journal of Health Politics, Policy and Law 26(2):249-266.
Mulrow, C. D., D. J. Cook, and F. Davidoff. 1997. Systematic reviews: Critical links in the great chain of evidence. Annals of Internal Medicine 126(5):389-391.
Neumann, P. J. 2006. Emerging lessons from the Drug Effectiveness Review Project. Health Affairs 25(4):w262-w271.

Neumann, P. J., N. Divi, M. T. Beinfeld, B. S. Levine, P. S. Keenan, E. F. Halpern, and G. S. Gazelle. 2005. Medicare's national coverage decisions, 1999-2003: Quality of evidence and review times. Health Affairs 24(1):243-254.
Neumann, P. J., N. Divi, M. T. Beinfeld, and B. S. Levine. 2007 (unpublished). Medicare National Coverage Decision Database. Tufts-New England Medical Center. Sponsored by the Commonwealth Fund.
New England Journal of Medicine. 2007. Instructions for submitting a NEW manuscript http://authors.nejm.org/Misc/NewMS.asp (accessed July 12, 2007).
NGC (National Guideline Clearinghouse). 2007. Search for cardiology http://www.guideline.gov/search/searchresults.aspx?Type=3&txtSearch=cardiology&num=500 (accessed July 11, 2007).
Norris, S. L., and D. Atkins. 2005. Challenges in using nonrandomized studies in systematic reviews of treatment interventions. Annals of Internal Medicine 142(12 Part 2):1112-1119.
Obstetrics and Gynecology. 2007. Instructions for authors http://www.greenjournal.org/misc/authors.pdf (accessed July 12, 2007).
Oxman, A. D., and G. H. Guyatt. 1988. Guidelines for reading literature reviews. Canadian Medical Association Journal 138:697-703.
Oxman, A. D., H. J. Schünemann, and A. Fretheim. 2006. Improving the use of research evidence in guideline development: 7. Deciding what evidence to include. Health Research Policy and Systems 4(19).
Pearson, K. 1904. Report on certain enteric fever inoculation statistics. BMJ 3:1243-1246.
Pediatrics. 2007. Instructions for authors http://mc.manuscriptcentral.com/societyimages/pediatrics/2004_author_instructions.pdf (accessed July 12, 2007).
Peppercorn, J., E. Blood, E. Winer, and A. Partridge. 2007. Association between pharmaceutical involvement and outcomes in breast cancer clinical trials. Cancer 109(7):1239-1246.
Perlin, J. B., and J. Kupersmith. 2007. Information technology and the inferential gap. Health Affairs 26(2):w192-w194.
Poolman, R., P. Struijs, R. Krips, I. Sierevelt, K. Lutz, and M. Bhandari. 2006. Does a "Level I Evidence" rating imply high quality of reporting in orthopaedic randomised controlled trials? BMC Medical Research Methodology 6(1):44.
Pratt, J. G., J. B. Rhine, B. M. Smith, C. E. Stuart, and J. A. Greenwood. 1940. Extra-sensory perception after sixty years: A critical appraisal of the research in extra-sensory perception. New York: Henry Holt.
Radiology. 2007. Publication information for authors http://www.rsna.org/publications/rad/pdf/pia.pdf (accessed July 11, 2007).
Rayleigh, L. 1884. Address by the Rt. Hon. Lord Rayleigh. In Report of the fifty-fourth meeting of the British Association for the Advancement of Science. Edited by Murray, J. Montreal.
Reviews in Clinical Gerontology. 2007. Instructions for contributors http://assets.cambridge.org/RCG/RCG_ifc.pdf (accessed July 30, 2007).
Richardson, W. S., M. C. Wilson, J. Nishikawa, and R. S. A. Hayward. 1995. The well-built clinical question: A key to evidence-based decisions [editorial]. ACP Journal Club 123:A12-A13.
Rietveld, R. P., H. C. P. M. van Weert, G. ter Riet, and P. J. E. Bindels. 2003. Diagnostic impact of signs and symptoms in acute infectious conjunctivitis: Systematic literature search. BMJ 327(7418):789.
Rosenthal, R. 1978. Combining results of independent studies. Psychological Bulletin 85:185-193.
Santaguida, P., M. Helfand, and P. Raina. 2005. Challenges in systematic reviews that evaluate drug efficacy or effectiveness. Annals of Internal Medicine 142(12 Part 2):1066-1072.

Scherer, R. W., P. Langenberg, and E. von Elm. 2007. Full publication of results initially presented in abstracts. Cochrane Database of Systematic Reviews (2).
Scholes, D., A. Stergachis, F. E. Heidrich, H. Andrilla, K. K. Holmes, and W. E. Stamm. 1996. Prevention of pelvic inflammatory disease by screening for cervical chlamydial infection. New England Journal of Medicine 334(21):1362-1366.
Schünemann, H., D. Best, G. Vist, and A. D. Oxman. 2003. Letters, numbers, symbols and words: How to communicate grades of evidence and recommendations. Canadian Medical Association Journal 169(7):677-680.
Schünemann, H., A. Oxman, and A. Fretheim. 2006. Improving the use of research evidence in guideline development: 6. Determining which outcomes are important. Health Research Policy and Systems 4(1):18.
Shea, B., D. Moher, I. Graham, B. A. Pham, and P. Tugwell. 2002. A comparison of the quality of Cochrane reviews and systematic reviews published in paper-based journals. Evaluation and the Health Professions 25:116-129.
Shekelle, P. G., E. Ortiz, S. Rhodes, S. C. Morton, M. P. Eccles, J. M. Grimshaw, and S. H. Woolf. 2001. Validity of the Agency for Healthcare Research and Quality Clinical Practice Guidelines: How quickly do guidelines become outdated? JAMA 286:1461-1467.
Shojania, K. G., M. Sampson, M. T. Ansari, J. Ji, S. Doucette, and D. Moher. 2007. How quickly do systematic reviews go out of date? A survival analysis. Annals of Internal Medicine 147:224-233.
Sinclair, J., and M. Bracken. 1992. Effective care of the newborn infant. New York: Oxford University Press.
Slutsky, J. 2007. Approaches to priority setting: Identifying topics and selection. Submitted Responses to the HECS Committee Meeting, January 25, 2007. Washington, DC.
Song, F., A. J. Eastwood, S. Gilbody, L. Duley, and A. J. Sutton. 2000. Publication and related biases. Health Technology Assessment 4(10).
Spine. 2007. Instructions for authors http://edmgr.ovid.com/spine/accounts/ifauth.htm (accessed July 12, 2007).
Steinberg, E. P., and B. R. Luce. 2005. Evidence based? Caveat emptor! Health Affairs 24(1):80-92.
Stewart, W. F., N. R. Shah, M. J. Selna, R. A. Paulus, and J. M. Walker. 2007. Bridging the inferential gap: The electronic health record and clinical evidence. Health Affairs 26(2):w181-w191.
Stroup, D. F., J. A. Berlin, S. C. Morton, I. Olkin, G. D. Williamson, D. Rennie, D. Moher, B. J. Becker, T. A. Sipe, and S. B. Thacker for the Meta-analysis Of Observational Studies in Epidemiology (MOOSE) Group. 2000. Meta-analysis of observational studies in epidemiology: A proposal for reporting. JAMA 283(15):2008-2012.
Sutton, A. J., K. R. Abrams, D. R. Jones, T. A. Sheldon, and F. Song. 2000. Methods for meta-analysis in medical research. London, UK: John Wiley.
Tatsioni, A., D. A. Zarin, N. Aronson, D. J. Samson, C. R. Flamm, C. Schmid, and J. Lau. 2005. Challenges in systematic reviews of diagnostic technologies. Annals of Internal Medicine 142(12 Part 2):1048-1055.
Treadwell, J. R., S. J. Tregear, J. T. Reston, and C. M. Turkelson. 2006. A system for rating the stability and strength of medical evidence. BMC Medical Research Methodology [electronic resource] 6:52.
Tunis, S. 2006. Improving evidence for health care decisions. Presentation to IOM staff, April 28, 2006. Washington, DC.
USPSTF (U.S. Preventive Services Task Force). 2007. U.S. Preventive Services Task Force ratings http://www.ahrq.gov/clinic/uspstf07/ratingsv2.htm (accessed July 10, 2007).

von Elm, E., D. G. Altman, M. Egger, S. J. Pocock, P. C. Gøtzsche, J. P. Vandenbroucke, and the Strobe Initiative. 2007. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement: Guidelines for reporting observational studies. Annals of Internal Medicine 147(8):573-577.
West, S., V. King, T. Carey, K. Lohr, N. McCoy, S. Sutton, and L. Lux. 2002. Systems to rate the strength of scientific evidence. Evidence Report/Technology Assessment No. 47. (Prepared by the Research Triangle Institute-University of North Carolina Evidence-based Practice Center under Contract No. 290-97-0011.) AHRQ Publication No. 02-E016. Rockville, MD: Agency for Healthcare Research and Quality.
Whiting, P., A. W. Rutjes, J. Dinnes, J. B. Reitsma, P. M. Bossuyt, and J. Kleijnen. 2005. A systematic review finds that diagnostic reviews fail to incorporate quality despite available tools. Journal of Clinical Epidemiology 58:1-12.
Wieland, S., and K. Dickersin. 2005. Selective exposure reporting and Medline indexing limited the search sensitivity for observational studies of the adverse effects of oral contraceptives. Journal of Clinical Epidemiology 58(6):560-567.
Wilczynski, N. L., D. Morgan, R. B. Haynes, and the Hedges Team. 2005. An overview of the design and methods for retrieving high-quality studies for clinical care. BMC Medical Informatics and Decision Making 5(20).
Wilensky, G. R. 2006. Developing a center for comparative effectiveness information. Health Affairs w572.
World Health Organization. 2007. International clinical trials registry platform http://www.who.int/ictrp/en/ (accessed August 9, 2007).
Yusuf, S., R. Peto, J. Lewis, R. Collins, and P. Sleight. 1985. Beta blockade during and after myocardial infarction: An overview of the randomized trials. Progress in Cardiovascular Diseases 27(5):335-371.


