Finding What Works in Health Care: Standards for Systematic Reviews

4 Standards for Synthesizing the Body of Evidence

Abstract: This chapter addresses the qualitative and quantitative synthesis (meta-analysis) of the body of evidence. The committee recommends four related standards. The systematic review (SR) should use prespecified methods; include a qualitative synthesis based on essential characteristics of study quality (risk of bias, consistency, precision, directness, reporting bias, and for observational studies, dose–response association, plausible confounding that would change an observed effect, and strength of association); and make an explicit judgment of whether a meta-analysis is appropriate. If conducting meta-analyses, expert methodologists should develop, execute, and peer review the meta-analyses. The meta-analyses should address heterogeneity among study effects, accompany all estimates with measures of statistical uncertainty, and assess the sensitivity of conclusions to changes in the protocol, assumptions, and study selection (sensitivity analysis). An SR that uses rigorous and transparent methods will enable patients, clinicians, and other decision makers to discern what is known and not known about an intervention's effectiveness and how the evidence applies to particular population groups and clinical situations.

More than a century ago, Nobel prize-winning physicist J. W. Strutt, Lord Rayleigh, observed that "the work which deserves
the most credit is that in which discovery and explanation go hand in hand, in which not only are new facts presented, but their relation to old ones is pointed out" (Rayleigh, 1884). In other words, the contribution of any singular piece of research draws not only from its own unique discoveries, but also from its relationship to previous research (Glasziou et al., 2004; Mulrow and Lohr, 2001). Thus, the synthesis and assessment of a body of evidence is at the heart of a systematic review (SR) of comparative effectiveness research (CER).

The previous chapter described the considerable challenges involved in assembling all the individual studies that comprise current knowledge on the effectiveness of a healthcare intervention: the "body of evidence." This chapter begins with the assumption that the body of evidence was identified in an optimal manner and that the risk of bias in each individual study was assessed appropriately—both according to the committee's standards. This chapter addresses the synthesis and assessment of the collected evidence, focusing on those aspects that are most salient to setting standards.

The science of SR is rapidly evolving; much has yet to be learned. The purpose of standards for evidence synthesis and assessment—as in other SR methods—is to set performance expectations and to promote accountability for meeting those expectations without stifling innovation in methods. Thus, the emphasis is not on specifying preferred technical methods, but rather on the building blocks that help ensure objectivity, transparency, and scientific rigor.
As it did elsewhere in this report, the committee developed this chapter's standards and elements of performance based on available evidence and expert guidance from the Agency for Healthcare Research and Quality (AHRQ) Effective Health Care Program, the Centre for Reviews and Dissemination (CRD, part of University of York, UK), and the Cochrane Collaboration (Chou et al., 2010; CRD, 2009; Deeks et al., 2008; Fu et al., 2010; Lefebvre et al., 2008; Owens et al., 2010). Guidance on assessing quality of evidence from the Grading of Recommendations Assessment, Development, and Evaluation (GRADE) Working Group was another key source of information (Guyatt et al., 2010; Schünemann et al., 2009). See Appendix F for a detailed summary of AHRQ, CRD, and Cochrane guidance for the assessment and synthesis of a body of evidence.

The committee had several opportunities to learn stakeholders' perspectives on issues related to this chapter. SR experts and representatives from medical specialty associations, payers, and consumer groups provided both written responses to the committee's questions and oral testimony in a public workshop (see Appendix C).
In addition, staff conducted informal, structured interviews with other key stakeholders.

The committee recommends four standards for the assessment and qualitative and quantitative synthesis of an SR's body of evidence. Each standard consists of two parts: first, a brief statement describing the related SR step and, second, one or more elements of performance that are fundamental to carrying out the step. Box 4-1 lists all of the chapter's recommended standards. This chapter provides the background and rationale for the recommended standards and elements of performance, first outlining the key considerations in assessing a body of evidence, followed by sections on the fundamental components of qualitative and quantitative synthesis.

The order of the chapter's standards and the presentation of the discussion do not necessarily indicate the sequence in which the various steps should be conducted. Although an SR synthesis should always include a qualitative component, the feasibility of a quantitative synthesis (meta-analysis) depends on the available data. If a meta-analysis is conducted, its interpretation should be included in the qualitative synthesis. Moreover, the overall assessment of the body of evidence cannot be done until the syntheses are complete.

In the context of CER, SRs are produced to help consumers, clinicians, developers of clinical practice guidelines, purchasers, and policy makers make informed healthcare decisions (Federal Coordinating Council for Comparative Effectiveness Research, 2009; IOM, 2009). Thus, the assessment and synthesis of a body of evidence in the SR should be approached with the decision makers in mind. An SR using rigorous and transparent methods allows decision makers to discern what is known and not known about an intervention's effectiveness and how the evidence applies to particular population groups and clinical situations (Helfand, 2005).
Making evidence-based decisions—such as when a guideline developer recommends what should and should not be done in specific clinical circumstances—is a distinct and separate process from the SR and is outside the scope of this report. It is the focus of a companion IOM study on developing standards for trustworthy clinical practice guidelines.1

A NOTE ON TERMINOLOGY

The SR field lacks an agreed-on lexicon for some of its most fundamental terms and concepts, including what actually constitutes

1 The IOM report, Clinical Practice Guidelines We Can Trust, is available at the National Academies Press website: http://www.nap.edu/.
BOX 4-1
Recommended Standards for Synthesizing the Body of Evidence

Standard 4.1 Use a prespecified method to evaluate the body of evidence
Required elements:
4.1.1 For each outcome, systematically assess the following characteristics of the body of evidence:
- Risk of bias
- Consistency
- Precision
- Directness
- Reporting bias
4.1.2 For bodies of evidence that include observational research, also systematically assess the following characteristics for each outcome:
- Dose–response association
- Plausible confounding that would change the observed effect
- Strength of association
4.1.3 For each outcome specified in the protocol, use consistent language to characterize the level of confidence in the estimates of the effect of an intervention

Standard 4.2 Conduct a qualitative synthesis
Required elements:
4.2.1 Describe the clinical and methodological characteristics of the included studies, including their size, inclusion or exclusion of important subgroups, timeliness, and other relevant factors
4.2.2 Describe the strengths and limitations of individual studies and patterns across studies
4.2.3 Describe, in plain terms, how flaws in the design or execution of the study (or groups of studies) could bias the results, explaining the reasoning behind these judgments
4.2.4 Describe the relationships between the characteristics of the individual studies and their reported findings and patterns across studies
4.2.5 Discuss the relevance of individual studies to the populations, comparisons, cointerventions, settings, and outcomes or measures of interest

Standard 4.3 Decide if, in addition to a qualitative analysis, the systematic review will include a quantitative analysis (meta-analysis)
Required element:
4.3.1 Explain why a pooled estimate might be useful to decision makers

Standard 4.4 If conducting a meta-analysis, then do the following:
Required elements:
4.4.1 Use expert methodologists to develop, execute, and peer review the meta-analyses
4.4.2 Address the heterogeneity among study effects
4.4.3 Accompany all estimates with measures of statistical uncertainty
4.4.4 Assess the sensitivity of conclusions to changes in the protocol, assumptions, and study selection (sensitivity analysis)

NOTE: The order of the standards does not indicate the sequence in which they are carried out.

the quality of a body of evidence. This leads to considerable confusion. Because this report focuses on SRs for the purposes of CER and clinical decision making, the committee uses the term "quality of the body of evidence" to describe the extent to which one can be confident that the estimate of an intervention's effectiveness is correct. This terminology is designed to support clinical decision making and is similar to that used by GRADE and adopted by the Cochrane Collaboration and other organizations for the same purpose (Guyatt et al., 2010; Schünemann et al., 2008, 2009). Quality encompasses summary assessments of a number of characteristics of a body of evidence, such as within-study bias (methodological quality), consistency, precision, directness or applicability of the evidence, and others (Schünemann et al., 2009). Synthesis is the collation, combination, and summary of the findings of a body of evidence (CRD, 2009). In an SR, the synthesis of the body of evidence should always include a qualitative component and, if the data permit, a quantitative synthesis (meta-analysis). The following section presents the background and rationale for the committee's recommended standard and performance elements for prespecifying the assessment methods.
A Need for Clarity and Consistency

Neither empirical evidence nor agreement among experts is available to support the committee's endorsement of a specific approach for assessing and describing the quality of a body of evidence. Medical specialty societies, U.S. and other national government agencies, private research groups, and others have created a multitude of systems for assessing and characterizing the quality of a body of evidence (AAN, 2004; ACCF/AHA, 2009; ACCP, 2009; CEBM, 2009; Chalmers et al., 1990; Ebell et al., 2004; Faraday et al., 2009; Guirguis-Blake et al., 2007; Guyatt et al., 2004; ICSI, 2003; NCCN, 2008; NZGG, 2007; Owens et al., 2010; Schünemann et al., 2009; SIGN, 2009; USPSTF, 2008). The various systems share common features, but employ conflicting evidence hierarchies; emphasize different factors in assessing the quality of research; and use a confusing array of letters, codes, and symbols to convey investigators' conclusions about the overall quality of a body of evidence (Atkins et al., 2004a, 2004b; Schünemann et al., 2003; West et al., 2002). The reader cannot make sense of the differences (Table 4-1).

Through public testimony and interviews, the committee heard that numerous producers and users of SRs were frustrated by the number, variation, complexity, and lack of transparency in existing systems. One comprehensive review documented 40 different systems for grading the strength of a body of evidence (West et al., 2002). Another review, conducted several years later, found that more than 50 evidence-grading systems and 230 quality assessment instruments were in use (COMPUS, 2005).

Early systems for evaluating the quality of a body of evidence used simple hierarchies of study design to judge the internal validity (risk of bias) of a body of evidence (Guyatt et al., 1995). For example, a body of evidence that included two or more randomized controlled trials (RCTs) was assumed to be "high-quality," "level 1," or "grade A" evidence whether or not the trials met scientific standards.
Quasi-experimental research, observational studies, case series, and other qualitative research designs were automatically considered lower quality evidence. As research documented the variable quality of trials and widespread reporting bias in the publication of trial findings, it became clear that such hierarchies are too simplistic because they do not assess the extent to which the design and implementation of RCTs (or other study designs) avoid biases that may reduce confidence in the measures of effectiveness (Atkins et al., 2004b; Coleman et al., 2009; Harris et al., 2001). The early hierarchies produced conflicting conclusions about effectiveness. A study by Ferreira and colleagues analyzed the effect of applying different “levels of evidence” systems to the conclusions of six Cochrane SRs of interventions for low back pain (Ferreira et al., 2002). They found that the conclusions of the reviews were highly dependent on the system used to evaluate the evidence
TABLE 4-1 Examples of Approaches to Assessing the Body of Evidence for Therapeutic Interventions*

Agency for Healthcare Research and Quality
  High: High confidence that the evidence reflects the true effect. Further research is very unlikely to change our confidence in the estimate of effect.
  Moderate: Moderate confidence that the evidence reflects the true effect. Further research may change our confidence in the estimate of effect and may change the estimate.
  Low: Low confidence that the evidence reflects the true effect. Further research is likely to change the confidence in the estimate of effect and is likely to change the estimate.
  Insufficient: Evidence either is unavailable or does not permit a conclusion.

American College of Chest Physicians
  High: Randomized controlled trials (RCTs) without important limitations or overwhelming evidence from observational studies.
  Moderate: RCTs with important limitations (inconsistent results, methodological flaws, indirect, or imprecise) or exceptionally strong evidence from observational studies.
  Low: Observational studies or case series.

American Heart Association/American College of Cardiology
  A: Multiple RCTs or meta-analyses.
  B: Single RCT, or nonrandomized studies.
  C: Consensus opinion of experts, case studies, or standard of care.

Grading of Recommendations Assessment, Development and Evaluation (GRADE)
  Starting points for evaluating quality level: RCTs start high; observational studies start low.
  Factors that may decrease or increase the quality level of a body of evidence:
    Decrease: Study limitations, inconsistency of results, indirectness of evidence, imprecision of results, and high risk of publication bias.
    Increase: Large magnitude of effect, dose–response gradient, all plausible biases would reduce the observed effect.
GRADE (continued)
  High: Further research is very unlikely to change our confidence in the estimate of effect.
  Moderate: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
  Low: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
  Very low: Any estimate of effect is very uncertain.

National Comprehensive Cancer Network
  High: High-powered RCTs or meta-analysis.
  Lower: Ranges from Phase II trials to large cohort studies to case series to individual practitioner experience.

Oxford Centre for Evidence-Based Medicine
  Varies with type of question. Level may be graded down on the basis of study quality, imprecision, indirectness, inconsistency between studies, or because the absolute effect size is very small. Level may be graded up if there is a large or very large effect size.
  Level 1: Systematic review (SR) of randomized trials or n-of-1 trial. For rare harms: SR of case-control studies, or studies revealing dramatic effects.
  Level 2: SR of nested case-control studies or dramatic effect. For rare harms: Randomized trial or (exceptionally) observational study with dramatic effect.
  Level 3: Nonrandomized controlled cohort/follow-up study.
  Level 4: Case-control studies, historically controlled studies.
  Level 5: Opinion without explicit critical appraisal, based on limited/undocumented experience, or based on mechanisms.
Scottish Intercollegiate Guidelines Network
  1++: High-quality meta-analyses, SRs of RCTs, or RCTs with a very low risk of bias.
  1+: Well-conducted meta-analyses, SRs, or RCTs with a low risk of bias.
  1−: Meta-analyses, SRs, or RCTs with a high risk of bias.
  2++: High-quality SRs of case control or cohort studies. High-quality case control or cohort studies with a very low risk of confounding or bias and a high probability that the relationship is causal.
  2−: Case control or cohort studies with a high risk of confounding or bias and a significant risk that the relationship is not causal.
  3: Nonanalytic studies, e.g., case reports, case series.
  4: Expert opinion.

* Some systems use different grading schemes depending on the type of intervention (e.g., preventive services, diagnostic tests, and therapies). This table includes systems for therapeutic interventions.
SOURCES: ACCF/AHA (2009); ACCP (2009); CEBM (2009); NCCN (2008); Owens et al. (2010); Schünemann et al. (2009); SIGN (2009).
primarily because of differences in the number and quality of trials required for a particular level of evidence. In many cases, the differences in the conclusions were so substantial that they could lead to contradictory clinical advice. For example, for one intervention, "back school,"2 the conclusions ranged from "strong evidence that back schools are effective" to "no evidence" on the effectiveness of back schools.

One reason for these discrepancies was failure to distinguish between the quality of the evidence and the magnitude of net benefit. For example, an SR and meta-analysis might highlight a dramatic effect size regardless of the risk of bias in the body of evidence. Conversely, use of a rigid hierarchy gave the impression that any effect based on randomized trial evidence was clinically important, regardless of the size of the effect. In 2001, the U.S. Preventive Services Task Force broke new ground when it updated its review methods, separating its assessment of the quality of evidence from its assessment of the magnitude of effect (Harris et al., 2001).

What Are the Characteristics of Quality for a Body of Evidence?

Experts in SR methodology agree on the conceptual underpinnings for the systematic assessment of a body of evidence. The committee identified eight basic characteristics of quality, described below, that are integral to assessing and characterizing the quality of a body of evidence.
These characteristics—risk of bias, consistency, precision, directness, and reporting bias, and, for observational studies, dose–response association, plausible confounding that would change an observed effect, and strength of association—are used by GRADE; the Cochrane Collaboration, which has adopted the GRADE approach; and the AHRQ Effective Health Care Program, which adopted a modified version of the GRADE approach (Owens et al., 2010; Balshem et al., 2011; Falck-Ytter et al., 2010; Schünemann et al., 2008). Although their terminology varies somewhat, Falck-Ytter and his GRADE colleagues describe any differences between the GRADE and AHRQ quality characteristics as essentially semantic (Falck-Ytter et al., 2010). Owens and his AHRQ colleagues appear

2 Back schools are educational programs designed to teach patients how to manage chronic low back pain to prevent future episodes. The curriculums typically include the natural history, anatomy, and physiology of back pain as well as a home exercise program (Hsieh et al., 2002).
BOX 4-2
Key Concepts Used in the GRADE Approach to Assessing the Quality of a Body of Evidence

The Grading of Recommendations Assessment, Development, and Evaluation (GRADE) Working Group uses a point system to upgrade or downgrade the ratings for each quality characteristic. A grade of high, moderate, low, or very low is assigned to the body of evidence for each outcome. Eight characteristics of the quality of evidence are assessed for each outcome.

Five characteristics can lower the quality rating for the body of evidence:
- Limitations in study design and conduct
- Inconsistent results across studies
- Indirectness of evidence with respect to the study design, populations, interventions, comparisons, or outcomes
- Imprecision of the estimates of effect
- Publication bias

Three factors can increase the quality rating for the body of evidence because they raise confidence in the certainty of estimates (particularly for observational studies):
- Large magnitude of effect
- Plausible confounding that would reduce the demonstrated effect
- Dose–response gradient

SOURCES: Atkins et al. (2004a); Balshem et al. (2011); Falck-Ytter et al. (2010); Schünemann et al. (2009).

to agree (Owens et al., 2010). As Boxes 4-2 and 4-3 indicate, the two approaches are quite similar.3

Risk of Bias

In the context of a body of evidence, risk of bias refers to the extent to which flaws in the design and execution of a collection of studies could bias the estimate of effect for each outcome under study.

3 For detailed descriptions of the AHRQ and GRADE methods, see the GRADE Handbook for Grading Quality of Evidence and Strength of Recommendations (Schünemann et al., 2009) and "Grading the Strength of a Body of Evidence When Comparing Medical Interventions—AHRQ and the Effective Health Care Program" (Owens et al., 2010).
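The starting-point-plus-adjustment logic that GRADE's point system embodies can be sketched in code. The following is a toy illustration of the structure only; the function name, level labels, and one-point-per-factor arithmetic are this sketch's assumptions, not an official GRADE implementation:

```python
LEVELS = ["very low", "low", "moderate", "high"]

def grade_quality(design, downgrades=0, upgrades=0):
    """Toy sketch of GRADE's starting-point-plus-adjustment logic.

    design: "rct" starts at "high"; any other design starts at "low".
    downgrades: total points subtracted for study limitations,
        inconsistency, indirectness, imprecision, and publication bias.
    upgrades: total points added for large magnitude of effect,
        dose-response gradient, and plausible confounding that would
        reduce the observed effect.
    """
    start = LEVELS.index("high") if design == "rct" else LEVELS.index("low")
    # Clamp the adjusted score to the available levels.
    score = max(0, min(len(LEVELS) - 1, start - downgrades + upgrades))
    return LEVELS[score]
```

For instance, a body of RCTs downgraded two points for serious inconsistency and imprecision would rate "low," while an observational body upgraded one point for a large magnitude of effect would rate "moderate."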
Statistical Uncertainty

In meta-analyses, the amount of within- and between-study variation determines how precisely study and aggregate treatment effects are estimated. Estimates of effects without accompanying measures of their uncertainty, such as confidence intervals, cannot be correctly interpreted. A forest plot can provide a succinct representation of the size and precision of individual study effects and aggregated effects. When effects are heterogeneous, more than one summary effect may be necessary to fully describe the data. Measures of uncertainty should also be presented for estimates of heterogeneity and for statistics that quantify relationships between treatment effects and sources of heterogeneity.

Between-study heterogeneity is common in meta-analysis because studies differ in their protocols, target populations, settings, and ages of included subjects. This type of heterogeneity provides evidence about potential variability in treatment effects. Therefore, heterogeneity is not a nuisance or an undesirable feature, but rather an important source of information to be carefully analyzed (Lau et al., 1998). Instead of eliminating heterogeneity by restricting study inclusion criteria or scope, which can limit the utility of the review, heterogeneity of effect sizes can be quantified and related to aspects of study populations or design features through statistical techniques such as meta-regression, which associates the size of treatment effects with effect modifiers. Meta-regression is most useful in explaining variation that arises from sources that have no effect within studies, but big effects among studies (e.g., use of randomization or dose employed). Except in rare cases, meta-regression analyses are exploratory, motivated by the need to explain heterogeneity, and not by prespecification in the protocol.
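To make these quantities concrete, here is a minimal pure-Python sketch of DerSimonian-Laird random-effects pooling that reports the pooled estimate together with a 95% confidence interval, the between-study variance τ², and the I² heterogeneity statistic. The function name and example data are hypothetical; a real review would use a vetted statistical package:

```python
import math

def random_effects_meta(effects, variances):
    """DerSimonian-Laird random-effects pooled estimate with 95% CI.

    effects: per-study effect estimates (e.g., log odds ratios)
    variances: per-study sampling variances
    Returns (pooled, (ci_low, ci_high), tau2, i2).
    """
    w = [1.0 / v for v in variances]                 # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q and the I^2 heterogeneity statistic
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Method-of-moments estimate of between-study variance tau^2
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)
    # Random-effects weights incorporate tau^2
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1.0 / sum(w_re))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2, i2
```

Reporting the interval, τ², and I² together follows the guidance above: the pooled effect is never presented without its uncertainty, and heterogeneity is quantified rather than ignored.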
Meta-regression is observational in nature, and if the results of meta-regression are to be considered valid, they should be clinically plausible and supported by other external evidence. Because the number of studies in a meta-regression is often small, the technique has low power. The technique is subject to spurious findings because many potential covariates may be available, and adjustments to levels of significance may be necessary (Higgins and Thompson, 2004). Users should also be careful of relationships driven by anomalies in one or two studies. Such influential data do not provide solid evidence of strong relationships.

Research Trends in Meta-Analysis

As mentioned previously, a detailed discussion of meta-analysis methodology is beyond the scope of this report. There are many
unresolved questions regarding meta-analysis methods. Fortunately, meta-analysis methodological research is vibrant and ongoing. Box 4-4 describes some of the research trends in meta-analysis and provides relevant references for the interested reader.

Sensitivity of Conclusions

Meta-analysis entails combining information from different studies; thus, the data may come from very different study designs. A small number of studies in conjunction with a variety of study designs contributes to heterogeneity in results. Consequently, verifying that conclusions are robust to small changes in the data and to changes in modeling assumptions solidifies the belief that they are robust to new information that could appear. Without a sensitivity analysis, the credibility of the meta-analysis is reduced.

Results are considered robust if small changes in the meta-analytic protocol, in modeling assumptions, and in study selection do not affect the conclusions. Robust estimates increase confidence in the SR's findings. Sensitivity analyses subject conclusions to such tests by perturbing these characteristics in various ways. The sensitivity analysis could, for example, assess whether the results change when the meta-analysis is rerun leaving one study out at a time. One statistical test for stability is to check that the predictive distribution of a new study from a meta-analysis with one of the studies omitted would include the results of the omitted study (Deeks et al., 2008). Failure to meet this criterion implies that the result of the omitted study is unexpected given the remaining studies. Another common criterion is to determine whether the estimated average treatment effect changes substantially upon omission of one of the studies.
A common definition of substantial involves a change in the determination of statistical significance of the summary effect, although this definition is problematic because a significance threshold may be crossed with an unimportant change in the magnitude or precision of the effect (i.e., loss of statistical significance may result from omission of a large study that reduces the precision, but not the magnitude, of the effect). In addition to checking sensitivity to inclusion of single studies, it is important to evaluate the effect of changes in the protocol that may alter the composition of the studies in the meta-analysis. Changes to the inclusion and exclusion criteria—such as including non-English literature, excluding studies that enroll some participants outside the target population, or restricting to studies with a low risk of bias—may all modify results enough to call the robustness of the inferences into question.
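The leave-one-out check described above can be sketched as follows. This is a hypothetical illustration using simple inverse-variance fixed-effect pooling; the function names and the "estimate falls outside the full-data interval" flag are this sketch's conventions, not a standard diagnostic:

```python
import math

def pool(effects, variances):
    """Inverse-variance fixed-effect pooled estimate and 95% CI."""
    w = [1.0 / v for v in variances]
    est = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    return est, (est - 1.96 * se, est + 1.96 * se)

def leave_one_out(effects, variances):
    """Re-pool with each study omitted in turn; flag studies whose
    omission moves the estimate outside the full-data 95% CI."""
    full_est, full_ci = pool(effects, variances)
    rows = []
    for i in range(len(effects)):
        est, _ = pool(effects[:i] + effects[i + 1:],
                      variances[:i] + variances[i + 1:])
        rows.append({"omitted": i, "estimate": est,
                     "outside_full_ci": not (full_ci[0] <= est <= full_ci[1])})
    return full_est, rows
```

A study whose omission flips the flag (or, worse, the direction of the pooled effect) is exactly the kind of influential data point the text warns should not be allowed to drive the conclusions silently.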
BOX 4-4
Research Trends in Meta-Analysis

Meta-analytic research is a dynamic and rapidly changing field. The following describes key areas of research, with recommended citations for additional reading:

Prospective meta-analysis—In this approach, studies are identified and evaluated before the results of any individual studies are known. Prospective meta-analysis (PMA) allows selection criteria and hypotheses to be defined a priori, before the trials are concluded. PMA can implement standardization across studies so that heterogeneity is decreased. In addition, small studies that lack statistical power individually can be conducted if large studies are not feasible. See, for example: Berlin and Ghersi, 2004, 2005; Ghersi et al., 2008; The Cochrane Collaboration, 2010.

Meta-regression—In this method, potential sources of heterogeneity are represented as predictors in a regression model, thereby enabling estimation of their relationship with treatment effects. Such analyses are exploratory in the majority of cases, motivated by the need to explain heterogeneity. See, for example: Schmid et al., 2004; Smith et al., 1997; Sterne et al., 2002; Thompson and Higgins, 2002.

Bayesian methods in meta-analysis—In these approaches, as in Bayesian approaches in other settings, both the data and the parameters in the meta-analytic model are considered random variables. This approach allows the incorporation of prior information into subsequent analyses, and may be more flexible in complex situations than standard methodologies. See, for example: Berry et al., 2010; O'Rourke and Altman, 2005; Schmid, 2001; Smith et al., 1995; Sutton and Abrams, 2001; Warn et al., 2002.

Meta-analysis of multiple treatments—In this setting, direct treatment comparisons are not available, but an indirect comparison through a common comparator is.
Multiple treatment models, also called mixed comparison models or network meta-analysis, may be used to model the treatment comparisons of interest more efficiently. See, for example: Cooper et al., 2009; Dias et al., 2010; Salanti et al., 2009.

Individual participant data meta-analysis—In some cases, study data may include outcomes, treatments, and characteristics of individual participants. Meta-analysis with such individual participant data (IPD) offers many advantages over meta-analysis of aggregate study-level data. See, for example: Berlin et al., 2002; Simmonds et al., 2005; Smith et al., 1997; Sterne et al., 2002; Stewart, 1995; Thompson and Higgins, 2002; Tierney et al., 2000.
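In its simplest fixed-effect form, the meta-regression idea in Box 4-4 reduces to inverse-variance weighted least squares of study effect sizes on one study-level covariate. A minimal sketch follows (the data and function name are hypothetical, and as the box cautions, such a fit is exploratory; real analyses would also model residual between-study variance):

```python
import math

def meta_regression(effects, variances, covariate):
    """Weighted least-squares meta-regression of effect size on a single
    study-level covariate, with weights equal to inverse variance.

    Returns (intercept, slope, se_slope) under a fixed-effect model.
    """
    w = [1.0 / v for v in variances]
    sw = sum(w)
    # Weighted means of the covariate and the effect sizes
    xbar = sum(wi * xi for wi, xi in zip(w, covariate)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, covariate))
    sxy = sum(wi * (xi - xbar) * (yi - ybar)
              for wi, xi, yi in zip(w, covariate, effects))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    se_slope = math.sqrt(1.0 / sxx)  # standard error under the fixed-effect model
    return intercept, slope, se_slope
```

The slope estimates how the treatment effect varies with the effect modifier (a dose level, say); as the chapter stresses, with few studies the standard error is large and any apparent relationship should be treated as hypothesis-generating.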
Another good practice is to evaluate sensitivity to choices about outcome metrics and statistical models. While one metric and one model may in the end be chosen as best for scientific reasons, results that are highly model dependent require more trust in the modeler and may be more prone to being overturned with new data. In any case, support for the metrics and models chosen should be provided.

Meta-analyses are also frequently sensitive to assumptions about missing data. In meta-analysis, missing data include not only missing outcomes or predictors, but also missing variances and correlations needed when constructing weights based on study precision. As with any statistical analysis, missing data pose two threats: reduced power and bias. Because the number of studies is often small, loss of even a single study's data can seriously affect the ability to draw conclusive inferences from a meta-analysis. Bias poses an even more dangerous problem. Seemingly conclusive analyses may give the wrong answer if studies that were excluded—because of missing data—differ from the studies that supplied the data. The conclusion that the treatment improved one outcome, but not another, may result solely from the different studies used. Interpreting such results requires care and caution.

RECOMMENDED STANDARDS FOR META-ANALYSIS

The committee recommends the following standards and elements of performance for conducting the quantitative synthesis.
Standard 4.3—Decide if, in addition to a qualitative analysis, the systematic review will include a quantitative analysis (meta-analysis)

Required element:
4.3.1 Explain why a pooled estimate might be useful to decision makers

Standard 4.4—If conducting a meta-analysis, then do the following:

Required elements:
4.4.1 Use expert methodologists to develop, execute, and peer review the meta-analyses
4.4.2 Address heterogeneity among study effects
4.4.3 Accompany all estimates with measures of statistical uncertainty
4.4.4 Assess the sensitivity of conclusions to changes in the protocol, assumptions, and study selection (sensitivity analysis)
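Elements 4.4.2 through 4.4.4 can be made concrete with a small sketch. The following is one common approach, not a method prescribed by the standards: it pools hypothetical study effects with the DerSimonian–Laird random-effects estimator, quantifies heterogeneity with Cochran's Q and I², attaches a 95 percent confidence interval to the pooled estimate, and reruns the analysis leaving out each study in turn as a simple sensitivity analysis. All study values are invented for illustration.

```python
import math

def random_effects(effects, variances):
    """DerSimonian-Laird random-effects pooling of study effect estimates.

    Returns the pooled estimate, its 95% confidence interval, the
    between-study variance tau^2, and I^2 (percent of total variability
    attributable to heterogeneity among study effects).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                             # inverse-variance weights
    fixed = sum(wi * e for wi, e in zip(w, effects)) / sum(w)
    q = sum(wi * (e - fixed) ** 2 for wi, e in zip(w, effects))  # Cochran's Q
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)                           # heterogeneity (4.4.2)
    i2 = max(0.0, 100.0 * (q - (k - 1)) / q) if q > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]               # random-effects weights
    est = sum(wi * e for wi, e in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))                            # uncertainty (4.4.3)
    return est, (est - 1.96 * se, est + 1.96 * se), tau2, i2

def leave_one_out(effects, variances):
    """Sensitivity analysis (4.4.4): re-pool after dropping each study in turn."""
    return [
        random_effects(effects[:i] + effects[i + 1:],
                       variances[:i] + variances[i + 1:])[0]
        for i in range(len(effects))
    ]

# Hypothetical log risk ratios and within-study variances from four studies
effects = [0.10, 0.30, 0.50, 0.90]
variances = [0.01, 0.01, 0.01, 0.04]
pooled, ci, tau2, i2 = random_effects(effects, variances)
drops = leave_one_out(effects, variances)
```

In this invented example the fourth study is an outlier; dropping it noticeably lowers the pooled estimate, which is exactly the kind of dependence a sensitivity analysis is meant to surface and report.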
Rationale

A meta-analysis is usually desirable in an SR because it provides reproducible summaries of the individual study results and has the potential to offer valuable insights into the patterns of results across studies. However, many published analyses have important methodological shortcomings and lack scientific rigor (Bailar, 1997; Gerber et al., 2007; Mullen and Ramirez, 2006). One must always look beyond the simple fact that an SR contains a meta-analysis to examine the details of how it was planned and conducted. A strong meta-analysis emanates from a well-conducted SR, identifies and clearly describes its subjective components, scrutinizes the individual studies for sources of heterogeneity, and tests the sensitivity of the findings to changes in the assumptions and the set of studies included (Greenland, 1994; Walker et al., 2008).

REFERENCES

AAN (American Academy of Neurology). 2004. Clinical practice guidelines process manual. http://www.aan.com/globals/axon/assets/3749.pdf (accessed February 1, 2011). ACCF/AHA. 2009. Methodology manual for ACCF/AHA guideline writing committees. http://www.americanheart.org/downloadable/heart/12378388766452009MethodologyManualACCF_AHAGuidelineWritingCommittees.pdf (accessed July 29, 2009). ACCP (American College of Chest Physicians). 2009. The ACCP grading system for guideline recommendations. http://www.chestnet.org/education/hsp/gradingSystem.php (accessed February 1, 2011). Ammerman, A., M. Pignone, L. Fernandez, K. Lohr, A. D. Jacobs, C. Nester, T. Orleans, N. Pender, S. Woolf, S. F. Sutton, L. J. Lux, and L. Whitener. 2002. Counseling to promote a healthy diet. http://www.ahrq.gov/downloads/pub/prevent/pdfser/dietser.pdf (accessed September 26, 2010). Anello, C., and J. L. Fleiss. 1995. Exploratory or analytic meta-analysis: Should we distinguish between them? Journal of Clinical Epidemiology 48(1):109–116. Anzures-Cabrera, J., and J. P. T. Higgins. 2010.
Graphical displays for meta-analysis: An overview with suggestions for practice. Research Synthesis Methods 1(1):66–89. Atkins, D. 2007. Creating and synthesizing evidence with decision makers in mind: Integrating evidence from clinical trials and other study designs. Medical Care 45(10 Suppl 2):S16–S22. Atkins, D., D. Best, P. A. Briss, M. Eccles, Y. Falck-Ytter, S. Flottorp, and GRADE Working Group. 2004a. Grading quality of evidence and strength of recommendations. BMJ 328(7454):1490–1497. Atkins, D., M. Eccles, S. Flottorp, G. Guyatt, D. Henry, S. Hill, A. Liberati, D. O’Connell, A. D. Oxman, B. Phillips, H. Schünemann, T. T. Edejer, G. Vist, J. Williams, and the GRADE Working Group. 2004b. Systems for grading the quality of evidence and the strength of recommendations I: Critical appraisal of existing approaches. BMC Health Services Research 4(1):38.
Bailar, J. C., III. 1997. The promise and problems of meta-analysis. New England Journal of Medicine 337(8):559–561. Balshem, H., M. Helfand, H. J. Schünemann, A. D. Oxman, R. Kunz, J. Brozek, G. E. Vist, Y. Falck-Ytter, J. Meerpohl, S. Norris, and G. H. Guyatt. 2011. GRADE guidelines: 3. Rating the quality of evidence. Journal of Clinical Epidemiology (In press). Berlin, J. A., J. Santanna, C. H. Schmid, L. A. Szczech, H. I. Feldman, and the Anti-lymphocyte Antibody Induction Therapy Study Group. 2002. Individual patient versus group-level data meta-regressions for the investigation of treatment effect modifiers: Ecological bias rears its ugly head. Statistics in Medicine 21(3):371–387. Berlin, J., and D. Ghersi. 2004. Prospective meta-analysis in dentistry. The Journal of Evidence-Based Dental Practice 4(1):59–64. ———. 2005. Preventing publication bias: Registries and prospective meta-analysis. In Publication bias in meta-analysis: Prevention, assessment and adjustments, edited by H. R. Rothstein, A. J. Sutton, and M. Borenstein, pp. 35–48. Berry, S., K. Ishak, B. Luce, and D. Berry. 2010. Bayesian meta-analyses for comparative effectiveness and informing coverage decisions. Medical Care 48(6):S137. Borenstein, M. 2009. Introduction to meta-analysis. West Sussex, U.K.: John Wiley & Sons. Brozek, J. L., E. A. Akl, P. Alonso-Coello, D. Lang, R. Jaeschke, J. W. Williams, B. Phillips, M. Lelgemann, A. Lethaby, J. Bousquet, G. Guyatt, H. J. Schünemann, and the GRADE Working Group. 2009. Grading quality of evidence and strength of recommendations in clinical practice guidelines: Part 1 of 3. An overview of the GRADE approach and grading quality of evidence about interventions. Allergy 64(5):669–677. CEBM (Centre for Evidence-based Medicine). 2009. Oxford Centre for Evidence-based Medicine—Levels of evidence (March 2009). http://www.cebm.net/index.aspx?o=1025 (accessed February 1, 2011). Chalmers, I., M.
Adams, K. Dickersin, J. Hetherington, W. Tarnow-Mordi, C. Meinert, S. Tonascia, and T. C. Chalmers. 1990. A cohort study of summary reports of controlled trials. JAMA 263(10):1401–1405. Chou, R., N. Aronson, D. Atkins, A. S. Ismaila, P. Santaguida, D. H. Smith, E. Whitlock, T. J. Wilt, and D. Moher. 2010. AHRQ series paper 4: Assessing harms when comparing medical interventions: AHRQ and the Effective Health Care Program. Journal of Clinical Epidemiology 63(5):502–512. Cochrane Collaboration. 2010. Cochrane prospective meta-analysis methods group. http://pma.cochrane.org/ (accessed January 27, 2011). Coleman, C. I., R. Talati, and C. M. White. 2009. A clinician’s perspective on rating the strength of evidence in a systematic review. Pharmacotherapy 29(9):1017–1029. COMPUS (Canadian Optimal Medication Prescribing and Utilization Service). 2005. Evaluation tools for Canadian Optimal Medication Prescribing and Utilization Service. http://www.cadth.ca/media/compus/pdf/COMPUS_Evaluation_Methodology_final_e.pdf (accessed September 6, 2010). Cooper, H. M., L. V. Hedges, and J. C. Valentine. 2009. The handbook of research synthesis and meta-analysis, 2nd ed. New York: Russell Sage Foundation. Cooper, N., A. Sutton, D. Morris, A. Ades, and N. Welton. 2009. Addressing between-study heterogeneity and inconsistency in mixed treatment comparisons: Application to stroke prevention treatments in individuals with non-rheumatic atrial fibrillation. Statistics in Medicine 28(14):1861–1881. CRD (Centre for Reviews and Dissemination). 2009. Systematic reviews: CRD’s guidance for undertaking reviews in health care. York, U.K.: York Publishing Services, Ltd.
Cummings, P. 2004. Meta-analysis based on standardized effects is unreliable. Archives of Pediatrics & Adolescent Medicine 158(6):595–597. Deeks, J., J. Higgins, and D. Altman, eds. 2008. Chapter 9: Analysing data and undertaking meta-analyses. In Cochrane handbook for systematic reviews of interventions, edited by J. P. T. Higgins and S. Green. Chichester, UK: John Wiley & Sons. Devereaux, P. J., D. Heels-Ansdell, C. Lacchetti, T. Haines, K. E. Burns, D. J. Cook, N. Ravindran, S. D. Walter, H. McDonald, S. B. Stone, R. Patel, M. Bhandari, H. J. Schünemann, P. T. Choi, A. M. Bayoumi, J. N. Lavis, T. Sullivan, G. Stoddart, and G. H. Guyatt. 2004. Payments for care at private for-profit and private not-for-profit hospitals: A systematic review and meta-analysis. Canadian Medical Association Journal 170(12):1817–1824. Dias, S., N. Welton, D. Caldwell, and A. Ades. 2010. Checking consistency in mixed treatment comparison meta-analysis. Statistics in Medicine 29(7–8):932–944. Dickersin, K. 1990. The existence of publication bias and risk factors for its occurrence. JAMA 263(10):1385–1389. Dwan, K., D. G. Altman, J. A. Arnaiz, J. Bloom, A.-W. Chan, E. Cronin, E. Decullier, P. J. Easterbrook, E. Von Elm, C. Gamble, D. Ghersi, J. P. A. Ioannidis, J. Simes, and P. R. Williamson. 2008. Systematic review of the empirical evidence of study publication bias and outcome reporting bias. PLoS ONE 3(8):e3081. Ebell, M. H., J. Siwek, B. D. Weiss, S. H. Woolf, J. Susman, B. Ewigman, and M. Bowman. 2004. Strength of recommendation taxonomy (SORT): A patient-centered approach to grading evidence in medical literature. American Family Physician 69(3):548–556. Editors. 2005. Reviews: Making sense of an often tangled skein of evidence. Annals of Internal Medicine 142(12 Pt 1):1019–1020. Egger, M., G. D. Smith, and D. G. Altman. 2001. Systematic reviews in health care: Meta-analysis in context.
London, U.K.: BMJ Publishing Group. Falck-Ytter, Y., H. Schünemann, and G. Guyatt. 2010. AHRQ series commentary 1: Rating the evidence in comparative effectiveness reviews. Journal of Clinical Epidemiology 63(5):474–475. Faraday, M., H. Hubbard, B. Kosiak, and R. Dmochowski. 2009. Staying at the cutting edge: A review and analysis of evidence reporting and grading; The recommendations of the American Urological Association. BJU International 104(3): 294–297. Federal Coordinating Council for Comparative Effectiveness Research. 2009. Report to the President and the Congress. Available from http://www.hhs.gov/recovery/programs/cer/cerannualrpt.pdf. Ferreira, P. H., M. L. Ferreira, C. G. Maher, K. Refshauge, R. D. Herbert, and J. Latimer. 2002. Effect of applying different “levels of evidence” criteria on conclusions of Cochrane reviews of interventions for low back pain. Journal of Clinical Epidemiology 55(11):1126–1129. Fu, R., G. Gartlehner, M. Grant, T. Shamliyan, A. Sedrakyan, T. J. Wilt, L. Griffith, M. Oremus, P. Raina, A. Ismaila, P. Santaguida, J. Lau, and T. A. Trikalinos. 2010. Conducting quantitative synthesis when comparing medical interventions: AHRQ and the Effective Health Care Program. In Methods guide for comparative effectiveness reviews, edited by Agency for Healthcare Research and Quality. http://www.effectivehealthcare.ahrq.gov/index.cfm/search-for-guidesreviews-and-reports/?pageaction=displayProduct&productID=554 (accessed January 19, 2011). Gerber, S., D. Tallon, S. Trelle, M. Schneider, P. Jüni, and M. Egger. 2007. Bibliographic study showed improving methodology of meta-analyses published in leading journals: 1993–2002. Journal of Clinical Epidemiology 60(8):773–780.
Ghersi, D., J. Berlin, and L. Askie. 2008. Chapter 19: Prospective meta-analysis. In Cochrane handbook for systematic reviews of interventions, edited by J. Higgins and S. Green. Chichester, UK: John Wiley & Sons. Glasziou, P., J. Vandenbroucke, and I. Chalmers. 2004. Assessing the quality of research. BMJ 328(7430):39–41. Gluud, L. L. 2006. Bias in clinical intervention research. American Journal of Epidemiology 163(6):493–501. GRADE Working Group. 2010. Organizations that have endorsed or that are using GRADE. http://www.gradeworkinggroup.org/society/index.htm (accessed September 20, 2010). Greenland, S. 1994. Invited commentary: A critical look at some popular meta-analytic methods. American Journal of Epidemiology 140(3):290–296. Guirguis-Blake, J., N. Calonge, T. Miller, A. Siu, S. Teutsch, E. Whitlock, and for the U.S. Preventive Services Task Force. 2007. Current processes of the U.S. Preventive Services Task Force: Refining evidence-based recommendation development. Annals of Internal Medicine 147:117–122. Guyatt, G. H., D. L. Sackett, J. C. Sinclair, R. Hayward, D. J. Cook, and R. J. Cook. 1995. Users’ guides to the medical literature: A method for grading health care recommendations. JAMA 274(22):1800–1804. Guyatt, G., H. J. Schünemann, D. Cook, R. Jaeschke, and S. Pauker. 2004. Applying the grades of recommendation for antithrombotic and thrombolytic therapy: The seventh ACCP conference on antithrombotic and thrombolytic therapy. Chest 126(3 Suppl):179S–187S. Guyatt, G., A. D. Oxman, E. A. Akl, R. Kunz, G. Vist, J. Brozek, S. Norris, Y. Falck-Ytter, P. Glasziou, H. deBeer, R. Jaeschke, D. Rind, J. Meerpohl, P. Dahm, and H. J. Schünemann. 2010. GRADE guidelines 1. Introduction—GRADE evidence profiles and summary of findings tables. Journal of Clinical Epidemiology (In press). Harris, R. P., M. Helfand, S. H. Woolf, K. N. Lohr, C. D. Mulrow, S. M. Teutsch, D.
Atkins, and the Methods Work Group, Third U.S. Preventive Services Task Force. 2001. Current methods of the U.S. Preventive Services Task Force: A review of the process. American Journal of Preventive Medicine 20(3 Suppl):21–35. Helfand, M. 2005. Using evidence reports: Progress and challenges in evidence-based decision making. Health Affairs 24(1):123–127. HHS (U.S. Department of Health and Human Services). 2010. The Sentinel Initiative: A national strategy for monitoring medical product safety. Available from http://www.fda.gov/Safety/FDAsSentinelInitiative/ucm089474.htm. Higgins, J. P. T., and S. G. Thompson. 2004. Controlling the risk of spurious findings from meta-regression. Statistics in Medicine 23(11):1663–1682. Higgins, J. P. T., S. G. Thompson, J. J. Deeks, and D. G. Altman. 2003. Measuring inconsistency in meta-analyses. BMJ 327(7414):557–560. Hopewell, S., K. Loudon, M. J. Clarke, A. D. Oxman, and K. Dickersin. 2009. Publication bias in clinical trials due to statistical significance or direction of trial results (Review). Cochrane Database of Systematic Reviews 1:MR000006. Hopewell, S., M. J. Clarke, L. Stewart, and J. Tierney. 2008. Time to publication for results of clinical trials (Review). Cochrane Database of Systematic Reviews (2). Hsieh, C., A. H. Adams, J. Tobis, C. Hong, C. Danielson, K. Platt, F. Hoehler, S. Reinsch, and A. Rubel. 2002. Effectiveness of four conservative treatments for subacute low back pain: A randomized clinical trial. Spine 27(11):1142–1148. ICSI (Institute for Clinical Systems Improvement). 2003. Evidence grading system. http://www.icsi.org/evidence_grading_system_6/evidence_grading_system_pdf_.html (accessed September 8, 2009).
IOM (Institute of Medicine). 2009. Initial national priorities for comparative effectiveness research. Washington, DC: The National Academies Press. Kirkham, J. J., K. M. Dwan, D. G. Altman, C. Gamble, S. Dodd, R. Smyth, and P. R. Williamson. 2010. The impact of outcome reporting bias in randomised controlled trials on a cohort of systematic reviews. BMJ 340:c365. Lau, J., J. P. A. Ioannidis, and C. H. Schmid. 1998. Summing up evidence: One answer is not always enough. Lancet 351(9096):123–127. Lefebvre, C., E. Manheimer, and J. Glanville. 2008. Chapter 6: Searching for studies. In Cochrane handbook for systematic reviews of interventions, edited by J. P. T. Higgins and S. Green. Chichester, UK: John Wiley & Sons. Lu, G., and A. E. Ades. 2004. Combination of direct and indirect evidence in mixed treatment comparisons. Statistics in Medicine 23(20):3105–3124. Mullen, P. D., and G. Ramirez. 2006. The promise and pitfalls of systematic reviews. Annual Review of Public Health 27:81–102. Mulrow, C. D., and K. N. Lohr. 2001. Proof and policy from medical research evidence. Journal of Health Politics Policy and Law 26(2):249–266. Mulrow, C., P. Langhorne, and J. Grimshaw. 1997. Integrating heterogeneous pieces of evidence in systematic reviews. Annals of Internal Medicine 127(11):989–995. NCCN (National Comprehensive Cancer Network). 2008. About the NCCN clinical practice guidelines in oncology. http://www.nccn.org/professionals/physician_gls/about.asp (accessed September 8, 2009). Norris, S., D. Atkins, W. Bruening, S. Fox, E. Johnson, R. Kane, S. C. Morton, M. Oremus, M. Ospina, G. Randhawa, K. Schoelles, P. Shekelle, and M. Viswanathan. 2010. Selecting observational studies for comparing medical interventions. In Methods guide for comparative effectiveness reviews, edited by Agency for Healthcare Research and Quality.
http://www.effectivehealthcare.ahrq.gov/index.cfm/search-for-guides-reviews-and-reports/?pageaction=displayProduct&productID=454 (accessed January 19, 2011). NZGG (New Zealand Guidelines Group). 2007. Handbook for the preparation of explicit evidence-based clinical practice guidelines. http://www.nzgg.org.nz/download/files/nzgg_guideline_handbook.pdf (accessed February 1, 2011). O’Rourke, K., and D. Altman. 2005. Bayesian random effects meta-analysis of trials with binary outcomes: Methods for the absolute risk difference and relative risk scales. Statistics in Medicine 24(17):2733–2742. Owens, D. K., K. N. Lohr, D. Atkins, J. R. Treadwell, J. T. Reston, E. B. Bass, S. Chang, and M. Helfand. 2010. Grading the strength of a body of evidence when comparing medical interventions: AHRQ and the Effective Health Care Program. Journal of Clinical Epidemiology 63(5):513–523. Pham, H. H., D. Schrag, A. S. O’Malley, B. Wu, and P. B. Bach. 2007. Care patterns in Medicare and their implications for pay for performance. New England Journal of Medicine 356(11):1130–1139. Platt, R. 2010. FDA’s Mini-Sentinel program. http://www.brookings.edu/~/media/Files/events/2010/0111_sentinel_workshop/06%20Sentinel%20Initiative%20Platt%20Brookings%2020100111%20v05%20distribution.pdf (accessed October 25, 2010). Platt, R., M. Wilson, K. A. Chan, J. S. Benner, J. Marchibroda, and M. McClellan. 2009. The new Sentinel Network: Improving the evidence of medical-product safety. New England Journal of Medicine 361(7):645–647. Rayleigh, J. W. 1884. Address by the Rt. Hon. Lord Rayleigh. In Report of the fifty-fourth meeting of the British Association for the Advancement of Science, edited by Murray J. Montreal.
Riley, R. D., and E. W. Steyerberg. 2010. Meta-analysis of a binary outcome using individual participant data and aggregate data. Research Synthesis Methods 1(1):2–19. Rothstein, H. R., A. J. Sutton, and M. Borenstein, eds. 2005. Publication bias in meta-analysis: Prevention, assessment and adjustments. Chichester, U.K.: Wiley. Salanti, G., V. Marinho, and J. Higgins. 2009. A case study of multiple-treatments meta-analysis demonstrates that covariates should be considered. Journal of Clinical Epidemiology 62(8):857–864. Salanti, G., S. Dias, N. J. Welton, A. Ades, V. Golfinopoulos, M. Kyrgiou, D. Mauri, and J. P. A. Ioannidis. 2010. Evaluating novel agent effects in multiple-treatments meta-regression. Statistics in Medicine 29(23):2369–2383. Salpeter, S., E. Greyber, G. Pasternak, and E. Salpeter. 2004. Risk of fatal and nonfatal lactic acidosis with metformin use in type 2 diabetes mellitus. Cochrane Database of Systematic Reviews 4:CD002967. Schmid, C. 2001. Using Bayesian inference to perform meta-analysis. Evaluation & the Health Professions 24(2):165. Schmid, C. H., P. C. Stark, J. A. Berlin, P. Landais, and J. Lau. 2004. Meta-regression detected associations between heterogeneous treatment effects and study-level, but not patient-level, factors. Journal of Clinical Epidemiology 57(7):683–697. Schriger, D. L., D. G. Altman, J. A. Vetter, T. Heafner, and D. Moher. 2010. Forest plots in reports of systematic reviews: A cross-sectional study reviewing current practice. International Journal of Epidemiology 39(2):421–429. Schünemann, H., D. Best, G. Vist, and A. D. Oxman. 2003. Letters, numbers, symbols and words: How to communicate grades of evidence and recommendations. Canadian Medical Association Journal 169(7):677–680. Schünemann, H., A. D. Oxman, G. Vist, J. Higgins, J. Deeks, P. Glasziou, and G. Guyatt. 2008. Chapter 12: Interpreting results and drawing conclusions.
In Cochrane handbook for systematic reviews of interventions, edited by J. P. T. Higgins and S. Green. Chichester, UK: John Wiley & Sons. Schünemann, H. J., J. Brożek, and A. D. Oxman. 2009. GRADE handbook for grading quality of evidence and strength of recommendations. Version 3.2 [updated March 2009]. http://www.cc-ims.net/gradepro (accessed November 10, 2010). SIGN (Scottish Intercollegiate Guidelines Network). 2009. SIGN 50: A guideline developer’s handbook. http://www.sign.ac.uk/guidelines/fulltext/50/index.html (accessed February 1, 2011). Silagy, C. A., P. Middleton, and S. Hopewell. 2002. Publishing protocols of systematic reviews: Comparing what was done to what was planned. JAMA 287:2831–2834. Simmonds, M., J. Higgins, L. Stewart, J. Tierney, M. Clarke, and S. Thompson. 2005. Meta-analysis of individual patient data from randomized trials: A review of methods used in practice. Clinical Trials 2(3):209. Slone Survey. 2006. Patterns of medication use in the United States, 2006: A report from the Slone Survey. http://www.bu.edu/slone/SloneSurvey/AnnualRpt/SloneSurveyWebReport2006.pdf (accessed February 1, 2011). Smith, G., M. Egger, and A. Phillips. 1997. Meta-analysis: Beyond the grand mean? BMJ 315(7122):1610. Smith, T., D. Spiegelhalter, and A. Thomas. 1995. Bayesian approaches to random-effects meta-analysis: A comparative study. Statistics in Medicine 14(24):2685–2699. Song, F., S. Parekh-Bhurke, L. Hooper, Y. Loke, J. Ryder, A. J. Sutton, C. B. Hing, and I. Harvey. 2009. Extent of publication bias in different categories of research cohorts: A meta-analysis of empirical studies. BMC Medical Research Methodology 9:79.
Song, F., S. Parekh, L. Hooper, Y. K. Loke, J. Ryder, A. J. Sutton, C. Hing, C. S. Kwok, C. Pang, and I. Harvey. 2010. Dissemination and publication of research findings: An updated review of related biases. Health Technology Assessment 14(8). Sterne, J., P. Jüni, K. Schulz, D. Altman, C. Bartlett, and M. Egger. 2002. Statistical methods for assessing the influence of study characteristics on treatment effects in ‘meta-epidemiological’ research. Statistics in Medicine 21(11):1513–1524. Stewart, L. 1995. Practical methodology of meta-analyses (overviews) using updated individual patient data. Statistics in Medicine 14(19):2057–2079. Sutton, A., and K. Abrams. 2001. Bayesian methods in meta-analysis and evidence synthesis. Statistical Methods in Medical Research 10(4):277. Sutton, A. J., and J. P. Higgins. 2008. Recent developments in meta-analysis. Statistics in Medicine 27(5):625–650. Sutton, A. J., K. R. Abrams, D. R. Jones, T. A. Sheldon, and F. Song. 2000. Methods for meta-analysis in medical research, Wiley series in probability and statistics. Chichester, U.K.: John Wiley & Sons. Thompson, S., and J. Higgins. 2002. How should meta-regression analyses be undertaken and interpreted? Statistics in Medicine 21(11):1559–1573. Tierney, J., M. Clarke, and L. Stewart. 2000. Is there bias in the publication of individual patient data meta-analyses? International Journal of Technology Assessment in Health Care 16(02):657–667. Turner, E. H., A. M. Matthews, E. Linardatos, R. A. Tell, and R. Rosenthal. 2008. Selective publication of antidepressant trials and its influence on apparent efficacy. New England Journal of Medicine 358(3):252–260. USPSTF (U.S. Preventive Services Task Force). 2008. Grade definitions. http://www.ahrq.gov/clinic/uspstf/grades.htm (accessed January 6, 2010). Vogeli, C., A. Shields, T. Lee, T. Gibson, W. Marder, K. Weiss, and D. Blumenthal. 2007.
Multiple chronic conditions: Prevalence, health consequences, and implications for quality, care management, and costs. Journal of General Internal Medicine 22(Suppl. 3):391–395. Walker, E., A. V. Hernandez, and M. W. Kattan. 2008. Meta-analysis: Its strengths and limitations. Cleveland Clinic Journal of Medicine 75(6):431–439. Warn, D., S. Thompson, and D. Spiegelhalter. 2002. Bayesian random effects meta-analysis of trials with binary outcomes: Methods for the absolute risk difference and relative risk scales. Statistics in Medicine 21(11):1601–1623. West, S., V. King, T. S. Carey, K. N. Lohr, N. McKoy, S. F. Sutton, and L. Lux. 2002. Systems to rate the strength of scientific evidence. Evidence Report/Technology Assessment No. 47 (prepared by the Research Triangle Institute–University of North Carolina Evidence-based Practice Center under Contract No. 290-97-0011). AHRQ Publication No. 02-E016:64–88. West, S. L., G. Gartlehner, A. J. Mansfield, C. Poole, E. Tant, N. Lenfestey, L. J. Lux, J. Amoozegar, S. C. Morton, T. C. Carey, M. Viswanathan, and K. N. Lohr. 2010. Comparative effectiveness review methods: Clinical heterogeneity. http://www.effectivehealthcare.ahrq.gov/ehc/products/93/533/Clinical_Heteogeneity_Revised_Report_FINAL%209-24-10.pdf (accessed September 28, 2010).