1
Introduction
Abstract: This chapter presents the objectives and context for this report and describes the approach that the Institute of Medicine (IOM) Committee on Standards for Systematic Reviews of Comparative Effectiveness Research used to undertake the study. The committee’s charge was two-fold: first, to assess potential methodological standards that would assure objective, transparent, and scientifically valid systematic reviews (SRs) of comparative effectiveness research and, second, to recommend a set of methodological standards for developing and reporting such SRs. A companion IOM committee was charged with developing standards for trustworthy clinical practice guidelines.
Healthcare decision makers in search of the best evidence to inform clinical decisions have come to rely on systematic reviews (SRs). Well-conducted SRs systematically identify, select, assess, and synthesize the relevant body of research, helping to make clear what is known and not known about the potential benefits and harms of alternative drugs, devices, and other healthcare services. Thus, SRs of comparative effectiveness research (CER) can be essential for clinicians who strive to integrate research findings into their daily practices, for patients making well-informed choices about their own care, for professional medical societies and other organizations that develop clinical practice guidelines (CPGs), and for payers and policy makers. A brief overview of the current producers and users of SRs is provided at the end of this chapter. SRs can also inform medical coverage decisions and be used to set agendas and funding for primary research by highlighting gaps in evidence. Although the importance of SRs is gaining appreciation, the quality of published SRs is variable and often poor (Glasziou et al., 2008; Hopewell et al., 2008b; Liberati et al., 2009; Moher et al., 2007). In many cases, the reader cannot judge the quality of an SR because the methods are poorly documented (Glenton et al., 2006). Even when methods are described, they may be applied inappropriately, as in some meta-analyses (Glenny et al., 2005; Laopaiboon, 2003). One cannot assume that SRs, even when published in well-regarded journals, use recommended methods to minimize bias (Bassler et al., 2007; Colliver et al., 2008; Roundtree et al., 2008; Song et al., 2009; Steinberg and Luce, 2005; Turner et al., 2008). Many SRs fail to assess the quality of the included research (Delaney et al., 2007; Mallen et al., 2006; Tricco et al., 2008) and neglect to report funding sources (Lundh et al., 2009; Roundtree et al., 2008). A plethora of conflicting approaches to evidence hierarchies and grading schemes for bodies of evidence is a further source of confusion (Glasziou et al., 2004; Lohr, 2004; Schünemann et al., 2003).
In its 2008 report, Knowing What Works in Health Care: A Roadmap for the Nation, the Institute of Medicine (IOM) recommended that methodological standards be developed for SRs that focus on research on the effectiveness of healthcare interventions and for CPGs (IOM, 2008). The report concluded that decision makers would be helped significantly by development of standards for both SRs and CPGs, especially with respect to transparency, minimizing bias and conflict of interest, and clarity of reporting. The IOM report was soon followed by a congressional mandate in the Medicare Improvements for Patients and Providers Act of 2008 for two follow-up IOM studies: one to develop standards for conducting SRs, and the other to develop standards for CPGs. The legislation directs the IOM to recommend methodological standards to ensure that SRs and CPGs “are objective, scientifically valid, and consistent.”
In response to this congressional directive, the IOM entered into a contract with the Agency for Healthcare Research and Quality (AHRQ) in July 2009 to produce both studies at the same time.
The IOM appointed two independent committees to undertake the projects. The 16-member Committee on Standards for Systematic Reviews of Comparative Effectiveness Research included experts in biostatistics and epidemiology, CER, CPG development, clinical trials, conflict of interest, clinical care and delivery of healthcare services, consumer perspectives, health insurance, implementation science, racial and ethnic disparities, SR methods, and standards of evidence. Brief biographies of the SR committee members are presented in Appendix I. This report presents the findings and recommendations of the SR committee. A companion report, Clinical Practice Guidelines We Can Trust, presents the findings and recommendations of the Committee on Standards for Developing Trustworthy Clinical Practice Guidelines.
COMMITTEE CHARGE
The charge to the SR committee was two-fold: first, to assess potential methodological standards that would assure objective, transparent, and scientifically valid SRs of CER, and second, to recommend a set of methodological standards for developing and reporting such SRs (Box 1-1).
WHAT IS COMPARATIVE EFFECTIVENESS RESEARCH?
In recent years, various terms such as evidence-based medicine, health technology assessment, clinical effectiveness research, and comparative effectiveness research have been used to describe healthcare research that focuses on generating or synthesizing evidence to inform real-world clinical decisions (Luce et al., 2010). While the legislation that mandated this study used the term clinical effectiveness research, the committee could not trace the ancestry of the phrase and was uncertain about its meaning separate from the phrase comparative effectiveness research in general use by clinicians, researchers, and policy makers. Thus, this report adopts the more commonly used term, comparative effectiveness research, and defines CER as proposed in the IOM report Initial National Priorities for Comparative Effectiveness Research (IOM, 2009, p. 42):
CER is the generation and synthesis of evidence that compares the benefits and harms of alternative methods to prevent, diagnose, treat, and monitor a clinical condition or to improve the delivery of care. The purpose of CER is to assist consumers, clinicians, purchasers, and policy makers to make informed decisions that will improve health care at both the individual and population levels.

BOX 1-1 Charge to the Committee on Standards for Systematic Reviews of Comparative Effectiveness Research
An ad hoc committee will conduct a study to recommend methodological standards for systematic reviews (SRs) of comparative effectiveness research on health and health care. The standards should ensure that the reviews are objective, transparent, and scientifically valid, and require a common language for characterizing the strength of the evidence. Decision makers should be able to rely on SRs of comparative effectiveness to know what is known and not known and to describe the extent to which the evidence is applicable to clinical practice and particular patients. In this context, the committee will:
Research that is compatible with the aims of CER has six defining characteristics (IOM, 2009):
- The objective is to inform a specific clinical decision.
- It compares at least two alternative interventions, each with the potential to be “best practice.”
- It addresses and describes patient outcomes at both a population and a subgroup level.
- It measures outcomes that are important to patients, including harms as well as benefits.
- It uses research methods and data sources that are appropriate for the decision of interest.
- It is conducted in settings as close as possible to the settings in which the intervention will be used.
Body of Evidence for Systematic Reviews of Comparative Effectiveness Research
The body of evidence for an SR of CER includes randomized controlled trials (RCTs) and observational studies such as cohort studies, cross-sectional studies, case-control studies, registries, and SRs themselves (Box 1-2). RCTs have an ideal design to answer questions about the comparative effects of different interventions across a wide variety of clinical circumstances. However, to be applicable to real-world clinical decision making, SRs should assess well-designed research on the comparative effectiveness of alternative treatments that includes a broad range of participants, describes results at both the population and subgroup levels, measures outcomes (both benefits and harms) that are important to patients, and reflects results in settings similar to those in which the intervention is used in practice. Many RCTs lack these features (IOM, 2009). As a result, in certain situations and for certain questions, decision makers find it limiting to use SRs that are confined to RCTs.

BOX 1-2 Types of Comparative Effectiveness Research Studies
Experimental study: A study in which the investigators actively intervene to test a hypothesis.
Observational study: A study in which investigators simply observe the course of events.
Systematic review (SR): A scientific investigation that focuses on a specific question and that uses explicit, planned scientific methods to identify, select, assess, and summarize the findings of similar but separate studies. It may or may not include a quantitative synthesis (meta-analysis) of the results from separate studies.
SOURCE: Adapted from Last (1995).
Observational research is particularly useful for identifying an intervention’s potential for unexpected effects or harms because many adverse events are too rare to be observed during typical RCTs or do not occur until after the trial ends (Chou et al., 2010; Reeves et al., 2008). Moreover, observational studies may provide evidence about the performance of an intervention in everyday practice or about outcomes that were not evaluated in available RCTs (Box 1-3). Despite their potential advantages, however, observational studies are at greater risk of bias than randomized studies for determining intervention effectiveness.

BOX 1-3 Four Examples of the Use of Observational Studies in Systematic Reviews of Comparative Effectiveness Research

Important outcomes are not captured in randomized controlled trials (RCTs)
More than 50 RCTs of triptans focused on the speed and degree of migraine pain relief related to a few isolated episodes of headache. These trials provided no evidence about two outcomes important to patients: the reliability of migraine relief from episode to episode over a long period of time, and the overall effect of use of the triptan on work productivity. The best evidence for these outcomes came from a time-series study based on employment records merged with prescription records comparing work days lost before and after a triptan became available. Although the study did not compare one triptan with another, it provided data that a particular triptan improved work productivity—information that was not available in RCTs.

Available trials of antipsychotic medications for schizophrenia included a narrow spectrum of participants and only evaluated short-term outcomes
In a systematic review (SR) of antipsychotic medications, 17 short-term efficacy trials evaluated a relatively narrow spectrum of patients with schizophrenia, raising a number of questions: Is the effect size observed in the RCTs similar to that observed in practice? Do groups of patients excluded from the trials respond as frequently and as well as those included in the trials? Are long-term outcomes similar to short-term outcomes? For a broad spectrum of patients with schizophrenia who are initiating treatment with an atypical antipsychotic medication, which drugs have better persistency and sustained effectiveness over longer term follow-up (e.g., 6 months to 2 years)? Given the many questions not addressed by RCTs, the review authors determined that they would examine and include observational studies. Meta-analyses of RCTs were conducted where appropriate, but most of the data were summarized qualitatively.

Participants in trials comparing percutaneous coronary intervention (PCI) versus coronary artery bypass graft (CABG) differed from patients seen in community practices
An SR of PCI versus CABG for coronary disease identified 23 relevant RCTs. At the outset, cardiothoracic surgical experts raised concerns that the trials enrolled patients with a relatively narrow spectrum of disease (generally single- or two-vessel disease) relative to patients receiving the procedures in current practice. Thus, the review included 96 articles reporting findings from 10 large cardiovascular registries. The registry data confirmed that the choice between the two procedures in the community varied substantially with extent of coronary disease. For patients similar to those enrolled in the trials, mortality results in the registries reinforced the findings from the trials (i.e., no difference in mortality between PCI and CABG). At the same time, the registries reported that the relative mortality benefits of PCI versus CABG varied markedly with extent of disease, raising caution about extending trial conclusions to patients with greater or lesser disease than those in the trial population.

Paucity of trial data on use of a commonly prescribed drug for a specific indication (heparin for burn injury)
In an SR on heparin to treat burn injury, the review team determined very early in its process that observational data should be included. Based on preliminary, cursory reviews of the literature and input from experts, the authors determined that there were few (if any) RCTs on the use of heparin for this indication. Therefore, they decided to include all types of studies that included a comparison group before running the main literature searches.

SOURCES: Adapted from Norris et al. (2010), including Bravata et al. (2007); Helfand and Peterson (2003); McDonagh et al. (2008); and Oremus et al. (2006).

STUDY SCOPE

This report presents methodological standards for SRs that are designed to inform everyday healthcare decision making, especially for patients, clinicians and other healthcare providers, and developers of CPGs. The focus is on the development and reporting of comprehensive, publicly funded SRs of the comparative effectiveness of therapeutic medical or surgical interventions.
The recent health reform legislation underscores the imperative for establishing SR standards, calling for a new research institute similar to the national program envisioned in Knowing What Works. The Patient Protection and Affordable Care Act of 2010 created the nation’s first nonprofit, public–private Patient-Centered Outcomes Research Institute (PCORI). It will be responsible for establishing and implementing a research agenda—including SRs of CER—to help patients, clinicians, policy makers, and purchasers in making informed healthcare decisions. As this report was being developed, the plans for PCORI were underway. An initial task of the newly appointed PCORI governing board is to establish a standing methodology committee charged with developing and improving the science and methods of CER. The IOM committee undertook its work with the intention to inform the PCORI methodology committee’s own standards development. The IOM committee also views other public sponsors of SRs of CER as key audiences for this report, including the AHRQ Effective Health Care Program, Medicare Evidence Development & Coverage Advisory Committee (MEDCAC), Drug Effectiveness Research Project (DERP), National Institutes of Health, Centers for Disease Control and Prevention, and U.S. Preventive Services Task Force. See Table 1-1 for a brief overview of the statutory requirements for PCORI.
Outside the Scope of the Study
As noted earlier, this report focuses on methods for producing comprehensive, publicly funded SRs of the comparative effectiveness of therapeutic interventions. The report’s recommended standards are not intended for SRs initiated and conducted for purely academic purposes. Nor does the report address SR methods for synthesizing research on diagnostic tests, disease etiology or prognosis, systems improvement, or patient safety practices. The evidence base and expert guidance for SRs on these topics are considerably less advanced. For example, while the Cochrane Collaboration issued the fifth edition of its handbook for SRs of interventions in 2008 (Higgins and Green, 2008), a Cochrane diagnostics handbook is still under development (Cochrane Collaboration Diagnostic Test Accuracy Working Group, 2011). AHRQ methods guidance for SRs of diagnostics and prognosis is also underway.

TABLE 1-1 Statutory Requirements for the Patient-Centered Outcomes Research Institute
Topics addressed: Purpose; Organization; Funding; Oversight; Research; Dissemination and transparency
SOURCE: Clancy and Collins (2010).
Finally, the utility of an SR is only as good as the body of individual studies available. A considerable literature documents the shortcomings of reports of individual clinical trials and observational research (Altman et al., 2001; Glasziou et al., 2008; Hopewell et al., 2008b; Ioannidis et al., 2004; Plint et al., 2006; von Elm et al., 2007). This report will emphasize that the quality of individual studies must be scrutinized during the course of an SR. However, it is beyond the scope of this report to examine the many quality-scoring systems that
have been developed to measure the quality of individual research studies (Brand, 2009; Hopewell et al., 2008a; Moher et al., 2010).
Relationship with the Committee on Standards for Developing Trustworthy Clinical Practice Guidelines
The boundaries of this study were defined in part by the work of the companion CPG study (Box 1-4). A coordinating group for the two committees met regularly to consider the interdependence of SRs and CPGs and to minimize duplication of effort. The coordinating group agreed early on that SRs are critical inputs to the guideline development process. It also decided that the SR committee would limit its focus to the development of SRs—starting with the formulation of the research question and ending with the completion of a final report—while paying special attention to the role of SRs in supporting CPGs. At the same time, the CPG committee would work under the assumption that guideline developers have access to high-quality SRs (as defined by the SR committee’s recommended standards) that address their specific research questions, and would discuss what steps in an SR are particularly important for a CPG. In Chapter 2 of this report, the SR committee addresses how the SR and CPG teams may interact when an SR is being conducted to inform a specific CPG.
CONCEPTUAL FRAMEWORK
Fundamentals of Systematic Reviews
Experts agree on many of the key attributes of a high-quality SR (CRD, 2009; Higgins and Green, 2008; Owens et al., 2010). The objective of an SR is to answer a specific research question by using an explicit, preplanned protocol to identify, select, assess, and summarize the findings of similar but separate studies. SRs often include—but do not require—a quantitative synthesis (meta-analysis). The SR process can be summarized in six steps:
Step 1: Initiate the process, organize the review team, develop a process for gathering user and stakeholder input, formulate the research question, and implement procedures for minimizing the impact of bias and conflicts of interest (see standards in Chapter 2).
Step 2: Develop the review protocol, including the context and rationale for the review and the specific procedures for the search strategy, data collection and extraction, qualitative synthesis and quantitative data synthesis (if a meta-analysis is done), reporting, and peer review (see standards in Chapter 2).
Step 3: Systematically locate, screen, and select the studies for review (see standards in Chapter 3).
Step 4: Appraise the risk of bias in the individual studies and extract the data for analysis (see standards in Chapter 3).
Step 5: Synthesize the findings and assess the overall quality of the body of evidence (see standards in Chapter 4).
Step 6: Prepare a final report and have the report undergo peer review (see standards in Chapter 5).

BOX 1-4 Charge to the Committee on Standards for Developing Trustworthy Clinical Practice Guidelines
An ad hoc committee will conduct a study to recommend standards for developing clinical practice guidelines and recommendations. The standards should ensure that clinical practice guidelines are unbiased, scientifically valid, and trustworthy and also incorporate separate grading systems for characterizing quality of available evidence and strength of clinical recommendations. In this context, the committee should:
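Step 5 calls for synthesizing the findings, which may include a quantitative synthesis (meta-analysis). The report itself prescribes no particular formula or software, but the most common approach pools study-level effect estimates with inverse-variance weights. As a rough, purely illustrative sketch (the function name and the three study estimates below are invented for this example):

```python
import math

def fixed_effect_pool(effects, std_errors):
    """Inverse-variance fixed-effect meta-analysis.

    effects: per-study effect estimates (e.g., log odds ratios)
    std_errors: the corresponding standard errors
    Returns the pooled estimate and its standard error.
    """
    # Each study is weighted by the inverse of its variance,
    # so more precise studies contribute more to the pooled estimate.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

# Hypothetical log odds ratios and standard errors from three trials
pooled, se = fixed_effect_pool([-0.5, -0.3, -0.4], [0.2, 0.25, 0.15])
ci_lower, ci_upper = pooled - 1.96 * se, pooled + 1.96 * se  # 95% CI
```

A real synthesis would also assess heterogeneity across studies and often use a random-effects model instead; this sketch shows only the basic weighting principle behind quantitative synthesis.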
SRs of CER can be narrow in scope and consist of simple comparisons, such as drug X versus drug Y. They can also address broader topics, including comparisons of the effectiveness of drugs versus surgery for a condition, or of “watchful waiting” when it is a reasonable strategy in a clinical context (IOM, 2009). These more complex reviews often include multiple clinical questions, each requiring a separate review of the literature, analysis, and synthesis. The committee’s standards apply to both narrow and broad SRs of CER.
The Purpose of Setting Standards
Most disciplines establish standards to articulate their agreed-on performance expectations and to promote accountability for meeting these expectations. Users of SRs and the public have the right to expect that SRs meet minimum standards for objectivity, transparency, and scientific rigor (as the legislative mandate for this study required). For the purposes of this report, the committee defined an SR “standard” as meaning:
A process, action, or procedure for performing SRs that is deemed essential to producing scientifically valid, transparent, and reproducible results. A standard may be supported by scientific evidence, by a reasonable expectation that the standard helps achieve the anticipated level of quality in an SR, or by the broad acceptance of the practice in SRs.
The principal objectives of applying standards to SR methods are: (1) to improve the usefulness of SRs for patients, clinicians, and guideline developers; (2) to increase the impact of SRs on clinical outcomes; (3) to encourage stakeholder “buy-in” and trust in SRs; and (4) to minimize the risks of error and bias. The fourth objective is an essential precursor to the first three. An SR must minimize bias in identifying, selecting, and interpreting evidence to be credible.
METHODS OF THE STUDY
The committee deliberated during four in-person meetings and numerous conference calls between October 2009 and October 2010. During its second meeting, the committee convened a public workshop to learn how various stakeholders use and develop SRs. Panels of SR experts, professional specialty societies, payers, and consumer advocates provided testimony in response to a series of questions posed by the committee in advance of the event. Appendix C provides the workshop agenda and questions. Other experts from selected organizations were also interviewed by committee staff.
Developing the SR Standards
The committee faced a difficult task in proposing a set of standards in a field where the evidence is generally thin, especially with respect to linking characteristics of SRs to clinical outcomes, the ultimate test of quality. There have been important advances in SR methods in recent years. However, the field remains relatively young, and the available evidence does not suggest that high-quality SRs can be done quickly and cheaply. For example, literature searching and data extraction, two fundamental steps in the SR process, are very resource intensive, but there is little research to suggest how to make these processes more efficient. Similarly, as noted earlier, observational data can alert researchers to an intervention’s potential for harm, but there is little methodological research on ways to identify, assess, or incorporate high-quality observational data in an SR. Moreover, whereas this report concerns the production of comprehensive SR final reports, most research on SR methods focuses on the abridged, page-limited versions of SRs that appear in peer-reviewed journals.
Thus, the committee employed a multistep process to identify, assess, and select potential SR standards. It began by developing a set of assessment criteria, described below, to guide its selection of SR standards (Table 1-2). The next steps were to document expert guidance and to collect the available empirical research on SR methods. In addition, the committee commissioned two reports: one on the role of consumers in developing SRs in the United States, and another that helped identify the evidence base for the steps in the SR process.
Criteria for Assessing Potential Standards
TABLE 1-2 Committee Criteria for Assessing Potential Standards and Elements for Systematic Reviews
Acceptability or credibility: Cultivates stakeholder understanding and acceptance of findings
Applicability or generalizability: Is consistent with the aim of comparative effectiveness research (CER): to assist consumers, clinicians, purchasers, and policy makers to make informed decisions that will improve health care at both the individual and population levels
Efficiency of conducting the review: Avoids unnecessary burden and cost of the process of conducting the review, and allows completion of the review in a timely manner
Patient-centeredness: Shows respect for and responsiveness to individual patient preferences, needs, and values; helps ensure that patient values and circumstances guide clinical decisions
Scientific rigor: Improves objectivity, minimizes bias, provides reproducible results, and fosters more complete reporting
Timeliness: Ensures currency of the review
Transparency: Ensures that methods are explicitly defined, consistently applied, and available for public review so that observers can readily link judgments, decisions, or actions to the data on which they are based; allows users to assess the strengths and weaknesses of the systematic review or clinical practice guideline

The overarching goals of the criteria are to increase the usefulness of SRs for patient and clinician decisions while minimizing the risks of error and bias. The following describes the committee’s rationale for each criterion:
- Acceptability (credibility): If clinicians, guideline developers, or patients are unlikely to accept the findings of SRs, the costs of conducting the SRs could be for naught. Some SR standards are necessary to enhance the review’s overall credibility. For example, a standard requiring that the review team solicit consumer input as it formulates the review questions enhances credibility.
- Applicability (generalizability): Healthcare interventions found to be effective in one patient population may not be effective in other patient populations. SRs should address the relevance of the available evidence to actual patients. Evidence on how outcomes vary among different types of patients is essential to developing usable CPGs and other types of clinical advice (Boyd et al., 2005; Tinetti et al., 2004; Vogeli et al., 2007). Patients seen in everyday clinical practice are more diverse than participants in clinical trials, particularly with respect to age, gender, race and ethnicity, health status, comorbidities, and other clinically relevant factors (Pham et al., 2007; Slone Survey, 2006; Vogeli et al., 2007).
- Efficiency: Despite the potential benefit of standardizing some aspects of SRs, the decision to impose a standard must consider the cost implications, in both time and economic resources. Some standards, such as requiring two reviewers to screen individual studies, may add cost but be necessary because empirical evidence shows that the standard meaningfully improves the reliability of the SR (Edwards et al., 2002). Alternatively, the evidence may suggest that the additional expense is not always warranted. For example, for some topics, collecting and translating non-English literature may ensure a comprehensive collection of research, but it may not be worth the cost if the research question is confined to an English-language-only region (e.g., school lunches) (Moher et al., 2000, 2003; Morrison et al., 2009).
- Patient-centeredness: Patients want to know which healthcare services work best for them as individuals. Focusing on the patient is integral to improving the quality of health care (IOM, 2001, 2008). SRs of research on comparative effectiveness should focus on informing decisions about the care patients receive by addressing the questions of consumers, practicing clinicians, and developers of CPGs. For example, a standard that requires the review team to solicit feedback from patients about which clinical outcomes to address in the review would enhance patient-centeredness.
- Scientific rigor: Potential standards should be considered if evidence shows that they increase the scientific rigor of the review. SRs are most likely to benefit patient care if the underlying methods are objective and fully reported, minimize risk of bias, and yield reproducible results. For example, a standard that requires use of appropriate statistical techniques to synthesize data from the body of research enhances scientific rigor.
- Timeliness: If an SR is out of date, it may not analyze important new clinical information on the benefits or harms of an intervention. Decision makers require up-to-date information. When new discoveries reveal serious risk of harm or introduce a new and superior alternative treatment, updating the review or commissioning a new one is critical. For example, a standard that requires a review to consider relevant research within a recent timeframe would enhance timeliness.
- Transparency: Without transparency, the integrity of an SR remains in question. Transparency requires that methods be reported in detail and be available to the public. This enables readers to judge the quality of the review and to interpret any decisions based on the review’s conclusions. For example, standards that require thorough reporting of review methods, funding sources, and conflicts of interest would facilitate transparency.
Expert Guidance
The committee’s next step was to consult with and review the published methods manuals of leading SR experts—at AHRQ, Centre for Reviews and Dissemination (CRD) (University of York, UK), and the Cochrane Collaboration—to document state-of-the-art guidance on best practices. Experts at other organizations were also consulted to finalize the committee’s detailed list of essential steps and considerations in the SR process. These organizations were DERP, the ECRI Institute, National Institute for Health and Clinical Excellence (UK), and several Evidence-based Practice Centers (EPCs) (with assistance from AHRQ staff).
With this information, the committee’s assessment criteria, and the research of commissioned authors and staff, the committee evaluated and revised the list of steps and best practices in the SR process through several iterations. The committee took a cautious approach to developing standards. All of the committee’s recommended standards are based on current evidence, expert guidance (and are actively used by many experts), and thoughtful reasoning. Thus, the proposed standards are reasonable “best practices” for reducing bias and increasing the scientific rigor of SRs of CER.
In its use of the term “standard,” the committee recognizes that its recommendations will not be the final word. Standards must always be considered provisional, pending additional evidence and experience. The committee supports future research that would identify better methods that meet both the goals of scientific rigor and efficiency in producing SRs.
The committee’s proposed standards are presented in Chapters 2–5. Each standard is articulated in the same format: first, a brief statement of the step in the SR process (e.g., in Chapter 3, Standard 3.1: Conduct a comprehensive systematic search for evidence), followed by a series of elements of performance. These elements are essential components of the standard that should be performed for all publicly funded SRs of CER. Thus, Standard 3.1, for example, includes several elements that are integral to conducting a comprehensive search (e.g., “design a search strategy to address each key research question,” “search bibliographic databases”). Box 1-5 describes the committee’s numbering system for the recommended standards.
Collectively, the standards and elements present a daunting task. Few, if any, members of the committee have participated in an SR that fully meets all of them. Yet the evidence and experience are strong enough that it is impossible to ignore these standards or to hope that one can safely cut corners. The standards will be especially valuable for SRs of high-stakes clinical questions with broad population impact, where the use of public funds to get the right answer justifies careful attention to the rigor with which the SR is conducted. Individuals involved in SRs should be thoughtful about all of the standards and elements, using their best judgment if resources are inadequate to implement all of them, or if some seem inappropriate for the particular task or question at hand. Transparency in reporting the methods actually used and the reasoning behind the choices are among the most important of the standards recommended by the committee.

BOX 1-5
Numbering System for the Committee’s Recommended Systematic Review Standards

The recommended systematic review (SR) standards are presented in Chapters 2–5. For easy reference within the report, the recommended standards and related elements of performance are numbered according to chapter number and sequence within chapters using the convention “x.y.z.” The first number (x) refers to the chapter number; the second number (y) refers to the standard; and the third number (z) refers to the essential element of the standard, where applicable. For example, the first standard in Chapter 3 is:

Standard 3.1 Conduct a comprehensive systematic search for evidence
Required elements:
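Purely as an illustration of the “x.y.z” convention described in Box 1-5, an identifier such as “3.1.2” can be split mechanically into chapter, standard, and optional element. The following Python sketch is hypothetical (the function name and return shape are not part of the report):

```python
def parse_standard_id(identifier: str) -> dict:
    """Split an identifier written in the report's "x.y.z" convention
    into its parts: chapter (x), standard (y), and optional element (z)."""
    parts = identifier.split(".")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected 'x.y' or 'x.y.z', got {identifier!r}")
    chapter, standard = int(parts[0]), int(parts[1])
    # The element number is present only when the identifier names an
    # essential element of a standard, not the standard itself.
    element = int(parts[2]) if len(parts) == 3 else None
    return {"chapter": chapter, "standard": standard, "element": element}

# Standard 3.1 names a standard (no element); 3.1.2 names its second element.
print(parse_standard_id("3.1"))
print(parse_standard_id("3.1.2"))
```

This is only a convenience for readers who track the standards programmatically; the report itself uses the identifiers as plain labels.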
CURRENT LANDSCAPE
This section provides a brief overview of the major producers, users, and other stakeholders involved in SRs.
Producers of Systematic Reviews
A number of public- and private-sector organizations produce SRs. As noted earlier, the committee focused much of its review on the methods of AHRQ, the Cochrane Collaboration, and CRD. However, many other organizations play a key role in sponsoring, conducting, and disseminating SRs. Some of the key U.S. and international organizations are described below.
U.S. Organizations
In the United States, the federal government funds a number of SRs, primarily through the AHRQ EPCs (Table 1-3). Private organizations also conduct SRs of CER, including the Blue Cross and Blue Shield Association’s Technology Evaluation Center, the ECRI Institute, and Hayes, Inc. (Table 1-4).
International Organizations
The U.S. SR enterprise is part of a larger international effort focused on SRs. Many international organizations have highly sophisticated SR programs that not only produce SRs, but also conduct research on how best to produce them. Table 1-5 describes several leading international SR organizations.
Users and Stakeholders
This report uses the terms “users” and “stakeholders” to refer to individuals and organizations that are likely to consult a specific SR to guide decision making or that have a particular interest in the outcome of an SR. Table 1-6 lists examples of user and stakeholder organizations that use SRs to inform decision making. The report focuses on four major categories of users and stakeholders: (1) consumers, including patients, families, and informal (or unpaid) caregivers; (2) clinicians, including physicians, nurses, and other healthcare professionals; (3) payers; and (4) policy makers, including guideline developers and other SR sponsors.

TABLE 1-3 Examples of U.S. Governmental Organizations That Produce Systematic Reviews

Agency for Healthcare Research and Quality (AHRQ) Effective Health Care Program: In 1997, AHRQ established 12 Evidence-based Practice Centers (EPCs) to promote evidence-based practice in everyday care. AHRQ awards 5-year contracts to EPCs to develop evidence reports and technology assessments. Currently, there are 14 EPCs in university and private settings. The U.S. Department of Veterans Affairs, the U.S. Preventive Services Task Force, and the Centers for Medicare & Medicaid Services use EPC reviews.

Centers for Disease Control and Prevention (CDC): The CDC supports two programs for systematic reviews: the Guide to Community Preventive Services, initiated in 1996 and focused on synthesizing evidence related to public health interventions, and the HIV/AIDS Prevention Research Synthesis, established in 1996 to review and summarize the HIV behavioral prevention research literature.

Substance Abuse and Mental Health Services Administration (SAMHSA): Since 1997, SAMHSA has provided information about the scientific basis and practicality of interventions that prevent or treat mental health and substance abuse disorders through the National Registry of Evidence-based Programs and Practices.

SOURCES: Adapted from GAO (2009), IOM (2008).
ORGANIZATION OF THE REPORT
Chapter Objectives
This introductory chapter has described the background, charge to the committee, study scope, conceptual framework, current landscape, and methods for this report. Chapter 2 through Chapter 5 present the committee’s review of and recommended standards for the basic steps in an SR. Chapter 6 provides a summary of the committee’s conclusions and recommendations.
TABLE 1-4 Examples of Private U.S. Organizations That Produce Systematic Reviews

Blue Cross and Blue Shield Association (BCBSA), Technology Evaluation Center (TEC): BCBSA founded TEC in 1985 to provide decision makers with objective assessments of comparative effectiveness. TEC serves a wide range of clients in both the private and public sectors, including Kaiser Permanente and the Centers for Medicare & Medicaid Services. TEC is a designated Evidence-based Practice Center (EPC), and its products are publicly available.

ECRI Institute: The ECRI Institute is a nonprofit organization that provides technology assessments and cost-effectiveness analyses to its members and clients, including hospitals; health systems; public and private payers; U.S. federal and state government agencies; ministries of health; voluntary-sector organizations; associations; and accrediting agencies. Its products and methods are generally not available to the public. The ECRI Institute is a designated EPC and a Collaborating Center for the World Health Organization.

Hayes, Inc.: Hayes, Inc., is a for-profit organization established in 1989 to develop health technology assessments for health organizations, including health plans, managed-care companies, hospitals, and health networks. Hayes, Inc., produces several professional products, including the Hayes Briefs, the Hayes Directory, and the Hayes Outlook. Its products and methods are generally not available to the public.

SOURCE: Adapted from IOM (2008).
Chapter 2, Standards for Initiating a Systematic Review, focuses on the early steps in an SR that define the objectives of the review and influence its ultimate relevance to clinical decisions: establishing the review team, ensuring user and stakeholder input, managing bias and conflict of interest, and formulating the research topic and review protocol.
Chapter 3, Standards for Finding and Assessing Individual Studies, focuses on a central step in the SR process: the identification, collection, screening, and appraisal of the individual studies that make up an SR’s body of evidence.
Chapter 4, Standards for Synthesizing the Body of Evidence, focuses on considerations in the synthesis and assessment of the body of evidence that are key to ensuring objectivity, transparency, and scientific rigor.
TABLE 1-5 Examples of International Organizations That Produce Systematic Reviews

Cochrane Collaboration: Founded in 1993, the Cochrane Collaboration is an independent, nonprofit, multinational organization that produces systematic reviews (SRs) of healthcare interventions. Cochrane SRs are prepared by researchers who work with one or more of 52 Cochrane Review Groups, which are overseen by an elected Steering Committee. Editorial teams oversee the preparation and maintenance of the SRs and the application of quality standards. Cochrane’s global contributors and centers are funded by government agencies and private sources; its central infrastructure is supported by subscriptions to The Cochrane Library. Commercial funding of review groups is not allowed. Cochrane review abstracts and plain-language summaries are free; complete SRs are available by subscription. The Cochrane Database of Systematic Reviews includes more than 6,000 protocols and SRs.

Centre for Reviews and Dissemination (CRD): CRD is part of the National Institute for Health Research (NIHR) and a department of the University of York in the UK. Founded in 1994, CRD produces SRs of health interventions, SR methods research, and guidance for conducting SRs. CRD also produces the Database of Abstracts of Reviews of Effects (DARE), the National Health Service Economic Evaluation Database, and the Health Technology Assessment Database, which are used internationally by health professionals, policy makers, and researchers. An international prospective registry of SRs using existing database infrastructure is also under development. DARE includes over 19,000 records of SRs of healthcare interventions, including more than 10,000 critical abstracts that summarize the methods and findings of published reviews, highlighting their strengths and weaknesses; approximately 1,200 new critical abstracts are added annually. CRD is funded primarily through NIHR, with some funding from other government agencies. To avoid conflicts of interest, CRD has a policy of not undertaking research for, or receiving funds from, the pharmaceutical or medical device industries.

Campbell Collaboration: The Campbell Collaboration is an international research network that produces SRs of the effects of social interventions. It was established in 2000 and has five Coordinating Groups: Social Welfare, Crime and Justice, Education, Methods, and Users. The Coordinating Groups oversee the production, scientific merit, and relevance of the SRs. Final SRs are published in the peer-reviewed monograph series Campbell Systematic Reviews. The International Secretariat is hosted by the Norwegian Knowledge Centre for the Health Services.

National Institute for Health and Clinical Excellence (NICE): NICE was established in 1999 as part of the UK’s National Health Service (NHS). It provides guidance to the NHS, sets quality standards, and manages a national database to improve health and to prevent and treat ill health. NICE commissions SRs on new and existing technologies from independent academic centers and then uses the SRs to make recommendations to the NHS on how a technology should be used.

SOURCES: Information on the Cochrane Collaboration was adapted from IOM (2008). Information on CRD, the Campbell Collaboration, and NICE: The Campbell Collaboration (2010); CRD (2010); NICE (2010).
TABLE 1-6 Examples of Organizations That Use Systematic Reviews

Drug Effectiveness Review Project (DERP): DERP is a collaboration of public and private organizations, including 13 state programs, that develops reports assessing the comparative effectiveness and safety of drugs within particular drug classes. Evidence-based Practice Centers (EPCs) conduct evidence reviews for DERP. State Medicaid programs have used this information to develop their drug formularies.

Medicare Evidence Development & Coverage Advisory Committee (MedCAC): The Centers for Medicare & Medicaid Services (CMS) established the Medicare Coverage Advisory Committee (now the Medicare Evidence Development & Coverage Advisory Committee [MedCAC]) in 1998 to provide independent expert advice to CMS on specific clinical topics. MedCAC reviews and evaluates the medical literature and technology assessments on medical items and services under evaluation at CMS, including systematic reviews (SRs) produced by the EPCs and other SR producers. MedCAC can be an integral part of the national coverage determination process; it is advisory in nature, and CMS is responsible for all final decisions.

NIH Consensus Development Program (CDP): The CDP produces consensus statements on the effects of healthcare interventions. It convenes independent panels of researchers, health professionals, and public representatives who consider the literature reviews conducted by EPCs, as well as expert testimony. Topics are chosen based on their public health importance, prevalence, controversy, potential to reduce gaps between knowledge and practice, availability of scientific information, and potential impact on healthcare costs.

Performance measurement organizations: Performance measurement organizations track and evaluate provider performance by measuring providers’ actual clinical practices against recommended practices. To conduct this work, these organizations typically establish standards of care based on SRs, against which the performance of providers can be assessed. Examples of performance measurement organizations include the AQA Alliance and the National Quality Forum.

Professional medical societies: Many professional medical societies have instituted processes and dedicated resources to developing clinical practice guidelines on the basis of SRs. Examples of societies with well-established guideline development procedures include the American College of Cardiology/American Heart Association, the American College of Chest Physicians, the American Academy of Neurology, and the American Academy of Pediatrics.

U.S. Preventive Services Task Force (USPSTF): The USPSTF is a panel of private-sector experts that makes recommendations about which preventive services should be incorporated routinely into primary medical care. Its evidence-based recommendations are regarded as the “gold standard” for clinical preventive services. The USPSTF is supported by an EPC, which conducts SRs on relevant clinical prevention topics.

SOURCE: Adapted from IOM (2008).
Chapter 5, Standards for Reporting Systematic Reviews, focuses on the components of an SR final report that are fundamental to its eventual utility for patients, clinicians, and others.
Chapter 6, Improving the Quality of Systematic Reviews: Discussion, Conclusions, and Recommendations, presents the committee’s conclusions and recommendations for advancing the science underlying SR methods and for providing a more supportive environment for the conduct of SRs.
REFERENCES
Altman, D. G., K. F. Schulz, D. Moher, M. Egger, F. Davidoff, D. Elbourne, P. C. Gøtzsche, and T. Lang. 2001. The revised CONSORT statement for reporting randomized trials: Explanation and elaboration. Annals of Internal Medicine 134(8):663–694.
Bassler, D., I. Ferreira-Gonzalez, M. Briel, D. J. Cook, P. J. Devereaux, D. Heels-Ansdell, H. Kirpalani, M. O. Meade, V. M. Montori, A. Rozenberg, H. J. Schünemann, and G. H. Guyatt. 2007. Systematic reviewers neglect bias that results from trials stopped early for benefit. Journal of Clinical Epidemiology 60(9):869–873.
Boyd, C. M., J. Darer, C. Boult, L. P. Fried, L. Boult, and A. W. Wu. 2005. Clinical practice guidelines and quality of care for older patients with multiple comorbid diseases: Implications for pay for performance. JAMA 294(6):716–724.
Brand, R. A. 2009. Standards of reporting: The CONSORT, QUORUM, and STROBE guidelines. Clinical Orthopaedics and Related Research 467(6):1393–1394.
Bravata, D. M., K. M. McDonald, A. L. Gienger, V. Sundaram, M. V. Perez, R. Varghese, J. R. Kapoor, R. Ardehali, M. C. McKinnon, C. D. Stave, D. K. Owens, and M. Hlatky. 2007. Comparative effectiveness of percutaneous coronary interventions and coronary artery bypass grafting for coronary artery disease. Rockville, MD: AHRQ.
The Campbell Collaboration. 2010. About us. http://www.campbellcollaboration.org/about_us/index.php (accessed September 22, 2010).
Chou, R., N. Aronson, D. Atkins, A. S. Ismaila, P. Santaguida, D. H. Smith, E. Whitlock, T. J. Wilt, and D. Moher. 2010. Assessing harms when comparing medical interventions: AHRQ and the Effective Health-Care Program. Journal of Clinical Epidemiology 63(5):502–512.
Clancy, C., and F. S. Collins. 2010. Patient-Centered Outcomes Research Institute: The intersection of science and health care. Science Translational Medicine 2(37): 37cm18.
Cochrane Collaboration Diagnostic Test Accuracy Working Group. 2011. Handbook for DTA reviews. http://srdta.cochrane.org/handbook-dta-reviews (accessed March 15, 2011).
Colliver, J. A., K. Kucera, and S. J. Verhulst. 2008. Meta-analysis of quasi-experimental research: Are systematic narrative reviews indicated? Medical Education 42(9):858–865.
CRD (Centre for Reviews and Dissemination). 2009. Systematic reviews: CRD’s guidance for undertaking reviews in health care. York, U.K.: York Publishing Services, Ltd.
CRD. 2010. About CRD. http://www.york.ac.uk/inst/crd/about_us.htm (accessed September 22, 2010).
Delaney, A., S. M. Bagshaw, A. Ferland, K. Laupland, B. Manns, and C. Doig. 2007. The quality of reports of critical care meta-analyses in the Cochrane Database of Systematic Reviews: An independent appraisal. Critical Care Medicine 35(2):589–594.
Edwards, P., M. Clarke, C. DiGuiseppi, S. Pratap, I. Roberts, and R. Wentz. 2002. Identification of randomized controlled trials in systematic reviews: Accuracy and reliability of screening records. Statistics in Medicine 21:1635–1640.
GAO (Government Accountability Office). 2009. Program evaluation: A variety of rigorous methods can help identify effective interventions. Vol. GAO-10-30. Washington, DC: GAO.
Glasziou, P., J. Vandenbroucke, and I. Chalmers. 2004. Assessing the quality of research. BMJ 328(7430):39–41.
Glasziou, P., E. Meats, C. Heneghan, and S. Shepperd. 2008. What is missing from descriptions of treatment in trials and reviews? BMJ 336(7659):1472–1474.
Glenny, A. M., D. G. Altman, F. Song, C. Sakarovitch, J. J. Deeks, R. D’Amico, M. Bradburn, and A. J. Eastwood. 2005. Indirect comparisons of competing interventions. Health Technology Assessment 9(26):1–134.
Glenton, C., V. Underland, M. Kho, V. Pennick, and A. D. Oxman. 2006. Summaries of findings, descriptions of interventions, and information about adverse effects would make reviews more informative. Journal of Clinical Epidemiology 59(8):770–778.
Helfand, M., and K. Peterson. 2003. Drug class review on the triptans: Drug Effectiveness Review Project. Portland, OR: Oregon Evidence-based Practice Center.
Higgins, J. P. T., and S. Green, eds. 2008. Cochrane handbook for systematic reviews of interventions. Chichester, UK: John Wiley & Sons.
Hopewell, S., M. Clarke, D. Moher, E. Wager, P. Middleton, D. G. Altman, and K. F. Schulz. 2008a. CONSORT for reporting randomized controlled trials in journal and conference abstracts: Explanation and elaboration. PLoS Medicine 5(1):e20.
Hopewell, S., L. Wolfenden, and M. Clarke. 2008b. Reporting of adverse events in systematic reviews can be improved: Survey results. Journal of Clinical Epidemiology 61(6):597–602.
Ioannidis, J. P., J. W. Evans, P. C. Gøtzsche, R. T. O’Neill, D. Altman, K. Schulz, and D. Moher. 2004. Better reporting of harms in randomized trials: An extension of the CONSORT statement. Annals of Internal Medicine 141:781–788.
IOM (Institute of Medicine). 2001. Crossing the quality chasm: A new health system for the 21st century. Washington, DC: National Academy Press.
IOM. 2008. Knowing what works in health care: A roadmap for the nation. Edited by J. Eden, B. Wheatley, B. J. McNeil, and H. Sox. Washington, DC: The National Academies Press.
IOM. 2009. Initial national priorities for comparative effectiveness research. Washington, DC: The National Academies Press.
Laopaiboon, M. 2003. Meta-analyses involving cluster randomization trials: A review of published literature in health care. Statistical Methods in Medical Research 12(6):515–530.
Last, J. M., ed. 1995. A dictionary of epidemiology, 3rd ed. New York: Oxford University Press.
Liberati, A., D. G. Altman, J. Tetzlaff, C. Mulrow, P. C. Gøtzsche, J. P. A. Ioannidis, M. Clarke, P. J. Devereaux, J. Kleijnen, and D. Moher. 2009. The PRISMA statement for reporting systematic reviews and meta-analyses of studies that evaluate health care interventions: Explanation and elaboration. Annals of Internal Medicine 151(4):W1–W30.
Lohr, K. N. 2004. Rating the strength of scientific evidence: Relevance for quality improvement programs. International Journal for Quality in Health Care 16(1):9–18.
Luce, B. R., M. Drummond, B. Jönsson, P. J. Neumann, J. S. Schwartz, U. Siebert, and S. D. Sullivan. 2010. EBM, HTA, and CER: Clearing the confusion. Milbank Quarterly 88(2):256–276.
Lundh, A., S. L. Knijnenburg, A. W. Jorgensen, E. C. van Dalen, and L. C. M. Kremer. 2009. Quality of systematic reviews in pediatric oncology—A systematic review. Cancer Treatment Reviews 35(8):645–652.
Mallen, C., G. Peat, and P. Croft. 2006. Quality assessment of observational studies is not commonplace in systematic reviews. Journal of Clinical Epidemiology 59(8):765–769.
McDonagh, M., K. Peterson, S. Carson, R. Fu, and S. Thakurta. 2008. Drug class review: Atypical antipsychotic drugs. Update 3. Portland, OR: Oregon Evidence-based Practice Center.
Moher, D., B. Pham, T. P. Klassen, K. F. Schulz, J. A. Berlin, A. R. Jadad, and A. Liberati. 2000. What contributions do languages other than English make on the results of meta-analyses? Journal of Clinical Epidemiology 53(9):964–972.
Moher, D., B. Pham, M. L. Lawson, and T. P. Klassen. 2003. The inclusion of reports of randomised trials published in languages other than English in systematic reviews. Health Technology Assessment 7(41):1–90.
Moher, D., J. Tetzlaff, A. C. Tricco, M. Sampson, and D. G. Altman. 2007. Epidemiology and reporting characteristics of systematic reviews. PLoS Medicine 4(3):447–455.
Moher, D., K. F. Schulz, I. Simera, and D. G. Altman. 2010. Guidance for developers of health research reporting guidelines. PLoS Medicine 7(2):e1000217.
Morrison, A., K. Moulton, M. Clark, J. Polisena, M. Fiander, M. Mierzwinski-Urban, S. Mensinkai, T. Clifford, and B. Hutton. 2009. English-language restriction when conducting systematic review-based meta-analyses: Systematic review of published studies. Ottawa, ON: Canadian Agency for Drugs and Technologies in Health.
NICE (National Institute for Health and Clinical Excellence). 2010. About NICE. http://www.nice.org.uk/aboutnice/about_nice.jsp (accessed October 27, 2010).
Norris, S., D. Atkins, W. Bruening, S. Fox, E. Johnson, R. Kane, S. C. Morton, M. Oremus, M. Ospina, G. Randhawa, K. Schoelles, P. Shekelle, and M. Viswanathan. 2010. Selecting observational studies for comparing medical interventions. In Methods guide for comparative effectiveness reviews. http://effectivehealthcare.ahrq.gov/ehc/products/196/454/MethodsGuideNorris_06042010.pdf (accessed November 8, 2010).
Oremus, M., M. Hanson, R. Whitlock, E. Young, A. Gupta, A. Dal Cin, C. Archer, and P. Raina. 2006. The uses of heparin to treat burn injury. Evidence Report/Technology Assessment No. 148. AHRQ Publication No. 07-E004. Rockville, MD: AHRQ.
Owens, D. K., K. N. Lohr, D. Atkins, J. R. Treadwell, J. T. Reston, E. B. Bass, S. Chang, and M. Helfand. 2010. AHRQ Series Paper 5: Grading the strength of a body of evidence when comparing medical interventions: AHRQ and the Effective Health-Care Program. Journal of Clinical Epidemiology 63(5):513–523.
Pham, H. H., D. Schrag, A. S. O’Malley, B. Wu, and P. B. Bach. 2007. Care patterns in Medicare and their implications for pay for performance. New England Journal of Medicine 356(11):1130–1139.
Plint, A. C., D. Moher, A. Morrison, K. Schulz, D. G. Altman, C. Hill, and I. Gaboury. 2006. Does the CONSORT checklist improve the quality of reports of randomised controlled trials? A systematic review. Medical Journal of Australia 185(5):263–267.
Reeves, B. C., J. J. Deeks, J. Higgins, and G. A. Wells. 2008. Chapter 13: Including non-randomized studies. In Cochrane handbook for systematic reviews of interventions, edited by J. P. T. Higgins and S. Green. Chichester, UK: John Wiley & Sons.
Roundtree, A. K., M. A. Kallen, M. A. Lopez-Olivo, B. Kimmel, B. Skidmore, Z. Ortiz, V. Cox, and M. E. Suarez-Almazor. 2008. Poor reporting of search strategy and conflict of interest in over 250 narrative and systematic reviews of two biologic agents in arthritis: A systematic review. Journal of Clinical Epidemiology 62(2):128–137.
Schünemann, H., D. Best, G. Vist, and A. D. Oxman. 2003. Letters, numbers, symbols and words: How to communicate grades of evidence and recommendations. Canadian Medical Association Journal 169(7):677–680.
Slone Survey. 2006. Patterns of medication use in the United States. Boston, MA: Slone Epidemiology Center.
Song, F., Y. K. Loke, T. Walsh, A. M. Glenny, A. J. Eastwood, and D. G. Altman. 2009. Methodological problems in the use of indirect comparisons for evaluating healthcare interventions: Survey of published systematic reviews. BMJ 338:b1147.
Steinberg, E. P., and B. R. Luce. 2005. Evidence based? Caveat emptor! [editorial]. Health Affairs (Millwood) 24(1):80–92.
Tinetti, M. E., S. T. Bogardus, Jr., and J. V. Agostini. 2004. Potential pitfalls of disease-specific guidelines for patients with multiple conditions. New England Journal of Medicine 351(27):2870–2874.
Tricco, A. C., J. Tetzlaff, M. Sampson, D. Fergusson, E. Cogo, T. Horsley, and D. Moher. 2008. Few systematic reviews exist documenting the extent of bias: A systematic review. Journal of Clinical Epidemiology 61(5):422–434.
Turner, E. H., A. M. Matthews, E. Linardatos, R. A. Tell, and R. Rosenthal. 2008. Selective publication of antidepressant trials and its influence on apparent efficacy. New England Journal of Medicine 358(3):252–260.
Vogeli, C., A. Shields, T. Lee, T. Gibson, W. Marder, K. Weiss, and D. Blumenthal. 2007. Multiple chronic conditions: Prevalence, health consequences, and implications for quality, care management, and costs. Journal of General Internal Medicine 22(Suppl. 3):391–395.
von Elm, E., D. G. Altman, M. Egger, S. J. Pocock, P. C. Gøtzsche, and J. P. Vandenbroucke. 2007. The Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) statement: Guidelines for reporting observational studies. Annals of Internal Medicine 147(8):573–577.