TABLE D-1 Selected Approaches to Rating Strength of Evidence and Clinical Recommendations
System |
Focus/Audience |
Systems for Rating Evidence Quality |
International Approaches |
||
Grading of Recommendations Assessment, Development, and Evaluation (GRADE) Working Group (2009) |
Focus: Diagnosis and therapy |
Grades of evidence. Randomized trial: High. Observational study: Low. Any other evidence: Very low. |
Audience: Guideline developers |
||
Decrease grade if there are limitations in study quality, important inconsistency of results, uncertainty about the directness of the evidence, imprecise or sparse data, or a high risk of reporting bias. |
||
A voluntary, international collaboration |
||
Increase grade if there is a very strong association, evidence of a dose–response gradient, or if all plausible residual confounding would have reduced the observed effect. |
System for Rating Clinical Recommendations’ Strength |
Strong: Desirable effects clearly outweigh the undesirable effects, or clearly do not. Quality of evidence is high and other considerations support a strong recommendation. |
Weak: Trade-offs are less certain—either because of low-quality evidence or because evidence suggests that desirable and undesirable effects are closely balanced. The quality of evidence is high and other considerations support a weak recommendation. |
Based on:
|
NOTE: Many organizations claim to use GRADE but modify the system when translating evidence into clinical recommendations or guidelines. |
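The GRADE evidence rating in the rows above can be sketched in Python. This is a minimal illustration and not the Working Group's own tooling: the function name `grade_evidence`, the design keys, and the rule of moving exactly one level per factor are assumptions made for the example. The intermediate level Moderate is part of GRADE, although the table lists only the starting grades.

```python
# Hypothetical sketch of the GRADE evidence rating described above.
# Assumption: each downgrading or upgrading factor moves the grade one level.
LEVELS = ["Very low", "Low", "Moderate", "High"]

# Starting grade by study design, as listed in the table.
START = {
    "randomized_trial": "High",
    "observational_study": "Low",
    "other": "Very low",
}


def grade_evidence(design, downgrades=0, upgrades=0):
    """Return the evidence grade after applying the counted factors."""
    if design not in START:
        raise ValueError(f"unknown design: {design!r}")
    index = LEVELS.index(START[design]) - downgrades + upgrades
    # Clamp so the result never leaves the Very low .. High range.
    index = max(0, min(len(LEVELS) - 1, index))
    return LEVELS[index]


if __name__ == "__main__":
    # A randomized trial with one limitation (for example, imprecise data).
    print(grade_evidence("randomized_trial", downgrades=1))
    # An observational study with one upgrading factor (for example, a
    # dose-response gradient).
    print(grade_evidence("observational_study", upgrades=1))
```

Both calls in the usage section return Moderate under the one-level-per-factor assumption.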
Centre for Evidence-Based Medicine (CEBM) (2009) |
Focus: Prevention, diagnosis, prognosis, therapy, differential diagnosis/symptom prevalence, and economic and decision analyses |
CEBM is currently working on updating its level of evidence rankings and providing further rationale for them, tentatively due to become available in January 2010. |
One of several UK centers with the aim of promoting evidence-based health care |
||
This approach uses a different evidence rating system depending on the type of healthcare intervention. For example, the following rating system is used for therapy interventions: |
||
Audience: Doctors, clinicians, teachers, and others |
Level 1a: Systematic review (SR) of randomized controlled trials (RCTs) with homogeneity.a |
|
Level 1b: Individual RCT with narrow confidence interval. |
||
Level 1c: All or none case series.b |
||
Level 2a: SR with homogeneity of cohort studies. |
||
Level 2b: Individual cohort study (including low-quality RCT; e.g., <80% follow-up). |
||
Level 2c: Outcomes research, ecological studies.c |
||
Level 3a: SR with homogeneity of case control studies. |
||
Level 3b: Individual case control study. |
||
Level 4: Case series (and poor-quality cohort and case control studiesd). |
||
Level 5: Expert opinion without explicit critical appraisal, or based on physiology, bench research, or “first principles.” |
System for Rating Clinical Recommendations’ Strength |
A: Consistent level 1 studies. |
B: Consistent level 2 or 3 studies or extrapolationse from level 1 studies. |
C: Level 4 studies or extrapolations from level 2 or 3 studies. |
D: Level 5 evidence or troublingly inconsistent or inconclusive studies of any level. |
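The mapping from CEBM evidence levels to recommendation grades A through D can be written as a small lookup. The sketch below is illustrative only: the function name `cebm_grade` and its boolean parameters are assumptions, and it raises an error for the one combination the table does not cover (extrapolation from level 4 studies).

```python
# Hypothetical sketch of the CEBM level-to-grade mapping listed above.
def cebm_grade(level, consistent=True, extrapolated=False):
    """Map a CEBM level such as '1a', '2b', '4', or '5' to a grade.

    consistent   -- False for inconsistent or inconclusive studies
    extrapolated -- True when the evidence is an extrapolation
    """
    major = int(level[0])  # '2b' -> 2
    if major not in (1, 2, 3, 4, 5):
        raise ValueError(f"unknown level: {level!r}")
    if major == 5 or not consistent:
        return "D"
    if major == 1:
        return "B" if extrapolated else "A"
    if major in (2, 3):
        return "C" if extrapolated else "B"
    # Level 4: the table grades level 4 studies as C but says nothing
    # about extrapolating from them.
    if extrapolated:
        raise ValueError("no grade is listed for extrapolations from level 4")
    return "C"


if __name__ == "__main__":
    for args in [("1a",), ("1b", True, True), ("3b",), ("4",), ("5",)]:
        print(args, cebm_grade(*args))
```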
Scottish Intercollegiate Guidelines Network (SIGN) (2009) |
Focus: All healthcare interventions |
Levels of evidence: |
1++: High-quality meta-analyses, systematic reviews of RCTs, or RCTs with a very low risk of bias. |
1+: Well-conducted meta-analyses, systematic reviews, or RCTs with a low risk of bias. |
1−: Meta-analyses, systematic reviews, or RCTs with a high risk of bias. |
2++: High-quality systematic reviews of case control or cohort studies; high-quality case control or cohort studies with a very low risk of confounding or bias and a high probability that the relationship is causal. |
2+: Well-conducted case control or cohort studies with a low risk of confounding or bias and a moderate probability that the relationship is causal. |
2−: Case control or cohort studies with a high risk of confounding or bias and a significant risk that the relationship is not causal. |
3: Non-analytic studies, such as case reports, case series. |
4: Expert opinion. |
Audience: National Health Service in Scotland |
||
New Zealand Guidelines Group (NZGG) (2007) |
Focus: Screening, diagnosis, prognosis, and therapy |
The body of evidence is the sum of the evidence of all the individual studies and the quality ratings of each study. |
Independent, not-for-profit |
||
Good evidence: From studies of strong design for answering the question addressed. |
||
Audience: Clinical practitioners, policy makers, and consumers |
||
Fair evidence: Reasonable evidence, but there may be minimal inconsistency or uncertainty. |
||
Expert opinion: For some outcomes, trials or studies cannot be or have not been performed and practice is informed only by expert opinion. |
System for Rating Clinical Recommendations’ Strength |
SIGN: |
Guidelines are developed based on judgment on the consistency, clinical relevance, and external validity of the whole body of evidence. |
A: At least one meta-analysis, systematic review, or RCT rated as 1++, and directly applicable to the target population; or a body of evidence consisting principally of studies rated as 1+, directly applicable to the target population, and demonstrating overall consistency of results. |
B: A body of evidence including studies rated as 2++, directly applicable to the target population, and demonstrating overall consistency of results; or extrapolated evidence from studies rated as 1++ or 1+. |
C: A body of evidence including studies rated as 2+, directly applicable to the target population and demonstrating overall consistency of results; or extrapolated evidence from studies rated as 2++. |
D: Evidence level 3 or 4; or extrapolated evidence from studies rated as 2+. |
Good practice points: Occasionally, guideline development groups find that there is an important practical point that they wish to emphasize, but for which there is not, nor is there likely to be, any research evidence. This typically will be where some aspect of treatment is regarded as such sound clinical practice that nobody is likely to question it. These are shown in the guideline as Good Practice Points, and are marked with a green check. |
NZGG: |
The grade of the recommendation is based on consideration of
|
A: The recommendation is supported by GOOD evidence. |
B: The recommendation is supported by FAIR evidence. |
C: The recommendation is supported by EXPERT opinion (published) only. |
I: Evidence to make a recommendation is INSUFFICIENT. |
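The SIGN grades A through D above, which are keyed to evidence levels 1++ through 4, can be approximated with two lookup tables. This sketch is a simplification and its names are hypothetical: it assumes that direct evidence is applicable to the target population and consistent, and it returns `None` for levels to which the table assigns no grade (1− and 2−).

```python
# Hypothetical, simplified sketch of the SIGN level-to-grade mapping.
# Assumption: direct evidence is applicable to the target population and
# shows overall consistency of results.
DIRECT = {"1++": "A", "1+": "A", "2++": "B", "2+": "C", "3": "D", "4": "D"}

# Grades for evidence extrapolated from studies at the given level.
EXTRAPOLATED = {"1++": "B", "1+": "B", "2++": "C", "2+": "D"}


def sign_grade(level, extrapolated=False):
    """Return the SIGN grade, or None if the table lists none."""
    table = EXTRAPOLATED if extrapolated else DIRECT
    return table.get(level)


if __name__ == "__main__":
    print(sign_grade("1++"))                     # direct high-quality RCT evidence
    print(sign_grade("2++", extrapolated=True))  # extrapolated cohort evidence
```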
The Canadian Hypertension Education Program (2007) |
Focus: Diagnosis and therapy related to hypertension |
Uses flow charts to assess the evidence according to study methodology: |
A: RCT with blinded assessment of outcomes, intention-to-treat analysis, adequate follow-up, and sufficient sample size to detect a clinically important difference with power >80%. |
||
A Canadian volunteer, non-profit organization |
Audience: Canadian Diabetes Association, Canadian Society of Nephrology, Canadian Coalition for High Blood Pressure Prevention and Control, The College of Family Physicians of Canada, Heart and Stroke Foundation of Canada, and Public Health Agency of Canada |
|
B: Adequate subgroup analysis: Analysis was a priori, performed within an adequate RCT and one of only a few tested, and there was sufficient sample size within the examined subgroup to detect a clinically important difference. |
||
C: Systematic review or meta-analysis: Comparison arms are derived from head-to-head comparisons within the same RCT. |
||
D: Observational study or systematic review in which the comparison arms are derived from different placebo-controlled RCTs and then extrapolations are made across RCTs. |
System for Rating Clinical Recommendations’ Strength |
A: The recommendation is supported by a-, b-, or c-level evidence. Outcomes are clinically important and the study population is representative of the population in the recommendation. |
B: The recommendation is supported by a-, b-, or c-level evidence. Clinically important or validated surrogate outcomes. |
C: The recommendation is supported by a-, b-, c-, or d-level evidence. For levels a, b, and c evidence, the outcome is an unvalidated surrogate for clinically important outcomes. For level d evidence, there must be a clinically important outcome and study population representative of the recommendation population, or an outcome-validated surrogate, or results that are extrapolated from study population to real population. |
D: Outcome is an unvalidated surrogate for a clinically important outcome, or the applicability of the study is irrelevant. |
U.S. Approaches |
||
Institute for Clinical Systems Improvement (ICSI) (2003) |
Focus: Prevention, diagnosis, or management of a given symptom, disease, or condition for individual patients under normal circumstances |
Primary reports of new data collection: A: RCT. |
Collaborative of 57 medical groups in Minnesota |
B: Cohort study. |
|
C: Nonrandomized trial with concurrent or historical controls, case control study, study of sensitivity and specificity of a diagnostic test, population-based descriptive study. |
||
D: Cross-sectional study, case series, or case report. |
||
Audience: Minnesota healthcare providers and payers |
System for Rating Clinical Recommendations’ Strength |
Grade I: Good evidence. The evidence consists of results from studies of strong design for answering the question addressed. The results are both clinically important and consistent with minor exceptions at most. The results are free of any significant doubts about generalizability, bias, and flaws in research design. Studies with negative results have sufficiently large samples to have adequate statistical power. |
Grade II: Fair evidence. The evidence consists of results from studies of strong design for answering the question addressed, but there is some uncertainty attached to the conclusion because of inconsistencies among the results from the studies or because of minor doubts about generalizability, bias, research design flaws, or adequacy of sample size. Alternatively, the evidence consists solely of results from weaker designs for the question addressed, but the results have been confirmed in separate studies and are consistent with minor exceptions at most. |
Grade III: Limited evidence. The evidence consists of results from studies of strong design for answering the question addressed, but there is substantial uncertainty attached to the conclusion because of inconsistencies among the results from different studies or because of serious doubts about generalizability, bias, research design flaws, or adequacy of sample size. Alternatively, the evidence consists solely of results from a limited number of studies of weak design for answering the question addressed. |
Grade not assignable: No evidence is available that directly supports or refutes the conclusion. |
Strength of Recommendation Taxonomy (SORT) (2004) |
Focus: Prevention, screening, diagnosis, prognosis, and therapy |
Level 1: Good-quality, patient-oriented evidence:
|
Developed by the editors of American Family Physician, Family Medicine, The Journal of Family Practice, Journal of the American Board of Family Practice, and BMJ-USA |
||
Audience: Guideline developers, family practice, and other primary care providers |
||
Level 2: Limited-quality, patient-oriented evidence:g
|
||
Level 3: Other evidence: Consensus guidelines, extrapolations from bench research, usual practice, opinion, disease-oriented evidence (intermediate or physiologic outcomes only), or case series for studies of diagnosis, treatment, prevention or screening. |
System for Rating Clinical Recommendations’ Strength |
A: Consistent and good-quality, patient-oriented evidence.* (Level 1) |
B: Inconsistent or limited-quality, patient-oriented evidence.* (Level 2) |
C: Consensus, usual practice, opinion, disease-oriented evidence,* or case series for studies of diagnosis, treatment, prevention, or screening. (Level 3) |
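The three SORT recommendation grades above reduce to a short decision rule. The following sketch is only an illustration: the function name and its boolean parameters are assumptions, and every kind of Level 3 evidence (consensus, usual practice, opinion, disease-oriented evidence, case series) is represented by `patient_oriented=False`.

```python
# Hypothetical sketch of the SORT strength-of-recommendation rule.
def sort_grade(patient_oriented, good_quality=False, consistent=False):
    """Return the SORT grade 'A', 'B', or 'C'.

    patient_oriented -- False for all Level 3 evidence (assumption)
    good_quality     -- True for good-quality (Level 1) evidence
    consistent       -- True when findings agree across studies
    """
    if not patient_oriented:
        return "C"
    if good_quality and consistent:
        return "A"
    # Inconsistent or limited-quality patient-oriented evidence.
    return "B"


if __name__ == "__main__":
    print(sort_grade(True, good_quality=True, consistent=True))
    print(sort_grade(True, good_quality=True, consistent=False))
    print(sort_grade(False))
```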
U.S. Preventive Services Task Force (USPSTF) (2008) |
Focus: Prevention |
High: The available evidence usually includes consistent results from well-designed, well-conducted studies in representative primary care populations. These studies assess the effects of the preventive service on health outcomes. This conclusion is therefore unlikely to be strongly affected by the results of future studies. |
Audience: Guideline developers and users |
||
Moderate: The available evidence is sufficient to determine the effects of the preventive service on health outcomes, but confidence in the estimate is constrained by factors such as
As more information becomes available, the magnitude or direction of the observed effect could change, and this change may be large enough to alter the conclusion. |
||
Low: The available evidence is insufficient to assess effects on health outcomes. Evidence is insufficient because of
More information may allow estimation of effects on health outcomes. |
System for Rating Clinical Recommendations’ Strength |
A: The USPSTF recommends the service. There is high certainty that the net benefit is substantial. Offer or provide this service. |
B: The USPSTF recommends the service. There is high certainty that the net benefit is moderate or there is moderate certainty that the net benefit is moderate to substantial. Offer or provide this service. |
C: The USPSTF recommends against routinely providing the service. There may be considerations that support providing the service in an individual patient. There is at least moderate certainty that the net benefit is small. Offer or provide this service only if other considerations support the offering or providing the service in an individual patient. |
D: The USPSTF recommends against the service. There is moderate or high certainty that the service has no net benefit or that the harms outweigh the benefits. Discourage the use of this service. |
I statement: The USPSTF concludes that the current evidence is insufficient to assess the balance of benefits and harms of the service. Evidence is lacking, of poor quality, or conflicting, and the balance of benefits and harms cannot be determined. Read the clinical considerations section of USPSTF Recommendation Statement. If the service is offered, patients should understand the uncertainty about the balance of benefits and harms. |
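The USPSTF grade definitions above combine two inputs: the certainty of the evidence (High, Moderate, Low) and the size of the net benefit. A minimal sketch of that grid follows. The function name and the string labels are assumptions made for the example, and low certainty is mapped to the I statement on the reading that both describe insufficient evidence.

```python
# Hypothetical sketch of the USPSTF certainty-by-net-benefit grid.
CERTAINTY = ("high", "moderate", "low")
NET_BENEFIT = ("substantial", "moderate", "small", "zero_or_negative")


def uspstf_grade(certainty, net_benefit):
    """Return 'A', 'B', 'C', 'D', or 'I' for the given inputs."""
    if certainty not in CERTAINTY:
        raise ValueError(f"unknown certainty: {certainty!r}")
    if net_benefit not in NET_BENEFIT:
        raise ValueError(f"unknown net benefit: {net_benefit!r}")
    if certainty == "low":
        return "I"  # evidence insufficient to assess benefits and harms
    if net_benefit == "zero_or_negative":
        return "D"  # no net benefit, or harms outweigh benefits
    if net_benefit == "small":
        return "C"
    if net_benefit == "moderate":
        return "B"
    # Substantial net benefit: A needs high certainty, otherwise B.
    return "A" if certainty == "high" else "B"


if __name__ == "__main__":
    for certainty in CERTAINTY:
        row = [uspstf_grade(certainty, nb) for nb in NET_BENEFIT]
        print(certainty, row)
```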
Professional Societies |
||
American College of Cardiology Foundation/American Heart Association (ACCF/AHA) (2009) |
Focus: Prevention, diagnosis, or management of heart diseases or conditions |
A: Data derived from multiple randomized clinical trials or meta-analyses. |
B: Data derived from a single randomized trial, or nonrandomized studies. |
||
Audience: Healthcare providers |
||
C: Consensus opinion of experts, case studies, or standard of care. |
||
American Academy of Pediatrics (AAP) (2004) |
Focus: Pediatric guidelines for all healthcare interventions |
A: Well-designed, randomized controlled trials or diagnostic studies on relevant populations. |
B: RCTs or diagnostic studies with minor limitations; overwhelmingly consistent evidence from observational studies. |
||
Audience: Guideline developers, implementers, and users |
||
C: Observational studies (case control and cohort design). |
||
D: Expert opinion, case reports, reasoning from principles. |
||
X: Exceptional situations where validating studies cannot be performed and there is a clear preponderance of benefit or harm. |
System for Rating Clinical Recommendations’ Strength |
|
ACCF/AHA: |
Any combination of classification of recommendation and level of evidence is possible. A recommendation can be Class I, based entirely on expert opinion (level C), or Class IIb, with level A evidence if based on multiple RCTs with divergent conclusions. |
Class I: Conditions for which there is evidence and/or general agreement that a given procedure or treatment is useful and effective. Class I statements may read: should, is recommended, is indicated, or is useful/effective/beneficial. |
Class II: Conditions for which there is conflicting evidence and/or a divergence of opinion about the usefulness/efficacy of a procedure or treatment. |
Class IIa: Weight of evidence/opinion is in favor of usefulness/efficacy. Class IIa statements may read: is reasonable, can be useful/effective/beneficial, is probably recommended, is probably indicated. |
Class IIb: Usefulness/efficacy is less well established by evidence/opinion. Class IIb statements may read: may/might be considered, may/might be reasonable, usefulness/effectiveness is unknown/unclear/uncertain/not well established. |
Class III: Conditions for which there is evidence and/or general agreement that the procedure/treatment is not useful/effective and in some cases may be harmful. Class III statements may read: is not recommended, is not indicated, should not, is not useful/effective/beneficial, may be harmful. |
AAP: |
Strong recommendation: The benefits of the recommended approach clearly exceed the harms (or in the case of a negative recommendation, the harms clearly exceed the benefits) and the quality of the evidence is either excellent or impossible to obtain (A, sometimes B, or X). |
Recommendation: The benefits exceed the harms or vice versa, but the quality of evidence is not as strong (sometimes B, C, or X). |
Option: Either the quality of the evidence that exists is suspect, or well-designed, well-conducted studies have demonstrated little clear advantage of one approach versus another (A, B, C, or D). |
No recommendation: There is both lack of pertinent evidence and an unclear balance between benefits and harms (D). |
American Academy of Neurology (AAN) (2004) |
Focus: Screening, diagnosis, prognosis, and therapy of neurologic disorders |
Similar rating systems exist for diagnostic, prognostic, and screening interventions. The system for therapeutic interventions is one example: |
Class I: Prospective, RCT with masked outcome assessment, in a representative population. The following are required: (a) primary outcome(s) clearly defined, (b) exclusion/inclusion criteria clearly defined, (c) adequate accounting for dropouts and crossovers with numbers sufficiently low to have minimal potential for bias, (d) relevant baseline characteristics are presented and substantially equivalent among treatment groups or there is appropriate statistical adjustment for differences. |
||
Audiences: Neurologists, patients, payers, federal agencies, other healthcare providers, and clinical researchers |
||
Class II: Prospective matched group cohort study in a representative population with masked outcome assessment that meets a through d above or an RCT in a representative population that lacks one criterion of a through d. |
||
Class III: All other controlled trials (including well-defined natural history controls or patients serving as own controls) in a representative population, where outcome is independently assessed, or independently derived by objective outcome measurement. |
||
Class IV: Evidence from uncontrolled studies, case series, case reports, or expert opinion. |
System for Rating Clinical Recommendations’ Strength |
A: Established as effective, ineffective, or harmful (or established as useful/predictive or not useful/predictive) for the given condition in the specified population. Recommendation: Should be done or should not be done. Translation of evidence to recommendation: Requires at least two consistent Class I studies. |
B: Probably effective, ineffective, or harmful (or probably useful/predictive or not useful/predictive) for the given condition in the specified population. Recommendation: Should be considered or should not be considered. Translation of evidence to recommendation: Requires at least one Class I study or two consistent Class II studies. |
C: Possibly effective, ineffective, or harmful (or possibly useful/predictive or not useful/predictive) for the given condition in the specified population. Recommendation: May be considered or may not be considered. Translation of evidence to recommendation: Requires at least one Class II study or two consistent Class III studies. |
U: Data inadequate or conflicting. Given current knowledge, treatment (test, predictor) is unproven. Recommendation: None. Translation of evidence to recommendation: Studies not meeting criteria for Class I–Class III. |
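The AAN "translation of evidence to recommendation" rules above amount to threshold checks on the number of consistent studies in each class. The sketch below is illustrative: the function name and parameters are assumptions, each count is taken to mean mutually consistent studies, and the level for inadequate or conflicting data is returned as "U", the label the AAN uses for that level.

```python
# Hypothetical sketch of the AAN evidence-to-recommendation thresholds.
def aan_level(class_i=0, class_ii=0, class_iii=0):
    """Return 'A', 'B', 'C', or 'U' from counts of consistent studies."""
    if class_i >= 2:
        return "A"  # at least two consistent Class I studies
    if class_i >= 1 or class_ii >= 2:
        return "B"  # one Class I study or two consistent Class II studies
    if class_ii >= 1 or class_iii >= 2:
        return "C"  # one Class II study or two consistent Class III studies
    return "U"      # data inadequate or conflicting


if __name__ == "__main__":
    print(aan_level(class_i=2))
    print(aan_level(class_ii=2))
    print(aan_level(class_iii=2))
    print(aan_level(class_iii=1))
```

The checks run from the strongest level down, so a body of evidence always receives the highest level it qualifies for.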
American College of Chest Physicians (ACCP) (2009) |
Focus: Diagnosis and management of chest disease |
High: RCTs without important limitations or overwhelming evidence from observational studies. |
Moderate: RCTs with important limitations (inconsistent results, methodologic flaws, indirect, or imprecise) or exceptionally strong evidence from observational studies. |
||
Audience: Chest physicians |
||
Low: Observational studies or case series. |
||
National Comprehensive Cancer Network (NCCN) (2008) |
Focus: Prevention, diagnosis, and therapy related to cancer |
High: High-powered randomized clinical trials or meta-analysis. |
Lower: Runs the gamut from phase II to large cohort studies to case series to individual practitioner experience. |
||
Audience: Oncologists and other healthcare providers |
System for Rating Clinical Recommendations’ Strength |
ACCP: |
1A: Strong recommendation. High level of evidence. Benefits outweigh the risks/burdens, or the risks/burdens outweigh the benefits. |
1B: Strong recommendation. Moderate evidence. Benefits outweigh the risks/burdens, or the risks/burdens outweigh the benefits. |
1C: Strong recommendation. Low or very low evidence. Benefits outweigh the risks/burdens, or the risks/burdens outweigh the benefits. |
2A: Weak recommendation. High evidence, and the risks/burdens are evenly balanced with the benefits. |
2B: Weak recommendation. Moderate evidence, and the risks/burdens are evenly balanced with the benefits. |
2C: Weak recommendation. Low or very low evidence, and the risks/burdens are evenly balanced with the benefits. Or the balance of benefits to risks and burdens is uncertain. |
NCCN: |
Category 1: The recommendation is based on high-level evidence (e.g., randomized controlled trials), and there is uniform NCCN consensus. |
Category 2A: The recommendation is based on lower level evidence and there is uniform NCCN consensus. |
Category 2B: The recommendation is based on lower level evidence and there is non-uniform NCCN consensus (but no major disagreement). |
Category 3: The recommendation is based on any level of evidence, but reflects major disagreement. |
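Both labeling schemes above are compositional: the ACCP grades 1A through 2C pair a strength digit with an evidence letter, and the NCCN categories pair an evidence level with a degree of consensus. The sketch below shows both under stated assumptions. The function names and string labels are hypothetical, and the NCCN function raises an error for high-level evidence with non-uniform consensus, a combination the table does not list.

```python
# Hypothetical sketches of the ACCP and NCCN labeling schemes.
ACCP_LETTER = {"high": "A", "moderate": "B", "low": "C", "very low": "C"}


def accp_grade(strong, evidence):
    """Combine strength (True = strong) and evidence quality, e.g. '1B'."""
    if evidence not in ACCP_LETTER:
        raise ValueError(f"unknown evidence quality: {evidence!r}")
    return ("1" if strong else "2") + ACCP_LETTER[evidence]


def nccn_category(evidence, consensus):
    """Return the NCCN category for evidence 'high' or 'lower'."""
    if evidence not in ("high", "lower"):
        raise ValueError(f"unknown evidence level: {evidence!r}")
    if consensus == "major disagreement":
        return "3"  # applies at any level of evidence
    if consensus == "uniform":
        return "1" if evidence == "high" else "2A"
    if consensus == "non-uniform" and evidence == "lower":
        return "2B"
    raise ValueError("combination not covered by the table")


if __name__ == "__main__":
    print(accp_grade(True, "moderate"))
    print(accp_grade(False, "very low"))
    print(nccn_category("high", "uniform"))
    print(nccn_category("lower", "non-uniform"))
```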
REFERENCES
AAN (American Academy of Neurology). 2004. Clinical practice guidelines process manual. http://www.aan.com/globals/axon/assets/3749.pdf (accessed July 28, 2009).
ACCF/AHA (American College of Cardiology Foundation/American Heart Association). 2009. Methodology manual for ACCF/AHA guideline writing committees. http://www.americanheart.org/downloadable/heart/12378388766452009MethodologyManualACCF_AHAGuidelineWritingCommittees.pdf (accessed July 29, 2009).
ACCP (American College of Chest Physicians). 2009. The ACCP grading system for guideline recommendations. http://www.chestnet.org/education/hsp/gradingSystem.php (accessed July 28, 2009).
CEBM (Centre for Evidence-Based Medicine). 2009. Oxford Centre for Evidence-based Medicine—Levels of Evidence (March 2009). http://www.cebm.net/index.aspx?o=1025 (accessed July 28, 2009).
Ebell, M. H., J. Siwek, B. D. Weiss, S. H. Woolf, J. Susman, B. Ewigman, and M. Bowman. 2004. Strength of recommendation taxonomy (SORT): A patient-centered approach to grading evidence in medical literature. American Family Physician 69(3):548–556.
GRADE Working Group (Grading of Recommendations Assessment, Development, and Evaluation Working Group). 2009. Grading the quality of evidence and the strength of recommendations. http://www.gradeworkinggroup.org/intro.htm (accessed July 20, 2009).
ICSI (Institute for Clinical Systems Improvement). 2003. Evidence grading system. http://www.icsi.org/evidence_grading_system_6/evidence_grading_system__pdf_.html (accessed September 8, 2009).
Kish, M. A. 2001. Guide to development of practice guidelines. Clinical Infectious Diseases 32(6):851–854.
NCCN (National Comprehensive Cancer Network). 2008. About the NCCN clinical practice guidelines in oncology. http://www.nccn.org/professionals/physician_gls/about.asp (accessed September 8, 2009).
NZGG (New Zealand Guidelines Group). 2007. Handbook for the preparation of explicit evidence-based clinical practice guidelines. http://www.nzgg.org.nz/download/files/nzgg_guideline_handbook.pdf (accessed September 4, 2009).
SIGN (Scottish Intercollegiate Guidelines Network). 2009. SIGN 50: A guideline developer’s handbook. http://www.sign.ac.uk/guidelines/fulltext/50/index.html (accessed July 20, 2009).
Steering Committee on Quality Improvement Management. 2004. Classifying recommendations for clinical practice guidelines. Pediatrics 114(3):874–877.
Tobe, S. W., R. M. Touyz, and N. R. C. Campbell. 2007. The Canadian Hypertension Education Program—a unique Canadian knowledge translation program. Canadian Journal of Cardiology 23(7):551–555.
USPSTF (U.S. Preventive Services Task Force). 2008. Grade definitions. http://www.ahrq.gov/clinic/uspstf/grades.htm (accessed July 28, 2009).