Suggested Citation:"Appendix D: Systems for Rating the Strength of Evidence and Clinical Recommendations." Institute of Medicine. 2011. Clinical Practice Guidelines We Can Trust. Washington, DC: The National Academies Press. doi: 10.17226/13058.

Appendix D
Systems for Rating the Strength of Evidence and Clinical Recommendations

TABLE D-1 Selected Approaches to Rating Strength of Evidence and Clinical Recommendations

System	Focus/Audience	Systems for Rating Evidence Quality
International Approaches
Grading of Recommendations Assessment, Development, and Evaluation (GRADE) Working Group (2009)	Focus: Diagnosis and therapy	Grades of evidence: Randomized trial: High. Observational study: Low. Any other evidence: Very low.
A voluntary, international collaboration	Audience: Guideline developers
		Decrease grade if there are limitations in study quality, important inconsistency of results, uncertainty about the directness of the evidence, imprecise or sparse data, or a high risk of reporting bias.
		Increase grade if there is a very strong association, evidence of a dose–response gradient, or all plausible residual confounding would have reduced the observed effect.
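
The GRADE row above describes a starting grade set by study design, then moved down or up by the listed factors. A minimal sketch of that logic follows. It assumes the four-level GRADE scale (the intermediate "moderate" level is not listed in the table) and assumes each factor moves the grade by exactly one step; the function name and arguments are hypothetical.

```python
# Four-level scale, lowest to highest. "moderate" is assumed from GRADE itself;
# the table above lists only the starting grades.
LEVELS = ["very low", "low", "moderate", "high"]

# Starting grade by study design, as given in the table.
START = {"randomized trial": "high", "observational study": "low"}


def grade_evidence(design: str, downgrades: int = 0, upgrades: int = 0) -> str:
    """Return an evidence grade.

    downgrades / upgrades count how many of the listed factors apply.
    One step per factor is an assumption of this sketch.
    """
    start = START.get(design.lower(), "very low")  # any other evidence: very low
    index = LEVELS.index(start) - downgrades + upgrades
    index = max(0, min(index, len(LEVELS) - 1))  # stay within the scale
    return LEVELS[index]


if __name__ == "__main__":
    # A randomized trial with two downgrading factors.
    print(grade_evidence("randomized trial", downgrades=2))
```

In practice the size of each adjustment is a judgment made by the guideline panel, so the counts here stand in for that judgment rather than replace it.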

System for Rating Clinical Recommendations’ Strength

Strong: Desirable effects clearly outweigh the undesirable effects, or clearly do not. Quality of evidence is high and other considerations support a strong recommendation.

Weak: Trade-offs are less certain—either because of low-quality evidence or because evidence suggests that desirable and undesirable effects are closely balanced. The quality of evidence is high and other considerations support a weak recommendation.

Based on:

Quality of evidence.
Uncertainty about the balance between desirable and undesirable effects.
Uncertainty or variability in values or preferences.
Uncertainty about whether the intervention represents a wise use of resources.

NOTE: Many organizations claim to use GRADE, but modify the system in the application of translating evidence into clinical recommendations or guidelines.

System	Focus/Audience	Systems for Rating Evidence Quality
Centre for Evidence-Based Medicine (CEBM) (2009)	Focus: Prevention, diagnosis, prognosis, therapy, differential diagnosis/symptom prevalence, and economic and decision analyses	CEBM is currently working on updating its level of evidence rankings and providing further rationale for them, tentatively due to become available in January 2010.
One of several UK centers with the aim of promoting evidence-based health care
		This approach uses a different evidence rating system depending on the type of healthcare intervention. For example, the following rating system is used for therapy interventions:
	Audience: Doctors, clinicians, teachers, and others	Level 1a: Systematic review (SR) of randomized controlled trials (RCTs) with homogeneity.^a
		Level 1b: Individual RCT with narrow confidence interval.
		Level 1c: All or none case series.^b
		Level 2a: SR with homogeneity of cohort studies.
		Level 2b: Individual cohort studies (including low-quality RCT; e.g., <80% follow-up).
		Level 2c: Outcomes research, ecological studies.^c
		Level 3a: SR with homogeneity of case control studies.
		Level 3b: Individual case control study.
		Level 4: Case series (and poor-quality cohort and case control studies^d).
		Level 5: Expert opinion without explicit critical appraisal, or based on physiology, bench research, or “first principles.”

System for Rating Clinical Recommendations’ Strength

A: Consistent level 1 studies.

B: Consistent level 2 or 3 studies or extrapolations^e from level 1 studies.

C: Level 4 studies or extrapolations from level 2 or 3 studies.

D: Level 5 evidence or troublingly inconsistent or inconclusive studies of any level.
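
The four CEBM grades above can be read as a lookup on the evidence level plus two flags. The sketch below is one hedged reading of that list. The function name and its parameters are hypothetical, `level` is the leading digit of the CEBM level (1 for 1a, 1b, and 1c), and the table does not say how to grade extrapolations from level 4, so that case raises an error.

```python
def cebm_grade(level: int, consistent: bool = True, extrapolated: bool = False) -> str:
    """Map a CEBM evidence level (1-5) to a recommendation grade A-D."""
    if level not in (1, 2, 3, 4, 5):
        raise ValueError("level must be an integer from 1 to 5")
    # Level 5 evidence, or inconsistent/inconclusive studies of any level.
    if level == 5 or not consistent:
        return "D"
    if level == 1:
        return "B" if extrapolated else "A"
    if level in (2, 3):
        return "C" if extrapolated else "B"
    # level == 4
    if extrapolated:
        raise ValueError("extrapolation from level 4 is not covered by the table")
    return "C"


if __name__ == "__main__":
    print(cebm_grade(1))                     # consistent level 1 studies
    print(cebm_grade(2, extrapolated=True))  # extrapolation from level 2
```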

System	Focus/Audience	Systems for Rating Evidence Quality
Scottish Intercollegiate Guidelines Network (SIGN) (2009)	Focus: All healthcare interventions	Levels of evidence:
	Audience: National Health Service in Scotland	1++: High-quality meta-analyses, systematic reviews of RCTs, or RCTs with a very low risk of bias.
		1+: Well-conducted meta-analyses, systematic reviews, or RCTs with a low risk of bias.
		1−: Meta-analyses, systematic reviews, or RCTs with a high risk of bias.
		2++: High-quality systematic reviews of case control or cohort studies; high-quality case control or cohort studies with a very low risk of confounding or bias and a high probability that the relationship is causal.
		2+: Well-conducted case control or cohort studies with a low risk of confounding or bias and a moderate probability that the relationship is causal.
		2−: Case control or cohort studies with a high risk of confounding or bias and a significant risk that the relationship is not causal.
		3: Non-analytic studies, such as case reports, case series.
		4: Expert opinion.
New Zealand Guidelines Group (NZGG) (2007)	Focus: Screening, diagnosis, prognosis, and therapy	The body of evidence is the sum of the evidence of all the individual studies and the quality ratings of each study.
Independent, not-for-profit
		Good evidence: From studies of strong design for answering the question addressed.
	Audience: Clinical practitioners, policy makers, and consumers
		Fair evidence: Reasonable evidence, but there may be minimal inconsistency, or uncertainty.
		Expert opinion: For some outcomes, trials or studies cannot be or have not been performed and practice is informed only by expert opinion.

System for Rating Clinical Recommendations’ Strength

Guidelines are developed based on judgment on the consistency, clinical relevance, and external validity of the whole body of evidence.

A: At least one meta-analysis, systematic review, or RCT rated as 1++, and directly applicable to the target population; or a body of evidence consisting principally of studies rated as 1+, directly applicable to the target population, and demonstrating overall consistency of results.

B: A body of evidence including studies rated as 2++, directly applicable to the target population, and demonstrating overall consistency of results; or extrapolated evidence from studies rated as 1++ or 1+.

C: A body of evidence including studies rated as 2+, directly applicable to the target population and demonstrating overall consistency of results; or extrapolated evidence from studies rated as 2++.

D: Evidence level 3 or 4; or extrapolated evidence from studies rated as 2+.

Good practice points: Occasionally, guideline development groups find that there is an important practical point that they wish to emphasize, but for which there is not, nor is there likely to be, any research evidence. This typically will be where some aspect of treatment is regarded as such sound clinical practice that nobody is likely to question it. These are shown in the guideline as Good Practice Points, and are marked with a green check.

The grade of the recommendation is based on consideration of

The design and quality of individual studies that have been identified.
Quantity, consistency, applicability, and clinical impact of the body of evidence that is applicable to the guidelines question.
The consensus of a guideline development team.

A: The recommendation is supported by GOOD evidence.

B: The recommendation is supported by FAIR evidence.

C: The recommendation is supported by EXPERT opinion (published) only.

I: Evidence to make a recommendation is INSUFFICIENT.

System	Focus/Audience	Systems for Rating Evidence Quality
The Canadian Hypertension Education Program (2007)	Focus: Diagnosis and therapy related to hypertension	Uses flow charts to assess the evidence according to study methodology:
		A: RCT with blinded assessment of outcomes, intention-to-treat analysis, adequate follow-up, and sufficient sample size to detect a clinically important difference with power >80%.
A Canadian volunteer, non-profit organization	Audience: Canadian Diabetes Association, Canadian Society of Nephrology, Canadian Coalition for High Blood Pressure Prevention and Control, The College of Family Physicians of Canada, Heart and Stroke Foundation of Canada, and Public Health Agency of Canada
		B: Adequate subgroup analysis: Analysis was a priori, performed within an adequate RCT and one of only a few tested, and there was sufficient sample size within the examined subgroup to detect a clinically important difference.
		C: Systematic review or meta-analysis: Comparison arms are derived from head-to-head comparisons within the same RCT.
		D: Observational study or systematic review in which the comparison arms are derived from different placebo-controlled RCTs and then extrapolations are made across RCTs.

System for Rating Clinical Recommendations’ Strength

A: The recommendation is supported by a-, b-, or c-level evidence. The outcomes are clinically important and the study population is representative of the population in the recommendation.

B: The recommendation is supported by a-, b-, or c-level evidence. The outcomes are clinically important or are validated surrogates.

C: The recommendation is supported by a-, b-, c-, or d-level evidence. For levels a, b, and c evidence, the outcome is an unvalidated surrogate for clinically important outcomes. For level d evidence, there must be a clinically important outcome and study population representative of the recommendation population, or an outcome-validated surrogate, or results that are extrapolated from study population to real population.

D: Outcome is an unvalidated surrogate for clinically important population, or the applicability of the study is irrelevant.

System	Focus/Audience	Systems for Rating Evidence Quality
U.S. Approaches
Institute for Clinical Systems Improvement (ICSI) (2003)	Focus: Prevention, diagnosis, or management of a given symptom, disease, or condition for individual patients under normal circumstances	Primary reports of new data collection: A: RCT.
Collaborative of 57 medical groups in Minnesota		B: Cohort study.
		C: Nonrandomized trial with concurrent or historical controls, case control study, study of sensitivity and specificity of a diagnostic test, population-based descriptive study.
		D: Cross-sectional study, case series, or case report.
	Audience: Minnesota healthcare providers and payers

System for Rating Clinical Recommendations’ Strength

Grade I: Good evidence

The evidence consists of results from studies of strong design for answering the question addressed. The results are both clinically important and consistent with minor exceptions at most. The results are free of any significant doubts about generalizability, bias, and flaws in research design. Studies with negative results have sufficiently large samples to have adequate statistical power.

Grade II: Fair evidence

The evidence consists of results from studies of strong design for answering the question addressed, but there is some uncertainty attached to the conclusion because of inconsistencies among the results from the studies or because of minor doubts about generalizability, bias, research design flaws, or adequacy of sample size. Alternatively, the evidence consists solely of results from weaker designs for the question addressed, but the results have been confirmed in separate studies and are consistent with minor exceptions at most.

Grade III: Limited evidence

The evidence consists of results from studies of strong design for answering the question addressed, but there is substantial uncertainty attached to the conclusion because of inconsistencies among the results from different studies or because of serious doubts about generalizability, bias, research design flaws, or adequacy of sample size. Alternatively, the evidence consists solely of results from a limited number of studies of weak design for answering the question addressed.

Grade not assignable: No evidence is available that directly supports or refutes the conclusion.

System	Focus/Audience	Systems for Rating Evidence Quality
Strength of Recommendation Taxonomy (SORT) (2004)	Focus: Prevention, screening, diagnosis, prognosis, and therapy	Level 1: Good-quality, patient-oriented evidence: Diagnosis: Validated clinical decision rule,^f SR/meta-analysis of high-quality studies, high-quality diagnostic cohort study. Treatment, prevention, or screening: SR/meta-analysis of RCTs with consistent findings, high-quality individual RCT, all-or-none study. Prognosis: SR/meta-analysis of good-quality cohort studies, prospective cohort study with good follow-up.
Developed by the editors of American Family Physician, Family Medicine, The Journal of Family Practice, Journal of the American Board of Family Practice, and BMJ-USA
	Audience: Guideline developers, family practice, and other primary care providers
		Level 2: Limited-quality, patient-oriented evidence:^g Diagnosis: Unvalidated clinical decision rule, SR/meta-analysis of lower quality studies or studies with inconsistent findings, lower quality diagnostic cohort study or diagnostic case control study. Treatment, prevention, or screening: SR/meta-analysis of lower quality clinical trials or studies with inconsistent findings, lower quality clinical trial, cohort study, case control study. Prognosis: SR/meta-analysis of lower quality cohort studies or with inconsistent results, retrospective cohort study or prospective cohort study with poor follow-up, case control study, case series.
		Level 3: Other evidence: Consensus guidelines, extrapolations from bench research, usual practice, opinion, disease-oriented evidence (intermediate or physiologic outcomes only), or case series for studies of diagnosis, treatment, prevention or screening.

System for Rating Clinical Recommendations’ Strength

A: Consistent and good-quality, patient-oriented evidence.* (Level 1)

B: Inconsistent or limited-quality, patient-oriented evidence.* (Level 2)

C: Consensus, usual practice, opinion, disease-oriented evidence,* or case series for studies of diagnosis, treatment, prevention, or screening. (Level 3)

System	Focus/Audience	Systems for Rating Evidence Quality
U.S. Preventive Services Task Force (USPSTF) (2008)	Focus: Prevention	High: The available evidence usually includes consistent results from well-designed, well-conducted studies in representative primary care populations. These studies assess the effects of the preventive service on health outcomes. This conclusion is therefore unlikely to be strongly affected by the results of future studies.
	Audience: Guideline developers and users
		Moderate: The available evidence is sufficient to determine the effects of the preventive service on health outcomes, but confidence in the estimate is constrained by factors such as the number, size, or quality of individual studies; inconsistency of findings across individual studies; limited generalizability of findings to routine primary care practice; or lack of coherence in the chain of evidence. As more information becomes available, the magnitude or direction of the observed effect could change, and this change may be large enough to alter the conclusion.
		Low: The available evidence is insufficient to assess effects on health outcomes. Evidence is insufficient because of the limited number or size of studies; important flaws in study design or methods; inconsistency of findings across individual studies; gaps in the chain of evidence; findings not generalizable to routine primary care practice; or lack of information on important health outcomes. More information may allow estimation of effects on health outcomes.

System for Rating Clinical Recommendations’ Strength

A: The USPSTF recommends the service. There is high certainty that the net benefit is substantial. Offer or provide this service.

B: The USPSTF recommends the service. There is high certainty that the net benefit is moderate or there is moderate certainty that the net benefit is moderate to substantial. Offer or provide this service.

C: The USPSTF recommends against routinely providing the service. There may be considerations that support providing the service in an individual patient. There is at least moderate certainty that the net benefit is small. Offer or provide this service only if other considerations support the offering or providing the service in an individual patient.

D: The USPSTF recommends against the service. There is moderate or high certainty that the service has no net benefit or that the harms outweigh the benefits. Discourage the use of this service.

I statement: The USPSTF concludes that the current evidence is insufficient to assess the balance of benefits and harms of the service. Evidence is lacking, of poor quality, or conflicting, and the balance of benefits and harms cannot be determined. Read the clinical considerations section of USPSTF Recommendation Statement. If the service is offered, patients should understand the uncertainty about the balance of benefits and harms.
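
The USPSTF grades above combine two judgments: certainty about the net benefit and the size of that net benefit. A minimal sketch of the combination follows. The function name and the string labels are hypothetical, and treating "low" certainty as the trigger for an I statement is an assumption of this sketch based on the Low evidence rating, which describes evidence as insufficient.

```python
def uspstf_grade(certainty: str, net_benefit: str) -> str:
    """Combine certainty and net benefit into a USPSTF letter grade.

    certainty:   "high", "moderate", or "low"
    net_benefit: "substantial", "moderate", "small", or "none_or_harm"
    """
    if certainty == "low":
        return "I"  # assumption: insufficient evidence corresponds to low certainty
    if certainty not in ("high", "moderate"):
        raise ValueError(f"unknown certainty: {certainty!r}")
    if net_benefit == "substantial":
        # High certainty of substantial benefit is A; moderate certainty is B.
        return "A" if certainty == "high" else "B"
    if net_benefit == "moderate":
        return "B"
    if net_benefit == "small":
        return "C"
    if net_benefit == "none_or_harm":
        return "D"
    raise ValueError(f"unknown net benefit: {net_benefit!r}")


if __name__ == "__main__":
    print(uspstf_grade("moderate", "substantial"))
```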

System	Focus/Audience	Systems for Rating Evidence Quality
Professional Societies
American College of Cardiology Foundation/American Heart Association (ACCF/AHA) (2009)	Focus: Prevention, diagnosis, or management of heart diseases or conditions	A: Data derived from multiple randomized clinical trials or meta-analyses.
		B: Data derived from a single randomized trial, or nonrandomized studies.
	Audience: Healthcare providers
		C: Consensus opinion of experts, case studies, or standard of care.
American Academy of Pediatrics (AAP) (2004)	Focus: Pediatric guidelines for all healthcare interventions	A: Well-designed, randomized controlled trials or diagnostic studies on relevant populations.
		B: RCTs or diagnostic studies with minor limitations; overwhelmingly consistent evidence from observational studies.
	Audience: Guideline developers, implementers, and users
		C: Observational studies (case control and cohort design).
		D: Expert opinion, case reports, reasoning from principles.
		X: Exceptional situations where validating studies cannot be performed and there is a clear preponderance of benefit or harm.

System for Rating Clinical Recommendations’ Strength

Any combination of classification of recommendation and level of evidence is possible. A recommendation can be Class I, based entirely on expert opinion (level C), or Class IIb, with level A evidence if based on multiple RCTs with divergent conclusions.

Class I: Conditions for which there is evidence and/or general agreement that a given procedure or treatment is useful and effective. Class I statements may read: should, is recommended, is indicated, or is useful/effective/beneficial.

Class II: Conditions for which there is conflicting evidence and/or a divergence of opinion about the usefulness/efficacy of a procedure or treatment.

Class IIa: Weight of evidence/opinion is in favor of usefulness/efficacy. Class IIa statements may read: is reasonable, can be useful/effective/beneficial, is probably recommended, is probably indicated.

Class IIb: Usefulness/efficacy is less well established by evidence/opinion. Class IIb statements may read: may/might be considered, may/might be reasonable, usefulness/effectiveness is unknown/unclear/uncertain/not well established.

Class III: Conditions for which there is evidence and/or general agreement that the procedure/treatment is not useful/effective and in some cases may be harmful. Class III statements may read: is not recommended, is not indicated, should not, is not useful/effective/beneficial, may be harmful.

Strong recommendation: The benefits of the recommended approach clearly exceed the harms (or in the case of a negative recommendation, the harms clearly exceed the benefits) and the quality of the evidence is either excellent or impossible to obtain (A, sometimes B, or X).

Recommendation: The benefits exceed the harms or vice versa, but the quality of evidence is not as strong (sometimes B, C, or X).

Option: Either the quality of the evidence that exists is suspect, or well-designed, well-conducted studies have demonstrated little clear advantage of one approach versus another (A, B, C, or D).

No recommendation: There is both lack of pertinent evidence and an unclear balance between benefits and harms (D).

System	Focus/Audience	Systems for Rating Evidence Quality
American Academy of Neurology (AAN) (2004)	Focus: Screening, diagnosis, prognosis, and therapy of neurologic disorders	Similar rating systems exist for diagnostic, prognostic, and screening interventions. The system for therapeutic interventions is one example:
		Class I: Prospective RCT with masked outcome assessment, in a representative population. The following are required: (a) primary outcome(s) clearly defined, (b) exclusion/inclusion criteria clearly defined, (c) adequate accounting for dropouts and crossovers with numbers sufficiently low to have minimal potential for bias, (d) relevant baseline characteristics are presented and substantially equivalent among treatment groups or there is appropriate statistical adjustment for differences.
	Audience: Neurologists, patients, payers, federal agencies, other healthcare providers, and clinical researchers
		Class II: Prospective matched group cohort study in a representative population with masked outcome assessment that meets a through d above or an RCT in a representative population that lacks one criterion in a through d.
		Class III: All other controlled trials (including well-defined natural history controls or patients serving as own controls) in a representative population, where outcome is independently assessed, or independently derived by objective outcome measurement.
		Class IV: Evidence from uncontrolled studies, case series, case reports, or expert opinion.

System for Rating Clinical Recommendations’ Strength

A: Established as effective, ineffective, or harmful (or established as useful/predictive or not useful/predictive) for the given condition in the specified population.

Recommendation: Should be done or should not be done.

Translation of evidence to recommendation: Requires at least two consistent Class I studies.

B: Probably effective, ineffective, or harmful (or probably useful/predictive or not useful/predictive) for the given condition in the specified population.

Recommendation: Should be considered or should not be considered.

Translation of evidence to recommendation: Requires at least one Class I study or two consistent Class II studies.

C: Possibly effective, ineffective, or harmful (or possibly useful/predictive or not useful/predictive) for the given condition in the specified population.

Recommendation: May be considered or may not be considered.

Translation of evidence to recommendation: Level C rating requires at least one Class II study or two consistent Class III studies.

U: Data inadequate or conflicting. Given current knowledge, treatment (test, predictor) is unproven.

Recommendation: None.

Translation of evidence to recommendation: Studies not meeting criteria for Class I–Class III.
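
The "translation of evidence to recommendation" rules above amount to a short decision cascade over study counts. A minimal sketch follows. The function and parameter names are hypothetical, each count is taken to mean the number of mutually consistent studies in that evidence class, and the fallback level for inadequate or conflicting data is labelled "U", as in the AAN scheme.

```python
def aan_recommendation_level(class_i: int = 0, class_ii: int = 0, class_iii: int = 0) -> str:
    """Translate counts of consistent studies per evidence class into a level."""
    if class_i >= 2:
        return "A"  # at least two consistent Class I studies
    if class_i >= 1 or class_ii >= 2:
        return "B"  # one Class I study, or two consistent Class II studies
    if class_ii >= 1 or class_iii >= 2:
        return "C"  # one Class II study, or two consistent Class III studies
    return "U"      # evidence does not meet the criteria above


if __name__ == "__main__":
    print(aan_recommendation_level(class_i=1, class_ii=3))
```

Checking the strongest requirement first means a body of evidence always receives the highest level it qualifies for.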

System	Focus/Audience	Systems for Rating Evidence Quality
American College of Chest Physicians (ACCP) (2009)	Focus: Diagnosis and management of chest disease	High: RCTs without important limitations or overwhelming evidence from observational studies.
		Moderate: RCTs with important limitations (inconsistent results, methodologic flaws, indirect, or imprecise) or exceptionally strong evidence from observational studies.
	Audience: Chest physicians
		Low: Observational studies or case series.
National Comprehensive Cancer Network (NCCN) (2008)	Focus: Prevention, diagnosis, and therapy related to cancer	High: High-powered randomized clinical trials or meta-analysis.
		Lower: Runs the gamut from phase II to large cohort studies to case series to individual practitioner experience.
	Audience: Oncologists and other healthcare providers

System for Rating Clinical Recommendations’ Strength

1A: Strong recommendation. High level of evidence. Benefits outweigh the risks/burdens, or the risks/burdens outweigh the benefits.

1B: Strong recommendation. Moderate evidence. Benefits outweigh the risks/burdens, or the risks/burdens outweigh the benefits.

1C: Strong recommendation. Low or very low evidence. Benefits outweigh the risks/burdens, or the risks/burdens outweigh the benefits.

2A: Weak recommendation. High evidence, and the risks/burdens are evenly balanced with the benefits.

2B: Weak recommendation. Moderate evidence, and the risks/burdens are evenly balanced with the benefits.

2C: Weak recommendation. Low or very low evidence, and the risks/burdens are evenly balanced with the benefits. Or the balance of benefits to risks and burdens is uncertain.

Category 1: The recommendation is based on high-level evidence (e.g., randomized controlled trials), and there is uniform NCCN consensus.

Category 2A: The recommendation is based on lower level evidence and there is uniform NCCN consensus.

Category 2B: The recommendation is based on lower level evidence and there is non-uniform NCCN consensus (but no major disagreement).

Category 3: The recommendation is based on any level of evidence, but reflects major disagreement.
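
The NCCN categories above depend on two inputs: whether the evidence is high level and how much the panel agrees. A minimal sketch of that mapping follows. The function name and the consensus labels are hypothetical. The table does not define a category for high-level evidence with non-uniform consensus, so the sketch returns `None` for that case rather than guess.

```python
from typing import Optional

CONSENSUS = ("uniform", "non-uniform", "major disagreement")


def nccn_category(high_level_evidence: bool, consensus: str) -> Optional[str]:
    """Return the NCCN category, or None where the table gives no rule."""
    if consensus not in CONSENSUS:
        raise ValueError(f"consensus must be one of {CONSENSUS}")
    if consensus == "major disagreement":
        return "3"  # any level of evidence
    if high_level_evidence:
        # Only uniform consensus is defined for high-level evidence.
        return "1" if consensus == "uniform" else None
    return "2A" if consensus == "uniform" else "2B"


if __name__ == "__main__":
    print(nccn_category(False, "non-uniform"))
```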

System	Focus/Audience	Systems for Rating Evidence Quality
Infectious Diseases Society of America (2001)	Focus: Healthcare interventions for infectious diseases Audience: Infectious disease clinicians	I: Evidence from ≥1 properly randomized, controlled trial.
		II: Evidence from ≥1 well-designed clinical trial, without randomization; from cohort or case-controlled analytic studies (preferably from >1 center); from multiple time-series; or from dramatic results from uncontrolled experiments.
		III: Evidence from opinions of respected authorities, based on clinical experience, descriptive studies, or reports of expert committees.
^a Homogeneity refers to an SR that is free of worrisome variations (heterogeneity) in the directions and degrees of results between individual studies.

^b Met when all patients died before the Rx became available, but some now survive on it, or when some patients died before the Rx became available, but none now die on it.

^c A member of CEBM stated that this ranking requires further analysis, as well as more detailed explanation of what is meant by ecological and outcomes research.

^d Poor-quality prognostic cohort study refers to one in which sampling is biased in favor of patients who already had the target outcome, or the measurement of outcomes is accomplished in < 80 percent of study patients, or outcomes were determined in an unblinded, non-objective way, or there is no correction for confounding errors.

^e Extrapolations are where data are used in a situation that has potentially clinically important differences from the original study situation.

System for Rating Clinical Recommendations’ Strength

A: Good evidence to support a recommendation for use.

B: Moderate evidence to support a recommendation for use.

C: Poor evidence to support a recommendation for use.

^f Clinical decision rules (CDRs) are tools designed to help clinicians make bedside diagnostic and therapeutic decisions. The development of a CDR involves three stages: derivation, validation, and implementation.

^g Patient-oriented evidence measures outcomes that matter to patients: morbidity, mortality, symptom improvement, cost reduction, and quality of life. Disease-oriented evidence measures intermediate, physiologic, or surrogate end points that may or may not reflect improvements in patient outcomes (e.g., blood pressure, blood chemistry, physiologic function, pathologic findings).

SOURCES: AAN (2004); ACCF/AHA (2009); ACCP (2009); CEBM (2009); Ebell et al. (2004); GRADE Working Group (2009); ICSI (2003); Kish (2001); NCCN (2008); NZGG (2007); SIGN (2009); Steering Committee on Quality Improvement Management (2004); Tobe et al. (2007); USPSTF (2008).
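As an illustration of how a two-part rating such as the Infectious Diseases Society of America entry in Table D-1 might be recorded, the minimal Python sketch below pairs a recommendation grade (A to C) with an evidence level (I to III). The class names, the paraphrased level descriptions, and the "grade-level" label format (for example, "B-II") are assumptions made for this sketch; the table itself does not prescribe a notation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class EvidenceLevel(Enum):
    """Evidence-quality levels, paraphrased from the table (not verbatim)."""
    I = "properly randomized, controlled trial evidence"
    II = "nonrandomized trials, cohort or case-controlled studies, time series"
    III = "opinions of respected authorities, descriptive studies, committees"


class RecommendationGrade(Enum):
    """Recommendation-strength grades as worded in the table."""
    A = "Good evidence to support a recommendation for use."
    B = "Moderate evidence to support a recommendation for use."
    C = "Poor evidence to support a recommendation for use."


@dataclass(frozen=True)
class RatedRecommendation:
    """A recommendation statement with its grade and evidence level."""
    text: str
    grade: RecommendationGrade
    level: EvidenceLevel

    def label(self) -> str:
        # Hypothetical display format: grade, hyphen, evidence level.
        return f"{self.grade.name}-{self.level.name}"


def parse_label(label: str) -> Tuple[RecommendationGrade, EvidenceLevel]:
    """Split a label such as 'B-II' back into its grade and level.

    Raises KeyError if either part is not a defined grade or level.
    """
    grade_name, level_name = label.strip().upper().split("-", 1)
    return RecommendationGrade[grade_name], EvidenceLevel[level_name]
```

Keeping the grade and the level as separate fields mirrors the table, where evidence quality and recommendation strength are rated in separate columns, so either can be queried on its own.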

REFERENCES

AAN (American Academy of Neurology). 2004. Clinical practice guidelines process manual. http://www.aan.com/globals/axon/assets/3749.pdf (accessed July 28, 2009).

ACCF/AHA (American College of Cardiology Foundation/American Heart Association). 2009. Methodology manual for ACCF/AHA guideline writing committees. http://www.americanheart.org/downloadable/heart/12378388766452009MethodologyManualACCF_AHAGuidelineWritingCommittees.pdf (accessed July 29, 2009).

ACCP (American College of Chest Physicians). 2009. The ACCP grading system for guideline recommendations. http://www.chestnet.org/education/hsp/gradingSystem.php (accessed July 28, 2009).

CEBM (Centre for Evidence-Based Medicine). 2009. Oxford Centre for Evidence-based Medicine—Levels of Evidence (March 2009). http://www.cebm.net/index.aspx?o=1025 (accessed July 28, 2009).

Ebell, M. H., J. Siwek, B. D. Weiss, S. H. Woolf, J. Susman, B. Ewigman, and M. Bowman. 2004. Strength of recommendation taxonomy (SORT): A patient-centered approach to grading evidence in medical literature. American Family Physician 69(3):548–556.

GRADE Working Group (Grading of Recommendations Assessment, Development, and Evaluation Working Group). 2009. Grading the quality of evidence and the strength of recommendations. http://www.gradeworkinggroup.org/intro.htm (accessed July 20, 2009).

ICSI (Institute for Clinical Systems Improvement). 2003. Evidence grading system. http://www.icsi.org/evidence_grading_system_6/evidence_grading_system__pdf_.html (accessed September 8, 2009).

Kish, M. A. 2001. Guide to development of practice guidelines. Clinical Infectious Diseases 32(6):851–854.

NCCN (National Comprehensive Cancer Network). 2008. About the NCCN clinical practice guidelines in oncology. http://www.nccn.org/professionals/physician_gls/about.asp (accessed September 8, 2009).

NZGG (New Zealand Guidelines Group). 2007. Handbook for the preparation of explicit evidence-based clinical practice guidelines. http://www.nzgg.org.nz/download/files/nzgg_guideline_handbook.pdf (accessed September 4, 2009).

SIGN (Scottish Intercollegiate Guidelines Network). 2009. SIGN 50: A guideline developer’s handbook. http://www.sign.ac.uk/guidelines/fulltext/50/index.html (accessed July 20, 2009).

Steering Committee on Quality Improvement Management. 2004. Classifying recommendations for clinical practice guidelines. Pediatrics 114(3):874–877.

Tobe, S. W., R. M. Touyz, and N. R. C. Campbell. 2007. The Canadian Hypertension Education Program—a unique Canadian knowledge translation program. Canadian Journal of Cardiology 23(7):551–555.

USPSTF (U.S. Preventive Services Task Force). 2008. Grade definitions. http://www.ahrq.gov/clinic/uspstf/grades.htm (accessed July 28, 2009).