Fulfilling the Potential of Cancer Prevention and Early Detection

5
Potential of Screening to Reduce the Burden of Cancer¹

A complementary strategy to preventing the occurrence of cancer (primary prevention) is early detection of cancer through screening (secondary prevention). The fundamental tenet of screening for cancer is that finding the disease before symptoms develop enables detection at a less advanced stage and that instituting treatment at that time leads ultimately to improved health outcomes. Although this syllogism seems intuitive and is widely assumed to be true by both health professionals and the lay public, its validity is unclear for many cancers.

The term screening, as used in this report, refers to the early detection of cancer or premalignant disease in persons without signs or symptoms suggestive of the target condition (the type of cancer that the test seeks to detect). Some investigators draw a distinction between screening and case finding, using the former term to describe population-based screening programs, such as those conducted at health fairs or shopping malls, and the latter term to refer to testing of patients in the clinical setting. This report refers to both forms of testing as screening because the evidence base is similar in both contexts. Diagnostic testing, which is not addressed in this report, refers to the evaluation of patients with signs or symptoms associated with cancer (e.g., a breast lump, blood in stool, fatigue, or weight loss), often by use of the same tests used for screening. Surveillance refers to follow-up screening for new evidence of cancer in patients who have already been diagnosed with and treated for cancer or premalignant disease.

This chapter reviews the principles used for determination of the effectiveness of cancer screening and applies those principles in an examination of current scientific evidence regarding the benefits and harms of screening for four types of cancer (cancers of the colon and rectum, breast, prostate, and cervix). Chapter 6 examines strategies for optimization of the delivery of recommended cancer screening tests from the perspective of the health care system, providers, and, most importantly, the patient. Chapter 7 presents a case study that reviews the history and prospects for screening for lung cancer, illustrating the difficulties of adopting new technologies in the face of uncertain science.

¹ This chapter is a condensed version of a background paper prepared by Steven H. Woolf (www.iom.edu/ncpb).

PRINCIPLES FOR ASSESSMENT OF THE EFFECTIVENESS OF SCREENING FOR CANCER

The principal considerations in judging the effectiveness of cancer screening are (1) the burden of suffering, the frequency of cancer, and the severity of its health effects; (2) the accuracy and reliability of the screening test in detecting cancer and minimizing inaccurate test results; (3) the effectiveness of early detection, including the incremental benefit of detecting and treating cancer at an earlier stage; (4) the harms of screening, both from the testing process and from the incremental harms from evaluation and treatments that follow; and (5) costs. These considerations form the tradeoffs used to weigh the benefits and harms of screening. All of the preceding analytical steps are necessary to address the pivotal question of whether patients and populations experience better outcomes with screening than without it.
Burden of Suffering

The first consideration in assessing the effectiveness of cancer screening is the frequency with which cancer occurs in the population and its attendant health effects. The prevalence rate determines the pretest probability of disease, or the average likelihood that a person in the screened population will have cancer. The lower this value is, the larger the number of tests that must be performed to detect one case of cancer (i.e., screening will have a lower yield) and, for statistical reasons discussed below, the greater the chances that a positive test result will be erroneous (a false-positive result).

Mortality rates and other measures of the probability of adverse health effects from cancer influence the absolute benefit of screening (see the discussion of absolute benefit versus relative benefit below). For example, if a screening test reduces the risk of dying from cancer by 20 percent (relative risk reduction), the number of lives saved by screening or the probability that a person undergoing screening will avert death (absolute benefit) depends directly on the baseline mortality rate in the screened population. If that rate is 30/100,000 per year, screening will save six lives per 100,000 screened. The same form of screening would save only one life per 100,000 screened in a lower-risk setting where the mortality rate is only 5/100,000. As detailed below, absolute risks influence the number of individuals who need to be screened to achieve a health benefit.

Most published rates for cancer morbidity and mortality are derived from patients with clinically detected disease (i.e., cancers that come to the attention of health care providers in the evaluation of abnormal symptoms or physical findings), but the types of cancers detected by screening also include those that were not destined to manifest clinical symptoms and that are therefore of uncertain clinical significance. Autopsy studies have demonstrated for decades that a large proportion of persons live their lives harboring occult cancers that cause few or no clinical symptoms because of their slow rate of growth or late onset. Screening often detects such lesions, but it is often difficult to determine at the time of diagnosis whether those cancers were destined to progress. This phenomenon is known as overdiagnosis and is important because the degree to which cancers that are not destined to progress are represented among cancers detected by screening limits the net health benefits of screening. Overdiagnosis figures prominently in debates about the benefits of screening for various cancers.
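The absolute-benefit arithmetic given earlier in this section (a 20 percent relative risk reduction applied to baseline mortality rates of 30 and 5 per 100,000) can be checked with a minimal sketch. The function names are hypothetical, and the number-needed-to-screen helper assumes the conventional definition as the reciprocal of the absolute risk reduction, which the text above alludes to but does not spell out.

```python
def absolute_benefit_per_100k(baseline_deaths_per_100k, relative_risk_reduction):
    """Deaths averted per 100,000 screened: baseline rate times relative reduction."""
    return baseline_deaths_per_100k * relative_risk_reduction


def number_needed_to_screen(baseline_deaths_per_100k, relative_risk_reduction):
    """Assumed definition: reciprocal of the absolute risk reduction."""
    averted = absolute_benefit_per_100k(baseline_deaths_per_100k,
                                        relative_risk_reduction)
    return 100_000 / averted


if __name__ == "__main__":
    # Baseline rates taken from the text; 0.20 is the 20 percent relative reduction.
    for baseline in (30, 5):
        averted = absolute_benefit_per_100k(baseline, 0.20)
        needed = number_needed_to_screen(baseline, 0.20)
        print(f"baseline {baseline}/100,000: {averted:.0f} deaths averted per "
              f"100,000 screened; about {needed:,.0f} screened per death averted")
```

The same relative reduction yields a sixfold difference in absolute benefit between the two settings, which is why the baseline rate has to be stated alongside any relative figure.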
A common criticism of screening for prostate cancer, for example, is that many of the cancers detected by screening are latent carcinomas that, due to that disease’s slow growth characteristics, are unlikely to progress or cause clinical symptoms (Woolf, 1995). Screening mammography has led to increased detection of ductal carcinoma in situ (Feig, 2000; Winchester et al., 2000), the clinical significance of which is debated. Cervical cancer screening uncovers various forms of cervical atypia for which the need for treatment and the proper approach to follow-up are uncertain. Finally, new imaging technologies for lung cancer screening are finding small cancers and pulmonary nodules whose natural history is uncertain (Frame, 2000).

Accuracies and Reliabilities of Screening Tests

The second consideration in judging the effectiveness of screening for cancer is whether the available test(s) can detect cancer at an early stage without producing large numbers of false-positive and false-negative results. Of greatest concern are the test’s accuracy, the degree to which it measures the true value of the attribute it is testing, and its reliability, the consistency of the result when it is repeated. The principal parameters for measuring accuracy are sensitivity, specificity, and predictive value.

Sensitivity, Specificity, and Predictive Value

Sensitivity is the proportion of persons with cancer who correctly test positive, and specificity is the proportion of persons without cancer who correctly test negative (Box 5.1). Sensitivity and specificity are usually inversely related, so that tests with high sensitivities (i.e., those that miss few cases of cancer) tend to have low specificities (i.e., produce a higher proportion of false-positive results). If patients are put at substantial risk by receiving false-positive results, it may be worth compromising sensitivity (even though it means fewer cancers will be detected) in the interest of adopting a screening test or threshold with a higher specificity that generates fewer false-positive results.

BOX 5.1 Definitions of Screening Test Performance

The performance of a screening test is often defined by three related measurements: sensitivity, specificity, and positive predictive value. The sensitivity (se) of a screening test is the proportion of people with the disease who test positive. Specificity (sp) is the proportion of people without the disease who test negative. The positive predictive value is the proportion of individuals with a positive screening test who actually have the disease. If screened individuals are assigned a position in a 2 × 2 classification scheme based on their disease status and test result, values for the three measurements can be defined as follows:

                 Actual Disease Status
Test Result      +                       –
+                True Positive (TP)      False Positive (FP)
–                False Negative (FN)     True Negative (TN)

Measurement: Question Answered:
Sensitivity (se) = TP / (TP + FN): How often does the test correctly identify individuals with the disease?
Specificity (sp) = TN / (TN + FP): How often does the test correctly identify individuals without the disease?
Positive predictive value = TP / (TP + FP): Among individuals with an abnormal test, what proportion actually have the disease?

The positive predictive value is a function of sensitivity, specificity, and disease prevalence (P), with the following mathematical relationship:

$$\mathrm{PPV} = \frac{se \times P}{se \times P + (1 - sp) \times (1 - P)}$$

Although the sensitivity and specificity of a cancer screening test are generally constant across populations and settings, this is not true for the positive predictive value (PPV), which is the probability that an abnormal result correctly indicates cancer (Box 5.1). The PPV depends on the pretest probability or likelihood that cancer is present at the time that the person is tested. For any cancer screening test, the PPV is lower (and the chances of false-positive results are higher) when there is a lower prevalence of cancer. This important principle, which underlies many concerns about cancer screening, is best understood by example (Table 5.1). Suppose a test has a sensitivity and a specificity of 90 percent each. Clinicians would characteristically misinterpret these data to mean that a patient who has a positive result has a 90 percent likelihood of having cancer (i.e., PPV = 90 percent). In actuality the PPV is dependent on a third variable, the prevalence, or pretest probability, of cancer. Suppose the prevalence of cancer is 1 percent (1,000/100,000 population). If 100,000 persons are screened, of whom 1,000 actually have cancer, the 90 percent sensitivity means that 900 of these 1,000 will test positive, and the 90 percent specificity means that 89,100 of the 99,000 people without cancer will test negative. The remaining 9,900 people without cancer will test positive, for a total of 10,800 positive results. The chances that a positive test result is indicative of cancer (the PPV question asked above) would not be 90 percent, but 900/10,800, or 8 percent. The seeming accuracy conveyed by the “90 percent” figure for both sensitivity and specificity obscures the disturbing problem that the test would give false-positive information to 92 percent of those testing positive (11 people for every 1 person who truly had cancer). Because PPV correlates with prevalence, if the same test is administered in a community with a lower prevalence (0.1 percent), the PPV would fall even further and the risk of producing false-positive results would climb higher (99 percent of those testing positive or 111 people for every 1 person who truly had cancer) (Table 5.1).
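The arithmetic in this example can be reproduced with a short sketch. It is illustrative only: the function names are hypothetical, and the PPV function uses the standard relationship among sensitivity, specificity, and prevalence rather than anything specific to this chapter.

```python
def expected_counts(population, prevalence, sensitivity, specificity):
    """Expected 2 x 2 counts (TP, FP, FN, TN) for a screened population."""
    with_disease = population * prevalence
    without_disease = population - with_disease
    true_pos = with_disease * sensitivity
    false_neg = with_disease - true_pos
    true_neg = without_disease * specificity
    false_pos = without_disease - true_neg
    return true_pos, false_pos, false_neg, true_neg


def positive_predictive_value(prevalence, sensitivity, specificity):
    """PPV = se*P / (se*P + (1 - sp)*(1 - P))."""
    true_pos_rate = sensitivity * prevalence
    false_pos_rate = (1 - specificity) * (1 - prevalence)
    return true_pos_rate / (true_pos_rate + false_pos_rate)


if __name__ == "__main__":
    # The two prevalence settings discussed in the text, 90 percent se and sp.
    for prevalence in (0.01, 0.001):
        tp, fp, fn, tn = expected_counts(100_000, prevalence, 0.90, 0.90)
        ppv = positive_predictive_value(prevalence, 0.90, 0.90)
        print(f"prevalence={prevalence:.1%}  TP={tp:,.0f}  FP={fp:,.0f}  "
              f"PPV={ppv:.1%}  false positives per true positive={fp / tp:.0f}")
```

Holding sensitivity and specificity fixed and varying only the prevalence argument makes it easy to see that the fall in PPV is driven entirely by the pretest probability.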
The policy significance of these mathematics is that, regardless of the accuracy of a screening test, the administration of a test to populations or individuals with a low risk of cancer has a potential to introduce major problems with false-positive results, leading to harms that can offset the benefits of screening.

TABLE 5.1 Illustration of Influence of Prevalence on Positive Predictive Value

Prevalence = 1 percent, sensitivity = 90 percent, specificity = 90 percent

Test Result    No. with Disease Present    No. with Disease Absent    Total No.
Positive       900                         9,900                      10,800
Negative       100                         89,100                     89,200
Total          1,000                       99,000                     100,000

Prevalence = 0.1 percent, sensitivity = 90 percent, specificity = 90 percent

Test Result    No. with Disease Present    No. with Disease Absent    Total No.
Positive       90                          9,990                      10,080
Negative       10                          89,910                     89,920
Total          100                         99,900                     100,000

Reliability

Reliability (or reproducibility) is the degree to which a screening test yields the same result when it is repeated under the same conditions. A laboratory assay for a serum tumor marker, for example, lacks reliability if it yields significantly different results when the test is repeated with a sample from the same tube of blood. Radiologists’ interpretation of a screening chest radiograph can suffer from poor reliability due to either interobserver variation (differences between radiologists’ interpretation of the same film) or intraobserver variation (different interpretations of the same film by one radiologist).

Effectiveness of Early Detection

A common mistake in determining whether screening for cancer is justified and a reason for premature enthusiasm for promoting screening tests is to limit consideration to the issues described above: burden of suffering and accuracy. Proponents argue that if the disease is serious and an accurate test is available, routine screening should be instituted. What this argument overlooks is the possibility that early detection of the disease may not improve outcomes either for the screened population as a whole or even for the individuals who will be found to have cancer.

Effectiveness of Treatment

The efforts and potential adverse effects of screening are not justified if an effective treatment is unavailable for persons found to have cancer. The tragedy of many cancers is that they progress inexorably, despite the use of the best available treatment regimens, because of the inability of these therapies to alter the natural history of the disease. Screening for such cancers serves only to identify the disease earlier in its course, not to improve the prognosis.
This longer apparent survival time is not a benefit to the patient (and indeed may be a psychological and social cost) if that earlier diagnosis did not result in either less morbidity from treatment or longer life. The benefits of early detection are muted for cancers that have a short preclinical period because the time window for early detection is short and the opportunity to affect outcomes is brief. Screening is also unlikely to confer benefits by detecting cancers that would have excellent outcomes under usual circumstances, when treatment is not initiated until patients present with symptoms. This concern underlies skepticism about the incremental benefit of screening for endometrial or testicular cancer, for example. Latent cancers detected by screening-induced overdiagnosis also may not benefit from early detection if the lesions were not destined to progress or affect the patient’s health. Although little harm would have occurred if the cancer went undetected, the excellent outcomes of screening programs that predominantly detect such lesions are often cited as evidence of the benefits of screening. These principles are embodied in Whitmore’s now-famous aphorism about prostate cancer: “Is cure possible for those for whom it is necessary, and is cure necessary for those in whom it is possible?” (Whitmore, 1988, pp. 7–11).

Incremental Benefit of Early Detection

Having an effective treatment is not enough. The logic behind screening rests on the argument that outcomes are improved by the early institution of treatment. If there is no incremental health benefit to early detection and patients fare just as well if their cancers are diagnosed after signs or symptoms appear, then there is not a good argument for screening. In that case, screening produces only harms: adverse effects on people without cancer, many of whom will experience anxiety and undergo workups for false-positive results, and the consumption of resources that would help patients more effectively if they were invested elsewhere.

The presumption that early detection improves outcomes is almost axiomatic in U.S. society. Epidemiological evidence would seem to support this belief. For almost all forms of cancer, 5-year survival rates are substantially lower for persons with advanced-stage disease (see Chapter 1). Such statistics are often mistakenly interpreted as evidence that patients are likely to live longer if their cancer is diagnosed early (see the discussion of lead-time bias below).
Screening is consistently associated with the diagnosis of smaller and more localized tumors and with the familiar phenomenon of “stage shift,” in which the proportion of cancers diagnosed at an earlier stage increases after screening is introduced. Also, observational studies demonstrate that patients whose cancers are diagnosed through screening often have better outcomes than those whose cancers are diagnosed otherwise. Many advocates of cancer screening find such evidence more than adequate to justify the intuitive notion that early detection is beneficial.

Whether such evidence is indeed adequate lies at the heart of many controversies about cancer screening. Critics argue that such observations do not offer proof of benefit because the same patterns would be expected even if screening did not improve outcomes. For example, the fact that patients who participate in screening programs have better outcomes than those in other settings may be due to the fact that patients who participate in screening are more likely to have a college education, to be nonsmokers, and to have other healthier habits (Rimer et al., 1996b). Similarly, the fact that screening detects disease at an earlier stage and that patients diagnosed with localized disease have higher 5-year survival rates may reflect length and lead-time biases rather than true lengthening of life (Welch et al., 2000). The influence of these factors cannot be excluded unless outcomes are examined for a control group that is comparable in all respects other than exposure to screening, as has been done in trials of screening mammography.

Lead-Time Bias

Lead-time bias refers to the overestimation of survival time simply due to a backward shift in the starting point for the measurement of survival as a result of early detection (Last, 1988). Patients diagnosed earlier can seem to live longer after diagnosis even if the time of death does not change. For illustration, consider a man who is destined to develop symptoms from prostate cancer at age 65 and to die at age 70. His survival after diagnosis (5 years) can be doubled (10 years) if the cancer is detected through screening at age 60, even if he still dies from that same cancer at age 70. Because of lead-time bias, the fact that 5-year survival rates are higher for early-stage cancer than for advanced-stage cancer does not, by itself, prove that patients who are screened benefit from that screening and live longer; it may mean only that their disease is detected earlier. Similarly, the tendency of screening to detect smaller, localized tumors proves that cancers are being found at an earlier stage of their progression, not that the outcomes of that progression will necessarily be altered.

Length Bias

Length bias refers to the tendency of screening to detect slowly growing lesions more readily than aggressive cancers.
Rapidly progressive cancers, because they lead more hastily to death, are present in the screened population for a shorter period of time, thereby reducing their prevalence in the population and, thus, their odds of being detected when a screening test is administered. The consequence of length bias is that cancers detected by screening contain a higher proportion of slowly growing cancers than do cancers detected by symptoms. The favorable prognosis observed for cancers detected through screening may therefore imply a benefit from screening even when there is none.

Screening Interval and Duration

Under conditions of uncertainty, in which the optimal frequency of screening has not been determined directly in clinical studies, there is a tendency to assume that a shorter interval is appropriate if the individual is at high risk of acquiring cancer. This assumption, which underlies the common advice that individuals in high-risk groups undergo more frequent screening, may be invalid because the proper determinants of the frequency of a screening test are the rate of progression of the disease and the sensitivity of the test. If these variables are held constant, increasing the frequency of testing offers little benefit, regardless of one’s underlying risk of acquiring cancer (Frame and Frame, 1998).

Many controversies in cancer screening surround the question of when to stop. For most cancers, the absolute risk of dying from cancer increases with age, making elderly individuals the largest subset of people with cancer. On the other hand, the decreasing life expectancy and the greater likelihood of having other diseases that accompany advancing age tend to offset the benefits of screening. One analysis, based on certain assumptions of efficacy, estimated that lifetime screening for breast cancer from age 50 until death results in a maximum potential life expectancy gain of 43 days, whereas the cessation of screening at age 75 or 80 would result in women giving up a maximum potential life expectancy gain of 9 or 5 days, respectively (Rich and Black, 2000). Rather than relying on such modeling data, which have their limitations, it would be preferable to examine direct evidence of the relative benefits of screening with advancing age, but most screening trials have limited enrollment to patients under the age of 70, limiting access to definitive data. Because many older adults have excellent life expectancies and qualities of life, current thinking is shifting away from reliance on strict age cutoffs for screening and looking more closely at the life expectancy and health status of each individual to assess the potential benefits of screening.

Study Designs

For the reasons outlined above, epidemiological studies reporting better outcomes for individuals with early-stage cancer tend not to persuade skeptics that early detection improves outcomes. Study designs fall in a hierarchy of persuasiveness (Box 5.2), in which uncontrolled epidemiological data and case series rank lowest in proving effectiveness.
BOX 5.2 Hierarchy of Effectiveness of Study Designs

Experimental trials
    Randomized controlled trials
    Nonrandomized controlled trials
Controlled observational studies
    Cohort studies
    Case-control studies
    Cross-sectional comparisons
    Historical (before-and-after) studies
Epidemiological studies without controls
Case reports, case series, descriptive analyses

Controlled observational studies compare outcomes among those who do or do not receive screening and bring investigators and clinicians one step closer to having definitive evidence of the effectiveness of screening. Historical studies (before-and-after studies), such as a comparison of outcomes within a community before and after the introduction of a screening program, raise questions about the influence of temporal factors (e.g., improved treatment regimens) other than screening that occurred contemporaneously with the screening program. Cross-sectional comparisons, such as comparisons of outcomes for patients screened at a local institution with those for other patients in the community, also lack persuasiveness because of potential confounding variables: the characteristics of patients at these institutions may have an independent effect on the observed outcomes that is unrelated to screening.

In a case-control study, a retrospective review of medical records is undertaken to compare patients who died of cancer to a matched group of patients who did not die from cancer. If the patients who died from cancer were significantly less likely to have undergone screening, it is tempting to infer that the screening test was beneficial. The limitations of such studies include their retrospective design (e.g., medical records may not systematically capture relevant variables) and the difficulties of addressing confounding variables (persons who underwent screening may have other characteristics, such as healthier lifestyles, which may have contributed to the observed outcomes). Matching of the two groups by known confounding variables (e.g., age and risk factors) and the formulation of statistical adjustments in the odds ratios to control for such cofactors address some of these problems, but such studies cannot exclude the role of unknown or unmeasured confounding variables.
Prospective cohort studies overcome some of the limitations of retrospective analyses by establishing the variables of interest at the start of the study and collecting them systematically over time, often with long periods of follow-up, but the potential influence of confounding remains. Unless the decision to screen patients is made randomly, it is possible that screened and unscreened persons differ in characteristics other than screening that may account, at least in part, for the observed outcomes.

It is this concern that accounts for the primacy of randomized controlled trials in demonstrating the effectiveness of screening (Jadad, 1998). The defining characteristic of such trials is that the assignment of patients to undergo screening is made randomly, creating comparison groups that are essentially the same in all respects other than exposure to screening. Unrecognized, as well as known, confounding variables are thereby distributed equally and should therefore not contribute to observed differences in outcomes.

Outcome Measures

The persuasiveness of evidence that screening does or does not improve outcomes depends in large part on which outcomes are considered. The outcomes that matter most are health outcomes, which in this report refer to outcomes that are perceptible to patients (e.g., pain, dysfunction, and death). Because of the lengthy follow-up periods and methodological challenges associated with the measurement of such outcomes, however, many studies infer effectiveness by measuring intermediate or surrogate outcomes. Intermediate outcomes are findings that are not health outcomes in themselves (e.g., histological features of a cancer) but that are thought to increase the risk of such outcomes. Surrogate outcomes are indicators that correlate with but that are not themselves health outcomes (e.g., length of hospital stay). One must be cautious, however, in relying on such indicators to infer effectiveness because screening can improve intermediate outcomes without necessarily improving health (Bucher et al., 1999; Gøtzsche et al., 1996).

The most definitive health outcome in terms of both importance to patients and relative ease of measurement is death, and thus, much of the focus in cancer screening is on evaluating whether death rates are lowered. As noted earlier, lead-time bias limits the utility of measuring survival after diagnosis, and thus, the conventional basis of comparison in screening trials is the proportion of persons in the intervention and control groups who die from cancer in a defined follow-up period. The customary endpoint is the cancer-specific mortality rate and not mortality from all causes. In theory, a demonstrated reduction in all-cause mortality would be ideal, to ensure that death from cancer is not traded for death from another cause (such as fatal complications induced by screening or treatment).
But because any specific cancer accounts for a relatively small proportion of all deaths in a population, demonstrating an effect on all-cause mortality with adequate statistical power would require trials of a sample size and duration that would render them infeasible. Although most trials are therefore not powered to show an effect on all-cause mortality, their failure to do so is often mistakenly interpreted as evidence of a lack of benefit or, more erroneously, as evidence that screening somehow induces deaths from other causes.

Results can be statistically significant without having clinical or public health significance. Proponents of screening, in making their case, often emphasize the relative benefits rather than the absolute benefits of interventions. The absolute benefit of a 20 percent relative reduction in the risk of dying from cancer depends on the baseline probability of death. If that probability is 100/100,000 over some defined interval of time, the intervention reduces the risk of death to 80/100,000, an absolute difference of 20/100,000 or an absolute risk reduction of 0.02 percent, a far less impressive figure than the relative risk reduction of 20 percent. Although both figures are true, the absolute risk reduction has important policy implications, because it indicates that a large number of people must receive the intervention to save the life of one individual. The number of people who need to be

... prostatectomy and radiation therapy are generally unreliable because of differences in study design, patient populations, and outcome measures. Patients tend to report more bowel dysfunction with radiation therapy and more sexual dysfunction with radical prostatectomy (Shrader-Bogen et al., 1997). At a median follow-up of 14 years, patients who undergo radiotherapy report worse bladder, bowel, and erectile functions than are reported for men without prostate cancer (Johnstone et al., 2000). A recent study reported that almost 2 years after treatment, men receiving radical prostatectomy were more likely than men receiving radiotherapy to be incontinent and impotent. Radiotherapy produced greater declines in bowel function (Potosky et al., 2000).

Cost-Effectiveness

The widespread performance of PSA testing is costly. A 1995 Canadian study reported that screening of all eligible men in Canada would have cost $317 million (Canadian dollars) (Krahn et al., 1999). An older U.S. article claimed that 1 year of PSA screening could cost the United States $28 billion (Kramer et al., 1993). Several cost-effectiveness analyses have been published. One estimated that digital rectal examination and PSA screening at ages 50 to 69 would cost $12,491 to $18,769 per year of life saved (Coley et al., 1997). An analysis from the Medicare perspective by the Office of Technology Assessment of the U.S. Congress estimated that, given favorable assumptions, a one-time digital rectal examination-PSA screening would cost from $14,200 per year of life saved at age 65 to $51,290 per year of life saved at age 75, although the report emphasized that the estimates were highly sensitive to arguable assumptions (U.S. Congress, 1995).
Similarly, other analyses have conjectured that screening for prostate cancer would have favorable cost-effectiveness ratios given certain assumptions about benefits and performance characteristics (Benoit and Naslund, 1997; Littrup, 1997). A screening program in Sweden estimated that screening costs about $14,900 per patient (U.S. dollars, 158,000 SEK) (Holmberg et al., 1998). Claims of cost-effectiveness are dubious if the denominator, the magnitude of benefit from screening, is uncertain and if assumptions used in the model are debatable.

Modeling Studies to Weigh Trade-Offs

In contrast to many of the other forms of cancer screening reviewed in this report, investigators studying prostate cancer screening have attempted to use modeling techniques to quantify the influence of subjective value judgments on the weighing of benefits and harms. The models take account of potential harms by adjusting for patient utilities. Older analyses found that screening achieves minor improvements in absolute survival (Love et al., 1985; Thompson et al., 1987), but more recent analyses that adjust for utilities have concluded that screening produces, at best, a modest gain, measured in days to weeks, or a net loss in QALYs (Cantor et al., 1995; Coley et al., 1997; Krahn et al., 1994; Mold et al., 1992). According to those studies, the harmful effects of screening and treatment on quality of life undercut the potential gains in life expectancy, but the assumptions used in the models have been challenged (Miles et al., 1995).

Some modeling studies have compared the relative impacts of different testing protocols. One examined the benefits of conducting screenings less frequently, estimating for a hypothetical population of 1,000 men that annual PSA testing beginning at age 50 would require 10,500 PSA tests, prevent 3.2 deaths, and require 600 biopsies, whereas a policy of PSA testing at ages 40 and 45 years followed by biennial testing beginning at age 50 would require 7,500 PSA tests, prevent 3.3 deaths, and require 450 biopsies (Ross et al., 2000).

Other decision analyses have focused on treatment. An analysis for men aged 60 to 75 concluded that treatment increases quality-adjusted survival by less than 1 year (in most cases, by less than 0.2 QALY) compared with observation (Fleming et al., 1993). For men over age 70 and younger men with well-differentiated disease, treatment appeared to be more harmful than watchful waiting. Critics of the analysis questioned the probabilities for certain components of the model and the inclusion of a relatively older population of men with low-volume and low-grade tumors (Beck et al., 1994; Walsh, 1993). The investigators emphasize that the data were adjusted for age and tumor grade.
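
The comparison of testing protocols attributed above to Ross et al. (2000) can be restated as resources used per death prevented. The sketch below uses only the figures quoted in the text for a hypothetical population of 1,000 men; the class name, field names, and the "per death prevented" ratios are illustrative assumptions and are not reported in this excerpt.

```python
from dataclasses import dataclass


@dataclass
class Protocol:
    """One PSA testing policy, per hypothetical 1,000 men (figures from the text)."""
    name: str
    psa_tests: int
    biopsies: int
    deaths_prevented: float

    def per_death_prevented(self):
        """Return (PSA tests, biopsies) needed for each death prevented."""
        return (self.psa_tests / self.deaths_prevented,
                self.biopsies / self.deaths_prevented)


annual = Protocol("annual testing from age 50", 10_500, 600, 3.2)
biennial = Protocol("ages 40 and 45, then biennial from age 50", 7_500, 450, 3.3)

for protocol in (annual, biennial):
    tests, biopsies = protocol.per_death_prevented()
    print(f"{protocol.name}: {tests:.0f} PSA tests and "
          f"{biopsies:.0f} biopsies per death prevented")
```

On these modeled figures the less frequent protocol uses fewer tests and biopsies per death prevented, but the ratios inherit every assumption of the underlying model and should be read as a restatement of it, not as independent evidence.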
Other studies also concluded that radical prostatectomy and radiation therapy produce a net decrease in quality of life, even after adjusting for prevalence rates for sexual and urinary dysfunction (Litwin et al., 1995). Although some patients are willing to risk these complications of treatment, others do not believe that the risks are justified. In one study, 26 percent of patients (mean age, 66 years) indicated a preference for expectant management over surgery, even if the latter would extend life by 10 years (Mazur and Merz, 1996).

Subjective Value Judgments

Given the lack of direct evidence about the benefits of early detection, the uncertainty about complication rates, and the indefinite implications of modeling studies, the ultimate judgment of whether benefits outweigh harms remains subjective (Woolf and Rothemich, 1999). Physicians who weigh the trade-offs are affected by personal beliefs about the intuitive benefits of early detection, clinical training and experience, practice norms, patients’ expectations, insurance coverage, and medicolegal concerns. Many clinicians feel compelled to screen patients for prostate cancer to protect themselves against litigation and damages should patients later develop prostate cancer. However, what is best for the individual patient depends on personal preferences, subjective values, and individual risks (Woolf, 1997a,b). A man’s fears, lifestyle plans, and priorities dictate whether the balance of benefits and harms is favorable.

These issues received little attention in the early 1990s, when PSA screening guidelines first emerged, and organizations assumed polar positions on whether men should receive the test. Groups on one extreme recommended that all men uniformly undergo screening (American College of Radiology, 1991; American Urological Association, 1992; Mettlin et al., 1993), and opposing groups argued against routine screening (U.S. Preventive Services Task Force, 1989). Most guidelines now take account of the importance of patient preferences (Box 5.5). The move toward a policy of shared decision making has subdued the heated controversy that once characterized guidelines on this topic and is giving way to an emerging consensus around a patient-centered approach. The shift in policy began to occur in 1997 when the American College of Physicians, a group previously associated with its resistance to prostate cancer screening, recommended that the physician “describe the potential benefits and known harms of [prostate] screening ..., listen to the patient’s concerns, and then individualize the decision to screen” (American College of Physicians, 1997, p. 482) (italics added). In 1998, the American Academy of Family Physicians adopted a similar policy (American Academy of Family Physicians, 1998), advising that physicians “counsel [men] regarding the known risks and uncertain benefits of screening for prostate cancer” (American Academy of Family Physicians, 2000, p. 14).
In 2000, the American Urological Association, once the staunchest advocate of routine screening, stated that “early detection of prostate cancer should be offered” and emphasized in its 2000 practice policy that “the decision to use PSA for the early detection of prostate cancer should be individualized. Patients should be informed of the known risks and the potential benefits” (American Urological Association, 2000, p. 271). The 2001 guidelines of the American Cancer Society state that “the PSA test and the digital rectal examination should be offered.... Information should be provided to patients about benefits and limitations of testing. Specifically, prior to testing, men should have an opportunity to learn about the benefits and limitations of testing for early prostate cancer detection and treatment” (Smith et al., 2001, pp. 42–43). In light of the controversy surrounding the National Institutes of Health consensus conference statement on breast cancer screening, the absence of a similar phenomenon for prostate cancer screening, and the likely role that language has played in the acceptability of the prostate cancer screening policy, are worth noting. The organizations’ consistent advice that physicians “offer” choices to patients conveys a sense of partnership and places the locus of control at a more moderate position in the continuum of control than did the National Institutes of Health consensus conference’s reference to the “woman’s decision” about mammography.

BOX 5.5 Recommendations for Screening for Prostate Cancer

Organization: American Cancer Society (Smith et al., 2001, pp. 42–43)
Recommendations: The PSA test and the digital rectal examination should be offered annually beginning at age 50 to men who have a life expectancy of at least 10 years. Men at high risk should begin testing at age 45. Information should be provided to patients about the benefits and limitations of testing. Specifically, before testing, men should have an opportunity to learn about the benefits and limitations of testing for early prostate cancer detection and treatment. High-risk men (men of sub-Saharan African descent or with a first-degree relative with a diagnosis of prostate cancer at a young age) should begin testing for early prostate cancer detection at age 45. The digital rectal examination should be included whenever appropriate.

Organization: American Urological Association, 2000, p. 271
Recommendations: Early detection of prostate cancer should be offered to asymptomatic men age 50 or older with an estimated life expectancy of more than 10 years. It is reasonable to offer testing at an earlier age to men with defined risk factors, including men with a first-degree relative who has prostate cancer and African-American men.

Organization: American Academy of Family Physicians, 2000
Recommendations: As a guideline, for men ages 50 to 65, counsel them regarding the known risks and uncertain benefits of screening for prostate cancer.

Organization: American College of Preventive Medicine (Ferrini and Woolf, 1998), p. 84
Recommendations: The American College of Preventive Medicine recommends against routine population screening by digital rectal examination and for PSA. Men age 50 and older with a life expectancy of greater than 10 years should be given information about the potential benefits and harms of screening and the limits of current evidence and should be allowed to make their own choice about screening, in consultation with their physician, on the basis of personal preferences.

Organization: American College of Physicians, 1997, p. 482
Recommendations: Rather than screening all men for prostate cancer as a matter of routine, physicians should describe the potential benefits and known harms of screening, diagnosis, and treatment; listen to the patient’s concerns; and then individualize the decision to screen.

Organization: U.S. Preventive Services Task Force, 1996, p. 129 (a)
Recommendations: Routine screening for prostate cancer by digital rectal examination, evaluation for serum tumor markers (e.g., PSA), or transrectal ultrasound is not recommended. Patients who request screening should be given objective information about the potential benefits and harms of early detection and treatment. If screening is to be performed, the best-evaluated approach is to screen by digital rectal examination and for PSA and to limit screening to men with a life expectancy of greater than 10 years.

(a) The USPSTF guidelines on prostate cancer screening are being updated.

Guidelines that individuals use to make decisions about their personal choices about prostate cancer screening differ in important respects from the guidelines that governments and policy makers for health plans must use to make decisions for populations of individuals (Woolf, 1997b; Woolf and Rothemich, 1999). Which way the scales tip for a population depends on the average utilities of men as a whole. Although some men favor screening, modeling studies that incorporate the full distribution of men’s utilities, cited above, suggest that screening decreases QALYs. Population policy also requires consideration of resources: whether it is appropriate to invest in screening, especially for an intervention of uncertain effectiveness and safety, if it comes at the expense of other services. Policy positions opposing routine screening of the population for prostate cancer have therefore been issued in the United States by the U.S. Preventive Services Task Force (1996) and the Office of Technology Assessment (U.S. Congress, 1995), as well as in Canada (Canadian Task Force on the Periodic Health Examination, 1994), the United Kingdom (Morris, 1997), Sweden (Swedish Council on Technology Assessment in Health Care, 1996), and Australia (Australian Health Technology Advisory Committee, 1996).
For reasons that are too extensive to outline in this report, such positions are not inconsistent with the clinical recommendations presented above that each man should decide for himself whether to be screened (Woolf, 1997b). A factor that influences the balance of benefits and harms at the societal level is the cascade effect of screening on stimulating inappropriate procedures. For example, the dramatic escalation in PSA screening in the United States in the early 1990s was accompanied by a striking increase in the performance of radical prostatectomies (Lu-Yao and Greenberg, 1994; Wilt et al., 1999). Many of these operations, especially the large number performed on men over age 75, may not have been indicated. A similar phenomenon is becoming apparent in other countries, such as the Netherlands (Spapen et al., 2000) and Australia (Ansari et al., 1998).

Cervical Cancer

This section of the chapter reviews the Pap (Papanicolaou) smear and adjunctive technologies that can be used to improve the accuracy of detection of cervical cancer. Alternative screening strategies, such as testing for human papillomavirus (HPV), testing for molecular biomarkers (e.g., fluorescent immunochemical labeling) (Patterson et al., 2001), and cervicography, are not reviewed.

Pap Smear

A fundamental difficulty in the evaluation of screening tests for cervical cancer is the lack of reliability of the reference standard: cytological and histological interpretation of cervical specimens. Even among expert pathologists, interobserver variations in interpreting atypical squamous cells of undetermined significance and low-grade squamous intraepithelial lesions are substantial (Stoler and Schiffman, 2001). A principal limitation of the Pap smear is its poor sensitivity. A recent meta-analysis calculated that the sensitivity and specificity of the Pap smear were 51 percent (95 percent CI, 37 to 66 percent) and 98 percent (95 percent CI, 97 to 99 percent), respectively (Agency for Health Care Policy and Research, 1999). False-negative results are due to both sampling errors (in obtaining the sample from the cervix and in cell collection and cell preparation techniques) and interpretation errors, with the latter accounting for about one-third of false-negative results. Efforts to improve sensitivity have included the introduction of the cytobrush, broom brushes, and plastic spatulas to gain better access to the squamocolumnar junction and endocervix. Other measures have been programmatic, such as federal legislation mandating manual reexamination of a portion of negative slides under the Clinical Laboratory Improvement Amendments.

Adjunctive Technologies

New technologies have been introduced in recent years to improve the sensitivity and specificity of screening.
These include thin-layer cytology (ThinPrep), computerized rescreening neural network technology (Papnet), and algorithm-based computer rescreening (AutoPap). Although these innovations offer the promise of improving the sensitivity and specificity of screening, a systematic review by the Agency for Health Care Policy and Research concluded that existing data for making comparisons were inadequate to reach conclusions about their incremental impacts on health outcomes (Agency for Health Care Policy and Research, 1999). Coupling Pap smears with screening for HPV infection has also been advocated, but testing for HPV plays a larger role (including the evaluation of cervical atypia) that is beyond the scope of this review.

Following introduction of the Pap smear by Papanicolaou in the 1930s, experience has revealed a consistent association between the routine use of cervical smears for cytological examination and lower rates of mortality from cervical cancer. Virtually all evidence of a benefit in terms of a lower rate of mortality is observational rather than from experimental trials. The consistency of the evidence is impressive, however, with evidence coming from studies conducted over time, ecological studies, cross-national comparisons, and case-control studies. A body of literature suggesting a 20 to 60 percent relative reduction in mortality rates has been reviewed by the U.S. Preventive Services Task Force (1996).

Harms

There are no direct harms from cervical cancer screening aside from the inconvenience, discomfort, and embarrassment that may accompany the examination procedure. The principal harms relate to the consequences of false-positive and false-negative results. As with other screening tests, psychological harms are a potential concern. In one study, 3 months after a positive Pap smear result, women were significantly more worried about cancer and had greater impairments in mood, daily activities, interest in sexual activity, and sleep patterns (Lerman et al., 1991a). Other studies have also reported an association between a positive Pap smear result and adverse emotional reactions, fears of cancer, decreased sexual function, social dysfunction, and feelings of unattractiveness (Khanna and Phillips, 2001). False-positive results also incur the inconvenience of follow-up re-examinations and colposcopic procedures. False-negative results introduce the risk of failing to detect interval dysplasia and cancer and underlie concerns about the need for frequent screening.
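
How often a positive or negative smear is misleading depends on how common disease is among the women screened. The sketch below applies Bayes' rule to the meta-analytic sensitivity and specificity quoted earlier in this section (51 and 98 percent). The prevalence values are hypothetical, chosen only to show the pattern, and the function name is an assumption made for illustration.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # P(disease | positive result)
    npv = true_neg / (true_neg + false_neg)   # P(no disease | negative result)
    return ppv, npv


# Point estimates quoted in the text for the Pap smear.
SENSITIVITY = 0.51
SPECIFICITY = 0.98

# Hypothetical prevalences of disease in the screened population (assumptions).
for prevalence in (0.001, 0.01, 0.05):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.2%}")
```

Even with high specificity, the share of positive results that are true positives falls as prevalence falls, which is one reason false-positive results weigh heavily when a low-prevalence population is screened frequently.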
Periodicity of Screening

Although many physicians recommend an annual interval for Pap smears, there is little evidence to suggest that it confers greater benefit than screening every 2 or 3 years. A collaborative study of screening programs in eight countries, published in 1986 by the International Agency for Research on Cancer, shed considerable light on the incremental benefit of frequent screening. The analysis clarified the limited difference in the protection afforded by screening every year compared with that afforded by screening every 3 years. Relative to no screening at all, screening afforded many-fold greater protection across a wide range of screening intervals: about 15-fold protection at intervals of less than 1 year, about 12-fold protection at intervals of 1–2 years, and about 8-fold protection at intervals of 2–3 years (IARC Working Group on Evaluation of Cervical Cancer Screening Programmes, 1986).

More recent insights have been gained from a large prospective cohort study of 128,805 women at community-based clinics throughout the United States who were screened for cervical cancer within 3 years of normal smears (Sawaya et al., 2000a). It documented that the yield of screening is relatively low (for high-grade squamous intraepithelial lesions or lesions suggestive of squamous cell carcinoma, the incidence per 10,000 women was 66 for women under age 30, 22 for women ages 30 to 49, 15 for women ages 50 to 64, and 10 for women ages 65 and older). Moreover, the incidence rate did not differ significantly on the basis of the frequency of screening: 25/10,000 for screening at 9 to 12 months, 29/10,000 for screening at 13 to 24 months, and 33/10,000 for screening at 25 to 36 months. In previously screened postmenopausal women, in whom the incidence of new cytological abnormalities was low, the PPV of an abnormal smear was zero 1 year after a normal smear and 0.9 percent within 2 years. On the basis of this evidence, the investigators concluded that cervical smears should not be performed for postmenopausal women within 2 years of normal cytological results (Sawaya et al., 2000b).

Cost-Effectiveness

A variety of studies document that cervical cancer screening has acceptable cost-effectiveness ratios compared with those for no screening (Agency for Health Care Policy and Research, 1999). In one analysis, the cost-effectiveness ratios for screening every 1 or 3 years were $7,345 and $2,254 per year of life saved, respectively, and cost savings seemed likely if screening was targeted to women who have not had regular screenings (Fahs et al., 1992).
Several analyses have cast doubt on the incremental cost-effectiveness of computerized rescreening versus conventional cytological evaluation (Meerding et al., 2001; Troni et al., 2000). The imprecision of the available data makes it inappropriate to draw conclusions about the relative cost-effectiveness of these modalities (Agency for Health Care Policy and Research, 1999), but the ratios tend to be more favorable if screening is less frequent. For example, in one analysis, annual use of the AutoPap cytology smear was estimated to cost $166,000 per year of life saved, whereas use of AutoPap every 4 years cost $7,777 per year of life saved (Brown and Garber, 1999). Some have cautioned that the resources expended to pay for these adjunctive technologies could compromise the delivery of cervical cancer screening to high-risk groups (Sawaya and Grimes, 1999). Box 5.6 summarizes the cervical cancer screening recommendations of selected organizations.

BOX 5.6 Recommendations for Screening for Cervical Cancer

Organization: American Cancer Society (Smith et al., 2001), p. 40
Recommendations: All women who are or who have been sexually active or who have reached age 18 should have an annual Pap test and pelvic examination. After a woman has had three or more consecutive satisfactory normal annual examinations, the Pap test may be performed less frequently at the discretion of the physician.

Organization: American College of Obstetricians and Gynecologists, 2000
Recommendations: Women should have a pelvic examination and Pap smear every year beginning at age 18 or earlier if she is sexually active. If the patient is at low risk, testing may be continued periodically at the discretion of the physician and the patient after three consecutive normal tests.

Organization: American Academy of Family Physicians, 2000
Recommendations: The standard is to offer a Pap smear at least every 3 years.

Organization: U.S. Preventive Services Task Force, 1996, p. 112
Recommendations: Regular Pap tests are recommended for all women who are or who have been sexually active and who have a cervix. Testing should begin at the age when the woman first engages in sexual intercourse. Adolescents whose sexual history is thought to be unreliable should be presumed to be sexually active at age 18. There is little evidence that annual screening achieves better outcomes than screening every 3 years. Pap tests should be performed at least every 3 years. There is insufficient evidence to recommend for or against an upper age limit for Pap testing, but recommendations can be made on other grounds to discontinue regular testing after age 65 in women who have had regular previous screening in which the smears have been consistently normal. There is insufficient evidence to recommend for or against routine cervicography or colposcopy screening for cervical cancer in asymptomatic women, nor is there evidence to support routine screening for HPV infection. Recommendations against such screening can be made on other grounds, including poor specificity and costs.

Organization: American College of Preventive Medicine (Hawkes et al., 1996)
Recommendations: Screening for cervical cancer by regular Pap tests should be performed for all women who are or who have been sexually active and should be instituted after a woman first engages in sexual intercourse. If the sexual history is unknown or is considered unreliable, screening should begin at age 18. At least two initial screening tests should be performed 1 year apart. For women who have had at least two normal annual smears, the screening interval may then be lengthened at the discretion of the patient and physician after considering the presence of risk factors, but it should not exceed 3 years. Screening may be discontinued at age 65 if the following criteria are met: the woman has been regularly screened, has had two satisfactory smears, and has had no abnormal smears within the previous 9 years.

SUMMARY AND CONCLUSIONS

The intuitive notion that early detection saves lives is supported by scientific evidence for some cancers. As detailed in this chapter, studies demonstrate that screening for colorectal, breast, and cervical cancer significantly lowers cancer mortality rates. For other cancers, however, the evidence is less direct. Although screening increases the likelihood that cancer will be diagnosed at an early stage, when survival rates are generally higher than those for individuals with advanced-stage disease, these findings do not necessarily prove that screening improves outcomes because of potential statistical artifacts (e.g., length and lead-time biases). Doubts about the value of screening grow even stronger when the available treatment options appear to be of limited efficacy and are unable to alter the natural progression of the disease.

Given the alarming death toll from cancer, many would argue that the mere possibility of benefit from screening offers sufficient grounds for moving forward, even in the absence of scientific certainty. However, screening can itself be harmful. A substantial proportion of a population that is screened for cancer can receive false-positive or false-negative results (depending on the accuracy of the test), and this misinformation can set off a cascade of adverse physical and emotional health consequences. Even for those in whom cancer is accurately diagnosed, the incremental benefit of early detection may be outweighed by the side effects and complications of treatment. The monetary costs of screening, which can be substantial, may be offset by the savings achieved through early detection, but quantifying the health gains achieved per dollar invested in screening requires evidence that screening produces health gains.
Whether resources spent on screening are wise investments is a concern not only of insurance companies and other third-party payers but also of society at large. This is an era in which escalating health care costs and the mounting pressures on service delivery are stretching the capacities of the U.S. health care system to the point of compromising quality (Institute of Medicine, 2001c) and threatening patient safety (Institute of Medicine, 2000e). Under such conditions it is reasonable to examine whether resources spent on screening tests of uncertain benefit would save more lives and achieve greater health gains if they were invested in health care services for which effectiveness is more certain.

Health care organizations, government agencies, advocacy organizations, and expert panels have struggled for decades with these issues in deciding what constitutes prudent policy and guidelines for cancer screening. Groups that develop such guidelines approach these issues from different perspectives, depending on their audiences, methods of developing guidelines, and the importance that they place on supporting scientific evidence (Woolf and George, 2000; Woolf et al., 1996), and have reached different conclusions about who should be screened, how often, and by which tests (see Boxes 5.3 to 5.6). Despite these inconsistencies, however, a core consensus has emerged about the appropriateness of certain types of cancer screening. There is essentially universal agreement across organizations that all adults age 50 and older should be screened for colorectal cancer, that all women should receive mammograms every 1 to 2 years beginning at least by age 50 (some say age 40), and that all sexually active women with a cervix should be screened regularly for cervical cancer.

Of course, controversies about cancer screening persist, the details of which receive some attention in this report and are dissected in detail elsewhere (U.S. Preventive Services Task Force, 1996). The debate over whether men should routinely receive the PSA test symbolizes such controversies. A case study describing efforts to screen individuals for lung cancer, first using chest radiography and more recently using low-dose spiral computed tomography (CT), is presented in Chapter 7 to illustrate the dilemma of adoption of a new screening technology in the face of uncertain science.

From a public health perspective, the disturbing paradox is that the cancer screening tests for which there is a core consensus are not being administered to a large proportion of the Americans for whom they are recommended. Upward trends in the proportion of Americans receiving recommended cancer screening tests are heartening, but disparities in screening by socioeconomic status are substantial, and many individuals are tested too late to obtain the full benefits of early detection, are tested incorrectly, or have results that receive inadequate follow-up. Chapter 6 examines the size of this gap and reviews evidence regarding potential strategies to improve the delivery of cancer screening services.