Read "Assessing Medical Technologies" at NAP.edu

Suggested Citation:"Appendix B: Selected Papers." Institute of Medicine. 1985. Assessing Medical Technologies. Washington, DC: The National Academies Press. doi: 10.17226/607.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

APPENDIX B
Selected Papers

Guide to Comparative Clinical Trials

Clifford S. Goodman*

* National Research Council Fellow, National Academy of Sciences, Washington, D.C.

This guide is intended to help a reviewer evaluate reports of comparative experimental clinical trials. Such trials are a mainstay of medical technology assessment, but their worth depends on the care with which they are designed, implemented, and analyzed. Experimental trials are prospective studies in that they entail the intentional application of a technology to an experimental group and then the observation of the effects of the technology. Comparative experimental clinical trials are typically used to compare the safety and effectiveness of a new technology with a standard treatment. Trials may have more than one experimental group.

In the simple comparative trial, patients with a common condition are assigned to an experimental group or to a control group. The experimental group receives the new technology, and the control group receives no treatment, a placebo, a standard treatment, or a variation (e.g., a different dosage) of the experimental treatment. After a designated time, each individual in the experimental and control groups is assessed for a designated endpoint or outcome. Endpoints may be measured in qualitative terms (e.g., survived or died) or quantitative terms (e.g., blood pressure measurements).

The most definitive type of experimental clinical trial is the randomized controlled clinical trial (RCT). In an RCT, patients are randomly assigned to experimental and control groups. Randomization reduces bias that might otherwise be introduced by prognostic and other selection factors not accounted for in the design of the trial.

There are a number of design variations, which can be used in combination, to the simple comparative trial. Some of these are crossover, stratified, matched, and factorial designs.

In a crossover trial, patients are systematically switched from one treatment group to another during the trial, and outcomes in the same patient are contrasted. Switching may be determined by a time-dependent rule or a disease-state-dependent rule. In a self-controlled trial, which incorporates many of the same features of crossover studies, a single treatment under study is evaluated by comparison of patient status before and after treatment. Louis et al. (1984) describe important factors in determining the effectiveness of crossover and self-controlled designs, e.g., crossover rules, and carry-over and sequencing effects.

In a stratified trial, patients are categorized according to characteristics which are thought to have prognostic significance (e.g., stage of disease), so as to isolate treatment effects from those of the prognostic factors. Stratification may be used in designing a trial, or it may be applied in data analysis after completion of the trial. Matching is an allocation process, used to gain statistical precision, of sorting patients into pairs matched according to significant prognostic factors, and then randomly assigning one member of each pair to the treatment group and the other to the control group. (For trials involving multiple treatment groups, patients can be matched into groups of the appropriate number.) In a factorial design trial, combinations of treatment factors are grouped and observed to determine independent and interactive effects of multiple treatments. For example, a 2 x 2 factorial design could be used to determine the effects of medication and dietary counseling for the treatment of hypertension in four treatment groups: medication and counseling, medication only, counseling only, and neither medication nor counseling. Experimental design and the design of clinical trials in particular are discussed extensively in the literature, e.g., in Campbell and Stanley (1963), Cook and Campbell (1979), Chalmers et al. (1981), Friedman et al. (1981), Mosteller et al. (1980), Peto et al. (1976, 1977), and Shapiro and Louis (1983).

Useful observations of the effects of technologies may be made under nonrandomized and nonexperimental conditions.
Although this guide is written to accommodate the assessment of RCTs, other types of clinical trials are subject to most of the assessment criteria discussed here. Some trials use historical control groups selected from hospital charts or computerized data bases, or standard outcomes from reports in the literature (e.g., organ transplant survival curves or rejection rates following N years). In observational studies (including most epidemiological studies), assignment of patients to treatment and control groups is generally not under the control of the investigator, making it difficult to control for prognostic factors which might affect observed outcomes. These may include prospective studies as well as retrospective studies (i.e., those in which the investigator identifies treatment and control groups after their exposure and nonexposure to the technology in question). Although lacking the rigor of RCTs, observational studies are valuable in formulating hypotheses and in ruling out certain explanations for observed effects of technologies. Observational studies and those using historical controls may be useful in situations in which comparative experimental designs are impossible or precluded by ethical, financial, and other constraints. Examples of observational studies are cohort and case-control studies.

CLINICAL TRIAL REPORTING

No study can be adequately interpreted without information about the methods used in the design of the study and the analysis of the results. Instructive surveys of clinical trial reporting (e.g., by Chalmers et al., 1983; DerSimonian et al., 1982; Freiman et al., 1978; Lavori et al., 1983; and Louis et al., 1984) demonstrate the extent to which important methodological elements are reported in clinical trials and their bearing on findings. DerSimonian et al.
(1982) examined 67 clinical trials published in four prominent medical journals in 1979-1980 for 11 important aspects of trial design and analysis (e.g., method of randomization, blinding, and statistical methods). Of the 11 items for each of the 67 trials published in the four journals, 56 percent were clearly reported, 10 percent were ambiguously mentioned, and 34 percent were not reported at all. The method of randomization was reported in only 19 percent of the papers, and statistical power to detect treatment effects was discussed in only 12 percent. Table B-1 lists the percentage of articles that reported the 11 aspects of trial design and analysis in the four journals surveyed. Emerson et al. (1984) repeated the study on 84 clinical trials published in 1 year in six journals, and they found that 58 percent were clearly reported, 5 percent were ambiguously mentioned, and 37 percent were not reported at all.

TABLE B-1 Percentage of Clinical Trial Articles in Four Journals Reporting 11 Important Aspects of Design and Analysis

                                                          Journal(b) (number of articles)
Design and analysis aspect        Reported(a)    NEJM (13)    JAMA (14)     BMJ (19)  Lancet (21)   Total (67)
Eligibility criteria              R                     77           36           21           29           37
                                  ?                     23           50           58           48           46
                                  O                      0           14           21           24           16
Admission before allocation       R                     85           64           63           29           57
                                  ?                      8            7            5           14            9
                                  O                      8           29           32           57           34
Random allocation                 R                    100           71           95           71           84
                                  ?                      0            0            0            0            0
                                  O                      0           29            5           29           16
Method of randomization           R                     15           43           16           10           19
                                  ?                     23            0           11            0            7
                                  O                     62           57           74           90           73
Patients' blindness to treatment  R                     62           71           37           57           55
                                  ?                      0           21           21            5           12
                                  O                     38            7           42           38           33
Blind assessment of outcome       R                     46           43           26           14           30
                                  ?                     23           36           21           29           27
                                  O                     31           21           53           57           43
Treatment complications           R                     92           71           58           48           64
                                  ?                      0            0            5            5            3
                                  O                      8           29           37           48           33
Loss to follow-up                 R                    100           93           74           62           79
                                  ?                      0            7           11            5            7
                                  O                      0            0           16           33           15
Statistical analyses              R                     92           79          100           95           93
                                  ?                      0            0            0            0            0
                                  O                      8           21            0            5            8
Statistical methods               R                     92           86           79           86           85
                                  ?                      0            0            5            0            1
                                  O                      8           14           16           14           13
Power                             R                     15           36            5            0           12
                                  ?                      0            0            5            5            3
                                  O                     85           64           89           95           85
Mean, all items                   R                     71           63           52           46           56
                                  ?                      7           11           13           10           10
                                  O                     23           26           35           45           34

(a) R denotes item reported, ? item unclear, and O item omitted.
(b) NEJM denotes the New England Journal of Medicine, JAMA the Journal of the American Medical Association, BMJ the British Medical Journal.
SOURCE: R. DerSimonian et al. (1982).

Freiman et al. (1978) examined 71 published negative trials (i.e., those in which the outcomes of treatment groups were found to be no different than those of control groups) to determine whether investigators adequately addressed a particular element of trial design: power to detect important clinical differences between treatment groups.
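To make the notion of power concrete, the following is a minimal sketch of a textbook normal-approximation power calculation for comparing two proportions with equal group sizes. It is not taken from Freiman et al.: the function name, the 40 percent control event rate, and the group sizes are hypothetical, and a "25 percent improvement" is assumed here to mean a relative reduction in the event rate.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treated, n_per_group, alpha=0.05):
    """Approximate power of a two-sided test comparing two proportions.

    Normal approximation with equal group sizes. Illustrative textbook
    formula only; it ignores the negligible chance of rejecting in the
    wrong direction and is not the method of any trial cited in the text.
    """
    z = NormalDist()                              # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)            # two-sided critical value
    p_bar = (p_control + p_treated) / 2           # pooled proportion under the null
    sd_null = sqrt(2 * p_bar * (1 - p_bar))       # SD factor under the null
    sd_alt = sqrt(p_control * (1 - p_control)
                  + p_treated * (1 - p_treated))  # SD factor under the alternative
    delta = abs(p_control - p_treated)            # difference to be detected
    return z.cdf((delta * sqrt(n_per_group) - z_alpha * sd_null) / sd_alt)


# Hypothetical example: control event rate 0.40, reduced by 25 percent to 0.30.
for n in (50, 200, 500):
    print(n, round(power_two_proportions(0.40, 0.30, n), 2))
```

Under these assumed rates, a trial with 50 patients per group has well under an even chance of detecting the difference, which is the kind of shortfall at issue in negative trials.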

The study found that of 71 papers on medical randomized controlled trials that reported no significant differences among treatment groups, only four of the trials were large enough to ensure a reasonable chance, in this instance a power greater than 0.90, of detecting a 25 percent improvement in patient outcomes. Only 30 percent of the trials had power greater than 0.90 for detecting a 50 percent improvement.

Chalmers et al. (1983) provide evidence for the seriousness of bias introduced by various methods of treatment assignment. Among 145 papers reporting controlled clinical trials of the treatment of acute myocardial infarction, they found significant differences in bias associated with the method of treatment assignment. At least one prognostic factor was maldistributed (p < 0.05) in 14.0 percent of the blind randomized studies (57 papers), in 26.7 percent of the unblinded randomized studies (45 papers), and in 58.1 percent of the nonrandomized studies (43 papers). Significant differences in outcome (case fatality rates) between experimental and control groups were reported in 8.8 percent of the blind randomized studies, in 24.4 percent of the unblinded randomized studies, and in 58.1 percent of the nonrandomized studies. These reporting rates among the three types of papers differed significantly (p < 0.05).

The major subjects addressed in this guide are

· basic descriptive material
· sample size
· selection of patients
· random allocation
· blinding
· treatments
· compliance
· withdrawals/loss to follow-up
· treatment complications
· tabulation of outcomes
· statistical methods and analyses
· power

These aspects of clinical trials should be closely examined before one combines the results of smaller trials, or generalizes the results of trials to other populations. This guide draws upon work of others in the reporting of clinical trials, especially that of Chalmers et al. (1981) and DerSimonian et al.
(1982), from whom permission has been granted to use the same or similar wording in places. Table B-2 summarizes the items referred to in the text and may be used as a reviewer's checklist.

TABLE B-2 Checklist for Comparative Clinical Trials

Check or complete multiple entries where applicable(a)

1. Basic descriptive material
   a. Authors
   b. Title
   c. Journal/publication
   d. Date/volume
   e. Trial type
      _ Randomized   _ Matched   _ Simple comparative   _ Factorial
      _ Crossover   _ Historical control   _ Stratified   _ Not reported
      _ Observational (specify)   _ Other (specify)
   f. Sources of financial support
      _ NIH   _ VA   _ Drug company   _ Other   _ Not reported
   g. Biostatistician cited as author or evaluator
      _ Yes   _ No   _ Not reported
   h. Start and end dates for trial given
      _ Yes   _ No
   i. Peer reviewed
      _ Yes   _ No   _ Unknown
   j. Statement of significant findings
      Major endpoints
         _ ++ Statistically significant (treatment)
         _ +  Trend (treatment)
         _ 0  No difference
         _ -  Trend (control)
         _ -- Statistically significant (control)
      Minor endpoints
         _ ++ Statistically significant (treatment)
         _ +  Trend (treatment)
         _ 0  No difference
         _ -  Trend (control)
         _ -- Statistically significant (control)
         _ None
      Side effects
         _ ++ Statistically significant
         _ +  Trend
         _ 0  No side effects
         _ na

2. Sample size
   a. Expected control group endpoints
      _ Yes   _ No   _ Unclear   _ na
   b. Improvement of clinical interest that should not be missed
      _ Yes   _ No   _ Unclear   _ na
   c. Levels of risk given
      α: _ Yes   _ No
      β: _ Yes   _ No
   d. Prior estimate of numbers of patients required
      _ Yes   _ No

3. Selection of patients
   a. Patient sources
      _ University   _ Public   _ Private   _ Clinic   _ Industry   _ Not reported
   b. Admission criteria description
      _ Yes   _ No   _ Unclear
   c. Rejection criteria description
      _ Yes   _ No   _ Unclear
   d. Number of patients actually entering trial given
      _ Yes   _ No
   e. Reject log reported
      _ Yes   _ No   _ na

4. Random allocation
   a. Method
      _ Envelope   _ Pharmacy   _ Telephone   _ Not reported   _ na   _ Other (specify)
   b. Stratification/blocking
      _ Yes   _ No   _ na
   c. Blinding of random allocation described
      _ Yes   _ No   _ na
   d. Testing of randomization described
      _ Yes   _ No   _ na

5. Blinding
   a. Patients as to treatment assignment
      _ Yes   _ No   _ na
   b. Physicians as to treatment assignment
      _ Yes   _ No   _ na
   c. Physicians and patients as to trends of trial
      _ Yes   _ No   _ na
   d. Biostatisticians/other evaluators
      _ Yes   _ No   _ na
   e. Testing for blinding
      Physicians: _ Yes   _ No   _ na
      Patients:   _ Yes   _ No   _ na

6. Treatments
   a. Description
      _ Yes   _ No   _ Unclear
   b. Patient number and treatment
                  Number    Treatment
      Controls    ______    _________
      Group 1     ______    _________
      Group 2     ______    _________
      Group 3     ______    _________
   c. Placebo described
      _ Yes   _ No   _ Unclear   _ na

7. Compliance
   a. Defined
      _ Yes   _ No   _ Unclear
   b. Accounted for all patients
      _ Yes   _ Partial
   c. Biological equivalent
      _ Yes   _ No

8. Withdrawals/loss to follow-up
   a. Listed
      _ Yes   _ No
   b. How analyzed
      _ Counted in original treatment group
      _ Counted as end result at time of withdrawal
      _ Counted as in both ways above, and other ways
      _ Discarded
      _ Counted in new group
      _ Not reported
      _ na

9. Treatment complications
      _ Described   _ Not described

10. Tabulation of outcomes
      _ Given   _ Not given

11. Statistical methods and analyses
   a. Statistical methods reported (specific tests, techniques, computer programs, etc.)
      _ Yes   _ No
   b. Statistical analyses reported (beyond means, percentages, standard deviations)
      _ Yes   _ No
   c. Test statistics given for endpoints
      _ Yes   _ No   _ Unclear
   d. Associated probability values given
      _ Yes   _ No   _ Unclear
   e. Confidence intervals given
      _ Yes   _ No   _ Unclear
   f. Regression or correlation analyses
      _ Yes   _ No   _ Unclear
   g. Statistical discussion of treatment complications given
      _ Yes   _ No   _ Unclear
   h. Appropriate retrospective analysis of subgroups given
      _ Yes   _ No   _ Unclear   _ na

12. Power addressed for negative trials
      _ Yes   _ No   _ na

(a) Yes means reported; unclear means inadequate, partial, or ambiguous information; no means not reported; na means inapplicability is reported or clearly implied.
Adapted from Chalmers et al. (1981).
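One way a reviewer might summarize a set of completed checklists, in the manner of the R / ? / O percentages of Table B-1, is sketched below. Only the three-way coding (reported, unclear, omitted) comes from the text; the function name, the item names chosen, and the three example reviews are hypothetical and invented purely for illustration.

```python
from collections import Counter

CODES = ("R", "?", "O")  # reported, unclear, omitted (the coding used in Table B-1)


def tally(reviews):
    """Return, for each checklist item, the percentage of trials with each code.

    `reviews` is a list of dicts, one per trial report, mapping a
    checklist item name to one of CODES.
    """
    items = sorted({item for review in reviews for item in review})
    result = {}
    for item in items:
        counts = Counter(review[item] for review in reviews if item in review)
        total = sum(counts.values())
        # Counter returns 0 for codes that never occurred.
        result[item] = {code: round(100 * counts[code] / total) for code in CODES}
    return result


# Hypothetical reviews of three trial reports (not real data).
example_reviews = [
    {"method of randomization": "R", "power": "O"},
    {"method of randomization": "O", "power": "O"},
    {"method of randomization": "?", "power": "R"},
]

for item, percentages in tally(example_reviews).items():
    print(item, percentages)
```

Because each percentage is rounded separately, the three figures for an item may not sum to exactly 100, as is also the case for some cells of Table B-1.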

Basic Descriptive Material

When assessing a published report of a clinical trial, the reviewer should note certain basic descriptive information, beginning with authors, title, journal, and date of publication.

The report of the clinical trial should include a description of the trial design (e.g., RCT, stratified-blocking). Other basic descriptive information includes the sources of financial support for the trial and whether or not a biostatistician has participated in the study (as an author, consultant, or reviewer). Studies should list the starting and stopping dates of the trial so that the results can be interpreted in the light of other changes in therapy that may have occurred. The paper should include a statement of significant findings as the author understands and interprets them.

It would be helpful for the reviewer of a published trial to know whether or not the report of the trial has been reviewed by peers. However, this often is not readily discernible, even among published papers that have been so reviewed. Although many journals use peer reviewers for most or all articles reporting scientific findings, these journals, as a matter of editorial policy, may not disclose whether or not a particular article was subject to peer review.

Sample Size

The numbers of patients in the trial affect the ability of the trial to detect differences between experimental and control groups. (The risks of making errors in detection of differences, namely α, the probability of the false-positive error, and β, the probability of a false-negative error, are discussed below.) There should be evidence that a prior estimate of the numbers of patients required has been made. A paper should list the expected control group endpoints, the improvement of clinical interest that should not be missed, the chosen levels of risk (α and β), and the number of patients required.
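How these four quantities fit together can be sketched with a standard normal-approximation formula for comparing two proportions. This is an illustrative assumption: the text does not prescribe a formula, the function name and the input rates are hypothetical, and published trials may use other methods (one-sided tests, continuity corrections, allowances for withdrawals), so the result need not match the sample size stated in any particular report.

```python
from math import ceil, sqrt
from statistics import NormalDist


def patients_per_group(p_control, p_treated, alpha=0.05, power=0.90):
    """Approximate number of patients per group for a two-sided test of two proportions.

    p_control: expected control group endpoint (proportion)
    p_treated: endpoint under the improvement that should not be missed
    alpha:     risk of a false-positive error
    power:     1 - beta, where beta is the risk of a false-negative error
    """
    if p_control == p_treated:
        raise ValueError("the two proportions must differ")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    p_bar = (p_control + p_treated) / 2
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p_control * (1 - p_control)
                                 + p_treated * (1 - p_treated))) ** 2
    return ceil(numerator / (p_control - p_treated) ** 2)


# Hypothetical inputs: expected control success rate 0.50 and an
# improvement to 0.70 that should not be missed.
print(patients_per_group(0.50, 0.70))
```

The required number rises quickly as the difference of interest shrinks or as the acceptable risks α and β are lowered, which is why all four quantities need to be stated together.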
Here is an example from a trial of cyclosporine in cadaveric renal transplantation: 495 Sample size was decided on the basis of a two-sided test of the hypothesis of equality of treatment groups for one-year graft survival. At the 5 per cent level of significance, the power of the test was set at 90 per cent for an expected difference between the two treatment groups of 20 per cent (55 vs. 75 per cent). The sample size was established at 100 patients per treatment group. Statistical analysis of the background variables was carried out to assess the balance between the two groups (Canadian Multicentre Transplant Study Group, 1983~. Regrettably, few studies do this (Altman, 1983~. Only 12 percent of the trials reviewed by DerSimonian et al. (1982) reported calcu- lations of power in planning sample size; Mosteller et al. (1980) found that less than 2 percent of the trials they reviewed did so. Selection of Patients The paper should provide a detailed de- scription of the criteria for admission and re- jection of patients to the trial, and should show that these criteria were applied before knowledge of the specific treatment assign- ment had been obtained. Without selection criteria, it is difficult to interpret and apply the findings of the trial. To the extent that study patients are not selected at random from a well-defined population (not to be confused with random allocation to treat- ment groups), doubt may exist as to whether trial findings may be generalized to that pop- ulation, as well as to others. Thus, a mere statement that a certain number of patients with a given diagnosis were randomized is in- sufficient. First, admission criteria should be given which describe who was eligible for the study, e.g., patients with a particular diag- nosis and treatment history, in a certain med- ical center, in a particular year, etc. 
Second, rejection criteria should be given which describe reasons why those who might otherwise have been admitted were ruled out of the study, e.g., diagnosis not confirmed by pathology, other serious illnesses, patient refusals, etc. The number of patients actually entering the trial should be given.

The description of the eligible patient population rejected for the trial can be as important as the documentation of the subjects studied. A log of those patients who are not allowed to enter the trial should be kept, including the reasons for their noneligibility. The primary use of such a reject log is to help identify bias in patient selection. An attempt should be made to compare the outcome of the rejected patients to the outcome of the trial subjects to detect any important selection biases, especially in instances in which cooperative studies are being undertaken at different centers.

Random Allocation

Random allocation of patients to treatment groups is a major bias-reducing technique in controlled clinical trials. The observed results of a trial may be affected (biased) by an uneven distribution to treatment groups of factors that affect prognosis such as age, disease state, or concurrent medical problems, as well as by the experimental treatment. (These prognostic factors may be referred to as confounding variables or covariates.) Proper randomization is an indifferent yet objective procedure which, among other benefits, tends to spread prognostic factors evenly among treatment groups.

A randomized study should provide information about the method of random allocation of patients, including information about the mechanism used to generate the random assignment and success in implementing it. A simple statement that a random assignment was made is insufficient, because some methods of random assignment may be effective but are poorly implemented, and some that appear to be random have serious weaknesses. In studies in which randomization was not possible, this should be noted.

Although random number tables or coin flipping may be unbiased in and of themselves, they may be used in ways which allow for the introduction of bias into a trial. Randomization should be verifiable as well as properly executed. Methods such as flipping coins, tossing dice, or drawing cards cannot be verified, and may lead investigators to interfere with the process.
Some methods which are verifiable can also be too easily inspected by study personnel, providing opportunity for bias to influence acceptance and therefore treatment distribution. Examples are allocation by birth date, chart number, alternate cases, and an open randomization table.

Random numbers from one of the published random number tables or pseudorandom numbers generated by a well-studied computer method offer good sources of randomization. After being admitted into a trial, it is best that the patient is assigned to a treatment group by a central source. A preferred method of randomization uses carefully prepared, sealed, consecutively numbered opaque envelopes. In the case of a drug trial, drugs should be prepackaged and numbered for each patient before the time of randomization. Envelopes and packages should be returned to the biostatistician for verification of assignment.

Whereas simple randomization tends to spread prognostic factors evenly among treatment groups in trials with large numbers of patients, small studies are more vulnerable to imbalances of prognostic factors. To enhance the effect of randomization in studies with small sample sizes, the patient allocation process may include stratification and blocking. Patients are first classified according to one or more important prognostic factors (stratification), and then they are randomly assigned to experimental groups so that predetermined, appropriately fixed proportions of patients from each stratum receive each treatment (blocking) (see, e.g., Lavori et al., 1983). If used, the methods for stratification and blocking should be described.

Randomization should be blinded in that the investigator must not be able to deduce which treatment is next in line when a patient is accepted into the trial.
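The stratification and blocking procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a procedure taken from any of the cited trials: the function names, the stratum labels, the block size of four, and the seed are all hypothetical, and the pseudorandom generator is the one in Python's standard library. In practice the list would be prepared in advance by a central source and kept concealed from admitting investigators.

```python
import random


def blocked_assignments(n_patients, treatments=("A", "B"), block_size=4, seed=None):
    """Return a treatment list built from randomly permuted, balanced blocks."""
    if block_size % len(treatments) != 0:
        raise ValueError("block_size must be a multiple of the number of treatments")
    rng = random.Random(seed)
    per_block = block_size // len(treatments)
    schedule = []
    while len(schedule) < n_patients:
        block = list(treatments) * per_block   # fixed proportions within each block
        rng.shuffle(block)                     # random order inside the block
        schedule.extend(block)
    return schedule[:n_patients]


def stratified_schedule(strata_sizes, seed=0, **kwargs):
    """Prepare one independent blocked list per stratum (prognostic class)."""
    rng = random.Random(seed)
    return {
        stratum: blocked_assignments(n, seed=rng.randrange(2**32), **kwargs)
        for stratum, n in strata_sizes.items()
    }


# Hypothetical strata defined by a single prognostic factor.
schedule = stratified_schedule({"age under 50": 8, "age 50 or over": 12}, seed=42)
for stratum, assignments in schedule.items():
    print(stratum, "".join(assignments))
```

Recording the seed makes the list verifiable after the fact, which addresses the requirement that randomization be checkable. Note that small fixed blocks let an unblinded investigator guess the last assignment in each block, which is one reason the concealment measures discussed in this section matter.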
It is especially important to blind the randomization process when the treatments are not blinded or trends in the study are known to the admitting investigator. An admitting investigator with a bias for or against a therapy that is thought to be next up for assignment may readily circumvent the patient in whom a suspected outcome might, in the view of the investigator, favor one treatment over another, or the investigator may delay admission until some other patient has been admitted. The informed consent procedure is an opportunity to inject this bias.

One method of testing randomization is the measurement of the prognostic factors of the groups being compared. Listing only demographic comparisons such as the usual age and sex distributions is usually insufficient. If the distribution among the treatment groups of prognostic factors is disproportionate, the cause may be chance or a previously unsuspected bias; thus, the distribution of known prognostic factors by treatment category should be shown in tabular form. These data are critical in both assessing the efficacy of the blinding of the randomization (which, if in serious doubt, will generally result in a trial's results being discarded) and removing the unwanted effects of chance variation by using stratified analysis of a trial's results. Analysis-of-variance modeling of prognostic factors may be used to reduce bias and to increase precision of estimates of effects (Lavori et al., 1983). If the trial results are to be considered for use in combination with other trial results, the significance of known prognostic factors by treatment category may be important in deciding whether to do so.

Blinding

Blinding is a major bias-reducing technique in clinical trials. Many papers report that the therapy given was concealed from the patient or the physician or both (double blinding). However, many reports stop after using the term blind or double blind and leave the reader uncertain of exactly what has been concealed from whom. These terms are not sufficiently descriptive because the roles in the trial may not be limited to the patient and physician.

Four or more parties may be involved with as many roles; each may be subject to hopes and prejudices about the trial. These are (1) the person(s) making the random allocations to treatment groups, (2) the patients, (3) the physicians or other providers, and (4) the biostatistician or other evaluators.
Sometimes the physician makes the random assignment to treatment groups and/or is the evaluator. Such multiple responsibilities present further opportunities for bias which must be checked.

Persons making the random assignments who have knowledge of the assignments made for particular patients may have their own prejudices and hopes, which may bias the assignments. For scientific and ethical reasons, blinding of patients and physicians as to the ongoing results of the study is important. Of course, patients' attitudes toward their treatments may affect compliance, participation, and outcomes. The physician who gives or orders treatment naturally hopes for success and may treat patients differently, given knowledge of treatment assignments, such as providing extra attention to patients with the less-preferred treatment. If the treatments and randomization process are not adequately blinded, knowledge of the trial trends could lead the conscientious physician to alter the intake of patients to the trial or to influence withdrawals from the trial.

From an ethical standpoint, the physician should no longer ask patients to join a study or to remain in it if the physician perceives an impressive trend. A data-monitoring committee, charged with studying the trial and notifying the investigators when a change in protocol should be considered or when the trial should be discontinued, is a proper inclusion in the informed consent procedure (Chalmers, 1976). Although such a committee would not dissolve the ethical considerations (which would be shifted in part to it), it would better enable the physician to act consistently in randomization and treatment.

Evaluators who are aware of the treatments given may bias their findings, despite conscious efforts to be fair. When necessary, the statistician-evaluator who has properly participated in planning the trial may work with coded data.
As for randomization, the methods of achieving blindness should be reported to give the reader important information for judging the adequacy of a trial's protection from bias. Although not all types of blindness are feasible for all trials, every reasonable attempt should be made to achieve as many types of blindness as possible. This aspect requires careful consideration and reporting by the authors. Five aspects of a trial which should be blinded (Chalmers et al., 1981) are as follows:

1. the randomization process (discussed above under Random Allocation);
2. patients as to treatment assignment;
3. physicians as to treatment assignment;
4. physicians and patients as to trends/ongoing results of the trial; and
5. biostatisticians/other evaluators.

In certain trials, patients and physicians should be blinded to the timing of interventions, e.g., the point during crossover trials at which patients switch from one treatment group to another.

It is not sufficient to assume that a double-blind procedure is effective. In good studies the physicians and their patients are tested for blinding at the end of the study to determine whether or not they have guessed treatment assignments.

Treatments

All experimental and ancillary treatment regimens must be described well enough to allow interpretation of the results and replication in other studies or practice. This includes the timing and amount of treatments in the trial and all other allowable treatments.

If a trial used a placebo, it is insufficient merely to mention that a placebo was given. Identity of appearance and taste where applicable should be documented, and evidence for physician and/or patient ability to distinguish between placebo and experimental treatment should be noted.

Compliance

Objective methods of verifying that patients are conforming to the protocol should be described. For example, in a drug trial, pill counts would be acceptable. When subjective (indirect) assessments of compliance are used, the validity of the subjective measure should be addressed. Biological equivalent refers to a measure, where appropriate, of a therapeutic agent after absorption or injection, preferably in its active form. Examples are pre- and postvagotomy measurements of gastric acid output in therapeutic trials of peptic ulcer. Blood or urine levels of an active agent may also be used to measure compliance.
Biological equivalent measurements are useful both as measures of compliance and in describing treatments.

In some trials the assessment of compliance is self-evident, such as in certain trials comparing surgical and medical treatments for a disease. Trials in which patients are to maintain regimens on an outpatient basis, especially over an extended period, present special problems in validating compliance. In some trials, a patient's compliance may be partial or temporary, as well as positive or negative. In any case, definitions for what constitutes compliance must be explicit, and compliance of all patients should be accounted for.

Withdrawals/Loss to Follow-up

In most reported trials, a number of subjects drop out or are withdrawn after the trial is under way. In trials with long-term follow-up, large trials, and trials with complicated protocols, some follow-up data are likely to be missing. Sometimes investigators cannot collect outcome data from all subjects because some die, move away, decline to continue to participate, or become lost from the study group for other reasons.

Information should be available regarding what happened to all the patients treated. Dropouts should be listed by diagnosis, treatment, reason for withdrawal, and whether withdrawal occurred as a result of patient or investigator initiative. It is usually important to report outcome in this group after the time of withdrawal, and they should be considered in the main analysis of the trial. When dropouts are properly reported, the reader can often assess the effect of missing data on the trial's conclusions; otherwise, the skeptical reader may conclude that the paper should be dismissed. Different kinds of withdrawals (i.e., in terms of prognostic characteristics) could bias the final makeup of each treatment group, thus diminishing the efficacy of the randomization procedure for obtaining similar kinds of patients in each treatment group.
Trials that do not mention withdrawals, or whose withdrawals exceed 5 percent, should be carefully scrutinized. Deletion of cases by the investigator after completion of the study raises strong concern over investigator bias and undermines any findings.

Universal application of a rule regarding counting of withdrawals without considering the nature of a trial and its objectives may give misleading findings (Sackett and Gent, 1979). Depending on the type of study, withdrawals are handled in different ways, for example, as follows.

1. Patients are considered as an end result for the group to which they were originally assigned regardless of what happens to them.
2. Patients can be counted as an end result at the time of withdrawal.
3. The results are analyzed with the dropouts handled as in both 1 and 2 and in other ways if appropriate. For instance, it may be useful to characterize treatment groups in terms of patient-years of treatment, to which even withdrawals, under certain well-defined circumstances, would make contributions.
4. Patients can be ignored or eliminated from the study at the time of withdrawal, and thus not be counted as an end result. Although this is often done, it is rarely defensible.
5. Patients may change groups, i.e., cross over, and be considered as an end result in the new group. Unless this is done as part of the planned protocol, it is not defensible.

Treatment Complications

The paper should provide information describing the presence or absence of side effects or complications after treatment. To determine the usefulness of a treatment, readers need to assess the nature and incidence of these side effects and their implications for patient care. The report should describe an active search for side effects or complications and discuss those that are found. If no side effects occur, this should be explicitly stated.
As is done in the main analysis, statistical analysis should be made of side effects if the sample size warrants it, including comparisons of percentages with a statistical test of significance and the observed probability. Given no significant difference in side effects, the probability of the false-negative error (β) should be mentioned.

Tabulation of Outcomes

A good study will tabulate all events employed as outcomes (endpoints) so that the reader can check the calculations and use the data more effectively in combining the results of different studies. Data of trial results should not be aggregated to a level that would preclude the reader from conducting secondary analyses. For all discrete endpoints that are spread over time, such as mortality or morbidity, even for some trials of short duration, life table or time series analysis should be carried out. Some papers present outcome data as crude rates, e.g., a 5-year death rate. This may be useful summary information, but alone it may be inadequate for illustrating the course of treatment effects. The data should be presented in a form that would allow the reader to reproduce the survival curve or curves.

Statistical Methods and Analysis

The uncertainty associated with real-world sample sizes usually requires formal statistical inference to evaluate the effects observed in trials. When an author merely states that p was less than 0.05 without identifying the statistical test, readers cannot satisfy themselves that the methods were appropriate. The paper should include statistical analyses going beyond the computation of means, percentages, and standard deviations. The names of the specific statistical methods, i.e., tests, techniques, and computer programs (with program version) used for statistical analyses should be given. In the analysis of the data gathered in any clinical trial there are certain minimal procedures that are indicative of quality.
These include, but are not limited to, significance of major endpoints, confidence intervals, and regression or correlation analyses.

The level of statistical significance is the probability of making a false-positive, or Type I, error, i.e., concluding that there is a difference between the experimental and control groups when in fact there is none. The probability of a Type I error is known as α. When significance is reported, it should be given in such a way that the reader can make the actual calculations. Both the test statistic and its associated probability values should be stated. If one is given without the other, the reader may have trouble verifying or understanding the statistical conclusions.

Confidence intervals should be provided for the measurements used as trial endpoints. Confidence intervals provide information that adds to the accept-reject findings of a hypothesis test. The confidence limits define the interval in which one can be reasonably confident (e.g., greater than 90 percent) that the true difference between treatments lies. If that interval includes zero, the null hypothesis of no true difference cannot be rejected. However, the location and width of the interval may suggest the direction of a true difference and the ability of a larger sample size to reject the null hypothesis. Confidence intervals that encompass clinically unrealistic measurements raise questions about the assumed distribution of measurements and should be discussed.

Regression or correlation analyses should be carried out for trials when it is of interest to know how treatment and outcome variables change or do not change together, such as when the critical response is a function of drug dosage or predetermined, quantifiable clinical factors.

When a trial is over, it is tempting and sometimes useful to select for analysis subgroups that were not stratified at the trial's outset. However, investigators and reviewers should realize that such post hoc study is subject to selection bias just as is any retrospective study, and that no rigorous conclusions can be drawn from it.
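The reasoning about confidence intervals given earlier in this section can be made concrete with a short calculation. The sketch below is an illustration only: the function name `difference_ci` and the patient counts are hypothetical, and the simple normal-approximation (Wald) interval is assumed here for brevity; other interval methods may behave better in small samples.

```python
import math
from statistics import NormalDist


def difference_ci(events_a, n_a, events_b, n_b, confidence=0.95):
    """Normal-approximation confidence interval for the difference
    between two proportions (group A minus group B)."""
    p_a, p_b = events_a / n_a, events_b / n_b
    std_error = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    diff = p_a - p_b
    return diff - z * std_error, diff + z * std_error


# Hypothetical small trial: 15 of 25 successes versus 10 of 25.
low, high = difference_ci(15, 25, 10, 25)
print(f"observed difference 0.20, 95% CI {low:.2f} to {high:.2f}")
print("interval includes zero:", low <= 0 <= high)
```

With these assumed counts the interval includes zero, so the null hypothesis cannot be rejected, yet most of the interval lies on the positive side. This is the situation described above in which the location and width of the interval suggest both the direction of a possible true difference and the value of a larger sample.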
Retrospective studies are useful to suggest new studies; may point out inadequacies arising from the random allocation process, dropouts, or compliance; and may help to estimate their effect on outcomes.

Although many papers state the specific objectives of the study, it is often very difficult to find results in the paper that apply directly to the specific objectives. A clear presentation of results should be made.

Power

The probability of making a false-negative, or Type II, error, i.e., of not detecting a difference between the experimental and control treatments when in fact one does exist, is known as β. Power, generally defined as (1 − β), is the probability of avoiding Type II error, i.e., detecting that true difference. As discussed above, the paper should provide information describing the determination of sample size before the trial, which would enable the detection of clinically important differences.

Although confidence limits portray the uncertainty of a treatment effect, discussion of power denotes the strength of the conclusion. As illustrated in the study by Freiman et al. (1978) referenced above under Clinical Trial Reporting, small sample size frequently leads to trials with little power to detect differences among treatment groups. If the difference between the experimental and control groups is not statistically significant, then the false-negative error and its probability should be addressed. A well-designed trial with high power that detects no statistically significant difference between treatments can be convincing. But if no statistically significant difference is found and the power is low, or not discussed, the reader cannot dismiss the possibility that the study was not large enough to detect an important treatment effect.
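The dependence of power on sample size can be sketched as follows. This is a hedged illustration, not a calculation taken from any cited paper: the function name `power_two_proportions` is hypothetical, the normal approximation with unpooled variance is assumed, and the 55 versus 75 percent figures are borrowed from the cyclosporine example quoted earlier purely as convenient inputs.

```python
import math
from statistics import NormalDist


def power_two_proportions(p_control, p_treated, n_per_group, alpha=0.05):
    """Approximate power (1 - beta) of a two-sided test comparing two
    proportions with n_per_group patients in each arm."""
    std_normal = NormalDist()
    std_error = math.sqrt(
        (p_control * (1 - p_control) + p_treated * (1 - p_treated)) / n_per_group
    )
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    shift = abs(p_treated - p_control) / std_error
    # Probability of exceeding the critical value in the direction of the
    # true difference; the negligible opposite tail is ignored.
    return std_normal.cdf(shift - z_alpha)


for n in (20, 50, 100, 200):
    print(n, round(power_two_proportions(0.55, 0.75, n), 2))
```

Under these assumptions a trial with 20 patients per group would miss a true 20-point difference far more often than it would detect it, which illustrates why a nonsignificant result from a small trial cannot be read as evidence of no effect.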
For a negative trial, it would be informative to estimate the number of patients that would have been required to document as significant the observed difference between treatment and control groups, assuming that that difference were to hold up with the larger sample size.

REFERENCES

Altman, D. G. 1983. Size of clinical trials. British Medical Journal 286:1842-1843.
Campbell, D. T., and J. C. Stanley. 1963. Experimental and Quasi-Experimental Design for Research. Chicago: Rand McNally.
Canadian Multicentre Transplant Study Group. 1983. A randomized clinical trial of cyclosporine in cadaveric renal transplantation. New England Journal of Medicine 309:809-815.
Chalmers, T. C., and discussants. 1976. How to turn off an experiment. In J. D. Cooper and H. D. Ley, eds., Ethical Safeguards in Research on Humans. Washington, D.C.: Interdisciplinary Communications Associates.
Chalmers, T. C., P. C. Celano, H. S. Sacks, and H. Smith. 1983. Bias in treatment assignment in controlled clinical trials. New England Journal of Medicine 309:1358-1361.
Chalmers, T. C., H. Smith, B. Blackburn, B. Silverman, B. Schroeder, D. Reitman, and A. Ambroz. 1981. A method for assessing the quality of a randomized control trial. Controlled Clinical Trials 2:31-49.
Cook, T. D., and D. T. Campbell. 1979. Quasi-Experimentation: Design and Analysis Issues for Field Settings. Chicago: Rand McNally.
DerSimonian, R., L. J. Charette, B. McPeek, and F. Mosteller. 1982. Reporting on methods in clinical trials. New England Journal of Medicine 306:1332-1337.
Emerson, J. D., B. McPeek, and F. Mosteller. In press. Reporting clinical trials in general medical journals. Surgery.
Freiman, J. A., T. C. Chalmers, H. Smith, and R. R. Kuebler. 1978. The importance of beta, the Type II error and sample size in the design and interpretation of the randomized control trial. New England Journal of Medicine 299:690-694.
Friedman, L., C. Furberg, and D. DeMets. 1981. Fundamentals of Clinical Trials. Boston: Wright-PSG.
Lavori, P. W., T. A. Louis, J. C. Bailar, and M. Polansky. 1983. Designs for experiments: parallel comparisons of treatment. New England Journal of Medicine 309:1291-1299.
Louis, T. A., P. W. Lavori, J. C. Bailar, and M. Polansky. 1984. Crossover and self-controlled designs in clinical research. New England Journal of Medicine 310:24-31.
Mosteller, F., J. Gilbert, and B. McPeek. 1980. Reporting standards and research strategies for controlled trials. Controlled Clinical Trials 1:37-58.
Peto, R., M. C. Pike, P. Armitage, N. E. Breslow, D. R. Cox, S. V. Howard, N. Mantel, K. McPherson, J. Peto, and P. G. Smith. 1976. Design and analysis of randomized clinical trials requiring prolonged observation of each patient, Part I. British Journal of Cancer 34:585-612.
Peto, R., M. C. Pike, P. Armitage, N. E. Breslow, D. R. Cox, S. V. Howard, N. Mantel, K. McPherson, J. Peto, and P. G. Smith. 1977. Design and analysis of randomized clinical trials requiring prolonged observation of each patient, Part II. British Journal of Cancer 35:1-39.
Sackett, D. L., and M. Gent. 1979. Controversy in counting and attributing events in clinical trials. New England Journal of Medicine 301:1410-1412.
Shapiro, S. H., and T. A. Louis. 1983. Clinical Trials: Issues and Approaches. New York: Marcel Dekker.

Information Needs for Technology Assessment

Morris F. Collen*

*Director, Technology Assessment, Department of Medical Methods Research, Kaiser-Permanente Medical Care Program, Oakland, California.

Technology assessment has been defined (in this volume and elsewhere¹) as a complex process requiring a broad comprehensive base of data in order to permit evaluation of short- and long-term, intended and unintended, and direct and indirect consequences of the use of technology.

A technology assessment usually first defines the technology to be studied, identifies the alternative technologies with which it competes, describes the patients using the technology, and considers the goals of decision makers for whom the technology assessment is intended; then it evaluates the process and outcomes of using the technology for these patients. Accordingly, the information needs for a technology assessment include (1) descriptive information of the technology and how it is used; (2) descriptive information of alternative, competitive technologies; (3) descriptive data of the patient and population users of the technology; (4) evaluative data of direct, intended effects; and (5) evaluative data of indirect, unintended effects.

This section lists what information is usually needed for a comprehensive technology assessment. Information sources that can provide these data (such as registries and data bases), and how the data are acquired are considered in Chapter 3.

DESCRIPTIVE INFORMATION OF THE TECHNOLOGY

Definition of the Type of Technology

The different approaches to classification of technologies have been mentioned in Chapter 2, and different types of technology require different information for assessment. As a minimum, a technology can be classified using recommended standard terms when available as being a drug,² a technique or procedure,³,⁴ a device or equipment, or a system.

Functional Specifications of the Technology

Information specifying what the technology is intended to do, its purpose or objectives, is necessary in accordance with the functional classification of the technology. Evaluation methodology is sufficiently different for the following functional groups to require detailed functional specifications of a technology when used for:

• Screening or diagnostic medicine, such as screening for fetal neural tube defects by the maternal serum alpha-fetoprotein test, computed tomographic (CT) scanning for the diagnosis of abdominal tumors, etc.
• Therapeutic, preventive, or rehabilitative medicine, such as cancer therapy, cardiac surgery, immunizations, prosthetics, etc.
• Supporting or coordinating medicine, such as a hospital computer system.

Technical Specifications of the Technology

An evaluation of how the technology works will usually require information as to the following:

• Technical description of the technology, its structure and operational characteristics, exact type of procedure, etc.
• Supporting resources needed, such as necessary specialized technical personnel and their training, supplies, facility site, and energy requirements.
• When appropriate, data on process quality control, reliability, preventive maintenance, and backup requirements.

Costs of Technology

Reliable cost data are difficult to obtain, but as a compromise, fees or charges to the user are often used as substitutes for costs. Unless specific cost centers are established to identify the various expenses associated with the utilization of the technology, accurate costs will not usually be available. Information will be needed as to capital costs for equipment and facilities and direct operational costs including those for personnel, supplies, depreciation of equipment, etc. Process information as to workload can then provide unit costs per procedure or per use of technology.⁴ When appropriate, and if a defined population of users is available, then total expenditures can be derived for the technology per unit of population using the technology (e.g., per one thousand patients, or per one million population).

Alternative Technologies

Similar information as described above will need to be obtained for alternative competing technologies used for the same functional requirements. For example, for the diagnosis of abdominal body tumors, one will need information to assess alternative scanners (x-ray CT versus ultrasound versus nuclear medicine).

PROVIDER, PATIENT, AND POPULATION INFORMATION

Patient Workload

In order to evaluate cost-effectiveness, the following information will be needed:

• Number of patients using the device (equipment) or receiving the procedure, per unit of time (e.g., per episode of illness or per day) for specific conditions.
• Number of patients with the same condition receiving alternative technology.
• Criteria used for appropriate selection of patients for the alternative technologies.

Patient Demographic Information

For patients using the technology, information will be required as to number of patients (yet preserving patient confidentiality), age (date of birth), sex, race and ethnic group, occupation, and residence area (e.g., zip codes).
Marital status, family status, and educational and financial information are important for some technologies such as those used for home or self-care. Health habits and life-styles may be important factors influencing patient outcomes.

Relevant Patient Medical Information

Diagnoses for which the technology is used must be available in detail as appropriate for the specific technology, indicating severity or staging of disease and using a standardized diagnosis code.⁵⁻⁸ Also, information will be needed as to how often the technology was used for the specific conditions, such as total usage for an episode of illness and patient outcome from the use of the technology.

Linkage of medical data will be needed from multiple medical records (i.e., for multiple episodes of illness, office visits, and hospitalizations). Prior use of medical services may be important.

Relevant Population Medical Data

To determine rates of use of technology in the population that is served requires data as to the size of the targeted or user population and incidence and prevalence of the condition or disease in the population using the technology.

Provider Information

Health care provider specialty services that use the technology (e.g., cardiac surgery, obstetrics, nuclear medicine, etc.) will need to be identified. Also information will be needed as to the facility sites providing the services, such as (1) ambulatory care visits/encounters for technology used in the office, (2) hospital admission rates and days in hospital for technology used in the hospital, and (3) nursing home days and home care visits for technology used in these sites. Appropriate information will be needed for ancillary services, including pharmacy drug usage, clinical laboratory tests, x-ray procedures, electrocardiograms, electroencephalograms, cardiac catheterization, hemodialysis, etc.
Payments for Technology

Who pays for the use of a technology may influence its rate of diffusion and utilization. Charges to patients for specific technology procedures⁹ and sources of payment (self-pay, insurance, Medicare, etc.) will be needed.

Economic Analysis and Efficiency

Efficiency measures will require information as to the resource costs used to achieve specified levels of technology effectiveness for clinical, medical, or outcome efficiency, such as unit cost per true positive test for diagnostic technology or cost per episode of illness for a therapeutic technology. Assessment of managerial, technical process, or production efficiency will require measures of unit cost of technology per unit of operational time, etc.

Effects of discount rates must be considered in valuing costs and benefits over time. Information should be obtained to assess the effect of organizational financial arrangements on costs, utilization, and rate of diffusion/replacement of technology when applicable (e.g., fee-for-service or cost-reimbursement versus health maintenance organization [HMO] capitation payments or prospective budgeting).

An assessment of benefits from the use of medical technology adds an additional extensive set of information requirements, including estimated value of extended years of life to individual patients and to the group. A cost-benefit analysis requires a basis for converting all benefits to monetary terms.

Similar information will be required for all competing alternative technologies, including data on appropriateness of patient utilization rates for the alternative technologies. It may be of interest to assess the actual (and appropriate) mix of competing technologies used per unit of population served (e.g., per one thousand patients or per one million population served).
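Two of the efficiency quantities mentioned above, discounting of costs over time and unit cost per true positive test, reduce to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and every figure (the yearly costs, the 5 percent discount rate, the number tested, the prevalence, and the test sensitivity) is invented for the example rather than drawn from any assessment.

```python
def present_value(amounts_by_year, discount_rate):
    """Discount a stream of yearly amounts (year 0 first) to present value."""
    return sum(
        amount / (1 + discount_rate) ** year
        for year, amount in enumerate(amounts_by_year)
    )


def cost_per_true_positive(total_cost, n_tested, prevalence, sensitivity):
    """Unit cost per true positive test for a diagnostic technology."""
    true_positives = n_tested * prevalence * sensitivity
    return total_cost / true_positives


# Hypothetical program: capital outlay in year 0, operating costs afterward.
yearly_costs = [100_000, 20_000, 20_000]
discounted_cost = present_value(yearly_costs, discount_rate=0.05)
print(round(discounted_cost, 2))

unit_cost = cost_per_true_positive(
    discounted_cost, n_tested=3000, prevalence=0.02, sensitivity=0.90
)
print(round(unit_cost, 2))
```

The same two functions can be applied to each competing alternative technology so that the comparison rests on a common discount rate and a common definition of the unit of output.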
EVALUATIVE INFORMATION

Intended Users of the Technology Assessment

Technology assessments are usually conducted for health care policymakers, but they may also be intended for use by administrators, physicians, and patients. Each of these groups has special interests that may require different information. Policymakers and administrators will be especially interested in comparative cost-benefit analyses for competing technologies, whereas patients will be primarily concerned with health care outcomes.

Evaluative Information for Direct, Intended Effects

Effectiveness. Effectiveness measures how well the technology works, that is, the extent to which the technology achieves its specified intended objectives. Evaluation information usually includes measures of both patient outcome and the health care process.

Evaluation of clinical effectiveness for screening and diagnostic technology requires information on sensitivity, specificity, yield rates, etc. For therapeutic technology, effectiveness measures for individual patients usually include information on health status outcomes, functional disabilities which limit patients' return to work, etc. For population groups, information needs include the effects on health indexes; rates of morbidity, disability, and mortality; and average years of life gained.

Managerial or production effectiveness, such as for coordinating technology, will need information on throughput time, error detection rates, etc.

Clinical Safety. Technology assessments usually need information as to clinical hazards, toxicity, adverse effects, etc.

EVALUATIVE INFORMATION FOR INDIRECT, UNINTENDED EFFECTS

The indirect or unintended consequences of the use of a technology may be very extensive, and usually the decision makers for whom the assessment is being prepared have special interests which will determine the specific information needed.
As has been emphasized, different data may be needed for a technology assessment providing options for patient-consumers than may be needed for clinicians, for hospital administrators, or for government policymakers. Accordingly, the assessment of indirect consequences requires a correspondingly broad range of information, including, when appropriate, data as to societal effects, legal effects, ethical effects, and environmental effects.

RECOMMENDATION FOR A STANDARD MINIMUM DATA SET

Using the above guidelines and other published minimum data sets as models,¹¹⁻¹⁴ it is recommended that a minimum data set for medical records be developed to better satisfy the basic information needs for their use in technology assessments. Such a minimum data set could contain essential and necessary data for many technology assessments. Of course, this could not provide sufficiently comprehensive data for every assessment, and special data subsets could be developed for the different technology types and for those that require analytic methods. Nevertheless, it would encourage a more uniform approach to data collection and documentation, which should increase the potential of using medical record data for technology assessment, facilitate data linkage, permit meaningful data comparisons, and support retrospective studies.

¹ Office of Technology Assessment. 1980. The Implications of Cost-Effectiveness Analysis of Medical Technology. Washington, D.C.: U.S. Government Printing Office.
² American Hospital Formulary Services for Drug Classification.
³ Current Procedural Terminology (CPT-4). American Medical Association.
⁴ Blue Cross/Blue Shield Codes for Procedures.
⁵ International Classification of Diseases, 9th Revision, Clinical Modification (ICD-9-CM).
⁶ Systematized Nomenclature of Medicine (SNOMED). American Association of Pathologists.
⁷ Classification of Reasons for Visits. National Ambulatory Care Survey (NCHS).
⁸ International Classification of Health Problems in Primary Care (ICHPPC-2), World Organization of National Colleges, Academies, and Academic Associations of General Practitioners/Family Physicians (WONCA).
⁹ California Relative Value Codes for Physician Reimbursement.
¹⁰ McNeil, B. J. Values and Preferences of Patients and Providers (see p. 535 of Appendix B of this book).
¹¹ Uniform Ambulatory Medical Care. Minimum Data Set. DHEW Pub. No. (PHS) 81-1161. Washington, D.C.: U.S. Government Printing Office.
¹² Guidelines for Producing Uniform Data for Health Care Plans. DHEW Pub. No. (HSM) 73-3005. Washington, D.C.: U.S. Government Printing Office.
¹³ Uniform Hospital Discharge Data. Minimum Data Set. DHEW Pub. No. (PHS) 80-1157. Washington, D.C.: U.S. Government Printing Office.
¹⁴ Long-Term Health Care. Minimum Data Set. DHHS Pub. No. (PHS) 80-1158. Washington, D.C.: U.S. Government Printing Office.
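To make the recommendation concrete, one possible shape for a minimum data set record is sketched below. This is only an illustration: every class name, field name, and code value is hypothetical, the fields are chosen loosely from the information categories listed in this paper, and none of it reproduces any of the published minimum data sets cited above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TechnologyUse:
    """One use of a technology during an episode of illness (hypothetical fields)."""
    technology_code: str                      # standardized procedure code (placeholder scheme)
    diagnosis_code: str                       # standardized diagnosis code (placeholder scheme)
    severity_or_stage: Optional[str] = None
    provider_specialty: Optional[str] = None  # e.g., "cardiac surgery"
    site_of_care: Optional[str] = None        # office, hospital, nursing home, home
    payment_source: Optional[str] = None      # self-pay, insurance, Medicare, ...
    charge: Optional[float] = None
    outcome: Optional[str] = None


@dataclass
class PatientRecord:
    """Linked record for one patient across episodes, visits, and stays."""
    patient_id: str
    marital_status: Optional[str] = None
    education: Optional[str] = None
    uses: List[TechnologyUse] = field(default_factory=list)


def uses_per_thousand(records, technology_code):
    """Utilization of one technology per 1,000 patients in the record set."""
    if not records:
        raise ValueError("at least one patient record is required")
    n_uses = sum(
        1 for record in records for use in record.uses
        if use.technology_code == technology_code
    )
    return 1000.0 * n_uses / len(records)


# Two made-up patients, one of whom had the technology coded "T-001".
example_records = [
    PatientRecord("p1", uses=[TechnologyUse("T-001", "D-123", site_of_care="hospital")]),
    PatientRecord("p2"),
]
example_rate = uses_per_thousand(example_records, "T-001")
print(example_rate)
```

Keeping each use as its own entry under a patient identifier is one way to support the record linkage and rate calculations that the paper asks for; other layouts would serve equally well.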

Toward Evaluating Cost Effectiveness of Medical and Social Experiments

Frederick Mosteller*
Milton C. Weinstein†

* Chairman, Department of Health Policy and Management, Harvard School of Public Health.
† Professor of Policy and Decision Sciences, Harvard School of Public Health.

The United States has been increasingly concerned about costs of health care (Fuchs, 1974; Hiatt, 1975). One possible response to this concern is an accelerated strategy for evaluating the efficacy of medical practices, with the hope that identifying those practices that are not efficacious will lead to their abandonment and, therefore, to substantial savings in health care resources (Cochrane, 1972). An alternative response to the cost problem acknowledges that information on efficacy will not eliminate the need to face trade-offs between increasing incremental costs and diminishing incremental benefits. Moreover, information on efficacy will not resolve the highly individual and subjective judgments about the value of symptom relief or other aspects of improved quality of life.

These two responses to the health cost problem are not mutually exclusive, although they lead to different emphases. While we concentrate on evaluation of efficacy, we acknowledge and, indeed, seek to elucidate some of the limitations of evaluation of efficacy as a means of improving the public health or controlling costs.

Evaluation has its own costs, so we need to consider how much different kinds of evaluation are worth and what their benefits may be. The long-term goal of the research that we outline here would be to develop and demonstrate a methodology for assessing these benefits and costs.

To oversimplify for a moment, two possible scenarios resulting from the evaluation of efficacy can be identified. In the first, a therapy or diagnostic method proved ineffective (or at least cost-ineffective) would be
dropped by the profession, and the money saved would reduce the national medical budget without substantially impairing health. In the second scenario, a procedure is proved effective, and this leads to more widespread use and resulting health benefits. There are examples of both scenarios: gastric freezing for the first, and antihypertensive medications for the second.

Students of policy, however, will recognize both of these scenarios as idealized and unrealistic. Technological changes and changes in practice are ordinarily slow, except in crisis situations. For the first scenario, funds not used for one purpose are quickly and smoothly diverted to other uses, possibly ones that compensate for an abandoned procedure. Advocates of a procedure let go slowly and use ostensible (and sometimes legitimate) scientific arguments to cast doubt on the validity of the evaluation. For the second scenario, practitioners may be slow to adopt new procedures, even if proved efficacious, unless they perceive the benefits to be immediate and attributable to the intervention (a general obstacle to adopting preventive medical practices).

Although we recognize the difficulty of the task, we are reminded of the need for some rational basis for allocating resources to clinical experiments. Budgets for clinical trials at the National Institutes of Health (NIH) are under constant surveillance, and vigilant members of Congress will want to know that resources have been well spent. Administrators at NIH facing straitened budgets must choose carefully medical procedures in which to invest the resources for a clinical trial, recognizing that a trial done in one area means a trial not done in another. Can these administrators not only improve their decision rules for internal budget allocation but also determine whether additional resources spent on clinical investigations have a greater expected return than resources spent at the margin elsewhere in the health sector? The economist's test of allocative efficiency (equal incremental return across and within sectors of the budget) has more than a little conceptual appeal in this domain, but the analytical tasks are formidable.

HOW ARE EVALUATIONS USED?

The value of an evaluation depends on how its results are translated into changes in practice. Consider three models of decision making in the presence of information from evaluations: the normative, the descriptive, and the regulatory.

In the normative model, the ideal physicians act in the best interests of the society. They process new information rationally. They allocate resources according to the principles of cost-effectiveness analysis, electing the procedures that yield the maximum health benefits obtainable from the health care budget. Although some future reconfiguration of incentives in the U.S. health care system may move the United States closer to that state of affairs, the normative model of decision making is best thought of as an unattainable ideal; the value of information under this model is the best that can possibly be expected.

In the descriptive model, or models, an attempt would be made to assess what the response of physicians and other decision makers would be to the information from a trial. Here, past experiences must be relied on, as well as information from economic, sociologic, and psychologic theories. We need to learn how to predict when the response will be rapid, when it will be slow, when it will be nonexistent, and when it will be paradoxical. Perhaps a model can be developed, based on data from past history, that would identify the characteristics of the procedure, the type of study (e.g., randomized versus observational, large versus small, multicenter versus single institution), the nature of the medical specialty, and other variables that can be combined into a prediction of response.

In the regulatory model, the possibility of interventions (by government, by insurers, by professional societies) intended to make medical practice more responsive to information would be allowed. For example, reimbursement might be preconditioned on evidence of efficacy or otherwise linked to the state of information. Food and Drug Administration (FDA)-type procedures for practices other than drugs and devices would fall into this category. We recognize many problems inherent in such an approach: establishing criteria for efficacy when outcomes have multiple attributes (including survival and many features of the quality of life), and establishing criteria for efficacy to apply to a heterogeneous population when the procedure could not have been tested in all possible subpopulations. We are open to the possibility that more decentralized approaches to altering incentives for practice in response to information on efficacy or even to collecting the information itself may be possible.

WHAT IS BEING EVALUATED?

Initially the problem was defined as that of evaluating the worth of a randomized clinical trial (RCT). Inevitably, the question arose, "Compared with what?" One possible answer was: "Compared with what would have happened in the absence of an RCT." The possibilities are varied: perhaps observational studies of procedures after they are widely practiced, perhaps clinic-based or community-based studies, perhaps systematic efforts using data banks, perhaps NIH consensus development conferences, perhaps committee appraisals in the Institute of Medicine, or perhaps review papers in leading professional journals.
Whatever the alternatives may be, we do not seem to be able to deal with the RCT, or other methods, in isolation. Obviously this necessity for breadth multiplies the research effort enormously.

AN ANALYTIC FRAMEWORK FOR ASSESSING COST-EFFECTIVENESS

We suggest a general conceptual model for evaluating the cost-effectiveness of medical evaluations, and illustrate its applicability to two particular clinical trials.

Cost-Effectiveness Analysis and Health Practices

Economists turn to cost-effectiveness analysis when resources are limited and when the objective is to maximize some nonmonetary output. This technique is well suited to the assessment of medical procedures in which outcomes do not lend themselves to monetary valuation. The cost-effectiveness of a medical procedure may be evaluated as the ratio of its resource cost (in dollars) to some measure of its health effectiveness (Weinstein and Stason, 1977; U.S. Congress, 1980; Warner and Luce, 1982). The units of effectiveness vary across studies, but years of life gained (perhaps adjusted or weighted for quality) are the most commonly used.

The rationale for using such a ratio as a basis for resource allocation is as follows. Assume that the society's objective is to allocate its health budget to achieve the maximum total health benefit (setting aside, for the moment, equity concerns). Then the optimal decision rule is to rank order programs in increasing order of their cost-effectiveness ratios (C/E) and to assign priorities on this basis. The C/E ratio for the last program chosen under the budget constraint may be interpreted as the incremental health value (for example, in years of life, or quality-adjusted years of life) per additional dollar allocated to health care.

Although the cost-effectiveness model is far from being used as a blueprint for health resource allocation in practice, many studies along these lines have helped to clarify the relative efficiency with which health care resources are being, or might be, consumed in various areas of medical technology (Weinstein and Stason, 1976; Bunker et al., 1977; U.S. Congress, 1980).

A Cost-Effectiveness Model for Clinical Trials

In the above formulation, the net costs (C) and net effectiveness (E) are uncertain.
For purposes of today's decision making, it may be reasonable to act on best estimates of these values, but the possibility that new information might alter perceptions of these variables must not be obscured, thus permitting reallocations of the budget in more health-producing ways. It is reasonable to ask what is the value of information about the effectiveness (or cost) of a medical procedure. Moreover, since resources for providing such information (e.g., for clinical trials) are limited, it is reasonable to ask what is the cost-effectiveness ratio for a clinical trial, where the cost would be the resource cost of the trial, and the effectiveness would be the expected increase in the health benefits produced, owing to the information. We would also want to take into account the possibility that, if the utilization of a procedure drops as a consequence of the trial (e.g., if the procedure is found not to be effective), that might have the effect of freeing health care resources for other beneficial purposes.

Thus, the cost-effectiveness ratio for a study would be represented as

\[ C_{\text{study}} \,/\, [\Delta E - (\Delta C / \lambda)], \]

where \(C_{\text{study}}\) is the cost of the study, \(\Delta E\) is the net expected health gain (e.g., years of life gained) attributable to the study, \(\Delta C\) is the net expected addition to health care cost attributable to the study, and \(\lambda\) is an "exchange rate" between health benefits and health care costs. Below are comments on each of these terms.
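As a rough illustration of the ranking rule described earlier, and of how an exchange rate between health care costs and health benefits could be read off from it, the sketch below funds whole programs in increasing order of C/E until the budget runs out. The program names, costs, and effects are hypothetical, and stopping at the first program that no longer fits the budget is a simplifying assumption, not part of the model in the text.

```python
def allocate_budget(programs, budget):
    """Fund programs in increasing order of cost-effectiveness ratio (C/E).

    programs: list of (name, cost_in_dollars, effect_in_life_years) tuples.
    Returns (names of funded programs, C/E ratio of the last program funded).
    Simplification: programs are indivisible and funding stops at the first
    program that does not fit in the remaining budget.
    """
    ranked = sorted(programs, key=lambda p: p[1] / p[2])
    funded = []
    remaining = budget
    marginal_ratio = None
    for name, cost, effect in ranked:
        if cost > remaining:
            break
        funded.append(name)
        remaining -= cost
        marginal_ratio = cost / effect
    return funded, marginal_ratio


# Hypothetical programs: (name, cost in $ millions, life-years gained in thousands),
# so each ratio is in thousands of dollars per life-year.
example_programs = [
    ("screening", 100.0, 10.0),   # C/E = 10
    ("surgery", 200.0, 5.0),      # C/E = 40
    ("medication", 150.0, 10.0),  # C/E = 15
]
funded_programs, marginal_ce = allocate_budget(example_programs, budget=300.0)
print(funded_programs, marginal_ce)
```

With a budget of 300, the two programs with the lowest ratios are funded and the ratio of the last one funded is 15. Under the idealized rule, that marginal ratio is the kind of quantity an exchange rate between costs and health benefits would reflect.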
The net expected health benefit from the study (\(\Delta E\)) depends on a number of factors, such as the following: the presumptive probability that the intervention being evaluated is more effective than the current intervention; the magnitude of the possible gain in effectiveness; the probability that the study will detect the effect, if present; the proportion of the population that will adopt the new intervention if the trial is conducted and under each possible study result; the proportion that will adopt the new intervention if no trial is conducted.

The net expected addition to health care cost could be positive (if the trial leads to adoption of a more-expensive intervention) or negative (if the trial leads to cost savings; for example, if the prevailing therapy is found to be no more effective than placebo).

The weighting factor \(\lambda\) reflects the equivalent health value society wishes to place on health resource costs. Under our idealized normative model, \(\lambda\) would equal the cost-effectiveness ratio for the lowest priority health

program adopted under society's budget constraint. More realistically, it could reflect the cost-effectiveness ratio for the health programs that would, in fact, be forgone if the new intervention absorbed a share of available resources. Or, it could reflect society's explicit monetary valuation of health, in terms of willingness to pay, human capital, or other measures.

Finally, the numerator, \(C_{\text{study}}\), represents the cost of the study itself. In order to express both numerator and denominator in comparable units of time (e.g., cost per year, benefits per year), they would both have to be either an annual value or a present value, according to appropriate accounting procedures.

Examples

Beta-Carotene. The first example concerns an ongoing trial of beta-carotene against placebo, in which the hypothesis is that beta-carotene might prevent cancer. Background on this subject and details of the calculations of cost-effectiveness of the trial have been reported (Weinstein, 1983). An expected annual benefit (\(\Delta E\)) of 4,600 years of life saved was calculated as follows:

    0.1        (prior probability of effect)
  × 0.15       (percent reduction in cancer mortality if effect is present and if intervention is universally adopted)
  × 0.64       (probability that study will detect effect, if present)
  × 0.10       (increase in proportion of population consuming more than 15 mg/day beta-carotene if study is positive, compared with no study)
  × 400,000    (cancer deaths per year)
  × 12         (years of life saved per number of cancer deaths averted).

An expected annual cost (\(\Delta C\)) of $66 million was based on the assumptions that an additional 10 percent of the population would consume 15 mg/day at an annual cost of $30 (for a pharmacologic preparation), and that this would occur with a probability of 0.11 (allowing for the risk of a false-positive study result*).

The cost of the study (\(C_{\text{study}}\)) was estimated to be $4 million.
Taking the value of this amount 15 years later, when the first cancer deaths would be avoided, and taking an annual value in perpetuity at 5 percent per annum leads to an annual equivalent of $420,000.

If the cost of treatment is ignored, therefore, the cost-effectiveness ratio for this study would be:

\[ \frac{\$420{,}000/\text{year}}{4{,}600\ \text{years of life/year}} \]

or $91/life-year gained. If the cost of treatment is included and an opportunity cost of $50,000 per year of life is assumed, then the ratio becomes:

\[ \frac{\$420{,}000/\text{year}}{4{,}600\ \text{years of life/year} - (\$66\ \text{million/year})/(\$50{,}000/\text{year of life})} \]

or $128/life-year gained. In either case, this trial appears to be an excellent investment.

This randomized trial was funded, after some controversy, as an add-on to a trial examining the relation between aspirin and myocardial infarction. The study, following more than 20,000 physicians over a 5-year period, is now in progress.

* If the study has a 64 percent chance of detecting a genuine effect and a 5 percent chance of detecting an effect when none is present, then the probability of a positive study is (0.1)(0.64) + (0.9)(0.05) = 0.109.

Mild Hypertension. Calculation of the potential benefits and costs of a mild hypertension trial was made at the time such a trial was being considered by the Veterans Administration and the National Institutes of Health (Laird et al., 1979). First, the cost of the trial was estimated at $135 million, assuming that 28,000 subjects were followed

for 5 years. Next, the size of the population at risk was estimated to be 20 million, of which 10 percent was already being treated. To simplify, consider three possible results of the trial as viewed prospectively at that time: not efficacious, efficacious, and inconclusive. If the finding was that treatment is not efficacious, and if this finding is translated into practice, then 2 million persons per year would not spend an average of $200 on treatment, for a total of $400 million per year. Over 10 years, with discounting at 5 percent per annum, the present value would be $3 billion. Say that a 0.1 probability was assigned to the event that treatment is not effective and that a 0.2 probability was assigned that the study would show conclusively that the effect is either zero or small enough to be considered outweighed by risks and costs. (The latter estimate can be made more rigorous by considering study size, a prior distribution of the efficacy parameters, such as mortality rates, and the probability that each particular finding would result in reduced utilization.) Under these assumptions, the study appeared to have a 0.02 chance of saving $3 billion over 10 years, an expected value of $60 million. Therefore, this contingency would pay back half the cost of the study.

Then the analysis would have to be repeated under the possibility that treatment is efficacious and that the study will demonstrate this. To do this, the health benefits would have to be estimated, as Weinstein and Stason (1976) have done, and the additional treatment costs owing to increased utilization would have to be added. It would also be necessary to consider the false-negative case (treatment is efficacious, but the study says it is not) and the false-positive case (treatment is not efficacious, but the study says it is). These estimates could be substituted into the cost-effectiveness model and the cost-effectiveness ratio of the study could be assessed.
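The arithmetic in the two examples can be retraced with a short sketch. All inputs are the figures quoted in the text; the function and variable names are ours, and discounting the hypertension savings at the end of each year is an assumed convention, so unrounded results differ slightly from the rounded values in the text.

```python
def study_ce_ratio(c_study, delta_e, delta_c=0.0, exchange_rate=None):
    """Cost-effectiveness ratio of a study: C_study / [dE - (dC / lambda)].

    exchange_rate is lambda, in dollars per year of life; it is only needed
    when the added treatment cost delta_c is taken into account.
    """
    cost_offset = 0.0
    if delta_c != 0.0:
        if not exchange_rate:
            raise ValueError("an exchange rate is required when delta_c is nonzero")
        cost_offset = delta_c / exchange_rate
    net_gain = delta_e - cost_offset
    if net_gain <= 0:
        raise ValueError("no net expected health gain; the ratio is not meaningful")
    return c_study / net_gain


# Beta-carotene trial, using the figures quoted in the text.
delta_e = 0.1 * 0.15 * 0.64 * 0.10 * 400_000 * 12   # about 4,608; text rounds to 4,600
p_positive = 0.1 * 0.64 + 0.9 * 0.05                # 0.109, as in the footnote
annual_study_cost = 4_000_000 * 1.05 ** 15 * 0.05   # about $416,000; text rounds to $420,000
ratio_ignoring_treatment = study_ce_ratio(420_000, 4_600)                  # about $91
ratio_with_treatment = study_ce_ratio(420_000, 4_600, 66_000_000, 50_000)  # about $128

# Mild hypertension trial: savings if treatment were shown not to be efficacious.
annual_savings = 2_000_000 * 200                    # $400 million per year
pv_savings = sum(annual_savings / 1.05 ** year for year in range(1, 11))  # roughly $3.1 billion
expected_savings = 0.1 * 0.2 * 3_000_000_000        # $60 million, using the rounded $3 billion

print(round(delta_e), round(ratio_ignoring_treatment), round(ratio_with_treatment))
print(round(pv_savings / 1e9, 2), round(expected_savings / 1e6))
```

Setting the expected savings of $60 million against the estimated trial cost of $135 million gives a payback of about 44 percent, which the text describes as half the cost of the study.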
The epilogue to this fable (although it is by no means over) is that this particular trial never was conducted, but another major trial reported its results in 1979 (Hypertension Detection and Follow-up Program, 1979). It reported a significant and important treatment effect, especially in the mildly hypertensive group. The controversy continues as to whether this community-based study was really measuring the effects of antihypertensive medication or whether other differences between the treatments could have accounted for the difference in mortality.

PROBLEMS IN ASSESSING COST-EFFECTIVENESS OF STUDIES

As the foregoing illustration suggests, the diversity and incomparability of situations forces us to tailor the evaluation to a particular study. Diagnosis, prevention, therapies, palliation, and health care delivery all fall within the scope of the studies that we might try to evaluate. RCTs can be used for any of them, or they may be a component of evaluation. For example, in considering the dissemination of a new, expensive technology, an RCT may be required to help measure the effectiveness of treatment as one component of an evaluation. Another component might be related to utilization patterns, and yet another might be related to costs. We will probably tend to focus on the RCT as a method of providing information on efficacy and take information on other aspects of cost-effectiveness as given. However, we may also want to consider how to assess the value of information on costs or on patterns of use of medical procedures and facilities. In any event, the following tasks lie before us in virtually any attempt to evaluate a study positively.

How Decisions Will Be Made with the Experiment

We do not know very much about how decisions are actually made. We need a systematic set of historical studies that tells us the situation before, during, and after the evaluations.
(The term evaluations is used because often more than one is available.) From these, it might be possible to identify the factors that tend to predict the impact of evaluations on practice. For example, how does the effect of an RCT on practice depend on the existence of an inventory of prior observational studies? Does it matter whether the RCT contradicts or confirms the previous studies?

Does the second, or third, RCT seem to make more of a difference than the first? Perhaps, as Cochrane (1972) suggested, we should systematically plan more than one trial, not just for scientific reasons, but because people will pay attention to the results.

How Decisions Will Be Made from the Literature

Suppose we take the observational study model in which an innovation comes into society, is practiced (or experimented on) for a while, and reports appear. What is the course of events? We can draw upon the literature for theoretical insights, but the empirical data base is thin. We see no way to handle this except to obtain a collection of situations and try to trace them as cases and then to generalize some models. For example, by systematically reviewing a surgical journal through the years, Barnes (1977) has found examples of surgical innovations that later were discarded.

Measures of Efficacy

Acute and chronic diseases tend to give us different measures of outcome. In acute diseases we usually focus on proportion surviving, proportion cured, or degree of cure, rather than length of survival. Morbidity, measured perhaps by days in the hospital, gives another measure of efficacy. Ideally costs, risks, and benefits from the new treatment would be compared with those from the standard treatment.

In chronic disease, we may be especially concerned with length of survival and with quality of life. Although it is generally agreed that quality of life is important, indeed it is often the dominant issue, its measurement, evaluation, and integration into cost-benefit studies currently must still be regarded as experimental (Weinstein, 1979).

Heterogeneous Populations

Information on homogeneity of response to treatment across patients and providers tells about the uncertainty of improvements a therapy offers.
If community hospitals get different results from teaching hospitals, or if various ethnic, age, or sex groups produce differing responses, then efficacy becomes difficult to measure. In these circumstances, there is difficulty in nailing down the amount of gain attributable to new information.

A trial may be valuable in describing who can benefit from a procedure and who cannot. Such information could save lots of money, even if most procedures are beneficial for some people. But learning how to describe the subpopulations that can benefit may not be easy, especially if there is not a good predictive model when patients are allocated to treatments and it is decided how to stratify the study.

Assessing the Information Content of Studies

The precision of outcome achievable by various designs depends on their size, on their stratification, and on the uniformity of individual and group responses. In addition, the measurement technique sets some bounds on precision because simple yes-no responses may not be as sensitive as those achieved by relevant measured variables. When the outcome variables measured are not the relevant ones, but are proxies for them, both precision and validity are lost. The RCT, however, is likely to give values for a rather narrow setting and would need to be buttressed by further information from before and after the investigation.

Nonexperimental designs run the gamut from anecdotes or case studies of single individuals through observational studies. Current behavior toward such studies is of great variety, ranging from ignoring them, to regarding them as stimuli for designing studies with better controls, to regarding them as being so true as to override contradictory results from better-controlled studies.
The reasons given for these differing attitudes include the fact that physicians like the medical theory, that institutions like the implied reimbursement policy, that no one has a better therapy, that patients need something, that a new generation of physicians is required to understand the new biological theory, and that patients will not comply, but those do not help in developing a normative basis for judging the information content of the data from the studies.

Predicting the Demand for Procedures

By assessing numbers of patients with the disease and the rates with which the disease occurs and progresses, we can get an idea of the importance of a procedure and its value. The value of an innovation also depends on how soon another innovation that is at least as good comes along and is adopted.

Thinking about the course of diffusion over time raises another important question: At what point in time should a trial be conducted? We do not want to wait too long, because then the procedure will be established and practice will be hard to change. But we also do not want to do the trial too soon, because (1) the technology may not be technically mature and may yet improve over time (in which case nobody will pay attention to the trial if it shows no benefit), and (2) the innovation may turn out to be an obvious loser and sink into obscurity under its own weight.

Assessing Priors

Gilbert et al. (1977) took a small step in this direction by reviewing randomized clinical trials in surgery over a 10-year period. They estimated the distribution of the size of the improvements (or losses). They separated the experiments into two classes: those innovations intended to improve primary outcomes from surgery and anesthesia and those intended to prevent or reduce complications following surgery and anesthesia. They found the average gain across studies to be about 0 percent improvement, and the standard deviation for the gain in the primaries to be about 8 percent and for the secondaries to be about 21 percent. Such empirical studies help assess the prior probabilities of the size of an improvement by an innovation.

Costs and Risks of Studies

If we already have an experimental design, we are likely to be able to evaluate its direct
Although there can be quarrels about whether the cost of treatment, for example, should be allocated to the cost of the investi- gation, there should not be much difficulty evaluating the price of a given trial. On the other hand, in certain cancer trials, the incre- mental cost may be small because the fixed cost of a multicenter study group has already been paid. It is understood that incremental cost is the appropriate measure. The question of risks is a thorny one that arises when human subjects are given a treat- ment that is less effective than the alternative (Weinstein, 1974~. For treatments that may be applied to reasonably large numbers of pa- tients, this should be considered a minor risk compared with the long-term value of know- ing which is the better treatment. However, if horizons are short, these problems may be more important. Other Benefits of Trials One of the great values of combining well- founded facts with good theory resides in the bounds that can be set. Thus a study that gives solid information about death rates, re- covery times, and rates of complications for a variety of treatment groups is likely to pro- vide extra values that go beyond its own problem. The National Halothane Study, for example, not only studied the safety of anes- thetics, generally, but also provided data used to design other studies, stimulated the further Study of Institutional Differences (in operative death rates), and acted as a proving ground for a variety of statistical methods and encouraged their further development. How shall such information be evaluated? Another benefit of clinical trials is that they may reinforce a general professional awareness of the value of scientific evidence of efficacy. CONCLUSION This paper outlines a general program of research. Until such a program can be car- ried out, three observations seem likely to stand up to further scrutiny:

1. In planning a controlled trial, it would be valuable for people expert in effectiveness, costs, and other data in that health area to perform at least a rough, back-of-the-envelope calculation of potential benefits and cost savings. This sort of analysis cannot hurt unless we make major omissions or misallocations of costs, and even if we do not yet know how to implement a full-blown planning model of the type we have outlined, it may help.

2. Evidence of efficacy from controlled trials will not solve the health care cost problem and will not eliminate uncertainty from medical decisions. Value judgments related to outcomes with multiple attributes (including quality of life) will remain, as will uncertainties at the level of the individual patient. Moreover, the problems of what to do about procedures that offer diminishing (but positive) benefits at increasing costs will always be with us.

3. Clinical trials can help, however; and we need to learn what their value is and how to increase it. As a nation, we may try various institutional changes to encourage the use of information from trials by practitioners, perhaps by linking reimbursement to demonstrated efficacy, but more likely by providing incentives to be both efficacy-conscious and cost-conscious.

An earlier, longer version of this paper (Mosteller and Weinstein, 1985) was prepared for the National Bureau of Economic Research's (NBER) Conference on Social Experimentation, supported by the Alfred P. Sloan Foundation and held on Hilton Head Island, South Carolina, March 5-7, 1981. The authors are solely responsible for any opinions expressed in this paper.

The authors wish to thank John Bailar, Leon Eisenberg, Rashi Fein, Howard Frazier, Alexander Leaf, and Marc Roberts for their comments and suggestions. We are especially indebted to David Freedman, Jay Kadane, and other participants in the NBER conference for their thoughtful criticism.
This research was supported in part by a grant from the Robert Wood Johnson Foundation to the Center for the Analysis of Health Practices and by National Science Foundation Grant SES-75-15702.

REFERENCES

Barnes, B. A. 1977. Discarded operations: surgical innovations by trial and error. In J. P. Bunker, B. A. Barnes, and F. Mosteller, eds., Costs, Risks, and Benefits of Surgery. New York: Oxford University Press.

Bunker, J. P., B. A. Barnes, and F. Mosteller. 1977. Costs, Risks, and Benefits of Surgery. New York: Oxford University Press.

Cochrane, A. L. 1972. Effectiveness and Efficiency: Random Reflections on Health Services. London: Nuffield Provincial Hospitals Trust.

Fuchs, V. R. 1974. Who Shall Live? New York: Basic Books.

Gilbert, J. P., B. McPeek, and F. Mosteller. 1977. Progress in surgery and anesthesia: benefits and risks of innovative therapy. In J. P. Bunker, B. A. Barnes, and F. Mosteller, eds., Costs, Risks, and Benefits of Surgery. New York: Oxford University Press.

Hiatt, H. H. 1975. Protecting the medical commons: Who is responsible? New England Journal of Medicine 293:235-241.

Hypertension Detection and Follow-up Program Cooperative Group. 1979. Five-year findings of the Hypertension Detection and Follow-up Program. Journal of the American Medical Association 242:2562-2755.

Laird, N. M., M. C. Weinstein, and W. B. Stason. 1979. Sample-size estimation: a sensitivity analysis in the context of a clinical trial for treatment of mild hypertension. American Journal of Epidemiology 109:408-419.

Mosteller, F., and M. C. Weinstein. 1985. Toward evaluating the cost-effectiveness of medical and social experiments. In J. A. Hausman and D. A. Wise, eds., Social Experimentation. The National Bureau of Economic Research. Chicago: University of Chicago Press.

U.S. Congress, Office of Technology Assessment. 1980. The Implications of Cost-Effectiveness Analysis of Medical Technology. Washington, D.C.: U.S. Government Printing Office.

Warner, K. E., and B. R. Luce. 1982. Cost-Benefit and Cost-Effectiveness Analysis in Health Care: Principles, Practice, and Potential. Ann Arbor, Michigan: Health Administration Press.

Weinstein, M. C. 1974. Allocation of subjects in medical experiments. New England Journal of Medicine 291:1278-1285.

Weinstein, M. C. 1979. Economic evaluation of medical procedures and technologies: progress, problems and prospects. In U.S. National Center for Health Services Research, Medical Technology, DHEW Pub. No. (PHS) 79-3254. Washington, D.C.: U.S. Government Printing Office.

Weinstein, M. C. 1983. Cost-effective priorities for cancer prevention. Science 221:17-23.

Weinstein, M. C., and W. B. Stason. 1976. Hypertension: A Policy Perspective. Cambridge, Mass.: Harvard University Press.

Weinstein, M. C., and W. B. Stason. 1977. Foundations of cost-effectiveness analysis for health and medical practices. New England Journal of Medicine 296:716-721.

Technology Assessment in Prepaid Group Practice

Morris F. Collen*

* Director, Technology Assessment, Department of Medical Methods Research, Kaiser-Permanente Medical Care Program, Oakland, California.

Technology assessment (TA) as applied herein is the process of evaluating the extent to which a medical technology achieves its intended objectives and examining important unintended and indirect consequences from the use of the technology. The TA process, in general, follows that recommended by the Office of Technology Assessment (OTA), and has been applied in prepaid group practice (PGP) to a variety of medical technologies, including procedures, equipment, or systems that are used for diagnostic, therapeutic, or coordinating patient care services.

A prepaid group practice (often called a health maintenance organization or HMO) is an organization of health care providers who by contract provide specified comprehensive services to a defined, voluntarily enrolled membership, financed primarily through periodic per capita payments of members' dues or premiums.

INCENTIVES FOR TECHNOLOGY ASSESSMENT

TA is a useful management tool for a prepaid group practice whose expenditures are largely limited by prospective annual budgets and whose revenues are primarily generated by periodic payments of fixed premiums from its defined membership.2 Within the constraints of a fixed annual budget, the group is motivated to practice a level of quality care which satisfies its patients and retains its members. The PGP administrator strives to improve managerial efficiency by modifying care processes to decrease costs yet provide adequate services to satisfy and retain members, for example by increasing the efficiency of scheduling systems, training lower-cost personnel for technical procedures, and applying systems engineering to appropriate care processes (such as multiphasic health testing to save physician time).
PGP physicians are increasingly being encouraged on a micro level to improve their clinical efficiency by using clinical analysis to arrive at the best diagnosis and treatment at the lowest cost.

Although a physician is traditionally trained to provide clinically effective care at a cost acceptable to the patient, prepayment reverses the traditional financial incentives of fee-for-service practice and encourages patients to seek well care in addition to sick care.3 Under a fee-for-service or cost-reimbursement financial arrangement, a medical care provider's income is directly dependent upon revenues generated from the services provided to patients. In the PGP, the program profits primarily from its well members, and there is a direct financial incentive to provide to the sick appropriate, effective care at the lowest cost.

Within the PGP financing structure, an increase in the use of a technology often increases expenses and does not generate revenues as it might in a fee-for-service or cost-reimbursement program. Nor, in a nonprofit program, are there any tax savings from the purchases of equipment that otherwise could increase cash flow. Accordingly, there exist in a PGP significant incentives to acquire and employ only those technologies that maintain or increase the effectiveness of medical care yet contain or decrease costs.

In the past, the competition to HMOs has been from fee-for-service practitioners. To survive under the newly increasing competition from other health maintenance organizations, a PGP's physicians and management must prudently select cost-effective technology to sustain an appropriate balance between quality of medical care services to its patients and costs (premiums) to its members.

The peer pressures on physician specialists to acquire and use the same innovations in technology employed by others in their specialty must be balanced in a PGP by the financial constraints of the fixed budget. This requires prudent allocation of limited resources among competing alternative technologies and specialties, which provides strong incentives to obtain the most cost-effective technology. Of course, this incentive is present in any hospital with a fixed annual budget, but the PGP cannot balance an overspent equipment budget by utilizing the new equipment to generate more revenues.

SELECTION OF TECHNOLOGY FOR ASSESSMENT

TA is an expensive process, so it is usually done only when use of the technology requires sufficient resources to justify the cost of the assessment. An existing technology may be selected for assessment because (1) physicians request a change in the established usage and there are insufficient data available to administration as to its comparative cost-effectiveness; (2) the aggregate expenditures from the use of the technology have reached a level that calls for a reassessment as to whether its utilization is appropriate; or (3) there exist alternative, competitive technologies and uncertainty as to which is the most cost-effective. For a new technology, a decision is required as to whether the technology is still investigational, or whether it should be provided to PGP members as a benefit.

A TA usually is funded by the PGP, but for large, expensive TAs some grant support may be solicited from governmental sources or nonprofit foundations. When the TA is entirely initiated and funded by a PGP, it is usually prepared for internal use only, and it is rarely published in the literature since it will likely be limited to specific organizational objectives.

If a TA is initiated or supported in whole or in part by an outside grant, it is then prepared for both internal and public distribution. Such a TA will be more comprehensive, and the document enters the public domain through publication in the scientific literature.
When a technology has been selected for assessment, it is necessary first to define the technology precisely and determine its objectives for its specific applications, then to identify alternative technologies also used for the same applications.

A two-dimensional categorization of technology (complexity and clinical application) is often used since the process of assessment differs for a single procedure, an expensive equipment item, or a complex system; furthermore, the assessment varies for diagnostic, therapeutic, and coordinating (such as computer support) technology.

Specific medical conditions are defined for which the diagnostic or therapeutic technology is used, and the prevalence and incidence of these conditions are estimated in the population being served. It is a great advantage to a PGP that its defined population provides a denominator, so the rates of utilization of the technology for these conditions can be appraised to provide projections of workloads for the technology and requirements for the associated technical personnel.

TA METHODOLOGY

An evaluation of the extent to which the technology achieves its specified objectives involves an analysis of its effectiveness and cost and of the comparative cost-effectiveness for alternative technologies to achieve these same objectives. Appropriate evaluative data are used, when available, from studies conducted within the medical care program. Otherwise data are sought from the medical literature. For unavailable or controversial data, appropriately selected experts within the PGP are used for consensus development to provide substitute data, e.g., as was done to obtain probabilities of patient outcome after alpha-fetoprotein screening.
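The workload projection that a defined membership denominator makes possible can be illustrated with a short calculation. The following is a minimal sketch, not the procedure used by any program described in this paper. It assumes that annual procedure volume is membership times the case rate for each condition times the average number of procedures per case, and that staffing scales linearly with volume. The names `Condition`, `projected_procedures`, and `technicians_required`, and every number shown, are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Condition:
    """A medical condition for which the technology is used (illustrative only)."""
    name: str
    annual_cases_per_1000: float   # hypothetical case rate per 1,000 members per year
    procedures_per_case: float     # hypothetical average uses of the technology per case


def projected_procedures(members: int, conditions: Sequence[Condition]) -> float:
    """Expected annual procedure volume for a defined membership."""
    return sum(
        members * c.annual_cases_per_1000 / 1000.0 * c.procedures_per_case
        for c in conditions
    )


def technicians_required(procedures: float, procedures_per_technician_year: float) -> float:
    """Full-time-equivalent technical staff implied by the projected volume."""
    if procedures_per_technician_year <= 0:
        raise ValueError("procedures_per_technician_year must be positive")
    return procedures / procedures_per_technician_year


# Hypothetical inputs, chosen only to show the arithmetic.
example_conditions = [
    Condition("condition A", annual_cases_per_1000=2.0, procedures_per_case=1.5),
    Condition("condition B", annual_cases_per_1000=0.5, procedures_per_case=4.0),
]
example_volume = projected_procedures(100_000, example_conditions)
example_staff = technicians_required(example_volume, procedures_per_technician_year=250.0)
```

In practice the rates would come from the program's own utilization data rather than assumed values, and a fuller model would also account for age and sex mix within the membership.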
Effectiveness analysis of the technology is conducted in accordance with its category; i.e., for a diagnostic technology, its sensitivity and specificity in achieving its intended diagnosis; for a treatment technology, its effects on morbidity, disability, and/or mortality; and for a coordinating technology, its success in achieving its intended effects on efficiency or productivity. Analysis of effectiveness includes evaluation of safety and of known adverse effects. Effectiveness is then compared

to that of alternative technologies used to achieve the same objectives.

Economic analysis is conducted to determine unit costs (or charges) and total costs (including both direct and indirect costs) for the specified uses of the technology. Economic analysis sometimes includes the impact of different payment modes for medical services, such as prepayment versus Medicare cost-reimbursement, purchase versus lease financing for capitalized equipment, etc.

Cost-effectiveness analysis provides comparative costs of employing alternative technologies to achieve similar, specified levels of desired effectiveness. The advantages and disadvantages of an alternative technology are specified (e.g., criteria or clinical indications for patient selection in the treatment of end-stage renal disease). Other benefits from the use of each technology are considered, although a formal cost-benefit analysis is rarely conducted due to methodological problems associated with subjective estimates of benefits, such as the value of added years of life, quality of life measures, etc.

A comprehensive TA requires an appraisal of important unintended consequences from using a medical technology, including any significant legal, ethical, organizational, social, or environmental effects.

Legal and legislative aspects are considered in almost every TA, such as licensing requirements for biofeedback therapists, legislative regulations for Medicare that affect selection of center versus home hemodialysis treatments, etc. Medical liabilities have been a worrisome consideration from the consequences of false-positive and false-negative tests, as in the case of screening pregnant women for fetal neural tube defects by the alpha-fetoprotein test.

Ethical considerations are important in establishing criteria for selecting patients most suitable for a technology.
This was of great concern for patients with end-stage renal disease before Medicare financed every patient with this condition.

Unintended consequences for a PGP can result, for example, from the increasing interest of surrounding community and health plan members in more self-care and holistic practices, including biofeedback. Environmental effects may be a serious consideration, perhaps for the effect of the location of shared-facilities technology on the accessibility to care, as, for example, the consideration of centralization versus decentralization of hemodialysis centers on transportation requirements for patients.

Most cost-effectiveness analyses are tested as to their sensitivity to potentially important variations in the data used. For example, differences in the characteristics of patients' ages and in the causes of end-stage renal disease greatly affect the optimal mix of alternative technologies required by a PGP.

EXAMPLES OF TA IN PGPs

Some PGPs were solicited for statements as to their experience with TA, and the following responses were obtained.

TA in Northern California by KPMCP

The Kaiser-Permanente Medical Care Program (KPMCP) in Northern California established in 1961 its Department of Medical Methods Research (MMR). Its purpose is to conduct research directed toward utilizing modern technology for the development of improved methods of providing and delivering medical care within the KPMCP. MMR is professionally administered by its director under the Permanente Medical Group. All MMR grants and contracts are the financial responsibility of the Kaiser Foundation Research Institute, a nonprofit, tax-exempt corporation.

From 1968 to 1973, the federal government's National Center for Health Services Research and Development (NCHSR&D) awarded to MMR a Health Services Research Center grant with a technology focus.
Its primary project was to develop a computerized pilot medical data system in the Kaiser-Permanente San Francisco hospital and to establish a medical data base for both patient care and health services research.

In 1979, the Division of Technology Assessment was established within MMR. The primary purpose of its technology assessments is to aid in the selection of the most cost-effective technology. This division has employed technology assessment for procedures, equipment, and systems used in diagnostic, therapeutic, and coordinative patient care services. The TA process, in general, first identifies for assessment an appropriate technology that uses substantial resources. The TA then determines the characteristics of the population utilizing the technology and determines the workloads for its utilization. Alternative technologies used for the same specified objectives are evaluated as to important intended and unintended consequences, with consideration of alternative mixes of competing technologies per million people. The TA has used epidemiological methods, controlled studies, medical record studies, literature reviews, consensus development, and sensitivity analysis. The intent is not to arrive at a single decision or recommendation, but to present the important consequences of appropriate alternative technologies so that management can make a more rational decision. Thus the TA attempts to decrease the uncertainty of decision making.

Technologies suitable for assessment have been identified at all organizational levels. The executive director requested a TA on biofeedback to assist in a policy decision as to whether it should be included as a prepaid benefit. Service chiefs had already requested a TA of alternative technologies for the treatment of end-stage renal disease, when administration also requested this after it was learned that Medicare, beginning January 1982, would decrease the reimbursement of costs. Pediatric geneticists requested a TA on alpha-fetoprotein screening because of impending legislation requiring it for pregnant women as a screening test for fetal neural tube defects.
Technology procedures suitable for TA also have been identified by monitoring organizational gross expenditures; e.g., two-view chest x rays were the most frequently ordered diagnostic radiology procedure, so a TA was conducted to assess the consequences of alternative criteria for utilization of this technology.

Periodic surveys of chiefs of professional services have been conducted in order to identify as early as possible future substantial increases in capital-intensive equipment needs, e.g., as has just occurred for diagnostic imaging equipment.

Some TAs conducted by KPMCP's Division of Technology Assessment are summarized below:

Biofeedback

A new treatment modality was advocated by physicians for chronically recurring headaches and other conditions. A TA was completed using data from three Kaiser-Permanente medical centers and from the literature. The consequences were assessed as they would relate to three alternative organizational decisions for providing biofeedback as a health plan benefit, namely, full biofeedback benefits, partial benefits for treatment of chronic headaches only, and no biofeedback benefits. Included was a sensitivity analysis of effects from a variety of biofeedback treatment schedules.

Utilization of Diagnostic X Rays

Skull, chest, and upper gastrointestinal tract diagnostic x-ray procedures are leading radiological expenditures for ambulatory care services. A TA was completed analyzing clinical indications (i.e., referral criteria) for ordering these x-ray examinations and the effects of the radiologists' reports on the diagnosis, treatment, and outcome of patients.4 This study was supported in part by a grant from the Food and Drug Administration's (FDA) Bureau of Radiological Health.
Serum Alpha-Fetoprotein

Increasing legislative interest in this procedure could require KPMCP to provide serum alpha-fetoprotein screening tests to 30,000 pregnant women each year, followed by a series of costly technical procedures (including ultrasonography and amniocentesis) when positive. A TA compares the consequences of screening versus no screening of this subpopulation.

End-Stage Renal Disease

The increasing cost of institution-based or center hemodialysis, the decreasing cost-reimbursement from Medicare, and the alternative technologies now available for the treatment of end-stage renal disease (ESRD) patients make this an

ideal TA for identifying the most cost-effective mix of alternative technologies, per one million KPMCP members, for the treatment of ESRD patients over a projected 10-year period.

Multiphasic Health Testing

By comparing the use of a systematized process for providing periodic health examinations with the traditional health checkup mode and by conducting a long-term controlled study of effectiveness, it was shown that a more comprehensive battery of tests is provided at a lower cost and with less physician time. The multiphasic approach was more effective in decreasing mortality from potentially postponable conditions, and the cost of care for 12 months after the checkup was less for those receiving the multiphasic checkup as compared to the traditional mode for similar groups of patients standardized by age, sex, and health status.5 The study was supported by a grant from the National Center for Health Services Research (NCHSR).

Team Primary Care

An alternative system for providing primary care employs a team of physicians, nurse practitioners, a health educator, and a mental health counselor, and includes a multiphasic type of health checkup in the initial entry visit. A TA is being conducted that compares cohort members randomly assigned upon joining the KPMCP to either the new team or to traditional primary care services. This is a follow-up study of a new approach to ambulatory care involving the entry of patients through a paramedically staffed health evaluation service.6,7 This study is supported in part by a grant from the H. J. Kaiser Family Foundation.

Hospital Computer System

A prototype hospital computer system was installed in one medical center and a patient computer medical record system was compared to a traditional hospital and outpatient record system.8 This study was supported in part by a grant from the NCHSR.
TA in Oregon by KPMCP

The Kaiser-Permanente Medical Care Program (KPMCP) in Oregon has a Health Services Research Center, the primary aim of which has been to design, develop, and direct research and demonstration projects to add to knowledge about health and medical care.9 The following briefly summarizes some of the projects that can be categorized as TA studies.

Do-Not-Admit Surgery Study

This study compared the costs, quality of care, and satisfaction with surgical services performed in hospital operating rooms on nonadmitted patients with similar services performed on hospital inpatients.10

Contraceptive Studies

The purposes of contraceptive studies are to understand factors that determine the acceptability of various contraceptive methods (including contraceptive sterilization and new hormonal contraceptive methods for men), and to evaluate the long-term medical and psychosocial sequelae of various procedures (such as vasectomy).11

Alcohol Treatment Demonstration Project

The purpose of this project was to implement and evaluate a new program for alcohol services. The evaluation included an analysis of the effect of a copayment on the use of alcohol treatment services, on patient functioning, and on nonalcohol-related utilization.12

Television for Community Health

The overall objective is to develop and test a unique behavior change program directed at obesity and obesity-related behaviors. This study will use broadcast television and two levels of professional support: (1) regular mailings to participants and (2) regular mailings together with some direct assistance in the establishment of local mutual support groups and in the training of group leaders. In addition to assessing efficacy, the secondary objective is to estimate the costs of implementing a similar program in the context of

any HMO and to produce the operations manual, educational materials, and support documents required to allow the ready implementation of such a program in other HMOs or other population organizations.

In addition, clinical drug intervention trials have been conducted, including participation in the Multiple Risk Factor Intervention Trial (MRFIT) and the Beta-Blocker Heart Attack Trial (BHAT).

TA in Southern California by KPMCP

The Kaiser-Permanente Medical Care Program (KPMCP) in Southern California has taken a pragmatic orientation to TA.13 Its efforts in this regard are to review new technologies with a major emphasis on whether they are ready for introduction and whether they work as reported. These efforts are an integral part of management decision making and operations. Specific areas in which these studies have been conducted include computed tomography (CT) scanners, nuclear magnetic resonance imaging, nuclear emission computerized tomography, intensive care unit monitoring systems, computerized arrhythmia detection systems, and other mini- and microcomputer medical applications.

A second emphasis for technology evaluation has concerned operational effectiveness, productivity, and cost-effectiveness. Questions studied have included the following: Does the southern region have sufficient volume to effectively provide a new technology or service, or should the technology or service be centralized in one or a few medical centers or dispersed widely throughout the region? And what is the most effective way to staff and otherwise operate with the new technology? Other studies in this area have included open heart surgery, bone marrow transplantation, and computerized systems such as outpatient pharmacy, appointment making, x-ray file management, electrocardiogram (EKG) interpretation, and admission/discharge/transfer.
A third level of effort incorporates the other efforts, as appropriate, to develop a southern region plan for implementing a technology or service for the benefit of our members. Questions concern the demand for the service, quality, cost, and other considerations. Areas that have been addressed include open heart surgery, CT scanning, radiation therapy, neurosurgery, hemodialysis, skilled nursing facilities, acute rehabilitation services, neonatal intensive care units, and chemical dependency rehabilitation.

In summary, the KPMCP southern region's efforts include not only the issues of hard technology that are usually considered as technology assessment, but also the assessment of human and management factors that have an impact on the ultimate value of the use of technology. Efforts are directed toward answering the following question: What is the most cost-effective approach to using existing and new technology for the benefit of our members?

TA in the Harvard Community Health Plan

In 1977, the Board of Trustees of the Harvard Community Health Plan (HCHP) established a fund for research at HCHP, to which would be donated 0.5 percent of the gross premium income of the plan each year.14 A research department was established to implement this program, and contributions to research activities from HCHP are channeled through an HCHP foundation, which has its own board of trustees. The current plan contribution to research amounts to some $400,000 per annum and grows with the size of the plan's overall budget.

Research in the effectiveness and the cost-effectiveness of clinical practices at HCHP is represented in a number of projects, some of which are conducted wholly internally, some of which are conducted by outside investigators with HCHP as a passive site, and some of which represent truly collaborative investigations.
What follows are brief summary descriptions of the major projects in this area currently under way at HCHP.

Laboratory Test Use

Funded partially by the National Fund for Medical Education, this is an 18-month study of the utilization of 15 common laboratory tests in ambulatory centers. The study is directed at understanding variations in behavior within internal

medicine practice groups. Interventions, including educational interventions and computer-based feedback, are being tested for their impact on interprovider variability. This study makes considerable use of an automated medical record system.

Studies of Mitral Valve Prolapse

Patterns of use of diagnoses and management techniques for mitral valve prolapse are studied in an ambulatory population. The study will, among other things, attempt to determine variations in practice among providers and the impact of diagnostic and therapeutic interventions on the total well-being of the patient, including self-image, recreational behaviors, and medical management.

Nondiagnostic Uses of Ultrasound in Pregnancy

This is an investigation of the ways in which ultrasonography in pregnancy affects the management of patients and alters their well-being in ways other than classic medical outcomes. The study involves interviews of women undergoing ultrasonography, as well as chart reviews and physician interviews about patterns of management.

Changes in Physician Test Ordering

A study in cooperation with physicians at HCHP who have managed hypertension and other chronic conditions records interprovider variations in practice, and will compare HCHP-based practices to practices in settings with other forms of organization and financial incentives.

Efficacy Studies of Diagnostic X Rays

This study is gathering prospective information on the use of intravenous pyelograms and upper gastrointestinal x-ray series. Clinical predictors of the outcomes of these tests are being assessed.

Cost-Effectiveness of Periodontal Therapy

A randomized study is being conducted in HCHP's Dental Department of different ways of managing periodontal disease, comparing surgical and medical interventions.
Analysis of Hospitalization Costs

Following a large increase in the cost of inpatient hospitalizations over the past 2 years, this study is a detailed review of hospital bills to determine areas of inflation of hospital costs. The study will attempt to understand what types of conditions and types of diagnostic and therapeutic procedures consume resources in hospitalized patients. A detailed substudy is being conducted with respect to costs of the neonatal intensive care unit.

Valued Outcomes in Chest X Rays

This is an investigation of the reasons for use of chest x rays in the Kenmore Center of HCHP. Patients and physicians are being interviewed to determine what information each values, and what forms of information in general motivate the decision to order the test.

Cost-Effectiveness of Lead Screening

A cost-effectiveness analysis of lead screening was begun at the Harvard School of Public Health and has been continued at HCHP.

Analysis of Diagnostic Skill

This study uses a large data base consisting of physicians' estimates of the chances of positivity in their ordering of throat cultures and chest x rays in children. Receiver operating characteristic analysis is being employed to characterize the diagnostic skill of different physicians at different levels of training.

In addition to the above studies, work is under way in cooperation with Johns Hopkins University and with the Harvard School of Public Health to characterize the ways in which different clinical conditions at HCHP consume health care resources. Studies have also been conducted on clinical trials, such as prophylactic use of phenobarbital to prevent febrile seizures in children.

TA in Group Health Cooperative of Puget Sound

Group Health Cooperative of Puget Sound (GHC) is a member-owned health care cooperative founded in 1947 with the purpose of providing health care on a prepaid basis. The governance of Group Health Cooperative is constituted by the triad of board, management, and professional staff. It is through the collaboration of this triad that technology assessments and decisions based on them are made.15

Typically, in the past, technology assessment at GHC was explicit only when a new technology that required a considerable capital outlay was proposed. In these cases a medical staff proponent of the new technology, usually a specialist subgroup, would develop effectiveness and cost-effectiveness data with the assistance of management staff. The resultant proposal was presented to the professional staff executive council, and a recommendation was then sequentially forwarded to the board planning committee, the board fiscal and management committee, and finally to the board itself. Occasional technology assessments and resultant decisions have been made within the professional staff alone when the technology did not require significant capital outlay. For example, after assessment, ileojejunal bypass for obesity was discontinued as a matter of professional staff policy, on the basis of inadequate safety (due to metabolic complications) and unproved effectiveness. Similarly, certain unusual nutritional therapies were not allowed to begin.

Recently, under the stimulus of the need to maintain an attractive and appropriate benefit package and yet hold dues at a competitive level, technology assessment at GHC has become more explicit and systematic. The professional staff has formed and chairs a medical services committee, with representation from management and the board. Through this committee the professional staff fulfills what it sees as its responsibility to assess predominantly new but also old medical technologies and makes appropriate recommendations to the board.

The Medical Services Committee takes as its agenda requests for provision of new services or requests for evaluation of currently provided services.
Such requests may originate with consumers, board, management, or professional staff. Technologies that are rapidly changing in their rate of utilization (at least 30 percent in 2 years or 50 percent in 5 years) are automatically considered by the committee. With staff assistance the proponents of a particular technology prepare a justification of the technology in question, specifically with answers to the following questions.

1. Is the technology medically appropriate, i.e., efficacious for the individual and/or effective for the population, or else an appropriately managed investigational technology? To answer this question the literature is examined regarding efficacy and effectiveness, and alternative technologies are reviewed. Unproved (investigational) technologies may be considered medically appropriate at GHC provided that (a) their utilization is part of a properly designed study, (b) GHC benefits by having participated in the study either directly by early access to results or indirectly, and (c) the cost of the technology is no greater than that of existing practice or any net increased costs are defrayed by funds from outside membership dues.

2. What is the cost of the technology to the health care organization? The cost in personnel and facilities is estimated, and a judgment is made as to whether this represents (a) cost savings through displacement of other technology or through cost-avoidance through improved health outcomes, (b) a break-even reallocation of funds through displacement of other technology, or (c) increased costs requiring new money.

3. What priority should this technology have among technologies competing for coverage under the budget for the benefit package? This is a value judgment based on the estimated cost and benefit to individuals and the population.
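The utilization threshold that triggers automatic consideration by the committee (a change of at least 30 percent in 2 years or 50 percent in 5 years) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name and arguments are hypothetical, and treating increases and decreases alike is an interpretation of "rapidly changing," not a documented GHC procedure.

```python
def needs_automatic_review(rate_now, rate_2y_ago=None, rate_5y_ago=None):
    """Flag a technology whose utilization rate has changed rapidly.

    Thresholds come from the text: at least 30 percent in 2 years
    or at least 50 percent in 5 years. Treating a decrease the same
    as an increase is an assumption made for this sketch.
    """
    def relative_change(old, new):
        # Fractional change relative to the earlier rate.
        return abs(new - old) / old

    # Each comparison is skipped when the earlier rate is unknown or zero.
    if rate_2y_ago and relative_change(rate_2y_ago, rate_now) >= 0.30:
        return True
    if rate_5y_ago and relative_change(rate_5y_ago, rate_now) >= 0.50:
        return True
    return False


# Hypothetical utilization rates (procedures per 1,000 members per year).
print(needs_automatic_review(130, rate_2y_ago=100))                    # True
print(needs_automatic_review(120, rate_2y_ago=100, rate_5y_ago=100))   # False
```

A rule of this kind only places a technology on the committee's agenda; the three questions above still have to be answered by judgment.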
If a technology is considered to have low priority but still to be medically appropriate, a recommendation to allow it to be provided on a fee-for-service basis may be made (with the fee going to the cooperative). Similarly, an investigational technology can receive no priority for coverage, but a recommendation may be made to allow it to be provided through the use of fee-for-service or research funds if it is provided as part of an appropriately designed study.

Examples of assessments made by the committee and the resultant decisions include the following:

Stool Occult-Blood Screening for the Secondary Prevention of Colorectal Cancer. Coverage for this technology was originally proposed by members of the Family Practice Section. Such screening was judged by the committee to be medically appropriate as noninvestigational and effective, to have estimated costs partially recovered from avoided costs of treatment with the remainder to be defrayed by an increase in membership dues, and to merit a high priority for coverage.

Penile Prosthesis for Selected Impotence. Coverage for this technology was proposed by members of the Urology Section. The technology was judged to be medically appropriate as noninvestigational and effective, to have costs which would require new monies, and to merit a low priority for coverage. Because of its efficacy for selected individuals it was recommended that the service be provided on a fee-for-service basis until covered.

Scolitron, an Electrical Muscle Stimulator for the Correction of Scoliosis. Proposed for coverage by orthopedists, this technology was judged to be medically appropriate only as an investigational, probably effective technology. As an investigational technology it could receive no priority for coverage. The scolitron was recommended for utilization only if it could pass the criteria for an investigational technology: that its utilization be part of an appropriate study, that the organization benefit from participation in the study, and that net costs be no greater than those for existing alternative treatments.

Insulin Pump for the Treatment of Diabetes. Proposed for coverage by internists, this technology was judged to be medically appropriate only as an investigational technology of unknown effectiveness. It could receive no priority for coverage and could be used only if it could pass the criteria for an investigational technology.
Mandibular Osteotomy for Malocclusion. This technology is being provided currently and is covered. Review was requested by personnel responsible for benefits and coverage. The committee judged that the technology, although effective, could not be judged medically appropriate because it is a dental rather than medical technology. The committee recommended that it be dropped from coverage, although it could be continued on a fee-for-service basis, for an estimated savings of $200,000 per year in an overall budget of approximately $130,000,000 per year.

Mammography. This technology is currently provided and covered as medically appropriate for cases selected according to specific criteria. Mammography came to the attention of the committee because of its rapidly increasing utilization rate and a resultant request for additional radiologists. The committee determined that it was not known whether the increasing referrals for mammography followed the proper selection criteria, and therefore no judgment could be made as to medical appropriateness of hiring additional radiologists for this purpose. The question of whether the increasing utilization was medically appropriate was referred back to the professional staff committee.

Ultrasound for Diagnostic Assessment of Pelvic and Intrauterine Structures. This technology is currently provided and covered as medically appropriate. It came under review because of a request by the obstetrical and gynecological surgeons for new ultrasound equipment. The committee determined that in the interest of avoiding duplication, the appropriateness of purchasing additional ultrasound equipment should not be considered until it was determined whether the radiologists or obstetrician/gynecologists should more appropriately provide the service.
The recommendations of the Medical Services Committee are passed on to the Management Benefits Committee, which structures benefits recommendations for the Fiscal and Management Committee of the board. It is the task of the Fiscal and Management Committee to integrate recommendations as to benefits together with all other budgetary recommendations for final review and decision by the board.

It is interesting to note that this medical technology assessment activity is carried on at GHC with no direct outside stimulation. (Indeed, it is not referred to as technology assessment.) Rather, it is the result of the internal perception that such assessment is a necessary part of doing business as a prepaid group practice.

NOTES

1. Office of Technology Assessment. 1980. The Implications of Cost-Effectiveness Analysis of Medical Technology. Chapter 3: Methodological Findings and Principles. Washington, D.C.: U.S. Government Printing Office.
2. Office of Technology Assessment. 1980. The Implications of Cost-Effectiveness Analysis of Medical Technology. Chapter 10: Health Maintenance Organizations. Washington, D.C.: U.S. Government Printing Office.
3. Garfield, S. R. 1970. Multiphasic health testing and medical care as a right. N. Engl. J. Med. 283:1087-1089.
4. Collen, M. F. 1983. Utilization of Diagnostic X-ray Examinations. Department of Health and Human Services, 83-82008. Washington, D.C.: U.S. Food and Drug Administration.
5. Collen, M. F. 1979. A Case Study of Multiphasic Health Testing. Appendix C in Medical Technology and the Health Care System. A Study of the Diffusion of Equipment-Embodied Technology. A Report by the Committee on Technology and Health Care, National Academy of Sciences. Also, Multiphasic Health Testing Services. 1978. New York: John Wiley.
6. Garfield, S. R., M. F. Collen, R. Feldman, et al. 1976. Evaluation of an ambulatory medical care delivery system. N. Engl. J. Med. 294:426-431.
7. Collen, M. F., S. R. Garfield, R. H. Richart, et al. 1977. Cost analysis of alternative health examination modes. Arch. Intern. Med. 137:73-79.
8. Collen, M. F. 1974. Hospital Computer Systems. New York: John Wiley.
9. Greenlick, M. R., Director, Health Services Research Center. Personal communication.
10. Marks, S. D., M. R. Greenlick, A. V. Hurtado, J. D. Johnson, and J. Henderson. 1980. Ambulatory surgery in an HMO, a study of costs, quality of care and satisfaction. Med. Care 28:127-146.
11. Weist, W., and members of the WHO Task Force on Psychosocial Factors in Family Planning. 1980. Acceptability of drugs for male fertility regulation: a prospectus and some preliminary data. Contraception 21:121-134.
12. Hayami, D. E., and D. K. Freeborn. 1981. Effect of coverage on use of an HMO alcoholism treatment program, outcome, and medical care utilization. Am. J. Pub. Health 71:1133-1143.
13. Ruml, J., Director, Benefit/Cost Analysis. Personal communication.
14. Berwick, D. M., Research Director. Personal communication.
15. Watkins, R., Staff Physician. Personal communication.

A Randomized Controlled Trial to Evaluate the Effects of an Experimental Prepaid Group Practice on Medical Care Utilization and Cost

Gerald T. Perkoff*

* Curators Professor & Associate Chairman, Department of Family & Community Medicine, and Professor of Medicine, University of Missouri School of Medicine, Columbia, MO 65212.

The Medical Care Group of Washington University (MCG), now the Medical Care Group of St. Louis, Missouri, was begun as a randomized controlled trial to compare prepaid group practice with fee-for-service practice for comparable groups of people to learn more about the effects of the organization of medical care upon utilization and costs. Although MCG was small, it was possible through the enrollment mechanism to develop prospective study and control groups and to obtain sound data on hospital and ambulatory services utilization and costs.

The MCG plan offered comprehensive family care by salaried internists, pediatricians, and obstetricians who provided primary ambulatory, hospital, emergency, and home services, and specialists on the full-time faculty of Washington University School of Medicine who provided consultations, surgery, laboratory, x ray, and other procedures and services. The primary care physicians received no instructions about delivery of care and practiced according to their own best judgment. Thus, the study was confined as much as possible to the effects of prepayment and organizational change upon utilization and costs. The potential favorable effects of a financial incentive were avoided purposely.

All MCG services were prepaid and were financed by an experimental insurance option which was added to the patient's group plan; hospitalization benefits remained unchanged. Control families were covered under a comprehensive major medical plan with the same hospitalization benefits as those of the patients in the study group.

Hospital utilization data were derived directly from bills paid by Metropolitan Life Insurance Company. While gynecologic services and costs were included, obstetrical hospital days and visits were excluded from both study and control data. Obstetrical utilization was thought unlikely to be influenced by the experimental design.

Two systems were necessary for collection of ambulatory services data. An encounter form was used for MCG enrollees. For controls, analysis of insurance deductibles, claims and tax records, plus multiple telephone and questionnaire surveys were used. Information was obtained from 77 percent of the controls in these surveys. It was found that control enrollees were quite knowledgeable about insurance coverage and income tax deductions for medical care and most maintained detailed records for tax purposes. From these records and surveys, control ambulatory services utilization was estimated and compared to study group data. Since the control enrollees paid for services on a fee-for-service basis, a fee equivalent was recorded for prepaid group study patients.

Costs of operation of MCG related to patient care were recorded separately from any research expenditures just as though MCG was a private group practice. Thus, exclusive of any research costs, the sums expended for professional and paraprofessional salaries, rent, supplies, biologicals, telephones, telephone answering services, data management, and administration were added together and considered the cost of operation. Salaries paid to professional and paraprofessional personnel were prorated according to the actual time allotted to patient care in the MCG.
This proration was the figure used in calculating the cost of operation, from which the cost per visit then was calculated for primary care MCG services. MCG paid the specialty departments and divisions of the medical school for specialty and laboratory services on a fee-for-service basis through the experimental period. Therefore, it was possible to determine the actual cost for each of these directly.

All services were expressed as services/100 person years (PY), where person years are:

    PY = Σ (persons enrolled × months enrolled) / 12

RESULTS

Hospital utilization varied by type, age, and sex. For children, surgical admission rates were slightly higher among MCG patients than among controls. MCG children were admitted for nonsurgical conditions less often than were controls, and this difference was statistically significant (p < 0.05). Overall, a 33 percent reduction (p < 0.01) in hospital days utilized by children occurred.

For adults, a similar result was noted in nonsurgical admissions. MCG men were admitted at a rate 55 percent lower than that of control men (p < 0.01); MCG women's admissions were only minimally less. For both men and women considered separately or combined as all adults there was a statistically significant reduction in hospital days used (p < 0.01). Overall, nonsurgical admissions were significantly reduced for all subgroups, as were hospital days used, down 35 percent (p < 0.01). For surgical conditions, however, the converse was true; MCG men were admitted 68.5 percent more than control men (p < 0.01); surgical admissions of MCG women also were higher than those of the controls, but to a lesser extent than those of the men. The net result was little difference in surgical days used by the two groups. The reduced nonsurgical utilization was so striking that hospital days utilized by all groups combined still were statistically significantly lower for MCG compared with controls, down 23 percent (p < 0.01).

MCG patients had higher utilization rates for office visits and consultations, diagnostic x-ray and laboratory services, and preventive services. Adult women used the most office visits, even though the data exclude pre- and postnatal visits and associated laboratory services.
There were other major contributors to increased MCG ambulatory service rates than office visits per se, especially diagnostic x-ray and laboratory services. The total rate for ambulatory services in MCG was severalfold that for controls (p < 0.01).

Certain aspects of the ambulatory services utilization data require special comment. The preventive services and consultations data were likely to be recorded completely for MCG patients by virtue of the specific data collection system set up to capture such information. For the control group, however, immunizations, other preventive health care examinations, and follow-up visits to consultants were not always clearly identifiable on physicians' bills, and therefore could not be reported accurately. This no doubt led to a systematic underreporting of these services in controls. However, even if both initial consultations and follow-up visits to consultants are considered as office visits in both groups, and all preventive services in MCG and in controls are excluded from the final data, the MCG ambulatory care rates still were 426 services/100 PY compared to 265 services/100 PY for controls (p < 0.01).

During the study period, only nine insurance claims were presented to the Metropolitan Life Insurance Company for nonemergency, out-of-plan hospital admissions. These admissions added only 3.1 hospital days/100 PY and represented only 5.8 percent of all hospital days used. Out-of-plan utilization would appear to have been minimal.

MCG actual costs and fee equivalents for diagnostic and therapeutic visits were quite comparable to those that were charged controls, as were actual costs/visit. For preventive visits, the MCG actual cost and MCG fee equivalent again were comparable.
Both, however, were considerably greater than the total preventive fees that controls paid, primarily because many more preventive services were provided to MCG enrollees than to controls.

The cost per visit did not give a complete picture of ambulatory care provided, because MCG enrollees received more visits and services than did controls. When allowance was made for this difference, MCG showed greater cost per person year than did controls for each type of service, except for surgery and hospital charges.
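The person-year convention used for these rates, and the reason a comparable cost per visit can still produce a higher cost per person year, can be illustrated with a short Python sketch. The enrollment counts, the service count, and the $20 cost per service below are hypothetical; only the rates of 426 and 265 services/100 PY are taken from the text, and the division by 12 is the usual conversion of person-months to person-years.

```python
def person_years(enrollment):
    """Sum persons x months enrolled and convert months to years.

    `enrollment` is an iterable of (persons, months_enrolled) pairs.
    """
    return sum(persons * months for persons, months in enrollment) / 12


def rate_per_100_py(services, py):
    """Services expressed per 100 person years."""
    return 100 * services / py


def cost_per_person_year(cost_per_service, services_per_100_py):
    """Annual cost per enrollee implied by a unit cost and a utilization rate."""
    return cost_per_service * services_per_100_py / 100


# Hypothetical enrollment: 200 persons for 12 months, 100 persons for 6 months.
py = person_years([(200, 12), (100, 6)])
print(py)                        # 250.0 person years
print(rate_per_100_py(500, py))  # 200.0 services per 100 PY

# Same hypothetical unit cost applied to the two rates reported in the text.
mcg = cost_per_person_year(20.0, 426)
control = cost_per_person_year(20.0, 265)
print(mcg > control)             # True
```

With an identical unit cost, the group with the higher utilization rate necessarily shows the higher cost per person year, which is the pattern described above for most ambulatory services.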

Visits to specialists made up 28.7 percent of all diagnostic and therapeutic visits. Initial referrals were 11.3 percent of all visits or one referral for each 8.8 primary care visits. Charges for specialty consultations and laboratory and x-ray services were 31 percent of all costs, or almost 53 percent of the MCG costs exclusive of hospital charges. Together with those provided in MCG itself, then, cost of physician, laboratory, and x-ray services made up over 70 percent of all nonhospital costs incurred (Perkoff et al., 1976; Perkoff, 1979).

DISCUSSION

Within the limits of the methodology employed, the data presented appeared to support the major basic premise of the experiment. Hospital utilization by MCG enrollees was less than that of control enrollees for children and adults, and overall. This was almost entirely the result of reduced admissions for nonsurgical causes.

The ambulatory services corollary to reduced hospital use expected on theoretical grounds was observed in every major category. The changes in ambulatory services utilization were highly significant statistically, and were so large as to make it unlikely that the known deficiencies in methods for collecting the control data could account for the result. The major effect on ambulatory services provision was on ancillary services: x-ray, laboratory, and preventive services. Thus, while office visits did increase, there was no evidence of excessive demand for illness care. It should be noted that control health care utilization was low, but there is no evidence that this was an underserved group. Rather, the MCG plan resulted in increased utilization in an otherwise relatively healthy population.

The major difference between expected results and findings was that MCG surgical admissions were increased rather than decreased.
One of the accepted explanations for reduced hospital utilization in a prepaid plan is that elimination of the fee incentive reduces unnecessary surgery. Details of the MCG surgical experience and discussion of possible explanations for the results have been published elsewhere (Perkoff et al., 1975). Suffice it to say here that no differences were found in the types of conditions for which surgery was done in the MCG enrollee group compared with controls, nor were there differences in the proportion of different operations, including those often thought of as elective or unnecessary.

These studies provided sound evidence for the often-stated belief that hospital utilization can be decreased and ambulatory services utilization can be increased by a system which combines prepayment for comprehensive medical care services with an organized medical care delivery system. The assumption always is made that such reductions in hospital utilization will reduce the cost of medical care by the substitution of less expensive ambulatory care for the more expensive hospital care. There are fewer data than assumptions on this important issue, and this study did not resolve the problem. It was not possible to know whether reduced hospital use resulted directly from increased ambulatory care or whether both were results of a change in care patterns for the same people. In any case, cost savings were spotty and much less than expected.

The study was designed to see whether the organization of medical care into a prepaid group practice could affect utilization and cost of care without any effort being made to control the practice of either the primary care or consultant physicians. Thus, only prepayment and organization of care varied among the study and control groups. Other factors that might have led to cost savings and that are characteristic of some prepaid group practices (i.e., limitations on the number of hospital beds available, extensive management systems, incentive plans, use of salaried specialty personnel, ownership of hospitals) were not part of the study. By measuring medical care utilization in both study and control groups; by accounting for expenditures in the study group on an actual cost and fee equivalent basis; and by paying full fees to the appropriate departments for all consultant, x-ray, and laboratory services provided by salaried physicians, it was hoped that the effects on utilization and cost of only two factors, prepayment and organization of care, could be isolated. Both commonly are considered to be remedies applicable to the present medical care system to yield cost savings. In addition, it was hoped that other potential areas for influencing medical care and medical care costs would be recognized. Within limits, these goals were reached.

In addition to examining the basic premise, the study suggested several other ways costs might be contained in prepaid group practice, or indeed, in the traditional system. Better training of primary care physicians in certain specialties surely could reduce referrals to several major specialties. Utilization of less expensive allied health personnel for various high-use and more routine services such as preventive care could lead to lower costs. In the area of prevention, more needs to be done to identify those procedures which truly are of value, with elimination of other less useful tests and examinations.

More important, however, are specific fiscal measures to reduce the cost of all needed services. Since physicians' services accounted for the majority of MCG nonhospital costs, any plan for controlling medical care costs will have to deal with this area of cost generation.

SUMMARY

Organization of medical care into an effective group with prepayment did lead to reduced hospital and increased ambulatory services utilization, but in and of itself did not lead to reduced medical care costs. Several other aspects of prepaid group practice were identified and may lead to cost control, especially reduced cost of physicians' services. Although special characteristics of MCG differed from those of larger prepaid group practices, the data did suggest that high expectations of cost savings from this method of medical care delivery may need to be modified as this concept is applied more broadly in varied private and public medical care settings.

REFERENCES

Perkoff, G. T. 1979. Changing Health Care: Perspectives from a New Medical Care Setting. Ann Arbor, Michigan: Health Administration Press, University of Michigan.
Perkoff, G. T., L. I. Kahn, W. Ballinger, and J. K. Turner. 1975. Lack of effect of an experimental prepaid group practice on utilization of surgical care. Surgery 77:619-623.
Perkoff, G. T., L. I. Kahn, and P. Haas. 1976. The effects of an experimental prepaid group practice on medical care utilization and cost. Med. Care 14:432-449.

The Metro Firm Trials: An Innovative Approach to Ongoing Randomized Clinical Trials

David I. Cohen*
Duncan Neuhauser†

* Department of Medicine, Cleveland Metropolitan General Hospital, Case Western Reserve University.
† Department of Epidemiology and Community Health, Case Western Reserve University.

Randomized clinical trials are said to be expensive, hard to organize, and fraught with ethical difficulties. The demand for more such trials persists, however, because of the need to definitively evaluate the large number of medical interventions of debatable or undemonstrated efficacy. If the difficulties in conducting these trials could be substantially reduced, it would greatly facilitate a more widespread evaluation of medical care.

The Department of Medicine at Cleveland Metropolitan General Hospital (Metro) is organized into four similar teams of physicians associated with similar 28-bed inpatient units and similar outpatient clinics (firms). New patients are randomly assigned to one of these four firms. Experimental changes in the delivery of care are being carried out on an ongoing basis (trials). We think that this is one solution to some of the problems of conducting randomized clinical trials.

In this paper we shall describe the Metro firm trials in the context of the historical background from which this system arose. We shall then discuss the development of the system for the purpose of research, and briefly describe some of the trials which have been conducted in this setting. Practical issues including costs and human subjects review will be addressed. Finally, we shall discuss some of the unique methodologic issues which have arisen during the firm trials.

HISTORICAL BACKGROUND

Many hospitals have had arbitrary, haphazard, or rotational methods of assigning new patients to one of several similar teams of physicians.
Such institutions include the Johns Hopkins Hospital, The Massachusetts General Hospital (internal medicine ward services), and the Sodersjukhuset in Stockholm, Sweden. The earliest study known to us, and based on this kind of arrangement, is described by Hailer.

During the 1880's the Cook County Hospital in Chicago placed every fifth medical case and every fourth surgical case under homeopathic treatment. Comparative mortality statistics indicated a difference of roughly one percent, with the allopaths (regular physician) reporting a 7.2 percent mortality and the homeopaths 8.2 percent.²

Another example was the Boston City Hospital where all new patients were assigned on a rotational basis to either the Boston, Harvard, or Tufts University services. This arrangement was assumed to permit an equitable distribution of patients. As far as we know, this arrangement was used for research purposes only once, by Halperin and Neuhauser,³ who observed differences in treatment for elective inguinal herniorraphies among the three services. This rotational arrangement ended when Boston University took over complete teaching responsibility for the hospital.

In 1956 Thomas Chalmers reported a coordinated series of four studies he and others carried out during 1951⁴ in the United States Army Hospital in Kyoto, Japan, which was designated the Hepatitis Center for American soldiers during the Korean War. Over 4,000 hepatitis patients were treated there, and 460 patients were entered into the studies. Patients with jaundice were randomly assigned to one of four inpatient wards (p. 1167 in note 4). Different regimens of diet, rest, and patients' conditions were compared. The patient was used as the unit of analysis. Chalmers et al. thus had a unique opportunity to study a large number of soldiers with the same disease.

With the Metro firm trials, we have purposefully developed a system and introduced a formal randomization procedure in order to create a laboratory for ongoing clinical and health services research.

THE METRO FIRMS

The firm system was developed by the late director of the Department of Medicine, Charles Rammelkamp, for several reasons. The system promoted continuity both between patients and their physicians and between house staff and their attendings. Both the former and current directors of the Department of Medicine have strongly supported the firm system and have advocated its use as a research laboratory. The Division of General Medicine has been established and is responsible for the four firms. It also has hierarchical control of the system. The combination of foresight, support, and control has allowed Metro to continue to develop this research model. Moreover, it provides a clear focus for general internal medicine within a department with strong subspecialty interests.

Initially, new patients were assigned to the firms on a rotational basis. Since 1980 new patients have been randomly assigned to the firms according to a computer-generated table listing the firms in random sequences of four. In this manner we ensure equal distribution of patients among firms.

Not only are new patients randomly assigned, but since July 1981, all new internal medicine residents have been randomly assigned to the firms by lottery.
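A table "listing the firms in random sequences of four" corresponds to what is usually called permuted-block randomization with a block size of four: each consecutive group of four new patients contains every firm exactly once, in random order. The Python sketch below shows how such a table might be generated. It is an illustration only; the firm labels, function name, and seed are hypothetical and do not describe the program actually used at Metro.

```python
import random

FIRMS = ("A", "B", "C", "D")  # hypothetical labels for the four firms


def assignment_table(n_blocks, seed=None):
    """Build an assignment list from randomly ordered blocks of four.

    Each block is a random permutation of the four firms, so after
    every fourth patient the firms have received equal numbers.
    """
    rng = random.Random(seed)  # a seed makes the table reproducible
    table = []
    for _ in range(n_blocks):
        block = list(FIRMS)
        rng.shuffle(block)     # random order within the block
        table.extend(block)
    return table


# Assign the next 12 new patients in order of arrival.
table = assignment_table(3, seed=1980)
for patient_number, firm in enumerate(table, start=1):
    print(patient_number, firm)
```

The blocking is what guarantees the equal distribution mentioned above: simple coin-flip style assignment would be random but could leave the firms with unequal patient loads.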
Both patients and resident physicians remain with their firm throughout their connection with the hospital in order to provide continuity of care. Subspecialty consultants come to the firms on request. Second-year residents rotate out of the firms for subspecialty training, and patients needing subspecialty care go to subspecialty clinics, although they return to their firm for general medical care.

TABLE B-3 Volume of Services and Staffing in the Four Metro Firms

                                                 Number
Firms                                                 4
Inpatients admitted per firm per month               80
New inpatients admitted per firm per month           55
Outpatient visits per firm per month                240
Inpatient beds in each firm                          28
Number of physicians in each firm                    18

EXAMPLES OF THE METRO FIRM TRIALS

It is not our purpose here to provide a detailed description of our completed trials or of those currently in progress. We will, however, briefly describe several studies that represent the sorts of issues that can be explored using the firm system.

Our first studies dealt with health services research issues concerning physician behavior. In the first trial, house officers in two firms were provided with information about charges for the inpatient laboratory tests they ordered. House officers in the other two firms were provided with charges for their inpatient x-ray test usage.⁵ The feedback of this information resulted in an overall decline in test usage. However, this effect was greatest in those firms in which group leaders became interested in using the data. Moreover, the greatest effect was observed after feedback was discontinued.

The second study was undertaken to evaluate a program intended to improve physician compliance with the delivery of preventive interventions.
In this trial, house officers in two randomly selected experimental firms were offered educational seminars and given checklists with the medical records of their patients to encourage the use of Pneumovax, influenza vaccines, and mammography.6 House officers in the control firm were not

given checklists, although they were invited to the seminars. The use of the experimental maneuvers of interest increased significantly (from 5 to 45 percent among eligible patients) in the experimental group while remaining unchanged among the controls.

A subsequent trial was of a more clinical nature. When intravenous therapy teams were introduced to the inpatient medical services at Metro, this was done in a sequential manner, that is, one firm at a time. This enabled us to evaluate their efficacy in decreasing the incidence of phlebitis and more serious complications associated with intravenous therapy. Firms on which the intravenous therapy team was functioning were compared with those on which intravenous catheters were inserted and maintained by the house staff, as had been the traditional practice.7 A clinically and statistically significant decrease in intravenous catheter-related complications was observed in those firms in which the intravenous therapy team was functioning, thus supporting its efficacy. A subsequent analysis of the data permitted a description of the natural history of intravenous catheter-associated phlebitis.8

PRACTICAL CONSIDERATIONS

The Metro firm trials are particularly appropriate for research on the organization and delivery of medical care. They may also be used to study highly prevalent medical conditions. The ongoing randomization within the firm system provides a setting in which such research can be conducted at a reasonable cost.

Costs

Although a determination of the real costs involved in conducting clinical trials is a complicated issue, the extra costs requiring outside funding for the first three trials totalled less than $100,000. Suffice it to say, these trials are not expensive in relation to many randomized clinical trials. Costs may be even further reduced with the availability of a computerized data base.
Currently the hospital provides several independent, noncommunicating computerized data bases. These provide basic sociodemographic data on each patient, lab test results, and financial information. We plan to develop a hospital-wide patient information system which will merge these data and permit retrieval at low cost. Within the year it is expected that computer terminals will be available in each firm's inpatient unit. We plan a series of trials related to computer introduction, data availability, and the use of decision models for patient care.

Human Subjects

The development of any clinical research laboratory requires the approval and cooperation of the Committee on Human Investigation. If one accepts the idea that small teams (firms) are a preferred type of organization, then we would propose that the best way to ensure equal access to good care is to randomly assign physicians and patients to firms. This is the approach that the human subjects committee has taken at this hospital. The first two trials were considered to be administrative changes which did not require their detailed attention. Human subjects committees are to some degree idiosyncratic for each hospital, and in another hospital the response might be different.

UNIQUE METHODOLOGIC ISSUES

The firm system has provided an excellent setting for research. However, our experience in conducting trials in this setting has given rise to some unique methodological issues and theoretical considerations which are considered below.

Randomization

Although we have attempted to ensure as complete a randomization process as is possible given the constraints of a teaching program, we have had to consider the theoretical possibility that there might be some decay of the process with time. For example, it is conceivable that some unique attribute of a firm may lead to greater or lesser retention of a given type of patient.

To date, we have not observed such a phenomenon, nor have we been able to detect differences in the sociodemographic characteristics of patients among the four firms. Nevertheless, because the possibility of a decay in the randomization process exists, we have taken the precaution of running two separate analyses of our data for each trial. The impact of an experimental intervention is first analyzed using only the patients newly randomized into the firms, and is then analyzed using all patients. If we can demonstrate the same effect in both groups, we can assume either that randomization decay has not occurred, or if it has, that it has not influenced our results. In this way we can gain the statistical power associated with the larger patient population.

Unit of Analysis

One particularly difficult problem in trials of therapeutic interventions and studies of the organization and delivery of medical care is the determination of the appropriate unit of analysis. For example, in the Veterans Administration multicenter trial comparing medical and surgical therapy for the treatment of patients with chronic stable angina, 596 patients were used as the unit of analysis.9-11 However, one could argue that the unit of analysis might have been the 13 participating surgical teams. A similar issue may be considered in the case of the Burlington Trial of the use of nurse practitioners for primary patient care.12,13 Although patients were used as the unit of analysis to compare the care provided by nurse practitioners and doctors, one might argue that the sample size was one group of doctors and one group of nurse practitioners in one clinic in one Canadian province. Another example is the evaluation of fluoridation of the water supply in two Michigan towns,14,15 which is generally considered the best test of the effect of a fluoridated water supply on tooth decay. The unit of analysis was assumed to be patients; one could argue that the unit of analysis might have been the town.
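One way to see what is at stake in these examples is the design effect commonly used for grouped data, 1 + (m - 1) x ICC, where m is the mean number of patients per group and ICC is the intraclass correlation (how alike patients within the same group are). The sketch below applies it to the 596 patients and 13 surgical teams mentioned above. The formula is a standard later formalization rather than a method taken from the text, it assumes groups of roughly equal size, and every ICC value shown is hypothetical.

```python
def design_effect(mean_cluster_size, icc):
    """Variance inflation from analyzing grouped patients as if independent."""
    if mean_cluster_size < 1:
        raise ValueError("mean_cluster_size must be at least 1")
    if not 0.0 <= icc <= 1.0:
        raise ValueError("icc must lie between 0 and 1")
    return 1.0 + (mean_cluster_size - 1.0) * icc


def effective_sample_size(n_patients, n_clusters, icc):
    """Number of independent observations the grouped sample is worth."""
    mean_cluster_size = n_patients / n_clusters
    return n_patients / design_effect(mean_cluster_size, icc)


if __name__ == "__main__":
    # 596 patients and 13 teams come from the text; the ICC values are
    # hypothetical and chosen only to span the two extreme positions.
    for icc in (0.0, 0.05, 1.0):
        print(icc, round(effective_sample_size(596, 13, icc), 1))
```

With an ICC of 0 the effective sample size is the full 596 patients, and with an ICC of 1 it collapses to the 13 teams, so the two positions in the debate are the endpoints of a single scale.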
The running debate on this question of unit of analysis has no simple resolution.

In the Metro firm trials the firms are randomly selected to serve as experimental or control groups, and physicians and patients are randomly assigned to firms. Thus, there are four potential levels of analysis: the experimental versus control firms, each firm as a separate entity, the physician within each firm, or the individual patients. Which, then, is the appropriate unit of analysis? The solution we propose is to use a hierarchical, or nested, design in which firms are nested within intervention group, physicians within firms, and patients within physicians.

To some extent, the unit of analysis will vary depending upon the question being addressed in a study or may vary for multiple questions addressed in a single trial. In our second trial we paid particular attention to this issue. Because we were concerned with the delivery of preventive care to our patients, and because they had been a unit of randomization, they were treated as the unit of analysis for comparisons of the delivery of preventive procedures. However, to the extent that we were interested in the attitudes and behavior of house officers, and because they had also been a unit of randomization, they were also treated as a unit of analysis for these issues.

The Hawthorne Effect

In trials associated with staff education and group dynamics, the possibility exists that an educational intervention in one firm will be communicated to other firms. Results from the initial trials suggest that there was little, if any, cross-firm contamination given the fact that the experimental effects were so large. Comparisons between control and experimental firms and between pre-experimental and experimental conditions should allow us to monitor for any possible contamination phenomenon for each study.
Hawthorne effects could also occur in intervention studies of medical care organization: observed behavioral changes could be the effect of attention alone. One approach to controlling for this Hawthorne effect is to run two trials concurrently. In our first trial, two firms received information on laboratory test charges, while the other two firms were controls. Concurrently, the lab test control firms were the experimental firms for the

feedback of x-ray test charges. The lab test experimental firms were the control firms for the x-ray charge feedback study. We thus planned interventions in both groups of firms in an effort to control for the Hawthorne effect. Nevertheless, because we cannot accurately measure the Hawthorne effect independently for each experiment, we are unable to control for this problem in the manner possible for double blind drug studies.

Generalizability

A legitimate question exists as to the generalizability of data collected in one setting, particularly one as unique as Cleveland Metropolitan General Hospital, which is both a public institution and a major academic unit of Case Western Reserve University. One means of overcoming this problem would be the development of cooperative multicenter studies with other similar institutions in which a similar system might be established.

CONCLUSION

The Metro Firm trials will not solve all the problems of medical care evaluation. However, we believe that the firms provide a unique model for conducting clinical and health service research efficiently and inexpensively. We hope that other hospitals will develop similar models, and look forward to cooperating with them in multicenter trials.

We are indebted to Drs. Thomas C. Chalmers, Frederick Mosteller, Mitchel Gail, and Harold Goldberg for their advice, encouragement, and criticisms.

NOTES

1 Waggoner, D. M., J. D. Frengley, R. C. Griggs, and C. H. Rammelkamp. 1979. A "firm" system for graduate training in general internal medicine. J. Med. Ed. 54:556.
2 Haller, J. S. 1981. American Medicine in Transition 1849-1910. Urbana, Illinois: University of Illinois Press.
3 Halperin, W., and D. Neuhauser. 1976. MEU: A way of measuring efficient utilization of hospital services. Health Care Management Review 1(2):63-70.
4 Chalmers, T. C., R. D. Eckhardt, W. E. Reynolds, J. G. Cigarroa, N. Deane, R. W. Reifenstein, C. W. Smith, and C. S. Davidson. 1955. The treatment of acute infectious hepatitis. Controlled studies of the effects of diet, rest, and physical reconditioning on the acute course of the disease and on the incidence of relapses and residual abnormalities. J. Clin. Invest. 34:1163-1235.
5 Cohen, D., P. Jones, B. Littenberg, and D. Neuhauser. 1982. Does cost information availability reduce physician test usage? A randomized clinical trial with unexpected findings. Med. Care 20(3):286-292.
6 Cohen, D., B. Littenberg, C. Wetzel, and D. Neuhauser. 1982. Improving physician compliance with preventive medicine guidelines. Med. Care 20(10):1040-1045.
7 Tomford, J. W., C. O. Hershey, C. E. McLaren, D. K. Porter, and D. I. Cohen. 1984. Intravenous therapy team and peripheral venous catheter-associated complications: A prospective controlled study. Arch. Intern. Med. 144:1191-1194.
8 Hershey, C. O., J. W. Tomford, C. E. McLaren, D. K. Porter, and D. I. Cohen. In press. The natural history of intravenous catheter associated phlebitis. Arch. Intern. Med. 144:1373-1375.
9 Murphy, M. L., H. N. Hultgren, K. Detre, J. Thomsen, T. Takaro, and Participants of the Veterans Administration Cooperative Study. 1977. Treatment of chronic stable angina. N. Engl. J. Med. 297(12):621-627.
10 Special Correspondence: A debate on coronary bypass. 1977. N. Engl. J. Med. 297(26):1464-1470.
11 Detre, K., M. L. Murphy, and H. Hultgren. 1977. Effect of coronary bypass surgery on longevity in high and low risk patients. Lancet 2:1243-1245.
12 Spitzer, W. O., D. L. Sackett, J. C. Sibley, R. S. Roberts, M. Gent, D. J. Kergin, B. C. Hackett, and C. A. Olynich. 1974. The Burlington randomized trial of the nurse practitioner. N. Engl. J. Med. 290:251-256.
13 Sackett, D. L., W. O. Spitzer, M. Gent, R. S. Roberts, W. I. Hay, G. M. Lefroy, G. P. Sweeny, I. Vandervlist, J. C. Sibley, L. W. Chambers, C. H. Goldsmith, A. S. Macpherson, and R. G. McAuley. 1974. The Burlington randomized trial of the nurse practitioner: Health outcomes of patients. Ann. Intern. Med. 80:137-142.
14 Arnold, F. A., Jr., H. T. Dean, P. Jay, and J. W. Knutson. 1956. Effects of fluoridated public water supplies on dental caries prevalence: Tenth year of the Grand Rapids-Muskegon study. Public Health Rep. 71(7):652-658.
15 Dean, H. T., F. A. Arnold, Jr., P. Jay, and J. W. Knutson. 1950. Studies on mass control of dental caries with fluoridation of the public water supply. Public Health Rep. 71(7):652-658.
16 Simon, R. 1981. Composite randomization designs for clinical trials. Biometrics 37:723-731.
17 Cornfield, J. 1978. Randomization by group: A formal analysis. Am. J. Epidemiol. 108(2):100-102.
18 Williams, P. T., S. P. Fortmann, J. W. Farquhar, A. Parody, and S. Mellen. 1981. A comparison of statistical methods for evaluating risk factor changes in community-based studies: An example from the Stanford three-community study. J. Chronic Dis. 34:565-571.
19 Rothlesberger, F. J., and W. J. Dickson. Management and the Worker 1939-1975. Cambridge, Mass.: Harvard University Press.

Values and Preferences in the Delivery of Health Care

Barbara J. McNeil*

* Professor of Clinical Epidemiology and Radiology, Harvard Medical School and Brigham and Women's Hospital.

Over the past 5 years many persons engaged in health care have become concerned with the need for increased incorporation of patient values regarding quality of life into medical decision making. They also have become concerned with major gaps in knowledge about the kinds of information that should be available to aid patients in making such value decisions. These concerns may have arisen, in part, as a natural evolution of emphasis from quantity of care to quality of care. They may also have arisen, in part, from studies like those of Cassileth and coworkers1 indicating that an increasing fraction of younger individuals with cancer want more information about their disease and want to play a more active role in its management (Table B-4).

Whatever the origin, values and preferences are now important in an explicit fashion to many providers of health care and to their patients. This article will review some of the considerations facing each of these groups and will give, where possible, supportive data from the literature. The ethical issues involved in choosing between diagnostic strategies and alternative therapies will become clear.

PROVIDERS OF HEALTH CARE

Providers (taken to mean physicians, nurses, nurse practitioners, etc.) have two major tasks: (1) obtaining data on the results of medical interventions, with results including not only health outcomes themselves but also the functional implications of such health outcomes; (2) presenting data in as unbiased a fashion as possible so that the values and preferences of their patients can be assessed and incorporated into medical decisions. Providers must also worry about how to use these data to assess preferences; for logical emphasis, however, assessment techniques will be discussed as part of the patient's perspective.

Obtaining Data

In some diseases (e.g., hypertension) outcome states for both treated and untreated patients are well documented. The Framingham study on cardiovascular diseases and the Veterans Administration (VA) study on medical therapy for hypertension have, for example, provided a wealth of data on the incidence (as a function of age) of a large variety of sequelae (strokes, myocardial infarction, angina, etc.).2-5 Such studies generally fail, however, to give the functional implications of these sequelae: how many patients with stroke return to work, are self-sufficient, need to change careers? How many patients with heart attacks return to work, worry unnecessarily about their disease, etc.? Data here are spotty at best (see notes 6 and 7 for typical examples), but if available, they would help health care professionals educate patients more effectively about the likely course of their disease.

In oncology, in which many alternative therapies exist and therefore patient values become increasingly important, the situation is even worse. Large strides have been taken and progress has been made to increase the life expectancy of patients with cancer. The End Results Reporting Data,8 published every 4 years, and periodic updates in CA, a journal for clinicians, emphasize these changes. Data on the incidence of associated morbidities are less detailed and are not available so widely and easily. In addition, the functional implications of such morbidities (e.g., those for mastectomies, ileostomies, radical prostate surgery, etc.) are seldom available. Also, when available, their presentation is frequently skewed so that only

data from patients with minimal functional sequelae are discussed.

TABLE B-4 Participation and Information Preferences by Age Group

                                            Patients in the Following Age Groups (years)
                                            Selecting Response (%) (N = 256)
Preferences                                 20-39     40-59     60+       p Value
Participation preferences
  Prefer participating in decisions         87        62        51
  Prefer leaving decision to M.D.           13        38        49        <0.001
Type of information desired
  Want all information, good and bad        96        79        80
  Want only minimal or good information      4        21        20        <0.05
Preferences for detailed information
  Prefer minimum                            15        40        31
  Prefer maximum                            85        60        69        <0.01

SOURCE: Ann. Intern. Med. 92:834, 1980.

Presenting Data

Studies of the effects of alterations in data presentation on both physicians and patients are new and preliminary. Their importance is great, however. Some physicians present data to their patients in terms of life expectancies, others in terms of survival rates, and others in terms of mortality rates. Some physicians give a great deal of data regarding the immediate (less than 30 days), short-term side effects of alternative treatments and others give less. Some physicians try to give a full picture of the associated morbidities; others give an abbreviated picture.

How do these styles influence the choice a patient might make between or among alternative therapies? Data from cognitive psychology have recently suggested that the way data are presented can have a major impact on treatment choice.9 One pilot study illustrates this effect in medicine.10

Treatment for operable lung cancer can involve either radiation therapy (RT) or surgery, and the results of these can be expressed in terms of cumulative probabilities of being alive or dead at varying points in time (Table B-5); these data can also obviously be integrated so that a life expectancy figure is obtained.
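The integration mentioned here amounts to taking the area under the survival curve. The sketch below does this for the cumulative percentages alive in Table B-5. It rests on several assumptions that are not in the text: "during treatment" is treated as time zero, survival is interpolated linearly between the tabulated points, and beyond 5 years a constant hazard estimated from the 1-to-5-year interval is assumed. It is not the method used in the cited study, and the resulting figures should not be compared with published life expectancies.

```python
import math

# Cumulative fraction alive, from Table B-5 (60-year-old men).
# Time 0.0 stands for "immediately after treatment" (an assumption).
TIMES = (0.0, 1.0, 5.0)
ALIVE = {
    "surgery": (0.90, 0.68, 0.34),
    "radiation": (1.00, 0.77, 0.22),
}


def area_to_horizon(times, alive):
    """Trapezoidal area under the survival curve up to the last time point."""
    total = 0.0
    for i in range(1, len(times)):
        total += 0.5 * (alive[i - 1] + alive[i]) * (times[i] - times[i - 1])
    return total


def tail_beyond_horizon(times, alive):
    """Expected years lived after the last time point.

    Hypothetical extrapolation: the hazard after the last point is taken
    to be constant and equal to the average hazard over the last interval.
    """
    hazard = math.log(alive[-2] / alive[-1]) / (times[-1] - times[-2])
    return alive[-1] / hazard


def life_expectancy(times, alive):
    """Area within the tabulated range plus the extrapolated tail."""
    return area_to_horizon(times, alive) + tail_beyond_horizon(times, alive)


if __name__ == "__main__":
    for therapy, alive in ALIVE.items():
        print(therapy,
              round(area_to_horizon(TIMES, alive), 2),
              round(life_expectancy(TIMES, alive), 2))
```

Within the first 5 years the two areas are nearly equal, so under these assumptions the ordering of the two therapies is driven almost entirely by the extrapolated tail, where the higher 5-year survival after surgery counts most.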
When students, healthy outpatients, and physicians were asked to choose between surgery or radiation therapy, assuming that they had lung cancer, major and systematic differences in the percentage of respondents choosing radiation therapy were observed (Table B-6). For example, when these data were presented in terms of mortality, radiation therapy was favored; when they were presented in terms of life expectancy, surgery was favored. In addition, other systematic effects were found by concealing the identity of the two treatments and calling them A or B instead. This maneuver systematically increased the likelihood of an individual's choosing radiation therapy.

TABLE B-5 Cumulative Likelihoods for 60-Year-Old Men (Percent)

                      Dying                 Living
                      Surgery    RT         Surgery    RT
During treatment      10          0         90         100
By 1 year             32         23         68          77
By 5 years            66         78         34          22

TABLE B-6 Percent Choosing RT Over Surgery (S)

Label       Dying     Living
S - RT      44        18
A - B       61        37

The above study was designed to illustrate the effects of explicit perturbations of data presentation on treatment choice in an experimental situation. In clinical situations the problems may become more complex and pose growing difficulties for providers of health care. These providers must worry not only about the effects seen in Tables B-5 and B-6 but also about other more implicit effects. For example, whatever the data presented, is it possible in a real clinical situation for providers to present data alone, or do providers always end up providing data plus some of their own values? A recent study in psychiatry11 would suggest the latter.

Implications

Major efforts must be made to make known the side effects of health interventions and, more importantly, their functional implications. Also, extensive research needs to be done to understand the extent to which the framing of data influences our ability to assess values and preferences. In addition, remedies for systematic biases in data presentation and interpretation and hence in decision making should be developed.

THE PATIENT

The assessment of a patient's values and preferences pervades all areas of his health care system from screening for occult disease, to diagnosing suspected disease, to instituting treatment for disease. Two questions are pertinent here: (1) For each of the above stages (screening, diagnosis, and treatment) do we have prototypical examples showing the importance of incorporating patients' values? (2) When should such values be assessed? At the time of the screening, diagnostic, or therapeutic encounter or sometime before the need actually occurs? Both of these questions will be addressed in turn.

PROTOTYPICAL EXAMPLES

When providers and patients think of preferences and values in medicine, they generally think of their impact on therapeutic choices. In fact, the importance of values far exceeds decision making in therapeutic medicine and, as indicated above, covers the whole range of medical interventions.
In general, whatever the specific area of medicine, the methodology involved in assessing preferences is complex and will not be described here. Instead, we will discuss the kinds of data required for proper incorporation of values at different stages of the medical process and the type of results that might be expected at these stages. Perhaps the reader will think of other analogous examples.

Screening for Disease

One example of this category relates to the screening of pregnant women for neural tube defects, using the alpha-fetoprotein (AFP) assay.12 The major problem with this screening device is the fact that it has false-positive results that can lead to amniocentesis, which, in turn, can lead to the accidental abortion of a normal fetus. Thus, the question becomes, "Is a prospective parent willing to run the risk of an accidental miscarriage to avoid the birth of a child with a potential neural tube defect?" Several pieces of data are required to make an optimal decision: (1) responses to the question, "At what risk of a pregnancy's producing a severely deformed child would you prefer the risk of an elective abortion to the risk of having a child born with a neural tube defect?"; (2) the risk of having a child with a neural tube defect; (3) the false-positive rate of the AFP assay; and (4) the accidental abortion rate. With these data, Pauker and Pauker13 were able to determine that about 50 percent of prospective parents who were already undergoing genetic counseling would want to have the screening test done. This 50 percent figure is undoubtedly a high one because individuals undergoing genetic counseling probably place a lower cost on the burden of an elective abortion than does the population as a whole. Nonetheless, it does indicate the role that values have in either establishing a screening program or applying it to an individual patient.
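The way these four pieces of data combine can be illustrated with a simple threshold calculation. The sketch below is a deliberate simplification and not the analysis of the cited studies: it summarizes the parent's answer to question (1) as a single hypothetical "burden ratio" (how many times worse the parent rates the birth of an affected child than the accidental loss of a normal fetus), it adds a test sensitivity that the text does not list, and it ignores the burdens of amniocentesis itself and of elective abortion. All numerical values are hypothetical.

```python
def net_benefit(p_defect, sensitivity, false_positive_rate,
                accidental_loss_rate, burden_ratio):
    """Expected benefit minus expected harm of screening one pregnancy.

    Units are "accidental losses of a normal fetus". burden_ratio is a
    hypothetical summary of the parent's stated values.
    """
    benefit = p_defect * sensitivity * burden_ratio
    harm = (1.0 - p_defect) * false_positive_rate * accidental_loss_rate
    return benefit - harm


def threshold_risk(sensitivity, false_positive_rate,
                   accidental_loss_rate, burden_ratio):
    """Risk of a defect at which screening and not screening break even."""
    harm_per_normal = false_positive_rate * accidental_loss_rate
    return harm_per_normal / (sensitivity * burden_ratio + harm_per_normal)


def prefers_screening(p_defect, **kwargs):
    """True when the pregnancy's risk exceeds the parent's break-even risk."""
    return p_defect > threshold_risk(**kwargs)


if __name__ == "__main__":
    # Hypothetical test characteristics, for illustration only.
    test = dict(sensitivity=0.8, false_positive_rate=0.05,
                accidental_loss_rate=0.01)
    for ratio in (0.1, 1.0, 10.0):
        print(ratio, prefers_screening(0.002, burden_ratio=ratio, **test))
```

The same risk and the same test can therefore lead to different decisions for parents with different values, which is the point the screening example makes.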
Another example along these same lines relates to amniocentesis for the detection of Down's syndrome.

Diagnosing Disease

A graphic example in this category is, "Should patients with presumed operable

lung cancer be investigated for occult metastatic disease, knowing that 20 to 40 percent of these patients have such disease?"14 If tests produced perfectly correct diagnostic information involving the presence of occult metastatic disease, then performing such tests would always be in the patients' best interests. Yet, if false-negative results occur, some patients would have unnecessary surgery and its attendant mortality rate (average, 10 percent; range, 5 to 20 percent). If false-positive results occur, some operable patients would not be offered the benefit from potentially curative treatment from surgery. Patients' attitudes toward the importance of near-term versus far-term survival are thus important in this decision to do preoperative staging examinations.

When a detailed analysis of patient attitudes was made and incorporated into this decision (Figure B-1), the data in Tables B-7 and B-8 emerged. Specifically, the 5-year survival rate is equal for the no-test and the test strategies if a perfect test (sensitivity and specificity both equal to 100 percent) is used. The life expectancy is slightly increased if a perfect test is used. Under all other circumstances, however, it is obvious that preoperative testing for occult disease should never be done if maximizing the 5-year survival rate or life expectancy were the goal (Table B-7). When patient values were incorporated, however, a large fraction of patients would benefit from testing even if the false-positive and false-negative rates for the test were in the 5 to 20 percent range (Table B-8).

FIGURE B-1 Decision flow diagram comparing test and no-test strategies. In the no-test strategy (upper branch), patients with presumably operable bronchogenic carcinoma undergo surgery. Ten percent die perioperatively. Of the remaining patients, 80 percent have regional disease and are cured, while 20 percent have occult metastatic disease and are not cured. In the test strategy (lower branch), patients with presumably operable bronchogenic carcinoma are examined preoperatively in order to identify occult metastatic disease. Those patients with negative tests are treated surgically. Length of survival after operation depends on whether they have regional (true-negative test) or metastatic (false-negative test) disease. Those patients with positive tests are treated palliatively with radiation therapy or chemotherapy. Their length of survival also depends on whether they have regional (false-positive test) or metastatic (true-positive test) disease. (Reprinted with permission from Radiology 132:605, 1979).

TABLE B-7 Evaluation of Preoperative Testing in Patients with "Operable" Lung Cancer: Traditional Objective Approaches

                                                                         Life
                            Sensitivity  Specificity  5-Year             Expectancy  Optimal
Strategy                    (%)          (%)          Survival (%)       (years)     Decision
No preoperative testing                               32.5               6.42
Preoperative testing
where test has:             100          100          32.5               6.43        Test
                             90           95          32.0               6.36        No test
                             90           90          31.5               6.29        No test
                             80           95          32.0               6.35        No test
                             80           90          31.5               6.29        No test
                             80           80          30.5               6.15        No test
                             50           90          31.5               6.28        No test

SOURCE: Radiology 132:608, 1979.

TABLE B-8 Evaluation of Preoperative Testing in Patients with "Operable" Lung Cancer: Incorporation of Patient Attitudes

Test Characteristics
Sensitivity (%)    Specificity (%)    Patients Who Should Be Tested (%)
100                100                100
 90                 95                 68
 90                 90                 60
 80                 95                 66
 80                 90                 56
 80                 80                 47
 50                 90                 50

Treating Disease

Two recent examples highlight the importance of incorporating patient attitudes into treatment decisions. One, already mentioned, involves the choice between surgery and radiation therapy for operable lung cancer. As indicated in Table B-5 the importance of near-term versus far-term survival is critical for this clinical problem. In a small study15 that explored this problem using expected utility theory (i.e., assessing values or utilities individually and then integrating them with probability data*) the results suggested that at a 10 percent operative mortality rate 21 percent of 60-year-old men should have radiation therapy instead of surgery; 43 percent of 70-year-old men should have radiation therapy instead of surgery (Table B-9). At lower operative mortality rates these figures drop, and at higher rates they increase. These results are particularly striking when it is recalled that most therapeutic choices are not made on the basis of patient preferences as indicated here, but rather on the basis of absolute 5-year survival rates.

* Note that this approach has been the traditional one advocated to date and involves assessing values, calculating expected utility, and then indicating what the patient should want.16,17 It differs from the technique described in Tables B-5 and B-6, wherein a respondent is given the data and asked to make a direct choice. The relative advantage of each of these techniques needs to be explored.
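The expected utility approach described above can be sketched in a few lines: each therapy is treated as a lottery over survival times, a utility is attached to each survival time, and the therapy with the higher probability-weighted utility is preferred. The probabilities below are derived from the cumulative percentages alive in Table B-5. Everything else is a labeled assumption and not the elicitation method of the cited study: the representative survival times for each interval, the 10 years assigned to those alive at 5 years, and the exponential utility curve, whose parameter tau (in years) controls how strongly near-term survival is valued over far-term survival.

```python
import math

# Cumulative fraction alive after treatment, at 1 year, and at 5 years
# (Table B-5, 60-year-old men).
ALIVE = {
    "surgery": (0.90, 0.68, 0.34),
    "radiation": (1.00, 0.77, 0.22),
}

# Hypothetical representative survival times (years) for deaths during
# treatment, between 0 and 1 year, and between 1 and 5 years.
DEATH_TIMES = (0.0, 0.5, 3.0)
# Hypothetical survival time assigned to patients alive at 5 years.
LONG_TERM_YEARS = 10.0


def lottery(alive):
    """Return (probability, survival years) pairs for one therapy."""
    a0, a1, a5 = alive
    probs = (1.0 - a0, a0 - a1, a1 - a5, a5)
    times = DEATH_TIMES + (LONG_TERM_YEARS,)
    return list(zip(probs, times))


def utility(years, tau):
    """Hypothetical concave utility: small tau weights near-term years heavily."""
    return 1.0 - math.exp(-years / tau)


def expected_utility(therapy, tau):
    """Probability-weighted utility of the survival lottery for a therapy."""
    return sum(p * utility(t, tau) for p, t in lottery(ALIVE[therapy]))


def preferred_therapy(tau):
    """Therapy with the higher expected utility for a patient with this tau."""
    return max(ALIVE, key=lambda name: expected_utility(name, tau))


if __name__ == "__main__":
    for tau in (1.0, 5.0, 50.0):
        print(tau, preferred_therapy(tau))
```

Under these assumptions a patient who places great weight on the near term (small tau) prefers radiation therapy, while a patient who values all years almost equally (large tau) prefers surgery, even though both face the same survival data.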
TABLE B-9 Influence of the Measure of Therapeutic Efficacy on the Choice of Therapy when Excellent Surgical Results Prevail

                                     Percent Who Should Receive Radiation Therapy
                                     Rather than Operation with Operative
                                     Mortality Rates of
Measure of Therapeutic Efficacy      5%      10%     15%     20%
At age 60:
  5-year survival                     0       0       0       0
  Expected utility                    7      21      43      64
At age 70:
  5-year survival                     0       0       0       0
  Expected utility                   14      43      50      71

SOURCE: N. Engl. J. Med. 299:1400, 1978.

Data from the work of Torrance et al. have indicated that it is possible through the time trade-off techniques to determine, relative to perfect health, how individuals value various states of imperfect health.19 On a scale of 0 (death) to 1.0 (perfect health), they were able to obtain data for a variety of chronic health conditions (Table B-10). Using this same technique, McNeil and coworkers determined how people valued the state of having either no speech or artificial speech, caused by surgery for cancer of the larynx.20 They then integrated this information with the results of alternative treatments for cancer of the larynx (surgery with high survival rates and no speech, and radiation therapy with low survival rates and normal speech) and determined how many people so valued voice that they would want radiation therapy. The results indicated that about 20 percent of individuals fall in that category.

TABLE B-10 Mean Utilities for Several States of Health in General Population and a Selected Subset

State of Health                     Utility
Perfect health                      1.00
Tuberculosis                        0.68
Mastectomy for breast cancer        0.48
Depression for 3 months             0.44
Home dialysis for life
  By general population             0.39
  By dialysis patients              0.56
Hospital dialysis for life
  By general population             0.32
  By dialysis patients              0.52
Death                               0.00

Implications

For a particular patient the weighing of quantity and quality of life may be important in choosing between alternative therapies for the same disease. Similarly, for society as a whole accurate weighting of quality and quantity of life may be important in developing a rank ordering of benefits achieved from various health interventions. Such an ordering would aid in an optimum allocation of health care resources. From these two perspectives and from the above prototypical examples, it should be clear that additional work is needed in two areas: (1) refinement of the methodology for assessing values and preferences, and (2) routine use of this methodology in the practice of medicine.
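The time trade-off technique mentioned above can be illustrated as follows. For a chronic state considered better than death, a respondent is asked how many years in perfect health would be equivalent to a fixed number of years in that state, and the utility is the ratio of the two. The sketch below assumes this common form of the technique. The utilities are the means reported in Table B-10; the respondent's answer and the durations used in the example are hypothetical.

```python
# Mean utilities from Table B-10 (0 = death, 1.0 = perfect health).
MEAN_UTILITY = {
    "home dialysis, general population": 0.39,
    "home dialysis, dialysis patients": 0.56,
    "hospital dialysis, general population": 0.32,
    "hospital dialysis, dialysis patients": 0.52,
}


def time_tradeoff_utility(healthy_years_equivalent, years_in_state):
    """Utility of a chronic state from a time trade-off answer.

    The respondent is indifferent between years_in_state in the health
    state and healthy_years_equivalent in perfect health.
    """
    if years_in_state <= 0:
        raise ValueError("years_in_state must be positive")
    if not 0 <= healthy_years_equivalent <= years_in_state:
        raise ValueError("equivalent must lie between 0 and years_in_state")
    return healthy_years_equivalent / years_in_state


def quality_adjusted_years(years, utility):
    """Years of life weighted by the utility of the state they are lived in."""
    return years * utility


if __name__ == "__main__":
    # Hypothetical respondent: 20 years on home dialysis judged
    # equivalent to 7.8 years in perfect health.
    print(time_tradeoff_utility(7.8, 20.0))
    # The same 10 years on home dialysis, valued by the two groups.
    for group, u in MEAN_UTILITY.items():
        if group.startswith("home"):
            print(group, quality_adjusted_years(10.0, u))
```

The gap between the two groups' valuations of the same 10 years shows why it matters whose values are used when utilities are integrated with survival data.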
THE TIMING OF ASSESSMENTS The methodology used in the assessments described above is detailed and complex and beyond the scope of this review (see notes 16- 18 for a discussion of various methodologies). Suffice it to say that any kind of detailed questioning will be difficult to administer routinely in patients who have either become acutely ill or who have become recently aware of a long-term chronic problem about to afflict them. In addition, the validity of re- sponses made under such circumstances could be questioned. These problems have led a number of individuals to suggest one or two courses of action. First, streamline ques- tions and methodologic approaches so that they are no more complex than other activi- ties a patient is asked to respond to. Second, identify a set of characteristics corresponding to optimal courses of action for a variety of patients and diseases; then, match a new pa- tient to these characteristics and a course of action previously determined to be optimal for a similar individual. This second ap- proach would require a "bank" of prototypi- cal patients, each with an associated optimal . . c .eclslon. Implications Problems in timing value assessments are great. Their solution may depend in large part on the extent to which the methodology for value assessment is simplified. The easier the assessment technique the more likely that timely values can be obtained. The more cumbersome the technique the more likely a bank of prototypical values will be required. Supported in part from a grant from the Henry J. Kaiser Family Foundation. NOTES l Cassileth, B. R., R. V. Zupkis, K. Sutton-Smith, and V. March. 1980. Information and participation preferences among cancer patients. Ann. Intern. Med. 92:832-836.

2 Kannel, W. B., and T. Gordon. 1970. The Framingham Study: An Epidemiological Investigation of Cardiovascular Disease. Section 26. Some Characteristics Related to the Incidence of Cardiovascular Disease and Death: Framingham Study, 16-year Follow-up. Washington, D.C.: U.S. Government Printing Office.

3 Kannel, W. B., and T. Gordon. 1970. The Framingham Study: An Epidemiological Investigation of Cardiovascular Disease. Section 25. Survival Following Certain Cardiovascular Events. Washington, D.C.: U.S. Government Printing Office.

4 Veterans Administration Cooperative Study Group on Antihypertensive Agents. 1969. Effects of treatment on morbidity in hypertension: Results in patients with diastolic blood pressures averaging 115 through 129 mm Hg. J. Am. Med. Assoc. 202:1028-1034.

5 Veterans Administration Cooperative Study Group on Antihypertensive Agents. 1970. Effects of treatment on morbidity in hypertension. II. Results in patients with diastolic blood pressures averaging 90 through 114 mm Hg. J. Am. Med. Assoc. 213:1143-1152.

6 Cary, E. L., N. Vetter, and A. Philip. 1973. Return to work after a heart attack. J. Psychosom. Res. 17:231-243.

7 Weisbroth, S., N. Esibill, and R. R. Zuger. 1971. Factors in the vocational success of hemiplegic patients. Arch. Phys. Med. Rehab. 52:441-446.

8 Axtell, L. M., S. J. Cutler, and M. H. Myers, eds. 1972. End Results in Cancer, Report No. 4. DHEW Publication No. (NIH) 73-272. Washington, D.C.: U.S. Government Printing Office.

9 Tversky, A., and D. Kahneman. 1981. The framing of decisions and the psychology of choice. Science 211:453-458.

10 McNeil, B. J., S. G. Pauker, H. C. Sox, and A. Tversky. 1982. On the elicitation of preferences for alternative therapies. N. Engl. J. Med. 306:1259-1262.

11 Lidz, C. W. 1980. The weather report model of informed consent: Problems in preserving patient voluntariness. Bull. Am. Acad. Psych. Law 8(2):152-160.

12 Pauker, S. G., S. P. Pauker, and B. J. McNeil. 1982. The effect of private attitudes on public policy: Prenatal screening for neural tube defects as a prototype. Med. Dec. Mak. 2:103-114.

13 Pauker, S. P., and S. G. Pauker. 1977. Prenatal diagnosis: A directive approach to genetic counseling using decision analysis. Yale J. Biol. Med. 50:275-289.

14 McNeil, B. J., and S. G. Pauker. 1979. The patient's role in assessing the value of diagnostic tests. Radiology 132:605-610.

15 McNeil, B. J., R. Weichselbaum, and S. G. Pauker. 1978. Fallacy of the five-year survival in lung cancer. N. Engl. J. Med. 299:1397-1401.

16 Keeney, R. L., and H. Raiffa. 1976. Decision Making with Multiple Objectives: Preferences and Value Tradeoffs. New York: John Wiley.

17 Raiffa, H. 1968. Decision Analysis: Introductory Lectures on Choices Under Uncertainty. Reading, Mass.: Addison-Wesley.

18 Torrance, G. W., W. H. Thomas, and D. L. Sackett. 1972. A utility maximization model for evaluation of health care programs. Health Services Res. 7:118-133.

19 Sackett, D. L., and G. W. Torrance. 1979. The utility of different health states as perceived by the general public. J. Chronic Dis. 31:697-704.

20 McNeil, B. J., R. Weichselbaum, and S. G. Pauker. 1981. Speech and survival: Tradeoffs between quality and quantity of life in laryngeal cancer. N. Engl. J. Med. 305:982-987.

New Federalism and State Support for Technology Assessment

George D. Greenberg*
Penny H. Feldman†

* Assistant Secretary for Planning and Evaluation, U.S. Department of Health and Human Services. Dr. Greenberg's contribution to this article was written in his private capacity. No official support or endorsement by the U.S. Department of Health and Human Services is intended or should be inferred.
† Harvard School of Public Health, Harvard University.

This paper considers the implications of new federalism for state government support of technology assessment. First we describe new federalism and related health policy initiatives of the Reagan administration. Second, we review the states' current involvement in technology assessment. Third, we examine state incentives and capacity to expand technology assessment efforts and their likely effectiveness should they choose to do so. Fourth, we discuss economic and political rationales for state involvement. Finally, we identify those aspects of new federalism, as well as concomitant changes in federal and state health policy, that we believe will be critical in influencing future state support for technology assessment.

NEW FEDERALISM AND HEALTH POLICY DEVELOPMENTS

At the outset of his administration, President Reagan asserted that his ultimate goal with regard to federalism was to sort out functional responsibilities between the federal and state governments and to turn back appropriate revenue sources and decision-making authority to the states (Cannon and Dewar, 1981; Reagan, 1982). The six block grants proposed as part of the fiscal year (FY) 1982 budget were to be a first step toward this ultimate goal, consolidating categorical grant programs in the areas of health, education, social services, energy, and emergency assistance.
The block grants proposed for FY 1982 called for reduced federal funding and a minimum of federal strings, eliminating or reducing requirements for state matching funds, planning, reporting, maintenance of effort, and the like.

In the health field, the Reagan administration proposed two block grants, a health services block and a prevention block, funded at approximately 75 percent of FY 1981 program levels (Feder et al., 1982).1 The administration also proposed to alter federal-state arrangements governing the $28 billion Medicaid program. For FY 1982 it proposed a 5 percent cap on federal Medicaid matching funds, so that the entire burden of future cost increases above the cap would fall on the states (Pelham, 1981a).2 In conjunction with the cap proposal, states were to receive greatly expanded authority to restructure their Medicaid programs to achieve cost savings.

Congress substantially revised the administration's proposal in the Omnibus Budget Reconciliation Act of 1981. It created four health block grants instead of two, consolidating programs in the areas of primary care; maternal and child health; prevention; and alcohol, drug abuse, and mental health (Pelham, 1981b). The health blocks contained many more federal strings than the administration would have liked, but they did somewhat expand state discretion, particularly in the areas of maternal and child health and prevention. At the same time, they provided fewer federal dollars than previously. Furthermore, while Congress rejected the Medicaid cap, it voted to reduce federal Medicaid matching payments by 3 percent in 1982, 4 percent in 1983, and 4.5 percent in 1984; and it greatly enhanced states' flexibility to restructure their Medicaid programs.3

There have been no significant legislative advances toward new federalism since 1981, despite subsequent administration proposals to federalize Medicaid (in exchange for state financing of the Aid to Families with Dependent Children and Food Stamp Programs) and to turn back a variety of domestic programs, including the four health block grants, to the states. Yet a variety of proposals to federalize all or large portions of Medicaid, to turn the remaining parts of the program over to the states to be funded through block grants, or to further alter and reduce federal matching payments for Medicaid remain on the administration's agenda to be moved forward pending the outcome of the 1984 presidential election (Grannemann and Pauly, 1983).

Complementing the administration's new federalism strategy in the area of health was its competition initiative. The goal of competition is to make consumers more cost-conscious by increasing cost sharing, in turn motivating health insurers and providers to promote more efficient health delivery systems to compete more effectively for the dollars of newly cost-conscious consumers. If competition were to succeed, the government's role at all levels would be reduced as government dollars were transferred to the private sector in the form of vouchers, tax credits, or the like. Private insurers, rather than federal or state government, would have the primary role in determining what health benefits beyond a minimum basic package should be included in their plans. Furthermore, private insurers, rather than federal or state government, would bear the burden of costs that exceeded the value of the government contribution. Major national legislation to promote competition has not yet been enacted by the Congress, despite the administration's FY 1984 proposals for caps on employers' tax deductible health insurance contributions, voluntary Medicare vouchers, and restructuring of Medicare Part A benefits to provide catastrophic coverage and impose cost sharing on the first 60 days of care.
However, individual states are using the flexibility granted them under the Reconciliation Act of 1981 to introduce competitive incentives in their Medicaid programs.

The passage of hospital prospective payment for Medicare has been the administration's major accomplishment in the health care arena. Prospective payment has both regulatory and competitive aspects. It contains elements of regulation in that it establishes limits on budgetary resources that Medicare will pay for given diagnosis-related groups (DRGs) and requires elaborate governmental and intermediary monitoring to ensure that hospitals do not manipulate the system. Prospective payment is pro-competitive in that it provides incentives for hospitals to drop unprofitable services and compete with each other for patients in order to maximize their income. Whichever aspect of prospective payment is considered dominant, the DRG system should renew federal interest in technology assessment insofar as DRG rates are intended to reflect the real resource costs of medical technologies appropriate to specific diagnoses. Renewed federal interest may take the form of research support for the Prospective Payment Assessment Commission, a more narrowly conceived successor to the National Center for Health Care Technology.

The evolution of the DRG system, and associated federal support for technology assessment, will likely be as important as health block grants, Medicaid financing and reimbursement reform, and the success or failure of competitive incentive schemes in influencing future state support for technology assessment. Renewed federal interest in expanding technology assessment might obviate the role of the states, particularly in the context of a fully federalized Medicaid program.
On the other hand, increased federal support for technology assessment might stimulate some states to expand their activities in this area, particularly in the context of Medicaid reform in which they were required to bear increased financial responsibility. Devolution of health program financing to the states can be expected to increase their interest in technology assessment as a means to promote cost control, while it erects political and financial barriers to actual state support of such studies. Finally, if competition were to succeed and Medicare/Medicaid expenditures were capped by means of vouchers or some comparable mechanism, technology assessment might become the concern of competing private insurers seeking to make more cost-effective decisions. We examine the implications of these scenarios in our concluding section.

STATE INVOLVEMENT IN TECHNOLOGY ASSESSMENT

By all accounts, current state involvement in technology assessment is minimal. The federal, state, and local governments spent $5.6 billion on health-related research in 1982. Nearly 90 percent ($5.0 billion) came from the federal government, and the rest came from states and localities (Gibson et al., 1983). Although estimates of government expenditures on technology assessment are sketchy, surveys by the U.S. Office of Technology Assessment (OTA; 1980) indicate that cost-effectiveness and cost-benefit analyses "are not frequently conducted or applied" by major federal health agencies, state and local governments, or nongovernmental organizations. It is quite likely then that only a small fraction of total health research spending is devoted to technology assessment; and, of course, an even tinier fraction is spent by state or local governments.

OTA (1980) reported that where cost-effectiveness or cost-benefit analyses had been conducted at the state or local level, their performance usually reflected the individual interests of government staff (and, by implication, not explicit policy directives of key decision makers). OTA (1980) found that most state and local analyses had been conducted in Massachusetts and New York. Those studies tended to focus on the costs and benefits or cost-effectiveness of various screening and other disease prevention programs. The difficulty of obtaining necessary data, the relatively high cost of comprehensive analyses (estimated at $381,000 by Coates, 1972), and a tradition of devoting relatively little money and staff to evaluation were cited as deterrents to state support for technology assessment (OTA, 1980).
If states are not devoting significant resources to conducting or contracting for technology assessment, they may nonetheless be using technology assessment results in a variety of reimbursement or regulatory decisions. Evidence of such action, however, is scanty and is largely anecdotal. As third-party payers, some states limit payment for experimental procedures under Medicaid. The definition of experimental may be drawn from federal Medicare/Medicaid guidelines (themselves drawn sometimes from formal evaluation and sometimes from expert opinion) or from the opinion of medical experts at the state level (OTA, 1980). As planners and regulators, states may draw on technology assessment to guide them in establishing standards or making particular certificate-of-need decisions. For example, Massachusetts originally granted certificates of need for computed tomographic (CT) scanners on the condition that hospitals participate in a formal evaluation of the technology, intended to inform future regulatory decisions.

States' use, as distinct from their financial support, of technology assessment is influenced by the availability of relevant studies produced under other auspices and, in the case of Medicaid, by a recent court ruling. Medicaid requires that states pay for all "medically necessary" services. In 1980, a federal court established that with regard to mandatory Medicaid services the individual physician's judgment determines medical necessity except where the state has an explicit policy limiting payment for experimental procedures. Otherwise the only permissible review of the physician's judgment is to determine whether there was a reasonable basis in fact for the diagnosis (Rush v. Parham, 1980). Thus under current court rulings, states' latitude in applying the results of technology assessment is apparently circumscribed.
POSSIBILITIES FOR MORE TECHNOLOGY ASSESSMENT BY STATES

In understanding states' minimal use of technology assessment today and in considering their possible future support for such activity, it is important to examine their incentives, capacity, and likely effectiveness should they expand technology assessment efforts. Why would a state want to conduct technology assessment and devote scarce resources to it? Do states have the money, staff, and technical know-how? Could states be successful if they tried?

Incentives

States have the constitutional authority to provide for the health and welfare of their citizens. In exercising this authority, they perform a variety of functions that provide potential incentives for supporting technology assessment: (1) they are responsible for traditional public health activities, including sanitation, communicable disease control, and the assurance of the quality of water and food supplies; (2) they provide institutional and ambulatory services for chronic conditions as well as for disease prevention and health promotion; (3) they invest directly in state hospitals, medical schools, and other health care facilities; (4) they engage in a wide set of regulatory activities ranging from institutional licensure and inspection to regulation of environmental hazards. In addition, as partners in the federal-state system, states act as third-party payers for personal health services for the Medicaid population (Wilson and Neuhauser, 1982). In theory, if not in practice, states could perform all of these functions better if their decisions were informed by the results of cost-effectiveness or cost-benefit analysis. State incentives to expand efforts in a given area depend in part on what the federal government and the private sector are already doing and on whether the citizens of a state perceive they are adequately protected by the actions of federal agencies and private bodies.

Protection of the health and welfare of citizens involves states in the development and enforcement of standards of safety and efficacy. To the extent that federal agencies, such as the Food and Drug Administration, already develop and enforce such standards, state motivation to go beyond the federal minimum is reduced as long as federal agencies are not perceived as weak or ineffective.
On the other hand, if new technologies appear to be beyond the control or authority of existing federal agencies, citizen fears may motivate states to act on their own. Dramatic incidents such as the recent Tylenol poisonings or perceived reductions in federal enforcement also can lead to renewed popular calls for state protections. For example, state regulation of environmental pollution may increase to the extent that the federal government deregulates this area, and state regulation of nuclear power production may also increase as public perception of the adequacy of Nuclear Regulatory Commission standards declines after incidents such as that at Three Mile Island.

The history of the passage of the Pure Food, Drug, and Cosmetic Act (1938) is illustrative of the dynamic and changing relationship between federal and state regulations. Calls for federal regulation in the early 1930s in response to cases of blindness from cosmetics and deaths resulting from dissolving sulfa drugs in carbon tetrachloride were blocked by business interests, until individual states began to enact very strict laws. Faced with strict regulation in some states, business helped enact a uniform national code in 1938, which preempted state action and preserved a national market (Jackson, 1969).

Health planning and cost-control programs enacted in the late 1960s and in the 1970s also illustrate the dynamic relationship among federal, state, and private actions. Approximately a half dozen states (including New York, Maryland, Connecticut, and Massachusetts) were the first to initiate health care regulation, using their constitutional powers to establish vigorous certificate-of-need and hospital rate-setting programs independent of federal mandates.
The professional standards review and national health planning laws enacted in 1972 and 1974, respectively, were intended by Congress to foster increased professional regulation and centralized resource allocation across the 50 states. National legislation provided political and financial incentives for previously inactive states and private groups to establish certificate-of-need agencies and peer review organizations where they had not before existed. On the other hand, associated federal rules and guidelines actually reduced the incentives for pioneering states to develop innovative regulatory programs.

In their role as service providers and third-party payers, state decision makers are similarly influenced by the degree of flexibility and the financial incentives embodied in federal law. As indicated above, Medicaid regulations governing medically necessary services, along with recent court rulings, provide a disincentive for states to independently evaluate the cost-effectiveness of Medicaid-reimbursable procedures. Furthermore, while the costs of conducting or contracting for cost-effectiveness or cost-benefit analyses are concentrated and not routinely reimbursed by the federal government, the costs of covering questionable medical procedures and equipment are relatively diffuse and, under current federal-state Medicaid matching arrangements, reimbursed by the federal government at rates ranging from 50 to 80 percent, depending on the state.

State investment in health facilities and equipment and state financing and provision of direct services are relatively free of federal constraints. However, state funding priorities are often subject to detailed scrutiny and approval by state legislatures. Technology assessment is an activity virtually devoid of popular appeal. Furthermore, knowing that affected interest groups and their legislative representatives tend to be highly resistant to cuts or changes in funding and service and that constituencies who bear the burden of efficiency measures are unlikely to be swayed by the results of cost-effectiveness analysis, decision makers in the state bureaucracy may be loath to divert scarce resources away from service provision to support analytic studies. Given the unpopularity of benefit cuts and negative coverage decisions and the often controversial nature of even the most authoritative cost-effectiveness studies, it is not surprising that technology assessment is not a high-priority item for state officials.

State Resources and Capacity

If states were motivated to expand technology assessment efforts, would they have the resources and technical capacity to do so?
At a minimum, states would need the financial resources to adequately fund technology assessments. In addition, if they chose to conduct rather than contract for such assessments, they would require appropriately trained staff, adequate data bases, and the command of complex methodologies.

There has been general growth in state capabilities over the past 20 years. State fiscal capacity and strength grew significantly during the 1970s. States have upgraded and expanded their tax bases (Advisory Commission on Intergovernmental Relations, 1982).4 Between 1960 and 1979, 11 states adopted personal income taxes, 9 enacted corporate income taxes, and 10 enacted general sales taxes. By 1979, 41 states had a broad-based income tax, 45 a corporate income tax, and 45 a general sales tax. A total of 37 states had all three levies in 1979 compared with only 19 in 1960 (Walker, 1981). The average tax bite by states rose from 7.6 percent of personal income in 1953 to 12.8 percent in 1977 (Walker, 1981). State and local receipts from their own sources rose from $105 billion in 1970 to $295 billion in 1980 (Walker, 1981). Although these trends do not speak to a specific state capacity to perform technology assessments, they argue against a hasty conclusion that states could not support expanded technology assessment efforts if they chose to do so.

On the other hand, after a period of growing surpluses, some state governments faced fiscal crisis in the early 1980s. Looming deficits resulted from the decline in the economy, federal cuts in intergovernmental aid, the enactment of tax or expenditure limits in several states, and past expansions in state services.
A survey of state budget officers in the spring of 1981 indicated that state balances would drop to $2.3 billion or 1.5 percent of expenditures in 1982, enough to cover only 4 days of operations (Hamilton, 1982).5 With previous surpluses depleted, many states faced constitutional limits against deficit financing. States reacted by reducing services and increasing taxes. These actions, along with an upswing in the economy, strengthened states' 1983 and 1984 fiscal position vis-a-vis the federal government. Combined state and local government surpluses of $15 billion were estimated for 1983, while the federal government faced a deficit of over $100 billion (Herbers, 1983). However, even in an improved fiscal environment, states' overall ability to finance expensive evaluative procedures is in question. The prospect of improved state finances raised the specter of further decreases in federal aid and highlighted the need for states to reestablish reserve funds rather than fund new priorities.

Furthermore, states such as New York, Massachusetts, Michigan, and California, with disproportionately high public health care expenditures in relation to tax capacity (i.e., those states that might benefit most from increased support for cost-effectiveness analysis in the long run), may be least able to finance it in the short run, while states with disproportionately low public health expenditures but high tax capacity (e.g., Texas, Florida, and Wyoming) may be disinclined to spend their money on technology assessment.6 States cannot easily pool their resources to support expensive studies collectively. Thus their real capacity to support technology assessment is less than their combined tax capacity would suggest.

State personnel capabilities grew in the 1970s along with fiscal capacity. State governments have grown much more rapidly than the federal government. The number of federal employees remained roughly constant at 2.9 million between 1970 and 1980, while state employment increased from 2.8 to 3.6 million (Barfield, 1981). Between 1964 and 1978 the proportion of state agency heads with graduate degrees rose from 40 percent to 58 percent, and the proportion of state agency heads promoted from within the state civil service increased to over 50 percent (Broder, 1982). State health-planning and rate-setting agencies are one potential source of personnel with training and skills particularly well-suited to technology assessment. However, more to the point, university-based faculties and private consulting firms within a state can service state governments under contract as easily as they can serve the federal government.
To the extent that neither the federal government nor the states currently have a large capacity to conduct technology assessments, future capacity can be built at either level, if funds are available.

To the extent that well-conducted technology assessments require access to large data bases, the conduct of large-scale clinical trials including diverse populations over long periods of time, and the development of complex methodologies, states' capacity to complete successful studies may be limited. For the most part, state agencies do not have access to or experience in amassing large data bases such as those used in studies supported by federal agencies such as the Health Care Financing Administration (HCFA) or the National Center for Health Services Research. Nor do states have experience in running clinical trials, such as those supported by the National Institutes of Health. Moreover, state data-processing and analytic capacities have been slow to develop. For example, in 1980, 9 years after the 1972 Social Security Amendments authorized 90 percent federal matching payments to states for the design, development, and installation of Medicaid Management Information Systems (mechanized claims processing and information retrieval systems), only 29 states had federally certified systems (HCFA, 1982). Of course, with data, as with staff, if states had funds available, they could contract for technology assessments from groups with the necessary administrative and methodological capacities. Whether or not studies under state sponsorship could achieve the level of data access (e.g., access to sensitive medical records) achieved by federally sponsored studies is an open question.

State Effectiveness

If states invest the resources, can they be effective?
Relevant questions include the following: (1) Are parochial political forces in the state legislature more likely to overwhelm the judgments of scientists than they are in the Congress? (2) Could providers and patients escape the enforcement actions of states that regulate based on the conduct of stringent technology assessments by moving to states that do not? (3) Do states create economic and social chaos by basing regulatory actions on vastly different scientific standards? Unfortunately, there is little evidence available to answer these questions.

The political environment in each state is different. However, it can be reasonably predicted that coalitions of health policy analysts, scientists, and business groups concerned about rising health costs, who understand the potential role technology assessment can play, and who form the primary constituency in support of research, will be weaker on average in each of the 50 states than they are in Congress. Even in relatively proregulation states, state legislatures have supported local hospitals vis-a-vis state planning agencies by overturning certificate-of-need decisions with special legislation (Altman et al., 1981). To the extent that state legislatures are more responsive to coalitions of local providers than to various cost-control interests, scientific judgments may be less likely to be sustained.

Action by a single state always risks defeat if citizens can easily obtain desired services in nearby but unregulated environments in neighboring states. Just as physicians in some states purchased CT scanners for their offices to avoid certificate-of-need controls in hospitals, they might shift certain practices and procedures to nearby but out-of-town offices or institutions if those practices were judged non-cost-effective and nonreimbursable by a given state. The regulating state might experience some budgetary savings as a result of its application of cost-effectiveness analysis; however, to the extent that non-cost-effective practices were shifted rather than deterred, its neighboring state might experience added costs. Systemwide savings would not accrue.

Finally, if state standards for aspects of research such as sample size, significance level, length of observation, etc., varied widely, state judgments as to safety and efficacy probably also would vary significantly. This could create significant uncertainty for patients and providers, shake public confidence in the integrity of regulatory decisions, and increase social costs in exchange for uneven budgetary savings.

CRITERIA FOR STATE ACTION

What is the appropriate role for the states in technology assessment?
Is technology assessment one of those functions that should be decentralized via new federalism, or is it primarily a responsibility of the federal government? The literature on federalism offers several economic and political criteria for allocating responsibilities between federal and state governments.

A classic justification for federal action is the presence of serious externalities. A state may decide not to build a dam if only the costs and benefits to its citizens are calculated, but the federal government may be justified in building the dam if the benefits to citizens of another state downstream are added in. Similarly, federal intervention may be justified when states refrain from taking socially beneficial action for fear of putting themselves at a competitive disadvantage versus other states. For example, states may fear that tightening environmental controls unilaterally will simply drive businesses elsewhere or that raising welfare levels beyond the federal minimum may make them more attractive to nonproductive populations. Uniform federal action in such situations theoretically enhances the public good without sacrificing the interests of the individual states. A second consideration used to justify federal action is the fact that states may be in the position of a "free rider" vis-a-vis certain public goods. For example, Georgia cannot be defended from Soviet attack without also defending Florida. The citizens of a particular state might be tempted to provide less than their fair share if the federal government did not preempt the issue. Finally, when the scale of an enterprise is truly massive, when only the federal government can assemble the expertise or talent necessary to successfully complete a project, or when there are large economies of scale, the case for federal action is often considered strong.
For example, the technical expertise and resources required to land a man on the moon were beyond the capacity of any single state. Hence federal action was considered necessary and appropriate by those who believed that the nation should develop space technology.

Although federal action is most often defended on grounds of equity or efficiency, state action is defended on grounds of preserving capacity for innovation, maintaining diversity and pluralism (the states constitute 50 laboratories), minimizing administrative complexity (fewer levels of government need to be involved), and maximizing democratic participation (state government is closer to the people). George Silver (1974) has argued that in the area of maternal and child health, federal grants have simply supported what states have wanted to do and federal action has always followed, not led, state action in that field.7 Richard Elmore (1978) has reviewed a number of studies demonstrating that innovative programs tend to be successful when developed and implemented at the grass roots level by states and localities. Pressman and Wildavsky (1973) conclude that reducing layers of government and concomitant decision clearance points improves the prospects for program implementation.

On balance, we believe that the arguments supporting a primary federal role in financing, technical assistance, standard setting, and data gathering, if not in directly conducting technology assessments, are stronger than those supporting increased state responsibility. Technology assessment as such would probably benefit more from nationally accepted guarantees of scientific integrity and access to nationwide data sets than from innovation, diversity, or grass roots participation at the state level. Large federal deficits notwithstanding, the federal government is probably still better able to fund a major technology assessment effort than the states acting individually.

Furthermore, the findings of technology assessment have implications that extend beyond the boundaries of a single state. Hence there are significant externalities for states in conducting technology assessments and applying their results. Different state regulations with regard to technology may be self-defeating if providers and patients simply cross state lines to escape more stringent standards. Similarly, vastly different state action may make the marketing of new technologies so problematic that incentives for technological innovations are reduced.
The payoff to the federal government will likely be greater than that to the states insofar as the impact of technology assessment can be maximized through uniform national policy. Yet in the context of reduced federal support for technology assessment, support by selected states or by the private sector is the only alternative to vastly diminished efforts in this area.

FACTORS IN STATE SUPPORT FOR TECHNOLOGY ASSESSMENT

Four factors, we believe, will largely determine the impact of new federalism on state support for technology assessment: (1) the availability of federal support for technology assessment, (2) the content of Medicaid financing and reimbursement reforms, (3) the allocation of block grant funds at the state level, and (4) the relative success or failure of competitive health care incentives.

Federal Support for Technology Assessment

Because technology assessment itself is expensive and technically complex, states have to be convinced that technology assessment is cost-effective before making the investment. Even if states are convinced of the merits of technology assessments, the amounts they invest will depend upon what is already being spent at the national level, either by the federal government or by privately funded institutes.

Just as technologies continue to develop and evolve, so do public attitudes and regulatory policies. In the 1960s and 1970s, expansion of the authority of federal agencies to deal with rapidly diffusing technologies discouraged and in some cases preempted state action. This situation might be reversed in the context of extensive deregulation and significant devolution of federal authority. States might be spurred to action if federal agencies did not or could not act, as citizens became concerned about possible harmful effects of increasingly unregulated technological development. So far this has not occurred in the health care sector.
Over time, under DRGs, the federal Health Care Financing Administration will establish rates for diagnosis-related groups that theoretically reflect the costs of technology appropriate to treating a given diagnosis. The perceived rationality and justifiability of changes in diagnosis-specific payment rates will presumably depend in part on the availability and persuasiveness of technology assessment studies relevant to technological advances in the treatment of illness. The potential role for federally sponsored technology assessment thus seems greatly enhanced under the DRG system.

Renewed and expanded federal commitment to technology assessment might obviate the role of the states. States acting alone may not be able to conduct large clinical trials, amass necessary data, or perform technically complex analyses. The federal government is generally better equipped to perform these functions. However, to the extent that the federal government provides financial support and technical assistance to states in developing methodologies and assembling the necessary data, the federal-state relationship could be stimulative and symbiotic rather than competitive. States' motivation to take advantage of federal support and assistance will depend in part on the degree of responsibility they bear for Medicaid financing and on the Medicaid reimbursement policies they pursue.

Medicaid Financing and Reimbursement Policies

Full federalization of Medicaid would eliminate the states' role as third-party payers and significantly reduce their financial stake in supporting a host of cost-containment efforts, among them technology assessment intended to promote more cost-effective medical care. State incentives might in fact be reversed if health care were federally financed but supply controls such as certificate of need remained a state responsibility under state licensing authority. For example, if services were fully financed by the federal government, states might have incentives to remove restrictions on bed supply since this would increase access of their own citizens to services while the bill would be paid by another level of government. Similarly, state incentives to invest scarce resources in technology assessment would be reduced if the bill for expensive technologies were fully paid at another level.
Alternatively, if Medicaid financing reforms took the form of federal caps or block grants to the states, states' financial stake in cost control would be maintained insofar as they were at risk for expenditures exceeding federal contributions. Under current matching arrangements, states pay between 25 and 50 percent of total Medicaid costs. Under a Medicaid block grant with fixed federal contributions, states' responsibility for costs above the federal contribution would rise to 100 percent. Significant devolution of Medicaid financing responsibility to the states can be expected to increase their interest in technology assessment as a means to support cost controls, but it also can erect political and financial barriers to their actually supporting technology assessment, as discussed in the section on block grants below. Furthermore, other cost-cutting actions, particularly reductions in Medicaid rolls or provider reimbursement rates, would likely be favored insofar as they offer quicker and larger payoffs.

Whether Medicaid financing is centralized or decentralized, federal and state choices about how to reimburse hospitals and other health care providers will affect interest in technology assessment. As indicated above, a DRG form of payment might stimulate interest in technology assessment. On the other hand, payment systems such as those in Massachusetts, Maryland, and New York, which impose global budgetary limits on hospitals or other providers, would decentralize resource allocation decisions to the provider level and suggest a smaller role for government use of technology assessment as a cost-control vehicle. Yet given preset budgets and tight fiscal constraints, provider demand for more information on the cost-effectiveness of practices and technology might increase.
Thus, under global budgeting systems there might be an important educative role for a central agency with a reputation for scientific integrity to guide individual providers in their resource allocation decisions. These effects would be similar to those of procompetitive schemes, which also decentralize resource allocation decisions to the individual provider level.

Block Grants

Consolidation of categorical programs (including Medicaid, if Congress chose that option) into block grants can be predicted to have several effects. First, block grants strengthen governors vis-a-vis their own state bureaucracies and enhance their ability to coordinate policy across state agencies.8 Martha Derthick (1970) has pointed out that the most important factor in expanding the influence of the federal government is the creation of vertical linkages between federal and state professionals through the categorical grant system. Although federal influence may have fostered professionalism among state bureaucrats, it has also undermined the ability of state executives to shift funds among program categories to increase policy rationality or cost-effectiveness. The relaxation of federal requirements through block grants should enhance states' interest in using the findings of technology assessment and other analytic studies to justify redistribution of formerly categorical dollars. As a result, block grants may increase state willingness to support such activities.

A second effect of block grants is to create political uncertainty. Power is shifted from Congress to 50 state legislatures. Interest groups with access to Congress must now win battles in 50 state legislatures to achieve the same results. Political outcomes will become less predictable given different political alignments and interest group strengths in each state. On average, certain industries are likely to be less dominant nationally than they are in the economy of a particular state (e.g., coal in Kentucky). And on average, state legislators are more likely to be responsive to a few local interests than are congressmen, who represent larger constituencies where cross-pressures from different interests are more likely to be felt. Local coalitions of providers are alleged to have greater strength at the state level.
This might reduce state effectiveness and interest in performing technology assessments if the technical judgments of state agencies were overturned by political forces in the legislature. Of course, cost-control and regulatory coalitions may be stronger in particular states than they are on average in the Congress. As noted above, legislation proposed in some states prior to the passage of the Pure Food, Drug, and Cosmetic Act of 1938 was much stricter than the law that ultimately passed the Congress with business support. On the whole, however, it seems unlikely that health policy analysts and other technology assessment advocates will be as influential in the individual states as they have been at the national level.

Third, block grants as enacted under the Reagan administration have increased fiscal constraints on currently funded programs and services. If Congress were to convert Medicaid from an open-ended entitlement program to a block grant, the fiscal impact on states would be far more severe than the fiscal impact of creating the existing health care blocks. The reduction in funds accompanying the creation of health block grants makes it unlikely that states will choose to divert resources to new functions such as technology assessment at a time when previously funded service programs are being cut back (Greenberg, 1981). The support for technology assessment within states will be further reduced by cutbacks in support for health planning agencies. In the past, national and state health planning programs created a national constituency of officials committed to analytical methods. To the extent that these programs have lost staff, the pool of talent within each state capable of performing technology assessment is reduced, as is an important impetus for cost-effective resource allocation.
However, should the federal government earmark special funds or undertake a major technical assistance effort to assist states in technology assessment, the combination of increased program flexibility and enhanced gubernatorial influence under block grants might induce states to look to technology assessment as a means for promoting efficiency and effectiveness in health care spending.

The Success or Failure of Competition

To date, the administration's competitive health care proposals have not received legislative endorsement. If the administration's competitive agenda were enacted, government involvement in medical resource allocation decisions could be expected to decline, and technology assessment would become less important than it is now as a government cost-containment vehicle.

On the other hand, to the extent that private insurers are motivated to influence the provision of health services in order to offer more competitive insurance premiums, they will have increased incentives to support technology assessment as a means to eliminate cost-ineffective services. They might choose to conduct their own studies (in order to gain a competitive advantage) or, given the externalities, to contribute funds to a central agency with a reputation for scientific integrity. The ability to eliminate third-party payment for previously covered but cost-ineffective procedures will depend in part upon consumer and physician acceptance of such changes. Therefore, the importance of a central agency (privately or publicly financed) that would develop methodologies and ensure the integrity of studies might be enhanced under a competitive health care system, although such an agency would not directly influence or make coverage decisions. Whether the agency were located at the federal or state level would depend on the availability of federal funds, the degree of health care financing centralization or decentralization, and the extent to which competition became a national versus a state priority.

CONCLUSION

Distinguishing among state incentives to conduct technology assessments, state resources and capacity, and state effectiveness is a useful first step in assessing the effects of other developments in the health care system on state support for the development and application of technology assessment. Making these distinctions leads us to conclude that there are several factors that may lead states to assume a more active role in this area. In the first place, a minimum federal (or perhaps joint public-private) commitment to provide money for incipient state efforts, to secure data, and to ensure the scientific integrity of studies may be a precondition of expanded and effective state action.
In addition, further decentralization of Medicaid, state adoption of DRG-type Medicaid payment systems, the expansion of health block grants, and the stabilization of state finances would promote state interest in technology assessment. In contrast, a major new federal initiative to conduct technology assessments, resulting, for example, from the need to administer the DRG payment system; the federalization of Medicaid; state adoption of global budgeting systems; the withering of block grants; and/or continued fluctuations in state fiscal conditions would probably undermine the modicum of existing state support for technology assessment.

NOTES

1 The health services block grant would have consolidated 15 categorical programs, including community health centers; alcohol, drug abuse, and mental health programs; maternal and child health; and migrant health programs. The prevention block grant would have supplanted family planning, adolescent health services, hypertension, fluoridation, lead-paint screening, rodent control, and other preventive programs (Feder et al., 1982).

2 According to the cap proposal, the federal government would have limited FY 1982 matching funds to a figure only 5 percent greater than the federal government's FY 1981 contribution to Medicaid (Pelham, 1981b).

3 Several provisions of the 1981 Reconciliation Act are key in this regard. Specifically:
· States may eliminate certain recipient groups and/or services from their programs for the medically needy.
· States may limit recipients' freedom of choice to selected providers.
· States may add a wider list of health maintenance organizations (HMOs) to their list of Medicaid providers.
· States are no longer required to use Medicare's retrospective reasonable cost principles in paying hospitals under Medicaid.
(While some states had already obtained federal waivers to experiment with prospective hospital reimbursement methods, such waivers should now be more widely available.)
· States may use Medicaid dollars to substitute home and community-based services for long-term institutional care.
The 1980 Budget Reconciliation Act had already relaxed the reasonable cost requirement for nursing home reimbursement (Pelham, 1981a).

4 According to a recent study by the Advisory Commission on Intergovernmental Relations (ACIR), disparities across states in personal income have declined markedly, although disparities in tax effort have begun to grow again after a period of narrowing (ACIR, 1982).

5 In fact, states and localities posted a combined operating deficit of $3 billion in 1982 (Herbers, 1983).

6 Robert Pear (1982) presents data on disparities in tax capacity and welfare/Medicaid spending for 10 states.

7 Although Silver (1974) is critical of the lack of federal leadership in maternal and child health, the more general point is that the federal government often follows the leadership of the states.

8 In some states, however, state legislatures are already working to limit the increased discretion of the governor.

REFERENCES

Advisory Commission on Intergovernmental Relations. 1982. Tax Capacity of the Fifty States: Methodology and Estimates. Publication No. M-134. Washington, D.C.

Altman, D., R. Greene, and H. Sapolsky. 1981. Health Planning and Regulation: The Decision-Making Process. Washington, D.C.: American Public Health Association Press.

Barfield, C. E. 1981. Rethinking Federalism. Washington, D.C.: American Enterprise Institute.

Broder, D. S. April 14, 1982. The "new federalism" fades away and with it an opportunity. Washington Post.

Cannon, L., and H. Dewar. March 10, 1981. Reagan asks $48 billion budget curb. Washington Post.

Coates, V. T. 1972. Technology and Public Policy, Summary Report. Prepared for the National Science Foundation. Washington, D.C.: George Washington University. Quoted in Steven A. Schroder and Jonathan A. Showstack. 1979. The Dynamics of Medical Technology Use: Analysis and Policy Options, p. 194 in Medical Technology: The Culprit Behind Health Care Costs? Stuart A. Altman and Robert Blendon, eds. DHEW Pub. No. (PHS) 79-3216. Washington, D.C.

Derthick, M. 1970. The Influence of Federal Grants. Cambridge, Mass.: Harvard University Press.

Elmore, R. 1978. Organizational models of social program implementation. Public Policy 26(2):185-228.

Feder, J., J. Holahan, R. Bovbjerg, and J. Hadley. 1982. Health. Pp. 271-305 in The Reagan Experiment, J. L. Palmer and I. V. Sawhill, eds. Washington, D.C.: The Urban Institute Press.

Gibson, R. M., D. B. Waldo, and K. R. Levit. 1983. National health expenditures 1982. Health Care Financing Review 5(1):1-31.

Grannemann, T. W., and M. V. Pauly. 1983. Controlling Medicaid Costs: Federalism, Competition, and Choice. Washington, D.C.: American Enterprise Institute.

Greenberg, G. D. 1981. Block grants and state discretion: A study of the implementation of the Partnership for Health Act in three states. Policy Sciences 13:153-181.

Hamilton, M. January 10, 1982. States must find ways to offset 'new federalism' cuts. Washington Post.

HCFA. 1982. Health Care Financing Program Statistics: The Medicare and Medicaid Data Book, 1981. Washington, D.C.: U.S. Department of Health and Human Services.

Herbers, J. 1983. Many states find sudden surpluses in their revenue. New York Times.

Jackson, R. O. 1969. Food and Drug Legislation in the New Deal. Princeton, N.J.: Princeton University Press, passim.

Pear, R. June 18, 1982. Many states still far from ready to go it alone. New York Times.

Pelham, A. 1981a. Health Program Spending Cut by 25 Percent. Congressional Quarterly August 15:1501-1504.

Pelham, A. 1981b. Medicaid Spending Cut, "Cap" Rejected. Congressional Quarterly August 15:1499-1500.

Pressman, J., and A. Wildavsky. 1973. Implementation. Berkeley: University of California Press.

Reagan, R. July 14, 1982. Excerpts from address to county officials' meeting. New York Times.

Rush v. Parham. 625 F. 2d 1150 (1980).

Silver, G. 1974. Report #1, Final Report of the Yale Health Policy Project. HRA Grant #S00900.

U.S. Office of Technology Assessment, Congress of the United States. 1980. The Implications of Cost-Effectiveness Analyses of Medical Technology. Appendix B, p. 145. Washington, D.C.: U.S. Government Printing Office.

Walker, D. 1981. Towards a Functioning Federalism. Cambridge, Mass.: Winthrop Publishers.

Wilson, F., and D. Neuhauser. 1982. Health Services in the United States, 2nd edition. Chapter 7. Cambridge, Mass.: Ballinger Publishing Co.

Government Payers for Health Care

Donald A. Young*

* Executive Director, Prospective Payment Assessment Commission, Washington, D.C.

Between 1965 and 1980 the government replaced direct payment by consumers as the dominant source of the dollars used to purchase personal health care services and supplies. In 1980, a total of $217.9 billion was spent for personal health care in the United States. Government programs spent $86.4 billion and provided 39.7 percent of personal health care expenditures. Federal funds provided $62.5 billion, more than two-thirds of the public outlay. This compares dramatically with the situation in 1965, when the federal government paid only 10.1 percent of the bills for personal health care services and consumers paid 51.7 percent. While total expenditures for medical services have risen rapidly in this 15-year period, the percentage of the total outlay paid by state and local governments and by private health insurance has remained relatively stable.

The dramatic increase in total governmental expenditures, as well as the increasing share paid by the federal government, makes governmental bodies significant parties with an interest in health services delivery, the use of evaluative information, and sound policy decisions regarding payment for medical services.

Although many governmental agencies expend funds for health services and supplies through an array of public programs, the Medicare and Medicaid programs are dominant, accounting in 1980 for $60.6 billion in personal expenditures, two-thirds of all public spending for personal health care, and financing nearly 28 percent of all personal health care expenditures. Other significant contributors to public spending for personal health care include veterans medical care, $5.8 billion; Defense Department medical care, $4.2 billion; worker's compensation, $3.9 billion; and outlays by state and local governments for hospital care, in addition to that provided to Medicaid recipients, $6.0 billion. Numerous other programs account for the remainder of the governmental medical care expenditures.

A brief overview of the largest governmental programs that provide medical services and benefits is followed by a more comprehensive examination of the Medicare program.

THE MEDICARE AND MEDICAID PROGRAMS

The Health Care Financing Administration (HCFA), through its Medicare and Medicaid programs, helps pay medical expenses of 50 million poor, elderly, disabled, and blind Americans. A total of 28 million people are Medicare beneficiaries and 23 million people are Medicaid beneficiaries. In 1980 $60.6 billion was paid for health services used by Medicare and Medicaid beneficiaries. This makes HCFA the single largest payer of health care services.

The Medicare program is a health insurance program. Like other public and private insurance programs, its purpose is to reduce the economic risk to beneficiaries of the cost of illness. The costs of the program are paid through Social Security tax payments, federal general revenues, and individual cost-sharing provisions. Although the program is administered by the Health Care Financing Administration, an agency of the Department of Health and Human Services (DHHS), the day-by-day claims processing and payment functions are carried out by fiscal agents under contract to HCFA. The contractors are generally public insurance organizations such as Blue Cross/Blue Shield or private commercial insurance companies. The processes of claims review and payment are, therefore, similar to the processes used by these groups in the conduct of their private business.

The Medicare program differs from private health insurance, however, in that covered benefits are determined by Congress through statute rather than by contract and are subject to change by lawmakers and to interpretation by the executive branch of the government, which is charged with administering the program. Beneficiaries do not have the option of selecting from a range of benefit packages designed to meet their specific needs. Benefits available to one beneficiary are generally available to all beneficiaries, subject to medical need for the services. In addition, in public and private insurance plans, premiums taken in must, over a period of time, equal or exceed payments paid out plus administrative costs. Because the Medicare program is funded by Social Security taxes and general revenues, there is no direct relationship between funds contributed by beneficiaries and funds paid for services provided to beneficiaries.

The Medicaid program differs in a number of respects from the Medicare program. The primary difference is that Medicaid is a voluntary, state-administered program. The federal government participates by sharing with the states the cost of providing care. In return, the federal government requires that a certain minimum level of services be made available and imposes other requirements. Significant flexibility is given to the states in determining eligibility for medical assistance, benefits made available, and reimbursement amounts to be paid. Providers who participate in the program must accept Medicaid-determined reimbursements as payment in full and cannot bill beneficiaries. There is no beneficiary cost-sharing except for nominal copayments for a limited number of services. States may process claims themselves or contract with private organizations, and nearly half of the states currently contract out all or part of the claims processing functions.
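The 1980 expenditure shares quoted at the start of this paper can be cross-checked with a few lines of arithmetic. The following is only an illustrative sketch: the dollar amounts are those given in the text, while the variable and function names are assumptions introduced here for the example.

```python
# Dollar figures in billions for 1980, as quoted in the text.
# All names below are illustrative, not part of the original paper.
TOTAL_PERSONAL_HEALTH = 217.9   # all personal health care spending
GOVERNMENT = 86.4               # all government programs
FEDERAL = 62.5                  # federal funds
MEDICARE_MEDICAID = 60.6        # Medicare and Medicaid combined


def share(part, whole):
    """Return part as a percentage of whole, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


shares = {
    "government_of_total": share(GOVERNMENT, TOTAL_PERSONAL_HEALTH),
    "federal_of_public": share(FEDERAL, GOVERNMENT),
    "medicare_medicaid_of_public": share(MEDICARE_MEDICAID, GOVERNMENT),
    "medicare_medicaid_of_total": share(MEDICARE_MEDICAID, TOTAL_PERSONAL_HEALTH),
}

for name, pct in shares.items():
    print(f"{name}: {pct} percent")
```

Under these figures the government share comes to about 39.7 percent and the Medicare and Medicaid share of all personal health care to just under 28 percent, in line with the text. The federal share of the public outlay (about 72 percent) and the Medicare and Medicaid share of public spending (about 70 percent) are both somewhat above the "two-thirds" used in the text as a round approximation.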
DEPARTMENT OF DEFENSE

In 1980, the Department of Defense expended $4.2 billion for medical care for active duty personnel as well as retirees and military dependents. The greatest amount of this expenditure was for the direct provision of services in facilities owned and operated by the military. For military dependents, retirees and their dependents, and some other eligibility groups unable to obtain care in a medical facility, the federal government provides a medical benefits program, the Civilian Health and Medical Program of the Uniformed Services (CHAMPUS). It became effective December 7, 1956, and was amended in 1966 to include coverage for retired uniformed service personnel and their dependents as well as dependents of active duty personnel. CHAMPUS is provided by law (Title 10, United States Code, Chapter 55) and is operated in accordance with policies and procedures set forth by the Department of Defense in regulations.

Although it is not a health insurance program, CHAMPUS is similar in many respects to health insurance and especially to the Medicare program. Authorized medical services and supplies are cost shared by the government from money appropriated by the Congress to the Department of Defense for this purpose. The uniformed services to which CHAMPUS applies are the Army, Navy, Marine Corps, Air Force, Coast Guard, Commissioned Corps of the United States Public Health Service, and Commissioned Corps of the National Oceanic and Atmospheric Administration.

Beneficiaries are encouraged, and in some circumstances required, to obtain medical care from uniformed services medical facilities, i.e., military (and Public Health Service) hospitals. Beneficiaries do, however, have the option of obtaining needed medical care from civilian sources when care is not available close to their homes or in emergency situations.
For most medical care obtained from civilian sources, CHAMPUS requires that the beneficiary pay part of the expense through deductibles and cost-sharing. CHAMPUS program benefits are very similar to those provided by Medicare, and CHAMPUS also relies on contractors to receive and process the claims for service.

VETERANS ADMINISTRATION

The Veterans Administration (VA) health care system furnishes services to eligible veterans in 172 medical centers, 226 outpatient clinics, 92 nursing homes, and 16 domiciliaries. During 1980, VA treated approximately 1.25 million hospital inpatients; 15.8 million outpatient medical care visits were furnished directly by VA staff, and an additional 2.2 million visits were authorized by the VA payable to non-VA physicians authorized to render care on a fee-for-service basis. In addition, under an agreement with the Department of Defense, approximately 224,000 dependents of veterans were eligible to receive care under the Civilian Health and Medical Program of the Veterans Administration (CHAMPVA). The care was furnished in non-VA facilities.

HEALTH CARE FINANCING ADMINISTRATION

HCFA was established in 1977 to combine health financing and quality assurance programs into a single agency. It is responsible for the Medicare program, federal participation in the Medicaid program, and a variety of other health care quality assurance programs. As its mission statement indicates, HCFA views its responsibility to be much broader than simply paying medical bills.

The mission of HCFA is to administer the Medicare and Medicaid programs and related provisions of the Social Security Act in a manner that (1) promotes the timely and economic delivery of appropriate quality health care to eligible beneficiaries, (2) promotes beneficiary awareness of the services for which they are eligible and improves the accessibility of those services, and (3) promotes efficiency and quality within the total health care delivery system.
To accomplish this mission, HCFA provides operational direction and policy guidance for the nationwide administration of the Medicare and Medicaid health care financing programs; the Professional Standards Review Organization (PSRO) and related quality assurance programs designed to promote quality, safety, and appropriateness of health care services provided under Medicare and Medicaid; quality control programs designed to ensure the financial integrity of Medicare and Medicaid funds; and various policy planning, research, and demonstration activities.

Medicare and Medicaid, along with other third-party payers, have an interest in containing administration and program costs and promoting efficiency in the delivery of services to beneficiaries while maintaining the availability of high-quality, medically necessary services. To make decisions regarding benefits, HCFA must have up-to-date medical, scientific, and health services research information. To understand how appropriate information influences benefit decisions in these programs, it is first necessary to review the authority, structure, and processes of the Medicare and Medicaid programs as they relate to benefit decision making.

Medicare

The Medicare program was established by Congress in 1965 with the enactment of Title XVIII of the Social Security Act and became effective on July 1, 1966. In 1972, major changes were made in the program's provisions, and the name of the Medicare program was officially changed to Health Insurance for the Aged and Disabled. The program provides payment for certain medical services for persons 65 years of age or over, disabled beneficiaries, and persons with end-stage renal disease. The program currently covers 24.9 million aged and 3.1 million disabled individuals.
In the title and opening sections of the Medicare statute, Congress indicated clearly that Medicare was to be an insurance program providing basic protection against the costs of medical care rather than a health services delivery program. In addition to stressing the insurance nature of the program, the opening sections of the statute prohibit any federal interference in the practice of medicine or the manner in which medical services are provided, guarantee beneficiaries free choice of qualified providers, and allow individuals the option of obtaining other health insurance protection.

The Medicare program consists of two separate but complementary insurance programs, a Hospital Insurance Program, known as Part A, and a Supplementary Medical Insurance Program, known as Part B. All persons age 65 or over who qualify for Social Security cash benefits, and individuals who have been receiving Social Security disability benefits for 24 months or more, are automatically enrolled in Part A. Part A is financed by a payroll tax shared equally by employers and employees.

Although Part A is called hospital insurance, covered benefits include medical services furnished in institutional settings, including hospitals and skilled nursing facilities, or provided by a home health agency. Such institutions are termed providers by Medicare and must be certified as qualified providers of services and have signed an agreement to participate in the program. The Medicare law includes limits, based on the concept of a benefit period, on the services which may be covered in the various settings. The law also established cost-sharing by the individual through deductible and coinsurance payments. Part A providers of services are reimbursed directly by the program (for all reasonable costs) and generally cannot bill beneficiaries other than for applicable cost-sharing.

The Supplementary Medical Insurance Program (SMI), or Part B of Medicare, is voluntary for individuals who elect to be covered. It is financed from premium payments by enrollees together with contributions from appropriated general revenue funds. (Because of limits on premiums, the federal contribution has been increasing more rapidly than the premium. Currently, premiums finance about 30 percent of the program costs, with the remaining 70 percent coming from general revenues.) Medicare Part B covers medical services and supplies furnished by physicians or others in connection with physicians' services, outpatient hospital services, and home health services.
Physicians' services covered under the program include visits to the home, office, hospital, and other institutions. The program also pays for certain drugs and biologicals that cannot be self-administered, diagnostic x-ray and laboratory tests, purchase or rental of durable medical equipment, ambulance services, prosthetic devices, and certain medical supplies.

In contrast to Part A institutional cost reimbursement, benefits paid under Part B are usually reimbursed on a fee or charge basis. After the beneficiary pays an annual deductible, Medicare will pay 80 percent of the reasonable charge for most covered services for that year. Physicians and other suppliers, however, are allowed to charge beneficiaries an additional amount if the Medicare payment is less than their usual charge.

CLAIMS PROCESSING

The separation of the Medicare program into an institutional (provider) component (Part A) and a noninstitutional (medical services) component (Part B) was patterned after a program alignment used by Blue Cross/Blue Shield Associations in paying for services to their subscribers. In order to keep the federal health insurance program closely linked to the private sector, Congress decided that most claims-processing and administrative functions for both Part A and Part B of Medicare should be handled by public or private insurance organizations (commercial or Blue Cross/Blue Shield) acting as fiscal agents for the Medicare program.

The fiscal agents responsible for the administration of hospital insurance or Part A benefits are termed intermediaries. Institutional providers (hospitals, skilled nursing facilities, home health agencies) were initially allowed to select the intermediary of their choice; however, this is slowly changing. Intermediaries act as the link between the provider and the Health Care Financing Administration, which is responsible for the administration of the Medicare program.
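The Part B payment rule stated above (an annual deductible, then 80 percent of the reasonable charge, with the supplier free to bill the beneficiary for any amount above the reasonable charge) can be made concrete with a little arithmetic. The following is a minimal sketch, not HCFA's actual claims logic: the function name `part_b_payment`, its parameters, and the dollar figures in the usage note are hypothetical, and the text gives no deductible amount, so the remaining deductible is passed in by the caller.

```python
from dataclasses import dataclass


@dataclass
class PartBPayment:
    medicare_pays: float     # amount paid by the program
    beneficiary_owes: float  # deductible + coinsurance + any excess charge


def part_b_payment(reasonable_charge, billed_charge, deductible_remaining,
                   medicare_rate=0.80):
    """Split one Part B claim between Medicare and the beneficiary.

    Assumption (hypothetical simplification): the deductible is applied
    against the reasonable charge, and Medicare pays `medicare_rate` of
    whatever part of the reasonable charge is left.
    """
    # Portion of this claim absorbed by the unmet annual deductible.
    applied_deductible = min(deductible_remaining, reasonable_charge)
    # Medicare pays 80 percent of the reasonable charge after the deductible.
    medicare_pays = round((reasonable_charge - applied_deductible) * medicare_rate, 2)
    # Any amount billed above the reasonable charge falls on the beneficiary.
    excess = max(billed_charge - reasonable_charge, 0.0)
    beneficiary_owes = round(reasonable_charge - medicare_pays + excess, 2)
    return PartBPayment(medicare_pays, beneficiary_owes)
```

For instance, with the deductible already met, a service whose reasonable charge is 100 but which the physician bills at 120 would leave Medicare paying 80 and the beneficiary 40 (20 in coinsurance plus the 20 excess).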
The major role of the intermediaries is to review and pay claims for the costs of providing care to beneficiaries. The intermediary makes these payments to providers for covered items and services on the basis of reasonable cost determinations following policies set by HCFA.

Under the SMI (Part B), the fiscal agents are called carriers. Carriers are selected on a geographical basis by the secretary of the Department of Health and Human Services; physicians and others furnishing Part B services have no say in selecting the carrier to process claims for these services. Since Part B services are reimbursed primarily on a reasonable charge (as opposed to reasonable cost) basis, one of the major functions of carriers is to determine the reasonable charges in their respective areas for each medical care service paid for under the program. Carriers are also responsible for reviewing and paying claims to or on behalf of beneficiaries for the services provided.

The functions performed by Medicare intermediaries and carriers in the adjudication of claims are similar for both their private and their government business. These functions include, in addition to claims review and processing, utilization review, beneficiary hearing and appeals, professional relations, and statistical activities. The final decision regarding payment for services in their private insurance business is determined by a contract with beneficiaries or their representatives. The final decision in their Medicare business is determined by statutory authority, regulations promulgated by DHHS, and program instructions and guidance prepared by HCFA to implement regulations and statutory authority. In the absence of HCFA instructions concerning a specific service, authority is vested in carriers and intermediaries to make the benefit decisions.

Medicare intermediaries and carriers are reimbursed for their administrative costs under the basic principle of no profit or no loss. Contractors are not at risk with respect to program benefit payments as these payments are entirely underwritten by the program. Contractors, however, are regularly evaluated as to their capability and efficiency in administering the program and are subject to loss of contract for poor performance.

Medicaid

The Medicaid program was also enacted by Congress in 1965.
Title XIX of the Social Security Act, Grants to States for Medical Assistance Programs, succeeded earlier, welfare-linked medical care programs. Under the Medicaid program, states may enter into an agreement with the secretary of DHHS to finance health care services for certain categories of low-income individuals, primarily those eligible to receive cash payment under the Aid to Families with Dependent Children (AFDC) program and the Supplemental Security Income (SSI) program for the aged, blind, and disabled (categorically needy). In addition, many states have exercised the option to extend coverage to "medically needy" individuals who meet the AFDC or SSI categorical criteria but whose incomes are slightly above the welfare standards or individuals who have incurred substantial medical expenses. An estimated 22.5 million individuals are Medicaid recipients. The federal share of program costs is related to state per capita income, ranging from 50 percent in the highest per capita income states to 77 percent in the lowest per capita income states. The federal contribution is referred to as Federal Financial Participation (FFP).

Federal law mandates that states cover hospital, physician, skilled nursing facility, family planning, home health, laboratory, x-ray, rural health clinic, and nurse midwife services for all eligible recipients, and early and periodic screening, diagnosis, and treatment (EPSDT) services for children under 21. States may also provide a variety of optional services, including intermediate care facility services, prescription drugs, dental care, eyeglasses, and other services.

States determine the scope of services offered and the reimbursement rate for these services subject to federal guidelines. They also exercise a great amount of control over the income eligibility level for Medicaid. The Omnibus Budget Reconciliation Act of 1981 further extended the states' flexibility in these matters.
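The text says only that the federal share is related to state per capita income and ranged from 50 to 77 percent. As a rough illustration of how such an income-related matching rate can work, the sketch below assumes the squared-income-ratio formula commonly cited for the federal Medicaid matching percentage. The formula, the 0.45 multiplier, and the 83 percent ceiling are not given in this paper and should be treated as assumptions; the function names and sample figures are hypothetical.

```python
def federal_share(state_pci, national_pci, floor=0.50, ceiling=0.83):
    """Estimate the federal share of Medicaid costs for one state.

    Assumed formula (not stated in the text):
        share = 1 - 0.45 * (state per capita income / national)^2,
    clamped to the interval [floor, ceiling].
    """
    raw = 1.0 - 0.45 * (state_pci / national_pci) ** 2
    return min(max(raw, floor), ceiling)


def split_costs(total_cost, share):
    """Return (federal, state) dollar amounts for a given federal share."""
    federal = total_cost * share
    return federal, total_cost - federal
```

Under these assumptions a state at exactly the national average income would receive a 55 percent match, richer states fall to the 50 percent floor, and poorer states receive progressively more.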
All of these variations in benefits offered, income standards, and levels of reimbursement mean that Medicaid programs differ greatly from state to state.

States are responsible for claims processing and other administrative functions for their Medicaid programs, although the federal government shares in the cost of these functions. Some states administer their Medicaid programs directly; others contract with the private sector to perform various functions. Fiscal agent contracts are currently used by the majority of states to process and pay claims for some or all services. Fiscal agents are reimbursed on either a cost-reimbursement or fixed-price basis. In some cases the state contracts with the same fiscal agent responsible for processing Medicare claims.

COVERAGE AND REIMBURSEMENT UNDER MEDICARE PROGRAM

This discussion will outline the current authority, criteria, and process by which decisions are made to pay for certain medical procedures and services within the Medicare program. The Medicare program pays for some or all of the cost of certain medical services furnished to eligible beneficiaries. In this regard, the Medicare program is similar to the insurance programs of other third-party payers such as Blue Cross and Blue Shield and commercial insurance companies. Individuals with such insurance plans are entitled to certain benefits under the conditions of their particular policy. Individuals, their employers, or other groups frequently negotiate with the insurance agent a package of benefits to be included in the policy, and subscribers are frequently given the opportunity to select from a range of different benefit packages the policy best suited to their needs, tastes, and income.

In the Medicare program, the benefits available to eligible beneficiaries are called covered services. Services not covered are not paid for with Medicare funds. The Medicare program differs from other insurance programs in that there is no negotiation with individual beneficiaries or their representatives regarding the content of a benefit package or selection of alternative benefit packages. Rather, all eligible beneficiaries may have partial or full payment made for those services which are covered if their medical condition and level of care are judged to warrant it.
In the following discussion, a distinction is made between issues related to coverage of services and issues related to reimbursement for services. Reimbursement, in terms of the Medicare program, deals with determining the methods and amounts of payment for services which are covered. Reimbursement issues become important only after it has been determined that a service is covered as a benefit. For example, a new device may be developed which monitors the rhythm of the heart in a new way. After review, it may be determined that the device performs this function safely and effectively, and since heart rhythm monitors are covered services, the device is also covered. The question then becomes one of reimbursement. Should the level of reimbursement for the use of the device be the same as for devices previously used or is there reason for a different level of reimbursement? This discussion will focus on issues related to coverage rather than reimbursement of services, that is, with determining the services to be paid for as benefits under the Medicare program rather than the method or level of reimbursement.*

* This paper does not cover the new prospective payment system. See Chapter 5 of this book.

Services covered by the Medicare program are determined by Medicare statute, regulations developed in keeping with the statute, program instructions included in a series of manuals used by those administering the program on a day-to-day basis, and interpretations of policy in response to specific inquiries.

Medicare Statute

The Medicare law specifically provides coverage for broad categories of benefits, for example, hospital benefits, skilled nursing facility benefits, home health benefits, physicians' services, ambulance services, laboratory services, durable medical equipment, and others. The Medicare statute also, to some degree, defines these broad categories of benefits. For example, physicians' services means ". . . professional services performed by physicians, including surgery, consultation, a home, office, and institutional call. . . ." The statute also lists some specific items which are covered such as diagnostic x-ray tests, surgical dressings, iron lungs, oxygen tents, wheelchairs, and others. In addition, the statute places some limitations, of a general and categorical nature, on the services that can be covered when furnished by certain practitioners, such as dentists, chiropractors, and podiatrists. In addition to indicating what is covered, the law expressly excludes some categories and types of services from coverage, such as cosmetic surgery, personal comfort items, custodial care, and routine physical checkups.

The Medicare law does not, however, furnish an all-inclusive list of specific items, services, treatment procedures, or technologies covered. Thus, except for the listed examples of medical and other health services, the statute does not explicitly include or exclude coverage of most medical devices, surgical procedures, or diagnostic or therapeutic services. The apparent intention of Congress, at the time the act was passed, was that Medicare should generally cover services ordinarily furnished by hospitals, skilled nursing facilities, and physicians licensed to practice medicine. However, it is also apparent that the Congress understood that questions as to coverage of specific items and services would invariably arise and would require a specific coverage decision by those administering the program. Thus, the Medicare law states:

Notwithstanding any other provisions of this title, no payment may be made under Medicare for any expenses incurred for items or services . . . which are not reasonable and necessary for the diagnosis or treatment of illness or injury or to improve the functioning of a malformed body member.

This is a key provision. First, by reason of the words "notwithstanding any other provision of this title . . ." this is an overriding exclusion, and may be applicable in a given situation despite the other provisions for coverage in the statute.
Second, it provides the secretary of the Department of Health and Human Services considerable discretion and flexibility to respond to changes in the way health care is furnished, especially to the development and application of new medical practices, procedures, and devices.

Medicare Regulations

The regulations implementing the reasonable and necessary section of the Medicare law are also quite general [42 CFR 405.310(k)]. The term reasonable and necessary is not further defined in the regulation, nor does the regulation spell out a process for how this term is to be applied. The regulations do, however, contain a variety of specific exclusions and limitations on the benefits covered by Medicare. These exclusions include such things as routine physical exams, eyeglasses, cosmetic surgery, and dental services, which were spelled out in the Medicare statute.

Program Instructions and Policy

The clearest formal operational definition of reasonable and necessary is contained in program instructions prepared by HCFA and sent to the fiscal agents (carriers and intermediaries) responsible for processing Medicare claims for services and administering the program on a day-by-day basis. This statement of policy translates the statutory and regulatory terms reasonable and necessary into a test of whether the item, service, or procedure in question is

1. generally accepted as safe and effective, or proven to be safe and effective;
2. not experimental;
3. medically necessary; or
4. furnished in accordance with accepted standards of medical practice in an appropriate setting.

Over the years, this test has been applied to many items and services, resulting in a large collection of informal policy statements and accumulated decisions serving as precedents for current policy work. These policy statements are continuously undergoing change as medical services and procedures evolve and new research findings emerge.
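The four-part test above can be read as a simple checklist applied to each item or service. The sketch below is only an illustration of that reading, not HCFA's procedure. The class and function names are hypothetical, and although the list as printed joins its last two items with "or", the sketch assumes the stricter reading in which all four criteria must hold. That assumption is flagged in the code as well.

```python
from dataclasses import dataclass


@dataclass
class ServiceReview:
    """Hypothetical record of a reviewer's findings for one item or service."""
    safe_and_effective: bool        # generally accepted, or proven, safe and effective
    experimental: bool              # still regarded as experimental
    medically_necessary: bool
    accepted_practice_and_setting: bool  # accepted standards, appropriate setting


def failed_criteria(review):
    """Return the names of the criteria the service does not meet."""
    checks = [
        ("safe and effective", review.safe_and_effective),
        ("not experimental", not review.experimental),
        ("medically necessary", review.medically_necessary),
        ("accepted practice in an appropriate setting",
         review.accepted_practice_and_setting),
    ]
    return [name for name, ok in checks if not ok]


def is_reasonable_and_necessary(review):
    # ASSUMPTION: all four criteria are required (conjunctive reading).
    return not failed_criteria(review)
```

Returning the list of failed criteria, rather than a bare yes or no, mirrors the point made in the text that individual questions usually turn on one specific criterion, such as medical necessity or setting.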
Decisions on Individual Claims

The claim review process for the Medicare program is designed to identify services which may not be covered or about which there may be a question of medical necessity or reasonableness. Medicare's contractors (carriers and intermediaries) have discretion within the statutory, regulatory, and program instruction guidelines to decide coverage issues identified in the claims review process. All contractors have nurses on their claims review staffs and all have physicians available to provide advice and to consult with other physician specialists and peer groups in the community, in order to resolve coverage issues on the basis of sound medical judgment and information.

Medicare contractors are currently processing 200 million individual claims for service each year. Most of these are paid without serious questions being raised about whether the items and services are covered under Medicare. When questions are raised, they relate primarily to whether the service was medically necessary in the particular case and was furnished in an appropriate manner and setting, rather than to the broader issue of general coverage. However, at times an issue arises as to whether a procedure or item should be covered under any circumstance. These services are usually new procedures or new applications for existing procedures, although occasionally questions will arise regarding potentially outmoded services and items. Such questions are referred to the HCFA central office for further review and evaluation.

Although the claims review process is the major source of questions about coverage of procedures, items, and services, inquiries also come to HCFA from physicians and professional groups and with increasing frequency from manufacturers of medical equipment and devices. HCFA examines the question and is able to answer many referrals based on the statute, regulations, and existing policies and definitions concerning covered services. A few questions raise important new issues which HCFA cannot resolve without seeking additional professional and medical expertise.
If medical consultation appears necessary, HCFA will review the medical and scientific literature related to the service in question, gather appropriate articles and background material, and present the question to a panel of physicians employed by HCFA and other components of the department. The task of the panel is to sharpen and clarify the questions that need to be answered. For example, HCFA was asked recently if plasmapheresis (apheresis) was a covered service. After review and discussion, and with assistance from physician members on the panel from the Public Health Service (PHS), it was clear that apheresis had potential application for many diseases and conditions and that evaluations of this procedure were necessary based on the specific indications for its use. Currently, HCFA covers apheresis for a limited number of indications with additional evaluation under way. When the panel confirms the need for further expert medical opinion and evaluation, HCFA refers the question with the background information to PHS.

USE OF EVALUATIVE DATA IN COVERAGE DECISIONS

The criteria currently used to determine if an item or service is reasonable and necessary in terms of the Medicare program are as unspecific as safe and effective. HCFA may interpret the meaning of safe and effective in a very different manner from other groups. For example, in considering whether to approve new medical devices, the Food and Drug Administration (FDA) also uses the terms safe and effective. The process and specific criteria used by the FDA in evaluating the safety and effectiveness of a medical device for purposes of market approval, however, are very different from those used by HCFA to determine safety and effectiveness for the purposes of providing Medicare payment.
It is possible that a new device may be approved by the FDA based on a limited amount of research data focused on short-term safety and effectiveness of the device rather than longer-term safety and effectiveness in terms of improved health outcome necessary for Medicare coverage. Hence, certain devices or procedures may be approved by FDA but not covered as benefits under the Medicare program. Because both agencies use the terms safe and effective, the public may be confused by the seeming inconsistency.

The issue is further complicated by the language describing a service as either generally accepted as safe and effective or proven as safe and effective. There is no commonly accepted definition of these terms. For many new items, services, and procedures it may not be possible to make a decision based on general acceptability by the medical profession because the service has usually been provided by only a small number of physicians. Hence, the coverage decision will rest on medical evidence or judgments proving safety and efficacy. But even here, there is no clear agreement as to what constitutes an acceptable level of proof.

There are similar difficulties in determining if a procedure is still experimental. There are no accepted definitions or operational measures to indicate when a procedure or service has moved from a clear research phase, to an investigational phase, to accepted medical practice. In actuality, these stages overlap and research and investigation continue at the time a new procedure is gaining acceptance by practicing physicians. For example, studies of the diffusion of computerized axial tomography scanning indicate that the diffusion of the procedure and the growth of the investigative literature base proceeded in parallel. At what point has the procedure become generally accepted? The question is complicated further because the decision to pay is usually yes or no. It is generally not possible to pay for a service only in certain institutions or when performed by certain specially qualified physicians.

To date, HCFA and its medical and scientific advisors and contractors have not explicitly considered criteria beyond safety, efficiency, and research status in determining Medicare coverage policy. It is reasonable to assume, however, that other considerations such as economic, ethical, and social issues are at least implicitly considered as some procedures are evaluated for Medicare coverage. For example, coverage questions referred to HCFA for detailed evaluation frequently concern services or devices that have the potential for high program costs if covered.
In such cases, economic considerations or issues of distribution of services and access to services may be an implicit factor in the decision to refer the issue to HCFA for evaluation or to initiate a thorough evaluation. Such considerations may implicitly affect the coverage decision if for no other reason than that a greater burden of proof may be required before a decision is made to cover the service as a Medicare benefit. To date, however, no safe and effective procedures have been denied coverage based on cost or social considerations.

Current Status of Coverage Decision Making

HCFA, its medical advisors, and the Medicare contractors have wide discretion in making coverage decisions concerning individual items and services. Although leaving significant room for flexibility and individual considerations and judgments by the medical profession, the lack of more explicit coverage criteria has also resulted at times in inconsistency from claim to claim or service to service. HCFA, in preparing national coverage instructions, which are binding on contractors, may also inconsistently apply the criteria in evaluating certain coverage questions. As noted above, a more rigorous burden of proof may be required for newly introduced services or costly services compared with established services which may be less safe or effective or more costly.

TECHNOLOGICAL INNOVATION, COVERAGE DECISIONS, AND MEDICAL PRACTICE

There is a belief that technological research and development capabilities are exceeding the capacity of the health care delivery system and the individual practitioner to evaluate the medical research findings and appropriately apply them to patient care needs.
Although drugs and some medical devices are subjected to scientific scrutiny by the FDA before marketing and wide availability, other medical procedures, devices, and services are accepted and widely applied by the medical community with little evidence regarding relative safety or effectiveness. For example, gastric freezing in the treatment of peptic ulcer disease and internal mammary artery ligation for coronary artery disease were widely used by medical practitioners before clinical studies demonstrated their lack of effectiveness. With the publication of evaluation studies, these procedures subsequently disappeared from the therapeutic armamentarium.

In addition, many medical devices and procedures are evaluated as individual items rather than in comparison with existing, alternative approaches to achieve similar medical outcomes. This is especially true for diagnostic studies in which new findings may be quickly applied and new tests are added to the array of those already available rather than replacing existing studies.

HCFA thus far has directed the bulk of its medical coverage evaluation resources to individual new devices and procedures. But innovations are also occurring in the patterns of health services delivery. Data are being gathered concerning the appropriate minimum numbers of procedures and distribution of services such as open heart surgery, the growth in numbers and appropriateness of coronary care and intensive care unit beds, home health and day care services, and so-called unnecessary surgery. Recently, there has also been a significant effort by the medical profession to move services from the traditional hospital setting to settings outside the hospital. Ambulatory surgical centers and free-standing cardiac rehabilitative facilities are examples of such movement of services. Medical and scientific evaluative information is also necessary to determine coverage policies in these areas.

For services that have long been accepted by the medical community, but frequently are unproved as to effectiveness, Medicare coverage policy has proceeded on administrative rather than medical judgments. For example, concerning home health or rehabilitation services, administrative decisions are usually made in terms of the Medicare statute, regulations, and policy. Only on occasion do new medical research findings or technological innovations lead to a change in policy.
The failure to use medical and health services research for assistance in determining coverage policy in the service area is understandable. Questions regarding the appropriate applications of new technologies or the existing patterns of health services delivery are complex and highly value-laden. Furthermore, there are very significant differences in delivery of medical care services in different areas of the country. A physician evaluating alternative approaches and selecting those services that best serve the needs of an individual patient will draw upon very different information and values than will a policy analyst evaluating information to determine if a service qualifies as reasonable and necessary and, therefore, will be paid for by the Medicare program.

Evaluative information is necessary for the physician and patient to select the proper mix of services. It is also necessary for the public policy official charged with the responsible administration of a publicly funded program. The absence of a sound information base as well as the potential conflict between the needs of an individual patient and the needs of third-party payers to exercise a fiduciary responsibility in behalf of all beneficiaries is the source of a major conflict surrounding benefit coverage decision making.

Frequently a test, procedure, or service is considered necessary by a physician if it is likely to make any difference at all in the diagnostic process or therapeutic outcome. The economic concepts of marginal gain and marginal cost may not be applied by practitioners in the care of individual patients, particularly when third-party payers such as Medicare are picking up most of the bill. In this case, the apparent costs to the individual approach zero and the service or test is ordered even if its value also may approach zero. For example, in evaluating a patient with coronary artery disease, many different tests and procedures are available.
Are all the tests or only certain selected ones necessary for an individual patient? There are no clear research findings to answer the question, and because physicians are trained to acquire all the possible data available to minimize uncertainty, the tests are ordered and usually paid for by Medicare or other third-party payers.

When either physicians or third-party payers turn to the medical and health services delivery research data for guidance on questions similar to this, they find it may fail to provide the information needed. Sound information and consensus on the safety and effectiveness of alternative methods of diagnosis, treatment, and delivery of services frequently do not exist as part of an accepted body of knowledge. Many procedures and services commonly used and accepted in medical practice have not been evaluated by means of carefully planned, well-designed, controlled clinical studies. Nevertheless, Medicare generally pays for these commonly accepted procedures and services when ordered by a licensed physician. It is the new procedures or new applications for accepted procedures that are currently subject to evaluation by the Medicare program.

The medical profession contends that an assessment of the risks and costs as well as the benefits of services and procedures has been central to the exercise of good medical judgment for decades and that such analysis and judgments are better made, and are being responsibly made, within the medical profession. An alternative view holds that an individual physician frequently does not have available all the information needed to make a sound decision regarding the safety and effectiveness of complex new procedures, although from the physician's own experience with the procedure it would appear to be working out well. Such might have been the case with carotid artery ligation or gastric freezing.

One view placing high weight on the judgment of individual physicians might be that if a physician orders any procedure or service Medicare should pay for it. An opposite view could require that payment be made only for those services that have been evaluated with evidence as to safety and efficacy. In practice, the Medicare program looks to the judgment, experience, and opinion of physicians and to sound scientific evidence.


