6
Randomized and Observational Approaches to Evaluating the Effectiveness of AIDS Prevention Programs

In previous chapters, the panel recommended that randomized controlled experiments be used to evaluate a small number of important and carefully selected AIDS prevention projects. Our reasoning has been that well-executed randomized experiments afford the smallest opportunity for error in assessing the magnitude of project effects, and they provide the most trustworthy basis for inferring causation. Notwithstanding this conclusion, we recognize that this strategy will not always be feasible and that nonrandomized studies may be required in some instances.[1]

In this chapter the panel reviews a number of observational approaches to the evaluation of the effectiveness of AIDS prevention programs. (In addition, Appendix F presents a background paper for this chapter which provides a detailed treatment of an econometric technique known as selection modeling and its potential uses.) On January 12-13, 1990, the panel hosted a Conference on Nonexperimental Approaches to Evaluating AIDS Prevention Programs. Fifteen experts from the behavioral sciences, statistics, biostatistics, econometrics, psychometrics, and education joined panelists and federal representatives to discuss the application of quasi-experimentation and modeling to evaluating CDC's three major AIDS interventions. This chapter is an outgrowth of those discussions and the papers presented at this conference (Bentler, 1990; Campbell, 1990; Moffitt, 1990).

[1] Observational designs may convey other practical benefits. For example, such studies avoid the ethical debate that may accompany the withholding of treatment in a randomized study, discussed in Chapter 5. Observational studies may also be advantageous when an intervention occurs "naturally" or has saturated a community before randomization can be implemented.

OVERVIEW

Determining the effectiveness of an intervention project requires comparing how a project participant (or group of participants) fares with how that participant or group would have fared under a different intervention or no intervention. Because such direct comparisons are ordinarily not possible,[2] researchers have developed a number of ways to construct a comparison group that "looks like" the participants. The objective is to make this group as similar as possible with respect to confounding factors[3] that may affect the outcome (other than the fact of the intervention itself). If participants' selection or reason to enter into a study is not independent of the study's outcome variables, however, selection bias is introduced. For example, if individuals who enter a counseling and testing project are more highly motivated to change their risk-associated behaviors than individuals who do not choose to enroll in such programs, a selection bias is present, and the effects of the intervention cannot be estimated by simply comparing outcomes among program participants and nonparticipants. Strategies to evaluate AIDS interventions in such instances require the assumption that the effects of such confounding variables can be estimated and adjusted for. (A variant of this problem can also arise in randomized experiments, where the attrition of respondents from experimental and control groups can introduce an analogous selection bias.)

As explained in Chapter 1, selection bias can potentially be controlled by the random assignment of individuals to one group or another. When properly implemented, randomization will, on average, create groups that have similar initial characteristics and thus are free (on average) of selection bias. The chance always remains, of course, that randomized groups are different, but the chance is small and decreases as the sample size increases. Furthermore, standard statistical tools such as confidence intervals can be used in properly randomized experiments to calculate the variability in the effect size associated with the randomization. Thus, well-executed randomized experiments require the fewest assumptions in estimating the effect of an intervention. Notwithstanding this statistical advantage, the panel urges that underlying theory about whom an intervention will affect (and how) be sufficiently compelling to justify mounting a randomized trial and sufficiently explicit about the relationship between independent variables and the outcome to allow the analysis of the experimental data if randomization fails and statistical adjustments are needed.

[2] A few important exceptions arise, such as when the same subject can try two diets or two ointments. But if temporal sequence is important or if memory or attitude is at stake, "you can't go back." Similarly, only one of two alternative surgical procedures will be applied in any one patient, etc.

[3] That is, variables that (1) influence outcomes, and (2) are not equivalently distributed in the treatment and comparison groups.
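The selection problem described above is easy to see in a small simulation. The sketch below is illustrative only: the numbers and the "motivation" variable are invented, the intervention has no true effect, and yet the naive participant/nonparticipant comparison shows a spurious benefit while the randomized comparison does not.

```python
# Minimal simulation sketch of selection bias (hypothetical numbers only).
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
motivation = rng.normal(size=n)                    # unobserved confounder

# Self-selection: more motivated people are more likely to enroll.
enrolls = rng.random(n) < 1.0 / (1.0 + np.exp(-motivation))
# Outcome depends on motivation only; the program has NO true effect here.
risk_reduction = 0.3 * motivation + rng.normal(size=n)

naive = risk_reduction[enrolls].mean() - risk_reduction[~enrolls].mean()

# Randomized assignment is independent of motivation, so groups balance.
assigned = rng.random(n) < 0.5
randomized = risk_reduction[assigned].mean() - risk_reduction[~assigned].mean()

print(f"self-selected comparison: {naive:+.3f}   (spurious 'effect')")
print(f"randomized comparison:    {randomized:+.3f}   (correctly near zero)")
```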

Nonrandomized studies require additional assumptions and/or data to infer causation. This is so because it is seldom safe to assume that individuals who participate in a program and receive its services are similar to those who do not participate and do not receive services. In a few cases, the differences between participants and nonparticipants may be fully explained by observable characteristics (e.g., age, partner status, education, and so on), but often the differences may be too subtle to be observed (e.g., motivation to participate, intention to change behavior, and so on). This is particularly true in the AIDS prevention arena because so little is known about changing sexual and drug use behaviors. Thus, a simple comparison of the risk reduction behavior of participants and nonparticipants in a program can yield a misleading estimate of the true effect of the program.

In addition to randomized experiments, six observational research designs will be discussed in this chapter. These alternatives use a variety of tactics to construct comparison groups.[4] For organizational purposes, the panel clusters the six strategies under two umbrellas that differ in their general approach to controlling bias and providing fair comparisons. One approach involves the design of comparability and the other involves post hoc statistical adjustment:[5]

- Design approaches develop comparability a priori by devising a comparison group on some basis other than randomized assignment. This may be done through quasi-experiments, natural experiments, and matching (see the sketch below).

- Adjustment approaches correct for selection bias a posteriori through model-based data analysis. Such approaches use models of the process underlying selection or participation in an intervention and of the factors influencing the outcome variable(s). Specific methods include analysis of covariance, structural equation modeling, and selection modeling.

[4] Throughout this chapter, control groups will refer to the randomly assigned groups that may either have received no treatment or have received an alternative to the experimental treatment. Their nonrandomized counterparts will be referred to as comparison groups.

[5] A third type of observational method is the case study, in which evaluators probe individual histories for factors related to outcome variables. Because little comparability is achieved in these studies, the panel will not discuss them, except to note that case studies often yield hypotheses and measures that can eventually be used in the other designs and can yield useful information to help interpret the results of randomized trials that are not optimally implemented.
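To make the "matching" entry in the design list concrete, here is a minimal nearest-neighbor matching sketch. All of the data are fabricated for illustration: each participant is paired with the nonparticipant whose observed covariates are closest, and outcomes are compared within pairs.

```python
# Minimal nearest-neighbor matching sketch (all data fabricated).
import numpy as np

rng = np.random.default_rng(1)
# Two observed covariates (say, standardized age and education) for 50
# participants and a pool of 500 nonparticipants, plus outcome scores.
part_x = rng.normal(size=(50, 2))
pool_x = rng.normal(size=(500, 2))
part_y = rng.normal(0.4, 1.0, size=50)    # pretend participants did better
pool_y = rng.normal(0.0, 1.0, size=500)

# Match each participant to the nearest nonparticipant (Euclidean distance).
dist = np.linalg.norm(part_x[:, None, :] - pool_x[None, :, :], axis=2)
match = dist.argmin(axis=1)

effect = (part_y - pool_y[match]).mean()
print(f"matched estimate of project effect: {effect:+.2f}")
# Caveat emphasized in the text: matching balances only *observed*
# covariates; unobserved differences (e.g., motivation) can still bias it.
```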

It should be recognized that the panel's distinction between the two approaches is not absolute. Matching, for example, is sometimes done retrospectively as a method of controlling selection bias, and prospective data can be collected for use in modeling. Furthermore, in some sense, all the approaches involve "modeling," at least to the extent that they take account of behavioral theories or models to infer causation. Despite some imprecision, the panel finds the general distinction helpful in thinking about the ways that have been developed to estimate project effects from nonexperimental designs.

Choosing Among Strategies

Despite the panel's preference for randomization, we realize that it is not always feasible or appropriate and that its implementation is not immune to compromise. When randomization is infeasible or threats to randomization loom large, we believe researchers should look to alternative strategies and determine which of them, in turn, will be feasible, appropriate, and produce convincing results. The choice of approach can present itself in many ways: a cohort study, for example, can begin to study all new entrants at a clinic (the a priori approach), or it can look back at all entrants who first appeared at that clinic at some time in the previous months, if the clinic records are good enough.[6]

Because the data are already in hand, the a posteriori approach may permit an apparently faster investigation of the problem. Offsetting this advantage, however (and often outweighing it), are two other considerations. First, retrospectively collected data may not include measures of key variables, and the available data may be difficult to interpret. Planned data collections may profit from steps taken to ensure the availability of information on all variables of interest. Second, design imposes control over the observations of behavioral and psychological variables: planning what data to collect, how to collect them, and what comparisons to make improves the prospects of obtaining valid and reliable measures of project inputs and outcomes.[7]

[6] Some quasi-experiments and nonexperiments may also offer the choice. If the trigger event is an earthquake, then only the retrospective mode is likely to be available, but if it is the initiation of a new legal requirement at a future date, then the choice of a prospective study is available.

[7] See Appendix C for a discussion of validity and reliability of behavioral data.

For these reasons, the panel believes that strategies that build on data collected for the specific purpose of evaluating project effects should, in general, have a greater likelihood of success.

In the case of AIDS interventions in particular, the panel is pessimistic about our ability to correct for bias after the fact because we have at present a poorly developed understanding both of the factors affecting participation in such projects and the factors that induce people to change their sexual and drug use behaviors (e.g., motivation, social support, and so on). Success through a posteriori approaches benefits from a comprehensive understanding of these confounding factors and reliable measurements of them.[8]

Finally, the panel notes that the charged climate that surrounds many AIDS prevention programs can render decision making difficult in the best of circumstances. Research procedures that produce findings that are subject to considerable uncertainties or that provoke lengthy debates among scientists about the suitability of particular analytic models may, in the opinion of this panel, impede crucial decision making about the allocation of resources for effective AIDS prevention programs. These factors underlie the panel's preference for well-executed randomized experiments, where such experiments are feasible. This is not to say that observational strategies do not have a place in AIDS evaluation designs nor that their role must necessarily remain secondary in the future. Rather it reflects the panel's judgment that in the current state of our understanding of AIDS prevention efforts and the state of development of alternative observational strategies for evaluation, overreliance on observational strategies would not be a prudent research strategy where it is feasible to conduct well-executed randomized experiments.

Before reviewing observational strategies for evaluation, the panel provides a brief reprise of the basis for its recommendation that carefully executed randomized experiments be conducted to assess the effects of a small subset of AIDS prevention projects.

RANDOMIZED EXPERIMENTATION

Randomized controlled experiments specify a single group of interest of sufficient size[9] and then randomly assign its members to a treatment group or a control group that receives an alternative treatment or no treatment at all. By randomly assigning units (i.e., individuals, schools, clinics, communities) to treatment and control groups, it becomes possible in theory to interpret any resultant intergroup differences as a direct estimate of the magnitude of the effect, if any, induced by the treatment. The method's assumption that selection bias has been controlled is probabilistically hedged by the significance test. In properly randomized experiments, statistical significance tests can indicate whether the observed differences in group outcomes are larger than can be explained by the random differences in the groups.

[8] Success through design approaches also depends on these things, but careful design allows some of the factors to be controlled. Randomization leads to the most trustworthy expectation that these factors have been controlled, although theory is important to examine whether groups are indeed comparable.

[9] The question of sample size is important because it affects the statistical variance of the estimate of treatment effect derived from a given experiment. (Other things being equal, the standard error of this estimate will be proportional to the square root of the sum of the squared standard errors of the estimated means of the treatment and control groups.) Sample size is discussed in more technical terms in Appendix D, but, in brief, it should be noted that as the size of the sample increases, the variance in the expected distribution of estimated effects will decrease, thus permitting more precise estimates. (In addition to large sample sizes, homogeneous populations will also reduce variance.)
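In standard notation, the sample-size relationship sketched in note 9 is the familiar formula for the standard error of a difference in means (a restatement for clarity, not material from the report):

```latex
\[
\widehat{\Delta} = \bar{Y}_T - \bar{Y}_C, \qquad
\mathrm{SE}\bigl(\widehat{\Delta}\bigr)
  = \sqrt{\mathrm{SE}(\bar{Y}_T)^2 + \mathrm{SE}(\bar{Y}_C)^2}
  = \sqrt{\frac{s_T^2}{n_T} + \frac{s_C^2}{n_C}} .
\]
```

Since each term shrinks in proportion to 1/n, quadrupling both group sizes halves the standard error, and smaller outcome variances (more homogeneous populations) shrink it as well.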

By providing a statistically well-grounded basis for assessing the probability that observed differences in outcomes between groups are attributable to the treatment, well-executed randomized experiments reduce ambiguity in the interpretation of findings and provide the greatest opportunity for producing clear-cut results. This reduction in ambiguity is made possible by the fact that assignment to a particular treatment group is by definition independent of all other factors. This inferential strategy requires, however, that the randomization of assignment not be compromised in its execution and that it be maintained through time.[10]

When assignment is not random or when randomization is compromised, differences between the treatment group and the control group may result from either the effect of the treatment, or from idiosyncratic differences between the groups, or both. If, for example, members of the treatment group differ from those in a comparison group because they were more motivated and thus more aggressively pursued or stuck with the intervention program, the treatment's success may be overstated. On the other hand, if the treatment group represents those at highest risk, any comparison group would have the advantage of being composed of individuals less in need of the intervention. As such examples illustrate, selection bias can cause the intervention group to perform either "better" or "worse" than the comparison group. The direction of the bias, let alone its magnitude, is often difficult to predict beforehand.

[10] In practice, nonequivalent attrition in the treatment and control groups and other factors can reintroduce the selection biases that randomization excluded. When randomized assignment is thus compromised in execution, the same inferential problems that beset observational studies operate, and they may require use of procedures such as statistical adjustments, modeling of attrition bias, and so forth. In all such instances, it should be clearly recognized that the inferential uncertainties attending a severely compromised randomized experiment may be just as large (or even larger) than those that attend the use of a purely observational design.
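One concrete version of the significance test described above is a re-randomization (permutation) test, which asks directly how often chance assignment alone would produce a difference as large as the one observed. The sketch below uses fabricated outcome scores with an assumed true effect of 0.5 on a unit-variance scale:

```python
# Minimal permutation-test sketch for a randomized two-group comparison
# (outcome scores are fabricated for illustration).
import numpy as np

rng = np.random.default_rng(2)
treatment = rng.normal(0.5, 1.0, size=60)   # assumed true effect of 0.5
control = rng.normal(0.0, 1.0, size=60)
observed = treatment.mean() - control.mean()

# Re-randomize: shuffle the assignment labels and recompute the difference.
pooled = np.concatenate([treatment, control])
hits = 0
reps = 10_000
for _ in range(reps):
    rng.shuffle(pooled)
    if abs(pooled[:60].mean() - pooled[60:].mean()) >= abs(observed):
        hits += 1

print(f"observed difference: {observed:.2f}")
print(f"two-sided permutation p-value: {hits / reps:.4f}")
```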

The Power of Experiments: An Example

History provides a number of examples of the interpretive difficulties that can attend observational studies (or compromised experiments) and the power of a well-executed randomized experiment to provide definitive results. In the infant blindness epidemic at mid-century, for example, well-executed controlled experiments ended an inferential debate that observational studies had fueled instead of extinguished.

In the 1940s and early 1950s, more than 10,000 infants, most of whom were born prematurely, fell victim to a previously unknown form of blindness called retrolental fibroplasia (Silverman, 1977). Over the years, more than 50 hypotheses were offered for the cause of the disease and for effective treatments. About half the hypotheses were examined observationally, but only four were actually tested in experimentally controlled trials.[11]

Before the experimental studies took place, an uncontrolled study had indicated that the application of ACTH (adrenocorticotrophic hormone) would prevent the fibroplasia. A randomized controlled trial showed that this therapy was unhelpful or worse: a third of the infants who received ACTH became blind whereas only a fifth of the control group did.

One hypothetical cause of the observed blindness, based on a study of 479 infants, was a deficiency of oxygen. This proposal was countered by another hypothesis, based on 142 observations, that an excess of oxygen was to blame. (During the period of the epidemic, premature infants were routinely given oxygen supplements at a concentration of more than 50 percent for 28 days.) Once again, a well-controlled randomized experiment put the debate to rest: the group of infants randomly assigned to receive the routine supplemental oxygen had a dramatically higher incidence of blindness than the control group (23 percent versus 7 percent).[12] Observational studies might have finally yielded this same conclusion (at least one small study had suggested excess oxygen as the culprit), but the human cost and the time involved (10,000 blinded children and more than 10 years) were dear indeed.

[11] Because neither the cause of nor the cure for the children's blindness were known, the randomized trials reported here met ethical standards for varying treatments.

[12] The results of the study were widely publicized among ophthalmologists, and within a year the practice of providing high concentrations of oxygen to premature infants was largely modified. Subsequent efforts have been made to provide an oxygen concentrate that prevents brain damage but does not cause blindness (Silverman, 1977).

Compromised Randomization

The panel believes that the inferential debates that bedevil the interpretation of nonexperimental studies are largely avoided by well-conducted randomized experiments. In practice, uncertainties may nonetheless attend the inference that a causal relationship exists between the intervention being evaluated and the outcome(s) observed in a randomized experiment. In this section, we discuss four important sources of uncertainty that investigators need to monitor: sample attrition, compliance with treatment, spillover (and diffusion of effects), and compensatory behavior. Note that the first three of these are not solely problems for experiments; they can frustrate observational studies as well. The last, however, is a special risk of randomized experiments.

Attrition

Careful randomization of participants into treatment and control groups is not sufficient in itself to guarantee informative results. Successful experiments also require that sample attrition be minimized. Any such attrition can introduce post-assignment biases in the composition of treatment and control groups. Two types of attrition can occur, each with different results. With one, participants drop out of the study and cannot be followed up. To the extent that this occurs, the integrity of the experiment is compromised and results are subject to some of the same concerns about selection bias that threaten the results of observational studies. If different plausible ways of analyzing the data lead to qualitatively different interpretations, it is then evident that: (1) the evaluator will have to model the self-selection bias, (2) the conclusions may depend on the model approach chosen, and (3) if no strong basis exists for confidence in the chosen model, the study results must be subject to considerable uncertainty.

A second type of attrition occurs when people do not complete the protocol but are still available for follow-up. In this case a valid interpretable randomized comparison can still be made: outcomes can be compared between all those who started on intervention A and all those who started on intervention B. This comparison is sometimes meaningful because, in practice, the choice may be to start a participant in one intervention or another, in full recognition that some participants may not stick with it. If defection rates are high, however, restricting analysis to only those who stay with the assigned treatment would produce wholly biased results. For example, selective drop-out may occur from experimental group A because project participation entails more effort than staying in control group B.

This type of drop-out introduces selection bias, with the result being that the outcomes of the experimental group will be artifactually overestimated because the more motivated participants remained.[13] But some members of group B might also have dropped out had they been assigned to the program that required effort on the part of participants. Where selective attrition occurs, differences in outcome between the two groups are inevitably an unknown mixture of effects related to the actual differences in treatment effects and differences in the kinds of participants who do and do not drop out of the two treatment groups. Even if selective attrition does not occur, the completeness of information may still differ systematically between treatment groups (especially where participant cooperation is necessary to information acquisition); again, bias from self-selection is a risk.

Compliance

Both the first report of the parent committee (Turner, Miller, and Moses, 1989: Chapter 5) and the preceding chapters of this report identify compliance, along with attrition, as major threats to the integrity of experiments. Even in the most carefully designed experiments, a substantial number of individuals may leave the program or fail to comply with the requirements of the experiment and thus not receive the full strength of the intervention. The threat that attrition and noncompliance pose underscores the panel's sense that an essential first step of any outcome evaluation is to analyze the delivery of services before interpreting estimates of project effects. Tracking respondents' compliance with the assigned treatment is essential to ensure that valid inferences can be drawn from the data.

The potential importance of tracking compliance is well illustrated by an example. Clofibrate, a drug intended to treat coronary heart disease, was compared to a placebo in a very large clinical trial, and no significant beneficial effect was found.[14] Upon later analysis, however, it was observed that those patients assigned to Clofibrate who actually took at least 80 percent of their medication had a much lower five-year mortality than those in the Clofibrate group who took less than 80 percent of their medication.

[13] On the other hand, participants may drop out because their transportation falls through or they move away, which, one might expect, would not introduce selection bias. It is, however, the case that ad hoc inferences such as these are always open to challenge. "Transportation falling through" may be a polite way for subjects to disguise their lack of interest in a program.

[14] In the trial, 1,103 individuals were randomly assigned to receive the drug, and 2,789 individuals received the placebo.

The mortality rates for the Clofibrate compliers and noncompliers were about .15 and .25, respectively. Note that this was not a randomized comparison; the randomization put all these patients on Clofibrate rather than on placebo. These results appeared to show an important difference and suggested that the drug had beneficial effects.

Compliance (actual drug-taking) was, however, a matter of self-selection. As it turned out, the group assigned to take the placebo also had "good" compliers (who took at least 80 percent of placebo) and "bad" compliers (who took less than 80 percent). Moreover, their five-year mortality rates were also about .15 and .25 (Coronary Drug Project Research Group, 1980). The effort to use the information available on the patients to account for this self-selection effect failed; the data in the records were not sufficient to adjust away the mortality difference in either group. Without the randomized control group data on compliance, however, a false treatment benefit could easily have been claimed for those who took 80 percent or more of the Clofibrate. While this example does not tell us how alternative methods might have been used to resolve the problem, it does clearly illustrate the importance of tracking self-selection and compliance. It also illustrates the usefulness of data from a randomized control group.
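The Clofibrate lesson can be reproduced with a toy simulation. In the sketch below (made-up numbers, not the trial's data), an unobserved "frailty" variable lowers compliance and raises mortality, and the drug does nothing; the complier/noncomplier gap nevertheless appears in both arms, echoing the trial's .15 versus .25 pattern:

```python
# Toy simulation of compliance self-selection (hypothetical numbers; the
# drug has no effect, yet compliers look better in BOTH arms).
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
for arm in ("clofibrate", "placebo"):
    frailty = rng.random(n)                       # unobserved health status
    complies = rng.random(n) > 0.6 * frailty      # frail patients comply less
    dies = rng.random(n) < 0.05 + 0.30 * frailty  # frail patients die more
    print(f"{arm:>10}: complier mortality {dies[complies].mean():.3f}, "
          f"noncomplier mortality {dies[~complies].mean():.3f}")
```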

Spillover

The diffusion of treatment effects throughout the population can also obscure evaluation results. A major threat to the internal validity of the randomized experiment is "spillover." This phenomenon, the communication of ideas, skills, or even outcomes from the intervention group to the control group, can result in the dilution of estimated program effects in a variety of ways. Members of an experimental group who adopt safer sex skills as a result of an intervention, for example, are likely to come into contact with members of the control group. If both groups are drawn from the same population, the control group may thereby adopt safer sex skills as well, at least when involved with individuals from the experimental group. Alternatively, an effective intervention may produce the outcome of lower infection rates among the experimental group; this outcome would then spill over into reduced rates among the control group because of the reduced pool of HIV-positive individuals to whom they could be exposed. In these situations, it is plausible that any observed difference between the experimental group and the control group is an underestimate of the program's true effect.[15]

Such spillover effects are less of a threat when the unit of randomized assignment is at the organizational level rather than at the level of the individual. As discussed in Chapter 1, the unit of assignment can be a clinic (i.e., the clientele of the clinic), a community, or a city, and so on. In fact, when thinking about AIDS interventions, it is apparent that many educational projects, such as the media campaign, are based on a diffusion theory that assumes that interpersonal contacts are made after media exposure. In such cases, organizational units are appropriate to study because spillover within units is desired. Nonetheless, spillover across units can remain harmful to the evaluation effort, so geographic proximity of treatment and control groups should be avoided.

Compensatory Behavior

A problem unique to randomized designs is the threat that control group members will act in a way that compensates for their having been assigned to the control group (and that is, in fact, different from the way they would have behaved if truly "untreated"). Such compensatory behavior can contaminate the outcomes of an evaluation, and it is difficult to predict the direction of such bias beforehand. For example, if an attractive AIDS counseling project were offered to some participants but not to others (and both groups were aware of this assignment decision), the nonrecipients could react in different ways. They may overcompensate for their exclusion by taking it upon themselves to change their risky behavior or form their own support group. Such overcompensation would diminish the effects of a project detected by an evaluation. Or, nonrecipients may become demoralized and give up, not making any change in their behavior or even backsliding to riskier ways. Such a reaction by the control group would then tend to overestimate the effects of the intervention on the experimental group. Such effects are particularly worrisome in that they can easily go unnoticed and result in misleading conclusions.

Some protection against such missteps may be afforded by blinding the study so that participants are unaware of the alternate treatments, a strategy that may be feasible when randomization is done at the clinic or community level. Use of ethnographic observers (see Appendix C) may also be helpful in recognizing the presence of such compensatory behaviors. Replication of experiments in different milieus may also protect investigators against such experimental artifacts.

[15] It is unlikely that these rates can be adjusted to reflect initial conditions in different communities because we lack reliable data on the prevalence and distribution of HIV in the U.S. population. (See discussion in Chapter 1 of the 1989 report of the parent committee [Turner, Miller, and Moses, 1989].)

... effects are uncertain.) On the other hand, the lack of randomization into participation categories means that it is not possible to discount the possibility that something other than the planned intervention occurred in the intervention community to cause the change. Thus, this example accepts some ambiguity, but it also may be more feasible than a randomized experiment. In addition, the example reveals the value of quasi-experiments in giving investigators experience with intervention procedures before they are deployed in an experimental study.

2. Are the competing explanations for the project's effect reasonably assumed to be negligible?

The number of instances in which competing explanations are negligible will be few. They do exist, however. In testing an algebra course for third graders, for example, it will often be safe to assume that third graders who are not involved in the curriculum will not learn algebra on their own. A before-and-after design would then be sufficient to estimate the effect of the curriculum on children's knowledge.[53] Similarly, a before-and-after design might be acceptable to test the effects on schoolchildren of an intensive CBO project to reduce stigmatization of a prospective seropositive classmate. The media campaign's effects on this population might be assumed to be negligible (given the late hour of most broadcasts of national public service announcements about AIDS and the reading level required for published materials). Any changes in attitudes might then be attributed to the intervention.

3. Must the program be deployed to all relevant individuals or institutions that are eligible?

As mentioned earlier, a community-wide intervention project to prevent AIDS may be swiftly implemented and offered to all eligible residents, thus saturating the community and precluding the random assignment of individual residents to experimental and control conditions. Consequently, any evaluation design will have to depend on quasi-experimental or statistical adjustment methods. For example, a time series analysis of trends in condom sales, visits to STD clinics, and sales of safe sex videos or books might be implemented (see the sketch below). Note, however, that when multiple sites are involved, the panel suggests that communities themselves might be randomly assigned to an intervention or to a control condition in the interest of estimating the effects of the program.

[53] Before-and-after evaluation designs are discussed in Chapter 4.
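A minimal version of the time-series idea in question 3 is segmented ("interrupted") regression: fit a pre-intervention level and trend plus a step at the intervention date, and test the step. The sketch below fabricates monthly condom-sales counts and, for simplicity, ignores the serial correlation that a real analysis would have to model:

```python
# Interrupted time-series sketch (fabricated monthly condom-sales data;
# serial correlation ignored for brevity).
import numpy as np

rng = np.random.default_rng(4)
months = np.arange(36, dtype=float)          # 24 months pre, 12 post
after = (months >= 24).astype(float)         # community project launches
sales = 1000 + 5 * months + 120 * after + rng.normal(0, 30, size=36)

# Model: sales = b0 + b1*month + b2*step; b2 is the post-launch jump.
X = np.column_stack([np.ones(36), months, after])
beta, *_ = np.linalg.lstsq(X, sales, rcond=None)
resid = sales - X @ beta
sigma2 = (resid @ resid) / (36 - 3)
se_step = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[2, 2])
print(f"estimated post-intervention jump: {beta[2]:.1f} (SE {se_step:.1f})")
```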

4. Will a nonrandomized approach meet standards of ethical propriety while a randomized experiment will not?

As discussed in Chapters 4 and 5, random assignment to an intervention or to a control group fails to meet standards of ethical propriety if resources are in ample supply to provide the intervention, it is not otherwise available, and the beneficial effects of the intervention are assumed to outweigh any negative effects. HIV testing, for example, is believed to be an effective medical care procedure, thus making a randomized no-treatment control inappropriate for estimating the effect of CDC's counseling and testing program.[54] In this case, it might be possible to use a time series design to examine the effectiveness of a new counseling and testing setting on the accessibility of services. For example, suppose a small community with HIV test facilities in its public health and family planning clinics wishes to open a new site specifically to attract gay men. Before opening the new site, the community can count the number of test takers using test facilities by their risk exposure group (as identified in Figure 5-1 in Chapter 5). After the new site is open, the number of test takers by risk group can be recounted (actually, a series of before-and-after measurements would be preferred). If the number of gay test takers increases (without a corresponding decrease in the other categories to which they may have assigned themselves), it might be inferred that the new project was effective in attracting gay men. (A sketch of this tally appears below.)

5. Are theory- or data-based predictions of effectiveness so strong that nonexperimental evidence will suffice?[55]

In some cases, theory may predict dramatic effect sizes. It is often (but not always) true that the larger the expected impact of an intervention, the less accurate an evaluation technique one needs to discern that impact. Extremely persuasive educational and prevention projects might, for example, produce such large effects that the impact would be convincingly evident even with observational designs that are more vulnerable to bias. In other cases, an intervention may have previously been shown to make a difference under a given set of circumstances or within a given subgroup using a randomized experiment. In these cases, suppose the generalizability of this finding is not known, and an investigator wishes to test the intervention in a different setting or among a different target group. Under these circumstances, the inferences from an observational study may be sufficiently convincing as to preclude the need for a full-scale experiment.

[54] See Appendix D for further discussion of the ethical concerns of evaluating patient care procedures.

[55] Note that it is important to differentiate well-founded predictions of effectiveness from "common knowledge" of what works. Too often hunches or instincts about what works have stood in the way of deciding to conduct a well-controlled randomized study.
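The tally suggested under question 4 might look like the sketch below; the quarterly counts are invented. The reading offered in the text: a rise among gay test takers with no offsetting drop elsewhere is consistent with the new site attracting new clients rather than reshuffling existing ones.

```python
# Before-and-after tally of test takers by risk group (invented counts).
before = {"gay men": 40, "IV drug users": 55, "heterosexual contacts": 80}
after = {"gay men": 95, "IV drug users": 52, "heterosexual contacts": 78}

for group in before:
    print(f"{group:>22}: {before[group]:>3} -> {after[group]:>3} "
          f"({after[group] - before[group]:+d})")
# As the text notes, a single pre/post pair is weak evidence; a series of
# before-and-after counts would be preferred.
```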

Consider, for example, the case of a counseling support project that has been tested in a randomized controlled experiment and shown to increase gay men's behavioral skills for refusing sexual coercion (Kelly et al., 1989). The support project's effectiveness among women partners of intravenous drug users, however, is unknown. To test it, a quasi-experiment might be designed.

In a final section, below, the panel considers the investigator's final assessment of the results of his or her study, whether it be a randomized experiment or not.

INTERPRETING EVALUATION RESULTS

The goal of outcome evaluation is to determine how well a project works. Part of this determination, no matter the method chosen for evaluation, involves an investigator's interpretation of results. The degree of certainty that the observed outcomes result from the intervention is a function of, among other things: the reasonableness of the assumptions behind the evaluation strategy, the quality and amount of the data, and the plausibility of counter-hypotheses that could account for the observed data. It is also important for interpretation to address whether results are specific to a given set of circumstances or are generalizable to other populations.

Randomized Experiments

Assuming that randomized controlled trials are used, the assumptions underlying the inference of effects are generally easy to verify, which will facilitate acceptance of a study's interpretation. One still needs, however, to examine the data on project participants and the project itself, to ensure the internal validity of the experiment. Such validation is needed to be sure that the project and randomization were implemented as designed and that the degree of attrition is acceptably small. If these conditions are satisfied, differences between units can be analyzed using standard statistical tests.

In the end, even if the results are strongly encouraging for a subgroup of a population, generalizability will often be uncertain. The results from a single experiment may allow strong and rather precise inferences of causality, but because they are likely to be based on small, selective samples, they may be equivocal in terms of how the project will work in other groups, other settings, and other regions of the country. Whatever is known about the experiment should be communicated in the interpretation of results.

Nonrandomized Methods

Nonrandomized methods make greater use of assumptions than randomized trials. In interpreting the results of such studies the plausibility of these assumptions must be considered (and reported) because they will vary from one design to another, and they are crucial to the inferences that will be drawn. Moreover, investigators need to analyze the sensitivity of their inferences to the likely amount of departure from these assumptions.

Accessibility of Assumptions

All of the alternatives to randomization have one thing in common: they rely on assumptions that are not directly verifiable. The nonexperimental alternatives differ, however, in the nature of the assumptions that are necessary.

Observational studies, natural experiments, and matching approaches tend to make assumptions that, although they may not be directly verifiable, can be expressed in accessible everyday terms. Comparison groups must be similar to treatment groups in every respect (other than the treatment) that might influence the outcome; there must be no changes other than the treatment between pretest and posttest; and so on. To the extent that we know the factors that influence the outcome variable, we may be able to assess whether there are differences between comparison and treatment conditions.

Analysis of covariance, selection models, structural equation models, and other statistical techniques require assumptions that are generally expressed in formal statistical terms that are somewhat removed from everyday experience. Analysis of covariance, for instance, assumes that the relationship between outcome variables, covariates, and the treatment can be adequately and fully expressed in a particular form of (single-equation) statistical model. Selection models based on historical controls assume that the treatment and comparison groups are similar with respect to the first, second, third, or higher-order differences over time in the outcome variable. Structural equation models make complex assumptions about the covariance structure among all of the variables in the model. The appropriateness of such assumptions can be quite difficult to assess, even if one is familiar with the statistical language and the subject matter, and external validation data are often unavailable. Although there are some statistical techniques for testing the inadequacy of the requisite assumptions for all of these models, there is no general way to determine that the assumptions hold.

In summary, compared to quasi-experimental designs, the complex statistical alternatives to randomization require more elaborate assumptions that can be quite difficult to verify.
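As one illustration of the single-equation assumption just described, the sketch below fits an analysis-of-covariance model by ordinary least squares on fabricated data (a treatment indicator plus two invented covariates). The "adjusted" treatment effect is trustworthy only insofar as this linear equation really is the full story:

```python
# ANCOVA-style adjustment sketch (all data fabricated).
import numpy as np

rng = np.random.default_rng(5)
n = 400
treated = (rng.random(n) < 0.5).astype(float)
age = rng.normal(30.0, 8.0, size=n)
baseline = rng.normal(size=n)          # e.g., baseline risk-behavior score
# Fabricated truth: effect 0.5, plus covariate influences and noise.
outcome = 0.5 * treated - 0.02 * age + 0.6 * baseline + rng.normal(size=n)

# Single-equation model: outcome = b0 + b1*treated + b2*age + b3*baseline.
X = np.column_stack([np.ones(n), treated, age, baseline])
beta, *_ = np.linalg.lstsq(X, outcome, rcond=None)
print(f"covariance-adjusted treatment effect: {beta[1]:+.2f}")
# The untestable assumption: no relevant covariate is omitted and the
# linear form is correct; otherwise the adjustment can mislead.
```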

Interpretation

Besides plausible assumptions, the interpretation of observational studies is also a function of data quality and competing hypotheses for change in the observed outcomes. The panel has addressed both of these issues in this chapter, but we wish to add a note on how competing hypotheses might be ruled out in a more trustworthy way. A set of six criteria developed by Hill (1971) to assess observational studies in the field of medicine are of interest. These criteria, which have been modified over the years, point out the need to take into account the whole of the evidence, not just selected studies, in interpreting whether an observed association is causal. A recent report of the Committee on Diet and Health (1989) restated Hill's criteria to include the following:

- the strength of association between the intervention and the observed outcome,
- the "dose-response relationship," in which greater effects are demonstrated from more intense treatments,[56]
- a temporally correct association (i.e., an appropriate time sequence between the intervention and the observed outcome),
- the consistency with which similar associations are found in a variety of evaluations,
- the specificity of the association, and
- plausibility (i.e., the supposed causal association comports with existing knowledge).

Although several of these criteria are applicable to the findings of any one study, the consistency of association and the notion of plausibility argue that a study also be interpreted in the context of other findings. One of the greatest difficulties for observational studies to surmount is their vulnerability to counter-hypotheses that could account for differences between the comparison and treatment groups (based on factors other than the intervention). Although this problem is inherent to the approach, certainty about a particular causal inference increases as a reservoir of similar findings is accumulated across studies using disparate methods. What is more, even flawed studies can be convincing when a body of evidence is compiled.

When data are drawn from several studies, however, they are sometimes difficult to compare because the studies use different definitions of target audiences, different specifications of causal variables, different outcome measures, different wordings of survey questions, and so on.

[56] A ceiling effect may sometimes appear, thus diluting the dose-response relationship.

These differences make it hard to compare results across studies, and detract from their interpretation as a whole. Moreover, differences between studies also make results difficult to generalize, regardless of whether experimental or nonexperimental studies are used. We believe that a way exists to improve their interpretability.

The panel recommends that the Public Health Service and other agencies that sponsor the evaluation of AIDS prevention research require the collection of selected subsets of common data elements across evaluation studies to ensure comparability across sites and to establish and improve data validity and reliability.[57]

Questions about a project's applicability to other populations require information on the populations for which the project succeeded, peculiarities of the region or the population that were important to its success, and the cost of the project and possible areas for cost reduction. The hope is that an evaluation that suggests success for a particular project in one area will lead to a rapid implementation of the project in similar regions and to its gradual implementation in regions less and less similar to the original site evaluated, so that the generalizability of the initial finding is not assumed to stretch too far without empirical verification.

None of this is meant to imply that the panel urges scores of evaluations. The panel believes that more certain and useful knowledge will be gained by a smaller number of well-executed studies than by a precipitous rush to assess the effects of every prevention program that is being mounted. At present, the panel believes the randomized experiment to be the most appropriate design for outcome evaluation, both in terms of clarity and dispatch of results, all else being equal. At the same time, we recognize that the strategy will not always be feasible or appropriate and, for these situations, other designs may have to be deployed until evidence accumulates to make their interpretation dependable or until a randomized experiment can be conducted.

[57] Furthermore, methodological research is urgently needed to study the validity and reliability of behavioral measurements. Appendix C is devoted to a discussion of these issues.

REFERENCES

Barnow, B. S. (1973) The effects of Head Start and socioeconomic status on cognitive development of disadvantaged children. Ph.D. dissertation. University of Wisconsin.

Barnow, B. S., Cain, G. G., and Goldberger, A. S. (1980) Issues in the analysis of selectivity bias. In E. W. Stromsdorfer and G. Farkas, eds., Evaluation Studies Review Annual, Vol. 5. Beverly Hills, Calif.: Sage Publications.

Bentler, P. M. (1980) Multivariate analysis with latent variables: Causal modeling. Annual Review of Psychology 31:419-456.

Bentler, P. M. (1990) Structural equation modeling and AIDS prevention research. Presented at the NRC Conference on Nonexperimental Approaches to Evaluating AIDS Prevention Programs, Washington, D.C., January 12-13.

Berk, R. A., and Rauma, D. (1983) Capitalizing on nonrandom assignment to treatments: A regression-discontinuity evaluation of a crime-control program. Journal of the American Statistical Association 78:21-27.

Betsey, C. L., Hollister, R. G., and Papageorgiou, M. R., eds. (1985) Youth Employment and Training Programs: The YEDPA Years. Report of the NRC Committee on Youth Employment Programs. Washington, D.C.: National Academy Press.

Boruch, R. F. (1986) Comparative aspects of randomized experiments for planning and evaluation. In M. Bulmer, ed., Social Science Research and Government. New York: Cambridge University Press.

Boruch, R. F., and Riecken, H. W., eds. (1975) Experimental Tests of Public Policy. Boulder, Colo.: Westview Press.

Box, G. E. P., and Tiao, G. C. (1965) A change in level of non-stationary time series. Biometrika 52:181-192.

Bryk, A. S., and Weisberg, H. I. (1976) Value-added analysis: A dynamic approach to the estimation of treatment effects. Journal of Educational Statistics 1:127-155.

Campbell, D. T. (1990) Quasi-experimental design in AIDS prevention research. Presented at the NRC Conference on Nonexperimental Approaches to Evaluating AIDS Prevention Programs, Washington, D.C., January 12-13.

Campbell, D. T., and Stanley, J. C. (1966) Experimental and Quasi-Experimental Designs for Research. Chicago: Rand McNally.

Chapin, F. S. (1947) Experimental Designs in Sociological Research. New York: Harper.

Coates, T. J., McKusick, L., Kuno, R., and Stites, D. P. (1989) Stress reduction training changed number of sexual partners but not immune function in men with HIV. American Journal of Public Health 79:885-887.

Cochran, W. G. (1965) The planning of observational studies of human populations. Journal of the Royal Statistical Society, Part 2, 128:234-255.

Cook, T. D., and Campbell, D. T. (1979) Quasi-Experimentation: Design & Analysis Issues for Field Settings. Boston: Houghton Mifflin.

Committee on Diet and Health (1989) Diet and Health: Implications for Reducing Chronic Disease Risk. Report of the NRC Food and Nutrition Board. Washington, D.C.: National Academy Press.

Coronary Drug Project Research Group (1980) Influence of adherence to treatment and response of cholesterol on mortality in the coronary drug project. New England Journal of Medicine 303:1038-1041.

Duncan, O. D. (1975) Introduction to Structural Equation Models. New York: Academic Press.

Dwyer, J. H. (1983) Statistical Models for the Social and Behavioral Sciences. New York: Oxford University Press.

Ehrenberg, A. S. C. (1968) The elements of lawlike relationships. Journal of the Royal Statistical Society, Series A, 131:280-302.

Emmett, B. P. (1966) The design of investigations into the effects of radio and television programmes and other mass communications. Journal of the Royal Statistical Society, Part 1, 129:26-49.

Fehrs, L. J., Fleming, D., Foster, L. R., McAlister, R. O., Fox, V., et al. (1988) Trial of anonymous versus confidential human immunodeficiency virus testing. Lancet 2:379-382.

Fisher, B., Redmond, C., Fisher, E. R., Bauer, M., Wolmark, N., et al. (1985) Ten-year results of a randomized clinical trial comparing radical mastectomy and total mastectomy with or without radiation. New England Journal of Medicine 312:674-681.

Fleiss, J. L., and Tanur, J. M. (1973) The analysis of covariance in psychopathology. In M. Hammer, K. Salzinger, and S. Sutton, eds., Psychopathology: Contributions from the Social, Behavioral, and Biological Sciences. New York: John Wiley & Sons.

Fox, R., Odaka, N. J., Brookmeyer, R., and Polk, B. F. (1987) Effect of HIV antibody disclosure on subsequent sexual activity in homosexual men. AIDS 1:241-246.

Fraker, T., and Maynard, R. (1986) The Adequacy of Comparison Group Design for Evaluations of Employment-Related Programs. Princeton, N.J.: Mathematica Policy Research.

Friedman, S. R., Rosenblum, A., Goldsmith, D., Des Jarlais, D. C., Sufian, M., et al. (1989) Risk factors for HIV-1 infection among street-recruited intravenous drug users in New York City. Presented at the Fifth International Conference on AIDS, Montreal, June 4-9.

Fuller, R. K., Branchey, L., Brightwell, D. R., Derman, R. M., Emrick, C. D., et al. (1986) Disulfiram treatment of alcoholism: A Veterans Administration cooperative study. Journal of the American Medical Association 256:1449-1455.

Goldberger, A. S., and Duncan, O. D., eds. (1973) Structural Equation Models in the Social Sciences. New York: Seminar Press.

Gostin, L., and Ziegler, A. (1987) A review of AIDS-related legislative and regulatory policy in the United States. Law, Medicine & Health Care 15:5-16.

Hartigan, J. (1986) Discussion 3: Alternative methods for evaluating the impact of intervention. In H. Wainer, ed., Drawing Inferences from Self-Selected Samples. New York: Springer-Verlag.

Heckman, J. J. (1979) Sample selection bias as a specification error. Econometrica 47:153-161.

Heckman, J. J., and Robb, R. (1985a) Alternative methods for evaluating the impact of interventions: An overview. Journal of Econometrics 30:239-267.

Heckman, J. J., and Robb, R. (1985b) Alternative methods for evaluating the impact of interventions. In J. Heckman and B. Singer, eds., Longitudinal Analysis of Labor Market Data. Cambridge: Cambridge University Press.

Heckman, J. J., and Robb, R. (1986a) Alternative methods for solving the problem of selection bias in evaluating the impact of treatments on outcomes. In H. Wainer, ed., Drawing Inferences from Self-Selected Samples. New York: Springer-Verlag.

Heckman, J. J., and Robb, R. (1986b) Postscript: A rejoinder to Tukey. In H. Wainer, ed., Drawing Inferences from Self-Selected Samples. New York: Springer-Verlag.

Heckman, J. J., and Hotz, V. J. (1989a) Choosing among alternative nonexperimental methods for estimating the impact of social programs: The case of manpower training. Journal of the American Statistical Association 84:862-874.

Heckman, J. J., and Hotz, V. J. (1989b) Rejoinder. Journal of the American Statistical Association 84:878-880.

Hennigan, K. M., Del Rosario, M. L., Heath, L., Cook, T. D., Wharton, J. D., and Calder, B. J. (1982) Impact of the introduction of television on crime in the United States: Empirical findings and theoretical implications. Journal of Personality and Social Psychology 42:461-477.

Hill, A. B. (1971) Principles of Medical Statistics. 9th ed. New York: Oxford University Press.

Holland, P. W. (1989) Comment: It's very clear. Journal of the American Statistical Association 84:875-877.

Hubbard, R. L., Marsden, M. E., Cavanaugh, E., Rachal, J. V., and Ginzburg, H. M. (1988) Role of drug-abuse treatment in limiting the spread of AIDS. Reviews of Infectious Diseases 10:377-384.

Joseph, J. G., Montgomery, S. B., Emmons, C. A., Kessler, R. C., Ostrow, D. G., et al. (1987) Magnitude and determinants of behavioral risk reduction: Longitudinal analysis of a cohort at risk for AIDS. Psychology and Health 1:73-95.

Kelly, J. A., St. Lawrence, J. S., Hood, H. V., and Brasfield, T. L. (1989) Behavioral intervention to reduce AIDS risk activities. Journal of Consulting and Clinical Psychology 57:60-67.

Kelly, J. A., St. Lawrence, J. S., Stevenson, L. Y., Diaz, Y. E., Hauth, A. C., et al. (1990) Population-wide risk behavior reduction through diffusion of innovation following intervention with natural opinion leaders. Presented at the Sixth International Conference on AIDS, San Francisco, June 23.

LaLonde, R. J. (1986) Evaluating the econometric evaluations of training programs with experimental data. American Economic Review 76:604-620.

Lohr, W. (1972) An historical view of the research on the behavioral and organizational factors related to the utilization of health services. Social and Economic Analysis Division, Bureau for Health Services Research and Evaluation, Rockville, Md. January.

Lord, F. M. (1967) A paradox in the interpretation of group comparisons. Psychological Bulletin 68:304-305.

Maddala, G. S. (1983) Limited-Dependent and Qualitative Variables in Econometrics. Cambridge: Cambridge University Press.

Magidson, J. (1977) Toward a causal model approach for adjusting for preexisting differences in the nonequivalent control group situation. Evaluation Quarterly 1:399-420.

Martin, J. L., and Dean, L. (1989) Risk factors for AIDS related bereavement in a cohort of homosexual men in New York City. In B. Cooper and T. Helgason, eds., Epidemiology and the Prevention of Mental Disorders. London: Routledge & Kegan Paul.

Maxwell, S. E., and Delaney, H. D. (1990) Designing Experiments and Analyzing Data. Belmont, Calif.: Wadsworth Publishing.

McCusker, J., Stoddard, A. M., Mayer, K. H., Zapka, J., Morrison, C., and Saltzman, S. P. (1988) Effects of HIV antibody test knowledge on subsequent sexual behaviors in a cohort of homosexually active men. American Journal of Public Health 78:462-467.

McGlothlin, W. H., and Anglin, M. D. (1981) Shutting off methadone. Archives of General Psychiatry 38:885-892.

McKay, H., McKay, A., and Sinisterra, L. (1973) Stimulation of Intellectual and Social Competence in Colombian Preschool-Age Children Affected by the Multiple Deprivations of Depressed Urban Environments. Second Progress Report. Cali, Colombia: Human Ecology Research Station, Universidad del Valle. September.

McKusick, L., Horstman, W., and Coates, T. J. (1985) AIDS and sexual behavior reported by gay men in San Francisco. American Journal of Public Health 75:493-496.

Miller, H. G., Turner, C. F., and Moses, L. E. (1990) AIDS: The Second Decade. Report of the NRC Committee on AIDS Research and the Behavioral, Social, and Statistical Sciences. Washington, D.C.: National Academy Press.

Moffitt, R. A. (1989) Comment. Journal of the American Statistical Association 84:877-880.

Moffitt, R. A. (1990) Applying Heckman methods for program evaluation to CDC AIDS prevention programs. Presented at the NRC Conference on Nonexperimental Approaches to Evaluating AIDS Prevention Programs, Washington, D.C., January 12-13.

Mood, A. M. (1950) Introduction to the Theory of Statistics. New York: McGraw-Hill.

Nelson, K. E., Vlahov, D., Margolick, J., and Bernal, M. (1989) Blood and plasma donations among a cohort of IV drug users. Presented at the Fifth International Conference on AIDS, Montreal, June 4-9.

Riecken, H. W., and Boruch, R. F., eds. (1974) Social Experimentation: A Method for Planning and Evaluating Social Intervention. Report of a Committee of the Social Science Research Council. New York: Academic Press.

Silverman, W. A. (1977) The lesson of retrolental fibroplasia. Scientific American 236(6):100-107.

Smith, H. S. (1957) Interpretation of adjusted treatment means and regressions in analysis of covariance. Biometrics 13:282-308.

Transportation Research Board (1984) 55: A Decade of Experience. Special Report 204 of the NRC Committee for the Study of the Benefits and Costs of the 55 MPH National Maximum Speed Limit. Washington, D.C.: National Academy Press.

Tukey, J. W. (1986a) Comments. In H. Wainer, ed., Drawing Inferences from Self-Selected Samples. New York: Springer-Verlag.

Tukey, J. W. (1986b) Discussion 4: Mixture modeling versus selection modeling with nonignorable nonresponse. In H. Wainer, ed., Drawing Inferences from Self-Selected Samples. New York: Springer-Verlag.

Turner, C. F., and Martin, E., eds. (1984) Surveying Subjective Phenomena. Two volumes. New York: Russell Sage.

Turner, C. F., Miller, H. G., and Moses, L. E., eds. (1989) AIDS, Sexual Behavior, and Intravenous Drug Use. Report of the NRC Committee on AIDS Research and the Behavioral, Social, and Statistical Sciences. Washington, D.C.: National Academy Press.

Valdiserri, R. O., Lyter, D. W., Leviton, L. C., Callahan, C. M., Kingsley, L. A., and Rinaldo, C. R. (1989) AIDS prevention in homosexual and bisexual men: Results of a randomized trial evaluating two risk reduction interventions. AIDS 3:21-26.

Wilder, C. S. (1972) Physician Visits, Volume and Interval Since Last Visit, United States, 1969. Vital and Health Statistics, Series 10, No. 75. Rockville, Md.: National Center for Health Statistics.

Winkelstein, W., Samuel, M., Padian, N. S., Wiley, J. A., Lang, W., Anderson, R. E., and Levy, J. A. (1987) The San Francisco Men's Health Study. III. Reduction in human immunodeficiency virus transmission among homosexual/bisexual men, 1982-86. American Journal of Public Health 77:685-689.

Ziffer, A., and Ziffer, J. (1989) The need for psychosocial emphasis in academic courses on AIDS. Presented at the Fifth International Conference on AIDS, Montreal, June 4-9.