

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




OCR for page 342
APPENDIX F

The Use of Selection Modeling to Evaluate AIDS Interventions with Observational Data

Robert Moffitt

I. INTRODUCTION

This paper considers the potential applicability to AIDS interventions of nonexperimental evaluation methods developed by econometricians in the evaluation of social programs. Econometric methods for program evaluation have been studied by economists over the past twenty years and are by now quite well developed. To give the discussion a focus, two types of interventions are considered: AIDS counseling and testing (C&T) programs, and AIDS programs run by community-based organizations (CBOs). While both C&T programs and CBOs are quite diverse, especially the CBOs, many are designed to encourage the adoption of sexual prevention behaviors and to encourage risk reduction behaviors more generally. It is this outcome that will be the focus of the analysis here.

This paper was presented at the Conference on Nonexperimental Approaches to Evaluating AIDS Prevention Programs convened on January 12-13, 1990 by the Panel on the Evaluation of AIDS Interventions of the National Research Council. The views expressed in this paper are those of the author; they should not be attributed to the Panel or to the NRC. Comments on an earlier version of the paper from James Heckman, V. Joseph Hotz, Roderick Little, Charles Manski, and Lincoln Moses are appreciated. A version of this paper is to appear in Evaluation Review.

In the next section of the paper, a brief historical overview of the development of econometric methods for program evaluation is given. Following that, in Section III, a more formal statistical exposition of those methods is given. This section constitutes the major part of the paper. The conclusion of the discussion is that nonexperimental evaluation in general requires either adequate data on the behavioral histories of participants and non-participants in the interventions, or the availability of identifying variables ("Z's") that affect the availability of the treatment to different individuals but not their behavior directly. Whether either of these conditions can be met in the evaluation of AIDS interventions is then discussed in Section IV for C&T and CBO programs. A summary and conclusion is provided in Section V.

II. HISTORICAL DEVELOPMENT OF ECONOMETRIC METHODS FOR PROGRAM EVALUATION

Most of the econometric methods for program evaluation have been designed to evaluate government-sponsored manpower training programs, where the major issue has been whether such programs increase individual earnings and other indicators of labor market performance. Such programs began to appear in the early 1960s with the Manpower Development and Training Act (MDTA) of 1962, and grew more extensive in the late 1960s as part of the War on Poverty. They became a fixture in the 1970s and 1980s, though changing in name and form from, for example, the Comprehensive Employment and Training Act (CETA) program in the 1970s to the Job Training Partnership Act (JTPA) program in the 1980s. However, economists have also conducted extensive studies of welfare programs of other types, of health and education programs, and many others.

One of the earliest studies (Ashenfelter, 1978) presented an econometric model for the estimation of the effect of the MDTA program on earnings using observational data.
Many studies were later conducted of the CETA program and have been surveyed by Barnow (1987). No major evaluation studies of the JTPA program have been completed, although one is currently underway. A recent study of an experimental training program called Supported Work has been published by Heckman and Hotz (1989), and will be discussed further below.

The econometric literature on program evaluation underwent a major alteration in its formal framework after the separate development of "selectivity bias" techniques in the mid-1970s. Originally, the selectivity bias issue in economics concerned a missing-data problem that arises in the study of individual labor market behavior, namely, the inherent unobservability of the potential market earnings of individuals who are not working. The development of techniques for properly estimating such potential earnings (Gronau, 1974; Lewis, 1974; Heckman, 1974) was quickly realized to have relevance to the estimation of the effects of public programs on economic behavior. As will be discussed extensively in the next section, a similar selectivity bias problem arises in observational evaluation studies through the inherent unobservability of what would have happened to program participants had they not received the treatment, and of what would have happened to non-participants had they undergone the treatment. The connecting link was first explicitly made by Barnow, Cain, and Goldberger (1980), which included a comparison of the new technique with earlier techniques. A textbook treatment of the applicability of selectivity bias methods to program evaluation became available shortly thereafter (Maddala, 1983), as well as a survey of the applicability of those methods to health interventions in particular (Maddala, 1985). The recent work of Heckman and Robb (1985a, 1985b) represents the most complete and general statement of the selectivity bias problem in program evaluation using observational data, and provides the most thorough analysis of the conditions under which the methods will yield good estimates and of the estimation methods available to obtain such estimates. The analysis in the next section of this paper is heavily influenced by the work of Heckman and Robb, which is itself built upon the almost twenty years of work on econometric methods for program evaluation.

III. THE STATISTICS OF PROGRAM EVALUATION WITH OBSERVATIONAL DATA

Although the statistical methods exposited in this section are applicable to any program in principle and will be developed fairly abstractly, it may help for specificity to consider the evaluation of C&T programs.
Such programs have many goals but, for present purposes, it will be assumed that the major goal is to encourage those who receive the services of the program to adopt risk reduction activities and sexual prevention behaviors to reduce the likelihood of HIV infection to themselves and to others. The aim of the evaluation is to determine whether such programs do indeed have such effects and, if so, to provide an estimate of their magnitude.

To begin the formal analysis, let Y be the outcome variable (e.g., level of prevention behavior) and make the following definitions:

$Y^{*}_{it}$ = Level of outcome variable for individual i at time t, assuming he has not received the "treatment" (i.e., the services of a C&T program)

$Y^{**}_{it}$ = Level of outcome variable for individual i at time t, assuming he has received the treatment at some prior date

The difference between these two quantities is the effect of the treatment, denoted $\alpha$:

$$Y^{**}_{it} = Y^{*}_{it} + \alpha \qquad (1)$$

or

$$\alpha = Y^{**}_{it} - Y^{*}_{it} \qquad (2)$$

The aim of the evaluation is to obtain an estimate of the value of $\alpha$, the treatment effect, from whatever data are available.¹

The easiest way to think about what we seek in an estimate of $\alpha$ is to consider individuals who have gone through a C&T program and therefore have received the treatment, and for whom we later measure their value of $Y^{**}_{it}$. Ideally, we wish to know the level of $Y^{*}_{it}$ for such individuals: we would like to know what their level of prevention behaviors, for example, would have been had they not gone through the program. If $Y^{*}_{it}$ could be known, the difference between it and $Y^{**}_{it}$ would be a satisfactory estimate² of $\alpha$. The difficulty that arises is that we do not observe $Y^{*}_{it}$ directly for these individuals, but only the values of $Y^{*}_{it}$ for non-participants.

Define a dummy variable for whether an individual has or has not received the treatment:

$d_i = 1$ if individual i has received the treatment
$d_i = 0$ if individual i has not received the treatment

Then a satisfactory estimate of $\alpha$ could be obtained by estimating the difference between $Y^{**}_{it}$ and $Y^{*}_{it}$ for those who went through the program:

$$\hat{\alpha} = E(Y^{**}_{it} \mid d_i = 1) - E(Y^{*}_{it} \mid d_i = 1) \qquad (3)$$

where E is the expectation operator. The estimate $\hat{\alpha}$ in (3) is, in fact, the estimate that would be obtained if we had administered a randomized trial for the evaluation. For example, as individuals come in through the door of a C&T program, they could be randomly assigned to treatment status or control status, where the latter would involve receiving none of the services of the program. At some later date we could measure the levels of Y for the two groups and calculate (3) to obtain an estimate of the effect of the program.

¹ For simplicity, the treatment effect, $\alpha$, is assumed to be constant over time and across individuals, and to be non-random. Random treatment effects across individuals have been incorporated by Björklund and Moffitt (1987) and are discussed by Heckman and Robb (1985a, 1985b).

² In standard econometric practice, $Y^{*}_{it}$ is set equal to $X\beta + \epsilon$, where X is a vector of observed variables, $\beta$ is its coefficient vector, and $\epsilon$ is an error term.

The Problem

The first key point of the statistical literature on evaluation is that observational, nonexperimental data do not allow us to calculate (3) and therefore do not allow us to compute the estimate that could be obtained with a randomized trial. This is simply because we generally do not observe in such data any individuals who would have taken the treatment but did not; we generally observe only individuals who did not take the treatment at all.³ What we can estimate with nonexperimental data is an effect denoted here as $\tilde{\alpha}$:

$$\tilde{\alpha} = E(Y^{**}_{it} \mid d_i = 1) - E(Y^{*}_{it} \mid d_i = 0) \qquad (4)$$

which is just the difference between the mean Y for participants, those who did take the treatment ($d_i = 1$), and the mean Y for non-participants, those who did not undergo the treatment ($d_i = 0$).

When will the estimate we are able to calculate, $\tilde{\alpha}$, equal the estimate we would have obtained with a randomized trial, $\hat{\alpha}$? Comparison of (3) and (4) shows that the two will be equal if and only if the following condition is true:

$$E(Y^{*}_{it} \mid d_i = 1) = E(Y^{*}_{it} \mid d_i = 0) \qquad (5)$$

In words, the two estimates of $\alpha$ are equal only if the value of $Y^{*}_{it}$ for those who did not take the treatment equals the value of $Y^{*}_{it}$ that those who did take the treatment would have had, had they not gone through the program.

The heart of the nonexperimental evaluation problem is reflected in equation (5), and an understanding of that equation is necessary to understand the pervasiveness and unavoidability of what is termed the "selection bias" problem when observational data are employed. The equation will fail to hold under many plausible circumstances. For example, if those who go through a C&T program are concerned with their health and have already begun adopting prevention behaviors even before entering the program, they will be quite different from those who do not go through the program even prior to receiving any program services.
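The gap between equations (3) and (4) can be made concrete with a small numerical sketch. Everything in it is hypothetical: the six individuals, their untreated outcomes, and the assumed treatment effect of 2 are invented for illustration, and the untreated outcomes of participants are visible only because the data are simulated.

```python
from statistics import mean

ALPHA = 2.0  # assumed true treatment effect (illustrative value only)

# Hypothetical individuals: (untreated outcome Y*, participation indicator d).
# Participants are given higher untreated outcomes to mimic self-selection.
people = [
    (6.0, 1), (7.0, 1), (8.0, 1),   # participants
    (3.0, 0), (4.0, 0), (5.0, 0),   # non-participants
]

def observed_outcome(y_star, d):
    """Observed Y: Y** = Y* + alpha for participants, Y* otherwise (equation 1)."""
    return y_star + ALPHA * d

def nonexperimental_estimate(sample):
    """Equation (4): mean observed Y of participants minus that of non-participants."""
    treated = [observed_outcome(y, d) for y, d in sample if d == 1]
    untreated = [observed_outcome(y, d) for y, d in sample if d == 0]
    return mean(treated) - mean(untreated)

def levels_gap(sample):
    """Violation of equation (5): E(Y* | d=1) - E(Y* | d=0).
    Computable here only because the counterfactual is simulated."""
    return mean(y for y, d in sample if d == 1) - mean(y for y, d in sample if d == 0)

estimate = nonexperimental_estimate(people)
gap = levels_gap(people)
```

In this constructed example the nonexperimental estimate exceeds the assumed effect by exactly the gap in untreated levels, which is the quantity that equation (5) requires to be zero.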
Hence equation (5) will fail to hold because those who go through the program have different levels of $Y^{*}_{it}$, that is, different levels of prevention behavior even in the absence of receiving any program services. Our estimate $\tilde{\alpha}$ will be too high relative to $\hat{\alpha}$, for the greater level of prevention behaviors observed for the treatment group subsequent to receiving services was present even prior to the treatment, and is therefore not a result of the treatment itself. Those who are observed to have actually gone through the program are therefore a "self-selected" group out of the pre-treatment population, and the estimate $\tilde{\alpha}$ is contaminated by "selectivity bias" because of such self-selection.

³ Some evaluation designs make use of individuals on waiting lists as controls. Unfortunately, these individuals may not be randomly selected from the applicant pool; if they are not, their Y values will not be an accurate proxy for those of participants.

The unavoidability of the potential for selectivity bias arises because the validity of equation (5) cannot be tested, even in principle, for the left-hand side of that equation is inherently unobservable. It is impossible in principle to know what the level of $Y^{*}_{it}$ for those who went through the program would have been had they not gone through it, for that level of $Y^{*}_{it}$ is a "counterfactual" that can never be observed. We may know, as discussed further below, the pre-treatment level of Y for those who later undergo treatment, but this is not the same as the $Y^{*}_{it}$ we seek for the left-hand side of (5); we need to know the level of $Y^{*}_{it}$ for program participants that they would have had at exactly the same time as they went through the treatment, not at some previous time.⁴

Solutions

There are three general classes of potential solutions to the selection bias problem (Heckman and Robb, 1985a, 1985b). None guarantees the elimination of the problem altogether, but rather each seeks to determine possible circumstances under which the problem could be eliminated. The question is then whether those circumstances hold. At the outset, it is important to note that two of the three solution methods have important implications for evaluation design because they require the collection of certain types of data. Whether those data can be collected for AIDS programs then becomes the most important question, which is discussed in detail in Section IV.

Solution 1: Identifying Variables ("Z's")

The selection bias problem can be solved if a variable $Z_i$ is available, or one can be found, that (1) affects the probability that an individual receives the treatment but (2) has no direct relationship to $Y^{*}_{it}$ (e.g., no direct relationship to individual prevention behavior).

What is an example of such a $Z_i$? A $Z_i$ could be constructed if, for example, a C&T program were funded by CDC in one city and not in another for political or bureaucratic reasons unrelated to the needs of the populations in the two cities, and therefore unrelated to the likelihood that the individuals in the two cities practice prevention behaviors. If a random sample of the relevant subpopulations were drawn in the two cities and data on Y were collected (the data would include both participants and non-participants in the city where the C&T program was funded), a comparison of the mean values of Y in the two could form the basis for a valid estimate⁵ of $\alpha$. The variable $Z_i$ in this case should be thought of as a dummy variable equal to 1 in the C&T city and 0 in the other. The variable satisfies the two conditions given above: it obviously affects whether individuals in the two cities receive the treatment, since if $Z_i = 0$ no C&T treatment is available, and it is unrelated to the level of $Y^{*}_{it}$ in the two cities because the funding decision was made for reasons unrelated to sexual prevention behavior.⁶ This estimation method is known in econometrics as an "instrumental-variable" method and $Z_i$ is termed an "instrument." It is an instrument in the sense that it is used to proxy the treatment variable itself.

⁴ It may be noted that Manski (1990) has pointed out that if $Y^{*}_{it}$ is bounded (e.g., between 0 and 1), a worst-case/best-case analysis can be conducted in which the unobserved counterfactual is taken to equal each of the bounds in turn. This gives a range in which the true effect must lie instead of a point estimate.

What is an example of an illegitimate $Z_i$? The same dummy variable defined in the previous example would be illegitimate if the CDC funding decision were based not on political or bureaucratic decisions but on the relative level of need in the two cities. For example, if the C&T program were placed in the city with the higher rate of HIV infection, then $Z_i$ would not be independent of $Y^{*}_{it}$: the presence of the C&T program would be associated with lower levels of prevention behavior not because of a negative causal effect of the program but because of the reason for its placement.

Further examples of legitimate and illegitimate "Z's" will be discussed in Section IV.
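The two-city comparison can be sketched numerically using the impact formula given in the accompanying footnote, the difference in city means divided by the participation rate. The city populations, the participation pattern, and the assumed effect of 2 below are all hypothetical; the second case mimics placement of the program in the needier city.

```python
from statistics import mean

ALPHA = 2.0  # assumed true treatment effect (illustrative value only)

def city_summary(untreated, participates):
    """Return (mean observed Y, participation rate) for one hypothetical city."""
    observed = [y + ALPHA * d for y, d in zip(untreated, participates)]
    return mean(observed), mean(participates)

def two_city_estimate(y1, y0, p):
    """Impact estimate (Y1 - Y0) / p: program city versus the city without one."""
    return (y1 - y0) / p

# Legitimate Z: both cities have the same untreated outcomes Y*.
base = [3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
enters_program = [0, 0, 0, 1, 1, 1]   # self-selection: high-Y* individuals enter
no_program = [0, 0, 0, 0, 0, 0]

y1, p = city_summary(base, enters_program)
y0, _ = city_summary(base, no_program)
legitimate = two_city_estimate(y1, y0, p)

# Illegitimate Z: the program city has lower untreated outcomes (greater need).
needier = [y - 1.0 for y in base]
y1_needy, p_needy = city_summary(needier, enters_program)
illegitimate = two_city_estimate(y1_needy, y0, p_needy)
```

With the legitimate Z the sketch recovers the assumed effect even though participants are self-selected within the program city; with placement based on need, the lower baseline in the program city happens to offset the effect entirely in this constructed case.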

Solution 2: Parametric Distributional Assumptions on $Y^{*}_{it}$

A second solution to the selection bias problem arises if a parametric distributional assumption on $Y^{*}_{it}$ can be safely made or determined with reasonable certainty (Heckman and Robb, 1985a, 1985b). For example, if $Y^{*}_{it}$ follows a normal, logistic, or some other distribution with a finite set of parameters, identification of a program effect free of selection bias is possible. The reasons are relatively technical and difficult to explain in simple terms. However, this method will not be especially useful for the AIDS interventions because very little is known about the distribution of sexual prevention behaviors, for example, in the total population or even the high-risk population. Consequently, this method will not be considered further.

⁵ For example, if $Y_1$ is the mean value of the outcome variable in the city with the program and $Y_0$ is the mean value in the city without one, and if p is the fraction of the relevant subpopulation in the first city that participated in the program, the impact estimate would be calculated as $(Y_1 - Y_0)/p$.

⁶ Essentially, this is a case of what is often termed a "natural" experiment. It is similar to an experiment inasmuch as the probability of having the treatment available is random with respect to the outcome variable under study.

Solution 3: Availability of Cohort Data

A third solution method requires the availability of "cohort," "longitudinal," or "panel" data, that is, data on the same individuals at several points in time before and after some of them have undergone the treatment. In the simplest case, data on Y are available not only after the treatment but also before, giving a data set with one pre-treatment observation and one post-treatment observation for each individual, both participants and non-participants. In the more general case, three or more points in time may be available in the data. The use of such cohort data is sufficiently important to warrant an extended discussion.

To illustrate this method, first consider the situation that would arise if data at two points in time were available, one before the treatment and one after it. Let "t" denote the post-treatment point and "t-1" denote the pre-treatment point. Then, analogously with the cross-section case considered previously,

$Y^{*}_{it} - Y^{*}_{i,t-1}$ = Change in Y from t-1 to t in the absence of having undergone the treatment

$Y^{**}_{it} - Y^{*}_{i,t-1}$ = Change in Y from t-1 to t if having undergone the treatment

Then the effect of the treatment is $\alpha$, and

$$Y^{**}_{it} - Y^{*}_{i,t-1} = (Y^{*}_{it} - Y^{*}_{i,t-1}) + \alpha \qquad (6)$$

Since $Y^{*}_{i,t-1}$ cancels out on both sides of (6), (6) is the same as (1) and therefore the true effect, $\alpha$, is the same.

As before, a preferred estimate of the effect of the program could be obtained by a randomized trial in which those wishing to undergo the treatment ($d_i = 1$) are randomly assigned to participation or non-participation status.
With data on both pre-treatment and post-treatment status, the estimate of the program effect could be calculated as:

$$\hat{\alpha} = E(Y^{**}_{it} - Y^{*}_{i,t-1} \mid d_i = 1) - E(Y^{*}_{it} - Y^{*}_{i,t-1} \mid d_i = 1) \qquad (7)$$

However, with observational data the second term on the right-hand side of (7) is not measurable since, once again, we cannot measure $Y^{*}_{it}$ for those who undergo the treatment. We can instead only use the data on $Y^{*}_{it}$ available from non-participants to estimate the program effect as follows:

$$\tilde{\alpha} = E(Y^{**}_{it} - Y^{*}_{i,t-1} \mid d_i = 1) - E(Y^{*}_{it} - Y^{*}_{i,t-1} \mid d_i = 0) \qquad (8)$$

The estimate $\tilde{\alpha}$ is often called a "differences" estimate because it is computed by comparing the first-differenced values of Y for participants and non-participants.

The estimate we are able to obtain in (8) will equal that we could have obtained in the randomized trial, (7), if and only if

$$E(Y^{*}_{it} - Y^{*}_{i,t-1} \mid d_i = 1) = E(Y^{*}_{it} - Y^{*}_{i,t-1} \mid d_i = 0) \qquad (9)$$

Equation (9) is the key equation for the two data-point case and is the analogue to equation (5) in the single post-treatment data case. The equation shows that a data set with a pre-treatment and post-treatment observation will yield a good estimate of $\alpha$ if the change in $Y^{*}_{it}$ from pre to post would have been the same for participants, had they not undergone the treatment, as it actually was for non-participants. Sometimes the change in $Y^{*}_{it}$ is referred to as the "growth rate" of $Y^{*}_{it}$, in which case we may say that our nonexperimental estimate requires that the growth rate of participants and non-participants be the same in the absence of the treatment.

Perhaps the most important point is that this condition may hold even though the condition in (5) does not. Equation (5), the condition that must hold for the nonexperimental estimate in a single post-treatment cross-section to be correct, requires that the levels of $Y^{*}_{it}$ be the same for participants and non-participants in the absence of the treatment. Equation (9), on the other hand, only requires that the growth rates of $Y^{*}_{it}$ be the same for participants and non-participants in the absence of the treatment, even though the levels may differ. The latter is a much weaker condition and will more plausibly hold. The nature of the condition is illustrated in panels (a) and (b) of Figure 1.
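A short sketch may help fix the difference between the cross-section comparison in (4) and the differences estimate in (8). The panel below is hypothetical: it was built with different pre-treatment levels for the two groups, a common untreated growth of 1, and an assumed treatment effect of 2, so that condition (9) holds while condition (5) does not.

```python
from statistics import mean

# Hypothetical panel: (Y at t-1, observed Y at t, participation indicator d).
panel = [
    (6.0, 9.0, 1), (7.0, 10.0, 1), (8.0, 11.0, 1),   # participants
    (3.0, 4.0, 0), (4.0, 5.0, 0), (5.0, 6.0, 0),     # non-participants
]

def cross_section_estimate(rows):
    """Equation (4): difference in post-treatment means only."""
    return (mean(post for _, post, d in rows if d == 1)
            - mean(post for _, post, d in rows if d == 0))

def differences_estimate(rows):
    """Equation (8): difference in mean changes from t-1 to t."""
    return (mean(post - pre for pre, post, d in rows if d == 1)
            - mean(post - pre for pre, post, d in rows if d == 0))

levels_based = cross_section_estimate(panel)
change_based = differences_estimate(panel)
```

Here the cross-section estimate absorbs the pre-existing gap in levels, while the differences estimate returns the effect that was built into the data.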
In panel (a), the pre-treatment levels of non-participants and participants, A and A', respectively, are quite different: participants have a higher level of Y, as would be the case, for example, if those who later undergo C&T have higher prevention behaviors in the first place. From t-1 to t, the level of Y for non-participants grows from A to B, as might occur if everyone in the population under consideration (e.g., homosexual men) were increasing their degree of risk reduction behaviors even without participating in a C&T program. The figure shows, for illustration, a growth rate of Y for participants from A' [...]

[FIGURE 1, panels (a) and (b): level of Y at times t-1 and t for participants and non-participants.]
[...] assumption cannot be verified because point B' is not observed; it is only a "counterfactual." But clearly the estimate in the figure would be a much better estimate than that obtained from a single post-treatment cross-section, which would take the vertical distance between B and C as the treatment estimate. This would be invalid because equation (5) does not hold.

Panel (b) in Figure 1 shows a case where condition (9) breaks down. On that panel, a case is shown in which the Y of participants would have grown faster than that for non-participants even in the absence of the treatment (A' to B' is greater than A to B). This might arise, for example, if those individuals who choose to undergo C&T are adopting risk reduction behaviors more quickly than non-participants. In this case, our estimate of $\alpha$ is too high, since it measures the vertical distance between B'' and C instead of between B' and C. Neither B' nor B'' is observed, so we cannot know which case holds.

The primary conclusion to be drawn from this discussion is that we may be able to do better in our estimate of program effect with more data. Adding a single pre-treatment data point permits us to compute an estimate of the treatment effect (the differences estimator in (8)) that may be correct in circumstances in which the estimator using a single post-treatment observation is not. The importance of having additional data on the histories of Y, or the sexual behavior histories of C&T participants and non-participants, for example, stands in contrast to the situation faced when conducting a randomized trial where, strictly speaking, only a single post-treatment cross section is required. Thus we conclude that more data may be required for valid inference in nonexperimental evaluations than in experimental evaluations.
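The breakdown described for panel (b) can also be sketched with invented numbers. In the hypothetical data below, participants would have grown by 2 and non-participants by 1 even without the treatment, and the assumed treatment effect is 2; the untreated growth of participants is known only because the data are simulated.

```python
from statistics import mean

ALPHA = 2.0  # assumed true treatment effect (illustrative value only)

# Hypothetical individuals: (untreated growth in Y from t-1 to t, indicator d).
people = [
    (2.0, 1), (2.0, 1), (2.0, 1),   # participants: faster untreated growth
    (1.0, 0), (1.0, 0), (1.0, 0),   # non-participants
]

def observed_change(growth, d):
    """Observed change in Y: untreated growth plus the effect if treated."""
    return growth + ALPHA * d

def differences_estimate(rows):
    """Equation (8) applied to the observed changes."""
    return (mean(observed_change(g, d) for g, d in rows if d == 1)
            - mean(observed_change(g, d) for g, d in rows if d == 0))

def growth_gap(rows):
    """Violation of condition (9): difference in untreated growth rates."""
    return (mean(g for g, d in rows if d == 1)
            - mean(g for g, d in rows if d == 0))

estimate = differences_estimate(people)
gap = growth_gap(people)
```

The differences estimate overstates the assumed effect by exactly the gap in untreated growth, which corresponds to measuring from B'' rather than from B' in the figure.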
This point extends to the availability of additional pre-treatment observations.⁷ Suppose, for example, that an additional pre-treatment observation is available at time t-2. The estimate calculable in a randomized trial is

$$\hat{\alpha} = E[(Y^{**}_{it} - Y^{*}_{i,t-1}) - (Y^{*}_{i,t-1} - Y^{*}_{i,t-2}) \mid d_i = 1] - E[(Y^{*}_{it} - Y^{*}_{i,t-1}) - (Y^{*}_{i,t-1} - Y^{*}_{i,t-2}) \mid d_i = 1] \qquad (10)$$

while the estimate permitted in an observational study is

$$\tilde{\alpha} = E[(Y^{**}_{it} - Y^{*}_{i,t-1}) - (Y^{*}_{i,t-1} - Y^{*}_{i,t-2}) \mid d_i = 1] - E[(Y^{*}_{it} - Y^{*}_{i,t-1}) - (Y^{*}_{i,t-1} - Y^{*}_{i,t-2}) \mid d_i = 0] \qquad (11)$$

⁷ Gathering data from additional post-treatment observations is easier but does not serve the appropriate control function. Prior to the treatment, it is known with certainty that the program could have no true effect; after the treatment, it cannot be known with certainty what the pattern of the effect is, assuming it has an effect. Consequently, participant/non-participant differences in Y after the treatment can never be treated with absolute certainty as reflecting selection bias rather than a true effect.

[...] the nonexperimental estimator. In the general case, a slight modification in the model allows us to write the estimate of the treatment effect as the following:⁸

$$\tilde{\alpha} = E(Y^{**}_{it} \mid d_i = 1, Y^{*}_{i,t-1}, Y^{*}_{i,t-2}, \ldots, Y^{*}_{i,t-k}) - E(Y^{*}_{it} \mid d_i = 0, Y^{*}_{i,t-1}, Y^{*}_{i,t-2}, \ldots, Y^{*}_{i,t-k}) \qquad (13)$$

assuming that data are available for k pre-treatment periods. This estimator will equal that obtainable in a randomized trial if and only if the following condition holds:

$$E(Y^{*}_{it} \mid d_i = 1, Y^{*}_{i,t-1}, \ldots, Y^{*}_{i,t-k}) = E(Y^{*}_{it} \mid d_i = 0, Y^{*}_{i,t-1}, \ldots, Y^{*}_{i,t-k}) \qquad (14)$$

This condition can be interpreted as requiring that the values of $d_i$ and $Y^{*}_{it}$ must be independent of one another conditional upon the history of $Y^{*}_{it}$ up to t-1. Put differently, it must be the case that if we observe two individuals at time t-1 who have exactly the same history of $Y^{*}_{it}$ up to that time (e.g., the exact same history of sexual prevention behaviors), and who therefore look exactly alike to the investigator, they must have the same value of $Y^{*}_{it}$ in the next time period regardless of whether they do or do not undergo the treatment. If, on the other hand, the probability of entering a C&T program is related to the value of $Y^{*}_{it}$ they would have had if the treatment were not available, the condition in equation (14) will not hold and the nonexperimental estimate will be inaccurate.

The Relationship between Data Availability and Testing of Assumptions

The discussion thus far has demonstrated that the availability of certain types of data (information on legitimate "Z" variables, or on individual histories) is related to the conditions that must hold, and the assumptions that must be made, in order to obtain an estimate of program effect similar to that obtainable in a randomized trial. A natural question is whether any of the assumptions can be tested, and whether it can be determined if the conditions do or do not hold. The answer to this question once again is related to data availability.
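Before turning to testing, the history-conditioned estimate in equation (13) can be given a simple numerical form. The sketch below is one possible reading, not the regression model used in the literature cited: it compares participants and non-participants who share an identical pre-treatment history, and averages those gaps weighted by the number of participants. The records, the exact-matching approach, and the weighting are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (pre-treatment history of Y, indicator d, observed Y at t).
records = [
    ((2.0, 3.0), 1, 6.0),
    ((2.0, 3.0), 0, 4.0),
    ((5.0, 6.0), 1, 9.0),
    ((5.0, 6.0), 1, 9.0),
    ((5.0, 6.0), 0, 7.0),
]

def history_matched_estimate(rows):
    """Participant/non-participant gap in Y among individuals with the same
    history, averaged with weights equal to the number of participants."""
    groups = defaultdict(lambda: {0: [], 1: []})
    for history, d, y in rows:
        groups[history][d].append(y)
    weighted_sum, total_weight = 0.0, 0
    for by_status in groups.values():
        if by_status[0] and by_status[1]:      # both groups needed for a comparison
            gap = mean(by_status[1]) - mean(by_status[0])
            weighted_sum += gap * len(by_status[1])
            total_weight += len(by_status[1])
    return weighted_sum / total_weight

def unconditional_estimate(rows):
    """Equation (4): difference in means ignoring the histories."""
    return (mean(y for _, d, y in rows if d == 1)
            - mean(y for _, d, y in rows if d == 0))

matched = history_matched_estimate(records)
unmatched = unconditional_estimate(records)
```

In this constructed case participation is more common among those with higher histories, so the unconditional comparison differs from the history-matched one; whether the matched figure is itself correct depends on condition (14), which the data cannot confirm.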
The technical answer to the question is that "overidentifying" assumptions can be tested but that "just identifying" assumptions cannot be (Heckman and Hotz, 1989). For present purposes, a less technical answer is that assumptions can be tested if the data available are a bit more than are actually needed to estimate the model in question. This is illustrated in Figure 2, which shows five different models that can be estimated on different data sets. The model at the top of the figure can be estimated on Data Set 1, while the two models below can be estimated on a richer data set, Data Set 2, and the two models below that can be estimated on a yet richer data set, Data Set 3.

⁸ This autoregressive model was estimated in an early economic study by Ashenfelter (1978). A simpler model but one more focused on the evaluation question was also analyzed by Goldberger (1972) in a study of methods of evaluating the effect of compensatory education programs on test scores when the treatment group is selected, in part, on the basis of a pretest score.

Model I (Data Set 1)
A1: $Z_i$ independent of $Y^{*}_{it}$ conditional on $d_i$
A2: No selection bias in levels: (9) holds
A3: No selection bias in differences: (12) holds

Model II (Data Set 2): A1 holds; A2 does not hold; A3 holds
Model III (Data Set 2): A1 does not hold; A2 holds; A3 holds
Model IV (Data Set 3): A1 holds; A2 does not hold; A3 does not hold
Model V (Data Set 3): A1 does not hold; A2 does not hold; A3 holds

FIGURE F-2 Estimable models with different data sets. Data set 1: Single post-program, no $Z_i$. Data set 2: Single post-program, $Z_i$. Data set 3: Pre-program and post-program, $Z_i$.

At the top of the figure, it is presumed that the evaluator has a data set (Data Set 1) consisting of a single post-treatment data point with Y information, but no other variables at all; in particular, no $Z_i$ variable is in the data set. The best the analyst can do in this circumstance is to compare the Y means of participants and non-participants to calculate $\tilde{\alpha}$ as in equation (4) above.

This estimate will equal that obtainable from a randomized trial under the three assumptions shown in the box for Model I in the figure: that the missing $Z_i$ is independent of $Y^{*}_{it}$ conditional on $d_i$, and that there is no selection bias in either levels or first differences. The first assumption is necessary to avoid "omitted-variable" bias, the bias generated by leaving out of the model an important variable that is correlated with both the probability of receiving the treatment and $Y^{*}_{it}$. Suppose, for example, that $Z_i$ is a dummy for city location, as before. If city location is an important determinant of sexual behavior, and if the probability of treatment also varies across cities, then not having a variable for city location in the data set will lead to bias because the estimate of program impact (the difference in mean Y between participants and non-participants) reflects, in part, intercity differences in sexual behavior that are not the result of the treatment but were there to begin with. The second and third assumptions are necessary in order for the value of $Y^{*}_{it}$ for non-participants to be the proper counterfactual, that is, for it to equal the value that participants would have had, had they not undergone the treatment.⁹

Models II and III in the figure can be estimated if the data set contains information on a potential $Z_i$, like city location, but still only a single post-treatment observation on Y (Data Set 2). Each of these models requires only two assumptions instead of three, as in Model I, but each model drops a different assumption.

Model II drops the assumption that there is no selection bias in levels; that is, it drops the assumption that (5) holds. This assumption can be dropped because a $Z_i$ is now available and the instrumental-variable technique described above as Solution 1 is now available.
In this method, the values of Y for participants and non-participants in a given city are not compared to one another to obtain a treatment estimate; that estimate would be faulty because participants are a self-selected group. Instead, mean values of Y across cities are compared to one another, where the cities differ in the availability of the treatment and therefore have different treatment proportions (e.g., a proportion of 0 if the city has no program at all, as in the example given previously). For the treatment-effect estimate from this model to be accurate still requires the assumption that the $Z_i$ is a legitimate instrument: that the differential availability of the program across cities is not related to the basic levels of prevention behavior in each city (i.e., that $Z_i$ and $Y^{*}_{it}$ are independent).

⁹ In this case, the third assumption is technically redundant because there will be no selection bias in differences if there is none in levels. This will not be true in other sets of three assumptions. Note too that, of course, more than three assumptions must be made, but these three are focused on for illustration because they are the three relevant to the richest data set considered, Data Set 3. With yet richer data sets, additional assumptions could be examined.

Not only does Model II require one less assumption than does Model I, it also permits the testing of that assumption and therefore the testing of the validity of Model I. The test of the dropped assumption (that there is no selection bias in levels) is based upon a comparison of impact estimates obtained from the two models. If the two are the same or close to one another, then it must be the case that there is, in fact, no selection bias in levels, because the impact estimate in Model I is based upon participant/non-participant comparisons whereas that in Model II is not. If the two are different, then there must be selection bias: if the participant/non-participant differences within cities do not generate the same impact estimate as that generated by the differences in Yit across different cities, the former must be biased since the latter is accurate (under the assumption that the Zi available is legitimate).

Model III takes the opposite tack and drops the assumption that Zi is legitimate but maintains the assumption that there is no selection bias in levels. The model estimates the treatment effect by making participant/non-participant comparisons only within cities, that is, conditional on Zi. If there are cities where the program is not present at all, data on Yit from those cities are not utilized at all, unlike the method in Model II. The Model III impact estimate will be accurate if there is no selection bias into participation, and it will be accurate even if intercity variation is not a legitimate Zi (e.g., if program placement were based upon need). In this case, a comparison of the impact estimate with that obtained from Model I (where participants and non-participants across cities were pooled into one data set and city location was not controlled for because the variable was not available) provides a test for whether intercity variation is a legitimate Zi.
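The within-city comparison of Model III, and its contrast with the pooled comparison of Model I, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the three cities, their baseline levels, the effect size, and the function names are all hypothetical. Participation is constructed to be unrelated to individual behavior within cities (no selection bias in levels), while programs are placed only in the cities with lower baseline prevention behavior (placement based on need).

```python
from statistics import mean

TRUE_EFFECT = 2.0  # hypothetical treatment effect
# Hypothetical baseline prevention behavior by city. City 0 has no program;
# programs were placed in the cities with lower baseline behavior.
CITY_BASELINE = {0: 4.0, 1: 1.0, 2: 2.0}
CITIES_WITH_PROGRAM = {1, 2}


def make_data():
    rows = []
    for city, base in CITY_BASELINE.items():
        # (individual deviation, would participate if a program exists)
        for e, wants in ((-1.0, 0), (1.0, 0), (-1.0, 1), (1.0, 1)):
            d = wants if city in CITIES_WITH_PROGRAM else 0
            rows.append({"city": city, "d": d, "y": base + e + TRUE_EFFECT * d})
    return rows


def pooled_difference(rows):
    """Model I style: pool all cities and ignore city location."""
    treated = [r["y"] for r in rows if r["d"] == 1]
    untreated = [r["y"] for r in rows if r["d"] == 0]
    return mean(treated) - mean(untreated)


def within_city_difference(rows):
    """Model III style: compare participants and non-participants only
    within cities, then average over cities weighted by participants.
    Cities lacking either group (e.g., no program) are not used."""
    diffs, weights = [], []
    for city in sorted({r["city"] for r in rows}):
        treated = [r["y"] for r in rows if r["city"] == city and r["d"] == 1]
        untreated = [r["y"] for r in rows if r["city"] == city and r["d"] == 0]
        if not treated or not untreated:
            continue
        diffs.append(mean(treated) - mean(untreated))
        weights.append(len(treated))
    return sum(w * x for w, x in zip(weights, diffs)) / sum(weights)


rows = make_data()
print("pooled estimate (Model I style):", pooled_difference(rows))
print("within-city estimate (Model III style):", within_city_difference(rows))
```

Here the within-city estimate equals the assumed effect of 2.0, while the pooled estimate is 0.75, so the gap between the two signals that intercity variation is not a legitimate Zi in this constructed case. With real data the comparison would also have to allow for sampling error, which this sketch ignores.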
If intercity variation is not a legitimate Zi (e.g., if program placement across cities is based on need), then Models I and III will produce quite different treatment estimates, for Model I does not control for city location but Model III does (Model III eliminates cross-city variation entirely by examining only participant/non-participant differences within cities). On the other hand, if city location is a legitimate Zi (e.g., if program placement is independent of need), then the two estimates should be close to one another.

The implication of this discussion is that Data Set 2 makes it possible to reject Model I by finding its assumptions to be invalid. This testing of Model I is possible because Data Set 2 provides more data than is actually necessary to estimate the model. Unfortunately, this data set does not allow the evaluator to test the assumptions of Models II and III necessary to assure their validity. Each makes a different assumption (Model II assumes that Zi is legitimate, while Model III assumes no selection bias to be present), and the estimates from the two need not be the same. If they are different, the evaluator must gather additional information. Such additional information may come from detailed institutional knowledge, for example, of whether Zi is really legitimate (e.g., detailed knowledge of how programs are placed across cities). But another source of additional information is additional data, for example, information on a pre-program measure of Yit.

For example, if Data Set 2 is expanded by adding a pre-program measure of Y (Data Set 3), the assumptions of Models II and III can be tested by estimating Models IV and V shown in the Figure. Each of these models drops yet another assumption, although a different one in each case. Model IV drops the assumption that there is no selection bias in differences but continues to make the assumption that Zi is a legitimate instrument. The impact estimate is obtained by the instrumental-variable technique, as in Model II, but in this case by comparing the means of (Yit - Yi,t-1) across cities, thereby eliminating selection bias in levels if there is any. Model V drops the assumption that there is no selection bias in levels by applying the difference estimate presented earlier but still assumes that there is no selection bias in differences. Once again, the richer data set permits the testing of the assumptions that went into Models II and III and therefore permits their rejection as invalid.

The arrows in the Figure between models show which models can be tested against one another. A comparison of the estimates of Model IV to those of Model II provides a test of the third assumption (that there is no selection bias in differences); a comparison of the estimates of Model V and Model II provides a test of the first assumption (that Zi is a legitimate instrument); and a comparison of the estimates of Model V and Model III provides a test of whether the second assumption holds (that there is no selection bias in levels).
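The role of a pre-program measure can be illustrated with a small Python sketch. It is a hypothetical construction, not the paper's data or code: the two cities, the individual levels, the common trend of 1.0, the effect size of 2.0, and the function names are all assumptions. Participants have higher levels of behavior than non-participants (selection bias in levels), and the program city has a higher baseline level, but changes over time are assumed to be unrelated to participation and to program placement.

```python
from statistics import mean

TRUE_EFFECT = 2.0      # hypothetical treatment effect
COMMON_TREND = 1.0     # hypothetical change in behavior shared by everyone
CITY_BASELINE = {0: 0.0, 1: 5.0}  # city 1 has the program, city 0 does not


def make_panel():
    rows = []
    for city, base in CITY_BASELINE.items():
        for f in (1.0, 2.0, 3.0, 4.0):  # permanent individual level
            d = 1 if (city == 1 and f >= 3.0) else 0  # high-f people enroll
            y_pre = base + f
            y_post = base + f + COMMON_TREND + TRUE_EFFECT * d
            rows.append({"city": city, "d": d, "y_pre": y_pre, "y_post": y_post})
    return rows


def level(r):
    return r["y_post"]


def change(r):
    return r["y_post"] - r["y_pre"]


def across_city(rows, outcome):
    """Difference in city means of `outcome`, scaled by the difference in
    participation rates (instrumental-variable style comparison)."""
    prog = [r for r in rows if r["city"] == 1]
    none = [r for r in rows if r["city"] == 0]
    dy = mean([outcome(r) for r in prog]) - mean([outcome(r) for r in none])
    dp = mean([r["d"] for r in prog]) - mean([r["d"] for r in none])
    return dy / dp


def participant_gap(rows, outcome):
    """Participant minus non-participant mean of `outcome`."""
    treated = [outcome(r) for r in rows if r["d"] == 1]
    untreated = [outcome(r) for r in rows if r["d"] == 0]
    return mean(treated) - mean(untreated)


rows = make_panel()
print("Model II style (city means of levels):", across_city(rows, level))
print("Model IV style (city means of changes):", across_city(rows, change))
print("Model V style (participant gap in changes):", participant_gap(rows, change))
```

In this constructed panel the two estimates based on changes both equal 2.0, whereas the across-city comparison of levels gives 12.0 because the program city started from a higher level. A divergence of this kind between the levels-based and differences-based estimates is what the comparisons among models are meant to detect.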
If each comparison indicates estimates that are similar to one another, the relevant assumption in the more restricted model (Model II or Model III) should be taken to be valid; when estimates differ, however, the assumption involved should be taken as invalid and the more restricted model should be rejected. Thus Models II and III may be discarded.

As before, Models IV and V now require certain assumptions in order for their impact estimates to be valid. The assumptions required for each are different, but neither can be tested unless more information or more data become available. An additional pre-program data point or an additional Zi variable would enrich the data set and permit the assumptions of the two models to be tested. New models made possible by increasing the richness of the data set permit the evaluator to discard more and more assumptions and therefore obtain impact estimates that are more and more reliable. This strategy can be pursued until models are found that are not rejected by richer data sets.10

IV. APPLICATION TO AIDS INTERVENTIONS

Two of the interventions being considered are C&T and CBO programs. In 1989, the CDC funded from 1,600 to 2,000 C&T programs across the country. The programs offer HIV testing and some pre-test and post-test counseling to individual clients, and sometimes partner notification and referral as well. The programs are often offered in local health departments or other local health facilities. The HIV testing and counseling are confidential and often also anonymous. There is considerable diversity across programs in the exact types of services offered, for local operators have considerable discretion in designing the type of program offered.

The CBO programs under consideration here are those which conduct local community health education and risk reduction projects. The types of programs offered are more diverse than those offered in the C&T programs, ranging from educational seminars for AIDS educators to the establishment of focus groups, conducting counseling, educating high-risk groups about risk reduction strategies, and the sponsoring of street fairs and performing arts activities in support of education and risk reduction. The organizations conducting the activities are often small and have close ties to the community, and usually target their activities on specific high-risk groups or other subsegments of the community. At present there is little systematic knowledge of the types of activities sponsored by CBOs on a nationwide basis.

Although C&T and CBO programs are quite distinct in their missions, they pose similar evaluation problems since both are generally aimed at altering sexual behavior in a target population.11
To evaluate whether the various programs have any impact at all, and to estimate the magnitude of the impact of different types of programs, systematic and careful evaluation strategies are required.

The Panel on the Evaluation of AIDS Interventions recommends randomized trials wherever possible to evaluate these programs.12 Unfortunately, randomization will be difficult to apply in many cases. First and foremost are the ethical issues involved in denying treatment at all, or denying a particular type of treatment, to individuals in the target population. The ethics in this case are not always a clear-cut issue. It is often argued, for example, that the ethical issues are less serious if individuals are not assigned to a zero-treatment cell but only to different types of treatments, each of which represents a gain over the individual's alternatives outside the experiment. However, even here there are ethical issues involved in any alteration of the natural order of priority in treatment assignment that would occur in the absence of randomization, especially if those operating the program believe that individuals are already being assigned to the "best" treatment for each individual. Second, there are likely to be serious political difficulties as well, for AIDS treatment has already become a highly politicized issue in local communities, and popular resistance to randomization will no doubt be even more difficult to overcome than it already is for other programs. Third, more than in most randomized trials, those in the AIDS context require a high degree of cooperation from the indigenous staff operating the programs, both to elicit accurate responses from the subjects and to reduce attrition, and in light of confidentiality requirements that often make it difficult for outside evaluators to be integrally involved in the operation and data collection of the experiment. Such cooperation may be difficult to achieve if randomization is taking place.

10 Nevertheless, as I have stressed elsewhere (Moffitt, 1989), at least one untested assumption must, by definition, always be made in any nonexperimental evaluation. It is only in a randomized trial that such an assumption is no longer necessary for valid impact estimates to be obtained.

11 Of course, this is not the only goal of these programs and there are many other important ones as well. The techniques discussed in Section III will often be applicable to the evaluation of program impact on other goals, albeit with appropriate modification.

12 The panel qualifies this recommendation in several respects. First, it recommends evaluation of only new CBO projects in order not to disrupt the operations of on-going ones. Second, for ethical reasons, it recommends against randomization for C&T evaluations if a zero-treatment cell is involved, preferring that all cells involve some type of treatment.

In any case, it is clear that observational, nonexperimental evaluation techniques must be given serious consideration in the evaluation of AIDS interventions.

The techniques outlined in Section III are of potential applicability to such interventions. It is no doubt obvious that in both C&T and CBO programs selectivity bias is likely to be a problem; that is, those who choose to voluntarily make use of the services are likely to be quite different from those who do not, even if they had not received any program services. The techniques outlined in Section III for addressing the selectivity bias problem point in very specific directions for a solution to the problem, namely, (1) the search for appropriate "Z's," and (2) the collection of sexual behavior histories. In addition, although it has not been heavily emphasized thus far, those techniques implicitly require the collection of data on non-participants as well as participants. If data on only participants are available, and therefore only a before-and-after study can be conducted, it will be very difficult to identify the effects of the treatment on behavior given the rapid increases in AIDS knowledge in the general population and the presumed steady change in sexual prevention behaviors that are occurring independently of these programs.

The Search for Z's

First, consider the issue of whether appropriate Z's can be found for AIDS interventions. It is likely to be difficult to locate such Z's, but not necessarily impossible. It is much easier, in fact, to identify variables that are inappropriate as Z's than variables that are appropriate. For example, it is extremely unlikely that any sociodemographic or health characteristic of individuals themselves would be appropriate. Health status, education level, prior sexual history, and other such characteristics no doubt affect the probability that an individual enrolls in a C&T or CBO program but also unquestionably are independently related to prevention behavior as well. Indeed, to use the language of economics, it is probably not possible to locate appropriate Z variables on the "demand" side of the market (that is, among those individuals who are availing themselves of the programs), and it would be more fruitful to look on the "supply" side, where availability of programs is determined in the first place.

On the availability side, the C&T and CBO programs are indeed differentially placed across neighborhoods within cities, between cities and suburbs, across metropolitan areas, and across states and regions. Unfortunately for the evaluation effort, however, differential availability in most cases is certain to be closely related to need. Those cities most likely to have an extensive set of programs are those with large subsegments of the high-risk population and those where HIV incidence has already been determined to be high. Within cities, it is no doubt also the case that programs are more likely to be located in neighborhoods close to high-risk populations than in neighborhoods far from them. With this process of program placement, differential availability will not constitute an appropriate Zi.
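A short arithmetic sketch in Python shows why need-based placement undermines availability as a Z. All numbers and function names here are hypothetical and chosen only for illustration: a true effect of 2.0, a participation rate of 0.5 in the program city, and baseline prevention behavior that is either equal across cities or lower in the city that received the program.

```python
TRUE_EFFECT = 2.0  # hypothetical treatment effect


def city_mean_outcome(baseline, share_treated, effect=TRUE_EFFECT):
    """Mean outcome in a city: baseline behavior plus the effect on the
    share of residents who participate."""
    return baseline + effect * share_treated


def across_city_estimate(mean_y_program, mean_y_no_program,
                         share_program, share_no_program):
    """Difference in city mean outcomes divided by the difference in
    participation rates (instrumental-variable style comparison)."""
    return (mean_y_program - mean_y_no_program) / (share_program - share_no_program)


# Case 1: placement unrelated to need (same baseline of 3.0 in both cities).
unrelated = across_city_estimate(
    city_mean_outcome(3.0, 0.5), city_mean_outcome(3.0, 0.0), 0.5, 0.0)

# Case 2: placement based on need (program city has a lower baseline, 2.0).
need_based = across_city_estimate(
    city_mean_outcome(2.0, 0.5), city_mean_outcome(3.0, 0.0), 0.5, 0.0)

print("estimate when placement is unrelated to need:", unrelated)
print("estimate when placement follows need:", need_based)
```

With these assumed numbers the first estimate equals the true effect of 2.0, while the second is 0.0: the baseline gap of -1.0, divided by the participation gap of 0.5, exactly offsets the effect. The size and sign of the distortion in any real application would depend on how strongly placement tracks need.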
If appropriate Z's are to be identified, it will require a more detailed investigation than is possible here, but there are several directions in which such an investigation could be pursued. First, a detailed examination of the funding rules of CDC and other federal agencies would be warranted. Grants are made to applying C&T and CBO sponsors, and no doubt the need of the population to be served is a criterion in the funding decision. But the availability of a Zi does not require that need not be used at all in the decision, only that it not be the sole criterion. To the extent that other criteria are used to make funding decisions, criteria unrelated to HIV incidence in the area, Z's may be identified. In addition, it is rarely the case that federal funding decisions are as rational and clear-cut as published funding formulas and formal points criteria suggest. It is almost always the case that some agency discretion, political factors, or bureaucratic forces come into play in some fraction of the decisions. To the extent that they do, appropriate Z's will be available.

Second, a detailed study of several large cities may result in the identification of other Z's. For example, it has been estimated that 60 percent of the male homosexual population in San Francisco has not been tested for HIV infection and has, therefore, almost certainly also not enrolled in a C&T or CBO program.13 Why this percent is so high could be investigated. Perhaps the 60 percent who have not been tested are those with low probabilities of HIV in the first place, or those who are already practicing prevention behaviors; in this case, no appropriate Zi would be available. On the other hand, some of the non-participants may be located in areas where no C&T or CBO program is present, for example, if they do not live in particular neighborhoods that have been targeted. If so, differential access to a program could serve as the basis for a Z.

Collection of Histories

The collection of data in general, and histories in particular, is likely to be difficult for the evaluation of AIDS interventions. The confidentiality of the testing and counseling process as well as the inherently sensitive nature of the questions that must be asked to obtain information on the necessary behaviors makes the prospect of obtaining reliable data highly uncertain at our present state of knowledge. Obtaining informed consent from those receiving the treatment as well as others may be problematical, and may result in self-selected samples that threaten the integrity of the design and consequently the validity of any impact estimates obtained.
These considerations make difficult the prospect of obtaining even a single wave of post-program data, much less multiple periods of pre-program data.14 Randomized trials have the advantage of requiring less data collection than observational studies, as noted in Section III, and hence are relatively favored in this respect.

13 Washington Post, January 9, 1990.

14 Histories can be collected from retrospective questions as well as reinterviews. For example, one or two pre-program interviews could be conducted, with the earliest one also containing a retrospective battery.

Nevertheless, cohort studies in this area have been undertaken and have often been successful in retaining individuals in the sample, and more cohort collection efforts are underway. For example, Kaslow and colleagues (1987) report the results of a survey of sexual behavior of 5000 asymptomatic homosexual men in which a baseline survey and lab tests were followed by reinterviews and tests at six-month intervals. As of the latest (10th) wave, about 5 years into the study, from 76 percent to 97 percent of the individuals (across areas and risk groups) are still in the sample, a very high percentage. The success of the cohort is partly a result of solid confidentiality measures as well as the heavy involvement of local gay community leaders and trained local staff from the beginning of the study. Other cohort collection efforts include the CDC cross-city study of O'Reilly, involving both homosexual men and IV drug users; the study of seven CBOs headed by Vincent Mor at Brown University; the San Francisco city clinic cohort and Hepatitis B cohort; and the upcoming Westat cohort sponsored by NCHSR. How successful these efforts will be remains to be seen, but there is no question that serious cohort studies are being undertaken in increasing number. If they are successful, and if the histories described in Section III can be obtained, program evaluation designs will be greatly enhanced and impact estimates will be obtainable with much greater reliability.

V. SUMMARY AND CONCLUSIONS

The evaluation of AIDS interventions poses difficult conceptual and practical issues. Since randomized trials are unlikely to be feasible in many circumstances, evaluation methods for observational, nonexperimental data must be applied. Statistical methods developed by economists for the evaluation of the impact of social and economic programs over the past twenty years are applicable to this problem and have several important lessons for AIDS evaluations.
The most important are that accurate estimates of program impact require (1) a systematic search for identifying "Z" variables, variables that affect the availability of program services to different populations but which are not direct determinants of HIV incidence or the adoption of prevention behaviors; or (2) the collection of sufficiently lengthy sexual histories from participants and non-participants in the programs that can be used to reduce the selection bias attendant upon participant/non-participant comparisons. Both of these implications are quite concrete and should provide funding agencies and program evaluators with specific directions in which to search for and pursue evaluation designs that will yield reliable estimates of program impact.

REFERENCES

Ashenfelter, O. (1978) Estimating the effect of training programs on earnings. Review of Economics and Statistics 60:47-57.

Barnow, B. (1987) The impact of CETA programs on earnings: A review of the literature. Journal of Human Resources 22:157-193.

Barnow, B., Cain, G., and Goldberger, A. (1980) Issues in the analysis of selectivity bias. In E. Stromsdorfer and G. Farkas, eds., Evaluation Studies Review Annual, Volume 5. Beverly Hills, Calif.: Sage.

Bjorklund, A. and Moffitt, R. (1987) Estimation of wage gains and welfare gains in self-selection models. Review of Economics and Statistics 69:42-49.

Goldberger, A. (1972) Selection bias in evaluating treatment effects: Some formal illustrations. Discussion paper 123-72. Madison, Wisconsin: Institute for Research on Poverty.

Gronau, R. (1974) Wage comparisons: A selectivity bias. Journal of Political Economy 82:1119-1143.

Heckman, J. J. (1974) Shadow prices, market wages, and labor supply. Econometrica 42:679-694.

Heckman, J. J. and Hotz, V. J. (1989) Choosing among alternative nonexperimental methods for estimating the impact of social programs: The case of manpower training. Journal of the American Statistical Association 84:862-874.

Heckman, J. J. and Robb, R. (1985a) Alternative methods for evaluating the impact of interventions: An overview. Journal of Econometrics 30:239-267.

Heckman, J. J. and Robb, R. (1985b) Alternative methods for evaluating the impact of interventions. In J. Heckman and B. Singer, eds., Longitudinal Analysis of Labor Market Data. Cambridge: Cambridge University Press.

Kaslow, R. W., Ostrow, D. G., Detels, R., Phair, J. P., Polk, B. F., and Rinaldo, C. R. (1987) The Multicenter AIDS Cohort Study: Rationale, organization, and selected characteristics of the participants. American Journal of Epidemiology 126:310-318.

Lewis, H. G. (1974) Comments on selectivity biases in wage comparisons. Journal of Political Economy 82:1145-1155.

Maddala, G. S. (1983) Limited-Dependent and Qualitative Variables in Econometrics. Cambridge: Cambridge University Press.
Maddala, G. S. (1985) A survey of the literature on selectivity bias as it pertains to health care markets. In R. M. Scheffler, ed., Advances in Health Economics and Health Services Research, Vol. 6. Greenwich, Conn.: JAI Press.

Manski, C. (1990) Nonparametric bounds on treatment effects. American Economic Review 80:319-323.

Moffitt, R. (1989) Comment on Heckman and Hotz. Journal of the American Statistical Association 84:877-878.