APPENDIX F

The Use of Selection Modeling to Evaluate AIDS Interventions with Observational Data
Robert Moffitt
I. INTRODUCTION
This paper considers the potential applicability to AIDS interventions of nonexperimental evaluation methods developed by econometricians in the evaluation of social programs. Econometric methods for program evaluation have been studied by economists over the past twenty years and are by now quite well developed. To give the discussion a focus, two types of interventions are considered: AIDS counseling and testing (C&T) programs, and AIDS programs run by community-based organizations (CBOs). While both C&T programs and CBOs are quite diverse, especially the CBOs, many are designed to encourage the adoption of sexual prevention behaviors and to encourage risk reduction behaviors more generally. It is this outcome that will be the focus of the analysis here.
This paper was presented at the Conference on Nonexperimental Approaches to Evaluating AIDS Prevention Programs convened on January 12-13, 1990 by the Panel on the Evaluation of AIDS Interventions of the National Research Council. The views expressed in this paper are those of the author; they should not be attributed to the Panel or to the NRC. Comments on an earlier version of the paper from James Heckman, V. Joseph Hotz, Roderick Little, Charles Manski, and Lincoln Moses are appreciated. A version of this paper is to appear in Evaluation Review.
In the next section of the paper, a brief historical overview of the development of econometric methods for program evaluation is given. Following that, in Section III, a more formal statistical exposition of those methods is given. This section constitutes the major part of the paper. The conclusion of the discussion is that nonexperimental evaluation in general requires either adequate data on the behavioral histories of participants and nonparticipants in the interventions, or the availability of identifying variables ("Z's") that affect the availability of the treatment to different individuals but not their behavior directly. Whether either of these conditions can be met in the evaluation of AIDS interventions is then discussed in Section IV for C&T and CBO programs. A summary and conclusion are provided in Section V.
II. HISTORICAL DEVELOPMENT OF ECONOMETRIC
METHODS FOR PROGRAM EVALUATION
Most of the econometric methods for program evaluation have been designed to evaluate government-sponsored manpower training programs, where the major issue has been whether such programs increase individual earnings and other indicators of labor market performance. Such programs began to appear in the early 1960s with the Manpower Development and Training Act (MDTA) of 1962, and grew more extensive in the late 1960s as part of the War on Poverty. They became a fixture in the 1970s and 1980s, though changing in name and form from, for example, the Comprehensive Employment and Training Act (CETA) program in the 1970s to the Job Training and Partnership Act (JTPA) program in the 1980s. However, economists have also conducted extensive studies of welfare programs of other types, of health and education programs, and of many others.
One of the earliest studies (Ashenfelter, 1978) presented an econometric model for the estimation of the effect of the MDTA program on earnings using observational data. Many studies were later conducted of the CETA program and have been surveyed by Barnow (1987). No major evaluation studies of the JTPA program have been completed, although one is currently underway. A recent study of an experimental training program called Supported Work has been published by Heckman and Hotz (1989), and will be discussed further below.
The econometric literature on program evaluation underwent a major alteration in its formal framework after the separate development of "selectivity bias" techniques in the mid-1970s. Originally, the selectivity bias issue in economics concerned a missing-data problem that arises in the study of individual labor market behavior, namely, the inherent unobservability of the potential market earnings of individuals who are not working. The development of techniques for properly estimating
such potential earnings (Gronau, 1974; Lewis, 1974; Heckman, 1974) was quickly realized to have relevance to the estimation of the effects of public programs on economic behavior. As will be discussed extensively in the next section, a similar selectivity bias problem arises in observational evaluation studies through the inherent unobservability of what would have happened to program participants had they not received the treatment, and of what would have happened to nonparticipants had they undergone the treatment. The connecting link was first explicitly made by Barnow, Cain, and Goldberger (1980), which included a comparison of the new technique with earlier techniques. A textbook treatment of the applicability of selectivity bias methods to program evaluation became available shortly thereafter (Maddala, 1983), as well as a survey of the applicability of those methods to health interventions in particular (Maddala, 1985).
The recent work of Heckman and Robb (1985a, 1985b) represents the most complete and general statement of the selectivity bias problem in program evaluation using observational data, and provides the most thorough analysis of the conditions under which the methods will yield good estimates and of the estimation methods available to obtain such estimates. The analysis in the next section of this paper is heavily influenced by the work of Heckman and Robb, which is itself built upon almost twenty years of work on econometric methods for program evaluation.
III. THE STATISTICS OF PROGRAM EVALUATION
WITH OBSERVATIONAL DATA
Although the statistical methods exposited in this section are applicable to any program in principle and will be developed fairly abstractly, it may help for specificity to consider the evaluation of C&T programs. Such programs have many goals but, for present purposes, it will be assumed that the major goal is to encourage those who receive the services of the program to adopt risk reduction activities and sexual prevention behaviors to reduce the likelihood of HIV infection to themselves and to others. The aim of the evaluation is to determine whether such programs do indeed have such effects and, if so, to provide an estimate of their magnitude.

To begin the formal analysis, let Y be the outcome variable (e.g., level of prevention behavior) and make the following definitions:

Yit* = Level of outcome variable for individual i at time t, assuming he has not received the "treatment" (i.e., the services of a C&T program)
Yit** = Level of outcome variable for individual i at time t, assuming he has received the treatment at some prior date
The difference between these two quantities is the effect of the treatment, denoted α:

Yit** = Yit* + α    (1)

or

α = Yit** − Yit*    (2)
The aim of the evaluation is to obtain an estimate of the value of α, the treatment effect, from whatever data are available.1 The easiest way to think about what we seek in an estimate of α is to consider individuals who have gone through a C&T program and therefore have received the treatment, and for whom we later measure their value of Yit**. Ideally, we wish to know the level of Yit* for such individuals; we would like to know what their level of prevention behaviors, for example, would have been had they not gone through the program. If Yit* could be known, the difference between it and Yit** would be a satisfactory estimate of α.

The difficulty that arises is that we do not observe Yit* directly but only the values of Yit* for nonparticipants. Define a dummy variable for whether an individual has or has not received the treatment:

di = 1 if individual i has received the treatment
di = 0 if individual i has not received the treatment

Then a satisfactory estimate of α could be obtained by estimating the difference between Yit** and Yit* for those who went through the program:

α̂ = E(Yit**|di = 1) − E(Yit*|di = 1)    (3)

where E is the expectation operator.
The estimate α̂ in (3) is, in fact, the estimate that would be obtained if we had administered a randomized trial for the evaluation. For example, as individuals come in through the door of a C&T program, they could be randomly assigned to treatment status or control status, where the latter would involve receiving none of the services of the program. At some later date we could measure the levels of Y for the two groups and calculate (3) to obtain an estimate of the effect of the program.
1 For simplicity, the treatment effect, α, is assumed to be constant over time and across individuals, and to be nonrandom. Random treatment effects across individuals have been incorporated by Björklund and Moffitt (1987) and are discussed by Heckman and Robb (1985a, 1985b).

2 In standard econometric practice, Yit* is set equal to Xitβ + ε, where X is a vector of observed variables, β is its coefficient vector, and ε is an error term.
The Problem
The first key point of the statistical literature on evaluation is that observational, nonexperimental data do not allow us to calculate (3) and therefore do not allow us to compute the estimate that could be obtained with a randomized trial. This is simply because we generally do not observe in such data any individuals who would have taken the treatment but did not; we generally observe only individuals who did not take the treatment at all.3 What we can estimate with nonexperimental data is an effect denoted here as α:

α = E(Yit**|di = 1) − E(Yit*|di = 0)    (4)

which is just the difference between the mean Y for participants, those who did take the treatment (di = 1), and the mean Y for nonparticipants, those who did not undergo the treatment (di = 0).
When will the estimate we are able to calculate, α, equal the estimate we would have obtained with a randomized trial, α̂? Comparison of (3) and (4) shows that the two will be equal if and only if the following condition is true:

E(Yit*|di = 1) = E(Yit*|di = 0)    (5)

In words, the two estimates of α are equal only if the value of Yit* for those who did not take the treatment equals the value of Yit* that those who did take the treatment would have had, had they not gone through the program.
The heart of the nonexperimental evaluation problem is reflected in equation (5), and an understanding of that equation is necessary to understand the pervasiveness and unavoidability of what is termed the "selection bias" problem when observational data are employed. The equation will fail to hold under many plausible circumstances. For example, if those who go through a C&T program are concerned with their health and have already begun adopting prevention behaviors even before entering the program, they will be quite different from those who do not go through the program even prior to receiving any program services. Hence equation (5) will fail to hold because those who go through the program have different levels of Yit*, that is, different levels of prevention behavior even in the absence of receiving any program services. Our estimate of α will be too high relative to α̂, for the greater level of prevention behaviors observed for the treatment group subsequent to receiving services was present even prior to the treatment, and is therefore not a result of the treatment itself. Those who are observed to have actually gone through the program are therefore a "self-selected" group out of the pretreatment population, and the estimate of α is contaminated by "selectivity bias" because of such self-selection.

3 Some evaluation designs make use of individuals on waiting lists as controls. Unfortunately, these individuals may not be randomly selected from the applicant pool; if they are not, their Y values will not be an accurate proxy for those of participants.
The unavoidability of the potential for selectivity bias arises because the validity of equation (5) cannot be tested, even in principle, for the left-hand side of that equation is inherently unobservable. It is impossible in principle to know what the level of Yit* for those who went through the program would have been had they not gone through it, for that level of Yit* is a "counterfactual" that can never be observed. We may know, as discussed further below, the pretreatment level of Yit* for those who later undergo treatment, but this is not the same as the Yit* we seek for the left-hand side of (5); we need to know the level of Yit* for program participants that they would have had at exactly the same time as they went through the treatment, not at some previous time.4
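The role of condition (5) can be illustrated with a small simulation. This is a sketch that is not part of the original analysis: the distributions, parameter values, and logistic self-selection rule below are all illustrative assumptions. When participation is self-selected on the level of Y, the observational contrast in (4) overstates the true effect, while under random assignment condition (5) holds and the contrast in (3) recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
alpha = 2.0                                   # true treatment effect

# Baseline prevention behavior Y* varies across individuals.
y_star = rng.normal(10.0, 3.0, n)

# Self-selection: individuals with higher Y* are more likely to enter
# the program, so E(Y*|d=1) != E(Y*|d=0) and condition (5) fails.
d_self = rng.random(n) < 1.0 / (1.0 + np.exp(-(y_star - 10.0)))
y_obs = y_star + alpha * d_self

a_obs = y_obs[d_self].mean() - y_obs[~d_self].mean()      # equation (4)

# Random assignment: d independent of Y*, so condition (5) holds.
d_rand = rng.random(n) < 0.5
y_rand = y_star + alpha * d_rand
a_rand = y_rand[d_rand].mean() - y_rand[~d_rand].mean()   # equation (3)

print(f"true {alpha:.1f}  randomized {a_rand:.2f}  observational {a_obs:.2f}")
```

In this construction, the observational contrast exceeds the true effect by exactly E(Yit*|di = 1) − E(Yit*|di = 0), the violation of (5).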
Solutions
There are three general classes of potential solutions to the selection bias problem (Heckman and Robb, 1985a, 1985b). None guarantees the elimination of the problem altogether; rather, each seeks to determine possible circumstances under which the problem could be eliminated. The question is then whether those circumstances hold. At the outset, it is important to note that two of the three solution methods have important implications for evaluation design because they require the collection of certain types of data. Whether those data can be collected for AIDS programs then becomes the most important question, which is discussed in detail in Section IV.
Solution 1: Identifying Variables ("Z's")
The selection bias problem can be solved if a variable Zi is available, or one can be found, that (1) affects the probability that an individual receives the treatment but which (2) has no direct relationship to Yit* (e.g., no direct relationship to individual prevention behavior). What is an example of such a Zi? A Zi could be constructed if, for example, a C&T program were funded by CDC in one city and not in another for political or bureaucratic reasons unrelated to the needs of the populations in the two cities, and therefore unrelated to the likelihood that the individuals in the two cities practice prevention behaviors. If a random sample of the relevant subpopulations were conducted in the two cities and data on Y were collected, the data would include both participants and nonparticipants in the city where the C&T program was funded; a comparison of the mean values of Y in the two cities could form the basis for a valid estimate of α.5 The variable Zi in this case should be thought of as a dummy variable equal to 1 in the C&T city and 0 in the other. The variable satisfies the two conditions given above: it obviously affects whether individuals in the two cities receive the treatment, since if Zi = 0 no C&T treatment is available, and it is unrelated to the level of Yit* in the two cities because the funding decision was made for reasons unrelated to sexual prevention behavior.6 This estimation method is known in econometrics as an "instrumental-variable" method and Zi is termed an "instrument." It is an instrument in the sense that it is used to proxy the treatment variable itself.

4 It may be noted that Manski (1990) has pointed out that if Yit* is bounded (e.g., between 0 and 1), a worst-case/best-case analysis can be conducted in which the unobserved counterfactual is taken to equal each of the bounds in turn. This gives a range in which the true effect must lie instead of a point estimate.
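A minimal sketch of this instrumental-variable logic, under assumed distributions that are not taken from the paper: two cities share the same distribution of Yit*, the program exists only in city 1, and participation within city 1 is self-selected on Y*. Dividing the across-city difference in mean Y by the participation rate p, the (Y̅1 − Y̅0)/p calculation of footnote 5, recovers the effect even though the within-city participant/nonparticipant contrast is biased.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000
alpha = 2.0

# Z satisfies condition (2): the funding decision is unrelated to Y*,
# so both cities draw Y* from the same distribution.
y_city0 = rng.normal(10.0, 3.0, n)            # no program available (Z=0)
y1_star = rng.normal(10.0, 3.0, n)            # program available (Z=1)

# Within the program city, participation is self-selected on Y*.
d = rng.random(n) < 1.0 / (1.0 + np.exp(-(y1_star - 10.0)))
y_city1 = y1_star + alpha * d
p = d.mean()

# Across-city (instrumental-variable) estimate: (Ybar_1 - Ybar_0) / p.
iv = (y_city1.mean() - y_city0.mean()) / p

# Within-city participant/nonparticipant contrast: biased by self-selection.
naive = y_city1[d].mean() - y_city1[~d].mean()

print(f"IV {iv:.2f}  within-city contrast {naive:.2f}  true {alpha:.1f}")
```

The IV estimate works here because Zi shifts the treatment proportion without shifting the distribution of Yit*; the self-selection occurs only within the program city and washes out of the across-city means.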
What is an example of an illegitimate Zi? The same dummy variable defined in the previous example would be illegitimate if the CDC funding decision were based not on political or bureaucratic considerations but on the relative level of need in the two cities. For example, if the C&T program were placed in the city with the higher rate of HIV infection, then Zi would not be independent of Yit*: the presence of the C&T program would be associated with lower levels of prevention behavior not because of a negative causal effect of the program but because of the reason for its placement.

Further examples of legitimate and illegitimate "Z's" will be discussed in Section IV.

5 For example, if Y̅1 is the mean value of the outcome variable in the city with the program and Y̅0 is the mean value in the city without one, and if p is the fraction of the relevant subpopulation in the first city that participated in the program, the impact estimate would be calculated as (Y̅1 − Y̅0)/p.

6 Essentially, this is a case of what is often termed a "natural" experiment. It is similar to an experiment inasmuch as the probability of having the treatment available is random with respect to the outcome variable under study.

Solution 2: Parametric Distributional Assumptions on Yit*

A second solution to the selection bias problem arises if a parametric distributional assumption on Yit* can be safely made or determined with reasonable certainty (Heckman and Robb, 1985a, 1985b). For example, if Yit* follows a normal, logistic, or some other distribution with a finite set of parameters, identification of a program effect free of selection bias is possible. The reasons are relatively technical and difficult to explain in simple terms. However, this method will not be especially useful for the AIDS interventions because very little is known about the distribution of sexual prevention behaviors, for example, in the total population or even the high-risk population. Consequently, this method will not be considered further.
Solution 3: Availability of Cohort Data
A third solution method requires the availability of "cohort," "longitudinal," or "panel" data, that is, data on the same individuals at several points in time before and after some of them have undergone the treatment. In the simplest case, data on Yit are available not only after the treatment but also before, giving a data set with one pretreatment observation and one posttreatment observation for each individual, both participants and nonparticipants. In the more general case, three or more points in time may be available in the data.
The use of such cohort data is sufficiently important to warrant an extended discussion. To illustrate this method, first consider the situation that would arise if data at two points in time were available, one before the treatment and one after it. Let "t" denote the posttreatment point and "t-1" denote the pretreatment point. Then, analogously with the cross-section case considered previously,

Yit* − Yi,t-1* = Change in Yit from t-1 to t in the absence of having undergone the treatment
Yit** − Yi,t-1* = Change in Yit from t-1 to t if having undergone the treatment

Then the effect of the treatment is α, and

Yit** − Yi,t-1* = (Yit* − Yi,t-1*) + α    (6)

Since Yi,t-1* cancels out on both sides of (6), (6) is the same as (1) and therefore the true effect, α, is the same.
As before, a preferred estimate of the effect of the program could be obtained by a randomized trial in which those wishing to undergo the treatment (di = 1) are randomly assigned to participation or nonparticipation status. With data on both pretreatment and posttreatment status, the estimate of the program effect could be calculated as:

α̂ = E(Yit** − Yi,t-1*|di = 1) − E(Yit* − Yi,t-1*|di = 1)    (7)

However, with observational data the second term on the right-hand side of (7) is not measurable since, once again, we cannot measure Yit* for those who undergo the treatment. We can instead only use the data on Yit* available from nonparticipants to estimate the program effect as follows:

α = E(Yit** − Yi,t-1*|di = 1) − E(Yit* − Yi,t-1*|di = 0)    (8)

The estimate α is often called a "differences" estimate because it is computed by comparing the first-differenced values of Y for participants and nonparticipants.
The estimate we are able to obtain in (8) will equal that we could have obtained in the randomized trial, (7), if and only if

E(Yit* − Yi,t-1*|di = 1) = E(Yit* − Yi,t-1*|di = 0)    (9)

Equation (9) is the key equation for the two-data-point case and is the analogue to equation (5) in the single posttreatment data case. The equation shows that a data set with a pretreatment and posttreatment observation will yield a good estimate of α if the change in Yit* from pre to post would have been the same for participants, had they not undergone the treatment, as it actually was for nonparticipants. Sometimes the change in Yit* is referred to as the "growth rate" of Yit*, in which case we may say that our nonexperimental estimate requires that the growth rate of participants and nonparticipants be the same in the absence of the treatment.
Perhaps the most important point is that this condition may hold even though the condition in (5) does not. Equation (5), the condition that must hold for the nonexperimental estimate in a single posttreatment cross-section to be correct, requires that the levels of Yit* be the same for participants and nonparticipants in the absence of the treatment. Equation (9), on the other hand, only requires that the growth rates of Yit* be the same for participants and nonparticipants in the absence of the treatment, even though the levels may differ. The latter is a much weaker condition and will more plausibly hold.
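A short simulation can make the contrast between conditions (5) and (9) concrete. It is an illustrative sketch with assumed numbers, not an analysis from the paper: participants start at a higher level of Y, so (5) fails, but both groups share the same growth rate absent treatment, so (9) holds, and the differences estimator in (8) recovers the effect while the cross-section estimator in (4) does not.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000
alpha, growth = 2.0, 1.0

base_part = rng.normal(12.0, 2.0, n)   # participants start higher: (5) fails
base_non = rng.normal(8.0, 2.0, n)

# Both groups share the same growth rate absent treatment: (9) holds.
y_pre_part, y_pre_non = base_part, base_non
y_post_part = base_part + growth + alpha + rng.normal(0.0, 1.0, n)
y_post_non = base_non + growth + rng.normal(0.0, 1.0, n)

# Cross-section estimator (4): contaminated by the level difference.
cross = y_post_part.mean() - y_post_non.mean()

# Differences estimator (8): the pretreatment level cancels out.
diff = (y_post_part - y_pre_part).mean() - (y_post_non - y_pre_non).mean()

print(f"cross-section {cross:.2f}  differences {diff:.2f}  true {alpha:.1f}")
```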
The nature of the condition is illustrated in panels (a) and (b) of Figure 1. In panel (a), the pretreatment levels of nonparticipants and participants, A and A', respectively, are quite different: participants have a higher level of Y, as would be the case, for example, if those who later undergo C&T have higher prevention behaviors in the first place. From t-1 to t, the level of Y for nonparticipants grows from A to B, as might occur if everyone in the population under consideration (e.g., homosexual men) were increasing their degree of risk reduction behaviors even without participating in a C&T program. The figure shows, for illustration, a growth rate of Y for participants from A'
[FIGURE 1, panels (a) and (b): levels of Y over time for nonparticipants (A to B) and participants (A' upward), before and after the treatment.]
assumption cannot be verified because point B' is not observed; it is only a "counterfactual." But clearly the estimate in the figure would be a much better estimate than that obtained from a single posttreatment cross-section, which would take the vertical distance between B and C as the treatment estimate. This would be invalid because equation (5) does not hold.
Panel (b) in Figure 1 shows a case where condition (9) breaks down. In that panel, a case is shown in which the Y of participants would have grown faster than that for nonparticipants even in the absence of the treatment (A' to B' is greater than A to B). This might arise, for example, if those individuals who choose to undergo C&T are adopting risk reduction behaviors more quickly than nonparticipants. In this case, our estimate of α is too high, since it measures the vertical distance between B'' and C instead of between B' and C. Neither B' nor B'' is observed, so we cannot know which case holds.
The primary conclusion to be drawn from this discussion is that we may be able to do better in our estimate of program effect with more data. Adding a single pretreatment data point permits us to compute an estimate of the treatment effect, the differences estimator in (8), that may be correct in circumstances in which the estimator using a single posttreatment cross-section is not. The importance of having additional data on the histories of Y, or the sexual behavior histories of C&T participants and nonparticipants, for example, stands in contrast to the situation faced when conducting a randomized trial where, strictly speaking, only a single posttreatment cross section is required. Thus we conclude that more data may be required for valid inference in nonexperimental evaluations than in experimental evaluations.
This point extends to the availability of additional pretreatment observations.7 Suppose, for example, that an additional pretreatment observation is available at time t-2. The estimate calculable in a randomized trial is

α̂ = E[(Yit** − Yi,t-1*) − (Yi,t-1* − Yi,t-2*)|di = 1]
  − E[(Yit* − Yi,t-1*) − (Yi,t-1* − Yi,t-2*)|di = 1]    (10)

while the estimate permitted in an observational study is

α = E[(Yit** − Yi,t-1*) − (Yi,t-1* − Yi,t-2*)|di = 1]
  − E[(Yit* − Yi,t-1*) − (Yi,t-1* − Yi,t-2*)|di = 0]    (11)
7 Gathering data from additional posttreatment observations is easier but does not serve the appropriate control function. Prior to the treatment, it is known with certainty that the program could have no true effect; after the treatment, it cannot be known with certainty what the pattern of the effect is, assuming it has an effect. Consequently, participant/nonparticipant differences in Yit after the treatment can never be treated with absolute certainty as reflecting selection bias rather than a true effect.
the nonexperimental estimator. In the general case, a slight modification in the model allows us to write the estimate of the treatment effect as the following:8

α = E(Yit**|di = 1, Yi,t-1, Yi,t-2, . . ., Yi,t-k)
  − E(Yit*|di = 0, Yi,t-1, Yi,t-2, . . ., Yi,t-k)    (13)
assuming that data are available for k pretreatment periods. This estimator will equal that obtainable in a randomized trial if and only if the following condition holds:

E(Yit*|di = 1, Yi,t-1, . . ., Yi,t-k) = E(Yit*|di = 0, Yi,t-1, . . ., Yi,t-k)    (14)

This condition can be interpreted as requiring that the values of di and Yit* must be independent of one another conditional upon the history of Yit up to t-1. Put differently, it must be the case that if we observe two individuals at time t-1 who have exactly the same history of Yit up to that time (e.g., the exact same history of sexual prevention behaviors), and who therefore look exactly alike to the investigators, they must have the same expected value of Yit* in the next time period regardless of whether they do or do not undergo the treatment. If, on the other hand, the probability of entering a C&T program is related to the value of Yit* they would have had if the treatment were not available, the condition in equation (14) will not hold and the nonexperimental estimate will be inaccurate.
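As an illustrative sketch of the estimator in (13), with all parameter values and the selection rule assumed rather than drawn from the paper, suppose selection into treatment depends only on the observed pretreatment history, so that condition (14) holds once the lags are conditioned on. A linear regression of Yit on di and the lagged outcomes, in the spirit of the autoregressive model of Ashenfelter (1978) cited in footnote 8, then recovers the effect that the raw participant/nonparticipant contrast misses.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
alpha = 2.0

mu = rng.normal(10.0, 3.0, n)                  # persistent individual level
y_lag2 = mu + rng.normal(0.0, 1.0, n)          # Y at t-2
y_lag1 = 0.5 * mu + 0.5 * y_lag2 + rng.normal(0.0, 1.0, n)   # Y at t-1

# Selection depends only on the observed history, so d and Y_t* are
# independent conditional on the lags: condition (14) holds.
d = (rng.random(n) < 1.0 / (1.0 + np.exp(-(y_lag1 - 10.0)))).astype(float)
y_t = 0.5 * mu + 0.5 * y_lag1 + alpha * d + rng.normal(0.0, 1.0, n)

# Estimate alpha conditioning on the history: least squares on d and lags.
X = np.column_stack([np.ones(n), d, y_lag1, y_lag2])
coef, *_ = np.linalg.lstsq(X, y_t, rcond=None)
alpha_hat = coef[1]

# Ignoring the history reproduces the selection bias of equation (4).
naive = y_t[d == 1].mean() - y_t[d == 0].mean()

print(f"conditional-on-history {alpha_hat:.2f}  naive {naive:.2f}  true {alpha:.1f}")
```

In this Gaussian construction the conditional expectation of Yit given the lags is linear, so ordinary least squares implements the conditioning in (13); with other distributions a more flexible conditioning would be needed.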
The Relationship between Data
Availability and Testing of Assumptions
The discussion thus far has demonstrated that the availability of certain types of data (information on legitimate "Z" variables, or on individual histories) is related to the conditions that must hold, and the assumptions that must be made, in order to obtain an estimate of program effect similar to that obtainable in a randomized trial. A natural question is whether any of the assumptions can be tested, and whether it can be determined if the conditions do or do not hold. The answer to this question once again is related to data availability.
The technical answer to the question is that "overidentifying" assumptions can be tested but that "just identifying" assumptions cannot be (Heckman and Hotz, 1989). For present purposes, a less technical answer is that assumptions can be tested if the data available are a bit more than are actually needed to estimate the model in question. This is illustrated in Figure 2, which shows five different models that can be estimated on different data sets. The model at the top of the figure can be estimated on Data Set 1, while the two models below can be estimated on a richer data set, Data Set 2, and the two models below that can be estimated on a yet richer data set, Data Set 3. At the top of the figure, it is presumed that the evaluator has a data set (Data Set 1) consisting of a single posttreatment data point with Yit information, but no other variables at all; in particular, no Zi variable is in the data set. The best the analyst can do in this circumstance is to compare the Yit means of participants and nonparticipants to calculate α as in equation (4) above.

8 This autoregressive model was estimated in an early economic study by Ashenfelter (1978). A simpler model, but one more focused on the evaluation question, was also analyzed by Goldberger (1972) in a study of methods of evaluating the effect of compensatory education programs on test scores when the treatment group is selected, in part, on the basis of a pretest score.

FIGURE F-2 Estimable models with different data sets. Data Set 1: single postprogram observation, no Zi. Data Set 2: single postprogram observation, Zi. Data Set 3: preprogram and postprogram observations, Zi.
A1: Zi independent of Yit* conditional on di
A2: No selection bias in levels: (5) holds
A3: No selection bias in differences: (12) holds
Model I (Data Set 1): A1, A2, and A3 all assumed to hold
Model II (Data Set 2): A1 and A3 hold; A2 dropped
Model III (Data Set 2): A2 and A3 hold; A1 dropped
Model IV (Data Set 3): A1 holds; A2 and A3 dropped
Model V (Data Set 3): A3 holds; A1 and A2 dropped
This estimate will equal that obtainable from a randomized trial under the three assumptions shown in the box for Model I in the figure: that the missing Zi is independent of Yit* conditional on di, and that there is no selection bias in either levels or first differences. The first assumption is necessary to avoid "omitted-variable" bias, the bias generated by leaving out of the model an important variable that is correlated with both the probability of receiving the treatment and Yit*. Suppose, for example, that Zi is a dummy for city location, as before. If city location is an important determinant of sexual behavior, and if the probability of treatment also varies across cities, then not having a variable for city location in the data set will lead to bias because the estimate of program impact (the difference in mean Y between participants and nonparticipants) reflects, in part, intercity differences in sexual behavior that are not the result of the treatment but were there to begin with. The second and third assumptions are necessary in order for the value of Yit* for nonparticipants to be the proper counterfactual, that is, for it to equal the value that participants would have had, had they not undergone the treatment.9
Models II and III in the figure can be estimated if the data set contains information on a potential Zi, like city location, but still only a single posttreatment observation on Yit (Data Set 2). Each of these models requires only two assumptions instead of three, as in Model I, but each model drops a different assumption. Model II drops the assumption that there is no selection bias in levels; that is, it drops the assumption that (5) holds. This assumption can be dropped because a Zi is now available and the instrumental-variable technique described above as Solution 1 is now available. In this method, the values of Yit for participants and nonparticipants in a given city are not compared to one another to obtain a treatment estimate; that estimate would be faulty because participants are a self-selected group. Instead, mean values of Yit across cities are compared to one another, where the cities differ in the availability of the treatment and therefore have different treatment proportions (e.g., a proportion of 0 if the city has no program at all, as in the example given previously). For the treatment-effect estimate from this model to be accurate still requires the assumption that the Zi is a legitimate instrument, that is, that the differential availability of the program across cities is not related to the basic levels of prevention behavior in each city (i.e., that Zi and Yit* are independent).
9In this case, the third assumption is technically redundant because there will be no selection bias in
differences if there is none in levels. This will not be true in other sets of three assumptions. Note
too that, of course, more than three assumptions must be made, but these three are focused on for
illustration because they are the three relevant to the richest data set considered, Data Set 3. With yet
richer data sets, additional assumptions could be examined.
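The grouped instrumental-variable logic behind Model II can be sketched with a small numerical illustration. Everything below is invented for exposition (the cities, treatment indicators, and outcome values are hypothetical, not drawn from any actual program data): city-level means of the outcome are related to city-level treatment proportions, a Wald-type grouped IV estimate.

```python
# Hypothetical sketch of the grouped IV idea behind Model II: compare mean
# outcomes across cities that differ in treatment availability, rather than
# comparing participants to nonparticipants within a city.  All data invented.
import numpy as np

# One record per individual: city, treatment indicator T, outcome Y
city = np.array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2])
T = np.array([0, 0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 0])  # city 0 has no program
Y = np.array([0.20, 0.30, 0.25, 0.30,
              0.70, 0.35, 0.80, 0.30,
              0.75, 0.70, 0.80, 0.35])

# City-level means of the outcome and of the treatment proportion
cities = np.unique(city)
y_bar = np.array([Y[city == c].mean() for c in cities])
p_bar = np.array([T[city == c].mean() for c in cities])

# Grouped IV (Wald-type) estimate: slope of city-mean Y on the city
# treatment proportion -- valid only if city location is a legitimate Zi
beta_iv = np.polyfit(p_bar, y_bar, 1)[0]
print(round(beta_iv, 3))
```

Note that individuals' own participation decisions never enter the estimate; only city-level variation in availability identifies the treatment effect, which is exactly why self-selection into participation does not bias it (provided the Zi is legitimate).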
APPENDIX F / 357
Not only does Model II require one less assumption than does Model
I, it also permits the testing of that assumption and therefore the testing of
the validity of Model I. The test of the dropped assumption (that there
is no selection bias in levels) is based upon a comparison of impact
estimates obtained from the two models. If the two are the same or
close to one another, then it must be the case that there is, in fact, no
selection bias in levels, because the impact estimate in Model I is based
upon participant/nonparticipant comparisons whereas that in Model II is
not. If the two are different, then there must be selection bias: if the
participant/nonparticipant differences within cities do not generate the
same impact estimate as that generated by the differences in Yi1 across
different cities, the former must be biased since the latter is accurate
(under the assumption that the Zi available is legitimate).
Model III takes the opposite tack and drops the assumption that Zi is
legitimate but maintains the assumption that there is no selection bias in
levels. The model estimates the treatment effect by making participant/nonparticipant comparisons only within cities, that is, conditional on Zi.
If there are cities where the program is not present at all, data on Yi1 from
those cities are not utilized at all, unlike the method in Model II. The
Model III impact estimate will be accurate if there is no selection bias into
participation, but it will also be accurate even if intercity variation is not a
legitimate Zi (e.g., if program placement were based upon need). In this
case, a comparison of the impact estimate with that obtained from Model
I, where participants and nonparticipants across cities were pooled into
one data set and city location was not controlled for (because the variable
was not available), provides a test for whether intercity variation is a
legitimate Zi. If it is not (e.g., if program placement across cities is based
on need), then Models I and III will produce quite different treatment
estimates, for Model I does not control for city location but Model III
does (Model III eliminates cross-city variation entirely by examining
only participant/nonparticipant differences within cities). On the other
hand, if city location is a legitimate Zi (e.g., if program placement is
independent of need), then the two estimates should be close to one
another.
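The contrast between the pooled and within-city estimates can be made concrete with a hypothetical example (all numbers below are invented; in these illustrative data, program placement tracks need, so the two estimates diverge):

```python
# Hypothetical sketch of the Model I / Model III comparison: the pooled
# participant/nonparticipant difference versus the same difference taken
# only within cities.  A large gap suggests intercity variation is not a
# legitimate Zi (e.g., placement based on need).  All data invented.
import numpy as np

city = np.array([1, 1, 1, 1, 2, 2, 2, 2])
T = np.array([1, 0, 0, 0, 1, 1, 1, 0])   # city 2 (higher need) has more slots
Y = np.array([0.50, 0.30, 0.20, 0.25,
              0.90, 0.80, 0.85, 0.60])

# Model I: pooled participant/nonparticipant difference, city not controlled
model_i = Y[T == 1].mean() - Y[T == 0].mean()

# Model III: the same difference taken within each city, then averaged
model_iii = np.mean([Y[(city == c) & (T == 1)].mean()
                     - Y[(city == c) & (T == 0)].mean()
                     for c in np.unique(city)])
print(model_i, model_iii)
```

Here the pooled estimate exceeds the within-city estimate because the city with the higher treatment proportion also has higher baseline Y; conditioning on city removes that intercity confounding, which is the essence of Model III.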
The implication of this discussion is that Data Set 2 makes it possible
to reject Model I by finding its assumptions to be invalid. This testing of
Model I is possible because Data Set 2 provides more data than is actually
necessary to estimate the model. Unfortunately, this data set does not
allow the evaluator to test the assumptions of Models II and III necessary
to assure their validity. Each makes a different assumption (Model II
assumes that Zi is legitimate, while Model III assumes no selection bias
to be present), and the estimates from the two need not be the same. If
they are different, the evaluator must gather additional information.
Such additional information may come from detailed institutional
knowledge, for example, of whether Zi is really legitimate (e.g., detailed
knowledge of how programs are placed across cities). But another source
of additional information is additional data, for example, information on
a preprogram measure of Yi. For example, if Data Set 2 is expanded
by adding a preprogram measure of Y (Data Set 3), the assumptions of
Models II and III can be tested by estimating Models IV and V shown in
the Figure. Each of these models drops yet another assumption, although
a different one in each case. Model IV drops the assumption that there
is no selection bias in differences but continues to make the assumption
that Zi is a legitimate instrument. The impact estimate is obtained by
the instrumental-variable technique, as in Model II, but in this case by
comparing the means of (Yi1 - Yi0) across cities, thereby eliminating
selection bias in levels if there is any. Model V drops the assumption that
there is no selection bias in levels by applying the difference estimate
given earlier, but still assumes that there is no selection bias in differences.
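Once a preprogram measure is available, both estimators can be sketched in a few lines. As before, all the data below are invented for illustration; Model V differences out any selection bias in levels directly, while Model IV applies the grouped IV idea to the individual changes:

```python
# Hypothetical sketch of Models IV and V with a preprogram measure (Data
# Set 3).  Model V uses the participant/nonparticipant difference in
# *changes*; Model IV applies grouped IV to the changes.  Data invented.
import numpy as np

city = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
T = np.array([0, 0, 0, 1, 0, 1, 1, 1, 0])
Y0 = np.array([0.30, 0.25, 0.35, 0.45, 0.30, 0.50, 0.55, 0.50, 0.35])  # pre
Y1 = np.array([0.35, 0.30, 0.40, 0.80, 0.40, 0.85, 0.90, 0.85, 0.45])  # post
D = Y1 - Y0  # individual change in the outcome

# Model V: difference in changes between participants and nonparticipants,
# which removes any selection bias in levels
model_v = D[T == 1].mean() - D[T == 0].mean()

# Model IV: grouped IV on the changes -- slope of city-mean change on the
# city treatment proportion
cities = np.unique(city)
d_bar = np.array([D[city == c].mean() for c in cities])
p_bar = np.array([T[city == c].mean() for c in cities])
model_iv = np.polyfit(p_bar, d_bar, 1)[0]
print(model_v, round(model_iv, 3))
```

Note the division of labor: Model V relies on the within-person difference and so needs no Zi at all, while Model IV still leans on cross-city variation but no longer requires that levels be free of selection bias.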
Once again, the richer data set permits the testing of the assumptions
that went into Models II and III and therefore permits their rejection as
invalid. The arrows in the Figure between models show which models
can be tested against one another. A comparison of the estimates of Model
IV to those of Model II provides a test of the third assumption (that there
is no selection bias in differences); a comparison of the estimates of
Model V and Model II provides a test of the first assumption (that Zi
is a legitimate instrument); a comparison of the estimates of Model V
and Model III provides a test of whether the second assumption holds
(that there is no selection bias in levels). If each comparison indicates
estimates that are similar to one another, the relevant assumption in the
more restricted model (Model II or Model III) should be taken to be
valid; when estimates differ, however, the assumption involved should be
taken as invalid and the more restricted model should be rejected. Thus
Models II and III may be discarded.
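The testing cascade just described can be sketched schematically. The impact estimates and the closeness tolerance below are purely illustrative (in practice one would use a formal statistical test of equality, not a fixed tolerance):

```python
# Illustrative sketch of the specification-testing cascade: when two
# models give similar impact estimates, the assumption separating them is
# retained; when they differ, the more restricted model is rejected.
def close(a, b, tol=0.05):
    """Crude stand-in for a formal test of equality of two estimates."""
    return abs(a - b) <= tol

# Hypothetical impact estimates from the five models
est = {"I": 0.42, "II": 0.30, "III": 0.41, "IV": 0.29, "V": 0.31}

# Each arrow in the Figure pairs two models that differ in one assumption
tests = {
    "no selection bias in differences (IV vs. II)": close(est["IV"], est["II"]),
    "Zi is a legitimate instrument (V vs. II)": close(est["V"], est["II"]),
    "no selection bias in levels (V vs. III)": close(est["V"], est["III"]),
}
for assumption, holds in tests.items():
    verdict = "retain" if holds else "reject restricted model"
    print(f"{assumption}: {verdict}")
```

In this invented configuration the first two comparisons agree while the third does not, so the evaluator would conclude that selection bias in levels is present and discard Model III.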
As before, Models IV and V now require certain assumptions in order
for their impact estimates to be valid. The assumptions required for each are
different, but neither can be tested unless more information or more data
are available. An additional preprogram data point or an additional
Zi variable would enrich the data set and permit the assumptions of the
two models to be tested. New models made possible by increasing the
richness of the data set permit the evaluator to discard more and more
assumptions and therefore obtain impact estimates that are more and
more reliable. This strategy can be pursued until models are found that
are not rejected by richer data sets.10
IV. APPLICATION TO AIDS INTERVENTIONS
Two of the interventions being considered are C&T and CBO programs.
In 1989, the CDC funded from 1,600 to 2,000 C&T programs across the
country. The programs offer HIV testing and some pretest and posttest
counseling to individual clients, and sometimes partner notification and
referral as well. The programs are often offered in local health departments or other local health facilities. The HIV testing and counseling are
confidential and often also anonymous. There is considerable diversity
across programs in the exact types of services offered, for local operators
have considerable discretion in designing the type of program offered.
The CBO programs under consideration here are those which conduct
local community health education and risk reduction projects. The types
of programs offered are more diverse than those offered in the C&T
programs, ranging from educational seminars for AIDS educators to the
establishment of focus groups, the conducting of counseling, the education
of high-risk groups about risk reduction strategies, and the sponsoring of
street fairs and performing arts activities in support of education and risk
reduction. The organizations conducting the activities are often small and
have close ties to the community, and usually target their activities on
specific high-risk groups or other subsegments of the community. At
present there is little systematic knowledge of the types of activities
sponsored by CBOs on a nationwide basis.
Although C&T and CBO programs are quite distinct in their missions, they pose similar evaluation problems, since both are generally
aimed at altering sexual behavior in a target population.11 To evaluate
whether the various programs have any impact at all, and to estimate the
magnitude of the impact of different types of programs, systematic and
careful evaluation strategies are required.
The Panel on the Evaluation of AIDS Interventions recommends
randomized trials wherever possible to evaluate these programs.12 Unfortunately, randomization will be difficult to apply in many cases. First
10Nevertheless, as I have stressed elsewhere (Moffitt, 1989), at least one untested assumption must,
by definition, always be made in any nonexperimental evaluation. It is only in a randomized trial that
such an assumption is no longer necessary for valid impact estimates to be obtained.
11Of course, this is not the only goal of these programs and there are many other important ones as
well. The techniques discussed in Section III will often be applicable to the evaluation of program
impact on other goals, albeit with appropriate modification.
12The panel qualifies this recommendation in several respects. First, it recommends evaluation of
only new CBO projects in order not to disrupt the operations of ongoing ones. Second, for ethical
and foremost are the ethical issues involved in denying treatment at all,
or denying a particular type of treatment, to individuals in the target
population. The ethics in this case are not always a clear-cut issue. It
is often argued, for example, that the ethical issues are less serious if
individuals are not assigned to a zero-treatment cell but only to different
types of treatments, each of which represents a gain over the individual's alternatives outside the experiment. However, even here there are
ethical issues involved in any alteration of the natural order of priority
in treatment assignment that would occur in the absence of randomization, especially if those operating the program believe that individuals
are already being assigned to the "best" treatment for each individual.
Second, there are likely to be serious political difficulties as well, for
AIDS treatment has already become a highly politicized issue in local
communities, and popular resistance to randomization will no doubt be
even more difficult to overcome than it already is for other programs.
Third, more than in most randomized trials, those in the AIDS context
require a high degree of cooperation from the indigenous staff operating
the programs, both to elicit accurate responses from the subjects and to
reduce attrition, and in light of confidentiality requirements that often make
it difficult for outside evaluators to be integrally involved in the operation
and data collection of the experiment. Such cooperation may be difficult
to achieve if randomization is taking place.
In any case, it is clear that observational, nonexperimental evaluation
techniques must be given serious consideration in the evaluation of AIDS
interventions. The techniques outlined in Section III are of potential
applicability to such interventions. It is no doubt obvious that in both
C&T and CBO programs selectivity bias is likely to be a problem: those
who choose to voluntarily make use of the services are likely to be
quite different from those who do not, even if they had not received any
program services.
The techniques outlined in Section III for addressing the selectivity
bias problem point in very specific directions for a solution to the problem,
namely, (1) the search for appropriate "Z's," and (2) the collection of
sexual behavior histories. In addition, although it has not been heavily
emphasized thus far, those techniques implicitly require the collection
of data on nonparticipants as well as participants. If data on only
participants are available, and therefore only a before-and-after study
can be conducted, it will be very difficult to identify the effects of the
treatment on behavior, given the rapid increases in AIDS knowledge in the
reasons, it recommends against randomization for C&T evaluations if a zero-treatment cell is involved,
preferring that all cells involve some type of treatment.
general population and the presumed steady change in sexual prevention
behaviors that are occurring independently of these programs.
The Search for Z's
First, consider the issue of whether appropriate Z's can be found for
AIDS interventions. It is likely to be difficult to locate such Z's, but not
necessarily impossible. It is much easier, in fact, to identify variables that
are inappropriate as Z's than variables that are appropriate. For example,
it is extremely unlikely that any sociodemographic or health characteristic
of individuals themselves would be appropriate. Health status, education
level, prior sexual history, and other such characteristics no doubt affect
the probability that an individual enrolls in a C&T or CBO program,
but also unquestionably are independently related to prevention behavior
as well. Indeed, to use the language of economics, it is probably not
possible to locate appropriate Z variables on the "demand" side of the
market, that is, among those individuals who are availing themselves
of the programs, and it would be more fruitful to look on the "supply"
side, where availability of programs is determined in the first place.
On the availability side, the C&T and CBO programs are indeed
differentially placed across neighborhoods within cities, between cities
and suburbs, across metropolitan areas, and across states and regions.
Unfortunately for the evaluation effort, however, differential availability
in most cases is certain to be closely related to need. Those cities
most likely to have an extensive set of programs are those with large
subsegments of the high-risk population and those where HIV incidence
has already been determined to be high. Within cities, it is no doubt also
the case that programs are more likely to be located in neighborhoods
close to high-risk populations than in neighborhoods far from them.
With this process of program placement, differential availability will not
constitute an appropriate Zi.
If appropriate Z's are to be identified, it will require a more detailed
investigation than is possible here, but there are several directions in which
such an investigation could be pursued. First, a detailed examination of
the funding rules of CDC and other federal agencies would be warranted.
Grants are made to applying C&T and CBO sponsors, and no doubt the
need of the population to be served is a criterion in the funding decision.
But the availability of a Zi does not require that need not be used at
all in the decision, only that it not be the sole criterion. To the extent
that other criteria are used to make funding decisions, criteria unrelated
to HIV incidence in the area, Z's may be identified. In addition, it is
rarely the case that federal funding decisions are as rational and clear-cut
as published funding formulas and formal point criteria suggest. It is
almost always the case that some agency discretion, political factors, or
bureaucratic forces come into play in some fraction of the decisions. To
the extent that they do, appropriate Z's will be available.
Second, a detailed study of several large cities may result in the
identification of other Z's. For example, it has been estimated that 60
percent of the male homosexual population in San Francisco has not
been tested for HIV infection and has, therefore, almost certainly also
not enrolled in a C&T or CBO program.13 Why this percent is so high
could be investigated. Perhaps the 60 percent who have not been tested
are those with low probabilities of HIV in the first place, or those who
are already practicing prevention behaviors; in this case, no appropriate
Zi would be available. On the other hand, some of the nonparticipants
may be located in areas where no C&T or CBO program is present, for
example, if they do not live in particular neighborhoods that have been
targeted. If so, differential access to a program could serve as the basis
for a Z.
Collection of Histories
The collection of data in general, and histories in particular, is likely to
be difficult for the evaluation of AIDS interventions. The confidentiality
of the testing and counseling process, as well as the inherently sensitive
nature of the questions that must be asked to obtain information on the
necessary behaviors, makes the prospect of obtaining reliable data highly
uncertain at our present state of knowledge. Obtaining informed consent
from those receiving the treatment as well as others may be problematical,
and may result in self-selected samples that threaten the integrity of the
design and consequently the validity of any impact estimates obtained.
These considerations make difficult the prospect of obtaining even a
single wave of postprogram data, much less multiple periods of preprogram data.14 Randomized trials have the advantage of requiring less
data collection than observational studies, as noted in Section III, and
hence are relatively favored in this respect.
Nevertheless, cohort studies in this area have been undertaken and
have often been successful in retaining individuals in the sample, and
more cohort collection efforts are underway. For example, Kaslow and
colleagues (1987) report the results of a survey of the sexual behavior of
13Washington Post, January 9, 1990.
14Histories can be collected from retrospective questions as well as reinterviews. For example, one or
two preprogram interviews could be conducted, with the earliest one also containing a retrospective
battery.
5,000 asymptomatic homosexual men in which a baseline survey and lab
tests were followed by reinterviews and tests at six-month intervals. As
of the latest (10th) wave, about 5 years into the study, from 76 percent
to 97 percent of the individuals (across areas and risk groups) are still in
the sample, a very high percentage. The success of the cohort is partly a
result of solid confidentiality measures as well as the heavy involvement
of local gay community leaders and trained local staff from the beginning
of the study.
Other cohort collection efforts include the CDC cross-city study of
O'Reilly, involving both homosexual men and IV drug users; the
study of seven CBOs headed by Vincent Mor at Brown University; the
San Francisco city clinic cohort and Hepatitis B cohort; and the upcoming
Westat cohort sponsored by NCHSR. How successful these efforts will
be remains to be seen, but there is no question that serious cohort studies
are being undertaken in increasing number. If they are successful, and if
the histories described in Section III can be obtained, program evaluation
designs will be greatly enhanced and impact estimates will be obtainable
with much greater reliability.
V. SUMMARY AND CONCLUSIONS
The evaluation of AIDS interventions poses difficult conceptual and practical issues. Since randomized trials are unlikely to be feasible in many
circumstances, evaluation methods for observational, nonexperimental
data must be applied. Statistical methods developed by economists for
the evaluation of the impact of social and economic programs over the
past twenty years are applicable to this problem and have several important lessons for AIDS evaluations. The most important are that accurate
estimates of program impact require (1) a systematic search for identifying "Z" variables, variables that affect the availability of program
services to different populations but which are not direct determinants
of HIV incidence or the adoption of prevention behaviors; or (2) the
collection of sufficiently lengthy sexual histories from participants and
nonparticipants in the programs that can be used to reduce the selection bias attendant upon participant/nonparticipant comparisons. Both of
these implications are quite concrete and should provide funding agencies and program evaluators with specific directions in which to search
for and pursue evaluation designs that will yield reliable estimates of
program impact.
REFERENCES
Ashenfelter, O. (1978) Estimating the effect of training programs on earnings. Review
of Economics and Statistics 60:47-57.
Barnow, B. (1987) The impact of CETA programs on earnings: A review of the
literature. Journal of Human Resources 22:157-193.
Barnow, B., Cain, G., and Goldberger, A. (1980) Issues in the analysis of selectivity
bias. In E. Stromsdorfer and G. Farkas, eds., Evaluation Studies Review Annual,
Volume 5. Beverly Hills, Calif.: Sage.
Bjorklund, A., and Moffitt, R. (1987) Estimation of wage gains and welfare gains in
self-selection models. Review of Economics and Statistics 69:42-49.
Goldberger, A. (1972) Selection bias in evaluating treatment effects: Some formal
illustrations. Discussion Paper 123-72. Madison, Wis.: Institute for Research
on Poverty.
Gronau, R. (1974) Wage comparisons: A selectivity bias. Journal of Political Economy
82:1119-1143.
Heckman, J. J. (1974) Shadow prices, market wages, and labor supply. Econometrica
42:679-694.
Heckman, J. J., and Hotz, V. J. (1989) Choosing among alternative nonexperimental
methods for estimating the impact of social programs: The case of manpower
training. Journal of the American Statistical Association 84:862-874.
Heckman, J. J., and Robb, R. (1985a) Alternative methods for evaluating the impact of
interventions: An overview. Journal of Econometrics 30:239-267.
Heckman, J. J., and Robb, R. (1985b) Alternative methods for evaluating the impact of
interventions. In J. Heckman and B. Singer, eds., Longitudinal Analysis of Labor
Market Data. Cambridge: Cambridge University Press.
Kaslow, R. W., Ostrow, D. G., Detels, R., Phair, J. P., Polk, B. F., and Rinaldo, C. R.
(1987) The Multicenter AIDS Cohort Study: Rationale, organization, and selected
characteristics of the participants. American Journal of Epidemiology 126:310-318.
Lewis, H. G. (1974) Comments on selectivity biases in wage comparisons. Journal
of Political Economy 82:1145-1155.
Maddala, G. S. (1983) Limited-Dependent and Qualitative Variables in Econometrics.
Cambridge: Cambridge University Press.
Maddala, G. S. (1985) A survey of the literature on selectivity bias as it pertains to
health care markets. In R. M. Scheffler, ed., Advances in Health Economics and
Health Services Research, Vol. 6. Greenwich, Conn.: JAI Press.
Manski, C. (1990) Nonparametric bounds for treatment effects. American Economic
Review 80:319-323.
Moffitt, R. (1989) Comment on Heckman and Hotz. Journal of the American Statistical
Association 84:877-878.