Copyright © 2009. National Academy of Sciences. All rights reserved.
1 Design and Implementation of Evaluation Research
Evaluation has its roots in the social, behavioral, and statistical sciences, and it relies on their principles and methodologies of research, including experimental design, measurement, statistical tests, and direct observation. What distinguishes evaluation research from other social science is that its subjects are ongoing social action programs that are intended to produce individual or collective change. This setting usually engenders a great need for cooperation between those who conduct the program and those who evaluate it. This need for cooperation can be particularly acute in the case of AIDS prevention programs because those programs have been developed rapidly to meet the urgent demands of a changing and deadly epidemic.
Although the characteristics of AIDS intervention programs place some unique demands on evaluation, the techniques for conducting good program evaluation do not need to be invented. Two decades of evaluation research have provided a basic conceptual framework for undertaking such efforts (see, e.g., Campbell and Stanley [1966] and Cook and Campbell [1979] for discussions of outcome evaluation; see Weiss [1972] and Rossi and Freeman [1982] for process and outcome evaluations); in addition, similar programs, such as the antismoking campaigns, have been subject to evaluation, and they offer examples of the problems that have been encountered.
In this chapter the panel provides an overview of the terminology, types, designs, and management of research evaluation. The following chapter provides an overview of program objectives and the selection and measurement of appropriate outcome variables for judging the effectiveness of AIDS intervention programs. These issues are discussed in detail in the subsequent, program-specific Chapters 3-5.
TYPES OF EVALUATION
The term evaluation implies a variety of different things to different people. The recent report of the Committee on AIDS Research and the Behavioral, Social, and Statistical Sciences defines the area through a series of questions (Turner, Miller, and Moses, 1989:317-318):
Evaluation is a systematic process that produces a trustworthy account of what was attempted and why; through the examination of results (the outcomes of intervention programs), it answers the questions, "What was done?" "To whom, and how?" and "What outcomes were observed?" Well-designed evaluation permits us to draw inferences from the data and addresses the difficult question: "What do the outcomes mean?"
These questions differ in the degree of difficulty of answering them. An evaluation that tries to determine the outcomes of an intervention and what those outcomes mean is a more complicated endeavor than an evaluation that assesses the process by which the intervention was delivered. Both kinds of evaluation are necessary because they are intimately connected: to establish a project's success, an evaluator must first ask whether the project was implemented as planned and then whether its objective was achieved. Questions about a project's implementation usually fall under the rubric of process evaluation. If the investigation involves rapid feedback to the project staff or sponsors, particularly at the earliest stages of program implementation, the work is called formative evaluation. Questions about effects or effectiveness are variously called summative evaluation, impact assessment, or outcome evaluation; the panel uses the last term.
Formative evaluation is a special type of early evaluation that occurs during and after a program has been designed but before it is broadly implemented. Formative evaluation is used to understand the need for the intervention and to make tentative decisions about how to implement or improve it. During formative evaluation, information is collected and then fed back to program designers and administrators to enhance program development and maximize the success of the intervention. For example, formative evaluation may be carried out through a pilot project before a program is implemented at several sites. A pilot study of a community-based organization (CBO), for example, might be used to gather data on problems involving access to and recruitment of targeted populations and the utilization and implementation of services; the findings of such a study would then be used to modify (if needed) the planned program.
Another example of formative evaluation is the use of a "story board"
design of a TV message that has yet to be produced. A story board is
a series of text and sketches of camera shots that are to be produced in
a commercial. To evaluate the effectiveness of the message and forecast
some of the consequences of actually broadcasting it to the general public,
an advertising agency convenes small groups of people to react to and
comment on the proposed design.
Once an intervention has been implemented, the next stage of evaluation is process evaluation, which addresses two broad questions: "What was done?" and "To whom, and how?" Ordinarily, process evaluation is carried out at some point in the life of a project to determine how, and how well, the delivery goals of the program are being met. When intervention programs continue over a long period of time (as is the case for some of the major AIDS prevention programs), measurements at several times are warranted to ensure that the components of the intervention continue to be delivered by the right people, to the right people, in the right manner, and at the right time. Process evaluation can also play a role in improving interventions by providing the information necessary to change delivery strategies or program objectives in a changing epidemic.
Research designs for process evaluation include direct observation of projects, surveys of service providers and clients, and the monitoring of administrative records. The panel notes that the Centers for Disease Control (CDC) is already collecting some administrative records on its counseling and testing program and community-based projects. The panel believes that this type of evaluation should be a continuing and expanded component of intervention projects to guarantee the maintenance of the projects' integrity and responsiveness to their constituencies.
The purpose of outcome evaluation is to identify consequences and to establish that consequences are, indeed, attributable to a project. This type of evaluation answers the questions, "What outcomes were observed?" and, perhaps more importantly, "What do the outcomes mean?" Like process evaluation, outcome evaluation can also be conducted at intervals during an ongoing program, and the panel believes that such periodic evaluation should be done to monitor goal achievement.
The panel believes that these stages of evaluation (i.e., formative, process, and outcome) are essential to learning how AIDS prevention programs contribute to containing the epidemic. After a body of findings has been accumulated from such evaluations, it may be fruitful to launch another stage of evaluation: cost-effectiveness analysis (see Weinstein et al., 1989). Like outcome evaluation, cost-effectiveness analysis measures program effectiveness, but it extends the analysis by adding a measure of program cost. The panel believes that consideration of cost-effectiveness analysis should be postponed until more experience is gained with formative, process, and outcome evaluation of the CDC AIDS prevention programs.
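Cost-effectiveness analysis is often summarized as an incremental ratio: the extra cost of one program variant over another, divided by the extra effect it produces. The Python sketch below is a minimal illustration, assuming a single numeric outcome per variant (for example, a count of risk behaviors averted); the `ProgramResult` class, the function name, and all figures are hypothetical, not CDC data.

```python
from dataclasses import dataclass


@dataclass
class ProgramResult:
    """Hypothetical summary of one program variant."""
    cost: float    # total program cost, in any currency unit
    effect: float  # outcome measure (assumed), e.g. risk behaviors averted


def incremental_cost_effectiveness(new: ProgramResult, old: ProgramResult) -> float:
    """Extra cost per extra unit of effect of `new` relative to `old`."""
    delta_effect = new.effect - old.effect
    if delta_effect == 0:
        raise ValueError("effects are equal; the ratio is undefined")
    return (new.cost - old.cost) / delta_effect


# Illustrative numbers only.
standard = ProgramResult(cost=100_000.0, effect=20.0)
enhanced = ProgramResult(cost=150_000.0, effect=30.0)
print(incremental_cost_effectiveness(enhanced, standard))  # 5000.0
```

The ratio is only as trustworthy as the effect estimates in its denominator, which is one reason to postpone it until outcome evaluations are in hand.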
EVALUATION RESEARCH DESIGN
Process and outcome evaluations require different types of research designs, as discussed below. Formative evaluations, which are intended to both assess implementation and forecast effects, use a mix of these designs.
Process Evaluation Designs
To conduct process evaluations on how well services are delivered, data
need to be gathered on the content of interventions and on their delivery
systems. Suggested methodologies include direct observation, surveys,
and record keeping.
Direct observation designs include case studies, in which participant-observers unobtrusively and systematically record encounters within a program setting, and nonparticipant observation, in which long, open-ended (or "focused") interviews are conducted with program participants.1 For example, "professional customers" at counseling and testing sites can act as project clients to monitor activities unobtrusively;2 alternatively, nonparticipant observers can interview both staff and clients. Surveys, either censuses (of the whole population of interest) or samples, elicit information through interviews or questionnaires completed by project participants or potential users of a project. For example, surveys within community-based projects can collect basic statistical information on project objectives, what services are provided, to whom, when, how often, for how long, and in what context.
Record keeping consists of administrative or other reporting systems that monitor use of services. Standardized reporting ensures consistency in the scope and depth of data collected. To use the media campaign as an example, the panel suggests using standardized data on the use of the AIDS hotline to monitor public attentiveness to the advertisements broadcast by the media campaign.
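Standardized hotline records become useful for monitoring once every record shares a fixed format that can be aggregated. A minimal Python sketch, assuming each call is logged with a date and a yes/no field recording whether the caller cited an advertisement; both fields and the function names are hypothetical, not the actual hotline's reporting format. Weekly counts can then be lined up against broadcast dates.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class HotlineCall:
    """One standardized call record (hypothetical fields)."""
    call_date: date
    cited_ad: bool  # caller reported that an advertisement prompted the call


def weekly_counts(calls):
    """Number of calls per ISO (year, week), for lining up with broadcast dates."""
    counts = Counter()
    for call in calls:
        year, week, _ = call.call_date.isocalendar()
        counts[(year, week)] += 1
    return dict(counts)


def ad_share(calls):
    """Fraction of calls in which the caller cited an advertisement."""
    calls = list(calls)
    if not calls:
        return 0.0
    return sum(call.cited_ad for call in calls) / len(calls)


calls = [
    HotlineCall(date(1989, 1, 2), cited_ad=False),
    HotlineCall(date(1989, 1, 3), cited_ad=True),
    HotlineCall(date(1989, 1, 10), cited_ad=True),
]
print(weekly_counts(calls))  # {(1989, 1): 2, (1989, 2): 1}
print(ad_share(calls))
```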
1 On occasion, nonparticipants observe behavior during or after an intervention. Chapter 3 introduces
this option in the context of formative evaluation.
2 The use of professional customers can raise serious concerns in the eyes of project administrators at counseling and testing sites. The panel believes that site administrators should receive advance notification that professional customers may visit their sites for testing and counseling services and should provide their consent before this method of data collection is used.
These designs are simple to understand, but they require expertise to implement. For example, observational studies must be conducted by people who are well trained in how to carry out on-site tasks sensitively and to record their findings uniformly. Observers can either complete narrative accounts of what occurred in a service setting or complete some sort of data inventory to ensure that multiple aspects of service delivery are covered. These types of studies are time consuming and benefit from corroboration among several observers. The use of surveys in research is well understood, although they, too, require expertise to be well implemented. As the program chapters reflect, survey data collection must be carefully designed to reduce problems of validity and reliability and, if samples are used, to design an appropriate sampling scheme. Record keeping or service inventories are probably the easiest research designs to implement, although preparing standardized internal forms requires attention to detail about salient aspects of service delivery.
Outcome Evaluation Designs
Research designs for outcome evaluations are meant to assess principal and relative effects. Ideally, to assess the effect of an intervention on program participants, one would like to know what would have happened to the same participants in the absence of the program. Because it is not possible to make this comparison directly, inference strategies that rely on proxies have to be used. Scientists use three general approaches to construct proxies for use in the comparisons required to evaluate the effects of interventions: (1) nonexperimental methods, (2) quasi-experiments, and (3) randomized experiments. The first two are discussed below, and randomized experiments are discussed in the subsequent section.
Nonexperimental and Quasi-Experimental Designs3
The most common form of nonexperimental design is a before-and-after study. In this design, preintervention measurements are compared with equivalent measurements made after the intervention to detect change in the outcome variables that the intervention was designed to influence.
Although the panel finds that before-and-after studies frequently provide helpful insights, the panel believes that these studies do not provide sufficiently reliable information to be the cornerstone for evaluation research on the effectiveness of AIDS prevention programs. The panel's conclusion follows from the fact that postintervention changes cannot usually be attributed unambiguously to the intervention.4 Plausible competing explanations for differences between pre- and postintervention measurements will often be numerous, including not only the possible effects of other AIDS intervention programs, news stories, and local events, but also the effects that may result from the maturation of the participants and the educational or sensitizing effects of repeated measurements, among others.
3 Parts of this section are adapted from Turner, Miller, and Moses (1989:32~326).
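A small simulation makes the panel's objection concrete. In this hypothetical Python sketch the program has no effect at all (`true_effect=0`), yet a background trend affecting everyone (standing in for news stories, other programs, or maturation) still produces a nonzero before-and-after change. The parameter values are arbitrary assumptions chosen for illustration.

```python
import random
import statistics


def before_after_change(n=2000, secular_trend=0.10, true_effect=0.0, seed=1):
    """Mean post-minus-pre change in a simulated before-and-after study.

    Every participant drifts by `secular_trend` between measurements,
    whether or not the program works; the program adds `true_effect`.
    """
    rng = random.Random(seed)
    pre = [rng.gauss(0.0, 1.0) for _ in range(n)]
    post = [x + secular_trend + true_effect + rng.gauss(0.0, 0.5) for x in pre]
    return statistics.mean(post) - statistics.mean(pre)


# The estimate lands near 0.10 even though the program does nothing.
print(round(before_after_change(), 2))
```

Without a comparison group, nothing in the data distinguishes the trend from a program effect of the same size.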
Quasi-experimental and matched control designs provide a separate comparison group. In these designs, the control group may be selected by matching nonparticipants to participants in the treatment group on the basis of selected characteristics. It is difficult to ensure the comparability of the two groups even when they are matched on many characteristics because other relevant factors may have been overlooked or mismatched or they may be difficult to measure (e.g., the motivation to change behavior). In some situations, it may simply be impossible to measure all of the characteristics of the units (e.g., communities) that may affect outcomes, much less demonstrate their comparability.
Matched control designs require extraordinarily comprehensive scientific knowledge about the phenomenon under investigation in order for evaluators to be confident that all of the relevant determinants of outcomes have been properly accounted for in the matching. Three types of information or knowledge are required: (1) knowledge of intervening variables that also affect the outcome of the intervention and, consequently, need adjustment to make the groups comparable; (2) measurements on all intervening variables for all subjects; and (3) knowledge of how to make the adjustments properly, which in turn requires an understanding of the functional relationship between the intervening variables and the outcome variables. Satisfying each of these information requirements is likely to be more difficult than answering the primary evaluation question, "Does this intervention produce beneficial effects?"
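The difficulty with matching can also be simulated. In this hypothetical Python sketch, people enroll partly on the basis of motivation, which the evaluator cannot observe, and each participant is matched to the nonparticipant of nearest age. The matched groups end up balanced on age but not on motivation, so any outcome difference would mix program effects with motivation effects. The enrollment probabilities and distributions are arbitrary assumptions.

```python
import bisect
import random
import statistics


def matched_gaps(n=4000, seed=7):
    """Return (age gap, motivation gap) between participants and their
    age-matched comparison group in a simulated self-selected program."""
    rng = random.Random(seed)
    people = []
    for _ in range(n):
        age = rng.uniform(18, 60)         # observed; used for matching
        motivation = rng.gauss(0.0, 1.0)  # unobserved by the evaluator
        # Self-selection: motivated people are more likely to enroll.
        enrolled = rng.random() < (0.7 if motivation > 0 else 0.2)
        people.append((age, motivation, enrolled))

    participants = [p for p in people if p[2]]
    pool = sorted((p for p in people if not p[2]), key=lambda p: p[0])
    pool_ages = [p[0] for p in pool]

    matches = []
    for age, _, _ in participants:
        # Nearest-neighbour match on age (with replacement): compare the
        # nonparticipants just below and just above this age.
        i = bisect.bisect_left(pool_ages, age)
        candidates = pool[max(i - 1, 0):i + 1]
        matches.append(min(candidates, key=lambda p: abs(p[0] - age)))

    age_gap = statistics.mean(p[0] for p in participants) - statistics.mean(m[0] for m in matches)
    motivation_gap = statistics.mean(p[1] for p in participants) - statistics.mean(m[1] for m in matches)
    return age_gap, motivation_gap


age_gap, motivation_gap = matched_gaps()
print(f"age gap {age_gap:.2f} years, motivation gap {motivation_gap:.2f} SD")
```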
Given the size and the national importance of AIDS intervention programs and given the state of current knowledge about behavior change in general and AIDS prevention in particular, the panel believes that it would be unwise to rely on matching and adjustment strategies as the primary design for evaluating AIDS intervention programs. With differently constituted groups, inferences about results are hostage to uncertainty about the extent to which the observed outcome actually results from the intervention and is not an artifact of intergroup differences that may not have been removed by matching or adjustment.
4 This weakness has been noted by CDC in a sourcebook provided to its HIV intervention project grantees (CDC, 1988:F-14).
Randomized Experiments
A remedy to the inferential uncertainties that afflict nonexperimental designs is provided by randomized experiments. In such experiments, one singly constituted group is established for study. A subset of the group is then randomly chosen to receive the intervention, with the other subset becoming the control. The two groups are not identical, but they are comparable. Because they are two random samples drawn from the same population, they are not systematically different in any respect, and this holds for all variables, both known and unknown, that can influence the outcome. Dividing a singly constituted group into two random and therefore comparable subgroups cuts through the tangle of causation and establishes a basis for the valid comparison of respondents who do and do not receive the intervention. Randomized experiments provide for clear causal inference by solving the problem of group comparability, and they may be used to answer the evaluation questions "Does the intervention work?" and "What works better?"
Which question is answered depends on whether the controls receive an intervention or not. When the object is to estimate whether a given intervention has any effects, individuals are randomly assigned to the project or to a zero-treatment control group. The control group may be put on a waiting list or simply not get the treatment. This design addresses the question, "Does it work?"
When the object is to compare variations on a project (e.g., individual counseling sessions versus group counseling), individuals are randomly assigned to these two regimens, and there is no zero-treatment control group. This design addresses the question, "What works better?" In either case, the control groups must be followed up as rigorously as the experimental groups.
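Both designs rest on the same mechanical step: shuffling one singly constituted group and dealing it into arms. The Python sketch below is a minimal illustration, assuming simple (unstratified) randomization with arm sizes kept within one of each other; the client identifiers and arm labels are hypothetical. The same function serves a "Does it work?" design (a program arm and a waiting-list control) and a "What works better?" design (two program variants).

```python
import random


def assign(units, arms, seed=None):
    """Randomly assign each unit to one of `arms`, keeping arm sizes within one."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {arm: [] for arm in arms}
    # Deal the shuffled units round-robin, like cards, into the arms.
    for i, unit in enumerate(shuffled):
        groups[arms[i % len(arms)]].append(unit)
    return groups


clients = [f"client-{i:02d}" for i in range(12)]
does_it_work = assign(clients, ["program", "waiting-list control"], seed=1)
what_works_better = assign(clients, ["individual counseling", "group counseling"], seed=2)
print(does_it_work)
```

Recording the seed makes the assignment auditable after the fact, which can help answer charges of favoritism.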
Rationale. A randomized experiment requires that individuals, organizations, or other treatment units be randomly assigned to one of two or more treatments or program variations. Random assignment ensures that the estimated differences between the groups so constituted are statistically unbiased; that is, apart from chance variation, any differences in effects measured between them are a result of treatment. The absence of statistical bias in groups constituted in this fashion stems from the fact that random assignment ensures that there are no systematic differences between them, differences that can and usually do affect groups composed in ways that are not random.5 The panel believes this approach is far superior to the nonrandom and quasi-experimental approaches for outcome evaluations of AIDS interventions. Therefore,
To improve interventions that are already broadly implemented, the panel recommends the use of randomized field experiments of alternative or enhanced interventions.
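Because the groups were formed by chance, the significance test mentioned in the rationale can be carried out by re-randomization: repeatedly re-deal the pooled outcomes into groups of the original sizes and count how often chance alone produces a difference at least as large as the one observed. The Python sketch below is one such randomization (permutation) test for a difference in means; the function name, number of re-randomizations, and outcome values are illustrative assumptions, not a procedure prescribed by the panel.

```python
import random
import statistics


def permutation_p_value(treated, control, n_perm=10_000, seed=0):
    """Two-sided randomization test for a difference in group means.

    Re-deals the pooled outcomes into groups of the original sizes `n_perm`
    times and returns the (add-one) share of re-deals whose absolute mean
    difference is at least as large as the one actually observed.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(treated) - statistics.mean(control))
    pooled = list(treated) + list(control)
    n_t = len(treated)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_t]) - statistics.mean(pooled[n_t:]))
        if diff >= observed - 1e-12:  # tolerance for floating-point ties
            extreme += 1
    return (extreme + 1) / (n_perm + 1)


# Hypothetical knowledge-test scores for two randomly formed groups.
treated = [5, 6, 7, 8, 9]
control = [0, 1, 2, 3, 4]
print(permutation_p_value(treated, control))
```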
Under certain conditions, the panel also endorses randomized field experiments with a nontreatment control group to evaluate new interventions. In the context of a deadly epidemic, ethics dictate that treatment not be withheld simply for the purpose of conducting an experiment. Nevertheless, there may be times when a randomized field test of a new treatment with a no-treatment control group is worthwhile. One such time is during the design phase of a major or national intervention.
Before a new intervention is broadly implemented, the panel recommends that it be pilot tested in a randomized field experiment.
The panel considered the use of experiments with delayed rather than no treatment. A delayed-treatment control group strategy might be pursued when resources are too scarce for an intervention to be widely distributed at one time. For example, a project site that is waiting to receive funding for an intervention would be designated as the control group. If it is possible to randomize which projects in the queue receive the intervention, an evaluator could measure and compare outcomes after the experimental group had received the new treatment but before the control group received it. The panel believes that such a design can be applied only in limited circumstances, such as when groups would have access to related services in their communities and when conducting the study was likely to lead to greater access or better services. For example, a study cited in Chapter 4 used a randomized delayed-treatment experiment to measure the effects of a community-based risk reduction program. However, such a strategy may be impractical for several reasons, including:
· sites waiting for funding for an intervention might seek resources from another source;
· it might be difficult to enlist the nonfunded site and its clients to participate in the study;
· there could be an appearance of favoritism toward projects whose funding was not delayed.
5 The significance tests applied to experimental outcomes calculate the probability that any observed differences between the sample estimates might result from random variations between the groups.
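The delayed-treatment design amounts to randomizing the order of the funding queue and then comparing the two groups only during the window between the two start dates. A minimal Python sketch, assuming one funding round now and one later; the site names, function names, and day numbers are hypothetical.

```python
import random


def randomize_queue(sites, funded_now, seed=None):
    """Randomly split sites waiting for funding into an immediate group
    (funded now) and a delayed-treatment control group (funded later)."""
    if not 0 < funded_now < len(sites):
        raise ValueError("each group needs at least one site")
    rng = random.Random(seed)
    order = list(sites)
    rng.shuffle(order)
    return order[:funded_now], order[funded_now:]


def in_comparison_window(measure_day, immediate_start, delayed_start):
    """Outcomes are comparable only after the immediate group has begun
    the intervention and before the delayed group begins it."""
    return immediate_start <= measure_day < delayed_start


sites = ["site-A", "site-B", "site-C", "site-D", "site-E", "site-F"]
immediate, delayed = randomize_queue(sites, funded_now=3, seed=11)
print(immediate, delayed)
print(in_comparison_window(measure_day=120, immediate_start=0, delayed_start=180))  # True
```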
Pitfalls. Although randomized experiments have many benefits, the approach is not without pitfalls. In the planning stages of evaluation, it is necessary to contemplate certain hazards, such as the Hawthorne effect6 and differential project dropout rates. Precautions must be taken either to prevent these problems or to measure their effects. Fortunately, there is some evidence suggesting that the Hawthorne effect is usually not very large (Rossi and Freeman, 1982:175-176).
Attrition is potentially more damaging to an evaluation, and it must be limited if the experimental design is to be preserved. If sample attrition is not limited in an experimental design, it becomes necessary to account for the potentially biasing impact of the loss of subjects in the treatment and control conditions of the experiment. The statistical adjustments required to make inferences about treatment effectiveness in such circumstances can introduce uncertainties that are as worrisome as those afflicting nonexperimental and quasi-experimental designs. Thus, the panel's recommendation of the selective use of randomized design carries an implicit caveat: to realize the theoretical advantages offered by randomized experimental designs, substantial efforts will be required to ensure that the designs are not compromised by flawed execution.
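Monitoring attrition separately in each arm is a simple precaution, since it is differential dropout that threatens comparability. A minimal Python sketch, assuming counts of enrolled and followed-up participants per arm; the function name and figures are hypothetical, and the sketch only reports rates: it does not implement any statistical adjustment for missing outcomes.

```python
def attrition_by_arm(enrolled, followed_up):
    """Per-arm attrition rates and the largest difference between arms.

    `enrolled` and `followed_up` map arm name -> number of participants.
    """
    rates = {}
    for arm, n in enrolled.items():
        if n <= 0:
            raise ValueError(f"arm {arm!r} has no enrolled participants")
        rates[arm] = 1 - followed_up.get(arm, 0) / n
    gap = max(rates.values()) - min(rates.values())
    return rates, gap


rates, gap = attrition_by_arm(
    enrolled={"treatment": 200, "control": 200},
    followed_up={"treatment": 170, "control": 130},
)
# Control loses about 35% and treatment about 15%; the gap is about 0.2.
print(rates, round(gap, 2))
```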
Another pitfall of randomization is its appearance of unfairness or unattractiveness to participants and the controversial legal and ethical issues it sometimes raises. Often, what is being criticized is the control of project assignment of participants rather than the use of randomization itself. In deciding whether random assignment is appropriate, it is important to consider the specific context of the evaluation and how participants would be assigned to projects in the absence of randomization. The Federal Judicial Center (1981) offers five threshold conditions for the use of random assignment:
· Does present practice or policy need improvement?
· Is there significant uncertainty about the value of the proposed regimen?
· Are there acceptable alternatives to randomized experiments?
· Will the results of the experiment be used to improve practice or policy?
· Is there a reasonable protection against risk for vulnerable groups (i.e., individuals within the justice system)?
6 Research participants' knowledge that they were being observed had a positive effect on their responses in a series of famous studies made at Western Electric's Hawthorne Works near Chicago (Roethlisberger and Dickson, 1939); the phenomenon is referred to as the Hawthorne effect.
The parent committee has argued that these threshold conditions apply in the case of AIDS prevention programs (see Turner, Miller, and Moses, 1989:331-333).
Although randomization may be desirable from an evaluation and ethical standpoint, and acceptable from a legal standpoint, it may be difficult to implement from a practical or political standpoint. Again, the panel emphasizes that questions about the practical or political feasibility of the use of randomization may in fact refer to the control of program allocation rather than to the issue of randomization itself. In fact, when resources are scarce, it is often more ethical and politically palatable to randomize allocation than to allocate on grounds that may appear biased.
It is usually easier to defend the use of randomization when the choice has to do with assignment to groups receiving alternative services than when the choice involves assignment to groups receiving no treatment. For example, in comparing a testing and counseling intervention that offered a special "skills training" session in addition to its regular services with a counseling and testing intervention that offered no additional component, random assignment of participants to one group rather than another may be acceptable to program staff and participants because the relative values of the alternative interventions are unknown.
The more difficult issue is the introduction of new interventions that are perceived to be needed and effective in a situation in which there are no services. An argument that is sometimes offered against the use of randomization in this instance is that interventions should be assigned on the basis of need (perhaps as measured by rates of HIV incidence or of high-risk behaviors). But this argument presumes that the intervention will have a positive effect (which is unknown before evaluation) and that relative need can be established, which is a difficult task in itself.
The panel recognizes that community and political opposition to randomization to zero treatments may be strong and that enlisting participation in such experiments may be difficult. This opposition and reluctance could seriously jeopardize the production of reliable results if it is translated into noncompliance with a research design. The feasibility of randomized experiments for AIDS prevention programs has already been demonstrated, however (see the review of selected experiments in Turner, Miller, and Moses, 1989:327-329). The substantial effort involved in mounting randomized field experiments is repaid by the fact that they can provide unbiased evidence of the effects of a program.
Unit of Assignment. The unit of assignment of an experiment may be an individual person, a clinic (i.e., the clientele of the clinic), or another organizational unit (e.g., the community or city). The treatment unit is selected at the earliest stage of design. Variations of units are illustrated in the following four examples of intervention programs.
1. Two different pamphlets (A and B) on the same subject (e.g., testing) are distributed in an alternating sequence to individuals calling an AIDS hotline. The outcome to be measured is whether the recipient returns a card asking for more information.
2. Two instruction curricula (A and B) about AIDS and HIV infections are prepared for use in high school driver education classes. The outcome to be measured is a score on a knowledge test.
3. Of all clinics for sexually transmitted diseases (STDs) in a large metropolitan area, some are randomly chosen to introduce a change in the fee schedule. The outcome to be measured is the change in patient load.
4. A coordinated set of community-wide interventions involving community leaders, social service agencies, the media, community associations, and other groups is implemented in one area of a city. Outcomes are knowledge, as assessed by testing at drug treatment centers and STD clinics, and condom sales in the community's retail outlets.
In example (1), the treatment unit is an individual person who receives pamphlet A or pamphlet B. If either "treatment" is applied again, it would be applied to a person. In example (2), the high school class is the treatment unit; everyone in a given class experiences either curriculum A or curriculum B. If either treatment is applied again, it would be applied to a class. The treatment unit is the clinic in example (3), and in example (4), the treatment unit is a community.
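When the treatment unit is a group, randomization operates on the groups and every member inherits the group's assignment. A minimal Python sketch for example (2), assuming a hypothetical roster of classes; the function name and identifiers are illustrative. Note that the number of replications is the number of classes, not the number of students.

```python
import random


def assign_units(members_by_unit, arms=("A", "B"), seed=None):
    """Randomize whole treatment units (e.g., classes) to arms.

    Returns the arm of each unit and, derived from it, the arm of each member.
    """
    rng = random.Random(seed)
    units = sorted(members_by_unit)  # fixed starting order so a seed is reproducible
    rng.shuffle(units)
    unit_arm = {unit: arms[i % len(arms)] for i, unit in enumerate(units)}
    member_arm = {
        member: unit_arm[unit]
        for unit, members in members_by_unit.items()
        for member in members
    }
    return unit_arm, member_arm


classes = {
    "class-1": ["s01", "s02", "s03"],
    "class-2": ["s04", "s05"],
    "class-3": ["s06", "s07", "s08"],
    "class-4": ["s09", "s10"],
}
unit_arm, member_arm = assign_units(classes, seed=3)
# Four classes means four replications, even though ten students take part.
print(unit_arm)
```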
The consistency of the effects of a particular intervention across repetitions justly carries a heavy weight in appraising the intervention. It is important to remember that the number of repetitions of a treatment or intervention is the number of treatment units to which the intervention is applied. This is a salient principle in the design and execution of intervention programs as well as in the assessment of their results.
The adequacy of the proposed sample size (number of treatment units) has to be considered in advance. Adequacy depends mainly on two factors:
· How much variation occurs from unit to unit among units receiving a common treatment? If that variation is large, then the number of units needs to be large.
· What is the minimum size of a possible treatment difference that, if present, would be practically important? That is, how small a treatment difference is it essential to detect if it is present? The smaller this quantity, the larger the number of units that are necessary.
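The two factors enter a standard normal-approximation formula for comparing two means: n = 2 (z_α + z_β)² σ² / δ² units per arm, where σ is the unit-to-unit standard deviation, δ is the smallest difference worth detecting, and z_α and z_β are standard normal quantiles for the significance level and power. A minimal Python sketch, assuming a two-sided test with equal arms; the defaults (significance level 0.05, power 0.80) are conventional choices, not values set by the panel.

```python
import math
from statistics import NormalDist


def units_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate treatment units per arm for a two-sided comparison of
    two means (normal approximation, equal arms).

    sigma: unit-to-unit standard deviation among units given the same treatment
    delta: smallest practically important treatment difference
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


print(units_per_group(sigma=1.0, delta=0.5))   # 63
print(units_per_group(sigma=1.0, delta=0.25))  # 252
```

Halving the detectable difference roughly quadruples the number of units, which is the second factor above in numerical form; doubling σ has the same effect.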
Many formal methods for considering and choosing sample size exist (see, e.g., Cohen, 1988). Practical circumstances occasionally allow choosing between designs that involve units at different levels; thus, a classroom might be the unit if the treatment is applied in one way, but an entire school might be the unit if the treatment is applied in another. When both approaches are feasible, the use of a power analysis for each approach may lead to a reasoned choice.
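For the classroom-versus-school choice, a power analysis has to account for the fact that members of the same unit tend to resemble one another. One common approach, sketched below under stated assumptions, treats the variance of a unit's mean outcome as σ²(ρ + (1 - ρ)/m), where m is the number of members per unit and ρ the intraclass correlation, and uses that variance in the usual normal-approximation formula. The unit sizes, intraclass correlations, and effect size below are hypothetical.

```python
import math
from statistics import NormalDist


def clusters_per_arm(members_per_unit, icc, delta, sigma=1.0, alpha=0.05, power=0.80):
    """Units (e.g., classrooms or schools) needed per arm when whole units
    are randomized. The variance of one unit's mean outcome is taken as
    sigma**2 * (icc + (1 - icc) / members_per_unit)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    unit_var = sigma ** 2 * (icc + (1 - icc) / members_per_unit)
    return math.ceil(2 * z_sum ** 2 * unit_var / delta ** 2)


# Hypothetical inputs: 25 students per classroom, 500 per school,
# with assumed intraclass correlations for each level.
classroom_units = clusters_per_arm(members_per_unit=25, icc=0.05, delta=0.25)
school_units = clusters_per_arm(members_per_unit=500, icc=0.02, delta=0.25)
print(classroom_units, school_units)
```

Under these assumptions, about 23 classrooms (some 575 students) or 6 schools (some 3,000 students) per arm give comparable power; which design is preferable then depends on cost and on how the treatment can be delivered.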
Choice of Methods
There is some controversy about the advantages of randomized experiments in comparison with other evaluative approaches. It is the panel's belief that when a (well-executed) randomized study is feasible, it is superior to alternative kinds of studies in the strength and clarity of whatever conclusions emerge, primarily because the experimental approach avoids selection biases.7 Other evaluation approaches are sometimes unavoidable, but ordinarily the accumulation of valid information will go more slowly and less securely than in randomized approaches.
Experiments in medical research shed light on the advantages of carefully conducted randomized experiments. The Salk vaccine trials are a successful example of a large, randomized study. In a double-blind8 test of the polio vaccine, children in various communities were randomly assigned to two treatments, either the vaccine or a placebo. By this method, the effectiveness of the Salk vaccine was demonstrated in one summer of research (Meier, 1957).
A sufficient accumulation of relevant, observational information, especially when collected in studies using different procedures and sample populations, may also clearly demonstrate the effectiveness of a treatment or intervention. The process of accumulating such information can be a long one, however. When a (well-executed) randomized study is feasible, it can provide evidence that is subject to less uncertainty in its interpretation, and it can often do so in a more timely fashion. In the midst of an epidemic, the panel believes it proper that randomized experiments be one of the primary strategies for evaluating the effectiveness of AIDS prevention efforts. In making this recommendation, however, the panel also wishes to emphasize that the advantages of the randomized experimental design can be squandered by poor execution (e.g., by compromised assignment of subjects, significant subject attrition rates, etc.). To achieve the advantages of the experimental design, care must be taken to ensure that the integrity of the design is not compromised by poor execution.
7 Participants who self-select into a program are likely to be different from nonrandom comparison groups in terms of interests, motivations, values, abilities, and other attributes that can bias the outcomes.
8 A double-blind test is one in which neither the person receiving the treatment nor the person administering it knows which treatment (or when no treatment) is being given.
In proposing that randomized experiments be one of the primary
strategies for evaluating the effectiveness of AIDS prevention programs,
the panel also recognizes that there are situations in which randomization
will be impossible or, for other reasons, cannot be used. In its next report
the panel will describe at length appropriate nonexperimental strategies
to be considered in situations in which an experiment is not a practical
or desirable alternative.
THE MANAGEMENT OF EVALUATION
Conscientious evaluation requires a considerable investment of funds,
time, and personnel. Because the panel recognizes that resources are not
unlimited, it suggests that they be concentrated on the evaluation of a
subset of projects to maximize the return on investment and to enhance
the likelihood of high-quality results.
Project Selection
Deciding which programs or sites to evaluate is by no means a trivial
matter. Selection should be carefully weighed so that projects that are
not replicable or that have little chance for success are not subjected to
rigorous evaluations.
The panel recommends that any intensive evaluation of an
intervention be conducted on a subset of projects selected
according to explicit criteria. These criteria should include
the replicability of the project, the feasibility of evaluation,
and the project's potential effectiveness for prevention of
HIV transmission.
If a project is replicable, it means that the particular circumstances of
service delivery in that project can be duplicated. In other words, for
CBOs and counseling and testing projects, the content and setting of
an intervention can be duplicated across sites. Feasibility of evaluation
means that, as a practical matter, the research can be done: that is,
the research design is adequate to control for rival hypotheses, it is not
excessively costly, and the project is acceptable to the community and
the sponsor. Potential effectiveness for HIV prevention means that the
intervention is at least based on a reasonable theory (or mix of theories)
about behavioral change (e.g., social learning theory [Bandura, 1977], the
health belief model [Janz and Becker, 1984], etc.), if it has not already
been found to be effective in related circumstances.
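Once each candidate project has been rated, the three criteria can be applied mechanically. The Python sketch below assumes a hypothetical record format: the field names and the cost ceiling are invented for illustration, since the report specifies the criteria but not how to record them. A project is kept only if it is replicable, feasible to evaluate (design controls for rival hypotheses, cost within a ceiling, acceptable to community and sponsor), and potentially effective (theory-based, or already found effective in related circumstances).

```python
from dataclasses import dataclass


@dataclass
class Project:
    # Hypothetical fields mirroring the criteria described in the text.
    name: str
    replicable: bool                 # content and setting can be duplicated across sites
    controls_rival_hypotheses: bool  # research design is adequate
    evaluation_cost: float           # estimated cost of an intensive evaluation
    acceptable_to_community_and_sponsor: bool
    theory_based: bool               # rests on a reasonable theory of behavioral change
    effective_elsewhere: bool        # already found effective in related circumstances


def feasible(p: Project, cost_ceiling: float) -> bool:
    return (p.controls_rival_hypotheses
            and p.evaluation_cost <= cost_ceiling
            and p.acceptable_to_community_and_sponsor)


def potentially_effective(p: Project) -> bool:
    return p.theory_based or p.effective_elsewhere


def select_for_evaluation(projects, cost_ceiling):
    """Return the projects that meet all three explicit criteria."""
    return [p for p in projects
            if p.replicable and feasible(p, cost_ceiling) and potentially_effective(p)]


if __name__ == "__main__":
    candidates = [
        Project("site A", True, True, 80_000, True, True, False),
        Project("site B", False, True, 50_000, True, True, True),   # not replicable
        Project("site C", True, True, 300_000, True, False, True),  # too costly
    ]
    print([p.name for p in select_for_evaluation(candidates, cost_ceiling=100_000)])
    # prints ['site A']
```

Writing the criteria down as explicit checks makes the selection auditable, which is the point of the panel's call for "explicit criteria."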
In addition, since it is important to ensure that the results of
evaluations will be broadly applicable,
The panel recommends that evaluation be conducted and
replicated across major types of subgroups, programs, and
settings. Attention should be paid to geographic areas with
low and high AIDS prevalence, as well as to subpopulations
at low and high risk for AIDS.
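One simple way to check that a chosen set of evaluations spans the settings named in this recommendation is to enumerate every combination of program type, AIDS prevalence, and risk level, then list the combinations that no evaluation covers. The Python sketch below is illustrative only: the two-level prevalence and risk categories follow the recommendation's wording, while the program-type labels and the function name are hypothetical.

```python
from itertools import product

PREVALENCE = ("low prevalence", "high prevalence")
RISK = ("low risk", "high risk")


def uncovered_cells(evaluated, program_types):
    """Return every (program type, prevalence, risk) combination with no evaluation.

    `evaluated` is an iterable of (program type, prevalence, risk) tuples
    describing the projects already selected for evaluation.
    """
    covered = set(evaluated)
    return [cell for cell in product(program_types, PREVALENCE, RISK)
            if cell not in covered]


if __name__ == "__main__":
    types = ("CBO", "counseling and testing")  # hypothetical labels
    chosen = [("CBO", "high prevalence", "high risk"),
              ("counseling and testing", "low prevalence", "high risk")]
    for cell in uncovered_cells(chosen, types):
        print(cell)
```

With two program types this grid has eight cells; in practice a planner would decide which empty cells matter most rather than insist on filling all of them.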
Research Administration
The sponsoring agency interested in evaluating an AIDS intervention
should consider the mechanisms through which the research will be
carried out as well as the desirability of both independent oversight and
agency in-house conduct and monitoring of the research. The appropriate
entities and mechanisms for conducting evaluations depend to some
extent on the kinds of data being gathered and the evaluation questions
being asked.
Oversight and monitoring are important to keep projects fully
informed about the other evaluations relevant to their own and to render
assistance when needed. Oversight and monitoring are also important
because evaluation is often a sensitive issue for project and evaluation
staff alike. The panel is aware that evaluation may appear threatening
to practitioners and researchers because of the possibility that evaluation
research will show that their projects are not as effective as they believe
them to be. These needs and vulnerabilities should be taken into account
as evaluation research management is developed.
Conducting the Research
To conduct some aspects of a project's evaluation, it may be appropriate
to involve project administrators, especially when the data will be used
to evaluate delivery systems (e.g., to determine when and which services
are being delivered). To evaluate outcomes, the services of an outside
evaluator or evaluation team are almost always required because few
practitioners have the necessary professional experience or the time and
resources to do evaluation. The outside evaluator must have
relevant expertise in evaluation research methodology and must also be
sensitive to the fears, hopes, and constraints of project administrators.
Several evaluation management schemes are possible. For example,
a prospective AIDS prevention project group (the contractor) can bid on
a contract for project funding that includes an intensive evaluation com-
ponent. The actual evaluation can be conducted either by the contractor
alone or by the contractor working in concert with an outside indepen-
dent collaborator. This mechanism has the advantage of involving project
practitioners in the work of evaluation as well as building separate but
mutually informing communities of experts around the country.
Alternatively, a contract can be let with a single evaluator or evaluation team
that will collaborate with the subset of sites that is chosen for evaluation.
This variation would be managerially less burdensome than awarding
separate contracts, but it would require greater dependence on the exper-
tise of a single investigator or investigative team. (Appendix A discusses
contracting options in greater depth.) Both of these approaches accord
with the parent committee's recommendation that collaboration between
practitioners and evaluation researchers be ensured. Finally, in the more
traditional evaluation approach, independent principal investigators or
investigative teams may respond to a request for proposals (RFP) issued to
evaluate individual projects. Such investigators are frequently university-
based or are members of a professional research organization, and they
bring to the task a variety of research experiences and perspectives.
Independent Oversight
The panel believes that coordination and oversight of multisite evaluations
are critical because of the variability in investigators' expertise and in the
results of the projects being evaluated. Oversight can provide quality
control for individual investigators and can be used to review and integrate
findings across sites for developing policy. The independence of an
oversight body is crucial to ensure that project evaluations do not succumb
to the pressures for positive findings of effectiveness.
When evaluation is to be conducted by a number of different
evaluation teams, the panel recommends establishing
an independent scientific committee to oversee project selection
and research efforts, corroborate the impartiality and
validity of results, conduct cross-site analyses, and prepare
reports on the progress of the evaluations.
9As discussed under "Agency In-House Team," the outside evaluator might be one of CDC's personnel.
However, given the large amount of research to be done, it is likely that non-CDC evaluators will also
need to be used.
The composition of such an independent oversight committee will
depend on the research design of a given program. For example, the
committee ought to include statisticians and other specialists in
randomized field tests when that approach is being taken. Specialists in survey
research and case studies should be recruited if either of those approaches
is to be used. Appendix B offers a model for an independent oversight
group that has been successfully implemented in other settings—a project
review team, or advisory board.
Agency In-House Team
As the parent committee noted in its report, evaluations of AIDS
interventions require skills that may be in short supply for agencies invested in
delivering services (Turner, Miller, and Moses, 1989:349). Although this
situation can be partly alleviated by recruiting professional outside
evaluators and retaining an independent oversight group, the panel believes
that an in-house team of professionals within the sponsoring agency is
also critical. The in-house experts will interact with the outside
evaluators and provide input into the selection of projects, outcome objectives,
and appropriate research designs; they will also monitor the progress
and costs of evaluation. These functions require not just bureaucratic
oversight but appropriate scientific expertise.
This is not intended to preclude the direct involvement of CDC staff
in conducting evaluations. However, given the great amount of work to
be done, it is likely that a considerable portion will have to be contracted out.
The quality and usefulness of the evaluations done under contract can be
greatly enhanced by ensuring that there are an adequate number of CDC
staff trained in evaluation research methods to monitor these contracts.
The panel recommends that CDC recruit and retain behavioral,
social, and statistical scientists trained in evaluation
methodology to facilitate the implementation of the evalua-
tion research recommended in this report.
Interagency Collaboration
The panel believes that the federal agencies that sponsor the design of
basic research, intervention programs, and evaluation strategies would
profit from greater interagency collaboration. The evaluation of AIDS
intervention programs would benefit from a coherent program of studies
that should provide models of efficacious and effective interventions to
prevent further HIV transmission, the spread of other STDs, and unwanted
pregnancies (especially among adolescents). A marriage could then be
made of basic and applied science, from which the best evaluation is
born. Exploring the possibility of interagency collaboration and CDC's
role in such collaboration is beyond the scope of this panel's task, but it
is an important issue that we suggest be addressed in the future.
Costs of Evaluation
In view of the dearth of current evaluation efforts, the panel believes
that vigorous evaluation research must be undertaken over the next few
years to build up a body of knowledge about what interventions can and
cannot do. Dedicating no resources to evaluation will virtually guarantee
that high-quality evaluations will be infrequent and the data needed for
policy decisions will be sparse or absent. Yet, evaluating every project is
not feasible simply because there are not enough resources and, in many
cases, evaluating every project is not necessary for good science or good
policy.
The panel believes that evaluating only some of a program's sites
or projects, selected under the criteria noted in Chapter 4, is a sensible
strategy. Although we recommend that intensive evaluation be conducted
on only a subset of carefully chosen projects, we believe that high-quality
evaluation will require a significant investment of time, planning,
personnel, and financial support. The panel's aim is to be realistic, not
discouraging, when it notes that the costs of program evaluation should
not be underestimated. Many of the research strategies proposed in this
report require investments that are perhaps greater than has been
previously contemplated. This is particularly the case for outcome evaluations,
which are ordinarily more difficult and expensive to conduct than
formative or process evaluations. And those costs will be additive with each
type of evaluation that is conducted.
Panel members have found that the cost of an outcome evaluation
sometimes equals or even exceeds the cost of actual program delivery.
For example, it was reported to the panel that randomized studies used to
evaluate recent manpower training projects cost as much as the projects
themselves (see Cottingham and Rodriguez, 1987). In another case, the
principal investigator of an ongoing AIDS prevention project told the
panel that the cost of randomized experimentation was approximately
three times higher than the cost of delivering the intervention (albeit the
study was quite small, involving only 104 participants) (Kelly et al.,
1989). Fortunately, only a fraction of a program's projects or sites need
to be intensively evaluated to produce high-quality information, and not
all will require randomized studies.
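The arithmetic behind evaluating only a fraction of sites can be made explicit. The Python sketch below assumes, for simplicity, that every site costs the same to deliver and that an intensive evaluation at a site costs a fixed multiple of that site's delivery cost. The multiples 1 and 3 echo the two cases just described; the site counts are hypothetical.

```python
def evaluation_share(n_sites: int, n_evaluated: int, cost_ratio: float) -> float:
    """Evaluation spending as a fraction of total program delivery spending.

    Assumes equal delivery cost per site and that each intensively evaluated
    site incurs evaluation costs of `cost_ratio` times its delivery cost.
    """
    if n_sites <= 0 or not 0 <= n_evaluated <= n_sites:
        raise ValueError("need n_sites > 0 and 0 <= n_evaluated <= n_sites")
    if cost_ratio < 0:
        raise ValueError("cost_ratio must be non-negative")
    return n_evaluated * cost_ratio / n_sites


if __name__ == "__main__":
    # Evaluation costing as much as delivery, at 2 of 20 sites.
    print(f"{evaluation_share(20, 2, 1.0):.0%}")  # 10%
    # Evaluation costing three times delivery, at 1 of 25 sites.
    print(f"{evaluation_share(25, 1, 3.0):.0%}")  # 12%
```

Even when evaluation at a single site costs as much as or more than delivery, concentrating it on a few sites keeps the program-wide share modest, which is the logic of the subset strategy.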
Because of the variability in kinds of evaluation that will be done
as well as in the costs involved, there is no set standard or rule for
judging what fraction of a total program budget should be invested in
evaluation. Based upon very limited data10 and assuming that only a
small sample of projects would be evaluated, the panel suspects that
program managers might reasonably anticipate spending ~ to 12 percent
of their intervention budgets to conduct high-quality evaluations (i.e.,
formative, process, and outcome evaluations).11 Larger investments seem
politically infeasible and unwise in view of the need to put resources into
program delivery. Smaller investments in evaluation may risk studying an
inadequate sample of program types, and they may also invite compromises
in research quality.
The nature of the HIV/AIDS epidemic mandates an unwavering
commitment to prevention programs, and the prevention activities require
a similar commitment to the evaluation of those programs. The magnitude
of what can be learned from doing good evaluations will more than
balance the magnitude of the costs required to perform them. Moreover,
it should be realized that the costs of shoddy research can be substantial,
both in their direct expense and in the lost opportunities to identify
effective strategies for AIDS prevention. Once the investment has been
made, however, and a reservoir of findings and practical experience has
accumulated, subsequent evaluations should be easier and less costly to
conduct.
10See, for example, Chapter 3, which presents cost estimates for evaluations of media campaigns.
Similar estimates are not readily available for other program types.
11For example, the U.K. Health Education Authority (that country's primary agency for AIDS
education and prevention programs) allocates 10 percent of its AIDS budget for research and evaluation
of its AIDS programs (D. McVey, Health Education Authority, personal communication, June 1990).
This allocation covers both process and outcome evaluation.
REFERENCES
Bandura, A. (1977) Self-efficacy: Toward a unifying theory of behavioral change.
Psychological Review 84:191-215.
Campbell, D. T., and Stanley, J. C. (1966) Experimental and Quasi-Experimental Design
and Analysis. Boston: Houghton-Mifflin.
Centers for Disease Control (CDC) (1988) Sourcebook presented at the National
Conference on the Prevention of HIV Infection and AIDS Among Racial and
Ethnic Minorities in the United States (August).
Cohen, J. (1988) Statistical Power Analysis for the Behavioral Sciences. 2nd ed.
Hillsdale, N.J.: L. Erlbaum Associates.
Cook, T., and Campbell, D. T. (1979) Quasi-Experimentation: Design and Analysis for
Field Settings. Boston: Houghton-Mifflin.
Federal Judicial Center (1981) Experimentation in the Law. Washington, D.C.: Federal
Judicial Center.
Janz, N. K., and Becker, M. H. (1984) The health belief model: A decade later. Health
Education Quarterly 11(1):1-47.
Kelly, J. A., St. Lawrence, J. S., Hood, H. V., and Brasfield, T. L. (1989) Behavioral
intervention to reduce AIDS risk activities. Journal of Consulting and Clinical
Psychology 57:60-67.
Meier, P. (1957) Safety testing of poliomyelitis vaccine. Science 125(3257):1067-1071.
Roethlisberger, F. J. and Dickson, W. J. (1939) Management and the Worker. Cambridge,
Mass.: Harvard University Press.
Rossi, P. H., and Freeman, H. E. (1982) Evaluation: A Systematic Approach. 2nd ed.
Beverly Hills, Cal.: Sage Publications.
Turner, C. F., Miller, H. G., and Moses, L. E., eds. (1989) AIDS, Sexual Behavior,
and Intravenous Drug Use. Report of the NRC Committee on AIDS Research
and the Behavioral, Social, and Statistical Sciences. Washington, D.C.: National
Academy Press.
Weinstein, M. C., Graham, J. D., Siegel, J. E., and Fineberg, H. V. (1989) Cost-
effectiveness analysis of AIDS prevention programs: Concepts, complications,
and illustrations. In C. F. Turner, H. G. Miller, and L. E. Moses, eds., AIDS,
Sexual Behavior, and Intravenous Drug Use. Report of the NRC Committee on
AIDS Research and the Behavioral, Social, and Statistical Sciences. Washington,
D.C.: National Academy Press.
Weiss, C. H. (1972) Evaluation Research. Englewood Cliffs, N.J.: Prentice-Hall, Inc.