Evaluating AIDS Prevention Programs: Expanded Edition (1991)


1 Design and Implementation of Evaluation Research

Evaluation has its roots in the social, behavioral, and statistical sciences, and it relies on their principles and methodologies of research, including experimental design, measurement, statistical tests, and direct observation. What distinguishes evaluation research from other social science is that its subjects are ongoing social action programs that are intended to produce individual or collective change. This setting usually engenders a great need for cooperation between those who conduct the program and those who evaluate it. This need for cooperation can be particularly acute in the case of AIDS prevention programs because those programs have been developed rapidly to meet the urgent demands of a changing and deadly epidemic.

Although the characteristics of AIDS intervention programs place some unique demands on evaluation, the techniques for conducting good program evaluation do not need to be invented. Two decades of evaluation research have provided a basic conceptual framework for undertaking such efforts (see, e.g., Campbell and Stanley [1966] and Cook and Campbell [1979] for discussions of outcome evaluation; see Weiss [1972] and Rossi and Freeman [1982] for process and outcome evaluations); in addition, similar programs, such as the antismoking campaigns, have been subject to evaluation, and they offer examples of the problems that have been encountered.

In this chapter the panel provides an overview of the terminology, types, designs, and management of evaluation research. The following chapter provides an overview of program objectives and the selection and measurement of appropriate outcome variables for judging the effectiveness of AIDS intervention programs. These issues are discussed in detail in the subsequent, program-specific Chapters 3-5.

TYPES OF EVALUATION

The term evaluation implies a variety of different things to different people. The recent report of the Committee on AIDS Research and the Behavioral, Social, and Statistical Sciences defines the area through a series of questions (Turner, Miller, and Moses, 1989:317-318):

Evaluation is a systematic process that produces a trustworthy account of what was attempted and why; through the examination of results (the outcomes of intervention programs) it answers the questions, "What was done?" "To whom, and how?" and "What outcomes were observed?" Well-designed evaluation permits us to draw inferences from the data and addresses the difficult question: "What do the outcomes mean?"

These questions differ in the degree of difficulty of answering them. An evaluation that tries to determine the outcomes of an intervention and what those outcomes mean is a more complicated endeavor than an evaluation that assesses the process by which the intervention was delivered. Both kinds of evaluation are necessary because they are intimately connected: to establish a project's success, an evaluator must first ask whether the project was implemented as planned and then whether its objective was achieved.

Questions about a project's implementation usually fall under the rubric of process evaluation. If the investigation involves rapid feedback to the project staff or sponsors, particularly at the earliest stages of program implementation, the work is called formative evaluation. Questions about effects or effectiveness are variously called summative evaluation, impact assessment, or outcome evaluation, the term the panel uses.

Formative evaluation is a special type of early evaluation that occurs during and after a program has been designed but before it is broadly implemented. Formative evaluation is used to understand the need for the intervention and to make tentative decisions about how to implement or improve it. During formative evaluation, information is collected and then fed back to program designers and administrators to enhance program development and maximize the success of the intervention. For example, formative evaluation may be carried out through a pilot project before a program is implemented at several sites. A pilot study of a community-based organization (CBO), for example, might be used to gather data on problems involving access to and recruitment of targeted populations and the utilization and implementation of services; the findings of such a study would then be used to modify (if needed) the planned program.

Another example of formative evaluation is the use of a "story board" design of a TV message that has yet to be produced. A story board is a series of text and sketches of camera shots that are to be produced in a commercial. To evaluate the effectiveness of the message and forecast some of the consequences of actually broadcasting it to the general public, an advertising agency convenes small groups of people to react to and comment on the proposed design.

Once an intervention has been implemented, the next stage of evaluation is process evaluation, which addresses two broad questions: "What was done?" and "To whom, and how?" Ordinarily, process evaluation is carried out at some point in the life of a project to determine how and how well the delivery goals of the program are being met. When intervention programs continue over a long period of time (as is the case for some of the major AIDS prevention programs), measurements at several times are warranted to ensure that the components of the intervention continue to be delivered by the right people, to the right people, in the right manner, and at the right time. Process evaluation can also play a role in improving interventions by providing the information necessary to change delivery strategies or program objectives in a changing epidemic.

Research designs for process evaluation include direct observation of projects, surveys of service providers and clients, and the monitoring of administrative records. The panel notes that the Centers for Disease Control (CDC) is already collecting some administrative records on its counseling and testing program and community-based projects. The panel believes that this type of evaluation should be a continuing and expanded component of intervention projects to guarantee the maintenance of the projects' integrity and responsiveness to their constituencies.

The purpose of outcome evaluation is to identify consequences and to establish that consequences are, indeed, attributable to a project. This type of evaluation answers the questions, "What outcomes were observed?" and, perhaps more importantly, "What do the outcomes mean?" Like process evaluation, outcome evaluation can also be conducted at intervals during an ongoing program, and the panel believes that such periodic evaluation should be done to monitor goal achievement.

The panel believes that these stages of evaluation (i.e., formative, process, and outcome) are essential to learning how AIDS prevention programs contribute to containing the epidemic. After a body of findings has been accumulated from such evaluations, it may be fruitful to launch another stage of evaluation: cost-effectiveness analysis (see Weinstein et al., 1989). Like outcome evaluation, cost-effectiveness analysis also measures program effectiveness, but it extends the analysis by adding a measure of program cost. The panel believes that consideration of cost-effectiveness analysis should be postponed until more experience is gained with formative, process, and outcome evaluation of the CDC AIDS prevention programs.

EVALUATION RESEARCH DESIGN

Process and outcome evaluations require different types of research designs, as discussed below. Formative evaluations, which are intended both to assess implementation and to forecast effects, use a mix of these designs.

Process Evaluation Designs

To conduct process evaluations of how well services are delivered, data need to be gathered on the content of interventions and on their delivery systems. Suggested methodologies include direct observation, surveys, and record keeping.

Direct observation designs include case studies, in which participant-observers unobtrusively and systematically record encounters within a program setting, and nonparticipant observation, in which long, open-ended (or "focused") interviews are conducted with program participants.[1] For example, "professional customers" at counseling and testing sites can act as project clients to monitor activities unobtrusively;[2] alternatively, nonparticipant observers can interview both staff and clients. Surveys, either censuses (of the whole population of interest) or samples, elicit information through interviews or questionnaires completed by project participants or potential users of a project. For example, surveys within community-based projects can collect basic statistical information on project objectives: what services are provided, to whom, when, how often, for how long, and in what context. Record keeping consists of administrative or other reporting systems that monitor use of services. Standardized reporting ensures consistency in the scope and depth of data collected. To use the media campaign as an example, the panel suggests using standardized data on the use of the AIDS hotline to monitor public attentiveness to the advertisements broadcast by the media campaign.

[1] On occasion, nonparticipants observe behavior during or after an intervention. Chapter 3 introduces this option in the context of formative evaluation.

[2] The use of professional customers can raise serious concerns in the eyes of project administrators at counseling and testing sites. The panel believes that site administrators should receive advance notification that professional customers may visit their sites for testing and counseling services and provide their consent before this method of data collection is used.
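As an illustration of the record-keeping approach, the following minimal sketch (in Python) shows one way standardized service-delivery records might be structured and tallied to answer "what services are provided, to whom, and how often." The field names and service categories are hypothetical illustrations, not CDC's actual reporting categories.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date

# Hypothetical standardized record for one service encounter.
@dataclass
class ServiceRecord:
    site_id: str           # project or clinic identifier
    service: str           # e.g., "counseling", "testing", "hotline_call"
    client_group: str      # broad target-population category
    encounter_date: date
    duration_minutes: int

def summarize(records):
    """Tabulate what was delivered, to whom, and how much time it took."""
    by_service = Counter(r.service for r in records)
    by_service_and_group = Counter((r.service, r.client_group) for r in records)
    total_minutes = sum(r.duration_minutes for r in records)
    return by_service, by_service_and_group, total_minutes

if __name__ == "__main__":
    demo = [
        ServiceRecord("site-01", "counseling", "adolescents", date(1990, 6, 1), 45),
        ServiceRecord("site-01", "testing", "adolescents", date(1990, 6, 1), 20),
        ServiceRecord("site-02", "counseling", "partners of IDUs", date(1990, 6, 3), 60),
    ]
    by_service, by_group, minutes = summarize(demo)
    print(by_service)   # Counter({'counseling': 2, 'testing': 1})
    print(by_group)
    print(minutes, "minutes of service delivered")
```

Because every site fills in the same fields, tallies of this kind can be compared across sites and over time, which is the point of standardized reporting.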

These designs are simple to understand, but they require expertise to implement. For example, observational studies must be conducted by people who are well trained in how to carry out on-site tasks sensitively and to record their findings uniformly. Observers can either complete narrative accounts of what occurred in a service setting or they can complete some sort of data inventory to ensure that multiple aspects of service delivery are covered. These types of studies are time consuming and benefit from corroboration among several observers. The use of surveys in research is well understood, although they, too, require expertise to be well implemented. As the program chapters reflect, survey data collection must be carefully designed to reduce problems of validity and reliability and, if samples are used, to design an appropriate sampling scheme. Record keeping or service inventories are probably the easiest research designs to implement, although preparing standardized internal forms requires attention to detail about salient aspects of service delivery.

Outcome Evaluation Designs

Research designs for outcome evaluations are meant to assess principal and relative effects. Ideally, to assess the effect of an intervention on program participants, one would like to know what would have happened to the same participants in the absence of the program. Because it is not possible to make this comparison directly, inference strategies that rely on proxies have to be used. Scientists use three general approaches to construct proxies for use in the comparisons required to evaluate the effects of interventions: (1) nonexperimental methods, (2) quasi-experiments, and (3) randomized experiments. The first two are discussed below, and randomized experiments are discussed in the subsequent section.

Nonexperimental and Quasi-Experimental Designs[3]

The most common form of nonexperimental design is a before-and-after study. In this design, pre-intervention measurements are compared with equivalent measurements made after the intervention to detect change in the outcome variables that the intervention was designed to influence.

Although the panel finds that before-and-after studies frequently provide helpful insights, the panel believes that these studies do not provide sufficiently reliable information to be the cornerstone for evaluation research on the effectiveness of AIDS prevention programs. The panel's conclusion follows from the fact that the postintervention changes cannot usually be attributed unambiguously to the intervention.[4] Plausible competing explanations for differences between pre- and postintervention measurements will often be numerous, including not only the possible effects of other AIDS intervention programs, news stories, and local events, but also the effects that may result from the maturation of the participants and the educational or sensitizing effects of repeated measurements, among others.

Quasi-experimental and matched control designs provide a separate comparison group. In these designs, the control group may be selected by matching nonparticipants to participants in the treatment group on the basis of selected characteristics. It is difficult to ensure the comparability of the two groups even when they are matched on many characteristics because other relevant factors may have been overlooked or mismatched or they may be difficult to measure (e.g., the motivation to change behavior). In some situations, it may simply be impossible to measure all of the characteristics of the units (e.g., communities) that may affect outcomes, much less demonstrate their comparability.

Matched control designs require extraordinarily comprehensive scientific knowledge about the phenomenon under investigation in order for evaluators to be confident that all of the relevant determinants of outcomes have been properly accounted for in the matching. Three types of information or knowledge are required: (1) knowledge of intervening variables that also affect the outcome of the intervention and, consequently, need adjustment to make the groups comparable; (2) measurements on all intervening variables for all subjects; and (3) knowledge of how to make the adjustments properly, which in turn requires an understanding of the functional relationship between the intervening variables and the outcome variables. Satisfying each of these information requirements is likely to be more difficult than answering the primary evaluation question, "Does this intervention produce beneficial effects?"

Given the size and the national importance of AIDS intervention programs and given the state of current knowledge about behavior change in general and AIDS prevention in particular, the panel believes that it would be unwise to rely on matching and adjustment strategies as the primary design for evaluating AIDS intervention programs. With differently constituted groups, inferences about results are hostage to uncertainty about the extent to which the observed outcome actually results from the intervention and is not an artifact of intergroup differences that may not have been removed by matching or adjustment.

[3] Parts of this section are adapted from Turner, Miller, and Moses (1989:324-326).

[4] This weakness has been noted by CDC in a sourcebook provided to its HIV intervention project grantees (CDC, 1988:F-14).
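To make the point concrete, the following small simulation assumes a hypothetical secular decline in a risk behavior that owes nothing to the program: a naive before-and-after comparison credits the program with the entire change, whereas a concurrent, randomly constituted comparison group recovers the program's true effect, which is zero in this simulation. All rates and sample sizes are arbitrary illustrative assumptions.

```python
import random

random.seed(7)

def risk_behavior_rate(n, base_rate):
    """Share of n simulated participants reporting a risk behavior."""
    return sum(random.random() < base_rate for _ in range(n)) / n

N = 1000
SECULAR_DECLINE = 0.10      # background drop unrelated to the program
TRUE_PROGRAM_EFFECT = 0.0   # the program itself changes nothing here

pre = risk_behavior_rate(N, 0.50)
post_treated = risk_behavior_rate(N, 0.50 - SECULAR_DECLINE - TRUE_PROGRAM_EFFECT)
post_control = risk_behavior_rate(N, 0.50 - SECULAR_DECLINE)

# About 0.10: the secular trend, wrongly credited to the program.
print(f"before-and-after 'effect':      {pre - post_treated:.3f}")
# About 0.00: the program's true effect, recovered by a concurrent comparison group.
print(f"randomized comparison estimate: {post_control - post_treated:.3f}")
```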

Randomized Experiments

A remedy to the inferential uncertainties that afflict nonexperimental designs is provided by randomized experiments. In such experiments, one singly constituted group is established for study. A subset of the group is then randomly chosen to receive the intervention, with the other subset becoming the control. The two groups are not identical, but they are comparable. Because they are two random samples drawn from the same population, they are not systematically different in any respect, which is important for all variables (both known and unknown) that can influence the outcome. Dividing a singly constituted group into two random and therefore comparable subgroups cuts through the tangle of causation and establishes a basis for the valid comparison of respondents who do and do not receive the intervention.

Randomized experiments provide for clear causal inference by solving the problem of group comparability, and they may be used to answer the evaluation questions "Does the intervention work?" and "What works better?" Which question is answered depends on whether the controls receive an intervention or not.

When the object is to estimate whether a given intervention has any effects, individuals are randomly assigned to the project or to a zero-treatment control group. The control group may be put on a waiting list or simply not get the treatment. This design addresses the question, "Does it work?"

When the object is to compare variations on a project (e.g., individual counseling sessions versus group counseling), then individuals are randomly assigned to these two regimens, and there is no zero-treatment control group. This design addresses the question, "What works better?" In either case, the control groups must be followed up as rigorously as the experimental groups.

Rationale. A randomized experiment requires that individuals, organizations, or other treatment units be randomly assigned to one of two or more treatments or program variations. Random assignment ensures that the estimated differences between the groups so constituted are statistically unbiased; that is, that any differences in effects measured between them are a result of treatment. The absence of statistical bias in groups constituted in this fashion stems from the fact that random assignment ensures that there are no systematic differences between them, differences that can and usually do affect groups composed in ways that are not random.[5] The panel believes this approach is far superior to the nonrandom and quasi-experimental approaches for outcome evaluations of AIDS interventions.

Therefore, to improve interventions that are already broadly implemented, the panel recommends the use of randomized field experiments of alternative or enhanced interventions.

[5] The significance tests applied to experimental outcomes calculate the probability that any observed differences between the sample estimates might result from random variations between the groups.
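As a concrete illustration of random assignment, the following minimal sketch randomizes a set of treatment units to two arms by random permutation. The unit names and the fixed seed are hypothetical; the sketch is illustrative only and is not a protocol drawn from the panel's recommendations.

```python
import random

def randomize(units, arms=("intervention", "control"), seed=None):
    """Randomly assign treatment units to arms in roughly equal numbers.

    The units may be individuals or higher-level units (classes, clinics,
    communities); randomization operates on whatever unit is chosen at the
    design stage.
    """
    rng = random.Random(seed)
    shuffled = list(units)      # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    # Alternating assignment after a random shuffle yields balanced arms.
    return {unit: arms[i % len(arms)] for i, unit in enumerate(shuffled)}

if __name__ == "__main__":
    clinics = [f"clinic-{n:02d}" for n in range(1, 13)]
    allocation = randomize(clinics, seed=1535)
    for clinic, arm in sorted(allocation.items()):
        print(clinic, "->", arm)
```

Because the assignment depends only on the random permutation, no characteristic of a unit (known or unknown) can systematically steer it into one arm rather than the other, which is the source of the design's freedom from selection bias.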

Under certain conditions, the panel also endorses randomized field experiments with a nontreatment control group to evaluate new interventions. In the context of a deadly epidemic, ethics dictate that treatment not be withheld simply for the purpose of conducting an experiment. Nevertheless, there may be times when a randomized field test of a new treatment with a no-treatment control group is worthwhile. One such time is during the design phase of a major or national intervention.

Before a new intervention is broadly implemented, the panel recommends that it be pilot tested in a randomized field experiment.

The panel considered the use of experiments with delayed rather than no treatment. A delayed-treatment control group strategy might be pursued when resources are too scarce for an intervention to be widely distributed at one time. For example, a project site that is waiting to receive funding for an intervention would be designated as the control group. If it is possible to randomize which projects in the queue receive the intervention, an evaluator could measure and compare outcomes after the experimental group had received the new treatment but before the control group received it. The panel believes that such a design can be applied only in limited circumstances, such as when groups would have access to related services in their communities and when conducting the study is likely to lead to greater access or better services. For example, a study cited in Chapter 4 used a randomized delayed-treatment experiment to measure the effects of a community-based risk reduction program. However, such a strategy may be impractical for several reasons, including:

· sites waiting for funding for an intervention might seek resources from another source;
· it might be difficult to enlist the nonfunded site and its clients to participate in the study;
· there could be an appearance of favoritism toward projects whose funding was not delayed.

Pitfalls. Although randomized experiments have many benefits, the approach is not without pitfalls. In the planning stages of evaluation, it is necessary to contemplate certain hazards, such as the Hawthorne effect[6] and differential project dropout rates. Precautions must be taken either to prevent these problems or to measure their effects. Fortunately, there is some evidence suggesting that the Hawthorne effect is usually not very large (Rossi and Freeman, 1982:175-176). Attrition is potentially more damaging to an evaluation, and it must be limited if the experimental design is to be preserved. If sample attrition is not limited in an experimental design, it becomes necessary to account for the potentially biasing impact of the loss of subjects in the treatment and control conditions of the experiment. The statistical adjustments required to make inferences about treatment effectiveness in such circumstances can introduce uncertainties that are as worrisome as those afflicting nonexperimental and quasi-experimental designs. Thus, the panel's recommendation of the selective use of randomized design carries an implicit caveat: to realize the theoretical advantages offered by randomized experimental designs, substantial efforts will be required to ensure that the designs are not compromised by flawed execution.

Another pitfall of randomization is its appearance of unfairness or unattractiveness to participants and the controversial legal and ethical issues it sometimes raises. Often, what is being criticized is the control of project assignment of participants rather than the use of randomization itself. In deciding whether random assignment is appropriate, it is important to consider the specific context of the evaluation and how participants would be assigned to projects in the absence of randomization. The Federal Judicial Center (1981) offers five threshold conditions for the use of random assignment:

· Does present practice or policy need improvement?
· Is there significant uncertainty about the value of the proposed regimen?
· Are there acceptable alternatives to randomized experiments?
· Will the results of the experiment be used to improve practice or policy?
· Is there a reasonable protection against risk for vulnerable groups (i.e., individuals within the justice system)?

The parent committee has argued that these threshold conditions apply in the case of AIDS prevention programs (see Turner, Miller, and Moses, 1989:331-333).

Although randomization may be desirable from an evaluation and ethical standpoint, and acceptable from a legal standpoint, it may be difficult to implement from a practical or political standpoint. Again, the panel emphasizes that questions about the practical or political feasibility of the use of randomization may in fact refer to the control of program allocation rather than to the issues of randomization itself. In fact, when resources are scarce, it is often more ethical and politically palatable to randomize allocation rather than to allocate on grounds that may appear biased.

It is usually easier to defend the use of randomization when the choice has to do with assignment to groups receiving alternative services than when the choice involves assignment to groups receiving no treatment. For example, in comparing a testing and counseling intervention that offered a special "skills training" session in addition to its regular services with a counseling and testing intervention that offered no additional component, random assignment of participants to one group rather than another may be acceptable to program staff and participants because the relative values of the alternative interventions are unknown.

The more difficult issue is the introduction of new interventions that are perceived to be needed and effective in a situation in which there are no services. An argument that is sometimes offered against the use of randomization in this instance is that interventions should be assigned on the basis of need (perhaps as measured by rates of HIV incidence or of high-risk behaviors). But this argument presumes that the intervention will have a positive effect, which is unknown before evaluation, and that relative need can be established, which is a difficult task in itself.

The panel recognizes that community and political opposition to randomization to zero treatments may be strong and that enlisting participation in such experiments may be difficult. This opposition and reluctance could seriously jeopardize the production of reliable results if it is translated into noncompliance with a research design. The feasibility of randomized experiments for AIDS prevention programs has already been demonstrated, however (see the review of selected experiments in Turner, Miller, and Moses, 1989:327-329). The substantial effort involved in mounting randomized field experiments is repaid by the fact that they can provide unbiased evidence of the effects of a program.

[6] Research participants' knowledge that they were being observed had a positive effect on their responses in a series of famous studies made at Western Electric's Hawthorne Works in Chicago (Roethlisberger and Dickson, 1939); the phenomenon is referred to as the Hawthorne effect.

Unit of Assignment. The unit of assignment of an experiment may be an individual person, a clinic (i.e., the clientele of the clinic), or another organizational unit (e.g., the community or city). The treatment unit is selected at the earliest stage of design. Variations of units are illustrated in the following four examples of intervention programs.

1. Two different pamphlets (A and B) on the same subject (e.g., testing) are distributed in an alternating sequence to individuals calling an AIDS hotline. The outcome to be measured is whether the recipient returns a card asking for more information.

2. Two instruction curricula (A and B) about AIDS and HIV infection are prepared for use in high school driver education classes. The outcome to be measured is a score on a knowledge test.

3. Of all clinics for sexually transmitted diseases (STDs) in a large metropolitan area, some are randomly chosen to introduce a change in the fee schedule. The outcome to be measured is the change in patient load.

4. A coordinated set of community-wide interventions involving community leaders, social service agencies, the media, community associations, and other groups is implemented in one area of a city. Outcomes are knowledge as assessed by testing at drug treatment centers and STD clinics and condom sales in the community's retail outlets.

In example (1), the treatment unit is an individual person who receives pamphlet A or pamphlet B. If either "treatment" is applied again, it would be applied to a person. In example (2), the high school class is the treatment unit; everyone in a given class experiences either curriculum A or curriculum B. If either treatment is applied again, it would be applied to a class. The treatment unit is the clinic in example (3), and in example (4), the treatment unit is a community.

The consistency of the effects of a particular intervention across repetitions justly carries a heavy weight in appraising the intervention. It is important to remember that repetitions of a treatment or intervention are the number of treatment units to which the intervention is applied. This is a salient principle in the design and execution of intervention programs as well as in the assessment of their results. The adequacy of the proposed sample size (number of treatment units) has to be considered in advance. Adequacy depends mainly on two factors:

· How much variation occurs from unit to unit among units receiving a common treatment? If that variation is large, then the number of units needs to be large.

· What is the minimum size of a possible treatment difference that, if present, would be practically important? That is, how small a treatment difference is it essential to detect if it is present? The smaller this quantity, the larger the number of units that are necessary.

Many formal methods for considering and choosing sample size exist (see, e.g., Cohen, 1988). Practical circumstances occasionally allow choosing between designs that involve units at different levels; thus, a classroom might be the unit if the treatment is applied in one way, but an entire school might be the unit if the treatment is applied in another. When both approaches are feasible, the use of a power analysis for each approach may lead to a reasoned choice.
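As an illustration of the kind of arithmetic a power analysis involves, the sketch below uses the standard normal-approximation sample-size formula for comparing two proportions, together with a conventional variance-inflation (design effect) adjustment for the case in which whole groups rather than individuals are the treatment unit. The outcome rates, clinic size, and intraclass correlation are hypothetical assumptions, not figures drawn from the panel's deliberations.

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate sample size per arm to detect the difference between two
    proportions with a two-sided test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    delta = abs(p_treatment - p_control)
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)

def inflate_for_clusters(n_individuals, cluster_size, icc):
    """Inflate an individual-level sample size when whole units (classes,
    clinics, communities) are randomized, using the usual design effect
    1 + (m - 1) * ICC."""
    design_effect = 1 + (cluster_size - 1) * icc
    return ceil(n_individuals * design_effect)

if __name__ == "__main__":
    # Hypothetical target: detect a rise in a protective behavior from
    # 30 percent to 40 percent with 80 percent power at alpha = 0.05.
    n = n_per_arm(0.30, 0.40)
    print("individuals per arm:", n)  # 354 with these illustrative inputs
    # If clinics of about 50 clients each are the treatment unit and outcomes
    # within a clinic are mildly correlated (ICC = 0.02):
    n_clustered = inflate_for_clusters(n, cluster_size=50, icc=0.02)
    print("individuals per arm with clinic-level assignment:", n_clustered)
    print("clinics per arm:", ceil(n_clustered / 50))
```

The two inputs to the calculation mirror the two factors listed above: the unit-to-unit variation (here summarized by the outcome proportions and, for group-level assignment, the intraclass correlation) and the smallest difference worth detecting.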

Choice of Methods

There is some controversy about the advantages of randomized experiments in comparison with other evaluative approaches. It is the panel's belief that when a (well-executed) randomized study is feasible, it is superior to alternative kinds of studies in the strength and clarity of whatever conclusions emerge, primarily because the experimental approach avoids selection biases.[7] Other evaluation approaches are sometimes unavoidable, but ordinarily the accumulation of valid information will go more slowly and less securely than in randomized approaches.

Experiments in medical research shed light on the advantages of carefully conducted randomized experiments. The Salk vaccine trials are a successful example of a large, randomized study. In a double-blind[8] test of the polio vaccine, children in various communities were randomly assigned to two treatments, either the vaccine or a placebo. By this method, the effectiveness of the Salk vaccine was demonstrated in one summer of research (Meier, 1957).

A sufficient accumulation of relevant, observational information, especially when collected in studies using different procedures and sample populations, may also clearly demonstrate the effectiveness of a treatment or intervention. The process of accumulating such information can be a long one, however. When a (well-executed) randomized study is feasible, it can provide evidence that is subject to less uncertainty in its interpretation, and it can often do so in a more timely fashion. In the midst of an epidemic, the panel believes it proper that randomized experiments be one of the primary strategies for evaluating the effectiveness of AIDS prevention efforts. In making this recommendation, however, the panel also wishes to emphasize that the advantages of the randomized experimental design can be squandered by poor execution (e.g., by compromised assignment of subjects, significant subject attrition rates, etc.). To achieve the advantages of the experimental design, care must be taken to ensure that the integrity of the design is not compromised by poor execution.

In proposing that randomized experiments be one of the primary strategies for evaluating the effectiveness of AIDS prevention programs, the panel also recognizes that there are situations in which randomization will be impossible or, for other reasons, cannot be used. In its next report the panel will describe at length appropriate nonexperimental strategies to be considered in situations in which an experiment is not a practical or desirable alternative.

[7] Participants who self-select into a program are likely to be different from nonrandom comparison groups in terms of interests, motivations, values, abilities, and other attributes that can bias the outcomes.

[8] A double-blind test is one in which neither the person receiving the treatment nor the person administering it knows which treatment (or whether no treatment) is being given.

THE MANAGEMENT OF EVALUATION

Conscientious evaluation requires a considerable investment of funds, time, and personnel. Because the panel recognizes that resources are not unlimited, it suggests that they be concentrated on the evaluation of a subset of projects to maximize the return on investment and to enhance the likelihood of high-quality results.

Project Selection

Deciding which programs or sites to evaluate is by no means a trivial matter. Selection should be carefully weighed so that projects that are not replicable or that have little chance for success are not subjected to rigorous evaluations.

The panel recommends that any intensive evaluation of an intervention be conducted on a subset of projects selected according to explicit criteria. These criteria should include the replicability of the project, the feasibility of evaluation, and the project's potential effectiveness for prevention of HIV transmission.

If a project is replicable, it means that the particular circumstances of service delivery in that project can be duplicated. In other words, for CBOs and counseling and testing projects, the content and setting of an intervention can be duplicated across sites. Feasibility of evaluation means that, as a practical matter, the research can be done: that is, the research design is adequate to control for rival hypotheses, it is not excessively costly, and the project is acceptable to the community and the sponsor. Potential effectiveness for HIV prevention means that the intervention is at least based on a reasonable theory (or mix of theories) about behavioral change (e.g., social learning theory [Bandura, 1977], the health belief model [Janz and Becker, 1984], etc.), if it has not already been found to be effective in related circumstances.

In addition, since it is important to ensure that the results of evaluations will be broadly applicable, the panel recommends that evaluation be conducted and replicated across major types of subgroups, programs, and settings. Attention should be paid to geographic areas with low and high AIDS prevalence, as well as to subpopulations at low and high risk for AIDS.

Research Administration

The sponsoring agency interested in evaluating an AIDS intervention should consider the mechanisms through which the research will be carried out as well as the desirability of both independent oversight and agency in-house conduct and monitoring of the research. The appropriate entities and mechanisms for conducting evaluations depend to some extent on the kinds of data being gathered and the evaluation questions being asked.

Oversight and monitoring are important to keep projects fully informed about the other evaluations relevant to their own and to render assistance when needed. Oversight and monitoring are also important because evaluation is often a sensitive issue for project and evaluation staff alike. The panel is aware that evaluation may appear threatening to practitioners and researchers because of the possibility that evaluation research will show that their projects are not as effective as they believe them to be. These needs and vulnerabilities should be taken into account as evaluation research management is developed.

Conducting the Research

To conduct some aspects of a project's evaluation, it may be appropriate to involve project administrators, especially when the data will be used to evaluate delivery systems (e.g., to determine when and which services are being delivered). To evaluate outcomes, the services of an outside evaluator[9] or evaluation team are almost always required because few practitioners have the necessary professional experience or the time and resources necessary to do evaluation. The outside evaluator must have relevant expertise in evaluation research methodology and must also be sensitive to the fears, hopes, and constraints of project administrators.

Several evaluation management schemes are possible. For example, a prospective AIDS prevention project group (the contractor) can bid on a contract for project funding that includes an intensive evaluation component. The actual evaluation can be conducted either by the contractor alone or by the contractor working in concert with an outside independent collaborator. This mechanism has the advantage of involving project practitioners in the work of evaluation as well as building separate but mutually informing communities of experts around the country. Alternatively, a contract can be let with a single evaluator or evaluation team that will collaborate with the subset of sites that is chosen for evaluation. This variation would be managerially less burdensome than awarding separate contracts, but it would require greater dependence on the expertise of a single investigator or investigative team. (Appendix A discusses contracting options in greater depth.) Both of these approaches accord with the parent committee's recommendation that collaboration between practitioners and evaluation researchers be ensured. Finally, in the more traditional evaluation approach, independent principal investigators or investigative teams may respond to a request for proposals (RFP) issued to evaluate individual projects. Such investigators are frequently university-based or are members of a professional research organization, and they bring to the task a variety of research experiences and perspectives.

Independent Oversight

The panel believes that coordination and oversight of multisite evaluations is critical because of the variability in investigators' expertise and in the results of the projects being evaluated. Oversight can provide quality control for individual investigators and can be used to review and integrate findings across sites for developing policy. The independence of an oversight body is crucial to ensure that project evaluations do not succumb to the pressures for positive findings of effectiveness.

When evaluation is to be conducted by a number of different evaluation teams, the panel recommends establishing an independent scientific committee to oversee project selection and research efforts, corroborate the impartiality and validity of results, conduct cross-site analyses, and prepare reports on the progress of the evaluations.

[9] As discussed under "Agency In-House Team," the outside evaluator might be one of CDC's personnel. However, given the large amount of research to be done, it is likely that non-CDC evaluators will also need to be used.

The composition of such an independent oversight committee will depend on the research design of a given program. For example, the committee ought to include statisticians and other specialists in randomized field tests when that approach is being taken. Specialists in survey research and case studies should be recruited if either of those approaches is to be used. Appendix B offers a model for an independent oversight group that has been successfully implemented in other settings: a project review team, or advisory board.

Agency In-House Team

As the parent committee noted in its report, evaluations of AIDS interventions require skills that may be in short supply for agencies invested in delivering services (Turner, Miller, and Moses, 1989:349). Although this situation can be partly alleviated by recruiting professional outside evaluators and retaining an independent oversight group, the panel believes that an in-house team of professionals within the sponsoring agency is also critical. The in-house experts will interact with the outside evaluators and provide input into the selection of projects, outcome objectives, and appropriate research designs; they will also monitor the progress and costs of evaluation. These functions require not just bureaucratic oversight but appropriate scientific expertise.

This is not intended to preclude the direct involvement of CDC staff in conducting evaluations. However, given the great amount of work to be done, it is likely a considerable portion will have to be contracted out. The quality and usefulness of the evaluations done under contract can be greatly enhanced by ensuring that there are an adequate number of CDC staff trained in evaluation research methods to monitor these contracts.

The panel recommends that CDC recruit and retain behavioral, social, and statistical scientists trained in evaluation methodology to facilitate the implementation of the evaluation research recommended in this report.

Interagency Collaboration

The panel believes that the federal agencies that sponsor the design of basic research, intervention programs, and evaluation strategies would profit from greater interagency collaboration. The evaluation of AIDS intervention programs would benefit from a coherent program of studies that should provide models of efficacious and effective interventions to prevent further HIV transmission, the spread of other STDs, and unwanted pregnancies (especially among adolescents). A marriage could then be made of basic and applied science, from which the best evaluation is born. Exploring the possibility of interagency collaboration and CDC's role in such collaboration is beyond the scope of this panel's task, but it is an important issue that we suggest be addressed in the future.

Costs of Evaluation

In view of the dearth of current evaluation efforts, the panel believes that vigorous evaluation research must be undertaken over the next few years to build up a body of knowledge about what interventions can and cannot do. Dedicating no resources to evaluation will virtually guarantee that high-quality evaluations will be infrequent and the data needed for policy decisions will be sparse or absent. Yet, evaluating every project is not feasible simply because there are not enough resources and, in many cases, evaluating every project is not necessary for good science or good policy. The panel believes that evaluating only some of a program's sites or projects, selected under the criteria noted in Chapter 4, is a sensible strategy.

Although we recommend that intensive evaluation be conducted on only a subset of carefully chosen projects, we believe that high-quality evaluation will require a significant investment of time, planning, personnel, and financial support. The panel's aim is to be realistic, not discouraging, when it notes that the costs of program evaluation should not be underestimated. Many of the research strategies proposed in this report require investments that are perhaps greater than has been previously contemplated. This is particularly the case for outcome evaluations, which are ordinarily more difficult and expensive to conduct than formative or process evaluations. And those costs will be additive with each type of evaluation that is conducted.

Panel members have found that the cost of an outcome evaluation sometimes equals or even exceeds the cost of actual program delivery. For example, it was reported to the panel that randomized studies used to evaluate recent manpower training projects cost as much as the projects themselves (see Cottingham and Rodriguez, 1987). In another case, the principal investigator of an ongoing AIDS prevention project told the panel that the cost of randomized experimentation was approximately three times higher than the cost of delivering the intervention (albeit the study was quite small, involving only 104 participants) (Kelly et al., 1989).

Fortunately, only a fraction of a program's projects or sites need to be intensively evaluated to produce high-quality information, and not all will require randomized studies. Because of the variability in kinds of evaluation that will be done as well as in the costs involved, there is no set standard or rule for judging what fraction of a total program budget should be invested in evaluation. Based upon very limited data[10] and assuming that only a small sample of projects would be evaluated, the panel suspects that program managers might reasonably anticipate spending 8 to 12 percent of their intervention budgets to conduct high-quality evaluations (i.e., formative, process, and outcome evaluations).[11] Larger investments seem politically infeasible and unwise in view of the need to put resources into program delivery. Smaller investments in evaluation may risk studying an inadequate sample of program types, and they may also invite compromises in research quality.

The nature of the HIV/AIDS epidemic mandates an unwavering commitment to prevention programs, and the prevention activities require a similar commitment to the evaluation of those programs. The magnitude of what can be learned from doing good evaluations will more than balance the magnitude of the costs required to perform them. Moreover, it should be realized that the costs of shoddy research can be substantial, both in their direct expense and in the lost opportunities to identify effective strategies for AIDS prevention. Once the investment has been made, however, and a reservoir of findings and practical experience has accumulated, subsequent evaluations should be easier and less costly to conduct.

[10] See, for example, Chapter 3, which presents cost estimates for evaluations of media campaigns. Similar estimates are not readily available for other program types.

[11] For example, the U.K. Health Education Authority (that country's primary agency for AIDS education and prevention programs) allocates 10 percent of its AIDS budget for research and evaluation of its AIDS programs (D. McVey, Health Education Authority, personal communication, June 1990). This allocation covers both process and outcome evaluation.

REFERENCES

Bandura, A. (1977) Self-efficacy: Toward a unifying theory of behavioral change. Psychological Review 84:191-215.

Campbell, D. T., and Stanley, J. C. (1966) Experimental and Quasi-Experimental Designs for Research. Boston: Houghton Mifflin.

Centers for Disease Control (CDC) (1988) Sourcebook presented at the National Conference on the Prevention of HIV Infection and AIDS Among Racial and Ethnic Minorities in the United States (August).

Cohen, J. (1988) Statistical Power Analysis for the Behavioral Sciences. 2nd ed. Hillsdale, N.J.: L. Erlbaum Associates.

Cook, T. D., and Campbell, D. T. (1979) Quasi-Experimentation: Design and Analysis Issues for Field Settings. Boston: Houghton Mifflin.

Federal Judicial Center (1981) Experimentation in the Law. Washington, D.C.: Federal Judicial Center.

Janz, N. K., and Becker, M. H. (1984) The health belief model: A decade later. Health Education Quarterly 11(1):1-47.

Kelly, J. A., St. Lawrence, J. S., Hood, H. V., and Brasfield, T. L. (1989) Behavioral intervention to reduce AIDS risk activities. Journal of Consulting and Clinical Psychology 57:60-67.

Meier, P. (1957) Safety testing of poliomyelitis vaccine. Science 125(3257):1067-1071.

Roethlisberger, F. J., and Dickson, W. J. (1939) Management and the Worker. Cambridge, Mass.: Harvard University Press.

Rossi, P. H., and Freeman, H. E. (1982) Evaluation: A Systematic Approach. 2nd ed. Beverly Hills, Calif.: Sage Publications.

Turner, C. F., Miller, H. G., and Moses, L. E., eds. (1989) AIDS, Sexual Behavior, and Intravenous Drug Use. Report of the NRC Committee on AIDS Research and the Behavioral, Social, and Statistical Sciences. Washington, D.C.: National Academy Press.

Weinstein, M. C., Graham, J. D., Siegel, J. E., and Fineberg, H. V. (1989) Cost-effectiveness analysis of AIDS prevention programs: Concepts, complications, and illustrations. In C. F. Turner, H. G. Miller, and L. E. Moses, eds., AIDS, Sexual Behavior, and Intravenous Drug Use. Washington, D.C.: National Academy Press.

Weiss, C. H. (1972) Evaluation Research. Englewood Cliffs, N.J.: Prentice-Hall.
