3
Improving Evaluation

Improving the standards of evidence used in the evaluation of family violence interventions is one of the most critical needs in this field. Given the complexity and unique history of family violence interventions, researchers and service providers have used a variety of methods and a broad array of measures and evaluation strategies over the past two decades. This experimentation has contributed important ideas (using qualitative and quantitative methods) that have helped to establish a baseline for the assessment of individual programs. What is lacking, however, is a capacity and research base that can offer specific guidance to key decision makers and the broader service provider community about the impact or relative effectiveness of specific interventions, as well as broad service strategies, to address the multiple dimensions of family violence.

Recognizing that more rigorous studies are needed to better determine "what works," "for whom," "under what conditions," and "at what cost," the committee sought to identify research strategies and components of evaluation designs that represent key opportunities for improvement. The road to improvement requires attention to four areas: (1) assessing the limitations of current evaluations, (2) forging functional partnerships between researchers and service providers, (3) addressing the dynamics of collaboration in those partnerships, and (4) exploring new evaluation methods to assess comprehensive community initiatives.

The emerging emphasis on integrated, multifaceted, community-based approaches to treatment and prevention services, in particular, presents a new dilemma in evaluating family violence interventions: comprehensive interventions are particularly difficult, if not impossible, to implement as well as study using experimental or quasi-experimental designs. Efforts to resolve this dilemma may benefit from attention to service design, program implementation, and assessment experiences in related fields (such as substance abuse and teenage pregnancy prevention). These experiences could reveal innovative methods, common lessons, and reliable measures for the design and development of comprehensive community interventions, especially in areas characterized by individualized services, principles of self-determination, and community-wide participation.

Improving on study design and methodology is important, since technical improvements are necessary to strengthen the science base. But the dynamics of the relationships between researchers and service providers are also important; a creative and mutually beneficial partnership can enhance both research and program design.

Two additional points warrant mention in a broad discussion of the status of evaluations of family violence interventions. First, learning more about the effectiveness of programs, interventions, and service strategies requires the development of controlled studies and reliable measures, preceded by detailed process evaluations and case studies that can delve into the nature and clientele of a particular intervention as well as aspects of the institutional or community settings that facilitate or impede implementation. Second, the range of interactions between treatments and clients requires closer attention to variations in the individual histories and social settings of the clients involved. These interactions can be studied in longitudinal studies or evaluations that pair clients and treatment regimens and allow researchers to follow cohorts over time within the general study group.

Assessing The Limitations Of Current Evaluations

The limitations of the empirical evidence for family violence interventions are neither new nor unique. For violence interventions of all kinds, few examples provide sufficient evidence to recommend the implementation of a particular program (National Research Council, 1993b). And numerous reviews indicate that evaluation studies of many social policies achieve low rates of technical quality (Cordray, 1993; Lipsey et al., 1985). A recent National Research Council study offered two explanations for the poor quality of evaluations of violence interventions: (1) most evaluations were not planned as part of the introduction of a program and (2) evaluation designs were too weak to reach a conclusion as to the program's effects (National Research Council, 1993b).

The field cannot be improved simply by urging researchers and service providers to strengthen the standards of evidence used in evaluation studies. Nor can it be improved simply by urging that evaluation studies be introduced in the early stages of the planning and design of interventions. Specific attention is needed to the hierarchy of study designs, the developmental stages of evaluation research and interventions, the marginal role of research in service settings, and the difficulties associated with imposing experimental conditions in service settings.

A Hierarchy of Study Designs

The evaluation of family violence interventions requires research directed at estimating the unique effects (or net impact) of the intervention, above and beyond any change that may have occurred because of a multitude of other factors. Such research requires study designs that can distinguish the impact of the intervention within a general service population from other changes that occur only in certain groups or that result simply from the passage of time. These designs commonly involve the formation of (at least) two groups: one composed of individuals who participated in the intervention (often called the treatment or experimental group) and a second composed of individuals who are comparable in character and experience to those who participated but who received no services or received an intervention significantly different from the one under study (the control or comparison group). Some study designs involve multiple treatment groups that receive modified forms of the services under evaluation. Other studies use a single subject group but sample it at multiple times before and after the intervention, to determine whether the measures taken immediately before and after the program are a continuation of earlier patterns or indicate a decisive change that endures after the cessation of services; this is called a time-series design (Campbell and Stanley, 1966, and Weiss, 1972, are two comprehensive primers on the basic principles and designs of program assessment and evaluation research).

Study designs that rely on a comparison group, or that use a time-series design, are commonly viewed as more reliable than evaluation studies that simply test a single treatment group before and after an intervention (often called a pre-post study). More rigorous experimental designs, which involve the use of a randomized control group, are able to distinguish changes in behavior or attitude that occur as a result of the intervention from those that are influenced by maturation (the tendency for many individuals to improve over time), self-selection (the tendency of individuals who are motivated to change to seek out and continue their involvement with an intervention that is helping them), and other sources of bias that may influence the outcome of the intervention.
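The logic can be made concrete with a small simulation. The following Python sketch is illustrative only; the effect sizes and noise levels are invented for this discussion, not drawn from any study reviewed here. It shows how a pre-post design credits ordinary maturation to the intervention, while a randomized control group cancels maturation out.

    # Illustrative simulation: pre-post versus randomized-control estimates.
    # All numbers are assumptions made up for this sketch.
    import random

    random.seed(1)

    TRUE_EFFECT = 0.0   # suppose the intervention actually changes nothing
    MATURATION = 0.5    # average improvement most clients show over time

    def score_change(treated: bool) -> float:
        """Change in a well-being score from intake to follow-up."""
        noise = random.gauss(0.0, 1.0)
        return MATURATION + (TRUE_EFFECT if treated else 0.0) + noise

    n = 500
    treatment = [score_change(True) for _ in range(n)]
    control = [score_change(False) for _ in range(n)]

    def mean(xs):
        return sum(xs) / len(xs)

    # Pre-post estimate: average change in the treatment group alone.
    # Maturation is indistinguishable from a program effect (about 0.5).
    print("pre-post estimate:", round(mean(treatment), 2))

    # Net-impact estimate: the difference between randomized groups.
    # Maturation affects both groups equally and cancels (about 0.0).
    print("net-impact estimate:", round(mean(treatment) - mean(control), 2))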

The importance of estimating the net impact of innovative services and determining their comparative value highlights several technical issues:

- The manner in which the control or comparison groups are formulated influences the validity of the inference that can be drawn regarding net effects.
- The number of participants enrolled in each group (the sample size) must be sufficient to permit statistical detection of differences between the groups, if they exist (see the sketch following this list).
- There should be agreement among interested parties that a selected outcome is important to measure, that it is a valid reflection of the objective of the intervention, and that it can reflect change over time.
- Evidence is needed that the innovative services were actually provided as planned, and that the differences between the innovative services and usual services were large enough to generate meaningful differences in the outcome of interest.
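The sample-size point can be quantified with the standard normal-approximation formula for a two-group comparison. The sketch below is an illustration, not a prescription; the conventional alpha and power values are assumptions.

    # Approximate participants needed per group to detect a standardized
    # effect size d with a two-sided test (normal approximation).
    import math
    from statistics import NormalDist

    def n_per_group(d: float, alpha: float = 0.05, power: float = 0.80) -> int:
        z = NormalDist().inv_cdf
        return math.ceil(2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2)

    for label, d in [("large", 0.8), ("medium", 0.5), ("small", 0.2)]:
        print(f"{label} effect (d={d}): ~{n_per_group(d)} per group")
    # large ~25, medium ~63, small ~393 per group: detecting small effects
    # requires samples far larger than most studies reviewed in this chapter.

Evaluations that enroll a few dozen families per condition can therefore detect only large effects; smaller but programmatically meaningful effects will pass unnoticed.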

Over the last few decades, evaluation research has developed a general consensus about the relative strength of various study designs used to assess the effectiveness (or net effects) of interventions (Figure 3-1; see also Green and Byar, 1984).

FIGURE 3-1 Hierarchy of strength of evidence in research evaluations. SOURCE: Modified from Green and Byar (1984). Copyright John Wiley & Sons Limited. Reproduced with permission.

The lowest level of evidence in the hierarchy is nonexperimental designs, which include case studies and anecdotal reports. This type of research often consists of detailed histories of participant experiences with an intervention. Although they may contain a wealth of information, nonexperimental studies cannot provide a strong base for inference because they are unable to control for such factors as maturation, self-selection, the interaction of selection and maturation (the tendency for those with more or less severe problem levels to mature at differential rates), historical influences that are unrelated to the intervention, other interventions participants may have received, a variety of response biases and demand characteristics, and changes in instrumentation (for example, the interviewer becomes more familiar with the client over the course of the study).

The next level of evidence is quasi-experimental research designs (levels 4 through 6 in Figure 3-1). Although these designs can improve inferential clarity, they cannot be relied on to yield unbiased estimates of the effects of interventions because the research subjects are not assigned randomly. Two research reviews of family violence interventions suggest that some trustworthy information can be extracted from quasi-experimental designs, so they are not without merit (Finkelhor and Berliner, 1995; Heneghan et al., 1996). Although quasi-experimental study designs can provide evidence that a relationship exists between participation in the intervention and the outcome, the magnitude of the net effect is difficult to determine. In other fields, quasi-experimental results are more trustworthy when the studies involve a broader evidential basis than simple pre-post designs (Cook and Campbell, 1979; Cordray, 1986).

The highest level of evidence is experimental designs that include controls to restrict a number of important threats to internal validity. These are the least prevalent types of designs in the family violence literature. Although, in theory, properly designed and executed experiments can produce unbiased estimates of net effects, other threats to validity emerge when they are conducted in largely uncontrolled settings. Such threats involve various forms of generalization (across persons, settings, time, and other constructs of interest), statistical problems, and logistical problems (e.g., differential attrition from measurement or noncompliance with the intervention protocol).

These other threats to validity can be addressed by replications and the synthesis of research results, including the use of meta-analysis (Lipsey and Wilson, 1993). Replication is an essential part of the scientific process; replication studies can reveal evidence from both successes and failures. The proper use of research synthesis can provide a tool for understanding variation and similarities across studies and will uncover robust intervention effects, if they are present, even if the individual studies are not generalizable because their samples are not representative, the interventions are unique, or their measures are inconsistent (Cook and Shadish, 1994; Cordray, 1993; Colditz et al., 1995).
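At its core, such a synthesis pools effect estimates across studies, weighting each by its precision. The sketch below shows the basic inverse-variance pooling step with invented study values; it is a simplified illustration, not a substitute for a full meta-analysis.

    # Fixed-effect inverse-variance pooling of standardized effect sizes.
    # The (estimate, variance) pairs are invented for illustration.
    studies = [
        (0.30, 0.04),
        (0.10, 0.02),
        (0.45, 0.09),
        (0.05, 0.01),
    ]

    weights = [1.0 / var for _, var in studies]
    pooled = sum(w * d for (d, _), w in zip(studies, weights)) / sum(weights)
    pooled_se = (1.0 / sum(weights)) ** 0.5

    print(f"pooled effect: {pooled:.2f} (standard error {pooled_se:.2f})")
    # Individually inconclusive studies can combine into a usefully
    # precise estimate of the intervention's effect.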

Developmental Stages of Research

Interventions often undergo an evolutionary process that refines theories, services, and approaches over time. In the early stages, interventions generate reform efforts and evaluations that rely primarily on descriptive studies and anecdotal data. As these interventions and evaluation efforts mature, they begin to approach the standards of evidence needed to make confident judgments about effectiveness and cost.

Current discussions about what is known about family violence interventions are often focused on determining the effectiveness or cost-effectiveness of selected programs, interventions, or strategies. Conclusions about effectiveness require fairly high standards of evidence, because an evaluation must demonstrate, with high certainty, that the intervention of interest is responsible for the observed effects. This high standard of evidence is warranted to change major policies, but it may inhibit a careful assessment of what can be derived from a knowledge base that is still immature.

Of the more than 2,000 evaluation studies identified by the committee in the course of this study, the large majority consist of nonexperimental study designs. For example, one review of 29 studies of the treatment of sexually abused children indicated that more than half of the studies (17) were pre-post (also called before-and-after) designs that evaluated the same group of children at two or more time intervals during which some kind of professional intervention was provided (Finkelhor and Berliner, 1995). Similarly, a methodological review of intensive family preservation services programs excluded 36 of 46 identified program evaluations because they contained no comparison groups (Heneghan et al., 1996). Thus, while hundreds of evaluation studies exist in the family violence research literature, most of them provide no firm basis for examining the relative impact of a specific intervention or considering the ways in which different types of clients respond to a treatment, prevention, or deterrence intervention.

Still, nonexperimental studies can reveal important information in the developmental process of research. They can illuminate the characteristics and experience of program participants, the nature of the presenting problems, and issues associated with efforts to implement individual programs or to change systems of service within the community. Although these kinds of studies cannot provide evidence of effectiveness, they do represent important building blocks in evaluation research.

Developmental Stages of Interventions

A similar developmental process exists on the programmatic side of interventions. Many family violence treatment and prevention programs have their origins in the efforts of advocates concerned about children, women, elderly people, and the family unit. Over several decades of organized activity, these efforts have fostered the development of interventions in social service, health, and law enforcement settings that program sponsors believe will improve the welfare of victims or control and reduce the violent behavior of perpetrators. Some programs are based on common sense and legal authority, such as mandatory reporting requirements; some are based on theories borrowed from other areas, such as counseling services for victims of trauma (Azar, 1988); and some are based on broad theories of human interaction that are difficult to operationalize, such as comprehensive community interventions.

Rarely do family violence interventions result from the development of theory or data collection efforts that precede the implementation of a particular program or strategy. Significant exceptions can be found in some areas, however, such as the treatment of domestic violence (especially the use of arrest policies) and the development of home visitation services and intensive family preservation services. All these interventions were preceded by research studies that identified critical decision points in the intervention process that could be influenced by policy or service reforms (such as deterrence research in domestic violence cases) and research suggesting that families may be more responsive to services during times of crisis (such as a decision to move a child from a family setting into out-of-home placement) or change (such as the birth of a child).

Initial attempts to implement strategies are followed by refinements that include concrete descriptions of services, the development of models to differentiate types of interventions, and the emergence of theories or rationales to explain why particular approaches ought to be effective. As these models are replicated, empirical evidence and experience emerge that clarify who is best served by a particular intervention and under what circumstances. As programs mature and become better articulated and implemented, the evaluation questions and methods become more complex.

The Marginal Role of Research in Service Settings

Most research on family violence interventions is concentrated in social service settings, in which researchers have comparatively easy access to clients and can exert greater control over the service implementation process that accompanies the development of the intervention. Research is also concentrated in the area of child maltreatment, which has a longer history of interventions than domestic violence or elder abuse.

It is important to reiterate that the distribution of the evaluation studies reviewed in this report does not match the history of the programs and the interventions themselves. Some interventions, such as home visitation and family preservation services, are comparatively new and employ innovative service strategies. Because they lend themselves easily to evaluation with small study populations, they have been the subject of numerous evaluation studies. Other, more extensive interventions, such as foster care, judicial rulings on child placement, and shelter services for battered women, involve larger numbers of individuals and are more resistant to study because they are deeply embedded in major institutional or advocacy settings that are often not receptive to research or do not have the resources to support it. As a result, research is often marginalized in discussions of the effectiveness of certain programs or service strategies, and the conditions that can foster empirical program evaluation studies are restricted to a fairly narrow range of program activity.

This situation appears to be changing. The growing costs of ongoing interventions have stimulated interest in the public sector in knowing more about the processes, effects, and outcomes associated with service strategies, interventions, and programs. In the site visits and workshops that were part of our study, program advocates with extensive experience with different service models expressed receptivity to learning more about specific components of service systems that can allow them to tailor services to the individual needs of their clients and communities. Researchers have documented the variation among victims and offenders, suggesting that, in assigning cases to service categories, repetitive and chronic cases should be distinguished from those that are episodic or stimulated by unusual stress. Furthermore, inconsistent findings and uncertainties associated with the ways in which clients are referred to or selected for programs and interventions suggest that more attention should be focused on the pathways by which individual victims or offenders enter different service settings.

There is now greater interest in understanding the social, economic, legal, and political processes that shape the development of family violence interventions. At the same time, program advocates have begun to focus on the development of comprehensive community interventions that can move beyond the difficulties associated with providing professional services in institutional settings and establish resource centers that can aid parents, families, women, and children in their own communities. The development of these comprehensive and individualized interventions has stimulated further interest in knowing more about the ways in which client characteristics, service settings, and program content interact to stimulate behavioral change and lead to a reduction in child maltreatment, domestic violence, and elder abuse.

Imposing Experimental Conditions in Service Settings

It is difficult for researchers to establish good standards of evidence when they cannot exert complete control over the selection of clients and the implementation of the intervention. But several strategies have emerged in other fields that can guide the development of evaluation research as it moves from its descriptive stage into the conduct of quasi-experimental and true experimental studies. An important part of this process is the development of a "fleet of studies" (Cronbach and Snow, 1981) that can provide a broad and rich understanding of a specific intervention. For example, a recent report by the National Research Council and the Institute of Medicine used a variety of sources of evaluation information to examine the effects of needle exchange programs on the risk of HIV transmission for intravenous drug users (National Research Council, 1995). Several program evaluations conducted over several years made it possible for the study panel to examine the pattern of evidence about the effectiveness of the needle exchange program. Although each individual study of a given project was insufficient to support a claim that the needle exchange program was effective, the collective strengths of this fleet of studies, taken together, provided a basis for a firm inference about effectiveness. The only fleet of studies identified in our review of evaluations is the Spouse Assault Replication Program (SARP), discussed in Chapter 5. In contrast, although multiple studies have been conducted of parenting programs and family support interventions (such as lay counseling, peer group support, and parent education), their dimensions, subject samples, and instrumentation are too varied to allow strong inferences to be developed at this time.

Evaluation may lag behind the development and refinement of intervention programs, especially when there is a rush to experiment without establishing the necessary conditions for a successful endeavor. Premature experimentation can leave the impression that nothing works, especially if the problem to be addressed is complex, interventions are limited, and program effects are obscure. Yet early program experimentation studies can be helpful in describing the characteristics of clients who appear to be receptive or impervious to change, documenting the barriers to program implementation, and estimating the size, intensity, costs, and duration of the intervention that may be required. If these lessons can be captured, they are a valuable resource in moving the research and program base to a new level of development, one that can address multiple and contextual interactions.

Flaws in Current Evaluations

The committee identified 114 evaluation studies conducted in the period 1980-1996 that have sufficient scientific strength to provide insights on the effects of specific interventions in the areas of child maltreatment, domestic violence, and elder abuse (Table 3-1). This time period was selected because it provides a contemporary history of the evaluation research literature while maintaining manageable limits on the scope of the evidence considered by the committee. As noted in Chapter 1, each of the studies employed an experimental or quasi-experimental research design, used reliable research instrumentation, and included a control or comparison group. In addition, a set of 35 detailed review articles summarizes a broader set of studies that rely on less rigorous standards but offer important insights into the nature and consequences of specific interventions (Table 3-2). Most of the 114 studies identified by the committee focus on interventions conducted in the United States. As a group, these studies represent only a small portion of the enormous array of evaluation research that has been conducted.

A rigorous assessment of the quasi-experimental studies and research review papers reveals many methodological weaknesses, including differences in the nature of the treatment and control groups, sample sizes that are too small to detect medium or small effects of the intervention, research measures that are unreliable or that yield divergent results with different ethnic and cultural groups, short periods of follow-up, and lack of specificity and consistency in program content and services.

The lack of equivalence between treatment and control groups in quasi-experimental studies can be illustrated by one study of therapeutic treatment of sexually abused children, in which children in the therapy group were compared with a group of no-therapy children (Sullivan et al., 1992). In a review of the study findings, the authors note that although children who received treatment had significantly fewer behavior problems at a one-year follow-up assessment, the no-therapy comparison group consisted of children whose parents had specifically refused therapy for their children when it was offered (Finkelhor and Berliner, 1995). This observation suggests that other systematic differences could exist between the two groups that affected their recovery (for example, having parents who were or were not supportive of psychological interventions).

Similarly, a review of intensive family preservation services programs indicated that, in a major Illinois evaluation of the intervention, the 2,000 families included were distributed across six sites that administered significantly different types of programs and services (Rossi, 1992). As a result, the sample size was reduced to the approximately 333 families associated with each individual site; small effects associated with the intervention, if they occurred, could not be observed (Rossi, 1992). A later methodological review of intensive family preservation services indicated that 5 of 10 studies that used control or comparison groups had treatment groups with fewer than 100 participants (Heneghan et al., 1996).

The quality of the existing research base of evaluations of family violence interventions is therefore insufficient to provide confident inferences to guide policy and practice, except in a few areas that we identify in this report. Nevertheless,

TABLE 3-1 Interventions by Type of Strategy and Relevant Quasi-Experimental Evaluations, 1980-1996

Parenting practices and family support services (4A-1): Barth et al., 1988; Barth, 1991; Brunk et al., 1987; Burch and Mohr, 1980; Egan, 1983; Gaudin et al., 1991; Hornick and Clarke, 1986; Lutzker et al., 1984; National Center on Child Abuse and Neglect, 1983a,b; Reid et al., 1981; Resnick, 1985; Schinke et al., 1986; Wesch and Lutzker, 1991; Whiteman et al., 1987.

School-based sexual abuse prevention (4A-2): Conte et al., 1985; Fryer et al., 1987; Harvey et al., 1988; Hazzard et al., 1991; Kleemeier et al., 1988; Kolko et al., 1989; McGrath et al., 1987; Miltenberger and Thiesse-Duffy, 1988; Peraino, 1990; Randolph and Gold, 1994; Saslawsky and Wurtele, 1986; Wolfe et al., 1986; Wurtele et al., 1986, 1991.

Child protective services investigation and casework (4A-3): none listed.

Intensive family preservation services (4A-4): AuClaire and Schwartz, 1986; Barton, 1994; Bergquist et al., 1993; Dennis-Small and Washburn, 1986; Feldman, 1991; Halper and Jones, 1981; Pecora et al., 1992; Schuerman et al., 1994; Schwartz et al., 1991; Szykula and Fleischman, 1985; Walton et al., 1993; Walton, 1994; Wood et al., 1988; Yuan et al., 1990.

treatment systems. The absence of differences in studies of alternative forms of service can be interpreted as showing that interventions do not have their intended effects (i.e., they are unsuccessful) or that both approaches are equally effective in enhancing well-being. Without a framework for assessing change over time, efforts to distinguish between no beneficial effects and equivalent effects will be impeded.

The Value of Service Descriptions

Collaborative research and practice partnerships can highlight significant differences between new services and usual-care services, which are generally a matter of degree. Treatment interventions may differ in frequency (the number of times a service is delivered per week), intensity (a face-to-face meeting with counselors versus a brief telephone contact), or the nature of the activities (goal setting versus establishment of a standard set of expectations). Documenting these differences is critical for planning, for interpreting outcomes, and for replication, since they establish the strength of the intervention.

The value of service descriptions is enhanced by identifying key elements that distinguish interventions from comparison conditions as well as elements they have in common. Such descriptions should include information about the setting(s) in which service activities occur; the training of the service provider; the frequency, intensity, and/or duration of service; the form, substance, and flexibility of the program (e.g., individual or group counseling); and the type of follow-up or subsequent service associated with the intervention.

Most family violence interventions are poorly documented (Heneghan et al., 1996; Rossi, 1992). For example, of 10 evaluation studies of intensive family preservation services reported by Heneghan et al. (1996), all 10 described (in narrative form) the types of services that were provided, but only 5 provided data on the duration of services, 3 provided data on the number of contacts per week, and 3 provided limited, narrative information on services provided in the usual-care conditions. In a different study, Meddin and Hansen (1985) found that the majority of substantiated abuse cases received no services at all (see Chapter 4). This point highlights the importance of knowing more about significant differences between service levels in the treatment and comparison groups, so that innovative projects do not duplicate what is currently available.

The flexibility of services is also an issue in assessing differences in service levels. Some services are protocol driven, whereas others can be adjusted to the needs of specific clients. The most respected intervention evaluations use consistent program models that can be replicated in other sites or cities. However, most interventions in social service, health, and law enforcement settings aim to be responsive to individual client needs and profiles rather than driven by protocols, because of the variation in the individual experiences of their clients. Safety planning for battered women, for example, varies tremendously depending on whether a woman plans to stay in an abusive relationship, is planning to leave, or has already left.

Staff, clients, and researchers can collaborate to prepare multiple protocols for a multifaceted intervention that reflects the context and setting of the service site.

Finally, many intervention efforts presume that a social ill can be remedied through the application of a single dose of an intervention (e.g., 3 weeks of rehabilitation). Others rely on follow-up activities to sustain the gains made in primary intervention efforts. Careful description of differences in follow-up activities should be given as much attention as is given to describing the primary intervention.
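The service-description elements discussed above lend themselves to a structured record. The sketch below is hypothetical; the field names and example values are ours, not drawn from any program described in this report. It shows how an innovative condition and a usual-care condition might be documented side by side so that the contrast establishing treatment strength is explicit.

    # Hypothetical service-description record for documenting treatment
    # and comparison conditions side by side. All values are invented.
    from dataclasses import dataclass

    @dataclass
    class ServiceDescription:
        setting: str              # where service activities occur
        provider_training: str    # training of the service provider
        contacts_per_week: float  # frequency
        contact_mode: str         # intensity: face-to-face vs. telephone
        duration_weeks: int       # duration of service
        program_form: str         # e.g., individual or group counseling
        protocol_driven: bool     # flexibility: fixed protocol or tailored
        follow_up: str            # subsequent service, if any

    innovative = ServiceDescription(
        setting="home visits", provider_training="nurse, 40-hour curriculum",
        contacts_per_week=2.0, contact_mode="face-to-face", duration_weeks=26,
        program_form="individual parent education", protocol_driven=False,
        follow_up="monthly telephone check-ins",
    )
    usual_care = ServiceDescription(
        setting="agency office", provider_training="caseworker orientation",
        contacts_per_week=0.5, contact_mode="telephone", duration_weeks=12,
        program_form="referral and case monitoring", protocol_driven=True,
        follow_up="none",
    )

    # The documented contrast (here, a fourfold difference in contact
    # frequency) is what an evaluation needs in order to interpret outcomes.
    print(innovative.contacts_per_week / usual_care.contacts_per_week)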

Explaining Theories of Change

In the early stages of development, the fundamental notions or theoretical concepts that guide a program intervention are not always well articulated. However, the service providers and program developers often share a conceptual framework that reflects their common understanding of the origins of the problem and the reasons why a specific configuration of services should remedy it. Extracting and characterizing the theory of change that guides an intervention can be a useful tool in evaluating community-based programs (Weiss, 1995), but this approach has not been widely adopted in family violence evaluations. Program articulation is a dynamic process, involving consultation with program directors, line staff, and participants throughout the planning, execution, and reporting phases of the evaluation (Chen and Rossi, 1983; Cordray, 1993; U.S. General Accounting Office, 1990). Causal models provide a basis for determining intermediate outcomes or processes that could be assessed as part of the overall evaluation plan. If properly measured, these linkages can provide insights into why interventions work and what they are intended to achieve.

Outcomes and Follow-up

Front-line personnel often express concern that traditional evaluation research, which relies on a single perspective or method of assessment, may measure the wrong outcome. For example, evaluations of shelters for battered women may focus on the recurrence of violence, which usually depends on community sanctions and protection and is outside the victim's and the shelter's control. Reliance on single measures can strip the entire enterprise, as well as victims' lives, of context and circumstances. Another concern is that outcomes may get worse before they get better: health care costs, for instance, may increase in the short term with better identification and assessment of family violence survivors.

An important strategy is therefore to measure many different outcomes: immediate and long-term outcomes, outcomes at multiple time periods, both self-report and observational measures, and outcomes at different levels. The absence of consensus about the unique or common purposes of measurement should not obscure the central point that identifying the relevant domains for measuring the success of interventions requires an open, collaborative discussion among researchers, service providers, clients, sponsors, and policy makers.

Another key to improving data collection, storage, and tracking capabilities is the development of coordinated information systems that can trace cases across different service settings. Three national data systems for child welfare data collection provide a general foundation for national and regional studies in the United States: in the Department of Health and Human Services, the Administration for Children and Families' Adoption and Foster Care Analysis and Reporting System (AFCARS); the federally supported and optional Statewide Automated Child Welfare Information Systems (SACWIS); and the National Center on Child Abuse and Neglect's National Child Abuse and Neglect Data System (NCANDS), a voluntary system that includes both summary data on key state child abuse and neglect indicators and detailed case data that examine trends and issues in the field.

Researchers are now exploring how to link these data systems with health care and educational databases, using record-matching techniques to bring data from multiple sources together (Goerge et al., 1994a). This type of tracking and record integration allows researchers to examine the impact of settings on service outcomes. Using this approach, Wulczyn (1992), for example, was able to show that children in foster care whose mothers had received prenatal care during pregnancy spent shorter periods in placement than children whose mothers had not. Similarly, drawing on foster care databases and the records of state education agencies, Goerge et al. (1992) reported that 28 percent of children in foster care also received various forms of special education. Fostering links between databases such as NCANDS and AFCARS and other administrative record sets will require research resources, coordination efforts, and the use of common definitions. Such efforts have the potential to greatly improve the quality of program evaluations.
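The record-matching step at the center of such linkage work reduces, in its simplest form, to joining case records from two agencies on a shared identifier. The sketch below is purely illustrative; the identifiers and fields are invented, and real linkages typically rely on probabilistic matching of names and birth dates rather than an exact key.

    # Illustrative record linkage: join child-welfare placement records
    # to school records on a shared identifier. All records are invented.
    placements = [
        {"child_id": "A17", "months_in_care": 14},
        {"child_id": "B02", "months_in_care": 31},
        {"child_id": "C88", "months_in_care": 6},
    ]
    school = [
        {"child_id": "A17", "special_education": True},
        {"child_id": "C88", "special_education": False},
    ]

    school_by_id = {rec["child_id"]: rec for rec in school}
    linked = [
        {**p, **school_by_id[p["child_id"]]}
        for p in placements
        if p["child_id"] in school_by_id  # unmatched cases drop out
    ]

    rate = sum(r["special_education"] for r in linked) / len(linked)
    print(f"matched {len(linked)} of {len(placements)} records; "
          f"special-education rate among matches: {rate:.0%}")

As the comment notes, cases that fail to match silently drop out of the linked file; understanding who is lost at this step is itself part of assessing the quality of a linked analysis.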

Community Context

The efficacy of a family violence intervention may depend on other structural processes and service systems in the community. For instance, the success of batterer treatment (and the evaluation of attrition rates) often depends on the ability of criminal justice procedures to keep the perpetrator in attendance. Widespread unemployment and shortages of affordable housing may undermine both violence prevention and women's ability to maintain independent living situations after shelter stays. Evaluation outcomes therefore need to be analyzed in light of data collected about the community realities in which they are embedded. A combination of qualitative and quantitative data, and the use of triangulation, can often best meet the need to capture contextual factors and control for competing explanations for any observed changes. Triangulation involves gathering both qualitative and quantitative data about the same processes for validation; complementing quantitative data with qualitative data about issues not amenable to traditional measurement is also recommended. Agency records can be analyzed both quantitatively and qualitatively.

At a minimum, regardless of the measures that are chosen, the ones selected ought to be assessed for their reliability (consistency) and validity (accuracy). There is a surprising lack of attention to these issues in the family violence field, although a broad range of measures is currently used in program evaluations (Table 3-3).

TABLE 3-3 Outcome Measures Used in Evaluations of Family Violence Interventions
Subject of outcome measure: V = victim, P = perpetrator, O = other.

Child Abuse
  Adolescent-Family Inventory of Events (P)
  Adult/Adolescent Parenting Inventory (P)
  Chemical Measurement Package (P)
  Child Abuse Potential Inventory (P)
  Child and Family Well-Being Scales (V, P)
  Child Behavior Checklist (V)
  Conflict Tactics Scales (V, P)
  Coopersmith Self-Esteem Inventory (P)
  Coping Health Inventory (P, O)
  Family Adaptability and Cohesion Scales (V, P, O)
  Family Assessment Form (V, P, O)
  Family Environment Scale (P, O)
  Family Inventory of Life Events and Changes (V, P, O)
  Family Systems Outcome Measures (O)
  Home Observation (O)
  Kent Infant Development Scale (V)
  Maternal Characteristics Scale (P)
  Minnesota Child Development Inventories (V)
  Parent Outcome Interview (P)
  Parenting Stress Index (P)

Domestic Violence
  Adult Self-Expression Scale (V, P)
  Conflict Tactics Scales (V, P)
  CES-D Depression Scale (V, P)
  Quality of Life Measure (V)
  Rosenberg Self-Esteem Scale (V, P)
  Rotter Internal-External Locus of Control Scale (V, P)
  Social Support Scale (V)

Elder Abuse
  Anger Inventory (P)
  Brief Symptom Inventory (V, P)
  Rosenberg Self-Esteem Scale (V, P)

SOURCE: Committee on the Assessment of Family Violence Interventions, National Research Council and Institute of Medicine, 1998.

The Dynamics Of Collaboration

Strengthening the structural aspects of the partnership between researchers and service providers will change the kinds of relationships between them. Creative collaborations require attention to several issues: (1) setting up equal partnerships, (2) the impact of ethnicity and culture on the research process, (3) safety and ethical issues, and (4) exit concerns, such as publishing and publicizing the results of the evaluation study and providing services when research resources are no longer available.

Equal Partnerships

Tensions between service providers and researchers may reflect significant differences in ideology and theory regarding the causes of family violence; they may also reflect mutual misunderstandings about the purpose and conduct of evaluation research. For front-line service providers, evaluation research may take time and resources away from the provision of services. The limited financial resources available to many agencies have created situations in which money directed toward evaluation seems to absorb funds that might otherwise support additional services for clients or program staff. Evaluations also have the potential to jeopardize clients' safety, individualization, and immediacy of access to care.

Recent collaborations and partnerships have gone far to address these concerns. Community agencies are beginning to realize that well-documented and soundly evaluated successes will help ensure their fiscal viability and even attract additional financial resources to support promising programs. Researchers are starting to recognize the accumulated expertise of agency personnel and how important they can be in planning as well as conducting studies. Both parties are recognizing that, even if research fails to confirm the success of a program, the evaluation results can be used to improve it.

True collaborative partnerships require valuing of and respect for the work on all sides. Too often the attitude of researchers has been patronizing toward activists, and front-line agency personnel have been suspicious of researchers' motives and commitment to the work. Honest discussion is needed of issues of parity, the possible gains of the enterprise for all parties, what everyone would like out of the partnership, and any other unresolved issues. Both sides need to spend time observing each other's domains, in order to better appreciate their constraints and risks.

The time constraints associated with setting up partnerships, especially those associated with short deadlines for responding to requests for proposals, need attention. Rather than waiting for a deadline, service providers and researchers need to support a plan for assessing the effects of interventions early in their development.

Viable collaborations also involve the consumers of the service at every step of the evaluation process. Clients can suggest useful outcome variables, the contextual factors that will modify those outcomes, elements of theory, and strategies that will maximize response rates and minimize attrition. They can help identify risks and benefits associated with the intervention and its evaluation.

Finally, collaboration that leads to a common conceptual framework for the evaluation study and intervention design is an essential and productive process that can identify appropriate intermediate and outcome measures and also reconcile underlying assumptions, beliefs, and values (Connell et al., 1995).

The Impact of Ethnicity and Culture

Ethnicity and culture consistently have significant impact on assessments of the reliability and validity of selected measures. Most measures are tested on populations that may not include large representation of minority cultural groups. These measures are then used in service settings in which minorities are often overrepresented, possibly as a result of economic disadvantage.

The issues of ethnicity and cultural competence influence all aspects of the research process and require careful consideration at various stages: the formation of hypotheses that take into account prior research indicating cultural or ethnic differences; careful sampling, with a sample large enough to permit detection of differential impact across ethnic groups; and strategies for data analysis that take into account ethnic differences and other measures of culture.

Improving cultural competence involves going beyond cultural sensitivity, which involves knowledge and concern, to a stance of advocacy for disenfranchised cultural groups. The absence of researchers who are knowledgeable about cultural practices in specific ethnic groups creates a need for greater exposure to diverse cultural practices in such areas as parenting and caregiving, child supervision, spousal relationships, and sexual behaviors. This approach requires greater interaction with representatives from diverse ethnic communities in the research process as well as agency services, to foster a cultural match with the participants (Williams and Becker, 1994).

In analyzing the effect of social context on parenting, caregiving behaviors, and intimate relationships, greater attention to culture and ethnicity is advisable. Evaluating the role of neighborhood and community factors (including measures such as social cohesion, ethnic diversity, and residents' perceptions of their neighborhood as a "good" or "bad" place to raise children) can provide insight into the impact of social context on behaviors that are traditionally viewed only in terms of individual psychology or family relationships.

Safety and Ethics

Safety concerns related to the evaluation of family violence interventions are complex and multifaceted. Research confidentiality can conflict with legal reporting requirements, and concern for the safety of victims must be paramount. Certificates of confidentiality are useful in this kind of research, but they simply shield the researcher from subpoena; they do not resolve the problems associated with reporting requirements and safety concerns. Basic agreements regarding safety procedures, disclosure responsibilities, and ethical guidelines need to be established among the clients, research team, service providers, and individuals from the reporting agency (e.g., child or adult protective services) to develop strategies and effective relationships.

Exit Issues

Ideally, the research team and the service agency will develop a long-term relationship, one that can be sustained, through graduate student involvement or small projects, between large formal research studies. Such informal collaborations can help researchers establish the publications and background record needed for large-scale funding. Dissemination of findings in local publications is also helpful to the agency. Formal and informal collaboration requires that all partners decide on authorship and the format of publication ahead of time. Multiple submissions for funding can attract resources to carry out some aspects of the project, even if the full program is not externally funded. Collaboration also requires thoughtful discussion, before an evaluation is launched, about what will be released in terms of negative findings and how they will be used to improve services.

One exit issue often not addressed is the termination of health or social services in the community after the research is completed. Innovative services are often difficult to develop in public service agencies if no independent source of funds is available to support the early stages of their development, when they must compete with more established service strategies. Models of reimbursement and subsidy plans are necessary to foster positive partnerships that can sustain services that seem to be useful to a community after the research evaluation has been completed.

Evaluating Comprehensive Community Initiatives

Family violence interventions often involve multiple services and the coordinated actions of multiple agencies in a community. The increasing prevalence of cross-problems, such as substance abuse and family violence or child abuse and domestic violence, has encouraged the use of comprehensive services to address multiple risk factors associated with a variety of social problems. This has prompted some analysts to argue that the community, rather than individual clients, is the proper unit of analysis in assessing the effectiveness of family violence interventions. This approach adds a complex new dimension to the evaluation process.

As programs become a more integrated part of the community, the challenges for evaluation become increasingly complex:

- Because participants receive numerous services, it is nearly impossible to determine which service, if any, contributed to improvement in their well-being.
- If the sequencing of program activities depends on the particular needs of participants, it is difficult to tease apart the effects of selectivity bias and program effects.
- As intervention activities increasingly involve organizations throughout the community, there is a growing chance that everyone in need will receive some form of service (reducing the chance of constituting an appropriate comparison group).
- As program activities saturate the community, it becomes necessary to view the community as the unit of analysis in the evaluation.
- The tremendous variation among individual communities and the diversity of organizational approaches impede analyses of the implementation stages of interventions.

An emphasis on community process factors (ones that facilitate or impede the adoption of comprehensive service systems), as opposed to program components, suggests that evaluation measures require a general taxonomy that can be adapted to particular local conditions (Kaftarian and Hansen, 1994).

Conventional notions of what constitutes a rigorous evaluation design are not easily adapted to meet these challenges. Hollister and Hill (1995), after a careful review of the technical requirements of conventional evaluation techniques, conclude that randomization is not feasible as a means of assessing the impact of comprehensive community initiatives; they also conclude that the alternatives to randomization are technically insufficient. Weiss (1995) reiterates these points, noting that community-based programs are particularly difficult to evaluate using conventional control groups because it is unlikely that a sufficient number of communities could be recruited and assigned to the experimental and control conditions. Those with an interest in arresting the spread of violence are likely to have already installed ameliorative efforts. If all communities willing to volunteer for such an experiment have initiated such efforts, the amount of programmatic difference between communities is likely to be too small to allow for the detection of effects. Using nonrandomly assigned comparison communities engenders similar problems and adds another, selectivity bias, that is particularly difficult to account for using conventional statistical procedures (Hollister and Hill, 1995).

Weiss (1995) proposes an alternative evaluation model based on clarifying the theories of change that explain how and why an intervention is supposed to work. The evaluation should start with the explicit and implicit assumptions underlying the theory guiding the intervention efforts; this theory is generally based on a series of small steps that involve assumptions about linkages to other activities or surrounding conditions. By creating a network of assumptions, it is possible to gather data to test whether the progression of actions leads to the intended end point. Examining the steps and progression through each phase also provides a better understanding of how interventions work and where problems arise.

Connell and Kubisch (1996) note that this perspective provides some basic principles to guide collaborative evaluations. First, the theory of change should draw on the available scientific information, and it should be judged plausible by all the stakeholders. Second, the theory of change should be doable: the activities defined in the theory must be ones that can actually be implemented. Third, the theory should be testable, which means that the specification of outcomes follows logically from the theory. A greater reliance on the use of measures or indicators, in turn, follows from the theory-based specification of outcomes.

This approach is consistent with the evaluation of single interventions. Theories of change are established by relevant stakeholders, important outcomes are selected on the basis of activities embodied in these theories, and logical expectations are established for declaring that the intervention has achieved its collective goals. What is different in the evaluation of community-based interventions is the standard of evidence. In the traditional evaluation, the standard focuses on meeting technical and logical criteria surrounding the validity of the causal claim. In the theory of change model, the logical consequences of programmatic activities and actions, as judged by the stakeholders, are the standard of evidence.
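The network-of-assumptions idea can be made concrete as an ordered chain in which each step pairs an assumption with a measurable indicator. The sketch below is hypothetical throughout; the program steps, indicators, thresholds, and data are invented for illustration.

    # A theory of change as an ordered chain of assumptions, each paired
    # with an indicator that collected data can confirm or break.
    from typing import Callable, NamedTuple

    class Step(NamedTuple):
        assumption: str
        indicator: str
        holds: Callable[[dict], bool]

    theory = [
        Step("outreach reaches isolated families", "families enrolled",
             lambda d: d["enrolled"] >= 200),
        Step("enrolled families attend sessions", "attendance rate",
             lambda d: d["attendance"] >= 0.6),
        Step("attendance improves parenting practices", "practice score gain",
             lambda d: d["practice_gain"] > 0),
        Step("improved practices reduce maltreatment", "change in report rate",
             lambda d: d["report_change"] < 0),
    ]

    collected = {"enrolled": 240, "attendance": 0.45,
                 "practice_gain": 0.3, "report_change": -0.1}

    for step in theory:
        status = "supported" if step.holds(collected) else "NOT SUPPORTED"
        print(f"{step.assumption} [{step.indicator}]: {status}")
    # A break early in the chain (here, attendance) shows where the
    # program, and not merely its end point, needs attention.

Tracing the chain in this way operationalizes the stakeholder-judged standard of evidence: each linkage, not only the final outcome, is open to confirmation or disconfirmation.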

A related approach is the use of empowerment evaluation to encourage self-assessment and program accountability in a variety of community-based interventions (Fetterman et al., 1996). Empowerment evaluations allow programs to take stock of their existing strengths and weaknesses, focus on key goals and program improvements, develop self-initiated strategies to achieve these goals, and determine the type of evidence that will document credible progress. Such evaluations represent opportunities to encourage collaboration between research and practice while providing interim data that can lead to program improvements.

The measurement issues for community-based studies are complex and involve the use of specialized research designs. If comparable archival records are available, approaches such as an interrupted-time-series design could be used to strengthen the caliber of these assessments. But community-to-community variation in record keeping diminishes hope that such designs could be employed, except as case studies.
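Where comparable archival records do exist, the interrupted-time-series logic is straightforward: fit the pre-intervention trend, project it forward, and ask whether post-intervention observations depart decisively from the projection. The sketch below uses invented monthly counts solely to illustrate the computation.

    # Interrupted time series on archival counts: compare post-intervention
    # observations with the projected pre-intervention trend.
    # The monthly report counts are invented for this illustration.
    pre = [52, 50, 53, 49, 51, 50]   # months before the intervention
    post = [46, 44, 45, 43, 42, 41]  # months after

    xs = range(len(pre))
    mean_x = sum(xs) / len(pre)
    mean_y = sum(pre) / len(pre)
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, pre))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x

    for i, observed in enumerate(post, start=len(pre)):
        projected = intercept + slope * i
        print(f"month {i}: observed {observed}, "
              f"projected {projected:.1f}, gap {observed - projected:+.1f}")
    # A sustained departure from the projected trend, rather than a
    # continuation of it, is the signature the design looks for.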

The evaluation challenges that emerge from large-scale community-based efforts are formidable. The approach detailed by Connell and Kubisch (1996) seems promising in that theories of interventions can help to establish lines of reasoning and lines of evidence that can be developed and probed; with careful thought, the effects of community-based interventions may be demonstrated. The emphasis on innovative methodological approaches, in which evaluation designs are built into the program from its very inception, requires the development of a research capacity that is flexible, creative, and able to integrate both quantitative and qualitative research findings. What is not yet known are the circumstances that are conducive to these synthesis efforts; the extent to which they can be successful in identifying relevant interim community process factors as well as child, family, and community outcomes in the prevention and treatment of family violence; and the relative effectiveness of such approaches when compared with traditional service delivery efforts.

Conclusion

Evaluation studies in the area of family violence are usually small in scale, likely to be underpowered, and subject to a long list of rival interpretations because of study designs that include threats to validity, such as the lack of appropriate control groups, small study samples, unreliable research measures that have not been tested across diverse social classes and ethnic and cultural groups, short follow-up periods, and inconsistencies in program content and service delivery. Limited evidence exists in this field about what works, for whom, and under what conditions. Furthermore, program development and service innovation have exceeded the capacity of the service system to conduct meaningful evaluation and research studies on existing programs, interventions, and strategies or to integrate such research into service delivery efforts. It is not clear whether this state of affairs is due to limited funding, short time horizons for studies that prevent the accumulation of a sufficient sample size, or the absence of the pre-evaluation research necessary to describe usual-care services and the nature of the intervention. This characterization is comparable to that of other fields of violence research.

At numerous points in the research and program development processes, there are opportunities for greater collaboration between service providers and researchers. Developing knowledge about what works, for whom, and under what conditions requires attention to four areas: (1) describing what is known about services that are currently available within the community, (2) documenting the theory of change that guides the service intervention, (3) describing the stages of implementation of the service program, and (4) describing the client referral, screening, and baseline assessment process.

The size and sensitivity of individual studies could be enhanced by greater attention to the referral and screening processes and the workloads of staff in service agencies. More useful evaluation information could be gathered by paying closer attention to the information needs of staff. Creative study designs that incorporate principles of self-determination and mutual understanding of the goals of research could be developed through collaboration among researchers, front-line workers, and program developers. Team-building efforts are needed to address the dynamics of collaboration and to foster greater opportunity for functional partnerships that build a common respect for research requirements and service information needs.

Understanding interventions from the perspective of those who develop or use them could go a long way toward illuminating the strengths and weaknesses of existing services. The development of measurement schemes that faithfully reflect the theories and processes underlying interventions is a promising area for greater collaboration. The use of community measures and the role of social context, including the impact of social class and culture, deserve further analysis in evaluating family violence interventions.