Improving Democracy Assistance: Building Knowledge through Evaluations and Research

2 Evaluation in USAID DG Programs: Current Practices and Problems

INTRODUCTION

To make decisions about the best ways to assist the spread of democracy and governance (DG), the U.S. Agency for International Development (USAID) must address at least two broad questions:

Where to intervene. In what countries and in what sectors within countries? Selecting the targets for DG programming requires a theory, or at least a hypothesis, about the relationships among different institutions and processes and how they contribute to shaping overall trajectories toward democracy and governance. It also requires strategic assessment, that is, the ability to identify the current quality of democratic institutions and processes in various countries and to set reasonable goals for their future development.

How to intervene. Which DG projects will work best in a given country under current conditions? Learning how well various projects work in specific conditions requires well-designed impact evaluations that can determine how much specific activities contribute to desired outcomes in those conditions.

The two questions are clearly connected. To decide where to intervene (Question 1), one wants to know which interventions can work (Question 2) in the conditions facing particular countries. Indeed, in the current state of scientific knowledge, answers to Question 2 may provide the most helpful guidance to answering Question 1.
This chapter therefore focuses on USAID’s policies and practices for monitoring and evaluation (M&E) of its DG projects. To provide context, we begin with a brief description of the current state of evaluations of development assistance programs in general. Then existing USAID assessment, monitoring, and evaluation practices for DG programs are described. Since such programs are called into existence and bounded by U.S. laws and policies, the key laws and policies that shape current USAID DG assessment and evaluation practices are examined, to lay the foundation for the changes recommended later in the report. The chapter concludes with a discussion of three key problems that USAID encounters in its efforts to decide where and how to intervene.

CURRENT EVALUATION PRACTICES IN DEVELOPMENT ASSISTANCE: GENERAL OBSERVATIONS

As Chapter 5 discusses later in detail, there is a widely recognized set of practices for how to make sound and credible determinations of how well specific programs have worked in a particular place and time (see, e.g., Shadish et al 2001, Wholey et al 2004). The goal of these practices is to determine not merely what happened following a given assistance program but how much what happened differs from what would be observed in the absence of that program. The final phrase is critical, because many factors other than the given policy intervention—including ongoing long-term trends and influences from other sources—are generally involved in shaping observed outcomes. Without attention to these other factors and some attempt to account for their impact, it is easy to be misled regarding how much an aid program really is contributing to an observed outcome, whether positive or negative.
The practices used to make this determination generally have three parts: (1) collection of baseline data before a program begins, to determine the starting point of the individuals, groups, or communities who will be receiving assistance; (2) collection of data on the relevant desired outcome indicators, to determine conditions after the program has begun or operated for a certain time; and (3) collection of these same “before and after” data for a comparison set of appropriately selected or assigned individuals, groups, or communities that will not receive assistance, to estimate what would have happened in the absence of such aid.1

1 The ideal comparison group is achieved by random assignment, and if full randomization is achieved, a “before” measurement may not be required, as randomization effectively sets the control and intervention groups at the same starting point. However, both because randomization is often not achievable, requiring the use of matched or baseline-adjusted comparison groups, and because baseline data collection itself often yields valuable information about the conditions that policymakers desire to change, we generally keep to the three-part model of sound evaluation design.
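The counterfactual logic behind this three-part design can be shown with a small difference-in-differences computation. This is an illustrative sketch only; the group names and scores are invented, not drawn from any USAID data.

```python
# Invented baseline and endline scores for an assisted group and a
# comparison group, following the three-part design described above.
baseline_treated = [52, 48, 55, 50]
endline_treated = [60, 58, 63, 59]
baseline_control = [51, 49, 54, 50]
endline_control = [53, 51, 56, 52]

def mean(xs):
    return sum(xs) / len(xs)

# Naive before/after change for the assisted group; this conflates the
# program's effect with long-term trends and other influences.
naive_change = mean(endline_treated) - mean(baseline_treated)

# Difference-in-differences: subtract the change seen in the comparison
# group, which estimates what would have happened without the program.
did = naive_change - (mean(endline_control) - mean(baseline_control))

print(naive_change)  # 8.75
print(did)           # 6.75
```

The gap between the two numbers is the point of the design: part of the assisted group's improvement would have occurred anyway, and only the comparison group reveals how much.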
Wide recognition of these practices for determining project impacts does not mean that they are widely or consistently applied, however. Nor does it mean that policy professionals or evaluation specialists agree that the three elements are feasible or appropriate in all circumstances, especially for highly diverse and politically sensitive programs such as democracy assistance or other social programs. Thus, while some areas of development assistance, such as public health, have a long history of using impact evaluation designs to assess whether policy interventions have their intended impact, social programs are generally much less likely to employ such methods.

In 2006 the Center for Global Development (CGD), a think tank devoted to improving the effectiveness of foreign assistance in reducing global poverty and inequality, released the report of an “Evaluation Gap Working Group” convened to focus on the problem of improving evaluations in development projects. Their report concludes:

Successful programs to improve health, literacy and learning, and household economic conditions are an essential part of global progress. Yet … it is deeply disappointing to recognize that we know relatively little about the net impact of most of these social programs…. [This is because] governments, official donors, and other funders do not demand or produce enough impact evaluations and because those that are conducted are often methodologically flawed. Too few impact evaluations are being carried out. Documentation shows that UN agencies, multilateral development banks, and developing country governments spend substantial sums on evaluations that are useful for monitoring and operational assessments, but do not put sufficient resources into the kinds of studies needed to judge which interventions work under given conditions, what difference they make, and at what cost.
(Savedoff et al 2006:1-2)

Although not a focus for the CGD analysis, democracy assistance reflects this general weakness. As a recent survey of evaluations in democracy programming noted: “Lagging behind our programming, however, is research focusing on the impact of our assistance, knowledge of what types of programming is (most) effective, and how programming design and effectiveness vary with differing conditions” (Green and Kohl 2007:152). The Canadian House of Commons recently investigated Canada’s DG programs and came to similar conclusions:

[W]eaknesses … have been identified in evaluating the effectiveness of Canada’s existing democracy assistance funding…. Canada should invest more in practical knowledge generation and research on effective democratic development assistance. (House of Commons 2007)

As discussed in more detail below, there are many reasons why DG projects—and social development programs more generally—are not routinely subject to the highest standards of impact evaluation. One reason is that “evaluation” is a broad concept, of which impact evaluations are but one type (see, e.g., World Bank 2004). On more than one occasion committee members found themselves talking past USAID staff and implementers because they lacked a shared vocabulary and understanding of what was meant by “evaluation.”

Diverse Types of Evaluations

Because the term “evaluation” is used so broadly, it may be useful to review the various types of evaluations that may be undertaken to review aid projects. The type of evaluation most commonly called for in current USAID procedures is the process evaluation. In these evaluations investigators are chosen after the project has been implemented and spend several weeks visiting the program site to study how the project was implemented, how people reacted, and what outcomes can be observed. Such an evaluation often provides vital information to DG missions, such as whether there were problems with carrying out program plans due to unexpected obstacles, “spoilers,” unanticipated events, or other actors who became involved. Process evaluations are the primary source of “lessons learned” and “best practices” intended to inform and assist project managers and implementers. They may reveal factors about the context that were not originally taken into account but that turned out to be vital for program success. Process evaluations focus on “how” and “why” a program unfolded in a particular fashion, and if there were problems, why things did not go as originally planned. However, such evaluations have a difficult time determining precisely how much any observed changes in key outcomes can be attributed to a foreign assistance project.
This is because they often are unable to re-create appropriate baseline data if such data were not gathered before the program started and because they generally do not collect data on appropriate comparison groups, focusing instead on how a given DG project was carried out for its intended participants.

A second type of evaluation is participatory evaluation. In these evaluations the individuals, groups, or communities who will receive assistance are involved in the development of project goals, and investigators interview or survey participants after a project is carried out to determine how valuable the activity was to them and whether they were satisfied with the project’s results. Participatory evaluation is an increasingly important part of both process and impact evaluations. In regard to all evaluations, aid agencies have come to recognize that input from participants is vital in defining project goals and understanding what constitutes success for activities that are intended to affect them. This focus on building relationships and engaging people as a project goal means this type of evaluation may also be considered part of regular project activity and not just a tool to assess its effects.

Using participatory evaluations to determine how much a DG activity contributed to democratic progress, or even to more modest and specific goals such as reducing corruption or increasing legislative competence, can pose problems. Participants’ views of a project’s value may rest on their individual perceptions of personal rewards. This may bias their perception of how much the program has actually changed conditions, as they may be inclined to overestimate the impact of an activity if they benefited from it personally and hope to have it repeated or extended. Thus participatory evaluations should be combined with collection of data on additional indicators of project outcomes to provide a full understanding of project impacts.

Another type of evaluation is the output evaluation (generally equivalent to “project monitoring” within USAID). These evaluations consist of efforts to document the degree to which a program has achieved certain targets in its activities. Targets may include spending specific sums on various activities, giving financial support or training to a certain number of nongovernmental organizations (NGOs) or media outlets, training a certain number of judges or legislators, or carrying out activities involving a certain number of villagers or citizens. Output evaluations, or monitoring, are important for ensuring that activities are carried out as planned and that money is spent for the intended purposes. USAID thus currently spends a great deal of effort on such monitoring, and under the new “F Process,” missions report large numbers of output measures to USAID headquarters (more on this below).
Finally, impact evaluation is the term generally used for those evaluations that aim to establish, with maximum credibility, the effects of policy interventions relative to what would be observed in the absence of such interventions. These require the three parts noted above: collection of baseline data; collection of appropriate outcome data; and collection of the same data for comparable individuals, groups, or communities that, whether by assignment or for other reasons, did and did not receive the intervention. The most credible and accurate form of impact evaluation uses randomized assignment to create a comparison group; where feasible, this is the best procedure for gaining knowledge about the effects of assistance projects. However, a number of additional designs for impact evaluations exist, and while they offer somewhat less confidence in inferences about program effects than randomized designs, they have the virtue of being applicable in conditions when randomization cannot be applied (e.g., when aid goes to a single group or institution or to a small number of units where the donor has little or no control over selecting who will receive assistance).

Impact evaluations pose design challenges, requiring skill and not merely science to identify and collect data from an appropriate comparison group and to match the best possible design to the conditions of the particular assistance program. The need for baseline data on both the group receiving the policy intervention and the comparison group usually means that the evaluation procedures must be designed before the project begins and carried out as the project itself is implemented. Finally, the need to collect baseline data and comparison group data may increase the costs of evaluation. For these reasons, among others, impact evaluations of DG programs are at present the most rarely carried out of the various kinds of evaluations described here.

Indeed, many individuals throughout the community of democracy assistance donors and scholars have doubts about the feasibility and utility of conducting rigorous impact evaluations of DG projects. Within the committee, Larry Garber has strongly expressed concerns in this regard, and the committee as a whole has given a great deal of attention to these worries. However, as discussed in Chapters 6 and 7, there are a number of practical ways to deal with these issues, and these were explored in the field by the committee’s consultants in partnership with several missions. In addition, a good evaluation design is not necessarily more expensive or time-consuming than routine monitoring or a detailed process evaluation.

The differences among these distinct kinds of evaluations are often obscured by the way in which the term “evaluation” is used in DG and foreign assistance discussions.
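The point made earlier about randomized assignment, that it equalizes group characteristics on average and so makes later differences attributable to the program, can be illustrated with a toy simulation. All numbers here are invented for illustration and carry no empirical content.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

# Invented pool of 1,000 communities with a pre-existing characteristic
# (say, a civic-engagement score centered on 50) from one distribution.
pool = [random.gauss(50, 10) for _ in range(1000)]

# Random assignment: shuffle the pool, then split it in half.
random.shuffle(pool)
treatment, control = pool[:500], pool[500:]

def mean(xs):
    return sum(xs) / len(xs)

# On average the two groups look alike even though no one deliberately
# matched them, so a later gap in outcomes is evidence about the program
# rather than about who happened to be selected.
gap = abs(mean(treatment) - mean(control))
print(round(mean(treatment), 1), round(mean(control), 1), round(gap, 2))
```

With samples this large the pre-program gap between the groups is a small fraction of a point; with the handful of units typical of many DG projects, randomization gives much weaker balance, which is one reason the alternative comparison-group designs mentioned above matter.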
“Evaluation” is often used to imply any estimate or appraisal of the effects of donor activities, ranging from detailed counts of participants in specific programs to efforts to model the aggregate impact of all DG activities in a country on that country’s overall level of democracy. This catch-all use of the term “evaluation” undermines consideration of whether there is a proper balance among various kinds of evaluations, how various types of evaluations are being used, and whether specific types of evaluations are being done or are needed. As another CGD report notes:

Part of the difficulty in debating the evaluation function in donor institutions is that a number of different tasks are implicitly simultaneously assigned to evaluation: building knowledge on processes and situations in receiving countries, promoting and monitoring quality, informing judgment on performance, and, increasingly, measuring actual impacts. Agencies still need their own evaluation teams, as important knowledge providers from their own perspective and as contributors to quality management. But these teams provide little insight into our actual impacts and, although crucial, their contribution to knowledge essentially focuses on a better understanding of operational constraints and local institutional and social contexts. All these dimensions of evaluations are complementary. For effectiveness and efficiency reasons, they should be carefully identified and organized separately: some need to be conducted in house, some outside in a cooperative, peer review, or independent manner. In short, evaluation units are supposed to kill all these birds with one stone, while all of them deserve specific approaches and methods. (Jacquet 2006)

Efforts to Improve Assessments and Evaluations by Donor Agencies

There are encouraging signs of efforts to put greater emphasis on impact evaluations for improving democracy and governance programs. The basic questions motivating USAID’s Strategic and Operational Research Agenda (SORA) project are also motivating other international assistance agencies and organizations. The desire to understand “what works and what doesn’t and why” in an effort to make more effective policy decisions and to be more accountable to taxpayers and stakeholders has led a host of agencies to consider new ways to determine the effects of foreign assistance projects. This focus on impact evaluations in particular has increased since the creation of the Millennium Challenge Corporation (MCC) and the 2005 Paris Declaration on Aid Effectiveness. Yet while there is wide agreement that donors need more knowledge of the effects of their assistance projects, and there are increased efforts to coordinate and harmonize the approaches and criteria employed in pursuit of that knowledge, donors are far from consensus on how best to answer the fundamental questions at issue.
As the Organization for Economic Cooperation and Development (OECD) has stated:

There is strong interest among donors, NGOs and research institutions in deepening understanding of the political and institutional factors that shape development outcomes. All donors are feeling their way on how to proceed. (OECD 2005:1)

Several donors have focused on the first question posed above, the question of where to intervene in the process of democratization to help further that process. In the committee’s view this is a question that the current state of knowledge on democratic development cannot answer. It is an essential question, however, and Chapters 3 and 4 suggest specific research programs that might help bring us closer to answers. These issues are more a matter of strategic assessment of a country’s condition and potential for democratic development than of evaluation, a term the committee thinks is better reserved for studying the effects of specific DG programs. Nonetheless, several national development assistance agencies have, under the general rubric of improving evaluation, sought to improve their strategic assessment tools. What all of the following donor programs have in common is an increased effort at acquiring and disseminating knowledge about how development aid works in varied contexts. The broad range of current efforts to revise and improve evaluation procedures undertaken by national and international assistance agencies described below is aimed at better understanding the fundamental questions of interest to all: “what works and what doesn’t and why,” although at present only some involve the use of impact evaluations.

Perhaps the most visible leader in efforts to increase the use of impact evaluations is MCC, which has set a high standard for the integration of impact evaluation principles into the design of programs at the earliest stages and for the effective use of baseline data and control groups:

There are several methods for conducting impact evaluations, with the use of random assignment to create treatment and control groups producing the most rigorous results. Using random assignment, the control group will have—on average—the same characteristics as the treatment group. Thus, the only difference between the two groups is the program, which allows evaluators to measure program impact and attribute the results to the MCC program. For this reason, random assignment is a preferred impact evaluation methodology. Because random assignment is not always feasible, MCC may also use other methods that try to estimate results using a credible comparison group, such as double difference, regression discontinuity, propensity score matching, or other type of regression analysis.
(MCC 2007:19)

The World Bank has also embarked on the use of impact evaluations for aid programs through its Development Impact Evaluation (DIME) project. Many of the DIME studies involve randomized-experimental evaluations; moreover, “rather than drawing policy conclusions from one-time experiments, DIME evaluates portfolios of similar programs in multiple countries to allow more robust assessments of what works” (Banerjee 2007:30).2

A major symposium on economic development aid also recently explored the pros and cons of conducting impact evaluations of specific programs (Banerjee 2007). While there were numerous objections to the unrestrained use of such methods (explored in more detail in Chapters 6 and 7 below), many eminent contributors urged that foreign aid cannot become more effective if we are unwilling to subject our assumptions about how well various assistance programs work to credible tests. The lead author argued that ignorance of general principles to guide successful economic development (a situation that applies as much or more to our knowledge of democratization) is a powerful reason to take the more humble step of simply trying to determine which aid projects in fact work best in attaining their specific goals.

2 The CGD has also created the International Initiative for Impact Evaluation to encourage greater use of this method. See http://www.cgdev.org/section/initiatives/_active/evalgap/calltoaction.

The Department for International Development (DfID) of the United Kingdom has developed the “Drivers of Change” approach because “donors are good at identifying what needs to be done to improve the lives of the poor in developing countries. But they are not always clear about how to make this happen most effectively” (DfID 2004:1). By focusing on the incorporation of “underlying political systems and the mechanics of pro-poor change … in particular the role of institutions—both formal and informal” into their analysis, this approach attempts to uncover more clearly what fosters change and reduces poverty. This approach is currently being widely applied to multiple development contexts and is being taught to numerous DfID country offices (OECD 2005:1).

Multipronged approaches to evaluation are being employed by the German Agency for Technical Cooperation (Deutsche Gesellschaft für Technische Zusammenarbeit, GTZ). The range of instruments currently being employed is based on elements of self-evaluation as well as independent and external evaluations. Evaluations aim to address questions of relevance, effectiveness, impact, efficiency, and sustainability.3 These questions are addressed throughout the project’s life span as a means of better understanding the links between inputs and outcomes.
Commitment by the GTZ to evaluations is demonstrated by the agency’s increased spending on these activities, spending “roughly 1.2 percent of its public benefit turnover on worldwide evaluations—some EUR 9 million a year” (Schmid 2007).

3 For further information, see “Working on Sustainable Results: Evaluation at GTZ.” Available at: http://www.gtz.de/en/leistungsangebote/6332.htm. Accessed on September 12, 2007.

The Swedish Agency for International Development Cooperation (SIDA) is also actively considering ways to improve its evaluation tools. Since 2005, SIDA has shifted from post-hoc project evaluations to a focus on underlying assumptions and theories; specifically, SIDA is currently conducting a project that “looks at the program theory of a number of different projects in the area. This evaluation focuses on the theoretical constructs that underpin these projects and tries to discern patterns of ideas and assumptions that recur across projects and contexts.”4 Building on these initial efforts, SIDA hopes to combine the results of this study with others to “make an overall assessment of the field.”

The Norwegian Agency for Development Cooperation (NORAD) has also initiated a new strategy for evaluating the effectiveness of its programs in the area of development assistance. The intent of this new strategy, undertaken in 2006, is to “help Norwegian aid administrators learn from experience by systematizing knowledge, whether it is developed by (themselves), in conjunction with others, or entirely by others. Additionally, the evaluation work has a control function to assess the quality of the development cooperation and determine whether resources applied are commensurate with results achieved.”5 Additional attention is being paid to communicating the results of such evaluations with other agencies and stakeholders; this emphasis on communicating results is widely shared in the donor community.

The Danish Ministry of Foreign Affairs has embarked on an extensive study of both its own and multilateral agencies’ evaluations of development and democracy assistance (Danish Ministry of Foreign Affairs 2005). It has found that evaluations vary greatly in method and value, with many evaluations failing to provide unambiguous determinations of program results. In regard to the United Nations Development Program’s central evaluation office, “its potential for helping strengthen accountability and performance assessment is being underexploited, both for the purpose of accountability and as an essential basis for learning” (Danish Ministry of Foreign Affairs 2005:4).
Finally, the Canadian International Development Agency (CIDA) has been involved in recent efforts to improve evaluation and learning from collective experiences at international assistance in the area of democracy and governance:

In April 1996, as part of its commitment to becoming more results-oriented, CIDA’s President issued the “Results-Based Management in CIDA—Policy Statement.” This statement consolidated the agency’s experience in implementing Results-Based Management (RBM) and established some of the key terms, basic concepts and implementation principles. It has since served as the basis for the development of a variety of management tools, frameworks, and training programs. The Agency Accountability Framework, approved in July 1998, is another key component of the results-based management approach practiced in CIDA. (CIDA 2007)

4 For more information on this project, see SIDA, “Sida’s Work with Democracy and Human Rights.” Available at: http://www.sida.se/sida/jsp/sida.jsp?d=1509&a=32056&language=en_US. Accessed on September 12, 2007.

5 For more information, see NORAD’s Web site: http://www.norad.no/default.asp?V_ITEM_ID=5704. The new strategy discussed here can be found at http://www.norad.no/items/5704/38/7418198779/EvaluationPolicy2006-2010.pdf. Accessed on September 12, 2007.

The CIDA report makes an important distinction, however: “The framework articulates CIDA’s accountabilities in terms of developmental results and operational results at the overall agency level, as well as for its various development initiatives. This distinction is crucial … since the former is defined in terms of actual changes achieved in human development through CIDA’s development initiatives, while the latter represents the administration and management of allocated resources (organisational, human, intellectual, physical/material, etc.) aimed at achieving development results.”

In short, there is growing agreement—across think tanks, blue-ribbon panels, donor agencies, and foreign ministries—that current evaluation practices in the area of foreign assistance in general, and of democracy assistance in particular, are inadequate to guide policy and that substantial efforts are needed to improve the knowledge base for policy planning. Thus, USAID is not alone in struggling with these issues.

CURRENT POLICY AND LEGAL FRAMEWORK FOR USAID DG ASSESSMENTS AND EVALUATIONS

Current DG policies regarding project assessment and evaluation are shaped in large part by broader USAID and U.S. government policies and regulations. Official USAID policies and procedures are set forth in the Automated Directives System (ADS) on its Web site; Series 200 on “Programming Policy” covers monitoring and evaluation in Section 203 on “Assessing and Learning” (USAID ADS 2007). Of particular importance for this report, in 1995 the USAID leadership decided to eliminate the requirement of a formal evaluation for every major project; instead evaluations would be “driven primarily by management need” (Clapp-Wincek and Blue 2001:1).
The prior practice of conducting mainly post-hoc evaluations (which were almost entirely process evaluations), often done by teams of consultants brought in specifically for the task, was seen as too expensive and time consuming to be applied to every project. As a result of the change, the number of evaluations for all types of USAID assistance, not just DG, has declined, and the approach to evaluation has evolved over time (Clapp-Wincek and Blue 2001). The ADS section “When Is an Evaluation Appropriate?” lists a number of situations that should require an evaluation:

A key management decision is required, and there is inadequate information;

Performance information indicates an unexpected result (positive or negative) …
…
Improving Democracy Assistance: Building Knowledge through Evaluations and Research Such survey questions make excellent baseline indictors on outcome measures for many DG assistance projects. USAID could then survey assisted and nonassisted groups on the same questions a year later to help determine the impact of DG assistance. This is an example where USAID can make use of extant surveys that already provide baseline data on a variety of relevant outcome measures. A more centralized set of indicators was developed as part of the F Process. As mentioned above, the Foreign Assistance Performance Indicators are intended to measure “both what is being accomplished with U.S. foreign assistance funds and the collective impact of foreign and host-government efforts to advance country development” (U.S. Department of State 2006). Indicators are divided into three levels: (1) the Objective level, which are usually country-level outcomes, as collected by other agencies such as the World Bank, United Nations Development Program, and Freedom House; (2) the Area level, measuring performance of subsectors such as “governing justly and democratically,” which captures most of the objectives pursued by the DG office; and (3) the Element level, which seeks to measure outcomes that are directly attributable to USAID programs, projects, and activities, using data collected primarily by USAID partners in the field (U.S. Department of State 2006). Clearly, USAID has taken the task of performance-based policymaking seriously. The central DG office, the various missions throughout the world, and the implementers who support USAID’s work in the field are all acutely aware of the importance of measurement and the various obstacles encountered. The concerns the committee heard were often not that USAID lacks the right measures to track the outcomes of its programs. 
Although this can be a major problem for some areas of DG, the committee also saw evidence that USAID field missions and implementers have, and seek to use, appropriate measures for program outcomes. Rather, the problem is that the demands to supply detailed data on basic output measures or to show progress on more general national-level measures overwhelm or sidetrack efforts that might go into collecting data on the substantive outcomes of projects.

Matching Tasks with Appropriate Measurement Tools

Broadly speaking, USAID is concerned with three measurement-related tasks: (1) project monitoring, (2) project evaluation, and (3) country assessment. The first concerns routine oversight (e.g., whether funds are being properly allocated and implementers are adhering to the terms of a contract). The second concerns whether the program is having its intended effect on society. The third concerns whether a given country
is progressing or regressing in a particular policy area with regard to democratization (USAID 2000). Corresponding to these different tasks are three basic types of indicators: outputs, outcomes, and meso- and macro-level indicators. Output measures track the specific activities of a project, such as the number of individuals trained or the organizations receiving assistance. Outcome measures track policy-relevant factors that are expected to flow from a particular project (e.g., a reduction in corruption in a specific agency, an increase in the autonomy and effectiveness of specific courts, an improvement in the fairness and accuracy of election vote counts). Meso- and macro-level measures are constructed to assess country-level features of specific policy areas and are often at levels of abstraction that are particularly difficult to determine with any exactness. Examples include “judicial autonomy,” “quality of elections,” “strength of civil society,” and “degree of political liberties.” For purposes of clarification, these concepts are included, along with an illustrative example, in Table 2-1.

TABLE 2-1 Measurement Tools and Their Uses

1. Output
   Definition: indicator focused on counting activities or immediate results of a program
   Level: generally subnational
   Example (improving elections): number of polling stations with election observers
   Objective: monitoring

2. Outcome
   Definition: indicator focused on policy-relevant impacts of a program
   Level: generally subnational
   Example (improving elections): reduction in irregularities at the polls (bribing, intimidation)
   Objective: evaluation

3. Meso-level indicator
   Definition: indicator focused on broad national characteristics of a policy area or sector
   Level: national
   Example (improving elections): quality of elections
   Objective: assessment

4. Macro-level indicator
   Definition: indicator focused on national levels of democracy
   Level: national
   Example (improving elections): level of democracy (e.g., Freedom House Index of Political Rights)
   Objective: assessment

As noted, USAID has made extensive efforts to identify indicators at all levels and across a wide range of sectors of democratic institutions. Nonetheless, in practice a mismatch often arises between the chosen measurement tools and the tasks these tools are expected to perform. Two problems, in particular, stand out. First, based on the committee’s discussions with USAID staff and implementers and further discussions and reviews of project documents during the three field visits described
in Chapter 7, there is continuing concern that the effectiveness of specific USAID DG projects is being judged on the basis of meso- or macro-level indicators, such as the overall quality of elections or even changes in national-level indicators of democracy, which individual projects cannot plausibly be expected to move. Second, current practices appear to lead to overinvestment in generating and collecting basic output measures, as opposed to policy-relevant indicators of project results. The F Process indicators reflect both of these problems, although they had little impact on day-to-day project implementation during the course of this study. As noted above, these mandate collecting data at the “Objective” and “Area” levels, which correspond to macro- and meso-level indicators in the table, and at the “Element” level, which corresponds mostly to the output level. Data at the outcome level, which seems crucial to evaluating how well specific projects actually achieve their immediate goals, thus suffer relative neglect. USAID mission staff and program implementers complained that the success of their projects was being judged (in part) on the basis of macro-level indicators that bore little or no plausible connection to the projects they were running, given the limited funds expended and the macro nature of the indicator. The most common example given was the use of changes in the Freedom House Political Rights or Civil Liberties Index as evidence of the effectiveness or ineffectiveness of their projects, even though these national-level indices were often quite evidently beyond their control to affect. One implementer commented that his group had benefited from an apparent perception that his project had contributed to improvements in the country’s Freedom House scores over the past several years.
While this worked in his firm’s favor, he made clear that the connection was purely coincidental; he was also concerned that if the government policies that currently helped his work changed and made his work more difficult, this would be taken as evidence that his project had “failed.” This is a poor way to measure project effectiveness. To use the example in Table 2-1, although USAID may contribute to better elections or even more democracy in a nation as a whole, there are always multiple forces, and often multiple donors, at work pursuing these broad goals. USAID may be very successful in helping a country train and deploy election monitors and thus reduce irregularities at the polling stations. But if the national leaders have already excluded viable opposition candidates from running, or deprived them of media access, the resulting flawed elections should not mean that USAID’s specific election project was not effective. As a senior USAID official with extensive experience in many areas of foreign assistance has written regarding this problem:

To what degree should a specific democracy project, or even an entire USAID democracy and governance programme, be expected to have an
independent, measurable impact on the overall democratic development in a country? Th[at] sets a high and perhaps unreasonable standard of success. Decades ago, USAID stopped measuring the success of its economic development programmes against changes in the recipient countries’ gross domestic product (GDP). Rather, we look for middle-level indicators: we measure our anti-malaria programmes in the health sector against changes in malaria statistics, our support for legume research against changes in agricultural productivity. What seems to be lacking in democracy and governance programmes, as opposed to these areas of development, is a set of middle-level indicators that have two characteristics: (a) we can agree that they are linked to important characteristics of democracy; and (b) we can plausibly attribute a change in those indicators to a USAID democracy and governance programme. It seems clear that we need to develop a methodology that is able to detect a reasonable, plausible relationship between particular democracy activities and processes of democratic change. (Sarles 2007:52)

The appropriate standard for evaluating the effectiveness of specific DG projects and even broader programs is how much of the targeted improvement in behavior and institutions can be observed compared to conditions in groups not supported by such projects or programs. It is in identifying how much difference specific programs or projects made, relative to the investment in such programs, that USAID can learn what works best in given conditions. Of course, it is hoped that such projects do contribute to broader processes of democracy building.
But these broader processes are subject to so many varied forces—from strategic interventions to ongoing conflicts to other donors’ actions and the efforts of various groups in the country to obtain or hold on to power—that macro-level indicators are a misleading guide to whether USAID projects are in fact having an impact. USAID efforts in such areas as strengthening election commissions, building independent media, or supporting opposition political parties may be successful at the project level but only become of vital importance to changing overall levels of democracy much later, when other factors internal to the country’s political processes open opportunities for political change (McFaul 2006). Learning “what works” requires that USAID focus its efforts to gather and analyze data on outcomes at the appropriate level for evaluating specific projects—what is labeled “outcome” measures in Table 2-1. The committee wants to stress that there are good reasons for employing meso- and macro-level indicators of democracy and working to improve them. They are important tools for strategic assessment of a country’s current condition and long-term trajectory regarding democratization. But these indicators are usually not good tools for project evaluation. For the latter purpose, what is needed, as Sarles noted, are
measures that are both policy relevant and plausibly linked to a specific policy intervention sponsored by USAID. The committee discusses these policy-relevant outcome measures and provides examples from our field visits in Chapter 7.

If one concern regarding USAID’s evaluation processes is that they may rely too much on meso- and macro-measures to judge program success, the committee also found a related concern regarding USAID’s data collection for M&E: USAID spends by far the bulk of its M&E effort on data at the “output” level, the first category in Table 2-1.

Current M&E Practices and the Balance Among Types of Evaluations

In the current guidelines for USAID’s M&E activities given earlier, only monitoring is presented as “an ongoing, routine effort requiring data gathering, analysis, and reporting on results at periodic intervals.” Evaluation, by contrast, is presented as an “occasional” activity to be undertaken “only when needed.” The study undertaken for SORA by Bollen et al. (2005) that is discussed in Chapter 1 found that most USAID evaluations were process evaluations. These can provide valuable information and insights but, as already discussed, do not help assess whether a project had its intended impact. Although we cannot claim to have made an exhaustive search, the committee asked repeatedly for examples of impact evaluations for DG projects and learned about very few. One example was a well-designed impact evaluation of a project to support CSOs in Mali (Management Systems International 2000).
Here the implementers had persuaded USAID to make use of annual surveys being done in the country and to use those surveys to measure changes in attitudes toward democracy in three types of areas: areas that received the program, nearby areas that did not receive it (to check for spillover effects), and areas distant from the sites of USAID activity. The results of this evaluation suggested that USAID programs were not having as much impact as the implementers and USAID had hoped to see. The response within USAID was informative. Some USAID staff members were concerned that a great deal of money had been spent to find little impact; complaints were thus made that the evaluation design had not followed changes made while the program was in progress or was not designed to be sensitive to the specific changes USAID was seeking. On the other hand, there were also questions about whether annual surveys were too frequent or too early to capture the results of investments that were likely to pay off only in the longer term. And the project, by funding hundreds of small CSOs, might have suffered from its own design flaws; some of those who took part in the project suggested that fewer
and larger investments in a select set of CSOs might have had a greater impact. All of these explanations might have been explored further as a way to understand when and how impact evaluations work best. But from the committee’s conversations, the primary “lessons” taken away by some personnel at USAID were that such rigorous impact evaluations did not work or were not worth the time, effort, and money, given what they expected to get from them. While certainly only a limited number of projects should be subject to full evaluations, proper impact evaluations cannot be carried out unless “ongoing and routine efforts” to gather appropriate data on policy-relevant outcomes before, during, and after the project are designed into an M&E plan from the inception of the project. Current guidelines for M&E activity tend to hinder making choices between impact and process evaluations and in particular make it very difficult to plan the former. Chapter 7 discusses, based on the committee’s field visits to USAID DG missions, the potential for improving, in some cases, USAID M&E activities simply by focusing more effort on obtaining data at the policy outcome level.

Using Evaluations Wisely: USAID as a Learning Organization

Even if USAID were to complete a series of rigorous evaluations with ideal data and obtain valuable conclusions regarding the effectiveness of its projects, these results would be of negligible value if they were not disseminated through the organization in a way that led to substantial learning and were not used as inputs to the planning and implementation of future DG projects. Unfortunately, much of USAID’s former learning capacity has been reduced by recent changes in agency practice. A longstanding problem is that much project evaluation material is simply maintained in mission archives or lost altogether (Clapp-Wincek and Blue 2001).
For example, the committee found that when project evaluations involved surveys, the results might be filed in formal evaluation reports, but the underlying raw data were discarded or kept by the survey firm after the evaluation was completed. While many case studies of past projects, as well as many formal evaluations, are supposed to be available to all USAID staff online, not all evaluations were easy to locate. Moreover, simply posting evaluations online does not facilitate discussion, absorption, and use of lessons learned. Without a central evaluation office to identify key findings and organize conferences or meetings of DG officers to discuss those findings, the information is effectively lost. As mentioned above, CDIE is no longer active. USAID also formerly held conferences of DG officers to discuss not only CDIE-sponsored evaluations but also research and reports on DG assistance undertaken by
NGOs, academics, and other donors. These activities appear to have significantly atrophied. The committee is concerned about the loss of these learning activities. Even the best evaluations will not be used wisely if their lessons are not actively discussed and disseminated in USAID and placed in the context of lessons learned from other sources, including research on DG assistance from outside the agency and the experience of DG officers themselves. The committee discusses the means to help USAID become a more effective learning organization in Chapters 8 and 9.

CONCLUSIONS

This review of current evaluation practices regarding development assistance in general and USAID’s DG programs in particular leads the committee to a number of findings:

- The use of impact evaluations to determine the effects of many parts of foreign assistance, including DG, has been historically weak across the development community.
- Within USAID the evaluations most commonly undertaken for DG programs are process and participatory evaluations; impact evaluations are a comparatively underutilized element in the current mix of M&E activities.
- Some donors and international agencies are beginning to implement more impact evaluations. Nonetheless, considerable concerns and skepticism remain regarding the feasibility and appropriateness of applying impact evaluations to DG projects. These need to be taken seriously and addressed in any effort to introduce them to USAID.
- Current practices regarding measurement and data collection show a tendency to emphasize collection of output measures rather than policy-relevant outcome measures as the core of M&E activities. There is also a tendency, in part because of the lack of good meso-level indicators, to judge the success of DG programs by changes in macro-level measures of a country’s overall level of democracy, rather than by achievement of outcomes more relevant to a project’s plausible impacts.
- Much useful information aside from evaluations, such as survey data and reports, detailed spending breakdowns, and mission director and DG staff reports, remains dispersed and difficult to access.
- USAID has made extensive investments in developing outcome measures across all its program areas; these provide a sound basis for improving measurement of the policy-relevant effects of DG projects.
- Once evaluations are completed, there are few organizational mechanisms for broad discussion of their findings among DG officers or for integration of evaluation findings with the large range of research on democracy and democracy assistance being carried on outside the agency.
- Many of the mechanisms and opportunities for organizational learning were carried out under the aegis of the CDIE. The dissolution of this unit, combined with the longer-term decline in regular evaluation of projects, means that USAID’s capacity for drawing and sharing lessons has disappeared. The DG office’s own efforts to provide opportunities for DG officers and implementers to meet and learn from one another and from outside experts have also been eliminated.

Evaluation is a complex process, so improving the mix of evaluations and their use, and in particular increasing the role of impact evaluations in that mix, will require a combination of changes in USAID practices. Gaining new knowledge from impact evaluations will depend on developing good evaluation designs (a task that requires special skills and expertise), acquiring good baseline data, choosing appropriate measures, and collecting data on valid comparison groups. Determining how to feasibly add these activities to the current mix of M&E activities will require attention to the procedures governing contract bidding, selection, and implementation. The committee’s recommendations for how USAID should address these issues are presented in Chapter 9.

Moreover, better evaluations are but one component of an overall design for learning, as making the best use of evaluations requires placing the results of all evaluations in their varied contexts and historical perspectives. This requires regular activities within USAID to absorb and disseminate lessons from case studies, field experience, and research from outside USAID on the broader topics of democracy and social change. The committee’s recommendations on these issues are presented in Chapter 8.
These recommendations are intended to improve the value of USAID’s overall mix of evaluations, to enrich its strategic assessments, and to enhance its capacity to share and learn from a variety of sources—both internal and from the broader community—about what works and what does not in efforts to support democratic progress.

REFERENCES

Asia Foundation. 2007. Afghanistan in 2007: A Survey of the Afghan People. Available at: http://www.asiafoundation.org/pdf/AG-survey07.pdf. Accessed on February 23, 2008.
Banerjee, A.V. 2007. Making Aid Work. Cambridge, MA: MIT Press.
Bollen, K., Paxton, P., and Morishima, R. 2005. Assessing International Evaluations: An Example from USAID’s Democracy and Governance Programs. American Journal of Evaluation 26:189-203.
CIDA (Canadian International Development Agency). 2007. Results-Based Management in CIDA: An Introductory Guide to the Concepts and Principles. Available at: http://www.acdi-cida.gc.ca/CIDAWEB/acdicida.nsf/En/EMA-218132656-PPK#1. Accessed on September 12, 2007.
Clapp-Wincek, C., and Blue, R. 2001. Evaluation of Recent USAID Evaluation Experience. Washington, DC: USAID, Center for Development Information and Evaluation.
Danish Ministry of Foreign Affairs. 2005. Peer Assessment of Evaluation in Multilateral Organizations: United Nations Development Programme, by M. Cole et al. Copenhagen: Ministry of Foreign Affairs of Denmark.
DfID (Department for International Development). 2004. Public Information Note: Drivers of Change. Available at: http://www.gsdrc.org/docs/open/DOC59.pdf. Accessed on September 16, 2007.
Green, A.T., and Kohl, R.D. 2007. Challenges of Evaluating Democracy Assistance: Perspectives from the Donor Side. Democratization 14(1):151-165.
House of Commons (Canada). 2007. Advancing Canada’s Role in International Support for Democratic Development. Ottawa: Standing Committee on Foreign Affairs and International Development.
Jacquet, P. 2006. Evaluations and Aid Effectiveness. In Rescuing the World Bank: A CGD Working Group Report and Collected Essays, N. Birdsall, ed. Washington, DC: Center for Global Development.
Kessler, G. 2007. Where U.S. Aid Goes Is Clearer, But System Might Not Be Better. Washington Post, p. A1.
McFaul, M. 2006. The 2004 Presidential Elections in Ukraine and the Orange Revolution: The Role of U.S. Assistance. Washington, DC: USAID, Office for Democracy and Governance.
McMurtry, V.A. 2005. Performance Management and Budgeting in the Federal Government: Brief History and Recent Developments. Washington, DC: Congressional Research Service.
Management Systems International. 2000. Third Annual Performance Measurement Survey: Data Analysis Report. USAID/Mali Democratic Governance Strategic Objective. Unpublished.
Millennium Challenge Corporation. 2007. Fiscal Year 2007 Guidance for Compact Eligible Countries, Chapter 29, Guidelines for Monitoring and Evaluation Plans, p. 19. Available at: http://www.mcc.gov/countrytools/compact/fy07guidance/english/29-guidelinesformande.pdf. Accessed on September 12, 2007.
OECD (Organization for Economic Cooperation and Development). 2005. Lessons Learned on the Use of Power and Drivers of Change Analyses in Development Operation. Review commissioned by the OECD DAC Network on Governance, Executive Summary. Available at: http://www.gsdrc.org/docs/open/DOC82.pdf. Accessed on September 12, 2007.
Sarles, M. 2007. Evaluating the Impact and Effectiveness of USAID’s Democracy and Governance Programmes. In Evaluating Democracy Support: Methods and Experiences, P. Burnell, ed. Stockholm: International Institute for Democracy and Electoral Assistance and Swedish International Development Cooperation Agency.
Savedoff, W.D., Levine, R., and Birdsall, N. 2006. When Will We Ever Learn? Improving Lives Through Impact Evaluation. Washington, DC: Center for Global Development.
Schmid, A. 2007. Measuring Development. Available at: http://www.gtz.de/de/dokumente/ELR-en-30-31.pdf. Accessed on September 12, 2007.
Shadish, W.R., Cook, T.D., and Campbell, D.T. 2001. Experimental and Quasi-Experimental Designs for Generalized Causal Inference, 2nd ed. Boston: Houghton Mifflin.
USAID ADS (Automated Directives System). 2007. Available at: http://www.usaid.gov/policy/ads/200/. Accessed on August 2, 2007.
USAID (U.S. Agency for International Development). 1997. The Role of Evaluation in USAID. TIPS 11. Washington, DC: USAID.
USAID (U.S. Agency for International Development). 1998. Handbook of Democracy and Governance Program Indicators. Washington, DC: Center for Democracy and Governance, USAID. Available at: http://www.usaid.gov/our_work/democracy_and_governance/publications/pdfs/pnacc390.pdf. Accessed on August 1, 2007.
USAID (U.S. Agency for International Development). 2000. Conducting a DG Assessment: A Framework for Strategy Development. Available at: http://www.usaid.gov/our_work/democracy_and_governance. Accessed on August 26, 2007.
USAID (U.S. Agency for International Development). 2006. U.S. Foreign Assistance Reform. Available at: http://www.usaid.gov/about_usaid/dfa/. Accessed on August 2, 2007.
USAID (U.S. Agency for International Development). 2007. Decentralization and Democratic Local Governance (DDLG) Handbook. Draft.
U.S. Department of State. 2006. U.S. Foreign Assistance Performance Indicators for Use in Developing FY2007 Operational Plans, Annex 3: Governing Justly and Democratically: Indicators and Definitions. Available at: http://www.state.gov/f/releases/factsheets2007/78450.htm. Accessed on August 25, 2007.
Wholey, J.S., Hatry, H.P., and Newcomer, K.E., eds. 2004. Handbook of Practical Program Evaluation, 2nd ed. San Francisco: Jossey-Bass.
World Bank. 2004. Monitoring & Evaluation: Some Tools, Methods, and Approaches. Washington, DC: World Bank.
de Zeeuw, J., and Kumar, K. 2006. Promoting Democracy in Postconflict Societies. Boulder: Lynne Rienner.