Chapter 1
Statement of the Problem

GPRA and Research

In 1993, Congress passed the Government Performance and Results Act (GPRA) with broad bipartisan support. The law is part of a set of budget-reform measures intended to increase the effectiveness and efficiency of government. Both the General Accounting Office (GAO) and the Office of Management and Budget (OMB) testified in favor of the bill, and the President's National Performance Review advocated its implementation. Unlike several predecessor systems (program planning and budgeting, management by objectives, and zero-based budgeting), GPRA is not an executive branch initiative but rather a congressional mandate. It has received a high level of attention in both the Senate and the House of Representatives.

The specific goal of GPRA is to focus agency and oversight attention on the outcomes of government activities—the results produced for the American public. The approach is to develop measures of outcomes that can be tied to annual budget allocations. To that end, the law requires each agency to produce three documents: a strategic plan, which sets general goals and objectives over a minimal 5-year period; a performance plan, which translates the goals of the strategic plan into annual targets; and a performance report, which demonstrates whether the targets were met. Agencies delivered the first required strategic plans to Congress in September 1997 and the first performance plans in the spring of 1998. Performance reports are due in March 2000. The law calls for strategic plans to be updated every 3 years and the other documents annually.



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

The general principles of GPRA have been implemented by many state governments and in other countries (for example, Canada, New Zealand, and the U.K.), but implementation by the U.S. federal government is the largest scale application of the concept to date and somewhat different. Over the last 5 years, various states have tried to develop performance measures of their investments. With respect to performance measures of science and technology activities, states tend to rely on an economic-development perspective with measures reflecting job creation and commercialization. Managers struggle to define appropriate measures, and level-of-activity measures dominate their assessments.[3] With respect to other countries, our limited review of their experiences showed that most are struggling with the same issues that the United States is concerned with, notably how to measure the results of basic research.

Not every aspect of the system worked perfectly the first time around in the United States. Some agencies started the learning process earlier and scaled up faster than others. OMB allowed considerable agency experimentation with different approaches to similar activities, waiting to see what ideas emerged. The expectations of and thus the guidance from the various congressional and executive audiences for strategic and performance plans have not always been the same and that has made it difficult for agencies to develop plans agreeable to all parties. Groups outside government that are likely to be interested in agency implementation of GPRA have not been consulted as extensively as envisioned. There is general agreement that all relevant parties should be engaged in a continuing learning process, and there are high expectations for improvement in future iterations.

The development of plans to implement GPRA has been particularly difficult for agencies responsible for research activities supported by the federal government. A report by GAO (GAO, 1997) indicates that measuring performance and results is particularly challenging for regulatory programs, scientific research programs, and programs that deliver services to taxpayers through third parties, such as state and local governments.

Findings from Workshops

From January through June 1998, COSEPUP held a series of workshops to gather information about the implementation of GPRA. The first workshop, cosponsored with the Academy Industry Program, focused on the approaches that industry uses to develop strategic plans and performance assessments. Industry participants emphasized the importance of having a strategic plan that clearly articulates the goals and objectives of the organization. One of the industry participants said that the objective of their industrial research is "knowledge generation with a purpose." The industry representative indicated that the company must first support world-class research programs that create new ideas; second, relate the new ideas to an important need within the organization or project; and third, build new competence in technologies and people.

With respect to performance assessment, many industry participants noted that results of applied research and development programs are more easily quantified than results of basic research. However, even though they might not be able to quantify results of basic research, they nonetheless support it because they believe it important to their business; investments in basic research do pay off over time.[4] With respect to assessing basic research, industry representatives indicated that they must rely on the judgment of individuals knowledgeable about the content of the research and the objectives of the organization to evaluate the results of such efforts.

Some industry participants stressed the importance of giving careful consideration to any metrics one adopts, whether in industrial or government research. It is important to choose measures well and use them efficiently to minimize non-productive efforts.
The metrics used also will change the behavior of the people being measured. For example, in basic research, if you measure relatively unimportant indicators, such as the number of publications per researcher instead of the quality of those publications, you will foster activities that may not be very productive or useful to the organization. A successful performance assessment program will both encourage positive behavior and discourage negative behavior. Metrics must be simple, not easily manipulated, and drive the right behavior. Most industry R&D metrics are more applicable to assessing applied research and technology development activities in the mission agencies.

The second COSEPUP workshop focused on the strategic and performance plans of 10 federal agencies: the Department of Defense, the Department of Energy, the Department of Transportation, the Department of Agriculture, the National Aeronautics and Space Administration, the National Institutes of Health, the National Science Foundation, the Environmental Protection Agency, the National Institute of Standards and Technology, and the National Oceanic and Atmospheric Administration. As might be expected, most of these organizations use different approaches to translate the goals in their strategic plans into performance goals for scientific and engineering research. Some agencies use qualitative measures, others quantitative measures, and still others a combination of the two.

There was a strong consensus among the agencies that the practical outcomes of basic research cannot be captured by quantitative measures alone. Agency representatives generally agreed that progress in program management and facility operation can be assigned quantitative values. Agencies with long-term targeted research goals have generally translated them into short-term milestones that can be achieved within a 2-year time horizon for performance planning and reporting. Agencies that seek advances in knowledge in broad fields rather than targeted ones have not used the milestone approach to performance planning and reporting.

Some agencies have had difficulty in implementing GPRA. When preparing GPRA strategic and performance plans, some agencies are more likely than others to highlight research activities. The major variable is the magnitude of research relative to the agency's other activities. Submersion of research within large agencies makes it impossible for an integrated view of the federal science and technology investment to emerge through the GPRA process and is therefore a matter of concern for COSEPUP.

The performance plans of the agencies tend to emphasize short-term applied research with practical outcomes. Some participants expressed concern that this emphasis would skew funding away from long-term research that is difficult to measure against annual milestones.

Some participants indicated that a desirable result of GPRA would be to increase teamwork among the agencies, as well as to improve communication between research agencies and oversight entities, including Congress, OMB, and GAO. Another theme that recurred throughout the workshop was that the research community has a low level of awareness and is not strongly involved in the GPRA process.

The education and training of graduate and undergraduate students are among the most important duties and durable legacies of the research agencies. Yet human resources was not thoroughly identified or addressed in most agencies' performance plans.

Peer review was identified as the primary method for assessing the quality of research. However, the process by which peer review is applied varies widely among the agencies. Peer review of projects, grants, and contracts differs from peer review of programs and of intramural and extramural research. Those differences led COSEPUP to hold a third workshop focused on peer review and other methods for evaluating research.

In its third workshop, COSEPUP discussed the various methods available for evaluating research. As a result of that workshop and other discussions, COSEPUP found that the following methods are currently available for analyzing research:

• Bibliometric analysis.
• Economic rate of return.
• Peer review.
• Case study.
• Retrospective analysis.
• Benchmarking.

Each of these methods is briefly described below.[5] The pros and cons associated with each technique are summarized in Table 1, later in this chapter.

Bibliometric Analysis[6]

A technique known as bibliometric analysis, which includes publications, citations, and patent counts, is based on the premise that a researcher's work has value when it is judged by peers to have merit. A manuscript is published in a refereed journal only when expert reviewers and the editor approve its quality; a published work is cited by other researchers as recognition of its authority; and a published work is cited as evidence by a company applying for a patent. By extension, the more times a work is cited, the greater its merit.

The primary benefit of bibliometric analysis is its quantitative nature. Furthermore, it correlates well (approximately 60% in one study) with peer review when both methods are used.

The primary argument against bibliometric analysis is that bibliometric measurements treat all citations as equally important. However, many citations refer to routine methods or statistical designs, modifications of techniques, or standard data or even refute the validity of a paper. Other problems are caused by citing the first-named author of a publication when the customs that determine the order in which authors are listed vary by field. In addition, different mores among research communities (whether particular disciplines or countries) can skew results when they are used comparatively (for example, far fewer outlets are available for Russian publications than for U.S. publications). Furthermore, in emphasizing counts, researchers are apt to take actions that artificially increase the number of citations they receive or reduce their research in fields that offer less opportunity of immediate or frequent publication or in critical related fields (such as education) that do not offer publication opportunities.

TABLE 1: CURRENT METHODS USED FOR EVALUATING RESEARCH

Method: Bibliometric analysis
Pro: Quantitative; useful on aggregate basis to evaluate quality for some programs and fields
Con: At best, measures only quantity; not useful across all programs and fields; comparisons across fields or countries difficult; can be artificially influenced

Method: Economic rate of return
Pro: Quantitative; shows economic benefits of research
Con: Measures only financial benefits, not social benefits (such as health-quality improvements); time separating research from economic benefit is often long; not useful across all programs and fields

Method: Peer review
Pro: Well-understood method and practices; provides evaluation of quality of research and sometimes other factors; already an existing part of most federal-agency programs in evaluating the quality of research projects
Con: Focuses primarily on research quality; other elements are secondary; evaluation usually of research projects, not programs; great variance across agencies; concerns regarding use of "old boy network"; results depend on involvement of high-quality people in process

Method: Case studies
Pro: Provides understanding of effects of institutional, organizational, and technical factors influencing research process, so process can be improved; illustrates all types of benefits of research process
Con: Happenstance cases not comparable across programs; focus on cases that might involve many programs or fields making it difficult to assess federal-program benefit

Method: Retrospective analysis
Pro: Useful for identifying linkages between federal programs and innovations over long intervals of research investment
Con: Not useful as a short-term evaluation tool because of long interval between research and practical outcomes

Method: Benchmarking
Pro: Provides a tool for comparison across programs and countries
Con: Focused on fields, not federal research programs

Economic Rate of Return

In recent years, economists have developed a number of techniques to estimate the economic benefits (such as rate of return) of research. The primary benefit of this method is that it provides a metric of research outcomes. However, there are a number of difficulties. In particular, the American Enterprise Institute (AEI, 1994) found that existing economic methods and data are sufficient to measure only a subset of important dimensions of the outcomes and impacts of fundamental science. Economic methods are best suited to assessing mission-agency programs and less well suited to assessing the work of fundamental research agencies, particularly on an annual basis. Furthermore, economists are not able to estimate the benefit-to-cost ratio "at the margin" for fundamental science (that is, the marginal rate of return, or how much economic benefit is received for an additional dollar investment in research), and it is this information that is needed to make policy decisions. Finally, the time that separates the research from its ultimate beneficial outcome is often very long; 50-some years is not unusual.

Peer Review[7]

Peer review is the method by which science exercises continuous self-evaluation and correction. It is the centerpiece of many federal agencies' approach to evaluating proposed, current, and past research in science and engineering. Peer review, like all human judgments, can be affected by self-interest, especially the favoritism of friendship and the prejudice of antagonism. However, those distortions can be minimized by the rigor of peer selection, the integrity and independence of individual reviewers, and the use of bibliometric analysis and other quantitative techniques to complement the subjective nature of peer review.

Peer review is not equally appropriate across the wide span of research performed by federal agencies. We might visualize at one end of the spectrum the fundamental, long-term projects whose ultimate outcomes are unpredictable and at the other end programs of incremental or developmental work whose results are easier to predict within fairly narrow time limits. Projects of the latter type can often be evaluated in a rigorously quantifiable fashion by appropriate metrics. It is for the former kind of research, whose results are not easily quantified, especially while the work is in progress, that peer review of quality and leadership is required and generally effective. Agency managers have the responsibility of designing review techniques that suit the nature of each individual research program being evaluated.

Case Studies

Historical accounts of the social and intellectual developments that led to key events in science or applications of science illuminate the discovery process in greater depth than other methods. The chief advantage of case studies is that they can be used to understand the effects of institutional, organizational, and technical factors on the research process and can identify important outcomes of the research process that are not purely intellectual, such as the collaboration of other researchers, the training of young researchers, and the development of productive research centers.

Difficulties of case studies are that they can be expensive and that the validity of the results and conclusions depends on the objectivity, investigative skills, and scientific knowledge of the persons doing them.

Retrospective Analysis

Retrospective analyses are related to case studies in that they also try to reconstruct history; however, they focus on multiple scientific or technological innovations rather than just one. The goal is to identify linkages between innovations and particular types of antecedent events (usually either funding or research). Such analysis is usually done by a panel of experts or investigators. This method is most appropriate for assessing a particular type of accountability question (for example, impact of National Science Foundation funding on mathematics research). The primary disadvantage of this type of analysis is that it takes a long time to conduct and thus is not useful as a tool to provide short-term evaluations for improving research policy and management.

Benchmarking[8]

As noted earlier, maintaining leadership across the frontiers of science is a critical element of the nation's investment strategy for research (COSEPUP, 1993). The question addressed here is whether an agency's or the nation's research and educational programs are at the cutting edge. This assessment is made by a panel of international and national academic and industrial experts in a given field and in related fields on the basis of available quantitative and qualitative data. COSEPUP has conducted a number of experimental efforts on benchmarking the United States' position in selected fields. Programs can be benchmarked in a similar fashion.

Notes

3. For more information regarding individual states see http://www.gsu.edu/~padjem/projects.html.[G-14]

4. For additional information on corporate experience in assessing research and its applicability to federal research, see Commission on Physical Sciences, Mathematics, and Applications (1995) Research Restructuring and Assessment, National Academy Press, Washington, D.C.

5. These descriptions were adapted from the National Science and Technology Council's (NSTC) Assessing Fundamental Science, 1996.

6. Small, Henry G. "A Co-Citation Model of a Scientific Specialty: A Longitudinal Study of Collagen Research" Social Studies of Science Vol. 7 (1977), 139–66. Anderson, Richard C., F. Narin, Paul McAllister "Publication Ratings versus Peer Ratings of Universities" Journal of the American Society for Information Science March (1978) 91–103.

7. For additional information on peer review, see Atkinson, Richard C. and William A. Blanpied, Peer Review and the Public Interest, Issues in Science and Technology, vol. 1, no. 4, 1985; Bozeman, B. and J. Melkers, "Peer Review and Evaluation of R&D Impacts," Evaluating R&D Impacts, Kluwer Academic Publishers, Norwell, Mass., (1993) 79–98; Cole, J. and S. Cole, Peer Review in the National Science Foundation, Washington, D.C.: National Academy Press, 1981; GAO, Peer Review; Reforms Needed to Ensure Fairness in Federal Agency Grant Selection, June 1984.

8. See COSEPUP, 1997 and COSEPUP, 1998.
