APPENDIX C

THE PROMISES AND LIMITATIONS OF PERFORMANCE MEASURES

Irwin Feller
Senior Visiting Scientist, American Association for the Advancement of Science, and Professor Emeritus, Economics, Pennsylvania State University

I often say that when you can measure what you are speaking about, and express it in numbers, you know something about it; but when you cannot measure it, when you cannot express it in numbers, your knowledge is of a meager and unsatisfactory kind; it may be the beginning of knowledge, but you have scarcely in your thoughts advanced to the state of Science, whatever the matter may be.
-William Thomson, Baron Kelvin

When you can measure it, when you can express it in numbers, your knowledge is still of a meager and unsatisfactory kind.
-Jacob Viner

INTRODUCTION

Performance measurement is a politically powerful but analytically diffuse concept. Its meanings and implementation can vary from forcing fundamental changes in the ways in which public sector organizations are assessed, and thus public funds allocated, as evinced by recent state government initiatives across all levels of U.S. education, to constituting old wine in new bottles, especially to empirically oriented economists,
program evaluators, and those weaned in the days of program-planning-budgeting.

Addressing this analytical diffuseness, this paper assesses the promises and limitations of performance measures as means of measuring economic and other returns to the Federal government's investments in basic and applied research. Referring to promises and limitations in the same sentence implies differences in perspectives and assessments about the relevance, reliability, validity, transparency, and suitability of performance measures to guide decision making. These differences exist. A stylized dichotomization is as follows:

• endorsement of requirements for, belief in, scholarly search supportive of, and opportunistic provision of performance measures that respond or cater to executive and legislative branch expectations or hopes that such measures will facilitate evidence-based decision making;

• research and experientially based assessment that, even when well done and used by adepts, performance measures at best provide limited guidance for future expenditure decisions and at worst are rife with potential for incorrect, faddish, chimerical, and counterproductive decisions.

The tensions created by these differences are best captured by the observation of Grover Cleveland, 22nd and 24th President of the United States: "It's a condition we confront — not a theory." The condition is the set of Congressional and Executive requirements upon Federal agencies to specify performance goals and to provide evidence, preferably in quantitative form, that advances towards these goals have been made. The set includes by now familiar legislation such as the Government Performance and Results Act (GPRA) of 1993, the Government Performance and Results Modernization Act of 2010, and the 2009 American Recovery and Reinvestment Act's (ARRA) requirements that Federal agencies provide evidence that their expenditures under the Act have stimulated job creation. It also includes comparable Executive branch directives. These include the Bush II Administration's articulation in 2002 of R and D Investment Criteria, the subsequent implementation of these criteria by the Office of Management and Budget (OMB) via its Program Assessment Rating Tool (PART) procedures, and the Obama Administration's 2009 OMB memorandum on Science and Technology Priorities for the FY 2011 Budget, which states that "Agencies should develop outcome-oriented goals for their science
and technology activities..." and "...develop 'science of science policy' tools that can improve management of their research and development portfolios and better assess the impact of their science and technology investments." To these formal requirements may be added recent, and likely increasing, demands by congressional authorization and appropriations committees that agencies produce quantitative evidence that their activities have produced results, or impacts.

Theory here stands for a more complex, bifurcated situation, creating what Manski has termed dueling certitudes: internally consistent lines of policy analysis that lead to sharply contradictory predictions (Manski, 2010). One theoretical branch is the currently dominant new public sector management paradigm. This paradigm emphasizes strategic planning, accountability, measurement, and transparency across all public sector functions, leading to, and requiring, the use of evidence as the basis for informed decision making (OECD, 2005; Kettl, 1997). The second branch is the accumulated and emerging theoretical and empirical body of knowledge on the dynamics of scientific inquiry and the processes and channels by which public sector support of research produces societal impacts. This body of knowledge performs a dual role. Its findings undergird many of the conceptualizations and expectations that policymakers have of the magnitude and characteristics of the returns to public investments in research and of the ways in which these returns can (or should) be measured. However, it is also a major source of the cautions, caveats, and concerns expressed by agency personnel, scientists, and large segments of the academic and science policy research communities that efforts to formally employ performance measures to measure public returns (of whatever form) to research, and to then tie support for research to such measures, are overly optimistic, if not chimerical, and rife with the potential for counterproductive and perverse consequences.

It is in the context of these differing perspectives that this paper is written. Its central thesis is that the promises and limitations of performance impact measures as forms of evidence relate to the decision-making context in which they are used. Context here means who is asking what type of question(s) with respect to what type of decision(s) and for what purpose(s). It also means the organizational characteristics of the Federal agency: can the activities of its operators be observed, and can the results of these activities be observed? (Wilson, 1989, pp. 158-171).
This emphasis on context produces a kaleidoscopic assessment, such that promises and limitations change shape and hue as the decision and organizational contexts shift. An emphasis on context also highlights the analytical and policy risks of assessing the promises and limitations of performance impact measures in terms of stylized characteristics. Performance measures, for example, can be used for several different purposes, such as monitoring, benchmarking, evaluation, foresight, and advocacy (making a case) (Gault, 2010). Consistent with the STEP-COSEPUP workshop's stated objective, to provide expert guidance to Federal policymakers in the Executive and Legislative branches about what is known and what needs to be better known about how to assess economic and other returns to Federal investments in science and technology, the paper's focus is mainly on evaluation, although it segues at times into the other functions.

Approached in this way, performance is a noun, not an adjective. It also is a synonym for impact. This strict construction is made to separate the following analysis from the larger, often looser language associated with the topic in which performance is an adjective: as in the setting of strategic or annual (performance) goals called for by GPRA; as an indicator of current, changed, or comparative (benchmarking) position, as employed for example in the National Science Foundation's biennial Science and Engineering Indicators reports; or as symptomatic measures of the health/vitality/position of facets of the U.S. science, technology, and innovation enterprise, as represented for example in Rising Above the Gathering Storm (2007), where they are employed as evidence that things are amiss or deficient—a performance gap—in the state of the world.

The paper proceeds in a sequential, if accelerated, manner. Section II contains a brief literature review and an outline of the paper's bounded scope. Section III presents a general discussion of the promises and limitations of performance measures to assess the impacts of Federal investments in research. Section IV illustrates the specific forms of the promises and limitations of performance measures in the context of what it terms the "big" and "small" questions in contemporary U.S. science policy. Section V offers a personal, "bottom line" perspective on what all this means.
Analytical Framework and Scope

The paper's analytical framework and empirical findings derive mainly from economics, although its coverage of performance measures is broader than economic statistics and its treatment of impact assessment is based mainly on precepts of evaluation design. The choice of framework accords with the workshop's objective, which is suffused with connotations of efficiency in resource allocation, or, more colloquially, seeking the highest possible returns on the public's (taxpayer's) money. Adding to the appropriateness and relevance of the chosen approach is that many of the arguments on behalf of Federal investments in research, both basic and applied, draw upon economic theories and findings. As Godin has noted, "We owe most of the quantitative analysis of S and T to economists" (Godin, 2005, p. 3).

An immediate consequence of treating the workshop's objective in this manner is that a goodly number of relevant and important subjects, policy issues, and analytical frameworks are touched upon only briefly, while others are ignored completely. Thus, only passing attention is paid to the historical, institutional, and political influences that in fact have shaped and continue to shape the setting of U.S. national science priorities and Federal R and D budgets, whether viewed in terms of allocations by broad objectives, agencies, fields of science, or modes of support. Moreover, interpreting the workshop's objective as a search for measures related to allocative efficiency obviously sidesteps topics and rich streams of research related to political influences on national research priorities (e.g., Hegde and Mowery, 2008) or to the earmarks, set-asides, and sheltered capacity-building competitions that palpably diverge from efficiency objectives (e.g., Savage, 1999; Payne, 2006). Likewise omitted is consideration of the normative goals underlying Federal support of research and the distributive effects or societal impacts that flow from it (Bozeman and Sarewitz, 2011).

Another consequence is that the paper is primarily about performance measurement as a generic approach rather than about the reliability and validity of specific measures. Where reference is made to specific measures, it is to illustrate larger themes. In fact, there is no shortage of "metrics," in GPRA-speak, to measure the outputs and outcomes of Federal investments in research. Geisler (2000, pp. 254-255) offers a well-presented catalogue of 37 "core" metrics. These are organized in terms of immediate outputs (e.g., number of publications in refereed journals; number of patents); intermediate outputs (e.g., number
of improved or new products produced; cost reductions from new and improved products/processes); pre-ultimate outputs (e.g., savings, cost reductions, and income generated by improved health, productivity, safety, and mobility of the workforce at sectoral and national levels); and ultimate outputs (e.g., improved GDP per capita; improved level of overall satisfaction and happiness of the population). The list is readily expanded to include combinations of single indicators, new data sets that permit disaggregation of existing measures, and new and improved versions of mainstream measures—the rapid and seemingly accelerating move from publication counts to citation measures to impact factors to h-indices and beyond being one such example. Also in abundance are various scorecards or rankings based on assemblages and weightings of any number of performance measures related to scientific advance, technological advance, competitiveness, innovativeness, university performance, STEM-based educational proficiency, and the like that have been used to position U.S. performance within international hierarchies or norms.

Indicator construction for science and technology has become a profession in its own right, with regular international conferences—the European Network of Indicators Designers will hold its 2011 Science and Technology Indicators Conference in Rome, Italy, in September 2011—and a well-recognized set of journals in which new work is published. Plentiful too, and continuously being updated, are compendia and manuals covering international best practice on how to evaluate public sector R and D programs. These works cover a wide range of performance impact measures and methodologies, including benefit-cost analysis, patent analysis, network analysis, bibliometrics, historical tracings, and innovation surveys, and report on the outputs produced across several different Federal mission areas—health, energy, agriculture, environmental protection, international competitiveness, employment. (For recent overviews, see Wagner and Flanagan, 1995; Ruegg and Feller, 2003; Godin, 2005, chap. 15; Kanninen and Lemola, 2006; Grant et al., 2009; Foray, 2009; Gault, 2010; Link and Scott, 2011.)

Finally, in setting expectations for the workshop, it is perhaps helpful to note that the topics and issues to be discussed are not new ones. Rather, they form the substance of at least 60 years of theoretical, empirical, and interpretative work, producing what by now must be a five-foot-high stack of reports and workshop proceedings, including a sizeable number originating under National Academies' auspices. The recurrent themes addressed in this previous work, evident since the
program-planning-budgeting initiatives of the 1960s and continuing on through its several variants, are a search for decision algorithms that will lead to improvements in government budgeting and operations and a search for criteria for setting priorities for science (Shils, 1969). Noting these antecedents is not intended to diminish the importance of current activities (nor, for that matter, of this paper). Instead, it is to suggest the complexities of the issues under consideration and to serve as a reminder of the richness and contemporary relevance of much that has been written before.

Performance Impact Measures

Differences in assessments of the potential positive and negative consequences of introducing strategic plans and performance measures into how Federal agencies set research priorities and assess performance were visible at the time of GPRA's enactment. They continue to this day.1 In 1994, almost immediately after GPRA's passage, I organized a session at the American Association for the Advancement of Science's (AAAS) Colloquium on Science and Technology Policy on the applicability of GPRA to budgeting for science and technology. Taking a "neutral" stand on the subject, I invited, among other panelists, Robert Behn, a leading scholar of and advocate for the new public management paradigm subsumed within GPRA and like requirements, and Paul David, a leading researcher in the economics of science and technology. The title of Behn's talk captured its essence: "Here Comes Performance Assessment—and It Might Even Be Good for You" (Behn, 1994).

1 A natural experiment occurring on February 15-16, 2011, highlights the continuing character of these differing perspectives. OSTP's release on February 10, 2011, of its R and D Dashboard, which contains data about NIH and NSF R and D awards to research institutions and "links those inputs to outputs—specifically publications, patent applications, and patents produced by researchers funded by those investments," produced an immediate flurry of comments and exchanges on SciSIP's list server. Most of this exchange contained the point-counterpoint themes of the Behn-David exchange cited above, as well as those recounted in this paper. Among these were: How were outcomes defined? Could they be measured? Is there reasonable consensus on what they are? One rejoinder, raised in response to specific reservations about the meaningfulness of patent data, was that when Congress asks what we are getting from the billions spent on R and D, it is helpful to have patent numbers to point to as one outcome of the nation's investment.
Among the several benefits (or promises) cited by Behn were the following:

• Having objectives ("knowing where you want to go") is helpful.
• Objectives provide a useful baseline for assessing each of four modalities of accountability: finance, equity, use of power, and performance.
• Well-defined objectives and documentation of results facilitate communication with funders, performers, users, and others.

For his part, David outlined what he termed "very serious problems...with outcome goal setting for federal programs in general and for research in particular" (David, 1994, p. 294). David's central argument was that "outcome reporting may have a perverse effect of distorting the perception of the system of science and technology and its relationship to economic growth" (ibid., p. 297). He further observed that "Agencies should define appropriate output and outcome measures for all R and D programs, but agencies should not expect fundamental basic research to be able to identify outcomes and measure performance in the same way that applied research or development are able to."

What follows is essentially an expanded exposition of these two perspectives, presented first as promises and then as limitations.

Promises

• Performance measurement is a (necessary) means towards implementing (and enforcing) the audit precepts, especially those linked to accountability and transparency, contained within GPRA and like requirements.

• Performance measures can assist agencies in making improved, evidence-based decisions, both for purposes of program design and operations (formative evaluations) and for longer term assessments of allocative and distributive impacts (summative evaluations). In these ways, performance measures assist agencies in formulating more clearly defined, realistic, and relevant strategic objectives and in better adjusting ongoing program operations to program objectives.

• Well-defined, readily measured, and easily communicated performance measures aid both funders and performers in communicating the accomplishments and contributions of public investments to larger constituencies, thereby maintaining and
strengthening the basis of long term public support of these investments.

• The search for measures that accurately depict what an agency/program has accomplished may serve as a focusing device, guiding attention to the shortcomings of existing data sets and thus to investments in obtaining improved data.

• Performance measurement focuses attention on the end objectives of public policy, on what has happened or is happening outside the black box, rather than on the churning of processes and relationships inside the black box. This interior churning produces intermediate outputs and outcomes (e.g., papers, patents) that may be valued by performers (or their institutions, stakeholders, or local representatives), but these outputs and outcomes do not necessarily connect in a timely, effective, or efficient manner to the goals that legitimize and galvanize public support.

• Requiring agencies to set forth explicit performance research goals that can be vetted for their societal importance, and to then document that their activities produced results commensurate with these goals rather than some diminished or alternative set of outputs and outcomes, is a safeguard against complacency on the part of funders and performers, since what might have been true, or worked, in the past is not necessarily the case today, or tomorrow. Jones, for example, has recently noted, "Given that science is change, one may generally imagine that the institutions that are efficient in supporting science at one point in time may be less appropriate at a later point in time and that science policy, like science itself, must evolve and continually be retuned" (Jones, 2010, p. 3). Measurement of impacts is one means of systematically attending to the consequences of this evolution.

• Performance measurement is a potential prophylactic against the episodic cold fusion-type viruses that have beset the formulation of U.S. science policy. As illustrated by the continuing debates set off by Birch's claims about the disproportionate role of small firms as sources of job creation (cf. Haltiwanger, Jarmin, and Miranda, 2010), or by the challenge posed to the reflexive proposition that the single-investigator mode of support is the single best way to foster creative science by Borner et al.'s finding that "Teams increasingly dominate solo scientists in the production of high-impact, highly cited science" (Borner et al., 2010, p. 1), U.S. science and
innovation policy contains several examples of Will Rogers's observation that "It isn't what we don't know that gives us trouble, it's what we know that ain't so."

• Presented as a method of assessing returns to Federal investments in research, performance measurement provides policymakers and performers with an expanded, more flexible, and more adaptable set of measures than is implied by rate-of-return or equivalent benefit-cost calculations. Criticisms of what is seen as undue reliance on these latter approaches are longstanding; they are based in part on technical matters, especially the monetization of non-market outputs, but also on the distance between the form that an agency's research output may take and the form needed for this output to have market or other societal impacts.

The largest promise of performance measurement, though, likely arises not from recitation of the maxims of the new public management but from the intellectual ferment now underway in developing new and improved data on the internal processes of scientific and technological research, the interrelationships of variables within the black box, and improved methods for assembling, distilling, and presenting data. Much of this ferment, of course, relates to Dr. Marburger's call for a new science of science policy, the activities of the National Science and Technology Council's (NSTC) Committee on Science, and the research currently being supported by the National Science Foundation's Science of Science and Innovation Policy (SciSIP) program. No attempt is made here to present a full précis of the work underway (Lane, 2010). Having been a co-organizer, along with Al Teich, of two AAAS workshops at which SciSIP grantees presented their preliminary findings and interacted with Federal agency personnel, however, it is a professional pleasure to predict that a substantial replenishment and modernization of the intellectual capital underlying existing Federal research policies and investments can be expected.

To illustrate, though, the nature of recent advances, I cite two developments non-randomly selected to reflect the focus of my own research interests. They are the NSF's Business R and D and Innovation Survey (BRDIS), itself redesigned in part in response to the 2005 NRC study, Measuring Research and Development Expenditures in the U.S. Economy, and advances in the visualization of the (bibliometric) interconnections of disciplines. The NRC report articulated longstanding concerns that NSF's existing survey of industrial R and D needed methodological upgrading, lagged behind the structure of the U.S.
economy in not adequately covering the growth of the service sector or the internationalization of sources and performers of R and D, and did not adequately connect R and D expenditures with downstream "impact" measures, such as innovations. The result has been a major revision of these surveys, undertaken by NSF's Science Resources Statistics Division. Early findings from the new BRDIS survey on the sources and characteristics of industrial innovation fill a long-recognized data gap in our understanding of relationships between and among several variables, including private and public R and D expenditures, firm size and industrial structure, human capital formation and mobility, and managerial strategies (Boroush, 2010). Combined with pending findings from a number of ongoing SciSIP projects, and juxtaposed to and compared with data available from ongoing international surveys, these newly available data hold promise of simultaneously providing policymakers with a finer-grained assessment of the comparative and competitive position of the technological performance of the U.S. economy and providing researchers and evaluators with finer-grained data with which to assess the impacts of selected science and technology programs and to test existing and emerging theories.

Science is a set of interconnected, increasingly converging disciplines, so run the claims of many scientists (Sharp et al., 2011). But precisely in what ways and with what force do these interconnections flow? Does each field influence all other fields and with equal force, or are there discernible, predictable differences in patterns of connection and influence? Prospectively, being able to answer these questions would provide policymakers with evidence about relative priorities in funding fields of science, presumably giving highest priority to those that served as hubs from which other fields drew intellectual energy. Recent research in data visualization, illustrated by Boyack, Klavans, and Borner's Mapping the Backbone of Science (2005), combines bibliometric techniques, network theories, and data visualization techniques to offer increasingly accessible "maps" of the structure of the natural and social sciences, thereby providing one type of answer to these questions.

Limitations

The above-noted emphasis on context surfaces immediately in considering the limitations of performance measurement. Perhaps the most obvious and important difference in the use of such measures in
all these proposals could be funded, what means should be used to select from among them? What measures of performance/outputs/outcomes should be used to assess past performance in determining out-year investments or near-term R and D priorities? Exciting as it may be to envision the prospects of societal impacts flowing from frontier, high-risk, transformative research, doing so serves only to bring one full circle back to the policymaker's priority-setting and resource allocation questions noted above.

The same issues arise when trying to compute the proper level of support, or to estimate the returns to public investments, for functional objectives, agencies, and fields of science. An impressive body of research, for example, exists on the contributions to the health status of the American population produced by Federal investments in biomedical research. It is an analytical and empirical stretch, however, to say that this research provides evidence that can be used to determine whether current or proposed levels of appropriation for NIH are right, too little, or too high. No evident empirical basis existed for the doubling of NIH's budget over a 5-year period, and the consequences now observed, while unintended, were not unpredictable (Freeman and van Reenen, 2008). At issue here is what Sarewitz has termed the myth of infinite benefits: "If more science and technology are important to the well-being of society, then the more science and technology society has, the better off it will be" (1996, p. 18). Indeed, arguably, if the budget decision had any lasting impacts, it was to elevate "balance" of funding across agencies as a resource allocation criterion and to set doubling as a formulaic target for other science-oriented agencies.

Similar problems arise, too, in attempting to formulate analytically consistent criteria based on performance measures for allocating funds among fields of science and technology (how much for chemistry? physics? economics?), especially among national objectives and agencies, as well as within agencies. These are perennial practical questions across the spectrum of Federal science policymaking, yet, perhaps with the exception of basing program-level allocations on estimated returns from impacts, as in the cases of agriculture (Ruttan, 1982) and health (Gross, Anderson, and Powe, 1999), few good answers, or funding algorithms, exist. For example, a recent NRC panel tasked with just such an assignment concluded in its report, A Strategy for Assessing Science, "No theory exists that can reliably predict which research activities are most likely to lead to scientific advances or to societal benefits" (2007, p. 89).
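To make concrete the kind of calculation on which such program-level, returns-based studies rest, the sketch below works through a stylized retrospective benefit-cost and rate-of-return computation in Python. The cost and benefit streams, the 7 percent discount rate, and the 15-year horizon are hypothetical placeholders chosen for illustration; they are not drawn from the agriculture or health studies cited above.

```python
# A minimal, stylized sketch (hypothetical figures, not from any cited study) of the
# retrospective benefit-cost arithmetic underlying program-level "returns to research"
# estimates: discount a cost stream and a benefit stream, then solve for the internal
# (social) rate of return on the net flow.

def npv(cash_flows, rate):
    """Net present value of annual cash flows, with year 0 as the first element."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

def internal_rate_of_return(cash_flows, lo=0.0, hi=2.0, tol=1e-6):
    """Bisection search for the discount rate at which NPV crosses zero."""
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if npv(cash_flows, mid) > 0.0:
            lo = mid          # NPV still positive: the break-even rate is higher
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0

# Hypothetical program: research outlays in years 0-4, societal benefits in years 5-14
# (all figures in millions of dollars per year).
costs = [-10.0] * 5 + [0.0] * 10
benefits = [0.0] * 5 + [12.0] * 10
net_flows = [c + b for c, b in zip(costs, benefits)]

discount_rate = 0.07
bc_ratio = npv(benefits, discount_rate) / -npv(costs, discount_rate)
print(f"NPV at {discount_rate:.0%} discount rate: {npv(net_flows, discount_rate):.1f}")
print(f"Benefit-cost ratio: {bc_ratio:.2f}")
print(f"Internal (social) rate of return: {internal_rate_of_return(net_flows):.1%}")
```

However carefully done, calculations of this kind describe completed investments; as the discussion that follows emphasizes, they offer far less guidance for the prospective allocations that policymakers must actually decide.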
One would like to do better than this. Here, if anywhere, is where performance measurement may have a role. The challenge at this point is not the absence of performance measures relating Federal investments in research to specific outputs, or of studies pointing to high social rates of return within functional areas, but the sheer number of them and the variations in methodologies that produce them. The result is a portfolio of options among performance measures, each more precisely calibrated over time but still requiring the decision maker to set priorities among end objectives.

Thus, the Boyack et al. bibliometric study cited above highlights the "centrality" of biochemistry among published papers. Using this study, and its implied emphasis on scientific impact, as a basis for resource allocation decisions among scientific fields would presumably lead to increased relative support for biochemistry. If one instead turns to the Cohen-Nelson-Walsh survey-based study (2002) of the contributions of university and government laboratory (i.e., "public") research to industrial innovation, which contains an implied policy emphasis on economic competitiveness, one finds both considerable variation across industries in the importance of public research and variations in which fields of public research are cited as making a contribution. An overall finding, though, is that, "As may be expected, more respondents consider research in the engineering fields to contribute importantly to their R and D than research in the basic sciences, except for chemistry" (2002, p. 10). The authors, however, mute this distinction of the relative contribution of fields of science with the caution that "the greater importance of more applied fields does not mean that basic science has little impact, but that its impact may be mediated through the more applied sciences or through the application of industrial technologists' and scientists' basic scientific training to the routine challenges of conducting R and D" (p. 21). But the upshot of the study still would seem to be the need for increased (relative) support of engineering-related disciplines. Advocates for increased Federal research funding for computer science and engineering, for their part, may turn to Jorgenson, Ho, and Samuels' recent estimates of the contribution of the computer equipment manufacturing industry to the growth in U.S. productivity between 1960 and 2007 (Jorgenson, Ho, and Samuels, 2010).

An obvious conclusion, indeed the standard one in discussions of this issue, is that the interconnectedness of fields of science requires that each be supported. And this of course is how the present U.S. system functions. There are considerable differences, however, between funding
each field according to its deeds and each according to its needs. Moreover, the interconnectedness argument applies to historical determinants and levels of support; it is of limited guidance in informing budget decisions: how much more or less, given existing levels of support?

Little of this should be a surprise. The gap between estimates of returns to public investments in research and the use of these estimates to formulate budget allocations among missions, agencies, and disciplines was identified by Mansfield in the early studies of the social returns to R and D. Referring to the number of independent studies, working from different models and different databases, that have pointed to very high social rates of return, he noted, "But it is evident that these studies can provide very limited guidance to the Office of Management and Budget or to the Congress regarding many pressing issues. Because they are retrospective, they shed little light on current resource allocation decisions, since these decisions depend on the benefits and costs of proposed projects, not those completed in the past" (Mansfield, 1991, p. 26). The gap has yet to be closed.

Similar issues arise in using bibliometric data to allocate resources across fields. Over the last three decades, even as the U.S. position in the life sciences has remained strong, its world share of engineering papers has been cut almost in half, from 38 percent in 1981 to 21 percent in 2009, placing it below the share (33 percent) for the EU27. Similar declines in world share are noted for mathematics, physics, and chemistry (National Science Foundation, 2007; Adams and Pendlebury, 2010). One immediate, and simple, interpretation of these data is that aggregate bibliometric performance is a function of resource allocation: a nation gets what it funds. But this formulation begs, first, the question of whether what the nation is producing is what it most needs and, second, whether what it is producing is being produced in the most efficient manner.

Conclusion

Having studied, written about, participated in, organized workshops on, and, as an academic research administrator, been affected by the use of performance measures, something more than an "on the one hand/on the other hand" balance sheet seems in order for a concluding section. It is simpler to start with the limitations of performance measures, for they are real. These include the attempt to reduce assessment of complex, diverse, and circuitously generated outcomes, themselves often
dependent on the actions of agents outside the control of Federal agencies, to single or artificially aggregated measures; the substitution of bureaucratically and/or ideologically driven specification and utilization of selective measures for the independent judgment of experts; and the distortion of incentives for science managers and scientists in ways that reduce the overall performance of public investments. To these limitations must be added that, to date, there is little publicly verifiable evidence outside the workings of OMB-agency negotiations that implementation of a system of performance measurement has appreciably improved decision making with respect to the magnitude or allocation of Federal research funds. When joined with reservations expressed by both scholars and practitioners about the impacts of the new public management paradigm, these limitations produce assessments of the type, "Much of what has been devised in the name of accountability actually interferes with the responsibilities that individuals in organizations have to carry out work and to accomplish what they have been asked to do" (Radin, 2006, p. 7; see also Perrin, 1998; Feller, 2002; Weingart, 2005; Auranen and Nieminen, 2010).

The promises, too, are likely to be real, if and when they are realized. One takes here as a base the benefits contained in Behn's presentation and in the section on promises above. Atop this base are to be added the revised and new, expanded, disaggregated, and manipulable data sets emerging both from recent Federal science of science policy initiatives and from other ongoing research (Lane and Bertuzzi, 2011). Thus, Sumell, Stephan, and Adams' recent research on the locational decisions of new Ph.D.s working in industry accords with, and provides an empirical base for, the recent calls by the National Science Foundation's Advisory Committee for GPRA Performance Assessment (2008) to collect and provide data on the "development of people" as an impact of agency support.

A different category of benefits, owing less to improved public sector management practices and more to the realities of science policy decision making, needs to be added to this list. The very same arguments cited above, that the links between initial Federal investments in research and their ultimate impacts are too long term and circuitous to specify precisely in GPRA or OMB planning or budget formats, serve to increase the value of intermediate measures. For policymakers operating within real-time horizons, even ones extending beyond the next election cycle, performance measures of the type referred to above are likely as good as they are going to get. We live in a second-best world. Although it may be analytically and empirically
correct to say that none of the proximate intermediate output measures, patents or publications for example, are good predictors of the ultimate impacts that one is seeking (increased per capita income, improved health), some such measures are essential to informed decision making; a minimal illustration of one such citation-based measure follows the list below. Adding impetus to this line of reasoning is that the environment in which U.S. science policy is made is a globally competitive one, which increases the risks of falling behind rivals. Akin to an arms race, or to advertising in imperfectly competitive markets, support of research is necessary to maintain market share, even if the information on which decisions are made is imperfect.

Finally, as an empirically oriented economist whose work at various times has involved generating original data series of patents and publications and using a goodly portion of the performance measures and methodologies now in vogue in evaluations of Federal and State science and technology programs, I find a sense of déjà vu in much of the debate about the promises and limitations of performance measures of impacts. The temptation is to observe, somewhat like Monsieur Jourdain in Molière's play Le Bourgeois Gentilhomme, "Good heavens! For more than forty years I have been doing performance measurement without knowing it." Performance measures, viewed as a method for explicating needs assessments, for conducting impact assessments, or both, are basic, indispensable elements in policy making, program evaluation, and scholarly research. What remain open to issue are:

• the specification of the appropriate measures for the decision(s) under review, a complex task involving technical, political, and normative considerations;

• the proper interpretation and incorporation of existing and newly developed data and measures used in retrospective assessments of performance into decisions relating to estimating the prospective returns from alternative future Federal investments in research, decisions made within a penumbra of scientific, technical, economic, and societal uncertainties that performance measures reduce but do not eliminate; and

• providing evidence that the use of performance measures as forms of evidence in fact improves the efficiency or rate(s) of return to Federal investments in research.
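To make one such proximate, citation-based measure concrete, the sketch below uses a hypothetical portfolio of per-publication citation counts to compute raw publication and citation counts alongside a simple composite indicator, the h-index. It is offered purely as an illustration of how such intermediate indicators are constructed, not as an endorsement of any particular one.

```python
# Minimal sketch with hypothetical data: three intermediate, output-side indicators
# of the kind discussed above, computed from per-publication citation counts.

def h_index(citation_counts):
    """Largest h such that at least h publications have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h

# Hypothetical citation counts for publications attributed to a funded project.
citations = [42, 17, 9, 8, 6, 3, 2, 1, 0, 0]

print(f"Publication count: {len(citations)}")      # simple output measure
print(f"Total citations:   {sum(citations)}")      # aggregate citation measure
print(f"h-index:           {h_index(citations)}")  # composite indicator (5 here)
```

None of this, of course, answers the three questions just listed; it only shows how readily such intermediate counts can be produced.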
Given the above recitation of promises and limitations, the optimal course of action seems to be what Feuer and Maranto (2010) have termed science advice as procedural rationality. It is to (1) have policymakers employ performance impact measures that correspond to what is known, or is being learned, about how public investments in basic and applied science relate to the attainment of given societal objectives; (2) have the body of existing and emerging knowledge of how Federal investments in basic and applied research affect societal objectives connect to the form of the decisions that policymakers are called upon to make; and (3) use the gaps that may exist between (1) and (2) to make explicit the nature of the limits to which theory-based/evidence-based knowledge can contribute to informed decision making (Aghion, David, and Foray, 2009). Viewed in terms of preventing worst-case outcomes, the objective should be to avoid the pell-mell drive now in vogue in State governments towards formula-shaped coupling of performance measures and budgets, a trend that, as applied to Federal funding of research, is fraught with the risk of spawning the limitations described above. To the extent that the STEP-COSEPUP workshop contributes to producing this course of action, it will have made an important contribution to the formulation of U.S. research policy.

REFERENCES

Adams, J., and Pendlebury, D. 2010. Global Research Report: United States. Thomson Reuters.
Aghion, P., David, P., and Foray, D. 2009. Can We Link Policy Practice with Research on 'STIG' Systems? Toward Connecting the Analysis of Science, Technology and Innovation Policy with Realistic Programs for Economic Development and Growth. In The New Economics of Technology Policy, edited by D. Foray. Cheltenham, UK: Edward Elgar, 46-71.
Auranen, O., and Nieminen, M. 2010. University Research Funding and Publication Performance: An International Comparison. Research Policy, 39:822-834.
Behn, R. 1994. Here Comes Performance Assessment—and It Might Even Be Good for You. In AAAS Science and Technology Policy Yearbook 1994, edited by A. Teich, S. Nelson, and C. McEnaney. Washington, DC: American Association for the Advancement of Science, 257-264.
Borner, K., Contractor, N., Falk-Krzesinski, H. J., Fiore, S. M., Hall, K. L., Keyton, J., Spring, B., Stokols, D., Trochim, W., and Uzzi, B. 2010. A Multi-Systems Perspective for the Science of Team Science. Science Translational Medicine, 2:1-5.
Boroush, M. 2010. New NSF Estimates Indicate that U.S. R and D Spending Continued to Grow in 2008. National Science Foundation InfoBrief 10-32. Arlington, VA: National Science Foundation, January 2010.
Boroush, M. 2010. NSF Releases New Statistics on Business Innovation. National Science Foundation InfoBrief 11-300. Arlington, VA: National Science Foundation, October 2010.
Boskin, M., and Lau, L. 2000. Generalized Solow-Neutral Technical Progress and Postwar Economic Growth. NBER Working Paper 8023. Cambridge, MA: National Bureau of Economic Research.
Bozeman, B., and Sarewitz, D. 2011. Public Value Mapping and Science Policy Evaluation. Minerva, 49:1-23.
Clemins, P. 2010. Historic Trends in Federal R and D. In Research and Development FY 2011. Washington, DC: American Association for the Advancement of Science, 21-26.
Cohen, W. 2005. Patents and Appropriation: Concerns and Evidence. Journal of Technology Transfer, 30:57-71.
Cohen, W., Nelson, R., and Walsh, J. 2002. Links and Impacts: The Influence of Public Research on Industrial R and D. Management Science, 48:1-23.
Crespi, G., and Geuna, A. 2008. An Empirical Study of Scientific Production: A Cross Country Analysis, 1981-2002. Research Policy, 37:565-579.
Cutler, D., and Kadiyala, S. 2003. The Return to Biomedical Research: Treatment and Behavioral Effects. In Measuring the Gains from Medical Research, edited by K. Murphy and R. Topel. Chicago, IL: University of Chicago Press, 110-162.
David, P. 1994. Difficulties in Assessing the Performance of Research and Development Programs. In AAAS Science and Technology Policy Yearbook 1994, op. cit., 293-301.
Dupree, A. H. 1957. Science in the Federal Government. New York: Harper Torchbooks.
Executive Office of the President, Office of Management and Budget. Science and Technology Priorities for the FY 2012 Budget, M-10-30.
Evenson, R., Ruttan, V., and Waggoner, P. E. 1979. Economic Benefits from Research: An Example from Agriculture. Science, 205:1101-1107.
Feller, I. 2002. Performance Measurement Redux. American Journal of Evaluation, 23:435-452.
Feller, I. 2007. Mapping the Frontiers of Evaluation of Public Sector R and D Programs. Science and Public Policy, 34:681-690.
Feller, I. 2009. A Policy-Shaped Research Agenda on the Economics of Science and Technology. In The New Economics of Technology Policy, edited by D. Foray. Cheltenham, UK: Edward Elgar, 99-112.
Feller, I., and Gamota, G. 2007. Science Indicators as Reliable Evidence. Minerva, 45:17-30.
Feuer, M., and Maranto, C. 2010. Science Advice as Procedural Rationality: Reflections on the National Research Council. Minerva, 48:259-275.
Freeman, C., and Soete, L. 2009. Developing Science, Technology and Innovation Indicators: What We Can Learn from the Past. Research Policy, 38:583-589.
Freeman, R., and van Reenen, J. 2008. Be Careful What You Wish For: A Cautionary Tale about Budget Doubling. Issues in Science and Technology, Fall. Washington, DC: National Academies Press.
Gault, F. 2010. Innovation Strategies for a Global Economy. Cheltenham, UK: Edward Elgar.
Geiger, R. 1986. To Advance Knowledge. Oxford, UK: Oxford University Press.
Geisler, E. 2000. The Metrics of Science and Technology. Westport, CT: Quorum Books.
Gladwell, M. 2011. The Order of Things. New Yorker, February 14, 2011, 68ff.
Godin, B. 2005. Measurement and Statistics on Science and Technology. London, UK: Routledge.
Goldston, D. 2009. Mean What You Say. Nature, 458:563. Published online 1 April 2009.
Grant, J., Brutscher, P., Kirk, S., Butler, L., and Woodring, S. 2009. Capturing Research Impacts: A Review of International Practice. Report to the Higher Education Funding Council for England, DB-578-HEFCE. Cambridge, UK: RAND Europe.
Gross, C., Anderson, G., and Powe, N. 1999. The Relation between Funding by the National Institutes of Health and the Burden of Disease. New England Journal of Medicine, 340(24):1881-1887.
Haltiwanger, J., Jarmin, R., and Miranda, J. 2010. Who Creates Jobs? Small vs. Large vs. Young. NBER Working Paper 16300. Cambridge, MA: National Bureau of Economic Research.
Hegde, D., and Mowery, D. 2008. Politics and Funding in the U.S. Public Biomedical R and D System. Science, 322(19):1797-1798.
Heisey, P., King, K., Rubenstein, K., Bucks, D., and Welsh, R. 2010. Assessing the Benefits of Public Research Within an Economic Framework: The Case of USDA's Agricultural Research Service. United States Department of Agriculture, Economic Research Service, Economic Research Report Number 95.
Jones, B. 2010. As Science Evolves, How Can Science Policy? NBER Working Paper 16002. Cambridge, MA: National Bureau of Economic Research.
Jorgenson, D., Ho, M., and Samuels, J. 2010. New Data on U.S. Productivity Growth by Industry. Paper presented at the World KLEMS Conference, Harvard University, August 19-20, 2010.
Kanninen, S., and Lemola, T. 2006. Methods for Evaluating the Impact of Basic Research Funding. Helsinki, Finland: Academy of Finland.
Kevles, B. 1997. Naked to the Bone. New Brunswick, NJ: Rutgers University Press.
Kettl, D. 1997. The Global Revolution in Public Management: Driving Themes, Missing Links. Journal of Policy Analysis and Management, 16:446-462.
Koopmans, T. C. 1947. Measurement without Theory. Review of Economic Statistics, 29:161-172.
Lane, J., and Bertuzzi, S. 2011. Measuring the Results of Science Investments. Science, 331(6018):678-680.
Larsen, M. 2011. The Implications of Academic Enterprise for Public Science: An Overview of the Empirical Literature. Research Policy, 40:6-10.
Link, A. 2010. Retrospective Benefit-Cost Evaluation of U.S. DOE Vehicle Combustion Engine R and D Investments. Department of Economics Working Paper Series.
Link, A., and Scott, J. 2011. Public Goods, Public Gains. Oxford, UK: Oxford University Press.
Lundvall, B., and Borras, S. 2005. Science, Technology, and Innovation Policy. In The Oxford Handbook of Innovation, edited by J. Fagerberg, D. Mowery, and R. Nelson. Oxford, UK: Oxford University Press, 599-631.
Mansfield, E. 1991. Social Returns from R and D: Findings, Methods and Limitations. Research Technology Management, 34:6.
Manski, C. 2010. Policy Analysis with Incredible Certitude. NBER Working Paper 16207. Cambridge, MA: National Bureau of Economic Research.
Massachusetts Institute of Technology. 2011. The Third Revolution: The Convergence of the Life Sciences, Physical Sciences, and Engineering. Letter to our Colleagues, January 2011.
Moed, H. 2005. Citation Analysis in Research Evaluation. Dordrecht, The Netherlands: Springer.
Mohr, L. 1995. Impact Analysis for Program Evaluation, 2nd Edition. Thousand Oaks, CA: SAGE.
Mowery, D., and Rosenberg, N. 1989. Technology and the Pursuit of Economic Growth. Cambridge, UK: Cambridge University Press.
Murphy, K., and Topel, R. 2006. The Value of Health and Longevity. Journal of Political Economy, 114:871-904.
National Academy of Sciences. 1999. Evaluating Federal Research Programs. Washington, DC: National Academy Press.
National Academy of Sciences. 2007. Rising Above the Gathering Storm. Washington, DC: National Academies Press.
National Academy of Sciences. 2007. A Strategy for Assessing Science. Washington, DC: National Academies Press.
National Academy of Sciences. 2010. Managing University Intellectual Property in the Public Interest. Washington, DC: National Academies Press.
National Science Foundation. 2007. Changing U.S. Output of Scientific Articles: 1988-2003, Special Report. Arlington, VA: National Science Foundation.
Office of Management and Budget. 2008. Program Assessment Rating Tool Guidance, No. 2007-02.
Organisation for Economic Co-operation and Development. 2005. Modernising Government. Paris, FR: Organisation for Economic Co-operation and Development.
Organisation for Economic Co-operation and Development. 2010. OECD Science, Technology and Industry Outlook. Paris, FR: Organisation for Economic Co-operation and Development.
Payne, A. 2006. Earmarks and EPSCoR. In Shaping Science and Technology Policy, edited by D. Guston and D. Sarewitz. University of Wisconsin Press, 149-172.
Perrin, B. 1998. Effective Use and Misuse of Performance Measurement. American Journal of Evaluation, 19:367-379.
Radin, B. 2006. Challenging the Performance Movement. Washington, DC: Georgetown University Press.
Roessner, D., Bozeman, B., Feller, I., Hill, C., and Newman, N. 1997. The Role of NSF's Support of Engineering in Enabling Technological Innovation. Report to the National Science Foundation. Arlington, VA: SRI International.
Rosenberg, N. 1972. Technology and American Economic Growth. New York, NY: Harper Torchbooks.
Rosenberg, N. 1982. Learning by Using. In Inside the Black Box. Cambridge, UK: Cambridge University Press, 120-140.
Ruegg, R., and Feller, I. 2003. A Toolkit for Evaluating Public R and D Investment. NIST GCR 03-857. Gaithersburg, MD: National Institute of Standards and Technology.
Ruttan, V. 1982. Agricultural Research Policy. Minneapolis, MN: University of Minnesota Press.
Sarewitz, D. 1996. Frontiers of Illusion. Philadelphia, PA: Temple University Press.
Savage, J. 1999. Funding Science in America: Congress, Universities, and the Politics of the Academic Pork Barrel. Cambridge, UK: Cambridge University Press.
Schmoch, U., Schubert, T., Jansen, D., Heidler, R., and von Gortz, R. 2010. How to Use Indicators to Measure Scientific Performance: A Balanced Approach. Research Evaluation, 19:2-18.
Schubert, T. 2009. Empirical Observations on New Public Management to Increase Efficiency in Public Research: Boon or Bane? Research Policy, 38:1225-1234.
Shils, E., editor. 1969. Criteria for Scientific Development: Public Policy and National Goals. Cambridge, MA: MIT Press.
Stokols, D., Hall, K., Taylor, B., and Moser, R. 2008. The Science of Team Science. American Journal of Preventive Medicine, 35:S77-S89.
U.S. House of Representatives, Committee on Science and Technology. 1986. The Nobel Prize Awards in Science as a Measure of National Strength in Science. Science Policy Study Background Report No. 3, 99th Congress, Second Session.
Von Hippel, E. 2005. Democratizing Innovation. Cambridge, MA: MIT Press.
Wagner, C., and Flanagan, A. 1995. Workshop on the Metrics of Fundamental Science: A Summary. Washington, DC: Critical Technologies Institute. Prepared for the Office of Science and Technology Policy, PM-379-OSTP.
Weingart, P. 2005. Impact of Bibliometrics upon the Science System: Inadvertent Consequences? Scientometrics, 62:117-131.
Wilson, J. 1989. Bureaucracy. New York, NY: Basic Books.
Zakaria, F. 2010. How to Restore the American Dream. Time, October 21, 2010.