2

THE USES AND MISUSES OF PERFORMANCE MEASURES

Economists, policy analysts, and other scholars have studied the returns from federal research investments for decades, and they have made considerable progress. But basic questions still have only partial answers: What percentage of the gross domestic product (GDP) should be devoted to research and development? How should research dollars be allocated among fields of research? Which institutions and researchers can conduct research most efficiently and productively?

In the first session of the workshop, three speakers addressed the broad and complex issues that arise in attempts to answer these questions on the basis of empirical evidence. Each emphasized that the issues are exceedingly complex, and each offered a partly personal perspective on the workshop topic. Their observations and reflections provided a basis for many of the presentations that followed.

THE PROMISE AND THE LIMITS OF MEASURING THE IMPACT OF FEDERALLY SUPPORTED RESEARCH

The endeavor to measure the impacts of federally supported research has an inherent tension, said Irwin Feller, Senior Visiting Scientist at the American Association for the Advancement of Science (AAAS) and Professor Emeritus of Economics at Pennsylvania State University, who spoke on one of the two papers commissioned by the organizing committee in preparation for the workshop (Appendix C). One objective of performance measures is to guide public decision making. Yet the task can be so difficult—and sometimes counterproductive—that it leads to what Feller, quoting John Bunyan’s Pilgrim’s Progress, called the Slough of Despond. The basic problem, as



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




MEASURING THE IMPACTS OF FEDERAL INVESTMENTS IN RESEARCH

Einstein stated, is that “not everything that counts can be counted, and not everything that can be counted counts,” a phrase that was quoted several times during the workshop.

The Multiple Uses of Performance Measures

Performance measures have many uses, Feller continued. First, they are used to do retrospective assessments of realized, observed, and measured impacts. In this case, basic questions are: How has that program worked? Has it produced the results for which it was funded? How could these research advances contribute to societal objectives?

Second, performance measures can be used to assess the best direction in which to head. Is this where scientific advances will occur? Will these scientific advances lead to the achievement of societal objectives?

Finally, performance measures can benchmark accomplishments against historical or international measures and advocate for particular actions.

In each of these cases, performance measures have little relevance in the abstract, Feller said. They need to be related to the decisions at hand, and their promise and limitations depend on the decision being made. “They are quite necessary and productive for certain types of decisions, problematic for others, and harmful for others.”

The context of performance measures determines much of their promise and limitations, according to Feller. A critical question is who is asking the questions. In a university setting, a promotion and tenure committee might ask about publications and citations while a dean or president might ask which areas of the university to support. In the federal government, a member of Congress might ask whether appropriations for a particular laboratory will produce jobs in his or her district, the director of OSTP might ask questions about recommendations to make to the President, and the director of the Office of Management and Budget (OMB) might ask about U.S. research expenditures relative to all other demands on the budget. Similarly, different federal agencies might ask different questions. NSF might want to know how to use research to advance the frontiers of knowledge, while the EPA might want to use science to support regulatory decisions.

Performance measures have been the focus of longstanding and diverse research traditions, Feller said. Over the course of four decades, he has studied patent data, bibliometrics, and many other measures
related to research performance. The economics literature continues to produce more refined measures, better data, and new estimation techniques. Feller cited one study that used 37 performance measures in terms of outputs, outcomes, and impacts. Scorecards that compile measures, both nationally and internationally, also are proliferating.

New theories, models, techniques, and datasets are producing an intellectual ferment in the use of performance measures. In addition, the community of practice is strengthening, which will increase the supply and use of research-based, policy-relevant performance measures. “This is a rich and fertile field for exploration, for discovery, and for development,” Feller observed.

The Promise of Performance Measures

The promise of performance measures is that they provide useful baselines for assessing several forms of accountability. First, such measures provide evidence that an agency, laboratory, or individual is making good use of allocated funds. Second, well-defined objectives and documentation of results facilitate communication with funders, performers, users, and others. Results become verifiable and quantifiable information on what has been done.

Performance measures focus attention on the ultimate objectives of public policy. Researchers and policymakers sometimes refer to the “black box” of innovation (the complex process of turning knowledge into applications), and much research done in economics and related disciplines tries to explain what goes on inside the black box.

Finally, performance measures can help policymakers avoid “fads” that direct attention in unproductive ways. Data can document that some phenomena do not have a solid evidentiary base and that it is time to move on.

The Limits of Performance Measures

An obvious limit on performance measures is that the returns on research are uncertain, long term, and circuitous. This makes it difficult to put research into a strict accountability regime. Doing so “loses sight of the dynamics of science and technology,” Feller said. In addition, impacts typically depend on complementary actions by entities other than the federal government. This is particularly the case as
fundamental research moves toward technological innovation, implementation, and practice.

A less obvious limitation is that the benefits from failure are often underestimated by performance measures. Risk and uncertainty are inevitable in research, which means that research often generates negative results. Yet such results can redirect research into extremely productive directions, Feller said.

The selection of performance measures can also offer what Feller called a specious precision. Different measurable outcomes such as productivity, employment, competitiveness, and growth are not necessarily compatible with each other. There may also be tradeoffs among measures, so that greater accuracy in one generates greater uncertainty in another.

The selection of performance measures can distort incentives. Research managers strive to improve performance on the measures selected, which can lead to results that are not necessarily compatible with longer-term objectives.

A final limitation, according to Feller, is that there is limited public evidence to date of the contributions that performance measurement has made to improve decision making.

Three Major Questions

Federal science policy must ask three big questions, Feller observed:

1. How much money should be allocated to federal research?
2. How much money should be spent across missions, agencies, or fields of research?
3. Which performers should conduct research, and what are the allocation criteria used to distribute these funds?

Performance measures do not provide a basis for answering the first of these questions. They do not indicate if the ratio of R&D to gross domestic product (GDP) should be 2.8 percent, 3 percent, 3.2 percent, 4 percent, or 6 percent. “I don’t know if there is any evidence to support one level rather than the other,” said Feller.

With regard to the allocation of money across fields, performance measures lead to multiple answers and therefore to multiple possible decisions. For example, bibliometric studies among journals might point toward the importance of biochemistry, economic research might point to
the influence of computer engineering, and survey research on the use of scientific knowledge by industry might point to the need to support engineering and applied research fields. Of course, all scientific fields are connected to others, but that does not help make decisions about where to increase funding at the margin. “Depending on the methodology and the performance measures you use, you get different fields of science that tend to be emphasized,” said Feller.

Performance measures have greater potential, Feller continued, in deciding among the performers of research, whether universities, government laboratories, non-governmental organizations, or other research institutes, and among investigators. Agencies often have to make such decisions, along with decisions about the structure of research teams and centers. However, performance measures are currently underused for this purpose.

Do No Harm

It is critically important to “do no harm,” Feller emphasized. A major goal of developing performance measures is to improve the quality of decision making. But there are dangers in relying too heavily on performance measures. For example, some states are discussing the use of performance measures to determine funding levels for higher education, despite their many limitations. Some policymakers “are moving pell-mell into the Slough of Despond, and I think that’s what you want to avoid.”

Policy analysts also must be careful not to overpromise what performance measures can do. Analysts will be called to account if their measures turn out to be mistaken and lead to harmful decisions, Feller concluded.

INNOVATION AS AN ECOSYSTEM

Daniel Sarewitz, Professor of Science and Society at Arizona State University, reinforced and expanded on Feller’s comments. The fundamental assumption of the workshop, he said, is that federal investments in research have returns to society that can be measured. However, this assumption raises the much larger question of how the innovation system operates.

Policymakers have a tendency to simplify the operation of the system. For example, they may draw a straightforward connection between basic research and applications and
imply that the basic task is to speed the movement from the former to the latter. It is “discouraging,” said Sarewitz, that policymakers still feel a need to present such simplifications to garner public support.

Rather than introducing performance metrics into an oversimplified narrative, Sarewitz continued, perhaps it would be better to improve the narrative. This requires re-examining the role of research in the broader innovation process.

The Features of Complex Systems

Case studies of the role of research in innovation reveal an extremely complex process in which research is an important element of the process but not the only important element. “Everything is connected to everything else,” said Sarewitz. “It’s an ecosystem, and all things flow in different ways at different times depending on who is looking when and where in the process.” For example, technology often enables basic science to address new questions. Similarly, tacit knowledge acquired through the day-to-day practice of, for example, engineers or physicians can raise important questions for researchers.

As an example, Sarewitz cited a statement by former NIH Director Harold Varmus that some cancer treatments are “unreasonably effective” but that it is hard to fund research on these treatments because such research is considered high risk. “I was stunned by this, because my view of the complexity of the innovation system is that if we understand that technologies and practices themselves are sources of problems that research can address, then one ought to see unreasonably effective cancer treatments as an incredibly potent attractor of research.” However, the predominant model of research pursued at NIH is to understand the fundamental dynamics of a disease, which then will lead rationally toward the best treatments to use.

There is a deeper problem, said Sarewitz. In a complex system such as the innovation ecosystem, there is no reason to believe that optimizing the performance of any one part of the system will optimize or even necessarily improve the performance of the system as a whole. “Another way to put this is that research is not an independent variable in the innovation system. We generally don’t know what the independent variables are. For analytical purposes there may not be any.”

The connections that link the elements of the innovation system represent contextual factors that can be crucial determinants of performance. Factors such as trust among the people in an institution, administrative structures that allow for rapid learning and adaptation, or
historical ties between different institutions that allow them to work together can be very important for determining the dynamics and ultimate success of complex innovation processes. These sorts of internal systems dynamics can be teased out through careful case studies, Sarewitz said. But they are very difficult to capture in de-contextualized and rigid performance measures.

The Policy Perspective

Policymakers have an array of tools that they can use to try to influence the behavior of complex innovation processes. However, just a few of these tools relate directly to research, and the relations among these tools are poorly understood. For example, analysts would have difficulty measuring and comparing the performance of intramural laboratories and extramural university research without also knowing the institutional contexts of the research performers.

More generally, research performance measures may reveal little about the value and contextual appropriateness of the full array of science policy tools. For example, tools like demonstration and procurement, especially as done by the Department of Defense, have been enormous drivers of innovation in the past, yet they are outside the domain of research performance measures.

Given the importance of other factors, optimizing research performance could lead to undesired outcomes. These undesired outcomes may even have ethical and moral dimensions, said Sarewitz. For example, policy decisions in the early 1980s accelerated the privatization of the results of publicly funded research and helped to elevate the importance of patents as an apparent indicator of innovation. However, these policy decisions have consequences that bear on equity of access to some of the products of publicly funded research. In the medical arena, to cite an example Sarewitz mentioned, they could have slowed innovation in socially important domains of research, such as the development of agricultural biotechnologies for developing countries.

Innovative Approaches

The science and technology policy and research communities have to engage as imaginatively as possible in expanding the array of approaches used to understand, assess, and talk about innovation
processes and their outcomes in society, Sarewitz said. First, new understandings of complex innovation processes can be used to help improve policy making. Case studies, for example, can produce synthetic systems-oriented insights that can have a powerful and enriching impact on policy making and “hopefully, change the narrative.”

Second, the science policy research community can do a better job of coming up with diverse performance criteria and measures that can support rather than displace qualitative insights. An interesting recent example involved the public policy analogues of market failures, which could be used to drive public investments in the same way that market failures have in the past (Bozeman and Sarewitz, 2005). “We don’t know yet if this particular approach is going to turn out to be a valuable tool,” said Sarewitz. “The point I’m trying to make is that the narrow array of things we are now measuring as indicators of performance of the innovation system, mostly matters of research productivity, is impoverished and we can and should do better.”

Research is crucially important in innovation, Sarewitz concluded. But its importance is contextual and contingent in space, among institutions, and over time. “If decision makers focus on optimizing performance and the innovation enterprise based on measures that largely deal with research, research performance, and research outputs, they’ll likely fail to achieve the goals that the public expects from the nation’s R&D investment.”

OVERCOMING THE CHALLENGES OF RESEARCH MEASURES

In a commentary on Feller’s and Sarewitz’s presentations, Alfred Spector, Vice President at Google, agreed that mechanisms are needed to determine the right amount, the proper balance, and the overall effectiveness of research investments. But he also pointed out that these mechanisms face several challenges.

First, measurement imposes overhead on the research community. Especially when the measurements do not seem to be related to specific outcomes, researchers can chafe at the time and effort involved in filling out forms or answering questions. If measurements were simple, overhead would be reduced. But the innovation system is complex and single measures can be misleading, which means that multiple measures are needed.
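
As a rough illustration of why a single measure can mislead, the following Python sketch ranks three fields on three measures. The field names echo the examples Feller gave, but every score is invented purely for illustration, and the names `FIELD_SCORES`, `top_field`, and `top_field_by_measure` are hypothetical rather than drawn from any workshop material. It is a toy under those assumptions, not an evaluation method.

```python
# All scores below are invented for illustration only; they are NOT data
# from the workshop or from any study cited in it.
FIELD_SCORES = {
    "biochemistry": {
        "citations": 0.9, "economic_return": 0.5, "industry_use": 0.4,
    },
    "computer engineering": {
        "citations": 0.5, "economic_return": 0.9, "industry_use": 0.6,
    },
    "applied engineering": {
        "citations": 0.3, "economic_return": 0.6, "industry_use": 0.9,
    },
}


def top_field(scores, measure):
    """Return the field with the highest score on one measure."""
    return max(scores, key=lambda field: scores[field][measure])


def top_field_by_measure(scores):
    """Map each measure to the field that it alone would rank first."""
    measures = next(iter(scores.values())).keys()
    return {measure: top_field(scores, measure) for measure in measures}


if __name__ == "__main__":
    for measure, field in top_field_by_measure(FIELD_SCORES).items():
        print(f"{measure}: {field}")
```

With these made-up scores, each measure places a different field first, so a funding decision based on any one measure alone would favor a different field.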

The act of measuring also can perturb the research being done. Spector cited an example from computer science involving the relative emphasis on patenting. He said that most people working in his field would conclude that greater emphasis on patenting would reduce the rate of innovation. “Most faculty agree that patents in computer science basically are almost always a bar that reduces the rate of innovation by creating rigidities and without the benefits of the economic incentives that are supposedly being provided. This may not be true in the biotechnologies, but it is true, I believe, in my field.”

Some measures also may be outdated. For example, publications have been important in the past. But in computer science today, an important product of research is open source software that is broadly disseminated. Such dissemination is a form of publication, but it is not a refereed publication that traditionally has factored into evaluations. Similarly, open standards can be incredibly valuable and powerful, as can proprietary products that establish the state of the art and motivate competition.

Accounting for Overlooked Measures

Greater transparency can help overcome these challenges, said Spector. The growth of modern communication technologies makes transparency much more feasible today than in the past, providing a more open view of research outcomes. Similarly, better visualizations can produce representations that are useful to policymakers and the public in assessing the value of research.

One of the most important products of research, though it is sometimes overlooked, is the training of people, Spector said. “If you talk to most of my peers in industry, what we really care about as much as anything else is the immense amount of training that goes on through the research that’s done.” For example, venture capitalists would rate talent as the most important input into innovation.

Also, the diversity of research approaches can be an important factor in research. In computer science, for example, funding has come not only from the NSF, in which peer review largely determines what science will be done, but also from the Defense Advanced Research Projects Agency, which has a much more mission-oriented approach. “DARPA has made huge bets, primarily on teams that they believed would win those bets. That has also resulted in huge results.” However
research is measured, it has to accommodate different approaches to realize the advantages of diversity, Spector said.

Failure is an important aspect of research. If there is no failure in research projects, then they are not at the right point on the risk-reward spectrum, said Spector. Rewarding failure may not seem like a good thing, but for research it can be essential. At Google, said Spector, “we view it as a badge of honor to agree that a certain line of advanced technology or research is not working and to stop and do something else. I think we need to have measurements like that in the world at large, although it’s clearly a challenging thing to do.”

Finally, the potential for serendipity needs to be rewarded. “If everything is so strongly controlled, I have a feeling we’ll do whatever the establishment feels is right and serendipity will be removed.” Serendipity often produces the creative disruption that reshapes entire industries, Spector concluded.

DISCUSSION

In response to a question about using measures of research outcomes to increase commercialization, Feller warned against the distortions such initiatives can produce in agencies such as NSF. He agreed with Spector that industry is more interested in the trained students research produces than in specific findings or patents. Also, researchers are usually not able to predict with certainty the commercial or societal implications of their research.

However, Feller added that it may be possible to document the need for transformative research. For example, NSF has been funding Science and Technology Centers that are focused on emerging scientific opportunities with important societal implications, such as hydrological research or the atmospheric sciences, that can have difficulty obtaining funding through conventional channels because they are too risky or large. These centers can even be evaluated in part using traditional measures, such as the number of collaborators from different disciplines on papers. Sarewitz agreed that the agencies need to emphasize high-risk research because universities tend to pursue incremental change.

A workshop participant asked about the best way to evaluate research across an entire agency such as NSF to make decisions about the allocation of funding. Feller emphasized the importance of truth and transparency. He praised the work of the Science of Science and
Innovation Policy (SciSIP) Program at NSF and said that NSF needs to draw on the expertise being developed by the program and elsewhere in the agency. He also noted the need to re-fashion the Government Performance and Results Act (GPRA) to be more suited to research. At the same time, he noted the potential problem of researcher overhead and the need for measures to produce useful information. Sarewitz added that increments of information tend to have no impact on institutional decision-making processes.

Measures of research performance can help agencies “get their house in order,” said Feller, since many allocation decisions are still internal to agencies. However, measures demonstrating positive research outcomes do not necessarily guarantee that Congress will continue to allocate funds for those programs. “At some point, these remain fundamentally political decisions with a strong tang of ideology,” said Feller. Congress or OMB can always question, for example, whether a given program is an appropriate role for government.

Sarewitz pointed out that oversimplified narratives of innovation can contribute to this politicization. If policymakers had a more sophisticated perspective on innovation, they would be more willing to accept a multi-faceted government role rather than devoting money solely to research. Spector added that information technologies provide new ways to disseminate these more sophisticated narratives, regardless of the origins and targets of those narratives.

David Goldston, who was on the planning committee for the workshop, pointed out that research funding decisions are inherently political. Showing that a given program is working usually answers a different set of questions than the opponents of a program are asking. Feller responded that dealing with the objections raised by the opponents of a program is like dealing with counterfactual scenarios, in which new scenarios can constantly be created that either have not been tested or are impossible to test. Nevertheless, the perspectives of policymakers on research have changed dramatically over the last few decades, so that they generally accept the need for the federal government to support fundamental research.