2
THE USES AND MISUSES OF PERFORMANCE MEASURES
Economists, policy analysts, and other scholars have studied the
returns from federal research investments for decades, and they have
made considerable progress. But basic questions still have only partial
answers: What percentage of the gross domestic product (GDP) should
be devoted to research and development? How should research dollars be
allocated among fields of research? Which institutions and researchers
can conduct research most efficiently and productively?
In the first session of the workshop, three speakers addressed the
broad and complex issues that arise in attempts to answer these questions
on the basis of empirical evidence. Each emphasized that the issues are
exceedingly complex, and each offered a partly personal perspective on
the workshop topic. Their observations and reflections provided a basis
for many of the presentations that followed.
THE PROMISE AND THE LIMITS OF MEASURING THE IMPACT OF FEDERALLY SUPPORTED RESEARCH
The endeavor to measure the impacts of federally supported
research has an inherent tension, said Irwin Feller, Senior Visiting
Scientist at the American Association for the Advancement of Science
(AAAS) and Professor Emeritus of Economics at Pennsylvania State
University, who spoke on one of the two papers commissioned by the
organizing committee in preparation for the workshop (Appendix C).
One objective of performance measures is to guide public decision
making. Yet the task can be so difficult—and sometimes
counterproductive—that it leads to what Feller, quoting John Bunyan’s
Pilgrim’s Progress, called the Slough of Despond. The basic problem, as
8 MEASURING THE IMPACTS OF FEDERAL INVESTMENTS IN RESEARCH
Einstein stated, is that “not everything that counts can be counted, and
not everything that can be counted counts”—a phrase that was quoted
several times during the workshop.
The Multiple Uses of Performance Measures
Performance measures have many uses, Feller continued. First, they
support retrospective assessments of realized, observed, and
measured impacts. In this case, the basic questions are: How has that
program worked? Has it produced the results for which it was funded?
How could these research advances contribute to societal objectives?
Second, performance measures can be used to assess the best
direction in which to head. Is this where scientific advances will occur?
Will these scientific advances lead to the achievement of societal
objectives?
Finally, performance measures can benchmark accomplishments
against historical or international measures and advocate for particular
actions.
In each of these cases, performance measures have little relevance
in the abstract, Feller said. They need to be related to the decisions at
hand, and their promise and limitations depend on the decision being
made. “They are quite necessary and productive for certain types of
decisions, problematic for others, and harmful for others.”
The context of performance measures determines much of their
promise and limitations, according to Feller. A critical question is who is
asking the questions. In a university setting, a promotion and tenure
committee might ask about publications and citations while a dean or
president might ask which areas of the university to support. In the
federal government, a member of Congress might ask whether
appropriations for a particular laboratory will produce jobs in his or her
district, the director of OSTP might ask questions about
recommendations to make to the President, and the director of the Office
of Management and Budget (OMB) might ask about U.S. research
expenditures relative to all other demands on the budget. Similarly,
different federal agencies might ask different questions. NSF might want
to know how to use research to advance the frontiers of knowledge,
while the EPA might want to use science to support regulatory decisions.
Performance measures have been the focus of longstanding and
diverse research traditions, Feller said. Over the course of four decades,
he has studied patent data, bibliometrics, and many other measures
related to research performance. The economics literature continues to
produce more refined measures, better data, and new estimation
techniques. Feller cited one study that used 37 performance measures in
terms of outputs, outcomes, and impacts. Scorecards that compile
measures, both nationally and internationally, also are proliferating. New
theories, models, techniques, and datasets are producing an intellectual
ferment in the use of performance measures. In addition, the community
of practice is strengthening, which will increase the supply and use of
research-based, policy-relevant performance measures. “This is a rich
and fertile field for exploration, for discovery, and for development,”
Feller observed.
The Promise of Performance Measures
As for their promise, performance measures provide
useful baselines for assessing several forms of accountability.
First, such measures provide evidence that an agency, laboratory, or
individual is making good use of allocated funds.
Second, well-defined objectives and documentation of results
facilitate communication with funders, performers, users, and others.
Results become verifiable and quantifiable information on what has been
done.
Third, performance measures focus attention on the ultimate objectives of
public policy. Researchers and policymakers sometimes refer to the
“black box” of innovation (the complex process of turning knowledge
into applications), and much research done in economics and related
disciplines tries to explain what goes on inside the black box.
Finally, performance measures can help policymakers avoid “fads”
that direct attention in unproductive ways. Data can document that some
phenomena do not have a solid evidentiary base and that it is time to
move on.
The Limits of Performance Measures
An obvious limit on performance measures is that the returns on
research are uncertain, long term, and circuitous. This makes it difficult
to put research into a strict accountability regime. Doing so “loses sight
of the dynamics of science and technology,” Feller said.
In addition, impacts typically depend on complementary actions by
entities other than the federal government. This is particularly the case as
fundamental research moves toward technological innovation,
implementation, and practice.
A less obvious limitation is that the benefits from failure are often
underestimated by performance measures. Risk and uncertainty are
inevitable in research, which means that research often generates
negative results. Yet such results can redirect research into extremely
productive directions, Feller said.
The selection of performance measures can also offer what Feller
called a specious precision. Different measurable outcomes such as
productivity, employment, competitiveness, and growth are not
necessarily compatible with each other. There may also be tradeoffs
among measures, so that greater accuracy in one generates greater
uncertainty in another.
The selection of performance measures can distort incentives.
Research managers strive to improve performance on the measures
selected, which can lead to results that are not necessarily compatible
with longer-term objectives.
A final limitation, according to Feller, is that there is limited public
evidence to date of the contributions that performance measurement has
made to improve decision making.
Three Major Questions
Federal science policy must ask three big questions, Feller
observed:
1. How much money should be allocated to federal research?
2. How much money should be spent across missions, agencies, or fields
of research?
3. Which performers should conduct research, and what are the
allocation criteria used to distribute these funds?
Performance measures do not provide a basis for answering the first
of these questions. They do not indicate whether the ratio of R&D to gross
domestic product (GDP) should be 2.8 percent, 3 percent, 3.2 percent, 4
percent, or 6 percent. “I don’t know if there is any evidence to support
one level rather than the other,” said Feller.
With regard to the allocation of money across fields, performance
measures lead to multiple answers and therefore to multiple possible
decisions. For example, bibliometric studies among journals might point
toward the importance of biochemistry, economic research might point to
the influence of computer engineering, and survey research on the use of
scientific knowledge by industry might point to the need to support
engineering and applied research fields. Of course, all scientific fields are
connected to others, but that does not help make decisions about where
to increase funding at the margin. “Depending on the methodology and
the performance measures you use, you get different fields of science
that tend to be emphasized,” said Feller.
Performance measures have greater potential, Feller continued, in
deciding among the performers of research, whether universities,
government laboratories, non-governmental organizations, or other
research institutes, and among investigators. Agencies often have to make
such decisions, along with decisions about the structure of research
teams and centers. However, performance measures are currently
underused for this purpose.
Do No Harm
It is critically important to “do no harm,” Feller emphasized. A
major goal of developing performance measures is to improve the quality
of decision making. But there are dangers in relying too heavily on
performance measures. For example, some states are discussing the use
of performance measures to determine funding levels for higher
education, despite their many limitations. Some policymakers “are
moving pell-mell into the Slough of Despond, and I think that’s what you
want to avoid.”
Policy analysts also must be careful not to overpromise what
performance measures can do. Analysts will be called to account if their
measures turn out to be mistaken and lead to harmful decisions, Feller
concluded.
INNOVATION AS AN ECOSYSTEM
Daniel Sarewitz, Professor of Science and Society at Arizona State
University, reinforced and expanded on Feller’s comments. The
fundamental assumption of the workshop, he said, is that federal
investments in research have returns to society that can be measured.
However, this assumption raises the much larger question of how the
innovation system operates. Policymakers have a tendency to simplify
the operation of the system. For example, they may draw a
straightforward connection between basic research and applications and
imply that the basic task is to speed the movement from the former to the
latter. It is “discouraging,” said Sarewitz, that policymakers still feel a
need to present such simplifications to garner public support.
Rather than introducing performance metrics into an oversimplified
narrative, Sarewitz continued, perhaps it would be better to improve the
narrative. This requires re-examining the role of research in the broader
innovation process.
The Features of Complex Systems
Case studies of the role of research in innovation reveal an
extremely complex process in which research is an important element of
the process but not the only important element. “Everything is connected
to everything else,” said Sarewitz. “It’s an ecosystem, and all things flow
in different ways at different times depending on who is looking when
and where in the process.” For example, technology often enables basic
science to address new questions. Similarly, tacit knowledge acquired
through the day-to-day practice of, for example, engineers or physicians
can raise important questions for researchers. As an example, Sarewitz
cited a statement by former NIH Director Harold Varmus that some
cancer treatments are “unreasonably effective” but that it is hard to fund
research on these treatments because such research is considered high
risk. “I was stunned by this, because my view of the complexity of the
innovation system is that if we understand that technologies and practices
themselves are sources of problems that research can address, then one
ought to see unreasonably effective cancer treatments as an incredibly
potent attractor of research.” However, the predominant model of
research pursued at NIH is to understand the fundamental dynamics of a
disease, which then will lead rationally toward the best treatments to use.
There is a deeper problem, said Sarewitz. In a complex system such
as the innovation ecosystem, there is no reason to believe that optimizing
the performance of any one part of the system will optimize or even
necessarily improve the performance of the system as a whole. “Another
way to put this is that research is not an independent variable in the
innovation system. We generally don’t know what the independent
variables are. For analytical purposes there may not be any.”
The connections that link the elements of the innovation system
represent contextual factors that can be crucial determinants of
performance. Factors such as trust among the people in an institution,
administrative structures that allow for rapid learning and adaptation, or
historical ties between different institutions that allow them to work
together can be very important for determining the dynamics and
ultimate success of complex innovation processes. These sorts of internal
systems dynamics can be teased out through careful case studies,
Sarewitz said. But they are very difficult to capture in de-contextualized
and rigid performance measures.
The Policy Perspective
Policymakers have an array of tools that they can use to try to
influence the behavior of complex innovation processes. However, just a
few of these tools relate directly to research, and the relations among
these tools are poorly understood. For example, analysts would have
difficulty measuring and comparing the performance of intramural
laboratories and extramural university research without also knowing the
institutional contexts of the research performers.
More generally, research performance measures may reveal little
about the value and contextual appropriateness of the full array of
science policy tools. For example, tools like demonstration and
procurement, especially as done by the Department of Defense, have
been enormous drivers of innovation in the past, yet they are outside the
domain of research performance measures. Given the importance of
other factors, optimizing research performance could lead to undesired
outcomes.
These undesired outcomes may even have ethical and moral
dimensions, said Sarewitz. For example, policy decisions in the early
1980s accelerated the privatization of the results of publicly funded
research and helped to elevate the importance of patents as an apparent
indicator of innovation. However, these policy decisions have
consequences that bear on equity of access to some of the products of
publicly funded research. In the medical arena, to cite an example
Sarewitz mentioned, they could have slowed innovation in socially
important domains of research, such as the development of agricultural
biotechnologies for developing countries.
Innovative Approaches
The science and technology policy and research communities have
to engage as imaginatively as possible in expanding the array of
approaches used to understand, assess, and talk about innovation
processes and their outcomes in society, Sarewitz said. First, new
understandings of complex innovation processes can be used to help
improve policy making. Case studies, for example, can produce synthetic
systems-oriented insights that can have a powerful and enriching impact
on policy making and “hopefully, change the narrative.”
Second, the science policy research community can do a better job
of coming up with diverse performance criteria and measures that can
support rather than displace qualitative insights. An interesting recent
example involved the public policy analogues of market failures, which
could be used to drive public investments in the same way that market
failures have in the past (Bozeman and Sarewitz, 2005). “We don’t know
yet if this particular approach is going to turn out to be a valuable tool,”
said Sarewitz. “The point I’m trying to make is that the narrow array of
things we are now measuring as indicators of performance of the
innovation system, mostly matters of research productivity, is
impoverished and we can and should do better.”
Research is crucially important in innovation, Sarewitz concluded.
But its importance is contextual and contingent in space, among
institutions, and over time. “If decision makers focus on optimizing
performance and the innovation enterprise based on measures that
largely deal with research, research performance, and research outputs,
they’ll likely fail to achieve the goals that the public expects from the
nation’s R&D investment.”
OVERCOMING THE CHALLENGES OF RESEARCH MEASURES
In a commentary on Feller’s and Sarewitz’s presentations, Alfred
Spector, Vice President at Google, agreed that mechanisms are needed to
determine the right amount, the proper balance, and the overall
effectiveness of research investments. But he also pointed out that these
mechanisms face several challenges.
First, measurement imposes overhead on the research community.
Especially when the measurements do not seem to be related to specific
outcomes, researchers can chafe at the time and effort involved in filling
out forms or answering questions. If measurements were simple,
overhead would be reduced. But the innovation system is complex, and
single measures can be misleading, which means that multiple measures
are needed.
The act of measuring also can perturb the research being done.
Spector cited an example from computer science involving the relative
emphasis on patenting. He said that most people working in his field
would conclude that greater emphasis on patenting would reduce the rate
of innovation. “Most faculty agree that patents in computer science
basically are almost always a bar that reduces the rate of innovation by
creating rigidities and without the benefits of the economic incentives
that are supposedly being provided. This may not be true in the
biotechnologies, but it is true, I believe, in my field.”
Some measures also may be outdated. For example, publications
have been important in the past. But in computer science today, an
important product of research is open source software that is broadly
disseminated. Such dissemination is a form of publication, but it is not a
refereed publication that traditionally has factored into evaluations.
Similarly, open standards can be incredibly valuable and powerful, as
can proprietary products that establish the state of the art and motivate
competition.
Accounting for Overlooked Measures
Greater transparency can help overcome these challenges, said
Spector. The growth of modern communication technologies makes
transparency much more feasible today than in the past, providing a more
open view of research outcomes. Similarly, better visualizations can
produce representations that are useful to policymakers and the public in
assessing the value of research.
One of the most important products of research, though it is
sometimes overlooked, is the training of people, Spector said. “If you
talk to most of my peers in industry, what we really care about as much
as anything else is the immense amount of training that goes on through
the research that’s done.” For example, venture capitalists would rate
talent as the most important input into innovation.
Also, the diversity of research approaches can be an important
factor in research. In computer science, for example, funding has come
not only from the NSF, in which peer review largely determines what
science will be done, but also from the Defense Advanced Research
Projects Agency, which has a much more mission-oriented approach.
“DARPA has made huge bets, primarily on teams that they believed
would win those bets. That has also resulted in huge results.” However
research is measured, it has to accommodate different approaches to
realize the advantages of diversity, Spector said.
Failure is an important aspect of research. If there is no failure in
research projects, then they are not at the right point on the risk-reward
spectrum, said Spector. Rewarding failure may not seem like a good
thing, but for research it can be essential. At Google, said Spector, “we
view it as a badge of honor to agree that a certain line of advanced
technology or research is not working and to stop and do something else.
I think we need to have measurements like that in the world at large,
although it’s clearly a challenging thing to do.”
Finally, the potential for serendipity needs to be rewarded. “If
everything is so strongly controlled, I have a feeling we’ll do whatever
the establishment feels is right and serendipity will be removed.”
Serendipity often produces the creative disruption that reshapes entire
industries, Spector concluded.
DISCUSSION
In response to a question about using measures of research
outcomes to increase commercialization, Feller warned against the
distortions such initiatives can produce in agencies such as NSF. He
agreed with Spector that industry is more interested in the trained
students research produces than in specific findings or patents. Also,
researchers are usually not able to predict with certainty the commercial
or societal implications of their research.
However, Feller added that it may be possible to document the need
for transformative research. For example, NSF has been funding Science
and Technology Centers that are focused on emerging scientific
opportunities with important societal implications, such as hydrological
research or the atmospheric sciences, that can have difficulty obtaining
funding through conventional channels because they are too risky or
large. These centers can even be evaluated in part using traditional
measures, such as the number of collaborators from different disciplines
on papers. Sarewitz agreed that the agencies need to emphasize high-risk
research because universities tend to pursue incremental change.
A workshop participant asked about the best way to evaluate
research across an entire agency such as NSF to make decisions about
the allocation of funding. Feller emphasized the importance of truth and
transparency. He praised the work of the Science of Science and
Innovation Policy (SciSIP) Program at NSF and said that NSF needs to
draw on the expertise being developed by the program and elsewhere in
the agency. He also noted the need to re-fashion the Government
Performance and Results Act (GPRA) to be more suited to research. At
the same time, he noted the potential problem of researcher overhead and
the need for measures to produce useful information. Sarewitz added that
increments of information tend to have no impact on institutional
decision-making processes.
Measures of research performance can help agencies “get their
house in order,” said Feller, since many allocation decisions are still
internal to agencies. However, measures demonstrating positive research
outcomes do not necessarily guarantee that Congress will continue to
allocate funds for those programs. “At some point, these remain
fundamentally political decisions with a strong tang of ideology,” said
Feller. Congress or OMB can always question, for example, whether a
given program is an appropriate role for government.
Sarewitz pointed out that oversimplified narratives of innovation
can contribute to this politicization. If policymakers had a more
sophisticated perspective on innovation, they would be more willing to
accept a multi-faceted government role rather than devoting money
solely to research. Spector added that information technologies provide
new ways to disseminate these more sophisticated narratives, regardless
of the origins and targets of those narratives.
David Goldston, who was on the planning committee for the
workshop, pointed out that research funding decisions are inherently
political. Showing that a given program is working usually answers a
different set of questions than the opponents of a program are asking.
Feller responded that dealing with the objections raised by the opponents
of a program is like dealing with counterfactual scenarios, in which new
scenarios can constantly be created that either have not been tested or are
impossible to test. Nevertheless, the perspectives of policymakers on
research have changed dramatically over the last few decades, so that
they generally accept the need for the federal government to support
fundamental research.