2
Lessons Learned from Developing Metrics

Industry, academia, and federal agencies all have experience in measuring and monitoring research performance. This chapter describes lessons learned from these sectors as well as insight from retrospective analysis of the stratospheric ozone program of the 1970s and 1980s that might be useful to the Climate Change Science Program (CCSP).

INDUSTRY RESEARCH

Use of Metrics in Manufacturing

For more than 200 years,1 industry has employed metrics to monitor budget, safety, health, environmental impacts, material consumption, energy consumption, and product quality.2 A study group of 13 companies has been meeting since 1998 to identify metrics that could be useful tools for industry.3

1  DuPont, E.I., 1811, Workers’ rules, Accession 146, Hagley Museum, Manuscripts and Archives Division, Wilmington, Del.; Hounshell, D.A., and J.K. Smith, 1988, Science and Corporate Strategy, DuPont, p. 2; Kinnane, A., 2002, DuPont: From the Banks of the Brandywine to Miracles of Science, E.I. DuPont, Wilmington, Del., 268 pp.

2  Examples of financial metrics can be found in the annual report of almost any major chemical company. Quality management metrics appear in the International Organization for Standardization’s ISO 9000 (<http://www.iso.ch/iso/en/iso9000-14000/index.html>). Examples of safety, health, environmental, material consumption, and energy consumption metrics are given in National Academy of Engineering (NAE) and National Research Council, 1999, Industrial Environmental Performance Metrics: Challenges and Opportunities, National Academy Press, Washington, D.C., 252 pp.

3  The group meets under the sponsorship of the American Institute of Chemical Engineers’ (AIChE) Center for Waste Reduction Technology. See reports on AIChE collaborative projects, focus area on sustainable development, at <http://www.aiche.org/cwrt/pdf/BaselineMetrics.pdf>.





The group found that the development of useful metrics in the manufacturing sector begins with careful formulation of the objectives for creating them. Important questions to be considered include the following:

•  What is the purpose of the measurement effort?
•  What are the “issues” to be measured?
•  How are goals set for each issue?
•  How is performance measured for that issue?
•  How should the metric be compared to a performance standard?
•  How will the metric be communicated to the intended audience?

Metrics that have proven useful in the manufacturing sector tend to have the following attributes:

•  few in number, to avoid confusing the audience with excessive data;
•  simple and thus easily understood by a broad audience;
•  sufficiently accurate to be credible;
•  an agreed-upon definition;
•  relatively easy to develop, preferably using existing data;
•  robust, and thus requiring minimal exceptions and footnotes; and
•  sufficiently durable to remain relatively constant over the years.

Metrics used in manufacturing tend to focus on input, output, or process (see definitions in Box 1.3), and they are commonly normalized to enable comparisons. In general, output metrics (e.g., pounds of product per pound of raw material purchased) have been the most successful because they are highly specific, relatively unambiguous, and directly related to a specific end point. Over time, and frequently after adjustment based on learning, the use of metrics in the manufacturing sector has been so effective as to give rise to the maxim “what gets measured, gets managed.”
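As an illustration of how such a normalized output metric works in practice, the following is a minimal Python sketch; the plant names and figures are invented for illustration and do not come from this report:

```python
# Hypothetical illustration of a normalized manufacturing output metric:
# pounds of product per pound of raw material purchased.

def material_yield(product_lbs: float, raw_material_lbs: float) -> float:
    """Return product output per pound of raw material purchased."""
    return product_lbs / raw_material_lbs

# Normalization lets operations of very different sizes be compared directly.
plants = {"Plant A": (9_000, 10_000), "Plant B": (85_000, 100_000)}
for name, (product, raw) in plants.items():
    print(f"{name}: {material_yield(product, raw):.2f} lb product per lb raw material")
```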

Extension to Research and Development

Success in the manufacturing sector encouraged efforts to develop quantifiable metrics for research and development (R&D) beginning in the late 1970s.4 However, problems immediately arose. The most successful manufacturing metrics measured a discrete item of output that could be produced in a short amount of time. These conditions are difficult to achieve in R&D. Research outputs are far less easily defined and quantified than manufacturing outputs, and the proof that a particular metric measured something useful, such as a profitable product or an efficient process, might take years. Early metrics proposed for R&D included the following:

•  Input metrics: total expenses or other resources employed, expenses or resources consumed per principal investigator (PI), and PI activities such as number of technical meetings attended.
•  Output metrics: number of compounds or materials made or screened, and number of publications or patents per PI.
•  Outcome metrics: PI professional recognition earned.

The list of possible metrics was long, and failures were common. For example, “number of compounds made” could lead to an emphasis on “easy” chemistry instead of groundbreaking effort in a difficult, but potentially fruitful, area. Moreover, absent any professional judgment on the relevance and quality of items such as “technical meetings,” measurement of these items merely consumed time and money that might have been better spent elsewhere.

Based on these early lessons, a small number of process and output metrics emerged that proved useful to some businesses. These included

•  elapsed time to produce and register a quality product, from discovery to commercialization; and
•  creation of an idealized vision for processing operations, such as no downtime, no in-process delays, or zero emissions.

Although such goals, stated as process metrics, might not be reachable, they serve to drive research in a desirable direction. Ultimately, most R&D metrics fell from favor because of the long period between measurement and analysis of the result of R&D and the need for expert judgment in evaluating the quality of the item being measured. They have been widely replaced by a “stage-gate” approach for managing R&D.

In the stage-gate approach, the R&D process is divided into three or more stages, ranging from discovery through commercialization (see Table 2.1). The number of stages is usually specific to the business, increasing as the complexity and length of the R&D process increase. For example, there will be more stages in the R&D process of a drug company than in that of a company developing polymers.

4  Blaustein, M.A., 2003, Managing a breakthrough research portfolio for commercial success, Presentation to the American Chemical Society, March 25, 2003; Miller, J., and J. Hillenbrand, 2000, Meaningful R&D output metrics: An unmet need of technology and business leadership, Presentation at the Corporate Technology Council, E.I. DuPont, June 20, 2000.

TABLE 2.1 Example of the Stage-Gate Steps and Metrics for R&D in a Traditional Advanced Materials Chemical Industry (milestones for each metric theme, progressing from Stage 1, Feasibility, through Stage 2, Confirmation, to Stage 3, Commercialization)

Sustainable product
•  Customer needs have been analyzed
•  Improved properties, identified through analysis of customer needs, such as increased strength and corrosion or stain resistance, have been demonstrated
•  New discovery has led to an established patent position
•  Manufacturing or marketing strengths of the new discovery have been analyzed
•  Customer alternatives to the use of the new product have been analyzed

Economics
•  Product or process concept has been proven, even though economic practicality has not been established
•  Materials cost, process yield, catalyst life, capital intensity, and competitors have been analyzed
•  Sustained pilot operation has been achieved
•  Impurities and recycle streams have been analyzed

Customer acceptance
•  Target customers have been identified
•  Plan exists for partnerships and for access to market
•  Customer reaction to prototype has been satisfactory
•  Partnerships and access to market have been established

Safety and environment
•  Alternative materials have been considered
•  Radical processing concept has been considered
•  Safety in use has been analyzed
•  Inherently safe and “green” concepts have been demonstrated
•  Toxicology tests have been completed
•  Design exists for “fail-safe” operation and “zero” emissions

Groups of metrics are identified within each stage, and a satisfactory response to the metrics must be achieved before the project is allowed to proceed to the next stage. Advancement to successive stages can easily be tracked and converted to process metrics reflecting the status or progress of a program (e.g., a yes or no answer to whether the stage has been completed). The main difficulty with the stage-gate approach is in choosing the metric themes for each stage and assessing the quality of the results, both of which require professional judgment.

A stage-gate process such as that illustrated in Table 2.1 is generally initiated by the scientists in the organization, following an R&D discovery or a promising analysis.5 Generally, after a year of effort by a single principal investigator, the program either transitions to a managed stage-gate R&D program or is terminated due to apparent infeasibility or poor fit with the business intentions of the company. Whether or not an R&D project advances to Stage 1 depends on demonstration of the following (a sketch of this gate logic appears below):

•  technical feasibility,
•  scientific uniqueness,
•  availability of skills within the organization required to bring the research to fruition,
•  ability to identify a market within the growth areas promoted by the company,
•  ability to define realistic goals and objectives and to establish a clear focus and targets, and
•  ability to attract sponsorship by the business unit of the organization.

5  Blaustein, M.A., 2003, Managing a breakthrough research portfolio for commercial success, Presentation to the American Chemical Society, March 25, 2003; Carberry, J., 2004, Managing research programs via numerical metrics and/or a “stage gate” process, Presentation to the NAE Committee on Global Climate Change R&D, March 3, 2004.
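The gate decision itself reduces to checking that every criterion for the current stage has been judged satisfactory. Below is a minimal sketch of that logic in Python; the criterion names and data layout are illustrative assumptions, not an industry or CCSP standard:

```python
# Minimal sketch of a stage-gate advancement check (hypothetical criteria).
# A project advances only when every gate criterion for its current stage
# has been judged satisfactory, e.g., by a peer review panel.

STAGE_GATES = {
    "Stage 1 (Feasibility)": [
        "technical feasibility",
        "scientific uniqueness",
        "skills available in-house",
        "identifiable market in company growth areas",
        "realistic goals, clear focus and targets",
        "business-unit sponsorship",
    ],
    # Later stages (Confirmation, Commercialization) would list their own criteria.
}

def may_advance(stage: str, judgments: dict[str, bool]) -> bool:
    """Return True if every criterion for `stage` was judged satisfactory."""
    return all(judgments.get(criterion, False) for criterion in STAGE_GATES[stage])

# Example: one criterion unmet, so the project does not pass the gate.
review = {c: True for c in STAGE_GATES["Stage 1 (Feasibility)"]}
review["business-unit sponsorship"] = False
print(may_advance("Stage 1 (Feasibility)", review))  # False
```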

Tools for Strategic Analysis

Industry commonly uses metrics to guide strategic planning. Lessons learned from this experience include the following:

•  Metrics can be applied to most ongoing operations.
•  The greatest value of metrics will be achieved by selecting a few key issues and monitoring them over a long time.
•  Data for measuring research progress are generally of poor quality initially. Standardization of data collection, quality assessment, and verification are necessary to produce broadly credible results.
•  Most successful R&D programs measure progress against a clearly constructed business plan that includes a statement of the task, goals and milestones, budget, internal or external peer review plan, and communication plan.

Applicability to the CCSP

The industrial experience with metrics has much to offer the CCSP. For example, the attributes of manufacturing metrics (e.g., few metrics, easily understood) and the importance of expert judgment in assessing the relevance and quality of process and output metrics are likely to be widely applicable. In addition, a number of industry approaches (e.g., analysis of program resource distribution, use of R&D process metrics and peer review rankings, graphical summaries) could be used to guide strategic planning and improve R&D quality and progress. Finally, a stage-gate process might be used to help CCSP agencies plan how to move a program emphasis from the discovery phase that precedes Stage 1 (feasibility of using basic research results to improve decision making), to Stage 2 (developing and testing decision-making tools), to Stage 3 (decision making and communicating program results).

Following is a hypothetical example related to the CCSP. Suppose that R&D funding is $900 million and that it is divided among the CCSP goals and approaches according to Table 2.2 (shown graphically in Figure 2.1).

TABLE 2.2 Hypothetical Distribution of Funding Applied to CCSP Overarching Goals and Core Approaches (funding in millions of dollars)

CCSP Goals →            Improve    Improve         Reduce       Understand    Manage
Approaches ↓            Knowledge  Quantification  Uncertainty  Adaptability  Risk    Percentage
Fundamental research       100         85              20            40          37       31
Enhance observations        72         63              73            24          45       31
Aid decision making         19         37              48            65          80       28
Communicate results         26         17              16            14          19       10
Percentage                  25         22              17            16          20
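The percentage row and column in Table 2.2 follow directly from the funding matrix. A short computational sketch using the hypothetical figures above follows; note that independent rounding can differ by a percentage point from the printed table:

```python
# Sketch: derive the percentage distributions in Table 2.2 from the
# hypothetical funding matrix (values in millions of dollars).

goals = ["Improve Knowledge", "Improve Quantification", "Reduce Uncertainty",
         "Understand Adaptability", "Manage Risk"]
funding = {
    "Fundamental research": [100, 85, 20, 40, 37],
    "Enhance observations": [72, 63, 73, 24, 45],
    "Aid decision making":  [19, 37, 48, 65, 80],
    "Communicate results":  [26, 17, 16, 14, 19],
}
total = sum(sum(row) for row in funding.values())  # $900 million

for approach, row in funding.items():
    print(f"{approach}: {100 * sum(row) / total:.0f}% of program funding")
for j, goal in enumerate(goals):
    col = sum(row[j] for row in funding.values())
    print(f"{goal}: {100 * col / total:.0f}% of program funding")
```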

FIGURE 2.1 Distribution of effort (A) in the five CCSP overarching goals and (B) in the four CCSP core approaches, based on hypothetical data in Table 2.2.

From these data, program managers would decide whether the distribution of effort is appropriate or whether adjustments are needed. They may decide, for example, that too little of the effort is focused on communicating results. Once program managers are satisfied, the process of evaluating the quality of research activities can begin. Again for the hypothetical example above, assume that the $63 million to improve quantification by enhancing observations (Table 2.2) is divided among six research projects.

Assume also that 11 R&D measures of the management and leadership process have been developed and scored by peer review as shown in Table 2.3. The peer review panel might evaluate all six projects and rate the quality of the project management on, for example, a 1 to 5 scoring system. An illustration of that scoring system follows:

R&D Metric: Quality of the Internal or External Review Process for This Task
1 = Poor—no review plan in place; no reviews, even ad hoc
2 = Fair—no review plan in place; infrequent, ad hoc reviews; unreliable follow-up
3 = Average—review plan exists; irregularly followed; unreliable follow-up
4 = Good—plan exists; regularly followed; spotty follow-up
5 = Excellent—plan exists; regularly followed; excellent follow-up

TABLE 2.3 Hypothetical Example of 11 R&D Process Metrics Applied to Six Research Projects

Metric                                                      P1   P2   P3   P4   P5   P6   Avg
Quality of the internal or external peer review
  process for this task                                      4    5    3    2    2    4   3.3
Statement of the task is sufficiently focused and
  specific to be evaluated by the peer review process        4    5    3    3    3    3   3.5
Quality of the selection and definition of
  long-term goals                                            4    5    2    1    3    2   2.8
Quality of the selection and definition of milestones        3    3    2    1    1    2   2.0
Progress in achieving milestones                             3    3    1    1    1    2   1.8
Communication of the work                                    1    4    1    2    1    3   2.0
Projected cost to completion in relation to relative
  importance of the subject and total funds that
  might be available                                         2    4    1    3    3    3   2.7
Usefulness of the results in meeting the overall goal        4    5    1    2    3    3   3.0
Feasibility of completing the work in a time frame
  useful for the overall study                               4    4    2    2    4    3   3.2
What is the assessment of the scientific quality
  of the work?                                               4    5    3    4    2    3   3.5
What is the assessment of the performance versus
  the technical specification?                               4    4    2    3    3    3   3.2
Average                                                    3.5  4.5  2.2  2.5  2.8  3.4   2.8

NOTE: P1-P6 = Projects 1 to 6. Rankings are given on a 1-5 (poor to excellent) scale.
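Score matrices like Table 2.3 lend themselves to simple automated summaries that support the kinds of questions managers ask next. The sketch below uses a subset of the hypothetical scores above; the flagging threshold is arbitrary and chosen only for illustration:

```python
# Sketch: aggregate hypothetical peer review scores (1 = poor ... 5 = excellent)
# across projects and flag candidates for closer supervision.

scores = {  # metric -> scores for Projects 1-6, from Table 2.3 (subset)
    "Peer review process quality": [4, 5, 3, 2, 2, 4],
    "Task statement focus":        [4, 5, 3, 3, 3, 3],
    "Long-term goal definition":   [4, 5, 2, 1, 3, 2],
    "Milestone definition":        [3, 3, 2, 1, 1, 2],
    "Progress toward milestones":  [3, 3, 1, 1, 1, 2],
}

THRESHOLD = 2.5  # illustrative cutoff, not from the report

for project in range(6):
    avg = sum(row[project] for row in scores.values()) / len(scores)
    flag = "  <- review closely" if avg < THRESHOLD else ""
    print(f"Project {project + 1}: average {avg:.1f}{flag}")

# A uniformly low row (e.g., progress toward milestones) points at a
# program-wide problem rather than a single weak project.
row = scores["Progress toward milestones"]
print(f"Milestone progress, all projects: average {sum(row) / len(row):.1f}")
```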

From the information in Table 2.3, program managers could begin asking critical questions about the quality of the R&D effort, for example:

•  Is research project 3 so weak, based on its average score, that it should be discontinued or at least supervised more closely?
•  Why are the scores for progress in achieving milestones (the fifth measure) uniformly low, and what can be done?

UNIVERSITY RESEARCH

Metrics in academia are used to assess the performance of faculty, departments, and the university itself, as well as to manage resources. Metrics to evaluate the success of a university generally focus on outcomes and impacts, such as fraction of degrees completed, student satisfaction, success of the graduates, and national reputation.6

Faculty appointment and promotion systems are designed to evaluate a number of activities, including research, teaching, and service. Teaching and service metrics generally focus on outputs (e.g., number of undergraduates taught, courses developed, or committees served on), although judgment is required to assess the quality of teaching and to weigh the prestige of teaching awards and committee memberships. Peer review is the foundation of research assessment (Box 2.1), and it usually takes the form of internal committees that both review the candidate’s work and take account of outside letters of evaluation from experts in fields relevant to the particular candidate. These evaluations require a good deal of personal judgment—about qualities of mind, the influence of particular ideas or writings, and the person’s promise for future contributions—but usually these subjective judgments are bolstered by metrics of research performance. Examples of research metrics include the following:

•  number of articles or books that have been accepted in the published literature;
•  the subset of articles that have appeared in the “top” journals in a field (i.e., those viewed as having the toughest review);
•  number of other publications—including book chapters, conference proceedings, and research reports—that may not have been subjected to peer review;
•  number of citations;
•  number of honors and awards; and
•  amount of extramural funding.7

6  A number of reports rank universities by reputational measures such as the quality of research programs (e.g., National Research Council, 1995, Research-Doctorate Programs in the United States: Continuity and Change, National Academy Press, Washington, D.C., 768 pp.) or other characteristics, such as selection, retention, and graduation of students; faculty resources; and alumni giving (e.g., U.S. News and World Report, 2005, Best Colleges Index, <http://www.usnews.com/usnews/edu/college/rankings/rankindex_brief.php>). A major criticism of such national rankings is that they distract universities from trying to improve scholarship. See National Research Council, 2003, Assessing Research-Doctorate Programs: A Methodology Study, National Academies Press, Washington, D.C., 164 pp.

7  National Research Council, 1995, Research-Doctorate Programs in the United States: Continuity and Change, National Academy Press, Washington, D.C., 768 pp.; Graham, H.D., and N. Diamond, 1997, The Rise of American Research Universities: Elites and Challengers in the Postwar Era, Johns Hopkins University Press, Baltimore, Md., 319 pp.

Box 2.1 Scholarly Peer Review

Peer review is generally defined as a critical evaluation by independent experts of “the technical merit of research proposals, projects, and programs.”a A mainstay of the scientific process, peer review provides an “in-depth critique of assumptions, calculations, extrapolations, alternate interpretations, methodology and acceptance of criteria employed and conclusions drawn in the original work.”b While the focus on scientific expertise is paramount, commentators also note that the “peer review process is invariably judgmental and thus inevitably involves interplay between expert and personal judgments.”c

Definitions of peer review generally focus on the independence and the appropriate expertise of the peer reviewer. An adequate peer review satisfies three criteria: (1) it includes multiple assessments, (2) it is conducted by scientists who have expertise in the research in question, and (3) the scientists conducting the review have no direct connection to the research or its sponsors.c

The second criterion can be difficult to fulfill in evaluations of interdisciplinary work. Even if a peer review group with all of the relevant disciplines is assembled, its members may have difficulty seeing beyond the boundaries of their own disciplines to properly evaluate the integrated product. Ideally, each member of the evaluation group would invest significant time developing at least a basic understanding of the other relevant fields. However, this is a luxury that peer review committees rarely, if ever, have. The ideal of unconflicted peer review (criterion 3) is also usually not achieved, simply because there is a limited pool of experts and those most knowledgeable are also likely to be connected to the research and its sponsors. In such cases the objective becomes one of minimizing conflict of interest and bias.

a  National Research Council, 1998, Peer Review in Environmental Technology Development Programs, National Academy Press, Washington, D.C., p. 2.
b  Altman, W.D., J.P. Donnelly, and J.E. Kennedy, 1988, Peer Review for High-Level Nuclear Waste Repositories: Generic Technical Position, Nuclear Regulatory Commission, NUREG-1297, Washington, D.C., p. 2.
c  Salter, L., 1985, Science and peer review: The Canadian standard-setting experience, Science, Technology and Human Values, 10, 37–46.

[…]

Strategic goal: Serve society’s needs for weather and water information
Annual performance goal: Improve accuracy and timeliness of weather and water information
Performance measures:
•  Lead time (increased to 13 minutes), accuracy (increased to 73%), and false alarm rate (decreased to 69%) for severe weather warnings for tornadoes
•  Increased lead time (53 minutes) and accuracy (89%) for severe weather warnings for flash floods
•  Hurricane forecast track error (48 hour) reduced to 128
•  Accuracy (threat score) of day 1 precipitation forecasts increased to 27%
•  Increased lead time (15 hours) and accuracy (90%) for winter storm warnings
•  Cumulative percentage of U.S. shoreline and inland areas that have improved ability to reduce coastal hazard impacts increased to 28%

a  National Science Foundation FY 2005 Budget Request to Congress, <http://www.nsf.gov/bfa/bud/fy2005/pdf/fy2005.pdf>.
b  National Oceanic and Atmospheric Administration FY 2005 Annual Performance Plan, <http://www.osec.doc.gov/bmi/budget/05APP/NOAA05APP.pdf>.

TABLE 2.5 Climate Science-Related Performance Measures in OMB’s FY 2005 PART

Agency    Performance Measurea

DOE
•  Progress in delivering improved climate data and models for policy makers to determine safe levels of greenhouse gases and, by 2013, toward substantially reducing differences between observed temperature and model simulations at subcontinental scales using several decades of recent data. An independent expert panel will conduct a review and rate progress (excellent, adequate, poor) on a triennial basis

EPA
•  Million metric tons of carbon equivalent of greenhouse gas emissions reduced in the building (or industry or transportation) sector
•  Tons of greenhouse gas emissions prevented per societal dollar in the building (or industry or transportation) sector
•  Elimination of U.S. consumption of Class II ozone-depleting substances, measured in tons per year of ozone-depleting potential
•  Reductions in melanoma and nonmelanoma skin cancers, measured by millions of skin cancer cases avoided
•  Percentage reduction in equivalent effective stratospheric chlorine loading rates, measured as percent change in parts per trillion of chlorine per year
•  Cost (industry and EPA) per ozone-depletion-potential ton for phase-out targets

NASA
•  As validated by external review, and quantitatively where appropriate, demonstrate the ability of NASA-developed data sets, technologies, and models to enhance understanding of the Earth system, leading to improved predictive capability in each of the six science focus area roadmaps
•  Continue to develop and deploy advanced observing capabilities and acquire new observations to help resolve key [Earth system] science questions; progress and prioritization validated periodically by external review
•  Progress in understanding solar variability’s impact on space climate or global change in Earth’s atmosphere
•  Progress in developing the capability to predict solar activity and the evolution of solar disturbances as they propagate in the heliosphere and affect the Earth

NOAA
•  U.S. temperature forecast skill
•  Determine actual long-term changes in temperature (or precipitation) throughout the contiguous United States
•  Reduce error in global measurement of sea surface temperature
•  Assess and model carbon sources and sinks globally
•  Reduce uncertainty in magnitude of North American carbon uptake
•  Reduce uncertainty in model simulations of the influence of aerosols on climate
•  New climate observations introduced
•  Improve society’s ability to plan and respond to climate variability and change using NOAA climate products and information (number of peer-reviewed risk and impact assessments or evaluations published and communicated to decision makers)

USGS
•  Percentage of nation with land-cover data to meet land-use planning and monitoring requirements (2001 national data set; 66 mapping units across the country)
•  Percentage of nation with ecoregion assessments to meet land-use planning and monitoring requirements (number of completed ecoregion assessments divided by 84 ecoregions)
•  Percentage of the nation’s 65 principal aquifers with monitoring wells that are used to measure responses of water levels to drought and climatic variations

NOTE: DOE = Department of Energy; EPA = Environmental Protection Agency; NASA = National Aeronautics and Space Administration; USGS = U.S. Geological Survey.

a  All are long-term measures (several years or more in the future) published with the FY 2006 budget; see <http://www.whitehouse.gov/omb/budget/fy2006/part.html>.

…managers in particular continue to have difficulty establishing meaningful outcome measures, collecting timely and useful performance information, and distinguishing between results produced by the government and results caused by external factors or players such as grant recipients.16 The report also found that issues within the purview of many agencies (e.g., the environment) are not being addressed in the GPRA context. Agency strategic plans generally contain few details on how agencies are cooperating to address common challenges and achieve common objectives.

An OMB presentation to the committee acknowledged the difficulty of taking a cross-cutting view of programs such as the CCSP and identified areas in which performance measures would be especially useful.17 These include reducing uncertainty and improving predictability; assessing trade-offs between different program elements, such as making new measurements and analyzing existing data; and demonstrating that decision support tools are helping decision makers make better choices. These issues are discussed in the following chapters.

Applicability to the CCSP

It is difficult to extrapolate performance measures from a focused agency program to the CCSP.

16  General Accounting Office, 2004, Results-Oriented Government: GPRA Has Established a Solid Foundation for Achieving Greater Results, GAO-04-38, Washington, D.C., 269 pp.
17  Presentation to the committee by J. Rothenberg, White House Office of Management and Budget, on March 4, 2004.

Some agency goals overlap with CCSP goals (e.g., NOAA and CCSP climate variability goals), but an agency’s performance measures emphasize its own mission and priorities. Moreover, annual GPRA measures are not always suitable for the long time frame required for climate change research. The PART measures allow a long-term focus, but they concentrate on limited parts of the program. The climate change PART measures (Table 2.5), for example, miss a number of CCSP priority areas (e.g., global water cycle, ecosystem function, human contributions and responses, decision support) and other important aspects of the program (e.g., strategic planning, resource allocation). Finally, agency performance measures are not designed to take account of contributions from other agencies. As a result, the aggregate of agency measures does not address the full scope of the CCSP.18

Nevertheless, approaches that agencies have taken to develop performance measures may be useful to the CCSP. Performance measures developed for climate change programs in the agencies provide a starting point for developing CCSP-wide metrics, and OMB guidelines and the Washington Research Evaluation Network (WREN)19 provide tips and examples for developing metrics that are relevant to the program, promote program quality, and evaluate performance effectively (see Box A.1, Appendix A). Finally, all federal agencies with science programs rely on peer and/or internal review to evaluate research performance. Such evaluation will be especially challenging for the CCSP because of (1) a limited pool of fully qualified reviewers for multidisciplinary issues; (2) conflicts of interest, especially for experts funded by participating agencies; and (3) the high cost of conducting peer review in an era of shrinking federal budgets.

EVALUATING THE OUTCOME OF RESEARCH

Research outcomes and impacts can often be assessed only decades after the research is completed. A number of studies have attempted to trace research to outcomes, including the development of weapons systems and technological innovations, and the advancement of medicine.20 More recently, retrospective review has become an important tool for determining whether research investments were well directed, efficient, and productive (i.e., through the R&D investment criteria; see Appendix A), thus instilling confidence in future investments.

18  Other reasons that simple performance measures cannot be aggregated across fields of research are discussed in Cozzens, S.E., 1997, The knowledge pool: Measurement challenges in evaluating fundamental research programs, Evaluation and Program Planning, 20, 77–89.
19  See <http://www.science.doe.gov/sc-5/wren/>.
20  For example, see Gibbons, M., and R. Johnston, 1974, The roles of science in technological innovation, Research Policy, 3, 220–242; Sherwin, C.W., and R.S. Isenson, 1967, Project Hindsight, Science, 156, 1571–1577; Illinois Institute of Technology Research Institute, 1968, Technology in Retrospect and Critical Events in Science, National Science Foundation, Washington, D.C., 2 vols.; Comroe, J.H. Jr., and R.D. Dripps, 1976, Scientific basis for the support of biomedical science, Science, 192, 105–111.

Below is a review of the stratospheric ozone program of the 1970s and 1980s, which offers an opportunity to determine what factors made this multiagency program successful.

Lessons Learned from Stratospheric Ozone Depletion Research

The existence of ozone at high altitude and its role in absorbing incoming ultraviolet (UV) light and heating the stratosphere were deduced in the late nineteenth century. By the early 1930s, the oxygen-based chemistry of ozone production and destruction had been described.21 However, the amount of ozone measured by instruments carried on high-altitude rockets in the 1960s and 1970s was less than expected from reactions involving oxygen chemistry alone.
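The oxygen-only scheme referred to above is the Chapman mechanism. For reference, it can be summarized as follows (this is standard textbook chemistry, not text reproduced from the report):

$$\mathrm{O_2} + h\nu \rightarrow \mathrm{O} + \mathrm{O}$$
$$\mathrm{O} + \mathrm{O_2} + \mathrm{M} \rightarrow \mathrm{O_3} + \mathrm{M}$$
$$\mathrm{O_3} + h\nu \rightarrow \mathrm{O_2} + \mathrm{O}$$
$$\mathrm{O} + \mathrm{O_3} \rightarrow 2\,\mathrm{O_2}$$

where $h\nu$ denotes an ultraviolet photon and M is any third molecule that carries off excess energy. The rocket-borne measurements mentioned above found less ozone than this scheme alone predicts, which motivated the search for additional loss processes described next.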

Consequently, the search began for other reactive species, including free radicals, that could reduce predicted concentrations of stratospheric ozone. Among the candidate radicals considered were the NOx group (nitric oxide and nitrogen dioxide), which is produced by stratospheric decomposition of nitrous oxide,22 and the ClOx group (chlorine atoms and ClO), which has a natural source from volcanoes and ocean phytoplankton.23 Both groups are also produced from rocket exhaust, which led to public concern over the possibility that space shuttle or supersonic aircraft flights in the stratosphere could lead to depletion of stratospheric ozone.

Independently, F. Sherwood Rowland and Mario Molina were investigating the fate of chlorofluorocarbon (CFC) compounds released to the atmosphere. The very unreactivity of CFCs that made them ideal refrigerants and solvents ensured that they would persist and accumulate in the atmosphere.24 Research showed that destruction of CFCs ultimately takes place only after they are transported to high altitudes in the stratosphere, where high-energy ultraviolet photons dissociate CFC molecules and produce free chlorine radicals. On becoming aware of other work showing that chlorine atoms can catalyze the conversion of stratospheric ozone to O2,25 Rowland and Molina concluded that an increase in the chlorine content of the stratosphere would reduce the amount of stratospheric ozone, which in turn would increase the penetration of UV radiation to the Earth’s surface.26

Coincident with the publication of their conclusions in Nature on June 28, 1974,27 the two scientists held a press conference, although widespread press attention came only when they presented their results at an American Chemical Society meeting later that year. Public interest in the problem, including calls to ban the use of CFCs as propellants in aerosol spray cans, followed. The U.S. government’s initial response was to create the interagency Federal Task Force on Inadvertent Modification of the Stratosphere and to commission a National Research Council (NRC) study of the problem. Reports of these groups supported the overall scientific conclusions.28 The decision to ban CFCs in spray cans in the United States was announced in 1976 and took effect in 1978.

Subsequent scientific investigation improved understanding of the chemistry of chlorine in the stratosphere, including the formation of reservoirs such as chlorine nitrate that were not considered in earlier calculations.29 Models used to predict future changes in stratospheric ozone, which included the HOx, NOx, and ClOx chemistries, began to include more complex descriptions of the circulation of air in the stratosphere, interactions with a greater number of molecular species, and improved values (including temperature dependence) of rate constants. As a result, the magnitude of the overall effects of CFCs on stratospheric ozone predicted by the models changed. In fact, in an NRC report issued in 1984, just prior to the discovery of the Antarctic ozone hole, even the sign of the predicted ozone change was in doubt.30

21  Chapman, S., 1930, On ozone and atomic oxygen in the upper atmosphere, Philosophical Magazine and Journal of Science, 10, 369–383.
22  Crutzen, P.J., 1970, The influence of nitrogen oxide on the atmospheric ozone content, Quarterly Journal of the Royal Meteorological Society, 96, 320–325; Johnston, H.S., 1971, Reduction of stratospheric ozone by nitrogen oxide catalysts from supersonic transport exhaust, Science, 173, 517.
23  Stolarski, R.S., and R.J. Cicerone, 1974, Stratospheric chlorine: A possible sink for ozone, Canadian Journal of Chemistry, 52, 1610–1615.
24  The work was later published in Lovelock, J.E., R.J. Maggs, and R.J. Wade, 1973, Halogenated hydrocarbons in and over the Atlantic, Nature, 241, 194–196.
25  For example, see Stolarski, R.S., and R.J. Cicerone, 1974, Stratospheric chlorine: A possible sink for ozone, Canadian Journal of Chemistry, 52, 1610–1615; Wofsy, S.C., and M.B. McElroy, 1974, HOx, NOx, and ClOx: Their role in atmospheric photochemistry, Canadian Journal of Chemistry, 52, 1582–1591.
26  Molina, M.J., and F.S. Rowland, 1974, Stratospheric sink for chlorofluoromethanes: Chlorine atom-catalysed destruction of ozone, Nature, 249, 810–812.
27  Molina, M.J., and F.S. Rowland, 1974, Stratospheric sink for chlorofluoromethanes: Chlorine atom-catalysed destruction of ozone, Nature, 249, 810–812.
28  Federal Task Force on Inadvertent Modification of the Stratosphere, 1975, Fluorocarbons and the Environment, Council on Environmental Quality, U.S. Government Printing Office, Washington, D.C., 109 pp.; National Research Council, 1976, Halocarbons: Effects on Stratospheric Ozone, National Academy Press, Washington, D.C., 352 pp.; National Research Council, 1976, Halocarbons: Environmental Effects of Chlorofluoromethane Release, National Academy Press, Washington, D.C., 125 pp.
29  National Research Council, 1984, Causes and Effects of Changes in Stratospheric Ozone: Update 1983, National Academy Press, Washington, D.C., 340 pp.
30  National Research Council, 1984, Causes and Effects of Changes in Stratospheric Ozone: Update 1983, National Academy Press, Washington, D.C., 340 pp.
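For reference, the chlorine-catalyzed loss cycle at the heart of Rowland and Molina’s argument can be written compactly (again, standard stratospheric chemistry summarized here rather than quoted from the report):

$$\mathrm{Cl} + \mathrm{O_3} \rightarrow \mathrm{ClO} + \mathrm{O_2}$$
$$\mathrm{ClO} + \mathrm{O} \rightarrow \mathrm{Cl} + \mathrm{O_2}$$
$$\text{net:}\quad \mathrm{O_3} + \mathrm{O} \rightarrow 2\,\mathrm{O_2}$$

Because the chlorine atom is regenerated, a single atom can destroy many thousands of ozone molecules before being sequestered in a reservoir species such as the chlorine nitrate discussed above.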

The Antarctic ozone hole, discovered serendipitously during routine monitoring of ozone levels by the British Antarctic Survey,31 was not predicted by any model. However, work on stratospheric chemistry during the preceding decade enabled rapid deployment of tools and instruments for elucidating the cause of the rapid springtime Antarctic ozone loss. Within two years the causes of ozone depletion in the Antarctic polar vortex and the impact of similar chemistry in the northern high latitudes had been determined.32 International regulation, including the Vienna Convention (1985), the Montreal Protocol (1987), and subsequent amendments (London 1990, Copenhagen 1992, Montreal 1997, and Beijing 1999), accompanied these discoveries. Today, the governmental response to stratospheric ozone depletion is viewed as a policy success, and concentrations of CFCs have leveled off or begun to decline, although the Antarctic ozone hole is not expected to disappear for many decades.33

Applicability to the CCSP

A number of lessons can be drawn from the ozone example above:

1. The unpredictable nature of science. Since World War II, the U.S. government has supported a wide range of science activities because it is not possible to predict which research will turn out to be important.34 Rowland and Molina’s inquiry into the fate of a man-made chlorofluoromethane was outside the scientific mainstream, but it led to a key breakthrough in the emerging field of stratospheric chemistry. (No one would have thought that the use of underarm deodorant in spray cans could influence anything at a global scale, and it is doubtful that a research proposal stating so would have been funded at the time.) The Antarctic ozone hole was unpredictable in the early 1980s because the appropriate two-dimensional models with stratospheric chemistry parameters had not yet been developed and key reactions (even key compounds) were not yet known. The application of these models and research had to await the independent observation of the ozone hole.

31  Farman, J.C., B.G. Gardiner, and J.D. Shanklin, 1985, Large losses of total ozone in Antarctica reveal seasonal ClOx/NOx interaction, Nature, 315, 207–210.
32  World Meteorological Organization, 1988, Report of the International Ozone Trends Panel: 1988, WMO Report 18, Geneva, 2 vols.
33  World Meteorological Organization, 2002, Scientific Assessment of Ozone Depletion: 2002, WMO Report 47, Geneva, 498 pp.
34  Bush, V., 1945, Science, the Endless Frontier: A Report to the President, U.S. Government Printing Office, <http://www.nsf.gov/od/lpa/nsf50/vbush1945.htm>. The report led to the creation of the National Science Foundation to support research in medicine, physical and natural science, and military matters.

2. The role of serendipity. A dramatic loss of ozone in the lower Antarctic stratosphere was first noticed by a research group from the British Antarctic Survey that was monitoring the atmosphere using a ground-based network of instruments.35 The same decline was famously missed at first by satellite observations because “anomalously low” values for total column ozone were flagged as potentially unreliable, and the satellite team’s foremost concern at the time was its ability to measure column ozone accurately with the instrument. Subsequent reanalysis of the satellite data corroborated the existence of the Antarctic ozone hole.

3. The role of leadership. Aside from the initial press conference held by Rowland and Molina in 1974, Rowland’s efforts to publicize the implications of their results were assisted by actions initiated by others, for example, the publicity department of the American Chemical Society and the politicians who called for further investigation. The resulting series of newspaper articles and interviews helped speed political outcomes, including the regulated reduction of CFCs. The rapidity of scientific progress on the causes of the Antarctic ozone hole is attributed by many involved to the leadership of Robert Watson, a National Aeronautics and Space Administration program manager who had both a thorough knowledge of the research he was supporting and the political awareness to release results at the most effective times.

4. “Reduction in uncertainty.” This would have been a poor metric for evaluating scientific progress in the early stages of ozone research. Between 1975 and 1984, improved understanding and modeling of how mixtures of gases behave in the stratosphere actually increased uncertainty about the magnitude and even the sign of predicted trends in stratospheric ozone (see Figure 2.2).

5. The role of assessments. “State of the science” assessments can be useful for summarizing complex problems in a way that is useful to policy makers.36 However, their usefulness in guiding future research is less clear. For example, despite the recommendations of committees convened in the 1970s and 1980s, scientific progress was not coordinated. Instead, progress was made by scientists from different fields working on the problem independently and (importantly) communicating their results broadly.

6. Parallels with the problem of climate change are limited.

35  Farman, J.C., B.G. Gardiner, and J.D. Shanklin, 1985, Large losses of total ozone in Antarctica reveal seasonal ClOx/NOx interaction, Nature, 315, 207–210.
36  For example, see Federal Task Force on Inadvertent Modification of the Stratosphere, 1975, Fluorocarbons and the Environment, Council on Environmental Quality, U.S. Government Printing Office, Washington, D.C., 109 pp.; World Meteorological Organization, 1988, Report of the International Ozone Trends Panel: 1988, WMO Report 18, Geneva, 2 vols.

FIGURE 2.2 Predictions of ozone column depletions from the same assumed chlorine and nitrogen scenarios as a function of the year for which the model was current. New discoveries in chemistry and the incorporation of better values for rate constants led to substantial fluctuations in the predictions in the late 1970s and early 1980s. SOURCE: Donald Wuebbles, University of Illinois; used with permission.

The ozone problem, although complex, involves transport and reactions in the atmosphere of a suite of compounds resulting largely from human activities. Climate change, in contrast, involves a number of atmospheric trace gases, aerosols, and clouds, each of which has important cycles that are independent of human activity. Understanding the ozone hole required advances in understanding the physics of atmospheric circulation and heterogeneous chemical processes, the development of methods to measure and monitor chemical species in the stratosphere, and the modeling of feedback mechanisms. Similar progress in understanding basic physical and chemical properties of the Earth system is required before credible climate change predictions can be made. However, the scope of needed advances is vast because most greenhouse gases have important sources and sinks in the biosphere and hydrosphere, and the controls on these fluxes feed back to atmospheric composition and climate.

Finally, the Montreal Protocol and subsequent policies involve a relatively small suite of compounds.

In contrast, responses to climate change could involve regulating substances important to every sector of the economy. Reductions in one greenhouse gas may also be offset by increases in others. For example, increased storage of carbon in fertilized agricultural fields may be offset by increased release of nitrous oxide.

CONCLUSIONS

Although industry, academia, and federal agencies have not had to develop metrics for programs as complex as global change, their experience can provide useful guidance to the CCSP. For example, the academic experience illustrates the importance of expert judgment and peer review, which are also applicable to basic research in industry and government. The government experience (including the ozone example) shows the importance of leadership and the pitfalls of relying on a single metric such as reduction in uncertainty. Finally, the attributes of useful metrics and a methodology for creating them can be gleaned from the industry experience.

However, the CCSP differs from industry in two important ways that are relevant to the creation of metrics. First, in industry a manager or small management team identifies the metrics. In contrast, the CCSP program office will have to arbitrate among 13 independent agencies to choose the few important measures for guiding the program. Each of these agencies might stress the part of the program that best fulfills its mission, but each would also be responsible for implementing CCSP metrics. Second, industry operates within a framework of defined income and expenses and specific products. An increase or decrease in profits provides both a motivation to develop effective metrics and an independent check on their success. Government agencies, on the other hand, are funded by taxpayers, and frequently the “profit” is new knowledge or an innovation that is difficult to measure. Moreover, there are no simple independent checks on whether government performance measures are succeeding. A commitment by the CCSP’s senior leadership to achieve and maintain outstanding performance, an open process for developing metrics, and input and feedback from outside experts and advisory groups will be required to overcome these problems.

These and other lessons are useful to guide thinking on how and why metrics should be developed and applied. Principles that can be derived from these lessons are discussed in Chapter 3.
