Panel on Communicating National Science Foundation
Science and Engineering Information to Data Users
Committee on National Statistics
Division of Behavioral and Social Sciences and Education
Computer Science and Telecommunications Board
Division on Engineering and Physical Sciences
THE NATIONAL ACADEMIES PRESS 500 Fifth Street, N.W. Washington, DC 20001
NOTICE: The project that is the subject of this report was approved by the Governing
Board of the National Research Council, whose members are drawn from the
councils of the National Academy of Sciences, the National Academy of Engineering,
and the Institute of Medicine. The members of the committee responsible for
the report were chosen for their special competences and with regard for appropriate
balance.
This study was supported by the National Science Foundation under a grant to the
National Academy of Sciences. Support of the work of the Committee on National
Statistics is provided by a consortium of federal agencies through a grant from the
National Science Foundation (award number SES-0453930). Any opinions, findings,
conclusions, or recommendations expressed in this publication are those of the
author(s) and do not necessarily reflect the views of the organizations or agencies
that provided support for the project.
International Standard Book Number-13: 978-0-309-22209-9
International Standard Book Number-10: 0-309-22209-5
Additional copies of this report are available from National Academies Press, 500
Fifth Street, N.W., Lockbox 285, Washington, DC 20055; (800) 624-6242 or (202)
334-3313 (in the Washington metropolitan area); Internet, http://www.nap.edu.
Copyright 2012 by the National Academy of Sciences. All rights reserved.
Printed in the United States of America
Suggested citation: National Research Council. (2012). Communicating Science
and Engineering Data in the Information Age. Panel on Communicating National
Science Foundation Science and Engineering Information to Data Users. Committee
on National Statistics, Division of Behavioral and Social Sciences and Education and
Computer Science and Telecommunications Board, Division on Engineering and
Physical Sciences. Washington, DC: The National Academies Press.
The National Academy of Sciences is a private, nonprofit, self-perpetuating society
of distinguished scholars engaged in scientific and engineering research, dedicated to
the furtherance of science and technology and to their use for the general welfare.
Upon the authority of the charter granted to it by the Congress in 1863, the Academy
has a mandate that requires it to advise the federal government on scientific
and technical matters. Dr. Ralph J. Cicerone is president of the National Academy
of Sciences.
The National Academy of Engineering was established in 1964, under the charter
of the National Academy of Sciences, as a parallel organization of outstanding engineers.
It is autonomous in its administration and in the selection of its members,
sharing with the National Academy of Sciences the responsibility for advising the
federal government. The National Academy of Engineering also sponsors engineering
programs aimed at meeting national needs, encourages education and research,
and recognizes the superior achievements of engineers. Dr. Charles M. Vest is president
of the National Academy of Engineering.
The Institute of Medicine was established in 1970 by the National Academy of
Sciences to secure the services of eminent members of appropriate professions in
the examination of policy matters pertaining to the health of the public. The Institute
acts under the responsibility given to the National Academy of Sciences by its
congressional charter to be an adviser to the federal government and, upon its own
initiative, to identify issues of medical care, research, and education. Dr. Harvey V.
Fineberg is president of the Institute of Medicine.
The National Research Council was organized by the National Academy of Sciences
in 1916 to associate the broad community of science and technology with the
Academy’s purposes of furthering knowledge and advising the federal government.
Functioning in accordance with general policies determined by the Academy, the
Council has become the principal operating agency of both the National Academy
of Sciences and the National Academy of Engineering in providing services to
the government, the public, and the scientific and engineering communities. The
Council is administered jointly by both Academies and the Institute of Medicine.
Dr. Ralph J. Cicerone and Dr. Charles M. Vest are chair and vice chair, respectively,
of the National Research Council.
www.national-academies.org
PANEL ON COMMUNICATING NATIONAL SCIENCE
FOUNDATION SCIENCE AND ENGINEERING
INFORMATION TO DATA USERS
Kevin Novak (Chair), Integrated Web Strategy and Technology, The
American Institute of Architects
Micah Altman, Institute for Quantitative Social Science, Harvard
University
Elana Broch, Population Research Library, Princeton University
John M. Carroll, College of Information Sciences and Technology,
Pennsylvania State University
Patrick J. Clemins, R&D Budget and Policy Program, American
Association for the Advancement of Science, Washington, DC
Diane Fournier, Communications Division, Statistics Canada, Ottawa,
Canada
Christiaan Laevaert, Eurostat, Statistical Office of the European Union,
Luxembourg
Andrew Reamer, George Washington Institute of Public Policy, George
Washington University
Emily Ann Meyer, Costudy Director
Thomas Plewes, Costudy Director
Michael J. Siri, Program Associate
COMMITTEE ON NATIONAL STATISTICS
2011-2012
Lawrence D. Brown (Chair), Department of Statistics, The Wharton
School, University of Pennsylvania
John M. Abowd, School of Industrial and Labor Relations, Cornell
University
Alicia Carriquiry, Department of Statistics, Iowa State University
William DuMouchel, Oracle Health Sciences, Waltham, Massachusetts
V. Joseph Hotz, Department of Economics, Duke University
Michael Hout, Survey Research Center, University of California, Berkeley
Karen Kafadar, Department of Statistics, Indiana University
Sallie Keller, IDA Science and Technology Policy Institute, Washington,
DC
Lisa Lynch, Heller School for Social Policy and Management, Brandeis
University
Sally C. Morton, Department of Biostatistics, University of Pittsburgh
Joseph Newhouse, Division of Health Policy Research and Education,
Harvard University
Ruth D. Peterson, Department of Sociology (emeritus), Ohio State
University
Hal S. Stern, Donald Bren School of Computer and Information Sciences,
University of California, Irvine
John H. Thompson, National Opinion Research Center at the University
of Chicago
Roger Tourangeau, Joint Program in Survey Methodology, University of
Maryland, and Survey Research Center, University of Michigan
Alan Zaslavsky, Department of Health Care Policy, Harvard University
Medical School
Constance F. Citro, Director
COMPUTER SCIENCE AND TELECOMMUNICATIONS BOARD
Robert F. Sproull (Chair), Sun Microsystems (retired), Burlington,
Massachusetts
Prithviraj Banerjee, Hewlett Packard, Palo Alto, California
Steven M. Bellovin, Columbia University, New York
Jack L. Goldsmith III, Harvard Law School
Seymour E. Goodman, Sam Nunn School of International Affairs and
College of Computing, Georgia Institute of Technology, Atlanta
Jon Kleinberg, Department of Computer Science, Cornell University
Robert Kraut, Department of Human-Computer Interaction, Carnegie
Mellon University
Susan Landau, Radcliffe Institute for Advanced Study
Peter Lee, Microsoft Corporation, Redmond, Washington
David Liddle, U.S. Venture Partners, Menlo Park, California
Prabhakar Raghavan, Yahoo! Research, Sunnyvale, California
David E. Shaw, D.E. Shaw Research, New York
Alfred Z. Spector, Google, Inc., New York
John Stankovic, Computer Science Department, University of Virginia
John Swainson, Silver Lake Partnership, Islandia, New York
Peter Szolovits, Computer Science and Artificial Intelligence Lab,
Massachusetts Institute of Technology
Peter Weinberger, Google Inc., New York
Ernest J. Wilson, Annenberg School for Communication, University of
Southern California
Katherine Yelick, Computer Science Division, University of California,
Berkeley
Jon Eisenberg, Director
Contents
Preface xi
Summary 1
1 The Changing Data Dissemination Landscape 7
2 The Current Dissemination Program 19
3 Strategy for Modernizing Data Storage, Retrieval, and Dissemination 51
4 Engaging Data Users 63
5 The Way Ahead 79
References 87
Appendixes
A Acronyms and Abbreviations 91
B Suggestions for Improving the Website 95
C Biographical Sketches of Panel Members and Staff 103
Preface
The National Center for Science and Engineering Statistics (NCSES),
as a means of fulfilling its mandate to collect and distribute information
about the science and engineering enterprise for the National
Science Foundation (NSF), conducts a program of data dissemination that
includes provision of data in hard copy and, increasingly, electronic-only
publication and tabulation formats; hosts a website that provides access
to NCSES reports and methods by topic; and maintains two web-based
tools for retrieving data from the NCSES database: the Integrated Science
and Engineering Resources Data System (WebCASPAR) and the Scientists
and Engineers Statistical Data System (SESTAT). These products and tools
serve a community of information users with wide-ranging data needs and
diversity in statistical savvy, access preferences, and technical abilities.
In 2010, in view of an expanded scope of responsibilities recognized in
the America COMPETES Reauthorization Act of 2010, NCSES requested
that the Committee on National Statistics and the Computer Science and
Telecommunications Board of the National Research Council form a panel
to review the NCSES program of collection and distribution of information
on science and engineering and to recommend future directions for
the program.
In accomplishing this review, the Panel on Communicating National
Science Foundation Science and Engineering Information to Data Users
has conducted two workshops. Their purpose was to gather information
from data users and experts on various aspects of data storage, retrieval,
dissemination, and archiving. At the request of NCSES, the panel issued an
interim report (National Research Council, 2011), which summarized the
first workshop and recommended action by NCSES on four key issues: data
content and presentation, meeting changing storage and retrieval standards,
understanding data users and their emerging needs, and data accessibility.
The interim report pointed out that the recommended actions should be
considered as preliminary steps that would assist NCSES in preparing for a
transition from current practices and approaches to an improved program
of data dissemination. The analysis and recommendations from the interim
report are carried into this final report, along with the findings of a second
workshop and the results of subsequent analysis by the panel.
The panel is grateful for the active participation of Lynda Carlson,
director of NCSES, and her senior staff and for their informative and frank
discussion of the status of the dissemination programs in the meetings and
workshops conducted by the panel. Special thanks go to John Gawalt, who
was program director for the Information and Technology Services Program
of NCSES at the beginning of this study and later was named deputy director
of NCSES. He went out of his way on many occasions to respond to
questions posed by the panel and to provide helpful materials as the review
progressed. His replacement, Jeri Mulrow, continued this willing cooperation
as she fulfilled the many requests for information to assist in framing
the issues and arriving at recommendations.
A large group of experts from government agencies, the academic
community, and various other user organizations freely gave their time to
prepare presentations for the workshops and enter into a dialogue with the
panel as it gathered information for this report. The users were represented
by Paula Stephan, Georgia State University; Jeffrey Alexander, SRI International;
Kei Koizumi, Office of Science and Technology Policy, Executive
Office of the President; and Bhavya Lal and Asha Balakrishnan, Science and
Technology Policy Institute of the Institute for Defense Analyses.
Several experts gave presentations on various aspects of dissemination
technology developments focusing on government-wide or statistical
agency approaches. Alan Vander Mallie, program manager, Data.gov,
briefed the panel on the Data.gov initiatives; George Thomas, Office of
Enterprise Architecture, U.S. Department of Health and Human Services,
provided perspective on Data.gov and similar government initiatives to
take advantage of the Internet. Suzanne Acar, senior information architect,
U.S. Department of the Interior, and cochair, Federal Data Architecture
Subcommittee, gave a presentation on the work of the World Wide Web
Consortium (W3C) group, which is making great headway in developing
government-wide solutions to Internet issues. Judy Brewer, director of the
Web Accessibility Initiative of W3C, gave a forceful presentation on the
importance of ensuring that data products on the web are accessible to
persons with disabilities and other limitations.
The panel benefited from the observations of Ronald Bianchi, director
of the Information Services Division of the Economic Research Service of
the U.S. Department of Agriculture, and chair of the Statistical Community
of Practice and Engagement (SCOPE) working group, which is seeking to
develop a collaborative structure for federal statistical agencies to develop
and share best practices—including, for example, several areas of importance
for dissemination, such as information quality, metadata, and common
definitions. Jeffrey Sisson, program manager, American FactFinder,
and Cavan Capps, chief, DataWeb Applications of the U.S. Census Bureau,
gave presentations on these powerful dissemination tools.
The important area of archiving data was discussed by Margaret
Adams, manager of the Archival Records Program, and Theodore Hull,
senior archivist of the National Archives and Records Administration.
Jeffrey Turner, director of sales and marketing of the U.S. Government
Printing Office, and Donald Hagan, associate director, Office of Program
Development of the National Technical Information Service of the U.S.
Department of Commerce, discussed powerful new initiatives and tools
that permit agencies to move away from dissemination of information in
hard-copy formats.
The private sector is playing a growing role in the dissemination of
public data sets, such as those produced by NCSES. The Google Public
Data Explorer initiative was explained by Benjamin Yolken, product manager,
and Jürgen Schwärzler, statistician on the Public Data Team of Google.
Steve McDougall, product manager, and Stephan Jou, technical architect
for IBM, described the lessons that have been learned concerning the Many
Eyes website, wherein users can experiment with, download, and create
visualizations of data sets.
The panel is grateful for the excellent work of the staff of the Committee
on National Statistics and the Computer Science and Telecommunications
Board for their support in developing and organizing the workshops
and this report. Tom Plewes and Emily Ann Meyer, costudy directors for
the panel, ably supported our work. Michael Siri provided administrative
support to the panel. We are especially thankful for the personal participation
of Constance F. Citro, director of the Committee on National Statistics,
and Jon Eisenberg, director of the Computer Science and Telecommunications
Board, in the conduct of the workshops and in the preparation of this
report. Their sage advice benefited the report in numerous ways.
The interim report and this final report have been reviewed in draft
form by individuals chosen for their diverse perspectives and technical
expertise, in accordance with procedures approved by the Report Review
Committee of the National Research Council. The purpose of this independent
review is to provide candid and critical comments that assist the
institution in making its reports as sound as possible, and to ensure that the
reports meet institutional standards for objectivity, evidence, and responsiveness
to the study charge. The review comments and draft manuscript
remain confidential to protect the integrity of the deliberative process.
The panel thanks the following individuals for their review of the
interim report: John Bertot, College of Information Studies, University
of Maryland; Margaret Hedstrom, School of Information, University of
Michigan; Shirley M. Malcom, Education and Human Resources, American
Association for the Advancement of Science; Gary Marchionini, School of
Information and Library Science, University of North Carolina; Kathryn
Pettit, National Data Repository, The Urban Institute; and Daryl Pregibon,
Google, Inc.
A similar note of appreciation is extended to the following individuals
for their review of this final report: Andrew A. Beveridge, Department of
Sociology, Queens College and Graduate Center, CUNY; Martin Grueber,
research leader, Battelle, Cleveland, OH; James Hendler, Tetherless World
Constellation Chair and director, IT and Web Science Program, Computer
and Cognitive Science Departments, Rensselaer Polytechnic Institute,
Troy, NY; Joan K. Lippincott, associate executive director, Coalition for
Networked Information, Washington, DC; Kathryn Pettit, senior research
associate, National Data Repository, The Urban Institute, Washington,
DC; Juana Sanchez, Department of Statistics, University of California, Los
Angeles; and Julie Steele, editor, O’Reilly Media, New York, NY.
Although the reviewers listed above have provided many constructive
comments and suggestions, they were not asked to endorse the conclusions
or recommendations, nor did they see the final draft of the report before its
release. The review of the interim report was overseen by Robert F. Sproull,
Sun Labs, Oracle, Burlington, MA; he also oversaw the review of the final
report. Appointed by the National Research Council, he was responsible
for making certain that the independent examination of this report was
carried out in accordance with institutional procedures and that all review
comments were carefully considered. Responsibility for the final content
of the report rests entirely with the authoring committee and the National
Research Council.
Kevin Novak, Chair
Panel on Communicating National Science Foundation
Science and Engineering Information to Data Users