COMMUNICATING SCIENCE AND ENGINEERING DATA
5
The Way Ahead
These are exciting and challenging times for federal government statistical
agencies responsible for disseminating their data products
to their user communities. The times are especially challenging for
the National Center for Science and Engineering Statistics (NCSES), which
is finding the importance of its data magnified manyfold by the growing
recognition of the role that science and engineering (S&E) investment
is playing as a source of economic and social growth and prosperity.
But these are also uncertain times for federal government agencies like
NCSES that are concerned over the future of their programs in light of
fixed or declining budgets associated with the need to restrain government
spending. There is a simultaneous growth in pressure to carefully evaluate
all government activities to ensure their efficiency and cost-effectiveness.
A key component of efficiency and effectiveness is a well-managed and
responsive data dissemination program.
The environment for the data dissemination program for NCSES is also
in flux. The agency is confronting new roles and missions as directed in the
America COMPETES Act, which changed the agency’s name and added
significant new responsibilities. For example, the newly specified role of
serving as a central federal clearinghouse for the collection, interpretation,
analysis, and dissemination of objective data on science, engineering, tech-
nology, research and development, and innovation suggests a need for the
agency to become more strategic in its outlook. NCSES will be venturing
into new territory and will need to support a broader range of data users,
particularly in areas of competitiveness and innovation, even as it seeks to
modernize the dissemination services it now provides. The key to accomplishing
these ends in an era of expected budget shortfalls and in view of
the limited staff resources in the agency, including some of the technological
skills that will be required to modernize the data processing and dissemina-
tion systems, is to take advantage of consortia opportunities and to proceed
within a framework that accords priority to the most essential tasks.
STRENGTH IN NUMBERS
The task of developing and implementing a dissemination improvement
plan is a tall order for NCSES to take on by itself. The agency is already
stressed, with its constrained staff and budget resources, to meet the grow-
ing demand for its data and implement the several new areas of responsibil-
ity that have recently been added to its roles and missions.
One of several possible approaches to meeting the needs of data users, as
well as to encouraging and expanding the development of tools and applications
that would help developers and dissemination channels disseminate its
information, is to take the necessary steps in concert with
other agencies in the federal statistical community. The federal statistical
agencies, as a group, have begun to organize to enhance dissemination of
their data in the project called the Statistical Community of Practice and
Engagement (SCOPE). SCOPE is an important beginning. There are effi-
ciencies for both the agencies and users from more cross-agency collabora-
tion, harmonization of definitions and terminology, identification of best
practices, and sharing of the development of common tools that support
best practices. As a participant in this community of practice, NCSES could
make full use of Data.gov as a primary public interface and dissemination
platform/portal, retrieve data sets from the Data.gov data set hosting
platform that is currently being developed, and harness Data.gov cloud
computing power.
NCSES should also consider taking advantage of commonly developed,
user-friendly data delivery and data display tools, many of which have been
developed by the World Wide Web Consortium (W3C) community. These
tools offer Section 508-compliant alternatives to tabular displays, display
complex sample survey data while protecting confidential microdata, and
support visualization of multifaceted statistical designs.
NCSES can also benefit from projects that promote data harmonization
and integration through the development of metadata and data exchange.
Specifically, SCOPE will take the fundamental steps of developing and
implementing Stats Metadata 1.0 (for delivery in fiscal 2012) and estab-
lishing common definitions to facilitate data exchange and interoperability
(by fiscal 2013). The goal is to promote development and use of common
platforms for data collection and data analysis and to suggest research on
solutions to the “data mosaic” problem in the current technology environ-
ment and support the creation of an open-source development community.
TIME-PHASED DISSEMINATION IMPROVEMENT PLAN
The panel understands that not every recommendation made in this
report can or should be implemented immediately. Some recommendations
must build on the implementation of others; for example, development of
an open database structure that can support accessibility and dissemination
through the use of open standards and formats requires that NCSES obtain
from its contractors the data sufficient to make the results reproducible, in a
format enabling automatic reproduction of all published tables, along with
metadata sufficient to interpret the data elements and results.
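The dependency described above can be illustrated with a short sketch. The Python below is not an NCSES system; the field names, weights, and "published" figures are all invented for the example. It assumes only that a contractor delivers record-level data together with a metadata dictionary, and it shows how a published table could then be regenerated and compared cell by cell.

```python
import csv
import io
from collections import defaultdict

# Hypothetical contractor delivery: record-level data (all values invented).
MICRODATA_CSV = """sector,weight,rd_spending
industry,2.0,10.0
industry,1.0,30.0
academia,4.0,5.0
"""

# Hypothetical machine-readable metadata describing each data element.
METADATA = {
    "sector": {"type": str, "label": "Performing sector"},
    "weight": {"type": float, "label": "Survey weight"},
    "rd_spending": {"type": float, "label": "R&D spending (illustrative units)"},
}


def load_records(text, metadata):
    """Parse the CSV and cast every field using the metadata dictionary."""
    reader = csv.DictReader(io.StringIO(text))
    return [{k: metadata[k]["type"](v) for k, v in row.items()} for row in reader]


def weighted_totals(records):
    """Rebuild the table: weighted spending summed by sector."""
    totals = defaultdict(float)
    for record in records:
        totals[record["sector"]] += record["weight"] * record["rd_spending"]
    return dict(totals)


def check_against_published(recomputed, published, tolerance=1e-9):
    """Return the cells where the recomputed and published values disagree."""
    return {
        key: (published.get(key), recomputed.get(key))
        for key in set(published) | set(recomputed)
        if key not in published
        or key not in recomputed
        or abs(published[key] - recomputed[key]) > tolerance
    }


records = load_records(MICRODATA_CSV, METADATA)
table = weighted_totals(records)
published = {"industry": 50.0, "academia": 20.0}  # invented "published" table
mismatches = check_against_published(table, published)
print(table, mismatches)
```

An empty `mismatches` dictionary means every published cell was reproduced from the delivered data; any entry points to a cell that needs investigation.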
The implementation of the report’s recommendations should be
undertaken within an overall framework that accords priority to the basic
quality of the data and the fundamentals of dissemination, then to signifi-
cant enhancements that are achievable in the short term, while laying the
groundwork for other long-term improvements. The framework could be
organized along the following lines (highest priority first):
1. Focus on collecting the right data (by contractor or otherwise);
using appropriate change management and version control to estab-
lish data provenance, flag data errors and correct them; annotating
those data with sufficient machine-actionable metadata to establish
a process for interpreting the data, enabling efficient access to
third-party data and to automated NCSES publications; and pub-
lishing the data in formats with web-accessible open interfaces for
all to use.
2. Publish methods for combining old data and new data that have
been collected under different assumptions or categories or that are
disseminated in ways that make them difficult to reintegrate—this
is especially necessary for the data from the old and new industry
research and development expenditure surveys that will popu-
late the Industrial Research and Development Information System
(IRIS).
3. Provide the essential data reductions and visualizations that the
mission of the National Science Foundation (NSF) requires. For
example, when Congress asks for authoritative data on a certain
topic, a trusted group must be able to use the data and derived
publications to calculate answers.
4. Provide a growing array of visualizations and printed products
tailored for the many different uses and users.
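Item 2 in the list above calls for published methods for combining data collected under different categories. One common form such a method can take is a crosswalk (concordance) table. The sketch below is a minimal illustration under stated assumptions: the category codes, allocation shares, and values are invented, and nothing here describes the actual method used for the industry research and development surveys.

```python
from collections import defaultdict

# Hypothetical crosswalk: the share of each old category that is assigned
# to each new category. Codes and shares are invented for illustration.
CROSSWALK = {
    "OLD_A": {"NEW_1": 1.0},
    "OLD_B": {"NEW_1": 0.25, "NEW_2": 0.75},
}


def validate_crosswalk(crosswalk):
    """Every old category must be fully allocated (shares sum to 1)."""
    for old_code, shares in crosswalk.items():
        total = sum(shares.values())
        if abs(total - 1.0) > 1e-9:
            raise ValueError(f"shares for {old_code} sum to {total}, not 1")


def reclassify(old_series, crosswalk):
    """Re-express values reported under old categories in the new categories."""
    validate_crosswalk(crosswalk)
    new_series = defaultdict(float)
    for old_code, value in old_series.items():
        if old_code not in crosswalk:
            raise KeyError(f"no crosswalk entry for {old_code}")
        for new_code, share in crosswalk[old_code].items():
            new_series[new_code] += value * share
    return dict(new_series)


old_series = {"OLD_A": 100.0, "OLD_B": 40.0}  # invented values
bridged = reclassify(old_series, CROSSWALK)
print(bridged)
```

Publishing the crosswalk itself, rather than only the bridged series, lets third parties repeat or challenge the reclassification, and the validation step guarantees that totals are preserved.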
Within this overall framework, three parallel tracks are suggested
with concrete steps to improve data dissemination. The first track involves
improving the transparency and reproducibility of published and dissemi-
nated results by obtaining complete, reliably versioned, well-documented,
and machine-understandable data from contractors. This will require the
modification of current contractual arrangements and procurements as ref-
erenced in the panel’s recommendations. The second track involves improv-
ing use of the NCSES products by establishing a formal, systematic, and
continuous program for evaluating user needs and the usability of NCSES
products via the web and other means of delivery. The third track involves
ensuring full short- and long-term access to NCSES content by providing
open data, offering machine-accessible protocols for access to data and
other products, and establishing a continuous process for replicating or
archiving releases by the National Archives and Records Administration
for long-term preservation and access.
IMPROVING THE TRANSPARENCY AND
REPRODUCIBILITY OF PUBLISHED RESULTS
As noted in earlier chapters, it is not currently possible to automatically
and systematically reproduce or validate all tables and results in NCSES
published products from the raw data. There are many contributing causes:
not all data are made available to NCSES at the level of detail at which they
were collected, data are not accompanied by machine-readable metadata,
and there is a lack of a systematic version control/change-management
process for the data prior to final delivery by contractors.
The root cause of this problem, as we have identified, is insufficient
accountability from contractors. Contractors are not delivering the data
and metadata in the detail most needed, and they are not supplying suffi-
cient metadata, provenance information, or change management. Strength-
ening accountability from contractors is a first step to any improvement in
transparency.
This should be followed by more systematic development of metadata
standards, change management and versioning, and provenance tracking.
These need not be perfect; any open, transparent, machine-understandable,
automatic method could be used as a starting point and then improved over time.
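As one example of how simple such a method could be, the sketch below keeps a version history in which each delivery is identified by a SHA-256 fingerprint and linked to the delivery it supersedes. The record layout, function names, and sample data are assumptions made for illustration, not a standard or an existing NCSES practice; a working system would also record dates and responsible parties.

```python
import hashlib
import json


def fingerprint(data: bytes) -> str:
    """Content hash that identifies one exact version of a data file."""
    return hashlib.sha256(data).hexdigest()


def add_version(history, data, source, note):
    """Return a new history with a provenance record for this delivery.

    Each record points at its parent, so the chain of changes can be
    followed back to the first delivery. Unchanged data adds no record.
    """
    digest = fingerprint(data)
    parent = history[-1]["sha256"] if history else None
    if digest == parent:
        return history
    record = {
        "version": len(history) + 1,
        "sha256": digest,
        "parent": parent,
        "source": source,  # hypothetical: who delivered the file
        "note": note,      # hypothetical: what changed and why
    }
    return history + [record]


def verify(history, data):
    """Check that a file in hand is the latest recorded version."""
    return bool(history) and history[-1]["sha256"] == fingerprint(data)


# Invented sample deliveries.
first = b"id,value\n1,10\n2,20\n"
corrected = b"id,value\n1,10\n2,25\n"

history = add_version([], first, "contractor", "initial delivery")
history = add_version(history, corrected, "contractor", "corrected record 2")
history = add_version(history, corrected, "contractor", "re-sent, no change")
print(json.dumps(history, indent=2))
```

Because the history is plain JSON, it can be published alongside the data so that any user can confirm which version a table was computed from.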
As part of improving metadata standards, NCSES should actively participate
in the development and implementation of the Data.gov-compatible
metadata standard now being explored by W3C and the SCOPE project.
Implementation of this standard, as discussed in this report, will require
revamping the specifications for data delivery now in the contracts of the
agency’s data collectors.
ESTABLISHING A FORMAL, SYSTEMATIC, AND CONTINUOUS
USE AND USABILITY EVALUATION PROGRAM
We have pointed to the need for a continuous use and usability evalua-
tion program, much akin to pointing to the need for a program of continu-
ous improvement that is part and parcel of any total quality management
program. We focus on use and usability because, as NCSES, like other federal
statistical agencies, continues to shed its hard-copy publication programs in
favor of providing its data through web applications, usability will become
a more important issue, and new uses and users have begun to be identified.
A first step is to develop a clearer understanding of requirements. In the
first instance, the requirements for an NCSES dissemination program are
essentially determined by the environment facing the agency, its legislative
mandate, and guidance and directives from above. These are assessed in
Chapter 1. The more difficult but nonetheless important part of establishing
requirements is to understand the needs of the agency's customers, the data
users. As discussed in Chapter 4, NCSES today has only a rudimentary
understanding of the range of its users and their data needs. Thus, the first
step in the plan must be to gain a better understanding of the users of the
data—those primary, secondary, and tertiary blocks of users—and then to
engage them in an effort to understand their needs. Some steps have already
been taken to enhance engagement of user groups. The measures of web-
site use and the new online survey of web users are important and neces-
sary first steps, but they are by no means sufficient to provide the kind of
detailed knowledge NCSES needs. Agency leadership would be well advised
to monitor the maturing space of web metrics and analytics. These, along
with customer service programs, would enable continuous input, evalua-
tion, and understanding of all users and their products.
The lessons learned from these outreach activities should then be widely
shared. One possible activity would be to compile and post a listing
of user sites that have distilled the NCSES basic data, aggregated them,
or combined them with other data. Although these derived forms cannot
carry the NSF imprimatur of accuracy, they can be very helpful.
A suggested next step is to review the initiatives taken by Statistics
Canada to evaluate the usability of its delivery methods. Tied in with
usability, we urge attention to issues of accessibility for all users, with the
understanding that 508 compliance is a necessary but insufficient first step.
We make several suggestions in Chapter 4 and Appendix B for enhanc-
ing the visitor’s experience with the NCSES website. Some of these sugges-
tions can be implemented by NCSES; others will require coordination with
the NSF organizations that establish the basic look and feel of the website.
ENSURING FULL SHORT- AND LONG-TERM ACCESS
As discussed in Chapter 3, the Internet changes the meaning of access.
Ensuring full access in today’s environment requires that, as much as pos-
sible, machine-understandable microdata and metadata be made accessible
via standard open protocols to any third party for use without restriction.
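A minimal sketch of what "machine-understandable" data with metadata might look like in practice follows. It publishes the same small data set as CSV and as a JSON package that carries its own schema, then reads the CSV back using only that schema. The schema layout, field names, and values are invented for the example and do not represent an NCSES or Data.gov format.

```python
import csv
import io
import json

# Invented example rows and a hypothetical schema describing them.
ROWS = [
    {"year": 2001, "field": "engineering", "count": 12},
    {"year": 2002, "field": "engineering", "count": 15},
]
SCHEMA = {
    "title": "Illustrative data set (invented values)",
    "fields": [
        {"name": "year", "type": "integer"},
        {"name": "field", "type": "string"},
        {"name": "count", "type": "integer"},
    ],
}
CASTS = {"integer": int, "string": str}


def to_csv(rows, schema):
    """Serialize rows as CSV, with columns in the order the schema gives."""
    names = [f["name"] for f in schema["fields"]]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=names, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json_package(rows, schema):
    """Serialize rows and their metadata together as one JSON document."""
    return json.dumps({"metadata": schema, "data": rows}, indent=2, sort_keys=True)


def read_csv_with_schema(text, schema):
    """What a third party would do: restore types using only the schema."""
    casts = {f["name"]: CASTS[f["type"]] for f in schema["fields"]}
    reader = csv.DictReader(io.StringIO(text))
    return [{k: casts[k](v) for k, v in row.items()} for row in reader]


csv_text = to_csv(ROWS, SCHEMA)
package = json.loads(to_json_package(ROWS, SCHEMA))
round_trip = read_csv_with_schema(csv_text, SCHEMA)
print(csv_text)
```

The point of the round trip is that a user with no knowledge of the producing system can recover the original values and types from the open files alone.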
The power of visualization tools to retrieve and explain the data leads
to the suggestion that a major emphasis throughout the implementation
period should be on providing data that can be easily accessed by visualization
tools. We do not suggest that NCSES develop visualizations beyond
the kind of rudimentary ones that it already provides in the Science and
Engineering Indicators Digest. Rather, the agency should provide data in
machine-accessible formats and explore partner relationships in the private
sector to identify opportunities to leverage developing or existing tools/
applications, along with maintaining open data formats and standards to
allow individual users to import the data into their visualization sets. By
adopting an approach that stresses the basics of data provision (common
formats with appropriate metadata) and partnerships with the private sector
as opportunities become available, NCSES will avoid the issue of
rapid obsolescence associated with rapid change in the particular tools and
systems offered by the private sector.
Ensuring long-term access requires that both the NCSES publications
and all of the data necessary to fully replicate them be archived. NSF
should work with the National Archives and Records Administration, as
the archive of record, to ensure that copies of all products and data, includ-
ing those created by contractors, are efficiently delivered for long-term
stewardship.
RAPID ITERATIVE IMPROVEMENTS
The recommendations in this report will take several years to imple-
ment. However, the groundwork can be laid, and many improvements
made, in a relatively short amount of time, even in the first year. We suggest
that at least the following be accomplished in the first year:
• Establish an ongoing archiving process.
• Revise contracts with data providers to ensure accountability for
delivery of full microdata in machine-understandable format with
change control.
• Perform a heuristic evaluation of the website.
• Initiate a process of continuous usage/user data needs collection.
• Disseminate the microdata that are already available using standard open
machine protocols.
We expect that improvement will be iterative, and will primarily stem
from development of further technologies, methods, and standards and from
the collection of systematic information on user behavior and needs. In light
of this, other recommended tasks can be deferred, awaiting further develop-
ments in technology or methods, for example:
• Redesigning the NCSES website can await heuristic evaluation.
• Developing a detailed metadata standard can await a candidate
metadata standard from the SCOPE and World Wide Web Consortium
initiatives.
• Creating a capacity for user-influenced visualizations can await
further developments in accessible visualization technology.
The future well-being of the U.S. economy depends on the nation’s
capacity to generate, and take economic advantage of, technology-driven
innovations across all industries, particularly those that compete inter-
nationally. This capacity in turn depends on choices that market actors,
including the federal government, firms large and small, educational and
research institutions, state and regional technology-based development
agencies, workers, and students, make with regard to research and devel-
opment, development of the science, technology, engineering, and math-
ematical workforce, and the commercialization of innovation. The data
generated by NCSES will guide these choices. The data dissemination
strategy of the agency, then, will have a substantial influence on the nation’s
future economic path.
Technology is opening the door to significant advances in the ability to
communicate data and analytical products to data users. The promise of
such services as Data.gov and the potential for third-party services, such as
the Google Public Data Explorer, and federated catalogs, such as the DataVerse
Network, to add value to the data and make them accessible to new
groups of users and for new uses are only beginning to be recognized. The emerging
Semantic Web (Web 3.0), expanded and new tools and approaches,
open standards and platforms, the potential for mashups, and community-based
platforms (including participative input, transparency by means of
wikis and open government movements) show a more distant promise of
communicating data to users in entirely new ways, much to the advantage
of users and the federal agencies themselves.
To avail itself of the opportunities afforded in these new approaches,
NCSES needs to adopt a vision of the future that supports access to data
directly through the agency and through the many third-party services and
catalogs that are emerging. NCSES also needs to have a plan that will lead
to making its data available through open interfaces and open formats,
accompanied by open metadata, and to develop the necessary infrastructure
to exploit these advances. These evolving technologies could open opportu-
nities for addressing the visualization experience and overcoming accessibil-
ity limitations more effectively than the current browser-based experiences.