Copyright © 2009. National Academy of Sciences. All rights reserved.
Appendix C
Prototyping for EOSDIS
This Appendix was prepared by the Panel to Review NASA's Earth
Observing System in the Context of the USGCRP.
UNIQUE CHALLENGES OF GLOBAL CHANGE DATA MANAGEMENT
The types of data management now being undertaken are truly without
precedent. Any process in the atmosphere or ocean is intricately entwined
with numerous other processes because researchers are looking at the
subsystems of an "object," the Earth. Gaining understanding and an ability
to predict synoptic changes in the global system will require detecting and
studying numerous interconnections among processes.
Many different models are available for organizing a database, such
as hierarchical, relational, and networked. Prototyping will be needed to
learn how scientists will work with EOSDIS so that organizational schemes
optimally suited to the functioning of different components of the system
can be selected.
USE OF DATA ARCHIVES AS A RESEARCH LIBRARY
Data collected under EOS and related Earth observing programs will
form the research library for scientists trying to answer crucial questions
about global change. There is no argument about the imperative to improve
understanding as rapidly as possible.
How are libraries used for research? The experience base is partly
with libraries of printed material. Another relevant source is the on-line
literature search. In the library, one starts with a card catalog, which has
limited but valuable cross references. The on-line search allows logical
combinations of subjects or keywords, which improves the precision of the
search. But serious study invariably brings the researcher down to the level
of the book index, and to a lot of old-fashioned browsing. If the books are in
stacks with limited or no access, the job becomes increasingly difficult. On-
line literature searches provide a somewhat more powerful ability to locate
information that conforms to user-specified requirements. Their limitations
result partly from the fact that only key words can be searched.
Metadata, defined in the broad sense as the collection of important
information about data, will form the library catalog for global change
research. Research will require finding metadata and data through interre-
lationships. If this cannot be done, scientists will find themselves thwarted
in trying to trace complex causal effects, the understanding of which is the
objective of global change research. As with libraries, the more completely
the metadata are accessible to the scientist, the more effective will be the
research. Effective accessibility must include more than the equivalent of
the card catalog. The slowness of finding a comprehensive set of relevant
research material in a library, via indexes, tables of contents, and text scan-
ning, will not be acceptable for answering urgent questions about global
change. Furthermore, an efficient system will assist in keeping up with the
high rates at which EOS data will be accumulated.
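
The idea of finding metadata "through interrelationships" can be made concrete with a minimal sketch. Everything in it is an assumption for illustration, not a description of any EOSDIS design: the record fields, the dataset names, and the helper functions are hypothetical. It contrasts a card-catalog style keyword search with a search that also follows links between datasets describing connected processes.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataRecord:
    """One catalog entry describing a dataset (all fields are hypothetical)."""
    dataset_id: str
    keywords: set
    related: set = field(default_factory=set)  # ids of datasets on linked processes


def keyword_search(catalog, all_of=(), none_of=()):
    """Card-catalog style search: logical AND / NOT over keywords."""
    return {
        rec.dataset_id
        for rec in catalog.values()
        if set(all_of) <= rec.keywords and not (set(none_of) & rec.keywords)
    }


def follow_relationships(catalog, start_ids, max_hops=2):
    """Breadth-first walk over 'related' links, up to max_hops away."""
    seen = set(start_ids)
    frontier = set(start_ids)
    for _ in range(max_hops):
        frontier = {
            nxt
            for ds in frontier
            for nxt in catalog[ds].related
            if nxt in catalog and nxt not in seen
        }
        if not frontier:
            break
        seen |= frontier
    return seen


# Invented example catalog; names and links are illustrative only.
CATALOG = {
    rec.dataset_id: rec
    for rec in [
        MetadataRecord("sst_monthly", {"ocean", "temperature"}, {"wind_stress"}),
        MetadataRecord("wind_stress", {"atmosphere", "wind"}, {"sst_monthly", "cloud_cover"}),
        MetadataRecord("cloud_cover", {"atmosphere", "cloud"}, {"wind_stress"}),
        MetadataRecord("ice_extent", {"ocean", "ice"}),
    ]
}

hits = keyword_search(CATALOG, all_of=["ocean"], none_of=["ice"])
linked = follow_relationships(CATALOG, hits, max_hops=2)
```

In this toy catalog the keyword search alone finds only the ocean temperature entry, while following the links also surfaces the wind and cloud entries that a keyword match would miss.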
DATABASE REQUIREMENTS FOR SCIENTIFIC METADATA
The performance of existing systems for managing complex compila-
tions of scientific data is not encouraging. For example, while catalogs of
"available" planetary data are published, many potential users tell stories of
their failure to obtain the data despite determined efforts. The successes,
where they exist, can be instructive for EOSDIS. For example, one very
effective geoscience data management system is available at the National
Center for Atmospheric Research. Its success has been due in large part
to the development of data management systems, quality controls, and
data archives at a scientific center in consultation and collaboration with
scientists.
One of the few true prototypes is the system built at the NASA Space
Science Center for plate tectonic measurements, which are unusually in-
dependent of other geophysical events. Even for that dataset, for which
queries might appear to be relatively predictable in nature, a highly sophis-
ticated, intelligent system of layers (plus a natural language interface to
the user) is used to process queries. The levels of complexity introduced
in global change research by following interactions among processes are
absent from the tectonics dataset, however.
The observation that emerges is that little is known about how scientists
would use an EOS data management system, and it is premature to define it.
Much is known about how to manage bank records and airline reservations
and inventories. They involve large numbers of relatively simple and highly
predictable transactions, e.g., queries and data operations. Such systems
must keep instantaneous track of all changes, such as bank balances,
and airline seating availability. The requirements for global change are
different. Except for new entries, there will be little change in metadata
already entered, so that keeping track of the system state on a second-
by-second basis will not be needed. But the queries posed will tend to
be complex and of a highly unpredictable nature. They will be driven by
the mandate to the scientist: to understand connections between different
elements of the system, to understand underlying causes, and to develop
the ability to predict. Research is needed to learn how scientists will
work with EOSDIS through prototyping to select a system well suited to
the functions of global change research.
TIMELY ACCESS TO LARGE DATASETS
The history of dealing with large datasets is also discouraging. Re-
sponses to data requests can be slow, and the NASA and NOAA datasets
are known to be difficult to obtain. Current datasets in both agencies are
minuscule compared with those in the predicted EOS archives. Obtaining
timely answers to pressing issues of global change that may affect society
will require performance at a hitherto undreamed of level. Prototyping of
this aspect of data management can be done with the datasets already in
existence, and much could be learned by developing a system to efficiently
locate and deliver data from existing archives.
MISSING AND BAD DATA
There is a continuum of problems with data that needs to be addressed
in any data management system and that can be explored with prototype
experiments. Potential problems with data range from the predictable
corrections that must be made on any dataset, through data tagged as "bad"
according to some set of criteria, to data that is missing because either an
expected measurement was not made or "bad" data were eliminated from
the dataset.
First, it is necessary to provide complete information about locations
of missing data, so that the user does not discover only after investing both
human and computer time that a dataset chosen for analysis is unusable.
Second, decisions must be made about how to handle a segment of data
that is "bad" from one point of view but may contain useful information for
some other research purpose. Experiments must be done by scientists using
data for research purposes and for checking the effectiveness of different
approaches to their problems. Experimentation must then be done on how
to integrate the method into an overall data management scenario.
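
One possible approach to the two points above, offered only as a sketch of the kind of prototype experiment described, is to keep every expected sample in the series with a quality flag, so that "bad" values are retained rather than deleted and gaps can be reported before analysis begins. The flag names, the `is_suspect` criterion, and the function names are all hypothetical.

```python
from enum import Enum


class Quality(Enum):
    GOOD = "good"
    SUSPECT = "suspect"   # tagged "bad" by some criterion, but value is kept
    MISSING = "missing"   # expected measurement was never made


def build_series(expected_times, measurements, is_suspect):
    """Return [(time, value, Quality)] covering every expected time."""
    series = []
    for t in expected_times:
        if t not in measurements:
            series.append((t, None, Quality.MISSING))
        elif is_suspect(measurements[t]):
            series.append((t, measurements[t], Quality.SUSPECT))
        else:
            series.append((t, measurements[t], Quality.GOOD))
    return series


def gap_report(series):
    """List (first_time, last_time) for each run of consecutive MISSING samples."""
    gaps, start, prev = [], None, None
    for t, _, q in series:
        if q is Quality.MISSING:
            if start is None:
                start = t
            prev = t
        elif start is not None:
            gaps.append((start, prev))
            start = None
    if start is not None:
        gaps.append((start, prev))
    return gaps


# Invented readings: times 2 and 3 were never measured, time 4 looks implausible.
readings = {0: 10.0, 1: 11.0, 4: 999.0, 5: 12.0}
series = build_series(range(6), readings, is_suspect=lambda v: v > 100)
gaps = gap_report(series)
```

Because the suspect value stays in the series, a user with a different research purpose can still retrieve it, while the gap report tells any user in advance which intervals hold no data.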
Another concern is one of data integrity. Scientists should begin
using data as quickly as possible after obtaining it because standard error
checking algorithms may not identify data that look useable but do not
make sense physically, perhaps, for example, because of a malfunctioning
instrument. Solutions may involve getting data online rapidly (which alone
does not guarantee that it will be used quickly) and developing sample
analysis programs with more sophisticated algorithms that will reveal subtle
but systematic nonsense errors.
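
As an illustration of what a sample analysis program might check, the sketch below pairs a simple range test with a test for a reading that stays unchanged over many samples, which can look usable while suggesting a malfunctioning instrument. Both checks, their thresholds, and the sample values are assumptions chosen for the example; they are not drawn from any specific EOS instrument or algorithm.

```python
def out_of_range(values, low, high):
    """Indices whose value falls outside physically plausible bounds."""
    return [i for i, v in enumerate(values) if not (low <= v <= high)]


def stuck_runs(values, min_run=5, tol=0.0):
    """Find (start, end) index pairs where readings stay within tol of the
    first value of the run for at least min_run consecutive samples."""
    runs, start = [], 0
    for i in range(1, len(values) + 1):
        # Close the current run at the end of data or when the value moves.
        if i == len(values) or abs(values[i] - values[start]) > tol:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = i
    return runs


# Invented temperature-like readings: a flat stretch, then an implausible spike.
samples = [15.1, 15.3, 15.2, 15.2, 15.2, 15.2, 15.2, 15.4, 80.0]
range_errors = out_of_range(samples, low=-90.0, high=60.0)
flat_stretches = stuck_runs(samples, min_run=5)
```

The range test catches only the final spike; the flat stretch passes it and is found only by the second check, which is the sort of subtle but systematic error the paragraph above has in mind.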
VISUAL BROWSING
Clearly, browsing is already an important element of data management.
Less clear is how, in the future, it will be possible to use this technique for
any selection of data as a skimming technique. The process of locating data
to browse will thus use the tools mentioned earlier in the discussions of data
archives and metadata for research. Prototyping of browsing must include
both workstation visualization tools and the full range of data management
tools that make it possible to find data likely to be of interest.
ACCESS BY MANY
Prototypes should reflect the situation that will obtain in the EOSDIS
era; that is, they should provide easy access to everyone with a need, just like
the analogous library. While those involved can be expected to determine
the design of the prototypes, any who need access to the prototypical
systems should be allowed it, within reasonable financial constraints. Only
in this manner can the prototypes be tested for their effectiveness in serving
the needs of the broader scientific community. Lessons learned from such
tests will be the real products of prototype development that must be
incorporated into the design of EOSDIS.