having the data available, although usually difficult to measure other than anecdotally, can be much higher than the cost of preserving them.

On the whole, the data sets described above have been, and will continue to be, preserved by the scientific organizations, agencies, laboratories, libraries, and professional societies that played a major role in their original acquisition or dissemination. A key question to be examined later in this report is what happens when the originally responsible institution is unable or unwilling to continue preservation and dissemination. This question has a corollary that introduces what may be a third class of scientific record: how do we as a nation preserve enough documentation about dormant, but not obsolete, areas of experimental science that benefited from considerable accumulated expertise (materials for breeder reactors, bubble chambers, possibly calorimetry, and others) so that the expertise can be recovered when needed at some future time? There is a need, which might become urgent, to identify those scientific and technological information sources (paper as well as electronic) that are in danger of being lost if no one steps in to preserve them.

3 DATA MANAGEMENT REQUIREMENTS

There are a number of general requirements critical to subsequent use of data from physics, chemistry, and materials sciences, including the need

  • for detailed metadata along with the data set. This includes the need to preserve and describe the algorithms and models used to acquire, process, evaluate, or utilize the data set.

  • to save classified data and to ensure that they are declassified as soon as the reason for classification is no longer applicable.

  • for a comprehensive, up-to-date, easily accessible, national or international “locator system” that will enable scientific researchers to determine whether sought-for data exist and how to access them.

  • for prompt access. Scientific researchers need to know quickly if data exist and, for some types of data, to have rapid access to the data.

  • to handle different kinds of appropriate storage media and yet to have standards so that the number of types of media is limited.

  • to handle myriad appropriate data formats and yet to ensure that data are retrievable and usable.

  • to consider data management and preservation during project initiation, and to provide scientists at that stage with guidance on appropriate formats for data of long-term value.

Metadata, Algorithms, and References

The scientific community often lumps under the heading “metadata” all the information that is required to understand what is in a data set and how to access and use it. In other words, the metadata provide information and references equivalent to that which would be in a peer-reviewed, archived, research journal article, in the introductory chapter(s) of a book of tables, or in an instruction manual. This includes a description of the algorithms and models used to process or interpret the data, as well as the environmental variables, calibration procedures, and other experimental details. The panel believes that it is crucial to preserve the algorithms and models along with the data.
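As a rough illustration of the kinds of information listed above, a metadata record that travels with a data set might be sketched as follows. This is only a sketch under stated assumptions: the class name, field names, and sample values are hypothetical and are not drawn from the report or from any existing metadata standard.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class DataSetMetadata:
    """Hypothetical metadata record; every field name here is an assumption."""

    title: str
    # Meaning and units of each data field, keyed by field name.
    field_definitions: Dict[str, str]
    # Algorithms and models used to process or interpret the data.
    algorithms: List[str] = field(default_factory=list)
    models: List[str] = field(default_factory=list)
    # Environmental variables and calibration procedures for the experiment.
    environmental_variables: Dict[str, str] = field(default_factory=dict)
    calibration_procedures: List[str] = field(default_factory=list)
    # Papers and reports needed to understand the data set.
    references: List[str] = field(default_factory=list)

    def missing_items(self) -> List[str]:
        """Return the names of descriptive fields that were left empty."""
        return [name for name, value in asdict(self).items() if not value]


# Made-up example values, used only to exercise the sketch.
record = DataSetMetadata(
    title="Example calorimetry run (made-up)",
    field_definitions={"T": "sample temperature in kelvin"},
    algorithms=["low-pass filter followed by least-squares fit"],
)

print(record.missing_items())
print(json.dumps(asdict(record), indent=2))
```

A check such as `missing_items` could be run when a data set is deposited, so that gaps in the documentation are noticed while the original investigators are still available to fill them.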

The standard methods in many areas of laboratory science use electronic and computational filters on the data, followed by computational analysis of the resulting data. The programs and program descriptions used by experimenters (and also evaluators) should be saved, along with the papers and reports that are referred to in the documentation for these programs.

The information necessary to make full, effective use of scientific data includes the definitions of the data in each field; a description of the methods used to acquire and manipulate the data; references to the sources of the data; detailed descriptions of samples, conditions, assumptions, limitations, etc.; deviations from standard practice; and references and cross-references to other work and data sets. In contrast, archivists generally decide that the metadata are sufficient when they, as nonspecialists, can tell what is in the record and how to access it. On the one hand, institutions holding electronically stored scientific data should be strongly encouraged to ask researchers what documentation is required in order to utilize the data fully and to include that documentation in the archival record. On the other hand, documentation from the scientific community should keep this difference in mind and provide


