APPENDIX C Examples of Successful International Data Exchange Activities in the Natural Sciences

The following examples of international exchange and management of data in the natural sciences cover a diverse set of activities involving international collaboration in several subdisciplines. All of them meet the minimum criteria for successful international exchange of scientific data generated from publicly funded research.

LABORATORY PHYSICAL SCIENCES DATA

Nuclear Structure Data

The Evaluated Nuclear Structure Data File (ENSDF), a mature database that has existed in electronic form for about 25 years, consists of evaluations of nuclear structure and decay data. Obtained by a variety of experiments and often spanning decades of measurements, most of the data come from primary sources, such as articles in refereed journals. The evaluations are carried out by an international network of individuals and coordinated by the National Nuclear Data Center (NNDC) at Brookhaven National Laboratory, with the international activities coordinated under the auspices of the International Atomic Energy Agency (IAEA). In addition to the United States, evaluators come from Russia, Japan, the People's Republic of China, Taiwan, Kuwait, the Netherlands, France, Sweden, Canada, and Belgium.

The evaluations themselves are reviewed before being disseminated. In the past the evaluations were submitted via magnetic tape to NNDC; more recently, almost all have been submitted via the Internet. All foreign evaluators now routinely transmit evaluations electronically and use electronic mail to communicate with the coordinators.

Although most of the ENSDF file has an 80-character column format inherited from IBM cards, a concentrated effort has been made in the past 2 years to transform the file into a more modern, relational database. Complete conversion will not be possible until a commercial database program is available that can meet the needs of such a large, diverse file, probably after another 2 to 3 years of development. So far, however, the conversion has proceeded smoothly, and it has facilitated the development of software overlay programs that give the user transparent access to the data. One subset of ENSDF, the nuclear database NUDAT, is a true relational database that can be accessed on-line.

Traditionally, the evaluations of nuclear structure data were disseminated via a monthly hard-copy journal. Now on-line access, introduced about a decade ago, is available by logging into the NNDC (guest accounts are available for individuals who want to explore the files) or via the World Wide Web.1 On-line access is menu-driven but has limited graphic capabilities. Users can generate high-quality PostScript files of tables and level spectra, which can be downloaded to a local machine. More than 2,000 registered users from 49 countries on six continents currently have on-line access to ENSDF.

To minimize the problems with intercontinental electronic links, ENSDF has mirror sites in Vienna and Paris and plans to provide the same at the center in Obninsk, Russia, when Russia's Internet link is established. These mirror sites are maintained by the host foreign government, with support from the IAEA. While the mirror sites have minimized problems with trans-Atlantic electronic links, delays do occur with trans-Pacific links (except those with Japan) and links to South America.
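A fixed-width card-image file of the kind described above is straightforward to read by slicing each line at known column boundaries. The sketch below is illustrative only: the field names, column offsets, and sample card are invented for this example and are not the actual ENSDF record definition, which specifies its own column assignments.

```python
# Sketch of reading an 80-column, card-image file of the kind ENSDF
# inherited from IBM punched cards. The field boundaries below are
# ILLUSTRATIVE ONLY -- they are not the real ENSDF record layout.

RECORD_FIELDS = {          # (start, end) as 0-based column offsets
    "nuclide": (0, 5),
    "rec_type": (7, 9),
    "energy": (9, 19),
    "uncertainty": (19, 21),
}

def parse_card(line: str) -> dict:
    """Slice one fixed-width card into named, stripped string fields."""
    line = line.ljust(80)  # cards are always 80 columns; pad short lines
    return {name: line[a:b].strip() for name, (a, b) in RECORD_FIELDS.items()}

card = "152GD  L   344.2789"
rec = parse_card(card)
print(rec["nuclide"], rec["energy"])  # -> 152GD 344.2789
```

A relational conversion of such a file amounts to applying a parser like this to every card and loading the resulting named fields into database columns.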
To complement on-line access, various modes of dissemination are being used, including CD-ROM and floppy diskettes. Although users in less developed countries may prefer hard copies, for many the cost of the printed journal is prohibitive. Therefore, to maximize international data flow, increased access to the Internet is critical.

While nuclear scientists are the most common users of ENSDF, some of the data have important applications in other areas of science. For example, radioactive decay data are used in medical physics. To facilitate access by the medical community, a dynamically generated form of ENSDF, called MIRD, was created that can be accessed on-line.

As a portion of the data in the Evaluated Nuclear Data File/B (ENDF/B), ENSDF data are also used in the design of nuclear reactors and devices. Before the early 1980s the distribution of ENDF/B was restricted, because the information was considered sensitive with respect to national security. This file, which is available electronically and on tape, contains results of nuclear reaction model calculations that could be useful to other scientists, for example, stellar astrophysicists seeking to understand nuclear processes in stars.

High-energy Physics Data

Collider Detector at Fermilab Collaboration

The Collider Detector at Fermilab (CDF) collaboration presents a unique example of international data exchange and the barriers associated with the transnational flow of data. The CDF collaboration includes more than 400 scientists from 36 national laboratories and universities in the United States, Canada, Italy, Japan, and Taiwan.2 The CDF detector itself cost hundreds of millions of dollars to construct, and the operating expenses at Fermilab are many tens of millions of dollars per year. Because of the high costs in manpower and operations, the results from this collaboration will not be duplicated.

Data at CDF are sorted into various data streams based on the physics: electroweak physics, top quark events, b quark events, events that test quantum chromodynamics, exotica, and so on. As of the end of 1995 the grand ensemble of data from CDF was about 10^8 events at 200 kilobytes per event, or 20 terabytes. This is not an easily manageable data set. The current storage medium is 8-mm tape. Working data sets are 1/1,000 to 1/100 of the total ensemble. It is estimated that at least 90 percent of the analysis is done on the Fermilab computers with data sets at Fermilab. Relatively few data are transferred electronically; the aggregate load on the network into and out of Fermilab is only about 4 gigabytes per month. In the event that data files need to be transferred to another institution, sufficient "bandwidth" is obtained with a briefcase full of 8-mm tapes.

While the data analysis is done predominantly at Fermilab, the various subgroups of the 400-member collaboration, which are organized by branch of physics and technical activities, must meet frequently. In the past, meetings were held in a central location, such as at Fermilab. More recently, video conferencing has been used to link institutions in the United States, Japan, and Italy to facilitate discussion and analysis of the data.
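The data-volume figures quoted above can be checked with simple arithmetic, which also shows why a briefcase of tapes outperforms the network. In the sketch below, the event count, event size, and network load come from the text; the tape capacity and courier trip time are assumptions for illustration, not figures from the collaboration.

```python
# Back-of-the-envelope check of the CDF numbers quoted above.

EVENTS = 10**8             # total events in the 1995 ensemble (from text)
EVENT_SIZE = 200 * 10**3   # 200 kilobytes per event, in bytes (from text)

total_bytes = EVENTS * EVENT_SIZE
print(total_bytes / 10**12)  # -> 20.0 (terabytes, matching the text)

# "Sneakernet" bandwidth of a briefcase of 8-mm tapes: ASSUME 20 tapes
# at 5 gigabytes each, carried on a 24-hour trip.
tapes, tape_gb, trip_seconds = 20, 5, 24 * 3600
briefcase_bps = tapes * tape_gb * 10**9 * 8 / trip_seconds

# Network load quoted in the text: about 4 gigabytes per month.
month_seconds = 30 * 24 * 3600
network_bps = 4 * 10**9 * 8 / month_seconds

# Even with modest assumptions, the briefcase delivers data hundreds
# of times faster than the network's sustained rate.
print(briefcase_bps / network_bps)
```

The exact ratio depends on the assumed tape capacity and trip length, but the conclusion (physical media beat the mid-1990s network by orders of magnitude) does not.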
However, the expense of video conferencing done by an integrated services digital network call across international borders is a major impediment to smaller institutions, particularly universities. Video conferencing done by Internet also creates problems, since the international connections generally do not have high bandwidth. Although none of the available technology is very good at this time, video conferencing has the potential to become the preferred method of real-time exchange and interpretation of data, so that interactive discussions of the analysis and interpretation can proceed more efficiently and cost-effectively.

With 400 individuals in a multi-institutional, multinational collaboration, the discussion and dissemination of manuscripts and technical reports could have posed a problem. However, CDF manuscripts and reports are prepared in LaTeX with PostScript output, including encapsulated PostScript figures embedded in the file. A CDF-Notes database enables note numbers to be assigned and topic/distribution categories to be selected. There is also a CDFNEWS procedure for putting the PostScript file and a brief ASCII description into a centrally accessible directory and sending notification to all collaborating institutions. Within minutes of learning about a new posting, scientists can retrieve it and have a hard copy at work or home at their convenience.

Electronic Preprints of Topical Information in Theoretical High-Energy Physics

For researchers in high-energy particle theory—including phenomenology (theoretical calculations that can be directly related to experiment), more formal quantum field theory and string theory, and lattice and computational approaches—rapid access to information has been a higher priority than the, at times minimal, filtering provided by the conventional refereeing process. Consequently, since at least the early 1970s a hard-copy preprint distribution system has supplanted conventional published journals as conveyers of topical information. With the advent of standardized word processors in the mid-1980s, together with widespread networking connectivity by the late 1980s, researchers regarded electronic transmission of prepublication information as a natural next step. When the "e-print archives" based at Los Alamos National Laboratory came on-line in the early 1990s, they were quickly adopted within these communities as the primary mode of communicating topical research information, as well as accessing longer-term archival material during the periods covered. The e-print archives have essentially supplanted established print journals in these fields.3 Table C.1 summarizes the distribution statistics for three e-print archives in theoretical high-energy physics (HEP) at Los Alamos.
TABLE C.1 1995 Statistics for Three e-print Archives in Theoretical High-Energy Physics

Archive                      Start Date   Subscribers     Submissions   Retrievals   Highly Requested
                                          (approximate)   per Month     per Month    (percent)(a)
hep-th (abstract theory)     8/91         4,000           196           110          8
hep-lat (computational       2/92         1,000           37            30           8
  approaches)
hep-ph (phenomenology)       3/92         3,000           250           75           8

NOTE: The numbers are estimates and averages for 1995.
(a) The highly requested articles are those that had more than twice the average number of retrievals.

Such success has not been seen for two other experimental physics archives, especially the experimental nuclear physics archive (nucl-ex), and only now is there substantial activity on the electronic archive in theoretical condensed-matter physics, which is much more closely connected to laboratory science than is high-energy theory. The users of and contributors to the HEP e-print archives include about 6,000 high-energy physicists (experimentalists and theorists), most of whom belong to the American Physical Society. The success of the HEP e-print archives in high-energy theory could be due to several advantages shared in the subdiscipline:

- A preexisting formal hard-copy preprint distribution system, so that creating and using the electronic version did not represent a major change;
- A standardized word-processing program (LaTeX or TeX), making the electronic format easily portable;
- A subject esoteric enough that the contributions were of uniformly high quality, making refereeing less crucial to maintaining quality control; and
- A critical mass of participants accustomed to e-mail use.

Materials Science Data

Data in materials science describe a wide variety of properties (e.g., mechanical, electrical, thermal, and structural) of all types of materials (e.g., metals, insulators, and semiconductors) under many different conditions. The data also relate to differences in methods for preparing and refining materials and, in processes such as the doping of semiconductors, to differences in methods for achieving desired levels of controlled impurity. They have little or no timeliness, except insofar as up-to-date data may replace older, less confirmed data. Hence the databases have moderate permanence, but grow largely as new materials are added and conditions and properties are extended.4 Although most of the collaborations involve the developed countries, international involvement in data activities is increasingly significant.
Examples in materials science include the following:

- Structure Reports, a printed crystallographic compendium that is edited at a Canadian university, published in the Netherlands, and has contributors from all over the world;
- Science Group Thermodata Europe, a consortium of eight different laboratories from four European countries that compiles and analyzes thermodynamic data;
- The Alloy Phase Diagram International Commission, whose members from 12 countries collaborate in the compilation and evaluation of phase diagrams of metallic systems;
- The collection of diffusion data from the world literature that are abstracted, recompiled, and published in Switzerland; and
- Activities of the CODATA task group on materials database management, with members from seven countries.5

The potential benefits of computer access to technical information are the same for materials scientists and engineers as for researchers in other fields of technology. With reference to alloy design, three additional capabilities are seen to be of increasing importance:

- The facilitation of empirical searches for correlations of fundamental parameters from very large volumes of data through the making of two-dimensional and higher cross-plots;
- The ability to make and display arbitrary sections of three-dimensional objects (e.g., crystal structures or ternary phase diagram models); and
- The ability to simulate crystal structures, atomic arrays, kinetic processes, and so forth, given basic information on dimensions, energetics, and modeling schema.

The most comprehensive system of on-line databases relevant to materials research is that provided by the Science and Technology Network (STN) International, which is sponsored by the Chemical Abstracts Service and by information services in Germany and Japan. Scientists and engineers are able to search approximately 20 databases covering the physical and mechanical properties of thousands of materials, as well as more than 100 factual and bibliographic databases. Although STN International provides the world's greatest concentration of data on the properties of materials, it does not include, for example, data on composites, most ceramics, semiconductors, or elastomers. The software used by STN is particularly adept at handling numeric data inquiries and is quite sophisticated in readily accommodating range searching, conversion of units, and handling of many data variables.6 STN International is accessed via commercial on-line communication services.
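The combination of range searching and unit conversion described above can be sketched in a few lines. The tiny in-memory "database," the property field, and the material names below are invented for illustration; this is not STN's query language or data model.

```python
# Illustrative sketch of a numeric range search with unit conversion,
# the kind of query the STN software is described as supporting.
# The records, field names, and values are INVENTED for this example.

records = [
    {"material": "alloy-A", "melting_point_K": 933.0},
    {"material": "alloy-B", "melting_point_K": 1358.0},
    {"material": "alloy-C", "melting_point_K": 1811.0},
]

def celsius_to_kelvin(t_c: float) -> float:
    """Convert a user-supplied Celsius value to the stored kelvin unit."""
    return t_c + 273.15

def range_search(rows, field, low, high):
    """Return rows whose numeric field lies in the closed range [low, high]."""
    return [r for r in rows if low <= r[field] <= high]

# A user may query in Celsius even though the data are stored in kelvin;
# the search layer converts units before comparing.
hits = range_search(records, "melting_point_K",
                    celsius_to_kelvin(600.0), celsius_to_kelvin(1200.0))
print([r["material"] for r in hits])  # -> ['alloy-A', 'alloy-B']
```

Converting the query bounds (rather than every stored value) keeps the stored data in one canonical unit while still accepting queries in whatever unit the searcher prefers.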
Searchers are charged only a modest fee for the time connected, with most of the charge based on the type and amount of data utilized in terms of the number of records involved. However, even a low fee can be prohibitive to a scientist in the developing world. The principal advantages of a system like STN International are (1) the large number of databases available on a single system; (2) the ability to use the same software to search each of the databases; (3) the ability to search all of the databases at the same time or in groups; (4) the sophisticated search software; and (5) the ability to access only needed data without having a massive amount of data on one's own system.

Chemical Sciences Data

The work of chemists depends on many sorts of data, including compendia of text, patents, numerical tabulations, spectra rendered as analog data, and molecular structures presented both as images and in tables of bond lengths and angles. A typical textual database is the Registry of Toxic Effects of Chemical Substances.7 Another is the bibliographic database of the Chemical Abstracts Service.8 About 40 percent of the world's patents are for chemical substances; this information is included in DERWENT, the compendium of patents carried by STN International and other on-line services.9

The numerical databases incorporate thermodynamic and thermophysical data, such as heats of formation and melting points; mechanical properties, such as compressibility; transport properties, such as heat conductivity and viscosity; and kinetic data, such as rate coefficients and activation energies. Some databases, particularly the older ones, such as the Beilstein Institute's compendium of all known organic substances and its counterpart for inorganic and organometallic compounds from the Gmelin Institute, focus on the substances themselves.10 Others, such as the database on atomic energy levels from the National Institute of Standards and Technology, catalog generic properties of a limited class of species, in this case all the atomic species for which data are available. The Cambridge structural database, originally created at the University of Cambridge and now maintained by the Cambridge Crystallographic Data Centre, an independent organization created for the purpose, is a comprehensive file of evaluated data on crystalline materials and supporting references of much interest to materials scientists and chemists.11 Still others, such as DETHERM from the Fachinformationszentrum Chemie and SPECINFO from Chemical Concepts GmbH, carry data on specific properties of as many substances as possible—thermophysical properties for DETHERM and nuclear magnetic resonance and infrared spectra for SPECINFO.
One journal, the Journal of Physical and Chemical Reference Data, publishes tabulations of evaluated data in conventional periodical form; these data generally reappear in large electronic databases. Many of the large chemical databases originated in Europe, particularly in Germany, where the chemical industry and academic science thrived a century ago. The older databases appeared as bound volumes. Then loose-leaf forms became the mode of presentation. Now, most of the databases are available on-line and in computer-readable form such as CD-ROM as well.

Chemists and materials scientists who use small amounts of data to identify substances during a series of experiments need the data quickly, and so they use handbooks, local reference libraries or, when they can, on-line material. Often they need information on the quality of the data, such as the spectral resolution of tabulated infrared absorptions. They also need to be able to search databases for substances with related structures or properties. Consequently, software tools for linking data describing related substances are important for the working chemist.

The provision of many kinds of data in chemistry, materials science, and condensed-matter physics involves deriving and providing analytic representations of physical and chemical properties. This practice is important because scientists and engineers often need accurate values for properties over a range of conditions such as temperature and pressure. A significant effort in data analysis thus is directed toward finding underlying relationships on which physically and chemically sound mathematical representations can be constructed. One example is the equation of state, which links equilibrium properties such as density and internal energy with temperature and pressure.

GENOMIC SEQUENCE AND RELATED DATA

Information on DNA sequences, complete genomes of organisms, and macromolecular gene products is readily available electronically. These data have given rise to new concepts in the life sciences and to the development of new research fields, as well as to discoveries of commercial consequence such as the development of novel drugs and vaccines, diagnostic tools in medicine, improved plant varieties with better growth characteristics and improved food properties, and bacteria needed for environmental remediation. A crucial component of this capacity for innovation is large-scale international collaboration in generating, assembling, and disseminating the necessary data. The need to collect, analyze, and manage the data is leading to interdisciplinary research involving computer science and engineering, database design and artificial intelligence, and basic biological science.
Exchange of this information is accomplished through a number of databases at several institutions supported by national governmental research funding agencies, primarily the National Center for Biotechnology Information and the Genome Database in the United States, the DNA Database of Japan, the European Molecular Biology Organization, and the European Bioinformatics Institute.12 Each organization has agreed to collect data deposited by academic, government, or industrial laboratories, put them into a transparent standard format, exchange them daily in order to maintain a common international database, and make them accessible (through the World Wide Web, by e-mail, and so on) for retrieval and analysis at any time from anywhere in the world. Databases of sequencing and protein structure information are linked through the retrieval system to other related databases (e.g., structural and molecular data, genetic maps, information on genetic diseases, and life sciences literature), so that a scientist searching for a particular DNA gene sequence can also examine the crystal structure of the protein in question and perhaps retrieve information about a related disease in humans as well. The coordination of this database effort, the definition of standards, and planning for the future are being done by database staff and advisory committees.

Currently, the international database collection contains about 500 million nucleic acid bases (the individual chemical compounds that link to make up nucleic acid (DNA or RNA) sequences) from more than 16,000 organisms. Data are being generated so rapidly that the database doubles in size every 12 months. These data are essential to ground-breaking work in molecular biology and to progress in the Human Genome Project, which is determining the sequence of the approximately 10^9 nucleotides (the combining form of the nucleic acid bases) contained in human chromosomes.13 It is expected that by the time this effort is complete, in less than 5 years, it will have a great impact on the development of diagnostic and therapeutic tools for many human diseases that currently are not treatable or whose symptoms can only be mitigated.

Currently, there are no intellectual, political, or proprietary barriers limiting international access to and use of these data. The barriers are technical and economic. The most important technical barrier involves equipment and infrastructure limitations on potential end users' capability to access and then make use of the wealth of information available. These data and their free availability to researchers in the life sciences are contributing to the rapid development of new concepts and applications, and there is a great desire and consequent pressure by academic and industrial institutions to keep the data freely accessible internationally in the future.

HUBBLE SPACE TELESCOPE ARCHIVE

The Hubble Space Telescope (HST) is an example of a science program with significant international participation and open access to the data. HST was developed by NASA with the participation (nominally 15 percent) of the European Space Agency (ESA) under a memorandum of understanding negotiated between NASA and ESA. ESA also participates in its operation. HST is available for use by the international astronomy community. All science data are archived, kept proprietary (to the astronomer who proposed the observation) for 1 year, and then made available to other astronomers.
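The 1-year proprietary rule amounts to a simple date comparison: an observation becomes public one year after it enters the archive. The sketch below shows that logic; the dates and the 365-day reading of "1 year" are illustrative assumptions, not NASA's actual archive implementation.

```python
# Sketch of the 1-year proprietary-period rule described above:
# an observation is released to all astronomers one year after it
# is archived. Dates here are INVENTED for illustration, and the
# 365-day period is an assumption about how "1 year" is counted.

from datetime import date, timedelta

PROPRIETARY_PERIOD = timedelta(days=365)

def is_public(archived_on: date, today: date) -> bool:
    """True once the proprietary period has elapsed."""
    return today >= archived_on + PROPRIETARY_PERIOD

obs = date(1995, 6, 1)
print(is_public(obs, date(1996, 1, 1)))  # -> False (still proprietary)
print(is_public(obs, date(1996, 7, 1)))  # -> True  (released to all)
```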
The archive is accessible to the public via the Internet.14

Science operations for HST are centered at the Space Telescope Science Institute (STScI) in Baltimore, operated by the Association of Universities for Research in Astronomy (AURA), a university consortium, under contract to NASA. AURA has international affiliates and incorporates ESA representatives in its oversight of the STScI. ESA also contributes staff to the STScI; they are integrated into the total operation. ESA astronomers participate in HST committees and the advisory structure. In addition, ESA operates a small Space Telescope (HST) European Coordinating Facility (the ST-ECF) in Garching, Germany, in collaboration with the European Southern Observatory.

HST observing is open to all astronomers worldwide via a peer review system. Under the memorandum of understanding, astronomers from ESA member countries are entitled to 15 percent of the observing time on average. In practice, they receive at least this amount through the normal peer review system.

All HST data are received by the STScI. They undergo routine processing and calibration, and both the calibrated and uncalibrated data and the engineering and other ancillary data are archived. The primary archive for HST data at the STScI contains about 2 terabytes of data and is growing at the rate of a gigabyte per day. A duplicate copy of the science data archive is transferred to the ST-ECF, and a third copy of nonproprietary data is maintained by the Canadian Astronomy Data Center in Victoria, British Columbia. Each of these data centers also archives different sets of related data. Under NASA policy for HST, nonproprietary data are freely available, but requests for large amounts of data must be approved by NASA headquarters and are subject to a negotiated level of cost recovery, typically the marginal cost of reproduction.

Data analysis software appropriate for HST users was developed and is maintained by the STScI and is freely distributed to astronomers. It operates in a portable data analysis environment, the Image Reduction and Analysis Facility (IRAF), developed by the National Optical Astronomy Observatories. Although other large (and small) data analysis systems exist, IRAF has been adopted by a number of astronomy projects and is used by a large portion of the astronomy community both in the United States and in other countries. It incorporates both general astronomy-oriented data analysis tools and specific packages for individual observatories and facilities.

GEOPHYSICAL DATA

The World Data Centers

In the Earth sciences, with the impetus of the International Geophysical Year in 1957, the World Data Centers (WDCs) were set up under the aegis of the International Council of Scientific Unions (ICSU).15 Their function was to provide international access to various types of observational geophysical data. This effort was very successful, and geoscientists since that time have taken advantage of the WDCs as an effective mechanism for the exchange of data.
The WDCs circumvented what otherwise would have been insurmountable political barriers to exchange of scientific data in the era of the Cold War and allowed scientists from the East and the West to use data collected by both sides. The protocol for the WDCs was that any scientist could obtain any of the data residing in the WDCs without government or other restrictions. Of course, only subsets of data collected in different countries were placed in the WDCs, but substantial amounts were made available. Initially, the data were in analog form, but in recent years data holdings have been archived and disseminated in digital form (e.g., via tapes), and additional WDCs have been established for different types of geophysical data. Increasingly, users can browse the data holdings and receive data via electronic networks.

In the United States, national data centers, such as the National Geophysical Data Center operated by the National Oceanic and Atmospheric Administration (NOAA), serve a dual role, with a subset of their holdings designated as a WDC and therefore available to any user, domestic or foreign. Other examples include the U.S. Geological Survey's Earth Resources Observation Systems (EROS) Data Center, which houses the newly established WDC for land remote sensing, where a subset of remote sensing data is made available,16 and the Department of Energy's Carbon Dioxide Information Analysis Center, which also serves as a WDC for trace gases in the atmosphere.17 Although the WDCs provide one very effective avenue for the transnational flow of geophysical data, many important observational data sets are not available through the WDCs and must be obtained through other means, some of which are discussed below.

Seismic Data

Many thousands of seismic events occur throughout the world each year, some large and destructive. The detection and location of earthquakes and determination of their magnitudes require a globally distributed network of well-calibrated, sensitive seismic stations that continuously record ground motions. Such a network necessarily involves seismic stations in many countries around the world. Data are gathered by a combination of individual institutions and different regional and global networks operated by individual organizations under agreements with countries or institutions where the stations are located. The determination of the location, depth, time of occurrence, and magnitude of an earthquake makes use of data from ground motions observed by many stations at different distances and azimuths from the source. Monitoring of global seismic activity therefore involves the transnational flow of data and information both in real time and on a recurring basis.

Global seismic monitoring serves purposes other than earthquake hazard assessment and mitigation, the most important being enforcement of international treaties governing underground nuclear explosions. Underground explosions are recorded by the same seismic stations that record earthquakes; like earthquakes, they can be detected and located.
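The idea of locating an event from arrival times at stations at different distances and azimuths can be illustrated with a toy grid search. Everything in the sketch below is invented for illustration: the station coordinates, the uniform 6 km/s wave speed, and the synthetic arrival times. Real location codes also solve for depth and origin time and use proper travel-time tables rather than a constant velocity.

```python
# Toy illustration of event location from arrival times at several
# stations: grid-search for the epicenter that best fits the observed
# arrivals. All coordinates, the wave speed, and the arrival times
# below are INVENTED; real seismic location is far more elaborate.

import math

V = 6.0  # assumed uniform wave speed, km/s

stations = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (80.0, 90.0)]
true_src = (40.0, 30.0)
# Synthetic arrival times (origin time = 0) at each station:
arrivals = [math.dist(s, true_src) / V for s in stations]

def misfit(x, y):
    """RMS mismatch between predicted and observed arrival times,
    after removing the best-fitting common shift (the unknown
    origin time of the event)."""
    pred = [math.dist(s, (x, y)) / V for s in stations]
    resid = [p - a for p, a in zip(pred, arrivals)]
    mean = sum(resid) / len(resid)  # absorbs the unknown origin time
    return math.sqrt(sum((r - mean) ** 2 for r in resid) / len(resid))

# Coarse 1-km grid search over a 100 x 100 km region.
best = min(((misfit(x, y), x, y)
            for x in range(0, 101) for y in range(0, 101)),
           key=lambda t: t[0])
print(best[1], best[2])  # recovers the source at (40, 30)
```

With four well-distributed stations the epicenter is uniquely determined even though the origin time is unknown, which is why the misfit function removes the mean residual before scoring each trial location.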
Considerable past and current research is devoted to developing reliable methods for distinguishing underground nuclear explosions from natural earthquakes and from mining blasts. Under protocols currently being developed among participating countries, all parties are to have equal access to continuous real-time recordings from approximately 50 seismic arrays and from many additional single stations distributed around the world. Each country will then carry out its own assessment of recorded events. The transnational flow of these data in near-real time will be formalized as part of the Comprehensive Test Ban Treaty. In the early part of this century, the international exchange of seismic data was accomplished by a scientist writing to the seismologist operating each station of interest and asking to borrow the original (analog) recording of the event being studied. When the work was complete, the users returned the original recordings to the respective station operators. By contrast, a recently implemented data access capability at the Incorporated Research Institutions for Seismology (IRIS)
Data Management Center (DMC) allows any scientist (U.S. or international) to download via the Internet the signals recorded at approximately 20 global seismic stations within about 1 hour of an event's occurrence (magnitude greater than 5).18 Similarly convenient Internet access to continuous recordings of many other international stations is possible through the DMC, but with a time lag necessitated by the lack of real-time or near-real-time data transmission from some of the available stations.

IRIS's DMC has become the International Federation of Digital Seismic Networks' archive for continuous digital data. Global digital seismic data from stations distributed around the globe are available through the DMC. Users can browse electronically to determine what data are available and can place requests for data sets they wish to receive; their requests are filled and the data transferred either electronically via the Internet (for modest-size data sets) or via high-density media such as Exabyte cassettes (for large requests). The DMC also serves as a broker for individuals who wish to obtain data from foreign stations that are not routinely archived at the DMC. This valuable service is accomplished by means of data transfer links to data archives in other countries; users would otherwise have to access and transfer data from these various sources individually. In this way, the DMC operates as a "virtual" data center from which the user extracts desired data, some of which do not physically reside at the center.

The World Weather Watch

The World Weather Watch is the most formally organized international global observation, communication, processing, and archiving system at this time.19 This distinction stems from the early recognition that scientific understanding and prediction of the atmosphere, even for only a day or two in advance, require observations from very large areas.
Beginning more than 100 years ago, the observations were sent by communication systems in near-real time through internationally agreed upon arrangements and procedures. For the last several decades, data have been processed and archived on a global basis through a system of world and regional meteorological centers and world and regional data centers for meteorology and oceanography. During this period the World Weather Watch has developed many of the characteristics required for an effective system for international exchanges of scientific data and therefore can be considered one of the primary models for other such systems.

The development of the World Weather Watch was accelerated in the 1960s as a result of the potential capability of Earth-orbiting satellites to obtain atmospheric and oceanographic data on a global basis and the advent of computers capable of handling large volumes of diverse data for numerical weather predictions on a global basis. An extensive planning and coordination process was put in place in the World Meteorological Organization (WMO) to expand the global observing component of the World Weather Watch through polar and geostationary satellites and additional in situ observations, to develop an improved telecommunications system capable of exchanging data in real time among all nations, and to establish a system of supporting data centers. Three such centers, in Washington, D.C., Moscow, and Melbourne, were established in the mid-1960s, along with regional meteorological centers to serve specific continental and oceanic areas. These World Meteorological Centers are responsible for the preparation and distribution of an agreed upon set of global products to all nations, through the Global Telecommunications System of the World Weather Watch. Similarly, the regional meteorological centers prepare products as agreed for their specific areas of responsibility. The archival, storage, and retrieval systems for retrospective use of the data are maintained by the World Data Centers and, by virtue of recent expansion, the Regional Data Centers.

WMO does not operate any observing stations, telecommunication systems, or processing centers; through its member nations, WMO is responsible only for the planning and coordination of the World Weather Watch. This includes developing the scope and extent of the observing networks, the characteristics and standards of the telecommunication systems, and the products to be prepared at the centers. The World Weather Watch, therefore, is built on the national meteorological systems of each member nation.

The national meteorological system in the United States is quite extensive because of the great impact of weather, especially severe storms such as hurricanes, tornadoes, blizzards, and flash floods, on people and industry. Services are provided through a public-private partnership. The federal government is responsible for public forecasts and forecasts related to the safety of life and protection of property—severe weather and flood warnings for the country and surrounding oceans, and forecasts and advisories for aviation terminals and en route paths.
The private sector provides tailored forecasts for specific clients and, through television, radio stations, and, to an increasing degree, the Internet, leads in the dissemination of severe weather and flood warnings to the public.

The federal government operates an extensive satellite and ground-based observing system, together with meteorological prediction and data centers, to obtain the data and products needed to carry out its responsibility for providing services. These data and products are made available to the private sector with no restrictions and at low incremental costs. These same data and products are used in fulfilling the requirements and agreements within the World Weather Watch and for research internationally and nationally. For example, the data from U.S. geostationary satellites can be received directly by centers in South America; the data from U.S. polar-orbiting meteorological satellites are distributed on the World Weather Watch Telecommunication System; and the products prepared by the National Meteorological Center near Washington, D.C., which are designed to meet national requirements, are provided to all countries through the World Weather Watch. Likewise, the data are available for all research programs, primarily through the NOAA National Climatic Data Center and the National
Oceanographic Data Center. Other nations have functioned in a similar way, with the same centers fulfilling both national needs and international commitments.

The close interaction between operational meteorological services and research in the atmospheric sciences, nationally and internationally, has proved extremely effective. During the late 1960s and 1970s, an extensive program—the Global Atmospheric Research Program—was undertaken internationally by ICSU and WMO to improve the accuracy and extend the time range of weather forecasts. A joint mechanism was established within which the ICSU scientists led the planning of major observational field experiments and WMO led the implementation through national contributions of member nations. The largest and most complex was a global observational experiment during which the World Weather Watch Global Observation System was augmented with additional observations from ships, aircraft, and satellites to provide the most comprehensive set of global observations ever acquired. Again, the World Data Centers were responsible for archiving the data from the experiment for use in the associated research programs. Simultaneously during this period, the data from the World Weather Watch were being used by the U.S. government and private sector to provide services. Such multipurpose use of meteorological data has historically been very effective and efficient.

However, the traditionally unrestricted exchange of data in meteorology has been placed in jeopardy in recent years. Pressure on weather services from some governments to charge users for services other than public weather forecasts and severe weather and flood warnings has led to proposals to place restrictions on the use of data and to charge substantially for real-time data or for data sets exchanged between data centers.
The meteorological services in Western Europe have been the most aggressive in charging industries and organizations for specialized services and private meteorological companies for data. This situation was a major consideration at the meeting of the WMO Congress in 1995, which adopted an understanding by members of the WMO that they would endorse the free and unrestricted exchange of data for research and education, and for an agreed set of data—satellite and in situ—to be exchanged in real time. However, it included a provision that an individual country could place restraints on data made available beyond the agreed level,20 and this has resulted both in a reduction of data freely available for research and in significant administrative expenses.

NOTES

1.   See <http://www.nndc.bnl.gov>.
2.   See <http://www-cdf.fnal.gov/> for additional information on the Collider Detector at Fermilab.
3.   See <http://xxx.lanl.gov/> for the e-print archives.
4.   Summaries of data sources for materials science and engineering (both print and electronic) have been published. See H. Wawrousek, J.H. Westbrook, and W. Grattidge (1989), "Data Sources of Mechanical and Physical Properties of Engineering Materials," Physik Daten, 30-1, Fachinformationszentrum, Karlsruhe, Germany; J.H. Westbrook and W. Grattidge, eds. (1988),
-->     "The CODATA Referral Database (CRD)," based on the CODATA Database Directories and the 1988 revision of the UNESCO "Inventory of Data Referral Sources in Science and Technology," available from CODATA, 51 Boul. De Montmorency, Paris; J.H. Westbrook (1986), "Materials Information Sources," Encyclopedia of Materials Science and Engineering, M.B. Bever, ed., p. 527, Pergamon; F.C. Allan and W.R. Ferrell (1989), Database 12,(3):50-58; M.K. Booker (1986)''Computerized Materials Databases," Encyclopedia of Materials Science and Engineering, pp. 796-800, Pergamon. The role of the computer in accessing and manipulating materials data for alloy design is discussed by Westbrook in J.H. Westbrook (1993), "Data Compilation, Analysis, and Access: The Role of the Computer," MRS Bull., 18:44-49. R.A. Matula (1989), "The Importance of Numeric Databases to Materials Science," J. Res. Natl. Inst. Stand. Technol., 94:9-14, emphasizes the importance, in an industrial setting, of computer access to numeric databases in materials science. 5.   Functioning almost entirely without external financial support and on a volunteer basis, this CODATA task group coordinates work in this field, promotes standards, communication, and awareness; assists in education and training; and publishes an international register of materials database managers. 6.   See <http://www.cas.org/stn.html>. 7.   See <http://www.rs.ch/krinfo/products/datastar/sheets/RTEC.htm>. 8.   See <http://www.cas.org> for information about the Chemical Abstracts Service. 9.   See <http://www.derwent.co.uk>. 10.   Access fees to these are often prohibitive. Several respondents to the committee's "Inquiry to Interested Parties" (see Appendix D) noted specifically that they would like access to the Beilstein databases but considered them too costly. 11.   See the Cambridge Crystallographic Data Centre home page at <http://csdvx2.ccdc.cam.ac.uk/ccdchome.html>. 12.   
See the National Center for Biotechnology Information home page at <http://www.ncbi.nlm.nih.gov/>. 13.   See <http:/www.nghgr.nih.bov/HGP/> for additional information regarding the Human Genome Project. 14.   See <http://www.stsci.edu/archive.html> for additional information about the Hubble Space Telescope archive. 15.   See <http://www.ngdc.noaa.gov/wdcmain.html> for a description of the World Data Centers System. 16.   See <http://edc.www.cr.usgs.gov> for the EROS Data Center home page. 17.   See <http://cdiac.esd.ornl.gov> for the Carbon Dioxide Information Analysis Center home page. 18.   See <http://www.iris.washington.edu/dmc.new.html> for the IRIS Data Management Center home page. 19.   For additional information on the World Weather Watch, see the World Meteorological Organization home page at <http://www.wmo.ch:80/www/www.html>. 20.   See R.S. Greenfield, E.W. Friday, and M.C. Yerg (1995), "WMO Adopts a Resolution Governing the International Exchange of Meteorological and Related Data and Products," Bulletin of the American Meteorological Society, 76(8):1478-1479.