Building an Electronic Records Archive at the National Archives and Records Administration

Recommendations for Initial Development

Committee on Digital Archiving and the National Archives and Records Administration

Computer Science and Telecommunications Board

NATIONAL RESEARCH COUNCIL OF THE NATIONAL ACADEMIES

Robert F. Sproull and Jon Eisenberg, Editors

THE NATIONAL ACADEMIES PRESS
Washington, D.C. www.nap.edu



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.





THE NATIONAL ACADEMIES PRESS 500 Fifth Street, N.W. Washington, DC 20001

NOTICE: The project that is the subject of this report was approved by the Governing Board of the National Research Council, whose members are drawn from the councils of the National Academy of Sciences, the National Academy of Engineering, and the Institute of Medicine. The members of the committee responsible for the report were chosen for their special competences and with regard for appropriate balance.

Support for this project was provided by the National Archives and Records Administration under Contract No. NAMA-02-C-0012. Any opinions, findings, conclusions, or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the organizations that provided support for the project.

International Standard Book Number 0-309-08947-6 (Book)
International Standard Book Number 0-309-51729-X (PDF)

Copies of this report are available from the National Academies Press, 500 Fifth Street, N.W., Lockbox 285, Washington, DC 20055. Telephone (800) 624-6242 or (202) 334-3313 in the Washington metropolitan area. Internet, http://www.nap.edu.

Copyright 2003 by the National Academy of Sciences. All rights reserved.

Printed in the United States of America

THE NATIONAL ACADEMIES

Advisers to the Nation on Science, Engineering, and Medicine

The National Academy of Sciences is a private, nonprofit, self-perpetuating society of distinguished scholars engaged in scientific and engineering research, dedicated to the furtherance of science and technology and to their use for the general welfare. Upon the authority of the charter granted to it by the Congress in 1863, the Academy has a mandate that requires it to advise the federal government on scientific and technical matters. Dr. Bruce M. Alberts is president of the National Academy of Sciences.

The National Academy of Engineering was established in 1964, under the charter of the National Academy of Sciences, as a parallel organization of outstanding engineers. It is autonomous in its administration and in the selection of its members, sharing with the National Academy of Sciences the responsibility for advising the federal government. The National Academy of Engineering also sponsors engineering programs aimed at meeting national needs, encourages education and research, and recognizes the superior achievements of engineers. Dr. Wm. A. Wulf is president of the National Academy of Engineering.

The Institute of Medicine was established in 1970 by the National Academy of Sciences to secure the services of eminent members of appropriate professions in the examination of policy matters pertaining to the health of the public. The Institute acts under the responsibility given to the National Academy of Sciences by its congressional charter to be an adviser to the federal government and, upon its own initiative, to identify issues of medical care, research, and education. Dr. Harvey V. Fineberg is president of the Institute of Medicine.

The National Research Council was organized by the National Academy of Sciences in 1916 to associate the broad community of science and technology with the Academy’s purposes of furthering knowledge and advising the federal government. Functioning in accordance with general policies determined by the Academy, the Council has become the principal operating agency of both the National Academy of Sciences and the National Academy of Engineering in providing services to the government, the public, and the scientific and engineering communities. The Council is administered jointly by both Academies and the Institute of Medicine. Dr. Bruce M. Alberts and Dr. Wm. A. Wulf are chair and vice chair, respectively, of the National Research Council.

www.national-academies.org

COMMITTEE ON DIGITAL ARCHIVING AND THE NATIONAL ARCHIVES AND RECORDS ADMINISTRATION

ROBERT F. SPROULL, Sun Microsystems, Chair
HOWARD BESSER, University of California, Los Angeles
JAMIE CALLAN, Carnegie Mellon University
CHARLES DOLLAR, Dollar Consulting
STUART HABER, Hewlett-Packard Laboratories
MARGARET HEDSTROM, University of Michigan
MARK KORNBLUH, Michigan State University
RAYMOND LORIE, IBM Almaden Research Center
CLIFFORD LYNCH, Coalition for Networked Information
JEROME H. SALTZER, Massachusetts Institute of Technology
MARGO SELTZER, Harvard University
ROBERT WILENSKY, University of California, Berkeley

Staff

JON EISENBERG, Study Director and Senior Program Officer
STEVEN WOO, Program Officer
DAVID PADGHAM, Research Associate
JENNIFER M. BISHOP, Senior Project Assistant

COMPUTER SCIENCE AND TELECOMMUNICATIONS BOARD

DAVID D. CLARK, Massachusetts Institute of Technology, Chair
ERIC BENHAMOU, 3Com Corporation
DAVID BORTH, Motorola Labs
JOHN M. CIOFFI, Stanford University
ELAINE COHEN, University of Utah
W. BRUCE CROFT, University of Massachusetts at Amherst
THOMAS E. DARCIE, University of Victoria
JOSEPH FARRELL, University of California at Berkeley
JOAN FEIGENBAUM, Yale University
HECTOR GARCIA MOLINA, Stanford University
WENDY KELLOGG, IBM Thomas J. Watson Research Center
BUTLER W. LAMPSON, Microsoft Corporation
DAVID LIDDLE, U.S. Venture Partners
TOM M. MITCHELL, Carnegie Mellon University
DAVID A. PATTERSON, University of California at Berkeley
HENRY (HANK) PERRITT, Chicago-Kent College of Law
DANIEL PIKE, GCI Cable and Entertainment
ERIC SCHMIDT, Google, Inc.
FRED SCHNEIDER, Cornell University
BURTON SMITH, Cray Inc.
LEE SPROULL, New York University
WILLIAM STEAD, Vanderbilt University
JEANNETTE M. WING, Carnegie Mellon University

MARJORY S. BLUMENTHAL, Director
KRISTEN BATCH, Research Associate
JENNIFER M. BISHOP, Senior Project Assistant
JANET BRISCOE, Administrative Officer
DAVID DRAKE, Senior Project Assistant
JON EISENBERG, Senior Program Officer
RENEE HAWKINS, Financial Associate
PHIL HILLIARD, Research Associate
MARGARET MARSH HUYNH, Senior Project Assistant
ALAN S. INOUYE, Senior Program Officer
HERBERT S. LIN, Senior Scientist
LYNETTE I. MILLETT, Program Officer
DAVID PADGHAM, Research Associate
CYNTHIA A. PATTERSON, Program Officer
JANICE SABUDA, Senior Project Assistant
BRANDYE WILLIAMS, Staff Assistant
STEVEN WOO, Dissemination Officer

For more information on CSTB, see its Web site at <http://www.cstb.org>, write to CSTB, National Research Council, 500 Fifth Street, N.W., Washington, DC 20001, call at (202) 334-2605, or e-mail the CSTB at cstb@nas.edu.


Preface

Like its constituent agencies and other organizations, the federal government generates and increasingly saves a large and growing fraction of its records in electronic form. Recognizing the growing importance of these electronic records for its mission of preserving “essential evidence,” the National Archives and Records Administration (NARA) launched a major new initiative, the Electronic Records Archives (ERA). NARA plans to commence the initial procurement for a production-quality ERA in 2003 and has started a process of defining the desired capabilities and requirements for the system.

As part of its preparations for an initial ERA procurement, NARA asked the National Academies’ Computer Science and Telecommunications Board (CSTB) to provide independent technical advice on the design of an electronic records archive, including an assessment of how work sponsored by NARA at the San Diego Supercomputer Center (SDSC) helps inform the ERA design and what key issues should be considered in ERA’s design and operation.

CSTB’s Committee on Digital Archiving and the National Archives and Records Administration has been tasked with preparing two reports. This first of the two reports is intended to provide quick, preliminary feedback to NARA on lessons it should take from the SDSC work and to identify key ERA design issues that should be addressed as the ERA procurement process proceeds in 2003. The committee’s second report, anticipated in late 2003, will provide longer-term strategic recommendations to NARA on how to meet its electronic records archiving challenges.

In order to provide feedback as soon as possible, this report has been developed on a very tight time line. In preparing it, the committee received briefings from NARA staff and a number of other experts in archiving and related technologies. It conducted two site visits to supplement information received in the briefings: Members of the committee participated in visits to SDSC in San Diego and to NARA’s College Park, Maryland, facility. A number of topics in the committee’s charter, such as advice on NARA’s research program, are deferred to the second report.

Acknowledgment of Reviewers

This report has been reviewed in draft form by individuals chosen for their diverse perspectives and technical expertise, in accordance with procedures approved by the National Research Council’s Report Review Committee. The purpose of this independent review is to provide candid and critical comments that will assist the institution in making its published report as sound as possible and to ensure that the report meets institutional standards for objectivity, evidence, and responsiveness to the study charge. The review comments and draft manuscript remain confidential to protect the integrity of the deliberative process. We wish to thank the following individuals for their review of this report:

William Y. Arms, Cornell University,
Eric W. Brown, IBM Research,
Paul Conway, Duke University,
James Gray, Microsoft Bay Area Research Center,
Gary King, Harvard University,
Butler W. Lampson, Microsoft Corporation,
Michael E. Lesk, Internet Archive,
Peter G. Neumann, SRI International,
Jeff Rothenberg, RAND,
William Scherlis, Carnegie Mellon University, and
Jeffrey D. Ullman, Stanford University (emeritus).

Although the reviewers listed above provided many constructive comments and suggestions, they were not asked to endorse the conclusions or recommendations, nor did they see the final draft of the report before its release. The review of this report was overseen by Robert J. Spinrad, Xerox Corporation (retired). Appointed by the National Research Council, he was responsible for making certain that an independent examination of this report was carried out in accordance with institutional procedures and that all review comments were carefully considered. Responsibility for the final content of this report rests entirely with the authoring committee and the institution.

Contents

SUMMARY AND RECOMMENDATIONS

1 INTRODUCTION

2 COMMONALITIES BETWEEN REQUIREMENTS FOR THE ERA AND REQUIREMENTS FOR OTHER ACTIVITIES

3 SPECIFIC LESSONS TO BE LEARNED FROM THE SDSC DEMONSTRATION PROJECTS
    Lessons from the SDSC Project That May Be Helpful in Designing the ERA
    Aspects of the SDSC Project That Might Not Apply to the NARA System
    Areas Where the SDSC Project Experience Should Not Be Used in Designing the NARA System

4 DESIGNING AND ENGINEERING THE ERA
    An Engineering Approach
    Data and Estimates to Support the Definition of Initial Requirements
    Pragmatic Engineering Decisions
    Supporting Future Archivists and Researchers
    Pragmatic Steps to Facilitate Future Access to Records

5 KEY TECHNICAL ISSUES
    Data Model
    Storage
    Ingest
    Access
    Security and Access Control
    Integrity of Records

6 STRENGTHENING INFORMATION TECHNOLOGY EXPERTISE
    Expertise to Design and Evolve the ERA
    Expertise to Operate the ERA

7 STRATEGY FOR EVOLUTION AND ACQUISITION
    Strategy for Evolution
    Iterative (Spiral) Development
    Pilots: Starting Small and Gaining Experience

APPENDIXES
    A BACKGROUND ON NARA AND THE ERA PROGRAM
    B CONCLUSIONS FROM THE GENERAL ACCOUNTING OFFICE REPORT INFORMATION MANAGEMENT: CHALLENGES IN MANAGING AND PRESERVING ELECTRONIC RECORDS
    C BRIEFERS TO THE STUDY COMMITTEE

WHAT IS CSTB?