Executive Summary
Humans have long been intrigued by the forces that shape them
and other organisms. What blueprint dictates blue eyes, brown hair,
or the form of a flower? More than 100 years ago Gregor Mendel
discovered that such inherited traits are controlled by cellular units
that later became known as genes. In recent years, our understanding
of these genes has been greatly increased by knowledge of the
molecular biology of DNA, the giant molecule from which genes are
formed. It is now feasible to obtain the ultimate description of genes
and DNA, since recently developed techniques enable us to map
(locate) the genes in the DNA of any organism and then to sequence
(order) each of the DNA units, known as nucleotides, that constitute
the genes.
As more of our genes are mapped and their DNA sequenced, we
will have an increasingly useful resource: an essential data base that
will facilitate research in biochemistry, physiology, cell biology, and
medicine. This data base will have a major impact on health care and
disease prevention as well as on our understanding of cells and
organisms. The concept of organizing a large project to map and
sequence the DNA in the genes and the intergenic regions that connect
them (the entire human DNA complement or genome) has received
increasing attention worldwide. Several countries have expressed
interest in launching such a project. To evaluate what the United
States should be doing in this area, the Board on Basic Biology of
the National Research Council's Commission on Life Sciences estab-
lished the Committee on Mapping and Sequencing the Human Genome,
whose findings are reported in this document.
MAPPING AND SEQUENCING THE HUMAN GENOME
In this report the committee explores how, when, and why we
should map and sequence the DNA in the human genome. In studying
these issues, the committee reached the following conclusions:
· Acquiring a map, a sequence, and an increased understanding of
the human genome merits a special effort that should be organized
and funded specifically for this purpose. Such a special effort in the
next two decades will greatly enhance progress in human biology and
medicine.
· The technical problems associated with mapping and sequencing
the human and other genomes are sufficiently great that a scientifically
sound program requires a diversified, sustained effort to improve our
ability to analyze complex DNA molecules. Although the needed
capabilities do not yet exist, the broad outlines of how they could be
developed are clear. Prospects are therefore good that the required
advanced DNA technologies would emerge from a focused effort that
emphasizes pilot projects and technological development. Once es-
tablished, these technologies would not only make the complete
analysis of the human and other genomes feasible, but would also
make major contributions to many other areas of basic biology and
biotechnology.
· Important early goals of the effort should be to acquire a high-
resolution genetic linkage map of the human genome, a collection of
ordered DNA clones, and a series of complementary physical maps
of increasing resolution. The ultimate goal would be to obtain the
complete nucleotide sequence of the human genome, starting from
the materials in the ordered DNA clone collection. Attaining this goal
would require major (but achievable) advances in DNA handling and
sequencing technologies.
· A comparative genetic approach is essential for interpreting the
information in the human genome. Therefore, intensive studies of
those organisms that provide particularly useful models for under-
standing human gene structure, function, and evolution must be
carried out in parallel.
· The mapping and sequencing effort should begin primarily as a
series of competing, peer-reviewed programs emphasizing technology
development. Funding should include both grants to individuals and
grants to medium-sized multidisciplinary groups of scientists and
engineers. Because the technology required to meet most of the
project's goals needs major improvement, the committee specifically
recommends against establishing one or a few large sequencing centers
at present.
· The human genome project should differ from present ongoing
research inasmuch as the component subprojects should have the
potential to improve by 5- to 10-fold increments the scale or efficiency
of mapping, sequencing, analyzing, or interpreting the information in
the human genome.
· Progress toward all the above goals will require the establishment
of well-funded centralized facilities, including a stock center for the
cloned DNA fragments generated in the mapping and sequencing
effort and a data center for the computer-based collection and
distribution of large amounts of DNA sequence information. The
committee suggests that the groups supplying these services be selected
through open competition.
On the basis of these conclusions, the committee recommends the
following:
· In view of the importance and magnitude of the task, a rapid
scale-up to $200 million of additional funding per year is recommended.
These additional funds should not be diverted from the current federal
research budget for biomedical sciences.
A majority of the committee recommends:
· A single federal agency should serve as the lead agency for the
project. This agency would receive and administer the funds for the
project and would be responsible for the operation of the stock center
and data center, as well as administer the peer review system utilized
in determining the recipients of funds. It should work closely with a
Scientific Advisory Board in developing and implementing a high
standard of peer review. The Scientific Advisory Board, composed
primarily of expert scientists knowledgeable in relevant fields, would
provide advice not only on peer review, but also on quality control,
international cooperation, coordination of efforts of the laboratories
in the project, and the operations of the stock and data centers.
An outline of the major issues presented in this report follows, with
genome mapping, genome sequencing, the handling of information
and materials, and strategies for implementation and management of
a human genome project discussed in turn.
An outline of the human genome and its central role in human
biology is shown in Figure 1-1.
GENOME MAPPING
The two main types of human genome maps are genetic linkage
maps and physical maps. Genetic linkage maps are made mainly by
[Figure 1-1: labeled illustration of the human genome, largely illegible in this scan. Recoverable fragments: "1. Nucleotides, four different ... paired in specific ..."; "2. A gene contains ..."; "long DNA double helix ... with proteins"; "... chromosomes ..."; "... information ... manufacture proteins"; "5. The human body has about 10 ..."; "3 billion pairs of nucleotides"; "... of genes and chromosomes."]
FIGURE 1-1 Adapted from an illustration by [...] Isensee for [...],
September 3, 1986, with permission from the publisher.
studying families and measuring the frequency with which two different
traits are inherited together, or linked. Physical maps are derived
mainly from chemical measurements made on the DNA molecules
that form the human genome. These maps can be of several different
types and include restriction maps and ordered DNA clone collections,
as well as lower resolution maps of expressed genes or anonymous
(function unknown) DNA segments that are mapped by somatic cell
hybridization or by in situ chromosome hybridization. All these maps
share the common goal of placing information about human genes in
a systematic linear order according to their relative positions along
each chromosome. Knowing the location of genes and the correspond-
ing genetic traits they produce allows us to discover patterns of
genomic organization with important functional consequences and to
compare humans with other mammals. Detailed maps of the human
genome should quickly lead to major human health benefits. For
example, identifying genes or regions of DNA involved in several
diseases, including hereditary forms of cancer, Alzheimer's disease,
manic-depressive illness, Huntington's disease, and cystic fibrosis,
will allow new methods of diagnosis and treatment to be developed. Equally
important, the better understanding of human biology that would
follow from these studies would contribute broadly to the treatment
of most diseases.
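The linkage measurement described above can be made concrete with a small calculation. The sketch below is a minimal illustration, assuming that offspring in the studied families have already been scored as recombinant or non-recombinant for a pair of markers; the function names are hypothetical, and the Haldane map function used to convert a recombination fraction into a distance in centimorgans is one standard choice that the report itself does not specify.

```python
import math


def recombination_fraction(recombinants: int, total_meioses: int) -> float:
    """Share of scored meioses in which the two markers were separated."""
    if total_meioses <= 0:
        raise ValueError("need at least one scored meiosis")
    if not 0 <= recombinants <= total_meioses:
        raise ValueError("recombinants must lie between 0 and total_meioses")
    return recombinants / total_meioses


def haldane_cm(r: float) -> float:
    """Haldane map distance in centimorgans: d = -50 * ln(1 - 2r).

    Defined only for 0 <= r < 0.5; r = 0.5 means the markers are unlinked.
    """
    if not 0 <= r < 0.5:
        raise ValueError("r must satisfy 0 <= r < 0.5")
    return -50.0 * math.log(1.0 - 2.0 * r)


# Hypothetical data: 8 recombinants among 100 scored meioses.
r = recombination_fraction(8, 100)
print(r, round(haldane_cm(r), 2))
```

Traits that are inherited together more often (smaller r) map closer together; real linkage analysis also weighs evidence across families (for example with LOD scores), which this sketch omits.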
The committee believes that full-scale mapping, both genetic linkage
and physical, should begin immediately. Current mapping efforts are
being carried out gene by gene. Each gene is only a small part of the
entire complement of DNA, and the methods involved therefore
require the equivalent of repeatedly finding a needle in a haystack. In
contrast, in any effort to map the entire human genome, each of the
many DNA segments that are obtained by cloning the human genome
will be initially kept as relevant to the project. These then represent
part of a puzzle that is solved by ordering each DNA segment
according to its position in the genome. The cost of obtaining any
particular DNA clone in such a collection of ordered DNA clones is
relatively small. As a result, a project of this type will quickly pay
for itself by saving the enormous aggregate costs involved when each
laboratory must find its own DNA clones.
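The idea of solving the "puzzle" by ordering each cloned segment according to its position can be illustrated with a toy example. This is a sketch under strong simplifying assumptions, not a description of any method in the report: each clone is assumed to have been fingerprinted into a set of marker labels (the labels and `order_clones` are hypothetical), and only clones that are adjacent in the genome are assumed to share markers, so the clones form a single unbranched chain.

```python
def order_clones(clones: dict[str, set[str]]) -> list[str]:
    """Order clones into one contig by walking shared-marker overlaps.

    Assumes only genomically adjacent clones share markers; raises
    ValueError if the overlaps do not form a single linear chain.
    """
    names = sorted(clones)
    # Two clones are neighbors if their fingerprints share any marker.
    neighbors = {n: set() for n in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if clones[a] & clones[b]:
                neighbors[a].add(b)
                neighbors[b].add(a)

    # A linear contig has exactly two end clones with one neighbor each.
    ends = [n for n in names if len(neighbors[n]) == 1]
    if len(ends) != 2:
        raise ValueError("overlaps do not form a single linear contig")

    order, previous = [ends[0]], None
    while True:
        current = order[-1]
        nxt = [n for n in neighbors[current] if n != previous]
        if not nxt:
            break
        if len(nxt) > 1:
            raise ValueError("branching overlap; order is ambiguous")
        previous = current
        order.append(nxt[0])

    if len(order) != len(names):
        raise ValueError("some clones are not connected to the contig")
    return order


# Hypothetical fingerprints for three overlapping clones.
print(order_clones({"c3": {"m4", "m5"}, "c1": {"m1", "m2"}, "c2": {"m2", "m3", "m4"}}))
```

Real clone ordering must cope with repeated sequences, fingerprinting errors, and overlaps between non-adjacent clones, which is part of why the report calls for improved ordering methods.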
Several recent breakthroughs in mapping methods make obtaining
the type of detailed data needed in human genome maps a realistic
goal. These breakthroughs range from vastly improved methods for
physical mapping that rely on new techniques for separating and
manipulating DNA molecules to much more accurate mathematical
methods for analyzing genetic linkage data on the basis of restriction
fragment length polymorphisms (RFLPs). A great deal of synergism
exists between genetic linkage and physical mapping methods. Because
of the simultaneous advances in both techniques, there is a real
possibility that a detailed physical and genetic linkage map of the
human genome could be constructed in a relatively short time. This
map would be extremely useful in its own right and would set the
stage for constructing the ultimate physical map: the complete DNA
sequence of the human genome.
The committee concluded that the development and refinement of
techniques should be emphasized early in the mapping part of the
project. Despite recent advances, physical mapping methods need
improvement. For example, DNA fragments as much as 10 million
nucleotides long (1/300 of the total human genome) can be handled only
with considerable difficulty, and such large fragments cannot yet be
cloned. Ordered DNA clone collections have been started, but not
completed, for several organisms with genomes that are at most 1/50
the size of the human genome. Advanced technology, such as the
handling of larger DNA molecules and the development of new cloning
vectors for them, will expedite the preparation of such clone collec-
tions. Thus, much of the effort in the next few years should be devoted
to refining existing mapping techniques and developing even more
powerful new ones.
The committee believes that most support should go to groups that
are attempting to map large genomes, with support for different
mapping methods proceeding in parallel. Improved methods for the
following would facilitate map construction and usefulness:
· Separating intact human chromosomes.
· Separating and immortalizing identified fragments of human chro-
mosomes.
· Cloning complementary copies of expressed genes, called com-
plementary DNAs (cDNAs), especially those that represent rare
cell-, tissue-, and development-specific messenger RNAs.
· Cloning very large DNA fragments.
· Purifying very large DNA fragments, including higher resolution
methods for separating such fragments.
· Ordering the adjacent DNA fragments in a DNA clone collection.
· Automating the various steps in DNA mapping, including those
of DNA purification and hybridization analysis, and the development
of novel methods that allow simultaneous handling of many DNA
samples.
GENOME SEQUENCING
The nucleotide sequence of the genome is the physical map at the
highest level of resolution. It provides the information that constitutes
the genetic complement of an individual. For the human, a total of
about 3 billion (3 × 10⁹) nucleotides must be ordered; simply to print
out such a DNA sequence would require nearly a million pages in a
book like this. To obtain this critical resource in a timely fashion, a
special effort must be mounted, but, because of the high cost and
slow rate of DNA sequencing with current technology, sequencing of
the entire genome should not be initiated at present. Instead, the
committee believes that two general types of effort should be en-
couraged to increase the efficiency of DNA sequencing.
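The figures in this section can be checked with simple arithmetic. The sketch below assumes one printed letter per nucleotide and about 3,300 letters per page (an illustrative assumption, not a figure from the report); it also derives the size of the largest continuous sequence implied by the statement that a 1-million-nucleotide pilot would be 5 to 10 times larger.

```python
GENOME_NT = 3_000_000_000   # about 3 billion nucleotides (from the text)
LETTERS_PER_PAGE = 3_300    # assumption: one letter per nucleotide, dense print
PILOT_NT = 1_000_000        # pilot-project target (from the text)


def pages_needed(nucleotides: int, letters_per_page: int) -> int:
    """Pages needed to print a sequence; a partly filled last page counts."""
    return -(-nucleotides // letters_per_page)  # ceiling division


pages = pages_needed(GENOME_NT, LETTERS_PER_PAGE)

# "5 to 10 times as large as the largest continuous regions sequenced to date"
implied_largest = [PILOT_NT // factor for factor in (10, 5)]

print(pages, implied_largest)
```

Under these assumptions the genome fills roughly 909,000 pages, consistent with "nearly a million," and the largest regions sequenced at the time would be on the order of 100,000 to 200,000 nucleotides.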
First, pilot projects should be conducted with a goal of sequencing
approximately 1 million continuous nucleotides (which is 5 to 10 times
as large as the largest continuous regions that have been sequenced
to date). Such projects will provide an opportunity to implement and
test improvements of existing technology as they occur and will also
provide a practical impetus for technological developments. They will
also reveal where the most serious problems in data analysis are likely
to arise in still larger projects. For example, will repetitive sequences
or cloning artifacts complicate the assembly of a unique, contiguous
sequence? How will new genes be identified correctly? Only by
attempting relatively large-scale nucleotide sequencing will we gain
insight into these issues.
Second, improvements in existing sequencing technology and the
development of entirely new technologies should be vigorously en-
couraged. This would include applications of automation and robotics
at all steps in sequencing. It is useful to think in terms of trying to
achieve 5- to 10-fold incremental improvements in the scale and speed
of DNA sequencing.
To derive the major benefits from a human genome sequence, it
will be necessary to have an extensive data base of DNA sequences
from the mouse (whose genome is the same size as that of the human)
and from simpler organisms with much smaller genomes, such as
bacteria, yeast, Drosophila melanogaster (a fruit fly), and
Caenorhabditis elegans (a nematode worm). This information would allow
the counterparts of important human genes to be readily identified in
organisms in which their functional roles are generally easier to study.
In addition, many genes will initially be found to be important in these
other organisms and will lead to corollary human studies. Comparative
sequence analysis with an organism such as the mouse is a powerful
technique for distinguishing those elements of a nucleotide sequence
that are important (and therefore conserved during evolution) from
those that are not. To succeed, therefore, this project must not be
restricted to the human genome; rather, it must include an extensive
sequence analysis of the genomes of selected other species.
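Comparative analysis of this kind often starts by looking for stretches where two species' sequences agree more than chance would predict. The sketch below is one simple way to do that, not the committee's method: it assumes the human and mouse sequences have already been aligned to equal length without gaps, and the window size and identity threshold are hypothetical parameters chosen for illustration.

```python
def conserved_windows(human: str, mouse: str, window: int,
                      threshold: float) -> list[tuple[int, float]]:
    """Return (start, identity) for non-overlapping windows at or above threshold.

    Assumes the two sequences are pre-aligned, equal in length, and gap-free.
    """
    if len(human) != len(mouse):
        raise ValueError("sequences must be pre-aligned to equal length")
    if window <= 0:
        raise ValueError("window must be positive")
    hits = []
    for start in range(0, len(human) - window + 1, window):
        a = human[start:start + window]
        b = mouse[start:start + window]
        # Fraction of positions where the two species carry the same base.
        identity = sum(x == y for x, y in zip(a, b)) / window
        if identity >= threshold:
            hits.append((start, identity))
    return hits


# Hypothetical aligned fragments: the first window is conserved, the second is not.
print(conserved_windows("AAAAACCCCC", "AAAAAGGGGG", window=5, threshold=0.8))
```

Windows that stay highly identical across roughly 100 million years of divergence are candidates for functionally important elements; real analyses must also handle insertions, deletions, and background mutation rates.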
A mechanism of quality control is needed for the groups that are
contributing large amounts of sequence information. For example, a
unit could be established to redetermine a small fraction of the
sequence submitted by each sequencing unit, thereby providing an
independent check on the accuracy of the sequences being obtained
by the unit.
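The spot-check idea above can be sketched in a few lines. This is an illustration under stated assumptions, not a protocol from the report: the recheck fraction, the window length, and the function names are hypothetical, and the comparison assumes the independent resequencing covers exactly the same positions as the submitted sequence.

```python
import random


def pick_recheck_windows(seq_len: int, fraction: float, window: int,
                         seed: int = 0) -> list[int]:
    """Choose random, non-overlapping window starts covering ~fraction of a sequence."""
    starts = list(range(0, seq_len - window + 1, window))
    n = max(1, round(fraction * seq_len / window))
    if n > len(starts):
        raise ValueError("fraction too large for this window size")
    rng = random.Random(seed)  # seeded so the selection is reproducible
    return sorted(rng.sample(starts, n))


def discordance_rate(submitted: str, resequenced: str) -> float:
    """Fraction of positions where the independent check disagrees."""
    if len(submitted) != len(resequenced) or not submitted:
        raise ValueError("need two non-empty sequences of equal length")
    mismatches = sum(a != b for a, b in zip(submitted, resequenced))
    return mismatches / len(submitted)


# Hypothetical: recheck 1% of a 1-million-nucleotide submission in 1,000-base windows.
windows = pick_recheck_windows(1_000_000, 0.01, 1_000)
print(len(windows), discordance_rate("ACGTACGTAC", "ACGTACGTTC"))
```

Sampling windows at random, rather than letting each unit choose which regions are rechecked, keeps the estimate of a unit's error rate independent of the unit itself.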
INFORMATION AND MATERIALS HANDLING
Considerable data will be generated from the mapping and sequenc-
ing project. Unless this information is effectively collected, stored,
analyzed, and provided in an accessible form to the general research
community worldwide, it will be of little value. This project will also
require an unprecedented sharing of materials among the laboratories
involved. Because all sequences and materials generated by these
publicly funded projects should, and indeed must, be made freely
available, two different types of centralized facilities will be needed:
(1) information centers to collect and distribute mapping and sequenc-
ing data and (2) centers to collect and distribute materials such as
DNA clones and human cell lines.
For an information center to cope effectively with the vast amount
of DNA sequence data, all such data must be provided to the center
in electronic or magnetic form. The information center must also be
effectively linked by a computer network to all the users of the data.
An initial analysis of these data should be carried out by the central
facility to help in classifying the data for future research accessibility.
Both at the information center and in other laboratories, extensive
research in methods of sequence data analysis should be encouraged.
A facility for collecting and distributing materials should be orga-
nized to handle the cloned DNA fragments generated and mapped in
the many different laboratories involved. This facility would store the
appropriate DNA clones, index them according to some agreed-on
plan, and then redistribute them to all laboratories that request them.
The facility might also be involved in the routine conversion of large
human DNA fragments, cloned as artificial chromosomes, into more
readily accessible bacterial virus or cosmid DNA clone collections.
It may also need to fingerprint all the DNA clones by a single method
to provide a standard indexing procedure.
IMPLEMENTATION STRATEGIES
Much of the concern that has been expressed about a project to
map and sequence the human genome stems from its high projected
cost and the potential changes that may result in the infrastructure of
the current biological research community. The committee examined
the cost of the project and concluded that an annual budget of $200
million over the next 15 years would not be excessive when compared
with the value of the results that would be produced. The expenditure
of $200 million per year on the project would represent roughly 3
percent of the total annual U.S. government expenditure on biological
research. It would thus leave the crucial task of functional studies to
traditionally supported biological research.
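The budget figures in this paragraph imply two totals that the text does not state directly. The short calculation below derives them, assuming undiscounted dollars and treating "roughly 3 percent" as exactly 3 percent, so the implied size of federal biological research spending is only approximate.

```python
ANNUAL_BUDGET = 200_000_000   # dollars per year (from the text)
YEARS = 15                    # project duration (from the text)
SHARE_OF_BIO_RESEARCH = 0.03  # "roughly 3 percent" (from the text)

# Undiscounted total over the whole project.
total_cost = ANNUAL_BUDGET * YEARS

# Annual federal spending on biological research implied by the 3% ratio.
implied_bio_research = ANNUAL_BUDGET / SHARE_OF_BIO_RESEARCH

print(total_cost, round(implied_bio_research / 1e9, 2))
```

On these assumptions the project would cost about $3 billion in total, against implied annual federal spending on biological research of roughly $6.7 billion.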
All decisions for funding should be based on a peer review by those
expert in the methods involved. This does not mean that funding
would be allocated only to individual investigators, inasmuch as
multidisciplinary research centers of modest size, as well as an
information center and material handling unit, will be required. Some
groups may be more appropriately funded by contracts than by grants.
However, the committee believes that these contracts should be
awarded only after an open, peer-reviewed competition.
Genome mapping, both genetic linkage and physical, is already
under way and should be intensified, although a major portion of the
initial monies should be devoted to improving technologies. Large-
scale sequencing should be deferred until technical improvements
make this effort appropriate. This recommendation is based on the
realization that the human genome is orders of magnitude larger than
the genome of any other organism that has yet been mapped or
sequenced. To cope with this vastly greater size, it seems advisable
to establish a special competitive program that focuses on improving
in 5- to 10-fold increments the scale or efficiency of mapping,
sequencing, analyzing, or interpreting the biological information in
the human genome.
The actual mapping of the human genome should begin now. In
contrast, while a variety of pilot projects should be encouraged, only
after the technology is developed and an adequate quality control
procedure is established should a large-scale sequencing effort begin
on the human genome.
A human genome project of this type need not threaten the existing
biological research community for several reasons. First, the money
ought not be provided at the expense of currently funded biological
research. Second, it ought to be distributed by peer review. Third,
by including selected other organisms required for the interpretation
of the human genome map and sequence, the project should not
mislead the public into placing a false emphasis on the uniqueness of
human materials for understanding ourselves. Fourth, this project
ought to include work by both small research laboratories and larger
multidisciplinary centers formed by juxtaposing several small research
groups having different expertise. Since individual investigators work-
ing in small groups have been the source of nearly all the major
methodological breakthroughs that have driven the modern revolution
in biology, the proposed organization ensures that our extraordinarily
successful pattern of doing biology will be preserved.
In multidisciplinary centers, 3 to 10 research groups, each with an
outstanding independent scientific director and a different but related
focus, are envisioned as sharing equipment and personnel in core
facilities and collaborating to accomplish a larger goal than any single
group could readily achieve on its own. These centers could efficiently
coordinate the large number of different experimental and computer
capabilities needed for the development of techniques as well as work
out optimal strategies that produce actual mapping and sequencing
data.
The committee does not believe that one or a few large production
centers for mapping or sequencing should be established at this time.
Strong technical and intellectual advantages are obtained by distrib-
uting mapping and sequencing work among smaller multidisciplinary
centers and individual research laboratories. One major advantage is
that the resulting competition will stimulate research. Another is that
it allows the most successful units to be identified so that the available
resources can be directed to them. Moreover, the dispersal of the
groups will allow close interactions to be established with a large
number of other biological scientists. These interactions will be
essential both for the intellectual contributions derived from other
scientists and for enabling the new techniques developed in this project
to be applied quickly and efficiently to a wide variety of important
biological problems.
MANAGEMENT STRATEGY
For the human genome project to be of maximum value, the
committee believes that it needs to be well organized and coordinated.
For this to be effectively done, a majority of the committee members
feels that the project should be sited within one of three federal
agencies: the National Institutes of Health, the Department of Energy,
or the National Science Foundation. This lead agency would receive
a specific appropriation for the project and be responsible for the
disbursement of funds through a peer-review process. It would be
responsible for the operation of the stock center and the data center
and for the coordination of the efforts of the many laboratories involved
in the effort, and it would serve as an information clearinghouse. It would also
handle the many other administrative details that will arise in a project
of this magnitude.
Although the lead agency would have the ultimate responsibility
for funding and policy decisions, it should draw on the advice and
expertise of a Scientific Advisory Board (SAB). The SAB would be
made up predominantly of scientists with expertise in the methods
and goals of the project. The major responsibilities of the SAB would
include:
· To facilitate coordination of the efforts of the many laboratories
that are expected to participate in this effort.
· To help assure the accessibility of all information and materials
generated in the project by advising on the oversight of the data center
and the stock center and recommending contracts where appropriate.
It would oversee formation of standard terminologies and reporting
formats so that the large body of information to be obtained can be
readily communicated and analyzed by the entire scientific community.
· To monitor the quality of research by helping to assure a uniform
standard of peer review.
· To suggest mechanisms for strict quality controls on the sequence
and mapping data collected.
· To promote international cooperation, serving as a liaison to
projects outside the United States regardless of their funding sources.
· To make recommendations concerning the establishment of large
sequencing endeavors, thereby balancing focus with breadth.
· To publish periodic reports stating progress, problems, and
recommendations for research.
The committee strongly believes that a project to map and sequence
the human genome should be undertaken. It is aware of the ethical,
social, and legal implications of such an effort, but feels that they can
be adequately addressed. This project would greatly increase our
understanding of human biology and allow rapid progress to occur in
the diagnosis and ultimate control of many human diseases. As
visualized, it would also lead to the development of a wide range of
new DNA technologies and produce the maps and sequences of the
genomes of a number of experimentally accessible organisms, provid-
ing central information that will be important for increasing our
understanding of all biology.