Review of the Department of Energy’s Genomics: GTL Program

1
Role of Genomics in Advancing Science

INTRODUCTION

The Genomics: GTL program of the U.S. Department of Energy (DOE) is a fundamental research program to achieve a predictive understanding of microbial systems through systems biology. The goal is to build models of organisms and communities to predict their behavior under different environmental conditions on the basis of their genomes. The program has been funding microbial genomics projects relevant to DOE mission goals since 2002. DOE plans to expand the program and build infrastructure for it. On the basis of the Energy Basic and Applied Sciences Act of 2005, DOE asked the National Research Council to convene an ad hoc committee to review the plans for the Genomics: GTL program, specifically the facilities plans.

Charge to the Committee

The committee was asked to address the following questions:

  1. Is the Genomics: GTL program, as currently designed, scientifically and technically well tailored to the challenges faced by the DOE in energy technology and development and environmental remediation?

  2. Does the proposed Genomics: GTL research and facility investment strategy leverage DOE scientific and technical expertise in the most cost-effective, efficient, and scientifically optimal manner? Specifically, does the business model (i.e., number, scope, scale, order, and user operation plan) for the proposed Genomics: GTL facilities follow directly from the science case (should one exist) for systems biology at DOE? Are there alternate models for some of the proposed effort that could more efficiently deliver the same scientific output?

  3. In an era of flat or declining budgets, which aspects of the proposed Genomics: GTL program are the most meritorious? Which appear to have the highest ratio of scientific benefit to cost?

This report was prepared by the committee in response to that charge. To provide background information, the committee gives a brief introduction to genomics and the scientific advances that genomics has brought and describes DOE’s role in genomics research and its Genomics: GTL program in Chapter 1. In Chapter 2, the committee examines the role that the Genomics: GTL program could play in achieving DOE’s mission goals. The committee reviews the design of the program and its infrastructure plan in the last chapter.

SCIENTIFIC ADVANCES BROUGHT BY GENOMICS

Genomics is the study of the structure, content, and evolution of genomes and the analysis of the expression and function of genes and proteins at the level of the whole cell or organism (Gibson and Muse, 2002). Genomics has many subfields, including functional genomics, structural genomics, proteomics, and metagenomics, and it makes use of bioinformatics and other computational tools to study the global properties of genomes. Such genomic tools as high-throughput DNA sequencing, microarrays, and the polymerase chain reaction have revolutionized biomedical science.

The first full genome sequence of a free-living organism, Haemophilus influenzae, was determined 10 years ago (Fleischmann et al., 1995). The process was expensive and took years to accomplish, but completion of the sequence established several important principles. It showed that the so-called shotgun assembly technique was workable and effective in sequencing whole genomes.
And it became clear that our understanding of the genetic information in a microorganism was much less than expected, a lesson still true 10 years later, when as much as 30 percent of the open reading frames of new microbial genomes are found to have unknown function.

Genome sequencing was quickly applied to microorganisms with larger and more complex genomes, including the yeasts Saccharomyces cerevisiae and Schizosaccharomyces pombe, and then to a series of model organisms, including the nematode, fruit fly, mustard, and mouse. With each new organism came a greater understanding of the organization and function of genomes and the identification of new genes and metabolic pathways. With the completion of the draft human genome sequence in 2003, the basis for rapidly understanding much of the genome information through comparative genomics was in place.

The sequencing of the human genome has provided detailed genetic information about specific genes and pathways in humans and has opened vast possibilities for new therapies. For example, understanding of genetic changes associated with colon cancer has provided a specific basis for new cancer therapies and has been used to guide development of new drugs to treat resistant cases (Mount and Pandey, 2005), and cancer cells that are resistant to treatment can be classified on the basis of a specific gene sequence.

Continuing work on the genomics of microbial species is also contributing to the improvement of human health. Scientists at Chiron Corporation, for example, used information from the sequencing of the bacterium Neisseria meningitidis group B as the basis of a vaccine against this microorganism (Pizza et al., 2000). And current efforts to develop a vaccine for malaria, supported by the Bill and Melinda Gates Foundation, are based on interpreting genetic information on the malarial parasite (Gates Foundation, 2005).

As experience with sequencing has grown, its cost has fallen from $10 per base pair in 1990 (DOE, 2000), when it would have cost more than $30 billion to sequence the 3 billion base pairs of the human genome, to $0.001 per base pair in 2005, when the same sequence could be obtained at 1x coverage for about $3 million. The decrease in cost is roughly a straight line on a logarithmic scale and suggests a sequencing version of Moore’s law of computing power. In this analogy, just as the complexity of an integrated circuit doubles about every 18 months, the cost of sequencing a base pair of DNA decreases by a factor of 10 roughly every 4 years. If that rate is sustained, sequencing the genome of an individual human for less than $1,000 may be possible within the next 15 years.

The time required to obtain a gene sequence is also falling rapidly. In 1989, Andre Goffeau set up a consortium to sequence the 12.5-million-base-pair genome of the budding yeast Saccharomyces cerevisiae. The successful effort involved 74 laboratories and took 7 years (Goffeau et al., 1996). Today, only 10 years later, the complete genome of a new strain of Saccharomyces can be sequenced by a single facility in less than a week, and smaller bacterial genomes can be sequenced in less than a day. In fact, the U.S. Department of Energy (DOE) Joint Genome Institute (JGI) is sequencing at a rate of more than 3 billion base pairs of DNA each month, the equivalent of 1x coverage of the human genome.

Other technologies are also revolutionizing genomic research. Microarray technology (also known as gene chips) allows the transcription level of most of the genes in an organism to be examined in a single experiment. A gene-chip experiment on budding yeast identified a previously uncharacterized gene, YDR533c, as being upregulated when the microorganism went into a quiescent state because of an accumulation of misfolded proteins (Trotter et al., 2002). The human homolog of that gene, DJ-1, was immediately identified in the human genome and was later shown to be mutated in an autosomal recessive form of early-onset Parkinson disease (Bonifati et al., 2003). (Parkinson disease is a protein-misfolding disorder that affects neurons, which are quiescent cells in the human body.)
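
The sequencing-cost figures quoted above can be checked with a few lines of arithmetic. The Python sketch below is only an illustration: it assumes a constant exponential decline between the two quoted data points ($10 per base pair in 1990 and $0.001 per base pair in 2005), and the function names are hypothetical rather than part of any DOE or JGI tool.

```python
import math

# Figures quoted in the text (DOE, 2000, and the 2005 estimate).
COST_1990 = 10.0     # dollars per base pair in 1990
COST_2005 = 0.001    # dollars per base pair in 2005
GENOME_BP = 3e9      # base pairs in the human genome (1x coverage)


def years_per_tenfold_drop(cost_start, cost_end, years):
    """Years needed for a 10x cost drop, assuming a constant exponential decline."""
    return years / math.log10(cost_start / cost_end)


def cost_per_bp(year, base_year=2005, base_cost=COST_2005, tenfold_years=4.0):
    """Projected cost per base pair if cost keeps falling 10x every `tenfold_years`."""
    return base_cost * 10 ** (-(year - base_year) / tenfold_years)


def years_until_genome_costs(target_dollars, tenfold_years=4.0):
    """Years after 2005 until a 1x human genome costs `target_dollars` (assumed trend)."""
    cost_2005_genome = COST_2005 * GENOME_BP
    return tenfold_years * math.log10(cost_2005_genome / target_dollars)


if __name__ == "__main__":
    print("1990 genome cost ($):", COST_1990 * GENOME_BP)
    print("2005 genome cost ($):", COST_2005 * GENOME_BP)
    print("Years per 10x drop implied by the data:",
          round(years_per_tenfold_drop(COST_1990, COST_2005, 2005 - 1990), 2))
    print("Years after 2005 to a $1,000 genome:",
          round(years_until_genome_costs(1000.0), 1))
```

Under these assumptions, the two data points imply a tenfold drop about every 3.75 years, and the 4-year rate quoted in the text puts a $1,000 genome at 1x coverage roughly 14 years after 2005, which is consistent with the 15-year estimate.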

OCR for page 9
The development of vast amounts of data about genomes and genetic potential defined a new approach in biomedical science, discovery science, in contrast with the traditional hypothesis-driven approach. Discovery science aims to develop data resources with no specific vision as to the scientific questions to be approached. The idea is that vast data stores, when properly collected, annotated, and stored in accessible databases, are available for intense data mining by members of the scientific community who have specific hypothesis-driven questions. The various genome projects are considered discovery science, and this has proved to be a powerful scientific tool. Recently, the same approach has been extended to other “-omics” projects, the most notable being the proteomics projects that aim to define the entire protein library of a genome, including protein-protein interactions and posttranslational protein modifications. Likewise, the definition of all the metabolic pathways of a cell and their regulation (metabolomics) has begun to be an active research approach.

The collection of massive data stores in -omics projects is one step in a complex “systems biology” approach to science. But although genome sequencing has proved to be a highly effective tool for gaining biological understanding, the other -omics tools have been less immediately productive thus far, because of the biological complexity of cells. Therefore, the complexity of biological systems beyond the information content of DNA (for example, proteins, metabolites, and molecular interactions, many of which are manifest only under specific developmental or environmental conditions) is not well understood.
To quote David Galas in a commentary in Science (Galas, 2001):

  As simple as it sounds, to know that there are no other unknown genetic components that can provide alternative explanations of experimental results is a fundamental shift of perspective. This shift is beginning to transform our approach to science, enabling researchers to face the challenge of identifying all the molecular components of the cell, as well as understanding how they are controlled, interact, and function. From a picture of the “software” of the single cell, we can look to the future when researchers will begin building, with as fine a degree of resolution, an integrated view of the universe of cell-cell interactions, differentiation, and development from single cell to organism. The availability of complete sequences of Drosophila melanogaster, Caenorhabditis elegans, and Arabidopsis thaliana is already beginning to revolutionize such studies, and this list may soon include significant sequences from other biological models of metazoan development.

THE DEPARTMENT OF ENERGY AS A PIONEER IN GENOMICS RESEARCH

The U.S. federal system of support for science contains no central department or ministry for science. Mission-oriented research and development (R&D) programs in defense, health, energy, environment, space and aeronautics, oceans and atmosphere, agriculture, transportation, and other fields are, instead, supported by a diverse array of agencies and departments. This pluralistic system of support is regarded as a great strength of the U.S. system and as something to be maintained and safeguarded (NRC, 1995). Under this system, allocation of funding for science is handled mainly by agencies that understand the purpose and content of R&D programs and the value of their results.

DOE is charged with promoting scientific and technological innovation in support of its overarching mission to advance the national, economic, and energy security of the United States (DOE, 2005a). As noted by Martha Krebs, former director of the Office of Energy Research (DOE and NRC, 1998), “DOE is a science agency and … our science enables us to meet the energy challenges ahead. All too often, DOE is the forgotten science agency, despite its ranking among the top federal supporters of basic, applied, academic, and overall research.” Many observers (for example, Kenneth I. Shine in DOE and NRC, 1998) have remarked that while the 20th century was the century of physics and astronomy, the 21st century will be the century of biology in all its ramifications.

DOE’s contributions to the life sciences began with health physics and radiation biology but expanded into many other fields of health and environmental research relevant to its missions. Today, DOE’s participation in the pluralistic system of federal research funding means that some non-health-related life-science fields that are unfunded or underfunded by other agencies have become central and essential to DOE’s science portfolio; an example is research in many fields of environmental biology, as typified by the Genomics: GTL program.

DOE has played a critical role in the development of genomics research. Under the leadership of Charles DeLisi, it initiated discussion of the Human Genome Project (HGP) in 1986. Scientists at the DOE national laboratories recognized that their long-term studies of radiation-induced mutation could be fully understood only in the context of the genetic variation that existed normally in the world’s human populations. Therefore, DOE provided $5.3 million to initiate the HGP at its national laboratories. The National Institutes of Health joined DOE in the HGP in 1988 because it recognized that genomic tools could be important in understanding human genetic disorders.

DOE, through efforts at Los Alamos National Laboratory (LANL), had been engaged in early DNA sequence analysis. The GenBank DNA sequence database, now operated by the National Center for Biotechnology Information at the National Library of Medicine, began as a project of Walter Goad at LANL. Many of the important tools for sequence analysis (for example, the Smith-Waterman analysis algorithm) were also developed as projects at LANL. Because of the interdisciplinary culture of the national laboratories, pioneering projects of this type were able to flourish.

OCR for page 9
Applications of Genomics at the Department of Energy

In addition to the HGP, DOE invested in other programs and facilities for genomics. In 1994, DOE began its microbial genome program. In 1996, it established JGI in Walnut Creek, California, to integrate work based at the three major DOE human genome centers. After completion of the HGP, JGI refocused its mission to align with three of DOE’s primary missions: energy production, carbon management, and bioremediation. JGI’s massive sequencing capabilities have served the DOE microbial genome program by sequencing the entire genomes of many microorganisms. In addition, JGI began the Community Sequencing Program, which solicits genome sequencing proposals for organisms that are relevant to DOE missions and other organisms important to other community dynamics. In 2005, 23 projects executed by JGI will have produced complete draft sequences of genomes of diverse organisms, including plants, insects, and fishes. JGI can be characterized as a production facility that serves a broad community of scientists by providing sequence information on diverse organisms, and it has become one of the largest such facilities in the world. Development of new technology is part of the mission of JGI, and it has resulted in remarkable reductions in the time needed to obtain sequence information.

More than 50 years of nuclear-weapons research and production in the United States at DOE sites have resulted in radionuclide, metal, and organic-chemical contamination that is difficult and expensive to remove with physical decontamination methods. Microorganisms offer a biological alternative for cleaning up DOE wastes. DOE’s Natural and Accelerated Bioremediation Research (NABIR) program, established in 1995, funds research aimed at providing solutions to bioremediation of contaminants in the subsurface at DOE sites. However, not all NABIR projects depend on genomics; they also involve molecular biology, microbial physiology, geochemistry, microbial ecology, and mathematical modeling. Research supported by other DOE programs on microbial systems has resulted in sequencing of microorganisms that are important in decontamination, such as geobacters, Shewanella oneidensis, and Desulfovibrio vulgaris (Heidelberg et al., 2002; Methé et al., 2003; Heidelberg et al., 2004). A number of projects use genome-based information on those important microorganisms to elucidate metabolic pathways and their interactions with other members of their ecological community. DOE is also participating in an interagency program in phytoremediation research that supports basic science; much of this work focuses on understanding molecular mechanisms of remediation of metals or organic materials by plants.

Burning fossil fuels has increased the concentration of atmospheric carbon dioxide (CO2), a heat-trapping greenhouse gas, from the preindustrial 280 ppm to about 375 ppm today (EEA, 2004). Projections are that concentrations will more than double over the next 50 years unless emissions are reduced (IPCC, 2001). Because marine and terrestrial ecosystems play major roles in global carbon cycling, knowledge of the key feedbacks and sensitivities of those systems is necessary to devise carbon sequestration strategies and alternative response strategies. A current example of DOE carbon-cycle management research is the work of a team of researchers at the Oak Ridge, Pacific Northwest, Argonne, and Sandia National Laboratories and the University of North Carolina at Chapel Hill. The team is investigating cellular function in Rhodopseudomonas palustris, a metabolically versatile bacterium that converts CO2 into cell material, converts nitrogen into NH3, and produces hydrogen. In parallel, a team of researchers at Harvard, the Massachusetts Institute of Technology, Brigham and Women’s Hospital (in Boston, Mass.), and Massachusetts General Hospital is studying proteins, protein-protein interactions, and gene regulatory networks of Prochlorococcus marinus, a marine cyanobacterium that is important in global photosynthesis. The group is taking a systems approach to understanding the metabolic activity of this microorganism under various environmental conditions.

Charged with securing the nation’s energy supply, DOE’s Office of Energy Efficiency and Renewable Energy (EERE) has a Biomass Program and a Hydrogen, Fuel Cells, and Infrastructure Technologies Program, both of which substantially involve the National Renewable Energy Laboratory. The Biomass Program aims to develop advanced technologies that transform biomass into biofuels, biopower, and high-value bioproducts (DOE-EERE, 2005a). The hydrogen program supports research on and development of low-cost, highly efficient technologies to produce hydrogen from diverse domestic sources (DOE-EERE, 2005b). Both programs fund research on genomics, but their primary focus is on applied science, so they could benefit from complementary fundamental research aimed at elucidating biological mechanisms.

Current and planned DOE research programs strive to strike a balance between discovery science, exemplified by genomics, and hypothesis-driven science, often identified with single-investigator projects. The benefits of the hybrid approach in subjects related to the DOE mission are apparent in the development of metagenomics. Microbial metagenomics involves the analysis of DNA obtained en masse from environmental samples (Handelsman, 2005a). In a sense, it is “reverse genomics” in that the structure or function of individual genomes or genes is deduced from complex mixtures of microbial consortia rather than with the classical purify-first, characterize-second approach. Metagenomics can be divided into two general categories: (1) shotgun sequencing and assembly of environmental DNA (Tringe and Rubin, 2005), typically resulting in fragmentary genome assemblies of the most abundant organisms, and (2) functional analysis of cloned DNA fragments to determine biochemical properties of interest in heterologous systems (for example, Daniels, 2005). Using metagenomics methods, scientists can study the multitude of species in an environmental system without having to culture the organisms under study. Metagenomics constitutes a huge advance over culture-dependent methods because it allows a glimpse into the nature of organisms that are inaccessible by more traditional methods.

OCR for page 9
Metagenomic analysis has yielded new insights into genetic diversity in a number of environments, notably the world’s oceans, estuaries, and soil communities (Tringe et al., 2005; Venter et al., 2004).

Using Systems Biology to Find Solutions for Carbon Sequestration, Environmental Remediation, and Energy Security

Although scientists often gain insight into microorganisms or microbial processes one at a time, such studies, even when pieced together, do not provide a global picture of how a biological system works. The lack of knowledge of how microbial systems work hinders our ability to harness microbial processes for bioremediation, carbon sequestration, and bioenergy production (Box 1-1).

Systems biology has been defined by Ideker et al. (2001) as an approach to studying “biological systems by systematically perturbing them (biologically, genetically, or chemically); monitoring the gene, protein, and informational pathway responses; integrating these data; and ultimately, formulating mathematical models that describe the structure of the system and its responses to individual perturbations.” Systems biology uses comparative, high-throughput assays and mathematical or computational models to generate a picture of systemwide activities. That approach can be applied to studying systems at the subcellular level (multiprotein metabolic processes), the cellular level (integration of various functions within a cell), and the community level (interactions within multispecies communities). Systems biology focuses on the challenge of understanding at high resolution the interlocking metabolic and molecular context for physiological activity and responses to environmental conditions. Systems biology will realize its full potential only when the properties of individual components are tied to variations at the system level.

BOX 1-1
Cost and Benefit of Understanding the Systems Biology of an Organism in Bioengineering

  Obtaining an understanding of the systems biology of an organism or community of organisms may seem complex, but the cost of ignorance can be enormous. DuPont, in collaboration with Genencor International, recently succeeded in engineering the common bacterium Escherichia coli to produce 1,3-propanediol (PDO), a chemical building block for the new fabric Sorona (also called 3GT), which is softer and more stretchable than polyester. Chemical and biological approaches to make PDO were already known when the project began, but they were not well suited for industrial-scale production, because they were energy-intensive and required expensive starting materials. Thus, there was a need to develop a new process that would use one microorganism with the ability to convert an inexpensive basic carbon source into the desired PDO product. Such a microorganism did not exist, so one was created by inserting genes that code for enzymes that catalyze the missing chemical steps into an easily grown bacterium.

  The metabolic-pathway engineering could have involved, in theory, the insertion of only four foreign genes, from the bacterium Klebsiella pneumoniae and the yeast Saccharomyces cerevisiae, into E. coli to enable it to make PDO from glucose. However, because scientists did not have a systems biology understanding of how E. coli would respond to the introduction of the new enzyme activities into its metabolic systems, achieving efficient “green” production of PDO actually required modification of more than 70 different genes. Most of the modified genes were from the host organism and were needed to fine-tune critical pathways, eliminate undesired enzymes, and carefully deregulate ancillary metabolic systems in E. coli (Sanford, 2004). The entire process took a team of 40 people more than 7 years.

The recent emergence of synthetic biology (see Box 1-2) also provides a new and powerful approach to understanding biological systems. Synthetic biology combines knowledge from various disciplines, including molecular biology, mathematics, engineering, and physics, to develop new cellular components that are based on fundamental design concepts and that will lead to new cellular behaviors. The emerging field of synthetic biology will provide fundamental insights into cellular systems, improve our understanding of natural phenomena, and promote the development of a new engineering discipline focusing on the design and development of complex cell behaviors with predictable and reliable properties.

Using the two complementary approaches to study microorganisms and microbial communities to understand their structure and function, predict their behavior accurately, and manipulate them for desired functions is the key theme of DOE’s Genomics: GTL program.
The program seeks to combine discovery science with hypothesis-driven research so that an investigator with a well-formulated research question can mobilize the resources of a high-throughput facility to obtain large amounts of data on genes, gene regulation, gene products, and protein-protein interactions. GENOMICS: GTL PROGRAM The Genomics: GTL program was conceptualized in 2000 after Martha Krebs, director of DOE’s Office of Science (formerly Office of Energy Research), charged DOE’s Biological and Environmental Research Advisory Committee (BERAC) to define the agency’s potential scientific roles after the HGP was completed. In response to its charge, BERAC prepared the report Bringing Genomes to Life (BERAC, 2000), which formed the basis of the first roadmap, “Genomes to Life,” prepared by the Human Genome Management Information System at the Oak Ridge National Laboratory (ORNL) in April 2001 (Table 1-1). That first roadmap argued that the availability of genomic sequences of entire organisms would enable us to gain “a new, comprehensive, and profound under-

OCR for page 9
Review of the Department of Energy’s Genomics: GTL Program BOX 1-2 Synthetic Biology Synthetic biology has been defined by some researchers as “the design and fabrication of biological components and systems that do not already exist in the natural world, and the re-design and fabrication of existing biological systems for useful purposes” (MIT Synthetic Biology Working Group, 2005). Researchers in the synthetic biology community believe that it is time to create a scientific and technical infrastructure that supports the design and synthesis of biological systems and are working to “(a) specify and populate a set of standard biological parts that have well-defined performance characteristics and can be used (and re-used) to build biological systems, (b) develop and incorporate design methods and tools into an integrated engineering environment, (c) reverse engineer and re-design pre-existing biological parts and devices in order to expand the set of functions that we can access and program, and (d) reverse engineer and re-design a ‘simple’ natural bacterium” (MIT Synthetic Biology Working Group, 2005). Researchers are exploring a broad range of applications of synthetic biology to manipulate information, fabricate materials, process chemicals, and produce energy, including: Inexpensive biosynthesis of artemisinin, the most effective anti malaria drug. The design of microorganisms that can efficiently convert sunlight into other forms of energy. The engineering of microorganisms that can move toward contaminants and remediate heavy metals, actinides, and nerve agents. Embedding of the equivalent of digital circuits in bacteria and programming of communities of bacteria to perform specific tasks, such as sensing and communications. 
Synthetic biology is already attracting undergraduate researchers, many of whom have participated in iGEM (intercollegiate Genetically Engineered Machine) competitions, an initiative of the Massachusetts Institute of Technology’s iCampus funded by Microsoft Research. Teams of students have developed reusable “parts” for chemical control of bacterial chemotaxis and two-way cell-cell communication using DNA, which could establish the foundation for a bacterial network akin to the Internet. Although the leading researchers in synthetic biology are in the United States, the European Union has moved aggressively to support synthetic biology as an emerging discipline. Japan is also beginning to fund synthetic biology research. The United States needs a more aggressive strategy for supporting synthetic biology.

OCR for page 9
Review of the Department of Energy’s Genomics: GTL Program TABLE 1-1 Major Events Leading to the Release of the 2005 Genomics: GTL Roadmap Year Event 1999 November 24 Martha Krebs, director of DOE Office of Science, charges BERAC to define the department’s potential scientific roles after the HGP is completed 2000 August BERAC publishes Bringing the Genome to Life in response to Krebs’s 1999 charge October 29-November 1 Genomes to Life roadmapping workshop 2001 January 25-26 Genomes to Life roadmapping workshop June 23 Genomes to Life workshop on role of biotechnology in mitigating greenhouse-gas concentrations August 7-8 First Genomes to Life computational biology workshop September 6-7 Visions for computational biology and systems biology workshop for Genomes to Life program December 10-11 Genomes to Life: Technology assessment for mass-spectrometry workshop 2002 January 22-23 Computing infrastructure and networking workshop for Genomes to Life March 6-7 Computer science for Genomes to Life workshop March 18-19 Mathematics for Genomes to Life workshop April 16-18 Imaging workshop for Genomes to Life program April 16-19 Computing-strategies workshop June 19-20 Genomes to Life systems biology facilities planning workshop I July 23 DOE awards $103 million for post genomics research. August 16-17 Genomes to Life systems biology facilities planning workshop II October 14-15 Genomes to Life systems biology facilities planning workshop III December 3-4 Genomes to Life draft facilities strategy and plan submitted to BERAC by Life Sciences Division of Biological and Environmental Research program 2003 April 1-2 GTL facility for whole-proteome analysis workshop April 23 DOE awards $9 million for energy-related genomics research May 12-14 Bioinformatics in GTL facility for whole-proteome analysis May 29-30 GTL facility for production and characterization of proteins and molecular tags workshop

2003 (continued)
  June 2-4: Facility user interactions workshop
  June 17-18: Characterization and imaging of molecular machines facilities workshop
  July 22-24: Three Genomes to Life workshops: data infrastructure, modeling and simulation, and protein structure and prediction
  September 10-11: GTL and beyond: data-standards workshop

2004
  February: Program name changed from Genomes to Life to Genomics: GTL
  February 29-March 4: Genomics: GTL contractor-grantee workshop II
  March 3-4: Planning study I: Genomics: GTL program science and capability needs for DOE missions
  June 14-16: DOE Genomics: GTL roadmap planning phase 2

2005
  February 6-9: Genomics: GTL contractor-grantee workshop III
  October 3: Genomics: GTL roadmap released

SOURCE: Adapted from http://doegenomestolife.org/program/timeline.shtml.

standing of complex living systems.” High-throughput data and high-performance computing are the two key elements needed to achieve that goal. Large amounts of data would need to be collected to characterize proteins, molecular machines, gene regulatory networks, and entire microbial communities in natural environments at the molecular level. Computational methods and capabilities would need to be developed to integrate the data and to gain a predictive understanding of these complex biological systems. The 2001 roadmap called for program managers to “meet with stakeholders in a series of workshops, scientific society symposia, and other exchanges on scientific topics to guide program development.” In 2002, DOE put out the first request for proposals (RFP) under the Genomes to Life program, now called Genomics: GTL.
The RFP called for applications for “research from large, well integrated, multidisciplinary research teams that support the Genomes to Life research program.” The theme of the program was to develop the experimental and computational capabilities necessary to enable a predictive understanding of the behavior of microorganisms and microbial communities of interest to DOE (Box 1-3). Since its launch, the Genomics: GTL program has funded some 75 research projects and subcontracts, including basic research and outreach programs. It has also funded two infrastructure projects at the national laboratories and facilitated 22 workshops on topics ranging from genomics-enabled geomicrobiology to high-performance computing.

BOX 1-3 Selected Highlights of Genomics: GTL Research to Date

Bioenergy Alternatives

During the next 2 decades, U.S. energy demand is expected to outpace substantially the increase in domestic production of fossil fuels. Concurrent with an increased requirement for energy is a need to reduce dependence on foreign sources of oil and thereby increase energy security. One pillar of DOE’s missions is to explore and facilitate development of renewable, environmentally safe, biological sources of energy. Among the topics to be addressed by Genomics: GTL are biological production of liquid (ethanol, methanol, and biodiesel) and gaseous (hydrogen and methane) fuels. One key is an increased understanding of microbial enzyme consortia that participate in degradation of biological polymers, such as lignin and cellulose, which are major chemical components of plant life. JGI has determined the DNA sequence of a fungal species that has an unusual capacity for degradation of cellulose and lignin biomass. In the genome of that fungus, scientists at the DOE national laboratories discovered genes for the enzymes involved in biomass conversion, making the goal of improving enzymes for biomass conversion to ethanol-based fuels more tangible. Other energy-related plans of the Genomics: GTL bioenergy program include efforts to redirect microbial photosynthesis to generate hydrogen fuel, in a process that uses energy derived from sunlight to convert water into hydrogen and oxygen, and research into the remarkable ability of some soil microorganisms to produce electricity from simple organic compounds.

Bioremediation

DOE is charged with remediating thousands of our nation’s most contaminated landscapes, many of which are the legacy of a diverse network of defense facilities.
The scale of several of those landscapes, some of which exceed 1 million cubic meters of contaminated earth, will require innovative, biologically based remediation strategies. DOE-funded scientists are working to increase knowledge of microbial systems involved in the remediation of toxic metals and radionuclides. Researchers in a project funded through ORNL are meeting the challenge to understand those complex systems by developing computational models that predict the behavior of key regulatory networks involved in bioremediation. In parallel, DOE has funded research on the genetic potential of a microbial species that has a documented capacity for uranium bioremediation and the ability to produce electric energy from organic matter.

Carbon Cycling and Sequestration

Atmospheric greenhouse-gas concentrations have increased steadily over the last 2 centuries; massive quantities of carbon are released into the atmosphere each year because of human activity. The Intergovernmental Panel on Climate Change predicts a doubling of CO2 concentrations by the middle of the 21st century with potentially serious consequences for the quality of our environment. Earth’s marine environments and in particular their microbial inhabitants constitute a potential tool to change the balance of the CO2 equation. A key to realizing that potential is an increased understanding of the planet’s biological carbon cycle, including microbial photosynthesis, a process that uses light energy to convert atmospheric CO2 into the organic molecules that make up life on Earth. Manipulating photosynthetic systems on a grand scale may offer a means to decrease atmospheric CO2. To that end, researchers at Sandia National Laboratories are developing experimental and computational methods to understand the genes and proteins of the photosynthetic marine microorganisms of the genus Synechococcus, which play a key role in Earth’s carbon cycle, and their colleagues at ORNL and Pacific Northwest National Laboratory are working to characterize the multiprotein machines involved in the microbial carbon cycle.

Research on Enabling Technologies

To achieve the long-term goals of Genomics: GTL, it is essential that technical limitations and knowledge gaps be addressed. Much of the research funded by the program aims to lay the foundation for future study by solving key issues in genome-directed science. For example, several research projects aim to develop computational models to understand complex microbial systems, and other researchers are developing data warehouses and computational tools to organize and relate genomic information for bench scientists. Other scientists, distributed among several projects, are working to develop novel methods to image biological systems, including visualization of DNA-protein interactions that regulate an organism’s genetic potential and monitoring of life’s processes on the microscopic scale of single living cells. DOE-funded scientists are devising innovative methods to culture recalcitrant species of microorganisms; such breakthroughs will greatly facilitate the study and manipulation of these species in a laboratory setting.
Although the term genomics typically conjures images of genes and proteins, the ultimate effect of many genes and proteins is to cause changes in the small-molecule complement of a cell, otherwise known as the metabolome. Metabolites can serve a practical role as building blocks of other cellular molecules, or they may have more intriguing roles as signal molecules that orchestrate microbial behavior. In any case, understanding how microbial metabolism influences microbial function is an important goal, and it is the focus of several projects funded by the Genomics: GTL program.

DOE has committed about $240 million from FY 2002 to FY 2006. Of that amount, 60 percent has funded scientists at DOE-operated national laboratories, and 40 percent has funded scientists at academic and private research institutions. The majority of funding awarded to scientists in academic and private laboratories has gone to three institutions (see Appendix C). Taken together, the funded research projects are addressing some of the most pressing issues in microbial genomics. Several have direct application to DOE’s energy-related mission, and others are developing enabling technologies and datasets that are necessary for the advancement of microbial genomics generally. All funded projects are relevant to energy security, environmental remediation, or carbon cycling and sequestration.

In parallel to the Genomics: GTL program, a series of workshops was held to discuss facility needs. Those workshops led to a working paper presented to BERAC in April 2002 that called for the creation of “unique, high-throughput research facilities to translate the new biology, embodied in the Genomes to Life (GTL) program, into reality for the nation.” Those facilities would integrate high-throughput biology and computation and information management and would be resources for the broad scientific community. Later that year, BERAC provided a draft implementation plan for four user facilities for the Genomics: GTL program. The plan was developed in a series of workshops in 2003. All the workshops ultimately resulted in the outline of the four facilities that are described in the 2005 Roadmap for DOE Genomics: GTL. Systems Biology for Energy and Environment (DOE, 2005b). The facilities would be constructed sequentially and complement each other.

Facility for production and characterization of proteins and molecular tags. This facility would produce all proteins encoded in any genome on demand, including molecular tags to identify, locate, and manipulate proteins in living cells. The core facility instrumentation will consist of high-throughput technologies for protein-production screening and robotic systems for affinity-reagent production and characterization. Computational capabilities will allow data capture and management, genomic comparative analysis, and control of high-throughput and robotic systems.
Facility for characterization and imaging of molecular machines. This facility would identify and analyze molecular-machine components from microbial cells, including their structure, function, assembly, and disassembly. Facility instrumentation will include mass spectrometry to characterize molecular machines and imaging capabilities to localize them in cells. Computational capabilities will allow for modeling and simulation of molecular interactions to understand how these complex structures arise.

Facility for whole proteome analysis. This facility will enable the identification of all proteins and other biologically significant molecules (such as lipids, carbohydrates, and enzyme cofactors) that a microbial cell produces under different, but controlled, environmental conditions to identify responses to various environmental influences and to elucidate pathways. The core facility instrumentation would include large numbers of chemostats to grow microbial systems under various environmental conditions and instrumentation to analyze the molecular makeup of microbial cells, such as nuclear magnetic resonance spectrometers and mass spectrometers. Computational capabilities would allow for data analysis and modeling and simulation of microbial systems to inform experiments and predict their outcomes.

Facility for modeling and analysis of cellular systems. This facility will focus on the study of microbial communities under highly controlled conditions that mimic natural environments. The goal would be to gain an understanding of microbial communities through analysis of functional properties of individual species or multispecies consortia by using imaging techniques that allow nondestructive monitoring of the molecular makeup of cells within the communities. Instrumentation would include cultivation technologies for microbial communities under highly controlled environmental conditions and imaging instrumentation to resolve the molecular makeup of cells spatially and temporally. Computational capabilities would focus on data analysis and modeling, including simulating complex microbial communities.

The committee examined the current Genomics: GTL program and the challenges that it faces in achieving DOE’s mission goals. The committee enthusiastically concluded that the case for DOE to play a leading role in systems biology is extremely strong. On the basis of that assessment, the committee considered whether high-throughput capabilities in protein production, proteomics, molecular imaging, and systems biology would facilitate the advancement of Genomics: GTL research in a cost-effective, efficient, and scientifically optimal manner. Finally, the committee examined the current plan for the four proposed user facilities, developed its own alternative plan, and discussed the pros and cons of the two plans.