Achievements of the National Plant Genome Initiative and New Horizons in Plant Biology

2
Assessment

WHAT IS PLANT BIOLOGY RESEARCH IN 2007?

The ultimate goal of plant biology research and of the National Plant Genome Initiative (NPGI) is to create the knowledge-based capability to breed or produce plants with specific performance characteristics (phenotypic traits). Most traits of economic interest are under strong to moderate genetic control and vary across populations and environments, both within and between species. Discovering the genetic processes that control trait expression requires deep experimental knowledge of a few model species, combined with broad knowledge of how natural variation in crop species and their close relatives contributes to those traits. Transferring knowledge from model species to crops rests on the assumption that the most closely related genes across species share function. That assumption is not always true, but it is an excellent starting point that is typically testable.

Plant biologists aim to understand the “genetic wiring” of plants and of plant processes of basic, societal, or environmental interest. They aim to inform the breeding of plants with a defined genetic makeup and to predict with high certainty how these plants will perform in different environments and climate conditions. Examples of traits that plant genome scientists would like to understand and control include resistance to a wide range of plant pathogens, nematodes, and insects; tolerance to environmental stresses (for example, salt, toxic soil chemistries, drought, and extreme temperatures); and efficient use of soil nutrients. Other important targets include modulation of plant growth and development (for example, useful alterations of plant size, shape, and chemistry, and the ability to use less fertilizer) and improved control of flowering and of the amount and quality of fruits and seeds produced (see Chapter 1).

Achieving the goal of breeding or producing plants with specific performance characteristics requires extensive investment in infrastructure for data generation, data management, and analysis, as well as human capacity-building to make effective use of the data. It also requires a daunting level of intellectual growth in biologists’ understanding of how genetic networks control physiological traits, how natural genetic variability in important traits is manifested within and across plant species, how environmental signals are transduced into adaptive responses, and how evolutionary processes lead to network diversification, optimization, and the creation of novel traits.

SCIENTIFIC AND SOCIETAL IMPACTS OF NPGI

Impacts and Outcomes from NPGI-Funded Research

When NPGI began in 1998, there was little dedicated federal funding for plant genomics research beyond the then rapidly expanding Arabidopsis genome project and its associated research community, and various projects funded through ad hoc grants to principal investigators (PIs) from different research agencies. One exception was the U.S. Department of Agriculture’s (USDA) National Research Initiative, whose “Plant Genomics” grant panel awarded 86 grants worth a total of about $11 million in FY 1997. These ad hoc efforts were split among many plant species, which arguably inhibited deep strategic investment in plant biology as a whole and in genomics-based crop improvement in particular. A fair assessment of NPGI, then, would address whether and how it has contributed to building strong and vibrant research communities linked by common interests. If these research communities have indeed been built, have they invested in cutting-edge genomic technology, and have they performed well using those resources?

The committee relied on three key documents that articulated NPGI’s goals (NRC 2002; NSTC 1998, 2003) and on the advice, critiques, and summaries of discussions at a workshop featuring key academic and private-sector plant genome scientists (see Appendix D for the workshop agenda and speakers). The committee also used data collected from a questionnaire sent to all lead principal investigators and reviewed the yearly NPGI Progress Reports (NSTC 1998, 1999, 2000, 2001, 2003, 2004, 2005, 2006, 2007). The 5- and 10-year goals of NPGI were noted in Chapter 1 (see also NRC 2002; NSTC 1998, 2003). Some highlights of the research aimed toward those goals are described in the following sections.
Capacity and Infrastructure Building

The committee views at least a significant part of the first nine years of NPGI as a capacity-building exercise, as also emphasized by the previous NRC report The National Plant Genome Initiative: Objectives for 2003–2008 (NRC 2002). This capacity-building was not trivial, for two important reasons. First, there are many plant species, each of which might provide unique biology of interest to society. Hence, the mission of “plant genomics” is much broader than that of “animal genomics,” which is driven almost entirely by concerns for human health and, to a far lesser degree, by human uses of domesticated animals. Second, before NPGI began, traditional plant biology research on the broad range of crop species took place in many institutions that had little exposure to either the mindset or the toolkit of genomics.

The committee examined how NPGI has built human capacity and how it has contributed to the distribution of a broad technological platform serving a variety of institutions and plant species. NPGI has done very well by those metrics. First, the number of different PIs funded by NPGI grew roughly 13-fold over the first 9 years (from 21 to 277; see Table E-1 in Appendix E). As might be expected, many of these PIs had more than one grant funded in that period. Taken together, these numbers suggest that a critical mass of plant genomics PIs is being recruited for future efforts. Second, the committee noted what at first glance seems a rather low proportion of investment ($14 million, or about 2 percent of the total) in the emerging, and often expensive, instruments required to compete effectively in genomics research (Table E-2 in Appendix E). The low investment in genomics instruments is partly a result of NPGI projects taking advantage of “sequencing for hire.” Because sequencing for hire became much cheaper over the nine-year course of the program, it saved money compared with investing in large-scale sequencing equipment. Nevertheless, NPGI needs to ensure that its projects have access to the constantly changing technologies of high-throughput biology. Technology access makes previously impossible experiments feasible and in fact drives the creation of new technologies. The rationale for further investment in technology access and technology creation within NPGI is discussed in detail in Chapter 3. Human capacity-building is addressed in the Education section below.

Genome Sequence, Structure, and Organization

NPGI has contributed to revolutionary breakthroughs in plant genome sequencing. The initial priority in plant genomics research is a high-quality finished genome sequence for each relevant organism. The first such sequence for any species is referred to as the “reference” sequence (see below). NPGI initially invested in the international sequencing consortium that accelerated finishing of the Arabidopsis thaliana reference sequence (The Arabidopsis Genome Initiative 2000) and then helped to build an international consortium for sequencing the rice genome (see below). The publications describing these genomes are citation classics.

Because pathogens and pests cause great losses in crop yield, the sequencing of plant pathogen genomes was included within the broader NPGI. Sequenced pathogens initially included bacterial pathogens (three strains of Pseudomonas syringae, three Xanthomonads, and several Xylella strains) and the fungal causal agent of rice blast, Magnaporthe grisea. NPGI subsequently supported the sequencing of additional pathogen genomes, including three species of the oomycete Phytophthora, which cause late blight of potato, root and stem rot of soybean, and sudden oak death (http://www.oomycete.org/). Three Fusarium species, three strains of Verticillium (the cause of Verticillium wilt), several powdery mildew and rust fungi, and the necrotrophic fungus Botrytis cinerea (Broad Institute 2007) were sequenced as part of a focus on fungi by the National Human Genome Research Institute (NHGRI). Genome sequences of additional pathogens, such as Hyaloperonospora parasitica (the oomycete that causes downy mildew of Arabidopsis), are nearly finished. This first wave of plant pathogen genome sequences begins to cover the most economically critical plant pathogens, and it opens the door to comparative studies, both across different isolates of one species and between species, in the search for common mechanisms of virulence.

In addition to using “sequencing for hire” in some projects, NPGI has recently benefited from an extremely successful interaction with the Department of Energy’s (DOE) Joint Genome Institute (JGI) to accelerate high-throughput plant and pathogen genome sequencing. This in-kind support to NPGI relies on a stringent external peer review by JGI that prioritizes projects on a mix of criteria, including relevance to the DOE mission, the organization and activity of the research community centered on a candidate species, and evolutionary criteria aimed at maximizing the phylogenetic breadth of sampling. It is the committee’s view that the successful interaction of the Interagency Working Group on Plant Genomes (IWG) with JGI, the key (in fact, the only) major plant genome sequencing center, is critical to the future overall success of NPGI.

Comparative genomics is central to modern genetic approaches. Perhaps the most profound lesson of the Human Genome Project is that comparative analysis of closely and distantly related genomes provides a rapid and cost-effective way to extract information that can accelerate applied biomedical research and development. After the sequencing and analysis of the mouse and rat genomes (both model systems of direct relevance to biomedical research), it became evident that sampling more diverse mammals would accelerate the identification and characterization of functional elements in the human genome through comparative analysis. That rationale led NHGRI to sequence not only model organisms such as the chicken and dog, but also the opossum, platypus, elephant, armadillo, and squirrel genomes. Nearly 20 mammalian reference genomes are either complete or in progress, totaling perhaps 60 billion base pairs. This rich comparative sequence landscape can lead to a profound understanding of genome organization.

The development of plant comparative genomics has an important additional strength relative to the parallel comparative study of species related to humans. Humans, in essence, are the sole focus of biomedical research, and this consideration drives the selection of genomes to sequence. In contrast, there are dozens of societally important crops and wild plant species that are far more distantly related to one another than mammals are to each other. In addition, many plant species have already undergone hundreds to thousands of years of domestication and agronomic improvement. They provide snapshots of how traits important to humans can be modified by selection. The multispecies focus of agriculture therefore places a premium on research approaches that can apply generally useful genomic information across plant taxa. The committee is confident that comparative genomics within and between plant families will accelerate the definition of gene function, much as comparative mammalian genomics has accelerated human genomics over the last five years. Evolutionary conservation across plant genomes strengthens the inferences made by comparative genomics methods.
Hence, genome comparisons will have many useful cross-family applications among legume, rosaceous, solanaceous, and cereal crops, as well as among wood and fiber crops in the Salicaceae (poplar, willow), the Myrtaceae (eucalypts), and the diverse families of conifers (pines and spruces). In particular, combined use of the Arabidopsis and rice genome sequences can often identify candidate genes with conserved function in other species for which, for example, expressed sequence tags (ESTs) from specific organs are available. There is also substantial and useful genome conservation that extends to the evolutionarily ancient gymnosperms, which include pine and spruce. Full-length cDNA clone sequences are even more useful than ESTs for understanding gene function and evolution, and complete collections of full-length cDNA clones are important tools for subsequent functional experiments.

Completed and Ongoing Land Plant Reference Genome Projects

Table 2-1 lists completed and known ongoing plant reference genome sequencing projects, many of which advance the mission of NPGI through in-kind support from JGI. The table includes only those projects that are expected to release sequences publicly in the near future. A reference genome might have gaps and errors, but it captures more than 90 to 95 percent of protein-coding gene content in highly accurate sequence (fewer than 1 error in 10,000 nucleotides), typically (but not always) anchored to physical and genetic maps. In some cases, targeted gap closure generates higher quality “finished” sequence. Resequencing projects aimed at characterizing variation relative to a reference sequence within a particular species, and efforts to finish a nearly complete genome, are not included in the table. Some groups are still seeking funds to complete an ongoing sequencing project. Genome sizes are estimates of haploid content, given in billions of base pairs (Gb).

TABLE 2-1 Reference Plant Genomes Sequenced and in Progress

No. | Species | Common Name | Size (Gb) | Strategy | Estimated or Actual Date of Completion[a] | Group
1 | Arabidopsis thaliana | Thale cress | 0.2 | BAC | 2000 | International consortium
2 | Oryza sativa (x2) | Rice (indica and japonica) | 0.4 | BAC | 2005 | International consortium
3 | Populus trichocarpa | Black cottonwood | 0.5 | WGS | 2005 | Joint Genome Institute
4 | Vitis vinifera | Grape | 0.5 | WGS | 2007 | Genoscope
5 | Physcomitrella patens | Moss | NA | WGS | 2006 | Joint Genome Institute
6 | Medicago truncatula | Barrel medic | 0.5 | BAC[b] | 2007 | International consortium
7 | Sorghum bicolor | Sorghum | 0.7 | WGS | 2007 | Joint Genome Institute
8 | Carica papaya | Papaya | 0.4 | WGS | 2007 | University of Hawaii
9 | Ricinus communis | Castor bean | 0.4 | WGS | 2007 | The Institute for Genomic Research
10 | Zea mays (x2) | Maize | 2.3 | BAC[c] | 2008 | Washington University Genome Center
11 | Arabidopsis lyrata | Rockcress | 0.2 | WGS | 2007 | Joint Genome Institute
12 | Selaginella moellendorffii | Spike moss | 0.2 | WGS | 2008 | Joint Genome Institute
13 | Mimulus guttatus | Monkeyflower | 0.5 | WGS | 2008 | Joint Genome Institute
14 | Glycine max | Soybean | 1.1 | WGS | 2009 | Joint Genome Institute
15 | Brachypodium distachyon | Purple false brome | 0.4 | WGS | 2008 | Joint Genome Institute
16 | Prunus persica | Peach | 0.3 | WGS | 2008 | Joint Genome Institute
17 | Solanum lycopersicum | Tomato | 1.0 | BAC[b] | 2010? | International consortium
18 | Brassica rapa | Chinese cabbage | 0.5 | BAC | 2009? | International consortium
19 | Capsella rubella | Shepherd’s purse | 0.2 | WGS | 2008 | Joint Genome Institute
20 | Setaria italica | Foxtail millet | 0.5 | WGS | 2009 | Joint Genome Institute
21 | Aquilegia formosa | Western columbine | 0.4 | WGS | 2009 | Joint Genome Institute
22 | Eucalyptus grandis | Eucalyptus | 0.6 | WGS | 2009 | Joint Genome Institute
23 | Lotus japonicus | Trefoil | 0.5 | BAC | 2010? | Kazusa DNA Research Institute

NOTE: The strategy is either map-based sequencing using bacterial artificial chromosomes (BAC) or whole-genome shotgun sequencing (WGS).
[a] Several timelines in Table 2-1 are estimated from project websites or personal communication and are therefore approximate.
[b] Only euchromatic BACs will be sequenced.
[c] In addition to the BAC-by-BAC maize project, the Joint Genome Institute is sequencing a second maize inbred line using a whole-genome shotgun method.

As noted above, the success of plant genomics is enriched by knowing which genes are expressed in various cell types and organs, under different stress conditions, and over the course of development. Clustering sampled ESTs into groups that each represent one gene (an NCBI UniGene cluster or, as in Table 2-2, a TIGR transcript assembly) yields an estimate of the number of expressed genes, and hence of the deduced number of proteins, in an organism. Additional methods can sample the expression of the genome in specific tissues and cell types across developmental stages and environmental conditions (transcriptomics). NPGI has contributed significantly to the collection of ESTs from various species, as shown in Table 2-2, and to the deployment of various transcriptomic tools. Despite the large numbers of EST sequences, and the equally compelling numbers of distinct cDNAs they represent for many species, the extent and functional relevance of alternative splicing of primary RNA transcripts and of alternative transcription start and stop sites in plants remain largely unknown. Whole-genome tiling arrays based on the Arabidopsis and rice genome sequences have made careful analysis of these questions possible. An alternative count, the number of putative unique transcripts (PUTs) for these and other species, is available from the Plant Genome Database (Plant Genome Database 2007).

Gene Function, Expression, and Regulatory Networks

Genome sequence is the raw material for biological discovery. However, it is only a first step toward understanding gene function, even at the biochemical level.
In fact, plant scientists estimate that they have functional knowledge of only about 40 percent of the genes in Arabidopsis, and even that estimate arguably overstates what is known because it depends on gene ontology (GO) functional inference. Hence, one important metric of progress in plant genomics is whether tools have been generated for functional analysis, both in high-throughput “data factories” and in hypothesis-driven studies of detailed gene function. The latter usually take place in the laboratories of single investigators who specialize in networks of genes acting in a particular process or who study specific classes of genes.

NPGI has supported a wide range of investigations into gene regulatory mechanisms in model and crop plants. Because NPGI grants cover many plant species, the funded projects are highly diverse. Nevertheless, they generally fall into one or more of the following categories: defining gene function, defining regulatory genes and networks, understanding patterns of gene expression, comparative analysis of gene expression, gene expression resources and databases, and epigenetics and RNA-based regulation. The results are captured both in the published record and in various databases (see Appendix F).

TABLE 2-2 Public Land Plant ESTs Deposited in GenBank up to August 2007, with Assembled National Center for Biotechnology Information (NCBI) UniGene Clusters and The Institute for Genomic Research (TIGR) Contig Sequences

Species | Common Name | Total ESTs | Unigenes (NCBI) | TIGR Plant Transcript Assemblies

Eurosids II (Brassicas, citrus, cotton)
Arabidopsis thaliana | Thale cress | 1,276,692 | 29,918 | 27,983
Brassica napus | Oilseed rape | 567,177 | 26,285 | 16,608
Gossypium hirsutum | Upland cotton | 177,182 | 16,367 | 24,797
Citrus sinensis | Sweet orange | 94,738 | 9,667 | 11,061
Gossypium raimondii | New world cotton | 63,577 | 3,279 | 8,665
Citrus clementina | Clementine orange | 62,250 | 6,106 | 5,222
Gossypium arboreum | Tree cotton | 39,232 | NA | 4,591
Brassica rapa | Field mustard | 33,316 | NA | 4,409
Brassica oleracea var. alboglabra | Wild cabbage | 30,759 | NA | 6,761
Poncirus trifoliata | Japanese hardy orange | 28,737 | NA | 5,083
Brassica oleracea | Wild cabbage | 26,692 | NA | See var. alboglabra above
Brassica rapa subsp. pekinensis | Chinese cabbage | 20,073 | NA | 4,409

Eurosids I (legumes, rosaceous plants, euphorbs, willows)
Glycine max | Soybean | 392,321 | 24,018 | 36,399
Malus x domestica | Apple tree | 255,097 | 16,903 | 26,757
Medicago truncatula | Barrel medic | 236,819 | 16,211 | 20,414
Lotus japonicus | Trefoil | 150,631 | 13,640 | 14,461
Populus trichocarpa | Black cottonwood | 89,943 | 14,059 | 12,687
Populus tremula x Populus tremuloides | Hybrid aspen | 76,160 | 7,519 | 11,593
Prunus persica | Peach | 70,972 | 6,306 | 6,596
Ricinus communis | Castor bean | 53,402 | NA | 4,524
Populus trichocarpa x Populus deltoides | Hybrid poplar | 53,208 | NA | 7,803
Euphorbia esula | Leafy spurge | 47,543 | NA | 9,905
Arachis hypogaea | Peanut | 40,627 | NA | 1,491
Trifolium pratense | Red clover | 38,109 | NA | 4,347
Populus tremula | European aspen | 37,313 | NA | 5,961
Manihot esculenta | Cassava | 36,120 | NA | 5,189
Phaseolus vulgaris | Common bean | 22,847 | NA | 2,941
Bruguiera gymnorrhiza | Burma mangrove | 20,373 | NA | 2,031
Populus trichocarpa x Populus nigra | Hybrid poplar | 20,130 | NA | 3,531
Phaseolus coccineus | Scarlet runner bean | 20,120 | NA | 2,315

Asterids
Solanum lycopersicum | Tomato | 257,093 | 16,945 | 21,523
Solanum tuberosum | Potato | 227,289 | 19,539 | 26,280
Helianthus annuus | Common sunflower | 94,111 | 7,955 | 10,219
Nicotiana tabacum | Common tobacco | 88,579 | 8,436 | 10,693
Lactuca sativa | Garden lettuce | 80,781 | 7,839 | 11,215
Ipomoea nil | Japanese morning glory | 62,282 | NA | 11,216
Coffea canephora | Robusta coffee | 55,692 | NA | 6,732
Lactuca serriola | Prickly lettuce | 55,490 | NA | 7,125
Cichorium intybus | Chicory | 41,747 | NA | 6,501
Nicotiana benthamiana | Tobacco | 41,440 | NA | 4,836
Taraxacum officinale | Dandelion | 41,296 | NA | 5,993
Helianthus tuberosus | Jerusalem artichoke | 40,362 | NA | 5,845
Helianthus exilis | Serpentine sunflower | 33,961 | NA | 5,187
Capsicum annuum | Pepper | 31,090 | NA | 4,189
Lactuca saligna | Willowleaf lettuce | 30,696 | NA | 4,999
Helianthus paradoxus | Paradox sunflower | 30,517 | NA | 3,864
Cichorium endivia | Endive | 30,171 | NA | 4,098
Lactuca virosa | Wild lettuce | 30,068 | NA | 4,912
Lactuca perennis | Wild lettuce | 29,125 | NA | 4,485
Helianthus petiolaris | Prairie sunflower | 27,484 | NA | 3,994
Antirrhinum majus | Snapdragon | 25,310 | NA | 4,221
Ocimum basilicum | Sweet basil | 23,260 | NA | 3,343
Helianthus ciliaris | Texas blueweed | 21,590 | NA | 3,070

Other eudicots
Vitis vinifera | Grape | 320,538 | 22,278 | 21,627
Aquilegia formosa x Aquilegia pubescens | Western columbine | 85,039 | 7,555 | 12,160
Mesembryanthemum crystallinum | Common ice plant (“basal” core eudicot) | 27,348 | NA | 2,897
Beta vulgaris | Beet (“basal” core eudicot) | 26,745 | NA | 3,868

Monocots (includes grasses)
Oryza sativa | Rice | 1,211,418 | 40,259 | 49,870
Zea mays | Maize | 1,159,264 | 57,447 | 64,601
Triticum aestivum | Wheat | 1,050,926 | 34,505 | 62,121
Hordeum vulgare + subsp. vulgare | Barley | 437,713 | 21,418 | 30,171
Saccharum officinarum | Sugarcane | 246,301 | 15,586 | 26,894
Sorghum bicolor | Sorghum | 204,308 | 13,547 | 20,714
Festuca arundinacea | Fescue | 41,869 | NA | 6,297
Zingiber officinale | Ginger | 38,139 | NA | 7,850
Hordeum vulgare subsp. spontaneum | Barley | 24,161 | NA | See subsp. vulgare above
Sorghum propinquum | Sorghum | 20,881 | NA | 3,402
Brachypodium distachyon | Purple false brome | 20,449 | NA | 2,785
Allium cepa | Onion | 20,159 | NA | 3,578

Gymnosperms
Pinus taeda | Loblolly pine | 328,628 | 18,859 | 28,060
Picea sitchensis | Sitka spruce | 139,569 | 15,683 | 11,551
Picea glauca | White spruce | 132,623 | 17,810 | 16,102
Picea engelmannii x Picea glauca | Hybrid spruce | 28,170 | NA | 5,767
Pinus pinaster | Maritime pine | 27,288 | NA | 3,901

Other land plants
Physcomitrella patens subsp. patens | Moss | 174,908 | 13,688 | 18,707
Marchantia polymorpha | Liverwort | 33,692 | NA | 3,874

NOTE: All plants with more than 20,000 ESTs are shown, as listed in dbEST.

A brief summary of the many highlights follows.

Defining gene function. Large-scale insertional mutagenesis and TILLING (Targeting Induced Local Lesions in Genomes) resources, first deployed in Arabidopsis but now available in a variety of crop species, have revealed key functional and phenotypic knowledge and provided vital resources for further work (see Table 2-3). The ability to define gene function through loss-of-function mutations remains the bedrock of genomics, and methods to overcome genetic redundancy and other impeding factors are being developed further. These methods include the engineering of artificial microRNAs capable of silencing several members of a gene family simultaneously.

Defining regulatory genes and networks. Several projects focused on identifying novel regulatory genes and features through genome-wide approaches.
For example, regulatory networks and factors that control host-microbe interactions and disease resistance, largely identified by large-scale forward genetics in Arabidopsis, are now being exploited in rice, tomato, and legumes, among others, using forward and reverse genetics methods enabled by genome sequences.
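Moving knowledge from Arabidopsis into rice, tomato, or legumes in this way depends on first matching each characterized gene to its likely counterpart in the crop. One common heuristic, consistent with the starting assumption that the most closely related genes across species share function, is the reciprocal best hit (RBH): two genes are paired only if each is the other’s highest-scoring match. The Python sketch below is illustrative only. It assumes pairwise similarity scores have already been computed (for example, by an all-against-all protein similarity search), the gene identifiers and scores are invented, and it is not presented as the method used by any NPGI project.

```python
"""Reciprocal-best-hit (RBH) pairing of genes between two species.

Sketch only: the scores are hypothetical stand-ins for pairwise alignment
scores, where a higher score means greater similarity.
"""
from typing import Dict, List, Tuple

# (query gene, subject gene) -> similarity score
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query gene to its highest-scoring subject gene.

    On a tie, the first pair encountered is kept.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Return sorted (gene_a, gene_b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical Arabidopsis ("At...") and rice ("Os...") identifiers and scores.
AT_VS_OS: Scores = {
    ("AtGeneA", "OsGene1"): 350.0,
    ("AtGeneA", "OsGene2"): 120.0,
    ("AtGeneB", "OsGene2"): 210.0,
    ("AtGeneB", "OsGene3"): 205.0,
}
OS_VS_AT: Scores = {
    ("OsGene1", "AtGeneA"): 348.0,
    ("OsGene2", "AtGeneA"): 118.0,
    ("OsGene2", "AtGeneB"): 212.0,
    ("OsGene3", "AtGeneB"): 201.0,
}

if __name__ == "__main__":
    # OsGene3 stays unpaired: its best hit, AtGeneB, prefers OsGene2.
    print(reciprocal_best_hits(AT_VS_OS, OS_VS_AT))
    # [('AtGeneA', 'OsGene1'), ('AtGeneB', 'OsGene2')]
```

RBH is deliberately conservative. Where a gene family has expanded by duplication in one lineage, a single RBH pair can miss functionally relevant paralogs, so candidates identified this way still need experimental testing.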
Understanding patterns of gene expression. Functional genomic technologies were developed and applied to analyze gene expression patterns in different cell types, tissues, and organs, and in plants under stress or undergoing developmental transitions. Microarrays and other high-throughput profiling tools have been used to identify and characterize important genes for root architecture, leaf form, and tomato fruit development, to name just a few examples. Many of these projects have yielded publicly available expression atlases and searchable resources.

Comparative analysis of gene expression. Comparative analysis of expression patterns could be a major outcome of functional genomics applied to a wide variety of plant species. Some success has been achieved in NPGI-funded analysis of genes involved in flower development across an evolutionary spectrum of plants. Genes regulated by the circadian clock and by photoperiodic regulatory modules are being revealed through comparative profiling and analysis in Arabidopsis, poplar, and rice.

Gene expression resources and databases. Several databases and online resources emerged from NPGI-funded projects (Table 2-3). They include the MPSS (massively parallel signature sequencing) database of transcript and small RNA expression data and the PlexDB database of expression data. Many of these resources are used regularly by PIs of NPGI projects. (See Appendix F for the list of websites that NPGI PIs reported as the five they use most for their work.)

Epigenetics and RNA-based regulation. The diversity and functions of small RNAs (20–25 nucleotides) that act on both genic and intergenic sequences have been revealed by innovative high-throughput sequencing in a variety of dicot and monocot models and crops. This work will enable a more nuanced understanding of gene regulation and of the evolution of developmental regulatory processes.
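A first step in small-RNA sequencing analyses of the kind described above is to separate reads in the small-RNA size range from longer fragments and tally them by length, since distinct small-RNA classes tend to have characteristic sizes. The Python sketch below shows only that step. It assumes reads have already been adapter-trimmed, uses the 20–25 nt window given in the text, and runs on synthetic sequences; it is not a description of any particular NPGI pipeline.

```python
"""Length profile of small-RNA sequencing reads.

Sketch only: assumes reads are already adapter-trimmed and supplied as plain
sequence strings. The 20-25 nt window is the size range cited in the text.
"""
from collections import Counter
from typing import Dict, Iterable

SMALL_RNA_MIN_NT = 20
SMALL_RNA_MAX_NT = 25


def small_rna_length_profile(reads: Iterable[str]) -> Dict[int, int]:
    """Count reads by length, keeping only those within the small-RNA window."""
    # Strip surrounding whitespace (e.g., trailing newlines from a file).
    lengths = (len(read.strip()) for read in reads)
    counts = Counter(n for n in lengths if SMALL_RNA_MIN_NT <= n <= SMALL_RNA_MAX_NT)
    # Return lengths in ascending order for easy inspection or plotting.
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    # Synthetic reads (not real data): two 21-nt reads, one 24-nt read, and
    # two reads outside the window (18 nt and 30 nt) that are discarded.
    reads = ["A" * 21, "C" * 21, "G" * 24, "T" * 18, "A" * 30]
    print(small_rna_length_profile(reads))  # {21: 2, 24: 1}
```

Real analyses would go on to map the retained reads to a reference genome; the length filter simply keeps degraded longer RNAs from being counted as small RNAs.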
NPGI-funded projects have also contributed to the rapidly expanding field of epigenetics, which deals with heritable changes in gene activity that occur without changes in DNA sequence. Epigenetic inheritance is governed by chromatin structure, including histone modifications and DNA methylation, which are in turn affected by polyploidy, hybridization, and the expression of small RNAs. Genome-wide surveys and functional analyses of genes affecting epigenetic inheritance have been carried out in maize, Arabidopsis, and a few other species.

Shortcomings in Gene Function, Expression, and Regulatory Network Analyses

Not all of the progress envisioned five years ago (NRC 2002) has been realized. Integrating data across plant species remains a considerable challenge, partly because of heterogeneous datasets, disparate data standards, a lack of sufficient experimental tools, and the small number of groups funded to do database and experimental integration work. Data integration across heterogeneous plat-
[…]
opportunities in bioinformatics for established and new investigators, or in plant genomics for plant breeders and physiologists, has been considerably slower. The 2002 NRC report proposed a national strategy for bioinformatics that included training, collaboration with large data centers, and bioinformatics-oriented research. Strategies to address this perceived gap were also presented as key objectives of the proposed Plant Cyberinfrastructure Center (Meyerowitz and Rhee 2006). Although the anticipated new generation of researchers specializing in plant genomics is emerging, there is a need for experienced plant physiologists and plant breeders who have acquired skills in genomic technologies. The lack of plant breeders well versed in genomic approaches is seen as a major impediment to translational plant genomics and to the future of plant improvement in the public and private sectors in the United States. NPGI-supported workshops on marker-assisted selection for plant breeders are a good start toward correcting this deficit (NSTC 2006). Outreach to plant breeders, and potentially to farmers, appears to fall within the mandate of USDA and its extension arm, but it is unclear whether there has been a concerted effort in this regard. In the first year of the Wheat Coordinated Agricultural Project (CAP), USDA provided workshops or information sessions on marker-assisted selection at more than 40 field days and industry meetings, and mounted a symposium at the Crop Science Society of America meeting that reached more than 120 people (USDA-CSREES presentation to the committee, April 26, 2007, Workshop). NPGI has also sponsored workshops at the Plant and Animal Genome Conferences on specialized topics relevant to specific crops and on general subjects such as database construction, transcriptional profiling, and genomic computing (NSTC 2000).

Informing the Broader Research Community

The 2002 NRC report called on organizers of community databases to improve users’ skill levels through short courses and exchange visits (NRC 2002). For example, The Arabidopsis Information Resource (TAIR) has offered one- to two-hour introductory and advanced workshops at the Plant and Animal Genome Conferences, the International Conference on Arabidopsis Research, and the American Society of Plant Biologists meeting. TAIR usage has increased steadily since the project was founded in 1999.

K-12 Education and Outreach

Some NPGI grantees have invested considerable energy in developing outreach efforts targeted toward K-12 students. Several have held workshops in which K-12 teachers learn about genomics and biotechnology and develop their own curricular modules or lesson plans (see Appendix H). Recognizing that most precollege
teachers are not trained in the practice of science as a process, at least two exemplary NPGI-funded programs provide six- to eight-week full-time mentored research internships through which teachers gain first-hand experience in a plant genomics laboratory, as well as education in current learning theory research, so that they are well equipped to develop research-based curricula. Other outreach efforts produced Internet-accessible activities and kits (including “Biotech in a Box” loaner equipment) designed and provided by the scientists, as well as classroom visits by the researchers. The committee could not assess these programs because long-term tracking of their impact is not available. NPGI researchers frequently tap into existing education and training programs on their campuses (NSTC 2001). Although better public outreach is needed, many PIs are not trained in K-12 education, and the demanding schedule of the research profession leaves them little time to devote to it. To resolve this issue, some NPGI-funded programs have hired a full-time coordinator who provides cohesive leadership for all their outreach activities (NSTC 2004). The committee enthusiastically endorses this concept and concludes that NPGI has set an example for other federal programs by appointing a professorial or affiliate faculty-level education coordinator for each of its Coordinated Agricultural Projects (Interagency Working Group on Plant Genomes, personal communication, September 18, 2007). Some of the NPGI-associated outreach initiatives have been remarkably large in scale. The Plant Genomics Research Experience for Teachers at the University of Missouri has trained 70 teachers over the last four years (NSTC 2007).
By creating educational software and online pedagogical materials, holding workshops, and providing equipment loans and ongoing support for teachers serving low-income, rural, and underrepresented minority students, the Partnership for Plant Genomics Education at the University of California, Davis, trained 52 teachers in FY 2005. These teachers were expected to share what they learned with 772 other teachers and to use activities and laboratories from the course with 8,600 students (NSTC 2006). Other K-12 outreach activities that could have broad impacts are listed in Appendix H. Supplemental funding for MaizeGDB enabled the creation of a central online repository that compiles links to outreach resources in one location, the Plant Genome Research Outreach Portal, or PGROP (PGROP 2006). The portal's pull-down menus allow users to search by user type (high school teachers, undergraduate students, growers, public at large), plant species, topic (for example, proteomics), or resource type (for example, Web-accessible teaching materials and fellowships). One of the strengths of the portal's interface is that directors of individual outreach or educational programs can upload information about their own programs (Baran et al. 2004). Although navigation of the Web interface is straightforward, searches routinely yield an unwieldy number of marginally relevant “hits.” A search for “Resources for High School Teachers,” for example, returns links to 130 resources, including many Web pages on single
genera of plants that distract from the relatively few resources (such as animated tutorials for use in the classroom) that are truly targeted to teachers. On the other hand, searches for graduate programs or summer internships in plant genomics yield incomplete lists and nonfunctional links. PGROP is a comprehensive resource with high potential impact that would benefit from more inclusive cataloging and more robust, discriminating search functions.
International Interactions
Another successful aspect of the NPGI-funded efforts is their collaboration with international partners. The coordination among researchers from six groups across three continents in the Arabidopsis sequencing project (The Arabidopsis Genome Initiative 2000) paved the way for subsequent multinational endeavors (Table 2-7). Participating non-U.S. scientists in each of these projects are supported by their respective national research funding programs. The projects are overseen and coordinated by an international committee of scientists, typically elected by the research community. Such projects leverage the resources, expertise, and facilities of many countries to produce much richer and more comprehensive genome datasets than any single national effort could obtain. The free exchange of information engendered by such collaboration maximizes efficiency and minimizes duplication of effort among teams of researchers. U.S.-funded projects, from the Arabidopsis Genome Initiative and Arabidopsis 2010 through the entire spectrum of NPGI projects, have led the way in truly open-access data deposition. Policy recommendations for U.S. funding must be fully self-contained, both intellectually and technically. While international collaboration is important, the success of NPGI and other U.S.
science cannot depend on access to data and resources that, to date, are often available only with intellectual property strings attached. A prominent example of a successful NPGI-supported international collaborative effort is the International Rice Genome Sequencing Project (IRGSP), a consortium of publicly funded laboratories from the United States, Japan, China, Taiwan, India, the Republic of Korea, Brazil, Thailand, and the United Kingdom. Two companies, Monsanto and Syngenta, invested in rice genome sequencing independently, and their willingness to release data publicly facilitated completion of the draft sequence, which was announced in 2002 (IRGSP 2002). The sharing of data, materials, and technology between public and private sector players allowed the initiative, projected to take 10 years, to be completed four years early. Building on the success of the rice genome sequencing project, an International Rice Functional Genomics Consortium was convened with leaders from 18 institutions representing 10 countries and two international agricultural research centers. The goals of the initiative are to work cooperatively to elucidate gene
function, integrate databases, establish bilateral or multilateral partnerships, and enhance rice production (IRFGC 2007). NPGI-funded PIs are prominent on the project's steering committee, and the USDA Cooperative State Research, Education, and Extension Service (USDA-CSREES) has facilitated participation by American students, postdoctoral fellows, and senior researchers.
TABLE 2-7 Examples of NPGI-funded Projects That Involve International Collaboration
Project Name | Website
International Barley Sequencing Consortium | http://www.public.iastate.edu/~imagefpc/IBSC%20Webpage/IBSC%20Template-home.html
International Brachypodium Initiative | http://www.brachypodium.org/
International Citrus Genome Consortium | http://int-citrusgenomics.org/
International Cotton Genome Initiative | http://icgi.tamu.edu/
International Grape Genome Program | http://www.vitaceae.org/
International Legume Database & Info System | http://www.ildis.org/
International Populus Genome Consortium | http://www.ornl.gov/sci/ipgc/
International Rice Functional Genomics Consortium | http://www.iris.irri.org:8080/IRFGC/
International Rice Genome Sequencing Project | http://rgp.dna.affrc.go.jp/IRGSP/
International Soybean Genome Consortium | http://genome.purdue.edu/isgc/index.shtml
International Tomato Sequencing Project | http://www.sgn.cornell.edu/about/tomato_sequencing.pl
International Wheat Genome Sequencing Consortium | http://www.wheatgenome.org/
The Multinational Coordinated Arabidopsis thaliana Functional Genomics Project | http://www.arabidopsis.org/; http://www.arabidopsis.org/portals/masc/index.jsp
Multinational Brassica Genome Project | http://www.brassica.info/
SOL Genomics Network | http://www.sgn.cornell.edu/
SOL (EU-SOL) | http://www.eu-sol.net/
SOL (Lat-SOL) | http://cnia.inta.gov.ar/lat-sol/
SOURCE: Interagency Working Group on Plant Genomes.
Other functional genomics projects also capitalize on the resources and expertise of an international scientific community. For example, the goal of the International Solanaceae Genome Project (SOL) is to develop a comparative framework for studying plant diversification and adaptation across the Solanaceae family (including the important crop plants tomato, potato, eggplant, and pepper). The SOL genomics networks have formed partnerships with laboratories in Latin America and in Europe to improve the nutritional value, taste, flavor, fragrance, shelf life, starch composition, yield, and other traits important to consumers, producers, and processors of these staple fruits and vegetables (European Commission 2006). The Developing Country Collaborations in Plant Genome Research (DCCPGR) program was started as an NPGI activity in 2004 to support collaborative
research involving investigators in the United States and scientists in developing countries. The goal is to facilitate the application of new tools and resources to agricultural, environmental, and energy problems of significance to the foreign researcher's home country. Supplemental funding of up to $100,000 for two years, added to an existing or a new NPGI award, enables joint research projects and long- or short-term reciprocal visits of students and senior investigators, which could lead to long-term partnerships (NSF 2007b). International collaborative NPGI projects that are targeted to directly benefit resource-poor farmers in developing countries include the following (NSTC 2004, 2005):
• Using the genome map of sorghum, an important staple cereal in Africa and India, to elucidate networks of genes that control drought tolerance.
• Developing cultivars of the African cowpea (a legume widely grown in Africa, Latin America, Southeast Asia, and the southern United States) that are resistant to the parasitic weed Striga.
• Establishing comparative markers to link the genetic maps of chickpea, cowpea, and pigeonpea to the Medicago genome sequence map, enabling breeders in India and Africa to identify disease resistance genes and develop improved cultivars of their local crops.
• Using proteomics technologies to develop improved oilseed cultivars in Nepal with enhanced processing and feed characteristics.
• Harnessing genetic variation in natural rice populations to introduce disease resistance and drought tolerance into improved cultivars.
• Developing new Bolivian cultivars of potato that are resistant to bacterial wilt, which causes serious crop losses each year.
• Investigating the genes that allow plants to produce seed without fertilization (apomixis), which can be used to breed desirable traits into landraces of corn that are adapted to the diverse growing conditions across Mexico.
Two other major NPGI projects that involve substantial international collaboration and represent the next wave of genomics initiatives with applications to the developing world are the sequencing of cassava and the Generation Challenge Program. U.S. researchers and their partners at the International Center for Tropical Agriculture (a center of the Consultative Group on International Agricultural Research, CGIAR) are working with JGI to perform sample sequencing of the cassava (Manihot esculenta) genome. The crop grows in diverse climates and in nutrient-poor soil and is an important source of food and biofuel for 1 billion people globally. As a staple for subsistence farmers, a cash crop for local markets,
and a reliable source of food and animal feed in famines, M. esculenta is well positioned for nutritional improvement, and genome sequencing will also provide insights into starch and protein biosynthesis and stress controls (JGI 2006). The CGIAR Generation Challenge Program is dedicated to alleviating constraints on agricultural productivity that contribute to global poverty and hunger, with an emphasis on harnessing genomic technologies to make rapid progress in drought tolerance (CGIAR Genomics Task Force 2006).
NPGI AND INTERAGENCY COOPERATION
Earlier sections in this chapter assessed specific aspects of NPGI, but NPGI is not merely a funding mechanism; it is an interagency collaboration that coordinates activities in plant genomics. The committee therefore also assessed the role of IWG in facilitating research, training, and outreach. In addition, perhaps the most important metric for the success of NPGI is whether, and to what extent, U.S. research and development agencies have reprioritized their mission-oriented, agency-specific research portfolios on the basis of NPGI research and discoveries.
Coordination of Programs
Although each member agency of IWG has its own mission, some agencies also have overlapping interests and goals. IWG member agencies have increasingly issued joint calls for proposals or co-funded programs of mutual interest (see Appendix I for examples). The joint programs reduce administrative burdens for principal investigators applying for funds and allow the agencies to achieve common program goals jointly. Perhaps the most important metric for NPGI is whether the science funded to date has served as a springboard for agency-specific, mission-oriented programs that capitalize on either new public funding or public-private partnerships.
One of the greatest challenges that the nation faces in the 21st century is reducing dependence on foreign oil. Top-quality basic genome-based research is necessary to meet this challenge, and that research will rely heavily on plant genomics and genetics for its progress. Biofuels and bio-based products are potentially sustainable solutions if conversion efficiency is improved dramatically (NRC 2005a). New investments in bioenergy research will leverage basic plant genomics discoveries made through NPGI (Table 2-8). The largest to date is the $500 million investment by BP America, Inc., in conjunction with the University of California, Berkeley, and Lawrence Berkeley National Laboratory to create the Energy Biosciences Institute (EBI). In addition, DOE has made plant genomics a linchpin in its
Genomics: GTL portfolio for bioenergy. Using the JGI sequencing platform as a point of departure, DOE recently invested about $375 million in three bioenergy centers that are intended to accelerate basic research in the development of cellulosic ethanol and other biofuels (http://genomicsgtl.energy.gov/centers/). Further, a joint DOE and USDA-CSREES program recently announced $8.3 million in grants for feedstock improvement (DOE 2007a). Additional new programs that leverage the basic science of NPGI include a DOE Genomics: GTL program that is soliciting proposals for new analytical and imaging technologies for lignocellulosic material degradation and for multiplexed screening of mutant plant phenotypes. Also, Title VII of the 2007 Farm Bill (H.R. 2419), passed by the House of Representatives, provides $50 million per year for an Agricultural Bioenergy and Biobased Products Research Initiative and $100 million per year for a Specialty Crop Research Initiative to develop and disseminate science-based tools, including plant breeding, genetics, and genomics, to address the needs of specialty crops (Table 2-8). At this writing, the bill has been placed on the Senate calendar for consideration. If it is passed, the initiatives would provide vital new resources to expand IWG activities. Other examples of refocusing and increased investment in agency mission-specific research include the conversion of the USDA-CSREES National Research Initiative (NRI) Plant Genome panel into a translational genomics program in recent years, with a different crop focus each year and a major emphasis on outreach and extension of research efforts. USDA-ARS has also refocused some of its internal programs to complement and support NPGI research.
National Program 301 on Plant Genetic Resources, Genomics, and Genetic Improvement redirected its statement of purpose to support the new discoveries made by NPGI-funded research ($140 million in FY 2007; Table 2-8). As NPGI research generates valuable data, the need for database stewardship and for informatics tools to use the data effectively becomes apparent. Therefore, National Program 301 includes a component on crop informatics, genomics, and genetic analyses that addresses genome database stewardship and informatics development, structural comparison and analysis of crop genomes, and genetic analysis and mapping of important traits. Likewise, National Program 302 on plant biological and molecular processes has redirected its focus to applications of genomics in crop plants because of NPGI discoveries. Building on NPGI-funded research on model plants, National Program 302 has refocused its objective to take advantage of the new genomic information and to carry it from model plants to crop plants ($40 million in FY 2007; Table 2-8). The goal is to translate plant genomics into crop improvement. Applied mission-oriented, agency-based forest tree genomics programs have also been derived from basic discoveries made through NPGI. For example, a
multimillion-dollar Coordinated Agricultural Project from USDA-CSREES and the USDA Forest Service (USFS) on conifer genomics began in 2007 and will enable association genetic studies of trees in the major breeding programs throughout the United States. Each of the above examples is a testament to the power of federal investment in competitive, peer-reviewed, curiosity-driven basic plant genomics research, and illustrates the return reaped in translation to agency-specific, mission-oriented applied plant genomics.
TABLE 2-8 Examples of Agency-specific and Mission-focused Programs That Have Spun Off of, or Benefit from, Results of NPGI Research
Programs | Program Budget (in millions of dollars)
Energy Biosciences Institute (UC Berkeley, BP, LBNL) | $500 over 10 years
Agricultural Bioenergy and Biobased Products Initiative(a) | $250 over 5 years
Barley Coordinated Agricultural Project | $5 over 4 years
Bioenergy Research Center(b) | $375 over 5 years
Conifer Coordinated Agricultural Project | $6 over 4 years
National Program 301: Plant Genetic Resources, Genomics, and Genetic Improvement | $140 in FY 2007
National Program 302: Plant Biological and Molecular Processes | $40 in FY 2007
Plant Feedstocks Genomics for Bioenergy | $8 over 3 years
Rice Coordinated Agricultural Project | $5 over 4 years
Specialty Crop Research Initiative(a) | $500 over 5 years
Wheat Coordinated Agricultural Project | $5 over 4 years
(a) The Agricultural Bioenergy and Biobased Products Initiative and the Specialty Crop Research Initiative were proposed in the 2007 Farm Bill, which had not been passed by Congress at the time this report was written.
(b) The program budget presented for the Bioenergy Research Center includes funding for plant and microbial research and technology development.
In-kind Support and Distribution of Resources
Although only some IWG member agencies fund plant genomics research, all of them contribute to the goals of NPGI by providing in-kind support, distributing resources, and keeping one another abreast of the latest genomic technologies.
In-kind Support (provided largely by IWG member agencies)
USDA-ARS
USDA-ARS funding of $3.6 million in FY 2002 and $8.5 million in FY 2006 for plant bioinformatics includes support for the following projects:
• The maize genetics and genomics database. This project aims to synthesize, display, and provide access to maize genomics and genetics data for the research and user communities.
• Identification of functional sequence in plant genomes through bioinformatic, genomic, and genetic approaches. This project aims to provide resources to characterize, track, and identify sequence associated with agronomically important traits.
• An integrated database and bioinformatics resource for small grains. This project aims to integrate small grains genetic and genomic data within the GrainGenes database and to link them to relevant external databases. It also aims to develop software and interfaces that enhance utility for researchers.
• Curation and development of the Soybean Breeder’s Toolbox and its integration with other plant genome databases. This project aims to implement Web-accessible computation and visualization tools to enable comparison and transfer of agronomically important genetic information between soybean and related species. The project also involves the curation and enhancement of SoyBase and the Soybean Breeder’s Toolbox and the coordination of the assembly and annotation of the soybean whole-genome sequence.
DOE
In addition to funding individual research projects, DOE contributes to NPGI by sequencing plant species through its Community Sequencing Program (CSP) or Laboratory Science Programs (LSP). Examples of plant genome sequencing projects undertaken by DOE’s Joint Genome Institute through CSP include Physcomitrella in 2005; Selaginella, sorghum, Arabidopsis lyrata, Capsella, Mimulus, and the chloroplast of Campanulales in 2006; and Brachypodium, Aquilegia, Gossypium, cassava, maize, soybean, and Eucalyptus in 2007. JGI was the lead organization in the sequencing of poplar (Table 2-1). JGI is also committed to plant EST sampling.
For example, JGI agreed to produce ESTs for switchgrass and peach in 2007, and for eucalyptus, foxtail millet, and conifers (loblolly pine and 22 other species selected for their commercial and ecological importance or their ability to provide phylogenetic insight into conifer genome evolution) in 2008 (DOE 2007a; see Table 2-1). The committee notes that JGI’s contribution to plant genomics is unique and fundamental, spanning both explicitly energy-oriented projects and projects that broadly inform all of plant biology, from evolution through comparative genomics. No other high-throughput sequencing facility interested in serving plant genomics can match JGI’s power and consequent economy of
scale. These points inform one of the committee’s most important recommendations (see Chapter 3).
USDA Forest Service
USFS has 10 full-time-equivalent scientists who conduct genomics research. USFS has also provided technical support in tree genomics or molecular genetics in the form of competitive awards or cooperative agreements (see Appendix K). Compared with funding from the USDA-CSREES NRI and NSF grant programs, USFS investments in plant genomics have to date been modest. Other in-kind products of USFS include:
• Maps of amplified fragment length polymorphisms and simple sequence repeats for the American beech.
• Markers for the selection of butternut that is resistant to butternut canker.
• Markers for the improvement of black walnut.
• Multiplex sequencing capability on a high-capacity sequencing platform.
• Specific markers for identifying rust-resistant loblolly pine.
• Neutral markers for QTL analyses of loblolly pine.
NHGRI
Although NHGRI’s primary focus is human genome sequencing, it advances NPGI’s objectives through its support for genome sequencing and the genomics infrastructure it has built. NHGRI has provided financial support for a number of large-scale sequencing centers over the years. Although NHGRI does not fund plant genome sequencing directly, parts of some plant genome sequencing projects have been done at NHGRI-supported sequencing centers, and many of the fungal pathogen genome sequences noted above were done as part of the Broad Institute’s Fungal Genomics Program. NHGRI also supports the advancement of sequencing technology, the development of bioinformatics tools, and the identification of all functional elements in the human genome. As a member of NPGI, NHGRI can swiftly pass on the technologies and tools it develops, and the lessons it learns, to the plant community.
NHGRI continues to promote free and open data release and keeps NPGI updated on its policies. In fact, NPGI has adopted the Bermuda accord, which requires rapid release of publicly funded sequence assemblies of 2 kb or larger, and the Fort Lauderdale accord, which defines a community resource project. NHGRI periodically considers whether its data release policies remain appropriate and keeps NPGI informed of those discussions.
Distribution of Resources
The National Plant Germplasm System (NPGS), managed and funded by USDA-ARS in partnership with agricultural experiment stations and land-grant universities, aids plant scientists by conserving the plants and seeds of nearly 10,000 species. To ensure that genes are available to NPGI fundees, NPGS continues to acquire, preserve, evaluate, document, and distribute crop germplasm, much of which originates outside the United States (ARS 2005). NPGS distributed over 150,000 accessions in 2006, including 9,131 Triticum, 5,597 Oryza, 11,951 Zea mays, 19,349 Glycine, 9,729 Lycopersicon, and 5,073 Vitis accessions. Some of these distributions were mutants or cytogenetic stocks (based on information submitted to the committee by USDA-ARS on May 17, 2007). Because of the increasing demand resulting from NPGI-funded research, stock centers were built or expanded. For example, the Maize Genetics Cooperation Stock Center was expanded to provide long-term curation of maize mutant genetic stocks developed by NPGI awardees. The Genetic Stocks–Oryza Collection was established as a result of NPGI when the rice genome was sequenced and the need for a collection of rice mutant seed genetic stocks was recognized. Mutant seed genetic stocks of other plants developed by NPGI awardees are added to the working collections of other NPGS repositories. In addition to germplasm collections, many Websites and databases were developed or expanded as a result of NPGI (see Appendix F).