CHAPTER THREE
Maximizing the Return from Reference Sequencing: Translational Agriculture
We suggest in Chapter 2 a focused sequencing effort on a small number of carefully selected genomes. This is greatly preferred over moderate investment in many plant species because it concentrates research efforts on identification of genes and key biologic functions in experimental contexts where those goals can be achieved economically. Our proposal will leverage the power of comparative and functional genomics to understand the biology of the selected crop species and their relatives. It is noteworthy here that the translation to agronomic problems will be facilitated precisely by some of the investments made in the NPGI to date. For example, physical maps of the soybean and maize genomes and extensive sets of expressed sequence tags (ESTs) are now available for many crop species. We encourage further development of mapping tools in crop species during the 2003–2008 period. It is vital that the basic discoveries anticipated from the sequencing of reference species be integrated into current efforts in crop breeding and biotechnology. The NPGI should expand to include applied-plant-biology communities, both public and private, and should involve them explicitly in translation and application of basic discovery to crop improvement. That can be achieved, for example, by developing easy-to-use molecular markers for breeding, by defining genomic intervals that carry traits of interest, and by developing informatics-based tools to hasten translation.
It is important to describe and fully exploit conserved syntenic relationships among species in Poaceae, Brassicaceae, Solanaceae, and Fabaceae so that any finely mapped qualitative or quantitative character in any member of the reference genomes can be easily bred into agronomically relevant cultivars. This effort includes the development of fine-structure genetic and physical maps of key species and the development of a comprehensive set of anchor markers. We anticipate that these tools will enable detailed characterization of the genes that contribute to specialized traits of agronomic interest, such as drought tolerance, salt tolerance, disease resistance, seed quality, and plant architecture.
To determine the genetic basis of traits of economic interest, it will be useful to map genetic variation with relatively high precision, including SNPs or simple, DNA-based, high-resolution introgression tools. Thus, it is important to have comprehensive BAC libraries for species of major agronomic interest and to use BACs for end- and low-pass draft sequencing to see which genes are in each “map bin.” Coupled with identification of a polymorphism for each BAC, this approach would provide powerful tools to assign genetic variation to a small set of candidate genes quickly. Such advances will facilitate translation from reference species to other crops.
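The map-bin approach described above can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the BAC identifiers, bin labels, gene names, and the `genes_by_bin` and `candidate_genes` helpers are invented for illustration. It assumes that each BAC has already been assigned to one map bin and that low-pass sequencing has produced a list of genes for each BAC.

```python
from collections import defaultdict

# Hypothetical records: (BAC id, map bin, genes detected by low-pass sequencing).
BACS = [
    ("BAC001", "1.01", ["geneA", "geneB"]),
    ("BAC002", "1.01", ["geneB", "geneC"]),
    ("BAC003", "1.02", ["geneD"]),
    ("BAC004", "2.05", ["geneE", "geneF"]),
]


def genes_by_bin(bacs):
    """Group the genes found on each BAC by the map bin the BAC belongs to."""
    bins = defaultdict(set)
    for _bac_id, map_bin, genes in bacs:
        bins[map_bin].update(genes)
    return bins


def candidate_genes(bacs, trait_bins):
    """Return the sorted candidate genes for a trait mapped to the given bins."""
    bins = genes_by_bin(bacs)
    found = set()
    for map_bin in trait_bins:
        # .get() avoids creating empty entries for bins with no BACs.
        found |= bins.get(map_bin, set())
    return sorted(found)


# A trait mapped to bin 1.01 narrows the search to the genes on its BACs.
print(candidate_genes(BACS, ["1.01"]))
```

In this toy example a trait mapped to a single bin reduces the search to three genes. A real resource would also record the polymorphism that places each BAC on the genetic map.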
TOOLS FOR TRANSLATIONAL AGRICULTURE
The following sections outline a variety of approaches that should be undertaken to develop tools for translational agriculture.
-
Construction of genetic maps for key, carefully chosen species. We recommend identification of a set of several hundred conserved genes that can be used as anchor markers for comparative map construction and phylogenetic studies across relevant taxonomic distances. If possible, sets of conserved genes should be chosen on the basis of the biologic significance of the underlying genes and pathways and of genomic distribution; map positions should initially be defined in a reference genome. Comparative maps lay the foundation for high-resolution mapping of simply inherited and quantitative traits and for gene discovery. Thus, they are an essential part of any genomics toolkit and allow information to flow between species of interest to diverse plant scientists, including breeders and evolutionary biologists.
-
Construction of physical maps for a small number of species (a subset of the species chosen for genetic mapping, above) using high-quality BAC libraries. Physical maps can be assembled from end-sequenced BACs. These are useful for finding, sequencing, and mapping genes of interest; investigating gene families; identifying transposable elements; and defining small-scale genome rearrangements.
-
Establishment of mapping populations (preferably fixed inbreds) with genotypic segregation data. Choice of parents for the development of mapping populations should ensure that one parent represents the genotype from which the BAC library is constructed and from which sample sequencing is performed, and that the cross segregates phenotypes of interest. The populations are useful for associating genotype with phenotype and for allele mining, and can be evaluated by different researchers interested in different phenotypes and allele combinations in different environments. The information can be readily shared via genome databases.
-
Assuming that sequencing costs continue to drop, or funding levels increase, and that the appropriate prerequisites (see Chapter 2) are met, it will be possible to begin sequencing the gene-rich regions of additional key crop species. Translational agriculture will eventually be simplified by the availability of genomic DNA sequence. The species identified in Chapter 2 for large-scale genomic sequencing were recommended as the top priorities for the NPGI because they fulfill essentially all of the criteria for the definition of a reference species, and because appropriate biological tools have been developed for these species in preparation for full-scale sequencing. The availability of genome sequence for other important crop species, for example, soybean and wheat, could provide valuable information for research on those crops, but they meet far fewer of the criteria described in Chapter 2 and have not yet met the prerequisites to sequencing. For example, there are no solid estimates of the size of the soybean gene space, or of the distribution of gene-dense and gene-poor regions in the soybean genome; wheat is polyploid, creating sequence assembly problems; and neither wheat nor soybean is transformed in a routine manner. If the genomes of these crops were to be adequately characterized and the essential biological tools developed, draft sequencing of gene-rich regions of soybean and wheat could begin in the 2003–2008 time frame. This recommendation is additionally contingent on drops in sequencing costs and fulfillment of the other, more clearly justifiable NPGI priorities outlined here.
-
Germplasm collections with molecular genotypes. Germplasm collections offer a larger view of genotypic and phenotypic variation than can be studied with a mapping population. Genotypic information about accessions can be derived from conserved simple-sequence repeats, SNPs, or other automated molecular marker systems. Data on such a collection can be used by breeders and by population geneticists to evaluate population structure, to determine the extent of recombination, to develop statistical models for interpreting genetic linkage patterns, to mine alleles, and to perform marker-assisted selection. The data can also be used to compare gene diversity in different ecotypes or subspecies, to clarify phylogenetic relationships among populations or closely related species, and to look for evidence of allelic diversity and evaluate its extent.
-
Transcript identification. The genomic sequencing efforts, and the building of translational tools delineated above, are already supported by EST collections, some of which are being generated in ongoing NPGI projects from normalized libraries of different tissues, different developmental time points, or system perturbation contexts (see Table 3.1). More-extensive use of alternative gene-discovery technologies, such as serial analysis of gene expression and microbead-based and other representational methods, should be considered to complement EST projects. Comparison of ESTs or complete cDNA sequences with genomic sequence will provide critical information on gene content, family membership, and sequence diversity. Furthermore, the cDNA-based resources are crucial for accurate annotation of the genomic sequence and will provide information on allelic diversity for use in molecule-based breeding.
Table 3.1. Ten Largest Plant EST Collections by Species (NCBI 2002; dbEST release 080902)
Species | No. of ESTs
Glycine max (soybean) | 263,737
Hordeum vulgare subsp. vulgare (barley) | 215,714
Triticum aestivum (wheat) | 175,836
Arabidopsis thaliana (thale cress) | 174,624
Zea mays (maize) | 165,518
Medicago truncatula (barrel medic) | 162,741
Lycopersicon esculentum (tomato) | 148,346
Chlamydomonas reinhardtii | 112,487
Oryza sativa (rice) | 104,594
Solanum tuberosum (potato) | 94,257
Total plant ESTs | 1,617,854
Total ESTs in GenBank | 12,323,094
SOURCE: NCBI 2002.
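The arithmetic behind Table 3.1 can be checked with a short Python sketch. The counts are copied from the table; the variable names and the `share` helper are illustrative assumptions, not part of any NCBI tool. The ten species counts add up exactly to the figure reported as "Total plant ESTs."

```python
# EST counts copied from Table 3.1 (dbEST release 080902).
EST_COUNTS = {
    "Glycine max": 263_737,
    "Hordeum vulgare subsp. vulgare": 215_714,
    "Triticum aestivum": 175_836,
    "Arabidopsis thaliana": 174_624,
    "Zea mays": 165_518,
    "Medicago truncatula": 162_741,
    "Lycopersicon esculentum": 148_346,
    "Chlamydomonas reinhardtii": 112_487,
    "Oryza sativa": 104_594,
    "Solanum tuberosum": 94_257,
}
TOTAL_PLANT_ESTS = 1_617_854
TOTAL_GENBANK_ESTS = 12_323_094


def share(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


top_ten_sum = sum(EST_COUNTS.values())

# Does the "Total plant ESTs" row equal the sum of the ten listed species?
print(top_ten_sum == TOTAL_PLANT_ESTS)

# Percentage of all GenBank ESTs represented by the plant total.
print(round(share(TOTAL_PLANT_ESTS, TOTAL_GENBANK_ESTS), 1))

# Percentage of the plant total contributed by the largest collection, soybean.
print(round(share(EST_COUNTS["Glycine max"], TOTAL_PLANT_ESTS), 1))
```

On these figures the plant total is roughly 13 percent of all ESTs in GenBank, and soybean alone contributes about 16 percent of the plant total.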
-
Therefore, we advocate sequencing of full-length cDNAs from all the Arabidopsis genes to generate a baseline “plant Open Reading Frame reference set” (the ORFeome) that represents the set of genes from which protein is made. In addition, based on the EST data sets and unigene assemblies now available via NPGI, we advocate sequencing full-length cDNAs for those genes of the other reference species that are either not found in, or are most diverged from, relatives in Arabidopsis. This hierarchical approach to full-length cDNA sequencing will eventually yield a plant ORFeome that incorporates many aspects of plant evolution and has very high value for functional studies.
-
Decorating the virtual plant. The ultimate goal of the Arabidopsis 2010 Functional Genomics Project (NSF 1999) is the development of a virtual plant whose metabolic and gene activity status can be monitored “in silico,” that is, in a computer model, at any time and under any condition. Extending the concept to include information incorporated from all the reference genomes would result in a virtual plant whose most basic functional and structural attributes would be generalizable to all plants. We envision decorating the backbone of this virtual plant with additional virtual representations of specialized cell and tissue types and, in fact, whole organs over developmental time. For example, EST sequencing of the oil-gland secretory cells of peppermint plants demonstrated a substantial enrichment, compared to leaf tissue, in expression of genes involved in oil production. In a simple analogy, the virtual plant should include the ability to make it “grow a tuber” or “develop a cotton boll.” To reach the goal of incorporating important plant phenotypes (such as cotton fibers, tuber formation, apomixis, perennial habit, fleshy fruit development, nitrogen fixation, heterosis [hybrid vigor], nutrient uptake and homeostasis, and cambium development) into the framework of the reference species, it will be necessary to sample gene expression deeply in judiciously chosen, specialized cell and organ types from a variety of species. We recommend that the NPGI support approximately 25 projects to sequence ESTs from specialized plant cell types and organs in species from which specific novelties in the expressed gene sets can be expected.
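The kind of enrichment reported for peppermint oil glands can be expressed as a simple ratio of EST fractions between two libraries. The Python sketch below is only an illustration: the `fold_enrichment` helper is hypothetical and the counts are invented, because the source gives no figures for the peppermint study.

```python
def fold_enrichment(hits_a, total_a, hits_b, total_b):
    """Fold enrichment of a gene category in library A relative to library B.

    Each library contributes the fraction hits / total of its ESTs that fall
    in the category; the result is the ratio of those two fractions.
    """
    if min(total_a, total_b) <= 0:
        raise ValueError("library totals must be positive")
    if hits_b == 0:
        raise ValueError("category absent from library B; ratio is undefined")
    # Cross-multiplying the two fractions means only one division is needed.
    return (hits_a * total_b) / (hits_b * total_a)


# Invented counts for illustration only (not data from the peppermint study).
gland_oil_ests, gland_total = 250, 1000   # specialized secretory-cell library
leaf_oil_ests, leaf_total = 10, 1000      # whole-leaf library

print(fold_enrichment(gland_oil_ests, gland_total, leaf_oil_ests, leaf_total))
```

With these invented counts the category is 25 times more abundant in the specialized library than in leaf tissue. A real comparison would also need a statistical test, since small EST counts make such ratios noisy.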
PLANT INTERACTIONS WITH THEIR BIOTIC ENVIRONMENT
Up to 30% of crop yield worldwide is lost to pests and pathogens. Thus, a systematic understanding of plants should include their interactions with their environment writ large. In this regard, an expanded NPGI project portfolio including plant interactions with pathogenic, mutualist, and symbiotic organisms will have huge rewards. The 2003–2008 phase of the NPGI should include analyses of fungal genomes. The focus and criteria for selection should be related to how plants regulate their interactions with the biotic environment. Pathogenic fungi of agronomic importance that meet many of the criteria of a tractable experimental species outlined for the selection of reference species should be considered for sequencing. For example, Magnaporthe grisea (rice blast) is being sequenced with the support of the National Institutes of Health. Other examples might include rust (Puccinia), powdery mildews (Erysiphe), and oomycetes (Phytophthora or Peronospora). Those are all pathogens of the model, Arabidopsis, of the likely reference species, and of important related crop species. Thus, the strengths of concurrent analysis of both host and pathogen can be applied to understand pathogenesis in a broad array of host-parasite interactions.
Equally important is an understanding of fungi beneficial to plants, such as the mycorrhizal species and the obligate endophytes of cool season grasses. It might be premature to propose a sequencing project for such organisms, but it is nonetheless important that they be incorporated into the genomics-based plant systems biology and that experimental tools and approaches be developed for their future exploitation.
INFRASTRUCTURE FOR GENOMICS RESOURCES
Preserving high-quality specimens of genomic resources is important to empower plant-research groups worldwide. The generation of DNA sequence and translational tools will drive the need for new stock centers. Materials developed as part of federally and internationally funded initiatives, including collections of unique and valuable seed stocks, clone libraries, and databases, have already outgrown the ability of individual labs and projects to manage and distribute them effectively to the community. Professionally managed stock centers designed to collect, organize, maintain, and distribute high-quality genomic resources to the community at large are needed to facilitate genomics research.
Stock centers might be organized to manage a variety of resources developed for a specific family of plants (such as the Arabidopsis Genome stock center at Ohio State University), or they may be organized to distribute a specific type of reagent or resource for a wide range of plant families (such as the BAC Resource Center currently located at Clemson University). There would be advantages to developing specialized stock centers for several species of plants in collaboration with foreign national and international institutions that house the world’s germplasm repositories (along with much of the knowledge about specific plant families). Expanding the level of international collaboration and exchange would enhance access to information, germplasm, and technology for scientists throughout the world and motivate the formation of partnerships that would generate novel opportunities for innovative genomics research.