

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.




OCR for page 82
The Polar Genome Science Initiative

Evolutionary processes have created many biological communities that are as stunning in their beauty and complexity as they are unexpected by and novel to biological scientists (Diamond, 2001). Some of the most fascinating and diverse of these natural experiments have occurred among the organisms and biological communities of the polar regions. Effective strategies for exploring polar ecosystems using approaches based on genome science and other technologies can rapidly advance our understanding of these ecosystems.

SELECTION OF ORGANISMS AND CONSORTIA FOR GENOME ANALYSIS

The success of the publicly and privately financed human genome initiatives (Lander et al., 2001; Venter et al., 2001) is directly attributable to the development of high-throughput, low-cost DNA sequencing technologies and appropriate bioinformatics tools for assembling and annotating the approximately 3 billion base pairs (bp) of the human genome. Clearly, the sequencing of genomes should no longer be constrained to "model" organisms or limited by resource considerations, and the Polar Genome Science Initiative need not focus on technology development. Nevertheless, the selection of organisms or consortia must be guided by appropriate criteria. The committee proposes that selection of an organism or consortium be based on evidence that:

1. analysis of its genome will address broad and significant scientific questions;
2. it is a good model for evolution in an isolated polar environment;
3. it provides opportunities for comparisons with organisms of comparable ecotype from polar habitats and along polar-to-temperate latitudinal clines; or
4. its cellular processes possess characteristics of biotechnological or clinical interest.

Based on these criteria, the committee provides examples of polar species and consortia below. Other organisms may certainly fit the selection criteria as well and warrant sequencing in the near term. The knowledge gained from these organisms will provide an invaluable framework for identifying other organisms for future sequencing projects. Whether some or all of the organisms listed below or other polar organisms are selected for genome analysis will depend on the availability of funding and on changes in research priorities.

Prokaryotes

Efforts in prokaryotic microbial genomics over the past decade have provided a wealth of information on the nature of microbial diversity and the forces that shape prokaryotic genomes. To date, more than 80 prokaryotic genomes have been sequenced completely, with many more in progress. This collection of data contains more than 250,000 bacterial genes from phylogenetically diverse species; however, it likely represents less than 0.1 percent of the globally distributed prokaryotic gene pool (Stahl and Tiedje, 2002). Remarkably, however, no genome sequence has yet been completed for a psychrophilic prokaryote, and to the committee's knowledge, only two are in progress (isolates of Colwellia and Psychrobacter).
This is a shortcoming that should be remedied, since psychrophilic prokaryotes offer a tremendous opportunity to better understand the genetic basis for the psychrophilic phenotype, which has potential biotechnological applications as outlined in Chapter 2. Because of their small genomes, it is entirely feasible to obtain near full-genome sequences for numerous (20 or more) prokaryotes isolated from Arctic and Antarctic locations, including sea and freshwater ice, permafrost, and other extremely cold niches. By comparing the genetic "informational content" of a large number of psychrophilic prokaryotes with each other and with the genetic complements of psychrotolerant and mesophilic prokaryotes, especially those belonging to the same or closely

related genera, it should become apparent whether the psychrophilic phenotype is associated with the presence of specific genes that are not found in nonpsychrophiles, whether psychrophiles have conserved themes in gene complements that distinguish them from their nonpsychrophilic counterparts (for example, encoding enzymes for the synthesis of specific lipids or fatty acid derivatives well suited for life at low temperature, or the synthesis of osmolytes that have cryoprotective properties), or whether there are genome features (for example, number of duplicated genes, abundance of mobile genetic elements) that might play a role in adaptation to the cold. In addition, the sequence information would serve as the basis for conducting functional genomic studies (transcriptome, proteome, and metabolome analysis) to determine, for instance, whether growth at low temperature involves differences in the abilities of psychrophiles and nonpsychrophiles to express their genome complements at low temperature (for example, the synthesis of transcripts and polypeptides) or whether it relates to the activities of certain proteins at low temperature.

Within the criteria outlined at the beginning of this chapter, top candidates for sequencing would be representative psychrophilic bacteria, particularly organisms that have closely related psychrotolerant isolates for comparison. One project currently in progress involves sequencing the genome of a representative Colwellia sp., which belongs to the gamma-Proteobacteria. The Colwellia being sequenced is an obligate psychrophile isolated from Arctic sediments. It would be prudent to include psychrophiles from other phylogenetic groups for comparison. Other representatives of the bacterioplankton community might include polar representatives of the SAR 11 group, which are abundant in polar bacterioplankton communities (Bano and Hollibaugh, 2002).
These prokaryotes may or may not be psychrophiles. A temperate representative of this group has just been cultured and is currently being sequenced. Given that SAR 11 small subunit (ssu) ribosomal ribonucleic acid (rRNA) gene sequences from polar and temperate environments are slightly but consistently different (Bano and Hollibaugh, 2002; Martinez and Valera, 2000), it is likely that polar populations differ from temperate or tropical representatives of this group. Other groups of polar prokaryotes that should be considered for sequencing include those isolated from the Siberian permafrost (Vishnivetskaya et al., 2000; Vorobyova et al., 1997) and the low-temperature Crenarchaeota that have been shown to dominate Antarctic plankton communities at times (DeLong et al., 1994; Massana et al., 1998; Murray et al., 1998). Unfortunately, the latter group of organisms does not yet have any representatives in culture. As further research unravels the ecology and physiology of polar plankton communities, other candidates for genome sequencing will become obvious.
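
The comparison of gene complements proposed above for psychrophilic and nonpsychrophilic prokaryotes can be illustrated with simple set operations. The following is only a minimal sketch: it assumes each genome has already been reduced to a set of gene-family identifiers, and the genome names and family labels are hypothetical placeholders, not data from any sequenced organism.

```python
def cold_specific_families(psychrophiles, nonpsychrophiles):
    """Gene families found in every psychrophile and in no nonpsychrophile.

    Both arguments map a genome name to a set of gene-family identifiers.
    Each mapping must contain at least one genome.
    """
    shared_by_psychrophiles = set.intersection(*psychrophiles.values())
    found_in_others = set.union(*nonpsychrophiles.values())
    return shared_by_psychrophiles - found_in_others


# Hypothetical gene-family content for four genomes (illustrative only).
psychrophiles = {
    "psychrophile_A": {"fam_core", "fam_desaturase", "fam_osmolyte"},
    "psychrophile_B": {"fam_core", "fam_desaturase", "fam_transposase"},
}
nonpsychrophiles = {
    "mesophile_A": {"fam_core", "fam_transposase"},
    "psychrotolerant_A": {"fam_core", "fam_osmolyte"},
}

candidates = cold_specific_families(psychrophiles, nonpsychrophiles)
print(sorted(candidates))
```

With real data, the strict "present in all, absent from all" rule would probably be relaxed to a frequency threshold, because incomplete assemblies and annotation errors make exact presence/absence calls unreliable.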

Cyanobacteria. Cyanobacterial mats dominated by oscillatorians are a feature of streams, lakes, and ponds in both Arctic and Antarctic regions (Vincent and Neale, 2000; Priscu et al., in press) and constitute a major component of autotrophic community biomass and productivity in these polar deserts (Priscu et al., 1998; Vezina and Vincent, 1997; Vincent et al., 1993). Surprisingly, although they are abundant in temperate and tropical oceans, marine cyanobacteria have not been found in polar waters. Although the polar freshwater ecosystems are predominantly cold, with summer temperatures rarely exceeding 0°C, most cyanobacteria isolated from these habitats are psychrotolerant and show optimal growth and photosynthesis at 15°C or higher (Fritsen and Priscu, 1998; Tang et al., 1997a). These data imply that polar cyanobacteria evolved at temperate latitudes and later colonized polar regions (Seaburg et al., 1981; Vincent and James, 1996). Recently, Nadeau and Castenholz (2000) described the first true psychrophilic strains of oscillatorians (isolated from Bratina Island, Antarctica) that have optimal growth at 8°C and cannot survive at temperatures in excess of 20°C. Related Arctic psychrophilic strains were also identified. Phylogenetic analyses of these polar isolates at the ssu rDNA level showed that the few psychrophilic oscillatorians described have arisen in one branch, whereas evolution of the psychrotolerant phenotype has occurred several times (Nadeau et al., 2001). Nadeau et al. (2001) also showed that psychrotolerant strains are most closely related to organisms of temperate latitudes. The occurrence of a shared rare 11-nt insertion in concert with phylogenetic relationships implies that psychrotolerant strains from both Arctic and Antarctic isolates originated from temperate species, whereas psychrophilic strains appear to have arisen independently.
A complete genome sequence of a psychrophilic cyanobacterium will allow scientists to establish a database for examining issues of biodiversity, biogeography, and community structure in these important polar mat-forming organisms. Such analyses may also reveal the mechanisms of temperature tolerance of the psychrotolerant species and the mechanisms of low-temperature adaptation of the psychrophilic species. Comparison of the psychrophilic Oscillatoria genome with the genomes of marine cyanobacteria already available may reveal clues as to the factors limiting distribution of the latter group. Understanding the evolutionary relationships of polar mat-forming oscillatorians may have important implications for the study of the origins of life on our planet and others (see Chapter 2). Given the important role that cyanobacteria played in the formation of atmospheric oxygen, knowledge of their phylogeny will also provide new information on the evolution of oxygenic groups and planetary geochemistry.

Protists

Polar algae. Chlamydomonas subcaudata is a green psychrophilic alga isolated from the permanently ice-covered Lake Bonney in the McMurdo Dry Valleys in Antarctica (Lizotte and Priscu, 1992). Because of the ice cover and subsequent lack of vertical mixing, the temperature, nutrient, and irradiance regime experienced by this organism in situ is extremely stable. C. subcaudata is the dominant species in the deep trophogenic zone (17-20 m) of Lake Bonney, where the average temperature and maximum irradiance during the austral summer are 4 to 6°C and 14 µmol photons m⁻² s⁻¹ (Lizotte and Priscu, 1992). Light penetrating to this depth is mostly in the blue-green wavelengths owing to differential attenuation by the ice cover.

As a psychrophile (Morgan et al., 1998), C. subcaudata exhibits unique physiological responses to low temperature when compared to temperate algae or psychrotolerant cyanobacteria isolated from the poles. When exposed to moderate irradiance (150-250 µmol m⁻² s⁻¹) and low temperatures (5-10°C), most species of temperate algae and psychrotolerant cyanobacteria show lower chlorophyll content per unit biomass, smaller amounts of photosystem II light-harvesting proteins, and increased carotenoids (Maxwell et al., 1994, 1995; Tang et al., 1997b), resulting in a visually yellow or orange color. The adjustments in pigment content and light-harvesting capabilities allow the cells to maintain balance between the light energy absorbed through photochemistry and the energy consumed through metabolism (Huner et al., 1998), and they protect the cells from photoinhibition. Unlike the other algae and cyanobacteria, C. subcaudata displayed none of these physiological characteristics when grown under moderate irradiance (150 µmol m⁻² s⁻¹) and low temperature (8°C) (T. Pocock, 2002, University of Western Ontario, personal communication). Compared to a mesophilic species C. reinhardtii, C.
subcaudata had rather low levels of photosystem I (Morgan et al., 1998), indicating adaptation to a predominantly blue-green light spectrum (Neale and Priscu, 1995). Furthermore, C. subcaudata possessed high levels of xanthophylls and low levels of β-carotene, suggesting that this phytoplankton species has efficient light harvesting but reduced photoprotective ability compared to C. reinhardtii (Neale and Priscu, 1995, 1998). Despite its constant exposure and its photoacclimation to low temperature and low irradiance, C. subcaudata retains its capacity to adjust its pigment composition via the xanthophyll cycle, thereby allowing the cell to photoacclimate to high irradiance and to resist photoinhibition (Morgan et al., 1998). Given its specific adaptation to a narrow spectral distribution and its ability to photoacclimate to low and high irradiance, genomic comparison of C. subcaudata with C. reinhardtii could further our knowledge of how

algal cells photoadapt and photoacclimate to spectral quality and quantity. C. reinhardtii, a temperate alga commonly used as a model system, is currently being sequenced at Duke University.

Phaeocystis antarctica is the primary Antarctic species within the globally important genus Phaeocystis and is one of the most important representatives of the family of Prymnesiophytes in the planktonic environment (coccolithophores are the other). Phaeocystis occurs in almost every open ocean habitat and forms large blooms in Arctic, Antarctic, and temperate waters (Smith et al., 1991). One distinctive characteristic of Phaeocystis is the formation of globular cell colonies, sometimes containing thousands of cells and attaining millimeter size.

P. antarctica has several characteristics that make it both biochemically interesting and a key organism in Antarctic biogeochemistry. P. antarctica forms dense blooms during the Southern Ocean spring-early summer in both pack ice and open water areas (e.g., Ross Sea Polynya; DiTullio et al., 2000). These blooms provide an important source of early-season organic carbon to these waters, in part because carbon exported from the blooms tends to stay in the water column (DiTullio et al., 2000). P. antarctica is also the major planktonic source of organosulfur compounds, which control atmospheric concentrations of dimethyl sulfide (DMS), a climatically important gas (Smith and DiTullio, 1995). One of the current questions in the phytoplankton ecology of the Southern Ocean is what controls the relative abundance of P. antarctica versus Antarctic diatoms, which co-occur in spring-summer blooms (Smith et al., 2000). The blooms tend to be spatially separated in the Ross Sea, but the reasons for the spatial separation are unclear.
The difference has biogeochemical significance because diatoms and Phaeocystis favor different consumer and decomposer assemblages, and diatoms are the source of most carbon buried in Southern Ocean deep-sea sediments (Arrigo et al., 2000; Nelson et al., 1996). Open ocean enrichment experiments have also revealed that diatoms and P. antarctica appear to have different responses to limitation by dissolved iron (Boyd et al., 2000).

The response of P. antarctica to ultraviolet (UV) exposure is another topic that has received attention from Antarctic investigators. Because P. antarctica blooms in the early spring, this species is exposed to solar UV during the period when the development of the ozone hole leads to a large increase in UV-B irradiance in the polar biosphere. P. antarctica, like other species of the genus, can accumulate high concentrations of the major UV-absorbing compounds, mycosporine-like amino acids (Marchant et al., 1991). These compounds accumulate in larger amounts in this alga (in proportion to other pigments) than in any other Antarctic phytoplankton, making P. antarctica a particularly attractive system to study the regulation and function of mycosporine-like amino acids (MAAs) in marine phytoplankton (Moisan and Mitchell, 2001; Riegger and Robinson, 1997). Furthermore, the extracellular release of MAAs can be studied in P. antarctica in the colonial form. MAAs are found in single cells, but they are also excreted in the colonial matrix material (Marchant et al., 1991).

Sequencing P. antarctica will define the evolutionary position of the Prymnesiophytes in general and the evolution of Phaeocystis as the only Prymnesiophyte genus common in polar waters, clarify systematics of the genus (Medlin et al., 1994), and enhance our understanding of the sulfur cycle and responses to UV radiation. The Arctic congener, P. arctica, shares most of the physiological traits of P. antarctica but is a distinct species as defined by a number of characteristics, including ssu rRNA sequence (Medlin et al., 1994). Comparing the genomes of those two organisms would provide additional insights into the factors driving phytoplankton speciation.

Metazoans

Antarctic fishes: Dissostichus mawsoni, Chaenocephalus aceratus, and D. eleginoides. Among polar organisms, the phylogenetic history of the Antarctic notothenioid fishes is, without doubt, the most complete (Chen et al., 1998; Eastman, 2000; Eastman and McCune, 2000; Ritchie et al., 1996). Living at constant extreme cold for ectothermic bony fishes required adaptive changes in their biochemical and physiological functions; thus, the notothenioids are a "swimming library" of cold-adapted genes and proteins. We have exciting glimpses of some of these changes: (1) the paradoxical loss of vital cell types, genes, and proteins, including the oxygen-binding protein hemoglobin and red blood cells in the icefish family; and (2) the evolution of novel genes that encode proteins with new functions, exemplified by the antifreeze glycoproteins (AFGPs) of most notothenioids.
Currently, laboratories throughout the world are engaged in mechanistic studies of biochemical and physiological adaptation to cold and of the gain and loss of genes, but these efforts are focused largely on discrete traits or gene families.

Sequencing the genomes of three select species of the suborder Notothenioidei (Gon and Heemstra, 1990) could enhance our understanding of environmentally driven evolutionary processes. Two of the three species are endemic to the Antarctic and the other is a cool-temperate relative: (1) the Antarctic toothfish Dissostichus mawsoni, a member of the oldest lineage (the family Nototheniidae); (2) the Antarctic blackfin icefish Chaenocephalus aceratus, a member of the most derived family (the icefishes, Channichthyidae); and (3) the Patagonian toothfish D. eleginoides (a cool-temperate congener of D. mawsoni). Comparative analyses of these

genomes should provide major insight into the progression of evolutionary events that led to the explosive diversification of the notothenioid lineage from its origin as a temperate stock. The haploid genomes of these fishes probably measure ~2 picograms (pg), or approximately two-thirds the size of the human genome. Once one fish genome has progressed sufficiently, the sequencing of subsequent species will be greatly eased by the ability to assemble onto linkage scaffolds established for the first.

Mammalian hibernators: The Arctic ground squirrel and the black bear. Several mammals overwinter in extreme conditions by entering a state of suspended animation known as hibernation (Boyer and Barnes, 1999). Although little is known about the molecular genetic events that underlie the hibernating phenotype, the interspersed phylogenetic distribution of hibernating and nonhibernating species has led to the hypothesis that rather than requiring the creation of novel gene products, hibernation results from the differential expression of existing genes. Therefore, it is possible that a small number of genetic events are necessary to acquire the ability to hibernate. The mammalian hibernator genome project would focus on sequencing the genomes of two animals that have different strategies of hibernation. The sequencing work would be complemented by studies of the patterns of tissue-specific gene expression that enable these animals to express the hibernation phenotype.

Arctic ground squirrels (Spermophilus parryii) and black bears (Ursus americanus) are suitable for elucidation of the genomic and transcriptome-level changes that support hibernation because their hibernation cycles are extremely predictable and physiological changes are so dramatic. They survive the winters of Alaska without eating or drinking for six to eight months by reversibly lowering their metabolism.
This metabolic shift has profound ramifications for every mammalian physiological system, yet there are significant differences between the hibernation characteristics of squirrels and bears. Ground squirrels reduce their body temperature as much as 40°C and attain core body temperatures near -2.8°C (Barnes, 1989). Black bears reduce their body temperature by only about 5°C. Ground squirrels lose protein and bone mass, while bears maintain both. A comparative approach to analyzing the genomes and the differences in gene expression patterns in ground squirrels and bears during hibernation will facilitate our understanding of the underlying molecular mechanisms that provide tolerance by molecules, cells, and organs to these extreme changes and will provide great potential for beneficial biomedical applications for humans. For example, understanding the molecular mechanisms of bone mass maintenance in bears may lead to therapeutic modalities that prevent osteoporosis in chronically hospitalized patients (Becker et al., 2002). During hibernation in squirrels, blood flow to the

several tissues is reduced by as much as 98 percent of normal for up to three weeks, yet no tissue damage from reduced oxygen availability occurs because the metabolic rate is similarly reduced (Boyer and Barnes, 1999). Identification of the molecular genetic mechanisms affording protection from low blood flow and reperfusion may be applied to protection from injury due to stroke and heart attacks in humans. Another potential use of data obtained in the study of hibernators could be in the development of emergency field medical protocols for inducing a state of hypometabolism in gravely injured humans, for example, soldiers wounded on the battlefield who cannot be transported rapidly to a medical center. By inducing reductions in metabolic rate and enhancing tolerance of reduced blood flow, mechanisms may be developed for sustaining life until sophisticated medical attention can be given to a patient.

Polar nematodes. Nematodes in Arctic and Antarctic soils are predators of bacteria, fungi, and other microscopic animals and can be the dominant invertebrates in some polar soil systems. They are important in soil food webs because they feed on the primary decomposers (bacteria, yeast, fungi) and influence the rates of decomposition and nutrient cycling. In the Antarctic Dry Valleys, the bacterial-feeding nematode species Scottnema lindsayae lives in the extremely dry soils in water films around soil particles. When unfavorable environmental conditions occur (such as decreasing moisture and temperature), the animals enter into a metabolic state, termed anhydrobiosis (life without water), enabling them to freeze and survive (Treonis et al., 2000). Favorable soil temperature and moisture revive the nematodes, enabling them to save energy for those periods most favorable for activity. A gene associated with anhydrobiosis has been found in temperate fungal-feeding nematodes (Browne et al., 2002).
Elucidation of the molecular mechanisms for survival of a nematode such as S. lindsayae, which occurs in the most extreme soil environment on Earth, will contribute to knowledge of developmental biology and to comparisons with the well-known model nematode, Caenorhabditis elegans, which also feeds on bacteria but is not found in polar systems (Freckman and Virginia, 1998; Riddle et al., 1997). S. lindsayae thus offers an excellent polar organism for determining and comparing genetic mechanisms of survival to those already elucidated in temperate nematodes.

Polar insects. Insects are the most common animals on Earth. Nearly 75 percent of the known species of animals are insects. Furthermore, insects live almost everywhere (except in the oceans); they thrive in the Arctic but are rare in the Antarctic.

The Arctic beetle, Cucujus clavipes, is extremely cold tolerant, with a mean lower lethal temperature of - 0C (J. Duman, unpublished observations). It occurs over a very wide latitudinal range, from Kentucky to Wiseman, Alaska (south side of the Brooks Range). This beetle winters in

several larval stages and as an adult. Generally a freeze-avoiding species, C. clavipes prevents its tissues from freezing through production of antifreeze proteins. However, the beetles sometimes winter in a freeze-tolerant state, meaning that they can freeze and survive. Clearly, the genetic mechanisms for avoidance and tolerance of freezing may be elucidated by genome analysis, most likely at the level of the transcriptome. Furthermore, Alaskan populations winter in a deep diapause state, whereas those in Indiana do not. C. clavipes, therefore, provides a model system for genetic dissection of diapause as well as survival of metazoan tissue during freezing.

Plants

Betula nana. The dwarf or bog birch is one of the most characteristic plants of the low Arctic region. It is found around the world and is the dominant plant in many areas. In other areas, such as Alaskan tussock tundra, it remains an important secondary species. Since the extent of shrub cover is critical in controlling snow distribution, Betula nana affects many aspects of the biophysics and the climate dynamics of the Arctic. In North America, it is very responsive to environmental manipulation and changes such as increased nutrients or warming. Warming experiments (in small, in situ greenhouses) can produce a small forest of Betula nana. However, in Scandinavia, the same species appears much less responsive to nutrient additions. Given the importance of Betula nana in Arctic ecology and climatology, it is important to understand its physiology and its range of responses. Furthermore, it may be important to understand the nature of genetic variation that exists around the Arctic world.

Other Considerations

The polar species cited above, based on their biology, represent examples of compelling opportunities for genome science projects. We emphasize that not all projects will require the sequencing of the complete organismal genomes.
Depending on the scientific questions and objectives, many projects may be addressed more effectively, and with greater cost efficiency, by other genome-wide methods (transcriptional profiling, protein gel profiling). Hence, it will be necessary to develop a framework for prioritization of polar organisms for full genome sequence characterization versus functional genomic profiling. Given the small sizes of most prokaryotic and many protistan genomes, they can be sequenced to completion provided that a strong scientific justification is advanced. The large genomes of metazoans and plants, by contrast, will

require careful assessment of the scientific benefit versus programmatic cost. One possible scenario for initiating a Polar Genome Science Initiative is outlined here:

1. Full-scale genome projects can be launched for one or two metazoan or plant species whose biology is well understood.
2. Meanwhile, 8-10 other animals and plants would be selected for functional genomic analysis via the construction of EST (expressed sequence tag) libraries, microarray production, and proteomic profiling.

The results of the functional genomic studies should indicate whether these genomes deserve more detailed study, and 8-10 EST/proteomic projects could be executed for the price of one complete genome project. Gene expression or protein turnover profiles exhibiting potentially "adaptive" features would argue for advanced analysis of the appropriate genomic regions, which could be cloned out of BAC, PAC, or YAC libraries (see below). Furthermore, the development of specific hypotheses based on the functional genomic approach would naturally define the appropriate comparative taxa while generating economy in focusing the work.

The importance of "testing" putative environmental adaptations within the genes and genomes of polar organisms by comparison to phylogenetically related, but temperate, species (criterion 3) cannot be overemphasized. A Polar Genome Science Initiative, by its very nature, will require a strong comparative genomic component, and suggestions for appropriate species comparison are given in previous sections. Finally, as the initially exploratory phase of these genomic projects proceeds, we anticipate that each will transition to directed, hypothesis-driven research based on the discoveries made in the first phase.
Rigorous analysis of adaptation using approaches such as phylogenetically independent contrasts (Felsenstein, 1989) will be necessary to distinguish adaptive variation from the influences of ancestry.

Because the generation times of most polar organisms are so long, none are likely to be developed into genetic "model organisms." Thus, the functional attributes and/or biotechnological potential of a gene obtained from a polar species must be assessed by reverse genetic strategies (e.g., manipulated expression of the gene by antisense morpholino RNA oligonucleotide "knockdown" [Nasevicius and Ekker, 2000] or by gene transfer methods, etc.), perhaps conducted in the organism itself or, more likely, in conventional model systems amenable to such approaches (e.g., various bacterial species, the plant Arabidopsis thaliana, the nematode Caenorhabditis elegans, the zebrafish Danio rerio, or the mouse Mus musculus).
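
The phylogenetically independent contrasts mentioned above can be illustrated on a toy tree. The sketch below follows the commonly described pruning procedure under a Brownian-motion model of trait evolution; the four taxa, their branch lengths, and their trait values (for example, an optimal growth temperature) are invented for illustration, and the tuple-based tree encoding is simply a convenience assumed here.

```python
import math


def independent_contrasts(node, traits):
    """Return (estimated value, adjusted branch length, contrasts) for a subtree.

    A leaf is ("leaf", name, branch_length).
    An internal node is ("node", left, right, branch_length).
    """
    if node[0] == "leaf":
        _, name, length = node
        return traits[name], length, []
    _, left, right, length = node
    x1, v1, c1 = independent_contrasts(left, traits)
    x2, v2, c2 = independent_contrasts(right, traits)
    # Standardized contrast between the two daughter lineages.
    contrast = (x1 - x2) / math.sqrt(v1 + v2)
    # Ancestral estimate: average weighted by inverse branch length.
    value = (x1 / v1 + x2 / v2) / (1 / v1 + 1 / v2)
    # Lengthen the branch below this node to reflect estimation uncertainty.
    adjusted = length + v1 * v2 / (v1 + v2)
    return value, adjusted, c1 + c2 + [contrast]


# Hypothetical tree ((polar_A, polar_B), (temperate_C, temperate_D)).
tree = (
    "node",
    ("node", ("leaf", "polar_A", 1.0), ("leaf", "polar_B", 1.0), 1.0),
    ("node", ("leaf", "temperate_C", 1.0), ("leaf", "temperate_D", 1.0), 1.0),
    0.0,
)
growth_optimum = {"polar_A": 4.0, "polar_B": 6.0, "temperate_C": 20.0, "temperate_D": 24.0}

_, _, contrasts = independent_contrasts(tree, growth_optimum)
print(contrasts)
```

A tree with n tips yields n - 1 contrasts. In a real analysis, contrasts for two traits would be computed on the same tree and then compared, typically by regression through the origin, so that shared ancestry is not mistaken for adaptation.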

... and the fragments are assembled computationally into scaffolds (cf. the Fugu genome; Aparicio et al., 2002).

High-Throughput Sequencing of Genomic DNA and Expressed Genes

The sine qua non of a genome project is the ability to sequence rapidly the fragments of a genome, whether large or small, with sufficient redundancy (six- to tenfold coverage) to reduce the error rate to between 1 in 1,000 nucleotides (a "rough-cut" genome) and 1 in 10,000 (a "polished" genome). Today, the most common sequencing method is based on automation of the Sanger dideoxynucleotide chain termination protocol (Sanger et al., 1977). If large-insert clones have been used to establish the physical map, then the clones must be subdivided to produce pieces (~1-2 kb) amenable to sequencing. Thus, one advantage of the shotgun mapping strategy is that libraries of short fragments are the starting point. Once sufficient numbers of sequenced fragments have been obtained, they are ordered into contigs and the contigs into larger "scaffolds" of genome sequence. The sequences of expressed genes (for example, cDNAs, or complementary copies of messenger RNAs [mRNA]) are also incorporated into the assembly because they help to define the intron and exon boundaries of the genes in the genome. Irrespective of effort, some genomic regions will be refractory or "unsequenceable." Often these regions have a biased, high guanine-cytosine (GC) content or consist of short repetitive elements that are difficult to resolve. Whereas microbial genomes are generally sequenced to completion, the "finished" genomes of eukaryotes will normally contain gaps.

A major consideration for any genome project, such as the Polar Genome Science Initiative contemplated here, is the cost of the sequencing itself as well as the computational power required to assemble the genome.
Fortunately, new sequencing technologies promise to reduce costs to levels unimaginable at the start of the public and private sector human genome projects. The National Human Genome Research Institute has just funded GenomeVision to reduce the costs of large-scale gene sequencing projects by five- to tenfold through miniaturization over the next two years (GenomeWeb staff, 2002). Many alternative technologies are being developed that should increase the speed and accuracy of sequencing while lowering costs (Lakhman, 2002; McGowan, 2002a). Thus, the Polar Genome Science Initiative is not only intellectually compelling but also eminently practical and affordable.
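
The relationship between sequencing redundancy and sequencing effort described in this section can be sketched numerically. The Python fragment below is a minimal illustration and not part of the committee's analysis. It assumes an idealized Lander-Waterman (Poisson) model in which reads fall uniformly at random on the genome; the genome size and read length are hypothetical values chosen only for the example. Under that model the average coverage is c = N x L / G (N reads of length L on a genome of G bases), and the expected fraction of bases covered by no read is exp(-c). The model does not describe the per-base error rates quoted above, nor the repetitive or GC-rich regions that resist sequencing.

```python
import math


def reads_needed(genome_size_bp, read_length_bp, coverage):
    """Number of reads needed to reach a target average coverage (c = N * L / G)."""
    return math.ceil(coverage * genome_size_bp / read_length_bp)


def expected_uncovered_fraction(coverage):
    """Poisson estimate of the fraction of bases that no read covers."""
    return math.exp(-coverage)


if __name__ == "__main__":
    genome_size = 400_000_000  # hypothetical genome size in bp (assumption)
    read_length = 600          # assumed usable read length in bp (assumption)
    for c in (6, 10):          # the six- to tenfold range mentioned in the text
        n = reads_needed(genome_size, read_length, c)
        gap = expected_uncovered_fraction(c)
        print(f"{c}x coverage: {n:,} reads, uncovered fraction ~{gap:.2e}")
```

Even under these optimistic assumptions, moving from sixfold to tenfold coverage adds two thirds more reads, which is one reason sequencing cost dominates project planning.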

Gene Identification and Annotation

The genome of the pufferfish, Fugu rubripes, is estimated to contain ~31,000 genes, or roughly the same number as current estimates of the human genome (Aparicio et al., 2002). Of predicted human proteins, ~75 percent are orthologous to pufferfish proteins, whereas the remaining 25 percent either are highly divergent or are not encoded by the fish genome. This comparison emphasizes that gene prediction must be pursued both by orthology and by use of ab initio gene prediction tools. Following identification, genes must be annotated with data regarding presumptive function, pattern of expression, and putative orthologues found in other genomes.

Population Analysis with Single-Nucleotide Polymorphisms

Generating a genome sequence based on one, or at most a few, individuals of a species represents merely a beginning for population biologists. Natural variation among genomes in a population is the "stuff" of phenotypic variation and evolutionary speciation. It is generally assumed that single-nucleotide polymorphisms (SNPs) and insertions/deletions (indels) are responsible for quantitative variation in phenotypic traits. SNPs may be used to track gene flow between separate populations of a species, and the absence of such gene flow signals that the populations are stratified and perhaps in the process of speciating due to ecological, geographic, or behavioral factors. Because DNA-sequencing costs are declining rapidly, the identification of robust "SNP libraries" for population analysis of multiple species is a realistic goal. The capacity to compare distinct populations of a polar species and to monitor community relationships between interacting species using SNP technology promises to revolutionize polar ecology.
Some potential applications (Gibson and Muse, 2002) include:

• inference of the demographic history of populations;
• analysis of mating systems;
• conservation biology, including the population forensics of commercially exploited species;
• analysis of breeding structure and dispersal of soil microorganisms, nematodes, and so forth; and
• timing the establishment of host-symbiont and host/parasite associations.
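
One common way to quantify the population stratification discussed in this section is Wright's fixation index, F_ST = (H_T - H_S) / H_T, where H_S is the mean expected heterozygosity within populations and H_T is the expected heterozygosity of the pooled population. The Python sketch below is illustrative only and is not drawn from the report: the allele frequencies are invented, the two populations are weighted equally, and no correction for sample size is applied.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1-p) at a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst(p1, p2):
    """Wright's F_ST = (H_T - H_S) / H_T for two equally weighted populations."""
    p_bar = (p1 + p2) / 2.0
    h_t = expected_heterozygosity(p_bar)
    h_s = (expected_heterozygosity(p1) + expected_heterozygosity(p2)) / 2.0
    if h_t == 0.0:
        return 0.0  # locus is fixed for the same allele in both populations
    return (h_t - h_s) / h_t


if __name__ == "__main__":
    # Hypothetical allele frequencies at three SNPs in two populations.
    snps = {"snp_a": (0.50, 0.50), "snp_b": (0.20, 0.80), "snp_c": (1.00, 0.00)}
    for name, (p1, p2) in snps.items():
        print(f"{name}: F_ST = {fst(p1, p2):.2f}")
```

Values near 0 are consistent with ongoing gene flow at that locus, whereas values near 1 indicate populations fixed for different alleles.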

Web-Based Databases and Interfaces for Data Management and Comparative Genomics

Genome projects produce massive amounts of sequence data and annotated information. These data must be made available to the wider biological research community by creation of appropriate relational databases and web-based interfaces. Indeed, the ability to compare genomes will speed our understanding of genome evolution and the phylogenetic relationships of all organisms, whether polar or temperate. A comprehensive Polar Genome Science Initiative must make provision for creation, curation, validation, and management of these databases and for the bioinformatics tools necessary for insightful genome analyses.

Transcriptome Analysis

As emphasized at several junctures in this report, the ability to qualitatively and quantitatively describe the transcriptome opens up a number of new avenues for investigating polar organisms. All taxa can be examined through transcriptome analysis, and studies can involve complex microbial consortia as well as individual animals or plants and tissues thereof. A primary use of transcriptome analysis is to study how environmental factors, both singly and in combination, influence patterns of gene expression. The environmental factors of interest comprise natural variables such as temperature and UV radiation and anthropogenic factors such as organic and heavy metal pollutants. Transcriptome analysis can provide a "snapshot" of the organism's status in terms of gene expression and makes it possible to follow the time course of organismal responses to environmental change.

Although the use of DNA microarrays for examining organisms' transcriptional responses to the environment is in its infancy, there are several indications of how promising this approach can be for probing the effects of environmental change.
Studies of yeast have shown that a characteristic set of stress-related genes is activated upon exposure of the cells to a variety of stresses (anoxia, temperature, alcohols, and so on) (Causton et al., 2001; Gasch et al., 2000). Stress-specific alterations in gene expression were also catalogued in yeast. This technology is becoming accessible to scientists interested in all types of organisms, from model systems to species for which no sequencing of the genome has been done (Pennisi, 2002).

The fabrication of DNA microarrays for transcriptome analysis can involve a number of experimental strategies. For organisms having a fully sequenced and well-annotated genome, DNA "microarrays" fabricated with specific oligonucleotide probes for the genes (mRNAs) of interest can be built. Customized "microarrays" are available from a number of commercial sources, and this type of commercially produced technology will certainly become increasingly available for transcriptome analysis of many different organisms. In the case of species for which sequence information is limited or even entirely lacking, the construction of DNA microarrays must follow a different strategy. To construct microarrays for non-sequenced species, strategies such as that described by Gracey et al. (2001) are likely to be effective. Through construction of subtracted and normalized cDNA libraries, thousands of different cDNAs for spotting onto microarrays can be obtained. Through iterative analysis of these microarrays, one can screen the cDNA libraries to obtain thousands of unique cDNAs with minimal redundancy. Techniques are also well developed for selecting for full-length cDNAs so as to increase the utility of the cDNAs produced in microarray studies. Although DNA microarrays fabricated for "nonmodel" organisms offer an effective means for screening changes in gene expression, they have two key limitations. One stems from the fact that these arrays contain an incomplete representation of the genome. The second is that the absence of extensive sequence information limits the identification of many expressed genes. Also, the usefulness of an array constructed for one species in the study of another remains to be determined. This is an important question for future work. Despite their limitations, DNA microarrays for "nonmodel" species offer a powerful tool for analyzing the effects of environmental factors on gene expression.

Proteome Analysis

Changes in the transcriptome do not map one to one with changes in the proteome (Fiehn, 2001; Phelps et al., 2002).
Thus, depending on the goals of a study, analysis of the transcriptome may serve as only an initial step in the study of how environmental changes affect the phenotype. Proteomics is a powerful approach that allows one to characterize the suite of proteins present in a cell or tissue. The applicability of proteomic methodologies to the study of polar organisms, for which large amounts of DNA and protein sequence data are not available, appears promising for several reasons. First and foremost, the conservation found in the sequences of orthologous proteins facilitates identification of proteins from genetically uncharacterized species. Second, as more and more genomes are sequenced and increasing amounts of information are obtained on the deduced amino acid sequences of orthologous proteins, proteomic analysis of nonsequenced organisms will become increasingly feasible. Targeted proteomics, in which only a minor fraction of the proteome is analyzed, for example, using antibody methods, may be the most suitable strategy for screening changes in the levels of proteins that are of interest in a particular physiological context. Analysis of the transcriptome may point to the set of proteins that are most important for proteomic analysis.

Metabolome Analysis

Characterization of the composition of metabolites in the cell (metabolomics) carries analysis one step closer to actual physiological activity (Phelps et al., 2002). Metabolome analysis allows the charting of the types of substrates, end products, and biosynthetic intermediates found in the cell. Through appropriate coupling of analytical technologies, metabolomic approaches can be quantitative as well as qualitative. With the advent of protocols in which magnetic resonance spectroscopy is coupled with effective separation techniques and mass spectrometry, identification and quantification of virtually all organic molecules in the cell is becoming possible (Fiehn, 2001).

Like the analyses of the transcriptome and the proteome, characterization of the metabolome offers enormous potential for discerning the effects of environmental factors on organismal function. Similar to transcriptome and proteome analyses, metabolomic methods can be applied to any type of organism and to different cell types and tissues within an individual.

Ecogenome Analysis

Ecogenomics, the use of genome science to study ecology, has great potential to advance our understanding of microbial ecology (Stahl and Tiedje, 2002; Torsvik and Ovreas, 2002). One promising approach is metagenome analysis (Rondon et al., 2000). This approach is based on the same technology that is used to sequence the whole genome of specific organisms, but it is applied to entire microbial communities. In these analyses, large DNA fragments are extracted directly from microbial communities, large extents of sequence are determined, and the sequences are partially analyzed.
In theory, the whole genomes of members of the community sampled can be assembled. From metagenome analysis, the following information (at a minimum) can be gleaned: phylogenetic composition of the sample, variability of recognizable functional genes, association of functional genes with a phylotype, indications of new and unsuspected functions, dosage of a particular gene in a chromosome or contig of interest, and insights into the regulation of gene expression. One important task for investigators of ecogenomics is to develop means for studying both culturable and unculturable (at least by present technologies) species, the latter often representing >99 percent of a microbial community. Thus, microarrays developed for examining community structure and function must include probes from both culturable and unculturable species. Although metagenome analysis is still in the development and testing stage, it holds great promise for providing a new, integrated view of the phylogenetic composition of microbial communities and of their functional capabilities.

To date, only a few studies have employed metagenomic analyses (Béjà et al., 2000; Rondon et al., 2000). Ambitious plans for more such studies have been announced (McGowan, 2002b). Perhaps the greatest potential of such studies lies in their ability to address the critical need to relate microbial phylogeny to function. To some extent, this has been and can further be accomplished by determining linkages between "functional genes" and indicators of phylogeny such as rRNA genes.

The metagenome approach may be particularly appropriate for polar problems. First, contig assembly is simplified if simple rather than complex communities are studied. Simple communities may be expected in some of the more extreme polar environments, for example, wintertime sea-ice communities, Dry Valley soils (or their Arctic equivalent), lake ice bubbles, and possibly subglacial lakes (see Chapter 2). Second, the metagenome approach could yield information about the composition and functioning of microbial communities that are particularly difficult to sample without disturbance or that are not amenable to experimental manipulation, such as subglacial lakes and sea-ice microbial communities.

By analogy to single-organism genomics, ecogenomics must make the transition from sequencing and annotating metagenomic data to functional analyses. By further analogy to single-organism genomics, several genomic approaches appear to hold promise for functional ecogenomics. "Environmental microarrays" have several potential applications. Measurement of the dynamics of large numbers of individual populations may be possible using probes for indicators of phylogeny. Population dynamics can then be related to environmental data. Similarly, estimation of the abundance of large numbers of genes with known function in communities allows population studies of microbial guilds (populations sharing a common function in a community). Furthermore, using environmental microarrays, it may be possible to do transcriptional analysis, permitting estimates of in situ activity of functional guilds. Complementary to transcriptional analysis, "environmental proteomics" may provide an additional approach to estimating the in situ activity of guilds. Moreover, "environmental metabolomics" may provide a third approach for estimating in situ activities of guilds, which would not be limited by our genetic knowledge of the organisms in a community. These functional genomic approaches are applications based largely upon metagenomic analyses and should be closely coordinated with the latter analyses. Relevant metagenomic data should be readily available to facilitate the application of the functional genomic approaches. Finally, bioinformatic approaches may determine relationships between phylogenetic groups and measurable functions, particularly if a common database is established relating ecogenomic data to phenotypic, geographic, and environmental data (Stahl and Tiedje, 2002).

All of these functional ecogenomic approaches involve severe technical challenges, and none has yet been satisfactorily demonstrated. Several groups are actively developing various types of environmental microarrays (Guschin et al., 1997; Small et al., 2001; Wu et al., 2001). Progress has also been made in environmental transcriptional analysis (Bakermans and Madsen, 2002; Miller et al., 1999; Park et al., 2002). Environmental proteomics and metabolomics are currently hypothetical approaches. All of these approaches must address the extreme complexity of DNA, RNA, and proteins in most environments, which tends to increase detection limits and decrease specificity of analyses. Another problem common to environmental samples is the complexity of the sample matrix, which can limit analyte recovery and interfere with analyses. Despite the challenges, the great potential of these ecogenomic approaches merits exploration. Leading microbial ecologists have endorsed ecogenomic research (Stahl and Tiedje, 2002), and application of ecogenomics to marine microbial ecology has been recommended in a previous report (NSF, 2000). Ecogenomics has great potential for addressing some of the research questions in polar biology outlined in the previous chapter.
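
The "phylogenetic composition of the sample" listed among the outputs of metagenome analysis can be made concrete with a small sketch. The Python fragment below is an illustration under stated assumptions, not a method taken from the report: it supposes that each sequenced fragment has already been assigned to a phylotype (for example, through an rRNA gene it carries), uses invented phylotype labels, and summarizes the community by relative abundance and by the Shannon diversity index H = -sum(p_i ln p_i). A low H would be expected for the simple communities of extreme polar environments discussed in this section.

```python
import math
from collections import Counter


def relative_abundance(assignments):
    """Fraction of fragments assigned to each phylotype."""
    counts = Counter(assignments)
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def shannon_index(abundances):
    """Shannon diversity H = -sum(p * ln p) over phylotypes with p > 0."""
    return -sum(p * math.log(p) for p in abundances.values() if p > 0)


if __name__ == "__main__":
    # Hypothetical phylotype assignments for fragments from one sample.
    fragments = ["phylotype_A"] * 70 + ["phylotype_B"] * 25 + ["phylotype_C"] * 5
    composition = relative_abundance(fragments)
    for taxon, share in sorted(composition.items()):
        print(f"{taxon}: {share:.2f}")
    print(f"Shannon index H = {shannon_index(composition):.3f}")
```

The same summary could be tracked across sampling dates and related to environmental data, in the spirit of the population-dynamics applications described for environmental microarrays.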
Impediments to the Study of the Transcriptomes, Proteomes, and Metabolomes of Polar Species

Implementation of the study of transcriptomes, proteomes, and metabolomes of polar organisms faces a number of challenges, most of which are common to all three types of "-omic" analysis. In each case, the equipment needed to conduct this research is expensive and requires skilled hands for its operation. In the case of transcriptome studies, the equipment needed to fabricate and analyze DNA microarrays (for instance, robotic apparatus for spotting DNA onto slides and for handling large numbers of liquid samples) is very costly, and it is not likely that all research centers will be able to acquire this equipment. Therefore, efforts should be made to provide access to the technology needed for transcriptome analysis for scientists working at sites where technology shortfalls exist. Identical arguments apply in the case of the equipment required for proteomic and metabolomic studies. When the required equipment is present at a center, it is likely to be housed in a central facility for use by multiple investigators. Technical support for equipment operation and maintenance will likely be required at these centers. These technological demands for metabolomic and proteomic studies apply not only to polar science but to other bioscience disciplines as well. Support for a central metabolomic and proteomic center by funding agencies will benefit a broad community of investigators, including polar biologists.

Large amounts of DNA sequencing accompany transcriptome analysis, and facilities for this purpose must be available to investigators. For DNA microarrays spotted with uncharacterized cDNAs, the spots that exhibit interesting patterns of up- or down-regulation must be sequenced to identify the genes undergoing shifts in expression. Sequencing may also precede the construction of microarrays to enable genes of interest to be included on the arrays. Sufficient support for DNA sequencing at research universities may already exist in most cases, so hurdles to implementation posed by sequencing capacity may be relatively small. Furthermore, where sequencing potential is not found, investigators may be able to farm out the needed sequencing to commercial firms or to universities that perform sequencing on a recharge basis.

A final aspect of "-omic" research that merits emphasis is the likelihood that most aspects of these studies will be difficult, if not impossible, to carry out at remote field sites. It seems impractical, for example, to site sophisticated robotic systems for preparing DNA microarrays or equipment for mass spectrometric or magnetic resonance experiments in proteomics and metabolomics at field sites.
Instead, what should be guaranteed to investigators is the technology needed for sample preservation (for example, liquid nitrogen or dry ice) and reliable transportation of samples from the field back to the home laboratory where the sophisticated "-omics" analysis will be conducted.

Bioinformatic Tools and Databases

Another common requirement of "-omics" research is expertise in bioinformatics. The software needed to organize and to analyze the huge sets of data generated in all types of "-omic" studies is often available on web sites at no cost to the user. However, given the interest in looking for mechanisms of environmental adaptation in a given polar species' genome sequence, current tools are not likely to be appropriate for the task. An initiative in polar environmental genomics requires the design and development of specific bioinformatics tools that would search for sequence data that would support or refute hypotheses regarding adaptive processes. Therefore, collaborations between polar biologists and computational biologists and bioinformaticists are essential and should be encouraged.

Fellowships may be set up to fund students and postdoctoral researchers in the computer and mathematics departments to participate in polar genome studies.

As genomics-based efforts in polar biology research expand, sufficient attention must be devoted to the bioinformatics issues related to the distribution and use of the information. Genome sequence data alone are relatively easy to access; however, the diversity of polar organisms being considered for whole-genome analysis (microbial species and various eukaryotes and metazoa), together with the desire to link genome sequence data to geographical and environmental (geochemistry and climatic) data and temporal data, presents a much greater bioinformatics challenge. This type of data sharing and linking requires the development of a fully integrated database for which no robust model yet exists. Moreover, investigators must be willing to agree on a set of standards for data format and data sharing. Such large matrix arrays of data can only be fully analyzed using network structure models. Maintenance of such databases and development of network analysis tools would require long-term funding, quite likely from multiple funding agencies. The need to develop integrated databases to link genome sequence, function, ecological, climatic and geographical, and temporal data is not unique to the polar research community. The polar research community should become actively involved in the ongoing database discussions that are taking place.

CREATION OF A POLAR GENOME SCIENCE INITIATIVE

Given the great potential of genomic science to address important new research questions in polar regions, some special effort to facilitate and guide these activities is justified. The goal of such an initiative would be to gather talent to work on these problems in an efficient and coordinated manner.
One option for a Polar Genome Science Initiative would be for the community to form some kind of virtual steering committee or core group to provide leadership, perhaps based on the model offered by the International Arctic Polynya Programme (IAPP) of the Arctic Ocean Sciences Board (AOSB). The AOSB is a nongovernmental body that includes members and participants from research and governmental institutions. Its long-term mission is to facilitate Arctic Ocean research by supporting multinational and multidisciplinary natural science and engineering programs. In doing so, it encourages communication, promotes information exchange, and facilitates discussions of needs and priorities. The IAPP Science Coordinating Group comprises volunteer scientists who serve to define the scientific needs and to coordinate the execution of research, and the AOSB serves primarily to facilitate discussion and build networking opportunities. Part of the success of AOSB is the ability to build international cooperation on what is an inherently international topic, the Arctic Ocean. AOSB members come from Canada, Denmark, Finland, France, Germany, Iceland, Japan, the Netherlands, Norway, Poland, Russia, Sweden, Switzerland, the United Kingdom, and the United States of America. Although this international emphasis is not necessary for the Polar Genome Science Initiative, the model of an informal collaborative body might be useful.

The main advantage of this approach is that it is relatively inexpensive (although there are still costs associated with supporting a secretariat and quality web presence, and there are costs incurred directly by each participant for travel). The main disadvantage is that this approach may not necessarily be able to facilitate implementation of the steering committee or core group's thinking without the ability to leverage its ideas and plans (with funds) into activities. It requires a significant amount of effort from its volunteer participants, so a core group of truly interested leaders must emerge and be active if it is to make progress. If the Polar Genome Science Initiative is modeled after AOSB's IAPP within the United States, concerns might be raised as to why this informal group has the credibility to "speak" for the discipline. (For more information on the AOSB, see .)

In the committee's opinion, a more effective approach for a Polar Genome Science Initiative would be for the National Science Foundation (NSF) to consider this recommendation as a priority area, providing targeted funding and facilitating establishment of a Science Steering Committee to lead the planning. This approach might be modeled after the Arctic System Science Program's (ARCSS) Ocean-Atmosphere-Ice Interaction program (or similar ARCSS programs).
The Steering Committee of the Polar Genome Science Initiative would include representatives of the relevant biological communities and would meet periodically to do strategic planning, set research priorities, discuss needs and how they might be met, solicit further input from the broad biological community, and encourage coordination and communication.

Using this approach to implement a comprehensive, coordinated Polar Genome Science Initiative would generate synergies of effort that would maximize scientific output while minimizing the resources required. Under this approach, the Scientific Steering Committee would establish priorities and coordinate large-scale efforts for genome-enabled polar science (for example, genome sequencing, transcriptome analysis, coordinated bioinformatics databases). There would be no immediate need for new facilities or capabilities; instead, the initiative would support "virtual polar" genome science centers, recruited by NSF from the many extant genome centers, to provide the equipment and expertise necessary to support the polar biological community. Other advantages of this strategy include:

• pooling of resources for analysis of multiple genomes, with concomitant economies of scale;
• division of labor to enhance the efficiency of the scientific return;
• coordination of community efforts to avoid unnecessary duplication of research; and
• provision of uniform databases that facilitate cross-organismal and interpolar genomic comparisons.

The committee believes that this approach is the most effective way to move forward, and the Arabidopsis Genome Initiative (see Chapter 5) shows that a well-planned effort can actually finish ahead of schedule if tasks are delegated effectively. This approach also makes it easy for new scientists to participate, because there is a clearly articulated way to engage the process, make contacts within the network, locate information, and seek potential research partners from other fields. This approach could be designed to encourage partnerships between universities and the private sector. The main disadvantage of this approach is that it requires new funding; the polar science community would not likely support it if it meant taking funds away from existing initiatives.

The committee believes that NSF is well positioned to be the lead in the Polar Genome Science Initiative. NSF is the nation's preeminent organization for the support of basic science and the one government agency with the scope and expertise to foster this type of effort in polar science. NSF is a major supporter of research in the Arctic and the key provider of support (with minor other inputs from NASA and others) for activities in the Antarctic, so it is already the acknowledged leader in advancing polar science.
In addition, NSF has been funding genomic and integrative biological research through its current programs on Frontiers in Integrative Biological Research (FIBR) and Genome-Enabled Environmental Sciences and Engineering (GEN-EN). However, the suggested Polar Genome Science Initiative is a large-scale research effort that aims to facilitate the application of genome research in the polar regions, coordinate the sequencing efforts of polar organisms, and encourage collaboration between polar and nonpolar scientists. It is beyond the scope of FIBR and GEN-EN. Together, its Office of Polar Programs and Directorate for Biological Sciences already have the expertise necessary to start and manage this kind of initiative, and they have contact with relevant communities to facilitate the transfer of knowledge that would be a key component of the initiative.