The New Science of Metagenomics: Revealing the Secrets of Our Microbial Planet
Examples of Questions That Illustrate the Utility of Metadata
Do microbial gene richness and evenness patterns (at some specific sampling density) correlate with other environmental characteristics?
Which microbial phylotypes or functional guilds co-occur with high statistical probability in different environments?
Do specific phylotypes track particular geographic or physico-chemical clines (latitudes, isotherms, isopycnals, and so on)?
Do specific microbial community open reading frames (functionally identified or not) track specific bioenergetic gradients (solar, geothermal, digestive tracts, and so on)?
How does the percentage of genes with a given role vary as a function of some physical feature of the sample sites, such as their average temperature?
Do microbial community protein families, amino acid content, or sequence motifs vary systematically as a function of habitat of origin?
Are specific protein sequence motifs characteristic of specific habitats?
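The first question in the box can be made concrete with a small calculation. The Python sketch below is illustrative only. The sample names, gene counts, and temperatures are invented toy values. Pielou's evenness and a Pearson correlation are one common choice among several, and the helper names (`richness`, `pielou_evenness`, `pearson`) are hypothetical, not an established tool. It also assumes every sample has already been subsampled to the same depth, which stands in for the box's "specific sampling density."

```python
import math

def richness(counts):
    """Number of distinct genes with a nonzero count in one sample."""
    return sum(1 for c in counts.values() if c > 0)

def pielou_evenness(counts):
    """Pielou's J = H' / ln(S): Shannon diversity H' over its maximum for richness S."""
    observed = [c for c in counts.values() if c > 0]
    s = len(observed)
    if s < 2:
        return 0.0  # J is undefined for S < 2; this sketch reports 0.0 by convention
    total = sum(observed)
    h = -sum((c / total) * math.log(c / total) for c in observed)
    return h / math.log(s)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical toy data, not real measurements: gene counts per sample, already
# subsampled to the same depth (40 reads each), plus one metadata field per sample.
samples = {
    "site_A": {"counts": {"g1": 10, "g2": 10, "g3": 10, "g4": 10}, "temp_c": 12.0},
    "site_B": {"counts": {"g1": 25, "g2": 10, "g3": 5}, "temp_c": 25.0},
    "site_C": {"counts": {"g1": 38, "g2": 2}, "temp_c": 60.0},
}

names = sorted(samples)
temps = [samples[name]["temp_c"] for name in names]
rich = [richness(samples[name]["counts"]) for name in names]
even = [pielou_evenness(samples[name]["counts"]) for name in names]

for name, t, r, e in zip(names, temps, rich, even):
    print(f"{name}: temp={t:.1f} C  richness={r}  evenness={e:.3f}")
print(f"r(temperature, richness) = {pearson(temps, rich):.3f}")
print(f"r(temperature, evenness) = {pearson(temps, even):.3f}")
```

With only three samples, a correlation coefficient carries almost no statistical weight. A real analysis would use many samples, test significance, and might prefer a rank-based measure. The point is only that the calculation cannot be done at all unless the temperature metadata travels with each sample.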
annotations are not updated and thus become inconsistent with new ones. Although there is now a mechanism (called third-party annotation¹⁸) for the community to annotate genomic sequences in GenBank, EMBL-Bank, and DDBJ, the original authors’ annotations, even if outdated, remain the primary annotations seen in the database. For instance, annotations added through curation at the appropriate model organism database are being incorporated into the central databases only very slowly. In metagenomics projects, where many types of annotation become possible only after other groups collect additional data (or metadata), an annotation database must be able to accept both individual and large-scale (computational) annotations of metagenomic data and integrate them transparently for its user communities. The need for dynamic and flexible annotation will require ongoing, professional curation; this is another reason that long-term database funding will be important.
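The requirement described above is a database that keeps the original authors' annotations visible while also accepting individual third-party annotations and bulk computational ones, all transparently. One way to picture it is an append-only record per sequence feature, sketched below in Python. Everything here is hypothetical: `Annotation`, `AnnotationStore`, the three annotation kinds, the identifiers, and the "latest wins" display rule are illustrative assumptions, not the data model of GenBank, EMBL-Bank, DDBJ, or any third-party annotation system.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    feature_id: str  # identifier of an annotated feature, e.g. an ORF (hypothetical)
    label: str       # the functional assignment being asserted
    source: str      # who asserted it: submitter, curator, or pipeline name
    kind: str        # "primary", "third_party", or "computational"
    version: int     # position in this feature's annotation history (1-based)

class AnnotationStore:
    """Toy store that never overwrites annotations; it only appends to a history."""

    KINDS = {"primary", "third_party", "computational"}

    def __init__(self):
        self._records = {}  # feature_id -> list of Annotation, oldest first

    def add(self, feature_id, label, source, kind):
        if kind not in self.KINDS:
            raise ValueError(f"unknown annotation kind: {kind}")
        history = self._records.setdefault(feature_id, [])
        if kind == "primary" and any(a.kind == "primary" for a in history):
            raise ValueError(f"{feature_id} already has a primary annotation")
        ann = Annotation(feature_id, label, source, kind, len(history) + 1)
        history.append(ann)
        return ann

    def add_many(self, rows, source):
        """Bulk-load (feature_id, label) pairs from one computational pipeline run."""
        return [self.add(fid, label, source, "computational") for fid, label in rows]

    def primary(self, feature_id):
        """The original submitter's annotation, preserved even if outdated (or None)."""
        return next((a for a in self._records.get(feature_id, []) if a.kind == "primary"), None)

    def latest(self, feature_id):
        """Most recent annotation of any kind (an assumed, deliberately simple rule)."""
        return self._records[feature_id][-1]

    def history(self, feature_id):
        """Full, ordered provenance trail so users can see every assertion."""
        return list(self._records.get(feature_id, []))

store = AnnotationStore()
store.add("orf_0001", "hypothetical protein", "original submitter", "primary")
store.add_many([("orf_0001", "putative glycoside hydrolase")], source="pipeline_v2")
store.add("orf_0001", "beta-glucosidase", "community curator", "third_party")

print("primary:", store.primary("orf_0001").label)
print("latest: ", store.latest("orf_0001").label, "from", store.latest("orf_0001").source)
for a in store.history("orf_0001"):
    print(a.version, a.kind, a.source, a.label)
```

Keeping the history append-only is what makes the integration transparent: nothing asserted earlier is lost, and each label carries its source. Which annotation to display by default is a curation policy decision, which is why the sketch keeps that rule deliberately simple.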
The generation of metagenomics data will present the scientific community with a number of challenges. Many of these challenges will require a high degree of community organization and collaboration. Given the wide array of microbial communities that will be studied (from toxic waste sites to agricultural soil to the human mouth), the interested scientific community will be extremely diverse, and coordination will be