B
Challenge Problems in Bioinformatics and Computational Biology from Other Reports

B.1 GRAND CHALLENGES IN COMPUTATIONAL BIOLOGY (David Searls)1

  1. Protein structure prediction

  2. Homology searches

  3. Multiple alignment and phylogeny construction

  4. Genomic sequence analysis and gene-finding

B.2 OPPORTUNITIES IN MOLECULAR BIOMEDICINE IN THE ERA OF TERAFLOP COMPUTING (Klaus Schulten et al.)2

  1. Study protein-protein and protein-nucleic acid recognition and assembly

  2. Investigate integral functional units (dynamic form and function of large macromolecular and supramolecular complexes)

  3. Bridge the gap between computationally feasible and functionally relevant time scales

  4. Improve multiresolution structure prediction

  5. Combine classical molecular dynamics simulations with quantum chemical forces

  6. Sample larger sets of dynamical events and chemical species

  7. Realize interactive modeling

  8. Foster the development of biomolecular modeling and bioinformatics

  9. Train computational biologists in teraflop technologies, numerical algorithms, and physical concepts

  10. Bring experimental and computational groups in molecular biomedicine closer together.

1  D. Searls, “Grand Challenges in Computational Biology,” Computational Methods in Molecular Biology, S. Salzberg, D. Searls, and S. Kasif, eds., Elsevier Science, 1998.

2  K. Schulten, G. Budescu, F. Molnar, Opportunities in Molecular Biomedicine in the Era of Teraflop Computing, NIH Resource for Macromolecular Modeling and Bioinformatics, March 3-4, 1999, Rockville, MD; see http://whitepapers.zdnet.co.uk/0,39025945,60014729p-39000617q,00.htm.




Catalyzing Inquiry at the Interface of Computing and Biology

B.3 WORKSHOP ON MODELING OF BIOLOGICAL SYSTEMS (Peter Kollman and Simon Levin)3

Challenging Issues That Span All Areas of Modeling Systems

  Integrating data and developing models of complex systems across multiple spatial and temporal scales
    Scale relations and coupling
    Temporal complexity and coding
    Parameter estimation and treatment of uncertainty
    Statistical analysis and data mining
    Simulation modeling and prediction

  Structure-function relationships
    Large and small nucleic acids
    Proteins
    Membrane systems
    General macromolecular assemblies
    Cellular, tissue, organismal systems
    Ecological and evolutionary systems

  Image analysis and visualization
    Image interpretation and data fusion
    Inverse problems
    Two-, three- and higher-dimensional visualization and virtual reality

  Basic mathematical issues
    Formalisms for spatial and temporal encoding
    Complex geometry
    Relationships between network architecture and dynamics
    Combinatorial complexity
    Theory for systems that combine stochastic and nonlinear effects often in partially distributed systems

  Data management
    Data modeling and data structure design
    Query algorithms, especially across heterogeneous data types
    Data server communication, especially peer-to-peer replication
    Distributed memory management and process management

B.4 WORKSHOP ON NEXT-GENERATION BIOLOGY: THE ROLE OF NEXT-GENERATION COMPUTING (Shankar Subramaniam and John Wooley)4

Exemplar Challenges for Bioinformatics and Computational Biology

  Full genome-genome comparisons
  Rapid assessment of polymorphic genetic variations

3  “Modeling of Biological Systems,” P. Kollman and S. Levin (chairs), a workshop at the National Science Foundation, March 14 and 15, 1996, available at http://www.resnet.wm.edu/~jxshix/math490/Modeling%20of%20Biological%20Systems.htm.

4  S. Subramaniam and J. Wooley, DOE-NSF-NIH 1998 Workshop on Next-Generation Biology: The Role of Next Generation Computing, available at http://cbcg.lbl.gov/ssi-csb/nextGenBioWS.html.

  Complete construction of orthologous and paralogous groups of genes
  Structure determination of large macromolecular assemblies/complexes
  Dynamical simulation of realistic oligomeric systems
  Rapid structural/topological clustering of proteins
  Prediction of unknown molecular structures; protein folding
  Computer simulation of membrane structure and dynamic function
  Simulation of genetic networks and the sensitivity of these pathways to component stoichiometry and kinetics
  Integration of observations across scales of vastly different dimensions and organization to yield realistic environmental models for basic biology and societal needs

B.5 TECHNOLOGIES FOR BIOLOGICAL COMPUTER-AIDED DESIGN (Masaru Tomita)5

  Enzyme engineering: to refine enzymes and to analyze kinetic parameters in vitro
  Metabolic engineering: to analyze flux rates in vivo
  Analytical chemistry: to determine and analyze the quantity of metabolites efficiently
  Genetic engineering: to cut and paste genes on demand, for modifying metabolic pathways
  Simulation science: to efficiently and accurately simulate a large number of reactions
  Knowledge engineering: to construct, edit and maintain large metabolic knowledge bases
  Mathematical engineering: to estimate and tune unknown parameters

B.6 TOP BIOINFORMATICS CHALLENGES (Chris Burge et al.)6

  Precise, predictive model of transcription initiation and termination: ability to predict where and when transcription will occur in a genome
  Precise, predictive model of RNA splicing/alternative splicing: ability to predict the splicing pattern of any primary transcript
  Precise, quantitative models of signal transduction pathways: ability to predict cellular response to external stimuli
  Determining effective protein-DNA, protein-RNA and protein-protein recognition codes
  Accurate ab initio structure prediction
  Rational design of small molecule inhibitors of proteins
  Mechanistic understanding of protein evolution: understanding exactly how new protein functions evolve
  Mechanistic understanding of speciation: molecular details of how speciation occurs
  Continued development of effective gene ontologies: systematic ways to describe the functions of any gene or protein
  (Infrastructure and education challenge) Education: development of appropriate bioinformatics curricula for secondary, undergraduate, and graduate education

B.7 EMERGING FIELDS IN BIOINFORMATICS (Patricia Babbitt)7

  Data storage and retrieval, database structures, annotation
  Analysis of genomic/proteomic/other high-throughput information

5  M. Tomita, “Towards Computer Aided Design (CAD) of Useful Microorganisms,” Bioinformatics 17(12):1091-1092, 2001.

6  C. Burge, “Bioinformaticists Will Be Busy Bees,” Genome Technology, No. 17, January, 2002. Available (by free subscription) at http://www.genome-technology.com/articles/view-article.asp?Article=20021023161457.

7  P. Babbitt et al., “A Very Very Very Short Introduction to Protein Bioinformatics,” August 22-23, 2002, University of California, San Francisco, available at http://baygenomics.ucsf.edu/education/workshop1/lectures/w1.print2.pdf.

  Evolutionary model building and phylogenic analysis
  Architecture and content of genomes
  Complex systems analysis/genetic circuits
  Information content in DNA, RNA, protein sequences and structure
  Metabolic computing
  Data mining using machine learning tools, neural nets, artificial intelligence
  Nucleic acid and protein sequence analyses

B.8 TEN GRAND CHALLENGES (Sylvia Spengler)8

  The origin, structure, and fate of the universe
  The fundamental structure of matter
  Earth’s physical systems
  The diversity of life on Earth
  The tree of life
  The language of life
  The web of life
  Human ecology
  The brain and artificial thinking machines
  Integrating Earth and human systems
  A knowledge server for planetary management

Research Across Domains: Data

  Information management—human evolution continued
  Exponential increase in data and information across domains
  Access to information across domains—as or more important than the information itself
  Integration of data across knowledge domains
  Apply analytical tools across knowledge domains
  Modeling of complex systems
  Simulation of phenomena—descriptive science becomes predictive science

Research Across Domains: People

  Share data across disciplines
  Build and use analytical and modeling tools across disciplines
  Work in collaborative, cross-domain groups

Research Across Domains: Time

  Real-time data access, integration, and analysis
  Real-time modeling and effects prediction
  Real-time dissemination of research results
  Real-time testing by research community
  Real-time policy discussions
  Real-time policy decisions

8  S. Spengler, Lawrence Berkeley National Laboratory, personal communication to John Wooley, January 3, 2005.

B.9 GRAND CHALLENGES IN BIOMEDICAL COMPUTING (John A. Board, Jr.)9

Biomedical Applications from Coupling Imaging and Modeling

  Real-time noninvasive three-dimensional imaging of many body systems
  Real-time generation of three-dimensional patient-specific models
  Multiple-technology (multimodal) imaging and modeling
  Whole-organ modeling
  Multiple-organ system modeling
  Patient-specific modeling of organ anomalies
  Model support for (partial) restoration of hearing, coarse vision, and locomotion (via both paralyzed and artificial limbs)

All of these applications make use of:

  Three-dimensional models
  Increasingly refined grids and increasing levels of tissue discrimination
  Anatomically realistic models
  Special-purpose hardware for visualization
  Distributed computing techniques.

B.10 ACCELERATING MATHEMATICAL-BIOLOGICAL LINKAGES: REPORT OF A JOINT NSF-NIH WORKSHOP (Margaret Palmer et al.)10

List of Top Ten Problems at the Mathematical Biology Interface

  Model multilevel systems: from the cells in people, to human communities in physical, chemical, and biotic ecologies.
  Model networks of complex metabolic pathways, cell signaling, and species interactions.
  Integrate probabilistic theories: understand uncertainty and risk.
  Understand computation: gaining insight and proving theorems from numerical computation and agent-based models.
  Provide tools for data mining and inference.
  Address linguistic and graph theoretical approaches.
  Model brain function.
  Build computational tools for problems with multiple temporal and spatial scales.
  Provide ecological forecasts.
  Understand effects of erroneous data on biological understanding.

B.11 GRAND CHALLENGES OF MULTIMODAL BIOMEDICAL SYSTEMS (J. Chen et al.)11

Science Challenges

  Allow early detection of where and when an infectious disease outbreak occurs, whether it is naturally occurring or man-made, in real time.

9  J.A. Board, Jr., “Grand Challenges in Biomedical Computing,” High-Performance Computing in Biomedical Research, T.C. Pilkington, B. Loftis, J.F. Thompson, S.L.Y. Woo, T.C. Palmer, and T.F. Budinger, eds., CRC Press, Boca Raton, FL, 1993.

10  M. Palmer et al., “Accelerating Mathematical-Biological Linkages: Report of a Joint NSF-NIH Workshop,” February 2003, available at www.maa.org/mtc/NIH-feb03-report.pdf.

11  J. Chen et al., “Grand Challenges of Multimodal Bio-Medical Systems,” IEEE Circuits and Systems Magazine, pp. 46-52, 2nd Quarter 2005, available at http://gsp.tamu.edu/Publications/PDFpapers/pap_CASmag_MBM.pdf.

  Develop multidimensional drug profiling databases to facilitate drug discovery and to identify biomarkers for diagnosis and monitoring the progress of individual disease treatments.
  Connect activities and events derived from cellular processes to high-level cognitions.
  Support personalized medical care and clinical decision for patients

Technology Challenges and Enabling Technologies

  Formalization of biological knowledge into predictive models for systems biology and system-based analysis
  Interdisciplinary training
  Development of open source, multiscale modality informatics toolkits

B.12 THE DEPARTMENT OF ENERGY’S GENOMES TO LIFE PROGRAM12

21st Century Biology Requiring “Biocomp” Tools

  Population models, symbiosis, and stability
  Discrete growth models
  Reaction kinetics
  Biological oscillators and switches
  Coupled oscillators
  Reaction-diffusion, chemotaxis, and nonlocality
  Oscillator-generated wave phenomena and patterns
  Spatial pattern formation with population interactions
  Mechanical models for generating pattern and form in development
  Evolution and morphogenesis

A Mathematica for Molecular, Cellular, and Systems Biology

  Core data models and structures [database management]
  Optimized functions [core libraries]
  Scripting environment [e.g., Python, PERL, ruby, etc.]
  Database accessors and built-in schemas
  Simulation interfaces
  Parallel and accelerated kernels
  Visualization interfaces (for information visualization and scientific visualization)
  Collaborative workflow and group use interfaces

Hierarchical Biological Modeling Environment

  Genetic sequences
  Molecular machines
  Molecular complexes and modules
  Networks + pathways [metabolic, signaling, regulation]
  Structural components [ultrastructures]
  Cell structure and morphology
  Extracellular environment
  Populations and consortia

12  R. Stevens, “GTL Software Infrastructure: A Computer Science Perspective,” undated presentation, Argonne National Laboratory, available at www.doegenomics.org/compbio/mtg_1_22_02/RickStevens.ppt.

Modeling and Simulation Challenges for 21st Century Biology

  Modeling activity of single genes
  Probabilistic models of prokaryotic genes and regulation
  Logical models of regulatory control in eukaryotic systems
  Gene regulation networks and genetic network inference in computational models and applications to large-scale gene expression data
  Atomistic-level simulation of biomolecules
  Diffusion phenomena in cytoplasm and extracellular environment
  Kinetic models of excitable membranes and synaptic interactions
  Stochastic simulation of cell signaling pathways
  Complex dynamics of cell cycle regulation
  Model simplification

B.13 HIGH-PERFORMANCE COMPUTING, COMMUNICATION, AND INFORMATION TECHNOLOGY GRAND CHALLENGES (LATE 1980s, EARLY 1990s)13

Computing Applications to Map and Sequence Human Genome

  Understanding protein folding
  Predicting structure of native protein
  Exhaustive discovery and analysis of cancer genes
  Molecular recognition and dynamics
  Drug discovery

13  Committee on Physical, Mathematical, and Engineering Sciences of the Federal Coordinating Council for Science, Engineering, and Technology, U.S. Office of Science and Technology Policy, FY1992 Blue Book: Grand Challenges: High Performance Computing and Communications—The FY 1992 U.S. Research and Development Program.
