| Copyright © 2009. National Academy of Sciences. All rights reserved. Terms of Use and Privacy Statement |
Infrastructure: Capabilities and Goals
Remarkable advances in information technologies (computer speed, algorithm
power, data storage, and network bandwidth) have led to a new era of
capabilities that range from computational models of molecular processes to re-
mote use of one-of-a-kind instruments, shared data repositories, and distributed
collaborations. At the current pace of change, an order-of-magnitude increase in
computing and communications capability will occur every five years. Advances
in information technology (IT) allow us to carry out tasks better and faster; in
addition, these sustained rapid advances create revolutionary opportunities. We
are still at the early stages of taking strategic advantage of the full potential of-
fered by scientific computing and information technology in ways that benefit
both academic science and industry. Investments in improving chemical-based
understanding and decision making will have a high impact because chemical
science and engineering are at the foundation of a broad spectrum of technologi-
cal and biological processes. If the United States is to maintain and strengthen its
position as a world leader, chemical science and technology will have to aggres-
sively pursue the opportunities offered by the advances in information and com-
munication technologies.
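As a rough illustration of the pace quoted above, the arithmetic behind "an order-of-magnitude increase every five years" can be sketched in a few lines of Python. The sketch assumes smooth exponential growth, which the report does not state, and the function names are illustrative only.

```python
import math


def annual_growth_factor(factor: float = 10.0, years: float = 5.0) -> float:
    """Constant yearly multiplier that compounds to `factor` over `years`."""
    return factor ** (1.0 / years)


def doubling_time(factor: float = 10.0, years: float = 5.0) -> float:
    """Years needed for capability to double at that constant rate."""
    return years * math.log(2.0) / math.log(factor)


if __name__ == "__main__":
    # Assumption: growth is a smooth exponential, not a series of jumps.
    print(f"annual growth factor: {annual_growth_factor():.3f}")
    print(f"doubling time (years): {doubling_time():.2f}")
```

Under that assumption, a tenfold gain in five years corresponds to growth of roughly 58 percent per year, or a doubling about every 18 months.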
At the intersection of information technology and the chemical sciences there
are infrastructural challenges and opportunities. There are needs for infrastruc-
ture improvements that could enable chemical scientists and engineers to attain
wholly new levels of computing-related research and education and demonstrate
the value of these activities to society. These needs extend from research and
teaching in the chemical sciences to issues associated with codes, software, data
and storage, and networking and bandwidth.
Some things are currently working very well at the interface of computing
and the chemical sciences. Networking and Internet high-speed connectivity have
been integrated into the chemical sciences, changing the landscape of these fields
and of computational chemistry. Commercial computational chemistry software
companies and some academic centers provide and maintain computational and
modeling codes that are widely used to solve problems in industry and academia.
However, these companies and centers do not, and probably cannot, provide the
infrastructure required for the development of new scientific approaches and
codes for a research market that is deeply segmented. The development of new
codes and applications by academia represents a mechanism for continuous inno-
vation that drives the field and helps to direct the choice of application areas on
which the power of computational chemistry and simulation is brought to bear.
Modern algorithms and programming tools have speeded new code development
and eased prototyping worries, but creating the complicated codes typical of
chemical science and engineering applications remains an exceedingly difficult
and time-consuming task. Defining new codes and applications is potentially a
growth area of high value but one that faces major infrastructure implications if it
is to be sustained.
Successful collaborations between chemists and chemical engineers, as well
as broadly structured interdisciplinary groups in general, have grown rapidly dur-
ing the past decade. These have created the demand for infrastructure develop-
ment to solve important problems and new applications in ways never before
envisioned. The current infrastructure must be improved if it is to be used effec-
tively in interdisciplinary team efforts, especially for realizing the major potential
impact of multiscale simulations. Infrastructure developments that support im-
proved multidisciplinary interactions include resources for code development,
assessment, and life-cycle maintenance; computers designed for science and en-
gineering applications; and software for data collection, information management,
visualization, and analysis. Such issues must be addressed broadly in the way that
funding investments are made in infrastructure, as well as in cross-disciplinary
education and in the academic reward structure.
The overarching infrastructure challenge is to provide at all times the needed
accessibility, standardization, and integration across platforms while also
providing the fluidity needed to adapt to new concurrent advances in a time of
rapid innovation.
RESEARCH
Significant gains in understanding and predictive ability are envisioned to
result from the development of multiscale simulation methods for the investiga-
tion of complicated systems that encompass behavior over wide ranges of time
and length scales. Such systems usually require a multidisciplinary approach.
Often, multiscale simulations involve multiple phenomena that occur simulta-
neously with complex, subtle interactions that can confound intuition. While much
is known about simulating aspects of behavior at individual scales (e.g., ab initio,
stochastic, continuum, and supply-chain calculations), integration across scales is
essential for understanding the behavior of entire systems.
A critical component in achieving the benefit implied by multiscale model-
ing will be funding for interdisciplinary research for which effective, collabora-
tive web-based tools are required. The integration of computational results with
experimental information is often necessary to solve multiscale problems. In some
instances, creating opportunities to access shared equipment will be as critical as
access to shared computers or software. Especially important is the ability to
represent and understand the uncertainties, not only in the underlying scientific
understanding, but also in experimental data that may come from extremely het-
erogeneous sources. The infrastructure to achieve these research goals must in-
clude definition of standard test cases for software and experiments.
Basic infrastructure needs include high-bandwidth access to high-perfor-
mance computational facilities, further increased network and bus speed, diverse
computer architectures, shared instruments, software, federated databases, stor-
age, analysis, and visualization. Computers designed with a knowledge of the
memory usage patterns of science and engineering problems will be useful, as
will algorithms that take full advantage of the new generation of supercomputers.
Continuation of the successful trend towards clusters of commodity computers
may result in further opportunities for improved computational efficiency and
cost effectiveness. Software should be characterized by interoperability and port-
ability so that codes and computers can talk to each other and can be moved in a
seamless manner to new systems when they become available.
EDUCATION
Student learning in basic mathematics at the intersection of computing and
the chemical sciences is essential because it provides the foundation
for computational chemistry, modeling, and simulation as well as associated
software engineering. Although many entry-level students in the chemical sci-
ences are familiar with the use of computers and programs, they often have little
or no understanding of the concepts and design principles underlying their use.
The integration of these topics in interdisciplinary courses is essential for the
development of a skilled workforce.1 Educational activities will require the
investment of time and resources to develop new content in the curriculum for
chemists and chemical engineers. New pedagogical approaches at both the
undergraduate and graduate levels will be needed to address subjects at the
interface of disciplines linked by scientific data, programming, and applications
areas. Training students to adopt a problem-solving approach is critically
important for good software engineering and especially for producing codes and
data structures that are useful to other people. A national community of
educational open-source software would help speed development of training tools.
1Building a Workforce for the Information Economy, National Research Council,
National Academy Press, Washington, DC, 2001.
Just as training in mathematics and physics has been needed for work in
chemical sciences and engineering, so will specific education in the use of mod-
ern IT tools, software design, and data structures be needed by the chemical pro-
fessional of the twenty-first century. Such education will help in the rapid devel-
opment of new approaches, cross-disciplinary integration, and integrated data
handling and utilization.
Interdisciplinary research and development at the IT-chemical science inter-
face are areas of great excitement and opportunity. Nevertheless, people trained
to carry out such projects are in short supply. The continued capability of indi-
viduals requires both deep competence and the ability to interact across disci-
plines. The emphasis in graduate training therefore must be balanced between
specialization within a discipline and cross-disciplinary collaboration and team-
work. Transfer of information between fields remains difficult when evaluating
performance, particularly for tenure and promotion of faculty who focus on inter-
disciplinary projects or hold joint appointments in multiple departments. Such
evaluation of scholarship will require attentive administrative coordination to re-
solve cultural differences. Creating high-quality educational programs to train
people to work at interdisciplinary interfaces is currently a rate-limiting step in
the growth of the field. Recognizing and rewarding the success of interdiscipli-
nary scientists at different stages in their careers is becoming critically important
for the sustained development of the field.
Computational chemistry and simulation methods should be incorporated into
a broad range of educational programs to provide better understanding of the
scope and limitations of various methods, as well as to facilitate their application
over the full range of interdisciplinary problems to which they apply. Both sci-
ence and engineering applications have to be addressed, since these can have
different goals and methods of pursuit with widely differing levels of sophisti-
cation. These include simple applications that can be helpful in early stages, com-
plicated applications that require greater skill, and applications to truly complex
nonlinear systems that represent the current focus of many experts in the field.
Such training will benefit industry, where there is a need for computational spe-
cialists who understand the goals and objectives of a broad interdisciplinary prob-
lem and know how and when computational chemistry and systems-level model-
ing can provide added value. In academia, infrastructure support to facilitate better
communication and interaction between chemists and chemical engineers will
enhance the training of computational experts of the future. The field will be well
served by establishing commonality in understanding and language between the
creators and users of codes as well as the skilled computer science and engineer-
ing nonusers who develop the IT methods.
An increasingly important part of the infrastructure will be the skilled work-
ers who maintain codes, software, and databases over their life cycle. The wide
variety of tasks that require sustained management may necessitate a combina-
tion of local (funded through research grants) and national (funded through center
grants) support to address the overall portfolio of needs.
Advances in the chemical sciences have permitted major advances in medi-
cine, life science, earth science, physics and engineering, and environmental sci-
ence, to name a few. Advances in productivity, quality of life, security, and eco-
nomic vitality of global and American society have flowed directly from the
efforts of people who work in these fields. Looking to the future, we need to build
on these advances so that computational discovery and design can become stan-
dard components of broad education and training goals in our society. In this
way, the human resources will be available to create, as well as to realize and
embrace, the capabilities, challenges, and opportunities provided by the chemical
sciences through advanced information technology.
Information and communication, data and informatics, and modeling and
computing must become primary training goals for researchers in chemical
science. These skills have to be accessible to effectively serve others in the
society (from doctors to druggists, ecologists to farmers, and journalists to
decision makers) who need an awareness of chemical phenomena to work effectively and
to make wise decisions. Such skills provide liberating capabilities that enable
interactions among people and facilitate new modes of thought, totally new capa-
bilities for problem-solving, and new ways to extend the vision of the chemical
profession and of the society it serves.
CODES, SOFTWARE, DATA AND BANDWIDTH
A critical issue for codes, software, and databases is maintenance over the
life cycle of use. In the academic world, software with much potential utility can
be lost with the graduation of students who develop the codes. Moreover, as
codes become more complicated, the educational value of writing one's own code
must be balanced against the nontrivial effort to move from a complicated idea, to
algorithm, and then to code. Increasing fractions of student researchers are tend-
ing to develop skills with simpler practice codes, and then to work with and
modify legacy codes that are passed down. Yet at the same time, working in a big
coding environment with codes written by people who have long gone is difficult
and often frustrating. Development of software that uses source-code generation
to automatically fix dusty decks will be increasingly important for decreasing the
time and effort associated with extending the useful life of codes. Also, the
development of semiautomatic methods for generation of improved graphical user
interfaces will reduce a significant barrier to sustaining the use of older code.
Although the open-source approach works well for communities where thousands
of eyes help remove bugs, it is unable to accommodate certain applications, for
example, when proprietary information is involved.
Growth in multiscale simulation may be expected to drive development of
improved tools for integration of different software systems, integration of differ-
ent hardware architectures, and use of shared code by distributed collaborators.
The need for improved interoperability and portability, and for users to be
able to modify codes created by others, will grow steadily. Advances in
object-oriented programming and component technology will help. Examples such as
the Portable, Extensible Toolkit for Scientific Computation (PETSc) library at
Argonne National Laboratory represent the kind of infrastructure that will sup-
port growth in strategic directions.
Central to the vision of a continuously evolving code resource for new
applications is the ability to build on existing concepts and codes that have been
extensively developed. However, at present, academic code sharing and support
mechanisms can at best be described as poor, sometimes as a result of perceived
commercialization potential or competitive advantage. Moreover, code
development and support are not explicitly supported by most research grants, nor is
maintenance of legacy codes. Consequently, adapting academic codes from else-
where may generate a risk that the code will become unsupported during its use-
ful life cycle. Avoiding this risk results in continual duplication of effort to
produce trivial codes that could be better served by open-source toolkits and li-
braries maintained as part of the infrastructure.
The assurance of code verification, reliability, standardization, availability,
maintenance, and security represents an infrastructure issue with broad implica-
tions. Sometimes commercial software has established a strong technical base,
excellent interfaces, and user-friendly approaches that attract a wide range of
users. Commercial software can be valuable when market forces result in con-
tinuous improvements that are introduced in a seamless manner, but generally,
commercial code development is not well matched to the needs of small groups
of research experts nor to many large potential markets of nonexperts. Therefore
a critical need exists to establish standards and responsibilities for code.
The rapid growth of data storage per unit cost has been accompanied by
equally significant increases in the demand for data, with the result that there is
a rapid increase in emphasis on data issues across chemical science and
engineering. Bioinformatics and pharmaceutical database mining represent areas in which
sophisticated methods have been effective in extracting useful knowledge from
data. Newly emerging applications include scientific measurements, sensors in
the environment, process-engineering data, manufacturing execution, and sup-
ply-chain systems. The overall goal is to build data repositories that can be ac-
cessed easily by remote computers to facilitate the use of shared data among
creative laboratory scientists, plant engineers, process-control systems, business
managers, and decision makers. Achieving this requires improved procedures
that provide interoperability and data-exchange standards.
The integration of federated databases with predictive modeling and simula-
tion tools represents an important opportunity for major advances in the effective
use of massive amounts of data. The framework will need to include computa-
tional tools, evaluated experimental data, active databases, and knowledge-based
software guides for generating chemical and physical property data on demand
with quantitative measures of uncertainty. The approach has to provide validated,
predictive simulation methods for complicated systems with seamless multiscale
and multidisciplinary integration to predict properties and to model physical phe-
nomena and processes. The results must be in a form that can be visualized and
used by even a nonexpert.
In addition to the insightful use of existing data, the acquisition of new chemi-
cal and physical property data continues to grow in importance as does the need
to retrieve data for future needs. Such efforts require careful experimental mea-
surements as well as skilled evaluation of related data from multiple sources. It
will be necessary to assess confidence with robust uncertainty estimates; validate
data with experimental or calculated benchmark data of known accuracy; and
document the metadata needed for interpretation.
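The evaluation steps described above (combining related data from multiple sources, attaching an uncertainty estimate, checking against a benchmark, and keeping source metadata) can be sketched as follows. This is a minimal illustration, not a method prescribed by the report: it assumes independent measurements with Gaussian errors and uses an inverse-variance weighted mean. The class name, function names, and all numerical values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    value: float    # reported property value
    std_unc: float  # standard uncertainty (one sigma), same units as value
    source: str     # metadata: where the value came from


def combine(measurements):
    """Inverse-variance weighted mean and its standard uncertainty.

    Assumes the measurements are independent with Gaussian errors.
    """
    if not measurements:
        raise ValueError("at least one measurement is required")
    weights = [1.0 / m.std_unc ** 2 for m in measurements]
    total = sum(weights)
    mean = sum(w * m.value for w, m in zip(weights, measurements)) / total
    return mean, math.sqrt(1.0 / total)


def consistent_with_benchmark(value, unc, benchmark, benchmark_unc, k=2.0):
    """True if the difference lies within k combined standard uncertainties."""
    combined = math.hypot(unc, benchmark_unc)
    return abs(value - benchmark) <= k * combined


if __name__ == "__main__":
    # Hypothetical values for one property reported by two laboratories.
    data = [
        Measurement(10.0, 0.2, "laboratory A (hypothetical)"),
        Measurement(10.4, 0.2, "laboratory B (hypothetical)"),
    ]
    mean, unc = combine(data)
    print(f"evaluated value: {mean:.2f} +/- {unc:.2f}")
    print("agrees with benchmark:", consistent_with_benchmark(mean, unc, 10.1, 0.1))
```

The coverage factor `k=2.0` is a common but arbitrary choice; correlated sources or non-Gaussian errors would call for a different treatment.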
There is a need to advance IT systems to provide scientific data and available
bandwidth in the public arena. High-quality data represent the foundation upon
which public and proprietary institutions can develop their knowledge-manage-
ment and predictive modeling systems. It is appropriate that federal agencies par-
ticipate in the growing number of data issues that are facing the chemical science
and engineering community including policy issues associated with access to
data. Improved access to data not only will benefit research and technology but
will provide policy and decision makers with superior insights on chemical data-
centric matters such as environmental policy, natural resource utilization, and
management of unnatural substances. Expanded bandwidth is crucial for collabo-
rations, data flow and management, and shared computing resources.
You might ask "What is the twenty-first century Grid infrastructure
that is emerging?" I would answer that it is this tightly optically
coupled set of data clusters for computing and visualization tied
together in a collaborative middle layer.... So, if you thought you
had seen an explosion on the Internet, you really haven't seen any-
thing yet.
Larry Smarr (Appendix D)
ANTICIPATED BENEFITS OF INVESTMENT IN INFRASTRUCTURE
Chemical science and engineering serve major sectors that, in turn, have a
wide range of expectations from infrastructure investments. At the heart of these
is the development and informed use of data and simulation tools. The use of
information technology to facilitate multidisciplinary teams that collaborate on
large problems is in its infancy. Sustained investment in information technologies
that facilitate the process of discovery and technological innovation holds truly
significant promise, and the chemical sciences provide a large number of poten-
tial testbeds for the development of such capabilities.
In science and engineering research, the complex areas identified in Chapter
4 are clear points of entry for computer science, engineering, and applied math-
ematics along with chemical science and engineering. One of the great values of
simulation is the insight it gives into the inner relationships of complicated sys-
tems as well as the influence this insight has on the resulting outcome. The key
enabling infrastructure elements are those that enhance the new intuitions and
insights that are the first steps toward discovery.
The advances being made in Grid technologies and virtual labora-
tories will enhance our ability to access and use computers, chemi-
cal data, and first-of-a-kind or one-of-a-kind instruments to advance
chemical science and technology. Grid technologies will substan-
tially reduce the barrier to using computational models to investi-
gate chemical phenomena and to integrating data from various sources
into the models or investigations. Virtual laboratories have already
proven to be an effective means of dealing with the rising costs of
forefront instruments for chemical research by providing capabilities
needed by researchers not co-located with the instruments; all we
need is a sponsor willing to push this technology forward on behalf
of the user community.
The twenty-first century will indeed be an exciting time for chemical
science and technology.
Thom Dunning (Appendix D)
In industrial applications, tools are needed that speed targeted design and
impact business outcomes through efficient movement from discovery to techno-
logical application. Valuing IT infrastructure tools requires recognizing how they
enhance productivity, facilitate teamwork, and speed time-consuming experimen-
tal work.
Finding: Federal research support for individual investigators and for
curiosity-driven research is crucial for advances in basic theory, formal-
isms, methods, applications, and understanding.
History shows that the investment in long-term, high-risk research in the
chemical sciences must be maintained to ensure continued R&D progress
that provides the nation's technological and economic well-being.
Large-scale, large-group efforts are complementary to individual investigator
projects; both are crucial, and both are critically dependent on
next-generation IT infrastructure.
Computer-Aided Design of Pharmaceuticals
Computer-aided molecular design in the pharmaceutical industry
is an application area that has evolved over the past several decades.
Documentation of success in the pharmaceutical discovery
process now transcends reports of individual applications of various
techniques that have been used in a specific drug discovery program.
The chemistry concepts of molecular size, shape, and properties
and their influence on molecular recognition by receptors of
complementary size, shape, and properties are central unifying concepts
for the industry. These concepts and computational chemistry
visualization tools are now used at will and without hesitation by
virtually all participants, regardless of their core discipline (chemistry,
biology, marketing, business, management). Such ubiquitous
use of simple chemical concepts is an exceedingly reliable indicator
of their influence and acceptance within an industry. The concepts
that unify thinking in the pharmaceutical discovery field seemingly
derive little from the complexity and rigor of the underlying computational
chemistry techniques. Nevertheless, there is little reason to
assume that these simple concepts could ever have assumed a
central role without the support of computational chemistry foundations.
In other words, having a good idea in science, or in industry,
does not mean that anyone will agree with you (much less act on it)
unless there is a foundation upon which to build.
Organizing Committee
Finding: A strong infrastructure at the intersection with information
technology will be critical for the success of the nation's research invest-
ment in chemical science and technology.
The infrastructure includes hardware, computing facilities, research support,
communications links, and educational structures. Infrastructure enhance-
ments will provide substantial advantages in the pursuit of teaching, research,
and development. Chemists and chemical engineers will need to be ready to
take full advantage of capabilities that are increasing exponentially.
To accomplish this we must do the following:
· Recognize that significant investments in infrastructure will be necessary
for progress.
· Enhance training throughout the educational system (elementary through
postgraduate) for computational approaches to the physical world. Assuring that
chemists and chemical engineers have adequate training in information technol-
ogy is crucial. Programming languages have been the traditional focus of such
education; data structures, graphics, and software design are at least as important
and should be an integral component (along with such traditional fundamental
enablers as mathematics, physics, and biology) of the education of all workers in
chemistry and chemical engineering.
· Maintain national computing laboratories with staff to support research
users in a manner analogous to that for other user facilities.2
· Develop a mechanism to establish standards and responsibilities for
verification, standardization, availability, maintenance, and security of codes.
· Define appropriate roles for developers of academic or commercial soft-
ware throughout its life cycle.
· Provide universal availability of reliable and verified software.
The findings and recommendations outlined here and in previous chapters show
that the intersection of chemistry and chemical engineering with computing and
information technology is a sector that is ripe with opportunity. Important accom-
plishments have already been realized, and major technical progress should be ex-
pected if new and existing resources are optimized in support of research, educa-
tion, and infrastructure. While this report identifies many needs and opportunities,
the path forward is not yet fully defined and will require additional analysis.
Recommendation: Federal agencies, in cooperation with the chemical
sciences and information technology communities, will need to carry out
a comprehensive assessment of the chemical sciences-information tech-
nology infrastructure portfolio.
Such an assessment will give federal funding agencies
a sound basis for planning their future investments in both
disciplinary and cross-disciplinary research.
The following are among the actions that need to be taken:
· Identify criteria and appropriate indicators for setting priorities for infra-
structure investments that promote healthy science and facilitate the rapid move-
ment of concepts into well-engineered technological applications.
· Address the issue of standardization and accuracy of codes and databases,
including the possibility of a specific structure or mechanism (e.g., within a fed-
eral laboratory) to provide responsibility for standards evaluation.
2Cooperative Stewardship: Managing the Nation's Multidisciplinary User Facilities for Research
with Synchrotron Radiation, Neutrons, and High Magnetic Fields, National Research Council,
National Academy Press, Washington, D.C., 1999.
· Develop a strategy for involving the user community in testing and adopting
new tools, integration, and standards development. Federal investments in IT
architecture, standards, and applications are expected to scale with growth of the
user base, but the user market is deeply segmented, and there may not yet be a
defined user base for any specific investment.
· Determine how to optimize incentives within peer-reviewed grant pro-
grams for creation of high quality cross-disciplinary software.
During the next 10 years, chemical science and engineering will
be participating in a broad trend in the United States and across the
world: we are moving toward a distributed cyberinfrastructure. The
goal will be to provide a collaborative framework for individual in-
vestigators who want to work with each other or with industry on
larger-scale projects that would be impossible for individual investi-
gators working alone.
Larry Smarr (Appendix D)
Recommendation: In order to take full advantage of the emerging Grid-based
IT infrastructure, federal agencies, in cooperation with the
chemical sciences and information technology communities, should
consider establishing several collaborative data-modeling environments.
By integrating software, interpretation, data, visualization, networking, and
commodity computing, and using web services to ensure universal access,
these collaborative environments could have a tremendous impact on the value of IT
for the chemical community. They are ideal structures for distributed learning,
research, insight, and development on major issues confronting both the
chemical community and the larger society.
Collaborative Modeling-Data Environments should be funded on a multiyear
basis; should be organized to provide integrated, efficient, standardized, state-of-
the-art software packages, commodity computing and interpretative schemes; and
should provide open-source approaches (where appropriate), while maintaining
security and privacy assurance.
This report should be seen in the context of the larger initiative, Beyond the
Molecular Frontier: Challenges for Chemistry and Chemical Engineering,3 as
well as in the six accompanying reports on societal needs (of which this report is
one).4,5,6,7,8 This component on Information and Communications examines perhaps
the most dynamically growing capability in the chemical sciences. The findings
reported in the Executive Summary and in greater depth in the body of the
text constitute what the committee believes to be viable and important guidance
to help the chemical sciences community to take full advantage of growing IT
capabilities for the advancement of the chemical sciences and technology and
thereby for the betterment of our society and our world.
3Beyond the Molecular Frontier: Challenges for Chemistry and Chemical Engineering, National
Research Council, The National Academies Press, Washington, D.C., 2003.
4Challenges for the Chemical Sciences in the 21st Century: National Security & Homeland
Defense, National Research Council, The National Academies Press, Washington, D.C., 2002.
5Challenges for the Chemical Sciences in the 21st Century: Materials Science and Technology,
National Research Council, The National Academies Press, Washington, D.C., 2003.
6Challenges for the Chemical Sciences in the 21st Century: Energy and Transportation, National
Research Council, The National Academies Press, Washington, D.C., 2003 (in preparation).
7Challenges for the Chemical Sciences in the 21st Century: The Environment, National Research
Council, The National Academies Press, Washington, D.C., 2003.
8Challenges for the Chemical Sciences in the 21st Century: Health and Medicine, National
Research Council, The National Academies Press, Washington, D.C., 2003 (in preparation).