7
Conclusions

SUPPORTING HIGH-END COMPUTATIONAL RESEARCH

Variability of High-End Capability Computing

The four fields of science and engineering discussed in Chapters 2-5, while differing in many ways, all face key research challenges that cannot be addressed well or at all without new computational capabilities. In astrophysics, the committee concluded that four of the six major challenges identified are critically dependent on advances in computing capabilities, and the remaining two will require HECC to exploit the massive amounts of data that will soon be collected. In atmospheric sciences, the committee identified 10 major challenges, half of which are clearly limited by today’s capabilities for computing; the other half of the 10 challenges are also impeded by limitations in HECC, but they require commensurate advances in complementary directions as well. Two of the three major challenges in chemical separations are dependent on increased application of high-end computing as well as on the development of new capabilities, although experimental approaches continue to be viable. Evolutionary biology is still an emerging field in terms of computational sophistication, and progress will no doubt take place on different fronts. Of the seven major challenges in evolutionary biology identified by the committee, the first three are ripe to benefit from increased application of high-end computing as well as from the development of new capabilities, and HECC will become essential in those areas as research pushes toward more realistic models that build on the rapidly expanding universe of data.

Progress on some of the major challenges identified in Chapters 2-5 would ensue immediately if more powerful computers and/or algorithms were available, and users would see tangible advances. For example, numerical weather predictions could be run with finer resolution, and some simulations in astrophysics and climate modeling could include additional processes or details of importance. For most of the major challenges, progress would not be so immediate, but that does not imply that investing now in appropriate HECC infrastructure is less important. For some of the major challenges, HECC investments would be particularly timely because investments have already been made in new data sources that will soon stimulate or require advances in computing.
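The estimate below is a rough illustration of why finer resolution needs more computing power. It is not drawn from Chapters 2-5, and it rests on two assumptions. First, the model uses an explicit time-stepping scheme whose time step Δt is limited by the Courant-Friedrichs-Lewy (CFL) stability condition, so Δt shrinks in proportion to the horizontal grid spacing Δx. Second, the number of vertical levels N_z is held fixed. Under those assumptions, refining the horizontal spacing by a factor r multiplies the work approximately as follows:

```latex
\begin{aligned}
\text{cost} &\propto N_x \, N_y \, N_z \, N_t,
\qquad N_x, N_y \propto \frac{1}{\Delta x}, \quad N_z \ \text{fixed}, \quad
N_t \propto \frac{1}{\Delta t} \propto \frac{1}{\Delta x}, \\[4pt]
\frac{\text{cost}(\Delta x / r)}{\text{cost}(\Delta x)} &\approx r \cdot r \cdot r = r^{3},
\qquad r = 2 \;\Rightarrow\; \approx 8\times \ \text{the computation}.
\end{aligned}
```

With these assumptions, halving the grid spacing costs roughly eight times as much computation. Refining the vertical grid as well would raise the exponent toward four. Semi-implicit or semi-Lagrangian schemes relax the CFL limit, so they would lower it.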



The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.
THE POTENTIAL IMPACT OF HIGH-END CAPABILITY COMPUTING

Evaluating the potential impact of HECC in this way necessitates a definition of high-end capability computing that differs from the usual platform-centric one. From the perspective of the scientists and engineers who are working to push the frontiers of knowledge in their fields, the computational capabilities they seek are those that enable new scientific insights. Those computational capabilities serve as a lever to pry new insight from a mass of data or a complicated mathematical model. The capabilities are not simply processing “muscle,” so they are not necessarily measured in floating-point operations per second or number of processors. Rather, from a researcher’s perspective, high-end computational capabilities include whatever mix of hardware, models, algorithms, software, intellectual capacity, and computational infrastructure must be combined to enable the desired computations. High-end computing platforms are certainly part of that mix, but ambitious and progressive computational science and engineering is a systems process that depends on many factors. Thus the committee reached the following conclusion:

Conclusion 1. High-end capability computing (HECC) is advanced computing that pushes the bounds of what is computationally feasible. Because it requires a system of interdependent components and because the mix of critical-path elements varies from field to field, HECC should not be defined simply by the type of computing platform being used. It is nonroutine in the sense that it requires innovation and poses technology risks in addition to the risks normally associated with any research endeavor.

Infrastructure Needs of HECC

At the very least, HECC infrastructure consists of hardware, operating software, and applications software. There is also a need for data management tools, graphical interface tools, data analysis tools, and algorithms research and development. Some critical problem areas may need targeted research into mathematical models, and others might require training or incentives to speed a targeted community’s climb up the learning curve. All parts of the HECC ecosystem must be healthy for HECC-enabled research to thrive.

Some high-opportunity fields will not fully exploit HECC unless other changes fall into place. An example is chemical separations. There is already a strong community of computational chemists skilled at HECC, but for a variety of valid reasons they are not generally working on problems of industrial importance. If scientific progress is found to be impaired because HECC is underexploited, then the incentives that drive computational chemistry researchers should be changed.

Fields will generally take advantage of increased HECC availability in proportion to how much of the necessary infrastructure has been created, that is, in proportion to whether the field is ready for HECC. All available evidence suggests that the advancement of science, and of its applications to society, increasingly depends on computing. For all fields to contribute, each must receive support for whatever it needs to be ready to capitalize on HECC. The committee members, though very diverse in disciplinary and computing expertise, readily reached the following conclusion:

Conclusion 2. Advanced computational science and engineering is a complex enterprise that requires models, algorithms, software, hardware, facilities, education and training, and a community of researchers attuned to its special needs. Computational capabilities in different fields of science and engineering are limited in different ways, and each field will require a different set of investments before it can use HECC to overcome the field’s major challenges.

Drivers for Investment Decisions

Once a decision is made to invest in computational resources, that investment must address all the elements of infrastructure needed by the fields likely to use the resource. HECC infrastructure should be interpreted to mean whatever set of investments is needed to enable the desired progress. Every element that contributes to computational capability should be of concern to the providers of computing infrastructure. The needs vary from field to field, and optimal progress in science and engineering is unlikely unless HECC infrastructure suits the existing capabilities of the fields it is meant to serve.

What are the preconditions for a field to profit from HECC? At a minimum, they include the following:

• The field must have established mathematical models for important research questions.
• Algorithms must exist or be under development for computing solutions to those models.
• The field should have sufficient theory and experimental data on which to base the models, if not to completely validate them.
• Some simplified computations should already have been performed, so that the value and limits of computational approaches are becoming clear.
• The relevant community must see value in computational approaches.
• The research community must have, or be able to tap into, appropriate skills in computation, including the ability to optimize models and algorithms for HECC architectures.
• Appropriate hardware and computational software must be available and convenient.
• Some community software should be available. Alternatively, researchers using HECC should be well aware of how other computational scientists in their field are approaching similar problems.
• Supporting software (for example, data management tools, automated grid generators, and interface software) must be available, suitable, and understood by the researchers.
• There must be a large enough peer community to adequately review and discuss computational results and to offer students working on computational research a viable career track.

Conclusion 3. Decisions about when, and how, to invest in HECC should be driven by the potential for those investments to enable or accelerate progress on the major challenges in one or more fields of science and engineering.

THE NEED FOR CONTINUING INVESTMENT IN HECC

Task (d) of the Statement of Task for this study calls for the committee to reflect on the opportunity costs of simply waiting for computing capabilities to improve. Such improvement would come in response to some of the same competitive incentives that have been driving information technology for several decades. Some of the major challenges identified in this report are innately urgent, particularly those that would enable better understanding of climate change, so delays in addressing them would affect more than scientific progress. More generally, the committee believes that the major challenges identified in this report cannot necessarily be decoupled from one another and supported selectively. This led the committee to the following conclusion:

Conclusion 4. Because the major challenges of any field of science or engineering are by definition critical to the progress of the field, underinvestment in any of them will hold back the field.

Optimum progress will be achieved if all modalities of research (theoretical, experimental, and computational) are supported in a balanced way. In many cases, HECC capabilities must continue to advance in order to maximize the value of data already collected or experimental investments already made or committed. For instance, remote-sensing projects under way in astrophysics and atmospheric science will produce quantities of data that those fields cannot use without commensurate progress in analytical capabilities. In evolutionary biology, so much genomic information is being generated that the paucity of tools for handling it in a comparative framework is jeopardizing its value. In short, the inflow of observations is exceeding our ability to process them. We want to avoid what is known as the “write-once, read-never” phenomenon. The value of massive amounts of data cannot be properly leveraged without commensurate investments in the HECC infrastructure needed to analyze them.

Most of the opportunities opened up by HECC, as described in this report, require long-term R&D. For those that will require the next generation of computer architectures, many algorithmic and software developments must come together before the opportunities can be realized. Computers, models, applications, and knowledge all evolve in a coordinated way. If researchers simply watch the advance of computing technology and wait until it possesses the needed capabilities, they will not be prepared to use the new capabilities effectively and efficiently. Past experience, first with adapting to vector processors and then to parallel processors, suggests that this preparation will take years of effort. For opportunities that require other sorts of groundwork, such as model development, there is also a long path toward capitalizing on them. To minimize the risk of technological surprise, multiple directions must be tried out, and tried out now. The issue raised by Task (d) is not just whether to invest in a new platform, because many other steps must come together in any case before a new capability is available. The real issue is how to stage those preparations so as to build new capability. The risk of not investing now is that those steps will not be ready when needed and that technological risks will be too high.

The committee notes that Task (d) seems to assume that HECC is focused on the computing platform, with jumps in capability tied to advances in hardware. That assumption is somewhat at odds with the more holistic view expressed in the committee’s Conclusions 1 and 2. In the committee’s view, different fields must take different steps to capitalize on HECC, and many of those steps do not depend on pushing the state of the art in processing speed. This is not to say that the federal government never needs to invest in computing hardware more ambitious than that targeted by the commercial world. The point, rather, is that hardware investments are not the only way for the federal government to advance HECC.

The premise behind Task (d) is that the information technology industry has advanced processing capabilities exponentially for several decades. This trend is generally associated with Moore’s law, the observation that the number of transistors on a chip doubles roughly every two years. To sustain that rate, computer manufacturers are now moving toward new hardware architectures whose form is not yet defined. Whatever form they take, the change is likely to be very disruptive, akin to the shift from serial (von Neumann) architectures to parallel architectures in the 1980s. That shift required a great deal of algorithm and software research before users could make effective use of many processors in parallel.
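A running sum (also called a prefix sum) gives a small, hypothetical illustration of the algorithm research that such a shift entails. The example is not drawn from any code discussed in this report. The function names in the Python sketch below, and the choice of a three-phase blocked scan, are assumptions made for illustration.

The sequential loop carries a dependence from each iteration to the next, so its iterations cannot simply be handed to different processors. The blocked version computes the same result but exposes work that is independent across blocks. The sketch shows only the restructuring and does not itself run the phases concurrently.

```python
"""Illustrative sketch: restructuring a sequential running sum to expose concurrency."""
from itertools import accumulate


def running_sum_sequential(values):
    # Each iteration needs the total from the previous one (a loop-carried
    # dependence), so the iterations cannot simply be split across processors.
    out, total = [], 0
    for v in values:
        total += v
        out.append(total)
    return out


def running_sum_blocked(values, num_blocks=4):
    # Same result, restructured as a three-phase blocked scan.
    # num_blocks is a hypothetical tuning parameter (e.g., one block per core).
    n = len(values)
    size = -(-n // num_blocks) if n else 1  # ceiling division
    blocks = [values[i:i + size] for i in range(0, n, size)]

    # Phase 1: scan each block on its own; the blocks do not depend on one
    # another, so in a parallel setting these scans could run on separate cores.
    local_scans = [list(accumulate(b)) for b in blocks]

    # Phase 2: a short sequential scan over the per-block totals
    # (only one value per block, so it is cheap).
    offsets, carry = [], 0
    for scan in local_scans:
        offsets.append(carry)
        carry += scan[-1]

    # Phase 3: add each block's offset; again independent across blocks.
    return [x + off for scan, off in zip(local_scans, offsets) for x in scan]


if __name__ == "__main__":
    data = list(range(1, 11))
    # Both lines print [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
    print(running_sum_sequential(data))
    print(running_sum_blocked(data, num_blocks=3))
```

The restructured version performs roughly twice as many additions as the sequential loop. This trade-off is common when an algorithm is reworked to expose concurrency, and it is one reason such rework cannot be left entirely to automated tools.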
The emergence of new architectures will probably require a comparable amount of research to develop robust and efficient basic algorithms that make good use of them. An exponential increase in effective computing speed is not guaranteed, because we cannot assume that existing codes will port well to the new platforms. There are still no productive, easy-to-use programming methodologies or low-level blocks of code that can take full advantage of multicore processors. Multicore parallelism is unfamiliar to many commercial software developers, and it also requires different kinds of parallel algorithm development. For instance, codes built on MPI and OpenMP, the parallel programming interfaces commonly used throughout astrophysics, will not necessarily work well on multicore architectures. Because the hardware architecture for the next generation of HECC machines is not yet defined, efficient software libraries for it have yet to be developed. Moreover, computer-science education has focused on teaching sequential algorithms, and automated methods, such as those in compilers, cannot deduce algorithmic concurrency from most sequential codes. Many algorithms will have to be rethought and much software rewritten. As a result of these impending developments, the committee reached the following conclusion:

Conclusion 5. The emergence of new hardware architectures precludes the option of just waiting for faster machines and then porting existing codes to them. The algorithms and software in those codes must be reworked.

CLASSES OF NUMERICAL AND ALGORITHMIC CHALLENGES

Chapter 6 discusses the classes of numerical and algorithmic challenges that the committee discerned from the four areas of science and engineering covered by this study. The committee reached the following conclusion about the classes of numerical and algorithmic characteristics that will be needed:

Conclusion 6. All four fields will need new, well-posed mathematical models to enable HECC approaches to their major challenges. Astrophysics and the atmospheric sciences share two needs: one for new ways to handle stiff differential equations and one for continuing advances in multiresolution and adaptive discretization methods. Astrophysics and chemical separations also share two needs: one for accurate and efficient methods for evaluating long-range potentials that scale to large numbers of particles and processors, and one for stiff integration methods for large systems of particles. The HECC-dependent challenges in all four of the fields studied will rely on software of much greater complexity than that currently in use, and that software must be optimized for new (as-yet-undefined) computer architectures.
Both factors point to the need for large, sustained efforts in software.

In addition, it is very clear that data management, analysis, and mining are increasingly critical, crosscutting algorithmic challenges. Data-intensive computing is not always thought of as within the province of HECC, but high-end computational science and engineering very often stresses data-management capabilities. Very powerful computer simulations can generate hundreds of gigabytes of data, which are very difficult to manage and visualize. Other types of research face similarly enormous sets of input data, such as those from satellites and telescopes, and in those situations HECC capabilities are required to digest the data and create insight.

HUMAN RESOURCES

Conclusion 5 implies that the committee foresees an increasing need for computational scientists and engineers who can work with mathematicians and computer scientists to develop next-generation HECC software. Chapters 4, 5, and 6 explicitly mention the need for more widespread education about scientific computing. A typical earth scientist, for instance, is not prepared to transition code from an IBM supercomputer to a Cray. That task calls for specialists with knowledge of software engineering and applied mathematics, as well as some domain knowledge. In the atmospheric sciences, it might be feasible to plug in new physics models without risk of disturbing the underlying performance of the code. In other fields, such as computational chemistry, one needs to dig deeply into the code and model in order to work on its optimization. Based on these observations, the committee reached the following conclusion:

Conclusion 7. To capitalize on HECC’s promise for overcoming the major challenges in many fields, there is a need for students in those fields, graduate and undergraduate, who can contribute to HECC-enabled research and for more researchers with strong skills in HECC.

The career path for people who invest time in developing high-end computing capabilities, which by themselves might not constitute publishable research, is problematic, especially in academia. What is needed is a career path that encompasses both a service role (HECC consulting within their field and for computer scientists) and opportunities to conduct their own research.

LESSONS LEARNED FOR FIELDS THAT MIGHT PERFORM SIMILAR STUDIES

The findings and conclusions in this chapter might apply to other fields of science and engineering, but the committee did not explore that question. The study that led to this report was, in part, an experiment. Its purpose was to determine whether particular fields of science and engineering could follow a methodology known as “gaps analysis” to determine the potential impact of advanced computing, and hence their implied need for it. In the committee’s view, the experiment was a success. Even though the four fields selected for this study are very disparate, the committee was able to develop credible snapshots of the major challenges from each field and then determine which of them are critically dependent on HECC. Any other field that wishes to perform a similar self-assessment should take the following lessons to heart:

• It is necessary to build on existing statements about a field’s current frontiers or major challenges. Developing a consensus picture of the frontier, and of the major challenges that define promising directions for extending it, is a major task by itself.
• It is important to determine which major challenges for the field are critically dependent on HECC. It is easy to spot opportunities for applying HECC to advantage, but that is not the same as identifying the major challenges where progress will be limited if appropriate HECC cannot be brought to bear.
• Compelling justification for a particular proposed HECC investment would require a level of analysis not included in this report. For each of the major challenges targeted by the investment, it would be appropriate to identify the research directions that are germane to progress and their associated infrastructure requirements. From these, one could develop an investment strategy that maximizes the potential for scientific progress.

All the infrastructure components needed to apply HECC to the challenges that depend on it must be identified, and the community must develop a clear understanding of the resources needed to build a complete infrastructure. Merely giving a field access to supercomputers is no guarantee that the field’s scientific progress will be enabled or accelerated.