3 State of U.S. Climate Modeling
An important task of this study was to quantitatively assess the computational and human resources presently directed toward climate modeling in the United States. To accomplish this goal, two surveys were developed (Appendix C and Appendix D): one was sent to large and intermediate-size modeling centers and one to small centers. After these surveys were drafted, a specialist in social surveying edited them to ensure that the information collected was as free of bias as possible. The surveys were sent to 50 modeling institutions and groups, and 42 responses were received. The panel does not claim to have surveyed all groups or institutions operating small-scale modeling efforts; because of the varied and extensive use of modeling in many areas of earth science, it would be extremely difficult to identify all of these small centers. Thus, the responses that were received were taken to be indicative of smaller efforts. A good estimate of resources could be obtained for the largest centers because they were easier to identify, and all responded. Survey responses are discussed below and tabulated in Appendix E.
3.1 MODELS
The information collected on current modeling activities shows the robust and varied nature of climate and weather modeling in the United States. Smaller modeling centers enjoy a level of resources equivalent to what would have been considered supercomputer resources only a decade ago, allowing them to run in-house (regional and global) component models, component models from the larger centers, or a combination of the two. Smaller centers can even run coupled climate models, but only at coarse resolution (e.g., 800 km), or at higher resolution (300 km) for shorter time periods. Responses to the question about improvements planned for models at all centers were varied, but most involved a mixture of improvements to model physics, dynamics, numerics, efficiency, and applicability. Many respondents also noted the desire to better incorporate new types of satellite and radar data.
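The tradeoff the smaller centers face between resolution and run length reflects how steeply compute cost grows as a grid is refined: halving the grid spacing roughly quadruples the number of horizontal grid cells and, through the CFL stability condition, also halves the allowable timestep. A minimal sketch (the cubic scaling and the fixed vertical/physics costs are simplifying assumptions, not survey data):

```python
# Rough relative cost of refining a climate model's horizontal grid.
# Assumption: cost ~ (1/dx)^3 -- two horizontal dimensions plus a
# timestep that shrinks linearly with grid spacing (CFL condition).
# Vertical resolution and physics costs are held fixed, a simplification.

def relative_cost(dx_coarse_km: float, dx_fine_km: float) -> float:
    """Factor by which compute cost grows when refining the grid."""
    return (dx_coarse_km / dx_fine_km) ** 3

# Moving a coupled model from 800 km to 300 km, as in the survey responses:
print(f"~{relative_cost(800, 300):.0f}x more compute per simulated year")  # ~19x
```

This is why a center that can afford long 800 km integrations may only manage short runs at 300 km.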
In general, most respondents stated that their code was portable to platforms other than those on which they normally operated, although some models required a moderate amount of optimization to run with minimal performance loss. Most centers release their modeling results to the wider scientific community without restriction. A few centers freely release their data but stipulate that the results be used only for research purposes; others limit the release of modeling results to collaborators.
Large and intermediate-size modeling centers were asked whether there were plans to convert model code to run on massively parallel processing (MPP) architectures. Most institutions responded that this conversion had already taken place, although those that have converted or are in the process of converting noted the difficulty of transferring certain models to an MPP architecture. Many respondents also noted that this conversion required significant programmer time and drained resources that could have been devoted to other activities. When asked to comment on the relative merits and hindrances of MPP versus VPP architectures, the majority of respondents preferred VPP architecture for the following reasons:
- MPP systems are generally more difficult to program and require increased computer expertise. There are therefore significant training issues involved in the use of these systems. These difficulties are particularly significant for university centers, as they often rely on graduate student labor that is characterized by high turnover.
- Data assimilation and processing are more difficult on MPP systems.
- VPP systems are more stable and reliable.
- There are significant scalability problems on MPP systems.
- Current MPP systems lack mature compilers, which makes these systems difficult to use.
Despite these difficulties, some respondents felt that MPP systems had significant benefits over VPP systems (e.g., lower memory cost and increased aggregate CPU power).
3.2 COMPUTING [1]
Most small and many intermediate-size modeling centers either rely on workstations or clusters of workstations for their modeling efforts or collaborate with the larger centers and use their computational facilities. The larger modeling centers rely primarily on supercomputers for their climate and weather simulations. Of the large modeling centers surveyed, half share their computational time with the wider community. The computing capacity of large and intermediate-size modeling centers is described in Table 3-1, which also includes planned upgrades to existing systems.
When asked what upgrades would be incorporated if funds were available, the responses were varied (Table 3-1; Appendix C and Appendix D), although the majority of centers noted the need for increased capabilities such as additional processors, nodes, disk space, or some combination of these. Some centers also noted the need for additional network bandwidth to more rapidly acquire data sets from remote sources. Some of the smaller centers said they would prefer to devote any new funds to the purchase of a PC cluster, or the enlargement of an existing one, rather than pooling those funds to upgrade shared supercomputing resources.
Most centers (large, intermediate, and small) responded that computing capabilities were limiting model resolution, the number of model runs, and the production of real-time forecasting products. Although it can be argued that modelers will always want more runs at higher resolution and complexity regardless of the available computational capacity, the ability to accurately model weather and climate at finer spatial and temporal scales depends on obtaining a robust estimate of climate model uncertainty, which in turn requires the analysis of a large number of cases and of ensemble members per case. Increased model quality will lead to increased predictive skill and higher-quality operational products for climate and weather prediction. Thus, the computational limitations noted in the survey affect not only current research activities and model development but also the production of outputs required for operational use.
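The compute implications of robust uncertainty estimation follow from simple arithmetic: total cost grows linearly with the number of cases and the number of ensemble members per case. A minimal sketch with purely illustrative numbers (none of these values come from the survey):

```python
# Back-of-envelope compute budget for uncertainty estimation: cost
# scales linearly with the number of cases and ensemble members per
# case. All numbers below are illustrative, not taken from the survey.

def machine_days(days_per_run: float, n_cases: int, n_members: int) -> float:
    """Dedicated machine-days for a full set of ensemble experiments."""
    return days_per_run * n_cases * n_members

# e.g., a run costing 10 machine-days, 5 cases, 15 members per case:
print(machine_days(10, 5, 15))  # 750
```

Even modest ensemble requirements thus multiply a single run's cost by one to two orders of magnitude, which is why centers reported computing capacity as a limiting factor.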
It is important to note that, in addition to the need for additional computing capabilities, many respondents discussed the critical need for qualified scientists, modelers, and hardware and software engineers. This need is discussed more fully in the next section.
[1] The information in Table 3-1 was accurate at the time the survey results were assembled. Since then, information detailing the upgraded computing capabilities at NCEP was provided: the recently upgraded machine uses IBM's Power 3 Winterhawk-II technology operating at 375 MHz, and the system has 2208 processors in 40 frames and 512 compute nodes, with 2 GB of memory per node.
TABLE 3-1 Computing Resources Located at Large Modeling Centers (a)

Fields reported for each institution (b): Computer System; Processors; Last Upgrade; Sustained System Performance; Central Memory/Secondary Disc Storage; Future Upgrades Planned.

CIT-JPL
  Systems: (1) Cray T3D/T3E; (2) SGI Origin 2000
  Processors: (1) 512; (2) 128
  Last upgrade: 1999
  Performance: (1) 10-50 Gflops
  Memory/disk: No information provided.
  Planned upgrades: No information provided.

COLA
  Systems: (1) SGI Origin 2000; (2) Compaq ES40; (3) Compaq DS20
  Processors: (1) 16 CPUs; (2) 4 CPUs
  Last upgrade: 1999
  Performance: (1) 2.5 Gflops; (2) 1.25 Gflops
  Memory/disk: (1) 4 GB; (2) 4 GB/node; disk capacity 2.3 TB (shared via gigabit-switch LAN)
  Planned upgrades: None.

CSU
  Systems: (1) SGI Origin 2000; (2) Octane
  Processors: (1) 10; (2) 12
  Last upgrade: 20% of inventory upgraded per year
  Performance: No information provided.
  Memory/disk: No information provided.
  Planned upgrades: 8-processor Origin in 2000 (Chance).

FSU
  Systems: IBM SP2 with 9 nodes running on a fast interconnect bus; 6 RS6000 model 260/270 series machines
  Processors: 2 of the 260 series are dual-processor; the remaining 4 units are 4-processor machines
  Last upgrade: No major upgrades.
  Performance: Unknown.
  Memory/disk: Each machine has approximately 2 GB of memory; the 270s have ~50 GB of disk space, the other machines ~9 GB each
  Planned upgrades: (not specified)

UCLA
  Systems: (1) Compaq XP1000 cluster
  Processors: (1) 5
  Last upgrade: (1) 1999
  Performance: (1) 2 Gflops
  Memory/disk: (1) 2 GB/0.1 TB
  Planned upgrades: None planned.

UH
  Systems: (1) Cray SV-1; (2) SGI Origin 2000; (3) SGI Origin 2000; (4) SGI Origin 2000
  Processors: (1) 24 at 300 MHz; (2) 32 at 250 MHz; (3) 16 at 195 MHz + 8 at 30 MHz; (4) 4 at 180 MHz
  Last upgrade: (1) March 1999; (2) March 1999; (3) March 2000; (4) December 1999
  Performance: (1) 28.8 Gflops; (2) 16 Gflops; (3) 6.2 Gflops; (4) 1.4 Gflops
  Memory/disk: (1) 16.0 GB RAM/156 GB; (2) 14 GB RAM/180 GB; (3) 4.5 GB RAM/36 GB; (4) 1.0 GB RAM/1 TB RAID5 (capacity extended by Veritas HSM using a tape library with 13.6 TB capacity)
  Planned upgrades: No information provided.

UI
  Systems: (1) NekoTech Jaguar 333 MHz; (2) DCG Computers Viper 500 MHz; (3) DCG Computers LX 533 MHz; (4) DCG Computers LX 533 MHz; (5) MicroWay Alpha 600 MHz
  Processors: 1 each
  Last upgrade: (1) 1995; (2) 1997; (3) 1997; (4) 1998; (5) 1999
  Performance: No information provided.
  Memory/disk: (1) 64 MB/9 GB; (2) 128 MB/18 GB; (3) 128 MB/18 GB; (4) 128 MB/18 GB; (5) 256 MB/18 GB
  Planned upgrades: Three AlphaStation-type workstations in the next five years.

IRI
  Systems: (1) Cray J-9; (2) SGI O2000; (3) NEC SX-4B
  Processors: (1) 8 and 16; (2) 64; (3) 2
  Last upgrade: 4 years ago for the Crays; nearly 1 year for the Origin upgrade; just over 1 year for the SX4
  Performance: (1) 1.5 Gflops; (2) 5 Gflops; (3) 2.5 Gflops
  Memory/disk: (1) 32 GB, 1.4 TB; (2) 16 GB, 0.1 TB; (3) 8 GB, 0.2 TB; additional mass store available (10 TB at LDEO, larger system at SDSC)
  Planned upgrades: Crays will be replaced within the next year; new system not yet known.

LANL
  Systems: (1) SGI Origin 2000
  Processors: (1) 1024
  Last upgrade: (1) 1999
  Performance: (1) 100 Gflops (theoretical sustained) (c)
  Memory/disk: (1) 256 MB/processor, or 256 GB for the system
  Planned upgrades: Unknown.

NASA-DAO
  Systems: (1) SGI Origin 2000 clusters
  Processors: (1) six 64-CPU machines, one 32-CPU machine
  Last upgrade: (1) 2000
  Performance: (1) ~3-4 Gflops on each of the 64-CPU clusters
  Memory/disk: (1) 16 GB central memory; disk space varies
  Planned upgrades: Only minor upgrades planned.

NASA-GISS
  Systems: (1) SGI Origin 2000
  Processors: (1) 96
  Last upgrade: (1) 1998
  Performance: (1) ~75 Gflops for mostly single-processor runs and ensembles of runs
  Memory/disk: (1) central memory 20 GB/1000 GB
  Planned upgrades: Upgrade to 128 processors, an upgrade of chip speed to the current state of the art, and increased disk storage.

NASA-GSFC
  Systems: (1) Cray T3E/600; (2) DEC Alpha 4100
  Processors: (1) 1024; (2) 12
  Last upgrade: (1) 2000; (2) 1999
  Performance: (1) 40 Gflops; (2) 1 Gflop
  Memory/disk: (1) 128 GB memory, 750 GB disk; (2) 3.5 GB memory, 1800 GB disk, 20 TB mass storage system
  Planned upgrades: (1 and 2) Doubling of capability for the current systems in 2001 and again in 2003.

NCAR-M. Blackmon
  Systems: (1) Cray C-90; (2) Cray J-90; (3) SGI Origin; (4) IBM SP
  Processors: (1) 16; (2) 16-20; (3) 32, 64, or 128; (4) variety of configurations
  Last upgrade: (1) decommissioned in late 1999; (4) spring 2000
  Performance: (1) ~5 Gflops; (3) ~5 Gflops (both using 64 processors)
  Memory/disk: Unknown.
  Planned upgrades: New system to be installed in early 2001.

NCAR-W. Washington
  Systems: (1) Cray T3E900; (2) SGI Origin; (3) Origin 2000/128; (4) HP SPP2000; (5) IBM SP2; (6) Sun Starfire; (7) DEC/Compaq; (8) Alpha cluster; (9) Linux cluster
  Processors: Unknown.
  Last upgrade: Unknown.
  Performance: Unknown.
  Memory/disk: No information provided.
  Planned upgrades: NCAR will soon be involved in procurement of a new system to be installed in early 2001.

NOAA-CDC
  Systems: (1) Compaq AlphaServer DS10; (2) Sun Enterprise 4500; (3) Sun Ultra 60; (4) Sun Enterprise 450
  Processors: (1) 12 machines, each with a single 466 MHz Alpha 21264 processor; (2) 2 machines, one with 8 UltraSparc II 400 MHz processors, the other with 4; (3) 6 machines, each with 2 360 MHz UltraSparc II processors; (4) 4 machines, each with 4 300 MHz UltraSparc II processors
  Last upgrade: (1) May 2000
  Performance: (1) 6.3 Gflops; (2) 3.6 Gflops; (3) 3.25 Gflops; (4) 3.6 Gflops (LINPACK Gflops for the aggregate of each system type)
  Memory/disk: (1) each node has 512 MB/50 GB; (2) 4 GB on the 8-processor machine, 2 GB on the 4-processor machine; (3) 1 GB RAM on 3 machines, 2 GB on the others; (4) 2 GB; 2928 GB of disk storage shared by the Sun systems
  Planned upgrades: AlphaServer cluster will be upgraded as faster processors become available (resources permitting).

NOAA-GFDL
  Systems: (1) SGI/Cray T932; (2) SGI/Cray T94; (3) SGI/Cray T3E (water-cooled chassis)
  Processors: (1) 22; (2) 4; (3) 128 at 450 MHz
  Last upgrade: (1) upgraded to 26 processors in 1996, de-rated to 22 processors in 1999 because of irreparable damage to the inter-processor network; (3) the air-cooled T3E with 40 450-MHz processors (128 MB of memory each) was replaced with a water-cooled T3E with 128 450-MHz processors (256 MB each)
  Performance: approximately 14-15 Gflops sustained for the laboratory's actual workload
  Memory/disk: central memory (1) 0.004 TB (shared); (2) 0.001 TB (shared); (3) 0.033 TB (distributed); secondary storage (1) 32 GB; (2) 2 GB; (3) 0 GB; rotating disc secondary storage (1) 450 GB; (2) 770 GB; (3) 430 GB
  Planned upgrades: Acquire a balanced high-performance system to replace the current SGI/Cray systems. The first phase of this new system is expected to provide at least a three- to four-fold increase in performance; the second phase should deliver a substantial further increase over the phase-one system.

NOAA-NCEP
  Systems: (1) IBM SP; (2) SGI Origin 2000
  Processors: (1) 768; (2) 256
  Last upgrade: (1) Nov. 1998, with a major upgrade due in Sept. 2000; (2) fall 1999
  Performance: Unknown.
  Memory/disk: (1) 256 MB/node on 384 nodes, ~96 GB total; (2) 128 GB total
  Planned upgrades: The IBM SP will be upgraded to a 128-node (2048-PE) system in Sept. 2000, with further upgrades to increase capacity in 2001; NAVO MSRC will continue to increase its total capacity by installing new systems such as a Sun server and an IBM SP.

NPGS
  Systems: (1) T3E; (2) SGI Origin 2000; (3) IBM SP2 (all off-site)
  Processors: (1) 256; (2) 128; (3) 64
  Last upgrade: 0-3 years old
  Performance: (1) 10 Gflops; (2) 10 Gflops; (3) 5 Gflops
  Memory/disk: 0.5-1.0 GB
  Planned upgrades: The remote systems have upgrades of 2x to 5x in computing power in the works.

NRL
  Systems: (1) Cray C90 (2 systems at FNMOC); (2) DEC Alpha (NRL system); (3) SGI O2K (FNMOC); (4) T3E (DoD HPC/NAVO)
  Processors: (1) 16/8; (2) 8; (3) 128; (4) 1088
  Last upgrade: (1) 1999; (2) 1999; (3) 2000; (4) 1998
  Performance: (1) 6.4/3.2 Gflops; (2) 2.0 Gflops; (3) 40 Gflops; (4) 50 Gflops
  Memory/disk: (1) 8 GB/3 TB; (2) 8 GB/1 TB; (3) 256 GB/3.7 TB; (4) 387 GB/1.5 TB
  Planned upgrades: The SGI O2K will be upgraded to an SGI SN1 during fall 2000; the DoD HPC systems undergo constant upgrades.

PNNL-S. Ghan
  Systems: (1) ~3 Sun Ultra 5 workstations; (2) Beowulf cluster
  Processors: (1) 1; (2) 16
  Last upgrade: (1) 1999; (2) 2000
  Performance: (1) 0.2 Gflops; (2) 2 Gflops
  Memory/disk: (1) 512 MB/30 GB; (2) 4 GB/320 GB
  Planned upgrades: Upgrade the Beowulf network to gigabit.

PNNL-R. Leung
  Systems: (1) IBM SP2
  Processors: (1) 512
  Last upgrade: (1) 1999
  Performance: (1) 247 Gflops
  Memory/disk: (1) 262 GB/5 TB
  Planned upgrades: Upgrade the IBM SP by replacing all existing processors with faster ones.

PSU
  Systems: (1) Cray SV-1; (2) IBM RS6000 SP (8 Winterhawk nodes)
  Processors: (1) 16 (each 1.2 Gflops); (2) 8 nodes of 4 CPUs each (32)
  Last upgrade: (1) 2000 (the Cray SV-1 replaced a J-class machine); (2) brand new
  Performance: (1) 6 Gflops; (2) 6 Gflops
  Memory/disk: (1) 4 GB/220 GB; (2) 16 GB/292 GB
  Planned upgrades: The IBM is an effort to match the architectures of recent U.S. laboratory purchases; if codes are successfully transitioned to this machine, the plan is to increase the number of CPUs, hopefully by a factor of 3.
3.3 HUMAN RESOURCES
The survey responses revealed an overwhelming need at many of the modeling centers for highly qualified technical staff (modelers, hardware engineers, computer technologists, and programmers), who are difficult to find because private industry lures them away with higher salaries and other financial incentives.
An interesting point to note from the survey responses is that staffing levels at all three sizes of centers are similar despite differences in the scale of effort. This is likely because at the smaller centers many of those listed as staff are students and post-docs, whose numbers vary depending on funding levels. There are approximately 550 full-time employees dedicated to climate and weather modeling in the United States. This number is likely to be low because not all small modeling centers were surveyed, and a few intermediate-size centers did not respond.
Most centers, regardless of size, indicated the likelihood of increasing the number of staff in the near future. Although many of the staffing increases listed were in the area of software development and computational support, a number of institutions were also increasing the scientific staff devoted to model interpretation and parameterization. Larger centers tended to be more satisfied with their staffing numbers. In part, this difference appears to be due to difficulties in finding stable, long-term funding for permanent staff at the small centers.
Respondents from universities differed over whether the availability of high-quality graduate students entering the atmospheric sciences is decreasing. Those centers that felt there were sufficient students noted that the greater difficulty was finding continued funding to support the highest-quality students available.
3.4 THE HIGHER-END CENTERS
Table 3-1 gives a synoptic view of the computer resources available to the higher-end centers in the United States. In general, most of the centers have computer capabilities on the order of 20 Gflops, with one or two having twice that. With these resources most coupled climate models are run at about 300 km resolution in the atmosphere and about 100 km in the ocean.
In contrast, the European Centre for Medium-Range Weather Forecasts (ECMWF) has a 100-processor Fujitsu VPP5000 rated at a sustained 300 Gflops, a 116-processor Fujitsu VPP700 rated at a sustained 75 Gflops, and a 48-processor VPP700E rated at a sustained 34 Gflops. Its forecast model is run at 60 km resolution globally, while its seasonal-to-interannual predictions are run at about 130 km resolution globally in a one-tiered sense and with ensembles of 15 per month. For more detailed information refer to http://www.ecmwf.int/research/fc_by_computer.html.
The Japanese Frontier Program is developing a 10 km global atmospheric model and has contracted for a supercomputer (“The Earth Simulator”) having a sustained speed of 5 Tflops (http://www.gaia.jaeri.go.jp/OutlineOfGS40v3_1.pdf).
3.5 ORGANIZATIONAL BACKGROUND
The earlier modeling report (NRC, 1998a) pointed out the basic health of small-scale climate modeling and the lagging progress of high-end climate modeling: these findings were confirmed above. That report summarized the difficulties faced by high-end climate modeling as follows: “The lack of national coordination and funding, and thus sustained interest, are substantial reasons why the United States is no longer in the lead in high-end climate modeling.” It also identified the United States Global Change Research Program (USGCRP) as the only available mechanism to coordinate and balance the priorities established by individual agencies, but pointed out that the USGCRP did not have the means to do this.
More background is appropriate, and again the organizational comparison of weather and climate proves valuable. The government organization for weather and weather forecasting was solidified about 1970, when NOAA and its Weather Service were placed in the Department of Commerce. The Weather Service embodied a specific agency structure with a well-defined mission that could be evaluated by progress in the production, accuracy, and delivery of weather forecast products.
The development of climate research in the United States was hastened by concerns over the perceived problem of global warming, but was constrained by the existence of an agency structure that had solidified by 1970. No additional government re-organizations occurred after 1970 and previous ones did not have climate as a tangible concern. Because no single agency could address all the aspects of climate (or more precisely, because many agencies claimed different aspects of climate but none were founded with climate as a mission), the Global Change Research Act of 1990 established the U.S. Global Change Research Program (USGCRP) “aimed at understanding and responding to global change, including the cumulative effects of human activities and natural processes on the environment, and to promote discussions toward protocols in global change research and for other purposes” (Appendix A of NRC, 1999a). It set into motion the USGCRP interagency process that addressed the following research elements:
- global observations of “physical, chemical and biological processes in the earth system”;
- documentation of global change;
- studies of earlier global change using paleo proxies;
- predictions of global change, including regional implications;
- “focused research initiatives to understand the nature of and interactions among physical, chemical, biological, and social processes related to global change.”
It also called upon the National Research Council to evaluate the science plan and provide priorities for future global change research. This was the motivation behind the NRC “Pathways” report (NRC, 1999a).
The Pathways report pointed out the flaws in the conception and implementation of the USGCRP—in particular that “in practice, the monitoring of climate variability is not currently an operational requirement of the USGCRP nor is there an agency of the U.S. government that accepts climate monitoring as an operational requirement or is committed to it as a goal.” It also expanded the domain of climate research to include variability on seasonal-to-interannual and decadal-to-centennial time scales.
A group of agencies, each devoted only to research and combined in the USGCRP, is currently the only institutional arrangement for performing climate research; for establishing and sustaining a climate observing system; for identifying, developing, and producing climate information products; for delivering these products; and for building the general infrastructure needed to accomplish these tasks. The USGCRP is currently the only entity organized to develop climate models and to secure the computational and human infrastructure needed to respond to the demands placed on the climate modeling community. About 6% of the $1.8 billion annually allocated to the USGCRP is devoted to modeling, and this includes the major data assimilation efforts of the NASA Data Assimilation Office.
3.6 SUMMARY OF HIGH-END CAPABILITIES IN THE UNITED STATES
With a sustained computer capability of 20 Gflops, the current capability of some of the U.S. high-end centers, a climate model consisting of a 300 km resolution atmosphere with 20 levels in the vertical, a land model, and a 100 km ocean model, all coupled together and well coded for parallel machines, can simulate 5–10 years per wall-clock day (see http://www.cgd.ucar.edu/pcm/sc99/img002.jpg). A 1000-year run would therefore take roughly 3 to 7 months to complete as a dedicated job. As we will see in the next section, these run times are too long to address some of the recent demands placed on the U.S. climate modeling community.
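The throughput arithmetic can be checked directly. A minimal sketch (the simulation rates are those quoted above; a 30-day month is an approximation):

```python
# Wall-clock time for a long climate integration at the throughput
# quoted in the text: 5-10 simulated years per dedicated machine-day
# for the coupled 300 km atmosphere / 100 km ocean configuration.

def wallclock_months(sim_years: float, years_per_day: float) -> float:
    """Months of dedicated machine time needed for a run of sim_years."""
    days = sim_years / years_per_day
    return days / 30.0  # approximate 30-day months

for rate in (5, 10):
    print(f"{rate} yr/day -> {wallclock_months(1000, rate):.1f} months")
# 5 yr/day -> 6.7 months; 10 yr/day -> 3.3 months
```

At these rates, a millennium-scale control run monopolizes a machine for a substantial fraction of a year, leaving little capacity for the ensembles and sensitivity studies discussed earlier.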