Copyright © 2009. National Academy of Sciences. All rights reserved.
Toward the Future
Steve Chen
Supercomputer Systems, Inc.
If Jack Worlton is a lifetime fellow-user of supercomputers, I have
become a longtime pursuer of a dream machine. I have chased this
machine for more than 10 years. I still have not found the perfect machine
to fulfill the users' needs. This has become very challenging but also very
rewarding work.
My hope is that some day we can come up with a machine that is
about 100 times faster than today's machines. This machine, as one of
the fundamental tools, will be used by scientists and engineers in many
different disciplines to study things they cannot do today.
I would like to share with you some of my thoughts on the future
developments in supercomputing and their potential impact. I will speak
only from a designer's point of view.
THE CURRENT STAGE IN SUPERCOMPUTING
Supercomputing has come a long way, when viewed from many angles:
in speed, the central processing units (CPUs), memory size, input/output
(I/O), peripherals, physical size, and software.
Speed
You have heard about the machine clock period coming down from 100 ns to
50 ns, then to 25 ns, 12.5 ns, 6 ns, and 4 ns. And each time the clock period
is cut in half, the underlying component technology becomes more
complex. Furthermore, the requirements for data space increase. So the
challenge we face in designing the machine gets worse.
Central Processing Units
The central processing unit (CPU) is the heart of the system. When
we cannot get more speed out of a single CPU, we start combining more
CPUs. But this is not an easy job either. We cannot just tie many boxes
together and make the machine faster. My favorite analogy: to build a
faster racing car, we have to decrease the car size and at the same time
have more engines in the chassis. We cannot put in larger engines because
the car would become big and clumsy. So for each generation, we have
to invent a smaller engine that runs faster than the previous one and link
together as many engines as possible, such that the car can run efficiently
when all engine power is applied concurrently.
We have seen the number of CPUs increasing from 1 to 2, to 4, to 8
in a machine. But keep in mind that each CPU has to be faster than the
previous generation. That makes the development work tough!
Memory Size
We start with 1 million words per CPU for data space. Next we see the
words increasing to 4, then to 8, then to 16 megawords per CPU. The data
space is increased to allow solving bigger problems as each generation's
machines harness more and faster CPUs. We are trying to stay one step
ahead of the application. Unfortunately, sometimes we have felt that we
are fighting a losing game. The memory component designer can only give
us a bigger memory chip with very little improvement in speed. Hence the
data access time from memory becomes slower relative to data compute
time. We must now figure out all kinds of tricks to compensate for the gap
between the memory chip and the CPU speed.
Input/Output
Many years ago an input/output (I/O) channel could run about 1
megabyte per second. This was increased to 10 megabytes per second,
and then to 100 megabytes per second, which soon will become a standard
rate for anything usable. So the trend is clear. To solve bigger problems
of the future, we cannot just add memory size and CPU power without
significantly increasing the I/O transfer rate.
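As a rough illustration of why the channel rate matters, the sketch below computes how long it takes to move one CPU's data space through a single channel at the three rates quoted above. The 8-byte word size and the helper name `transfer_seconds` are assumptions made for illustration; they are not figures from the talk.

```python
# Time to move a data space through one I/O channel at a given rate.
# Assumption (not from the talk): one word is 8 bytes (64 bits).
BYTES_PER_WORD = 8

def transfer_seconds(megawords: float, megabytes_per_second: float) -> float:
    """Return the seconds needed to move `megawords` of data at the given channel rate."""
    megabytes = megawords * BYTES_PER_WORD
    return megabytes / megabytes_per_second

# Memory sizes and channel rates quoted in the text.
for megawords in (1, 16):
    for rate in (1, 10, 100):
        secs = transfer_seconds(megawords, rate)
        print(f"{megawords:>2} megawords at {rate:>3} MB/s: {secs:8.2f} s")
```

Under that word-size assumption, a 16-megaword data space needs 128 seconds at 1 megabyte per second but only 1.28 seconds at 100 megabytes per second.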
Peripherals
Peripherals are also a serious problem. Advances in storage technology
are falling behind the CPU's improvement in terms of capacity and speed.
Ten years ago it was common to have disks with hundreds of megabytes
and a 1-megabyte-per-second transfer rate. Today, we have gigabyte storage
units with a 10-megabyte-per-second transfer rate. In the meantime, we still
have to use a solid-state secondary memory device as a buffer to smooth
out the speed difference between CPUs and peripherals.
Physical Size
Not too many people recognize the changes in the physical size of
supercomputers. Many years ago the CDC 6600 filled about 500 square
feet of floor space. The CRAY-XMP occupied roughly 100 square feet.
The CPU module of the CRAY-YMP is suitcase-sized. Future products
may shrink even further. But that does not mean that such a CPU is easy
to design. We cannot just squeeze everything together. As each generation
of machine comes down in size, the heat dissipation becomes harder to
deal with. We can increase circuit density in the chip, but we cannot
proportionally reduce the power per gate.
For example, a suitcase-sized supercomputer may dissipate a couple of
thousand watts of power. We may be able to put it on the desktop, but we
will have an instant meltdown in case of a cooling malfunction; it will go
right through the table. We are dealing with a fantastic problem. It's no
small design challenge to try to keep a supercomputer cool.
Software
No one paid attention to software initially. Most people were thinking
about supercomputers as just pieces of hardware. The user was forced to
figure out how to use it and then hand-code to optimize everything. Later
on we had a little primitive compiler software. Then slowly, people started
to recognize that this was not good enough anymore. Production-quality
compiler software was developed for vector processing over the past 10
years. User expectations for software functionality and performance fea-
tures continue to rise as more and more supercomputers become available
and are widely used.
Systems
Let's view supercomputer development from a different perspective to
appreciate how far we have come. When we look at the 10-year period
from 1955 to 1965, we can see that the CDC 6600 was a dominant factor
in the supercomputer arena, with 1 million to 10 million floating-point
operations per second.
In the period 1965 to 1975, the CDC 7600, the TI ASC, the Burroughs
BSP, and the Illiac IV were developed. They reached from 10 million to
100 million floating-point operations per second. The CDC 7600 was the
major workhorse during this time period.
From 1975 to 1985, thanks to Seymour Cray, a new machine took
the lead. Cray created the CRAY-1 architecture to take advantage of
extensive pipeline vector processing. In addition, supercomputer systems
became more reliable. The mean time between failures jumped from 10
hours to 100 hours and then to 1000 hours, which made the machines viable products for use in
commercial industry. After the CRAY-XMP was introduced, applications
expanded rapidly, from pure laboratory research to various commercial
product areas.
During this time, more machines and manufacturers entered the market:
the CRAY-2, the CDC Cyber-205, and also, from overseas, the Fujitsu,
Hitachi, and NEC models. These machines generally reached from 100 mil-
lion to 1 billion floating-point operations per second. Many more players
have joined in because they see the importance of supercomputing, not
only in the computer industry itself, but also in its wide effects on many
key industry applications.
Personally, I have had the good fortune to work with two of the best
designers in the world, Dave Kuck and Seymour Cray. I have learned a
lot from them. Dave Kuck inspired me with the Illiac IV and with the
follow-on Burroughs BSP project. These projects gave me a deeper inside
view of the system and software areas. I was also pleased to be able to join
Cray Research. Seymour Cray was a good model of the best designer in the
hardware and packaging areas. Finally, I was lucky to have the opportunity
to participate in designing the CRAY-XMP and the YMP, to try my first
foot in the water.
THE NEXT STAGE IN SUPERCOMPUTING
What's in store in the next 10 years? Definitely more companies will
enter the competition, but also some will fall out. The important thing is
that speed will be widespread. In the highest-performance arena, instead
of going 10 times faster, the range will increase to 100 times faster. We
will see machines with 32 to 256 CPUs in production use. Machine speed
will reach between 1 billion and 100 billion floating-point operations per
second. This is based on the technology as far as we can see, barring any
major breakthroughs.
Even this may not be fast enough. The Director of the National
Center for Atmospheric Research, Bill Buzbee, once told me that the next
generation of ocean problems may take about 100 to 1000 hours of current
supercomputer time. I couldn't even comprehend the problem he was
describing. But the problem definitely cannot be solved today. We need
to continue to push supercomputer technology forward in order to fulfill
those requirements.
My personal goal in the future is to develop such a computational
engine for scientists and engineers to open new frontiers in science and
industry, similar to those made possible by the electron microscope and by
steam- and gas-powered engines in earlier days.
I have discovered that developing such a machine is not an easy job
anymore. No single person or single company can do it alone. We must
depend on various technologies (component, software, and application) to
advance in a balanced way. We need to take advantage of every technology
we can get and stretch to move all these areas ahead.
Parallel Processing Environment
We are going into the arena of parallel processing, and it is just a
matter of time before people will learn how to do it. I know it is painful.
But we have moved from assembly language to Fortran. We took a long
time to get there and now Fortran may never die. Now we must move
from Fortran to parallel Fortran. It took about 10 years to grow from serial
Fortran into vector Fortran, and now it may take another 10 years to go
from vector to parallel Fortran. But if we don't start now, we may never
be able to take advantage of the performance of future machines.
So we see where the train is going. Today, and in the near future,
we will have in production from 1- to 16-processor, high-performance
machines. But we also have seen experimental or developmental machines
that have 32 to 256 processors or even 1000 processors. Right now such
machines are in the research and development stage; the critical task is
to study how to use them. Because each processor is quite slow, these
machines are not used in production for general applications.
Our goal is to move gradually toward more and faster processors, while
maintaining a consistent system architecture. This approach will ensure that
no users will suffer a degradation of performance in running their existing
production codes on the next-generation, more parallel machines when they
become available. In the meantime, as users gain experience in developing
more parallel application algorithms, they will be able to explore higher
performance through the added number of processors. I believe this is
a sensible approach to protect the users' software investment and, at the
same time, induce the long-term development of parallel applications.
Next, let us focus on how the three key technology areas (component,
system, and application) may proceed in developing a future high-performance
supercomputer.
Component Technology Development
We will stretch the currently available component technology. We
must combine improvements in many elements to enhance the design of
the machine.
Device Speed
Device speeds have come down from 1 ns to 0.5 ns, and then to 250 ps
and 125 ps. They may even come down to the 50-ps range. Complementary
metal oxide semiconductor (CMOS), gallium arsenide (GaAs), and bipolar
devices are all viable. Each has its own advantages and disadvantages.
Circuit Density
Depending on the device type, today's circuit density is approaching
the 1 K-gate level for GaAs, the 10 K-gate level for bipolar, and the
100 K-gate level for CMOS. In the future, we may see even larger-scale
integrated circuits. How usable are these big chips? Bigger doesn't always
mean better. The advantage of these superchips depends on the trade-off
of speed, power, circuit complexity, and overall system considerations.
Metal Interconnect
As circuit density increases, more transistors have to be connected in
a relatively small and expensive silicon area. One way to keep the chip
size down is to make the interconnect metal thinner, so that more signal
lines can be placed next to each other. However, a thin metal line may
degrade the signal speed and integrity. As a result, the electronic signal
may travel more slowly between transistors, even though each transistor's
intrinsic switching speed is very fast. And, in the worst case, the signal may
not travel far enough before it disappears.
Furthermore, very thin metal may cause an electromigration problem
in a high-speed (high-current) application. This is due to the loss of the
electron-carrying property altogether inside the chip, leading to unreliable
components. Hence we have to develop a better metal interconnect system
within the integrated circuit to allow sufficient current-carrying capability
(for speed), while maintaining smaller physical size (for density). The
balancing act between speed and density is among the most demanding
requirements facing our component designers in the future.
Substrate Material
The substrate material used to fabricate the printed circuit board is
another critical factor. The traditional fiberglass-like material may not be
sufficient for future high-speed and high-density applications. The electrical
property of the material may cause the signal to slow down and become
noisy and lossy as speed increases. In addition, the mechanical and thermal
properties of the material are also important in deciding the number of
signal layers, the density of signal lines, and the compatibility between chip
and substrate. We should continue to enhance current substrate materials
and search for new ones to give us the maximum component packaging
density required for a high-performance system.
Power Consumption
As I mentioned earlier, for a given technology, power per gate in a
chip is not coming down as fast as we would like it to. We have seen
improvements from 50 to 100 milliwatts per gate dropping to 10 to 20
milliwatts per gate (a factor of 5 reduction), then to 5 to 10 milliwatts per
gate (only a factor of 2 reduction). This power-reduction trend appears
to have flattened out. Hence, while we are increasing circuit density, the
total power per chip is rising, causing difficult cooling problems at the
component and system levels. This is a very critical area, and we need
intensive cooperative research efforts with component manufacturers in the
future.
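The arithmetic behind this concern can be sketched as follows. The per-gate power is the low end of the range quoted above; the gate counts are hypothetical round numbers chosen only to show the trend, and `chip_power_watts` is an illustrative helper, not something from the talk.

```python
def chip_power_watts(gates: int, milliwatts_per_gate: float) -> float:
    """Total chip power in watts, assuming every gate draws the same power."""
    return gates * milliwatts_per_gate / 1000.0

# 5 mW/gate is the low end quoted in the text; the gate counts are assumptions.
MILLIWATTS_PER_GATE = 5.0
for gates in (1_000, 5_000, 10_000):
    watts = chip_power_watts(gates, MILLIWATTS_PER_GATE)
    print(f"{gates:>6} gates at {MILLIWATTS_PER_GATE} mW/gate: {watts:5.1f} W per chip")
```

With per-gate power held flat, chip power grows in direct proportion to gate count, which is why higher density turns into a cooling problem.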
Packaging
Many of the integrated circuits we are using are getting faster. Unfortunately,
the performance gains at the component level are derated
significantly because of the packaging loss all the way up to the system
level. Multiple levels of interconnect media, such as printed circuit boards,
chip attachments, connectors, backplane wires, and so on, all affect perfor-
mance. As clock rate increases, component, module, and system packaging
becomes a very critical issue for the total system design.
Testing and Measurement
The bigger the chip, the more pins there are to handle. Future chips
might have 250 to 1000 pins. In addition, they will operate at high speeds
and high power levels. As a result, the problem of testing chips becomes
quite complex and expensive. The same is true for high-speed measurement
equipment for circuit board and system checkout. Because a piece of test
equipment may cost up to $5 million, the availability of cost-effective,
high-performance test equipment has become a more visible concern.
Unfortunately, it is getting harder to find suppliers of advanced test and
measurement equipment to satisfy the performance requirements. Com-
panies in the United States keep dropping out of the market, and some
equipment is only available from overseas. Without such equipment, one
may have the best design, but one cannot build, test, and ship the machine.
So this is also a very important area to watch.
System Technology Development
Architecture Concepts
Once we have the best components, the next step is to put the system
together in the slickest way. There are many ways we can do this. We hear
about many different architectural concepts being explored: single versus
multiple processors; system throughput versus processor speed; single-level
versus multiple-level parallelism; loosely coupled versus tightly coupled
system interconnects; monolithic versus distributed memory; and special-
purpose versus general-purpose system design. If one looks underneath
the design of future machines, one will find one or more of these architecture
flavors. However, the most important thing is to design a balanced
architecture and provide good software to support an application or many
applications. The user, in general, should be aware of but not be bothered
with the complexity of system design.
Solution Time
As I have mentioned before, the issue now is not how fast one can
design a machine to do A + B; the real issue is solution time. In earlier
days, people compared different machines by counting how many millions
of floating-point "add" or "multiply" operations could be done in a second
(MFLOPS). That measurement is similar to the RPM (revolutions per
minute) rate of the wheels of a racing car. The RPM rate is not an
indicator of how much usable horsepower is available when driving on a
real road. Similarly, the MFLOPS of supercomputers bear no relation to
the performance obtainable on real user applications.
Later, when performance was measured by how fast a machine could
compute "Livermore Loops," some people could not differentiate between
a real supercomputing system and a "designer machine" targeted for Liv-
ermore Loops.
We should raise ourselves to a higher level. It took me about 5
years of preaching (I can tell you that's how long I've kept arguing the
point) to convince users to find a new performance measurement yardstick.
Fortunately, now they have gone up one notch to use LINPACK, a set of
mathematical subroutines for solving linear algebra problems that is, in general, more
usable than just the Livermore Loops rate or the peak MFLOPS rate.
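To make concrete how a rate is derived from a linear algebra solve, here is a minimal sketch. It is a plain Python stand-in written for illustration, not LINPACK itself: it solves a dense system by Gaussian elimination with partial pivoting, times the solve, and divides the nominal operation count conventionally used for the LINPACK benchmark, 2n^3/3 + 2n^2, by the elapsed time. The names `solve` and `linpack_flops` and the problem size are assumptions.

```python
import random
import time

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting (works on copies)."""
    n = len(b)
    a = [row[:] for row in a]
    b = b[:]
    for k in range(n):
        # Partial pivoting: bring the largest remaining entry in column k to the diagonal.
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        a[k], a[p] = a[p], a[k]
        b[k], b[p] = b[p], b[k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
            b[i] -= m * b[k]
    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / a[i][i]
    return x

def linpack_flops(n):
    """Nominal operation count conventionally used for an n-by-n LINPACK solve."""
    return 2.0 * n**3 / 3.0 + 2.0 * n**2

n = 100
random.seed(0)
a = [[random.random() for _ in range(n)] for _ in range(n)]
b = [sum(row) for row in a]  # chosen so the exact solution is all ones
start = time.perf_counter()
x = solve(a, b)
elapsed = time.perf_counter() - start
max_error = max(abs(v - 1.0) for v in x)
print(f"n={n}: {linpack_flops(n) / elapsed / 1e6:.2f} MFLOPS, max error {max_error:.2e}")
```

The rate printed by an interpreted sketch like this is tiny compared with any real machine; the point is only that the figure covers the solve itself and nothing else in the workflow.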
Even so, the performance numbers on LINPACK are still only an
indicator of the computation time for a small part of the total solution
process. To be successful in future high-performance parallel processing
systems, we must strive for overall system performance and start to talk
about solution time. And we need the users' help to define what we mean
by solution time rather than computation time.
For example, three-dimensional seismic processing may involve reading
more than 20,000 tapes of earth data before a machine begins to do A
+ B. The process starts by getting data into the machine with the 20,000
tapes and then generating the analysis and output to see exactly what is
underneath the ground. The whole process may take 3 months of today's
supercomputer time, during which only a few days may be spent on numeric-
intensive computational tasks. We need to define this whole process so that
we can measure "total time to get results." We want to make sure the
scientists can do their thinking instead of playing around with the computer
system, or running around the computer room.
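The seismic example can be turned into a small calculation. The sketch below assumes a hypothetical split of the 3-month job into 87 days of tape input and data handling plus 3 days of numeric computation (the talk says only "a few days"), and asks what a 100-times-faster CPU does to the total. The function name `solution_time` and the stage labels are illustrative.

```python
def solution_time(stage_days: dict, compute_speedup: float) -> float:
    """Total days when only the 'computation' stage is accelerated."""
    total = 0.0
    for stage, days in stage_days.items():
        total += days / compute_speedup if stage == "computation" else days
    return total

# Hypothetical split of the 3-month seismic job (assumption, not from the talk).
stages = {"tape input and data handling": 87.0, "computation": 3.0}

before = solution_time(stages, 1.0)
after = solution_time(stages, 100.0)
print(f"before: {before:.2f} days, with a 100x faster CPU: {after:.2f} days")
print(f"overall speedup: {before / after:.3f}x")
```

Under these assumed numbers the total drops from 90 days to about 87 days, so a 100-fold gain in computation time improves solution time by only a few percent.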
If I give a machine to an aircraft designer, that person should be
able to construct a model, pick a grid point, describe the air foil, wing,
and tail, and then simulate it to see if the design is correct. The model
should include structure, air flow, and control and other interdisciplinary
conditions that have to be satisfied in one design. The designer should be
able to define this design process from beginning to end and measure the
machine performance by the total time that must be spent completing this
design process. This measurement is called solution time. The solution
time includes all of the following elements:
Data acquisition/entry;
Data access/storage;
Data motion/sharing;
Data computation/process; and
Data interpretation/visualization.
How to capture the raw and digitized design data, how to store it,
and how to move it efficiently in and out of the disk, solid-state secondary
memory, and main memory during computation are all essential to the
solution process.
Then, after all that has been done, how quickly can the results be
interpreted? When data can be generated very rapidly, a whole week may
be required to digest the numbers. I would rather see the visual: the
underground picture, or the heat flow on the surface of the integrated
circuit chip. When the alpha particle hits the electronic device, I want to
see the electromagnetic field moving while I watch. I want to be able to
start, or stop and restart again, the simulation process any time I want.
While I am simulating an air foil for an aircraft, I want to see if a particular
region of the air foil is subject to high pressure or temperature. If I feel
something is going wrong, I want to zoom into a particular area to test
it again or try out a different algorithm or analysis. I need to have an
interactive design or analysis capability on the system. And last but not the
least important of all, I want to be able to complete all this process without
leaving my own design station.
Levels of Parallelism
System          User Specified, System Scheduled
Job             System Scheduled
Job Step        User and Utility Specified
Program         User and Compiler
Procedure       Compiler and User
Basic Blocks    Compiler and User
Loops           Compiler
Statements      Compiler
I hope these examples illustrate the important difference between the
computation time and the solution time that involves the whole process.
Whoever designs it, the machine with minimal solution time will be the
best system in real application.
Exploitation of Parallelism
To achieve high performance on future parallel systems, we should work
from two directions (see box). From the bottom up, we should continue
to improve the compiler techniques to exploit automatically the parallelism
in user programs. This includes extending vector detection capability to
the detection of parallel processable code. From the top down, we should
provide system and applications support in terms of libraries, utilities, and
packages, all designed to help users prepare their applications to get the
most performance out of the parallelism existing at the highest level.
One way to think of a parallel application in the future is as a multiple-
domain approach. We have many, many processors at our disposal. How
do we decompose a problem and make it 99 percent parallel? It is not
difficult. If we look at natural phenomena, most are parallel. Unfortunately,
we are trained to think sequentially. Take the aircraft design example again.
We simulate one wing, then another wing, then the body, the tail. Each
part is called one domain. We can now simulate all domains at the same
time.
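How much the 99 percent figure matters can be checked with Amdahl's law. The talk does not cite it, but it is the standard bound for this situation: with parallel fraction p and n processors, speedup is at most 1 / ((1 - p) + p / n). The sketch below evaluates it for processor counts mentioned earlier; `amdahl_speedup` is an illustrative name.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Upper bound on speedup from Amdahl's law."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)

for p in (0.90, 0.99):
    for n in (16, 32, 256):
        bound = amdahl_speedup(p, n)
        print(f"{p:.0%} parallel on {n:>3} processors: at most {bound:5.1f}x")
```

Even at 99 percent parallel, 256 processors give a bound of about 72 times; at 90 percent the bound stays below 10 times however many processors are added.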
We can also think of a parallel application as a multiple-stage pipeline
approach. Take the seismic processing example. First we start with tape
input, and then comes data verification and alignment. The next step is
analysis and simulation. The final step is data interpretation and visual-
ization of the underground picture. All stages of the whole process can
be done concurrently on the system. The first stage can be performed in
groups of a few processors, with data flowing continuously to support the
next stage on another group of processors, and so on.
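The multiple-stage pipeline described above can be sketched with one worker per stage connected by queues. This is a minimal illustration, not the speaker's design: Python threads stand in for groups of processors (in CPython they show the structure rather than true CPU parallelism), and the stage functions and tape data are hypothetical.

```python
import queue
import threading

DONE = object()  # sentinel that tells a stage no more data is coming

def stage(work, inbox, outbox):
    """Run `work` on every item from inbox and pass the result downstream."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)
            return
        outbox.put(work(item))

# Hypothetical stand-ins for the seismic stages named in the text.
def verify_and_align(tape):
    return sorted(tape)

def analyze(samples):
    return sum(samples) / len(samples)

def interpret(mean):
    return f"mean amplitude {mean:.1f}"

steps = [verify_and_align, analyze, interpret]
queues = [queue.Queue() for _ in range(len(steps) + 1)]
threads = [threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
           for i, f in enumerate(steps)]
for t in threads:
    t.start()

# Tape input: keep feeding tapes while later stages are already working.
for tape in ([3, 1, 2], [6, 4, 5], [9, 7, 8]):
    queues[0].put(tape)
queues[0].put(DONE)

results = []
while (out := queues[-1].get()) is not DONE:
    results.append(out)
for t in threads:
    t.join()
print(results)
```

Because each stage starts on the next tape as soon as it hands off the previous one, all stages are busy at once when input keeps flowing.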
Take this one step further. If we look at the future application de-
velopment, we can bring different disciplines into one design solution, a
multidiscipline approach. For example, in the design of a space shuttle,
materials, structure, aerodynamics, and control problems can all be evalu-
ated at the same time with various design criteria. The analysis step of each
discipline can be processed in parallel by different groups of processors.
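A minimal sketch of the multidiscipline idea follows, assuming each discipline can be expressed as an independent check on a shared design description. The design parameters, thresholds, and function names are all invented for illustration, and a thread pool stands in for the separate groups of processors.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical design parameters and criteria, invented for illustration only.
design = {"mass_kg": 900.0, "lift_n": 12000.0, "skin_temp_c": 450.0, "gain": 0.8}

def check_materials(d):
    return d["skin_temp_c"] <= 500.0

def check_structure(d):
    return d["mass_kg"] <= 1000.0

def check_aerodynamics(d):
    return d["lift_n"] >= d["mass_kg"] * 9.81

def check_control(d):
    return 0.0 < d["gain"] < 1.0

disciplines = {
    "materials": check_materials,
    "structure": check_structure,
    "aerodynamics": check_aerodynamics,
    "control": check_control,
}

# Submit every discipline's analysis at once, then gather the verdicts.
with ThreadPoolExecutor(max_workers=len(disciplines)) as pool:
    futures = {name: pool.submit(fn, design) for name, fn in disciplines.items()}
    verdicts = {name: fut.result() for name, fut in futures.items()}

print(verdicts)
print("design acceptable:", all(verdicts.values()))
```

The design is accepted only when every discipline's criterion holds, which mirrors evaluating all of them against one design at the same time.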
These examples are just a few of the ideas for exploring future parallel
systems to achieve much higher system performance through a top-down
application decomposition than can be obtained only by the bottom-up
compiler approach. The key to success is the adaptability of the system
architecture. Users should not have to change application algorithms when
they migrate to higher parallel machines.
Application Technology Development
Many examples indicate that supercomputers have proved very useful
in various industries: in the defense, petroleum, aerospace, automotive,
meteorological, electronic, and chemical segments. Today, all the industrial
countries of the world are developing their own application techniques using
supercomputers. These tools improve their competitiveness in creating
new materials, developing better processes and products, or making new
scientific discoveries. We see existing applications expanding to include
more complex geometry or more refined theory as machine capability and
capacity keep improving.
We also see the potential in new areas, especially materials science. We
need help to find new materials, whether we are designing integrated circuits
for supercomputers or developing industrial products. Other emerging
application areas include biomedical engineering, pharmaceuticals, and
financial analysis. New applications will also evolve from interdisciplinary
areas.
We have to think about how to develop future application technology
along with future system design. We must start earlier to interact with
leading application scientists and engineers to develop the next generation
of algorithms to make the greatest use of parallel processing. These efforts
will also help to speed up the migration of existing application codes onto
new machines.
Our challenge is to start using these machines in production as soon
as they become available. The worst thing we can do in this country is
to design the best machines but then not use them. Then some other
country will jump in to make use of them ahead of us. We have already
seen this happening in some industries with the current generation of
supercomputers. We certainly want to keep our leadership position in
application technology development for future machines.
SUMMARY
New Directions
In summary, I will point out a few new directions that may evolve in
supercomputing:
Comprehensive support for parallel processing;
Development of open systems that enhance productivity and competition;
Total system design to minimize solution time;
Seamless services environment and distribution of functions; and
Wider applications in scientific, engineering, and commercial fields.
In the future there will be more comprehensive support for parallel
processing from very primitive to very sophisticated levels. This means
that more compiler and system software features will be made available
for supporting users in parallelizing their application algorithms as well as
developing and debugging parallel programs.
The open system concept is spreading rapidly. Participants are work-
ing from many directions to exchange ideas and codes. An open system
environment will allow us to concentrate our development and application
resources only on those extension areas related to performance or function-
ality. This will prevent the "reinventing the wheel" syndrome and enhance
our productivity in delivering competitive products.
A total system design that minimizes solution time is an important
key. We will measure machines by solution time instead of by computation
time.
The user will see a seamless services environment with distribution of
functions: the supercomputer merged with mainframes and workstations.
Users won't have to tackle different kinds of environments. Instead, an
integrated design, engineering, and manufacturing computing environment
will emerge, greatly enhancing user productivity and industry efficiency.
We will also see a broad expansion of applications for science, engi-
neering, and commercial endeavors. Scientists and engineers will explore
the unknown and develop new technologies. Industry will be more compet-
itive and productive through its development of new products or processes.
Potential Impact
Developments in supercomputing technology strongly influence not
only the competitiveness of key industries in our national economy but also
the vitality of the computing industry itself. This influence on the computer
industry can be shown in a simple triangle (Figure 5.1). The base of
the triangle represents personal computers and workstations. The middle
section contains mainframe or mid-range computers. At the top is the
supercomputer. All three levels of technology are interacting heavily. For
example, the basic component technology, parallel architecture concepts,
and software and hardware design exploited in the supercomputer arena will
trickle down to the mainframe and workstation level; vice-versa, the user
interface software and application tools commonly seen at the workstation
level will be introduced at the supercomputer level. As a result, the
supercomputing technology pulls the computer industry upward, creating
new market opportunities and enhancing user productivity.
Need for Technological Leadership
I used to say, "How do we stay there?" I have changed my mind. Now
I say, "How do we get there?" The race is too close to call at this time.
I don't think we have too much leadership in component technology.
I have worked on this problem for many years. Each year I become more
humble when I see how difficult it is to build this kind of machine without
a competitive and sustainable technology base.
We are losing by months from many points of view. We are starting to
lose some of the critical components. We have tried to help U.S. companies,
to work with them, to drive their capability forward to meet with us. But
sometimes it is like wrestling with a big boat.
Our competitors have the advantage. Their work is integrated. They
can focus on something and stay in there for a long time. They can
sacrifice one segment of their industry to pay for another one as long as
it is strategically important to their long-term technology objectives. In
the past, we in the United States seemed not to be able to do that no
matter how hard we tried. Thus, to reverse this trend, some component
and computer industry leaders need to work together intensively to develop
and maintain a strong component technology base in this country.
[Figure 5.1: a triangle with three tiers labeled, from top to bottom, SUPERCOMPUTERS, MAINFRAMES, and WORKSTATIONS/PCs, each annotated with vendor names. Technology areas listed alongside: integrated circuits, printed circuits, packaging, cooling, power, architecture, hardware, software, user environment, and applications.]
FIGURE 5.1 Impact of supercomputing technology. (Note: Manufacture of supercomputers
by CDC was discontinued in April 1989.)
Fortunately, we still have some lead in software and application tech-
nology, especially with respect to parallel processing. My hope is to combine
our resources with those of government, universities, and industry. It is
important for us to keep this cooperative development effort moving. In
5 years, we can design a machine that is 100 times faster than today's,
but nobody will be able to use it unless we ship it with good software and
application tools.
We must start working with users today. It may take 5 years to develop
an application. Beginning now, while users are developing their next-
generation applications for a high-performance parallel machine, we can
be developing our next-generation system software and application libraries
and tools for a high-efficiency user environment. We are entering a new
paradigm of supercomputing in which user application (and productivity)
is in the center, instead of hardware (peak rate) as in the last decade.
That is my goal. We have to keep this technology leadership. We can
accomplish it as long as we have a common view of the future. In order to
develop and sustain supercomputing technology, we must take a long-term
view. We must be willing to take risks. We have learned from our past
experience. Also, most importantly, we should have a focus. We have many
resources in this country, but they are scattered and never focused enough.
That is why we are losing step by step in some areas.
These are just some of my personal observations and experiences that
I would like to share with you. Certainly, I am not done yet. I am still
chasing that dream machine!
DISCUSSION
Michel Gouilloud: Steve Chen, you have come with a long list of
challenges and problems. Can you suggest some priorities, in other words,
some of the problems you see as the most critical for you in the path of
developing your next generation of machines?
Steve Chen: I think the underlying component technology is the most
critical problem. For example, in silicon technology I see a plateau for
speed and power. The next-generation chip we see is denser but not faster,
and it requires more power. We certainly don't want to have a machine
that is 100 times faster but needs 100 times more power. We may have to
build a power substation next to the computer room. That problem is real.
We need a breakthrough in this area.
Another critical area is high-density cooling. We have to be able to
cool a small area that has very dense heat dissipation, e.g., 10,000 to 20,000
watts.
The next area is application. We need to work with users to design
machines that are balanced, while at the same time preparing their future
applications to take full advantage of parallel processing.
Michael Teter: We from Corning Glass are interacting fairly heavily with
the Cornell Supercomputer Facility. We seem to notice that, independent
of the size of the supercomputer there, as soon as users start competing
for time, the amount that any individual scientist has for his own research
becomes essentially negligible, and he would almost be better off buying a
VAX and working by himself.
Larry Smarr: The largest university user of the NCSA has received
10,000 hours in the last year. Several users have used over 1000 hours.
Kodak uses more than 100 hours a month. It is management of the
allocation of time that is important. The national centers are still learning
how to do this. In fact, it is only within the past several months that the
blue ribbon peer review boards for each center have taken over completely
the allocation of time. Previously, individual program officers at the NSF
simply forwarded any good proposal they received, and that caused some
real saturation problems.
Our goal is certainly to upgrade the facilities as rapidly as we can.
That requires leadership and support from Congress and the NSF. I believe
we are all now beginning to pull together on that. Our goal is to give to
those users who are on the machine both supercomputer response time and
supercomputer power, even if that means that we have to limit by strict
peer review the number of users on the system.
Arthur Freeman: I would like to add to the discussion about whether it
is better to use a VAX. If you can use a VAX, don't go to a supercomputer.
One thing that is very clear is that 100,000 VAXs don't add up to a
supercomputer in terms of capability, just as 100,000 Volkswagen engines
don't add up to a Saturn engine. Supercomputers are very different from
VAXs. I think people have to understand this difference between capacity
and capability. Capability just is not there on a VAX: It is there on a
supercomputer. We want to increase that capability all the time.
George Super: My question addresses a concern outside the operational
discussion that has just been going on. Steve Chen, you very accurately de-
scribed that one of the major challenges you face is decomposing problems
and understanding how to think differently about solution sets. You said
that we need to increase the demand function, because we are possibly at
a stage now in our society where our supply of supercomputing capacity
exceeds our ability to use it wisely.
I wonder if you think that we are facing a major intellectual challenge,
a computational mechanics challenge that is even greater than the technical
challenge of building faster machines?
Steve Chen: Yes, we face a psychological challenge. I was joking with
Jack Worlton. For many years, every time I spoke with him, he always
said he needed a machine 100 times faster. Now I say, "I will give you
that machine, but tell me how to use it." Each time I have given him a
machine at Los Alamos, it was already too slow. But, at the same time, the
machine was not used to exploit its full performance features. We had a
four-processor system for more than 5 years. But the users were still using
the system as a throughput machine without going to parallel processing.
This was because it was so easy to port all the existing application codes
onto the new system, to run it as a four-way throughput machine instead
of a four-way parallel machine.
In contrast, the overseas users are more aggressive. A good example
is the European Consortium for Medium-Range Weather Forecasting. In
anticipating the future performance required by finer-resolution forecasting
models and upcoming parallel machines, they have already decomposed
their problems with a general n-way parallel approach where n is greater
than 1. They had demonstrated their parallel algorithms in a research model
before the next machines arrived. Hence they were able to continue to
upgrade their production forecast model from 1 processor to 2 processors,
to 4 processors today; next they will have 8, 16, and even higher numbers
of processors as soon as those become available. Their transition from a
research to a production model has been quick and successful, because they
took a long-term view and broke that psychological barrier very quickly.
We in the United States are behind in this respect. We have got to catch
up in this area.
John Riganati: Steve, in the earliest days of vector architecture, Seymour
Cray made a presentation to Lawrence Livermore National Laboratory.
At the end of the presentation he was asked what made him believe
that the vector architecture he was discussing was really a general-purpose
machine (it didn't exist at the time) and whether the problems at Livermore
would be able to map into those.
The way Harry Nelson tells the story, Seymour just smiled enigmatically
and said, "We'll see." Well, we did see, and the vector architecture has
proven to be quite general purpose. But the architectures that are evolving
now are one step more difficult to understand. Can you help us, especially
from a user point of view, to understand why the parallel architectures, the
cluster architectures, really will be general purpose in the sense that they
will be able to map general applications onto those architectures?
Steve Chen: Yes. Let's refer to my earlier remarks. You can think
about your applications and decompose them from the top down, e.g.,
using the multiple-domain approach, the multiple-stage pipeline approach,
or the multi-discipline approach. These are natural approaches by which
you can easily map many applications onto the parallel architecture. You
get the best performance that way. With the proper tool set, the user
should be able to exploit this high-level parallelism in a simple and general
way without entanglement with the lowest-level machine complexity.
Mark Karplus: You gave us the hope that in 5 or 6 years you might have
a machine that is 100 times faster and that will combine some improvements
in technology, plus minor parallelism. What many people wonder about is
doing much better. I think there will be people who will very easily figure
out how to use a machine that is 100 times faster and who will want more.
But there is very little discussion of massive parallelism, and many people
say from the computer point of view that the future is to get machines that
are 1000 or 10,000 times faster.
Steve Chen: I can only give you my personal viewpoint. I think those
are worthwhile research activities at this moment. I would like to see that
effort moving forward. But as far as putting 1000 microprocessors together,
I don't think you can achieve the same capability we're talking about in
solving general applications problems. I would rather evolve from the
currently available smaller parallel machines to larger parallel machines,
step by step. We have to move the whole community, instead of just one
or two very bright scientists. A few people might be able to sit down at a
terminal and decompose a problem into 1000 parallel tasks. That would
be very good. But I don't think we can bring in the whole community that
way in a short period of time. However, I do see the possibility of special-
purpose massively parallel machines cooperating with the general-purpose
supercomputer.
Edward Mason: At Amoco Corporation we use supercomputers and
massive computation for geophysics, but we also have a chemical company
and a refining company. One of the biggest problems is retraining or
educating people who are very good in particular fields of science but who
have not used supercomputers, to solve problems by taking advantage of the
opportunities provided by computational science and, when appropriate, by
supercomputers.
Visualization and transparency are crucial. Parallel computing has
been discussed a lot here, but the biologist or the chemist could care less
how it is done. The concern is what can be done. And the problem is to
have those experts in chemistry, biology, and other fields become familiar
with how to simply, from their point of view, exploit supercomputers.
Larry Smarr: Critical to the success of that education and training,
which I think is issue number one, is having the industrial users live and
work in the university environment where, because of the NSF initiative,
we have such a vast number of faculty and students who are not having to
relearn but are very energetically going directly into using supercomputers.
Having them work shoulder to shoulder with the people from industry is
proving to be very effective in bringing about that technology transfer. I
would very much like to see more support from the government for this
education and training part of the program.