Copyright © 2009. National Academy of Sciences. All rights reserved.
THE LIFE SCIENCES
considerations are important. The computer becomes widely and successfully used in a field only as the scientists in that field come to understand and use the computer. Otherwise application remains isolated and peripheral. Despite superficial similarity of tasks among various fields, each field proves to have its own unique problems in processing information. This is not to deny the importance of an autonomous computer science, or of attempts at interdisciplinary efforts (especially when they serve to seduce those with computer expertise ultimately to become life scientists!). It does, however, establish the essential condition of successful assimilation and where the ultimate responsibility rests.
6. Computer facilities must specialize. The generality of the computer leads naturally to the view that a single computational facility can service all needs. The establishment of large university computation centers in the last decade reflects this philosophy in part. But so great is the diversity of computer use that, in fact, each computer facility, whether large or small, serves only a fraction of the whole range of needs. Every existing computer facility substantiates this notion. Time-sharing systems are not efficient for processing large numerical calculations; large systems for statistical calculations cannot accommodate laboratory monitoring, and so on. The generality of the computer is that it can be adapted to any symbolic task, not that it can be all things for all users simultaneously.
THE STATE OF COMPUTER APPLICATION IN THE LIFE SCIENCES
How quickly and thoroughly, and in what respects, computer application
becomes effective in a given field will be determined by the interplay of the
general facts cited above.
Extent of Use
Computing is widespread in the life sciences. Approximately one in three life scientists computes, and the total cost of their computing is in excess of $18 million a year. Unless otherwise specified, all data concerning the state of computing in the life sciences come from the census of individual life scientists conducted by the Committee and reflect use in academic year 1966-1967. See Appendix A, Individual Questionnaire, Questions 22-27. Information was gathered on the number of hours of usage by type of machine (A, B, C, D, using the classification of the
DIGITAL COMPUTERS IN THE LIFE SCIENCES
Rosser report*). Hours can be converted (for some purposes) to their equivalents on a type B machine (e.g., an IBM 7090): 1 type B machine is equal to 0.25 type A, 4 type C, and 20 type D machines. One B-hour provides approximately 300 million basic operations (in practice divided among input, compiling, computing, editing, output, etc.).
The total hours of computer time reported was equivalent to 90,000
hours on type B machines. This amounts to approximately $18 million a
year in rental. Extrapolation to the total computing population increases
this amount by a factor of 1.5 to 2. Large users who own computers that
are used 24 hours a day partially offset the cost extrapolation as they can
purchase machine time at less than the standard rental.
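As a rough check on these cost figures, a minimal Python sketch. The per-B-hour rate it prints is simply the reported $18 million divided by the reported 90,000 B-hours, a derived average rather than a rental rate the report gives; the constant and function names are illustrative.

```python
# Illustrative arithmetic on the report's rounded figures (sample only).
SAMPLE_B_HOURS = 90_000            # reported total, in type B equivalents
SAMPLE_RENTAL_USD = 18_000_000     # approximate annual rental for that time
EXTRAPOLATION = (1.5, 2.0)         # factor to cover the whole computing population


def implied_rate_per_b_hour(total_usd: float, b_hours: float) -> float:
    """Average rental per B-hour implied by the totals (derived, not reported)."""
    return total_usd / b_hours


def extrapolated_cost(total_usd: float, factors: tuple[float, float]) -> tuple[float, float]:
    """Scale the sample cost by the low and high extrapolation factors."""
    low, high = factors
    return total_usd * low, total_usd * high


rate = implied_rate_per_b_hour(SAMPLE_RENTAL_USD, SAMPLE_B_HOURS)
low, high = extrapolated_cost(SAMPLE_RENTAL_USD, EXTRAPOLATION)
print(f"implied rental: ${rate:,.0f} per B-hour")
print(f"whole population: ${low / 1e6:.0f}-{high / 1e6:.0f} million per year")
```

As the text notes, large users who buy time below the standard rental partially offset this extrapolation, so actual spending would fall somewhat below these naive figures.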
Unfortunately we do not have comparable figures for other fields. The Rosser report, which confined itself to academic institutions, estimated that for 1968 the physical sciences would require 90, engineering 20, and the life sciences 21 type B computers. For all biologists the estimate from our census approximates 32 type B computers, with 17 of them in universities. A guess must be made as to how many B-hours per year equal one type B machine, as defined in the Rosser report. It must range between 2,000 and 7,000 hours (one to four shift operations); we chose 4,000 hours. Actual usage in the other fields undoubtedly also exceeded earlier estimates, and the two-decade head start of the physical sciences still remains. In any case, we can be sure that biologists are doing large amounts of computing, and the life sciences are no longer to be viewed as a "computationally undeveloped country."
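How strongly a machine count depends on the assumed hours per machine can be seen by dividing one workload by the report's low, chosen, and high values. This minimal Python sketch uses the census sample's roughly 90,000 B-hours, not the extrapolated population total behind the figure of 32 machines; the function name is illustrative.

```python
SAMPLE_B_HOURS = 90_000  # sample total from the census (not the extrapolated population)


def machine_equivalents(b_hours_per_year: float, hours_per_machine: float) -> float:
    """Number of fully used type B machines that would supply the given B-hours."""
    if hours_per_machine <= 0:
        raise ValueError("hours_per_machine must be positive")
    return b_hours_per_year / hours_per_machine


# The text brackets a machine-year between 2,000 and 7,000 hours and settles on 4,000.
for hours in (2_000, 4_000, 7_000):
    count = machine_equivalents(SAMPLE_B_HOURS, hours)
    print(f"{hours:>5,} hours per machine -> {count:5.1f} type B machines")
```

The same workload corresponds to anywhere from about 13 to 45 machines, which is why the choice of 4,000 hours matters when comparing with the Rosser estimates.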
As would have been anticipated, actual computer use is very unevenly distributed. This is illustrated by dividing the users into "light users" (less than 10 B-hours per year), "medium users" (10 to 99 B-hours per year), and "heavy users" (more than 100 B-hours per year). As Figure 40 shows, 69 percent of life scientists do no computing at all; 21 percent are light users and do 5 percent of the computing in the aggregate; 8 percent are medium users, doing 25 percent of the computing; and the remaining 2 percent are heavy users, accounting for 70 percent of the computing. One of the consequences of this skewed distribution is to make average values somewhat meaningless, since there is no "central tendency."† The total amount of computing is markedly dominated by the few heavy users.
* Digital Computer Needs in Universities and Colleges, a report of the Committee on Uses of Computers, NAS-NRC, Publ. 1233, National Academy of Sciences, Washington, D.C., 1966.
† More precisely, if we imagine the present sample to be drawn from some underlying continuous distribution $p(x)$ that gives the probability of a person using $x$ B-hours of computing, then it appears that this distribution does not have a finite mean value, i.e., that $\int_0^\infty x\,p(x)\,dx$ does not converge.
[Figure 40: panels for nonusers, light users (<10 hr/yr), medium users (10-99 hr/yr), and heavy users (>100 hr/yr).]
FIGURE 40 Distribution of computation use. (Source: Survey of Individual Life Scientists, National Academy of Sciences Committee on Research in the Life Sciences.)
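The skew shown in Figure 40 can be summarized by dividing each group's share of the computing by its share of the scientists. This Python sketch uses only the percentages quoted above; the "concentration ratio" is an illustrative statistic, not one the report uses. Rescaling the three user groups to exclude nonusers gives roughly 68, 26, and 6 percent, close to the 67, 27, and 5 percent of light, medium, and heavy users reported later for each research area.

```python
# Percentages quoted in the text: (share of all life scientists, share of all B-hours).
GROUPS: dict[str, tuple[float, float]] = {
    "nonusers": (69, 0),
    "light": (21, 5),    # less than 10 B-hours per year
    "medium": (8, 25),   # 10 to 99 B-hours per year
    "heavy": (2, 70),    # more than 100 B-hours per year
}


def concentration_ratio(pct_people: float, pct_computing: float) -> float:
    """Share of computing divided by share of people; 1.0 means a proportional share."""
    return pct_computing / pct_people


def share_among_users(groups: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Each user group's percentage of computing scientists, with nonusers excluded."""
    users = {name: v for name, v in groups.items() if name != "nonusers"}
    total = sum(people for people, _ in users.values())
    return {name: 100 * people / total for name, (people, _) in users.items()}


for name, (people, computing) in GROUPS.items():
    if computing > 0:
        ratio = concentration_ratio(people, computing)
        print(f"{name:>6}: {ratio:5.2f} times its population share")

print({name: round(pct) for name, pct in share_among_users(GROUPS).items()})
```

Heavy users, at 2 percent of life scientists, do 35 times their proportional share of the computing, which is the sense in which the few heavy users dominate the total.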
The diversity underlying this distribution is impressive. Seemingly equal amounts of computing power are not equivalent at the small and large ends of the scale. The composite figures summarized above include the use of vastly different computers, differing in power and facility by up to a factor of 100. Our sample of almost 4,000 computer users claimed 14,000 hours of type A computer (like an IBM 360/65 or a CDC 6600), 23,000 hours of type B computer (like an IBM 7090), 19,000 hours of type C computer (like an IBM 360/40 or an SDS 940), and 100,000 hours of type D computer (like a DEC PDP-8 or an IBM 1130). This last category is especially noteworthy because it shows how widespread the use of small laboratory computers has become in the life sciences. Although accounting for only a small fraction of the total computing power (about 5,000 effective B-hours, or 5 percent of the total), it accounts for 60 percent of the "contact hours." Much of this form of computer use, as in on-line data acquisition, cannot meaningfully be exchanged for hours on larger machines.
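The effective-time figures above follow from the conversion factors given earlier: since one type B machine equals 0.25 type A, 4 type C, or 20 type D machines, an hour on type A counts as 4 B-hours, an hour on type C as 0.25, and an hour on type D as 0.05. A minimal Python sketch applying these factors to the sample's claimed hours (the names are illustrative):

```python
# Machines of each type that equal one type B machine, as stated in the text:
# "1 type B machine is equal to 0.25 type A, 4 type C, and 20 type D machines."
MACHINES_PER_TYPE_B = {"A": 0.25, "B": 1, "C": 4, "D": 20}

# Hours claimed by the sample of almost 4,000 computer users.
CLAIMED_HOURS = {"A": 14_000, "B": 23_000, "C": 19_000, "D": 100_000}


def effective_b_hours(hours: dict[str, float]) -> dict[str, float]:
    """Convert raw ("contact") hours on each machine type into B-hour equivalents."""
    return {kind: h / MACHINES_PER_TYPE_B[kind] for kind, h in hours.items()}


effective = effective_b_hours(CLAIMED_HOURS)
total_effective = sum(effective.values())
total_contact = sum(CLAIMED_HOURS.values())

print(f"effective time: {total_effective:,.0f} B-hours")
print(f"type D share of effective time: {effective['D'] / total_effective:.1%}")
print(f"type D share of contact hours:  {CLAIMED_HOURS['D'] / total_contact:.1%}")
```

The total comes to 88,750 B-hours, consistent with the rounded 90,000 reported earlier; type D supplies about 5.6 percent of the effective time but about 64 percent of the contact hours, in line with the 5 and 60 percent figures in the text.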
In the data from our sample, diversity of arrangement meets one at every turn. The amount of actual type D computer use in effective time (equivalent to 5,000 B-hours) is of the order of the amount of computing time used by all light users (4,500 B-hours). However, light users used all types of machines: approximately 40 percent of their computing was done on type A machines and 20 percent each on types B, C, and D. Conversely, approximately half of all time on the small computers (type D) was occupied by individuals who were heavy users.
Types of Use
Some information is available concerning the types of use of computers by broad functional categories. Almost all users performed some data analysis (93 percent), but half (47 percent) also did some other type of computing, and a substantial number (17 percent) engaged in several types. Table 57 shows the distribution of hours and number of scientists for each functional type. The indicated categories, other than "Data Analysis Only," imply that both data analysis and the specified activity were conducted, but nothing else. The "Multiple Uses" category indicates that data analysis and at least two other types of computation were reported.
TABLE 57 Tasks for Which Computers Were Used
APPLICATION                         PERCENTAGE OF LIFE SCIENTISTS   PERCENTAGE OF COMPUTING (B-HOURS)
ALL USES                            100                             100
Data Analysis Only                  54                              21
Information Storage and Retrieval   16                              7
Data Acquisition                    3                               3
On-line Experimental Control        1                               1
Simulation                          3                               2
Theoretical Analysis                6                               3
Multiple Uses                       17                              63
Source: Survey of Individual Life Scientists, National Academy of Sciences Committee on Research in the Life Sciences.
The table shows that most of the computing hours are used by those in the "Multiple Uses" category, reflecting the fact that most heavy users use computers in several ways. Surprisingly, the light users were not much more likely to engage only in data analysis than was the population as a whole (59 to 52 percent). Other similarities among the three user categories are illustrated in Figure 41. Likewise, the machines themselves are not specialized for specific types of computing. As can be seen from Figure 42, only a few general aspects show through; for example, type D computers are used relatively heavily for experimental control (which seems natural), and type A computers are used somewhat less than others for storage and retrieval, for reasons that are not clear. However, our gross categories were unlikely to reveal the specialization of particular facilities to do particular jobs.
This description of functional types of computer use, based on answers to our questionnaire, is rather abstract and fails to describe the remarkable diversity of computer use in the life sciences. Only a few uses can be noted here. Included are very large numerical calculations, such as the processing of statistical data or deciphering the structure of an organic molecule from crystallographic data. Small-scale data analysis may include relatively simple routine calculations from instrumental analysis, the striking of a nutritional balance, or the calculation of relatively simple reaction rates. The widest possible variety of functions is found within the category of "data acquisition." This may be an experiment in which electrical signals are obtained and converted to digital records for later processing, possibly with concurrent display to check whether a good record has been obtained. It could equally well mean the use of an existing system for gathering data about the feeding and milk-producing behavior of cows from all over the country, which (along with their genealogy) permits the evaluation of both feeding plans and the worth of bulls. Simulation could mean a study of enzyme kinetics or a study of the life cycle of a salmon. The relatively small amount of effort indicated for on-line experimental control reflects how recently such control has become practical. The use of small laboratory computers to "run the show" is, qualitatively, a completely different use of computers from all the other categories.
The variation is such that no general pattern emerges, and the variety will surely expand in response to future demands for information-processing tasks.
Computer Use in Research Areas of the Life Sciences
Within each research area there occurs the same skewed distribution of computing, with many small users shading down to a very few heavy users. The more investigators active in a research area, the more computing they do; beyond that, no generalizations emerge. It is not useful to compute an average number of hours per scientist. Figure 43 displays this rather curious situation. This figure shows, for our sample, the number of computing scientists in a research area versus the amount of computing done by that area. Although the amount of computing clearly rises linearly as the number of scientists who compute increases, the points become widely scattered. When a mean value is computed, these "wild" points almost completely determine the slope, because each subfield contains only a few heavy users.
Our survey shows that, for all research areas, the light users consume on average about 2.4 B-hours per year each, the medium users about 30.6, and the heavy users about 295. Six people (0.2 percent of the population) indicated they used more than 1,000 B-hours of computing time per year. These "superheavy" users averaged 2,390 B-hours per year apiece. Further, for any one research area, the percentage of light users is about 67, the percentage of medium users about 27, and the percentage of heavy users about 5.
As consumers of computing power, all subareas of the life sciences appear much the same. The deviations among areas do not appear to have any meaning.* Notwithstanding the uniformity of the distribution, there is a large variation in the amount of computing, depending on the exact behavior of the few heavy users. However, because these were truly very few, and subject to a large sampling bias, their exact values are not meaningful.
Hence the data suggest that all research areas of the life sciences are engaged in computing. There is no specialized subarea that is the "computing part" of the life sciences. Surely this reflects the considerable general advancement of the life sciences in making use of this major tool of modern research.
* More precisely, the deviations appear to be due entirely to sampling variation. The deviations from the mean for each area are completely uncorrelated at the four levels of use. Furthermore, much of the seeming scatter results from the data for research areas involving relatively small numbers of scientists, so that unusual behavior by a few scientists creates a large deviation.
[Figure 43: scatter plot of research areas; horizontal axis, Number of Computing Scientists in Research Area (Answering Both Questions), 0 to 400; vertical axis, hours of computing, 0 to 11,000.]
FIGURE 43 Number of computing scientists versus hours of computing, by field. (Source: Survey of Individual Life Scientists, National Academy of Sciences Committee on Research in the Life Sciences.)
The Growth of Computer Usage
The movement into the use of the computer has been rapid and recent. Figure 44A shows the percentages of our sample that had been using the computer for various lengths of time: five years or less, six to ten years, and so on. Seventy-seven percent of current users had begun their use of computers within the last five years, and 19 percent had been computing for six to ten years. Thus about four times as many biologists began computing within the most recent five-year period as had begun during the previous five-year interval. When due account is taken of the steep curve, this is an entry rate of almost 20 percent per year. This certainly cannot continue. Eventually the rate must approach the growth rate of the life scientist population (currently about 8 percent). By now (early 1970), the proportion of life scientists who compute is already between 40 and 50 percent, instead of the 30 percent reported by our sample in 1966-1967. By the time the percentage approaches 50, the rate of growth will likely have slowed markedly. Figure 44B shows the years-of-use curve for the two areas in which approximately 50 percent of the computing scientists commenced their computing within the last five years. These are genetics and nutrition, which have identical year-of-use distributions. Their curves have already begun to "bend over," and their entry rate is about 10 percent per year.
Table 58 illustrates that, at least for our sample of biologists, the assertion that "computing is a young man's game" is not so. An examination of the age distributions of all biologists, whether they compute or not, and of those who compute shows that they are essentially the same. Also, biologists of all ages, whether light, medium, or heavy users, do their equivalent share of computing. Furthermore, as seen in Table 59, the distribution of computing effort (percentage B-hours) for biologists in each age group is proportionate to the number of individuals in that group. This proportionality holds for the three types of users: light, medium, and heavy. Table 60 shows that as they become more experienced, computer users tend to shift into the higher use categories, a trend quite in keeping with expectations.*
TABLE 58 Age Distribution of All Biologists versus Computing Biologists
AGE GROUP (YEARS)   ALL BIOLOGISTS   BIOLOGISTS WHO COMPUTE
ALL AGES            100              100
<30                 4                3
30-39               39               39
40-49               37               38
50-59               16               17
>60                 4                3
Source: Survey of Individual Life Scientists, National Academy of Sciences Committee on Research in the Life Sciences.
TABLE 59 Age Distribution of Computing Biologists versus Extent of Computing
AGE GROUP   PERCENTAGE DISTRIBUTION OF COMPUTING TIME IN B-HOURS
(YEARS)     ALL BIOLOGISTS WHO COMPUTE   LIGHT USERS (<10 HOURS)   MEDIUM USERS (10-99 HOURS)   HEAVY USERS (>100 HOURS)
ALL AGES    100                          100                       100                          100
<30         1                            4                         3                            1
30-39       37                           41                        36                           37
40-49       41                           39                        43                           40
50-59       16                           14                        17                           15
>60         5                            2                         1                            7
Source: Survey of Individual Life Scientists, National Academy of Sciences Committee on Research in the Life Sciences.
[Figure 44 (panels A and B): vertical axis, Percentage of Biologists Computing.]
TABLE 60 Shift of Percentage Distribution of Computing (Percent B-Hours) with Years of Computing Experience
YEARS OF COMPUTING EXPERIENCE   LIGHT USERS   MEDIUM USERS   HEAVY USERS
TOTAL                           100           100            100
<5                              79            72             63
6-10                            18            23             31
11-15                           2             4              5
16-20                           1             1              1
Source: Survey of Individual Life Scientists, National Academy of Sciences Committee on Research in the Life Sciences.
However, research areas do differ in the percentage of their active scientists who compute (Figure 45). Here percentage participation for each field is plotted horizontally, and the percentage of these computing scientists who have used computers five years or less is plotted vertically. Genetics has the highest participation, with 49 percent of the field computing, and morphology the lowest, with 18 percent. Hence, variation between different biological fields is considerable.† Furthermore, a field with very little participation should contain a high proportion of individuals just making the acquaintance of the computer. The regular decreasing sequence of Figure 45 clearly shows such a relationship. Extrapolation of the data to 100 percent participation indicates that about 30 percent of the users commenced computing within the last five years. This is equivalent to an approximate annual growth rate for computing participation of 6 percent, in tolerable agreement with the growth rate of biology as a whole. Hence it is plausible (though hardly conclusive from the evidence) to view all subareas as migrating down the curve of Figure 45, reinforcing the impression that all areas of biology are assimilating the computer. Their rates of assimilation differ only because of the point in time at which they commenced computing; each area's rate depends upon its position on the curve.
* However, the magnitude of this effect is not enough to predict the amount of computing used in a research area from its age distribution, even though the average number of B-hours per scientist is about 25 for a new user (less than five years) and about 50 for an old user (more than five years).
† Again, there is no correlation with how much computing is done per scientist.
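The entry and growth rates in this section appear to follow a simple rule: the share of current users who began within the last five years, divided by five. That reading is an inference from the numbers, not a method the report spells out. It reproduces the 10 percent quoted for genetics and nutrition (about 50 percent recent entrants) and the 6 percent obtained by extrapolating to full participation (30 percent); for the whole sample (77 percent) it gives about 15 percent, which the report raises to "almost 20 percent" to allow for the steep curve. A minimal Python sketch; the function names are illustrative:

```python
def average_entry_rate(fraction_recent: float, window_years: float = 5.0) -> float:
    """Average share of current users entering per year, assuming the recent
    entrants are spread evenly over the window (an assumption, see above)."""
    if not 0.0 <= fraction_recent <= 1.0:
        raise ValueError("fraction_recent must lie between 0 and 1")
    return fraction_recent / window_years


def cohort_ratio(recent_pct: float, previous_pct: float) -> float:
    """How many times more users began in the recent window than in the one before."""
    return recent_pct / previous_pct


CASES = {
    "all users in the sample": 0.77,
    "genetics and nutrition": 0.50,
    "extrapolated to full participation": 0.30,
}
for label, fraction in CASES.items():
    print(f"{label:<36} {average_entry_rate(fraction):.1%} per year")

print(f"recent vs previous five-year cohort: {cohort_ratio(77, 19):.1f} times")
```

A constant-growth (compounding) model of the user population would imply higher annual rates from the same fractions; the simple average is shown only because it matches the report's own figures.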
[Figure 45: horizontal axis, Percentage of Field Computing, 0 to 100; vertical axis, percentage of recent entries to computing, 0 to 100.]
FIGURE 45 Percentage of biologists computing in a field versus percentage of recent entries to computing within that field. (Source: Survey of Individual Life Scientists, National Academy of Sciences Committee on Research in the Life Sciences.)
Institutional Arrangements for Computer Use
It remains only to look briefly at institutional arrangements. Here again, the main impression, as seen in Table 61, is diversity; all arrangements are used heavily for multiple purposes. The table shows the percentages of A, B, C, and D type computer hours used by researchers using three major types of computation facilities: (1) laboratories that own their own computers, (2) laboratories that use a life sciences computing center, and (3) laboratories that use some other computation center, usually a university center. Table 61 also shows corresponding percentages for light, medium, and heavy users. In this table it is possible to verify some facts that one might have expected. Thus, most type D computers belong to the scientists' own laboratories; but much computing by light users is done at university computation centers. The overall impression is one of multiple arrangements.
TABLE 61 Percentage of Computing Done Using Different Types of Computer Facilities
                                               COMPUTER SIZE                     EXTENT OF USE
SOURCE OF COMPUTER                             ALL SIZES  A    B    C    D       LIGHT  MEDIUM  HEAVY
ALL SOURCES                                    100        100  100  100  100     100    100     100
Investigator's Laboratory                      31         23   47   29   46      20     25      36
Life Sciences Computing Center                 27         31   18   26   28      17     22
Other (Including University Computing Center)  42         46   35   45   26      63     53      34
Source: Survey of Individual Life Scientists, National Academy of Sciences Committee on Research in the Life Sciences.
Funding of Computer Use
Table 62 shows the sources of funding for computing in the various research areas of biology. Overall, 42 percent of the support came from research grants to individual scientists, 29 percent from federal funds specifically allocated for life sciences computing, 9 percent from non-life-sciences funds (e.g., a university's own computing budget), 11 percent from other funds (e.g., state life sciences funds), and 9 percent from support whose source was unknown to the individual scientist. Given that about 75 percent of research grants are also provided by the federal government (so that federal grants and dedicated federal funds together already account for roughly 0.75 × 42 + 29, or about 60 percent), and that some fraction of the non-life-sciences funds and funds of unknown source undoubtedly comes from National Science Foundation computer-facility grants to universities, it is clear that the federal government supports the great bulk of computation. Furthermore, Table 62 reveals that different areas of biology meet their computing costs in different ways. For