Research on Large-Scale Systems
Pages 99-141



From page 99...
... Computer scientists and engineers have examined ways of combining components, whether individual transistors, integrated circuits, or devices, into larger IT systems to provide improved performance and capability. The remarkable improvements in the performance of computer systems over the past five decades attest to advances in areas such as computer architectures, compilers, and memory management.
From page 100...
... WHAT IS THE PROBLEM WITH LARGE-SCALE SYSTEMS? Since its early use to automate the switching of telephone calls, thereby enabling networks to operate more efficiently and support a growing number of callers, IT has come to perform more and more critical roles in many of society's most important infrastructures, including those used to support banking, health care, air traffic control, telephony, government payments to individuals (e.g., Social Security)
From page 101...
... . In some cases, difficulties in design and development have resulted in significant cost overruns and/or a lack of desired functionality in fielded systems.
From page 102...
... After spending more than 15 years and approximately $411 million, the program was canceled in 1999. A vehicle registration and driver's license database was never deployed after $44 million in development costs, three times the original cost estimate.
From page 103...
... The Federal Aviation Administration (FAA) will have spent some $42 billion over 20 years in a much-maligned attempt to modernize the nation's air traffic control system (see Box 3.2)
From page 106...
... "Supply-chain management" is not possible on a large scale with existing database technology and can require technical approaches other than data warehouses.9 · Knowledge discovery which incorporates the acquisition of data from multiple databases across an enterprise, together with complex data mining and online analytical processing applications will become more automated as, for example, networked distributed sensors are used to collect more information and user and transaction information is captured on the World Wide Web. These applications severely strain the state of the art in both infrastructure and database technology.
From page 107...
... TECHNICAL CHALLENGES ASSOCIATED WITH LARGE-SCALE SYSTEMS Why are large-scale systems so difficult to design, build, and operate? As evidenced by their many failures, delays, and cost overruns, large-scale systems present a number of technical challenges that IT research has not yet resolved.
From page 108...
... Because so much of the activity surrounding the Internet in the late 1990s was based in industry, the academic research community has been challenged to define and execute effective contributions. The nature of the research that would arise from the research community is not obvious, and the activities in current networking research programs as clustered under the Next Generation Internet (NGI)
From page 109...
... often impossible for a single individual, or even a small group of individuals, to understand the overall functioning of the system. As a result, predicting performance is incredibly difficult, and failures in one part of the system can propagate throughout the system in unexpected ways (Box 3.3).
From page 110...
... Techniques are needed to help design robust, reliable, and secure software in this new and highly challenging environment.
From page 111...
... because the ensemble must continue to evolve as new hardware replaces old or as software is repaired or enhanced. These requirements are difficult to accommodate using traditional reductionist engineering approaches, and methodologies to successfully engineer such systems are poorly understood.
From page 113...
... Changes or additions to the system can therefore produce unexpected and unanticipated results.23 Another source of difficulty is that many systems are designed without the modularity and encapsulation of functionality needed to facilitate future upgrades. In many hardware and software projects, the emphasis is on getting a system up and running.
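The missing modularity and encapsulation described above can be made concrete with a small sketch. The interface and class names below are illustrative only and do not come from the report; the point is that when callers depend on a narrow interface rather than on a particular implementation, the implementation can be upgraded later without changes rippling through the rest of the system.

```java
import java.util.HashMap;
import java.util.Map;

// Callers program against this narrow interface, not a concrete store.
interface RecordStore {
    void put(String key, String value);
    String get(String key);
}

// Initial implementation; a later upgrade (say, a database-backed or
// networked store) can replace it behind the same interface.
class InMemoryRecordStore implements RecordStore {
    private final Map<String, String> records = new HashMap<>();
    public void put(String key, String value) { records.put(key, value); }
    public String get(String key) { return records.get(key); }
}

public class EncapsulationSketch {
    public static void main(String[] args) {
        // The only line that names the concrete implementation.
        RecordStore store = new InMemoryRecordStore();
        store.put("account-42", "active");
        System.out.println(store.get("account-42"));  // prints: active
    }
}
```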
From page 114...
... Trustworthiness is increasingly recognized as one of the most important challenges in IT, because systems support ever more critical functions and are increasingly networked, which can introduce new vulnerabilities. Ensuring trustworthiness is particularly difficult in large-scale IT systems because of their size and complexity.
From page 115...
... . A Swedish hacker shut down a 911 emergency call system in Florida for an hour, according to the FBI, and in March 1997 a series of commands sent from a hacker's personal computer disabled vital services to the FAA's control tower in Worcester, Massachusetts.26 Such vulnerabilities are not limited to government computer systems, whose problems are more likely to be publicized; they apply as well to a growing number of private-sector systems, which become attractive targets of corporate espionage as attackers come to recognize that proprietary information is stored on networked systems.
From page 116...
... Large-scale system designs clearly differ from, say, desktop office suites in that they must operate in unknown, changing environments. Unfortunately, most algorithms and design techniques for computer hardware and software assume a benign environment and the correct operation of every component.
From page 117...
... scale systems themselves reliable is more difficult. The telephone system, which is based heavily on software, may be the closest to reaching this goal, but its robustness has been achieved only at considerable cost and with delays in development.28 The race to develop new critical applications, driven by the rapid pace of innovation in Internet applications and services, has resulted in inadequate, even dangerously poor, robustness.
From page 118...
... . Techniques for assuring robustness in hardware have been of critical importance in, for example, space flight; by performing each computation using three independent hardware systems and attaching a "voting" circuit to the outcome to determine the majority answer, one can catch and overcome many hardware failure modes.
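The triple-redundancy-with-voting idea described above can also be sketched in software. The following is a minimal illustration, not part of the report; the class and method names are invented for the example, and a real system would run the three units on independent hardware rather than in one process.

```java
import java.util.concurrent.Callable;

// Sketch of triple modular redundancy (TMR): run the same computation on
// three independent units and accept the majority answer, so that a single
// faulty result is masked.
public final class TmrVoter {

    // Returns the majority result of three redundant computations, or
    // throws if all three disagree (an unmaskable failure).
    public static <T> T vote(Callable<T> unitA, Callable<T> unitB, Callable<T> unitC)
            throws Exception {
        T a = unitA.call();
        T b = unitB.call();
        T c = unitC.call();
        if (a.equals(b) || a.equals(c)) return a;  // at least two units agree with A
        if (b.equals(c)) return b;                 // A is the outlier; B and C agree
        throw new IllegalStateException("No majority: all three results differ");
    }

    public static void main(String[] args) throws Exception {
        // One unit returns a faulty value; the voter masks it.
        int result = vote(() -> 2 + 2, () -> 2 + 2, () -> 5 /* faulty unit */);
        System.out.println("Majority result: " + result);  // prints 4
    }
}
```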
From page 119...
... Traditionally, IT systems research has emphasized advances in performance, functionality, and cost, primarily to improve device (or component) characteristics (Hennessy, 1999).
From page 121...
... and a 1997 workshop on the same topic by the committee's successor, the Committee on Computing, Information, and Communications of the National Science and Technology Council (CCIC, 1997). The High Confidence Systems research program was added under the HPCCI umbrella, but concerns about the limitations of existing research efforts were expressed in a variety of reports on critical infrastructure and in the associated calls for research.31 The Information Technology for the Twenty-First Century (IT2)
From page 122...
... A number of other programs sponsored by the NSF and the Defense Advanced Research Projects Agency (DARPA) in late 1999 and early 2000 promise continued exploration of systems issues:
· Information Assurance and Survivability (DARPA)
From page 123...
... Such an effort would need to support research along many different dimensions (theory, architecture, design methodologies, and the like) because no single approach to system design will be able to address the full scope of challenges presented by large-scale systems.32 It is possible (although, in the committee's judgment, unlikely) that dramatically improved methodologies for the design of large-scale systems are beyond human capability; certainly, it is difficult to get one's arms around the challenge (especially for researchers who have little hands-on experience with large-scale systems)
From page 124...
... . Case research and methodology research are complementary: case research identifies specific shortcomings and problems in large-scale system design methodologies that can be more fully explored through methodology research, and improved methodologies arising from methodology research can be validated by trying them out on one or more specific cases using case research.
From page 125...
... The investment in methodology research needs to be greatly expanded to stimulate more research that pursues high-risk approaches to system design and to foster greater collaboration among IT researchers in universities and industry, as well as with end users who have operational knowledge of large-scale system problems. An expanded research agenda would need to address systems that are (1)
From page 126...
... Theoretical Approaches One element of any approach to studying the properties of systems of a scale and complexity exceeding current capabilities is to develop theoretical constructs of behaviors. Theoretical computer science has been quite successful in applying such methodologies to, for example, the computing requirements for algorithms of arbitrary complexity, quantifying which algorithms have desirable properties and which algorithms do not.
From page 127...
... Another approach may be to investigate alternatives to the top-down approach to decomposing systems advocated by structured programming (which tends to work on a small scale only) and to the bottom-up approach to system design embodied in notions of component software (described below)
From page 129...
... Other issues include the development of processes that work well even when people have less-than-optimal skills.39
Extensions of Existing Approaches
Existing approaches to large-scale system design, including some that are in commercial practice, show promise for facilitating the development of large-scale systems and could benefit from greater attention from the research community. Two approaches worth mentioning are methodologies based on component software and mobile code.
From page 130...
... Emerging component software frameworks, such as JavaBeans, exploit libraries of predefined elements that fit within a common design framework. Applications are built by assembling existing components as well as by creating new, unique components that fit within the framework.
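The report names JavaBeans only as one example of a component framework. The sketch below, with an invented bean name, illustrates the conventions such frameworks rely on (a public no-argument constructor, get/set accessors, and serializability) so that tools can configure and assemble predefined components without modifying their source.

```java
import java.io.Serializable;

// A minimal component following JavaBeans conventions: no-arg constructor,
// private state exposed through get/set accessors, and Serializable so
// builder tools and frameworks can store and assemble instances.
public class TemperatureSensorBean implements Serializable {

    private String units = "Celsius";
    private double reading;

    public TemperatureSensorBean() { }  // required no-argument constructor

    public String getUnits() { return units; }
    public void setUnits(String units) { this.units = units; }

    public double getReading() { return reading; }
    public void setReading(double reading) { this.reading = reading; }

    public static void main(String[] args) {
        // An assembling application configures the component through its
        // accessors rather than by editing the component's source.
        TemperatureSensorBean sensor = new TemperatureSensorBean();
        sensor.setUnits("Fahrenheit");
        sensor.setReading(72.5);
        System.out.println(sensor.getReading() + " " + sensor.getUnits());
    }
}
```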
From page 131...
... Such systems often need to operate continuously, and operators are understandably unwilling to allow experimentation with mission-critical systems. In some contexts, additional concerns may arise relating to the protection of proprietary information.44 Such concerns have long roots.
From page 132...
... Existing infrastructure programs have a critical limitation with respect to the kind of research envisioned in this report: they help investigators in universities and government laboratories routinely access dedicated computers and networks used for scientific research or related technical work, but they do not provide researchers with access to experimental or operational large-scale systems used for purposes other than science, such as the computers and networks used for everything from government functions (tax processing, benefits processing) through critical infrastructure management (air traffic control, power system management)
From page 133...
... One option would be to modify AUPs to allow some forms of business traffic to use the research Internet, so as to create a laboratory for studying the issues. Firms might be willing to bear the cost of maintaining backups for their commercial traffic on the commercial Internet if they could use the research network at below market prices.47 Government could also fund some data collection activities by Internet service providers (ISPs)
From page 134...
... 1987. "No Silver Bullet: Essence and Accidents of Software Engineering," IEEE Computer 20(4):10-19.
From page 135...
... 1997. Reports submitted for NSTAC XX (Volume I: Information Infrastructure Group Report, Network Group Intrusion Detection Subgroup Report, Network Group Widespread Outage Subgroup Report; Volume II: Legislative and Regulatory Group Report, Operations Support Group Report; Volume III: National Coordinating Center for Telecommunications Vision Subgroup Report, Information Assurance, Financial Services Risk Assessment Report, Interim Transportation Information Risk Assessment Report)
From page 136...
... 1997. Air Traffic Control: Immature Software Acquisition Processes Increase FAA System Acquisition Risks.
From page 137...
... In the original Pentium Pro, which had about 5.5 million transistors in the central processing unit, Intel found and corrected 1,200 design errors prior to production; in its forthcoming Willamette processor, which has 30 million transistors in the central processing unit, engineers have found and corrected 8,500 design flaws (data from Robert Colwell, Intel, personal communication, March 14, 2000). Despite these efforts, bugs in microprocessors occasionally slip through.
From page 138...
... 23. For example, simply upgrading the memory in a personal computer can lead to timing mismatches that cause memory failures that, in turn, lead to the loss of application data even if the memory chips themselves are functioning perfectly.
From page 139...
... How to design and construct such large computer programs is the focus of research in software engineering. Current research efforts, however, do not go far enough, as discussed later in this chapter.
From page 140...
... ), how to recognize and manage emergent behavior, and how to specify and guarantee behavior constraints." Additional information about this project is available online at .
From page 141...
... They have been attempting to develop broader and more accurate Internet traffic and performance data for some time. Federal support associated with networking research might provide vehicles for better Internet Service Provider data collection.

