Fifth Annual Symposium on Frontiers of Engineering: National Academy of Engineering DROWNING IN DATA
Magnetic Recording: Winner of the Data Storage Technology Race

THOMAS R. ALBRECHT
IBM Almaden Research Center
San Jose, California

A recent article in The Washington Post (Burgess, 1999) contains a number of interesting statements about hard disk drives (HDDs), including:

Its insides would warm the heart of a Swiss watchmaker – tiny, finely crafted components whirring, reaching, spinning in choreographed precision, doing their duty year after year with razor-sharp accuracy and reliability.... And the technology keeps getting better and cheaper, despite perennial predictions of its ultimate demise.... The hard-disk industry claims to do better than the famous "Moore's Law" of microchip progress. That law says that chips are meant to double in performance roughly every 18 months; diskmakers say they manage that about every 12. The strange thing is that despite such accomplishments, the hard disk gets so little respect.

While the public is well aware of the great strides made in recent decades in the areas of microprocessor and memory performance, HDD technology has been advancing at an even faster pace in some respects (Figure 1). Microprocessors have shown a 30 percent increase in clock speed and a 45 percent increase in MIPS (million instructions per second) per year over two decades. Dynamic random access memory chips deliver 40 percent more capacity per dollar each year. Yet HDDs have offered 60 percent more storage per disk (and per dollar) each year over the last decade, and in the last three years, this pace has accelerated to 100 percent per year. Progress is so rapid that today the industry suffers from a drive capacity glut. While unit shipments of HDDs continue to increase, the number of platters per drive is falling, indicating that many customers no longer choose to purchase the largest available capacities. Another factor in this trend is the replacement of large drives with arrays of smaller ones, which offers better performance, lower cost, and capability for fault tolerance without data loss.

FIGURE 1 Comparison of magnetic data storage and competing technologies. (a) Until the 1990s, density of optical recording exceeded that of magnetic. Traditionally, magnetic densities have grown at 25 percent per year, increasing to 60 percent in the early 1990s with the introduction of anisotropic magnetoresistive (AMR) heads, and finally to 100 percent with giant magnetoresistive (GMR) heads in the last few years. Optical densities are growing at only 20 percent per year. (b) Although the rate of price reduction has accelerated for both magnetic and solid state technology in recent years, the price difference continues to widen. SOURCE: IBM.

The demise of magnetic recording as the leading storage technology has been predicted, prematurely, for decades. Projections from the past (Eschenfelder, 1970; Wildmann, 1974) typify the widely held perception during the 1970s and 1980s that engineering challenges would limit the achievable areal density of magnetic recording to levels much lower than could be achieved with optical recording. In fact, during the 1990s magnetic recording densities surpassed those of optical recording (Figure 1) and continue to improve at a faster pace than that of optical recording. After 50 years as the data storage medium of choice, magnetic recording, predominantly in the form of hard disk drives, remains the best choice for high capacity, high performance, and low cost. At the moment, more than 10^19 bytes/year of HDD storage are being shipped. Yet even as magnetic recording is being pronounced the "winner" of the storage technology race, physical limitations that may slow or impede further progress are beginning to appear on the horizon.

HDDs are a combination of advanced technologies from several fields, including materials science, magnetics, signal processing, microfabrication, high-speed electronics, and mechanics. As the newspaper quote above mentions, HDDs are one of the few places in a computer where precision mechanics still plays a major role. The head-disk interface requires ultrasmooth disk surfaces (peak roughness less than 10 nm) with magnetic read/write elements riding on a slider (Figure 2) flying ~20 nm from the disk surface at velocities of up to 45 m/sec.
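The annual growth rates quoted earlier in this paper for microprocessors, DRAM, and HDDs are easier to compare when converted to doubling times. The following is a minimal sketch in Python; the function names are illustrative and do not come from the source, and the only inputs are the percentages given in the text, treated here as constant compound rates (an assumption).

```python
import math

def doubling_time_years(annual_growth_percent):
    """Years needed to double at a constant compound annual growth rate."""
    return math.log(2) / math.log(1 + annual_growth_percent / 100.0)

def annual_growth_percent(doubling_time_months):
    """Compound annual growth (percent) implied by a doubling time in months."""
    return (2 ** (12.0 / doubling_time_months) - 1) * 100.0

# Growth rates as quoted in the text (percent per year).
rates = {
    "microprocessor clock speed": 30,
    "microprocessor MIPS": 45,
    "DRAM capacity per dollar": 40,
    "HDD storage, last decade": 60,
    "HDD storage, last three years": 100,
}

for name, rate in rates.items():
    print(f"{name}: doubles every {doubling_time_years(rate):.2f} years")

# The 18-month doubling attributed to "Moore's Law" in the quote,
# expressed as an annual rate for comparison with the figures above.
print(f"18-month doubling = {annual_growth_percent(18):.1f} percent per year")
```

Under this reading, the 18-month doubling in the newspaper quote works out to just under 60 percent per year, close to the HDD rate of the last decade, while the recent 100 percent per year corresponds to doubling every 12 months.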
Minimizing head-disk spacing is essential to maximizing areal density, while at the same time, head-disk physical contact must be avoided to prevent interface failure. Tight flying-height control is accomplished through the use of photolithographically defined multilevel air bearings that use combined positive and negative pressure regions to minimize sensitivity to interface velocity, atmospheric pressure, skew angle, and manufacturing tolerances. Although flying heights have decreased steadily over the years from ~20 µm to ~20 nm, further progress may require bringing the heads and disks fully into contact, which will require major technology breakthroughs to deal with friction and wear issues.

FIGURE 2 Fabrication of heads for HDDs. (a) Thin film inductive and GMR heads are batch fabricated on ceramic wafers. (b) Wafers are sliced into rows. At the row level, the air bearing surface is first lapped to tight flatness specifications, and then lithographically patterned and etched to create air bearing features. Air bearing features are typically 0.1–2 mm in height. (c) Finally, rows are parted into individual sliders 1 × 1.25 × 0.3 mm in size. SOURCE: IBM.

Ever increasing areal density requires decreasing track widths, which are ~1 µm in today's products. Conventional rotary actuator and track-following servo technology appears to be approaching its limit. In the near future, HDDs are expected to incorporate two-stage actuators, with a conventional rotary actuator serving as a coarse head positioner, and a micromechanical actuator serving as a fine positioner. An example of a slider mounted on an electrostatic micromachined actuator (Fan et al., 1999) is shown in Figure 3.

FIGURE 3 Slider (air bearing side up) mounted on a micromachined electrostatic secondary actuator on the end of a suspension. The microactuator serves as the fine positioning element for the track-following servo system of the HDD. SOURCE: IBM.

Magnetic read/write elements have evolved from wire-wound ferrite inductive heads to microfabricated thin-film structures. Today, separate read and write elements are used, with a thin-film inductive element for writing and a magnetoresistive element for reading. The quest for ever-increasing read head sensitivity to provide adequate signal-to-noise for reduced bit size has led to the adoption of read heads using the recently discovered giant magnetoresistance (GMR) effect, which has moved from discovery of a new physical effect (Baibich et al., 1988) to high-volume mass production of heads in just under 10 years. GMR heads incorporate a multilayer structure of magnetic and nonmagnetic thin films (Figure 4). The magnetization of a "pinned" layer is constrained by anti-ferromagnetic coupling to another layer. A thin nonmagnetic layer (e.g., 5 nm of Cu) separates the pinned layer from a second magnetic layer, whose magnetization is free to rotate under the influence of an external field (in this case, the fringing fields of magnetic bits written on the disk). Changes in the relative orientation of the magnetization in these two films result in changes in resistance on the order of 5–20 percent, an effect much larger than is achieved by the older anisotropic magnetoresistive (AMR) head technology. Although read elements have been the focus of recent advances in head technology, inductive write heads are also seeing a wave of innovation, with the implementation of higher moment materials, more compact coils, and better dimensional control to provide increased write field at higher speed in narrower tracks.

FIGURE 4 Magnetic read/write head. (a) The slider, with read/write elements on its rear surface facing downward toward the disk, is mounted on a spring-like suspension that holds the slider against the disk surface. An air bearing between the disk and slider surfaces prevents head-disk contact, while maintaining a ~20 nm separation between the head and disk. (b) The inductive write element is composed of Cu coils and a soft magnetic NiFe yoke, with poles facing the disk. (c) The GMR read element is a multilayer thin film structure, with pinned and free magnetic layers separated by a thin nonmagnetic Cu spacer layer. SOURCE: IBM.

The greatest technical challenges facing the industry today are in the area of disk materials and magnetics. Specifically, the size of recorded bits is approaching the superparamagnetic limit, where magnetization becomes thermally unstable. Disk media use granular cobalt alloy sputtered thin films, on the order of 15 nm thickness. Spontaneous changes in magnetization of individual grains can occur if the energy product K_u V (where K_u is the anisotropy energy density of the material and V is the grain volume) is insufficiently large compared to the Boltzmann energy k_B T (Weller and Moser, 1999). However, shrinking bit size requires shrinking grain size to maintain adequate signal-to-noise (Figure 5). Choosing materials with higher K_u poses problems in generating sufficiently large write fields (today's disk materials already require fields that cause magnetic saturation of write head poles). Thus, simple scaling of previous design points results in thermal instability (and lost data) when densities of 50–100 Gbit/sq. inch are reached, which is a design point only a few years away at the current rate of progress. To continue increasing density while maintaining thermal stability, some combination of the following will be used: perpendicular media and heads, reduced bit aspect ratio, thermally assisted writing, and enhanced error correction. Use of patterned media (one bit per pre-defined grain) offers the ultimate route to extremely high densities, although economical manufacturing strategies have not yet been shown.

FIGURE 5 (a) Magnetic tracks are recorded on a granular thin-film magnetic medium on the disk. The finite size of grains in the medium results in "noisy" (not straight) boundaries between recorded bits. Achieving suitable disk signal-to-noise requires that grain size shrink as density is increased. Dimensions shown are for 10 Gbit/sq. inch density. The image of the track is a Lorentz microscopy image of a track recorded at lower density (1 Gbit/sq. inch). (b) Choice of materials, underlayers, and sputtering conditions determines grain size. The medium on the right, with its smaller grains, can support a higher recording density; however, its thermal stability may be lower than the medium on the left. SOURCE: IBM.

Although there are many technical challenges, magnetic data storage is expected to remain a dominant player for years to come. As magnetic data storage reaches its 50th anniversary, it is branching out in new directions. Disk drives, once restricted to the domain of data storage for computers, are entering new markets. One example is the "microdrive," a tiny drive with a 1-inch disk for digital cameras and other handheld devices. Another example is the HDD video recorder, which may potentially revolutionize the television industry. Even the realm of solid-state memory may someday yield to magnetics: magnetic random access memory (MRAM) shows promise as a future replacement for solid-state flash memory (Parkin et al., 1999).

REFERENCES

Baibich, M. N., J. M. Broto, A. Fert, F. Nguyen Van Dau, and F. Petroff. 1988. Giant magnetoresistance of (001) Fe / (001) Cr magnetic superlattices. Physical Review Letters 61:2472–2474.
Burgess, J. 1999. Hard disks, which just keep improving, deserve more respect. The Washington Post (July 26):F23.
Eschenfelder, A. H. 1970. Promise of magneto-optic storage systems compared to conventional magnetic technology. Journal of Applied Physics 41:1372–1376.
Fan, L.-S., T. Hirano, J. Hong, P. R. Webb, W. H. Juan, W. Y. Lee, S. Chan, T. Semba, W. Imaino, T. S. Pan, S. Pattanaik, F. C. Lee, I. McFadyen, S. Arya, and R. Wood. 1999. Electrostatic microactuator and design considerations for HDD applications. IEEE Transactions on Magnetics 35:1000–1005.
Parkin, S. S. P., K. P. Roche, M. G. Samant, P. M. Rice, R. B. Beyers, R. E. Scheuerlein, E. J. O'Sullivan, S. L. Brown, J. Bucchigano, D. W. Abraham, Yu Lu, M. Rooks, P. L. Trouilloud, R. A. Wanner, and W. J. Gallagher. 1999. Exchange-biased magnetic tunnel junctions and application to nonvolatile magnetic random access memory. Journal of Applied Physics 85(8):5828–5833.
Weller, D., and A. Moser. 1999. Thermal effect limits in ultrahigh-density magnetic recording. IEEE Transactions on Magnetics 35(6):4423–4439.
Wildmann, M. 1974. Mechanical limitations in magnetic recording. IEEE Transactions on Magnetics 10:509–514.
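As a rough illustration of the superparamagnetic limit discussed in the magnetic recording paper above, the sketch below compares the anisotropy energy of a single grain with the thermal energy at room temperature. Several things here are assumptions rather than values from the source: the grain is modeled as a cylinder, the anisotropy energy density `K_U` is a hypothetical round number, the temperature is taken as 300 K, and the relaxation time uses the standard Néel-Arrhenius form with an assumed attempt time of 1 ns. Only the 15 nm film thickness is taken from the text.

```python
import math

K_B = 1.380649e-23  # Boltzmann constant in J/K (exact SI value)

def grain_volume(diameter_m, thickness_m):
    """Volume of a grain modeled as a cylinder (a simplifying assumption)."""
    return math.pi * (diameter_m / 2.0) ** 2 * thickness_m

def stability_ratio(k_u, volume_m3, temperature_k=300.0):
    """Ratio K_u*V / (k_B*T); larger values mean more stable magnetization."""
    return k_u * volume_m3 / (K_B * temperature_k)

def relaxation_time_s(ratio, attempt_time_s=1e-9):
    """Neel-Arrhenius estimate: tau = tau_0 * exp(K_u*V / (k_B*T))."""
    return attempt_time_s * math.exp(ratio)

K_U = 2.0e5        # J/m^3, hypothetical anisotropy energy density
THICKNESS = 15e-9  # m, film thickness of the order quoted in the text

for diameter_nm in (12, 10, 8, 6):
    volume = grain_volume(diameter_nm * 1e-9, THICKNESS)
    ratio = stability_ratio(K_U, volume)
    tau = relaxation_time_s(ratio)
    print(f"{diameter_nm} nm grain: K_u*V/(k_B*T) = {ratio:.1f}, tau ~ {tau:.2e} s")
```

Because the relaxation time depends exponentially on grain volume, a modest reduction in grain diameter moves the estimate from effectively permanent storage to rapid spontaneous reversal, which is the trade-off between signal-to-noise and thermal stability described in the paper. The specific numbers printed depend entirely on the assumed parameters.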
Evolution of Large Multiprocessor Servers

KOUROSH GHARACHORLOO
Western Research Laboratory
Compaq Computer Corporation
Palo Alto, California

ABSTRACT

A decade ago, large multiprocessors were part of a niche market and were primarily used to solve difficult scientific and engineering problems. Today, these multiprocessors are enjoying phenomenal growth, and commercial databases and World Wide Web servers constitute the largest and fastest growing segment of the market. This evolution is continuing with the increasing popularity of the Web. For example, many experts believe that there will be a push toward large centralized servers used as permanent data repositories, providing users with easy access to their data from a wide range of devices connected to the Internet. These trends have made multiprocessor servers a key component in today's computing infrastructure.

Multiprocessor server design has gone through a corresponding evolution during this period. The most significant change has been a shift from message passing to cache-coherent shared-memory systems. The key advantage of a shared-memory system is that it allows efficient and transparent sharing of resources (e.g., memory, disk, network) among all processors. This feature greatly simplifies the task of application programmers by reducing (and sometimes eliminating) the need for resource partitioning. In addition, shared memory naturally lends itself to a single system image (from an operating system perspective), which greatly simplifies the task of system management. However, preserving these benefits for larger scale systems presents a new set of technical challenges. One key challenge is to achieve software and operating system scalability while maintaining a single system image. Another challenge is in satisfying the reliability, availability, and serviceability requirements of commercial applications
Such recurring themes lead some to believe that there is nothing new in computer architecture, with new designs primarily being an adaptation of existing ideas to current technology trends and market needs.

CURRENT CHALLENGES

Operating System and Application Scalability

One of the myths in computer system design is that software is easier to change than hardware. In reality, commercial software, such as operating systems and databases, is quite complex (consisting of millions of lines of code), making it a daunting task to change it in a major way. In comparison, hardware systems are much less complex and are designed to a simpler and more precise specification. This allows hardware to be redesigned every 18 months to 2 years, and making major changes is quite feasible as long as architectural compatibility is maintained.

Compared to scientific and engineering applications, commercial applications make it challenging for system software to scale to a large number of processors, due to their frequent use of operating system services and input/output (I/O) devices. The process of scaling software to a larger number of processors typically involves reducing the amount of inherent synchronization and communication by employing new data structures and algorithms. These types of changes are difficult to make in operating systems and applications such as databases because of their inherent size and complexity. Major strides have been made during the last few years, with some database and operating system software currently scaling up to around 64 processors. Nevertheless, there is still much work to be done.

Reliability, Availability, and Serviceability

A large number of mission-critical commercial applications require guarantees of little or no downtime.
Isolating and protecting against hardware and software faults is especially difficult when resources are transparently shared, since the faults can quickly propagate and corrupt other parts of the system. Furthermore, a number of applications require incremental upgrades or replacement of various hardware and software components while the system is running. Finally, a major fraction of the total cost of ownership in commercial servers is in management and maintenance costs, and system designs and software that can tackle these costs are highly desirable.

Hardware System Design

The ever increasing gap between processor and memory speeds also introduces several challenges in multiprocessor designs, since memory system performance becomes a more dominant factor in overall performance. Comparing the Stanford DASH (Lenoski et al., 1992) with aggressive next-generation multiprocessor systems, processor speeds will be over 30 times higher with memory latencies improving by only 10 times. Fortunately, memory bandwidths have improved faster than processor speeds during this period. Therefore, in addition to designing for low latency, it is important to exploit concurrency (supported by higher memory bandwidths) to tolerate the remaining latency.

Techniques such as out-of-order instruction execution that are used in part to help tolerate higher memory latencies and provide more instruction-level parallelism have led to major increases in design time, number of designers, and number of verification engineers. This problem is exacerbated by the higher levels of integration that are making it feasible for a single chip to have several hundred million transistors. Hence, there will likely need to be a major shift in design methodology to deal with the increasing complexities of future chip designs.

Commercial Applications

Commercial workloads have been shown to have dramatically different behavior compared to scientific and engineering workloads (Barroso et al., 1998; Keeton et al., 1998). Due to the large fraction of time spent in the memory system and the lack of instruction-level parallelism, there is a relatively small gain from improving integer processor performance (also, there is no floating-point computation). Commercial applications also make frequent use of operating system services and I/O, making the performance of system software more important. The size and complexity of commercial applications, in addition to their frequent interaction with the operating system, also make it difficult to study and simulate the behavior of these workloads for designing next-generation systems.
Nevertheless, there has been progress in the areas of scaling down commercial workloads to make them amenable to simulation (Barroso et al., 1998) and complete system simulation environments that include the behavior of the operating system (Rosenblum et al., 1995).

INTERESTING DEVELOPMENTS

Hardware Trends

With increasing chip densities, future microprocessor designs will be able to integrate many of the traditional system-level modules onto the same chip as the processor. For example, the next-generation Alpha 21364 plans to exploit such integration aggressively by including a 1 GHz 21264 core, separate 64 KByte instruction and data caches, a 1.5 MByte second-level cache, memory controllers, coherence hardware, and network routers all on a single die. Such integration translates into a lower latency and higher bandwidth cache hierarchy and memory system. Another key benefit is lower system component counts, which lead to more reliable and lower cost systems that are easier to manufacture and maintain. Finally, this approach lends itself to modular designs with incremental scalability.

As mentioned, exploiting concurrency to tolerate memory latencies is also becoming increasingly important. There are two promising techniques in this area that exploit the inherent and explicit parallelism in workloads such as commercial applications for this purpose. The first technique, called simultaneous multithreading (SMT) (Tullsen et al., 1995; Lo et al., 1998), is applicable to wide-issue out-of-order processors. SMT involves issuing instructions from multiple threads in the same cycle to better use the many functional units in a wide-issue processor. The second technique, called chip multiprocessing (CMP) (Hammond et al., 1997), is motivated by higher integration densities and involves incorporating multiple (simpler) processor cores on the same chip. SMT and CMP are alternative mechanisms for exploiting explicit application-level parallelism to better use a given chip area.

Hardware and Software Faults

As multiprocessor servers scale to a larger number of processors, it becomes exceedingly important to isolate or hide both hardware and software failures. There are a number of options for handling faults with varying degrees of complexity and performance impact. At one extreme, the goal of fault tolerance is to completely hide the occurrence of failures by using redundancy. An alternative is to confine or isolate the effect of the fault to a small portion of the system. In the latter case, the chance of failure for a task should depend only on the amount of resources it uses and not on the size of the system. Therefore, while failures will be visible, they will be limited to tasks that used the failing resource.
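The goal stated above, that a task's chance of failure should depend only on the resources it uses and not on the size of the system, can be illustrated with a small probability sketch. This is a toy model, not something taken from the paper: it assumes nodes fail independently with the same probability, and the values `P_NODE` and `TASK_NODES` are hypothetical.

```python
def failure_probability(nodes_exposed, p_node):
    """Probability that at least one of `nodes_exposed` nodes fails,
    assuming independent failures with per-node probability `p_node`."""
    return 1.0 - (1.0 - p_node) ** nodes_exposed

P_NODE = 0.001   # hypothetical per-node failure probability in some interval
TASK_NODES = 4   # hypothetical number of nodes the task actually uses

for system_nodes in (4, 16, 64, 256):
    # Without containment, a fault on any node in the machine can take
    # the task down, so the task is exposed to every node.
    without_containment = failure_probability(system_nodes, P_NODE)
    # With containment, only the nodes the task uses matter.
    with_containment = failure_probability(TASK_NODES, P_NODE)
    print(f"{system_nodes:4d} nodes: "
          f"no containment {without_containment:.4f}, "
          f"containment {with_containment:.4f}")
```

In this model the task's failure probability without containment grows with machine size, while with containment it stays fixed at the value set by the nodes the task touches.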
Compared to fault tolerance, fault containment can typically be achieved with much less cost and complexity and with minimal effect on performance. Fault containment is well understood in distributed systems where communication occurs only through explicit messages, and incoming messages can be checked for consistency. However, the efficient resource sharing enabled by shared-memory servers allows the effect of faults to spread quickly and makes techniques used in distributed systems too expensive given the low latency of communication. To provide fault isolation, most current shared-memory multiprocessors depend on statically dividing the machine into "hard" partitions that cannot communicate, thus removing the benefits of resource sharing across the entire machine.

More recent work on fault containment in the context of the Stanford FLASH multiprocessor allows for "soft" partitions with support for resource sharing across partitions (Teodosiu et al., 1997; Chapin et al., 1995). On the hardware side, this approach requires a set of features that limit the impact of faults and allow the system to run despite missing hardware resources. In addition, there is a recovery algorithm that restores normal operation after a hardware fault. For software faults, the approach involves logically partitioning the system into cells that act as "failure units" and using various firewall mechanisms to protect cells from each other. This type of solution has yet to appear in commercially available systems, partly because of the lack of sufficient hardware support in current designs and partly because restructuring a commodity operating system for fault containment is a challenge.

Achieving Software Scalability and Reliability Through Virtual Machine Monitors

A virtual machine monitor is an extra layer of software introduced between the hardware and the operating system. The monitor exports and virtualizes the hardware interfaces to multiple operating systems, referred to as virtual machines. Virtual machine monitors were prevalent in mainframes in the 1970s and were used to multiplex the expensive hardware among several operating systems and users. However, the use of this technique eventually faded away.

The Disco project at Stanford (Bugnion et al., 1997) has proposed to rejuvenate virtual machine monitors as an alternative approach to dealing with scalability and reliability requirements for large-scale shared-memory systems. As mentioned, restructuring a commodity operating system with millions of lines of code to achieve scalability and reliability is a difficult task. Fortunately, while operating systems and application programs continue to grow in size and complexity, the machine-level interface has remained fairly simple. Hence, the virtual machine monitor can be a small and highly optimized piece of software written with scalability and fault containment in mind. The Disco prototype consists of only 13,000 lines of code and uses several techniques to reduce the overheads (measured to be 5–15 percent) from running the extra layer of software.
Disco allows multiple, possibly different, commodity operating systems to be running at the same time as separate virtual machines. To achieve scalability, multiple instances of the same operating system can be launched. For handling applications whose resource needs exceed the scalability of the commodity operating system, Disco provides the ability to explicitly share memory regions across virtual machines. For example, a parallel database server can share its buffer cache among multiple virtual machines. Disco also provides the alternative of developing specialized operating systems for resource-intensive applications that do not need the full functionality of commodity operating systems. Such specialized operating systems can coexist with commodity operating systems as separate virtual machines.

Virtual machines also serve as the unit of fault containment. Hardware or software faults only affect the virtual machines that actually used the faulty resource. Disco also handles memory management issues that arise from non-uniform memory access by transparently doing page replication and migration. Again, changing commodity operating systems to do this would be more difficult. Finally, Disco inherits advantages of traditional virtual machine monitors: (a) older versions of system software can be kept around to provide a stable platform for running legacy applications, and (b) newer versions of operating systems can be staged in carefully, with critical applications residing on older operating systems until newer versions have proven themselves.

SUMMARY

The emergence of the information age and the World Wide Web has had a major impact on the market for and design of high performance computers. Commercial workloads such as databases and Web servers have become the primary target of multiprocessor servers. Furthermore, the reliance on centralized information services has created demand for designs that provide high availability and incremental scalability. Hardware designs have reacted to these needs relatively quickly, with many of the current challenges in scalability and reliability residing on the software side.

REFERENCES

Barroso, L. A., K. Gharachorloo, and E. D. Bugnion. 1998. Memory system characterization of commercial workloads. Pp. 3–14 in Proceedings of the 25th International Symposium on Computer Architecture, Barcelona, Spain, June 27–July 1, 1998. Washington, D.C.: IEEE Computer Society Press.
Bugnion, E., S. Devine, K. Govil, and M. Rosenblum. 1997. Disco: Running commodity operating systems on scalable multiprocessors. ACM Transactions on Computer Systems 15(4):412–447.
Chapin, J., M. Rosenblum, S. Devine, T. Lahiri, D. Teodosiu, and A. Gupta. 1995. Hive: Fault containment for shared-memory multiprocessors. Pp. 12–25 in Proceedings of the 15th ACM Symposium on Operating Systems Principles, Copper Mountain Resort, Colo. New York: Association for Computing Machinery.
Hammond, L., B. A. Nayfeh, and K. Olukotun. 1997. A single-chip multiprocessor. Computer 30(9):79–85.
Keeton, K., D. A. Patterson, Y. Q. He, R. C. Raphael, and W. E. Baker. 1998. Performance characterization of the quad Pentium Pro SMP using OLTP workloads. Pp. 15–26 in Proceedings of the 25th International Symposium on Computer Architecture, Barcelona, Spain, June 27–July 1, 1998. Washington, D.C.: IEEE Computer Society Press.
Lenoski, D., J. Laudon, K. Gharachorloo, W.-D. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. S. Lam. 1992. The Stanford DASH multiprocessor. Computer 25(3):63–79.
Lo, J. L., L. A. Barroso, S. J. Eggers, K. Gharachorloo, H. M. Levy, and S. S. Parekh. 1998. An analysis of database workload performance on simultaneous multithreaded processors. Pp. 39–50 in Proceedings of the 25th International Symposium on Computer Architecture, Barcelona, Spain, June 27–July 1, 1998. Washington, D.C.: IEEE Computer Society Press.
Rosenblum, M., S. A. Herrod, E. Witchel, and A. Gupta. 1995. Complete computer system simulation: The SimOS approach. IEEE Parallel and Distributed Technology 3(4):34–43.
Teodosiu, D., J. Baxter, K. Govil, J. Chapin, M. Rosenblum, and M. Horowitz. 1997. Hardware fault containment in scalable shared-memory multiprocessors. Pp. 73–84 in Proceedings of the 24th International Symposium on Computer Architecture, Denver, Colo., June 2–4, 1997. Washington, D.C.: IEEE Computer Society Press.
Tullsen, D., S. Eggers, and H. Levy. 1995. Simultaneous multithreading: Maximizing on-chip parallelism. Pp. 392–403 in Proceedings of the 22nd International Symposium on Computer Architecture, Santa Margherita Ligure, Italy, June 18–24, 1995. Washington, D.C.: IEEE Computer Society Press.
Network Survivability and Information Warfare

MICHAEL K. REITER
Bell Laboratories, Lucent Technologies
Murray Hill, New Jersey

Perhaps the primary reason that we are "drowning in data" is the explosion of network connectivity in the last five years. Networked information systems (NISs) make accessible an unprecedented wealth of information and possibilities for communication, and do so in a very cost-efficient manner. We seem capable of deploying NISs with ever increasing capacities for transmitting, storing, and processing data, and the undeniable advantages of "getting connected" are drawing a broad array of applications, some of them life critical or financially critical, to be conducted or managed with the help of large NISs.

At the same time we seem embarrassingly incapable of deploying NISs that can withstand adversity and protect the data and services they manage. This was cogently displayed in a 1997 operation named "Eligible Receiver," in which a National Security Agency team demonstrated how to break into U.S. Department of Defense and electric power grid systems from the Internet; generate a series of rolling power outages and 911 overloads in Washington, D.C., and other cities; break into unclassified systems at four regional military commands and the National Military Command Center; and gain supervisory-level access to 36 networks, enabling e-mail and telephone service disruptions (NRC, 1999). Over a period of a year during the Gulf War, five hackers from the Netherlands penetrated computer systems at 34 military sites, gaining the ability to manipulate military supply systems and obtaining information about the exact locations of U.S. troops, the types of weapons they had, the capabilities of the Patriot missile, and the movement of U.S. warships in the Gulf region (Denning, 1999).
During 1999, the United States military has been under a sophisticated and largely successful network attack originating from Russia, in which even top secret systems have been compromised (Campbell, 1999). The very NISs that are drowning us in data are also leaving us vulnerable to attacks that can take place from a distance, that can be conducted anonymously, and that can disable or degrade critical systems and national infrastructures. This is cause for deep concern in information warfare circles, as numerous countries and terrorist organizations are reported to be developing ways to exploit these vulnerabilities offensively.

The present state of affairs has led to increased research in areas such as intrusion detection and response, and "penetration tolerant" or "survivable" NIS technologies. The breadth and complexity of the problems that must be addressed render it impossible to cover them here; the interested reader is referred to a recent National Research Council (NRC) study that proposes research directions to address some of the issues involved (NRC, 1999). For the purposes of the present discussion, however, we call attention to one research direction advocated in this study, namely the construction of trustworthy (survivable or penetration-tolerant) systems from untrustworthy components. The basic idea is to construct an NIS using replication, redundancy, and diversity, so that even if some of its components are successfully attacked, the NIS will nevertheless continue to provide critical service correctly.

This notion has analogs in a broad range of engineering disciplines. In avionics, for example, a technique known as Triple Modular Redundancy (TMR) is commonly used to mask the failure of a processor by having three processors redundantly perform a computation and "vote" on the result. In this way, the result that occurs in a majority "wins," thereby hiding any effects of a single faulty processor.
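The voting step of TMR can be illustrated with a minimal Python sketch. This is an illustration only, not avionics code: the function names (majority_vote, correct_replica, faulty_replica) and the simulated fault are assumptions introduced here, and real TMR voters are typically implemented in hardware.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the output reported by a strict majority of replicas.

    Raises RuntimeError when no value has a strict majority, in which
    case the fault cannot be masked by voting alone.
    """
    if not outputs:
        raise ValueError("need at least one replica output")
    value, count = Counter(outputs).most_common(1)[0]
    if 2 * count <= len(outputs):
        raise RuntimeError("no majority among replica outputs")
    return value


# Hypothetical replicas of one computation; the second one is faulty.
def correct_replica(x):
    return x * x


def faulty_replica(x):
    return x * x + 1  # simulated wrong result


replicas = [correct_replica, faulty_replica, correct_replica]
outputs = [replica(7) for replica in replicas]
masked_result = majority_vote(outputs)  # the single faulty output is outvoted
```

With three replicas, one wrong output is always outvoted; if two replicas fail in the same way, the voter returns the wrong value, which is why the technique assumes that failures are independent.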
While this technique is well understood within the confines of an embedded system, it is largely untested in larger, open settings as a way to mask the misbehavior of data servers (e.g., file servers, name servers), network routers, or other NIS components. Most work on applying these principles to build survivable data services has attempted to duplicate the TMR approach in networked settings, where it has come to be known as "state machine replication." These systems consist of an ensemble of closely coupled computers that all respond to each client request. Again, the correct answer of the ensemble of servers is determined by voting.

Even this seemingly simple extension of TMR to network settings introduces complexities, however. First, concurrent requests from different clients, if processed in different orders at different servers, can cause the states of correct servers to diverge, so that even correct servers return different answers to each request. Most systems thus implement protocols to ensure that all correct servers process the same requests in the same order, a task that is theoretically impossible to achieve while also guaranteeing that the system makes progress (Fischer et al., 1985). Therefore, systems that adopt this approach employ request delivery protocols that impose additional timing assumptions on the network (e.g., Pittelli and Garcia-Molina, 1989; Shrivastava et al., 1992; Cristian et al., 1995) or are guaranteed to make progress only conditionally (e.g., Reiter, 1996; Kihlstrom et al., 1998; Castro and Liskov, 1999). Second, the use of every server to process every request precludes dividing server load among different servers. That is, adding more servers does not yield better performance; on the contrary, it tends to degrade performance, due to the additional obligations it imposes on the request ordering and output voting protocols. These properties, and the expense of the request ordering protocols themselves, thus seem to limit this approach to moderate replication of servers on local area networks only.

A more scalable approach presently being explored is the adaptation of quorum systems to the task of providing survivable services tolerant of server penetrations (Malkhi and Reiter, 1998a). In this approach, clients perform data operations only at a subset, called a quorum, of the servers. Quorums are constructed so that, for example, any two quorums intersect in 2t + 1 servers, where t is a presumed maximum number of server penetrations that may occur. Using this property, it is possible to build access protocols that emulate a wide range of data objects despite the corruption of up to t data servers (Malkhi and Reiter, 1998b). An advantage of this approach over state machine replication is that quorums can be surprisingly small, e.g., O(√(tn)) servers out of a total of n servers. Consequently, this approach offers improved scaling and load balancing, so much so that it was recently employed in the design of a survivable nationwide electronic voting trial for the country of Costa Rica that was required to scale to hundreds of electronic polling stations spread across the country.

There are limitations even to this approach, however. First, it complicates the task of detecting penetrated servers, and thus demands new mechanisms for doing so (Alvisi et al., 1999).
Second, it is subject to a fundamental trade-off between availability and the ability to balance load among the servers; research is investigating techniques to break this trade-off by allowing a small and controlled probability of error in data operations (Malkhi et al., 1997). Third, in general, a new data access protocol must be designed to emulate each different data object; on the other hand, many useful objects can be emulated in this environment without the limitations imposed by the impossibility result of Fischer et al. (1985).

Finally, there remain basic limitations to what can be achieved with any attempt to build highly resilient data services from untrustworthy components. While effective against attacks that an attacker cannot readily duplicate at all servers (e.g., attacks exploiting configuration errors or platform-specific vulnerabilities in particular servers, or physical capture or server administrator corruption), these approaches provide little protection against attacks that can simultaneously penetrate servers with little incremental cost per penetration to the attacker. Research in adding artificial diversity to servers, in the hopes of eliminating common vulnerabilities among servers, is underway, but it is in a fledgling state.
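To make the quorum intersection property concrete, the following Python sketch shows why an overlap of 2t + 1 servers lets a reader mask up to t penetrated servers. It is a simplified illustration under stated assumptions, not the published protocols: it assumes a simple threshold construction (every set of q servers is a quorum, with n ≥ 4t + 1), which does not achieve the small O(√(tn)) quorums mentioned above, and the names quorum_size and masked_read are hypothetical.

```python
from collections import Counter
from itertools import combinations


def quorum_size(n, t):
    """Smallest q such that any two q-subsets of n servers share >= 2t + 1.

    Two subsets of size q overlap in at least 2q - n servers, so we need
    2q - n >= 2t + 1. Requiring n >= 4t + 1 keeps q <= n - t, so a quorum
    of unpenetrated servers always exists (assumed threshold construction).
    """
    if n < 4 * t + 1:
        raise ValueError("this construction assumes n >= 4t + 1")
    return (n + 2 * t + 2) // 2  # integer ceiling of (n + 2t + 1) / 2


def masked_read(replies, t):
    """Pick a value from (timestamp, value) replies of one read quorum.

    Only pairs reported by at least t + 1 servers are trusted, because at
    most t servers can be penetrated; among those, the newest wins.
    """
    counts = Counter(replies)
    vouched = [pair for pair, c in counts.items() if c >= t + 1]
    if not vouched:
        return None
    return max(vouched, key=lambda pair: pair[0])


n, t = 5, 1
q = quorum_size(n, t)  # 4 servers per quorum

# Every pair of quorums overlaps in at least 2t + 1 = 3 servers.
quorums = [set(c) for c in combinations(range(n), q)]
min_overlap = min(len(a & b) for a in quorums for b in quorums)

# A write of (2, "new") reached servers {0, 1, 2, 3}; server 4 still holds
# (1, "old"). Server 0 is penetrated and returns a forged pair. The reader
# contacts the quorum {0, 1, 2, 4}.
replies = [(99, "forged"), (2, "new"), (2, "new"), (1, "old")]
result = masked_read(replies, t)  # forged pair lacks t + 1 vouchers
```

The read and write quorums share at least 2t + 1 servers, of which at most t are penetrated, so at least t + 1 correct servers report the latest write, while a forged pair can gather at most t reports. Note that the client contacts only q of the n servers, unlike state machine replication, where every server processes every request.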
REFERENCES

Alvisi, L., D. Malkhi, L. Pierce, and M. K. Reiter. 1999. Fault detection for Byzantine quorum systems. Pp. 357–372 in Proceedings of the 7th IFIP Working Conference on Dependable Computing for Critical Applications, San Jose, Calif., January 6–8, 1999. Los Alamitos, Calif.: IEEE Computer Society.
Campbell, M. 1999. U.S. losing cyber war to Russian hackers. The Australian, July 26.
Castro, M., and B. Liskov. 1999. Practical Byzantine fault tolerance. Paper presented at the 3rd Symposium on Operating Systems Design and Implementation, New Orleans, La., February 22–25, 1999.
Cristian, F., H. Aghili, R. Strong, and D. Dolev. 1995. Atomic broadcast: From simple message diffusion to Byzantine agreement. Information and Computation 18(1):158–179.
Denning, D. E. 1999. Information Warfare and Security. New York: ACM Press.
Fischer, M. J., N. A. Lynch, and M. S. Paterson. 1985. Impossibility of distributed consensus with one faulty process. Journal of the ACM 32(2):374–382.
Kihlstrom, K. P., L. E. Moser, and P. M. Melliar-Smith. 1998. The SecureRing protocols for securing group communication. Pp. 317–326 in Proceedings of the 31st Annual Hawaii International Conference on System Sciences, Vol. III, R. H. Sprague, Jr., ed. North Hollywood, Calif.: Western Periodicals Co.
Malkhi, D., and M. K. Reiter. 1998a. Byzantine quorum systems. Distributed Computing 11(4):203–213.
Malkhi, D., and M. K. Reiter. 1998b. Secure and scalable replication in Phalanx. Pp. 51–58 in Proceedings of the 17th IEEE Symposium on Reliable Distributed Systems, West Lafayette, Ind., October 20–23, 1998. Los Alamitos, Calif.: IEEE Computer Society.
Malkhi, D., M. K. Reiter, and R. Wright. 1997. Probabilistic quorum systems. Pp. 267–273 in Proceedings of the 16th ACM Symposium on Principles of Distributed Computing, Santa Barbara, Calif., August 21–24, 1997. New York: Association for Computing Machinery.
NRC (National Research Council). 1999. Trust in Cyberspace, F. B. Schneider, ed. Washington, D.C.: National Academy Press.
Pittelli, F. M., and H. Garcia-Molina. 1989. Reliable scheduling in a TMR database system. ACM Transactions on Computer Systems 7(1):25–60.
Reiter, M. K. 1996. Distributing trust with the Rampart toolkit. Communications of the ACM 39(4):71–74.
Shrivastava, S. K., P. D. Ezhilchelvan, N. A. Speirs, S. Tao, and A. Tully. 1992. Principal features of the VOLTAN family of reliable node architectures for distributed systems. IEEE Transactions on Computers 41(5):542–549.
Moving up the Information Food Chain: The Future of Web Search

OREN ETZIONI
Go2net, Inc., Seattle
and
Department of Computer Science and Engineering
University of Washington, Seattle

The World Wide Web is at the very bottom of the Information Food Chain. The Yahoos and Alta Vistas of the world are information herbivores, which graze on Web pages and regurgitate them as directories and searchable collections. The talk focuses on information carnivores, which hunt and feast on Web herbivores. Meta-search engines, shopbots, and the future of Web search are also considered.

URLs to visit include:
Meta-search: www.metacrawler.com
Shopbot: jango.excite.com
Concierge for the web: www.askjeeves.com
Clustering search: www.cs.washington.edu/research/clustering