3
Summary and Findings: Research for National-scale Applications

RESEARCH CHALLENGES OF CRISIS MANAGEMENT

Crises can make enormous demands on the widely distributed information resources of a nation (see summary in Box 3.1). Responding to the Oklahoma City bombing disaster required a national call for search and rescue experts and their tools to help find survivors and to reinforce unstable areas of the damaged Alfred P. Murrah Building so that rescuers could enter safely, as well as massive coordination to focus a diverse set of teams on a common goal. Hurricane Andrew and the Northridge, California, earthquake caused widespread devastation and placed pressure on relief authorities to distribute food, water, shelter, and medicine and to begin receiving and approving applications for disaster assistance without delay.

Crises often bring together many different organizations that do not normally work together, and these groups may require resources that they have not used before. To mount an organized response in this environment, crisis managers can benefit from the use of information technology to catalog, coordinate, analyze, and predict needs; to report status; and to track progress. This kind of coordinated management requires communications networks, from handheld radios and the public telephone network to high-speed digital networks for voice, video, and data. Rapidly deployable communications technologies can help relief teams communicate and coordinate their actions and pool their resources. Crisis managers also need computers to help them retrieve, organize, process, and share information, and they rely on computer models to help them analyze and predict complex phenomena such as weather and damage to buildings or other structures.




BOX 3.1 Summary of Crisis Management Characteristics and Needs

Crises make large-scale demands, are unpredictable, and require an immediate response.

Large-scale demands. Crises require resources beyond those on hand—people, equipment, communications, information, and computing must be moved rapidly to the scene physically and/or virtually (over networks).

Unpredictable. It cannot be known in advance what resources will be needed or where, and what the specific needs will be (although there can be some degree of generalizing and pre-positioning).

Urgent. The response must be rapid, because lives and property are at stake.

Crises require planning and coordination.

Crisis managers must develop and implement a response plan rapidly, despite information shortfalls (gaps, uncertainty, errors, deliberate falsification by a foe) and the lack of correspondence to any previous situations (i.e., standard operating procedures are not sufficient).

Diverse organizations and people respond to crises, including those that have not worked together before and did not know that they would have to do so. This creates challenges for collaboration, information sharing, and communication.

Crises are complex and multifaceted, and so decision makers must weigh multifaceted consequences. Trade-offs require not just tactical optimization, but judgment—the best tactical option may not be the best political option (e.g., in an international context where U.S. military and civilian agencies are operating in another country, perhaps in a tense situation).

Operational needs include communications and networking.

A rapid initial assessment of the situation is necessary, requiring reports from the scene, augmented by sending assessment teams with tools and communications to report back quickly. Remote sensing may also be involved (e.g., satellite and aircraft imagery, ground-based weather monitors, and strain gauges predeployed within bridges and buildings).

Rapid deployment of communications capabilities is required—to expand the initial situation assessment, coordinate the response teams, and disseminate information to the public. It is necessary to (a) assess what is available (remote regions, less developed countries, and badly damaged areas may all have limited infrastructure) and (b) obtain needed capabilities by commandeering what is there (priority access), restoring networks, and augmenting with deployable capabilities (cellular telephones, wireless networks, sensors) as needed.

Required communications parameters must be defined and implemented rapidly. These include (a) reliability (crucial for life-critical communications, e.g., fire and rescue, telemedicine); and (b) security—to maintain confidentiality (especially if an active adversary is involved, but also to protect any private information that is communicated), maintain the integrity of information, and authenticate certain users to allow them priority access.

Crises require more than voice communications—text, all types of sensor outputs, images, full-motion video, and data files must also be communicated; all involve different technological requirements and trade-offs (e.g., latency, quality, bandwidth).

Crises demand integration across a wide, unpredictable scale of resources; thus, there must be flexibility about centralization versus distribution: (a) computing and communications that are available at or accessible to the crisis team include laptops and wireless at the scene, workstations and T-1 (1.5 megabits per second) data links at the command center, and fully distributed computing and communications (e.g., World Wide Web, remote supercomputers) outside the crisis zone; (b) flows of information throughout (into, out of, within) this architecture are unpredictable and may change during the crisis itself. At the scene, computers and communications platforms must be mobile and untethered.

Operational needs also include information resources and computation.

There is a need for multifaceted information—multiple modes (voice, video, images, text, sensors, geographic information system (GIS) data, relational databases, and so on). It cannot be predicted in advance which sources will be required. Sources (a) cannot be used if crisis managers cannot find them or do not know about them (discovery); (b) cannot be used unless they can be accessed and integrated into a crisis manager's information system (interoperability, composability, access rights—intellectual property, privacy); and (c) cannot be used if the crisis managers are flooded with information. Some kind of automated help is needed to sort information—not just filtering it, but also integrating the information to reduce the flow and detecting patterns that can help with interpretation.

Information systems must continually check and integrate new information coming in from the field. There is a real-time demand: for example, simulation or model data (e.g., weather forecasts) will not be useful if they arrive late.

Crisis management in a broader context involves other needs as well.

Crisis management draws on other application domains, for example: (a) secure electronic commerce for locating, purchasing, and distributing relief supplies and services; (b) digital libraries as means for information discovery, integration, and presentation for crisis managers; (c) secure telemedicine and distributed medical records to facilitate the delivery of emergency care in the field; and (d) manufacturing and distributed design, which, although not applicable in real time (during a crisis), reflect common interests such as distributed modeling, simulation, shared databases, and virtual environments for collaboration.

Application needs for technology exist in a broader context as well: (a) solving crisis management problems is not just a computing or communications issue, given that it also involves political, managerial, social, and educational issues; (b) the political, economic, and marketplace context affects the availability of technology resources (hardware, software, information) for crisis management—thus, affordability is essential; and (c) the sociology of organizations affects how they use these technologies. Complex systems must be tested in operational contexts to validate research and determine new research needs.

NOTE: See Chapter 1 for a detailed discussion of the crisis management characteristics and needs that create demands for computing and communications.

When disasters occur, the public deserves and demands a rapid response, and so the ability to anticipate events is at a premium. For example, when a hurricane approaches, relief agencies deploy mobile communications centers to places where sophisticated computer models predict the storm will strike land. Damage simulations help planners decide where to send food, medicine, shelters, blankets, and other basic necessities even before the damage has occurred.

As the response to California's Northridge earthquake demonstrated, relief agencies can use computer simulation to speed the approval of disaster relief (e.g., home rebuilding loans) in areas that the model estimates are hard hit, even before agents have visited the site.

Preparing for and responding to crises place demands on information technology that cannot be satisfied readily with existing tools, products, and services. These unmet demands point to many promising research directions, which this chapter summarizes.

They encompass most aspects of computing and communications technology, including networks and the architectures and conventions that organize and manage them; the services and standards that unite a network of communications devices, sensors, computers, and databases into a useful information infrastructure; and the applications that rely on that infrastructure.

One common thread among the steering committee's findings is that some of the most severe technical challenges stem from the sheer scale of requirements that must be met.

Scale in this context has many dimensions: the large number of people and devices (e.g., computers, databases, sensors) involved; the diversity of information resources and software applications that must be accessible; the amount of computing power needed to run models and process information quickly enough to meet the urgent demands of crises, along with the ability to access and use that power rapidly and easily; and the complexity of the interactions among systems (both human and technical) that must interwork to deal with crises.

Another theme is that technologies must be easy enough to use that they complement people, rather than distract them from their mission. Technology does nothing by itself; people use technology, and designers and developers of technical systems must consider people and their methods of working as integral to the systems and their successful performance. For example, a secure, distributed information system may fail to remain secure in practice if it is so cumbersome that users ignore necessary procedures in favor of convenience. Too often, unfortunately, users are given too little consideration in the design of complex systems, and the systems consequently fail to be as useful as they could or should be. In the extreme case of a crisis, a system that is difficult to use will not be used at all.

Research on and development of computing and communications technologies that help crisis managers cope with extreme time pressures and the unpredictability of crises will likely be useful in other application domains.1 For example, breakthroughs in meeting the time-critical information discovery and integration requirements of crisis management would benefit broader digital library applications as well. Distributed simulation and the need to compose existing, legacy information sources and software smoothly into new, case-specific systems are among the overlaps with manufacturing. Secure, mobile communication in a crisis is also valuable for emergency medicine, particularly as confidential medical records begin to be communicated over networks. Tools that are easy to use in a crisis will probably also be usable for electronic commerce, which similarly must span a wide range of personal skills, computer platforms, and network access mechanisms.

Although many of the research issues identified throughout the workshop series are not new to the computing and communications research community, placing them in the context of crisis management and other national-scale applications suggests priorities and sharpens the research focus. The priorities fall across a spectrum. Research projects tied relatively closely to specific crisis management application needs are valuable both because of the potential benefit to the applications and for the advances they may produce in technology usable in other areas. Box 3.2 presents promising examples from the workshops.

To secure the full benefits of this application-specific research, there must also be recognition of the broader, increasingly interconnected context in which national-scale applications operate. These interconnections allow components to be called on in unforeseen ways. This presents powerful opportunities for creative new uses of resources, but only if technical challenges to these novel uses can be overcome.

BOX 3.2 Selected Crisis Management Application-Specific Research

The discussions between crisis management experts and technologists at the three Computer Science and Telecommunications Board workshops led to identification of a variety of compelling, application-motivated computer science and engineering research topics. A selection of these topics is presented here. It is not an exhaustive list of the technologies needed to solve problems in crisis management, nor does it imply that technological advances are crisis management's most dire needs. However, these topics do appear promising in terms of advancing the state of technology and testing broader architectural concepts.

A self-configuring wireless data network for linking crisis field stations could create a capability that does not exist today for crisis managers to coordinate information. It could produce advances useful in other domains, such as hospital or school networking, and could provide useful tests for new communications protocols and network management methods based on, for example, self-identification by components and resources on networks to other components that call upon them.

Adaptive networks could be developed that discover and react to changes in available communications links and changing patterns of traffic. They could, for example, route traffic around points of network failure and take advantage of links that become available after a crisis begins, such as the self-configuring wireless network described above or parts of the public switched telecommunications network. This research could stimulate more general work on adaptivity in communications hardware and network management software in contexts other than crises.

In crisis management (as well as in military command and control), there is a need for research on "judgment support" tools that can assist crisis managers in making decisions based on incomplete knowledge of conditions, capabilities, and needs, by drawing on information from diverse and unanticipated sources. Such tools would interpret information about the quality and reliability of varied inputs and assist the user in taking these variations into account. They would differ from, but could build upon, traditional decision support tools such as knowledge-based systems, which operate using rules associated with previously examined problems. Because many of the problems raised by crisis management are not known ahead of time, more general techniques are needed. These might include the development of new representations of the quality of inputs (such as meta-data about those inputs), data fusion, models of information representation and integration, rapidly reconfigurable knowledge-based systems, and new techniques for filtering and presenting information to users.

A meta-computer resource would provide rapid allocation of distributed computing resources outside a crisis zone to meet crisis management needs for modeling and simulation. Because crises are intermittent and unpredictable, but require real-time urgent response, this research would highlight ways to bring computers, storage, and communications on line rapidly. It would also require new ways to coordinate resources across infrastructure that was not previously set aside for that purpose.

Crisis management can motivate simulation research in hurricane landfall prediction, severe storm prediction, atmospheric dispersion of toxic substances, urban fire spread, and many other modeling and simulation problems. To be useful in actual crisis response, these simulations must be linked with inputs such as environmental sensor data in real time and must be able to produce outputs in readily usable form. They should also support a capability to focus the simulation on specific locations and scales in response to requests from crisis managers. In addition, better simulation of human behavior and social phenomena could provide more realistic training and decision support for crisis managers by indicating consequences of decisions on public opinion, international tensions, financial markets, and other areas.

Multimedia fusion of data coming from varied, unexpected sources will be necessary, including imagery (e.g., still photographs and video from amateur citizens with video cameras, news helicopters, and automated teller machine security cameras), sensor data (e.g., weather, seismology, structural stress gauges in bridges and buildings), and information from databases (e.g., locations of buildings and roads from a geographic information system (GIS), names of residents and businesses from telephone directories). The ability to integrate information in response to unanticipated queries could be facilitated by automated tagging of data with relevant meta-data (e.g., source, time, image resolution); a sketch of such tagging follows this box.

There is a need for virtual anchor desks in crisis management, with a mixture of people and machines and adaptive augmentation as necessary. Anchor desks for specific functional or subject areas (e.g., logistics, weather forecasting, handling of toxic substances) would provide a resource of computational power, information, and expertise (both human and machine) to crisis officials. The anchor desk capability could be distributed throughout the nation and assembled on call via networks in response to crises. Research issues related to this effort might include new models of information needs within organizations and optimizing the balance of computation, communications, and storage inside and outside the crisis zone.

Adaptive interfaces for crisis managers could respond to changing user performance under stress by observing usage patterns and testing for user errors. Crisis management tools could adapt to the performance of the user—for example, by presenting information more slowly and clearly and, if necessary, warning the user that he or she needs rest. Such tools ideally would operate on platforms across the wide scale of computing and display capabilities (e.g., processing, storage, and bandwidth) available in crisis management.

Geographic information systems that are more capable than current ones could integrate many data formats (which would promote competition among information providers) and many types of information (such as pictures and real-time links to sensors). Integration should include registering and incorporating these data types fully into the coordinate representation and relational model of the GIS, not just appending markers to points on maps. The GIS should more naturally display crucial factors that are not currently shown clearly, such as uncertainty of data points. Finally, it should be affordable and usable for field workers with laptop computers.
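The automated meta-data tagging mentioned in Box 3.2 can be made concrete with a short sketch. The following Python fragment is illustrative only (the report prescribes no data model, and every field and source name here is an assumption), but it shows how tagging items with source, kind, and time at the point of ingest lets later fusion queries cut across payload formats:

    import time
    from dataclasses import dataclass, field
    from typing import Any

    @dataclass
    class TaggedItem:
        """A data item wrapped with the meta-data needed for later fusion."""
        payload: Any                   # image bytes, sensor reading, DB record, ...
        source: str                    # e.g., "news-helicopter-video" (hypothetical)
        kind: str                      # "imagery", "sensor", or "database"
        timestamp: float = field(default_factory=time.time)
        attributes: dict = field(default_factory=dict)  # e.g., {"resolution": "640x480"}

    def ingest(payload, source, kind, **attributes):
        """Tag incoming data automatically at the point of capture."""
        return TaggedItem(payload, source, kind, attributes=attributes)

    def query(items, kind=None, since=None):
        """Answer an unanticipated query by filtering on meta-data,
        without needing to understand each payload format."""
        return [i for i in items
                if (kind is None or i.kind == kind)
                and (since is None or i.timestamp >= since)]

    # Example: fuse recent imagery from any source, regardless of producer.
    store = [
        ingest(b"...jpeg...", "atm-security-camera-3", "imagery", resolution="320x240"),
        ingest({"wind_mph": 92}, "weather-station-KMIA", "sensor"),
    ]
    recent_imagery = query(store, kind="imagery")

The point of the design is that the tags, not the payloads, carry the burden of integration: a query planner can select and order inputs it has never seen before as long as they arrive tagged.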


During Hurricane Andrew, for example, it was not only the difficulty of translating between different standards that delayed Dade County authorities from making data available to federal relief officials, but also their hesitancy to share private data, which relates to the lack of reliable, in-place mechanisms for ensuring privacy and payment for those data. Therefore, applications require both efforts focused on specific needs and a broadly deployed information infrastructure, including services that help people and their tools to achieve integration, and standards and architectures that provide consistent interactions between elements at all levels.

Information infrastructure, of course, does not spring into existence from a vacuum. The workshops reinforced the observation that in crisis management and other national-scale applications, diversity—of people, organizations, methods of working, and technologies (e.g., databases, computers, software)—impedes creating national architectures from scratch. (See Box S.2 in the chapter "Summary and Overview" for further discussion.) Although it might be possible to imagine a single, uniform architecture that met crisis managers' needs for communications interoperability, data interchange, remote access to computation, and others, deploying it would not be practicable. The technical challenge of incorporating legacy systems into the new architecture would slow such an effort. In addition, many public and private organizations would have to agree to invest in new technologies in concert, but no single architecture could conform to all organizations' needs and work modes. Retraining and reorganizing organizations' processes to accommodate new systems would take time. Finally, crisis management illustrates that even a coherent architecture created for one domain would be called upon to integrate in unexpected ways with other domains.

Therefore, there is a need for—and the steering committee's findings address—research, development, and deployment efforts leading both to consistent architectural approaches that work on national scales and to general-purpose tools and services that make ad hoc integration of existing and new systems and resources easier. Specific applications, such as those listed in Box 3.2, should serve to test these approaches, to advance key technologies, and to meet important application needs.

The organization of the findings reflects this view that both application-targeted and broader infrastructural research is needed. Finding 1 emphasizes the importance of experimental testbeds as a context for coordinating the crucial interplay among research, development, and deployment in one important and challenging application area, crisis management. Finding 2 highlights the value of investigating the features of existing national-scale architectures to identify principles underlying their successes and failures. These findings are discussed in the section "Technology Deployment and Research Progress."

The remaining findings identify architectural concerns that represent technological leverage points for computing and communications research investments, the outcomes of which could benefit many national-scale applications. The research underlying these findings is discussed in greater detail in Chapter 2.

The findings abstract the common threads among the networking, computation, information, and user-centered technologies of Chapter 2 to indicate high-priority application-motivated research needs that cross multiple levels of computing and communications. There is necessarily some overlap in the research issues discussed in these areas, because some technological approaches can contribute to meeting more than one architectural objective. These findings are presented in four subsequent sections:

Support of Human Activities
   Finding 3: Usability
   Finding 4: Collaboration

System Composability and Interoperability
   Finding 5: Focused Standards
   Finding 6: Interoperability
   Finding 7: Integration of Software Components
   Finding 8: Legacy and Longevity

Adapting to Uncertainty and Change
   Finding 9: Adaptivity
   Finding 10: Reliability

Performance of Distributed Systems
   Finding 11: Performance of Distributed Systems

Outcomes of testbed and architecture study activities (see Findings 1 and 2) can and should inform future reexamination of findings in these architectural areas, which represent the best understanding of a range of technology and application experts in 1995-1996.

The findings frame research derived primarily from addressing the requirements of crisis management. However, the steering committee believes that such research would have much broader utility, because of the extreme nature of the demands that crises place on technology. In addition, many of the research directions relate to increasing the capabilities of information infrastructure to meet extreme demands for ease of use, integration, flexibility, and distributed performance, which will benefit any application using it. The findings are illustrated by practical needs identified in the workshops and examples of specific directions that researchers could pursue. These suggestions are not intended to be exhaustive, nor are they presented in priority order; deployment and experimentation are required to determine which approaches work best. However, they are promising starting directions, and they illustrate the value of studying applications as a source of research goals.

TECHNOLOGY DEPLOYMENT AND RESEARCH PROGRESS

The workshop series focused on applications partly in the recognition that computing and communications research and development processes depend on the deployment and use of the technology they create. This is true not only in the sense that efficient allocation of research investments should lead ideally to products and services that people want, but also in the sense that it is ultimately through deployment and use that technologists can test the validity of their theories and technical approaches. This is not a unique recognition; it fits within a stream of recent analyses, including a strategic implementation plan of the Committee on Information and Communications, America in the Age of Information (CIC, 1995a), and the Computer Science and Telecommunications Board review of the High Performance Computing and Communications Initiative (HPCCI), Evolving the High Performance Computing and Communications Initiative to Support the Nation's Information Infrastructure (CSTB, 1995a). The opportunities that the steering committee's first two findings identify for learning from study of deployed technologies, however, have not received extensive attention to date.

BOX 3.3 Crisis Management Testbeds: Relationship to Previous Visions

The Committee on Information and Communications (CIC) has called for "pilot implementations, applications testbeds, and demonstrations, presenting opportunities to test and improve new underlying information and communications technologies, including services for information infrastructures" (CIC, 1995a, p. 8). The CIC plan anticipates these testbeds and demonstrations as fitting into three broad classes of user-driven applications related to long-term National Science and Technology Council goals:

   High-performance applications for science and engineering (modeling and simulation, focused on the Grand Challenges);
   High-confidence applications for dynamic enterprises (security, reliability, and systems integration issues); and
   High-capability applications for the individual (education, digital libraries, in-home medical information).

The crisis management application area bridges all three of these classes. A testbed or technology demonstration that supported distributed training and planning exercises in crisis response, for example, could incorporate modeling of natural and man-made phenomena in real time; secure and reliable communications; interoperability and integration of existing information resources, such as public and commercial databases; and adaptive, user-centered interfaces.

Evolving the High Performance Computing and Communications Initiative to Support the Nation's Information Infrastructure, a comprehensive review of the High Performance Computing and Communications Initiative conducted by the Computer Science and Telecommunications Board (CSTB, 1995a), also supported the notion of emphasizing nationally important applications (what the initiative called National Challenges) to test and guide the development of basic infrastructure:

Recommendation 8: Ensure that research programs focusing on the National Challenges contribute to the development of information infrastructure technologies as well as to the development of new applications and paradigms. The National Challenges incorporate socially significant problems of national importance that can also drive the development of information infrastructure. Hardware and software researchers should play a major role in these projects to facilitate progress and to improve the communication with researchers developing basic technologies for the information infrastructure. Awards to address the National Challenges should reflect the importance of the area as well as the research team's strength in both the applications and the underlying technologies.

The dual emphasis recommended by the steering committee contrasts with the narrower focus on scientific results that has driven many of the Grand Challenge projects.1 Testbeds for computing and communications technologies to aid crisis management would support the dual focus on applications and infrastructure by emphasizing the participation of crisis managers and technology experts in limited-scale deployments for training, planning, and to the extent practical, operational missions.

1 The 1995 CSTB report concluded that the National Challenges defined in the High Performance Computing and Communications Initiative were too broad to offer specific targets for large-scale research, and therefore, "the notion of establishing testbeds for a complete national challenge is premature. Instead, research funding agencies should regard the National Challenges as general areas from which to select specific projects for limited-scale testbeds. . . ." (CSTB, 1995a, p. 59). The crisis management testbed described by Finding 1 of this current report should be understood as an intermediate step—something larger than a single research project as the 1995 CSTB report implied, but not a complete, nationwide crisis management system.

Finding 1: Crisis Management Testbeds

Testbeds and other experimental technology deployments can enable researchers and technologists to develop and test technologies cooperatively in a realistic application context (see Box 3.3). They can serve as a demanding implementation environment for new technologies and sources of feedback to identify and refine research objectives. Such testing is particularly important in progressing toward deploying national-scale applications, in order to verify theoretical concepts about the scalability of system characteristics, interoperability with other systems, and usability by people in realistic situations—all of which are difficult or impossible to predict in the laboratory. Test projects and technology demonstrations are under way in most national-

the medium may outlast the hardware or software for accessing the information on it, leaving the information inaccessible. This problem is particularly critical for national-scale applications because these applications and the data supporting them should not be bound to particular software components, computer platforms, data formats, and other technological artifacts that will be outlived by the specific information being managed. Otherwise, application users in the future will be unable to optimize specific technology decisions to meet their needs because they will be shackled by a legacy of old information objects and software. The constraints placed on current technical options by the need to maintain access to technologies developed in the past are the essence of the technological hand-from-the-grave influence that currently restrains the evolution of many large, complex systems, such as the nation's air traffic control systems. An approach to the management of information objects and systems architectures that is based on sound general principles can prevent such constraints in the future.

There are three directions in which further research is needed to address problems of longevity: (1) naming and addressing, (2) resource discovery, and (3) support for evolution.

With respect to naming and addressing, a key problem is that information and other resources and services are mobile, and over long periods of time anything that survives is extremely likely to move. For example, network hosts disappear or move to different locations, file systems are reorganized, and whole institutions split, merge, or move. As a result, the situation with respect to URLs, which identify the location of resources in the World Wide Web, is unstable. URLs contain not only the location (including both host name and path name within a host), but also the access method or protocol. Although the widespread deployment of the Web is only a few years old, many URLs have already become obsolete, often providing no recourse to discover whether the information sought has moved elsewhere or is simply unavailable.

One significant direction for improvement, whose requirements were recently defined within the Internet Engineering Task Force, is to separate naming from addressing.14 This would involve the definition of Uniform Resource Names (URNs), a new type of name intended to be long-lived, globally unique, and independent of either location or access method. These, in turn, are translated (resolved) into URLs as necessary, but it is the URNs that should be embedded in objects for long-term storage, enabling future identification and use. There is still significant work to be done in this domain, because the problems of how to do name-to-location resolution have not been solved. This undertaking is larger in scale by orders of magnitude than the host-name resolution provided by the Internet's Domain Name Service, which is probably inadequate to handle the degree of volatility and mobility needed for URNs because information probably can move much more frequently than hosts. A follow-on problem is that even if a service arises that scales and handles the rate of updates more effectively, in the long run it may well fail or be replaced.

Research at the Massachusetts Institute of Technology (MIT; the Information Mesh project) is attempting to address problems of allowing for both a multiplicity of resolution services and an architecture that provides fallback mechanisms, so that if one path to finding resolution fails, another may succeed; this is all very preliminary, however, and more research is needed.
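To make the naming-versus-addressing distinction concrete, the following sketch separates a long-lived URN from the URLs it resolves to and tries multiple resolution services with fallback, in the spirit of the multiplicity-of-services architecture described above. It is a minimal illustration, not the IETF URN specification; the resolver classes and the example URN are invented for this sketch:

    # A URN is a long-lived, location-independent name; a URL binds a
    # location and access method. Resolution maps the former to the latter,
    # with fallback across services so one failure does not strand the name.

    class ResolutionError(Exception):
        pass

    class TableResolver:
        """One resolution service: an updatable URN -> URL mapping."""
        def __init__(self, table):
            self.table = table
        def resolve(self, urn):
            if urn not in self.table:
                raise ResolutionError(urn)
            return self.table[urn]

    def resolve_with_fallback(urn, resolvers):
        """Try each resolution path in turn; succeed if any path succeeds."""
        for r in resolvers:
            try:
                return r.resolve(urn)
            except ResolutionError:
                continue
        raise ResolutionError(f"no resolver knows {urn}")

    # The stored document embeds only the URN; the URL is looked up at use
    # time, so the document survives host renamings and reorganizations.
    primary = TableResolver({"urn:example:flood-map-1994": "http://host-a/maps/1994.gif"})
    mirror  = TableResolver({"urn:example:flood-map-1994": "http://host-b/archive/1994.gif"})
    url = resolve_with_fallback("urn:example:flood-map-1994", [primary, mirror])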

The second part of the solution is to help users find resources. URNs are not intended to be user friendly, but rather computer friendly. Because they should be globally unique, they are unlikely to be mnemonic or to fit into the various naming schemes that suit human preferences. For this, additional resource discovery services are needed, such as keyword searching and similarity checking. There are some significant early efforts in this direction,15 but there continues to be a need for more sophisticated searching tools, especially as people with less computer experience become frequent users. It is difficult to build a local naming and search tool that is tuned to particular application domains or to private use. All too frequently these services point to dead ends, such as outdated URLs; the services should be better able to weed out bad data. In a crisis, if a search engine overwhelms the user with an indistinguishable mix of good and bad information, the overall result may be useless.

A third area, discussed further in Finding 9 ("Adaptivity"), relates to the ability of information and other resources to evolve. Although it is desirable for new capabilities and technologies to be employed within equipment and services (e.g., to use new, enhanced interfaces), the evolution must be smooth and easy for people and their applications to adapt to, or else the new capabilities may not be used. Application designers cannot know in advance all possible directions for evolution of useful resources, and so to support evolution, applications and infrastructures should be designed to enable applications to learn about and utilize new and evolving resources. The specific research directions implied by this need are discussed in Finding 9.

8. Technological and architectural methods should be developed for reconciling the need to maintain access to long-lived information and software assets with the need to enable application users to exploit new technologies as they become available. Such methods should be applied both at the middleware level and in the architectural design of national-scale applications.

Suggested Research Topics:

Research is necessary to specify the minimal component services in an information infrastructure that allow for identifying, finding, and accessing resources, and to develop protocols for service definitions that are both minimal in terms of needs and extensible to allow for improved service. Some specific examples following the library analogy are services to help people determine which objects and resources they want (a service like that of a librarian who suggests books), the registration of individual resources (e.g., Library of Congress catalog numbers), the location service (e.g., a catalog), and mechanisms for user authentication and access control policies (e.g., placing books on reserve for students registered in a particular class). Mechanisms to implement these services require, in particular, ways to manage information about how to interpret typed information objects (ranging from documents to data in databases and software components) at the network level.
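The library analogy above can be sketched as a set of minimal, separable services. The interfaces below are hypothetical (the report defines no concrete protocol); they simply illustrate registration, location, and access control as distinct components that could each be specified minimally and extended later:

    # A sketch of minimal infrastructure services following the library
    # analogy in the text: registration (catalog numbers), location (a
    # catalog), and access control (books on reserve). All identifiers,
    # class names, and the example resource are illustrative assumptions.

    class Registry:
        """Registration service: assigns stable identifiers to resources."""
        def __init__(self):
            self._next, self._records = 1, {}
        def register(self, description):
            rid = f"res:{self._next}"
            self._next += 1
            self._records[rid] = description
            return rid

    class Catalog:
        """Location service: maps identifiers to current locations."""
        def __init__(self):
            self._where = {}
        def place(self, rid, location):
            self._where[rid] = location
        def locate(self, rid):
            return self._where.get(rid)

    class AccessControl:
        """Access policy service: records who may use which resource."""
        def __init__(self):
            self._allowed = {}
        def reserve(self, rid, users):
            self._allowed[rid] = set(users)
        def permitted(self, rid, user):
            return user in self._allowed.get(rid, set())

    registry, catalog, acl = Registry(), Catalog(), AccessControl()
    rid = registry.register("GIS layer: county road network")
    catalog.place(rid, "gis-server-12.example.gov")
    acl.reserve(rid, {"relief-analyst-1"})
    assert acl.permitted(rid, "relief-analyst-1") and catalog.locate(rid)

Keeping the three services separate is what makes the definitions minimal and extensible: a richer catalog or a stricter policy engine can replace one component without disturbing the others.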

ADAPTING TO UNCERTAINTY AND CHANGE

A crucial problem faced by all national-scale application areas, but particularly crisis management, is that of dealing effectively with uncertainty in three areas: infrastructure (e.g., networks and network-based services such as naming and addressing), components of integrated solutions, and the nature and behavior of potentially useful resources. Uncertainty and change are involved in all of these areas. In a crisis, changes can produce uncertainty on a scale of minutes: Are the telephone lines in the disaster area down? How soon will they be restored? Change on a longer time scale can also produce uncertainty: Can a firm adapt its new computer system to work with its old databases? These problems highlight the need for systematic, architectural solutions to the problems of adaptivity and reliability. Progress in these specific areas will benefit any application domain that is sensitive to factors such as human errors, overloading of resources, and other unpredictable situations. Indeed, as all application domains grow in scale, these conditions will become more common.

Finding 9: Adaptivity

During and after a crisis, it is critically important that network services and resources be available. This need implies an adaptivity to unusual or extenuating circumstances beyond traditional network operational criteria. Other national-scale application areas could also benefit from increased adaptivity, for several reasons: sharing of network-based resources implies significant fluctuations in demand for and availability of those resources; human errors and system failures are inevitable; and new applications and unusual uses of existing applications can generate entirely unanticipated circumstances. Network-based systems (e.g., communications systems, computer networks, and sensor networks) should be prepared not only to route around points of congestion or failure, but also to adapt to changing availability of resources. Methods for achieving this adaptivity in a crisis are likely to be broadly useful in many domains.

Crisis management demonstrates a number of specific ways in which adaptivity is critical to system design. At the network level, for example, if the local, preexisting network infrastructure is at least partially operational, it may be valuable to integrate it with components brought by the crisis management team. This could involve attaching portable computers preloaded with crisis-related data and software into existing local area networks or connecting predeployed sensors, such as security cameras, into a network deployed for the crisis response. In practice, identifying and making use of the existing infrastructure are difficult; consequently, relief workers frequently arrive with an entirely separate system whose parts and operation they understand. Yet this approach does not eliminate their problems, because in many cases, multiple organizations arrive, each with its own equipment, networks, and policies for using them (such as access priority and security), making effective integration of all available resources difficult. Adaptivity in this case may reflect the ability to rapidly implement compromise positions where resources owned or controlled by different parties are integrated with agreements about policies for shared use.

Applications that run in uncertain environments also should be designed for adaptivity. For example, if network service is available only intermittently, applications such as shared information repositories and collaboration tools should be prepared to adapt to varying network resources. They should also be reconfigurable or able to configure themselves to take advantage of new or evolving resources. For example, information from a crisis command center might be sent to field workers as maps and diagrams when sufficient bandwidth is available, but as text when the bandwidth is reduced. Multiple, distributed copies of databases could be designed to replicate updates to each other (maintaining overall coherence) only when bandwidth is available or to restrict updates only to the highest-priority information, such as locations of people needing medical attention. During a videoconference, if congestion occurs, a shift to a lower image resolution could enable the conference to continue. An attractive feature in such circumstances would be support for choice by the users between reduced resolution and fewer frames per second as appropriate to their needs.16

A different kind of example is the application that can adapt to changes in the availability of information inputs. Crisis managers must make judgments in the absence of complete data. Judgment support applications (e.g., building damage simulations, logistics planners to estimate emergency supply requirements, map-based evacuation route planners) must adapt not only to statistical uncertainty, but also to gaps, mistakes, and deliberate falsifications in their input data. This requires much more than simplistic interpolation of missing data; it demands an ability to make inferences about what the correct data are likely to be.
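The degradation strategies just described (maps versus text, priority-only replication) can be expressed as a small policy layer. The sketch below is an illustration under stated assumptions; the bandwidth thresholds and priority labels are invented, not drawn from the report:

    # A sketch of bandwidth-adaptive behavior: send the richest
    # representation the link can carry, and under severe constraint
    # replicate only the highest-priority updates. Thresholds are
    # illustrative assumptions only.

    def choose_representation(bandwidth_kbps):
        """Pick the richest representation the current link can carry."""
        if bandwidth_kbps >= 512:
            return "map+diagram"
        if bandwidth_kbps >= 56:
            return "compressed map"
        return "text summary"

    def updates_to_replicate(updates, bandwidth_kbps):
        """Replicate everything when the link is healthy; otherwise only
        critical items (e.g., locations of people needing medical care)."""
        if bandwidth_kbps >= 128:
            return updates
        return [u for u in updates if u["priority"] == "critical"]

    link = 48  # kbps, as might be measured on a degraded field link
    print(choose_representation(link))  # -> "text summary"
    queue = [{"priority": "critical", "msg": "casualty at 5th and Main"},
             {"priority": "routine",  "msg": "shelter inventory update"}]
    print(updates_to_replicate(queue, link))  # only the critical item

The design choice worth noting is that the application, not the network, decides what to shed: the network reports conditions, and the application maps them onto mission priorities.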

Applications also need to evolve and adapt to changes on a longer scale. For example, if simulation-based training programs are designed to train people by providing accurate maps and images of possible crisis locations, adaptivity should enable incorporation of new, better sources for that information over time. Originally, the simulation may use line drawings with altitude designations, later incorporating information from aerial photographs and weather prediction systems.

In crises, it would be especially valuable for applications to discover and exploit automatically, without the need for time- and attention-consuming human intervention, the capabilities of resources whose usefulness could not have been anticipated when the application was written. These might include new objects or services with enhanced functionalities that did not exist when the application was written (e.g., new kinds of environmental sensors); legacy resources that have existed for a long time, but have a structure or form that the application designer did not anticipate having to access (e.g., records of earthquake damage patterns from past years); and resources created for use in a different application area (e.g., architectural designs used to plan evacuation routes during a crisis).

To enable successful use of unanticipated resources in all these cases, continued research should address the question of how applications might learn about and make use of such objects. This problem has two parts. First, the application must be able to learn about the functionality of the new resource, which can be expressed in its type. To find the type of the new resource, the application must be able to ask the resource itself or some other service to identify the type of the resource. Both CORBA and the Information Mesh project at MIT make a first cut at this by requiring that all objects (resources) support a function to answer such a query, if asked. Second, the application may have to import code to access the new type of resource. The importation of code at run time generally is possible only in programming environments that support interpreters, such as the Lisp programming language and its derivatives or Java; importing code at run time to interface into other languages such as C or C++ generally is not feasible. Thus, the problem of utilizing resources of unanticipated types can be split into two research directions, one directed toward protocols for querying objects and the services to support that activity, and the other advancing work in language, compiler, and runtime technologies.
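The two parts just identified, querying a resource for its type and importing handling code at run time, can be illustrated in an interpreted language. The sketch below uses Python's standard importlib for run-time importation; the resource type and handler module names are hypothetical, and a real system would fetch verified code over the network rather than from a local module:

    # A sketch of the two-part problem described above: (1) ask an
    # unfamiliar resource to identify its own type, and (2) import
    # handling code for that type at run time.

    import importlib

    class Resource:
        """Any network resource that can answer a type query about itself."""
        def __init__(self, type_name, data):
            self.type_name, self.data = type_name, data
        def query_type(self):
            return self.type_name   # e.g., "seismic_sensor_v2" (hypothetical)

    def handler_for(resource):
        """Learn the resource's type, then load its handler dynamically.
        Assumes a module named handlers.<type_name> can be located;
        the naming convention is an assumption of this sketch."""
        type_name = resource.query_type()
        module = importlib.import_module(f"handlers.{type_name}")
        return module.handle   # by convention, each handler exports handle()

    # Usage: an application written before "seismic_sensor_v2" existed
    # could still use it, provided a handler can be located and imported.
    sensor = Resource("seismic_sensor_v2", data={"peak_g": 0.31})
    # handle = handler_for(sensor); handle(sensor.data)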

9. Research is needed to increase the adaptivity of networks and applications by providing them with the tools, resources, and facility to function during and after unusual or extenuating circumstances. National-scale applications, especially those supporting crisis management, must be able to function in an environment marked by variability of available resources and of requirements for resources.

Suggested Research Topics:

Self-organizing networks are those in which the components and resources of the network can discover and learn about each other without the need for a centralized management structure. Self-organizing networks will have less need for human intervention than is otherwise required. There is both theoretical and practical research to be done, ranging from whether such networks can stabilize themselves, to the protocols by which components learn about each other and the specific kinds of information that components must share to enable self-organization. Supporting mobile users and resources is a particular challenge, since the network must be able to reorganize continually. Mobile IP is one way of accounting for mobility by forwarding packets to the user's current location (similar to roaming in cellular telephone systems). However, it introduces latency that is often unacceptable for real-time communications such as voice and video (Katz, 1995).

Improvements in network management are needed, including tools for discovering the state of existing infrastructure17 and extensions to current models of network capabilities to reflect such aspects as reliability, availability, security, throughput, connectivity, and configurability. These could enable management tools with new paradigms of merging the access, priority, and security parameters of networks that interconnect with each other during crises in unanticipated ways. One approach might be to develop a priority server that could administer access rights flexibly within a network as users and their needs change during a crisis (sketched after this list of topics).

Methods are needed for reconciling network adaptivity with minimizing vulnerabilities to intruders and other threats. Legitimate actions taken by adaptive self-organizing networks to conform to changes in the available infrastructure may in some cases be difficult for network managers to distinguish from hostile infiltration by an intruder. Significant challenges exist in making secure, adaptive networks that recognize self and do not launch "autoimmune" attacks. Artificial intelligence methods in network management may be a fruitful area for research to meet this need. Security should adapt to the mobility of people and changing configurations of networks. For example, how can federal officials arrive in California after an earthquake and provide valid identification recognized by the network without requiring that the infrastructure assign everyone new identities and passwords? How do those officials access useful files from their home offices while in some other security domain? How do the secure domains decide they can trust each other? Research is needed to support composition of security policies across administrative domains and mobility of access rights.

Crisis managers have a clear need for better tools for discovering what network-accessible resources are available to them in time of crisis. More powerful search and retrieval mechanisms than keyword matching are necessary, as are solutions that allow searching within an unanticipated application domain.

Rapidly configurable virtual subnets are required that span multiple underlying network resources but provide services such as privacy and access control, as though users were isolated on a private network. Research is needed both to develop the actual protocols necessary to create functional virtual subnets and to provide a clearer understanding of how well virtual subnets can be isolated from broader network environments to support features such as security, access control, reliability, and bandwidth on demand.

Application component interface specification and exploration protocols are needed to enable applications to interact with evolving or new resources. There has been some research into interface specifications, but uniformity is lacking. To provide application adaptivity that works at a national scale, either one architecture must be selected (which is unlikely) or protocols must be written to allow negotiation between applications and services of the interface specification language and support tools to be used in any particular case. For example, new protocols would be needed to allow an application that accesses both CORBA objects and OLE objects to discover from objects which kind they are and then use the appropriate model to query the object or resource about its capabilities.
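One of the topics above, the priority server, lends itself to a brief sketch. The admission and preemption rules below are invented for illustration; a real design would also need authentication and policy negotiation across the organizations involved:

    # A minimal sketch of the priority-server idea from the research topics
    # above: administer access rights flexibly as users and their needs
    # change during a crisis. Priority levels and capacity are illustrative.

    class PriorityServer:
        """Grants or revokes priority access to a constrained network."""
        def __init__(self, capacity):
            self.capacity = capacity   # concurrent priority slots
            self.grants = {}           # user -> priority level

        def request(self, user, level):
            """Admit the user if a slot is free, or preempt a lower-priority
            holder; crises change who matters, so no grant is permanent."""
            if len(self.grants) < self.capacity:
                self.grants[user] = level
                return True
            lowest = min(self.grants, key=self.grants.get)
            if self.grants[lowest] < level:
                del self.grants[lowest]   # preempt for the more urgent user
                self.grants[user] = level
                return True
            return False

        def release(self, user):
            self.grants.pop(user, None)

    server = PriorityServer(capacity=2)
    server.request("logistics-team", level=1)
    server.request("public-info", level=1)
    server.request("search-and-rescue", level=3)  # preempts a level-1 holder
    assert "search-and-rescue" in server.grants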

Finding 10: Reliability

The utility of an application or application component often depends on an assessment of its reliability. Maximum reliability is not always necessary; what the user requires is to understand the degree of reliability, to determine whether or not it is within acceptable tolerances, and to decide appropriate actions. In managing a crisis, for example, decision makers must constantly judge the accuracy of the information they are using in making decisions. (They do not necessarily ignore questionable information, but they weigh it differently than more certain information.) Aircraft manufacturers assess the reliability of a subcontractor's part design before incorporating it into an airplane design. Health care workers assess the probable correctness of each item of data about a patient before making a diagnosis or taking action.

The quality of inputs, the predictability of events, the validity of simulations, the correct functioning of large-scale applications, and similar factors underlie the quality of information yielded by computer and network applications. These must be understood for people to rely on information and computation technologies in national-scale applications. To facilitate these assessments for computing and communications systems on which the nation increasingly depends, reliability attributes of system components need to be formalized and exposed whenever possible. This will require research. For example, a crisis response application constructed dynamically from disparate parts must continually predict and assess the reliability of each of its parts. Some of the parts, such as remote computing facilities running a well-tested modeling program, may be assumed by the crisis application to be highly reliable with known probabilities of correctness and measures of precision. More typically, however, many of the components contributing to a crisis management solution do not have such known attributes. This is particularly true if people are part of the system or if untested, previously unintegrated subsystems are used. Furthermore, the nature of the crisis may change a reliable system into an unreliable one through unanticipated scaling problems. Therefore, an important unmet application need is the ability to develop confidence factors based on the reliability of parts of a system.

Assessment of confidence factors can complement other approaches to improving reliability. Many application areas, such as manufacturing, use design and testing processes and redundant subsystems to achieve reliability goals. Adaptive systems, such as those discussed in Finding 9, represent another set of approaches to achieving reliability. Some components of an application solution, however—particularly those involving people—do not have well-defined ways of developing reliability factors. New insights and approaches are needed to improve the reliability of the weak links in a system and, as a separate topic, to capture, quantify, and communicate the reliability status (whether strong or weak) of each component. The latter topic is particularly important in national-scale applications, which have high public visibility and must provide the public with a high level of confidence that they function correctly and, when they do not, that the problem can be identified and corrected quickly. When an airplane crashes, investigators retrieve the "black box" and analyze recorded data to determine what may have caused the crash, so that steps can be taken to avoid future problems and reestablish public confidence. It would be valuable in national-scale applications to develop a black-box analog (perhaps a set of required procedures) for identifying and correcting errors.

10. Research is needed to enable accurate assessments of the reliability of systems composed of potentially unreliable hardware, software, and people. Consistent methods for evaluating reliability should lead not only to more reliable systems, but also to better ways of using systems in applications, such as crisis management, where absolute reliability is unattainable but reliability factors might be assessable. The ultimate goal of these efforts is to develop measures of confidence in the behavior of systems that support national-scale applications.

Suggested Research Topics:

A black box technology should be developed for national-scale applications, analogous to that in aircraft, that enables the rapid identification and correction of errors, coupled with procedures for responding to problems that ensure continuing confidence in the viability of the application.

Basic and applied research in chaotic processes is needed to better understand the reliability of applications in the presence of poor-quality information (e.g., errors, incompleteness, internal inconsistencies). Research might examine the trade-offs between urgency and fidelity of information collection in crises and methods for validating and reconciling poor-quality information. To adapt to errors, whatever the source, applications must be robust.

• To adapt to errors, whatever the source, applications must be robust. Applications should be self-adapting and should have self-describing, self-propagating metrics of component and information reliability. These metrics should reflect the implications of having people as an integral part of applications.

• Reliability attributes should be developed and propagated as meta-data associated with system components.
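
The black-box recorder mentioned in the first research topic might begin as something as simple as the following sketch: an append-only log, written to by every component, that investigators can replay after a failure. The file name and event fields are illustrative assumptions, not a specification from this report.

    import json
    import time

    class BlackBox:
        """Append-only event recorder, by analogy with a flight recorder."""

        def __init__(self, path):
            self.path = path

        def record(self, component, event, **details):
            entry = {"t": time.time(), "component": component,
                     "event": event, "details": details}
            # Append-only: entries are never rewritten, so the record of
            # the moments before a failure survives the failure itself.
            with open(self.path, "a") as log:
                log.write(json.dumps(entry) + "\n")

    box = BlackBox("crisis_app.blackbox")
    box.record("routing-service", "degraded", queue_depth=4200)
    box.record("storm-model", "input-rejected", reason="checksum mismatch")
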
PERFORMANCE OF DISTRIBUTED SYSTEMS

As the scale of applications grows, not only in the geographical distance between components but also in the complexity of the interrelationships among components and in their utilization of lower-level resources (e.g., networks, processors, memory, storage), the performance of the systems that support applications must increase if results are to be achieved rapidly enough to be usable. In addition, the performance of the various infrastructural resources must be balanced to produce effective results.

Finding 11: Performance of Distributed Systems

Crisis management presents an especially challenging set of requirements for balanced performance in both computer systems and networks. Because timeliness is nearly always paramount, extraordinary computing power and network bandwidth are required to ensure that results can be delivered soon enough to be relevant. Moreover, there is rarely time in a crisis to tune software performance, and the easier a computer program is to use effectively, the more likely it is to be used in the stress-laden working environment of a crisis.

Since crises are infrequent and seldom predictable as to place and time, establishing dedicated computing and communications resources is economically impractical. Whatever large-scale, high-performance computing and communications capabilities are made available for responding to a crisis will need to be preempted from less urgent work. The potpourri of data needed to help answer queries and supply input for simulations must be marshaled from its many resident locations as quickly as possible, and high-bandwidth networking must be delivered to the scene for the transmission of imagery, including simulation results.

Achieving computer system interoperability, adaptivity, and reliability, especially in connection with a crisis, calls for exceptional computing power and storage capacity. For crisis management, capabilities even beyond those appropriate to ordinary circumstances are required to manage a largely ad hoc and unreliable interconnection of computer systems that were never designed to work together in the first place; the software that makes these deficiencies tolerable adds to the computing burden. The deployment of computations across networks and the use of distributed, possibly heterogeneous computer systems to address single problems are attractive for crisis management and other national-scale applications.
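
Back-of-envelope arithmetic suggests why the high-bandwidth delivery requirement described above must be balanced against computing power; the numbers below are illustrative assumptions, not figures from this report.

    # How long does it take to deliver a set of simulation imagery to the scene?
    IMAGE_SET_BYTES = 500e6   # assumed: 500 MB of imagery

    LINKS_BPS = {
        "T1 (1.544 Mb/s)": 1.544e6,
        "OC-3 (155 Mb/s)": 155e6,
    }

    for name, bps in LINKS_BPS.items():
        seconds = IMAGE_SET_BYTES * 8 / bps
        print("%s: %.1f minutes" % (name, seconds / 60))
    # T1: ~43.2 minutes; OC-3: ~0.4 minutes

A simulation that finishes in five minutes but whose output needs three-quarters of an hour to reach responders over a T1 line is, in effect, a slow simulation; bandwidth, computing, and storage must be provisioned together.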

Increasing the size of many candidate computations to national scale may be impractical because of poor performance. For example, storm and wildfire simulations may perform more poorly as distributed computations than data acquisition and reformatting do. As MIT's Barbara Liskov said, "Everyone knows scalability is important. But no one knows how to show [that] you have it, short of running experiments with huge numbers of machines, which is usually not practical. We need a way to reason about scalability." At every point in the parallel and distributed software design and development cycle, scalability in performance should be treated as a first-class problem.

11. Research is needed to better understand how to reason about, measure, predict, and improve the performance of distributed systems. Crisis management and other national-scale applications demand high-performance systems and tools that balance processing speed, communications bandwidth, and information storage and retrieval.

Suggested Research Topics:

• Current capability to model the performance of systems that are distributed across heterogeneous networks and computing platforms is very limited.18 Predicting the performance of large, distributed software systems is particularly difficult but would be quite valuable in addressing national-scale application needs. Research is needed to identify which parameters of network, processing, and storage components are critical to a system's ability to meet specified performance criteria, such as capacity and responsiveness, and to develop appropriate metrics for these parameters. Research should also include a measurement program to evaluate the ability of models to predict how systems will perform under normal conditions and in crises; these models could be tested, for example, in the context of the crisis management testbeds discussed in Finding 1. (A toy illustration of such a predictive model follows the list.)
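
As a toy illustration of the kind of model this research topic calls for, the sketch below predicts end-to-end response time from a handful of component parameters and names the bottleneck. The additive transfer-plus-compute structure and all parameter values are our assumptions; real systems overlap communication with computation and would need far richer models.

    def response_time(bytes_moved, bandwidth_bps, latency_s, ops, ops_per_s):
        """Predict end-to-end time as network transfer plus computation."""
        transfer = bytes_moved * 8 / bandwidth_bps + latency_s
        compute = ops / ops_per_s
        return transfer + compute, {"transfer": transfer, "compute": compute}

    total, parts = response_time(bytes_moved=50e6,      # 50 MB of input data
                                 bandwidth_bps=10e6,    # 10 Mb/s path
                                 latency_s=0.2,
                                 ops=2e9,               # simulation work
                                 ops_per_s=1e8)
    bottleneck = max(parts, key=parts.get)
    print("predicted %.1f s; bottleneck: %s" % (total, bottleneck))
    # predicted 60.2 s; bottleneck: transfer

The measurement program proposed above would test whether even such simple models track observed behavior, and where they break down.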

NOTES

1. The reverse, however, is not necessarily true; technologies that have been developed for other domains may not meet the needs of crisis management for coping with urgency and unpredictability.

2. The "where" of deployment includes physical as well as conceptual locations, such as a layer or layers in the technical architecture.

3. The CSTB report observed that the gigabit testbeds (experimental research networks supported under the High Performance Computing and Communications Initiative) supported the concept of large-scale networks offering higher performance than current networks. The examination of existing, apparently successful network architectures advocated in the steering committee's Finding 2 should be seen as complementary to work recommended in the 1995 report's conclusion that "ongoing research in several areas is still needed before a ubiquitous high-performance information infrastructure can be developed and deployed nationwide. . . . [S]uccessful evolution of the nation's communications capability rests on continued investment in basic hardware, networking, and software technologies research" (CSTB, 1995a, p. 54).

4. For a discussion of the relationship between the study of deployed systems and the development of new research directions, see CSTB (1989) and CSTB (1992).

5. A different kind of negative effect that people may have on systems occurs when, in hostile situations such as crime or warfare, they attack systems to harm their performance.

6. Research in scientific visualization aims at permitting computational scientists to observe and understand intuitively the effects of variations in models of the phenomena they are studying; see, for example, Hibbard et al. (1994). Extending this sort of visualization into the crisis management context not only requires better models of uncertain phenomena such as mass social behavior, but also challenges the ability to display results meaningfully on the kind of equipment that crisis management agencies are likely to be able to afford.

7. The diversity of organizations with different structures and patterns of working makes it necessary for these communications models to accommodate different modes when collaboration crosses organizational boundaries, as it frequently does.

8. An important feature of the problem-solving environment would be the ability to abstract application requirements and translate those requirements into specifications for software system functionality. Developing such an ability will require considerable research.

9. Middleware provides services within an information infrastructure that are used in common among multiple applications. For a discussion, see CSTB (1994b), p. 49.

10. The ARPANET, precursor to the Internet, exhibited emergent phenomena related to network control functions that unpredictably produced massive slowdowns in the network. Fundamental design principles to predict and avoid such phenomena in large-scale systems remain lacking.

11. Revisions to code are no guarantee of improvement; managing the proliferation of different versions of the same code is another formidable challenge.

12. Alternatively, Java and similar network-centered models of computing illustrate an emerging, distributed approach to software development. In this approach, developers across the Internet are participating in group development projects using the models of consortia and distributed applications based on multiple interactive Web services. These projects do not look like software development projects in the traditional sense, but they may yield workable, large-scale solutions.

13. In fact, because genetic influences on medical conditions may be increasingly well understood, maintaining medical histories longer than a lifetime may become more and more valuable to descendants.

14. Kunze, John A., "Functional Recommendations for Internet Resource Locators," Internet Request for Comments (RFC) 1736, February 1995; and Sollins, Karen, and Larry Masinter, "Functional Requirements for Uniform Resource Names," Internet RFC 1737, December 1994. Both are available on line at http://www.cis.ohio-state.edu/hypertext/information/rfc.html.

15. Examples include the Wide Area Information Server (WAIS) and Harvest, a project at the University of Colorado. There are also a number of searching tools designed specifically for the World Wide Web, such as AltaVista from Digital Equipment Corporation, Lycos, and others.

16. For example, doctors might decide that only the full level of performance is acceptable, whereas medical insurers might opt for lower resolution, and professors showing chalkboard diagrams might opt for fewer frames per second.

17. Such tools exist, but they are difficult to use and require a higher level of technical expertise than is readily available in a crisis response.

18. Research has achieved some success in one aspect of this problem, that of producing real-time systems. These are systems that can vary their algorithmic approach to a problem in order to converge on a solution by a specified deadline, perhaps sacrificing some accuracy to meet the time constraint. However, much more work is required to generalize this understanding to aspects of performance other than converging before deadlines and to the less well-defined problems characteristic of crises. (A toy illustration of deadline-driven computation follows.)
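
To make the real-time systems of note 18 concrete, here is a minimal sketch of a deadline-driven ("anytime") computation: it refines a Monte Carlo estimate until the deadline arrives and returns the best answer available, trading accuracy for timeliness. The example problem and structure are ours, not the report's.

    import random
    import time

    def estimate_pi_by(deadline_s):
        """Refine a Monte Carlo estimate of pi until the deadline, then stop."""
        start = time.monotonic()
        inside = total = 0
        while True:
            x, y = random.random(), random.random()
            inside += (x * x + y * y) <= 1.0   # point fell inside the quarter circle
            total += 1
            if time.monotonic() - start >= deadline_s:
                break
        return 4.0 * inside / total, total

    estimate, samples = estimate_pi_by(deadline_s=0.1)
    print("pi ~= %.4f after %d samples" % (estimate, samples))

A longer deadline buys more samples and hence more accuracy; a shorter one still yields a usable, if rougher, answer by the time it is needed.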