2 Summary of Workshop Discussions

SESSION 1: PROCESS, ARCHITECTURE, AND THE GRAND SCALE

Panelists: John Vu, Boeing, and Rick Selby, Northrop Grumman Corporation
Moderator: Michael Cusumano

Panelist presentations and general discussions at this session were intended to explore the following questions from the perspectives of software development for government and commercial aerospace systems:

• What are the characteristics of successful approaches to architecture and design for large-scale systems and families of systems?
• Which architecture ideas can apply when systems must evolve rapidly?
• What kinds of management and measurement approaches could guide program managers and developers?

Synergies Across Software Technologies and Business Practices Enable Successful Large-Scale Systems

Context matters in trying to determine the characteristics of successful approaches: different customer relationships, goals and needs, pacing of projects, and degree of precedent all require different practices.
For example, different best practices may apply depending on what sort of system or application is under development. Examples discussed include commercial software products, IT and Internet financial services, airplanes, and government aerospace systems.

• Different systems and software engineering organizations have different customers and strategies. They may produce a variety of deliverables, such as a piece of software, an integrated hardware-software environment, or very large, complicated, interconnected, hardware-software networked systems.

• Different systems and software engineering organizations have different goals and needs. Product purposes vary: user empowerment, business operations, and mission capabilities. Projects can last from a month to 10 or 12 years. The project team can be one person or literally thousands. The customer agreement can be a license, service-level agreement, or contract. There can be a million customers or just one (for example, the government). The managerial focus can be on features and time to market; cycle time, workflow, and uptime; or reliability, contract milestones, and interdependencies; and so on.

• While some best practices, such as requirements and design reviews and incremental and spiral life cycles, are broadly applicable, specific practice usually varies. Although risk management is broadly applicable, commercial, financial, and government system developers may adopt different kinds of risk management. While government aerospace systems developers may spend months or years doing extensive system modeling, this may not be possible in other organizations or for other types of products. Commercial software organizations may focus on daily builds (that is, each day compiling and possibly testing the entire program or system incorporating any new changes or fixes); for aerospace systems, the focus may be on weekly or 60-day builds. Other generally applicable best practices that vary by market and organization include parallel small teams, design reuse, domain-specific languages, opportunity management, trade-off studies, and portability layers. These differences are driven by the different kinds of risks that drive engineering decisions in these sectors.

• Government aerospace systems developers, along with other very large software-development enterprises, employ some distinctive best practices. These include independent testing teams and, for some aspects of the systems under consideration, deterministic, simple designs. These practices are driven by a combination of engineering, risk-management, and contractual considerations.

In a very large[1] organization, synergy across software technologies and business practices can contribute to success.

[1] Very large in this case means over 100,000 employees throughout a supply chain doing systems engineering, systems development, and systems management; managing multiple product lines; and building systems with millions of lines of code.
Participants explored the particular case of moderately precedented systems[2] and major components with control-loop architectures. For systems of this kind there are technology and business practice synergies that have worked well.

[2] Precedent refers to the extent to which we have experience with systems of a similar kind. More specifically, there can be precedent with respect to requirements, architecture and design, infrastructure choices, and so on. Building on precedent leads to routinization, reduced engineering risk, and better predictability (lower variance) of engineering outcomes.

Here are some examples noted by speakers:

• Decomposition of large systems to manage risk. With projects that typically take between 6 and as many as 24 months to deliver, incremental decomposition of the system can reduce risk, provide better visibility into the process, and deliver capability over time. Decomposition accelerates system integration and testing.

• Table-based design, oriented to a system engineering view in terms of states and transitions, both nominal and off-nominal. This enables the use of clear, table-driven approaches to address nominal modes, failure modes, transition phases, and different operations at different parts of the system operations.

• Use of built-in, domain-specific (macro) languages in a layered architecture. The built-in, command-sequencing macro language defines table-driven executable specifications. This permits a relatively stable infrastructure and a run-time system with low-level, highly deterministic designs yet extensible functionality. It also allows automated testing of the systems.

• Use of precedented and well-defined architectures for the task management structure that incorporates a simple task structure, deterministic processing, and predictable timelines. For example, a typical three-task management structure might have high-rate (32 ms) tasks, minor-cycle (128 ms) tasks, and background tasks. The minor cycle reads and executes commands, formats telemetry, handles fault protection, and so forth. The high-rate cycle handles message traffic between the processors. The background cycle adds capability that takes a longer processing time. This is a reusable processing architecture that has been used for over 30 years in spacecraft and is aimed at the construction of highly reliable, deterministic systems. (A small sketch of this structure follows the list.)

• Gaining advantages from lack of fault proneness in reused components by achieving high levels of code, design, and requirement reuse. One example of code reuse was this: Across 25 NASA ground systems, 32 percent of software components were either reused or modified from previous systems (for spacecraft, reuse was said to be as high as 80 percent). Designs and requirements can also be reused. Typically, there is a large backward compatibility set of requirements, and these requirements can be reused. Requirements reuse is very common and very successful even though the design and implementation might each be achieved differently. Design reuse might involve allocation of function across processors in terms of how particular algorithms are structured and implemented. The functions might be implemented differently in the new system, for example, in components rather than custom code or in different programming languages. This is an example of true design reuse rather than code reuse.
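To make the table-driven, three-rate task structure described in the list above more concrete, the sketch below shows one way such a loop could be organized. It is an illustrative approximation only, not drawn from the workshop materials: the task names, the contents of the command table, and the use of simulated timing in place of a real-time executive are all assumptions.

    import time

    # Hypothetical command table: a table-driven "executable specification"
    # for the minor cycle. Entries and handlers are invented for illustration.
    COMMAND_TABLE = [
        ("read_commands",    lambda: None),
        ("format_telemetry", lambda: None),
        ("fault_protection", lambda: None),
    ]

    def high_rate_task():
        # Every 32 ms: handle message traffic between processors.
        pass

    def minor_cycle_task():
        # Every 128 ms: walk the command table in a fixed, deterministic order.
        for _name, handler in COMMAND_TABLE:
            handler()

    def background_task():
        # Runs in leftover time: longer-running added capability.
        pass

    def run(cycles: int) -> None:
        elapsed_ms = 0
        for _ in range(cycles):
            high_rate_task()                  # 32 ms rate group
            if elapsed_ms % 128 == 0:
                minor_cycle_task()            # 128 ms rate group
            background_task()                 # background rate group
            elapsed_ms += 32
            time.sleep(0.032)                 # stand-in for a real-time scheduler

    if __name__ == "__main__":
        run(8)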
In addition to these synergies, it was suggested that other types of analyses could also contribute to successful projects. Data-driven statistical analyses can help to identify trends, outliers, and process improvements to reduce or mitigate defects. For example, higher rates of component interactions tend to be correlated with more faults, as well as more fault-correction effort. Risk analyses prioritize risks according to cost, schedule, and technical safety impacts. Charts that show project risk mitigation over time and desired milestones help to define specific tasks and completion criteria. It was suggested that each individual risk almost becomes a microcosm of its own project, with schedules and milestones and progressive mitigation of that risk to add value.
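As a toy illustration of the kind of data-driven analysis mentioned above, the sketch below checks whether components with more interactions also tend to carry more faults and flags outliers. The component names and counts are invented; a real analysis would draw on actual project measurement data.

    from statistics import correlation  # Python 3.10+

    # Hypothetical per-component measurements: (interaction count, fault count).
    components = {
        "guidance":   (42, 9),
        "telemetry":  (18, 3),
        "fault_mgmt": (35, 7),
        "sequencer":  (11, 2),
        "comms":      (27, 5),
    }

    interactions = [i for i, _ in components.values()]
    faults = [f for _, f in components.values()]

    r = correlation(interactions, faults)
    print(f"Pearson correlation, interactions vs. faults: {r:.2f}")

    # Flag potential outliers: components with far more faults than their peers.
    mean_faults = sum(faults) / len(faults)
    outliers = [name for name, (_, f) in components.items() if f > 1.5 * mean_faults]
    print("Possible outliers:", outliers or "none")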
 SOFTWARE-INTENSIVE SYSTEMS AND UNCERTAINTY AT SCALE Examples of success in coarse-grain reuse are major system frameworks (such as e-commerce frameworks), service-based architectures, and lay- ered architectures. • Automate. Automation is needed in the build process, in testing, and in metrics. Uncertainty is inherent in the development of software-intensive systems and must be reassessed constantly, because there are always unknowns and incomplete information. Waiting for complete information is costly, and it can take significant time to acquire the information—if it is possible to acquire it at all. Schedules and budgets are always limited and almost never sufficient. The goal, it was argued, should be to work effec- tively and efficiently within the resources that are available and discharge risks in an order that is appropriate to the goals of the system and the nature of its operating environment: Establish the baseline design, apply systematic risk management, and then apply opportunity management, constantly evaluating the steps needed and making decisions about how to implement them. Thus, it was suggested that appropriate incentives and analogous penalty mechanisms at the individual level and at the organization or supplier level can change behavior quickly. The goal is thus for the incentive structure to create an opportunity to achieve very efficient balance through a “self-managing organization.” In a self-man- aging organization, it was suggested, the leader has the vision and is an evangelist rather than a micromanager, allowing others to manage using systematic incentive structures. Some ways to enable software technology and business practices for large-scale systems were suggested: • Creating strategies, architectures, and techniques for the devel- opment and management of systems and software, taking into account multiple customers and markets and a broad spectrum of projects, from small scale through large. • Disseminating validated experiences and data for best practices and the circumstances when they apply (for example, titles like “Case Studies of Model Projects”). • Aligning big V waterfall-like systems engineering life-cycle models with incremental/spiral software engineering life-cycle models. 3 3 The V model is a V-shaped, graphical representation of the systems development life cycle that defines the results to be achieved in a project, describes approaches for developing these results, and links early stages (on the left side of the V) with evaluation and outcomes (on the right side). For example, requirements link to operational testing and detailed design links to unit testing.

OCR for page 4
• Facilitating objective interoperability mechanisms and benchmarks for enabling information exchange.
• Lowering entry barriers for research groups and nontraditional suppliers to participate in large-scale system projects (Grand Challenges, etc.).
• Encouraging advanced degree programs in systems and software engineering.
• Defining research and technology roadmaps for systems and software engineering.
• Collaborating with foreign software developers.

[3] The V model is a V-shaped, graphical representation of the systems development life cycle that defines the results to be achieved in a project, describes approaches for developing these results, and links early stages (on the left side of the V) with evaluation and outcomes (on the right side). For example, requirements link to operational testing and detailed design links to unit testing.

Process, Architecture, and Very Large-Scale Systems

Remarks during this portion of the session were aimed at thinking outside the box about what the state of the art in architectures might look like in the future for very large-scale, complex systems that exhibit unpredictable behavior. The primary context under discussion was large-scale commercial aircraft development; the Boeing 777 has a few million lines of code, for example, and the new 787 has several million and climbing.

It was argued that very large-scale, highly complex systems and families of systems require new thinking, new approaches, and new processes to architect, design, build, acquire, and operate. It was noted that these new systems are going from millions of lines of code to tens of millions of lines of code (perhaps in 10 years to billions of lines of code and beyond); from hundreds of platforms (servers) to thousands, all interconnected by heterogeneous networks; from hundreds of vendors (and subcontractors) to thousands, all providing code; and from a well-defined user community to dynamic communities of interdependent users in changing environments. It was suggested that the issue for the future, 10 or 20 years from now, is how to deal with the potential billion lines of code and tens of thousands of vendors in the very diverse, open-architecture-environment global products of the future, assembled from around the world.

According to the forward-looking vision presented by speakers, these systems may have the following characteristics:

• Very large-scale systems would integrate multiple systems, each of them autonomous, having distinctive characteristics, and performing its own functions independently to meet distinct objectives.

• Each system would have some degree of intelligence, with the objectives of enabling it to modify its relationship to other component systems (in terms of functionality) and allowing it to respond to changes, perhaps unforeseen, in the environment. When multiple systems are joined together, the significant emergent capabilities of the resulting system as a whole would enable common goals and objectives.
• Each very large-scale system would share information among the various systems in order to address more complex problems.

• As more systems are integrated into a very large-scale system, the channels connecting all systems through which information flows would become more robust and continue to grow and expand throughout the life cycle of the very large-scale system.

It was argued that a key benefit of a very large-scale system is the interoperability between operational systems that allows decision makers to make better, more informed decisions more quickly and accurately. From a strategic perspective, a very large-scale system is an environment where operational systems have the ability to share information among geographically distributed systems with appropriate security and to act on the shared information to achieve their desired business goals and objectives. From an operational perspective, a very large-scale system is an environment where each operational subsystem performs its own functions autonomously, consistent with the overall strategy of the very large-scale system.

The notion of continuous builds or continuous integration was also discussed. Software approaches that depend on continuous integration, that is, where changes are integrated very frequently, require processes for change management and integration management. These processes are incremental and build continuously from the bottom up to support evolution and integration, instead of from the top down, using a plan-driven, structured design. They separate data and functions for faster updates and better security. To implement these processes, decentralized organizations and an evolving concept of operations are required to adapt quickly to changing environments.
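The sketch below illustrates, in miniature, the kind of bottom-up continuous integration loop described above: each incoming change is built and tested before it is merged, so the integrated baseline evolves in small, verified steps. It is a schematic outline only; the change queue, the placeholder build and test functions, and the notion of a "baseline" list are all invented for illustration.

    # Hypothetical queue of incoming changes (e.g., commit identifiers).
    incoming_changes = ["change-101", "change-102", "change-103"]
    baseline = []  # the continuously integrated, always-tested baseline

    def build(change: str) -> bool:
        # Placeholder for the project's real build step.
        return True

    def run_tests(change: str) -> bool:
        # Placeholder for the project's automated regression suite;
        # pretend one change breaks the build to show the rejection path.
        return change != "change-102"

    for change in incoming_changes:
        if build(change) and run_tests(change):
            baseline.append(change)      # integrate only verified changes
            print(f"integrated {change}; baseline now holds {len(baseline)} changes")
        else:
            print(f"rejected {change}; baseline unchanged")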
The overall architectural framework for large-scale systems described by some participants in this session consists of five elements:

• Governance. These describe the rules, regulations, and change management that control the total system.

• Operational. These describe how each operational system can be assembled from preexisting or new components (building blocks) to operate in their own new environment so they can adapt to change.

• Interaction. These describe the communication (information pipeline) and interaction between operational systems that may affect the very large system and how the very large system will react to the inputs from the operational systems.

• Integration and change management. These describe the processes for managing change and the integration of systems that enable emergent capabilities.

• Technical. These depict the technology components that are necessary to support these systems.

It was suggested that the large-scale systems of the future that will cope with scale and uncertainty would be built from the bottom up by starting with autonomous building blocks to enable the rapid assembly and integration of these components to effectively evolve the very large-scale system. The architectural framework would ensure that each building block would be aligned to the total system. Building blocks would be assembled by analyzing a problem domain through the lens of an operational environment or mission for the purpose of creating the characteristics and functionality that would satisfy the stakeholders' requirements. In this mission-focused approach, all stakeholders and modes of operations should be clearly identified; different user viewpoints and needs should be gathered for the proposed system; and stakeholders must state which needs are essential, which are desirable, and which are optional. Prioritization of stakeholders' needs is the basis for the development of such systems; vague and conflicting needs, wants, and opinions should be clarified and resolved; and consensus should be built before assembling the system.

At the operational level, the system would be separated from current rigid organization structures (people, processes, technology) and would evolve into a dynamic concept of operation by assembling separate building blocks to meet operational goals. The system manager should ask: What problem will the system solve? What is the proposed system used for? Should the existing system be improved and updated by adding more functionality or should it be replaced? What is the business case?

To realize this future, participants suggested that research is needed in several areas, including these:

• Governance (rules and regulations for evolving systems).
• Interaction and communication among systems (including the possibility of negative interactions between individual components and the integrity, security, and functioning of the rest of the system).
• Integration and change management.
• User's perspective and user-controlled evolution.
• Technologies supporting evolution.
• Management and acquisition processes.
• An architectural structure that enables emergence.
• Processes for decentralized organizations structured to meet operational goals.

SESSION 2: DOD SOFTWARE CHALLENGES FOR FUTURE SYSTEMS

Panelists: Kristen Baldwin, Office of the Under Secretary of Defense for Acquisition, Technology and Logistics, and Patrick Lardieri, Lockheed Martin
Moderator: Douglas Schmidt

Panelist presentations and general discussions during this session were intended to explore two questions, from two perspectives: that of the government and that of the government contractor:

• How are challenges for software in DoD systems, particularly cyber-physical systems, being met in the current environment?
• What advancements in R&D, technology, and practices are needed to achieve success as demands on software-intensive system development capability increase, particularly with respect to scale, complexity, and the increasingly rapid evolution in requirements (and threats)?

DoD Software Engineering and System Assurance

An overview of various activities relating to DoD software engineering was given. Highlights from the presentation and discussion follow. The recent Acquisition & Technology reorganization is aimed at positioning systems engineering within the DoD, consistent with a renewed emphasis on software. The director of Systems and Software Engineering now reports directly to the Under Secretary of Defense for Acquisition and Technology. The mission of Systems and Software Engineering, which addresses evolving system- and software-engineering challenges, is as follows:

• Shape acquisition solutions and promote early technical planning.
• Promote the application of sound systems and software engineering, developmental test and evaluation, operational test and evaluation to determine operational suitability and effectiveness, and related technical disciplines across DoD's acquisition community and programs.
• Raise awareness of the importance of effective systems engineering and raise program planning and execution to the state of the practice.
• Establish policy, guidance, best practices, education, and training in collaboration with the academic, industrial, and government communities.
• Provide technical insight to program managers and leadership to support decision making.

DoD's Software Center of Excellence is made up of a community of participants including industry, DoD-wide partnerships, national partnerships, and university and international alliances. It will focus on supporting acquisition; improving the state of the practice of software engineering; providing leadership, outreach, and advocacy for the systems engineering communities; and fostering resources that can meet DoD goals. These are elements of DoD's strategy for software, which aims to promote world-class leadership for DoD software engineering.

Findings from some 40 recent program reviews were discussed. These reviews identified seven systemic issues and issue clusters that had contributed to DoD's poor execution of its software program, which were highlighted in the session discussion. The first issue is that software requirements are not well defined, traceable, and testable. A second issue cluster involves immature architectures; integration of commercial-off-the-shelf (COTS) products; interoperability; and obsolescence (the need to refresh electronics and hardware). The third cluster involves software development processes that are not institutionalized, have missing or incomplete planning documents, and follow inconsistent reuse strategies. A fourth issue is software testing and evaluation that lacks rigor and breadth. The fifth issue is lack of realism in compressed or overlapping schedules. The sixth issue is that lessons learned are not incorporated into successive builds; they are not cumulative. Finally, software risks and metrics are not well defined or well managed.

To address these issues, DoD is pursuing an approach that includes the following elements:

• Identification of software issues and needs through a software industrial base assessment,[4] a National Defense Industrial Association (NDIA) workshop on top software issues, and a defense software strategy summit. The industrial base assessment, performed by CSIS, found that the lack of comprehensive, accurate, timely, and comparable data about software projects within DoD limits the ability to undertake any bottom-up analysis or enterprise-wide assessments about the demand for software. Although the CSIS analysis suggests that the overall pool of software developers is adequate, the CSIS assessment found an imbalance in the supply of and demand for the specialized, upper echelons of software developer/management cadres. These senior cadres can be grown, but it takes time (10 or more years) and a concerted strategy. In the meantime, management/architecture/systems engineering tools might help improve the effectiveness of the senior cadres. Defense business system/COTS software modification also places stress on limited pools of key technical and management talent. Moreover, the true cost and risk of software maintenance deferral are not fully understood.

[4] Center for Strategic and International Studies (CSIS), Defense-Industrial Initiatives Group, 2006. Software Industrial Base Assessment: Phase I Report, October 4.
• Creation of opportunities and partnerships through an established network of government software points of contact; chartering the NDIA Software Committee; information exchanges with government, academia, and industry; and planning a systems and software technology conference. Top issues emerging from the NDIA Defense Software Strategy Summit in October 2006 included establishment and management of software requirements, the lack of a software voice in key system decisions, inadequate life-cycle planning and management, the high cost and ineffectiveness of traditional software verification methods, the dearth of software management expertise, inadequate technology and methods for assurance, and the need for better techniques for COTS assessment and integration.

• Execution of focused initiatives such as Capability Maturity Model Integration (CMMI) support for integrity and acquisition, a CMMI guidebook, a handbook on engineering for system assurance, a systems engineering guide for systems of systems (SoSs), the provision of software support to acquisition programs, and a vision for acquisition reform. SoSs to be used for defense require special considerations for scale (a single integrated architecture is not feasible), ownership and management (individual systems may have different owners), legacy (given budget considerations, current systems will be around for a long time), changing operations (SoS configurations will face changing and unpredictable operational demands), criticality (systems are integrated via software), and role of the network (SoSs will be network-based, but budget and legacy challenges may make implementation uneven). To address a complex SoS, an initial (incremental) version of the DoD's SoS systems engineering guide is being piloted; future versions will address enterprise and net-centric considerations, management, testing, and sustaining engineering.

The issue of system assurance (reducing the vulnerability of systems to malicious tampering or access) was noted as a fundamental consideration, to the point that cybertrust considerations can be a fundamental driver of requirements, architecture and design, implementation practice, and quality assurance.[5] Because current assurance, safety, and protection

[5] A separate National Research Council study committee is exploring the issue of cybersecurity research and development broadly, and its report, Toward a Safer and More Secure Cyberspace, will be published in final form in late 2007. See http://cstb.org/project_cybersecurity for more information.
tivity increase, it was argued that the assurance bar for software quality and cybersecurity attributes can be raised by (1) raising the component assurance bar (resources are finite and organizations can spend too much time and too many resources trying to patch their way to security) and (2) getting customers to understand and accept that assurance for custom software can be raised if they are willing to pay more (if customers do not know about costs that are hidden, they cannot accept or budget for them).

One set of best practices and technologies to write secure software was described (a brief illustrative sketch appears below). It includes

• Secure coding standards,
• Developer training in secure coding,
• Enabled, embedded security points of contact (the "missionary model"),
• Security as part of development including functional, design, test (include threat modeling),
• Regressions (including destructive security tests),
• Automated tools (home grown, commercial of multiple flavors),
• Locked-down configurations (delivering products that are secure on installation), and
• Release criteria for security.

However, these practices are not routinely taught in universities. Neither the software profession nor the industry as a whole can simply rely on a few organizations doing these kinds of things.
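As a small illustration of what "secure coding standards" can mean in practice, the sketch below contrasts an injection-prone database query with a parameterized one. The example is hypothetical and uses Python's standard sqlite3 module; it is not drawn from any specific practice described at the workshop.

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'admin'), ('bob', 'user')")

    user_input = "alice' OR '1'='1"   # attacker-controlled value

    # Insecure: string formatting lets the input rewrite the query.
    insecure = conn.execute(
        f"SELECT role FROM users WHERE name = '{user_input}'"
    ).fetchall()

    # Secure: a parameterized query treats the input strictly as data.
    secure = conn.execute(
        "SELECT role FROM users WHERE name = ?", (user_input,)
    ).fetchall()

    print("insecure query returned:", insecure)     # leaks every row
    print("parameterized query returned:", secure)  # returns nothing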
Discussion identified some necessary changes in the long run:

• University curricula. It was argued that university programs should do a better job of teaching secure coding practices and training future developers to pay attention to security as part of software development. If the mindset of junior developers does not change, the problem will not be solved. One participant said, "Process won't fix stupidity or arrogance." Incentives to be mindful of security should be integrated throughout the curriculum. When security is embedded throughout the development process, a small core of security experts is not sufficient. One challenge is how to balance the university focus on enduring knowledge and skills against the need for developers to understand particular practices and techniques specific to current technologies.

• Automation. Automated tools are promising and will be increasingly important, but they are not a cure-all. Automated tools are not yet ready for universal prime time for a number of reasons, including: Tools need to be trained to understand the code base; programmers have difficulty establishing sound and complete rules; most of today's tools look only for anticipated vulnerabilities (e.g., buffer overruns) and cannot be readily adapted to new classes of vulnerabilities; there are often too many false positives; scalability is an issue; one size does not fit all (it is premature for standards) and therefore multiple tools are needed; and there is not a good system for rating tools.

Conventional wisdom holds that people will not pay more for secure software. However, people already are paying for bad security; a 2002 study by the National Institute of Standards and Technology (NIST) reported that the consequences of bad software cost $59 billion a year in the United States.[20] It was argued that from a development standpoint, security cost-effectiveness should be measured pragmatically. However, a simple return on investment (ROI) is not the right metric. From a developer's perspective, the goal should be the highest net present value (NPV) for cost avoidance of future bugs, not raw cost savings or the ROI from fixing bugs in the current year. Another way of valuing security is opportunity cost savings: what can be done (e.g., building new features) with the resources saved from not producing patches. From the customer's perspective, it is the life-cycle cost of applying a patch weighed against the expected cost of the harm from not applying the patch. Customers want predictable costs, and the perception is that they cannot budget for the cost of applying patches (even though the real cost driver is the consequences of unpatched systems). If customers know what they are getting, they can plan for a certain level of risk at a certain cost. The goal is to find the match between expected risk for the customer and for the vendor: how suitable the product is for its intended use.

[20] See NIST, 2002, "Planning Report 02-3: The economic impacts of inadequate infrastructure for software testing." Available online at http://www.nist.gov/director/prog-ofc/report02-3.pdf.
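The sketch below works through the NPV argument above on invented numbers: an extra up-front investment in secure development is weighed against the discounted stream of patch and incident costs it is expected to avoid in later years. The figures and the discount rate are assumptions chosen only to show why a single-year ROI can point the wrong way.

    def npv(cashflows, rate):
        """Net present value of a list of yearly cashflows (year 0 first)."""
        return sum(cf / (1 + rate) ** year for year, cf in enumerate(cashflows))

    # Year 0: extra cost of secure development practices (hypothetical).
    # Years 1-4: patch and incident costs avoided as a result (hypothetical).
    cashflows = [-400_000, 180_000, 170_000, 150_000, 120_000]
    discount_rate = 0.08

    print(f"NPV of cost avoidance: ${npv(cashflows, discount_rate):,.0f}")
    # Positive (about $120,000), so the investment pays off over the period.

    # A naive one-year view recovers only part of the outlay and calls it a loss.
    one_year_roi = cashflows[1] / -cashflows[0]
    print(f"Fraction of the outlay recovered in year 1: {one_year_roi:.0%}")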
Certification is a way of assessing what is "good."[21] But participants were not optimistic when considering prospects for certification of development processes. There is too much disagreement and ideology surrounding development processes. However, there can be some commonality around aspects of good development processes. Certifying developers is also problematic. In engineering, there are accredited degree programs and clear licensing requirements. The awarding of a degree in computer science is not analogous to licensing an engineer because there is not the same common set of requirements, especially robustness and safety requirements. In addition, it can be difficult to replicate the results of software engineering processes, making it hard to achieve confidence such that developers are willing to sign off on their work. Moreover, it was argued that with current curricula, developers generally do not even learn the basics of secure coding practice. There is little to no focus on security, safety, or the possibility that the code is going to be attacked in most educational programs. It was argued that curricula need to change and that computer science graduates should be taught to "assume an enemy."

[21] A recent NRC study examines the issue of certification and dependability of software systems. See information on the report Software for Dependable Systems: Sufficient Evidence? at http://cstb.org/project_dependable.

Automated tools can give better assurance to the extent that vendors use them in development and fix what they find. Running evaluation tools after the fact on something already in production is not optimal.[22] It was suggested that there is potential for some kind of "goodness meter" (a complement to the "badness meter" described in the next section) for tool use and effectiveness: what tool was used, what the tool can and cannot find, what the tool did and did not find, the amount of code covered, and that tool use was verified by a third party.

[22] It was suggested that vendors should not be required to vet products against numerous tools. It was also suggested that there is a need for some sort of Common Criteria reform with mutual recognition in multiple venues, eliminating the need to meet both Common Criteria and testing requirements. Vendors, for example, want to avoid having to give governments the source code for testing, which could compromise intellectual property, and want to avoid revealing specifics on vulnerabilities (which may raise security issues and also put users of older versions of the code more at risk). Common Criteria is an international standard for computer security. Documentation for it can be found at http://www.niap-ccevs.org/cc-scheme/cc_docs/.

Software Security: Building Security In

Discussions in this session focused on software security as a systems problem as opposed to an application problem. In the current state of the practice, certain attributes of software make software security a challenge: (1) connectivity, in that the Internet is everywhere and most software is on it or connected to it; (2) complexity, in that networked, distributed, mobile code is hard to develop; and (3) extensibility, in that systems evolve in unexpected ways and are changed on the fly. This combination of attributes also contributes to the rise of malicious code.

Massively multiplayer online games (MMOGs) are bellwethers of things to come in terms of sophisticated attacks and exploitation of vulnerabilities. These games experience the cutting edge of what is going on in software hacking and attacks today.[23]

[23] World of Warcraft, for example, was described as essentially a global information grid with approximately 6 million subscribers and half a million people playing in real time at any given time. It has its own internal market economy, as well as a significant black market economy.
Attacks against such games are also at the forefront of so-called rootkit[24] technology. Examining attacks on large-scale games may be a guide to what is likely to happen in the non-game world.

[24] A rootkit is a set of software tools that can allow hackers to continue to gain undetected, unauthorized access to a system following an initial, successful attack by concealing processes, files, or data from the operating system.

It was suggested that in 2006, security started to become a differentiator among commercial products. Around that time, companies began televising ads about security and explicitly offering security features in new products. Customers were more open to the idea of using multiple vendors to take advantage of diversity in features and suppliers.

Security problems are complicated. There is a difference between implementation bugs, such as buffer overflows or unsafe system calls, and architectural flaws, such as compartmentalization problems in design or insecure auditing. As much attention needs to be paid to looking for architectural or requirements flaws as is paid to looking for coding bugs. Although progress is being made in automation, both processes still need people in the loop. When a tool turns up bugs or flaws, it gives some indication of the "badness" of the code, a "badness-o-meter" of sorts. But when use of a tool does not turn up any problems, this is not an indication of the "goodness" of the code. Instead, one is left without much new knowledge at all.

Participants emphasized that software security is not application security. Software is everywhere, not just in the applications. Software is in the operating system, the firewall, the intrusion detection system, the public key infrastructure, and so on. These are not "applications." Application security methods work from the outside in. They work for COTS software, require relatively little expertise, and are aimed at protecting installed software from harm and malicious code. System software security works from the inside out, with input into and analysis of design and implementation, and requires a lot of expertise.

In one participant's view, security should also be thought of as an emergent property of software, just like quality. It cannot be added on. It has to be designed in. Vendors are placing increased emphasis on security, and most customers have a group devoted to software security. It was suggested that the tools market is growing, for both application security (a market of between $60 million and $80 million) and software security (a market of about $20 million, mostly for static analysis tools). Consulting services, however, dwarf the tools market.

One speaker described the "three pillars" of software security:
• Risk management, tied to the mission or line of business. Financial institutions such as banks and credit card consortiums are in the lead here, in part because Sarbanes-Oxley made banks recognize their software risk.

• Touchpoints, or best practices. The top two are code review with a tool and architectural risk analysis.

• Knowledge, including enterprise knowledge bases about security principles, guidelines, and rules; attack patterns; vulnerabilities; and historical risks.

SESSION 5: ENTERPRISE SCALE AND BEYOND

Panelists: Werner Vogels, Amazon.com, and Alfred Spector, AZS-Services
Moderator: Jim Larus

The speakers at this session focused on the following topics, from the perspective of industry:

• What are the characteristics of successful approaches to addressing scale and uncertainty in the commercial sector, and what can the defense community learn from this experience?
• What are the emerging software challenges for large-scale enterprises and interconnected enterprises?
• What do you see as emerging technology developments that relate to this?

Life Is Not a State-Machine: The Long Road from Research to Production

Discussions during this session centered on large-scale Web operations, such as that of Amazon.com, and what lessons about scale and uncertainty can be drawn from them. It was argued that in some ways, software systems are similar to biological systems. Characteristics and activities such as redundancy, feedback, modularity, loose coupling, purging, apoptosis (programmed cell death), spatial compartmentalization, and distributed processing are all familiar to software-intensive systems developers, and yet these terms can all be found in discussions of robustness in biological systems. It was suggested that there may be useful lessons to be drawn from that analogy.

Amazon.com is very large in scale and scope of operations: It has seven Web sites; more than 61 million active customer accounts and over 1.1 million active seller accounts, plus hundreds of thousands of registered associates; over 200,000 registered Web services developers; over 12,500 employees worldwide; and more than 20 fulfillment centers worldwide. About 30 percent of Amazon's sales are made by third-party sellers; almost half of its sales are to buyers outside the United States. On a peak shipping day in 2006, Amazon made 3.4 million shipments.
Amazon.com's technical challenges include how to manage millions of commodity systems, how to manage many very large, geographically dispersed facilities in concert, how to manage thousands of services running on these systems, how to ensure that the aggregate of these services produces the desired functionality, and how to develop services that can exploit commodity computing power. It, like other companies providing similar kinds of services, faces challenges of scale and uncertainty on an hourly basis. Over the years, Amazon has undergone numerous transformations: from retailer to technology provider, from single application to platform, from Web site and database to a massively distributed parallel system, from Web site to Web service, from enterprise scale to Web scale. Amazon's approach to managing massive scale can be thought of as "controlled chaos." It continuously uses probabilistic and chaotic techniques to monitor business patterns and how its systems are performing. As its lines of business have expanded, these techniques have had to evolve; for example, focusing on tracking customer returns as a negative metric does not work once product lines expand into clothing (people are happy to order multiple sizes, keep the one that fits, and return the rest).

Amazon builds almost all of its own software because the commercial and open source infrastructure available now does not suit Amazon.com's needs. The old technology adoption life cycle from product development to useful acceptance was between 5 and 25 years. Amazon and similar companies are trying to accelerate this cycle. However, it was suggested that for an Amazon developer to select and use a research technology is almost impossible. In research, there are too many possibilities to choose from, experiments are unrealistic compared to real life, and underlying assumptions are frequently too constrained. In real life, systems are unstable, parameters change and things fail continuously, perturbations and disruptions are frequent, there are always malicious actors, and failures are highly correlated. In the real world, when the system fails, the mission of the organization cannot stop; it must continue.[25]

[25] Examples of systems where assumptions did not match real life include the Titanic, the Tacoma Narrows bridge, and the Estonian ferry disaster.

Often, complexity is introduced to manage uncertainty. However, there may well exist what one speaker called "conservation laws of complexity."
That is, in a complex interconnected system, complexity cannot be reduced absolutely; it can only be moved around. If uncertainty is not taken into account in large-scale system design, it makes adoption of the chosen technology fairly difficult. Engineers in real life are used to dealing with uncertainty. Assumptions are often added to make uncertainty manageable. At Amazon, the approach is to apply Occam's razor: If there are competing systems to choose from, pick the system that has the fewest assumptions. In general, assumptions are the things that are really limiting and could limit the system's applicability to real life.

Two different engineering approaches were contrasted, one with the goal of building the best possible system (the "right" system) whatever the cost, and the other with the more modest goal of building a smaller, less ambitious system that works well and can evolve. The speaker characterized the former as being incredibly difficult, taking a long time, and requiring the most sophisticated hardware. By contrast, the latter approach can be faster, it conditions users to expect less, and it can, over time, be improved to a point where performance almost matches that of the best possible system.

It was also argued that traditional management does not work for complex software development, given the lack of inspection and control. Control requires determinism, which is ultimately an illusion. Amazon's approach is to optimize team communication by reducing team size to a maximum of 8-10 people (a "two-pizza team"). For larger problems, decompose the problem and reduce the size of the team needed to tackle the subproblems to a two-pizza group. If this cannot be done, it was suggested, then do not try to solve that problem; it is too complicated.

A general lesson that was promoted during this session was to let go of control and the notion that these systems can be controlled. Large systems cannot be controlled; they are not deterministic. For various reasons, it is not possible to consider all the inputs. Some may not have been included in the original design; requirements may have changed; the environment may have changed. There may be new networks and/or new controllers. The problem is not one of control; it is dealing with all the interactions among all the different pieces of the system that cannot be controlled. Amazon.com's approach is to let these systems mature incrementally, with iterative improvement to yield the desired outcome during a given time period.

The Old, the Old, and the New

In this session's discussions, the first "old" was the principle of abstraction-encapsulation-reuse. Reuse is of increasing importance everywhere as the sheer quantity of valuable software components continues to grow. The second "old" was the repeated quest (now using Web services and increasingly sophisticated software tools) to make component reuse and integration the state of practice.
Progress is being made in both of these areas, as evidenced by investment and anecdotes. The "new" discussed was the view that highly structured, highly engineered systems may have significant limitations. Accordingly, it was argued, "semantic integration," more akin to Internet search, will play a more important role.

There are several global integration agendas. Some involve broad societal goals such as trade, education, health care, and security. At the firm or organization level, there is supply chain integration and N-to-1 integration of many stores focusing on one consumer, as in the case of Amazon and its many partners and vendors. In addition, there is collaborative engineering, multidisciplinary R&D, and much more.

Why is global integration happening? For one thing, it is now technically possible, given ubiquitous networking, faster computers, and new software methodologies. People, organizations, computation, and development are distributed, and networked systems are now accepted as part of life and business, along with the concomitant benefits and risks (including security risks). An emerging trend is the drive to integrate these distributed people and processes to get efficiency and cost-effective development derived from reuse.

Another factor is that there are more software components to integrate and assemble. Pooling of the world's software capital stock is creating heretofore unimaginably creative applications. Software is a major element of the economy. It was noted that by 2004, the amount of U.S. commercial capital stock relating to software, computer hardware, and telecommunications accounted for almost one-quarter of the total capital stock of business; about 40 percent of this is software. Software's real value in the economy could even be understated because of accounting rules (depreciation), price decreases, and improvements in infrastructure and computing power. The IT agenda and societal integration reinforce each other.

Core elements of computer science, such as algorithms and data structures, are building blocks in the integration agenda. The field has been focusing more and more on the creation and assembly of larger, more flexible abstractions. It was suggested that if one accepts that the notion of abstraction-encapsulation-reuse is central, then it might seem that service-oriented computing is a done deal. However, the challenge is in the details: How can the benefits of the integration agenda be achieved throughout society? How are technologists and developers going to create these large abstractions and use them?

When the Internet was developed, some details, such as quality of service and security, were left undone. Similarly, there are open challenges with regard to integration and service-oriented approaches. What are the complete semantics of services? What security inheres in the service being used? What are the failure modes and dependencies? What is the architectural structure of the world's core IT and application services? How does it all play out over time? What is this hierarchy that occurs globally or, for the purposes of this workshop, perhaps even within DoD or within one of the branches of the military?
Service-oriented computing is computing whereby one can create, flexibly deploy, manage, meter and charge for (as appropriate), secure, locate, use, and modify computer programs that define and implement well-specified functions, having some general utility (services), often recursively using other services developed and deployed across time and space, and where computing solutions can be built with a heavy reliance on these services. Progress in service-oriented computing brings together information sharing, programming methodologies, transaction processing, open systems approaches, distributed computing technologies, and Web technologies.

There is now a huge effort on the part of industry to develop application-level standards. In this approach, companies are presented with the definition of some structure that they need to use to interoperate with other businesses, rather than, for example, having multiple individual fiefdoms within each company develop unique customer objects.

The Web services approach generally implies a set of services that can be invoked across a network. For many, Web services comprise things such as Extensible Markup Language (XML) and SOAP (a protocol for exchanging XML-based messages over computer networks; a minimal example is sketched below) along with a variety of Web service protocols that have now been defined and are heavily used, developed, produced, and standardized (many in a partnership between IBM and Microsoft). Web services are on the path to full-scale, service-oriented computing; it was argued that this path can be traced back to the 1960s and the airlines' Sabre system, continuing through Arpanet, the Internet, and the modern World Wide Web.

Web services based on abstraction-encapsulation-reuse are a new approach to applying the structure-oriented engineering tradition to information technology (IT). For example, integration steps include the precise definition of function (analogous to the engineering specifications and standards for transportation system construction), architecture (analogous to bridge design, for example), decomposition, rigorous component production (steel beams, for example), careful assembly, and managed change control. The problem is, there may be limits to this at scale. In software, each of these integration steps is difficult in itself. Many projects are inherently multiorganizational, and rapid changes have dire consequences for traditional waterfall methodologies.
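As a minimal illustration of the XML-based message exchange mentioned above, the sketch below builds and parses a SOAP-style envelope using only Python's standard library. The service name, namespace, and payload are invented; a real Web service would also involve an interface description, transport over a network, and the negotiated protocol stack discussed in the text.

    import xml.etree.ElementTree as ET

    SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
    SVC_NS = "http://example.org/hypothetical-inventory"   # invented namespace

    # Build a SOAP-style request envelope for a hypothetical service operation.
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{SVC_NS}}}GetStockLevel")
    ET.SubElement(request, f"{{{SVC_NS}}}partNumber").text = "XYZ-1234"

    message = ET.tostring(envelope, encoding="unicode")
    print(message)   # this string is what would travel over the wire

    # Parse it back out, as a receiving service implementation would.
    parsed = ET.fromstring(message)
    part = parsed.find(f".//{{{SVC_NS}}}partNumber").text
    print("requested part:", part)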
It was argued that "semantic integration," a dynamic, fuzzier integration more akin to Internet search, will play a larger role in integration than more highly structured engineering of systems. Ad hoc integration is a more humble approach to service-based integration, but it is also more dynamic and interpretive. Components that are integrated may be of lower generality (not a universal object) and quality (not so well specified). Because they will be of lower generality, perhaps with different coordinate systems, there will have to be automated impedance matching between them. Integration may take place on an intermediate service, perhaps in a browser. Businesses are increasingly focusing on this approach for the same reasons that simple approaches have always been favored. This is a core motivational component of the Web 2.0 mash-up focus. Another approach to ad hoc integration uses access to massive amounts of information: with no reasonable set of predefined, parameterized interfaces, annotation and search will be used as the integration paradigm.
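The sketch below gives a toy version of the annotation-and-search style of integration just described: instead of binding to a fixed, predefined interface, a request is matched against keyword annotations attached to available services. The service registry, the annotations, and the scoring rule are all invented for illustration.

    # Hypothetical registry of services, each annotated with free-form keywords.
    SERVICE_REGISTRY = {
        "inventory-lookup": {"stock", "parts", "warehouse", "availability"},
        "shipment-tracker": {"shipping", "tracking", "delivery", "status"},
        "price-quoter":     {"price", "quote", "cost", "parts"},
    }

    def find_services(request: str, top_n: int = 2):
        """Rank services by how many request words match their annotations."""
        words = set(request.lower().split())
        scored = [
            (len(words & annotations), name)
            for name, annotations in SERVICE_REGISTRY.items()
        ]
        scored.sort(reverse=True)
        return [name for score, name in scored[:top_n] if score > 0]

    # No predefined interface is assumed; the request is just annotated text,
    # and the services whose annotations best match it come back first.
    print(find_services("what is the delivery status of my parts"))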
It is likely that there will be tremendous growth in the standards needed to capitalize on the large and growing IT capital plant. There will be great variability from industry to industry and from place to place around the world, depending on the roles of the industry groups involved, differential regulations, applicable types of open source, and national interests. Partnerships between the IT industry and other industries will be needed to share expertise and methodologies for creating usable standards, working with competitors, and managing intellectual property.

A number of topics for service-oriented systems and semantic integration research were identified, some of which overlap with traditional software system challenges. The service-oriented systems research areas and semantic integration research areas spotlighted included these:

• Basics. Is there a practical, normative general theory of consistency models? Are services just a remote procedure call invocation or a complex split between client and server? How are security and privacy to be provided for the real world, particularly if one does not know what services are being called? How does one utilize parallelism? This is an increasingly important question in an era of lessening geometric clock-speed growth.

• Management. With so many components and so much information hiding, how does one manage systems? How does one manage intellectual property?

• Global properties. Can one provide scalability generally? How does one converge on universality and standards without bloat? What systems can one deploy as really universal service repositories?

• Economics. What are realistic costing/charging models and implementations?

• Social networking. How does one apply social networking technology to help?

• Ontologies of vast breadth and scale.

• Automated discovery and transformation.

• Reasoning in the control flow.

• Use of heuristics and redundancy.

• Search as a new paradigm.

Complexity grows despite all that has been done in computer science. There is valuable, rewarding, and concrete work for the field of computer science in combating complexity. This area of work requires focus. It could prove as valuable as direct functional innovation. Participants identified several research areas to address complexity relevant to service-oriented systems and beyond, including: meaning, measuring, methodology, system architecture, science and technology, evolutionary systems design, and legal and cultural change.