2
Summary of Workshop Discussions
SESSION 1: PROCESS, ARCHITECTURE,
AND THE GRAND SCALE
Panelists: John Vu, Boeing, and Rick Selby, Northrop Grumman
Corporation
Moderator: Michael Cusumano
Panelist presentations and general discussions at this session were
intended to explore the following questions from the perspectives of soft-
ware development for government and commercial aerospace systems:
• What are the characteristics of successful approaches to architec-
ture and design for large-scale systems and families of systems?
• Which architecture ideas can apply when systems must evolve
rapidly?
• What kinds of management and measurement approaches could
guide program managers and developers?
Synergies Across Software Technologies and Business Practices
Enable Successful Large-Scale Systems
Context matters in trying to determine the characteristics of success-
ful approaches—different customer relationships, goals and needs, pacing
of projects, and degree of precedent all require different practices. For
example, different best practices may apply depending on what sort of
system or application is under development. Examples discussed include
commercial software products, IT and Internet financial services, air-
planes, and government aerospace systems.
• Different systems and software engineering organizations have different
customers and strategies. They may produce a variety of deliverables, such as
a piece of software, an integrated hardware-software environment, or very
large, complicated, interconnected, hardware-software networked systems.
• Different systems and software engineering organizations have different
goals and needs. Product purposes vary—user empowerment, business
operations, and mission capabilities. Projects can last from a month to 10
or 12 years. The project team can be one person or literally thousands.
The customer agreement can be a license, service-level agreement, or
contract. There can be a million customers or just one—for example, the
government. The managerial focus can be on features and time to market;
cycle time, workflow, and uptime; or reliability, contract milestones, and
interdependencies; and so on.
• While some best practices, such as requirements and design reviews and
incremental and spiral life cycles, are broadly applicable, specific practice
usually varies. Although risk management is broadly applicable, commercial,
financial, and government system developers may adopt different kinds
of risk management. While government aerospace systems developers
may spend months or years doing extensive system modeling, this may
not be possible in other organizations or for other types of products. Com-
mercial software organizations may focus on daily builds (that is, each
day compiling and possibly testing the entire program or system incor-
porating any new changes or fixes); for aerospace systems, the focus may
be on weekly or 60-day builds. Other generally applicable best practices
that vary by market and organization include parallel small teams, design
reuse, domain-specific languages, opportunity management, trade-off
studies, and portability layers. These differences are driven by the differ-
ent kinds of risks that drive engineering decisions in these sectors.
• Government aerospace systems developers, along with other very large
software-development enterprises, employ some distinctive best practices. These
include independent testing teams and, for some aspects of the systems
under consideration, deterministic, simple designs. These practices are
driven by a combination of engineering, risk-management, and contrac-
tual considerations.
In a very large1 organization, synergy across software technologies
and business practices can contribute to success. Participants explored
1Very large in this case means over 100,000 employees throughout a supply chain doing
systems engineering, systems development, and systems management; managing multiple
product lines; and building systems with millions of lines of code.
SOFTWARE-INTENSIVE SYSTEMS AND UNCERTAINTY AT SCALE
the particular case of moderately precedented systems2 and major com-
ponents with control-loop architectures. For systems of this kind there are
technology and business practice synergies that have worked well. Here
are some examples noted by speakers:
• Decomposition of large systems to manage risk. With projects that typi-
cally take between 6 and as many as 24 months to deliver, incremental
decomposition of the system can reduce risk, provide better visibility into
the process, and deliver capability over time. Decomposition accelerates
system integration and testing.
• Table-based design, oriented to a system engineering view in terms of
states and transitions, both nominal and off-nominal. This enables the use of
clear, table-driven approaches to address nominal modes, failure modes,
transition phases, and different operations at different parts of the system
operations.
• Use of built-in, domain-specific (macro) languages in a layered architec-
ture. The built-in, command-sequencing macro language defines table-
driven executable specifications. This permits a relatively stable infra-
structure and a run-time system with low-level, highly deterministic
designs yet extensible functionality. It also allows automated testing of
the systems.
• Use of precedented and well-defined architectures for the task manage-
ment structure that incorporates a simple task structure, deterministic process-
ing, and predictable timelines. For example, a typical three-task management
structure might have high-rate (32 ms) tasks, minor-cycle (128 ms) tasks,
and background tasks. The minor cycle reads and executes commands,
formats telemetry, handles fault protection, and so forth. The high-rate
cycle handles message traffic between the processors. The background
cycle adds capability that takes a longer processing time. This is a reusable
processing architecture that has been used for over 30 years in space-
craft and is aimed at the construction of highly reliable, deterministic
systems.
• Gaining advantages from lack of fault proneness in reused components
by achieving high levels of code, design, and requirement reuse. One example
of code reuse was this: Across 25 NASA ground systems, 32 percent of
software components were either reused or modified from previous sys-
tems (for spacecraft, reuse was said to be as high as 80 percent). Designs
and requirements can also be reused. Typically, there is a large backward
2 Precedent refers to the extent to which we have experience with systems of a similar kind.
More specifically, there can be precedent with respect to requirements, architecture and de-
sign, infrastructure choices, and so on. Building on precedent leads to routinization, reduced
engineering risk, and better predictability (lower variance) of engineering outcomes.
compatibility set of requirements, and these requirements can be reused.
Requirements reuse is very common and very successful even though the
design and implementation might each be achieved differently. Design
reuse might involve allocation of function across processors in terms of
how particular algorithms are structured and implemented. The functions
might be implemented differently in the new system, for example, in com-
ponents rather than custom code or in different programming languages.
This is an example of true design reuse rather than code reuse.
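The three-task management structure mentioned above can be illustrated with a small, table-driven timeline. The following is a minimal sketch only, not the flight software described by the speakers: the 32 ms and 128 ms periods come from the text, while the task names, the `TASK_TABLE` layout, and the rule that background work fills frames in which the minor cycle is not due are assumptions made for illustration.

```python
from collections import Counter

HIGH_RATE_MS = 32      # high-rate period quoted in the text
MINOR_CYCLE_MS = 128   # minor-cycle period quoted in the text

# Hypothetical task table: (task name, period in ms).
# The background task has no period of its own in this sketch.
TASK_TABLE = [
    ("high_rate", HIGH_RATE_MS),      # message traffic between processors
    ("minor_cycle", MINOR_CYCLE_MS),  # commands, telemetry, fault protection
]


def schedule(duration_ms, tick_ms=HIGH_RATE_MS):
    """Return a list of (time_ms, [task names]) frames for a fixed timeline."""
    frames = []
    for t in range(0, duration_ms, tick_ms):
        # A periodic task is due whenever the frame time is a multiple
        # of its period, so the timeline is fully deterministic.
        due = [name for name, period in TASK_TABLE if t % period == 0]
        # Assumption: background work runs only in frames where the
        # minor cycle is not due.
        if "minor_cycle" not in due:
            due.append("background")
        frames.append((t, due))
    return frames


frames = schedule(256)
counts = Counter(name for _, due in frames for name in due)
for t, due in frames:
    print(f"{t:4d} ms: {', '.join(due)}")
```

Because the timeline is computed from a table rather than from run-time events, the same frames are produced on every run, which is the predictability property the speakers emphasized.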
In addition to these synergies, it was suggested that other types of
analyses could also contribute to successful projects. Data-driven statisti-
cal analyses can help to identify trends, outliers, and process improve-
ments to reduce or mitigate defects. For example, higher rates of compo-
nent interactions tend to be correlated with more faults, as well as more
fault-correction effort. Risk analyses prioritize risks according to cost,
schedule, and technical safety impacts. Charts that show project risk
mitigation over time and desired milestones help to define specific tasks
and completion criteria. It was suggested that each individual risk almost
becomes a microcosm of its own project, with schedules and milestones
and progressive mitigation of that risk to add value.
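As an illustration of the data-driven analysis mentioned above, the relationship between component interactions and faults can be checked with a correlation coefficient. This is a minimal sketch under stated assumptions: the `pearson` helper is a standard Pearson correlation written out by hand, and the interaction and fault counts are hypothetical values invented for the example, not data from the workshop.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)


# Hypothetical per-component data (illustrative only).
interactions = [2, 5, 8, 12, 20, 31]   # interfaces each component touches
faults = [0, 1, 1, 3, 4, 7]            # faults recorded against each component

r = pearson(interactions, faults)
print(f"correlation between interactions and faults: {r:.2f}")
```

A coefficient near 1 on real project data would be consistent with the trend described by the speakers, although correlation alone does not show that interactions cause the faults.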
One approach to addressing the challenge of scale is to divide and
conquer. Of course, arriving at an architectural design that supports
decomposition is a prerequisite for this approach, which can apply across
many kinds of systems development efforts. Suggestions included the
following:
• Divide the organization into parallel teams. Divide very large 1,000-person
teams into parallel teams; establish a project rhythm of design
cycles and incremental releases. This division of effort is often based on a
system architectural structure that supports separate development paths
(an example of what is known as Conway’s law—that software system
structures tend to reflect the structures of the organizations they are devel-
oped by). Indeed, without agreement on architectural fundamentals—the
key internal interfaces and invariants in a system—division of effort can
be a risky step.
• Innovate and synchronize. Bring the parallel teams together, whether
the task is a compilation or a component delivery and interface integra-
tion. Then stabilize, usually through some testing and usage period.
• Encourage coarse-grain reuse. There is a lot of focus on very fine-
grain reuse, which tends to involve details about interfaces and depen-
dencies; there is also significant coarse-grain opportunity to bring together
both legacy systems and new systems. A coarse-grain approach makes
possible the accommodation of systems at different levels of maturity.
Examples of success in coarse-grain reuse are major system frameworks
(such as e-commerce frameworks), service-based architectures, and lay-
ered architectures.
• Automate. Automation is needed in the build process, in testing,
and in metrics.
Uncertainty is inherent in the development of software-intensive
systems and must be reassessed constantly, because there are always
unknowns and incomplete information. Waiting for complete information
is costly, and it can take significant time to acquire the information—if it is
possible to acquire it at all. Schedules and budgets are always limited and
almost never sufficient. The goal, it was argued, should be to work effec-
tively and efficiently within the resources that are available and discharge
risks in an order that is appropriate to the goals of the system and the
nature of its operating environment: Establish the baseline design, apply
systematic risk management, and then apply opportunity management,
constantly evaluating the steps needed and making decisions about how
to implement them. Thus, it was suggested that appropriate incentives
and analogous penalty mechanisms at the individual level and at the
organization or supplier level can change behavior quickly. The goal is
thus for the incentive structure to create an opportunity to achieve very
efficient balance through a “self-managing organization.” In a self-man-
aging organization, it was suggested, the leader has the vision and is an
evangelist rather than a micromanager, allowing others to manage using
systematic incentive structures.
Some ways to enable software technology and business practices for
large-scale systems were suggested:
• Creating strategies, architectures, and techniques for the devel-
opment and management of systems and software, taking into account
multiple customers and markets and a broad spectrum of projects, from
small scale through large.
• Disseminating validated experiences and data for best practices
and the circumstances when they apply (for example, titles like “Case
Studies of Model Projects”).
• Aligning big V waterfall-like systems engineering life-cycle models
with incremental/spiral software engineering life-cycle models. 3
3 The V model is a V-shaped, graphical representation of the systems development life
cycle that defines the results to be achieved in a project, describes approaches for developing
these results, and links early stages (on the left side of the V) with evaluation and outcomes
(on the right side). For example, requirements link to operational testing and detailed design
links to unit testing.
• Facilitating objective interoperability mechanisms and benchmarks
for enabling information exchange.
• Lowering entry barriers for research groups and nontraditional
suppliers to participate in large-scale system projects (Grand Challenges,
etc.).
• Encouraging advanced degree programs in systems and software
engineering.
• Defining research and technology roadmaps for systems and soft-
ware engineering.
• Collaborating with foreign software developers.
Process, Architecture, and Very Large-Scale Systems
Remarks during this portion of the session were aimed at thinking
outside the box about what the state of the art in architectures might look
like in the future for very large-scale, complex systems that exhibit unpre-
dictable behavior. The primary context under discussion was large-scale
commercial aircraft development—the Boeing 777 has a few million lines
of code, for example, and the new 787 has several million and climbing.
It was argued that very large-scale, highly complex systems and fami-
lies of systems require new thinking, new approaches, and new processes
to architect, design, build, acquire, and operate. It was noted that these
new systems are going from millions of lines of code to tens of millions of
lines of code (perhaps in 10 years to billions of lines of code and beyond);
from hundreds of platforms (servers) to thousands, all interconnected by
heterogeneous networks; from hundreds of vendors (and subcontractors)
to thousands, all providing code; and from a well-defined user com-
munity to dynamic communities of interdependent users in changing
environments. It was suggested that the issue for the future—10 or 20
years from now—is how to deal with the potential billion lines of code
and tens of thousands of vendors in the very diverse, open-architecture-
environment global products of the future, assembled from around the
world. According to the forward-looking vision presented by speakers,
these systems may have the following characteristics:
• Very large-scale systems would integrate multiple systems, each of
them autonomous, having distinctive characteristics, and performing its
own functions independently to meet distinct objectives.
• Each system would have some degree of intelligence, with the
objectives of enabling it to modify its relationship to other component sys-
tems (in terms of functionality) and allowing it to respond to changes, per-
haps unforeseen, in the environment. When multiple systems are joined
together, the significant emergent capabilities of the resulting system as a
whole would enable common goals and objectives.
• Each very large-scale system would share information among the
various systems in order to address more complex problems.
• As more systems are integrated into a very large-scale system, the
channels connecting all systems through which information flows would
become more robust and continue to grow and expand throughout the life
cycle of the very large-scale system.
It was argued that a key benefit of a very large-scale system is the
interoperability between operational systems that allows decision mak-
ers to make better, more informed decisions more quickly and accurately.
From a strategic perspective, a very large-scale system is an environment
where operational systems have the ability to share information among
geographically distributed systems with appropriate security and to act
on the shared information to achieve their desired business goals and
objectives. From an operational perspective, a very large-scale system
is an environment where each operational subsystem performs its own
functions autonomously, consistent with the overall strategy of the very
large-scale system.
The notion of continuous builds or continuous integration was also
discussed. Software approaches that depend on continuous integration—
that is, where changes are integrated very frequently—require processes
for change management and integration management. These processes
are incremental and build continuously from the bottom up to support
evolution and integration, instead of from the top down, using a plan-
driven, structured design. They separate data and functions for faster
updates and better security. To implement these processes, decentralized
organizations and an evolving concept of operations are required to adapt
quickly to changing environments.
The overall architectural framework for large-scale systems described
by some participants in this session consists of five elements:
• Governance. These describe the rules, regulations, and change management
that control the total system.
• Operational. These describe how each operational system can be
assembled from preexisting or new components (building blocks) to oper-
ate in their own new environment so they can adapt to change.
• Interaction. These describe the communication (information pipe-
line) and interaction between operational systems that may affect the very
large system and how the very large system will react to the inputs from
the operational systems.
• Integration and change management. These describe the processes
for managing change and the integration of systems that enable emergent
capabilities.
• Technical. These depict the technology components that are neces-
sary to support these systems.
It was suggested that large-scale systems of the future that will cope
with scale and uncertainty would be built from the bottom up by start-
ing with autonomous building blocks to enable the rapid assembly and
integration of these components to effectively evolve the very large-scale
system. The architectural framework would ensure that each building
block would be aligned to the total system. Building blocks would be
assembled by analyzing a problem domain through the lens of an opera-
tional environment or mission for the purpose of creating the character-
istics and functionality that would satisfy the stakeholders’ requirements.
In this mission-focused approach, all stakeholders and modes of opera-
tions should be clearly identified; different user viewpoints and needs
should be gathered for the proposed system; and stakeholders must state
which needs are essential, which are desirable, and which are optional.
Prioritization of stakeholders’ needs is the basis for the development of
such systems; vague and conflicting needs, wants, and opinions should be
clarified and resolved; and consensus should be built before assembling
the system.
At the operational level, the system would be separated from current
rigid organization structures (people, processes, technology) and would
evolve into a dynamic concept of operation by assembling separate build-
ing blocks to meet operational goals. The system manager should ask:
What problem will the system solve? What is the proposed system used
for? Should the existing system be improved and updated by adding
more functionality or should it be replaced? What is the business case?
To realize this future, participants suggested that research is needed in
several areas, including these:
• Governance (rules and regulations for evolving systems).
• Interaction and communication among systems (including the pos-
sibility of negative interactions between individual components and the
integrity, security, and functioning of the rest of the system).
• Integration and change management.
• User’s perspective and user-controlled evolution.
• Technologies supporting evolution.
• Management and acquisition processes.
• An architectural structure that enables emergence.
• Processes for decentralized organizations structured to meet opera-
tional goals.
SESSION 2: DOD SOFTWARE CHALLENGES FOR
FUTURE SYSTEMS
Panelists: Kristen Baldwin, Office of the Under Secretary of Defense
for Acquisition, Technology and Logistics, and Patrick Lardieri,
Lockheed Martin
Moderator: Douglas Schmidt
Panelist presentations and general discussions during this session
were intended to explore two questions, from two perspectives: that of
the government and that of the government contractor:
• How are challenges for software in DoD systems, particularly
cyber-physical systems, being met in the current environment?
• What advancements in R&D, technology, and practices are needed
to achieve success as demands on software-intensive system development
capability increase, particularly with respect to scale, complexity, and the
increasingly rapid evolution in requirements (and threats)?
DoD Software Engineering and System Assurance
An overview of various activities relating to DoD software engineering
was given. Highlights from the presentation and discussion follow. The
recent Acquisition & Technology reorganization is aimed at positioning
systems engineering within the DoD, consistent with a renewed emphasis
on software. The director of Systems and Software Engineering now reports
directly to the Under Secretary of Defense for Acquisition and Technology.
The mission of Systems and Software Engineering, which addresses evolv-
ing system—and software—engineering challenges, is as follows:
• Shape acquisition solutions and promote early technical planning.
• Promote the application of sound systems and software engineer-
ing, developmental test and evaluation, operational test and evaluation to
determine operational suitability and effectiveness, and related technical
disciplines across DoD’s acquisition community and programs.
• Raise awareness of the importance of effective systems engineering
and raise program planning and execution to the state of the practice.
• Establish policy, guidance, best practices, education, and train-
ing in collaboration with the academic, industrial, and government
communities.
• Provide technical insight to program managers and leadership to
support decision making.
DoD’s Software Center of Excellence is made up of a community of
participants including industry, DoD-wide partnerships, national part-
nerships, and university and international alliances. It will focus on sup-
porting acquisition; improving the state of the practice of software engi-
neering; providing leadership, outreach, and advocacy for the systems
engineering communities; and fostering resources that can meet DoD
goals. These are elements of DoD’s strategy for software, which aims to
promote world-class leadership for DoD software engineering.
Findings from some 40 recent program reviews were discussed. These
reviews identified seven systemic issues and issue clusters that had con-
tributed to DoD’s poor execution of its software program, which were
highlighted in the session discussion. The first issue is that software
requirements are not well defined, traceable, and testable. A second issue
cluster involves immature architectures; integration of commercial-off-
the-shelf (COTS) products; interoperability; and obsolescence (the need
to refresh electronics and hardware). The third cluster involves software
development processes that are not institutionalized, have missing or
incomplete planning documents, and inconsistent reuse strategies. A
fourth issue is software testing and evaluation that lacks rigor and breadth.
The fifth issue is lack of realism in compressed or overlapping schedules.
The sixth issue is that lessons learned are not incorporated into successive
builds—they are not cumulative. Finally, software risks and metrics are
not well defined or well managed.
To address these issues, DoD is pursuing an approach that includes
the following elements:
• Identification of software issues and needs through a software industrial
base assessment,4 a National Defense Industrial Association (NDIA) workshop
on top software issues, and a defense software strategy summit. The industrial
base assessment, performed by CSIS, found that the lack of comprehensive,
accurate, timely, and comparable data about software projects within DoD
limits the ability to undertake any bottom-up analysis or enterprise-wide
assessments about the demand for software. Although the CSIS analy-
sis suggests that the overall pool of software developers is adequate, the
CSIS assessment found an imbalance in the supply of and demand for the
specialized, upper echelons of software developer/management cadres.
These senior cadres can be grown, but it takes time (10 or more years) and
4 Center for Strategic and International Studies (CSIS), Defense-Industrial Initiatives Group,
2006. Software Industrial Base Assessment: Phase I Report, October 4.
a concerted strategy. In the meantime, management/architecture/systems
engineering tools might help improve the effectiveness of the senior cadres.
Defense business system/COTS software modification also places stress on
limited pools of key technical and management talent. Moreover, the true
cost and risk of software maintenance deferral are not fully understood.
• Creation of opportunities and partnerships through an established network
of government software points of contact; chartering the NDIA Software
Committee; information exchanges with government, academia, and industry,
and planning a systems and software technology conference. Top issues emerging
from the NDIA Defense Software Strategy Summit in October 2006
included establishment and management of software requirements, the
lack of a software voice in key system decisions, inadequate life-cycle
planning and management, the high cost and ineffectiveness of traditional
software verification methods, the dearth of software management exper-
tise, inadequate technology and methods for assurance, and the need for
better techniques for COTS assessment and integration.
• Execution of focused initiatives such as Capability Maturity Model Integration
(CMMI) support for integrity and acquisition, a CMMI guidebook, a
handbook on engineering for system assurance, a systems engineering guide
for systems of systems (SoSs), the provision of software support to acquisition
programs, and a vision for acquisition reform. SoSs to be used for defense
require special considerations for scale (a single integrated architecture is
not feasible), ownership and management (individual systems may have
different owners), legacy (given budget considerations, current systems
will be around for a long time), changing operations (SoS configurations
will face changing and unpredictable operational demands), criticality
(systems are integrated via software), and role of the network (SoSs will
be network-based, but budget and legacy challenges may make imple-
mentation uneven). To address a complex SoS, an initial (incremental)
version of the DoD’s SoS systems engineering guide is being piloted;
future versions will address enterprise and net-centric considerations,
management, testing, and sustaining engineering.
The issue of system assurance—reducing the vulnerability of systems
to malicious tampering or access—was noted as a fundamental consid-
eration, to the point that cybertrust considerations can be a fundamental
driver of requirements, architecture and design, implementation practice,
and quality assurance.5 Because current assurance, safety, and protection
5A separate National Research Council study committee is exploring the issue of cyberse-
curity research and development broadly, and its report, Toward a Safer and More Secure Cyber-
space, will be published in final form in late 2007. See http://cstb.org/project_cybersecurity
for more information.
tivity increase, it was argued that the assurance bar for software quality
and cybersecurity attributes can be raised by (1) raising the component
assurance bar (resources are finite and organizations can spend too much
time and too many resources trying to patch their way to security) and
(2) getting customers to understand and accept that assurance for custom
software can be raised if they are willing to pay more (if customers do
not know about costs that are hidden, they cannot accept or budget for
them).
One set of best practices and technologies to write secure software
was described. It includes
• Secure coding standards,
• Developer training in secure coding,
• Enabled, embedded security points of contact (the “missionary
model”),
• Security as part of development including functional, design, test
(include threat modeling),
• Regressions (including destructive security tests),
• Automated tools (home grown, commercial of multiple flavors),
• Locked-down configurations (delivering products that are secure
on installation), and
• Release criteria for security.
However, these practices are not routinely taught in universities. Neither
the software profession nor the industry as a whole can simply rely
on a few organizations doing these kinds of things. Discussion identified
some necessary changes in the long run:
• University curricula. It was argued that university programs should
do a better job of teaching secure coding practices and training future
developers to pay attention to security as part of software development. If
the mindset of junior developers does not change, the problem will not be
solved. One participant said, “Process won’t fix stupidity or arrogance.”
Incentives to be mindful of security should be integrated throughout the
curriculum. When security is embedded throughout the development
process, a small core of security experts is not sufficient. One challenge
is how to balance the university focus on enduring knowledge and skills
against the need for developers to understand particular practices and
techniques specific to current technologies.
• Automation. Automated tools are promising and will be increas-
ingly important, but they are not a cure-all. Automated tools are not yet
ready for universal prime time for a number of reasons, including: Tools
need to be trained to understand the code base; programmers have difficulty
establishing sound and complete rules; most of today’s tools look
only for anticipated vulnerabilities (e.g., buffer overruns) and cannot be
readily adapted to new classes of vulnerabilities; there are often too many
false positives; scalability is an issue; one size does not fit all (it is prema-
ture for standards) and therefore multiple tools are needed; and there is
not a good system for rating tools.
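Two of the limitations listed above, false positives and blindness to unanticipated classes of vulnerability, can be seen even in a toy checker. The sketch below is not any real tool discussed at the workshop: the `RULES` table, the `scan` function, and the C fragment in `SAMPLE` are all hypothetical, and the single rule simply matches a few C library calls commonly associated with buffer overruns.

```python
import re

# Hypothetical rule set: one anticipated vulnerability class only.
RULES = {
    "unbounded-copy": re.compile(r"\b(strcpy|strcat|gets|sprintf)\s*\("),
}


def scan(source):
    """Return (line number, rule name) for every line matching a rule."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for rule, pattern in RULES.items():
            if pattern.search(line):
                findings.append((lineno, rule))
    return findings


# Hypothetical C fragment used as input.
SAMPLE = '''\
char buf[16];
strcpy(buf, user_input);            /* real risk: unbounded copy */
/* strcpy(buf, "ok"); disabled */   /* flagged although commented out */
system(user_input);                 /* different class: no rule, so missed */
'''

findings = scan(SAMPLE)
for lineno, rule in findings:
    print(f"line {lineno}: {rule}")
```

The checker reports lines 2 and 3 and says nothing about line 4: the commented-out call is a false positive, and the command-injection risk is missed because no rule anticipated it.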
Conventional wisdom holds that people will not pay more for secure
software. However, people already are paying for bad security—a 2002
study by the National Institute of Standards and Technology (NIST)
reported that the consequences of bad software cost $59 billion a year in
the United States.20 It was argued that from a development standpoint,
security cost-effectiveness should be measured pragmatically. However,
a simple return on investment (ROI) is not the right metric. From a devel-
oper’s perspective, the goal should be the highest net present value (NPV)
for cost avoidance of future bugs—not raw cost savings or the ROI from
fixing bugs in the current year. Another way of valuing security is oppor-
tunity cost savings—what can be done (e.g., building new features) with
the resources saved from not producing patches. From the customer’s
perspective, it is the life-cycle cost of applying a patch weighed against
the expected cost of the harm from not applying the patch. Customers
want predictable costs, and the perception is that they cannot budget
for the cost of applying patches (even though the real cost driver is the
consequences of unpatched systems). If customers know what they are
getting, they can plan for a certain level of risk at a certain cost. The goal
is to find the match between expected risk for the customer and for the
vendor—how suitable the product is for its intended use.
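
As a rough illustration of the valuation argument above, the following Python sketch computes the net present value of avoided future bug costs and compares a customer's life-cycle patch cost with the expected harm of not patching. The function names, the discount rate, and all dollar figures are hypothetical, chosen only for illustration; none of them come from the workshop discussion.

```python
def npv_of_cost_avoidance(avoided_costs, discount_rate, upfront_investment):
    """NPV of a security investment: discounted avoided costs minus the investment.

    avoided_costs[i] is the bug-related cost avoided in year i + 1
    (hypothetical figures supplied by the caller).
    """
    discounted = sum(
        cost / (1.0 + discount_rate) ** year
        for year, cost in enumerate(avoided_costs, start=1)
    )
    return discounted - upfront_investment


def should_apply_patch(patch_lifecycle_cost, breach_probability, breach_cost):
    """Customer's view: patch when the expected harm of not patching
    exceeds the life-cycle cost of applying the patch."""
    expected_harm = breach_probability * breach_cost
    return expected_harm > patch_lifecycle_cost


if __name__ == "__main__":
    # Hypothetical numbers: three years of avoided costs, 10 percent discount rate.
    npv = npv_of_cost_avoidance([100_000, 100_000, 100_000], 0.10, 200_000)
    print(f"NPV of cost avoidance: {npv:,.2f}")
    print("Apply patch:", should_apply_patch(5_000, 0.02, 1_000_000))
```

A single-year ROI would look only at the first term of the sum. The NPV form also credits the avoided costs in later years, which is the distinction the paragraph draws.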
Certification is a way of assessing what is “good.”21 But partici-
pants were not optimistic when considering prospects for certification
of development processes. There is too much disagreement and ideol-
ogy surrounding development processes. However, there can be some
commonality around aspects of good development processes. Certifying
developers is also problematic. In engineering, there are accredited degree
programs and clear licensing requirements. The awarding of a degree in
computer science is not analogous to licensing an engineer because there
is not the same common set of requirements, especially robustness and
safety requirements. In addition, it can be difficult to replicate the results
20 See NIST, 2002, “Planning Report 02-3: The economic impacts of inadequate
infrastructure for software testing.” Available online at
http://www.nist.gov/director/prog-ofc/report02-3.pdf.
21 A recent NRC study examines the issue of certification and dependability of software
systems. See information on the report Software for Dependable Systems: Sufficient Evidence?
at http://cstb.org/project_dependable.
of software engineering processes, making it hard to achieve confidence
such that developers are willing to sign off on their work. Moreover, it
was argued that with current curricula, developers generally do not even
learn the basics of secure coding practice. There is little to no focus on
security, safety, or the possibility that the code is going to be attacked in
most educational programs. It was argued that curricula need to change
and that computer science graduates should be taught to “assume an
enemy.”
Automated tools can give better assurance to the extent that ven-
dors use them in development and fix what they find. Running evaluation
tools after the fact on something already in production is not optimal.22 It
was suggested that there is potential for some kind of “goodness meter”
(a complement to the “badness meter” described in the next section) for
tool use and effectiveness—what tool was used, what the tool can and
cannot find, what the tool did and did not find, the amount of code cov-
ered, and that tool use was verified by a third party.
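
One way to picture the suggested “goodness meter” is as a structured record of a single tool run. The Python sketch below is only an illustration: the class name `ToolUseReport`, its fields, and the example vulnerability classes are hypothetical and do not describe any existing tool or rating scheme.

```python
from dataclasses import dataclass, field


@dataclass
class ToolUseReport:
    """Hypothetical 'goodness meter' record for one analysis-tool run."""
    tool_name: str                   # what tool was used
    detectable_classes: set          # what the tool can find
    undetectable_classes: set        # what the tool cannot find
    findings: list = field(default_factory=list)  # what the tool did find
    lines_total: int = 0             # size of the code base
    lines_analyzed: int = 0          # amount of code covered
    verified_by_third_party: bool = False

    def coverage(self) -> float:
        """Fraction of the code base the tool actually covered."""
        if self.lines_total == 0:
            return 0.0
        return self.lines_analyzed / self.lines_total

    def classes_checked_clean(self) -> set:
        """Detectable vulnerability classes for which nothing was reported."""
        found = {finding["vuln_class"] for finding in self.findings}
        return self.detectable_classes - found


if __name__ == "__main__":
    report = ToolUseReport(
        tool_name="example-analyzer",
        detectable_classes={"buffer overrun", "format string"},
        undetectable_classes={"race condition"},
        findings=[{"vuln_class": "buffer overrun", "file": "a.c"}],
        lines_total=1000,
        lines_analyzed=800,
        verified_by_third_party=True,
    )
    print(report.coverage(), report.classes_checked_clean())
```

Keeping the undetectable classes in the record makes explicit that a clean result covers only the classes the tool was able to look for.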
Software Security: Building Security In
Discussions in this session focused on software security as a systems
problem as opposed to an application problem. In the current state of the
practice, certain attributes of software make software security a challenge:
(1) connectivity—the Internet is everywhere and most software is on it
or connected to it; (2) complexity—networked, distributed, mobile code
is hard to develop; and (3) extensibility—systems evolve in unexpected
ways and are changed on the fly. This combination of attributes also con-
tributes to the rise of malicious code.
Massively multiplayer online games (MMOGs) are bellwethers of
things to come in terms of sophisticated attacks and exploitation of vul-
nerabilities. These games experience the cutting edge of what is going on
in software hacking and attacks today.23 Attacks against such games are
22 It was suggested that vendors should not be required to vet products against numerous
tools. It was also suggested that there is a need for some sort of Common Criteria reform
with mutual recognition in multiple venues, eliminating the need to meet both Common
Criteria and testing requirements. Vendors, for example, want to avoid having to give gov-
ernments the source code for testing, which could compromise intellectual property, and
want to avoid revealing specifics on vulnerabilities (which may raise security issues and also
put users of older versions of the code more at risk). Common Criteria is an international
standard for computer security. Documentation for it can be found at
http://www.niap-ccevs.org/cc-scheme/cc_docs/.
23 World of Warcraft, for example, was described as essentially a global information grid
with approximately 6 million subscribers and half a million people playing in real time at
any given time. It has its own internal market economy, as well as a significant black market
economy.
also at the forefront of so-called rootkit24 technology. Examining attacks
on large-scale games may be a guide to what is likely to happen in
the non-game world. It was suggested that in 2006, security started to
become a differentiator among commercial products. Around that time,
companies began televising ads about security and explicitly offering
security features in new products. Customers were more open to the idea
of using multiple vendors to take advantage of diversity in features and
suppliers.
Security problems are complicated. There is a difference between
implementation bugs such as buffer overflows or unsafe system calls,
and architectural flaws such as compartmentalization problems in design
or insecure auditing. As much attention needs to be paid to looking for
architectural or requirements flaws as is paid to looking for coding bugs.
Although progress is being made in automation, both processes still need
people in the loop. When a tool turns up bugs or flaws, it gives some
indication of the “badness” of the code—a “badness-o-meter” of sorts. But
when use of a tool does not turn up any problems, this is not an indica-
tion of the “goodness” of the code. Instead, one is left without much new
knowledge at all.
Participants emphasized that software security is not application
security. Software is everywhere—not just in the applications. Software
is in the operating system, the firewall, the intrusion detection system, the
public key infrastructure, and so on. These are not “applications.” Appli-
cation security methods work from the outside in. They work for COTS
software, require relatively little expertise, and are aimed at protecting
installed software from harm and malicious code. System software secu-
rity works from the inside out, with input into and analysis of design and
implementation, and requires a lot of expertise.
In one participant’s view, security should also be thought of as an
emergent property of software, just like quality. It cannot be added on. It
has to be designed in. Vendors are placing increased emphasis on security,
and most customers have a group devoted to software security. It was
suggested that the tools market is growing, for both application security
(a market of between $60 million and $80 million) and software security
(a market of about $20 million, mostly for static analysis tools). Consult-
ing services, however, dwarf the tools market. One speaker described the
“three pillars” of software security:
24 A rootkit is a set of software tools that can allow hackers to continue to gain undetected,
unauthorized access to a system following an initial, successful attack by concealing pro-
cesses, files, or data from the operating system.
• Risk management, tied to the mission or line of business. Financial
institutions such as banks and credit card consortiums are in the lead
here, in part because Sarbanes-Oxley made banks recognize their software
risk.
• Touchpoints, or best practices. The top two are code review with a
tool and architectural risk analysis.
• Knowledge, including enterprise knowledge bases about security
principles, guidelines, and rules; attack patterns; vulnerabilities; and his-
torical risks.
SESSION 5: ENTERPRISE SCALE AND BEYOND
Panelists: Werner Vogels, Amazon.com, and Alfred Spector,
AZS-Services
Moderator: Jim Larus
The speakers at this session focused on the following topics, from the
perspective of industry:
• What are the characteristics of successful approaches to addressing
scale and uncertainty in the commercial sector, and what can the defense
community learn from this experience?
• What are the emerging software challenges for large-scale enter-
prises and interconnected enterprises?
• What do you see as emerging technology developments that relate
to this?
Life Is Not a State-Machine:
The Long Road from Research to Production
Discussions during this session centered on large-scale Web opera-
tions, such as that of Amazon.com, and what lessons about scale and
uncertainty can be drawn from them. It was argued that in some ways,
software systems are similar to biological systems. Characteristics and
activities such as redundancy, feedback, modularity, loose coupling, purg-
ing, apoptosis (programmed cell death), spatial compartmentalization,
and distributed processing are all familiar to software-intensive systems
developers, and yet these terms can all be found in discussions of robust-
ness in biological systems. It was suggested that there may be useful les-
sons to be drawn from that analogy.
Amazon.com is very large in scale and scope of operations: It has
seven Web sites; more than 61 million active customer accounts and
over 1.1 million active seller accounts, plus hundreds of thousands of
registered associates; over 200,000 registered Web services developers;
over 12,500 employees worldwide; and more than 20 fulfillment centers
worldwide. About 30 percent of Amazon’s sales are made by third-party
sellers; almost half of its sales are to buyers outside the United States.
On a peak shipping day in 2006, Amazon made 3.4 million shipments.
Amazon.com’s technical challenges include how to manage millions of
commodity systems, how to manage many very large, geographically
dispersed facilities in concert, how to manage thousands of services run-
ning on these systems, how to ensure that the aggregate of these services
produces the desired functionality, and how to develop services that can
exploit commodity computing power. It, like other companies providing
similar kinds of services, faces challenges of scale and uncertainty on an
hourly basis.
Over the years, Amazon has undergone numerous transformations—from
retailer to technology provider, from single application to platform, from Web
site and database to a massively distributed parallel system, from Web site to
Web service, from enterprise scale to Web scale. Amazon’s approach to man-
aging massive scale can be thought of as “controlled chaos.” It continuously
uses probabilistic and chaotic techniques to monitor business patterns and
how its systems are performing. As its lines of business have expanded these
techniques have had to evolve—for example, focusing on tracking customer
returns as a negative metric does not work once product lines expand into
clothing (people are happy to order multiple sizes, keep the one that fits, and
return the rest).
Amazon builds almost all of its own software because the commercial
and open source infrastructure available now does not suit Amazon.com’s
needs. The old technology adoption life cycle from product development
to useful acceptance was between 5 and 25 years. Amazon and similar
companies are trying to accelerate this cycle. However, it was suggested
that for an Amazon developer to select and use a research technology is
almost impossible. In research, there are too many possibilities to choose
from, experiments are unrealistic compared to real life, and underly-
ing assumptions are frequently too constrained. In real life, systems are
unstable, parameters change and things fail continuously, perturbations
and disruptions are frequent, there are always malicious actors, and fail-
ures are highly correlated. In the real world, when the system fails, the
mission of the organization cannot stop—it must continue.25
Often, complexity is introduced to manage uncertainty. However,
there may well exist what one speaker called “conservation laws of com-
plexity.” That is, in a complex interconnected system, complexity cannot
25 Examples of systems where assumptions did not match real life include the Titanic, the
Tacoma Narrows bridge, and the Estonian ferry disaster.
be reduced absolutely; it can only be moved around. If uncertainty is not
taken into account in large-scale system design, it makes adoption of the
chosen technology fairly difficult. Engineers in real life are used to deal-
ing with uncertainty. Assumptions are often added to make uncertainty
manageable. At Amazon, the approach is to apply Occam’s razor: If there
are competing systems to choose from, pick the system that has the fewest
assumptions. In general, assumptions are the things that are really limit-
ing and could limit the system’s applicability to real life.
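
The selection rule described above can be stated in a few lines. This is a minimal sketch, assuming each candidate system comes with an explicit list of its assumptions; the function name, system names, and assumption strings are all hypothetical.

```python
def pick_fewest_assumptions(candidates):
    """Return the name of the candidate that rests on the fewest stated assumptions.

    candidates maps a system name to a list of assumption strings.
    Ties go to the candidate listed first, because min() returns the
    first minimum it encounters.
    """
    if not candidates:
        raise ValueError("no candidate systems to choose from")
    return min(candidates, key=lambda name: len(candidates[name]))


if __name__ == "__main__":
    # Hypothetical candidates and assumptions, for illustration only.
    systems = {
        "system_a": ["clocks are synchronized", "failures are independent",
                     "network is reliable"],
        "system_b": ["messages may be delayed"],
    }
    print(pick_fewest_assumptions(systems))
```

Counting assumptions is a crude proxy. In practice, a single strong assumption (for example, that failures are independent) may be more limiting than several weak ones.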
Two different engineering approaches were contrasted, one with the
goal of building the best possible system (the “right” system) whatever
the cost, and the other with the more modest goal of building a smaller,
less-ambitious system that works well and can evolve. The speaker char-
acterized the former as being incredibly difficult, taking a long time
and requiring the most sophisticated hardware. By contrast, the latter
approach can be faster, it conditions users to expect less, and it can, over
time, be improved to a point where performance almost matches that of
the best possible system.
It was also argued that traditional management does not work for
complex software development, given the lack of inspection and control.
Control requires determinism, which is ultimately an illusion. Amazon’s
approach is to optimize team communication by reducing team size to a
maximum of 8 to 10 people (a “two-pizza team”). For larger problems,
decompose the problem and reduce the size of the team needed to tackle
the subproblems to a two-pizza group. If this cannot be done, it was sug-
gested, then do not try to solve that problem: it’s too complicated.
A general lesson that was promoted during this session was to let
go of control and the notion that these systems can be controlled. Large
systems cannot be controlled—they are not deterministic. For various
reasons, it is not possible to consider all the inputs. Some may not have
been included in the original design; requirements may have changed;
the environment may have changed. There may be new networks and/or
new controllers. The problem is not one of control; it is dealing with all
the interactions among all the different pieces of the system that cannot
be controlled. Amazon.com’s approach is to let these systems mature
incrementally, with iterative improvement to yield the desired outcome
during a given time period.
The Old, the Old, and the New
In this session’s discussions, the first “old” was the principle of
abstraction-encapsulation-reuse. Reuse is of increasing importance every-
where as the sheer quantity of valuable software components continues to
grow. The second “old” was the repeated quest (now using Web services
and increasingly sophisticated software tools) to make component reuse
and integration the state of practice. Progress is being made in both of
these areas, as evidenced by investment and anecdotes. The “new” dis-
cussed was the view that highly structured, highly engineered systems
may have significant limitations. Accordingly, it was argued, “semantic
integration,” more akin to Internet search, will play a more important
role.
There are several global integration agendas. Some involve broad
societal goals such as trade, education, health care, and security. At the
firm or organization level, there is supply chain integration and N to 1
integration of many stores focusing on one consumer, as in the case of
Amazon and its many partners and vendors. In addition, there is collaborative
engineering, multidisciplinary R&D, and much more.
Why is global integration happening? For one thing, it is now tech-
nically possible, given ubiquitous networking, faster computers, and new
software methodologies. People, organizations, computation, and devel-
opment are distributed, and networked systems are now accepted as
part of life and business, along with the concomitant benefits and risks
(including security risks). An emerging trend is the drive to integrate
these distributed people and processes to get efficiency and cost-effective
development derived from reuse.
Another factor is that there are more software components to inte-
grate and assemble. Pooling of the world’s software capital stock is creat-
ing heretofore unimaginably creative applications. Software is a major
element of the economy. It was noted that by 2004, the amount of U.S.
commercial capital stock relating to software, computer hardware, and
telecommunications accounted for almost one-quarter of the total capital
stock of business; about 40 percent of this is software. Software’s real
value in the economy could even be understated because of accounting
rules (depreciation), price decreases, and improvements in infrastructure
and computing power. The IT agenda and societal integration reinforce
each other.
Core elements of computer science, such as algorithms and data struc-
tures, are building blocks in the integration agenda. The field has been
focusing more and more on the creation and assembly of larger, more
flexible abstractions. It was suggested that if one accepts that the notion
of abstraction-encapsulation-reuse is central, then it might seem that ser-
vice-oriented computing is a done deal. However, the challenge is in
the details: How can the benefits of the integration agenda be achieved
throughout society? How are technologists and developers going to create
these large abstractions and use them?
When the Internet was developed, some details—such as quality of
service and security—were left undone. Similarly, there are open chal-
lenges with regard to integration and service-oriented approaches. What
are the complete semantics of services? What security inheres in the ser-
vice being used? What are the failure modes and dependencies? What is
the architectural structure of the world’s core IT and application services?
How does it all play out over time? What is this hierarchy that occurs
globally or, for the purposes of this workshop, perhaps even within DoD
or within one of the branches of the military?
Service-oriented computing is computing whereby one can create,
flexibly deploy, manage, meter and charge for (as appropriate), secure,
locate, use, and modify computer programs that define and implement
well-specified functions, having some general utility (services), often
recursively using other services developed and deployed across time and
space, and where computing solutions can be built with a heavy reliance
on these services. Progress in service-oriented computing brings together
information sharing, programming methodologies, transaction process-
ing, open systems approaches, distributed computing technologies, and
Web technologies.
There is now a huge effort on the part of industry to develop appli-
cation-level standards. In this approach, companies are presented with
the definition of some structure that they need to use to interoperate with
other businesses, rather than, for example, having multiple individual
fiefdoms within each company develop unique customer objects.
The Web services approach generally implies a set of services that
can be invoked across a network. For many, Web services comprise things
such as Extensible Markup Language (XML) and SOAP (a protocol for
exchanging XML-based messages over computer networks) along with a
variety of Web service protocols that have now been defined and are heav-
ily used, developed, produced, and standardized (many in a partnership
between IBM and Microsoft). Web services are on the path to full-scale,
service-oriented computing; it was argued that this path can be traced
back to the 1960s and the airlines’ Sabre system, continuing through
Arpanet, the Internet, and the modern World Wide Web.
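
To make the XML and SOAP description above concrete, the following Python sketch builds a SOAP 1.1 envelope with the standard library and parses it back. It is an illustration only: the service namespace `http://example.com/orders`, the operation `GetOrderStatus`, and its parameter are hypothetical, and no network call is made.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SVC_NS = "http://example.com/orders"  # hypothetical service namespace


def build_request(operation, params):
    """Build a SOAP 1.1 envelope whose body holds one operation element."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, f"{{{SVC_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="unicode")


def _local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.split("}", 1)[1]


def parse_request(xml_text):
    """Recover the operation name and its parameters from an envelope."""
    envelope = ET.fromstring(xml_text)
    body = envelope.find(f"{{{SOAP_ENV}}}Body")
    op = body[0]  # first (and only) child of the body is the operation
    return _local_name(op.tag), {_local_name(c.tag): c.text for c in op}


if __name__ == "__main__":
    message = build_request("GetOrderStatus", {"orderId": 42})
    print(message)
    print(parse_request(message))
```

The receiver needs only the agreed message structure, not the sender's implementation, which is the abstraction-encapsulation-reuse idea applied at the network boundary.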
Web services based on abstraction-encapsulation-reuse are a new
approach to applying the structure-oriented engineering tradition to informa-
tion technology (IT). For example, integration steps include the precise
definition of function (analogous to the engineering specifications and
standards for transportation system construction), architecture (analo-
gous to bridge design, for example), decomposition, rigorous component
production (steel beams, for example), careful assembly, and managed
change control. The problem is, there may be limits to this at scale. In
software, each of these integration steps is difficult in itself. Many projects
are inherently multiorganizational, and rapid changes have dire conse-
quences for traditional waterfall methodologies.
It was argued that “semantic integration,” a dynamic, fuzzier inte-
gration more akin to Internet search, will play a larger role in integration
than more highly structured engineering of systems. Ad hoc integration
is a more humble approach to service-based integration, but it is also
more dynamic and interpretive. Components that are integrated may
be of lower generality (not a universal object) and quality (not so well
specified). Because they will be of lower generality, perhaps with dif-
ferent coordinate systems, there will have to be automated impedance
matching between them. Integration may take place on an intermediate
service, perhaps in a browser. Businesses are increasingly focusing on this
approach for the same reasons that simple approaches have always been
favored. This is a core motivational component of the Web 2.0 mash-up
focus. Another approach to ad hoc integration uses access to massive
amounts of information—with no reasonable set of predefined, param-
eterized interfaces, annotation and search will be used as the integration
paradigm.
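
The automated matching between loosely specified components mentioned above can be sketched as a registry of converters that an intermediate service consults. In this Python sketch, units of length stand in for the “different coordinate systems” of the text. The record layout, the registry, and the function names are hypothetical, not a description of any system discussed at the workshop.

```python
# Registry of converters between representations; keys are (source, target) labels.
# One foot is exactly 0.3048 meters.
CONVERTERS = {
    ("feet", "meters"): lambda value: value * 0.3048,
    ("meters", "feet"): lambda value: value / 0.3048,
}


def match(record, target_unit):
    """Return a copy of record with its value expressed in target_unit."""
    source_unit = record["unit"]
    if source_unit == target_unit:
        return dict(record)
    try:
        convert = CONVERTERS[(source_unit, target_unit)]
    except KeyError:
        raise ValueError(
            f"no converter from {source_unit} to {target_unit}"
        ) from None
    return {**record, "value": convert(record["value"]), "unit": target_unit}


def integrate(records, target_unit="meters"):
    """Combine records from differently specified sources into one common unit."""
    return [match(record, target_unit) for record in records]


if __name__ == "__main__":
    # Hypothetical records from two independently built components.
    combined = integrate([
        {"source": "component_a", "value": 100.0, "unit": "feet"},
        {"source": "component_b", "value": 30.48, "unit": "meters"},
    ])
    print(combined)
```

Neither component has to adopt a universal object model; the adaptation lives in the intermediate layer, where new converters can be added as new sources appear.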
It is likely that there will be tremendous growth in the standards
needed to capitalize on the large and growing IT capital plant. There
will be great variability from industry to industry and from place to
place around the world, depending on the roles of the industry groups
involved, differential regulations, applicable types of open source, and
national interests. Partnerships between the IT industry and other indus-
tries will be needed to share expertise and methodologies for creating
usable standards, working with competitors, and managing intellectual
property.
A number of topics for service-oriented systems and semantic inte-
gration research were identified, some of which overlap with traditional
software system challenges. The service-oriented systems research areas
and semantic integration research areas spotlighted included these:
• Basics. Is there a practical, normative general theory of consistency
models? Are services just a remote procedure call invocation or a complex
split between client and server? How are security and privacy to be pro-
vided for the real world, particularly if one does not know what services
are being called? How does one utilize parallelism? This is an increasingly
important question in an era of lessening geometric clock-speed growth.
• Management. With so many components and so much information
hiding, how does one manage systems? How does one manage intellec-
tual property?
• Global properties. Can one provide scalability generally? How does
one converge on universality and standards without bloat? What systems
can one deploy as really universal service repositories?
• Economics. What are realistic costing/charging models and
implementations?
• Social networking. How does one apply social networking technol-
ogy to help?
• Ontologies of vast breadth and scale.
• Automated discovery and transformation.
• Reasoning in the control flow.
• Use of heuristics and redundancy.
• Search as a new paradigm.
Complexity grows despite all that has been done in computer science.
There is valuable, rewarding, and concrete work for the field of computer
science in combating complexity. This area of work requires focus. It could
prove as valuable as direct functional innovation. Participants identified
several research areas to address complexity relevant to service-oriented
systems and beyond, including: meaning, measuring, methodology, sys-
tem architecture, science and technology, evolutionary systems design,
and legal and cultural change.