6 Lessons Learned
Although this report and its recommendations
are directed to the NSTS Program, they are of
broader applicability. It would be wise to consider
the lessons learned by the Committee when struc-
turing a risk assessment and management system
for other programs with similar characteristics,
such as the Space Station Program. These charac-
teristics would include large size, use of highly
complex technology, and major participation by
several NASA centers and prime contractors. The
following are generalized conclusions derived from
the preceding sections. Numbers in parentheses
refer to the principal sections of the report from
which the conclusions were derived.
6.1 ELEMENTS OF AND
RESPONSIBILITIES FOR RISK
ASSESSMENT AND RISK MANAGEMENT
In the Committee's view, any large, complex,
multi-center program should entail an overall risk
assessment and risk management process which
includes the following basic elements:
Risk assessment:
--A comprehensive method for identifying po-
tential failure modes and hazards associated with
the system.
--A specific, quantitative methodology for iden-
tifying and assessing (or estimating) the safety risks
of the system.
Risk management:
--A management process by which the safety
risks can be brought to levels or values that are
acceptable to the final approval authority. Risk
management includes establishment of acceptable
risk levels; the institution of changes in system
design or operational methods to achieve such risk
levels; system validation and certification; and
system quality assurance. (4.1)
The Committee believes that risk management
must be the responsibility of line management (i.e.,
the program manager and, ultimately, the Admin-
istrator of NASA). Only this program management,
not the safety organizations, can make judicious
use of the means available to achieve the opera-
tional goals while reducing the safety risks to
acceptable levels. The safety organizations at NASA
centers and Headquarters are staff organizations;
i.e., they can and should be responsible for provid-
ing the assessments of a system's risks. They should
also be responsible for assuring that the activities
associated with controlling the risks to the levels
assessed have been carried out and documented.
Safety organizations cannot, however, assure safe
operation; they can only assure that the safety risks
have been properly evaluated, and that the system
configuration and operation is being controlled to
those risk levels which have been accepted by top
management. (4.1, 4.3)
In each such major program, the risk assessment
and management processes should be supported
by a focused agency-wide Systems Safety Engi-
neering function, at both Headquarters and the
centers involved in the program, which would:
--be structured so as to be integrally involved
in the entire set of design, development, validation,
and qualification activities;
--provide a full systems approach to the contin-
uous identification of safety risks (not just failure
modes and hazards) and the objective (quantitative)
evaluation of such safety risks;
--provide the output of this function to the
program director in support of his risk management
process;
--support the program director by providing
assurance that his system is ready for final safety
certification to the risk levels established by the
NASA Administrator. (5.11)
This focused systems safety engineering would
combine the functions of reliability and systems
safety analysis. It should be responsible for defining
the requirements and procedures, and performing
or managing, as appropriate, at least the following
functions which should comprise the basis of a risk
assessment and risk management system:
1. Identification of failure modes and effects
2. Establishment of design criteria for redun-
dancy
3. Identification of hazards and their potential
consequences
4. Identification of critical items
5. Evaluation of the probability of occurrence
of causes and consequences of failure modes
and hazards
6. Establishment of safety-risk level criteria for
design margins and hazard controls
7. Design of qualification and certification test
programs
8. Objective assessment of safety risks
9. Development of acceptance rationale for
retained hazards and hazard reports
10. Specification of environmental and operat-
ing constraints at all levels (parts, units,
subsystem, element, and system) to assure
that validated margins are not violated
11. Quantitative evaluation of flight data to
update safety margin validations
12. Oversight of quality assurance functions to
control safety risks
13. Overall system safety risk assessment and
definition of the potential to reduce the level
of risk.
All of these systems safety engineering functions
(elaborated upon in Appendix F) are necessary
both for achieving credible risk assessment and for
defining the risk controls required to justify ac-
ceptance of critical failure modes and other hazards.
During design and development, the quantitative
evaluation of relative risks for each design against
acceptable criteria for levels of risk should be
considered as an integral part of the systems en-
gineering activity. Finally, these activities would
provide a definitive basis for establishing the design
margins and operational constraints needed to
reduce the overall risk to the accepted level and
subsequently to control the risk. They also can
provide a rational basis for decisions on which
risks should be reduced through changes in design
or procedures. (5.11)
In controlling risks, there must be a formal,
continuing, and iterative linkage between the risk
assessment and risk management processes, on the
one hand, and the system's engineering change
activities, on the other. (5.4)
As a program moves toward its operational
phase, a system should be established for the rapid
and effective feedback of inspection and test results,
and repair and flight data into the risk assessment,
risk management, and decision making processes.
In the case of flight programs, this should include
ensuring that all mission anomalies detected in real
time and from recorded events, as well as those
detected during the near-term inspection of any
recovered hardware, are promptly fed into the
formal risk assessment and management processes
for action prior to committing to the next flight;
all such anomalies should be called to the immediate
attention of launch decision makers. (5.5)
6.2 ESTABLISHMENT OF
RESPONSIBILITY FOR PROGRAM
DIRECTION AND INTEGRATION
An imbalance between the authority of the NASA
centers and that of the Program Office could lead
to serious problems in a large program where two
or more centers have major roles in what must be
a tightly integrated program, such as the STS and
Space Station. Without strong, central direction
and integration, the success and safety of these
complex programs can be placed in jeopardy. The
Administrator of NASA should ensure that strong
direction and integration of all aspects of such a
program are maintained at Level I via the Program
Office. (5.10.4) There also must be clear and
unambiguous direction of the program at all levels.
Those responsible for decisions should be desig-
nated and known to all. Boards and panels should
be advisory to these persons and not decision
making bodies in themselves. (5.10.1)
6.3 THE NEED FOR QUANTITATIVE
MEASURES OF RELATIVE RISK
Top management and program attention should
be focused on those items with the greatest risk to
the safety of a system by means of a prioritization
of all contributors to the overall risk. (5.2) Ac-
ceptable levels of risk in each program should be
set by the Administrator of NASA. However,
suitable quantitative measures of risk, such as
probabilistic risk assessment, are required to ob-
jectively define the acceptable levels, track progress
toward achieving these levels, and evaluate alter-
nate courses of action to reduce risk. (5.6, 5.11)
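
As an illustration only, the prioritization of
contributors described above can be sketched in a few
lines of Python. Everything in the sketch is assumed
for the example and does not come from this report:
the contributor names, the per-flight probabilities,
the acceptable level, and the simplifying assumption
that contributors fail independently. A real
probabilistic risk assessment would model
dependencies, consequences, and uncertainty.

```python
from math import prod

# Hypothetical per-flight failure probabilities, invented for
# illustration; they are NOT figures from the report.
contributors = {
    "contributor_a": 1e-3,
    "contributor_b": 4e-4,
    "contributor_c": 5e-5,
}


def overall_risk(probabilities):
    """Probability that at least one contributor fails.

    Assumes the contributors fail independently (a simplification).
    """
    return 1.0 - prod(1.0 - p for p in probabilities)


def prioritize(contributors):
    """Rank contributors by the drop in overall risk if each were eliminated."""
    total = overall_risk(contributors.values())
    ranked = []
    for name in contributors:
        remaining = [p for other, p in contributors.items() if other != name]
        ranked.append((name, total - overall_risk(remaining)))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return total, ranked


def meets_target(total, acceptable_level):
    """Compare the overall risk with an acceptable level set by management."""
    return total <= acceptable_level


total, ranked = prioritize(contributors)
for name, reduction in ranked:
    print(f"{name}: eliminating it lowers overall risk by {reduction:.2e}")
print(f"overall risk per flight: {total:.2e}")
```

A ranking of this kind shows where management
attention would yield the largest reduction, and the
comparison with an acceptable level gives a simple way
to track progress toward it.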
6.4 THE NEED FOR INTEGRATED REVIEW
AND OVERVIEW IN THE ASSESSMENT OF
RISK, AND IN INDEPENDENT EVALUATION
OF RETENTION RATIONALES
There should be an integrated review process
which provides a comprehensive, overall assess-
ment of risk (including an independent evaluation,
constantly updated, of retention rationales) upon
which to base any decisions to grant waivers which
permit operating with items that appear on the
Critical Items List. (5.1, 5.3, 5.~) A balance is
needed between "bottom-up" assessment tools (e.g.,
FMEA/CIL) and "top-down" analyses (e.g., hazard
analyses). In particular, the "top-down" analysis
processes must encompass an integrated system-
wide engineering analysis, including a system safety
analysis. (5.7)
6.5 INDEPENDENCE OF THE
CERTIFICATION OF FLIGHT
HARDWARE AND OF SOFTWARE
VALIDATION AND VERIFICATION
Responsibility for approval of hardware certifi-
cation and software Independent Validation and
Verification (IV&V) should be vested in entities
separate from the program management structure
and the centers directly involved in the program's
development and operation. However, the latter
organizations should continue to conduct activities
supporting certification and IV&V. (5.8)
6.6 SAFETY MARGINS FOR FLIGHT
STRUCTURES
Safety margins for flight structures should be
established which are in consonance with the ac-
cepted levels of safety risk for the program. How-
ever, great care is needed to properly verify that
the margins have been achieved and are maintained
in the flight structures. Verification can include the
use of analytical models, but should be supported
by static tests before flight and, in the case of
reusable flight hardware, by continued monitoring in
flight by permanently instrumenting, calibrating,
and analyzing data from a representative flight
system. Also, in the case of reusable hardware and
man-rated systems destined to remain in orbit for
long periods of time, comprehensive plans should
be developed and implemented for conducting
periodic inspection and maintenance of the struc-
ture of each system throughout the service life of
each vehicle or platform. (5.10.2)
6.7 OTHER
There are other important factors in risk assess-
ment and management which have been discussed
in this report with respect to the STS as it existed
following the Challenger accident. However, they
are items which are considered to be less important
than those enumerated above or not generally
applicable to several other programs. Where ap-
plicable, they certainly should be given serious
consideration in structuring the risk assessment and
management program. These other factors are
listed here by title and section reference:
Operational Issues (5.9)
  Launch Commit Criteria Waiver Policy (5.9.1)
  Human Factors as a Contributor to Risk (5.9.2)
  Cannibalization of Spare Parts (5.9.3)
Other Weaknesses in Risk Assessment and Man-
agement (5.10)
  Software Issues (5.10.3)
  Use of Non-Destructive Evaluation (NDE)
  Techniques (5.10.5).
For any new program, such as the Space Station,
there is the opportunity to structure an optimum
risk assessment and management program at the
outset which builds on the experience gained in
the NSTS Program and assembles those techniques
which will be most effective in establishing, mon-
itoring, and controlling risks to accepted levels.