Suggested Citation: "7 Reliability Concepts." National Research Council. 1999. Reducing the Logistics Burden for the Army After Next: Doing More with Less. Washington, DC: The National Academies Press. doi: 10.17226/6402.

7 Reliability Concepts

The AAN battle force concept is predicated on all systems being highly reliable. Furthermore, to reduce logistics demand the Army must make reliability an equal partner with lethality, survivability, and mobility considerations. This chapter describes the reliability concepts and technologies needed to develop AAN mission-reliable systems.

LOGISTICAL IMPLICATIONS OF HIGHLY RELIABLE SYSTEMS

Improving the reliability of the systems used by an AAN battle force will have a multiplier effect on reducing logistics demands. This multiplier effect can be illustrated by reviewing two functional requirements arising from the assumed AAN concept for logistics support. First, as briefed to the committee, the AAN battle force will take no separate maintenance and supply units into the area of operations. Second, a battle unit support element (BUSE) located at the staging area will be responsible for rapid refitting, refueling, and repair of battle force systems between combat pulses in preparation for subsequent pulses. From these two functional requirements, it is clear that reducing the maintenance and spare parts needed for subsequent pulses is an essential aspect of AAN reliability. Thus, the functional requirements can be used to help define the essential degree of reliability and the things that are unnecessary or too costly, even if they are desirable in principle.

The classic definition for the reliability of an item, system, or component is the probability that it will operate successfully during its mission (see Box 7-1). A typical AAN battle force mission will require systems that are at least reliable enough to meet the two functional requirements described above.
No doubt, other functional requirements for AAN systems will also generate reliability requirements, but these two basic requirements are sufficient to illustrate that (1) improving reliability reduces logistics demand and (2) the reliability of any system is relative to the mission context in which it is expected to operate.

Pulse-Reliable Systems

For the AAN assumption of no maintenance and repair support in the battle space during an operational pulse, all systems must be reliable enough for commanders to meet their force readiness and deployment timelines.

BOX 7-1 Classical Definitions of Reliability and Related Concepts

Reliability. The probability that an item, component, or system will operate successfully during its mission.

Maintainability. The probability that an item, component, or system will remain in a specified operational condition or can be restored to that condition within a given period of time, when maintenance is performed according to prescribed procedures and resources.

Availability. The probability that, at any random instant, an item, component, or system will be in proper condition to begin a mission.

Durability. The probability that an item, component, or system will successfully survive its projected service life, overhaul point, or rebuild point without a catastrophic failure. (A catastrophic failure is a failure that requires that the item, component, or system be rebuilt or replaced.)

For a combat operation to accomplish its objectives, all systems taken into the area of operations must be close to fully operational and require neither repair nor maintenance throughout the period of the operation. Systems must not fail during that operational time period. Even in a dynamic, high-stress environment, all systems must maintain a high level of operability, even if a part or component has been damaged or malfunctions.

For the purposes of this report, an AAN system is pulse-reliable if it meets the following criteria:

• requires no maintenance or wear-related repair or replacement by external logistics personnel for the duration of a combat pulse
• can continue to perform within minimum operational parameters even with damage to subsystems

If all AAN systems are pulse-reliable, no maintenance and repair support elements (troops, tools, or spare parts) would have to accompany the combat elements. It follows that no fuel, food, water, or shelter would be needed for support elements, and no combat capability would have to be diverted to protect them.
No transport systems, which have their own logistics requirements, would be needed to bring in and retrieve the support elements. If commanders could rely on the pulse-reliability of their combat systems, the effectiveness of a force of any given size would be increased for planning purposes. In other words, a given objective could be met with a smaller force, requiring proportionately less fuel, ammunition, etc. These indirect effects of pulse-reliable systems are based on the assumption that every system taken into a combat operation will perform successfully for the duration of the operation.
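
The compounding effect behind this assumption can be illustrated with a rough calculation. The sketch below is not taken from the report: it assumes, purely for illustration, that each system fails independently at a constant rate during a pulse (an exponential model), and the function names and the failure rate are hypothetical. Under those assumptions, the probability that every system in a force completes a pulse is the single-system reliability raised to the number of systems.

```python
import math


def pulse_reliability(failures_per_day: float, pulse_days: float) -> float:
    """Probability that one system completes a pulse without failing.

    Assumes a constant failure rate (exponential model); this is an
    illustrative assumption, not a model proposed in the report.
    """
    return math.exp(-failures_per_day * pulse_days)


def force_pulse_reliability(system_reliability: float, n_systems: int) -> float:
    """Probability that all n independent, identical systems complete the pulse."""
    return system_reliability ** n_systems


if __name__ == "__main__":
    # Hypothetical failure rate; the 14-day pulse length echoes the
    # hypothetical metric discussed later in this chapter.
    r = pulse_reliability(failures_per_day=0.001, pulse_days=14)
    for n in (1, 10, 50):
        print(n, round(force_pulse_reliability(r, n), 3))
```

Because the force-level probability shrinks geometrically with the number of systems, a per-system reliability that looks comfortable in isolation may still be too low for a force that carries no repair support.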

Fast Refitting through Improved Maintainability

The pulsed operations of an AAN battle force will require that the refueling, refitting, and repair phase of each cycle be short enough to keep an opponent off guard and incapable of regrouping and responding effectively to the next combat pulse. AAN systems will require maintenance and repair, but the speed with which a returning battle element can be reconstituted to a state of pulse-reliability and readiness will determine how short this part of the cycle can be. In this context, the overall mission reliability of a system depends on improved maintainability, including longer times between preventive maintenance cycles, faster diagnosis and repair or replacement of parts, and preventive or predictive (prognostic) maintenance, rather than reactive maintenance. This performance goal will be called "fast refit." The logistics impacts of fast refit include (1) fewer maintenance personnel in the BUSE per unit of combat strength required for a given refitting (higher tooth-to-tail ratio); (2) fewer spare parts per average combat-day, including fewer spare systems to replace systems that cannot be repaired before the next pulse; (3) reduced logistics burdens (fuel, food, water, and energy) at the staging area to support a smaller maintenance element; and (4) simpler planning requirements to ensure that the BUSE can sustain the war-fighting operations.

AAN Mission Reliability Versus Ultrareliability

The discussion above shows how the reliability for AAN missions, or AAN mission reliability, can be analyzed into specific reliability requirements for systems, such as pulse reliability and fast refit. These requirements can also be expressed as objective, quantifiable measures, or performance metrics.
A hypothetical example of a metric for pulse reliability would be a 90 percent level of confidence that a system verified as pulse reliable will be able to meet eight standards for full operating performance throughout the duration of a 14-day pulse, with a 99 percent level of confidence that it will perform at a degraded (but still operable) level of performance for no more than two of those measures during a pulse. The fast refit requirement might be expressed as an availability metric, such as a 98 percent probability that a given system will be available for the next pulse (pulse reliable and ready) after 12 person-hours of maintenance and refitting, if no battle damage was sustained in the previous operation. The details of both of these hypothetical reliability metrics depend on the specifics of how an AAN battle force would fight and be sustained for the duration of a campaign. Exploring other AAN operational concepts will reveal additional elements of AAN mission reliability.1

Basing reliability on how an AAN mission would be conducted is very different from the idea of "ultrareliability," which is too often sold as a context-free property that can be achieved by adopting a particular technology or design. But the concept of ultrareliability is too general to help achieve AAN mission-reliable systems. For this reason, the committee decided to avoid the term ultrareliability and to focus on AAN mission reliability, that is, the minimum reliability requirements that can be used in a distributed M&S environment to develop AAN systems.

1 For example, an analysis of the soldier-machine interfaces required for the complexity, tempo, and intensity of an AAN operation in a three-dimensional battle space would reveal important features of what might be called "trainability": the ease with which a system operator can attain and maintain a high level of proficiency for a specific mission profile.

AAN Mission Reliability and RAMD

Reliability, availability, maintainability, and durability (RAMD) have become linked as key factors in keeping future systems affordable, both in terms of investing scarce dollars for research, development, testing, and evaluation (RDT&E) and in terms of fielded equipment that can be procured in adequate quantities within budgetary constraints.2 The system characteristics that contribute to RAMD are sometimes contrasted with operational performance values because, in the past, performance has typically been achieved at the expense of RAMD. AAN mission reliability, however, cannot be separated from other aspects of system performance. Systems that fail to meet reliability requirements will also fail to meet AAN mission objectives.

Three approaches can be used to ensure that AAN mission reliability (and related RAMD qualities) receives appropriate consideration along with other performance objectives. First, RAMD qualities must be interpreted into objective, assessable characteristics that can be designed into a system. Rather than being lumped together as a vague quality to which lip service is paid with terms like "ultrareliability," RAMD must be defined in terms of concrete metrics that reflect operational requirements. System designs should be assessed and engineered against these metrics, just as they are against metrics for mobility, lethality, or any other performance requirement. Second, these objective characteristics must be weighed, along with other performance characteristics, in system trade-offs when designs or prototypes cannot meet all performance goals. The third approach is a longer-term alternative to the second.
Instead of being forced to trade off a desirable level of performance (whether in a reliability measure or some other goal, such as cross-country mobility, lethality, or survivability) to achieve the optimum performance for all performance characteristics, new technology and new design concepts can be developed to improve overall (combined) performance. In short, the third approach is to seek new and better solutions. Applied and basic research, when informed by the specific characteristics required to meet difficult AAN system constraints, can, in time, provide new solutions. These three approaches are not mutually exclusive. All three are likely to be needed for complex systems with demanding performance requirements.

The M&S environment described in Chapter 3 provides a near-term implementation of the first two approaches: designing systems for AAN mission reliability and making system trade-offs that do not sacrifice reliability to other performance goals. M&S is an essential and powerful tool for systems engineering to move AAN mission reliability off the bullet charts and into the battle force. The next section describes the necessary elements for a distributed, hierarchical federation of M&S tools adequate for building AAN mission reliability (or "RAMD for AAN") into each system at every level, from subsystems down through components, structures, and materials. After that, ways to enhance the third approach are discussed.

2 Sometimes adaptability is added to these four characteristics, making the acronym "RAAMD." The argument made here applies to RAAMD, as well as to RAMD.

USING AN M&S ENVIRONMENT TO DEVELOP AAN MISSION-RELIABLE SYSTEMS

Designing systems and performing trade-off analyses with tools that can simulate whether feasible systems, subsystems, components, and structures will meet mission-specific RAMD metrics is the key to a realistic strategy for achieving AAN mission-reliable systems by 2025. A hierarchy of model domains, illustrated in Figure 7-1, can be constructed for any complex system developed for the AAN battle force. Note that Figure 7-1 is based on the discussion in Chapter 3 of the distributed M&S environment illustrated in Figures 3-1 and 3-2.

Fielding AAN materiel in 2025 will require that engineering and manufacturing development begin by 2010. To meet this milestone, extremely complex system trade-offs will have to be made, and the supporting technologies for engineering and manufacturing development will have to be available. In the judgment of the committee, the only way to perform the systems engineering essential to making trade-off analyses while reducing the costs, in time, resources, and risk, of trial-and-error developmental approaches is to use the simulation techniques described in Chapter 3, beginning with conceptual design. This approach is used extensively by leading manufacturers to design highly reliable subsystems and can be effectively exploited by the Army to significantly enhance the reliability of AAN systems.

Unfortunately, existing M&S tools cannot feed data on achievable reliability and performance levels at the component and subsystem levels back to the operational level, at which system trade-offs should be made.
Without performing iterative simulations up and down a hierarchy of M&S tools, as illustrated in Figure 7-1, determining through M&S whether a design concept will meet AAN mission reliability objectives will be impossible.3 Reliability (much less the pulse and mission reliability needed by AAN systems) is not currently part of the design process, but it can be easily included by adding reliability analysis at appropriate levels of the M&S hierarchy.

3 For the argument supporting this point, see "M&S Environment to Support AAN Logistics Trade-off Analysis" in Chapter 3.

Designing candidate AAN system concepts to meet AAN mission reliability requirements will require the following extensions of current capabilities and design approaches:

• M&S systems must be adequate at every level in the hierarchy at which "designing for reliability" is done, from the top level of force-on-force engagement down to the lowest level at which the reliability of design options is evaluated.
• Metrics for reliability at each level must be defined in terms of operational requirements, so that reliability can be assessed objectively at that level.
• The design process must include iterative simulations up and down the hierarchy.
• At the lowest level at which M&S is being used, valid data on alternatives must be available for the characteristics that determine reliability at that level (i.e., estimates of the metrics for reliability at that level of system decomposition must be realistic, not guesses or wishful thinking).
• During the iterative design process, and subsequently during engineering development, testing, and evaluation, the mission reliability of the system (i.e., the reliability-related minimum requirements set for the top level of system performance in simulated AAN engagements) must not be traded away to sustain or increase another desired aspect of overall system value.

[Figure 7-1: diagram of six model domains, from top to bottom: Virtual Proving Ground (single vehicle); System Architecture; Subsystem Architecture; Component Design; Materials Selection and Processing Analysis; M&S for Design and Processing of Material Microstructure and Composition. Reliability metrics (AAN mission, system-level, subsystem-level, component-specific) flow down between levels, and performance results related to reliability flow back up.]

FIGURE 7-1 Hierarchy of model domains. An extended M&S environment can be used to design reliability into AAN systems, perform system trade-off analyses, and develop new options for enhancing reliability. The figure is based on Figures 3-1 and 3-2. For the sake of simplicity, note that the top two engagement levels are not shown. Also, the requirements and metrics for other performance goals (left side of Figure 3-1) and M&S results relevant to them (right side of Figure 3-1) are not shown.

In practice, each of these extensions can be achieved in varying degrees. Therefore, the extent to which AAN systems can be designed for reliability will depend on how well the M&S environment and the methodology of using it meet these five goals. The challenges and opportunities in these five areas are explored below.

Adequate M&S Systems

Chapters 3 and 5 examined at length the existing mobility M&S systems at each level in the hierarchy, diagnosed some of their limitations, and recommended improvements. Because AAN mission reliability is a cumulative outcome of complex system-level behaviors, it will be helpful to consider the necessary capabilities of each M&S system type described in Chapter 3 to provide a reasonable simulation for assessing reliability.

For example, at each of the three engagement levels shown in Figure 3-1 (force-on-force, multiple systems with operators, and single system with operator), both the normal or "expected" duty cycle and the frequency-versus-severity profile of excursions from the normal cycle must be realistically simulated and exercised. At the system and subsystem levels, the models must include system-stressing loads and conditions and variable patterns of operation, not just baseline operating scenarios. Because time is often a key factor in the appearance of failure modes that reduce reliability, either the simulations at each level must be run for durations required by AAN mission reliability or the analytical methods used to extrapolate from shorter run times to the durations characteristic of AAN operations and duty cycles must be validated.
Because reliability is relative to context (e.g., mission or duty-cycle profile), the realism of the higher-level models in the hierarchy will be critical to using an M&S environment for designing reliability into an AAN system. In effect, a systems engineer will have to rely on the results from the higher-level models to define the behaviors of the subsystems, components, and materials critical to making the entire system mission reliable. One may, of course, rely on engineering experience or rules of thumb to make a reasonable guess at characteristics that will affect reliability at higher levels of system integration. But these approaches are difficult to quantify into metrics or validate. Reliance on qualitative and heuristic approaches to reliability has probably contributed to the ease with which reliability has typically been traded away for performance characteristics that could be more easily quantified during requirements specification, design, and evaluation.

Defining Reliability in Measurable Characteristics

Once mission-specific reliability is accepted as a performance value that applies at each level of an M&S hierarchy, each level will require that appropriate reliability measures be specified against which model runs can be evaluated. Less obvious perhaps is that the appropriateness of the reliability measures at one level is determined by the performance properties needed at the next higher level to achieve the reliability characteristics required there. This linkage of the reliability measures at a given level to the required performance characteristics for reliability at the next higher level makes it possible for assessable reliability requirements to "flow down" from mission-specific, functional requirements (like pulse reliability or rapid refit) to reliability requirements for particular subsystems and components. The linkages between levels enable systematic design to achieve the ultimate (i.e., top level) reliability requirements.

As noted in Chapter 3, the absence of this linkage in existing models prevents the flow of lower-level data up to the systems level in the hierarchy, where alternative designs and trade-offs of one system value for another ought to be made. To state the problem in practical terms, the value of using iterative M&S cycles as an alternative to trial-and-error cycles of design, building, testing, and modifying depends on how closely the M&S hierarchy can model the causal relations between the metrics for reliability at one level and the properties at the next higher level of integration that affect system performance.

Iterative Simulation

In Chapter 3, the committee stressed the importance of iterative simulations up and down the M&S hierarchy. As a high-level system requirement, AAN mission reliability is a good example of why the iterative approach is essential for making design decisions. (Performance characteristics defined at the system level for such things as energy management, mobility, and lethality require an iterative approach for the same basic reasons.) A model by its nature is not an exact replica of the thing it models.
When a general model is applied to a specific case (for example, when a force-on-force model is used to simulate a particular type of AAN mission or the NRMM is used to simulate the behavior of a particular vehicle concept over selected terrains), the fit of the model can be improved by specifying input parameters and selecting settings for the model's run parameters. If the concept to be modeled is at an early design stage, results from earlier runs can be used as feedback to "refine, tune, and tweak" both the model and the design being tested. Reliability outcomes of detailed engineering model runs for a combat vehicle (for example) indicating that loads on a bearing approach or exceed design limits may lead to a redesign of the vehicle. Systems that are frequently or easily defeated in a force-on-force simulation or that consistently run out of fuel or ammunition may require that the system be redesigned or modified, that tactics for using the system be reconsidered, or that the training for operators be changed. Another alternative is that the validity of the model itself may be questioned, leading to corrections and refinements in the model.

In an M&S hierarchy of tools, this feedback process extends beyond a single model at any given level. Determining the implications for reliability metrics at the system or subsystem level of a particular design using particular components in a particular configuration will require modeling up through several layers of the hierarchy. The general axiom of systems engineering applies: optimization for a quality (such as reliability) at a sublevel in a structural-functional hierarchy does not necessarily lead to optimization even for the analogous quality expressed at a higher level of system integration. Furthermore, the Army will have to optimize more than one quality (e.g., pulse reliability for at least two weeks of pulses, plus various mobility and lethality objectives). Optimization at lower levels for any one of the overall performance qualities may not provide the best system solution for all of them. For all of these reasons, upward iterations through the hierarchy will be crucial.

Similar reasoning applies to the downward flow of performance requirements (including reliability requirements) from the top level to the component level (and below that to the materials selection and materials design levels). High-level functional requirements may, on analysis at lower levels, turn out to be inherently incompatible (for example, they jointly "violate the laws of physics"). Or they may be jointly unachievable for all existing design options. In either case, some kind of "goal leveling" across performance requirements will be necessary. Additional downward iterations of different combinations of modified requirements will be necessary to make reasoned decisions about the "best" system trade-offs. For instance, lightening a vehicle by using advanced composite materials may increase its range per fuel load and may improve its pulse reliability, but these materials may require longer maintenance checks between pulses and, therefore, require additional maintenance specialists in the staging area to meet the fast refit requirement.
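
The downward flow of requirements and the upward roll-up of results can be sketched in a few lines. The example below is only an illustration under stated assumptions: it treats the system as a simple series arrangement of independent subsystems, uses equal apportionment of the system target, and all function names and numbers are hypothetical rather than drawn from the report or from any M&S tool it discusses.

```python
import math


def allocate_equal(system_target: float, n_subsystems: int) -> float:
    """Flow a system reliability target down to n series subsystems.

    Equal apportionment (an assumed, simplistic rule): each subsystem
    must reach target ** (1 / n) for the product to meet the target.
    """
    return system_target ** (1.0 / n_subsystems)


def roll_up_series(subsystem_reliabilities) -> float:
    """Roll simulated subsystem results back up to the system level.

    Assumes a series system with independent subsystems: the system
    works only if every subsystem works.
    """
    return math.prod(subsystem_reliabilities)


if __name__ == "__main__":
    target = 0.90                            # hypothetical system-level requirement
    per_subsystem = allocate_equal(target, 4)
    simulated = [0.99, 0.98, 0.97, 0.95]     # hypothetical lower-level results
    achieved = roll_up_series(simulated)
    print("allocated per subsystem:", round(per_subsystem, 4))
    print("achieved at system level:", round(achieved, 4))
    print("meets target:", achieved >= target)
```

In this hypothetical case three of the four subsystems meet or exceed their allocation, yet the rolled-up result still misses the system target, which is the kind of outcome that only becomes visible when results are iterated back up the hierarchy.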
As the larger AAN process evolves with new combinations of tactics and doctrine, performance specifications will change (including the mission-specific metrics that define reliability requirements). Modeling the new options downward through the hierarchy, and running alternative solutions back up, will again be necessary.

Valid Data on Alternatives

Even in a well coupled hierarchical M&S environment, independent input variables must be set at each level. These include variables that specify design choices or environmental conditions specific to each level, as well as the input variables that represent design choices at the lowest model level in the overall simulation scheme. The utility and validity of a simulation exercise for making design decisions and system performance trade-offs depend on how accurately these input data characterize the design options and conditions that the simulation is supposed to represent.

This simple point has important consequences for a simulation being used to assess a complex variable like AAN mission reliability. When new design concepts are introduced at any level in the simulation hierarchy, the properties that influence the reliability metrics at that level may not be well characterized. New structural options may in principle be available for insertion into designs (for example, new materials that might be used in components and structures), but valid data on even well established properties that affect reliability may not be available to the designer. For AAN mission reliability to be analyzed objectively and reasonably, modelers will need sound data for all design options of potential interest and for all properties that significantly affect the reliability metrics at each model level.
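
The consequence of poorly characterized inputs can be made concrete with a small Monte Carlo sketch. Everything in it is an assumption introduced for illustration: a series system with constant component failure rates, uniform uncertainty ranges on those rates, and hypothetical numbers and function names. It is not a method described in the report; it only shows how a wide range on one novel component can widen the spread of the simulated system reliability.

```python
import math
import random


def sample_system_reliability(rate_ranges, pulse_days, trials=5000, seed=0):
    """Return sorted samples of series-system pulse reliability.

    rate_ranges: list of (low, high) failure rates per day, one pair per
    component; each rate is drawn uniformly from its range (an assumed
    way of expressing how well the component is characterized).
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(trials):
        total_rate = sum(rng.uniform(low, high) for low, high in rate_ranges)
        samples.append(math.exp(-total_rate * pulse_days))
    samples.sort()
    return samples


def spread(samples, lower=0.05, upper=0.95):
    """Width of the band between two quantiles of already-sorted samples."""
    lo_idx = int(lower * (len(samples) - 1))
    hi_idx = int(upper * (len(samples) - 1))
    return samples[hi_idx] - samples[lo_idx]


if __name__ == "__main__":
    # Hypothetical inputs: the second component is either well
    # characterized (narrow range) or a novel option (wide range).
    well_characterized = [(0.0009, 0.0011), (0.0019, 0.0021)]
    novel_option = [(0.0009, 0.0011), (0.0005, 0.0060)]
    for label, ranges in (("well characterized", well_characterized),
                          ("novel option", novel_option)):
        s = sample_system_reliability(ranges, pulse_days=14)
        print(label, "median:", round(s[len(s) // 2], 3),
              "5-95% spread:", round(spread(s), 3))
```

A wide spread in the output signals that the simulation cannot yet distinguish a design that meets the reliability metric from one that does not, so better data on the novel option are needed before the result can support a trade-off decision.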

Preserving Mission Reliability during System Trade-offs

Once RAMD is represented by measurable characteristics reflecting operational requirements in the system conception, design, and testing processes, the fundamental systems engineering issue is whether all metrics representing these and other operational requirements can be met with known technology. If all metrics cannot be satisfied in one system, trade-offs can be made to optimize the outcome. In the past, whether this was done systematically or haphazardly, RAMD values were often sacrificed for "performance" values, real or perceived. For AAN systems, it may be necessary to sacrifice some desirable mobility, lethality, or survivability characteristics to maintain the level of AAN mission reliability required for an acceptable probability of success.

A principal benefit of an M&S environment like the one proposed in Chapter 3 is that it allows the established trade-off methods of systems engineering to be applied to novel AAN systems, beginning very early with design conception and continuing, with increasing precision and certainty, through detailed design, engineering development, testing, and evaluation. With rigorous adherence to good systems engineering practices, a performance goal like mission reliability, which is a global property of overall system performance across a system's mission profile, can be achieved. AAN mission reliability can only be assured if the trade-offs inherent in creating a novel and complex product that can be fielded by 2025 maintain adequate levels of mission reliability.

Assume that an adequate M&S environment is available for a proposed AAN system concept. A reasonable starting point for designing the new concept is to attempt to meet all performance metrics (including those for reliability) with existing, well characterized solutions. Suppose, though, that all of the requirements cannot be met jointly.
This is a likely outcome for the leap-ahead systems needed. The next step might be to look for less well characterized options that can be substituted for some of the tried-and-true standard materials, structures, and components. Because less is known about these options, additional physical testing of the proposed alternatives and modeling of the system configured with them will have to be done. Data on the alternatives must be validated, and additional cycles of iterative simulation will assess whether the new design can meet the requirements for mission reliability and other performance qualities.

Advances in materials engineering may be able to help here by providing new approaches to obtaining data about relatively untested options. For example, it is difficult to use accelerated testing methods to determine how a component fashioned by new means from novel materials of construction will respond to a complex duty cycle. Knowledge of the physical properties of the materials gained from experimental data, including their dynamic responses throughout the duty cycle, may make it possible to model the long-term failure, wear, and aging behavior of the alternative, in the context of a particular design for a particular system.

A radical form of this "search for better system inputs" is to look to materials engineering to provide a "new solution" that meets the particular requirements (e.g., specific strength or resistance to failure modes of the familiar options, or ease of replacement) of an element in the modeled system. "Designing" a new material (or novel structuring of known materials) depends, like the modeling of hard-to-test physical behavior, on knowing the physical properties that will provide the desired functional behavior and knowing how to engineer those properties into the structural element in question. These approaches will probably not be valid for the development of the first AAN systems, but they are discussed in the following section as longer term options for meeting the AAN functional requirements for reliability.

An equally valid approach from the standpoint of systems engineering is for designers to re-examine the performance requirements, including the reliability requirements, at each level in the M&S environment from the top down, to see if any can be relaxed without sacrificing the essential requirements for the system to do its job (i.e., top-down reduction of functional requirements). Eventually it may be necessary to compromise on functional requirements to find an acceptable system solution.

In the past, when a lower requirement for one performance goal was traded to achieve an acceptable metric for another, system reliability was often "traded away." To varying degrees, the justification for the other performance characteristic was considered "more important than cost," and decreased reliability could be compensated for by buying additional quantities of the system (for replacements). Two other reasons for sacrificing reliability have been, first, the lack of objective, assessable metrics for mission-specific reliability at each level in the structural-functional hierarchy of Army systems and, second, a dearth of hard data about the reliability-relevant properties of system elements that were introduced to meet other performance objectives. If reliability is considered on an equal basis with lethality, survivability, and mobility, then reliability can no longer be used as an excuse for poor design.
Even the best systems engineering in the world will not consistently produce AAN mission-reliable systems unless and until the following steps are taken to supplement systems engineering throughout the design, development, and testing process:

· Reliability must not be traded away to meet other performance objectives, at least not to the point that mission reliability will be threatened or lost.
· Designers must have a design construct (e.g., an M&S environment) for highly complex systems that incorporates meaningful, quantifiable characteristics that define mission reliability at the topmost system (platform) level and characteristics that are closely coupled with mission reliability at each lower level of the system structure-function hierarchy.
· Contractors who offer proposals to build a system, subsystem, or component should be evaluated (using the M&S environment) on the basis of the proposed design's capability to achieve the requirements for AAN mission reliability (as well as the requirements for other mission-critical system goals, such as system fuel efficiency [Chapter 4], vehicle mobility [Chapter 5], and precision engagement [Chapter 6]).
· Contracts should be awarded on the basis of meeting mission-specific reliability requirements, and contractors should be held to delivering what they promise.

Source selection criteria must be changed to consider reliability on an equal basis with other mission-specific goals. Currently, reliability is often traded off for performance, which increases logistics support requirements for new systems.

THE THIRD APPROACH: RESEARCH TO ENABLE NEW RELIABILITY SOLUTIONS

Let us assume that the system design and trade-off analyses performed with M&S tools indicate that not all of the logistics reduction and performance requirements, including the reliability-related requirements, can be met jointly with materials, structures, and components that are well characterized from testing and accumulated experience in similar applications. An alternative to relaxing one or more requirements to optimize the system design is to search for better designs or for better components or materials. Although the better designs, components, and materials may not be ready for engineering and manufacturing development by 2010, the research to find them may be necessary to meet all of the AAN requirements at a later date. Furthermore, it is always possible that a breakthrough or burst of progress in a key area will lead to improvements sooner.

Improving System Reliability at the Level of Component Analysis and Design

Although much can be done in the near term (by 2010) to improve existing M&S tools, research will be necessary in the following areas, even if the results do not bear directly on systems for AAN until after 2010.

· mechanisms of failure modeling, to relate structural failure modes at one level in the M&S hierarchy to physical properties at the next lower level
· materials selection and materials design to provide new options (and inputs at the level of component design and analysis) in the M&S hierarchical environment
· prognostics (the design and application of prognostic sensing technology) to monitor for physical precursors of failure when the mechanisms of failure for a design or a material are known but no better design or material (with respect to meeting all system performance objectives) is available

Modeling Mechanisms of Failure

Iterative simulation runs up and down an M&S hierarchy can only ensure system reliability to the extent that the models accurately represent the causal relations between the reliability-related characteristics (performance metrics) at one structural level and the physical properties at the next lower level of structure. Models can misrepresent these linkages in three ways:

· The models may be inaccurate because of errors in the assumptions or approximations used in the modeling tool itself or in the runs for a particular configuration.

· The models may be reasonably accurate around a "good design" point (anticipated range of operation and performance) but may not be able to predict off-design performance or identify failure signatures and failure modes.
· Data for accurate simulation of the system may be insufficient.

Improving the models to resolve these problems can be considered increasing their fidelity by incorporating more complete knowledge of the mechanisms of failure into the model. When a system fails to operate properly during its mission, a failure has occurred. "Mechanisms of failure" is just another name for the causal linkages between structure at one level and successful performance at the next higher level of integration.

"Physics of failure," a term often used to describe the gathering and applying of knowledge of these causal linkages, originated in efforts to improve the reliability of electronic materials and structures. Thinking in terms of the physics of failures has been highly productive in semiconductor electronics because a great deal is known about how the physical structure of semiconducting materials produces the functional characteristics of the "component" electronic device at the next higher level of organization. Building on the success of the physics of failure approach, the Army is now implementing physics of failure studies to assess the reliability of electronic packaging concepts that are still at the design stage.

A fundamental constraint on a "physics of failure" approach to determining mechanisms of failure, a constraint that is not always clearly recognized (or stated), is that the ability to predict failure modes and failure events from underlying physical properties depends on two factors. First, specific physical properties must be strongly linked to specific failures (for example, is occurrence of condition A sufficient in itself to cause failure mode F,
or is it just a contributing factor that requires other conditions before F occurs?). Second, do we understand the causal structure that determines whether or not a failure will occur? As the causal relations between physical conditions and the occurrence of failure become more complex and our knowledge of that complexity becomes more tenuous, predicting failure modes and events becomes more and more speculative.

A more practical way to express this theoretical point is that research on the mechanisms of failure for AAN systems is unlikely to reveal all of the fundamental failure modes of a system and what causes them. In most cases, this is probably an impossible, or even a meaningless, task. However, to design and manufacture highly reliable components that can meet AAN performance requirements, enough must be known about the underlying properties and conditions that can create known or suspected failure modes (i.e., some of the causal linkages) to build components with superior performance in reliability-related characteristics.

An example far afield from the area of semiconductor design illustrates the potential value of modeling mechanisms of failure. Given the importance of energy management in reducing logistics burdens (see Chapter 4), many AAN vehicles or other systems will require high-horsepower engines that operate at high fuel efficiencies throughout the range of operating conditions required for AAN mission scenarios. These engines must also be highly reliable, not just when operated at "design" conditions but also under any conditions that occur during an AAN mission. Even if the engine is operated outside its design envelope for optimal performance, it must continue to perform, at least until a combat pulse has been completed and it can be returned to the staging area.

Designing this engine will require an understanding of some of the detailed physical characteristics of the engine's subsystems in relation to the performance of the entire engine. For example, the design engineer would want to know how the fuel-air mixing and combustion process is affected by local mixing inefficiencies, pressure oscillations in the fuel feed line, combustion instability in the combustor, soot formation and resulting inhibition of the ignition system, and the stability of lean flames, including local extinction and reignition. In each of these areas, knowing something about the conditions that can lower operating efficiency or damage the engine structures over time would help in modeling the mechanisms of failure for the high-efficiency, high-power, highly reliable engine the designer is trying to build.

If the designer has a simulation model that incorporates this knowledge of failure mechanisms, the model will be better at simulating how a real engine would perform under a broader range of conditions than a model that represents only the optimum design point of operation. However, the designer is never going to sit down with a set of fundamental physical equations (even a very large set) and "deduce" an engine design from them. Nevertheless, a simulation model that incorporates "first-principles" parametric representations for even some of the physical processes in an engine is likely to show the designer some unexpected failure modes when the model is run under off-design or nonoptimal conditions. But even a completely accurate physical description of the engine will not enable the designer to deduce all of the failure modes of the engine.
As the example of engine operating efficiency illustrates, much of the research that is often described as investigating "physics of failure" (that is, investigating the mechanisms of failure) can and should be part of research on the physical processes underlying the complex technologies needed for AAN systems, such as advanced engines, active suspension systems, and lightweight protection systems. However, the duty cycles for many commercial applications for these technologies will be very different from the duty cycles in AAN systems. Therefore, the Army may have to support and encourage basic research on the broad issues in the mechanisms of failure that are unique to the duty cycles for AAN concepts of operations. Two broad issues are (1) the relationship between dynamic physical conditions and properties (conditions and properties that do not vary uniformly over time) and failure modes of subsystems and components, and (2) how failure modes are affected by materials with different structural patterns at different spatial scales (ranging from atomic to micron scale), as opposed to materials with bulk properties determined predominantly by their atomic-scale structure.

Materials Selection for Improved Reliability

If all of the performance requirements, including the reliability metrics, for component-level models of an AAN system cannot be met with standard materials, seeking new materials is an alternative to relaxing the requirements. Appendix C, Materials Selection and Design, discusses how materials science and engineering can develop solutions for component design and analysis. Alternatives may exist among lesser known materials, and better materials databases and selection charts can help designers find potential solutions. Often these materials may require testing to determine how well they meet design requirements.

As materials science develops better tools for modeling the performance characteristics of materials based on their underlying structures, and as methods for forming materials with novel structures, particularly at very fine scales, are improved, an even more innovative approach may become possible. The component designer may be able to call on the materials designer to design a new material to meet the performance requirements for a particularly demanding application. The M&S tools needed to support "materials by design" must have the same generic capabilities as the tools at various levels in the M&S hierarchy for systems design. In fact, the M&S tools for materials design and processing can be considered another level of structure-function relationships below the component level. The extensions of current capabilities required to develop AAN systems will also be necessary at this new level in the hierarchy.

Prognostics

Prognostics (prognostic sensing technology) can be described as the use of sensor technology to detect precursors of failure before the failure occurs. Prognostics applies our limited knowledge of the mechanisms of failure to detect "failures in the making" so that the failure can be prevented, avoided, or ameliorated (the graceful degradation of performance).
If the M&S capability at each level in the hierarchy, including the just-emerging "materials design" level, could simulate physical reality and predict failure modes and events perfectly (i.e., if the mechanisms of failure linking each structure-function level to the ones above and below it were fully understood and had been incorporated into the models at every level) and if engineers knew how to design each level of a system so that all failure-causing conditions could be avoided, then prognostics would not be needed. Often, though, something is known about conditions that cause operational failures but not enough to ensure that none of them occurs. In some cases, an optimal design for the full set of performance requirements for a given system, subsystem, or component is known to be subject to a particular failure mode when certain antecedent conditions arise. In these cases, prognostic sensing technology can improve the reliability of the system.

The use of prognostic sensors is well established at the higher levels in the hierarchy of systems design. A warning light goes on when the lining of an automobile brake is worn to the point that a replacement is needed to avoid brake failure. An oil pressure gauge is not used by a knowledgeable driver or mechanic as a means to measure oil pressure but as a prognostic sensor indicating a condition that could lead to the catastrophic failure of the vehicle (oil pump wear or failure, a system leak, or overheating). The more innovative (and sometimes controversial) uses of prognostic sensing are for detecting precursors of structural failures at small spatial scales, particularly by sensors embedded in the material.

These new technological possibilities for small-scale sensors (measured in micrometers or even nanometers) to detect equally small causal preconditions for a structural failure have the potential to operate at the material and component levels of complex structures in a manner analogous to the more familiar prognostic sensors at higher, larger scales. The following general principles apply to prognostics at any level of system design:

· Using a sensor system for prognostics implies that something is known about the mechanisms of failure for the performance characteristic that the sensor is monitoring.
· If one knows enough about the failure mode and knows a way to avoid it, it is better to design the system not to fail rather than to use a sensor to predict when the failure will occur. If we do not know for sure how to avoid it and the failure mode is important enough, a prognostic sensor may be useful.
· Not every precondition for failure that can be monitored with a sensor is worth monitoring. Prognostic sensors are useful when the causal link between the precondition and the consequence is well established, the consequence is likely to lead to overall operational failure of the larger system, and something useful and relatively easy can be done to prevent the operational failure of the larger system.

Prognostic sensors could be used to speed the refitting of AAN systems between combat pulses. For example, knowing that an embedded sensor would detect nascent crack formation in a key structural component could be a faster way to ensure that a system is pulse-reliable than performing a laborious, and possibly destructive, testing procedure during each maintenance check. A prognostic sensor might contribute to pulse reliability by warning a well trained driver to avoid certain stresses, thereby accepting a constraint on vehicle operation (a small degradation in performance) to avoid a larger system failure.
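The general principles above can be read as a simple screening rule for deciding which preconditions deserve a prognostic sensor. The sketch below is one hypothetical way to encode that rule; the class, the field names, and the yes/no judgments assigned to each example are illustrative assumptions and not data from this report. Brake lining wear and nascent crack formation are examples mentioned in the text; the other two candidates are invented for contrast.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Precondition:
    """A monitorable precursor of failure (all values are illustrative)."""
    name: str
    causal_link_established: bool   # precondition is known to lead to the failure
    threatens_larger_system: bool   # failure would likely stop the larger system
    easy_preventive_action: bool    # something useful and relatively easy can be done
    avoidable_by_design: bool = False  # a known design change removes the failure mode


def worth_monitoring(p: Precondition) -> bool:
    """Apply the screening principles for prognostic sensors."""
    # If the failure mode can be designed out, design it out instead of sensing it.
    if p.avoidable_by_design:
        return False
    # Otherwise all three usefulness conditions must hold.
    return (p.causal_link_established
            and p.threatens_larger_system
            and p.easy_preventive_action)


candidates = [
    Precondition("brake lining wear", True, True, True),
    Precondition("nascent crack in key structural component", True, True, True),
    Precondition("trim panel vibration", True, False, True),          # hypothetical
    Precondition("connector corrosion", True, True, True,
                 avoidable_by_design=True),                           # hypothetical
]

selected = [p.name for p in candidates if worth_monitoring(p)]
print(selected)
```

A real screening process would weigh these conditions by degree rather than as yes/no flags; the boolean form is used here only to keep the logic of the principles visible.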
Although prognostics is not a substitute for AAN mission reliability, it is clearly a complementary technology.

SCIENCE AND TECHNOLOGY INITIATIVES TO ACHIEVE AAN MISSION RELIABILITY

Based on the preceding analyses of the role of reliability (and related concepts, such as maintainability, availability, and durability) in reducing logistics burdens for AAN systems and the technological opportunities for improving reliability, the committee concluded that the Army should pursue the following areas of scientific research and technology development. The order of a numbered item reflects a rough order of priority.

AAN Mission Reliability

Defining AAN Mission Reliability. Reliability for AAN systems (or RAMD for AAN) must be defined in relation to AAN operational concepts, as illustrated in this chapter by the examples of pulse-reliable systems and the improved maintainability required for rapid refitting of a battle force between combat pulses. Additional aspects of AAN mission reliability can be defined as the operational concept for the AAN evolves. However, the war-fighters and technologists must make every effort to define mission reliability in objective, quantifiable, and accountable terms. In short, the terms must be usable in system design and trade-off processes. Working definitions of mission reliability must begin at the highest levels of small unit and force-on-force engagement analysis and proceed down to the reliability requirements for individual systems (e.g., AAN combat vehicles). Factoring reliability into the logistics analyses necessary to design, develop, and field systems that will meet AAN performance objectives by 2025 will require clearly linking reliability requirements to mission performance. The higher-level, functional definitions of reliability should be reviewed and updated by both war-fighters and technologists as the concepts of AAN operations evolve. The science and technology community should ensure that the lower levels of system analysis include reliability-related performance metrics that contribute to reliability at the next higher level of system integration.

Three Approaches to Mission Reliability

1. Designing for Reliability with a Distributed M&S Environment. Five extensions of current capabilities and design approaches must be incorporated into M&S tools at every level of system structure, from components to fully integrated systems.
These extensions are (1) models incorporated into the M&S environment that can represent the system properties and environmental conditions that affect AAN mission reliability requirements, (2) measurable reliability-related requirements defined for the models at each level in the M&S hierarchy, (3) iterative simulations up and down the hierarchy of models in the design and engineering process, (4) provisions for obtaining valid data on lesser known design options that could contribute to satisfying combinations of AAN performance goals (including the goal of mission reliability), and (5) mission reliability, defined by assessable reliability requirements, as a performance objective for design and engineering development that cannot be compromised to meet other performance objectives.

2. System Trade-offs That Include Reliability as a Primary Performance Goal. The M&S environment used for designing reliability into systems can also enable rational trade-offs when existing technologies and design concepts do not meet all of the primary performance goals. Especially in the near term, compromises will be necessary, and optimum system performance should take priority over meeting individual performance goals or the performance of a subsystem. If objective, measurable reliability requirements have been defined for the system at each level in the M&S environment, then adjustments to those requirements should flow down the hierarchy, and the consequences of other design changes on reliability should be assessed upward through the hierarchy. Novel or less conventional technological or design alternatives should be evaluated in terms of their impact on reliability, as well as on other performance goals. Contractor proposals should be evaluated, and contracts awarded, on the basis of how well they meet reliability requirements, as well as other performance goals.

3. Application of Materials Science to New Reliability Solutions. Although the payoffs are likely to come after the 2010 deadline for decisions on major AAN systems, several important areas of basic research are likely to provide important contributions to developing systems that can meet AAN reliability requirements, as well as other primary performance requirements. The Army should continue to leverage its resources in these areas of research through networking with industry and academic partners and through active participation in joint programs. Three research areas that are particularly important for improving reliability are (1) investigating mechanisms of failure and incorporating this knowledge into M&S tools, (2) selecting or designing alternatives for materials that can meet AAN requirements, including reliability requirements, that familiar materials cannot meet, and (3) using embedded prognostic sensing technology in designing structures and components. The potential impact of this research and a realistic assessment of its potential contributions to AAN solutions should be framed in terms of refining, improving, and extending to smaller spatial scales the hierarchical M&S environment for systems design and logistics trade-off analyses.
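The second approach above calls for reliability requirements to flow down the M&S hierarchy and for the consequences of design changes to be assessed upward through it. The sketch below illustrates that two-way pass on a toy hierarchy. It is an assumption-laden illustration, not the committee's method: it treats every element as a series system (all children must work), allocates requirements by equal apportionment, and uses invented element names and reliability values.

```python
from dataclasses import dataclass, field
from math import prod
from typing import Optional


@dataclass
class Element:
    """One node in a hypothetical structure-function hierarchy."""
    name: str
    predicted: Optional[float] = None   # leaves: predicted mission reliability
    children: list["Element"] = field(default_factory=list)
    required: float = 0.0               # filled in by flow_down


def roll_up(e: Element) -> float:
    """Assess reliability upward, assuming a series structure at every level."""
    if e.children:
        e.predicted = prod(roll_up(c) for c in e.children)
    return e.predicted


def flow_down(e: Element, required: float) -> None:
    """Allocate a requirement downward by equal apportionment (one simple choice)."""
    e.required = required
    if e.children:
        share = required ** (1.0 / len(e.children))
        for c in e.children:
            flow_down(c, share)


def shortfalls(e: Element) -> list[str]:
    """Names of elements whose predicted reliability misses their allocation."""
    found = [e.name] if e.predicted < e.required else []
    for c in e.children:
        found += shortfalls(c)
    return found


# Hypothetical vehicle; all names and numbers are illustrative only.
vehicle = Element("vehicle", children=[
    Element("engine", predicted=0.99),
    Element("suspension", predicted=0.97),
    Element("electronics", children=[
        Element("sensor", predicted=0.995),
        Element("processor", predicted=0.999),
    ]),
])

roll_up(vehicle)
flow_down(vehicle, required=0.95)
print(f"vehicle predicted: {vehicle.predicted:.4f}, required: {vehicle.required}")
print("below allocation:", shortfalls(vehicle))
```

With these invented numbers the vehicle as a whole meets its requirement even though the suspension falls short of an equal share, which is the kind of situation in which a designer might either reallocate requirements among subsystems or look for a better component.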

