| Copyright © 2009. National Academy of Sciences. All rights reserved. Terms of Use and Privacy Statement |
A.5.5 Reliability
Definition of Reliability
The reliability of a system is defined by a measure of the success with which the system
conforms to the performance specification for which it was designed and commissioned
(formally tested and accepted). Without a performance specification, there is little basis for
defining the reliability of the system. When the performance of a system deviates from that
specified, this is called a failure. A failure is thus an event related to specified performance
deviation, with reliability of the system being inversely related to the frequency of failure events.
A system has internal states which, in aggregate, support the external state of the system. Thus
the external state of a system is an abstraction of its internal states. Usually a specification
defines external states of a system which are the operations performed by the system. Thus,
internal state failures cause failures of the external state, unless the system is designed to be
"tolerant" of internal state failures. The relating of the internal states is considered to be the
algorithms of the system or the methodology by which the system operates.
Errors are not faults as long as errors occur within the bounds of the specification. In
communications, bit errors are finite and are a function of the "internal states" of the
communications system. The major cause of bit errors in a system can be related to signal level
versus noise on a communications channel and may vary with the type of modulation utilized.
Timing jitter is another cause of bit errors. With a given signal to noise ratio and jitter of timing
signals, predictable bit error rates are achievable. A prudent communications equipment
designer includes bit error rate in his specifications.
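The dependence of bit error rate on signal to noise ratio can be illustrated with a short sketch. It assumes coherent BPSK modulation over an additive white Gaussian noise channel, which is a textbook case chosen only for illustration; the report does not name a modulation type, and the function names and the example values of Eb/N0 are hypothetical.

```python
import math


def bpsk_ber(ebn0_db: float) -> float:
    """Theoretical BER of coherent BPSK on an AWGN channel (assumed model)."""
    ebn0 = 10 ** (ebn0_db / 10)  # convert dB to a linear power ratio
    return 0.5 * math.erfc(math.sqrt(ebn0))


def meets_ber_spec(ebn0_db: float, ber_spec: float) -> bool:
    """True when the predicted BER is no worse than the specified BER."""
    return bpsk_ber(ebn0_db) <= ber_spec


if __name__ == "__main__":
    # Example Eb/N0 values are illustrative, not taken from the report.
    for ebn0_db in (4.0, 7.0, 10.0):
        print(f"Eb/N0 = {ebn0_db:4.1f} dB -> predicted BER = {bpsk_ber(ebn0_db):.2e}")
```

Under this assumed model, a few decibels of lost signal to noise ratio change the predicted error rate by orders of magnitude, which is why a BER specification is only meaningful together with the signal and noise conditions under which it applies.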
A system is composed of components, which in a non-integrated sense may also be considered as
a system. Terminology for this case is subsystem. Subsystems are composed of modules and
modules are composed of discrete components. The system is the interworkings of components,
modules, and subsystems to perform the system function. Thus, a communications system is a
subsystem of an ITS system. Its function is to provide communications paths from source to
destination within specified time periods and within specified error allowances. Failures of
subsystems and components may occur; however, as long as the system performs to
specifications, no system failure has occurred. Modern communications system architecture
supports fault tolerance by providing alternate communications paths.
NCHRP 3-51 · Phase 2 Final Report
A subsystem failure which does not cause a system failure is, of course, important from a
maintenance standpoint. In the sense of fault tolerant design, a subsystem or component failure
usually means the loss of redundancy; thus, the probability of system failure increases.
Therefore, it is important to restore redundancy to maintain a low probability of failure at the
system level.
With properly designed systems, errors should be tolerated, especially human errors. Systems
which fail because of errors have not been designed to accommodate a normally occurring
phenomenon. For this reason, systems should be designed to validate human entries from the
standpoint of reasonable parametric values and entry actions. While it may be impossible to
inhibit an error that meets entry validation logic from being propagated through a system, logic
should not cause a failure of the system (i.e., a system "crash") but rather an output which is
logically related to the errored input (even though the input is in error). In a prudently designed
system, algorithms can be employed to test values, tagging abnormal changes that are either
impossible or highly unusual.
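Entry validation of this kind can be sketched in a few lines. The example below is a minimal illustration, not a method prescribed by the report: the names `check_entry` and `Checked`, the tags, and the limits used in the usage lines are all hypothetical. It rejects unparseable or out-of-range entries without raising an exception, and it accepts but tags an entry whose change from the previous value is abnormally large.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Checked:
    value: Optional[float]  # parsed value, or None if the entry was not a number
    accepted: bool          # False means the entry must not be propagated
    tag: str                # "ok", "not_a_number", "out_of_range", "abnormal_change"


def check_entry(raw, low: float, high: float,
                previous: Optional[float] = None,
                max_step: Optional[float] = None) -> Checked:
    """Validate one operator entry against reasonable parametric limits."""
    try:
        value = float(raw)
    except (TypeError, ValueError):
        return Checked(None, False, "not_a_number")   # reject, do not crash
    if not (low <= value <= high):                      # also catches NaN
        return Checked(value, False, "out_of_range")
    if previous is not None and max_step is not None and abs(value - previous) > max_step:
        return Checked(value, True, "abnormal_change")  # accepted but tagged for review
    return Checked(value, True, "ok")


if __name__ == "__main__":
    # Hypothetical limits: a timing entry between 0 and 120 seconds.
    print(check_entry("35", 0, 120))
    print(check_entry("abc", 0, 120))
    print(check_entry("90", 0, 120, previous=30, max_step=20))
```

The design choice is that every path returns a result object, so a bad entry produces an output that is logically related to the input rather than a system "crash."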
The medium through which symbolic information is transmitted (fiber, copper, air, etc.) directly
relates to bit error rate, as does the structure of the signal and its ability to maintain
information content during transmission through the medium. This "signal structure" is referred
to as modulation. A third element is "noise" associated with the signal transfer, in relation to
signal strength. For example, wireless communications may have an expected bit error rate
(BER) of 1 error in 10^5 bits while a fiber optic communications link may have a BER of 1 error
in 10^[exponent illegible] bits or less. Being a function of signal to noise, wireless can sacrifice
communications distance for BER. For a typical operating distance with the transmitter effective
radiated power within FCC specifications, a typical wireless network is specified with a BER of 10^-5.
Errors should be expected in systems and the system design should accommodate expected
errors. An error occurrence within specification, again, is not a system failure. An error rate
which exceeds specification is a failure as long as all other specifications are met (such as
transmitted power, receiving sensitivity, timing stability, etc.).
Within a modern communications system, an internal error (assuming a properly designed
system) should be detected, tagged, statistically counted, and prevented from propagating
through the system. This is accomplished by use of simple parity, check sum parity, and
techniques such as forward error correction coding (which is a more sophisticated form of
parity). Monitoring the statistics of bit errors can be an indication of a pending failure. An
increase in bit error rate indicates a problem in communications links, such as increased
interfering noise in wireless transmission or an aging transmitter solid state device causing a
decrease in transmit signal power.
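The detect, tag, and count sequence can be sketched as follows. This is a simplified illustration under stated assumptions: the additive modulo-256 checksum, the class name `ErrorCounter`, and the frame layout are hypothetical stand-ins, not the coding used by any particular product, and forward error correction is not shown.

```python
def even_parity_bit(byte: int) -> int:
    """Parity bit that makes the total number of 1 bits even."""
    return bin(byte & 0xFF).count("1") % 2


def checksum(payload: bytes) -> int:
    """Simple additive checksum, modulo 256 (an assumed, illustrative scheme)."""
    return sum(payload) % 256


class ErrorCounter:
    """Counts frames and detected errors so the error rate can be trended."""

    def __init__(self) -> None:
        self.frames = 0
        self.errors = 0

    def receive(self, payload: bytes, received_checksum: int) -> bool:
        """Return True if the frame is good; a bad frame is counted and not passed on."""
        self.frames += 1
        ok = checksum(payload) == received_checksum
        if not ok:
            self.errors += 1
        return ok

    def frame_error_rate(self) -> float:
        return self.errors / self.frames if self.frames else 0.0


if __name__ == "__main__":
    counter = ErrorCounter()
    good = b"\x10\x20\x30"
    counter.receive(good, checksum(good))          # clean frame
    counter.receive(b"\x10\x21\x30", checksum(good))  # one corrupted byte
    print(counter.frame_error_rate())
```

Logging `frame_error_rate()` at regular intervals gives the statistic whose upward trend would indicate a pending failure.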
Within wireless communications systems, a failure may occur and yet the system may be
performing to specification. The reason is that the wireless communications system must
operate within an environment which it does not control. Specifications for wireless
communication define the maximum noise environment in which the system provides BER to
specifications. Design margins for signal strength at receivers, when properly designed,
accommodate normal fade conditions based on statistical weather variations. Should a very
unusual weather condition occur or a noise source be installed near a receiving site, not
considered within the initial installation design, the noise floor will rise or the signal strength will
decrease, which negatively impacts signal-to-noise (S/N) ratio, thus causing the BER to exceed
specification. In fact, the wireless system is performing to specification as predicted by the
S/N ratio. Thus, in wireless systems, monitoring noise floor and signal strength at the receiver
provides a means of evaluating whether the system has failed or the medium environment has
changed. This is not true of wire line or optical links since the cable is part of the system and
can be maintained. It is impossible to "maintain" the atmosphere; it can only be monitored as to
the conditions supporting communications.
From an operator's standpoint, an unacceptable BER, causing system response to degrade and
responsive "answers" to be unavailable, is a system failure and corrective action is required. For
wireless, where an unusual change in medium conditions has occurred, this is not a system failure
in the true sense of reliability. Where a system has been specified to need a 15 dB signal margin
to accommodate fade and ambient noise changes and where this signal margin is not provided,
then the system has failed in the true sense of a failure, since the required margin is "out of
specification."
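The distinction drawn above between a system failure and a change in the medium can be expressed as a small decision routine. This is a simplified sketch, assuming that the receiver reports a measured noise floor and that the installation design records its signal margin; the function name, parameter names, and return labels are hypothetical, and a real assessment would also examine received signal strength over time.

```python
def assess_wireless_link(ber: float,
                         ber_spec: float,
                         noise_floor_dbm: float,
                         max_spec_noise_dbm: float,
                         design_margin_db: float,
                         required_margin_db: float = 15.0) -> str:
    """Classify the condition of a wireless link from monitored quantities."""
    # A margin below the specified value is a failure regardless of the current BER.
    if design_margin_db < required_margin_db:
        return "system_failure_margin_out_of_spec"
    if ber <= ber_spec:
        return "within_spec"
    # BER is excessive: decide whether the environment or the system is responsible.
    if noise_floor_dbm > max_spec_noise_dbm:
        return "environment_changed"
    return "system_failure"


if __name__ == "__main__":
    # Illustrative numbers only; none are taken from the report except the 15 dB margin.
    print(assess_wireless_link(ber=1e-3, ber_spec=1e-5,
                               noise_floor_dbm=-85.0, max_spec_noise_dbm=-95.0,
                               design_margin_db=15.0))
```

In either of the last two cases the operator still sees degraded service and corrective action is needed; the classification only indicates where to look.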
In summary, physical components have finite functional life. As components wear and/or age,
parameter values may change. Component parameters change with temperature. The failure of
the component may vary based on the ability of interconnected components to accommodate
normal changes such as wear and/or aging. Poor designs, not employing worst case design
principles for component parameter variation, can experience rapid aging and failure. Failure
rate of a component is statistically predictable. Reliability models can be developed to predict
failure rate of integrated components, modules, and subsystems. The predicted mean time
between failures (MTBF) of a system, when analytically computed using recognized reliability
engineering procedures, is significant to the success of achieving a reliable system when installed.
Formal reliability testing on a subsystem may be conducted. These tests, to be valid, must
include a statistically significant sample of equipment and must be conducted over the operational
environmental variation expected for the system. This includes over and under power conditions,
temperature extremes, and rapid temperature variations. Any reliability information obtained
without formal analysis and testing is "highly suspect" and is usually incorrect.
Reliability does not just "happen." It must be designed into equipment and a Quality Assurance
Program must be in place to assure that material, production processes, assembly, and tests are
conducted in compliance with the design and associated production drawings and procedures.
Reliability is assured with proven design and test approaches. Product "burn-in" (for
environmental stress testing) to preclude "infant mortality" of components is a must for quality
products.
In the ITS procurement process, reliability has not been adequately stressed and managed. Even
with the emerging "Life Cycle Cost" procurements, verifiable MTBF information is necessary;
otherwise, the cost analysis is unsubstantiable.
A.5.5.1 How Reliability is Achieved
Reliability in communications equipment starts with quality design. Quality design includes
consideration for the statistical variation of signals based on component differences, tolerance
differences in assembly, variations over operation temperature, and variations due to prime
voltage. A quality product design works with worst case variations anticipated through the
manufacturing process and during operations.
Second, product reliability is achieved by assuring that products are manufactured under a well
managed quality assurance program such as that recommended by Bellcore or ISO 9000
through 9003. Quality Assurance (QA) involves verifying that correct manufacturing assembly
and test processes and correct materials are used in production of the product. This includes
attention to Electrostatic Discharge (ESD) and protection of sensitive devices. QA also
verifies that the product conforms to specification prior to shipping.
Third, product reliability is achieved through use of Electrical and Environmental Stress (EES)
testing over the operational specifications of the equipment. Stress testing, of which "burn-in" is
a subset, assures that weak components and marginal electromechanical connections are found.
Thus infant mortality of marginal components is precluded and all electromechanical connections
are verified to be suitable to support functional operation. Without EES testing at the factory
level, the jurisdictional installation becomes the place of burn-in.
Use of large scale integration further supports product reliability. The reason is that mechanical
interconnects are minimized and noise interference is more easily managed when protection is
considered a part of the integrated circuit design. Perhaps the use of large scale integrated
circuits has enhanced reliability of advanced communications products compared with
conventional products more so than improved manufacturing methods and testing.
Selection of components which are certified over operating temperature range is important. Also
important is a design which manages heat dissipation from components, especially integrated
circuits. Components which are operated at temperatures over certification can have their life
expectancy reduced by 50% or more.
Redundancy is another method of improving reliability by increasing fault tolerance. There are
various forms of redundancy depending on the desired reliability. Redundancy provides the
capability for a backup device or unit to assume operation in case of the primary unit failure.
The use of real-time fault tolerance requires failure detection, fault propagation prevention, and
switch-over to the redundant unit. Non-real-time redundancy may be used. This involves a
standby unit being manually switched into operation upon detection of failure in a primary unit.
Very high reliability systems use a two-out-of-three match: where two outputs do not agree, a
failure occurs. Some space systems use two-out-of-three matching for reliability.
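Two-out-of-three matching can be sketched as a simple majority voter. The example is illustrative only; the names `vote_two_out_of_three` and `VotingFailure` are hypothetical, and the outputs are assumed to be directly comparable values. The voter also reports which unit disagreed, since a dissenting unit represents lost redundancy that should be restored.

```python
from collections import Counter
from typing import Hashable, List, Sequence, Tuple


class VotingFailure(Exception):
    """Raised when no two of the three outputs agree."""


def vote_two_out_of_three(outputs: Sequence[Hashable]) -> Tuple[Hashable, List[int]]:
    """Return the majority output and the indexes of any dissenting units."""
    if len(outputs) != 3:
        raise ValueError("exactly three outputs are required")
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise VotingFailure("no two outputs agree")
    dissenters = [i for i, out in enumerate(outputs) if out != value]
    return value, dissenters


if __name__ == "__main__":
    print(vote_two_out_of_three([5, 5, 5]))   # all units agree
    print(vote_two_out_of_three([5, 7, 5]))   # unit 1 is outvoted and flagged
```

A single disagreeing unit does not interrupt service, which is why the unit as a whole continues to perform to specification while maintenance is scheduled.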
With real-time fault tolerance, as long as the unit continues to perform to specifications, it has
not failed. Thus Mean Time Between Failures (MTBF) in the 10s of years are possible through
use of real-time fault tolerance.
Network design in communications also supports high reliability. Having alternate
communications paths provides a means of communication if a primary path is disrupted.
Disruption may be through cable damage or, in the case of wireless, an obstruction in the radio
line of sight. Path-switched optical rings and packet radio networks are examples of cable and
wireless communications path disruption fault tolerance. In packet radio networks, to achieve
fault tolerance a system must be designed with a minimum of two communications paths for
each transceiver location.
In jurisdictional communications networks, there is a high probability of cable damage or
wireless communication path blockage due to construction activity as the jurisdiction grows.
Thus, use of communications path redundancy is very important in achieving reliability. There
are a variety of approaches to path redundancy. An unfolded counter rotating ring is perhaps the
simplest, with star structures providing more redundancy paths and thus higher path reliability at
higher cost per communications path (i.e., cable infrastructure). Reliability must be traded off
with cost of more elaborate, redundant network geometries such as star versus ring versus
interworking rings.
A compromise on star architecture is interworking optical ring topology related to SONET.
Interworking rings can sustain more than a single break in a fiber cable (number depending on
ring interworking geometry) and can provide more usable geographic area coverage compared
with a "star" topology, usually at a lower cost compared with "star" topology within a
metropolitan area.
Again, reliability must be designed into equipment and into software. Reliability cost can
usually be justified based on life cycle cost analysis which considers cost of emergency repairs
and related logistics in large systems. Unfortunately, system reliability objectives may be
compromised because of limited funds for system acquisition, since maintenance funds usually
come from different funding sources and cannot be used for acquisition.
A.5.5.2 Reliability Considerations of Older Communications Technology Versus
Advanced Communications Technology
Older communications technology was designed with discrete components and limited integrated
circuitry. Only since 1990 has hybrid integrated circuitry been stressed in systems. Hybrid
indicates the use of a combination of digital and analog technology in an integrated circuit.
With evolution of integrated digital/analog circuitry in a single "chip," and advances in
applications-specific integrated circuits, significant component reduction has resulted. MTBFs
of communications devices have grown from several thousand hours to tens of thousands of
hours. It is not uncommon to find a 20,000 hour MTBF advanced communications product, and
new products are emerging with five-year (43,800 hours) MTBFs. Using redundancy, systems
availability levels of 99.98% are achievable and are reasonably common with the use of
advanced communications technology within the telephone industry (such as SONET).
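The arithmetic behind these figures can be checked with a short calculation. The sketch below uses the standard steady-state relation availability = MTBF / (MTBF + MTTR) and assumes independent failures for the redundant case; the 24 hour repair time and all function names are assumptions introduced for illustration, not values from the report.

```python
HOURS_PER_YEAR = 8760  # 365 days x 24 hours


def years_to_hours(years: float) -> float:
    return years * HOURS_PER_YEAR


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of a single repairable unit."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(unit_availability: float, units: int = 2) -> float:
    """Availability when any one of several independent units can carry the load."""
    return 1.0 - (1.0 - unit_availability) ** units


def downtime_hours_per_year(avail: float) -> float:
    return (1.0 - avail) * HOURS_PER_YEAR


if __name__ == "__main__":
    print(years_to_hours(5))                    # five-year MTBF expressed in hours
    single = availability(20_000, 24)           # 24 h mean time to repair is assumed
    print(f"single unit: {single:.5f}")
    print(f"redundant pair: {redundant_availability(single):.7f}")
    print(f"downtime at 99.98%: {downtime_hours_per_year(0.9998):.2f} h/year")
```

An availability of 99.98% corresponds to a little under two hours of outage per year, which a single 20,000 hour MTBF unit with a one-day repair time would not reach on its own under these assumptions.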
Older communications technologies generally used analog techniques which are susceptible to
noise. Digital technology provides improved noise immunity. This is especially true of digital
video communications and optical communications versus twisted pair wire line systems using
analog modems.
Thus, advanced communications technology, assuming that it has successfully been
through beta testing and is in production status, usually exhibits a much higher MTBF than
conventional technology.
A.5.5.3 Impact of Installation Design on Reliability
Communications equipment may be designed with high reliability; however, if installation design
is improper, reliability can be significantly impacted. Installation design must properly consider
communications path length, signal attenuation, environmental noise, signal distortion caused by
marginal bandwidths, modulation types used, propagation time, synchronization, and other
factors impacting reliable point-to-point communications. Without adequate margins to
accommodate increases in attenuation and noise, communications reliability is impacted.
Similarly, installation design must consider ambient temperature extremes, such as regarding
internal heat dissipation and required air conditioning. If air conditioning is critical to equipment
operation, then it should be fault tolerant and should be backed up with emergency power.
Similarly, power outage and need for an uninterruptible power supply (UPS) for system reliability
must be considered. Batteries must be sized to the maximum outage period acceptable, prior to
activating a motor/generator. Continuous load factor must be considered to assure reliability
because some UPS equipment only allows 80% continuous load.
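The sizing considerations above reduce to simple arithmetic, sketched below. The 80% continuous load factor comes from the text; the inverter efficiency, the example loads, and the function names are assumptions made for illustration, and a real design would also account for battery aging, temperature, and depth of discharge.

```python
def required_ups_rating_va(load_va: float, continuous_load_factor: float = 0.8) -> float:
    """Minimum UPS rating so the load stays within the allowed continuous fraction."""
    if not 0.0 < continuous_load_factor <= 1.0:
        raise ValueError("continuous_load_factor must be in (0, 1]")
    return load_va / continuous_load_factor


def required_battery_wh(load_w: float, max_outage_hours: float,
                        inverter_efficiency: float = 0.9) -> float:
    """Battery energy needed to carry the load for the longest acceptable outage."""
    if not 0.0 < inverter_efficiency <= 1.0:
        raise ValueError("inverter_efficiency must be in (0, 1]")
    return load_w * max_outage_hours / inverter_efficiency


if __name__ == "__main__":
    # Hypothetical cabinet: 800 VA (500 W) load, 30 minutes until the generator starts.
    print(required_ups_rating_va(800))      # UPS rating in VA
    print(required_battery_wh(500, 0.5))    # battery energy in Wh
```

With these example numbers, an 800 VA load calls for a UPS rated at about 1,000 VA, so selecting a unit by nameplate rating alone would leave no continuous-load headroom.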
Proper installation design is just as important as proper equipment selection in ensuring high
reliability systems. Attention to professional installation design and quality inspection of the
installation provides high confidence that system reliability is not compromised.
A.5.5.4 Environmental Impact on Reliability
Temperature mismanagement is one of the major contributors to equipment failure. It is also one
of the least understood requirements by jurisdictional personnel. As we review deployed ITS
communications equipment today, one of the major deficiencies noted is a disregard for
operating temperature specifications. Jurisdictions will emphasize the need for NEMA TS-2
environmental specifications for field controllers and will interconnect these controllers with
communications devices (such as Hayes modems) which are only designed to operate from +10°
to +40° C and with humidity of 30% to 85% RH, noncondensing. The argument is that the
devices work; however, what is not understood is that reliability is significantly decreased. In
areas where devices are operated in high ambient temperatures (such as 50° C in Arizona,
Nevada, or California desert communities) reliability can be impacted, causing failure after a
relatively short operating period.
Electronic equipment using solid state components (such as integrated circuits and transistors)
must be designed to maintain operating temperatures of junctions at an acceptable level to
achieve the reliability objective of the equipment within the specified operating temperature
range and cooling approach. There are many types of solid state components from low cost
plastic, to industrial grade ceramic, to surface mount MIL-STD-883 compliant. There are many
approaches to cooling components, from attachable fins for extended surface and
conduction/convection cooling to use of a ground plane on a printed circuit board which conducts
heat to printed circuit board guides/card holders, and finally to a large heat sink such as the case.
Equipment operated in an office environment usually does not require sophisticated heat
management; thus, plastic (not a good heat transfer material) components and cooling fans are
used.
When a device designed to operate in a stable office environment is employed on the roadside, it
is exposed to a much wider temperature environment which it was not designed to manage. In a
NEMA cabinet on a freeway in Arizona with a 49° C ambient environment and solar loading of
the cabinet, an internal temperature of 74° C is highly probable during peak solar loading
periods. Solid state component junction temperature will increase perhaps by 60° C, causing an
MTBF decrease from 10^7 hours to 10^4 hours (10 million to 10 thousand hours). Figure
A.5.5.4-1 illustrates typical failure rates of silicon and GaAs integrated circuits. When heat is
properly managed, reliability of 10^7 hours is achievable at the component level. "Black box"
reliability is obtained through analyzing the interconnection of components in serial and parallel
structures. Reliability of a serially interconnected group of semiconductor components is
calculated by the same basic formula used to calculate resistance of parallel resistors.
    Serial Failure Rate = 1 / (1/FR_1 + 1/FR_2 + ... + 1/FR_n)
Where:  FR = Failure rate of component in service
        n  = Nth component
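The parallel-resistor form of the serial calculation can be worked numerically. The sketch below assumes, as the surrounding text does, that each component figure is expressed in hours between failures (an MTBF), so that reciprocals add for components in series; the function name and the component counts are hypothetical.

```python
from typing import Iterable


def serial_mtbf_hours(component_mtbf_hours: Iterable[float]) -> float:
    """Combined hours between failures for components connected in series.

    Uses the same form as resistors in parallel: 1/total = sum of 1/each.
    """
    values = list(component_mtbf_hours)
    if not values or any(v <= 0 for v in values):
        raise ValueError("at least one positive value is required")
    return 1.0 / sum(1.0 / v for v in values)


if __name__ == "__main__":
    # Ten well cooled components at 10^7 hours each.
    print(serial_mtbf_hours([1e7] * 10))
    # The same group with one component degraded by heat to 10^4 hours.
    print(serial_mtbf_hours([1e7] * 9 + [1e4]))
```

The combined figure is always lower than the weakest component's figure, so a single overheated part dominates the result for the whole serial group.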
[Figure A.5.5.4-1: Typical failure rates of silicon and GaAs integrated circuits]
As can be seen, the failure rate of integrated components can be even more significantly
impacted by reduction in each individual failure rate because of the serial interconnects.
Paying attention to environmental design specifications of communications equipment applies
not only to old components but also to new technology. Where air conditioning is used to meet
environmental requirements, failure of the air conditioning unit becomes a critical point of failure
for the system, even if fault tolerance is used in site electronics modules. The reason is that both
the back-up and primary electronic modules will be subject to rapid decrease in time between
failures. Thus, use of redundant air conditioning is necessary, or use of rising temperature alarms
which support air conditioner repair prior to the critical failure temperature of the electronics.
A.5.5.5 Software Reliability
The Institute of Software Engineering (ISE) is recognized by the U.S. Government and industry
as the certifying agency for software quality. Companies are certified based on the quality of
their software development organization, management, development tools, test approach,
documentation, and quality assurance.
The Institute of Electrical and Electronic Engineers (IEEE) has published standards and
guidelines for producing reliable, quality software. IEEE Standard 982 defines "Measures to
Produce Reliable Software." IEEE Communications Magazine (Volume 32, No. 10, October
1994, pp. 58-63) presents an excellent article entitled "Life after ISO 9001: British Telecom's
Approach to Software Quality," discussing the impact of software quality assurance on
software reliability.
For software to be reliable it must be well designed and well tested, including stress testing.
Stress testing is associated with worst case loading of the processor (within its specified
operational envelope) to assure that the integrated hardware and software can perform.
Hardware/software integrated timing is thus tested, as is the ability of the hardware/software to
handle buffering requirements under worst case loading. Further tested are errors, to validate that
the software does in fact detect error conditions and prevent error propagation that could cause a
software "crash."
Structured software also supports reliability by preventing multiple entry and exit points from
software modules (100 to 200 lines of code to perform a subtask).
While there are design techniques for enhancing software reliability, these are seldom used
because of cost. One example is a two-out-of-three voting technique, in which three software
routines written by three different software engineers have outputs compared, with the acceptance
factor being that "two must match" for valid output. In general, well tested software will provide
reliable communications operations. As software matures, "bugs" are identified and repaired in
new release versions. In general, higher software versions should be more reliable; however,
major performance additions may increase probability of software failure.
In general, when communications equipment uses software, procurement attention should take
into consideration:
· Was it developed under a formal software quality and reliability program?
· Is it structured?
· Was it thoroughly tested, including software stress testing?
· Were formal test procedures used and test results documented?
· Is it designed to accommodate expected error conditions? and
· Is adaptation to a specific application by table driven entries?
Software quality and reliability is a formal process and should be treated by jurisdictions as an
important requirement of a procurement.
Advanced communications systems employ more software and firmware than conventional
communications systems. The result is that hardware components are minimized since these
systems use digital processors which support multifunctional operations. As digital signal
processors have been introduced into communications equipment, higher processing performance
has had an additional impact on minimizing hardware components. As components are
eliminated and hardware reliability is transposed to digital signal processor (DSP) chips,
advanced communications hardware reliability is increased.
Since software and/or firmware provides the communications functionality, reliability of
software must be considered. Contrary to the classic statement: "Software doesn't break,"
poorly designed software does in fact result in malfunctions. For this reason it is important for
jurisdictions to procure software from companies which have formal software reliability and
quality programs.
A.5.5.6 Specifications and Practices Related to Reliability of Communications
Systems
Tables A.5.5.6-1a and b summarize specifications and practices related to U.S.
telecommunications and related data communications industries. Within the Bellcore family of
specifications (a), there are numerous specifications related to communications devices,
subsystems, networks, and services which relate to system reliability. These specifications define
standards, procedures, and tests to be used in construction, integration and testing of
communications hardware and software for implementation within public and private
telecommunications systems.
Quality Assurance directly relates to reliability. Thus Bellcore has included Quality and
Reliability in common specifications. Poor quality assurance usually results in poor reliability
when products are integrated into operational systems.
Bellcore TR-NWT-000332 entitled "Reliability Prediction Procedures for Electronic
Equipment" is an effort to standardize how reliability figures are placed on equipment.
TR-NWT-000332 is Bellcore's equivalent of MIL-HDBK-217F, which is the Department of
Defense's methodology for developing reliability predictions. Table A.5.5.6-2 summarizes
hardware and system design issues related to communications reliability.
Table A.5.5.6-1a
Industry Recognized Specifications and Practices Related to Reliability
Bellcore Specifications
Specification Number    Title
FR-NWT-000796           Reliability and Quality: Generic Requirements
SR-NWT-001756           Automatic Protection Switching for SONET
SR-NWT-001907           Transport Reliability Analysis, Generic Guidelines
SR-NWT-002159           Quality Systems and Software Requirements
SR-STS-02579            Algorithms for Redundancy Management
SR-TSY-000385           Reliability Manual for Telecommunications
SR-TSY-001130           Reliability and System Architecture Testing
SR-TSY-001369           Reliability of Laser Diodes and Modules: Generic Requirements
TA-NWT-357              Assuring Reliability of Components Used in Telecommunications
                        Equipment: Generic Requirements
TA-NWT-000942           Hardware Reliability Assurance Program: Generic Requirements
TA-NWT-001089           Electromagnetic Compatibility and Electrical Safety: Generic
                        Criteria for Network Telecommunications Equipment
TA-NWT-001202           Supplier Quality Process Requirements
TA-NWT-001221           Generic Requirements for Passive Fiber Optic Component
                        Reliability Assurance Practices
TR-NWT-000063           Network Equipment-Building Systems: Generic Requirements
                        (Environment and Electrical Safety)
TR-NWT-000284           Reliability and Quality, Switching Systems: Generic Requirements
TR-NWT-000332           Reliability Prediction Procedures for Electronic Equipment
TR-NWT-000418           Reliability Assurance Requirements for Fiber Optic Transport
                        Systems
TR-NWT-000468           Reliability Assurance Practice for Optoelectronic Devices in
                        Central Office Applications
TR-NWT-000870           Electrostatic Discharge Control in the Manufacture of
                        Telecommunications Equipment: Generic Requirements
TR-NWT-000974           Generic Requirements for Telecommunications Line Protectors
TR-NWT-001011           Generic Requirements for Surge Protective Devices on AC
TR-NWT-001042           SONET Path Protection Switched Ring
TR-NWT-001075           Generic Requirements for Outside Plant Bonding and Grounding
                        of System Hardware
TR-NWT-001230           SONET Bidirectional Line Switched Ring Equipment Generic
                        Criteria
TR-NWT-001274           Reliability Qualification Testing of Printed Wiring Assemblies
                        Exposed to Airborne Hygroscopic Dust
TR-NWT-001305           Generic Requirements for Surge Protected Terminal Blocks
TR-NWT-001349           Reliability and Quality Measurements for Telecommunications
                        Systems; Supplier Support Measures
TR-NWT-001400           SONET Unidirectional, Dual Feed, Self-Healing Ring Generic
                        Criteria
TR-TSY-000512           Reliability: Generic Requirements
TR-TSY-000757           Generic Requirements for Uninterruptible Power System (UPS)
TR-TSY-000929           Reliability and Quality Measurements for Telecommunications
                        Systems
TR-TSY-000983           Reliability Assurance Practices for Optoelectronic Devices in
                        Loop Applications
Table A.5.5.6-1b
Technical Requirements and Specifications Related to Maintainability
Institute of Electrical and Electronic Engineers
. . .
Specification Number Title
C62.41-1991 Recommended Practice on Surge Voltages in Low Voltage AC
Power Circuits
C62.47-1992 Guide on Electrostatic Discharge
C63.12-1987 Recommended Practice for Electromagnetic Compatibility Limits
141-1986 Recommended Practice for Electrical Power Distribution
142-1991 Recommended Practice for Grounding of Industrial and
Commercial Power Systems
241-1990 Recommended Practice for Electrical Power Systems in
Commercial Buildings
242-1986 Recommended Practice for Protection and Coordination of
Industrial and Commercial Power Systems
446-1987 Recommended Practice for Emergency and Standby Power
Systems
493-1990 Recommended Practice for the Design of Reliable Industrial and
Commercial Power Systems
518-1982 Guide for the Installation of Electrical Equipment to Minimize
Noise to Controllers from External Sources
730-1989 IEEE Standard for Software Quality Assurance Plans
828-1990 IEEE Standard for Software Configuration Management Plans
829-1983 IEEE Standard for Test Documentation
982.1/2-1988 Measures to Produce Reliable Software
1008-1987 Standard for Software Unit Testing
1012-1986 Standard for Software Verification and Validation Plans
1042-1987 Guide to Software Configuration Management
1061-1992 Standard for Software Quality Metrics Methodology
1074-1991 Standard for Developing Software Life Cycle Processes
1100-1992 Recommended Practice for Powering and Grounding Sensitive
Electronic Equipment
SH 15537 IEEE Standards for Electromagnetic Compatibility
Society of Automotive Engineers
SAE J1113 Electromagnetic Susceptibility Procedures for Verifying Vehicle
Components
Department of Defense
DOD 5000.3 Test and Evaluation Master Plan - Guidelines
MIL-HDBK-217 [title illegible]
MIL-HDBK-781 Reliability Test Methods, Plans, and Environments for
Engineering Development, Qualification, and Production
MIL-STD-210C [title illegible]
MIL-STD-461C EMI and Susceptibility Requirements
MIL-STD-462 Test Procedures for EMI/EMC
MIL-STD-810E Environmental Test Methods and Engineering Guidelines
MIL-STD-883D Test Methods and Procedures for Microelectronics
MIL-STD-2168 Software Quality Assurance
Federal Communications Commission
Part 15 Unintentional Radiators, Class A and B Devices, Radiated
Emissions and Susceptibility
Table A.5.5.6-2
Summary Overview of Communication Reliability Considerations
Equipment Related Issues
· Formally determined, guaranteed mean time between failure
· System designed to reliability objectives using subsystem MTBF
· Fault tolerant hardware and network architecture
· Equipment designed for compatibility with environmental variations
· Equipment meeting radio frequency interference and electromagnetic
compatibility standards
· Equipment designed for compatibility with power variation
· Equipment designed using worst case component parametric variations
· Equipment with dynamic test, fault detection and fault propagation
· Equipment using large scale integration
· Equipment manufactured under formal quality assurance program (using
specification-compliant components, materials, processes, and test
procedures including burn-in)
· Use of recognized standards and protocol
[Second column heading illegible]
· Proper network design with link margin and high probability of
maintaining signal/noise
· Compatible specifications at all OSI levels
· Fault tolerant network architecture
· Installation environment designed to equipment compatibility and to
reliability objectives (such as redundant air conditioning)
· Prime power interconnect designed to reliability objectives with use of
battery backed-up, uninterruptible power system
· Data element secured to prevent unskilled, unknowledgeable tampering
· Installation design considering electromagnetic compatibility and
grounding standards
· Lightning protection on power and metallic signal lines
· Post installation testing using stress testing techniques (full loading)
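The table item on designing a system to reliability objectives using subsystem MTBF can be illustrated with a small calculation. The sketch below is not from the report. It assumes the subsystems are in series (any one failure fails the system), fail independently, and have constant failure rates, so that failure rates (1/MTBF) simply add. The subsystem names and MTBF values are hypothetical.

```python
def series_mtbf(subsystem_mtbfs_hours):
    """Return the MTBF of a series system.

    Assumes independent subsystems with constant failure rates,
    so the system failure rate is the sum of the subsystem rates.
    """
    total_failure_rate = sum(1.0 / mtbf for mtbf in subsystem_mtbfs_hours)
    return 1.0 / total_failure_rate


# Hypothetical subsystem MTBFs in hours (illustrative values only).
subsystems = {
    "power supply": 100_000.0,
    "modem": 50_000.0,
    "controller": 100_000.0,
}

system_mtbf = series_mtbf(subsystems.values())
print(f"System MTBF: {system_mtbf:.0f} hours")
```

Under these assumptions the system MTBF is always lower than that of its weakest subsystem, which is one reason the table pairs MTBF budgeting with fault tolerant architecture.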
A.5.5.7 Summary of Reliability Issues
New communications equipment is more reliable than older equipment for the following reasons:
· Use of large scale integration;
· Use of software/firmware to reduce hardware components;
· Perfection of fault tolerant technology and application to advanced communications
equipment;
· Advances in adaptive signal processing and modulation technology;
· Evaluation of open standards; and
· Advances in network routing protocol and network management protocol.
Table A.5.5.7-1 summarizes technology advances affecting communications reliability.
Table A.5.5.7-1
Summary of Technology Advancements Impacting Communications Reliability
Technology Advancements
· Large scale integrated circuits, programmable logic arrays (PLA),
application specific integration circuits (ASICs) and digital signal
processors reducing component count
· Evolution of fault tolerant technology from NASA and communications
equipment
· Development of OSI interface standards to 7 levels
· Emphasis on network interoperability with associated standards for
bridges, routers, and switches
· Emphasis on network protocol standards with congestion control and
dynamic routing
· Improved transmission medium adaptation through use of echo cancelers
and adaptive modem technology
· Evolution of optics communications technology with low-loss, low-noise
transmissions
· Improved circuit design and printed circuit board design through modern
computer-aided design
· Network modeling technology to prove-before-build
Benefits
· Improved power management and filtering technology
· Advanced modulation techniques with improved immunity to [illegible]
· Emphasis on product design for reliability and quality assurance
· Improved IC packaging and heat management on printed circuit boards
· Development of internet protocol and dynamic routing
· Improvements in hardware and software test methodology
· Advancements in error detection and error correction
· Development of hybrid and microwave integrated circuit technology
· Lower cost performance evaluation and tradeoff analysis
Several articles amplifying this information may be of interest to the reader:
Proceedings of the IEEE, Volume 82, No. 7, July 1994, pp. 992-1004; "Predicting the
Reliability of Electronic Equipment," by M. Pecht and F. Nash.
IEEE Communications Magazine, October 1994, pp. 64-68; "Network Reliability Design
Techniques to Improve Customer Satisfaction," by M. Taka and T. Abe.
IEEE Communications Magazine, June 1993, pp. 40-43; "Incorporating Reliability
Specifications into the Design of Telecommunications Networks," by S. Nojo and
H. Watanabe.
Electronic Design News, June 1994, pp. 109-116; "Keep Metastability from Killing your
Digital Design," by G. Grosse.
Note: This article deals with timing stability versus MTBF of equipment.
Old communications equipment typically had MTBFs in the 2000 to 3000 hour range.
Advanced communications system technology can be obtained with MTBFs in the 50,000 to
100,000 hour range. The bottom line is whether a failure is critical. With modern fault
tolerance, network availability in the range of 99.998% is achievable.
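To put the MTBF and availability figures above in concrete terms, the sketch below uses the common steady-state relation availability = MTBF / (MTBF + MTTR) for a single repairable unit. This relation and the mean time to repair (MTTR) parameter are assumptions added for illustration; the report does not state repair times, and availability obtained through fault tolerance depends on the redundancy scheme rather than on this simple formula alone.

```python
HOURS_PER_YEAR = 8760.0


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability of a single repairable unit."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def downtime_minutes_per_year(avail):
    """Expected downtime per year implied by an availability figure."""
    return (1.0 - avail) * HOURS_PER_YEAR * 60.0


def max_mttr_hours(mtbf_hours, target_availability):
    """Largest MTTR that still meets the target (A = MTBF / (MTBF + MTTR) solved for MTTR)."""
    return mtbf_hours * (1.0 - target_availability) / target_availability


# 99.998% availability is the figure quoted in the text.
print(f"Downtime at 99.998%: {downtime_minutes_per_year(0.99998):.1f} minutes/year")

# MTBF values from the text; the 4-hour MTTR is a hypothetical repair time.
for mtbf in (2000.0, 50_000.0, 100_000.0):
    print(f"MTBF {mtbf:>9.0f} h, MTTR 4 h -> availability {availability(mtbf, 4.0):.5f}")

print(f"MTTR allowed at 50,000 h MTBF: {max_mttr_hours(50_000.0, 0.99998):.2f} hours")
```

An availability of 99.998% corresponds to roughly 10.5 minutes of downtime per year. Under this single-unit model, even equipment with a 50,000 hour MTBF would need repairs completed in about an hour to reach that figure, which helps explain why the text ties it to fault tolerance.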
Representative terms from entire chapter:
communications equipment