Copyright © 2009. National Academy of Sciences. All rights reserved.
3
Testing and Analyses of the ASP and PVT/RIID Systems
The committee was asked to evaluate the adequacy of past testing and analyses of the
advanced spectroscopic portal (ASP) systems performed by the Department of Homeland
Security’s (DHS’s) Domestic Nuclear Detection Office (DNDO), and the scientific rigor and
robustness of DNDO's testing and analysis approach. The Joint Explanatory Statement from
Congress states that the intent of the Secretary of Homeland Security’s consultation with the
National Academies is to “bring robustness and scientific rigor to the procurement process.” As
noted at the beginning of this report, when the committee ended its information gathering for this
interim report in mid-January, the testing and analyses were incomplete and DNDO had not
provided written reports describing test results. No one on the study committee observed ASP
tests before the committee was formed in May 2008. This chapter is based on the committee’s
observations in visits to ports of entry and test sites, reports of testing done before 2008 and
documented plans for 2008 tests, observations of performance tests conducted in 2008 at the
Nevada Test Site, and a briefing (October 8, 2008) on preliminary results from performance tests
done in 2008.
The Government Accountability Office (GAO), DHS’s Independent Review Team (IRT),
and Congress have already reviewed and criticized pre-2008 testing of ASPs and PVT/RIIDs.
The criticism resulted in the requirement for additional testing to support a decision about
procurement of ASPs. Another factor that led to the requirement for DNDO to revisit testing in
2008 is that Customs and Border Protection (CBP) was dissatisfied with the ASP systems’
reliability and compatibility with other CBP systems. Systems qualification testing, and
particularly systems integration testing, were more rigorous and demanding in 2008. These tests
took much longer than expected, and as of January 2009 only one vendor had successfully
completed systems integration testing.
DNDO, CBP, and their contractors have conducted many tests over the last three years. A
list of the major tests conducted on the ASPs and RPMs can be found in Table 3.1. DNDO has a
complex set of criteria to evaluate. Characterizing a system is a process, and no single set of
tests is expected to describe all variables thoroughly. Indeed, the scientific method describes a
cycle of hypothesis and experimentation which, when applied to instrument development, allows
for an iterative process of identifying and mitigating weaknesses. How the tests could be better
crafted to carry out this process is described in detail later in this chapter.
The process for testing radiation portal monitor systems, such as the ASP systems, begins
at the component level and progresses to the subsystem and system level. Initial testing is
conducted with components and subsystems in the laboratory, such as functional and
environmental testing of individual detector elements, graduating to larger subsystems and full
systems in systems qualification testing. The last of these is done at Pacific Northwest National
Laboratory (PNNL). Overall systems performance is measured with live radiation sources and a
simulated port of entry at the Nevada Test Site (NTS, see Figure 3.1), and field validation testing
is conducted outdoors at U.S. ports of entry with representative container cargo loadings.
Prepublication Copy
26 EVALUATING TESTING, COSTS, & BENEFITS OF ASPs: INTERIM REPORT
Table 3.1 Tests and Key Questions

NYCT Tests
  Description: ASP and PVT portals were installed at primary and secondary screening sites. The data collected were used for modeling and injection studies.
  Objective: To collect data (spectra) on stream-of-commerce cargo containers to feed into injection studies.
  Key questions: What does radiation in the stream of commerce look like? What is the range and variation in radiation emitted by typical cargo?

Special Testing (“Blind” or “Demo” Testing)
  Description: Set of 12 “relatively blind” test configurations. Tests were performed at NTS. Anticipated results were compared to results given to the operator. When available, underlying data (raw spectra) were evaluated by a third-party isotope identification algorithm, and these results were compared to operator results. Statistical analysis was performed by NIST to determine how special test results compared to standard test results.
  Objective: To assess vulnerabilities in the performance test plan. To evaluate the possibility that bias had been introduced into the test results by vendors or the test team. To provide additional data to the vendors for system development.
  Key questions: Has bias been introduced into the ASP test results by either vendors or the test team? Does the test plan contain enough diversity of sources and test configurations?

Phase 3 Tests
  Description: Tests performed at NTS with various sources and attenuating materials in cargo containers moving at different speeds.
  Objective: To aid in development of secondary screening operations and procedures.
  Key questions: How do known areas for improvement affect the performance of ASPs, and what can be done to address them?

Environmental Product Qualification Testing
  Description: Tests took place at the vendor’s facility and at a Nationally Recognized Testing Laboratory and were witnessed by government representatives.
  Objective: Verify that the system can function within the environment, including weather and climate, in which the system will be operated and maintained.
  Key questions: Are all components of the ASP system durable enough to withstand the climate and environmental stresses at ports of entry (POEs) across the country?

Systems Qualification Tests
  Description: A series of tests designed by the vendors and approved by DNDO to assure that the system requirements of the performance specification have been met. Tests took place at the vendor’s facility and PNNL’s 331G facility and were witnessed by government representatives.
  Objective: Verify technical achievement of the system requirements as described in the Performance Specification for ASPs.
  Key questions: Have the basic system requirements been met? Is the system ready to enter performance testing? Is the ASP system suitable and deployable within the existing nuclear detection architecture?

Performance Tests
  Description: Cargo containers loaded at NTS with varying configurations of shielding material, masking material, threat objects, and surrogate sources are run on a roadway flanked by the PVT and ASP detectors in sequence. Secondary RIID screening is carried out in the staging area.
  Objective: Evaluate system performance and collect data to support operational test and evaluation:
    - Compare ASP system performance with that of the PVT and RIID systems
    - Characterize the effect of shielding and masking on ASP and RIID performance against threat objects and NORM
    - Collect data to support verification of system requirements
    - Collect data in support of operational testing and evaluation requirements
  Key questions: How do the ASP systems perform relative to the current generation of detection and identification systems? What are the thresholds for detection of threat materials? How do the systems perform with threat sources in the presence of masking and attenuating material?

Integration Tests
  Description: Tests conducted by DNDO at PNNL’s 331G test facility. Test systems were placed in a simulated port-of-entry environment and evaluated for compatibility with CBP standard operating procedures (SOP) and other equipment, such as gate arms and traffic lights. Both hardware and software were evaluated.
  Objective: Demonstrate that the ASP systems are ready to be integrated into the associated interdiction systems at U.S. POEs for field validation in primary and secondary configurations.
  Key questions: Do the ASP systems meet the necessary integration requirements associated with their deployment, and are they suitable for operator use?

Field Validation Test
  Description: Test conducted at ports of entry by CBP, with ASP systems in place and screening stream-of-commerce trucks. PNNL will draft the final report.
  Objective:
    - Perform system installation procedures and process
    - Train officers in the use of the system
    - Familiarize officers with operations of ASP systems with PVT systems
    - Conduct operations with ASP alone
  Key questions: Does the ASP system fit readily into the existing POE RPM sites? Are the systems suitable for operator use? Is the ASP system interoperable with users/stakeholders to execute the nuclear detection and reporting mission?

Operational Test
  Description: ASP systems will be placed at a POE in both primary and secondary locations in conjunction with PVT monitors to screen stream-of-commerce cargo containers. The systems will be operated by CBP officers using standard operating procedures. A survey of CBP personnel will also occur.
  Objective: Validate the operational effectiveness and suitability of ASP at ports of entry under realistic operating conditions.
  Key questions: How effective is the ASP system in terms of time to conduct screening, number of referrals to secondary screening, involvement of LSS, and reliability, availability, and maintenance of the system? Have CBP personnel identified any concerns or limitations of the system? Is the ASP system interoperable with users/stakeholders to execute the nuclear detection and reporting mission? Is the ASP system suitable and deployable within the existing nuclear detection architecture?
Figure 3.1 (a) Computer rendering of the PNNL 331-G site; (b) ASP test track at the Nevada Test Site.
Because certain masking or shielding materials can interfere with the ability of the
warning system to detect or identify objects containing special nuclear material (SNM), tests are
also conducted at NTS with such masking or shielding materials and SNM. Fully integrated
operational tests follow the field validation tests and also are conducted outdoors at selected U.S.
ports of entry.
The committee has focused much of its attention on performance testing. This is not
because the other tests are unimportant: Regardless of the performance, the portals will be of
little use if they cannot operate in real conditions (rain for example) or if they are incompatible
with CBP’s computer systems. However, the design, execution, and evaluation of these tests are
comparatively routine, even if solutions to problems revealed by the tests are not. The design,
execution, and evaluation of performance tests for the portals are more challenging and involve
more of the science and engineering principles on which the committee has advice to offer.
Some types of testing for ASPs are constrained in ways that testing of many other procurement
subjects (those of the Department of Defense, for example) is not. The main restrictions arise from
DOE security regulations for SNM and from health and safety requirements. These requirements
make it necessary to separate the testing venues, so that security needs are met without affecting
health, safety, and commerce at operational ports. While it was hoped that later testing would address the criticisms
of the earlier testing, DHS still has to operate under the limitations and constraints of security
required for SNM and minimal impact to the flow of commerce. Furthermore, it is neither possible
nor desirable to test every possible combination of cargoes and configurations. Physical testing
with radiation sources, especially special nuclear material, is expensive and time consuming, and
procurement decisions must be made in a timely fashion. For all of these reasons, the tests need to
be designed strategically to answer questions about performance across the vast space of possible
cargo and threat objects, rather than testing that space comprehensively through gross effort.
As a general principle, the goals of testing and criteria for evaluation need to be clear and
testable for a test and evaluation program to be effective. In some past testing, the goals and
criteria were not clear, or they shifted with time. This is one factor that led to test designs the
results of which did not adequately answer key questions about performance. Furthermore, to be
useful, the goals and criteria need to be relevant. In this case, relevance means that the tests need
to reflect real-world cargo, real environments, and the actual operation of detectors
in the field. DNDO did base some of its test design on data collected on the stream of commerce
using a PVT system and an ASP system at the New York Container Terminal (NYCT). Much more
information relevant to test design could have been drawn from alarm data collected at ports of
entry around the country and correlated with shipping manifests, even without ASP data.
One set of goals follows from Congress’s language requiring that the ASPs demonstrate
“a significant increase in operational effectiveness.” DHS was responsible for defining this term
and in July 2008 issued the definition found in Sidebar 3.1. The criteria in
the definition pertain to detection, identification, referrals from primary screening to secondary
screening, and speed of screening.
SIDEBAR 3.1 DHS definition of Significant Increase in Operational Effectiveness of the ASP-C
Criteria for Significant Increase in Operational Effectiveness [SIOE] of the ASP-C when deployed for:
Primary Screening
If ASP-C satisfies all of the following four criteria for primary screening, then a SIOE has been
demonstrated, independent of whether the criteria for deployment to secondary screening have been
satisfied. These enhancements would increase CBP's capability to interdict SNM as well as reduce the
volume of traffic requiring secondary screening.
1. When Special Nuclear Material [SNM] is present in cargo without NORM, the probability of a correct
operational outcome for the ASP-C must be equal to or greater than the PVT RPM.[a]
2. When SNM is present in cargo with NORM, the ASP-C in primary must increase the probability of a
correct operational outcome compared to the current end-to-end system as defined above.
3. When licensable medical or industrial isotopes are present in cargo, the probability of a correct
operational outcome for the ASP-C must be equal to or greater than the PVT RPM.
4. When the only radioactive source present in the cargo is NORM, the ASP-C must refer at least 80%
fewer conveyances for further inspection than the PVT RPM.
Criteria for Significant Increase in Operational Effectiveness of the ASP-C when deployed for
Secondary Screening
If ASP-C satisfies both of the following criteria for secondary screening, then a SIOE has been
demonstrated, independent of whether the criteria for deployment to primary have been satisfied. These
enhancements would increase CBP's capability to interdict SNM while more consistently and
expeditiously executing secondary screening operations.
1. When compared to the handheld Radioactive Isotope Identification Device (RIID), ASP-C must
reduce, by at least a factor of two, the probability that SNM is misidentified as NORM, a
medical/industrial radionuclide, unknown, or no source at all.
2. When compared to the handheld RIID, the ASP-C must reduce the average time required to correctly
release conveyances from secondary screening.
[a] For HEU, ASP-C must show improved performance compared to PVT RPMs at operational thresholds.
SOURCE: Oxford et al. (2008)
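Once the underlying probabilities have been estimated, the sidebar's criteria reduce to simple inequalities. The Python sketch below illustrates that reduction under stated assumptions: the class, field, and function names are hypothetical (they are not DNDO's), the inputs are point estimates, and the sketch deliberately ignores statistical uncertainty as well as the HEU-specific condition in note a. Note that criterion 2 compares the ASP-C with the current end-to-end system, not with the PVT RPM alone.

```python
from dataclasses import dataclass


@dataclass
class PrimaryResults:
    """Hypothetical point estimates from a test campaign; probabilities and rates lie in [0, 1]."""
    p_correct_snm_no_norm_asp: float   # criterion 1: SNM without NORM, ASP-C
    p_correct_snm_no_norm_pvt: float   # criterion 1: SNM without NORM, PVT RPM
    p_correct_snm_norm_asp: float      # criterion 2: SNM with NORM, ASP-C in primary
    p_correct_snm_norm_current: float  # criterion 2: SNM with NORM, current end-to-end system
    p_correct_med_ind_asp: float       # criterion 3: licensable medical/industrial isotopes, ASP-C
    p_correct_med_ind_pvt: float       # criterion 3: same cargo, PVT RPM
    referral_rate_norm_asp: float      # criterion 4: fraction of NORM-only conveyances referred, ASP-C
    referral_rate_norm_pvt: float      # criterion 4: same cargo, PVT RPM


def primary_sioe(r: PrimaryResults) -> dict:
    """Check the four primary-screening criteria using point estimates only (no uncertainty)."""
    checks = {
        "1_snm_without_norm": r.p_correct_snm_no_norm_asp >= r.p_correct_snm_no_norm_pvt,
        "2_snm_with_norm": r.p_correct_snm_norm_asp > r.p_correct_snm_norm_current,
        "3_medical_industrial": r.p_correct_med_ind_asp >= r.p_correct_med_ind_pvt,
        # "at least 80% fewer" referrals: the ASP-C rate may be at most 20% of the PVT rate
        "4_norm_referrals": r.referral_rate_norm_asp <= 0.2 * r.referral_rate_norm_pvt,
    }
    checks["all_met"] = all(checks.values())
    return checks


def secondary_sioe(p_misid_asp: float, p_misid_riid: float,
                   mean_release_s_asp: float, mean_release_s_riid: float) -> bool:
    """Both secondary criteria: SNM misidentification at least halved, and faster average release."""
    halved = p_misid_asp <= 0.5 * p_misid_riid
    faster = mean_release_s_asp < mean_release_s_riid
    return halved and faster
```

For example, an ASP-C referral rate of 0.05 for NORM-only cargo against a PVT rate of 0.40 satisfies criterion 4, while 0.10 against 0.40 does not, because 0.10 exceeds 20 percent of 0.40 (that is, 0.08). In practice each comparison would need to be made with confidence intervals rather than with bare point estimates.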
PAST TESTING
FINDING
Performance tests prior to 2008 had serious flaws that were identified by the
Government Accountability Office and the Secretary’s ASP Independent Review Team.
Tests prior to 2008 did not adequately establish the full capabilities of the ASP systems
compared with the currently deployed PVT and RIID screening systems, nor whether the
ASP systems met criteria for procurement.
This finding is based on several factors, which are discussed in some detail below. In
briefings to the committee in 2008, DNDO staff agreed with several of the criticisms of its prior
tests and stated that its 2008 tests were designed to address those deficiencies. The 2008 testing
approach is described in the next section.
The GAO in 2007 stated that DNDO used biased test methods that enhanced the
performance of the ASPs; DNDO’s NTS tests were not designed to test the limitations of the
ASPs’ detection capabilities; and DNDO did not objectively test the performance of handheld
detectors because it did not use a critical CBP standard operating procedure that is
fundamental to this equipment’s performance in the field (GAO 2007b). Specifically, GAO
wrote “DNDO conducted numerous preliminary runs of almost all of the materials, and
combinations of materials, that were used in the formal tests and then allowed ASP contractors to
collect test data and adjust their systems to identify these materials.”
With respect to bias, the IRT (2008) stated:
However the IRT’s assessment is that the system’s configurations were locked
and the test results were derived from automated systems that had not been
modified to benefit from the reduced set of possible outcomes. Operators were
given no advance guidance on the sequence in which threat objects were
presented. In short, the IRT did not find any evidence to support the notion that
the NTS test procedure resulted in the manipulation or biasing of test results, nor
does the committee believe that the NTS data needs to be discarded on the basis
of this issue. [Page 91.]
The committee did not independently verify these facts (e.g., that the configurations in
2007 were locked). The committee’s understanding of the operational use of the ASP and PVT is
that the systems provide alarm outputs based on programmed algorithms, not on operator
decisions, so no intentional real-time biasing of results by test operators was possible during the
tests. However, DNDO utilized the same sources, masking material, attenuating material, and
configurations in performance testing that were used in the setup for testing (dry runs and dress
rehearsals). If the vendors were allowed to calibrate their equipment and adjust their algorithms
using the test threat objects, then the equipment could more easily recognize the spectra. The
number of sources available was small, but this is not sufficient reason to use the same sources
for both setup and testing. Device setup and any calibration must use sources separate from
those used for testing.
In contrast with the ASP, the RIID requires much more operator interaction. DNDO
performance tests prior to 2008 did not follow all of the relevant standard operating procedures
for use of the RIIDs. According to the test plan (DNDO test plan) and briefings to the committee,
this error was corrected in the 2008 performance tests. Regarding those procedures, the
committee observed in visits to ports of entry that operator actions with the RIID and with
Laboratories and Scientific Services (LSS) are inconsistent. This inconsistency could affect
results and could even permit bias, either positive or negative, when comparing PVT/RIID and
ASP in secondary screening, although the committee observed no operator bias. Based on
observations at operational ports and during the 2008 testing at NTS, even under the best
circumstances (ideal technical performance by the RIID), effective use of the RIID depends on
the operator’s actions and on-the-spot decisions, which may not be consistent. The committee
observed variations in procedure from one inspection to another, even with the same operator.
The committee therefore concludes that the RIID is susceptible to ineffective use.
The committee agrees that pre-2008 tests did not examine the limitations of the ASP’s
detection capabilities. If all of the results from a particular test are either positive (able to detect)
or negative (unable to detect), the examiner does not know how close the detector is to the
transition between positive and negative. The transition can be quite steep, and can be affected
by other factors that are not controlled in an operational environment. Furthermore, it is useful to
identify cases in which the ability to detect is poor both because it could help to provide
guidance on how to improve the system and because there is good reason to believe that
smugglers will choose smuggling strategies that result in poorer detection. A good physical test
of the capabilities and performance of a detector system maps the output of the system (the
result) as one parameter, such as the shielding, is increased stepwise and the detector transitions
from being able to detect to not being able to detect the radiation of interest. For example,
according to the IRT review (IRT 2008), the average NORM used in the 2007 NTS tests was
comparable to the average NORM in cargo observed at NYCT. But a small percentage of cargo
observed at NYCT had much higher levels, which may be sufficient to mask at least some of the
threat objects identified by DOE and DNDO.
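The value of a stepwise sweep can be illustrated with a toy counting model. The sketch below assumes, purely for illustration, a single gross-count detector with Poisson counting statistics, exponential attenuation with a made-up coefficient, and an alarm threshold set for a 0.1 percent false-alarm rate on background alone. The function names and every parameter value are hypothetical; real ASP and PVT alarm algorithms are far more elaborate.

```python
import math


def poisson_cdf(k: int, lam: float) -> float:
    """P(N <= k) for N ~ Poisson(lam), summed term by term (adequate for lam below ~700)."""
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return min(total, 1.0)


def poisson_sf(k: int, lam: float) -> float:
    """P(N > k) for N ~ Poisson(lam)."""
    return max(0.0, 1.0 - poisson_cdf(k, lam))


def alarm_threshold(background: float, max_false_alarm: float) -> int:
    """Smallest count threshold t with P(N > t | background only) <= max_false_alarm."""
    t = 0
    while poisson_sf(t, background) > max_false_alarm:
        t += 1
    return t


def detection_curve(thicknesses, source_counts, mu, background, threshold):
    """P(alarm) at each shielding thickness, with expected counts background + S0*exp(-mu*x)."""
    return [poisson_sf(threshold, background + source_counts * math.exp(-mu * x))
            for x in thicknesses]


if __name__ == "__main__":
    B, S0, MU = 100.0, 200.0, 0.5         # hypothetical counts and attenuation coefficient (1/cm)
    T = alarm_threshold(B, 1e-3)          # about 0.1% false-alarm rate on background alone
    xs = [0.5 * i for i in range(21)]     # 0 to 10 cm in 0.5 cm steps
    for x, p in zip(xs, detection_curve(xs, S0, MU, B, T)):
        print(f"{x:4.1f} cm   P(detect) = {p:.3f}")
```

With these made-up numbers the curve stays near 1 for thin shielding, crosses 0.5 between about 3.5 and 4 cm, and approaches the false-alarm floor by 10 cm. A test that sampled only the thin end (or only the thick end) would record all detections (or none) and reveal nothing about where the transition lies.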
SCIENTIFIC RIGOR AND ROBUSTNESS OF DNDO'S
2008 TESTING AND ANALYSIS APPROACH
FINDING
The 2008 performance tests were an improvement over previous tests. DNDO
physically tested some of the limits of the systems. However, the following shortcomings
remain. (1) Without modeling to complement the physical experiments, the selected test
configurations are too limited; (2) the sample sizes are small and limit the confidence that
can be placed in comparisons among the results; and (3) in its analysis, some of the
performance metrics are not the correct ones for comparing operational performance of
screening systems.
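Point (2) can be made concrete with a standard binomial confidence interval. The sketch below uses the Wilson score interval, one common choice among several (Clopper–Pearson is another); the function name is hypothetical and the counts in the usage note are illustrative, not DNDO results.

```python
import math


def wilson_interval(successes: int, trials: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion; z = 1.96 gives roughly 95% coverage."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need 0 <= successes <= trials and trials > 0")
    p_hat = successes / trials
    z2 = z * z
    denom = 1.0 + z2 / trials
    center = (p_hat + z2 / (2 * trials)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / trials + z2 / (4 * trials * trials)) / denom
    return max(0.0, center - half), min(1.0, center + half)
```

With 9 detections in 10 runs, the approximate 95 percent interval runs from about 0.60 to 0.98; with 90 in 100 it narrows to about 0.83 to 0.94. A second system scoring 7 of 10 has an interval of about 0.40 to 0.89, which overlaps the first, so ten runs per configuration cannot separate the two systems with confidence.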
Many of the flaws in past testing were addressed in 2008 tests. For example, in 2008
performance tests, real CBP officers conducted the RIID screening of containers referred to
secondary screening, and DNDO included LSS analysis in evaluating the outcomes of those
screens. The threat objects (highly enriched uranium and plutonium sources) used in 2008 tests
had not been used in any previous tests or calibrations, which addressed another criticism of the
2007 NTS tests. Also, more challenging masking material was used for some cases. Appendix D
lists the combinations of threat objects, shielding material, and masking material, and their
configurations used in the 2008 performance tests.
However, even with these improvements, shortcomings remain. These include structural
problems with the testing.
Without modeling to complement the physical experiments, the selected test
configurations are too limited
DNDO was limited by time and resources in what could be evaluated. For example, the
number and type of threat objects available to the testers through NTS and the Device Assembly
Facility (DAF) was small, and only one was the same mass and shape as the objects described in
the threat guidance.[32] DNDO and its supporting scientists adapted to the lack of a threat source
that corresponds to the guidance threat by using computer simulations to model the sources and
determine what mass of threat material in a standard shape would emit equivalent radiation. The
number and type of sources tested cannot be considered “canonical,” i.e., they do not comprise a
“complete set” from which any possible source in a cargo container can be constructed.
Although a complete set is not practical or feasible, in the context of modeling described below it
is likely that a useful subset that spans the space of possible threats can be identified.
Because the number of possible permutations of cargo material is very large, loading and
unloading the shipping containers during the tests to cover all possible shielding and masking
variants is impossible. In addition, because the test sources are available only at NTS,
background effects could not be assessed at multiple sites. In light of these limitations, the tests were
designed to evaluate the response of the detectors to containers with different configurations:
empty, a radiation source without additional shielding, a radiation source with shielding, and a
radiation source with masking material. The test design uses a factorial approach, which allows
multiple factors to be tested and evaluated at once and is considered a sound method of
experimental design for obtaining much information from a limited number of test runs
(see Appendix C).[33] However, while the test design is reasonable as far as it goes, the tests
performed are not adequate to fully characterize the instruments nor to predict their performance
when monitoring the stream of commerce.
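A two-level full factorial design shows why the approach is efficient: every run contributes to the estimate of every factor's effect. The sketch below assumes three hypothetical two-level factors (shielding, masking, speed), coded -1 and +1, and a synthetic response; these are not the factors or levels DNDO actually used, and the function names are illustrative.

```python
from itertools import product

# Hypothetical two-level factors, coded -1 (low level) and +1 (high level).
FACTORS = ("shielding", "masking", "speed")


def full_factorial(k: int):
    """All 2**k combinations of k two-level factors."""
    return list(product((-1, 1), repeat=k))


def _contrast(design, responses, sign_of_run):
    """Mean response where sign_of_run(run) == +1 minus mean response where it is -1."""
    hi = [y for run, y in zip(design, responses) if sign_of_run(run) == 1]
    lo = [y for run, y in zip(design, responses) if sign_of_run(run) == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)


def main_effects(design, responses):
    """Main effect of each factor, estimated from every run in the design."""
    return [_contrast(design, responses, lambda run, j=j: run[j])
            for j in range(len(design[0]))]


def interaction_effect(design, responses, i, j):
    """Two-factor interaction: does the effect of factor i depend on the level of factor j?"""
    return _contrast(design, responses, lambda run: run[i] * run[j])
```

With three factors the design has 8 runs, and each main effect is the difference of two 4-run averages. A synthetic response of 0.8 - 0.3·shielding - 0.1·masking yields main effects of -0.6, -0.2, and 0 (speed). A full factorial also supports estimating two-factor interactions, which a design that varies one factor at a time cannot.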
In part to address this problem, DNDO engaged scientists at Pacific Northwest National
Laboratory, Sandia National Laboratories, the Johns Hopkins Applied Physics Laboratory, and
Los Alamos National Laboratory to carry out “injection studies.” These are virtual tests in which
the gamma spectra of additional test sources, which were experimentally recorded at the national
labs under controlled circumstances, are added to (“injected into”) spectra of cargo in the stream
of commerce collected by ASPs during the 2007 New York Container Terminal test. These
combined spectra were then used to challenge the threat identification algorithms of the ASPs.
For example, of the 22 radiological and industrial isotopes of concern to DNDO, 13 were
acquired for testing, and nine were considered impractical or unnecessary to obtain for physical
testing. The response of the detectors to these nine radioisotopes is assessed by “an inspection of
the threat algorithm” alone (Description of Medical and Industrial Radionuclides, ASP-C
Performance Specification, version 4.10, April 2008).
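In its simplest form, an injection is a channel-by-channel sum of two measured spectra recorded with the same energy binning. The sketch below assumes exactly that, plus one plausible refinement: binomial thinning of the source spectrum to emulate a weaker, more distant, or more attenuated source while preserving Poisson counting statistics. The function names are hypothetical, and the sketch does not reproduce the national laboratories' actual procedure, which the committee had not seen described in full.

```python
import random
from typing import List, Sequence


def thin(spectrum: Sequence[int], keep_fraction: float, rng: random.Random) -> List[int]:
    """Keep each recorded count independently with probability keep_fraction.
    Thinning a Poisson-distributed spectrum yields a Poisson spectrum with a scaled mean."""
    if not 0.0 <= keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in [0, 1]")
    return [sum(rng.random() < keep_fraction for _ in range(n)) for n in spectrum]


def inject(cargo_spectrum: Sequence[int], source_spectrum: Sequence[int],
           keep_fraction: float, seed: int = 0) -> List[int]:
    """Superpose a (thinned) measured source spectrum onto a measured cargo spectrum."""
    if len(cargo_spectrum) != len(source_spectrum):
        raise ValueError("spectra must share the same channel binning")
    rng = random.Random(seed)  # seeded so an injected spectrum can be regenerated exactly
    scaled = thin(source_spectrum, keep_fraction, rng)
    return [c + s for c, s in zip(cargo_spectrum, scaled)]
```

The combined spectrum would then be passed to the same identification algorithm that processes real portal data, so that the algorithm sees a realistic cargo background with a known threat signal superposed on it.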
[32] The committee was told that DNDO selected among the few SNM sources available from the DAF.
[33] Practical constraints on the performance testing prevented DNDO from conducting random trials. In other words,
the same threat object and configuration were passed through the portals repeatedly in a linear sequence. Such a
testing approach is unlikely to detect some kinds of systematic errors, although the committee could not identify
credible, significant systematic errors that would be missed. Randomness is important because the usual methods for
assigning uncertainties to the results assume random trials and do not account for possible systematic effects.
However, there are good reasons why these tests could not be randomized, and the committee was unable to identify a
significant consequence of the non-random tests.
This type of testing is appropriate, and calculations of this kind seem to have helped
DNDO address the problems from 2007, when the performance tests did not chart the performance
across detection thresholds. The preliminary 2008 test results that the committee has seen suggest
that the tests found the transition ranges from undetectable to detectable. The committee concludes,
however, that DNDO should go beyond the existing tests and model a set of test sources that
represents the spectrum of possible sources and compare the results of the studies to the physical
data acquired during testing to identify flaws in the modeling and algorithms.
For baseline information, DNDO needs to characterize the performance of the ASP and
PVT detection systems for highly enriched uranium, plutonium, and uranium-238, each with
and without NORM and shielding, as well as for NORM without threat material. In addition,
DNDO needs characterization data for the background spectra for non-radioactive containers at
both NTS and one or more of the representative ports. These data will provide basic detector
characterization information, which will assist in the development and assessment of
computerized system models.
The committee recognizes that the security and health and safety restrictions for using
SNM in tests preclude doing realistic tests at operational ports of entry and that some
calculational bridge is needed to explore a detection system’s capability. At the time of this
interim report the committee had not received a full description of the “Injection Studies,” but
the briefing the committee received indicates that they were done by adding experimental threat-
object spectra to data collected on actual commerce traffic with NORM present and using the
algorithms to see what the detection probability would be for the superposed spectra. The
committee would like to see this approach extended to a more robust modeling approach that
uses simulations of the radiation source, radiation transport through the material in the container
and to the detector, and the response of the detector to generate the spectrum. These simulations
need experimental validation and so should be compared to the performance data collected at
NTS. If they do not agree within statistical uncertainties, then the reasons for disagreement
should be examined and corrected. When broad agreement has been obtained, then examples of
observed NORM and medical and industrial radiation sources can be integrated in a model with
threat material to explore the capabilities of the ASPs and PVTs against a much larger, more
multidimensional threat space.
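One simple way to ask whether a simulation and a measurement "agree within statistical uncertainties" is a Pearson chi-square comparison, channel by channel. The sketch below rests on several stated assumptions: both spectra share identical binning and live time, the simulated expectation serves as the Poisson variance in each channel, the simulation's own Monte Carlo uncertainty is ignored, and the cut uses a normal approximation to the chi-square distribution with an arbitrary three-sigma threshold. The function names are hypothetical; a production analysis would need to revisit each of these choices.

```python
import math
from typing import Sequence, Tuple


def chi_square(measured: Sequence[float], simulated: Sequence[float],
               n_fitted_params: int = 0) -> Tuple[float, int]:
    """Pearson chi-square of measured counts against simulated expectations; skips empty channels."""
    if len(measured) != len(simulated):
        raise ValueError("spectra must share the same binning")
    chi2, used = 0.0, 0
    for m, s in zip(measured, simulated):
        if s > 0:
            chi2 += (m - s) ** 2 / s
            used += 1
    dof = used - n_fitted_params
    if dof <= 0:
        raise ValueError("not enough populated channels")
    return chi2, dof


def consistent(chi2: float, dof: int, n_sigma: float = 3.0) -> bool:
    """Flag disagreement when chi2 exceeds dof by more than n_sigma * sqrt(2 * dof)."""
    return chi2 - dof <= n_sigma * math.sqrt(2 * dof)
```

A spectrum that scatters around the prediction by roughly the expected counting fluctuations passes, while a systematic excess (for example, a missing scattering contribution in the model) drives the chi-square well above the number of channels and flags a discrepancy to be examined.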
These new simulations are distinct from the isotope identification step. DNDO has
required that the detector systems record data in a standard format, which represents the gamma
spectrum. The isotope identification software algorithm analyzes the spectrum in that data file.
Any isotope identification software should be able to analyze the spectrum from any detector and
from any simulations. There are other important elements of the software, such as reading the
occupancy sensor and operating the gate arms. Those pertain to integration with the physical
system, but the isotope identification module is the essential piece for performance of the system
and is separable from the rest of the system (see Figure 3.2).
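This separation amounts to a narrow interface: the identification module takes a spectrum record and returns candidate isotopes, without knowing whether the record came from a portal or from a simulation. The sketch below is hypothetical throughout. The Spectrum fields are not the actual DNDO standard format, and the single-channel line check (using the well-known gamma lines of Cs-137 near 662 keV and K-40 near 1461 keV) stands in for real identification algorithms, which are far more sophisticated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Spectrum:
    """Hypothetical stand-in for a standard-format spectrum record (not the actual DNDO format)."""
    counts: Sequence[int]
    kev_per_channel: float   # simple linear energy calibration, assumed for illustration
    live_time_s: float
    origin: str              # "portal" or "simulation"; deliberately ignored by identifiers


# The module boundary: any identifier maps a Spectrum to a list of isotope names.
IsotopeIdentifier = Callable[[Spectrum], List[str]]


def make_line_identifier(lines_kev: Dict[str, float], min_counts: int) -> IsotopeIdentifier:
    """Toy identifier: report an isotope when the channel at its gamma line holds >= min_counts."""
    def identify(spec: Spectrum) -> List[str]:
        found = []
        for name, energy in lines_kev.items():
            channel = int(round(energy / spec.kev_per_channel))
            if 0 <= channel < len(spec.counts) and spec.counts[channel] >= min_counts:
                found.append(name)
        return found
    return identify
```

Because the identifier depends only on the spectrum record, a measured spectrum and a simulated spectrum with the same content produce the same answer, which is what allows simulations to exercise the vendors' identification algorithms directly.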
[Figure 3.2 diagram. Labels: Radiation Detection Portal; Source in Cargo Container; gamma ray and scattered gamma ray reaching the Detector; Pulse Analyzer; Spectrum in standard output format; Isotope ID Algorithm Software Module; New Modeling producing a Spectrum in standard output format.]
Figure 3.2 Illustration of the physical system that generates a detected gamma ray spectrum (top)
and the suggested new modeling to simulate the same process and generate a spectrum (bottom).
Note: This drawing is not to scale and does not show all of the elements or components of the
detector system.
To overcome the inherent limitations of physical testing, modeling of the ASP systems’
responses would be invaluable to DNDO’s testing and analysis. With these models, many test
geometries could be evaluated and the selected results compared to the actual physical tests to
verify the modeling. Modeling can help to identify configurations for physical testing, and the
physical tests can be used to validate the models. Accurate modeling could help identify the
limitations inherent to the technology and the detectors and can assist in the development of new
technology over time.
In the current round of testing, the effects of shielding and masking were assessed
separately. While this allows for characterization of instrument response when faced with each
scenario, it does not reflect a realistic scenario in which both masking and shielding material
could be used to conceal radioactive material. The effects of the two types of concealment are
not simply additive, and a combination of the two should be investigated. The number of test
configurations that can be tested physically is finite. Loading and unloading of containers with
shielding and masking material is time-consuming, and time spent on testing is costly.
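A toy counting-statistics calculation shows why shielding and masking effects need not add. In this sketch, which is assumed for illustration only, detection significance equals the attenuated source counts divided by the square root of the total background. Shielding multiplies the signal by a transmission factor, and masking adds background counts. The numbers (`S`, `B`, `T`, `M`) are hypothetical. The model also ignores scattering, spectral shape, and the actual decision logic of any ASP algorithm.

```python
import math


def significance(source_counts, background_counts, transmission=1.0, masking_counts=0.0):
    """Toy net-signal significance: attenuated source counts divided by the
    Poisson standard deviation of the total (natural + masking) background."""
    signal = source_counts * transmission
    return signal / math.sqrt(background_counts + masking_counts)


# Hypothetical numbers, chosen only for illustration.
S, B = 400.0, 1000.0
T = 0.5      # shielding transmits half of the source gamma rays
M = 3000.0   # masking material quadruples the total background

z0 = significance(S, B)
z_shield = significance(S, B, transmission=T)
z_mask = significance(S, B, masking_counts=M)
z_both = significance(S, B, transmission=T, masking_counts=M)

# What one would predict by subtracting the two individual losses.
additive_guess = z0 - (z0 - z_shield) - (z0 - z_mask)
print(f"baseline {z0:.2f}, shielded {z_shield:.2f}, masked {z_mask:.2f}, both {z_both:.2f}")
print(f"additive prediction for both: {additive_guess:.2f}")
```

With these values, shielding alone and masking alone each halve the significance. The combination reduces it to a quarter of the baseline, whereas subtracting the two individual losses would predict no remaining signal. In this toy model the effects compound multiplicatively, which is one reason combined configurations need to be tested or simulated explicitly.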
Here again is a case where a thorough modeling of the well-characterized spectral
response of the ASP systems would be beneficial in assessing a wider range of scenarios for
concealment of radioactive material. Data from the shielded-only, masked-only, and shielded +
masked sources would enable DNDO to assess the validity of the simulations and their ability to
accurately reflect detector performance capabilities. Using modeling calculations with the
vendors’ algorithms, test scientists can determine configurations of shielding and masking that
would likely result in detection and identification in primary screening, and in identification in
secondary screening, with a probability of 50 percent. This would enable DNDO to identify the critical
portion of the performance curve, that is, the transition from correct to incorrect results from the
ASP system, and to confirm these calculations by measurements at NTS. The probability of each
outcome can be tested at NTS for selected cases to confirm the accuracy of the models, either
prompting a reevaluation of the models or building confidence in them.
The subset of configurations for physical testing to validate models would be chosen to
test the cases where the expected results, based on simulations, are most sensitive (transition
regions). In other words, the simulations would be used to predict the configurations that are in
the detectors’ performance transition (from high-confidence detection to low- or no-confidence
detection), and the physical tests would be run to test that hypothesis. Each set of physical tests
would be used to validate the performance of the models in different regions of the test space.
Tests that DNDO has already done (including the pre-2008 tests, which used a wider range of
source materials) could be used in this effort, despite their shortcomings as performance tests.
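Model-guided selection of physical tests can be sketched as follows. The detection-probability function is a hypothetical one-dimensional stand-in built from three assumptions: exponential attenuation through shielding, an alarm threshold at `k_sigma` standard deviations above background, and a Gaussian approximation to counting statistics. A real simulation would span many more dimensions of the test space. Bisection locates the shielding thickness at which the modeled detection probability is 50 percent. A grid scan then keeps only configurations in a transition band, here arbitrarily set at 0.1 to 0.9, as candidates for physical testing.

```python
import math


def p_detect(thickness_cm, source_counts=2000.0, background_counts=1000.0,
             mu_per_cm=0.5, k_sigma=3.0):
    """Toy detection probability versus shielding thickness (all parameters hypothetical)."""
    signal = source_counts * math.exp(-mu_per_cm * thickness_cm)
    threshold = k_sigma * math.sqrt(background_counts)
    z = (signal - threshold) / math.sqrt(signal + background_counts)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


def thickness_at(p_target, lo=0.0, hi=50.0, tol=1e-6):
    """Bisection for the thickness where p_detect falls to p_target.
    Assumes p_detect decreases with thickness and brackets p_target on [lo, hi]."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if p_detect(mid) > p_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Scan simulated configurations and keep only those in the transition region.
grid = [0.5 * i for i in range(41)]  # 0 to 20 cm of shielding
transition = [x for x in grid if 0.1 <= p_detect(x) <= 0.9]
x50 = thickness_at(0.5)
print(f"50% point near {x50:.2f} cm; candidate test thicknesses: {transition}")
```

Because the modeled probability falls monotonically with thickness, bisection is sufficient here. In this toy model, the 50 percent point can also be checked analytically, because it occurs where the attenuated signal equals the alarm threshold.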
Performance testing takes place only at NTS, and DHS’s operational testing of the ASPs
is planned to take place at only one location: the Port of Long Beach. The committee believes
that it is important to evaluate the effects of variations in background intensity and spectra
because significant variations are expected among the ports of entry across the United States.
Computer modeling would be able to assist in the identification of limits of the algorithms’
ability to differentiate threat materials from the background radiation.
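A simple scaling argument shows why background differences among ports matter. In the hypothetical threshold model below, an alarm requires net source counts above `k_sigma` times the square root of the background counts, so quadrupling the background doubles the source strength needed to alarm. The background levels are invented. The sketch captures only background intensity, not the spectral differences in background that the committee also highlights.

```python
import math


def required_source_counts(background_counts, k_sigma=3.0):
    """Net source counts needed to cross a k-sigma alarm threshold, in a toy
    model where the threshold is k_sigma * sqrt(background)."""
    return k_sigma * math.sqrt(background_counts)


# Hypothetical background levels at three ports of entry (counts per measurement).
ports = {"port A": 800.0, "port B": 1600.0, "port C": 3200.0}
for name, bkg in ports.items():
    print(name, round(required_source_counts(bkg), 1))
```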
There are many factors that can affect a radiation detector’s capability, but it is not
possible to test all of the possible variations in threat material configurations, background,
shielding, and masking within the stream of commerce at all ports of entry. The current
round of physical testing does not reflect realistic scenarios well, although it does provide
important information about the response of the detectors to specific, controlled cases. A
thorough consideration of the methods of concealment of nuclear and radiological material
that could reasonably be expected from an adversary would better characterize the
performance of ASPs for the cargo-screening mission. The models could better cover the full
test space of scenarios that need to be evaluated, a goal that cannot be attained practically by
physical testing alone.
The sample sizes were small and limit the confidence that can be placed in comparisons
among the results
The time and resource constraints mentioned above severely limited the number of runs for
each configuration (the sample size), which ranged from as few as 6 to as many as 12. With such
small sample sizes, the uncertainties associated with the results are relatively large. This is mostly a concern in
the performance transition range for the detectors (where the detection probability is neither 1
nor 0). The number of runs (sample size) for each configuration needs to be large enough that
the uncertainties (error bars) are small enough for reasonable comparisons to be made to each
other and to results of simulations. The size of the sample needed can depend on the results of
the tests.
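The effect of 6 to 12 runs per configuration on uncertainty can be checked with a standard binomial confidence interval. The sketch below uses the Wilson score interval, one common choice among several, for a configuration in the transition range where half of the runs produce a detection. It also uses a normal-approximation formula for the number of runs needed to reach a given half-width. The 95 percent confidence level and the target half-width are illustrative assumptions, not DNDO criteria.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p_hat = successes / n
    denom = 1.0 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n))
    return max(0.0, center - half), min(1.0, center + half)


def runs_for_half_width(half_width, p=0.5, z=1.96):
    """Runs needed for a normal-approximation half-width; p = 0.5 is the worst case."""
    return math.ceil(z * z * p * (1 - p) / half_width ** 2)


for n in (6, 12):
    lo, hi = wilson_interval(n // 2, n)
    print(f"{n // 2}/{n} detections: 95% interval {lo:.2f} to {hi:.2f}")
print("runs for a +/-0.10 half-width:", runs_for_half_width(0.10))
```

With 6 runs, a 50 percent observed detection rate is compatible with true probabilities from roughly 0.19 to 0.81, and with 12 runs from roughly 0.25 to 0.75. Narrowing the half-width to 0.10 in the worst case takes on the order of 100 runs.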
In DNDO’s analysis, some of the performance metrics are not the correct ones for comparing
operational performance of screening systems.
Test system performance usually is characterized in terms of detection probabilities,
measuring the probability that the test system alarms (the test result is positive), given that the
screened cargo truly contains threat material, or that it does not alarm (the test result is negative),
given that the screened cargo does not contain threat material. Because measurement of the
detection probabilities relies on true knowledge of the cargo contents, one can estimate those
probabilities only from a designed experiment.
In real life, however, with real trucks, one observes only the result (alarm status) of the
screening system. Either the system alarms or it does not, but one does not know the true state of
the cargo. The result of an accurate system ("alarm" or "no alarm") would be a reliable indicator
of the cargo contents (SNM or no SNM), but an inaccurate system would be an unreliable
indicator. One is concerned especially with this question: Given that the test system did not
alarm, what is the probability that the cargo contained SNM? That is, what risk does CBP take
by allowing a "no-alarm" cargo to pass? This "false-negative rate" (FNR) has serious
consequences. But translating from the measured probabilities to the false-negative rate and the
false-positive rate requires some mathematical manipulation and the introduction of an additional
parameter: the prevalence of threat material in cargo. Because this parameter is neither known
nor measurable, the performance of two screening systems can best be compared by using ratios
between the rates for the systems being compared. Such a metric more accurately reflects the
relative performance of the screening systems. This issue is described
in detail in Appendix B.
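The translation from detection probabilities to the probability that a cleared container holds threat material is an application of Bayes’ rule. The sketch below computes that quantity, which the text calls the false-negative rate, for two hypothetical systems at several assumed prevalences. The performance numbers and prevalences are invented, and the exact formulation in Appendix B may differ. When prevalence is small, the ratio of the two systems’ rates becomes essentially independent of prevalence. That is why a ratio remains a usable comparison even though prevalence is unknown.

```python
def prob_threat_given_no_alarm(p_detect, p_clear, prevalence):
    """P(threat | no alarm) by Bayes' rule.
    p_detect: P(alarm | threat); p_clear: P(no alarm | no threat)."""
    missed = (1.0 - p_detect) * prevalence   # threat present, no alarm
    cleared = p_clear * (1.0 - prevalence)   # no threat, no alarm
    return missed / (missed + cleared)


def fnr_ratio(system_a, system_b, prevalence):
    """Ratio of the two systems' false-negative rates at a given prevalence."""
    return (prob_threat_given_no_alarm(prevalence=prevalence, **system_a)
            / prob_threat_given_no_alarm(prevalence=prevalence, **system_b))


# Two hypothetical screening systems (numbers invented for illustration).
system_a = dict(p_detect=0.90, p_clear=0.98)
system_b = dict(p_detect=0.60, p_clear=0.99)

for prevalence in (1e-4, 1e-6, 1e-8):
    print(f"prevalence {prevalence:.0e}: ratio A/B = {fnr_ratio(system_a, system_b, prevalence):.4f}")
```

For small prevalence, the ratio approaches ((1 - P_A(detect)) / P_A(clear)) / ((1 - P_B(detect)) / P_B(clear)). For these invented numbers, that value is about 0.25 regardless of the prevalence assumed.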
Performance Testing Results and Evaluation
FINDING
Because they have large detectors and because of their configuration, ASPs would
be expected to improve isotope identification and to provide greater consistency in screening
each container, greater coverage of each container, and increased speed of screening over
that of the PVT/RIID combination when used in secondary screening. Consequently, tests
of ASPs in secondary screening are focused on confirming and quantifying that advantage for a
variety of threat objects, cargos, and configurations.
The greater consistency, better coverage, and increased speed of secondary screening are
the results of the configuration of the ASP systems. The ASPs have larger sodium iodide crystals
than the RIIDs. That size results in higher gamma count rates than in a handheld RIID
examining the same source, which compensates for the greater standoff distance and the shorter
exposure time for the ASP. The ASPs have better coverage of the containers. The consistency of
ASP screening depends on the speed of the truck through the portal. As noted elsewhere in this
report, different CBP officers using the handheld RIID place it differently. Preliminary results
from 2008 tests confirmed that this is true for the tested cases, but the physical tests could not
demonstrate that ASPs are superior to the screening system currently in place over the whole
operational envelope.
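The trade-off among crystal size, standoff distance, and exposure time can be illustrated with simple arithmetic. The sketch assumes, for illustration only, an unshielded point source, identical intrinsic efficiency, and counts proportional to detector area times exposure time divided by the square of the distance. The ratios used are hypothetical, not measured ASP or RIID dimensions.

```python
def relative_counts(area_ratio, distance_ratio, time_ratio):
    """Ratio of expected gamma counts (system 1 / system 2) for an unshielded
    point source, assuming counts scale as area * time / distance**2 and
    identical intrinsic detection efficiency."""
    return area_ratio * time_ratio / distance_ratio ** 2


# Hypothetical: larger detector area 20x, standoff 2x, exposure time 0.5x.
print(relative_counts(area_ratio=20.0, distance_ratio=2.0, time_ratio=0.5))
```

With a 20-fold area advantage, twice the standoff, and half the exposure time, the larger detector still collects 2.5 times the counts in this idealized model. Real geometry, attenuation, and efficiency differences would change that number.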
As noted above, when used for primary screening, an ASP system should be compared to
the existing combined primary and secondary screening system (both PVT and RIID) because of
differences in standard operating procedures for primary screening. DNDO’s preliminary
analysis appears to have accounted for this difference.
It is not clear to the committee how DNDO will interpret the performance test results in
the context of the criteria for “significant increase in operational effectiveness.” Each tested
configuration is distinct, and averaging across configurations is not meaningful without applying
normalization or weighting factors. DNDO could use the NYCT data as weighting factors,
although there are two challenges associated with this approach: (1) the relevant features are
multidimensional (gamma flux, radionuclides in cargo, density of attenuating material,
composition of attenuating material) and (2) NYCT data reflect cargo passing through one large
port at the time of the data collection, and cargo is different in different ports of entry and
changes with time. Even if these challenges are addressed, weighting factors may be valid only
for evaluating likely referral rates, not performance against threat objects in containers in
commerce. The configurations could be weighted according to their frequency in the actual
stream of commerce (if that could be determined). However, there is no reason to think that
malefactors will choose the configuration of a cargo container for smuggling a nuclear weapon
randomly from configurations in the stream of commerce.
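The weighting problem can be made concrete with a short sketch. Three hypothetical configurations, with invented detection probabilities and stream-of-commerce frequencies, are summarized three ways: a frequency-weighted average, an unweighted average, and the worst case. The particular numbers do not matter. The point is that the summaries diverge, and that an adversary who deliberately chooses a configuration is better described by the worst case than by commerce frequencies.

```python
# Hypothetical per-configuration detection probabilities and commerce frequencies.
configs = {
    "unconcealed source":     {"p_detect": 0.99, "frequency": 0.70},
    "light masking":          {"p_detect": 0.90, "frequency": 0.25},
    "heavy shield + masking": {"p_detect": 0.30, "frequency": 0.05},
}


def frequency_weighted(configs):
    """Average detection probability weighted by (assumed) commerce frequency."""
    total = sum(c["frequency"] for c in configs.values())
    return sum(c["p_detect"] * c["frequency"] for c in configs.values()) / total


def unweighted(configs):
    """Simple mean over configurations, treating each as equally likely."""
    return sum(c["p_detect"] for c in configs.values()) / len(configs)


def worst_case(configs):
    """Configuration an adversary choosing deliberately would prefer."""
    name = min(configs, key=lambda k: configs[k]["p_detect"])
    return name, configs[name]["p_detect"]


print(f"weighted {frequency_weighted(configs):.3f}, "
      f"unweighted {unweighted(configs):.3f}, worst case {worst_case(configs)}")
```

Here the frequency-weighted detection probability is about 0.93, the unweighted average is 0.73, and the worst case is 0.30. The choice of summary therefore largely determines the conclusion.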
Finally, as noted above, there are large uncertainties in the results of these tests. The
numbers of conveyances for each source were small and the uncertainty associated with a small
sample is large. The costs of conducting larger sample tests with the same number of
configurations may have been prohibitive, which simply highlights the need to select the
physical test configurations carefully to maximize the information gained from those tests.
Operational Testing
The current plans call for operational testing of the ASP systems that is of short duration
and limited breadth. ASP systems will be installed at only one site for three weeks. This limited
testing and subsequent analysis does not allow DNDO to take full advantage of the opportunity
to collect information about real-world stream-of-commerce effects on detector performance.
While Pier A at the Port of Long Beach, the location for the test, does have a high volume of
cargo traffic, it is a location where the weather generally does not vary a great deal, and the type
of container coming through the terminal is predictable and not representative of all ports of
entry (POEs). By limiting operational testing to the environment and the cargo mix at a single
site, the curtailed field test is missing a prime opportunity to assess detector performance in the
real world.
Operational testing is designed to determine if the system is effective and fully useful in
field, operational settings and when operated by regular users, not just in a laboratory or test
setting. Operational test and evaluation means the field test, under realistic operational
conditions, of any equipment item or system intended for use by typical DHS users in defending
the U.S. homeland; and the evaluation of the results of such tests. Realistic operational testing is
intended to be independent from the contractor or developer of the system being tested, with the
evaluation of the results also reported independently.
Realistic operational testing is intended to use production representative systems,
operated by typical users who may not have the same training or expertise as the scientists and
engineers who developed the system in the first place. To the extent possible, the system or
equipment under test is to be operated under realistic stress and operational tempo, in an end-to-
end manner, using the same procedures as would be expected in everyday use, in an
operationally realistic environment, with the other interfacing systems with which the proposed
system is to be interoperable on line. In the case of an RPM, the “threat” is to be as realistic as
possible, including both the types of radioactive materials defined in the threat, and the naturally
occurring radioactive materials that are found in routine commerce. If the system under test
might be vulnerable to interferences, such as radio communications or other electromagnetic
interference, those sources should be present in the test also. Finally, because it may not be
practicable to conduct a statistically significant number of operational tests, the test challenges to
the system are to be at the edges of the operating envelope and not only at its center.
Contractor involvement in these operational tests is to be strictly avoided to eliminate a
possible source of bias, to avoid the effects of having a highly trained “golden crew” operating the
system, and to gauge the effectiveness of the system when operated by expected users.
At the time that this Interim Report was written, the operational tests planned by DNDO
had not been conducted, and the committee does not know whether the general guidelines for
operational testing described above will be followed.
Changes to the DNDO Approach to Testing
RECOMMENDATION
For a more rigorous approach, DNDO should use theory and models of threat
objects, radiation transport, and detector response to simulate performance and predict
outcomes, and should use physical experiments to validate or critique the models’ fidelity to
reality and enable developers to refine the models iteratively. With validated models, DNDO can
evaluate the performance of the ASP systems over a larger, more meaningful range of cases
than is feasible with physical tests alone.
To make the testing and evaluation more scientifically rigorous, the committee
recommends an iterative approach with modeling and physical testing complementing each
other. As is noted earlier in the report, the threat space—that is, the set of possible threat objects,
configurations, surrounding cargoes, and conditions of transport—is so large and
multidimensional that DNDO needs an analytical basis for understanding the capabilities of
detectors for screening cargo. DNDO’s current approach is to physically test small portions of
the threat space and to use other experimental data to interpolate within the threat space to test
the identification algorithms in the detector systems.
Computer models are essential to the testing process: It is not feasible to examine all of
the relevant permutations of cargo and threat materials with physical tests alone. Computer
modeling can examine detector-system and algorithm behavior for a large number and breadth of
cases with a relatively modest commitment of funds and time. However, the models need to be
validated against results of physical tests that are carefully designed and selected to represent
cases covering the test space (the full domain of configurations and compositions of cargo,
masking material, shielding material, and threat objects). The injection studies that DHS and
DOE have sponsored enable scientists to test the isotope identification algorithms, but the role of
injection studies in the overall test plan is still very limited and does not establish an analytical
basis for understanding the detector systems’ capabilities, so a fuller, better integrated approach
to modeling and physical testing is needed.34
34
GAO describes a PNNL report that discusses the limitations of injection studies.
According to a Pacific Northwest National Laboratory report submitted to DNDO in December
2006, injection studies are particularly useful for measuring the relative performance of
algorithms, but their results should not be construed as a measure of (system) vulnerability. To
assess the limits of portal monitors’ capabilities, the Pacific Northwest National Laboratory report
states that actual testing should be conducted using threat objects immersed in containers with
various masking agents, shielding, and cargo. (GAO 2007b)
DHS and DOE are both deploying detectors that screen vehicles and cargo for nuclear
and radiological material, and both have an interest in better understanding the capabilities of
deployed and proposed detection systems. The committee recommends that DHS and DOE
integrate the modeling and testing in a scientific, iterative approach: theory and models would be
used to predict outcomes of tests; the test outcomes would then be used to validate or critique the
models; and the models would be used to explore a variety of possible threats, the full range of
which is very large and cannot be individually tested. This kind of interaction between computer
models and physical tests is essential for building scientific confidence. DOE and its national
laboratories have extensive experience with both detector development and iterative simulation
and experimental validation of models, most prominently in the stockpile stewardship program.
The performance tests conducted to date provide some validation points for modeling as well as
some assessment of detection capability for parameters such as the effects of source, shielding,
masking, speed, and background radiation level on ASP system performance. These existing
results are a sensible starting point for validation, but large uncertainties remain in these
parameters due to limited experimental conditions and small sample sizes.
For all of the reasons cited above concerning the 2008 performance tests, DHS cannot conclude
definitively whether ASPs will consistently outperform the current PVT-RIID systems in routine
practice until the shortcomings are addressed. Better measurement and characterization are a
necessary first step but may not be sufficient to enable DHS to conclude that the ASPs meet the
criteria DHS has defined for achieving a “significant increase in operational effectiveness.” The
committee recommends modifications to the current DHS approach to the evaluation procedure.
These modifications would influence subsequent procurement steps.
Recommended Approach to the ASP Procurement Process
RECOMMENDATION
DHS should develop a process for incremental deployment and continuous
improvement, with experience leading to refinements in both technologies and operations
over time, rather than a single product purchase to replace current screening technology.
In attempting to meet a procurement schedule, DNDO has approached the development
of the ASP systems as a point goal rather than the beginning of a longer-term process of
technological improvement. The DNDO approach limits the possibility of iterative
improvements to the technology and could result in unnecessary constraints on the ability to
deploy future nuclear detection systems that would have improved performance characteristics.
The committee agrees that injection studies and modeling cannot be seen as valid without physical tests with threat
objects. Physical tests are needed for validation, as noted above, but they also can reveal engineering or
manufacturing flaws. Modeling tells how a system should perform, assuming that the equipment as built matches the
modeled detector, but confirmatory tests are needed with different units of the same equipment and under different
conditions. The committee’s recommendation above states that well-validated models can and should be used in
conjunction with well-selected physical tests when it is impractical to do sufficiently comprehensive testing by
physical tests alone.
The passive radiation screening of cargo at ports of entry is expected to operate for a long
time. Although this capability may be enhanced with scanning or interrogation equipment,35
Congress has directed CBP to deploy passive detectors as part of the screening procedures for
cargo entering the United States. CBP has put RPMs in place at hundreds of ports of entry.
The threat environment, the composition of container cargo, technological and analytical
capabilities, and the nature of commerce at the ports of entry have changed significantly over the
last decade and are expected to evolve in both predictable and unpredictable ways in the coming
years. Containerization changed the nature of shipping in recent decades. Patterns of flow in
commerce continue to evolve as international trade changes, the world economy adjusts, and
production shifts among different countries. Patterns of transport also shift in response to costs
and incentives—for example, rail transport may increase relative to truck transport as pressures
to reduce carbon emissions and other environmental impacts increase.
Rather than focusing on the single decision about the deployment of ASPs, the current
testing should be viewed as a first step in a continuous process of improvement and adaptation of
the systems. DHS should develop a process for continuous improvement able to address and
exploit these changes, rather than a single product to replace current screening technology. This
would enable the system to be updated continuously so that it is not outdated or obsolete by the
time all of the systems are deployed.
RECOMMENDATION
DHS should deploy its currently unused low-rate initial production ASPs for
primary and secondary inspection at various sites. This would allow extended operational
testing with a small investment.
Such deployment, even on this limited scale, would provide additional data concerning
their operation, reliability, and performance, and allow DHS to better assess their capabilities in
multiple environments without investing in a much larger acquisition at the outset.
The committee has heard DNDO staff say that under current law such deployments are not
permitted prior to certification. The committee did not examine this question and cannot offer a
legal opinion, but the committee considers a phased deployment to be a sensible approach. The
committee recommends that DNDO reexamine the perceived restrictions and, if DNDO concludes
that such deployments are not permitted, ask for permission to go ahead with them.
RECOMMENDATION
DHS should match the best hardware to the best software (particularly the
algorithms), drawing on tools developed for the competition and elsewhere, such as the
national laboratories. This should be applied to ASPs and also to improved RIIDs.
The development of the hardware for radiation detection and that of the software for analyzing
the signals from the detectors are separable. It has been useful to have a competitive approach for
the systems and to see the results. However, as DHS moves forward, it should match the best
hardware to the best software (particularly the algorithms). In doing so, DHS should draw on
tools developed for the competition and elsewhere, such as the national laboratories.
35
Scanning is a process that actively irradiates the subject with x-rays or gamma rays to generate images of the
interior of the container. Interrogation systems if deployed, would use pulsed neutrons or gamma rays to irradiate a
container and would alarm on particular radiations from the irradiated cargo.
The NaI detectors used in the ASP are a mature technology, but continued improvements
in the detection and analysis algorithms can occur with research supported by DOE, DHS, and
others. The vendors’ algorithms are somewhat limited compared to algorithms developed at
government expense. With data from the hardware in a standard format, it would be
straightforward to later incorporate new and improved detection and analysis algorithms.
Further, improved algorithms, or even current ASP algorithms, could be used to substantially
improve the performance of handheld RIIDs.
ASPs will not eliminate the need for handheld detectors with spectroscopic capabilities.
The greatest deficiency of the RIIDs currently in use is their software. Because some of the
improvement in isotope identification offered by the ASPs over the RIIDs results from software
improvements, the best software package should also be incorporated into improved handheld
detectors. Newer RIIDs with better software might significantly improve their performance and
expand the range and flexibility of deployment options available to CBP for cargo screening. If
integration of improved software in handheld devices is deemed impractical because of the
computational limitations of a low-power, handheld device, the computational capabilities of a
handheld device could be replaced or enhanced with a nearby desktop computer system that
receives data from the handheld detector by wireless transmission. In 2006, DNDO rolled out a
program to improve RIID software, called the Human Portable Radiation Detection System
(HPRDS). However, the committee saw no evidence that this effort was linked to the ASP
program or that potential improvements in the RIID were being considered in cost-benefit
analyses (CBA). Linkage makes sense for the technology development, as noted above, and also
for the CBA. If the HPRDS yields improved RIIDs in the next few years, then the ASP
performance tests will have compared the ASPs to outdated technology, which can lead to poor
choices in cost-benefit tradeoffs.
By separating the software and hardware elements and engaging the broader science and
engineering community,36 DHS would have increased confidence in its procurement of the best
product available with current technology, and simultaneously could advance the state of the art.
Correlation of Models and Simulations with Physical Test Results
In addition to operational testing to demonstrate the performance of the system under
realistic conditions, one must develop faithful models and simulations to examine scenarios that
may not have been attempted in the field. The process of validating these models and
simulations will include predictions of system performance under conditions that are
well defined and can be tested in the field. Only if the models and simulations actually predict
observed performance under conditions that are amenable to testing (within statistical
uncertainties) will DHS have confidence that the models and simulations might be dependable
for describing other configurations. Even then, there may be some configurations that the
models and simulations do not predict adequately. This would not be surprising. To minimize the
number of potential non-conforming configurations in this set, physical testing needs to explore
informative, challenging cases.
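One simple way to check whether a model predicts observed performance within statistical uncertainties is an exact binomial test of the observed detections against the model’s predicted probability. The sketch below implements one such check under assumed conventions: a two-sided test at a 5 percent significance level. The function names are hypothetical, and a real validation program would also compare spectra, not only alarm outcomes.

```python
import math


def binomial_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def exact_two_sided_p_value(k, n, p):
    """Sum of probabilities of all outcomes no more likely than the observed one."""
    observed = binomial_pmf(k, n, p)
    return sum(binomial_pmf(i, n, p) for i in range(n + 1)
               if binomial_pmf(i, n, p) <= observed * (1 + 1e-9))


def model_consistent(predicted_p, detections, runs, alpha=0.05):
    """True if the observed detections are statistically compatible with the
    model's predicted detection probability (hypothetical acceptance rule)."""
    return exact_two_sided_p_value(detections, runs, predicted_p) >= alpha


# Usage: 6 detections in 12 physical runs, compared with three model predictions.
for predicted in (0.5, 0.7, 0.95):
    print(predicted, model_consistent(predicted, detections=6, runs=12))
```

With 12 runs, 6 detections would be judged consistent with a predicted probability of 0.5 and inconsistent with 0.95. They would also be judged consistent with 0.7, which illustrates how small samples limit the ability of physical tests to discriminate between models.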
36
Even short of the innovation that might arise from broader scientific perspectives, better documentation and peer
review of the algorithms would make it easier to compare the algorithms and to evaluate this critical part of the
system.
The RPMs must be tested as a complete operational system (not just as components), and
under conditions that reproduce a fully integrated installation under a range of conditions to
demonstrate correlation between test results and models and simulations. Similarly, test objects
must be selected to adequately represent the threat that the system is meant to address. If the
threat is nuclear terrorism, then the test objects and configurations would include nuclear
materials in the quantities, shapes, and intensities, along with shielding or masking materials
designed to foil the RPM, such as might be expected from an inventive terrorist.
In addition to the improved understanding such testing affords, it can offer operational
solutions to problems arising from the limitations of the detectors. If the threshold that would
mask threat objects were known, then all cargo containers that are above that threshold could be
referred to secondary screening and more thorough analysis. (As noted earlier in this chapter,
DNDO revised its performance testing for 2008 to address this problem, and preliminary results
suggest that the tests found the transition ranges.)
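The referral rule described above can be sketched in a few lines. The sketch assumes, hypothetically, that validated models have expressed the masking threshold as a NORM count rate. Both the threshold and the measured rates below are invented. Applying the rule also yields the referral rate, which determines the added load on secondary screening.

```python
def referral_decisions(norm_count_rates, masking_threshold):
    """Refer to secondary every container whose measured NORM count rate is at
    or above the (assumed) level at which a threat object could be masked."""
    return [rate >= masking_threshold for rate in norm_count_rates]


def referral_rate(norm_count_rates, masking_threshold):
    """Fraction of containers sent to secondary screening under the rule."""
    decisions = referral_decisions(norm_count_rates, masking_threshold)
    return sum(decisions) / len(decisions)


# Hypothetical measured NORM count rates (counts per second) and threshold.
norm_rates = [120.0, 450.0, 80.0, 900.0, 300.0, 150.0, 1100.0, 95.0]
threshold = 400.0
print(referral_decisions(norm_rates, threshold))
print(f"referral rate: {referral_rate(norm_rates, threshold):.3f}")
```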
The committee believes that by approaching the test, evaluation, and future technology
development as an iterative process, the limited deployment of the existing ASP systems could
be a vital tool in improving the technology prior to blanket deployment at U.S. ports of entry.
Distribution of the existing ASP systems to ports and border crossings in a variety of locations
and environments (for example, Port of LA/LB, NYCT, and Detroit) would provide information
about the variables in the real-world system that could be fed back into models and could be used
to develop future generations of the hardware, software, and analytical algorithms. At the very
least, operational testing should be expanded to take advantage of some of these opportunities.
Other considerations
RECOMMENDATION
Scenarios identified by red-teaming efforts should be used in developing new models
and physical tests of detection systems to learn ways of improving the technologies and
their deployment.
DNDO already has a red-teaming capability that is applied to operations, and the test
programs are already intended to identify systematically the detection capabilities of the ASP
systems. Red teams suggested here as part of an ongoing testing and development program
could help DNDO (a) identify strategies that smugglers without detailed knowledge of the
systems are more likely to try and what the adversaries’ adaptation might look like; (b) identify
new vulnerabilities that the new technologies and CONOPs introduce; and (c) identify what
technological changes affect the effectiveness of the systems and their applications. Similarly,
this approach is valuable in test design, ensuring that a realistic range of cases is examined and
validating the testing protocols. The Special Tests (see Table 3.1) may have served some of this
function, although they were designed for a slightly different purpose and appear not to have
been as systematic as what one would expect from a red teaming effort.
As noted earlier in this report, DNDO, CBP, and DOE have similar and overlapping
missions and needs for screening vehicles and cargo. They use and are considering procuring
much of the same equipment. DNDO has consulted and cooperated with DOE on some aspects
of the ASP development, but these efforts should be expanded. A wealth of experience dealing
with algorithm development and archives of data relating to radioactive material and spectral
analysis exists within the DOE national laboratories. A call to the labs and other agencies for a
survey of past research and information, assistance, and collaboration could help DNDO tap into
the expertise within those institutions.