Copyright © 2009. National Academy of Sciences. All rights reserved.
Appendix C
The Value of Factorial Experiments
Factorial experiments are extremely useful designs when outcomes are needed for a
variety of test conditions. For example, consider the following factors that could affect test
performance (e.g., probability of an alarm, or probability of no alarm):
• Masking (absent, present)
• Shielding (absent, present)
• Mask location (front, middle)
• Mask height (front, middle)
• Shield location (front, middle)
• Shield height (front, middle)
• SNM (none, some)
• NORM (none, some)
More than 8 factors could be envisioned (e.g., cargo density, ambient temperature,
ambient humidity, background radiation level), and more than just 2 levels for each factor could
be considered. For example, the masking and shielding factors could have levels labeled
“absent,” “front,” and “middle;” and the SNM and NORM factors could have four levels labeled
“none,” “small,” “medium,” and “large,” resulting in a 3x3x4x4 design (a total of 144 test
conditions). This appendix illustrates the value of factorial designs (and a way to reduce the
number of test conditions) using the 8 two-level factors listed above, simply for ease of
illustration. The same concepts apply to more complex designs. Even with only these 8 factors at
2 levels each, running just the 2 x 8 = 16 single-factor tests would not be informative. For example, what happens
if a cargo contains some SNM and some NORM with much shielding and some masking placed
in the middle of the truck? None of the 16 test runs would answer this question. One might also
want to know if the probability of detecting SNM is affected by the combined presence of
masking and shielding of different magnitudes—a question that likewise would not be answered
by any of the 16 runs.
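The counts quoted above can be reproduced with a minimal Python sketch. Only the numbers of levels come from the text; the helper name `full_factorial` and the variable names are illustrative assumptions.

```python
from itertools import product

# Level counts only; the level labels do not matter for counting conditions.
eight_two_level = [2] * 8        # the 8 factors listed above, 2 levels each
four_multi_level = [3, 3, 4, 4]  # masking, shielding, SNM, NORM with extra levels


def full_factorial(levels):
    """Return every combination of level indices, one tuple per test condition."""
    return list(product(*(range(k) for k in levels)))


n_full_two_level = len(full_factorial(eight_two_level))  # all combinations of 8 factors
n_full_multi = len(full_factorial(four_multi_level))     # the 3x3x4x4 design
n_single_factor = sum(eight_two_level)                   # 2 x 8 single-factor tests

print(n_full_two_level, n_full_multi, n_single_factor)
```

The single-factor count grows with the sum of the level counts, while the full factorial grows with their product, which is why combinations quickly become expensive to test exhaustively.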
The benefits of running test combinations can be seen already with the following
(simpler) test design with these hypothetical results:
                        shielding
                     present   absent
masking   present      0.20      0.95
          absent       0.80      0.99
If one tested only “masking present” and “masking absent” in the absence of shielding, one
might conclude that masking has some effect on SNM detection (0.95 vs. 0.99), but not as great as
the effect of shielding in the absence of masking (0.80 vs. 0.99). Three runs suffice to reach
this conclusion. But with only one more run (masking and shielding both present), one sees that
their combined effect is devastating to the probability of detection (0.20), far lower than with
either factor singly. The effect of different combinations of factors can be especially illuminating;
hence the value of experimental designs with combinations of factors or “factorial designs.”
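One way to quantify the combined effect in the hypothetical 2x2 table above is to compare the observed result with what a purely additive model would predict. The sketch below assumes additivity on the probability scale as the reference point; that choice, and all variable names, are illustrative assumptions rather than the report's method.

```python
# Hypothetical detection probabilities from the 2x2 table above,
# keyed by (masking, shielding); 1 = present, 0 = absent.
p_detect = {(1, 1): 0.20, (1, 0): 0.95, (0, 1): 0.80, (0, 0): 0.99}

# Effect of each factor alone, relative to the "neither present" baseline.
masking_alone = p_detect[(1, 0)] - p_detect[(0, 0)]
shielding_alone = p_detect[(0, 1)] - p_detect[(0, 0)]

# If the two effects simply added, both together would give this probability.
additive_prediction = p_detect[(0, 0)] + masking_alone + shielding_alone

# Interaction: how far the observed combined result departs from additivity.
interaction = p_detect[(1, 1)] - additive_prediction

print(round(masking_alone, 2), round(shielding_alone, 2))
print(round(additive_prediction, 2), round(interaction, 2))
```

Under the additive assumption the combined probability would be about 0.76; the observed 0.20 is about 0.56 lower, and only the fourth run (both factors present) reveals that gap.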
Unfortunately, testing all 2x2x2x2x2x2x2x2 = 2^8 = 256 combinations would be
infeasible, especially since the outcome of each test is a “probability of detection”; i.e., (number
of runs that sounded alarm)/(total number of runs). To minimize the uncertainty in this estimated
probability, several runs must be conducted at each test scenario. With only n=6 or n=12 runs,
one would have to conduct 256x6 = 1536 or 256x12 = 3072 test runs, and, even then, the
uncertainty in the estimated probability could be as high as 30%-40% (95% confidence). For
example, a perfect test of n correct actions (n/n) would yield an approximate 95% confidence
interval for the true probability of detection of [(1-0.95)^(1/n), 1], which is (0.61, 1.00) if
n = 6 or (0.78, 1.00) if n = 12. Clearly some reduction in the number of test scenarios is
needed.
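The run totals and the lower confidence limits quoted above can be checked with a short Python sketch. The limit is the value of p that solves p^n = 1 - 0.95, i.e. the smallest detection probability under which n successes in n runs would still occur with probability 0.05. The function and variable names are illustrative assumptions.

```python
def lower_bound_all_successes(n, confidence=0.95):
    """Lower confidence limit on the detection probability after n successes in n runs.

    Solves p**n = 1 - confidence for p.
    """
    return (1.0 - confidence) ** (1.0 / n)


scenarios = 2 ** 8  # full factorial with 8 two-level factors

for n in (6, 12):
    total_runs = scenarios * n
    bound = lower_bound_all_successes(n)
    print(n, total_runs, round(bound, 2))
```

Even a perfect record at a scenario leaves a wide interval when n is small, which is why simply adding replicates to all 256 scenarios is not an attractive way to reduce uncertainty.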
Fractional factorial experiments are factorial experiments with only a fraction of the total
number of runs. Consider, for ease of illustration, only 4 factors, denoted A, B, C, D, each at 2
levels (“present”, “absent”). Sixteen test scenarios would cover all combinations, as follows:
              Factor levels     Product (Mod 2)
Scenario     A    B    C    D        ABCD
    1        1    1    1    1          1
    2        1    1    1    0          0
    3        1    1    0    1          0
    4        1    1    0    0          1
    5        1    0    1    1          0
    6        1    0    1    0          1
    7        1    0    0    1          1
    8        1    0    0    0          0
    9        0    1    1    1          0
   10        0    1    1    0          1
   11        0    1    0    1          1
   12        0    1    0    0          0
   13        0    0    1    1          1
   14        0    0    1    0          0
   15        0    0    0    1          0
   16        0    0    0    0          1
“1” = “present”, “0” = “absent”; “Product (Mod 2)” = 1 when the row contains an even number of 1’s, 0 when it contains an odd number of 1’s
Consider the rows whose last column value is 1:
Run #        A    B    C    D
    1        1    1    1    1
    4        1    1    0    0
    6        1    0    1    0
    7        1    0    0    1
   10        0    1    1    0
   11        0    1    0    1
   13        0    0    1    1
   16        0    0    0    0
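The selection of these 8 runs can be reproduced with a minimal Python sketch: enumerate the 16 scenarios in the same order as the full table, compute the last column from the stated rule (1 for an even number of 1's), and keep the rows where it equals 1. The names `full_design`, `parity_flag`, and `half_fraction` are illustrative assumptions.

```python
from itertools import product

# All 16 scenarios in the order of the full table: A varies slowest, 1 before 0.
full_design = list(product((1, 0), repeat=4))


def parity_flag(run):
    """Return 1 when the run contains an even number of 1's, else 0."""
    return 1 if sum(run) % 2 == 0 else 0


# Keep only the scenarios whose last-column value is 1, keyed by scenario number.
half_fraction = {
    number: run
    for number, run in enumerate(full_design, start=1)
    if parity_flag(run) == 1
}

for number, run in half_fraction.items():
    print(number, *run)
```

Keeping the rows flagged 0 instead would give the complementary set of 8 scenarios.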
Notice that exactly 4 runs have A absent and 4 runs have A present; the same is true of B,
C, or D. Moreover, when A is present (first 4 runs), exactly 2 of the 4 runs have B present and 2
have B absent; the same is true for C and D, and any two of the four factors (A and C, A and D,
etc.). In fact, all 8 level combinations of any 3 factors (A, B, C; A, B, D; A, C, D; B, C, D) are included.
So this design allows us to evaluate:
• The effect of A (present vs. absent)
• The effect of B
• The effect of C
• The effect of D
• The effect of A and B together
• The effect of A and C together
• The effect of A and D together
• The effect of B and C together
• The effect of B and D together
• The effect of C and D together
• The effect of A, B, and C together
• The effect of A, B, and D together
• The effect of A, C, and D together
• The effect of B, C, and D together
The only effect that we cannot assess is the 4-way interaction, ABCD. But we have
reduced the number of runs from 16 to 8, a big savings.
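The balance properties described above (each factor present in 4 runs and absent in 4, each pair of factors showing every combination twice, each triple showing all 8 combinations once) can be checked mechanically. The sketch below is one way to do so in Python; the helper names `projection_counts` and `is_balanced` are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# The 8 retained runs (A, B, C, D), copied from the reduced table above.
half_fraction = [
    (1, 1, 1, 1), (1, 1, 0, 0), (1, 0, 1, 0), (1, 0, 0, 1),
    (0, 1, 1, 0), (0, 1, 0, 1), (0, 0, 1, 1), (0, 0, 0, 0),
]


def projection_counts(runs, columns):
    """Count how often each level combination occurs in the chosen factor columns."""
    return Counter(tuple(run[c] for c in columns) for run in runs)


def is_balanced(runs, k):
    """True if every subset of k factors shows all 2**k combinations equally often."""
    expected = len(runs) // 2 ** k
    for columns in combinations(range(4), k):
        counts = projection_counts(runs, columns)
        if len(counts) != 2 ** k or set(counts.values()) != {expected}:
            return False
    return True


for k in (1, 2, 3, 4):
    print(k, is_balanced(half_fraction, k))
```

The check passes for subsets of 1, 2, and 3 factors and fails for all 4 factors together, since 8 runs cannot contain all 16 four-factor combinations.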
The same principle applies with 8 factors. If resources allow us to run only 64 scenarios,
then we sacrifice the ability to estimate the interactions that involve 5 or more factors at once—
e.g., ABCDEFGH, all 7-factor interactions (ABCDEFG, …, BCDEFGH)—but we can estimate
all other main effects and 2-way, 3-way, and 4-way interactions. (Usually interactions involving
4 or more factors are hard to interpret anyway.) If we can run only 32 scenarios, we give up
not only these high-order interactions but also the ability to resolve some two-factor
interactions; we can still assess the main effects (A alone, ..., H alone) and most
two-factor interactions (AB, ..., GH), all with just 32 runs, a huge savings.
The designs that NIST provided to DNDO for their test runs followed this principle. The
only limiting factors are n, the number of test runs, and the inability to conduct the “SNM
present” tests as blind tests. The former can be improved by increasing n; the latter can be
addressed by hiring “actors” to pretend to act as security agents, with only DNDO personnel
aware of the true SNM test scenarios. The effect of bias when tests are run unblinded has been
documented extensively in the medical literature; unblinded tests must be viewed with great
caution and even skepticism.
Prepublication Copy