About this PDF file: This new digital representation of the original work has been recomposed from XML files created from the original paper book, not from the original typesetting files. Page breaks are true
to the original; line lengths, word breaks, heading styles, and other typesetting-specific formatting, however, cannot be retained, and some typographic errors may have been accidentally inserted. Please
use the print version of this publication as the authoritative version for attribution.
APPENDIX D: THE USE OF TESTS FOR SCREENING AND SELECTION
One use of tests of vision, and one realm in which the working group believes that the emerging techniques
can make a considerable contribution, is in the screening and selection of personnel. There are several important
and independent issues that must be taken into account when tests are used for these purposes. These issues are
often discussed in the context of statistical decision theory (e.g., Einhorn and Hogarth, 1981) and measurement
theory (e.g., Stevens, 1951; Townsend and Ashby, 1984). Any test giving a quantitative score, whether visual
acuity, grating resolution, or I.Q., has three fundamental characteristics that determine its worthiness: reliability,
accuracy, and validity.
RELIABILITY
Reliability is the degree to which a test score is repeatable. It is usually measured by the correlation
coefficient, R, calculated between either two separate administrations of the same test or two separate versions of
the same test given simultaneously. With a reliable test, persons scoring high on the first administration will
score high on the second, while those scoring low on the first will score low on the second. For a perfectly
reliable test, R = 1; in a completely unreliable test, R = 0.
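
As an illustration of the test-retest calculation described above, the following is a minimal sketch that computes the correlation coefficient between two administrations of the same test. The function name pearson_r and the five pairs of scores are hypothetical and chosen only for the example; they are not data from this report.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores for five people on two administrations of one test.
first = [10.0, 12.0, 15.0, 18.0, 20.0]
second = [11.0, 12.5, 14.0, 19.0, 19.5]
reliability = pearson_r(first, second)
print(f"test-retest reliability R = {reliability:.3f}")
```

In this made-up sample the people who score high the first time also score high the second time, so R comes out close to 1.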
ACCURACY
Accuracy (or precision) is a concept related to reliability. A measurement can be considered to be composed
of two components: the true value and some random error. It is not possible to know how much random error is
in a specific measurement, but it is possible, using statistical methods, to estimate the average size of the error
component. We can therefore specify the accuracy of a particular measurement in terms of a statistical
confidence interval. The standard error of the mean is the most commonly used measure of statistical accuracy.
Take, for example, a measured threshold value of −2.00 log contrast having a standard error of 0.05. Using the
Gaussian probability distribution, one can estimate that the “true” threshold value lies within the range of values
−2.05 to −1.95, with a statistical probability of 0.68 (i.e., the 68 percent confidence interval is −2.05 to −1.95).
The commonly used 95 percent confidence interval corresponds to plus and minus 1.96 times the standard error
(in this example, approximately −2.10 to −1.90). All other factors being equal, it is desirable to use tests having the highest
reliability and giving the most accurate measurements. In any case, the reliability and the accuracy of any vision
test used for screening or personnel selection should be known by its users.
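
The confidence-interval arithmetic in the example above can be reproduced with a short sketch. It assumes, as the text does, that the measurement error follows a Gaussian distribution; the function name confidence_interval is hypothetical, and the 68 percent level is entered as 0.6827, the coverage of plus and minus one standard error.

```python
from statistics import NormalDist


def confidence_interval(estimate, standard_error, level=0.95):
    """Two-sided Gaussian confidence interval around a measured value."""
    # z is the number of standard errors that encloses the requested level.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    half_width = z * standard_error
    return estimate - half_width, estimate + half_width


# Threshold of -2.00 log contrast with a standard error of 0.05.
low68, high68 = confidence_interval(-2.00, 0.05, level=0.6827)
low95, high95 = confidence_interval(-2.00, 0.05, level=0.95)
print(f"68 percent interval: {low68:.2f} to {high68:.2f}")
print(f"95 percent interval: {low95:.2f} to {high95:.2f}")
```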
VALIDITY
Being reliable is a necessary attribute of a test, but it is not sufficient to ensure that it is useful. A test
measurement must also have validity. There are actually three main types of test validity (Nunnally, 1978;
Wood, 1977): content validity, predictive validity, and construct validity. For the purpose of screening and
selection, predictive validity is the most relevant of the three. Predictive validity is determined by assessing the
ability of a test to accurately predict performance on some other test (the other test is called the criterion or
standard test). Predictive validity is expressed in terms of R², the proportion of variance in the criterion test that
is accounted for by the variance in the predicting test. A perfect test would give R² = 1 and would allow for no
errors in screening or selection.
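
One way to make "proportion of variance accounted for" concrete is sketched below. It assumes that the prediction is a least-squares straight line from predictor score to criterion score, which the text does not specify; the function name r_squared and the five score pairs are hypothetical.

```python
def r_squared(predictor, criterion):
    """Proportion of criterion variance accounted for by a linear fit."""
    n = len(predictor)
    mean_x = sum(predictor) / n
    mean_y = sum(criterion) / n
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, criterion))
    # Least-squares line: criterion is predicted as intercept + slope * predictor.
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_residual = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(predictor, criterion)
    )
    ss_total = sum((y - mean_y) ** 2 for y in criterion)
    return 1.0 - ss_residual / ss_total


# Hypothetical predictor (vision test) and criterion (target detection) scores.
vision = [1.0, 2.0, 3.0, 4.0, 5.0]
detection = [2.0, 1.0, 4.0, 3.0, 5.0]
print(f"predictive validity (R squared) = {r_squared(vision, detection):.2f}")
```

For a straight-line fit this quantity equals the square of the correlation coefficient between the two sets of scores.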
SCREENING AND SELECTION
The factors that must be taken into consideration when a test is used to screen or select personnel will be
discussed using a hypothetical example. Flanagan (1947) presents a discussion of the application of these
methods to the selection of pilots during World War II. Let us assume that we want to select pilots for their
ability to detect distant targets, and that we would like to make this selection on the basis of a simple vision test
(which is fast and easy to administer) rather than on the basis of actual target detection performance (which is
time-consuming and expensive to measure). In order to use the vision test for selection, we must first measure its
predictive validity. To do so, one draws a random sample of people from
the target population (Air Force pilots in this example), administers both the vision test and the actual target
detection test to every subject, and then calculates how well the vision test score predicts each person's score
on the target detection task. A set of hypothetical data based on
this procedure is shown in Figure 23. Here pilots' target detection ability (criterion task) is plotted as a function
of their score on the vision test (predictor score). The predictive validity of the vision test shown in the figure is
0.6, a reasonable value for actual tests. In screening and selection, it is necessary to establish a vision score
above which a person will be accepted and below which he or she will be rejected. This cutoff score is illustrated
in Figure 23 by the vertical line extending to the horizontal axis of the graph. There is a second cutoff score that
also must be established: the performance level on the criterion task that defines the minimum acceptable level
of performance. This cutoff is represented by the horizontal line extending to the vertical axis of the graph.
Target detection performance above this level is defined as acceptable, while performance below it is
unacceptable.
These two cutoffs divide the population of tested individuals into four groups or quadrants. Two quadrants
represent correct decisions and the two others represent mistakes in the screening or selection process.
FIGURE 23 Selection of pilots based on hypothetical data.
The people in the upper right quadrant are those who are accepted on the basis of their vision test score and
who have acceptable target detection ability: they are called hits. The other correct decisions are the people in the
lower left quadrant who are rejected on the basis of their vision score and who indeed have unacceptable target
detection ability: they are called correct rejections. There are two types of selection mistakes. The people in the
upper left quadrant are those who are rejected on the basis of their low score on the vision test but who
nevertheless have acceptable target detection ability: they are called misses. The other mistakes are in the lower
right quadrant; those who, because of their high vision score, are accepted but have poor target detection
performance: they are called false acceptances. Generally, it is desirable to maximize the hits and correct
rejections, while minimizing the misses and false acceptances.
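
The four-quadrant classification described above can be sketched as follows. The score pairs and both cutoffs are hypothetical, and treating a score exactly equal to a cutoff as passing is an assumption of this sketch, since the text only speaks of scores above and below the cutoff.

```python
from collections import Counter


def classify(vision_score, detection_score, vision_cutoff, criterion_cutoff):
    """Name the quadrant for one person given the two cutoffs."""
    accepted = vision_score >= vision_cutoff  # ties pass: an assumption
    acceptable = detection_score >= criterion_cutoff
    if accepted and acceptable:
        return "hit"  # upper right quadrant
    if not accepted and not acceptable:
        return "correct rejection"  # lower left quadrant
    if not accepted and acceptable:
        return "miss"  # upper left quadrant
    return "false acceptance"  # lower right quadrant


def tally(pairs, vision_cutoff, criterion_cutoff):
    """Count how many (vision, detection) pairs fall in each quadrant."""
    return Counter(
        classify(v, d, vision_cutoff, criterion_cutoff) for v, d in pairs
    )


# Hypothetical (vision score, target detection score) pairs for six pilots.
pilots = [(0.9, 0.8), (0.8, 0.3), (0.4, 0.7), (0.2, 0.1), (0.7, 0.9), (0.3, 0.4)]
counts = tally(pilots, vision_cutoff=0.5, criterion_cutoff=0.5)
print(dict(counts))
```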
An interesting aspect of the screening or selection process is that error-free decisions are possible
only when the predictive test has a predictive validity of 1. In the more usual screening situation
illustrated in Figure 23, errors of selection will always occur. The
two cutoffs can be adjusted as is appropriate for each specific circumstance. For example, requiring a higher
vision score for acceptance (moving the cutoff score to the right) will decrease the number of false acceptances
and increase the correct rejections, but at the cost of increasing the misses and decreasing the hits. In a similar
manner, lowering the criterion level defining acceptable target performance will increase the number of hits and
decrease the false acceptances, but at the cost of fewer correct rejections and more misses. Exactly where
the two cutoffs are placed would usually depend on the relative costs of the two types of errors and the relative
benefits of the two types of correct decisions. This approach to selection allows a rational basis for setting cutoff
points to achieve the goals of the screening or selection process.
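
The idea of placing the vision cutoff according to the relative costs of the two kinds of error can be sketched as below. This is a simplified illustration, not a procedure given in this appendix: it weighs only the costs of misses and false acceptances, ignores the benefits of correct decisions, and uses hypothetical data, candidate cutoffs, and cost values.

```python
def total_cost(pairs, vision_cutoff, criterion_cutoff, miss_cost, false_accept_cost):
    """Summed cost of the selection errors produced by one vision cutoff."""
    cost = 0.0
    for vision, detection in pairs:
        accepted = vision >= vision_cutoff
        acceptable = detection >= criterion_cutoff
        if acceptable and not accepted:
            cost += miss_cost  # rejected a capable person
        elif accepted and not acceptable:
            cost += false_accept_cost  # accepted a poor performer
    return cost


def best_vision_cutoff(pairs, candidates, criterion_cutoff, miss_cost, false_accept_cost):
    """Candidate vision cutoff with the lowest total error cost."""
    return min(
        candidates,
        key=lambda c: total_cost(
            pairs, c, criterion_cutoff, miss_cost, false_accept_cost
        ),
    )


# Hypothetical (vision score, target detection score) pairs and cutoffs.
pilots = [(0.9, 0.8), (0.8, 0.3), (0.4, 0.7), (0.2, 0.1), (0.7, 0.9), (0.3, 0.4)]
candidates = [0.25, 0.5, 0.75, 0.85]

# False acceptances five times as costly as misses: a strict cutoff wins.
strict = best_vision_cutoff(pilots, candidates, 0.5, miss_cost=1.0, false_accept_cost=5.0)
# Misses five times as costly as false acceptances: a lenient cutoff wins.
lenient = best_vision_cutoff(pilots, candidates, 0.5, miss_cost=5.0, false_accept_cost=1.0)
print(strict, lenient)
```

With these made-up numbers the preferred cutoff moves to the right when false acceptances are the more expensive error and to the left when misses are.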
SUMMARY
In order for a test to be useful in screening or personnel selection it must first be reliable (a necessary but
not sufficient condition) and it must have a reasonably high predictive validity on some relevant criterion task.
Accuracy is also important because it contributes to both reliability and validity; lower accuracy lowers both test
reliability and predictive validity. It is important when comparing two tests on their reliability and predictive
validity that the measurements be taken with the same level of statistical accuracy. Otherwise the test with the
higher statistical accuracy (smaller confidence interval) will have an artifactual advantage over the other. It is not
possible to achieve error-free screening or selection using a test whose predictive validity is less than 1. Cutoff
points can be adjusted, however, to allow an appropriate balance between hits, correct rejections, misses, and
false acceptances. The costs, benefits, and goals of the screening or selection process will determine how the
cutoffs are set.
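
The statement that lower accuracy lowers both reliability and predictive validity can be illustrated with a sketch based on classical test theory, which is brought in here as an assumption and is not developed in this appendix. It treats each score as a true value plus independent random error, so that reliability is the share of score variance due to true values, and it assumes the criterion itself is measured without error. All numerical values are hypothetical.

```python
import math


def reliability_from_variances(true_variance, error_variance):
    """Reliability as the fraction of observed variance due to true scores."""
    return true_variance / (true_variance + error_variance)


def attenuated_correlation(true_correlation, predictor_reliability):
    """Observed predictor-criterion correlation when only the predictor is noisy."""
    return true_correlation * math.sqrt(predictor_reliability)


# Hypothetical spread of true thresholds across people (standard deviation 0.20)
# and a hypothetical error-free correlation of 0.8 with the criterion task.
true_sd = 0.20
for error_sd in (0.05, 0.10, 0.20):
    rel = reliability_from_variances(true_sd ** 2, error_sd ** 2)
    observed_r = attenuated_correlation(0.8, rel)
    print(
        f"error sd {error_sd:.2f}: reliability {rel:.2f}, "
        f"observed R {observed_r:.2f}, R squared {observed_r ** 2:.2f}"
    )
```

Under these assumptions, a larger measurement error reduces the reliability and, through it, the observed predictive validity, which is why two tests should be compared at the same level of statistical accuracy.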