APPENDIX D: THE USE OF TESTS FOR SCREENING AND SELECTION

One use of tests of vision, and one realm in which the working group believes that the emerging techniques can make a considerable contribution, is in the screening and selection of personnel. There are several important and independent issues that must be taken into account when tests are used for these purposes. These issues are often discussed in the context of statistical decision theory (e.g., Einhorn and Hogarth, 1981) and measurement theory (e.g., Stevens, 1951; Townsend and Ashby, 1984). Any test giving a quantitative score, whether visual acuity, grating resolution, or I.Q., has three fundamental characteristics that determine its worthiness: reliability, accuracy, and validity.

RELIABILITY

Reliability is the degree to which a test score is repeatable. It is usually measured by the correlation coefficient, R, calculated between either two separate administrations of the same test or two separate versions of the same test given simultaneously. With a reliable test, persons scoring high on the first administration will score high on the second, while those scoring low on the first will score low on the second. For a perfectly reliable test, R = 1; in a completely unreliable test, R = 0.

ACCURACY

Accuracy (or precision) is a concept related to reliability. A measurement can be considered to be composed of two components: the true value and some random error. It is not possible to know how much random error is in a specific measurement, but it is possible, using statistical methods, to estimate the average size of the error component. We can therefore specify the accuracy of a particular measurement in terms of a statistical confidence interval. The standard error of the mean is the most commonly used measure of statistical accuracy. Take, for example, a measured threshold value of −2.00 log contrast having a standard error of 0.05.
Using the Gaussian probability distribution, one can estimate that the "true" threshold value lies within the range −2.05 to −1.95 with a statistical probability of 0.68 (i.e., the 68 percent confidence interval is −2.05 to −1.95). The commonly used 95 percent confidence interval corresponds to plus and minus 1.96 times the standard error (in this example, approximately −2.10 to −1.90). All other factors being equal, it is desirable to use tests having the highest reliability and giving the most accurate measurements. In any case, the reliability and the accuracy of any vision test used for screening or personnel selection should be known by its users.
VALIDITY

Being reliable is a necessary attribute of a test, but it is not sufficient to ensure that it is useful. A test measurement must also have validity. There are actually three main types of test validity (Nunnally, 1978; Wood, 1977): content validity, predictive validity, and construct validity. For the purpose of screening and selection, predictive validity is the most relevant of the three. Predictive validity is determined by assessing the ability of a test to accurately predict performance on some other test (the other test is called the criterion or standard test). Predictive validity is expressed in terms of R², the proportion of variance in the criterion test that is accounted for by the variance in the predicting test. A perfect test would give R² = 1 and would allow for no errors in screening or selection.

SCREENING AND SELECTION

The factors that must be taken into consideration when a test is used to screen or select personnel will be discussed using a hypothetical example. Flanagan (1947) presents a discussion of the application of these methods to the selection of pilots during World War II. Let us assume that we want to select pilots for their ability to detect distant targets, and that we would like to make this selection on the basis of a simple vision test (which is fast and easy to administer) rather than on the basis of actual target detection performance (which is time-consuming and expensive to measure). In order to use the vision test for selection, we must first measure its predictive validity.
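The definition of predictive validity given above, the proportion of criterion variance accounted for by the predicting test, can be made concrete with a short sketch. It assumes a straight-line (least-squares) relation between predictor and criterion scores, which the text does not specify; the function name `predictive_validity_r2` and the scores are hypothetical.

```python
from statistics import mean


def predictive_validity_r2(predictor, criterion):
    """Proportion of criterion variance explained by a least-squares line.

    Assumes a linear relation between the two sets of scores.
    """
    mx, my = mean(predictor), mean(criterion)
    sxx = sum((x - mx) ** 2 for x in predictor)
    sxy = sum((x - mx) * (y - my) for x, y in zip(predictor, criterion))
    slope = sxy / sxx
    intercept = my - slope * mx
    # Variance left unexplained by the fitted line
    ss_residual = sum(
        (y - (intercept + slope * x)) ** 2 for x, y in zip(predictor, criterion)
    )
    # Total variance of the criterion scores around their mean
    ss_total = sum((y - my) ** 2 for y in criterion)
    return 1 - ss_residual / ss_total


# Invented scores for four people (illustration only).
vision_scores = [1, 2, 3, 4]
detection_scores = [1, 3, 2, 4]
print(predictive_validity_r2(vision_scores, detection_scores))
```

With these invented scores the correlation is 0.8, so the function returns an R² of about 0.64: the vision score accounts for roughly two thirds of the variance in the detection score.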
To measure its predictive validity, one first takes a randomly selected sample of people from the target population (Air Force pilots in this example), then administers both the vision test and the actual target detection test to every subject, and finally calculates how well the vision test predicts each person's score on the target detection task. A set of hypothetical data based on this procedure is shown in Figure 23. Here pilots' target detection ability (criterion task) is plotted as a function of their score on the vision test (predictor score). The predictive validity of the vision test shown in the figure is 0.6, a reasonable value for actual tests.

In screening and selection, it is necessary to establish a vision score above which a person will be accepted and below which he or she will be rejected. This cutoff score is illustrated in Figure 23 by the vertical line extending to the horizontal axis of the graph. There is a second cutoff that also must be established: the performance level on the criterion task that defines the minimum acceptable level of performance. This cutoff is represented by the horizontal line extending to the vertical axis of the graph. Target detection performance above this level is defined as acceptable, while performance below it is unacceptable. These two cutoffs divide the population of tested individuals into four groups or quadrants. Two quadrants represent correct decisions and the other two represent mistakes in the screening or selection process.
FIGURE 23 Selection of pilots based on hypothetical data.

The people in the upper right quadrant are those who are accepted on the basis of their vision test score and who have acceptable target detection ability: they are called hits. The other correct decisions are the people in the lower left quadrant, who are rejected on the basis of their vision score and who indeed have unacceptable target detection ability: they are called correct rejections. There are two types of selection mistakes. The people in the upper left quadrant are those who are rejected on the basis of their low score on the vision test but who nevertheless have acceptable target detection ability: they are called misses. The other mistakes are the people in the lower right quadrant, who are accepted because of their high vision score but have poor target detection performance: they are called false acceptances. Generally, it is desirable to maximize the hits and correct rejections, while minimizing the misses and false acceptances.
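The four quadrants can be expressed as a small classification routine. The sketch below is illustrative only: the names `classify` and `tally` and the scores are hypothetical, and a score exactly equal to a cutoff is treated as passing, a tie-breaking choice the text does not address.

```python
from collections import Counter


def classify(vision_score, detection_score, vision_cutoff, criterion_cutoff):
    """Assign one person to a quadrant of the selection diagram."""
    # Assumption: a score equal to the cutoff counts as passing.
    accepted = vision_score >= vision_cutoff
    acceptable = detection_score >= criterion_cutoff
    if accepted and acceptable:
        return "hit"  # upper right quadrant
    if not accepted and not acceptable:
        return "correct rejection"  # lower left quadrant
    if not accepted and acceptable:
        return "miss"  # upper left quadrant
    return "false acceptance"  # lower right quadrant


def tally(pairs, vision_cutoff, criterion_cutoff):
    """Count how many (vision, detection) pairs fall in each quadrant."""
    return Counter(
        classify(v, d, vision_cutoff, criterion_cutoff) for v, d in pairs
    )


# Invented (vision score, detection score) pairs (illustration only).
sample = [(8, 9), (3, 2), (4, 8), (9, 3)]
print(tally(sample, vision_cutoff=5, criterion_cutoff=5))
```

Each of the four invented pairs lands in a different quadrant, so the tally contains one hit, one correct rejection, one miss, and one false acceptance.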
An interesting aspect of the screening or selection process is that error-free decisions are possible only when the predictive test has a predictive validity of 1. In the more usual screening situation illustrated in Figure 23, errors of selection will always occur; there is no way to achieve error-free selection. The two cutoffs can be adjusted as is appropriate for each specific circumstance. For example, requiring a higher vision score for acceptance (moving the cutoff score to the right) will decrease the number of false acceptances and increase the correct rejections, but at the cost of increasing the misses and decreasing the hits. In a similar manner, lowering the criterion level defining acceptable target performance will increase the number of hits and decrease the false acceptances, but at the cost of fewer correct rejections and more misses. Exactly where the two cutoffs are placed would usually depend on the relative costs of the two types of errors and the relative benefits of the two types of correct decisions. This approach to selection allows a rational basis for setting cutoff points to achieve the goals of the screening or selection process.

SUMMARY

In order for a test to be useful in screening or personnel selection it must first be reliable (a necessary but not sufficient condition) and it must have a reasonably high predictive validity on some relevant criterion task. Accuracy is also important because it contributes to both reliability and validity; lower accuracy lowers both test reliability and predictive validity. It is important when comparing two tests on their reliability and predictive validity that the measurements be taken with the same level of statistical accuracy. Otherwise the test with the higher statistical accuracy (smaller confidence interval) will have an artifactual advantage over the other.
It is not possible to achieve error-free screening or selection using a test whose predictive validity is less than 1. Cutoff points can be adjusted, however, to allow an appropriate balance between hits, correct rejections, misses, and false acceptances. The costs, benefits, and goals of the screening or selection process will determine how the cutoffs are set.
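The trade-off involved in placing the vision cutoff can be illustrated with a simple cost comparison. The sketch below is a simplification under stated assumptions: it weighs only the costs of the two error types and ignores the benefits of correct decisions, which the text says should also be considered. The function names, the costs, and the scores are all hypothetical.

```python
def count_errors(pairs, vision_cutoff, criterion_cutoff):
    """Return (misses, false acceptances) for one pair of cutoffs."""
    # Assumption: a score equal to a cutoff counts as passing.
    misses = sum(
        1 for v, d in pairs if v < vision_cutoff and d >= criterion_cutoff
    )
    false_acceptances = sum(
        1 for v, d in pairs if v >= vision_cutoff and d < criterion_cutoff
    )
    return misses, false_acceptances


def best_vision_cutoff(pairs, criterion_cutoff, candidates,
                       miss_cost, false_acceptance_cost):
    """Pick the candidate vision cutoff with the lowest total error cost."""
    def total_cost(cutoff):
        misses, false_acceptances = count_errors(pairs, cutoff, criterion_cutoff)
        return misses * miss_cost + false_acceptances * false_acceptance_cost

    return min(candidates, key=total_cost)


# Invented (vision score, detection score) pairs (illustration only).
sample = [(2, 3), (3, 6), (4, 4), (5, 7), (6, 4), (7, 8), (8, 9)]

# When false acceptances are expensive, a stricter cutoff is preferred.
print(best_vision_cutoff(sample, 5, [3, 5, 7],
                         miss_cost=1, false_acceptance_cost=5))

# When misses are expensive, a more lenient cutoff is preferred.
print(best_vision_cutoff(sample, 5, [3, 5, 7],
                         miss_cost=5, false_acceptance_cost=1))
```

With these invented data, raising the vision cutoff from 3 to 7 removes both false acceptances but introduces two misses, so the preferred cutoff shifts from 7 to 3 when the relative costs are reversed.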