Read "Fairness in Employment Testing: Validity Generalization, Minority Issues, and the General Aptitude Test Battery" at NAP.edu

Suggested Citation:"12 Evaluation of Economic Claims." National Research Council. 1989. Fairness in Employment Testing: Validity Generalization, Minority Issues, and the General Aptitude Test Battery. Washington, DC: The National Academies Press. doi: 10.17226/1338.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

Evaluation of Economic Claims

There is no question that any individual employer who can be selective in hiring workers will benefit. What is problematic is the magnitude of the economic benefits that would accrue to the individual employer or to the economy as a whole if ability testing were more widely used.

Part of the Department of Labor's rationale for promoting the VG-GATB Referral System is based on very specific claims of economic benefits. John Hunter, the author of U.S. Employment Service (USES) Test Research Report No. 47, which contains an analysis of the economic benefits of personnel selection using ability tests (U.S. Department of Labor, 1983e), estimates that a "potential increase in work force productivity among the employers who hire through the service would come to $79.36 billion per year." That report also refers the reader to the work of Hunter and Schmidt (1982), in which they estimate productivity gains of between $13 billion and $153 billion in the economy as a whole due to using ability tests for selection. In this chapter we review these claims.

UTILITY ANALYSIS: GAINS FOR THE INDIVIDUAL FIRM

In the first part of the discussion we review the model (known as utility analysis) that Hunter and Schmidt used to estimate how much an individual employer would gain by using ability tests to select workers. The formula that Hunter and Schmidt derive to measure the gains from using ability testing is taken from Brogden (1946):

\[ G = r \cdot s \cdot A, \]

where
G = the dollar gain per worker per year due to hiring in order of test score rather than randomly,
r = the correlation between test score and productivity,
s = the standard deviation of yearly productivity in dollars among workers in the applicant pool, and
A = the average test score of those applicants selected, when test scores are standardized to have mean 0 and variance 1 in the applicant pool.

In this formula, the economic benefits to an employer are determined by three parameters. The first is the validity of the test, the extent to which test performance is correlated with productivity. The second and third parameters measure the potential an employer has for improving productivity by selecting better workers. How much productivity could improve depends on the variability of productivity in the employer's applicant pool and on the latitude the employer has in selecting workers. If productivity varies widely, an employer will benefit from using a test that selects the best workers. However, if one worker is about as good as another, the gains from selecting the best will be small. Similarly, if an employer must hire everyone who applies for a job, then it does not help him to know who is best. However, if it is possible to reject 90 or 95 percent of all applicants, it is obviously advantageous to be able to identify the most able workers.

If the selection were random, then the average test score among selected workers would be zero, and there would be no gains in productivity. The gain is derived because the employer can select the top-scoring percentage of those who apply for a job. If the test score distribution is normal, the influence of selectivity, p, on the employer's gains is measured by M(p), a statistical function that is the inverse of the Mill's ratio.[1] For our purposes, it suffices to note that M(p) calibrates the influence of selectivity, p, on productivity gains. M(p) is a decreasing function of p; the more selective an employer can be, the lower is p and the greater are the potential gains from using ability tests to hire the best workers.

[1] The formal definition of M(p) is

\[ M(p) = \frac{f[H(1 - p)]}{p}, \]

where f and H are, respectively, the density and the inverse of the cumulative distribution function of the standardized normal distribution.
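The selectivity term can be checked numerically. The following is a minimal sketch, assuming normally distributed test scores as the text does; the function names `inverse_mills` and `brogden_gain` are illustrative and do not come from the report. It evaluates the footnoted definition M(p) = f[H(1 - p)]/p with Python's standard library and plugs it into Brogden's formula.

```python
from statistics import NormalDist


def inverse_mills(p: float) -> float:
    """Mean standardized test score of the top fraction p of applicants.

    Implements M(p) = f(H(1 - p)) / p, where f is the standard normal
    density and H is the inverse of its cumulative distribution function.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    std_normal = NormalDist()               # mean 0, standard deviation 1
    cutoff = std_normal.inv_cdf(1.0 - p)    # H(1 - p): score cutoff for the top p
    return std_normal.pdf(cutoff) / p       # f(cutoff) / p


def brogden_gain(r: float, s: float, p: float) -> float:
    """Brogden's G = r * s * A, with A = M(p) under normality."""
    return r * s * inverse_mills(p)


if __name__ == "__main__":
    # M(p) falls as the selection ratio p rises: less selectivity, less gain.
    for p in (0.05, 0.10, 0.50, 0.90):
        print(f"p = {p:.2f}   M(p) = {inverse_mills(p):.3f}")
```

Because M(p) shrinks steadily as p approaches 1, the same test validity and the same spread of productivity yield very different gains depending on how selective the employer can be.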

Potential Benefits of Employment Service Use of the VG-GATB

As a demonstration of the use of the utility formula, we examine the Hunter estimate that optimal test use would have resulted in an estimated benefit of $79.36 billion to employers using the Employment Service system in 1980 (U.S. Department of Labor, 1983e). That figure is widely quoted in promotional literature for the General Aptitude Test Battery (GATB). (The numbers in this discussion relate to 1980. The technique could be applied to contemporary data with corrections for inflation and the scale of Employment Service operations.)

The first number needed for the formula is the correlation between test score and productivity, which Hunter takes to be .5, based on USES validity generalization studies connecting test score and supervisor ratings.

The second number is the standard deviation of worker productivity, which Hunter estimates to be 40 percent of average wages. This figure is based on six empirical studies that covered clerks, nurse's aides, grocery clerks, adding machine operators, and radial drill-press operators, with estimated standard deviations of 20 percent, 15 percent, 15 percent, 10 percent, 10 percent, and 25 percent (Hunter and Schmidt, 1982: Table 7.1). It is also based on a method of variability assessment developed by Hunter and Schmidt (see U.S. Department of Labor, 1983e) in which supervisors are asked to estimate the dollar value of an average worker and of a worker at the 85th percentile. The ratio of the two estimates is an estimate of the standard deviation of worker productivity (under the assumption that productivity is normally distributed, an assumption that has been supported by Hunter and Schmidt in a study of computer programmers). Hunter and Schmidt developed values of 60 percent and 55 percent for budget analysts and computer programmers. Combining these estimates with the previous empirical studies produces their overall estimate of the standard deviation of worker productivity as 40 percent of average annual wages.

The final number is the referral ratio, the proportion of applicants referred. Hunter takes the value of 10 percent based on an "informal enquiry that the U.S. Employment Service has jobs for only about 1 in 10 of the applicants." The value of M(p) for this referral ratio is 1.76; this means that the average test score over the top 10 percent of scorers is 1.76, when the test is standardized to have mean 0 and standard deviation 1.

Applying Brogden's formula gives a percentage gain, per worker per year, of

\[ G = .50 \times 40 \times 1.76 \approx 35\%. \]

In 1980, the Employment Service placed 4 million applicants in jobs. Average annual wage in the jobs served by the Employment Service is $16,000. Average job tenure in the United States is 3.6 years. Thus the total wages spent on workers hired in a particular year, over the expected tenure of their jobs, is $230 billion, and, according to Hunter's calculations, the savings if they had been hired top-down in order of test score would be 35 percent x $230 billion = $80.5 billion.

Will VG-GATB Testing Save $80 Billion?

We examine the applicability of Brogden's formula for evaluating gains from the use of the GATB by the Employment Service and reconsider the particular numerical inputs used by Hunter (U.S. Department of Labor, 1983e).

There are two points to consider about the r value (correlation between test score and productivity), which Hunter estimates at .5. First, the .5 value is based on corrections for restriction of range and for unreliability in the criterion that the committee does not accept (see Chapter 8) and is significantly larger than is supported by the second wave (post-1972) of GATB validity studies.

Second, Brogden's formula measures the gains to an employer from using ability tests, under the assumption that, without the tests, hiring is random. Hunter asserts that the counseling used by the Employment Service instead of the test "is equivalent to random selection" (U.S. Department of Labor, 1983e). We do not have convincing evidence, however, that the other techniques used by the Employment Service and by employers are of no value. (If, indeed, workers are being selected at random from applicant pools by the alternative methods, how can it be argued, as Hunter does elsewhere, that it is necessary to correct correlations computed on worker groups for restriction of range in order to estimate their values for applicant groups?) In any case, some employers use their own selection methods to screen applicants sent by the Employment Service.

In assessing the gains from using ability tests, it would be necessary to understand how ability tests complement existing procedures. Suppose an employer is using a procedure that has a validity of .10. For example, an employer uses some combination of interviews and biographic information to rank job applicants and hires those who come out best in that ranking. The ranking has a correlation of .10 with productivity. Now suppose the employer adds an ability test, which in combination with the other selection methods has a validity of .30 for selecting applicants. The gain in productivity can be measured by Brogden's formula, but the validity term in the formula must be replaced by .30 - .10 = .20, the change in validity due to adopting the new procedure.
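The arithmetic behind Hunter's figure, and the effect of replacing the validity term with an incremental validity, can be traced in a few lines. This is a sketch only: the function names are hypothetical, and the inputs are the 1980 values quoted above (validity .5, standard deviation 40 percent of wages, M(p) = 1.76, 4 million placements, $16,000 average wage, 3.6 years of tenure).

```python
def utility_gain_percent(validity: float, sd_percent_of_wages: float,
                         mean_selected_score: float) -> float:
    """Brogden's G = r * s * A, with s expressed as a percentage of wages."""
    return validity * sd_percent_of_wages * mean_selected_score


def total_wage_bill(placements: int, annual_wage: float,
                    tenure_years: float) -> float:
    """Wages paid to one year's hires over their expected job tenure."""
    return placements * annual_wage * tenure_years


def incremental_validity(new_validity: float, prior_validity: float) -> float:
    """Validity term to use when the test replaces a non-random procedure."""
    return new_validity - prior_validity


if __name__ == "__main__":
    gain_pct = utility_gain_percent(0.50, 40.0, 1.76)      # about 35 percent
    wage_bill = total_wage_bill(4_000_000, 16_000, 3.6)    # about $230 billion
    print(f"gain per worker per year: {gain_pct:.1f}% of wages")
    print(f"wage bill over tenure:    ${wage_bill / 1e9:.1f} billion")
    print(f"implied savings:          ${gain_pct / 100 * wage_bill / 1e9:.1f} billion")

    # Same selectivity and spread, but the test adds only .30 - .10 of validity.
    delta_r = incremental_validity(0.30, 0.10)
    print(f"gain with incremental validity {delta_r:.2f}: "
          f"{utility_gain_percent(delta_r, 40.0, 1.76):.1f}% of wages")
```

Carried at full precision, the inputs give roughly $81 billion; rounding to 35 percent and $230 billion, as the text does, gives the quoted $80.5 billion. The formula is linear in the validity term, so any credit given to existing selection procedures reduces the estimated gain proportionally.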

At first glance, it might be thought that the employer's prior procedure with validity .10 could be combined with a cognitive test of validity .30 to produce a combined selection procedure with validity .40, so that the gain in validity due to using the cognitive test is .30. That, however, is not the case. Even if the two are uncorrelated, the correlation of the combined procedures is only .33; if they are positively correlated, it will be somewhat less than this. To discover the improvement due to using a cognitive test, one cannot avoid adjusting for the validity of the prior procedure.

Thus, in place of Hunter's estimate of .5, we suggest that the gain in the validity of an employer's selection procedures from using the GATB is more likely to range from .1 to .3. The .1 corresponds to jobs for which the employer already has a reasonable selection procedure, and the .3 corresponds to jobs for which the current selection procedure is effectively random.

Hunter's estimate of the second value in the Brogden formula is also open to question. The empirical evidence cited for the standard deviation of worker productivity is quite slight: eight studies by five authors (U.S. Department of Labor, 1983e). Six of these studies are for jobs in the Job Families IV and V principally served by the Employment Service, and the standard deviations of output as a percentage of wages average 16 percent. Two of the studies, using a questionnaire of supervisors developed by Hunter and Schmidt, give values of 55 percent and 60 percent for budget analysts and computer programmers, respectively. However, the Employment Service does not see many applicants for jobs like budget analyst and computer programmer. It seems overly optimistic to produce a figure of 40 percent as the consensus figure for Employment Service jobs. In Schmidt and Hunter (1983) the low-complexity jobs were estimated to have standard deviations of 20 percent, and in more recent work (Hunter et al., 1988) the estimates have been revised downward to 15 percent. In our judgment, a more appropriate consensus figure for Employment Service jobs would be about 20 percent.

The third figure in Brogden's formula is the selection ratio, which Hunter takes to be 1 in 10 (1 selected for every 10 applicants). In 1980 the Employment Service placed 4 million applicants in jobs. To achieve a selection ratio of 1 in 10, it would have needed 40 million applicants, the top 4 million test scorers being placed. The figures for 1986-1987 were 3.2 million placements of 6.9 million referrals for 19.2 million applicants, a ratio of 1 in 6 (and perhaps 1 in 4 would be more reasonable, because 7 million of the 19.2 million were unemployment insurance claimants legally obliged to register).

The theoretical gains to be reaped from testing come from allocating the top X percent of test scorers to jobs and the bottom 100 - X percent to no jobs. Hunter's numbers would mean that 10 percent would be selected and 90 percent would not. For an individual employer who can afford to be highly selective, Brogden's formula may well be applicable. But it cannot apply to the whole economy, for which the prospect of the top-scoring 10 percent working and the bottom 90 percent not working is absurd. And the Employment Service is a microcosm of the economy; of the 16 million applicants not placed during 1986-1987, many will have already had jobs when they applied or will get them through some other route than the Employment Service. Thus, even if they score low on the test, they will get to work, and their productivity must be allowed for.

Suppose there were only one job and all job seekers were tested, and the top 90 percent of test scorers were employed and the rest were unemployed. Ten percent is regarded as a reasonably high rate of unemployment. The gains from testing against random hiring would be computed using a selection ratio of 9 in 10. The corresponding inverse Mill's ratio is .20, which should be compared with an M(p) value of 1.76 when the selection ratio is 1 in 10.

Taking a more optimistic view, let us now assume a selection ratio of 1 in 6 based on the 1986-1987 figures. (This is optimistic in the sense that it supposes that the 16 million workers not placed by the Employment Service did not have or find jobs and so did not lower average productivity.) The corresponding value of M(p) is 1.40. If one accepts the committee's more cautious estimates of the first two values in the Brogden formula, and if the Employment Service referred in order of test score and the employers hired in order of test score, the economic gain by Brogden's rule would be:

\[ G = .2 \times 20 \times 1.40 = 5.6\%. \]

This would lead to an estimated dollar gain, in 1980, of $13 billion as opposed to Hunter's $80 billion. However, this is still an overestimate because the average job tenure figure was not discounted for the decreased value of the savings over time. Rather, one year's savings was multiplied by the 3.6-year average tenure figure. A value of 3 would be more appropriate, since next year's savings are not as valuable as this year's.[2] This correction would reduce the dollar gain to about $10.75 billion.

[2] To correctly estimate the amount discounted, one would need to know both the appropriate discount rate and the distribution of job tenure (not just its mean). To arrive at the value 3, we took 10 percent as a discount rate. This is probably conservative. The most conservative assumption one could make about the distribution of job tenure would be to suppose that every worker stays on the job for exactly 3.6 years and then quits. Under that assumption, the discounted present value of savings to a firm is 3.15 times annual savings. A less conservative procedure would assume that workers leave jobs at a constant rate. In this case the discounted present value of one year's savings should be multiplied by 2.63. A reasonable compromise value is 3.
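The committee's revised figures follow from the same formula with different inputs. The sketch below simply reruns the arithmetic with the values stated in the text (validity gain .2, standard deviation 20 percent, M(p) of 1.40 or .20, 4 million placements at $16,000, and a tenure multiplier of 3.6 or the discounted value 3); the function name and the constant are illustrative, and the M(p) values are taken as given rather than recomputed.

```python
# Wages paid per year of tenure to the 4 million workers placed in 1980.
ANNUAL_WAGE_BILL = 4_000_000 * 16_000   # $64 billion


def dollar_gain(validity_gain: float, sd_percent: float, m_p: float,
                tenure_multiplier: float) -> tuple[float, float]:
    """Return (percentage gain, dollar gain) under Brogden's formula."""
    percent = validity_gain * sd_percent * m_p
    dollars = percent / 100.0 * ANNUAL_WAGE_BILL * tenure_multiplier
    return percent, dollars


if __name__ == "__main__":
    scenarios = {
        "1 in 6, undiscounted tenure 3.6": (0.2, 20.0, 1.40, 3.6),
        "1 in 6, discounted tenure 3":     (0.2, 20.0, 1.40, 3.0),
        "9 in 10, discounted tenure 3":    (0.2, 20.0, 0.20, 3.0),
    }
    for label, args in scenarios.items():
        percent, dollars = dollar_gain(*args)
        print(f"{label}: {percent:.1f}% -> ${dollars / 1e9:.2f} billion")
```

The three scenarios give roughly $12.9 billion, $10.75 billion, and $1.5 billion, which shows how strongly the result depends on the assumed selection ratio and tenure multiplier.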

The more radical view, with a selection ratio of 9 in 10 (that is, 9 of 10 Job Service applicants get jobs one way or another), would lead to a gain of 0.8 percent. Including the discounted job tenure figure, the dollar gain in this scenario would be on the order of $1.5 billion.

The committee concludes that both the logic and the numbers used in the estimate of $80 billion to be gained from testing are flawed, and that an estimate in the range $1.5 billion to $10 billion is more plausible. Although we regard this as a plausible estimate of savings, provided both the Employment Service and employers used the GATB optimally, we emphasize that it is not reasonable to conclude that the economy as a whole would save this amount of money or that the gross national product (GNP) could increase by this amount. Employment Service use of the VG-GATB will not improve the quality of the labor force as a whole. If employers using the Employment Service get better workers, employers not using the Employment Service will necessarily have a less competent labor force. One firm's gain is another firm's loss.

With great ambivalence, we have developed alternative computations of the economic gains to be anticipated from widespread use of the VG-GATB system. Such dramatic claims of dollar gains have been proposed, and given a credence perhaps not originally intended, that we feel compelled to demonstrate that a careful critique of the assumptions and the numbers would lead many experts to very different, and much more modest, estimates. Our ambivalence stems from a reluctance to do anything to encourage further use of dollar estimates in Employment Service literature. Given the paucity of empirical evidence and the state of the art, all estimates of productivity gains from ability testing are highly speculative. The choice of a dollar metric lends a false precision to the analysis. We feel that it is more likely to mislead than to inform policy.

GAINS TO THE ECONOMY AS A WHOLE ARE FROM JOB MATCHING

Several attempts have been made to calculate the gains that would accrue to the economy as a whole if ability testing were used to select all workers in the economy. This calculation cannot be made simply by applying Brogden's formula to the economy as a whole. The reason is that an important source of increased productivity is an employer's ability to select the best-qualified workers and to avoid hiring the least-qualified workers. If there is no selectivity, then an employer gains nothing by identifying the able, since this identification will not affect the hiring decisions.

The economy as a whole is very much like a single employer who must accept all workers. All workers must be employed. Whereas it may be true for an individual firm that more than 10 percent of its workers fit into the top 10 percent of the ability distribution, this can never be true of the entire labor force. The economy as a whole must make do with a labor force that has only 10 percent of the workers who fit into the top 10 percent of the ability distribution. It must somehow reserve 10 percent of its jobs for the least able 10 percent.

This situation contrasts with that of the individual employer. If a firm uses tests to identify the able and if the firm can be selective, then it can improve the quality of its work force. The economy as a whole cannot; the economy as a whole must employ the labor force as a whole.[3]

Testing can increase aggregate productivity only if there are gains to be made from matching people to jobs. Estimating those gains requires models and procedures that are different from those used to measure the gains that accrue to an individual employer who uses ability tests. In estimating the effect on the economy as a whole, the model must balance the single employer's gains against the losses of others.

To summarize, utility analysis cannot be applied to the economy as a whole because the economy as a whole cannot have a selection ratio of much less than 100 percent. The economy as a whole must make do with the labor force that it has. It is not possible to assign the best workers to every job.

Economic Gains Based on the Hunter and Schmidt Job-Matching Model

In job matching, individuals are assigned to jobs to maximize overall productivity. In the simplest case, when there is one predictor for each of several jobs, gains over random assignment occur only if the quantity

validity x standard deviation of productivity

varies over the different jobs. The higher-scoring workers are assigned to the jobs with the higher values of this quantity (Cronbach and Gleser, 1965: Chap. 5).

[3] What about the unemployed? One not entirely frivolous answer is that being unemployed is a job; unemployment is essential to the smooth functioning of the economy. If there were no unemployment, then inflation would be unacceptably high. Furthermore, unemployment is necessary if the labor force is to respond to changing economic demands. Without unemployment we would have many blacksmiths and no computer technicians. The fact that the unemployment rate (or at least the unemployment rate that is consistent with reasonable price stability) changes quite slowly is support for this view. If one takes seriously this point of view, then it is clear that productivity can increase if the most able are given the job "work" and the least able remain unemployed. But this conclusion rests on the observation that some jobs are more productive than others and that aggregate productivity increases when the more able are assigned to the more productive jobs. In other words, this is a theory about how good job matching enhances productivity.
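The single-predictor matching principle stated above (gains over random assignment arise only when validity x standard deviation of productivity differs across jobs) can be illustrated with a toy calculation. The sketch assumes a linear model in which a worker with standardized score z placed in job j is expected to produce mean_j + slope_j * z, where slope_j stands for validity x standard deviation in that job; all numbers and function names are hypothetical.

```python
from itertools import permutations


def total_productivity(assignment, scores, slopes, means):
    """Expected output when assignment[j] is the worker placed in job j."""
    return sum(means[j] + slopes[j] * scores[w] for j, w in enumerate(assignment))


def best_and_average(scores, slopes, means):
    """Best total over all assignments, and the average (random assignment)."""
    totals = [total_productivity(a, scores, slopes, means)
              for a in permutations(range(len(scores)))]
    return max(totals), sum(totals) / len(totals)


def sorted_matching(scores, slopes):
    """Place the highest scorer in the job with the highest slope, and so on."""
    workers = sorted(range(len(scores)), key=lambda w: scores[w], reverse=True)
    jobs = sorted(range(len(slopes)), key=lambda j: slopes[j], reverse=True)
    assignment = [0] * len(slopes)
    for w, j in zip(workers, jobs):
        assignment[j] = w
    return tuple(assignment)


if __name__ == "__main__":
    scores = [1.5, 0.5, -0.5, -1.5]             # hypothetical standardized scores
    means = [40_000, 25_000, 18_000, 12_000]    # hypothetical average output per job
    varying = [6_400, 4_000, 2_880, 1_920]      # validity x SD differs by job
    equal = [3_000, 3_000, 3_000, 3_000]        # validity x SD the same everywhere

    for label, slopes in (("varying", varying), ("equal", equal)):
        best, average = best_and_average(scores, slopes, means)
        matched = total_productivity(sorted_matching(scores, slopes),
                                     scores, slopes, means)
        print(f"{label}: best={best:.0f} sorted={matched:.0f} random={average:.0f}")
```

With equal slopes every assignment yields the same total, so matching adds nothing; with varying slopes the sorted matching attains the brute-force optimum and exceeds the random-assignment average.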

Brogden (1955, 1959, 1964) developed algorithms for optimal classification when separate equations are used for predicting success in the different jobs. The assignment part of the problem is mathematically standard. There are m jobs and n workers, and each worker has an expected dollar productivity for each job. Each worker is assigned to a job to maximize expected total productivity. This is a problem in the field of linear programming called the assignment problem. It will take a while to do the calculation when m and n are large, but it is clear what needs to be done. The hard problem is developing a plausible estimate of dollar productivity for each worker for each job, then assessing the gain in using optimal assignment versus random assignment.

Under some simplifying assumptions, Brogden (1959) showed that the gain from optimal assignment was proportional to (1 - c), where c is the correlation between the predictors used in the different jobs. Under these assumptions, it is thus important to classify jobs so that different prediction equations are appropriate for the different jobs.

Schmidt and Hunter (1983) present two job-matching models that assign workers optimally. In the first of these, the univariate model, they divide jobs into four types: management-professional, skilled trade, clerical, and semiskilled and unskilled labor. Productivity is predicted by a single predictor, cognitive ability, with correlation .4 in all jobs. The standard deviation of productivity is assumed proportional to average productivity in the job. Thus the optimal classification assigns higher-ability workers to the higher-wage jobs, for which their expected productivity is higher because the standard deviation of productivity in dollars is higher.

If there is a single predictor, then Brogden (1959) would predict no gains from the use of testing. Hunter and Schmidt's different conclusion is based on a different assumption about the way in which

validity x standard deviation of productivity

varies across jobs. Hunter and Schmidt argue that the higher the average productivity of a job, the greater is the influence of a worker's ability on the output of the job. Some fragmentary evidence that supports this point of view can be found in Hunter et al. (1988). Brogden implicitly assumes that the effect of ability on job output is the same for all jobs. We regard the Hunter and Schmidt assumption as plausible but note that there is very little evidence about the nature of the relationship of ability to output.

In the second of Hunter and Schmidt's models, the multivariate model, different predictors are used for the different job types. Cognitive ability is used for managerial-professional and for semiskilled-unskilled, with an assumed correlation of .4 for each. Cognitive ability and spatial ability predict productivity in skilled trades, and the three correlations between the two abilities and productivity are assumed to be .4. Cognitive ability and perceptual ability predict productivity in clerical work, and the three correlations between the two abilities and productivity are assumed to be .4. Finally, the correlation between spatial and perceptual ability is assumed to be .16.

The workers are assigned in the second model as follows: first, those scoring highest on cognitive ability are assigned to the management-professional group; then, of those remaining, the highest scorers on spatial plus cognitive ability are assigned to the skilled trades; of those remaining, the highest scorers on perceptual plus cognitive ability are assigned to clerical work; and the remainder go to semiskilled-unskilled labor. (Although it is a minor academic point, this assignment does not maximize productivity; despite their high cognitive ability, some prodigious scorers on spatial ability should be assigned to skilled trades.)

Hunter and Schmidt use their models to estimate the amount by which the GNP would increase if testing were used to place all workers optimally in jobs. Under the assumption that validity is .4, their estimates range from 1.7 percent of the GNP for the univariate model (using a low estimate, 16 percent of average output, of the standard deviation of productivity) to 8.1 percent of the GNP for the multivariate model (using a high estimate, 40 percent of average output, of the standard deviation of productivity).

Using our preferred parameters (validity is .2 and the standard deviation of productivity on a job is 20 percent of output on that job), their univariate model suggests that improved job matching would increase the GNP by about 1.1 percent; the multivariate model suggests an increase of 2.1 percent. These percentage increases should be compared with the 35 percent increase estimated by Hunter for Employment Service jobs (U.S. Department of Labor, 1983e).

Hunter and Schmidt argue that their multivariate model overestimates the potential gain from a testing program because it does not take into account that placement is not now at random. They suggest that a reasonable way to correct this estimate of the potential gains is to take the difference between the multivariate and univariate models. Under their assumptions, these gains would range from 1.6 percent to 4 percent of the GNP; under our preferred assumptions, this technique puts potential gains at 1 percent of the GNP.

How do these economy-wide models relate to Employment Service use of the GATB? This is an important question, because a policy that would increase the GNP by just 1 percent would be of enormous value to the country (1 percent of the GNP in 1987 was $45 billion). In answering this question it is important to remember that only a small fraction of those who find jobs each year do so through the Employment Service system. The gains that Hunter and Schmidt calculate would be realized only if all employers used tests optimally. It is also important to remember that the most important assumptions of the Hunter-Schmidt models rest on a very slim empirical foundation.

Nevertheless, the committee views the economy-wide matching models as a promising way to assess the economic effects of testing. By looking beyond a single job, they offer the Employment Service a device for balancing the demands of all employers and all applicants. In particular, if they are to be taken seriously, they would require a job classification scheme that as much as possible reduces the correlation between predictors in different job classes. The present five-family classification scheme is not adequate for effective multivariate matching.

Few economists have tried to answer the question of how productivity is affected by the way in which workers are matched to jobs. Those who have approached this problem have used models and procedures that are very different from those used by Hunter and Schmidt. Most economic models assume that workers choose the job for which they are best fitted. With this maintained assumption it is not possible to address the question that Hunter and Schmidt ask. Some economic models (notably those of Heckman and Sedlacek, 1985, and Willis and Rosen, 1979) have been tested in the sense that they have been successfully fitted to data about the U.S. economy. In this weak sense they have a firmer empirical base than the Hunter-Schmidt models. However, on the issue of how much output would go up if people were better fitted to their jobs, they are, at present, silent.

Hunter and Schmidt's economy-wide models are based on simple assumptions for which the empirical evidence is slight. The most important one is that the standard deviation of productivity is proportional to the average wage of the job. That assumption is supported by only a very few studies. Without that effect there would be no gains in placing higher-scoring workers in the more highly paid jobs. The second set of assumptions concerns the correlation of various aptitudes with productivity. Although there are many more data on which to base these correlations, there is much variation in the data and considerable disagreement about what the correlations should be.

The general concept of the models is promising, but the particular numerical values used can be regarded as only illustrative. We do not know how well employers and workers match themselves already. We do not have a classification of jobs that lends itself to job matching, so the gains from the multivariate model are only theoretical.
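The assignment problem described earlier in this section, and the parenthetical remark that filling job types one after another does not maximize productivity, can be made concrete with a small example. The sketch below uses an entirely hypothetical 3 x 3 table of expected productivity (rows are workers, columns are jobs) and compares a sequential rule, in which each job in turn takes the best remaining worker, against the optimum found by exhaustive search. Exhaustive search is practical only for tiny cases.

```python
from itertools import permutations


def assignment_total(productivity, assignment):
    """Total expected productivity when assignment[j] is the worker in job j."""
    return sum(productivity[w][j] for j, w in enumerate(assignment))


def optimal_assignment(productivity):
    """Exhaustive search over all one-to-one assignments (tiny problems only)."""
    n = len(productivity)
    return max(permutations(range(n)),
               key=lambda a: assignment_total(productivity, a))


def sequential_assignment(productivity):
    """Fill jobs in order; each job takes the best worker still unassigned."""
    remaining = set(range(len(productivity)))
    assignment = []
    for j in range(len(productivity[0])):
        best = max(remaining, key=lambda w: productivity[w][j])
        assignment.append(best)
        remaining.remove(best)
    return tuple(assignment)


if __name__ == "__main__":
    # Hypothetical expected productivity; worker 0 is strong in job 0
    # but exceptional in job 1.
    productivity = [
        [10, 12, 3],
        [9, 4, 5],
        [5, 6, 4],
    ]
    seq = sequential_assignment(productivity)
    opt = optimal_assignment(productivity)
    print("sequential:", seq, assignment_total(productivity, seq))
    print("optimal:   ", opt, assignment_total(productivity, opt))
```

In this example the sequential rule places worker 0 in job 0 and totals 21, whereas the optimum moves that worker to job 1 and totals 25. For realistic problem sizes a polynomial-time method such as the Hungarian algorithm would be used instead of enumeration, for example `scipy.optimize.linear_sum_assignment` with `maximize=True`.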

SUPERVISOR RATINGS AND TRUE PRODUCTIVITY

Proponents of the VG-GATB claim that its use will lead to increased productivity. The scientific base of GATB research does not support such an inference directly. This is because the GATB validity studies do not report correlations between test performance and productivity; instead they report correlations between test performance and a surrogate for productivity, supervisor ratings. (A number of studies report correlations between test scores and performance in training programs. In analysis of the economic benefits of using the GATB, the data on training are largely ignored.) A small number of studies, discussed in Chapter 10, have attempted to measure the economic benefits of using the GATB directly. The small number and mixed quality of these studies make it difficult to draw inferences that can be generalized to other settings.

The correlation between test scores and true productivity could well be either higher or lower than the correlation between test scores and supervisor ratings. If the GATB measures productivity, and if supervisor ratings are imperfect measures of productivity, then the correlation between productivity and test scores will be higher than the reported correlation between test scores and supervisor ratings. For an elaboration of this point see the discussion of criterion unreliability in Chapter 6. If, however, the GATB measures well what supervisors regard highly and if supervisor ratings tend to ignore or overlook significant contributions to productivity, contributions that are not well measured by the GATB, then the correlation between supervisor ratings and GATB scores will exceed the correlation between productivity and GATB scores.

Which is the case? In the absence of direct data on the joint distribution of test scores, supervisor ratings, and productivity, we cannot say with confidence whether reported validity coefficients overstate or underestimate the true correlation between test scores and productivity. It seems highly unlikely that data that will resolve this problem will exist in the near future (or ever). What then is to be done? The most reasonable course would seem to be to regard correlation with supervisor ratings as the best available estimate of the correlation between test scores and productivity. However, those who use these numbers to evaluate potential economic gains should be aware of the uncertain scientific base on which their estimates rest.
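The two possibilities described above can be made concrete with a small sketch. The first function applies the classical correction for criterion unreliability, which holds when ratings equal productivity plus error unrelated to the test. The second uses a purely hypothetical model in which the test and the ratings both reflect one factor while productivity also depends on a second factor that neither captures. The function names, the model, and every number are assumptions chosen for illustration, not estimates from GATB data.

```python
import math


def corrected_validity(r_test_rating, rating_reliability):
    """Case 1: ratings = productivity + independent error.
    Classical correction for criterion unreliability: r / sqrt(reliability)."""
    if not 0.0 < rating_reliability <= 1.0:
        raise ValueError("reliability must be in (0, 1]")
    return r_test_rating / math.sqrt(rating_reliability)


def rating_and_productivity_validity(w_measured, w_unmeasured, rating_noise_sd):
    """Case 2 (hypothetical model): G, U, and the rating error are independent;
    G and U have unit variance.
        test score   = G
        rating       = G + error, with SD(error) = rating_noise_sd
        productivity = w_measured * G + w_unmeasured * U
    Returns (corr(test, rating), corr(test, productivity))."""
    if w_measured == 0 and w_unmeasured == 0:
        raise ValueError("productivity must depend on at least one factor")
    r_rating = 1.0 / math.sqrt(1.0 + rating_noise_sd ** 2)
    r_productivity = w_measured / math.sqrt(w_measured ** 2 + w_unmeasured ** 2)
    return r_rating, r_productivity


# Case 1: an observed validity of 0.30 with rating reliability 0.36
# implies a correlation with productivity of about 0.50.
case_1 = corrected_validity(0.30, 0.36)

# Case 2: with equal weights on the measured and unmeasured factors, the
# correlation with ratings (about 0.89) exceeds the correlation with
# productivity (about 0.71).
case_2_rating, case_2_productivity = rating_and_productivity_validity(1.0, 1.0, 0.5)
```

Both cases are consistent with the same kind of observed validity coefficient, which is why the direction of any error cannot be settled without data on all three variables.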

FINDINGS, CONCLUSIONS, AND RECOMMENDATIONS

A major attraction of the VG-GATB system is the anticipation of substantial economic gains. USES Test Research Report No. 47 (U.S. Department of Labor, 1983e), written by John Hunter, contends that a potential increase in work force productivity of $79.36 billion per year would accrue if the 4 million placements made by the Employment Service system were based on top-down referral from GATB test scores.

Our evaluation of the potential economic effects of the VG-GATB Referral System included study of the work of labor economists as well as the utility analysis developed in recent years by psychologists. We have looked carefully at Hunter's work with GATB data as well as the more elaborate models proposed by Hunter and Schmidt in other contexts.

Findings

Benefits to the Individual Employer

1. There is evidence in the economics and industrial/organizational psychology research literature that people who score higher on ability tests tend to produce more and make fewer errors, as well as to complete training somewhat faster and stay on the job longer.

2. How selective an individual firm can be depends on the people available and how much the firm can offer its employees in pay and other benefits. Selection can operate only within those conditions, and the potential gains are commensurately constrained.

Aggregate Economic Effects

1. There is no well-developed body of evidence from which to estimate the aggregate effects of better personnel selection. A number of theoretical models have been developed that imply various estimates of productivity gains from improved selection and placement. But we have seen no empirical evidence that any of them provides an adequate basis for estimating the aggregate economic effects of implementing the VG-GATB on a nationwide basis.

The Hunter-Schmidt Models

1. The Hunter-Schmidt univariate and multivariate models for estimating the aggregate economic gain of optimal selection are potentially valuable. However, we have seen no empirical evidence that supports their estimates of dollar gains in the GNP if employment testing with top-down scoring were widely used.

Conclusions

1. Our review of the economics literature and our analysis of the Hunter-Schmidt theoretical models lead us to reject their estimates of specific dollar gains from test-based selection.

2. Furthermore, given the state of scientific knowledge, we do not believe that realistic dollar estimates of aggregate gains from improved selection are even possible. They lend a spurious certainty to the argument for the VG-GATB Referral System that can only mislead policy makers, employers, and those who administer the referral system.

3. We agree that better selection of workers would be likely to benefit individual employers and that a better matching of people to jobs according to their particular abilities or other work-related characteristics would tend to foster the economic health of the community, all other things being equal. But the current state of economic knowledge does not permit estimation of the overall economic effects of widespread testing.

Recommendation

1. Given the primitive state of knowledge about the aggregate economic effects of better personnel selection, we recommend that Employment Service officials refrain from making dollar estimates of the gains that would result from test-based selection.

PART V CONCLUSIONS AND RECOMMENDATIONS

Whereas the committee's specific conclusions and recommendations appear at the end of each chapter, Part V highlights the committee's most important recommendations. Chapter 13 presents the committee's recommendations on the use of score adjustments for black and Hispanic job seekers in the VG-GATB Referral System and its recommendations on what scores to report to test takers and employers. Chapter 14 is a summary of the committee's central recommendations: it recapitulates the committee's statements on operational use of the VG-GATB system, methods of referring applicants to jobs, options for reporting GATB scores to employers and to job seekers, promotion of the VG-GATB system, research on its effects, and action with regard to veterans and people with handicapping conditions.
