Performance Assessment for the Workplace

Overview

Performance assessment has been the subject of a great deal of attention and rhetoric in recent years as part of the movement to reform American education fundamentally. Also called “authentic assessment,” this approach eschews written multiple-choice tests in favor of demonstrated performance on tasks in which desired knowledge and skills are used. The emphasis is on giving pupils the opportunity to demonstrate what they can do, rather than how well they can answer questions about a subject.

As the title of this book indicates, performance assessment has also come to the workplace. Ability tests, usually in the multiple-choice format, have been used for generations for employee selection and classification, for certification, and for career guidance. Underlying these practices is a set of assumptions, both economic and scientific, that support the use of tests. The essence of the argument is that a selected work force will be more efficient or productive than one chosen at random, so that it is in the employer's economic self-interest to be selective in hiring. From a broader perspective, the argument is made that improving the person-job match is in the national interest because it results in greater overall productivity and optimum utilization of workers.

Since the advent of industrial psychology in the first half of the century, tests have been a favored tool for screening workers to find out which applicants have the ability to do the work—or enter the apprenticeship program, be certified to work on sophisticated machinery, etc. Psychometrics,
the science of mental measurement, strengthened the claims that ability tests can predict which applicants will be the more successful ones by encouraging the use of objective procedures in a controlled setting, with empirical verification of the accuracy of the results. On the latter point, however—empirical verification of predictive accuracy—justifications for employment tests have been noticeably deficient.

Test scores have little meaning in and of themselves. One of the most important psychometric techniques for examining their significance or validity has been through correlation with measures of job performance. If, for example, the people who do well on the job are the same ones who did well on an ability test administered for experimental purposes at the time of hiring, then the employer can have some faith in the predictive power of the test for future hires. The problem is that the measures of success on the job typically used to give meaning to test scores have tended to be measures of convenience—time and attendance records, supervisor ratings, training grades. None of them is a very convincing indicator of how well a worker performs on the job. None of them provides compelling evidence of the significance of test scores.

This is the problem that brought authentic assessment to the workplace and provided the occasion for this book, which describes a large-scale effort to develop hands-on measures of the performance of enlisted military personnel for the purpose of validating selection and classification tests. Although the military as an institution has unique features, it also shares many of the characteristics and concerns of other employers. The research on measuring performance in entry-level military jobs described here should be relevant to employers more generally.
Indeed, because so much of the performance measurement methodology is new and, until now, relatively untried, the leaders of education reform will find much to interest them as well.

THE JPM PROJECT

The Joint-Service Job Performance Measurement/Enlistment Standards (JPM) Project is one of the largest coordinated studies of job performance on record. Initiated by the Department of Defense (DoD) in 1980 and scheduled for completion in 1992, the JPM Project represents an investment of many millions of dollars and involved the participation of thousands of people—from the measurement specialists who designed the performance tests to the local base personnel who provided logistical support for the data collection and the more than 15,000 troops who supplied the performance data. The sheer size of the effort ensures that the JPM Project will provide a wealth of raw material and guidance for the next generation of researchers
in the field of human resource management, quite aside from its more immediate goal of improving the selection and classification of military enlisted personnel. The project's many achievements add in important ways to the understanding of personnel selection systems. Even its shortcomings are informative, for they point up the need for additional methodology and highlight the dilemma resulting from conflicting purposes that are inevitable in a project of this magnitude.

The purpose of this volume is to convey to military and civilian human resource planners as well as to the measurement community the advances in theory, technology, and practical knowledge resulting from the military research. It does not present a detailed analysis of the performance data, leaving that to the research teams that carried out the work. We have tried to place the technical discussions concerning the development of performance measures within a larger policy context, providing in Chapter 1 a historical introduction to the criterion problem and sketching in Chapter 2 the many and often competing forces that influence personnel selection in the military. Succeeding chapters describe the design of the project and look more closely at substantive and methodological issues in performance measurement. These issues are for the most part as relevant to the assessment of civilian-sector job performance as to the military and speak to the new interest in performance assessment in school settings as well.

ORIGINS OF THE PROJECT

The JPM Project had its origins in the mid-1970s. In 1973, Congress abolished military conscription, and the military establishment was faced with the prospect of maintaining an active-duty military force on the basis of voluntary enlistment. Intense public debate accompanied the move to an all-volunteer force. Many feared that able volunteers would not sign up in sufficient numbers.
Opponents warned that the national security would be weakened. Others were concerned on social and philosophical grounds that the burden of national defense would fall largely to minorities, the poor, and the undereducated—those who would have most difficulty finding work in the civilian economy (Fullinwider, 1983). With the matter of exemptions from the draft made moot by the shift to a volunteer force, military manpower policy came to revolve around issues of recruit quality and the high cost of attracting qualified personnel in the marketplace (Bowman et al., 1986). Concern about the quality of the all-volunteer force reached a climax in 1980, when DoD informed Congress of an error in scoring the Armed Services Vocational Aptitude Battery (ASVAB), the test used throughout the military since 1976 to determine eligibility for enlistment. A mistake had been made in the formula for scaling scores to established norms, with the
result that applicants in the lower ranges of the ability distribution were given inflated ASVAB scores. As a consequence, approximately 250,000 applicants were enlisted between 1976 and 1980 who would not normally have met minimum entrance standards (Office of the Assistant Secretary of Defense—Manpower, Reserve Affairs, and Logistics, 1980a, 1980b).

Not surprisingly, the military oversight committees of Congress had fleeting, though apparently unfounded, suspicions that the misnorming had been engineered in order to help the four Services meet their enlistment quotas. More to the point, both Congress and DoD policy officials wanted to know how the induction of so many people who should have failed to qualify was affecting job performance. Initial attempts to address the question revealed that the relation between ASVAB scores and satisfactory performance in military jobs was more assumed than empirically established (Office of the Assistant Secretary of Defense—Manpower, Reserve Affairs, and Logistics, 1981; Maier and Hiatt, 1984).

In response to the misnorming and to allay its own broader concerns about building an effective enlisted force solely with volunteers, DoD launched two major research projects to investigate the overall question of recruit quality. The first project, conducted in cooperation with the U.S. Department of Labor, administered the ASVAB to a nationally representative sample of young people between the ages of 18 and 23. This Profile of American Youth (Office of Assistant Secretary of Defense—Manpower, Reserve Affairs, and Logistics, 1982a) permits comparisons between the vocational aptitude scores of military recruits and the test performance of a representative sample of their peers in the general population as of 1980. No longer do the test scores of today's recruits have to be interpreted with test data from the World War II era.
The Profile provided important evidence to quell the worst fears about the quality of the all-volunteer force. The scores of enlistees for fiscal 1981 on the four subtests of the ASVAB that make up the Armed Forces Qualification Test (AFQT) were higher than those of the 1980 sample of American youth. In particular, the proportion of enlistees in the average range was considerably larger, and the proportion of enlisted personnel in the below-average range smaller, than in the general population. Although the results were reassuring, the weakness in the evidence was that quality was defined in terms of the aptitudes of recruits, not realized job performance—that is, in terms of inputs, not outputs. The relation between test scores and performance on the job was not established empirically, and thus DoD still could not satisfactorily answer the more difficult questions about the quality of the voluntary military: How much quality is enough to ensure a competent military force? Given the need to compete in the marketplace for able recruits—using the lures of enlistment bonuses, high entry-level pay scales, and educational benefits—how much quality can the country afford?

In 1980, the assistant secretary of defense in charge of manpower and personnel affairs called on the Services to investigate the feasibility of measuring on-the-job performance and, using the measures, to link military enlistment standards to job performance. With the endorsement of the House and the Senate Committee on Armed Services, the Joint-Service Job Performance Measurement/Enlistment Standards Project, DoD's second major research project, got under way. The progress of this massive research effort is charted in an ongoing series of annual reports to Congress from the Office of the Assistant Secretary of Defense.

Now, after more than a decade of research, empirical evidence has replaced assumptions about the efficacy of the ASVAB. The JPM Project has successfully measured the job proficiency of incumbents in a sample of military entry-level jobs. In the process, it has compared several types of measures and different approaches to test development. The performance measures provide a credible criterion against which to validate the ASVAB, and the ASVAB has been demonstrated to be a reasonably valid predictor of performance in entry-level military jobs. Generalizations from the JPM results will take their place in the literature and lore of industrial and organizational psychology. Because of the superior measures of performance, constructed with a care normally reserved for standardized tests used as predictors, these results provide a solid base for general conclusions formerly based on less satisfactory criteria.

This overview reviews the project briefly, emphasizing those aspects that seem particularly noteworthy, either in illuminating special accomplishments of interest to policy makers or technical experts, or in posing challenges to the technical community.

CONCEPTUAL FRAMEWORK

What is Performance?

The initial goal of the JPM project was to develop measures of job performance that could be used as criteria in evaluating personnel selection procedures. Because so many Service personnel are first-term enlistees, and because the predictive power of entrance characteristics such as ASVAB scores attenuates over time, the project was limited to the study of first-term job performance. The first necessity was to define job performance. Any concept that seems unitary from afar becomes complex when viewed up close, and job performance is no exception. Should the definition include the full range of the job, or be limited to what the incumbents regularly do? Should motivation or perseverance or willingness to go along with the institutional culture be assessed? For this project, job performance was defined as proficiency,
that is, how well an incumbent can do the job. The definition explicitly rejected assessment of how well the incumbent does do the job. That is, tasks that incumbents are rarely called on to do were to be included, but measures of motivation—the willingness to do well—were not. A good argument can be made that proficiency is the most appropriate criterion if the intent is to evaluate the placement of recruits, which was indeed the case here. However important such factors as effort, personal discipline, and military bearing may be to the overall functioning of the organization, these factors do not differentiate among jobs. The choice of proficiency as the performance dimension of interest was doubly sensible in this case because the battery used by the Services for selection and classification is not designed to tap the attitudinal or motivational aspects of performance.

The emphasis on proficiency was underlined by the selection of hands-on performance tests as the benchmark measure for the project. Also called work samples, these tests are as faithful to actual job performance as criterion measures can be, short of observing people in their daily work. Giving pride of place to an assessment format that requires workers to do job-related tasks was perhaps the most significant decision of the entire development effort. It had, of course, enormous cost implications, for this sort of one-on-one assessment is labor-intensive, very time-consuming, and difficult to develop. Nevertheless, the need to make military selection procedures credible in the face of widespread doubt called for a technology that could anchor them solidly in job performance. No other assessment format is as faithful to actual job performance.

What is a Job?

Having chosen proficiency as the facet of performance to be measured, project planners had next to address the difficult question of proficiency at what?
As we discuss in Chapter 4, job analysis can focus on either the work to be done in a job or on the personal traits and attributes an incumbent needs to do the job, or both. The former is task-oriented; the latter is person-oriented. For several reasons, trait analysis did not play a major role in the JPM Project. Traditional task analysis complemented the desire to stay as close as possible to concrete job performance. Moreover, the Services have well-established systems for taking inventory of the tasks that comprise each job on a regular basis. In effect, a bare-bones job analysis had already been done for each of the jobs in the JPM Project; it needed only to be refined.

If sheer time and cost considerations made it expedient to use the existing task inventories and soldier's manuals, the absence of any consistent attempt to determine the human attributes that might contribute to the successful accomplishment of tasks had certain costs. A combined task and trait analysis would have provided a richer understanding of the performance requirements of the jobs being studied and may well have improved the generalizability of the JPM results to other military jobs—a matter that has proved difficult.

What is a Task?

The JPM Project is highly unusual in having applied classical test construction methods to the development of criterion measures. To a large extent, the success of this project in measurement terms is due to the thought and care that went into answering the question: What is a task? Not infrequently the existing task statements were at very different levels (either of breadth or of abstraction), and those derived from task inventories were by definition stripped of content richness. The research staff, assisted by subject matter experts, had to identify a coherent set of activities—with a definable beginning and end—that would represent the task. The process of turning tasks into hands-on test items, in light of the goal of measuring “manifest, observable job behaviors,” became a matter of decomposing each task into its behavioral subcomponents, or steps, and identifying the associated equipment, manuals, and procedures required to perform the steps.

MEASURING JOB PERFORMANCE

Having chosen the task as the unit of analysis, the Services set about developing a variety of measures to assess the proficiency of incumbents in a sample of some 30 jobs chosen to represent broadly the mechanical, technical, administrative, and soldiering occupations in the military. In addition to hands-on tests, a number of other types of measures were developed as possible surrogates for the real thing. These included interview procedures, simulations, multiple-choice tests of job knowledge, and a variety of ratings intended to elicit performance appraisals from supervisors, peers, and the examinees themselves.
Task selection presented one of the most difficult challenges to the development of measures that could be considered representative of performance on the job. For each job studied, the 300 to 400 tasks remaining after culling redundant, outdated, and other problem tasks had to be reduced to a handful—about 15 for hands-on testing. Tasks were turned into test items by means of careful and thorough understanding of each task and detailed analysis of its component steps. The administration of hands-on and interview procedures, both of which took place under the watchful eye of an examiner, required careful planning and continual vigilance in order to control the quality of the data collection.

The chapters of this report discuss these steps in detail. The committee highlights below some of the most important issues and lessons learned from the Services' efforts.

Relative Versus Absolute Measures

An early and largely implicit decision in the JPM Project was that the proficiency measures would be developed in the style of the usual norm-referenced tests used in prediction. That is, the research paradigm was to rank each job incumbent relative to his peers, rather than determining how well the incumbent could do the job in an absolute sense. This decision fundamentally influenced how the tests were designed and the tasks selected. If the intent is relative measurement, then the test developer avoids tasks that are so easy that everyone will pass or so hard that everyone will fail—the point is to select items of a range of difficulty that will produce a good distribution of scores. This is an appropriate approach if the main goal is correlation with predictor scores. But the resulting test scores do not indicate whether the predicted performance is good enough.

The committee felt strongly that a domain-referenced approach would have been more appropriate to the long-term goal of the JPM Project, which was not simply to validate the ASVAB, but to link enlistment standards to job performance. The argument for this position is laid out in Chapter 9 and in Green and Wigdor (Vol. II). Had this approach been adopted, the tests would have been designed to measure individual performance against a scale of competence or job mastery, and test scores would have indicated how well the incumbent could do the job. Tasks would have been selected to represent levels of mastery, rather than to spread the test takers along a distribution. This challenges the traditional research paradigm, but offers more compelling evidence to policy makers concerned with the question of how much quality is enough.
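
The difference between relative (norm-referenced) and absolute (domain-referenced) scoring can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the five scores, the 0-to-1 scale (proportion of task steps performed correctly), and the 0.70 mastery cutoff are invented and are not taken from the JPM data.

```python
# Hypothetical hands-on scores: proportion of task steps performed correctly.
# These numbers are invented for illustration and are not JPM results.
scores = {"A": 0.55, "B": 0.62, "C": 0.48, "D": 0.66, "E": 0.59}

MASTERY_CUTOFF = 0.70  # assumed competence standard, not a JPM value


def percentile_rank(name, scores):
    """Norm-referenced view: percentage of peers scoring strictly below."""
    others = [s for n, s in scores.items() if n != name]
    below = sum(1 for s in others if s < scores[name])
    return 100.0 * below / len(others)


def meets_standard(name, scores, cutoff=MASTERY_CUTOFF):
    """Domain-referenced view: does the score reach the fixed standard?"""
    return scores[name] >= cutoff


for name in sorted(scores):
    print(name, percentile_rank(name, scores), meets_standard(name, scores))
```

In this invented data set, incumbent D outranks every peer yet still falls short of the assumed standard. A purely relative score reports the first fact but cannot reveal the second.
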
As it is, inferences cannot be made directly from the JPM data about the competence of individuals relative to a job, but only about competence relative to others who perform the job.

Sampling Issues

The JPM Project presented a number of interesting sampling problems. To begin with, there are over 900 different military occupational specialties in the four Services, of which only a small number could be studied. Although the 30 occupations selected—9 in the Army, 6 in the Navy, 8 in the Air Force, and 7 in the Marine Corps (see Chapter 3)—included more than one-third of all current first-tour personnel, the problems of generalizing to the other jobs had to be, and indeed have proved to be, generally vexing (see Chapter 9 and Sackett in Vol. II).

Specifying a sample of personnel to be tested also had its challenges. Ideally, one would want to be able to test all incumbents in a job or, if that proved impossible, a representative sample of the total population. To be avoided at all costs is having a manager provide whoever he or she chooses to make available. In some Services, the central records system was not current enough to permit drawing up a list of those to be tested prior to arrival on location. By and large, however, the Services were able to avoid the worst pitfalls of availability sampling, even if a good deal of creativity was required to overcome logistical problems such as gathering a sample from among widely scattered ships subject to impulsive departures. Chapter 5 provides further detail on these obstacles.

From the point of view of applied measurement, the sampling issues of greatest interest surround the selection of tasks to be tested. If performance test scores are to be a meaningful indicator of performance on the job, then the test must be representative of the job. For policy makers and managers, it is probably enough that a performance measure should “look like” the job. But providing a scientifically supportable basis for extrapolating from performance on a subset of tasks to performance on the job as a whole—whether by judgment-based or empirical means—requires much more than surface similarity.

There are two schools of thought on selecting tasks: one adheres to purposive sampling, by which job experts choose the tasks to be included, while the other calls for random sampling. It was the committee's position that, all things considered, a stratified random sampling approach would put the project on stronger ground scientifically; it is the only approach that allows one to make, with known margins of error, statements that can be generalized to the entire domain of tasks in a job.
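
A stratified random draw of tasks can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the three strata, their sizes, the proportional allocation rule, and the function name `stratified_task_sample` are all hypothetical, and the text does not say how strata would actually have been defined.

```python
import random


def stratified_task_sample(tasks_by_stratum, total, seed=None):
    """Draw about `total` tasks, allocating draws in proportion to stratum size.

    tasks_by_stratum maps a stratum label to a list of task names.
    Every stratum gets at least one draw; because of rounding, the
    number of tasks returned may differ slightly from `total`.
    """
    rng = random.Random(seed)
    n_all = sum(len(tasks) for tasks in tasks_by_stratum.values())
    sample = {}
    for stratum, tasks in tasks_by_stratum.items():
        k = max(1, round(total * len(tasks) / n_all))
        k = min(k, len(tasks))
        sample[stratum] = rng.sample(tasks, k)  # without replacement
    return sample


# Hypothetical task domain of 300 tasks in three invented strata.
domain = {
    "maintenance": [f"maintenance-{i}" for i in range(160)],
    "operation": [f"operation-{i}" for i in range(100)],
    "administration": [f"administration-{i}" for i in range(40)],
}
picked = stratified_task_sample(domain, total=15, seed=1)
print({stratum: len(tasks) for stratum, tasks in picked.items()})
```

Because every task in a stratum has the same known chance of selection, sample results can be projected back to the whole task domain with a calculable margin of error, which an expert-chosen task set does not allow.
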
But purposive sampling was far more prevalent in the JPM Project, as it has been in private-sector test development. The reasons for this are certainly worth consideration. It was feared that using random sampling might create a test that omitted essential job elements. Perhaps even more compelling to the research teams was the belief that only purposive sampling could guarantee an instrument that policy makers would accept because it looks like the job. Chapter 7 presents the arguments of each school of thought on task sampling as well as the committee's suggestions for a possible rapprochement between the two.

Test Administration

The JPM Project is the best reservoir of experience currently available on administering performance-based measures in a way that permits comparisons across test takers, and as such should be of considerable value in the discussions of authentic assessment for education. As Chapter 5 makes
clear, any sort of job sample test requires knowledgeable and dispassionate raters to score the performance. Considerable effort in training the raters will pay off. Particular care is needed to get raters to accept their role as passive participants; the tendency of former supervisors is to correct the errors that the test takers make and show them how to do the task correctly. Although this is exemplary behavior for a supervisor—or a schoolteacher—it is inappropriate for a test administrator if one wants to make comparisons.

The logistics of test administration similarly need careful attention with the performance-based methodologies. Early on in the project there were serious problems with moving people through the test stations efficiently and ensuring that each examinee worked independently. By the time the Marine Corps began assessing infantry riflemen, elaborate protocols had become the norm. The test site had seven stations containing tasks to be administered indoors and seven to be administered in the field. Each station was isolated from the others to the extent that the examinees could not see or hear what was occurring at other stations. The number of men to be tested was equal to the number of stations. A balanced randomized block was used to schedule the tasks, and each man had a list of the stations he was to attend, in the order in which he was to attend them. Each station was set up to require no more than 30 minutes to complete the tasks located there, and the time schedule was rigidly adhered to. Not only did the examinees rotate through the test stations, but from day to day the examiners rotated through the stations, so they would not get stale or bored with continually testing the same tasks. Maintaining rater alertness is as important as isolating the examinees.
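
One simple way to produce a rotation of this kind is a randomized cyclic Latin square, sketched below in Python. This is an assumed construction chosen for illustration, not the scheduling procedure the Marine Corps actually used; the function name `rotation_schedule` and the station and examinee labels are hypothetical.

```python
import random


def rotation_schedule(examinees, stations, seed=None):
    """Return {examinee: [station in period 0, period 1, ...]}.

    Built from a cyclic Latin square with shuffled station labels and
    shuffled examinee order, so that in every period each station hosts
    exactly one examinee and every examinee visits every station once.
    """
    if len(examinees) != len(stations):
        raise ValueError("need as many examinees as stations")
    rng = random.Random(seed)
    n = len(stations)
    order = list(stations)
    rng.shuffle(order)
    people = list(examinees)
    rng.shuffle(people)
    return {
        person: [order[(row + period) % n] for period in range(n)]
        for row, person in enumerate(people)
    }


# Seven stations and seven examinees, as in one half of the test site.
stations = [f"station-{i}" for i in range(1, 8)]
men = [f"examinee-{i}" for i in range(1, 8)]
schedule = rotation_schedule(men, stations, seed=0)
for man in men:
    print(man, schedule[man])
```

A cyclic square always follows a given station with the same next station, so it keeps stations fully occupied but does not balance order effects. A design described as balanced may well have controlled sequence more carefully than this sketch does.
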

EVALUATION: THE QUALITY OF THE PERFORMANCE MEASURES

Because hands-on tests have been so little tried in the past, it was not clear that the assessment method would produce measurements of sufficient stability and relevance to be meaningful psychometrically. Moreover, the hands-on format called for creative twists on standard procedure in reliability and validity analysis.

Reliability

When raters are part of a measurement system, the relative contributions of raters and tasks to measurement error should be assessed. Chapter 6 presents the committee's view that the best procedure for doing this involves generalizability theory (Cronbach et al., 1972; Shavelson, Vol. II). Because generalizability theory is an extension of classical test theory in an
analysis of variance framework, more elaborate measurement (cf. “experimental”) designs can also be studied. Specifically, effects of testing conditions (e.g., administrators, locations, tasks, raters, and the like) can be estimated simultaneously with major sources of error being pinpointed.

As reported in Chapter 6, two of the Services used this technique and, surprisingly, found that there was virtually no effect of rater. Raters are commonly found to have a large effect in more subjective settings. Since extensive observations of the rating process did not uncover any large amount of collusion between raters, we infer that the design of the scoring procedures for the hands-on tests was clear and objectively based, with little room for individual subjective variation in judgment. This is an important achievement for the JPM Project; we commend the procedures used to construct the hands-on test scoring forms to others for study and emulation.

By contrast, tasks turned out to be large contributors to measurement error. There is no question that the examinees differed in what they could do on the hands-on test, but how much of that was due to the exigencies of the particular set of 15 or so tasks that made up the test and how much to differences in individual abilities is unknown. The implication is that more tasks are needed to get a clearer picture of the stable performance differences among the Service personnel in the study. This result was not unanticipated. The feasibility of testing enough tasks to achieve stable results in the work-sample mode has been a concern all along. Many committee members were surprised that the hands-on tests turned out to be as reliable as they are.

Validity

The JPM Project provided ample evidence that ASVAB scores are related to job proficiency.
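
In criterion-related validation, a relationship of this kind is typically summarized as a correlation between predictor scores and a criterion measure. The Python sketch below computes a Pearson correlation for six invented incumbents. The scores are hypothetical and are not JPM data, and the sketch ignores refinements such as corrections for range restriction.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented scores for six incumbents; not JPM data.
entrance_test = [42, 55, 61, 48, 70, 66]           # predictor, taken at entry
hands_on = [0.58, 0.60, 0.71, 0.52, 0.69, 0.75]    # criterion, measured on the job

validity = pearson_r(entrance_test, hands_on)
print(round(validity, 2))
```

The same function applies to any predictor-criterion pair, so the correlations of one entrance composite with a hands-on score and with a job knowledge score can be compared directly.
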
Chapter 8 explains the reasoning behind criterion-related validity studies and describes in abundant detail the relationships among the predictor composites and the various criterion measures. A number of interesting trends emerge from the data. For example, the Armed Forces Qualification Test, an ASVAB composite that measures general ability, can predict performance in all of the jobs studied, but better prediction can be achieved in most jobs by using different combinations of test scores from the battery. Another general trend, one that had been anticipated, was the somewhat higher correlation between the entrance tests and the job knowledge performance measure than between the entrance tests and hands-on measures. To some extent this represents a method effect: both the entrance tests and the job knowledge test are multiple-choice paper-and-pencil tests. Finally, it is worth noting that the degree of relationship between the ASVAB and the various criterion measures, while large enough to justify the entrance test's utility in military selection and classification, is modest enough to encourage a search for additional predictors to supplement the ASVAB.

Fairness Analysis

The analysis of how tests function for various population subgroups has been common practice since the 1970s because of concerns over observed group differences in average test scores. Equal employment opportunity laws established a governmental interest in making sure that these score differences are related to real performance differences and not to some artifact of the test. Subgroup differences were expected and were observed in the JPM data, although the data are thin and should be interpreted with caution. Blacks scored lower than nonminorities on both selection tests and job performance measures. An interesting point for policy purposes, however, is that the magnitude of the average score differences between blacks and nonminorities is much larger on the AFQT (−.85 of a standard deviation) and on the job knowledge performance test (−.78 of a standard deviation), both of which are paper-and-pencil tests, than on the hands-on test (−.36 of a standard deviation). In other words, the two groups look much more similar on the hands-on tests. To the extent that one has confidence in the hands-on criterion as a good measure of performance on the job, these findings, reported in Chapter 8, suggest that scores on the AFQT exaggerate the size of the difference that will ultimately be found in the job performance of the two groups.

THE FINAL STEP: LINKING ENLISTMENT STANDARDS TO JOB PERFORMANCE

Selecting personnel who will turn out to be successful on the job, particularly from a youthful and inexperienced applicant population, is a complicated business.
Phase I of the JPM Project has demonstrated that reasonably high-quality measures of job performance can be developed, and that the relationships between these measures and the Armed Services Vocational Aptitude Battery are strong enough to justify its use in setting enlistment standards. But the human resource management problem is not solved by showing that recruits who score well on the ASVAB tend to score well on hands-on performance measures. High-quality personnel cost more to recruit, and the public purse is not bottomless. In order to make reasonable budgetary decisions, Congress needs to be able to balance performance gains attributable to selecting those with better-than-average scores on the ASVAB against the costs of recruiting, training, and retaining high-quality personnel. And to improve their control over performance in the enlisted ranks, DoD and the Services need to be able to make more empirically grounded projections of their personnel quality requirements. The critical policy question is: How much quality is enough?

The second phase of the JPM Project is concentrating on the development of analytical tools that will illuminate for policy makers the effects of alternative enlistment standards on performance and costs. The research is still under way and will have to be reported elsewhere. Chapter 9 of this report lays out the general outlines of the problem of developing cost-performance trade-off models and discusses the strengths and weaknesses of the JPM data for making performance an operative element in such models.

Although the problems are complex and there is still room for improvement at every stage of the research and development, the results of the JPM Project to date indicate that the concept of linking selection standards to objective measures of job performance is basically sound. It appears that it will be feasible for human resource planners and policy makers to incorporate empirical data derived from job performance into the decision process in a systematic way.

The development of cost-performance models for setting enlistment standards has great potential relevance for accession policy. Until now, the standards-setting process has been largely based on an informal process of individual judgments and negotiations among the stakeholders. The manpower management models used by military planners for other purposes have simply assumed an appropriate enlistment standard or have used surrogates at quite some remove from job performance.
If the JPM performance data can be successfully incorporated into trade-off models, the models will offer policy officials useful tools for estimating the probable effects on performance and/or costs of various scenarios (say, a 10 percent reduction in recruiting budgets, a 20 percent reduction in force, or a downturn in the economy). Although the solutions provided by such models are not intended to and will not supplant the overarching judgment that policy officials must bring to bear, they can challenge conventional assumptions and inject a solid core of empirical evidence into the decision process.

The full implications of the job performance measurement research for military policy makers, and for civilian sector employers, remain to be worked out in coming years. The JPM Project has produced a rich body of data and a wealth of methodological insights and advances. And, as important research efforts so frequently do, it has defined the challenges for the next generation of research on performance assessment.
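
The reliability finding discussed earlier, that tasks rather than raters were the large contributors to measurement error, can be made concrete with a small calculation. What follows is only a minimal sketch and not the analysis reported in Chapter 6. It assumes a simple persons-by-tasks design, and the function name and the two variance components are hypothetical values chosen purely for illustration. Under those assumptions, it shows how a projected generalizability coefficient would rise as more tasks are averaged into the hands-on score.

```python
def projected_generalizability(var_person, var_residual, n_tasks):
    """Projected relative generalizability coefficient, persons x tasks design.

    var_person   -- variance due to stable differences among examinees
    var_residual -- person-by-task interaction, confounded with residual error
    n_tasks      -- number of tasks averaged to form each examinee's score
    """
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    # Averaging over tasks shrinks the error variance in proportion to n_tasks.
    relative_error = var_residual / n_tasks
    return var_person / (var_person + relative_error)


# Hypothetical variance components (assumptions, not JPM Project estimates).
VAR_PERSON = 1.0
VAR_RESIDUAL = 10.0

for n in (5, 15, 30, 60):
    coef = projected_generalizability(VAR_PERSON, VAR_RESIDUAL, n)
    print(f"{n:>2} tasks: projected coefficient = {coef:.2f}")
```

With these made-up components, a 15-task test yields a coefficient of 0.60, and doubling the number of tasks raises it to 0.75. The gain from each additional task diminishes, which is one way to see why the feasibility of testing enough tasks in the work-sample mode was a concern.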
