Suggested Citation:"10 Summary and Discussion." National Research Council. 1982. An Assessment of Research-Doctorate Programs in the United States: Social and Behavioral Sciences. Washington, DC: The National Academies Press. doi: 10.17226/9781.

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

X Summary and Discussion

Results of the assessment of 639 research-doctorate programs in anthropology, economics, geography, history, political science, psychology, and sociology are presented in the preceding seven chapters. Included in each chapter are summary data describing the means and intercorrelations of the program measures for each discipline. In this chapter a comparison is made of the summary data reported in the seven disciplines. Also presented here are an analysis of the reliability (consistency) of the reputational survey ratings and an examination of some factors that might possibly have influenced the survey results. This chapter concludes with suggestions for improving studies of this kind--with particular attention given to the types of measures one would like to have available for an assessment of research-doctorate programs.

This chapter necessarily involves a detailed discussion of various statistics (means, standard deviations, correlation coefficients) describing the measures. Throughout, the reader should bear in mind that all these statistics and measures are necessarily imperfect attempts to describe the real quality of research-doctorate programs. Quality and some differences in quality are real, but these differences cannot be subsumed completely under any one quantitative measure. For example, no single numerical ranking--by measure 08 or by any weighted average of measures--can rank the quality of different programs with precision.

However, the evidence for reliability indicates considerable stability in the assessment of quality. For instance, a program that comes out in the first decile of a ranking is quite unlikely to "really" belong in the third decile, or vice versa. If numerical ranks of programs were replaced by groupings (distinguished, strong, etc.), these groupings again would not fully capture actual differences in quality since there would likely be substantial ambiguity about the borderline between adjacent groups. Furthermore, any attempt at linear ordering (best, next best, . . .) may also be inaccurate. Programs of roughly comparable quality may be better in different ways, so that there simply is no one best--as will also be indicated in some of the numerical analyses. However, these difficulties of formulating ranks should not hide the underlying reality of differences in quality or the importance of high quality for effective doctoral education.

SUMMARY OF THE RESULTS

Displayed in Table 10.1 are the numbers of programs evaluated (bottom line) and the mean values for each measure in the seven social and behavioral science disciplines.¹ As can be seen, the mean values reported for individual measures vary considerably among disciplines. The pattern of means on each measure is summarized below, but the reader interested in a detailed comparison of the distribution of a measure may wish to refer to the second table in each of the seven preceding chapters.²

¹As noted in Chapter II, for programs in history, data are not presented for measures 13 and 14; in anthropology and geography, data are not available for measure 14.
²The second table in each of the seven preceding chapters presents the standard deviation and decile values for each measure.

TABLE 10.1 Mean Values for Each Program Measure, by Discipline

                      Anthro-                                     Political  Psych-     Soci-
                      pology     Economics  Geography  History    Science    ology      ology
Program Size
  01                  17         23         13         28         23         29         21
  02                  28         41         16         38         35         71         33
  03                  51         68         22         51         50         102        49
Program Graduates
  04                  .48        .21        .26        .26        .28        .39        .38
  05                  8.3        7.3        8.7        9.2        8.3        6.2        8.2
  06                  .60        .78        .72        .56        .68        .69        .75
  07                  .28        .26        .28        .16        .26        .24        .33
Survey Results
  08                  2.8        2.3        2.8        2.6        2.6        2.5        2.5
  09                  1.6        1.3        1.6        1.6        1.5        1.6        1.5
  10                  1.0        1.1        1.0        1.1        1.1        1.1        1.0
  11                  1.1        .9         1.2        .9         1.0        .7         1.0
University Library
  12                  .4         .1         .4         .2         .2         .1         .2
Research Support
  13                  .22        .11        .14        NA         .06        .21        .12
  14                  NA         832        NA         NA         520        1003       790
Publication Records
  17                  30         52         17         43         43         81         52
  18                  .61        .63        .51        .58        .63        .66        .66
Total Programs        70         93         49         102        83         150        92

Program Size (Measures 01-03). Based on the information provided to the committee by the study coordinator at each university, psychology programs had, on the average, the largest number of faculty members (29 in December 1980), followed by history (28). Psychology programs graduated the most students (71 Ph.D. recipients in the FY1975-79 period) and had the largest enrollment (102 doctoral students in December 1980). In contrast, geography programs were reported to have an average of only 13 faculty members, 16 graduates, and 22 doctoral students.

Program Graduates (Measures 04-07). The mean fraction of FY1975-79 doctoral recipients who as graduate students had received some national fellowship or training grant support (measure 04) ranges from .21 for graduates of economics programs to .48 for graduates in anthropology. With respect to the median number of years from first enrollment in a graduate program to receipt of the doctorate (measure 05), psychology graduates typically earned their degrees more than a year sooner than graduates in any other discipline. Graduates in geography and history reported the longest median times to the Ph.D. In terms of employment status at graduation (measure 06), an average of 78 percent of the Ph.D. recipients from economics programs reported that they had made firm job commitments by the time they had completed requirements for their degree, contrasted with 56 percent of the program graduates in history. A mean of 33 percent of the sociology graduates indicated that they had made firm commitments to take positions in Ph.D.-granting institutions (measure 07), while only 16 percent of those in history had made such plans. The low averages in history for measures 06 and 07 may be due, in part, to an apparent shortage of faculty openings in this discipline in recent years.

Survey Results (Measures 08-11). Differences in the mean ratings derived from the reputational survey are not large. The mean rating of scholarly quality of program faculty (measure 08) ranges from 2.3 in economics to 2.8 in anthropology and geography, and programs were judged to be, on the average, between "reasonably" (2.0) and "minimally" (1.0) effective in educating research scholars/scientists (measure 09). In the opinions of the survey respondents, there has been "little or no change" (approximately 1.0 on measure 10) in the last five years in the overall average quality of programs. The mean rating of an evaluator's familiarity with the work of program faculty (measure 11) is close to 1.0 ("some familiarity") in every discipline except psychology (0.7)--about which more will be said later in this chapter.

The reader should be reminded that the distribution of ratings may vary from one discipline to another. If one examines, for example, the top program ratings recorded for measure 08 in each discipline, one finds noticeably higher top ratings in economics (five programs with ratings above 4.7) and history (three programs with ratings above 4.7) than in either anthropology or geography (no programs with ratings above 4.6). The study committee does not have an explanation of this observation but wishes to emphasize that many differences may be found in the distributions of survey ratings in the various disciplines and that the determinants of these differences are not known. As discussed in Chapter II, the survey ratings reflect each program's standing relative to other programs in the same discipline and provide no basis for making comparisons across disciplines.

University Library (Measure 12). Measure 12, based on a composite index³ of the size of the library in the university in which a program resides, is calculated on a scale from -2.0 to 3.0, with means ranging from .1 in economics and psychology to .4 in anthropology and geography. These differences may be explained, in large part, by the number of programs evaluated in each discipline.
In the disciplines with the fewest doctoral programs (anthropology and geography), the programs included are typically found in the larger institutions, which are likely to have high scores on the library size index. Ph.D. programs in economics and psychology are found in a much broader spectrum of universities that includes the smaller institutions as well as the larger ones.

³The index, derived by the Association of Research Libraries, reflects a number of different measures, including number of volumes, fiscal expenditures, and other factors relevant to the size of a university library. See the description of this measure presented in Appendix D.

Research Support (Measures 13-14). Measure 13, the proportion of program faculty who had received ADAMHA, NIH, or NSF⁴ research grant awards during the FY1978-80 period, has mean values ranging from .22 and .21 in anthropology and psychology, respectively, to .06 in political science. It should be emphasized that this measure does not take into account research support that faculty members have received from sources other than these three federal agencies. In terms of total university expenditures for R&D in a particular discipline (measure 14), the mean values are reported to range from $520,000 in political science to $1,003,000 in psychology. (As noted earlier, data are available for programs in only four of the seven disciplines.) The large differences in reported expenditures are likely to be related to three factors: the differential availability of research support in each of the disciplines, the differential average cost of doing research, and the differing numbers of individuals involved in the research effort.

⁴Alcohol, Drug Abuse, and Mental Health Administration; National Institutes of Health; and National Science Foundation.

Publication Records (Measures 17 and 18). Considerable diversity is found in the mean number of articles by program faculty (measure 17).⁵ An average of 81 articles published in the 1978-80 period have been attributed to program faculty members in psychology, contrasted with 17 articles by geography program faculty. These large differences reflect both the average faculty size in a particular discipline and the frequency with which scientists in that discipline publish; they may also depend on the length of a typical paper in a discipline. With respect to measure 18, the fraction of faculty who had published at least one article during this three-year period, the differences among the means in the seven disciplines are much smaller. The largest fractions are found in psychology and sociology (.66) and the smallest in geography (.51).

CORRELATIONS AMONG MEASURES

Relations among the program measures are of intrinsic interest and are relevant to the issue of validity of the measures as indices of the quality of a research-doctorate program. Measures that are logically related to program quality are expected to be related to each other. To the extent that they are, a stronger case might be made for the validity of each as a quality measure.
A reasonable index of the relationship between any two measures is the Pearson product-moment correlation coefficient. A table of correlation coefficients of all possible pairs of measures is presented in each of the seven preceding chapters. This chapter presents selected correlations to determine the extent to which coefficients are comparable in the seven disciplines. Special attention is given to the correlations involving the number of FY1975-79 program graduates (measure 02), survey rating of the scholarly quality of program faculty (measure 08), university R&D expenditures in a particular discipline (measure 14), and the total number of faculty articles (measure 17). These four measures have been selected because of their relatively high correlations with several other measures. Readers interested in correlations other than those presented in Tables 10.2-10.5 may refer to the third table in each of the preceding chapters.

⁵See Appendix J for two alternative measures of publication records that have been compiled for programs in psychology.

Correlations with Measure 02. Table 10.2 presents the correlations of measure 02 with each of the other measures used in the assessment. As might be expected, correlations of this measure with the other two measures of program size--number of faculty and doctoral student enrollment--are reasonably high in all seven disciplines. Of greater interest are the strong positive correlations between measure 02 and measures derived from either reputational survey ratings or publication records. The coefficients describing the relationship of measure 02 with measure 17 are greater than .60 in anthropology, economics, history, and sociology and approximately .50 in the other three disciplines. The correlations with measure 18, the fraction of faculty with one or more articles published during the 1978-80 period, are much smaller. This result is not surprising, of course, since measure 17 reflects the total number of articles by program faculty, while measure 18 reflects the fraction of faculty members who publish (and is not size dependent). The correlations of measure 02 with measures 08, 09, and 11 are also moderately high--.56 or greater in all disciplines except psychology. It is quite apparent that the programs that received high survey ratings and with which evaluators were more likely to be familiar were also ones that had larger numbers of graduates. The weaker relationship in psychology may be explained, in part, by the fact that some of the programs have produced large numbers of graduates in clinical areas of psychology and may not have distinguished reputations in research.
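As a rough illustration of the coefficient reported throughout Tables 10.2-10.5, the following Python sketch computes a Pearson product-moment correlation from first principles. The function name `pearson_r` and the five program values are hypothetical and are not data from this study; they merely stand in for measure 02 (number of graduates) and measure 08 (rated scholarly quality of faculty).

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations about the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values for five programs (NOT data from the study).
graduates = [12, 25, 40, 33, 60]        # stand-in for measure 02
ratings = [1.8, 2.4, 3.1, 2.9, 4.0]     # stand-in for measure 08

r = pearson_r(graduates, ratings)
print(round(r, 2))
```

A coefficient near 1 would indicate that programs with more graduates also tend to receive higher ratings; by itself it says nothing about why the two measures move together.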
Although the committee gave serious consideration to presenting an alternative set of survey measures that were adjusted for program size, a satisfactory algorithm for making such an adjustment was not found. In attempting such an adjustment on the basis of the regression of survey ratings on measures of program size, it was found that some exceptionally large programs appeared to be unfairly penalized and that some very small programs received unjustifiably high adjusted scores.

Measure 02 also has positive correlations in most disciplines with measure 12, an index of university library size, and with measures 13 and 14, which pertain to the level of support for research in a program. Of particular note are the moderately large coefficients in economics for all three of these measures. The correlations of measure 02 with measures 04, 05, 06, and 07 are smaller but still positive in most of the disciplines. From this analysis it is apparent that the number of program graduates tends to be positively correlated with all of the other 15 variables and that the relationship of measure 02 with the other variables tends to be weakest for programs in psychology.

TABLE 10.2 Correlations of the Number of Program Graduates (Measure 02) with Other Measures, by Discipline

                      Anthro-                                     Political  Psych-     Soci-
                      pology     Economics  Geography  History    Science    ology      ology
Program Size
  01                  .69        .61        .48        .77        .56        .65        .55
  03                  .68        .63        .52        .83        .82        .81        .68
Program Graduates
  04                  .23        .36        .09        .34        .23        .10        .31
  05                  .10        .29        .19        .07        .03        -.1~       .18
  06                  .43        .32        .13        .09        .06        -.06       .19
  07                  .35        .33        .30        .43        .20        -.06       .32
Survey Results
  08                  .71        .75        .60        .74        .60        .31        .72
  09                  .68        .74        .68        .72        .56        .23        .73
  10                  -.15       .00        .00        .02        -.04       -.04       -.03
  11                  .67        .71        .57        .77        .58        .39        .68
University Library
  12                  .68        .57        .30        .73        .66        .36        .54
Research Support
  13                  .39        .54        .42        N/A        .24        -.04       .46
  14                  N/A        .52        N/A        N/A        .43        .24        .38
Publication Records
  17                  .70        .76        .46        .82        .50        .49        .63
  18                  .24        .37        .22        .35        .18        -.01       .25

Correlations with Measure 08. Table 10.3 shows the correlation coefficients for measure 08, the mean rating of the scholarly quality of program faculty, with each of the other variables. The correlations of measure 08 with measures of program size (01, 02, and 03) are .40 or greater for all disciplines except psychology. Not surprisingly, the larger the program, the more likely its faculty is to be rated high in quality. This relationship is especially strong in anthropology, economics, history, and sociology.

TABLE 10.3 Correlations of the Survey Ratings of Scholarly Quality of Program Faculty (Measure 08) with Other Measures, by Discipline

                      Anthro-                                     Political  Psych-     Soci-
                      pology     Economics  Geography  History    Science    ology      ology
Program Size
  01                  .83        .61        .46        .69        .63        .57        .62
  02                  .71        .75        .60        .74        .60        .31        .72
  03                  .65        .56        .42        .66        .47        .20        .60
Program Graduates
  04                  .49        .42        .36        .63        .64        .64        .51
  05                  .34        .36        .16        .19        .10        .13        .29
  06                  .40        .31        .36        .05        .30        .24        .15
  07                  .50        .48        .51        .54        .52        .74        .47
Survey Results
  09                  .96        .98        .98        .98        .98        .97        .98
  10                  .21        .35        .19        .24        .13        .05        .33
  11                  .95        .97        .94        .97        .98        .97        .97
University Library
  12                  .64        .67        .52        .71        .74        .73        .75
Research Support
  13                  .46        .76        .52        N/A        .40        .75        .63
  14                  N/A        .44        N/A        N/A        .43        .49        .30
Publication Records
  17                  .75        .78        .78        .79        .71        .74        .80
  18                  .26        .47        .59        .53        .44        .57        .49

Correlations of measure 08 with measure 04, the fraction of students with national fellowship awards, are greater than .60 in history, political science, and psychology and range between .36 and .51 in the other four disciplines. In contrast, for programs in the physical sciences and engineering, the corresponding coefficients (reported in earlier volumes) are considerably smaller. The correlation of rated faculty quality with measure 05, the shortness of time from matriculation in graduate school to award of the doctorate, is positive but small in each of the social and behavioral science disciplines. Correlations of ratings of faculty quality with measure 06, the fraction of program graduates with definite employment plans, are also small but positive in most of the disciplines. In every discipline the correlation of measure 08 is higher with measure 07, the fraction of graduates having agreed to employment at a Ph.D.-granting institution. These coefficients are greater than .70 in psychology and range between .47 and .54 in the other six disciplines.

The correlations of measure 08 with measure 09, the rated effectiveness of doctoral education, are uniformly very high, at or above .96 in every discipline. This finding is consistent with results from the Cartter and Roose-Andersen studies.⁶ The coefficients describing the relationship between measure 08 and measure 11, familiarity with the work of program faculty, are also very high, ranging from .94 to .98. In general, evaluators were more likely to have high regard for the quality of faculty in those programs with which they were most familiar. That the correlation coefficients are as large as observed may simply reflect the fact that "known" programs tend to be those that have earned strong reputations.
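The size adjustment that the committee considered earlier in this section, regressing survey ratings on a measure of program size and treating what is left over as the adjusted score, might be sketched as follows. The chapter does not describe the committee's algorithm, so the single-predictor least-squares fit, the function names, and the program values below are all assumptions made purely for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares slope and intercept for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def size_adjusted_ratings(sizes, ratings):
    """Residual of each rating after removing the fitted linear effect of size."""
    slope, intercept = fit_line(sizes, ratings)
    return [r - (intercept + slope * s) for s, r in zip(sizes, ratings)]


# Hypothetical programs (NOT data from the study): faculty size and mean rating.
sizes = [8, 15, 22, 30, 60]
ratings = [2.6, 2.4, 3.0, 3.4, 4.2]

adjusted = size_adjusted_ratings(sizes, ratings)
for size, rating, adj in zip(sizes, ratings, adjusted):
    print(f"size={size:3d}  rating={rating:.1f}  adjusted={adj:+.2f}")
```

In this made-up example the largest program has the highest raw rating but a slightly negative adjusted score, while the smallest program's adjusted score is positive. That pattern resembles the difficulty the committee reports, although the sketch is not claimed to reproduce the committee's actual procedure.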
Correlations of ratings of faculty quality with measure 10, the ratings of perceived improvement in program quality, are below .25 in all disciplines except economics and sociology. One might have expected that a program judged to have improved in quality would have been somewhat more likely to receive high ratings on measure 08 than would a program judged to have declined--thereby imposing a small positive correlation between these two variables.

Correlations ranging from .52 to .75 are observed between measure 08 and measure 12 (university library size). Moderate to high correlations also are found between measure 08 and support for research (measures 13 and 14) and publication records (measures 17 and 18). Of particular note are the strong correlations with measure 17, the total number of published articles by program faculty--ranging from .71 to .80.⁷ In all disciplines the correlations with measure 17 are appreciably higher than those with measure 18, the fraction of faculty with one or more articles published during the 1978-80 period.

⁶Roose and Andersen, p. 19.
⁷See Appendix J for the correlations of measure 08 with measures 15 and 16 (alternative measures of publication records) in psychology. These coefficients are nearly as high as those found between measures 08 and 17.

Correlations with Measure 14. Correlations of measure 14, reported dollars of support for research and development, with other measures are shown in Table 10.4. (Data on research expenditures in anthropology, geography, and history are not available.) The pattern of relations is quite similar for programs in economics, political science, psychology, and sociology: moderately high correlations with both measures of program size and reputational survey results (except measure 10) and somewhat higher correlations with measure 17, the total number of faculty articles. In interpreting these relationships one must keep in mind the fact that the research expenditure data have not been adjusted for the number of faculty and other staff members involved in research in a program. The correlation with measure 13, which has been adjusted for faculty size, ranges from .28 to .38.

Correlations with Measure 17. Measure 17 is the number of published articles by program faculty during the 1978-80 period. The correlations of this measure with all others appear in Table 10.5. Of particular interest are the high correlations with the reputational survey results (excluding measure 10). Most of those coefficients exceed .70. Measure 17 is positively related to the measures of program size (01, 02, and 03); moderately high correlations are also observed between measure 17 and measures 12 and 14. Also of note are the correlations with measure 07, the fraction of graduates with commitments to take positions in Ph.D.-granting universities. These coefficients range from .34 (in anthropology) to .47 (in sociology).
For psychology programs, data have also been compiled on two alternative measures of publication records--measure 15, the total number of 1978-79 articles attributed to faculty and other program staff, and measure 16, the estimated "overall influence" of these articles. The relationship of these two measures with each of the other measures used in the evaluation of psychology programs is reported in Appendix J. Of particular interest is the correlation of measure 15 with measure 17 since these measures were derived from different sources (see Appendix J) and represent independent estimates of total publication productivity for a program. The coefficient describing the relation of these two measures is as high as .78.

TABLE 10.4 Correlations of the University Research Expenditures in a Discipline (Measure 14) with Other Measures, by Discipline

                      Anthro-                                     Political  Psych-     Soci-
                      pology     Economics  Geography  History    Science    ology      ology
Program Size
  01                  N/A        .49        N/A        N/A        .55        .35        .37
  02                  N/A        .52        N/A        N/A        .43        .24        .38
  03                  N/A        .45        N/A        N/A        .27        .11        .30
Program Graduates
  04                  N/A        .11        N/A        N/A        .32        .29        .31
  05                  N/A        .25        N/A        N/A        -.12       .05        .10
  06                  N/A        .18        N/A        N/A        .11        .26-       .15
  07                  N/A        .27        N/A        N/A        .20        .31        .22
Survey Results
  08                  N/A        .44        N/A        N/A        .43        .49        .30
  09                  N/A        .44        N/A        N/A        .39        .53        .37
  10                  N/A        .05        N/A        N/A        .06        -.03       -.18
  11                  N/A        .38        N/A        N/A        .41        .47        .29
University Library
  12                  N/A        .41        N/A        N/A        .40        .45        .25
Research Support
  13                  N/A        .29        N/A        N/A        .35        .28        .38
Publication Records
  17                  N/A        .54        N/A        N/A        .59        .53        .45
  18                  N/A        .31        N/A        N/A        .04        .29        .11

TABLE 10.5 Correlations of the Total Number of Articles Published by Faculty (Measure 17) with Other Measures, by Discipline

                      Anthro-                                     Political  Psych-     Soci-
                      pology     Economics  Geography  History    Science    ology      ology
Program Size
  01                  .82        .79        .56        .86        .78        .76        .69
  02                  .70        .76        .46        .82        .50        .49        .63
  03                  .53        .47        .33        .75        .42        .34        .47
Program Graduates
  04                  .14        .32        .23        .39        .34        .35        .33
  05                  .26        .32        .06        .08        -.04       .12        .36
  06                  .30        .33        .37        .08        .29        .09        .21
  07                  .34        .38        .45        .39        .46        .43        .47
Survey Results
  08                  .75        .78        .78        .79        .71        .74        .80
  09                  .72        .75        .77        .77        .70        .71        .82
  10                  .19        .26        .45        .21        .17        .15        .37
  11                  .71        .72        .73        .79        .69        .77        .77
University Library
  12                  [values illegible in source]
Research Support
  13                  .36        .60        .42        N/A        .26        .42        .54
  14                  N/A        .54        N/A        N/A        .59        .53        .45
Publication Records
  18                  .39        .54        .67        .60        .59        .54        .63

Despite the appreciable correlations between reputational ratings of quality and program size measures, the functional relations between the two probably are complex. If there is a minimum size for a high-quality program, this size is likely to vary from discipline to discipline. Increases in size beyond the minimum may represent more high-quality faculty, or a greater proportion of inactive faculty, or a faculty with heavy teaching responsibilities. In attempting to select among these alternative interpretations, a single correlation coefficient provides insufficient guidance. Nonetheless, certain similarities across disciplines may be seen in the correlations among the measures. High correlations consistently appear among measures 08, 09, and 11 from the reputational survey, and these measures also are prominently related to program size (measures 01, 02, and 03), to publication productivity (measure 17), to R&D expenditures (measure 14), and to library size (measure 12). These results show that for all disciplines the reputational rating measures (08, 09, and 11) tend to be associated with program size and with other correlates of size: publication volume, R&D expenditures, and library size. Also, for most disciplines, the reputational measures 08, 09, and 11 tend to be positively related to the availability of fellowship support (measure 04), to the employment prospects of program graduates (especially measure 07), to the fraction of faculty holding research grants (measure 13), and to the fraction who have recently published (measure 18).

ANALYSIS OF THE SURVEY RESPONSE

Measures 08-11, derived from the reputational survey, may be of particular interest to many readers since measures of this type have been the most widely used (and frequently criticized) indices of quality of graduate education. In designing the survey instrument for this assessment the committee made several changes in the form that had been used in the Roose-Andersen study. The modifications served two purposes: to provide the evaluators with a clearer understanding of the programs that they were asked to judge and to provide the committee with supplemental information for the analysis of the survey response. One change was to restrict to 50 the number of programs that any individual was asked to evaluate. Probably the most important change was the inclusion of lists of names and ranks of individual faculty members involved in the research-doctorate programs to be evaluated on the survey form, together with the number of doctoral degrees awarded in the previous five years.
Ninety percent of the evaluators were sent forms with faculty names and numbers of degrees awarded; the remaining 10 percent were given forms without this information, so that an analysis could be made of the effect of this modification on survey results. Another change was the addition of a question concerning an evaluator's familiarity with each of the programs. In addition to providing an index of program recognition (measure 11), the inclusion of this question permits a comparison between the ratings furnished by individuals who had considerable familiarity with a particular program and the ratings by those not as familiar with the program. Each evaluator was also asked to identify his or her own institution of highest degree and current field of specialization. This information enables us to compare, for each program, the ratings furnished by alumni of that institution with the ratings by other evaluators, as well as to examine differences in the ratings supplied by evaluators in certain specialty fields.

Before examining factors that may have influenced the survey results, some mention should be made of the distributions of responses to the four survey items and the reliability (consistency) of the ratings. For example, in judging the scholarly quality of faculty (measure 08), survey respondents in each discipline rated between 6 and 8 percent of the programs as being "distinguished" and between 2 and 7 percent as "not sufficient for doctoral education" (see Table 10.6).

TABLE 10.6 Distribution of Responses to Each Survey Item, by Discipline

                                                  Anthro-   Eco-      Geog-               Political Psych-    Soci-
Survey Measure                          Total     pology    nomics    raphy     History   Science   ology     ology

08 SCHOLARLY QUALITY OF PROGRAM FACULTY
  Distinguished                         7.1       6.5       8.0       7.4       7.9       7.1       6.5       6.7
  Strong                                15.8      18.3      12.0      19.8      16.6      16.2      14.5      16.8
  Good                                  21.3      27.6      17.9      27.4      20.5      25.7      16.6      21.1
  Adequate                              18.1      21.4      19.5      20.5      16.7      18.9      14.3      19.7
  Marginal                              9.6       8.5       13.4      8.6       8.3       10.5      7.3       11.2
  Not Sufficient for Doctoral Education 4.1       1.9       6.6       4.1       4.6       4.4       2.3       5.1
  Don't Know Well Enough to Evaluate    23.9      15.7      22.6      12.2      25.3      17.2      38.6      19.3
  TOTAL                                 100.0     100.0     100.0     100.0     100.0     100.0     100.0     100.0

09 EFFECTIVENESS OF PROGRAM IN EDUCATING SCIENTISTS
  Extremely Effective                   7.0       6.4       7.2       9.9       7.1       7.4       5.8       6.7
  Reasonably Effective                  27.1      33.0      22.2      40.2      25.8      28.5      22.9      27.0
  Minimally Effective                   16.0      17.5      17.9      21.9      15.6      18.5      10.8      15.9
  Not Effective                         4.7       3.3       6.8       6.1       3.5       6.5       2.7       5.7
  Don't Know Well Enough to Evaluate    45.3      39.8      45.8      22.0      48.0      39.1      57.9      44.7
  TOTAL                                 100.0     100.0     100.0     100.0     100.0     100.0     100.0     100.0

10 CHANGE IN PROGRAM QUALITY IN LAST FIVE YEARS
  Better                                11.8      12.6      13.8      16.5      10.1      12.9      8.4       12.5
  Little or No Change                   27.7      32.0      27.7      41.3      27.1      32.3      19.1      27.2
  Poorer                                8.5       11.3      6.1       17.2      7.4       9.3       5.5       9.3
  Don't Know Well Enough to Evaluate    52.0      44.2      52.4      25.1      55.5      45.6      67.0      51.0
  TOTAL                                 100.0     100.0     100.0     100.0     100.0     100.0     100.0     100.0

11 FAMILIARITY WITH WORK OF PROGRAM FACULTY
  Considerable                          23.8      27.3      23.2      33.7      23.7      25.2      16.7      26.1
  Some                                  42.8      49.6      41.0      48.2      43.8      47.3      35.6      43.5
  Little or None                        32.0      22.5      34.5      17.0      30.5      26.3      46.1      28.8
  No Response                           1.5       .7        1.3       1.2       2.0       1.2       1.7       1.7
  TOTAL                                 100.0     100.0     100.0     100.0     100.0     100.0     100.0     100.0

NOTE: For survey measures 08, 09, 10 the "don't know" category includes a small number of cases for which the respondents provided no response to the survey item.

189 In evaluating the effectiveness in educating research scholars/scien- tists, they rated 6-10 percent of the programs as being "extremely ef- fective" and approximately 3-7 percent as "not effective." Of particu- lar interest in this table are the frequencies with which evaluators failed to provide responses to measures 08, 09, and 10. Approximately 24 percent of the total number of evaluations requested for measure 08 were not furnished because survey respondents in the social and behav- ioral sciences felt that they were not familiar enough with a particu- lar program to evaluate it. In psychology, which had 150 programs in- cluded in the assessment, this percentage was nearly 39 percent; in geography, with 49 programs, it was 12 percent. The corresponding per- centages of "don't known responses for measures 09 and 10 are consider- ably larger--45 and 52 percent, respectively--suggesting that survey respondents found it more difficult (or were less willing) to judge program effectiveness and change than to judge the scholarly quality of program faculty. The large fractions of "don't know" responses are a matter of some concern. However, given the broad coverage of research-doctorate pro- grams, it is not surprising that faculty members would be unfamiliar with many of the less distinguished programs. As shown in Table 10.7, survey respondents in each discipline were much more likely to furnish evaluations for programs with high reputational standing than they were for programs of lesser distinction. 
TABLE 10.7 Survey Item Response Rates, by Discipline and Mean Rating on Measure 08

                             Anthro-     Eco-    Geog-         Political   Psych-    Soci-
Survey Measure        Total   pology   nomics    raphy  History  Science    ology    ology

08 SCHOLARLY QUALITY OF PROGRAM FACULTY
Mean Rating on Measure 08
4.0 or Higher          96.9     97.3     98.1     98.6     97.4     98.8     92.5     97.7
3.0 - 3.9              90.2     93.2     94.9     94.0     89.2     94.0     82.3     94.2
2.0 - 2.9              77.4     81.2     84.9     88.1     72.6     84.5     61.0     81.8
Less than 2.0          57.9     68.4     63.5     72.9     58.7     65.6     38.9     65.6

09 EFFECTIVENESS OF PROGRAM IN EDUCATING SCIENTISTS
Mean Rating on Measure 08
4.0 or Higher          86.3     91.9     89.6     96.1     84.2     90.1     75.5     86.1
3.0 - 3.9              70.0     71.9     74.9     88.4     66.1     74.7     60.6     70.4
2.0 - 2.9              53.2     54.2     57.7     76.3     47.3     57.6     39.8     52.5
Less than 2.0          35.7     39.2     37.7     59.0     36.2     44.0     22.6     38.7

10 CHANGE IN PROGRAM QUALITY IN LAST FIVE YEARS
Mean Rating on Measure 08
4.0 or Higher          76.7     83.4     82.0     91.1     73.6     82.1     59.1     78.8
2 0 - 2 9              68 5     68 4     50 0     84 3     59 3     67 3     47 7     64 8
Less than 2.0          28.0     34.3     29.5     55.2     27.0     34.3     16.6     29.9

For example, for social and behavioral science programs that received mean ratings of 4.0 or higher on measure 08, almost 97 percent of the evaluations requested on measure 08 were provided; 86 and 77 percent, respectively, were provided on measures 09 and 10. In contrast, the corresponding response rates for programs with mean ratings below 2.0 are much lower--58, 36, and 28 percent on measures 08, 09, and 10, respectively.

Of great importance to the interpretation of the survey results is the reliability of the response. How much confidence can one have in the reliability of a mean rating reported for a particular program? In the second table in each of the preceding seven chapters, estimated standard errors associated with the mean ratings of every program are presented for all four survey items (measures 08-11). While there is some variation in the magnitude of the standard errors reported in every discipline, they rarely exceed .15 for any of the four measures and typically range from .05 to .10. For programs with higher mean ratings the estimated errors associated with these means are generally smaller--a finding consistent with the fact that survey respondents were more likely to furnish evaluations for programs with high reputational standing. The "split-half" correlations presented in Table 10.8 give an indication of the overall reliability of the survey results in each discipline and for each measure. In the derivation of these correlations individual ratings of each program were randomly divided into two groups (A and B), and a separate mean rating was computed for each group.
The last column in Table 10.8 reports the correlations between the mean program ratings of the two groups and is not corrected for the fact that the mean ratings of each group are based on only half rather than a full set of the responses.9 As the reader will note, the coefficients reported for measure 08, the scholarly quality of program faculty, are in the range of .97 to .98--indicating a very high degree of consistency in evaluators' judgments. The correlations reported for measures 09 and 11, the rated effectiveness of a program and the evaluators' familiarity with a program, are somewhat lower but still at a level of .92 or higher in each discipline. Not surprisingly, the reliability coefficients for ratings of change in program quality in the last five years (measure 10) are considerably lower, ranging from .63 to .94 in the seven social and behavioral science disciplines. While these coefficients represent tolerable reliability, it is quite evident that the responses to measure 10 are not as reliable as the responses to the other three items.

For a discussion of the interpretation of "split-half" coefficients, see Robert L. Thorndike and Elizabeth Hagen, Measurement and Evaluation in Psychology and Education, John Wiley & Sons, New York, 1969, pp. 182-185.

9 To compensate for the smaller sample size the "split-half" coefficient may be adjusted using the Spearman-Brown formula: r' = 2r/(1 + r). This adjustment would have the effect of increasing a correlation of .70, for example, to .82, a correlation of .80 to .89, a correlation of .90 to .95, and a correlation of .95 to .97.
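The Spearman-Brown adjustment in footnote 9 is easy to verify numerically. The short Python sketch below applies the formula to the four correlations cited in the footnote; the function name `spearman_brown` is our own label, not something taken from the committee's report.

```python
def spearman_brown(r: float) -> float:
    """Step a split-half correlation r up to an estimated full-sample reliability."""
    return 2 * r / (1 + r)


# The four illustrative correlations cited in footnote 9.
for r in (0.70, 0.80, 0.90, 0.95):
    print(f"{r:.2f} -> {spearman_brown(r):.2f}")
# prints:
# 0.70 -> 0.82
# 0.80 -> 0.89
# 0.90 -> 0.95
# 0.95 -> 0.97
```

Because the adjustment is monotone in r and never lowers a positive correlation, the uncorrected coefficients in Table 10.8 can be read as lower bounds on the adjusted values.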

TABLE 10.8 Correlations Between Two Sets of Average Ratings from Two Randomly Selected Groups of Evaluators in the Social Sciences

MEASURE 08: SCHOLARLY QUALITY OF PROGRAM FACULTY
                     Mean Rating       Std. Deviation     Correlation
Discipline          Group A  Group B  Group A  Group B        N        r
Anthropology           2.78     2.75      .79      .78       70      .97
Economics              2.28     2.30     1.20     1.16       93      .98
Geography              2.72     2.77      .88      .89       49      .98
History                2.60     2.62     1.07     1.03      102      .97
Political Science      2.59     2.58      .98     1.01       83      .98
Psychology             2.55     2.54     1.00     1.01      150      .97
Sociology              2.53     2.47     1.04     1.06       92      .98

MEASURE 09: EFFECTIVENESS OF PROGRAM IN EDUCATING SCHOLARS
                     Mean Rating       Std. Deviation     Correlation
Discipline          Group A  Group B  Group A  Group B        N        r
Anthropology           1.62     1.61      .42      .41       70      .94
Economics              1.33     1.33      .63      .64       93      .96
Geography              1.62     1.62      .49      .47       49      .95
History                1.54     1.56      .55      .51      102      .93
Political Science      1.48     1.46      .55      .56       83      .95
Psychology             1.56     1.52      .54      .54      150      .92
Sociology              1.47     1.48      .57      .53       92      .95

MEASURE 10: IMPROVEMENT IN PROGRAM IN LAST FIVE YEARS
                     Mean Rating       Std. Deviation     Correlation
Discipline          Group A  Group B  Group A  Group B        N        r
Anthropology           1.03     1.01      .26      .27       70      .78
Economics              1.13     1.11      .29      .27       93      .84
Geography               .97     1.01      .30      .32       49      .94
History                1.06     1.04      .21      .24      102      .63
Political Science      1.06     1.06      .22      .24       83      .75
Psychology             1.07     1.09      .27      .26      150      .64
Sociology              1.03     1.04      .31      .32       92      .85

MEASURE 11: FAMILIARITY WITH WORK OF PROGRAM FACULTY
                     Mean Rating       Std. Deviation     Correlation
Discipline          Group A  Group B  Group A  Group B        N        r
Anthropology           1.05     1.04      .32      .34       70      .93
Economics               .89      .88      .51      .51       93      .98
Geography              1.18     1.16      .33      .31       49      .93
History                 .94      .92      .42      .41      102      .94
Political Science       .99      .99      .41      .42       83      .96
Psychology              .69      .70      .42      .44      150      .96
Sociology               .96      .98      .46      .46       92      .96
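The split-half procedure behind Table 10.8 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the committee's actual computation: the ratings are synthetic, the 0-5 scale, the 50 programs, and the 40 raters per program are arbitrary choices, and the names `pearson` and `split_half_correlation` are ours.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_correlation(ratings_by_program, rng):
    """Randomly split each program's ratings into groups A and B,
    compute a mean rating per group, and correlate the two sets of means."""
    means_a, means_b = [], []
    for ratings in ratings_by_program.values():
        shuffled = list(ratings)
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        group_a, group_b = shuffled[:half], shuffled[half:]
        means_a.append(sum(group_a) / len(group_a))
        means_b.append(sum(group_b) / len(group_b))
    return pearson(means_a, means_b)


# Synthetic data for illustration only (hypothetical programs and raters).
rng = random.Random(0)
programs = {}
for program_id in range(50):
    underlying = rng.uniform(1.0, 4.5)  # assumed "true" standing of the program
    programs[program_id] = [
        min(5, max(0, round(rng.gauss(underlying, 0.8)))) for _ in range(40)
    ]

r = split_half_correlation(programs, rng)
print(f"split-half r = {r:.2f}")
```

When programs differ widely in standing and individual raters scatter only moderately around a program's mean, the two half-sample means track each other closely and r approaches 1, which is the pattern the table shows for measure 08.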

Further evidence of the reliability of the survey responses is presented in Table 10.9. As mentioned in Chapter VI of the first volume (mathematical and physical sciences) of the committee's reports, 11 mathematics programs selected at random were included on a second form sent to 178 survey respondents in this discipline, and 116 individuals (65 percent) furnished responses to the second survey. A comparison of the overall results of the two survey administrations (columns 2 and 4 in Table 10.9) demonstrates the consistency of the ratings provided for each of the 11 programs. The average absolute observed difference in the two sets of mean ratings is less than 0.1 for each measure. Columns 6 and 8 of Table 10.9 report the results based on the responses of only those evaluators who had been asked to consider a particular program in both administrations of the survey. (For a given program approximately 40-45 percent of the 116 respondents to the second survey had been asked to evaluate that program in the prior survey.) It is not surprising to find comparably small differences in the mean ratings provided by this subgroup of evaluators.

Critics of past reputational studies have expressed concern about the credibility of reputational assessments when evaluators provide judgments of programs about which they may know very little. As already mentioned, survey participants in this study were offered the explicit alternative, "Don't know well enough to evaluate." This response option was quite liberally used for measures 08, 09, and 10, as is shown in Table 10.6. In addition, evaluators were asked to indicate their degree of familiarity with each program. Respondents reported "considerable" familiarity with an average of only one program in every four or five.
While this finding supports the conjecture that many program ratings are based on limited information, the availability of reported familiarity permits us to analyze how ratings vary as a function of familiarity.

TABLE 10.9 Comparison of Mean Ratings for 11 Mathematics Programs Included in Two Separate Survey Administrations

Survey             All Evaluators      Evaluators Rating the Same
Measure                                 Program in Both Surveys
             First       Second      First       Second
               N     X     N     X     N     X     N     X
Program A
  08         100   4.9   114   4.9    50   4.9    50   4.9
  09          90   2.7   100   2.8    42   2.7    43   2.7
  10          74   1.2    83   1.2    38   1.1    34   1.2
  11         100   1.6   115   1.6    50   1.5    50   1.6
Program B
  08          94   4.6   115   4.6    48   4.6   [illegible]
  09          81   2.6    91   2.5    40   2.6    39   2.5
  10          69   1.0    82   1.0    37   1.0    36   0.9
  11          98   1.4   116   1.4    50   1.5    50   1.5
Program C
  08         [illegible]
  09          56   2.0    66   2.1    28   2.1    29   2.0
  10          55   1.1    62   1.3    30   1.2    27   1.4
  11          99   1.0   116   1.1    50   1.1    50   1.0
Program D
  08         [illegible]
  09          50   1.8    48   1.6    27   1.7    16   1.6
  10          46   1.4    52   1.5    24   1.4    23   1.5
  11          90   1.0   113   0.9    46   1.0    46   0.9
Program E
  08         [illegible]
  09          40   1.8    60   1.9    25   1.8    30   1.8
  10          36   0.8    58   0.9    24   0.8    29   0.9
  11          96   0.8   115   0.9    52   0.9    52   1.0
Program F
  08         [illegible]
  09          35   1.8    46   1.7    10   1.6    13   1.8
  10          32   1.1    43   1.1    11   1.3    12   1.2
  11          95   0.7   115   0.8    43   0.7    44   0.7
Program G
  08         [illegible]
  09          35   1.7    45   1.6    17   1.7    19   1.7
  10          36   1.1    43   1.2    17   1.1    19   1.2
  11          85   0.9   116   0.8    46   0.9    46   0.9
Program H
  08         [illegible]
  09          32   1.3    43   1.3    22   1.2    19   1.3
  10          30   1.5    39   1.5    20   1.7    17   1.4
  11          90   0.7   116   0.6    51   0.7    52   0.6
Program I
  08         [illegible]
  09          33   1.0    41   0.9    19   1.0    18   0.8
  10          27   1.2    31   1.1    15   1.1    13   1.2
  11          99   0.5   115   0.5    50   0.5    50   0.5
Program J
  08         [illegible]
  09          31   0.8    36   0.7    14   0.6    14   0.7
  10          26   1.2    23   1.1    14   1.2    12   1.3
  11          96   0.5   113   0.3    49   0.4    48   0.4
Program K
  08         [illegible]
  09          19   0.8    21   0.5    11   0.6     8   0.4
  10          12   0.8    15   0.9     5   1.0     5   0.8
  11          99   0.2   114   0.2    48   0.2    47   0.2

10 Mathematics is the only discipline in which results were obtained from two separate administrations of the survey.

This issue can be addressed in more than one way. It is evident from the data reported in Table 10.10 that mean ratings of the scholarly quality of program faculty tend to be higher if the evaluator has considerable familiarity with the program. There is nothing surprising or, for that matter, disconcerting about such an association. When a particular program fails to provoke more than vague images in the evaluator's mind, he or she is likely to take this as some indication that the program is not an extremely lustrous one on the national scene. While visibility and quality are scarcely the same, the world of research in higher education is structured to encourage high quality to achieve high visibility, so that any association of the two is far from spurious.

TABLE 10.10 Mean Ratings of Scholarly Quality of Program Faculty, by Evaluator's Familiarity with Work of Faculty

                     MEAN RATINGS        CORRELATION
                    Consid-    Some/
                     erable   Little        r        N
Anthropology           3.11     2.61      .92       70
Economics              2.69     2.25      .90       90
Geography              2.96     2.59      .95       49
History                2.85     2.51      .92      102
Political Science      2.90     2.54      .91       81
Psychology             3.03     2.42      .86      148
Sociology              2.96     2.37      .91       91

NOTE: N reported in last column represents the number of programs with a rating from at least one evaluator in each of the two groups.

From the data presented in Table 10.10 it is evident that if mean ratings were computed on the basis of the responses of only those most familiar with programs, the values reported for individual programs would be increased. A largely independent question is whether a restriction of this kind would substantially change our sense of the relative standings of programs on this measure. Quite naturally, the answer depends in some degree on the nature of the restriction imposed. For example, if we exclude evaluations provided by those who confessed "little or no" familiarity with particular programs, then the revised mean ratings would be correlated at a level of at least .99 with the mean ratings computed using all of the data. (This similarity arises, in part, because only a small fraction of evaluations are given on the basis of no more than "little" familiarity with the program.)

The third column in Table 10.10 presents the correlation in each discipline between the array of mean ratings supplied by respondents claiming "considerable" familiarity and the mean ratings of those indicating "some" or "little or no" familiarity with particular programs. This coefficient is a rather conservative estimate of agreement since there is not a sufficient number of ratings from those with "considerable" familiarity to provide highly stable means. Were more such ratings available, one might expect the correlations to be higher.
However, even in the form presented, the correlations, which are at least .90 in all disciplines except psychology, are high enough to suggest that the relative standing of programs on measure 08 is not greatly affected by the admixture of ratings from evaluators who recognize that their knowledge of a given program is limited.

As mentioned previously, 90 percent of the survey sample members were supplied the names of faculty members associated with each program to be evaluated, along with the reported number of program graduates (Ph.D. or equivalent degrees) in the previous five years. Since earlier reputational surveys had not provided such information, 10 percent of the sample members, randomly selected, were given forms without faculty names or doctoral data, as a "control group." As one might expect, those given faculty names were more likely than other survey respondents to provide evaluations of the scholarly quality of program faculty (see Table 10.11), although the differences found were not large. (The reader may recall that the provision of faculty names apparently had little effect on survey sample members' willingness to complete and return their questionnaires.11)

TABLE 10.11 Item Response Rate on Measure 08, by Selected Characteristics of Survey Evaluators in the Social Sciences

                             Anthro-     Eco-    Geog-         Political   Psych-    Soci-
                      Total   pology   nomics    raphy  History  Science    ology    ology

EVALUATOR'S FAMILIARITY WITH PROGRAM
Considerable           99.9     99.9     99.9    100.0    100.0     99.9    100.0     99.9
Some                   97.4     98.8     97.2     98.3     96.4     99.0     96.3     96.6
Little or None         31.8     34.8     40.0     37.1     27.0     39.2     22.0     41.5

TYPE OF SURVEY FORM
Names                  77.2     85.4     79.1     87.9     75.6     84.1     62.7     81.6
No Names               66.4     75.6     59.5     87.2     67.0     70.9     50.1     71.9

INSTITUTION OF HIGHEST DEGREE
Alumni                 98.9    100.0     98.8     98.9    100.0     98.8     97.7     98.8
Nonalumni              75.8     84.1     77.2     87.6     74.4     82.6     61.2     80.5

EVALUATOR'S PROXIMITY TO PROGRAM
Same Region            83.7     89.8     81.9     94.1     83.1     88.0     73.0
Outside Region         74.9     83.4     76.7     86.8     73.4     82.0     59.8

NOTE: The item response rate is the percentage of the total ratings requested from survey participants that included a response other than "don't know."

In all disciplines except anthropology, the mean ratings provided by the group furnished faculty names are lower than the mean ratings supplied by other respondents (see Table 10.12). Although the differences are small, they attract attention because they are consistent with findings in the mathematical and physical sciences, humanities, engineering, and biological sciences and because the direction of the differences was not anticipated. After all, those programs more familiar to evaluators tended to receive higher ratings, yet when steps were taken to enhance the evaluator's familiarity, the resulting ratings are somewhat lower.
One post hoc interpretation of this finding is that a program may be considered to have distinguished faculty if even only a few of its members are considered by the evaluator to be outstanding in their field. However, when a full list of program faculty is provided, the evaluator may be influenced by the number of individuals whom he or she could not consider to be distinguished. Thus, the presentation of these additional, unfamiliar names may occasionally result in a lower rating of program faculty.

11 See Table 2.3.

However interesting these effects may be, one should not lose sight of the fact that they are small at best and that their existence does not necessarily imply that a program's relative standing on measure 08 would differ much whichever type of survey form was used. Since only about 1 in 10 ratings was supplied without the benefit of faculty names, it is hard to establish any very stable picture of relative mean ratings of individual programs. However, the correlations between the mean ratings supplied by the two groups are reasonably high--ranging from .82 in psychology to .96 in geography (see Table 10.12). Were these coefficients adjusted for the fact that the group furnished forms without names constituted only about 10 percent of the survey respondents, they would be substantially larger. From this result it seems reasonable to conclude that differences in the alternative survey forms used are not likely to be responsible for any large-scale reshuffling in the reputational ranking of programs on measure 08. It also suggests that the inclusion of faculty names in the committee's assessment need not prevent comparisons of the results with those obtained from the Roose-Andersen survey.

Another factor that might be thought to influence an evaluator's judgment about a particular program is the geographic proximity of that program to the evaluator. There is enough regional traffic in academic life that one might expect proximate programs to be better known than those in distant regions of the country.
This hypothesis may apply especially to the smaller and less visible programs and is confirmed by the survey results. For purposes of analysis, programs were assigned to one of nine geographic regions12 in the United States, and ratings of programs within an evaluator's own region are categorized in Table 10.13 as "nearby." Ratings of programs in any of the other eight regions were put in the "outside" group. Findings reported elsewhere in this chapter confirm that evaluators were more likely to provide ratings if a program was within their own region of the country,13 and it is reasonable to imagine that the smaller and the less visible programs received a disproportionate share of their ratings either from evaluators within their own region or from others who for one reason or another were particularly familiar with programs in that region.

TABLE 10.12 Mean Ratings of Scholarly Quality of Program Faculty, by Type of Survey Form Provided to Evaluator

                     MEAN RATINGS        CORRELATION
                      Names No Names        r        N
Anthropology           2.79     2.56      .89       70
Economics              2.29     2.30      .95       93
Geography              2.74     2.78      .96       49
History                2.61     2.62      .90      102
Political Science      2.57     2.82      .90       83
Psychology             2.55     2.70      .82      146
Sociology              2.46     2.90      .91       92

NOTE: N reported in last column represents the number of programs with a rating from at least one evaluator in each of the two groups.

TABLE 10.13 Mean Ratings of Scholarly Quality of Program Faculty, by Evaluator's Proximity to Region of Program

                     MEAN RATINGS        CORRELATION
                     Nearby  Outside        r        N
Anthropology           2.74     2.77      .89       68
Economics              2.34     2.31      .96       91
Geography              2.82     2.73      .89       48
History                2.73     2.60      .96      101
Political Science      2.61     2.59      .95       81
Psychology             2.59     2.54      .90      149
Sociology              2.59     2.52      .95       90

NOTE: N reported in last column represents the number of programs with a rating from at least one evaluator in each of the two groups.

Although the data in Table 10.13 suggest that "nearby" programs were given higher ratings than those outside the evaluator's region (except in anthropology), the differences in reported means are quite small and probably represent no more than a secondary effect that might be expected, because, as we have already seen, evaluators tended to rate higher those programs with which they were more familiar. Furthermore, the high correlations found between the mean ratings of the two groups indicate that the relative standings of programs are not dramatically influenced by the geographic proximity of those evaluating them.
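The size of the proximity effect can be read straight off Table 10.13 by differencing the two columns of means. The Python sketch below does only that arithmetic; the variable names are ours, and the values are the "nearby" and "outside" means as printed in the table.

```python
# Mean ratings on measure 08 from Table 10.13: (nearby, outside).
table_10_13 = {
    "Anthropology": (2.74, 2.77),
    "Economics": (2.34, 2.31),
    "Geography": (2.82, 2.73),
    "History": (2.73, 2.60),
    "Political Science": (2.61, 2.59),
    "Psychology": (2.59, 2.54),
    "Sociology": (2.59, 2.52),
}

# Positive values mean nearby evaluators gave the higher mean rating.
differences = {
    discipline: round(nearby - outside, 2)
    for discipline, (nearby, outside) in table_10_13.items()
}

for discipline, diff in differences.items():
    print(f"{discipline:<18}{diff:+.2f}")
```

On these figures the difference is negative only for anthropology (-0.03), and the largest gap, in history, is 0.13 on the rating scale.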
12 See Appendix I for a list of the states included in each region.
13 See Table 10.11.

Another consideration that troubles some critics is that large programs may be unfairly favored in a faculty survey because they are likely to have more alumni contributing to their ratings who, it would stand to reason, would be generous in the evaluations of their alma maters. Information collected in the survey on each evaluator's institution of highest degree enables us to investigate this concern.

TABLE 10.14 Mean Ratings of Scholarly Quality of Program Faculty, by Evaluator's Institution of Highest Degree

                                         NUMBER OF PROGRAMS
                     MEAN RATINGS        WITH ALUMNI RATINGS
                     Alumni Nonalumni       N
Anthropology           3.89     3.40       26
Economics              3.47     3.14       36
Geography              3.66     3.15       28
History                4.05     3.46       32
Political Science      3.76     3.38       31
Psychology             4.13     3.40       42
Sociology              3.97     3.33       34

NOTE: The pairs of means reported in each discipline are computed for a subset of programs with a rating from at least one alumnus and are substantially greater than the mean ratings for the full set of programs in each discipline.

The findings presented in Table 10.14 support the hypothesis that alumni provided generous ratings--with differences in the mean ratings (for measure 08) of alumni and nonalumni ranging from .33 to .73 in the seven disciplines. Given the appreciable differences between the ratings furnished by program alumni and other evaluators, one might ask how much effect this has had on the overall results of the survey. The answer is "very little." As shown in the table, in history and psychology fewer than one program in every three received ratings from any alumnus; in geography slightly more than half of the programs were evaluated by one or more alumni.14 Even in the latter discipline, however, the fraction of alumni providing ratings of a program is always quite small and should have had minimal impact on the overall mean rating of any program. To be certain that this was the case, mean ratings of the scholarly quality of faculty were recalculated for every social and behavioral science program--with the evaluations provided by alumni excluded.
The results were compared with the mean scores based on a full set of evaluations. Out of the 639 social and behavioral science programs evaluated in the survey, none of the programs had an observed difference as large as 0.2, and for 593 programs (93 percent) their mean ratings remain unchanged (to the nearest tenth of a unit). On the basis of these findings the committee saw no reason to exclude alumni ratings in the calculation of program means.

14 Because of the small number of alumni ratings in every discipline, the mean ratings for this group are unstable and therefore the correlations between alumni and nonalumni mean ratings are not reported.

Another concern that some critics have is that a survey evaluation may be affected by the interaction of the research interests of the evaluator and the area(s) of focus of the research-doctorate program to be rated. It is said, for example, that some narrowly focused programs may be strong in a particular area of research but that this strength may not be recognized by a large fraction of evaluators who happen to be unknowledgeable in this area. This is a concern more difficult to address than those discussed in the preceding pages since little or no information is available about the areas of focus of the programs being evaluated (although in certain disciplines the title of a department or academic unit may provide a clue). To obtain a better understanding of the extent to which an evaluator's field of specialty may have influenced the ratings he or she has provided, an analysis was made of ratings provided by evaluators in psychology. Survey participants in this discipline were divided into two groups according to specialty field (as reported on the survey questionnaire): those specializing in clinical psychology or counseling and guidance and those in the other fields of psychology. The mean ratings of the two groups are reported in Table 10.15. The program ratings provided by clinical psychologists are, on the average, slightly higher than those provided by evaluators in nonclinical areas. Despite these differences there is a high degree of correlation in the mean ratings furnished by the two groups (r = .91).
Although one cannot conclude from these findings that an evaluator's specialty field has no bearing on how he or she rates a program, these findings do suggest that the relative standings of programs in psychology would not be greatly altered if the ratings by either group were discarded. Similar findings, presented in the mathematical and physical sciences volume of the committee's report, were obtained from an analysis of survey evaluators in differing specialties within physics and within statistics/biostatistics.

TABLE 10.15 Mean Ratings of Scholarly Quality of Program Faculty, by Evaluator's Field of Specialty within Psychology

                     MEAN RATINGS        CORRELATION
                   Clinical    Other        r        N
Psychology             2.63     2.52      .91      150

NOTE: N reported in last column represents the number of programs with a rating from at least one evaluator in each of the two groups.
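The robustness checks described above share one pattern: recompute a program's mean rating with one subgroup of evaluators left out (alumni, for instance) and see whether the mean moves. A minimal Python sketch of that pattern follows. The ratings below are hypothetical, invented purely for illustration, and the function name `effect_of_excluding` is ours rather than the committee's.

```python
def mean(values):
    return sum(values) / len(values)


def effect_of_excluding(ratings, excluded_group):
    """ratings: list of (group_label, rating) pairs for a single program.

    Returns the mean over all ratings, the mean with one group left out,
    and whether the two agree to the nearest tenth of a unit.
    """
    full_mean = mean([rating for _, rating in ratings])
    restricted_mean = mean(
        [rating for group, rating in ratings if group != excluded_group]
    )
    unchanged = round(full_mean, 1) == round(restricted_mean, 1)
    return full_mean, restricted_mean, unchanged


# Hypothetical program: 3 generous alumni among 60 evaluators.
ratings = (
    [("alumni", 5), ("alumni", 4), ("alumni", 4)]
    + [("other", 3)] * 30
    + [("other", 4)] * 27
)

full, restricted, unchanged = effect_of_excluding(ratings, "alumni")
print(f"all evaluators: {full:.2f}, alumni excluded: {restricted:.2f}, "
      f"unchanged to nearest tenth: {unchanged}")
```

In this made-up case the alumni rate the program well above everyone else, yet because they are only 3 of 60 evaluators the mean shifts by about 0.04, which illustrates why a generous but small subgroup has little leverage on a program's overall mean.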

INTERPRETATION OF REPUTATIONAL SURVEY RATINGS

It is not hard to foresee that results from this survey will receive considerable attention through enthusiastic and uncritical reporting in some quarters and sharp castigation in others. The study committee understands the grounds for both sides of this polarized response but finds that both tend to be excessive. It is important to make clear how we view these ratings as fitting into the larger study of which they are a part.

The reputational results are likely to receive a disproportionate degree of attention for several reasons, including the fact that they reflect the opinions of a large group of faculty colleagues and that they form a bridge with earlier studies of graduate programs. But the results will also receive emphasis because they alone, among all of the measures, seem to address quality in an overall or global fashion. While most recognize that "objective" program characteristics (e.g., publication productivity, research funding, or library size) have some bearing on program quality, probably no one would contend that a single one of these measures encompasses all that need be known about the quality of research-doctorate programs. Each is obviously no more than an indicator of some aspect of program quality. In contrast, the reputational ratings are global from the start because the respondents are asked to take into account many objective characteristics and to arrive at a general assessment of the quality of the faculty and the effectiveness of the program. This generality has self-evident appeal.

On the other hand, it is wise to keep in mind that these reputational ratings are measures of perceived program quality rather than of "quality" in some ideal or absolute sense.
What this means is that, just as for all of the more objective measures, the reputational ratings represent only a partial view of what most of us would consider quality to be; hence, they must be kept in careful perspective.

Some critics may argue that such ratings are positively misleading because of a variety of methodological artifacts or because they are supplied by "judges" who often know very little about the programs they are rating. The committee has conducted the survey in a way that permits the empirical examination of a number of the alleged artifacts and, although our analysis is by no means exhaustive, the general conclusion is that their effects are slight.

Among the criticisms of reputational ratings from prior studies are some that represent a perspective that may be misguided. This perspective assumes that one asks for ratings in order to find out what "quality" really is and that to the degree that the ratings miss the mark of "quintessential quality," they are unreal, although the quality that they attempt to measure is real. What this perspective misses is the reality of quality and the fact that impressions of quality, if widely shared, have an imposing reality of their own and therefore are worth knowing about in their own right. After all, these perceptions govern a large-scale system of traffic around the nation's graduate institutions--for example, when undergraduate students seek the advice of professors concerning graduate programs that they might attend. It is possible that some professors put in this position disqualify themselves on grounds that they are not well informed about the relative merits of the programs being considered. Most faculty members, however, surely attempt to be helpful on the basis of impressions gleaned from their professional experience, and these assessments are likely to have major impact on student decision-making. In short, the impressions are real and have very real effects not only on students shopping for graduate schools but also on other flows, such as job-seeking young faculty and the distribution of research resources. At the very least, the survey results provide a snapshot of these impressions from discipline to discipline. Although these impressions may be far from ideally informed, they certainly show a strong degree of consensus within each discipline, and it seems safe to assume that they are more than passingly related to what a majority of keen observers might agree program quality is all about.

COMPARISON WITH RESULTS OF THE ROOSE-ANDERSEN STUDY

An analysis of the response to the committee's survey would not be complete without comparing the results with those obtained in the survey by Roose and Andersen 12 years earlier. Although there are obvious similarities in the two surveys, there are also some important differences that should be kept in mind in examining individual program ratings of the scholarly quality of faculty. Already mentioned in this chapter is the inclusion, on the form sent to 90 percent of the sample members in the committee's survey, of the names and academic ranks of faculty and the numbers of doctoral graduates in the previous five years. Other significant changes in the committee's form are the identification of the university department or academic unit in which each program may be found, the restriction of requesting evaluators to make judgments about no more than 50 research-doctorate programs in their discipline, and the presentation of these programs in random sequence on each survey form. The sampling frames used in the two surveys also differ.
The sample selected in the earlier study included only individuals who had been nominated by the participating universities, while more than one-fourth of the sample in the committee's survey were chosen [remainder of sentence illegible in source]. (Except for this difference the samples were quite similar--i.e., in terms of the number of evaluators in each discipline and the fraction of senior scholars.)

15 For a description of the sample group used in the earlier study, see Roose and Andersen, pp. 28-31.

Several dissimilarities in the coverage of the Roose-Andersen and this committee's reputational assessments should be mentioned. The former included a total of 130 institutions that had awarded at least 100 doctoral degrees in two or more disciplines during the FY1958-67 period. The institutional coverage in the committee's assessment was based on the number of doctorates awarded in each discipline (as described in Chapter I) and covered a total population of 228 universities. Most of the universities represented in the present study but not the earlier one are institutions that offered research-doctorate programs in a limited set of disciplines. In the Roose-Andersen study, programs in the same seven social and behavioral science disciplines were rated: anthropology, economics, geography, history, political science, psychology, and sociology. Finally, in the Roose-Andersen study only one set of ratings was compiled from each institution represented in a discipline, whereas in the committee's survey separate ratings were requested if a university offered more than one research-doctorate program in a given discipline. The consequences of these differences in survey coverage are quite apparent: in the committee's survey, evaluations were requested for a total of 639 research-doctorate programs in the social and behavioral sciences, compared with 515 programs in the Roose-Andersen study.

Figures 10.1-10.7 plot the mean ratings of scholarly quality of faculty in programs included in both surveys; sets of ratings are graphed for 38 programs in anthropology, 71 in economics, 31 in geography, 79 in history, 61 in political science, 103 in psychology, and 65 in sociology. Since in the Roose-Andersen study programs were identified by institution and discipline (but not by department), the matching of results from this survey with those from the committee's survey is not precise. For universities represented in the latter survey by more than one program in a particular discipline, the mean rating for the program with the largest number of graduates (measure 02) is the only one plotted here. Although the results of both surveys are reported on identical scales, some caution must be taken in interpreting differences in the mean ratings a program received in the two evaluations. It is impossible to estimate what effect all of the differences described above may have had on the results of the two surveys.
Furthermore, one must remember that the reported scores are based on the opinions of different groups of faculty members and were provided at different time periods. In 1969, when the Roose-Andersen survey was conducted, graduate departments in most universities were still expanding and not facing the enrollment and budget reductions that many departments have had to deal with in recent years. Consequently, a comparison of the overall findings from the two surveys tells us nothing about how much graduate education has improved (or declined) in the past decade. Nor should the reader place much stock in any small differences in the mean ratings that a particular program may have received in the two surveys.

On the other hand, it is of particular interest to note the high correlations between the results of the evaluations. For programs in anthropology, economics, history, political science, and psychology, the correlation coefficients range between .90 and .94; in geography and sociology the coefficients are .79 and .86, respectively. The extraordinarily high correlations found in five of the seven disciplines may suggest to some readers that reputational standings of programs in these disciplines have changed very little in the last decade. However, differences are apparent for some institutions. Also, one must keep in mind that the correlations are based on the reputational ratings of only 70 percent of the programs evaluated in this assessment in these disciplines and do not take into account the emergence of many new programs that did not exist or were too small to be rated in the Roose-Andersen study.
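The matching rule and the correlation described above can be illustrated with a short sketch. This is not the committee's actual procedure or data: the institution names, graduate counts, and ratings below are hypothetical, and the function names are invented for illustration. Only the per-discipline program counts and the total of 639 programs are taken from the text. The sketch keeps, for each institution, the program with the most graduates (measure 02), pairs its measure 08 rating with the earlier rating, computes a Pearson correlation coefficient, and checks the share of programs covered by the comparison.

```python
from math import sqrt


def largest_program_per_institution(programs):
    """Keep, for each institution, the program with the most graduates (measure 02)."""
    best = {}
    for program in programs:
        name = program["institution"]
        if name not in best or program["graduates"] > best[name]["graduates"]:
            best[name] = program
    return best


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical toy data: one discipline, three institutions.
committee_programs = [
    {"institution": "University A", "graduates": 40, "measure_08": 4.1},
    {"institution": "University A", "graduates": 12, "measure_08": 3.0},
    {"institution": "University B", "graduates": 25, "measure_08": 2.6},
    {"institution": "University C", "graduates": 18, "measure_08": 1.9},
]
earlier_ratings = {"University A": 4.3, "University B": 2.4, "University C": 2.0}

matched = largest_program_per_institution(committee_programs)
institutions = sorted(set(matched) & set(earlier_ratings))
r = pearson_r(
    [earlier_ratings[name] for name in institutions],
    [matched[name]["measure_08"] for name in institutions],
)

# Programs plotted in Figures 10.1-10.7, as counted in the text.
matched_counts = {
    "anthropology": 38, "economics": 71, "geography": 31, "history": 79,
    "political science": 61, "psychology": 103, "sociology": 65,
}
total_matched = sum(matched_counts.values())  # 448
coverage = total_matched / 639                # about 0.70, i.e., 70 percent
```

The coverage figure computed from the stated counts (448 of 639 programs) agrees with the "70 percent" cited in the text. The toy correlation says nothing about the real ratings; it only shows how a coefficient like those in Figures 10.1-10.7 would be obtained from matched pairs.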

FIGURE 10.1 Mean rating of scholarly quality of faculty (measure 08) versus mean rating of faculty in the Roose-Andersen study--38 programs in anthropology. [Scatterplot not reproduced: vertical axis, measure 08; horizontal axis, Roose-Andersen rating (1970); r = .90.]

FIGURE 10.2 Mean rating of scholarly quality of faculty (measure 08) versus mean rating of faculty in the Roose-Andersen study--71 programs in economics. [Scatterplot not reproduced: vertical axis, measure 08; horizontal axis, Roose-Andersen rating (1970); r = .94.]

FIGURE 10.3 Mean rating of scholarly quality of faculty (measure 08) versus mean rating of faculty in the Roose-Andersen study--31 programs in geography. [Scatterplot not reproduced: vertical axis, measure 08; horizontal axis, Roose-Andersen rating (1970); r = .79, as reported in the text.]

FIGURE 10.4 Mean rating of scholarly quality of faculty (measure 08) versus mean rating of faculty in the Roose-Andersen study--79 programs in history. [Scatterplot not reproduced: vertical axis, measure 08; horizontal axis, Roose-Andersen rating (1970).]

FIGURE 10.5 Mean rating of scholarly quality of faculty (measure 08) versus mean rating of faculty in the Roose-Andersen study--61 programs in political science. [Scatterplot not reproduced: vertical axis, measure 08; horizontal axis, Roose-Andersen rating (1970); r = .93.]

FIGURE 10.6 Mean rating of scholarly quality of faculty (measure 08) versus mean rating of faculty in the Roose-Andersen study--103 programs in psychology. [Scatterplot not reproduced: vertical axis, measure 08; horizontal axis, Roose-Andersen rating (1970); r = .93.]

FIGURE 10.7 Mean rating of scholarly quality of faculty (measure 08) versus mean rating of faculty in the Roose-Andersen study--65 programs in sociology. [Scatterplot not reproduced: vertical axis, measure 08; horizontal axis, Roose-Andersen rating (1970); r = .86.]

FUTURE STUDIES

One of the most important objectives in undertaking this assessment was to test new measures not used extensively in past evaluations of graduate programs. Although the committee believes that it has been successful in this effort, much more needs to be done. First and foremost, studies of this kind should be extended to cover other types of programs and other disciplines not included in this effort. As a consequence of budgeting limitations, the committee had to restrict its study to 32 disciplines, selected on the basis of the number of doctorates awarded in each. A multidimensional assessment of research-doctorate programs in many important disciplines not included among these 32 should be of great value to the academic community. Consideration should also be given to embarking on evaluations of programs offering other types of graduate and professional degrees. As a matter of fact, plans for including master's-degree programs in this assessment were originally contemplated, but because of a lack of available information about the resources and graduates of programs at the master's level, it was decided to focus on programs leading to the research doctorate.

Perhaps the most debated issue the committee has had to address concerned which measures should be reported in this assessment. In fact, there is still disagreement among some of its members about the relative merits of certain measures, and the committee fully recognizes a need for more reliable and valid indices of the quality of graduate programs. First on a list of needs is more precise and meaningful information about the product of research-doctorate programs--the graduates. For example, what fraction of the program graduates have gone on to be productive investigators--either in the academic setting or in government and industrial laboratories?
What fraction have gone on to become outstanding investigators--as measured by receipt of major prizes, membership in academies, and other such distinctions? How do program graduates compare with regard to their publication records? Also desired might be measures of the quality of the students applying for admittance to a graduate program (e.g., Graduate Record Examination scores, undergraduate grade point averages). If reliable data of this sort were made available, they might provide a useful index of the reputational standings of programs, from the perspective of graduate students.

A number of alternative measures relevant to the quality of program faculty were considered by the committee but not included in the assessment because of the associated difficulties and costs of compiling the necessary data. For example, what fraction of the program faculty were invited to present papers at national meetings? What fraction had been elected to prestigious organizations/groups in their field? What fraction had received senior fellowships and other awards of distinction? In addition, it would be highly desirable to supplement the data presented on NSF, NIH, and ADAMHA research grant awards (measure 13) with data on awards from other federal agencies as well as from major private foundations.

As described in the preceding pages, the committee was able to make several changes in the survey design and procedures, but further improvements could be made. Of highest priority in this regard is the expansion of the survey sample to include evaluators from outside the academic setting. To add evaluators from nonacademic sectors would require a major effort in identifying the survey population from which a sample could be selected. Although such an effort is likely to involve considerable costs in both time and financial resources, the committee believes that the addition of evaluators from nonacademic settings would be of value in providing a different perspective to the reputational assessment and that comparisons between the ratings supplied by academic and nonacademic evaluators would be of particular interest.
