Copyright © 2009. National Academy of Sciences. All rights reserved.
Appendix C
Technical and Statistical Techniques
1. Alternate Ways to Present Rankings: Random Halves and Bootstrap Methods
2. Correlates of Reputation Analysis
Alternate Ways to Present Rankings:
Random Halves and Bootstrap Methods
Reputational surveys, such as those conducted for earlier research-doctorate program assessments, were not designed to provide accurate rankings of programs. They produced estimates of ratings, and those results could vary depending on the set of raters selected. The confidence interval analysis performed in the last two assessments illustrated this point. However, users of the assessments chose to ignore this and focused instead on the specific scores obtained by averaging questionnaire responses.
A far better method would be to incorporate variability into the reporting of ratings and to display a range of program ratings rather than a single ranking. Random Halves and Bootstrap are two methods that could be used both to assign measures of accuracy to statistical estimates and to present data. Both methods involve resampling the original data set, in slightly different ways, and would provide slightly different results.
Methods
For a particular field, such as English Language and Literature, assume there are M programs and N program raters. Each rater rates only a subset of the M programs; therefore, some programs may be rated more often than others, since the number of ratings for a program depends on which raters responded to the survey and whether they actually rated that program on their questionnaire. A response matrix R can be constructed with a reputational rating $r_{ij}$ as the entry for rater i rating program j, i = 1, ..., N and j = 1, ..., M. Along each row of the matrix there will be blank entries for programs that the rater was not asked to rate or did not rate. The different ratings for a given program are then aggregated into a single "mean" rating, $r_j$ ($r_j$ could also include weighting and trimming, for example, and need not be the simple mean of all ratings for program j).
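A minimal sketch of the response matrix and the aggregation step, assuming a simple unweighted, untrimmed mean for $r_j$ (the text notes that weighting or trimming could be used instead). The matrix values and the helper names `program_means` and `rank_programs` are hypothetical; `None` marks a blank entry. Whether a higher or a lower rating means higher quality depends on the survey scale, so it is left as a parameter.

```python
from statistics import mean

# Hypothetical response matrix R: one row per rater, one column per program.
# None marks a program the rater was not asked to rate or did not rate.
R = [
    [4.0, 3.0, None, 2.0],
    [5.0, None, 3.0, 2.0],
    [None, 4.0, 2.0, 1.0],
]

def program_means(rows):
    """Simple (unweighted, untrimmed) mean rating r_j for each program j.
    Returns None for a program that nobody rated."""
    means = []
    for j in range(len(rows[0])):
        ratings = [row[j] for row in rows if row[j] is not None]
        means.append(mean(ratings) if ratings else None)
    return means

def rank_programs(means, higher_is_better=True):
    """Rank 1 = best. Programs with no ratings are left unranked (None)."""
    rated = [j for j, m in enumerate(means) if m is not None]
    order = sorted(rated, key=lambda j: means[j], reverse=higher_is_better)
    ranks = [None] * len(means)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks
```

Each column mean uses only the raters who actually rated that program, so programs can rest on different numbers of ratings, as the paragraph above describes.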
Random Halves Method: The Random Halves method is closely related to what is known in the statistics literature as the "random group method" for assessing variances of estimates in complex sample surveys. This approach, which has many variants, has a literature that goes back to at least 1939 (see Wolter, 1985). It is closely related to another method called the "Jackknife," which was introduced in 1949 and popularized in the 1960s. The essence of the random group and Jackknife methods is to calculate a numerical quantity of interest on a smaller part of the whole data set, and to do this for several such smaller parts of the original data. The differences between the results from these smaller parts of the data are then combined to assess the variability of the quantity computed on the whole data set. The Random Halves method is an example of this in which the smaller parts of the data are random halves of the data.
The Random Halves method is applied as follows. A random sample of N/2 of the rows of R is drawn without replacement, meaning that a row cannot be selected twice. The mean $r_j$ for each program is then computed from this random half-sample of the full data, and all the programs are ranked on the basis of these mean ratings. This procedure could be repeated ten, one hundred, or several hundred times to produce a range of ratings and rankings for each program in the field. The rankings for each program could then be summarized by the interquartile range of its ranking distribution. Users of reputational ratings would thus recognize that raters rate programs differently, and that in half of the resamples program j was ranked between a and b, where a is the 25th percentile of its ranking distribution and b is the 75th percentile.
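The Random Halves procedure just described can be sketched as follows. This is a hedged illustration, not the procedure used to produce the published rankings: it assumes the simple mean for $r_j$, a rating scale on which higher is better, N/2 rounded down when N is odd, and that a program receiving no ratings in a particular half-sample is simply left unranked in that draw. All names are hypothetical.

```python
import random
from statistics import mean, quantiles

def program_means(rows):
    """Mean of the non-missing ratings in each column (None if unrated)."""
    means = []
    for j in range(len(rows[0])):
        vals = [row[j] for row in rows if row[j] is not None]
        means.append(mean(vals) if vals else None)
    return means

def ranks_from_means(means, higher_is_better=True):
    """Rank 1 = best; unrated programs get None."""
    rated = [j for j, m in enumerate(means) if m is not None]
    order = sorted(rated, key=lambda j: means[j], reverse=higher_is_better)
    ranks = [None] * len(means)
    for position, j in enumerate(order, start=1):
        ranks[j] = position
    return ranks

def random_halves(R, draws=100, seed=0, higher_is_better=True):
    """Rank history of every program over `draws` random half-samples."""
    rng = random.Random(seed)
    half = len(R) // 2                  # N/2 raters (rounded down if N is odd)
    history = [[] for _ in R[0]]
    for _ in range(draws):
        sample = rng.sample(R, half)    # rows drawn WITHOUT replacement
        ranks = ranks_from_means(program_means(sample), higher_is_better)
        for j, rank in enumerate(ranks):
            if rank is not None:
                history[j].append(rank)
    return history

def interquartile_ranks(history):
    """(25th percentile, 75th percentile) of each program's ranks."""
    result = []
    for ranks in history:
        if len(ranks) < 2:
            result.append(None)         # too few draws to summarize
        else:
            q1, _, q3 = quantiles(ranks, n=4)
            result.append((q1, q3))
    return result
```

For example, `interquartile_ranks(random_halves(R, draws=200))` returns one (a, b) pair per program, read as "in half of the resamples this program was ranked between a and b."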
Bootstrap Method: The Bootstrap method was developed more recently than the random group method; its literature dates back only to 1979. It is well described in Efron (1982) and Efron and Tibshirani (1993). Although the Bootstrap method was not created specifically for assessing variances in complex sample surveys, it has been used for that purpose. It was created as a general method for assessing the variability of the results of any type of data analysis, complex or simple, and has become a standard tool. Instead of sampling N/2 rows of R without replacement, N of the rows are sampled from R with replacement, meaning that a row could be selected several times. The mean for each program is then computed, and the programs ranked, in the same way as in the Random Halves method.
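A minimal sketch of the Bootstrap resampling step, under stated assumptions: the simple mean is used for $r_j$, a higher rating means higher quality, and a program that receives no ratings in a particular resample is left unranked in that draw. The function names are hypothetical. The only change from Random Halves is the draw itself: `random.choices` samples N rows with replacement.

```python
import random
from statistics import mean

def bootstrap_sample(R, rng):
    """N rows drawn WITH replacement: the same rater may appear several times."""
    return rng.choices(R, k=len(R))

def bootstrap_ranks(R, draws=100, seed=0):
    """Rank history of every program over `draws` bootstrap resamples
    (rank 1 = highest mean rating; None entries in R are missing ratings)."""
    rng = random.Random(seed)
    history = [[] for _ in R[0]]
    for _ in range(draws):
        sample = bootstrap_sample(R, rng)
        means = {}
        for j in range(len(R[0])):
            vals = [row[j] for row in sample if row[j] is not None]
            if vals:
                means[j] = mean(vals)
        order = sorted(means, key=means.get, reverse=True)
        for rank, j in enumerate(order, start=1):
            history[j].append(rank)
    return history
```

The interquartile range of each program's rank history can then be summarized exactly as for Random Halves.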
The two methods provide very similar results. The perceived advantage of the Random Halves method lies in the process it mimics: a rater pool is selected and half of the raters are sent questionnaires, and this rating process is repeated again and again for the original pool. It is not significantly different from what was done in the past, when raters were selected and a confidence interval was used to show that a certain percentage of the ratings would fall in a similar interval even if a different set of raters had been selected. The advantage of the Bootstrap method, on the other hand, is that it is an established method with a well-developed theory for the statistical accuracy of survey measurements.
A Comparison of the Random Halves and Bootstrap Methods
The differences between the methods can be demonstrated by the following simple example, in which three raters rate two programs. The raters are labeled 1, 2, and 3, and the two programs are labeled A and B. The rating matrix is:
Table 1: The Rating Matrix

Rater             A      B      Average rating by rater
1                 0      1      0.5
2                 2      1      1.5
3                 1      0      0.5
Average rating    1      2/3
In this example, all three raters rate the same two programs on a scale of 0 to 2. In turning ratings into rankings, assume that lower ratings correspond to assessments of higher quality. Thus, rater 1 rated A higher than B by giving A a rating of 0 and B a rating of 1. The last row of the Rating Matrix has the average ratings for each program. For these ratings, B is ranked higher than A because its average rating is slightly lower than that of A. In the discussion of the example, the rank of A will be denoted by Rank(A). Therefore, Rank(A) = 2, while Rank(B) = 1.
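The averages and ranks in Table 1 can be checked with exact fractions. A minimal sketch; the variable names are illustrative, and, as in the text, a lower average rating means a better (smaller) rank.

```python
from fractions import Fraction

# Table 1: rater -> (rating of A, rating of B); lower rating = higher quality.
ratings = {1: (Fraction(0), Fraction(1)),
           2: (Fraction(2), Fraction(1)),
           3: (Fraction(1), Fraction(0))}

avg_A = sum(a for a, _ in ratings.values()) / len(ratings)   # 1
avg_B = sum(b for _, b in ratings.values()) / len(ratings)   # 2/3
rater_avgs = {i: (a + b) / 2 for i, (a, b) in ratings.items()}

# The program with the lower average rating gets rank 1 (there is no tie here).
rank_A, rank_B = (2, 1) if avg_A > avg_B else (1, 2)
```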
This example may appear to be unrealistic in at least two ways. First, it is very small, which means that it is only possible to examine the probability that A is ranked 1st or 2nd. Second, programs are not sampled for raters to rate; instead, the raters rate all of the programs in the example. However, neither of these simplifications is very important for the points the example is meant to demonstrate. On the other hand, the example does show some differences among the ratings of the three raters. Rater 1 ranks A and B differently from the way Raters 2 and 3 do. Also, the second rater's ratings are higher than those of the other two.
In applying Random Halves (RH) to this example, there are two variations, since the number of raters is not an even number. Hence, denote by RH(1) the case in which the "half-sample" consists of 1 of the 3 raters chosen at random, and by RH(2) the case in which it consists of 2 of the 3 raters chosen at random. These are the only possibilities for the RH method in the example.

In the RH(1) case, since there are three possible raters to be sampled, each is sampled with probability 1/3, and the sampled rater's ratings are the averages. The following table summarizes the three possible sample results for RH(1).
Table 2: Summary of RH(1)

Sample   Average A rating   Average B rating   Rank(A)   Probability of the sample
{1}      0                  1                  1         1/3
{2}      2                  1                  2         1/3
{3}      1                  0                  2         1/3
Because Rank(A) = 2 in two of the three possible half-samples, the probability that Rank(A) = 2 is 2/3. This should be compared with the finding that in the data (i.e., the Rating Matrix in Table 1) the rank of A is 2; the RH(1) method therefore indicates that it could have been different from 2 about 1/3 of the time.
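The RH(1) probability can be reproduced by enumerating every half-sample. The sketch below is written for a half-sample of any size k, so it also covers RH(2); it assumes, as in the text, that all k-subsets of the three raters are equally likely, that a lower average rating means a better rank, and that a tie is split at random (probability 1/2 each way). The function name is illustrative.

```python
from fractions import Fraction
from itertools import combinations

A = {1: 0, 2: 2, 3: 1}   # Table 1 ratings of program A by raters 1-3
B = {1: 1, 2: 1, 3: 0}   # Table 1 ratings of program B by raters 1-3

def prob_rank_A_is_2(k):
    """Exact P(Rank(A) = 2) under RH(k), enumerating all k-rater half-samples."""
    samples = list(combinations(A, k))       # each subset has equal probability
    total = Fraction(0)
    for s in samples:
        avg_a = Fraction(sum(A[i] for i in s), k)
        avg_b = Fraction(sum(B[i] for i in s), k)
        if avg_a > avg_b:                    # A has the worse (higher) average
            total += 1
        elif avg_a == avg_b:                 # tie: split at random
            total += Fraction(1, 2)
    return total / len(samples)
```

Both `prob_rank_A_is_2(1)` and `prob_rank_A_is_2(2)` return `Fraction(2, 3)`.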
In the RH(2) case, two raters are sampled, and there are three possibilities: {1,2}, {1,3}, and {2,3}. For example, if the two sampled raters are 1 and 2, the ratings of raters 1 and 2 are averaged for each program. The table below summarizes what occurs for the three possible half-samples for RH(2). Note that in the cases where the average ratings are the same, random tie splitting is used and the rank of A is denoted by 1.5.
Table 3: Summary of RH(2)

Sample   Average A rating   Average B rating   Rank(A)   Probability
{1,2}    1                  1                  1.5       1/3
{1,3}    1/2                1/2                1.5       1/3
{2,3}    3/2                1/2                2         1/3
In the case of RH(2), there are three ways to obtain Rank(A) = 2. The first is from sample {2,3}. The other two are from either of the other two samples, when the tie is split so that Rank(A) = 2. Hence, the probability is

$$P(\text{Rank}(A) = 2) = \frac{1}{3} + \frac{1}{3}\cdot\frac{1}{2} + \frac{1}{3}\cdot\frac{1}{2} = \frac{2}{3},$$

where the factor 1/2 represents the tie splitting. Note that 2/3 is also the probability that Rank(A) = 2 in RH(1).
In summary, the RH method calls for repeatedly taking "half-samples" of the rating matrix, averaging the resulting ratings for A and B, and then ranking A and B based on these average ratings. Resampling over and over builds up a distribution of how many times A is ranked 1 or 2. For example, under either RH(1) or RH(2), A would be ranked 2 about 2/3 of the time. Therefore, while the two versions of the RH method give different data, using random tie splitting gives the same result for the probability that A is ranked 2.
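Repeated resampling of this kind can be illustrated with a small simulation on the Table 1 data. This is a sketch under the same assumptions as the text (lower average rating = better rank, random tie splitting); the seed and the number of draws are arbitrary choices, and the estimates only approach 2/3 as the number of draws grows.

```python
import random

A = {1: 0, 2: 2, 3: 1}   # Table 1, program A
B = {1: 1, 2: 1, 3: 0}   # Table 1, program B

def simulate_rh(k, draws=20_000, seed=1):
    """Monte Carlo RH(k): repeatedly draw k of the 3 raters without
    replacement, average their ratings, and rank A and B (lower average =
    rank 1). Ties are split at random. Returns the fraction of draws in
    which Rank(A) = 2."""
    rng = random.Random(seed)
    raters = list(A)
    count = 0
    for _ in range(draws):
        sample = rng.sample(raters, k)
        sum_a = sum(A[i] for i in sample)
        sum_b = sum(B[i] for i in sample)
        # Both programs use the same k raters, so comparing sums is
        # equivalent to comparing averages.
        if sum_a > sum_b or (sum_a == sum_b and rng.random() < 0.5):
            count += 1
    return count / draws
```

Both `simulate_rh(1)` and `simulate_rh(2)` should land close to 0.667, in line with the exact value 2/3.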
In applying the Bootstrap (Boot) method to the example, three raters are sampled, and the same rater can be selected more than once. The three original raters are regarded as representative of all the possible raters who could have been sampled to rate the programs. Clearly, such an assumption varies in plausibility due to various factors, such as how many raters are being considered and how they were originally chosen. It is, however, a useful assumption and appears throughout many applications of statistics.
Sampling three rows from the original Rating Matrix with replacement gives 3 × 3 × 3 = 27 possible ordered samples, each with probability 1/27. They are listed in the following table.
Table 4: Bootstrap samples, their average ratings for A and B, and the Rank of A

Sample  A    B    Rank(A)    Sample  A    B    Rank(A)    Sample  A    B    Rank(A)
111     0/3  3/3  1          211     2/3  3/3  1          311     1/3  2/3  1
112     2/3  3/3  1          212     4/3  3/3  2          312     3/3  2/3  2
113     1/3  2/3  1          213     3/3  2/3  2          313     2/3  1/3  2
121     2/3  3/3  1          221     4/3  3/3  2          321     3/3  2/3  2
122     4/3  3/3  2          222     6/3  3/3  2          322     5/3  2/3  2
123     3/3  2/3  2          223     5/3  2/3  2          323     4/3  1/3  2
131     1/3  2/3  1          231     3/3  2/3  2          331     2/3  1/3  2
132     3/3  2/3  2          232     5/3  2/3  2          332     4/3  1/3  2
133     2/3  1/3  2          233     4/3  1/3  2          333     3/3  0/3  2
Rank(A) = 2 occurs a total of 20 times in the table above, yielding a probability of 20/27 = .74. This differs from the result of the RH methods (i.e., .67). It is nonetheless plausible: although A was ranked second in the sample of 3, there is still some probability that it could have been ranked 1st in a different sample of raters. The Boot method produced a somewhat smaller probability estimate that A could have been ranked 1st, .26 rather than .33, but both of these values are less than 1/2, and both are plausible in such a small example.
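Table 4 and the probability 20/27 can be regenerated by enumerating all 27 ordered bootstrap samples. A minimal sketch using exact fractions; names are illustrative. Because each rater's rating of A differs from that rater's rating of B by exactly 1, the sum of three such differences is odd, so no ties can occur in this example.

```python
from fractions import Fraction
from itertools import product

A = {1: 0, 2: 2, 3: 1}   # Table 1, program A
B = {1: 1, 2: 1, 3: 0}   # Table 1, program B

def bootstrap_table():
    """All 27 ordered bootstrap samples of 3 raters (with replacement), each
    with probability 1/27, with the resulting averages and Rank(A)."""
    rows = []
    for sample in product(A, repeat=3):      # 111, 112, ..., 333 in order
        avg_a = Fraction(sum(A[i] for i in sample), 3)
        avg_b = Fraction(sum(B[i] for i in sample), 3)
        rank_a = 2 if avg_a > avg_b else 1   # lower average = rank 1
        rows.append(("".join(map(str, sample)), avg_a, avg_b, rank_a))
    return rows

table4 = bootstrap_table()
p_rank2 = Fraction(sum(1 for *_, rank in table4 if rank == 2), len(table4))
```

Here `p_rank2` is `Fraction(20, 27)`, about .74.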
There is no very convincing, intuitive way to favor either of these two probability estimates, .67 or .74. Hence, this example has little to offer in making an intuitive choice between the two approaches. What it does show is that the RH and Boot methods do not give the same results for a quantity closely related to the ranking probabilities of interest. Thus, any claim that the two methods are "equivalent" is wrong, but they are clearly "similar."
Statisticians who specialize in variance estimation prefer the Bootstrap to ad hoc methods because it is grounded in theory. The Bootstrap estimate is the nonparametric maximum likelihood estimate of the probability that Rank(A) = 2; the Random Halves estimate does not enjoy this property. However, variance estimation is an important subject in statistics, and many methods, in particular the Jackknife, can be tailored to situations in which they provide serious competition to the Bootstrap. The next section illustrates that when the numbers of raters and programs are both large, there is little difference between the Random Halves and Bootstrap methods.
Analysis of the Expected Variance for the Two Methods
A natural question to ask is: What do the Random Halves and Boot methods produce for the probability distributions of the average ratings of programs? Drawing on some results from probability theory, it can be shown that these methods give similar results.
Any method of resampling creates random variables whose distributions depend on the resampling method. In the rating example, let the random variables for the average ratings that result for A and B in each sample be denoted by $R_A$ and $R_B$, respectively. These are random variables whose means and variances have well-known values. The average ratings of A and B in the rating matrix are given in the last row of the Rating Matrix in Table 1, and they are denoted in general as $r_A$ and $r_B$. Thus, in the example, $r_A = 1$ and $r_B = 2/3$. In addition to the average ratings, the variance of the ratings in each column is defined as the average of the squares of the ratings in that column minus the square of the mean rating for that column. Thus, for program A, the variance is

$$v_A = \frac{0^2 + 2^2 + 1^2}{3} - 1^2 = \frac{5}{3} - 1 = \frac{2}{3},$$

and, for program B, it is

$$v_B = \frac{1^2 + 1^2 + 0^2}{3} - \left(\frac{2}{3}\right)^2 = \frac{2}{3} - \frac{4}{9} = \frac{6}{9} - \frac{4}{9} = \frac{2}{9}.$$
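The column variances $v_A$ and $v_B$ use the divisor N (the number of raters), not N - 1; this is the population variance that the standard library's `statistics.pvariance` also computes. A short check with exact fractions (the function name is illustrative):

```python
from fractions import Fraction

def column_variance(col):
    """v = mean of the squares minus the square of the mean (divisor N)."""
    n = len(col)
    m = Fraction(sum(col), n)
    return Fraction(sum(x * x for x in col), n) - m * m

v_A = column_variance([0, 2, 1])   # program A in Table 1
v_B = column_variance([1, 1, 0])   # program B in Table 1
```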
Table 5 gives the results for N raters rating program A, with n raters used in the RH(n) method; if N is even, then n = N/2. In the table, $E(R_A)$ denotes the "expected value," or "long-run average value," of the average rating for A, $R_A$. It can be shown that this is the same value, $r_A$, for both the Boot and the RH methods. $r_A$ is also the average rating for A in the original Rating Matrix; in general, $r_A$ is the average rating given to program A by the raters rating it. Thus, both the RH and Boot methods are unbiased for $r_A$, and any sensible resampling method will share this property.
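Table 5: The mean and variance of the average rating for A in a single resample

$$
\begin{array}{lcc}
 & \text{Bootstrap Method} & \text{Random Halves, RH}(n)\text{, Method} \\
E(R_A) & r_A & r_A \\
\operatorname{Var}(R_A) & \dfrac{v_A}{N} & \dfrac{v_A}{n} \cdot \dfrac{N - n}{N - 1}
\end{array}
$$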
Where the two methods can differ is in the value of the variance, $\operatorname{Var}(R_A)$. This variance measures how much $R_A$ deviates, on average, from its mean value $r_A$ from one random resampling to another. Observe that both formulas for $\operatorname{Var}(R_A)$ involve $v_A$, the variance of the ratings in the column of the Rating Matrix for program A. When N is even and n = N/2, the factor N - n in the numerator of the RH(n) formula equals n and cancels the n in the denominator, leaving only N - 1 in the denominator:

$$\frac{v_A}{n}\cdot\frac{N-n}{N-1} = \frac{v_A}{N-1} \qquad \text{for } n = N/2.$$

This is to be compared with the N in the denominator for the Bootstrap method. When N, the number of raters, is large, N and N - 1 are close, and the variances of the average rating $R_A$ for the two methods are nearly the same.
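The two variance formulas in Table 5 are easy to compare numerically. The sketch below (function names illustrative) evaluates them for the example and shows that, with n = N/2, the ratio of the RH(n) variance to the Bootstrap variance is N/(N - 1), which approaches 1 as N grows.

```python
from fractions import Fraction

def var_boot(v, N):
    """Table 5, Bootstrap: Var(R_A) = v_A / N."""
    return Fraction(v) / N

def var_rh(v, N, n):
    """Table 5, RH(n): Var(R_A) = (v_A / n) * (N - n) / (N - 1)."""
    return Fraction(v) / n * Fraction(N - n, N - 1)

v_A = Fraction(2, 3)   # column variance of program A in Table 1
example = {"RH(1)": var_rh(v_A, 3, 1),
           "RH(2)": var_rh(v_A, 3, 2),
           "Boot": var_boot(v_A, 3)}

# With N even and n = N/2, RH/Boot = N / (N - 1): 4/3, 20/19, 200/199.
ratios = {N: var_rh(v_A, N, N // 2) / var_boot(v_A, N) for N in (4, 20, 200)}
```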
The factor on the right side of the formula for the RH(n) variance, (N - n)/(N - 1), is known as the finite sampling correction, and it gets smaller as n increases relative to N. In the simple example, these formulas yield the following.
RH(1): In this case, $R_A$ takes on three possible values with the corresponding probabilities:

Possible average ratings   0     1     2
Probabilities              1/3   1/3   1/3

The mean of this distribution is $0(1/3) + 1(1/3) + 2(1/3) = 1 = r_A$, and its variance is $0^2(1/3) + 1^2(1/3) + 2^2(1/3) - 1^2 = 2/3$. Applying the formula for the variance for RH(1) from Table 5 gives

$$\frac{2/3}{1}\cdot\frac{3-1}{3-1} = \frac{2}{3},$$

the same value.
RH(2): In this case, $R_A$ takes on three possible values with the corresponding probabilities:

Possible average ratings   1/2   1     3/2
Probabilities              1/3   1/3   1/3

The mean of this distribution is $(1/2)(1/3) + 1(1/3) + (3/2)(1/3) = 1 = r_A$, as before. Its variance is

$$\left(\frac{1}{2}\right)^2\frac{1}{3} + 1^2\cdot\frac{1}{3} + \left(\frac{3}{2}\right)^2\frac{1}{3} - 1^2 = \frac{1/4 + 1 + 9/4}{3} - 1 = \frac{14}{12} - \frac{12}{12} = \frac{2}{12} = \frac{1}{6}.$$

Applying the formula for the variance for RH(2) from Table 5 gives

$$\frac{2/3}{2}\cdot\frac{3-2}{3-1} = \frac{1}{3}\cdot\frac{1}{2} = \frac{1}{6},$$

the same value.
Boot: In this case, $R_A$ takes on seven possible values with the corresponding probabilities:

Possible average ratings   0      1/3    2/3    1      4/3    5/3    2
Probabilities              1/27   3/27   6/27   7/27   6/27   3/27   1/27

These probabilities are found by counting the Bootstrap samples in Table 4 that yield each possible value. This is a larger set of possible average ratings for A than either of the RH methods gives, owing to the richer set of samples available under the Boot method.

The mean of this distribution is

$$0\cdot\frac{1}{27} + \frac{1}{3}\cdot\frac{3}{27} + \frac{2}{3}\cdot\frac{6}{27} + 1\cdot\frac{7}{27} + \frac{4}{3}\cdot\frac{6}{27} + \frac{5}{3}\cdot\frac{3}{27} + 2\cdot\frac{1}{27} = 1 = r_A,$$

as it is for the other two methods. The variance is

$$0^2\cdot\frac{1}{27} + \left(\frac{1}{3}\right)^2\frac{3}{27} + \left(\frac{2}{3}\right)^2\frac{6}{27} + 1^2\cdot\frac{7}{27} + \left(\frac{4}{3}\right)^2\frac{6}{27} + \left(\frac{5}{3}\right)^2\frac{3}{27} + 2^2\cdot\frac{1}{27} - 1^2 = \frac{3 + 24 + 63 + 96 + 75 + 36}{9 \cdot 27} - 1 = \frac{297}{243} - 1 = \frac{11}{9} - \frac{9}{9} = \frac{2}{9}.$$

Applying the formula for the variance for Boot from Table 5 gives $(2/3)/3 = 2/9$, the same value.
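All three distributions of $R_A$ above, with their means and variances, can be reproduced by enumeration. A minimal sketch with exact fractions (names illustrative): RH(1) and RH(2) enumerate subsets of raters, and the Bootstrap enumerates the 27 ordered samples drawn with replacement.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, product

A_ratings = [0, 2, 1]   # column A of Table 1 (raters 1, 2, 3)

def distribution(samples):
    """Probability of each possible average rating, all samples equally likely."""
    samples = list(samples)
    counts = Counter(Fraction(sum(s), len(s)) for s in samples)
    return {value: Fraction(c, len(samples)) for value, c in sorted(counts.items())}

def mean_var(dist):
    """Mean and variance (mean of squares minus squared mean) of a distribution."""
    mu = sum(v * p for v, p in dist.items())
    return mu, sum(v * v * p for v, p in dist.items()) - mu * mu

rh1 = distribution(combinations(A_ratings, 1))
rh2 = distribution(combinations(A_ratings, 2))
boot = distribution(product(A_ratings, repeat=3))
```

`mean_var` returns a mean of 1 in all three cases, with variances 2/3, 1/6, and 2/9, matching the Table 5 formulas.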
Summary of results
The mean and variance calculations as applied to this simple example illustrate the following:
(a) The RH and Boot methods are only similar when N, the number of raters rating a program, is large enough to make the difference between N and N - 1 negligible.
(b) The set of possible samples from which resampling takes place differs for the two methods; the one for the Boot method is, in general, much larger.
(c) Both methods are unbiased for the mean rating of a program, but they differ in their variances. When N is even, the variance of Boot is smaller; when N is odd, the variance of Boot lies between those for RH(n) and RH(n + 1), where n < N/2 < n + 1. This can be observed in the example: the Boot variance, 2/9, computed from the samples in Table 4, lies between 1/6 for RH(2) and 2/3 for RH(1).
(d) The Boot method usually has a much richer set of possible average ratings in its resampling distribution, and fewer ties.
References
Wolter, K. M. 1985. Introduction to Variance Estimation. New York: Springer-Verlag.
Efron, B. 1982. The Jackknife, the Bootstrap and Other Resampling Plans. Philadelphia: Society for Industrial and Applied Mathematics.
Efron, B., and Tibshirani, R. J. 1993. An Introduction to the Bootstrap. New York: Chapman & Hall.
Correlates of Reputation Analysis
The reputational quality of a program is a purely subjective measure; however, it is related to quantitative measures in the sense that quality judgments could be made on the basis of information about programs, such as the scholarly work of the faculty and the honors awarded to the faculty for that scholarship. Therefore, it may be possible to relate quality rankings for programs to quantitative measures, or to predict the rankings from them. Such predicted quality rankings would clearly also be subjective, and the accuracy of such predictions may change over time.
One way to construct such a relationship is to perform a least-squares multilinear regression. The dependent variable in the regression analysis is a set of average ratings, $r_1, r_2, \ldots, r_N$, for the N programs in a particular field. The predictors, or independent variables, are a set of quantitative or coded program characteristics represented by a vector $x_n$ for program n. The analysis constructs a function $f(x)$ that provides a predicted average rating $f(x_n)$ for program n. In this case the relation between $r_n$ and $f(x_n)$ is

$$r_n = f(x_n) + e_n = a_1 x_{1,n} + a_2 x_{2,n} + \cdots + a_m x_{m,n} + a_{m+1} + e_n, \qquad (1)$$

where $x_{1,n}, x_{2,n}, \ldots, x_{m,n}$ represent the m quantitative or coded characteristics of program n in the field, and $e_n$ is the residual, the amount by which the predicted average rating differs from the actual average rating for that program. If the prediction is "good," the residuals are relatively small. The coefficients $a_i$ are determined by minimizing the sum of the squares of the differences $r_n - f(x_n)$.
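The least-squares fit in equation (1) can be sketched without any statistics package by solving the normal equations. This is an illustration only: the appendix used the STATA package with forward stepwise selection, which this sketch does not attempt, and solving the normal equations directly is less numerically stable than the QR-based solvers in statistical software. Function names are illustrative.

```python
def fit_least_squares(X, r):
    """Ordinary least squares for r_n = a_1 x_{1,n} + ... + a_m x_{m,n} + a_{m+1}.

    X: list of N rows, each with m predictor values; r: list of N ratings.
    Returns [a_1, ..., a_m, a_{m+1}] (constant last). Solves the normal
    equations (D^T D) a = D^T r by Gauss-Jordan elimination with partial
    pivoting, where D is X with a column of ones appended."""
    D = [list(map(float, row)) + [1.0] for row in X]
    k = len(D[0])
    # Augmented matrix [D^T D | D^T r]
    M = [[sum(d[i] * d[j] for d in D) for j in range(k)]
         + [sum(d[i] * y for d, y in zip(D, r))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda row: abs(M[row][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]                      # zero here means a singular design
        M[col] = [v / p for v in M[col]]
        for row in range(k):
            if row != col:
                factor = M[row][col]
                M[row] = [x - factor * y for x, y in zip(M[row], M[col])]
    return [M[i][k] for i in range(k)]

def predict(a, x):
    """f(x) = a_1 x_1 + ... + a_m x_m + a_{m+1}."""
    return sum(ai * xi for ai, xi in zip(a, x)) + a[-1]
```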
While a single regression equation is generated from the quantitative data and the reputational scores, the selected raters of the programs introduce a certain amount of variability. This variability can be shown in the following manner. Associated with each coefficient $a_i$ is a 95% confidence interval $[L_i, U_i]$, and by randomly selecting values for the coefficients within their confidence intervals, a predicted average rating $\tilde{r}_n$ can be generated for program n. How close the set of $\tilde{r}_n$ ratings is to the fitted ratings $\hat{r}_n = f(x_n)$ can be measured by the condition

$$\lVert \hat{r} - \tilde{r} \rVert^2 < p\, s^2 F, \qquad (2)$$

where $\hat{r} = (\hat{r}_1, \hat{r}_2, \ldots, \hat{r}_N)$, $\tilde{r} = (\tilde{r}_1, \tilde{r}_2, \ldots, \tilde{r}_N)$, and $\lVert \cdot \rVert^2$ denotes the sum of squares of the components of the difference vector. The bound on the inequality, $p\,s^2 F$, is a constant derived from the regression analysis:

p = m, the number of nonconstant terms in the regression equation;
$s^2$ is the "mean square for error" given in the output of a regression program; and
F is the 95% cutoff point for the F-distribution with p and N - p degrees of freedom.

By repeating the random selection of coefficients many times, a collection of coefficient sets can be found that satisfy inequality (2), and the lower and upper bounds of this collection define an interval $[L'_i, U'_i]$ for each coefficient. For coefficients in these intervals, a range of predicted ratings can be generated.
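The coefficient-screening procedure can be sketched as follows. Assumptions: each coefficient is drawn independently and uniformly from its 95% confidence interval $[L_i, U_i]$ (the text says only "randomly selecting"), `bound` is the constant $p\,s^2 F$ supplied by the caller, and coefficients are listed in the order $a_1, \ldots, a_m, a_{m+1}$ with the constant last. Names are illustrative.

```python
import random

def screen_coefficients(X, a_hat, intervals, bound, trials=3000, seed=0):
    """Keep random coefficient sets whose predictions stay close to the fit.

    X         : list of predictor vectors x_n (m values each)
    a_hat     : fitted coefficients [a_1, ..., a_m, a_{m+1}] (constant last)
    intervals : 95% confidence intervals [(L_1, U_1), ..., (L_{m+1}, U_{m+1})]
    bound     : the constant p * s^2 * F from inequality (2)
    Returns the accepted coefficient sets and the per-coefficient
    (min, max) over them, i.e. the intervals [L'_i, U'_i] (None if empty).
    """
    rng = random.Random(seed)

    def predict(a, x):
        return sum(ai * xi for ai, xi in zip(a, x)) + a[-1]

    r_hat = [predict(a_hat, x) for x in X]
    accepted = []
    for _ in range(trials):
        # Assumption: independent uniform draws inside each interval.
        a = [rng.uniform(lo, hi) for lo, hi in intervals]
        dist2 = sum((rh - predict(a, x)) ** 2 for rh, x in zip(r_hat, X))
        if dist2 < bound:                  # inequality (2)
            accepted.append(a)
    if not accepted:
        return accepted, None
    return accepted, [(min(c), max(c)) for c in zip(*accepted)]
```

The default of 3,000 trials simply mirrors the number of random selections reported in the examples that follow.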
From the practical point of view of a program trying to estimate its quality a few years after a reputational survey is conducted, it could use a linear regression equation with coefficients in $[L'_i, U'_i]$ to generate a new range of ratings based on current program data; or, if data for all programs in the field were available, a new interquartile ranking of programs could be obtained.
The following is an example in which this method is applied to the 1995 ratings of programs in Mathematics.
Mathematics
Using the STATA statistical package, a forward stepwise least-squares linear regression was applied to a large number of quantitative variables characterizing publications, citations, faculty size and rank, research grant support, number of doctorates by gender and race/ethnicity, graduate students by gender, graduate student support, and time to degree. The following seven variables were identified as the most significant:
(ginipub) Gini Coefficient for Program Publications, 1988-92: The Gini coefficient is an indicator of the concentration of publications on a small number of the program faculty during the period 1988-92.
(phds) Total Number of Doctorates FY 86-92
(perfull) Percentage of Full Professors Participating in the Program
(persupp) Percentage of Program Faculty with Research Support (1986-92)
(perfpub) Percentage of Program Faculty Publishing in the Period 1988-1992
(ratiocit) Ratio of the Total Number of Program Citations in the Period 1988-1992 to the Number of Program Faculty
(myd) Median Time Lapse from Entering Graduate School to Receipt of Ph.D. in Years
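The Gini coefficient in the list above is a standard concentration measure. The appendix does not spell out how ginipub (or ginicit) was computed or scaled, so the sketch below uses the textbook definition (the sum of absolute differences over all ordered pairs, divided by twice the number of pairs times the mean), applied to hypothetical per-faculty publication counts: 0 means publications are spread evenly, and values approaching 1 mean a few faculty account for most of them.

```python
def gini(counts):
    """Textbook Gini coefficient: sum over all ordered pairs |x_i - x_j|,
    divided by 2 * n^2 * mean. Returns 0.0 when every count is equal."""
    n = len(counts)
    total = sum(counts)
    if n == 0 or total == 0:
        return 0.0
    mean = total / n
    abs_diffs = sum(abs(x - y) for x in counts for y in counts)
    return abs_diffs / (2 * n * n * mean)
```

For example, four faculty with equal output give 0.0, while one faculty member producing all ten publications of a four-person program gives 0.75, the maximum (n - 1)/n for n = 4.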
Results of the regression analysis are shown below. The model explains about 83% of the variation in the ratings (R-squared = 0.8304).

      Source |       SS       df       MS              Number of obs =     139
-------------+------------------------------           F(  7,   131) =   91.60
       Model |   112.36003     7   16.0514329          Prob > F      =  0.0000
    Residual |   22.954789   131   .175227397          R-squared     =  0.8304
-------------+------------------------------           Adj R-squared =  0.8213
       Total |  135.314819   138    .98054217          Root MSE      =   .4186
     quality |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        phds |   .3489197   .0544665     6.41   0.000     .2411721    .4566674
     perfull |    .008572   .0027864     3.08   0.003     .0030598    .0140842
     persupp |   .0183162   .0025146     7.28   0.000     .0133418    .0232906
     perfpub |  -.0150464   .0035235    -4.27   0.000    -.0220167   -.0080762
    ratiocit |   .0258671   .0077198     3.35   0.001     .0105955    .0411387
         myd |  -.7737551   .1995707    -3.88   0.000    -1.168553   -.3789567
     ginipub |  -.0294944   .0044222    -6.67   0.000    -.0382425   -.0207462
       _cons |   3.070145   .3625634     8.47   0.000     2.352908    3.787382
The resulting predictor equation is:

$$f(x) = 3.07 + 0.349(\text{phds}) + 0.009(\text{perfull}) + 0.018(\text{persupp}) - 0.015(\text{perfpub}) + 0.026(\text{ratiocit}) - 0.774(\text{myd}) - 0.029(\text{ginipub})$$
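The fitted equation can be evaluated with the full-precision coefficients from the regression output. A minimal sketch; the dictionary keys follow the variable names above, but the appendix does not state the units or any transformations applied to the predictors, so input values must be on the same scale as the data used in the regression. The names `MATH_COEF`, `MATH_CONST`, and `predicted_quality` are illustrative.

```python
# Coefficients from the Mathematics regression output (constant = _cons).
MATH_COEF = {
    "phds": 0.3489197, "perfull": 0.008572, "persupp": 0.0183162,
    "perfpub": -0.0150464, "ratiocit": 0.0258671, "myd": -0.7737551,
    "ginipub": -0.0294944,
}
MATH_CONST = 3.070145

def predicted_quality(program):
    """f(x) = constant + sum of coefficient * predictor value.
    `program` maps each variable name above to its value (KeyError if missing)."""
    return MATH_CONST + sum(MATH_COEF[name] * program[name] for name in MATH_COEF)
```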
It is noted that the Root Mean Square Error (RMSE) from the regression is 0.4186, and the variation in scores from the 1995 confidence interval calculation has an RMSE of 0.2277.
The following is a scatter plot of the actual 1995 ratings and the predicted ratings.

[Figure: Plot of the Predicted Faculty Quality Score Against the Actual 1995 Score for Programs in Mathematics. Horizontal axis: 1995 Score.]
The 95% confidence interval for each of the variables used in the regression can now be used to find a new estimate for the quality score. As described above, values for the coefficients in the regression equation are randomly selected in the intervals and tested to see whether that set of coefficients satisfies the relation $\lVert \hat{r} - \tilde{r} \rVert^2 < p\,s^2 F$. For the Mathematics data, the bound is $p\,s^2 F = (7)(0.4186)^2(2.09) = 2.563556$. For this example, 3,000 random selections were made in the coefficient intervals, and 220 coefficient sets satisfied the inequality. The corresponding maximum and minimum values of each coefficient are:
Coefficient   phds      persupp    ginipub     myd        perfpub     ratiocit    perfull     constant
Max           0.35469   0.018583   -0.029026   -0.7526    -0.014673   0.026686    0.0088674   3.10858
Min           0.34314   0.018049   -0.029964   -0.79495   -0.015421   0.025047    0.0082761   3.03164
Using the values in the above table, the maximum and minimum predicted quality scores can be calculated; the scores for Mathematics programs are displayed in the table below.

As described earlier, these maximum and minimum coefficient values could be used to construct new quality scores by randomly selecting the coefficients in the regression equation between the corresponding maximum and minimum values. If this is done repeatedly, a collection of quality scores, and hence of rankings, is obtained for each program, and the interquartile range of this collection could be generated. This was done 100 times, and the results are given as the Predicted Ranks in the table below, alongside the Bootstrap rankings.
                                      Quality Score     Predicted Ranks      Bootstrap Ranks
Institution                           Max     Min       1st Qtl  3rd Qtl     1st Qtl  3rd Qtl
Dartmouth College                     2.73    2.51      73       76          53       62
Boston University                     2.70    2.42      77       80          48       52
Brandeis University                   3.17    2.88      49       51          32       36
Harvard University                    4.41    4.09      8        9           2        4
Massachusetts Inst of Technology      5.27    4.93      2        2           3        4
U of Massachusetts at Amherst         3.40    3.11      38       40          54       60
Northeastern University               2.41    2.13      99       103         70       80
Brown University                      4.60    4.31      5        6           26       29
Brown University-Applied Math         4.59    4.26      6        6           14       17
University of Rhode Island            1.69    1.40      128      129         122      125
University of Connecticut             2.66    2.39      79       83          98       102
Wesleyan University                   2.31    2.09      104      107         101      110
Yale University                       3.38    3.13      38       40          7        8
Adelphi University                    1.07    0.82      138      138         130      133
CUNY - Grad Sch & Univ Center         3.38    3.10      40       41          30       32
Clarkson University                   2.49    2.21      90       94          109      118
Columbia University                   4.32    3.99      11       11          10       12
Cornell University                    4.81    4.46      3        4           14       16
New York University                   4.83    4.50      3        4           7        8
Polytechnic University                2.15    1.88      112      114         98       105
Rensselaer Polytechnic Inst           3.64    3.36      27       30          48       52
University of Rochester               3.10    2.83      52       54          56       62
State Univ of New York-Albany         2.55    2.33      85       88          82       90
State Univ of New York-Binghamton     2.55    2.33      85       87          65       75
State Univ of New York-Buffalo        3.00    2.76      57       59          61       70
State Univ of New York-Stony Brook    3.60    3.31      30       32          19       22
Syracuse University                   2.42    2.18      95       100         76       84
Princeton University                  4.52    4.21      7        7           2        3
Rutgers State Univ-New Brunswick      4.06    3.77      16       18          17       20
Stevens Inst of Technology            1.73    1.48      127      127         121      128
Carnegie Mellon University            3.63    3.33      28       31          34       40
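The repeated-draw step that produced the Predicted Ranks can be sketched as follows. Assumptions: coefficients are drawn independently and uniformly within the screened intervals $[L'_i, U'_i]$, listed in the order $a_1, \ldots, a_m$ followed by the constant; rank 1 goes to the highest predicted quality score; and quartiles use the default method of Python's `statistics.quantiles`, which may differ from the interpolation rule used for the published tables. Names are illustrative.

```python
import random
from statistics import quantiles

def predicted_rank_quartiles(X, new_intervals, draws=100, seed=0):
    """For each draw, pick every coefficient uniformly in [L'_i, U'_i]
    (constant last), score all programs, rank them (rank 1 = highest
    predicted quality score), and return each program's
    (1st quartile, 3rd quartile) rank over all draws (draws >= 2)."""
    rng = random.Random(seed)
    history = [[] for _ in X]
    for _ in range(draws):
        a = [rng.uniform(lo, hi) for lo, hi in new_intervals]
        scores = [sum(ai * xi for ai, xi in zip(a, x)) + a[-1] for x in X]
        order = sorted(range(len(X)), key=lambda n: scores[n], reverse=True)
        for rank, n in enumerate(order, start=1):
            history[n].append(rank)
    return [tuple(quantiles(h, n=4)[::2]) for h in history]
```

The default of 100 draws mirrors the number of repetitions reported in the text.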
English Language and Literature
Applying the same method to the 1995 programs in English Language and Literature gives a somewhat different result, since programs in this field do not have the same productivity characteristics as those in Mathematics. Again, forward stepwise least-squares linear regression was applied to a large number of quantitative variables, and the following were identified as the most significant:
(nopubs2) Number of Publications During the Period 1985-1992
(perfawd) Percentage of Program Faculty with at Least One Honor or Award for the Period 1986-1992
(acadplan) Total Number of Doctorates FY 1986-1992 with academic employment plans at the 4-year college or university level
(ginicit) Gini Coefficient for Program Citations, 1988-1992: The Gini coefficient is an indicator of the concentration of citations on a small number of the program faculty during the period 1988-1992.
(nocits1) Number of Citations During the Period 1981-1992
(fullprof) Percentage of Full Professors Participating in the Program
(empplan) Total Number of Doctorates FY 1986-1992 with Employment Plans
None of the variables identified in the Mathematics regression are present in this regression analysis.

Results of this regression analysis are shown below. The model explains about 81% of the variation in the ratings (R-squared = 0.8106).
      Source |       SS       df       MS              Number of obs =     117
-------------+------------------------------           F(  7,   109) =   66.65
       Model |   83.985691     7   11.9979559          Prob > F      =  0.0000
    Residual |  19.6227839   109    .18002554          R-squared     =  0.8106
-------------+------------------------------           Adj R-squared =  0.7984
       Total |  103.608475   116   .893176507          Root MSE      =  .42429

        q93a |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     nopubs2 |   .1202936   .1017753     1.18   0.240    -.0814218     .322009
     perfawd |   .0326877   .0041423     7.89   0.000     .0244777    .0408977
    acadplan |   .7961931   .2416467     3.29   0.001     .3172573    1.275129
     ginicit |  -.0007486   .0001839    -4.07   0.000     -.001113   -.0003842
     nocits1 |   .0827859   .0234272     3.53   0.001      .036354    .1292178
    fullprof |   .2942413   .1096454     2.68   0.008     .0769276     .511555
     empplan |   -.599897   .2698761    -2.22   0.028    -1.134783   -.0650113
       _cons |   1.955276   .1533968    12.75   0.000     1.651249    2.259304
The resulting predictor equation is:

$$f(x) = 1.955 + 0.12(\text{nopubs2}) + 0.033(\text{perfawd}) + 0.796(\text{acadplan}) - 0.001(\text{ginicit}) + 0.083(\text{nocits1}) + 0.294(\text{fullprof}) - 0.6(\text{empplan})$$
The following is a scatter plot of the Random Halves draw from the 1995 rankings and the predicted ranking for that draw.

For programs in English Language and Literature, the Root Mean Square Error (RMSE) from the regression is 0.42429, and the variation in scores from the 1995 confidence interval calculation has an RMSE of 0.2544.
[Figure: Plot of the Predicted Faculty Quality Score Against the Actual 1995 Score for Programs in English Language and Literature. Horizontal axis: 1995 Score.]
As in Mathematics, the 95% confidence interval for each of the variables used in the regression can be used to determine a new estimate for the quality score. In this case, the bound is $p\,s^2 F = (7)(0.42429)^2(2.18) = 2.747136$. For this example, 3,000 random selections were again made in the coefficient intervals, and 242 coefficient sets satisfied the inequality. The corresponding maximum and minimum values of each coefficient are:
Coefficient   nopubs2   perfawd    acadplan   ginicit    nocits1    fullprof   empplan    constant
Max           0.13384   0.033239   0.82835    -0.00072   0.085903   0.30883    -0.56399   1.97569
Min           0.10684   0.03214    0.76425    -0.00077   0.079689   0.27975    -0.63557   1.935
As in the example for Mathematics programs, the maximum and minimum values of the coefficients can be used to calculate the maximum and minimum Predicted Quality Scores for the programs in English Language and Literature. These scores are displayed in the table below.
By repeating the exercise described for Mathematics, randomly selecting coefficient values in the maximum-minimum intervals a large number of times, an interquartile range of ranks can be generated for programs in English Language and Literature. This was again done 100 times, and the results are given as the Predicted Ranks in the table below, alongside the Random Halves rankings.
                                      Quality Score     Predicted Ranks      Random Halves Ranks
Institution                           Max     Min       1st Qtl  3rd Qtl     1st Qtl  3rd Qtl
University of New Hampshire           2.74    2.56      91       93          70       77
Boston College                        2.57    2.42      96       98          59       64
Boston University                     3.80    3.59      20       21          38       42
Brandeis University                   3.63    3.40      19       21          44       55
Harvard University                    5.55    5.05      1        1           2        3
U of Massachusetts at Amherst         3.84    3.51      30       34          38       43
Tufts University                      2.35    2.22      108      110         67       74
Brown University                      4.21    3.78      15       16          13       15
University of Rhode Island            2.39    2.22      113      115         94       113
University of Connecticut             3.26    3.05      53       57          79       87
Yale University                       5.07    4.52      5        6           2        3
CUNY - Grad Sch & Univ Center         3.50    3.21      42       48          18       19
Columbia University                   4.90    4.24      9        10          7        9
Cornell University                    4.71    4.16      13       13          6        8
St John's University                  1.93    1.86      127      127         119      122
Fordham University                    2.38    2.23      103      106         104      112
New York University                   3.59    3.25      26       28          18       20
Drew University                       2.30    2.15      116      119         123      126
University of Rochester               3.30    3.02      30       33          44       48
State Univ of New York-Binghamton     3.01    2.72      62       64          65       69
State Univ of New York-Buffalo        3.65    3.16      30       37          25       27
State U of New York-Stony Brook       3.17    2.77      48       55          46       52
Syracuse University                   2.53    2.38      95       98          71       76
Indiana Univ of Pennsylvania          2.19    1.93      124      126         122      124
Princeton University                  4.82    4.39      5        6           12       14
Rutgers State Univ-New Brunswick      3.96    3.62      22       23          16       18
Carnegie Mellon University            3.17    3.01      33       35          52       54