4
The Overall Rating of Program Quality
The dimensional measures provide a summary of program performance along individual
dimensions that are of importance in doctoral education. The overall rating combines the
variables that make up the dimensional measures into a single measure. In addition to reflecting
the faculty preferences in each field as derived from the faculty questionnaire, it includes the
results of the importance measures derived from the rating survey. This section describes in non-
technical terms how the overall rating of a program is calculated. Readers who wish more
technical detail are referred to Appendix A.
THE OVERARCHING IDEA
There is a great deal of uncertainty in the ratings of the quality of programs. Uncertainty
can come from a variety of sources. For example, although many academics may think that they
can identify the top five or ten programs in their field, this certainty about perceived quality
decreases as more and more programs are included. Furthermore, one program may be strong in
one area while a second program’s strengths may lie in a different area. Faculty asked to rate
programs may differ in their views about the importance of these strengths, and the programs
may differ in various characteristics, many of which may be considered important to the
perceived quality of a doctoral program.
Describing this uncertainty was a key task of the predecessor committee that produced
Assessing Research-Doctorate Programs: A Methodology Study.22 This committee examined the
methodology of the 1995 study and recommended that the next study rely more explicitly on
22
National Research Council. Assessing Research-Doctorate Programs: A Methodology Study. Washington, D.C.,
2003.
PREPUBLICATION COPY—UNEDITED PROOFS


program data. It also contained two key recommendations as to how the methodology of
obtaining reputation measures should be revised:
“The next study should have sufficient resources to collect and analyze auxiliary
information from peer raters and the programs being rated to give meaning and context to
the rating ranges that are obtained for the programs….” (p. 5)
and
“Re-sampling methods should be applied to ratings to give ranges of rankings for each
program that reflect the variability of ratings by peer raters. The panel investigated two
related methods, one based on Bootstrap re-sampling and another closely related method
based on Random Halves, and found that either method would be appropriate.” (p. 5)
The dimensional ratings, described in the previous section, fulfill the first recommendation. This
section describes how the second recommendation was followed and combined with the first to
obtain an overall rating for each program within a field.
THE OVERALL APPROACH
A schematic description of the overall approach appears in Box 4-1 and is described in
the text:

Box 4-1
Faculty Students Institutions and Programs Existing Data
1. DATA
More than 5,000 doctoral programs in 222 institutions in 61 fields across the sciences,
engineering, social sciences, arts, and humanities.
Institutional practices, program characteristics, and faculty and student demographics.
Obtained through a combination of original surveys and existing data sources (NSF surveys
and ISI publication and citation data).
2. WEIGHTS
In two surveys, program faculty provided the NRC with information on what they value most
in Ph.D. programs:
1) Faculty were asked directly how important they felt 21 items in a list of program
characteristics were.
2) A sample of faculty rated a sample of programs in their field. These ratings were then
related through regressions to the same items as appeared in 1).
3. ANALYSIS
“Direct” and “regression-based” weights provided by faculty were averaged into one combined
set of weights, reflecting the multidimensional views faculty hold about contributing
factors to the quality of doctoral programs.
4. RANGES OF RANKINGS
Each program’s rating was calculated 500 times by randomly selecting half of the raters from
the faculty sample in Step 2 and also incorporating statistical and measurement variability.
Similarly, 500 samples of direct weights were selected. Combined weights were then applied
to 500 randomly selected sets of program data to produce ratings for each program. These
ratings for each of the 500 samples determine a rank ordering of the programs.
A “range of rankings” was then constructed showing the middle half of calculated rankings.
What may be compared, among programs in a field, is this range of rankings.


Faculty were surveyed to get their views on the importance of different characteristics of
programs as measures of quality. Ratings were based on faculty members’ views of how those
measures related to program quality, as discussed in the chapter on dimensional measures. The
views were related to program quality using two distinct methods: (1) directly, through answers
to questions on the faculty survey; and (2) regression-based, obtained by asking faculty raters to
provide program ratings for a sample of programs in a field and then relating these ratings,
through a regression model that corrected for correlation among the characteristics, to data on the
program characteristics. The two methods approach the ratings from different perspectives. The
direct approach is a “bottom-up” approach that builds up the ratings from the importance that
faculty members gave to specific program characteristics independent of reference to any actual
program. The regression-based method is a “top-down” approach that starts with ratings of
actual programs and uses statistical techniques to infer the weights given by the raters to specific
program characteristics. The direct approach is idealized. It asks about the characteristics that
faculty feel contribute to quality of doctoral programs without reference to any particular
program. The regression-based approach presented each respondent with 15 programs in his or her
field and asked for ratings of program quality,23 but respondents were not explicitly asked about
the basis of their ratings.
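The contrast between the two kinds of weights can be illustrated with a minimal sketch. It is not the committee's procedure: the function names, the normalization of direct weights to sum to 1, and the use of plain least squares are all assumptions made for illustration. In particular, the regression described above corrected for correlation among the characteristics, which ordinary least squares does not do.

```python
import numpy as np

def direct_weights(importance_scores):
    """Bottom-up weights (hypothetical helper).

    importance_scores: (n_respondents, n_characteristics) array of stated
    importance. Averages across respondents and normalizes to sum to 1.
    """
    mean_importance = np.asarray(importance_scores, dtype=float).mean(axis=0)
    return mean_importance / mean_importance.sum()

def regression_weights(characteristics, mean_ratings):
    """Top-down weights (hypothetical helper).

    characteristics: (n_programs, n_characteristics) program data.
    mean_ratings: (n_programs,) average rating each program received.
    Fits ratings on standardized characteristics by ordinary least squares
    and returns the slope coefficients (the intercept is dropped).
    """
    x = np.asarray(characteristics, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0)          # put measures on one scale
    design = np.column_stack([np.ones(len(z)), z])    # intercept + measures
    coef, *_ = np.linalg.lstsq(
        design, np.asarray(mean_ratings, dtype=float), rcond=None
    )
    return coef[1:]
```

The direct weights never look at any actual program, while the regression weights are inferred entirely from how rated programs differ in their data, which is the distinction the text draws between "bottom-up" and "top-down".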
Because it turned out that these different approaches gave results that were similar in
magnitude24 but not strongly correlated25, the two views of the importance of program
characteristics were combined26 to obtain an overall view (or combined weight) for each
measured program characteristic. The sum of these weighted characteristics yielded a rating for
each program. As is explained below, each rating is recalculated 500 times using different
samples of raters. The program ratings obtained from all these calculations can then be arranged
23
The question given raters about program quality was:
On a scale from 1 to 6, where 1 equals not adequate for doctoral education and 6 equals a
distinguished program, how would you rate this program?
1 = Not Adequate for Doctoral Education
2 = Marginal
3 = Adequate
4 = Good
5 = Strong
6 = Distinguished
9 = Don’t Know Well Enough
24
That is, the resulting direct and regression-based weights were similar in magnitude.
25
For any given measure, the results from the two methods are not highly correlated with one another, permitting us
to assume that the results from the two approaches are statistically independent.
26
If there were no uncertainty, the weights would simply be averaged. Because there is uncertainty, the optimal
combined weight is not a simple average but takes into account the variances of the separate coefficients. See
equations (19) and (20) in Appendix A and the related discussion.

in rank order and, in conjunction with all the ratings from all the other programs in the field, used
to determine a range of possible rankings.
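As a rough illustration of how a variance-aware combined weight and a rating might be computed, the sketch below uses inverse-variance averaging, a textbook way to combine two independent estimates. This is an assumption made for illustration and is not claimed to reproduce equations (19) and (20) of Appendix A; the function names and inputs are hypothetical.

```python
import numpy as np

def combine_weights(w_direct, var_direct, w_regression, var_regression):
    """Inverse-variance average of two independent weight estimates.

    Assumed rule, for illustration only. With equal variances it reduces
    to the simple average of the two weights.
    """
    precision_d = 1.0 / np.asarray(var_direct, dtype=float)
    precision_r = 1.0 / np.asarray(var_regression, dtype=float)
    weighted = (precision_d * np.asarray(w_direct, dtype=float)
                + precision_r * np.asarray(w_regression, dtype=float))
    return weighted / (precision_d + precision_r)

def program_ratings(measures, weights):
    """Rating = sum over characteristics of weight x standardized measure.

    measures: (n_programs, n_characteristics) raw program data.
    """
    x = np.asarray(measures, dtype=float)
    z = (x - x.mean(axis=0)) / x.std(axis=0)   # standardize each measure
    return z @ np.asarray(weights, dtype=float)
```

The less certain of the two estimates pulls the combined weight less, which is one way the "not a simple average" remark in the footnote can be read.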
Because of the various sources of uncertainty, which are discussed at greater length in
Appendix A, each ranking is expressed as a range of values. These ranges were obtained by
taking into account the different sources of uncertainty in these ratings (statistical variability
from the estimation, program data variability, and variability among raters). The measure of
uncertainty is expressed by reporting the end points of the inter-quartile range of rankings for
each program; that is, the range that contains the middle half of a large number of ratings
calculations that take uncertainty into account.27 An example of the derivation of rankings for a
program is given in Chapter 5.
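The idea of a range of rankings can be sketched in a simplified form. The example below resamples random halves of the raters, rates each program by the mean score from the sampled raters, ranks the programs in each draw, and reports the 25th and 75th percentile ranks. It is a simplification under stated assumptions: every rater is assumed to score every program, and only rater variability is resampled, whereas the procedure described here also recomputes weights and incorporates statistical and program data variability. The function name and defaults are hypothetical.

```python
import numpy as np

def range_of_rankings(rater_scores, n_draws=500, seed=0):
    """Inter-quartile range of each program's rank under random-halves resampling.

    rater_scores: (n_raters, n_programs) array, higher score = better.
    Returns (low, high): the 25th and 75th percentile rank of each program,
    where rank 1 is the top-rated program in a draw.
    """
    scores = np.asarray(rater_scores, dtype=float)
    n_raters, n_programs = scores.shape
    rng = np.random.default_rng(seed)
    ranks = np.empty((n_draws, n_programs), dtype=int)
    for d in range(n_draws):
        # Draw half of the raters without replacement.
        half = rng.choice(n_raters, size=n_raters // 2, replace=False)
        rating = scores[half].mean(axis=0)
        # Best rating first; assign ranks 1..n_programs in that order.
        order = np.argsort(-rating, kind="stable")
        ranks[d, order] = np.arange(1, n_programs + 1)
    low = np.percentile(ranks, 25, axis=0)
    high = np.percentile(ranks, 75, axis=0)
    return low, high
```

A program on which raters agree gets a narrow range, while one on which they disagree gets a wide range. The percentiles here use NumPy's default interpolation, which approximates rather than exactly reproduces discarding the top and bottom 125 of 500 values.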
In summary, we obtain a range of rankings for each program in a given field by first
obtaining two sets of weights through two different methods, direct and regression-based. We
then standardize all the measures to put them on the same scale and obtain ratings by multiplying
the value of the standardized measure by the weights. We obtain both the direct weights and
coefficients from regressions through calculations carried out 500 times, each time with a
different set of faculty, to generate a distribution of ratings that reflects their uncertainties. We
obtain the range of rankings for each program by trimming the bottom quarter and the top quarter
of the 500 rankings to obtain the inter-quartile range. This method of calculating ratings and
rankings takes into account variability in rater assessment of what contributes to program quality
within a field, variability in values of the measures for a particular program, and the range of
error in the statistical estimation. It is important that these techniques give us a range of rankings
for most programs. We do not know the exact ranking for each program, and to try to obtain
one—by averaging, for example—could be misleading, because we have not imposed any
particular distribution on the range of rankings.28 The database that presents the range of
rankings for each program will list the programs alphabetically and give the range for each
program. Users are encouraged to look at groups of programs that are in the same range as their
own programs, as well as programs whose ranges are above or below, in trying to answer the
question, “Where do we stand?”
The next section provides an example of how the ranges of rankings were calculated for a
particular program.
27
The inter-quartile range eliminates the top and bottom 125 ratings calculated from 500 regressions and 500
samples of direct weights from faculty. It is a range that contains half of all the rankings for a program.
28
For example, most of the rank-ordered ratings could be at the top of the range.