Copyright © 2009. National Academy of Sciences. All rights reserved.
DATA NEEDS FOR PROGRAM EVALUATION
To assess the numerical adequacy of the nation's biomedical and behavioral research
personnel and to make judgments about the quality of their training, we need both
quantitative and qualitative information. Timely, accurate, and relevant information is
essential to the success of this effort. A number of data sets are maintained by NIH and
other federal agencies that are directly relevant to the responsibilities of the Committee on
Biomedical and Behavioral Research Personnel. Unfortunately, some of the sets are
complex and difficult to manipulate. Even when one manages to retrieve information from
them, the information's quality is sometimes questionable. We propose to extract from
them and from other sources a data set that is tailored to the information needs of this
committee and potentially to those of NIH.
The proposed "evaluation data matrix" could form the core of a management
information system for use in tracking and evaluating the National Research Service
Award (NRSA) program. The difficulties encountered by this committee in simply
attempting to correlate the historical levels of prior committee recommendations with
actual NRSA awards by field suggest that this tracking has been difficult at best. What is
needed is a coordinated systematic data set that provides descriptive and comparative
statistics relevant to the committee. It should be coordinated and systematic in its use of
common taxonomies and measures that satisfy the needs of the committee. It should be
descriptive of programs at a level of detail that the committee deems appropriate,
providing information about characteristics, both of the programs and of their
participants, and it should be comparative across programs and through time.
The needed data set can be conceptualized simply as a time series of matrices whose
rows represent program categories (or "activity codes," as they are called by NIH) such as
F31: predoctoral individual NRSA fellowship. The columns represent characteristics of
the programs and their participants. For example, one of the nine program characteristics
requested was median length of training time. Thus, a cell in this column would represent
the median training time of the program type that the particular row represents. Primarily
for the benefit of future versions of our committee, we have undertaken an initial design
and pilot construction of such a data matrix. Our design of both rows and columns is
shown below.
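As an illustration only, the time series of matrices described above can be sketched in Python as a mapping from fiscal year to a matrix whose rows are activity codes and whose columns are characteristics. The record layout, the sample values, and the helper name `build_matrix` are hypothetical and are not drawn from any NIH file; F31 is the row named in the text, and T32 is added here only as a second example row.

```python
from collections import defaultdict
from statistics import median

# Hypothetical trainee records: (fiscal_year, activity_code, months_of_training).
# The values are invented for illustration and are not NIH data.
records = [
    (1987, "F31", 36),
    (1987, "F31", 48),
    (1987, "F31", 60),
    (1987, "T32", 24),
    (1988, "F31", 42),
]

def build_matrix(records):
    """Return {year: {activity_code: {column_name: value}}}."""
    # Collect training lengths for each (year, row) pair.
    grouped = defaultdict(list)
    for year, code, months in records:
        grouped[(year, code)].append(months)

    # One matrix per year; each row holds its column values.
    matrix = defaultdict(dict)
    for (year, code), months in grouped.items():
        matrix[year][code] = {
            "number_of_recipients": len(months),
            "median_length_of_training": median(months),
        }
    return dict(matrix)

matrix = build_matrix(records)
print(matrix[1987]["F31"]["median_length_of_training"])
```

Keying the outer mapping by year keeps each year's matrix separate, so the same cell can be compared across programs within a year or through time for a single program.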
In order to gain an idea of the magnitude and feasibility of the project, we
contracted with a firm that is experienced in working with the relevant data bases to
undertake a pilot construction.
Results: The contracted firm concluded that, while some items of the data matrix
could be constructed fairly readily, others would involve a greater level of effort to
construct. In principle, at least, data exist with which to construct all cells of the 39 by 20
matrix. The exercise revealed several important problems, of most of which the committee
had been aware. The five most important problems with the existing data sets are
described briefly in the following section.
Problems in the Data Sets: The data sets are subject to a number of criticisms, five
of which are most serious for our purposes.
1. Access: Since our study was located in NAS/NRC's Office of Scientific and
Engineering Personnel (OSEP), which also houses the Survey of Earned
Doctorates (SED) and the Survey of Doctorate Recipients (SDR), there was no
problem in accessing these data.
However, there are some difficulties in accessing the major NIH data sets.
Although our staff have direct access, it appears preferable to
work through the NIH staff. The Information for Management, Planning,
Analysis and Coordination (IMPAC) file is the primary source of financial
and other data on Public Health Service (PHS) extramural programs; data are
organized by fiscal year. The Information Systems Branch of the Division of
Research Grants (DRG) responds to requests for data regarding these PHS
activities. In addition, DRG provides annual data that are used by the
National Research Council (NRC) to update the Trainee Fellow File (TFF)
and Consolidated Grant Applicant File (CGAF). These two files are
organized by individual recipients of traineeships, fellowships, or grants.
Data derived by DRG from the IMPAC file are considered to be "official"
data, while data derived by others from TFF and CGAF may be considered
for some purposes to be "unofficial." A written request for data to be
extracted from the IMPAC file was sent on behalf of the committee to DRG.
Some materials were received from DRG, although most were not at the level
of detail needed for the data matrix. Some unofficial data were extracted by
the contracted firm from the TFF and CGAF files and provided to the
committee, forming the basis of the statistical profile of training programs in
Chapter 2.
2. Quality of CGAF and TFF files: These two files rearrange the IMPAC
information to identify all the training received by a single individual (TFF)
and all the research grants given to an individual principal investigator
(CGAF). This information is essential for the committee to determine the
subsequent research participation of those who have received NRSA research
training and to do longitudinal studies of participation in NRSA research.
Some concerns have been expressed by DRG and NIH about the quality of
these two data sets. Thus, we recommend that NIH evaluate the accuracy of
a sample of the data sets.
3. Classification of race/ethnicity and sex: Apparently there are problems of
nonreporting and incorrect reporting of gender and race/ethnicity data on
the IMPAC file. A representative of DRG discourages the use of these data.
This is most discouraging in light of the clear need to monitor progress of
women and minorities in science. We recommend further investigation of
data quality, including the matching of individuals across sources of data
and over time in an effort to resolve inconsistent reporting of data.
4. Classification of training field: The definition of fields of science presents
several problems. The Discipline/Specialty/Field (D/S/F) codes in the
IMPAC file apparently have not been coded consistently across the various
institutes of NIH and ADAMHA. The DRF, derived from the SED, provides
data only for Ph.D.s and does not take into account persons with degrees in
one field who are receiving additional training in another. The D/S/F codes
need additional investigating; it may be the case that they provide
sufficiently accurate data at the broad field levels of biomedical sciences
and behavioral sciences, but we cannot be certain without further
investigation. We strongly recommend that NIH investigate the accuracy of
this classification and, if necessary, design better centralized quality control
methods and apply them to these classifications.
5. Response rates in the SDR: These rates have been extremely low for many
years, varying across fields from the high 40s to the low 70s. We recognize
the complexities and problems of attrition in a longitudinal data set such as
this but recommend that ways be found to improve the rates. An NAS panel
on the NSF data system, of which the SDR is a part, recently has made a
similar recommendation.8 We understand that a study of nonresponse bias in
the SDR currently is under way within NAS/NRC and await its results with
interest.
8 C. F. Citro and G. Kalton (eds.), Surveying the Nation's Scientists and Engineers: A Data
System for the 1990s, Washington, D.C.: National Academy Press, 1989.
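The matching of individuals across sources and over time, recommended in the list above for race/ethnicity and sex data, could begin with a check like the following minimal sketch. It assumes that a common person identifier is available across sources, which the text does not establish; the identifiers, field layout, values, and the helper name `find_inconsistent` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical extracts: (person_id, source, year, reported_sex).
# Identifiers, field layout, and values are invented for illustration.
reports = [
    ("p1", "IMPAC", 1985, "F"),
    ("p1", "SED", 1986, "F"),
    ("p2", "IMPAC", 1985, "M"),
    ("p2", "SED", 1987, "F"),
    ("p3", "IMPAC", 1986, None),  # nonreporting
    ("p3", "SED", 1988, "M"),
]

def find_inconsistent(reports):
    """Return {person_id: sorted distinct reported values} where reports disagree."""
    values = defaultdict(set)
    for person_id, _source, _year, value in reports:
        # Missing values are nonreporting, not disagreement, so skip them.
        if value is not None:
            values[person_id].add(value)
    return {pid: sorted(vals) for pid, vals in values.items() if len(vals) > 1}

conflicts = find_inconsistent(reports)
print(conflicts)
```

A check of this kind only flags disagreement; deciding which reported value is correct would still require the further investigation the committee recommends.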
Preliminary List of Evaluation Data Matrix
Rows: Program Categories
1. Basic Biomedical Science
a. Predoctoral
o Individual
oo NRSA Fellowships
oo Other
o Institutional
oo MARC Undergraduate
oo NRSA Traineeships
oo Other
b. Postdoctoral
o Individual
oo NRSA Fellowships
oo Career Development Awards (K07, K08)
oo Other
o Institutional
oo NRSA Traineeships
oo Other
2. Behavioral Sciences
a. Predoctoral
o Individual
oo NRSA Fellowships
oo Other
o Institutional
oo MARC Undergraduate
oo NRSA Traineeships
oo Other
b. Postdoctoral
o Individual
oo NRSA Fellowships
oo Career Development Awards (K07, K08)
oo Other
o Institutional
oo NRSA Traineeships
oo Other
3. Clinical Sciences
a. Predoctoral
o Individual
oo NRSA Fellowships
oo Other
o Institutional
oo NRSA Traineeships
oo Other
b. Postdoctoral
o Individual
oo NRSA Fellowships
oo Career Development Awards (KXX series, including K11, K15)
oo Other
o Institutional
oo NRSA Traineeships
oo Other
PRELIMINARY LIST OF MATRIX COLUMNS: CHARACTERISTICS OF PROGRAMS
AND PARTICIPANTS
1. Program characteristics in given year
a. Goals
b. Number of institutions involved
c. Number of recipients
d. Median length of training
e. Total enrollment
f. Total cost
g. Cost per recipient month
h. Median number of trainees per institution
i. Publication counts of faculty in primary department(s) of program
2. Participant characteristics in given year (median, except as noted)
a. Baccalaureate selectivity scores (A. Astin)
b. Quality rating (NRC, 1982) of doctoral department
c. Quality rating of primary department in postdoctoral programs
d. GRE scores
e. Percent female
f. Percent Asian/Pacific Islander
g. Percent other minority
h. Number of publications in first K postdoctoral years
i. Number of citations in first K postdoctoral years
j. Percent who apply for research grants in first K post-doctoral years
k. Percent who receive research grants in first K post-doctoral years
l. Percent in academia K years after termination of training
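Several participant characteristics above are defined over the first K postdoctoral years. The following minimal sketch shows one way such a column might be computed, using the percent who receive research grants as the example. The record layout, the sample values, and the function name `percent_with_grant_within` are hypothetical, and the window is assumed here to run from the year after training ends through year K, a choice the text does not specify.

```python
# Hypothetical participants: the year training ended and the years in which
# a research grant was received. All values are invented for illustration.
participants = [
    {"training_end": 1980, "grant_years": [1983, 1986]},
    {"training_end": 1980, "grant_years": []},
    {"training_end": 1981, "grant_years": [1989]},
    {"training_end": 1982, "grant_years": [1984]},
]

def percent_with_grant_within(participants, k):
    """Percent of participants with at least one grant in years 1..k after training."""
    if not participants:
        return 0.0
    hits = sum(
        any(0 < year - p["training_end"] <= k for year in p["grant_years"])
        for p in participants
    )
    return 100.0 * hits / len(participants)

print(percent_with_grant_within(participants, 5))
```

Leaving K as a parameter allows the same column to be recomputed for different follow-up windows without changing the underlying records.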