Suggested Citation:"DATA PREPARATION." National Research Council. 1998. Doctoral Scientists and Engineers in the United States: 1995 Profile. Washington, DC: The National Academies Press. doi: 10.17226/9524.

population of 594,300. Across strata, however, the rates ranged from 4 to 67 percent. The range in sampling rates serves to increase the variance of the survey estimates.

Data Collection

In 1995, there were two phases of data collection: a mail survey and a telephone follow-up interview for nonrespondents to the mail survey. Phase 1 consisted of two mailings of the survey questionnaire, with a reminder postcard between the mailings. The first mailing was in May 1995 and the second (using Priority Mail) in July 1995. To encourage participation, all survey materials were personalized with the respondent’s name and address. The mail survey achieved a response rate of about 62 percent.

Phase 2 consisted of conducting computer-assisted telephone interviewing (CATI) on a 60-percent sample of nonrespondents to the mail survey (the CATI subsample). Telephone numbers were located for about 90 percent of the subsample and interviews were completed with 63 percent. Telephone interviewing was conducted between November 1995 and February 1996.
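The two-phase figures above can be combined in a short arithmetic sketch. This is an illustration only: the text does not report a combined rate, and the sketch assumes that the 63 percent completion figure applies to the whole CATI subsample and that the quoted percentages are exact. The function names are hypothetical.

```python
# Figures quoted in the text, expressed as proportions.
MAIL_RESPONSE_RATE = 0.62       # Phase 1 mail survey
CATI_SUBSAMPLE_FRACTION = 0.60  # share of mail nonrespondents sent to CATI
CATI_COMPLETION_RATE = 0.63     # assumed to apply to the whole CATI subsample


def unweighted_completion_share(mail_rate, subsample_fraction, cati_rate):
    """Share of the original sample that ends with a completed case."""
    nonrespondents = 1.0 - mail_rate
    return mail_rate + nonrespondents * subsample_fraction * cati_rate


def weighted_response_rate(mail_rate, cati_rate):
    """Rate when each CATI subsample case is weighted up to stand in
    for all mail nonrespondents (an assumption, not stated in the text)."""
    return mail_rate + (1.0 - mail_rate) * cati_rate


unweighted = unweighted_completion_share(
    MAIL_RESPONSE_RATE, CATI_SUBSAMPLE_FRACTION, CATI_COMPLETION_RATE
)
weighted = weighted_response_rate(MAIL_RESPONSE_RATE, CATI_COMPLETION_RATE)
print(f"unweighted completion share: {unweighted:.3f}")
print(f"weighted response rate:      {weighted:.3f}")
```

The two numbers differ because subsampling leaves 40 percent of mail nonrespondents without follow-up; weighting the CATI cases compensates for that in estimates but does not add completed cases.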

Data Preparation

As completed mail questionnaires were received, they were logged into a receipt control system that kept track of the status of all cases. Coding staff then carried out a variety of checks and prepared the questionnaires for data entry. Specifically, they resolved incomplete or contradictory answers, reviewed “other specify” responses for possible backcoding to a listed response, and assigned numeric codes to open-ended questions (e.g., employer name). A coding supervisor validated the coders’ work.

Once cases were coded, they were sent to data entry. The data entry program contained a full complement of range and consistency checks for entry errors and inconsistent answers. The range and consistency checks were also applied to the CATI data via batch processing. Further computer checks were performed to test for inconsistent values; these were corrected and the process repeated until no inconsistencies remained.
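A minimal sketch of what range and consistency checks of this kind might look like follows. The field names (`case_id`, `salary`, `employment_status`), the salary bounds, and the specific rules are all hypothetical; the actual checks used in the survey are not described in the text.

```python
def check_record(record):
    """Return a list of problem descriptions for one survey record."""
    problems = []

    # Range check: a reported salary must fall inside plausible bounds
    # (the bounds here are illustrative assumptions).
    salary = record.get("salary")
    if salary is not None and not (0 <= salary <= 999_999):
        problems.append("salary out of range")

    # Consistency check: a case coded as not employed should not
    # also carry a positive salary.
    if record.get("employment_status") == "not employed" and salary not in (None, 0):
        problems.append("salary reported for a not-employed case")

    return problems


def find_inconsistent(records):
    """Map case_id to its list of problems, for cases with any problem."""
    flagged = {}
    for record in records:
        problems = check_record(record)
        if problems:
            flagged[record["case_id"]] = problems
    return flagged
```

In a batch run, `find_inconsistent` would be applied to the whole file, the flagged cases corrected, and the run repeated until it returns an empty dictionary, mirroring the repeat-until-clean process described above.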

At this point, the survey data file was ready for imputation of missing data. As a first step, basic frequency distributions were produced to show the nonresponse rate for each question; these rates were generally less than 3 percent, with the exception of salary, which was 6 percent. Two methods for imputation were adopted. The first, cold decking, was used mainly for demographic variables that are static, i.e., not subject to change. Using this method, historical data provided by respondents in previous years were used to fill a missing response. In cases where no historical data were available, and for nondemographic variables (such as employment status, primary work activity, and salary), hot decking was used. Hot decking involved creating cells of cases with common characteristics (through the cross-classification of auxiliary variables) and then selecting a donor at random from the same cell for the case with the missing value. As a general rule, no data value was imputed from a donor in one cell to a recipient in another cell.
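The two imputation methods can be sketched as follows. This is a simplified illustration, not the survey's actual procedure: the record layout, the `case_id` key, the `history` lookup, and the choice of cell variables are hypothetical, and cases whose cell has no donor are simply left missing here.

```python
import random
from collections import defaultdict


def cold_deck(record, history, field):
    """Fill a missing static field from the respondent's own prior-year data."""
    if record.get(field) is None:
        prior = history.get(record["case_id"], {})
        if prior.get(field) is not None:
            record[field] = prior[field]
    return record


def hot_deck(records, field, cell_vars, rng=None):
    """Fill missing values of `field` from a random donor in the same cell.

    Cells are formed by cross-classifying the auxiliary variables named
    in `cell_vars`. Donors never cross cell boundaries.
    """
    rng = rng or random.Random()

    # Build donor pools from reported values only, before any imputation,
    # so an imputed value is never reused as a donor.
    donors = defaultdict(list)
    for r in records:
        if r.get(field) is not None:
            donors[tuple(r[v] for v in cell_vars)].append(r[field])

    for r in records:
        if r.get(field) is None:
            pool = donors.get(tuple(r[v] for v in cell_vars))
            if pool:  # leave the value missing if the cell has no donor
                r[field] = rng.choice(pool)
    return records
```

Passing a seeded `random.Random` as `rng` makes the donor selection reproducible, which is useful when an imputation run has to be audited or repeated.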
