National Academies Press: OpenBook

The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs (2002)

Chapter: Methodological Issues in the Measurement of Work Disability

« Previous: Conceptual Issues in the Measurement of Work Disability
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 211
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 212
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 213
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 214
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 215
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 216
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 217
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 218
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 219
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 220
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 221
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 222
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 223
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 224
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 225
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 226
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 227
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 228
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 229
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 230
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 231
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 232
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 233
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 234
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 235
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 236
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 237
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 238
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 239
Suggested Citation:"Methodological Issues in the Measurement of Work Disability." Institute of Medicine and National Research Council. 2002. The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs. Washington, DC: The National Academies Press. doi: 10.17226/10411.
×
Page 240

Below is the uncorrected machine-read text of this chapter, intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text of each book. Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

Methodological Issues in the 1 Measurement of Work Disability Nancy A. Mathiowetz, Ph.D.2 The collection of information about persons with disabilities presents a particularly complex measurement issue because of the variety of con- ceptual paradigms that exist, the complexity of the various paradigms, and the numerous means by which alternative paradigms have been operationalized in different survey instruments (see paper by Jette and Badley for a review). For example, disability is often defined in terms of environmental accommodation of an impairment; hence, two individuals with the same impairment may not be similarly disabled or share the same perception of their impairment. For an individual with mobility limitations who lives in an assisted-living environment that accommo- dates the impairment, the environmental adaptations may result in little or no disability. The same individual living on the second floor of an apartment building with no elevator may have a very different percep- tion of the impairment and may see him- or herself as disabled because of the environmental barriers that exist within his or her immediate environ- ment. The Social Security Administration (SSA) is currently reengineering its disability claims process for providing benefits to blind and disabled 1This paper was originally prepared for the committee workshop titled “Workshop on Survey Measurement of Work Disability: Challenges for Survey Design and Method” held on May 27–28, 1999 in Washington, D.C. (IOM, 2000). 2Nancy Mathiowetz is an Associate Professor at the University of Maryland’s Joint Pro- gram in Survey Methodology. 211

212 THE DYNAMICS OF DISABILITY persons under the Social Security Disability Insurance (SSDI) and Supple- mental Security Income (SSI) programs. As part of the effort to redesign the claims process, SSA has initiated a research effort designed to address the growth in disability programs, including the design and conduct of the Disability Evaluation Study (DES). The DES will provide SSA with comprehensive information concerning the number and characteristics of persons with impairments severe enough to meet SSA’s statutory defini- tion of disability, as well as the number and characteristics of people who are not currently eligible but who could be eligible as a result of changes in the disability decision process. For those years in which the DES is not conducted, SSA will need to monitor the potential pool of applicants. One means by which SSA can monitor the size and characteristics of potential beneficiaries is through other ongoing federal data collection efforts. For both the conduct of the DES and the monitoring of the pool of potential beneficiaries through the use of various data collection efforts, it is critical to understand the measurement error properties associated with the iden- tification of persons with disabilities as a function of the essential survey conditions under which the data have been and will be collected. The extent to which alternative instruments designed to measure persons with disabilities map to various eligibility criteria under consideration by SSA is also important. BACKGROUND The collection of disability data is an evolving field. Although a large and growing number of scales attempt to measure functional status and work disability, little is known about the measurement error properties of various questions and composite scales. The empirical literature provides clear evidence of variation in the estimates of the number of persons with disabilities in the United States, depending upon the conceptual para- digm of interest, the analytic objectives of the particular measurement process, and the essential survey conditions under which the information is collected (e.g., Haber, 1990; McNeil, 1993; Sampson, 1997). This litera- ture suggests that estimates of the disabled population not only are re- lated to the conceptual framework underlying the measurement construct but are also a function of the essential survey conditions under which the measurement occurred, including the specific questions used to measure disability, the context of the questions, the source of the information (self- versus proxy response), variations in the mode and method of data collec- tion, and the sponsor of the data collection effort. Furthermore, terms such as impairment, disability, functional limitation, and participation are of- ten inconsistently used, resulting in different and conflicting estimates of prevalence. Attempts to measure not only the prevalence but also the

METHODOLOGICAL ISSUES IN THE MEASUREMENT OF WORK DISABILITY 213 severity of an impairment or disability further complicate the measure- ment process. Recent shifts in the conceptual paradigm of disability, in which dis- ability is viewed as a dynamic process rather than a static measure and as an interaction between an individual with an impairment and the envi- ronment rather than as a characteristic only of the individual, imply that those responsible for the development of disability measures must sepa- rate the measurement of the impact of environmental factors in the enablement-disablement process from the measurement of ability. View- ing disability as a dynamic state resulting from an interaction between a person’s impairment and a particular environmental context further com- plicates the assessment of the quality of various survey measures of dis- ability, specifically, the reliability of a measure. As a dynamic characteris- tic, one would anticipate changes in the reports of disability as a function of changes in the individual as well as changes in the social and environ- mental contexts. The challenge for the measurement process is to disen- tangle true change from unreliability. This workshop comes at a time when the federal government is un- dertaking several initiatives with respect to the measurement of disability in federal data collection efforts. The Americans with Disability Act of 1990 (ADA) defines disability as (1) a physical or mental impairment that substantially limits one or more of the major life activities of the indi- vidual, (2) a record of a substantially limiting impairment, or (3) being regarded as having a substantially limiting impairment. Although the measurement of disability within household surveys is not bound by the ADA definition, the passage of the ADA provides a socioenvironmental framework for how society comprehends and uses terms such as disability and impairment (e.g., the popular press and court rulings on ADA-related litigation). These definitions will evolve as a function of litigation related to ADA legislation and presentation of that litigation in the press. Hence, society is entering a period in which potential dynamic shifts in the com- prehension and interpretation of the language associated with the mea- surement of persons with disabilities can be anticipated. This paper is intended to serve as a means of facilitating discussion among individuals from diverse theoretical and empirical disciplines con- cerning the methodological issues related to the measurement of persons with disabilities. As a first step to achieving this goal, a common language and framework needs to be established for the enumeration and assess- ment of the various sources of error that affect the survey measurement process. The chapter draws from several empirical investigations to pro- vide evidence as to the extent of knowledge concerning the error proper- ties associated with various approaches to the measurement of functional limitations and work disability.

214 THE DYNAMICS OF DISABILITY SOURCES OF ERROR IN THE SURVEY PROCESS: THE SURVEY RESEARCH PERSPECTIVE For the purpose of defining a framework that can be used to examine error associated with the measurement of persons with disabilities, I draw upon the conceptual structure and language used by Groves (1989), based on earlier work of Kish (1965) and used by Andersen et al. (1979). Suchman and Jordan (1990) have described errors in surveys as the discrepancy between the concept of interest to the researcher and the quantity actually measured in the survey. Bias, according to Kish (1965, p. 509), refers to systematic errors in a statistic that affect any sample taken under a speci- fied survey design with the same constant error or, as stated by Groves (1989), is the type of error that affects the statistic in all implementations of a survey. Variable errors are those errors that are specific to a particular implementation of a design, that is, specific to the particular trial. The concept of variable error requires the possibility of repeating the survey, with changes in the units of replication, that is, the particular set of re- spondents, interviewers, supervisors, coding, editing, and data entry staff. Errors of Nonobservation Within the framework of survey methodology, both variable error and bias are further characterized in terms of errors of nonobservation and errors of observation. As one would expect from the term, errors of nonobservation reflect failure to obtain observations for some segment of the population or for all elements to be measured. Errors of non- observation are most often classified as arising from three sources: sam- pling, coverage, and nonresponse. Sampling Error Sampling error represents one type of nonobservation variable error; it arises from the fact that measurements (observations) are taken for only a subset of the population. Sampling variance refers to changes in the value of some statistic over possible replications of a survey in which the sample design is fixed but different individuals are selected for the sample. Estimates based on a particular sample will not be identical to estimates based on a different subset of the population (selected in the same manner) or to estimates based on the full population.

METHODOLOGICAL ISSUES IN THE MEASUREMENT OF WORK DISABILITY 215 Coverage Error Coverage error defines the failure to include all eligible population members on the list or frame used to identify the population of interest. Those members not identified on the frame have a zero probability of selection and are never measured. For example, in the United States, ap- proximately 5 percent of the population lives in households without tele- phone service; any survey that is conducted by telephone and that at- tempts to describe the entire household-based population of the United States therefore suffers from coverage error. To the extent that those with- out telephones differ from those with telephones for the construct of in- terest, the resulting estimates will be biased. Nonresponse Error Nonresponse error can arise from failure to obtain any information from the persons selected to be measured (unit nonresponse) or from failure to obtain complete information from all respondents to a particu- lar question (item nonresponse). The extent to which nonresponse affects survey statistics is a function of both the rate of nonresponse and the difference between respondents and nonrespondents, as illustrated in the following formula:  nr  y r = y n +   ( y r − y nr )  n where: yr = the statistic estimated from the r respondents, yn = the statistic estimated from all n sample cases, ynr = the statistic estimated from the nr nonrespondents, and nr = the proportion of nonrespondents. Knowing the response rate is not sufficient to determine the level of nonresponse bias; studies with both high and low rates of nonresponse can suffer from nonresponse bias. As noted by Groves and Couper (1998), it is useful to further distin- guish among the types of unit nonresponse, each of which may be related to the failure to measure different types of persons. For most household data collection efforts involving interviewers, the final outcome of an interview attempt is often classified into one of the following four catego- ries: completed or partial interview, refusal, noncontact, and other non-

216 THE DYNAMICS OF DISABILITY interview.3 Survey design features can affect the distribution of cases across the various categories. Noncontact rates are affected by the length of the field period (in which short field periods result in higher noncontact rates than longer field periods). Surveys that place greater demands on the respondent may suffer from higher refusal rates than less burden- some instruments. The choice of respondent rule affects the rate of non- response; designs that permit any knowledgeable adult within the house- hold to serve as the respondent provide an interviewer with some flexibility, should one adult within the household refuse or be unable to participate. Field efforts that fail to accommodate non-English-speaking respondents or that focus their attention on frail subpopulations tend to experience higher rates of other noninterviews. Errors of Observation Observational errors can arise from any of the elements directly en- gaged in the measurement process, including the questionnaire, the re- spondent, and the interviewer, as well as the characteristics that define the measurement process (e.g., the mode and method of data collection). This section briefly reviews the theoretical framework and empirical find- ings related to the various sources of measurement error in surveys. Questionnaire as Source of Measurement Error Tourangeau (1984) and others (see Sudman et al. [1996] for a review) have categorized the survey question-and-answer process as a four-step process involving comprehension of the question, retrieval of information from memory, assessment of the correspondence between the retrieved information and the requested information, and communication of the response. In addition, the encoding of information, a process outside the control of the survey interview, determines a priori whether the informa- tion of interest is available for the respondent to retrieve. Comprehension of the question involves the assignment of meaning to the question by the respondent. Ideally, the question will convey the meaning of interest to the researcher. However, several linguistic, struc- tural, and environmental factors affect the interpretation of the question by the respondent. These factors include the specific wording of the ques- 3Other noninterview is used to classify cases in which contact was made with the members of the household in which the sample person resides, but for reasons such as physical or mental health, language difficulties, or other reasons not related to reluctance to participate, the interviewer was unable to conduct the interview.

METHODOLOGICAL ISSUES IN THE MEASUREMENT OF WORK DISABILITY 217 tion, the structure of the question, the order in which the questions are presented, the overall topic of the questionnaire, whether the question is read by the respondent (self-administration) or is presented to the respondent by an interviewer, and the mode of communication used by the interviewer (that is, telephone versus face-to-face presentation). The wording of a question is often seen as one of the major problems in survey research: although one can standardize the language read by the respon- dent or the interviewer, standardization of the language does not imply standardization of the meaning. For example, “Do you own a car?” appears to be a simple question from the perspective of semantics and structure. However, several of the words in the question are subject to variation in interpretation, including “you” (just the respondent or the respondent and his or her family), “own” (completely paid for, purchased as opposed to rented), and even the word “car” (does this include vans and trucks?). The goal for the questionnaire designer is to develop questions that exhaust the range of possible interpretations, making sure that the par- ticular concept of interest is the concept that the respondent has in mind when responding to the item. One source of variation in a respondent’s comprehension of survey questions is due to differences in the perceived intent or meaning of the question. Perceived intent can be shaped by the sponsorship of the sur- vey, the overall topic of the questionnaire, or the environment more im- mediate to the question of interest, such as the context of the previous question or set of questions or the specific response options associated with the question. Respondent as Source of Measurement Error Once the respondent comprehends the question, he or she must retrieve the relevant information from memory, make a judgment as to whether the retrieved information matches the requested information, and communicate a response. Much of the measurement error literature has focused on the retrieval stage of the question-answering process, clas- sifying the lack of reporting of an event as retrieval failure on the part of the respondent and comparing the characteristics of events that are re- ported with those that are not reported. Several factors have been found to be related to the quality of reporting, including the length of the refer- ence period of interest and the salience of the information. For example, the literature suggests that the greater the length of the recall period, the greater the expected bias in the reporting of episodic information (e.g., Cannell et al., 1965; Sudman and Bradburn, 1973). Salience is hypoth- esized to affect the strength of the memory trace and, subsequently, the effort involved in retrieving the information from long-term memory.

218 THE DYNAMICS OF DISABILITY The weaker the trace, the greater the effort needed to locate and retrieve the information. As part of the communication of the response, the respondent must determine whether he or she wishes to reveal the information as part of the survey process. Survey instruments often ask questions about socially and personally sensitive topics. It is widely believed and well documented that such questions elicit patterns of underreporting (for socially undesir- able behavior and attitudes), as well as overreporting (for socially desir- able behaviors and attitudes). The determination of social desirability is a dynamic process and is a function of the topic of the question, the imme- diate social context, and the broader social environment at the time the question is asked. Even if the respondent is able to retrieve accurate infor- mation, he or she may choose to edit this information at the response formation stage as a means of reducing the costs associated with reveal- ing the information. The use of proxy reporters, that is, asking individuals within sampled households to provide information about other members of the house- hold, is a design decision that is often framed as a trade-off among costs, sampling errors, and nonsampling errors. The use of proxy informants to collect information about all members of a household can increase the sample size (and hence reduce the sampling error) at a lower marginal data collection cost than increasing the number of households. The use of proxy respondents also facilitates the provision of information for those who would otherwise be lost to nonresponse because of an unwillingness or inability to participate in the survey interview. However, the cost asso- ciated with the use of proxy reporting may be an increase in the rate of errors of observation associated with poorer-quality reporting for others compared with the quality that would have been obtained under a rule of all self-response. Most of the evaluations of the quality of proxy responses compared with the quality of self reports have focused on the reporting of autobio- graphical information (e.g., Mathiowetz and Groves, 1985; Moore, 1988) with some recent investigations examining the convergence of self and proxy reports of attitudes (Schwarz and Wellens, 1997). The literature is, however, for the most part silent with respect to the quality of proxy reports for personal characteristics, the exception being a small body of literature that addresses self-reporting versus proxy reporting effects in the reporting of race/ethnicity (Hahn et al., 1996) and the reporting of activities of daily living (e.g., Mathiowetz and Lair, 1994; Rodgers and Miller, 1997). The findings suggest that proxy reports of functional limita- tions tend to be higher than self-reports; the research is inconclusive as to whether the discrepancy is a function of overreporting on the part of proxy informants, underreporting on the part of self-respondents, or both.

METHODOLOGICAL ISSUES IN THE MEASUREMENT OF WORK DISABILITY 219 Interviewers as Sources of Measurement Error For interviewer-administered questionnaires, interviewers may affect the measurement processes in one of several ways, including: • failure to read the question as written; • variation in interviewer’s ability to perform the other tasks associated with interviewing, for example, probing insufficient responses, selecting appropriate respondents, and recording the information provided by the respondent; and • demographic and socioeconomic characteristics as well as voice characteristics that influence the behavior of the respondent and the responses provided by the respondent. The first two factors contribute to measurement error from a cognitive or psycholinguistic perspective in that different respondents are exposed to different stimuli; thus, variation in responses is, in part, a function of the variation in stimuli. All three factors suggest that the interviewer effect contributes to an increase in variable error across interviewers. If all interviewers erred in the same direction (or their characteristics resulted in errors of the same direction and magnitude), interviewer bias would result. For the most part, the literature indicates that among well-trained interview staff, interviewer error contributes to the overall variance of estimates as opposed to resulting in biased estimates (Lyberg and Kasprzyk, 1991). Other Essential Survey Conditions as Sources of Measurement Error Any data collection effort involves decisions concerning the features that define the overall design of the survey, referred to here as the “essen- tial survey conditions.” In addition to the sample design and the wording of individual questions and response options, these decisions include the following: • whether to use interviewers or to collect information via some form of self-administered questionnaire; • the means for selecting and training interviewers (if applicable); • the mode of data collection for interviewer administration (tele- phone versus face to face); • the method of data collection (paper and pencil, computer assisted); • whether to contact respondents for a single interview (cross- sectional design) or follow respondents over time (longitudinal or panel design);

220 THE DYNAMICS OF DISABILITY • for longitudinal designs, the frequency and periodicity of mea- surement; • the identification of the organization for whom the data are col- lected; and • the identification of the data collection organization. No single design feature is clearly superior with respect to overall data quality. For example, as noted above, interviewer variance is one source of variability that can be eliminated through the use of a self- administered questionnaire. However, the use of an interviewer may aid in the measurement process by providing the respondent with clarifying information or by probing insufficient responses. The use of a panel sur- vey design, with repeated measurements with the same individuals, fa- cilitates more efficient estimation of change over time (compared with the use of multiple cross-sectional samples); however, panel designs may be subject to higher rates of nonresponse (as a result of nonresponse at every round of data collection) or panel conditioning bias, an effect in which respondents alter their reporting behavior as a result of exposure to a set of questions during an earlier interview. The following scenario is an illustration of statistical measures of error used by survey methodologists. Assume that the measure of interest is personal earnings among all adults in the United States. A “true value” exists if the construct of interest is carefully defined. The data will be collected as part of a household-based health survey being conducted by telephone. The decision to use the telephone for data collection implies that approximately 5 percent of the adults will not be eligible for selec- tion. To the extent that the personal earnings of adults without telephones differ significantly from those with telephones, population-based esti- mates for the entire adult population will suffer from coverage bias. Simi- larly, not all eligible sample persons will participate in the interview because of refusal to cooperate, an inability on the part of the survey organization to contact the respondent, or other reasons, such as lan- guage barriers or poor health that limits participation. Once again, to the extent that the earnings of those who participate differ significantly from those who do not participate, population-based estimates of earnings will suffer from nonresponse bias. If all respondents misreport their earnings, underreporting their earn- ings by 10 percent, and they consistently do so in response to repeated measures, the measure will be reliable but not valid and population esti- mates based on the question (e.g., population means) would be biased. However, multivariate model-based estimates that examine the relation- ship between earnings and human capital investment would not be biased, since all respondents erred in the same direction and relative

METHODOLOGICAL ISSUES IN THE MEASUREMENT OF WORK DISABILITY 221 magnitude. Differential response error, for example, the overreporting of earnings by low-income individuals and the underreporting of earnings by high-income individuals, may produce unbiased population estimates (e.g., mean earnings per person) but biased model-based estimates related to individual behavior. MEASUREMENT ERROR: THE PSYCHOMETRIC PERSPECTIVE The language and concepts of measurement error in psychometrics are different from the language and concepts used within the fields of survey methodology and statistics. The focus for psychometrics is on variable errors; from the perspective of classical true score theory, all questions produce unbiased estimates, but not necessarily valid estimates, of the construct of interest. The confusion arises in that both statistics and psychometrics use the terms validity and reliability to sometimes refer to very similar concepts and to sometimes refer to concepts that are quite different. Within psychometrics, the terms validity and reliability are used to describe two types of variable error. Validity refers to “the correlation between the true score and the respondent’s answer over trials” (Groves, 1991, p. 8). The validity of a measure can be assessed only for the popula- tion, whereas the validity of both population estimates and individuals’ responses presented in the survey methodological literature can be assessed. Reliability refers to the ratio of the true score variance to the observed variance, where variance refers to variability over persons in the popula- tion and over trials within a person (Bohrnstedt, 1983). Once again, the measurement of reliability from this perspective does not facilitate mea- surement for a person but produces a measure of reliability specific to the particular set of individuals for whom the measurement was taken. The psychometric literature identifies several means by which valid- ity can be assessed; the choice of measures is, in part, a function of the purpose of the measurement. These measures of validity include content, construct, concurrent, predictive, and criterion. If one considers that the questions included in a particular instrument represent a sampling of all questions that could have been included to measure the construct of inter- est, content validity refers to the comprehensiveness as well as the rel- evance of those questions. Content validity refers to the extent to which the question or questions reflect the domain or domains reflected in the conceptual definition. Face validity refers to the extent to which each item appears to measure that which it purports to measure. Cognitive inter- viewing techniques that focus on the comprehension of items by respon- dents are, to some extent, a test of face validity. Criterion-related validity evaluates the extent to which the measure of interest correlates highly with a “gold standard.” The gold standard

222 THE DYNAMICS OF DISABILITY could consist of a different self-reported measure, a behavioral measure, or an observation or evaluation outside the measurement process (e.g., clinical evaluation). Criterion-related validity is further categorized as concurrent validity or predictive validity. Concurrent validity refers to the correlation between the item of interest and some other item, event, or behavior measured at the same point in time, whereas predictive validity refers to the correlation between an indicator measured at time t and some other measure, event, or behavior measured at time t + 1. When no gold standard exists, validity is evaluated in terms of the correlation between the measure of interest and other measures, accord- ing to theory-based hypotheses. As noted by McDowell and Newall (1996), “construct validation begins with a conceptual definition of the topic or construct to be measured, indicating the internal structure of its components and the theoretical relationship of scale scores to external criteria” (p. 33). Measures of reliability include internal consistency (often referred to as coefficient Alpha or Cronbach’s Alpha), test-retest, and interrater reli- ability. Internal consistency measures the extent to which all items in a scale measure the same underlying concept; it is only applicable for multi- item Likert scales. The reliability coefficient is a function of both the extent to which the items are homogeneous and the number of items in the scale; the coefficient increases with an increase in either the homogeneity of the items or the number of items. Test-retest reliability involves the measure- ment of the same person under the same measurement conditions at two points in time and can be used for single-item measures, as well as multi- item scales.4 Interrater reliability refers to the consistency with which different raters or observers rating the same person agree with one another. Returning to the example of the measurement of earnings to illustrate the measurement error properties of the construct in terms of psycho- metrics, assume that the question or questions designed to measure earn- ings are both comprehensive and relevant. Therefore, the questions would be assessed as having content validity (face validity). If, as noted above, all respondents underreported their earnings by 10 percent, the construct would have a lower score with respect to criterion validity, but since all respondents erred in the same direction and the same magnitude, the indicator would have construct validity. If repeated measurement resulted in consistent reports by all respondents, test-retest measures would indi- cate a high degree of reliability, not dissimilar to the conclusion drawn by statisticians. 4Within survey research, the conduct of a reinterview under the same essential survey conditions as the original interview is an example of a test-retest assessment of reliability.

METHODOLOGICAL ISSUES IN THE MEASUREMENT OF WORK DISABILITY 223 POTENTIAL SOURCES OF MEASUREMENT ERROR SPECIFIC TO PERSONS WITH DISABILITIES Similar to any other measurement of persons via the survey process, the identification of persons with disabilities is subject to the various sources of error discussed above. The measurement of persons with dis- abilities raises particular challenges, in light of the complexity of the phe- nomenon of interest and the demands of the measurement process. Some of the various sources that may be of particular importance are high- lighted. Coverage, Access, and Participation The interactive nature of the survey interview places great demands on the sensory and physical resources of respondents. A face-to-face inter- view requires that the respondent have the capacity to hear the questions, respond orally, understand individual questions and response categories, and be able to maintain cognitive focus. In addition, the respondent must tolerate the physical demands of the interview, a task that may take up to an hour or two. Impairments or disabilities may limit a person’s ability to participate in the survey process or limit access to the individual. The essential survey design features of a data collection effort can facilitate or limit access and participation of persons with disabilities. This is not unique to the measurement of persons with impairments or disabilities. The use of the telephone for data collection restricts the sample to those households with telephones; if the data collection by telephone does not accommodate the use of TTY technology, hearing-impaired individuals will also not be measured. Similarly, the use of self-administered paper and pencil questionnaires limits participation to those who are literate and whose vision permits the reading of the font size used on the ques- tionnaire. The implementation of a self-response rule eliminates from measurement those for whom gatekeepers deny access and those, although they are willing to participate, who are unable to do so because of physical, mental, or emotional impairments or those for whom the barrier to participation is language, either their use of a different spoken language or their use of sign language. Cognition and the Measurement of Persons with Disabilities From a cognitive perspective, the measurement of persons with dis- abilities offers particular challenges. First, one needs to understand how individuals encode information about impairments and disabilities. In addition, effective questionnaire design requires an understanding of how

224 THE DYNAMICS OF DISABILITY the encoding of the information varies according to perceptual perspec- tive (self-response versus other response, nature of the relationship between the respondent and the person for whom they are reporting). Second, little is known about how ability (capacity) is measured indepen- dent of environmental context (participation). Many of the questions and sets of questions used to measure impair- ments and disability are plagued by comprehension problems related to both semantic and lexical complexity. For example, questions concerning work disability are subject to comprehension problems with respect to the shared meaning of “work.” As noted earlier, the respondent must infer whether limitations in the kind or amount of work include factors related to transportation and access to the workplace. The desire for parsimoni- ous means by which an individual’s status can be assessed with respect to impairments or particular functional limitations has led to the creation of “composite” screening questions that nevertheless represent a single ques- tion and that may therefore be cost-effective, even though they press against the limits of working memory.5 The response task requires the respondent to retrieve information, determine the relevance of that information to the posed question, and formulate a response. Often the respondent is limited in the form of the response to a simple classification (e.g., “yes,” limited in the kind or amount of work versus not limited) that fails to capture the full spectrum of the enablement-disablement process and the complexity of the phe- nomenon of interest. The mapping of this complex phenomenon to a limited number of response categories is most likely fraught with error. The integration of theories of cognitive psychology with survey meth- odology has given rise to new methods of questionnaire design and evalu- ation. Many of the current measures of disability used in federal data collection efforts have not been subjected to testing methods common to new questions and questionnaires, for example, cognitive interviewing and behavior coding. Cognitive interviewing encompasses several tech- niques designed to elicit information about the respondent’s comprehen- sion of the question, the strategies by which the respondent attempts to retrieve information from memory, judgments as to whether the retrieved information meets the perceived goals of the question, and the formula- tion of responses. These techniques include the use of “think-aloud” pro- tocols, follow-up probes, vignettes, and “sort-order” tasks (Forsyth and Lessler, 1991; Willis et al., 1991). 5For example: “Because of a physical, mental or emotional problem does anyone in the family have any difficulty with activities such as bathing, dressing, eating, getting in or out of a chair or bed, or walking across a room?”

METHODOLOGICAL ISSUES IN THE MEASUREMENT OF WORK DISABILITY 225 A small body of literature has attempted to address problems in the comprehension of functional limitation questions in community-based survey interviews through the use of cognitive interviewing techniques (Jobe and Mingay, 1990; Keller et al., 1993). The findings from these inves- tigations of functional limitation questions by use of cognitive interview- ing techniques suggest that respondents varied in their interpretation of terms, tended to emphasize capacity rather than actual performance, over- looked qualifying statements within the question, failed to remember the use of human assistance, or failed to remember help with specific activities.6 Social Cognition, Self-Concept, and Social Desirability What is meant when an individual is asked to classify him- or herself or someone else with respect to disability? Although reliable measure- ment may call for the use of clear, unambiguous, and objective defini- tions, it is questionable whether these goals are achievable with respect to the measurement of disability. Disability is a dynamic concept related to an underlying interface between an individual, societal accommodations and barriers, cultural norms and expectations, and behavioral norms. The use of “fuzzy logic” in which attributes apply only partially to given individuals may be more appropriate than standard survey techniques for the classification of disability (Hahn et al., 1996). Although theories from cognitive psychology can provide informa- tion about the different cognitive processes by which self and proxy re- porters engage in the response formulation process, one can turn to theo- ries from social cognition to understand how individuals classify themselves and each other with respect to social categories. Although social cognition draws heavily from the theory and methods of cognitive psychology, as a subfield its focal point is on social objects, specifically, individuals or groups of individuals. As noted by Brewer, In comparison to object categories, social categories have been postulated to be overlapping rather than hierarchically organized . . ., disjunctively rather than conjunctively defined . . . and more susceptible to accessibil- ity effects. (Brewer, 1988, p. 1) 6See also Beatty and Davis (1998) for a cognitive evaluation of questions from the Survey of Income and Program Participation and the National Health Interview Survey concerning discrepancies in print reading disability statistics.

226 THE DYNAMICS OF DISABILITY She further states that “social categories are assumed to be ‘fuzzy sets’ represented in the form of prototypical images rather than verbal trait lists” (Brewer, 1988, p. 10). Social cognition also provides a theoretical perspective that provides information about divergent perspectives of actors and observers. The actor-observer difference suggests that actors draw on situational infor- mation to explain behavior at any given time, whereas observers use stable disposition properties of the actor to understand behavior (Jones and Nisbett, 1971). To the extent that proxy reporters view disabilities as stable as opposed to dynamic characteristics, one would anticipate dis- crepancies between self-reports and proxy reports. Two sets of concepts drawn from social psychology are also useful for consideration with respect to the measurement of disability. The first is the concept of self; from a sociological perspective, self-conceptions involve three components: (1) how an individual sees him- or herself, (2) how other people actually see the individual, and (3) how the indi- vidual believes others see him or her (Rosenberg, 1990). The National Health Interview Survey-Disability Supplement (NHIS-D) and the National Organization on Disability/Harris Survey of Americans with Disabilities included questions that asked whether the respondent perceived that he or she had a disability and whether others perceived that the respondent had a disability. The second concept of interest involves the notion of social identity and the groups, statuses, and social categories to which the members of society are recognized as belonging. If the social identity category is ambiguous, the self-concept related to the social identity will also be ambiguous. As noted by Jette and Badley in their paper, the measurement of disability is often presented in surveys as an “all or nothing phenom- enon.” This approach to the measurement of disability assumes that (1) the respondent recognizes and identifies with the socially defined label and (2) is willing to reveal membership in the group. If disability were an “all- or-nothing” phenomenon, identification with the classification would be less ambiguous; however, as already noted, the enablement-disablement process is a dynamic one, subject to variation as a function of both self and society. To the extent that identification or affiliation with group member- ship carries with it any type of social stigma, willingness to reveal mem- bership in the group also carries with it a social cost, not unlike other phenomena subject to social desirability bias. Ambiguous social classification categories are also more likely to be subject to context effects; respondents use the specific wording of ques- tions, the immediately prior questions, or the overall focus of the question as a means for interpreting questions on disability. From a theoretical perspective, it is not surprising to find that estimates of the number of

METHODOLOGICAL ISSUES IN THE MEASUREMENT OF WORK DISABILITY 227 persons with disabilities vary as a function of differences in the specific wording of the question, the number of questions used to determine the prevalence and severity of impairments and disabilities, the context of the questions immediately proximate to the question of interest, and the over- all focus of the questionnaire (health versus employment versus program participation). EMPIRICAL EVIDENCE CONCERNING MEASUREMENT OF DISABILITY ERROR To date, most investigations with respect to the error properties associ- ated with the measurement of persons with disabilities or the measurement of persons with work disabilities have focused on errors of observation, ignoring differences in estimates due to coverage error and nonresponse error. This review of the empirical literature is therefore focused on errors of observation. As an illustration of the type of empirical investigations concerning error in the measurement of disability, this section begins by examining the work that has been done to date with respect to measures of activities of daily living (ADL). The intent is to provide an illustration of the type of work that has been done (and not done) with respect to a frequently used measure of functional limitation. The focus is then turned to the measurement of persons with work disabilities. Measurement of ADLs, Functional Limitations, and Sensory Impairments Although there are several different measurement methods for the assessment of physical disability, one of the most often used (within the context of survey measurement) is the Index of Activities of Daily Living, often referred to as the Index of ADL (Katz et al., 1963). The index was originally developed to measure the physical functioning of elderly and chronically ill patients, but several national surveys of the general popu- lation administer the index to adults of all ages. The index assesses inde- pendence in six activities: bathing, dressing, toileting, transferring from a bed or chair, continence, and feeding. Despite its wide acceptance and use, the psychometric properties of the index have not been well docu- mented. Brorsson and Asberg (1984) reported reliability scores of 0.74 to 0.88 (based on 100 patients). Katz et al. (1970) applied the Index of ADLs as well as other indexes to a sample of patients discharged from hospitals for the chronically ill and reported correlations between the index and a mobility scale and between the index and a confinement measure of 0.50 and 0.39, respectively. Most assessments of the Index of ADLs have exam- ined the predictive validity of the index with respect to independent liv-

228 THE DYNAMICS OF DISABILITY ing (e.g., Katz and Akpom, 1976) or the length of hospitalization and discharge to home or death (e.g., Ashberg, 1987). These studies indicate relatively high levels of predictive validity. Despite the psychometric findings, a growing body of survey litera- ture suggests that the measurement of functional limitations via the use of ADL scales is subject to substantial amounts of measurement error and that measurement error is a significant factor in the apparent improve- ment or decline in functional health observed in longitudinal data. Jette (1994) found that minor changes in the wording of the questions resulted in significant differences in the percentage of the population identified as being limited. Rodgers and Miller (1997) directly compared responses by the same respondents (or more specifically, for the same target individu- als) by using different sets of ADL items and across different modes.7 They conclude that the measurements of functional limitations with re- spect to counts of ADLs, indications of the use of assistive devices or personal help, and indications of any difficulty are all subject to large amounts of measurement error, of which a substantial portion is random error. Similar to other empirical work (e.g., Mathiowetz and Lair, 1994), their findings indicate that the use of proxy respondents results in higher levels of reporting, of which only 25 to 33 percent can be explained by demographic characteristics and health variables of the target individual. The finding suggests that higher levels of functional limitations reported by proxy respondents are not simply a result of selection bias, in which those with the most severe limitations are reported by proxy.8 Their analy- ses also suggest that there was no clear effect of mode of data collection on estimates of functional limitations. As illustrative of the variability and lack of reliability that is evident in survey estimates of functional limitations, Tables 1 and 2 present find- ings from the 1990 decennial census and the Content Reinterview Survey (CRS) (U.S. Bureau of the Census, 1993; McNeil, 1993). The CRS was conducted approximately 5 to 9 months following the 1990 decennial 7Note, however, that the allocation across modes was not experimentally varied but rather was an artifact in the design in which older respondents (80 years and older) were assigned to the face-to-face mode of data collection and those less than 80 years of age were assigned to the telephone mode of data collection. However, a substantial number of re- spondents were interviewed in the mode other than that to which they were originally assigned; the crossover permits determination of both main and interaction effects related to the mode of data collection. 8In comparisons of self-reports and proxy reports with clinical evaluations, Rubenstein et al. (1984) found self response to be more “optimistic” and responses obtained by proxy report to be more pessimistic, findings which suggest that both self and proxy responses are subject to measurement error, albeit in different directions.

METHODOLOGICAL ISSUES IN THE MEASUREMENT OF WORK DISABILITY 229 TABLE 1 Mobility Limitations: Distributions to Census Question 19a and Content Reinterview Survey Question 34a, Persons 16 to 64 Years of Age, United States, 1990 Content Reinterview Survey: Difficulty Going Outside Census Long Form: Difficulty Going Outside Yes No Total Yes 146 155 301 No 152 14,194 14,346 Total 298 14,349 14,647 NOTE: Prevalence rate based on Census: 2.03 percent, of which 49.0 percent were consis- tent responses. The prevalence rate based on the CRS: 2.05 percent, of which 48.5 percent were consistent responses. SOURCE: McNeil, 1993. TABLE 2 Self-Care Limitations: Distributions to Census Question 19b and Content Reinterview Survey Question 34b, Persons 16 to 64 Years of Age, United States, 1990 Content Reinterview Survey: Difficulty Going Outside Census Long Form: Difficulty Taking Care of Personal Needs Yes No Total Yes 69 346 415 No 120 13,856 13,976 Total 189 14,202 14,391 NOTE: The prevalence rate based on census: 2.9 percent, of which 16.6 percent were consistent responses. The prevalence rate based on the Content Reinterview Survey: 1.3 percent, of which 36.5 percent were consistent responses. SOURCE: McNeil, 1993. census, with a sample of 15,000 housing units selected from among those housing units assigned to complete the long form of the census. With respect to mobility limitations, estimates from the two surveys appear to be similar (e.g., 2.03 versus 2.05 percent), but examination of the responses for individuals indicates a low rate of consistent responses (less than 50 percent) among those who reply affirmatively for either survey. With respect to personal care limitations, once again, a high rate of inconsis- tency in the responses is seen among individuals who respond affirma-

230 THE DYNAMICS OF DISABILITY tively to the question in either survey. For example, among those 16 to 64 years of age, almost all (83.4 percent) of those who report a self-care limitation at the time of the census fail to report a self-care limitation in the CRS. Comparison of the percentage of persons with mobility and self-care limitations from the two surveys is confounded by differences in the es- sential survey conditions under which the data were collected and that most likely contribute to the discrepancies evident in the data. These differences include: • Differences in the mode of data collection. The decennial census is, for the most part, a self-administered questionnaire, whereas the CRS is interviewer administered and is conducted either by tele- phone (84 percent) or as a face-to-face interview (16 percent). McHorney et al. (1994) report that telephone administration of the SF-36 led to lower levels of reporting of chronic conditions and self-reports of poor health compared with a self-administered ver- sion of the SF-36. • Differences in the context in which the questions were asked. Although the wording of the specific items is almost the same with respect to mobility limitations or self-care limitations, as can be seen from a comparison of the two questionnaires, the context in which the questions are asked differs in the two instruments. Sev- eral additional questions concerning sensory impairments, the use of assistive devices for mobility, mobility limitations related to walking a quarter mile or up a flight of steps, and the ability to lift and carry objects weighing up to 10 pounds precede the items of interest in the CRS. There is a large body of literature that docu- ments the existence of context effects in attitude measurement (e.g., Schuman and Presser, 1981). The asking of additional questions could prime the respondent to think about impairments that he or she did not consider while answering the census questions, thereby resulting in an increase in the reporting of limitations. Alterna- tively, having just answered questions about a number of sensory impairments and limitations, respondents, when answering the more general questions, assume that the general question is in- tended to capture information not already reported; in this case one would expect the CRS estimates to be lower than those based on the census form. (See Sudman et al. [1996] for a review of the theoretical underpinning related to context effects and a thorough discussion of addition and subtraction effects.) • Self-reporting versus proxy reporting. There is little information as to who provided information on either the census form or the CRS.

METHODOLOGICAL ISSUES IN THE MEASUREMENT OF WORK DISABILITY 231 Although the CRS attempts to obtain self-reports from each adult household member, information for approximately 25 percent of the persons was reported by proxy. As noted earlier, proxy respon- dents tend to report more activity limitations and more severe limitations than self-respondents. Finally, the possibility that the lack of reliability is indicative of the occur- rence of real change between the time of the census and the time of the CRS must also be considered. Although one can enumerate possible sources that explain the low rate of consistency between the two surveys, the lack of experimental design does not permit the identification of the relative contributions of the various design features to the overall lack of stability of these esti- mates. Empirical evidence shows that even when questions are administered under the same essential survey conditions, responses are subject to a high rate of inconsistency. This evidence comes from the administration of the same topical module on functional limitations and disability to respondents in the 1992-1993 panel of the Survey of Income and Program Participation. The module was administered between October 1993 and January 1994 (Time 1) and then again between October 1994 and January 1995 (Time 2). The context of the questionnaire is the same in both waves; the topical module is preceded by the core interview, which focuses on earnings, transfer income, program participation, and other forms of in- come. Information is collected for all members of the household, usually by having one person report for himself or herself and all other family members. In addition, information as to who served as the respondent is recorded; thus one can examine consistency in the reporting of informa- tion across time among all self-responses. Table 3 presents selected com- parisons of functional limitations and sensory impairments reported at Time 1 with those reported at Time 2. The comparisons clearly reveal high levels of theoretical inconsistency, even among self-respondents. For ex- ample, among those who report an inability to walk at Time 1, only 70.3 percent report the same status at Time 2. Limiting the comparison to self- reports only does not greatly improve the consistency. Among self-re- porters, 76.7 percent of those reporting inability to walk at Time 1 report the same status in the subsequent interview. These empirical findings illustrate some of the error properties asso- ciated with the measurement of functional limitations and sensory im- pairments. The research indicates that despite psychometric measures that indicate a relatively high degree of reliability, survey applications offer several examples of low levels of reliability, even under conditions in which the essential survey conditions are held constant. Subtle changes

232 THE DYNAMICS OF DISABILITY TABLE 3 Selected Panel Survey of Income and Program Participation Data: Time 1 (October 1993-January 1994) and Time 2 (October 1994- January 1995) Comparisons, United States Self-Respondents All Cases Both Times Percent at Percent at Time 2 Time 2 Number with Number with Status at Time 1 of Persons Disability of Persons Disability Uses cane, crutches, walker 508 45.5 286 50.0 Uses a wheelchair 175 61.7 83 68.7 Unable to see 159 49.1 87 49.4 Unable to hear 121 50.4 41 48.8 Unable to speak 47 68.1 5 80.0 Unable to walk 1,045 70.3 587 76.7 Unable to lift/carry 975 61.2 566 65.6 Unable to climb stairs 1,132 68.3 658 72.3 Needs help outside 699 53.5 302 57.3 Needs help bathing 271 52.0 114 54.4 Needs help dressing 237 49.8 80 55.0 SOURCE: McNeil, 1998. in the wording of questions, the order of questions, or the immediate prior context offer further illustration of the lack of robustness of these items. Although one can enumerate all of the factors that may contribute to this volatility, the relative contributions of the various factors have not been experimentally determined. Empirical Evidence Concerning Error in the Measurement of Work Disability The assessment of work disability in federal surveys has focused on variants of a limited number of questions, most of which concern whether the individual is limited in the kind or amount of work he or she is able to do or is unable to work at all because of a physical, mental, or emotional problem. Not dissimilar to the assessment of functional limitations, work disability is measured in data collection efforts that vary with respect to the essential survey conditions, the specific wording of questions, the number of questions asked, and the determination of severity, duration, and the use of assistive devices or environmental barriers. As McNeil (1993) points out, one of the problems with the current set of indicators

METHODOLOGICAL ISSUES IN THE MEASUREMENT OF WORK DISABILITY 233 designed to measure work disability is that many fail to acknowledge the role of environmental barriers and accommodations. He states: Questions can be raised about the validity of data on persons who are “limited in kind or amount of work they can do” or are “prevented from working.” The work disability questions make no mention of environ- mental factors, even though it is obvious that a person’s ability to work cannot be meaningfully separated from his or her environment. Work may be difficult or impossible under one set of environmental factors but productive and rewarding under another. It would certainly be log- ical for a respondent to answer “no” to the question, “Do you have a condition that prevents you from working?” if the real reason he or she is not working is the inaccessibility of the transportation system or the lack of accommodations at the workplace. (pp. 3–4) As noted in the paper by Jette and Badley, the “fundamental con- ceptual issue of concern is that health-related restriction in work par- ticipation may not be solely or even primarily related to the health condi- tion . . .”. One of the challenges facing questionnaire designers is the development of questions that match the conceptual framework of interest with respect to work disability, specifically, whether the focus is on the health condition that limits the individual’s ability to perform specific tasks related to a specific job, the external factors related to the perfor- mance of work, other factors that affect participation in the work environ- ment (e.g., transportation), or all three sets of factors. Although McNeil (1993) raises questions concerning the validity of the work disability measures currently in use, several empirical investiga- tions raise questions about the reliability of these measures, not unlike the findings with respect to the measurement of functional limitations and sensory impairments. Once again, it can be seen that differences in the wording of the questions, the context in which they are asked, the nature of the respondent, and other essential survey conditions, including the data collection organization and the sponsorship of the survey, may contrib- ute to differences in estimates of the working-age disabled population. Haber (1990), as revised from Haber and McNeil (1983), examined work disability from selected surveys between 1966 and 1988. He notes that “despite a high degree of consistency in the social and economic composition of the disabled population over a variety of studies, the over- all level of disability prevalence has varied considerably” (p. 43). Haber’s findings are reproduced in Table 4. The estimates from the various sur- veys represent differences in the year of administration, the wording of the questions, the overall content of the survey, the mode of administra- tion, the organization collecting the information, and the organization

234 THE DYNAMICS OF DISABILITY TABLE 4 Prevalence of Work Disability Across Various Surveys, United States, 1966-1982 Percent Classified with a Work Disability Data Source (age range [years] for estimate) Total Males Females 1966 SSA (18-64) 17.2 17.2 17.2 1967 SEO (17-64) 14.0 14.0 14.0 1969 NHIS (17-64) 11.9 13.1 10.9 1970 Census (16-64) 9.4 10.2 8.6 1972 SSA (20-64) 14.3 13.6 15.0 1976 SIE (18-64) 13.3 13.3 13.3 1978 SSA (18-64) 17.2 16.1 18.4 1980 Census (16-64) 8.5 9.0 8.0 1980 NHIS (17-64) 13.5 14.3 12.8 March, 1981 CPS (16-64) 9.0 9.5 8.5 March, 1982 CPS (16-64) 8.9 9.3 8.5 March, 1983 CPS (16-64) 8.7 9.0 8.3 March, 1984 CPS (16-64) 8.6 9.2 8.1 1984 SIPP (16-64) 12.1 11.7 12.4 March, 1985 CPS (16-64) 8.8 9.2 8.4 March, 1986 CPS (16-64) 8.8 9.4 8.2 1986 NHIS (18-64) 13.5 14.3 12.8 NOTES: SSA = Social Security Administration Disability Survey; SEO = Survey of Eco- nomic Opportunity; NHIS = National Health Interview Survey; SIE = Survey of Income and Education; March CPS = Annual March Supplement (Income Supplement) to the Current Population Survey; SIPP = Survey of Income and Program Participation. SOURCE: Haber, 1990. sponsoring the study. Although the wording of the questions is quite similar across the various surveys, there are some minor differences in specific wording (e.g., differences with respect to the emphasis on a health condition) and the order of the questions (e.g., whether the questions begin, as in the NHIS, by asking about whether a health condition keeps the person from working or begin, as in the SSA surveys, by asking whether the person’s health limits the kind or amount of work that the person can do). As is evident from Table 4, the survey’s content appears to be related to the overall estimate; the lowest rates of work disability prevalence come from the Census and the March Supplement to the Cur- rent Population Survey (8.5 to 9.4 percent), and the highest rates come from the surveys sponsored by SSA (14.3 to 17.2 percent). The lack of stability that was evident for estimates of mobility and self-care limitations between the 1990 census and the CRS is also evident for estimates of work disability. Table 5 presents the comparison of

METHODOLOGICAL ISSUES IN THE MEASUREMENT OF WORK DISABILITY 235 TABLE 5 Work Disability: Distributions to Census Questions 18a and 18b and Content Reinterview Survey Questions 33a and 33b for Persons 16-64 years of age, United States, 1990 Content Reinterview Survey: Limited in Kind or Amount of Work or Prevented from Working Census Long Form: Limited in Kind or Amount of Work or Prevented from Working Yes No Total Yes 778 366 1,144 No 650 12,988 13,638 Total 1,428 13,354 14,782 NOTE: The prevalence rate based on census: 7.7 percent, of which 68 percent were consis- tent responses. The prevalence rate based on the Content Reinterview Survey: 9.7 percent, of which 54.5 percent were consistent responses. SOURCE: McNeil, 1993. responses between the 1990 census and the CRS with respect to whether the person is limited in the kind of work, or the amount of work, or is prevented from working at a job because of physical, mental, or other health conditions. Once again, it can be seen that between one-third and almost one-half of the respondents are inconsistent in their responses. More recent investigations have used the extensive data from NHIS-D to investigate alternative estimates of the population with work disabili- ties. The data also provide an opportunity to examine inconsistencies in the reporting of work disability and receipt of SSI or SSDI benefits. For example, LaPlante (1999) found that, based on the data from the NHIS-D, 9.5 million adults 18 to 64 years of age report being unable to work be- cause of a health problem. Among these 9.5 million adults, 5.3 million (or 56 percent) do not report receipt of SSI or SSDI benefits. If one looks at those who report receiving SSI or SSDI benefits, 75 percent report that they are unable to work and 13 percent report that they are limited in the kind or amount of work that they can perform, but 12.3 percent who report receipt of benefits do not report any limitation with respect to work. Although these variations in estimates derived from different surveys suggest instability in the estimates of the proportion of persons with work disabilities as a function of the wording of the question, the nature of the respondent, and the essential survey conditions under which the mea- surement was taken, they provide little information about measurement

236 THE DYNAMICS OF DISABILITY error within the framework of either survey statistics or psychometrics. Little is known about the validity of these items or the reliability of these items, whether one views validity from the perspective of survey statis- tics as deviations from the true value or from the perspective of psycho- metrics as criterion-related or construct validity. The relative contribu- tions of various sources of error are, for the most part, unknown; it is only known that various combinations of design features produce different estimates. None of the studies address errors of nonobservation. QUESTION WORDING ISSUES RELATED TO SELECTED MEASURES OF WORK DISABILITY Jette and Badley point out the conceptual problems inherent in many questions designed to measure persons with work disabilities, including the failure of most questions to enumerate the separate elements related to the role of work. That failure is evident in most work disability screen- ing questions designed to be administered to the general adult popula- tion. The gap between the conceptual framework and the questions used to screen for work disability, is illustrated by using questions from several federal data collection efforts. The long form of the decennial census for the year 2000 includes the following questions: Because of a physical, mental, or emotional condition lasting 6 months or more, does this person have any difficulty in doing any of the follow- ing activities: . . . d. (Answer if this person is 16 years old or over.) Working at a job or business? The respondent is to check a box corresponding to “Yes” or “No.” The question is complex for several reasons: • The respondent must consider multiple dimensions of health (physical, mental, and emotional) and attribute difficulty working at a job or business to one or more of these health problems. The explicit enumeration of physical, mental, or emotional conditions serves as a means of clarifying for the respondent the fact that the question is intended to cover all three dimensions of health, but at the cost of additional cognitive processing by the respondent. • The respondent must also assess the duration of the condition and determine the degree to which the 6 months is intended to convey

METHODOLOGICAL ISSUES IN THE MEASUREMENT OF WORK DISABILITY 237 6 months specifically or a more general concept of a “long-term” condition. • The term “difficulty” is subject to interpretation. Cognitive evalua- tion of the term “difficulty” suggests that for some respondents the term implies capacity or ability to perform the activity but does not infer actual participation in the activity. • What is or is not included in the concept of working is further subject to interpretation by the respondent (e.g., inclusion or exclu- sion of sheltered workshops). As with many single screening items, the question fails to address accommodations that facilitate participation or barriers that prohibit par- ticipation. For example, if an individual is currently employed in an envi- ronment that accommodates a health condition, the respondent must de- termine whether the person should be considered as having difficulty working, even though the present employment situation presents no dif- ficulty to the person. The NHIS asks two questions concerning work limitations: Does any impairment or health problem NOW keep _______ from working at a job or business? Is ____ limited in the kind OR amount of work ___ can do because of any impairment or health problem? In contrast to the questions in the census long form, the NHIS ques- tions do not enumerate the various areas of health for consideration, nor does either question include a qualifying statement with respect to dura- tion. The two questions are more specific in addressing the impact on working; compared with the term “difficulty” used in the census ques- tionnaire, the NHIS probes whether a condition prevents the person from working or limits the kind or amount of work. Once again, note the lack of distinction between the ability to perform the activities associated with the actual performance of the job and those activities related to the role of work. For those who retire early because of a health condition or impair- ment, would the respondent consider that health problem as keeping the person from working? IMPLICATIONS FOR METHODOLOGICAL RESEARCH The point of the examples presented above is not to criticize the ques- tionnaires in which they appear but rather to illustrate the problem of

238 THE DYNAMICS OF DISABILITY attempting to measure a complex, multidimensional, dynamic construct with a single question or a set of two questions. No one or even two questions can possibly tap into the various components of work disabili- ties. Clearly the first step toward a robust set of screening items is the acceptance of a shared conceptual framework and understanding of the dimensions of the construct of interest. That framework must consider the social environment in which the measurement of interest will be taken, understanding that the comprehension of the question is shaped not only by the specific words used in the question and the context of the question, but by the perceived intent of the question. The use of cognitive labora- tory techniques can aid in the identification of problems of comprehen- sion due to the use of inherently vague terms and differential perceptions of the intent of the question. Such techniques will aid in the understand- ing of the validity of the questions and, through the refinement of the wording of questions, hopefully improve the reliability of the items. Simply documenting that variation in the essential survey conditions of the measurement process contributes to different estimates of persons with work disabilities is not sufficient; the marginal effects of various factors need to be measured and the impact needs to be reduced through the use of alternative design features. Both of these can be accomplished only through a program of experimentation. Similarly, the psychometric properties of these measures need to be assessed. Without undertaking a thorough program of development and evaluation, the discrepant esti- mates evident in the empirical literature will persist. REFERENCES Andersen R, Kasper J, Frankel M. 1979. Total Survey Error. San Francisco: Jossey-Bass Pub- lishers. Ashberg K. 1987. Disability as a predictor of outcome for the elderly in a department of internal medicine. Scandinavian Journal of Social Medicine 15:261–265. Beatty P, Davis W. 1998. Evaluating Discrepancies in Print Reading Disability Statistics through Cognitive Interviews. Unpublished Memorandum. Washington, DC: U.S. Bureau of the Census. Bohrnstedt G. (1983) Measurement. In Rossi, Wright, Anderson, eds. Handbook of Survey Research. New York: Academic Press. Brewer M. 1988. A dual process model of impression formation. In: Srull T, Wyer R, eds. Advances in Social Cognition, Volume 1. Hillsdale, NJ: Lawrence Erlbaum Associates. Brorsson B, Asberg K. 1984. Katz Index of Independence in ADL: Reliability and validity in short-term care. Scandinavian Jour of Rehabilitation Medicine 16:125–132. Cannell C, Fisher G, Bakker T. 1965. Reporting of hospitalizations in the health interview survey. Vital and Health Statistics, Series 2, Number 6. Washington, DC: U.S. Public Health Service. Forsyth B, Lessler J. 1991. Cognitive laboratory methods: A taxonomy. In: Biemer, Groves, Lyberg, Mathiowetz, Sudman, eds. Measurement Errors in Surveys. New York: John Wiley and Sons.

METHODOLOGICAL ISSUES IN THE MEASUREMENT OF WORK DISABILITY 239 Groves R. 1989. Survey Errors and Survey Costs. New York: John Wiley and Sons. Groves R. 1991. Measurement errors across the disciplines. In: Biemer P, Groves R, Lyberg L, Mathiowetz N, Sudman S, eds. Measurement Errors in Surveys. New York. John Wiley and Sons. Groves R, Couper M. 1998. Nonresponse in Household Surveys. New York: John Wiley and Sons. Haber L. 1967. Identifying the disabled: Concepts and methods in the measurement of disability. Social Security Bulletin 30:17–34. Haber L. 1990. Issues in the definition of disability and the use of disability survey data. In: Levin, Zitter, Ingram, eds. Disability Statistics: An Assessment. Washington, DC: National Academy Press. Haber L, McNeil J. 1983. Methodological Questions in the Estimation of Disability Prevalence. Unpublished report. Washington, DC: U.S. Bureau of the Census. Hahn R, Truman B, Barker N. 1996. Identifying ancestry: The reliability of ancestral identifi- cation in the United States by self, proxy, interviewer and funeral director. Epidemiol- ogy 7:75–80. Hansen M, Hurwitz W, Bershad M. 1961. Measurement errors in censuses and surveys. Bulletin of the International Statistical Institute 38:359–374. Jette A. 1994. How measurement techniques influence estimates of disability in older popu- lations. Social Science and Medicine 38:937–942. Jette A, Badley E. 1999. Conceptual issues in the measurement of work disability. In: Mathiowetz N, Wunderlich GS, eds. Survey Measurement of Work Disability: Summary of a Workshop. Washington, DC: National Academy Press. Pp. 4–27. Jobe J, Mingay D. 1990. Cognitive laboratory approach to designing questionnaires for sur- veys of the elderly. Public Health Reports 105:518–524. Jones E, Nisbett R. 1971. The Actor and the Observer: Divergent Perceptions of the Causes of Behavior. Morristown, NJ: General Learning Press. Katz S, Akpom C. 1976. Index of ADL. Medical Care 14:116–118. Katz S, Ford A, Moskowitz R, Jacobsen B, Jaffe M. 1963. Studies of illness in the aged: The index of ADL: A standardized measure of biological and psychosocial function. Jour- nal of the American Medical Association 185:914–919. Katz S, Downs T, Cash H, Grotz R. 1970. Progress in development of the Index of ADL. Gerontologist 10:20–30. Keller D, Kovar M, Jobe J, Branch L. 1993. Problems eliciting elders’ reports of functional status. Journal of Aging and Health 5:306–318. Kish L. 1965. Survey Sampling. New York: John Wiley and Sons. LaPlante M. 1999. Highlights from the National Health Interview Survey Disability Study. Pre- sentation to the Committee to Review the Social Security Administration’s Disability Decision Process Research, Institute of Medicine, and Committee on National Statis- tics, National Research Council. Lord F, Novick M. 1968. Statistical Theories of Mental Test Scores. Reading, Mass: Addison- Wesley. Lyberg L, Kasprzyk D. 1991. Data collection methods and measurement error: An over- view. In: Biemer, Groves, Lyberg, Mathiowetz, Sudman, eds. Measurement Errors in Surveys. New York: John Wiley and Sons. Mathiowetz N, Groves R. 1985. The effects of respondent rules on health survey reports. American Journal of Public Health 75:639–644. Mathiowetz N, Lair T. 1994. Getting better? Change or error in the measurement of func- tional limitations. Journal of Economic and Social Measurement 20:237–262. McDowell I, Newall C. 1996. Measuring Health. A Guide to Rating Scales and Questionnaires. New York: Oxford University Press.

240 THE DYNAMICS OF DISABILITY McHorney C, Kosinski M, Ware J. 1994. Comparisons of the costs and quality of norms for the SF-36 Health Survey collected by mail versus telephone interview: Results from a national survey. Medical Care 32:551–567. McNeil J. 1993. Census Bureau Data on Persons with Disabilities: New Results and Old Questions about Validity and Reliability. Paper presented at the 1993 Annual Meeting of the Society for Disability Studies, Seattle, Washington. McNeil J. 1998. Selected 92/93 Panel SIPP Data: Time 1 = Oct.93–Jan.94, Time 2 = Oct.94– Jan.95. Unpublished table. Moore J. 1988. Self/proxy response status and survey response quality: A review of the literature. Journal of Official Statistics 4:155–172. Rodgers W, Miller B. 1997. A comparative analysis of ADL questions in surveys of older people. The Journals of Gerontology 52B:21–36. Rosenberg M. 1990. The self-concept: Social product and social force. In: Rosenberg M and Turner R, eds. Social Psychology: Sociological Perspectives. New Brunswick: Transaction Publishers. Rubinstein L, Schaier C, Wieland G, Kane R. 1984. Systematic biases in functional status assessment of elderly adults: Effects of different data sources. The Journal of Gerontology 39(6):686–691. Sampson A. 1997. Surveying individuals with disabilities. In Spencer B, ed. Statistics and Public Policy. Oxford: Clarendon Press. Schuman H, Presser S. 1981. Questions and Answers in Attitude Surveys. New York: Aca- demic Press. Schwarz N, Wellens T. 1997. Cognitive dynamics of proxy responding: The diverging per- spectives of actors and observers. Journal of Official Statistics 13:159–180. Suchman L, Jordan B. 1990. Interactional troubles in face-to-face survey interviews. Journal of the American Statistical Association 85:232–241. Sudman S, Bradburn N. 1973. Effects of time and memory factors on response in surveys. Journal of the American Statistical Association 68:805–815. Sudman S, Bradburn N, Schwarz N. 1996. Thinking About Answers: The Application of Cogni- tive Processes to Survey Methodology. San Francisco: Jossey-Bass. Tourangeau R. 1984. Cognitive sciences and survey methods. In: Jabine, Straf, Tanur, Tourangeau, eds. Cognitive Aspects of Survey Methodology: Building a Bridge Between Disciplines. Washington, DC: National Academy Press. U.S. Bureau of the Census. 1993. Content Reinterview Survey: Accuracy of Data for Selected Population and Housing Characteristics as Measured by Reinterview. U.S. Department of Commerce, 1990 Census of Population and Housing, Evaluation and Research Re- ports. Washington, DC. Willis G, Royston P, Bercini D. 1991. The use of verbal report methods in the development and testing of survey questionnaires. Applied Cognitive Psychology 5(3):251–267.

Next: SSA's Disability Determination of Mental Impairments: A Review Toward an Agenda for Research »
The Dynamics of Disability: Measuring and Monitoring Disability for Social Security Programs Get This Book
×
Buy Hardback | $68.00 Buy Ebook | $54.99
MyNAP members save 10% online.
Login or Register to save!
Download Free PDF

The Society Security disability program faces urgent challenges: more people receiving benefits than ever before, the prospect of even more claimants as baby boomers age, changing attitudes culminating in the Americans With Disabilities Act. Disability is now understood as a dynamic process, and Social Security must comprehend that process to plan adequately for the times ahead. The Dynamics of Disability provides expert analysis and recommendations in key areas:

  • Understanding the current social, economic, and physical environmental factors in determining eligibility for disability benefits.
  • Developing and implementing a monitoring system to measure and track trends in work disability.
  • Improving the process for making decisions on disability claims.
  • Building Social Security's capacity for conducting needed research.

This book provides a wealth of detail on the workings of the Social Security disability program, recent and emerging disability trends, issues and previous experience in researching disability, and more. It will be of primary interest to federal policy makers, the Congress, and researchers—and it will be useful to state disability officials, medical and rehabilitation professionals, and the disability community.

  1. ×

    Welcome to OpenBook!

    You're looking at OpenBook, NAP.edu's online reading room since 1999. Based on feedback from you, our users, we've made some improvements that make it easier than ever to read thousands of publications on our website.

    Do you want to take a quick tour of the OpenBook's features?

    No Thanks Take a Tour »
  2. ×

    Show this book's table of contents, where you can jump to any chapter by name.

    « Back Next »
  3. ×

    ...or use these buttons to go back to the previous chapter or skip to the next one.

    « Back Next »
  4. ×

    Jump up to the previous page or down to the next one. Also, you can type in a page number and press Enter to go directly to that page in the book.

    « Back Next »
  5. ×

    To search the entire text of this book, type in your search term here and press Enter.

    « Back Next »
  6. ×

    Share a link to this book page on your preferred social network or via email.

    « Back Next »
  7. ×

    View our suggested citation for this chapter.

    « Back Next »
  8. ×

    Ready to take your reading offline? Click here to buy this book in print or download it as a free PDF, if available.

    « Back Next »
Stay Connected!