Methodological Issues in the Measurement of Work Disability1

Nancy A. Mathiowetz, Ph.D.2

The collection of information about persons with disabilities presents a particularly complex measurement issue because of the variety of conceptual paradigms that exist, the complexity of the various paradigms, and the numerous means by which alternative paradigms have been operationalized in different survey instruments (see paper by Jette and Badley for a review). For example, disability is often defined in terms of environmental accommodation of an impairment; hence, two individuals with the same impairment may not be similarly disabled or share the same perception of their impairment. For an individual with mobility limitations who lives in an assisted-living environment that accommodates the impairment, the environmental adaptations may result in little or no disability. The same individual living on the second floor of an apartment building with no elevator may have a very different perception of the impairment and may see him- or herself as disabled because of the environmental barriers that exist within his or her immediate environment.


1 This paper was originally prepared for the committee workshop titled “Workshop on Survey Measurement of Work Disability: Challenges for Survey Design and Method” held on May 27–28, 1999 in Washington, D.C. (IOM, 2000).

2 Nancy Mathiowetz is an Associate Professor at the University of Maryland’s Joint Program in Survey Methodology.




The Social Security Administration (SSA) is currently reengineering its disability claims process for providing benefits to blind and disabled persons under the Social Security Disability Insurance (SSDI) and Supplemental Security Income (SSI) programs. As part of the effort to redesign the claims process, SSA has initiated a research effort designed to address the growth in disability programs, including the design and conduct of the Disability Evaluation Study (DES). The DES will provide SSA with comprehensive information concerning the number and characteristics of persons with impairments severe enough to meet SSA’s statutory definition of disability, as well as the number and characteristics of people who are not currently eligible but who could be eligible as a result of changes in the disability decision process. For those years in which the DES is not conducted, SSA will need to monitor the potential pool of applicants. One means by which SSA can monitor the size and characteristics of potential beneficiaries is through other ongoing federal data collection efforts. For both the conduct of the DES and the monitoring of the pool of potential beneficiaries through the use of various data collection efforts, it is critical to understand the measurement error properties associated with the identification of persons with disabilities as a function of the essential survey conditions under which the data have been and will be collected. The extent to which alternative instruments designed to measure persons with disabilities map to various eligibility criteria under consideration by SSA is also important.

BACKGROUND

The collection of disability data is an evolving field. Although a large and growing number of scales attempt to measure functional status and work disability, little is known about the measurement error properties of various questions and composite scales. The empirical literature provides clear evidence of variation in the estimates of the number of persons with disabilities in the United States, depending upon the conceptual paradigm of interest, the analytic objectives of the particular measurement process, and the essential survey conditions under which the information is collected (e.g., Haber, 1990; McNeil, 1993; Sampson, 1997). This literature suggests that estimates of the disabled population not only are related to the conceptual framework underlying the measurement construct but are also a function of the essential survey conditions under which the measurement occurred, including the specific questions used to measure disability, the context of the questions, the source of the information (self- versus proxy response), variations in the mode and method of data collection, and the sponsor of the data collection effort. Furthermore, terms such as impairment, disability, functional limitation, and participation are often used inconsistently, resulting in different and conflicting estimates of prevalence.

Attempts to measure not only the prevalence but also the severity of an impairment or disability further complicate the measurement process.

Recent shifts in the conceptual paradigm of disability, in which disability is viewed as a dynamic process rather than a static measure and as an interaction between an individual with an impairment and the environment rather than as a characteristic only of the individual, imply that those responsible for the development of disability measures must separate the measurement of the impact of environmental factors in the enablement-disablement process from the measurement of ability. Viewing disability as a dynamic state resulting from an interaction between a person’s impairment and a particular environmental context further complicates the assessment of the quality of various survey measures of disability, specifically, the reliability of a measure. As a dynamic characteristic, one would anticipate changes in the reports of disability as a function of changes in the individual as well as changes in the social and environmental contexts. The challenge for the measurement process is to disentangle true change from unreliability.

This workshop comes at a time when the federal government is undertaking several initiatives with respect to the measurement of disability in federal data collection efforts. The Americans with Disabilities Act of 1990 (ADA) defines disability as (1) a physical or mental impairment that substantially limits one or more of the major life activities of the individual, (2) a record of a substantially limiting impairment, or (3) being regarded as having a substantially limiting impairment. Although the measurement of disability within household surveys is not bound by the ADA definition, the passage of the ADA provides a socioenvironmental framework for how society comprehends and uses terms such as disability and impairment (e.g., in the popular press and in court rulings on ADA-related litigation). These definitions will evolve as a function of litigation related to the ADA and the presentation of that litigation in the press. Hence, society is entering a period in which potential dynamic shifts in the comprehension and interpretation of the language associated with the measurement of persons with disabilities can be anticipated.

This paper is intended to serve as a means of facilitating discussion among individuals from diverse theoretical and empirical disciplines concerning the methodological issues related to the measurement of persons with disabilities. As a first step toward achieving this goal, a common language and framework need to be established for the enumeration and assessment of the various sources of error that affect the survey measurement process. The paper draws from several empirical investigations to provide evidence as to the extent of knowledge concerning the error properties associated with various approaches to the measurement of functional limitations and work disability.

SOURCES OF ERROR IN THE SURVEY PROCESS: THE SURVEY RESEARCH PERSPECTIVE

For the purpose of defining a framework that can be used to examine error associated with the measurement of persons with disabilities, I draw upon the conceptual structure and language used by Groves (1989), based on the earlier work of Kish (1965) and used by Andersen et al. (1979). Suchman and Jordan (1990) have described errors in surveys as the discrepancy between the concept of interest to the researcher and the quantity actually measured in the survey. Bias, according to Kish (1965, p. 509), refers to systematic errors in a statistic that affect any sample taken under a specified survey design with the same constant error or, as stated by Groves (1989), is the type of error that affects the statistic in all implementations of a survey. Variable errors are those errors that are specific to a particular implementation of a design, that is, specific to the particular trial. The concept of variable error requires the possibility of repeating the survey, with changes in the units of replication, that is, in the particular set of respondents, interviewers, supervisors, and coding, editing, and data entry staff.

Errors of Nonobservation

Within the framework of survey methodology, both variable error and bias are further characterized in terms of errors of nonobservation and errors of observation. As one would expect from the term, errors of nonobservation reflect failure to obtain observations for some segment of the population or for all elements to be measured. Errors of nonobservation are most often classified as arising from three sources: sampling, coverage, and nonresponse.

Sampling Error

Sampling error represents one type of nonobservation variable error; it arises from the fact that measurements (observations) are taken for only a subset of the population. Sampling variance refers to changes in the value of some statistic over possible replications of a survey in which the sample design is fixed but different individuals are selected for the sample. Estimates based on a particular sample will not be identical to estimates based on a different subset of the population (selected in the same manner) or to estimates based on the full population.
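Sampling variance is straightforward to see by simulation. The following minimal sketch (illustrative only; the population size, true prevalence, and sample size are hypothetical values, not figures from this paper) fixes a design and shows how the estimate varies across replications while remaining centered on the population value:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical population of 100,000 persons with a true disability
# prevalence of 12 percent.
population = rng.random(100_000) < 0.12

# 1,000 replications of the same design: a simple random sample of n = 2,000.
estimates = [rng.choice(population, size=2_000, replace=False).mean()
             for _ in range(1_000)]

print(f"true prevalence:   {population.mean():.4f}")
print(f"mean of estimates: {np.mean(estimates):.4f}")   # centered on the truth
print(f"sampling variance: {np.var(estimates):.6f}")    # spread across replications
```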

Coverage Error

Coverage error refers to the failure to include all eligible population members on the list or frame used to identify the population of interest. Those members not identified on the frame have a zero probability of selection and are never measured. For example, in the United States, approximately 5 percent of the population lives in households without telephone service; any survey that is conducted by telephone and that attempts to describe the entire household-based population of the United States therefore suffers from coverage error. To the extent that those without telephones differ from those with telephones on the construct of interest, the resulting estimates will be biased.

Nonresponse Error

Nonresponse error can arise from failure to obtain any information from the persons selected to be measured (unit nonresponse) or from failure to obtain complete information from all respondents to a particular question (item nonresponse). The extent to which nonresponse affects survey statistics is a function of both the rate of nonresponse and the difference between respondents and nonrespondents, as illustrated in the following formula:

\bar{y}_r = \bar{y}_n + nr(\bar{y}_r - \bar{y}_{nr})

where:

\bar{y}_r = the statistic estimated from the respondents,
\bar{y}_n = the statistic estimated from all n sample cases,
\bar{y}_{nr} = the statistic estimated from the nonrespondents, and
nr = the proportion of nonrespondents.

Knowing the response rate is not sufficient to determine the level of nonresponse bias; studies with both high and low rates of nonresponse can suffer from nonresponse bias.
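The identity above can be applied directly. In the small numerical sketch below, all rates are hypothetical and chosen only to show that the bias depends on both the nonresponse rate and the respondent/nonrespondent difference:

```python
def nonresponse_bias(y_r, y_nr, prop_nr):
    """Bias of the respondent-based estimate relative to the full sample,
    following the identity y_r - y_n = nr * (y_r - y_nr)."""
    return prop_nr * (y_r - y_nr)

# Hypothetical survey: 15 percent nonresponse; 10 percent of respondents
# report a work disability versus 20 percent (unobserved) of nonrespondents.
y_r, y_nr, prop_nr = 0.10, 0.20, 0.15
bias = nonresponse_bias(y_r, y_nr, prop_nr)
print(f"bias of respondent estimate: {bias:+.3f}")       # -0.015
print(f"implied full-sample value:   {y_r - bias:.3f}")  # 0.115
```

Halving either the nonresponse rate or the respondent/nonrespondent gap reduces the bias by the same amount, which is why the response rate alone cannot determine the level of nonresponse bias.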

As noted by Groves and Couper (1998), it is useful to further distinguish among the types of unit nonresponse, each of which may be related to the failure to measure different types of persons. For most household data collection efforts involving interviewers, the final outcome of an interview attempt is often classified into one of the following four categories: completed or partial interview, refusal, noncontact, and other noninterview.3 Survey design features can affect the distribution of cases across the various categories. Noncontact rates are affected by the length of the field period (short field periods result in higher noncontact rates than longer field periods). Surveys that place greater demands on the respondent may suffer from higher refusal rates than less burdensome instruments. The choice of respondent rule affects the rate of nonresponse; designs that permit any knowledgeable adult within the household to serve as the respondent provide an interviewer with some flexibility, should one adult within the household refuse or be unable to participate. Field efforts that fail to accommodate non-English-speaking respondents or that focus their attention on frail subpopulations tend to experience higher rates of other noninterviews.

3 Other noninterview is used to classify cases in which contact was made with the members of the household in which the sample person resides, but for reasons such as physical or mental health, language difficulties, or other reasons not related to reluctance to participate, the interviewer was unable to conduct the interview.

Errors of Observation

Observational errors can arise from any of the elements directly engaged in the measurement process, including the questionnaire, the respondent, and the interviewer, as well as the characteristics that define the measurement process (e.g., the mode and method of data collection). This section briefly reviews the theoretical framework and empirical findings related to the various sources of measurement error in surveys.

Questionnaire as Source of Measurement Error

Tourangeau (1984) and others (see Sudman et al. [1996] for a review) have characterized the survey question-and-answer process as a four-step process involving comprehension of the question, retrieval of information from memory, assessment of the correspondence between the retrieved information and the requested information, and communication of the response. In addition, the encoding of information, a process outside the control of the survey interview, determines a priori whether the information of interest is available for the respondent to retrieve.

Comprehension of the question involves the assignment of meaning to the question by the respondent. Ideally, the question will convey the meaning of interest to the researcher. However, several linguistic, structural, and environmental factors affect the interpretation of the question by the respondent.

These factors include the specific wording of the question, the structure of the question, the order in which the questions are presented, the overall topic of the questionnaire, whether the question is read by the respondent (self-administration) or is presented to the respondent by an interviewer, and the mode of communication used by the interviewer (that is, telephone versus face-to-face presentation).

The wording of a question is often seen as one of the major problems in survey research: although one can standardize the language read by the respondent or the interviewer, standardization of the language does not imply standardization of the meaning. For example, “Do you own a car?” appears to be a simple question from the perspective of semantics and structure. However, several of the words in the question are subject to variation in interpretation, including “you” (just the respondent, or the respondent and his or her family?), “own” (completely paid for? purchased as opposed to rented?), and even the word “car” (does this include vans and trucks?). The goal for the questionnaire designer is to develop questions that exhaust the range of possible interpretations, making sure that the particular concept of interest is the concept that the respondent has in mind when responding to the item.

One source of variation in a respondent’s comprehension of survey questions is differences in the perceived intent or meaning of the question. Perceived intent can be shaped by the sponsorship of the survey, the overall topic of the questionnaire, or the environment more immediate to the question of interest, such as the context of the previous question or set of questions or the specific response options associated with the question.

Respondent as Source of Measurement Error

Once the respondent comprehends the question, he or she must retrieve the relevant information from memory, make a judgment as to whether the retrieved information matches the requested information, and communicate a response. Much of the measurement error literature has focused on the retrieval stage of the question-answering process, classifying the lack of reporting of an event as retrieval failure on the part of the respondent and comparing the characteristics of events that are reported with those that are not. Several factors have been found to be related to the quality of reporting, including the length of the reference period of interest and the salience of the information. For example, the literature suggests that the greater the length of the recall period, the greater the expected bias in the reporting of episodic information (e.g., Cannell et al., 1965; Sudman and Bradburn, 1973). Salience is hypothesized to affect the strength of the memory trace and, subsequently, the effort involved in retrieving the information from long-term memory.

The weaker the trace, the greater the effort needed to locate and retrieve the information.

As part of the communication of the response, the respondent must determine whether he or she wishes to reveal the information as part of the survey process. Survey instruments often ask questions about socially and personally sensitive topics. It is widely believed and well documented that such questions elicit patterns of underreporting (of socially undesirable behaviors and attitudes) as well as overreporting (of socially desirable behaviors and attitudes). The determination of social desirability is a dynamic process and is a function of the topic of the question, the immediate social context, and the broader social environment at the time the question is asked. Even if the respondent is able to retrieve accurate information, he or she may choose to edit this information at the response formation stage as a means of reducing the costs associated with revealing the information.

The use of proxy reporters, that is, asking individuals within sampled households to provide information about other members of the household, is a design decision that is often framed as a trade-off among costs, sampling errors, and nonsampling errors. The use of proxy informants to collect information about all members of a household can increase the sample size (and hence reduce the sampling error) at a lower marginal data collection cost than increasing the number of households. The use of proxy respondents also facilitates the provision of information for those who would otherwise be lost to nonresponse because of an unwillingness or inability to participate in the survey interview. However, the cost associated with the use of proxy reporting may be an increase in the rate of errors of observation associated with poorer-quality reporting for others compared with the quality that would have been obtained under a rule of all self-response. Most evaluations of the quality of proxy responses compared with the quality of self-reports have focused on the reporting of autobiographical information (e.g., Mathiowetz and Groves, 1985; Moore, 1988), with some recent investigations examining the convergence of self and proxy reports of attitudes (Schwarz and Wellens, 1997). The literature is, however, for the most part silent with respect to the quality of proxy reports of personal characteristics, the exception being a small body of literature that addresses self-reporting versus proxy reporting effects in the reporting of race/ethnicity (Hahn et al., 1996) and the reporting of activities of daily living (e.g., Mathiowetz and Lair, 1994; Rodgers and Miller, 1997). The findings suggest that proxy reports of functional limitations tend to be higher than self-reports; the research is inconclusive as to whether the discrepancy is a function of overreporting on the part of proxy informants, underreporting on the part of self-respondents, or both.

Interviewers as Sources of Measurement Error

For interviewer-administered questionnaires, interviewers may affect the measurement process in one of several ways, including:

- failure to read the question as written;
- variation in interviewers’ ability to perform the other tasks associated with interviewing, for example, probing insufficient responses, selecting appropriate respondents, and recording the information provided by the respondent; and
- demographic and socioeconomic characteristics, as well as voice characteristics, that influence the behavior of the respondent and the responses provided by the respondent.

The first two factors contribute to measurement error from a cognitive or psycholinguistic perspective in that different respondents are exposed to different stimuli; thus, variation in responses is, in part, a function of the variation in stimuli. All three factors suggest that the interviewer effect contributes to an increase in variable error across interviewers. If all interviewers erred in the same direction (or their characteristics resulted in errors of the same direction and magnitude), interviewer bias would result. For the most part, the literature indicates that among well-trained interview staffs, interviewer error contributes to the overall variance of estimates as opposed to resulting in biased estimates (Lyberg and Kasprzyk, 1991).
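Interviewer variance of this kind is conventionally summarized by an intraclass correlation, rho_int, among responses obtained by the same interviewer; a standard approximation from the survey sampling literature (Kish's design effect for interviewers, a result invoked here rather than derived in this paper) inflates the variance of a sample mean by roughly 1 + (m - 1) * rho_int for an average workload of m interviews. A minimal simulation, with entirely hypothetical parameter values, illustrates the inflation:

```python
import numpy as np

rng = np.random.default_rng(7)

def var_of_mean(n_int=50, workload=20, rho=0.02, reps=2_000):
    """Simulate the variance of a sample mean when each interviewer adds a
    shared shift to the responses he or she collects (total variance = 1)."""
    means = []
    for _ in range(reps):
        shifts = rng.normal(0.0, np.sqrt(rho), n_int)  # one shift per interviewer
        noise = rng.normal(0.0, np.sqrt(1 - rho), n_int * workload)
        means.append((shifts.repeat(workload) + noise).mean())
    return np.var(means)

n = 50 * 20
print(f"empirical variance inflation: {var_of_mean() / (1.0 / n):.2f}")
print(f"1 + (m - 1) * rho:            {1 + (20 - 1) * 0.02:.2f}")   # ~1.38
```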

Other Essential Survey Conditions as Sources of Measurement Error

Any data collection effort involves decisions concerning the features that define the overall design of the survey, referred to here as the “essential survey conditions.” In addition to the sample design and the wording of individual questions and response options, these decisions include the following:

- whether to use interviewers or to collect information via some form of self-administered questionnaire;
- the means for selecting and training interviewers (if applicable);
- the mode of data collection for interviewer administration (telephone versus face to face);
- the method of data collection (paper and pencil, computer assisted);
- whether to contact respondents for a single interview (cross-sectional design) or follow respondents over time (longitudinal or panel design);
- for longitudinal designs, the frequency and periodicity of measurement;
- the identification of the organization for whom the data are collected; and
- the identification of the data collection organization.

No single design feature is clearly superior with respect to overall data quality. For example, as noted above, interviewer variance is one source of variability that can be eliminated through the use of a self-administered questionnaire. However, the use of an interviewer may aid in the measurement process by providing the respondent with clarifying information or by probing insufficient responses. The use of a panel survey design, with repeated measurements of the same individuals, facilitates more efficient estimation of change over time (compared with the use of multiple cross-sectional samples); however, panel designs may be subject to higher rates of nonresponse (as a result of nonresponse at every round of data collection) or to panel conditioning bias, an effect in which respondents alter their reporting behavior as a result of exposure to a set of questions during an earlier interview.

The following scenario is an illustration of the statistical measures of error used by survey methodologists. Assume that the measure of interest is personal earnings among all adults in the United States. A “true value” exists if the construct of interest is carefully defined. The data will be collected as part of a household-based health survey conducted by telephone. The decision to use the telephone for data collection implies that approximately 5 percent of adults will not be eligible for selection. To the extent that the personal earnings of adults without telephones differ significantly from the earnings of those with telephones, population-based estimates for the entire adult population will suffer from coverage bias. Similarly, not all eligible sample persons will participate in the interview, because of refusal to cooperate, an inability on the part of the survey organization to contact the respondent, or other reasons, such as language barriers or poor health that limits participation. Once again, to the extent that the earnings of those who participate differ significantly from the earnings of those who do not, population-based estimates of earnings will suffer from nonresponse bias. If all respondents misreport their earnings, underreporting them by 10 percent, and consistently do so in response to repeated measures, the measure will be reliable but not valid, and population estimates based on the question (e.g., population means) will be biased. However, multivariate model-based estimates that examine the relationship between earnings and human capital investment would not be biased, since all respondents erred in the same direction and relative magnitude. Differential response error, for example, the overreporting of earnings by low-income individuals and the underreporting of earnings by high-income individuals, may produce unbiased population estimates (e.g., mean earnings per person) but biased model-based estimates related to individual behavior.
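The earnings scenario can be simulated directly. In the sketch below (all distributions and coefficients are hypothetical), every respondent underreports by exactly 10 percent on both of two trials: the measure is perfectly reliable, the population mean is biased downward, and the association with schooling is untouched:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical population: earnings increase with years of schooling.
schooling = rng.normal(13, 2, n)
true_earn = 5_000 + 3_000 * schooling + rng.normal(0, 8_000, n)

# Constant 10 percent underreporting, repeated identically on two trials.
report_t1 = 0.9 * true_earn
report_t2 = 0.9 * true_earn

print(f"trial 1 vs. trial 2 correlation: {np.corrcoef(report_t1, report_t2)[0, 1]:.2f}")
print(f"bias in mean earnings: {report_t1.mean() - true_earn.mean():,.0f}")
print(f"corr(reported, schooling): {np.corrcoef(report_t1, schooling)[0, 1]:.3f}")
print(f"corr(true, schooling):     {np.corrcoef(true_earn, schooling)[0, 1]:.3f}")
```

Because the error is a constant proportion, the correlation is unchanged, and a log-linear earnings model would return the same slope with only its intercept shifted.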

MEASUREMENT ERROR: THE PSYCHOMETRIC PERSPECTIVE

The language and concepts of measurement error in psychometrics are different from the language and concepts used within the fields of survey methodology and statistics. The focus for psychometrics is on variable errors; from the perspective of classical true score theory, all questions produce unbiased estimates, but not necessarily valid estimates, of the construct of interest. The confusion arises in that both statistics and psychometrics use the terms validity and reliability sometimes to refer to very similar concepts and sometimes to refer to concepts that are quite different.

Within psychometrics, the terms validity and reliability are used to describe two types of variable error. Validity refers to “the correlation between the true score and the respondent’s answer over trials” (Groves, 1991, p. 8). From this perspective, the validity of a measure can be assessed only for the population, whereas in the survey methodological literature the validity of both population estimates and individuals’ responses can be assessed. Reliability refers to the ratio of the true score variance to the observed variance, where variance refers to variability over persons in the population and over trials within a person (Bohrnstedt, 1983). Once again, the measurement of reliability from this perspective does not facilitate measurement for a person but produces a measure of reliability specific to the particular set of individuals for whom the measurement was taken.

The psychometric literature identifies several means by which validity can be assessed; the choice of measures is, in part, a function of the purpose of the measurement. These measures of validity include content, construct, concurrent, predictive, and criterion validity. If one considers that the questions included in a particular instrument represent a sampling of all questions that could have been included to measure the construct of interest, content validity refers to the comprehensiveness as well as the relevance of those questions, that is, the extent to which the question or questions reflect the domain or domains in the conceptual definition. Face validity refers to the extent to which each item appears to measure that which it purports to measure. Cognitive interviewing techniques that focus on the comprehension of items by respondents are, to some extent, a test of face validity. Criterion-related validity evaluates the extent to which the measure of interest correlates highly with a “gold standard.” The gold standard […]
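The classical true score definition of reliability above, the ratio of true score variance to observed variance, can be illustrated with a minimal simulation (the variances are arbitrary choices, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50_000

# Classical true score model: observed X = T + E, with E uncorrelated with T.
T = rng.normal(0, 1.0, n)      # true scores, variance 1.00
E = rng.normal(0, 0.5, n)      # measurement error, variance 0.25
X = T + E

print(f"reliability var(T)/var(X): {T.var() / X.var():.2f}")   # ~1/1.25 = 0.80

# The correlation between two parallel trials recovers the same quantity.
X2 = T + rng.normal(0, 0.5, n)
print(f"parallel-trial correlation: {np.corrcoef(X, X2)[0, 1]:.2f}")
```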

[…]tively to the question in either survey. For example, among those 16 to 64 years of age, almost all (83.4 percent) of those who report a self-care limitation at the time of the census fail to report a self-care limitation in the CRS.

Comparison of the percentage of persons with mobility and self-care limitations from the two surveys is confounded by differences in the essential survey conditions under which the data were collected, differences that most likely contribute to the discrepancies evident in the data. These differences include:

- Differences in the mode of data collection. The decennial census is, for the most part, a self-administered questionnaire, whereas the CRS is interviewer administered and is conducted either by telephone (84 percent) or as a face-to-face interview (16 percent). McHorney et al. (1994) report that telephone administration of the SF-36 led to lower levels of reporting of chronic conditions and self-reports of poor health compared with a self-administered version of the SF-36.

- Differences in the context in which the questions were asked. Although the wording of the specific items with respect to mobility limitations or self-care limitations is almost the same, as can be seen from a comparison of the two questionnaires, the context in which the questions are asked differs in the two instruments. Several additional questions concerning sensory impairments, the use of assistive devices for mobility, mobility limitations related to walking a quarter mile or up a flight of steps, and the ability to lift and carry objects weighing up to 10 pounds precede the items of interest in the CRS. A large body of literature documents the existence of context effects in attitude measurement (e.g., Schuman and Presser, 1981). The asking of additional questions could prime the respondent to think about impairments that he or she did not consider while answering the census questions, thereby resulting in an increase in the reporting of limitations. Alternatively, having just answered questions about a number of sensory impairments and limitations, respondents may assume, when answering the more general questions, that the general question is intended to capture information not already reported; in this case one would expect the CRS estimates to be lower than those based on the census form. (See Sudman et al. [1996] for a review of the theoretical underpinnings of context effects and a thorough discussion of addition and subtraction effects.)

- Self-reporting versus proxy reporting. There is little information as to who provided the information on either the census form or the CRS. Although the CRS attempts to obtain self-reports from each adult household member, information for approximately 25 percent of the persons was reported by proxy. As noted earlier, proxy respondents tend to report more activity limitations and more severe limitations than self-respondents.

Finally, the possibility that the lack of reliability is indicative of the occurrence of real change between the time of the census and the time of the CRS must also be considered. Although one can enumerate possible sources that explain the low rate of consistency between the two surveys, the lack of an experimental design does not permit the identification of the relative contributions of the various design features to the overall lack of stability of these estimates.

Empirical evidence shows that even when questions are administered under the same essential survey conditions, responses are subject to a high rate of inconsistency. This evidence comes from the administration of the same topical module on functional limitations and disability to respondents in the 1992-1993 panel of the Survey of Income and Program Participation (SIPP). The module was administered between October 1993 and January 1994 (Time 1) and then again between October 1994 and January 1995 (Time 2). The context of the questionnaire is the same in both waves; the topical module is preceded by the core interview, which focuses on earnings, transfer income, program participation, and other forms of income. Information is collected for all members of the household, usually by having one person report for himself or herself and all other family members. In addition, information as to who served as the respondent is recorded; thus one can examine consistency in the reporting of information across time among all self-responses.

Table 3 presents selected comparisons of functional limitations and sensory impairments reported at Time 1 with those reported at Time 2. The comparisons clearly reveal high levels of inconsistency, even among self-respondents. For example, among those who report an inability to walk at Time 1, only 70.3 percent report the same status at Time 2. Limiting the comparison to self-reports does not greatly improve the consistency: among self-reporters, 76.7 percent of those reporting an inability to walk at Time 1 report the same status in the subsequent interview.

These empirical findings illustrate some of the error properties associated with the measurement of functional limitations and sensory impairments. The research indicates that despite psychometric measures that indicate a relatively high degree of reliability, survey applications offer several examples of low levels of reliability, even under conditions in which the essential survey conditions are held constant.

TABLE 3 Selected Panel Survey of Income and Program Participation Data: Time 1 (October 1993-January 1994) and Time 2 (October 1994-January 1995) Comparisons, United States

                                      All Cases                  Self-Respondents Both Times
Status at Time 1                  Number of   Percent at Time 2  Number of   Percent at Time 2
                                  Persons     with Disability    Persons     with Disability
Uses cane, crutches, walker          508         45.5               286         50.0
Uses a wheelchair                    175         61.7                83         68.7
Unable to see                        159         49.1                87         49.4
Unable to hear                       121         50.4                41         48.8
Unable to speak                       47         68.1                 5         80.0
Unable to walk                     1,045         70.3               587         76.7
Unable to lift/carry                 975         61.2               566         65.6
Unable to climb stairs             1,132         68.3               658         72.3
Needs help outside                   699         53.5               302         57.3
Needs help bathing                   271         52.0               114         54.4
Needs help dressing                  237         49.8                80         55.0

SOURCE: McNeil, 1998.

Subtle changes in the wording of questions, the order of questions, or the immediate prior context offer further illustration of the lack of robustness of these items. Although one can enumerate all of the factors that may contribute to this volatility, the relative contributions of the various factors have not been experimentally determined.

Empirical Evidence Concerning Error in the Measurement of Work Disability

The assessment of work disability in federal surveys has focused on variants of a limited number of questions, most of which concern whether the individual is limited in the kind or amount of work he or she is able to do or is unable to work at all because of a physical, mental, or emotional problem. Not dissimilar to the assessment of functional limitations, work disability is measured in data collection efforts that vary with respect to the essential survey conditions, the specific wording of questions, the number of questions asked, and the determination of severity, duration, and the use of assistive devices or environmental barriers.

As McNeil (1993) points out, one of the problems with the current set of indicators designed to measure work disability is that many fail to acknowledge the role of environmental barriers and accommodations. He states:

    Questions can be raised about the validity of data on persons who are “limited in kind or amount of work they can do” or are “prevented from working.” The work disability questions make no mention of environmental factors, even though it is obvious that a person’s ability to work cannot be meaningfully separated from his or her environment. Work may be difficult or impossible under one set of environmental factors but productive and rewarding under another. It would certainly be logical for a respondent to answer “no” to the question, “Do you have a condition that prevents you from working?” if the real reason he or she is not working is the inaccessibility of the transportation system or the lack of accommodations at the workplace. (pp. 3–4)

As noted in the paper by Jette and Badley, the “fundamental conceptual issue of concern is that health-related restriction in work participation may not be solely or even primarily related to the health condition …”. One of the challenges facing questionnaire designers is the development of questions that match the conceptual framework of interest with respect to work disability, specifically, whether the focus is on the health condition that limits the individual’s ability to perform specific tasks related to a specific job, the external factors related to the performance of work, other factors that affect participation in the work environment (e.g., transportation), or all three sets of factors.

Although McNeil (1993) raises questions concerning the validity of the work disability measures currently in use, several empirical investigations raise questions about the reliability of these measures, not unlike the findings with respect to the measurement of functional limitations and sensory impairments. Once again, differences in the wording of the questions, the context in which they are asked, the nature of the respondent, and other essential survey conditions, including the data collection organization and the sponsorship of the survey, may contribute to differences in estimates of the working-age disabled population.

Haber (1990), in work revised from Haber and McNeil (1983), examined work disability estimates from selected surveys conducted between 1966 and 1988. He notes that “despite a high degree of consistency in the social and economic composition of the disabled population over a variety of studies, the overall level of disability prevalence has varied considerably” (p. 43). Haber’s findings are reproduced in Table 4. The estimates from the various surveys reflect differences in the year of administration, the wording of the questions, the overall content of the survey, the mode of administration, the organization collecting the information, and the organization sponsoring the study.

TABLE 4 Prevalence of Work Disability Across Various Surveys, United States, 1966-1986

                                   Percent Classified with a Work Disability
Data Source
(age range [years] for estimate)   Total      Males      Females
1966 SSA (18-64)                   17.2       17.2       17.2
1967 SEO (17-64)                   14.0       14.0       14.0
1969 NHIS (17-64)                  11.9       13.1       10.9
1970 Census (16-64)                 9.4       10.2        8.6
1972 SSA (20-64)                   14.3       13.6       15.0
1976 SIE (18-64)                   13.3       13.3       13.3
1978 SSA (18-64)                   17.2       16.1       18.4
1980 Census (16-64)                 8.5        9.0        8.0
1980 NHIS (17-64)                  13.5       14.3       12.8
March 1981 CPS (16-64)              9.0        9.5        8.5
March 1982 CPS (16-64)              8.9        9.3        8.5
March 1983 CPS (16-64)              8.7        9.0        8.3
March 1984 CPS (16-64)              8.6        9.2        8.1
1984 SIPP (16-64)                  12.1       11.7       12.4
March 1985 CPS (16-64)              8.8        9.2        8.4
March 1986 CPS (16-64)              8.8        9.4        8.2
1986 NHIS (18-64)                  13.5       14.3       12.8

NOTES: SSA = Social Security Administration Disability Survey; SEO = Survey of Economic Opportunity; NHIS = National Health Interview Survey; SIE = Survey of Income and Education; March CPS = Annual March Supplement (Income Supplement) to the Current Population Survey; SIPP = Survey of Income and Program Participation.

SOURCE: Haber, 1990.

Although the wording of the questions is quite similar across the various surveys, there are some minor differences in specific wording (e.g., differences with respect to the emphasis on a health condition) and in the order of the questions (e.g., whether the questions begin, as in the NHIS, by asking whether a health condition keeps the person from working or begin, as in the SSA surveys, by asking whether the person’s health limits the kind or amount of work that the person can do). As is evident from Table 4, the survey’s content appears to be related to the overall estimate; the lowest rates of work disability prevalence come from the census and the March Supplement to the Current Population Survey (8.5 to 9.4 percent), and the highest rates come from the surveys sponsored by SSA (14.3 to 17.2 percent).

The lack of stability that was evident for estimates of mobility and self-care limitations between the 1990 census and the CRS is also evident for estimates of work disability.

Table 5 presents the comparison of responses between the 1990 census and the CRS with respect to whether the person is limited in the kind of work or the amount of work, or is prevented from working at a job, because of a physical, mental, or other health condition.

TABLE 5 Work Disability: Responses to Census Questions 18a and 18b and Content Reinterview Survey Questions 33a and 33b for Persons 16-64 Years of Age, United States, 1990

                                      Content Reinterview Survey: Limited in Kind or
                                      Amount of Work or Prevented from Working
Census Long Form: Limited in Kind
or Amount of Work or Prevented
from Working                          Yes        No          Total
Yes                                     778        366        1,144
No                                      650     12,988       13,638
Total                                 1,428     13,354       14,782

NOTE: The prevalence rate based on the census is 7.7 percent, of which 68 percent were consistent responses. The prevalence rate based on the Content Reinterview Survey is 9.7 percent, of which 54.5 percent were consistent responses.

SOURCE: McNeil, 1993.

Once again, it can be seen that between one-third and almost one-half of the respondents are inconsistent in their responses.
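The summary rates reported in the note to Table 5 follow directly from the cell counts; a short check, using only numbers taken from the table:

```python
# Cell counts from Table 5 (census response x CRS response).
yes_yes, yes_no = 778, 366        # census "yes" row
no_yes, no_no = 650, 12_988       # census "no" row
total = yes_yes + yes_no + no_yes + no_no           # 14,782 persons

census_prev = (yes_yes + yes_no) / total            # 0.077 -> 7.7 percent
crs_prev = (yes_yes + no_yes) / total               # 0.097 -> 9.7 percent

# "Consistent" responses: yes in both surveys, as a share of each survey's yeses.
print(f"census prevalence: {census_prev:.1%}; consistent: {yes_yes / (yes_yes + yes_no):.1%}")
print(f"CRS prevalence:    {crs_prev:.1%}; consistent: {yes_yes / (yes_yes + no_yes):.1%}")
```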

More recent investigations have used the extensive data from the NHIS-D to investigate alternative estimates of the population with work disabilities. The data also provide an opportunity to examine inconsistencies in the reporting of work disability and the receipt of SSI or SSDI benefits. For example, LaPlante (1999) found that, based on the data from the NHIS-D, 9.5 million adults 18 to 64 years of age report being unable to work because of a health problem. Among these 9.5 million adults, 5.3 million (or 56 percent) do not report receipt of SSI or SSDI benefits. If one looks at those who report receiving SSI or SSDI benefits, 75 percent report that they are unable to work and 13 percent report that they are limited in the kind or amount of work that they can perform, but 12.3 percent who report receipt of benefits do not report any limitation with respect to work.

Although these variations in estimates derived from different surveys suggest instability in the estimates of the proportion of persons with work disabilities as a function of the wording of the question, the nature of the respondent, and the essential survey conditions under which the measurement was taken, they provide little information about measurement error within the framework of either survey statistics or psychometrics. Little is known about the validity or the reliability of these items, whether one views validity from the perspective of survey statistics, as deviations from the true value, or from the perspective of psychometrics, as criterion-related or construct validity. The relative contributions of various sources of error are, for the most part, unknown; it is known only that various combinations of design features produce different estimates. None of the studies address errors of nonobservation.

QUESTION WORDING ISSUES RELATED TO SELECTED MEASURES OF WORK DISABILITY

Jette and Badley point out the conceptual problems inherent in many questions designed to measure persons with work disabilities, including the failure of most questions to enumerate the separate elements related to the role of work. That failure is evident in most work disability screening questions designed to be administered to the general adult population. The gap between the conceptual framework and the questions used to screen for work disability is illustrated below using questions from several federal data collection efforts.

The long form of the decennial census for the year 2000 includes the following questions:

    Because of a physical, mental, or emotional condition lasting 6 months or more, does this person have any difficulty in doing any of the following activities: … d. (Answer if this person is 16 years old or over.) Working at a job or business?

The respondent is to check a box corresponding to “Yes” or “No.” The question is complex for several reasons:

- The respondent must consider multiple dimensions of health (physical, mental, and emotional) and attribute difficulty working at a job or business to one or more of these health problems. The explicit enumeration of physical, mental, or emotional conditions serves as a means of clarifying for the respondent that the question is intended to cover all three dimensions of health, but at the cost of additional cognitive processing by the respondent.

- The respondent must also assess the duration of the condition and determine the degree to which “6 months” is intended to convey 6 months specifically or a more general concept of a “long-term” condition.

- The term “difficulty” is subject to interpretation. Cognitive evaluation of the term suggests that for some respondents it implies the capacity or ability to perform the activity but does not imply actual participation in the activity.

- What is or is not included in the concept of working is further subject to interpretation by the respondent (e.g., the inclusion or exclusion of sheltered workshops).

As with many single screening items, the question fails to address accommodations that facilitate participation or barriers that prohibit participation. For example, if an individual is currently employed in an environment that accommodates a health condition, the respondent must determine whether the person should be considered as having difficulty working, even though the present employment situation presents no difficulty to the person.

The NHIS asks two questions concerning work limitations:

    Does any impairment or health problem NOW keep _______ from working at a job or business?

    Is ____ limited in the kind OR amount of work ___ can do because of any impairment or health problem?

In contrast to the questions in the census long form, the NHIS questions do not enumerate the various areas of health for consideration, nor does either question include a qualifying statement with respect to duration. The two questions are more specific in addressing the impact on working; compared with the term “difficulty” used in the census questionnaire, the NHIS probes whether a condition prevents the person from working or limits the kind or amount of work. Once again, note the lack of distinction between the ability to perform the activities associated with the actual performance of the job and those activities related to the role of work. For those who retire early because of a health condition or impairment, would the respondent consider that health problem as keeping the person from working?

IMPLICATIONS FOR METHODOLOGICAL RESEARCH

The point of the examples presented above is not to criticize the questionnaires in which they appear but rather to illustrate the problem of attempting to measure a complex, multidimensional, dynamic construct with a single question or a set of two questions. No one or even two questions can possibly tap all of the various components of work disability. Clearly, the first step toward a robust set of screening items is the acceptance of a shared conceptual framework and understanding of the dimensions of the construct of interest. That framework must consider the social environment in which the measurement of interest will be taken, with the understanding that the comprehension of a question is shaped not only by the specific words used in the question and the context of the question but also by the perceived intent of the question.

The use of cognitive laboratory techniques can aid in the identification of problems of comprehension due to the use of inherently vague terms and differential perceptions of the intent of a question. Such techniques will aid in the understanding of the validity of questions and, through the refinement of question wording, should improve the reliability of the items. Simply documenting that variation in the essential survey conditions of the measurement process contributes to different estimates of persons with work disabilities is not sufficient; the marginal effects of the various factors need to be measured, and their impact needs to be reduced through the use of alternative design features. Both of these goals can be accomplished only through a program of experimentation. Similarly, the psychometric properties of these measures need to be assessed. Without a thorough program of development and evaluation, the discrepant estimates evident in the empirical literature will persist.

REFERENCES

Andersen R, Kasper J, Frankel M. 1979. Total Survey Error. San Francisco: Jossey-Bass Publishers.

Ashberg K. 1987. Disability as a predictor of outcome for the elderly in a department of internal medicine. Scandinavian Journal of Social Medicine 15:261–265.

Beatty P, Davis W. 1998. Evaluating Discrepancies in Print Reading Disability Statistics through Cognitive Interviews. Unpublished memorandum. Washington, DC: U.S. Bureau of the Census.

Bohrnstedt G. 1983. Measurement. In: Rossi, Wright, Anderson, eds. Handbook of Survey Research. New York: Academic Press.

Brewer M. 1988. A dual process model of impression formation. In: Srull T, Wyer R, eds. Advances in Social Cognition, Volume 1. Hillsdale, NJ: Lawrence Erlbaum Associates.

Brorsson B, Asberg K. 1984. Katz Index of Independence in ADL: Reliability and validity in short-term care. Scandinavian Journal of Rehabilitation Medicine 16:125–132.

Cannell C, Fisher G, Bakker T. 1965. Reporting of hospitalizations in the health interview survey. Vital and Health Statistics, Series 2, Number 6. Washington, DC: U.S. Public Health Service.

Forsyth B, Lessler J. 1991. Cognitive laboratory methods: A taxonomy. In: Biemer, Groves, Lyberg, Mathiowetz, Sudman, eds. Measurement Errors in Surveys. New York: John Wiley and Sons.

Groves R. 1989. Survey Errors and Survey Costs. New York: John Wiley and Sons.

Groves R. 1991. Measurement errors across the disciplines. In: Biemer P, Groves R, Lyberg L, Mathiowetz N, Sudman S, eds. Measurement Errors in Surveys. New York: John Wiley and Sons.

Groves R, Couper M. 1998. Nonresponse in Household Surveys. New York: John Wiley and Sons.

Haber L. 1967. Identifying the disabled: Concepts and methods in the measurement of disability. Social Security Bulletin 30:17–34.

Haber L. 1990. Issues in the definition of disability and the use of disability survey data. In: Levin, Zitter, Ingram, eds. Disability Statistics: An Assessment. Washington, DC: National Academy Press.

Haber L, McNeil J. 1983. Methodological Questions in the Estimation of Disability Prevalence. Unpublished report. Washington, DC: U.S. Bureau of the Census.

Hahn R, Truman B, Barker N. 1996. Identifying ancestry: The reliability of ancestral identification in the United States by self, proxy, interviewer and funeral director. Epidemiology 7:75–80.

Hansen M, Hurwitz W, Bershad M. 1961. Measurement errors in censuses and surveys. Bulletin of the International Statistical Institute 38:359–374.

Jette A. 1994. How measurement techniques influence estimates of disability in older populations. Social Science and Medicine 38:937–942.

Jette A, Badley E. 1999. Conceptual issues in the measurement of work disability. In: Mathiowetz N, Wunderlich GS, eds. Survey Measurement of Work Disability: Summary of a Workshop. Washington, DC: National Academy Press. Pp. 4–27.

Jobe J, Mingay D. 1990. Cognitive laboratory approach to designing questionnaires for surveys of the elderly. Public Health Reports 105:518–524.

Jones E, Nisbett R. 1971. The Actor and the Observer: Divergent Perceptions of the Causes of Behavior. Morristown, NJ: General Learning Press.

Katz S, Akpom C. 1976. Index of ADL. Medical Care 14:116–118.

Katz S, Ford A, Moskowitz R, Jacobsen B, Jaffe M. 1963. Studies of illness in the aged: The index of ADL: A standardized measure of biological and psychosocial function. Journal of the American Medical Association 185:914–919.

Katz S, Downs T, Cash H, Grotz R. 1970. Progress in development of the Index of ADL. Gerontologist 10:20–30.

Keller D, Kovar M, Jobe J, Branch L. 1993. Problems eliciting elders’ reports of functional status. Journal of Aging and Health 5:306–318.

Kish L. 1965. Survey Sampling. New York: John Wiley and Sons.

LaPlante M. 1999. Highlights from the National Health Interview Survey Disability Study. Presentation to the Committee to Review the Social Security Administration’s Disability Decision Process Research, Institute of Medicine, and Committee on National Statistics, National Research Council.

Lord F, Novick M. 1968. Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley.

Lyberg L, Kasprzyk D. 1991. Data collection methods and measurement error: An overview. In: Biemer, Groves, Lyberg, Mathiowetz, Sudman, eds. Measurement Errors in Surveys. New York: John Wiley and Sons.

Mathiowetz N, Groves R. 1985. The effects of respondent rules on health survey reports. American Journal of Public Health 75:639–644.

Mathiowetz N, Lair T. 1994. Getting better? Change or error in the measurement of functional limitations. Journal of Economic and Social Measurement 20:237–262.

McDowell I, Newall C. 1996. Measuring Health: A Guide to Rating Scales and Questionnaires. New York: Oxford University Press.

McHorney C, Kosinski M, Ware J. 1994. Comparisons of the costs and quality of norms for the SF-36 Health Survey collected by mail versus telephone interview: Results from a national survey. Medical Care 32:551–567.

McNeil J. 1993. Census Bureau Data on Persons with Disabilities: New Results and Old Questions about Validity and Reliability. Paper presented at the 1993 Annual Meeting of the Society for Disability Studies, Seattle, Washington.

McNeil J. 1998. Selected 92/93 Panel SIPP Data: Time 1 = Oct. 93–Jan. 94, Time 2 = Oct. 94–Jan. 95. Unpublished table.

Moore J. 1988. Self/proxy response status and survey response quality: A review of the literature. Journal of Official Statistics 4:155–172.

Rodgers W, Miller B. 1997. A comparative analysis of ADL questions in surveys of older people. The Journals of Gerontology 52B:21–36.

Rosenberg M. 1990. The self-concept: Social product and social force. In: Rosenberg M, Turner R, eds. Social Psychology: Sociological Perspectives. New Brunswick, NJ: Transaction Publishers.

Rubinstein L, Schaier C, Wieland G, Kane R. 1984. Systematic biases in functional status assessment of elderly adults: Effects of different data sources. The Journal of Gerontology 39(6):686–691.

Sampson A. 1997. Surveying individuals with disabilities. In: Spencer B, ed. Statistics and Public Policy. Oxford: Clarendon Press.

Schuman H, Presser S. 1981. Questions and Answers in Attitude Surveys. New York: Academic Press.

Schwarz N, Wellens T. 1997. Cognitive dynamics of proxy responding: The diverging perspectives of actors and observers. Journal of Official Statistics 13:159–180.

Suchman L, Jordan B. 1990. Interactional troubles in face-to-face survey interviews. Journal of the American Statistical Association 85:232–241.

Sudman S, Bradburn N. 1973. Effects of time and memory factors on response in surveys. Journal of the American Statistical Association 68:805–815.

Sudman S, Bradburn N, Schwarz N. 1996. Thinking About Answers: The Application of Cognitive Processes to Survey Methodology. San Francisco: Jossey-Bass.

Tourangeau R. 1984. Cognitive sciences and survey methods. In: Jabine, Straf, Tanur, Tourangeau, eds. Cognitive Aspects of Survey Methodology: Building a Bridge Between Disciplines. Washington, DC: National Academy Press.

U.S. Bureau of the Census. 1993. Content Reinterview Survey: Accuracy of Data for Selected Population and Housing Characteristics as Measured by Reinterview. 1990 Census of Population and Housing, Evaluation and Research Reports. Washington, DC: U.S. Department of Commerce.

Willis G, Royston P, Bercini D. 1991. The use of verbal report methods in the development and testing of survey questionnaires. Applied Cognitive Psychology 5(3):251–267.