

The National Academies | 500 Fifth St. N.W. | Washington, D.C. 20001
Copyright © National Academy of Sciences. All rights reserved.

OCR for page 236
11 Statistical Methodology for Health Policy Analysis

INTRODUCTION

Previous chapters of this report have described the changing demographic profile of the U.S. population and the impact of the growing number of older Americans on demand for health, housing, and social services. They have also discussed the need for additional scientific research and policy-oriented analysis on patterns of aging and their consequences. Such investigations will require sophisticated use of existing statistical methodology and, in some cases, the development of new methods.

Although the importance of longitudinal information sometimes seems self-evident to investigators, the need for longitudinal studies and, conversely, the limitations of cross-sectional and retrospective investigations are not always apparent to decision makers and funding agencies. Thus, this chapter begins with a discussion of the rationale for longitudinal studies, followed by a discussion of problems in the design and analysis of longitudinal studies. A second major theme in this report is increased usage of existing studies and administrative information through linkage of data bases. The second section of this chapter is devoted to a discussion of some of the administrative, legal, and technical issues raised by attempts to link data bases. The third section of the chapter discusses methodological issues in a third area of fundamental importance in the study of aging: forecasting of the sizes and composition of populations, as well as population characteristics such as health status and needs for

services. The final section of the chapter discusses a generic problem in policy analysis, the quantification and reporting of uncertainty.

LONGITUDINAL INFORMATION

The Rationale for Longitudinal Studies

Given the cost and complexity of developing longitudinal information for studies of any population and particularly the elderly, it is reasonable to ask why the data needed about the policy implications of an aging population cannot be obtained by simpler and less costly designs, especially cross-sectional studies. In subsequent paragraphs, we discuss two considerations that motivate longitudinal studies: (1) the need to study aging as a process and (2) the need to reduce bias and improve precision of estimates of net change in populations.

In many areas of investigation discussed in this report, the essential role of longitudinal data derives from the need to study aging and its consequences as processes, that is, complexes of states and events occurring over time (Rowe, 1977). Because these processes and the factors that determine their course are defined for individuals rather than population aggregates, they can be studied only by gathering data on individuals over time (Fienberg and Tanur, 1986). Examples of processes that must be understood in order to anticipate the needs of tomorrow's elderly include:

(1) The interaction between economic circumstances and health care utilization. One aspect of this interaction is the effect of acute and chronic illness on economic circumstances after retirement (Menefee, 1985), especially the spend-down to poverty that can occur during an extended illness. Another aspect is the effect of economic circumstances on patterns of health care utilization and the influence of different pathways on the economics of care (Meiners, 1985b).
(2) The dynamics of social networks involving elderly persons and the effects of social support on the subsequent physical and mental health of the elderly (Berkman, 1985).

(3) Patterns of morbidity and mortality in the elderly and the predictive significance of functional status, living conditions, and clinical risk factors (see Chapter 3).

By definition, these are questions about change: change in health or functional status, economic circumstances, or environment, and the resulting changes in utilization and costs of health, social,

238 AGING POPULATION IN THE TWENTY-FIRST CENTURY

and housing services. More generally, the preceding chapters have identified a need for increased understanding of the transitions between states experienced by elderly persons and the dependence of these transitions on factors such as diseases, lifestyle habits, and changes in psychosocial supports. Only longitudinal data can provide the basic observations of change that are the objects of study.

Stochastic models can play an important role in the study of these phenomena. By formulating, fitting, and validating explicit models, statisticians can clarify multivariate relationships and refine projections of the health and service needs of tomorrow's elderly population. For example, models for relationships between health care costs and health care utilization can be used to examine the effects of changes in costs of medical services on future demand. Ideally, a stochastic model for health care utilization and costs would also consider the role of personal health, personal economic resources, and other factors that influence health care needs. In practice, more complex models can be more difficult to validate and can also lead to more uncertain projections. Thus, stochastic modeling requires careful choices about model complexity.

Longitudinal data can be obtained either prospectively, by following individuals over time, or retrospectively, by obtaining historical information from study participants. Retrospective studies can play an important role in the study of aging, as evidenced by the ongoing NIA Survey of the Last Days of Life (National Research Council, 1986). They have important limitations, however, especially in studies of elderly persons. First, many subjective states or events cannot be reconstructed retrospectively. Similarly, physiologic measurements can be obtained only in prospective studies. Retrospective studies will also be affected by selective mortality.
The absence of deceased persons from a retrospective study will produce an incomplete picture of the process under study. Finally, even objective states, such as changes in family composition or economic circumstances, are subject to inaccuracies of recall that can be especially severe in elderly persons. Thus, although retrospective studies can be of value, they do not provide a general methodologic alternative to longitudinal designs.

Longitudinal data are required for the study of gross flows and for the study of individual change and its determinants, studies that cannot be carried out with cross-sectional data. Although either longitudinal data or cross-sectional data can be used to provide data about net change in populations with age, several problems can

be encountered in using cross-sectional data for this purpose. By net change, we mean the difference in age-specific distributions of population characteristics. As a specific example, we might wish to estimate the changes between ages 70 and 75 in percentages of persons who live in nursing homes and who need assistance with two or more activities of daily living. Using successive cross-sectional data, the percentage of 70-year-old residents requiring such assistance could be estimated at one year and the percentage of 75-year-old residents requiring such assistance could be estimated 5 years later. This method could give a biased estimate of net change in that five-year period because of selective mortality.

In discussing mortality as a potential source of bias in cross-sectional data, Rowe (1977) points out that whenever the variable under study is related to survival, a cross-sectional study can give biased estimates of age-related changes. In the study of changes in living conditions, for example, higher rates of mortality among individuals living in nursing homes would lead to underestimates of the age-specific rates of movement from independent living to nursing homes in cross-sectional data. Selective mortality can be viewed as a special kind of missing data problem; the effects of missing data on longitudinal analysis are discussed below.

Data from a single cross-sectional study could be used to make the comparison described above, which was based on sequential cross-sectional data. The percentage of 70-year-old residents needing assistance with two or more activities of daily living could be compared with the corresponding percentage of 75-year-old residents at the time of the study. Studying net change in this manner introduces the additional problem of cohort effects.
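The selective-mortality bias discussed in this passage can be made concrete with a small calculation. The following is a minimal sketch, not a method taken from the report: the cohort size, the initial nursing-home share, the five-year movement rate, and the two mortality rates are all hypothetical values chosen for illustration, and the function name `net_change_in_prevalence` is invented here. It also assumes, for simplicity, that moves into nursing homes happen before deaths within the interval.

```python
def net_change_in_prevalence(n=1000, p_nh_70=0.05, move_rate=0.10,
                             death_nh=0.60, death_comm=0.20):
    """Change in nursing-home prevalence between ages 70 and 75 among survivors.

    All default values are hypothetical. Everyone who is in a nursing home at
    any point in the interval is exposed to the nursing-home death rate.
    """
    nh_70 = n * p_nh_70                 # nursing-home residents at age 70
    comm_70 = n - nh_70                 # community residents at age 70
    movers = comm_70 * move_rate        # true gross flow into nursing homes

    # Survivors at age 75, by residence at the end of the interval.
    nh_survivors = (nh_70 + movers) * (1 - death_nh)
    comm_survivors = (comm_70 - movers) * (1 - death_comm)

    p_nh_75 = nh_survivors / (nh_survivors + comm_survivors)
    return p_nh_75 - p_nh_70


# Differential mortality: prevalence rises by only about 2.8 points,
# although 10 percent of community residents actually moved.
biased = net_change_in_prevalence()

# Equal mortality in both groups: prevalence rises by 9.5 points.
unbiased = net_change_in_prevalence(death_nh=0.20, death_comm=0.20)
```

Under these assumed rates, the cross-sectional comparison understates the movement into nursing homes by a wide margin once nursing-home residents die at a higher rate than community residents, which is the direction of bias the text attributes to selective mortality.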
Cohort effects in this instance would be differences between age-specific patterns of limitations in activities of daily living in cohorts born five years apart. The size of cohort effects can vary depending on the variable under study. One way to address the issue of cohort effects is to use successive cross-sectional data for the same cohort as described above. In view of these considerations, cross-sectional data can be used to obtain valid estimates of net change if the potential bias from selective mortality and cohort effects is small (Louis et al., 1986). Otherwise, longitudinal data are necessary for this purpose.

Finally, we note that longitudinal or repeated measures experiments are often designed to increase the precision of estimates of changes over time or between experimental conditions. Although this can also be a consideration in designing observational studies of

aging, efficiency is typically a second-order consideration relative to considerations of informativeness and validity.

In summary, longitudinal designs are essential to the study of processes associated with aging and the factors determining their course because their focus on individuals encompasses patterns of improvement over time as well as decline. Cross-sectional data potentially can provide some information about net change in populations due to aging, but they can give biased, and possibly misleading, estimates when the variables under study are subject to cohort or selection effects.

Designing Longitudinal Studies

Although the design phase of longitudinal studies has received some attention in the statistical literature, most papers have addressed single issues in highly simplified settings. Several papers, for example, have investigated the relative efficiency of different repeated measures designs for estimating the average rate of change over time of a single measured variable. Schlesselman (1973) and Berry (1974) discussed the effect of duration and frequency of measurement on the precision of estimated rates of change in measured physiologic variables such as pulmonary function or blood pressure measurements. For a review of this literature, see Cook and Ware (1983).

Design issues in more complex, possibly multipurpose, studies have not received the same attention, in part because the formulation of the issues is sensitive to the type of process and end point to be studied. One important class of questions involves the definition of units of study and end points. For example, longitudinal studies of family units require a longitudinal definition of the family.
The Income Survey Development Program (David, 1983) defined households operationally by following members of the original household even when they moved away from the household group, but chose not to follow individuals entering the household after the original interview if they subsequently moved away. Each such decision creates opportunities and limitations in subsequent analysis. Similarly, when variables are measured repeatedly over time, special care is needed in developing measurement systems that are free of spurious temporal variation. This is easily appreciated in longitudinal studies of physiologic variables (Dawber, 1980) but equally important in the collection of questionnaire or interview data.
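The dependence of precision on duration and frequency of measurement, mentioned earlier in this section, can be illustrated for the simplest case. The sketch below is an illustration under stated assumptions, not a reproduction of the cited analyses: it assumes equally spaced measurements on one individual, a straight-line trend, and independent measurement errors with constant standard deviation `sigma` (serial correlation within a person is ignored). The function name `slope_variance` is invented here. Under these assumptions the variance of the least-squares slope is `sigma**2` divided by the sum of squared deviations of the measurement times from their mean.

```python
def slope_variance(sigma, duration, n_measurements):
    """Variance of the least-squares slope for equally spaced measurements.

    Assumes independent errors with standard deviation `sigma`; the first
    measurement is at time 0 and the last at time `duration`.
    """
    if n_measurements < 2:
        raise ValueError("at least two measurements are needed to estimate a slope")
    step = duration / (n_measurements - 1)
    times = [i * step for i in range(n_measurements)]
    mean_time = sum(times) / n_measurements
    # Sum of squared deviations of measurement times from their mean.
    sxx = sum((t - mean_time) ** 2 for t in times)
    return sigma ** 2 / sxx


# Hypothetical comparison of three designs with the same error variance.
short_study = slope_variance(sigma=1.0, duration=4.0, n_measurements=5)
long_study = slope_variance(sigma=1.0, duration=8.0, n_measurements=5)
dense_study = slope_variance(sigma=1.0, duration=4.0, n_measurements=9)
```

In this simplified setting, doubling the duration with the same number of measurements cuts the variance of the estimated slope to one quarter, while nearly doubling the number of measurements within a fixed duration gives a much smaller gain.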

Every longitudinal study raises issues of this type, including the definition of study end points, the duration of study and frequency of measurement, the nature of the measurement system, the definition of study units, the nature of the sampling and follow-up plans, and many more such issues. We believe that resolution of these issues, critical to the success of complex studies, requires special consideration of the features of individual research settings. The panel urges that sponsors of longitudinal studies make provision in study staffing and timetables to allow for consideration of the special design problems posed by individual studies.

Designs and Their Implications for Longitudinal Data Analysis

In empirical applications, testing whether specific classes of stochastic process models describe the occurrence of events or the evolution of continuous variables is best facilitated by observing, in full, many realizations of the underlying process for all times in a wide time interval. Examples of such data are the work histories in the Seattle and Denver Income Maintenance Experiments (Tuma et al., 1979), the fertility histories in the Taichung IUD experiment (Freedman and Takeshita, 1969), and the job vacancy histories for ministers in Episcopalian churches in New England (White, 1970). In most substantive contexts, however, ascertaining the exact timing of each occurrence of an event for each individual is either impossible, economically infeasible, or both. Observations usually contain gaps and censoring relative to a continuously evolving process. Three examples of this situation are as follows:

(1) In the Framingham study (Dawber, 1980) of atherosclerotic disease, individuals were examined once every two years, at which times symptoms of illness, hospitalizations, or other events occurring between examinations were recorded (retrospective information).
In addition, a physical examination, some blood studies, and other laboratory work (current information) were completed. One topic of considerable interest is the intraindividual dynamics of systolic blood pressure. This is a continuous-time and continuous-state process that can be modeled only using the biennial samples, i.e., measurements made at the examinations. Such data represent fragmentary information about the underlying process.

(2) In the Taeuber et al. (1968) residence history study, observations were taken retrospectively on current residence, first and second prior residence, and birthplace of individuals in particular

age cohorts. Analyses in which duration of residence is a dependent variable of interest must accommodate censoring on the right for current residence. Furthermore, characterization of the pattern of adult residence histories is complicated by the fact that initial conditions are unknown for persons who have occupied more than three residences beyond, for example, age 18.

(3) The first Duke longitudinal study of aging was conducted on a sample of 271 male and female volunteers who were socioeconomically representative of elderly persons in the community of Durham, North Carolina. The mean age of the study population at the first test date was 71.3 years. At each of the 11 waves of the 21-year panel survey, a standard battery of physiological risk factors, serum cholesterol, diastolic blood pressure, and pulse pressure were ascertained from all persons still alive and in the study. Wechsler intelligence tests were also given to participants at each wave. In modeling the age-dependent risk factors that affect mortality in this elderly population (Manton and Woodbury, 1983b), the analyst must take account of the fact that (a) relative to continuously evolving processes, there are gaps in the data on the physiological risk factors (they are measured at most 11 distinct times in 21 years); (b) high rates of mortality selection and unequal follow-up times led to small sample sizes for studying risk factor changes at the most advanced ages; and (c) all variables are not necessarily recorded for all individuals present at a given survey, thereby yielding very intricate patterns of missing data within a survey and over time.

A feature of modeling with such fragmentary data is that algebraic characterizations of the data sets that can possibly be generated by given continuous-time models are frequently very difficult to obtain.
However, these characterizations are, of necessity, the basis of tests for compatibility of the data with proposed classes of models. In addition, estimation of quantities such as rates of occurrence of events per individual at risk of the event at a given time is complicated by the fact that some of the occurrences are unobserved. This necessitates estimation of rates that have meaning within stochastic process models that are found to be compatible with the observed data.

Analytical Strategies

In the ideal but rather infrequent settings in which evolving processes are observed over a time interval, two quite disparate in

terms of historical roots, but nevertheless related, modeling frameworks have been utilized to estimate rates of occurrence of events, changes in levels of continuous variables, and parameters associated with covariates that are viewed as the primary factors that influence outcomes. One of these modeling frameworks is a direct adaptation of the classical linear statistical models to longitudinal data, reviews of which are contained in Ware (1985) and Ware et al. (1988). Also see Geisser (1980) for a pertinent review of the capabilities of longitudinal data analysis for making projections and predictions. An alternative framework, originating from diverse stochastic process specifications, is the multivariate counting process literature (see the review by Andersen and Borgan, 1984) and the multivariate Gaussian process and diffusion process literature (see in this regard Woodbury et al., 1979; Manton and Woodbury, 1983a). Applications of both the stochastic process framework and the linear models specifications in the context of labor economics and the sociology of work are discussed in detail in Heckman and Singer (1985).

For data with the types of limitations illustrated in the examples given above, a major research program remains to be carried out before longitudinal data analysis can be regarded as a mature subject. A commonly used analytical strategy proceeds according to the following basic steps:

(1) When estimation of transition rates between a discrete set of states is the primary focus of the analysis, one begins with very simple, somewhat plausible classes of models as candidates to describe some portion of the observed data and within which the unobserved dynamics are well defined: for example, a time series of time-homogeneous Markov chains for which each separate model describes only unobserved dynamics between a pair of consecutive surveys in a multiwave panel
design and fits the observed transitions.

(2) Estimate and interpret the parameters of interest (for example, transition rates between discrete pairs of states) within the simplified models and then assess whether these models can, in fact, account for finer-grained detail such as the joint frequency of state occupancy at three or more consecutive surveys in a multiwave panel study.

(3) Typically, the original proposed models (they are usually first-order Markovian across a wide range of subject matter contexts) that may adequately represent data based on pairs of consecutive surveys will not account for higher-order dependencies. Such

dependencies tend to be the rule rather than the exception in longitudinal microdata. We then look for structured residuals from the simple models to guide the selection of more realistic and interpretable specifications (see, e.g., Singer and Spilerman, 1976; Goodman, 1978; Duncan, 1981, for a discussion of this kind of strategy in a variety of sociology and economics investigations).

The repeated fitting of models and subsequent utilization of structured residuals to guide successively more realistic model selection is a strategy that, on the surface, seems to be very reasonable. However, the process frequently stagnates after only one or two stages because the possible explanations for given structured residuals are usually too extensive to be helpful by themselves. One really needs, in addition, specific subject matter theories translated into mathematics to guide the model selection process. Unfortunately, in most fields in which analysis of longitudinal microdata is of interest, the development of substantive theory is quite weak. The potential danger of the foregoing strategy, even for the estimation of transition rates, is that parameter estimates may be biased simply as a result of model misspecifications. The biases, in turn, can lead to incorrect conclusions about relationships between events.

As the foregoing discussion suggests, considerable room remains for the development and assessment of stochastic models for temporal processes. Ideally these models will be based on mathematical constructs that represent the underlying biological, behavioral, or other processes in a meaningful way. Analytic methods should be applicable in settings in which data sets are unbalanced and incomplete. In some cases, it will be important to explicitly model the processes that lead to missing data.
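Steps (1) and (2) of the strategy described above can be sketched in a few lines. This is a simplified illustration, not the procedure of any cited study: it works in discrete time with wave-to-wave transition proportions instead of continuous-time rates, the state labels and the ten three-wave histories are hypothetical, and the function names `transition_matrix` and `expected_three_wave_counts` are invented here. A first-order Markov model is fitted separately to each pair of consecutive waves, and the joint three-wave counts it implies are then compared with the observed counts.

```python
from collections import Counter


def transition_matrix(pairs, states):
    """Estimate wave-to-wave transition proportions from (from, to) pairs."""
    counts = Counter(pairs)
    matrix = {}
    for origin in states:
        row_total = sum(counts[(origin, dest)] for dest in states)
        matrix[origin] = {
            dest: (counts[(origin, dest)] / row_total if row_total else 0.0)
            for dest in states
        }
    return matrix


def expected_three_wave_counts(histories, states):
    """Three-wave joint counts implied by first-order Markov dependence."""
    p12 = transition_matrix([(h[0], h[1]) for h in histories], states)
    p23 = transition_matrix([(h[1], h[2]) for h in histories], states)
    start = Counter(h[0] for h in histories)
    return {
        (a, b, c): start[a] * p12[a][b] * p23[b][c]
        for a in states for b in states for c in states
    }


# Hypothetical histories: "W" = well, "D" = disabled, observed at three waves.
states = ("W", "D")
histories = ([("W", "W", "W")] * 6 + [("W", "D", "D")] * 2
             + [("D", "D", "D")] * 2)

observed = Counter(histories)
expected = expected_three_wave_counts(histories, states)
# Residuals: large, patterned values would signal higher-order dependence.
residuals = {h: observed[h] - expected[h] for h in expected}
```

With these made-up histories the residuals are all zero, so the first-order model reproduces the three-wave pattern; in real longitudinal microdata, as the text notes, structured nonzero residuals are the usual outcome, and they are what step (3) tries to interpret.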
Although there are many successful examples of longitudinal analysis, methodological approaches and applications are very diverse and approaches tend to differ in different applied settings. In particular, the econometric, sociometric, biometric, and human growth literatures contain extensive work on longitudinal methods, but communication among the groups of investigators is imperfect. One can hope that future methodological developments will lead toward more unified approaches to model formulation and data analysis.

An important new development for exploring high-dimensional longitudinal data sets is the family of grade of membership (GOM) models developed by Woodbury et al. (1978), Woodbury and Manton (1982), and Manton et al. (1987). This modeling framework is

well suited to studying the dynamics of highly heterogeneous elderly populations and has the added very desirable feature of breaking free of conventional regression structures when assessing the impact of covariates on outcome variables. The essential idea is to represent the individual histories in a vector stochastic process in terms of the evolution of degrees of similarity (or grades of membership) of individuals to ideal (or pure type) profiles. The profiles are characterized by combinations of levels on particular covariates that occur with high frequency. The profiles also need not be static. They may in fact be stochastic processes themselves, thereby leading to a representation of individual dynamics in terms of evolving degrees of similarity to special pure type processes. Although much further theoretical, empirical, and numerical computational development remains to be carried out before GOM can be considered to be a well understood and readily utilized framework, it has already shown sufficient promise in studies of health status dynamics and health care utilization in elderly populations to warrant much more sustained investigation.

A much neglected topic with both fragmentary and relatively complete longitudinal data is the recognition that choice-based sampling is a pervasive aspect of many data sets, and that it plays a particularly central role in program evaluation of almost every kind. For a very readable and insightful introduction to the analytical issues associated with choice-based samples and selection bias, see Heckman and Hotz (1987). This topic should be, but to date has not been, playing a major role in evaluations of health care systems for elderly populations. It is a research area critically in need of development.

Recommendation 11.1: Given the growing importance and complexity of longitudinal research, the panel
recommends that federal agencies encourage methodological research on innovative approaches to study design and analysis, both through support of methodological work within their own technical groups and through funding of methodological research.

RECORD LINKAGE

The Rationale for Record Linkages

Many of the data requirements discussed in preceding chapters

can be satisfied most economically by linking data from different sources. Linkages may be at the aggregate level or they may involve individual records from different data systems. Linkages of individual records may be accomplished either by exact matching or by statistical matching. In an exact match, the goal is to link records for the same individuals from two or more data systems; in a statistical match, the goal is to link records for individuals who are similar in important respects. Exact matching, if feasible, is the method to be adopted because fewer assumptions are needed to ensure the validity of analyses based on the linked data sets (U.S. Department of Commerce, 1980, Recommendation 1b, p. 33).

Statistical purposes of record linkages include enhancement of survey data by adding data from administrative files, development of sampling frames for surveys, and the evaluation of coverage and response errors in censuses and surveys. Important multipurpose statistical data systems have been developed by exact matching of records from different administrative files maintained by federal agencies.

Linkage of existing records as an alternative to direct data collection has become more appealing because of the development of sophisticated computerized record linkage techniques, based on models first proposed in the 1950s and 1960s (Newcombe et al., 1959; Fellegi and Sunter, 1969). Linkages are further facilitated by increasingly widespread use of Social Security numbers as identifiers (Jabine, 1985).

Examples of exact record linkages that have contributed or have the potential to contribute to our information base on the status of older Americans include:

Linkage of administrative data on retirement benefits to survey data collected in the Social Security Administration's Retirement History Survey (Fox, 1979).
Linkage of survey records from the Current Population Survey, the National Health Interview Survey, and the National Health and Nutrition Examination Survey to death records in the National Death Index (Patterson and Bilgrad, 1985).

Linkage of survey records from the National Medical Care Utilization and Expenditures Survey to records in the Medicare and Medicaid administrative data files (Cox and Bonham, 1983).
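The computerized linkage techniques mentioned above build on the Fellegi and Sunter (1969) model, in which each compared field adds a log-likelihood-ratio weight to a candidate pair of records. The sketch below shows only that scoring step, as an illustration: `m` is the probability that a field agrees when two records belong to the same person, `u` is the probability that it agrees when they do not, and the field names and probability values are hypothetical. The function name `match_weight` is invented here, and the choice of thresholds for declaring a link is not shown.

```python
import math


def match_weight(agreements, m, u):
    """Total Fellegi-Sunter style weight for one candidate record pair.

    `agreements` maps each field name to True (agree) or False (disagree).
    `m` and `u` map field names to agreement probabilities among true matches
    and among non-matches, respectively (hypothetical values in this sketch).
    """
    total = 0.0
    for field, agrees in agreements.items():
        if agrees:
            total += math.log2(m[field] / u[field])
        else:
            total += math.log2((1 - m[field]) / (1 - u[field]))
    return total


# Hypothetical agreement probabilities for two matching variables.
m_probs = {"surname": 0.9, "birth_year": 0.9}
u_probs = {"surname": 0.1, "birth_year": 0.5}

# Surname agrees, birth year disagrees.
weight = match_weight({"surname": True, "birth_year": False}, m_probs, u_probs)
```

Agreement on a field that rarely agrees by chance (a small `u`) adds a large positive weight, while disagreement on a field that true matches almost always share (a large `m`) subtracts heavily; pairs with high total weight are treated as likely links.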

aspects for which significant technical improvements are possible. These aspects include, for example, the choice and standardization of matching variables, the choice of blocking strategies (blocking limits the comparison of records in two files to those that agree on certain variables that appear in both files), and the role of manual intervention in computerized systems. Matching errors inevitably occur in linked data files; so far, relatively little has been done to assess these errors and to develop estimation and analysis techniques that take them into account. One source of error among the elderly is the often noted phenomenon of age misreporting (exaggeration) among those age 85 and over (Rosenwaike, 1968).

Recommendation 11.3: The panel recommends that federal agencies support research aimed at the development of improved multipurpose computerized record linkage systems and better methods of estimation and analysis when matching errors are present.

Formats vary for recording identifiers such as names and addresses in statistical and administrative data files. Other potential matching variables, such as age (or date of birth), race, ethnic origin, and marital status, are defined and categorized in various ways. These variations make record linkages more difficult and lead to matching errors.

Recommendation 11.4: The panel recommends that federal agencies that maintain statistical or administrative data bases work together to encourage standardization of definitions and reporting formats for personal identifiers (the Social Security number is critically important) and personal characteristics likely to be used as matching variables.

Although in the long run some data requirements can be met more economically by linking existing records, it is often difficult to obtain the necessary resources.
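Returning to the standardization of matching variables and the blocking strategies described earlier on this page, the two ideas can be sketched together. This is a minimal illustration under assumptions: the record layout (`id`, `surname`, `birth_year`), the choice of blocking variables, and the function names `standardize` and `candidate_pairs` are all hypothetical and do not describe any agency's system.

```python
from collections import defaultdict


def standardize(record):
    """Put matching variables into one common format before comparison."""
    return {
        "id": record["id"],
        "surname": record["surname"].strip().upper(),
        "birth_year": int(str(record["birth_year"]).strip()),
    }


def candidate_pairs(file_a, file_b, block_keys=("surname", "birth_year")):
    """Return id pairs that agree on every blocking variable.

    Only records that fall in the same block are paired, so most of the
    possible cross-file comparisons are never made.
    """
    blocks = defaultdict(list)
    for rec in map(standardize, file_b):
        blocks[tuple(rec[k] for k in block_keys)].append(rec["id"])

    pairs = []
    for rec in map(standardize, file_a):
        key = tuple(rec[k] for k in block_keys)
        for other_id in blocks.get(key, []):
            pairs.append((rec["id"], other_id))
    return pairs


# Hypothetical files with inconsistent formats for the same variables.
survey_file = [
    {"id": "a1", "surname": " Smith ", "birth_year": "1901"},
    {"id": "a2", "surname": "Jones", "birth_year": 1899},
]
admin_file = [
    {"id": "b1", "surname": "SMITH", "birth_year": 1901},
    {"id": "b2", "surname": "Jones", "birth_year": 1900},
]

pairs = candidate_pairs(survey_file, admin_file)
```

The first pair is found only because both files are standardized first; the second is dropped because the recorded birth years differ by one. That trade-off is the design choice in blocking: tighter blocks save comparisons but turn reporting errors, such as the age misreporting noted above, into missed matches.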
Substantial research and development efforts may be needed to explore feasibility and to adapt existing record linkage programs to make them suitable for a specific application. Understandably, agencies like the Social Security Administration and the Internal Revenue Service that maintain potentially useful administrative data systems do not consider the development of statistical data bases not directly related to their own programs to be a high-priority activity. Therefore most of the resources must come from potential users, some of whom may be

reluctant to invest significant amounts in the development of data systems over which they will have relatively little control.

Statutes like the 1974 Privacy Act, the Census Bureau's Title 13, and the Tax Reform Act of 1976 restrict, in varying degrees, the disclosure of individually identifiable data by one agency to another. Nevertheless, interagency record linkages can usually be undertaken in ways that do not violate statutory prohibitions, provided the agencies controlling the record systems involved are sufficiently motivated to do the linkage. These statutory limitations may, however, be considered obstacles to record linkages in two senses. First, they provide a convenient argument against performing a particular record linkage for an agency that is not anxious to do it. Second, even when both agencies want to do the linkage, the time required for development and approval of suitable arrangements can often substantially delay completion of a project, adversely affecting the utility of results.

Recommendation 11.5: The panel recommends that heads of appropriate federal agencies take the initiative to seek methods of overcoming legal and policy obstacles to cost-effective use of existing information through linkages that are clearly beneficial to the public and ethical, and that they seek legislative remedies if necessary.

Policy considerations are probably the primary obstacle to successful record linkage undertakings. Both statistical and administrative agencies are concerned that public cooperation with their requirements or requests for information may be adversely affected if it becomes widely known that they are allowing such information to be linked with the records of other agencies. The Privacy Act, however, requires that people who supply information be informed of any plans to disclose such information in individually identifiable form.
Agencies vary considerably in the amount of detail they provide in their Privacy Act notification statements: this is an ethical dilemma that has received some recent attention (Scheuren, 1985).

These concerns are by no means groundless. Computers, data banks, and record linkages are, rightly or wrongly, viewed with concern or suspicion by a substantial segment of the public. Even those agencies that have mandatory authority to collect information depend heavily on voluntary cooperation to obtain the data they need. Any widely publicized report of deception or other improper behavior on their part, whether accurate or not, could have serious repercussions. A recent example is the controversy that erupted in Sweden, early in 1986, concerning privacy aspects of Project Metropolitan, a sociological record linkage study designed to follow, over a 20-year period, all 10-year-olds who lived in Stockholm in 1963. Nonresponse in Sweden's monthly labor force survey has more than doubled in recent months, apparently as a result of the public debate about Project Metropolitan (Dalenius, 1986).

Obstacles to Dissemination and Use of Record Linkage Results

For record linkages that are undertaken despite the obstacles just described, the benefits derived depend on how widely and in how much detail the resulting data are disseminated to potential users. Publicly collected data are disseminated in two forms: aggregate statistics and microdata files. The latter contain individual records from which explicit identifiers, such as name and address, have been removed (microdata files that can be released without restriction are called public-use files). Statistical agencies have been releasing public-use microdata files for about 30 years, and the widespread availability of these files has benefited society by increasing the amount and relevance of quantitative information available for collective decision making.

Agencies that release data for statistical purposes must attempt, whether or not the data are derived from linked records, to avoid disclosing information in a form that makes it possible to identify specific individuals and thereby learn more about them. In other words, they must avoid statistical disclosure. Methods used to limit the risk of statistical disclosure resulting from release of microdata files include elimination of some geographic and other detail, replacement of exact amounts with class intervals, and deliberate introduction of error. In practice, zero risk of disclosure is unattainable (Cox et al., 1985).
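The three disclosure-limitation methods just listed can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the record layout, the field names, the income class boundaries, and the size of the age perturbation are hypothetical and do not describe any agency's actual procedure.

```python
import random

# Hypothetical microdata records; field names and values are illustrative only.
RECORDS = [
    {"county": "Durham", "state": "NC", "age": 67, "income": 18250},
    {"county": "Wake", "state": "NC", "age": 72, "income": 75000},
    {"county": "Cook", "state": "IL", "age": 81, "income": 9400},
]

# Assumed upper bounds of the income classes (not taken from the text).
INCOME_BOUNDS = [10000, 20000, 30000, 50000]


def income_class(amount, bounds=INCOME_BOUNDS):
    """Replace an exact amount with a class-interval label."""
    lower = 0
    for upper in bounds:
        if amount < upper:
            return f"{lower}-{upper - 1}"
        lower = upper
    return f"{lower}+"  # open-ended top class


def mask_record(rec, rng, age_noise=2):
    """Apply the three methods to one record."""
    return {
        # 1. Eliminate geographic detail: keep state, drop county.
        "state": rec["state"],
        # 2. Deliberately introduce error: perturb age by a small random amount.
        "age": rec["age"] + rng.randint(-age_noise, age_noise),
        # 3. Replace the exact amount with a class interval.
        "income_class": income_class(rec["income"]),
    }


def mask_file(records, seed=0):
    """Mask every record, using a seeded generator so runs are repeatable."""
    rng = random.Random(seed)
    return [mask_record(r, rng) for r in records]


if __name__ == "__main__":
    for row in mask_file(RECORDS):
        print(row)
```

Each step removes information that an analyst might have wanted, which is the trade-off the text describes: the masked file is harder to link back to individuals but also less detailed for legitimate research.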
The development of powerful computerized record linkage techniques has increased the likelihood of success by someone who, for whatever reason, might deliberately try to identify one or more persons from a public-use microdata file. Recent research by Paass (1985) suggests that deliberate introduction of random errors into records does not provide effective protection but does hamper the intended uses of the data. More recently, Duncan and Lambert (1986) introduced a "disclosure limiting" (DL) approach that provides a framework for measuring the extent of statistical disclosure if specific aggregated statistical data are released. They have demonstrated that certain ad hoc disclosure control policies commonly used by statistical agencies are special cases of the DL approach. Although microdata would fit into the context of the DL approach, use of DL with this type of data has not yet been explored.

Concerns about statistical disclosure risks have led to curtailment of the amount of information released in the form of microdata files. The extreme case is that of the Continuous Work History Sample (CWHS). Since shortly after the passage of the Tax Reform Act of 1976, no CWHS microdata files have been released because of findings by the Internal Revenue Service that there is a nonzero risk of disclosing individual tax return information. Prior to that time, CWHS microdata files were widely used by researchers inside and outside government to study relationships between earnings and benefits, internal migration, labor mobility, industry, mortality, and other topics. Plans had been developed to enhance the CWHS with data on mortality, retirement benefits, and Medicare benefits, thus creating a longitudinal data file of great potential value for research on policy issues related to the health of the aging population. The CWHS is a 1 percent sample of all persons who have ever been issued Social Security numbers and therefore provides virtually complete coverage of older persons. It is large enough (as of 1984 it included about 250,000 persons age 65 and over who receive Social Security benefits) to support analyses for subgroups of the elderly defined geographically or in other ways. The amount of longitudinal data is unmatched by other data sources.

Unfortunately, when releases of CWHS microdata files were terminated in 1976, the plans for content expansion through linkage with other files had to be terminated.
Limited efforts along these lines are presently under way as a collaborative project of the Statistics of Income Division of the Internal Revenue Service and the Office of Research, Statistics, and International Policy of the Social Security Administration, with funding from the National Cancer Institute. The panel recognizes the Continuous Work History Sample as a potentially valuable data base for research on policy issues relating to health, Medicare benefits, disability, and mortality of the aging population. It supports current efforts of the Internal Revenue Service and the Social Security Administration to enhance the system and resume limited dissemination of microdata files.

Recommendation 11.6: The panel recommends that the Social Security Administration and the Internal Revenue Service conduct a comprehensive review of the Continuous Work History Sample system, with a view to resuming broader release of its products under user agreements. If necessary, legislative authority should be sought. Federal agencies with information requirements that can be met by enhancing the CWHS system are urged to provide budgetary support.

Releases of public-use microdata files from the decennial censuses, the Current Population Survey, and other sources, some of which include linked administrative records, have continued. However, many users, including other federal agencies, have said that their uses of the files are hampered by the content restrictions that the releasing agencies impose in order to maintain the risk of statistical disclosure at what they consider to be an acceptably low level.

Recommendation 11.7: The panel recommends that federal statistical agencies develop procedures for making microdata files more readily available to users, including both other federal agencies and nongovernment researchers. Technical procedures, such as curtailment of file content and the deliberate introduction of error, cannot reduce the risk of statistical disclosure to zero; therefore, other methods of protecting the confidentiality of data subjects should be explored. Such methods include user agreements with penalties for violation and legal remedies for data subjects harmed by disclosure.

PROJECTIONS FOR AN AGING POPULATION

The future needs of the aged population depend on its size, composition, morbidity and mortality rates, educational and economic status, housing and living arrangements, and also on the unpredictable changes in the incidence and treatment of illness. The aged population has a dynamic nature in both size and composition because of new entrants (people who reach the age of 65) and those who die.
During the next 20 years, the new entrants into the aged population, persons now ages 45-64, represent a cohort that differs from the cohort now ages 65-84 in educational, marital, income, health, and other characteristics (Myers, 1985). The informed evaluation of issues pertaining to policy decision making for this aging population requires forecasts for the years 2000 and beyond that will account for the forthcoming demographic and societal developments.

The Bureau of the Census issues basic population projections for the future composition of the national population by age, race, and sex as well as by state and region. These projections also provide a basis for others, such as those concerning educational status, marital status, household structure, and labor force status, all of which are single-factor projections. The other general type of forecast, which is discussed later in the chapter, consists of interactive models that can account for the effect of factors on each other.

A noteworthy limitation of these Census Bureau projections is their lack of regularity. The most recent set was released in final form in May 1984, and the previous set in 1977. The periods covered by these projections have also tended to vary widely, with more recent sets extending farther into the future (Myers, 1985).

Recommendation 11.8: The panel recommends that a basic set of projections be prepared by the Bureau of the Census every 10 years based on the decennial census and covering all characteristics included in the past: age, race, sex, marital status, living arrangements, and educational attainment. Ethnicity, migration, and greater geographic detail are also needed. A shorter-term projection should be prepared five years after release of the projection based on decennial census data.

The Bureau of the Census has used methodology similar to that of the Social Security Administration to produce future population estimates for its program. An important contribution made by Social Security is to forecast mortality. These estimates take into account trends in cause-specific mortality by age and sex but not race. Their construction also involves judgmental procedures with potentially arbitrary assumptions about the patterns of change in mortality rates over time.
This issue merits attention for assessments of health status when more extensive consideration of the biological process is useful, particularly in the development of morbidity and disability forecasts. Cooperation with the Health Care Financing Administration and the National Center for Health Statistics in these efforts is essential.

The Health Care Financing Administration has produced forecasts of national health expenditures and types of expenditures to the year 1990 on the basis of Social Security projections (Freeland and Schendler, 1983). Also, the National Center for Health Statistics has reported on projections of various aspects of health services utilization and health care expenditures for the year 2003 (National Center for Health Statistics, 1983b).

As noted previously, important factors in many projections concerning population size, composition, and health status are the levels of age-specific mortality rates in the future. The mortality rates provided through Social Security account for age, sex, major disease category, and judgments about patterns of variation over time. However, they are not differentiated by race.

Recommendation 11.9: The panel recommends that race be used in addition to age and sex in mortality rates by disease and in forecasting the size and health status of the population.

Dissemination of data from projections has tended to be mainly in the form of published tables showing particular age and characteristic information for some specific points in time. For example, in 1979 the Census Bureau projected marital status for the period 1979-1995, focusing on 10-year age intervals for 1985, 1990, and 1995. In addition, the input schedules for the change components and related assumptions may not be presented in published reports in adequate detail. One way of overcoming these limitations of published tables is to release information in tape form or through interactive computer systems. These would enable users to prepare their own alternative projections.

Recommendation 11.10: The panel recommends that agencies release projections and underlying information in tape form as well as in published tables and that documentation of basic assumptions in the projections accompany the tape and the published tables.

There are several sources of uncertainty that influence the accuracy of projections for the elderly population.
Data quality at older ages is one important consideration; potential problem areas include age misstatement, underenumeration, and inaccurate reporting of characteristics, as well as nonresponse at the characteristic and entire person levels. An additional well-known source of variability in surveys is sampling variability. Its role for older ages can be substantial when the sample size for older age groups is very small. Another important source of uncertainty for projections is sensitivity to assumptions such as those concerning the magnitude and pattern of change of mortality rates. The previously stated considerations merit attention for projections concerning an aging population because the rates of transition between relevant states at older ages are often much greater than those at younger ages. For example, the mortality rate for males ages 45-49 was 546.1 per 100,000 in 1982, but nearly 20 times greater at ages 80-84. Higher transition rates among the elderly also apply to morbidity, hospitalization, and other experiences. As a consequence of these issues, the level of error experienced in estimation is greater at older ages, so care needs to be given to managing such uncertainty and describing its implications in order to avoid potentially misleading findings and subsequently misguided policy (Myers, 1985).

Recommendation 11.11: The panel recommends that agencies describe the nature of uncertainty for projections. The basis of estimates of uncertainty should also be documented in terms of underlying sources such as those for data quality, sampling variability, and sensitivity to assumptions.

Many of the factors that are of interest for an aging population interact with one another. One way to account for such interactions is through global models that attempt to integrate different submodels or modules into a common projection framework. Such models can have either an aggregate economic-demographic structure or a microsimulation structure for a created population of individuals. Two noteworthy examples of the former type with specific features in their designs for studies of the elderly are the Macroeconomic-Demographic Model (National Institute on Aging, 1984a) and the Demographic-Economic Model of the Elderly (Olsen et al., 1981). The Demographic-Economic Model of the Elderly (DECO) is part of a larger model developed by Data Resources, Inc. It provided 25-year projections of the economic implications of an aging population.
The Macroeconomic-Demographic Model (MDM) was initiated under the President's Commission on Pension Policy and has been further developed by the National Institute on Aging. It is a long-term model intended to assess how the changing age structure of the population will affect the income level of the elderly as well as productivity, consumption, savings, and investment. The model treats all population factors exogenously within the comprehensive, integrated model of long-term economic growth and labor force supply and demand. These, in turn, are related to major features of national pension systems and transfer programs. Research is under way for the development of modules to assess the demand for health insurance and services and another on health expenditures. These modules would appear to require further detail on health status. The model could benefit from the development of more endogenous modules for the demographic component with respect to family formation and dissolution, fertility, health status, and mortality.

Microsimulation models involve Monte Carlo procedures that can apply different patterns of transitions to the individuals in a sample population. When adequate estimation of their parameters is feasible, they can enable evaluation of factors that can affect changes in distributions of population characteristics such as health status or health services utilization. Examples of useful microsimulation models include POPSIM (a product of the Research Triangle Institute), DYNASIM (prepared by the Urban Institute; Wertheimer and Zedlewski, 1980), and a preliminary research model designed by the Duke University Center for Demographic Studies to examine future health status (Myers et al., 1977).

Another consideration in the development of forecasting models is the incorporation of more biomedical information. A recent effort in this direction is described in Manton (1985). Incidence and prevalence of cancer morbidity by age group to age 90 and over are projected to the year 2000 under assumptions of a changing population structure and one fixed from 1977. Projections of lung cancer deaths in the year 2000 under both assumptions were made. An interesting feature of these forecasts is their linkage to stochastic compartment modeling techniques for the estimation of health state transitions for persons subject to specific diseases.
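The Monte Carlo idea behind the microsimulation models described above can be shown in a minimal sketch: each simulated person moves among health states once a year according to transition probabilities. The three states and every probability below are hypothetical placeholders chosen for illustration; they are not estimates from POPSIM, DYNASIM, or any other model named in the text.

```python
import random
from collections import Counter

# Hypothetical annual transition probabilities (illustrative, not estimated).
# Each row gives the chance of moving from the row state to each next state.
TRANSITIONS = {
    "healthy": {"healthy": 0.90, "disabled": 0.07, "dead": 0.03},
    "disabled": {"healthy": 0.10, "disabled": 0.75, "dead": 0.15},
    "dead": {"dead": 1.0},  # absorbing state
}


def next_state(state, rng):
    """Draw one person's state for the following year."""
    options = TRANSITIONS[state]
    return rng.choices(list(options), weights=list(options.values()))[0]


def simulate(population, years, seed=0):
    """Advance every individual year by year and record the state counts.

    Returns a list of Counters: index 0 is the starting distribution,
    index t is the distribution after t simulated years.
    """
    rng = random.Random(seed)
    states = list(population)
    history = [Counter(states)]
    for _ in range(years):
        states = [next_state(s, rng) for s in states]
        history.append(Counter(states))
    return history


if __name__ == "__main__":
    start = ["healthy"] * 900 + ["disabled"] * 100
    for year, counts in enumerate(simulate(start, years=10)):
        print(year, dict(counts))
```

Because the unit of simulation is the individual, alternative transition patterns (for example, different probabilities by age or sex) could be applied to different people in the same run, which is the flexibility the text attributes to this class of models.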
Multiple sources of data are used to deal with age cohort differences in risk, changes in risk over the life span, individual differences in risk, and both independent and dependent competing risk assumptions about interactions among diseases (Myers, 1985).

As discussed here, developments are needed that would lead to more sophisticated modeling of the components of population, particularly the mortality component that so affects the projections of the aged population. In this regard, integrated efforts are necessary to forecast health status, functional limitations, and support systems available for older persons (Manton, 1984). Not only can these forecasts have utility for probing important policy issues related to health care expenditures and welfare programs, but they also can be informative for improved mortality forecasts in general population projections. The NIA recognized the importance of forecasting methodology and recently issued a Request for Applications on the methodology of forecasting active and disabled life expectancy.

Recommendation 11.12: The panel recommends that a study to evaluate theoretical, methodological, and data requirements for forecasting the characteristics of the aging population be undertaken. This would include theoretical and practical considerations for evaluating the sensitivity of forecasts to underlying assumptions. The Bureau of the Census and the National Institute on Aging would be appropriate agencies to fund this study.

QUANTIFYING UNCERTAINTY

In a paper prepared for the panel, Stoto (1985) discussed the problem of quantifying the uncertainty associated with projections or other data summaries produced in policy-oriented analysis. These difficulties have two principal sources: (1) the need for assumptions that cannot be verified and (2) the use of data bases that arise from poorly defined stochastic models.

The inability to verify critical assumptions about the adequacy of the stochastic model beyond the range of the available data is intrinsic to projection. Stoto discusses several approaches to the characterization of uncertainty in projection. One important methodology is sensitivity analysis, the evaluation of the changes induced in a projection by changes in key assumptions or parameters. Sensitivity analysis is sometimes summarized through the reporting of high, middle, and low projections. Regrettably, many policy analyses do not include a discussion of the sensitivity of key findings to unverifiable assumptions.

Similarly, many policy analyses involve the collection and integration of data from a variety of sources. The analysis of such data sets requires special methods or higher-level models that link the information from different sources.
This problem has received considerable attention in the statistical and social-scientific literature, under the rubrics risk assessment (DuMouchel and Harris, 1983) and meta-analysis (Glass, 1976). Many research workers are concerned with methodology for combining data from different sources (see, for example, Hedges and Olkin, 1985; Wolf, 1986; Gupta and Wilton, 1987).

Speaking more generally, much of statistical methodology is based on the paradigm of a well-defined experiment or sampling plan. Critical information required for decision making, however, is often not of this type but is rather more diffuse and less clearly structured. The analysis of such information poses a challenge to statisticians and other quantitative scientists, as has been recognized by many and has led to considerable research on policy analysis, as well as the formation of new professional groups such as the Society for Decision Analysis.

The panel believes that this line of research is very important not only to policy analysis on the consequences of an aging population, but also to policy analysis in many other areas of importance to this country. We believe also that further advances in understanding of methods for conducting and reporting policy-oriented analyses are critically needed.

Recommendation 11.13: The panel recommends that federal agencies relying on quantitative analysis to guide policy encourage and support research on methods for conducting and reporting policy analyses, especially methods for quantifying the uncertainty of projections and data summaries.
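The sensitivity analysis discussed in this section, summarized as low, middle, and high projections, can be sketched in a few lines. The example varies a single unverifiable assumption, the annual rate of decline in mortality, and reports how the projected number of survivors changes. The starting cohort size, the base death rate, and the three decline rates are all hypothetical, and the model deliberately ignores the rise of mortality with age, so it is an illustration of the reporting method rather than a usable projection.

```python
# Assumed annual proportional decline in the death rate under each scenario.
# "low" and "high" refer to the resulting number of survivors.
SCENARIOS = {"low": 0.00, "middle": 0.01, "high": 0.02}


def project_survivors(initial, base_rate, annual_decline, years):
    """Project survivors of a closed cohort.

    Each year a fraction `rate` of the remaining cohort dies, and the
    rate then falls by the proportion `annual_decline`.
    """
    survivors = float(initial)
    rate = base_rate
    for _ in range(years):
        survivors *= 1.0 - rate
        rate *= 1.0 - annual_decline
    return survivors


def sensitivity_table(initial=100_000, base_rate=0.05, years=20):
    """Return one projection per scenario so the range can be reported."""
    return {
        name: project_survivors(initial, base_rate, decline, years)
        for name, decline in SCENARIOS.items()
    }


if __name__ == "__main__":
    for name, survivors in sensitivity_table().items():
        print(f"{name:>6}: {survivors:,.0f} survivors after 20 years")
```

Reporting the three results side by side shows readers how much of the projection rests on the mortality assumption, which is the kind of documentation the panel's recommendations on uncertainty call for.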