11
Statistical Methodology for Health Policy Analysis
INTRODUCTION
Previous chapters of this report have described the changing
demographic profile of the U.S. population and the impact of the
growing number of older Americans on demand for health, housing,
and social services. They have also discussed the need for additional
scientific research and policy-oriented analysis on patterns of aging
and their consequences. Such investigations will require sophisti-
cated use of existing statistical methodology and, in some cases, the
development of new methods.
Although the importance of longitudinal information sometimes
seems self-evident to investigators, the need for longitudinal studies
and, conversely, the limitations of cross-sectional and retrospective
investigations are not always apparent to decision makers and fund-
ing agencies. Thus, this chapter begins with a discussion of the
rationale for longitudinal studies, followed by a discussion of prob-
lems in the design and analysis of longitudinal studies. A second
major theme in this report is increased usage of existing studies and
administrative information through linkage of data bases. The sec-
ond section of this chapter is devoted to a discussion of some of the
administrative, legal, and technical issues raised by attempts to link
data bases. The third section of the chapter discusses methodolog-
ical issues in a third area of fundamental importance in the study
of aging, forecasting of the sizes and composition of populations, as
well as population characteristics such as health status and needs for
services. The final section of the chapter discusses a generic problem
in policy analysis, the quantification and reporting of uncertainty.
LONGITUDINAL INFORMATION
The Rationale for Longitudinal Studies
Given the cost and complexity of developing longitudinal infor-
mation for studies of any population and particularly the elderly, it is
reasonable to ask why the data needed about the policy implications
of an aging population cannot be obtained by simpler and less costly
designs, especially cross-sectional studies. In subsequent paragraphs,
we discuss two considerations that motivate longitudinal studies: (1)
the need to study aging as a process and (2) the need to reduce bias
and improve precision of estimates of net change in populations.
In many areas of investigation discussed in this report, the essen-
tial role of longitudinal data derives from the need to study aging and
its consequences as processes, that is, complexes of states and events
occurring over time (Rowe, 1977). Because these processes and the
factors that determine their course are defined for individuals rather
than population aggregates, they can be studied only by gathering
data on individuals over time (Fienberg and Tanur, 1986). Examples
of processes that must be understood in order to anticipate the needs
of tomorrow's elderly include:
(1) The interaction between economic circumstances and health
care utilization. One aspect of this interaction is the effect of
acute and chronic illness on economic circumstances after retire-
ment (Menefee, 1985), especially the spend-down to poverty that
can occur during an extended illness. Another aspect is the effect
of economic circumstances on patterns of health care utilization and
the influence of different pathways on the economics of care (Meiners,
1985b).
(2) The dynamics of social networks involving elderly persons
and the effects of social support on the subsequent physical and
mental health of the elderly (Berkman, 1985).
(3) Patterns of morbidity and mortality in the elderly and the
predictive significance of functional status, living conditions, and
clinical risk factors (see Chapter 3).
By definition, these are questions about change: change in
health or functional status, economic circumstances, or environment,
and the resulting changes in utilization and costs of health, social,
AGING POPULATION IN THE TWENTY-FIRST CENTURY
and housing services. More generally, the preceding chapters have
identified a need for increased understanding of the transitions be-
tween states experienced by elderly persons and the dependence of
these transitions on factors such as diseases, lifestyle habits, and
changes in psychosocial supports. Only longitudinal data can pro-
vide the basic observations of change that are the objects of study.
Stochastic models can play an important role in the study of
these phenomena. By formulating, fitting, and validating explicit
models, statisticians can clarify multivariate relationships and refine
projections of the health and service needs of tomorrow's elderly
population. For example, models for relationships between health
care costs and health care utilization can be used to examine the
effects of changes in costs of medical services on future demand.
Ideally a stochastic model for health care utilization and costs would
also consider the role of personal health, personal economic resources,
and other factors that influence health care needs. In practice, more
complex models can be more difficult to validate and can also lead
to more uncertain projections. Thus, stochastic modeling requires
careful choices about model complexity.
Longitudinal data can be obtained either prospectively, by fol-
lowing individuals over time, or retrospectively, by obtaining his-
torical information from study participants. Retrospective studies
can play an important role in the study of aging, as evidenced by
the ongoing NIA Survey of the Last Days of Life (National Research
Council, 1986). They have important limitations, however, especially
in studies of elderly persons. First, many subjective states or events
cannot be reconstructed retrospectively. Similarly, physiologic mea-
surements can be obtained only in prospective studies. Retrospective
studies will also be affected by selective mortality. The absence of
deceased persons from a retrospective study will produce an incom-
plete picture of the process under study. Finally, even objective
states, such as changes in family composition or economic circum-
stances, are subject to inaccuracies of recall that can be especially
severe in elderly persons. Thus, although retrospective studies can
be of value, they do not provide a general methodologic alternative
to longitudinal designs.
Longitudinal data are required for the study of gross flows and
for the study of individual change and its determinants, studies that
cannot be carried out with cross-sectional data. Although either
longitudinal data or cross-sectional data can be used to provide data
about net change in populations with age, several problems can
be encountered in using cross-sectional data for this purpose. By
net change, we mean the difference in age-specific distributions of
population characteristics. As a specific example, we might wish
to estimate the changes between ages 70 and 75 in percentages of
persons who live in nursing homes and who need assistance with two
or more activities of daily living. Using successive cross-sectional
data, the percentage of 70-year-old residents requiring such assistance
could be estimated in one year and the percentage of 75-year-old residents
requiring such assistance could be estimated 5 years later. This
method could give a biased estimate of net change in that five-year
period because of selective mortality.
In discussing mortality as a potential source of bias in cross-sectional
data, Rowe (1977) points out that whenever the variable
under study is related to survival, a cross-sectional study can give
biased estimates of age-related changes. In the study of changes
in living conditions, for example, higher rates of mortality among
individuals living in nursing homes would lead to underestimates of
the age-specific rates of movement from independent living to nursing
homes in cross-sectional data. Selective mortality can be viewed as a
special kind of missing data problem; the effects of missing data on
longitudinal analysis are discussed below.
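The bias from selective mortality can be made concrete with a small expected-value calculation. The sketch below is a toy model under hypothetical assumptions: the function name `cross_sectional_vs_true`, the 10 percent baseline prevalence, a true five-year move rate of 0.20, and five-year death probabilities of 0.5 for nursing home residents versus 0.1 for independent persons are all illustrative, not data from this report. It compares the move rate implied by the prevalences among survivors at ages 70 and 75 with the rate that actually generated the data.

```python
def cross_sectional_vs_true(n70, p70, move_prob, death_nh, death_ind):
    """Expected-value toy model of one cohort observed at ages 70 and 75.

    Hypothetical assumptions: people living independently at 70 move to a
    nursing home with probability move_prob over the five years; deaths are
    then applied according to where each person lives at 75.
    """
    nh70 = n70 * p70                  # nursing home residents at 70
    ind70 = n70 - nh70                # living independently at 70
    movers = ind70 * move_prob        # true moves into nursing homes
    nh75 = (nh70 + movers) * (1 - death_nh)
    ind75 = (ind70 - movers) * (1 - death_ind)
    p75 = nh75 / (nh75 + ind75)       # prevalence among survivors at 75
    # Move rate an analyst would infer from the two cross-sectional prevalences.
    implied_rate = (p75 - p70) / (1 - p70)
    return p75, implied_rate


p75, implied = cross_sectional_vs_true(1000, 0.10, 0.20, death_nh=0.5, death_ind=0.1)
print(f"prevalence at 75: {p75:.3f}; implied move rate: {implied:.3f} (true 0.200)")
```

With these inputs the survivors' prevalence at 75 is about 0.178, implying a move rate of about 0.086 rather than 0.20. Setting the two death probabilities equal removes the bias, as Rowe's argument predicts.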
Data from a single cross-sectional study could be used to make
the comparison described above, which was based on sequential cross-
sectional data. The percentage of 70-year-old residents needing assis-
tance with two or more activities of daily living could be compared
with the corresponding percentage of 75-year-old residents at the
time of the study. Studying net change in this manner introduces the
additional problem of cohort effects. Cohort effects in this instance
would be differences between age-specific patterns of limitations in
activities of daily living in cohorts born five years apart. The size
of cohort effects can vary depending on the variable under study.
One way to address the issue of cohort effects is to use successive
cross-sectional data for the same cohort as described above. In view
of these considerations, cross-sectional data can be used to obtain
valid estimates of net change if the potential bias from selective
mortality and cohort effects is small (Louis et al., 1986). Otherwise,
longitudinal data are necessary for this purpose.
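A minimal sketch of why a single cross-section confounds age and cohort, assuming a hypothetical additive model in which the prevalence of needing help with two or more activities of daily living rises 0.02 per year of age and falls 0.005 for each later birth year, with no period effects and no mortality. The names `prevalence`, `single_cross_section_change`, and `same_cohort_change` are illustrative.

```python
def prevalence(age, birth_year, base=0.10, age_slope=0.02, cohort_slope=-0.005):
    """Hypothetical additive age + cohort model for the share needing help
    with two or more ADLs; no period effects and no selective mortality."""
    return base + age_slope * (age - 70) + cohort_slope * (birth_year - 1900)


def single_cross_section_change(survey_year):
    # 70- and 75-year-olds interviewed in the same year were born 5 years apart.
    return prevalence(75, survey_year - 75) - prevalence(70, survey_year - 70)


def same_cohort_change(birth_year):
    # Successive cross-sections that follow one birth cohort from 70 to 75.
    return prevalence(75, birth_year) - prevalence(70, birth_year)


print(single_cross_section_change(1986))  # mixes age effect and cohort difference
print(same_cohort_change(1916))           # age effect only
```

Here the single cross-section reports a five-year increase of about 0.125, while following one cohort recovers the pure age effect of 0.10; the gap is the cohort difference between people born five years apart.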
Finally, we note that longitudinal or repeated measures exper-
iments are often designed to increase the precision of estimates of
changes over time or between experimental conditions. Although
this can also be a consideration in designing observational studies of
aging, efficiency is typically a second-order consideration relative to
considerations of informativeness and validity.
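The precision gain from repeated measurement can be stated in one line. Assuming a measurement with the same variance σ² on both occasions, a within-person correlation ρ between occasions, simple random samples of n persons, and no attrition (all assumptions of this sketch, not conditions stated in the report), the variance of the estimated mean change is:

```latex
\operatorname{Var}\left(\bar{Y}_{2}-\bar{Y}_{1}\right)=
\begin{cases}
\dfrac{2\sigma^{2}(1-\rho)}{n}, & \text{same } n \text{ persons measured on both occasions},\\[6pt]
\dfrac{2\sigma^{2}}{n}, & \text{independent samples of } n \text{ persons at each occasion},
\end{cases}
\qquad
\text{relative efficiency}=\frac{1}{1-\rho}.
```

For example, with ρ = 0.8 the longitudinal design estimates net change with one-fifth the variance of two independent cross-sections of the same size.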
In summary, longitudinal designs are essential to the study of
processes associated with aging and the factors determining their
course because their focus on individuals encompasses patterns of
improvement over time as well as decline. Cross-sectional data
potentially can provide some information about net change in
populations due to aging, but they can give biased, and possibly misleading,
estimates when the variables under study are subject to cohort or
selection effects.
Designing Longitudinal Studies
Although the design phase of longitudinal studies has received
some attention in the statistical literature, most papers have ad-
dressed single issues in highly simplified settings. Several papers,
for example, have investigated the relative efficiency of different re-
peated measures designs for estimating the average rate of change
over time of a single measured variable. Schlesselman (1973) and
Berry (1974) discussed the effect of duration and frequency of
measurement on the precision of estimated rates of change in measured
physiologic variables such as pulmonary function or blood pressure
measurements. For a review of this literature, see Cook and Ware
(1983).
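The effect of duration and frequency on precision can be illustrated in the simplest possible setting: one person measured at fixed times, a straight-line trajectory, and independent errors of common variance. These are simplifying assumptions of this sketch (it omits the between-person variation and dropout that real designs must handle), and `slope_variance` is a hypothetical helper. The variance of the least-squares slope is σ² divided by the sum of squared deviations of the measurement times from their mean.

```python
def slope_variance(times, sigma2=1.0):
    """Variance of the least-squares slope for one person measured at `times`.

    Simplifying assumption: independent measurement errors with common variance
    sigma2 around a straight-line trajectory (no random effects, no dropout).
    """
    tbar = sum(times) / len(times)
    sxx = sum((t - tbar) ** 2 for t in times)
    return sigma2 / sxx


schedules = {
    "annual visits, 4 years": [0, 1, 2, 3, 4],
    "baseline and year 4 only": [0, 4],
    "baseline and year 8 only": [0, 8],
}
for label, times in schedules.items():
    print(f"{label:26s} var(slope) = {slope_variance(times):.4f}")
```

Under this model, five annual visits over 4 years give a slope variance of 0.1σ², two visits at years 0 and 4 give 0.125σ², and two visits 8 years apart give 0.03125σ²: lengthening the study helps far more than adding intermediate visits.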
Design issues in more complex, possibly multipurpose, studies
have not received the same attention, in part because the formula-
tion of the issues is sensitive to the type of process and end point to
be studied. One important class of questions involves the definition
of units of study and end points. For example, longitudinal studies
of family units require a longitudinal definition of the family. The
Income Survey Development Program (David, 1983) defined house-
holds operationally by following members of the original household
even when they moved away from the household group, but chose
not to follow individuals entering the household after the original
interview if they subsequently moved away. Each such decision cre-
ates opportunities and limitations in subsequent analysis. Similarly,
when variables are measured repeatedly over time, special care is
needed in developing measurement systems that are free of spurious
temporal variation. This is easily appreciated in longitudinal studies
of physiologic variables (Dawber, 1980) but equally important in the
collection of questionnaire or interview data.
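Operational definitions such as the ISDP household rule can be written down precisely before data collection. The sketch below encodes the rule as paraphrased above; the `Person` class, its fields, and `is_followed` are hypothetical names, not part of any ISDP software.

```python
from dataclasses import dataclass


@dataclass
class Person:
    person_id: str
    original_member: bool   # present at the original interview
    in_household: bool      # still living with the household group this wave


def is_followed(person: Person) -> bool:
    """Hypothetical encoding of the follow-up rule described for the ISDP:
    original members are followed even after they move away; people who
    entered after the original interview are kept only while they remain."""
    if person.original_member:
        return True
    return person.in_household


roster = [
    Person("A", original_member=True, in_household=False),   # moved out
    Person("B", original_member=False, in_household=True),   # newcomer, still present
    Person("C", original_member=False, in_household=False),  # newcomer who left
]
print([p.person_id for p in roster if is_followed(p)])
```

Writing the rule this way makes its analytic consequence visible: person C drops out of the panel, so any later analysis of people who leave households is limited to original members.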
Every longitudinal study raises issues of this type, including the
definition of study end points, the duration of study and frequency of
measurement, the nature of the measurement system, the definition
of study units, the nature of the sampling and follow-up plans, and
many more such issues. We believe that resolution of these issues,
critical to the success of complex studies, requires special considera-
tion of the features of individual research settings. The panel urges
that sponsors of longitudinal studies make provision in study staffing
and timetables to allow for considerations of the special design prob-
lems posed by individual studies.
Designs and Their Implications for Longitudinal Data Analysis
In empirical applications, testing whether specific classes of
stochastic process models describe the occurrence of events or the
evolution of continuous variables is best facilitated by observing, in
full, many realizations of the underlying process for all times in a
wide time interval. Examples of such data are the work histories
in the Seattle and Denver Income Maintenance Experiments (Tuma
et al., 1979), the fertility histories in the Taichung IUD experiment
(Freedman and Takeshita, 1969), and the job vacancy histories for
ministers in Episcopalian churches in New England (White, 1970).
In most substantive contexts, however, ascertaining the exact timing
of each occurrence of an event for each individual is either
impossible, economically infeasible, or both. Observations usually contain
gaps and censoring relative to a continuously evolving process. Three
examples of this situation are as follows:
(1) In the Framingham study (Dawber, 1980) of atherosclerotic
disease, individuals were examined once every two years, at which
times symptoms of illness, hospitalizations, or other events occurring
between examinations were recorded (retrospective information). In
addition, a physical examination, some blood studies, and other
laboratory work (current information) were completed. One topic of
considerable interest is the intraindividual dynamics of systolic blood
pressure. This is a continuous-time and continuous-state process that
can be modeled only using the biennial samples; i.e., measurements
made at the examinations. Such data represent fragmentary infor-
mation about the underlying process.
(2) In the Taeuber et al. (1968) residence history study, ob-
servations were taken retrospectively on current residence, first and
second prior residence, and birthplace of individuals in particular
age cohorts. Analyses in which duration of residence is a dependent
variable of interest must accommodate censoring on the right for
current residence. Furthermore, characterization of the pattern of
adult residence histories is complicated by the fact that initial
conditions are unknown for persons who have occupied more than three
residences beyond, for example, age 18.
(3) The first Duke longitudinal study of aging was conducted
on a sample of 271 male and female volunteers who were socioe-
conomically representative of elderly persons in the community of
Durham, North Carolina. The mean age of the study population at
the first test date was 71.3 years. At each of the 11 waves of the
21-year panel survey, a standard battery of physiological risk factors
(serum cholesterol, diastolic blood pressure, and pulse pressure) was
ascertained from all persons still alive and in the study. Wechsler
intelligence tests were also given to participants at each wave.
In modeling the age-dependent risk factors that affect mortality in
this elderly population (Manton and Woodbury, 1983b), the analyst
must take account of the fact that (a) relative to continuously
evolving processes, there are gaps in the data on the physiological
risk factors: they are measured at most 11 distinct times in 21 years;
(b) high rates of mortality selection and unequal follow-up times led
to small sample sizes for studying risk factor changes at the most
advanced ages; and (c) not all variables are recorded for all
individuals present at a given survey, thereby yielding very intricate
patterns of missing data within a survey and over time.
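For example (2), a minimal sketch of accommodating right censoring, assuming a constant move-out rate (an exponential duration model, which is an assumption of this sketch rather than of the residence history study). Under that model the maximum-likelihood rate is the number of completed residence spells divided by total years at risk, with the censored current residence contributing exposure but no event. The data and function names are hypothetical.

```python
def naive_mean_completed(durations, completed):
    """Mean length of completed spells only (discards the censored current residence)."""
    done = [d for d, c in zip(durations, completed) if c]
    return sum(done) / len(done)


def exponential_rate_mle(durations, completed):
    """Maximum-likelihood move rate under a constant-hazard (exponential) model
    with right censoring: completed spells / total years at risk."""
    events = sum(1 for c in completed if c)
    exposure = sum(durations)
    return events / exposure


# Hypothetical residence spells in years; False marks a current (censored) residence.
durations = [2, 5, 3, 10, 8]
completed = [True, True, True, False, False]
print(naive_mean_completed(durations, completed))      # biased toward short spells
print(1 / exponential_rate_mle(durations, completed))  # mean implied by censored-data MLE
```

Averaging only completed spells gives about 3.3 years, whereas the censored-data estimate implies a mean duration of about 9.3 years; discarding the current residence biases durations downward.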
A feature of modeling with such fragmentary data is that algebraic
characterizations of the data sets that can possibly be generated
by given continuous-time models are frequently very difficult to ob-
tain. However, these characterizations are, of necessity, the basis of
tests for compatibility of the data with proposed classes of models.
In addition, estimation of quantities such as rates of occurrence of
events per individual at risk of the event at a given time is compli-
cated by the fact that some of the occurrences are unobserved. This
necessitates estimation of rates that have meaning within stochastic
process models that are found to be compatible with the observed
data.
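The point about unobserved occurrences can be seen in the simplest continuous-time model: a two-state Markov chain with rates a (state 0 to 1) and b (state 1 to 0), observed only at biennial examinations as in the Framingham example. The chain, rates, and function names are hypothetical and chosen purely for illustration. The probability of being found in state 1 two years after being found in state 0 is (a/(a+b))(1 − e^(−(a+b)t)); treating observed changes as if every move had been seen and none reversed yields a rate below a.

```python
import math


def p01(a, b, t):
    """P(in state 1 at time t | in state 0 at time 0) for a two-state
    continuous-time Markov chain with rates a (0 -> 1) and b (1 -> 0)."""
    s = a + b
    return (a / s) * (1.0 - math.exp(-s * t))


def naive_rate(p, t):
    """Rate implied by treating observed 0 -> 1 changes as one-way exits,
    i.e. ignoring moves that were reversed between examinations."""
    return -math.log(1.0 - p) / t


a, b, t = 0.3, 0.5, 2.0   # hypothetical rates per year; biennial examinations
p = p01(a, b, t)
print(f"observed 0->1 share over {t:g} years: {p:.3f}")
print(f"naive rate {naive_rate(p, t):.3f} vs true rate {a}")
```

With a = 0.3 and b = 0.5 the naive rate is roughly 0.18 per year; with b = 0 (no reverse moves) it equals a exactly. Meaningful rate estimates therefore depend on a model of the unobserved dynamics.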
Analytical Strategies
In the ideal but rather infrequent settings in which evolving
processes are observed over a time interval, two modeling frameworks,
quite disparate in terms of historical roots but nevertheless related,
have been utilized to estimate rates of occurrence of events,
changes in levels of continuous variables, and parameters associated
with covariates that are viewed as the primary factors that influence
outcomes. One of these modeling frameworks is a direct adaptation of
the classical linear statistical models to longitudinal data, reviews of
which are contained in Ware (1985) and Ware et al. (1988). Also see
Geisser (1980) for a pertinent review of the capabilities of longitudinal
data analysis for making projections and predictions. An alternative
framework, originating from diverse stochastic process specifications,
is the multivariate counting process literature (see the review by An-
dersen and Borgan, 1984) and the multivariate Gaussian process and
diffusion process literature (see in this regard Woodbury et al., 1979,
Manton and Woodbury, 1983a). Applications of both the stochastic
process framework and the linear models specifications in the context
of labor economics and the sociology of work are discussed in detail
in Heckman and Singer (1985).
For data with the types of limitations illustrated in the exam-
ples given above, a major research program remains to be carried
out before longitudinal data analysis can be regarded as a mature
subject. A commonly used analytical strategy proceeds according to
the following basic steps:
(1) When estimation of transition rates between a discrete set
of states is the primary focus of the analysis, one begins with very
simple, somewhat plausible classes of models as candidates to de-
scribe some portion of the observed data and within which the
unobserved dynamics are well defined: for example, a time series
of time-homogeneous Markov chains in which each separate model
describes only the unobserved dynamics between a pair of consecutive
surveys in a multiwave panel design and fits the observed transitions.
(2) Estimate and interpret the parameters of interest (for example,
transition rates between discrete pairs of states) within the
simplified models, and then assess whether these models can, in fact,
account for finer-grained detail such as the joint frequency of state
occupancy at three or more consecutive surveys in a multiwave panel
study.
(3) Typically, the originally proposed models (usually first-order
Markovian across a wide range of subject matter contexts), which may
adequately represent data based on pairs of consecutive surveys, will
not account for higher-order dependencies. Such
dependencies tend to be the rule rather than the exception in
longitudinal microdata. We then look for structured residuals from
the simple models to guide the selection of more realistic and
interpretable specifications (see, e.g., Singer and Spilerman, 1976;
Goodman, 1978; Duncan, 1981, for discussions of this kind of strategy
in a variety of sociology and economics investigations).
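Steps (1) to (3) can be sketched for a two-state, three-wave panel. The code below (hypothetical data, state labels, and function names) fits a single time-homogeneous, first-order Markov chain from all consecutive-wave pairs, then compares observed three-wave pattern counts with those the chain predicts.

```python
from collections import Counter


def fit_transition_matrix(histories):
    """Estimate P(next = j | current = i) from all consecutive-wave pairs,
    assuming a first-order, time-homogeneous Markov chain."""
    pair_counts = Counter()
    from_counts = Counter()
    for h in histories:
        for i, j in zip(h, h[1:]):
            pair_counts[(i, j)] += 1
            from_counts[i] += 1
    return {(i, j): c / from_counts[i] for (i, j), c in pair_counts.items()}


def three_wave_residuals(histories, P):
    """Observed minus Markov-predicted counts of (wave 1, wave 2, wave 3) patterns."""
    observed = Counter(tuple(h[:3]) for h in histories)
    first = Counter(h[0] for h in histories)
    states = sorted({s for h in histories for s in h})
    residuals = {}
    for i in states:
        for j in states:
            for k in states:
                expected = first[i] * P.get((i, j), 0.0) * P.get((j, k), 0.0)
                residuals[(i, j, k)] = observed[(i, j, k)] - expected
    return residuals


# Hypothetical three-wave panel with two states, A and B.
histories = ["AAA"] * 40 + ["BBB"] * 40 + ["ABA"] * 10 + ["BAB"] * 10
P = fit_transition_matrix(histories)
for pattern, r in sorted(three_wave_residuals(histories, P).items()):
    print("".join(pattern), round(r, 2))
```

In this constructed example the chain reproduces every two-wave table exactly, yet each three-wave cell is off by 8: the chain underpredicts both consistent stayers and repeated switchers and overpredicts single switches. Residuals of this kind point toward heterogeneity, such as a mix of stayers and frequent movers, but they do not by themselves identify which richer model to adopt.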
The repeated fitting of models and subsequent utilization of
structured residuals to guide successively more realistic model
selection is a strategy that, on the surface, seems to be very reasonable.
However, the process frequently stagnates after only one or two stages
because the possible explanations for given structured residuals are
usually too extensive to be helpful by themselves. One really needs,
in addition, specific subject matter theories translated into
mathematics to guide the model selection process. Unfortunately, in most
fields in which analysis of longitudinal microdata is of interest, the
development of substantive theory is quite weak.
The potential danger of the foregoing strategy, even for the
estimation of transition rates, is that parameter estimates may be
biased simply as a result of model misspecifications. The biases, in
turn, can lead to incorrect conclusions about relationships between
events.
As the foregoing discussion suggests, considerable room remains
for the development and assessment of stochastic models for temporal
processes. Ideally these models will be based on mathematical con-
structs that represent the underlying biological, behavioral, or other
processes in a meaningful way. Analytic methods should be applica-
ble in settings in which data sets are unbalanced and incomplete. In
some cases, it will be important to explicitly model the processes that
lead to missing data. Although there are many successful examples of
longitudinal analysis, methodological approaches and applications
are very diverse and approaches tend to differ in different applied
settings. In particular, the econometric, sociometric, biometric, and
human growth literatures contain extensive work on longitudinal
methods, but communication among the groups of investigators is
imperfect. One can hope that future methodological developments
will lead toward more unified approaches to model formulation and
data analysis.
An important new development for exploring high-dimensional
longitudinal data sets is the family of grade of membership (GOM)
models developed by Woodbury et al. (1978), Woodbury and Manton
(1982), and Manton et al. (1987). This modeling framework is
well suited to studying the dynamics of highly heterogeneous elderly
populations and has the added very desirable feature of breaking free
of conventional regression structures when assessing the impact of
covariates on outcome variables. The essential idea is to represent
the individual histories in a vector stochastic process in terms of
the evolution of degrees of similarity (or grades of membership) of
individuals to ideal (or pure type) profiles. The profiles are charac-
terized by combinations of levels on particular covariates that occur
with high frequency. The profiles also need not be static. They
may in fact be stochastic processes themselves, thereby leading to a
representation of individual dynamics in terms of evolving degrees
of similarity to special pure type processes. Although much further
theoretical, empirical, and numerical computational development re-
mains to be carried out before GOM can be considered to be a well
understood and readily utilized framework, it has already shown
sufficient promise in studies of health status dynamics and health care
utilization in elderly populations to warrant much more sustained
investigation.
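In the usual static form of the GOM model, each individual i has nonnegative membership scores g_ik summing to 1 across K pure types, and the probability of response level l on item j is the sum over k of g_ik times λ_kjl, where λ_kjl is the response probability for pure type k. The sketch below computes that mixture for a single item; the item, the two profiles, and their probabilities are hypothetical, and the dynamic version described above, in which memberships or profiles evolve over time, is not shown.

```python
def gom_response_probs(g, lam):
    """Grade-of-membership response probabilities for one item.

    g[k]          : membership score in pure type k (nonnegative, sums to 1)
    lam[k][level] : probability that a pure-type-k individual gives that level
    Returns P(level) = sum_k g[k] * lam[k][level] for every response level.
    """
    if any(x < 0 for x in g) or abs(sum(g) - 1.0) > 1e-9:
        raise ValueError("membership scores must be nonnegative and sum to 1")
    n_levels = len(lam[0])
    return [sum(g[k] * lam[k][level] for k in range(len(g)))
            for level in range(n_levels)]


# Hypothetical item "needs help with bathing" (levels: no, yes).
lam = [
    [0.9, 0.1],  # pure type 1: a largely unimpaired profile (illustrative)
    [0.2, 0.8],  # pure type 2: a frail profile (illustrative)
]
print(gom_response_probs([0.25, 0.75], lam))
```

An individual with membership 0.25 in the first profile and 0.75 in the second has a probability of 0.625 of needing help, between the pure-type values of 0.1 and 0.8.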
A much neglected topic with both fragmentary and relatively
complete longitudinal data is the recognition that choice-based sam-
pling is a pervasive aspect of many data sets, and that it plays a
particularly central role in program evaluation of almost every kind.
For a very readable and insightful introduction to the analytical
issues associated with choice-based samples and selection bias, see
Heckman and Hotz (1987). This topic should be playing, but to date
has not played, a major role in evaluations of health care systems
for elderly populations. It is a research area critically in need of
development.
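A minimal illustration of why choice-based samples need adjustment, assuming the population share of each group is known: weight each record by its group's population share divided by its sample share. The data, group labels, and `reweighted_mean` are hypothetical. This corrects only for over-representation on the sampled choice itself; it does not address selection on unobserved characteristics, which is the harder problem treated in the selection-bias literature cited above.

```python
from collections import Counter


def reweighted_mean(records, population_share):
    """Mean outcome after weighting each record by
    (population share of its group) / (sample share of its group).

    records          : list of (group, outcome) pairs
    population_share : population share of each group (assumed known here)
    """
    n = len(records)
    sample_share = {g: c / n for g, c in Counter(g for g, _ in records).items()}
    weights = [population_share[g] / sample_share[g] for g, _ in records]
    return sum(w * y for w, (_, y) in zip(weights, records)) / sum(weights)


# Hypothetical evaluation sample: program participants are 50% of the sample
# but only 20% of the elderly population; outcome 1 = hospitalized in the year.
records = ([("participant", 1)] * 6 + [("participant", 0)] * 4
           + [("nonparticipant", 1)] * 3 + [("nonparticipant", 0)] * 7)
naive = sum(y for _, y in records) / len(records)
weighted = reweighted_mean(records, {"participant": 0.2, "nonparticipant": 0.8})
print(naive, weighted)
```

Here the unweighted sample mean is 0.45 while the weighted estimate is 0.36, because participants, who have the higher outcome rate, were oversampled.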
Recommendation 11.1: Given the growing importance and
complexity of longitudinal research, the panel recommends
that federal agencies encourage methodological research on
innovative approaches to study design and analysis, both
through support of methodological work within their own
technical groups and through funding of methodological re-
search.
RECORD LINKAGE
The Rationale for Record Linkages
Many of the data requirements discussed in preceding chapters
can be satisfied most economically by linking data from different
sources. Linkages may be at the aggregate level or they may involve
individual records from different data systems. Linkages of individ-
ual records may be accomplished either by exact matching or by
statistical matching. In an exact match, the goal is to link records
for the same individuals from two or more data systems; in a statis-
tical match, the goal is to link records for individuals who are similar
in important respects. Exact matching, if feasible, is the preferred
method because fewer assumptions are needed to ensure the validity
of analyses based on the linked data sets (U.S. Department of
Commerce, 1980, Recommendation 1b, p. 33).
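The difference between the two kinds of matching can be sketched in a few lines. The field names (`id`, `age`, `income`, `program_cost`), the records, and both helper functions are hypothetical, and the statistical match uses unscaled squared distance only for simplicity; practical implementations standardize variables and often use more elaborate matching rules.

```python
def exact_match(file_a, file_b, key):
    """Link records for the same person: identical identifier in both files."""
    index_b = {rec[key]: rec for rec in file_b}
    return [(a, index_b[a[key]]) for a in file_a if a[key] in index_b]


def statistical_match(file_a, file_b, variables):
    """Pair each record in file_a with the most similar record in file_b
    (smallest squared distance on shared variables); the pair usually
    describes two different people who are alike in those respects."""
    def distance(a, b):
        return sum((a[v] - b[v]) ** 2 for v in variables)
    return [(a, min(file_b, key=lambda b: distance(a, b))) for a in file_a]


# Hypothetical files; field names and values are illustrative only.
survey = [
    {"id": "111", "age": 72, "income": 20},
    {"id": "222", "age": 80, "income": 12},
]
admin = [
    {"id": "111", "age": 72, "income": 21, "program_cost": 3.1},
    {"id": "333", "age": 79, "income": 12, "program_cost": 7.4},
]
print(len(exact_match(survey, admin, key="id")))
print([(a["id"], b["id"]) for a, b in statistical_match(survey, admin, ["age", "income"])])
```

The exact match links only the person present in both files; the statistical match also pairs survey record 222 with administrative record 333, a different but similar person, which is why analyses of statistically matched files rest on additional assumptions.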
Statistical purposes of record linkages include enhancement of
survey data by adding data from administrative files, development
of sampling frames for surveys and the evaluation of coverage and
response errors in censuses and surveys. Important multipurpose
statistical data systems have been developed by exact matching of
records from different administrative files maintained by federal agen-
cies.
Linkage of existing records as an alternative to direct data
collection has become more appealing because of the development of
sophisticated computerized record linkage techniques, based on models
first proposed in the 1950s and 1960s (Newcombe et al., 1959; Fellegi
and Sunter, 1969). Linkages are further facilitated by increasingly
widespread use of Social Security numbers as identifiers (Jabine,
1985).
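Models of the kind cited above score each candidate pair of records by summing log likelihood ratios across comparison fields, using m (the probability that a field agrees for a true match) and u (the probability that it agrees for a non-match), and then apply two thresholds: link, non-link, or refer for clerical review. The sketch below follows that scheme; the m and u values, fields, thresholds, and function names are hypothetical, whereas real systems estimate such parameters from the files being linked.

```python
import math


def field_weight(agree, m, u):
    """Log2 likelihood-ratio weight for one comparison field.
    m = P(field agrees | records are a true match)
    u = P(field agrees | records are not a match)"""
    if agree:
        return math.log2(m / u)
    return math.log2((1 - m) / (1 - u))


def classify(agreement, params, upper, lower):
    """Sum field weights and apply the two-threshold decision rule."""
    total = sum(field_weight(agreement[f], m, u) for f, (m, u) in params.items())
    if total >= upper:
        return "link", total
    if total <= lower:
        return "non-link", total
    return "possible link (clerical review)", total


# Hypothetical m/u probabilities and thresholds, chosen only for illustration.
params = {"surname": (0.95, 0.01), "birth_year": (0.90, 0.05), "sex": (0.98, 0.50)}
for agreement in (
    {"surname": True, "birth_year": True, "sex": True},
    {"surname": False, "birth_year": True, "sex": True},
    {"surname": False, "birth_year": False, "sex": False},
):
    print(classify(agreement, params, upper=8.0, lower=0.0))
```

With these values, full agreement gives a total weight of about 11.7 (a link), disagreement on surname alone leaves about 0.8 (clerical review), and disagreement on every field gives about −12.2 (a non-link).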
Examples of exact record linkages that have contributed or have
the potential to contribute to our information base on the status of
older Americans include:
• Linkage of administrative data on retirement benefits to survey data collected in the Social Security Administration's Retirement History Survey (Fox, 1979).
• Linkage of survey records from the Current Population Survey, the National Health Interview Survey, and the National Health and Nutrition Examination Survey to death records in the National Death Index (Patterson and Bilgrad, 1985).
• Linkage of survey records from the National Medical Care Utilization and Expenditures Survey to records in the Medicare and Medicaid administrative data files (Cox and Bonham, 1983).
aspects for which significant technical improvements are possible.
These aspects include, for example, the choice and standardization
of matching variables, the choice of blocking strategies (blocking
limits the comparison of records in two files to those that agree on
certain variables that appear in both files) and the role of manual
intervention in computerized systems. Matching errors inevitably
occur in linked data files, but so far relatively little has been done to
assess these errors and to develop estimation and analysis techniques
that take them into account. One source of error among the elderly
is the often noted phenomenon of age misreporting (exaggeration)
among those age 85 and over (Rosenwaike, 1968).
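Blocking can be illustrated as follows, using a hypothetical blocking key of birth year plus first letter of surname (the records, the key, and the `blocked_pairs` helper are illustrative). Blocking cuts the number of comparisons, but a true match whose blocking variable is misrecorded, for instance through age misreporting, is never compared at all.

```python
from collections import defaultdict


def blocked_pairs(file_a, file_b, block_key):
    """Candidate pairs limited to records that agree on block_key(record)."""
    blocks_b = defaultdict(list)
    for rec in file_b:
        blocks_b[block_key(rec)].append(rec)
    pairs = []
    for rec in file_a:
        pairs.extend((rec, other) for other in blocks_b.get(block_key(rec), []))
    return pairs


def birth_year_initial(rec):
    # Hypothetical blocking key: (birth year, first letter of surname).
    return (rec["birth_year"], rec["surname"][0])


file_a = [
    {"surname": "SMITH", "birth_year": 1910},
    {"surname": "JONES", "birth_year": 1915},
    {"surname": "SMYTHE", "birth_year": 1910},
]
file_b = [
    {"surname": "SMITH", "birth_year": 1910},
    {"surname": "SMITH", "birth_year": 1911},
    {"surname": "JONES", "birth_year": 1915},
    {"surname": "BROWN", "birth_year": 1915},
]
print(len(file_a) * len(file_b), "comparisons without blocking")
print(len(blocked_pairs(file_a, file_b, birth_year_initial)), "comparisons with blocking")
```

Without blocking there are 12 candidate pairs; with blocking there are 3. The file-B record for SMITH born 1911 is never compared with the file-A SMITH born 1910, so if they are the same person with a misreported birth year, the link is lost without any warning.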
Recommendation 11.3: The panel recommends that federal
agencies support research aimed at the development
of improved multipurpose computerized record linkage sys-
tems and better methods of estimation and analysis when
matching errors are present.
Formats vary for recording identifiers such as names and ad-
dresses in statistical and administrative data files. Other potential
matching variables, such as age (or date of birth), race, ethnic origin,
and marital status are defined and categorized in various ways. These
variations make record linkages more difficult and lead to matching
errors.
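Standardization can begin with simple normalization applied identically to every file before matching. The rules below (digits-only Social Security numbers with exactly nine digits; uppercase names with punctuation removed) are hypothetical choices for illustration, not any agency's standard, and `normalize_ssn` and `normalize_name` are illustrative names. This name rule also strips accented letters, which a production system would need to handle.

```python
import re


def normalize_ssn(raw):
    """Keep digits only; return None unless exactly nine digits remain."""
    digits = re.sub(r"\D", "", raw)
    return digits if len(digits) == 9 else None


def normalize_name(raw):
    """Uppercase, drop characters other than ASCII letters and spaces,
    and collapse runs of spaces."""
    letters = re.sub(r"[^A-Za-z ]", "", raw)
    return " ".join(letters.upper().split())


print(normalize_ssn("123-45-6789"), normalize_ssn("123 45 678"))
print(normalize_name("O'Brien,  Mary"), normalize_name("OBRIEN MARY"))
```

Two differently formatted versions of the same name now compare as equal, and a malformed identifier is flagged rather than silently producing a false match or non-match.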
Recommendation 11.4: The panel recommends that federal
agencies that maintain statistical or administrative data
bases work together to encourage standardization of defi-
nitions and reporting formats for personal identifiers (the
Social Security number is critically important) and personal
characteristics likely to be used as matching variables.
Although in the long run some data requirements can be met
more economically by linking existing records, it is often difficult
to obtain the necessary resources. Substantial research and devel-
opment efforts may be needed to explore feasibility and to adapt
existing record linkage programs to make them suitable for a
specific application. Understandably, agencies like the Social Security
Administration and the Internal Revenue Service that maintain po-
tentially useful administrative data systems do not consider the de-
velopment of statistical data bases not directly related to their own
programs to be a high-priority activity. Therefore most of the re-
sources must come from potential users, some of whom may be
reluctant to invest significant amounts in the development of data
systems over which they will have relatively little control.
Statutes like the 1974 Privacy Act, the Census Bureau's Title
13, and the Tax Reform Act of 1976 restrict, in varying degrees, the
disclosure of individually identifiable data by one agency to another.
Nevertheless, interagency record linkages can usually be undertaken
in ways that do not violate statutory prohibitions, provided the agen-
cies controlling the record systems involved are sufficiently motivated
to do the linkage. These statutory limitations may, however, be con-
sidered obstacles to record linkages in two senses. First, they provide
a convenient argument against performing a particular record linkage
for an agency that is not anxious to do it. Second, even when both
agencies want to do the linkage, the time required for development
and approval of suitable arrangements can often substantially delay
completion of a project, adversely affecting the utility of results.
Recommendation 11.5: The panel recommends that heads
of appropriate federal agencies take the initiative to seek
methods of overcoming legal and policy obstacles to cost-
effective use of existing information through linkages that
are clearly beneficial to the public and ethical, and that they
seek legislative remedies if necessary.
Policy considerations are probably the primary obstacle to
successful record linkage undertakings. Both statistical and
administrative agencies are concerned that public cooperation with their
requirements or requests for information may be adversely affected
if it becomes widely known that they are allowing such information
to be linked with the records of other agencies. The Privacy Act,
however, requires that people who supply information be informed
of any plans to disclose such information in individually identifiable
form. Agencies vary considerably in the amount of detail they pro-
vide in their Privacy Act notification statements: this is an ethical
dilemma that has received some recent attention (Scheuren, 1985).
These concerns are by no means groundless. Computers, data
banks, and record linkages are, rightly or wrongly, viewed with con-
cern or suspicion by a substantial segment of the public. Even those
agencies that have mandatory authority to collect information depend heavily on voluntary cooperation to obtain the data they need.
Any widely publicized report of deception or other improper behavior
on their part, whether accurate or not, could have serious repercus-
sions. A recent example is the controversy that erupted in Sweden,
250
AGING POPULATION IN THE TWENTY-FIRST CENTURY
early in 1986, concerning privacy aspects of Project Metropolitan, a
sociological record linkage study designed to follow, over a 20-year
period, all 10-year-olds who lived in Stockholm in 1963. Nonresponse
in Sweden's monthly labor force survey has more than doubled in
recent months, apparently as a result of the public debate about
Project Metropolitan (Dalenius, 1986).
Obstacles to Dissemination and Use of Record Linkage Results
For record linkages that are undertaken despite the obstacles
just described, the benefits derived depend on how widely and in
how much detail the resulting data are disseminated to potential
users. Publicly collected data are disseminated in two forms: ag-
gregate statistics and microdata files. The latter contain individ-
ual records from which explicit identifiers, such as name and ad-
dress, have been removed (microdata files that can be released with-
out restriction are called public-use files). Statistical agencies have
been releasing public-use microdata files for about 30 years, and the
widespread availability of these files has benefited society by increas-
ing the amount and relevance of quantitative information available
for collective decision making.
Agencies that release data for statistical purposes must attempt,
whether or not the data are derived from linked records, to avoid dis-
closing information in a form that makes it possible to identify spe-
cific individuals and thereby learn more about them. In other words,
they must avoid statistical disclosure. Methods used to limit the
risk of statistical disclosure resulting from release of microdata files
include elimination of some geographic and other detail, replacement
of exact amounts with class intervals, and deliberate introduction of
error.
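The three limitation methods just listed can be illustrated with a small sketch. It assumes a hypothetical three-record file with made-up fields (`county`, `age`, `income`), and it adds an open-ended top class for very old ages (a form of class-interval recoding). The function names, the 5-year interval width, the age-90 top code, and the noise scale are all illustrative choices, not agency practice.

```python
import random

# Hypothetical microdata records; field names and values are illustrative only.
RECORDS = [
    {"county": "A", "age": 67, "income": 18250},
    {"county": "B", "age": 91, "income": 42700},
    {"county": "A", "age": 74, "income": 9600},
]


def drop_geography(record):
    """Eliminate geographic detail by removing the county field."""
    return {k: v for k, v in record.items() if k != "county"}


def to_interval(value, width, top=None):
    """Replace an exact amount with the class interval that contains it.

    Values at or above `top` fall into one open-ended class (top-coding).
    """
    if top is not None and value >= top:
        return f"{top}+"
    lower = (value // width) * width
    return f"{lower}-{lower + width - 1}"


def add_noise(value, scale, rng):
    """Deliberately introduce error: add zero-mean Gaussian noise."""
    return value + rng.gauss(0.0, scale)


def mask(record, rng):
    """Apply all three limitation methods to one record."""
    masked = drop_geography(record)
    masked["age"] = to_interval(record["age"], 5, top=90)
    masked["income"] = round(add_noise(record["income"], 1000.0, rng))
    return masked


rng = random.Random(1986)  # fixed seed so the example is reproducible
public_use = [mask(r, rng) for r in RECORDS]
for row in public_use:
    print(row)
```

Each method trades analytic detail for protection: the noisy income values, for example, no longer reproduce the exact totals of the original file.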
In practice, zero risk of disclosure is unattainable (Cox et al., 1985). The development of powerful computerized record linkage
techniques has increased the likelihood of success by someone who,
for whatever reason, might deliberately try to identify one or more
persons from a public-use microdata file. Recent research by Paass
(1985) suggests that deliberate introduction of random errors into
records does not provide effective protection but does hamper the
intended uses of the data. More recently, Duncan and Lambert (1986)
introduced a "disclosure limiting" (DL) approach that provides a
framework for measuring the extent of statistical disclosure if specific
aggregated statistical data are released. They have demonstrated
that certain ad hoc disclosure control policies commonly used by
statistical agencies are special cases of the DL approach. Although
microdata would fit into the context of the DL approach, use of DL
with this type of data has not yet been explored.
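The re-identification risk described above can be sketched as exact-match linkage on quasi-identifiers. This is not the Duncan and Lambert DL framework. It is a simpler, commonly used indicator: records whose combination of quasi-identifiers is unique in the file, and which can therefore be matched one-to-one to an outside file that carries names. Both files, the choice of `age`, `sex`, and `county` as quasi-identifiers, and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical public-use file: names removed, quasi-identifiers retained.
PUBLIC_USE = [
    {"id": 1, "age": 67, "sex": "F", "county": "A", "diagnosis": "diabetes"},
    {"id": 2, "age": 67, "sex": "F", "county": "A", "diagnosis": "arthritis"},
    {"id": 3, "age": 94, "sex": "M", "county": "B", "diagnosis": "cancer"},
]
# Hypothetical external file that does carry names.
EXTERNAL = [
    {"name": "Person X", "age": 94, "sex": "M", "county": "B"},
    {"name": "Person Y", "age": 67, "sex": "F", "county": "A"},
]
KEYS = ("age", "sex", "county")  # assumed quasi-identifiers


def key(record):
    return tuple(record[k] for k in KEYS)


# Records whose quasi-identifier combination occurs only once in the file.
counts = Counter(key(r) for r in PUBLIC_USE)
sample_uniques = [r for r in PUBLIC_USE if counts[key(r)] == 1]


def reidentify(public, external):
    """Exact-match linkage on quasi-identifiers; keep only one-to-one matches."""
    index = {}
    for r in public:
        index.setdefault(key(r), []).append(r)
    matches = []
    for person in external:
        candidates = index.get(key(person), [])
        if len(candidates) == 1:
            matches.append((person["name"], candidates[0]["diagnosis"]))
    return matches


print(len(sample_uniques))               # 1
print(reidentify(PUBLIC_USE, EXTERNAL))  # [('Person X', 'cancer')]
```

The 94-year-old man is unique on these three fields, so linkage reveals his diagnosis. The two 67-year-old women protect each other. This is why very old ages and fine geography are typically coarsened before release.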
Concerns about statistical disclosure risks have led to curtail-
ment of the amount of information released in the form of microdata
files. The extreme case is that of the Continuous Work History
Sample (CWHS). Since shortly after the passage of the Tax Reform
Act of 1976, no CWHS microdata files have been released because
of findings by the Internal Revenue Service that there is a nonzero
risk of disclosing individual tax return information. Prior to that
time, CWHS microdata files were widely used by researchers inside
and outside government to study relationships between earnings and
benefits, internal migration, labor mobility, industry, mortality, and
other topics. Plans had been developed to enhance the CWHS with
data on mortality, retirement benefits, and Medicare benefits, thus
creating a longitudinal data file of great potential value for research
on policy issues related to the health of the aging population. The
CWHS is a 1 percent sample of all persons who have ever been issued
Social Security numbers and therefore provides virtually complete
coverage of older persons. It is large enough (as of 1984 it included
about 250,000 persons age 65 and over who receive Social Security
benefits) to support analyses for subgroups of the elderly defined
geographically or in other ways. The amount of longitudinal data is
unmatched by other data sources.
Unfortunately, when releases of CWHS microdata files were terminated in 1976, the plans for content expansion through linkage
with other files had to be terminated. Limited efforts along these
lines are presently under way as a collaborative project of the Statis-
tics of Income Division of the Internal Revenue Service and the Office
of Research, Statistics, and Internal Policy of the Social Security Ad-
ministration, with funding from the National Cancer Institute. The
panel recognizes the Continuous Work History Sample as a poten-
tially valuable data base for research on policy issues relating to
health, Medicare benefits, disability, and mortality of the aging pop-
ulation. It supports current efforts of the Internal Revenue Service
and the Social Security Administration to enhance the system and
resume limited dissemination of microdata files.
Recommendation 11.6: The panel recommends that the
Social Security Administration and the Internal Revenue
Service conduct a comprehensive review of the Continu-
ous Work History Sample system, with a view to resuming
broader release of its products under user agreements. If
necessary, legislative authority should be sought. Federal
agencies with information requirements that can be met by
enhancing the CWHS system are urged to provide budgetary
support.
Releases of public-use microdata files from the decennial censuses, the Current Population Survey, and other sources, some of
which include linked administrative records, have continued. How-
ever, many users, including other federal agencies, have said that
their uses of the files are hampered by the content restrictions that
the releasing agencies impose in order to maintain the risk of sta-
tistical disclosure at what they consider to be an acceptably low
level.
Recommendation 11.7: The panel recommends that federal
statistical agencies develop procedures for making microdata
files more readily available to users, including both other
federal agencies and nongovernment researchers. Technical
procedures, such as curtailment of file content and the delib-
erate introduction of error, cannot reduce the risk of statisti-
cal disclosure to zero; therefore, other methods of protecting
the confidentiality of data subjects should be explored. Such
methods include user agreements with penalties for violation
and legal remedies for data subjects harmed by disclosure.
PROJECTIONS FOR AN AGING POPULATION
The future needs of the aged population depend on its size, com-
position, morbidity and mortality rates, educational and economic
status, housing and living arrangements, and also on the unpre-
dictable changes in the incidence and treatment of illness. The aged
population has a dynamic nature in both size and composition be-
cause of new entrants (people who reach the age of 65) and those who
die. During the next 20 years, the new entrants into the aged population, persons now ages 45-64, represent a cohort that differs from the cohort now ages 65-84 in educational, marital, income, health, and other characteristics (Myers, 1985). The informed evaluation of
issues pertaining to policy decision making for this aging population
requires forecasts for the years 2000 and beyond that will account for
the forthcoming demographic and societal developments.
The Bureau of the Census issues basic population projections for
the future composition of the national population by age, race, and
sex as well as by state and region. These projections also provide a
basis for others, such as those concerning educational status, marital
status, household structure, and labor force status, all of which are single-factor projections. The other general type of forecast, discussed later in the chapter, consists of interactive models that can account for the effects of factors on one another.
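Basic population projections of this kind are commonly built with the cohort-component method. The sketch below shows only its mortality step: survivors in each age group move up one group per 5-year interval, and new entrants (people reaching 65) join at the bottom. The population counts, survival ratios, and new-entrant figure are invented for illustration. A full projection would also need fertility and migration components, which are omitted here.

```python
# Hypothetical inputs: population in thousands by 5-year age group and
# 5-year survival ratios. These are NOT Census Bureau figures.
AGE_GROUPS = ["65-69", "70-74", "75-79", "80-84", "85+"]
POP = [100.0, 80.0, 60.0, 40.0, 30.0]
SURVIVAL = [0.90, 0.85, 0.75, 0.60, 0.40]


def project_step(pop, survival, new_entrants):
    """Advance the population one 5-year step (mortality only, no migration).

    pop[i]       -- persons in age group i at the start of the step
    survival[i]  -- proportion of group i still alive 5 years later
    new_entrants -- persons who reach ages 65-69 during the step
    Survivors move up one group; survivors of the open-ended last group stay.
    """
    nxt = [0.0] * len(pop)
    nxt[0] = new_entrants
    for i in range(len(pop) - 1):
        nxt[i + 1] += pop[i] * survival[i]
    nxt[-1] += pop[-1] * survival[-1]
    return nxt


projected = project_step(POP, SURVIVAL, new_entrants=110.0)
for group, value in zip(AGE_GROUPS, projected):
    print(f"{group:>6}: {value:6.1f}")
```

Repeating `project_step` gives longer horizons. A single-factor projection (for example, of marital status) would then apply assumed proportions to each projected age group.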
A noteworthy limitation of these Census Bureau projections is
their lack of regularity. The most recent set was released in final form
in May 1984, and the previous set in 1977. The periods covered by
these projections have also tended to vary widely, with more recent
sets extending farther into the future (Myers, 1985).
Recommendation 11.8: The panel recommends that a basic set of projections be prepared by the Bureau of the Census every 10 years based on the decennial census and covering all characteristics included in the past: age, race, sex, marital status, living arrangements, and educational attainment.
Ethnicity, migration, and greater geographic detail are also
needed. A shorter-term projection should be prepared five
years after release of the projection based on decennial census
data.
The Bureau of the Census has used methodology similar to that
of the Social Security Administration to produce future population
estimates for its program. An important contribution made by Social
Security is to forecast mortality. These estimates take into account
trends in cause-specific mortality by age and sex but not race. Their
construction also involves judgmental procedures with potentially
arbitrary assumptions about the patterns of change in mortality rates
over time. This issue merits attention in assessments of health status, where more extensive consideration of the underlying biological processes is useful, particularly in the development of morbidity and disability forecasts.
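Projecting cause-specific trends and then summing them can produce a result that differs from extrapolating the all-cause rate directly. The sketch shows why. It assumes hypothetical base-year rates for one age-sex group and, for each cause, a judgmental constant annual rate of decline. Neither the causes' rates nor the decline figures come from Social Security or any other source.

```python
# Hypothetical cause-specific death rates (per 100,000) for one age-sex group,
# each paired with an assumed (judgmental) average annual rate of decline.
CAUSES = {
    "heart disease": (1800.0, 0.020),
    "cancer": (1100.0, 0.005),
    "stroke": (450.0, 0.030),
    "all other": (900.0, 0.010),
}


def project_cause(rate, decline, years):
    """Geometric decline: rate * (1 - decline) ** years."""
    return rate * (1.0 - decline) ** years


def project_all_cause(causes, years):
    """All-cause rate as the sum of separately projected cause-specific rates."""
    return sum(project_cause(r, d, years) for r, d in causes.values())


print(round(project_all_cause(CAUSES, 0), 1))  # base year: 4250.0
print(round(project_all_cause(CAUSES, 20), 1))
# The implied all-cause decline is not constant: causes with slower assumed
# improvement (here, cancer) make up a growing share of the total.
share_cancer_20 = project_cause(*CAUSES["cancer"], 20) / project_all_cause(CAUSES, 20)
print(f"cancer share after 20 years: {share_cancer_20:.2f}")
```

The per-cause decline rates are exactly the kind of potentially arbitrary assumption the text mentions. Changing any one of them shifts both the total and its cause composition.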
Cooperation with the Health Care Financing Administration and
the National Center for Health Statistics in these efforts is essential.
The Health Care Financing Administration has produced forecasts
of national health expenditures and types of expenditures to the
year 1990 on the basis of Social Security projections (Freeland and Schendler, 1983). Also, the National Center for Health Statistics
has reported on projections of various aspects of health services
utilization and health care expenditures for the year 2003 (National
Center for Health Statistics, 1983b).
As noted previously, important factors in many projections con-
cerning population size, composition, and health status are the levels
of age-specific mortality rates in the future. The mortality rates
provided through Social Security account for age, sex, major dis-
ease category, and judgments about patterns of variation over time.
However, they are not differentiated by race.
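An age-sex rate that is not differentiated by race is a population-weighted average of the group-specific rates. It therefore changes when the composition of the population changes, even if no group's own rate moves. The sketch below uses invented deaths and population counts for two unnamed groups ("group A" and "group B") to make that arithmetic concrete.

```python
# Hypothetical deaths and mid-year population for one age-sex group,
# split into two population groups. Figures are illustrative only.
GROUPS = {
    "group A": {"deaths": 300, "population": 10_000},
    "group B": {"deaths": 90, "population": 2_000},
}
PER = 100_000  # rates are expressed per 100,000


def rate(deaths, population):
    return deaths * PER / population


specific = {g: rate(d["deaths"], d["population"]) for g, d in GROUPS.items()}
combined = rate(
    sum(d["deaths"] for d in GROUPS.values()),
    sum(d["population"] for d in GROUPS.values()),
)


def rate_for_composition(specific_rates, shares):
    """All-group rate implied by holding group-specific rates fixed
    while the population shares change."""
    return sum(specific_rates[g] * shares[g] for g in specific_rates)


print(specific, combined)
print(rate_for_composition(specific, {"group A": 0.5, "group B": 0.5}))
```

With the base-year mix (five-sixths group A), the combined rate is 3,250 per 100,000. If the shares shift to one-half each, it rises to 3,750 while both group-specific rates stay unchanged. Projections that ignore the grouping would miss that shift.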
Recommendation 11.9: The panel recommends that race
be used in addition to age and sex in mortality rates by
disease and in forecasting the size and health status of the
population.
Dissemination of data from projections has tended to be mainly
in the form of published tables showing particular age and charac-
teristic information for some specific points in time. For example, in
1979 the Census Bureau projected marital status for the period 1979-
1995 focusing on 10-year age intervals for 1985, 1990, and 1995. In
addition, the input schedules for the change components and related
assumptions may not be presented in published reports in adequate
detail. One way of overcoming these limitations of published tables is
to release information in tape form or through interactive computer
systems. These would enable users to prepare their own alternative projections.
Recommendation 11.10: The panel recommends that agencies release projections and underlying information in tape
form as well as in published tables and that documentation
of basic assumptions in the projections accompany the tape
and the published tables.
There are several sources of uncertainty which influence the ac-
curacy of projections for the elderly population. Data quality at
older ages is one important consideration; potential problem areas
include age misstatement, underenumeration, and inaccurate report-
ing of characteristics, as well as nonresponse at the characteristic
and entire person levels. An additional well-known source of vari-
ability in surveys is sampling variability. Its role for older ages can be
substantial when the sample size for older age groups is very small.
Another important source of uncertainty for projections is sensitivity
to assumptions such as those concerning the magnitude and pattern
of change of mortality rates. The previously stated considerations
merit attention for projections concerning an aging population be-
cause the rates of transition between relevant states at older ages are
often much greater than those at younger ages. For example, the
mortality rate for males ages 45-49 was 546.1 per 100,000 in 1982,
but nearly 20 times greater at ages 80-84. Higher transition rates
among the elderly also apply to morbidity, hospitalization, and other
experiences. As a consequence, the level of error in estimation is greater at older ages, so care is needed in managing such uncertainty and describing its implications, in order to avoid misreading findings and the misguided policy that could follow (Myers, 1985).
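The effect of small samples at older ages on sampling variability can be sketched with the usual standard error of an estimated proportion. The sketch assumes simple random sampling. Actual surveys have complex designs whose design effects usually enlarge these errors. The 30 percent prevalence and the two sample sizes are hypothetical.

```python
import math


def standard_error(p, n):
    """Approximate standard error of a proportion p estimated from a simple
    random sample of size n."""
    return math.sqrt(p * (1.0 - p) / n)


def ci95(p, n):
    """Normal-approximation 95 percent confidence interval."""
    se = standard_error(p, n)
    return p - 1.96 * se, p + 1.96 * se


# Hypothetical: 30% report a functional limitation, in a large sample
# of younger respondents and a small sample of the oldest old.
for n in (2000, 80):
    lo, hi = ci95(0.30, n)
    print(n, round(standard_error(0.30, n), 4), (round(lo, 3), round(hi, 3)))
```

Because the error shrinks only with the square root of n, a subgroup 25 times smaller has a standard error 5 times larger.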
Recommendation 11.11: The panel recommends that agencies describe the nature of uncertainty for projections. The
basis of estimates of uncertainty should also be documented
in terms of underlying sources such as those for data quality,
sampling variability, and sensitivity to assumptions.
Many of the factors that are of interest for an aging population
interact with one another. One way to account for such interactions
is through global models that attempt to integrate different submod-
els or modules into a common projection framework. Such models
can have either an aggregate economic-demographic structure or a
microsimulation structure for a created population of individuals.
Two noteworthy examples of the former type with specific features
in their designs for studies of the elderly are the Macroeconomic-Demographic Model (National Institute on Aging, 1984a) and the Demographic-Economic Model of the Elderly (Olsen et al., 1981). The Demographic-Economic Model of the Elderly (DECO) is part of a larger model developed by Data Resources, Inc. It provided 25-year
projections of the economic implications of an aging population.
The Macroeconomic-Demographic Model (MDM) was initiated under the President's Commission on Pension Policy and has been further developed by the National Institute on Aging. It is a long-term model intended to assess how the changing age structure of the population will affect the income level of the elderly as well as productivity, consumption, savings, and investment. The model treats all population factors exogenously within the comprehensive, integrated model of long-term economic growth and labor force supply and demand. These, in turn, are related to major features of
national pension systems and transfer programs. Research is under
way for the development of modules to assess the demand for health
insurance and services and another on health expenditures. These
modules would appear to require further detail on health status. The
model could benefit from the development of more endogenous mod-
ules for the demographic component with respect to family formation
and dissolution, fertility, health status, and mortality.
Microsimulation models involve Monte Carlo procedures that
can apply different patterns of transitions to the individuals in a
sample population. When adequate estimation of their parameters is
feasible, they can enable evaluation of factors that can affect changes
in distributions of population characteristics such as health status
or health services utilization. Examples of useful microsimulation models include POPSIM (a product of the Research Triangle Institute), DYNASIM (prepared by the Urban Institute; Wertheimer and Zedlewski, 1980), and a preliminary research model designed by the Duke University Center for Demographic Studies to examine future health status (Myers et al., 1977).
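The Monte Carlo mechanism can be sketched in a few lines. The sketch assumes a made-up three-state model (healthy, disabled, dead) with fixed annual transition probabilities. Each simulated person's next state is drawn at random from the row for his or her current state. It is not POPSIM, DYNASIM, or the Duke model. Real microsimulations let transition probabilities depend on age, sex, and other characteristics, and the parameters here are invented.

```python
import random

STATES = ("healthy", "disabled", "dead")
# Hypothetical annual transition probabilities; each row sums to 1.
TRANSITIONS = {
    "healthy": {"healthy": 0.90, "disabled": 0.07, "dead": 0.03},
    "disabled": {"healthy": 0.10, "disabled": 0.75, "dead": 0.15},
    "dead": {"dead": 1.0},
}


def step(state, rng):
    """Draw next year's state for one person by inverting the cumulative
    distribution of that state's transition probabilities."""
    u = rng.random()
    cumulative = 0.0
    for nxt, prob in TRANSITIONS[state].items():
        cumulative += prob
        if u < cumulative:
            return nxt
    return nxt  # guard against floating-point rounding in the cumulative sum


def simulate(initial, years, seed=0):
    """Advance every simulated person `years` times; return counts by state."""
    rng = random.Random(seed)
    people = list(initial)
    for _ in range(years):
        people = [step(s, rng) for s in people]
    return {s: people.count(s) for s in STATES}


result = simulate(["healthy"] * 8000 + ["disabled"] * 2000, years=10, seed=42)
print(result)
```

Rerunning with different seeds shows Monte Carlo variability. Substituting alternative transition rows shows how a policy or treatment assumption would change the distribution of health status.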
Another consideration in the development of forecasting models
is the incorporation of more biomedical information. A recent effort in this direction is described in Manton (1985). Incidence and
prevalence of cancer morbidity by age group to age 90 and over
are projected to the year 2000 under assumptions of a changing
population structure and one fixed from 1977. Projections of lung
cancer deaths in the year 2000 under both assumptions were made.
An interesting feature of these forecasts is their linkage to stochas-
tic compartment modeling techniques for the estimation of health
state transitions for persons subject to specific diseases. Multiple
sources of data are used to deal with age cohort differences in risk,
changes in risk over the life span, individual differences in risk, and
both independent and dependent competing risk assumptions about
interactions among diseases (Myers, 1985).
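A compartment (Markov chain) model is the deterministic counterpart of the microsimulation idea. Instead of simulating individuals, it carries forward the expected fraction of a cohort in each health state. Summing those fractions over time gives expected years spent active and disabled, the quantities behind active and disabled life expectancy. The sketch assumes a hypothetical annual transition matrix and counts occupancy at the start of each year, which is a simple approximation. It is not Manton's estimation procedure, and it ignores competing-risk structure among specific diseases.

```python
# Hypothetical annual transition matrix among health states (rows: from, cols: to).
STATES = ["active", "disabled", "dead"]
P = [
    [0.90, 0.07, 0.03],
    [0.10, 0.75, 0.15],
    [0.00, 0.00, 1.00],
]


def advance(dist, matrix):
    """One year of the compartment model: next[j] = sum_i dist[i] * matrix[i][j]."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def expected_years(start, matrix, horizon):
    """Accumulate expected person-years in each state over `horizon` years,
    counting occupancy at the start of each year (a simple approximation)."""
    dist = list(start)
    totals = [0.0] * len(dist)
    for _ in range(horizon):
        totals = [t + d for t, d in zip(totals, dist)]
        dist = advance(dist, matrix)
    return dict(zip(STATES, totals))


years = expected_years([1.0, 0.0, 0.0], P, horizon=40)
alive = years["active"] + years["disabled"]
print({k: round(v, 2) for k, v in years.items()})
print(f"share of expected life spent active: {years['active'] / alive:.2f}")
```

Adding a half-year correction for transitions during the year, or making the matrix vary by age, would bring this closer to a usable life-table calculation.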
As discussed here, developments are needed that would lead to
more sophisticated modeling of the components of population, par-
ticularly the mortality component that so affects the projections of
the aged population. In this regard, integrated efforts are necessary
to forecast health status, functional limitations, and support systems available for older persons (Manton, 1984). Not only can these
forecasts have utility for probing important policy issues related to
health care expenditures and welfare programs, but they also can
be informative for improved mortality forecasts in general popula-
tion projections. The NIA recognized the importance of forecasting
methodology and recently issued a Request for Applications on the
methodology of forecasting active and disabled life expectancy.
Recommendation 11.12: The panel recommends that a study
to evaluate theoretical, methodological, and data require-
ments for forecasting the characteristics of the aging pop-
ulation be undertaken. This would include theoretical and
practical considerations for evaluating the sensitivity of fore-
casts to underlying assumptions.
The Bureau of the Census and the National Institute on Aging would
be appropriate agencies to fund this study.
QUANTIFYING UNCERTAINTY
In a paper prepared for the panel, Stoto (1985) discussed the
problem of quantifying the uncertainty associated with projections
or other data summaries produced in policy-oriented analysis. These
difficulties have two principal sources: (1) the need for assumptions
that cannot be verified and (2) the use of data bases that arise from
poorly defined stochastic models.
The inability to verify critical assumptions about the adequacy of
the stochastic model beyond the range of the available data is intrinsic to projection. Stoto discusses several approaches to the characterization of uncertainty in projection. One important methodology
is sensitivity analysis, the evaluation of the changes induced in a
projection by changes in key assumptions or parameters. Sensitiv-
ity analysis is sometimes summarized through the reporting of high,
middle, and low projections. Regrettably, many policy analyses do
not include a discussion of the sensitivity of key findings to unverifi-
able assumptions.
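A sensitivity analysis of the high-middle-low kind can be sketched by running one projection under several values of a single unverifiable assumption. The example assumes a hypothetical closed cohort of 1,000 (thousand) persons aged 85 and over and three assumed constant annual survival probabilities that stand in for low, middle, and high mortality-improvement scenarios. None of these values comes from an actual projection.

```python
# Hypothetical scenarios: assumed constant annual survival probabilities.
SCENARIOS = {"low": 0.84, "middle": 0.86, "high": 0.88}


def survivors(base, annual_survival, years):
    """Survivors of a closed cohort under a constant annual survival probability."""
    return base * annual_survival ** years


def sensitivity(base, years, scenarios=SCENARIOS):
    """Run the same projection once per scenario and collect the results."""
    return {name: survivors(base, p, years) for name, p in scenarios.items()}


results = sensitivity(1000.0, years=10)
for name, value in results.items():
    print(f"{name:>6}: {value:7.1f}")
# Relative spread between the high and low variants.
print(f"high/low ratio: {results['high'] / results['low']:.2f}")
```

A four-point difference in the assumed annual survival probability leaves the high variant roughly 1.6 times the low one after 10 years. Reporting that spread, and not only the middle series, is the kind of sensitivity discussion the text calls for.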
Similarly, many policy analyses involve the collection and inte-
gration of data from a variety of sources. The analysis of such data
sets requires special methods or higher-level models that link the
information from different sources. This problem has received con-
siderable attention in the statistical and social-scientific literature,
under the rubrics of risk assessment (DuMouchel and Harris, 1983) and meta-analysis (Glass, 1976). Many research workers are concerned with methodology for combining data from different sources (see, for example, Hedges and Olkin, 1985; Wolf, 1986; Gupta and WiTton, 1987).
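One standard technique from this literature is the fixed-effect, inverse-variance weighted combination of estimates, accompanied by Cochran's Q as a check on whether the sources disagree more than sampling error would explain. The sketch below applies it to three hypothetical estimates (for example, log rate ratios) and their variances. It is a generic illustration, not a method taken from any of the works cited.

```python
import math


def fixed_effect(estimates, variances):
    """Inverse-variance weighted (fixed-effect) pooled estimate and its standard error."""
    weights = [1.0 / v for v in variances]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total
    return pooled, math.sqrt(1.0 / total)


def cochran_q(estimates, variances, pooled):
    """Cochran's Q: weighted squared deviations from the pooled estimate."""
    return sum((e - pooled) ** 2 / v for e, v in zip(estimates, variances))


# Hypothetical effect estimates from three data sources, with their variances.
est = [0.20, 0.35, 0.10]
var = [0.010, 0.040, 0.020]
pooled, se = fixed_effect(est, var)
q = cochran_q(est, var, pooled)
print(round(pooled, 4), round(se, 4), round(q, 3))
```

Q is compared with a chi-square distribution on k - 1 degrees of freedom (here 2). A large value suggests the sources differ systematically, in which case a random-effects or higher-level model that links the sources is more appropriate.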
Speaking more generally, much of statistical methodology is
based on the paradigm of a well-defined experiment or sampling
plan. Critical information required for decision making, however,
is often not of this type but is rather more diffuse and less clearly
structured. The analysis of such information poses a challenge to
statisticians and other quantitative scientists, as has been recognized
by many and has led to considerable research on policy analysis, as
well as the formation of new professional groups such as the Society
for Decision Analysis. The panel believes that this line of research
is very important not only to policy analysis on the consequences
of an aging population, but also to policy analysis in many other
areas of importance to this country. We believe also that further
advances in understanding of methods for conducting and reporting
policy-oriented analyses are critically needed.
Recommendation 11.13: The panel recommends that fed-
eral agencies relying on quantitative analysis to guide policy
encourage and support research on methods for conducting
and reporting policy analyses, especially methods for quan-
tifying the uncertainty of projections and data summaries.