Modeling Incidence and Mortality Data in an Ecologic Study



A starting point for ecologic modeling of cancer rates is Poisson regression for rates and counts. In classic Poisson regression, a count, $N_i$, of some data item (e.g., a count of childhood leukemias) is modeled as a Poisson random variable, with a probability distribution function equal to:

$$\frac{\mu_i^{N_i} e^{-\mu_i}}{N_i!} \tag{1}$$

Here $\mu_i$ is the expected value of $N_i$ (i.e., the number of incident cancer cases or deaths in a particular geographic unit expected from broad population rates, typically cross-classified by other variables such as age, gender, and race/ethnicity, with $i$ as the identifying index). In Poisson regression the mean, $\mu_i$, is unknown but assumed to be a function of known covariates. For example, in generalized linear regression (McCullagh and Nelder, 1989) a model for the mean involves a covariate vector $X_i = (X_{i1}, X_{i2}, \ldots, X_{ip})^T$ observed for each $i$. These $X_i$ may be either continuous variables, such as dose, or indicator variables, indicating levels taken by categorical variables. The generalized linear model for $\mu_i$ is of the form:

$$g(\mu_i) = \alpha_1 X_{i1} + \alpha_2 X_{i2} + \cdots + \alpha_p X_{ip} = X_i^T \alpha \tag{2}$$

Here $\alpha = (\alpha_1, \alpha_2, \ldots, \alpha_p)^T$, and $\alpha_1$ is the regression coefficient relating covariate value $X_{i1}$ to the mean $\mu_i$, $\alpha_2$ relates $X_{i2}$ to $\mu_i$, etc. Here $g$ is a link function; for example, when (as is often the case) $g$ is the log function, the model is equivalent to:

$$\mu_i = \exp(\alpha_1 X_{i1} + \alpha_2 X_{i2} + \cdots + \alpha_p X_{ip}) \tag{3}$$
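As an illustration, the Poisson probability in (1) and the log-link mean in (3) can be evaluated directly. The following is a minimal Python sketch; the function names, covariate values, and coefficients are hypothetical and chosen only for illustration.

```python
import math

def log_link_mean(x, alpha):
    """Mean under the log link: exp of the linear predictor x . alpha."""
    return math.exp(sum(xj * aj for xj, aj in zip(x, alpha)))

def poisson_pmf(n, mu):
    """Poisson probability of exactly n events given mean mu (mu > 0)."""
    # Computed on the log scale to avoid overflow in mu**n and n!.
    return math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))

# Hypothetical cell: intercept, a dose value, and one indicator variable.
x_i = [1.0, 0.3, 1.0]
alpha = [0.5, 0.2, -0.4]  # illustrative coefficients, not estimates
mu_i = log_link_mean(x_i, alpha)

# Probabilities for counts 0..59; the tail beyond that is negligible here.
probs = [poisson_pmf(n, mu_i) for n in range(60)]
mean = sum(n * p for n, p in enumerate(probs))
variance = sum((n - mean) ** 2 * p for n, p in enumerate(probs))
```

Under the Poisson assumption, `mean` and `variance` computed this way agree with each other and with `mu_i`.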
When $N_i$ counts the number of events observed over a period of time, $t_i$ (years), for a known number of individuals, $k_i$, then the person-years of observation, $py_i$, defined as $t_i k_i$, are made part of the model as:

$$\mu_i = \exp(\alpha_1 X_{i1} + \alpha_2 X_{i2} + \cdots + \alpha_p X_{ip} + \log(py_i)) = py_i \exp(X_i^T \alpha) \tag{4}$$

so that the mean of the counts is proportional to the person-years of observation multiplied by the effect of covariates.
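One way to see the role of the person-years term is to fit model (4) to a small table. The sketch below is an illustration under stated assumptions, not the software referred to in this appendix: it uses plain Newton-Raphson for an intercept and a single binary covariate, and the counts, person-years, and function name are invented.

```python
import math

def fit_poisson_rate(counts, person_years, x, n_iter=25):
    """Newton-Raphson fit of mu_i = py_i * exp(a0 + a1 * x_i).

    log(py_i) acts as an offset: a term whose coefficient is fixed at 1.
    """
    a0 = math.log(sum(counts) / sum(person_years))  # start at the crude rate
    a1 = 0.0
    for _ in range(n_iter):
        mu = [py * math.exp(a0 + a1 * xi) for py, xi in zip(person_years, x)]
        # Score vector: sum of x_i * (N_i - mu_i), with x_i = (1, x_i).
        u0 = sum(n - m for n, m in zip(counts, mu))
        u1 = sum(xi * (n - m) for xi, n, m in zip(x, counts, mu))
        # Fisher information: sum of mu_i * x_i x_i^T (a 2 x 2 matrix).
        i00 = sum(mu)
        i01 = sum(xi * m for xi, m in zip(x, mu))
        i11 = sum(xi * xi * m for xi, m in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: solve the 2 x 2 system explicitly.
        step0 = (i11 * u0 - i01 * u1) / det
        step1 = (i00 * u1 - i01 * u0) / det
        a0 += step0
        a1 += step1
    return a0, a1

# Invented table: two unexposed cells (x = 0) and two exposed cells (x = 1).
counts = [3, 5, 6, 10]
person_years = [1000.0, 3000.0, 1500.0, 2500.0]
x = [0, 0, 1, 1]

a0, a1 = fit_poisson_rate(counts, person_years, x)
baseline_rate = math.exp(a0)  # events per person-year when x = 0
rate_ratio = math.exp(a1)     # rate for x = 1 relative to x = 0
```

With a single binary covariate the fit has a closed form (events divided by person-years within each group), which makes the sketch easy to check by hand.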
In the setting described here, $N_i$ would correspond to a single entry in a cross-tabulation of events (death due to, or incidence of, a particular cancer) by geographical unit and by gender, race, age, calendar time, and any other relevant variable known (from the cancer registry) about the cases. For each cell in the table, the number of events and the person-years at risk, $py_i$, must be calculated (see discussion below). In addition, the variable of interest, dose $D_i$, and other covariates available for each geographical unit (e.g., indices of socioeconomic status) are required for each table entry $i$.

A variation on the model above, known as the linear excess relative risk (ERR) model, is commonly used in radiation epidemiology. The linear ERR model incorporates dose in the model for $\mu_i$ as:

$$\mu_i = py_i \exp(X_i^T \alpha)(1 + \beta D_i) \tag{5}$$

Here $py_i \exp(X_i^T \alpha)$ is the background rate of disease (for unexposed cells) multiplied by person-years at risk, and the ERR parameter $\beta$ is the excess relative risk per unit of dose or dose surrogate $D_i$. Much more complex models can be considered, and software for generalized Poisson regression is available (Epicure, Hirosoft Software, Seattle, Washington). The background rate of disease is allowed to vary depending on race, gender, age, and calendar time (to allow for disease rates to differ by age and for age-specific rates to vary by calendar year, for example). Covariates in ecologic models are not individual covariates but summaries obtained for each geographical unit, although these can also vary in time; for example, we may have information about some socioeconomic variables at the level of census tract, and these variables may change with time over the period of interest. Such variables are incorporated by including (categories of) calendar time as a cross-classification variable.
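To make the linear ERR model concrete, the following sketch estimates the ERR parameter by maximum likelihood for a deliberately simplified case. The assumptions are loud ones: the background is a single constant rate (no covariates), the search is restricted to non-negative ERR values in a fixed interval, the data are invented, and the profile-likelihood search shown here is a stand-in technique, not the algorithm used by Epicure or any other package.

```python
import math

def fit_linear_err(counts, person_years, dose, lo=0.0, hi=5.0, tol=1e-8):
    """Fit mu_i = py_i * rate * (1 + beta * D_i) by profile likelihood.

    For a fixed beta, the maximum likelihood background rate has a closed
    form, so only beta needs a one-dimensional search (ternary search here).
    """
    def profile(beta):
        rel = [py * (1.0 + beta * d) for py, d in zip(person_years, dose)]
        rate = sum(counts) / sum(rel)  # closed-form MLE of the background rate
        loglik = sum(n * math.log(rate * r) - rate * r
                     for n, r in zip(counts, rel))
        return loglik, rate

    # Ternary search for the maximum of the profile log-likelihood.
    while hi - lo > tol:
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if profile(m1)[0] < profile(m2)[0]:
            lo = m1
        else:
            hi = m2
    beta = 0.5 * (lo + hi)
    return beta, profile(beta)[1]

# Invented cells: dose in arbitrary units, background rate 0.001 per person-year.
counts = [2, 3, 2, 3]
person_years = [2000.0, 2000.0, 1000.0, 1000.0]
dose = [0.0, 1.0, 2.0, 4.0]

beta_hat, rate_hat = fit_linear_err(counts, person_years, dose)
```

The invented counts equal their expected values under a background rate of 0.001 and an ERR of 0.5 per unit dose, so the fit should recover those values closely. A realistic analysis would also model the background with covariates, as in the exponential term of the ERR model.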

J.1 DOSE AND DOSE SURROGATES
The presumed effect on risk of the dose or dose surrogate variable, $D_i$, in the linear ERR model is much simpler (involving only the ERR parameter, $\beta$) than the model for the background risk (involving many additional parameters $\alpha$); however, $D_i$ will also vary in time. For example, if $D_i$ is cumulative dose from a particular nearby plant for representative individuals, then $D_i$ for all census tracts near that plant would be zero until the start of operations of that plant and would accumulate over time during operation. Even the treatment of much simpler dose surrogates (exposed or not exposed according to distance) should reflect the startup times of each plant or facility.

Other factors may also need to be considered in the calculation of $D_i$; for example, if it is known that a population around a particular plant or facility has been highly mobile over the period of exposure, then it would be desirable to incorporate that mobility into the calculation of $D_i$ in order to approximate the average cumulative dose to the individuals in each census tract for each time period considered. If distance is to be used as a dose surrogate, then time-weighted distance could also be considered.
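The two ideas in this section, dose that accumulates only after startup and distance weighted by time, can be sketched as follows. Everything here is a hypothetical simplification: a constant annual dose, accumulation counted through the end of each calendar year, and a residence history given as (distance, years) pairs. The function names and numbers are not from the appendix.

```python
def cumulative_dose(year, start_year, annual_dose):
    """Cumulative dose at the end of `year` for a tract near a plant.

    Assumes a constant annual dose once operations begin in `start_year`;
    the dose is zero for every year before startup.
    """
    years_operating = max(0, year - start_year + 1)
    return annual_dose * years_operating

def time_weighted_distance(history):
    """Average distance from the plant, weighted by years spent at each distance.

    `history` is a list of (distance_km, years) pairs for a representative
    resident; both the structure and the units are assumptions.
    """
    total_years = sum(years for _, years in history)
    return sum(dist * years for dist, years in history) / total_years

# Plant starting in 1970 with a hypothetical annual dose of 0.02 units.
before_startup = cumulative_dose(1965, 1970, 0.02)
after_ten_years = cumulative_dose(1979, 1970, 0.02)

# Ten years at 5 km followed by thirty years at 20 km.
avg_distance = time_weighted_distance([(5.0, 10), (20.0, 30)])
```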
J.2 PERSON-YEAR CALCULATIONS
Another key issue in Poisson modeling is to adequately approximate the person-years of exposure to some hazard, $py_i$, as well as to count the number of events $N_i$. For each cell in the tabulation of events cross-classified by geographical unit, race, age, and calendar time, census data are required in order to determine the population size for each table entry; i.e., the whole population must be classified according to these same variables. Data from each decennial census must be interpolated to the out years (years without a census). The accuracy of person-year approximations affects the modeling of $N_i$ using Poisson regression, and inaccuracy in the estimation of person-years is one (among many) of the reasons to assume that the Poisson model may not adequately capture the variability of the observed counts $N_i$.
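A simple version of the interpolation step can be sketched as below. Linear interpolation between adjacent censuses is only one possible choice and is an assumption here, as are the function names and the population figures; each calendar year is taken to contribute one person-year per interpolated resident.

```python
def interpolate_population(year, census):
    """Linearly interpolate a cell's population between bracketing census years.

    `census` maps census year -> population count for one table cell.
    """
    years = sorted(census)
    if not years[0] <= year <= years[-1]:
        raise ValueError("year outside census range; extrapolation not handled")
    for y0, y1 in zip(years, years[1:]):
        if y0 <= year <= y1:
            frac = (year - y0) / (y1 - y0)
            return census[y0] + frac * (census[y1] - census[y0])
    return float(census[years[0]])  # only reached when a single census is given

def person_years(first_year, last_year, census):
    """Approximate person-years as the sum of yearly interpolated populations."""
    return sum(interpolate_population(y, census)
               for y in range(first_year, last_year + 1))

# Hypothetical cell: 1,000 people in 1990 and 1,200 in 2000.
census = {1990: 1000, 2000: 1200}
pop_1995 = interpolate_population(1995, census)
py_1990_1994 = person_years(1990, 1994, census)
```

Any error in the interpolated populations carries straight through to the person-years, which is one source of the extra variability discussed in the next section.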
J.3 OVERDISPERSION
It is likely that observed counts $N_i$ will depart from the Poisson distribution in a way that must be adequately accommodated when fitting regression models such as (5). If a random variable $N_i$ is distributed according to the Poisson distribution with mean $\mu_i$, then the variance of $N_i$ is also equal to $\mu_i$. However, there are good reasons to expect that the actual variability of $N_i$ will be greater than that predicted by the Poisson distribution. For example, as mentioned above, for the out years at least, the population size and hence person-years will not be known exactly. Even more important, however, is that other known and unknown risk factors that influence disease occurrence are not accounted for in the variables used in the ecologic regression. Even if those risk factors are completely independent of distance or dose from a plant or facility, they will still increase the dispersion of $N_i$ while leaving the model for the mean unaffected. Ignoring overdispersion will lead to underestimation of the standard errors of the estimates of the regression parameters, including those of most interest (i.e., $\beta$).

The treatment of overdispersion in Poisson regression models has been considered by a number of authors (Liu and Pierce, 1993; McCullagh and Nelder, 1989; Moore, 1986). A simple and usually effective approach (McCullagh and Nelder, 1989) to solving this problem is to fit the means model using Poisson regression but then to estimate an overdispersion term $\sigma^2$, with $\sigma^2 > 1$, so that the variance of $N_i$ is estimated to be equal to $\sigma^2 \mu_i$. Inference about the significance of the parameters of interest (i.e., $\beta$) is performed after adjusting the usual standard error estimates (which assume the Poisson model), inflating them by the factor $\sigma$. A method-of-moments approach for fitting this and similar models is described by Moore (1986). More generally, the “sandwich estimator” of Zeger and Liang (1986) can be used to compute variances of the parameter estimates that adequately reflect the variability of the counts.
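The simple adjustment described above can be sketched with a moment-type estimate of the overdispersion term: the Pearson chi-square statistic divided by its residual degrees of freedom. This particular estimator is a common choice and is assumed here for illustration; the counts, fitted means, and Poisson standard error are invented, and the function names are hypothetical.

```python
import math

def pearson_dispersion(counts, fitted, n_params):
    """Estimate the overdispersion term as Pearson chi-square / (n - p)."""
    chi2 = sum((n - m) ** 2 / m for n, m in zip(counts, fitted))
    return chi2 / (len(counts) - n_params)

def adjusted_se(poisson_se, dispersion):
    """Inflate a Poisson-based standard error by sqrt(dispersion).

    The estimate is floored at 1 so the adjustment never shrinks the
    standard error below its Poisson value.
    """
    return poisson_se * math.sqrt(max(1.0, dispersion))

# Invented observed counts and fitted means from a two-parameter Poisson fit.
counts = [1, 12, 0, 15, 3, 11]
fitted = [4.0, 7.0, 3.0, 10.0, 6.0, 12.0]

sigma2_hat = pearson_dispersion(counts, fitted, n_params=2)
se_beta = adjusted_se(0.10, sigma2_hat)  # 0.10 is a made-up Poisson standard error
```

The point estimates are unchanged by this adjustment; only the standard errors, and hence the significance tests, are affected.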
The overall approach described above relates observed disease rates to distance or other dose surrogates in a systemic way, i.e., addressing the question of whether or not disease risk appears to be associated with proximity to a nuclear facility, or to other dose surrogates, averaging over all the facilities. For some common cancers it will be possible to consider site-specific analyses, i.e., whether proximity to a specific facility or plant is associated with risk. Such analyses are subject to concerns about multiple comparisons (as described in the main text) but may also be particularly sensitive to the problem of overdispersion described above. If one uses an uncorrected test, i.e., a test based upon the assumption that the Poisson distribution holds exactly, then it is very likely that there will be some sites where, for some cancers, proximity is “significantly” associated with risk, but for which the inference differs greatly depending upon whether or not purely Poisson variation of counts is assumed. The estimation of overdispersion terms $\sigma^2 > 1$ (or providing other treatment of overdispersion, as in a random effects analysis) is crucial in order to avoid overinterpretation of random fluctuations that simply are greater in magnitude (due to unmeasured characteristics affecting disease risk) than expected under the Poisson model. These problems appear in many different kinds of settings and have been described by a number of different authors (e.g., Efron, 1992). Modeling of both the mean (as in equation (5) of this appendix) and the variance of counts will be essential in ensuring that unrealistic inference from fitting these models is avoided; this is true both for the overall analysis of risk in relation to plant proximity and especially for site-specific analyses.
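A rough calculation shows how quickly uncorrected tests can mislead. The sketch below assumes a large-sample normal approximation for the test statistic and a true variance equal to the overdispersion term times the Poisson variance; the number of sites and the dispersion value are hypothetical, and the function names are illustrative only.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def actual_false_positive_rate(dispersion, z_crit=1.96):
    """Two-sided rejection rate of a Poisson-based z-test under the null.

    If the true variance is dispersion * mu, a statistic scaled by Poisson
    standard errors has standard deviation sqrt(dispersion) rather than 1.
    """
    return 2.0 * (1.0 - normal_cdf(z_crit / math.sqrt(dispersion)))

n_sites = 60  # hypothetical number of site-specific tests

rate_poisson = actual_false_positive_rate(1.0)        # Poisson holds exactly
rate_overdispersed = actual_false_positive_rate(2.0)  # variance is twice the mean

expected_flags_poisson = n_sites * rate_poisson
expected_flags_overdispersed = n_sites * rate_overdispersed
```

Under these assumptions the uncorrected test rejects far more often than its nominal 5 percent level when the variance is twice the mean, so more "significant" sites would be expected by chance alone.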

REFERENCES
Efron, B. (1992). Poisson overdispersion estimates based on the method of asymmetric maximum likelihood. JASA 87.
Liu, Q., and D. A. Pierce (1993). Heterogeneity in Mantel-Haenszel-type models. Biometrika 80(3):543-556.
McCullagh, P., and J. Nelder (1989). Generalized linear models, 2nd edition. Boca Raton, FL: CRC Press.
Moore, D. F. (1986). Asymptotic properties of moment estimates for overdispersed counts and proportions. Biometrika 73(3):583-588.
Zeger, S., and K. Liang (1986). Longitudinal data analysis for discrete and continuous outcomes. Biometrics 42:121-130.
