Below are the first 10 and last 10 pages of uncorrected machine-read text (when available) of this chapter, followed by the top 30 algorithmically extracted key phrases from the chapter as a whole.

Intended to provide our own search engines and external engines with highly rich, chapter-representative searchable text on the opening pages of each chapter.
Because it is UNCORRECTED material, please consider the following text as a useful but insufficient proxy for the authoritative book pages.

Do not use for reproduction, copying, pasting, or reading; exclusively for search engines.

OCR for page 439

Reducing Suicide: A National Imperative
Appendixes

OCR for page 439

Reducing Suicide: A National Imperative
This page in the original is blank.

OCR for page 439

Reducing Suicide: A National Imperative
A
Statistical Details
MIXTURE MODELS
One can naturally model the unobserved population heterogeneity or extra-population heterogeneity via mixture models. The simplest and most natural occurrence of the mixture model arises when one samples from a super population g which is a mixture of a finite number, say m, of populations (g1,…, gm), called the components of the population. Suppose a sample from a super population g is recorded as data (Yi, J i) for i = 1, …, n, where Yi = yi is a measurement on the ith sampled unit and Ji = ji indicates the index number of the component to which the unit belongs. If sampling is done from the jth component, an appropriate probability model for the sampling distribution is given by
P(Yi = yi|Ji = j) = fj(yi; θj)
The function fj(yi; θj) represents a density function, called the component density for the ith observation yi and the parameter θj Θj dj called the component parameter, that describes the specific attributes of the jth component population, gj.
We treat the Ji as missing data and define the latent random variable Φ= (Ф1,…, Фn) to be the values of the parameter θ {θ1, …,θm} corresponding to the sampled components J1, …, Jn; i.e., if the ith observation came from the jth component, then define Фi = θj. Then the Фi are a random sample from the discrete probability measure Q that assigns a positive probability πj to the support point θj for j = 1, …, m. That is, the latent

OCR for page 439

Reducing Suicide: A National Imperative
random variable Φ has a distribution Q, which is called the mixing distribution, defined by P(Φ =θj) = Q({θj }) = πj. Thus the mixing distribution
puts a mass (or weight) of πj on each parameter θj. We thus arrive at the mixture distribution , or equivalently
One can find the posterior probability that the ith observation comes from the jth component (i.e, yi Gj) using
In turn, one can assign each yi to the population to which it has the highest estimated posterior probability of belonging; i.e., to gj if τij > τit for t = 1, …, m and t ≠ j.
Estimation of the Parameters and the Mixture Components
One has to estimate the number of subpopulations or number of mixture components m as well as the parameters in the mixture distribution g…). This can be achieved via parametric or non-parametric maximum likelihood estimation (Laird, 1978; Böhning et al. 1992, Pilla and Lindsay, 2001). Laird (1978) considered nonparametric likelihood estimation of a mixing distribution and Lindsay (1983) linked its general theory, developed from a study of the geometry of the mixture likelihood, to the problem of estimating a discrete mixing distribution. One may also use an alternative approach to maximum likelihood for which menu driven software is provided by Böhning et al. (1998).
For a given m, one can estimate the parameters in the mixture distribution via the EM algorithm (Dempster et al. 1977) and several improvements of it which are discussed in McLachlan and Krishnan (1997). Let = ( , ) be the vector of estimated parameters. Once has been obtained, estimates of the posterior probability of population membership ij can be formed for each yi to give a probabilistic clustering.
When m, the number of mixtures, is large, the EM algorithm can be excruciatingly slow. In that case, one can use alternative EM methods proposed by Pilla (1997) and Pilla and Lindsay (2001) in the nonparamet-

OCR for page 439

Reducing Suicide: A National Imperative
ric mixture setting. Their approach assumes that the θ’s are known and the goal is to estimate the π’s only. However, by setting up the problem in a nonparametric framework, one may find the maximum likelihood estimator (MLE) quite easily in a high-dimensional problem via Pilla and Lindsay’s method. These methods are especially useful because θ1 < θ2 < …, i.e., the support points are ordered and hence the corresponding nearby densities can be paired to take advantage of the correlated densities.
Mixture Setting for Detecting Spatial Occurrence of Suicide
Suppose data are available on the spatial occurrence of suicide in N counties in a state or for the entire country over a five year period. Assume that the counties are a mixture of two groups g1 and g2, corresponding to normal and high risk with respect to suicide, in proportions of π1 and π2 respectively. The number of events yi in the ith county is assumed to have a Poisson distribution with a rate of μj in gj for j = 1, 2; i= 1, …, n. Let = (μ1, μ2, π1, π2)T and θi = (μ1Ni, μ2Ni)T, where Ni = Mi ti is the mass of population at risk in the ith county assuming that there are Mi individuals at risk over time ti. In this scenario, the log likelihood becomes
where fj(yi; θi) = exp(–μjNi) (μjNi)yi/yi!. One can use the EM algorithm to find the MLE of the parameters. The complete data log likelihood function becomes
where zij = 1 if yi gj and 0 otherwise. The MLE of the parameters are found iteratively via
and
where which is the estimated posterior probability that yi gj.

OCR for page 439

Reducing Suicide: A National Imperative
Testing for the Number of Components in a Mixture
Testing for the number of components in a mixture is a very hard problem. Consider the problem of testing the hypothesis H0: number of components = m against H0: number of components = m + 1. The likelihood ratio test (LRT) statistic defined as 2 log Ln = 2 [l(m) – l(m+1)], where l(m) is the log likelihood log evaluated at m, the nonparametric MLE under H0 and similarly l(m+1) is the nonparametric MLE under H1. The LRT statistic generally has an asymptotic χ2 distribution with d degrees of freedom, where d equals the difference between the number of parameters under the alternative and null hypothesis. However, this theory is known to fail for the mixture problem (Titterington et al., 1985:154). The reason for this is that the null hypothesis does not lie in the interior of the parameter space. The problem is not yet solved except for some special cases. However, the simulations or bootstrap can shed some light on the distributional properties of the LRT. See Lindsay (1995) and Böhning (1999) (including the references therein) for recent developments in theory and methods to test for the number of components in a mixture. There are other techniques for identifying the number of components of a mixing distribution including a variety of graphical techniques (Lindsay and Roeder, 1992).
The Choice of Initial Values and Multiple Mode Problem for the EM Algorithm
The continuous EM algorithm that finds the MLE of parameter estimators, for a given m, requires the specification of initial values for the number and location of the support points. Pilla and Lindsay (2001) show that if the number of support points is fixed in advance, then one could have multiple modes whereas the nonparametric ML approach corresponds to a unimodal likelihood. Böhning (1999) illustrates through an example where the choice of initial values is critical; for example, in testing for the number of components in the mixture. Through simulations, he demonstrates the effectiveness of several approaches that combat this problem. His recommendation is to choose the support points as values of certain order statistics of the data (see Böhning, 1999:68–70). In his simulation studies, the gold standard (the grid approach given below) seems to outperform the other methods. The optimization problem solved by the grid-based EM (where the support points are assumed known) proposed by Pilla and Lindsay (2001) has a unique optimum which in turn generates an estimated number of components. However, the continuous EM can converge to the suboptimal mode.

OCR for page 439

Reducing Suicide: A National Imperative
In light of the above characteristics, Pilla and Lindsay (2001) suggest the following: to obtain a continuous solution with a flexible number of support points, first start with one of the fast EM-based algorithms proposed by Pilla and Lindsay on a grid and next use the resulting solution as starting values for the continuous EM. This will protect against using an incorrect number of points or finding suboptimal solutions.
Convergence Criteria
Finding a natural stopping rule for iterative algorithms in the mixture problem when the final log likelihood is unknown is an important problem. In a potentially slow algorithm like EM, a far weaker convergence criterion based on the changes in the log likelihood value or on changes in the parameter estimates between iterations can be very misleading (Titterington et al. 1985:90). See Lindsay (1995:62–63) for further details. The following gradient-based stopping criterion proposed by Lindsay (1995:131–132) can be used to detect the convergence of the EM algorithm: we stop when sup DQ(θ) ≤ ε, with ε = 0.005, where the gradient function DQ(θ) is defined as
then we automatically satisfy the ‘ideal stopping criterion’: |log Lobs(; y) – log Lobs(p; y)| ≤ ε, where p is the estimate at the pth iteration. The gradient function therefore creates a natural stopping rule for iterative algorithms such as the EM when the final log likelihood is unknown, although it is more stringent than the ideal stopping rule.
Illustration of Poisson Mixture Model
To illustrate use of the Poisson mixture model we analyzed countylevel suicide data for the US for the time period of 1996–1998. The three year period was used to obtain more stable estimates of the parameters. Application of the Poisson mixture model identified a mixture of four component distributions with proportions of .1125, .4523, .3119, and .1234, and means of 7.5, 10.9, 14.9, and 20.6 suicides per 100,000. Results of classifying counties into component distributions are displayed in Figure 1. Inspection of the map in Figure 1 reveals that those counties classified in the highest component distribution (20.6 suicides per 100,000), are predominantly in the western states with lowest population density, although there are exceptions throughout the country (see Figure A-1). Counties classified in the lowest component distribution (7.5 suicides per 100,000)

OCR for page 439

Reducing Suicide: A National Imperative
FIGURE A-1 Annual Suicide Rates per 100,000 (1996–1998). The Poisson Mixture Model applied to county-level data.
appear to be clustered in the central United States. Finally, counties classified into the two intermediate groups are largely in the eastern portion of the United States. Unlike the earlier attempt at fitting a mixture distribution to temporal suicide data collected in Cook county (Gibbons et al., 1990) that was designed to identify “suicide epidemics” and failed to do so, the results of this analysis has focused on the spatial distribution of suicide and has identified qualitatively distinct geographic groupings of suicide rates across the United States. Note that this mixture distribution does not adjust for demographic disparity across the counties (e.g., age, sex, race, population density, etc.) that may have produced the mixture distribution in the first place.
POISSON REGRESSION MODELS
In the following sections, we present statistical details of both fixed-effects and mixed-effects Poisson regression models.

OCR for page 439

Reducing Suicide: A National Imperative
Fixed-Effects Poisson Regression Model
In Poisson regression modeling, the data are modeled as Poisson counts whose means are expressed as a function of covariates. Let yi be a response variable and xi = (1, xi1,…, xip)T be a [(p+1) × 1] vector of covariates for the ith individual. Then the Poisson regression model yi is
(2)
where λ i = exp(βT xi) and β = (β0, β1, …, βp)T is a (p+1)-dimensional vector of unknown parameters corresponding to xi. An important property of the Poisson distribution is the equality of the mean and variance:
E(yi) = Var(yi) = λi
(3)
From equation (2) one can write the likelihood and the log likelihood functions for the n independent observations as
and
respectively. The first and second partial derivatives with respect to the unknown parameters β are given by
and
respectively. Since the above equations are not linear, iterative procedures such as the Newton-Raphson algorithm or Fisher’s scoring algorithm are used to obtain the MLE of β, denoted by . A consistent estimator of the variance-covariance matrix of , V() is the inverse of the information matrix, I() — which is the negative of inverse of the matrix of second partial derivatives evaluated at the MLE. Inferences on the regression parameter β can be made using V().

OCR for page 439

Reducing Suicide: A National Imperative
Mixed-Effects Poisson Regression Model
When we have a mixture of fixed and random effects, the more general mixed-effects Poisson regression model is used. Suppose that there are n = Σi ni nonnegative observations yij for i = 1, …, nc clusters and j = 1, …, ni observations and a (p + 1) dimensional unknown parameter vector, β, associated with a covariate vector xij = (1, xij1, …, xijp)T. Further, let the random effect υi be normally-distributed with mean 0 and variance σ2 and independent of the covariate vector xij. Then the conditional density function, f(yi; θi), of the ni suicide rates in cluster i is written as:
(4)
with
λij =exp(βTxij + υi )
=exp(βTxij + σθi ),
where θi = υi/σ such that θi ~N (0,1). Thus the log-likelihood function corresponding to equation 4 becomes
The first and second derivatives of log L with respect to β and σ are
and

OCR for page 439

Reducing Suicide: A National Imperative
The marginal density of yi can be written as
where g(θ) represents the multivariate standard normal density. One can approximate the above integral via the Gaussian quadrature as
(5)
where λijq = exp(βT xij + σBq), Bq is the quadrature node and A(Bq) is the corresponding quadrature weight. From equation (5), one can obtain the first derivatives of the approximate log likelihood as
and
Solution of the above likelihood equations can be obtained iteratively using the Newton-Raphson algorithm. On the ith iteration, estimates for a parameter vector Θ are given by
(6)
where is the matrix of second derivatives of the log-likelihood, evaluated at Θi. Alternatively, one can use the Fisher scoring method. This method replaces the matrix of second derivatives in (6) by their expected values:
which is equal to the negative of the information matrix. The scoring solution is often used in cases where the matrix of second derivatives is difficult to obtain. Note that when the expected value and the actual value of the Hessian matrix coincide, the Fisher scoring method and the Newton-Raphson method reduce to the same algorithm.

OCR for page 439

Reducing Suicide: A National Imperative
State
County Name
Observed # of Suicides
Expected # of Suicides
NEW MEXICO
LINCOLN
17
10.72
NEW MEXICO
LOS ALAMOS
8
7.68
NEW MEXICO
LUNA
22
15.23
NEW MEXICO
MC KINLEY
30
25.16
NEW MEXICO
MORA
3
2.05
NEW MEXICO
OTERO
16
17.58
NEW MEXICO
QUAY
3
3.72
NEW MEXICO
RIO ARRIBA
37
27.71
NEW MEXICO
ROOSEVELT
7
6.83
NEW MEXICO
SANDOVAL
30
29.60
NEW MEXICO
SAN JUAN
44
40.91
NEW MEXICO
SAN MIGUEL
19
14.81
NEW MEXICO
SANTA FE
81
75.18
NEW MEXICO
SIERRA
22
11.37
NEW MEXICO
SOCORRO
14
8.99
NEW MEXICO
TAOS
12
10.79
NEW MEXICO
TORRANCE
12
7.78
NEW MEXICO
UNION
1
1.53
NEW MEXICO
VALENCIA
29
26.99
NEW YORK
NASSAU
217
225.92
NEW YORK
ROCKLAND
30
40.87
NEW YORK
WESTCHESTER
163
167.11
TEXAS
HIDALGO
77
85.03
Observed and expected number of suicides : sum of 1996 - 1998
Population = Annual population
MIXED-EFFECTS ORDINAL REGRESSION MODELS
In motivating probit and logistic regression models, it is often assumed that there is an unobservable latent variable (y) which is related to the actual response through the “threshold concept.” For a binary response, one threshold value is assumed, while for an ordinal response, a series of threshold values γ1, γ 2, …, γ K–1, where K equals the number of ordered categories, γ0 = –∞, and γ K = ∞. Here, a response occurs in category k (Y = k) if the latent response process y exceeds the threshold value γ k–1, but does not exceed the threshold value γ k.
The traditional ordinal regression model is parameterized such that regression coefficients α do not depend on k, i.e., the model asumes that the relationship between the explanatory variables and the cumulative logits does not depend on k. McCullagh (1980) calls this assumption of identical odds ratios across the K – 1 cut-offs, the proportional odds assump-

OCR for page 439

Reducing Suicide: A National Imperative
Observed Rate/100,000
Expected Rate/100,000
Population
Bayes Estimate
37.89
23.88
14,955
1.64
15.39
14.77
17,330
1.01
33.63
23.28
21,807
1.70
16.60
13.92
60,256
1.44
22.35
15.27
4,474
1.07
10.55
11.59
50,533
0.86
10.51
13.03
9,516
0.93
36.07
27.01
34,197
2.07
13.65
13.31
17,099
1.00
12.85
12.68
77,840
1.00
15.47
14.38
94,802
1.25
23.81
18.57
26,596
1.38
23.93
22.20
112,852
1.58
70.75
36.57
10,364
2.33
31.08
19.96
15,014
1.48
16.35
14.70
24,469
1.08
29.56
19.15
13,533
1.39
8.86
13.55
3,762
0.95
16.84
15.67
57,397
1.15
5.92
6.17
1,221,351
0.44
3.86
5.25
259,392
0.40
6.51
6.67
834,764
0.50
5.66
6.25
453,250
0.50
tion. This assumption implies that the effect of a regressor variable is the same across the cumulative logits of the model, or proportional across the cumulative odds. In the present context, this implies that the effect of an intervention is the same on the shift from no ideation to suicidal ideation, as it is from suicidal ideation to a suicidal attempt, an assumption that is questionable at best. As noted by Peterson and Harrell (1990), hovever, examples of non-proportional odds are not difficult to find. Recently Hedeker and Mermelstein (1998) have described an extension of the random effects proportional odds model to allow for non-proportional odds for a set of explanatory variables. For example, it sems reasonable to assume that the effects of the model covariates on suicidal ideation and suicidal attempts may not be the same. The resulting mixed-effects partial proportional odds model follows Peterson and Harrell’s (1990) extension of the fixed-effects proportional odds model.

OCR for page 439

Reducing Suicide: A National Imperative
Proportional Odds Model
Assuming that there are i = 1, …, N level-2 units and j = 1, …, ni level-1 units nested within the ith level-2 unit. The cumulative probabilities for the k ordered categories (k = 1, … K) are defined for the ordinal outcomes Y as:
Pijk = Pr(Y ≤ k | βi ,γk ,α ),
where the mixed-effects logistic regression model for these cumulative probabilities is given as
with K – 1 strictly increasing model intercepts γk (i.e., γ1 > … > γK–1). As before, wij is the p × 1 covariate vector and xij is the design vector for the r random effects, both vectors being for the jth level-1 unit nested within level-2 unit i. Also, α is the p × 1 vector of unknown fixed regression parameters, and βi is the r × 1 vector of unknown random effects for the level-2 unit i. Since the regression coefficients ,α do not depend on k, the model assumes that the relationship between the explanatory variables and the cumulative logits does not depend on k.
As in most mixed-effects models, it is convenient to orthogonally transform the response model. Letting β = Λt, where ΛΛT=Σβ is the Cholesky decomposition of Σβ, the reparameterized model is then written as log
where ti are distributed according to a multivariate standard normal distribution.
Partial-Proportional Odds Model
To allow for a partial proportional odds model the intercepts γk are denoted instead as γ0k, and the following terms are added to the model:
or absorbing γ0k and into γk′

OCR for page 439

Reducing Suicide: A National Imperative
where, uij is a [(h+1) × 1] vector containing (in addition to a 1 for γ0k) the values of observation ij on the subset of h covariates for which proportional odds are not assumed (i.e., uij is a subset of wij). In this model, γk is a (h+1) × 1 vector of regression coefficients associated with the h variables (plus the intercept) in uij. Notice that the effects of these h covariates vary across the (K – 1) cumulative logits. This extension of the model follows similar extensions of the ordinary fixed-effects ordinal logistic regression model discussed by Peterson and Harrell (1990) and Cox (1995). Terza (1985) discusses a similar extension for the ordinal probit regression model.
There is a caveat in this extension of the proportional odds model. For the explanatory variables without proportional odds, the effects on the cumulative log odds, namely result in (K – 1) non-parallel regression lines. These regression lines inevitably cross for some values of u*, leading to negative fitted values for the response probabilities. For u* variables contrasting two levels of an explanatory variable (e.g., gender coded as 0 or 1), this crossing of regression lines occurs outside the range of admissible values (i.e., < 0 or > 1). However, if the explanatory variable is continuous, this crossing can occur within the range of the data, and so, allowing for non-proportional odds is problematic. For continuous explanatory variables, other than requiring proportional odds, a solution to this dilemma is sometimes possible if the variable u has, say, m levels with a reasonable number of observations at each of these m levels. In this case (m – 1) dummy-coded variables can be created and substituted into the model in place of the continuous variable u.
With the above mixed-effects regression model, the probability of a response in category k for the i-th level-2 unit, conditional on γk, α, and t is given by:
where, under the logistic response function,
with Note that Pj0 and PjK = 1.

OCR for page 439

Reducing Suicide: A National Imperative
Estimation of the p covariate coefficients α, the population variance-covariance parameters Λ (with r(r+1)/2 elements), and the (h + 1)(K –1) parameters in γk (k = 1, …, K – 1) is described in detail by Hedeker and Gibbons (1994).
Illustration
To illustrate application of the mixed-effects ordinal logistic regression model, we reanalyzed longitudinal data from Rudd et al. (1996) on suicidal ideation and attempts in a sample of 300 suicidal young adults (personal communication, M.D. Rudd, Baylor University, October 2001). 180 subjects were assigned to an outpatient intervention group therapy condition and 120 subjects received treatment as usual. In this analysis, we used an ordinal outcome measure of 0=low suicidal ideation, 1=clinically significant suicidal ideation, and 3=suicide attempt. Suicidal ideation was defined as a score of 11 or more on the Modified Scale for Suicide Ideation (MSSI; Miller et al., 1986). Model specification included main effects of month (0, 1, and 6) and treatment (0=control, 1=intervention), and the treatment by month interaction. Although data at 12, 18 and 24 months were also available, the drop-out rates at these later months were too large for a meaningful analysis. In addition, to illustrate the flexibility of the model, depression as measured by the Beck Depression Inventory (BDI; Beck and Steer, 1987) and anxiety as measured by the Millon Clinical Multiaxial Inventory (MCMI-A; Millon, 1983) were treated as time-varying covariates in the model, to relate fluctuations in depressed mood and anxiety to shifts in suicidality.
Table A-5 displays the observed proportions in each of the three ordinal suicide categories as a function of month and treatment. Table A-5 reveals that, if anything, the observed rate of attempts is as high or higher in the intervention group than the control group. Note that this is also true at baseline, therefore, these difference may be an artifact of randomization.
Table A-6 displays the model parameter estimates, standard errors, and associated probability estimates. A model with both random intercept and slope (i.e., month) effects failed to converge because the variance component for the intercept was close to zero. In light of this, the model was specified with a single random effect (i.e., a random slope model). Table A-6 reveals that both of the time-varying covariates, depression and anxiety, were significantly associated with suicidality. By contrast, the treatment by time interaction was not significant, indicating that the intervention did not significantly affect the rates suicide ideation or at-

OCR for page 439

Reducing Suicide: A National Imperative
TABLE A-5 Suicide as a Function of Treatment and Time, Observed Frequencies
Month
None
Ideation
SuicideAttempt
Total
0
Control
11
43
66
120
9.2%
35.8%
55.0%
Active
11
56
113
180
6.1%
31.1%
62.8%
1
Control
70
18
2
90
77.8%
20.0%
2.2%
Active
96
32
7
135
71.1%
23.7%
5.2%
6
Control
41
8
1
50
82.0%
16.0%
2.0%
Active
61
11
3
75
81.3%
14.7%
4.0%
TABLE A-6 Mixed-effects Ordinal Regression Model, Suicide Data (Rudd et. al., 1996)
Variable
Estimate
Stand. Error
Z
p-value
Month
–3.33237
0.44739
–7.44855
0.00000 (2)
Intercept
0.80413
0.39477
2.03695
0.04165 (2)
Treatment
0.34707
0.21230
1.63484
0.10208 (2)
Treatment × Time
–0.01756
0.32389
–0.05422
0.95676 (2)
Depression
0.02591
0.01047
2.47364
0.01337 (2)
Anxiety
0.00970
0.00462
2.09768
0.03593 (2)
Random effect variance term (standard deviation)
Month
1.77373
0.25394
6.98491
0.00000 (1)
Thresholds (for identification: threshold 1 = 0)
2
2.17069
0.17705
12.26010
0.00000 (1)
(1) = 1-tailed p-value
(2) = 2-tailed p-value

OCR for page 439

Reducing Suicide: A National Imperative
tempts over time. The random month effect was significant, indicating appreciable inter-individual variability in the rates of change over time.
It might be argued, that by including the effects of depression and anxiety in the model, the effect of treatment and the treatment by time interaction may be masked. For example, the intervention may work by modifying depression and anxiety which are in turn related to suicide ideation and attempts. In this case, including depression and anxiety in the model may mask the effect of the intervention. Reanalysis excluding depression and anxiety, however, also yielded nonsignificant treatment related effects, suggesting that this is not the case. Finally, a nonproportional odds model, in which the treatment by time interaction was allowed to be category-specific, also failed to identify any positive treatment related effects on suicide ideation or attempts.
INTERVAL ESTIMATION
Poisson Confidence Limits
A (1 –α)100% confidence interval for the true suicide rate λ can be approximated as
(11)
where = y/n, y is the total number of suicides over n time intervals (e.g., months) or geographic areas (e.g. counties), and z is an upper percentage point of the normal distribution. In general, the approximation will work extremely well for y ≥ 20. When y < 20 tabled values are available in Hahn and Meeker (1991).
As an example, consider the case in which after 61 months of observation for a particular county, 123 suicides are recorded; therefore, y = 123 and n = 61. The 95% confidence limit for the population suicide rate λ is therefore
(12)
In light of this, we can have 95% confidence that the true suicide rate is between 1.659 and 2.373 suicides per month.

OCR for page 439

Reducing Suicide: A National Imperative
Poisson Prediction Limits
Cox and Hinkley (1974) consider the case in which y has a Poisson distribution with mean µ. Having observed y their goal is to predict y*, which has a Poisson distribution with mean cµ, where c is a known constant. In the present context, y is the number of suicides observed in n previous time-periods, and y* is the number of suicides observed in a single future time-period, therefore, c = 1/n. Alternatively, for a given time period, n may represent a number of counties, and our inference is to the predicted suicide rate in a single future county for the same length of time.
Following Cox and Hinkley (1974), Gibbons (1987) derived a prediction limit using as an approximation the fact that the left side of:
(13)
is approximately a standard normal deviate, therefore, the 100 (1 –a)% prediction limit for y* is formed from percentage points of the normal distribution. Upon solving for y* the upper limit value is found as the positive root of the quadratic equation:
(14)
Returning to the previous example in which after 61 months of observation for a particular county, 123 suicides are recorded, we have y = 123 and c = 1/61. The 95% confidence Poisson prediction limit is given by
The prediction limit reveals that we can have 95% confidence that no more than four suicides will occur in the next month.
Poisson Tolerance Limits
The uniformly most accurate upper tolerance limit for the Poisson distribution is given by Zacks (1970). In terms of a future time interval, we can construct an upper tolerance limit for the Poisson distribution that will cover P(100)% of the population of future measurements with (1 – a)100% confidence. The derivation begins by obtaining the cumulative probability that a or more suicides will be observed in the next time period:

OCR for page 439

Reducing Suicide: A National Imperative
(15)
which can be approximated by:
Pr(a,μ) = Pr( χ2 [2a+2] ≥ 2μ
(16)
where χ2[f] designates a chi-square random variable with f degrees of freedom. This relationship between the Poisson and chi-square distribution was first described by Hartley and Pearson (1950). Given n independent and identically distributed Poisson random variables the sum
(17)
also has a Poisson distribution. Substituting Tn for μ, we can find the value for which the cumulative probability is 1 – α, that is,
(18)
The P(100)% upper tolerance limit is therefore Pr–1[P; K1–α(Tn)] = smallest j (≥ 0) such that
(19)
The required probability points of the chi-square distribution can be most easily obtained using the Peizer and Pratt approximation described by Maindonald (1984:294).
Returning to the previous example, where after 61 months of observation for a particular county, 123 suicides are recorded, we have, Tn = 123 and n = 61. Furthermore, we require the resulting tolerance limit to have 99% coverage and 95% confidence. The cumulative 95% probability point is
The 99% upper tolerance limit is obtained by finding the smallest nonnegative integer j such that:
Inspection of standard chi-square tables (extracted below for j = 5 to 7) reveals that the value of j that satisfies this equation is 7.

OCR for page 439

Reducing Suicide: A National Imperative
6
4.6604
7
5.8122
Therefore, the P–1 [.99 ; K.95(123)] upper tolerance limit is 7 suicides per month in the example posed.
REFERENCES
Beck AT, Steer RA. 1987. Beck Depression Inventory: Manual. San Antonio, TX: Psychological Corporation.
Bock RD, Aitkin M. 1981. Marginal maximum likelihood estimation of item parameters: an application of the EM algorithm. Psychometrika, 46: 443–459.
Böhning D, Schlattmann P, Lindsay BG. 1992. Computer-assisted analysis of mixtures (C.A.MAN): Statistical algorithms. Biometrics, 48: 283–303.
Böhning D, Dietz E, Schlattmann P. 1998. Recent developments in computer-assisted analysis of mixtures. Biometrics, 54: 525–536.
Böhning D. 1999. Computer-Assisted Analysis of Mixtures and Applications. Meta-Analysis, Disease Mapping and others. Chapman and Hall CRC: Boca Raton.
Cox C. 1995. Location-scale cumulative odds models for ordinal data: A generalized nonlinear model approach. Statistics in Medicine, 14: 1191–1203.
Cox DR, Hinkley DV. 1974. Theoretical Statistics. Chapman and Hall: London.
Dempster AP, Laird NM, Rubin DB. 1977. Maximum likelihood from incomplete data via the EM algorithm (with discussion). Journal of the Royal Statistical Society, Series B, 39: 1–38.
Gibbons RD. 1987. Statistical models for the analysis of volatile organic compounds in waste disposal sites. Ground Water, 25: 572–580.
Gibbons RD, Clark DC, Fawcet J. 1990. A statistical method for evaluating suicide clusters and implementing cluster surveillance. American Journal of Epidemiology, 132 (Suppl 1): S183–S191.
Hahn GJ, Meeker WQ. 1991. Statistical Intervals: A Guide for Practitioners. Wiley: New York.
Hartley HO, Pearson ES. 1950. Tables of the chi-squared integral and of the cumulative Poisson distribution. Biometrika, 37: 313–325.
Hedeker D, Gibbons RD. 1994. A random effects ordinal regression model for multilevel analysis. Biometrics, 50: 933–944.
Hedeker D, Mermelstein RJ. 1998. A multilevel thresholds of change model for analysis of stages of change data. Multivariate Behavioral Research, 33: 427–455.
Laird N. 1978. Nonparametric maximum likelihood estimation of a mixing distribution. Journal of the American Statistical Association, 73: 805–811.
Liang KY, Zeger SL. 1986. Longitudinal data analysis using generalized linear models. Biometrika, 73: 13–22.
Lindsay BG. 1983. The geometry of mixture likelihoods, part I: A general approach. Annals of Statistics, 11: 783–792.
Lindsay BG. 1995. Mixture Models: Theory, Geometry and Applications. NSF-CBMS Regional Conference Series in Probability and Statistics, Vol. 5. Hayward, CA: Institute of Mathematical Statistics.
Lindsay BG, Roeder K. 1992. Residual diagnostics for mixture models. Journal of the American Statistical Association, 87: 785–794.
Maindonald JH. 1984. Statistical Computation. Wiley: New York.
McCullagh P. 1980. Regression models for ordinal data (with discussion). Journal of the Royal Statistical Society, Series B, 42: 109–142.
McLachlan GJ, Krishnan T. 1997. The EM Algorithm and Extensions. New York: Wiley.

OCR for page 439

Reducing Suicide: A National Imperative
Miller IW, Norman WH, Bishop SB, Dow MG. 1986. The Modified Scale for Suicidal Ideation: Reliablity and validity. Journal of Consulting and Clinical Psychology, 54(5):724– 725.
Millon T. 1983. Millon Clinical Multiaxial Inventory Manual. 2nd ed. Minneapolis, MN: National Computer Systems.
Peterson B, Harrell FE. 1990. Partial proportional odds models for ordinal response variables. Applied Statistics, 39: 205–217.
Pilla RS. 1997. Improving the Rate of Convergence of EM in High-Dimensional Finite Mixtures. Ph.D. dissertation, The Pennsylvania State University.
Pilla RS, Lindsay BG. 2001. Alternative EM methods for nonparametric finite mixture models. Biometrika, 88: 535–550.
Rudd MD, Rajab MH, Orman DT, Joiner T, Stulman DA, Dixon W. 1996. Effectiveness of an outpatient problem-solving intervention targeting suicidal young adults: Preliminary results. Journal of Consulting and Clinical Psychology, 64(1):179–180.
Terza JV. 1985. Ordinal probit: A generalization. Communications in Statistical Theory and Methods, 14: 1–11.
Titterington DM, Smith AFM, Makov UE. 1985. Statistical Analysis of Finite Mixture Distributions. New York: Wiley.
Zacks S. 1970. Uniformly most accurate upper tolerance limits for monotone likelihood ratio families of discrete distributions. Journal of the American Statistical Association, 65: 307–316.
Zeger SL, Liang KY. 1986. Longintudinal data analysis for discrete and continuous outcomes. Biometrics, 42: 121–130.