Drawing Inferences from Incomplete Data

In this chapter, we review and comment on several approaches for drawing inferences from incomplete data. A substantial literature on this topic has developed over the last 30 years, and the range of approaches to modeling and inference is extremely broad. We make no attempt here to summarize that entire literature; rather, we focus on those methods that are most directly relevant to the design and analysis of regulatory clinical trials. We begin by presenting a set of principles for drawing inference from incomplete data. A major theme that we reiterate throughout the chapter is that inference from incomplete data relies on subjective, untestable assumptions about the distribution of missing values. On its face, this statement seems obvious. However, for a number of commonly used methods, users are not always aware of the assumptions that underlie the methods and the results drawn from applying them. This lack of awareness is particularly true of single imputation methods—such as last or baseline observation carried forward (LOCF or BOCF) and random effects (mixed effects) regression models—that rely on strong parametric assumptions.

In the second section of the chapter, we introduce a set of notation that is used throughout (and in Chapter 5). The third section summarizes the assumptions that underlie inference from incomplete data (missing completely at random, missing at random, etc.). The remaining sections describe commonly-used methods of analysis and offer comments and recommendations about their use in practice. In some cases, we offer recommendations for further research and investigation.

For both this chapter and the next, it is important to note the role of
software. None of the techniques for either the primary analysis of clinical
trial data or for the subsequent sensitivity analysis that are described
in the next chapter can be widely used, either at the U.S. Food and Drug
Administration (FDA) or by trial sponsors, unless they are made available
in one or more of the standard statistical software packages. It is beyond
the scope of this report to describe and review specific software packages
or routines. Many of the commonly used commercial and open-source
packages used in the analysis of trials for the regulatory setting (SAS, SPSS,
Stata, and R) allow for the analysis of incomplete data, using methods such
as direct likelihood, Bayesian analysis, generalized estimating equations,
inverse probability weighting, and multiple imputation.
Statistical software is evolving at a rapid pace to keep up with new
developments in methodology and to implement proven methods. However,
although progress is being made, the current suite of available tools
remains lacking regarding augmented inverse probability weighting (IPW),
missing not at random (MNAR) models, and analysis of the sensitivity to
assumptions concerning the mechanism for missing outcome data. Given
the urgency of the greater application of MNAR models and sensitivity
analysis, we encourage the development and release of software tools to
address these deficiencies. We again emphasize the importance of under-
standing and communicating the assumptions underlying analyses that are
implemented in whatever software package is being used to draw inference
about treatment effects. In most cases, communication of this information
will necessitate referring to technical documentation for a specific analysis
routine or procedure.
PRINCIPLES
There is no universal method for handling incomplete data in a clinical
trial. Each trial has its own set of design and measurement characteristics.
There is, however, a set of six principles that can be applied in a wide
variety of settings.
First, it needs to be determined whether missingness of a particular
value hides a true underlying value that is meaningful for analysis. This
may seem obvious but is not always the case. For example, consider a lon-
gitudinal analysis of CD4 counts in a clinical trial for AIDS. For subjects
who leave the study because they move to a different location, it makes
sense to consider the CD4 counts that would have been recorded if they
had remained in the study. For subjects who die during the course of the
study, it is less clear whether it is reasonable to consider CD4 counts after
time of death as missing values.
Second, the analysis must be formulated to draw inference about an
appropriate and well-defined causal estimand (see Chapter 2). The causal
estimand should be defined in terms of the full data (i.e., the data that were
intended to be collected). It is important to distinguish between the esti-
mand and the method of estimation, the latter of which may vary depend-
ing on assumptions.
Third, reasons for missing data must be documented as much as pos-
sible. This includes full and detailed documentation for each individual
of the reasons for missing records or missing observations. Knowing the
reason for missingness permits formulation of sensible assumptions about
observations that are missing, including whether those observations are
well defined.
Fourth, the trial designers should decide on a primary set of assump-
tions about the missing data mechanism. Those primary assumptions then
serve as an anchor point for the sensitivity analyses. In many cases, the
primary assumptions can be missing at random (MAR) (see Chapter 1).
Assumptions about the missing data mechanism must be transparent and
accessible to clinicians.
Fifth, the trial sponsors should conduct a statistically valid analysis
under the primary missing data assumptions. If the assumptions hold, a
statistically valid analysis yields consistent estimates, and standard errors
and confidence intervals account for both sampling variability and for the
added uncertainty associated with missing observations.
Sixth, the analysts should assess the robustness of the treatment effect
inferences by conducting a sensitivity analysis. The sensitivity analysis
should relate treatment effect inferences to one or more parameters that
capture departures from the primary missing data assumption (e.g., MAR).
Other departures from standard assumptions should also be examined, such
as sensitivity to outliers.
NOTATION
Throughout this and the next chapter, we use the following conven-
tions. Let X represent treatment indicators and baseline (i.e., pretreatment)
covariates that are fully observed and conditioned on in the primary statisti-
cal analysis (such as study center and stratification variables). Another way
to characterize X is as the design variables that would be adjusted for or
conditioned on in the final analysis. Let Y denote the primary outcome vari-
able, which may be a single outcome, a vector of repeated measurements,
or a time to event. Auxiliary variables are denoted by V; these variables
are distinct from design variables X and may represent individual-level
characteristics (either pre- or posttreatment) that aid in drawing inference
from incomplete response data. Information on compliance or side effects
of treatments that may be useful for modeling the missing data but are not
included in the primary analytic model may be included in V. (We note that
the collection and use of all available covariate information that is predictive
of the outcome in the full data model, and the occurrence of missing
outcome data in the missing data model, is important and can dramatically
improve the associated inference.)
In the absence of missing data, let Z denote the values of (V,Y) for an
individual participant. For simplicity, we assume throughout that observa-
tions on (V,Y) are independent within levels of X.
To distinguish between missing and observed data, let M denote the
indicator of whether Y is missing. In repeated measures studies, we include
a subscript for repeated measures. That is, if the intended outcome mea-
sures are Y = (Y1,Y2,…,YK), the corresponding missingness indicators are
M = (M1,M2,…,MK), where Mj = 1 if Yj is missing, and Mj = 0 if it is
observed. We generally will assume that Y and V have the same missing
data pattern, though in practice this restriction can be relaxed.
In many situations, missing values can be denoted by a single value,
such as M = 1; in other settings, it may be useful to allow more than one
missing-value code to indicate different types of missing data, such as M = 1
for lack of efficacy, M = 2 for inability to tolerate a drug because of side
effects, M = 3 for a missed clinic visit, and so on. This notation allows for
different modeling assumptions for the different causes of missing data.
ASSUMPTIONS ABOUT MISSING DATA AND
MISSING DATA MECHANISMS
The general missing data taxonomy described in this section is fully
presented in Rubin (1976) and Little and Rubin (2002). Elaboration on the
sequential versions of these for longitudinal data can be found in Robins
et al. (1995) and Scharfstein et al. (1999). Discussion of the more general
notion of coarsening can be found in Heitjan (1993) and Tsiatis (2006).
Missing Data Patterns and Missing Data Mechanisms
It is useful to distinguish the pattern of the missing data from the miss-
ing data mechanism. The pattern simply defines which values in the data set
are observed and which are missing, as described for an individual by the
vector of indicators M = (M1,…,MK). Some methods for handling missing
data apply to any pattern of missing data; other methods assume a special
pattern.
A simple example of a special pattern is univariate missing data, where
missingness is confined to a single variable. Another special pattern is
monotone missing data, where the variables can be arranged so that Yj+1
is missing for all cases where Yj is missing. This pattern commonly arises in
longitudinal data, when the sole cause of missingness is attrition or drop-
outs, and there are no intermittently missing values.
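As a minimal sketch of the distinction between a monotone and a general pattern, the following Python function checks whether a vector of indicators M = (M1,…,MK) is monotone, that is, whether every missing value is followed only by missing values. The function name `is_monotone` and the example vectors are illustrative assumptions and are not part of the notation used in this chapter.

```python
from typing import Sequence


def is_monotone(m: Sequence[int]) -> bool:
    """Return True if, once some M_j is nonzero, all later indicators are too.

    Any nonzero code is treated as missing, so the check also works when
    several missing-value codes (1, 2, 3, ...) are in use.
    """
    seen_missing = False
    for m_j in m:
        if m_j != 0:
            seen_missing = True
        elif seen_missing:
            # An observed value after a missing one: intermittent missingness.
            return False
    return True


dropout = [0, 0, 1, 1, 1]       # participant drops out before the third visit
intermittent = [0, 1, 0, 0, 1]  # second visit missed, later visits observed
complete = [0, 0, 0, 0, 0]      # no missing values

print(is_monotone(dropout), is_monotone(intermittent), is_monotone(complete))
```

A data set has a monotone pattern only if the check holds for every participant; a single intermittent record is enough to require a method that handles general patterns.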
The missing data mechanism relates to why values are missing and the
connection of those reasons with treatment outcomes. The missing data
mechanism can be represented in terms of the conditional distribution
[M | X,V,Y] (see footnote 1) for the missing data indicators given the values of the study
variables that were intended to be collected. To emphasize that this distri-
bution may depend both on observed and missing values of V and Y, this
is sometimes written as [M | X,Vobs,Vmis,Yobs,Ymis].
Missing Completely at Random
Missing data are missing completely at random (MCAR) if missing-
ness does not depend on values of the covariates, auxiliary and outcome
variables (X,V,Y). That is,
[M | X,Vobs,Vmis,Yobs,Ymis] = [M]. (1)
MCAR is generally a very strong assumption, unlikely to hold in
many clinical trials. Situations in which MCAR might be plausible include
administrative censoring, when outcomes are censored because a study is
terminated at a planned date, and the outcome has not yet occurred for late
accruals; and designed missing data, when expensive or onerous measure-
ments are recorded only for a random subsample of participants. A closely
related concept is conditional MCAR, under which missingness is independent
of (V,Y) conditional on the covariates X. Finally, it is useful
to mention that MCAR is unique in that one can test whether the miss-
ing outcomes are MCAR if they are at least missing at random, which is
discussed below.
Missing at Random
A more realistic condition than MCAR for many studies is MAR, which
requires that missingness is independent of missing responses Ymis and Vmis,
conditionally on observed responses (Yobs,Vobs) and covariates X. That is,
[M | X,Vobs,Vmis,Yobs,Ymis] = [M | X,Vobs,Yobs]. (2)
If Y and V are considered to be random variables with distributions
based on a model, then one can show that condition (2) is equivalent to
[Ymis,Vmis | X,Vobs,Yobs,M] = [Ymis,Vmis | X,Vobs,Yobs], (3)
which implies that the predictive distribution of the missing variables given
the observed variables does not depend on the pattern of missing values.
This version of MAR is relevant from an analysis perspective because it
characterizes the predictive distribution of the missing values, which is the
basis for principled methods of imputation.
1 The notation [a | b, c] (e.g., [M | X,V,Y]) is used to denote the conditional distribution of
a given b and c.
As we describe below, many standard analysis methods for incomplete
data operate under the MAR assumption. It is therefore imperative that
both the MAR assumption and the assumptions underlying the full data
model (e.g., multivariate normality) be thoroughly justified before results
from these models can be considered valid for treatment comparisons. In
general: (a) even under MAR, different assumptions about the full data
model will lead to different predictive distributions; (b) with incomplete
data, assumptions about both the missing data mechanism and the full
data model are unverifiable from the data; and (c) nevertheless, inference
and therefore decisions about treatment effect often crucially depend upon
them.
MAR for Monotone Missing Data Patterns
With longitudinal repeated measures, and even for event time out-
comes, the MAR assumption is not always intuitive for a general pattern
of missing values.
However, it has a simple interpretation in the case of monotone miss-
ing data, such as that caused by dropouts. Suppose the data intended to
be collected comprise repeated measures on an outcome Y, denoted by
Y1,…,YK. Let Mj = 1 if Yj is missing, and let Mj = 0 if Yj is observed. Under
monotone missingness, if observation j is missing (Mj = 1), then all subsequent
observations also are missing (Mj+1 = … = MK = 1).
At any given time j, let $Y_j^- = (Y_1,\ldots,Y_{j-1})$ denote the history of
measurements up to but not including time j, and let $Y_j^+ = (Y_j,\ldots,Y_K)$
denote the future measurements scheduled, including and after time j. At time j,
the predictive distribution of future values given the observed history is denoted
by $[Y_j^+ \mid Y_j^-, X, M_j = 0]$. The MAR condition holds if predictions of future
measurements for those who drop out at time j are equivalent in distribution
to predictions for those who have observed data at and after time j.
Formally, MAR is equivalent to
$$[Y_j^+ \mid Y_j^-, X, M_j = 1] = [Y_j^+ \mid Y_j^-, X, M_j = 0]. \tag{4}$$
Hence, under MAR, missing values at time j and beyond can be predicted
sequentially from the histories of participants still in the study at time j.
MAR for monotone missing data patterns also can be written in terms
of the probability of dropouts at each measurement occasion. At time j,
the dropout probability is P(Mj = 1 | Mj–1 = 0). In general, this probability
can depend on any aspect of the observations intended to be collected.
MAR states that the dropout probability can only depend on observed
data history,
$$P(M_j = 1 \mid M_{j-1} = 0, Y_j^-, Y_j^+, X) = P(M_j = 1 \mid M_{j-1} = 0, Y_j^-, X). \tag{5}$$
This representation shows that one can think of the MAR assumption as
a sequentially random dropout process, where the decision to drop out at
time j is like the flip of a coin, with probability of ‘heads’ (dropout) depend-
ing on the measurements recorded through time j – 1.
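The coin-flip description can be sketched as a small simulation. The logistic form of the dropout probability, the coefficients `alpha0` and `alpha1`, and the choice to let dropout depend only on the most recent observed outcome are all assumptions made for illustration; the report does not prescribe a particular dropout model.

```python
import math
import random


def simulate_dropout(y, alpha0=-2.0, alpha1=1.0, rng=random):
    """Simulate monotone MAR dropout for one participant.

    y      : full outcome vector (Y_1, ..., Y_K); Y_1 is always observed here.
    alpha0 : hypothetical intercept of the dropout model.
    alpha1 : hypothetical coefficient on the last observed outcome.
    Returns the missingness indicators (M_1, ..., M_K).
    """
    m = [0] * len(y)
    for j in range(1, len(y)):  # j is a 0-based index into y
        # The dropout probability depends only on the observed history
        # (here, just the previous outcome), never on y[j] or later values.
        p_drop = 1.0 / (1.0 + math.exp(-(alpha0 + alpha1 * y[j - 1])))
        if rng.random() < p_drop:
            for k in range(j, len(y)):  # monotone: all later values missing
                m[k] = 1
            break
    return m


rng = random.Random(0)
full_data = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5)]
patterns = [simulate_dropout(y, rng=rng) for y in full_data]
for m in patterns:
    print(m)
```

Replacing `y[j - 1]` with `y[j]` in the dropout probability would make the same process MNAR, because dropout would then depend on a value that is never observed for those who leave.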
Both (4) and (5) can be generalized by allowing the past measurements
to include auxiliary covariates. Specifically, let
$Z_j^- = (Y_1,\ldots,Y_{j-1},V_1,\ldots,V_{j-1})$
denote the observed history of both outcomes and auxiliaries. Then MAR
can be restated by replacing $Y_j^-$ with $Z_j^-$ in (4) and (5). In fact, the
MAR assumptions (4, 5) change depending on the set of auxiliary variables
V included in the analysis. The validity of the MAR assumption can be
improved by measuring and including auxiliary variables that are predictive
of whether the outcome variables are missing and predictive of the values
of the missing variables.
Missing Not at Random
MAR will fail to hold if missingness or dropout depends on the values
of missing variables after conditioning on the observed variables. When
MAR fails to hold, missing data are said to be MNAR.
For a monotone missing data pattern, missingness will be MNAR if
there exists, for some j, at least one value of $Z_j^-$ for which
$$[Y_j^+ \mid Z_j^-, X, M_j = 1] \neq [Y_j^+ \mid Z_j^-, X, M_j = 0], \tag{6}$$
or equivalently, there exists, for some j, at least one value of $Y_j^+$ such that
$$P(M_j = 1 \mid M_{j-1} = 0, Z_j^-, Y_j^+, X) \neq P(M_j = 1 \mid M_{j-1} = 0, Z_j^-, X). \tag{7}$$
For (6), the consequence of MNAR is that future observations for those
who drop out cannot be reliably predicted using data observed prior to
dropping out; or, that the distribution $[Y_j^+ \mid Z_j^-, X]$
differs between those who do and do not drop out at time j. Because these
differences cannot be estimated from the observed data, they are entirely
assumption driven. This is the central problem of missing data analysis in
clinical trials.
Example: Hypertension Trial with Planned and Unplanned Missing Data
Murray and Findlay (1988) describe data from a large multicenter trial
of metoprolol and ketanserin, two antihypertensive agents for patients with
mild to moderate hypertension, with diastolic blood pressure as the outcome
measure of interest. The double-blind treatment phase lasted 12 weeks,
with clinic visits scheduled for weeks 0, 2, 4, 8, and 12. The protocol stated
that patients with diastolic blood pressure exceeding 110 mmHg at either
the 4- or 8-week visit should “jump” to an open follow-up phase, a form
of planned dropout. In total, 39 of the 218 metoprolol patients and 55 of
the 211 ketanserin patients jumped to open follow-up.
In addition, 17 metoprolol patients and 20 ketanserin patients had
missing data for other reasons, including side effects. Analyses of the
observed data clearly showed that those with missing blood pressure read-
ings differed systematically from the patients who remained in the study,
as would be predicted by the protocol for jumping to the open phase. This
example provides an illustration of the importance of defining what is
represented by a missing outcome. For the participants who were removed
from protocol, it is possible to treat the missing values as values that would
be observed had the individuals remained on treatment. The mechanism for
those with missing values is MAR because missing outcomes resulted from
the value of a recorded intermediate outcome variable for blood pressure,
and are therefore a function of an observed value.
Summary
1. Inferences from incomplete data, whether model-based or not, rely
on assumptions—known as missing data mechanisms—that cannot be
tested from the observed data.
2. A formal taxonomy exists for classifying missing data mechanisms,
including for longitudinal and event history data. The mechanisms can be
classified as MCAR, MAR, and MNAR.
3. Missing data mechanisms describe the relationship between the
missing data indicator(s) M, the full outcome data Y = (Yobs,Ymis), design
variables X, and auxiliary covariates V. Traditionally, these assumptions
characterize restrictions on the distribution of M given (Yobs,Ymis,X,V). Each
has an equivalent representation in terms of the predictive distribution of
missing responses, namely Ymis given (M,Yobs,X,V).
COMMONLY USED ANALYTIC METHODS UNDER MAR
Three common approaches to the analysis of missing data can be dis-
tinguished: (1) discarding incomplete cases and analyzing the remainder
(complete-case analysis); (2) imputing or filling in the missing values and
then analyzing the filled-in data; and (3) analyzing the incomplete data by
a method that does not require a complete (i.e., a rectangular) data set.
Examples of (3) include likelihood-based methods, such as maximum
likelihood (ML), restricted ML, and Bayesian methods; moment-based
methods, such as generalized estimating equations and their variants; and
semiparametric models for survival data, such as the Cox proportional
hazards model. Multiple imputation (Rubin, 1987; Little and Rubin, 2002),
an extension of single imputation that allows uncertainty in the imputations
to be reflected appropriately in the analysis, is closely related to Bayesian
methods (discussed later in this chapter).
Deletion of Cases with Missing Data
A simple approach to missing data is complete-case analysis, also
known as listwise deletion, in which incomplete cases are discarded and
standard analysis methods are applied to the complete cases. In many sta-
tistical packages, it is the default analysis.
Although it is possible to list conditions under which an analysis of
complete cases provides a valid inference (essentially, conditional MCAR),
this method is generally inappropriate for a regulatory setting. When miss-
ingness is in the outcome, the MAR assumption is generally weaker and can
reduce bias from deviations from MCAR by making use of the information
from incomplete data. Furthermore, when missingness is appreciable, rejec-
tion of incomplete cases will involve a substantial waste of information and
increase the potential for significant bias (see footnote 2).
In addition, if data are not collected after withdrawal from treat-
ment, then the MAR assumption relies only on information accumulated
while subjects are on treatment. Hence, any method that relies on MAR
is estimating the mean under the condition that everyone had remained
on treatment. This generally will not provide a valid estimator of the
intention-to-treat effect. On the other hand, if data are collected after with-
drawal from treatment, this information can be used either within inverse
probability weighting (IPW) or in an imputation context to estimate an
intention-to-treat effect under MAR (Hogan and Laird, 1996). It is for this
reason that we emphatically recommend aggressive collection of outcome
data after individuals withdraw from treatment.
2 When data are not MCAR, the bias of complete-case analysis depends on the degree of
deviation from MCAR, the amount of missing data, and the specifics of the analysis. In particular,
the bias in estimating the mean of a variable is the difference in the means for complete
and incomplete cases multiplied by the fraction of incomplete cases. Thus, the potential for
bias increases with the fraction of missing data. With respect to regression models, complete-case
analysis yields valid inferences if the model is correctly specified and missingness
depends on the predictor variables, observed or missing, but not on the outcome. (For
details, see Little and Rubin [2002].)
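The bias expression in footnote 2 follows from writing the full-sample mean as a weighted average of the complete-case and incomplete-case means. In the sketch below, the symbols $\bar{y}$ (full-sample mean), $\bar{y}_C$ (complete-case mean), $\bar{y}_I$ (incomplete-case mean), and $f$ (fraction of incomplete cases) are notation introduced here for illustration only.

```latex
\begin{align*}
\bar{y} &= (1 - f)\,\bar{y}_C + f\,\bar{y}_I, \\
\bar{y}_C - \bar{y} &= \bar{y}_C - (1 - f)\,\bar{y}_C - f\,\bar{y}_I
                     = f\,(\bar{y}_C - \bar{y}_I).
\end{align*}
```

For instance, if 20 percent of cases are incomplete and the complete-case mean exceeds the incomplete-case mean by 5 units, the complete-case mean overstates the full-sample mean by 0.2 × 5 = 1 unit.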
Inverse Probability Weighting
Univariate Outcome
When data are MAR but not MCAR, a modification of complete-case
analysis is to assign a sampling weight to the complete cases. This tends to
reduce bias, to the extent that the probability of being observed is a func-
tion of the other measured variables. Consider the simple case in which
the intended outcome is Y, the design variables are X, and some auxiliary
variables V are available. As usual, M = 1 indicates that Y is missing. To
describe IPW, it is useful to introduce a response indicator, R = 1 – M, such
that R = 1 when Y is observed and R = 0 when it is missing.
An IPW estimator for the mean of Y can be computed as follows:
1. Specify and fit a model for $\pi(X,V,\theta) = P_\theta(R = 1 \mid X,V)$, for example
using logistic regression.
2. Estimate the mean of Y using the weighted average
$$\hat{\mu} = \frac{1}{n} \sum_i \frac{R_i Y_i}{\pi(X_i, V_i, \hat{\theta})}; \tag{8}$$
that is, the average of the observed Y weighted inversely by the probability
of being observed.
3. Standard error estimators can be computed analytically or by boot-
strap methods. (For details on the bootstrap estimator of variance, see
Efron and Tibshirani, 1993.)
For large samples, this method properly adjusts for bias when the data
are MAR, provided the model for π(X,V,θ) is correctly specified. In finite
samples, the method can yield mean estimates that have high variance when
some individual-specific weights are high (i.e., when π is close to zero). An
alternative is to create strata based on the predicted probability of being
complete and then weight respondents by the inverse of the response rate
within these strata. Strata can be chosen to limit the size of the weights and
hence control variance.
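The weighted average in (8) can be sketched in a few lines of Python. To keep the example self-contained, the response probability is estimated by the observed response rate within each level of a single discrete covariate, which is the stratified variant described above, in place of a fitted logistic regression. The function name `ipw_mean` and the data are hypothetical.

```python
from collections import defaultdict


def ipw_mean(records):
    """IPW estimate of the mean of Y, following the weighted average (8).

    records: list of (x, y) pairs, with y = None when Y is missing.
    The response probability pi(x) is estimated by the observed response
    rate within each level of the discrete covariate x.
    """
    n_total = defaultdict(int)
    n_observed = defaultdict(int)
    for x, y in records:
        n_total[x] += 1
        if y is not None:
            n_observed[x] += 1
    pi_hat = {x: n_observed[x] / n_total[x] for x in n_total}

    # (1 / n) * sum over observed cases of Y_i / pi_hat(X_i)
    weighted_sum = sum(y / pi_hat[x] for x, y in records if y is not None)
    return weighted_sum / len(records)


# Hypothetical data: everyone in stratum "low" responds, half of "high" do.
records = [("low", 10.0), ("low", 12.0), ("low", 14.0), ("low", 16.0),
           ("high", 20.0), ("high", 24.0), ("high", None), ("high", None)]

observed = [y for _, y in records if y is not None]
complete_case_mean = sum(observed) / len(observed)
ipw_estimate = ipw_mean(records)
print(complete_case_mean, ipw_estimate)  # prints 16.0 17.5
```

The complete-case mean is pulled toward the fully observed stratum, while the weighted estimate counts each respondent in the "high" stratum twice to stand in for a nonrespondent. A stratum with no respondents at all would contribute nothing to the sum, so the estimate would silently ignore it.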
In addition to the MAR assumption, the IPW method requires two
other key assumptions: (1) there are no covariate profiles (X,V) within
which Y cannot be observed and (2) the support of the missing data distribution
is the same as that for the observed data distribution. Technically,
(1) stipulates that P(R = 1 | X,V) > 0 for all possible realizations of (X,V). A
potential restriction imposed by (2) is that individual missing values cannot
be imputed outside the range of observed values.
IPW Regression for Repeated Measures
With repeated measures, a convenient way to estimate the treatment
effect is through a regression model for the mean of the outcome vector
conditional on the design variables X. With fully observed data, repeated
measures regression models can be fit using generalized estimating equations
(GEE) (Zeger and Liang, 1986).
With fully observed data, a desirable property of regression parameter
estimates from GEE is that they retain such properties as consistency and
asymptotic normality regardless of the assumed within-subject (longitudi-
nal) correlation structure. When data are missing, this property no longer
holds, and regression estimates may depend strongly on the assumed cor-
relation structure (see Hogan et al., 2004, for an empirical example).
When missingness is MAR and follows a monotone pattern, the IPW
method can be used to obtain consistent estimates of regression parameters
using a specified procedure. Here, we emphasize that auxiliary information
should be included in the observed-data history,
$Z_j^- = (Y_1,\ldots,Y_{j-1},V_1,\ldots,V_{j-1})$,
and the model for $\pi(X,V,\theta)$. The procedure is as follows:
1. Specify the regression model that would be used had all the intended
data been collected.
2. Let $\phi_j(X, Z_j^-; \theta) = P(R_j = 1 \mid R_{j-1} = 1, X, Z_j^-; \theta)$ denote the probability
that Yj is observed.
3. Specify and fit a model for $\phi_j$; denote the estimated parameters by $\hat{\theta}$.
4. Let $\pi_j(X, Z_j^-; \theta) = \prod_{k=1}^{j} \phi_k(X, Z_k^-; \theta)$ denote the probability that an
individual has remained in the study to time j.
5. Fit the regression specified in Step 1, and weight individual contributions
to the model by $\{\pi_j(X, Z_j^-; \hat{\theta})\}^{-1}$. Use the assumed independence
correlation structure.
6. Use the bootstrap technique for standard error estimation.
In large samples, the IPW GEE yields consistent estimators when the
response probability model is correctly specified, but again may have high
variance when individual weights are large. The augmented IPW GEE pro-
cedure (discussed below) can be used to partially remedy this weakness.

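Step 4 of the IPW regression procedure for repeated measures builds each weight from a running product of conditional probabilities of remaining in the study. The following sketch shows only that bookkeeping; the function name `cumulative_weights` and the fitted probabilities are hypothetical, and in practice the probabilities would come from the model fitted in Step 3.

```python
def cumulative_weights(phi):
    """Return inverse-probability weights 1 / pi_j for j = 1, ..., K.

    phi: fitted conditional probabilities (phi_1, ..., phi_K), where phi_k is
         the probability of being observed at occasion k given observation
         at occasion k - 1.
    """
    weights = []
    pi_j = 1.0
    for phi_k in phi:
        if not 0.0 < phi_k <= 1.0:
            raise ValueError("each phi_k must lie in (0, 1]")
        pi_j *= phi_k  # pi_j = phi_1 * ... * phi_j
        weights.append(1.0 / pi_j)
    return weights


# Hypothetical fitted probabilities for one participant over four visits.
phi_hat = [1.0, 0.5, 0.8, 0.5]
weights = cumulative_weights(phi_hat)
print(weights)
```

Because the weight at each visit is the reciprocal of a product, it can only grow over time. A few small conditional probabilities late in follow-up are enough to produce the large weights, and hence the high variance, noted above.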
Target Distribution (or Parameter). Most often in clinical trials, primary
interest centers on the distribution [Y | X] = [Yobs,Ymis | X], where X
includes the treatment group and possibly other design variables. The target
distribution is related to the full-data distribution through the identity:
$$p(y_{\mathrm{obs}}, y_{\mathrm{mis}} \mid x) = \sum_m p(y_{\mathrm{obs}}, y_{\mathrm{mis}}, m \mid x) = \sum_m p(y_{\mathrm{obs}}, m \mid x)\, p(y_{\mathrm{mis}} \mid y_{\mathrm{obs}}, m, x). \tag{19}$$
Hence, inference about the target distribution relies critically on the untest-
able assumptions being made about p(ymis | yobs,m,x).
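Selection and Pattern Mixture Models. Two broad classes of models for the
joint distribution of Y and M are selection models, which factor the full
data distribution as
$$[Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, M \mid X] = [M \mid Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, X] \times [Y_{\mathrm{obs}}, Y_{\mathrm{mis}} \mid X] \tag{20}$$
and pattern mixture models, which factor the full-data distribution as
$$[Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, M \mid X] = [Y_{\mathrm{obs}}, Y_{\mathrm{mis}} \mid M, X] \times [M \mid X]. \tag{21}$$
Pattern mixture models can be factored to make the missing data extrapolation
explicit within missing data pattern M, that is
$$[Y_{\mathrm{obs}}, Y_{\mathrm{mis}}, M \mid X] = [Y_{\mathrm{mis}} \mid Y_{\mathrm{obs}}, M, X] \times [Y_{\mathrm{obs}} \mid M, X] \times [M \mid X]. \tag{22}$$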
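The pattern mixture factorization implies that the overall mean is a mixture of pattern-specific means, one of which is never observed and has to be supplied by assumption. A minimal sketch for a single outcome with no covariates follows; the shift parameter `delta`, the function name, and the data are hypothetical and serve only to show how the answer moves with the untestable assumption.

```python
def pattern_mixture_mean(y_observed, n_missing, delta=0.0):
    """Mean of Y as a mixture over the patterns M = 0 and M = 1.

    y_observed : outcomes for participants with M = 0.
    n_missing  : number of participants with M = 1.
    delta      : assumed (untestable) difference between the mean of the
                 missing outcomes and the mean of the observed outcomes;
                 delta = 0 treats the two patterns as having the same mean.
    """
    n_obs = len(y_observed)
    p_missing = n_missing / (n_obs + n_missing)  # estimate of P(M = 1)
    mean_obs = sum(y_observed) / n_obs           # estimate of E[Y | M = 0]
    mean_mis = mean_obs + delta                  # assumed E[Y | M = 1]
    return (1.0 - p_missing) * mean_obs + p_missing * mean_mis


y_observed = [4.0, 6.0, 8.0, 6.0]
for delta in (0.0, -2.0, -4.0):
    print(delta, pattern_mixture_mean(y_observed, n_missing=4, delta=delta))
```

With half of the outcomes missing, the estimated mean falls from 6 to 5 to 4 as `delta` moves from 0 to -2 to -4. The observed data are identical in all three cases, so nothing in them can distinguish among the three answers.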
Selection Models
Selection models can be divided into two types, (1) parametric and
(2) semiparametric. Parametric selection models were first proposed by
Rubin (1974) and Heckman (1976), based on parametric assumptions
for the joint distribution of the full data (usually, a normal distribution
for responses and a probit regression for the missing data indicators). For
repeated measures, parametric selection models were described by Diggle
and Kenward (1994), and semiparametric models were proposed by Robins
et al. (1995) and Rotnitzky et al. (1998).
To illustrate a standard formulation, assume the full-response data
comprise (Y1,Y2), and the objective is to capture the mean of Y2 in each
treatment group. Further, assume Y2 is missing on some individuals. A
parametric selection model might assume that the full-response data follows
a bivariate normal distribution:
$$(Y_1, Y_2) \mid X = x \sim N\bigl(\mu(x), \Sigma(x)\bigr), \tag{23}$$
and the “selection mechanism” part of the model follows a logistic
regression
$$\operatorname{logit}\{P(M = 0 \mid Y_1, Y_2, X)\} = \alpha_0 + \alpha_1 Y_1 + \alpha_2 Y_2. \tag{24}$$
Parametric selection models can be fit to observed data, even though
there appears to be no empirical information about several of the model
parameters. Specifically, there is no information about the association
between M and Y2 because Y2 is missing. Likewise, there is no informa-
tion about the mean, variance, and covariance parameters involving Y2.
The model can be fit because of the parametric and structural assumptions
being imposed on the full-data distribution. This can be seen either as
beneficial or as a reason to exercise extreme caution. Convenience
is the primary benefit, especially if the model can be justified on scientific
grounds. The reason for caution is that, again, none of the assumptions
underlying this parametric model can be checked from the observed data.
In parametric selection models fit under the MNAR assumption, identification
of parameters and sensitivity to assumptions raise serious problems:
see, for example, Kenward (1998), Little and Rubin (2002, Chapter 15), the
discussion of Diggle and Kenward (1994), and Daniels and Hogan (2008,
Chapter 9).
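As a rough illustration of the selection formulation in (23) and (24), the following Python sketch simulates a bivariate normal response and a logistic mechanism for whether Y2 is observed, then compares the mean of Y2 over all participants with the mean over observed cases only. Everything numeric here is an assumption made for illustration: the function name simulate_selection_model, the zero means, unit variances, correlation of 0.5, and the alpha values. The code models the probability that Y2 is observed; which value of M encodes that event is not restated in this passage. The sketch does not fit a selection model; it only shows why the coefficient on Y2 matters.

```python
import math
import random


def simulate_selection_model(n, alpha0, alpha1, alpha2, seed=0):
    """Simulate (Y1, Y2) and a logistic missingness mechanism for Y2.

    Returns (mean of Y2 over everyone, mean of Y2 over observed cases only).
    All distributional settings are hypothetical, chosen for illustration.
    """
    rng = random.Random(seed)
    rho = 0.5  # assumed correlation between Y1 and Y2
    all_y2, observed_y2 = [], []
    for _ in range(n):
        # Bivariate normal with zero means and unit variances.
        y1 = rng.gauss(0.0, 1.0)
        y2 = rho * y1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        # Logistic model for the probability that Y2 is observed.
        linear = alpha0 + alpha1 * y1 + alpha2 * y2
        p_observed = 1.0 / (1.0 + math.exp(-linear))
        all_y2.append(y2)
        if rng.random() < p_observed:
            observed_y2.append(y2)
    return sum(all_y2) / n, sum(observed_y2) / len(observed_y2)


# alpha1 = alpha2 = 0: missingness is unrelated to the responses.
# alpha2 != 0: missingness depends on the possibly unobserved Y2 (MNAR).
for alpha2 in (0.0, 1.0):
    full_mean, cc_mean = simulate_selection_model(20000, 0.5, 0.0, alpha2)
    print(f"alpha2={alpha2}: full-data mean={full_mean:.3f}, "
          f"observed-case mean={cc_mean:.3f}")
```

In a real trial the full-data mean is never available, so the gap that appears when alpha2 is nonzero cannot be seen directly; it is recovered only through the parametric assumptions discussed above.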
Semiparametric selection models do not assume a parametric model
for the full-data response distribution, so they are somewhat less
sensitive to these assumptions. These models are discussed in greater detail
in Chapter 5.
Pattern Mixture Models
Pattern mixture models were proposed for repeated measures data by
Little (1993, 1994); a number of extensions and generalizations have fol-
lowed. The connection between pattern mixture and selection models is
described in Little and Wang (1996), in Molenberghs et al. (1998), and in
Birmingham et al. (2003).
The models can be viewed from an imputation perspective, in which
missing values Ymis are imputed from their predictive distribution given the
observed data including M; that is,
p(ymis | yobs,x,M). (25)
Under MAR, this equals p(ymis | yobs,x). However, if data are not MAR, the
predictive distribution (25) is a direct by-product of the pattern mixture
formulation because it conditions on the missing data indicators. This
more direct relationship between the pattern mixture formulation and the
predictive distribution for imputations yields gains in transparency and
computational simplicity in some situations, as illustrated in Kenward and
Carpenter (2008, Section 4.6).
Under MNAR, the selection model factorization requires full specifica-
tion of the model for the missing data mechanism. Some pattern mixture
models avoid specification of the model for the missing data mechanism
in MNAR situations by using assumptions about the mechanism to yield
restrictions on the model parameters (Little, 1994; Little and Wang, 1996;
Hogan and Laird, 1997).
Many pattern mixture formulations are well suited to sensitivity analysis
because they explicitly separate the observed data distribution from the pre-
dictive distribution of missing data given observed data. Sensitivity analyses
can be formulated in terms of differences in mean (or other parameter)
between those with observed and those with missing responses.
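One simple way to set up such a sensitivity analysis is to shift the mean of the missing responses by an assumed amount, here called delta, relative to the observed mean. The Python sketch below is a minimal single-time-point version under that assumption; the function names, the data, and the grid of delta values are all hypothetical, and delta is applied equally to both arms only for simplicity.

```python
def pattern_mixture_mean(observed, n_missing, delta):
    """Overall mean when missing cases are assumed to differ by `delta`.

    observed  : responses from participants with observed outcomes
    n_missing : number of participants whose outcome is missing
    delta     : assumed (untestable) difference, mean of missing minus
                mean of observed; delta = 0 treats dropouts like completers
    """
    n_obs = len(observed)
    mean_obs = sum(observed) / n_obs
    mean_mis = mean_obs + delta  # extrapolation, not estimated from data
    weight_mis = n_missing / (n_obs + n_missing)
    return (1.0 - weight_mis) * mean_obs + weight_mis * mean_mis


def sensitivity_table(treated, n_mis_treated, control, n_mis_control, deltas):
    """Treatment difference for each delta, applied to both arms alike."""
    return [
        (d,
         pattern_mixture_mean(treated, n_mis_treated, d)
         - pattern_mixture_mean(control, n_mis_control, d))
        for d in deltas
    ]


# Hypothetical data: three observed and one missing in the treated arm,
# two observed and two missing in the control arm.
treated = [10.0, 12.0, 14.0]
control = [9.0, 11.0]
for delta, diff in sensitivity_table(treated, 1, control, 2, [-4.0, -2.0, 0.0]):
    print(f"delta={delta:+.1f}: estimated treatment difference={diff:.2f}")
```

The observed-data means do not change as delta varies; only the extrapolation to the missing cases does, which is the separation the paragraph above describes.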
Advantages and Disadvantages of Selection and Pattern Mixture Models
Substantively, it seems more natural to assume a model for the full-
data response, as is done in selection models. For example, if the outcome
is blood pressure, it may seem natural to assume the combined distribution
of blood pressures over observed and missing cases follows a single distri-
bution, such as the normal distribution. Moreover, if MAR is plausible,
a likelihood-based selection formulation leads directly to inference based
solely on the model for the full-data response, and inference can pro-
ceed by ML.
However, it may not be intuitive to specify the relationship between
nonresponse probability and the outcome of interest, which typically has
to be done in the logit or probit scale. Moreover, the predictive distribu-
tion of missing responses typically is intractable, so it can be difficult to
understand in simple terms how the missing observations are being imputed
under a given model. And, as indicated above, selection models are highly
sensitive to parametric assumptions about the full data distribution. This
concern can be alleviated to some degree by the use of semiparametric
selection models.
Specification of pattern mixture models also appeals to intuition in the
sense that it is natural to think of respondents and nonrespondents having
different outcome distributions. The models are transparent with respect
to how missing observations are being imputed because the within-pattern
models specify the predictive distribution directly.
Pattern mixture models can present computational difficulties for esti-
mating treatment effects because of the need to average over missing data
patterns; this is particularly true of pattern mixture specifications involving
regression models within each pattern.

Example: Pattern Mixture Model for Continuous Outcomes
Daniels and Hogan (2008, Chapter 10) use pattern mixture models to
analyze data from a randomized trial of recombinant human growth hor-
mone (rHGH) on muscle strength in elderly people. More than 120 people
were randomized to four different treatment arms. The primary outcome
in this trial was quadriceps strength, assessed at baseline, 6 months, and
12 months. A pattern mixture model was fit under MAR and parameterized
to represent departures from MAR. The example shows how to construct
sensitivity plots to assess the effect of departures from MAR on the infer-
ences about treatment effect. An important feature of the model is that the
fit to the observed data is unchanged at different values of the sensitivity
parameters. However, the model does rely on parametric assumptions, such
as normality. These assumptions can be checked for the observed data, but
have to be subjectively justified for the missing data.
Example: Pattern Mixture Model for Binary Outcomes
Daniels and Hogan (2008, Chapter 10) use pattern mixture models
to analyze data from an intervention study for smoking cessation among
substance abusers. The primary outcome was smoking status, assessed at
baseline, 1 month, 6 months, and 1 year. A pattern mixture model was fit
under MAR and expanded to allow for MNAR missingness. In addition
to presenting sensitivity analysis, the example shows how to incorporate
prior information about the smoking rate of dropouts to obtain a summary
inference about treatment effect.
Sensitivity of Parametric Selection Models
The sensitivity of MNAR selection models to distributional assump-
tions is illustrated by Verbeke and Molenberghs (2000, Chapter 17), who
show that, in the context of an onychomycosis study, excluding a small
amount of measurement error drastically changes the likelihood ratio test
statistics for the MAR null hypothesis. In a separate example, Kenward
(1998) revisited the analysis of data from a study on milk yield performed
by Diggle and Kenward (1994). In this study, the milk yields of 107 cows
were to be recorded during 2 consecutive years. Data were complete in the
first year, but 27 measurements were missing in year 2 because these cows
developed mastitis, which seriously affected their milk yield; those yields
were therefore deemed missing for the purposes of the study. Although in the
initial paper there was some evidence for MNAR, Kenward (1998) showed that
removing two anomalous profiles from the 107 completely eliminated this
evidence. Kenward also showed that changing the conditional distribution
of the year 2 yield, given the year 1 yield, from a normal to a heavy-tailed
t distribution led to a similar conclusion.
Several authors have advocated using local influence tools for purposes
of sensitivity analysis (Thijs et al., 2000; Molenberghs et al., 2001; Van
Steen et al., 2001; Verbeke et al., 2001; Jansen et al., 2006). In particular,
Molenberghs et al. (2001) revisited the mastitis example. They were able
to identify the same two cows found by Kenward (1998), in addition
to another one. However, it is noteworthy that all three are cows with
complete information, even though local influence methods were originally
intended to identify subjects with other than MAR mechanisms of missing-
ness. Thus, an important question concerns the combined nature of the data
and model that leads to apparent evidence for an MNAR process. Jansen et
al. (2006) showed that a number of features or aspects, but not necessarily
the (outlying) nature of the missingness mechanism in one or a few subjects,
may be responsible for an apparent MNAR mechanism.
Selection and Pattern Mixture Models: Literature
The literature covering selection and pattern mixture models is extensive.
Review papers that describe, compare, and critique these models include
Little (1995), Hogan and Laird (1997, 2004), Kenward and Molenberghs
(1999), Fitzmaurice (2003), and Ibrahim and Molenberghs (2009). The
models are also discussed in some detail in Little and Rubin (2002), Diggle
et al. (2002), Fitzmaurice et al. (2004), Molenberghs and Kenward (2007),
and Daniels and Hogan (2008).
An extensive literature also exists on extensions of these models involving
random effects, sometimes called shared-parameter or random-coefficient-
dependent models. Reviews are given by Little (1995) and Molenberghs
and Kenward (2007). Although these models can be enormously useful for
complex data structures, they need to be used with extreme caution in a
regulatory setting because of the many layers of assumptions needed to fit
the models to data.
Recommendations
Recommendation 9: Statistical methods for handling missing data
should be specified by clinical trial sponsors in study protocols, and
their associated assumptions stated in a way that can be understood
by clinicians.
Since one cannot assess whether the assumptions concerning missing
data are or are not valid after the data are collected, one cannot assert that
the choice of missing data model made prior to data collection needs to be
modified as a result of a lack of fit. Thus, one needs to carry out a sensitivity
analysis. Of course, model fitting diagnostics can be used to demonstrate
that the complete data model may need to be adjusted, but the missing data
model raises no additional complexities.
Recommendation 10: Single imputation methods like last observation
carried forward and baseline observation carried forward should not be
used as the primary approach to the treatment of missing data unless
the assumptions that underlie them are scientifically justified.
Single imputation methods do not account for uncertainty associated with
filling in the missing responses. Further, LOCF and BOCF do not reflect
MAR data mechanisms.
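To make concrete what these single imputation rules do, here is a minimal Python sketch. The helper names locf and bocf, the use of None for a missing visit, and the example scores are assumptions made for illustration.

```python
def locf(visits):
    """Last observation carried forward; None marks a missing visit."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled


def bocf(visits):
    """Baseline observation carried forward; visits[0] is the baseline."""
    baseline = visits[0]
    return [baseline if value is None else value for value in visits]


# Hypothetical participant: baseline 50, second visit 42, then dropout.
scores = [50, 42, None, None]
print(locf(scores))
print(bocf(scores))
```

Each rule fills a gap with one fixed number, so an analysis of the filled-in series treats the imputed values as if they had actually been observed.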
Single imputation methods are sometimes used not as a method for
imputation but rather as a convenient method of sensitivity analysis when
they provide a clearly conservative treatment of the missing data. This
can obviously be accomplished by using a best possible outcome for the
missing values in the control group and a worst possible outcome for
the missing values in the treatment group. If the result of such a technique
is to demonstrate that the results of the primary analysis do not depend on
the treatment of the missing data, such an approach can be useful. However,
techniques that are often viewed as conservative, and therefore useful in
such an approach, are sometimes not conservative, so care is
required.
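For a binary outcome, the best-case and worst-case fill described above can be sketched in a few lines of Python. The function name worst_case_difference, the coding (1 for success, 0 for failure, None for missing), and the data are hypothetical.

```python
def worst_case_difference(treatment, control):
    """Difference in success rates (treatment minus control) after
    filling missing values against the treatment arm.

    Outcomes are 1 (success), 0 (failure), or None (missing).
    """
    # Missing in the treatment arm -> worst possible outcome (failure).
    treatment_filled = [0 if y is None else y for y in treatment]
    # Missing in the control arm -> best possible outcome (success).
    control_filled = [1 if y is None else y for y in control]
    rate_treatment = sum(treatment_filled) / len(treatment_filled)
    rate_control = sum(control_filled) / len(control_filled)
    return rate_treatment - rate_control


# Hypothetical arms with one missing outcome each.
treatment_arm = [1, 1, 1, None, 0]
control_arm = [0, None, 1, 0, 0]
print(worst_case_difference(treatment_arm, control_arm))
```

If the treatment difference remains favorable under this fill, the conclusion does not hinge on how the missing values are handled; with more than a little missing data, the bound is usually too wide to be informative.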
Recommendation 11: Parametric models in general, and random effects
models in particular, should be used with caution, with all their assump-
tions clearly spelled out and justified. Models relying on parametric
assumptions should be accompanied by goodness-of-fit procedures.
We acknowledge that this is an area where the current toolkit is some-
what lacking, and therefore more research is needed. Some contributions
to this area include Verbeke et al. (2001, 2008), Gelman et al. (2005), and
He and Raghunathan (2009).
Recommendation 12: It is important that the primary analysis of the
data from a clinical trial should account for the uncertainty attribut-
able to missing data, so that under the stated missing data assump-
tions the associated significance tests have valid type I error rates and
the confidence intervals have the nominal coverage properties. For
inverse probability weighting and maximum likelihood methods, this
can be accomplished by appropriate computation of standard errors,
using either asymptotic results or the bootstrap. For imputation, it
is necessary to use appropriate rules for multiply imputing missing
responses and combining results across imputed datasets because single
imputation does not account for all sources of variability.
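One widely used set of combining rules, commonly called Rubin's rules, adds the average within-imputation variance to an inflated between-imputation variance. The Python sketch below shows that calculation only; the function name pool_estimates and the input numbers are hypothetical, and the degrees-of-freedom adjustment used for interval estimates is omitted.

```python
import math
import statistics


def pool_estimates(estimates, variances):
    """Combine point estimates and their variances across imputed datasets.

    estimates : one point estimate per imputed dataset
    variances : the squared standard error of each estimate
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)        # average sampling variance
    between = statistics.variance(estimates)    # variance across imputations
    total = within + (1.0 + 1.0 / m) * between  # total variance
    return pooled, math.sqrt(total)


# Hypothetical treatment-effect estimates from three imputed datasets.
estimate, std_error = pool_estimates([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(f"pooled estimate={estimate:.3f}, standard error={std_error:.3f}")
```

The between-imputation term is what a single imputation leaves out: with only one filled-in dataset there is no spread across imputations to measure.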
Recommendation 13: Weighted generalized estimating equations
methods should be more widely used in settings where missing at
random can be well justified and a stable weight model can be deter-
mined, as a possibly useful alternative to parametric modeling.
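The weighting idea behind this recommendation can be illustrated with a much simpler problem than a weighted GEE: estimating a single mean when the outcome is assumed MAR given a discrete covariate. In the Python sketch below, the function name ipw_mean, the strata, and the data are hypothetical, and the response probability is estimated as the observed fraction within each stratum.

```python
from collections import defaultdict


def ipw_mean(records):
    """Inverse-probability-weighted mean of an outcome.

    records : list of (stratum, outcome) pairs; outcome is None if missing.
    The response probability is estimated as the observed fraction within
    each stratum, which presumes MAR given the stratum.
    """
    totals = defaultdict(int)
    observed = defaultdict(int)
    for stratum, outcome in records:
        totals[stratum] += 1
        if outcome is not None:
            observed[stratum] += 1

    weighted_sum = 0.0
    weight_total = 0.0
    for stratum, outcome in records:
        if outcome is None:
            continue
        weight = totals[stratum] / observed[stratum]  # 1 / P(observed)
        weighted_sum += weight * outcome
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical data: half of stratum A is missing, stratum B is complete.
records = [("A", 1.0), ("A", 1.0), ("A", None), ("A", None),
           ("B", 5.0), ("B", 5.0), ("B", 5.0), ("B", 5.0)]
observed_only = [y for _, y in records if y is not None]
print("complete-case mean:", sum(observed_only) / len(observed_only))
print("weighted mean:", ipw_mean(records))
```

When a stratum has few observed cases its weight becomes large and the estimate unstable, which is why the recommendation asks for a stable weight model.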
Recommendation 14: When substantial missing data are anticipated,
auxiliary information should be collected that is believed to be associ-
ated with reasons for missing values and with the outcomes of interest.
This could improve the primary analysis through use of a more appro-
priate missing at random model or help to carry out sensitivity analyses
to assess the impact of missing data on estimates of treatment differ-
ences. In addition, investigators should seriously consider following
up all or a random sample of trial dropouts, who have not withdrawn
consent, to ask them to indicate why they dropped out of the study,
and, if they are willing, to collect outcome measurements from them.
INSTRUMENTAL VARIABLE METHODS FOR ESTIMATING
TREATMENT EFFECTS AMONG COMPLIERS
Estimates of treatment effects for all individuals randomized, as in an
intention-to-treat analysis, are protected against bias by the randomization.
In this estimand, individuals who are assigned a treatment but never
comply with it, perhaps because they cannot tolerate treatment side effects,
are treated in the same way as individuals who comply with the treatment.
Sometimes, particularly in secondary analyses, interest lies in the treat-
ment effect in the subpopulation of individuals who would comply with a
treatment if assigned to it. The average treatment effect in this population
is called the complier-average causal effect (CACE) (Baer and Lindeman,
1994; Angrist et al., 1996; Imbens and Rubin, 1997a, 1997b; Little and
Yau, 1998; White, 2005).
An alternative estimand to the CACE is the average treatment effect
(ATE) (Robins, 1989; Robins and Greenland, 1996). It is defined as the
difference between the mean outcome if all individuals had been assigned to
and complied with the treatment (T = 1) and the mean outcome if all
individuals had been assigned to and complied with the control treatment
(T = 0). The ATE is defined for the
whole target population, and it requires assumptions about the treatment
outcome for noncompliers had they complied with the treatment. Whether
this counterfactual event is meaningful typically depends on context. For
example, noncompliance to a behavioral treatment, such as an exercise
regime, might plausibly be changed by increased motivation, as might occur
if evidence of success of the treatment becomes widely known. In contrast,
if noncompliance to a drug is the result of intolerable side effects, then com-
pliance may require a reformulation of the drug to remove the side effects.
Such reformulation may change the properties of the drug, and estimation
of the ATE is consequently more speculative.
Simple approaches to estimating the CACE or the ATE include as-
treated analysis, in which participants are classified according to the treat-
ment actually received, and per-protocol analysis, which restricts analysis
to participants who comply with the assigned treatment. These analyses are
subject to selection bias in that participants who comply with a treatment
may be a biased sample of participants randomized to that treatment. The
bias may be reduced by adjustment for covariates, but it remains a major
concern.
Although this is often characterized as a problem of selection bias,
recent approaches have suggested alternatives to as-treated and per-protocol
analyses by applying a missing-data perspective. Consider a binary variable
C(T) taking the value 1 if an individual would comply with a particular
treatment T if assigned to it, and 0 otherwise. We call this variable principal
compliance, to distinguish it from observed compliance, which depends on
the treatment actually assigned. It is a special case of principal stratifica-
tion (Frangakis and Rubin, 2002). Principal compliance C(T) is observed
for participants who are assigned to treatment T, but it is not observed for
participants assigned other treatments, T′, so for these individuals the values
of C(T) can be regarded as missing.
In simple trials involving an active treatment and a control treatment,
an alternative to as-treated and per-protocol estimates is based on the idea
of treating the randomization as an instrumental variable (IV), in economic
parlance. The IV estimator yields a direct estimate of the CACE, and it is
protected from selection bias by the randomization. However, it requires
certain assumptions to be valid, and it also yields estimators with poten-
tially high variance, particularly if the treatment compliance rate is low.
Model-based versions of the IV estimator based on treating C as missing for
some participants have been proposed that are potentially more efficient,
although they make stronger distributional assumptions. For a nontechnical
article comparing this approach with as-treated and per-protocol estimates,
see Little et al. (2009); for a discussion of extensions to two or more active
treatments, see Long, Little, and Lin (in press).
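In its simplest moment-based form, the IV estimate is the intention-to-treat effect on the outcome divided by the intention-to-treat effect on treatment receipt. The Python sketch below shows that ratio under two assumptions that are commonly made but not spelled out above: randomization affects the outcome only through treatment receipt, and no one does the opposite of what was assigned. The function name cace_iv_estimate and all numbers are hypothetical.

```python
def cace_iv_estimate(mean_y_assigned, mean_y_control,
                     receipt_assigned, receipt_control):
    """Ratio (Wald-type) instrumental variable estimate of the CACE.

    mean_y_*  : mean outcome by randomized arm (intention-to-treat means)
    receipt_* : proportion actually receiving the active treatment, by arm
    """
    itt_outcome = mean_y_assigned - mean_y_control
    itt_receipt = receipt_assigned - receipt_control
    if itt_receipt <= 0:
        raise ValueError("randomization must increase treatment receipt")
    return itt_outcome / itt_receipt


# Hypothetical trial: event rates of 10% and 16% by randomized arm;
# 60% of the active arm and none of the control arm took the treatment.
print(cace_iv_estimate(0.10, 0.16, 0.60, 0.00))
```

The denominator is the estimated proportion of compliers, so a low compliance rate divides the intention-to-treat effect by a small, noisy number; this is the source of the high variance noted above.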
An example illustrating the above discussion and a number of associ-
ated issues is provided by the evaluation of a trial to assess the effect of an
influenza vaccine (Hirano et al., 2000). The trial randomly assigned physi-
cians to encouragement (T1) or no encouragement (T2) to vaccinate their
patients against influenza. The primary endpoint was hospitalization, and
the intention-to-treat estimates showed that, of those encouraged, 7.8 percent
were hospitalized, and of those not encouraged, 9.2 percent were hospitalized.
However, the trial had only a weak effect on the actual taking of the
vaccine: of those encouraged, 31 percent of patients received the vaccine; of
those not encouraged, 19 percent of patients received the vaccine.
Therefore, to better understand the trial results, at least a secondary
estimand of interest was CACE, that is, in this case, the effect of encourage-
ment on hospitalization for the patients who would have been vaccinated
if their physician had been encouraged but not vaccinated if their physician
had not been encouraged. Assuming the standard exclusion restrictions of
IV, CACE was estimated as an 8.2 percent reduction in hospitalization.
Yet, even this turned out to represent only part of a better understanding
of the trial results.
In this study, there were a number of good baseline predictors of compliance
under both arms, C(T1) and C(T2), and thus the effect on compliers
could be in part identified without the need for exclusion restrictions. When
these restrictions were relaxed, the effect of encouragement on compliers was
estimated at 3.7 percent, but there was at least as large an estimated effect
(5.3 percent) of encouragement on hospitalization for always-takers. Later
commentaries on these results suggested that the latter effect is explainable
by the earlier time in the season at which the always-takers likely receive
the vaccine when encouraged, compared to when not encouraged. Since this
effect is comparable to CACE, it suggested that the effect of vaccination lies
in its timing and not only in its receipt.
To further explicate this method, we offer an example of coprimary
outcomes that induce missing data. For randomized controlled trials with
two (or more) coprimary outcomes, say E and Y, values of E can determine
whether Y has a meaning as a measurement. This effect presents a challenge
in the very definition of the effect between the two interventions, say T1 and
T2, on Y, because the existence of Y is determined after the intervention.
This problem can be treated in principle in the context of missing informa-
tion, not of Y (which is sometimes undefined) but of certain strata, called
principal strata. Our example involves clinical trials for HIV.
The idea of cell-mediated immunity is to train the killer cells to recog-
nize and attack a protein that human CD4 cells create when the CD4 cells
are infected (as opposed to targeting the virus directly, whose identifica-
tion is difficult due to mutations over time). For this reason, randomized
trials for cell-mediated immunity vaccines should be designed to assess two
coprimary outcomes: reducing primary infection (say, E), and, if a person
is infected (E = 1), keeping low viral load (say Y). Work by Gilbert et al.
(2003) and then by Mehrotra et al. (2006) showed how principal strati-
fication (Frangakis and Rubin, 2002) can be used to formulate the target
hypotheses with such coprimary outcomes. Specifically, the first coprimary
research hypothesis is that changing treatment T1 (placebo) to T2 (vac-
cine) changes the primary infection rate E. The second coprimary research
hypothesis should capture that the vaccine can also affect viral load when
infected. However, the viral load distributions between infectees under the
placebo condition and those infected under the vaccine condition could be
different simply because the immune system is inherently different between
the two groups. (In fact, if the vaccine prevents some primary infections,
infectees under vaccine are expected to have a weaker baseline immune
system than infectees under placebo.) One can disentangle baseline dif-
ferences from vaccine effects if one focuses on the people who would
have been infected regardless of receiving the vaccine or the placebo. This
stratum is known as a principal stratum because membership in it does
not change depending on assignment to different interventions. Thus, the
second coprimary research hypothesis can be that changing treatment T1
(placebo) to T2 (vaccine) will change the viral load for those for whom
changing T1 to T2 does not prevent primary infection.
For a person under placebo who gets infected (E(T1) = infected), one
does not know if the person would also have been infected under vaccine
(E(T2) = infected), so membership in the principal stratum, E(T1) = E(T2)
= infected, is partly missing. (Estimation of the effect of vaccine on viral
load Y for this stratum is discussed above.)
Additional examples of randomized controlled trials with coprimary
outcomes using principal stratification include determining if the immune
response to a vaccine is causing reduction in infection rates (Follmann,
2006); assessing more general surrogate outcomes in vaccine trials (Qin et
al., 2008); and evaluating the effect of an intervention on severity of a dis-
ease (e.g., of prostate cancer) when a person does get the disease (Shepherd
et al., 2008).
MISSING DATA IN AUXILIARY VARIABLES
The assumptions and models discussed above have been limited to
outcome variables. Usually, there are many auxiliary variables collected at
each visit that can be useful to incorporate into the analysis. Specifically,
these variables are useful because they help both to explain the reasons for
future nonresponse and to predict the missing outcomes (and so
help improve the efficiency with which the treatment effects are estimated).
They can also serve to make the MAR assumption more tenable. We
have assumed throughout that the collection of auxiliary variable data is
complete, which is clearly not always the case. We do note that the above
approaches can be modified to incorporate missing auxiliary data by aug-
menting the missing outcome variable with a missing V. Although including
V along with the missing outcome variable will often address the problem,
the literature on missing data in longitudinal settings is fairly limited, and
more research on dealing with missing auxiliary data would be useful.
We do believe that many of the above approaches can be easily modified
to incorporate auxiliaries by replacing $\bar{Y}_k$ in the conditional means and
probabilities with $\bar{Z}_k$, which includes $(Y_1, \ldots, Y_{k-1}, V_1, \ldots, V_{k-1})$. An excellent
example of the use of this method is Liu et al. (2009).