If outcome data is not obtained from patients who drop out of treatment, those patients' outcome data will be missing. It is critical to recognize that dropout from treatment does not have to produce missing outcome data. Outcome data can still be obtained from subjects who discontinue treatment, so missing data is partly produced by study design (e.g., a failure to follow up patients who stop treatment) and is not an inevitable result of a condition, treatment, or behavior (Lavori, 1992). This was shown in studies of PTSD treatment by Schnurr et al. (2003, 2007), which successfully obtained outcome measurements from a large fraction of participants who discontinued treatment. Very few of the studies examined here obtained outcome information after a patient stopped treatment or during post-treatment follow-up. Because a very high percentage of patients, from 20 percent to 50 percent, typically dropped out of these studies, large fractions of outcome data were missing. The most common way of handling this in the literature reviewed was to use the last recorded outcome as the final outcome for a patient who dropped out: the “last observation carried forward” (LOCF) approach.
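As a minimal sketch of what LOCF does mechanically, the following Python fragment fills each missed visit with the most recent recorded score. The function name `locf`, the visit schedule, and the scores are hypothetical and chosen only for illustration; they are not taken from any of the studies reviewed.

```python
from typing import List, Optional


def locf(scores: List[Optional[float]]) -> List[Optional[float]]:
    """Fill each missing visit (None) with the most recent observed score."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None  # nothing observed yet
    for score in scores:
        if score is not None:
            last = score
        # A visit with no earlier observation stays None.
        filled.append(last)
    return filled


# Hypothetical symptom scores at four visits; None marks a visit missed after dropout.
completer = [80.0, 70.0, 60.0, 50.0]
dropout = [80.0, 72.0, None, None]

print(locf(completer))  # [80.0, 70.0, 60.0, 50.0]
print(locf(dropout))    # [80.0, 72.0, 72.0, 72.0]
```

The dropout's final-visit value is simply the visit-2 score repeated, which is the sense in which the last recorded outcome is treated as the final outcome.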

The motivation for this statistical approach is understandable: to include as many patients as possible in the final analysis, and to use as much information as possible from every patient. Unfortunately, the LOCF approach, while it uses “all available data,” does so in a way that typically produces improper answers. For that reason, the statistical community has long rejected it as a valid method of handling missing data, even as its use has remained prevalent in various domains of research. Statisticians recommend a wide array of more appropriate, albeit technically more complex, methods that have existed for decades and can now be implemented in standard software (Schafer and Graham, 2002; Mallinckrodt et al., 2003; Molenberghs et al., 2004; Leon et al., 2006; Little and Rubin, 2002).
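One way to see how carrying the last recorded score forward to the final visit can mislead is a small worked example. The sketch below uses entirely made-up trajectories, including the scores that would have been recorded had the dropouts been followed up. It assumes, purely for illustration, that patients continue to change after leaving treatment; the patient labels and numbers are hypothetical.

```python
from statistics import mean

# Hypothetical symptom scores at visits 0..3 (lower is better).
# The values after dropout are invented "true" outcomes that a study
# with complete follow-up would have recorded.
true_scores = {
    "A": [80, 70, 60, 50],
    "B": [82, 74, 66, 58],
    "C": [78, 72, 66, 60],
    "D": [84, 76, 68, 60],
}

# Assumed index of the last attended visit; C and D drop out after visit 1.
last_visit = {"A": 3, "B": 3, "C": 1, "D": 1}


def locf_final(scores, last_seen):
    """Final-visit value under LOCF: the score at the last attended visit."""
    return scores[last_seen]


true_final_mean = mean(s[-1] for s in true_scores.values())
locf_final_mean = mean(
    locf_final(true_scores[p], last_visit[p]) for p in true_scores
)

print(true_final_mean)  # 57
print(locf_final_mean)  # 64
```

In this invented case LOCF understates improvement, because the dropouts' scores are frozen at an early visit. With different assumed trajectories (for example, patients who relapse after stopping treatment) the error would run the other way, so the direction of the distortion cannot be known without the missing outcomes themselves.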


The basic principles of how missing data should be handled depend partly on the reasons for that missingness, as reflected in the statistical relationships between the missing data and the observed data used in the analytic model. Technically, there are three types of missing data: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR); the last is also known as “nonignorable” or “informative” missingness.
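The three mechanisms can be stated compactly in the notation commonly used in the missing-data literature. In the sketch below, R (an indicator of which outcomes are missing) and X (the other variables in the analytic model) are symbols introduced here for illustration; Y_obs and Y_miss are the observed and unobserved outcomes.

```latex
\begin{align*}
\text{MCAR:}\quad & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{miss}}, X) = P(R \mid X) \\
\text{MAR:}\quad  & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{miss}}, X) = P(R \mid Y_{\mathrm{obs}}, X) \\
\text{MNAR:}\quad & P(R \mid Y_{\mathrm{obs}}, Y_{\mathrm{miss}}, X) \text{ depends on } Y_{\mathrm{miss}}
                    \text{ even given } Y_{\mathrm{obs}} \text{ and } X
\end{align*}
```

Read this way, the mechanisms differ only in what the probability of missingness is allowed to depend on: nothing beyond the model covariates (MCAR), the observed outcomes as well (MAR), or the unobserved outcomes themselves (MNAR).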

The first type—MCAR—means that the missingness of the outcome data Y does not depend on either the observed (Yobs) or unobserved (Ymiss) outcomes, after taking into account the other variables included in the analytic model. The mechanism by which this would be produced might
