Appendix D
Analysis and Interpretation of Studies with Missing Data

A characteristic of virtually all studies of posttraumatic stress disorder (PTSD), and of many psychiatric conditions, is a high degree of attrition of participants from assigned treatment, whether that treatment be pharmacologic or psychotherapeutic. Attrition can be caused by the underlying condition and patient characteristics, which make adherence to any form of therapy difficult, or by improvement or worsening of symptoms. High degrees of dropout are common in studies of a broad range of psychologic conditions. In a review of studies by Khan et al. (2001a,b), dropout rates in trials of antidepressants averaged 37 percent and were similar between treatment and placebo; in trials of antipsychotics they were in the 50–60 percent range, somewhat greater on treatment than on placebo, and intermediate among active controls.

The numbers in the PTSD literature studied here were comparable. The median follow-up in the 37 PTSD pharmacotherapy studies was 74 percent (10th–90th percentiles 58–90 percent), with one not reporting follow-up. The median differential follow-up (treatment−placebo) was −3 percent (10th–90th percentiles −19 percent to +15 percent). For the psychotherapy studies, in the 79 active treatment arms used in 56 studies, the median follow-up was 80 percent (10th–90th percentiles 61–100 percent). The median follow-up in the 32 minimal care and wait-list arms was 94 percent (10th–90th percentiles 79–100 percent). The median differential follow-up among the 13 trials without a minimal care arm was zero (interquartile range −6 percent to +11 percent). Among the 32 studies with a minimal care or wait-list arm, the median differential follow-up (treatment−minimal care) was −6 percent (10th–90th percentiles −26 percent to +3 percent).








If outcome data are not obtained from patients who drop out of treatment, those patients' outcomes will be missing. It is critical to recognize that dropout from treatment does not have to produce missing outcome data. Outcome data can still be obtained from subjects who discontinue treatment, so missing data are partly a product of study design (e.g., a failure to follow up patients who stop treatment) and are not an inevitable result of a condition, treatment, or behavior (Lavori, 1992). This was shown in studies of PTSD treatment by Schnurr et al. (2003, 2007), which successfully obtained outcome measurements from a large fraction of participants who discontinued treatment. Very few of the studies examined here obtained outcome information after a patient stopped treatment or during post-treatment follow-up. Because a very high percentage of patients, from 20 percent to 50 percent, typically dropped out of these studies, large fractions of outcome data were therefore missing.

The most common way this was handled in the literature reviewed was to use the last recorded outcome as the final outcome from a patient who dropped out—the "last observation carried forward" (LOCF) approach. The motivation for this statistical approach is understandable: to include as many patients as possible in the final analysis, and to use as much information as possible from every patient. Unfortunately, although the LOCF approach uses "all available data," it does so in a way that typically produces improper answers. For that reason, it has long been rejected as a valid method of handling missing data by the statistical community, even as its use has remained prevalent in various domains of research.
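Mechanically, LOCF is simple, which helps explain its persistence. The following minimal sketch (our own illustration, with hypothetical CAPS scores) shows what the procedure actually does: every visit after dropout is filled with the last score that was recorded.

```python
# A minimal sketch of "last observation carried forward" (LOCF): a
# subject's visits after dropout (None) are filled with the last score
# actually recorded. The scores here are hypothetical CAPS values.
def locf_fill(scores):
    filled, last = [], None
    for s in scores:
        if s is not None:
            last = s
        filled.append(last)
    return filled

completer = [80, 70, 60, 55]        # assessed at every visit
dropout = [80, 65, None, None]      # left the study after visit 2
print(locf_fill(dropout))           # → [80, 65, 65, 65]
```

The analysis then treats 65 as the dropout's final outcome, i.e., it assumes the score would not have changed after the last visit; that "constant profile" assumption is the root of the problems discussed below.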
Statisticians recommend a wide array of more appropriate, albeit technically more complex, methods that have been in existence for decades and can now be implemented in standard software (Schafer and Graham, 2002; Mallinckrodt et al., 2003; Molenberghs et al., 2004; Leon et al., 2006; Little and Rubin, 2002).

PROPERTIES OF MISSING DATA: REASONS FOR MISSINGNESS

The basic principles of how missing data should be handled depend partly on the reasons for that missingness, as reflected in the statistical relationships between the missing data and the observed data used in the analytic model. Technically, there are three types of missing data: missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR); the last is also known as "nonignorable" or "informative" missingness. The first type—MCAR—means that the missingness of the outcome data Y does not depend on either the observed (Yobs) or unobserved (Ymiss) outcomes, after taking into account the other variables included in the analytic model. The mechanism by which this would be produced might

be some administrative or conduct process, wherein the discontinuation of treatment, or the failure to gather data, has nothing to do with a subject's clinical course. Under this scenario, complete case analysis is unbiased, because complete cases constitute a representative sample of the study population. However, complete case analysis is inefficient in that it does not make use of the interim information from subjects without final outcome data. Interestingly, even in this situation where completers represent a completely random representative sample, LOCF is generally biased, because of its assumption that disease severity remains unchanged from its last recorded value (Molenberghs et al., 2004).

The second kind of missing data—MAR—occurs when, conditional on the independent variables in the analytic model, the missingness depends on the observed values of the outcome being analyzed (Yobs) but does not depend on the unobserved values (Ymiss). It is thus similar to MCAR, except that a subject's observed disease severity affects the likelihood of subsequent dropout. It assumes that the average future behavior of all individuals with the same characteristics and clinical course up to a given time will be the same, regardless of whether their outcome data are missing after that time. The best approach to this kind of missing data involves forms of data imputation or modeling that take into account all the observed data up to the point of dropout. These techniques include mixed model repeated measures (MMRM), multiple imputation, and random regression or hierarchical regression models (Molenberghs et al., 2004; Schafer and Graham, 2002). Both complete case analysis and LOCF perform suboptimally in this situation: the former because it does not use the information from patients with incomplete data at all, and LOCF because it does not use that information properly.
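A toy simulation (our own, with hypothetical numbers) makes the MAR case concrete: every subject truly improves by about 10 points between an interim and a final assessment, but subjects with worse observed interim scores drop out more often. Complete case analysis and LOCF both misestimate the final mean, while an imputation that models the interim-to-final change from the observed data recovers it.

```python
import random

random.seed(1)

# Sketch of MAR missingness: each subject improves ~10 points between
# an interim and a final visit, but subjects with a worse (higher)
# interim score drop out more often, so missingness depends only on
# the *observed* interim value. All numbers are hypothetical.
interims, finals, seen = [], [], []
for _ in range(20000):
    interim = random.gauss(60, 10)
    final = interim - 10 + random.gauss(0, 5)
    drop_p = 0.6 if interim > 60 else 0.1        # MAR dropout rule
    interims.append(interim)
    finals.append(final)
    seen.append(random.random() > drop_p)        # True if final observed

mean = lambda xs: sum(xs) / len(xs)
true_mean = mean(finals)                         # target of inference
complete_case = mean([f for f, s in zip(finals, seen) if s])
locf = mean([f if s else i for i, f, s in zip(interims, finals, seen)])
# An MAR-appropriate fix: estimate the interim-to-final change from the
# completers and apply it to the dropouts' observed interim scores.
change = mean([f - i for i, f, s in zip(interims, finals, seen) if s])
imputed = mean([f if s else i + change
                for i, f, s in zip(interims, finals, seen)])
print(round(true_mean), round(complete_case), round(locf), round(imputed))
```

In this setup LOCF overstates the final mean (dropouts are frozen before their improvement completes), complete case analysis understates it (the sicker subjects are missing), and the change-based imputation lands near the truth, illustrating why MAR-aware methods are preferred.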
Finally, data that are missing "not at random" (MNAR) are data whose values are not predictable from the observed data of other patients who completed the trial and from the data on the patient in question up until the point of dropout. An example of this is a patient who drops out due to an unrecorded relapse after apparently doing well, or a patient who drops out because of side effects, whose tolerance might be reduced when their PTSD is worse. Because missingness of the data is related to the value of the unobserved data, this kind of data is called "informatively" or "nonignorably" missing. This condition by definition cannot be ascertained from the observed data, yet most missing data methods take as their assumption that it does not exist. The higher the proportion of outcome data that are missing, the more the validity of any analysis rests on this unverifiable assumption, and the less reliable the results from any method. MNAR missingness can be dealt with only via sensitivity analysis or, better, by learning something about the reasons for the dropouts using information external to the data in hand. If the data allow, studying the characteristics and intermediate outcomes of patients

with different patterns of dropout can also be informative (Mallinckrodt et al., 2004; Schafer and Graham, 2002).

Several key points arise from these definitions. Most importantly, the characterization of the missingness mechanism does not rest on the data alone; it involves both the data and the model used to analyze the data. Consequently, missingness that might be MNAR given one model could be MAR or MCAR given another. Therefore, statements about the missingness mechanism cannot be interpreted without reference to what other variables are included in the analytic model.

Such subtleties can be easy to overlook in practice, leading to misunderstanding about missing data and its consequences. For example, when dropout rates differ by treatment group, it can be said that dropout is not random. But it would be incorrect to conclude that the missingness mechanism giving rise to the dropout is MNAR and that analyses assuming MCAR or MAR would be invalid. Although dropout is not completely random in the simplest sense, if dropout depends only on treatment, and treatment is included in the analytic model, the mechanism giving rise to the dropout is MCAR.

ISSUES WITH LAST OBSERVATION CARRIED FORWARD APPROACHES TO MISSING DATA

We focus here on the problems created by using the LOCF approach to handling missing data, which was the most widely used approach in the literature reviewed. The problems with the LOCF approach are severalfold, deriving from a variety of unlikely assumptions (Molenberghs et al., 2004):

(1) A patient's outcome value would not have changed between the time of its last recorded value and the time of last possible follow-up (the "constant profile" assumption).
• This not only possibly misrepresents what the final outcome would have been, but also makes it appear as though we can be as certain about the missing outcomes of dropouts as we are about the outcomes of subjects who were actually measured. This makes the precision of the final estimates higher than is justified by the data.
(2) Nothing about the patient, or the patient's course preceding the dropout, is informative about the course after the point of dropout.
• It is quite often the case that those who drop out differ from those who remain, either at baseline or in their subsequent course. Because LOCF ignores this information, its predictions

are more likely to be wrong than those of other methods that take those data into account. In this sense, LOCF does not actually use "all the data."
(3) The dropout itself is not informative about a patient's ultimate outcome.
• This assumption fails when patients who are either responding, or not responding, preferentially drop out, and that difference is not reflected in anything already measured about the patient (e.g., a patient feeling better, or worse, right before dropping out).

These three factors—false certainty about the missing outcome, ignoring relevant information about the missing outcome, and assuming that dropout itself is not related to outcome—conspire to make LOCF a misleading statistical approach to handling missing data. There is an extensive treatment of this subject in the statistical, medical, and psychiatric literature going back decades (Gueorguieva and Krystal, 2004; Lavori, 1992; Leon et al., 2006; Little and Rubin, 2002; Mallinckrodt et al., 2003; Schafer and Graham, 2002). We summarize here the background for our judgments about the difficulties in deriving inferences from studies that used LOCF in the presence of high proportions (e.g., greater than 30 percent) of missing data.

Although it is sometimes stated that an LOCF analysis will be "conservative," meaning biased toward a null effect, this is not generally true. The approach can introduce a bias in any direction, depending on the trajectory of disease severity in the arms being compared, the reasons for and degrees of dropout, and the other factors included in the models. All of these components interact, so neither the magnitude nor the direction of bias can be easily predicted. Moreover, the precision of any estimated effect is always overstated, even when no bias is introduced into the estimate of effect. Mallinckrodt et al. (2003) described conditions that produce bias.
Holding all other factors constant, LOCF approaches will:
• overestimate a drug's advantage when dropout is higher in the comparator arm and underestimate the advantage when dropout is lower in the comparator arm;
• overestimate a drug's advantage when the advantage peaks at intermediate time points and underestimate the advantage when the advantage increases over time; and
• have a greater likelihood of overestimating a drug's advantage when the advantage is small.
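The first of these conditions can be illustrated with a quick calculation of our own, using hypothetical numbers: both arms truly improve 15 points by endpoint (no real drug advantage), improvement is only 5 points at the interim visit, and LOCF freezes dropouts at that interim value.

```python
# Observed mean change under LOCF in one arm: completers contribute
# their true final change, dropouts their carried-forward interim
# change. All numbers are hypothetical CAPS change scores.
def locf_mean_change(final=-15.0, interim=-5.0, dropout=0.0):
    return (1 - dropout) * final + dropout * interim

drug = locf_mean_change(dropout=0.25)      # 25% dropout on the drug
placebo = locf_mean_change(dropout=0.50)   # 50% dropout on comparator
print(drug, placebo, drug - placebo)       # → -12.5 -10.0 -2.5
```

Because lower CAPS change is better, the resulting 2.5-point difference in favor of the drug is wholly spurious, manufactured by the higher dropout in the comparator arm.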

In scenarios in which the overall tendency is for patients to worsen, the above biases are reversed.

LOCF analyses can be biased under all reasons for missingness; the bias generally increases as the dropout rate increases and becomes more differential between groups. The artificially high precision of LOCF estimates also becomes more serious as the dropout rate increases. This does not mean that analyses with LOCF are "invalid" in a binary sense, but rather that the quality of the evidence they provide becomes weaker as dropout rates rise and as the underlying assumptions become harder to confirm from the data.

It is difficult to quantify in a simple manner the relationship between the dropout rate and the degree of bias introduced by LOCF, because that bias depends on a number of things besides the dropout rate: the clinical course of untreated patients over time, the time course of the therapeutic effect, the relationship between the interim measurement and the final measurement, and the nature of the outcome measurement (e.g., percentage of "success" versus disease severity). In a comprehensive treatment of the subject, Molenberghs et al. (2004) present equations that allow us to calculate the degree of bias produced by LOCF in a continuous measure of disease severity in the simple situation where each subject is assessed once halfway through treatment and again at the end. It is assumed that everyone has an intermediate measurement, but that a certain percentage in each group drops out before a final value is measured. Table D-2 shows the degree of bias for the scenarios presented in Table D-1, under equal dropout rates, which is generally the most favorable scenario for the use of LOCF.
We see from these tables that neither the degree nor the direction of the bias caused by LOCF is immediately apparent from the underlying treatment effects and trends, and that the bias increases as the follow-up rate decreases (i.e., as dropouts increase). What is not included here are simulations related to the overstated precision of the estimates; it is possible that even when the effect size is understated, the statistical significance is overstated, if the standard error decreases proportionally more than the effect size.

These scenarios are merely demonstrative and not meant to be representative of the literature studied herein, although many are plausible PTSD treatment patterns. It is calculations such as these, and more intensive and detailed simulations, that lead statisticians to view LOCF as problematic for most situations (Cook et al., 2004; Mallinckrodt et al., 2004; Molenberghs et al., 2004), particularly when the rate of missingness exceeds 30–40 percent. With proper methods such as MMRM or multiple imputation, to the extent that the MAR assumption is met, there is minimal bias. However, at high levels of dropout even these methods become more heavily dependent on the unverifiable MAR assumption. Not all of the scenarios reported in Table D-1 follow a MAR pattern.
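Under this simple two-assessment setup, the Table D-2 percentages can be reproduced directly from the Table D-1 trajectories. The sketch below is our own; the one assumption we add is that the natural disease course accrues linearly (half of it by the interim visit), which matters only for Scenario 5.

```python
import math

# Per Table D-1 scenario: (completer interim, completer final,
#                          dropout interim, dropout final,
#                          natural course at end)
SCENARIOS = {
    1: (-15, -15, -15,   0,  0),
    2: (-10, -15, -10,  -5,  0),
    3: (-10, -10,   0, -10,  0),
    4: ( -5, -15,  -5, -15,  0),
    5: ( -5, -10,  -5, -10, -5),
    6: (-15, -15,  -5,   0,  0),
}

def locf_bias_pct(scenario, f):
    """Percent bias of the LOCF effect estimate at follow-up rate f.

    Negative values overstate the benefit (lower CAPS-2 is better)."""
    ci, cf, di, df, nat = SCENARIOS[scenario]
    true = f * cf + (1 - f) * df - nat            # true net final effect
    # LOCF freezes dropouts (in both the treated and natural-course
    # trajectories) at their interim values.
    locf = f * cf + (1 - f) * di - (f * nat + (1 - f) * nat / 2)
    pct = 100 * (locf - true) / abs(true) if true else 0.0
    # Round half away from zero, as the published table does.
    return int(math.copysign(math.floor(abs(pct) + 0.5), pct))

for f in (1.0, 0.9, 0.8, 0.7, 0.6, 0.5):
    print(f, [locf_bias_pct(s, f) for s in sorted(SCENARIOS)])
```

Each printed row matches the corresponding row of Table D-2, which also makes the sign behavior easy to explore: flipping a trajectory flips the direction of the bias.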

TABLE D-1 Various Hypothetical Patterns of PTSD Scores (CAPS-2) in an Idealized Study with Two On-Treatment Measures: One Interim, One Final

                 Baseline   Interim Effect   Final Effect   Natural Disease Course

Scenario 1. Completers: interim benefit, sustained. Dropouts: interim benefit, nonsustained. LOCF bias: 0–100% overstated benefit.
  Completers        75          –15              –15               0
  Dropouts          75          –15                0               0

Scenario 2. Completers: interim benefit, increasing. Dropouts: interim benefit, decreasing. LOCF bias: 0–25% overstated benefit.
  Completers        75          –10              –15               0
  Dropouts          75          –10               –5               0

Scenario 3. Completers: early sustained benefit. Dropouts: deferred benefit, equal to completers. LOCF bias: 0–50% understated benefit.
  Completers        75          –10              –10               0
  Dropouts          75            0              –10               0

Scenario 4. Completers: less severe than dropouts; interim benefit, increasing. Dropouts: identical benefit. LOCF bias: 0–33% understated benefit.
  Completers        75           –5              –15               0
  Dropouts          90           –5              –15               0

Scenario 5. Completers: steadily increasing benefit, with equal natural improvement. Dropouts: identical to completers. LOCF bias: 0–25% understated benefit.
  Completers        75           –5              –10              –5
  Dropouts          75           –5              –10              –5

Scenario 6. Completers: early large benefit, sustained. Dropouts: no lasting effect, some early benefit. LOCF bias: 0–33% overstated benefit.
  Completers        75          –15              –15               0
  Dropouts          75           –5                0               0

NOTE: True underlying patterns for completers and non-completers are listed. "Natural disease course" is the temporal trend in both groups. Negative values represent improvement.

TABLE D-2 Degree of Bias Induced by LOCF Analysis Under the Above Scenarios

Follow-up   Scenario 1   Scenario 2   Scenario 3   Scenario 4   Scenario 5   Scenario 6
  1.0             0            0            0            0            0            0
  0.9           –11           –4           10            7            5           –4
  0.8           –25           –8           20           13           10           –8
  0.7           –43          –13           30           20           15          –14
  0.6           –67          –18           40           27           20          –22
  0.5          –100          –25           50           33           25          –33

NOTE: Follow-up is equal in each group. Negative bias represents overstatement of the observed effect, since lower CAPS-2 scores represent clinical improvement. The biases are percentages of the true final effect size. For example, if a therapy produced on average a 15-point reduction in the CAPS score, an LOCF-based estimate of a 10-point reduction would represent a bias of 33%, and an estimated 30-point reduction would represent a bias of –100%.

It is for these kinds of reasons that reviews and consensus papers from researchers with academic affiliations (Gueorguieva and Krystal, 2004; Lieberman et al., 2005), consensus papers from a mix of academic and industry researchers (Leon et al., 2006; Mallinckrodt et al., 2004), and statistics textbooks (Little and Rubin, 2002; Molenberghs and Kenward, 2007; Verbeke and Molenberghs, 2000) have all recommended that analyses of longitudinal clinical trial data move away from simple methods such as LOCF or observed-case analysis to more principled approaches, such as multiple imputation or the likelihood-based family in which MMRM resides. These are the foundations of our recommendations that the analytic treatment of missing data, and the effort to gain outcome information from subjects who drop out of PTSD treatment studies, need to be greatly strengthened.
They have also guided us in our assessment of the quality of studies: if the dropout rate was high (particularly exceeding 30 percent), if the differential dropout between arms was high (particularly exceeding 15 percent), and if LOCF was used to address dropouts, then the evidence from otherwise well-designed and well-executed studies was considered lower in quality.

REFERENCES

Cook, R. J., L. Zeng, and G. Y. Yi. 2004. Marginal analysis of incomplete longitudinal binary data: A cautionary note on LOCF imputation. Biometrics 60:820-828.
Gueorguieva, R., and J. H. Krystal. 2004. Move over ANOVA: Progress in analyzing repeated-measures data and its reflection in papers published in the Archives of General Psychiatry. Archives of General Psychiatry 61:310-317.

Khan, A., S. R. Khan, R. M. Leventhal, and W. A. Brown. 2001a. Symptom reduction and suicide risk in patients treated with placebo in antidepressant clinical trials: A replication analysis of the Food and Drug Administration database. International Journal of Neuropsychopharmacology 4:113-118.
Khan, A., S. R. Khan, R. M. Leventhal, and W. A. Brown. 2001b. Symptom reduction and suicide risk among patients treated with placebo in antipsychotic clinical trials: An analysis of the Food and Drug Administration database. American Journal of Psychiatry 158:1449-1454.
Lavori, P. W. 1992. Clinical trials in psychiatry: Should protocol deviation censor patient data? Neuropsychopharmacology 6:39-48; discussion 49-63.
Leon, A. C., C. H. Mallinckrodt, C. Chuang-Stein, D. G. Archibald, G. E. Archer, and K. Chartier. 2006. Attrition in randomized controlled clinical trials: Methodological issues in psychopharmacology. Biological Psychiatry 59:1001-1005.
Lieberman, J. A., J. Greenhouse, R. M. Hamer, K. R. Krishnan, C. B. Nemeroff, D. V. Sheehan, M. E. Thase, and M. B. Keller. 2005. Comparing the effects of antidepressants: Consensus guidelines for evaluating quantitative reviews of antidepressant efficacy. Neuropsychopharmacology 30:445-460.
Little, R. J. A. 1994. A class of pattern-mixture models for normal incomplete data. Biometrika 81:471-483.
Little, R. J. A., and D. Rubin. 2002. Statistical analysis with missing data. New York: Wiley.
Mallinckrodt, C. H., T. M. Sanger, S. Dube, G. Molenberghs, W. Potter, T. Sanger, and G. Tollefson. 2003. Assessing and interpreting treatment effects in longitudinal clinical trials with missing data. Biological Psychiatry 53:754-760.
Mallinckrodt, C. H., C. J. Kaiser, J. G. Watkin, G. Molenberghs, and R. J. Carroll. 2004. The effect of correlation structure on treatment contrasts estimated from incomplete clinical trial data with likelihood-based repeated measures compared with last observation carried forward ANOVA.
Clinical Trials 1:477-489.
Molenberghs, G., and M. G. Kenward. 2007. Missing data in clinical studies. Chichester, England: John Wiley & Sons.
Molenberghs, G., H. Thijs, I. Jansen, and C. Beunckens. 2004. Analyzing incomplete longitudinal clinical trial data. Biostatistics 5:445-464.
Schafer, J. L., and J. W. Graham. 2002. Missing data: Our view of the state of the art. Psychological Methods 7:147-177.
Schnurr, P., M. Friedman, D. Foy, M. Shea, F. Hsieh, P. Lavori, S. Glynn, M. Wattenberg, and N. Bernardy. 2003. Randomized trial of trauma-focused group therapy for posttraumatic stress disorder: Results from a Department of Veterans Affairs cooperative study. Archives of General Psychiatry 60(5):481-489.
Schnurr, P. P., M. J. Friedman, C. C. Engel, E. B. Foa, M. T. Shea, B. K. Chow, P. A. Resick, V. Thurston, S. M. Orsillo, R. Haug, C. Turner, and N. Bernardy. 2007. Cognitive behavioral therapy for posttraumatic stress disorder in women: A randomized controlled trial. Journal of the American Medical Association 297(8):820-830.
Verbeke, G., and G. Molenberghs. 2000. Linear mixed models for longitudinal data. New York: Springer.
