Introduction
Cognitive therapy (CT) and interpersonal psychotherapy (IPT), the two best studied and commonly practiced psychological interventions for the treatment of major depressive disorder (MDD), have been shown to be effective treatments for many depressed patients (Cuijpers et al. 2011; Barth et al. 2013; Cuijpers et al. 2013a; Cuijpers et al. 2016). With initial response rates up to 60%, they have been shown to be at least as efficacious as antidepressant medications (ADM) in the acute phase of the disorder (Cuijpers et al. 2013b, 2013c). However, even when treated effectively in the acute phase (Footnote 1), depression has an unfavorable prognosis. It is estimated that at least 50% of those who recover from a first episode of MDD will have one or more additional episodes later on in life, and the risk of recurrence progressively increases with each additional episode (Solomon et al. 2000; Burcusa and Iacono, 2007; Eaton et al. 2008). It is therefore important that treatments not only reduce symptoms in the acute phase but also produce enduring effects.
CT has consistently been shown to have an enduring effect that lasts beyond the end of treatment, with survival rates higher than those associated with (prior) pharmacological treatment (Footnote 2) (Vittengl et al. 2007; Cuijpers et al. 2013b). Research in IPT is less extensive. Even though IPT has been shown to prevent relapse and recurrence when continued or maintained (Cuijpers et al. 2016), only one older study has examined whether it has a prophylactic effect following treatment termination (Shea et al. 1992). This was the follow-up to the NIMH Treatment of Depression Collaborative Research Program (TDCRP), a placebo-controlled randomized comparison among CT, IPT, and ADM that found comparable rates of relapse between prior IPT and prior CT (33% v. 36%) that were each non-significantly lower than prior ADM (50%). These findings must be interpreted with caution, since sample sizes were small, ADM was continued for 6 months following the end of acute treatment, and the difference between prior CT and prior ADM was among the smallest reported in the literature, but they are suggestive of a possible enduring effect for prior IPT. Additional research into the extent to which the effects of IPT persist following the cessation of treatment is needed.
Recently, we conducted a large randomized controlled trial (RCT) investigating the effects of individual CT and IPT for adult depression, primarily designed to compare long-term outcomes of both therapies in a research-oriented routine clinical setting (Lemmens et al. 2011, 2015, 2017). CT and IPT were both superior to a waiting-list control (WLC) condition over the first 2 months of treatment, and did not differ from one another across the rest of the 7-month treatment phase (Lemmens et al. 2015), as has been the case in most acute phase comparisons between the two modalities (Jakobsen et al. 2012). We now report on the long-term outcomes of these two interventions over the next 17 months, through the end of 24 months post-randomization. We expected depression scores to be relatively stable across the follow-up period. Furthermore, we expected relapse rates in CT to be similar to those reported in previous studies (approximately 30%). Following earlier findings (Shea et al. 1992), one would not expect large differences between CT and IPT. However, since CT has a stronger tradition in focusing on relapse prevention compared with IPT, we expected that CT would do somewhat better.
Methods
Design and participants
Data come from a single-center RCT (parallel group design) into the clinical effects and mechanisms of change of individual CT and IPT for MDD. In this study, 182 depressed adults were randomly allocated to CT (n = 76), IPT (n = 75), or a 2-month WLC condition followed by treatment of choice (n = 31). In the present study, we only included the patients who were assigned to one of the two active conditions (CT and IPT) and who provided data at post-treatment (month 7; n = 134; CT = 69, IPT = 65; henceforth total sample).
Details concerning study design, participants, interventions, and acute outcomes have been fully described elsewhere (Lemmens et al. 2011, 2015), and are therefore only briefly summarized here. Participants were adult outpatients referred to the mood disorder treatment program of the Academic Community Mental Health Centre Maastricht. All patients had a primary diagnosis of MDD, as ascertained by the Dutch version of the Structured Clinical Interview for DSM-IV Axis I disorders (SCID-I; First et al. 1997). Further inclusion criteria were: internet access, an e-mail address, and sufficient knowledge of the Dutch language. Patients receiving ADM or other psychological treatment at baseline were excluded from the study, as were those at imminent risk for suicide. Other exclusion criteria were: bipolar or chronic depression (current episode >5 years), IQ lower than 80, and substance abuse/dependence.
All participants provided informed consent. Randomization took place via computer-generated block randomization (10:10:4) and was pre-stratified according to the presence or absence of prior episodes. The random allocation sequence was generated by an independent computer scientist and was concealed from the researchers. Blinding of patients and therapists for treatment condition was not possible. As outlined elsewhere (Lemmens et al. 2011), sample size calculations were based on long-term expectations of CT v. IPT. An a priori power analysis indicated that 75 patients per arm in the active conditions (taking 15% attrition into account) would provide 80% power (two-tailed α = 0.05) to detect an expected 20% difference in the relapse rate between CT and IPT (favoring CT) at the end of the follow-up period.
Treatment consisted of 16–20 individual 45-min sessions. The CT protocol followed the guidelines laid out by Beck et al. (1979) and included homework assignments. The IPT protocol was based on the manual by Klerman et al. (1984). Therapists were uniquely assigned to one of the treatment conditions to prevent contamination. All therapists had several years of clinical experience in the field of depression and with the assigned intervention. Prior to the study, all therapists received 16 hours of additional training by experts in the field. During the study, therapists and researchers met biweekly in consultation sessions to discuss their caseloads (separate sessions for CT and IPT). The study was approved by the Maastricht University's Ethical Board, and is registered at the Netherlands Trial Register, part of the Dutch Cochrane Centre (ISRCTN67561918). Patients completed an average of 17 therapy sessions (s.d. = 2.9). Independent assessors rated the quality of therapy, measured with the Cognitive Therapy Scale (Dobson et al. 1985) for CT and the Short version of the IPT Adherence and Quality Scale (Stuart, 2011) for IPT, as being ‘(very) good’ to ‘excellent’ in both conditions (see Lemmens et al. 2015 for more details). Furthermore, significant differences in therapy-specific behavior between conditions were found (as measured with the Collaborative Study Psychotherapy Rating Scale, version 6; Hollon et al. 1984; Hollon et al. 1988), indicating that therapists adhered to the protocol.
Both treatments led to considerable improvement in depressive symptom severity as measured with the Beck Depression Inventory II (BDI-II; Beck et al. 1996: pre–post-treatment effect size d = 1.72 in the pooled active conditions). Response to the therapy exceeded response in the WLC condition. No differential effects between the active treatments were found (Lemmens et al. 2015).
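The allocation scheme described above can be illustrated with a short sketch. This is not the trial's actual randomization software: the block length of 24 (10 CT, 10 IPT, 4 WLC), the stratum names, the function names, and the seed are all assumptions made for illustration, based only on the 10:10:4 ratio and the stratification by prior episodes reported in the text.

```python
import random
from collections import Counter

# Allocation ratio reported in the text (CT : IPT : WLC = 10 : 10 : 4).
# Treating one block as exactly 24 slots is an assumption.
BLOCK = ["CT"] * 10 + ["IPT"] * 10 + ["WLC"] * 4


def make_sequence(n_blocks, rng):
    """Return an allocation list built from independently shuffled blocks."""
    sequence = []
    for _ in range(n_blocks):
        block = BLOCK[:]       # copy so the template stays untouched
        rng.shuffle(block)     # random order within the block
        sequence.extend(block)
    return sequence


def stratified_sequences(n_blocks, seed):
    """Build one sequence per stratum (hypothetical stratum labels)."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return {
        stratum: make_sequence(n_blocks, rng)
        for stratum in ("first_episode", "recurrent")
    }


sequences = stratified_sequences(n_blocks=4, seed=2011)
first_block = sequences["recurrent"][:24]
print(Counter(first_block))  # every complete block holds 10 CT, 10 IPT, 4 WLC
```

Because each block is balanced, the arms stay close to the 10:10:4 ratio within each stratum at any point during recruitment, which is the usual reason for combining blocking with stratification.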
Outcomes
Self-reported depression severity
Self-reported depressive symptom severity was measured with the BDI-II. The BDI-II is a 21-item questionnaire with strong psychometric properties (Beck et al. 1996; Van der Does, 2002). Items are rated on a four-point Likert scale (0–3), with higher scores indicating higher levels of depression severity (range 0–63).
Clinician-rated depression severity
The MDD section of the Longitudinal Interval Follow-up Evaluation (LIFE; Keller et al. 1987), a semi-structured interview for assessing the longitudinal course of psychiatric illness using a retrospective rating system, was used to obtain a clinician-rated measure of depression. The LIFE uses DSM-IV diagnostic criteria to classify depression retrospectively over the course of a pre-determined follow-up period (in our case 1 year; see further). Ratings are made on a six-point scale, ranging from meeting the full criteria of MDD (ratings of 5 or 6) to no residual symptoms (rating of 1). The LIFE has been shown to be a reliable and valid instrument for identifying the course of several mental disorders examined retrospectively over the period of 1 year (Warshaw et al. 1994; Warshaw et al. 2001).
Procedure
BDI-II assessments were completed at post-treatment (month 7), monthly thereafter for the next 5 months (month 7–12), and then again at the end of the follow-up (month 24). All assessments were administered on a computer. The post-treatment assessment took place at the research center (Maastricht University). All other assessments were administered online. The LIFE interview took place after the 24-month assessment and addressed retrospectively the period between 12 and 24 months (Footnote 3). A rating was made for each 2-week period between months 12 and 24, resulting in a total of 26 retrospective observations. A schematic overview of the study design and the data points used in this study can be found in Fig. 1.
The majority of LIFE interviews (90%) were administered by a clinical psychology graduate student. The remaining 10% were administered by a resident in psychiatry. Both LIFE assessors had several years of clinical experience in the field of depression. Prior to the study, assessors studied relevant literature, the original set of LIFE training materials, and the detailed instruction manual that was developed for the current study. Furthermore, they conducted several pilot interviews to familiarize themselves with the rating system. During the study, regular consensus sessions took place, in order to discuss interpretation and pitfalls. The interviewers were blind to condition and to treatment adherence, satisfaction, and outcome. Interviews took place face-to-face or by telephone (Footnote 4) and ratings were made after the interview. Ratings made by the psychiatric resident were discussed with the other rater until consensus was reached.
Since patients were free to pursue additional treatment during the follow-up phase, we assessed whether patients received additional psychological support for MDD (conservatively defined as having one or more sessions with a general practitioner (GP) or a mental health care professional for depressive symptoms) or used ADM (⩾2 weeks) throughout the 17-month follow-up period. Information on health care status was obtained during the LIFE interview and at the 12- and 24-month assessments (Footnote 5) with the periodic retrospective health care consumption questionnaire (de Graaf et al. 2008).
Definition of response and relapse
When investigating the clinical course of a disorder after acute phase treatment, one needs to carefully consider the definitions of response and relapse. Response is often conceptualized as a pre-determined change score representing a clinically significant improvement over the course of treatment (Jacobson and Truax, 1991). Even though this method is useful in the majority of cases, in some cases it leads to somewhat peculiar classifications. For example, the approach does not count patients who reach remission without the necessary drop in symptoms as treatment responders (e.g. a drop from 14 to 8 on the BDI-II). Furthermore, it includes patients who do show a clinically significant improvement, but still report high depression scores at the end of treatment (e.g. a drop from 61 to 48), thereby indicating that treatment had some effect, but worked insufficiently to reach (partial) remission. In order to take these variations into account, we defined response to treatment as either (1) a post-treatment BDI-II score lower than 10 (the cut-off for remission in our trial determined with the method of Jacobson and Truax, 1991; see Lemmens et al. 2015); or (2) an overall change of at least 9 BDI-II points (the cut-off for reliable change in our trial determined with the method of Jacobson and Truax, 1991; see Lemmens et al. 2015) and a post-treatment BDI-II score lower than 20 (the border between moderate and mild depression on the BDI-II; see Footnote 6). Since the BDI-II and the LIFE assess different aspects of depression (depressive symptom severity v. DSM-IV classification), we formulated two separate definitions for relapse. 
Relapse on the BDI-II was defined as losing ⩾50% of the improvement that occurred over the course of treatment at any point during the follow-up (7, 8, 9, 10, 11, 12, 24 months). This was done to account for individual symptom severity change. Following Hollon et al. (2005), relapse on the LIFE occurred as soon as the patients met full DSM-IV criteria for a depressive episode (rating of 5 or 6) on one of the 26 retrospective data points. Definitions of response and relapse are summarized in Table 1.
Note to Table 1: BDI-II, Beck Depression Inventory-II; DSM-IV, Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition; MDD, Major Depressive Disorder.
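The response and relapse definitions given in the text can be written out as a minimal sketch. The cut-offs (10, 9, 20, and LIFE ratings of 5 or 6) come from the text; the function names, the use of `None` for a missing assessment, and the handling of patients with no pre-to-post improvement are assumptions made for illustration.

```python
REMISSION_CUTOFF = 10    # post-treatment BDI-II below this counts as remission
RELIABLE_CHANGE = 9      # minimum pre-to-post drop in BDI-II points
MILD_CUTOFF = 20         # post-treatment score must be below this for criterion 2
LIFE_FULL_CRITERIA = 5   # LIFE ratings of 5 or 6 mean full DSM-IV criteria are met


def is_responder(pre, post):
    """Response: remission, or reliable change ending below the mild cut-off."""
    remitted = post < REMISSION_CUTOFF
    improved = (pre - post) >= RELIABLE_CHANGE and post < MILD_CUTOFF
    return remitted or improved


def bdi_relapse(pre, post, follow_up):
    """Relapse: any follow-up score at which >= 50% of the treatment gain is lost."""
    gain = pre - post
    if gain <= 0:
        # Assumption: with no improvement there is nothing to lose, so no relapse.
        return False
    threshold = post + 0.5 * gain
    # Missing assessments (None) are skipped rather than imputed.
    return any(score >= threshold for score in follow_up if score is not None)


def life_first_relapse(ratings):
    """Index of the first 2-week period rated 5 or 6, or None if there is none."""
    for index, rating in enumerate(ratings):
        if rating >= LIFE_FULL_CRITERIA:
            return index
    return None


print(is_responder(14, 8))                      # True: remission without a 9-point drop
print(is_responder(61, 48))                     # False: 13-point drop, still severe
print(bdi_relapse(30, 10, [12, None, 19, 21]))  # True: 21 reaches the threshold of 20
print(life_first_relapse([1, 2, 2, 5, 3]))      # 3: fourth 2-week period
```

The first two calls reproduce the two examples from the text (14 to 8 and 61 to 48), which is where a pure change-score definition and the combined definition disagree.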
Data analysis
First, for all patients (n = 134), we mapped out study compliance (LIFE and BDI-II) across the follow-up period (7–24 months), and compared patients with and without complete data on each of the outcome measures in terms of baseline characteristics (gender, age, education level, work- and marital status, first/recurrent depression) and post-treatment BDI-II score. We used χ2 tests for categorical data and independent samples t tests for continuous data. In addition, for each of the outcome measures (BDI-II and LIFE), BDI-II scores of patients with incomplete data were plotted to explore potential patterns in depression severity prior to drop-out. For the BDI-II, reliability at each time point was assessed using Cronbach's α.
After that, we determined the course of self-reported depressive symptom severity after treatment termination, and tested whether one of the treatments was superior to the other across the follow-up period. For this, we used a linear mixed-effects (multilevel) model with repeated BDI-II scores as the dependent variable, and time, condition (CT = −0.5, IPT = 0.5), and the time × condition interaction as the independent variables (Diggle et al. 2002). Because mixed regression takes the nested structure of the data into consideration and can deal with autocorrelation and missing values (see e.g. Schafer and Graham, 2002; Singer and Willett, 2003; Snijders and Bosker, 2012), missing values were not imputed. Since CT and IPT differed in depression severity (BDI-II) and quality of life (EQ5D utility score; EuroQol-Group, 1990) at baseline, albeit not significantly (see Lemmens et al. 2015), we added their standardized baseline scores as covariates to the model. Visual inspection of BDI-II change scores over time showed separate linear patterns for the 7–12 and 12–24 months intervals. Therefore, for the fixed effects, the slopes were modeled separately for each interval (piecewise regression; see online Supplementary material I). An autoregressive covariance structure was applied to factor in the correlation between measurement points. Intercepts and slopes (for the time variable) were allowed to be correlated and to vary randomly over subjects. Robust standard errors were applied. Effect sizes Cohen's d and r were computed from the multilevel estimates. Within-condition change was defined as Cohen's d = (post-treatment mean − mean at time i)/(pooled post-treatment s.d.), with the estimated means derived from the mixed regression analysis. 
Between-group effect sizes were determined by calculating the difference between the within-condition effect sizes of CT and IPT at time i. The r was defined as √(F/(F + df)), with F and df values from the fixed part of the mixed regression analysis. Effect sizes were classified as being small (±0.2), medium (±0.5), and large (±0.8; Cohen, 1988).
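The effect size definitions above amount to a few lines of arithmetic, sketched here. The formulas for d and r follow the text; the degrees-of-freedom weighted pooling of the two post-treatment standard deviations, the CT minus IPT order of the between-group difference, the function names, and all numbers in the usage lines are assumptions for illustration, not values from the trial.

```python
import math


def pooled_sd(sd_a, n_a, sd_b, n_b):
    """Pooled SD of two groups (assumed: weighted by degrees of freedom)."""
    return math.sqrt(
        ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    )


def within_d(mean_post, mean_at_i, sd_pooled_post):
    """Within-condition d: change from post-treatment to time i in SD units."""
    return (mean_post - mean_at_i) / sd_pooled_post


def between_d(d_ct, d_ipt):
    """Between-group d as the difference of within-condition ds (assumed CT - IPT)."""
    return d_ct - d_ipt


def effect_r(f_value, df):
    """Effect size r = sqrt(F / (F + df)) from a fixed-effect F test."""
    return math.sqrt(f_value / (f_value + df))


# Hypothetical numbers, chosen only to show the calculation.
sd_post = pooled_sd(sd_a=10.0, n_a=50, sd_b=10.0, n_b=50)
d_ct = within_d(mean_post=15.0, mean_at_i=13.0, sd_pooled_post=sd_post)
d_ipt = within_d(mean_post=15.0, mean_at_i=14.0, sd_pooled_post=sd_post)
print(d_ct, d_ipt, between_d(d_ct, d_ipt))
print(effect_r(f_value=4.0, df=96))
```

With this sign convention a positive within-condition d means scores fell (improved) after the end of treatment, and a negative d means they rose.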
Subsequently, we calculated response rates at 7 months and compared pre-treatment characteristics (similar to those described above) and post-treatment BDI-II scores between responders and non-responders. After that, we continued our analyses with the responder sample only (responder analysis; n = 85). First, we re-ran the previously described linear mixed-effects model and computed effect sizes r and d. Second, relapse rates (separate analyses for the BDI-II and LIFE) were examined using Cox regression models with condition as an independent variable and standardized baseline BDI-II and EQ5D utility scores as covariates (Cox and Oakes, 1984). The proportional hazard assumption was tested with the Schoenfeld residuals test. Between-condition survival rates were compared using the log-rank test. Drop-outs were censored after the last measurement. In addition, for the BDI-II, we examined the rates of sustained response: the number of patients who responded to treatment and remained well (no relapse) during the follow-up period (see Table 1). Following Hollon et al. (2005), percentages of sustained response rates were adjusted for missing observations and calculated from the ‘baseline’ sample (all patients who were initially assigned to CT or IPT, regardless of their enrollment in the current study; n = 151). By doing this, percentages of sustained response reflect pre-treatment probabilities of enduring treatment effects. CT/IPT differences in sustained response were examined using a χ2 test of independence.
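To make the survival comparison concrete, the following is a minimal standard-library sketch of a two-group log-rank test with right-censoring. It is not the STATA routine used in the study, and it does not cover the Cox models or the Schoenfeld test; the function name and the example data (weeks to relapse or to last measurement) are hypothetical.

```python
import math


def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; events are 1 for relapse, 0 for censored."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e == 1})

    observed_a = expected_a = variance = 0.0
    for t in event_times:
        # Everyone whose time is >= t is still at risk at t
        # (patients censored at t are counted as at risk at t).
        n = sum(1 for u, _, _ in data if u >= t)
        n_a = sum(1 for u, _, g in data if u >= t and g == 0)
        d = sum(1 for u, e, _ in data if u == t and e == 1)
        d_a = sum(1 for u, e, g in data if u == t and e == 1 and g == 0)
        observed_a += d_a
        expected_a += d * n_a / n
        if n > 1:  # hypergeometric variance is undefined for a single subject
            variance += d * (n_a / n) * (1 - n_a / n) * (n - d) / (n - 1)

    chi2 = (observed_a - expected_a) ** 2 / variance if variance > 0 else 0.0
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square upper tail, 1 df
    return chi2, p_value


# Hypothetical data: weeks since baseline; 0 marks a patient censored
# at the last available measurement.
ct_weeks, ct_relapsed = [40, 52, 74, 104, 104], [1, 0, 1, 0, 0]
ipt_weeks, ipt_relapsed = [35, 44, 60, 104, 104], [1, 1, 1, 0, 0]
chi2, p_value = log_rank(ct_weeks, ct_relapsed, ipt_weeks, ipt_relapsed)
print(f"log-rank chi2 = {chi2:.2f}, p = {p_value:.2f}")
```

Censored patients contribute to the at-risk counts up to their last measurement and never count as events, which mirrors how drop-outs are handled in the text.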
Finally, sensitivity analyses were performed on all models by adding the following variables sequentially as (centered) covariates to each model: gender, work- and marital status, number of sessions received in acute phase, therapist, and additional psychological support for MDD, and/or use of ADM in the follow-up period. Multilevel analyses were carried out in SPSS (version 21). Other analyses were performed in STATA (version 13.1). All effects were tested at the p < 0.05 level (two-tailed).
Results
Patient flow and attrition
Of the 134 patients who were enrolled in the current study, 119 (88.8%) completed all BDI-II assessments (Footnote 7). Baseline characteristics of patients with incomplete data on the BDI-II [eight in CT and seven in IPT: χ2 (1, 134) = 0.02, p = 0.88] were not significantly different from those with complete data (all p > 0.23). In addition, exploration of the course of depression indicated no distinctive patterns of findings between patients with incomplete and complete BDI-II data. The depression severity pattern for patients with missing BDI-II data appeared to be random. A total of 98 LIFE interviews were administered. Among the 36 patients whose LIFE interview was missing, six dropped out in an earlier phase of the study, 14 could not be reached and did not respond to contact requests, and 16 indicated that they no longer wanted to participate. Furthermore, LIFE data of two patients were incomplete. The conditions did not differ with respect to whether there were (complete) LIFE data [20 in CT v. 18 in IPT: χ2 (1, 134) = 0.03, p = 0.87]. Relative to those with complete LIFE data, patients without (complete) LIFE data were significantly more likely to have an intermediate (vocational) level of education [χ2 (2, 134) = 8.62, p = 0.01]. Furthermore, they reported higher post-treatment BDI-II scores [M = 18.7 (s.d. = 14.7) v. M = 13.3 (s.d. = 10.6); t (132) = 2.33, p = 0.02]. In addition, patients with incomplete LIFE data had somewhat higher BDI-II scores at all time points as compared with patients with complete LIFE data, indicating a more severe pattern of depression in general. BDI-II reliability coefficients ranged from α = 0.96 (at 7 and 8 months) to α = 0.97 (12 months). A total of 54 patients (40.3%; CT = 33, IPT = 21) had one or more sessions with their GP or a mental health care professional during the follow-up period (Footnote 8). Twenty-nine patients (21.6%; CT = 14, IPT = 15) used ADM. 
As some patients received both psychological and pharmacological support, the total number of patients with some form of additional support was 63 [47.0%; CT = 36, IPT = 27; χ2 (1, 128) = 2.01, p = 0.16].
Course of depressive symptomatology in the total sample
Table 2 presents the observed mean BDI-II scores (95% CI) and mixed regression-based estimated means (95% CI) over the course of follow-up for the total sample (n = 134), stratified according to the treatment condition. In addition, the results of the mixed-effects model and effect sizes r and d are reported. As can be seen in Table 2, symptom scores remained stable across the follow-up for both conditions. Effect sizes were small.
Note to Table 2: BDI-II, Beck Depression Inventory-II; CT, cognitive therapy; IPT, interpersonal psychotherapy; 95% CI, 95% confidence interval. Baseline severity is the standardized BDI-II score at baseline; baseline quality of life is the standardized EuroQol (EQ5D) utility score at baseline; condition is CT v. IPT, coded −0.5 and 0.5, respectively; time effects represent the linear trend from 7 to 24 months, with week = 0 at 7 months; year is first v. second year of the study, coded 0 for <12 months and 1 for ⩾12 months. Data were unavailable for 7 (4 CT; 3 IPT), 6 (4 CT; 2 IPT), 6 (3 CT; 3 IPT), 8 (4 CT; 4 IPT), 8 (4 CT; 4 IPT) and 11 (7 CT; 4 IPT) patients at 8, 9, 10, 11, 12, and 24 months, respectively. * d = (M_t7 − M_ti)/SD_pooled,t7; ** effect size r = √(F/(F + df)).
Responder analyses
At post-treatment (7 months), 85 patients (63.4%) met criteria for response. No between-condition differences were found [65.2% in CT v. 61.5% in IPT; χ2 (1, 134) = 0.20, p = 0.66]. Responders did not differ from non-responders at baseline, but reported significantly lower BDI-II scores at post-treatment [M = 7.5 (s.d. = 6.0) v. M = 27.6 (s.d. = 9.0); t (132) = 15.48, p < 0.0001], and fewer received additional support for MDD throughout the follow-up [26/85 responders v. 37/49 non-responders; χ2 (1, 128) = 25.87, p < 0.0001].
Outcomes on the BDI-II
The linear mixed-effects model on the BDI-II for the responder sample (Table 3) revealed significant time × condition and time × year × condition interactions. The time × condition interaction points toward more favorable outcomes for CT up to month 12 (BDI-II scores showed a slight decrease in CT, whereas they increased in IPT), and the negative time × year × condition interaction reflects a subsequent drop in scores for IPT in the second year. At 17 months follow-up, these opposite effects resulted in comparable overall outcomes for CT and IPT. Effect sizes of change throughout the follow-up period (7–24 months) were small for both CT and IPT. Two-thirds of treatment responders (57/85; 67.1%) completed the 17-month follow-up phase without meeting criteria for relapse on the BDI-II. Cumulative survival rates per treatment condition are shown in Figure 2a. Thirteen responders relapsed in CT (28.9%) and 15 in IPT (37.5%). A log-rank test [χ2 (1, 85) = 0.99, p = 0.32] and a Cox regression model (HR = 1.47, s.e. = 0.57, p = 0.32, CI95 = 0.69–3.14) indicated that there were no significant differences in relapse rates between CT and IPT. For those who did relapse (n = 28), the mean time to relapse was 61.4 weeks (s.e. = 5.7) after baseline. On average, patients in the IPT condition relapsed somewhat faster than those treated with CT [mean time to relapse of 54.1 weeks (s.e. = 6.9) for IPT v. 69.8 weeks (s.e. = 9.2) for CT]. Patients were slightly more likely to show sustained response (Footnote 9) in CT (32 of 45 responders; 42.1% of the baseline sample) than in IPT (25 of 40 responders; 33.3% of the baseline sample), but differences were not significant [χ2 (1, 151) = 1.24, p = 0.27].
Note to Table 3: BDI-II, Beck Depression Inventory-II; CT, cognitive therapy; IPT, interpersonal psychotherapy; 95% CI, 95% confidence interval. Baseline severity is the standardized BDI-II score at baseline; baseline quality of life is the standardized EuroQol (EQ5D) utility score at baseline; condition is CT v. IPT, coded −0.5 and 0.5, respectively; time effects represent the linear trend from 7 to 24 months, with week = 0 at 7 months; year is first v. second year of the study, coded 0 for <12 months and 1 for ⩾12 months. Data were unavailable for 6 (3 CT; 3 IPT), 4 (2 CT; 2 IPT), 4 (2 CT; 2 IPT), 5 (2 CT; 3 IPT), 4 (2 CT; 2 IPT) and 9 (5 CT; 4 IPT) patients at 8, 9, 10, 11, 12, and 24 months, respectively. * d = (M_t7 − M_ti)/SD_pooled,t7; ** effect size r = √(F/(F + df)).
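The sustained response comparison reported above can be checked with a short worked calculation. The counts (32 and 25 sustained responders; 76 and 75 patients initially assigned to CT and IPT) are taken from the text; using a plain Pearson χ2 without continuity correction and the function name are assumptions, and this sketch ignores any adjustment for missing observations the authors may have applied.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p_value


# Sustained responders v. everyone else, using the baseline denominators
# (76 assigned to CT, 75 assigned to IPT).
ct_sustained, ct_total = 32, 76
ipt_sustained, ipt_total = 25, 75

chi2, p_value = chi_square_2x2(
    ct_sustained, ct_total - ct_sustained,
    ipt_sustained, ipt_total - ipt_sustained,
)
print(f"CT {ct_sustained / ct_total:.1%}, IPT {ipt_sustained / ipt_total:.1%}")
print(f"chi2(1, N = 151) = {chi2:.2f}, p = {p_value:.2f}")
```

Under these assumptions the sketch gives 42.1% v. 33.3% and χ2 = 1.24 (p = 0.27), matching the reported values. The percentages are therefore relative to all patients assigned to each condition, not to the 45 and 40 responders.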
Outcomes on the LIFE
LIFE data were available for 65 of 85 responders (76.5%; CT = 33, IPT = 32); 55 completed the follow-up phase without meeting criteria for relapse. Figure 2b shows – separately for CT and IPT – the cumulative proportion of treatment responders without relapse on the LIFE. Survival rates of CT and IPT were not significantly different [log-rank test: χ2 (1, 65) = 0.43, p = 0.51; Cox regression analysis: HR = 1.57, s.e. = 1.03, p = 0.48, CI95 = 0.44–5.66]. For the patients that did show relapse on the LIFE (n = 10; four in CT and six in IPT), mean time to relapse was 75.4 weeks (s.e. = 4.3) after baseline [67.5 (s.e. = 5.4) and 80.7 (s.e. = 5.5) weeks in CT and IPT, respectively]. For all models, the proportional hazard assumption was not violated.
Sensitivity analyses
Sensitivity analyses indicated that the total number of sessions, therapist, work- and marital status did not influence the findings. None of these covariates were significant in any of the models. Gender was a significant covariate in the survival analysis on the LIFE, but did not change the conclusions. Additional psychological and/or pharmacological support for MDD also did not change the conclusions. However, these variables were significant in some of the models as well: in both multilevel models and in the survival analysis on LIFE. Results indicated that patients who received psychological support for MDD and/or used ADM during the follow-up reported higher BDI-II scores at 24 months and were more likely to meet criteria for relapse on the LIFE than those without additional support. In the majority of cases, additional support was requested after relapse occurred (n = 6 v. n = 2 for support before relapse).
Discussion
The current study evaluated the long-term outcomes of acute phase CT v. IPT for MDD. In the context of a large RCT, we determined the course of self-reported depressive symptom severity up to 17 months after treatment termination, and tested whether CT and IPT differed throughout the follow-up phase. Furthermore, for treatment responders, rates of relapse and sustained response were examined for self-reported (BDI-II) and clinician-rated (LIFE) depression. On average, the symptom reduction achieved during the 7-month treatment phase was maintained across the follow-up period (7–24 months) for both CT and IPT. Effect sizes of change throughout the follow-up period were small. No differential effects between conditions were found. Two-thirds of the treatment responders completed the follow-up phase without meeting criteria for relapse on the BDI-II. Relapse rates assessed with the LIFE were somewhat lower. Patients who responded to IPT were no more likely to experience a return of symptoms than patients who responded to CT. The between-condition differences that were observed favored prior CT slightly but were not significant. This is important because CT has been shown to have an enduring effect that lasts beyond the end of treatment (relative to prior ADM) whereas IPT has not. Our findings are far from conclusive, but the findings from this trial suggest that IPT just might have a prophylactic effect.
One is always careful to make too much of what are essentially null findings, but there are several reasons why we think we might be justified in doing so in this instance. First, relapse rates were low in both conditions (around 33%), and are within the range of those reported in the previous CT studies (see e.g. Vittengl et al. Reference Vittengl, Clark, Dunn and Jarrett2007 for an overview) and similar to those obtained in the previous IPT studies (Shea et al. Reference Shea, Elkin, Imber, Sotsky, Watkins, Collins, Pilkonis, Beckham, Glass and Dolan1992). Furthermore, evidence for CT's enduring effect is relatively robust (Cuijpers et al. Reference Cuijpers, Donker, Weissman, Ravitz and Cristea2013b). While it is possible that we implemented CT in a less than adequate fashion, independent raters could tell CT from IPT in our trial and rated the quality of implementation as good (see Lemmens et al. Reference Lemmens, Arntz, Peeters, Hollon, Roefs and Huibers2015). In addition, while we relied on cross-sectional monthly assessments on the BDI over the first 5 months of follow-up and year-long retrospective assessments on the LIFE for the second year, it is unlikely that we missed many relapses or recurrences since the temporal intervals on the BDI were short and we detected more recurrences at the end of the interval covered by the LIFE than the start. The fact that CT did not outperform IPT suggests that IPT might have enduring effects as well. In the absence of a control condition that does not have an enduring effect – such as ADM – we cannot conclude that both IPT and CT were prophylactic, but it remains a real possibility.
Our study has several strengths. First, it was the first to examine the long-term effects of IPT since the initial study by Shea et al. (1992). Because we included a larger sample, provided high-quality IPT, and used more sophisticated statistical analysis techniques, our study goes beyond the initial study. Second, our RCT design provided a unique opportunity to directly compare the long-term outcomes of IPT with those of CT. Third, because we included both the BDI-II and the LIFE, our study provided information in terms of both self-reported symptom severity change and clinician-rated DSM classifications. Moreover, by examining both relapse and sustained response rates, our study provides information not only about the prognosis after successful initial treatment (i.e. what can a treatment responder expect after treatment termination?), but also about the prognosis at the start of treatment (i.e. what are the chances of successful and stable treatment effects when patients enter the clinic?). This is valuable information for clinical practice. Other strengths include the repeated-measures design and the inclusion of a series of sensitivity analyses. In addition, because we used carefully considered and rather stringent definitions for response, relapse, and treatment status, we feel confident that we have not overestimated our effects.
There were limitations as well. As is inherent to a study with a long follow-up period, we were confronted with missing data. However, drop-out rates in our study were relatively low. Moreover, we accounted for missing data in our analyses by using mixed (multilevel) regression models, which are suitable for handling missing data (Schafer and Graham, 2002; Singer and Willett, 2003; Snijders and Bosker, 2012). Another factor complicating the interpretation of results was the naturalistic setting of the follow-up phase. Even though we addressed this by controlling for additional professional support for MDD, it is impossible to control exactly for all influencing parameters. However, this approach resembles clinical practice and thereby increases the generalizability of study findings. In addition, although the LIFE has been shown to be a valid instrument to retrospectively assess depression severity up to 1 year, recall biases may have occurred. Finally, our sample may have been too small to detect (smaller) differential effects between CT and IPT.
In sum, our findings suggest that IPT may have an enduring effect similar to that already established for CT. However, in order to make strong claims about the enduring effect of IPT, more powerful tests are needed. Comparisons to prior ADM are one possible option, as that is how the enduring effect of CT has been established. Furthermore, attention should be paid to the predictors and moderators of relapse, as we plan to do in a follow-up publication. More insight into the (relative) long-term effects of CT and IPT, and associated factors, can provide valuable information about treatment options and prognosis for depressed patients and may assist in the process of treatment selection (DeRubeis et al. 2014; Huibers et al. 2015), thereby improving everyday health care for depression.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291718001083.
Acknowledgements
The authors wish to acknowledge the contribution of participating patients and therapists at RIAGG Maastricht. Furthermore, we gratefully thank Annie Raven and Annie Hendriks for their practical assistance during the study, and Prof Dr Robert DeRubeis and his laboratory members for their input on the definitions of response and relapse.
This research was funded by the research institute of Experimental Psychopathology (EPP), the Netherlands, and the Academic Community Mental Health Centre (RIAGG) in Maastricht, the Netherlands.
Author contributions
MH, AA, FP, and SH designed the trial. MH obtained funding for the study. LL conducted the trial and carried out recruitment and data collection. MH, AA, FP, and SH supervised throughout the study. All authors had full access to all the data in the study and share responsibility for the decision to submit for publication. LL, SvB, AA, and MH performed the data analysis and interpretation. LL drafted the manuscript in close collaboration with SvB. All authors provided critical revisions and have approved the final version of the manuscript.
Declaration of interests
None.