Introduction
Major depressive disorder (MDD), the most common mental health disorder, is currently the leading cause of disability worldwide, with more than 300 million people affected (WHO, 2017). It is associated with high levels of comorbidity (e.g. anxiety and substance use disorders), so that pure MDD accounts for only one-quarter of all diagnosed patients (Kessler et al., 1996). MDD is also a highly recurrent disorder: those experiencing a first depressive episode have a 50–60% risk of developing a second episode, while relapse estimates reach 70% and 90% following a second and third episode, respectively (Burcusa and Iacono, 2007). To prevent likely relapse, clinicians recommend the use of psychological and/or pharmacological interventions. Uptake of pharmacological interventions has increased dramatically following the advent of third-generation antidepressant medications (ADMs), leading practitioner guidelines to recommend their use as a first-line treatment for severe MDD (NICE, 2009). Despite ADM efficacy, patients tend to prefer psychological interventions alone (McHugh et al., 2013) because of potential side-effects, withdrawal symptoms (which often lead to post-withdrawal disorders) and high costs: ADMs are 23% more expensive than psychological interventions (Churchill et al., 2000; Fava, 2003; Johnstone, 2003; Butler et al., 2006; Bockting et al., 2018; Fava et al., 2018).
Thus, psychological interventions represent key alternative treatments for depression. Among these, systematic reviews (e.g. Cuijpers et al., 2011; Shinohara et al., 2013) and, more recently, network meta-analyses (Barth et al., 2013) have demonstrated that seven therapy types – cognitive-behavioural, non-directive supportive, behavioural activation, psychodynamic, problem-solving, interpersonal psychotherapy and social skills training – show comparable, moderate to large effects in treating depression. This equal effectiveness of all therapies is commonly referred to as the Dodo Bird verdict. Originating from Lewis Carroll's ‘Alice's Adventures in Wonderland’, the Dodo bird's announcement that ‘everyone has won, all must have prizes’ captures the current status of these psychotherapies (Honyashiki et al., 2014). This finding of equivalent effectiveness, along with the high relapse rates following all depression therapies, has moved research towards identifying moderators of treatment response to inform practice and clinical guidance (NICE, 2009). Understanding moderators, that is, the clinical and socio-demographic factors influencing therapy efficacy, is essential to optimise personalisation of treatment. Indeed, moderators can be used prescriptively to indicate who is going to respond better to one treatment over another (Fournier et al., 2009). Recent studies increasingly focus on the assessment of such moderators of efficacy in otherwise comparable psychological treatments. For example, Driessen et al. (2016) conducted post-hoc analyses following a randomised controlled trial (RCT) and demonstrated that cognitive-behavioural therapy (CBT) was more beneficial than psychodynamic therapy for depressive episodes lasting <1 year, whereas the inverse pattern was observed for longer episodes: psychodynamic treatment was then more efficacious than CBT. The study also found that, in combination with medication, psychodynamic therapy was more efficacious than CBT for moderately or severely depressed patients. However, conclusive moderator testing requires large sample sizes to achieve sufficient power (Cuijpers et al., 2016).
Among the most widely used psychotherapies are CBT and interpersonal therapy (IPT) (Rucci et al., 2011; Bayliss and Holttum, 2015). CBT, which conceptualises MDD as the consequence of maladaptive cognitive processes and related behaviours, has received strong support from both efficacy (Hofmann et al., 2012) and effectiveness research (Butler et al., 2006). IPT, designed specifically for depression treatment, frames the disorder as the consequence of current interpersonal issues and also has strong empirical support for efficacy (Law, 2011; Barth et al., 2013). Recent meta-analyses suggested that IPT may be more effective than other therapies for certain clinical presentations, further indicating the need for research on moderators of efficacy (Cuijpers et al., 2011).
Comparing these therapies in single RCTs has led to mixed findings, with some studies concluding that CBT is more effective than long-term IPT (Shapiro et al., 1994; Rossello et al., 2008), while others find them of comparable efficacy (Luty et al., 2007; Lemmens et al., 2011). Meta-analytical studies tend to agree that neither is superior in treating depression (Miranda et al., 2005; Cuijpers et al., 2006; Barth et al., 2013). Recent evidence suggests that, depending on the MDD presentation, either CBT or IPT can be the more efficacious (Driessen et al., 2016). For example, greater initial depression severity predicts a better response to IPT relative to CBT (Elkin et al., 1995), while comorbid personality disorder and attachment disorder predict a better response to CBT (McBride et al., 2006; Carter et al., 2011). Treatment format has also been identified as a possible moderator, with individual CBT shown to be more effective than group CBT (Cuijpers et al., 2008; Craigie and Nathan, 2009). Some suggest that CBT and IPT are just as effective alone as they are in combination with ADMs (Thase et al., 1997).
Although clinical guidelines recommend the use of combined treatment, no significant trend towards a benefit of additional ADMs is observed in evaluative research (Otto et al., 2005; Hollon et al., 2005a, 2005b; Mintz, 2006). Patients' socio-demographic characteristics also moderate treatment efficacy, with female gender and increased age being associated with poorer CBT response (Thase et al., 1994; Hyer et al., 2004) and IPT being suggested as a better alternative to CBT for geriatric depression (Hollon et al., 2005a, 2005b).
To date, only one systematic review has sought to identify potential moderators of both CBT and IPT effectiveness (Zhou et al., 2017). It included solely RCTs directly comparing the two treatments. Given the small number of studies included (n = 10), only one moderator could be assessed, i.e. study format (individual v. group), and its effect was not significant. Although RCTs are the gold standard for effectiveness assessment, they present a number of limitations for moderator research. Firstly, the smaller overall population limits the statistical power associated with predictor-by-treatment interaction effects from analysis of variance (ANOVA), multiple and logistic regression models (Fournier et al., 2009). Secondly, as RCTs tend to study depression without its comorbidities, representative MDD individuals with comorbidities are often excluded to maximise homogeneity, limiting the evaluation of that potential moderator (Budd and Hughes, 2009). Thirdly, restricting reviews to direct comparisons does not yield a representative sample of studies of either treatment, leading to questionable conclusions (Gartlehner and Moore, 2008). Combining studies through a meta-analytical approach makes it possible to optimise power and compare efficacy beyond the limitations of research considering only RCTs of direct comparisons (Borenstein et al., 2009).
Consequently, the present study aimed to (a) compare the overall efficacy of CBT and IPT for depression through a comprehensive systematic review, not limited to direct comparisons; and (b) evaluate, through a meta-analysis, the effects of commonly computed preselected moderators of therapy efficacy on face-to-face CBT and IPT.
Methods
PRISMA guidelines for conducting and reporting systematic reviews were followed (Moher et al., 2009).
Search strategy and selection criteria
The electronic databases PsycArticles, PsycINFO, PubMed and Cochrane Library were searched from the year 1980 [MDD diagnosis current conceptualisation (APA, 2000)] to December 2017. For each database, the following search string was used: (depression OR major depressive disorder OR MDD OR major depression) AND (CBT OR cognitive behavi* therapy OR IPT OR interpersonal psychotherapy OR interpersonal therapy) AND (cohort OR longitudinal OR response OR panel OR prospective OR retrospective OR predictor).
Studies published in English were included if: (1) patients whose primary diagnosis was MDD [according to the Diagnostic and Statistical Manual of Mental Disorders (DSM)-III, DSM-III-R, DSM-IV, DSM-IV-TR, DSM-5 (APA, 2000), International Classification of Diseases (ICD)-9, or ICD-10 (WHO, 1992) criteria] (2) received individual or group CBT or IPT as the only psychological treatment of depression, where (3) these therapies followed a recognised format and had not been altered or extended with other psychological components (e.g. integrated CBT); (4) were administered in a face-to-face setting, with (5) depression severity being quantitatively assessed both pre- and post-therapy using a validated measure with standardised cut-offs for mild, moderate, severe and very severe depression (e.g. the Hamilton Rating Scale for Depression, HAM-D). Both RCTs and observational designs were included. Patients could have comorbidities, provided depression was the primary diagnosis. Antidepressant pharmacotherapy (ADMs) was the only adjunctive treatment permitted. Case series and online therapies were excluded.
Data collection and coding procedures
References returned from each database were imported into the reference management software EndNote X8 and duplicates were removed. Two raters independently screened titles and abstracts of articles for potential inclusion. Disagreements were resolved through consensus discussions between the authors. Full texts of potentially eligible studies were then assessed against the inclusion criteria. Where sufficient data for inclusion were not reported (e.g. missing post-treatment depression severity score), a request for this information was e-mailed to the corresponding author.
From each study (n) that met the inclusion criteria, the following variables were coded for each included sample (k): (1) type of therapy administered, CBT or IPT; (2) number of participants at each assessment time point; (3) mean (M) and standard deviation (s.d.) of pre- and post-treatment depression severity scores; (4) time delay (days) between pre- and post-treatment depression severity assessment; (5) patient demographics, including mean age, gender and employment status; (6) clinical characteristics, including initial depression severity, number of previous depressive episodes and comorbidities; and (7) therapy characteristics, including concomitant ADMs, inpatient or outpatient setting, group or individual format, and number of dropouts.
Statistical methods/meta-analyses
For each independent sample (either CBT or IPT) from each study, standardised mean difference effect sizes (ESs) with confidence intervals (CIs) were calculated from the pre- and post-treatment depression severity scores using Cohen's d index of individual effects: $d_k = (M_{1k} - M_{2k}) / \mathrm{s.d.}_{pk}$, where d is the effect size, k the individual sample, $M_{1k}$ the pre-treatment mean, $M_{2k}$ the post-treatment mean and $\mathrm{s.d.}_{pk}$ the pooled standard deviation. A standardised ES is necessary because, although all studies assessed depression severity, different scales were used. A positive ES indicated that depression severity was lower post-treatment. These ESs were interpreted according to Cohen's cut-off recommendations of 1.30, 0.80, 0.50 and 0.20 for very large, large, medium and small ESs, respectively (Cohen, 1988).
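The effect-size calculation above can be illustrated with a minimal Python sketch. The text does not state how the pooled standard deviation was formed, so the root-mean-square of the pre- and post-treatment s.d. values used here is an assumption, as are the function names and the example scores.

```python
import math


def pooled_sd(sd_pre: float, sd_post: float) -> float:
    """Root-mean-square of the two SDs (assumed pooling rule, not stated in the text)."""
    return math.sqrt((sd_pre ** 2 + sd_post ** 2) / 2.0)


def cohens_d(mean_pre: float, mean_post: float, sd_pre: float, sd_post: float) -> float:
    """d_k = (M_1k - M_2k) / s.d._pk; positive when severity is lower post-treatment."""
    return (mean_pre - mean_post) / pooled_sd(sd_pre, sd_post)


def interpret_es(d: float) -> str:
    """Label an effect size using the cut-offs quoted in the text."""
    size = abs(d)
    if size >= 1.30:
        return "very large"
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"


# Hypothetical sample: mean severity falls from 22 to 10 (s.d. 5 and 7).
d = cohens_d(22.0, 10.0, 5.0, 7.0)
print(round(d, 2), interpret_es(d))
```

Because d is expressed in standard deviation units, samples assessed with different depression scales can be placed on a common metric before pooling.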
CBT and IPT samples were then separately pooled using an inverse-variance weighted random-effects model in Comprehensive Meta-Analysis software. The random-effects model is based on the assumption that different studies estimate different, yet related, intervention effects, with inverse-variance weighting assigning more weight to larger samples (DerSimonian and Laird, 1986). As the studies pooled in this analysis varied in sample characteristics and in the depression scales administered, heterogeneity was expected to be substantial, thus justifying a random-effects model. Hedges' g, which corrects for small-sample bias, was the effect size chosen in this software as it provides better estimates for smaller sample sizes.
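As a rough illustration of the pooling step, the sketch below applies the usual small-sample correction to obtain Hedges' g and then pools effects with the DerSimonian and Laird method-of-moments estimator. It is not the Comprehensive Meta-Analysis implementation; the function names and the choice of degrees of freedom are assumptions, and the variances are taken as given rather than derived.

```python
import math
from typing import List, Tuple


def hedges_g(d: float, df: int) -> float:
    """Apply the small-sample correction J = 1 - 3 / (4*df - 1) to Cohen's d."""
    return d * (1.0 - 3.0 / (4.0 * df - 1.0))


def dersimonian_laird(
    effects: List[float], variances: List[float]
) -> Tuple[float, float, float, float]:
    """Return (pooled effect, standard error, tau^2, Q) under a random-effects model."""
    k = len(effects)
    w = [1.0 / v for v in variances]  # fixed-effect (inverse-variance) weights
    sum_w = sum(w)
    fixed_mean = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed_mean) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    # Method-of-moments between-study variance, truncated at zero.
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    sum_w_star = sum(w_star)
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum_w_star
    se = math.sqrt(1.0 / sum_w_star)
    return pooled, se, tau2, q


# Two hypothetical samples with equal precision but different effects.
pooled, se, tau2, q = dersimonian_laird([1.0, 2.0], [0.1, 0.1])
print(pooled, se, tau2, q)
```

Since the same between-study variance is added to every sample's variance, the random-effects weights are more similar across samples than fixed-effect weights would be, although more precise samples still count for more.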
Prior to conducting the moderator analyses, CBT and IPT were compared using subgroup ANOVA. This analysis assumes between-study variation and makes it possible to determine whether, overall, the therapies are comparable or significantly different in the treatment of depression (Yusuf et al., 1991). Then, CBT and IPT samples were compared on the delay in days between pre-treatment and post-treatment assessment to determine whether this possible confound of the intervention effect needed to be controlled for. The I² index was also examined to quantify the true heterogeneity across the samples within each treatment, i.e. heterogeneity not due to chance. This was interpreted using the recommended cut-offs: 25% small, 50% moderate and 75% large heterogeneity (Borenstein et al., 2009).
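The I² index can be sketched from a heterogeneity statistic Q and its degrees of freedom, as below. The formula is the standard one; the label for values under 25% and the function names are assumptions added for illustration.

```python
def i_squared(q: float, df: int) -> float:
    """I^2 = max(0, (Q - df) / Q) * 100: share of variation not due to chance."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def heterogeneity_label(i2: float) -> str:
    """Apply the 25/50/75% cut-offs quoted in the text."""
    if i2 >= 75.0:
        return "large"
    if i2 >= 50.0:
        return "moderate"
    if i2 >= 25.0:
        return "small"
    return "negligible"  # assumed label; the text gives no name below 25%


# Hypothetical values: Q = 5 on 1 degree of freedom.
i2 = i_squared(5.0, 1)
print(i2, heterogeneity_label(i2))
```

When Q is no larger than its degrees of freedom, the index is truncated at zero, meaning the observed spread is no greater than chance alone would produce.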
Moderator analyses were then conducted on the prespecified variables. Random-effects, method-of-moments meta-regressions were conducted when ⩾10 studies reported on a continuous moderator variable (Borenstein et al., 2009). This analysis was used to test whether the outcome variable (ES in depression severity change) was predicted by explanatory moderator variables across CBT and IPT. Continuous moderators included age, gender (% male), employment (% employed), number of previous episodes and number of dropouts. When a regression model was non-significant, but the moderator of interest was a significant predictor within the model, separate regressions were conducted for the CBT and IPT samples. If both the model and the predictor were non-significant, no further analysis was conducted. Sub-group mixed-effects meta-analyses were conducted when at least three studies per level were available for the categorical moderators, which included initial depression severity (mild/moderate/severe/very severe); therapy format (individual/group); therapy setting (inpatient/outpatient); comorbidities (yes/no); and ADMs (yes/no). The categorical moderators were first compared at each level between CBT and IPT samples to determine whether CBT or IPT was more effective in treating depression at that level. However, if the therapies were comparable and high levels of heterogeneity remained, a within-therapy sub-group meta-analysis was also conducted to determine whether, for that therapy, the moderator had a significant effect on the main outcome.
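A sub-group comparison of this kind reduces to a between-group heterogeneity test on the pooled subgroup estimates. The sketch below is an approximation, not the authors' software output: it recovers standard errors from 95% CIs (an assumption) and uses the rounded g and CI values reported in the Results for individual and group CBT, so it should only roughly match the published statistic.

```python
import math
from typing import List, Tuple


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Recover a standard error from a symmetric 95% CI (an assumption here)."""
    return (upper - lower) / (2.0 * z)


def q_between(estimates: List[float], std_errors: List[float]) -> Tuple[float, int]:
    """Q_b = sum of w_j * (g_j - g_bar)^2 with w_j = 1 / SE_j^2; df = groups - 1."""
    w = [1.0 / se ** 2 for se in std_errors]
    g_bar = sum(wi * gi for wi, gi in zip(w, estimates)) / sum(w)
    q = sum(wi * (gi - g_bar) ** 2 for wi, gi in zip(w, estimates))
    return q, len(estimates) - 1


def p_value_df1(q: float) -> float:
    """Upper-tail chi-square probability; valid for 1 degree of freedom only."""
    return math.erfc(math.sqrt(q / 2.0))


# Individual v. group CBT: g = 1.72 (95% CI 1.55-1.90) and g = 1.31 (95% CI 1.12-1.49).
q, df = q_between([1.72, 1.31], [se_from_ci(1.55, 1.90), se_from_ci(1.12, 1.49)])
print(round(q, 2), df, round(p_value_df1(q), 4))
```

With these rounded inputs the statistic comes out at roughly 10 on 1 degree of freedom, in the same region as the reported Q(1) = 10.75; an exact match is not expected from rounded summary values.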
To address publication bias, Orwin's fail-safe N was calculated to determine the number of unpublished studies with null findings that would reduce our significant moderator findings to a ‘trivial’ ES (Orwin, 1983). A trivial ES is defined as ≤0.18 (Cohen, 1988). The likelihood of publication bias is minimal if the fail-safe N is >5k + 10 (Rosenthal, 1991). To control for potential Type-I errors associated with multiple comparisons, two steps were taken. Firstly, all moderators were chosen a priori with a clear rationale for selection (Bender et al., 2008). Secondly, Holm's (1979) sequentially rejective multiple hypotheses test was applied. Adjustments for multiple testing are generally not recommended for meta-analyses (Bender et al., 2008) because samples differ across outcome variables. However, we applied the correction to the family of tests at all levels of each moderator where samples could overlap (Bender et al., 2008; Higgins et al., 2008; Polanin, 2013; Streiner, 2015). Holm's correction combines the Bonferroni theorem with a step-down procedure. Initially, within each moderator, all p values are ordered from smallest to largest. Beginning with the smallest, p values were entered into the following formula: a* = a/(n–k), where a* represents the new alpha level, a is the original alpha level of 0.05, n is the number of tests and k is the position of the p value in the ordered list. The p values were then tested against their new alpha level (a*). As this is a step-down procedure, once one null hypothesis is retained (p > a*), hypothesis testing stops (Polanin, 2013).
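Both procedures can be sketched in a few lines of Python. Two assumptions are made and should be read as such: Orwin's formula is written for missing studies that average an effect of exactly zero, and the position k in a* = a/(n–k) is read as zero-based (0 for the smallest p value), which makes the formula coincide with the usual Holm thresholds. The function names and example values are hypothetical.

```python
from typing import List


def orwin_fail_safe_n(k: int, mean_es: float, trivial_es: float = 0.18) -> float:
    """Null-effect studies needed to pull the mean ES down to trivial_es.

    Assumes the missing studies average an effect of exactly zero.
    """
    return k * (mean_es - trivial_es) / trivial_es


def rosenthal_benchmark(k: int) -> int:
    """Tolerance level 5k + 10 quoted in the text."""
    return 5 * k + 10


def holm(p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Return, in the original order, whether each null hypothesis is rejected."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    rejected = [False] * n
    for rank, idx in enumerate(order):  # rank is the zero-based position k
        if p_values[idx] > alpha / (n - rank):  # a* = a / (n - k)
            break  # stop at the first retained null hypothesis
        rejected[idx] = True
    return rejected


print(rosenthal_benchmark(114))            # benchmark for k = 114 samples
print(orwin_fail_safe_n(10, 0.9))          # hypothetical: 10 samples, mean ES 0.9
print(holm([0.001, 0.04, 0.03]))           # hypothetical family of three tests
```

In the hypothetical family of three tests, the smallest p value passes its threshold of 0.05/3, the next (0.03) fails against 0.05/2, and testing stops there, so the largest p value is never rejected even though it is below 0.05.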
Results
After deleting duplicates, the search strategy identified 9878 citations from which 758 were assessed for eligibility and 137 were meta-analysed. Figure 1 represents the review process.

Fig. 1. PRISMA flowchart of the review process and study selection.
From the 137 studies (n), 168 samples (k) were extracted. A total of 11 374 patients (N), treated with either CBT (N = 9375) or IPT (N = 1999), were included in the analysis. Sample Ns ranged from 7 to 639, with a mean age of 38.34 for CBT (range 12.7–74.4) and 34.96 for IPT (range 10.6–51.2). There was no significant difference in mean age between CBT and IPT samples, t(158) = 1.59, p = 0.11, but there was a significant association between gender and therapy, χ²(1) = 76.53, p < 0.001, with males being more likely to be treated with CBT (mean 34.6% male) than with IPT (25.5%). Also, no significant difference was observed in the delay (days) between pre- and post-treatment outcome assessment for CBT (M = 102.17, s.d. = 46.78) and IPT (M = 103.38, s.d. = 34.68) samples, t(135) = −0.146, p = 0.35. Thus, even if spontaneous symptom recovery occurred within the samples, it would have contributed equally in both CBT and IPT. Table 1 details the percentage of meta-analysed studies reporting each prespecified moderator. Full details of the extracted data are given in online Supplementary Tables S1–S6.
Table 1. Percentage of moderators reported across all CBT and IPT samples

Overall treatment efficacy
No significant differences were observed between the therapies with very large ESs characterising both CBT and IPT for the treatment of depression (Table 2).
Table 2. Meta-analyses of overall treatment effect and moderator effects of initial depression severity, comorbidities, therapy format, therapy setting and concomitant antidepressant medications

CI, confidence interval; k, number of samples; N, number of patients; g, Hedges' g effect size; LL, lower limit; UL, upper limit; Zw, within-group heterogeneity; Qb, between-group heterogeneity; p, significance value.
Table 3. Meta-regression mixed-effects analyses of the following moderators: age, gender, employment, number of previous episodes and number of dropouts

CI, confidence interval; T, therapy; k, number of samples; N, number of patients; b, predictor coefficient; Y, intercept; LL, lower limit; UL, upper limit; p, significance value of named predictor; Qm, heterogeneity of the model; pm, significance value of the model; R², coefficient of determination.
Demographic moderators
Age
Data from 160 samples were pooled to examine this moderator effect: k CBT = 114 and k IPT = 46. The overall model was not significant, Q model(2) = 4.28, p = 0.12. Age, however, was a significant predictor in the model when controlling for therapy (p = 0.044). Thus, to further investigate its moderating effect, age was entered into separate regression models for CBT and IPT. In CBT samples, age was a significant predictor of efficacy, Q model(1) = 5.21, p = 0.021, with CBT's efficacy decreasing as mean age increased. Orwin's fail-safe N was robust, indicating that 950 unpublished studies with null effects would be needed to invalidate this result (benchmark for k = 114: 580). In IPT samples, age was not a significant predictor of efficacy (p = 0.53).
Gender
Data from 163 samples were pooled to examine this moderator effect: k CBT = 117 and k IPT = 46. The model was non-significant, Q model(2) = 0.91, p = 0.63. Gender was also not a significant predictor of treatment efficacy when controlling for therapy type (p = 0.40).
Employment status
Data from 44 samples – k CBT = 30, k IPT = 14 – were pooled to examine this moderator effect. The model was not significant, Q model(2) = 0.54, p = 0.76, and employment remained a non-significant predictor of treatment efficacy when controlling for therapy (p = 0.50).
Clinical moderators
Initial depression severity
Data from 152 samples – k CBT = 108, k IPT = 44 – were pooled to examine this moderator effect. No significant differences were observed between CBT and IPT in treating moderate (p = 0.93), severe (p = 0.19) or very severe depression (p = 0.81). All levels of initial depression significantly predicted treatment efficacy with very large ESs (p < 0.001). However, due to moderate to high levels of heterogeneity (I² = 51.3–90.2%), within-therapy analyses were also conducted. For both CBT (k = 3) and IPT (k = 2), there were insufficient samples to include the ‘very severe’ level of initial depression in the subsequent moderator analyses. For CBT, sufficient samples were available to analyse this moderator effect for mild (k = 6), moderate (k = 65) and severe (k = 34) initial depression. A significant difference was observed between levels of depression, with efficacy increasing with initial depression severity (see Table 2). Orwin's fail-safe N was robust, indicating that an additional 810 studies with null effects would be needed to invalidate this result (benchmark for k = 105: 535). Post-hoc analyses were conducted to assess whether CBT samples with severe initial depression had the greatest treatment effect because patients were potentially taking adjunctive ADMs. A post-hoc subgroup ANOVA was conducted on severe initial depression samples which reported adjunctive ADMs (k = 13) or no ADMs (k = 15). No significant difference was observed between the groups (p = 0.40), suggesting that CBT's greater efficacy for severe depression holds both alone and with ADMs. For IPT, sufficient samples were available to analyse this moderator effect for moderate (k = 23) and severe (k = 19) depression, with no significant difference being observed (p = 0.79).
Comorbidities
Data from 128 samples – k CBT = 94, k IPT = 34 – were pooled to examine this moderator effect. Based on samples that reported percentages of comorbidities (k = 49), on average 55.47% (range 10–100%) of participants presented with at least one comorbid Axis I or II disorder. No significant difference was observed between CBT and IPT efficacy whether comorbidities were present (p = 0.38) or not (p = 0.68). High heterogeneity was observed in both subgroups, justifying within-therapy analysis. No significant difference in efficacy for depression with or without comorbidities was observed either within the CBT samples (p = 0.12) or within the IPT samples (p = 0.16).
Number of previous episodes
An insufficient number of IPT samples was available to analyse this moderator effect (k = 5). Meta-regression was thus carried out on CBT samples only (k = 20); the number of previous episodes was not a significant predictor of CBT efficacy, Q model(1) = 0.00, p = 0.997.
Therapy moderators
Format
Data from 141 samples – k CBT = 105, k IPT = 36 – were pooled to examine this moderator effect. No significant differences were found between IPT and CBT delivered in a group (p = 0.52) or individual (p = 0.19) format. Both formats significantly predicted treatment efficacy (p < 0.001) with very large ESs. Heterogeneity was high in both subgroups, justifying within-therapy analysis. For CBT, a significant difference was observed between the two formats: Q(1) = 10.75, p = 0.001, with individual therapy (g = 1.72, 95% CI 1.55–1.90, k = 81) showing better efficacy than group therapy (g = 1.31, 95% CI 1.12–1.49, k = 24). Orwin's fail-safe N indicated that an additional 786 samples with null effects would be needed to invalidate this result (benchmark for k = 105: 535). For IPT, there was no significant difference between individual (k = 30) and group (k = 6) formats: Q(1) = 0.003, p = 0.96, with both displaying positive very large ESs (p < 0.001).
Setting
Insufficient data were available to analyse inpatient setting for IPT (k = 1). Data from 156 samples – k CBT = 112, k IPT = 44 – examined the relative efficacy of the two treatments in an outpatient setting. No significant difference was observed: p = 0.32. For CBT, there was no significant difference between inpatient (k = 6) and outpatient (k = 112) settings: p = 0.26, with both settings being associated with positive very large ESs (p < 0.001).
Concomitant ADMs
Data from 141 samples – k CBT = 101, k IPT = 40 – were pooled to examine this effect. No significant difference was found between IPT and CBT with concomitant ADMs. ADMs prescribed were selective serotonin reuptake inhibitors (e.g. citalopram), serotonin–norepinephrine reuptake inhibitors (e.g. venlafaxine) and tricyclic antidepressants (e.g. amitriptyline) (see online Supplementary Tables S5 and S6). For samples without concomitant ADMs, CBT (g = 1.82, 95% CI 1.63–2.05) was associated with significantly better efficacy than IPT (g = 1.54, 95% CI 1.34–1.73), p = 0.037. Orwin's fail-safe N indicated that an additional 700 samples with null effects would be needed to invalidate this result (benchmark for k = 84: 430). In CBT samples, the absence of concomitant ADMs led to larger clinical improvements (g = 1.82, 95% CI 1.63–2.01) than concomitant ADMs (g = 1.42, 95% CI 1.24–1.61), Q(1) = 8.74, p = 0.003. Orwin's fail-safe N indicated that an additional 808 samples with null effects would be needed to invalidate this result (benchmark for k = 101: 515). For IPT, no significant difference was observed between receiving concomitant ADMs or not (p = 0.56), with both conditions leading to positive very large ESs (p < 0.001).
Number of dropouts
Data from 107 samples were pooled to examine the moderator effect of number of dropouts: k CBT = 77 and k IPT = 30. The model was not significant, Q model(2) = 2.50, p = 0.29. The number of dropouts was also not a significant predictor of efficacy when controlling for therapy type (p = 0.16).
Post-hoc sensitivity analyses of study design effects
Post-hoc sensitivity analyses were conducted to determine whether study design, that is, RCT v. non-RCT, affected the results. These were conducted on all moderators and across all levels of moderators. All analyses, except two relating to the therapy moderator ADMs, showed equivalent results between RCTs and non-RCTs. Specifically, the overall treatment effects were similar both within CBT (g RCT = 1.72, g non-RCT = 1.52, p = 0.13) and IPT samples (g RCT = 1.60, g non-RCT = 1.52, p = 0.67). There were no significant effects of design on any demographic (all p values > 0.095) or clinical (all p values > 0.079) moderators. Interestingly, with the exception of studies where only patients with no comorbidity were studied, for both therapies, RCTs tended to show larger ESs than non-RCTs, although not significantly so. With regard to concomitant ADMs, post-hoc analyses showed that, overall, RCTs led to significantly larger ESs (g = 1.83, CI 1.52–2.14) than non-RCTs (g = 1.17, CI 0.99–1.34, p < 0.001). This result was explained by a therapy effect: there was no significant difference between RCTs and non-RCTs for IPT (p = 0.19), but there was for CBT (g RCT = 1.80, CI 1.44–2.15; g non-RCT = 1.10, CI 1.70–2.25, p = 0.001). As an effect of design was found for this moderator, CBT and IPT were compared for RCTs with concomitant ADMs only, but no significant difference was observed (p = 0.72). See online Supplementary Table S7 for full results of sensitivity analyses.
Results following Holm's (1979) sequentially rejective multiple hypotheses test
Holm's (1979) test was applied where overlapping samples were observed. Among these, three out of five significant p values remained significant: namely, the findings that CBT is significantly moderated by age (p = 0.021), therapy format (p = 0.001) and ADMs (p = 0.003) remained significant under their respective adjusted alpha levels within their moderator's family of tests. However, the result that CBT was significantly more effective than IPT without concomitant antidepressants (p = 0.037) did not remain significant under the more conservative alpha of 0.025. Similarly, the significant moderating effect of initial depression severity on CBT (p = 0.022) did not remain significant under the adjusted alpha of 0.0125. For a full overview of Holm's procedure and all adjusted alpha levels, see online Supplementary Table S8.
Discussion
The current systematic review and meta-analyses aimed at determining moderators of efficacy of face-to-face CBT and IPT for MDD. Our results further supported existing evidence of the equivalent overall treatment effects of CBT and IPT for depression (Miranda et al., Reference Miranda, Bernal, Lau, Kohn, Hwang and LaFromboise2005; Cuijpers et al., Reference Cuijpers, van Straten and Smit2006; Weisz et al., Reference Weisz, McCarty and Valeri2006). Between-therapy moderator analyses also showed comparable efficacy of CBT and IPT across age, gender, employment status, initial depression severity, presence of comorbidities, number of previous episodes, therapy formats, therapy settings and number of dropouts. However, a significant difference between CBT and IPT was observed when examining the prespecified moderator of concomitant ADMs use. Specifically, CBT was superior to IPT in treating depression when therapies were administered alone, i.e. without adjunctive ADMs.
Within-therapy analyses showed the effect of CBT to be moderated by age, initial depression severity, therapy format and adjunctive ADMs. Namely, CBT's efficacy declined as patients' age increased, and CBT was more effective in treating severe initial depression than moderate or mild depression. CBT was also more effective when delivered in an individual rather than a group format and when administered alone rather than with concomitant ADMs. Within-therapy analyses of IPT did not identify any significant effect of the preselected moderators on efficacy.
CBT's efficacy moderators
Few individual studies have compared CBT efficacy across age groups. However, the present meta-analysis, cumulating strong power through analyses of 9375 participants from 120 samples, shows a decline in efficacy as the samples' mean age increases. This result is consistent with trial findings showing that increasing age predicted poorer response to cognitive therapy (Fournier et al., 2009). Other researchers have also expressed concern over the use of CBT with older patients, as they consider the cognitive slowing associated with ageing to negatively impact treatment delivery (Hyer et al., 2004). Indeed, the CBT staples of assigning homework and challenging distorted cognitions have both been outlined as less effective and less preferred for older patients (Hyer et al., 2004). Considering age as a prescriptive factor has important clinical applications for both treatment selection and the development of personalised treatment guidelines with regard to age.
The moderation of CBT efficacy by therapy format supports and expands on the results of an earlier meta-analysis of 15 studies suggesting that individual CBT might be more effective than group CBT based on post-treatment depression scores alone (Cuijpers et al., 2008). At the time, the authors recommended further research given their sample size. Our meta-analysis of 8004 participants drawn from 105 samples demonstrates the significant superiority of individual CBT relative to group CBT with very strong power and based on comparisons of differences between pre- and post-treatment changes. Possible explanations for the benefit of individual therapies centre around the nature of CBT for depression. Due to the severity of depression and associated symptoms, patients may find it easier to engage with CBT in an individual setting (Craigie and Nathan, 2009). This finding has important clinical implications as, currently, group formats are widely disseminated, partly due to their apparent cost-effectiveness. Although group formats are still effective in reducing depressive symptoms, individual CBT is a significantly more efficacious therapy, further supporting the importance of treatment personalisation.
Drawing strong power from the analysis of 7163 participants from 105 samples, results demonstrated that CBT was significantly more effective for those with severe depression in comparison to moderate or mild depression, even when controlling for adjunctive ADMs. Although this finding did not remain significant under Holm's (1979) multiple testing correction, it is important to note that corrections for multiplicity are not routinely used in meta-analysis as their test assumptions are rarely met by meta-analytic data. Therefore, multiple testing correction results within this study should be interpreted with caution (Bender et al., 2008; Higgins et al., 2008; Streiner, 2015). Nevertheless, the finding that CBT is more effective for severe depression is consistent with the results of a naturalistic study of 193 patients with depression, where this superiority was partly attributed to regression to the mean: pre-treatment depression scores that are significantly higher than the mean (high severity) are likely to move closer to the mean at re-assessment (Schindler et al., 2013). While this may be a possibility, the ESs observed in the present meta-analysis were significantly different at all severity levels, making it unlikely that a regression to the mean occurred. From a clinical perspective, severity of depression appears to be an important prescriptive factor for treatment personalisation. Patients with severe depression are often prescribed ADMs alone in line with practitioners' guidelines (NICE, 2009). The present results suggest that CBT should be consistently considered as a first-line treatment for severe depression.
Furthermore, in relation to ADMs use, our meta-analysis of 8421 participants drawn from 101 samples showed that CBT alone, with no ADMs, was significantly more effective than CBT with concomitant ADMs. This result is at odds with some clinical guidelines that recommend the use of CBT plus ADMs (NICE, 2009). However, this result does align with both a narrative review and a meta-analysis suggesting that CBT, unlike other psychotherapies, is much less effective in combination with ADMs than alone (Cuijpers et al., 2009; Craighead and Dunlop, 2014). One possible explanation for this finding may be the acceptance of ADMs: depressed individuals are three times more likely to choose psychological therapies over ADMs (McHugh et al., 2013). While a possibility, it is unlikely that this explains the current finding, as most included samples allowed patients to continue their previous ADM treatment instead of newly prescribing ADMs. However, future studies might assess this hypothesis. Another, more plausible explanation for this finding is related to Johnstone's (2003) argument that ADMs can potentially limit CBT engagement. While relieving mood symptoms, ADMs trigger an emotional blunting which conflicts with the very nature of CBT (Fava et al., 2018).
Their withdrawal symptoms and common side-effects of anxiety, insomnia and agitation can further hinder therapy process and engagement (Churchill et al., 2000; Fava, 2003; Johnstone, 2003; Butler et al., 2006; Bockting et al., 2018; Fava et al., 2018). The meta-analysis results suggest that prescribing ADMs alongside CBT in future clinical practice should be considered on a careful, case-by-case basis, as the therapy alone may prove more effective for the treatment of mild, moderate and severe depression than when combined with adjunctive ADMs.
Moderators of IPT efficacy
IPT showed equivalent treatment effects across all prespecified moderators. This contrasts with previous studies outlining IPT as less effective in the presence of comorbidities (Cyranowski et al., 2005) or for high-severity depression (Frank et al., 2011). A possible explanation for this discrepancy relates to sample sizes: only 18 (Cyranowski et al., 2005) and 117 (Frank et al., 2011) patients received IPT in these studies, thus limiting the generalizability of their respective results. Our large IPT sample (N = 1999) allows the conclusion that IPT appears to be equally effective for the treatment of depression across these moderators.
Despite IPT's comparable efficacy to CBT, it remains a much less prescribed treatment for depression. Only one-third of the samples retrieved received IPT. This represents a limitation of the meta-analysis, as insufficient data were available to investigate whether the number of previous episodes, or very severe depression, moderated the treatment outcome. Similarly, although data were sufficient to examine age as a moderator of IPT outcomes, these samples had a maximum mean age of 51, unlike CBT samples with a maximum mean age of 74. Nevertheless, IPT was shown to be an effective treatment for depression across a range of demographic, clinical and therapeutic moderators. IPT should, therefore, be consistently considered in clinical guidelines, applied more frequently in clinical practice and further investigated for other moderators of efficacy.
Limitations of within-therapy analysis
Firstly, high levels of heterogeneity were observed throughout the analysis. One possible reason is that the label CBT has been applied to a variety of interventions that do not always reflect the pure therapy strain. Although inclusion criteria aimed to control for this by specifying that CBT must not be altered or extended, in practice the therapy is often modified, and this might not always be reported in the original articles (Hyer et al., 2004). Secondly, even though significant moderator effects were strong, they could not account for most of the variance in treatment outcomes. These moderators may be interacting to explain variability, but complex interactive models were not examined, as larger samples would be needed for reliable results. Thirdly, recovery or remission rates were not analysed, thus limiting the current results to post-treatment outcomes relative to the pre-treatment depression severity. However, by optimising the overall sample analyses in this way, the meta-analytical findings are representative of existing research variations and have stronger power than selective (non-representative) sampling.
Between-therapies comparisons: everyone has won, all must get prizes?
In line with previous meta-analytic studies, our research demonstrated that, overall, CBT and IPT are equally effective in treating depression, therefore supporting again the Dodo bird verdict (Miranda et al., 2005; Cuijpers et al., 2006; Weisz et al., 2006). This verdict also spread its wings across all but one of the ten preselected moderators. Interestingly, prior to multiple testing correction, the meta-analysis showed that CBT was significantly more effective than IPT for the treatment of depression when there were no concomitant ADMs. Therefore, considering (a) the significant difference between CBT and IPT when ADMs were excluded and (b) the fact that previous intervention studies displayed mixed findings, can we really conclude from this analysis that in the end, everyone has won, all must have prizes?
One possible explanation for this finding is that when controlling for ADMs, CBT is, in fact, more effective than IPT. While this contradicts the Dodo bird verdict and previous meta-analytic research, it is not spurious. For example, the aforementioned trials by Luty et al. (2007) and Rossello et al. (2008), which both excluded patients on ADMs, came to the same conclusion that CBT was more effective than IPT. While this explanation is possible, high levels of heterogeneity remained and this result did not remain significant under a conservative alpha correction; therefore, again, there is a possibility of interaction effects. A second explanation can be related back to the within-therapy findings on CBT and ADMs. While the current meta-analysis supports previous research conclusions that combined treatment may be overvalued and not necessarily required for CBT, the same has not been demonstrated for IPT (Hollon et al., 2005a, 2005b; Mintz, 2006). As a result, CBT without ADMs may be superior to both CBT plus ADMs and IPT without ADMs, which has important clinical implications. The application of CBT alone at most severity levels will not only contribute to favourable costs, it will also be a better long-term treatment plan, avoiding the aforementioned deleterious effects of long-term ADM use (Fava, 2003; Butler et al., 2006; Otto et al., 2005).
Excluding the effect of concomitant ADM therapy, CBT and IPT demonstrated equal efficacy across all other moderators. Unlike the most recent meta-analysis conducted by Zhou et al. (2017) and the majority of single moderator studies, our meta-analysis was not limited to RCTs, as recommended by Westen et al. (2004). While this increased the external validity and generalizability of the current results, like any meta-analysis, ours was also constrained by the studies included (Harrison, 2011). Closer inspection of the moderator ‘comorbidities’ demonstrated that almost half of the studies completely excluded patients who presented with comorbidities. Even though this was not the case for all, according to the average percentage of participants presenting with comorbidities in this study (55.47%), this exclusion of comorbidities resulted in potentially half of the depressed population missing from many analyses, thus perhaps interfering with the identification of some other significant moderators. Nevertheless, considering that our significant findings were robust against publication bias assessment and that power was sufficient to analyse all prespecified moderators, our findings strongly support the comparable short-term efficacy of face-to-face CBT and IPT, while highlighting the importance of moderating variables within each therapy.
Our meta-analysis also has important clinical and research implications. Firstly, researchers may need to reconsider existing clinical guidelines asking for the examination of efficacy moderators by addressing the factors that might prevent us from identifying useful moderators. Future research should consider depression together with all its complexities and focus on a more ecological definition of the disorder. If studies continue to exclude patients with comorbidities, who represent the MDD reality, actual therapy efficacy will never be comprehensively examined and thus conclusions on relapse will remain lacking. Secondly, cumulating the present findings, this study also suggests that combined therapy (psychotherapy with concomitant ADMs) should be re-considered. Such treatment is currently more expensive and shows little evidence of superiority for MDD. Thus, considering the superiority of CBT alone and the side-effects, tapering problems and withdrawal symptoms associated with ADMs, combined treatment should be prescribed carefully, only in complex cases and on a case-by-case basis (Bockting et al., 2018; Fava et al., 2018). Therefore, supporting the conclusions of a recent meta-analysis conducted by Fava et al. (2018), the use of ADMs, in this case combined with therapy, should only be targeted at the most persistent cases of MDD and for the shortest possible time.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291719002812