Personalized prediction of antidepressant v. placebo response: evidence from the EMBARC study

Christian A. Webb; Madhukar H. Trivedi; Zachary D. Cohen; Daniel G. Dillon; Jay C. Fournier; Franziska Goer; Maurizio Fava; Patrick J. McGrath; Myrna Weissman; Ramin Parsey; Phil Adams; Joseph M. Trombello; Crystal Cooper; Patricia Deldin; Maria A. Oquendo; Melvin G. McInnis; Quentin Huys; Gerard Bruder; Benji T. Kurian; Manish Jha; Robert J. DeRubeis; Diego A. Pizzagalli

doi:10.1017/S0033291718001708

Published online by Cambridge University Press: 02 July 2018

Christian A. Webb*: Affiliation:
Harvard Medical School – McLean Hospital, Boston, MA, USA
Madhukar H. Trivedi: Affiliation:
University of Texas, Southwestern Medical Center, Dallas, TX, USA
Zachary D. Cohen: Affiliation:
University of Pennsylvania, Philadelphia, PA, USA
Daniel G. Dillon: Affiliation:
Harvard Medical School – McLean Hospital, Boston, MA, USA
Jay C. Fournier: Affiliation:
University of Pittsburgh School of Medicine, Pittsburgh, PA, USA
Franziska Goer: Affiliation:
Harvard Medical School – McLean Hospital, Boston, MA, USA
Maurizio Fava: Affiliation:
Harvard Medical School, Massachusetts General Hospital, Boston, MA, USA
Patrick J. McGrath: Affiliation:
New York State Psychiatric Institute & Department of Psychiatry, College of Physicians and Surgeons of Columbia University, New York, NY, USA
Myrna Weissman: Affiliation:
New York State Psychiatric Institute & Department of Psychiatry, College of Physicians and Surgeons of Columbia University, New York, NY, USA
Ramin Parsey: Affiliation:
Stony Brook University, Stony Brook, NY, USA
Phil Adams: Affiliation:
New York State Psychiatric Institute & Department of Psychiatry, College of Physicians and Surgeons of Columbia University, New York, NY, USA
Joseph M. Trombello: Affiliation:
University of Texas, Southwestern Medical Center, Dallas, TX, USA
Crystal Cooper: Affiliation:
University of Texas, Southwestern Medical Center, Dallas, TX, USA
Patricia Deldin: Affiliation:
University of Michigan, Ann Arbor, MI, USA
Maria A. Oquendo: Affiliation:
University of Pennsylvania, Philadelphia, PA, USA
Melvin G. McInnis: Affiliation:
University of Michigan, Ann Arbor, MI, USA
Quentin Huys: Affiliation:
University of Zurich, Zurich, Switzerland
Gerard Bruder: Affiliation:
New York State Psychiatric Institute & Department of Psychiatry, College of Physicians and Surgeons of Columbia University, New York, NY, USA
Benji T. Kurian: Affiliation:
University of Texas, Southwestern Medical Center, Dallas, TX, USA
Manish Jha: Affiliation:
University of Texas, Southwestern Medical Center, Dallas, TX, USA
Robert J. DeRubeis: Affiliation:
University of Pennsylvania, Philadelphia, PA, USA
Diego A. Pizzagalli: Affiliation:
Harvard Medical School – McLean Hospital, Boston, MA, USA
*: Author for correspondence: Christian A. Webb, E-mail: cwebb@mclean.harvard.edu

Abstract

Background

Major depressive disorder (MDD) is a highly heterogeneous condition in terms of symptom presentation and, likely, underlying pathophysiology. Accordingly, it is possible that only certain individuals with MDD are well-suited to antidepressants. A potentially fruitful approach to parsing this heterogeneity is to focus on promising endophenotypes of depression, such as neuroticism, anhedonia, and cognitive control deficits.

Methods

Within an 8-week multisite trial of sertraline v. placebo for depressed adults (n = 216), we examined whether the combination of machine learning with a Personalized Advantage Index (PAI) can generate individualized treatment recommendations on the basis of endophenotype profiles coupled with clinical and demographic characteristics.

Results

Five pre-treatment variables moderated treatment response. Higher depression severity and neuroticism, older age, less impairment in cognitive control, and being employed were each associated with better outcomes to sertraline than placebo. Across 1000 iterations of a 10-fold cross-validation, the PAI model predicted that 31% of the sample would exhibit a clinically meaningful advantage [post-treatment Hamilton Rating Scale for Depression (HRSD) difference ⩾3] with sertraline relative to placebo. Although there were no overall outcome differences between treatment groups (d = 0.15), those identified as optimally suited to sertraline at pre-treatment had better week 8 HRSD scores if randomized to sertraline (10.7) than placebo (14.7) (d = 0.58).

Conclusions

A subset of MDD patients optimally suited to sertraline can be identified on the basis of pre-treatment characteristics. This model must be tested prospectively before it can be used to inform treatment selection. However, findings demonstrate the potential to improve individual outcomes through algorithm-guided treatment recommendations.

Keywords

Antidepressant; depression; endophenotype; machine learning; placebo; precision medicine; prediction

Type: Original Articles
Information: Psychological Medicine, Volume 49, Issue 7, May 2019, pp. 1118–1127

DOI: https://doi.org/10.1017/S0033291718001708
Copyright: Copyright © Cambridge University Press 2018

Introduction

Meta-analyses reveal that average differences in depressive symptom improvement between antidepressant medications [ADMs; most commonly, selective serotonin reuptake inhibitors (SSRIs)] and placebo are often small (i.e. between-group differences in symptom change of <3 points on the Hamilton Depression Rating Scale (Hamilton, 1960; Moncrieff et al., 2004; Kirsch et al., 2008; Fournier et al., 2010; Kirsch, 2015; Jakobsen et al., 2017; Cipriani et al., 2018)). A potential reason for this modest differentiation is that major depressive disorder (MDD) is a highly heterogeneous condition in terms of symptom presentation and, likely, underlying pathophysiology (Wakefield and Schmitz, 2013; Fried and Nesse, 2015a, 2015b; Baldessarini et al., 2017). Accordingly, it is possible that subsets of depressed individuals are better suited to SSRIs, whereas others may derive limited benefit. For example, for certain depressed individuals, the mere passage of time – possibly coupled with the expectation of improvement – may result in symptom remission (e.g. ‘spontaneous remitters’). Such individuals may not require SSRIs. Instead, a less costly, low-intensity alternative intervention with minimal or no side effects may be sufficient for symptom remission [e.g. internet-based cognitive behavioral therapy (CBT), which is included in the National Institute for Health and Care Excellence Guidelines (NICE, 2018) as an efficacious intervention]. Currently, treatment selection is largely based on trial-and-error. Approximately 55–75% of depressed individuals in primary care fail to achieve remission to first-line antidepressants, and 8–40% will switch to at least one other medication (Rush et al., 2006; Marcus et al., 2009; Schultz and Joish, 2009; Vuorilehto et al., 2009; Milea et al., 2010; Saragoussi et al., 2012; Thomas et al., 2013; Ball et al., 2014; Mars et al., 2017). Identifying predictors of antidepressant response may ultimately inform the development of algorithms generating personalized predictions of optimal treatment assignment for clinicians and patients to consider in their decision-making regarding which intervention to select.

A range of pre-treatment variables (e.g. baseline clinical, demographic, and neurobiological characteristics) have been examined as predictors of SSRI response.¹ Perhaps the most well-supported clinical moderator of SSRI v. placebo response is baseline depressive symptom severity (Khan et al., 2002; Kirsch et al., 2008; Fournier et al., 2010). Meta-analyses indicate that in patients with MDD, lower levels of depressive symptom severity predict minimal to no advantage of ADM over placebo, but that as depression severity increases, so does the magnitude of the advantage of ADM over placebo (Khan et al., 2002; Kirsch et al., 2008; Fournier et al., 2010). Other relevant predictors of greater ADM response include younger age (Fournier et al., 2009), being female (Trivedi et al., 2006; Jakubovski and Bloch, 2014), higher education (Trivedi et al., 2006), being employed (Fournier et al., 2009; Jakubovski and Bloch, 2014), lower anhedonia (McMakin et al., 2012; Uher et al., 2012a), non-chronic depression (Souery et al., 2007), and lower anxiety (Fava et al., 2008). Although each of these variables has limited predictive power when considered individually, recent advances in multivariable machine learning approaches allow for the combination of large sets of variables to predict treatment response (Gillan and Whelan, 2017). Critically, to be clinically useful for treatment selection, predictors of treatment response must be applicable to individual patients. Consistent with the goals of precision medicine, such work aims to translate treatment outcome moderation findings to actionable, algorithm-guided treatment recommendations (Cohen and DeRubeis, 2018).

We sought to use machine learning coupled with a recently published Personalized Advantage Index (PAI) (DeRubeis et al., 2014; Huibers et al., 2015) to predict treatment outcome at the individual level on the basis of pre-treatment patient data. Our aim was to use the above approach to identify the subset of patients who may be optimally suited to SSRI. With regard to machine-learning approaches, we used four complementary variable selection procedures in an effort to identify a reliable and stable set of predictors from the initial, larger set of baseline variables. These procedures rely on different algorithms, such as decision tree-based ensemble learning methods [e.g. Random Forests (RF)] and regression-based methods [e.g. Elastic Net Regularization (ENR)]. This approach encouraged the selection of a set of predictors that emerged consistently across differing variable selection algorithms (see ‘Variable selection’ section below). Data were derived from the multi-site EMBARC (Establishing Moderators and Biosignatures of Antidepressant Response for Clinical Care) clinical trial comparing SSRI (sertraline) v. placebo (Trivedi et al., 2016). Of relevance, in a recent study based on EEG and cluster analyses, we reported that the substantial heterogeneity of MDD could be parsed by considering three putative endophenotypes of depression: neuroticism, blunted reward learning, and cognitive control deficits (Webb et al., 2016). Endophenotypes are hypothesized to lie on the pathway between genes and downstream symptoms, and are traditionally defined as meeting the following criteria (Gottesman and Gould, 2003): (1) associated with the disease, (2) heritable, (3) primarily state-independent, (4) cosegregate within families, (5) familial association, and (6) measured reliably (Goldstein and Klein, 2014). We posited that depressed patients with certain endophenotype profiles may be differentially responsive to certain interventions (e.g. the cluster of depressed patients defined by relatively high levels of neuroticism may be more responsive to SSRIs). Indeed, there is evidence that depressed individuals characterized by elevated neuroticism may derive relatively greater therapeutic benefit from SSRIs relative to CBT (Bagby et al., 2008) or placebo (Tang et al., 2009). Thus, we examined whether the combination of putative endophenotypes (neuroticism, reward learning, cognitive control deficits, anhedonia) with both baseline clinical (depressive symptom severity, depression chronicity, anxiety severity) and demographic (gender, age, marital status, employment status, years of education) variables previously linked with antidepressant response could be used to identify individual depressed patients optimally suited to SSRIs. Plausible neuroimaging predictor variables (McGrath et al., 2013; Pizzagalli et al., 2018) were excluded from this particular study given that they are substantially more costly and time-consuming than the above set of clinical, demographic, and behavioral variables, the latter of which could be reasonably integrated into a current psychiatric clinic for the purpose of treatment selection.

Methods and materials

After providing informed consent, participants completed several behavioral and self-report assessments prior to enrolling in an 8-week, double-blind, placebo-controlled clinical trial of sertraline v. placebo. The clinical trial design has been described in detail in a previous publication (Trivedi et al., 2016).

Participants

Eligible participants (ages 18–65) met Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV) criteria for a current MDD episode (SCID-I/P), scored ⩾14 on the 16-item Quick Inventory of Depressive Symptomatology (QIDS-SR₁₆) (Rush et al., 2003), and were medication-free for ⩾3 weeks prior to completing any study measures. Exclusion criteria included: history of bipolar disorder or psychosis; substance dependence (excluding nicotine) in the past 6 months or substance abuse in the past 2 months; active suicidality; or unstable medical conditions (see online Supplementary Methods). Data from 216 MDD subjects who passed quality control criteria for both the Flanker Task and the Probabilistic Reward Task and completed at least 4 weeks of treatment (American Psychiatric Association, 2010; Fournier et al., 2013) were included (online Supplementary Methods).

Endophenotype measures

NEO Five-Factor Inventory-3 (NEO-FFI-3) (McCrae and Costa, 2010). The 12-item neuroticism subscale from the NEO-FFI-3 was used.

Probabilistic Reward Task (PRT). The PRT uses a differential reinforcement schedule to assess reward learning (i.e. the ability to adapt behavior as a function of rewards), and has been described in detail in previous publications (Pizzagalli et al., 2005, 2008a) (see online Supplementary Methods).

Snaith-Hamilton Pleasure Scale (SHAPS) (Snaith et al., 1995). The SHAPS is a 14-item self-report scale, with items asking about hedonic experience in the ‘last few days’ for a variety of pleasurable activities. Each item has four response categories: ‘strongly agree’ (=1), ‘agree’ (=2), ‘disagree’ (=3), and ‘strongly disagree’ (=4). Higher scores indicate higher anhedonia.

Flanker Task (Eriksen and Eriksen, 1974). An adapted version of the Eriksen Flanker Task that included an individually titrated response window was used to assess cognitive control (see online Supplementary Methods) (Holmes et al., 2010).

Clinical measures

Hamilton Rating Scale for Depression (HRSD) (Hamilton, 1960). The 17-item HRSD, a clinician-administered measure of depressive symptom severity, was administered by trained clinical evaluators.

Mood and Anxiety Symptoms Questionnaire (MASQ) (Watson et al., 1995). The anxious arousal subscale from a 30-item adaptation of the MASQ (MASQ-AA) assessed anxiety.

Data acquisition and reduction

PRT. The primary variable of interest was reward learning, which has been found to predict response to antidepressant treatment among inpatients with MDD (Vrieze et al., 2013). As in prior studies (Pizzagalli et al., 2008b; Vrieze et al., 2013), reward learning was defined as change in response bias (RB) scores throughout the task [here, from the first to the second block (RB_Block2 − RB_Block1)].

Flanker Task. The primary variable of interest was the interference effect on accuracy, defined as lower accuracy on incongruent relative to congruent trials, computed as (Accuracy_{Compatible trials} − Accuracy_{Incompatible trials}). Higher scores reflect greater interference (i.e. reduced cognitive control).
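As a hedged illustration of the two derived scores defined above, the following Python sketch computes them from per-participant summaries. The function and argument names are hypothetical and are not taken from the EMBARC analysis code; quality control and block-level scoring are assumed to have been completed beforehand.

```python
def reward_learning(rb_block1: float, rb_block2: float) -> float:
    """Change in response bias from the first to the second PRT block."""
    return rb_block2 - rb_block1


def flanker_interference(acc_compatible: float, acc_incompatible: float) -> float:
    """Accuracy on compatible trials minus accuracy on incompatible trials.

    Higher values reflect greater interference (reduced cognitive control).
    """
    return acc_compatible - acc_incompatible
```

Under these definitions, a participant who is 95% accurate on compatible trials and 80% accurate on incompatible trials would have an interference score of about 0.15.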

Data Pre-Processing. Missing data were imputed using an RF-based imputation strategy [missForest (Stekhoven and Bühlmann, 2012) package in R (R Core Team, 2013)] (see online Supplementary Methods) (Waljee et al., 2013). This approach can handle both categorical and continuous variables, and generates a single imputed dataset via averaging across multiple regression trees. Consistent with the recommendation of Kraemer and Blasey (2004), continuous variables were mean-centered and categorical variables were transformed into binary variables with the values of −0.5 and 0.5. Of the 216 individuals in this sample, 10.19% were missing data for the outcome variable (week 8 HRSD) and thus had their data imputed. There were no significant differences in week 8 completion rates between the SSRI (88.0%) and placebo (91.5%) conditions [χ² (1) = 0.41, p = 0.52]. For additional analyses on dropout rates and medication/placebo adherence, see online Supplementary Methods.
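The centering and coding step described above can be sketched in a few lines of Python. This is only an illustration: the helper names are hypothetical, and which level of a binary variable receives −0.5 rather than 0.5 is an assumption, since the text does not specify the direction of coding.

```python
from statistics import mean


def mean_center(values):
    """Subtract the sample mean so that the centered variable has mean zero."""
    m = mean(values)
    return [v - m for v in values]


def code_binary(values, reference):
    """Code a two-level categorical variable as -0.5 (reference) or 0.5 (other).

    The choice of reference level is an assumption made for illustration.
    """
    levels = set(values)
    if len(levels) != 2 or reference not in levels:
        raise ValueError("expected exactly two levels, one of them the reference")
    return [-0.5 if v == reference else 0.5 for v in values]
```

With this coding, the main effect of each predictor in a model containing treatment × predictor interactions is estimated at the midpoint of the two treatment groups rather than within one arbitrary reference group.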

Statistical analyses

Variable selection

Prior to implementing the PAI algorithm, pre-treatment variables that interact with treatment group (SSRI or placebo) in predicting HRSD outcome (week 8 scores) must be selected. We implemented (1) RF modeling [using the mobForest (Garge et al., 2013) package in R (R Core Team, 2013)], (2) ENR [glmnet package (Friedman et al., 2010)], and (3) Bayesian Additive Regression Trees [BART; bartMachine package (Kapelner and Bleich, 2016)]. For each of these three models, we entered all of our selected pre-treatment variables simultaneously: four endophenotype variables [neuroticism (NEO-FFI-3), cognitive control (Flanker interference effect on accuracy), reward learning (PRT), and anhedonia (SHAPS)], three clinical variables [baseline severity of depressive symptoms (HRSD), baseline severity of anxiety (MASQ-AA), and chronic MDD (yes/no)] and five demographic variables (age, gender, marital status, employment status, and years of education). Variables showing treatment group × predictor variable interactions in two of the three models were entered into a final stepwise AIC-penalized bootstrapped variable selection [using the bootStepAIC package (Austin and Tu, 2004)]. For details on each of these approaches and how variables are selected from each model, see online Supplementary Methods.
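The consensus rule in this step (carry forward variables flagged by two of the three models) can be sketched as a simple vote count. The Python below is illustrative only: it reads "two of the three" as "at least two", the function name is hypothetical, and it does not reproduce how each R package decides that a variable shows an interaction.

```python
from collections import Counter


def consensus_variables(selections, min_votes=2):
    """Return variables flagged as moderators by at least `min_votes` methods.

    `selections` maps a method label to the collection of variables it flagged.
    """
    votes = Counter(var for chosen in selections.values() for var in set(chosen))
    return sorted(var for var, n in votes.items() if n >= min_votes)
```

For example, given made-up selections such as `{"rf": {"age", "hrsd"}, "enr": {"hrsd", "neuroticism"}, "bart": {"age", "hrsd", "education"}}`, only `age` and `hrsd` would pass to the final bootstrapped stepwise step. These sets are invented for the example and are not the results reported in Table 1.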

Generating PAIs

Briefly, to generate treatment recommendations with the PAI approach, a regression model is built and used to predict treatment outcome (week 8 HRSD) for each patient in SSRI and placebo separately. A patient's PAI is the signed difference between the two predictions (i.e. week 8 HRSD predicted in SSRI minus week 8 HRSD predicted in placebo), where a negative value reflects a predicted better outcome in SSRI, and a positive value reflects the reverse. Moreover, the magnitude of the absolute value of the PAI reflects the strength of the differential prediction, such that patients with larger PAIs, in either direction, are those who are most likely to evidence a substantially better outcome in their PAI-indicated treatment relative to their PAI non-indicated treatment. To limit bias that could occur when evaluating model performance on individuals whose data were used to set model weights, PAIs were generated using 10-fold cross-validation. This procedure ensures that each model is estimated absent any data from the patient whose outcome will be predicted (see PAI generation and PAI evaluation in the online Supplementary Methods for details; see also Alternative PAI models section below).
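To make the logic of this procedure concrete, the following Python sketch generates cross-validated PAIs with an ordinary least-squares model containing treatment, covariates, and treatment × covariate terms. It is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, treatment is assumed to be coded 0.5 (SSRI) and −0.5 (placebo), covariates are assumed to be already centered and coded, and variable selection and imputation are left out.

```python
import random


def design_row(x, t):
    """Intercept, treatment, covariates, and treatment-by-covariate terms."""
    return [1.0, t] + list(x) + [t * v for v in x]


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def fit_ols(rows, y):
    """Least-squares coefficients from the normal equations."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


def predict(beta, row):
    return sum(b * v for b, v in zip(beta, row))


def cross_validated_pai(X, T, y, k=10, seed=0):
    """PAI per patient: predicted outcome under SSRI minus under placebo.

    Each patient's predictions come from a model fitted without that
    patient's fold, so no patient contributes to their own prediction.
    """
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    pai = [0.0] * len(y)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        beta = fit_ols([design_row(X[i], T[i]) for i in train],
                       [y[i] for i in train])
        for i in held_out:
            pai[i] = (predict(beta, design_row(X[i], 0.5))
                      - predict(beta, design_row(X[i], -0.5)))
    return pai
```

Because lower HRSD scores are better, a negative value returned by `cross_validated_pai` corresponds to a predicted advantage for SSRI. Repeating the call with different `seed` values would mimic the repeated cross-validation described in the evaluation section, although the averaging scheme used in the study is not specified here.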

Evaluating PAIs

To assess whether PAI scores moderate treatment group differences in depression outcomes, we tested a treatment group × PAI score interaction with week 8 HRSD scores as the dependent variable. Next, and similar to previous PAI publications (DeRubeis et al., 2014; Huibers et al., 2015), to evaluate the utility of the PAIs, we compared mean week 8 HRSD scores for SSRI-indicated individuals who were randomized to SSRI in comparison to SSRI-indicated participants who received placebo. We performed the analogous comparison for those identified as ‘placebo-indicated’. We then evaluated the above comparisons with only those patients for whom the absolute value of the PAI was 3 or greater (i.e. predicted to have a ‘clinically significant’ advantage in one treatment condition) (DeRubeis et al., 2014). Finally, the entire 10-fold cross-validation procedure and evaluation was repeated 1000 times to generate stable estimates.
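The indicated v. non-indicated comparison can be sketched as follows. This Python example is a simplified illustration with hypothetical names: it splits observed week 8 scores by whether each patient received the PAI-indicated condition and computes Cohen's d from the pooled standard deviation. It does not reproduce the interaction test or the significance tests reported in the Results.

```python
from math import sqrt
from statistics import mean, stdev


def split_by_indication(pai, received_ssri, week8, min_abs_pai=0.0):
    """Group observed outcomes by receipt of the PAI-indicated condition.

    Patients with |PAI| below `min_abs_pai` (or a PAI of exactly 0) are skipped.
    """
    indicated, contraindicated = [], []
    for p, got_ssri, score in zip(pai, received_ssri, week8):
        if p == 0 or abs(p) < min_abs_pai:
            continue
        ssri_indicated = p < 0  # negative PAI: better predicted outcome on SSRI
        target = indicated if ssri_indicated == got_ssri else contraindicated
        target.append(score)
    return indicated, contraindicated


def cohens_d(indicated, contraindicated):
    """Standardized difference; positive when the indicated group scores lower."""
    na, nb = len(indicated), len(contraindicated)
    pooled = sqrt(((na - 1) * stdev(indicated) ** 2
                   + (nb - 1) * stdev(contraindicated) ** 2) / (na + nb - 2))
    return (mean(contraindicated) - mean(indicated)) / pooled
```

Setting `min_abs_pai=3` restricts the comparison to patients with a predicted ‘clinically significant’ advantage, mirroring the second analysis described above.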

Results

Variable selection

See Table 1 for variable selection results, including which variables were selected during each stage. The following variables survived the four-step procedure and were included in the final model (see Fig. 1 and Table 2):

$$\begin{aligned} Y = {} & \text{treatment} \times (\text{depression severity [HRSD]} \\ & + \text{neuroticism [NEO-FFI-3]} \\ & + \text{cognitive control [Flanker Interference (Accuracy)]} \\ & + \text{age} + \text{employment status}). \end{aligned}$$

Fig. 1. Plots of baseline predictor by treatment group interactions from the final model.

Table 1. Variable selection results

HDRS, Hamilton Depression Rating Scale (17-item) (Hamilton, 1960); MASQ-AA, Mood and Anxiety Symptoms Questionnaire, Anxious Arousal subscore (Watson et al., 1995); MDD, major depressive disorder; NEO-FFI-3, NEO Five-Factor Inventory – 3 (McCrae and Costa, 2010); SHAPS, Snaith–Hamilton Pleasure Scale (Snaith et al., 1995); PRT, Probabilistic Reward Task (Pizzagalli et al., 2005); Flanker ACC, Flanker Interference Accuracy score (= Accuracy_{Compatible trials} − Accuracy_{Incompatible trials}); higher scores indicate more interference (i.e. reduced cognitive control); BART, Bayesian Additive Regression Trees.

^a Variables selected by BootStepAIC to be included in the final model.

Table 2. Final model

HDRS, Hamilton Depression Rating Scale (17-item) (Hamilton, 1960); NEO-FFI-3, NEO Five-Factor Inventory – 3 (McCrae and Costa, 2010); Flanker ACC, Flanker Interference Accuracy score (= Accuracy_{Compatible trials} − Accuracy_{Incompatible trials}).

⁺p < 0.10. *p < 0.05. **p < 0.01.

Predicted outcomes and PAIs

The average absolute value of PAI scores was 3.4 (s.d. = 2.6), indicating that our model predicted an average 3.4-point difference in week 8 HRSD scores between indicated and non-indicated treatment assignment. The absolute value of the PAI was 3 or greater in approximately half (48.6%) of the sample (see Fig. 2 for distribution of PAI scores). Specifically, 31.5% of the sample was predicted to have a ‘clinically significant’ advantage (DeRubeis et al., 2014) in the SSRI condition (PAI ⩽ −3), whereas this value was 17.1% for placebo (PAI ⩾ 3). In contrast, the model indicates that 51.4% of the sample was predicted to exhibit relatively minimal differences in outcome between treatment conditions.
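The three-way split reported above follows directly from the ±3 thresholds. As a small illustrative sketch (the labels and function names are hypothetical), the classification and the resulting sample shares could be computed in Python as:

```python
from collections import Counter


def classify_pai(pai, threshold=3.0):
    """Label a PAI score using the 'clinically significant' cut-off."""
    if pai <= -threshold:
        return "SSRI-indicated"
    if pai >= threshold:
        return "placebo-indicated"
    return "minimal difference"


def category_shares(pais, threshold=3.0):
    """Proportion of patients in each PAI category present in the data."""
    counts = Counter(classify_pai(p, threshold) for p in pais)
    return {label: n / len(pais) for label, n in counts.items()}
```

Applied to the study's PAI scores, shares of this kind would correspond to the reported 31.5%, 17.1%, and 51.4%, which sum to 100%.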

Fig. 2. Frequency histogram displaying distribution of Personalized Advantage Index (PAI) scores, computed as the predicted difference in week 8 HRSD scores for SSRI minus placebo. Accordingly, a PAI score <0 signifies that SSRI was indicated, whereas a PAI score >0 indicates that placebo was expected to yield a better outcome. The kernel density estimate illustrates the expected distribution of PAI scores in the population.

Observed outcomes in indicated v. non-indicated treatment condition

Full sample

First, it is important to highlight that, in the full sample, patients randomized to SSRI (M = 10.86; s.d. = 6.27) and placebo (M = 11.88; s.d. = 7.37) did not significantly differ in mean week 8 HRSD outcomes (adjusting for baseline HRSD scores) [F(1,213) = 0.92; p = 0.339; Cohen's d = 0.15; Fig. 3, left panel]. Critically, a significant treatment group × PAI interaction emerged in predicting week 8 HRSD scores, indicating that PAI scores moderated treatment group differences in outcome [F(1,212) = 6.68; p = 0.010]. For the full sample, patients randomized to their PAI-indicated treatment condition (M = 10.39; s.d. = 6.97) were observed to have lower week 8 HRSD scores relative to those randomized to their contraindicated condition (M = 12.38; s.d. = 6.70) [d = 0.29, t(214) = 2.16; p = 0.032]. For patients predicted to have better outcomes to SSRI than placebo (PAI < 0), those randomized to SSRI (M = 10.57; s.d. = 6.48) were observed to have lower week 8 HRSD scores than those randomized to placebo (M = 13.12; s.d. = 7.03) [d = 0.38, t(121) = 2.08; p = 0.040; see Fig. 3, right panel]. However, for patients predicted to have better outcomes to placebo (PAI > 0), those who received placebo (M = 10.18; s.d. = 7.54) did not differ significantly in outcome relative to those who received SSRI (M = 11.23; s.d. = 6.04) [d = 0.16; t(91) = 0.74; p = 0.460; see Fig. 3, right panel].

Fig. 3. Comparison of mean week 8 HRSD for patients randomized to SSRI or placebo (left panel) (n = 216). Comparison of mean week 8 HRSD scores for patients randomly assigned to their PAI-indicated treatment v. those assigned to their PAI-contraindicated treatment for the full sample (n = 216) v. including only patients for whom the algorithm predicted a clinically significant advantage in one treatment condition (|PAI| ⩾ 3; n = 105) (right panel). Error bars represent standard error.

Largest PAIs (|PAI| ⩾ 3)

Among this subset, patients randomized to their indicated treatment condition (M = 9.53; s.d. = 6.68) were observed to have lower week 8 HRSD scores relative to those randomized to their contraindicated condition (M = 14.09; s.d. = 6.42) [d = 0.70, t(103) = 3.59; p < 0.001]. SSRI-indicated patients randomized to SSRI (M = 10.68; s.d. = 7.04) were observed to have lower week 8 HRSD scores than those randomized to placebo (M = 14.66; s.d. = 6.83) [d = 0.58; t(66) = 2.34; p = 0.023; see Fig. 3, right panel]. Conversely, placebo-indicated patients randomized to placebo (M = 7.65; s.d. = 5.64) had better outcomes than those randomized to SSRI (M = 13.06; s.d. = 5.57) [d = 1.01; t(35) = 3.07; p = 0.004; see Fig. 3, right panel].

Alternative PAI models

See online Supplementary Material for results from two alternative PAI models. First, a PAI model was run including all 12 a priori baseline variables, rather than the reduced set of five moderators emerging from our variable selection procedure. In other words, in the former model including all a priori variables, our variable selection procedure was not performed. The fact that a similar pattern of findings emerged in this control PAI analysis suggests that our findings are likely not attributable to overfitting due to running our PAI analysis on a reduced set of variables emerging from our variable selection steps. Second, to evaluate the utility of treatment recommendations based solely on depression severity (rather than our five moderator variables), we re-ran the above analysis using only baseline depressive symptom (HRSD) severity to inform the PAI, which did not yield significant findings.

Discussion

This study used the variable selection approach proposed by Cohen et al. (2017) combining machine learning with a previously published PAI algorithm (DeRubeis et al., 2014; Huibers et al., 2015) to generate individualized treatment recommendations on the basis of (i) putative behavioral endophenotypes of depression (Goldstein and Klein, 2014; Webb et al., 2016) and (ii) clinical and demographic characteristics previously linked with antidepressant response. Ultimately, the goal is to translate research on predictors of antidepressant response into actionable treatment recommendations for individuals. First, it is important to highlight that the baseline moderators emerging from our machine learning variable selection steps are largely consistent with prior research. In particular, depressed individuals with higher baseline severity of depressive symptoms (Khan et al., 2002; Kirsch et al., 2008; Fournier et al., 2010), higher neuroticism (Tang et al., 2009), and who were employed (Fournier et al., 2009; Jakubovski and Bloch, 2014) had better outcomes to SSRI than placebo. In addition, relatively older patients and those with lower deficits in cognitive control (i.e. smaller Flanker accuracy interference effect) also exhibited better outcomes to SSRI. Of note, owing to their minimal cost and relatively low time burden, these baseline measurements could be more easily integrated into a treatment clinic than baseline assessments involving neuroimaging.

Perhaps the most well-supported clinical moderator of SSRI v. placebo response is baseline depressive symptom severity (Khan et al., 2002; Kirsch et al., 2008; Fournier et al., 2010). It should be noted that total depression score at baseline is not the only meaningful marker of depression severity. Other relevant variables such as episode chronicity and anhedonia were included in our initial models but did not survive the variable selection steps. Chronicity is known to be linked with poor response to placebo (Khan et al., 1991; Dunner, 2001), yet did not emerge as a moderator of SSRI v. placebo response. Consistent with prior work, higher neuroticism was associated with greater response to SSRI relative to placebo, which may in part be due to the role of SSRIs in blunting negative affect (Quilty et al., 2008; Tang et al., 2009; Soskin et al., 2012). It is important to highlight that elevated neuroticism moderated SSRI v. placebo response above and beyond the contribution of baseline depression (i.e. while the baseline HRSD × treatment group interaction was included in the model).

The interpretation of the cognitive control finding is less clear. Namely, those with more intact cognitive control exhibited better outcomes in SSRI v. placebo, whereas those with greater impairments showed little between-group difference in outcome. Continued cognitive impairments – even following symptom remission – are among the most common residual symptoms of depression (Herrera-Guzmán et al., 2009; Lam et al., 2014). Moderation may be more likely to be observed when comparing a treatment that more successfully targets cognitive control deficits (e.g. vortioxetine; Mahableshwarkar et al., 2015) v. one with limited pro-cognitive effects (also see Etkin et al., 2015).

Of the 12 a priori variables we initially included, seven did not survive our four-step variable selection procedure. It may be that some of these variables are prognostic predictors of outcome, but were not selected as they do not moderate SSRI v. placebo response. For example, higher anhedonia (McMakin et al., 2012; Uher et al., 2012a) and blunted reward learning (Vrieze et al., 2013) have each been shown to predict worse antidepressant outcome. Although anhedonia did not moderate SSRI v. placebo response, it did emerge as a prognostic predictor of worse outcome across groups (t = 3.51, p < 0.001; reward learning ns; see online Supplementary Results). With regard to the specific variable selection approaches used, both RF and BART identified the same five variables, whereas ENR selected a larger set of eight variables. Differences in results between these approaches are not unexpected, and may be due to the fact that both RF and BART rely on a similar decision tree-based ensemble learning algorithm, whereas ENR is a variant of classic regression. In addition, unlike ENR, both RF and BART consider both unspecified non-linear relationships and higher order interactions between variables.

Importantly, there were no overall differences in depression outcomes between patients randomized to SSRI and placebo (d = 0.15). These findings are in line with meta-analyses of SSRI v. placebo indicating small overall differences in outcome (Moncrieff et al., 2004; Kirsch et al., 2008; Fournier et al., 2010; Kirsch, 2015; Jakobsen et al., 2017; Cipriani et al., 2018). However, overall between-group comparisons obscure any meaningful between-patient characteristics that may moderate SSRI v. placebo differences in outcome. Indeed, we identified five patient characteristics that moderated group differences in depression outcome. These variables were subsequently entered into a PAI algorithm (DeRubeis et al., 2014; Huibers et al., 2015) to generate patient-specific predictions of SSRI v. placebo outcome. Results using our PAI model indicated that approximately one-third of the sample would have a clinically significant advantage (DeRubeis et al., 2014) with SSRI relative to placebo (PAI ⩽ −3). Intriguingly, and unexpectedly, the model also predicted that a subset (17%) of depressed individuals would exhibit a clinically significant advantage in placebo.
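To make the thresholds in the preceding paragraph concrete, the following minimal sketch shows how a PAI could be turned into a recommendation. It assumes the sign convention implied by the text (PAI = predicted week 8 HRSD under SSRI minus predicted HRSD under placebo, so that PAI ⩽ −3 favours SSRI and PAI ⩾ 3 favours placebo). The function names and the example predictions are hypothetical and are not taken from the study.

```python
def personalized_advantage_index(pred_hrsd_ssri, pred_hrsd_placebo):
    """Predicted week 8 HRSD under SSRI minus predicted HRSD under placebo.

    Assumed convention: lower HRSD is better, so a negative PAI favours SSRI.
    """
    return pred_hrsd_ssri - pred_hrsd_placebo


def recommend(pai, threshold=3.0):
    """Return (indicated condition, whether |PAI| meets the threshold)."""
    if pai < 0:
        indicated = "SSRI"
    elif pai > 0:
        indicated = "placebo"
    else:
        indicated = "neither"
    return indicated, abs(pai) >= threshold


# Hypothetical predicted outcomes for three patients (not study data).
examples = [(9.0, 13.5), (12.0, 11.0), (14.0, 10.0)]
for ssri, placebo in examples:
    pai = personalized_advantage_index(ssri, placebo)
    print(pai, recommend(pai))
```

In this sketch the first and third hypothetical patients fall in the "largest PAI" subset (|PAI| ⩾ 3), one favouring SSRI and one favouring placebo, while the second has a PAI too close to zero to count as a clinically significant predicted advantage.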

As the treatment recommendations for some individuals indicated almost no advantage of one treatment over the other (e.g. see distribution of PAI scores near 0 in Fig. 2), one might reasonably expect that differences in outcome between patients who received their PAI-indicated v. contraindicated treatment would be larger for those individuals predicted to have more clinically meaningful differences in outcomes (i.e. larger absolute PAI values), which our sub-analyses confirmed. Notably, when considering the subset with larger PAIs (absolute PAI values ⩾ 3), the effect size for the difference in outcome for SSRI-indicated patients who were randomized to SSRI v. placebo (d = 0.58) was substantially larger than the overall treatment group difference between SSRI and placebo (d = 0.15), as well as larger than the effect sizes reported in systematic reviews of ADM v. placebo comparisons (d ~ 0.30) (Kirsch et al., 2008; Turner et al., 2008; Fournier et al., 2010; Khin et al., 2011; Kirsch, 2015; Moncrieff and Kirsch, 2015; Cipriani et al., 2018), and those observed between active treatments and controls from general medical contexts (d ~ 0.45) (Leucht et al., 2012). In sum, findings suggest that our statistical approach may identify patients who are optimally suited to SSRI treatment. Of course, this study compared SSRI v. a placebo condition, rather than an alternative evidence-based treatment (e.g. CBT). Thus, our model identified individuals who would likely evidence greater depressive symptom improvement on an SSRI relative to an intervention providing the ‘non-specific’ therapeutic elements associated with a pill placebo condition (i.e. the expectation of symptom improvement, the passage of time, symptom monitoring and minimal contact/support from a clinician).

Although no statistically significant advantage was observed for placebo-indicated patients who received their indicated treatment, a significant advantage of placebo over SSRI was observed for the 17% of the sample for whom placebo was more strongly indicated (PAIs ⩾ 3; d = 1.01). The possibility that SSRIs are relatively ineffective or countertherapeutic for certain patients (e.g. due to side effects) requires additional research (Bet et al., 2013; Julien, 2013; Hollon, 2016). It is important to emphasize that this finding did not emerge in the full sample. Given the reduced sample size in this subgroup analysis, conclusions must be tempered and replications are required.

An alternative PAI model based exclusively on pre-treatment HRSD scores did not yield significant findings, suggesting that baseline depressive symptom severity alone is not as informative as our model incorporating baseline data on five variables. In addition, a similar pattern of findings emerged in a control PAI analysis (in which all 12 a priori variables were included) relative to our primary analysis, suggesting that our findings are likely not attributable to overfitting due to running our PAI analysis on a reduced set of variables emerging from our variable selection steps.

Limitations

Several limitations should be noted. First, and importantly, prospective tests are needed in which a PAI model is built in one sample, and then tested in a separate sample. The k-fold cross-validation approach we used approximates such a test by leaving each patient's data out of the model used to generate their predicted outcomes. However, although we implemented cross-validation during the weight-setting stage, we used the full sample for variable selection, which can lead to overfitting and inflated associations (Hastie et al., 2009; Fiedler, 2011). Until such models are tested and replicated in separate samples, it will be difficult to determine the extent to which overfitting contributes to findings and whether models generalize to new sets of treatment-seeking depressed individuals. Second, we focused on clinical, demographic, and putative behavioral endophenotypes that could be collected at low cost and with relatively minimal clinic staff and patient burden. The extent to which neural assessments provide incremental predictive validity above and beyond such variables is an important direction for research, particularly with regard to relatively less costly and non-invasive imaging approaches (e.g. EEG). Third, it is unclear whether findings would generalize to depressed individuals who do not meet the inclusion/exclusion criteria of this trial. In addition, as others have highlighted (Uher et al., 2012b), measures of outcome (HRSD) and predictors include a certain amount of error, which may significantly attenuate the magnitude of observed predictor–outcome associations. Fourth, sample size was relatively small. Finally, the current PAI model relies on randomized designs (i.e. to examine outcomes for those randomly assigned to their indicated v. non-indicated treatment). An important future direction for research is to adapt these statistical models for the investigation of optimal treatment assignment in current clinical practice settings in which patients are not randomly assigned to interventions. These limitations notwithstanding, our findings demonstrate the potential for precision medicine to improve individual outcomes through model-guided treatment recommendations rather than the current practice of trial-and-error. Findings from replicated prescriptive algorithms could ultimately be used to inform the development of web-based ‘treatment selection calculators’ available to clinicians and patients to facilitate decision-making.
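The cross-validation logic described in the first limitation above (each patient's predictions come from a model fitted without that patient's data) can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation: it assumes an ordinary least-squares model with treatment × moderator interactions, uses 10 folds as an arbitrary choice, and runs on simulated data rather than EMBARC data. The names `design_matrix` and `cross_validated_pai` are hypothetical.

```python
import numpy as np


def design_matrix(x, treat):
    """Columns: intercept, treatment, moderators, treatment x moderator terms."""
    t = np.asarray(treat, dtype=float).reshape(-1, 1)
    return np.hstack([np.ones_like(t), t, x, t * x])


def cross_validated_pai(x, treat, y, k=10, seed=0):
    """Return one out-of-fold PAI per patient (predicted SSRI minus placebo)."""
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), k)
    pai = np.empty(n)
    for held_out in folds:
        train = np.setdiff1d(np.arange(n), held_out)
        # Weights are estimated without the held-out patients.
        beta, *_ = np.linalg.lstsq(
            design_matrix(x[train], treat[train]), y[train], rcond=None
        )
        # Predict both counterfactual outcomes for each held-out patient.
        pred_ssri = design_matrix(x[held_out], np.ones(len(held_out))) @ beta
        pred_placebo = design_matrix(x[held_out], np.zeros(len(held_out))) @ beta
        pai[held_out] = pred_ssri - pred_placebo
    return pai


# Simulated data only (not EMBARC data): 200 patients, 2 moderators.
rng = np.random.default_rng(42)
x = rng.normal(size=(200, 2))
treat = rng.integers(0, 2, size=200)   # 1 = SSRI, 0 = placebo
true_pai = -2.0 - 3.0 * x[:, 0]        # simulated SSRI v. placebo difference
y = 12.0 + treat * true_pai + rng.normal(scale=1.0, size=200)

pai = cross_validated_pai(x, treat, y)
print(np.corrcoef(pai, true_pai)[0, 1])
```

In this sketch only the weight-setting step is cross-validated, which mirrors the limitation noted above. Moving any variable selection step inside the loop, so that it also sees only the training folds, is one way to reduce the risk of overfitting from selecting variables on the full sample.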

Financial support

The EMBARC study was supported by the National Institute of Mental Health of the National Institutes of Health under award numbers U01MH092221 (Trivedi M.H.) and U01MH092250 (McGrath P.J., Parsey R.V., Weissman M.M.). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health. This work was supported by the EMBARC National Coordinating Center at UT Southwestern Medical Center (Madhukar H. Trivedi, M.D., Coordinating PI) and the Data Center at Columbia and Stony Brook Universities. Christian A. Webb and Diego A. Pizzagalli were partially supported by 5K23MH108752 and 2R37MH068376, respectively. Zachary D. Cohen and Robert J. DeRubeis are supported in part by a grant from MQ: Transforming mental health MQ14PM_27. The opinions and assertions contained in this article should not be construed as reflecting the views of the sponsors.

Supplementary material

The supplementary material for this article can be found at https://doi.org/10.1017/S0033291718001708

Acknowledgements

This study was supported by NIMH grants (M.H.T., grant number U01MH092221; P.J.M., R.V.P., M.M.W., grant number U01MH092250).

Conflict of interest

In the last three years, the authors report the following financial disclosures, for activities unrelated to the current research: Dr Trivedi reports the following lifetime disclosures: research support from the Agency for Healthcare Research and Quality, Cyberonics Inc., National Alliance for Research in Schizophrenia and Depression, National Institute of Mental Health, National Institute on Drug Abuse, National Institute of Diabetes and Digestive and Kidney Diseases, Johnson & Johnson, and consulting and speaker fees from Abbott Laboratories Inc., Akzo (Organon Pharmaceuticals Inc.), Allergan Sales LLC, Alkermes, AstraZeneca, Axon Advisors, Brintellix, Bristol-Myers Squibb Company, Cephalon Inc., Cerecor, Eli Lilly & Company, Evotec, Fabre Kramer Pharmaceuticals Inc., Forest Pharmaceuticals, GlaxoSmithKline, Health Research Associates, Johnson & Johnson, Lundbeck, MedAvante Medscape, Medtronic, Merck, Mitsubishi Tanabe Pharma Development America Inc., MSI Methylation Sciences Inc., Nestle Health Science-PamLab Inc., Naurex, Neuronetics, One Carbon Therapeutics Ltd., Otsuka Pharmaceuticals, Pamlab, Parke-Davis Pharmaceuticals Inc., Pfizer Inc., PgxHealth, Phoenix Marketing Solutions, Rexahn Pharmaceuticals, Ridge Diagnostics, Roche Products Ltd., Sepracor, SHIRE Development, Sierra, SK Life and Science, Sunovion, Takeda, Tal Medical/Puretech Venture, Targacept, Transcept, VantagePoint, Vivus, and Wyeth-Ayerst Laboratories. Dr Dillon reports funding from NIMH, consulting fees from Pfizer Inc. Dr Fava reports the following lifetime disclosures: http://mghcme.org/faculty/faculty-detail/maurizio_fava. Dr Weissman reports funding from NIMH, the National Alliance for Research on Schizophrenia and Depression (NARSAD), the Sackler Foundation, and the Templeton Foundation; royalties from the Oxford University Press, Perseus Press, the American Psychiatric Association Press, and MultiHealth Systems. 
Dr Oquendo reports funding from NIMH; royalties for the commercial use of the Columbia-Suicide Severity Rating Scale. Her family owns stock in Bristol Myers Squibb. Dr McInnis reports funding from NIMH; consulting fees from Janssen and Otsuka Pharmaceuticals. Dr McGrath has received research grant support from Naurex Pharmaceuticals (now Allergan), Sunovion, and the State of New York. Dr Pizzagalli reports funding from NIMH and the Dana Foundation; consulting fees from Akili Interactive Labs, BlackThorn Therapeutics, Boehringer Ingelheim, Takeda Pharmaceuticals USA and Posit Science. Dr Trombello currently owns stock in Merck and Gilead Sciences and within the past 36 months previously owned stock in Johnson & Johnson. Drs Adams, Cohen, Bruder, Cooper, Deldin, DeRubeis, Fournier, Huys, Jha, Kurian, McGrath, Parsey, Webb, Ms Goer report no financial conflicts.

Ethical standards

The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.

Footnotes

¹ The term predictor is used differently in different contexts [e.g. a ‘prescriptive predictor’ or ‘moderator’ (i.e. defined as a treatment group × predictor variable interaction) of outcome v. a ‘prognostic’ (i.e. treatment non-specific) predictor of outcome] (Fournier et al., 2009; Kraemer, 2013). Here, we include variables that have demonstrated moderation (e.g. baseline depression and neuroticism moderating SSRI v. placebo differences in outcome), as well as findings from single-arm designs demonstrating that a particular variable (e.g. educational level) predicts outcome within ADM.

References

American Psychiatric Association (2010) Treatment of Patients with Major Depressive Disorder. 3rd Edn. Washington, DC: American Psychiatric Press.

Austin, PC and Tu, JV (2004) Bootstrap methods for developing predictive models. The American Statistician 58, 131–137.

Bagby, RM, Quilty, LC, Segal, ZV, McBride, CC, Kennedy, SH and Costa, PT (2008) Personality and differential treatment response in major depression: a randomized controlled trial comparing cognitive-behavioural therapy and pharmacotherapy. The Canadian Journal of Psychiatry 53, 361–370.

Baldessarini, RJ, Forte, A, Selle, V, Sim, K, Tondo, L, Undurraga, J and Vázquez, GH (2017) Morbidity in depressive disorders. Psychotherapy and Psychosomatics 86, 65–72.

Ball, S, Classi, P and Dennehy, EB (2014) What happens next?: a claims database study of second-line pharmacotherapy in patients with major depressive disorder (MDD) who initiate selective serotonin reuptake inhibitor (SSRI) treatment. Annals of General Psychiatry 13, 8.

Bet, PM, Hugtenburg, JG, Penninx, BWJH and Hoogendijk, WJG (2013) Side effects of antidepressants during long-term use in a naturalistic setting. European Neuropsychopharmacology 23, 1443–1451.

Cipriani, A, Furukawa, TA, Salanti, G, Chaimani, A, Atkinson, LZ, Ogawa, Y, Leucht, S, Ruhe, HG, Turner, EH, Higgins, JPT, Egger, M, Takeshima, N, Hayasaka, Y, Imai, H, Shinohara, K, Tajika, A, Ioannidis, JPA and Geddes, JR (2018) Comparative efficacy and acceptability of 21 antidepressant drugs for the acute treatment of adults with major depressive disorder: a systematic review and network meta-analysis. The Lancet 391, 1357–1366.

Cohen, ZD and DeRubeis, RJ (2018) Treatment selection in depression. Annual Review of Clinical Psychology 14, 209–236.

Cohen, ZD, Kim, T, Van, H, Dekker, J and Driessen, E (2017) Recommending cognitive-behavioral versus psychodynamic therapy for mild to moderate adult depression. PsyArXiv, https://osf.io/6qxve/.

DeRubeis, RJ, Cohen, ZD, Forand, NR, Fournier, JC, Gelfand, LA and Lorenzo-Luaces, L (2014) The personalized advantage index: translating research on prediction into individualized treatment recommendations. A demonstration. PLoS ONE 9, e83875.

Dunner, DL (2001) Acute and maintenance treatment of chronic depression. The Journal of Clinical Psychiatry 62(Suppl. 6), 10–16.

Eriksen, BA and Eriksen, CW (1974) Effects of noise letters upon the identification of a target letter in a nonsearch task. Perception & Psychophysics 16, 143–149.

Etkin, A, Patenaude, B, Song, YJC, Usherwood, T, Rekshan, W, Schatzberg, AF, Rush, AJ and Williams, LM (2015) A cognitive–emotional biomarker for predicting remission with antidepressant medications: a report from the iSPOT-D trial. Neuropsychopharmacology 40, 1332–1342.

Fava, M, Rush, AJ, Alpert, JE, Balasubramani, GK, Wisniewski, SR, Carmin, CN, Biggs, MM, Zisook, S, Leuchter, A, Howland, R, Warden, D and Trivedi, MH (2008) Difference in treatment outcome in outpatients with anxious versus nonanxious depression: a STAR*D report. American Journal of Psychiatry 165, 342–351.

Fiedler, K (2011) Voodoo correlations are everywhere – not only in neuroscience. Perspectives on Psychological Science 6, 163–171.

Fournier, JC, DeRubeis, RJ, Hollon, SD, Dimidjian, S, Amsterdam, JD, Shelton, RC and Fawcett, J (2010) Antidepressant drug effects and depression severity: a patient-level meta-analysis. JAMA 303, 47–53.

Fournier, JC, DeRubeis, RJ, Hollon, SD, Gallop, R, Shelton, RC and Amsterdam, JD (2013) Differential change in specific depressive symptoms during antidepressant medication or cognitive therapy. Behaviour Research and Therapy 51, 392–398.

Fournier, JC, DeRubeis, RJ, Shelton, RC, Hollon, SD, Amsterdam, JD and Gallop, R (2009) Prediction of response to medication and cognitive therapy in the treatment of moderate to severe depression. Journal of Consulting and Clinical Psychology 77, 775–787.

Fried, EI and Nesse, RM (2015a) Depression is not a consistent syndrome: an investigation of unique symptom patterns in the STAR*D study. Journal of Affective Disorders 172, 96–102.

Fried, EI and Nesse, RM (2015b) Depression sum-scores don't add up: why analyzing specific depression symptoms is essential. BMC Medicine 13, 72.

Friedman, J, Hastie, T and Tibshirani, R (2010) Regularization paths for generalized linear models via coordinate descent. Journal of Statistical Software 33, 1–22.

Garge, NR, Bobashev, G and Eggleston, B (2013) Random forest methodology for model-based recursive partitioning: the mobForest package for R. BMC Bioinformatics 14, 125.

Gillan, CM and Whelan, R (2017) What big data can do for treatment in psychiatry. Current Opinion in Behavioral Sciences 18, 34–42.

Goldstein, BL and Klein, DN (2014) A review of selected candidate endophenotypes for depression. Clinical Psychology Review 34, 417–427.

Gottesman, II and Gould, TD (2003) The endophenotype concept in psychiatry: etymology and strategic intentions. The American Journal of Psychiatry 160, 636–645.

Hamilton, M (1960) A rating scale for depression. Journal of Neurology, Neurosurgery & Psychiatry 23, 56–62.

Hastie, T, Tibshirani, R and Friedman, J (2009) The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd Edn. New York, NY: Springer.

Herrera-Guzmán, I, Gudayol-Ferré, E, Herrera-Guzmán, D, Guàrdia-Olmos, J, Hinojosa-Calvo, E and Herrera-Abarca, JE (2009) Effects of selective serotonin reuptake and dual serotonergic-noradrenergic reuptake treatments on memory and mental processing speed in patients with major depressive disorder. Journal of Psychiatric Research 43, 855–863.

Hollon, SD (2016) The efficacy and acceptability of psychological interventions for depression: where we are now and where we are going. Epidemiology and Psychiatric Sciences 25, 295–300.

Holmes, AJ, Bogdan, R and Pizzagalli, DA (2010) Serotonin transporter genotype and action monitoring dysfunction: a possible substrate underlying increased vulnerability to depression. Neuropsychopharmacology 35, 1186–1197.

Huibers, MJH, Cohen, ZD, Lemmens, LHJM, Arntz, A, Peeters, FPML, Cuijpers, P and DeRubeis, RJ (2015) Predicting optimal outcomes in cognitive therapy or interpersonal psychotherapy for depressed individuals using the personalized advantage index approach. PLoS ONE 10, e0140771.

Jakobsen, JC, Katakam, KK, Schou, A, Hellmuth, SG, Stallknecht, SE, Leth-Møller, K, Iversen, M, Banke, MB, Petersen, IJ, Klingenberg, SL, Krogh, J, Ebert, SE, Timm, A, Lindschou, J and Gluud, C (2017) Selective serotonin reuptake inhibitors versus placebo in patients with major depressive disorder. A systematic review with meta-analysis and Trial Sequential Analysis. BMC Psychiatry 17, 58.

Jakubovski, E and Bloch, MH (2014) Prognostic subgroups for citalopram response in the STAR*D trial. The Journal of Clinical Psychiatry 75, 738–747.

Julien, RM (2013) A Primer of Drug Action: A Concise Nontechnical Guide to the Actions, Uses, and Side Effects of Psychoactive Drugs, Revised and Updated. New York: Henry Holt and Company.

Kapelner, A and Bleich, J (2016) bartMachine: machine learning with Bayesian additive regression trees. Journal of Statistical Software 70, 1–40.

Khan, A, Dager, SR, Cohen, S, Avery, DH, Scherzo, B and Dunner, DL (1991) Chronicity of depressive episode in relation to antidepressant-placebo response. Neuropsychopharmacology 4, 125–130.

Khan, A, Leventhal, RM, Khan, SR and Brown, WA (2002) Severity of depression and response to antidepressants and placebo: an analysis of the Food and Drug Administration database. Journal of Clinical Psychopharmacology 22, 40–45.

Khin, NA, Chen, Y-F, Yang, Y, Yang, P and Laughren, TP (2011) Exploratory analyses of efficacy data from major depressive disorder trials submitted to the US food and drug administration in support of new drug applications. The Journal of Clinical Psychiatry 72, 464–472.

Kirsch, I (2015) Clinical trial methodology and drug-placebo differences. World Psychiatry 14, 301–302.

Kirsch, I, Deacon, BJ, Huedo-Medina, TB, Scoboria, A, Moore, TJ and Johnson, BT (2008) Initial severity and antidepressant benefits: a meta-analysis of data submitted to the food and drug administration. PLoS Medicine 5, e45.

Kraemer, HC (2013) Discovering, comparing, and combining moderators of treatment on outcome after randomized clinical trials: a parametric approach. Statistics in Medicine 32, 1964–1973.

Kraemer, HC and Blasey, CM (2004) Centring in regression analyses: a strategy to prevent errors in statistical inference. International Journal of Methods in Psychiatric Research 13, 141–151.

Lam, RW, Kennedy, SH, McIntyre, RS and Khullar, A (2014) Cognitive dysfunction in major depressive disorder: effects on psychosocial functioning and implications for treatment. The Canadian Journal of Psychiatry 59, 649–654.

Leucht, S, Hierl, S, Kissling, W, Dold, M and Davis, JM (2012) Putting the efficacy of psychiatric and general medicine medication into perspective: review of meta-analyses. The British Journal of Psychiatry 200, 97–106.

Mahableshwarkar, AR, Zajecka, J, Jacobson, W, Chen, Y and Keefe, RS (2015) A randomized, placebo-controlled, active-reference, double-blind, flexible-dose study of the efficacy of vortioxetine on cognitive function in major depressive disorder. Neuropsychopharmacology 40, 2025–2037.

Marcus, SC, Hassan, M and Olfson, M (2009) Antidepressant switching among adherent patients treated for depression. Psychiatric Services 60, 617–623.

Mars, B, Heron, J, Gunnell, D, Martin, RM, Thomas, KH and Kessler, D (2017) Prevalence and patterns of antidepressant switching amongst primary care patients in the UK. Journal of Psychopharmacology 31, 553–560.

McCrae, RR and Costa, PT (2010) NEO Inventories Professional Manual. Lutz, FL: Psychological Assessment Resources.

McGrath, CL, Kelley, ME, Holtzheimer, PE, Dunlop, BW, Craighead, WE, Franco, AR, Craddock, RC and Mayberg, HS (2013) Toward a neuroimaging treatment selection biomarker for major depressive disorder. JAMA Psychiatry 70, 821–829.Google Scholar

McMakin, DL, Olino, TM, Porta, G, Dietz, LJ, Emslie, G, Clarke, G, Wagner, KD, Asarnow, JR, Ryan, ND, Birmaher, B, Shamseddeen, W, Mayes, T, Kennard, B, Spirito, A, Keller, M, Lynch, FL, Dickerson, JF and Brent, DA (2012) Anhedonia predicts poorer recovery among youth with selective serotonin reuptake inhibitor-treatment resistant depression. Journal of the American Academy of Child and Adolescent Psychiatry 51, 404–411.Google Scholar

Milea, D, Guelfucci, F, Bent-Ennakhil, N, Toumi, M and Auray, J-P (2010) Antidepressant monotherapy: a claims database analysis of treatment changes and treatment duration. Clinical Therapeutics 32, 2057–2072.Google Scholar

Moncrieff, J and Kirsch, I (2015) Empirically derived criteria cast doubt on the clinical significance of antidepressant-placebo differences. Contemporary Clinical Trials 43, 60–62.Google Scholar

Moncrieff, J, Wessely, S and Hardy, R (2004) Active placebos versus antidepressants for depression. The Cochrane Library.Google Scholar

NICE (2018) Depression in adults: recognition and management|guidance and guidelines|NICE.Google Scholar

Pizzagalli, DA, Evins, AE, Schetter, EC, Frank, MJ, Pajtas, PE, Santesso, DL and Culhane, M (2008a) Single dose of a dopamine agonist impairs reinforcement learning in humans: behavioral evidence from a laboratory-based measure of reward responsiveness. Psychopharmacology 196, 221–232.Google Scholar

Pizzagalli, DA, Goetz, E, Ostacher, M, Iosifescu, DV and Perlis, RH (2008b) Euthymic patients with bipolar disorder show decreased reward learning in a probabilistic reward task. Biological Psychiatry 64, 162–168.Google Scholar

Pizzagalli, DA, Jahn, AL and O'Shea, JP (2005) Toward an objective characterization of an anhedonic phenotype: a signal-detection approach. Biological Psychiatry 57, 319–327.Google Scholar

Pizzagalli, DA, Webb, CA, Dillon, DG, Tenke, CE, Kayser, J, Goer, F, Fava, M, McGrath, P, Weissman, M, Parsey, R, Adams, P, Trombello, J, Cooper, C, Deldin, P, Oquendo, MA, McInnis, MG, Carmody, T, Bruder, G and Trivedi, MH (2018) Pretreatment rostral anterior cingulate cortex theta activity in relation to symptom improvement in depression: a randomized clinical trial. JAMA Psychiatry 75, 547–554.

Quilty, LC, Meusel, L-AC and Bagby, RM (2008) Neuroticism as a mediator of treatment response to SSRIs in major depressive disorder. Journal of Affective Disorders 111, 67–73.

R Core Team (2013) R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing.

Rush, AJ, Trivedi, MH, Ibrahim, HM, Carmody, TJ, Arnow, B, Klein, DN, Markowitz, JC, Ninan, PT, Kornstein, S, Manber, R, Thase, ME, Kocsis, JH and Keller, MB (2003) The 16-item quick inventory of depressive symptomatology (QIDS), clinician rating (QIDS-C), and self-report (QIDS-SR): a psychometric evaluation in patients with chronic major depression. Biological Psychiatry 54, 573–583.

Rush, AJ, Trivedi, MH, Wisniewski, SR, Nierenberg, AA, Stewart, JW, Warden, D, Niederehe, G, Thase, ME, Lavori, PW, Lebowitz, BD, McGrath, PJ, Rosenbaum, JF, Sackeim, HA, Kupfer, DJ, Luther, J and Fava, M (2006) Acute and longer-term outcomes in depressed outpatients requiring one or several treatment steps: a STAR*D report. The American Journal of Psychiatry 163, 1905–1917.

Saragoussi, D, Chollet, J, Bineau, S, Chalem, Y and Milea, D (2012) Antidepressant switching patterns in the treatment of major depressive disorder: a General Practice Research Database (GPRD) Study. International Journal of Clinical Practice 66, 1079–1087.

Schultz, J and Joish, V (2009) Costs associated with changes in antidepressant treatment in a managed care population with major depressive disorder. Psychiatric Services 60, 1604–1611.

Snaith, RP, Hamilton, M, Morley, S, Humayan, A, Hargreaves, D and Trigwell, P (1995) A scale for the assessment of hedonic tone: the Snaith-Hamilton Pleasure Scale. British Journal of Psychiatry 167, 99–103.

Soskin, DP, Carl, JR, Alpert, J and Fava, M (2012) Antidepressant effects on emotional temperament: toward a biobehavioral research paradigm for major depressive disorder. CNS Neuroscience & Therapeutics 18, 441–451.

Souery, D, Oswald, P, Massat, I, Bailer, U, Bollen, J, Demyttenaere, K, Kasper, S, Lecrubier, Y, Montgomery, S, Serretti, A, Zohar, J and Mendlewicz, J, Group for the Study of Resistant Depression (2007) Clinical factors associated with treatment resistance in major depressive disorder: results from a European multicenter study. The Journal of Clinical Psychiatry 68, 1062–1070.

Stekhoven, DJ and Bühlmann, P (2012) MissForest – non-parametric missing value imputation for mixed-type data. Bioinformatics (Oxford, England) 28, 112–118.

Tang, TZ, DeRubeis, RJ, Hollon, SD, Amsterdam, J, Shelton, R and Schalet, B (2009) Personality change during depression treatment: a placebo-controlled trial. Archives of General Psychiatry 66, 1322–1330.

Thomas, L, Kessler, D, Campbell, J, Morrison, J, Peters, TJ, Williams, C, Lewis, G and Wiles, N (2013) Prevalence of treatment-resistant depression in primary care: cross-sectional data. The British Journal of General Practice 63, e852–e858.

Trivedi, MH, McGrath, PJ, Fava, M, Parsey, RV, Kurian, BT, Phillips, ML, Oquendo, MA, Bruder, G, Pizzagalli, D, Toups, M, Cooper, C, Adams, P, Weyandt, S, Morris, DW, Grannemann, BD, Ogden, RT, Buckner, R, McInnis, M, Kraemer, HC, Petkova, E, Carmody, TJ and Weissman, MM (2016) Establishing moderators and biosignatures of antidepressant response in clinical care (EMBARC): rationale and design. Journal of Psychiatric Research 78, 11–23.

Trivedi, MH, Rush, AJ, Wisniewski, SR, Nierenberg, AA, Warden, D, Ritz, L, Norquist, G, Howland, RH, Lebowitz, B, McGrath, PJ, Shores-Wilson, K, Biggs, MM, Balasubramani, GK and Fava, M (2006) Evaluation of outcomes with citalopram for depression using measurement-based care in STAR*D: implications for clinical practice. American Journal of Psychiatry 163, 28–40.

Turner, EH, Matthews, AM, Linardatos, E, Tell, RA and Rosenthal, R (2008) Selective publication of antidepressant trials and its influence on apparent efficacy. New England Journal of Medicine 358, 252–260.

Uher, R, Perlis, RH, Henigsberg, N, Zobel, A, Rietschel, M, Mors, O, Hauser, J, Dernovsek, MZ, Souery, D, Bajs, M, Maier, W, Aitchison, KJ, Farmer, A and McGuffin, P (2012a) Depression symptom dimensions as predictors of antidepressant treatment outcome: replicable evidence for interest-activity symptoms. Psychological Medicine 42, 967–980.

Uher, R, Tansey, KE, Malki, K and Perlis, RH (2012b) Biomarkers predicting treatment outcome in depression: what is clinically significant? Pharmacogenomics 13, 233–240.

Vrieze, E, Pizzagalli, DA, Demyttenaere, K, Hompes, T, Sienaert, P, de Boer, P, Schmidt, M and Claes, S (2013) Reduced reward learning predicts outcome in major depressive disorder. Biological Psychiatry 73, 639–645.

Vuorilehto, MS, Melartin, TK and Isometsä, ET (2009) Course and outcome of depressive disorders in primary care: a prospective 18-month study. Psychological Medicine 39, 1697–1707.

Wakefield, JC and Schmitz, MF (2013) When does depression become a disorder? Using recurrence rates to evaluate the validity of proposed changes in major depression diagnostic thresholds. World Psychiatry 12, 44–52.

Waljee, AK, Mukherjee, A, Singal, AG, Zhang, Y, Warren, J, Balis, U, Marrero, J, Zhu, J and Higgins, PD (2013) Comparison of imputation methods for missing laboratory data in medicine. BMJ Open 3, e002847.

Watson, D, Clark, LA, Weber, K, Assenheimer, JS, Strauss, ME and McCormick, RA (1995) Testing a tripartite model: II. Exploring the symptom structure of anxiety and depression in student, adult, and patient samples. Journal of Abnormal Psychology 104, 15–25.

Webb, CA, Dillon, DG, Pechtel, P, Goer, FK, Murray, L, Huys, QJ, Fava, M, McGrath, PJ, Weissman, M, Parsey, R, Kurian, BT, Adams, P, Weyandt, S, Trombello, JM, Grannemann, B, Cooper, CM, Deldin, P, Tenke, C, Trivedi, M, Bruder, G and Pizzagalli, DA (2016) Neural correlates of three promising endophenotypes of depression: evidence from the EMBARC study. Neuropsychopharmacology 41, 454–463.

Fig. 1. Plots of baseline predictor by treatment group interactions from the final model.

Table 1. Variable selection results

Table 2. Final model

Webb et al. supplementary material

