Introduction
Depression affects an estimated one in five people over the lifetime, with most cases beginning during the adolescent to young adult period (Kessler et al. 2005, 2007). It is often a chronic and recurring condition (Wilson et al. 2015) associated with high levels of psychological distress, impairments in functioning and poor physical health (Lewinsohn et al. 1998, 2003; Brent & Birmaher, 2002; Thapar et al. 2012), and is the leading contributor to the global burden of disease in young people under the age of 25 (Gore et al. 2011).
Established, guideline-recommended treatments for depression such as cognitive behavioural therapy (CBT) and antidepressants (e.g., fluoxetine) are at best only modestly effective (Weersing & Brent, 2006; Weisz et al. 2006; Hetrick et al. 2012; Cipriani et al. 2016), with significant proportions of recipients either non-responsive or continuing to experience symptoms (Andrews et al. 2000; March et al. 2004; TADS Team, 2007). Alternative interventions are therefore indicated to support full recovery, either as stand-alone or adjunct treatment strategies. Lifestyle medicine is one such alternative strategy increasingly implicated in the management of mental ill-health, particularly the use of physical activity to treat depression (Sarris et al. 2014).
The mechanisms through which physical activity exerts influence on depression are largely understudied; however, they are likely complex and multifaceted, involving synergies of neurobiological and psychosocial factors. These may include processes that are both disrupted or dysregulated in depression and potentially modulated by physical activity, including inflammatory and oxidative stress responses, neurogenesis, modulation of monoamines (e.g., serotonin), and HPA axis regulation, among others (see Deslandes et al. (2009); Wegner et al. (2014); Schuch et al. (2016a) for review). In terms of proposed psychosocial processes, physical activity may have a general behavioural activation effect through activity scheduling and positive reinforcement, and may provide opportunities for mastery or achievement, thus improving self-efficacy. It may also afford opportunities for social interaction and potentially provide distraction from negative thoughts, mood states or ruminative cognitions (Salmon, 2001; Craft & Perna, 2004; Veale, 2008).
Recent meta-analytic reviews of adult trials have demonstrated that physical activity interventions can reduce depression symptoms, with moderate to large effects (Cooney et al. 2013; Stubbs et al. 2016a; Kvam et al. 2016; Schuch et al. 2016b). Meta-analyses of child and adolescent trials have identified small to moderate effects on mental health outcomes, including reducing depression (Larun et al. 2006; Brown et al. 2013; Carter et al. 2016). However, these analyses have relied upon trials where physical activity was delivered either to healthy samples, samples with primary conditions other than depression (e.g., anxiety, obesity, autism), or to children (under 12 years). The efficacy of physical activity for young people (aged 12–25) who are experiencing depression, particularly at clinical levels, is yet to be established.
We performed a meta-analysis on all available randomised controlled trials (RCT) where physical activity was delivered as an intervention to participants aged 12–25 years, experiencing a diagnosis or symptoms of depression. The primary aim was to estimate the effect of physical activity on depression symptoms, with secondary aims to examine intervention acceptability using dropout as a proxy, and whether trial-level characteristics such as age group, diagnostic status, depression severity, clinical v. non-clinical samples and type of control group, modified the treatment effect. We also aimed to investigate the effect of different physical activity intervention characteristics on depression symptoms.
Method
The methods described in the Cochrane Handbook of Systematic Reviews (Higgins & Green, 2011a) were used and reporting is according to the PRISMA guidelines (Moher et al. 2009, 2015). The review was prospectively registered with PROSPERO (CRD42015024388).
Trial eligibility criteria
Types of studies
RCTs were eligible. Only published, peer-reviewed English-language trials were considered.
Types of participants
Eligible trials recruited adolescents and/or young adults (mean age ⩾12 and <26 years) experiencing depression, as determined by (a) meeting diagnostic criteria according to established nosology or (b) an explicitly stated minimum threshold (defined by trial authors) on a self-report or observer-rated symptom measure indicating presence of depression symptoms. Trials that recruited participants without depression or where depression was secondary to another disorder or health condition were excluded.
Types of interventions
All physical activity interventions were eligible. We used the American College of Sports Medicine definition of physical activity, which is ‘any bodily movement produced by skeletal muscles that results in energy expenditure above resting levels’ (Garber et al. 2011).
Types of control/comparison groups
Control groups included no-treatment (NT), wait-list (WL) and attention/activity placebo (AP) conditions. AP was defined as a condition that could reasonably be considered to control for non-specific intervention group factors and was not an established treatment for depression (Lindheimer et al. 2015). Comparison treatments could include psychological therapy, medication and treatment as usual (TAU).
Outcome measures
The primary outcome was depression symptoms as assessed with a validated symptom scale at the post-intervention time-point. Where a trial reported more than one depression outcome, the following hierarchy was used: (1) Observer-rated depression, (2) Self-report depression.
Search strategy
Electronic database searches were conducted for the period January 1980 to September 2016 using PsycINFO, Medline, Embase and the Cochrane Central Register of Controlled Trials. Search terms for depression, physical activity/exercise and controlled trials are available in Supplementary Material. This strategy was supplemented by an ancestry search of the included trials and recently published systematic reviews (Larun et al. 2006; Rethorst et al. 2009; Brown et al. 2013; Cooney et al. 2013; Rosenbaum et al. 2014; Wegner et al. 2014; Nyström et al. 2015). A two-stage screening process was conducted using the eligibility criteria defined above. One author conducted first stage screening based on title and abstract. A second author screened 10% of these references to ensure consistency. Independent second stage screening was conducted on the full-text of all references identified in the first stage. Discrepancies were resolved by discussion of full-text.
Data extraction
Data were extracted using a previously piloted, standardised extraction template. Targets included sample, intervention (e.g., type, frequency, duration and intensity of physical activity) and control/comparison group characteristics, and outcome data at post-intervention and follow-up. Where outcome data were reported in graphical format, trial authors were contacted requesting numeric data. Where these could not be obtained, the WebPlotDigitizer application (Rohatgi, 2013; Tsafnat et al. 2014) was used to convert graphical to numeric data. This process was used to reduce the potential bias that would arise in the meta-analysis if these trials were excluded (Higgins & Green, 2011a; Vučić et al. 2015). A second author independently extracted outcome data for meta-analysis. Discrepancies were discussed and checked against the trial publication.
Risk of bias and GRADE
Bias within trials was assessed using the Cochrane Collaboration's risk of bias tool (Higgins et al. 2011b). We examined selection bias (random sequence generation, allocation concealment), performance bias (blinding of participant and personnel), detection bias (outcome assessor blinding), attrition bias (handling of incomplete outcome data), and other bias including baseline imbalance on the primary outcome and selective reporting. Risk of bias assessments were rated independently by two authors. Discrepancies were resolved in consultation with a third author. The GRADE criteria were used to rate overall quality of the evidence contributing to the primary meta-analysis (Balshem et al. 2011; Schünemann et al. 2013). GRADE criteria included limitations of study design (risk of bias across trials), indirectness of evidence, inconsistency of results, imprecision of results and probability of significant publication bias.
Data analysis
The primary outcome was depression symptoms at post-intervention. Data were entered in RevMan® (The Cochrane Collaboration, 2014) as mean, standard deviation and number of participants for both intervention and control groups, and pooled for meta-analysis using a random-effects model due to expected between-trial heterogeneity (as trials likely employed different physical activity interventions). The effect was estimated as standardised mean difference (SMD) using Hedges’ g (adjusted for small sample size bias) with 95% confidence intervals (CI) to allow pooling of data from different depression symptom scales. The magnitude of estimated SMD was categorised as small (0.2), medium (0.5) or large (0.8) (Cohen, 1988). Heterogeneity was assessed using standard I² statistic parameters (Higgins & Green, 2011a). Publication bias was assessed by funnel plot inspection, use of the trim-and-fill method to adjust the pooled effect (Duval & Tweedie, 2000) and estimation of the fail-safe N (Rosenthal, 1979).
Sensitivity analyses were based on the primary meta-analysis. Targets included risk of bias domains (sequence generation, allocation concealment, outcome assessor blinding and incomplete outcome data, selected because these have been shown to bias effect estimates towards the intervention; Schulz et al. 1995; Wood et al. 2008; Bell et al. 2013), source of depression symptom rating, and review-level decisions including pooling of activity arms and inclusion of potentially heterogeneous forms of activity intervention or control.
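As an illustration of the pooling described above, the following is a minimal sketch of Hedges' g with a DerSimonian-Laird random-effects model and the I² statistic. It is not the RevMan implementation: the variance approximation, the function names and the three example trials are assumptions made for illustration, and the numbers are hypothetical rather than taken from the included trials.

```python
import math

def hedges_g(m_tx, sd_tx, n_tx, m_ctrl, sd_ctrl, n_ctrl):
    """Return (g, variance of g) for intervention versus control."""
    df = n_tx + n_ctrl - 2
    sd_pooled = math.sqrt(((n_tx - 1) * sd_tx**2 + (n_ctrl - 1) * sd_ctrl**2) / df)
    d = (m_tx - m_ctrl) / sd_pooled
    j = 1 - 3 / (4 * df - 1)  # small-sample correction factor
    # Common large-sample approximation to the variance of d (an assumption here)
    var_d = (n_tx + n_ctrl) / (n_tx * n_ctrl) + d**2 / (2 * (n_tx + n_ctrl))
    return j * d, j**2 * var_d

def random_effects(effects, variances):
    """DerSimonian-Laird pooling; returns (pooled, (ci_low, ci_high), i2_percent)."""
    w = [1 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-trial variance
    w_re = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_re, effects)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), i2

# Hypothetical trials: (mean, SD, n) for intervention, then for control
trials = [
    (12.0, 6.0, 20, 17.0, 7.0, 20),
    (10.5, 5.5, 30, 14.0, 6.0, 28),
    (15.0, 8.0, 25, 18.5, 7.5, 24),
]
results = [hedges_g(*t) for t in trials]
pooled, ci, i2 = random_effects([g for g, _ in results], [v for _, v in results])
print(f"SMD = {pooled:.2f}, 95% CI = {ci[0]:.2f} to {ci[1]:.2f}, I2 = {i2:.0f}%")
```

A negative SMD here favours the intervention, matching the sign convention used in the Results, because lower scores indicate fewer depression symptoms.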
The secondary outcome was intervention acceptability, which was assessed using dropout rates. Where dropout and missing data could not be distinguished, missing data at post-treatment was used. These data were pooled for meta-analysis and the risk difference (RD) with 95% CI was estimated using the Mantel–Haenszel method with random-effects.
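To make the dropout measure concrete, the sketch below computes the risk difference and an approximate 95% CI for a single trial. It covers only the per-trial quantity; the Mantel–Haenszel random-effects pooling across trials is not shown. The function name and the dropout counts are hypothetical.

```python
import math

def risk_difference(events_tx, n_tx, events_ctrl, n_ctrl):
    """Return (RD, (ci_low, ci_high)) for dropout in intervention versus control."""
    p_tx = events_tx / n_tx
    p_ctrl = events_ctrl / n_ctrl
    rd = p_tx - p_ctrl
    # Normal-approximation standard error of a difference in proportions
    se = math.sqrt(p_tx * (1 - p_tx) / n_tx + p_ctrl * (1 - p_ctrl) / n_ctrl)
    return rd, (rd - 1.96 * se, rd + 1.96 * se)

# Hypothetical trial: 3 of 30 dropped out of the activity arm, 6 of 30 from control
rd, ci = risk_difference(3, 30, 6, 30)
print(f"RD = {rd:.2f}, 95% CI = {ci[0]:.2f} to {ci[1]:.2f}")
```

A negative RD indicates lower dropout in the physical activity arm; an interval that spans zero indicates no detectable difference between arms.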
Observational subgroup analysis was used to investigate whether the effect of physical activity on depression was modified by certain factors. Pre-specified targets for subgrouping were type of control group (WL/NT v. AP), trial sample characteristics including age group (<18 v. ⩾18 years), depression severity (mild, moderate, severe), diagnostic criteria (diagnosis v. threshold symptoms), sample recruitment (clinical v. non-clinical) and physical activity intervention characteristics including intensity (light, moderate, vigorous) and activity type (aerobic v. resistance). Meta-regression was used to examine whether continuous variables (mean age and mean baseline depression symptom severity) were associated with effect size.
Unit of analysis issues
Where a trial used a cross-over design, outcome from the first phase prior to cross-over was selected. Where a trial reported more than one physical activity arm compared with a control condition, the physical activity arms were pooled. This was done to avoid data loss and potential unit of analysis problems (Higgins & Green, 2011a). Where a trial utilised more than one control arm (e.g., WL and AP), the more rigorous control was selected (see Lindheimer et al. 2015). These approaches were taken to ensure the treatment effect was not inflated.
Results
We retrieved 9288 unique publications (see Fig. 1), of which 17 trials were eligible for inclusion (McCann & Holmes, 1984; Woolery et al. 2004; Jeong et al. 2005; Nabkasorn et al. 2006; Yavari, 2008; Chu et al. 2009; Mohammadi, 2011; Roshan et al. 2011; Hemat-Far et al. 2012; Moghaddam et al. 2012; Hughes et al. 2013; Noorbakhsh & Alijani, 2013; Legrand, 2014; Carter et al. 2015; Cecchini-Estrada et al. 2015; Balchin et al. 2016; Sadeghi et al. 2016). Of these, 16 trials provided data for the primary meta-analysis. The characteristics of the included trials are presented in Table 1 and briefly summarised below.
BDI, Beck Depression Inventory; CDI-2, Children's Depression Inventory-2; CES-D, Centre for Epidemiological Studies Depression scale; CDRS-R, Children's Depression Rating Scale – Revised; DSM, Diagnostic & Statistical Manual of Mental Disorders; HAM-D, Hamilton Depression Rating Scale; N, total participants randomised; QIDS-A-C17, Quick Inventory of Depression Symptomatology – Adolescent – Clinician Rated; QIDS-A-SR, Quick Inventory of Depression Symptomatology – Adolescent – Self-report; *, author reported severity category; –, not reported or unclear.
Characteristics of included trials
Participants
Trial sample sizes ranged from 20 to 106 participants (median = 47, IQR = 41). Mean age ranged from 15.4 to 25.8 years. Eight trials were conducted with female participants only. Five trials recruited clinical samples (from inpatient/outpatient treatment services or having a clinician confirmed diagnosis) and 12 trials recruited non-clinical samples. Most trials recruited participants with elevated depression symptoms above a specified threshold (n = 13), while four used a clinician confirmed diagnosis of depression. Baseline depression severity ranged from mild (n = 4) to moderate (n = 10) to severe (n = 2) (see Supplementary Material for categories). Ten trials recruited an inactive sample, while seven did not report baseline activity level.
Interventions and controls
The characteristics of the physical activity interventions delivered in each trial are summarised in Table 2. Most trials used aerobic-based physical activity (n = 12), and there was considerable variation in the type of activity. The intensity of activity was estimated by converting reported activity type or intensity into metabolic equivalents (METs) (Norton et al. 2010; Ainsworth et al. 2011). Most trials involved moderate (3–6 METs, n = 6) to vigorous activity (>6 METs, n = 4). All trials prescribed either the type or intensity of activity, although four incorporated participant preference. Intervention periods ranged from 5 to 12 weeks (median = 8, IQR = 4) with one to five activity sessions per week (median = 3, IQR = 1). Session duration ranged from 30 to 90 min (median = 60, IQR = 15). Most trials used supervised activity sessions (n = 11), with seven using trained and qualified professionals. Eight trials implemented interventions in group settings, one of which combined group and individual components. Three additional trials delivered the intervention to individuals. Control groups were no-treatment (NT, n = 5), wait-list (WL, n = 5), and attention/activity placebo (AP, n = 7). Placebo conditions consisted of stretching/flexibility (n = 3), relaxation (n = 1), a physical education class (n = 1), very light activity (n = 1) and an unguided group meeting (n = 1). Eight trials had multiple intervention arms. Six contained two or more physical activity arms v. control. These multiple activity arms were collapsed within trials for the primary meta-analysis (Chu et al. 2009; Mohammadi, 2011; Noorbakhsh & Alijani, 2013; Cecchini-Estrada et al. 2015; Balchin et al. 2016). One trial was physical activity v. AP v. WL and the comparison against AP was selected for meta-analysis (McCann & Holmes, 1984). One trial was physical activity v. CBT v. control and another trial added physical activity to TAU compared with TAU alone. No trials were identified comparing physical activity to medication.
Supervised (S) or unsupervised (U), qualified instructor (Q), group (G) or individual (I); EEG, energy expenditure goal; AP, attention/activity placebo; NT, no-treatment control; WL, wait-list control; TAU, treatment as usual; HR, heart rate; MaxVO2, maximal oxygen uptake; *MET, metabolic equivalent estimate (based on Ainsworth et al. 2011 and Norton et al. 2010); –, not reported or unclear.
Outcomes
Fifteen trials used self-report measures, most commonly the Beck Depression Inventory (BDI) (n = 9), and three reported observer-rated depression symptom measures.
Risk of bias
Risk of bias assessments within and across trials are displayed in Fig. 2a and b. Generation of the randomisation sequence was adequate in only five trials. Four trials adequately concealed allocation. Blinding of intervention personnel and participants to group allocation cannot be adequately achieved in physical activity trials. Blinding of the outcome assessor cannot be achieved for self-report outcome measures. Two of three trials using an observer-rated outcome measure masked assessors to group allocation. Six trials were rated as low risk of bias for handling of incomplete post-treatment data. Baseline imbalance on the primary outcome was not detected in 15 trials. Protocols were identified for only three trials, resulting in a low risk of bias rating for selective reporting. Overall, selection bias could not be ruled out in 88% of trials, performance bias was likely present in 100% of trials, detection bias was present or could not be ruled out in 88% of trials and attrition bias was present or could not be ruled out in 59% of trials.
Intervention adherence
Seven trials reported intervention adherence or attendance data. Three reported that on average 66% to 87% of intervention sessions were attended, one reported an average energy expenditure target adherence of 77%, two reported that 64% and 68% of participants completed all activity sessions and one trial reported that all participants attended at least 22 of 24 sessions.
Imputation of trial outcome data
Two trials reported graphical outcome data which we converted to numerical format as described above (McCann & Holmes, 1984; Nabkasorn et al. 2006). One trial did not report an estimate of variability (McCann & Holmes, 1984), therefore we imputed the missing standard deviation with an estimate pooled from the eight included trials that had used the same outcome measure (BDI) at post-intervention, based on the recommendations by Furukawa et al. (2006) and in the Cochrane Handbook (Higgins & Green, 2011a). One trial did not report extractable outcome data and is therefore not included in meta-analysis (Moghaddam et al. 2012).
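The imputation step can be sketched as follows, assuming the usual degrees-of-freedom-weighted pooled standard deviation. The exact formula the authors applied is not reproduced in this text, so both the formula choice and the example values are assumptions; the numbers are hypothetical and are not the BDI data from the eight trials.

```python
import math

def pooled_sd(sds, ns):
    """Pool group standard deviations, weighting each variance by n - 1."""
    numerator = sum((n - 1) * sd**2 for sd, n in zip(sds, ns))
    denominator = sum(n - 1 for n in ns)
    return math.sqrt(numerator / denominator)

# Hypothetical post-intervention SDs and group sizes from trials using one scale
example_sds = [6.1, 7.4, 5.8, 8.0]
example_ns = [20, 25, 18, 30]
print(f"Imputed SD = {pooled_sd(example_sds, example_ns):.2f}")
```

Weighting by n − 1 means larger trials contribute more to the borrowed estimate, which is why the pooled value is not simply the mean of the reported standard deviations.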
Meta-analysis results
The primary meta-analysis pooled 16 trials (n = 771) testing the effect of physical activity on depression symptoms at post-intervention compared with a control condition (Fig. 3), finding a large effect in favour of physical activity (SMD = −0.82, 95% CI = −1.02 to −0.61, p < 0.05, I² = 38%).
Publication bias
Estimation of the fail-safe N suggests that 430 trials with no effect would be needed before the pooled effect was no longer statistically significant. The trim-and-fill analysis suggests four trials may be missing from the right side of the funnel plot (see Supplementary Material Fig. S2). Imputing these missing trials produced an adjusted pooled effect in favour of physical activity of −0.69 (95% CI = −0.90 to −0.48).
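For readers unfamiliar with the fail-safe N, the sketch below shows Rosenthal's formula, N = (ΣZ)² / z_α² − k, where Z are the per-trial standard normal deviates and k is the number of trials. The one-tailed criterion z_α = 1.645 and the example Z values are assumptions for illustration; they are hypothetical and do not reproduce the figure of 430 reported above.

```python
def fail_safe_n(z_scores, z_alpha=1.645):
    """Rosenthal's fail-safe N: null trials needed to make the combined result non-significant."""
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_alpha**2 - k

# Hypothetical per-trial Z values (positive = favours the intervention)
example_z = [2.1, 1.4, 3.0, 0.8, 2.5]
print(f"Fail-safe N = {fail_safe_n(example_z):.0f}")
```

The statistic grows with the square of the summed Z values, so it can be large even for a modest number of trials and does not account for trial quality.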
Sensitivity analysis (Table 3)
We were unable to conduct a sensitivity analysis restricted to better quality trials as there were not enough available trials at low risk of bias across all or most domains of bias. Therefore we conducted four separate sensitivity analyses excluding trials that were rated as either unclear or high risk of bias for sequence generation, allocation concealment, outcome assessor blinding and incomplete outcome data. The pooled effect remained in favour of physical activity for trials at low risk of bias for sequence generation (k = 5, SMD = −0.63, 95% CI = −0.97 to −0.29), for blinding of outcome assessor (k = 2, SMD = −0.90, 95% CI = −1.47 to −0.32) and for incomplete outcome data (k = 6, SMD = −0.72, 95% CI = −1.03 to −0.40), but not for allocation concealment (k = 4, SMD = −0.48, 95% CI = −1.02 to 0.05). When multiple activity and control arms were available within a trial, the comparison identified as producing the largest effect size was selected for sensitivity analysis. This was in contrast to the primary analysis, where a more conservative approach was taken by pooling activity arms within trials and selecting the more rigorous control group for comparison. This sensitivity analysis produced a larger effect (SMD = −1.00) when compared with the primary analysis (SMD = −0.82); however, heterogeneity was substantially increased (I² = 38% to 61%). Four trials appeared to categorically differ from the others and therefore may have introduced heterogeneity to the primary analysis; two employed alternative intervention modalities (yoga in Woolery et al. (2004); dance movement therapy in Jeong et al. (2005)), and two used control conditions which may not be equivalent to NT, WL or AP (physical activity + TAU v. TAU in Carter et al. (2015); the AP control group engaged in significant levels of activity in Balchin et al. (2016)). Removal of these trials reduced heterogeneity (I² = 0%), but did not substantially alter the pattern of results (SMD = −0.92). Similar magnitudes of effect were found when the analysis was restricted to either observer-rated or self-report depression symptom measure outcomes and when trials with imputed data from graphical representations were removed from the analysis.
k, number of trials; n, number of participants; SMD, standardised mean difference; CI, confidence interval.
a Excluded from analysis are Carter et al. (2015) and Balchin et al. (2016).
b Excluded from analysis are Jeong et al. (2005) and Woolery et al. (2004).
Analysis of dropout
Dropout rate from randomisation to post-intervention was 11% (95% CI = 4.8–17.6) in physical activity arms and 18% (95% CI = 9.5–27.8) in control arms; however, there was no significant difference between arms when trial dropout was pooled (k = 12, RD = −0.01, 95% CI = −0.04 to 0.03, p = 0.70) (Fig. 4).
Subgroup analyses
The observational results in Table 4 show that, in the included trials, effect sizes did not significantly differ by type of control group (WL/NT v. AP), age group (<18 v. ⩾18), diagnostic status (diagnosis v. threshold symptoms), sample recruitment (clinical v. non-clinical), depression severity category (mild, moderate, severe), type of physical activity (aerobic v. resistance) or intensity (light, moderate, vigorous). Meta-regression analyses found no relationship between physical activity's observed effect and either of the two continuous variables (mean age and standardised mean depression symptoms at baseline, both p > 0.1).
k, number of trials; n, number of participants; SMD, standardised mean difference; CI, confidence interval; PA, physical activity; NT, no-treatment; WL, wait-list.
a Two trials (Chu et al. 2009; Balchin et al. 2016) have multiple physical activity arms of differing intensity and thus contribute non-independent effects to the intensity sub-group analysis.
GRADE
Overall quality of the evidence contributing to the primary meta-analysis was rated as LOW to VERY LOW. Serious or very serious limitations in study design and suspected publication bias led to a downgrading of the evidence by two to three levels (see Supplementary Material for GRADE ratings). The level of evidence was not downgraded for imprecision, inconsistency or indirectness.
Discussion
Main findings
Physical activity appears to show efficacy for improving depression symptoms in adolescents and young adults experiencing a diagnosis or threshold symptoms of depression. However, the risk of bias within included trials and the low quality of the overall evidence base limit our confidence in this finding. Nonetheless, physical activity does appear to be an acceptable and feasible intervention modality for young people experiencing depression given the low dropout rate. Subgroup and meta-regression analyses suggest that the treatment effect may not be modified by characteristics such as age, depression severity, diagnostic status, physical activity type or intensity; however, these analyses are observational, likely underpowered to detect effects and should be interpreted with caution. While we do not yet know the specific intervention characteristics required to bring about symptom improvement, we identify a number of characteristics common across trials that may inform future research agendas and the implementation of physical activity interventions.
Context of main findings
To provide a clinical interpretation of the large pooled effect, the SMD (−0.82) was back-transformed into units of the BDI (Higgins et al. Reference Higgins and Green2011a), showing that those receiving a physical activity intervention would score, on average, 5.38 (95% CI = 4.00–6.69) points lower on the BDI than those in a control conditionFootnote 2. The minimal clinically important difference on the BDI has been estimated at between three and five points (Hiroe et al. Reference Hiroe, Kojima, Yamamoto, Nojima, Kinoshita and Hashimoto2005) and elsewhere as a 17.5% reduction from baseline (Button et al. Reference Button, Kounali, Thomas, Wiles, Peters and Welton2015). This suggests that physical activity may produce a clinically significant reduction in depression symptoms. Furthermore, the effect was robust when restricting the analysis to the seven trials comparing physical activity to attention/activity placebo controls (−0.82, I 2 = 0%). Importantly this provides some indication that the effect estimate may be due to the physical activity intervention rather than the non-specific factors that cannot be controlled in comparison with no-treatment/wait-list controls (Lindheimer et al. Reference Lindheimer, O'Connor and Dishman2015; Stubbs et al. Reference Stubbs, Vancampfort, Rosenbaum, Ward, Richards and Ussher2016a). However further research is needed to establish this finding given the observational nature of the analysis, the small number and the low quality of included trials.
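The back-transformation amounts to multiplying the SMD by a reference standard deviation on the target scale. The reference BDI standard deviation used by the authors is not stated in this text, so the sketch below infers it from the reported figures (5.38 / 0.82, roughly 6.56 points); that inferred value and the function name are assumptions for illustration.

```python
def smd_to_scale_units(smd, sd_reference):
    """Re-express an SMD as a raw difference on a scale with the given SD."""
    return smd * sd_reference

# Inferred from the reported pair (SMD = -0.82, difference = 5.38 BDI points);
# this reference SD is an assumption, not a value quoted from the paper.
implied_sd = 5.38 / 0.82

for label, smd in [("point estimate", -0.82), ("lower CI bound", -1.02), ("upper CI bound", -0.61)]:
    print(f"{label}: {smd_to_scale_units(smd, implied_sd):.2f} BDI points")
```

Applying the same inferred standard deviation to the confidence limits of the SMD gives values close to the reported interval of 4.00 to 6.69 points, which suggests a single reference SD was used for the point estimate and both bounds.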
The large effect generated from this meta-analysis is consistent in size with meta-analytic findings of physical activity for depression in adults (Cooney et al. Reference Cooney, Dwan, Greig, Lawlor, Rimer and Waugh2013; Kvam et al. Reference Kvam, Kleppe, Nordhus and Hovland2016; Schuch et al. Reference Schuch, Vancampfort, Richards, Rosenbaum, Ward and Stubbs2016b). In terms of previous child and adolescent meta-analyses, these have included trials of healthy young people, those with other medical or mental health conditions or children under 12 years, potentially complicating the generalisability of their findings to the treatment of depression (Larun et al. Reference Larun, Nordheim, Ekeland, Hagen and Heian2006; Brown et al. Reference Brown, Pearson, Braithwaite, Brown and Biddle2013; Carter et al. Reference Carter, Morres, Meade and Callaghan2016). The current meta-analysis synthesised only trials of adolescents and young adults with either a diagnosis or threshold symptoms of depression highlighting its relevance to young people needing treatment, particularly as our subgroup analysis suggests a robust effect size in trials that recruited clinical samples. We also identified and included seven RCTs that had not appeared in any previous adult or child-adolescent review.
In the context of established treatments for youth depression, psychological interventions demonstrate small-to-moderate treatment effects (Weisz et al. Reference Weisz, McCarty and Valeri2006, Reference Weisz, Kuppens, Ng, Eckshtain, Ugueto and Vaughn-Coaxum2017; Watanabe et al. Reference Watanabe, Hunot, Omori, Churchill and Furukawa2007). While our meta-analysis generated a large preliminary effect size, physical activity is considerably less researched than established psychotherapies and we have limited information regarding head-to-head comparisons. Only one trial to date has compared physical activity with CBT for depression in young people, finding equivalent treatment effects in comparison with control (Sadeghi et al. Reference Sadeghi, Ahmadi, Ahmadi, Rezaei, Miri and Abdi2016). Physical activity interventions may exert some influence on depression via a general behavioural activation effect, an often-utilised treatment component of CBT. This is potentially relevant to youth depression given that behavioural-based interventions may be better suited to younger age groups (Hetrick et al. Reference Hetrick, Cox, Fisher, Bhar, Rice and Davey2015). Preliminary work is exploring the use of physical activity-based interventions delivered via behavioural activation frameworks for depression in both young people and adults (Parker et al. Reference Parker, Hetrick, Jorm, Mackinnon, McGorry and Yung2016; Euteneuer et al. Reference Euteneuer, Dannehl, del Rey, Engler, Schedlowski and Rief2017).
Our investigation of attrition rates as a proxy for intervention acceptability showed that dropout across physical activity arms was 11%, which did not differ from controls. This rate is comparable with that established in a recent meta-analysis of dropout from physical activity trials in adults with depression (15.2%; Stubbs et al. Reference Stubbs, Vancampfort, Rosenbaum, Ward, Richards and Soundy2016b). It is also equivalent to pooled attrition rates observed from psychotherapy trials for depression in young people (12%; Weisz et al. Reference Weisz, McCarty and Valeri2006) and substantially better than rates identified for antidepressant medication (19% to 38%; Hetrick et al. Reference Hetrick, McKenzie, Cox, Simmons and Merry2012), suggesting that physical activity is at least as acceptable as psychotherapy and may be more acceptable than medication. Additionally, young people appear more likely to endorse physical activity as a helpful intervention for depression than either medication or psychotherapy (Jorm & Wright, Reference Jorm and Wright2007; Reavley & Jorm, Reference Reavley and Jorm2011), further highlighting the potential acceptability and feasibility of employing this intervention modality with young people.
Quality of evidence
The overall quality of evidence contributing to the meta-analysis is low, suggesting the current findings should be interpreted with caution. We were unable to undertake an analysis restricted to high quality trials because there are currently not enough available trials at low risk of bias across all or most domains to do so. While the effect sizes from three of our four sensitivity analyses by individual risk of bias domain remained largely unchanged compared with the overall effect, each analysis was restricted to a very small number of trials, meaning we cannot rule out bias from the overall effect size. This uncertainty is likely a result of inadequate reporting of trial methods, particularly as many domains (e.g., selection and attrition bias) received unclear ratings across trials. Both trial-level selection and attrition bias have been shown to impact the size of effect estimate (Schulz et al. Reference Schulz, Chalmers, Hayes and Altman1995; Bell et al. Reference Bell, Kenward, Fairclough and Horton2013). Of particular concern to the internal validity of the current finding is that physical activity is an unblinded intervention (risk of performance bias), and in the context of a self-report outcome measure (risk of detection bias), there is the potential to inflate the effect in favour of the intervention. Large, robust, adequately reported trials that attempt to reduce the risk of bias in their methodologies are therefore needed to increase confidence in the current finding. Publication bias cannot be ruled out given the small attenuation of effect size using the trim and fill method; however, its potential effect appears small given that the adjusted effect size (after imputing potentially suppressed trials) was moderate and remained significant, coupled with the large observed fail-safe N.
Ratings for two of the five GRADE domains (limitations of study design, publication bias) resulted in a downgrading of the current RCT-generated evidence from HIGH to LOW or VERY LOW, suggesting that confidence in the effect is limited and the effect size may be substantially different from the estimate presented (Balshem et al. Reference Balshem, Helfand, Schünemann, Oxman, Kunz and Brozek2011; Schünemann et al. Reference Schünemann, Brożek, Guyatt and Oxman2013).
Implementation and further research
We do not yet know the specific characteristics or type of young people who might be suited to, or benefit most from, a physical activity intervention. Our analysis suggests that in these trials, physical activity may produce a similar, large magnitude of effect for young people irrespective of whether they were recruited with a diagnosis or threshold symptoms of depression, and the effect appears unchanged when restricted to trials conducted with clinical samples. Similarly, the treatment effect does not appear to be associated with baseline depression symptom severity; however, given the small number of clinical-based trials, further work is needed to confirm these findings. While appearing consistent with a recent adult-level moderator analysis of physical activity trials (Schuch et al. Reference Schuch, Dunn, Kanitz, Delevatti and Fleck2016c), caution should still be taken when interpreting these subgroup analyses as they are likely underpowered and only observational in nature. All but two included trials in this analysis were in the mild and moderate severity range, suggesting that physical activity may be clinically relevant for young people experiencing this symptom severity, and that further research is needed to explore the benefits for severe depression. Current treatment guidelines recommend providing general advice on the benefit of physical activity, alongside first-line interventions (e.g., CBT), to all young people presenting with depression, regardless of severity (NICE, 2015). While the current finding highlights the potential of physical activity as a stand-alone intervention, larger scale replication trials, particularly with clinical samples, are needed before this work can be used to inform treatment guidelines.
Common intervention characteristics were observed across trials that may guide further research and the clinical implementation of physical activity protocols, including the use of supervised group sessions of moderate or vigorous intensity aerobic activity over 60 min sessions, multiple times per week, over at least an 8-week period. Adult-level syntheses have identified a similar pattern of common characteristics that may lead to symptom improvement (Perraton et al. Reference Perraton, Kumar and Machotka2010; Silveira et al. Reference Silveira, Moraes, Oliveira, Coutinho, Laks and Deslandes2013; Stanton & Reaburn, Reference Stanton and Reaburn2014; Nyström et al. Reference Nyström, Neely, Hassmén and Carlbring2015). Our observational subgroup analyses suggest that the intervention characteristics we investigated may not have modified the treatment effect in the included trials. However, caution should be taken when interpreting this finding given the small number of trials in subgroups leaving analyses underpowered to detect differences if they exist. The current evidence base is therefore limited to the characteristics common in the small number of trials published to date, with further work needed to determine the component ingredients required to bring about improvement in depression and if identified, how best to implement them in clinical settings.
To date, an optimum dose of activity for depression cannot be recommended due to a lack of available trial data. Only two trials with young people have directly tested the effect of differing intensities of aerobic activity (Chu et al. Reference Chu, Buckworth, Kirby and Emery2009; Balchin et al. Reference Balchin, Linde, Blackhurst, Rauch and Schonbachler2016), with equivocal findings. Pooling of included trials according to intensity appeared to suggest that those implementing moderate and vigorous intensity activities produced large effects; however, there were too few trials of low intensity activity to allow meaningful comparison, and this requires further investigation. Two highly cited trials in adults suggest that more physical activity, whether in the form of higher intensity or overall energy expenditure, may produce better results for the treatment of depression (Dunn et al. Reference Dunn, Trivedi, Kampert, Clark and Chambliss2005; Trivedi et al. Reference Trivedi, Greer, Church, Carmody, Grannemann and Galper2011). While the dose–response relationship looks promising, further trials are required, particularly in young people. Investment in dose-response trials needs to be considered alongside an alternative treatment option that focuses less on minimum thresholds and more on promoting incidental physical activity and reducing sedentary behaviour (Vancampfort et al. Reference Vancampfort, Stubbs, Ward, Teasdale and Rosenbaum2015; Parker et al. Reference Parker, Hetrick, Jorm, Mackinnon, McGorry and Yung2016).
Our pooled effect was based on variable types of physical activity, yet it remained unchanged when trials that differed substantially were removed (e.g., yoga, dance movement therapy), suggesting that the type of activity may not be important. Although the type was variable, most interventions consisted of an aerobic-based activity, with only one trial using resistance-based activity (Woolery et al. Reference Woolery, Myers, Sternlieb and Zeltzer2004) and two others using a combination (Legrand, Reference Legrand2014; Carter et al. Reference Carter, Guo, Turner, Morres, Khalil and Brighton2015). In adults, resistance-based activity has produced reductions in depression symptoms and direct comparison suggests both modalities perform equally well (Doyne et al. Reference Doyne, Ossip-Klein, Bowman, Osborn, McDougall-Wilson and Neimeyer1987; Martinsen et al. Reference Martinsen, Hoffart and Solberg1989; Krogh et al. Reference Krogh, Saltin, Gluud and Nordentoft2009; Cooney et al. Reference Cooney, Dwan, Greig, Lawlor, Rimer and Waugh2013). Further investigation of resistance-based activity in young people is warranted, particularly as some may show preference for this modality (Firth et al. Reference Firth, Rosenbaum, Stubbs, Vancampfort, Carney and Yung2016).
Supervision is a common feature of physical activity protocols (Perraton et al. Reference Perraton, Kumar and Machotka2010), and may lead to lower dropout, particularly when delivered by a qualified professional (e.g., exercise physiologist or physiotherapist) (Stubbs et al. Reference Stubbs, Vancampfort, Rosenbaum, Ward, Richards and Soundy2016b). Conversely, a lack of supervision may contribute to poor engagement and compliance (Knapen et al. Reference Knapen, Vancampfort, Moriën and Marchal2015), and is a likely factor in null findings in some adult level trials (Chalder et al. Reference Chalder, Wiles, Campbell, Hollinghurst, Haase and Taylor2012; Pfaff et al. Reference Pfaff, Alfonso, Newton, Sim, Flicker and Almeida2014). Most trials in this review utilised supervision, with seven employing a qualified professional, potentially contributing to the positive pooled effect.
Strengths and limitations
The rigour of this review is enhanced by the inclusion of RCTs, the use of a comprehensive and exhaustive search, systematic methodology to identify trials and extract data, and the use of systematic tools to assess bias and overall evidence quality. Additionally, the requirement of a diagnosis or threshold depression symptoms for trial inclusion highlights the potential clinical applicability of the findings. This is the first meta-analysis to examine the effects of physical activity interventions for depression spanning the adolescent-young adult period, providing valuable knowledge about a period that overlaps with the peak onset of depression.
A number of factors may limit the generalisability of the findings, including the overall low quality of the evidence base contributing to the main analysis, over-representation of female-only samples, use of potentially heterogeneous activity protocols, small sample sizes and the limited number of available trials, particularly those recruiting from clinical settings. Our subgroup findings are limited by being observational in nature and underpowered due to the small number of trials in many subgroupings. We were unable to investigate a number of important factors due to the paucity of available trials, including the effect of physical activity over longer-term follow-up (as maintenance of post-intervention benefit is often an important clinical goal) and the relative benefits of physical activity compared with established depression treatments such as medication and psychotherapy. Determining whether these interventions are equivalent may provide young people who do not want, are not suited to, or do not benefit from established therapies with a viable and effective treatment option. Exploring the mechanisms by which physical activity improves depression is also needed to better understand the necessary ingredients for symptom change and to inform the design of more targeted intervention strategies. Also missing from the current evidence base is an investigation of the effect that physical activity interventions have on physical health outcomes in depression, particularly given that both depression and low activity levels confer risk of negative health consequences (Lee et al. Reference Lee, Shiroma, Lobelo, Puska, Blair and Katzmarzyk2012; Goldstein et al. Reference Goldstein, Carnethon, Matthews, McIntyre, Miller and Raghuveer2015).
Conclusion
This review indicates that physical activity is a promising primary intervention for adolescents and young adults experiencing a diagnosis or threshold symptoms of depression; however, concerns surrounding the methodological quality of included trials limit our ability to draw conclusions about its effectiveness. While the effect of physical activity appears large and robust in comparison with attention/activity placebo control conditions, and when restricted to trials in clinical samples, the findings should be interpreted with caution given that the quality of the underlying evidence base is currently low. This suggests uncertainty surrounding the size of the effect and indicates that large, well-reported and robust trials conducted with help-seeking clinical samples in real-world treatment settings are required to increase confidence in the current finding. Physical activity appears to be acceptable to young people, suggesting the potential feasibility of incorporating it into the routine clinical treatment of depression; however, research is still required to establish the intervention characteristics that are necessary to improve depression.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291717002653
Acknowledgements
This work was supported in part by National Health and Medical Research Council Project Grant 1063033 awarded to AGP. APB is supported by a PhD scholarship attached to this grant. SR is funded by a UNSW Scientia & a NHMRC Early Career Fellowship (APP1098518). This work received no other specific support from any funding agency, commercial or not-for-profit sectors.
Declaration of Interest
None.