Introduction
Cognitive impairment is a widely recognized finding in schizophrenia and has been established for chronic (Heinrichs & Zakzanis, 1998), first-episode (Mesholam-Gately et al. 2009) and drug-naïve patients (Fatouros-Bergman et al. 2014). Compared with healthy controls, the largest differences are reported for verbal episodic memory and processing speed (Dickinson et al. 2007; Mesholam-Gately et al. 2009; Palmer et al. 2009; Schaefer et al. 2013), but deficits also extend to other domains such as attention, perception, language and visuo-spatial abilities (e.g. Heinrichs & Zakzanis, 1998; Fioravanti et al. 2005, 2012; Palmer et al. 2009).
Beyond these cognitive domains, deficits in executive functions and their association with clinical symptoms of schizophrenia have generated particular interest (e.g. Donohoe & Robertson, 2003). Executive functions constitute an umbrella term subsuming top-down cognitive processes related to the conscious control of behaviour (e.g. Lezak et al. 2004; Alvarez & Emory, 2006; Jurado & Rosselli, 2007). Although a uniform definition of executive functions still does not exist, there is broad consensus that working memory, inhibition and set shifting represent basic executive processes which contribute to higher-order executive functions such as reasoning and problem solving (Miyake et al. 2000; Diamond, 2013). The ability to plan future behaviour constitutes a prototypical example of high-level executive functioning, as it reflects the conscious selection of actions based on the anticipation of potential outcomes in relation to goals contingent upon current situational demands (Norman & Shallice, 1986). More specifically, planning is required for successful behaviour in many situations beyond everyday routine where known action schemata are either not applicable or not suitable (Ward & Morris, 2005). In these instances, an appropriate and purposive behavioural sequence must be identified by mentally generating alternative sequences of interdependent actions and by evaluating the consequences of these anticipated actions in relation to goal attainment (Goel, 2002; Ward & Morris, 2005).
Planning crucially depends on the integrity of the prefrontal cortex (Unterrainer & Owen, 2006; Nitschke et al. 2017), parts of which are known to be structurally and functionally affected in schizophrenia (Minzenberg et al. 2009).
Although planning deficits in schizophrenia patients have been reported previously (see Greenwood et al. 2011), a brief overview of the extant literature provides a highly inconsistent picture (see Sullivan et al. 2009): evidence of severe impairments in planning performance (e.g. Morice & Delahunty, 1996; Marczewski et al. 2001; Tyson et al. 2004) is in stark contrast to a considerable number of studies reporting either no significant deficits (Krabbendam et al. 1999; Dichter et al. 2006; Feldmann et al. 2006; Greenwood et al. 2011; Asevedo et al. 2013) or only small decreases in planning performance of schizophrenia patients compared with matched controls (Badcock et al. 2005; Zhu et al. 2010). This heterogeneity is all the more surprising given that all of these studies assessed planning performance by using well-defined tower tasks such as the Tower of London (TOL), the Tower of Hanoi or one of their variants (Berg & Byrd, 2002).
In the light of these inconsistent findings on the extent of planning impairments in schizophrenia, the objectives of the present study were twofold. First, by conducting a comprehensive literature search and meta-analysis, we addressed the general question as to whether planning performance is indeed impaired in schizophrenia. More than 80 studies have examined planning in schizophrenia patients using tower tasks (see below), but – to the best of our knowledge – no quantitative evaluation of this literature in terms of a meta-analysis exists so far (but see Sullivan et al. 2009 for a first overview). Second, using moderator analyses, we aimed to investigate whether various task- and subject-related variables may account for the inconsistency of previous findings.
As planning tasks measure the ability to flexibly and consciously adapt behaviour to changing situational demands in a complex world, assessments of planning performance may potentially constitute a valuable surrogate marker for a patient's ability to live an independent and autonomous life (see Holt et al. 2013). Understanding possible causes for the inconsistencies in the literature could thus be of utmost importance for future improvements in neuropsychological assessment in research and clinical contexts.
Method
Literature search
A comprehensive literature search using Medline, PsycINFO and ISI Web of Knowledge provided the basis of the present meta-analysis. Following the PRISMA guidelines (‘Preferred Reporting Items for Systematic Reviews and Meta-Analyses’; Moher et al. 2009), the search was guided by a pre-specified review protocol stating the eligibility criteria. More specifically, manuscripts on tower tasks were selected using the search terms ‘Tower of London’, ‘Tower of Hanoi’, ‘Tower of Toronto’, ‘CANTAB’ and ‘Stockings of Cambridge’. No specifications regarding the start date of publication were made. Out of the resulting 1224 references, 99 papers were identified that were published in English until January 2014 and that included samples of schizophrenia patients. After exclusion of reviews, case studies, unpublished dissertations and conference abstracts, a total of 84 studies remained. These studies were screened according to the following criteria: inclusion of independent patient and healthy control sample(s) at adult age and standardized application of a three-ball/disk tower version (TOL, Stockings of Cambridge and Tower of Hanoi) as a measure of planning performance.
Planning accuracy is commonly measured by calculating the number of problems that were perfectly solved in the minimum possible number of moves (see Berg & Byrd, 2002). As this turned out to be the most frequently reported outcome variable in the remaining studies (31 out of 44), the number of perfect solutions was used as the dependent variable in the present meta-analysis. A detailed flowchart of the selection process is provided in Fig. 1.

Fig. 1. Overview on the selection process.
Data extraction
From the selected studies, results and sample characteristics were extracted and entered in an Excel spreadsheet by F.K. and independently cross-checked by K.N. In the case of missing information on means and standard deviations, reported t or F statistics were converted to Hedges’ g where applicable (Hedges & Olkin, 1985). In a few studies (Morris et al. 1995; Langdon et al. 2001; Joyce et al. 2002), the means and standard deviations needed to compute the g values were extracted from figures using graph digitizer software (DigitizeIT version 2.1; http://www.digitizeit.de; independent cross-check using WebPlotDigitizer version 3.9; Rohatgi, 2015). In 12 cases, the corresponding authors of the primary studies were contacted for further information. Of these, five authors provided the requested details, three provided additional data, and four did not respond.
Statistical analyses
Following Hedges & Olkin (1985), reported means and standard deviations from each dataset were used to calculate standardized mean differences in terms of Hedges’ g, thereby correcting for small-sample bias. The standardized mean difference reflects the difference in planning performance (or size of effect) between a given patient group and the respective group of healthy controls. Positive values indicate impaired planning performance in the patient group.
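The computation described above can be sketched as follows. This is a minimal illustration and not the authors' code: the function names, the argument order and the numbers in the usage lines are hypothetical, the correction factor is the usual approximation J = 1 − 3/(4·df − 1), and the variance formula is one common large-sample approximation. For a two-group F statistic, t can be taken as the square root of F, with the sign set by the reported direction of the difference.

```python
import math


def hedges_g(mean_hc, sd_hc, n_hc, mean_sz, sd_sz, n_sz):
    """Return (g, sampling variance) for controls minus patients.

    Positive g means fewer perfect solutions in the patient group.
    """
    df = n_hc + n_sz - 2
    # Pooled standard deviation across both groups
    sd_pooled = math.sqrt(((n_hc - 1) * sd_hc ** 2 + (n_sz - 1) * sd_sz ** 2) / df)
    d = (mean_hc - mean_sz) / sd_pooled
    # Approximate small-sample correction factor (assumed form)
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Common large-sample approximation of the sampling variance
    var_g = (n_hc + n_sz) / (n_hc * n_sz) + g ** 2 / (2 * (n_hc + n_sz))
    return g, var_g


def g_from_t(t, n_hc, n_sz):
    """Convert an independent-samples t statistic to Hedges' g."""
    df = n_hc + n_sz - 2
    d = t * math.sqrt(1 / n_hc + 1 / n_sz)
    return (1 - 3 / (4 * df - 1)) * d


# Hypothetical example: 20 controls (mean 10, SD 2) v. 20 patients (mean 8, SD 2)
g, var_g = hedges_g(10.0, 2.0, 20, 8.0, 2.0, 20)
print(round(g, 3), round(var_g, 3))
```

The sampling variance returned here is the quantity that later serves as the weight, and as the predictor in the funnel plot asymmetry test.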
For several studies, multiple standardized mean differences were extracted (e.g. for different patient groups). The interdependency between these outcomes (e.g. different patient groups compared with one control group) was accounted for in the analyses by computing covariances between the dependent outcome measures and incorporating this information in the model (see online Supplementary material, section S1). A multilevel meta-analytic model was used including random effects at the study level and the effect-size level (Konstantopoulos, 2011; see also online Supplementary material, section S2, for more detailed information).
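A model of this kind is commonly written as below. This is a sketch of the general form and not necessarily the exact specification given in Supplementary section S2; the symbols y, u, w, e and V are notation assumed here for illustration, with i indexing studies and j indexing effect sizes within a study.

```latex
y_{ij} = \mu + u_i + w_{ij} + e_{ij}, \qquad
u_i \sim N(0, \sigma_1^2), \quad
w_{ij} \sim N(0, \sigma_2^2), \quad
\mathbf{e}_i \sim N(\mathbf{0}, \mathbf{V}_i)
```

Here $\mu$ is the average standardized mean difference, $\sigma_1^2$ is the between-study variance, $\sigma_2^2$ is the variance of true effects within studies, and $\mathbf{V}_i$ holds the sampling variances of the estimates from study i on its diagonal and their covariances (e.g. from a shared control group) off the diagonal.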
To evaluate potential sources of inconsistency between studies, the influence of potential moderator variables on the size of the (average) standardized mean difference was examined via meta-regression models by adding these potential moderators as fixed effects to the multilevel model (van Houwelingen et al. 2002). Based on available information, we examined the role of task difficulty, sociodemographic variables [i.e. age, sex, intelligence quotient (IQ) and education level] and clinical variables (i.e. age at disease onset, disease duration, symptom severity and medication use; see Table 1).
Table 1. Overview on the included studies and datasets

PANSS, Positive and Negative Syndrome Scale; SZ, schizophrenia; HC, healthy control; IQ, intelligence quotient; n.a., not available.
a Years of education.
b Chlorpromazine equivalent dose.
c Clarification and additional data obtained through personal communication.
d Wechsler Adult Intelligence Scale, third edition (WAIS-III) vocabulary raw scores.
e Measured on an eight-point scale.
For task difficulty, two analyses were conducted. The first analysis included all studies and outcomes, using one estimate of difficulty per outcome (in terms of the average minimum number of moves across problem items) as the predictor. In the second analysis, we restricted the model to data from studies reporting varying task difficulties (i.e. performance scores for different levels of minimum moves). The latter analysis thus focused on multiple estimates from the same sample of subjects, which partly circumvents the problem of comparing outcomes of different studies that varied in task difficulty.
The relevance of the demographic variables (i.e. age, sex, IQ and education level) was examined in two different ways in the analyses. First, we examined the impact of potential differences between patients and controls in the demographic variables on effect sizes. For this analysis, we used predictor variables that indicated the difference between the patient and the control group within each study. In particular, the standardized mean differences for age, IQ and education level (i.e. years of education) were computed between each pair of patient and control group by subtracting the value of the patient group from the value of the control group. For sex, we computed the difference in the proportion of females between each pair. Second, we examined the overall impact of demographic variables on effect sizes by including the mean age, sex (proportion of females), IQ and education level for each study (i.e. for patient and control groups combined) as predictors.
We further examined potential moderator effects of various clinical variables (i.e. age at disease onset, disease duration, symptom severity and medication use). For symptom severity, only those studies were included that reported negative, positive, general and total symptom scores according to the Positive and Negative Syndrome Scale (PANSS; Kay et al. 1989).
Funnel plot asymmetry as a possible indicator for publication bias was examined visually (by plotting the observed standardized mean differences against their standard errors) and by adding the sampling variances of the standardized mean differences as a predictor to the multilevel model (which essentially models a quadratic relationship between the standardized mean differences and the standard errors). The intercept of this model was also used as an estimate of the ‘genuine’ effect devoid of publication selection bias [analogous to the ‘precision-effect estimate with SE’ (PEESE) method; see Stanley & Doucouliagos, 2014].
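The logic of using the sampling variance as a predictor can be illustrated with a simplified sketch. It is an ordinary weighted least-squares regression with inverse-variance weights and deliberately ignores the random effects and the covariances between dependent estimates that the multilevel model includes, so it is not the analysis reported here. The function name and the data are hypothetical.

```python
def peese_intercept(effects, variances):
    """Regress effect sizes on their sampling variances (weights = 1/variance).

    Returns (intercept, slope). The intercept is the predicted effect for a
    hypothetical study with a sampling variance of zero.
    """
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    # Weighted means of the predictor (variance) and the outcome (effect)
    x_bar = sum(w * v for w, v in zip(weights, variances)) / sum_w
    y_bar = sum(w * y for w, y in zip(weights, effects)) / sum_w
    sxx = sum(w * (v - x_bar) ** 2 for w, v in zip(weights, variances))
    sxy = sum(
        w * (v - x_bar) * (y - y_bar)
        for w, v, y in zip(weights, variances, effects)
    )
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Hypothetical data in which less precise studies report larger effects
variances = [0.02, 0.05, 0.10, 0.20]
effects = [0.46, 0.55, 0.70, 1.00]
intercept, slope = peese_intercept(effects, variances)
print(round(intercept, 2), round(slope, 2))
```

A clearly positive slope points to funnel plot asymmetry, and the intercept serves as the bias-adjusted estimate.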
All models were fitted using restricted maximum likelihood estimation (van Houwelingen et al. 2002). Analyses were carried out in R (version 3.2.3; R Core Team, 2015) using the metafor package (version 1.9.8; Viechtbauer, 2010).
Results
Literature search and subsequent selection (Fig. 1) resulted in 66 effect size estimates extracted from 31 studies. The majority of studies (i.e. 17) provided only one estimate of the standardized mean difference, whereas multiple estimates were extracted from 14 studies (online Supplementary material, section S1). In sum, the included studies comprised the data of 1377 schizophrenia patients and 1477 healthy controls. Diagnoses of schizophrenia were based on the Diagnostic and Statistical Manual of Mental Disorders, third edition (DSM-III), DSM-III, revised (DSM-III-R), DSM, fourth edition (DSM-IV), DSM-IV, text revision (DSM-IV-TR) or the International Classification of Diseases (ICD)-10 in all but one study (Elliott et al. 1998). In 29 studies, planning performance was assessed using variants of the TOL, whereas variants of the Tower of Hanoi were applied in the remaining two studies. For more detailed information, please refer to the overview on all included studies and datasets provided in Table 1.
Average standardized mean difference in planning performance
The observed standardized mean differences in planning performance ranged from −0.06 to 2.58 (see histogram in Fig. 2a), with a mean difference of 0.71 (median, 0.64) (Fig. 3). The average standardized mean difference as estimated in the multilevel model was $\hat \mu = 0.67$, with the 95% confidence interval (CI) ranging from 0.56 to 0.78. The true effects appeared to be heterogeneous (Q = 147.63, degrees of freedom = 65, p < 0.0001). The larger part of the heterogeneity in the true effects was attributable to differences in the true effects within studies $(\hat \sigma_2^2 = 0.0509)$, but there was also noteworthy between-study heterogeneity $(\hat \sigma_1^2 = 0.0137)$. Identifiability of the variance components was confirmed by inspection of the corresponding profile likelihood plots (see online Supplementary material, section S3).
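For readers unfamiliar with the Q statistic, the following sketch shows its basic form under the simplifying assumption that all estimates are independent; the analysis reported here additionally accounts for covariances between estimates from the same study. The function name and the data are hypothetical.

```python
def cochran_q(effects, variances):
    """Return (Q, degrees of freedom) for independent effect size estimates."""
    weights = [1.0 / v for v in variances]
    # Inverse-variance weighted average under a common-effect model
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    # Weighted squared deviations from the pooled estimate
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    return q, len(effects) - 1


# Hypothetical data: three estimates with their sampling variances
q, df = cochran_q([0.2, 0.7, 1.1], [0.04, 0.05, 0.08])
print(round(q, 2), df)
```

Under homogeneity, Q follows approximately a chi-square distribution with the stated degrees of freedom, so values far above the degrees of freedom indicate heterogeneous true effects.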

Fig. 2. (a) Histogram of the standardized mean differences across all included datasets. (b) (Funnel) plot of the standard errors against the observed standardized mean differences. The regression line is based on the model including the sampling variances as a predictor (as a formal test for funnel plot asymmetry). (c) Scatterplot of task difficulty against the effect size estimates (drawn inversely proportional to their standard errors). The grey shaded area corresponds to a 95% confidence interval band for the predicted effect size as a function of task difficulty; the solid black line corresponds to the resulting regression line. Multiple estimates extracted from the same study based on the same tower task but with different difficulties are connected by dotted lines.

Fig. 3. Forest plot of the standardized mean difference (SMD) between schizophrenia patients and healthy controls across all included datasets. Note that the 95% confidence intervals (CIs) include the value of 0 in a substantial part of the datasets.
The 95% CI of 0.56–0.78 denotes the interval within which the average standardized mean difference can be expected, but it does not indicate where the true standardized mean difference in any particular study would be expected to be. To estimate this true standardized mean difference in a particular study, we thus computed the 95% prediction interval, which was found to range between 0.16 and 1.18. While the interval encompasses effects that could be considered to range from very small to very large, it does not include the value 0. This suggests that the group difference is expected to be present in any particular single study with high probability, although its exact magnitude remains uncertain.
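The prediction interval can be approximately reproduced from the rounded values reported above. The sketch below assumes a normal quantile and back-calculates the standard error of the average effect from the rounded confidence interval, so it is an approximation and not the original computation; the function and variable names are hypothetical.

```python
import math


def prediction_interval(mu, ci_lb, ci_ub, sigma2_between, sigma2_within, z=1.959964):
    """Approximate 95% prediction interval for the true effect in a single study."""
    # Standard error of the average effect, recovered from the reported CI
    se = (ci_ub - ci_lb) / (2 * z)
    # Uncertainty in the average plus both sources of true heterogeneity
    half_width = z * math.sqrt(se ** 2 + sigma2_between + sigma2_within)
    return mu - half_width, mu + half_width


pi_lb, pi_ub = prediction_interval(0.67, 0.56, 0.78, 0.0137, 0.0509)
print(round(pi_lb, 2), round(pi_ub, 2))  # 0.16 1.18
```

The interval is much wider than the confidence interval because it adds the between-study and within-study variance components to the squared standard error of the average effect.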
Robustness of the results
An examination of the standardized residuals revealed three reported effects with relatively large absolute values of ⩾2 (Morice & Delahunty, 1996; Langdon et al. 2001; Marczewski et al. 2001). In fact, these studies yielded the smallest/largest standardized mean differences in the dataset (i.e. −0.06, 1.89 and 2.64, respectively). Removing these estimates led to a slight decrease in the estimated average effect ($\hat \mu = 0.64$, 95% CI 0.54–0.73), but not to a change in the conclusions.
Two studies (Goldberg et al. 1990; Bustini et al. 1999) measured performance differences using the Tower of Hanoi task, which is similar but not identical to the TOL task used in the remaining studies. After removing these two studies, model estimation yielded virtually identical results ($\hat \mu = 0.66$, 95% CI 0.56–0.77), thus indicating that the results were not influenced by the type of tower task used.
Publication bias
According to the funnel plot (Fig. 2b), some small studies with estimates close to zero are possibly missing. As a formal test for funnel plot asymmetry, the sampling variance of the standardized mean differences was added as a predictor to the model, which yielded evidence of a significant relationship (p < 0.001) and hence indicates possible publication bias. As discussed by Stanley & Doucouliagos (2014), the intercept of this model can be used to estimate the effect free of publication selection bias. As shown in Fig. 2b (solid line), this yielded an estimate of 0.38 (95% CI 0.20–0.56), which can be regarded as substantially lower than the estimate of 0.67 obtained in the main analysis, but still significantly different from zero.
These findings appeared to be driven to a large extent by the two largest positive estimates (i.e. 1.83 from Marczewski et al. 2001 and 2.58 from Morice & Delahunty, 1996). Removing these two studies reduced the relationship between the standardized mean differences and the sampling variances to just short of significance (p = 0.06) and resulted in a higher intercept estimate of 0.48 (95% CI 0.30–0.66; see Fig. 2b, dotted line).
Moderator analyses
Effects of task difficulty
Information on task difficulty (in terms of the minimum number of moves required for the optimal solution) was available for 57 out of the 66 effect size estimates. Meta-regression analysis yielded a significant relationship between overall task difficulty and the size of the (average) effect (b = 0.124, s.e. = 0.038, p = 0.001; online Supplementary material, Supplementary Table S4-1; see also Fig. 2c), with effect sizes increasing with higher task difficulty. That is, an increase in a problem set's difficulty of one minimum move amplified the estimated effect size of planning differences between groups of schizophrenia patients and healthy controls by 0.12 units. The analysis showed that there was a 95% chance of the true standardized mean difference in a particular study being larger than 0 if its task difficulty was three moves or higher (Table 2). Moreover, a medium effect size (i.e. a standardized mean difference above 0.5) was reliably achieved if the task difficulty comprised at least four minimum moves ($\hat \mu = 0.68$, 95% CI 0.54–0.82).
Table 2. Effect of task difficulty on effect size

pred, Predicted average standardized mean difference; s.e., standard error; ci.lb, lower-bound confidence interval; ci.ub, upper-bound confidence interval; pi.lb, lower-bound prediction interval; pi.ub, upper-bound prediction interval.
a Number of minimum moves for a perfect solution.
Differences in task difficulty across studies (or any other potential moderator) could be easily confounded with unknown third variables, leading to spurious relationships – a general and well-known problem with meta-regression analyses (Thompson & Higgins, 2002). This problem can be diminished by analysing studies that provide multiple estimates from the same sample of subjects. We therefore repeated the analysis including only those eight datasets reporting performance differences at various levels of task difficulty (i.e. the points in Fig. 2c connected by the dotted lines; k = 34 effect size estimates). The relationship between task difficulty and outcome was of comparable magnitude (b = 0.137, s.e. = 0.039) and remained significant (p < 0.001; online Supplementary material, Supplementary Table S4-1), providing additional support for the hypothesis that the difference in planning performance between groups of schizophrenia patients and controls increased as studies/datasets applied more difficult problem items.
Effects of sociodemographic and clinical variables
Ideally, the groups being compared in the individual studies should be as comparable as possible, except for patient status. To examine the respective fits, information about the age, sex, IQ and level of education was coded for each group where available. Based on this information, the standardized mean differences for age, IQ and education were computed for each pair of groups being compared. For sex, we computed the difference in the proportion of females within each pair of groups. Summary statistics are provided in online Supplementary material (Supplementary Table S4-2). While on average groups did not differ noticeably with respect to age and sex, the healthy control groups had on average higher IQ scores and a higher education level (by approximately half of a standard deviation). However, when using the difference scores between patients and controls for age, sex, IQ and education as predictors in additional meta-regression analyses, none of these variables was found to be significantly related to the size of the group difference in planning performance (all p ⩾ 0.31; online Supplementary material, Supplementary Table S4-3).
Similarly, when using mean age, proportion of females, mean IQ and mean educational level for each patient and control group pair combined as potential moderator variables, we did not find any significant relationships (all p ⩾ 0.53; online Supplementary material, Supplementary Table S4-4).
Finally, patient groups differed with respect to various clinical variables, including mean age at disease onset, disease duration, symptom severity (PANSS negative, positive, general and total symptom scores), and medication use (measured in terms of the equivalent mean daily doses of chlorpromazine per 100 mg). In additional meta-regression models, we therefore examined to what extent these variables were related to the outcome. Again, none of these variables was found to be related to the size of the performance difference between groups (all p ⩾ 0.20; online Supplementary material, Supplementary Table S4-5).
Discussion
In light of the hitherto heterogeneous findings from individual studies (see the forest plot in Fig. 3), the present meta-analysis resolves this inconsistency by demonstrating that schizophrenia is indeed associated with impaired planning performance. Assuming that planning performance is normally distributed within patients and controls (see Kaller et al. 2016), the standardized mean difference between groups of 0.67 implies that a randomly chosen healthy control subject would outperform a randomly chosen patient with 68.3% probability (95% CI 65.5–70.9%). The present results further demonstrate that these impairments in schizophrenia patients are amplified by increased planning demands in terms of the minimum number of moves to a solution, thus suggesting that the performance deficit of schizophrenia patients on tower tasks is indeed reflecting a specific planning deficit rather than unspecific cognitive impairment.
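The probability statement above follows from the standardized mean difference under the stated normality assumption, with equal variances in both groups also assumed: the probability that a control outperforms a patient is Φ(SMD/√2). The sketch below uses the rounded estimates, so its results agree with the reported percentages only to within rounding; the function name is hypothetical.

```python
import math


def prob_superiority(smd):
    """P(control > patient) for two equal-variance normal distributions.

    Equals Phi(smd / sqrt(2)); since Phi(x) = 0.5 * (1 + erf(x / sqrt(2))),
    this simplifies to 0.5 * (1 + erf(smd / 2)).
    """
    return 0.5 * (1 + math.erf(smd / 2))


# Point estimate and confidence limits of the average effect (rounded values)
for smd in (0.56, 0.67, 0.78):
    print(smd, round(prob_superiority(smd), 2))
```

An effect of zero corresponds to a probability of 0.5, that is, a coin flip between the two randomly chosen individuals.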
Effects of sociodemographic and clinical variables
The present results point to an overall planning deficit in schizophrenia, but further reveal considerable heterogeneity (Figs 2a and 3) which has also been observed in meta-analyses of impairments of schizophrenia patients in other cognitive domains (Fioravanti et al. 2005, 2012; Forbes et al. 2009; Mesholam-Gately et al. 2009; Knowles et al. 2010). Regarding planning performance, various authors have attempted to explain this heterogeneity by linking planning deficits to specific clinical variables. For instance, Greenwood et al. (2011) have suggested that planning performance is specifically affected in patients with (positive) disorganization symptoms compared with patients with (negative) psychomotor poverty symptoms. Morice & Delahunty (1996) also found an association between TOL performance and positive symptoms. In contrast, Braw et al. (2012) showed that only patients in positive but not in negative symptomatic remission exhibited planning deficits, underscoring the general role of (persisting) negative symptoms for cognitive dysfunction (Ventura et al. 2009). Thus, while specific patient characteristics seem to exert a differential impact on planning performance, the pattern behind these partly controversial findings is still not understood. In this respect, meta-regression analysis would constitute a powerful tool for unveiling moderating effects of clinical and/or sociodemographic characteristics that potentially drive these differences between studies (see Forbes et al. 2009; Knowles et al. 2010; Fioravanti et al. 2012).
However, the present attempt to elucidate the moderating effect of sociodemographic and clinical variables on planning impairments was significantly hampered by the lack of necessary information and the low consistency of which details were reported across the studies. More specifically, from the initially identified 99 studies, only 31 studies provided sufficient information for the meta-analysis. An even smaller number of eight to 28 studies could be included in the subsequent meta-regression analyses of potential sociodemographic (i.e. age, sex, IQ and education) and clinical moderator variables (i.e. age at disease onset, disease duration, severity of positive and negative symptoms, medication; see online Supplementary material, section S4). This problem of missing information clearly limits the generalizability of the present results. In particular, it cannot be ruled out that possible effects of sociodemographic and/or clinical characteristics on the existence or severity of planning deficits have remained undetected due to limited statistical power. Notably, the importance of increasing the degree of overlapping information on patient characteristics across studies has already been pointed out in previous meta-analyses on cognitive impairment in schizophrenia patients (Mesholam-Gately et al. 2009; Fioravanti et al. 2012).
Effects of task difficulty
As a second key finding, the present meta-regression analyses revealed that planning deficits of schizophrenia patients were exacerbated by increasing task demands (see Morice & Delahunty, 1996). Notably, two-move TOL problems do not require planning ahead at all but can be solved by simple perceptual matching strategies (Owen, 2005), whereas three-move TOL problems require planning ahead by at most one move. This skill is reliably mastered, for instance, by neurotypical children at the age of 6 years (Kaller et al. 2008; McCormack & Atance, 2011; Unterrainer et al. 2013). Deficits in these very easy types of problems may hence be indicative of more general impairments in lower-order cognitive processes such as selective attention, working memory or inhibition rather than of specific planning impairments. Planning impairments in more difficult problems may also be partly driven by increasing task demands on lower-order cognitive processes. However, the finding that differences between schizophrenia patients and healthy controls were larger for problems with higher planning demands (i.e. with four minimum moves and more) than for the easy two- and three-move problems attests to a genuine planning deficit in schizophrenia.
The present moderator analyses imply that planning demands of TOL problems are mainly determined by the minimum number of moves. However, recent research has highlighted the impact of additional structural problem parameters such as search depth and goal hierarchy on planning performance (e.g. Ward & Allport, 1997; Carder et al. 2004; Kaller et al. 2004; Unterrainer et al. 2005; Newman & Pittman, 2007; Berg et al. 2010; Nitschke et al. 2012; see Kaller et al. 2011a, for an overview). These additional determinants of planning difficulty differentially affect planning performance in childhood and older age (Kaller et al. 2008; Unterrainer et al. 2013, 2015a; Köstering et al. 2014), in patients with various neurodevelopmental or neurological pathologies (McKinlay et al. 2008; Köstering et al. 2012; Rainville et al. 2012; Unterrainer et al. 2015b) and further elicit differential patterns of brain activation (Newman et al. 2009; Kaller et al. 2011b; Ruh et al. 2012).
Rather than reflecting a global deficit, planning impairments in schizophrenia might hence be differentially associated with specific planning demands, for instance, on the depth and/or breadth of searching ahead. As these additional structural problem parameters are seldom systematically accounted for in existing TOL versions, the heterogeneity of findings on planning impairments in schizophrenia across studies (Figs 2a and 3) may at least to some extent be attributable to differences in the structural properties of the problem sets applied, which would tax different specific planning demands across studies.
Not only has the insufficient standardization of the TOL as a measurement instrument been criticized (Kaller et al. 2011a), but the reliability (Humes et al. 1997; Lowe & Rabbitt, 1998; Syväoja et al. 2015) and validity (Kafer & Hunter, 1997) of some TOL versions have also been questioned, which may likewise have contributed to the heterogeneous findings (but see Culbertson & Zillmer, 1998a, b; Schnirman et al. 1998; Kaller et al. 2012, 2016; Köstering et al. 2015a, b; Debelak et al. 2016; Tunstall et al. 2016 for suggestions of TOL versions with adequate psychometric properties).
Limitations
Several limitations need to be taken into account. The results indicate possible publication bias, with small studies yielding estimates close to zero possibly missing. While the estimated standardized mean difference of 0.67 may be an overestimation, the most conservative estimate of 0.38 still indicates a small to medium effect size.
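For orientation, a standardized mean difference expresses the difference between group means in units of the pooled standard deviation. The sketch below assumes Hedges' g with the usual small-sample correction. The specific estimator used in the present analyses is not restated in this section, and the function name and all input values are hypothetical.

```python
import math


def hedges_g(mean_controls, sd_controls, n_controls,
             mean_patients, sd_patients, n_patients):
    """Standardized mean difference (controls minus patients) as Hedges' g.

    Assumption: two independent groups; the pooled standard deviation is
    used as the standardizer and the small-sample correction is the common
    approximation 1 - 3 / (4 * (n1 + n2) - 9).
    """
    pooled_var = ((n_controls - 1) * sd_controls ** 2
                  + (n_patients - 1) * sd_patients ** 2) / (n_controls + n_patients - 2)
    cohens_d = (mean_controls - mean_patients) / math.sqrt(pooled_var)
    correction = 1 - 3 / (4 * (n_controls + n_patients) - 9)
    return cohens_d * correction


# Hypothetical study: number of TOL problems solved in the minimum number of moves.
g = hedges_g(mean_controls=8.0, sd_controls=2.0, n_controls=30,
             mean_patients=7.0, sd_patients=2.0, n_patients=30)
```

By Cohen's conventional benchmarks of 0.2, 0.5 and 0.8 for small, medium and large effects, a value of 0.38 falls between small and medium, and a value of 0.67 between medium and large.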
As stated earlier, the lack of overlap in reported sample characteristics between studies severely constrained the options for further analyses and the potential for identifying variables driving the observed heterogeneity of effect sizes. Limited statistical power constitutes one issue, but meta-regression analysis in the presence of heterogeneity and a low number of observations may also result in inflated rates of false-positive findings (Higgins & Thompson, 2004). Furthermore, potential interdependencies of moderating effects in terms of higher-order interactions (e.g. minimum moves by symptom severity) could not be tested because too few reports provided sufficient information. Finally, the present results focus on accuracy scores and the number of problems optimally solved in the minimum number of moves as the most frequently reported outcome (Berg & Byrd, 2002; Sullivan et al. 2009) and cannot be extrapolated to other outcome variables (e.g. pre-planning and execution times).
Conclusions
Taken together, the present results advocate using psychometrically sound TOL versions with a graded difficulty of at least four minimum moves for reliably identifying planning impairments in schizophrenia patients. Future meta-regression analyses, particularly on the impact of task-specific as well as clinical and sociodemographic variables on planning impairments in schizophrenia, are needed to further investigate the heterogeneity of effects. Such analyses, however, require more comprehensive reports of sample and task characteristics than are currently provided by studies on planning performance in schizophrenia. Notwithstanding this heterogeneity in the extent of effects, it can be concluded that schizophrenia indeed entails a significant and genuine deficit in the ability to plan one's own actions ahead.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291717000459
Acknowledgements
This research was partly supported by the BrainLinks-BrainTools Cluster of Excellence funded by the German Research Foundation (DFG; grant no. EXC 1086). The authors thank Dr Benjamin Rahm and Dr Lena Schumacher for their valuable comments on a previous version of the paper.
Declaration of Interest
None.