Introduction
Antipsychotic medication has been used to treat the symptoms of schizophrenia since the early 1950s. The mode of action for all currently licensed antipsychotics is via their action on dopamine D2 receptors (Seeman & Lee, 1975; Seeman et al. 1976; Kapur & Seeman, 2001). However, approximately one-third of patients with a diagnosis of schizophrenia (Lindenmayer, 2000; Mortimer et al. 2010) fail to respond adequately to a trial of antipsychotic medication at recommended doses and duration; surprisingly, this occurs despite adequate D2 receptor occupancy (Wolkin et al. 1989; Coppens et al. 1991). The implication is either that these occurrences of ‘treatment-resistant’ schizophrenia (TRS) are characterized by a distinct neurochemical deficit, reflecting the heterogeneous nature of schizophrenia, or that the dopaminergic dysfunction is markedly more severe in TRS, to the extent that modulating the dopaminergic system with standard dopamine-blocking antipsychotics is not enough to alleviate symptoms in these complex cases.
Schizophrenia has frequently been studied within a framework of reinforcement learning given the involvement of dopamine function in reward prediction (Deserno et al. 2016). Reinforcement learning is driven by midbrain dopamine neurons encoding violations of expected reward outcomes (Schultz, 1998), known as reward prediction error (RPE) signals. Specifically, unexpected reward elicits a phasic increase in firing of dopamine neurons, whereas omission of an expected reward results in a phasic decrease in firing. Midbrain RPE signals are thought to act as a learning signal which is fed through fronto-cortical basal ganglia loops in order to adjust behavior accordingly. Functional magnetic resonance imaging (fMRI) of brain regions which are densely innervated by dopamine neurons, particularly the striatum and aspects of the prefrontal cortex, typically shows activation reflective of an RPE response, in line with the notion that the blood oxygen level-dependent (BOLD) signal likely reflects the information an area is receiving and processing. A recent meta-analysis of neuroimaging studies of prediction error during reinforcement learning confirmed robust prediction error activation in both ventral and dorsal aspects of the striatum as well as cortical regions including medial prefrontal, inferior and superior frontal, inferior parietal, and occipital cortex (Garrison et al. 2013). Consistent with pathologically increased tonic striatal dopamine in schizophrenia, phasic RPE signaling in the striatum has been shown to be reduced in schizophrenia patients (Murray et al. 2008; Waltz et al. 2009; Schlagenhauf et al. 2014), a finding attributed to ‘drowning’ of these phasic signals due to elevated presynaptic dopamine. As the primary target of dopaminergic neurons, the ventral striatum has been a major region of interest (ROI) for reinforcement learning studies in schizophrenia; however, impaired RPE signaling has also been detected in patients in additional areas such as prefrontal cortex (Corlett et al. 2007; Koch et al. 2010), parietal cortex (Waltz et al. 2009), thalamus (Murray et al. 2008; Gradin et al. 2011), and cerebellum (Waltz et al. 2009). Furthermore, there is evidence that reward feedback processing and RPE signaling in schizophrenia are selectively impaired for reward outcomes, but largely intact for loss outcomes, typically consisting of omission of expected reward (Waltz et al. 2007, 2009, 2010; Koch et al. 2010; Simon et al. 2010; Gold et al. 2012; Dowd et al. 2016).
While meta-analytic findings have shown some overlap of neural regions processing reward and punishment in healthy individuals including in the striatum and medial frontal cortex, encoding of prediction errors during gain and loss outcomes appears to be spatially segregated in temporal and occipital regions (Garrison et al. 2013). This supports the possibility that the reward processing network could be selectively impaired in schizophrenia.
The question of whether a common dopaminergic abnormality underlies both treatment-responsive schizophrenia and TRS remains largely unresolved. Recent evidence suggests that elevated striatal dopamine synthesis capacity is specific to treatment-responsive schizophrenia, whereas anterior cingulate glutamate levels may be selectively increased in TRS (Demjaha et al. 2012, 2014). However, the neural activation associated with dopamine functioning in the context of reinforcement learning has not been explicitly compared between these patient groups. Given the link between dopamine and RPE activation, a normal RPE signature would be expected in TRS if dopamine function is indeed unimpaired in this group. In contrast, treatment-responsive patients would be expected to exhibit the abnormal RPE activation typically associated with schizophrenia. Note that behavior may be similarly impaired in the two groups if distinct nodes of the same reward network are differentially impaired. Reinforcement learning relies not only on striatal dopamine function, but also on complex fronto-striatal interactions regulating related processes such as cognitive control, goal maintenance and planning, as well as action value and effort computations (Frank et al. 2001; Frank & Claus, 2006; Barch & Dowd, 2010). As bottom–up learning signals are utilized to update a model of the surrounding environment, it is necessary to exert top–down cognitive control, particularly in the presence of persistent cognitive or behavioral bias, in order to optimize task-focused learning. As such, it is possible that even with intact RPE signaling, a lack of cognitive control modulating learning processes could lead to a disruption of reinforcement learning.
Notably, glutamatergic dysfunction may be associated with these cognitive control deficits in schizophrenia (Falkenberg et al. 2012; Taylor et al. 2015), providing a useful explanatory mechanism for potential deficits in TRS.
In this study, we aimed to tap into these processes by quantifying cognitive bias in a reinforcement learning task and observing its modulation of RPE signaling. We compared treatment-resistant and treatment-responsive patients with a diagnosis of schizophrenia using fMRI while investigating (1) neural correlates of RPEs during wins and losses and (2) the association of cognitive bias with these learning signals. Cognitive bias was induced with a probabilistic reinforcement learning task using faces with varying expressions (Averbeck & Duchaine, 2009), which is known to elicit a bias toward happy faces in both healthy controls (HC) and patients with schizophrenia (Evans et al. 2011b). We examined RPE signaling separately for wins and losses on this task both because dissociable systems have been suggested for prediction error signaling of rewards and losses (Yacubian et al. 2006; Garrison et al. 2013) and due to evidence that reward and loss processing may be differentially impacted in schizophrenia (Waltz et al. 2007, 2011; Chang et al. 2016; Reinen et al. 2016). In addition, we anticipated that this would more closely reflect variability in prediction errors rather than effects of outcome itself.
Based on the theory that treatment-responsive schizophrenia, but not TRS, is characterized by an abnormal dopaminergic signature, we tested the hypothesis that responsive patients would show reduced RPE signaling compared with HC and TRS patients. This effect was expected to be particularly pronounced for win outcomes in areas typically associated with RPE signaling and dysfunctions in schizophrenia such as the striatum and thalamus. An additional exploratory analysis examined whether emotional bias would differentially modulate the neural RPE response in TRS patients compared with both responsive patients and controls.
Methods and materials
Participants
Forty-two individuals with a diagnosis of schizophrenia [according to the International Classification of Diseases (ICD)-10 criteria] and 24 HC matched for age, sex, and socioeconomic background consented to participate in this study. The patient sample included 21 with TRS, based on persistent psychotic symptoms defined as a score of at least 4 (moderate) on at least two positive symptom items of the Positive and Negative Syndrome Scale (PANSS) (Kay et al. 1987), at least two prior drug trials of 4–6 weeks duration with no clinical improvement, and persistence of illness for longer than 5 years with no period of good social or occupational functioning. The latter two criteria were ascertained by reviewing patients’ medical records and self-report of occupational status. The remaining 21 patients [non-treatment-resistant schizophrenia (NTR)] fulfilled criteria for being in symptomatic remission, defined as a score of 3 or less on all items of the PANSS (Conley & Kelly, 2001), with symptoms stable for at least 6 months (Andreasen et al. 2005) and a stable prescribed dosage of antipsychotic medication for the previous 6 months. Current clozapine use was an exclusion criterion for all patients. Exclusion criteria for all subjects were a history of neurological illness, current major physical illness, and drug dependency over the last 6 months. Exclusion criteria for HC were a history of psychiatric illness and a first-degree relative having suffered from a psychotic illness. All subjects had normal hearing and normal or corrected-to-normal vision. The two patient groups were matched for age, sex, duration of illness, medication type and dosage. Intelligence quotient was measured with the two-item Wechsler Abbreviated Scale of Intelligence (WASI; Wechsler, 1999).
Chlorpromazine (CPZ) equivalent doses of medications were calculated using conversion tables (Woods, 2003; Bazire, 2005). Ethical approval was provided by the London Camberwell St Giles Research and Ethics Committee. All participants provided informed written consent and were compensated for their time and travel.
fMRI procedure
A schematic of a trial sequence is shown in online Supplementary Fig. S1. Subjects underwent a reward learning paradigm consisting of choosing between two simultaneously presented faces, and over a series of iterative trials, learning to identify which of the faces was associated with a higher reward probability. Subjects were given the task of maximizing the reward (10p per correct choice) achieved during the task. The task screen was viewed via a head-mounted mirror inside the MRI scanner and response selection was via a button box operated by the right index and middle fingers.
The task consisted of four blocks of 30 trials each, during which two faces were presented side by side. One face was associated with a 60% reward probability and the other with a 40% reward probability. Faces within a block differed either in emotional expression (blocks 1 and 3) or identity (blocks 2 and 4), as described previously (Evans et al. 2011a). In brief, emotional blocks consisted of one happy and one angry face with the same identity. Neutral blocks consisted of two faces with different identities but with neutral expressions. Combinations of identities and reward contingencies were counterbalanced across blocks and subjects.
Each trial began with a period of 1000 ms during which a white central fixation cross was presented against a dark background. This was followed by two faces being presented to the right and left of the fixation cross for 4500 ms. Within this time window, subjects were required to select one of the faces by pressing the corresponding button with their right hand. The selected face was highlighted by a yellow square surrounding it. Feedback was then presented on the screen for 1500 ms. The task had a total duration of approximately 15 min.
Scanning parameters
Functional scans were acquired using a T2* echo planar sequence (430 volumes, TR = 2000 ms, TE = 35 ms, field of view = 24 cm, slice thickness = 3 mm, matrix = 64 × 64, flip angle = 75°) sensitive to BOLD contrast on a 3 T GE Excite II MR scanner (GE Healthcare, Chicago, IL). A structural image was acquired for each subject with a T1-weighted magnetization prepared rapid acquisition gradient echo sequence (TR = 7321 ms, TE = 3 ms, TI = 400 ms, field of view = 240, slice thickness = 1.2 mm, 196 slices).
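As a rough consistency check on the reported timings, the nominal task length can be compared with the length of the functional run. The sketch below uses only the figures given in the text; it assumes that every trial lasts the full fixation, face, and feedback periods and that there are no gaps between blocks, neither of which is stated explicitly.

```python
# Timing constants taken from the task and scanning descriptions (milliseconds).
FIXATION_MS = 1000
FACES_MS = 4500
FEEDBACK_MS = 1500
N_BLOCKS = 4
TRIALS_PER_BLOCK = 30
N_VOLUMES = 430
TR_MS = 2000

# Assumption: each trial occupies the full face window, with no inter-block gaps.
trial_ms = FIXATION_MS + FACES_MS + FEEDBACK_MS
task_ms = N_BLOCKS * TRIALS_PER_BLOCK * trial_ms
scan_ms = N_VOLUMES * TR_MS

task_minutes = task_ms / 60000
scan_minutes = scan_ms / 60000
print(f"task: {task_minutes:.2f} min, scan: {scan_minutes:.2f} min")
```

Under these assumptions the 120 trials take 14 min and the 430 volumes cover about 14.3 min, which is in line with the stated duration of approximately 15 min.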
Reinforcement learning model
The behavioral data were modeled using a ‘double update’ reinforcement learning model (Schlagenhauf et al. 2014). The probability of choosing option 1 on trial t was computed using a softmax function of the action values, where the inverse temperature β determines the randomness of the subject's choice, and Q1(t) denotes the action value, or expected reward, for choice 1 on trial t. The action value for the chosen option is updated on a trial-by-trial basis using the RPE, defined as the difference between the obtained reward R and the expected reward Q on trial t, scaled by the learning rate parameter α.
The action value for the unchosen option 2 was additionally updated on each trial, using the inverse reward value and the identical learning rate parameter.
This model reflects the symmetry of choice outcomes, whereby feedback associated with a chosen option is also informative of the unchosen option (e.g. if stimulus 1 lost, stimulus 2 would have won).
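The displayed equations for this model are not reproduced in this copy of the text. A reconstruction consistent with the verbal definitions might read as follows, where the subscripts c and u (chosen and unchosen option) and the symbol δ for the RPE are notation introduced here, and where the ‘inverse reward value’ is assumed to be −R(t), i.e. outcomes coded symmetrically (for example +1 for a win and −1 for a loss); this coding is an assumption and is not stated in the text.

```latex
\begin{align}
p\bigl(a(t) = 1\bigr) &= \frac{\exp\bigl(\beta\, Q_1(t)\bigr)}
                              {\exp\bigl(\beta\, Q_1(t)\bigr) + \exp\bigl(\beta\, Q_2(t)\bigr)} \\
\delta(t) &= R(t) - Q_c(t) \\
Q_c(t+1) &= Q_c(t) + \alpha\, \delta(t) \\
Q_u(t+1) &= Q_u(t) + \alpha\, \bigl(-R(t) - Q_u(t)\bigr)
\end{align}
```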
The two free parameters β and α were estimated for each group separately by minimizing the negative log likelihood of the observed data pooled across all subjects within the group.
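A minimal sketch of this model and of a pooled group-level fit is given below. It is illustrative rather than the authors' implementation: the reward coding (+1 for a win, −1 for a loss), the zero initial action values, the grid search over candidate parameters, and the function names `run_model` and `fit_group` are all assumptions made here.

```python
import math

def run_model(choices, rewards, alpha, beta):
    """Return (negative log likelihood, trial-by-trial RPEs) for one subject.

    choices: 0 or 1 per trial (index of the chosen option)
    rewards: +1 (win) or -1 (loss) per trial; this coding is an assumption
    """
    q = [0.0, 0.0]  # assumed initial action values
    nll = 0.0
    rpes = []
    for c, r in zip(choices, rewards):
        # Two-option softmax, written as a logistic function of the value difference.
        p0 = 1.0 / (1.0 + math.exp(-beta * (q[0] - q[1])))
        p_chosen = p0 if c == 0 else 1.0 - p0
        nll -= math.log(max(p_chosen, 1e-12))
        rpe = r - q[c]  # obtained minus expected reward
        rpes.append(rpe)
        u = 1 - c
        q[c] += alpha * rpe             # update for the chosen option
        q[u] += alpha * (-r - q[u])     # 'double update' with the inverse reward
    return nll, rpes

def fit_group(subjects, alphas, betas):
    """Grid search for the (alpha, beta) minimizing the NLL pooled over subjects.

    subjects: list of (choices, rewards) pairs, one per subject
    Returns (pooled NLL, alpha, beta).
    """
    best = None
    for a in alphas:
        for b in betas:
            total = sum(run_model(ch, rw, a, b)[0] for ch, rw in subjects)
            if best is None or total < best[0]:
                best = (total, a, b)
    return best
```

The RPE series returned by `run_model` corresponds to the trial-by-trial values that would serve as parametric modulators in an imaging analysis. A gradient-based optimizer could replace the grid search; the grid is used here only to keep the example dependency-free.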
Behavioral analysis
Choices were defined as ideal if the action value (computed by the model) of the chosen option was greater than that of the unchosen option. Subjects’ proportions of ideal choices were analyzed using a linear mixed-effects model including the predictors group (HC v. NTR v. TRS) and condition (emotional v. neutral).
Emotional bias was defined as the difference between the proportion of choices for the happy face when the angry face would have been an ideal choice, and proportion of choices for the angry face when the happy face would have been the ideal choice. Emotional bias was compared between groups using one-way analysis of variance.
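The two behavioral measures can be sketched as follows. The denominators (the number of trials on which the angry or the happy face was ideal, respectively) and the handling of empty categories are assumptions, as the text does not specify them; the function names are hypothetical.

```python
def is_ideal(q_chosen, q_unchosen):
    """A choice is ideal if the model's value for the chosen option is higher."""
    return q_chosen > q_unchosen

def emotional_bias(chosen, ideal):
    """Emotional bias over emotional trials.

    chosen, ideal: lists of 'happy' / 'angry', one entry per trial.
    Returns P(chose happy | angry was ideal) - P(chose angry | happy was ideal).
    """
    when_angry_ideal = [c for c, i in zip(chosen, ideal) if i == "angry"]
    when_happy_ideal = [c for c, i in zip(chosen, ideal) if i == "happy"]
    # Assumption: a category with no trials contributes a proportion of zero.
    p_happy = (when_angry_ideal.count("happy") / len(when_angry_ideal)
               if when_angry_ideal else 0.0)
    p_angry = (when_happy_ideal.count("angry") / len(when_happy_ideal)
               if when_happy_ideal else 0.0)
    return p_happy - p_angry
```

A positive value indicates that non-ideal choices favored the happy face more often than the angry face.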
fMRI preprocessing and analysis
The fMRI data were preprocessed and analyzed using the FEAT tool from the FMRIB Software Library (FSL, http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/; Smith et al. 2004). Functional and structural brain images were extracted from non-brain tissue using FSL's brain extraction tool, and EPI images were realigned using MCFLIRT to correct for the effects of head motion. A 100 s temporal high-pass filter was applied and data were spatially smoothed using a Gaussian kernel of 5 mm full width at half maximum.
The fMRI data were analyzed using the general linear model as implemented in FSL FEAT. For the first-level analysis, the phases of the task (face presentation, choice, win outcome, and loss outcome) were modeled separately for emotional and neutral trials, resulting in eight unmodulated regressors. In addition, the win outcome and loss outcome phases were parametrically modulated with the trial-by-trial RPE values, again separately for emotional and neutral trials, resulting in four additional parametric regressors.
Each regressor was modeled with a δ function of zero duration and convolved with a canonical hemodynamic response function and its temporal derivative. Six standard motion parameters as well as a motion artifact confound matrix, which identified motion-corrupted volumes, were added as regressors of no interest. Corrupted volumes were identified using DVARS (Power et al. 2012) as implemented by FSL Motion Outliers. Percentage of corrupted volumes did not differ between groups, F(2,60) = 0.166, p = 0.848 (HC: N = 24; M = 0.4%, s.d. = 0.2%; NTR: N = 21; M = 0.4%, s.d. = 0.2%; TRS: N = 18; M = 0.4%, s.d. = 0.3%).
Contrasts of interest were constructed using the RPE regressors of win and loss outcomes separately. The first two contrasts averaged across the emotional and neutral conditions, resulting in the contrasts of interest: (1) win RPE and (2) loss RPE. The following two contrasts were constructed to detect activation which was greater in the emotional condition compared with the neutral condition: (3) win RPE [emotional > neutral] and (4) loss RPE [emotional > neutral].
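The four contrasts can be written as weight vectors over the four parametric RPE regressors. The sketch below is illustrative only: the regressor names, their ordering, and the use of 0.5 weights for averaging (rather than, say, weights of 1) are assumptions and are not taken from the analysis files.

```python
# Hypothetical ordering of the four parametric RPE regressors.
PARAMETRIC_REGRESSORS = [
    "win_rpe_emotional",
    "win_rpe_neutral",
    "loss_rpe_emotional",
    "loss_rpe_neutral",
]

# Contrast weights, one per regressor in the order above.
CONTRASTS = {
    "win_rpe": [0.5, 0.5, 0.0, 0.0],                        # (1) average over conditions
    "loss_rpe": [0.0, 0.0, 0.5, 0.5],                       # (2) average over conditions
    "win_rpe_emotional_gt_neutral": [1.0, -1.0, 0.0, 0.0],  # (3)
    "loss_rpe_emotional_gt_neutral": [0.0, 0.0, 1.0, -1.0], # (4)
}

def contrast_estimate(weights, betas):
    """Weighted sum of parameter estimates for one voxel."""
    return sum(w * b for w, b in zip(weights, betas))
```

For example, with made-up parameter estimates of 2.0, 1.0, −1.0, and −3.0 for the four regressors, contrast (1) gives 1.5 and contrast (4) gives 2.0.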
At the group level, contrasts were submitted to separate mixed-effects analyses (FLAME1), modeling the effect of group (HC, NTR, or TRS) on BOLD signal. Whole-brain activation differences between groups were tested for win RPE and loss RPE. In order to detect subcortical RPE activation, we conducted an ROI analysis using a binary subcortical mask consisting of the bilateral striatum and thalamus (anatomically defined from the probabilistic Harvard-Oxford Subcortical Structural Atlas thresholded at 30%). Broad inclusion of all structures of the striatum as well as the thalamus was based on the fact that subcortical RPE signaling was detected in each of these regions in a meta-analysis (Garrison et al. 2013) and dysfunctions in schizophrenia have also been observed in both striatum and thalamus (Gradin et al. 2011).
In order to assess the differential effect of emotional bias on RPE-related signal, analyses of the win RPE [emotional > neutral] and loss RPE [emotional > neutral] contrasts included emotional bias as a covariate, and group × bias interaction effects were assessed. Significant clusters were determined by a voxelwise z-threshold of 2.3 and a cluster significance threshold of p < 0.05 (whole-brain family-wise error corrected for multiple comparisons).
Correlation analyses were conducted between key positive symptoms (delusions and hallucinations) and significant clusters of RPE-related activation detected in the subcortical ROI analysis, and are reported where significant.
Results
Demographic characteristics of the studied samples are presented in Table 1. The TRS patients showed higher scores on all PANSS symptom dimensions compared with NTR patients.
HC, healthy controls; NTR, non-treatment resistant; TRS, treatment-resistant schizophrenia; WASI, Wechsler Abbreviated Scale of Intelligence; NS-SEC, National Statistics Socio-economic Classification; CPZ, chlorpromazine; PANSS, Positive and Negative Syndrome Scale.
Behavioral results
The proportion of ideal choices differed significantly between the three groups, F(2,63) = 3.69, p = 0.031, with HC (M = 0.63, s.d. = 0.13) making significantly more ideal choices compared with NTR patients (M = 0.55, s.d. = 0.13), p = 0.037, and marginally more compared with TRS patients (M = 0.57, s.d. = 0.11), p = 0.062. There was no significant main effect of (emotional v. neutral) condition, and no group × condition interaction.
All groups showed an emotional bias toward choosing the happy over the angry face, which did not differ significantly between groups, p > 0.05 (HC: M = 0.06, s.d. = 0.13; NTR: M = 0.13, s.d. = 0.22; TRS: M = 0.04, s.d. = 0.16).
Neuroimaging results
RPE signaling for wins and losses
HC showed RPE-related activation in response to win outcomes in the bilateral dorsolateral prefrontal cortices, superior frontal cortex, parietal cortices, and visual cortex as well as the cerebellum (see Fig. 1a). TRS patients showed a similar activation pattern (Fig. 1c). In contrast, NTR patients showed no supra-threshold RPE-related activation. Group comparisons showed that NTR patients had significantly reduced RPE-related activation in the precentral gyrus compared with TRS, in the angular gyrus compared with HC, and in the cerebellum compared with both HC and TRS (Fig. 2, online Supplementary Table S1). The subcortical ROI analysis revealed a significant effect of group (p < 0.05 uncorrected), with NTR patients showing reduced RPE-related activation in bilateral thalamus and caudate head compared with both HC and TRS (Fig. 3a).
Loss-related RPE response was observed in a widespread network in both HC and TRS, similar to that during win outcomes (Figs. 1b and d). Due to the negative sign of loss-related RPE, this signal reflects a negative RPE signal, with greater prediction errors resulting in greater deactivation in these areas. The NTR group showed no significant supra-threshold RPE-related signal, with no significant group differences at whole-brain level. The subcortical ROI analysis revealed reduced RPE-related signal in bilateral pallidum and caudate in NTR compared with HC (p < 0.05 uncorrected) and no significant difference between TRS and either of the other two groups (Fig. 3b).
Emotional bias × group interaction on RPE signal
During the emotional (v. neutral) loss trials, the whole-brain analysis showed a significant group × emotional bias interaction on RPE signal in bilateral thalamus and caudate nucleus, indicating a differential correlation in TRS and NTR patients (Fig. 4, online Supplementary Table S2). In TRS patients, a stronger emotional bias was associated with increased RPE signal in this region (R = 0.58, p = 0.006). In contrast, in NTR patients, the opposite was the case (R = −0.56, p = 0.008). This negative correlation in NTR was no longer significant after excluding one outlier; however, the difference between correlation coefficients in the two groups remained significant (Fisher's R to Z = 2.69, two-tailed p = 0.007). Interestingly, RPE signal in this region was significantly correlated with delusion severity in TRS patients, with stronger RPE signaling associated with more severe symptoms of delusions (R = 0.48, p = 0.027). This interaction was not evident in the emotional (v. neutral) win trials.
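The comparison of two independent correlation coefficients via Fisher's r-to-z transformation can be sketched as follows. The group sizes in the usage note are illustrative assumptions, and the output is not expected to reproduce the reported statistic, since the exact sample and outlier handling behind that value are not stated here.

```python
import math

def fisher_r_to_z_test(r1, n1, r2, n2):
    """Two-tailed test for a difference between two independent correlations.

    Returns (z statistic, p value).
    """
    z1 = math.atanh(r1)  # Fisher transformation of each coefficient
    z2 = math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-tailed p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

For instance, `fisher_r_to_z_test(0.58, 21, -0.56, 21)` compares the two reported coefficients under the assumption of 21 patients per group.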
Discussion
We used a probabilistic reward learning task to assess differences in neural mechanisms underlying reinforcement learning in patients with schizophrenia who were either treatment resistant (TRS) or non-treatment resistant (NTR), relative to a HC group. Our findings support the hypothesis that NTR patients show abnormal prediction error-related activation compared with both HC and TRS, consistent with the theory that this patient group is characterized by a greater disruption of dopaminergic functioning. We also found that underlying cognitive bias differentially modulated learning processes in the two patient groups.
We found that HC and TRS patients showed similar patterns of prediction error signaling both during wins and losses. RPE activation was evident in a widespread network in these groups, consistent with the notion that reward processing is almost ubiquitous in the brain (Vickery et al. 2011). The observed regions of activation, including medial, superior, and dorsolateral frontal cortex as well as visual areas and parietal cortex, are largely in line with the human cortical substrate of prediction error reported elsewhere (Schultz & Dickinson, 2000; Garrison et al. 2013). In contrast, NTR patients did not exhibit the same activation pattern. During receipt of rewarding outcomes, a whole-brain analysis showed reduced activation in the cerebellum in NTR compared with both HC and TRS patients; in parietal cortex compared with HC; and in precentral gyrus compared with TRS. An ROI analysis revealed reduced activation in NTR in the thalamus and caudate compared with both HC and TRS. Reduced RPE-related activation in the thalamus and caudate in schizophrenia patients has been previously reported and linked with dopaminergic dysfunction (Gradin et al. 2011). Moreover, a further study found attenuated responses to unexpected reward, but intact responses to omission of expected reward, in several overlapping regions including the striatum, precentral gyrus, parietal cortex and cerebellum in schizophrenia (Waltz et al. 2009). In line with this, group differences with respect to loss outcomes in our study were less widespread, with NTR patients showing attenuated RPE signaling only in the pallidum and caudate compared with HC.
The findings support previous suggestions that prediction error-related reinforcement learning deficits in schizophrenia stem primarily from abnormal processing of rewarding, rather than aversive, outcomes (Waltz et al. 2007; Gold et al. 2012; Dowd et al. 2016).
Encoding of prediction errors during reinforcement learning is extensively driven by dopaminergic function (Schultz, 1998). Although not all the regions found to encode prediction error in our study are densely innervated by dopaminergic projections, it is possible that a ‘global reinforcement signal’ which is elicited by firing of dopamine neurons and broadcast through other regions of the brain (Schultz, 2002) indirectly modulates activation of structures with fewer direct connections to the dopamine system. An important criterion determining whether prediction error activation might reflect dopaminergic activity is a sign change for negative outcomes (Schultz, 2002; Niv & Schoenbaum, 2008), which was indeed observed in this study. The observed activation is therefore unlikely to reflect simple attentional or surprise processing. Group differences observed in the ROI analyses are highly likely to reflect dopaminergic functioning, given that the striatum and thalamus receive dense dopamine projections from the midbrain (Groves et al. 1995; Schultz, 2002; Garcia et al. 2015). Our findings thus imply that putatively dopamine-driven mechanisms underlying reinforcement learning in response to reward feedback are selectively disrupted in NTR. In contrast, the similar RPE-related activation pattern in TRS patients and HC suggests that reinforcement learning deficits in this patient group do not stem from dopaminergically driven RPE signaling dysfunctions. The data are consistent with the notion that TRS patients do not respond to dopaminergic antipsychotic medication because a dopaminergic abnormality is not the primary cause of symptoms in this subgroup (Demjaha et al. 2012).
Importantly, medication dosage did not significantly differ between the two patient groups in our sample. Non-response to medication in the TRS group is unlikely to arise from a lower prescribed medication dosage compared with NTR patients as CPZ equivalent dosages were descriptively higher in the TRS group. However, due to the illness chronicity of patients included in our sample, it was not possible to exhaustively ascertain the exact dosage and duration of all previous medication trials, thus cumulative medication exposure remains as a potential confound in this study.
Interestingly, groups did not differ in terms of their bias toward choosing the happy face over the angry face on emotional trials. However, there was a significant difference between TRS and NTR patients in how this bias was associated with RPE signal in the thalamus and caudate during loss processing. In NTR patients, a strong emotional bias was associated with further attenuation of the RPE signal. By comparison, emotional bias in TRS was associated with an increased RPE signal. In turn, RPE signal in this region was positively related to delusional symptom severity specifically in the TRS group. This is surprising as striatal RPE signal has previously been reported to be negatively linked with symptom severity in schizophrenia (Corlett et al. 2007; Schlagenhauf et al. 2009; Gradin et al. 2011; Culbreth et al. 2016), in line with the view that hyperdopaminergia, reflected in reduced RPE signaling, drives psychosis (Kapur, 2003). Our findings suggest that this relationship may be inverted in TRS patients in the thalamus and caudate. Increased RPE signaling specifically on loss trials may reflect less accurate predictions, resulting in greater prediction errors when the outcome is negative. As such, a strong social bias in TRS may lead to worse predictions about outcomes but an intact subcortical response to prediction error, which in turn is not adequately utilized to update predictions. In contrast, in NTR the prediction error response itself seems to be impaired, an effect which is further augmented in the presence of cognitive bias.
These data support a putative model of TRS whereby the central dysfunction lies not in the subcortical dopamine system itself, but in the implementation of cognitive control mechanisms interacting with this system. Glutamatergic mechanisms could contribute to this control (Falkenberg et al. 2012; Taylor et al. 2015). The striatum and cortex are interconnected by multiple partially overlapping circuits subserving learning and flexible cognition (Kehagia et al. 2010). The ability to maintain behavioral goals in the presence of interference, uncertainty, or bias (broadly, the definition of cognitive control) is an integral aspect of feedback learning (Ridderinkhof et al. 2004; Collins & Frank, 2013). A breakdown of this system may lead not only to reinforcement learning deficits, but also to psychotic symptoms such as delusions, as control processes are not adequately exerted in order to update internal models of the environment (Adams et al. 2013). Control-related regions such as prefrontal cortex, which also shows strong functional connectivity with the striatum (Di Martino et al. 2008), may indeed be involved in delusion formation and maintenance (Heinz & Schlagenhauf, 2010). Arguably, in the absence of an adequate cognitive control mechanism regulating bias, solely targeting subcortical dopamine with antipsychotics may not suffice to alleviate symptoms. In contrast, NTR patients may have sufficient cognitive control such that alleviating the striatal dysfunction is sufficient to reduce symptoms adequately.
Our study offers the first task-related neuroimaging evidence for differential caudate function in chronic TRS and NTR patients. It has been suggested that metabolic as well as anatomical abnormalities in the basal ganglia, including the caudate nucleus, are involved in TRS and may also be associated with clozapine response. For example, clozapine responders show hypermetabolism in the thalamus and basal ganglia, which is reduced following successful clozapine treatment (Rodriguez et al. 1996; Rodríguez et al. 1997). A reduction of metabolism specifically in the caudate after clozapine response was observed more recently (Molina et al. 2007), and clozapine administration is associated with a reduction of caudate volume (Chakos et al. 1995; Frazier et al. 1996; Scheepers et al. 2001a, b). Notably, treatment-responsive patients were found to have increased dopamine synthesis capacity compared with TRS (Demjaha et al. 2012), a finding which was strongest in the caudate nucleus. Thus, the caudate may constitute an interesting target for further investigation of TRS in studies stratifying patient subgroups by response.
The study shares a limitation common to fMRI studies, namely a potential selection bias toward medicated patients suitable for scanning; however, few studies have compared TRS and NTR patients, and withdrawal from medication for the purposes of imaging is not ethical. We did not include patients treated with clozapine in order to maintain the homogeneity of the patient sample, and TRS patients fulfilled the standard criteria for treatment resistance, thus avoiding the introduction of subgroups of patients refractory to clozapine (super-resistant patients). The differences in striatal RPE activation between groups are apparent at a liberal statistical threshold uncorrected for multiple comparisons; however, the consistent pattern of hypoactivation in NTR patients across the network lends support to this finding as a true positive. Subcortical dysfunctions in reward processing in NTR may be particularly hard to detect given that these may be attenuated in chronic patients after antipsychotic medication (Culbreth et al. 2016).
In summary, the data suggest that while the behavioral output during reward learning of patients with treatment-resistant and treatment-responsive schizophrenia appears to be similar, it is underpinned by different neural systems. The data support the idea that TRS may represent a different disease from treatment-responsive schizophrenia, confirming the evidence from clinical observation that TRS does not fit well into the contemporary dopaminergic dysfunction model of schizophrenia. Despite extensive research on task-related neural activity in schizophrenia, studies typically do not use key stratifiers to reduce the heterogeneity of the sample and are likely combining neurobiologically distinct subtypes of schizophrenia. This clouds not only studies of mechanism but potentially also treatment trials, which may miss effects that are specific to one or the other subset of patients (Joyce et al. 2017). There is an urgent need for stratification of patients by response, both at the chronic stage of the illness and in patients suffering a first episode of psychosis. Indeed, recent data following up first episode samples of patients with schizophrenia suggest that over 70% of treatment-resistant cases are apparent at onset (Lally et al. 2016). The separation of schizophrenia subgroups will allow the development of clearer hypotheses about the neural mechanisms underlying antipsychotic treatment response and potentially move us closer to being able to use these biomarkers to tailor treatment in a more personalized and effective manner.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291718000041
Acknowledgements
The authors thank the radiographic team at the Centre for Neuroimaging Sciences for their support, and Felix Dransfield, Christiana Ilesanmi, Valentina Forassi and Juliet Gillam for assistance with fMRI scanning and behavioral testing.
Financial support
This research was funded by a European Research Council Grant to SSS (grant number 311686), who is supported by the National Institute for Health Research (NIHR) Mental Health Biomedical Research Centre at South London and Maudsley NHS Foundation Trust and King's College London and a joint infrastructure grant from Guy's and St Thomas’ Charity and the Maudsley Charity. LDV is supported by a Medical Research Council studentship.
Conflict of interest
None.
Ethical standards
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.