Introduction
The cardinal features of psychotic illness are the presence of hallucinations (perceptual experiences in the absence of an external stimulus) and delusions (fixed false beliefs held contrary to evidence and against the prevailing sociocultural milieu). After any novel subjective perceptual experience, higher-order processing amalgamates the experience with extant beliefs, or leads to the initiation of a new belief. These beliefs are tested in the environment, leading to maintenance or extinction depending on their utility (e.g. a belief will be extinguished if it proves to be incorrect and has low utility). Therefore, to maintain appropriate (i.e. non-delusional) beliefs, experienced events must be temporally integrated with internal models, tested against the environment and then discounted or retained according to feedback from the environment. When a belief is highly resistant to revision and is accompanied by a subjective feeling of conviction even though the supporting evidence is inconsistent or contrary, it becomes delusional. This represents a dysregulation of metacognitive processing, which refers to the evaluation of one's own internal cognitive processes, assigning confidence to them and using this to modify behavior (Metcalfe & Shimamura, 1994; Koriat, 2007), including inferences about the behavior and intentions of others (i.e. theory of mind). Subjective feelings of confidence in one's beliefs can be described as theory-based (e.g. deliberative thought and reasoning) or experience-based (e.g. intuitive and unconscious), with the latter being the dominant model of human metacognition (Bruno et al. 2012), particularly in procedural learning. Studies examining metacognition in schizophrenia have focused on metamemory processes; for example, in Bacon & Izaute (2009) participants were shown a string of consonants and asked to prospectively rate the probability of accurately spotting this string among seven distractors after a short interval (‘feeling of knowing’ rating). Patients with schizophrenia consistently performed worse on actual recall and provided lower ‘feeling of knowing’ ratings. By contrast, retrospective judgments tend to show that patients with schizophrenia rate their performance higher than controls do (Danion et al. 2001; Moritz et al. 2003; Moritz & Woodward, 2006). Recent evidence suggests that deficits in metacognitive ability, including theory of mind, are stable features of schizophrenia (Lysaker et al. 2011), showing little change over time. These deficits may be underpinned by an inability to integrate new information into existing belief systems (Cohen et al. 1996) and by an established aberrant sensitivity to the reward signals that would otherwise correct negative beliefs and guide decision making (Waltz & Gold, 2007; Fletcher & Frith, 2009).
Existing cognitive accounts of psychotic symptoms suggest that illness arises through a mixture of dysfunctional predictive models (Frith & Done, 1989) and jumping to conclusions (JTC) (Garety & Freeman, 1999), allied with altered reward processing and dopaminergic dysfunction (Kapur, 2003; O'Daly et al. 2011). A common theme in all of these cognitive models is that both perceptual and inferential biases contribute to the establishment of psychotic symptoms, which are then held with such confidence and rigidity that they are difficult to ‘overcome’ even in the presence of strong counter-evidence (Blackwood et al. 2001; Fletcher & Frith, 2009). This can be conceptualized parsimoniously within a Bayesian framework (Fletcher & Frith, 2009), where developing beliefs about the world is viewed as a probabilistic inference task in which old evidence (prior belief) is updated according to new experiences or evidence (likelihoods), allowing the derivation of a new ‘model’ of the world (posterior beliefs). For example, when one's prediction about the world (prior belief) fails to explain or predict some new observation (evidence), the dissonance generates ‘surprise’ (i.e. the event is assigned salience) that triggers updating of one's existing beliefs. This process, important for learning the causes of sensory experience, is expressed as the dopaminergic-dependent prediction error signal (Hollerman & Schultz, 1998) within the striatum, reflecting a mismatch between expectation and incoming sensory input. An additional factor that is often neglected is the question of our confidence in our beliefs (metacognition) about the world. There are consistent but counterintuitive data showing a considerable discrepancy between the actual probability of certain events occurring and people's confidence in the occurrence of the same events.
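As a toy illustration of this updating scheme (not drawn from the present study), the following sketch applies Bayes' rule sequentially to arbitrate between two candidate models of the world; the specific probabilities are illustrative assumptions:

```python
# Toy Bayesian belief updating: two candidate models of the world.
# Model 'patterned' says a particular event occurs with p = 0.8;
# model 'random' says it occurs with p = 1/3. Values are illustrative.

def update_belief(observations, p_patterned=0.8, p_random=1/3, prior=0.5):
    """Return P(patterned | observations); observations are True/False events."""
    posterior = prior
    for seen in observations:
        like_p = p_patterned if seen else 1 - p_patterned   # likelihood under 'patterned'
        like_r = p_random if seen else 1 - p_random         # likelihood under 'random'
        joint_p = posterior * like_p
        joint_r = (1 - posterior) * like_r
        posterior = joint_p / (joint_p + joint_r)           # Bayes' rule (normalized)
    return posterior

# A short run of confirming observations sharply increases the posterior,
# whereas a disconfirming observation (a 'surprise') pulls it back down.
print(update_belief([True, True, True, False]))
```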
In brief, the missing aspect of contemporary models of psychotic beliefs is the process of belief maintenance and the failure of abolition. Is there an optimal paradigm to examine this in an ecologically valid manner? Previous studies examining the role of confirmatory and contradictory evidence in reasoning and metacognition have tended to focus on high-level ‘deliberative’ reasoning tasks (Sellen et al. 2005; Woodward et al. 2006, 2008; Monestes et al. 2008). However, evidence for a consistent deficit in people with schizophrenia using such reasoning tasks is variable (Fletcher & Frith, 2009). Studies that attempt a more probabilistic explanation (e.g. the JTC phenomenon) have focused on the beads-counting task, where beads are drawn sequentially from one of two jars, each containing a majority of yellow or black beads. Participants decide when they have enough evidence to conclude that the jar being drawn from is the ‘majority yellow’ or ‘majority black’ jar (Huq et al. 1988; Garety et al. 1991; Garety & Freeman, 1999; Freeman et al. 2006, 2008; Startup et al. 2008). Such tasks have limited ecological validity because of the non-interactive, non-goal-directed nature of the experiments (where, for example, the utility of beliefs is not crucial to the execution of the task). To answer this question, we propose an alternative experimental approach based on active interactions akin to those we routinely encounter in everyday life. This approach is exemplified in the behavioral economics and game theory literature (Camerer, 1999, 2003; King-Casas et al. 2005; Fehr & Camerer, 2007; Rangel et al. 2008; Fett et al. 2012), especially in iterated competitive games. As we are concerned with the participant's ability to detect, model and make use of regularities in the opponent's plays, we implement a simulated opponent that gives the illusion of playing like a ‘real’ opponent but in fact presents a statistically defined frequency of plays. We use a modified ‘Rock Paper Scissors’ (RPS) game, where new evidence obtained after each trial must be selectively integrated with existing evidence so as to update beliefs and form the basis for future actions. We suggest this represents a ‘middle ground’ between probabilistic inference tasks (e.g. bead counting) and high-level reasoning tasks.
We hypothesized that patients with schizophrenia would differ from healthy controls in that (i) they would fail to appropriately integrate new evidence with existing beliefs, and (ii) when evaluating their performance, they would have excessive confidence in their beliefs.
Method
Participants
Twenty-seven participants with schizophrenia and 33 control participants were recruited from the South London and Maudsley National Health Service (NHS) Foundation Trust. Patients had a diagnosis of schizophrenia based on the DSM-IV (APA, 1994) and were selected on the basis of having current positive symptoms of hallucinations and delusions [a score of more than 3 on the respective Positive and Negative Syndrome Scale (PANSS) items] or commensurate levels of positive symptoms documented in their clinical records during exacerbations of their illness over the past 5 years. The average chlorpromazine-equivalent dose was 219.8 ± 178.6 mg. Control and patient groups were matched on years of formal education. Demographic data are presented in Table 1. All subjects were required to give informed consent. This study was approved by the South London and Maudsley and Institute of Psychiatry Research Ethics Committee.
Table 1. Demographics for patients and controls
PANSS, Positive and Negative Syndrome Scale; s.d., standard deviation.
Experimental design
Participants were told they would play six RPS games against an opponent using a computer interface. Unknown to the participants, the opponent was a computer program, as in previous studies (Gallagher et al. 2002; Paulus et al. 2004, 2005), but with one important distinction: in our design, the distribution of the computer opponent's moves is governed by a parameterized multinomial distribution with three different parameter sets that define an easy, medium or hard game.
In any given game, the computer played randomly (i.e. with no pattern of favored plays) for 20 trials and then began favoring one play for the 40 subsequent trials. In an easy game, the computer switched to an obvious distribution, favoring the same move (stochastically) on 80% of trials, with each of the other two moves being played on 10% of trials. In a medium game, the computer behaved similarly but favored one move on 60% of trials, with each of the other two moves played on 20% of trials. Finally, in a ‘hard’ game the favored move was played on 40% of trials and each of the other two moves on 30% of trials. The multinomial distribution generating the opponent's play assumes independent trials, so that on each trial the opponent's play does not depend on preceding trials or on the participant's previous plays. Participants played two easy, two medium and two hard games, resulting in a total of six games.
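For illustration, the following minimal Python sketch generates an opponent sequence consistent with this design (the task itself was implemented separately, so the function and variable names here are illustrative assumptions rather than the original task code):

```python
import random

# Probability of the favored move after trial 20, by game difficulty.
DIFFICULTY = {'easy': 0.80, 'medium': 0.60, 'hard': 0.40}
MOVES = ['rock', 'paper', 'scissors']

def opponent_sequence(difficulty, n_random=20, n_biased=40, seed=None):
    """Generate one game: 20 uniform-random trials, then 40 biased trials.

    Trials are independent; the bias is toward a single favored move,
    with the remaining probability split equally between the other two moves.
    """
    rng = random.Random(seed)
    favored = rng.choice(MOVES)
    p_fav = DIFFICULTY[difficulty]
    others = [m for m in MOVES if m != favored]

    plays = [rng.choice(MOVES) for _ in range(n_random)]    # random phase
    for _ in range(n_biased):                               # biased phase
        if rng.random() < p_fav:
            plays.append(favored)
        else:
            plays.append(rng.choice(others))
    return plays

# Example: one 'easy' game of 60 trials.
game = opponent_sequence('easy', seed=1)
```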
During games, we also probed participants' confidence in their belief that they had found and were able to exploit a ‘winning streak’. Participants were told that, on each trial, the winner gained one point and the loser lost one point (in accordance with a symmetrical zero-sum game). A draw (e.g. both playing rock) resulted in no points for either player. If, at any trial, they felt sure they were on a ‘winning streak’, they could instruct the experimenter to press a button that would double both their wins and losses. We refer to this as ‘increasing the pay-off’. Participants were told this was an irreversible, one-off decision, to encourage a conservative approach to doubling their wins and losses. Participants were given no explicit feedback on their current total score or their performance from previous games, forcing them to rely entirely on their own estimates of performance.
On each trial, a countdown from three to one preceded a ‘go’ signal. Participants then had 1 s to play their move (using the keyboard), after which the computer simultaneously revealed its move (a photograph of rock, paper or scissors) and the outcome for the participant: a win, a draw or a loss. If participants did not play within 1 s of the countdown ending, they were informed that they were too slow and the trial was restarted. Each trial had a total duration of 4250 ms.
Prior to commencing the experiment, participants were trained to use the keyboard to indicate their play on each trial. Each possible play was presented randomly until the participant's response time (the time to press the correct button for rock, paper or scissors) decreased below a threshold. Once participants demonstrated a clear understanding, they began playing the six experimental games. In addition, to monitor participants' engagement with the task (to prevent inattention/distraction), the experimenter sat next to and observed the participants. The experimenter was also responsible for pressing a button to increase pay-offs on verbal instruction from the participants.
Strategy: action selection
In the RPS games, participants are expected to build a model of their opponent that informs their play. To do this, they must balance the evidence available to them from the history of previous plays in addition to weighting new evidence. In ‘hard’ games, evidence from previous plays is more or less redundant, as the opponent plays almost randomly throughout the game. In ‘easy’ games, previous evidence is a reliable predictor of the opponent's strategy, as the opponent will play the same move on a high proportion of the trials.
We sought to test whether patients had difficulty correctly balancing or weighting existing evidence against new evidence. The implication of such a difficulty is that they fail to detect and model meaningful regularities in the frequency of the opponent's play, resulting in a poor strategy for winning against the opponent. This was modeled by a combined leaky integrator and temporal difference model (Sutton & Barto, 1998). Leaky integration is a key feature of neuronal mechanisms for coincidence detection. An incoming excitatory signal stimulates a neuronal population, driving its activity upwards. If another excitatory signal arrives within a short time window, the integration process enables the two events to be accumulated (reflected by sustained or increased activity of the population), but if the second or subsequent excitatory signals are too far apart, the ‘leaky’ component effectively dissociates them (i.e. the population's activity falls). Similar techniques, such as the drift-diffusion model and the Ornstein–Uhlenbeck process, have been used to understand reaction time and accuracy trade-offs in two-alternative forced choice experiments (for a review, see Bogacz et al. 2006). In our study, the participant's playing strategy is modeled by parallel leaky integrators (one each for rock, paper and scissors) competing in a winner-takes-all arrangement (a softmax function of the activity of the three leaky integrator components), enabling the derivation of probabilities and the selection of one of the three plays to be executed on subsequent trials. The model is updated on a trial-by-trial basis using two different pieces of information (cf. incoming signals): (i) new evidence about the current strategy with respect to the opponent's play; this signal represents the temporal difference between the predicted outcome of playing an action (e.g. the current state of the strategy suggests that playing rock will result in a win on the next trial) and the actual outcome for that action (e.g. rock was played but the result was a loss); and (ii) a decaying (i.e. leaking) of prior expectations based on previous evidence. Rock, paper and scissors each have an accumulated history of their utility against the opponent, but this knowledge decays over time unless it is reinforced by continued new evidence in its favor. These two factors are modeled explicitly by two parameters: α models the leaky decay of expectations about utility and β is the weight given to new evidence. Fig. 1 shows the theoretical parameter space and the corresponding playing strategies for the model. As α tends to zero, the model emphasizes the value of expected utility based on previous evidence (that is, it discards less prior evidence, so that one win with a particular move will be carried forward for some time). Conversely, as α tends to unity, the subject ignores all prior expectations from the history of plays (i.e. their play would be effectively amnesic for previous trials). One extreme is represented by α = 1 and β = 0, where the participant ignores all prior expectations and gives no weight to new evidence, which would result in random play. If α = 0 and β = 1, the model reduces to frequency counting and would approximate a Bayesian model with a multinomial likelihood and Dirichlet conjugate prior.
Fig. 1. Parameter space for the action selection model. This diagram illustrates how strategy is updated based on differences between predicted and actual outcomes during different types of play, where α models the decaying of prior evidence and β is the weight given to new evidence.
Importantly, this model does not assume that players explicitly use the pay-off matrix rationally as would, for example, a model based on fictitious play (Fudenberg & Levine, 1995, 1998), or that participants have a statistical model of the a posteriori distribution of the strategy as a formal Bayesian model might (Bernardo & Smith, 2000).
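The exact update equations are given in the online Supplementary Information; the following sketch is therefore only an assumed minimal implementation of a leaky integrator with softmax action selection, showing where α (decay of accumulated utility) and β (weight on the new prediction error) enter:

```python
import math
import random

MOVES = ['rock', 'paper', 'scissors']
BEATS = {'rock': 'scissors', 'paper': 'rock', 'scissors': 'paper'}

def softmax(values, temperature=1.0):
    """Convert utilities into play probabilities (winner-takes-all tendency)."""
    exps = [math.exp(v / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def play_game(opponent_plays, alpha=0.3, beta=0.7, seed=None):
    """Leaky-integrator strategy: one utility per move, updated each trial.

    alpha : leak rate applied to all accumulated utilities (prior evidence)
    beta  : weight given to the new prediction error (new evidence)
    These update equations are an assumed form, not the authors' exact model.
    """
    rng = random.Random(seed)
    utility = {m: 0.0 for m in MOVES}
    my_plays = []
    for opp in opponent_plays:
        probs = softmax([utility[m] for m in MOVES])
        my_move = rng.choices(MOVES, weights=probs)[0]
        my_plays.append(my_move)

        # Outcome: +1 win, 0 draw, -1 loss.
        if BEATS[my_move] == opp:
            reward = 1.0
        elif BEATS[opp] == my_move:
            reward = -1.0
        else:
            reward = 0.0

        prediction_error = reward - utility[my_move]   # new evidence
        for m in MOVES:
            utility[m] *= (1.0 - alpha)                # leak prior evidence
        utility[my_move] += beta * prediction_error    # integrate new evidence
    return my_plays
```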
Metacognition: confidence in decision making
In Fig. 2 we propose that a point estimate of confidence is derived from the action selection/strategy update model (the leaky integrator system described above). The path labelled A in Fig. 2 shows how the decision to increase the pay-off may be a function of the output of a running history of prediction errors (i.e. deviations from the model's expected reward on a trial given the current strategy). In this case confidence becomes high when the prediction error becomes sufficiently small. This would imply that an internal, subjective estimate of the expected value of actions is being used. Alternatively, path B in Fig. 2 shows how the ‘absolute’ reward/pay-off may be used [i.e. similar to A, but where the absolute reward (–1, 0 or 1) is used instead of the model-derived expected reward]. This would suggest a more objective evaluation of the rewards (outcomes) received rather than an internal, subjective evaluation based on expectation. This is similar to actor-critic models (Sutton & Barto, 1998): one system updates the strategy upon which action selection takes place and another evaluates the success of the strategy.
Fig. 2. Performance evaluation models. The diagram traces the strategy for each player over successive trials, where the evidence on which participants confidently double the pay-off is derived either from (A) prediction errors or (B) the absolute reward/pay-off.
Analogous to the action-selection model described earlier, the behavior of participants is modeled using another leaky integrator model, with parameters η and κ being the weights associated with decaying the previous pay-off history and accumulating new pay-offs respectively (see online Supplementary Information for details of implementation). These parameters are analogous to α and β: they can vary between the extreme of discarding all previous information and using only new evidence, and the converse extreme of using only old information and ignoring new pay-offs.
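As before, the implementation details are in the Supplementary Information; the sketch below assumes a simple form in which the confidence signal is a leaky integration of trial-by-trial pay-offs (path B in Fig. 2) and the decision to double pay-offs is taken when this signal crosses a threshold, the value of which is purely illustrative:

```python
def confidence_trajectory(payoffs, eta=0.2, kappa=0.5, threshold=1.0):
    """Leaky integration of absolute pay-offs (path B in Fig. 2).

    payoffs   : sequence of trial outcomes (-1 loss, 0 draw, +1 win)
    eta       : leak applied to the accumulated pay-off history
    kappa     : weight given to each new pay-off
    threshold : illustrative confidence level triggering 'double pay-off'
    Returns the trial index at which the threshold is first crossed
    (or None), together with the full confidence trace.
    """
    confidence = 0.0
    trace = []
    decision_trial = None
    for t, r in enumerate(payoffs):
        confidence = (1.0 - eta) * confidence + kappa * r
        trace.append(confidence)
        if decision_trial is None and confidence >= threshold:
            decision_trial = t
    return decision_trial, trace

# Example: a sustained run of wins pushes the integrated signal over the threshold.
decision, trace = confidence_trajectory([0, 1, -1, 1, 1, 1, 1, 1, 1])
```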
Results
All data analysis was performed using MATLAB 7.3 (MathWorks Inc., USA).
Overall performance on games between groups
Patients tended to perform worse than controls in terms of total cumulative score (wins minus losses) at the end of a game (Fig. 3), across all three levels of difficulty (analysis of variance; game difficulty, total score, group; df = 1, F = 12.22, p < 0.0005). This difference in performance is not due to poor engagement with the task because, if this were the case, patients would have performed at the same poor standard irrespective of game difficulty (see online Supplementary Information for more detail). All participants learned the pattern quickly in easy games, less so in medium games and, as expected, very little learning occurred in the hard games.
Fig. 3. Average final scores on easy, medium and hard games. For each game, one point was gained when a subject won a trial and one point was deducted when they lost a trial; no points were gained or deducted for draws. This plot shows the final average scores across the difficulty level of each game (easy, medium and hard). Controls are shown with the black dashed line and patients with the gray solid line. Error bars ± 1 standard error.
Model fitting
Each fitted model was run with the estimated parameters to generate a predicted sequence of play for each game, which was compared with what should have been played to win on each trial and averaged. The model accurately predicted the behavior of participants and fitted both controls and patients equally well (mean log likelihood for controls across all games = 0.898, s.d. = 0.171; mean log likelihood for patients across all games = 0.936, s.d. = 0.180). In terms of the fitted model predicting the participant's actual actions on every trial, across every game, the fitted models performed best on the easy games (as did the participants), with a mean correct trial-by-trial model prediction of 0.707 (s.d. = 0.149) and 0.600 (s.d. = 0.203) in controls and patients respectively (where a score of 1.0 would indicate that the model correctly predicted every trial of every game). On medium games, the model for controls yielded a mean correct prediction of 0.583 (s.d. = 0.144) and for patients 0.603 (s.d. = 0.156). On hard games (i.e. close to random), control models performed at 0.495 (s.d. = 0.140) and patient models at 0.521 (s.d. = 0.142). Further analysis of the performance of the model is given in the Supplementary Information.
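As an illustration of how such parameters can be estimated (the original analysis was performed in MATLAB and the exact procedure is in the Supplementary Information), the following sketch fits α and β by minimizing the negative log likelihood of a participant's observed plays, reusing MOVES, BEATS and softmax from the earlier sketch:

```python
import math

def negative_log_likelihood(params, my_plays, opponent_plays):
    """Mean negative log likelihood of the observed plays under the
    leaky-integrator model sketched above (assumed, not the authors' code)."""
    alpha, beta = params
    utility = {m: 0.0 for m in MOVES}
    nll = 0.0
    for my_move, opp in zip(my_plays, opponent_plays):
        probs = dict(zip(MOVES, softmax([utility[m] for m in MOVES])))
        nll -= math.log(probs[my_move] + 1e-12)
        reward = 1.0 if BEATS[my_move] == opp else (-1.0 if BEATS[opp] == my_move else 0.0)
        prediction_error = reward - utility[my_move]
        for m in MOVES:
            utility[m] *= (1.0 - alpha)
        utility[my_move] += beta * prediction_error
    return nll / len(my_plays)

def fit_alpha_beta(my_plays, opponent_plays, step=0.05):
    """Coarse grid search over alpha, beta in [0, 1]; a gradient-based
    optimizer could equally be used."""
    grid = [i * step for i in range(int(1 / step) + 1)]
    return min(((a, b) for a in grid for b in grid),
               key=lambda p: negative_log_likelihood(p, my_plays, opponent_plays))
```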
Between-group differences in strategy
To assess how participants integrate new evidence with existing beliefs, the parameters α and β were averaged within group (controls, patients) as a function of game difficulty. An analysis of variance showed that, for α, there was an effect of group (patient, control; df = 1, F = 10.35, p < 0.002) and also of game difficulty (easy, medium, hard; df = 2, F = 12.96, p < 0.0001) but no interaction of group by difficulty (df = 2, F = 2.55, p = 0.08) (Fig. 4). However, there were no significant differences for β. This result suggests that controls and patients use new evidence to a similar degree but differ in how they use existing evidence to influence their strategy.
Fig. 4. Model parameters across easy, medium and hard games. For each level of difficulty, α models the decaying of prior evidence and β represents the weight given to new evidence by each subject. Controls are shown with the black dashed line and patients with the gray solid line. Error bars are ± 1 standard error. * Significant difference p < 0.0004. ** Significant difference p < 0.022.
Patients place less emphasis on expectations based on previous evidence (i.e. α tends to 1, and they discard previous accumulated evidence more quickly) than controls in the easy (one-tailed t test; t = –3.44, p < 0.0004; patients mean α = 0.44; controls mean α = 0.23) and medium games (one-tailed t test; t = −2.05, p < 0.022; patients mean α = 0.55; controls mean α = 0.43) but not on hard games (where outcomes are most unpredictable). This suggests that when meaningful regularities in the frequency of plays are evident in the opponent's play, the higher value of α causes the estimated utility of each play to fall more quickly in patients than in controls. Thus, patients are unable to temporally ‘link together’ events that represent reliable predictors of an opponent's play. Further analysis is presented in the Supplementary Information.
Correlation of parameters with PANSS items
For easy games (where the pattern of play was obvious and exploitable), there was a modest correlation between higher values of α and greater scores on the Delusions item of the PANSS questionnaire for patients: Spearman's ρ = 0.273, p = 0.045.
Between-group differences in metacognition
To qualitatively explore the decision to gamble on doubling pay-offs, games where no decision was made were discarded, leaving 116 and 100 games for controls and patients respectively. Patients exhibited increased confidence (i.e. by doubling pay-offs) earlier in the games than controls (Kolmogorov–Smirnov test; p < 0.0001; controls mean decision at trial 31.46, s.d. = 13.62; patients mean decision at trial 23.77, s.d. = 16.03).
By averaging the history of pay-offs (wins, losses and draws) over the trials directly preceding the decision to double the pay-off, it was found that participants generally experienced an ‘upswing’ of around 10 trials of positive reward (Fig. 5) before the trial on which they made the decision to double the pay-offs (see Supplementary Information for detailed analysis).
Fig. 5. Mean outcome (reward/pay-off) in trials preceding the decision to double pay-offs (confidence). The two histograms show a characteristic ‘upswing’ where mean pay-offs become increasingly positive (i.e. wins greater than losses) over a time window of approximately 10 trials before the decision was made to ‘double bets’.
To examine whether internally derived measures (i.e. prediction errors) or actual absolute rewards influence metacognitive assessment, we correlated the trial at which the decision to double bets was made with the average prediction error and the average absolute reward over the 10 trials preceding the decision. Control subjects showed no correlation with either measure (r² = 0.068, p > 0.05 and r² = 0.131, p > 0.05 respectively). In patients, only the 10-trial average of absolute reward was correlated with the trial at which the decision was made (r² = 0.447, p < 0.0001); the 10-trial mean prediction error was not. This suggests that a simple correlation-based explanation, in which an increasing trend of reward rather than punishment predicts confidence judgments, will not suffice.
A leaky integrator model was used, and the parameters η and κ were found by fitting the model to the data, minimizing a quadratic objective function of the trial at which the participant made their decision and the trial predicted by the model for a given estimate of η and κ (see Supplementary Information). Using the absolute pay-off (rather than the prediction error) accurately predicted the decision to double pay-offs, with a mean error of ±1.4 trials in controls and ±1.6 trials in patients. The model using the derived prediction error was much less accurate (±9.0 trials in controls and ±15.1 trials in patients).
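A corresponding minimal sketch of this fitting step, reusing confidence_trajectory from the earlier sketch and assuming a simple grid search, minimizes the squared difference between the observed and model-predicted decision trials:

```python
def fit_eta_kappa(payoffs, observed_decision_trial, step=0.05, penalty=1e6):
    """Grid search minimizing a quadratic error on the decision trial.

    Assumed objective: (predicted trial - observed trial)^2, with a large
    penalty when the model never crosses the confidence threshold.
    """
    grid = [i * step for i in range(1, int(1 / step) + 1)]  # parameter grid over (0, 1]
    best, best_err = None, float('inf')
    for eta in grid:
        for kappa in grid:
            predicted, _ = confidence_trajectory(payoffs, eta=eta, kappa=kappa)
            err = penalty if predicted is None else (predicted - observed_decision_trial) ** 2
            if err < best_err:
                best, best_err = (eta, kappa), err
    return best
```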
Of note, patients and controls did not differ in the amount by which they decay previous evidence (η) for the decision to double pay-offs but patients gave significantly more weight to new rewards (κ) (t test; one-tailed, patients > controls; p < 0.0008; patients mean κ = 0.51, controls mean κ = 0.36).
Discussion
We have shown two dissociable mechanisms at work during decision making in schizophrenia: one strategically evaluating evidence for, and deciding on, a specific action and another, metacognitive, mechanism that acts to assign confidence in the selected strategy. The results demonstrate that patients weight new evidence similarly to controls in deciding on their strategy, but they ‘leak’ prior evidence more rapidly (the α parameter), preventing efficient action selection because the temporal patterns from previous plays are not incorporated in decision making; as a result, they are less able to detect and exploit meaningful regularities in the opponent's play (Gray et al. 1991). This was particularly pronounced during the easy games, where the pattern of the opponent's play was more obvious.
Despite their less efficient opponent modeling and action selection strategy, patients still exhibit overconfidence in their strategy, choosing to increase the stakes in the game earlier (‘increasing the pay-off’) in the face of less objective evidence; this is driven by an overweighting of new evidence (the κ parameter) in the temporal sequence of absolute rewards.
In patients with schizophrenia, these factors could explain why psychotic beliefs are maintained and not extinguished in the face of contrary evidence. In everyday interactions with the world, there is a succession of incoming signals from the environment (i.e. rewards, feedback on performance or observations about other agents), some of which are noise (i.e. random fluctuations) whereas others represent meaningful regularities or associations between events. These incoming signals must be evaluated for their utility (expected value), temporally integrated when appropriate (for example, when these utilities are congruent with the consequences of actions in the environment) and discarded (leaked) when they represent meaningless coincidences. Furthermore, the metacognitive task of evaluating confidence in one's beliefs about strategy is skewed in favor of more recent events, and the confidence model fails to filter out sporadic random runs of success.
Theories of belief
Our account frames these cognitive processes (modeling the environment by detecting meaningful regularities in events, action selection and confidence) in a simple, parsimonious model that is computationally plausible and driven by the mechanisms by which neuronal populations integrate incoming signals in the temporal and spatial domains. The RPS game in this study naturally lends itself to a theory of belief representation in terms of mapping observable stimuli to actions through an internal representational scheme based on probabilistic representations. This is in contrast to formal epistemological theories of belief representation (Hintikka et al. 2005), where beliefs are defined in terms of theorem-based manipulations over symbolic propositions or predicates in the ‘language of thought’ (Fodor, 1998). For example, using Bratman's (1987) theory of practical reasoning, playing the RPS game would be formulated as inferences over sets such as (C,O)→A, where O enumerates the most recent play by the opponent, C is a finite set of contexts (e.g. enumerations over the set of opponent players such as an ‘easy’ or ‘difficult’ opponent) and A enumerates the plays available to the participant. This latter approach to modeling belief, particularly as applied to the dynamic processes underpinning belief and delusions, has yet to be evaluated. It can be argued that formal epistemological theories capture the explicitly ‘linguistic’ and propositional nature of belief (e.g. ‘The opponent is a cheat’), whereas our probabilistic approach represents an implicit, action-oriented and boundedly rational interpretation of ‘belief’. We propose that the dynamic, adaptive cognitive processes underpinning belief formation are best studied using such implicit, action-directed approaches.
Our model posits that this failure of modulation by context is a function of its ‘leaky’ component. If discrete units of evidence are allowed to ‘leak out’ too quickly, then information about meaningful temporal sequences will never be correctly associated together and no reliable evidence will be available to the higher levels of the hierarchy (i.e. those responsible for maintaining or abolishing beliefs, and the metacognitive systems that evaluate performance).
Theories of metacognition
Theories of metacognition span two axes: ‘monitoring/control’ versus ‘control/monitoring’ models (Koriat & Ackerman, 2009) and ‘information/theory-based’ versus ‘experience-based’ accounts (Koriat, 1997). A third position is represented by theory of mind (Koriat & Ackerman, 2009), where stored representations of mental state (combined with rules of inference relating these stored representations to observable behavior) allow an individual to predict others' intentions. Our model and experimental results suggest two parallel processes: the judgment of how well one is performing (evidenced by the metacognitive act of ‘doubling’ bets when confidence reaches a threshold) is better predicted by temporal changes in absolute reward than by measures derived from the ‘control’ (action selection) process.
When confidence is derived directly from the internal action-selecting mechanism (suggesting one integrated system), we are unable to predict controls' and patients' performance in increasing the stakes of the game by doubling the pay-off. However, for both groups, this performance can be predicted from the absolute reward/pay-off signals. The confidence process can be viewed within the standard actor-critic model, where one implicit mechanism is ‘fast and dirty’, driving trial-by-trial behavior, and another ‘critic’ evaluates the performance of this system (cf. the proposed sequential monitor/control or control/monitor; Koriat & Ackerman, 2009). Patients consistently made decisions earlier than controls, with more limited information, and in our model this was reflected by a higher weighting for new (more recent) rewards rather than previous history. This is analogous to the overattribution of evidence observed in the JTC bias in the beads task and similar experiments (Freeman et al. 2008; Woodward et al. 2009; Speechley et al. 2010), and is consistent with the observation that people with schizophrenia have poor self-assessment of their own performance and functional status (Bowie et al. 2007). In our task we attempted to study the dynamic process underlying the decision rather than manipulating experimental conditions that define likelihood (new evidence) and prior (old or accumulated evidence) probabilities. This probe of metacognitive ability emphasizes ‘output-bound’ performance (Koren et al. 2004, 2006), where self-monitoring, evaluation and commitment to one's own behavior are based on a model of the world (i.e. beliefs about interactions with other agents) and directed toward behavior. This is in contrast to what Koren et al. (2006) describe as ‘input-bound’ measures (such as in bead-counting tasks), where the problem is framed such that participants make assessments of input probabilities, forcing participants to commit to a response.
Relationship to monoamine theories of schizophrenia
The metacognitive results of our study and model are explained by the presynaptic hyperdopaminergic state found in schizophrenia (Howes et al. 2012), with an increased response to positive feedback (Pessiglione et al. 2006) that may drive abnormal salience responses in schizophrenia (Kapur, 2003; Kapur et al. 2005). However, striatal hyperdopaminergia alone cannot explain the findings in the patients' action-selection strategy. Our model predicts poorer temporal integration in people with schizophrenia. This can be explained by ‘context-processing’ deficits in schizophrenia (Barch & Ceaser, 2012). Here, ‘context’ refers to appropriate online maintenance of a representation of the opponent's probable play to enable selection of counter-plays. It also requires filtering of irrelevant stimuli (i.e. moves by the opponent that do not concord with the emerging dominance of their preferred move). Prefrontal D1 and D2 neurons have been proposed to operate in dual-state networks: when these networks are driven by D1 activity, for example by experimental D2 blockade (Mehta et al. 2004), stable working memory formation dominates, with irrelevant stimuli being filtered but with poor context-switching and response flexibility (Cools & D'Esposito, 2011). By contrast, D2-dominated activity favors flexible response selection while sacrificing filtering of temporally intervening irrelevant stimuli (Durstewitz & Seamans, 2008; Cools & D'Esposito, 2011).
Friston (2002) suggested that higher levels of a cortical hierarchy provide contextual guidance to lower levels of processing based on a prediction of inputs, and that these predictions are modified in the presence of a mismatch; this modification does not occur in patients with schizophrenia. This failure to modify prior belief in the presence of new evidence is supported by an extensive literature, for example on visual processing of hollow-mask illusions (Schneider et al. 2002) and its neural correlates (Dima et al. 2009). Other modalities, such as event-related potentials in processing discrepant information (Debruille et al. 2007), impaired stimulus evaluation (Doege et al. 2009) and our own work on predictive models distinguishing self and other (Shergill et al. 2005; Simons et al. 2010), suggest that top-down modulation of the integration of evidence is dysfunctional in schizophrenia.
In conclusion, patients with schizophrenia demonstrate metacognitive changes that lead them to a JTC bias, making decisions on the basis of insufficient evidence, driven by a selective overweighting of recent rewarding events rather than a carefully balanced assessment of recent successes and failures over time. These data support a model of psychotic symptoms, concordant with a hyperdopaminergic state, that gives rise to the salience of aberrant perceptions linked to abnormal beliefs. These may occur transiently in a significant proportion of the general population (Smeets et al. 2012). However, the reason these beliefs are not extinguished in the face of contrary information is that there is a failure of the normal mechanism for integrating evidence in the presence of meaningful, temporally ordered events; this deficit is compounded by changes in metacognitive processing that give an inappropriately high weighting to absolute rewards from the environment. Further work is required to disentangle the relative contributions of the neural networks responsible for the aberrant ‘leaking’ of evidence.
These data tentatively support current therapeutic approaches that encourage efficient decision making through cognitive remediation, where patients are encouraged to make explicit judgments during stepwise reasoning. Given that improvements in positive symptoms and in cognitive dysfunction each have good predictive value for long-term outcomes (Bowie et al. 2006), further translation of experimental cognitive approaches focusing on serial decision making is warranted.
Supplementary material
For supplementary material accompanying this paper, please visit http://dx.doi.org/10.1017/S0033291713000263.
Acknowledgments
S.S.S. was funded by a Medical Research Council (MRC) New Investigator Award.
Declaration of Interest
None.