Language is the backbone of interpersonal interaction and an essential part of human cognition: understanding or producing a sentence requires the coordination of many processes, from low-level perception to high-level social cognition. In schizophrenia, language dysfunction has long been noted (Bleuler, 1911/1950; Andreasen, 1979a, 1979b; Kuperberg, 2010a), and is most obviously seen in the disorganized (‘thought-disordered’) speech produced by some patients (Bleuler, 1911/1950; Andreasen, 1986). But abnormalities in language comprehension can also be detected in the absence of overt thought disorder (for reviews, see Kuperberg, 2010b; Brown and Kuperberg, 2015), and these can predict psychosocial function (e.g. Bowie and Harvey, 2008; Swaab et al., 2013; Holshausen et al., 2014). Understanding the basis of abnormal language processing in schizophrenia therefore has important implications for understanding the disorder's cognitive architecture more broadly, particularly the relationships between the perceptual and higher order disturbances that characterize it (Brown and Kuperberg, 2015). Moreover, the important role that language plays in social interaction suggests that understanding these linguistic abnormalities may shed light on the everyday social challenges faced by people with schizophrenia.
Abnormalities of language in schizophrenia have been described at multiple levels, including sentence and discourse processing (Cohen and Servan-Schreiber, 1992; Kuperberg et al., 1998; Ditman and Kuperberg, 2007; Boudewyn et al., 2012), pragmatic inferencing (Frith, 2004; Bambini et al., 2016), lexico-semantic associations (Spitzer et al., 1993; Mathalon et al., 2002; Minzenberg et al., 2002; Titone and Levy, 2004; Elvevåg et al., 2007; Kreher et al., 2009), phonology and orthography (Whitford et al., 2013; Revheim et al., 2014; Whitford et al., 2017), and prosody (Kantrowitz et al., 2014). While higher and lower level language abnormalities in schizophrenia have usually been discussed independently, some researchers have proposed that they are linked, and two major theories describe the nature of this link.
The first, ‘bottom-up’ theory proposes that lower level impairments cascade upward to cause higher level language abnormalities in schizophrenia. This proposal assumes that the primary locus of linguistic dysfunction lies in the perception of lower level information (such as speech sounds or early visual representations) and its propagation up the linguistic hierarchy, driving abnormalities at higher levels of representation, such as the interpretation of a sentence's meaning (Leitman et al., 2005; Javitt, 2009; Jahshan et al., 2013; Kantrowitz et al., 2014; Revheim et al., 2014; Javitt and Freedman, 2015).
The second, ‘top-down interactive’ theory proposes that linguistic abnormalities in schizophrenia stem from disruptions of the fast interactions between higher and lower level representations as language is comprehended. This theory (see Brown and Kuperberg, 2015, for a recent review) is based on models of typical language processing that posit constant communication between higher and lower level representations (McClelland and Rumelhart, 1981; Rumelhart and McClelland, 1982; Tanenhaus et al., 1995; Elman et al., 2004), an idea that is echoed in more general cognitive models of schizophrenia (e.g. Cohen and Servan-Schreiber, 1992). For example, probabilistic predictive frameworks propose a crucial role for top-down inputs from higher level representations in constraining activity at lower levels (Brown and Kuperberg, 2015; Kuperberg and Jaeger, 2016). If these predictive interactions are disrupted in schizophrenia, the result would be unconstrained bottom-up activity (Corlett et al., 2009; Fletcher and Frith, 2009), and thus abnormal patterns of language processing (Brown and Kuperberg, 2015).
Although these two theories appear distinct, they have proven difficult to disentangle (see Brown and Kuperberg, 2015). For example, some researchers have taken correlations between lower and higher level language abnormalities in schizophrenia as evidence for the first theory (Leitman et al., 2005; Jahshan et al., 2013; Kantrowitz et al., 2014), but these data are equally well explained by the second, since disrupted interactions between levels would also be expected to produce correlated abnormalities at both. Conversely, others have taken impairments in patients’ use of higher level discourse representations, alongside preserved sensitivity to simple lexico-semantic associations (Titone et al., 2000; Kuperberg et al., 2006; Ditman and Kuperberg, 2007; Swaab et al., 2013; and see Kuperberg, 2010b, for a review), as support for the second theory. However, because language comprehension is highly incremental, with each incoming word integrated into a high-level discourse representation in real time, apparent impairments in using higher level discourse context could actually arise from difficulty building this context in the first place, due to impaired lower level processing.
The present study was designed to distinguish between these two theories by examining how people with schizophrenia interpret ambiguous sentences. Ambiguity resolution is a critical component of everyday language comprehension: to understand a sentence, listeners constantly have to resolve a series of ambiguous sounds, words, and meanings. Here, we focused on one particularly common type of ambiguity – syntactic ambiguities such as ‘wave to the man with the flag’, where the flag could be held by the man or by the waver. Syntactic ambiguity resolution provides an ideal test case for understanding the effects of bottom-up and top-down interactive processes, because syntax is often assumed to lie at an intermediate level on the linguistic hierarchy. On this view, it lies above lower level representations such as prosody and lexical information, which are therefore said to interact with syntax in a bottom-up fashion, and below higher level representations such as discourse and pragmatics, which are said to interact with syntax in a top-down fashion (see Table 1 for definitions). We therefore asked how people with schizophrenia used two types of lower level information (prosodic and lexical) in a bottom-up fashion, and two types of higher level information (pragmatic and discourse) in a top-down fashion, to resolve syntactic ambiguities and hence interpret sentences.
Table 1. Definitions of terms and summary of manipulations

To do this, we used the visual-world eye-tracking method, a well-established and well-validated psycholinguistic technique that has become a ubiquitous tool for studying the time course of spoken language comprehension (Tanenhaus et al., 1995; Tanenhaus and Trueswell, 2006). Visual-world eye tracking has not previously been used to study schizophrenia, yet it is particularly well suited for this purpose because it provides a naturalistic and minimally demanding experimental analogue to everyday communication. In our paradigm, participants interacted with a set of real-world objects placed in front of them (following Sedivy et al., 1999; Tanenhaus et al., 1995; Keysar et al., 2000; for work validating this paradigm in populations other than typical adults, see Trueswell et al., 1999; Snedeker and Trueswell, 2004; Snedeker and Yuan, 2008; Huang and Snedeker, 2009a, 2009b; Diehl et al., 2015; Gambi et al., 2016). For example, participants might see (1) a toy frog holding a small feather, (2) a large feather, (3) a toy cat holding a small flower, and (4) a large flower (see Fig. 1). They then heard spoken instructions telling them how to manipulate these objects, e.g. ‘Poke the frog with the feather’.
Although this instruction appears simple, it is actually syntactically ambiguous: it can be interpreted either as an instruction to use the large feather as an ‘instrument’ to poke the frog (the so-called instrument interpretation), or as an instruction to use one's own finger to poke the frog that is holding the small feather. Importantly, there are no ‘correct’ responses to an instruction like this: its interpretation depends on how the syntactic ambiguity is resolved, which, in turn, depends on whether and when participants use different types of informational cues within the context. As participants listen to such instructions, their use of different types of cues can be inferred from the pattern of their eye movements to the objects as the spoken input unfolds. For example, if participants infer an instrument interpretation, then they should be more likely to gaze toward the large feather (i.e. the instrument) when they hear the word ‘feather’. Critically, the oculomotor processes measured in the visual-world paradigm (i.e. patterns of saccadic eye movements and fixations) are unlikely to be impaired in schizophrenia: although so-called ‘smooth pursuit’ eye movements are abnormal in the disorder (Iacono, 1981), there is little evidence that oculomotor deficits affect patients’ saccades (Whitford et al., 2013).

Fig. 1. Illustration of the experimental setup used. Left: an action performed with the target instrument. Right: an action performed without the target instrument. TI, target instrument; TA, target animal; DI, distractor instrument; DA, distractor animal.
To assess how participants used lower and higher level information to interpret these syntactically ambiguous spoken sentences, we separately manipulated four features of the linguistic and non-linguistic input: two lower level cues (prosodic phrasing, see Snedeker and Yuan, 2008; and semantic–thematic verb constraints, see Snedeker and Trueswell, 2004), and two higher level cues (pragmatically relevant visual context, see Tanenhaus et al., 1995; and conversational discourse information, see Rabagliati et al., 2014). These manipulations are described, together with definitions and examples, in Table 1. By examining how these cues affected eye movements, we were able to distinguish between the two theories outlined above. The bottom-up theory would predict reduced looks to the instrument in the schizophrenia group whenever cues bias toward the instrument interpretation, whether those cues are lower or higher level. The top-down interactive theory, however, would predict reduced looks to the instrument in the schizophrenia group only when higher level cues bias toward this interpretation.
In addition to examining eye movements while participants listened to the sentences, we also examined participants’ final actions, which reflect their final interpretations of the sentences. Some previous studies have found that, even though people with schizophrenia can struggle to use different types of cues while language is rapidly unfolding, they can, given enough time, still use such cues to arrive at interpretations similar to those of healthy controls (Ditman and Kuperberg, 2007; Kuperberg et al., 2018). If this were the case in the present study, then people with schizophrenia and healthy controls might show the same pattern of final actions, even if they showed different patterns of eye movements. Given the very fast pace of real-world conversation, such a finding would have important psychosocial implications for understanding why some people with schizophrenia struggle with day-to-day social communication.
Methods and materials
Participants
Twenty-four stable outpatients (three female) were recruited from the Lindemann Mental Health Center, Boston. All met DSM-IV-TR criteria for schizophrenia or schizoaffective disorder, confirmed using the Structured Clinical Interview for DSM-IV-TR Axis I Disorders (First et al., 2002b). Twenty-two were taking stable doses of antipsychotic medication (19 atypical; three typical) and two were unmedicated. Symptoms were assessed using the Scale for the Assessment of Positive Symptoms (SAPS; Andreasen, 1984b) and the Scale for the Assessment of Negative Symptoms (SANS; Andreasen, 1984a), either on the day of testing (20 participants) or within 60 days of it (four participants; see Table 2). Twenty-four demographically matched controls (three female) were recruited by advertisement. Control participants were not taking psychoactive medication and were screened to exclude psychiatric and neurological disorders or substance abuse/dependence (First et al., 2002a).
Table 2. Demographic, medication and symptom measures

Means are shown with standard deviations in parentheses.
a Premorbid IQ was assessed using the North American Adult Reading Test (NAART; Blair and Spreen, 1989).
b Parental socio-economic status (SES) was calculated using the Hollingshead Index (Hollingshead, 1965). One control and one patient did not provide parental occupation.
c Chlorpromazine (CPZ) equivalents were calculated following the International Consensus Study of Antipsychotic Dosing (Gardner et al., 2010).
d SAPS: Scale for the Assessment of Positive Symptoms (Andreasen, 1984b); SANS: Scale for the Assessment of Negative Symptoms (Andreasen, 1984a). SAPS and SANS scores shown are summary scores (sum of the global ratings).
All participants were native English speakers. This study was carried out with the explicit review and approval of the Partners Human Research Committee and Massachusetts General Hospital IRB (#2010P001683) and Tufts Health Sciences Institutional Review Board (#5110). Participants gave written informed consent and were compensated for taking part in the study in accordance with the approved IRB protocols.
General procedures
Each participant was tested on three similar experimental tasks examining their use of prosodic phrasing (task 1), the semantic–thematic constraints of the verb (task 2), pragmatically relevant visual context (also in task 2), and conversational discourse context (task 3). Participants completed the tasks in one of two orders, with task 2 always second.
We used a ‘looking while listening’ variant of the visual-world paradigm in which participants’ eye movements were remotely monitored via video camera and then hand coded (Snedeker and Trueswell, 2004; Snedeker and Yuan, 2008). Participants sat in front of a sloped shelf containing four small platforms (see Fig. 1). On every trial, an experimenter placed four different objects on the platforms and named them. These were: (a) the target animal: a toy animal holding a small object (e.g. a toy frog holding a small feather); (b) the target instrument: a larger object (e.g. a large feather that can be used for poking); (c) the distractor animal: another toy animal, either of the same or different type as the target animal, holding a different small object (e.g. a different toy frog or a toy cat holding a small flower); and (d) the distractor instrument: a different large object (e.g. a large flower).
Participants heard spoken instructions over a loudspeaker (pre-recorded by an unfamiliar female American English speaker). A video camera, embedded in the shelf, recorded the participant's face at 30 frames per second as she/he listened to the instructions; this video was later used to code gaze fixations (see online Supplementary Materials for full details). A second camera, behind the participant's shoulder, recorded their final actions. Participants were told the purpose of each camera, and that the study was part of a larger project assessing language in children and adults, which explained the somewhat ‘silly’ nature of the instructions.
Each trial used different combinations of animals and instruments. Positions were counterbalanced across trials to avoid learned associations between particular objects and locations. Experimental trials were interspersed with filler trials using a variety of linguistic constructions, animals, and instruments.
Task 1: use of prosodic phrasing
Following the design of Snedeker and Yuan (2008), we varied the placement of pauses in the experimental instructions to produce a bias toward the target instrument in four experimental trials (e.g. ‘You can poke the frog…with the feather’) and a bias against the target instrument in the remaining four (e.g. ‘You can poke…the frog with the feather’). Trials were blocked, such that all four trials from one condition preceded those from the other, and experimental trials were interspersed among 20 filler trials. Scenes always contained animals of different types (e.g. a frog holding a feather and a cat holding a flower).
Task 2: use of the verb's semantic–thematic constraints and pragmatically relevant visual information
Following the design of Snedeker and Trueswell (2004), we varied the verb used in the spoken instruction. Eight experimental trials contained verbs that were independently rated (as described by Snedeker and Trueswell, 2004) to probabilistically bias participants toward carrying out an action with an instrument (e.g. ‘poke the frog with the feather’), and eight trials contained verbs like ‘sing’ that bias participants against using the instrument (e.g. ‘sing to the frog with the funnel’). These instructions did not contain any prosodic pauses.
Instructions were crossed with a manipulation of pragmatically relevant visual information. Specifically, we varied the number of potential animal referents of a particular type within the visual scene (Tanenhaus et al., 1995; Dahan and Tanenhaus, 2004; Snedeker and Trueswell, 2004). In eight trials, the scene contained two animals of different types (e.g. a frog and a cat), while in the remaining eight trials, it contained two animals of the same type (e.g. a frog holding a small feather and another frog holding a small flower). This manipulation works because the latter scene biases comprehenders away from the instrument interpretation: on hearing ‘poke the frog with the feather’, they tend to infer that ‘with the feather’ specifies which of the two frogs should be poked. Experimental trials were randomly interspersed among 32 filler trials.
Task 3: use of conversational discourse information
Each of the eight experimental trials was preceded by a question, asked by a male speaker. In four trials, the question biased participants toward using the target instrument (e.g. Question: ‘What should we do to a frog?’ Answer: ‘Poke the frog with the feather’), and in the remaining four trials, the question biased them against using the target instrument (e.g. Question: ‘Which frog should we play with now?’ Answer: ‘Poke the frog with the feather’). All experimental trials contained two animals of the same type (e.g. a frog holding a feather and a frog holding a spoon). Experimental trials were blocked and interspersed among 20 filler trials.
Analysis
Analysis of eye movements
On each trial, hypothesis-blind research assistants used the video to code the direction of each participant's gaze relative to the location of each object on that trial (see online Supplementary Materials for full details).
We conducted a pre-planned ‘time-window’ analysis of the eye movements. This analysis focused on whether participants looked at the target instrument (e.g. the large feather) at any point within each of two time windows following the onset of each instruction's final word (feather): from 200 to 699 ms and from 700 to 1199 ms. These time windows were selected a priori: they are the same as those analyzed by Snedeker and Trueswell (2004) and Diehl et al. (2015), who used a similar paradigm to assess syntactic ambiguity resolution in healthy adults, adolescents with autism spectrum disorder, and young children. We chose this approach over alternatives such as growth curve analysis (Mirman et al., 2008) in part because recent work (Huang et al., 2017) has suggested that the latter analyses can produce a high rate of false positives, a finding that we confirmed with our own simulations on the present dataset. In contrast, the time-window analysis we adopt here both implements strong a priori hypotheses and accurately reflects many of the temporal properties of gaze behavior, including the fact that fixations typically last for many hundreds of milliseconds.
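The time-window coding described above can be illustrated with a minimal Python sketch. It assumes that gaze has already been hand coded frame by frame from the 30 frames-per-second video, using the object labels from Fig. 1 (TI, TA, DI, DA) plus hypothetical ‘center’ and ‘away’ codes, and that the frame in which the final word begins is known. The function names and data layout are illustrative assumptions, not the authors' coding pipeline. A trial scores 1 for a window if any frame starting inside that window is coded as the target instrument.

```python
FPS = 30  # video frame rate used for hand coding (frames per second)

# A priori windows, in ms relative to the onset of the instruction's final word.
WINDOWS = {"early": (200, 699), "late": (700, 1199)}


def frame_time_ms(frame_index: int, onset_frame: int) -> float:
    """Start time of a frame in ms, relative to the frame in which the final word begins."""
    return (frame_index - onset_frame) * 1000 / FPS


def fixated_in_window(frame_codes: list[str], onset_frame: int,
                      window: tuple[int, int], target: str = "TI") -> bool:
    """True if any frame starting inside the (inclusive) window is coded as the target."""
    start_ms, end_ms = window
    return any(
        start_ms <= frame_time_ms(i, onset_frame) <= end_ms and code == target
        for i, code in enumerate(frame_codes)
    )


def code_trial(frame_codes: list[str], onset_frame: int) -> dict[str, int]:
    """Binary outcome per window (1 = looked at the target instrument at least once)."""
    return {name: int(fixated_in_window(frame_codes, onset_frame, w))
            for name, w in WINDOWS.items()}


# Hypothetical trial: central fixation, then the target animal, then the target instrument.
frames = ["center"] * 15 + ["TA"] * 12 + ["TI"] * 13
print(code_trial(frames, onset_frame=5))  # {'early': 0, 'late': 1}
```

In this hypothetical trial the participant first looks at the target animal and only reaches the target instrument about 730 ms after word onset, so the trial counts as a look in the late window only. These per-trial binary outcomes are the kind of dependent variable entered into the logistic regressions.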
Analyses were carried out using mixed-effects logistic regressions fit with the lme4 package (version 1.1; Bates et al., 2015) in R (R Core Team, 2016). We used logistic rather than linear regression because our dependent variable was binary: whether a participant fixated the target instrument during each time window, or looked elsewhere (collapsing across looks to one of the other quadrants, to the central fixation point, or off the stage altogether). The logit link function of logistic regression therefore provides a more accurate model of these data and better accounts for floor and ceiling effects.
We structured the predictors in our regressions to make them maximally comparable to an analysis of variance. For each task and group, we crossed the factors information bias (cues biasing toward or away from the instrument interpretation) and time window (early or late). In all analyses, subjects were treated as a random effect. In task 2, where trials were randomly ordered, the effect of information bias was also allowed to vary randomly across subjects. In tasks 1 and 3, where trials were blocked, information bias was treated only as a fixed effect, because many subjects perseverated on one interpretation (and thus effects could be clearly seen between subjects). Time window was allowed to vary within subjects. Then, to determine whether the effects of information bias differed significantly between the control and schizophrenia groups, we also carried out between-group analyses in which we crossed group (controls or patients) with information bias and time window.
To assess the significance of all main effects and interactions involving fixed factors, we used Wald tests. We report results for key regression coefficients in the main text; for full regression model results, see https://osf.io/bdkpy/.
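The Wald statistics reported in the Results relate to each coefficient in a simple way: z is the estimate divided by its standard error, the two-sided p value comes from the standard normal distribution, and a Wald 95% CI is the estimate ± 1.96 standard errors. The minimal Python sketch below (standard library only; the function name `wald_summary` is hypothetical) recomputes these quantities from a reported β and s.e., and also exponentiates β to give an odds ratio. It is an illustration only, not the authors' analysis code. How the odds ratio maps onto the toward/against contrast depends on how information bias was coded, which the text does not state.

```python
import math


def wald_summary(beta: float, se: float, z_crit: float = 1.96) -> dict:
    """Wald z, two-sided p, 95% CI and odds ratio for one logistic-regression coefficient."""
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p under the standard normal
    lo, hi = beta - z_crit * se, beta + z_crit * se
    return {
        "z": z,
        "p": p,
        "ci": (lo, hi),
        # exp(beta): multiplicative change in the odds per one-unit change in the predictor
        "odds_ratio": math.exp(beta),
        "or_ci": (math.exp(lo), math.exp(hi)),
    }


# Prosody effect on controls' eye movements, as reported: beta = -0.80, s.e. = 0.13
res = wald_summary(-0.80, 0.13)
print(f"z = {res['z']:.2f}, CI {res['ci'][0]:.2f} to {res['ci'][1]:.2f}, "
      f"OR = {res['odds_ratio']:.2f}")  # z = -6.15, CI -1.05 to -0.55, OR = 0.45
```

The recomputed CI matches the reported one (−1.05 to −0.55). The recomputed |z| (6.15) differs slightly from the reported 6.3 because β and s.e. are rounded in the text. If the bias contrast spans one unit, the odds of looking at the instrument would be about 1/0.45 ≈ 2.2 times higher when prosody biased toward the instrument interpretation. This reading is an assumption about the coding scheme.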
Analysis of final actions
Hypothesis-blind research assistants coded whether or not participants used the target instrument as they acted out each instruction. This indicated whether participants ultimately adopted an ‘instrument’ interpretation of the instruction. Participants’ actions were then analyzed using logistic regressions. For each task, we crossed the factors information bias (cues biasing for or against using the target instrument) and group (controls or patients). Random effects were treated as above. The full results of all models are available at https://osf.io/bdkpy/.
Results
Analysis of online processing (eye movements)
Effects of prosodic phrasing and verb semantic–thematic constraints
The eye movements of control participants and people with schizophrenia were affected by both prosodic phrasing (Fig. 2A) and the verb's semantic–thematic constraints (Fig. 2B): both groups appeared to look more often at the instrument when these bottom-up cues suggested that they should do so (see Table 3 for descriptive statistics).

Fig. 2. How participants’ eye movements and final actions were affected by lower and higher level information. (a) Use of prosodic phrasing, (b) use of lexical information, (c) use of pragmatically relevant visual information, (d) use of conversational discourse information. Graphs show proportion of trials on which controls (left panel) and patients (middle panel) fixated on the target instrument within the early and late time windows, both when information biased toward and against the instrument interpretation. Lines are loess smoothers; shaded ribbons indicate 95% CI. Right panels show participants’ final actions. Error bars represent ±1 standard error of the mean. Online Supplementary Materials show eye movements to each object over time.
Table 3. Mean proportion of trials on which participants fixated the target instrument (early and late time windows) or used the target instrument to carry out their final actions, depending on whether the different experimental manipulations biased toward or against the instrument interpretation. Standard deviations are in parentheses

Logistic regressions confirmed these patterns. In controls, there was a significant effect of prosodic phrasing on eye movements [β = −0.80 (s.e. = 0.13), CI −1.05 to −0.55, Wald's z = 6.3, p < 0.001]: when prosody biased toward the instrument interpretation, the odds of gazing at the target instrument were significantly higher than when it biased against the instrument interpretation. In people with schizophrenia, the effect was also significant [β = −0.74 (0.16), CI −1.05 to −0.43, Wald's z = 4.7, p < 0.001]: people in this group were also more likely to gaze at the target instrument when prosody biased toward this interpretation. A between-group comparison confirmed that the size of the prosody effect did not differ significantly between controls and people with schizophrenia (no significant interaction between information bias and group, β = 0.11 (0.20), CI −0.28 to 0.50, Wald's z = 0.53, p = 0.59).
Similarly, in both the control and schizophrenia groups, there were significant effects of the verb's semantic–thematic constraints. The control group looked significantly more at the target instrument when the verb was biased toward this interpretation [β = −0.92 (0.16), CI −1.23 to −0.60, Wald's z = 5.7, p < 0.001], and the same was true for people with schizophrenia [β = −0.84 (0.19), CI −1.20 to −0.47, Wald's z = 4.5, p < 0.001]. Once again, this effect did not differ significantly between the two groups [β = −0.07 (0.10), CI −0.14 to 0.28, Wald's z = 0.68, p = 0.49].
Effects of pragmatically relevant visual information and conversational discourse information
In contrast to the lower level cues, the effects of both pragmatically relevant visual information (Fig. 2C) and conversational discourse information (Fig. 2D) on eye movements appeared to differ between the control and schizophrenia groups (see Table 3 for descriptive statistics). Whereas controls looked more often at the target instrument when either of these higher level cues suggested that they should do so, people with schizophrenia did not appear to show such robust effects.
Logistic regressions confirmed these observations. In controls, the effect of pragmatically relevant visual context was significant [β = −0.39 (0.16), CI −0.71 to −0.07, Wald's z = 2.4, p = 0.02]: when visual context biased toward the instrument interpretation, controls were more likely to gaze at the target instrument. In people with schizophrenia, however, the effect was not significant [β = 0.10 (0.13), CI −0.17 to 0.36, Wald's z = 0.72, p = 0.47]: visual context did not significantly affect their gaze to the target instrument. The between-group analysis confirmed that visual context had a significantly greater effect on controls than on people with schizophrenia [significant interaction between information bias and group, β = 0.21 (0.10), CI 0.02–0.40, Wald's z = 2.1, p = 0.03].
Similarly, conversational discourse information significantly affected the eye movements of control participants [β = −0.58 (0.15), CI −0.88 to −0.28, Wald's z = 3.8, p < 0.001]; they were significantly more likely to gaze at the target instrument when the prior question was biased toward this instrument interpretation. In contrast, conversational discourse did not have a significant effect on the eye movements of people with schizophrenia [β = −0.1 (0.14), CI −0.37 to 0.16, Wald's z = 0.76, p = 0.45]. Once again, the between-group analysis confirmed that the conversational discourse information had a significantly greater effect in controls than in people with schizophrenia [significant interaction between information bias and group, β = 0.45 (0.20), CI 0.05–0.84, Wald's z = 2.2, p = 0.03].
We also carried out exploratory correlational analyses between patterns of eye movements and clinical variables within the schizophrenia group. These are reported in online Supplementary Material.
Analysis of final interpretations (final actions)
Both groups of participants made similar use of bottom-up prosodic phrasing and semantic–thematic constraints to inform their final actions (see Fig. 2 and Table 3 for descriptive statistics). Logistic regressions confirmed this pattern. When either of these bottom-up cues biased toward the target instrument, both control participants and people with schizophrenia were significantly more likely to use the target instrument to carry out their final actions than when the cue biased against it. This held for both prosodic phrasing [controls: β = −1.1 (0.21), CI −1.48 to −0.64, Wald's z = 4.9, p < 0.001; people with schizophrenia: β = −0.94 (0.18), CI −1.29 to −0.58, Wald's z = 5.2, p < 0.001] and the verb's semantic–thematic constraints [controls: β = −1.19 (0.20), CI −1.58 to −0.81, Wald's z = 6.0, p < 0.001; people with schizophrenia: β = −2.5 (0.67), CI −3.81 to −1.19, Wald's z = 3.74, p < 0.001]. Between-group analyses revealed no significant differences between the two groups in how these two types of bottom-up information influenced their final actions (no significant interaction between information bias and group for prosodic phrasing: β = −0.04 (0.13), CI −0.30 to 0.21, Wald's z = 0.32, p = 0.75; or for semantic–thematic constraints: β = −0.19 (0.15), CI −0.53 to 0.14, Wald's z = 1.1, p = 0.25).
The pattern for conversational discourse was similar (Fig. 2D and Table 3). Both groups used this information to inform their final actions [controls: β = −0.42 (0.20), CI −0.82 to −0.02, Wald's z = 2.1, p = 0.04; people with schizophrenia: β = −0.58 (0.20), CI −0.99 to −0.18, Wald's z = 2.9, p = 0.004] and there was no significant difference between the two groups [no significant interaction between information bias and group, β = −0.09 (0.14), CI −0.37 to 0.19, Wald's z = 0.62, p = 0.54]. Interestingly, despite showing an effect on controls’ eye movements (see above), pragmatically relevant visual context (Fig. 2C and Table 3) did not significantly affect controls’ final actions [β = −0.24 (0.17), CI −0.56 to 0.09, Wald's z = 1.4, p = 0.16] (see Footnote 1). It also did not significantly affect patients’ final actions [β = −0.10 (0.22), CI −0.54 to 0.33, Wald's z = 0.45, p = 0.65], and there was no between-group difference in these effects [no significant interaction between information bias and group, β = 0.04 (0.12), CI −0.19 to 0.27, Wald's z = 0.32, p = 0.75].
Discussion
This study used the visual-world eye-tracking paradigm to compare how people with schizophrenia and demographically matched healthy controls use two types of lower level information (prosodic and lexical representations) and two types of higher level information (pragmatic and discourse representations) to guide syntactic processing during naturalistic spoken language comprehension. We found a dissociation in how the groups use these different types of cues as language is processed. In both groups, eye movements were robustly affected by a sentence's prosodic phrasing, as well as by the lexical constraints of its verb, suggesting that these lower level cues quickly biased syntactic processing to influence interpretation. However, in comparison with healthy controls, higher level cues – pragmatically relevant visual information and conversational discourse information – had a significantly reduced effect on the eye movements of people with schizophrenia, suggesting that they did not use these cues to immediately bias syntactic processing and sentence interpretation. Despite these differences in online processing, the two groups did ultimately reach the same interpretations, as reflected by their final actions.
These findings suggest that people with schizophrenia are impaired in their ability to predictively use higher level information in a highly interactive top-down fashion to inform the immediate processing and interpretation of incoming information. Importantly, this cannot easily be explained by a more general cognitive deficit. Such general deficits can sometimes lead to the artificial appearance of a differential deficit because of task demands or performance at ceiling or floor (see Chapman and Chapman, 1973; Gold and Dickinson, 2012). However, our eye-tracking paradigm posed essentially no task demands (participants only needed to interpret simple sentences that had no ‘correct’ interpretation; see Footnote 2), and performance was never at either ceiling or floor in our key measures.
Our findings go beyond prior work in several ways. The demonstration of a dissociation between the use of higher and lower level information to process the syntactic structure of an entire sentence extends previous findings reporting similar dissociations between the effects of higher level discourse and lower level lexical information on semantic processing of individual words within sentences (Titone et al., 2000; Sitnikova et al., 2002; Kuperberg et al., 2006; Ditman et al., 2011; Swaab et al., 2013). Our findings also show that this dissociation extends across multiple different higher and lower level information sources. Specifically, the same people with schizophrenia who were able to use lower level lexical information to modulate syntactic processing during real-time comprehension were also able to use lower level prosodic phrasing, and the same people with schizophrenia who were impaired in their use of higher level conversational discourse context were also impaired in their use of higher level pragmatically relevant visual information. This substantially bolsters claims for a selective impairment of top-down interactive processing in schizophrenia.
Our finding that people with schizophrenia were impaired in their use of non-verbal pragmatic information (i.e. relevant information within the surrounding visual scene) is consistent with other evidence of pragmatic communicative difficulties in schizophrenia (e.g. Harrow et al., 1989; Meilijson et al., 2004; Colle et al., 2013; Bambini et al., 2016; Pawełczyk et al., 2017), which may be related to more general theory of mind deficits (Frith, 2004; but see McCabe et al., 2004). This finding also speaks to the precise role of working memory in language processing: given that participants could always see the visual scene in front of them, the relative insensitivity to this type of information in the schizophrenia group implies that high-level impairments are not solely due to problems in maintaining or manipulating higher level linguistic information over time within working memory. Rather, it suggests a more specific impairment in the top-down use of goal-relevant information to constrain processing, which may be dissociable from simple maintenance demands in schizophrenia (e.g. see Kim et al., 2004; Barch and Smith, 2008, for discussion).
The key features of our study – its naturalistic methodology and broad exploration of linguistic context – license a number of novel conclusions. However, it is important to note how inferences from these data should be constrained. For example, one strength of our study was that the same participants completed multiple different tasks, permitting conclusions about patterns of strength and weakness. However, our sample size was comparatively small. This, along with the relatively small proportion of female participants, should be borne in mind when considering the generalizability of our findings, particularly with respect to whether this pattern of results is a stable feature of schizophrenia or whether it evolves over the course of the disorder or through its pharmacological treatment. While we did not find correlations between performance and either age or medication (see online Supplementary Material), a definitive answer to this question would require a larger sample size and, ideally, longitudinal data. It will also be important to determine whether a similar dissociation is evident in people at high risk for developing schizophrenia.
Our main finding – eye-movement evidence that individuals with schizophrenia are selectively impaired in their use of higher level information to predictively and interactively influence processing of bottom-up linguistic input – is consistent with more general frameworks proposing that a breakdown of predictive mechanisms can explain multiple aspects of the schizophrenia syndrome (Corlett et al., 2009; Fletcher and Frith, 2009; Corlett et al., 2010; Adams et al., 2013). Importantly, however, these frameworks do not imply that higher level representations are inherently abnormal or that they cannot be used at all in schizophrenia. Rather, they emphasize a disturbance in the connections that allow inputs from higher levels of representation to rapidly and predictively influence processing at intermediate levels of representation, thereby constraining activity from lower levels of representation as they become available (Brown and Kuperberg, 2015). Such fast, online predictive processes are thought to play a critical role in allowing language to be understood quickly and accurately in healthy individuals (Kuperberg and Jaeger, 2016).
Our focus on top-down connections should also not be taken to imply that lower level perceptual processing is never impaired in schizophrenia, as disturbances in acoustic or lexical processing are well-attested (Cienfuegos et al., 1999; Kasai et al., 2002; Javitt and Freedman, 2015). However, our findings raise the interesting possibility that apparent low-level perceptual disturbances may stem from disturbances in top-down predictions (Hemsley, 1993; Silverstein et al., 1996; Silverstein et al., 2006; Ford and Mathalon, 2012; see Brown and Kuperberg, 2015, for discussion). Given the close relationship between prediction and learning in both linguistic (Dell and Chang, 2014; Rabagliati et al., 2014; Kleinschmidt and Jaeger, 2015) and non-linguistic (Rescorla, 1988) domains, a breakdown in top-down interactions might even cause lower level representations to develop abnormally (Adcock et al., 2009; Brown and Kuperberg, 2015). Future longitudinal work will be necessary to understand the developmental relationship between predictive processing based on higher level representations and low-level perceptual processing in schizophrenia.
Finally, our finding that patients were impaired in their use of higher level cues in our naturalistic task has potential implications for understanding the use of spoken language in real-world contexts in schizophrenia. For example, the predictive use of higher level information plays a vital role in allowing smooth turn-taking during everyday conversational interactions (de Ruiter et al., 2006; Magyari and de Ruiter, 2012). It also ensures that language comprehension is fast and accurate in noisy or challenging environments, such as when listening to announcements on public transport or attending to one speaker amongst many in social contexts. Our data shed light on why real-world communication situations like these may present important challenges in schizophrenia (Brown and Kuperberg, 2015). In addition, our finding that, given enough time, patients were able to use these top-down cues to inform their final interpretations (see also Ditman and Kuperberg, 2007; Kuperberg et al., 2018) suggests that, despite such challenges, language deficits may not necessarily be detected by traditional ‘off-line’ assessment tools. We suggest that visual-world eye tracking offers a naturalistic yet well-controlled method that is well suited to studying these real-world communication issues in schizophrenia.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291718001952
Acknowledgements
This work was funded by the National Institute of Mental Health (R01MH071635 to G.R.K.), by the Economic and Social Research Council (ES/L01064X/1 to H.R.), by the National Science Foundation (BCS-0921012 to J.S.) and by a fellowship from the Harvard Mind, Brain, Behavior Initiative (to H.R., G.K., and J.S.). The authors are grateful to Donald Goff, Leah Briggs, and Claire Oppenheim for supporting patient recruitment, to Paul Mains, Gianna Wilkie, Dan Kim, Kristina Fanucci, and Margarita Zeitlin for supporting recruitment and testing, and to Meredith Brown for her comments on the manuscript.