1. Introduction
Though sentence meaning is computed rapidly, contextual information may contribute to the semantic evaluation of an utterance, thus leading to processing difficulties when the two domains are not in accord. Therefore, the ultimate interpretation of a sentence is not only determined by the meanings of its words, but also by extra-sentential cues. For instance, sentences containing quantifiers, as in (1), may be interpreted differently depending on the location where they are uttered: when uttered in Germany, the sentence will likely be judged false, while it will be judged true when uttered in a busy Manhattan street.
(1) All cabs are yellow.
Quantifiers like all are grammatical determiners expressing abstract quantity information (Barwise & Cooper, 1981), and are generally taken to denote relations between two sets, their restrictor argument and their nuclear scope. In the above example, the restrictor set is a set of relevant cabs and the scope set is the set of yellow things or, equivalently, the set of yellow things that are cabs (e.g., Peters & Westerståhl, 2006). Thus, one specific semantic property of quantifiers is that they trigger domain restriction over their nominal arguments (Westerståhl, 1985). In (1), the quantifier all is automatically restricted to a contextually relevant set of cabs instead of taking its argument to refer to all existing cabs in the world. For arriving at a final truth evaluation, contextual information thus needs to be combined with sentential meaning during some stage of processing. To date, relatively little is known about such semantic restriction processes during sentence comprehension.
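The set-relational view of all and the effect of domain restriction can be illustrated with a minimal sketch. The toy world, the entity names, and the helper functions `all_q` and `restrict` are hypothetical and serve only as an illustration; they are not materials from the study.

```python
# Toy illustration of "all" as a relation between two sets, with domain restriction.
# All entities and names here are hypothetical examples, not study materials.

def all_q(restrictor: set, scope: set) -> bool:
    """'All A are B' holds iff every member of A is also in B (A is a subset of B)."""
    return restrictor <= scope

# A tiny world: each cab is (id, colour, location).
cabs = {
    ("cab1", "yellow", "Manhattan"),
    ("cab2", "yellow", "Manhattan"),
    ("cab3", "beige", "Germany"),
}
yellow_things = {c for c in cabs if c[1] == "yellow"}

def restrict(domain: set, location: str) -> set:
    """Domain restriction: keep only the contextually relevant individuals."""
    return {c for c in domain if c[2] == location}

unrestricted = all_q(cabs, yellow_things)                         # all cabs in the world
in_manhattan = all_q(restrict(cabs, "Manhattan"), yellow_things)  # restricted to Manhattan
in_germany = all_q(restrict(cabs, "Germany"), yellow_things)      # restricted to Germany
```

In this toy world the unrestricted reading and the Germany reading come out false, whereas the Manhattan reading comes out true, mirroring the intuition behind (1).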
The present event-related potentials (ERP) study investigates quantificational restriction in the presence of pictorial context information. For examining context effects during on-line semantic comprehension, electro-physiological measures are particularly well suited (Hagoort & van Berkum, 2007). Event-related brain potentials provide a high temporal resolution for investigating the incremental nature of compositional processes, and are sensitive to distinct aspects of meaning. In the ERP literature, the majority of studies on context effects focus on semantic word integration. In these studies, facilitating effects of contextual information were shown (Hagoort, Hald, Bastiaansen, & Petersson, 2004), which are reflected by the N400 component, a negative-going deflection with a centro-parietal maximum between 200 and 600 ms post stimulus onset. The amplitude of the N400 is gradually affected by the goodness of contextual fit, with the smallest amplitude for the best fit (Kutas & Hillyard, 1984; Kutas, Van Petten, & Kluender, 2006). Besides intra-sentential information, extra-sentential cues such as world knowledge, speaker identity, and pragmatic inferencing affect the N400 (Filik & Leuthold, 2008; Kutas & Federmeier, 2011; Nieuwland, Ditman, & Kuperberg, 2010; van Berkum, Brown, Zwitserlood, Kooijman, & Hagoort, 2005). These findings are compatible with an interpretation of the N400 as representing the semantic fit between any kind of contextually relevant information and the actually processed lexical item.
Moreover, it has been shown that the N400 is modulated by particularly salient contextual information, which may elicit expectancies regarding the lexical–semantic properties of upcoming words (e.g., DeLong, Urbach, & Kutas, 2005; Ito, Corley, Pickering, Martin, & Nieuwland, 2016; Otten, Nieuwland, & van Berkum, 2007).
Rapid context effects were also reported in another domain of semantic processing, namely the comprehension of verb action information. Using a picture–sentence verification paradigm, Knoeferle, Urbach, and Kutas (2011) investigated German equivalents of sentences like The gymnast pushes / applauds the journalist that were preceded by pictures that correctly or incorrectly depicted the scenes described by the sentences. At the position of the verb, an increased N400 signalled a mismatch between the picture and the actually presented sentential information (see also Vissers, Kolk, van de Meerendonk, & Chwilla, 2008).
In contrast to these early effects, studies on the temporal availability of quantifier meaning suggest that at least some aspects of semantic composition are contextually constrained in a delayed manner. The quantificational properties investigated so far include domain restriction (Frazier, Clifton, Rayner, Deevy, Koh, & Bader, 2005; Kaan, Dallas, & Barkley, 2007), scope ambiguities (for a recent overview, see Bott & Schlotterbeck, 2015), monotonicity differences (Deschamps, Agmon, Loewenstein, & Grodzinsky, 2015), scalar implicatures (Hartshorne, Snedeker, Azar, & Kim, 2015; Huang & Snedeker, 2009; Hunt, Politzer-Ahles, Gibson, Minai, & Fiorentino, 2013; Politzer-Ahles, Fiorentino, Jiang, & Zhou, 2013), and quantificational consequences on reference resolution (see Nouwen, 2010, for review) or world knowledge evaluations (Kounios & Holcomb, 1992; Nieuwland, 2016; Urbach & Kutas, 2010; Urbach, DeLong, & Kutas, 2015). Though it is uncontroversial that quantificational properties affect the overall interpretation of a sentence, it is still an unresolved issue whether quantifier meaning ‘incrementally’ determines the truth evaluation at each incoming word.
In fact, at least on the single-sentence level, there are cases in which quantificational information does not appear to contribute to an immediate meaning assignment, but is only evaluated at the end of the sentence. In particular, the processing of quantifier scope has been shown to give rise to non-incremental semantic processing effects. For instance, in an eye-tracking during reading experiment, Bott and Schlotterbeck (2015) examined the on-line effects of scope inversion in multiply quantified sentences. In sentences such as Jeden dieser/seiner Schüler hat genau ein Lehrer voller Wohlwollen gelobt ‘each of-these / of-his pupils has exactly one teacher praised’, processing difficulties for scope-inverted readings were delayed to the clause-final verb, though a local truth evaluation could have already been computed at the position of the second quantified NP ‘exactly one teacher’. This finding indicates that a fully-fledged interpretation of quantifier scope inversion may require the exhaustive processing of the whole sentence (see also Dwivedi, Phillips, Einagel, & Baum, 2010, who did not find local N400 differences between linear and inverted scope; but see Dotlačil & Brasoveanu, 2015, for an opposing view).
To date, only a few studies have considered the incremental effects of contextual properties on semantic aspects of quantifier processing. Footnote 1 With regard to quantifier restriction, a study by Kaan et al. (2007) investigated extra-sentential context effects on the processing of quantified numerals as in Twelve/four flowers were put in a vase. Six had a broken stem, in which the meaning of the numeral six is interpreted with respect to the preceding subject NP. Following twelve flowers, the numeral could either refer to six of the twelve previously mentioned flowers or, alternatively, to a different reference set containing flowers that were not mentioned before. Whereas behavioural studies using comparable materials showed rather early effects (Frazier et al., 2005; Wijnen & Kaan, 2006), ERPs from the onset of the numeral six revealed that an impossible subset reading did not affect quantifier processing within the N400 time-window. Instead, a late positivity was observed 900–1500 ms post numeral onset, possibly reflecting the establishment of a new discourse referent (Burkhardt, 2006; see also Paterson, Filik, & Moxey, 2009, for a comparable view). The delayed brain response suggests that a complete meaning evaluation was computed non-incrementally, i.e., on the word following the quantifier. This observation is probably not surprising, given that alternative continuations like Six women … are still available on the numeral, and an unambiguous decision could only be made on the verb had. These findings thus suggest that quantifier restriction is processed incrementally as soon as the input properties allow for an unambiguous decision.
Recently, Politzer-Ahles et al. (2013) investigated pictorial context effects on the processing of the quantifiers all and some. For the present considerations, the results for the quantifier all are particularly relevant. Politzer-Ahles et al. tested Mandarin Chinese versions of the sentence In this picture, all of the girls are sitting on blankets suntanning, preceded by pictures that did or did not match the sentential meaning. In a first ERP experiment, a visual picture–sentence verification task, ERPs from quantifier onset showed a reduced negativity (200–500 ms) when the pictorial information was not in accord with the meaning conveyed by the quantifier. According to the authors, the negativity is a form of the Nref component reflecting a decreased effort to bind the quantifier with an antecedent once the semantic inconsistency with the context has been recognized. In the second experiment, employing a picture–sentence consistency rating, sentences were presented auditorily. Contrary to the results from the first experiment, semantically false sentences elicited a sustained positivity (300–1000 ms post quantifier onset). According to the authors, the divergent effects could be due to experiment-inherent differences, such as presentation mode or task demands. Similarly to the study by Kaan et al. (2007), the study by Politzer-Ahles et al. did not investigate processing at the point of disambiguation. As the authors note, an unambiguous decision could only be made later in the sentence. Since the materials in the post-quantifier region were not controlled for frequency, word order, and length, post-hoc analyses were not possible due to differing disambiguation regions (2013, p. 145f.).
Indeed, the occurrence of ambiguity may be one of the reasons for a delayed processing of specific quantificational properties, such as their effects on readers’ on-line evaluations of sentences related to world knowledge. For instance, as pointed out by Nieuwland (2016, p. 317), a truth evaluation of the sentence fragment Few majors see, though locally false, might be delayed until a later position where quantifier scope can be unambiguously calculated (e.g., Few majors see ghosts). The avoidance of such potential meaning shifts may thus lead to underspecified semantic processing (see also Villalta, 2003). However, even with unambiguous scope, quantifiers like few do not appear to be interpreted in a fully incremental fashion (e.g., Urbach & Kutas, 2010). Given the immediate context effects on lexical–semantic processing, a reasonable assumption would be that contextual information is already able to trigger more immediate decisions at the earliest possible sentential position.
In a recent study, Urbach et al. (2015) tested this assumption by examining contextual effects on quantified sentences related to world knowledge. When presented in isolation, quantificational properties have been shown to contribute inconsistently (Urbach & Kutas, 2010) or not at all (Kounios & Holcomb, 1992) to the incremental processing of such sentences. For instance, Kounios and Holcomb presented class inclusion statements such as all / some / no rubies are gems / spruces in an ERP study. Though off-line truth-value judgements showed that participants had correctly interpreted these sentences, ERPs from the onset of the sentence-final word showed an N400 reduction for gems vs. spruces regardless of the quantificational meaning, thus indicating that quantifier interpretation did not incrementally contribute to meaning assignment (see also Fischler, Bloom, Childers, Roucos, & Perry, 1983, for comparable results in the field of negation processing). Moreover, Urbach and Kutas (2010) investigated the effects of positive and negative quantifiers like most and few on the processing of sentences involving typical and atypical sentence continuations (Most / few tourists visit museums / mines in their vacation). Whereas post-sentential plausibility ratings showed the expected cross-over interaction (i.e., reduced plausibility ratings for atypical continuations (mines) for most, and the opposite pattern for few), ERPs from the sentential object showed a general typicality effect on the N400, which was reduced for negative quantifiers. This absence of a full cross-over interaction in the ERPs led the authors to conclude that quantifier interpretation proceeds partly under-specified, with delayed processing of negative quantifiers.
In order to test whether contextual effects may boost the incremental in-depth processing of negative quantifiers, the follow-up study by Urbach et al. (2015) presented comparable sentences that were preceded by sentential contexts (e.g., The contractor was shocked to see that the construction crew was comprised completely of women. This was quite unusual considering that # most / few construction workers are male / female in this day and age.) By increasing the predictability of atypical continuations for the negative quantifiers, contexts should thus enhance the probability of obtaining a cross-over interaction already locally, as indexed by the N400. Results from two ERP studies show that, in principle, contextual information is indeed able to trigger a fully incremental interpretation of negative quantifiers. However, the expected N400 effect was only observed in the second experiment, a sentence reading task, whereas in the first study, a plausibility rating task, a similar discrepancy between on-line and off-line effects as in previous studies was observed. Though the reasons for this task dependency are currently unclear, it can be noted that, at least in principle, contextual information may boost earlier in-depth semantic processing of quantificational properties (see also Nieuwland & Kuperberg, 2008, for a comparable effect on the processing of negated sentences). Thus, contextual licensing is in principle able to determine the sentential positions where semantic commitments are made. Finally, incremental N400 effects for negative quantifiers have also been reported in Nieuwland (2016), who employed the same factorial design as Urbach and Kutas (2010), but additionally controlled for the predictability of the critical word.
For items with a high cloze probability of the critical word across quantifier conditions, a fully-fledged cross-over interaction was found, but for items with low cloze probability, effects were similar to Urbach and Kutas’ study. Quantifier interpretation thus can be fully incremental if there is sufficiently strong guidance by world-knowledge for predicting how the quantified statement will continue (see also Freunberger & Nieuwland, 2016, for facilitating effects of prosodic information).
In sum, contextual information has been shown to rapidly affect both lexical–semantic word integration and the processing of verb action information in the N400 time-window. With respect to quantifier interpretation, contextual cues are able to incrementally affect compositional-semantic processing, but the current findings are far from offering a homogeneous picture. The few studies explicitly dealing with quantifier restriction show a discrepancy between behavioural and ERP studies, with apparently earlier effects found in behavioural tasks. In these studies, quantifier processing was not established independently of discourse processes, and the delayed ERP response might be partially related to the fact that an unambiguous response is available only at the post-quantificational position. Comparably, the results from Politzer-Ahles et al. (2013) do not provide information about quantifier processing at a fully disambiguated position. On the other hand, contextual information has been shown to boost the incremental processing of quantified sentences dealing with world knowledge. However, though contextual cues affected processes already in the N400 time-window, the dependence on a specific experimental task makes it currently unclear whether these effects can be generalized.
The present study was designed to shed further light on the incremental nature of contextual effects on the processing of quantificational meaning. In particular, we included well-defined disambiguation regions to observe quantifier restriction as the sentence meaning unfolds over time.
2. The present study
In two experiments, we examined quantifier restriction in sentences containing the quantifier all preceded by pictorial contexts. In contrast to previous studies, we controlled the sentential positions where unambiguous decisions could be made and also added a new dimension to previous work by investigating potential semantic revision effects as in All cabs are yellow that are shown in this picture of Manhattan. As noted above, these sentences may be judged as false on the colour adjective. However, a following restriction could trigger potential meaning shifts from ‘false’ to ‘true’. For instance, the extraposed relative clause that are shown in this picture of Manhattan further restricts the set of cabs to a proper subset reading. In our study, we abstracted away from potential lexico-semantic confounds and presented materials involving geometrical figures. In both studies, one of four pictorial contexts (A–D, Figure 1) preceded a question as in (2), which was related to the content of the picture.
(2) Sind alle Dreiecke blau, die innerhalb des Kreises sind?
Are all triangles blue, that inside-of the circle are?
‘Are all the triangles blue that are inside of the circle?’
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171114095445069-0929:S1866980816000302:S1866980816000302_fig1g.jpeg?pub-status=prepub)
Fig. 1. Visual contexts preceding the target questions in Experiment 1.
We examined whether participants incrementally use pictorial cues during the processing of quantified questions. In addition, we tested whether semantic commitments are delayed in the face of ambiguity (cf. Kaan et al., 2007). If contextual information constrains semantic processing independently of potential ambiguities, then a semantic commitment will already be made at the earliest possible position, i.e., the colour adjective. Alternatively, if a truth evaluation is delayed until unambiguous information is encountered, a semantic commitment will only be made at the preposition in the relative clause. Note that an early local commitment might be risky, as it would come with the cost of a semantic revision. For instance, if (2) were preceded by context A, an initial ‘false’ judgement on the adjective would need to be revised to a ‘true’ judgement on the preposition. Therefore, we also tested whether the parser is sensitive to contextual reliability.
In order to control for potential confounds by silent prosody (Fodor, 2002), we presented target yes/no questions instead of declarative clauses. German declaratives are prosodically realized by clause-final falling contours (Eckstein & Friederici, 2006), which are incompatible with a relative clause continuation (Augurzky, 2006). By contrast, the prosody of questions is highly comparable to a relative clause continuation (both have final rises) and should thus render the sentence ambiguous with respect to whether or not a restrictive cue will follow.
To sum up: the present study examines context effects on the timecourse of quantifier restriction at different question positions. We tested whether contextual information constrains quantifier interpretation at the earliest possible position and whether potential meaning shifts occur at later positions. We examined whether ERP effects reflect early semantic integration processes within the N400 time-window or whether quantifier interpretation is generally delayed, as reflected by later positive components. To this end, we carried out two ERP studies. In our first study, we employed a picture–question verification task. In the second study, we used a probe detection task to test whether similar effects would be observed when attention was directed away from the picture–question mapping.
3. Experiment 1
In Experiment 1, a picture–question verification task, we examined whether context information incrementally constrains the semantic processing of questions involving quantificational restriction. To this end, one of four visual contexts (Figure 1) was presented at the beginning of each trial, and then a question containing the universal quantifier.
For examining restrictive processes during question processing, we compared ERPs at different sentential positions: first, from the onset of the colour adjective (‘blue’), and second, from the onset of the preposition (‘inside-of/outside-of’). The complete question materials are provided in Table 1. Contexts either consisted of simple (B,C) or complex (A,D) pictures. In each picture, geometrical objects like triangles were presented within and outside of a container form (e.g., a circle). In the simple pictures, all objects were of identical colour, whereas in the complex pictures, objects within and outside of the container form differed in colour.
Table 1. Design of the experiments
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171114095445069-0929:S1866980816000302:S1866980816000302_tab1.gif?pub-status=prepub)
Once the pictorial information has been processed, an answer to the following question is principally already available at the position of the colour adjective. If the parser incrementally makes a semantic commitment at this position, meaning shifts should be observed on the preposition following complex contexts (i.e., for A1 vs. A2 and D2 vs. D1). Given the currently indecisive results concerning the incrementality of context effects on quantificational processing, we intended to empirically discriminate between the following two competing hypotheses:
H1: A revision-insensitive version of incrementality predicts an immediate truth evaluation on the colour adjective regardless of a potential semantic revision. Increased processing cost is expected for false answers (A,C,D vs. B). In complex contexts (A,D), local answers have to be revised on the preposition in half of the trials (A: ‘inside-of’ vs. ‘outside-of’; D: ‘outside-of’ vs. ‘inside-of’).
H2: A revision-sensitive version of incrementality predicts that the position of answer selection differs between simple and complex contexts: For C vs. B, integration difficulties are expected on the adjective, and for A and D, later difficulties are expected when the preposition requires a negative as opposed to an affirmative answer (A: ‘outside-of’ vs. ‘inside-of’; D: ‘inside-of’ vs. ‘outside-of’). These effects should be qualitatively similar to those observed in contexts C vs. B on the adjective.
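The truth values that H1 and H2 rely on can be spelled out in a short sketch. It assumes a context layout inferred from the text rather than taken from Figure 1 (A: target colour inside only; B: target colour everywhere; C: target colour nowhere; D: target colour outside only), and all function names are hypothetical.

```python
# Sketch of the answers available at the adjective and at the preposition.
# ASSUMPTION: context layouts are inferred from the text, not copied from Figure 1.
# True means that the objects in that region have the colour named in the question.
CONTEXTS = {
    "A": {"inside": True, "outside": False},   # complex
    "B": {"inside": True, "outside": True},    # simple
    "C": {"inside": False, "outside": False},  # simple
    "D": {"inside": False, "outside": True},   # complex
}
PREPOSITIONS = ("inside-of", "outside-of")

def truth_at_adjective(context: str) -> bool:
    """Local answer to 'Are all triangles blue' before any restriction is read."""
    layout = CONTEXTS[context]
    return layout["inside"] and layout["outside"]

def truth_at_preposition(context: str, preposition: str) -> bool:
    """Final answer once the preposition restricts the set of objects."""
    region = "inside" if preposition == "inside-of" else "outside"
    return CONTEXTS[context][region]

def needs_revision(context: str, preposition: str) -> bool:
    """An early commitment must be revised if local and final answers differ."""
    return truth_at_adjective(context) != truth_at_preposition(context, preposition)

revisions = sorted(
    (c, p) for c in CONTEXTS for p in PREPOSITIONS if needs_revision(c, p)
)
```

Under these assumptions, only context B yields a locally true answer on the adjective, and a revision from ‘false’ to ‘true’ arises only for context A with ‘inside-of’ and context D with ‘outside-of’.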
Predictions regarding the expected ERP components are not entirely straightforward. If context information immediately constrains the upcoming semantic integration, processing difficulties should already be visible in the N400 time-window (Knoeferle et al., 2011; Urbach et al., 2015). The N400 may be followed by a late positive component (Knoeferle et al., 2011). If semantic processing of quantificational restriction is generally delayed, we might expect either absent effects (Kounios & Holcomb, 1992; Urbach & Kutas, 2010), or late positive components reflecting mismatch processing (Kaan et al., 2007; Politzer-Ahles et al., 2013). The detailed predictions are summarized in Table 2.
Table 2. Predictions a
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171114095445069-0929:S1866980816000302:S1866980816000302_tab2.gif?pub-status=prepub)
a Although these two hypotheses are the major objectives of Experiment 1, we also conducted additional ad-hoc analyses to obtain a more complete picture of the results obtained. However, despite the statistical significance of such additional outcomes, they must be regarded as post-hoc findings and hence interpreted with great caution because they might simply reflect false-positive results (e.g., Ioannidis, 2005; Ulrich & Miller, 2015). Nevertheless, such post-hoc findings may serve as hypotheses for further studies.
As Politzer-Ahles et al. (2013) investigated ERP responses to the quantifier all, we also analyzed this position. Due to previously indecisive findings, predictions for this position must be formulated carefully. We expected either a reduced negativity or a sustained positivity when the quantifier did not fit with the preceding picture (i.e., alle preceded by contexts (A,D), where a subset reading has to be instantiated).
3.1. Methods
3.1.1. Participants
Twenty-four right-handed students (mean age: 23.5 years) from the University of Tübingen took part in the study (11 male). They were native speakers of German with normal or corrected-to-normal vision and were paid for their participation.
3.1.2. Materials
A 4×2-factorial within-subjects design was used with the factors Context (A–D) and Preposition (‘inside-of’, ‘outside-of’). Pictures were generated as quadruplets via MS PowerPoint. Each picture contained geometrical shapes like triangles and a container form like a circle. Geometrical shapes were both inside and outside of this form. For each quadruplet, the geometrical forms were identical but differed in colour. A total set of 20 quadruplets was generated. Per condition, 20 experimental picture–question pairs were presented, resulting in 160 experimental sentences. To control for strategic effects, we included 160 filler questions ending on the adjective. The experiment consisted of 320 sentences. Conditions were evenly spread over 10 blocks. To control for positional effects, two experimental versions were generated. The first block of the first version corresponded to the final block of the second version, the second block of version 1 corresponded to the ninth block of version 2, and so on.
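The trial counts and the block reversal across the two experimental versions follow from simple arithmetic, sketched below. The variable names are hypothetical; the numbers are those given in the text.

```python
# Trial counts implied by the design, plus the block reversal across versions.
n_contexts, n_prepositions = 4, 2
items_per_condition = 20
n_fillers = 160
n_blocks = 10

n_conditions = n_contexts * n_prepositions            # 4 x 2 design
n_experimental = n_conditions * items_per_condition   # experimental questions
n_total = n_experimental + n_fillers                  # all questions
trials_per_block = n_total // n_blocks

# Version 2 presents the blocks of version 1 in reverse order:
# block 1 -> position 10, block 2 -> position 9, and so on.
version2_position = {b: n_blocks + 1 - b for b in range(1, n_blocks + 1)}
```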
3.1.3. Procedure
Participants were seated in a dimly lit, soundproof booth in front of a 17” computer screen. Stimuli appeared in a pseudo-randomized order, in which maximally two items of the same condition appeared in succession. The experimental session was divided into ten blocks (32 trials per block), with breaks between the blocks. At the beginning of each trial, the picture appeared in the centre of the screen for 1500 ms. Then the single words of the sentence were presented word-by-word via rapid serial visual presentation (RSVP) (500 ms per word). The question mark was presented separately in order to avoid predictability of a further restriction. After each question, participants made a truth evaluation (“Did the preceding sentence truly or falsely reflect the content of the picture?”). After the final word had disappeared, three question marks were shown, signalling that participants now had to answer with wahr ‘true’ or falsch ‘false’ by pressing one of two buttons (‘F’ or ‘J’) on the keyboard. The keys for true and false answers were counterbalanced across participants. Participants were asked to make their truth evaluation as quickly as possible. The initial timeout was 1200 ms and was adapted to the response speed of the participants by using an exponentially weighted moving average (Leonhard, Fernández, Ulrich, & Miller, 2011). When participants’ reaction times exceeded the current timeout, they received visual feedback (Schneller! ‘Faster’) on the screen. Following the judgement, a blank screen appeared for 500 ms and three exclamation marks in yellow (1200 ms) indicated that participants now could blink until the next picture was presented. A practice session preceded the experimental session. Including electrode application, the experimental session lasted between 2 and 2.5 hours.
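How a response deadline might track participants' speed via an exponentially weighted moving average can be sketched as follows. Only the 1200 ms starting value comes from the procedure; the smoothing weight `ALPHA` and the rule that the timeout directly follows the moving average of reaction times are assumptions made for illustration and may differ from the procedure actually used.

```python
# Sketch of an adaptive response deadline based on an exponentially weighted
# moving average (EWMA).
# ASSUMPTIONS: ALPHA and the exact update rule are hypothetical; only the
# 1200 ms starting value is taken from the procedure description.
INITIAL_TIMEOUT_MS = 1200.0
ALPHA = 0.2  # hypothetical smoothing weight, 0 < ALPHA <= 1

def update_timeout(current_ms: float, reaction_time_ms: float,
                   alpha: float = ALPHA) -> float:
    """Move the timeout a fraction alpha towards the latest reaction time."""
    return (1.0 - alpha) * current_ms + alpha * reaction_time_ms

def run_session(reaction_times_ms):
    """Return, per trial, the timeout in force and whether the response was too slow."""
    timeout = INITIAL_TIMEOUT_MS
    log = []
    for rt in reaction_times_ms:
        too_slow = rt > timeout  # would trigger the 'Schneller!' feedback
        log.append((timeout, too_slow))
        timeout = update_timeout(timeout, rt)
    return log

session_log = run_session([1000.0, 1300.0, 900.0])
```

With these illustrative values, a fast first response lowers the deadline, so that the second response of 1300 ms exceeds it and would trigger feedback.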
3.1.4. EEG recording
The EEG was continuously recorded from 32 Ag/AgCl electrodes using a BIOSEMI Active-Two amplifier system: FP1, FP2, AF3, AF4, F7, F3, Fz, F4, F8, FC5, FC1, FC2, FC6, T7, C3, Cz, C4, T8, CP5, CP1, CP2, CP6, P7, P3, Pz, P4, P8, PO3, PO4, O1, Oz, O2. Six further electrodes were recorded: the electrooculogram (EOG, 4 electrodes) was recorded by means of electrodes placed at the outer canthus of each eye (horizontal EOG) and above and below the participant’s left eye (vertical EOG). Two electrodes were put on the left and right mastoid for the purpose of off-line referencing. EEG and EOG recordings were sampled at 1024 Hz during recording, and downsampled to 256 Hz for data analysis. Off-line, electrode sites were referenced to linked mastoids. Raw EEG data were filtered using a 0.3–20 Hz bandpass filter. Before entering the ERP analysis, individual participant data were automatically and manually screened for each trial in order to exclude trials with eye-movement and muscular artefacts. Data per participant and per condition were aggregated from the onset of the critical element (adjective, preposition, and quantifier) to 1000 ms post onset. Afterwards, grand averages were calculated over all participants. Footnote 2
3.1.5. Data analysis
For the behavioural data, repeated-measures analyses of variance with the within-subject factors Context (4) and Preposition (2) were calculated on mean error rates and reaction times for the truth-value judgements. Incorrectly answered trials (3.8% of the total trials) were excluded from reaction time and EEG analyses.
ERPs were aggregated from three sentential positions: first, in order to investigate local context effects, brain responses from the onset of the colour adjective were aggregated. Second, potential reanalysis processes were examined from the onset of the preposition. Finally, we aggregated ERPs from the onset of the quantifier. EEG data were analyzed statistically by carrying out repeated-measures ANOVAs for mean amplitude values within time-windows per condition, which were chosen based on visual inspection and previous studies. For the Grand ANOVA, four regions of interest (ROIs) were introduced: left anterior (Roi 1: F3, F7, FC1, FC5), right anterior (Roi 2: F4, F8, FC2, FC6), left posterior (Roi 3: CP1, CP5, C3, P3), and right posterior (Roi 4: CP2, CP6, C4, P4).
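The dependent measure entering these ANOVAs, the mean amplitude within a time-window averaged over the electrodes of one ROI, can be sketched as follows. The electrode groupings are those listed above. The data layout (a dictionary from electrode name to a list of samples time-locked to word onset at 256 Hz), the function name, and the treatment of the window end as exclusive are assumptions made for illustration.

```python
ROIS = {
    "left anterior": ["F3", "F7", "FC1", "FC5"],
    "right anterior": ["F4", "F8", "FC2", "FC6"],
    "left posterior": ["CP1", "CP5", "C3", "P3"],
    "right posterior": ["CP2", "CP6", "C4", "P4"],
}


def roi_mean_amplitude(erp, roi, start_ms, end_ms, sfreq_hz=256):
    """Mean amplitude over the ROI's electrodes within [start_ms, end_ms)."""
    first = int(round(start_ms * sfreq_hz / 1000.0))
    last = int(round(end_ms * sfreq_hz / 1000.0))
    if last <= first:
        raise ValueError("time-window must contain at least one sample")
    values = [
        sample
        for electrode in ROIS[roi]
        for sample in erp[electrode][first:last]
    ]
    return sum(values) / len(values)
```

One such value per participant, condition, ROI, and time-window would then serve as a cell in the repeated-measures design.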
From the onset of the colour adjective, we carried out ANOVAs with the factors Context (A,B,C,D) × Roi (left anterior, right anterior, left posterior, right posterior). We did not include the Factor Preposition into these analyses, as conditions did not physically differ with respect to this factor on the adjective. From preposition onset, we carried out ANOVAs with the factors Context (A,B,C,D), Preposition (‘inside-of’ vs. ‘outside-of’) × Roi ( left anterior, right anterior, left posterior, right posterior). From quantifier onset, an ANOVA with the factors Context Complexity (simple (B,C) vs. complex (A,D)) was carried out. In order to examine potential strategic effects during the course of the experiment, we carried out additional post-hoc analyses of each of the above-described ANOVAs with the factor Experimental Half ( first, second).
Individual conditions are only reported if the omnibus ANOVA revealed significant interactions or in case of significant main effects of the factor Context. The detailed results of all ANOVAs are provided as supplementary materials in the ‘Appendix’ (available at: <http://dx.doi.org/10.1017/langcog.2016.30>). In the text, we will focus on the most relevant results. Statistical analyses were carried out in a hierarchical manner. Only significant interactions (p < .05) were resolved. Corrected p-values (Huynh & Feldt, Reference Huynh and Feldt1970) were chosen when the analysis involved more than one degree of freedom in the numerator. Probability levels were Bonferroni adjusted for the planned comparisons between the four levels of the factor Context.
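The Bonferroni adjustment for the comparisons between context levels can be illustrated with a short sketch. It assumes that all six pairwise comparisons among the four contexts form the family of tests, which is not stated explicitly here. The function name and the example p-values are hypothetical.

```python
from itertools import combinations

CONTEXTS = ["A", "B", "C", "D"]


def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of tests and cap the result at 1."""
    n_tests = len(p_values)
    return {pair: min(1.0, p * n_tests) for pair, p in p_values.items()}


# All pairwise comparisons between the four contexts: 4 * 3 / 2 = 6 pairs.
pairs = list(combinations(CONTEXTS, 2))

# Hypothetical uncorrected p-values, one per pair (not data from the study).
raw = dict(zip(pairs, [0.40, 0.004, 0.001, 0.002, 0.003, 0.005]))
adjusted = bonferroni_adjust(raw)
```

Under this scheme, any uncorrected p-value above 1/6 becomes 1 after adjustment, because adjusted values are capped at 1.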
3.2. Results
3.2.1. Behavioural data
Mean error rates and reaction times for the verification task are given in Table 3. Mean error rates in the truth evaluation task were 3.8%, reflecting overall good task performance. Repeated-measures ANOVAs on error rates showed that questions following context D elicited slightly more errors than all other context–sentence pairs. A main effect of Context was found (F(3,69) = 34.17; p < .001). Whereas contexts B and D did not differ from each other (p = .3), all other contexts did (all p values < .03). Reaction times showed an interaction between Context and Preposition (F(3,69) = 2.8; p = .05), which was due to the fact that the two prepositions differed for context D (F(1,23) = 4.34; p < .05; all other p values > .2). We refrain from interpreting reaction times in detail as they were not directly time-locked to the disambiguating positions.
Table 3. Behavioural results for the single conditions in Experiment 1
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171114095445069-0929:S1866980816000302:S1866980816000302_tab3.gif?pub-status=prepub)
3.2.2. Colour adjective
Figure 2 shows ERPs from the onset of the colour adjective. Conditions differed within an early (300–400 ms) and a late (450–800 ms) time-window. The detailed results of the ANOVA are provided as supplementary materials in the ‘Appendix’ (A1.1).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171114095445069-0929:S1866980816000302:S1866980816000302_fig2g.jpeg?pub-status=prepub)
Fig. 2. Grand average ERPs from the onset of colour adjective, Experiment 1.
Statistical analyses in the early time window revealed a significant main effect of Context (F(3,69) = 22.2; p < .001), as well as an interaction between Roi and Context (F(9, 207) = 5.74; p < .05). Whereas complex contexts (A,D) did not differ from each other (p = 1), all other contexts did, with significant differences in Roi 3 and Roi 4 (all p values < .05). Finally, no interactions with the factor Half were observed (all F values < 2.07). In the late time window, an effect of Context (p < .001) was found. Whereas complex conditions did not differ (p = 1), all other conditions did (all p values < .01). A Roi and Context interaction (p = .001) reveals variances in effect sizes in the single Rois. No interactions with the factor Half were observed (all F values < 2.16).
In sum, the statistical analyses confirmed that false answers already differed from true answers on the colour adjective when contextual information allowed for an unambiguous decision: for simple contexts, false answers (C) elicited a negativity followed by a late positivity when compared to true answers (B). The complex contexts (A,D) did not differ from each other, but differed from both of the simple contexts.
3.2.3. Preposition
Figure 3 shows ERPs from preposition onset for contexts A to D. Comparably to the adjective, conditions differed within an early (300–400 ms) and a late (450–800 ms) time-window. These differences were restricted to previously ambiguous conditions following complex contexts (A,D). The complete results are provided as supplementary materials in the ‘Appendix’ (A1.2). Statistical analyses in the early time window showed an effect of Context (F(3,69) = 4.32; p < .01), as well as a three-way interaction between Roi, Context, and Preposition (F(9,207) = 7.52; p < .001): whereas an effect of Preposition was found for contexts A and D in each Roi (all p values < .01; with the exception of context A in the left posterior Roi with p = .07), no significant effects were observed for previously unambiguous conditions following simple contexts (B,C) (all p values > .83). The post-hoc analyses revealed an interaction between Roi, Context, and Half (F(9,207) = 3.65; p < .01), which became significant in the left posterior Roi (F(3,69) = 2.91; p < .05; all other Rois: F < 2.19). However, the effect of Context was not significant in either of the experimental halves (first half: F(3,69) = 2.41; p = .30; second half: F(3,69) = 2.34; p = .32). In the late time window, an effect of Context was found (F(3,69) = 15.04; p < .001), as well as a three-way interaction between Roi, Context, and Preposition (F(9,207) = 3.25; p < .01). Significant effects were restricted to the left posterior Roi. No interactions with the factor Half were observed (all F values < 2.51).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171114095445069-0929:S1866980816000302:S1866980816000302_fig3g.jpeg?pub-status=prepub)
Fig. 3. Grand average ERPs from the onset of the preposition for the midline electrodes, Experiment 1.
In sum, the statistical analyses from the onset of the preposition revealed significant effects only for complex contexts (A,D). For those cases, a similar biphasic ERP pattern for false answers was found as the one on the colour adjective, i.e., an early negativity that was followed by a late positivity.
3.2.4. Quantifier
Figure 4 shows ERPs from the onset of the quantifier. Based on Politzer-Ahles et al. (Reference Politzer-Ahles, Fiorentino, Jiang and Zhou2013), we analyzed three time-windows: an early (250–500 ms), a late (500–1000 ms), and a long time-window (300–1000 ms). No significant effects were found in the early time-window (all p values > .2). In the late time-window, an effect of Context Complexity was observed (F(1,23) = 7.44; p < .05). In the long time-window, no significant effects were found (all p values > .09). In the additional analysis with the factor Experimental Half, no effects were found (all F values < 1.96). As a whole, the analysis of the quantifier position revealed that, in a late time-window, complex conditions were slightly more positive than simple conditions.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171114095445069-0929:S1866980816000302:S1866980816000302_fig4g.jpeg?pub-status=prepub)
Fig. 4. Grand average ERPs from the onset of the quantifier, Experiment 1.
3.3. Discussion
The present experiment used a picture–question verification task to test whether context information incrementally constrains the processing of questions involving quantifier restriction. We examined whether a semantic commitment was made at the earliest possible position in the sentence (H1) or whether parsing decisions were delayed when there was a risk of a semantic revision (H2). As earlier studies had either observed an N400 for context–sentence mismatches or a late positivity (or a combination of both), we were also interested in the specific processing phase affected by the incongruence.
The present results are compatible with H2. On the colour adjective, a biphasic ERP pattern was observed for false as opposed to true answers (following context C vs. B). In line with previous studies, a mismatch between pictorial and semantic information was reflected by a negativity that was followed by a late positivity (Knoeferle et al., Reference Knoeferle, Urbach and Kutas2011). Under H1, the complex contexts (A,D) were not supposed to differ from context C: in each of these conditions, an immediate local commitment would have resulted in a ‘false’ answer. Thus, a similar negativity would have been expected on the adjective for A, C, and D. By contrast, the amplitude of the complex contexts differed from the simple contexts. Whether this pattern signals a locally under-specified semantic representation can only be decided by considering the results at the preposition, where an unambiguous disambiguation becomes available.
On the preposition, effects were restricted to the complex conditions. For complex conditions, the incongruent conditions (A: ‘outside-of’ vs. ‘inside-of’; D: ‘inside-of’ vs. ‘outside-of’) elicited a biphasic pattern comparable to the one elicited by the incongruent simple conditions on the colour adjective. This finding is again in accord with H2. Thus, the truth evaluation in the complex conditions was delayed until the position of the preposition, with a semantically under-specified representation before the ultimate point of disambiguation. Under H1, the directly opposite pattern would have been expected for complex contexts: if a semantic evaluation had already been made on the colour adjective, increased processing costs would have occurred for a shift in local truth evaluation (A: ‘inside-of’ vs. ‘outside-of’; D: ‘outside-of’ vs. ‘inside-of’). The positivity following the negativity is descriptively smaller than the one on the adjective and is statistically significant only in one Roi. One explanation for this finding could be that the specific information that has to be falsified differs between simple and complex contexts. For simple contexts, falsification involves the properties of all pictorial objects, i.e., the total set of blue or red objects. By contrast, for complex contexts, only a subset of the available objects is affected by the falsification process.
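The logic that distinguishes the two hypotheses can be made concrete with a small sketch. It is an illustration only, not the authors' model: pictures are encoded as lists of objects with a colour and a region, the colour assignment in the complex contexts is chosen arbitrarily for a question containing the adjective ‘blue’, and all names are hypothetical. Under H2, the evaluation at the adjective returns `None` (undetermined) whenever a later restriction could still change the answer.

```python
def evaluate_at_adjective(objects, colour):
    """Revision-sensitive (H2) evaluation at the colour adjective.

    Returns True or False only if no later restriction to a non-empty
    subset could change the answer; otherwise returns None.
    """
    matches = [obj["colour"] == colour for obj in objects]
    if all(matches):
        return True      # every object has the colour: safe 'true'
    if not any(matches):
        return False     # no object has the colour: safe 'false'
    return None          # mixed colours: wait for the restriction


def evaluate_at_preposition(objects, colour, region):
    """Evaluate 'all ... <colour>' over the objects in the given region."""
    restricted = [obj for obj in objects if obj["region"] == region]
    return all(obj["colour"] == colour for obj in restricted)


def picture(inside_colour, outside_colour, n=3):
    """Hypothetical picture: n objects inside and n outside the circle."""
    return (
        [{"colour": inside_colour, "region": "inside"} for _ in range(n)]
        + [{"colour": outside_colour, "region": "outside"} for _ in range(n)]
    )


# Contexts relative to a question containing 'blue' (assumed encoding).
context_a = picture("blue", "red")    # complex: true only for 'inside-of'
context_b = picture("blue", "blue")   # simple: true
context_c = picture("red", "red")     # simple: false
context_d = picture("red", "blue")    # complex: true only for 'outside-of'
```

On this encoding, the evaluation at the adjective is decided for contexts B and C and remains undetermined for A and D. An immediate commitment in the sense of H1 would instead evaluate the universal quantifier over the full set of objects at the adjective and return ‘false’ for A, C, and D alike.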
Finally, our additional analyses showed differences between quantifiers following simple and complex contexts. In contrast to Politzer-Ahles et al. (Reference Politzer-Ahles, Fiorentino, Jiang and Zhou2013), we found a late positivity for the complex conditions. Perhaps most plausibly, this effect reflects processing costs associated with the instantiation of a subset reading, as it becomes clear at this position that the sentence can only be correct if a further restriction follows. The late effect is principally in accord with results from Kaan et al. (Reference Kaan, Dallas and Barkley2007). However, as no neutral baseline (such as all or they) was presented in that study, these results do not provide decisive evidence for or against processing costs associated with a subset reading.
In sum, the present results show that contextual effects may incrementally constrain the processing of questions involving quantifier restriction. At the earliest fully disambiguating position (the colour adjective for simple contexts, and the preposition for complex contexts), false answers elicited a negativity followed by a positivity. Thus, a truth evaluation already occurs at an early processing stage and is made as soon as contextual information is unambiguously available. Considering previous studies on picture–sentence verification, both the negativity and the late positivity had sometimes been associated with strategic effects rather than reflecting ‘real’ semantic processing (Knoeferle et al., Reference Knoeferle, Urbach and Kutas2011; Vissers et al., Reference Vissers, Kolk, van de Meerendonk and Chwilla2008). According to that view, the negativity reflects attentional mismatch detection rather than a fully-fledged linguistic-semantic analysis (D’Arcy & Connolly, Reference D’Arcy and Connolly1999). Moreover, a long-standing debate in the literature has concerned whether late positivities in language tasks reflect linguistic processes (e.g., Friederici, Hahne, & Saddy, Reference Friederici, Hahne and Saddy2002; Osterhout & Holcomb, Reference Osterhout and Holcomb1992), or are members of the P300 family, reflecting domain-general mechanisms (Coulson, King, & Kutas, Reference Coulson, King and Kutas1998). According to the latter view, late positivities reflect increased attentional demands when the current stimulus is particularly relevant for fulfilling the specific task requirements. To examine whether attentional processes affected the present results, we conducted a second experiment employing a probe detection task, in which attention was directed away from the explicit mapping between the context picture and the target question.
4. Experiment 2
The second study intended to replicate the findings from Experiment 1 using a probe detection task that should prevent readers from focusing on the mapping between the context picture and the target question. In addition, we provided a post-Experiment questionnaire, in which participants indicated whether they had read the questions as a whole or whether they had focused on specific sentential positions. If so, they were asked to indicate the positions they focused on.
4.1. Methods
4.1.1. Participants
Twenty-four right-handed students (mean age: 26.3 years) from the University of Tübingen took part in the study (12 male). They were German native speakers with normal or corrected-to-normal vision and were paid for their participation. Due to ocular or motor artefacts, two participants were excluded from the final analyses.
4.1.2. Materials
The identical context–question pairs as in Experiment 1 were presented. Filler items were mainly identical to those in Experiment 1. In half of the fillers, we used pictures in which two different geometrical figures appeared. The total number of experimental trials was 320. For the probe detection, five probe positions were randomly assigned: the picture, the first noun, the colour adjective, the preposition, and the final noun. Probes were either identical (50%) or non-identical to the previously presented picture or word. For non-identical pictures, probes either differed with respect to the geometrical objects or to the colours. Linguistic probes were semantically related to the words in the probe position.
4.1.3. Procedure
The experimental settings and the presentation of trials were identical to Experiment 1 up to the question mark. 300 ms after question mark offset, a visual probe (a picture or a word) was presented, and participants had to decide whether the probe had already been shown in the current trial. Participants answered with ‘yes’ or ‘no’ (50%) by pressing one of two answer buttons (‘F’ or ‘J’). The keys for the ‘yes’ and ‘no’ answers were counterbalanced across participants. For timeouts, we applied the same procedure as in Experiment 1. In 15.6% of the trials, a box appeared instead of the probe (Bitte beantworte die Frage ‘Please answer the question’), and participants had to answer the question by pressing the ‘yes’ or ‘no’ button. No reaction time feedback was provided for these trials. Following the decision or answer, a blank screen appeared (500 ms), and three exclamation marks in yellow (1200 ms) were shown, indicating that participants could blink until the next picture was presented. A practice session preceded the experimental session. A post-Experiment questionnaire was filled in after the ERP study.
4.1.4. EEG recording and data analysis
EEG recording and data analysis were identical to Experiment 1.
4.2. Results
4.2.1. Behavioural data
Mean error rates and reaction times for the probe detection task are given in Table 4. Mean error rates in the probe detection task were 21.5%. Repeated-measures ANOVAs on error rates showed an interaction between Context and Preposition (F(3,63) = 11.26; p < .001).
Table 4. Behavioural results for the single conditions in Experiment 2
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171114095445069-0929:S1866980816000302:S1866980816000302_tab4.gif?pub-status=prepub)
For contexts A and C, the preposition innerhalb ‘inside-of’ showed more correct responses than außerhalb ‘outside-of’ (A: F(1,21) = 10.03; p < .01; C: F(1,21) = 8.98; p < .01). For contexts B and D, the preposition innerhalb caused more errors (B: F(1,21) = 6.33; p < .05; D: F(1,21) = 8.76; p < .01). Reaction times showed an effect of Context (F(3,63) = 5.46; p < .01), and an interaction between Context and Preposition (F(3,63) = 6.45; p = .001). Prepositional differences were restricted to complex contexts (A: F(1,21) = 4.41; p < .05; D: F(1,21) = 10.86; p < .01; B and C: each F value < 0.4). As in Experiment 1, we will refrain from interpreting reaction times. Nevertheless, they do appear to mirror the pattern of ERP results reported below.
4.2.2. Post-Experiment questionnaire
From the 22 participants who entered the final analysis, 18 (81.8%) reported to have read the whole question. The remaining four participants reported to have focused their attention on the probe positions.
4.2.3. ERP data
Analogous to Experiment 1, we analyzed ERPs from the onset of the colour adjective, the preposition, and the quantifier. Given the high error rates in the present study and the fact that the probe detection was not related to the picture–sentence mapping, we did not exclude errors from the ERP analysis.
4.2.4. Colour adjective
Figure 5 shows ERPs from the onset of the colour adjective. Conditions again differed within an early (300–400 ms) time-window. In contrast to Experiment 1, no differences were found in the late time-window (450–800 ms). The detailed results are provided as supplementary materials in the ‘Appendix’ (A2.1).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171114095445069-0929:S1866980816000302:S1866980816000302_fig5g.jpeg?pub-status=prepub)
Fig. 5. Grand average ERPs from the onset of colour adjective, Experiment 2.
The statistical analyses in the early time-window (300–400 ms) revealed an effect of Context (F(2,46) = 10.68; p < .001), as well as a Roi × Context interaction (F(6,138) = 2.63; p < .05). In contrast to the first study, condition A did not differ from condition B (all F values = 1). No interactions with the factor Half were observed (all F values < 1.88). In the late time-window (450–800 ms), no significant differences or interactions were found (all F values < 2.5). In sum, statistical analyses on the adjective confirmed that the experimental contexts elicited differential brain responses on the position of the colour adjective. Comparably to Experiment 1, false answers yielded the most negative effect, and true answers were most positive. Though the complex conditions did not differ significantly from each other, condition A slightly differed from condition B.
4.2.5. Preposition
Figure 6 shows ERPs from the onset of the preposition. Again, conditions differed within an early (300–400 ms) time-window. No differences were descriptively visible in the late (450–800 ms) time-window. The detailed statistical results are provided as supplementary materials in the ‘Appendix’ (A2.2).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171114095445069-0929:S1866980816000302:S1866980816000302_fig6g.jpeg?pub-status=prepub)
Fig. 6. Grand average ERPs from the onset of the preposition for the midline electrodes, Experiment 2.
Comparably to Experiment 1, the statistical analyses in the early time-window (300–400 ms) showed that the difference between the preposition types affected the previously ambiguous complex contexts. An effect of Context was found (F(3,69) = 6.63; p = .001), as well as a three-way interaction between Roi, Context, and Preposition (F(9,207) = 3.23; p < .01). Whereas an effect of Preposition was found for contexts A and D in each Roi (all p values < .01), no significant effects were observed for contexts B and C (all p values > .11). No effects or interactions with the factor Half were observed (all F values < 1.59). In contrast to Experiment 1, no effects of Preposition or Preposition × Context interactions were found in the late time-window (450–800 ms). However, a main effect of Context was found (F(3,63) = 11.65; p < .05), as well as a Context × Roi interaction (F(9,189) = 2.34; p < .05). No effects or interactions with the factor Half were observed (all F values < 2.11).
4.2.6. Quantifier
Figure 7 shows ERPs from the onset of the quantifier. Again, we analyzed an early (250–500 ms), a late (500–1000 ms), and a long (300–1000 ms) time-window.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171114095445069-0929:S1866980816000302:S1866980816000302_fig7g.jpeg?pub-status=prepub)
Fig. 7. Grand average ERPs from the onset of the quantifier, Experiment 2.
No significant effects were found in any of the time-windows (all ps > .11). The additional analyses showed an interaction with Half (late time-window: F(1,21) = 4.51; p < .05; long time-window: F(1,21) = 4.58; p < .05). Whereas no context effects were observed in the first half of the experiment (all F values < 1.83), contexts differed in the second half of the experiment (all F values > 2.60).
4.3. Discussion
The present study was intended to replicate the findings from Experiment 1 by using a task that distracted attention away from the picture–question match.
With respect to ERP effects in the early time-window, the present findings replicated the results from Experiment 1. An early negativity was observed on each of the disambiguating positions. On the adjective, the negativity was found for false answers (C) as opposed to true answers (B). Again, the complex conditions (A,D) differed from these simple conditions. The early negativity on the preposition was again restricted to the previously ambiguous conditions. Thus, even without a task-relevant match between context and target, a pattern comparable to that of Experiment 1 was observed: in the cases of a safe decision, the truth-value of the question was computed as quickly as possible. By contrast, in the cases of a risky decision, the parser evaluated the truth-value only at the earliest unambiguous position, i.e., at the preposition. The finding of an early negativity is in accord with Hypothesis H2 and indicates that compositional-semantic processing may rapidly affect sentence comprehension at early processing phases, thus ruling out a purely strategy-based alternative explanation of the findings of Experiment 1.
In contrast to Experiment 1, no consistent effects were found in the late time-window. On the colour adjective, the single conditions did not differ from each other. Thus, the experimental task manipulation affected the late positivity, which was modulated by attentional demands in Experiment 1, but not in Experiment 2. On the preposition, conditions differed in the late time-window. Crucially, these effects were not related to the picture–question mapping but showed up as contextual effects, which plausibly can be attributed to the fact that questions following simple contexts already differed from those following complex contexts before the preposition was processed. Visual inspection of the ERPs indicates that the complex conditions are indeed already more positive than the simple conditions from the onset of the preposition and could thus be spillover effects from the adjectival region (Steinhauer & Drury, Reference Steinhauer and Drury2012). In the current experiment, effects on the quantifier position were restricted to the second half of the experiment, thus indicating that processing difficulties associated with a subset interpretation may have been modulated by attentional demands.
Finally, results from our post-Experiment questionnaire revealed that the vast majority of participants read the questions in their entirety. Footnote 3 Importantly, none of the participants who were not reading the whole sentences reported to have been exclusively paying attention to the colour adjective or the preposition.
5. General discussion
The present study investigated pictorial effects on the compositional-semantic processing of questions involving quantificational restriction. We tested whether the meaning of the universal quantifier is incrementally restricted by contextual information regardless of the risk of a potentially upcoming restriction. Moreover, we examined whether this process is modulated by attentional demands. In our first experiment, a picture–question verification task, attention towards the mapping between the context picture and the target question was directly relevant for task fulfilment. Our second experiment employed a probe detection task, in which this mapping was not relevant for successfully fulfilling the task.
Most generally, our results show that the restrictive properties of quantifier meaning constrain question processing at early processing phases. Across two tasks, a negativity (300–400 ms) was found when the pictorial information was not in accord with a positive answer. The negativity was followed by a late positivity (450–800 ms) in Experiment 1, where the task required an explicit mapping between the pictorial context and the question. In Experiment 2, a probe detection task, attention was directed away from this match, and no late positivity was observed. In both experiments, the negativity was elicited only when a commitment of the parser did not result in the risk of a semantic revision. Thus, context effects were directly evaluated on the colour adjective when preceded by unambiguous pictorial contexts. However, when a potentially following restriction could lead to a shift in semantic meaning, a comparable negativity was observed only from the onset of the preposition. As a whole, the observed pattern of results is compatible with a revision-sensitive strategy (H2), in which the parser only makes a semantic commitment in safe contexts.
Our results are in line with previous studies investigating context effects on semantic processing (Hagoort & van Berkum, Reference Hagoort and van Berkum2007). A deteriorated semantic fit was reflected by a negative ERP component, which was followed by a positivity. Contrary to previous studies (Nieuwland, Reference Nieuwland2016; Urbach et al., Reference Urbach, DeLong and Kutas2015), the negativity could not have been triggered by lexical expectancies. First, after having processed the context picture, any kind of colour adjective could principally follow in the question. Second, the experimental design had been balanced in such a way that each context occurred with both possible colours. For instance, context B was followed by the adjective ‘blue’, but it was also followed by the adjective ‘red’ in another trial, for which it then constituted context C. Thus, in addition to studies on lexical integration, the present study indicates that higher-level compositional-semantic processes also have early effects on sentence comprehension (Hunt et al., Reference Hunt, Politzer-Ahles, Gibson, Minai and Fiorentino2013; Knoeferle et al., Reference Knoeferle, Urbach and Kutas2011; Vissers et al., Reference Vissers, Kolk, van de Meerendonk and Chwilla2008). One novelty of the current experiments is the finding that an incremental semantic commitment depends on the reliability of contextual information, which may affect the risk of revision due to alternative possibilities of domain restriction.
In contrast to the negativity, the occurrence of the late positivity was restricted to Experiment 1, where the experimental task encouraged an attentional focus towards the picture–question match. Interestingly, a comparable effect was also discussed in Nieuwland (Reference Nieuwland2016; see also Freunberger & Nieuwland, Reference Freunberger and Nieuwland2016), who compared quantificational interpretation in a verification task and a task that was not directly related to the target sentences. Comparably to our results, the positivity exclusively occurred in the verification task. Therefore, the current positivity seems to be related to task demands rather than reflecting semantic processing per se.
Though examining the quantifier region was not the primary intention of the present study, the late positivity for the subset reading in Experiment 1 is partly in accord with previous findings (Kaan et al., Reference Kaan, Dallas and Barkley2007; Politzer-Ahles et al., Reference Politzer-Ahles, Fiorentino, Jiang and Zhou2013). In the remainder of the ‘General discussion’, we will discuss the observed ERP pattern in detail.
5.1. The early negativity
Given the early onset of the negativity, the question arises whether the observed effect is in fact a ‘semantic’ N400 or whether it reflects lower-level processes related to stimulus properties. Early negativities could alternatively be viewed to be N2b components reflecting an attentional sensitivity towards the match between visual and linguistic information rather than a fully-fledged semantic analysis (Knoeferle et al., Reference Knoeferle, Urbach and Kutas2011; Vissers et al., Reference Vissers, Kolk, van de Meerendonk and Chwilla2008). Standardly, the N2b occurs in oddball tasks, in which voluntary attention is directed towards mismatch detection (Pritchard, Shappell, & Brandt, Reference Pritchard, Shappell, Brandt, Ackles, Jennings and Coles1991), and has been suggested to reflect a deviation from an active template elicited by preceding stimuli (Näätänen, Reference Näätänen1995). In the present experiments, such an attentional mechanism would involve a succession of different processing steps, such as analyzing the visual input, keeping this information active in working memory, and matching each incoming word with the current template (D’Arcy & Connolly, Reference D’Arcy and Connolly1999). When encountering contradictory information, a representational mismatch would elicit the N2b, and would therefore reflect an unsuccessful mapping between the lexical properties of the input word and the pictorial representation.
Considering the simple contexts, such a strategy could straightforwardly explain the observed results. For instance, after having processed the context picture B, the pictorial representation contains information about the colour (‘blue’) and the presented objects (‘triangles’), and probably the fact there are blue triangles inside and outside of a circle. This representation needs to be upheld in memory until matching or mismatching information will be encountered, and is then taken out of the memory buffer. When arriving at the adjective, the colour ‘blue’ matches part of the representation, whereas the colour ‘red’ leads to a mismatch between the pictorial representation and the target word. Regarding the complex contexts, the pictorial information contains information about both colours (‘blue’ and ‘red’), the presented objects, and information about the subsets (the red or the blue one). Under a purely attention-driven account, it is somewhat difficult to explain why, on the colour adjective, complex contexts do not elicit comparable mismatch effects as simple contexts. One explanation for this finding could be that the actually encountered colour information is part of the current pictorial representation and refers to one of the subsets involved, hence, no mismatch is perceived. However, under such an account, it is not entirely clear why any effects were observed at the preposition at all. At this position, the encountered preposition types are also included in the current representation. Nevertheless, clear-cut mismatch effects were observed for the complex contexts. Moreover, the current negativity reliably occurred across different tasks. Whereas the picture–question verification task explicitly focused participants’ attention towards the mapping between the picture and the sentence, fulfilment of the probe detection task was principally possible without instantiating any relation between the picture and the question. 
In addition, the post-experiment questionnaire revealed that the majority of participants had read the whole questions without focusing on specific words, thus indicating that the observed ERP pattern may relate to processes involved in question processing instead of reflecting a purely attentionally driven match.
An alternative explanation could thus be that it is indeed an N400 reflecting higher-level semantic processing. The early onset of the N400 could have been evoked by a particularly high semantic expectancy elicited by the context picture (Vissers et al., Reference Vissers, Kolk, van de Meerendonk and Chwilla2008). According to this view, a presented picture elicits a strong semantic expectancy that a following sentence will correctly describe the pictorial content. This expectancy is violated when an incoming word is in conflict with a correct interpretation. However, note that, in the present experiment, questions instead of declarative sentences were presented. As a consequence, there is no principled reason to assume that comprehenders expect a ‘yes’ rather than a ‘no’ answer. In semantic theory, questions have sometimes been taken to denote sets of possible answers (Hamblin, Reference Hamblin1973). Thus, the yes/no question Are all triangles blue? is taken to denote the set {all triangles are blue, it is not the case that all triangles are blue}. Under this approach, the questions in the present experiments should denote exactly the same set of propositions independently of the utterance context. Given that there is currently little ERP evidence on the on-line comprehension of questions, we can only speculate why ‘no’ answers elicited an N400. One explanation would be that negative answers are generally more complex, as they involve an additional negation step (Kaup, Lüdtke, & Zwaan, Reference Kaup, Lüdtke and Zwaan2006). Negative answers would thus involve an instantiation of a negative tag into the actually processed meaning representation (Carpenter & Just, Reference Carpenter and Just1975), or the activation of two different states of affairs (both the negated and an actual state). 
For instance, when processing the door is not closed, both a closed and an open door could be assumed to be simulated (Lüdtke, Friedrich, De Filippis, & Kaup, Reference Lüdtke, Friedrich, De Filippis and Kaup2008). Applied to the present scenario, such an account would predict that, for instance, when processing ‘blue’ after context C, both the originally shown context C and the correct alternative would need to be simulated, whereas following context B, only the blue context picture would be simulated.
In ERP studies on negation processing, it is as yet unresolved whether negative sentences are processed in an immediate or delayed fashion. While studies employing verification paradigms report absent (Fischler et al., Reference Fischler, Bloom, Childers, Roucos and Perry1983) or delayed effects (Lüdtke et al., Reference Lüdtke, Friedrich, De Filippis and Kaup2008; Wiswede, Koranyi, Müller, Langer, & Rothermund, Reference Wiswede, Koranyi, Müller, Langer and Rothermund2013), other studies report immediate N400 effects indistinguishable from non-negated controls (Nieuwland & Kuperberg, Reference Nieuwland and Kuperberg2008). In accord with the latter findings, the brain response to negative answers in the present experiments already occurred in an early time-window. This is expected under Nieuwland and Kuperberg’s pragmatic account of negation processing because, for our polar questions, both answer alternatives were pragmatically equally licensed.
An alternative explanation for the preference for ‘yes’ answers could be that the semantic processing of questions is constrained by predictive processes. In that case, contextually given cues elicit the prediction that a speaker asking a question will refer to the already presented information instead of one of the numerous possible alternatives. For instance, a picture including only blue objects would elicit the prediction that the speaker will ask about the properties already specified instead of introducing a new reference set (e.g., a set of red objects). Such a proposal is generally compatible with previous work on lexical predictability in declarative clauses (Brothers, Swaab, & Traxler, Reference Brothers, Swaab and Traxler2015; Nieuwland, Reference Nieuwland2016; Roehm, Bornkessel-Schlesewsky, Roesler, & Schlesewsky, Reference Roehm, Bornkessel-Schlesewsky, Roesler and Schlesewsky2007; Vespignani, Canal, Molinaro, Fonda, & Cacciari, Reference Vespignani, Canal, Molinaro, Fonda and Cacciari2010). For instance, Roehm et al. (Reference Roehm, Bornkessel-Schlesewsky, Roesler and Schlesewsky2007) manipulated lexical predictability and semantic relatedness in an antonym study, and observed that both of these factors have an impact on processes within the N400 time-window. Whereas semantic relatedness affected the N400, lexical predictability (e.g., of the word white in a sentential context like The opposite of black is …) elicited an early positive deflection in the same time-window. Co-occurring effects in the N400 time-window may thus reflect the simultaneous operation of qualitatively distinct processes, one related to prediction and one to the semantic evaluation of the currently processed word (see Lau, Holcomb, & Kuperberg, Reference Lau, Holcomb and Kuperberg2013, for discussion).
Applied to the present findings, a processing advantage of unambiguous ‘yes’ answers would signal that correct answers had already been predicted prior to arriving at the critical region. Thus, specific conceptual features of a following restrictor (e.g., a specific colour of the adjective) would already be added to the current representation of each newly incoming word, and would then be evaluated as soon as a compatible or incompatible input is encountered. If such an approach is on the right track, it still needs to be determined why questions do not differ from declarative clauses with respect to their expected restrictors. In particular, in the Roehm et al. (Reference Roehm, Bornkessel-Schlesewsky, Roesler and Schlesewsky2007) example, a specific lexical item (i.e., white) is predicted, and any other item would render the whole statement false. By contrast, questions do not assert statements and do not have truth-values that could be falsified by unexpected items. Therefore, an interesting task for follow-up studies would be to further investigate differences between declaratives and questions by manipulating different linguistic and non-linguistic context types. Thus, while the underlying mechanism for the preference for ‘yes’ answers in questions still needs to be determined, both of the above-mentioned accounts are compatible with the finding of ERP differences in the N400 time-window (see also Hunt et al., Reference Hunt, Politzer-Ahles, Gibson, Minai and Fiorentino2013, for comparably early effects in a picture–sentence verification paradigm on the processing of the quantifier some).
In sum, the present negativity is partly compatible with an N2b account, but could alternatively reflect an N400 component. According to the former view, the present pattern of results reflects an attentionally driven representational match between the context picture and the target question, whereas under an N400 account, the negativity reflects increased processing cost associated with the generation of a negative answer.
5.2. The late positivity
As already mentioned in the discussion of Experiment 1, a long-standing debate in the literature concerns the question whether late positivities in linguistic tasks uniquely reflect linguistic processing (Friederici et al., Reference Friederici, Hahne and Saddy2002; Osterhout & Holcomb, Reference Osterhout and Holcomb1992) or whether they need to be related to the P300 family (Coulson et al., Reference Coulson, King and Kutas1998). Recently, it has been proposed that late positivities involved in well-formedness evaluations can also be captured by referring to the P300 (Bornkessel et al., Reference Bornkessel-Schlesewsky, Kretzschmar, Tune, Wang, Genç, Philipp, Roehm and Schlesewsky2011). According to this view, late positivities in linguistic tasks are related to binary decision processes. For instance, classifying linguistic stimuli into well-formed and ill-formed structures requires such a binary process. According to so-called ‘P600-as-P3’ approaches, stimuli that are highly relevant for fulfilling the requirements of a given task often elicit late positivities that reflect increased attentional demands (Sassenhagen, Schlesewsky, & Bornkessel-Schlesewsky, Reference Sassenhagen, Schlesewsky and Bornkessel-Schlesewsky2014). Instead of being language-related, the P300 is thus supposed to reflect domain-general processing strategies.
Our results are compatible with such an approach. The verification task in Experiment 1 involved a binary decision process, and the truth evaluation could have increased the attention towards the picture–question mapping. The finding that the P300 already occurred at the mismatch positions indicates that a truth evaluation was made as soon as the anomaly became available. In the first experiment, the task-relevant decision could already have been made during the processing of the critical sentential positions (i.e., on the adjective for contexts B and C, and on the preposition for contexts A and D), as, at those positions, all relevant information for task fulfilment (e.g., the pictorial representation as well as the linguistic information) was already available. Short reaction times for the clause-final verification task additionally indicate that the truth evaluation had been made prior to the clause-final response. By contrast, in Experiment 2, the actual probe detection could only have been made after the final word of the sentence had been processed. Up to the presented probe, participants could not guess which part of the sentence would be relevant for task fulfilment. As a consequence, increased processing costs in terms of generally longer reaction times were found.
As a whole, the present positivity varied as a function of the attention towards the picture–question match across experiments and is thus compatible with current P600-as-P3 approaches.
6. Conclusion
The present study shows that pictorial context information may constrain early stages of compositional-semantic processing. Extending previous studies, the current results also indicate that incremental semantic evaluation depends on the reliability of contextual information. In the present experiments, a semantic commitment was only made when contextual information was unambiguous and thus did not carry the risk of a later semantic revision. Finally, our results indicate that the experimental task used in studies on semantic processing can affect the overall pattern of ERP results. In line with previous studies, a mismatch between contextual and semantic information triggered an early negativity, which was not affected by differences in attentional demands across the two tasks. By contrast, a late positivity accompanying this negativity was only observed when the task required increased attention towards the mapping between contextual and semantic information.
Supplementary materials
For supplementary material for this paper, please visit <http://dx.doi.org/10.1017/langcog.2016.30>.