Learning the grammar of another language is a challenging task, especially for late second language (L2) learners (e.g. DeKeyser, 2005; Johnson & Newport, 1989). While grammatical accuracy in L2 production generally improves with overall proficiency, a degree of variability seems to persist even in advanced L2 users, at least for some aspects of grammar (e.g. Lardiere, 1998; Trenkic, 2009; White, 2003). One of the central questions of L2 research is why the grammatical production of non-native speakers differs, often in systematic ways, from that of native speakers, and why some differences persist even in highly proficient L2 users. This issue is complicated further by the fact that, despite their persistent non-targetlike production, some aspects of L2 morphosyntax can be comprehended in a targetlike manner (e.g. Tokowicz & MacWhinney, 2005).
In this paper, we focus on English articles (the, a) as an aspect of morphosyntax known to be especially difficult for L2 learners who come from language backgrounds without articles. In their production these L2 speakers often omit articles (e.g. *Pass me mug), or choose an inappropriate article for the context (e.g. Pass me the mug in the context of two identical mugs) (e.g. Ionin, Ko & Wexler, 2004; Jarvis, 2002; Luk & Shirai, 2009; Ringbom, 1987; Trenkic, 2002, 2007; Young, 1996; Žegarac, 2004). Much less is known about how these grammatical constructions are processed by L2 speakers in comprehension. Here we explore how adult, intermediate Mandarin learners of English process English articles using fine-grained measures of online language comprehension. We sought to determine whether a population of L2 speakers that is known to have persistent difficulties with English article production also experiences difficulties with this aspect of morphosyntax in real-time comprehension.
Non-targetlike processing of L2 grammar
A vast body of literature suggests that late second language learners are often unable to process L2 morphosyntactic information in a targetlike manner (e.g. Chen, Shu, Liu, Zhao & Li, 2007; Hahne & Friederici, 2001; Jiang, 2004, 2007; Johnson & Newport, 1989; Lew-Williams & Fernald, 2010; Ojima, Nakata & Kakigi, 2005; Sabourin, Stowe & de Haan, 2006; Sanders & Neville, 2003; Sanders, Neville & Woldorff, 2002; Su, 2001a, 2001b; Weber-Fox & Neville, 1996).
Structures that are difficult to process in comprehension are often the same ones with which L2 users struggle in production. For example, L1 Chinese/L2 English speakers, who have difficulties with plural noun marking (cats) and subject–verb agreement (the cat is asleep, the cats are asleep) in production at even advanced proficiency levels (Lardiere, 1998), are also less sensitive to plural marking and number agreement violations in comprehension, as shown in both self-paced reading tasks (Jiang, 2004, 2007) and on event-related potential (ERP) measures (Chen et al., 2007). Furthermore, when learners are trained how to process an aspect of L2 grammar in comprehension, this often results in gains not only in comprehension but also in production (VanPatten, 1996, 2002; VanPatten & Cadierno, 1993). This suggests that problems in L2 production may be related to the processing strategies used in comprehension, which may lead to the development of non-targetlike underlying representations of the L2 grammar (Kroll & Dussias, 2004).
There are two main reasons why L2 grammar may not be processed in a targetlike way. The first is language transfer (e.g. Bates & MacWhinney, 1989; Gass & Selinker, 1992; Odlin, 1989; Sharwood Smith, 1980): learners’ extensive experience with their first language (L1) may influence how they process aspects of L2 grammar. For example, Chinese, Japanese and Korean learners of L2 English are well-known learner populations from L1 backgrounds without articles who experience considerable problems with appropriately using these grammatical elements in English (e.g. Luk & Shirai, 2009). It has been argued that such L2 users will have learned, through the experience with their article-lacking L1s, to infer referential definiteness from other sources, such as discourse, lexical information, and broader context (i.e. pragmatic affordances). When they encounter English articles, lexical and pragmatic cues may overshadow the article and lead L2 users to ignore it, thus blocking the creation of new associations and representations as a result of “automatically learned inattention” (Ellis, 2006, p. 178).
In addition to the L1-specific transfer effects, L2 processing may generally be less automatic and more resource-draining than L1 processing. Proposals such as the Shallow Structure Hypothesis (Clahsen & Felser, 2006) argue that, compared to native speakers, L2 users are less able to use morphosyntax in real-time sentence processing and that they compensate by relying more extensively on lexical, pragmatic and contextual cues. While the original hypothesis makes this claim for long distance dependencies only, some studies suggest that it may be applicable to simpler structures as well (e.g. Roberts, Gullberg & Indefrey, 2008; Scherag, Demuth, Rösler, Neville & Röder, 2004).
In sum, previous literature suggests that problems that are often observed in L2 grammar may be associated with more extensive reliance on lexical and contextual elements, either as a consequence of L1 transfer (Luk & Shirai, 2009) or a more general L2 processing effect (Clahsen & Felser, 2006).
Targetlike processing of L2 grammar
Not all grammatical processing in a second language appears to be non-targetlike. Age is one factor known to impact on how L2 is processed, with those starting at a younger age usually exhibiting more targetlike patterns of processing than late starters (e.g. Weber-Fox & Neville, 1996). The effects of age, however, are modulated by the context of learning: unlike in naturalistic, immersion environments (e.g. after immigrating to a country where the L2 is spoken), the age of first exposure does not seem to play such a central role in instructed, foreign-language settings (e.g. Muñoz, 2008). Indeed, some studies suggest that it is the achieved level of proficiency, rather than how or when the learning happened, that makes the most substantial difference in how L2 is processed (e.g. Perani, Paulesu, Galles, Dupoux, Dehaene, Bettinardi, Cappa, Fazio & Mehler, 1998; see also Steinhauer, White & Drury, 2009, for a review).
In addition to the above, L2 processing can be modulated by structural similarities between the first and the second language via language transfer (see reviews in Van Hell & Tokowicz, 2010, and Tolentino & Tokowicz, 2011). Similarities in L2 and L1 processing have been predominantly reported for constructions that are similar in the two languages. For example, even beginner adult L1 English/L2 Spanish learners show implicit online sensitivity to copula omission in Spanish sentences such as *Su abuela cocinando muy bien “*His grandmother cooking very well” (the correct Spanish sentence is Su abuela está cocinando muy bien “His grandmother is cooking very well”), as measured by ERPs (Tokowicz & MacWhinney, 2005).
In contrast, non-targetlike processing is often observed for constructions that are formed differently in the L1 and the L2. In the same study, English learners of Spanish showed little sensitivity to determiner number agreement, e.g. el niño/ los niños “the-sg boy/the-pl boys”. While English does have number agreement on determiners, it marks it only on demonstratives (this/these), not on articles. This is another case where overshadowing and blocking have been invoked as an explanation: Tokowicz and MacWhinney (2005) propose that because English articles do not provide number information, English speakers learn to actively suppress any expectations regarding the number of the following noun when they encounter an article in their L2 Spanish.
Grammatical structures that are unique to the second language are an interesting special case. While some results suggest that such structures are difficult to process in a targetlike way (see Jiang, 2004, 2007; Chen et al., 2007 cited above), other studies suggest that this can be achieved. For example, ERP studies on grammatical gender show that speakers of L1 English (with no grammatical gender on articles and nouns) can show sensitivity to violations in gender agreement in L2 Spanish (Tokowicz & MacWhinney, 2005), L2 French (Frenck-Mestre, 2004) and an artificial language with this category (Morgan-Short, Sanz, Steinhauer & Ullman, 2010, but see contradictory results for L2 Dutch in Sabourin, 2003). In self-paced reading studies, Jackson (2007) and Jackson and Dussias (2009) show that advanced L1 English learners of L2 German can process nominal case marking, not present in their L1, in a way not different from German native speakers.
An explanation of such targetlike processing of unique-to-L2 structures proposed by Tokowicz and MacWhinney (2005) is based on the principles of the Competition Model (MacWhinney, 1987, 2005; MacWhinney & Bates, 1989). These authors argue that for structures unique to the L2, second language processing is not affected by either L1 transfer (there is nothing to transfer) or online competition, thus allowing for targetlike patterns to emerge. The Competition Model thus makes a different prediction from both the overshadowing and blocking account (Luk & Shirai, 2009) and from the Shallow Structure Hypothesis (Clahsen & Felser, 2006), specifically with regard to processing of unique-to-L2 structures in comprehension.
In the study presented below we investigated the predictions of these different theoretical accounts using the visual-world eye-tracking paradigm, a paradigm which allows assessing the processing of well-formed sentences. So far, targetlike processing of unique-to-L2 structures has been observed primarily in experiments measuring participants’ sensitivity to grammatical violations (for example, ERPs in response to, or reading times for, ungrammatical sentences). Being able to detect violations in ungrammatical sentences, however, is not the same as being able to facilitatively utilise grammatical information in the processing of well-formed sentences. In fact, the results of studies which assess L2 processing of well-formed sentences have been either contradictory or inconclusive. For example, unlike the ERP studies reviewed above (e.g. Frenck-Mestre, 2004; Tokowicz & MacWhinney, 2005), visual-world eye-tracking studies suggest that the same learner populations may not actually be able to utilise grammatical gender information in L2 in real time (e.g. Grüter, Lew-Williams & Fernald, 2012; Lew-Williams & Fernald, 2010).
Here we specifically focus on English article comprehension in intermediate non-native speakers from an article-lacking L1 background, in order to examine how a population of learners known to have persistent difficulties with these constructions in production uses them in online comprehension. Based on the Competition Model, one might expect targetlike processing of English articles by speakers of article-lacking languages to be possible, despite the frequently attested difficulties in production even at advanced stages of proficiency (e.g. García Mayo & Hawkins, 2009; Goad & White, 2004; Huebner, 1983; Ionin et al., 2004; Jarvis, 2002; Master, 1990; Ringbom, 1987; Tarone, 1985; Thomas, 1989; Trenkic, 2002, 2007; Žegarac, 2004). This would contrast with the predictions of the overshadowing and blocking proposal by Luk and Shirai (2009) and of the Shallow Structure Hypothesis (Clahsen & Felser, 2006), which would suggest more extensive reliance on lexical and pragmatic factors at the expense of grammatical information.
We first provide a brief description of the English article system and its role in referential processing, as revealed by previous visual-world eye-tracking studies.
English articles in referential expressions: Evidence from the visual-world paradigm studies
In definite and indefinite referential expressions in English (e.g. the mug, a mug) the head noun (with any complements and modifiers) provides a description of the intended referent, while the articles signal its definiteness status. The definite article signals that the referent is definite, i.e. that it can be uniquely identified (it exists and is unique) in the context, as in (1) (see Hawkins, 1991; Lyons, 1999):
(1) Pass me the mug. [e.g. “the only mug that is present”]
The indefinite article signals that the referent is not definite, i.e. that it cannot be uniquely identified. This can be either because the referent is not unique in the context, as in (2) below, or because it does not yet exist in a pragmatically delimited domain mutually manifest to the speaker and the hearer, as in (3).
(2) Pass me a mug. [“one of the mugs”]
(3) Pass me a mug. [“whatever satisfies the description mug”]
Traditionally, articles (and the linguistic class of determiners to which they belong) have been seen as doing the principal work in “determining (i.e. restricting or making more precise) the reference of the noun phrase in which they occur” (Lyons, 1977, p. 452). More recent psycholinguistic evidence suggests, however, that language comprehenders utilise a much broader array of both linguistic and non-linguistic information, as it becomes available, to restrict the range of potential referents and resolve reference at the earliest opportunity. Most of this evidence comes from studies using the visual-world eye-tracking paradigm. In this paradigm participants typically look at pictures or displays of objects and listen to sentences related to them. The looks they initiate towards the objects are closely time-locked to the linguistic input, so this research offers valuable insights into sentence processing mechanisms.
For example, when presented with spoken instructions of the type Touch the plain red square (Eberhard, Spivey-Knowlton, Sedivy & Tanenhaus, 1995) while looking at a visual display, listeners use each modifier as it is encountered to narrow down the referential domain and initiate looks to the target as soon as sufficient information is accumulated (e.g. after hearing plain if only one of the objects was plain, after red if more than one object was plain but only one was red, and only after square if two objects were both plain and red but only one was also square).
Crucially, the incrementality and predictiveness in reference resolution are not limited to the accumulation of information within the referring noun phrase only. Information extracted from other words preceding the noun phrase is also used to predict which entity will be referred to. For example, on hearing an instruction with a prepositional phrase such as Put the whistle inside the can, at inside, listeners’ attention is taken away from non-container objects in a scene and directed towards container objects. Further, when only one container object is present in the display (e.g. a can), the looks towards the container start to diverge from the looks to other objects as early as the offset of the preposition inside (Chambers, Tanenhaus, Eberhard, Filip & Carlson, 2002). Similarly, on hearing the sentence The boy will eat the cake in a scene with several objects where a cake is the only edible item, listeners start to fixate the cake during the verb eat (Altmann & Kamide, 1999), i.e. before the onset of the noun phrase. In fact, even broader general knowledge of what is likely to happen in a particular situation can influence the timecourse of reference resolution. For example, Kamide, Altmann and Haywood (2003) show that on hearing ride in the sentences The man will ride the motorbike and The girl will ride the carousel while looking at a scene depicting these protagonists and objects, listeners show more anticipatory looks towards the motorbike after the man will ride than after the girl will ride, and more looks towards the carousel after the girl will ride than after the man will ride.
This evidence suggests that reference resolution is a highly incremental, predictive and cumulative process: information extracted from lexical items both within and outside of a referential phrase, together with object affordances and the general knowledge of what might happen in a given situation, is utilised in real time to constrain referential domains, and all of these sources contribute towards successful reference resolution. Thus, in situated language use, it is possible to identify the intended referent even in the absence of articles (Brown, 1973; Hawkins, 2004). However, even though this research indicated that English articles do not do the principal work in restricting the reference of the noun phrase in which they occur, as suggested by traditional accounts, it remained unclear what their specific role in online processing was, if any. The first study to shed light on this question was Chambers et al. (2002).
Chambers et al. (2002, Experiment 2) manipulated the definiteness status of the referential phrase in the instructions presented to the participants. For example, participants heard either Pick up the cube and put it inside the can or Pick up the cube and put it inside a can. The visual display accompanying these instructions (Figure 1) contained a cube, two cans of different sizes, one other container object and two unrelated non-containers.Footnote 1 Additionally, the cube was either small so that it could fit into either of the cans (two-compatible referent condition), or large so that it could fit only into the larger can (one-compatible referent condition). The linguistic manipulation, i.e. the definiteness of the nominal phrase (NP) referring to the target, was crossed with object affordances (the number of potential goal referents). The results showed that participants indeed utilised in real time the information signalled by articles to anticipate the forthcoming referent, in that their looks towards the target diverged faster from other possible referents when the information signalled by the article matched the object affordances than when it mismatched them. Specifically, on hearing inside the can, where the definite NP signals that the referent is uniquely identifiable, participants resolved reference sooner when there was a single pragmatically appropriate object in the display (large cube fitting only the larger can) than when there were two (small cube fitting both cans). At the same time, on hearing inside a can, where the indefinite NP can implicate non-uniqueness, reference resolution was facilitated when there were two objects compatible with the instruction compared to when there was only one compatible object in the display.
These findings clearly demonstrate that while non-linguistic information such as object affordances can be exploited early to predict which entity will be referred to, in English this is further influenced by the use of articles signalling the definiteness status of the referential expression.

Figure 1. Experimental display from Chambers et al. (2002), Experiment 2.
In sum, psycholinguistic research has shown that articles are not the be-all-and-end-all in determining the reference of a noun phrase, but it has also demonstrated that for native speakers of English they do constrain the set of candidates considered as a potential referent. The question we asked in the present study was whether speakers of English as a second language, particularly when they come from L1s without articles, are also able to utilise potentially informative articles to circumscribe referential domains in real time. Specifically, do L2 speakers from article-lacking L1 backgrounds rely on lexical and pragmatic information and ignore articles, as the overshadowing and blocking account (Luk & Shirai, 2009) and the Shallow Structure Hypothesis (Clahsen & Felser, 2006) would predict? Or could these morphosyntactic cues be processed efficiently by both native and non-native speakers, as long as the learners’ L1 does not have a different morphosyntactic realisation of this feature, as the Competition Model (MacWhinney, 2005) would predict?
We explored these questions here by testing Mandarin learners of English. Mandarin Chinese does not have articles and many referential expressions appear in a bare nominal form. The definiteness status of such expressions can be computed through linguistic and non-linguistic information in the context (e.g. lexical information, object affordances) along the same lines as those described above for English (see Chen, 2004; Luk & Shirai, 2009). As in the studies described above, the online use of pragmatic, lexical and morphosyntactic information was investigated using the visual-world paradigm, which allows studying real-time language processing in a relatively naturalistic setting. The sentential stimuli from Experiment 2 in Chambers et al. (2002) were adapted to Clipart-based scenarios instead of real objects (Figure 2). We recorded participants’ eye movements while they viewed displays and simultaneously listened to the sentences in (4) and (5) below, related to the scenes. The sentential stimuli manipulated the definiteness status of the target NP, while the visual displays manipulated pragmatic affordances in the scene.
(4) The pirate will put the cube inside the can.
(5) The pirate will put the cube inside a can.

Figure 2. Example visual stimuli for (a) the two-compatible referent condition, and (b) the one-compatible referent condition.
The manipulation of pragmatic affordances was similar to Chambers et al. (2002) in that in the two-compatible referent condition (Figure 2a) the cube could fit both cans, whereas in the one-compatible referent condition (Figure 2b) it could fit only one of them. This manipulation was achieved by varying the properties of the container (one closed and one open, or both open) rather than the size of the cube. If non-native speakers are able to make use of L2 articles in real time due to the lack of competition from a similar morphosyntactic feature in their L1, we would expect to replicate the original findings from Chambers et al. (2002). In other words, we would expect looks towards the target to start to diverge from looks towards the competitor faster in the linguistically and pragmatically matched conditions (the + one-compatible referent condition; a + two-compatible referent condition) than in mismatched conditions (the + two-compatible referent condition; a + one-compatible referent condition). If, however, non-native speakers from an article-lacking L1 over-rely on pragmatic affordances and cannot successfully process articles in real time, we might expect that they would resolve reference faster when there is only one available referent in the scene (Figure 2b) than when a choice between two referents needs to be made (Figure 2a), irrespective of the definiteness status of the goal NP.
With its ability to measure the fine-grained time course of online comprehension, the visual-world paradigm has been crucial in resolving theoretical debates related to lexical access in a second language (e.g. Chambers & Cooke, 2009; Kaushanskaya & Marian, 2007; Marian & Spivey, 2003; Marian, Spivey & Hirsch, 2003; Spivey & Marian, 1999; Weber & Cutler, 2004). While this paradigm has the potential to do the same for debates in L2 grammatical processing, only a handful of studies have so far applied it in this domain, focusing on pronoun resolution in L2 German (Ellert, 2011; Wilson, 2009) and grammatical gender processing (in L2 Spanish: Grüter et al., 2012; Lew-Williams & Fernald, 2010; in L2 German: Hopp, 2012) (see Dussias, 2010, and Roberts, 2012, for reviews). This is the first visual-world study that focuses on English as a second language, and which uses this paradigm to explore how L2 users utilise both grammatical (articles) and pragmatic (object affordances) information in online language comprehension.
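The two sets of predictions can be laid out as a small truth table. The sketch below only illustrates the logic of the preceding paragraph: the function names and the account labels (`articles_used`, `pragmatics_only`) are hypothetical and do not come from the study, and "faster reference resolution" is treated as a yes/no outcome purely for exposition.

```python
from itertools import product

ARTICLES = ("the", "a")
N_COMPATIBLE = (1, 2)  # number of compatible goal referents in the display


def is_match(article: str, n_compatible: int) -> bool:
    """True when the article's uniqueness signal fits the display."""
    return (article == "the" and n_compatible == 1) or (
        article == "a" and n_compatible == 2
    )


def predicts_faster_resolution(account: str, article: str, n_compatible: int) -> bool:
    """Whether a (hypothetically labelled) account predicts faster resolution."""
    if account == "articles_used":
        # Articles are processed in real time: matched conditions are faster.
        return is_match(article, n_compatible)
    if account == "pragmatics_only":
        # Articles are ignored: a single compatible referent is always faster.
        return n_compatible == 1
    raise ValueError(f"unknown account: {account}")


def prediction_table(account: str) -> dict:
    """Map each (article, n_compatible) condition to the account's prediction."""
    return {
        (article, n): predicts_faster_resolution(account, article, n)
        for article, n in product(ARTICLES, N_COMPATIBLE)
    }
```

Under this encoding the two accounts agree for the definite conditions and come apart only for the indefinite ones, which is where the design can distinguish them.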
Method
Participants
Fifty-seven L1 Mandarin/L2 English speakers and 59 native English speakers, students at the University of York, UK, were recruited through posted notices. Data from four L2 participants were removed because their accuracy scores in the eye-tracking experiment were low (below 60%). Data from another five L2 participants and from three L1 participants were also removed to maintain a full counterbalancing of the stimuli; these participants had the lowest accuracy scores in their respective groups/lists. Data analyses were conducted on the remaining 48 L2 participants and 56 L1 participants.
We recruited Mandarin speakers of intermediate proficiency in English (IELTS score of at least 6, which is necessary for admission to a UK university), as this population is reported to have difficulties with English articles (e.g. Díez-Bedmar & Papp, 2008; Robertson, 2000; Trenkic, 2008). The criteria for inclusion in the study were that participants were Mandarin speakers and not raised as English–Mandarin bilinguals. At the beginning of the study, we administered two computerised tests assessing English language proficiency (Quick Placement Test (QPT), 2001) and language background. The average QPT score was 58.25 (SD = 4.95, range of 44–65%).Footnote 2 The participants’ average age of first L2 exposure was 11.5 years (range 3–14).Footnote 3 Two participants reported starting to learn English before the age of six, and six before the age of 10. These were all, however, in non-immersion settings; there was no correlation between the age of first exposure and the proficiency score in our sample (r = .04, p > .05). Most L2 participants were newly arrived international students, with a median length of stay in an English-speaking country of two months; eight participants, however, reported being in the country for more than a year (range: from 4 days to 5 years). There was no correlation between the length of stay and the proficiency score (r = .05, p > .05).
Participants received course credit or were remunerated for their time.
Materials
Visual materials consisted of Clipart-based pictures such as Figure 2 above. Each picture consisted of a human agent and six objects arranged in a roughly circular display, with one object in the centre and five objects on the periphery. On critical trials, three objects were containers (e.g. the cans and the basket in Figure 2), and three were non-containers (cube, pencil, rope). Of the three non-container objects, the object in the centre of the scene was the theme object of the description (cube; see below for the description of auditory stimuli). The two other non-container objects were not related to the description in any way. Of the three container objects, two were the potential goal referents (e.g. the two cans). The third container object was of a different type and served as a distractor to reduce the likelihood of participants expecting that they should make a decision between the pair of identical containers. The relative positions of the two potential goal referents and the distractor were counterbalanced. The potential goal referents were separated by at least one object on one side and two on the other side.
There were 24 experimental pictures in total, 12 in each visual condition. In the two-compatible referent condition, the two goal exemplars were identical, and both could serve as the target (Figure 2a). In the one-compatible referent condition (Figure 2b), the two goal exemplars differed in that one of them was depicted as closed (e.g. the can above) or full, so that in the context of the description it could not serve as the target. The pictures were 800 × 600 pixels in size.
There were 24 experimental descriptions in total, 12 in each definiteness condition. All experimental items were of the form “The [agent] will put the [theme] inside the/a [goal]” (e.g. for Figure 2: The pirate will put the cube inside the/a can). The sentences were digitally recorded by a native speaker of British English (GTMA) in a sound-attenuated booth, sampling at 44.1 kHz. All sentential and visual stimuli are provided in the appendices.
The definiteness of the noun phrase (the/a can) was crossed with the object affordances in the scene (one vs. two potential goal referents), resulting in four experimental conditions. Four lists of trials were constructed, each containing 12 experimental trials, three in each condition (two-compatible referent, indefinite; two-compatible referent, definite; one-compatible referent, indefinite; one-compatible referent, definite). Participants were randomly assigned to a list. Thus, each participant was presented with three items in each of the four conditions, but they never saw the experimental pictures in more than one experimental condition. Across the four lists, each picture was presented with each definiteness condition.
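One way to satisfy these counterbalancing constraints is a rotation of picture groups across lists. The sketch below is a hypothetical construction, not the assignment actually used in the study: the picture identifiers, the grouping into sets of three, and the function name `build_lists` are all assumptions made for illustration.

```python
from collections import Counter

VISUAL_CONDITIONS = ("two-compatible", "one-compatible")
N_LISTS = 4
PICTURES_PER_VISUAL_CONDITION = 12
GROUP_SIZE = 3  # three items per condition per list


def build_lists():
    """Return four lists of (picture, visual_condition, definiteness) trials."""
    lists = [[] for _ in range(N_LISTS)]
    for visual in VISUAL_CONDITIONS:
        # Hypothetical picture identifiers, 12 per visual condition.
        pictures = [f"{visual}-{i:02d}" for i in range(PICTURES_PER_VISUAL_CONDITION)]
        groups = [
            pictures[g * GROUP_SIZE:(g + 1) * GROUP_SIZE] for g in range(N_LISTS)
        ]
        for k in range(N_LISTS):
            # Group k is heard with "the" in list k ...
            for picture in groups[k]:
                lists[k].append((picture, visual, "definite"))
            # ... and the next group is heard with "a" in the same list,
            # so every group gets each definiteness once across lists.
            for picture in groups[(k + 1) % N_LISTS]:
                lists[k].append((picture, visual, "indefinite"))
    return lists


def condition_counts(trial_list):
    """Count trials per (visual_condition, definiteness) cell in one list."""
    return Counter((visual, definiteness) for _, visual, definiteness in trial_list)
```

With this rotation, each list contains 12 experimental trials, three per cell of the 2 × 2 design, no picture occurs twice within a list, and each picture is paired with both the definite and the indefinite version across the four lists.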
In addition to the experimental items, there were 48 filler items in each list. A total of 36 filler instructions were presented with 18 filler pictures. Similar to the experimental pictures, the filler pictures contained a human agent and six objects (a mix of containers and non-containers, with some displays containing one and some three exemplars of the same container). The filler sentences were similar to the experimental items with half of the items containing the preposition inside and half beside, and half presented with an indefinite, and half with a definite goal NP (counterbalanced across prepositions). The remaining 12 filler sentences were presented with the 12 experimental pictures (six from the one-compatible referent condition, six from the two-compatible referent condition), so that all pictures, both filler and experimental ones, appeared twice in the experiment. The filler sentences with experimental pictures used the preposition beside and none referred to the theme or goal referents of the experimental sentence. For example, for Figure 2a the filler sentence was The pirate will put the basket beside the rope. Filler trials were randomly intermixed with the experimental trials.
Procedure
Mandarin native speakers were tested in two sessions. In Session 1, participants completed two computer-based tasks: the Quick Placement Test (2001) measuring English language proficiency, and a language background questionnaire. In Session 2 participants completed the eye-tracking experiment. English native speakers were tested in one session, in which they completed the eye-tracking experiment. All participants signed a consent form and were tested individually in a quiet room.
For the eye-tracking experiment, participants were seated in front of a 22-inch display monitor, with their eyes approximately 60 cm away from the monitor. They wore an EyeLink II head-mounted eye-tracker, sampling at 250 Hz. The auditory stimuli were presented via two loudspeakers located at each side of the display screen.
Participants were told that they would see some pictures and hear descriptions of what was going to happen in each picture. Using an example item similar to those presented in the experiment (but not itself presented during the experiment), they were instructed to mouse-click on the location on the screen where the described object would end up.
A drift correction dot was presented at the onset of each trial. After the participant looked at the dot, it was replaced by the visual scene, which stayed on the screen for 4000 ms, after which the auditory stimulus was played over the loudspeakers. The picture stayed on the screen until the mouse-click (or for 2000 ms post sentence offset, if there was no click). A nine-point calibration procedure was performed after every six trials to ensure the accuracy of measurements (see Footnote 4). There were four practice trials before the main experimental block. There was a short break after 30 trials. The entire session lasted approximately 45 minutes.
Data analysis
Trials on which participants did not click on one of the objects on the screen were excluded from the analyses. This excluded 0.5% and 2% of the total number of experimental trials for L1 and L2 speakers respectively.
We included only the correct trials in the analyses of eye movements. In the one-compatible referent condition this included trials where participants clicked on the pragmatically appropriate goal container (e.g. the open can), whereas in the two-compatible referent condition this included trials where participants clicked on either exemplar of the goal containers. The container on which the participant clicked was labelled as the target, and the second goal exemplar as the competitor. One item was excluded from all analyses due to low accuracy levels in both native and non-native speakers, which was caused by the poor rendering of the competitor container in the one-compatible referent condition (item 10 in Appendix B).
The timing and the location of eye movements were scored beginning with the first fixation made following the onset of the goal referent noun (e.g. can) and ending with the fixation that preceded the mouse click, with the eye movements synchronised to the speech signal on a trial-by-trial basis. Given that the duration of the determiners was 135 ms and 153 ms for a and the, respectively, and that it takes approximately 200 ms to initiate a saccadic eye movement (Matin, Shao & Boff, 1993), the above criteria ensured that only those eye movements that could have plausibly been launched on the basis of the information contained in the determiners and the following speech were included in the analyses.
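The timing argument above can be made concrete with a small Python sketch. It assumes that the noun begins at determiner offset and reads the scoring rule as "fixations that start at or after noun onset and before the click"; the function names and the example trial are hypothetical.

```python
SACCADE_LATENCY_MS = 200                  # approximate saccade initiation time cited above
DETERMINER_MS = {"a": 135, "the": 153}    # determiner durations reported above


def earliest_determiner_driven_launch(determiner):
    """Earliest launch time (ms after noun onset) of a saccade programmed at determiner onset."""
    return SACCADE_LATENCY_MS - DETERMINER_MS[determiner]


def scored_fixations(fixations, noun_onset_ms, click_ms):
    """Keep fixations that start at or after noun onset and before the mouse click.

    Each fixation is a (start_ms, end_ms, object_label) tuple timed from trial start.
    """
    return [f for f in fixations if noun_onset_ms <= f[0] < click_ms]


# Hypothetical trial, for illustration only.
trial = [
    (2800, 2990, "cube"),
    (3010, 3300, "competitor"),
    (3320, 4400, "target"),
    (4510, 4700, "basket"),
]
print(earliest_determiner_driven_launch("a"))    # 65
print(earliest_determiner_driven_launch("the"))  # 47
print(scored_fixations(trial, noun_onset_ms=3000, click_ms=4500))
```

On these assumptions, a saccade triggered by the very start of the determiner could be launched no earlier than 65 ms (a) or 47 ms (the) after noun onset, so fixations scored from noun onset onwards can plausibly reflect the determiner and the speech that follows it.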
Results
Accuracy
Overall, both the native and non-native speakers of English found the task easy. The average accuracy (percentage of mouse clicks on the goal referent) across all four conditions for native speakers was 96%, and for non-native speakers 90%, a reliable difference (F1 (1,102) = 16.08, p < .001; F2 (1,10) = 7.17, p < .02). Both groups of participants were more accurate in the two-compatible referent condition than in the one-compatible referent condition as shown in Table 1 (F1 (1,102) = 28.79, p < .001; F2 (1,10) = 23.90, p < .01). There were no other main effects or interactions.
Table 1. Accuracy rates (percentage of mouse clicks on the target referent) for native (L1) and non-native (L2) speakers, across two types of visual displays and two definiteness conditions.

Eye-movement analyses
For each experimental condition, cumulative proportions of looks to objects in the scene were calculated, across 25 ms windows from the onset of the target noun (e.g. can).
In the one-compatible referent condition, the container that could serve as the goal referent (e.g. the open can in Figure 2b) was labelled as the target, and the container of the same name which pragmatically could not serve as the goal referent (the closed can) as the competitor. In the two-compatible referent condition (Figure 2a), whichever of the two potential goal referents the participant clicked on was labelled as the target, and the other as the competitor.
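One way to compute such cumulative proportions is sketched below in Python. The definition used here (the proportion of trials on which the object has been fixated at least once between noun onset and the end of each 25 ms window) is an assumption, since the text does not spell out the calculation; the function name and the example data are hypothetical.

```python
def cumulative_proportions(first_look_ms, window_ms=25, end_ms=650):
    """Return (window_end_ms, proportion) pairs for one object in one condition.

    first_look_ms holds one entry per trial: the time of the first look to the
    object in ms from noun onset, or None if the object was never fixated.
    """
    n_trials = len(first_look_ms)
    result = []
    for window_end in range(window_ms, end_ms + 1, window_ms):
        looked = sum(1 for t in first_look_ms if t is not None and t < window_end)
        result.append((window_end, looked / n_trials))
    return result


# Hypothetical first-look times to the target on four trials.
target_first_looks = [120, None, 40, 310]
for window_end, proportion in cumulative_proportions(target_first_looks):
    if window_end in (50, 125, 325, 650):
        print(window_end, proportion)
```

The same function would be applied separately to the target and the competitor, with the labels assigned as described above.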
The analyses sought to determine the earliest point in time at which the proportion of looks to the target becomes reliably greater than looks to the competitor. Within-subject analyses of variance (ANOVAs) were conducted separately for each condition to determine this point. We used planned contrasts whereby the difference was considered reliable if it was statistically significant in three consecutive time windows (see Chambers et al., 2002). Arcsine-transformed cumulative proportions of looks to the target and the competitor across the trials were used as the dependent variable. The graphs represent the raw, untransformed proportions for ease of exposition. The solid vertical line marks the noun offset, and the dashed vertical line indicates the earliest point at which looks to the target diverged from looks to the competitor.
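The transformation and the three-consecutive-windows criterion can be sketched in Python as follows. The specific form of the transform, asin(sqrt(p)), is a common convention and is assumed here; the text says only that proportions were arcsine transformed. The indexing of windows from noun onset, the function names and the example p values are likewise hypothetical.

```python
import math


def arcsine_transform(p):
    """Arcsine square-root transform of a proportion (assumed form)."""
    return math.asin(math.sqrt(p))


def earliest_reliable_window(p_values, window_ms=25, alpha=0.05, run_length=3):
    """Return the onset (ms) of the first run of consecutive significant windows.

    p_values[i] is the planned-contrast p value for the window that starts
    i * window_ms after noun onset. Returns None if no such run exists.
    """
    run = 0
    for i, p in enumerate(p_values):
        run = run + 1 if p < alpha else 0
        if run == run_length:
            return (i - run_length + 1) * window_ms
    return None


# Hypothetical contrast p values for seven successive windows.
p_values = [0.40, 0.03, 0.20, 0.04, 0.01, 0.02, 0.30]
print(earliest_reliable_window(p_values))  # 75
```

Note that the isolated significant window at index 1 is ignored: only a difference that holds for three windows in a row counts as the point of divergence.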
English native speakers
As in the task performed with real objects (Chambers et al., 2002), the native speakers of English showed sensitivity to linguistic context (Figures 3 and 4), with faster reference resolution (earlier divergence between looks to the target and the competitor) when the linguistic and the pragmatic contexts matched. With an indefinite noun phrase, the participants were faster to resolve referential ambiguity when there were two possible referents in the display (Figure 3): fixations to the target diverged from the competitor at 100 ms after the onset of the noun in the two-compatible referent condition (Figure 3, left panel), and at 275 ms in the one-compatible referent condition (Figure 3, right panel). Similarly, with the definite noun phrase, the native speakers were faster in resolving referential ambiguity when the pragmatic and linguistic context matched: fixations to the target referent in the one-compatible referent condition diverged from the competitor earlier (at 200 ms post–noun-onset, Figure 4, left panel) than in the two-compatible referent condition (at 250 ms post–noun-onset, Figure 4, right panel).

Figure 3. Native speakers. Cumulative proportion of fixations in the indefinite noun phrase condition (... put the cube inside a can): two-compatible referent display (left panel), and one-compatible referent display (right panel), synchronised to the noun onset. The dashed line represents the first point in time when the looks to the target referent are significantly larger than the looks to the competitor. The solid line represents the average noun offset.

Figure 4. Native speakers. Cumulative proportion of fixations in the definite noun phrase condition (... put the cube inside the can): one-compatible referent display (left panel), and two-compatible referent display (right panel), synchronised to the noun onset. The dashed line represents the first point in time when the looks to the target referent are significantly larger than the looks to the competitor. The solid line represents the average noun offset.
These findings were confirmed by within-subjects ANOVAs with object (target referent vs. competitor) and time window (25 ms intervals from noun onset until 650 ms post–noun-onset) as independent variables, and arcsine transformed cumulative proportion of fixations across trials as the dependent variable, performed separately for the different pragmatic and linguistic contexts.
In all conditions, as expected, there was a main effect of time with the overall proportion of fixations increasing as the noun unfolded (see Table 2 for F and p values). There was also a main effect of object, with overall more fixations to the target referent relative to the competitor (Table 2). There was also an interaction between the time window and object (Table 2). Planned contrasts indicated that the difference between fixations to the two objects after hearing an indefinite NP emerged at 100 ms post–noun-onset in the two-compatible referent condition (F1 (1,55) = 6.77, p < .05; F2 (1,10) = 7.36, p < .05), whereas in the one-compatible referent condition there was a delay, with the difference emerging at 275 ms post–noun-onset (F1 (1,55) = 5.28, p < .05; F2 (1,10) = 3.02, p > .05; the item analysis was significant only at 550 ms post–noun-onset: F2 (1,10) = 5.46, p < .05). Conversely, with a definite NP, the difference between fixations to the target referent and the competitor emerged earlier, at 200 ms post–noun-onset, in the one-compatible referent condition (F1 (1,55) = 4.61, p < .05; F2 (1,10) = 12.36, p < .05 at 200 ms, but also F2 (1,10) = 5.41, p < .05 from 125 ms, suggesting very rapid resolution), and only at 250 ms post–noun-onset in the two-compatible referent condition (F1 (1,55) = 7.72, p < .05; F2 (1,10) = 21.20, p < .05).
Table 2. Native speakers: F and p values for the main effects of time period (25 ms intervals from noun onset until 650 ms post–noun-onset), and object (target goal referent vs. competitor), and their interaction. (Greenhouse-Geisser correction was used when the sphericity assumption was violated.)
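For a single time window, a target vs. competitor contrast of this kind has two levels, so the within-subject F statistic equals the square of a paired t statistic, with degrees of freedom (1, n − 1). The Python sketch below illustrates this for one window. It is not the authors' analysis script: the function name and the data are hypothetical, and the inputs are assumed to be arcsine-transformed proportions already averaged over items (for F1) or over participants (for F2).

```python
import math
from statistics import mean, stdev


def paired_contrast_f(target, competitor):
    """F statistic and degrees of freedom for a two-level within-unit contrast.

    target and competitor hold one value per unit (participant for F1,
    item for F2). For two levels, F(1, n - 1) equals the squared paired t.
    """
    diffs = [t - c for t, c in zip(target, competitor)]
    n = len(diffs)
    t_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t_stat ** 2, (1, n - 1)


# Hypothetical values for four units in one 25 ms window.
target = [0.6, 0.5, 0.7, 0.4]
competitor = [0.3, 0.4, 0.4, 0.3]
f_value, df = paired_contrast_f(target, competitor)
print(round(f_value, 2), df)
```

By this formula, the reported degrees of freedom of (1,55) and (1,10) correspond to 56 units in the by-participants analysis and 11 units in the by-items analysis, the latter being consistent with the exclusion of one of the 12 items.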

These findings replicate the results reported by Chambers et al. (2002) in that the native English speakers showed sensitivity to the grammatical information conveyed by the articles, even when not using real objects. In both Chambers et al. (2002) and our study, reference resolution was facilitated when the linguistic information matched the pragmatic context.
Second language learners
Like the English native speakers, the Mandarin learners of English were faster to resolve referential ambiguity when the linguistic and the pragmatic context matched. With the indefinite noun phrase and two compatible goal referents in the display, the L2 learners’ looks to the target referent started diverging from the competitor 300 ms post–noun-onset, whereas with one compatible referent they only diverged at 550 ms post–noun-onset (Figure 5). Conversely, in the definite noun phrase condition the looks to the target referent diverged from the competitor sooner with only one compatible goal referent in the display (at 175 ms post–noun-onset) than with two compatible referents (at 350 ms post–noun-onset, Figure 6).

Figure 5. Second language learners. Cumulative proportion of fixations in the indefinite noun phrase condition (... put the cube inside a can): two-compatible referent display (left panel), and one-compatible referent display (right panel), synchronised to the noun onset. The dashed line represents the first point in time when the looks to the target referent are significantly larger than the looks to the competitor. The solid line represents the average noun offset.

Figure 6. Second language learners. Cumulative proportion of fixations in the definite noun phrase condition (... put the cube inside the can): one-compatible referent display (left panel), and two-compatible referent display (right panel), synchronised to the noun onset. The dashed line represents the first point in time when the looks to the target referent are significantly larger than the looks to the competitor. The solid line represents the average noun offset.
These findings were confirmed by within-subjects ANOVAs with object and time window as independent variables, and arcsine transformed cumulative proportion of fixations as the dependent variable (see Footnote 5).
In all conditions there was a main effect of time with overall proportion of fixations increasing as the noun unfolded (see Table 3 for F and p values). There was also a main effect of object with overall more fixations to the target referent than the competitor (Table 3). As with the native speakers, this was qualified by an object by time interaction. Planned contrasts in the indefinite NP, two-compatible referent condition, indicated that the difference in looks between the two objects emerged from 300 ms post–noun-onset (F1 (1,47) = 4.48, p < .05; marginally significant by items: F2 (1,10) = 4.25, p = .07), and only from 550 ms post–noun-onset with only one compatible referent in the display (F1 (1,47) = 7.14, p < .05; F2 (1,10) = 6.07, p < .05). In the definite NP condition, planned contrasts indicated that the looks to the two objects diverged from 175 ms post–noun-onset with one compatible referent in the display (F1 (1,47) = 6.69, p < .05; F2 (1,10) = 11.09, p < .05), but only from 350 ms post–noun-onset with two compatible referents in the display (F1 (1,47) = 5.66, p < .05; marginally significant by items: F2 (1,10) = 4.47, p = .061).
Table 3. Second language learners: F and p values for the main effects of time period (25 ms intervals from noun onset until 650 ms post–noun-onset), and object (target goal referent vs. competitor), and their interaction.

Figure 7 summarises the earliest points of divergence of looks to the target and the competitor referents across experimental conditions.

Figure 7. The earliest point of divergence of looks to the target and the competitor referents across experimental conditions.
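The divergence points reported in the Results can also be collected in a few lines of Python to show the size of the advantage when the article matches the display. The values are those given in the text; the data structure and the function name are hypothetical, and "match" is taken to mean an indefinite NP with a two-compatible display or a definite NP with a one-compatible display.

```python
# Earliest divergence points (ms post-noun-onset) as reported in the text.
DIVERGENCE_MS = {
    "L1": {
        ("indefinite", "two-compatible"): 100,
        ("indefinite", "one-compatible"): 275,
        ("definite", "one-compatible"): 200,
        ("definite", "two-compatible"): 250,
    },
    "L2": {
        ("indefinite", "two-compatible"): 300,
        ("indefinite", "one-compatible"): 550,
        ("definite", "one-compatible"): 175,
        ("definite", "two-compatible"): 350,
    },
}

# Display type that matches each article.
MATCHING_DISPLAY = {"indefinite": "two-compatible", "definite": "one-compatible"}


def match_advantage(group, definiteness):
    """Mismatch minus match divergence time (ms) for one group and article."""
    match = MATCHING_DISPLAY[definiteness]
    mismatch = "one-compatible" if match == "two-compatible" else "two-compatible"
    times = DIVERGENCE_MS[group]
    return times[(definiteness, mismatch)] - times[(definiteness, match)]


for group in ("L1", "L2"):
    for definiteness in ("indefinite", "definite"):
        print(group, definiteness, match_advantage(group, definiteness))
```

In both groups and for both articles the difference is positive, that is, reference was resolved earlier when the article matched the display.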
Discussion
Morphosyntax vs. pragmatic affordances in L2 processing
This research set out to investigate whether in L2 processing, a morphosyntactic structure unique to the target language, known to cause considerable difficulties in production, can nevertheless be utilised in real time to aid sentence comprehension. We employed the visual-world eye-tracking paradigm to specifically explore whether L1 Mandarin/L2 English speakers have the ability to make use of English articles in reference resolution, or whether they predominantly rely on pragmatic affordances (what is possible in the context), effectively ignoring the information signalled by the articles.
The results indicate that, just like native speakers of English, intermediate Mandarin-speaking learners of English do not over-rely on pragmatic considerations in reference resolution in English. The task in hand – and the nature of referential processing more generally – were such that participants could have successfully completed it relying solely on the available lexical information and object affordances: in the one-compatible referent condition, there was only ever one possible outcome (only the open can could accommodate the cube); in the two-compatible referent condition, two solutions were equally plausible and a choice had to be made. If participants were over-relying on object affordances, we might have expected reference resolution always to be faster when there was only one compatible referent in the scene, compared to when a choice had to be made between two compatible referents. Instead, we found that the linguistic information signalled by the articles and pragmatic affordances (the number of compatible referents) interacted: with the definite noun phrase, both groups of participants indeed resolved reference sooner when there was only one compatible referent in the scene (only one open can); critically, however, with the indefinite noun phrase, reference was resolved sooner when there were two compatible referents in the scene (two open cans).
These results have important implications for models of L2 processing. They add to a growing body of evidence suggesting that at least some structures unique to the target language can be processed in a targetlike way. The participants in our study did not ignore English articles in comprehension, as would be predicted by the overshadowing and blocking account (Ellis, 2006; Luk & Shirai, 2009), nor did they over-rely on pragmatic affordances, as the Shallow Structure Hypothesis (Clahsen & Felser, 2006) would predict. Instead, they appeared sensitive to the information signalled by the articles, in line with the Competition Model (MacWhinney, 1987, 2005), which predicts that morphosyntactic cues can be processed efficiently by non-native speakers, as long as their first language does not have a different morphosyntactic realisation of the same grammatical category. Crucially, while targetlike processing of unique-to-L2 structures has been previously observed in self-paced reading tasks and on ERP measures typically using grammatical violations, here we demonstrate it in a visual-world eye-tracking study using well-formed sentences. The current paradigm matches situated language use much more closely and as such it shows that L2 users can actively integrate morphosyntactic information unique to the L2 in real time to facilitate grammatical sentence processing in comprehension.
Incremental sentence processing in the L2
The results of our study also provide converging evidence for incremental processing in reference resolution (e.g. Altmann & Kamide, 1999; Chambers et al., 2002; Eberhard et al., 1995), and demonstrate that like L1 speakers, L2 users also utilise a variety of information, linguistic and non-linguistic, as it becomes available to resolve reference at the earliest opportunity. For example, by the time they encountered the nominal following the preposition “inside”, both groups in our study looked almost exclusively at the container objects in the scene, paying minimal attention to the non-container objects. Furthermore, in the definite NP condition, reference resolution occurred faster when there was only one object of the relevant description that could accommodate the theme than when two objects were compatible with the description. This ability to exclude from the referential domain, as the utterance unfolds, an object that fits the description but is unavailable for the immediate task (i.e. a closed can) indicates that L2 learners also rapidly integrate lexical information with considerations of possible actions. Finally, when there was only one compatible referent in the scene, the participants resolved reference sooner after hearing the definite NP than after an indefinite NP; the opposite was the case when two compatible referents were present in the scene, with reference resolution occurring sooner after the indefinite noun phrase. This shows that both native and non-native speaker groups were able to utilise articles to constrain referential domains: on hearing “the”, they expected to find a single object that matched the following noun; on hearing “a”, they expected there to be more than one object of the same name present (see Footnote 6).
While L2 speakers appear able to engage in incremental processing, and furthermore utilise morphosyntactic cues unique to the L2, this is not to say that there is no cost to L2 processing. Non-native speakers are generally slower than native speakers in online sentence processing, as demonstrated across a variety of tasks and linguistic structures (e.g. Hahne & Friederici, 2001; Lew-Williams & Fernald, 2010; Sanders & Neville, 2003). In our study this is evident in the typically later point at which the looks towards the target diverge from the competitor compared to the native speakers, and generally fewer and slower looks to all objects in the scene. Yet, slower processing did not prevent the L2 speakers from resolving reference with greater efficiency when the article matched the pragmatic affordances in the context. This suggests that slower processing per se does not inevitably lead to the inability to employ L2 morphosyntactic cues incrementally in sentence comprehension.
Interestingly, the native speakers resolved reference the fastest when the indefinite article was used in the two-compatible referent condition (“a can” with two open cans), whereas the L2 speakers were the fastest when the definite article was used in the one-compatible referent condition (“the can” with one open can) (Figure 7). One possibility is that L2 users find “the” easier to process than “a” because of its more consistent interpretation in discourse (signalling a uniquely identifiable referent in the context, vs. multiple readings of the indefinite article – see section on English article in referential expressions above). Consistency in form–meaning mapping has been shown to be one of the factors impacting the ease with which grammatical morphemes are learned (Goldschneider & DeKeyser, 2001). The more efficient processing of “the” compared to “a” in comprehension by L2 users also seems in line with research suggesting a more accurate L2 production of the definite compared to the indefinite article (Trenkic, 2002).
Comprehension vs. production
Previous research has demonstrated that second language speakers from L1s without articles often show persistent variability in L2 article production. Chinese learners of English are a population on which this issue has often been illustrated (e.g. Díez-Bedmar & Papp, 2008; Han, 2009; Lardiere, 2004; Lu, 2001; Robertson, 2000; Trenkic, 2008). Our current findings, however, suggest that such learners may nevertheless be sensitive to the information signalled by articles in real-time comprehension. In addition, while variability in production has been reported even with very advanced learners, the sensitivity to articles in comprehension is here detected with late bilinguals of only an intermediate level of English proficiency. These findings are in line with the literature showing that, developmentally, production often lags behind comprehension (e.g. Gaer, 1969, for L1 development; Swain, 1985, for L2 development). In future studies it would be interesting to examine how early in their development L2 learners become capable of utilising articles in comprehension.
In sum, the outcome of our study suggests that whatever problems intermediate (and advanced) Mandarin speakers of English experience in article production, these are unlikely to be directly associated with inappropriate processing strategies in comprehension. We outline here a novel proposal that reconciles the two sets of findings, illustrating how the same grammatical representations may lead to different behavioural outcomes in production and in comprehension.
Our results indicate that through their exposure to English, Mandarin learners can establish the requisite form–meaning connections for the English articles “the” and “a”. Specifically, they show evidence of understanding that the “the + NP” construction in English maps onto a referent that exists and is unique in a context, whereas the “a + NP” sequence maps onto a referent that may not be unique. These newly established form–meaning connections, coupled with no elements from the L1 competing with the L2 articles, ensure that articles are processed in a targetlike manner in comprehension.
The lack of competition in comprehension, however, does not in itself rule out competition in production. Unlike comprehension, production involves making choices about which lexical items, structures, etc. to use to express a message (e.g. Bock & Levelt, 1994), and bilinguals’ experience with two languages makes such choices even more complex. Extensive evidence suggests that two language systems in a bilingual are active and compete for selection. This has been predominantly demonstrated in research on lexical access (e.g. Abutalebi & Green, 2007; Green, 1998; Hermans, Bongaerts, de Bot & Schreuder, 1998; Kaushanskaya & Marian, 2007; Kroll, Bobb & Wodniecka, 2006; Kroll, Sumutka & Schwartz, 2005; Kroll & Stewart, 1994; Spivey & Marian, 1999), but we make here an important further suggestion that the competition occurs at the structural level as well.
At the level of grammatical encoding of a referent, Mandarin speakers’ experience with L2 English may activate the “Art + NP” structures to refer to countable concepts. But the much longer experience with their article-lacking L1 is likely to favour the selection of bare NP forms. For example, in wishing to refer to a single can (e.g. Can you pass me the can, please?), both the L1-licensed “can” and the L2-licensed “the can” (and “a can”) will be competing for selection (Figure 8). The model accounts straightforwardly for cases of article omissions (they are the cases where the L1-licensed bare nominal is selected). Furthermore, the problems in choosing an appropriate article for the context (substituting “the” for “a” and vice versa) may also be an indirect consequence of the cross-linguistic competition. The knock-on effect that the competition from the L1 has on the speed of processing and the available resources may adversely affect the bilinguals’ ability to integrate syntactic and pragmatic information in real time, thus explaining why L2 speakers are not always consistent in their article choices in production.

Figure 8. Competition between L1-licensed and L2-licensed structures in bilingual referential production.
In sum, we propose that persistent variability in L2 production may be related to the structural competition from the L1 (see Trenkic, 2009; Trenkic & Pongpairoj, 2013). This is in line with constraint satisfaction models in (L1) sentence comprehension and production (e.g. Haskell & MacDonald, 2003), which also explain variability in responses as a consequence of competition.
A further prediction arises from this proposal. If in Mandarin–English bilinguals’ linguistic representations countable concepts map onto both “Art + NP” and bare NP forms, then this population should also show little sensitivity to grammatical violations (absence of articles) in comprehension. In other words, the bare nominal “can”, when the context is favourable, should map onto the concept of a can that exists and is unique in the context, as quickly and as easily as “the can” does (Figure 9).

Figure 9. Predictions for L2 reference resolution arising from the structural competition model.
While we do not have data to speak to this prediction, it seems consistent with the findings indicating the lack of sensitivity to grammatical violations of some unique-to-L2 structures, such as plural marking (Jiang, 2004, 2007) and third person singular -s in English (Ojima et al., 2005). For example, as our model would predict, the results of Jiang (2007) suggest that Chinese learners of L2 English, who do not mark plural on nouns in their first language, may, in the appropriate context, map singular nouns in English (e.g. coin) to conceptually plural representations, failing to detect the violation (e.g. *The visitor took several of the rare coin in the cabinet). Further research is needed, using methodologies appropriate for detecting online sensitivity to structural violations (e.g. ERPs), to establish whether the prediction of the structural competition model regarding L2 referential resolution described above also holds true.
It is also useful to consider this model and the results of our study with regard to current perspectives on the nature of L2 knowledge, which have traditionally been informed by L2 production data. The proposal outlined above would initially appear incompatible with the view of the Representational Deficit Hypothesis (e.g. Hawkins & Chan, 1997), which assumes that errors in production are the result of deficient L2 knowledge. The results of our study suggest that targetlike form–meaning connections for unique-to-L2 cues can be successfully established. What remains to be determined is whether the sensitivity of L2 learners to structural violations involving such structures (e.g. absence of articles) can also be developed. Our model predicts that L2 users’ grammar may indeed be non-targetlike in that respect.
Similarly to the Processing Deficit Approach (e.g. Prévost & White, 2000), our model assumes that production errors are the outcome of processing difficulties in production, but unlike it, we do not suggest that these are entirely L2-generic and uninfluenced by the learners’ L1. Consistent with the view that language systems within a bilingual mind cannot be kept fully apart, we argue that persistent problems in production are best explained by grammatical competition between L1 and L2 structures. While both accounts could in principle explain L2 production, the prediction regarding L2 comprehension arising from our model is that L2 users would be insensitive to structural violations in the L2 which are compatible with their L1 (e.g. bare NPs referring to countable objects). The Processing Deficit Approach, which assumes fully targetlike knowledge, does not predict such insensitivity in comprehension.
In sum, our findings demonstrate that over-reliance on lexical and pragmatic information in a second language is not inevitable, and that the processing of grammatical structures unique to the target language is possible in real-time comprehension – even for those structures that present persistent difficulties in L2 production. This suggests that a given state of L2 grammar can have different consequences for production and comprehension processes. Therefore, this research highlights the importance of considering both production and comprehension for reaching valid conclusions about the status of L2 grammars.
Appendix A: Experimental stimuli – sentences
1. The chef will put the candle inside the/a jar.
2. The woman will put the notepad inside the/a box.
3. The pirate will put the cube inside the/a can.
4. The policeman will put the matches inside the/a flowerpot.
5. The prisoner will put the plate inside the/a bag.
6. The man will put the can inside the/a basket.
7. The nurse will put the bottle inside the/a jar.
8. The nun will put the sponge inside the/a box.
9. The queen will put the ball inside the/a can.
10. The gangster will put the banana inside the/a flowerpot.
11. The girl will put the balloon inside the/a bag.
12. The monk will put the jar inside the/a basket.
Appendix B: Experimental stimuli – visual


