WHICH FACTORS DETERMINE THE CHOICE OF REFERENTIAL EXPRESSIONS IN L2 ENGLISH DISCOURSE?: NEW EVIDENCE FROM THE COREFL CORPUS

Teresa Quesada; Cristóbal Lozano

doi:10.1017/S0272263120000224

Published online by Cambridge University Press: 09 July 2020

Teresa Quesada: Universidad de Granada
Cristóbal Lozano*: Universidad de Granada
*Correspondence concerning this article should be addressed to Cristóbal Lozano, Departamento de Filologías Inglesa y Alemana, Universidad de Granada, 18071 Granada, Spain. E-mail: cristoballozano@ugr.es

Abstract

Referential expressions (REs) have been investigated in L2 English but to date there is no single study that systematically and simultaneously analyzes the development and acquisition of the multiple factors that constrain the choice of REs in natural discourse production. We investigate L1 Spanish–L2 English learners across three proficiency levels versus an English control group from the COREFL corpus. An analysis of both the RE and its antecedent(s) reveals that different intra- and extralinguistic factors constrain the choice of REs (information status, activated antecedents, syntactic configurations, characterhood, within-task effect, and proficiency level). L2 learners (L2ers) are sensitive to some factors but are unable to fully attain native-like levels even at advanced stages. They do not transfer null subjects from their L1 contrary to previous L2 research, and do not find all contexts at the syntax-discourse interface equally problematic, thus confirming previous theoretical proposals and empirical findings.

Keywords

referential expressions; syntax-discourse interface; overexplicitness; learner corpus research; anaphora resolution

Type: Research Article
Information: Studies in Second Language Acquisition, Volume 42, Issue 5, December 2020, pp. 959-986

DOI: https://doi.org/10.1017/S0272263120000224
Open Practices: Open materials
Copyright: © The Author(s), 2020. Published by Cambridge University Press

One of the mechanisms to achieve cohesion in (non)native discourse is the use of referential expressions (REs) (Crosthwaite, 2011; Kang, 2004; Leclercq & Lenart, 2013). The choice of such REs in subject position (e.g., null and overt pronominals, as well as noun phrases [NPs]) is partly determined by the type of language. In subject position, null-subject languages like Spanish, Greek, or Italian license null pronouns, whereas nonnull subject languages like English, German, or French require overt pronouns. The choice of REs is constrained by factors such as the information status of the RE (i.e., topic-continuity or topic-shift), the number of activated antecedents, and the nature of the characters intervening in the narrative (Hendriks, 2003; Kang, 2004; Lozano, 2009, 2016; Martín-Villena & Lozano, 2020; Ryan, 2015).

Let us illustrate information status. Example (1) shows a topic-continuity (a.k.a. topic-maintenance) scenario in which the speaker maintains reference to the subject Lucy/Lucía in the subsequent sentences (shown in bold). Pronominal subjects are overtly realized in native English (she); null forms (Ø) are possible but restricted to syntactically coordinate sentences, as previous corpus studies report (Crosthwaite, 2011; Leclercq & Lenart, 2013).

By contrast, in a null-subject language like Spanish (2), null pronominal subjects (Ø) encode topic-continuity (Alonso-Ovalle et al., 2002; Lozano, 2009, 2016, 2018).

A shift in topic is marked in English and Spanish obligatorily using overt REs like pronominal subjects (he/él) and NPs (Eva) as in (3) and (4), whose choice is constrained by additional factors, as will be discussed.

The use of REs has been of particular interest for L2 research because even very advanced L2ers are unable to use REs in a native-like manner regardless of L1–L2 combination (Lozano, 2018; Pladevall Ballester, 2013; Prentza, 2014; Tsimpli & Sorace, 2006). The interface hypothesis (Sorace, 2011) states that constructions that involve an interface between syntax and other language-external cognitive domains like discourse (syntax-discourse interface) are persistently problematic for bilinguals, including L2ers. The choice of REs is a phenomenon at the syntax-discourse interface but, importantly, the difficulty is not general: it is selectively restricted to a specific person (3rd-person singular; Lozano, 2009) and to a specific information-status context, because topic-continuity contexts appear to be more problematic than topic-shift and contrastive-focus contexts, at least in L2 Spanish (Lozano, 2009, 2018).

In L2 English, which is the focus of this study, L2ers are overexplicit (Crosthwaite, 2011; Hendriks, 2003; Leclercq & Lenart, 2013; Ryan, 2015), that is, they redundantly use fuller forms than pragmatically required. Overexplicitness has been accounted for by the pragmatic principles violation hypothesis (PPVH; Lozano, 2016), which postulates that pragmatic principles can be violated by L2ers (and even natives) in a mild or strong way: being redundant in topic-continuity contexts represents a mild violation because there is no communicative breakdown (violation of the Gricean Principle of Quantity), while being ambiguous in topic-shift contexts is a strong violation as it leads to a communicative breakdown (violation of the Manner Principle). Importantly, the degree of violation is also modulated by the number of potential (i.e., activated) antecedents in prior discourse. Lozano shows that L2 Spanish learners mostly produce mild violations by being redundant, which is in line with the general overexplicitness phenomenon mentioned before for L2 English.

The interface hypothesis proposes that these problems are triggered by either representational deficits in the L2ers’ knowledge or processing difficulties due to the limitations in cognitive resources when compared to natives (Sorace, 2011). These two accounts (representational vs. processing) conveniently explain the phenomenon at hand but were proposed to account for very advanced and near-native L2ers. However, as White (2009, 2011) argues, it is also important to investigate the use of REs (a) in richer contexts (i.e., natural production data), (b) from a developmental point of view, and (c) discriminating between different information-status contexts, because syntax-discourse interface phenomena are not monolithic.

We take White’s lead to investigate REs in L1 Spanish–L2 English using a corpus of written English. REs in L2 English have been studied using experimental methods and focusing mostly on syntactic aspects (Mitkovska & Bužarovska, 2018; Pladevall Ballester, 2013; Prentza, 2014). There is also evidence from corpora of spoken L2 English (Crosthwaite, 2011; Hendriks, 2003; Kang, 2004; Leclercq & Lenart, 2013; Ryan, 2015). These studies have not analyzed the use of REs in a systematic, unitary, fine-grained and developmental fashion as we do here for L1 Spanish–L2 English, a language combination that is underresearched. We implement a linguistically motivated and complex annotation scheme to investigate the multiple factors that (individually and in conjunction) constrain REs. We also pay attention to the development of REs in L2ers (beginning, intermediate, advanced) versus a control group of English native speakers. Additionally, given that most studies on the syntax-discourse interface have used experimental data, using a corpus of production data provides a more natural picture of the contexts of use. Adopting a Learner Corpus Research (LCR) perspective (Granger et al., 2015) thus provides a rich descriptive basis to understand the multifactorial nature of REs in L2, which will certainly pave the way for future experimental work.

THE ACQUISITION OF RES IN L2 ENGLISH

Experimental studies have investigated how natives of null-subject languages acquire and process L2 English overt pronominal subjects from a syntactic perspective (Cunnings et al., 2017; Mitkovska & Bužarovska, 2018; Pladevall Ballester, 2013; Prentza, 2014). Others explore REs in general (overt and null pronominals as well as NPs) and follow a descriptive and discourse-oriented perspective by using corpora of oral production data (Crosthwaite, 2011; Kang, 2004; Leclercq & Lenart, 2013; Ryan, 2015).

Pladevall Ballester (2013) tested L1 Spanish–L2 English learners at different proficiency levels using an untimed acceptability judgment task. She found an improvement from beginner (54%) to intermediate (20%) levels in the acceptability of ungrammatical null subjects, Ø in (5), which suggests that L2ers transfer null pronouns, while advanced L2ers (4%) behave similarly to English natives (4%). She argues that the findings support the interpretability hypothesis (Tsimpli & Dimitrakopoulou, 2007), which stipulates that uninterpretable features¹ are inaccessible to L2 learners, but Pladevall Ballester’s advanced learners’ native-like results indicate otherwise.

Prentza (2014) also tested the interpretability hypothesis by comparing L1 Greek–L2 English intermediate and advanced learners. She used (a) a paced Acceptability Judgment Task to test sentences like (6) and (7), and (b) two controlled production tasks: a Sentence Completion Task as in (8), and a Cloze Test, as in (9). Of particular interest are two conditions: in her joint-reference condition the anaphor refers to the subject of the preceding clause (i.e., what we call topic-continuity), as in (6a, b), (8a), and (9a), and in her disjoint-reference condition the pronoun refers to a non-subject antecedent, as in (8b) and (9b) (i.e., our topic-shift) or an antecedent that is not present in the preceding discourse (7a, b).

Similarly to Pladevall Ballester (2013), Prentza found that learners (particularly intermediates) accept and produce significantly more null pronouns in the joint-reference condition than in the disjoint-reference condition whereas English natives do not accept/produce them. Prentza argues that in Greek the interpretable [−Topic Shift] feature (which signals topic-continuity) is realized using a null (Ø) pronoun but [+Topic Shift] is realized using an overt pronoun. She claims that L1 agreement properties are transferred to the L2, as English lacks verbal agreement morphology. As a result, L1 Greek–L2 English learners (a) interpret overt pronominal subjects as agreement markers and (b) accept/produce ungrammatical null pronouns in [–TS] contexts where English does not license them. She argues that this supports the interpretability hypothesis.

Mitkovska and Bužarovska (2018) investigated L1 Macedonian–L2 English young learners at different proficiency levels (A1–B2) in an acceptability judgment task and in the Macedonian English Learner Corpus, though English controls were not used. L2ers produce ungrammatical null subjects (10) and also accept them (11a, b), though rates were lower in acceptability than in production. There was a development from A1 to B2 level, the higher levels being more accurate. Even though Mitkovska and Bužarovska do not make reference to the topic-continuity versus topic-shift distinction, we can observe that null pronouns are accepted/produced in topic-continuity contexts, (10) and (11), in line with Pladevall Ballester (2013) and Prentza (2014).

Overall, these studies show that L2ers from null-subject languages treat referential null subjects in a non native-like manner, which can result from either a representational deficit (cf. the interpretability hypothesis) or even a processing limitation (cf. the interface hypothesis). Importantly, these experimental studies do not employ a thorough and counterbalanced experimental design to tease apart each key factor in the choice of REs. We address these limitations by systematically taking stock of the multiple factors that constrain REs in discourse.

Cunnings et al. (2017) also investigated L1 Greek–L2 English learners versus English native controls. Using a visual-world paradigm, they monitored eye movements in scenarios known as position of antecedent strategy (PAS) (Carminati, 2002), as in (12). Typical PAS scenarios contain two antecedents in the first sentence (one in subject position and the second in non-subject position) and an anaphor (in this case, an overt pronoun he) in the second sentence that biases toward one of the antecedents.

In Greek PAS scenarios, a null pronoun biases toward a subject-antecedent interpretation (topic-continuity) whereas an overt pronoun typically biases toward an object-antecedent (topic-shift). But in English the pronoun is obligatorily overt (he) and it can bias toward either antecedent. Their results show that English natives have a consistent subject-overt pronoun bias in both subject-bias (12a, b) and object-bias conditions (12c), where he would force an initial look at the subject but a later look at the object because it is always the boy (Peter) who has the ice cream. L2ers showed native-like gaze patterns yet their processing was slower. L2ers are not transferring from their null-subject L1 Greek because, otherwise, they would show an object bias with the overt pronoun regardless of the condition. We will analyze PAS structures to explore how they work in natural corpus production across all REs (null and overt pronouns and NPs) to check for L1 effects.

A study that is halfway between an experimental and a corpus study is Contemori and Dussias (2016), who tested very advanced L1 Spanish–L2 English learners using a picture-based controlled production task. Their L2ers show native-like behavior only in topic-shift but not in topic-continuity scenarios, a fact that confirms previous corpus findings for L1 English–L2 Spanish (Lozano, 2009, 2016) and experimental data for L1 Greek–L2 Spanish (Lozano, 2018). Interestingly, L2ers did not show any associations between the choice of REs and cognitive measures (working memory and inhibitory control), which in principle suggests that when the choice of RE is constrained by the number of activated antecedents in working memory, the choice is not affected by memory.

Corpus-based findings do not confirm the experimental findings in the preceding text. Kang (2004) elicited oral production in intermediate L1 Korean–L2 English versus English and Korean² natives using the picture-based task Frog, Where Are You? by Mayer (1969). There was a character effect for all three groups: whereas secondary characters (the frog, the dog) were referred to almost exclusively by NPs, the main character (the boy) was referred to by NPs as well but also by overt pronouns, null pronouns, and proper names. Crucially, the discourse contexts in which REs appeared (topic-continuity vs. -shift) were overlooked. This leaves unresolved the proportions of NPs that are due to topic-shift exclusively (independently of character) or to the character effect. Kang also does not settle the issue of whether L2ers’ null pronouns are the result of L1 transfer or a reflection of the English input because she also overlooks the fact that null pronouns are possible in English. Kang’s findings are just suggestive because the analysis is restricted to one factor (characterhood) and proficiency level (intermediate). These limitations will be addressed in our study.

Crosthwaite (2011) examined picture-based oral production in upper beginner L1 Korean–L2 English learners versus English natives. In coreferential reference maintenance (i.e., our topic-continuity) the English native trend was NP (range: 42% ∼ 46%) ≈ overt pronoun (42% ∼ 51%) > null pronoun (7% ∼ 13%), and the L2ers’ trend was NP (54% ∼ 67%) > overt pronoun (30% ∼ 43%) > null pronoun (2% ∼ 3%). The English native rates for null pronouns (7% ∼ 15%) and for NPs (around 45%) are similar in the Kang and Crosthwaite studies. The L2ers’ rates for NPs are also similar in both studies (around 60%), but their production of null pronouns is lower in Crosthwaite, probably because he did not discriminate between coordinate versus subordinate sentences, which is a key factor, as will be shown in our study. Interestingly, in topic maintenance, natives’ rates of NPs and overt pronouns are similar, whereas L2ers’ rate of NPs is higher than their rate of overt pronouns. This is unexpected because topic-continuity in native English is encoded using minimal forms (overt pronouns), as reported in our study. Finally, Crosthwaite, similarly to Kang, did not investigate the development of REs in L2 English across proficiency levels.

Leclercq and Lenart (2013) investigated the oral production of L1 French–L2 English adults and a comparable English native control group in a film-retell task. They categorized REs according to the accessibility hierarchy (Ariel, 2004): high-accessibility markers (null and overt pronouns for antecedents that are easily retrievable) versus low-accessibility markers (definite NPs for antecedent reintroductions). Overall, learners (a) produce high-accessibility markers instead of low-accessibility markers to maintain topic; (b) produce more high-accessibility markers than natives to reintroduce a character, as expected; and (c) produce fewer null pronouns than English natives to maintain the reference, that is, learners are more overexplicit than natives, though results are presented according to character and to accessibility marker, and not according to the information status of the RE (topic-continuity vs. -shift).

Ryan (2015) also found overexplicitness in the oral film-retell task by L1 Mandarin Chinese–L2 English learners. In line with Leclercq and Lenart, he analyzed the accessibility to characters but considered additional factors: distance, unity, competition, and salience. Even though L2ers use high-accessibility markers to maintain the topic, they produce significantly more NPs than natives and show a main-character effect by producing more NPs to refer to Chaplin than natives do. However, Ryan’s linguistic analysis of REs does not discriminate between topic-continuity versus topic-shift in each of the different accessibility degrees of the RE that he distinguishes.

In short, corpus-based studies find overexplicitness in L2ers, but experimental studies typically report overacceptance of null pronominal subjects, attributed to L1-transfer effects. Importantly, “[N]o single factor accounts for overexplicit reference” (Ryan, 2015, p. 852). We take Ryan’s suggestion as a departure point for our study, in which we systematically explore the multiple factors that constrain REs in L2 English discourse.

RESEARCH QUESTIONS

Based on the previous review of the literature, several research questions (RQs) and hypotheses (Hs) were formulated.

General RQ: What is the overall distribution of REs in the narratives of L2ers and natives, without taking into account any factor?
RQ1: Are learners sensitive to the information-status factor that constrains the choice of REs in topic-continuity versus topic-shift contexts?
H1: The literature has shown that the choice of the RE is influenced by its information status. Natives are expected to use minimal REs (null and overt pronouns) for topic-continuity but fuller REs (NPs mainly) for topic-shift. L2ers are predicted to (a) show sensitivity to this in general but not in a native-like manner (even at advanced levels) and (b) be overexplicit in topic-continuity scenarios.

The second research question relates to cross-linguistic influence (transfer) and consists of two parts.

RQ2a: Do L2ers transfer null pronominal subjects (Ø) from their null-subject L1 (Spanish) to their nonnull-subject L2 (English)?

H2a: The L2 experimental literature reports acceptance of null subjects in L2 English by learners with null-subject L1s. If L2ers transfer, they would produce null pronouns in both coordinate and subordinate sentences. However, if they produce null pronouns only in coordinate sentences, where they are allowed in native English, but not in subordinate sentences, where they are not allowed, this would suggest lack of transfer.
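The logic of H2a can be made concrete with a minimal sketch. It assumes that each subject RE has already been annotated as a (form, clause type) pair; the function names, the labels, and the zero threshold are illustrative assumptions and not part of the study's procedure.

```python
from collections import defaultdict


def null_rate_by_clause(res):
    """Proportion of null subjects per clause type.

    `res` is an iterable of (form, clause_type) pairs, where form is
    "null", "overt", or "NP" and clause_type is "coordinate" or
    "subordinate" (hypothetical labels chosen for this sketch).
    """
    totals = defaultdict(int)
    nulls = defaultdict(int)
    for form, clause_type in res:
        totals[clause_type] += 1
        if form == "null":
            nulls[clause_type] += 1
    return {clause: nulls[clause] / totals[clause] for clause in totals}


def consistent_with_transfer(rates):
    """Simplified reading of H2a: null subjects in subordinate clauses,
    where native English does not allow them, would point to L1 transfer.
    The "any occurrence" threshold is an assumption of this sketch."""
    return rates.get("subordinate", 0.0) > 0.0


# Invented toy annotations, not corpus data.
sample = [
    ("null", "coordinate"),
    ("overt", "coordinate"),
    ("overt", "subordinate"),
    ("NP", "subordinate"),
]
rates = null_rate_by_clause(sample)
transfer = consistent_with_transfer(rates)
```

In practice a rate would be compared against the native control group rather than against zero, since isolated performance errors are always possible.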

RQ2b: In null-subject languages like Spanish, a standard way of resolving anaphora is through the PAS mechanism. Do Spanish natives employ such an L1-based strategy in their L2 English to resolve anaphora?

H2b: The psycholinguistic PAS literature reports that L1 Greek learners of L2 English correctly show a subject-overt pronoun (S-overt) bias, as in native English, and not a subject-null pronoun (S-Ø) bias, as would be the case in their native Greek. We predict that, though both English natives and English L2ers will produce PAS contexts in their discourse as a possible way of resolving anaphora, learners will not transfer, that is, they will produce mostly S-overt structures, as English natives do.

RQ3: Is there a within-task effect in the production of REs? Does the transition between pictures in the picture book used to elicit narratives affect the choice of REs even in topic-continuity contexts?

H3: Retelling a story prompted by a series of pictures depicting a topic-continuity situation may trigger economical forms (null and overt pronouns) when describing the same picture, as expected, but fuller forms (NPs) when moving from one picture to the next.

RQ4: Does characterhood (main vs. secondary characters) affect the choice of the REs?

H4: Previous research using the same task (Kang, 2004) found a character effect in L2ers’ narratives. We likewise predict characterhood to affect the overall choice of REs in both natives and L2ers, though the RE information status is ultimately the prevailing factor constraining the choice.

RQ5: Does the number of activated antecedents influence the choice of REs?

H5: There is L2 corpus evidence showing that the number of activated antecedents within the proximal preceding context affects the choice of the RE (Lozano, 2016). The higher the number of antecedents in working memory, the more explicit the REs that speakers produce. This effect is expected to be found in natives and L2ers across the board as it is related to cognitive processes and not so much to proficiency level.

METHOD

CORPUS AND SAMPLE

We analyzed a sample (Table 1) from the Corpus of English as a Foreign Language (COREFL: www.learnercorpora.com) with L1 Spanish–L2 English EFL university students studying a modern-languages degree where English is not a major. They ranged from A1 to C1 proficiency levels according to the CEFR-based English Unlimited Placement Test (2010). Natives were American English university students.³

TABLE 1. Corpus sample

MATERIALS

Twelve pictures from the wordless picture book Frog, Where Are You? (Mayer, 1969) were chosen as prompts for the written narratives. These prompts were chosen because they have been used in many SLA studies and because an unknown story like this is suitable as the choice of REs cannot be attributed to shared knowledge amongst speakers/readers.

TAGSET

Prior to the design of the tagset, we took into consideration certain design and methodological limitations of previous L2 English studies:

1. 1st- versus 2nd- versus 3rd-person distinction: The stimuli from experimental studies (Mitkovska & Bužarovska, 2018; Pladevall Ballester, 2013; Prentza, 2014) mix REs from 3rd-person singular/plural (the genuine anaphoric uses of pronouns) with 1st and 2nd person (deictic use). They do so in an unsystematic way and do not compare results for every person/number in the pronominal paradigm. Corpus studies (Crosthwaite, 2011; Kang, 2004; Leclercq & Lenart, 2013; Ryan, 2015) focus on 3rd person (singular/plural), but corpus data shows that learners have more problems with the anaphoric uses of subject pronouns than with the deictic use, and especially with 3rd-person singular in subject position (Lozano, 2009). We therefore focus on 3rd-person singular in subject position.
2. Topic-continuity versus topic-shift distinction: Most L2 English studies focus on the acceptability/production of pronouns irrespective of their information status (topic-continuity vs. topic-shift), but the L2 literature clearly demonstrates that the form of the RE is mainly (though not exclusively) constrained by this distinction (Contemori & Dussias, 2016; Cunnings et al., 2017; Lozano, 2009, 2016, 2018; Sorace & Serratrice, 2009). In our study this distinction is crucial for our tagset.
3. Subject versus non-subject distinction: Some studies analyze REs in subject and non-subject position indistinctly (Crosthwaite, 2011; Leclercq & Lenart, 2013). The syntactic position of the RE constrains its possible forms (Subject position {Ø/pronoun/NP}; Object position {pronoun/NP}). We focus on subject REs because (a) the choice of forms is wider and, crucially, (b) the topic-continuity versus topic-shift distinction is only observable when the RE is in subject position because REs in non-subject position (i.e., object personal pronouns) encode topic but not topic-shift.
4. Referential versus nonreferential (expletive) pronouns: All experimental studies (Mitkovska & Bužarovska, 2018; Pladevall Ballester, 2013; Prentza, 2014) investigate both types of pronouns. Prentza (2014) mixes both types in her results whereas Pladevall Ballester (2013) and Mitkovska and Bužarovska (2018) present results for referential versus nonreferential pronouns. Mixing both types in the results is misleading because only referential pronouns are anaphoric (if used in the 3rd person). We analyze only 3rd-person referential uses.
5. Animate versus inanimate distinction: Pladevall Ballester (2013) distinguishes between animate and inanimate pronouns, whereas Mitkovska and Bužarovska (2018) and Prentza (2014) mix them in their results, which is problematic because only animate anaphoric pronouns are typically difficult for L2ers (Lozano, 2009). Corpus studies (Crosthwaite, 2011; Kang, 2004; Leclercq & Lenart, 2013; Ryan, 2015) analyze only animate anaphoric 3rd-person REs and we do so in the present study.
6. Antecedent versus antecedentless distinction: The RE has an antecedent in Mitkovska and Bužarovska’s (2018) acceptability task, as in (10) and (11) in the preceding text, whereas Pladevall Ballester (2013) mixes stimuli with (5b) and without antecedents (5a), and Prentza (2014) uses antecedents only in her joint condition (6a, b) but not in her disjoint condition (7a, b). Analyzing antecedentless constructions may bias the results because the RE is decontextualized and it is precisely the preceding context that determines the information status of the RE. Previous corpus studies (Crosthwaite, 2011; Leclercq & Lenart, 2013; Ryan, 2015) consider the antecedent, though they do not present results according to the number of activated antecedents. We therefore tag not only the properties of the actual antecedent of the RE but also the number of potential antecedents because it influences the choice of RE, as reported in Lozano (2016) for L1 English–L2 Spanish with corpus data, Contemori and Dussias (2016) for L1 Spanish–L2 English with controlled-production data, and Arnold and Griffin (2007) for native English with experimental data.
7. Main versus subordinate distinction: Mitkovska and Bužarovska (2018) distinguish between main versus subordinate clauses in their results, whereas Pladevall Ballester (2013) and Prentza (2014) mix both syntactic contexts in their results. The corpus studies reviewed in the preceding text do not take syntax into consideration either. As discussed earlier, null pronouns are allowed in English but only in coordinate topic-continuity contexts, so it is essential to tag the syntactic environment of the RE (coordinate vs. subordinate).
8. Connectors: The stimuli in experimental studies (Mitkovska & Bužarovska, 2018; Pladevall Ballester, 2013; Prentza, 2014) mix a wide array of connectors, often mixing subordinators (if, because, that, so, where) with coordinators (and) and even two main clauses separated by a stop. The corpus studies do not take the connector type into consideration. As just explained in 7, we analyzed coordinate versus subordinate connectors.

The corpus was tagged with UAM (Universidad Autónoma de Madrid) Corpus Tool (O’Donnell, 2008) (www.corpustool.com), an annotation tool that allows the creation of a sophisticated tagset with different layers to annotate texts manually. The tool also computes descriptive and inferential statistics (χ²) based on the tag frequencies. Departing from Lozano’s (2016) tagset for L2 Spanish and considering the limitations from L2 English studies reported in the preceding text, we designed a tagset (Figure 1) to reflect the multiple factors that constrain the choice of REs.
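To illustrate the kind of inferential statistic computed on tag frequencies, the following is a minimal sketch of Pearson's chi-square statistic for a contingency table of groups by RE forms. It is written in plain Python for transparency and does not reproduce the tool's internals; the function name is an assumption and the counts are invented for illustration, not taken from the corpus.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a
    contingency table given as a list of rows of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical frequencies (NOT from the paper).
# Columns: null pronoun, overt pronoun, NP.
counts = [
    [10, 60, 30],  # hypothetical group A
    [5, 40, 55],   # hypothetical group B
]
stat, dof = chi_square(counts)
```

The statistic is then compared against the chi-square distribution with `dof` degrees of freedom to obtain a p-value, a step that statistical packages perform automatically.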

FIGURE 1 Tagset (REs).

For every sentential subject that was an RE, we assigned multiple tags, though not all of them will be analyzed in this study due to space limitations. The characters (the boy, the frog, the dog) are tagged in the character type system, though additional characters were also tagged as they were mentioned in participants’ production. The anaphor form system reflects the form of the RE: null (Ø) and overt (he/she/it) referential pronominal subjects as well as NPs (e.g., the boy, Nico, the frog). In the anaphor number and gender system, the “number” tag (singular or plural) allowed us to tag both 3rd-person singular and plural REs, though recall that only 3rd-singular human REs were analyzed in this study. The “gender” tag reflected the grammatical gender of the RE, though gender was not analyzed in this study. The anaphor clause position system encoded the position of the RE (main vs. subordinate and different subtypes, including coordination). Importantly, the anaphor information status reflects the information status of the RE (the introduction of new characters onto the scene and, crucially, topic-continuity vs. topic-shift). The picture type system differentiates between whether the RE relates to the same picture as the preceding RE or to a new picture. This system was added during the tagging procedure as it was unexpectedly observed that the overproduction of NPs in topic-continuity contexts was caused by the transition between pictures. The antecedent system reflects the referent(s) that precede the RE being annotated. Unlike previous research, we not only take into account the relationship between the RE and its actual antecedent (whose syntactic role was also tagged as subject/non-subject) but also count the number of potential antecedents within the last four clauses that could be influencing the choice of the RE. 
The “intervening antecedent” tag counted the number of antecedents intervening just between the actual antecedent and the RE but was not analyzed for this study. Finally, the PAS was also included in the tagset to check if PAS scenarios occur in English and to test possible L1 influence.
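As a rough illustration of the annotation scheme, each tagged RE can be thought of as one record holding one terminal tag per system. The sketch below is written in Python rather than in UAM Corpus Tool's own format, and every class, field, and value name in it is a hypothetical label chosen for readability, not the actual feature names of the tagset in Figure 1.

```python
from dataclasses import dataclass, fields
from typing import Optional

# Hypothetical record for one annotated RE. Field names paraphrase the
# systems described in the text; they are NOT the real UAM Corpus Tool labels.
@dataclass(frozen=True)
class TaggedRE:
    character: str                  # e.g. "boy", "frog", "dog"
    form: str                       # "null", "overt", or "NP"
    number: str                     # "singular" or "plural"
    gender: str                     # grammatical gender of the RE
    clause_position: str            # e.g. "main", "coordinate", "subordinate"
    info_status: str                # "topic-continuity", "topic-shift", "new-introduction"
    picture_type: str               # "same-picture" or "new-picture"
    antecedent_role: Optional[str]  # "subject" / "non-subject"; None if no antecedent
    antecedents_activated: int      # potential antecedents in the last four clauses
    intervening_antecedents: int    # antecedents between the actual antecedent and the RE
    pas_type: Optional[str]         # PAS scenario type; None when not applicable

# One invented annotation: an overt pronoun continuing the topic "the boy".
example = TaggedRE(
    character="boy", form="overt", number="singular", gender="masculine",
    clause_position="main", info_status="topic-continuity",
    picture_type="same-picture", antecedent_role="subject",
    antecedents_activated=2, intervening_antecedents=0, pas_type=None,
)

print(example.form, example.info_status, len(fields(TaggedRE)))
```

Keeping one flat record per RE makes it straightforward to cross-tabulate any two systems later, which is the kind of comparison the analysis relies on.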

Finally, a second scheme was created (Figure 2) to tag the level of the participants and classify them into groups. This allowed us to make statistical comparisons between groups.

FIGURE 2 Tagset (groups).

TAGGING PROCEDURE

Tagging was manual due to the complexity of the phenomenon under investigation. The tagging procedure was as follows. After creating the tagset (Figure 1), both authors tagged a representative sample of narratives to reach an agreement about the tagging protocol. Then, one of the authors continued tagging the narratives. Problematic cases were detected and both researchers finally agreed upon all problematic tags.

DATA ANALYSIS

As stated in the preceding text, only 3rd-person singular animate REs in subject position were considered for the analysis. A total of 674 REs were analyzed (beginner: 145; intermediate: 164; advanced: 145; natives: 220). Note that each RE tagged contained around 10 additional terminal tags, one per system: character type, RE form, RE number and gender, RE clause position, RE information status, RE picture type, antecedent role, antecedents activated, intervening antecedent, and PAS type. Based on the terminal tag raw frequencies, we performed descriptive and inferential (χ ²) statistics in UAM Corpus Tool by combining and contrasting different tag systems and subsystems.
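The idea of combining tag systems can be sketched as a simple cross-tabulation. The Python snippet below is only an illustration of that logic, not the procedure implemented in UAM Corpus Tool; the records are invented toy data (not corpus data) and the function name `form_percentages` is hypothetical.

```python
from collections import Counter

# Toy records invented for illustration only: (group, info_status, form).
toy_tags = [
    ("native", "topic-continuity", "overt"),
    ("native", "topic-continuity", "null"),
    ("native", "topic-shift", "NP"),
    ("beginner", "topic-continuity", "overt"),
    ("beginner", "topic-continuity", "overt"),
    ("beginner", "topic-shift", "NP"),
]

def form_percentages(tags, group, info_status):
    """Return {form: percentage} for one group within one information status."""
    counts = Counter(form for g, s, form in tags if g == group and s == info_status)
    total = sum(counts.values())
    if total == 0:
        return {}  # no REs for this combination of tags
    return {form: round(100 * n / total, 1) for form, n in counts.items()}

print(form_percentages(toy_tags, "native", "topic-continuity"))
print(form_percentages(toy_tags, "beginner", "topic-continuity"))
```

Percentages are computed within each subsystem (here, within one information status for one group), which mirrors how the rates in the Results section are reported per context rather than over all REs.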

RESULTS

GENERIC RESULTS: RE FORMS

Figure 3 shows that English natives predominantly produce NPs (49.1%), followed by overt pronouns (35%) and a remarkable number of null pronouns (15.9%), to be discussed in section H2a. L2ers show an increase of null pronouns across proficiency toward the native norm. Interestingly, low-level L2ers hardly produce any null pronouns (beginner: only five tokens out of 145 = 3.4%; intermediate: 6/164 = 3.7%), which has implications for H2a. As for overt pronouns and NPs, most L2er groups (beginners and advanced) produce roughly the same amount of overt pronouns and NPs, whereas natives, as noted, clearly produce more NPs than overt pronouns. There are significant differences between the L2ers and natives. Beginners show significant differences for null (χ² = 13.907, p < 0.01) and overt pronouns (χ² = 7.770, p < 0.01)⁴; intermediates show significant differences for null pronouns only (χ² = 14.785, p < 0.01); and advanced L2ers show significant differences for both null (χ² = 4.536, p < 0.05) and overt pronouns (χ² = 5.769, p < 0.05).
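To make the χ² comparisons concrete, the following Python sketch recomputes the null-pronoun contrasts from raw counts with a plain Pearson χ² for a 2 × 2 table (no continuity correction, df = 1). It is an illustration under stated assumptions rather than the software's own routine: the function names are hypothetical, and the native null-pronoun count of 35 is not given in the text but inferred here from 15.9% of 220. Under those assumptions the values come out at 13.907 (beginners) and 14.785 (intermediates).

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    return n * (a * d - b * c) ** 2 / denominator

def significance_label(chi2):
    """Map a df = 1 chi-square value onto the two thresholds used in the article."""
    if chi2 >= 6.635:   # critical value for p = 0.01 with df = 1
        return "p < 0.01"
    if chi2 >= 3.841:   # critical value for p = 0.05 with df = 1
        return "p < 0.05"
    return "p > 0.05"

NATIVE_TOTAL = 220
NATIVE_NULL = 35  # ASSUMPTION: inferred from 15.9% of 220, not stated in the text

# Rows: learner group vs. natives; columns: null pronouns vs. all other RE forms.
beginner = chi_square_2x2(5, 145 - 5, NATIVE_NULL, NATIVE_TOTAL - NATIVE_NULL)
intermediate = chi_square_2x2(6, 164 - 6, NATIVE_NULL, NATIVE_TOTAL - NATIVE_NULL)

print(round(beginner, 3), significance_label(beginner))
print(round(intermediate, 3), significance_label(intermediate))
```

Each comparison contrasts one RE form against all remaining forms in two groups, so every test has one degree of freedom and the same two critical values apply throughout.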

FIGURE 3 RE forms across groups.

These generic results say little about how information packaging (topic-continuity/-shift) and the preceding discourse constrain the distribution of REs in discourse. We explore this next.

H1: THE EFFECT OF INFORMATION STATUS ON THE CHOICE OF RES

In topic-continuity contexts (Figure 4, first chart), the overall preference across groups is to maintain topic using an overt pronoun but to shift topic (Figure 4, second chart) using a fuller form (NP). As expected, in topic-continuity contexts all groups produce overt pronouns mainly (cf. [13]). Null pronouns are also an option (30.9% for natives, with L2ers showing an increasing trend toward the native norm: 6.6%, 7.1%, 17.4%), a fact that has not been fully explored in the L2 literature (see section “H2a” for details). NPs are the dispreferred option to mark topic-continuity for advanced L2ers and natives (10.1% and 16.4%, respectively), but production is slightly higher for the lower levels (9.2%, 22.9% for beginners and intermediates), as in (14). There are significant differences between natives and L2ers: beginners (null: χ² = 16.056, p < 0.01; overt: χ² = 19.740, p < 0.01); intermediates (null: χ² = 14.236, p < 0.01; overt: χ² = 5.294, p < 0.05); and advanced (null: χ² = 4.258, p < 0.05; overt: χ² = 6.902, p < 0.01).

FIGURE 4 RE forms according to information status (topic-continuity/shift) by group.

In topic-shift scenarios (Figure 4, second chart) all groups clearly produce more NPs (learners: 86.2%, 81.9%, 71.6%; natives: 79.4%), as in (15), than overt pronouns (13.8%, 16.9%, 28.4% for learners and 19.6% for natives), as in (16). Null pronouns are hardly produced in these contexts as they would result in ambiguity (0%, 1.2%, 0% for learners and 1% for natives). There are no statistically significant differences between L2ers and natives. An interesting question, then, is why NPs would be preferred to overt pronouns to mark topic-shift. This phenomenon has been addressed in the L2 Spanish corpus literature (Lozano, 2016), but not in the L2 English corpus literature. It appears to be related to the number of competing antecedents (see section “H5”).

Regarding the introduction of new characters onto the scene (tagged as “focus new introduction”), these were introduced/presented for the first time using an (indefinite) NP (beginners: 11/11 = 100%; intermediates: 11/11 = 100%; advanced: 9/9 = 100%; natives: 14/14 = 100%), compare examples in (17), and neither overt nor null pronouns were produced at all, as expected.

H2: CROSS-LINGUISTIC INFLUENCE

In subsection “H2a” we focus on comparing different syntactic environments (coordination vs. subordination) to check if L2ers transfer null pronominal subjects from their L1. In subsection “H2b” we report on possible cases of transfer in PAS scenarios.

H2a: Testing Cross-Linguistic Influence

Even though English is a nonnull-subject language, null-subject production has been reported in the literature in diary-drop styles (Haegeman, 2009) and in coordination, as discussed earlier in this article, but it is not yet clear (a) whether coordination alone can license them in native English; (b) at which developmental point, if any, L2ers become sensitive to this phenomenon; and (c) whether the use of null pronouns by L2ers is a consequence of cross-linguistic influence from a null-subject L1 (Spanish) to a nonnull-subject L2 (English).

Null pronouns are produced in native English in general (15.9% in Figure 3), particularly in topic-continuity scenarios (30.9% in Figure 4, first chart) but, crucially, they are predominant in coordination and topic-continuity (76.7% in Figure 5, first chart).⁵ By contrast, in subordination and topic-continuity scenarios like (18), natives predominantly produce overt pronouns (87.5%) (Figure 5, second chart). This indicates that in native English null pronouns are allowed only in coordinate environments that encode topic-continuity.

FIGURE 5 RE forms in topic-continuity according to sentence type (coordinate vs. subordinate) by group.

As for L2ers in topic-continuity coordinate sentences (cf. Figure 5, first chart), there is a low production of null pronouns in the low-proficiency groups (beginner: 19.2%, intermediate: 20%), which dramatically increases at advanced levels (60%), thus approaching the native norm, cf. (19b).

The opposite pattern holds for overt pronouns: high production at the lower levels (80.8%, 70%) and lower production at the higher level (40%), again approaching the native norm. As with natives, NP rates in L2ers are very low and percentages represent just a few tokens. In topic-continuity subordinate scenarios, the L2ers’ rates are similar to the natives’, with predominance of overt pronouns, as in (20), low production of NPs and (virtually) no production of null pronouns, with tendencies toward the native norm as proficiency increases.

H2b: Testing AR in Specific Scenarios: The Position of Antecedent Strategy

In native English, REs can be also resolved in PAS scenarios of the type S_i O_j [RE-S_i/j], though their frequency is rather low in relation to the total number of all possible RE scenarios (47/290 = 16.2%). In native English, PAS scenarios in which the RE refers to a subject antecedent are much more frequent (31/47 = 66%) than when the RE refers to a non-subject antecedent (16/47 = 34%).

PAS with subject antecedents (Figure 6) are rather similar to topic-continuity scenarios (cf. Figure 4, first chart), which suggests that when the RE refers to a subject antecedent, there is a continuation of topic, as expected. However, null pronouns are more frequent in topic-continuity scenarios than in PAS subject-biased scenarios, which will be discussed later.

FIGURE 6 PAS scenarios with subject antecedents: RE forms by group.

A comparison of L2ers versus natives in PAS subject-antecedent scenarios (Figure 6) shows a predominant S-overt strategy: overt pronouns are the most produced RE form for both natives (64.5%) and all groups of learners (range: 60.9%–87%). Regarding NPs and null pronouns, natives produce some null pronouns (25.8%) (which correspond to cases of “and” coordination) but very few NPs (9.7%), whereas low-level L2ers show the opposite behavior (more NPs than null pronouns) but advanced L2ers produce the same proportions for both forms.

Regarding PAS scenarios with non-subject antecedents, the frequency of production is very low (N = 18 for all groups of learners, 16 for natives). Importantly, none of the groups produced null pronouns to refer to a non-subject antecedent, which suggests again that L2ers do not simply transfer the null-pronoun option from Spanish to English. By contrast, all groups prefer an NP (natives: 14/16 = 87.5%; all learner groups: 13/18 = 72.2%) to an overt pronoun (natives: 2/16 = 12.5%; learners: 5/18 = 27.8%) to refer to a non-subject antecedent.

H3: PICTURE-TRANSITION EFFECTS ON THE CHOICE OF RES

RQ3 asks whether there is a within-task effect, that is, whether the transition between pictures affects the choice of RE when marking topic-continuity. There is indeed a gradual effect of picture (Figure 7). The production of fuller forms increases with a new picture (null < overt < NP) but decreases within the same picture (null > overt > NP). In other words, when marking topic-continuity, null pronouns predominate when describing the same picture (which correspond to coordination), but when starting to describe a new picture it is NPs that predominate. This is so for natives and all groups of learners, who behave alike. This picture-transition effect has not previously been reported in the literature, as we will discuss later.

FIGURE 7 RE forms in topic-continuity scenarios according to picture (same vs. different picture) by group (learners vs. natives).

To illustrate, in (21) the learner is marking topic-continuity as he is talking about the boy but uses full NPs (“the boy”) in the transition between pictures (marked by two slashes). By contrast, in (22) the learner is using null pronouns when describing the same picture.

H4: THE EFFECT OF CHARACTER ON THE CHOICE OF RES

H4 predicted character type to constrain the choice of REs. Figures 8 and 9 show a main effect of character, as predicted. To refer to the main character (the boy in Figure 8), L2ers overall produce overt pronouns more frequently than NPs (the boy but occasionally invented proper names like Timothy), which in turn are more frequent than null pronouns. Natives alternate between overt pronouns (55/128 = 43%) and NPs (55/128 = 43%). Null pronouns are low for L2ers (2.9%, 3.7%, 10.5%) and natives (14%), with L2ers showing an increasing trend toward nativeness. These represent cases of topic-continuity in coordinate sentences, as discussed earlier.

FIGURE 8 REs to refer to the main character (“boy”) by group.

FIGURE 9 REs according to secondary characters (“frog,” “dog”) by group.

A different pattern can be observed when referring to secondary characters (frog, dog in Figure 9). The overall pattern is NP >> overt > null across groups, which contrasts with the main character pattern (overt > NP >> null).⁶ The fact that learners assign a more central role to the frog than to the dog can be seen in the fact that the dog is mostly referred to by an NP, whereas the frog is mostly referred to by an NP but also by an overt pronoun. The central versus peripheral role of the characters can be seen in the overall raw frequencies: boy (N = 425 for all groups) > frog (N = 155) > dog (N = 41). Though English natives appear not to give a predominant role to either frog or dog because their RE form rates are similar for both frog (NP 62% > overt 22%) and dog (NP 62.5% > overt 25%), the raw figures clearly confirm that frog (N = 50) is more central than dog (N = 16). The low frequencies for dog are due to the fact that dog appears together with boy (boy and dog) (N = 215 for all groups), whereas boy and frog are never produced together (N = 0). In short, protagonisthood constrains the choice of REs.

Given that the character effect is clear, a central question is whether the choice of RE form is constrained exclusively by characterhood or rather by its information status. If constrained by characterhood, boy should be referred to by the RE forms reported just above (Figure 8) independently of its context (topic-continuity/-shift). If constrained by information status, boy should be referred to using an overt pronoun when encoding topic-continuity but an NP when encoding topic-shift. The latter is the case: In topic-continuity, boy is referred to by an overt pronoun more than by any other forms (Figure 10, first chart), similarly to what occurred in topic-continuity contexts (Figure 4, first chart). By contrast, in topic-shift (Figure 10, second chart), boy is referred to by an NP more than by any other forms, as in topic-shift contexts (Figure 4, second chart). These results confirm that all groups know how to mark topic-continuity/-shift independently of the character.

FIGURE 10 REs for the protagonist (boy) according to information status (topic-continuity/-shift) by proficiency level.

H5: THE EFFECT OF THE NUMBER OF ACTIVATED ANTECEDENTS ON THE CHOICE OF RES

We checked the effect of the number of activated antecedents (two vs. three within the last four clauses)⁷ regardless of the information status of the RE (Figure 11).⁸ REs are shown in bold and activated antecedents are underlined: (23) illustrates two activated antecedents and an overt pronoun RE; and (24) shows three activated antecedents and an NP RE.
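The four-clause window can be pictured with a short Python sketch. It rests on assumptions that the text does not spell out: that an "activated antecedent" is any distinct referent mentioned in the preceding four clauses, and that the actual antecedent is counted among them. The function name and the toy clause data are hypothetical.

```python
def count_activated_antecedents(preceding_clauses, window=4):
    """Count distinct referents mentioned in the last `window` clauses before the RE.

    `preceding_clauses` is a list of sets of referent labels, one set per clause,
    ordered from earliest to latest.
    ASSUMPTION: every distinct referent in the window counts as activated,
    including the RE's actual antecedent.
    """
    if window <= 0:
        return 0
    recent = preceding_clauses[-window:]
    return len(set().union(*recent))

# Invented example: five clauses precede the RE being annotated.
toy_clauses = [{"boy"}, {"boy", "dog"}, {"frog"}, {"boy"}, {"dog"}]

print(count_activated_antecedents(toy_clauses))            # window of four clauses
print(count_activated_antecedents(toy_clauses, window=2))  # narrower window
```

With the default window the toy narrative yields three activated antecedents (boy, dog, frog); narrowing the window to two clauses drops the frog and leaves two.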

FIGURE 11 RE forms according to the number of activated antecedents (two vs. three) by proficiency level.

For English natives, there is an interaction between the number of activated antecedents (2/3 antecedents) and the referential form (overt/NP). As (23a) illustrates, when there are two activated antecedents, natives’ REs are overt pronouns (62.5%) more often than NPs (37.5%). By contrast, with three activated antecedents, natives show the opposite behavior by producing more NPs (66.7%) than overt pronouns (33.3%), as in (24a). Natives’ production of overt pronouns is significantly higher with two than with three antecedents, whereas their production of NPs is significantly lower with two than with three antecedents (χ² = 12.779, p < 0.01).

For learners, we do not find the exact interaction found in natives, though it is true that (a) the number of NPs increases from two to three antecedents, and (b) the number of overt pronouns decreases from two to three antecedents (cf. Figure 11).

Beginners produce more overt pronouns (81.6%) than NPs (18.4%) with two antecedents but there is no distinction with three antecedents as the rates for both overt pronouns (45.7%) and NPs (54.3%) are similar. However, beginners’ production is significantly higher (a) with two (81.6%) than with three (45.7%) antecedents with overt pronouns but (b) with three (54.3%) than with two (18.4%) antecedents with NPs (χ² = 13.612, p < 0.01). Although beginners make the same statistically significant discrimination as natives do, their behavior is not native-like because with three antecedents they produce similar percentages of overt pronouns (45.7%) and NPs (54.3%). Intermediates show similar rates between overt pronouns (48.7%) and NPs (51.3%) with two antecedents, but with three antecedents they produce more NPs (61.2%) than overt pronouns (38.8%). Advanced L2ers show the opposite behavior to intermediates: different rates of overt pronouns (60.6%) and NPs (39.4%) with two antecedents, but similar rates with three antecedents (overt 51.1%, NPs 48.9%). Regarding the increase or decrease of forms as the number of antecedents increases, there are no significant differences for either intermediates (χ² = 1.135, p > 0.05) or advanced (χ² = 0.866, p > 0.05).

DISCUSSION

The overall results (H0), which do not consider any constraining factor, show that English natives predominantly produce NPs followed by overt and null pronouns. When compared against the natives, findings suggest that (a) none of the L2ers behaved in a consistently native-like fashion, even at advanced levels; (b) beginners and advanced L2ers (but not intermediates) produce similar rates of overt REs (overt pronouns and NPs), whereas natives produce more NPs than overt pronouns; and (c) L2ers produce very few null subjects, even in early stages, which is contrary to their high acceptance in previous experimental studies, so our findings suggest that L2ers are not transferring the null-subject option from Spanish, as will be discussed in detail later. These overall results are suggestive but not fully informative because the use of RE forms varies according to the multiple factors.

Regarding the information status factor (Hypothesis 1), natives clearly produce overt pronouns followed by null pronouns in topic-continuity contexts but mostly NPs in topic-shift contexts. L2ers show the same pattern in topic-shift contexts, but not in topic-continuity contexts, producing significantly more overt pronouns and fewer null pronouns than natives. L2ers are thus not fully able to choose the most felicitous REs at the syntax-discourse interface in a native-like manner even at advanced (C1) levels, though future corpus research will have to determine whether native-like patterns are eventually attainable. Additionally, not all contexts at the syntax-discourse interface are equally problematic, as discussed by Slabakova (2016) and White (2011), because L2ers’ performance in topic-shift contexts does not significantly differ from natives’, but it does in topic-continuity as reported in previous L2 Spanish corpus (Lozano, 2016; Martín-Villena & Lozano, 2020) and experimental (Lozano, 2018) studies. The overexplicitness phenomenon that L2ers exhibit in topic-continuity contexts (i.e., higher production of overt pronouns but lower production of null pronouns than natives) is in line with previous corpus research (Crosthwaite, 2011; Hendriks, 2003; Leclercq & Lenart, 2013; Ryan, 2015), though note that these studies do not consider proficiency levels. We show that L2ers (particularly beginners) produce redundant REs, as reported elsewhere. Importantly, such redundancy (a.k.a. overexplicitness) has been theoretically motivated. 
In Sorace’s (2011 and references therein) interface hypothesis, advanced learners are predicted to residually use overt pronouns as a default strategy to ease their processing load (i.e., they redundantly use overt pronouns in topic-continuity). The interpretability hypothesis (Tsimpli & Dimitrakopoulou, 2007) also accounts for the overexplicitness phenomenon by postulating that L2ers transfer L1 agreement properties to the L2 due to the differences between the L1 and L2 and that overt pronouns are agreement markers. This could be a tentative explanation but there is strong evidence showing that L2ers with both L1 (Greek) and L2 (Spanish) null-subject languages (Lozano, 2018) also overproduce redundant overt pronouns in topic-continuity and this cannot be explained in terms of L2ers’ inaccessibility to uninterpretable features. Therefore, there may be a general L2 overexplicitness phenomenon regardless of the L1–L2 combination that future research should explain. Crucially, our learners are not only redundant, but they are more redundant than ambiguous, as proposed by the PPVH (Lozano, 2016), which was presented in the “Introduction.” In particular, our learners (a) observe the Gricean Principle of Manner/Clarity (Do not use a full RE without reason, i.e., use full REs only to avoid ambiguity), therefore correctly producing fuller forms in topic-shift, but (b) often violate the Informativeness/Economy Principle (Use minimal RE forms to achieve your communicative goals) and are therefore redundant in topic-continuity. Redundancy is also modulated in the PPVH by another factor: the number of potential antecedents, as also found in our study: the higher the number of antecedents, the higher the probability of using fuller REs. 
Future research will need to explore how this and other theories can best account for the different factors that modulate RE choice in other L1–L2 combinations and the redundancy that all L2ers exhibit.

Regarding cross-linguistic influence (H2a), the clearest scenarios to check L1 effects are coordination versus subordination. Two pieces of evidence show that L1 transfer is not the fundamental explanatory factor. First, the lower levels produced low rates of felicitous null pronouns in coordinate sentences (beginners: 19.2%, intermediates: 20%), though much higher rates would be expected given that null pronouns are allowed in native Spanish when marking topic-continuity. Second, in subordinate sentences null pronouns would be expected too because they are also allowed in L1 Spanish, but this is not the case as production is zero. This supports the redundancy strategy just mentioned. Instead of transferring null pronouns from beginner levels, L2ers start by overproducing overt pronouns, which are redundant in topic-continuity coordinate contexts, whereas natives show the opposite behavior, with advanced learners timidly but not significantly approaching the native pattern. Therefore, it seems that learners progressively become more sensitive to the pragmatics of the English input as their proficiency level increases, gradually producing pragmatically felicitous null subject pronouns in coordination (19.2%, 20%, 60%). As for H2b, the PAS results also suggest that transfer is not a key explanatory factor: learners predominantly produce overt pronouns but hardly any null pronoun tokens (beginners: 0; intermediates: 1; advanced: 2) to refer to a subject antecedent, though null pronouns are the predominant option in native Spanish (cf. Alonso-Ovalle et al., 2002, inter alia). 
Our results are thus not fully aligned with previous experimental studies that argue for transfer effects because they report acceptance of null pronominal subjects, particularly at lower levels of proficiency (Mitkovska & Bužarovska, 2018; Pladevall Ballester, 2013; Prentza, 2014). This difference could be due to the different research methods (production vs. comprehension/interpretation). So, triangulation (i.e., the combined use of production and comprehension methods to investigate the same phenomenon, as recommended by Mendikoetxea & Lozano, 2018) seems like the correct avenue to resolve this apparent paradox.

Regarding the within-task effect (H3), a picture-transition effect was observed. In topic-continuity contexts the transition between pictures triggered fuller forms (even though they were not pragmatically required), in natives and L2ers alike. This effect has gone undetected in L2 corpus studies (Crosthwaite, 2011; Kang, 2004) and L2 controlled-production studies (Contemori & Dussias, 2016) that used picture-elicitation prompts. For example, even though Contemori and Dussias (2016) do not report it, we can observe (their Figure 7) that English natives produce around 25% of NPs (and 75% of overt pronouns) in a one-character topic-continuity scenario when moving from picture A to picture B. Such a high use of redundant NPs might have been triggered by a picture-transition effect. We therefore speculate that there are two types of continuity: textual continuity (topic-continuity) and visual continuity (picture-continuity). A break in visual continuity may trigger the use of fuller forms even when the textual continuity is maintained. Studies on native English have reported transition effects when breaking what has been termed “unity,” that is, the transition between paragraphs, between episodes, or between scenes in a film (cf. Collewaert, 2019 for an overview). Future L2 (corpus and experimental) research will therefore need to clarify the role of such picture-transition effects.

Results on the nature of the character (H4) show a clear difference between the main character (boy) and the secondary characters (frog, dog) in the choice of REs. When referring to the main character, natives equally produce overt pronouns and NPs, but L2ers typically prefer an overt pronoun. NPs are predominantly used across groups for secondary characters, though L2ers’ rates were higher than natives’. Even though previous studies also reported a similar character effect (Kang, 2004; Ryan, 2015), they overlooked the fact that the main-character effect is modulated by information status: In topic-continuity, all our groups predominantly refer to the boy with an overt pronoun whereas in topic-shift they mainly produce an NP.

Finally, we predicted that the higher the number of activated antecedents, the fuller the RE (Hypothesis 5), as also reported for native and L2 Spanish (Lozano, 2016). This was confirmed for English natives as they produce more overt pronominals than NPs with two antecedents but show the opposite trend with three antecedents. This interaction is less clear in the production of L2ers, though in their transition from two to three antecedents, overt pronominals decrease while NPs increase. The presence of one versus two competing antecedents has been argued to compromise processing resources, as reported in native English (inter alia, Arnold & Griffin, 2007) and L2 English (Contemori & Dussias, 2016). This factor may be more evident in natives as they know how to efficiently choose REs in their mother tongue and their use does not imply additional processing costs. By contrast, L2ers incur higher processing costs due to the number of activated antecedents and also due to the processing costs derived from choosing the correct RE in each particular context in their non-native language while inhibiting the RE form from their native language (cf. Sorace, 2011).

Certainly, all our conclusions are limited to 3rd-person singular REs in subject position only, as justified in the preceding text. This is not a limitation but rather the strong point of this study because RE and anaphora resolution deficits have been mainly observed with non-deictic uses of the pronouns (3rd person) in subject position (where the topic-continuity vs. topic-shift contrast is possible). We have focused on the acquisition of this phenomenon in a unified way in L2 English.

CONCLUSION

The current developmental study focused on L1 Spanish–L2 English learners at different proficiency levels and showed how the production of different REs is constrained by different factors that were not considered together in previous research (i.e., information status, main-character effect, syntactic configuration, number of activated antecedents, and picture transition). We confirmed that L2ers show deficits at the syntax-discourse interface as they are overexplicit/redundant but not ambiguous, in accordance with the PPVH, but this varies according to proficiency and context type, thus confirming that not all contexts at the syntax-discourse interface are equally problematic.

From a statistical point of view, we have not adopted a standard multiple-regression approach, in which typically all factors are pooled together and then the effect of individual/grouped factors is explored in a statistically motivated but linguistically unmotivated way. Instead, we have taken a linguistically motivated approach based on the systematic analysis of key factors reported individually in previous L2 research. We therefore explored (a) how one factor at a time crucially contributes to the choice of RE forms: for example, info status (topic-continuity/-shift), which has been consistently shown to be the main explanatory factor in the L2 literature; character effects (main/secondary); and the number of activated antecedents (2/3), and (b) how certain combinations of two (sub)factors contribute to RE choice (e.g., topic-continuity and syntax [coordination/subordination]; topic-continuity and the number of activated antecedents [2/3]). In short, the simultaneous analysis of all factors together in a linguistically unmotivated way typically yields a blurry, imprecise picture of learners’ competence and often says little to theoretical models of SLA.

Our descriptive findings also present a systematic analysis of several factors that constrain the use of REs in real native and non-native English discourse. Crucially, a solid descriptive basis is often a prerequisite for subsequent theoretically motivated studies (e.g., Ellis, 2015; Ellis & Barkhuizen, 2005). Additionally, unlike most previous L2 English studies that tested L2ers at only a given proficiency level, these factors are better understood across proficiency levels, which sheds light on developmental and ultimate-attainment issues. Finally, our findings may help future researchers develop experiments to test these factors either individually or in conjunction. As recommended by Mendikoetxea and Lozano (2018), triangulating corpus and experimental data will be essential to get a wider (and, at the same time, more nuanced) picture of the phenomenon under investigation in SLA.

Footnotes

The experiment in this article earned an Open Data badge for transparent practices. The materials are available at http://www.learnercorpora.com

This research was supported by the research project ANACOR (FFI2016-75106-P) funded by MINECO (Spain) and awarded to the second author, and by a young researchers’ contract funded by the European Union-Junta de Andalucía awarded to the first author.

¹ In generative grammar, uninterpretable features are those that do not have a transparent semantic interpretation (e.g., agreement between the subject and the verb or between the adjective and noun), whereas interpretable features do (e.g., plural).

² Korean is a null-subject language. Reference to the sentential topic (typically the subject) is realized using a null pronoun.

³ A reviewer points out the limited size of our sample, but note that our sample and the raw frequency of RE forms analyzed are larger than those of the corpus-based L2 English studies we review, as the following comparative table shows. What matters is not only the actual frequency of RE forms analyzed (N = 674) but also the final number of terminal tags that are pooled into the statistical analysis (N ≈ 6,740). The high number of terminal tags is the result of a detailed tagset where several aspects of each RE are tagged (cf. Figure 1).

⁴ We report p values in the χ² output format reported by the UAM Corpus Tool software: p < 0.05 (significant difference) and p < 0.01 (highly significant difference).
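As an illustration of how these two thresholds apply, the following minimal Python sketch computes a Pearson χ² statistic for a 2 × 2 contingency table and labels the resulting p value with the cut-offs above. It is not the UAM Corpus Tool implementation, whose internals are not described here: the function names (`chi_square_2x2`, `significance_label`) and the counts are hypothetical, and the sketch assumes no continuity correction.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p), where p is the upper-tail probability with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X >= chi2) equals erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


def significance_label(p):
    """Map a p value onto the two thresholds used in this footnote."""
    if p < 0.01:
        return "highly significant"
    if p < 0.05:
        return "significant"
    return "not significant"


# Hypothetical counts (assumption, not data from the study):
# rows = two groups, columns = overt pronoun vs. NP.
chi2, p = chi_square_2x2(30, 20, 15, 35)
print(round(chi2, 2), significance_label(p))
```

With more than two groups or RE forms, the same logic extends to larger tables with more degrees of freedom, in which case a statistics library would be the more practical choice.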

⁵ Subject-pronoun drop has been previously reported in native English, particularly in informal registers such as diary-drop styles (e.g., Haegeman & Ihsane, 2001). However, the novelty of our findings is that null pronouns are typically licensed in 3rd-person topic-continuity coordinate contexts (76.7%) and this has not been previously addressed in a systematic way as we do here.

⁶ We use the symbol “>” to indicate that the difference is large and “>>” to indicate a very large difference.

⁷ We excluded from our analysis the tag “zero antecedents” because this option appears only at the beginning of the narratives when introducing the first character (focus new introduction) and because its frequency is very low. As the story develops, two or more potential antecedents (boy, dog, frog) are introduced. We also excluded the tag “3+ antecedents,” as the frequencies were very low, thus leading to unreliable statistical contrasts.

⁸ We exclude from the analysis null pronouns and focus only on overt material (overt pronouns and NPs) due to (a) the low frequency of null pronouns (particularly in learners) and (b) the nonsignificant differences in any of the groups between two versus three antecedents for null pronouns.

References

Alonso-Ovalle, L., Fernández-Solera, S., Frazier, L., & Clifton, C., Jr. (2002). Null vs. overt pronouns and the topic-focus articulation in Spanish. Italian Journal of Linguistics, 14, 151–170.

Ariel, M. (2004). Accessibility marking: Discourse functions, discourse profiles, and processing cues. Discourse Processes, 37, 91–116. https://doi.org/10.1207/s15326950dp3702_2

Arnold, J., & Griffin, Z. M. (2007). The effect of additional characters on choice of referring expression: Everyone counts. Journal of Memory and Language, 56, 521–536. https://doi.org/10.1016/j.jml.2006.09.007

Carminati, M. N. (2002). The processing of Italian subject pronouns (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (AAI3039345). https://scholarworks.umass.edu/dissertations/AAI3039345

Collewaert, K. (2019). Los mecanismos referenciales en el discurso oral del español como lengua extranjera (ELE): Un estudio de corpus basado en neerlandófonos aprendices de ELE (PhD dissertation). Vrije Universiteit Brussel and Universidad de Granada.

Contemori, C., & Dussias, P. E. (2016). Referential choice in a second language: Evidence for a listener-oriented approach. Language, Cognition and Neuroscience, 31, 1257–1272. https://doi.org/10.1080/23273798.2016.1220604

Crosthwaite, P. (2011). The effect of collaboration on the cohesion and coherence of L2 narrative discourse between English NS and Korean L2 English users. Asian EFL Journal, 13, 135–166.

Cunnings, I., Fotiadou, G., & Tsimpli, I. M. (2017). Anaphora resolution and reanalysis during L2 sentence processing: Evidence from the visual world paradigm. Studies in Second Language Acquisition, 39, 621–652. https://doi.org/10.1017/S0272263116000292

Ellis, R. (2015). Researching acquisition sequences: Idealization and de-idealization in SLA. Language Learning, 65, 181–209. https://doi.org/10.1111/lang.12089

Ellis, R., & Barkhuizen, G. P. (2005). Analysing learner language. Oxford University Press.

Granger, S., Gilquin, G., & Meunier, F. (2015). The Cambridge handbook of learner corpus research. Cambridge University Press.

Haegeman, L. (2009). Understood subjects in English diaries. On the relevance of theoretical syntax for the study of register variation. Multilingua – Journal of Cross-Cultural and Interlanguage Communication, 9, 157–200. https://doi.org/10.1515/mult.1990.9.2.157

Haegeman, L., & Ihsane, T. (2001). Adult null subjects in the non-pro-drop languages: Two diary dialects. Language Acquisition, 9, 329–346. https://doi.org/10.1207/S15327817LA0904_03

Hendriks, H. (2003). Using nouns for reference maintenance: A seeming contradiction in L2 discourse. In Giacalone, A. (Ed.), Typology and second language acquisition (pp. 291–326). Mouton De Gruyter.

Kang, J. Y. (2004). Telling a coherent story in a foreign language: Analysis of Korean EFL learners’ referential strategies in oral narrative discourse. Journal of Pragmatics, 36, 1975–1990. https://doi.org/10.1016/j.pragma.2004.03.007

Leclercq, P., & Lenart, E. (2013). Discourse cohesion and accessibility of referents in oral narratives: A comparison of L1 and L2 acquisition of French and English. Discours. Revue de Linguistique, Psycholinguistique et Informatique, 12, 3–31. https://doi.org/10.4000/discours.8801

Lozano, C. (2009). Selective deficits at the syntax-discourse interface: Evidence from the CEDEL2 corpus. In Snape, N., Leung, Y. I., & Sharwood Smith, M. (Eds.), Language acquisition and language disorders (Vol. 47, pp. 127–166). John Benjamins. https://doi.org/10.1075/lald.47.09loz

Lozano, C. (2016). Pragmatic principles in anaphora resolution at the syntax-discourse interface: Advanced English learners of Spanish in the CEDEL2 corpus. In Alonso-Ramos, M. (Ed.), Studies in corpus linguistics (Vol. 78, pp. 235–265). John Benjamins. https://doi.org/10.1075/scl.78.09loz

Lozano, C. (2018). The development of anaphora resolution at the syntax-discourse interface: Pronominal subjects in Greek learners of Spanish. Journal of Psycholinguistic Research, 47, 411–430. https://doi.org/10.1007/s10936-017-9541-8

Martín-Villena, F., & Lozano, C. (2020). Anaphora resolution in topic continuity: Evidence from L1 English–L2 Spanish data in the CEDEL2 corpus. In Ryan, J. & Crosthwaite, P. (Eds.), Referring in a second language: Studies on reference to person in a multilingual world (pp. 119–141). Routledge. https://doi.org/10.4324/9780429263972-7

Mayer, M. (1969). Frog, where are you? Dial Books for Young Readers.

Mendikoetxea, A., & Lozano, C. (2018). From corpora to experiments: Methodological triangulation in the study of word order at the interfaces in adult late bilinguals (L2 learners). Journal of Psycholinguistic Research, 47, 871–898. https://doi.org/10.1007/s10936-018-9560-0

Mitkovska, L., & Bužarovska, E. (2018). Subject pronoun (non)realization in the English learner language of Macedonian speakers. Second Language Research, 34, 463–485. https://doi.org/10.1177/0267658317747925

O’Donnell, M. J. (2008). The UAM CorpusTool: Software for corpus annotation and exploration. Proceedings of the Congreso de AESLA, 26, 1433–1447.

Pladevall Ballester, E. (2013). Adult instructed SLA of English subject properties. Canadian Journal of Linguistics, 58, 465–486. https://doi.org/10.1017/S0008413100002668

Prentza, A. (2014). Pronominal subjects in English L2 acquisition and in L1 Greek: Issues of interpretation, use and L1 transfer. In Nikolaos, L., Thomaï, A., & Sougari, A. M. (Eds.), Major trends in theoretical and applied linguistics 2: Selected papers from the 20th ISTAL (pp. 369–386). Versita Ltd.

Ryan, J. (2015). Overexplicit referent tracking in L2 English: Strategy, avoidance, or myth? Language Learning, 65, 824–859. https://doi.org/10.1111/lang.12139

Slabakova, R. (2016). Second language acquisition. Oxford University Press.

Sorace, A. (2011). Pinning down the concept of “interface” in bilingualism. Linguistic Approaches to Bilingualism, 1, 1–33. https://doi.org/10.1075/lab.1.1.01sor

Sorace, A., & Serratrice, L. (2009). Internal and external interfaces in bilingual language development: Beyond structural overlap. International Journal of Bilingualism, 13, 195–210. https://doi.org/10.1177/1367006909339810

Tsimpli, I. M., & Dimitrakopoulou, M. (2007). The interpretability hypothesis: Evidence from wh-interrogatives in second language acquisition. Second Language Research, 23, 215–242. https://doi.org/10.1177/0267658307076546

Tsimpli, I., & Sorace, A. (2006). Differentiating interfaces: L2 performance in syntax-semantics and syntax-discourse phenomena. BUCLD Proceedings, 30, 653–664.

White, L. (2009). Grammatical theory: Interfaces and L2 knowledge. In Ritchie, W. C. & Bhatia, T. K. (Eds.), The new handbook of second language acquisition (pp. 49–68). Emerald.