1. Introduction
Processes of grammatical agreement are vital in language production. Even in languages with relatively poor morphology, such as English, agreement processes are frequently employed. Despite the frequency of these processes, however, native speakers of languages that require agreement relations to be expressed occasionally produce “errors” in agreement (non-canonical agreement). Recent findings indicate that subject–verb number agreement error rates may differ both across and within languages as a function of the relative morphological richness of subject noun phrase elements (Vigliocco, Butterworth and Semenza, 1995; Hartsuiker, Schriefers, Bock and Kikstra, 2003; Lorimor, Bock, Zalkind, Sheyman and Beard, 2008; Foote and Bock, 2009). Based on the hypothesis that number morphology constrains or regulates the expression of number meaning, Eberhard, Cutting and Bock's (2005) marking and morphing account of agreement production (see also Bock, Eberhard, Cutting, Meyer and Schriefers, 2001) explains this finding as a difference in the extent to which the grammatical number of a subject noun phrase controls verb number, depending on the relative richness of the noun phrase's inflectional morphology. The primary goal of the current study is to determine whether this account of differences in agreement production as a function of morphological richness can be extended to account for agreement in bilinguals who speak languages that differ in morphological richness, specifically English and Spanish.
While native speakers of languages that require subject–verb number agreement to be expressed only occasionally produce agreement errors, language learners, or bilinguals, may experience difficulties in the processing and production of agreement (e.g., McDonald, 2000, 2006; Guillelmon and Grosjean, 2001; Jiang, 2004; Montrul, Foote and Perpiñán, 2008). Several factors seem to affect bilinguals' linguistic knowledge, their degree of command of each language, and the interaction between their two languages. Two of these factors in particular, both related to language experience, may come into play in bilingual agreement production: age of acquisition and proficiency. Second language (L2) learners typically begin exposure to the L2 late, around puberty (and are often referred to as late bilinguals); they usually do not reach native-like levels of proficiency in their L2, though doing so does seem to be possible, at least in some linguistic domains (e.g., Birdsong, 1992; Ioup, Boustagui, El Tigi and Moselle, 1994; White and Genesee, 1996; McDonald, 2000; Birdsong and Molis, 2001). Many speakers of minority languages in the US (sometimes referred to as heritage language speakers; e.g., Valdés, 2005) are early bilinguals, who were exposed to the family and community language from birth or early childhood. It appears that this type of bilingual may be qualitatively different from both monolinguals and late bilinguals.
While many who maintain their fluency in both languages sound like monolinguals in their everyday language use, when they are examined in experimental settings it becomes evident that they do not behave exactly like monolinguals (e.g., Valdés, 2005; Montrul, 2008; Polinsky, 2008). Depending on their level of proficiency, early bilinguals may not suffer from the same difficulties in the production of agreement as late bilinguals, though some seem to pattern similarly to L2 learners in some linguistic domains (Lipski, 1993; Montrul, 2008).
It is not yet completely clear how age of acquisition and proficiency interact to determine linguistic outcomes in bilinguals. A crucial question related to bilingual agreement production, and whether monolingual models of agreement can be extended to account for the case of bilingualism, is how these factors interact to influence the production of agreement. Therefore, a secondary goal of this study is to examine whether language experience (in terms of age of acquisition and/or proficiency) plays a role in whether agreement production in monolinguals and bilinguals proceeds in the same manner.
In order to investigate these issues experimentally, in the present study I examine subject–verb number agreement production in two types of bilinguals: late bilinguals (with age of onset of bilingualism after puberty) and early bilinguals (with age of onset in early childhood). I first ask whether the marking and morphing account of differences in agreement production as a function of morphological richness can be extended to the case of bilinguals, and second, whether age of acquisition and/or proficiency modulate whether or not these differences surface.
2. Agreement production in English and Spanish
In both English and Spanish, the subject of an utterance must agree in number (singular or plural) with its verb. Although subject–verb number agreement may seem a simple task for a native speaker, native speakers sometimes make agreement errors. The examination of such errors can reveal how agreement is produced.
2.1 Errors in agreement production
One common type of native speaker error in agreement production is attraction. Attraction occurs when the verb agrees with a nearby (local) noun instead of the head noun of the subject noun phrase (NP), as exemplified in (1):
(1) *The road to the mountains are long.
This occurs most often when the verb is in proximity to a plural noun that forms part of the subject NP but is not the head noun, as in the previous example. The opposite situation, exemplified in (2), does not usually cause attraction (Bock, 1995).
(2) The roads to the mountain
According to Eberhard (1997), this singular–plural asymmetry exists because there is an underlying plural specification for plural nouns, but not for singular nouns. Studies investigating speech errors in a variety of languages with both rich and relatively poor agreement morphology have found attraction-related agreement errors along with the singular–plural asymmetry, including studies on English (Bock and Miller, 1991), Dutch and German (Hartsuiker et al., 2003), French (Fayol, Largy and Lemaire, 1994), Hebrew (Deutsch and Dank, 2009), Italian (Vigliocco et al., 1995), Russian (Lorimor et al., 2008) and Spanish (Vigliocco, Butterworth and Garrett, 1996).
Besides this potential conflict in number specification, other types of number conflict may lead to agreement errors in native speakers. If the grammatical number of a subject NP does not match its notional or conceptual number, agreement errors may occur. Several studies have investigated the effects of this type of mismatch in sentences with distributive subject NPs. Distributive NPs are grammatically singular NPs that can denote multiple entities, as illustrated in (3):
(3) The uniform of the soldiers
This noun phrase can refer to a single type of uniform that is shared across all the soldiers, so that the uniform type applies to multiple instances of a soldier, generating multiple instances of the uniform in the NP's mental conceptualization. An NP of this type can thus be construed as notionally or conceptually plural, even though it is grammatically singular. On the other hand, non-distributive noun phrases, as exemplified in (4), tend to denote only one entity; they are both notionally and grammatically singular (though they may also be ambiguous to some extent).
(4) The home of the ladies
Despite these conceptual differences between distributive and non-distributive noun phrases, any verb that is to agree with such an NP is expected to be singular, since the head of the NP is grammatically singular in both cases. However, distributive subject NPs which are grammatically singular may elicit more plural verbs than non-distributive, grammatically singular NPs.
The majority of the studies that examine how agreement is produced, and whether factors such as conceptual plurality influence the process, use a variation of the sentence-fragment completion methodology developed by Bock and Miller (1991). In this paradigm, participants are presented either aurally or visually with a sentence fragment that consists of a complex subject NP, as in (5):
(5) The label on the bottles
Participants repeat and complete the fragment to make a complete sentence, as in (6):
(6) The label on the bottles is pretty.
The fragment is manipulated with regard to the number of the head and local nouns, and whether these match or mismatch. The completions that the participant creates contain verb forms that either agree or disagree with the subject in number, thus permitting the examination of the effects of the fragment manipulations.
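The coding step of this paradigm can be illustrated with a minimal Python sketch. It assumes that each completion has already been transcribed and the number of its verb identified; the function names, the "sg"/"pl" labels and the treatment of completions without a codable verb as a separate "miscellaneous" category are illustrative assumptions, not a description of any particular study's scoring procedure.

```python
from collections import Counter
from typing import Iterable, Optional


def code_response(head_number: str, verb_number: Optional[str]) -> str:
    """Classify one completion relative to the grammatical number of the head noun.

    head_number and verb_number are "sg" or "pl"; verb_number is None when the
    completion contains no verb whose number can be determined (hypothetical case).
    """
    if verb_number is None:
        return "miscellaneous"  # set aside: counted neither as correct nor as an error
    return "correct" if verb_number == head_number else "error"


def error_percentage(codes: Iterable[str]) -> float:
    """Agreement errors as a percentage of scorable (correct + error) responses."""
    counts = Counter(codes)
    scorable = counts["correct"] + counts["error"]
    if scorable == 0:
        return float("nan")  # no scorable responses in this cell
    return 100.0 * counts["error"] / scorable


# "The label on the bottles ..." has a singular head noun.
codes = [
    code_response("sg", "sg"),  # "... is pretty."
    code_response("sg", "pl"),  # "... are pretty." (attraction error)
    code_response("sg", None),  # no codable verb
]
print(codes, error_percentage(codes))
```

Computing error percentages over scorable responses only keeps uncodable completions from diluting the error rate; tallying these percentages separately for each combination of head and local noun number gives the cell values that the fragment manipulations are compared on.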
Studies investigating distributivity effects cross-linguistically have at times failed to find such effects in English (Bock and Miller, 1991; Vigliocco, Butterworth et al., 1996). For languages other than English, on the other hand, researchers have found robust distributivity effects (Italian and Spanish: Vigliocco et al., 1995; Vigliocco, Butterworth et al., 1996; French and Dutch: Vigliocco, Hartsuiker, Jarema and Kolk, 1996). However, recent findings suggest that the lack of distributivity effects in English is due to methodological issues. Eberhard (1999) showed that when experimental phrases were made more imageable, distributivity effects surfaced in English. Bock, Carreiras and Meseguer (2009) found that when the same experimental items were used in English and Spanish, both languages showed an effect of distributivity. Foote and Bock (2009) also used translation equivalents to examine agreement and found distributivity effects in two varieties of Spanish as well as in English. In summary, some of the early data from English seemed to support grammatical number as the only source of subject–verb number agreement, while the data from other Germanic and Romance languages suggested that notional or conceptual number affects agreement in those languages. More recent studies do not support the existence of cross-linguistic differences in whether conceptual number plays a role in agreement; it appears to do so universally (at least in the languages studied so far).
Though the previously reported cross-linguistic differences in distributivity effects appear to be due to methodological issues, results from recent studies indicate that rates of attraction and the magnitude of conceptual effects, both within and across languages, may vary depending on the relative richness of the agreement morphology expressed in the subject noun phrase. Two studies investigating whether ambiguity in the expression of subject NP number morphology affects attraction rates within languages suggest that morphological ambiguity leads to higher agreement error counts in Italian (Vigliocco et al., 1995) and in German and Dutch (Hartsuiker et al., 2003). Lorimor et al. (2008) compared the magnitude of distributivity effects across several languages, drawing on results of previous studies and introducing new data from Russian, and found a clear pattern: languages with richer morphology (e.g., Russian, Spanish, Italian and French) showed a diminished distributivity effect in comparison to those with relatively poorer morphology (e.g., English). Foote and Bock (2009) compared the magnitude of distributivity effects across Mexican Spanish, Dominican Spanish and English. Mexican Spanish is a variety of Spanish that has maintained relatively rich inflectional morphology, with elements of the noun phrase consistently inflected for both gender and number. Dominican Spanish, on the other hand, appears to be undergoing changes that reduce the relative richness of its morphological system in comparison to other varieties (Toribio, 2000). English is comparatively impoverished with respect to inflectional morphology, with only the noun marked for number in the noun phrase.
Results of an experimental comparison of the magnitude of distributivity effects across these languages/language varieties (using the same materials as the current study) revealed that English and Dominican Spanish patterned alike, with larger distributivity effects than Mexican Spanish, providing further support for the hypothesis that the size of distributivity effects varies with morphological richness across languages.
2.2 The marking and morphing account of agreement production
The finding that both attraction rates and the magnitude of distributivity effects are negatively correlated with morphological richness can be explained by Eberhard et al.'s (2005) marking and morphing account of agreement production. According to this account, when the speaker's preverbal message is formed, it is determined whether the referent that is to be expressed as the subject NP is one thing or more than one thing. This information is subsequently transmitted to the syntax during marking, which is part of the functional assembly stage of language production. The system also accesses the lexicon to select the appropriate lexical entries to reflect the conceptual number specified in the message, and a lexical–grammatical representation is formed. Next, during the stage of structural integration, morphing operations reconcile number features that were marked on the syntax with the number specifications from the lexicon. These reconciled features determine the number of the verb phrase. Agreement errors due to the influence of conceptual number may arise when the number features marked on the syntax and those specified from the lexicon do not match. When this type of mismatch occurs, the number specifications from the lexicon usually win out, since they have the advantage of recency relative to marking with respect to when verb number is determined. However, the winner in the reconciliation process depends not only on recency but also on the quantity of morphological information present in the number specifications from the lexicon. When there is less number morphology present, there is less opportunity for mismatching notional plurality that was marked in the syntax to be “canceled out”. The result is what appears to be greater sensitivity to conceptual number, that is, larger distributivity effects in agreement production when the subject NP carries relatively poorer number morphology.
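To make the reconciliation idea concrete, the following Python sketch uses a deliberately simple toy model. It is not the quantitative formulation of Eberhard et al. (2005): the logistic function, the weights, the bias and the counts of singular-marking morphemes are all hypothetical values chosen only to mimic the qualitative claim that more singular number morphology in the subject NP leaves less room for notional plurality to surface on the verb.

```python
import math


def p_plural_verb(notional_plural: float, singular_morphemes: int,
                  notional_weight: float = 2.0, morpheme_weight: float = 1.0,
                  bias: float = -2.0) -> float:
    """Toy probability of a plural verb after a grammatically singular subject NP.

    notional_plural: 0.0 (construed as one thing) to 1.0 (more than one thing),
        standing in for number marked on the syntax from the message.
    singular_morphemes: how many NP morphemes spell out singular number
        (hypothetical counts: 1 for English "the stamp", 2 for Spanish "el sello").
    All weights are arbitrary illustration values, not fitted parameters.
    """
    score = bias + notional_weight * notional_plural - morpheme_weight * singular_morphemes
    return 1.0 / (1.0 + math.exp(-score))  # logistic squashing to a probability


def distributivity_effect(singular_morphemes: int) -> float:
    """Extra plural-verb probability for a distributive vs. a single construal."""
    return p_plural_verb(1.0, singular_morphemes) - p_plural_verb(0.0, singular_morphemes)


for label, n in [("English-like (1 marker)", 1), ("Spanish-like (2 markers)", 2)]:
    print(f"{label}: {distributivity_effect(n):.3f}")
```

Because the logistic curve flattens as the score moves further below its midpoint, adding a second singular marker in this toy setup shrinks the gap between distributive and single construals, which is the direction of the morphological-richness effect described above; the numbers should not be read as estimates of real error rates.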
3. Agreement production in bilinguals
The evidence currently available in support of the relationship between conceptual effects and morphological richness comes only from native speakers in their L1. A key question is whether this relationship is borne out in bilingual production. This question is important not only for determining how bilinguals produce agreement, but also for assessing the generalizability of the marking and morphing account. The examination of bilingual agreement production also allows the negative correlation between morphological richness and agreement errors to be tested in a unique way, namely within a single speaker. This eliminates any experimental confound due to differences between participants across language groups. However, other factors may come into play in the production of agreement by bilinguals. Bilingual individuals do not normally have equal command of their two languages; they are almost always dominant in one. Factors related to language experience, such as age of acquisition and proficiency, may determine which of their languages is the stronger one, and are therefore important to consider when asking whether a monolingual account of language production applies to bilingualism.
3.1 Age of acquisition and proficiency as factors in language processing and production
There is a wealth of literature that addresses the possible effects of age on the acquisition of an L2. Many researchers frame their investigation of these effects around the idea of a critical or sensitive period, noting that the variability in the success of post-pubescent second language acquisition, when compared with the success of first language acquisition, suggests that some factor sets first and (late) second language acquisition apart as fundamentally different (Bley-Vroman, 1989; Clahsen and Muysken, 1989; Clahsen, 1990). According to this view, it may be the case that late language acquirers no longer have access to innate linguistic mechanisms and principles that are used for first language acquisition (e.g., Meisel, 1990, 1997; Eubank, 1994; Hawkins and Chan, 1997; Beck, 1998; Clahsen and Felser, 2006) or that adults no longer use the same cognitive mechanisms to acquire language as they did when they were children (e.g., Ullman, 2001; DeKeyser, 2003; Paradis, 2004). Other researchers point to examples of late language learners who show native-like competence in the L2 as vital pieces of counter-evidence against across-the-board age effects in L2 acquisition (e.g., Birdsong, 1992; Ioup et al., 1994; Schwartz and Sprouse, 1996; White and Genesee, 1996; McDonald, 2000; Birdsong and Molis, 2001; Dekydtspotter, Kim, Kim, Wang, Kim and Lee, 2008).
McDonald (2006) presents one alternative to the hypothesis that age is responsible for less-than-native-like language acquisition outcomes, suggesting that what appear to be age effects are actually related to learner processing deficits. In two experiments, she provides evidence that native speakers of English can be made to perform similarly to late learners on grammaticality judgment tasks when stressors are introduced to increase processing load.
Interacting but not necessarily correlating with age of language acquisition is language proficiency. Though the Critical Period Hypothesis seems to predict native-like attainment in both languages of an early bilingual, since both were learned before the closure of the hypothesized critical period, studies that examine early bilinguals who are speakers of minority languages show that this prediction does not always hold (e.g., Montrul, 2002, 2006; Montrul et al., 2008; Polinsky, 2008). The investigation of the linguistic characteristics of speakers of Spanish as a minority language in the US, for example, suggests that quantity and quality of linguistic input must be taken into account in addition to age of acquisition. While Spanish-speaking immigrants who arrived in the US during adulthood are typically late bilinguals with English as a weaker L2, their children often learn Spanish at home until starting schooling in English around age five. They then go on to become dominant in English rather than in the early-learned minority language. Spanish is either not fully acquired or undergoes attrition (Valdés, 2005; Montrul, 2008). According to Montrul (2008), heritage language acquisition shares traits with both early monolingual and late L2 acquisition: while heritage language acquirers do have early exposure to the language, as well as input that is similar in quality, if not quantity, to input available to L1 acquirers, they do not necessarily completely acquire the heritage language.
Regardless of whether early onset of bilingualism is sufficient to ensure native-like attainment of a language, it seems that age may play a role in language acquisition outcomes, though it is not evident whether this role is direct or indirect. Recent research efforts have been directed toward clarifying which linguistic structures and tasks are most susceptible to age effects. The processing and production of agreement morphology (or inflectional morphology in general) seems to be among the most affected (McDonald, 2000, 2006; Guillelmon and Grosjean, 2001; Jiang, 2004). However, early bilinguals may not experience such difficulties with the processing and production of agreement morphology, though there is little research that directly tests this claim. Guillelmon and Grosjean (2001) examined whether proficient early and late English–French bilinguals process gender marking in the same way as native speakers of French, and found that early bilinguals performed similarly to native speakers, while late bilinguals did not. McDonald (2000) found that early L1 Spanish learners of English as an L2 did not perform significantly differently from native speakers on third person present tense agreement in a grammaticality judgment task (though early L1 Vietnamese learners did). Montrul et al. (2008) compared proficiency-matched early and late English–Spanish bilinguals on several tasks involving the processing and production of gender agreement in Spanish and found that the early bilinguals (heritage speakers of Spanish) outperformed the late bilinguals in spontaneous oral tasks.
To sum up, it appears that late bilinguals may have trouble using knowledge of agreement when processing or producing the L2 in real time. Early bilinguals do not seem to suffer from the same problem. It is not surprising that even early bilinguals without “native-like” proficiency in their heritage language do not experience these difficulties, as agreement is one of the earliest aspects of grammar to be acquired, and thus may be less vulnerable to incomplete acquisition or loss than later-learned aspects (Montrul, 2008).
Relating these findings back to the primary goal of this study, it seems that age of acquisition and/or proficiency may need to be considered when determining whether the account of differences in agreement production as a function of morphological richness can be extended to the case of bilinguals. Both of these factors appear to be related either directly or indirectly to outcomes of bilingualism, and more specifically, to the processing and production of agreement morphology. However, according to Lorimor et al. (2008), relative morphological richness affects the occurrence of agreement errors “regardless of whether the speaker is accustomed, by virtue of his or her language experience, to rich or to sparse [agreement] patterns” (p. 791). Although not specifically referring to the case of bilingualism, this statement suggests that whether a language is learned early or late, or whether a speaker is of low or high proficiency, the magnitude of conceptual effects in subject–verb number agreement production should depend on morphological richness and not on language experience, as a product of how the language production system is designed. Alternatively, it could be the case that morphological richness can only be taken into account by the system if a particular language was acquired early, or if sufficient proficiency in the language exists for the system to use morphology in the same way as in monolinguals. Findings cited above regarding the differential processing of agreement morphology in some types of bilinguals indicate that these factors may be crucial in how the production of agreement proceeds in bilingual speakers. However, the two published studies that exist on subject–verb number agreement in bilinguals do not clearly indicate whether this is the case.
3.2 Mechanisms of agreement production in bilinguals
Only two published studies have investigated bilingual subject–verb number agreement production from a psycholinguistic perspective, making use of the sentence fragment completion methodology developed by Bock and Miller (1991; see section 2.1). Neither of the studies directly examined or reported the magnitude of distributivity effects found.
Basing their study on the purported cross-linguistic differences between Spanish and English found by Vigliocco, Butterworth et al. (1996), Nicol, Teller and Greth (2001) reported the results of two experiments that tested for distributivity effects: one with native English speakers who were late L2 learners of Spanish, and one with early Spanish–English bilinguals who judged themselves to be equally proficient in both languages. The late learners were tested only in Spanish, and results showed conceptual effects on subject–verb number agreement production, similar to findings for monolingual Spanish speakers. The early bilinguals were tested only in English, and they showed distributivity effects in that language, in line with studies that show no differences between Spanish and English with respect to the presence of conceptual effects in agreement.
Nicol and Greth (2003) looked at late L2 learners who had taken at least six semesters of Spanish at the university level. They tested them in both English and Spanish; results showed a distributivity effect in both languages. A calculation of the distributivity effect from the agreement error percentages reported in the paper reveals that the effects were roughly equivalent in the two languages, with the effect for Spanish being slightly larger (7.4% for English; 7.5% for Spanish). This does not support morphological richness as the determining factor in the magnitude of distributivity effects in bilinguals; however, different materials were used in the two languages tested.
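A calculation of this kind can be sketched as a percentage-point difference: the agreement-error percentage for distributive-referent fragments minus that for single-referent fragments. This definition is an assumption about how such effects are typically derived, and the input values below are hypothetical placeholders, not the percentages reported by Nicol and Greth (2003).

```python
def effect_from_percentages(pct_distributive: float, pct_single: float) -> float:
    """Distributivity effect as the percentage-point difference in agreement-error
    rates between distributive-referent and single-referent fragments."""
    return pct_distributive - pct_single


# Hypothetical error percentages, NOT the values reported by Nicol and Greth (2003).
english = effect_from_percentages(pct_distributive=10.0, pct_single=3.0)
spanish = effect_from_percentages(pct_distributive=9.0, pct_single=2.5)

print(f"English: {english:.1f} points; Spanish: {spanish:.1f} points")
# Under the marking and morphing account, the richer-morphology language
# (Spanish) would be expected to show the smaller effect.
print(f"English minus Spanish: {english - spanish:.1f} points")
```

Comparing the two effects within the same speakers, rather than the raw error rates, is what bears on the morphological-richness prediction, since overall error rates can differ between languages for reasons unrelated to distributivity.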
4. The present study
In the two previous studies that examined subject–verb number agreement production in bilinguals, results did not clearly indicate whether morphological richness determines the degree to which conceptual number is taken into account. Nor is it obvious from these studies whether age of acquisition and/or proficiency have anything to do with whether or not differences in morphological richness affect agreement production within a particular bilingual speaker, as these variables were not directly manipulated. Critically, neither of the studies cited employed translation equivalents to test the participants in the two languages. These issues are addressed in the present study, which compares participants according to age of onset of bilingualism and proficiency, and uses stimuli that are translation equivalents in Spanish and English.
Two research questions are investigated in this study. First, can the account of differences in agreement production as a function of morphological richness be extended to the case of bilinguals? Considering the finding that the magnitude of conceptual effects in agreement varies in monolinguals of languages that differ in morphological richness, it is possible that bilingual speakers of English and Spanish may behave as monolingual speakers of their two languages, showing a relatively smaller distributivity effect in Spanish (relatively richer morphology) than in English (relatively poorer morphology). Alternatively, they may produce agreement the same way in both languages without taking richness of morphology into account, as seems to be suggested by Nicol and Greth's (2003) findings.
The second question addressed in this study is whether language experience (age of acquisition and/or proficiency) modulates the existence of differences in conceptual effects within a bilingual individual. Though Lorimor et al. (2008) claim that language experience does not matter to the agreement production system, and that it is the nature of the system to interact with agreement morphology in this way, this claim was not necessarily made to address the case of bilingualism, and it has not yet been tested with this population. Considering that bilinguals’ linguistic behavior seems to be influenced either directly or indirectly by both age of acquisition and proficiency, it is an open question whether their agreement production system functions in the same way as that of monolinguals, and whether this may vary depending on variables of language experience. Two experiments were conducted to address these questions.
5. Experiment 1: Method
5.1 Participants
One hundred and thirty-four English–Spanish bilinguals participated in Experiment 1; 108 of those were included for data analyses (the rest were excluded due to low proficiency scores; see below). All were faculty or students at a large Midwestern university in the United States. The majority of the bilinguals who spoke Spanish at home (early bilinguals) were from families that immigrated from Mexico. Bilinguals were divided into four participant groups based on age of onset of bilingualism and proficiency.
For the purposes of this experiment, early bilinguals were considered to be those who had begun to acquire both English and Spanish by the age of five; late bilinguals were those who had begun to learn Spanish after the age of eleven, and had no immersion experience in the language until age eighteen or later. Though the age cut-offs used to distinguish early from late bilingualism vary, according to Meisel (2004), the “gradual decline” of language acquisition ability begins by the age of five, with the critical period ending between the ages of seven and ten (p. 104). Other authors have set the cut-off for the critical period at puberty, considering ages eleven to fifteen or sixteen as the divide between early and late language acquisition. What definitively distinguishes the late bilinguals in this study from the early bilinguals is the complete lack of immersion in Spanish for the late bilinguals until age eighteen or later. According to Birdsong (2006), age of acquisition is not when bilinguals are first exposed to the L2, but rather when they are first immersed in it (p. 11), and age of acquisition rather than age of exposure is the most reliable predictor of bilingual outcomes (p. 12).
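The grouping criteria above can be expressed as a small Python function. It is a sketch under stated assumptions: "by the age of five" is read as an onset age of five or younger, ages are in years, a missing immersion age means the participant was never immersed in Spanish, and returning None for participants who meet neither criterion is a hypothetical choice, since the text does not say how such cases were handled.

```python
from typing import Optional


def bilingual_type(age_english: float, age_spanish: float,
                   age_spanish_immersion: Optional[float]) -> Optional[str]:
    """Classify a participant by age of onset of bilingualism.

    age_spanish_immersion is None if the participant was never immersed in Spanish.
    Returns None for participants who meet neither criterion (hypothetical handling).
    """
    # Early: began acquiring both languages by age five (read here as <= 5).
    if age_english <= 5 and age_spanish <= 5:
        return "early"
    # Late: began Spanish after age eleven and had no Spanish immersion before eighteen.
    no_early_immersion = age_spanish_immersion is None or age_spanish_immersion >= 18
    if age_spanish > 11 and no_early_immersion:
        return "late"
    return None


print(bilingual_type(0, 0, 0))      # Spanish at home from birth
print(bilingual_type(0, 14, 20))    # classroom Spanish at 14, study abroad at 20
print(bilingual_type(0, 14, 16))    # immersed before 18: neither group
```

Checking immersion separately from first exposure reflects the point, following Birdsong (2006), that onset of immersion rather than first exposure is what sets the late group apart.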
All bilinguals, both early and late, completed a written proficiency test in Spanish (Footnote 1), which included the vocabulary section of an MLA test (30 items) and the cloze section of the advanced DELE (Diplomas de Español como Lengua Extranjera; 20 items), a test used by the Spanish Ministry of Education and Culture as an official means of determining Spanish proficiency (see Online Supplementary Materials – Appendix 1 for sample items from each section; see also Footnote 2). Mean scores are presented in Table 1. With 50 points possible at one point per item, scores from 40 to 50 placed participants into the advanced/native proficiency group; scores from 21 to 39 placed participants into the intermediate proficiency group; and scores of 20 or lower placed participants into the low proficiency group (as noted above, 26 participants were excluded from data analyses due to low proficiency scores). Participants also rated their proficiency in Spanish and English as an additional measure of language proficiency, on a scale from 1 (can barely speak/understand/read/write Spanish/English) to 5 (can speak/understand/read/write Spanish/English like a native speaker). Mean ratings are listed in Table 1. A two-tailed, independent samples t-test conducted on the Spanish self-ratings indicated that the two proficiency groups (as categorized by the written proficiency test) rated themselves differently (t(106) = 11.18, p < .001), so that the self-ratings mirrored the results of the written proficiency test. Therefore, although the written proficiency test was limited to the evaluation of vocabulary and some advanced grammatical knowledge, the self-ratings provided converging support for a difference in general proficiency between the two groups. Participant group characteristics are detailed in Table 1.
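The score bands and the group comparison on self-ratings can be sketched in Python as follows. The band boundaries come from the text; the function names and the sample ratings in the usage lines are hypothetical. The text does not specify which form of the independent samples t-test was used, so the sketch assumes Student's pooled-variance form, whose degrees of freedom (n1 + n2 - 2) are consistent with the reported t(106) for 108 participants.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def proficiency_group(score: int) -> str:
    """Map a 50-point written test score to the proficiency bands in the text."""
    if not 0 <= score <= 50:
        raise ValueError("score must be between 0 and 50")
    if score >= 40:
        return "advanced"      # 40-50: advanced/native
    if score >= 21:
        return "intermediate"  # 21-39
    return "low"               # 20 or lower: excluded from analysis


def pooled_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Student's independent-samples t with pooled variance; df = n_a + n_b - 2."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical Spanish self-ratings (1-5) for two small groups.
advanced_ratings = [5, 5, 4, 5]
intermediate_ratings = [3, 2, 3, 3]
t, df = pooled_t(advanced_ratings, intermediate_ratings)
print(proficiency_group(45), proficiency_group(30), f"t({df}) = {t:.2f}")
```

A two-tailed p-value would then come from the t distribution with df degrees of freedom, for example via a statistics library; that step is omitted here to keep the sketch dependency-free.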
Table 1. Bilingual participant group characteristics, Experiment 1.

5.2 Stimuli
This study employed a sentence-fragment completion methodology similar to that developed by Bock and Miller (1991), but designed to encourage the distributive interpretation of complex subject noun phrases that can denote multiple instances of a grammatically singular subject. Pictorial stimuli accompanied the auditory stimuli to ensure that participants were not simply repeating and completing sentence fragments without realizing their potential distributive readings, since previous research indicates that the imageability of experimental stimuli may affect results (Eberhard, 1999).
Auditory
Two sets of 32 pairs of complex subject NPs (sentence fragments) served as target items (some were adapted from Bock and Miller, 1991, and from Vigliocco, Butterworth and Garrett, 1996). One set was in English, and the other consisted of the Spanish translation equivalents of the English fragments (see Footnote 3). Each complex subject NP consisted of a singular head noun followed by a prepositional phrase. The two NPs in a pair were identical except for the number of the local noun in the prepositional phrase: one had a singular local noun and served as a control, and the other had a plural local noun. Sixteen of the pairs had distributive (conceptually plural but grammatically singular) referents as head nouns, as in (7), while the other 16 had single (conceptually and grammatically singular) referents, as in (8).
(7) The stamp on the envelopes
(8) The key to the suitcases
The distributivity status of each fragment was determined by a norming questionnaire, administered in Spanish to 37 undergraduate students at the Universidad Autónoma de Nuevo León, Mexico, and in English to 61 undergraduate students at a large Midwestern university in the US. Respondents were asked to decide whether each complex noun phrase referred to one thing or more than one. Responses were coded as 0 for “one thing” and 1 for “more than one thing”. In one version of the questionnaire, given to 36 of the English speakers, fragments were listed with accompanying picture stimuli (see below). In the version given to the other 25 English speakers, and in the version given to the Spanish speakers, fragments were listed without pictures. The mean ratings of single vs. distributive referent fragments from each questionnaire (Spanish: single, 0.18, distributive, 0.29; English, no pictures: single, 0.13, distributive, 0.35; English, pictures: single, 0.23, distributive, 0.45) were compared using two-tailed paired samples t-tests; the difference was significant in both languages, whether fragments were listed with or without pictures (Spanish: t(15) = 4.80, p < .001; English (no pictures): t(15) = 5.09, p < .001; English (pictures): t(15) = 4.69, p < .001). The fragments in the Spanish set were counterbalanced for the gender of the head and local nouns, so that there were equal numbers of matching (e.g., masculine–masculine and feminine–feminine) and mismatching (e.g., masculine–feminine and feminine–masculine) head and local nouns. Spanish fragments were considered morphologically richer than English fragments, since in Spanish the number of the head noun is marked not only on the noun but also on the determiner, as illustrated in (9a–b):
(9)
a. The (sing. or plural) label (sing.) on the bottles
b. La (sing.) etiqueta (sing.) en las botellas
See Online Supplementary Materials – Appendix 2 for the complete list of experimental items, in English and in Spanish.
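The norming measure described above can be illustrated with a short sketch: each fragment's rating is the mean of the 0/1 codes it received, and a condition's rating is the mean over its fragments. The response codes below are invented for illustration and are not the study's data; the helper names are hypothetical.

```python
def mean_rating(codes: list[int]) -> float:
    """Mean distributivity rating: 0 = "one thing", 1 = "more than one thing"."""
    if not codes or any(c not in (0, 1) for c in codes):
        raise ValueError("codes must be a non-empty list of 0s and 1s")
    return sum(codes) / len(codes)

def condition_rating(items: dict[str, list[int]]) -> float:
    """Mean of the per-fragment ratings for one condition."""
    return sum(mean_rating(c) for c in items.values()) / len(items)

# Invented codes from four respondents per fragment
single_items = {
    "The key to the suitcases": [0, 0, 0, 1],
    "The light over the tables": [0, 0, 0, 0],
}
distributive_items = {
    "The stamp on the envelopes": [1, 0, 1, 0],
    "The label on the bottles": [0, 1, 0, 0],
}
print(f"single: {condition_rating(single_items):.3f}")
print(f"distributive: {condition_rating(distributive_items):.3f}")
```

In the study, the per-condition means were then compared with paired samples t-tests; the sketch stops at the descriptive step.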
Two sets of 64 filler subject NPs of the form determiner–noun were also used, 16 singular and 48 plural, in order to balance the number of correct singular and plural responses in the experiment (the 32 experimental items and 16 singular fillers called for singular verbs, and the 48 plural fillers for plural verbs). One set of fillers was in English, and the other consisted of the Spanish translation equivalents of the English fillers. Two sets of two 96-item lists, one set in English and one in Spanish, were formed by combining 32 experimental phrases, one from each pair, with all 64 filler phrases in each language. Within each list, eight experimental phrases represented each of the four conditions, and across the two lists in a set, each experimental phrase appeared exactly once. Participants completed eight practice items, after which each list began with two randomly selected singular and two randomly selected plural phrases. Subsequent fillers and experimental items appeared in random order, with the constraint that no two experimental items could appear consecutively. The English fragments were recorded by a female native speaker of American English, and the Spanish fragments by a female native speaker of Mexican Spanish.
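The list constraints above (two singular and two plural fillers first, then random order with no two experimental items adjacent) can be met by scattering the experimental items into distinct gaps between the remaining fillers. This is a minimal sketch under stated assumptions: the practice items are left out, the item labels are hypothetical, and the study does not say how its own randomization was implemented.

```python
import random

def build_list(experimental, singular_fillers, plural_fillers, seed=None):
    """Sketch of one presentation list: two singular and two plural fillers
    first, then the remaining fillers and all experimental items in random
    order, with no two experimental items adjacent."""
    rng = random.Random(seed)
    sg, pl = list(singular_fillers), list(plural_fillers)
    rng.shuffle(sg)
    rng.shuffle(pl)
    lead_in = sg[:2] + pl[:2]
    rng.shuffle(lead_in)
    fillers = sg[2:] + pl[2:]
    rng.shuffle(fillers)
    targets = list(experimental)
    rng.shuffle(targets)
    if len(targets) > len(fillers) + 1:
        raise ValueError("too few fillers to separate experimental items")
    # n fillers leave n + 1 gaps; each target goes into a different gap,
    # so at least one filler always separates two targets.
    gaps = set(rng.sample(range(len(fillers) + 1), len(targets)))
    order, next_target = list(lead_in), iter(targets)
    for gap in range(len(fillers) + 1):
        if gap in gaps:
            order.append(next(next_target))
        if gap < len(fillers):
            order.append(fillers[gap])
    return order

# Hypothetical item labels: E = experimental, S/P = singular/plural filler
experimental = [f"E{i}" for i in range(32)]
singular = [f"S{i}" for i in range(16)]
plural = [f"P{i}" for i in range(48)]
items = build_list(experimental, singular, plural, seed=1)

adjacent = any(a[0] == "E" and b[0] == "E" for a, b in zip(items, items[1:]))
print(len(items), adjacent)  # → 96 False
```

Choosing gaps rather than repeatedly reshuffling until the constraint holds guarantees a valid list in a single pass.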
Pictorial
Each fragment, both target and filler, was paired with a black-and-white line drawing of its referent in which only the head noun's referent was colored (green, blue, yellow or red). The drawings were created by modifying clipart files with a graphics software program and were displayed in the center of a computer screen during the experiment. All pictures were approximately the same size. Figure 1 (single referent, singular–plural (sp) fragment) and Figure 2 (distributive referent, sp fragment) provide examples of the pictorial stimuli.

Figure 1. Example of pictorial stimuli, single referent, sp fragment. Fragment prompt: “The light over the tables”; expected response: “The light over the tables is blue.”

Figure 2. Example of pictorial stimuli, distributive referent, sp fragment. Fragment prompt: “The label on the bottles”; expected response: “The label on the bottles is yellow.”
5.3 Procedure
All participants completed the experiment individually in two sessions held at least one week apart, the first in English and the second in Spanish. A participant who received a particular stimulus list in English received the complementary list in Spanish. Participants also completed a language history questionnaire, which established their native language and detailed their experience with foreign languages, as well as the Spanish proficiency test and self-ratings of proficiency described above.
Participants were seated in front of a computer and a microphone connected to a digital voice recorder. Instructions displayed on the computer screen informed participants that they would see a picture in the center of the screen, accompanied by a sentence fragment played over the computer speakers. They were to repeat the fragment and complete it as a sentence as quickly and accurately as possible, naming the color of the head noun's referent as shown in the picture (e.g., in the single referent example in Figure 1, the light over the tables was blue; in the distributive referent example in Figure 2, the label on the bottles was yellow).
Participants completed eight practice trials, during which they were corrected if they did not follow the procedure; agreement errors were not corrected. All trials had the same format. The participant pressed the space bar to begin the trial. A blank screen was presented for 1.5 seconds, followed by a picture in the middle of the screen. When the picture appeared, the corresponding sentence fragment was played once over the computer speakers. The picture remained on the screen until the space bar was pressed to advance to the next trial. All responses were digitally recorded.
6. Experiment 1: Results
6.1 Scoring
Each recorded response was transcribed and assigned one of the following agreement scores: (1) Correct response (exact repetition of the fragment with correct verb number); (2) Agreement error; (3) Repetition error; (4) Repetition and agreement error; or (5) Miscellaneous response (failed to repeat the whole fragment, did not produce a verb, no response). See Online Supplementary Materials – Table 1 for examples of each type of response.
6.2 Results
To address the research questions of the current study, I first establish the existence of a distributivity effect in both languages of the bilingual participants and then examine the magnitude of the effect across languages for each bilingual group. A total of 3,456 responses were scored in each language according to the categories listed above (results are presented for experimental items only). The within-category totals for each group in English and in Spanish are listed in Table 2, along with mean percentages for each category within each group.
Table 2. Within-category total responses by group (percentages in parentheses), Experiment 1.

Since almost all responses fell into the “correct response” or the “agreement error” categories, only agreement errors were further analyzed. The distribution of responses in these two categories by experimental condition (single vs. distributive referent, including both experimental items and controls), by group and by language, is presented in Table 3.
Table 3. Distribution of responses by experimental condition, group and language (percentages in parentheses), Experiment 1.

As Table 3 illustrates, agreement errors were most common for distributive referent sentence fragments with a singular head noun and a plural local noun, as in (10):
(10) The stamp on the envelopes
The few errors in the single referent condition also occurred when participants repeated and completed fragments with a singular head noun and a plural local noun. Overall, errors occurred only when the head and local nouns were mismatched in number, in line with previous findings in the monolingual literature. All participant groups made fewer errors with single referent fragments than with distributive referent fragments, in English as well as in Spanish, also in line with recent findings for monolinguals. Net error rates for the singular–plural conditions by group (computed by subtracting the number of errors in the singular–singular control condition from the number of errors in the singular–plural condition and dividing by the number of possible responses for the condition) are presented in Figure 3 for English and Figure 4 for Spanish. Figures 3 and 4 show a distributivity effect for all groups in both English and Spanish, and the statistical analyses of the agreement errors confirm this.

Figure 3. Net error rates for singular–plural conditions (single vs. distributive referent), English, Experiment 1.

Figure 4. Net error rates for singular–plural conditions (single vs. distributive referent), Spanish, Experiment 1.
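The net error rates plotted in Figures 3 and 4 can be sketched as follows. The sketch reads the description as errors in the singular–plural condition minus errors in its singular–singular control, divided by the number of possible responses, which assumes both conditions have the same number of possible responses. The function name and the counts in the example are hypothetical.

```python
def net_error_rate(errors_sp: int, errors_ss: int, n_possible: int) -> float:
    """Net agreement-error rate for a singular-plural (SP) condition, corrected
    by errors in the matching singular-singular (SS) control condition."""
    if n_possible <= 0:
        raise ValueError("n_possible must be positive")
    if not (0 <= errors_sp <= n_possible and 0 <= errors_ss <= n_possible):
        raise ValueError("error counts must lie between 0 and n_possible")
    return (errors_sp - errors_ss) / n_possible

# Hypothetical group: 16 participants x 8 items per condition = 128 responses
print(net_error_rate(errors_sp=20, errors_ss=0, n_possible=128))  # 0.15625
```

With no errors in the control conditions, as in Experiment 1, the correction term is zero and the net rate reduces to the raw singular–plural error proportion.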
Global analyses of variance were conducted on the results for all groups, by participants and by items. For the by-participants ANOVA, item type (single vs. distributive referent) and language (English vs. Spanish) were within-participant variables of two levels each, and age of acquisition (early vs. late) and proficiency (intermediate vs. advanced) were between-participant variables of two levels each. For the by-items ANOVA, age of acquisition (early vs. late), proficiency (intermediate vs. advanced) and language (English vs. Spanish) were within-item variables of two levels each, and item type (single vs. distributive referent) was a between-item variable of two levels. Because no agreement errors were produced in the singular–singular conditions, analyzing raw agreement scores would have meant including cells with zero errors, which would violate ANOVA assumptions. Difference scores were therefore analyzed instead (the proportion of errors in the singular–plural condition minus the proportion of errors in the singular–singular condition; see Hartsuiker, Kolk and Huinck, 1999). In this analysis, a distributivity effect appears as a main effect of item type.
The mixed ANOVA by participants yielded a main effect of item type [F(1,104) = 99.75; MSE = 0.02; p < .001; η2 = 0.49], a main effect of age of acquisition [F(1,104) = 17.90; MSE = 0.03; p < .001; η2 = 0.15], and a main effect of proficiency [F(1,104) = 8.55; MSE = 0.03; p < .01; η2 = 0.08]. Significant interactions included item type × age of acquisition [F(1,104) = 22.53; MSE = 0.02; p < .001; η2 = 0.18] and item type × proficiency [F(1,104) = 6.05; MSE = 0.02; p < .05; η2 = 0.06]. The ANOVA by items showed similar results, with a main effect of item type [F(1,30) = 47.95; MSE = 0.03; p < .001; η2 = 0.62], a main effect of age of acquisition [F(1,30) = 39.58; MSE = 0.01; p < .001; η2 = 0.57], and a main effect of proficiency [F(1,30) = 22.19; MSE = 0.01; p < .001; η2 = 0.43]. There were significant interactions between item type and age of acquisition [F(1,30) = 37.49; MSE = 0.01; p < .001; η2 = 0.56] and between item type and proficiency [F(1,30) = 11.81; MSE = 0.01; p < .01; η2 = 0.28].
The main effect of item type confirms a distributivity effect across participants, while the lack of a main effect of language (by participants: [F(1,104) = 2.38; MSE = 0.01; p = .126; η2 = 0.02]; by items: [F(1,30) = 1.90; MSE = 0.01; p = .178; η2 = 0.06]) indicates that the overall number of agreement errors did not differ across the two languages. The main effects of age of acquisition and proficiency indicate that the number of agreement errors differed according to participants' age of acquisition and proficiency: the early bilinguals made more errors than the late bilinguals, and the intermediate bilinguals made more errors than the advanced bilinguals. The interactions of item type with both age of acquisition and proficiency show that item type affected participants differently depending on these two factors. Separate ANOVAs within each group, with language and item type as within-participant variables of two levels each, indicated that while there were distributivity effects in every group, their magnitude varied, with the largest effects in the early and the intermediate groups (early-intermediate: [F(1,15) = 19.37; MSE = 0.06; p < .001; η2 = 0.56]; early-advanced: [F(1,21) = 37.72; MSE = 0.02; p < .001; η2 = 0.64]; late-intermediate: [F(1,51) = 42.89; MSE = 0.01; p < .001; η2 = 0.46]; late-advanced: [F(1,17) = 7.16; MSE = 0.01; p < .05; η2 = 0.30]).
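The η2 values reported above appear to be partial eta squared: each can be recomputed from its F ratio and degrees of freedom as F·df_effect / (F·df_effect + df_error). The text does not name the measure, so this is an inference, and the sketch below is only a consistency check on a sample of the reported values.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio: F*df1 / (F*df1 + df2)."""
    if f < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    return f * df_effect / (f * df_effect + df_error)

# (effect, F, df_effect, df_error, reported eta squared) from the analyses above
reported = [
    ("item type, by participants", 99.75, 1, 104, 0.49),
    ("age of acquisition, by participants", 17.90, 1, 104, 0.15),
    ("proficiency, by participants", 8.55, 1, 104, 0.08),
    ("item type, by items", 47.95, 1, 30, 0.62),
    ("language, by items", 1.90, 1, 30, 0.06),
    ("late-advanced group, item type", 7.16, 1, 17, 0.30),
]
for label, f, df1, df2, eta in reported:
    print(f"{label}: computed {partial_eta_squared(f, df1, df2):.2f}, reported {eta:.2f}")
```

Because partial η2 is computed separately for each effect, the values for different effects in the same ANOVA need not sum to 1.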
Taken together, these analyses indicate, first, that all groups showed distributivity effects in the production of agreement in both of their languages, which is consistent with Nicol and Greth's (2003) findings, and with recent findings for monolingual speakers of English and Spanish. Second, both age of acquisition and degree of language proficiency appear to play a role in quantity of agreement errors produced. Specifically, while all groups, regardless of age of acquisition or proficiency, evidenced distributivity effects in both English and Spanish, the magnitude of these effects varied according to whether bilinguals were early or late learners of Spanish, and whether they were of intermediate or advanced proficiency in Spanish (recall that they were all early learners and of native proficiency in English).
Having established distributivity effects in both languages of all four participant groups, we now turn to the role of morphological richness in the magnitude of these effects across the two languages within each group. Figure 5 shows the distributivity effect (the proportional difference in error frequency between the distributive and single referent conditions, after subtracting error frequencies in the respective control conditions from those in the experimental conditions) in each language for each participant group. As Figure 5 shows, all groups patterned like monolingual speakers: distributivity effects were larger in English (morphologically less rich) than in Spanish (morphologically richer). However, the item type × language interaction was not significant in either the by-participants [F(1,104) = 2.70; MSE = 0.01; p = .103; η2 = 0.03] or the by-items [F(1,30) = 1.86; MSE = 0.01; p = .183; η2 = 0.06] analysis, indicating that the effects in the two languages did not differ statistically.

Figure 5. Distributivity effect by participant group, Experiment 1.
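The distributivity effect plotted in Figure 5 is the difference between two control-corrected error proportions. The sketch below computes it for each language and then the English minus Spanish difference, which the morphological richness account predicts to be positive. The function name is hypothetical, and the proportions are invented for illustration rather than read off Figure 5.

```python
def distributivity_effect(dist_sp: float, dist_ss: float,
                          single_sp: float, single_ss: float) -> float:
    """Control-corrected error proportion for distributive items minus the
    control-corrected error proportion for single items."""
    for p in (dist_sp, dist_ss, single_sp, single_ss):
        if not 0.0 <= p <= 1.0:
            raise ValueError("inputs must be proportions between 0 and 1")
    return (dist_sp - dist_ss) - (single_sp - single_ss)

# Invented error proportions for one hypothetical group
english = distributivity_effect(dist_sp=0.30, dist_ss=0.0, single_sp=0.05, single_ss=0.0)
spanish = distributivity_effect(dist_sp=0.22, dist_ss=0.0, single_sp=0.04, single_ss=0.0)
print(f"English {english:.2f}, Spanish {spanish:.2f}, difference {english - spanish:.2f}")
```

A positive English minus Spanish difference matches the predicted direction; whether it is reliable is a separate question for the inferential tests.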
In sum, although statistical analyses revealed no significant differences between the magnitude of the effects in the two languages, all groups showed the pattern predicted by the account of differences in agreement production as a function of morphological richness: distributivity effects were larger in English than in Spanish. In the late-intermediate group, however, the difference in effects across the two languages was only 0.0048 (less than half a percentage point). The small size of this difference could mean that a lower level of proficiency affects the degree to which relative morphological richness can constrain the effect of conceptual number. Yet the difference between distributivity effects in English and Spanish in the early-intermediate group was similar to that of the two more proficient groups, suggesting that late age of acquisition could also play a role. It is quite possible, however, that the early-intermediate bilinguals’ proficiency was underestimated because of the nature of the proficiency test. Previous research indicates that late bilinguals who have learned the L2 in a primarily academic environment show advantages over early bilinguals in written tasks such as the proficiency test employed in this study (e.g., Montrul et al., 2008), so the late bilinguals’ proficiency may have been overestimated and the early bilinguals’ proficiency underestimated. The self-ratings of the two intermediate groups also suggest that the early-intermediate group was more proficient than the late-intermediate group (see Table 1), as do the percentages of repetition errors produced in each group (7.2% for the early-intermediate group vs. 12.7% for the late-intermediate group). This, together with the fact that the late-advanced group showed the predicted pattern, suggests that proficiency level rather than age of acquisition is responsible for the relatively small difference between languages in the late-intermediate group.
The patterning of the remaining groups provides tentative support for extending the marking and morphing account to bilinguals. However, one methodological issue in the current study may have contributed to these results: all participants completed the English session of the experiment before the Spanish session. In addition, the same items were used in both languages, though no participant saw the same item in the same condition more than once (so that if a particular participant saw “The label on the bottle” in English, he or she saw “The label on the bottles” in Spanish). It is therefore possible that the pattern of effects in English and Spanish was due to repetition effects. Specifically, if participants realized during the English session that they were making agreement errors, they may have monitored their speech more closely during the Spanish session so as to avoid errors. Closer monitoring would then have resulted in fewer errors and thus a smaller distributivity effect in Spanish than in English. In other words, the smaller effects in Spanish could reflect increased monitoring during the second experimental session rather than the richer morphology of Spanish.
In order to determine whether Experiment 1 results were an artifact of the experimental design, a second experiment was conducted with different participants in which the language order was counterbalanced and no participant saw any item more than once. In addition to an early-advanced and a late-advanced group, a late-intermediate group was again tested to determine whether a lower proficiency level may affect whether morphological richness in the L2 can constrain agreement production. Furthermore, the late-advanced group in Experiment 2 consisted of bilinguals with Spanish rather than English as their dominant language. This group was included to examine whether the same pattern of results found in Experiment 1 would obtain when the bilinguals’ dominant language was different.
7. Experiment 2: Method
7.1 Participants
Forty-eight bilinguals participated in Experiment 2. All but two (members of the local community with bachelor's degrees) were faculty or students at a large Midwestern university in the US. Three participant groups were tested: early bilinguals of advanced proficiency in Spanish, late bilinguals of advanced proficiency in English, and late bilinguals of intermediate proficiency in Spanish. All but two of the early bilinguals had begun to acquire both English and Spanish by the age of five; the remaining two began learning English at the ages of nine and ten. The late-advanced bilinguals were all native speakers of Spanish who had studied some English in school but began immersion in English at the age of fifteen or later (mean age of immersion = 21); none were from the Dominican Republic. The late-intermediate bilinguals were native speakers of English who began learning Spanish in school at the age of twelve or later but had no immersion experience until age eighteen or later. The late-intermediate bilinguals completed the same proficiency test used in Experiment 1, but, due to time constraints, the early bilingual group completed only the vocabulary section. The late-advanced bilinguals did not complete a proficiency test, but all had sufficient proficiency in English to be studying and working in an English-speaking environment. All participants also rated their own proficiency in both English and Spanish. Participant group characteristics are listed in Table 4.
Table 4. Bilingual participant group characteristics, Experiment 2.

Note: ᵃProficiency scores are listed as percentages, since the maximum score was 30 for the early bilinguals but 50 for the late-intermediate bilinguals.
7.2 Stimuli
The methodology and stimuli of Experiment 2 were the same as in Experiment 1, with the following changes. First, 32 additional pairs of complex subject noun phrases of the same structure and conditions were added to the target items of Experiment 1, for a total of 64 pairs of target NPs per language set (see Online Supplementary Materials – Appendix 3). Distributivity ratings for the entire set of items were collected with the same questionnaires, from the same respondents as in Experiment 1 (mean ratings, Spanish: single, 0.19, distributive, 0.32; English, no pictures: single, 0.15, distributive, 0.34; English, pictures: single, 0.23, distributive, 0.53). According to t-tests, the mean ratings of single vs. distributive referent fragments differed significantly (Spanish: t(31) = 8.56, p < .001; English (no pictures): t(31) = 5.77, p < .001; English (pictures): t(31) = 8.95, p < .001). The same 64 filler subject NPs were used in Experiment 2 as in Experiment 1. Four sets of two 96-item lists, one in English and one in Spanish, were created by combining 32 experimental phrases with all 64 filler phrases; list characteristics were the same as in Experiment 1, except that the lists were structured so that no participant saw the same fragment in both English and Spanish.
7.3 Procedure
The procedure was the same as in Experiment 1, except that the order of languages was counterbalanced: half of the participants in each group completed the English session first, and the other half completed the Spanish session first. In addition, three participants completed the English and Spanish sessions on the same day, with other experimental tasks unrelated to this study in between.
8. Experiment 2: Results
8.1 Scoring
Scoring was the same as in Experiment 1.
8.2 Results
Participants produced a total of 1,024 responses in each language; results are presented only for experimental items. Table 5 lists within-category totals and mean percentages by category for each participant group in English and Spanish.
Table 5. Within-category total responses by group (percentages in parentheses), Experiment 2.

As in Experiment 1, only agreement errors were further analyzed. Table 6 presents the distribution of responses by experimental condition, group and language. As Table 6 shows, agreement error patterns were similar to those of the first experiment. Most errors occurred with distributive referent sentence fragments with a singular head and a plural local noun (illustrated above in (10)), and most of the remaining errors occurred with single referent fragments with a singular head and a plural local noun. Both advanced groups produced a few errors in the control conditions, but overall the results mirror those of Experiment 1, with distributivity effects in all three groups. Net error rates for the singular–plural conditions by group are illustrated in Figure 6 for English and Figure 7 for Spanish.
Table 6. Distribution of responses by experimental condition, group and language (percentages in parentheses), Experiment 2.


Figure 6. Net error rates for singular–plural conditions (single vs. distributive referent), English, Experiment 2.

Figure 7. Net error rates for singular–plural conditions (single vs. distributive referent), Spanish, Experiment 2.
Figures 6 and 7 indicate a distributivity effect for all groups in English and in Spanish. Global analyses of variance conducted by participants and by items confirmed this effect. For the by-participants ANOVA, item type (single referent vs. distributive referent) and language (English vs. Spanish) were within-participant variables of two levels each, and group (early vs. late-advanced vs. late-intermediate bilinguals) was a between-participant variable of three levels. For the by-items ANOVA, group (early vs. late-advanced vs. late-intermediate bilinguals) and language (English vs. Spanish) were within-item variables of three and two levels respectively, and item type (single referent vs. distributive referent) was a between-item variable of two levels. Difference scores rather than raw agreement scores were analyzed, as in Experiment 1, so that a distributivity effect appears as a main effect of item type.
The by-participants analysis yielded a main effect of item type [F(1,45) = 64.18; MSE = 0.03; p < .001; η2 = 0.59], a main effect of language [F(1,45) = 12.02; MSE = 0.03; p < .01; η2 = 0.21], and a main effect of group [F(2,45) = 12.75; MSE = 0.06; p < .001; η2 = 0.36]. Significant interactions included item type × group [F(2,45) = 7.14; MSE = 0.03; p < .01; η2 = 0.24], item type × language [F(1,45) = 6.15; MSE = 0.01; p < .05; η2 = 0.12], language × group [F(2,45) = 0.03; MSE = 1.67; p < .05; η2 = 0.14], and item type × language × group [F(2,45) = 6.89; MSE = 0.01; p < .01; η2 = 0.23]. The by-items analysis yielded similar results, with a main effect of item type [F(1,62) = 87.86; MSE = 0.04; p < .001; η2 = 0.59], a main effect of language [F(1,62) = 12.02; MSE = 0.05; p < .01; η2 = 0.16], and a main effect of group [F(2,124) = 45.08; MSE = 0.03; p < .001; η2 = 0.42]. As in the by-participants analysis, interactions included item type × group [F(2,124) = 12.52; MSE = 0.03; p < .01; η2 = 0.17], language × group [F(2,124) = 9.15; MSE = 0.02; p < .001; η2 = 0.13], and item type × language × group [F(2,124) = 7.66; MSE = 0.02; p < .01; η2 = 0.11], but the interaction between item type and language did not reach significance [F(1,62) = 2.81; MSE = 0.05; p = .099; η2 = 0.04]. To summarize, the main effect of item type confirms a distributivity effect, and the main effect of language indicates that there were more errors in English than in Spanish. The main effect of group reflects the larger number of errors made by the early bilingual group than by the other two groups (according to post-hoc Tukey tests). The item type × group interaction indicates that the magnitude of the distributivity effects varied across groups.
Separate ANOVAs within each group, with language and item type as within-participant variables of two levels each, revealed that conceptual number had the largest effect in the early bilingual group [F(1,15) = 37.14; MSE = 0.04; p < .001; η2 = 0.71], the next largest in the late-advanced group [F(1,15) = 28.44; MSE = 0.02; p < .001; η2 = 0.66], and the smallest in the late-intermediate group [F(1,15) = 5.51; MSE = 0.03; p < .05; η2 = 0.27]. The language × group interaction shows that language affected the number of agreement errors differently across groups. According to the separate within-group ANOVAs, the late-advanced group made significantly more errors overall in English than in Spanish [F(1,15) = 15.78; MSE = 0.02; p < .01; η2 = 0.51], while the early bilingual and late-intermediate groups did not differ in overall errors across the two languages (early bilingual: [F(1,15) = 3.73; MSE = 0.05; p = .073; η2 = 0.20]; late-intermediate: [F(1,15) = 0.15; MSE = 0.01; p = .708; η2 = 0.01]). As in Experiment 1, both early and late bilinguals showed distributivity effects in the production of agreement in English and Spanish. The overall number of errors varied by group, with the early bilinguals making more errors than the two groups of late bilinguals, and, accordingly, item type played a larger role in the early bilinguals’ production than in that of the late bilinguals.
In terms of the present study's research questions, the item type × language interaction in the by-participants analysis reflects the fact that item type had a larger effect in English than in Spanish, and the item type × language × group interaction in both analyses shows that the extent to which this was the case varied by group. Distributivity effects in each language are illustrated by group in Figure 8. As in Experiment 1, the early and the late-advanced groups patterned like monolinguals, with larger distributivity effects in English (morphologically less rich) than in Spanish (morphologically richer). In contrast to Experiment 1, however, the late-intermediate bilinguals did not pattern in the predicted direction: instead of a larger distributivity effect in English, they showed a larger effect in Spanish. Paired samples t-tests indicated that the effects in the two languages did not differ significantly in the early bilinguals [t(15) = 0.73; p = .478] or in the late-intermediate bilinguals [t(15) = 1.31; p = .211], but did differ in the late-advanced bilinguals [t(15) = 3.96; p < .01].

Figure 8. Distributivity effect by participant group, Experiment 2.
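The paired samples t-tests above compare each participant's distributivity effect in English with the same participant's effect in Spanish. A minimal pure-Python sketch of the t statistic follows; the function name and input values are hypothetical, and obtaining a p-value would additionally require the t distribution (for example via `scipy.stats.ttest_rel`, which this sketch does not use).

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(x: list[float], y: list[float]) -> tuple[float, int]:
    """Paired-samples t statistic and degrees of freedom for two lists of
    per-participant scores given in the same participant order."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    return mean(diffs) / (sd / sqrt(len(diffs))), len(diffs) - 1

# Invented per-participant distributivity effects for four participants
english = [0.30, 0.50, 0.40, 0.60]
spanish = [0.10, 0.20, 0.20, 0.30]
t, df = paired_t(english, spanish)
print(f"t({df}) = {t:.2f}")
```

A positive t here means larger effects in English than in Spanish; with 16 participants per group, as in Experiment 2, the degrees of freedom would be 15.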
In summary, the results of Experiment 2 suggest that the pattern found in Experiment 1 was not due to the lack of counterbalancing of the two language sessions or to the repetition of the same items in different languages: the same pattern held for the more advanced bilinguals (whether English- or Spanish-dominant), with larger distributivity effects in English than in Spanish. The late-intermediate bilinguals’ results also resemble those of the late-intermediate bilinguals in Experiment 1, in that these speakers do not seem to take morphological richness in the L2 into account to the same extent as more advanced bilinguals.
9. Discussion
The main goal of this study was to determine whether the account of differences in the production of agreement as a function of morphological richness could be extended to bilinguals. The secondary goal was to investigate whether factors of language experience such as age of acquisition and/or proficiency would modulate whether relative morphological richness across languages plays the same role in bilinguals as it does in monolinguals. The results of Experiments 1 and 2 suggest that mechanisms of agreement in bilinguals are similar to those in monolinguals regardless of age of acquisition; however, proficiency may play a role in the extent to which the production system takes relative morphological richness into account. Conceptual number influenced the production of subject–verb number agreement in English and Spanish in all groups tested, and in all groups except the late-intermediate bilinguals, the magnitude of its influence varied cross-linguistically, with larger distributivity effects in English than in Spanish. In the late-intermediate group of Experiment 1, by contrast, the difference in effects across the two languages was very small, suggesting that bilinguals of lower proficiency may not be able to make use of number morphology in the same way as more proficient bilinguals. Experiment 2 results with similar participants seemed to confirm this, with a slightly larger distributivity effect in Spanish than in English. Language dominance did not appear to matter: in Experiment 2, the Spanish-dominant late learners of English patterned in the same way as the more advanced English-dominant bilinguals of Experiments 1 and 2. Admittedly, the differences in the magnitude of distributivity effects across languages reached statistical significance only in the late learners of English in Experiment 2, which necessarily makes these conclusions tentative.
As Lorimor et al. (2008, p. 793) point out, however, because the relevant observations within the data set are so few, the standard statistical analyses used to test the null hypothesis of no difference are not necessarily valid. Rather than relying solely on these analyses, Lorimor et al. therefore suggest that replication is the best means of determining whether experimental effects generalize. The fact that the same pattern was replicated in all groups tested in the present study (with the exception of the late-intermediate groups), across two experiments with different sets of experimental items, supports the existence of a real effect of morphological richness in bilinguals with differing ages of acquisition.
In terms of Eberhard et al.'s (2005) marking and morphing account of agreement production, when number features marked in the syntax do not match those specified from the lexicon, morphing operations must reconcile these two sources of number information. In this situation, the richer morphology of Spanish (with number indicated on the determiner as well as the noun) is more likely to cancel out mismatching conceptual plurality marked in the syntax than the less rich morphology of English (with number indicated only on the noun). This is a consequence of the design of the language production system, and it does not vary depending on whether a speaker is monolingual or bilingual, or whether he or she is an early acquirer of a morphologically rich or poor language, though it does seem to vary with proficiency in the morphologically richer language. For all but the late-intermediate groups of the present study, if number morphology was specified from the lexicon (as evidenced by the production of singular or plural morphology), this morphology served the constraining function proposed in the marking and morphing account. The implication is that language production mechanisms do not differ in nature depending on the language spoken, either between speakers or within the same speaker: agreement is carried out in the same basic manner, while taking into account the different morphology both across and within languages.
The question that remains is why the late-intermediate bilinguals of the present study did not seem to use Spanish number morphology to constrain agreement production in the same way as the other bilingual groups. Central to the account of “canceling-out” effects in morphologically richer languages is the requirement that number morphology be present for these effects to occur. On this assumption, one possible explanation for the late-intermediate results is that number morphology was absent from these participants’ Spanish production. These learners do not, however, appear to lack knowledge of Spanish number morphology: the late-intermediate bilinguals made only two errors in number morphology on the determiner, so that in all other cases a singular noun was accompanied by a singular determiner and a plural noun by a plural determiner. Given so few errors, it can be tentatively concluded that number morphology was appropriately present in the late-intermediate bilinguals’ Spanish production, but that they were unable to use it during morphing operations to cancel out mismatching notional number marked in the syntax. This could reflect some kind of deficit in morphological knowledge (e.g., along the lines of the account proposed by McCarthy, 2008), though such a deficit would likely be evidenced by production errors, which were virtually absent in the present study. Alternatively, these learners may take number morphology into account as predicted by the marking and morphing account, but the heavier cognitive load associated with producing a less proficient L2 may increase the likelihood of agreement errors in Spanish relative to English, thus obscuring cross-linguistic effects of morphological richness.
This possibility is supported by research showing that the addition of a secondary task increases the occurrence of agreement errors (Fayol et al., 1994; Hartsuiker and Barkhuysen, 2006; see Footnote 4). It is also in accord with McDonald's (2006) proposal that the increased processing load associated with L2 use may be responsible for non-native-like performance. Future research comparing lower-proficiency Spanish learners’ agreement production in English and Spanish when they are given a secondary task to complete in English may help determine which explanation is correct. Specifically, increasing the cognitive load in the learners’ L1 should, presumably, balance the likelihood of agreement errors across their two languages. In this way, cross-linguistic effects of morphological richness that were obscured by the elevated error rate in Spanish (due to the increased processing load associated with L2 use) may be revealed. If, under increased cognitive load in English, less proficient bilinguals pattern more like the proficient bilinguals of the current study, it can be concluded that the processing load alternative provides the best explanation for the current results.
In summary, the results of two experiments designed to determine whether the account of differences in agreement production as a function of morphological richness can be extended to bilinguals tentatively suggest that it can and, furthermore, that age of acquisition does not modulate how the basic mechanisms of agreement production proceed, though proficiency may. Further replication of this pattern of results with other materials and participants would help corroborate the present findings.