Bilingualism is pervasive among people who do not belong to an economically and culturally dominant country (Myers-Scotton, 2006). This has encouraged scholars to investigate commonalities and differences between language processing in the mother tongue (L1) and another known, so-called second language (L2). Reviews of this research can be found in De Groot (2010), Altarriba & Isurin (2014), Heredia & Altarriba (2014), and Tokowicz (2014). We limit ourselves to studies on visual word recognition.
Evidence against selective access
For a long time, researchers started from the hypothesis that words in L1 and L2 were stored in separate lexicons, and tested whether participants had selective access to one or the other lexicon (Kroll & Stewart, 1994). The conclusion from this line of research was that selective access does not exist and that even the existence of distinct lexicons is unlikely (Brysbaert & Dijkstra, 2006; Brysbaert & Duyck, 2010; Jin, 2013; Kroll, Bobb & Wodniecka, 2006; Tokowicz, 2014).
Much research focused on words shared between the languages, either with the same meaning (called cognates) or with different meanings (interlingual homographs). With respect to cognates, Costa, Caramazza and Sebastian-Galles (2000) reported that bilinguals name pictures with cognate names faster than matched pictures with non-cognate names. The cognate advantage has been obtained in many other studies involving both language production and comprehension (e.g., Bultena, Dijkstra & van Hell, 2014; Duyck, Van Assche, Drieghe & Hartsuiker, 2007). As for interlingual homographs, Dijkstra, Timmermans and Schriefers (2000) presented Dutch–English bilinguals with lists of English and Dutch words. The participants were to press a button only if an English word appeared. If the presented word belonged to Dutch, they were instructed to wait for the next word (i.e., a go / no-go paradigm). The authors were interested in the comparison between interlingual homographs (such as room, which means cream in Dutch) and words that only exist in English (e.g., home). The idea was that if participants only activated words in their English lexicon, they should not be influenced by whether or not the letter string formed a word with a different meaning in Dutch. Still, Dijkstra et al. (2000) obtained a reliable homograph effect: Participants needed more time to decide that a homograph was an English word than that a non-homograph was an English word, even though the English reading of the homograph was much more frequent than the Dutch reading and even though all test words were readily recognized as valid English words. Interestingly, Dijkstra et al. further showed that performance was affected by the other language not only when the response was required in L2, but also when the response was required for words in L1 (with homographs in L2). Participants took longer to accept a letter string as an existing Dutch word when it was an English homograph (room) than when it was not (e.g., nis [niche]).
Commonalities in L1 and L2 processing
Research on bilingual language processing has traditionally focused on differences between L1 and L2 processing. For instance, Van Heuven, Dijkstra and Grainger (1998) examined how the recognition of L2 target words is influenced by similar words in L1 and L2. Dutch–English bilinguals and English native speakers were asked to decide whether strings of letters formed English words or nonwords (English lexical decision task). For the English native speakers, word identification time depended on the number of English orthographic neighbors (i.e., words of the same length that differ by one letter). Participants took longer to decide that a letter string was a word when it had few neighbors (e.g., deny, with the neighbors defy and dent) than when it had many (e.g., dish, with the neighbors fish, wish, dash, dosh, disc, disk). In contrast, the Dutch–English bilinguals were more influenced by the number of Dutch neighbors than by the number of English neighbors. Furthermore, the Dutch neighborhood effect was different from the English neighborhood effect: Dutch–English bilinguals took longer to accept an English L2 word with many Dutch L1 neighbors (e.g., poor, with the Dutch neighbors boor, door, goor, hoor, koor, moor, noor, voor, pook, pool, poos, poot) than an English word with few Dutch neighbors (e.g., bath, which has no reasonably well-known Dutch words as neighbors). This was interpreted as evidence for strong inhibitory cross-language interactions in word identification.
To chart the differences between L1 and L2 word recognition more systematically, Lemhöfer, Dijkstra, Schriefers, Baayen, Grainger and Zwitserlood (2008) set up a large-scale study comparing English word recognition in native speakers, Dutch–English bilinguals, French–English bilinguals, and German–English bilinguals. Participants were given a word identification task (progressive demasking) with 1,025 monosyllabic English words (3–5 letters). Against their own expectations based on Van Heuven et al. (1998), the authors found many more commonalities between the groups than differences. They observed a substantial overlap of reaction time patterns across the various groups of participants, indicating that the word recognition data obtained for one group generalized to the other groups. Furthermore, among the set of significant predictors, all but one reflected characteristics of the target language, English. There were virtually no influences of the bilinguals’ mother tongue on their responses to English words. As a result, Lemhöfer et al. concluded that to understand English L2 word processing, it is more important to study the properties of the English language itself than possible interactions between English and the participants’ mother tongue. The only robust differences Lemhöfer et al. (2008) observed between native speakers and bilinguals were related to the cognate status of the words and the word frequency effect. As for the latter, L2 speakers needed relatively more time to process low-frequency words than L1 speakers.
The larger frequency effect in bilinguals has also been reported by de Groot, Borgwaldt, Bos and van den Eijnden (2002), Van Wijnendaele and Brysbaert (2002), Duyck, Vanderelst, Desmet and Hartsuiker (2008), Whitford and Titone (2012), and Cop, Keuleers, Drieghe and Duyck (2015).
The lexical entrenchment account
Diependaele, Lemhöfer and Brysbaert (2013) examined whether the larger frequency effect in bilinguals was due to a qualitative distinction between L1 and L2 processing. A qualitative difference meant that an extra variable had to be postulated for L2 processing, that the weight of a variable differed fundamentally between L2 and L1, or that knowledge of more than one language significantly interfered with the processing of each of the languages. In contrast, if the larger frequency effect in L2 could be understood on the basis of the same mechanisms as differences in the frequency effect among L1 speakers, then this would be evidence for a system that processes L1 and L2 words in very much the same way. For instance, in L1 word recognition it has been reported that people with a small vocabulary size have a larger frequency effect than people with a large vocabulary size (Yap, Balota, Sibley & Ratcliff, 2012). Could the difference in the frequency effect between bilinguals and native speakers also be explained by the fact that people have a smaller vocabulary size in L2 than in L1?
All participants in the Lemhöfer et al. (2008) study completed a vocabulary test and, therefore, Diependaele et al. (2013) could enter this variable as a covariate in their analysis. Once vocabulary size was taken into account, all differences between bilinguals and native speakers disappeared. Bilingual participants showed a larger frequency effect, not because they were processing words in L2, but because on average they had a smaller English vocabulary size. L2 speakers and L1 speakers with matched vocabulary sizes showed similar word frequency effects. Diependaele et al. (2013) named their finding the lexical entrenchment hypothesis: “lexical representations are weaker in low-proficiency individuals and require more energy to be processed; this is particularly true for low-frequency words”.
Kuperman and Van Dyke (2013) offered an explanation of why a reduced vocabulary size correlates with an increased word frequency effect. They showed that limited exposure to language negatively affects the exposure to low-frequency words in particular. Large corpora yield higher frequencies of rare words than small corpora. So, people with limited exposure to a language are likely to have encountered low-frequency words considerably less often than people with extensive exposure. High-frequency words are encountered in large numbers by both groups and are less affected by additional exposures. The latter is a direct consequence of the fact that learning curves are concave, with additional learning trials having more impact in the early stages of learning. To Kuperman and Van Dyke's (2013) interpretation, one could add that people with a limited exposure to language are also likely to opt for easier materials (i.e., with fewer low-frequency words). For instance, it is well documented that written materials (books, newspapers, magazines) contain a richer choice of words than spoken conversations or television programs (Cunningham & Stanovich, 2001).
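The exposure argument can be illustrated with a small numerical sketch. It is not taken from Kuperman and Van Dyke (2013): the exponential learning curve, the learning rate of 0.05, the exposure totals of 10 and 100 million words, and the function names are all assumptions chosen only to show how a concave curve makes low-frequency words more sensitive to differences in exposure.

```python
import math

def entrenchment(encounters, rate=0.05):
    """Assumed concave learning curve: early encounters add more than later ones."""
    return 1.0 - math.exp(-rate * encounters)

def expected_encounters(per_million, exposure_in_millions):
    """Expected number of encounters with a word of the given frequency."""
    return per_million * exposure_in_millions

def exposure_gap(per_million, small=10, large=100):
    """Entrenchment difference between a reader with large and one with small exposure.

    The exposure totals (in millions of words) are hypothetical.
    """
    weak = entrenchment(expected_encounters(per_million, small))
    strong = entrenchment(expected_encounters(per_million, large))
    return strong - weak

low_gap = exposure_gap(per_million=1)     # low-frequency word (Zipf 3)
high_gap = exposure_gap(per_million=100)  # high-frequency word (Zipf 5)
print(f"low-frequency gap: {low_gap:.3f}, high-frequency gap: {high_gap:.3f}")
```

Under these assumptions the high-frequency word is at ceiling for both readers, whereas the low-frequency word is much less entrenched for the reader with limited exposure.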
Importantly, the lexical entrenchment hypothesis entails that there is no qualitative difference between L1 and L2 word processing, and that any processing differences can be explained by variations in exposure. Exposure is also the driving force behind the word frequency effect and the age of acquisition (AoA) effect (early-acquired words are easier to process than late-acquired words), and arguably exposure is also involved in the cognate effect (as cognates are part of both languages). This suggests that variation in exposure to the words of a language is the main variable determining word processing times for that language, both in L1 and L2. Following Diependaele et al. (2013) and Kuperman and Van Dyke (2013), we believe that a good vocabulary test is the best measure of language exposure we currently have (see also Huttenlocher, Haight, Bryk, Seltzer & Lyons, 1991, for a link between language exposure and vocabulary knowledge in young children). Participants exposed to less language have a smaller vocabulary.
Lexical decision and a diffusion model analysis
A limitation of the Lemhöfer et al. (2008) and the Diependaele et al. (2013) studies is that they were based on word identification in the progressive demasking paradigm. In this paradigm a word is presented between masks for increasing durations until the participant is able to identify the word. Although this task is known to correlate with other word processing times (e.g., Carreiras, Perea & Grainger, 1997; Ferrand, Brysbaert, Keuleers, New, Bonin, Meot, Augustinova & Pallier, 2011; Ploetz & Yates, in press), it is not the most common task in word recognition research. Many more studies are based on the lexical decision task, which shows a very clear word frequency effect (Balota, Yap, Cortese, Hutchison, Kessler, Loftis, Neely, Nelson, Simpson & Treiman, 2007; Ferrand, New, Brysbaert, Keuleers, Bonin, Meot, Augustinova & Pallier, 2010; Keuleers, Diependaele & Brysbaert, 2010; Keuleers, Lacey, Rastle & Brysbaert, 2012). So, a test of the lexical entrenchment hypothesis with lexical decision times is needed.
A challenge for a between-groups design is to test enough participants to make sure that the participants form a representative group and that intermediate effect sizes can be detected. Lemhöfer et al. (2008) compared four groups of 21 participants (university undergraduates) each. This is good, but still provides a rather limited picture. In particular, one would like to have a larger group of L1 speakers, so that the performance of L2 speakers can be compared to the full range of L1 performances. Such a study was recently published by Adelman, Johnson, McCormick, McKague, Kinoshita, Bowers, Perry, Lupker, Forster, Cortese, Scaltritti, Aschenbrenner, Coane, White, Yap, Davis, Kim and Davis (2014), who tested 1,011 native English speakers from 14 different universities on 420 six-letter words. By running an additional sample of Dutch–English bilingual participants, we can get a detailed picture of the position of L2 speakers relative to L1 speakers.
The large number of observations per participant and the large number of participants also allowed us to do more in-depth analyses than a simple comparison of mean reaction times (RTs). A model increasingly used to understand performance in binary forced choice RT tasks is Ratcliff's (1978) diffusion model (Dutilh, Vandekerckhove, Forstmann, Keuleers, Brysbaert & Wagenmakers, 2012; Gomez & Perea, 2014; Ratcliff, Gomez & McKoon, 2004). The advantage of using such a model is that it takes into account the full distribution of RTs both for correct and incorrect responses, words and nonwords, and that it captures differences between conditions with a small set of parameters, which can be linked to processing aspects. The model will be explained in more detail in the Results section, when we report the outcome of the analysis.
Method
Participants
Participants were 56 psychology undergraduates from Ghent University, Belgium. They had normal or corrected-to-normal vision and knew that the experiment involved English word recognition. All participants were native Dutch speakers and saw themselves as reasonably proficient in English. Because Adelman et al. (2014) used 28 counterbalanced lists of stimuli (see below), two participants were tested per list. To be included in the data analysis, participants had to obtain accuracy scores above 75% in the lexical decision task. A similar criterion was used in Adelman et al., as that study's focus was on the orthographic priming effect of 28 different types of stimuli expressed in milliseconds. Because 16 students did not reach the 75% criterion, they were replaced (using the same stimulus list). Ghent University students also have reasonable knowledge of French (taught in the last two years of primary school and in all years of secondary education) and sometimes of a fourth language (German, Spanish, Turkish, Hebrew, ...), but this knowledge is not expected to affect the results in a way that invalidates the conclusions.
Stimuli
The 420 words and 420 nonwords from Adelman et al. (2014) were used. They were all 6 letters long. As in the Adelman et al. study, targets were preceded by a briefly presented, masked nonword prime that had various letters in common with the target word. There were 28 types of primes varying from primes that had all letters in common with the target word (i.e., identity priming) to primes that had no letters in common (unrelated primes), as shown in Table 2 below. The primes were included to test various theories of orthographic processing (the original aim of the Adelman et al. study) and were not visible to the participants. Adelman et al. used a Latin-square design to obtain data from all prime-target combinations in a group of participants who saw the target list only once. Consequently, 28 different stimulus lists were composed with 15 target words in each priming condition. As orthographic priming is expected to take place at the very first, prelexical stages of word processing, and as Dutch and English have very similar orthographies, we did not expect differences in orthographic priming between our L2 participants and the L1 participants tested by Adelman et al. Targets were presented in uppercase letters, primes in lowercase letters.
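The counterbalancing logic can be sketched as follows. This is a generic cyclic Latin-square rotation and not necessarily the assignment scheme used by Adelman et al. (2014). The function names and the zero-based indexing are hypothetical, and only the word targets are shown.

```python
N_CONDITIONS = 28  # prime types
N_TARGETS = 420    # target words
N_LISTS = 28       # counterbalanced stimulus lists

def prime_condition(target_index, list_index):
    """Cyclic rotation: each list shifts the condition of every target by one."""
    return (target_index + list_index) % N_CONDITIONS

def build_list(list_index):
    """Return the prime condition assigned to each target in one stimulus list."""
    return [prime_condition(t, list_index) for t in range(N_TARGETS)]

lists = [build_list(i) for i in range(N_LISTS)]

# Every list contains 420 / 28 = 15 targets per priming condition.
per_condition = [lists[0].count(c) for c in range(N_CONDITIONS)]
print(per_condition[:5])
```

With this rotation, each participant sees every target once, and across the 28 lists every target appears in every priming condition exactly once.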
Design
The design followed the Adelman et al. (2014) study as closely as possible (see Footnote 1). Participants started with the lexical decision experiment. They then proceeded with a word spelling test (not reported here) and a vocabulary test. The latter was based on Shipley (1940) and consisted of 40 words of increasing difficulty with four alternatives to choose from. Participants had to select the correct alternative.
Results
The full dataset, containing all information of the lexical decision task at the trial level, is available on the website of the Open Science Framework (https://osf.io/wsdxm/). This is also the case for the mixed-effects models we report, so that the analyses we report can be replicated. Our discussion involves various parts, starting with the vocabulary test. As the lexical entrenchment hypothesis makes predictions about RTs, we focus on this variable (see the diffusion model below for an analysis incorporating accuracy data). Following common practice, RTs were calculated on correct trials only. Outliers were detected and removed per participant using the adjusted boxplot criterion by Hubert & Vandervieren (2008), which takes into account the positive skewness of RT distributions. Because it became clear that the vocabulary sizes of our participants were at the low end of the L1 range, we included all L1 participants available in the Adelman et al. (2014) database, so that we had a full overlap of the range of vocabulary sizes in both groups. This gave a total of 1,011 participants rather than the 924 analyzed by Adelman et al. (2014). Table 1 shows the number of participants per university.
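For readers unfamiliar with the adjusted boxplot, the following is a minimal sketch of the idea and not the code used for the analyses reported here. It assumes the commonly cited fence constants (exponents of 3 and 4 on the medcouple), uses a naive quadratic-time medcouple that skips tied pairs instead of applying the special tie kernel, and takes quartiles from Python's default `statistics.quantiles` method. The example RTs are invented.

```python
import math
import statistics

def medcouple(values):
    """Naive O(n^2) medcouple, a robust skewness measure (ties at the median are skipped)."""
    med = statistics.median(values)
    lower = [x for x in values if x <= med]
    upper = [x for x in values if x >= med]
    kernel = [((hi - med) - (med - lo)) / (hi - lo)
              for lo in lower for hi in upper if hi != lo]
    return statistics.median(kernel) if kernel else 0.0

def adjusted_fences(values):
    """Skewness-adjusted boxplot fences: the upper fence widens for right-skewed data."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    mc = medcouple(values)
    if mc >= 0:
        return q1 - 1.5 * math.exp(-4 * mc) * iqr, q3 + 1.5 * math.exp(3 * mc) * iqr
    return q1 - 1.5 * math.exp(-3 * mc) * iqr, q3 + 1.5 * math.exp(4 * mc) * iqr

def remove_outliers(values):
    """Keep only the observations that fall within the adjusted fences."""
    low, high = adjusted_fences(values)
    return [x for x in values if low <= x <= high]

# Invented RTs (ms) for one participant, with a long right tail.
rts = [450, 480, 500, 510, 530, 560, 600, 650, 720, 800, 950, 5000]
kept = remove_outliers(rts)
print(kept)
```

In this toy sample the 950 ms response survives, although a symmetric 1.5 × IQR rule would flag it, while the 5000 ms response is removed.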
a This value is the one obtained from the dataset. In all likelihood, it is caused by a different starting point of the timer, as the RTs correlate as well with the other data as can be expected on the basis of the reliability of the data. Importantly, all analyses we report can handle a constant subtraction (e.g., due to inclusion of an intercept difference between participants or to the inclusion of Ter in the diffusion model). So, the conclusions we draw are not influenced by this measurement error.
Vocabulary test
Our participants scored on average 59.3% (SD = 9.1%) on the Shipley vocabulary test. Table 1 illustrates how this compares to the universities tested in Adelman et al. (2014). As can be seen, the average score of the L2 participants was below that of the L1 participants, although it came close to that of the lowest-scoring universities. As could be expected, the vocabulary scores correlated with the accuracy data on the lexical decision task (r = .91, N = 15). Surprisingly, they did not correlate with the response times (r = .13, N = 15).
Masked priming
Before we analyze the lexical decision data, it is important to check whether the orthographic priming effects are similar in L1 and L2, as expected. Table 2 shows the priming effects for the 28 different types of primes. As can be seen, the effects are quite similar (correlation between the L1 and L2 effects = .84, N = 27, p < .0001). A mixed-effects analysis (see Footnote 2) on the lexical decision times confirmed that there were main effects of language (L1 vs. L2, χ²(1) = 17.21, p < .001), vocabulary size (χ²(1) = 19.83, p < .001), and type of prime (χ²(27) = 1503.6, p < .001). Participants responded faster when English was their first language, when they had a large vocabulary size, and when the orthographic overlap between prime and target increased (Table 2). Importantly, there were no interactions between prime type and language (χ²(27) = 23.34, p = .66) or between prime type and vocabulary size (χ²(27) = 37.92, p = .08).
Lexical decision performance
As can be seen in Table 1, average performance of the L2 participants was in line with that of the L1 participants, although the RT was at the high end of the universities tested and the accuracy rate was at the low end. To further investigate the similarities/differences between the groups, we correlated the RTs of the groups across the 420 target words. The correlations are shown in the upper right half of Table 3. This table also includes an estimate of the reliabilities of the estimates per university placed on the diagonal (based on the Intraclass Correlation Coefficient). The reliabilities differ because the number of students tested per university varied from 28 to 217 (Table 1). Correlations can be corrected for the lack of reliability with the equation: corrected correlation = correlation / sqrt(reliability test1 × reliability test2). The corrected correlations are given in the lower left half of Table 3. They clearly show the high correlation between L2 and L1 processing times (around r = .8), but the still higher correlations between the L1 data collected at the various universities (around r = .9). As was found by Lemhöfer et al. (2008), the commonalities of L1 and L2 processing outweigh the differences, but there is room for a few discrepancies, which will be outlined in the remainder of the text.
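The correction for attenuation described above amounts to a one-line computation. In the sketch below, the function name and the example values are hypothetical and are not taken from Table 3.

```python
import math

def corrected_correlation(correlation, reliability_1, reliability_2):
    """Correct an observed correlation for the unreliability of both measures."""
    return correlation / math.sqrt(reliability_1 * reliability_2)

# Hypothetical values: observed r = .72 between two groups whose item means
# each have a reliability of .90.
example = corrected_correlation(0.72, 0.90, 0.90)
print(round(example, 2))
```

Because reliabilities are below 1, the corrected correlation is always at least as large in absolute value as the observed one.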
The frequency effect and the lexical entrenchment hypothesis
The lexical entrenchment hypothesis makes two predictions: (1) participants with a small vocabulary size will show a stronger word frequency effect than participants with a large vocabulary size, and (2) once vocabulary size is taken into account, no more difference in frequency effect is expected between L1 speakers and L2 speakers.
To test the frequency effect, we made use of the SUBTLEX-UK word frequency estimates (see Footnote 3), expressed as Zipf values (Van Heuven, Mandera, Keuleers & Brysbaert, 2014). The Zipf values are a standardized measure of word frequency, equal to log10(frequency per billion words), and have the following interpretation: A Zipf value of 2 equals 1 occurrence per 10 million words, Zipf 3 = 1 occurrence per million words, Zipf 4 = 10 occurrences per million words, and Zipf 5 = 100 occurrences per million words. As a rule of thumb, Zipf values of 3 and lower can be considered as low-frequency words (equal to or lower than 1 occurrence per million words) and values of 4 and higher as high-frequency words (equal to or higher than 10 occurrences per million words).
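The Zipf scale as defined above can be computed directly from a frequency per million words. The sketch below implements only that definition, log10 of the frequency per billion words. The function names are hypothetical, and any smoothing applied when the published norms were derived is not reproduced.

```python
import math

def zipf_from_per_million(frequency_per_million):
    """Zipf value = log10(frequency per billion words) = log10(per million) + 3."""
    return math.log10(frequency_per_million * 1000.0)

def frequency_band(zipf):
    """Rule-of-thumb bands from the text: <= 3 is low, >= 4 is high frequency."""
    if zipf <= 3:
        return "low"
    if zipf >= 4:
        return "high"
    return "middle"

for per_million in (0.1, 1, 10, 100):
    z = zipf_from_per_million(per_million)
    print(per_million, round(z, 2), frequency_band(z))
```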
The usual finding related to the frequency effect is that the frequency effect is strong in the middle part of the continuum but levels off at the low and the high end (Keuleers et al., 2010, 2012). The leveling-off at the high end is most likely due to a floor effect in RTs. The leveling-off at the low end seems to be related to the fact that many low-frequency words are not well known (see Footnote 4). The consequence is that the RTs are based on smaller numbers of observations, which in addition come from the few people who know the word (and arguably have processed it more often). Keuleers, Stevens, Mandera and Brysbaert (2015) showed that the percentage of people who know a word (a variable called ‘word prevalence’) is more informative for low-frequency words than frequency itself.
The shape of the frequency effect outlined above is also present in the current dataset (Figure 1), although the leveling-off at the low end starts at much higher word frequencies than seen in other megastudies (possibly because the participants of the word megastudies had larger vocabulary sizes). We tried out various ways to best capture the nonlinear nature of the frequency effect, but the most easily understandable (without loss of accuracy) is the one suggested by Harrell (2001) and depicted in Figure 1. In this approach the frequency effect is estimated via linear regression in three ranges: low end, middle, and high end. In line with Harrell's (2001) recommendation, the inflection knots were placed at the frequency percentiles 20 and 80 (i.e., the lower end included the 20% of words with the lowest frequencies and the higher end included the 20% of words with the highest frequencies). For the present stimulus set, these knots coincided with the Zipf values 3.047 and 4.302.
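One way to code a three-range linear frequency effect is to split each Zipf value into three predictors, one per range. This is a sketch of a continuous piecewise-linear coding. It is an assumption that this matches the exact parameterization used in the reported models, and the function name is hypothetical.

```python
LOW_KNOT = 3.047   # 20th percentile of Zipf frequency in the stimulus set
HIGH_KNOT = 4.302  # 80th percentile

def frequency_segments(zipf):
    """Split a Zipf value into the part falling in the low, middle, and high range."""
    low = min(zipf, LOW_KNOT)
    middle = min(max(zipf - LOW_KNOT, 0.0), HIGH_KNOT - LOW_KNOT)
    high = max(zipf - HIGH_KNOT, 0.0)
    return low, middle, high

for z in (2.0, 3.5, 5.0):
    print(z, frequency_segments(z))
```

Entering the three predictors into a regression yields one slope per range, with the fitted line remaining continuous at the two knots.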
Based on a mixed-effects model with frequency as a fixed effect, a random intercept per item and participant and random slopes of the frequency effect per participant, frequency is highly significant in the middle part (β = −60.88, z = −15.14, χ²(1) = 229.315, p < .001) but not in the low part (β = −11.29, z = −1.01, χ²(1) = 1.022, p = .31) or the high part (β = −10.12, z = −1.53, χ²(1) = 2.346, p = .13). As will become clear below, the middle range is the part where the individual differences were situated.
To check whether the L2 speakers had a stronger word frequency effect than the L1 speakers, as previously reported, we added language group and the interaction between language group and frequency to the above model (together with a random effect of language per item). In this analysis, the interaction between language group and frequency was significant for the middle part, but not for the lower and the higher end (see Table 4). In addition, there was a strong main effect of language group, because the L2 speakers were on average 88 ms slower (740 ms) than the L1 speakers (652 ms). Figure 2 shows the frequency effects for the L1 and L2 group.
The specific prediction of the lexical entrenchment hypothesis is that the difference in the word frequency effect between L1 and L2 speakers disappears once vocabulary size is taken into account. To test this prediction, we added vocabulary size, its random slope per item and its interaction with frequency to the model. This analysis (Table 5) showed a strong main effect of vocabulary size: The participants with the lowest vocabulary sizes (estimated as 2SD below the mean) were 64 ms slower than the participants with the highest vocabulary sizes (estimated as 2SD above the mean), with RTs of 685 ms and 621 ms respectively. More importantly, there was a strong interaction between vocabulary size and word frequency in the middle range of the frequency, but not at the lower end or the higher end, as shown in Figure 3. The word frequency effect was larger for participants with a small vocabulary than for participants with a large vocabulary. Furthermore, after adding vocabulary size, the interaction between frequency and language was not significant any more, either for the middle, lower, or higher part of the frequency range. The main effect of language remained significant.
A diffusion model analysis
In the previous analyses we saw clear evidence for a modulation of the frequency effect by vocabulary size, combined with overall slower reaction times for the Dutch–English bilinguals (even though the RTs of our bilinguals were not much longer than those of the students from the University of Arizona and Colby College; Table 1). Another way to investigate the origins of these effects is to make use of a model of the underlying processes. A model increasingly used to understand performance in binary forced choice RT tasks is Ratcliff's (1978) diffusion model (Dutilh et al., 2012; Gomez & Perea, 2014; Ratcliff et al., 2004). The advantages of the model are that it takes into account the full distribution of RTs both for correct and incorrect responses, words and nonwords, and that it captures differences between conditions with a small set of parameters. Figure 4 shows the model as it applies to a lexical decision situation. The model assumes that the information for a word or a nonword response accumulates over time, beginning from a start position until a threshold value is exceeded. The starting value, the speed with which information increases, and the position of the threshold values are parameters of the model.
The standard version of the diffusion model makes use of seven parameters:
1. Mean drift rate (v): This is the speed with which information accumulates. It depends on task difficulty and participant ability. Word frequency typically affects this parameter, with higher drift rates for high-frequency words than for low-frequency words (Dutilh et al., 2012; Gomez & Perea, 2014; Ratcliff et al., 2004). We expect vocabulary size to have a strong effect on this parameter. The lexical entrenchment hypothesis predicts that there will be no additional effect of L2 vs L1 once vocabulary size is taken into account. There are separate drift rates for words and nonwords.
-
2. Across–trial variability in drift rate (η). This parameter reflects the fact that drift rate may fluctuate from one trial to the next. As people with a large vocabulary size are more practiced, it seems sensible to expect that η decreases with vocabulary size.
-
3. Boundary separation (a). This variable indicates how far the boundaries are separated from each other. It quantifies response caution and modulates the speed–accuracy tradeoff. Given that bilinguals took longer to respond but made more errors, it is not clear what to expect for this parameter.
-
4. Mean starting point (z): This variable reflects the bias participants have towards word or nonwords responses. It might be hypothesized, for instance, that participants with a small vocabulary size show a stronger bias towards nonwords responses, as they know fewer words.
-
5. Across–trial variability in starting point (sz). This parameter reflects the fact that the starting point may fluctuate from one trial to the next. Given that participants with a large vocabulary have more practice with words, a likely expectation is that variability will decrease with vocabulary size.
-
6. The non–decision component of processing (Ter). This parameter represents the time needed to encode the stimulus and execute the response, irrespective of information accumulation and decision. Finding a difference between L2 and L1 speakers on this parameter would suggest that the main effect of language group has little to do with word processing. On the other hand, both Dutilh et al. (Reference Dutilh, Vandekerckhove, Forstmann, Keuleers, Brysbaert and Wagenmakers2012) and Gomez and Perea (Reference Gomez and Perea2014) found a clear effect of word frequency on Ter. So, the interpretation of this variable is less clear for word processing than originally assumed.
-
7. Across–trial variability in the non–decision component of processing (sT). As for the previous variability parameters, the explanation would be most straightforward if the variability decreased as a function of vocabulary size.
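To make the roles of the seven parameters concrete, the accumulation process can be sketched as a small simulation. The Python code below is an illustration only and is not the fast-dm estimation procedure. It assumes a within-trial noise scale of s = 1, a normally distributed drift rate across trials, uniformly distributed starting point and non-decision time, and a starting point z expressed on the same scale as a. The function names and all parameter values are hypothetical.

```python
import random


def simulate_trial(rng, v, a, z, ter, eta=0.0, sz=0.0, st=0.0, dt=0.001, s=1.0):
    """Simulate one lexical decision trial; return (response, rt_in_seconds)."""
    # Across-trial variability: sample this trial's drift rate, starting point,
    # and non-decision time (the distribution shapes are assumptions of this sketch).
    drift = rng.gauss(v, eta)
    x = rng.uniform(z - sz / 2, z + sz / 2)
    nondecision = rng.uniform(ter - st / 2, ter + st / 2)
    step_sd = s * dt ** 0.5
    t = 0.0
    # Accumulate noisy evidence until the nonword (0) or word (a) boundary is crossed.
    while 0.0 < x < a:
        x += drift * dt + rng.gauss(0.0, step_sd)
        t += dt
    return ("word" if x >= a else "nonword"), nondecision + t


def summarize(rng, n_trials, **params):
    """Return (proportion of 'word' responses, mean RT) over n_trials word trials."""
    trials = [simulate_trial(rng, **params) for _ in range(n_trials)]
    p_word = sum(resp == "word" for resp, _ in trials) / n_trials
    mean_rt = sum(rt for _, rt in trials) / n_trials
    return p_word, mean_rt


rng = random.Random(1)
shared = dict(a=1.0, z=0.5, ter=0.3, eta=0.5, sz=0.1, st=0.05)  # made-up values
acc_low, rt_low = summarize(rng, 500, v=1.0, **shared)    # slow accumulation
acc_high, rt_high = summarize(rng, 500, v=3.0, **shared)  # fast accumulation
print(f"v=1.0: accuracy={acc_low:.2f}, mean RT={rt_low:.3f} s")
print(f"v=3.0: accuracy={acc_high:.2f}, mean RT={rt_high:.3f} s")
```

With these made-up settings, the higher drift rate should yield both faster and more accurate word responses, which is the pattern expected for high-frequency words and for participants with a large vocabulary.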
By fitting the model to the data of each participant, we can enter the resulting parameter estimates in multiple regression analyses with language group (L1, L2) and vocabulary size as predictors. To estimate the parameters of the diffusion model, we made use of the fast-dm algorithm written by Voss and Voss (2007).
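The second step, regressing per-participant parameter estimates on language group and vocabulary size, can be illustrated as follows. This is a minimal sketch with invented, noise-free data: six hypothetical participants whose drift rate is generated from known coefficients. It does not reproduce any value from Table 6, and the helper `ols` is a hypothetical function written for this illustration, not part of fast-dm.

```python
def ols(design, y):
    """Ordinary least squares via the normal equations; one coefficient per column."""
    k = len(design[0])
    # Augmented matrix [X'X | X'y].
    m = [
        [sum(row[i] * row[j] for row in design) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(design, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


# Invented participants: (is_l2, vocabulary score between 0 and 1).
participants = [(0, 0.55), (0, 0.70), (0, 0.90), (1, 0.50), (1, 0.65), (1, 0.80)]
# Invented generating model: drift = 2.0 - 0.3 * is_l2 + 1.5 * vocabulary.
drift = [2.0 - 0.3 * g + 1.5 * voc for g, voc in participants]

# Design matrix columns: intercept, language group (0 = L1, 1 = L2), vocabulary.
design = [[1.0, float(g), voc] for g, voc in participants]
intercept, b_group, b_vocab = ols(design, drift)
print(f"intercept={intercept:.2f}, group={b_group:.2f}, vocabulary={b_vocab:.2f}")
```

Because the invented data contain no noise, the regression should recover the generating coefficients. With real parameter estimates, each coefficient would additionally need a standard error to obtain test statistics such as the z-values reported for these analyses.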
Table 6 shows the estimates of the various parameters, together with the z-values for the effects of language group and vocabulary size. Language group had a significant effect on the drift rate for words and on the non-decision time. Vocabulary size had a significant effect on nearly all parameters.
Note to Table 6: ** p < .001
Starting with the most interesting parameter, we see that the drift rate v differs as a function of vocabulary size, as expected: Participants with a large vocabulary size have a higher drift rate than participants with a low vocabulary size. At the same time, L2 speakers have a lower drift rate than L1 speakers for words. Figure 5 shows both effects. The variability in drift rate (η) was smaller for participants with a high vocabulary size, in line with the assumption that processing went more smoothly for them.
There were no clear effects on boundary separation (parameter a) when we corrected for multiple comparisons. If a more lenient criterion is used, L2 speakers had slightly narrower boundary separation than L1 speakers, meaning that they based their decisions on less information. This would explain their higher error rates. Interestingly, the boundaries were not influenced by vocabulary size. Figure 6 shows how the a-parameter changes as a function of language group and vocabulary size.
All participants had a bias towards words (i.e., the starting point was closer to the word boundary than to the nonword boundary, as shown in Figure 7). Against expectation, participants with a large vocabulary had a less strong word bias than participants with a small vocabulary.
There was a 70 ms difference in Ter between L2 and L1 speakers, indicating that the main effect of language group on RT was largely due to factors outside the word recognition and decision processes (Figure 8). At the same time, there was no difference between people with a small and a large vocabulary. These findings agree with the observation that considerable variability in mean RTs was also present between the English-speaking universities, without corresponding differences in vocabulary size (Table 1).
Finally, the variabilities of Ter and z changed in opposite directions as a function of vocabulary size. Whereas the variability in Ter decreased for participants with a large vocabulary, as expected, the variability in z (the starting point) increased. It is not clear how to interpret the latter finding. Maybe good participants are more flexible in their starting point and make it shift more as a function of the stimulus sequence just processed (e.g., a streak of words or nonwords; Dufau, Grainger & Ziegler, 2012)?
Cognates, age-of-acquisition, and neighbors
Given the richness of the dataset, it is worthwhile to further test three variables that have been claimed to affect L2 word recognition differently from L1 word recognition. This allows us not only to further chart the differences between L1 and L2 processing, but also to test the quality of the dataset. If none of these effects could be found, we would have to conclude that the dataset is less interesting than we had hoped for. The three variables claimed to have different effects in L1 and L2 are cognates, age-of-acquisition (AoA), and neighbors in L1 and L2. Importantly for bilingualism researchers, AoA refers to the age at which English words are acquired in English L1 speakers, not the age at which an L2 is learned. These variables were added simultaneously to the model of Table 5 (see Table 7).
As indicated in the Introduction, cognate words are expected to be easier for bilinguals than non-cognate words. Based on the Dutch–English cognate list compiled by Schepens, Dijkstra and Grootjen (2012), 126 of the 420 target words were Dutch–English cognates. As predicted, bilinguals were 26 ms faster on cognates than on noncognates (z = −4.81, p < 0.001). This difference was significantly larger than the one seen in L1 speakers (z = −3.56, p < 0.001; Figure 9), even though the L1 speakers also responded 11 ms faster to the cognates than to the noncognates (z = −3.20, p < 0.001). The latter finding indicates that researchers must be very careful when they investigate the cognate effect, as the effect could be due to other variables if it is not contrasted against an L1 group. Also reassuring is that the cognate effect did not depend on vocabulary size, as the cognate effect is thought to be present in all bilinguals.
Izura and Ellis (2002) reported that the AoA effect in L2 depends on the order of acquisition of the L2 words and not on the order of acquisition of the L1 words. Given that most of our bilingual participants started to learn English at the age of 12–14 years, the words they first acquired were different from the words an English toddler is learning. So, if Izura and Ellis (2002) are right, we ought to find a stronger AoA effect, based on English L1 AoA estimates, for L1 speakers than for L2 speakers. The AoA measures were taken from Kuperman, Stadthagen-Gonzalez and Brysbaert (2012). As Figure 10 and Table 7 show, there was indeed a significant interaction between AoA and language group in the predicted direction. We found an AoA effect for L1 speakers (β = 3.61, z = 5.34, p < 0.001), but not quite for L2 speakers (β = 1.58, z = 1.41, p = 0.156), although there was a trend in the right direction. AoA did not interact with vocabulary size, as was expected given that the AoA effect is assumed to be present for all L1 speakers.
As described in the introduction, van Heuven et al. (1998) reported that intra-language neighbors had a facilitation effect on English lexical decision times, but that inter-language neighbors had an inhibition effect for bilinguals. We could test this pattern of results in our data as well (see Footnote 5). Because the stimuli were longer in the present dataset (6 letters) than in van Heuven et al. (3–5 letter words), the number of neighbors is considerably smaller. However, this is likely to be an advantage, because the effect of word neighbors is particularly robust between 0 and 1 neighbor (Davis, 2010). As it happens, 221 of the 420 words did not have an English neighbor, and only 74 of the 420 words had at least one Dutch neighbor (see Footnote 6).
As can be seen in Figure 11, the effect of English neighborhood size was facilitatory, both for the L1 and the L2 speakers. The effect was best captured with the log (neighborhood size + 1) transformation as predictor. This transformation takes into account that the effect of word neighborhood size is particularly strong for differences between small sizes. The effect of English neighborhood was larger for participants (both L1 and L2) with a small vocabulary size.
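The compressive nature of this transformation can be seen in a short example. The sketch below assumes the natural logarithm (a different base would only rescale the predictor), and the function name is hypothetical.

```python
import math


def log_neighborhood(n_neighbors):
    """Return log(neighborhood size + 1), assuming the natural logarithm."""
    return math.log(n_neighbors + 1)


# Transformed predictor for a few illustrative neighborhood sizes.
for n in (0, 1, 2, 5, 10):
    print(n, round(log_neighborhood(n), 3))

# The step from 0 to 1 neighbor is larger than the step from 9 to 10 neighbors.
step_0_to_1 = log_neighborhood(1) - log_neighborhood(0)
step_9_to_10 = log_neighborhood(10) - log_neighborhood(9)
print(step_0_to_1 > step_9_to_10)
```

Adding 1 before taking the logarithm keeps words without neighbors in the analysis, since they map onto a predictor value of zero.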
The Dutch neighborhood size had no effect, not even when the L2 speakers were analyzed separately. There was a hint of an interaction with vocabulary size, as the effect tended to be facilitatory for participants with a small vocabulary but inhibitory for participants with a large vocabulary size. However, this interaction was present to the same extent for L1 and L2 speakers and, hence, is unlikely to be specific to knowledge of the Dutch language.
Discussion
Bilinguals show a stronger frequency effect in L2 than in L1 (Cop et al., 2015; de Groot et al., 2002; Duyck et al., 2008; Lemhöfer et al., 2008; Van Wijnendaele & Brysbaert, 2002; Whitford & Titone, 2012). According to the lexical entrenchment hypothesis (Diependaele et al., 2013), this difference can be explained on the basis of a more limited exposure to L2 than to L1, and requires no further explanation. A good proxy of language exposure is vocabulary size (see also Kuperman & Van Dyke, 2013). Once a person's vocabulary size is taken into account, there are no further differences between L2 and L1 processing.
The present study tests the lexical entrenchment hypothesis with lexical decision data. We made use of a database in which lexical decision times for 420 six-letter English words had been collected from 1011 native speakers at 14 different universities. To this database, we added the records of 56 Dutch–English bilinguals with overlapping vocabulary sizes. In line with previous findings, there was a clear interaction between language group and word frequency: The frequency effect was stronger for the L2 speakers than for the L1 speakers (Table 4 and Figure 2). More importantly, when vocabulary size was introduced as a covariate, the interaction largely disappeared (Table 5), as reported by Diependaele et al. (2013). Bilinguals show a stronger word frequency effect in L2, not because a second language is harder to process, but because participants have had less exposure to this language than the average native speaker. Once the degree of exposure (estimated via vocabulary size) is taken into account, the frequency effects in L1 and L2 become equivalent.
Further evidence that L2 word processing is better explained in terms of exposure to L2 than in terms of interactions with L1 can be seen in the effects of cognates, AoA, and word neighbors. Each of these effects can be explained in terms of exposure. Because cognates exist in both languages and have the same meaning, bilingual participants have been exposed to them more often and, hence, show a cognate advantage (Figure 9). Interestingly, the English L1 speakers also showed a (smaller) cognate effect. This has been reported before (Mulder, Dijkstra, Schreuder & Baayen, 2014) and has been related to the fact that cognates tend to be the same in many languages. As a result, they are the words that English-speaking students may pick up most easily when they are abroad or have some shallow knowledge of another language.
The age-of-acquisition effect is attributed to the order of acquisition and to the fact that a learning network loses plasticity the more stimuli of a particular kind it already knows (Monaghan & Ellis, 2010). Interestingly, the AoA effect in L2 is related to the order of word acquisition in L2 and not to the order of acquisition in L1 (Izura & Ellis, 2002). As a result, English AoA estimates should be better predictors of L1 processing times than of L2 processing times, as we indeed observed (Figure 10). The fact that the AoA effect is not completely absent for L2 speakers is in line with the hypothesis that the AoA effect is not entirely situated in the connections between the representations but also has an effect on the organization of the semantic system, with the meaning of early-acquired words being more accessible than the meaning of late-acquired words (Brysbaert & Ellis, in press; Brysbaert, Van Wijnendaele & De Deyne, 2000). Importantly for the present discussion, the most straightforward interpretation of the difference in AoA effect between L1 and L2 word processing refers to differences in (the order of) exposure to the English words.
Finally, we observed that reaction times to English words were influenced by the number of English orthographic neighbors, but not by the number of Dutch orthographic neighbors. The former is in line with van Heuven et al. (1998). The effect is present to a similar extent in the English Lexicon Project (as checked on the basis of Balota et al., 2007) and, therefore, is not something peculiar to the present experiment (e.g., due to the fact that the target words were preceded by orthographic primes). The absence of an effect due to Dutch neighbors contrasts with van Heuven et al. (1998), who found an inhibitory effect of Dutch neighbors for Dutch–English bilinguals. As indicated in the introduction, the pattern of results reported by van Heuven et al. (1998) did not agree with the later findings of Lemhöfer et al. (2008) or Diependaele et al. (2013). Our findings are further evidence that this aspect of the van Heuven et al. (1998) data may be less solid than assumed thus far. On the other hand, it should be taken into account that our study was not well suited to measure the effects of cross-language (Dutch) neighbors. Fewer than 20% of the words had Dutch neighbors, and no attempt was made to make the Dutch neighborhood size orthogonal to the English neighborhood size. So, the null effect has to be treated very cautiously.
The facilitation effect of within-language English neighbors was stronger for participants with a small vocabulary size than for participants with a large vocabulary size (Figure 11). This is in line with the hypothesis that the neighborhood size effect on lexical decision times is the result of a balance between (a) facilitation due to the fact that a word looks more wordlike when it has neighbors, and (b) inhibition because it is more difficult to distinguish two visually similar words (Andrews, 1997; Grainger & Jacobs, 1996). Because a lexical decision can often be made on the basis of an overall familiarity feeling rather than the identification of the exact word presented, word neighborhood facilitation effects are often observed in lexical decision experiments (Andrews, 1997). This is particularly true for participants with lower English proficiency levels (Andrews & Hersch, 2010). Important for the present discussion is that the effect of orthographic neighbors depends on the English vocabulary size of the participants and not on whether English was their L2 or L1 (Figure 11).
So far, the analyses are all in line with the lexical entrenchment hypothesis: Differences between L1 and L2 processing can be explained in terms of differences in exposure to the target language, which can be measured with a good vocabulary test, and do not need the inclusion of further mechanisms. A slightly more complicated picture emerges, however, when we analyze the data with the diffusion model (Ratcliff, 1978). Then we see that the similar RTs in L1 and L2, once vocabulary size is filtered out, are not achieved in exactly the same way. In particular, there is some evidence that lexical information builds up more slowly in L2 than in L1, and that this is compensated by a stronger word bias and more risky decision boundaries in L2 speakers (Figures 5–7). This would suggest that L2 word processing is genuinely harder than L1 word processing (e.g., because of extra competition from the L1 words). A complicating factor for this explanation is that the slower information build-up is not observed for nonwords. This makes it hard to decide whether there is a genuine difference between L1 and L2 processing in terms of the diffusion model parameters, or whether the differences observed arise because of some overfitting of the model or because the vocabulary test we used failed to pick up all differences between L1 and L2 speakers. Given that the effects of language on the parameters of the diffusion model are rather modest and not entirely convergent, for the time being we prefer to treat them as an observation, to be kept in mind when analyzing new data but not strong enough to refute the lexical entrenchment hypothesis. A further interesting research question may be to investigate whether similar effects would be found in L1 processing between bilinguals and monolinguals, to find out whether knowledge of another language has an impact on the processing of the native language. Such research would require a considerable investment, however, as the participant samples must be large enough to have good power to disentangle the effect of language status from the effect due to differences in vocabulary size.
All in all, our findings largely agree with the conclusions of Lemhöfer et al. (2008) and Diependaele et al. (2013) that in order to understand L2 word processing, it is much more important to study the characteristics of the L2 words than to study possible ways in which L1 and L2 words interfere with each other. All the differences between L1 and L2 word processing we obtained could be understood on the basis of discrepancies in the exposure to the English language, which can be estimated by means of an objective vocabulary test (see Footnote 7).
Although it may be tempting to interpret the absence of an interaction between Dutch and English words as evidence for separate lexicons (in which the English L2 words are insulated from the Dutch L1 words), we do not think such a conclusion is warranted. As indicated in the Introduction, there is quite a lot of evidence that the bilingual lexicon is unitary (Brysbaert & Dijkstra, 2005; Brysbaert & Duyck, 2010; Jin, 2013; Kroll et al., 2006; Tokowicz, 2014). In addition, interpreting a lack of interaction between Dutch and English words as evidence for distinct lexicons only makes sense in the presence of clear interactions between the English words themselves. Such interactions should have taken the form of an inhibition effect between English orthographic neighbors. The fact that we found a facilitation effect can only be explained by assuming that the lexical decision times were partly based on the overall ‘English’ activity in the mental lexicon (Andrews, 1997; Grainger & Jacobs, 1996). Such overall activity can as well be present in a bilingual Dutch–English lexicon as in a full English lexicon. Apparently, RTs from a lexical decision task are not well suited to expose the competition process between orthographically similar entries in the mental lexicon, contrary to what the data of van Heuven et al. (1998) originally suggested (see Footnote 8). Ferrand et al. (2011) reported a similar lack of orthographic competition effect on response times in the progressive demasking task. The most likely reason for the insensitivity of both tasks to orthographic competition is that the size of the effect is considerably smaller than the exposure-based effects reported here and in Diependaele et al. (2013). This, in our view, is the reason why the lexical entrenchment hypothesis is such a good account for the RTs obtained in progressive demasking and lexical decision.