Research on bilingualism has in recent years accelerated at “a dizzying pace” (Kroll & de Groot, 2005). Yet, despite the now thousands of studies, there is still no standard method for determining language proficiency, degree of bilingualism, and language dominance. Uniformity in how language dominance is assessed is tremendously important for advancing knowledge about the effects of bilingualism on language processing and cognition, and for interpreting outcomes observed in experimental studies and in clinical settings. Some effects obtained will apply only to some types of bilinguals (e.g., the cognitive advantages of bilingualism may be observed only in highly proficient bilinguals), but without a system for classifying bilinguals into types it will be impossible to identify precisely which aspect of bilingualism is critical in each case. A standard method for determining proficiency and dominance across multiple types of bilinguals would go a long way towards clarifying the associated theoretical implications.
One of the most broadly used approaches to assessing bilingual language proficiency is self-rating (Li, Sepanski & Zhao, 2006). Bilinguals are often asked to rate their abilities in each language, and multiple studies have shown that self-ratings are significantly correlated with objectively measured proficiency on a broad variety of measures (e.g., in one study, significant correlations were reported between self-ratings and reading fluency, reading comprehension, picture naming, auditory comprehension, sound awareness, receptive vocabulary, and grammaticality judgment speed and accuracy; see Marian, Blumenfeld & Kaushanskaya, 2007). These correlations are often highly robust (significant at the p < .01 level), and can also be moderate or large in size (especially for ratings of a non-dominant language, for which correlations were as high as .74 in some cases in Marian et al., 2007).
However, correlations between self-reported proficiency and objective measures of proficiency are far from perfect, and they do not address a different question, which is: How accurately can bilinguals classify themselves into language dominance groups? Some have argued that bilinguals are “notoriously bad” (Dunn & Fox Tree, 2009, p. 275) at providing such ratings (Hakuta & D'Andrea, 1992), and the issue of measuring bilingual language proficiency and dominance is timely (e.g., Bedore et al., 2011; Daller, 2011; Treffers-Daller, 2011), but no studies have considered how accurately bilinguals report which language is dominant on a case-by-case basis. In clinical settings examinees are often asked which language they prefer and then are tested exclusively in that language. Thus, it is important to assess the accuracy of such reports for predicting language dominance (Lim, Rickard Liow, Lincoln, Chan & Onslow, 2008). Testing in a non-dominant language will underestimate performance, and testing in the dominant language may be more likely to distinguish patients from healthy controls (Gollan, Salmon, Montoya & da Pena, 2010), which is often the goal in clinical settings.
The question “Which language is your dominant language?” can also be viewed as inherently flawed given that for many bilinguals one language is dominant in one domain whereas a different language is dominant in another domain (e.g., at home versus at work; this issue is discussed at length by Grosjean, 2008). Evidence for this phenomenon can be found in the assessment of picture naming skills, which improve for bilinguals when they are credited for producing a name in either language (for similar approaches see Bedore, Peña, García & Cortez, 2005; Kohnert, Hernandez & Bates, 1998). This improvement in naming scores with alternative scoring procedures is found in bilingual children (Bedore et al., 2005; Pearson, Fernández & Oller, 1993; Umbel, Pearson, Fernández & Oller, 1992), in college-aged and middle-aged adult bilinguals (Gollan & Silverberg, 2001; Kohnert et al., 1998), in aging bilinguals (Gollan, Fennema-Notestine, Montoya & Jernigan, 2007), and for bilinguals with Alzheimer's disease (Gollan et al., 2010). Scores improve when names in either language are credited because bilinguals know some names in their non-dominant language that they do not know in their otherwise usually more dominant language. Thus, the usually non-dominant language may be dominant in some situations, and even if bilinguals could be accurate in saying which language is dominant overall, testing in just one language would still provide an incomplete assessment of language proficiency in some important ways.
A different approach to establishing which language is dominant is to test bilinguals in both languages on an objective measure. However, objective measures can be biased if they are more difficult in one language than the other. Further complicating matters, it is not always clear how to design difficulty-matched measures across different languages. This can be particularly challenging with language pairs that are structurally distinct (e.g., English and Chinese differ greatly in orthography, phonology, and morphology; Lim et al., 2008), but the problem will be present to at least some degree with any language pair (Grosjean, 1998). For example, the Boston Naming Test (BNT; Kaplan, Goodglass & Weintraub, 1983) was designed for monolingual English speakers, and is graded for difficulty in English such that relatively easy items appear at the beginning of the test and the most difficult items towards the end of the test. The final item is abacus, an item that is quite difficult in English, but because abacuses are more common in China than they are in the USA, it is relatively easy to name in Mandarin. Thus, an item that is difficult in one language may be relatively easy in the other and vice versa (see also Kohnert et al., 1998).
One way around this problem is to create parallel versions of a test with different items for each language. However, this introduces a different problem, namely how to establish the criterion of reference for difficulty. For example, it might be stipulated that a test is difficulty-matched for English and Spanish if monolingual speakers of similar age and education levels obtain equivalent scores on the test (Peña, 2007). This approach is becoming common practice in the field; for example, the Bilingual Aphasia Test (Paradis & Libben, 1987) has parallel versions with some overlapping and some different items for each language, and the Woodcock–Muñoz test (see Woodcock & Muñoz-Sandoval, 1996) has different items for testing in Spanish than the Woodcock–Johnson set has for testing in English (see Mather & Woodcock, 2001). Similarly, the TVIP (Test de Vocabulario en Imágenes Peabody; Dunn, Padilla, Lugo & Dunn, 1986) was created by selecting subsets of Spanish-appropriate items from two versions of the PPVT (Peabody Picture Vocabulary Test; Dunn & Dunn, 1981, 1987). The use of different items in each language will work well for assessing proficiency in an individual target language, but not necessarily for comparing across languages given possible difficulties with matching monolingual speakers across cultures (e.g., a high school education in the USA may not be equivalent to a high school education in a different country; Byrd, Sanchez & Manly, 2005). In some respects this approach also seems to adopt the questionable assumption (Grosjean, 1989) that bilinguals should ideally be able to function like a monolingual in each language.
In the current study we examined the utility of self-reported proficiency ratings for establishing spoken language dominance. As objective measures of spoken proficiency, participants were interviewed in each language by a bilingual experimenter using a structured oral proficiency interview (OPI). In addition, participants named pictures in each language using the Multilingual Naming Test (MINT; a new naming test that was designed for bilingual speakers), and in Experiment 1 also the Boston Naming Test. Although self-report of language dominance has been criticized, we hypothesized that dominance ratings on the group level would be at least as reliable as correlations between self-report and measures of ability in each language because individuals may vary in their standards of excellence, and dominance ratings control for such differences but ratings of absolute level of ability do not. For example, some people might never rate themselves as Superior on any domain even though their abilities may in fact be superior in objective terms relative to others. Conversely, other individuals might overestimate their abilities relative to others. Ratings of language dominance would not be as affected by such differences given their focus on ability in one versus the other language within the same person, rather than on ability in each language relative to other people.
Experiment 1: Young bilinguals
Methods
Participants
A total of 112 young adults (56 bilinguals and 56 monolinguals) participated. Most were undergraduates at the University of California, San Diego (UCSD), and participated in exchange for course credit. A smaller number received payment ($20) for their participation. Four bilinguals were excluded from further analyses because they had to leave before they could complete all of the tasks. In addition, 19 monolinguals were excluded for being partially bilingual. The criteria used to classify monolinguals were as follows: (a) must rate their ability to speak a language other than English as less than 5 (which corresponds to “intermediate middle” on the 10-point scale in the appendix), and (b) must report using English at least 95% of the time during childhood. These criteria were developed based on the bilingual data; all but two bilinguals rated their Spanish-speaking abilities as greater than 6 (the remaining two rated their Spanish-speaking ability as 5). In addition, all bilinguals rated their percentage of English use when growing up as between 10% and 93%. Participant characteristics are shown in Table 1, with bilinguals separated into three groups: Spanish-dominant bilinguals, who rated their Spanish as more proficient than their English (n = 10); balanced bilinguals, who selected the same rating for each language (n = 7); and English-dominant bilinguals, who rated their English as more proficient than their Spanish (n = 35).
e Marginally significant t-test comparing Spanish-dominant to balanced bilinguals (p < .10)
ee Significant t-test comparing Spanish-dominant to balanced bilinguals (p < .05)
eee Significant t-test comparing Spanish-dominant to balanced bilinguals (p < .01)
* Marginally significant t-test comparing Spanish-dominant to English-dominant (p < .10)
** Significant t-test comparing Spanish-dominant to English-dominant (p < .05)
*** Significant t-test comparing Spanish-dominant to English-dominant (p < .01)
†† Significant t-test comparing English-dominant to monolinguals (p < .05)
††† Significant t-test comparing English-dominant to monolinguals (p < .01)
1 The following seven-point scale was used: 1 = rarely or never, 2 = less than one hour per day, 3 = about one hour per day, 4 = about 2 hours per day, 5 = about 3–4 hours per day, 6 = about 5 hours per day, 7 = 6 or more hours per day.
2 The following five-point scale was used: 1 = just once to switch out of English, 2 = occasionally, 3 = two or three times in each conversation, 4 = several times in each conversation, 5 = a lot or sometimes even constantly.
3 Self-ratings were based on the following 10-point scale: 1 = novice low, 2 = novice middle, 3 = novice high, 4 = intermediate low, 5 = intermediate middle, 6 = intermediate high, 7 = advanced low, 8 = advanced middle, 9 = advanced high, 10 = superior.
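The participant grouping rules described under Participants can be sketched in a few lines. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, ratings are taken to be on the 10-point scale, and childhood English use is expressed as a percentage.

```python
def is_monolingual(other_language_rating, childhood_english_use_pct):
    """Apply the two monolingual criteria from the Participants section.

    (a) self-rated speaking ability in any other language is below 5, and
    (b) English was used at least 95% of the time during childhood.
    """
    return other_language_rating < 5 and childhood_english_use_pct >= 95


def self_rated_group(english_rating, spanish_rating):
    """Group a bilingual by directly comparing the two self-ratings."""
    if spanish_rating > english_rating:
        return "Spanish-dominant"
    if english_rating > spanish_rating:
        return "English-dominant"
    return "balanced"  # same rating selected for each language


# Hypothetical participants, for illustration only.
print(is_monolingual(3, 98))       # meets both criteria
print(is_monolingual(5, 98))       # fails criterion (a)
print(self_rated_group(8, 9.5))    # Spanish rated higher than English
```

Both monolingual criteria must hold at once, which is why a participant who rates a second language at 5 is treated as partially bilingual even with near-exclusive English use in childhood.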
Materials and procedure
Participants signed consent forms and completed a Language History Questionnaire at the start of the testing session, followed by an English vocabulary test (the Shipley Vocabulary Test; Shipley, 1946; which consists of 40 multiple-choice synonym identification questions), and a test of non-verbal reasoning skills (the Matrices Subtest of the Kaufman Brief Intelligence Test, Second Edition, KBIT-2; Kaufman & Kaufman, 2004; which consists of 46 designs with a missing element that participants complete by selecting an element from multiple-choice options). Participants began with the first item (rather than beginning at an age-specific start point). Raw Shipley and Matrices scores are shown in Table 1.
After completing these tests, participants were interviewed to assess spoken language proficiency, and then were asked to name pictures from the Boston Naming Test (BNT; Kaplan et al., 1983) and the Multilingual Naming Test (MINT), with test order (BNT, MINT) and language of testing (English, Spanish) counterbalanced between subjects. Monolinguals were tested in English only. Bilinguals were interviewed in both languages, and named pictures in both languages. To minimize language switching, the proficiency interview and naming tests were administered in succession in one language, followed by the interview and then the naming tests in the other language. Phonemic cues were not administered for either naming test, and participants were asked to name all pictures in both tests (i.e., testing did not begin in the middle of the test). Tasks were presented on a Macintosh computer with a 17-inch color monitor using PsyScope 1.2.5 (Cohen, MacWhinney, Flatt & Provost, 1993). A bilingual experimenter recorded naming accuracy during testing, and testing sessions were also audio-recorded for later verification of scoring. The testing protocol took about an hour and a half to complete for most participants, and no more than two hours.
Self-ratings of language proficiency
As part of the questionnaire, participants were asked to rate their proficiency level using a 10-point scale modified and shortened from guidelines published by the American Council on the Teaching of Foreign Languages (ACTFL). ACTFL introduces ten categories used to classify a speaker's language abilities: Superior (10), Advanced High (9), Advanced Middle (8), Advanced Low (7), Intermediate High (6), Intermediate Mid (5), Intermediate Low (4), Novice High (3), Novice Mid (2), and Novice Low (1). The modified guidelines for spoken proficiency that were used here are shown in the appendix. The full-length guidelines as published by ACTFL can be obtained on the “publications” tab at http://www.actfl.org/i4a/pages/index.cfm?pageid=1.
Oral proficiency interview (OPI)
The proficiency interviews were based on the format used by ACTFL for assessing spoken language proficiency. Questions appropriate for Novice levels (levels 1–3) were excluded because of the focus on relatively proficient early bilinguals. Two sets of six interview questions were created. The first question in each set was relatively easy and could be answered mostly in the present tense (e.g., “Where did you grow up? How is it similar to or different from San Diego?”). The second question in each interview set asked speakers to describe a picture (either the Cookie Theft picture from the Boston Diagnostic Aphasia Exam, or a picture of a scene depicting a broken window, the child who broke the window hiding behind a bush, and an adult accusing a different child of breaking the window). The third and fourth questions were designed to elicit past and future tense constructions (e.g., “Tell me about your first day at UCSD. What was it like? What do you remember most about it?” and “Tell me about what you will do next week. Where will you be and what will you be doing each day?”). The last two questions in each set were designed to provide speakers with an opportunity to produce more difficult constructions typical of educated native speakers (e.g., “Some parents think that bilingual children will not do as well in school as monolingual children. Others say bilingualism is an advantage. What do you think? How would you try to convince someone that your view is the right one?”). Monolinguals completed only one set of interview questions in English (with question set counterbalanced between subjects). Bilinguals completed both sets (one in each language with counterbalanced assignment of question set to language between subjects).
Participants were interviewed by one of two proficient Spanish–English bilingual experimenters who assigned each participant a rating using the same guidelines as those shown in the appendix. After data collection, a third multilingual experimenter listened to all of the proficiency interview recordings and assigned each participant a rating for each language (using the same scale). Perhaps because of the truncated range of bilingual proficiency levels (no low-proficiency bilinguals were tested), and because two different raters provided the initial ratings, the correlations between the final ratings (provided by the single third rater) and the initial ratings (some of which were provided by one experimenter and some by a second experimenter) were not very high: for English, r = .55, p < .01, and for Spanish, r = .60, p < .01. However, the average difference between the third rater and the initial two raters was quite small; just over half a point of difference on average for both languages (M = 0.72; SD = 0.58 for English, and M = 0.87; SD = 0.73 for Spanish). Thus, on average, the ratings matched each other within a difference of less than one point on the 10-point scale in both languages. For internal consistency, the ratings provided by the third rater were used in all statistical analyses reported below (with the exception of one initial rating for one person in one language, because the recording was corrupted and thus the third experimenter could not rate this interview).
Multilingual Naming Test (MINT)
A set of 68 black-and-white line drawings was selected and presented in order of estimated increasing difficulty. To cater the test to multilingual speakers, target pictures were selected from a variety of sources with the following constraints. First, pictures with cognate names (i.e., translation equivalents that are similar in form across languages) were excluded (e.g., pyramid is pirámide in Spanish; see Gollan et al., 2007, for an analysis of cognate effects on the BNT). Cognates were excluded in an attempt to maximize the extent to which the test measures language-specific knowledge without influence from the other language. Second, an attempt was made to include a range of item difficulty but with a greater proportion of medium difficulty items than typically included in naming tests designed for monolinguals (e.g., the BNT; Kaplan et al., 1983). The rationale here was that sensitivity to bilingual naming skills might be better with a slightly easier test given that bilinguals often obtain lower naming scores than monolinguals, and bilinguals might be completely unfamiliar with some of the very low frequency items towards the end of the test (e.g., Gollan & Brown, 2006; Gollan, Montoya, Cera & Sandoval, 2008; Roberts, Garcia, Desrochers & Hernandez, 2002). Inclusion of a greater range of medium difficulty items might be especially important for assessing naming ability in a non-dominant language (given that items that are too difficult would simply elicit “don't know” responses).
Finally, these criteria were applied with consideration of four languages (Spanish, English, Mandarin Chinese, and Hebrew) to allow for eventual cross-study comparison of bilinguals of different language combinations (though here we present only the Spanish–English data). To this end, several bilingual experimenters were consulted during initial item selection, including two Spanish–English bilinguals, two Hebrew–English–Spanish trilinguals, and three Mandarin–English bilinguals. The initial item set was piloted with a larger set of words in English, Spanish, Hebrew, and Mandarin (n ≈ 5 per language). Items were eliminated if they were cognates with English words, seemed to be more difficult to name in one language than in the others, or had multiple names in any of the four languages. Thus, the resulting item set might be relatively culture-neutral when compared with an item set designed for use with just one (or even just two) languages. However, we caution that the test would be unlikely to work for other languages (i.e., it may be biased against or for languages that were not included in piloting and item development; for example, cognate status varies across language pairs and could have powerful effects on naming scores; e.g., Costa, Caramazza & Sebastián-Gallés, 2000; Costa, Santesteban & Caño, 2005; Gollan & Acenas, 2004; Gollan et al., 2007; Roberts & Deslauriers, 1999).
Table 2 illustrates the material characteristics with means for BNT items as a point of comparison (a full list of items is also shown in Supplementary Materials; see footnote 1). Item characteristics were obtained using a program called N-WATCH (Davis, 2005) for English, using Buscapalabras (Davis & Perea, 2005) for Spanish, and from the Corpus del Español (Davies, 2002). Frequency counts for English are from the Corpus of Contemporary American English (Davies, 2008), CELEX (Baayen, Piepenbrock & Gulikers, 1995) and Kučera & Francis (1967), and for Spanish from the LEXESP database (Sebastián-Gallés, Martí, Cuetos & Carreiras, 2000). Consistent with the goal of making the MINT a little easier than the BNT, the MINT names are shorter (in syllables and number of phonemes) and higher frequency in both languages than the BNT names. Given other selection restrictions we did not attempt to match across languages for length; thus, English words tended to be shorter on average than Spanish words. The means also suggest that the English names are higher frequency than the Spanish names, but note that the validity of this comparison is compromised by the fact that the frequency counts were not matched across languages, and that the frequency databases for Spanish were based on texts from many countries, whereas nearly all of the bilinguals in the current study originated from Mexico. It should also be noted that monolingual frequency counts may not be as accurate for bilingual speakers. A complete list of names used most often to name MINT pictures, any alternative names that were counted as correct (e.g., teeter totter was accepted as a correct response for seesaw), and naming rates for each item by age group and proficiency level can be found in Supplementary Materials.
* Significant difference between the MINT and the BNT items at p < .05 level
** Significant difference between the MINT and the BNT items at p < .01 level
Results
Table 3 reports the means (Ms) and standard deviations (SDs) for bilinguals’ self-rated spoken language proficiency, the oral proficiency interview (OPI) ratings, and proportion correct on the MINT and the BNT in English and in Spanish broken down by self-rated dominance groups. For ease of exposition we group together the OPI, MINT and BNT scores under the term objective measures because they do not rely on bilinguals’ self-ratings (note however, that the OPI is technically not objective in the sense that the interview scores exist in the minds of the interviewers). Briefly summarized, results reveal significant correlations between measures, but these are far from perfect. Self-report, proficiency interview, and the MINT (but not the BNT) agreed with each other in classifying bilinguals into groups, but when considering degree of language dominance (rather than simple classification into groups) the naming tests (especially the BNT) classified bilinguals as more English-dominant than the self-ratings and proficiency interviews.
a MINT and BNT scores are reported as proportion correct for ease of comparison across tests given the different number of items (68 in the MINT and only 60 in the BNT).
b Older adults' self-ratings were adjusted from a seven-point scale to a 10-point scale for ease of comparison to other scores ((rating/7) × 10). Actual ratings for English were 5.57 (SD = 1.13) and for Spanish 5.75 (SD = 1.11; see also Table 7).
Correlations between measures
We began by considering correlations between measures in each language, a language dominance score, and an index score designed to measure the degree of balanced bilingualism. We calculated a dominance score for each of the four measures (self-ratings, OPI, MINT, and BNT) by subtracting the Spanish scores from the English scores (thus negative difference scores reflect Spanish dominance, and positive scores reflect English dominance; see Figure 1). Index scores were calculated for each of the four measures by dividing the score in whichever language produced the lower score by the score in the other language (which produced a higher score; see Figure 2). For example, a bilingual who named 60 pictures in English and only 30 in Spanish on the BNT would be classified as 50% bilingual according to the BNT (as would a bilingual who named 30 pictures in English and 60 in Spanish). Or, using ratings as another example, a bilingual with a Superior rating for English (i.e., 10) and an Advanced Middle rating for Spanish (i.e., 8) would be classified as 80% bilingual. Bilingual index scores range from 0 to 1 and measure the extent to which knowledge of each language is similar (ignoring direction of dominance and ignoring absolute ability level; see also Gollan et al., 2010). The bilinguals tested here all scored at least 79% correct in their dominant language, and between 38% and 94% correct in their non-dominant language on the MINT (thus no bilinguals had extremely low scores in both languages, and all were at least moderately proficient bilinguals).
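The two derived scores can be illustrated with a short Python sketch. The function names are hypothetical and the code simply restates the definitions in the text (English minus Spanish for dominance; lower score divided by higher score for the index); it is not the original analysis script.

```python
def dominance_score(english, spanish):
    """English minus Spanish: negative values reflect Spanish dominance,
    positive values reflect English dominance."""
    return english - spanish


def bilingual_index(english, spanish):
    """Lower score divided by higher score, ignoring direction of dominance."""
    lower, higher = sorted((english, spanish))
    if higher == 0:
        raise ValueError("index is undefined when both scores are zero")
    return lower / higher


# Worked examples taken from the text.
print(dominance_score(60, 30))   # 30 (English-dominant on the BNT)
print(bilingual_index(60, 30))   # 0.5
print(bilingual_index(30, 60))   # 0.5 (direction of dominance is ignored)
print(bilingual_index(10, 8))    # 0.8 (Superior vs. Advanced Middle ratings)
```

Because the index discards the sign of the difference, it needs to be read alongside the dominance score whenever direction matters.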
Table 4 shows the between-measure correlations. As previously reported (e.g., Marian et al., 2007), there were significant correlations between self-reported proficiency in English (the dominant language for most participants) and objective measures (OPI, MINT, and BNT scores) ranging from r = .281 to r = .503, and correlations tended to be higher between self-reported level of proficiency in Spanish (the non-dominant language for most participants) and objective measures, ranging from r = .425 to r = .520. Interestingly, and providing evidence against claims that bilinguals cannot accurately report which language is dominant (Dunn & Fox Tree, 2009), the correlations between self-reported ratings of language dominance and objective measures of language dominance tended to be higher, ranging from r = .585 to r = .622. Thus, bilinguals were at least as accurate, or even more accurate, in estimating which of their own two languages is dominant as they were at estimating their absolute level of ability in each language.
In contrast, the correlations between self-rated index scores and objective index scores were substantially smaller and only marginally significant, ranging from r = .197 to r = .268. Thus, whereas bilinguals are relatively accurate in indicating which language is dominant, they are relatively less able to estimate the extent of difference in proficiency between languages (ignoring language dominance and focusing instead on the extent to which knowledge of the two languages is similar or balanced). Finally, objective measure index scores were strongly correlated with each other, ranging from r = .669 to r = .858. Taken together, these correlations suggest that self-report measures can predict language dominance (though their utility for this purpose is far from perfect), and that self-report should not be used to measure degree of balanced bilingualism.
Other correlations shown in Table 4 are of interest. Analyses reported in later sections reveal the BNT as an outlier measure; however, despite these differences the correlations between the BNT and MINT were quite high, ranging from r = .855 to r = .893. In addition, objective measures of language dominance were strongly correlated with each other, ranging from r = .751 to r = .893 (relative to correlations between self-report and objective measures of language dominance which, as noted above, ranged from r = .585 to r = .622). Thus, objective measures of language dominance are probably a better choice than self-report measures.
Young bilinguals’ ability to self-report language dominance
Dominance classification into subgroups
Because dominance classification is often of interest in absolute terms (correct or incorrect), we further investigated correspondence between self-reported and objective measures of language dominance using measure-anchored cut-off scores. Note that we did not ask bilinguals to say which language is dominant (which involves directly comparing the two languages), but dominance ratings can be inferred by inspecting the ratings for each language (and allowing a “balanced” category). In self-ratings the smallest difference between languages was half a point (a 5% difference) on the 10-point scale we provided (see Appendix). Thus, for balanced bilingualism we allowed any difference of less than 5% in either direction (i.e., English better than Spanish or Spanish better than English) to be classified as objectively balanced, and any difference of 5% or greater in either direction to be classified as objectively dominant in one or the other language (depending on the direction of the difference). Thus cut-offs for Spanish-dominant bilinguals were difference scores of –5% or lower (i.e., a difference of 5% or greater in favor of Spanish); for balanced bilinguals, from –4.9% to 4.9%; and for English-dominant bilinguals, 5% and greater. The OPI ratings were on the same 10-point scale as the self-ratings, but MINT and BNT scores were based on a 100-point scale. Thus, for purposes of comparison, naming scores were converted to a 10-point scale by dividing by 10. For example, naming score differences of 5% were considered equivalent to 0.5 points on the 10-point scale used for self-ratings and OPI. Note that these cut-off scores are arbitrary in that there is no sense in which a 5% difference necessarily qualifies as a point at which a significant, measurable, or “true” difference is present. Thus, the scale is consistent across measures and provides a means for comparison, but the extent to which misclassifications truly qualify as such could be debated (we return to this in the “General discussion” section below).
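The cut-off rule can be sketched as follows. This is a minimal illustration, not the original scoring procedure: the function names are hypothetical, differences are expressed in percentage points (English minus Spanish), and exact fractions are used as an implementation choice so that a difference of exactly 5% is not misclassified through floating-point rounding.

```python
from fractions import Fraction

CUTOFF_PCT = 5  # half a point on the 10-point scale, i.e. a 5% difference


def rating_difference_pct(english_rating, spanish_rating):
    """English-minus-Spanish difference for 10-point ratings, in percent.

    One point on the 10-point scale corresponds to 10 percentage points.
    """
    return (Fraction(str(english_rating)) - Fraction(str(spanish_rating))) * 10


def naming_difference_pct(english_correct, spanish_correct, n_items):
    """English-minus-Spanish difference in percent correct on a naming test."""
    return Fraction(english_correct - spanish_correct, n_items) * 100


def classify_dominance(difference_pct):
    """Three-way classification using the 5% cut-offs described in the text."""
    if difference_pct <= -CUTOFF_PCT:
        return "Spanish-dominant"
    if difference_pct >= CUTOFF_PCT:
        return "English-dominant"
    return "balanced"


# Hypothetical scores, for illustration only.
print(classify_dominance(rating_difference_pct(8.5, 8)))      # half a point
print(classify_dominance(naming_difference_pct(65, 62, 68)))  # 3 of 68 items
print(classify_dominance(naming_difference_pct(50, 58, 60)))  # 8 of 60 items
```

Note that the same raw difference in items can fall on different sides of the cut-off for the two naming tests, because three items amount to about 4.4% of the 68-item MINT but exactly 5% of the 60-item BNT.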
With this method of classifying bilinguals into three groups (Spanish-dominant, balanced, English-dominant), self-classifications did not differ from OPI classifications or from MINT classifications (both ps ≥ .22), but self-classifications were significantly different from BNT classifications, χ²(2, N = 52) = 8.92, p = .01. Similarly, OPI classifications did not differ from MINT classifications (p = .33), but were significantly different from BNT classifications, χ²(2, N = 52) = 7.46, p = .02. Thus, the BNT stands out as significantly different from self-ratings and OPI, though the MINT and BNT classifications did not differ significantly from each other (p = .35). Table 5 illustrates the percentage of bilinguals in each self-rating group (i.e., Spanish-dominant, balanced, and English-dominant) whose self-ratings seemed to match objective dominance classifications.
Dominance along a continuum
On average as a group, bilinguals obtained higher scores in English than in Spanish in self-ratings, OPI (oral proficiency interview) ratings, MINT scores, and BNT scores (all ps < .001). However, as shown in Figure 1 above, the extent of English dominance varied across measures (see also Bedore et al., Reference Bedore, Peña, Summers, Boerger, Resendiz, Greene, Bohman and Gillam2011); for self-ratings it was by 8.8% (SD = 16.4), for OPI ratings by 9.9% (SD = 10.8), for MINT scores by 16.0% (SD = 15.6), and for BNT scores by 28.1% (SD = 21.4). Six paired t-tests comparing all possible two-way comparisons of these difference scores were all significant (ps ≤ .001), with one exception, which was that self-ratings and OPI ratings were not significantly different from each other (p = .54). Thus, self-ratings agreed with OPI ratings, but not with naming tests and the BNT in particular seemed to stand out in this regard.
Comparing the two naming tests, the degree of English dominance appeared to be considerably greater for the BNT than for the MINT. A 2 × 2 ANOVA with test (MINT, BNT) and language (English, Spanish) as within-subject factors, and proportion correct as the dependent variable, revealed this interaction to be highly robust statistically. There was a main effect of language such that scores were higher in English than in Spanish [F(1,51) = 78.010, MSE = 0.032, ηp² = .605, p < .001], a main effect of test such that scores were higher on the MINT than on the BNT [F(1,51) = 352.563, MSE = 0.004, ηp² = .874, p < .001], and a significant interaction such that English appeared to be more dominant with BNT than with MINT scores [F(1,51) = 73.182, MSE = 0.003, ηp² = .589, p < .001]. Thus, the test not designed for use with Spanish or bilinguals seemed to bias classifications towards English dominance.
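The partial eta squared values reported above can be recovered from the F statistics and their degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The short Python sketch below applies that identity to the three effects. The function name is illustrative, and the snippet is offered only as a consistency check on the reported values, not as the analysis that produced them.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values as reported for the 2 x 2 ANOVA, each with df = (1, 51).
reported = {"language": 78.010, "test": 352.563, "interaction": 73.182}

effect_sizes = {
    name: round(partial_eta_squared(f, 1, 51), 3) for name, f in reported.items()
}
print(effect_sizes)
# {'language': 0.605, 'test': 0.874, 'interaction': 0.589}
```

The recomputed values agree with the reported effect sizes to three decimal places.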
What is the source of discrepancy between subjective and objective measures of language dominance?
Beginning with the middle of Table 5, of the bilinguals who classified themselves as balanced, none seemed to be balanced by objective measures. Instead, all were classified as English-dominant. Some of these misclassifications were very small (i.e., only 5% and therefore possibly not true misclassifications); however, others appeared to have misclassified themselves much more obviously (e.g., a difference of up to 41.7%). Bilinguals who rated themselves as Spanish-dominant matched objective classifications a bit better; however, here too the match between self-report and objective measures was only 40%. For example, one bilingual who said s/he was Spanish-dominant was classified as English-dominant on the oral proficiency interview (OPI), and six bilinguals who said they were Spanish-dominant obtained higher naming scores on the BNT in English than in Spanish. Finally, in English-dominant bilinguals the match between self-report and objective measures seemed to be better, but even here, one bilingual scored better in Spanish than in English on the OPI, another (a different person) scored better in Spanish than in English on the MINT, and a handful more seemed to be relatively balanced bilinguals on objective measures.
Table 3 illustrates that bilinguals who reported being Spanish-dominant seemed to be the most balanced bilinguals by objective measures, and those who reported being balanced bilinguals tended to be English-dominant. For example, bilinguals who reported being Spanish-dominant on average rated their Spanish to be about 1.5 points better than English, but objective measures revealed very small differences between languages, and suggested that these bilinguals may have overestimated their abilities in Spanish (e.g., they rated their Spanish at 9.5 on average but scored only an 8.5 on the Spanish OPI, and named about 84% of pictures on the MINT). Other studies have also found that the most objectively balanced bilinguals were also those who reported being dominant in, and also have a later age of acquisition for, their second-learned language (see Flege, MacKay & Piske, Reference Flege, MacKay and Piske2002 for a similar result with Italian–English bilingual immigrants to Canada). Bilinguals who rated themselves as balanced had higher self-ratings overall (over 9.5 on average in both languages) but like self-rated Spanish-dominant bilinguals also seemed to overestimate their abilities in Spanish (on average scoring between 12% and 29.8% better in English than in Spanish depending on the measure). Bilinguals who reported being English-dominant had virtually the same average rating values as Spanish-dominant bilinguals (just reversed by language; 9.5 for language chosen as dominant and about 7.8 for language chosen as non-dominant), but were more accurate given that objective measures seemed to confirm their English dominance.
Additional subgroup comparisons confirmed that bilinguals who rated themselves as balanced bilinguals resembled English-dominant bilinguals in their objective scores (see also Gollan & Ferreira, Reference Gollan and Ferreira2009). For example, balanced bilinguals rated their abilities in Spanish as higher (p < .001), but did not score significantly higher, than English-dominant bilinguals in Spanish on the OPI (p = .21), the MINT (p = .29) or the BNT (p = .26). This lack of differences (between test scores in each language in self-reported Spanish-dominant bilinguals) could not be attributed to lack of sensitivity in the measures given that self-rated balanced bilinguals did score significantly higher than self-rated Spanish-dominant bilinguals in English (p = .04 on OPI; both ps < .01 on MINT and BNT). Similarly, although self-rated Spanish-dominant bilinguals rated their ability in Spanish as significantly higher than their ability in English (p < .01), their performance on objective measures was not different between languages (all ps ≥ .34). Other significant differences of note were that the scores of self-rated English-dominant bilinguals were significantly different from those of Spanish-dominant bilinguals in both languages on all measures (all ps ≤ .01) with the exception of OPI scores in English, which only trended in the expected direction (p = .18). Finally, self-rated English-dominant bilinguals did not rate their spoken English proficiency as lower than monolinguals did, but named significantly fewer pictures on both the MINT and the BNT (ps < .01), confirming previous reports of bilingual disadvantages (e.g., Gollan et al., Reference Gollan, Fennema-Notestine, Montoya and Jernigan2007; Reference Gollan, Montoya, Cera and Sandoval2008; Roberts et al., Reference Roberts, Garcia, Desrochers and Hernandez2002), and demonstrating sensitivity in the MINT to differences between bilinguals and monolinguals as well as to proficiency differences within bilinguals.
Degree of balanced bilingualism
Figure 2 above illustrates the index score means. The self-ratings, proficiency interviews, and the MINT all classify bilinguals as between 80% and 88% bilingual. In contrast, the BNT seems to underestimate the degree of bilingualism, classifying them as only 63% bilingual. The BNT index scores were significantly lower than all other index scores (all ps < .001). MINT index scores were only marginally different from self-rating index scores (p = .06), though like the BNT, the MINT index scores were significantly lower than oral proficiency interview (OPI) index scores (p < .001). Finally, self-rating index scores were only marginally lower than OPI index scores (p = .06).
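To make the notion of being, say, "80% bilingual" concrete, the sketch below computes an index as the score in the weaker language divided by the score in the stronger language, expressed as a percentage. This ratio definition is an assumption made for illustration (the index formula is not restated in this section), and the function name and example scores are hypothetical.

```python
def bilingual_index(english_score, spanish_score):
    """Weaker-language score as a percentage of the stronger-language score.

    Assumed definition for illustration: 100 means perfectly balanced,
    and lower values mean one language is increasingly dominant.
    """
    stronger = max(english_score, spanish_score)
    weaker = min(english_score, spanish_score)
    if stronger <= 0:
        raise ValueError("at least one score must be positive")
    return weaker * 100.0 / stronger


# Hypothetical naming scores (percent correct) for one bilingual.
print(bilingual_index(90, 72))  # 80.0
```

Under this definition the index ignores which language is dominant, so a bilingual scoring 90 in English and 72 in Spanish receives the same index as one with the reverse pattern.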
Discussion
Experiment 1 revealed significant correlations between measures of bilingual language proficiency. As a group, young bilinguals were best able to predict their own language dominance, and could also predict their level of proficiency in each language (especially the non-dominant language). In contrast, bilinguals were relatively unable to predict the extent to which they were balanced bilinguals (i.e., self-rated index scores were not significantly correlated with objectively measured index scores in Experiment 1, and not consistently in Experiment 2). For purposes of predicting degree of language dominance, self-ratings and the oral proficiency interview (OPI) ratings agreed with each other, and they also agreed with the MINT in absolute classification into groups. However, considering degree of language dominance, both naming tests indicated greater English dominance than self-report and interview measures (see Figure 1). Although bilinguals were fairly good at classifying themselves into three dominance groups (without considering degree of dominance), in all self-assigned dominance groups (English-dominant, balanced, Spanish-dominant) some bilinguals seemed to make classification errors, and these errors seemed to be driven in part by self-rated Spanish-dominant and balanced bilinguals overestimating their abilities in Spanish, and English-dominant bilinguals overestimating their ability in English. Importantly, the BNT stood out as an outlier in several analyses; it was most likely to classify bilinguals as English-dominant, classified the group as much more English-dominant than any other measure (Figure 1), and also seemed to underestimate the extent of balanced knowledge of the two languages (Figure 2), relative to all the other measures. Before considering the implications of these results, in Experiment 2 we further investigated bilinguals’ ability to estimate their own language dominance by testing a group of older Spanish–English bilinguals.
Experiment 2: Older bilinguals
Method
Participants
Table 6 shows the characteristics of the 20 older Spanish–English bilinguals who participated in Experiment 2. The majority of older bilinguals (n = 15) were recruited for participation from a cohort of healthy bilingual controls at the Alzheimer's Disease Research Center (ADRC) at the University of California, San Diego (UCSD). They were diagnosed as cognitively intact by two senior staff neurologists using criteria developed by the National Institute of Neurological and Communicative Disorders and Stroke (NINCDS) and the Alzheimer's Disease and Related Disorders Association (ADRDA; McKhann et al., Reference McKhann, Drachman, Folstein, Katzman, Price and Stadlan1984), based on medical, neurological, and neuropsychological evaluations and a number of laboratory tests (to rule out dementia). Five additional Spanish–English bilinguals were recruited from the San Diego area and were assumed to be cognitively intact based on high levels of reported functioning in daily life.
*Marginally significant t-test comparing Spanish-dominant to English-dominant (p < .10)
**Significant t-test comparing Spanish-dominant to English-dominant (p < .05)
***Significant t-test comparing Spanish-dominant to English-dominant (p < .01)
1 The following seven-point scale was used: 1 = rarely or never, 2 = less than one hour per day, 3 = about one hour per day, 4 = about 2 hours per day, 5 = about 3–4 hours per day, 6 = about 5 hours per day, 7 = 6 or more hours per day.
2 Self-ratings were based on a seven-point scale: 1 = almost none, 2 = very poor, 3 = fair, 4 = functional, 5 = good, 6 = very good, 7 = like native speaker.
Materials and procedure
These were the same as in Experiment 1 with two exceptions. First, the BNT, Shipley vocabulary, and Matrices subtest were not administered. Participants not from the ADRC were tested with the Dementia Rating Scale (DRS; Mattis, Reference Mattis1988), and Mini Mental State Examination (MMSE; Folstein, Folstein & McHugh, Reference Folstein, Folstein and McHugh1975) in their self-reported dominant language. For ADRC participants the DRS and MMSE scores were obtained from the most recent annual testing session at the ADRC. In addition, a shorter version of the language history questionnaire was used with self-ratings on a simpler scale ranging from 1 to 7. This simpler scale may be more practical for use in clinical settings.
The OPI ratings were all completed by the same multilingual experimenter who assigned OPI ratings in Experiment 1 (with the exception of two English scores for which recordings were missing and thus scores were taken from the experimenter who administered the interview instead). The correlation between the final and initial ratings for English was r = .69, p < .01, and for Spanish r = .86, p < .01. These correlations are a bit higher than the analogous correlations in Experiment 1, possibly reflecting the slightly broader range of proficiency levels in Experiment 2. All bilinguals had at least some moderate proficiency in both languages and the range was broader in Experiment 2 than in Experiment 1 (based on the one rater who rated all speakers in both studies these ranged from 5.5 to 10 in both languages in Experiment 2, but only from 6.5 to 10 in English and from 6 to 10 in Spanish in Experiment 1). As in Experiment 1, the average difference in rating between the final and initial ratings was low; in this case, under half a point of difference on average between raters for both languages (M = 0.19; SD = 1.11 for English, and M = 0.43; SD = 0.91 for Spanish). Thus, on average, the ratings matched each other within a difference of less than half of a point on the 10-point scale used to assign OPI ratings in both languages (see Appendix).
On average as a group, older bilinguals were relatively balanced exhibiting comparable English and Spanish self-ratings and OPI ratings (both Fs < 1), although MINT scores exhibited some tendency towards English dominance overall [F(1,19) = 2.97, MSE = 0.018, ηp 2 = .14, p = .10]. The relatively more balanced profile in the overall means (compared with English dominance for younger bilinguals in Experiment 1) reflects the lower proportion of self-reported English-dominant participants in Experiment 2 (seven out of 20 or 35%) relative to Experiment 1 (35 out of 52 or 67%; compare Tables 1 and 7).
Correlations between measures
Table 7 shows the correlations between measures, difference scores, and index scores. As in Experiment 1, there were significant correlations between bilinguals’ self-rated proficiency in each language and objective measures, ranging from r = .690 to r = .786. Also as in Experiment 1, correlations between self-ratings and objective measures of language dominance tended to be larger, ranging from r = .794 to r = .876, whereas correlations between self-ratings and objective index scores tended to be smaller, ranging from r = .396 to r = .586. Finally, objective measure index scores were correlated with each other, ranging from r = .473 to r = .874. These analyses confirm those reported in Experiment 1, and demonstrate that older bilinguals can also predict their language dominance, in this case using a simpler rating scale (for details see bottom of Table 6).
Older bilinguals’ ability to self-report language dominance
Dominance classification into subgroups
Using the same measure-anchored cut-off system as in Experiment 1, in Experiment 2 self-classifications did not differ from OPI classifications or from MINT classifications, and OPI and MINT classifications also did not differ from each other (ps ≥ .26). These results replicate those reported for young bilinguals. Further replicating Experiment 1, self-report and objective classifications did not always match, and depending on which measure was considered there were some total reversals of dominance group. Table 8 illustrates the percentage of older bilinguals of each type (self-rated Spanish-dominant, balanced, English-dominant) whose self-ratings seemed to match objective dominance classifications, and Table 3 illustrates some of the sources of discrepancy between self-report and objective measures. Of the three bilinguals who classified themselves as balanced, one was confirmed to be balanced by the OPI, but this same bilingual scored 5.9% better on the MINT in English than in Spanish. Another was classified as relatively balanced by the MINT (scoring 4.4% better in Spanish than in English), but was rated as 20% better in English than in Spanish on the OPI (a rating of 8.5 for English and only 6.5 for Spanish). Among the 10 bilinguals who rated themselves as Spanish-dominant, two scored about 15% better on the MINT in English than in Spanish. Finally, as in Experiment 1, in English-dominant bilinguals the match between self-ratings and objective measures seemed to be better (all seven were classified as English-dominant in all measures).
Dominance along a continuum
On average, difference scores (English minus Spanish) were relatively balanced (see Figure 1); for self-ratings the scores averaged slightly in the direction of Spanish dominance by 2.5% (SD = 27.2), and in OPI ratings by 1.3% (SD = 27.0), whereas MINT scores averaged in the direction of English dominance by 7.3% (SD = 19.1). As in Experiment 1, paired t-tests revealed significant differences between self-rating and MINT difference scores, and between OPI and MINT difference scores (both ps = .01), but self-rating-based and OPI-based difference scores were not significantly different from each other (p = .75).
What is the source of discrepancy between subjective and objective measures of language dominance?
As in Experiment 1, the three self-reported balanced bilinguals seemed to be English-dominant on objective measures (both OPI and the MINT). Though cross-experiment comparisons should be made with caution (young and older participants were not matched for language proficiency and other characteristics, and were tested with slightly different procedures), in other respects older bilinguals in Experiment 2 seemed to fare better in estimating their language dominance than did young bilinguals in Experiment 1. For example, instead of exhibiting a balanced profile as in Experiment 1, self-rated Spanish-dominant older bilinguals scored significantly better in Spanish than in English on the OPI (p = .01), and their MINT naming scores were 5.9% higher in Spanish than in English (instead of just 1.7% higher in Experiment 1; though the 5.9% difference still was not significant, p = .27). Finally, as in Experiment 1, older bilinguals who reported being English-dominant had average rating values very similar to those of Spanish-dominant bilinguals (again just reversed by language), but were more accurate given that objective measures confirmed their English dominance (both ps < .01).
Assessment of degree of bilingualism
Figure 2 above illustrates the index score means. The self-ratings, proficiency interviews, and the MINT classified older bilinguals as between 77% and 82% bilingual, and there were no significant differences in index scores across measures (all ps ≥ .14).
General discussion
The results of the current study simultaneously validate, and illustrate the limitations of, self-report measures of language proficiency and language dominance. The approach taken here assumes that no single measure will provide a complete assessment of bilingual language proficiency which can vary from domain to domain, and will reflect different aspects of knowledge and skill. A bilingual who is classified as dominant in one language by objective measures but nevertheless rates herself as dominant in the other language is not necessarily “wrong” in this self assessment. Instead, this bilingual may be focusing on something that is not measured by naming tests and proficiency interviews (or other objective tests).
The proficiency interviews in the current study provided an objective measure of language proficiency that is relatively naturalistic, and more similar to self-ratings in a number of ways. Perhaps most notably, interview scores were likely influenced by a range of abilities including lexical retrieval ability, formulation of syntactic structures, perhaps knowledge of colloquial expressions, range of registers, accent, and other skills. In contrast, MINT scores reflect only the ability to retrieve picture names. As such, it might be expected that the interviews would be more strongly correlated with self-ratings which probably also are based on a wide range of abilities (i.e., it is unlikely that bilinguals consider only their ability to produce object names when providing a rating of their ability to speak each language). Moreover, in Experiment 1 both self-ratings and proficiency interview scores were based on the same scale and detailed descriptions of the skills associated with each scale level (see Appendix).
Indeed, self-ratings and interview scores did not differ from each other in determining degree of language dominance (see Figure 1 above), and both differed significantly from dominance classifications derived from naming tests (in both Experiment 1 and Experiment 2). However, Tables 4 and 7 do not confirm this expectation; instead, the correlations between self-ratings and interviews were often smaller than correlations between self-ratings and naming tests, and between interview-ratings and naming tests. Without the proficiency interviews, it might seem that self-ratings and naming tests do not produce perfect correlations because naming tests do not measure a variety of skills, and because the scale of measurement is not the same across these two measures. Instead, it seems that there may be some real differences in language dominance across different domains (Bedore et al., Reference Bedore, Peña, Summers, Boerger, Resendiz, Greene, Bohman and Gillam2011; Grosjean, Reference Grosjean2008) – and perhaps also some degree of true error – in self-ratings.
Can bilinguals tell which language is dominant, and if not why not?
The current findings begin to provide an answer to the question “Can bilinguals accurately tell which language is dominant?” The answer to this question appears to be yes to some degree – particularly if degree of dominance does not matter (see also Dunn & Fox Tree, Reference Dunn and Fox Tree2009). However, bilinguals may still perform relatively better on objective measures in the language they report is not dominant, particularly if measures were not designed for use with bilinguals (i.e., BNT). Moreover, the consequences of classification error will be so great in many circumstances that it would be very wise not to rely exclusively on self-report. Tables 5 and 8 illustrate an estimate of the percentage of bilinguals who seemed to have slightly or greatly misclassified their own language dominance in their self-ratings. Some of the misclassifications include cases of complete dominance reversals (i.e., saying one language is dominant but then performing better in the other language). These were observed in both Experiment 1 and Experiment 2, sometimes with very large discrepancies. Subtler differences were also found; it might be debated whether these qualify as true misclassifications, but they could nevertheless have important consequences both for conclusions drawn in clinical settings and for shaping models of bilingual language processing (more on this below).
Our method of classifying bilinguals into groups could be criticized. For example, our 5% cut-off point was anchored to the self-rating scores, and the fact that half a point of difference on the 10-point scale was the smallest distinction chosen by any of the bilinguals. This approach is somewhat arbitrary and not necessarily defensible in its application across measures. For example, in Experiment 2 we used only a seven-point scale and there too a half a point of difference was the smallest distinction used in self-ratings even though half a point corresponds to a greater percentage of difference on a seven-point than on a 10-point scale (which in turn implies that bilinguals’ ratings were influenced to some extent by the scale they were provided with and not exclusively by actual proficiency levels). Having acknowledged this limitation in our approach there are also reasons to believe that a 5% difference constitutes a reasonable cut-off point for misclassifications. For example, a 5% difference on the BNT corresponds to a standard deviation of monolinguals’ naming scores (see Table 3). In terms of cognitive assessment and also in terms of theoretical interpretation, a standard deviation would be considered a significant difference in many (if not most) cases.
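The scale dependence noted above (half a point on a seven-point versus a 10-point scale) can be made concrete with a small calculation. The sketch assumes, as the 5% figure for the 10-point scale implies, that a rating step is expressed as a percentage of the scale maximum; the function name is illustrative.

```python
def step_as_percent(step, scale_max):
    """Size of a rating step as a percentage of the scale maximum."""
    return step * 100.0 / scale_max


ten_point = step_as_percent(0.5, 10)            # Experiment 1 scale
seven_point = round(step_as_percent(0.5, 7), 1)  # Experiment 2 scale
print(ten_point, seven_point)  # 5.0 7.1
```

On this reckoning the smallest distinction bilinguals drew corresponds to roughly 7% on the seven-point scale, compared with 5% on the 10-point scale.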
The data reported here do not provide a definitive answer as to why some bilinguals seem to misclassify their language dominance but the participant characteristics tables (Tables 1 and 6) as well as the self-reported sub-group means (in Table 3) provide some clues. First, note several significant differences between subtypes in a range of self-report characteristics. Spanish-dominant bilinguals reported learning English at a later age, and using Spanish relatively more often both currently and when growing up, relative to both English-dominant and balanced bilinguals. In Experiment 1 self-reported balanced bilinguals also had significantly higher non-verbal reasoning scores (this skill was not measured in Experiment 2). Thus, one could speculate that people with higher intellectual ability might be more willing to give themselves a very high rating in both languages (even if such a rating is not warranted!). Looking at the subgroup means (Table 3), one might have expected that bilinguals immersed in a language that is not their self-reported dominant language could be more likely to underestimate the extent to which they have become dominant in the language dominant to the environment. This seemed to be the case for balanced bilinguals (both young and older in Experiments 1 and 2), who rated their abilities as equal in the two languages but then performed better in English on objective measures (proficiency interviews and naming tests). But the means in Table 3 tell a slightly different story especially for young Spanish-dominant bilinguals who underestimated their abilities in English only slightly, but seemed to overestimate their abilities in their dominant language (i.e., Spanish) to a larger extent. Similarly, English-dominant bilinguals (again especially young bilinguals in Experiment 1) seemed to overestimate their abilities in English. 
Thus, overestimation of abilities in the dominant language seems to be part of the reason why self-report and objective measures of dominance do not match perfectly. The presence of an effect in the same direction for Spanish-dominant and English-dominant bilinguals suggests a locus of discrepancy that is not specific to maintenance of a minority language (e.g., see Hakuta & D'Andrea, Reference Hakuta and D'Andrea1992, which presented evidence that positive attitude towards maintenance of Spanish proficiency in an English-dominant environment influences proficiency ratings).
The term “overestimation” is used here on the assumption that the objective measures capture an aspect of proficiency that should be included in an ideal measure of proficiency but that self-ratings somehow fail to capture. An alternative possibility is that the self-ratings are more accurate and the objective measures are all flawed, but even if so the correspondence between them is important given that objective measures must be used in testing situations (where the goal will often be to test in whichever language produces a better performance). There is also an assumption of proportional correspondence between measures in scales. As noted above, the extent to which this correspondence is justified could be debated. However, some degree of confidence in the correspondence can be drawn from the significant correlations between objective measures in these comparisons. Having noted these, it is also important to discuss some of the differences found between objective measures in the extent to which one language was dominant over the other (for the same bilinguals).
Limitations of the BNT for bilingual assessment
Particularly notable in this regard in the current study was the bias in favor of English on the BNT. For all bilinguals, the BNT seemed to underestimate Spanish proficiency, provided an inaccurate measure of the degree of bilingualism, and distorted language dominance classifications relative to all three other measures (including some complete reversals of dominance classification). For Spanish-dominant bilinguals, the BNT produced the largest proportion of completely reversed classifications of language dominance (see Table 5; i.e., 60% of bilinguals who said they are Spanish-dominant were actually able to name more pictures in English than in Spanish on the BNT). For self-rated balanced bilinguals and English-dominant bilinguals the BNT likely overestimates the extent to which English is dominant over Spanish. The BNT is likely inadequate for assessing bilingual language proficiency because it was not designed for use with bilinguals or with Spanish speakers (e.g., Allegri et al., Reference Allegri, Mangone, Fernandez Villavicencio, Rymberg, Taragano and Baumann1997; Gollan et al., Reference Gollan, Fennema-Notestine, Montoya and Jernigan2007; Kohnert et al., Reference Kohnert, Hernandez and Bates1998; Patricacou, Psallida, Pring & Dipper, Reference Patricacou, Psallida, Pring and Dipper2007; Weintraub et al., Reference Weintraub, Salmon, Mercaldo, Ferris, Graff-Radford, Chui, Cummings, DeCarli, Foster, Galasko, Peskind, Dietrich, Beekly, Kukull and Morris2009), and thus the items may be relatively more difficult in Spanish than in English (for discussion see de la Plata et al., Reference de la Plata, Vicioso, Hynan, Evans, Diaz-Arrastia, Lacritz and Cullum Munro2007, and Peña-Casanova et al. (Reference Peña-Casanova, Quiñones-Úbeda, Gramunt-Fombuena, Aguilar, Casas, Molinuevo, Robles, Rodríguez, Sagrario Barquero, Antúnez, Martínez-Parra, Frank-García, Fernández, Molano, Alfonso, Sol and Blesa2009, p. 350); the latter suggest that “more studies about the suitability of each item for assessment of naming ability in Spanish” are needed).
The BNT seemed to be an outlier in terms of both index scores and dominance classifications (see Figure 1 and Table 5). Nevertheless, performance on the two naming tests was highly correlated (see Table 4; the BNT was not used in Experiment 2 with older bilinguals). The correlations indicate that the extent to which the BNT is biased in favor of English (and against Spanish) is relatively uniform across subjects (the direction of difference between languages on the two tests is similar between individuals). Thus, although we caution against using the BNT to assess language dominance and degree of bilingualism, in other respects the BNT may provide a useful measure (e.g., for tracking changes in ability in each language over time, or for determining how bilinguals perform in English). Despite its potential flaws in this context, the BNT remains commonly used both in clinical settings and in experimental research with bilinguals (e.g., Gollan et al., Reference Gollan, Salmon, Montoya and da Pena2010; Rosselli et al., Reference Rosselli, Ardila, Araujo, Weekes, Caracciolo, Padilla and Ostrosky-Solis2000; Silverberg & Samuel, Reference Silverberg and Samuel2004); thus, it is important to qualify interpretation of scores with a detailed understanding of specifically how the test may distort bilingual language assessment.
Implications for research and clinical use
To facilitate future use of naming tests for these purposes, detailed information about which items on both tests were more difficult in Spanish than in English for different types of bilinguals can be found in Supplementary Materials. In addition, to provide a measure of difficulty level for each item in each language, these Supplementary Materials tables include two columns that show naming accuracy for bilinguals who were rated at the highest possible proficiency level in the OPI (a Superior rating; there were 11 young and two older bilinguals who received this score for English, and two young and three older bilinguals who received this score for Spanish). Finally, Table 9 provides mean (and SD) naming test scores for each language at each self-rated proficiency level. These means may be useful in clinical settings for asking more specific questions relating self-rated proficiency level to performance (e.g., given a rating of X on language Y, what is the range of normal performance?). Note that, with some exceptions where the n is small, means decrease with each rating level for both naming tests and in both languages, again validating self-ratings; however, the standard deviations also become larger as the means become smaller (scanning from the top to the bottom, where lower proficiency levels are represented). This suggests greater variability in performance, and reduced reliability of ratings, at lower proficiency levels. In addition, with few exceptions, standard deviations tend to be larger on the BNT than on the MINT, especially in Spanish; thus, for diagnostic purposes, the MINT may be more useful than the BNT.
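The clinical question posed above (given a rating of X on language Y, what is the range of normal performance?) can be illustrated with a small lookup. The following is a minimal sketch in Python, not a procedure used in the study. The `NORMS` table, the function names, and the numeric values are hypothetical placeholders (they are not the Table 9 values), and treating mean ± 2 SD as the expected range is an assumed convention.

```python
from typing import Dict, Optional, Tuple

# Hypothetical norms: (language, self-rating) -> (mean, SD) of naming scores.
# Placeholder numbers for illustration only; substitute the Table 9 entries.
NORMS: Dict[Tuple[str, int], Tuple[float, float]] = {
    ("English", 10): (60.0, 3.0),
    ("English", 7): (52.0, 6.0),
    ("Spanish", 10): (58.0, 4.0),
    ("Spanish", 7): (45.0, 8.0),
}


def expected_range(
    language: str,
    rating: int,
    k: float = 2.0,
    max_score: Optional[float] = None,
) -> Tuple[float, float]:
    """Return (low, high) = mean -/+ k * SD for a language and self-rating.

    k = 2.0 is an assumed cut-off. The lower bound is clipped at 0, and the
    upper bound is clipped at max_score when the test maximum is supplied.
    """
    mean, sd = NORMS[(language, rating)]
    low = max(0.0, mean - k * sd)
    high = mean + k * sd
    if max_score is not None:
        high = min(high, max_score)
    return low, high


def is_within_expected_range(
    score: float, language: str, rating: int, k: float = 2.0
) -> bool:
    """Check whether an observed naming score falls inside the expected range."""
    low, high = expected_range(language, rating, k=k)
    return low <= score <= high
```

Because standard deviations grow at lower self-rated proficiency levels, ranges computed this way widen for lower ratings, which is one way to see why ratings at those levels are less informative.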
Previous studies that claimed that bilinguals are not able to indicate which language is dominant may have drawn this conclusion because of limitations in the choice of measures used to evaluate self-ratings. As an example, in lieu of self-report, Dunn and Fox Tree (2009) developed and recommended the use of a language dominance scale which includes questions about each language for age of acquisition, extent to which bilinguals feel “comfortable” speaking, location of language use, language used for math, presence of foreign accent, schooling, language dominant to the environment, and questions about language loss (including loss of knowledge and forced choice of which language is more important). They reported that bilinguals who were classified as relatively balanced on this scale translated words more slowly than bilinguals with one clearly dominant language, thus demonstrating utility of their measure for predicting performance on an objective measure. In addition, they found no correlation between self-reported degree of language dominance and translation speed, and therefore concluded that self-ratings are not reliable. They also concluded that balanced bilinguals translate more slowly because they suffer from more interference between languages than unbalanced bilinguals.
The Bilingual Dominance Scale is compelling in many ways, and it would be interesting to see if it improved on self-ratings in classifying bilinguals into dominance groups. However, the analyses presented here reveal a number of problems with the interpretations offered therein. First, the way Dunn and Fox Tree (2009) assessed self-report as a predictor did not measure whether self-reported and objective classifications of language dominance match. Instead, their analyses asked whether dominance ratings predict translation times. The current data indicate that bilinguals can be fairly accurate in indicating which language is dominant, but are less able to assess the extent to which their knowledge of the two languages is balanced. The distinction between these is quite subtle but could nevertheless have tremendous significance in terms of the conclusions drawn. In particular, bilinguals are certainly not completely useless at indicating which language is dominant; the data in Figure 1 suggest that bilinguals’ self-ratings of degree of language dominance align quite well with those determined by proficiency interviewers. Bilinguals do not exclusively imagine themselves translating single words or naming pictures when they provide self-ratings of proficiency. Thus, the measure used to assess accuracy of self-ratings is of critical importance.
To illustrate, the same balanced bilinguals who translated more slowly than other bilinguals at the single-word level in Dunn and Fox Tree (2009) also translated with fewer hesitations (ums and uhs) and elongations than less balanced bilinguals when given a more difficult task (translation of sentences). In this second task, no analysis was reported to assess whether self-ratings were correlated with translation fluency (presumably because by that point the authors had already abandoned self-ratings as a flawed measure given results of analyses of the single-word task). However, a closer look at the methods and results reveals that items in the single-word translation task apparently included words with multiple translations, and more proficient bilinguals might therefore have been slower to translate because they were choosing between multiple alternative possible translations (with balanced bilinguals having “difficulty choosing the most accurate translation”; Dunn & Fox Tree, 2009, p. 282). If so, the finding that balanced bilinguals translated single words more slowly could have nothing to do with interference between languages, and might instead reflect greater proficiency and a need to select the best translation within a single target language (an issue completely orthogonal to the possibility of between-language interference).
To conclude, bilinguals are largely good at reporting which of their two languages is dominant, but the extent of difference between languages can vary with domain (and with different measures), and some bilinguals completely miss the mark; thus, sole reliance on self-report is not advised. Although we did not set out to compare young and older bilinguals, the data we presented also appear to be largely comparable across age groups. In cases where bilinguals perform relatively better in the language they report as non-dominant, this may occur because their level of ability is better in some domains in their otherwise less-dominant language, because the test is biased towards their non-dominant language, because dominance varies with domain (Bedore et al., 2011), or for other reasons (e.g., overestimating ability in the dominant language). In clinical settings, bilinguals who report balanced ability in both languages should be questioned, and it should not be assumed that they could be tested in either language. English-dominant bilinguals can be tested in English, but should not be expected to perform like monolinguals.
Although we have focused here largely on measurement of bilingual language proficiency and the accuracy of self-report measures, it is important to consider the possibly far-reaching implications of the results reported here for developing theoretical models of bilingual language processing. There has been some focus recently on whether a non-dominant language can influence processing in a dominant language, both in research on visual word recognition (van Assche, Duyck, Hartsuiker & Diependaele, 2009; van Hell & Dijkstra, 2002), and in research on language production and verbal fluency (e.g., Costa et al., 2000; Ivanova & Costa, 2008; Sandoval, Gollan, Ferreira & Salmon, 2010). In such investigations, it would be wise to establish dominance using objective measures rather than relying on self-report. In addition, such assessment should be reported for each individual included in the analysis rather than for the group as a whole. For example, in Experiment 1 the overall means suggest English dominance in the group as a whole; however, 40% of these bilinguals are classified as Spanish-dominant by objective measures. In looking for effects of a non-dominant language on a dominant language, it is extremely important to exclude participants who might be incorrectly self-classifying their dominance. Bilinguals with a relatively balanced profile should also be excluded from analysis to allow strong conclusions to be drawn. Similar approaches should be taken in studies that wish to distinguish between balanced and unbalanced bilinguals. Self-report measures seemed to be least accurate for this type of classification.
Future attempts to draw theoretical conclusions about the effects of language dominance, or balanced versus unbalanced bilingualism, should take into consideration the limitations of both self-report and objective measures, and temper conclusions accordingly, while also taking extra steps to ensure that misclassifications are very unlikely.
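The per-individual screening recommended above (classify each participant by an objective measure, then exclude those whose profile is balanced or whose self-report disagrees) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the classification procedure used in the study: the `Participant` fields, the use of proportion-correct naming scores, and the `balance_margin` threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Participant:
    pid: str
    self_reported: str    # "English", "Spanish", or "balanced"
    english_score: float  # proportion correct on an objective test (assumed)
    spanish_score: float  # proportion correct on the same test in Spanish


def objective_dominance(p: Participant, balance_margin: float = 0.05) -> str:
    """Classify dominance from the score difference between languages.

    balance_margin is a hypothetical threshold: differences no larger than
    this are treated as a balanced profile.
    """
    diff = p.english_score - p.spanish_score
    if abs(diff) <= balance_margin:
        return "balanced"
    return "English" if diff > 0 else "Spanish"


def select_for_analysis(
    participants: List[Participant], balance_margin: float = 0.05
) -> List[Participant]:
    """Keep only participants whose objective classification is unbalanced
    and agrees with their self-reported dominant language."""
    kept = []
    for p in participants:
        objective = objective_dominance(p, balance_margin)
        if objective != "balanced" and objective == p.self_reported:
            kept.append(p)
    return kept
```

Classifying each individual, rather than comparing group means, is the point of the sketch: a group can look English-dominant on average while containing a substantial subset of Spanish-dominant individuals.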
Appendix. Language proficiency scale for self-ratings in Experiment 1, and by oral proficiency interviewers in Experiments 1 and 2
1 = Novice Low = No real functional ability. Given lots of time and cues may be able to exchange greetings, give identity and name a number of familiar objects. Cannot participate in a true conversational exchange.
2 = Novice Middle = Can communicate only very minimally and with great difficulty using a number of isolated words and memorized phrases.
3 = Novice High = Can communicate with some success about simple topics only. Heavy reliance on memorized phrases, or on words provided by person speaking with. Speaks in short or incomplete sentences, and frequent miscommunications occur.
4 = Intermediate Low = Can successfully handle a limited number of uncomplicated communicative tasks by combining and recombining into short statements what they know and what the person speaking with says.
5 = Intermediate Middle = Can successfully handle a variety of uncomplicated communicative tasks about simple topics (food, travel, family, daily activities and personal preferences). Speaks in full sentences and even with some strings of sentences.
6 = Intermediate High = Can successfully handle many uncomplicated tasks and social situations requiring an exchange of basic information related to work, school, recreation, particular interests and areas of competence. Some hesitation, errors, and gaps in communication may still occur.
7 = Advanced Low = Can participate actively in most informal and a limited number of formal conversations on activities related to school, home, and leisure activities and, to a lesser degree, those related to events of work, current, public, and personal interest or individual relevance. Can rarely function at the level of formal or professional language, and cannot speak at a professional level for an extended period of time.
8 = Advanced Middle = Can handle with ease and confidence a large number of communicative tasks such as informal and some formal exchanges on a variety of concrete topics relating to work, school, home, and leisure activities, as well as to events of current, public, and personal interest or individual relevance. Can sometimes function at a formal or professional level of language but not consistently and not with a broad range of topics.
9 = Advanced High = Can participate fully and effectively in conversations on a variety of topics in formal and informal settings from both concrete and abstract perspectives. Can speak at a formal or professional level of language usually without difficulty. When speaking at a formal or professional level some patterns of errors may still appear but these do not interfere with communication.
10 = Superior = Speaks like a highly educated native speaker. Can participate fully and effectively in conversations on a variety of topics in formal and informal settings from both concrete and abstract perspectives with accuracy and fluency using formal and professional quality language. Occasional errors may still occur but these do not interfere with communication.