
Another bilingual advantage? Perception of talker-voice information

Published online by Cambridge University Press:  09 June 2017

SUSANNAH V. LEVI*
Affiliation: New York University
*Address for correspondence: Susannah Levi, New York University, Department of Communicative Sciences and Disorders, 665 Broadway, 9th floor, New York, NY 10012, svlevi@nyu.edu

Abstract

A bilingual advantage has been found in both cognitive and social tasks. In the current study, we examine whether there is a bilingual advantage in how children process information about who is talking (talker-voice information). Younger and older groups of monolingual and bilingual children completed the following talker-voice tasks with bilingual speakers: a discrimination task in English and German (an unfamiliar language), and a talker-voice learning task in which they learned to identify the voices of three unfamiliar speakers in English. Results revealed effects of age and bilingual status. Across the tasks, older children performed better than younger children and bilingual children performed better than monolingual children. Improved talker-voice processing by the bilingual children suggests that a bilingual advantage exists in a social aspect of speech perception, where the focus is not on processing the linguistic information in the signal, but instead on processing information about who is talking.

Type
Research Article
Copyright
Copyright © Cambridge University Press 2017 

Introduction

Speech is a complex acoustic signal that simultaneously carries information about what is being said (linguistic information) and about who is saying it (talker information). While it is possible to process these two dimensions separately, numerous studies have shown that they interact during speech perception (e.g., Cutler, Andics & Fang, Reference Cutler, Andics, Fang, Lee and Zee2011; Green, Tomiak & Kuhl, Reference Green, Tomiak and Kuhl1997; Mullennix & Pisoni, Reference Mullennix and Pisoni1990), underscoring that how people process talker information bears directly on larger questions about spoken language processing. In this study, we test whether being bilingual is an advantage when processing information about who is talking. In particular, we investigate the perception of talker-voice information (who is talking) in both a familiar language (English) and an unfamiliar language (German) to see whether a bilingual advantage exists and whether it extends to unfamiliar languages. By including talker-voice perception in an unfamiliar language, we are able to rule out the possibility that a bilingual advantage in English, a familiar language, is due to exposure to a wider variety of speech samples in English (presumably both native and foreign-accented English).

The majority of people in the world grow up in multilingual environments, whether that means they speak more than one language or are simply exposed to speakers of other languages (Hamers & Blanc, Reference Hamers and Blanc2000). In recent years, interest has grown in how bilingualism or exposure to multiple languages affects cognitive and social processing. Most studies of a bilingual advantage have focused on executive functioning and cognitive control, while a few have examined the social benefits of bilingualism.

In terms of executive function, several studies have found a bilingual advantage where bilinguals respond faster and are better at inhibiting irrelevant information or shifting to a new task. This advantage has been found in children (Bialystok, Reference Bialystok1999; Bialystok & Martin, Reference Bialystok and Martin2004; Carlson & Meltzoff, Reference Carlson and Meltzoff2008), older adults (Bialystok, Craik, Klein & Viswanathan, Reference Bialystok, Craik, Klein and Viswanathan2004; Gold, Kim, Johnson, Kryscio & Smith, Reference Gold, Kim, Johnson, Kryscio and Smith2013), and young adults who are at their peak of executive function / cognitive control ability (Costa, Hernández & Sebastián-Gallés, Reference Costa, Hernández and Sebastián-Gallés2008; Friesen, Latman, Calvo & Bialystok, Reference Friesen, Latman, Calvo and Bialystok2015). Bilingualism may even act as a buffer against other factors that are known to negatively affect executive function skills, such as lower socio-economic status (SES) (Carlson & Meltzoff, Reference Carlson and Meltzoff2008) or normal aging (Bialystok et al., Reference Bialystok, Craik, Klein and Viswanathan2004; Gold et al., Reference Gold, Kim, Johnson, Kryscio and Smith2013). More recently, Incera and McLennan (Reference Incera and McLennan2015) used a mouse tracker to examine online processing in a Stroop task (Stroop, Reference Stroop1935) and found that bilinguals take longer to initiate their response, but then move faster to their target, a pattern that the authors argue matches the performance of experts in other domains (e.g., baseball players who take longer to initiate a swing of the bat but then swing faster). The benefits of bilingualism on executive function have not gone without criticism, however. Several articles argue against the bilingual advantage (de Bruin, Treccani & Della Sala, Reference de Bruin, Treccani and Della Sala2015; Paap & Greenberg, Reference Paap and Greenberg2013; Paap, Johnson & Sawi, Reference Paap, Johnson and Sawi2014; Paap & Sawi, Reference Paap and Sawi2014; Paap, Sawi, Dalibar, Darrow & Johnson, Reference Paap, Sawi, Dalibar, Darrow and Johnson2014), but see (Ansaldo, Ghazi-Saidi & Adrover-Roig, Reference Ansaldo, Ghazi-Saidi and Adrover-Roig2015; Arredondo, Hu, Satterfield & Kovelman, Reference Arredondo, Hu, Satterfield and Kovelman2015; Saidi & Ansaldo, Reference Saidi and Ansaldo2015) for a rebuttal.

In addition to studies showing a bilingual advantage in cognitive control, others have found a bilingual advantage in social perception. Greenberg, Bellana, and Bialystok (Reference Greenberg, Bellana and Bialystok2013) asked school-age children to shift their perspective to that of another person (a cartoon owl) in a visual task. Children saw a cartoon owl looking at four blocks and were asked to identify what the blocks looked like from the perspective of the owl, who differed in orientation from the child by 90°, 180°, or 270°. Bilingual children were significantly better at selecting the perspective of the other person. In a similar study using a visual perspective-taking task, Fan, Liberman, Keysar, and Kinzler (Reference Fan, Liberman, Keysar and Kinzler2015) tested whether being exposed to other languages in the environment – but not actually being bilingual – was sufficient to improve performance. They found that even children who were not bilingual but who were exposed to other languages in their community were better than monolinguals at interpreting the visual perspective of another person. Furthermore, they found no difference in performance between bilinguals and those children who were exposed to other languages in the ambient environment, suggesting that exposure is sufficient to generate a social advantage in visual perception.

Processing who is talking is an important social component of communication and begins to develop even before birth. Differentiating between the mother's voice and an unfamiliar female has been found in fetuses (Kisilevsky, Hains, Lee, Xie, Huang, Ye, Zhang & Wang, Reference Kisilevsky, Hains, Lee, Xie, Huang, Ye, Zhang and Wang2003), newborns (DeCasper & Fifer, Reference DeCasper and Fifer1980), and infants in their first year of life (Purhonen, Kilpeläinen-Lees, Valkonen-Korhonen, Karhu & Lehtonen, Reference Purhonen, Kilpeläinen-Lees, Valkonen-Korhonen, Karhu and Lehtonen2004, Reference Purhonen, Kilpeläinen-Lees, Valkonen-Korhonen, Karhu and Lehtonen2005). In the first year, infants also begin to differentiate between unfamiliar voices (DeCasper & Prescott, Reference DeCasper and Prescott1984; Johnson, Westrek, Nazzi & Cutler, Reference Johnson, Westrek, Nazzi and Cutler2011). The ability to identify and discriminate voices continues to develop through elementary school and adolescence (Bartholomeus, Reference Bartholomeus1973; Creel & Jimenez, Reference Creel and Jimenez2012; Levi & Schwartz, Reference Levi and Schwartz2013; Mann, Diamond & Carey, Reference Mann, Diamond and Carey1979; Moher, Feigenson & Halberda, Reference Moher, Feigenson and Halberda2010; Spence, Rollins & Jerger, Reference Spence, Rollins and Jerger2002).

The close ties between linguistic processing and talker processing have led to studies exploring what determines performance on a talker-voice processing task. Because a salient feature of talker identity is a speaker's fundamental frequency (vocal pitch), Xie and Myers (Reference Xie and Myers2015) tested talker identification in native-English listeners with and without musical training. They found that the musicians performed better than non-musicians in an unfamiliar language (Spanish), suggesting shared resources between linguistic and non-linguistic perception. In another study, Krizman, Marian, Shook, Skoe, and Kraus (Reference Krizman, Marian, Shook, Skoe and Kraus2012) found that bilinguals have more robust pitch processing than monolinguals.

Several recent studies have examined how bilinguals and people exposed to a second language process talker information. Bregman and Creel (Reference Bregman and Creel2014) tested how age of acquisition of a second language affects learning to identify talkers in both the first and second language. They trained listeners to criterion over several learning sessions. Listeners heard sentences produced by both English and Korean talkers. They found that listeners learn talkers faster in their L1, similar to previous studies that have found that listeners are better at processing talker information in their native language (Goggin, Thompson, Strube & Simental, Reference Goggin, Thompson, Strube and Simental1991; Goldstein, Knight, Bailis & Conover, Reference Goldstein, Knight, Bailis and Conover1981; Hollien, Majewski & Doherty, Reference Hollien, Majewski and Doherty1982; Köster & Schiller, Reference Köster and Schiller1997; Levi & Schwartz, Reference Levi and Schwartz2013; Perrachione, Del Tufo & Gabrieli, Reference Perrachione, Del Tufo and Gabrieli2011; Perrachione, Dougherty, McLaughlin & Lember, Reference Perrachione, Dougherty, McLaughlin and Lember2015; Perrachione & Wong, Reference Perrachione and Wong2007; Schiller & Köster, Reference Schiller and Köster1996; Thompson, Reference Thompson1987; Winters, Levi & Pisoni, Reference Winters, Levi and Pisoni2008; Zarate, Tian, Woods & Poeppel, Reference Zarate, Tian, Woods and Poeppel2015), even when the stimuli are time-reversed (Fleming, Giordano, Caldara & Belin, Reference Fleming, Giordano, Caldara and Belin2014). In addition, Bregman and Creel (Reference Bregman and Creel2014) found that bilinguals who had learned the L2 early were faster at learning talkers in the L2 than were bilinguals who had learned the L2 later in life. They argue that the better performance by early bilinguals suggests a close link between speech representations and talker representations.

In another study, Orena, Theodore, and Polka (Reference Orena, Theodore and Polka2015) tested how adult listeners learned to identify talkers. They compared English monolinguals living in Montréal, who thus had extensive exposure to French, with English monolinguals living in Connecticut, who did not have frequent exposure to French. Although neither group spoke French fluently, those who had extensive exposure to French were better at learning to identify voices in that language.

Xie and Myers (Reference Xie and Myers2015) not only examined the effect of musical training on the perception of talker information (see above), but also compared the performance of adult listeners who were monolingual native speakers of English with that of Mandarin–English bilinguals. In contrast to Bregman and Creel (Reference Bregman and Creel2014) and Orena et al. (Reference Orena, Theodore and Polka2015), Xie and Myers did not find a difference in talker processing between these two groups of listeners. We return to this point in the discussion.

In the current study, we expand on these data by testing whether a bilingual advantage in processing talker information is present in children. The current study is a strong test of the benefits of bilingualism because it looks for differences in both a language familiar to all participants (English) and a language that is unfamiliar to all participants (German). The talkers used here are German(L1)–English(L2) bilinguals and thus produce English with a foreign accent. Bilingual speakers – rather than two different sets of monolingual speakers – were used to minimize differences in discriminability across the two languages. If a bilingual advantage is present even in an unfamiliar language (German), it would indicate that the ability is not limited to experience listening to foreign-accented talkers in the dominant language. We also extend the previous findings by testing talker processing across different tasks: discrimination, learning, and identification of talkers. An additional extension of the current study is that it tests children rather than adults. This allows us to examine the development of talker processing and to better control the language exposure of the monolingual group. For example, in Xie and Myers (Reference Xie and Myers2015), a large portion of the monolingual English-speaking, college-attending listeners had studied Spanish, the unfamiliar language used in that study, and no information was provided about other languages those listeners had studied. We return to this point in the discussion.

We expect bilinguals to perform better than monolinguals for a variety of reasons. First, bilinguals are more likely to have heard the same person speaking different languages, so they have experience perceiving the same voice in multiple languages (see footnote 1). Second, knowing who is talking is relevant for interpreting both linguistic and social information, as who is talking may affect how to react in a particular social setting, such as weighing how much trust to put into someone's statement. Given that previous studies have found a bilingual advantage in other tasks tapping social processing (see above), we expect that bilinguals will be better at processing information about who is talking. Third, even though the bilinguals in the current study do not have experience listening to the particular type of foreign-accented speech used here (German L1 talkers), they are likely to have more experience listening to foreign-accented speech than do monolingual children. Finally, given that previous research has found that bilinguals have more robust pitch perception and that pitch perception is an important component of processing talker-voice information, we expect bilinguals to perform better than monolinguals.

Methods

Participants

Forty-one children (22 monolinguals, 19 bilinguals) were selected from a dataset of 97 subjects who had participated in a larger study. All children were attending school in New York City with English as the primary language of instruction. The 97 children all started a multiple-session study (7–12 sessions) that included the experimental tasks described below, as well as other experimental tasks which will not be described. In addition to the experimental tasks, children were given subtests from the Clinical Evaluation of Language Fundamentals-4 (CELF) (Semel, Wiig & Secord, Reference Semel, Wiig and Secord2003) and were also given the Test Of Nonverbal Intelligence-3 (TONI) (Brown, Sherbenou & Johnsen, Reference Brown, Sherbenou and Johnsen1997). The CELF-4 subtests generate a composite “Core Language” score that is normed with a mean of 100 and a standard deviation of 15. The TONI is a test of nonverbal cognition using black and white line drawings and tests problem solving and pattern matching. It is also normed to 100 with a standard deviation of 15. Parents were asked to fill out a questionnaire that included five questions about the child's language use ((i) What language(s) does your child speak?, (ii) What language does your child use most?, (iii) What language does your child use best?, (iv) What language does your child prefer?, (v) What language does your child understand best?), questions about the language(s) spoken by other family members living with the child, and questions about the languages spoken by other caregivers (e.g., nanny, daycare provider).

First, children from the larger subject pool were categorized as monolingual, bilingual, or other as follows: Monolingual children were those for whom the parent report indicated only “English” for the five language questions listed above and indicated that there was no caregiver or family member living with the child who spoke another language (n=52). Children were categorized as bilingual if there was a family member living with the child who spoke another language or if the parent report indicated an additional language for any of the language questions listed above (n=40). Thus, the bilingual group includes both children who speak/understand another language and children who are exposed to another language on a daily basis. This latter set of children who are merely exposed to another language is similar to the groups used in the social processing (Fan et al., Reference Fan, Liberman, Keysar and Kinzler2015) and talker processing tasks (Orena et al., Reference Orena, Theodore and Polka2015) discussed above. The remaining five children were monolingual, but had an additional caregiver not living with the child who spoke another language, or had incomplete language background questionnaires. These five children were excluded at this point. None of the participants in the monolingual or bilingual groups had any knowledge of German (the second language used in the perception task).

The monolingual and bilingual groups were then divided into two age groups: a younger group (nine years and younger) and an older group (10 years and older). In addition, children were only included if they had completed the CELF-4 and TONI-3 and scored 85 or above on these tests (i.e., no more than one standard deviation below the mean). Children were eliminated if they failed the hearing screening or if the parent report indicated (i) a failed hearing screening, (ii) a diagnosis of ADHD, or (iii) a diagnosis of Autism/Asperger's. These exclusion criteria ensured that children who were included in the study had language and nonverbal skills in the typical range for children their age. These additional subject criteria resulted in the following four groups of children: Younger monolingual (n=12), Older monolingual (n=10), Younger bilingual (n=11), Older bilingual (n=8). The children in the bilingual groups were exposed to the following languages: Spanish (n=11), Russian (n=2), Hindi (n=1), Greek (n=1), Italian (n=1), Tagalog (n=1), Japanese (n=1), Fulani (n=1). Two of the Spanish-speaking children also had exposure to an additional language (Catalan and French). None of these languages are tone languages.

In addition to age, language, and nonverbal IQ measures, we also collected information about socioeconomic status (SES). We used mother's education level as a proxy for SES (Ensminger & Fothergill, Reference Ensminger, Fothergill, Bornstein and Bradley2003), and divided education into an 8-point scale following Buac and Kaushanskaya (Reference Buac and Kaushanskaya2014) as follows: 1, less than high school; 2, high school; 3, one year of college; 4, two years of college / associate's degree; 5, three years of college; 6, four years of college; 7, master's degree; 8, Ph.D., J.D., M.D. The mean SES for the monolingual group corresponds to an average of three years of college, and the mean for the bilingual group corresponds to between two and three years of college. Information about the participant groups is in Table 1.

Table 1. Demographic data for the four groups.

Three separate two-way ANOVAs were run with Group (monolingual, bilingual) and Age (younger, older) as between-subjects factors for the Core Language composite of the CELF-4, the TONI-3 score (scaled for age), and SES. No main effects or interactions were significant. See appendix A for the full ANOVA results.

Stimuli

Three female bilingual German(L1)–English(L2) speakers were recorded producing 360 monosyllabic CVC words in both English and German (see Winters et al., Reference Winters, Levi and Pisoni2008 for details of the recording methods). The three bilingual speakers used in the current study were highly intelligible and selected from a larger group of bilinguals (Levi, Winters & Pisoni, Reference Levi, Winters and Pisoni2011) – see Table 2. Additionally, the three speakers were selected as having relatively different average fundamental frequency across productions. Fundamental frequency, whose perceptual correlate is vocal pitch, is a salient feature used to distinguish speakers (Van Dommelen, Reference Van Dommelen1987). The current study is designed to use the same talkers in the English (familiar) and German (unfamiliar) conditions to control for the fact that some speakers sound more or less similar to each other and to limit paralinguistic (e.g., acoustic) differences across the talkers. Thus, in the current study, the talkers speak English with a slight foreign accent (Levi et al., Reference Levi, Winters and Pisoni2011).

Table 2. Speaker information.

Note: Intelligibility and F0 information for the speakers used in the current study. "Intelligibility" refers to the average percentage of words correctly identified in the clear (Levi, Winters & Pisoni, Reference Levi, Winters and Pisoni2011); for reference, average intelligibility for five native speakers in the same task ranged from 74.6% to 92.1%. "Foreign-accent Rating" is the average raw rating on a 7-point scale ranging from 0–6 (Levi, Winters & Pisoni, Reference Levi, Winters and Pisoni2007); for reference, ratings for five native speakers in the same task ranged from 1.1–1.5. "F0" is the average fundamental frequency of the vowels for the 360 English and German words.

Procedure

Participants completed three experimental tasks: two separate self-paced talker discrimination experiments in English and German on the first day of the study and a five-day talker-voice learning task which was completed on later days of the experiment. All experiments were conducted on a Panasonic Toughbook CF-52 laptop with a touch screen running Windows XP. All experiments were created with E-Prime 2.0 Professional (Schneider, Eschman & Zuccolotto, Reference Schneider, Eschman and Zuccolotto2007). Children sat at a desk or table facing the computer screen. Stimuli were presented binaurally over Sennheiser HD-280 circumaural headphones. Children were tested in a quiet room either at school, in their home, or in the Department of Communicative Sciences and Disorders at New York University.

For the two talker discrimination tasks, children were randomly assigned to complete the English talker discrimination task either first or second. The English talker discrimination task was completed first for 7/12 of the younger monolingual group, 6/10 of the older monolingual group, 5/11 of the younger bilingual group, and 3/8 of the older bilingual group. Children were informed whether they would be listening to English or a language other than English, but were not told what the other language would be. Participants heard pairs of distinct words separated by a 750 ms inter-stimulus interval. Participants were asked to determine whether the words were spoken by the same talker or by two different talkers. In all trials, different lexical items were used, requiring listeners to process the talker information at an abstract level (e.g., Talker 1 book, Talker 2 room; Talker 1 cat, Talker 1 name). Participants listened to one practice block with 12 trials followed by two experimental blocks of 24 trials. Half of the trials were same-talker pairs and half were different-talker pairs.

Children also completed a talker-voice learning task in English. For this task, children completed five days of talker-voice training in which they learned to identify the voices of three unfamiliar talkers (the same talkers used in the discrimination task), represented as cartoon-like characters on the computer screen. Each day of talker-voice learning consisted of two sessions with feedback (“training”) and one test session without feedback (“test”). Children were instructed that they would hear a single English word and have to decide which of three characters produced the word by tapping the screen (Figure 1). During each session, children first had a familiarization phase in which they heard the same four words produced by all three talkers twice. Only the image of the actual character/talker appeared on the screen during the familiarization phase. After familiarization, children completed 30 trials (10 words produced by all three talkers).

Figure 1. Response screen for Talker Learning. Figure from Levi (Reference Levi2015). Reprinted with permission.

During this phase, children heard a word and had to select which character had spoken the word. In the two feedback sessions, after their response, children received two forms of feedback: first they were shown a smiley or frowny face to indicate their accuracy and then they heard the word again while the image of the correct character/talker appeared on the screen. An outline of a single trial is provided in Figure 2. Each day children completed the exact same feedback session twice and then completed the test session with different lexical items. Test sessions had the same format as the training sessions except no feedback was provided. Thus, over the course of talker-voice learning, children were presented with 104 distinct lexical items produced by all three talkers as follows: same 4 lexical items during familiarization, 10 lexical items x 5 days for the feedback sessions, and 10 lexical items x 5 days for the test sessions without feedback.

Figure 2. Sample trial procedure during the five days of training. Figure from Levi (Reference Levi2015). Reprinted with permission.

Results

Talker discrimination in English and German

To assess performance on the two talker discrimination tasks, a logit mixed-effects model (Baayen, Reference Baayen2008; Baayen, Davidson & Bates, Reference Baayen, Davidson and Bates2008; Jaeger, Reference Jaeger2008) was fit to the English and German data using the glmer() function with the binomial response family (Bates, Maechler, Bolker & Walker, Reference Bates, Maechler, Bolker and Walker2010) in R (http://www.r-project.org/). The built-in p-value calculations from the glmer() function were used. A model was fit to the data with fixed effects for Group (monolingual, bilingual), Language (English, German), and Age (in months, scaled), along with random intercepts by Subject and random slopes for Language by Subject.
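
To make the model specification concrete, the following is a minimal sketch of how such a model could be fit with the lme4 package in R. It is not the authors' analysis script; the data frame name (disc) and column names (correct, group, language, age_months, subject) are hypothetical placeholders for long-format, trial-level data.

library(lme4)

# Scale age in months, as described in the text
disc$age_scaled <- as.numeric(scale(disc$age_months))

# Logit mixed-effects model: fixed effects for Group, Language, and scaled Age
# (with their interactions), random intercepts by Subject, and random slopes
# for Language by Subject
m_disc <- glmer(correct ~ group * language * age_scaled + (1 + language | subject),
                data = disc, family = binomial)

summary(m_disc)  # Wald z-tests and p-values as reported by glmer()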

The model revealed a significant effect of Group (estimate = −.364, standard error = .165, z-value = −2.20, p = .027), of Age (estimate = .322, standard error = .130, z-value = 2.47, p = .013), and of Language (estimate = −.290, standard error = .113, z-value = −2.54, p = .010), illustrated in Figure 3. None of the interactions reached significance (all p > .59). These main effects indicate that overall, bilinguals performed better than monolinguals, that older children performed better than younger children, and that children performed better in the English condition than in the German condition. The full table of results can be found in appendix B.

Figure 3. Proportion correct in the English (black lines) and German (grey lines) discrimination tasks. Error bars represent standard error.

Because a primary question is whether the bilingual group would perform better than the monolingual group in the German condition, where none of the children have prior exposure to German, we constructed separate models for the two language conditions with Group and Age as fixed effects and with Subject as a random effect. For the German AX task, the model revealed a significant effect of Group (estimate = −.378, standard error = .179, z-value = −2.11, p = .034) and of Age (estimate = .338, standard error = .142, z-value = 2.68, p = .007). Similar results were found for the English AX task, where the model revealed a significant effect of Group (estimate = −.361, standard error = .162, z-value = −2.22, p = .026) and of Age (estimate = .315, standard error = .128, z-value = 2.46, p = .013).
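
As a concrete illustration, these follow-up analyses could be run as separate models within each language condition, continuing the hypothetical sketch above (again, not the authors' actual script):

# Group and scaled Age as fixed effects, random intercepts by Subject, within each language
m_german  <- glmer(correct ~ group + age_scaled + (1 | subject),
                   data = subset(disc, language == "German"), family = binomial)
m_english <- glmer(correct ~ group + age_scaled + (1 | subject),
                   data = subset(disc, language == "English"), family = binomial)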

Talker-voice learning

Two different analyses were conducted on the talker-voice learning data. First, we examined the number of days it took for children to reach 70% correct talker-voice identification in Training sessions (averaging performance for the two feedback sessions completed per day) and in the Test sessions where listeners received no feedback. This criterion was used because 70% has been used in other studies of voice learning to differentiate good learners from poor learners (Levi et al., Reference Levi, Winters and Pisoni2011; Nygaard & Pisoni, Reference Nygaard and Pisoni1998). Because a single data point is used per subject (number of days to criterion), a simple two-way ANOVA with Group (monolingual, bilingual) and Age (younger, older) as between-subjects factors was conducted on these data. The ANOVA for the number of days to criterion for training sessions with feedback revealed a significant main effect of Group (F(1,37) = 5.16, p = .028, ηp² = .12). Neither the main effect of Age (F(1,37) = 1.71, p = .198) nor the interaction (F(1,37) = 0.57, p = .45) was significant. As seen in Figure 4, bilingual children took fewer days to reach the 70% criterion than monolingual children.
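
A minimal sketch of this analysis in R is given below. The data frame (crit) and its columns (days_to_criterion, group, age_group) are hypothetical placeholders with one row per child; this is not the authors' script.

# Two-way between-subjects ANOVA on days to reach the 70% criterion
m_days <- aov(days_to_criterion ~ group * age_group, data = crit)
summary(m_days)

# Partial eta squared for the Group effect, computed from the ANOVA table
tab <- summary(m_days)[[1]]
rownames(tab) <- trimws(rownames(tab))
eta_p2_group <- tab["group", "Sum Sq"] / (tab["group", "Sum Sq"] + tab["Residuals", "Sum Sq"])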

Figure 4. Average number of days to reach criterion (70% accuracy) in the training phase (with feedback). Error bars represent standard error.

Similarly, the ANOVA for the number of days to reach criterion on the test condition (no feedback) revealed a main effect of Group (F(1,37) = 4.34, p = .044, ηp² = .105). Neither the main effect of Age (F(1,37) = 1.22, p = .27) nor the interaction (F(1,37) = 0.03, p = .85) was significant. As in the training sessions, bilingual children took fewer days to reach the 70% criterion than the monolingual children (see Figure 5).

Figure 5. Average number of days to reach criterion (70% accuracy) in the test phase (no feedback). Error bars represent standard error.

Second, separate models were fit to the Training data (with feedback) and the Test data (no feedback) with fixed effects for Group (monolingual, bilingual), Age (in months, scaled), Day (1–5), and the Group by Age interaction, along with random intercepts by Subject and random slopes for Day by Subject. For the Training data, a fixed effect for Session (1, 2) was also included.
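
Continuing the hypothetical R sketch from the discrimination analysis (placeholder data frames train and test with columns correct, group, age_months, day, session, subject; not the authors' script), these models could be specified as follows:

library(lme4)

train$age_scaled <- as.numeric(scale(train$age_months))
test$age_scaled  <- as.numeric(scale(test$age_months))

# Training sessions (with feedback): Session is an additional fixed effect
m_train <- glmer(correct ~ group + age_scaled + day + session + group:age_scaled +
                   (1 + day | subject),
                 data = train, family = binomial)

# Test sessions (no feedback): same structure without Session
m_test <- glmer(correct ~ group + age_scaled + day + group:age_scaled +
                  (1 + day | subject),
                data = test, family = binomial)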

For the Training data, the model revealed a significant effect of Day (estimate = .322, standard error = .035, z-value = 9.15, p < .001) and a nearly significant effect of Group (estimate = −.325, standard error = .166, z-value = −1.95, p = .051), illustrated in Figure 6. None of the remaining main effects or the interaction reached significance (all p > .13). The full table of results can be found in Appendix B. These main effects indicate that, overall, bilinguals performed better than monolinguals and that children performed better with additional days of training. The effect of Day is clearly visible in Figure 6, where talker identification accuracy gradually improves across the five days of training. The effect of Group is also visible, where the bilinguals (dashed lines) are more accurate than the monolinguals (solid lines).

Figure 6. Proportion correct on talker identification during the five days of training (with feedback). Error bars represent standard error.

For the Test data, the model revealed a significant effect of Day (estimate = .244, standard error = .045, z-value = 5.36, p < .001) and of Age (estimate = .329, standard error = .151, z-value = 2.17, p = .029). None of the remaining main effects or the interaction reached significance (all p > .09). The full table of results can be found in Appendix B. The effect of Day is clearly visible in Figure 7, where talker identification accuracy gradually improves across the five days of training. As has been found in previous studies, both adults (Winters et al., Reference Winters, Levi and Pisoni2008) and children (Levi, Reference Levi2014) perform significantly worse on talker identification when no feedback is provided, ultimately reducing the range of performance.

Figure 7. Proportion correct on talker identification during the five days of testing (no feedback). Error bars represent standard error.

Discussion

Several findings emerged from the experiments presented here. First, the current study confirmed the findings of many previous studies showing that processing talker-voice information in the speech signal improves across development. Second, the results of the current study show that a bilingual advantage exists for talker-voice perception in a familiar language (English) and in an unfamiliar language (German). In the familiar language, bilingual children were better at discriminating voices, better at learning to identify the voices, and also faster at learning the voices (days to criterion). In the unfamiliar language, bilingual children were better at discriminating voices. Third, the current study revealed a language familiarity effect, where children discriminated voices better in the familiar language (English) than in an unfamiliar language (German).

Before delving into the mechanisms that underlie these results, it is important to point out some differences between the current study and other studies that have examined talker-voice processing. The current study tested two different experimental paradigms, an AX discrimination task and a talker-voice training task that took place over five days of training. As will be shown below, most other studies have used a single task and results are not always consistent across different studies. Early studies of talker-voice processing used a voice line-up task where listeners are exposed to a voice and then have to pick it out from an array of voices (Goggin et al., Reference Goggin, Thompson, Strube and Simental1991; Goldstein et al., Reference Goldstein, Knight, Bailis and Conover1981; Stevenage, Clarke & McNeill, Reference Stevenage, Clarke and McNeill2012; Sullivan & Schlichting, Reference Sullivan and Schlichting2000; Thompson, Reference Thompson1987). Other experimental paradigms include familiarization with a voice followed by trials in which the listener has to respond with whether they match the target voice (Köster & Schiller, Reference Köster and Schiller1997; Schiller & Köster, Reference Schiller and Köster1996), brief voice familiarization with multiple talkers followed by an identification task (Bregman & Creel, Reference Bregman and Creel2014; Orena et al., Reference Orena, Theodore and Polka2015; Perrachione, Chiao & Wong, Reference Perrachione, Chiao and Wong2010; Perrachione et al., Reference Perrachione, Del Tufo and Gabrieli2011; Perrachione et al., Reference Perrachione, Dougherty, McLaughlin and Lember2015; Perrachione & Wong, Reference Perrachione and Wong2007; Xie & Myers, Reference Xie and Myers2015; Zarate et al., Reference Zarate, Tian, Woods and Poeppel2015), extensive training and identification of talkers (Nygaard & Pisoni, Reference Nygaard and Pisoni1998; Winters et al., Reference Winters, Levi and Pisoni2008), voice similarity rating where listeners hear two speech samples and rate the likelihood that they were produced by the same speaker (Fleming et al., Reference Fleming, Giordano, Caldara and Belin2014), and discrimination tasks (Levi & Schwartz, Reference Levi and Schwartz2013; Winters et al., Reference Winters, Levi and Pisoni2008).

Another difference between the current study and many previous studies is in the type of stimuli that were used. Most other studies of talker-voice perception use sentence-length utterances (Bregman & Creel, Reference Bregman and Creel2014; Fleming et al., Reference Fleming, Giordano, Caldara and Belin2014; Orena et al., Reference Orena, Theodore and Polka2015; Perrachione et al., Reference Perrachione, Chiao and Wong2010). The study by Zarate et al. (Reference Zarate, Tian, Woods and Poeppel2015) presented several polysyllabic words on a single trial. In the experiments presented here, listeners were provided much less information from each speaker on a given trial because stimuli were only monosyllabic CVC words. Previous research has shown that the type of talker information available from sentence-length utterances and word-length utterances is not the same. In a task looking at transfer between talker-voice training and word recognition of familiar talkers, Nygaard and Pisoni (Reference Nygaard and Pisoni1998) demonstrated that learning a voice from word-length stimuli results in better word recognition in noise, but learning a voice from sentence-length utterances does not. They argue that the phonetic information available in sentence-length utterances is different from that in word-length utterances. This is not surprising given that sentences have more complex intonation patterns than words and thus provide different information about the speaker. We now discuss each of the findings in turn.

Age

The effect of age found in the discrimination tasks and in the accuracy analysis of the Test data replicates many findings of talker-voice perception showing improvement across age (Bartholomeus, Reference Bartholomeus1973; Creel & Jimenez, Reference Creel and Jimenez2012; Levi & Schwartz, Reference Levi and Schwartz2013; Mann et al., Reference Mann, Diamond and Carey1979; Moher et al., Reference Moher, Feigenson and Halberda2010; Spence et al., Reference Spence, Rollins and Jerger2002). The lack of an effect of Age in the other analyses related to the five days of training may be due to the extensive training and exposure that participants got, essentially washing out group differences. As will be discussed below, the language familiarity effect has been shown to be eliminated in an extensive training paradigm similar to the one used here.

A bilingual advantage for talker-voice processing

The current study also showed a bilingual advantage, which we can separate into two different types based on familiarity with the language. In the first type, bilinguals outperformed monolinguals in talker-voice tasks that were presented in foreign-accented English. Previous research has demonstrated that listeners are worse when processing accented speech, whether it is due to a foreign accent (Goggin et al., Reference Goggin, Thompson, Strube and Simental1991; Goldstein et al., Reference Goldstein, Knight, Bailis and Conover1981; Thompson, Reference Thompson1987) or to dialect differences (Kerstholt, Jansen, Van Amelsvoort & Broeders, Reference Kerstholt, Jansen, Van Amelsvoort and Broeders2006; Perrachione et al., Reference Perrachione, Chiao and Wong2010). In the case of dialect differences, these latter two studies found asymmetrical differences for own vs other dialect where speakers of the non-standard variety showed less of a difference between dialects than speakers of the standard dialect. The authors attribute this asymmetry to differences in exposure: Speakers of nonstandard varieties have extensive exposure to the standard variety, but not the other way around. Thus, one possible explanation for the bilingual advantage in the English conditions is that the children in the bilingual group simply have more experience with (foreign) accented speech and thus perform better.

Another possible explanation for the bilingual advantage in the English tasks is that the bilinguals have better cognitive control (inhibition) and are able to focus on the task (processing who is talking), while suppressing irrelevant information, namely that the speech is accented. Yet another possibility is that bilinguals have better social perception and perceiving the voice of a talker is highly relevant in social situations. Future studies will be needed to adjudicate between these alternatives.

The second way that the bilingual advantage manifested itself here was with better performance in an unfamiliar language. In this case, it is not possible to explain the better performance of the bilinguals as resulting from experience because none of the children had experience with German. As in the bilingual advantage for the experiments with English stimuli, it is possible that the advantage in the German condition is due to either better cognitive control (inhibition), to better social processing, or to better pitch perception. In the German condition, children must suppress the somewhat distracting fact that they cannot understand what is being said and must attend only to the talker-voice information. Thus, it is possible that the better cognitive control of the bilinguals translates to better processing in the German condition. Alternatively, bilinguals may perform better in an unfamiliar language because they simply have more experience listening to talker information in multiple languages, even if this experience does not come from German. This could be related to the literature on high variability training which shows that performance improves when listeners learn from many different speakers or speaker types versus a single speaker or single speaker type (Bradlow & Bent, Reference Bradlow and Bent2008; Bradlow, Pisoni, Akahane-Yamada & Tohkura, Reference Bradlow, Pisoni, Akahane-Yamada and Tohkura1997; Clopper & Pisoni, Reference Clopper and Pisoni2004; Logan, Lively & Pisoni, Reference Logan, Lively and Pisoni1991; Sidaras, Alexander & Nygaard, Reference Sidaras, Alexander and Nygaard2009). Finally, better performance in the German condition could be due to better pitch perception by bilinguals (Krizman et al., Reference Krizman, Marian, Shook, Skoe and Kraus2012).

It should be noted that a bilingual advantage for an unfamiliar language was not found in Xie and Myers (Reference Xie and Myers2015). They tested Mandarin(L1)–English(L2) (non-musician) and L1 English (non-musician) listeners on talker identification in Spanish. For these two groups of listeners, they did not find better performance for the bilingual Mandarin–English listeners, who presumably had two characteristics that could have made them better: (1) being a speaker of a tone language and thus having better pitch perception and (2) being a bilingual and thus having better pitch perception. There is no way of knowing why a bilingual advantage was not found in their study, but several possibilities exist. First, 17/26 of the L1 English listeners had studied Spanish so Spanish was not truly an unfamiliar language. Even though these listeners reported that they were not fluent in Spanish, the results of Orena et al. (Reference Orena, Theodore and Polka2015) with Montreal listeners suggested that exposure and not fluency is sufficient for performing better in an exposed language. Second, Xie and Myers (Reference Xie and Myers2015) used sentence-length utterances with training and identification, whereas the current study used monosyllabic words with a discrimination task. Future studies will need to probe whether training tasks can eliminate or reduce predicted group differences, as has been found in Winters et al. (Reference Winters, Levi and Pisoni2008), where the generally robust language familiarity effect was absent in the training-identification task, but was present in the discrimination task.

Language familiarity effect

Finally, the current study replicated the language familiarity effect which has been found for both a native language (Goggin et al., Reference Goggin, Thompson, Strube and Simental1991; Perrachione et al., Reference Perrachione, Del Tufo and Gabrieli2011; Perrachione et al., Reference Perrachione, Dougherty, McLaughlin and Lember2015; Perrachione & Wong, Reference Perrachione and Wong2007; Thompson, Reference Thompson1987; Winters et al., Reference Winters, Levi and Pisoni2008) and a second language (Köster & Schiller, Reference Köster and Schiller1997; Schiller & Köster, Reference Schiller and Köster1996; Sullivan & Schlichting, Reference Sullivan and Schlichting2000). The source of the language familiarity effect has been explored in several recent studies and has been attributed either to comprehension – retrieving lexical and/or semantic information – or to phonological knowledge.

Fleming et al. (Reference Fleming, Giordano, Caldara and Belin2014) argue that comprehension is not necessary to exhibit a language familiarity effect. In their study, listeners showed the effect in time-reversed speech, which they interpret as evidence that the phonological information retained in the time-reversed samples is sufficient to generate better performance in a familiar language. Zarate et al. (Reference Zarate, Tian, Woods and Poeppel2015) also argue for the importance of phonological knowledge. In their study, native English-speaking listeners were better at perceiving talker information in German – a phonologically similar language – than in Mandarin – a phonologically dissimilar language – both of which were unfamiliar to the listeners. Additional support for the importance of phonetic/phonological processing in the absence of comprehension comes from Orena et al. (Reference Orena, Theodore and Polka2015), who found that listeners residing in Montréal, but who were not second language speakers of French, performed better than listeners residing in Connecticut who did not have frequent exposure to French. This latter finding suggests that the language familiarity effect is indeed an effect of familiarity with the language and does not require access to lexical or semantic information.

In contrast, Perrachione et al. (Reference Perrachione, Dougherty, McLaughlin and Lember2015) argue for the importance of comprehension, or access to lexical and semantic information. Like Fleming et al. (Reference Fleming, Giordano, Caldara and Belin2014), they tested listeners in both forward and backward speech. Listeners who did demonstrate a language familiarity effect in the forward condition did not demonstrate this effect in the backward condition. In a second experiment, they found that listeners were more accurate in learning talkers from real sentences than from nonword sentences. The authors take these two studies as evidence that lexical and semantic information is an important component of the language familiarity effect. Additional evidence for the contribution of lexical/semantic information comes from Goggin et al. (Reference Goggin, Thompson, Strube and Simental1991) who found that listeners were worse in time-reversed speech – which contains some amount of phonological information but lacks lexical/semantic information – than in forward speech. Perrachione et al. (Reference Perrachione, Dougherty, McLaughlin and Lember2015) argue for the importance of both phonetic/phonological and lexical/semantic processing due to a stepwise improvement from Mandarin to English nonwords and then to English words. Zarate et al. (Reference Zarate, Tian, Woods and Poeppel2015) actually argue that phonological information is more important than lexical or semantic information because they did not find a significant difference in performance between German, English nonwords, and English words. It should be noted that Zarate et al. (Reference Zarate, Tian, Woods and Poeppel2015) did not perform similar follow-up analyses to explore these differences, as they did between German and Mandarin, where this latter difference only emerged when examining data from the first 40% of the experiment.

If we delve deeper into the processing of English and German – two phonologically similar and genetically related languages – it is apparent that not all studies have found the same results, suggesting that task effects might modulate performance. Twenty years ago, Köster and Schiller (Reference Köster and Schiller1997) tested three groups of listeners, all of whom had no knowledge of German: English L1, Spanish L1, and Mandarin L1. Using a voice line-up paradigm, they found that English-speaking listeners outperformed Spanish-speaking listeners, a result that was expected given the phonological similarity of English and German. They also found that English-speaking listeners performed better in English than in German, suggesting that phonological similarity is not the only factor affecting performance. Surprisingly, they also found that Mandarin-speaking listeners outperformed English-speaking listeners when listening to German, even though Mandarin and German are phonologically dissimilar and genetically unrelated languages. Phonological similarity appears to play some role in perceiving talker information in an unfamiliar language (English > Spanish), but is clearly not the only factor that influences performance. The better performance by the Mandarin-speaking listeners is consistent with the hypotheses about talker-voice perception outlined in Xie and Myers (Reference Xie and Myers2015), who predict that speaking a tone language would result in better talker-voice perception in an unfamiliar language. Although Xie and Myers (Reference Xie and Myers2015) did not actually find better performance for Mandarin versus English listeners when listening to Spanish stimuli, the results from Köster and Schiller (Reference Köster and Schiller1997) suggest that other properties of the native language, aside from phonological similarity, affect performance.

In a study using speech samples from the same database as those used in the current study, Winters et al. (Reference Winters, Levi and Pisoni2008) found that adults who were native speakers of English with no knowledge of German exhibited the language familiarity effect in a discrimination task, but this effect disappeared in an identification and training paradigm where listeners had extensive exposure to the talkers. The lack of a language familiarity effect in the training paradigm may be due to the extensive exposure, which may have washed out any differences across the two language conditions. Better performance for English versus German in a discrimination task was also found for both children and adults in Levi and Schwartz (Reference Levi and Schwartz2013) and in the current study.

That several studies have found poorer performance in German than in English by native speakers of English suggests that phonological similarity is not sufficient to override other factors. Better performance in English than German could be related to comprehension or to experience with the specific phonetics and phonology of English. The best way to test this would be with English nonwords, and again, studies have shown different results: Perrachione et al. (Reference Perrachione, Dougherty, McLaughlin and Lember2015) found better performance for English words than English nonwords, whereas Zarate et al. (Reference Zarate, Tian, Woods and Poeppel2015) did not.

The different results across the studies of German and English suggest that additional research is needed. From the results presented in the current study, it is not possible to determine whether differences across studies are due to task effects, properties of the stimuli (monosyllabic words, multiple polysyllabic words, sentences), age of the participants, or other factors. That differences are found across studies suggests that we should be careful when drawing conclusions from a single experiment.

Conclusion

Taken together, the current study shows that bilingual children – here defined loosely as those children who speak, read, or understand another language or who have a family member who speaks another language living in the household – have a perceptual advantage when processing information about a talker's voice. They are faster to learn the voices of unfamiliar talkers and also perform better overall. They are even better in an unfamiliar language, suggesting that the benefits found for the English stimuli are not attributable only to experience listening to foreign-accented speech. Processing information about a talker combines both social skills, where listeners must understand that people sound different, and perceptual acuity, where listeners must attend to fine-grained phonetic detail in the speech signal. Testing perceptual acuity cannot be done with typical tests of speech sounds because a listener's first and second languages and their fluency in those languages affect how they perceive nonnative contrasts (Best, Reference Best and Strange1995; Best & Tyler, Reference Best and Tyler2007; Flege, Reference Flege and Strange1995; Kuhl & Iverson, Reference Kuhl, Iverson and Strange1995; Strange, Reference Strange2006). In contrast, the tasks used here are able to demonstrate differences in perceptual acuity for acoustic information in the speech signal by focusing on how monolinguals and bilinguals process talker-voice information. The current findings demonstrate a perceptual advantage for bilingual listeners. This finding contributes to the literature on the various types of advantages afforded to bilingual speakers, where they have been found to be better at tasks tapping cognitive control, as well as socially relevant tasks such as taking on the perspective of another person.

Appendix A. Detailed output of the ANOVAs for the demographic variables.

Appendix B. Results for talker discrimination.

Footnotes

This work was supported by a grant from the NIH-NIDCD: 1R03DC009851-01A2. We would like to thank Jennifer Bruno, Emma Mack, Alexandra Muratore, Sydney Robert, and Margo Waltz for help with data collection, three anonymous reviewers for their extremely helpful comments, and the children and families for their participation.

1 We recognize that not all bilingual children have heard the same person in multiple languages, such as in the case where a grandparent speaks the home language but not the language of the local community.

References

Ansaldo, A. I., Ghazi-Saidi, L., & Adrover-Roig, D. (2015). Interference control in elderly bilinguals: Appearances can be misleading. Journal of Clinical and Experimental Neuropsychology, 37 (5), 455–470. doi:10.1080/13803395.2014.990359
Arredondo, M. M., Hu, X.-S., Satterfield, T., & Kovelman, I. (2015). Bilingualism alters children's frontal lobe functioning for attentional control. Developmental Science. doi:10.1111/desc.12377
Baayen, R. H. (2008). Analyzing linguistic data: A practical introduction to statistics. Cambridge: Cambridge University Press.
Baayen, R. H., Davidson, D. J., & Bates, D. M. (2008). Mixed-effects modeling with crossed random effects for subjects and items. Journal of Memory and Language, 59, 390–412.
Bartholomeus, B. (1973). Voice identification by nursery school children. Canadian Journal of Psychology, 27 (4), 464–472.
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2010). lme4: Linear mixed-effects models using Eigen and S4. Retrieved from http://lme4.r-forge.r-project.org/
Best, C. T. (1995). A direct realist view of cross-language speech perception. In Strange, W. (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 171–204). Baltimore: York Press.
Best, C. T., & Tyler, M. D. (2007). Nonnative and second-language speech perception: Commonalities and complementarities. In Language experience in second language speech learning: In honor of James Emil Flege (pp. 13–34).
Bialystok, E. (1999). Cognitive complexity and attentional control in the bilingual mind. Child Development, 70 (3), 636–644.
Bialystok, E., Craik, F. I. M., Klein, R., & Viswanathan, M. (2004). Bilingualism, aging, and cognitive control: Evidence from the Simon task. Psychology and Aging, 19 (2), 290–303. doi:10.1037/0882-7974.19.2.290
Bialystok, E., & Martin, M. M. (2004). Attention and inhibition in bilingual children: Evidence from the dimensional change card sort task. Developmental Science, 7 (3), 325–339. doi:10.1111/j.1467-7687.2004.00351.x
Bradlow, A. R., & Bent, T. (2008). Perceptual adaptation to non-native speech. Cognition, 106 (2), 707–729.
Bradlow, A. R., Pisoni, D. B., Akahane-Yamada, R., & Tohkura, Y. (1997). Training Japanese listeners to identify English /r/ and /l/: IV. Some effects of perceptual learning on speech production. Journal of the Acoustical Society of America, 101 (4), 2299–2310.
Bregman, M. R., & Creel, S. C. (2014). Gradient language dominance affects talker learning. Cognition, 130 (1), 85–95. doi:10.1016/j.cognition.2013.09.010
Brown, L., Sherbenou, R. J., & Johnsen, S. K. (1997). TONI-3: Test of nonverbal intelligence (third edition). Austin, TX: Pro-Ed.
Buac, M., & Kaushanskaya, M. (2014). The relationship between linguistic and non-linguistic cognitive control skills in bilingual children from low socio-economic backgrounds. Frontiers in Psychology, 5, 1–12.
Carlson, S. M., & Meltzoff, A. N. (2008). Bilingual experience and executive functioning in young children. Developmental Science, 11 (2), 282–298. doi:10.1111/j.1467-7687.2008.00675.x
Clopper, C. G., & Pisoni, D. B. (2004). Effects of talker variability on perceptual learning of dialects. Language and Speech, 47 (3), 207–239.
Costa, A., Hernández, M., & Sebastián-Gallés, N. (2008). Bilingualism aids conflict resolution: Evidence from the ANT task. Cognition, 106 (1), 59–86.
Creel, S. C., & Jimenez, S. R. (2012). Differences in talker recognition by preschoolers and adults. Journal of Experimental Child Psychology, 113, 487–509.
Cutler, A., Andics, A., & Fang, Z. (2011). Inter-dependent categorization of voices and segments. In Lee, W.-S. & Zee, E. (Eds.), Proceedings of the Seventeenth International Congress of Phonetic Sciences (pp. 552–555). Hong Kong: Department of Chinese, Translation, and Linguistics, City University of Hong Kong.
de Bruin, A., Treccani, B., & Della Sala, S. (2015). Cognitive advantage in bilingualism: An example of publication bias? Psychological Science, 26 (1), 99–107. doi:10.1177/0956797614557866
DeCasper, A. J., & Fifer, W. P. (1980). Of human bonding: Newborns prefer their mothers' voices. Science, 208, 1174–1176.
DeCasper, A. J., & Prescott, P. A. (1984). Human newborn's perception of male voices: Preference, discrimination, and reinforcing value. Developmental Psychology, 17 (5), 481–491.
Ensminger, M. E., & Fothergill, K. E. (2003). A decade of measuring SES: What it tells us and where to go from here. In Bornstein, M. H. & Bradley, R. H. (Eds.), Socioeconomic status, parenting, and child development (pp. 13–27). Mahwah, NJ: Lawrence Erlbaum Associates.
Fan, S. P., Liberman, Z., Keysar, B., & Kinzler, K. D. (2015). The exposure advantage: Early exposure to a multilingual environment promotes effective communication. Psychological Science. doi:10.1177/0956797615574699
Flege, J. E. (1995). Second language speech learning: Theory, findings, and problems. In Strange, W. (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 233–277). Baltimore: York Press.
Fleming, D., Giordano, B. L., Caldara, R., & Belin, P. (2014). A language-familiarity effect for speaker discrimination without comprehension. Proceedings of the National Academy of Sciences, 111 (38), 13795–13798. doi:10.1073/pnas.1401383111
Friesen, D. C., Latman, V., Calvo, A., & Bialystok, E. (2015). Attention during visual search: The benefit of bilingualism. International Journal of Bilingualism, 19 (6), 693–702. doi:10.1177/1367006914534331
Goggin, J. P., Thompson, C. P., Strube, G., & Simental, L. R. (1991). The role of language familiarity in voice identification. Memory & Cognition, 19 (5), 448–458.
Gold, B. T., Kim, C., Johnson, N. F., Kryscio, R. J., & Smith, C. D. (2013). Lifelong bilingualism maintains neural efficiency for cognitive control in aging. The Journal of Neuroscience, 33 (2), 387–396. doi:10.1523/jneurosci.3837-12.2013
Goldstein, A. G., Knight, P., Bailis, K., & Conover, J. (1981). Recognition memory for accented and unaccented voices. Bulletin of the Psychonomic Society, 17 (5), 217–220.
Green, K. P., Tomiak, G. R., & Kuhl, P. K. (1997). The encoding of rate and talker information during phonetic perception. Perception & Psychophysics, 59 (5), 675–692.
Greenberg, A., Bellana, B., & Bialystok, E. (2013). Perspective-taking ability in bilingual children: Extending advantages in executive control to spatial reasoning. Cognitive Development, 28 (1), 41–50. doi:10.1016/j.cogdev.2012.10.002
Hamers, J. F., & Blanc, M. (2000). Bilinguality and bilingualism. Cambridge: Cambridge University Press.
Hollien, H., Majewski, W., & Doherty, E. T. (1982). Perceptual identification of voices under normal, stress, and disguise speaking conditions. Journal of Phonetics, 10, 139–148.
Incera, S., & McLennan, C. T. (2015). Mouse tracking reveals that bilinguals behave like experts. Bilingualism: Language and Cognition, FirstView, 1–11. doi:10.1017/S1366728915000218
Jaeger, T. F. (2008). Categorical data analysis: Away from ANOVAs (transformation or not) and towards logit mixed models. Journal of Memory and Language, 59, 434–446.
Johnson, E. K., Westrek, E., Nazzi, T., & Cutler, A. (2011). Infant ability to tell voices apart rests on language experience. Developmental Science, 14 (5), 1002–1011.
Kerstholt, J. H., Jansen, N. J. M., Van Amelsvoort, A. G., & Broeders, A. P. A. (2006). Earwitnesses: Effects of accent, retention and telephone. Applied Cognitive Psychology, 20 (2), 187–197. doi:10.1002/acp.1175
Kisilevsky, B. S., Hains, S. M. J., Lee, K., Xie, S., Huang, H., Ye, H. H., Zhang, K., & Wang, Z. (2003). Effects of experience on fetal voice recognition. Psychological Science, 14 (3), 220–224.
Köster, O., & Schiller, N. O. (1997). Different influences of the native language of a listener on speaker recognition. Forensic Linguistics, 4, 18–28.
Krizman, J., Marian, V., Shook, A., Skoe, E., & Kraus, N. (2012). Subcortical encoding of sound is enhanced in bilinguals and relates to executive function advantages. Proceedings of the National Academy of Sciences, 109 (20), 7877–7881. doi:10.1073/pnas.1201575109
Kuhl, P. K., & Iverson, P. (1995). Linguistic experience and the “Perceptual Magnet Effect”. In Strange, W. (Ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 121–154). Baltimore: York Press.
Levi, S. V. (2014). Individual differences in learning talker categories: The role of working memory. Phonetica, 71 (3), 201–226.
Levi, S. V. (2015). Talker familiarity and spoken word recognition in school-age children. Journal of Child Language, 42 (4), 843–872.
Levi, S. V., & Schwartz, R. G. (2013). The development of language-specific and language-independent talker processing. Journal of Speech, Language, and Hearing Research, 56, 913–920.
Levi, S. V., Winters, S. J., & Pisoni, D. B. (2007). Speaker-independent factors affecting the perception of foreign accent in a second language. Journal of the Acoustical Society of America, 121, 2327–2338.
Levi, S. V., Winters, S. J., & Pisoni, D. B. (2011). Effects of cross-language voice training on speech perception: Whose familiar voices are more intelligible? Journal of the Acoustical Society of America, 130 (6), 4053–4062.
Logan, J. S., Lively, S. E., & Pisoni, D. B. (1991). Training Japanese listeners to identify English /r/ and /l/: A first report. Journal of the Acoustical Society of America, 89 (2), 874–886.
Mann, V. A., Diamond, R., & Carey, S. (1979). Development of voice recognition: Parallels with face recognition. Journal of Experimental Child Psychology, 27, 153–165.
Moher, M., Feigenson, L., & Halberda, J. (2010). A one-to-one bias and fast mapping support preschoolers’ learning about faces and voices. Cognitive Science, 34 (5), 719–751.
Mullennix, J. W., & Pisoni, D. B. (1990). Stimulus variability and processing dependencies in speech perception. Perception & Psychophysics, 47 (4), 379–390.
Nygaard, L. C., & Pisoni, D. B. (1998). Talker-specific learning in speech perception. Perception & Psychophysics, 60 (3), 355–376.
Orena, A. J., Theodore, R. M., & Polka, L. (2015). Language exposure facilitates talker learning prior to language comprehension, even in adults. Cognition, 143, 36–40. doi:10.1016/j.cognition.2015.06.002
Paap, K. R., & Greenberg, Z. I. (2013). There is no coherent evidence for a bilingual advantage in executive processing. Cognitive Psychology, 66 (2), 232–258. doi:10.1016/j.cogpsych.2012.12.002
Paap, K. R., Johnson, H. A., & Sawi, O. (2014). Are bilingual advantages dependent upon specific tasks or specific bilingual experiences? Journal of Cognitive Psychology, 26 (6), 615–639. doi:10.1080/20445911.2014.944914
Paap, K. R., & Sawi, O. (2014). Bilingual advantages in executive functioning: Problems in convergent validity, discriminant validity, and the identification of the theoretical constructs. Frontiers in Psychology, 5. doi:10.3389/fpsyg.2014.00962
Paap, K. R., Sawi, O. M., Dalibar, C., Darrow, J., & Johnson, H. A. (2014). The brain mechanisms underlying the cognitive benefits of bilingualism may be extraordinarily difficult to discover. AIMS Neuroscience, 1 (3), 245–256. doi:10.3934/Neuroscience.2014.3.245
Perrachione, T. K., Chiao, J. Y., & Wong, P. C. M. (2010). Asymmetric cultural effects on perceptual expertise underlie an own-race bias for voices. Cognition, 114 (1), 42–55. doi:10.1016/j.cognition.2009.08.012
Perrachione, T. K., Del Tufo, S. N., & Gabrieli, J. D. E. (2011). Human voice recognition depends on language ability. Science, 333, 595.
Perrachione, T. K., Dougherty, S. C., McLaughlin, D. E., & Lember, R. A. (2015). The effects of speech perception and speech comprehension on talker identification. In Proceedings of the 18th International Congress of Phonetic Sciences. Glasgow: University of Glasgow.
Perrachione, T. K., & Wong, P. C. M. (2007). Learning to recognize speakers of a non-native language: Implications for the functional organization of human auditory cortex. Neuropsychologia, 45, 1899–1910.
Purhonen, M., Kilpeläinen-Lees, R., Valkonen-Korhonen, M., Karhu, J., & Lehtonen, J. (2004). Cerebral processing of mother's voice compared to unfamiliar voice in 4-month-old infants. International Journal of Psychophysiology, 52, 257–266.
Purhonen, M., Kilpeläinen-Lees, R., Valkonen-Korhonen, M., Karhu, J., & Lehtonen, J. (2005). Four-month-old infants process own mother's voice faster than unfamiliar voices: Electrical signs of sensitization in infant brain. Cognitive Brain Research, 24, 627–633.
Saidi, L. G., & Ansaldo, A. I. (2015). Can a second language help you in more ways than one? AIMS Neuroscience, 2 (1), 52–57. doi:10.3934/Neuroscience.2015.1.52
Schiller, N. O., & Köster, O. (1996). Evaluation of a foreign speaker in forensic phonetics: A report. Forensic Linguistics, 3 (1), 176–185.
Schneider, W., Eschman, A., & Zuccolotto, A. (2007). E-Prime 2.0 Professional. Pittsburgh, PA: Psychology Software Tools, Inc.
Semel, E., Wiig, E. H., & Secord, W. A. (2003). Clinical evaluation of language fundamentals, fourth edition (CELF-4). Toronto, CA: The Psychological Corporation/A Harcourt Assessment Company.
Sidaras, S. K., Alexander, J. E. D., & Nygaard, L. C. (2009). Perceptual learning of systematic variation in Spanish-accented speech. Journal of the Acoustical Society of America, 125 (5), 3306–3316.
Spence, M. J., Rollins, P. R., & Jerger, S. (2002). Children's recognition of cartoon voices. Journal of Speech, Language and Hearing Research, 45 (1), 214–222.
Stevenage, S. V., Clarke, G., & McNeill, A. (2012). The “other-accent” effect in voice recognition. Journal of Cognitive Psychology, 24 (6), 647–653. doi:10.1080/20445911.2012.675321
Strange, W. (2006). Second-language speech perception: The modification of automatic selective perceptual routines. The Journal of the Acoustical Society of America, 120, 3137.
Stroop, J. R. (1935). Studies of interference in serial verbal reactions. Journal of Experimental Psychology, 18, 643–662.
Sullivan, K. P. H., & Schlichting, F. (2000). Speaker discrimination in a foreign language: First language environment, second language learners. Forensic Linguistics, 17 (1), 95–111.
Thompson, C. P. (1987). A language effect in voice identification. Applied Cognitive Psychology, 1 (2), 121–131.
Van Dommelen, W. A. (1987). The contribution of speech rhythm and pitch to speaker recognition. Language and Speech, 30 (4), 325–338. doi:10.1177/002383098703000403
Winters, S. J., Levi, S. V., & Pisoni, D. B. (2008). Identification and discrimination of bilingual talkers across languages. Journal of the Acoustical Society of America, 123 (6), 4524–4538.
Xie, X., & Myers, E. (2015). The impact of musical training and tone language experience on talker identification. Journal of the Acoustical Society of America, 137 (1), 419–432.
Zarate, J. M., Tian, X., Woods, K. J. P., & Poeppel, D. (2015). Multiple levels of linguistic and paralinguistic features contribute to voice recognition. Scientific Reports, 5, 11475. doi:10.1038/srep11475
List of tables and figures

Table 1. Demographic data for the four groups.

Table 2. Speaker information.

Figure 1. Response screen for Talker Learning. Figure from Levi (2015). Reprinted with permission.

Figure 2. Sample trial procedure during the five days of training. Figure from Levi (2015). Reprinted with permission.

Figure 3. Proportion correct in the English (black lines) and German (grey lines) discrimination tasks. Error bars represent standard error.

Figure 4. Average number of days to reach criterion (70% accuracy) in the training phase (with feedback). Error bars represent standard error.

Figure 5. Average number of days to reach criterion (70% accuracy) in the test phase (no feedback). Error bars represent standard error.

Figure 6. Proportion correct on talker identification during the five days of training (with feedback). Error bars represent standard error.

Figure 7. Proportion correct on talker identification during the five days of testing (no feedback). Error bars represent standard error.