Introduction
When a new language is proficiently learned by early or late bilinguals, novel phonetic categories are established (MacKain, Best & Strange, 1981), although there are individual differences in how well these sounds are perceived and produced (Flege, Munro & MacKay, 1995; Pallier, Bosch & Sebastián-Gallés, 1997; Bosch, Costa & Sebastián-Gallés, 2000; Sebastián-Gallés, Rodriguez-Fornells, de Diego-Balaguer & Díaz, 2006). Laboratory training studies also show that adults can learn to hear and to produce foreign speech sounds, again with large individual differences (Golestani & Zatorre, 2009; Hattori & Iverson, 2009; Kartushina, Hervais-Adelman, Frauenfelder & Golestani, 2015). This review will address the cortical basis of phonetic processing in bilinguals and of phonetic learning, with a focus on functional magnetic resonance imaging (fMRI) studies of phonetic perception. An overview of the neural basis of phonetic processing per se will precede the review of the bilingual and phonetic learning literature.
Cortical bases of phonetic processing
Functional brain imaging studies using methods such as PET and fMRI in adults have examined the neural underpinnings of phonetic perception using words, speech syllables, and meaningless speech sounds, and using passive listening, phoneme monitoring, discrimination, identification, and rhyming tasks. Several existing papers offer well-established models of the neural underpinnings of language processing and learning more generally (Hickok & Poeppel, 2007; Rodriguez-Fornells, Cunillera, Mestres-Misse & de Diego-Balaguer, 2009; Price, 2012). With respect to phonetic processing specifically, these models highlight the role of the dorsal audio-motor interface, or dorsal stream, including auditory, frontal and parietal regions, in mapping sounds onto articulatory-based representations (Hickok & Poeppel, 2007; Rodriguez-Fornells et al., 2009). This network is especially relevant for phonological processing and working memory (Aboitiz, 2012), in contrast with the ventral stream, which is thought to be more implicated in lexical processing and in processing meaning, or semantics (Hickok & Poeppel, 2007; Rodriguez-Fornells et al., 2009). Within the dorsal audio-motor network, the left pars opercularis, which forms the posterior portion of Broca's area, and the adjacent left insula/frontal operculum (FO) of the left inferior frontal gyrus (LIFG) are involved even during purely receptive (i.e., perceptual) phonetic tasks when there are specific task demands such as phonetic segmentation and analysis (Démonet, Chollet, Ramsay, Cardebat, Nespoulous, Wise, Rascol & Frackowiak, 1992; Zatorre, Evans, Meyer & Gjedde, 1992; Fiez, Raichle, Miezin, Petersen, Tallal & Katz, 1995; Poldrack, Wagner, Prull, Desmond, Glover & Gabrieli, 1999; Burton, Small & Blumstein, 2000; Golestani & Zatorre, 2004). The left pars opercularis and the left supramarginal gyrus (SMG) are implicated in verbal working memory, or the phonological loop, with the left pars opercularis and the adjacent left premotor area being involved in subvocal rehearsal, and the left SMG in phonological storage (Paulesu, Frith & Frackowiak, 1993; Smith, Jonides, Marshuetz & Koeppe, 1998; Nixon, Lazarova, Hodinott-Hill, Gough & Passingham, 2004; Koelsch, Schulze, Sammler, Fritz, Mueller & Gruber, 2009).
The implication of left motor cortex, in addition to premotor regions, during phonetic perception is thought to reflect subvocal articulatory demands (Pulvermuller, Huss, Kherif, Martin, Hauk & Shtyrov, 2006; Lee, Turkeltaub, Granger & Raizada, 2012; Rogers, Mottonen, Boyles & Watkins, 2014), in line with the motor theory of speech perception (Liberman & Mattingly, 1985).
The bilateral auditory cortex activations observed in the superior temporal gyrus (STG) during phonetic perception are typically localized to secondary auditory cortices anterior and posterior to Heschl's gyrus (HG), including the planum temporale (PT) (Binder, Rao, Hammeke, Yetkin, Jesmanowicz, Bandettini, Wong, Estkowski, Goldstein, Haughton & Hyde, 1994; Jancke, Shah, Posse, Grosse-Ryuken & Muller-Gartner, 1998; Binder, Frost, Hammeke, Bellgowan, Springer, Kaufman & Possing, 2000; Hickok & Poeppel, 2000; Kilian-Huetten, Valente, Vroomen & Formisano, 2011). However, these regions are also involved in processing complex sounds such as amplitude-modulated noise (Giraud, Lorenzi, Ashburner, Wable, Johnsrude, Frackowiak & Kleinschmidt, 2000), as well as in the analysis of spectral and temporal information more generally (Obleser, Eisner & Kotz, 2008; Santoro, Moerel, De Martino, Goebel, Ugurbil, Yacoub & Formisano, 2014), whereas earlier, primary auditory regions respond preferentially to simpler stimuli such as pure tones (Wessinger, VanMeter, Tian, Van Lare, Pekar & Rauschecker, 2001). When the processing of complex auditory (i.e., non-phonetic) stimuli is controlled for, or when across-category phonetic conditions are compared to within-category ones, phonetic perception is localized to the more downstream left middle/anterior superior temporal sulcus (STS) (Liebenthal, Binder, Spitzer, Possing & Medler, 2005) and to the adjacent left middle temporal gyrus (Zhang, Xi, Xu, Shu, Wang & Li, 2011), respectively. This latter study, which investigated lexical tone stimuli in native speakers of Chinese, together with studies that have examined the learning of lexical tone by non-native speakers of Chinese (Wong, Perrachione & Parrish, 2007), demonstrates convergence in the left-lateralized neural underpinnings of lexical tone processing and of phonetic processing in non-tonal languages. Consistent with the hierarchical view that more downstream regions respond to phonetic information per se, it has been proposed that speech perception is robust due to the presence of multiple, complementary representations of the input, which operate on acoustic-phonetic features but also in articulatory-gestural domains (Scott & Johnsrude, 2003; Obleser, Leaver, VanMeter & Rauschecker, 2010). Bilateral temporal regions are involved in the processing of phonology, whereas higher levels of linguistic information in the speech signal (e.g., semantics, syntax) are processed in higher-level, left-lateralized frontal and parietal association cortices (Scott & Johnsrude, 2003; Peelle, 2012).
Interestingly, however, recent electrical recordings in humans (electrocorticography, or ECoG) during surgical planning have shown neural response patterns within the posterior STG (pSTG) that correspond to phonetic category boundaries (Chang, Rieger, Johnson, Berger, Barbaro & Knight, 2010), and to the speech sound features that map onto particular articulatory dimensions (Mesgarani, Cheung, Johnson & Chang, 2014). In other words, the pSTG does more than process spectro-temporal information in complex auditory input; it is likely also engaged in functional interaction with higher-level frontal and parietal regions involved in the categorical perception (CP) of speech sounds, an idea supported by recent developmental fMRI work on CP (Conant, Liebenthal, Desai & Binder, 2014), and by fMRI adaptation (Raizada & Poldrack, 2007) and pattern classification studies on CP (Lee et al., 2012). Similarly, the adjacent left temporo-parietal junction (area Spt) is thought to be involved in the interface, or mapping, between sensory and motor representations during speech processing (Hickok & Poeppel, 2007). Finally, there is growing evidence for involvement of partially overlapping frontal (i.e., Broca's area) and posterior (i.e., Wernicke's area) brain regions, classically associated with speech production and perception respectively, during both phonological/speech perception and production (Paus et al., 1996; Buchsbaum, Hickok & Humphries, 2001; Heim, Opitz, Muller & Friederici, 2003; Hickok & Poeppel, 2007; Meister, Wilson, Deblieck, Wu & Iacoboni, 2007; Price, Crinion & Macsweeney, 2011; Agnew et al., 2013), lending further support to the idea of interdependency of phonetic perception and production in the human brain.
Functional brain imaging studies on bilingual phonetic processing and on phonetic learning
Studies involving words
In an early PET study of late, proficient bilinguals, overlapping activations were observed in regions including the pars triangularis and the pars orbitalis of the LIFG in the first (L1) and second language (L2) during rhyme and synonym generation tasks, in which phonological and semantic cues guided word selection, respectively (Klein, Milner, Zatorre, Meyer & Evans, 1995). These frontal regions are more typically associated with semantic processing and memory (Binder, Frost, Hammeke, Rao & Cox, 1996; Dapretto & Bookheimer, 1999; Liebenthal, Desai, Ellingson, Ramachandran, Desai & Binder, 2010) than with phonetic processing, which is more typically localized to the left pars opercularis (Poldrack, Wagner, Prull, Desmond, Glover & Gabrieli, 1999). The involvement of semantic regions during phonologically guided word retrieval might be expected, given that a word generation task was used, in which semantic and lexical processes are likely also at play, especially when new words are generated. The findings of this study were interpreted as reflecting shared neural representations during both phonetic and semantic processing in proficient bilinguals (Klein et al., 1995).
In a later longitudinal fMRI study on phonetic learning, minimal word pairs were used to test and to train Japanese individuals to hear the /r/-/l/ contrast. These participants had previously been extensively exposed to this contrast during six years of English-language instruction. After training, increased activation was found in regions including the bilateral superior temporal gyrus/sulcus (STG/STS), IFG, insula, SMG, premotor cortex, supplementary motor area and subcortical regions. It was proposed that these increases reflect the acquisition of auditory-articulatory mappings for the difficult /r/-/l/ contrast, in particular since this network was broader than that observed during perception of an easy phonetic contrast (/b/-/g/) (Callan, Tajima, Callan, Kubo, Masaki & Akahane-Yamada, 2003). Given that training was extensive and that it involved words, the functional plasticity results could in part have arisen from changes in semantic processing. It is interesting, however, that activation in primary and secondary auditory areas was also increased after training, reflecting functional plasticity in relatively low-level auditory regions (Callan et al., 2003). More generally, greater overall activation during perception of the difficult compared to the easy contrast is consistent with the idea of greater neural recruitment during effortful task performance, an explanation that has been offered for bilingual language processing more generally, in particular in the left IFG (Frith, Friston, Liddle & Frackowiak, 1991; Chee, Hon, Lee & Soon, 2001; Golestani & Zatorre, 2004; Golestani, Alario, Meriaux, Le Bihan, Dehaene & Pallier, 2006). However, the above studies did not isolate phonetic processing per se, and as such the interpretation of the findings is limited.
Studies using isolated phonemes or syllables
Studies of bilingual phonetic processing and phonetic learning that have used isolated phonemes or syllables converge on the idea that phonetic processing in L1 and L2 generally overlaps, with greater neural recruitment during non-native, or more effortful, phonetic processing. For example, in a magnetoencephalographic study of preattentive neural responses to stimulus change, English and Japanese listeners were tested during exposure to the syllables /ra/ and /la/. The processing of non-native speech sounds in the Japanese group recruited greater neural resources and was associated with longer periods of brain activation in bilateral superior temporal and inferior parietal regions (Zhang, Kuhl, Imada, Kotani & Tohkura, 2005).
Other phonetic perception studies have required active task performance. In one such fMRI study, native (English) and non-native (Japanese) listeners identified syllables starting with /r/ and /l/ (Callan, Jones, Callan & Akahane-Yamada, 2004). The Japanese listeners had previously studied English for at least six years and, accordingly, performed above chance on this task, but still more poorly than the English participants. In line with the above-described longitudinal study by the same group (Callan et al., 2003), brain imaging revealed greater activation in the non-native listeners in an articulatory-auditory network comprising Broca's area, the anterior insula, the anterior STS/STG, the PT, the temporo-parietal junction, the SMG and the cerebellum, once again consistent with greater neural recruitment during more effortful, non-native phonetic processing. There was also a weak, positive correlation between performance on the /r/-/l/ contrast and activation in the above-reported network in the non-native listeners (Callan et al., 2004). In other words, between groups, higher activation was associated with poorer performance (i.e., in the non-native compared to native listeners), but within the non-native (Japanese) group, the opposite was observed.
In line with the above-described study (Callan et al., 2004) and with the related longitudinal study by the same group (Callan et al., 2003), a second longitudinal study also found greater recruitment of auditory and articulatory brain regions after learning to hear a difficult non-native phonetic contrast (Golestani & Zatorre, 2004). In this latter study, listeners were trained to hear the difficult dental-retroflex contrast. After training, the pattern of brain activation came to resemble that observed during identification of a native contrast, with greater recruitment of the left IFG, the right insula/FO, the STG bilaterally and the left caudate nucleus (Golestani & Zatorre, 2004). There was also a positive relationship between behavioural improvement and post-training brain activation in the left angular gyrus, as well as a negative relationship between improvement and activation in the left insula/FO. This latter result suggests that greater success in phonetic learning is accompanied by more efficient neural processing in frontal speech regions implicated in phonetic processing, and conversely, that more effortful processing in the poorer learners is accompanied by greater recruitment of the left insula/FO (Golestani & Zatorre, 2004). The negative correlation with performance is in the opposite direction to that found in this and other brain regions by Callan and colleagues (2004). One factor that could explain the discrepancy is that in Golestani and Zatorre (2004), participants were completely naïve to the contrast before training, and after five hours of training only about half of the participants performed above chance, whereas in the study by Callan and colleagues (2004), all the Japanese participants performed above chance even before scanning.
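The brain-behaviour correlations described above can be made concrete with a minimal sketch. The snippet below is a hypothetical illustration, not the analysis pipeline of any of the cited studies: it correlates per-participant improvement scores with post-training activation estimates extracted from two regions of interest, with synthetic data constructed so that the two correlations run in opposite directions, mirroring the pattern reported by Golestani and Zatorre (2004). All variable names and values are assumptions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical per-participant data (n = 20 trainees):
# behavioural improvement = post- minus pre-training % correct identification.
improvement = rng.normal(loc=10.0, scale=8.0, size=20)

# Mean post-training activation (e.g., GLM beta estimates) from two regions
# of interest, simulated so that angular-gyrus activation tracks improvement
# positively and insula/FO activation negatively.
angular_gyrus = 0.05 * improvement + rng.normal(scale=0.3, size=20)
insula_fo = -0.05 * improvement + rng.normal(scale=0.3, size=20)

for name, roi in [("L angular gyrus", angular_gyrus), ("L insula/FO", insula_fo)]:
    r, p = stats.pearsonr(improvement, roi)
    print(f"{name}: r = {r:+.2f}, p = {p:.3f}")
```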
This raises the important question of the interaction between performance/effort and the degree of neural recruitment of relevant brain regions. Specifically, it is likely that some individuals can easily hear the contrast, that others can do so but with difficulty (i.e., with uncertainty and effort), and that yet others cannot hear it at all. In this latter subgroup, due to perceptual assimilation of non-native to native sounds, participants can be expected eventually to make less effort (i.e., they might give up on performing the task), and also to show greater neural adaptation (Grill-Spector & Malach, 2001), because they effectively hear the same sound across different trials. Such differences across individuals and across studies (e.g., related to aptitude, but also to previous exposure to the contrast of interest) might modulate the observed neural response in the brain regions involved, resulting in discrepancies across studies in the direction of training effects, and in the direction of correlations between activation and performance.
Interestingly, an electroencephalography (EEG) study has uncovered an important finding in relation to individual differences in phonetic perception. Using a pre-attentive oddball paradigm with vowels, it was found that good and poor phonetic perceivers differed in their electrophysiological response indexing change detection (i.e., the mismatch negativity, or MMN, response) not only to non-native but also to native phonetic contrasts (Díaz, Baus, Escera, Costa & Sebastián-Gallés, 2008). In other words, people who are particularly good or poor at non-native vowel perception also differ in their neural response to native vowel contrasts. This finding may arise from the partially shared neural resources underlying L1 and L2 phonetic processing (Golestani & Zatorre, 2004), and suggests that there exist individual differences even in how native speech sounds are perceived, at least in bilinguals. This could in part be due to the influence of learning a new phonetic inventory on characteristics of the native inventory (Chang, 2012; Kartushina, Hervais-Adelman, Frauenfelder & Golestani, unpublished manuscript). Possibly related to a relationship between L1 and L2 phonetic perception is recent behavioural evidence for a relationship between L1 and L2 phonetic production (Kartushina & Frauenfelder, 2014).
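For readers unfamiliar with the MMN, the logic of the oddball analysis is straightforward: the event-related response to rare "deviant" vowels is compared with the response to frequent "standard" vowels, and the MMN is the negativity in their difference wave roughly 100-250 ms after stimulus onset. The following is a minimal sketch of that computation, assuming epoched EEG data already available as NumPy arrays; the channel, sampling rate, trial counts and analysis window are illustrative assumptions rather than the parameters of Díaz et al. (2008).

```python
import numpy as np

# Hypothetical epoched EEG from one fronto-central channel, sampled at
# 500 Hz, epochs from -100 to +500 ms around vowel onset.
fs = 500
times = np.arange(-0.1, 0.5, 1 / fs)
rng = np.random.default_rng(1)
standard_epochs = rng.normal(size=(400, times.size))  # frequent vowel (~85%)
deviant_epochs = rng.normal(size=(70, times.size))    # rare vowel (~15%)

# Average across trials to obtain ERPs, then subtract: deviant - standard.
erp_standard = standard_epochs.mean(axis=0)
erp_deviant = deviant_epochs.mean(axis=0)
difference_wave = erp_deviant - erp_standard

# Quantify the MMN as the mean amplitude of the difference wave in a
# 100-250 ms post-stimulus window (a common choice; windows vary by study).
window = (times >= 0.10) & (times <= 0.25)
mmn_amplitude = difference_wave[window].mean()
print(f"MMN mean amplitude, 100-250 ms: {mmn_amplitude:.3f} (arbitrary units)")
```

Per-participant MMN amplitudes computed in this way are what allow good and poor perceivers to be compared on both native and non-native contrasts.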
The studies reviewed thus far reported results of univariate analyses, and generally converge in showing greater recruitment of frontal and/or posterior brain regions during the processing of new or difficult speech sounds. Different, complementary results have been obtained using multi-voxel pattern analysis (MVPA, also known as ‘pattern classification’), which is better suited for differentiating neural representations within spatially overlapping brain regions. In one such study, English and Japanese listeners were tested on their perception of the /r/-/l/ distinction. It was found that the statistical separability of fMRI activation patterns in the right primary auditory cortex predicted subjects’ ability to tell the sounds apart, both across and within groups (Raizada, Tsao, Liu & Kuhl, 2010). This result is consistent with functional brain imaging (Binder et al., 1994; Jancke et al., 1998; Binder et al., 2000; Hickok & Poeppel, 2000; Kilian-Huetten et al., 2011) and electrocorticography studies showing temporal cortex involvement during phonetic processing (Chang et al., 2010; Mesgarani et al., 2014), and demonstrates that further work is needed involving more fine-grained analyses of differences in neural recruitment within spatially overlapping brain regions. This opens the question of the contributions of top-down versus bottom-up influences on auditory cortex activation differences in relation to phonetic processing.
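To illustrate the MVPA logic, the sketch below trains a linear classifier to separate voxel-pattern responses to /r/ and /l/ trials within an auditory-cortex region of interest, using cross-validated decoding accuracy as an index of the kind of pattern separability that Raizada et al. (2010) related to behaviour. The data are synthetic, and the specifics (region, classifier, trial counts) are illustrative assumptions, not the published pipeline.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)

# Hypothetical single-subject data: 60 trials x 200 voxels from an
# auditory-cortex ROI. The /r/ and /l/ trials differ by a weak, distributed
# pattern rather than by overall (univariate) activation level.
n_trials, n_voxels = 60, 200
labels = np.repeat([0, 1], n_trials // 2)        # 0 = /r/, 1 = /l/
pattern = rng.normal(scale=0.3, size=n_voxels)   # weak multivoxel signal
X = rng.normal(size=(n_trials, n_voxels))
X[labels == 1] += pattern

# Cross-validated decoding accuracy: chance is 0.5; accuracy above chance
# indicates statistically separable /r/ vs /l/ activation patterns.
clf = make_pipeline(StandardScaler(), LinearSVC())
accuracy = cross_val_score(clf, X, labels, cv=5).mean()
print(f"Cross-validated /r/ vs /l/ decoding accuracy: {accuracy:.2f}")
```

Per-subject accuracies (or pattern distances) from such an analysis can then be correlated with behavioural discrimination scores across participants, which is how pattern separability can predict the ability to tell the sounds apart.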
A recent adaptation fMRI study partially addressed this question (Myers & Swan, 2012). This study showed involvement of temporal and inferior frontal brain regions in phonetic processing and, additionally, implicated the bilateral middle frontal gyri specifically during the processing of a newly learned phonetic category. This suggests that top-down information about new categories may reshape perceptual sensitivities via attentional or executive mechanisms (Myers & Swan, 2012), and demonstrates that there is a complex interplay between low-level, perceptual aspects of the input and higher-level knowledge about phonetic categories, in particular when they are newly learned. Related to this are the results of a longitudinal training study with synthetic phonetic and non-speech but voice-like continua, which showed that the left posterior STS may play a role in the short-term representation of sound features relevant for learning new sound categories (Liebenthal et al., 2010). This provides evidence for a lower-level, temporal cortex mechanism that may mediate subsequent consolidation during the learning of novel speech sounds.
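The adaptation (repetition suppression) logic used in such studies can also be sketched compactly: the response on trials where a stimulus repeats is compared with the response on trials where the stimulus changes, and a "release from adaptation" on change trials indicates that the region distinguishes the two sounds. Below is a minimal, hypothetical illustration of that contrast on ROI-level response estimates; it is not the model used by Myers and Swan (2012), and all values are simulated.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

# Hypothetical per-trial ROI response estimates (e.g., GLM betas) for
# 'repeat' trials (same phoneme presented twice) and 'change' trials
# (the phoneme switches). A region sensitive to the phonetic distinction
# should respond more on change trials (release from adaptation).
repeat_trials = rng.normal(loc=1.0, scale=0.4, size=40)
change_trials = rng.normal(loc=1.3, scale=0.4, size=40)

# Simple independent-samples comparison of the simulated estimates; a
# paired design is typical when trial types are matched within runs.
t, p = stats.ttest_ind(change_trials, repeat_trials)
adaptation_effect = change_trials.mean() - repeat_trials.mean()
print(f"Release from adaptation: {adaptation_effect:.2f} (t = {t:.2f}, p = {p:.3f})")
```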
Conclusions and further reading
In conclusion, although a limited number of functional imaging studies have examined the neural underpinnings of bilingual phonetic processing per se, the results of these studies generally converge in showing overlapping brain regions during phonetic processing in the L1 and L2 of bilinguals, with greater recruitment of frontal and posterior brain regions during the processing of new or of ‘difficult’ non-native sounds. This converges with findings on bilingual language processing more generally, where it has been shown that at early stages of L2 learning there is relatively greater engagement of anterior and parietal portions of the language network, including Broca's area, as well as of higher-level executive and language control regions, and that, as increased proficiency is attained in the second language, the two languages recruit more overlapping brain networks (Abutalebi, Cappa & Perani, 2001; Stowe & Sabourin, 2005; Indefrey, 2006; Abutalebi, 2008; Sebastian, Laird & Kiran, 2011). Further, studies that have examined phonetic perception and learning per se using univariate approaches, at the macroscopic level afforded by fMRI, suggest that largely overlapping regions of the auditory cortex are recruited when processing familiar versus novel speech sounds, or when processing different speech sounds of one language. More advanced image analysis methods (i.e., MVPA) and invasive approaches such as intracranial recordings, however, reveal differences in the neural response pattern within overlapping regions of auditory cortex in response to L1 versus L2 speech sounds, in relation to specific phonetic features such as place of articulation, and in relation to across- versus within-category differences (i.e., categorical perception). These more fine-grained auditory cortex differences, which are likely modified during the acquisition of new speech sounds, appear to be mediated (a) by regions including the left middle to posterior STS, in the short-term representation of sound features defining new sound categories; (b) by increased involvement of the left temporo-parietal junction, related to increased demands on sensori-motor mapping of the new sounds; and (c) by additional involvement of frontal brain regions in the top-down reshaping of lower-level, perceptual phonetic encoding in the auditory cortex. These findings converge with the known roles of the respective components of the dorsal audio-motor stream in spectro-temporal analysis (bilateral dorsal STG), in phonological processing (bilateral middle to posterior STS), in the sensori-motor interface (left temporo-parietal junction) and in subvocal articulation (posterior LIFG) (Hickok & Poeppel, 2007). Outstanding questions remain regarding the precise mechanisms underlying differential encoding of L1 versus L2 (or foreign) speech sounds in primary and secondary auditory cortices, in particular in light of interactions of these bottom-up, auditory processes with top-down, frontal and temporo-parietal ones. These can be addressed using, among other approaches, ultra-high-resolution (i.e., 7 Tesla) functional mapping, advanced data analysis methods including MVPA and computational modelling, and invasive methods such as intracranial recordings.
Recommendations for further reading that relate to the neural bases of phonetic processing in bilingualism and to phonetic learning include developmental work on native and non-native speech sound processing in infants (Cheour, Ceponiene, Lehtokoski, Luuk, Allik, Alho & Naatanen, 1998; Rivera-Gaxiola, Silva-Pereyra & Kuhl, 2005; Minagawa-Kawai, Mori, Naoi & Kojima, 2007; Petitto, Berens, Kovelman, Dubins, Jasinska & Shalinsky, 2012; Ortiz-Mantilla, Hamalainen, Musacchia & Benasich, 2013; Fava, Hull & Bortfeld, 2014), on foreign-language syllable production in children (Hashizume, Taki, Sassa, Thyreau, Asano, Asano, Takeuchi, Nouchi, Kotozaki, Jeong, Sugiura & Kawashima, 2014), and on the neural bases of lexical tone processing in individuals whose first language was tonal but was subsequently forgotten (Pierce, Klein, Chen, Delcenserie & Genesee, 2014). There is also a large electrophysiological (EEG and magnetoencephalography, or MEG) literature and some functional near-infrared spectroscopy (fNIRS) work on the cortical and subcortical bases of phonetic perception and learning (Alain, Reinke, McDonald, Chau, Tam, Pacurar & Graham, 2005; Zhang, Kuhl, Imada, Iverson, Pruitt, Stevens, Kawakatsu, Tohkura & Nemoto, 2009; Kumar, Hegde & Mayaleela, 2010; Xi, Zhang, Shu, Zhang & Li, 2010; Zhang et al., 2011; Chandrasekaran, Kraus & Wong, 2012; Brandmeyer, Farquhar, McQueen & Desain, 2013; Kaan, Wayland & Keil, 2013; Skoe, Chandrasekaran, Spitzer, Wong & Kraus, 2014; Zinszer, Chen, Wu, Shu & Li, 2015). Also, given the growing evidence for the importance of syllable-level speech processing (Morillon, Liegeois-Chauvel, Arnal, Benar & Giraud, 2012; Edwards & Chang, 2013; Doelling, Arnal, Ghitza & Poeppel, 2014), studies on the neural basis of bilingual phonotactic processing are recommended (Dehaene-Lambertz, Dupoux & Gout, 2000; Jacquemot, Pallier, Le Bihan, Dehaene & Dupoux, 2003; Minagawa-Kawai, Cristia, Long, Vendelin, Hakuno, Dutat, Filippin, Cabrol & Dupoux, 2013), although only a limited number of studies have addressed this question.
Other literature relevant to bilingual phonetic processing and learning includes a body of work on the brain structural correlates of individual differences in phonetic processing, and in language processing more generally (see Golestani, 2014, for a recent review). This includes studies on the brain structural correlates of phonetic perception (Golestani, Paus & Zatorre, 2002; Golestani, Molko, Dehaene, Le Bihan & Pallier, 2007; Wong, Warrier, Penhune, Roy, Sadehh, Parrish & Zatorre, 2008; Lebel & Beaulieu, 2009; Wong, Chandrasekaran, Garibaldi & Wong, 2011; Sebastián-Gallés, Soriano-Mas, Baus, Díaz, Ressel, Pallier, Costa & Pujol, 2012; Burgaleta, Baus, Díaz & Sebastián-Gallés, 2014) and production (Golestani & Pallier, 2007), on foreign speech imitation (Reiterer, Hu, Erb, Rota, Nardo, Grodd, Winkler & Ackermann, 2011), on bilingualism (Mechelli, Crinion, Noppeney, O’Doherty, Ashburner, Frackowiak & Price, 2004; Ressel, Pallier, Ventura-Campos, Díaz, Roessler, Avila & Sebastián-Gallés, 2012; Klein, Mok, Chen & Watkins, 2014) and on expertise in phonetics (Golestani, Price & Scott, 2011).