
WHY ARE LEXICAL TONES DIFFICULT TO LEARN?

INSIGHTS FROM THE INCIDENTAL LEARNING OF TONE-SEGMENT CONNECTIONS

Published online by Cambridge University Press:  16 September 2019

Ricky KW Chan*
Affiliation: University of Hong Kong

Janny HC Leung
Affiliation: University of Hong Kong

*Correspondence concerning this article should be addressed to Ricky Chan, Speech, Language and Cognition Laboratory, School of English, University of Hong Kong, Pokfulam Road, Hong Kong. E-mail: rickykwc@hku.hk

Abstract

L2 sounds present different kinds of challenges to learners at the phonetic, phonological, and lexical levels, but previous studies on L2 tone learning mostly focused on the phonetic and lexical levels. The present study employs an innovative technique to examine the role of prior tonal experience and musical training on forming novel abstract syllable-level tone categories. Eighty Cantonese and English musicians and nonmusicians completed two tasks: (a) AX tone discrimination and (b) incidental learning of artificial tone-segment connections (e.g., words beginning with an aspirated stop always carry a rising tone) with synthesized stimuli modeled on Thai. Although the four participant groups distinguished the target tones similarly well, Cantonese speakers showed abstract and implicit knowledge of the target tone-segment mappings after training but English speakers did not, regardless of their musical experience. This suggests that tone language experience, but not musical experience, is crucial for forming novel abstract syllable-level tone categories.

Type: Research Article
Copyright © Cambridge University Press 2019

INTRODUCTION

Tone languages (e.g., Cantonese, Thai, Vietnamese) employ contrastive pitch patterns at the syllable level (i.e., lexical tones) to distinguish word meaning (Footnote 1). For instance, in Mandarin the syllable [ma] means “mother” when it carries a high level tone, but “scold” when it carries a high falling tone. Lexical tones are often reported to be difficult for second language (L2) learners, especially for those whose native language (L1) is a nontone language such as English or French (e.g., Francis, Ciocca, Ma, & Fenn, 2008; So & Best, 2010; Wang, Spence, Jongman, & Sereno, 1999). However, the nature of the long-term difficulty involved is still far from clear. Most previous studies focused on explicit processes such as L2 tone identification or discrimination, and on what factors – notably prior linguistic background and musical training – may contribute to better L2 tone perception and learning (e.g., Hallé, Chang, & Best, 2004; Mok & Zuo, 2012; Wayland & Guion, 2004). However, the ability to perceive or discriminate different L2 tones explicitly does not entail the ability to encode pitch patterns as abstract tone categories at the syllable level, which is the prerequisite for using abstract tone categories as lexical cues for meaning contrast (i.e., the phonological level in a phonetic-phonological-lexical continuity in speech learning; e.g., Wong & Perrachione, 2007).

This article is motivated by the hypothesis that, for learners whose native language is nontonal, a major long-term difficulty in learning novel lexical tones concerns repurposing pitch patterns from intonation cues to the formation of abstract tone categories at the syllable level. This hypothesis was tested with an experiment on the incidental learning (i.e., learning without intention) of tone-segment connections (constraints that the segmental composition of a syllable places on the possible tone it can carry), which hinges on the encoding of pitch patterns as abstract tone categories at the syllable level. Meanwhile, the extent to which second language acquisition (SLA) involves incidental learning or even implicit learning (i.e., learning without awareness) remains controversial (see Andringa & Rebuschat, 2015; Chan & Leung, 2018; Hulstijn & Ellis, 2005 for reviews). A central question in research on incidental/implicit language learning is the scope of the adult capacity for acquiring knowledge of L2 language patterns without intention and/or awareness. Whether incidental exposure to tone-segment connections may lead to relevant abstract implicit knowledge is therefore also a theoretically interesting question in language learning. The main goals of the present study are thus to (a) determine whether incidental exposure to tone-segment connections may lead to implicit, abstract, and potentially rulelike knowledge; and (b) determine the effects of prior experience in the linguistic use of tones and of musical training on the formation of novel abstract tone categories at the syllable level, and draw implications for L2 tone learning and the relationship between music and speech.

DIFFICULTY IN L2 TONE LEARNING

It is well known that unfamiliar L2 speech sounds may pose challenges for language learners, and L2 lexical tones are no exception. Lexical tones are often reported to be difficult for L2 learners, especially for those whose L1 is a nontone language such as English or French (e.g., Francis et al., 2008; So & Best, 2010; Wang et al., 1999). Still, the nature of this difficulty is not well understood. In tone languages, pitch patterns function like vowels and consonants to contrast word meaning; in nontone languages such as English, by contrast, pitch patterns are mainly employed as intonational cues for contrasting pragmatic meaning or discourse types and functions (Wells, 2006). For L2 tone processing and learning by nontone language learners, three potential levels of difficulty can be identified (cf. the phonetic-phonological-lexical continuity in speech learning; e.g., Wong & Perrachione, 2007):

  (1) Perceiving and distinguishing tonal pitch patterns at the phonetic-acoustic level;

  (2) Encoding pitch patterns as abstract tone categories at the syllable level; and

  (3) Using abstract tone categories as lexical cues.

Most previous studies have addressed difficulty (1) by investigating explicit processes such as L2 tone perception or discrimination on monosyllables, and by examining what factors may contribute to better L2 tone perception, notably (a) whether learners’ L1 is also a tone language; (b) proficiency in the target tone language; and (c) prior musical training. The key findings are summarized in the following text.

Previous research has shown that tone language speakers tend to outperform nontone language speakers in discriminating novel tones in general (e.g., Hallé et al., 2004; Leather, 1987; Lee & Nusbaum, 1993; Mok & Zuo, 2012; Repp & Lin, 1990; Stagray & Downs, 1993; Wang, Jongman, & Sereno, 2001; Wayland & Guion, 2004), although perceptual success for individual tones depends crucially on the similarity between the tonal systems of the target language and the listeners’ L1, as predicted by models of L2 speech perception such as the Perceptual Assimilation Model (e.g., Best, 1994; Francis et al., 2008; So & Best, 2010; Wayland & Guion, 2004). Generally speaking, nontone language speakers appear to process fundamental frequency (the acoustic correlate of pitch) differently from tone language speakers, both behaviorally and neurologically (Gandour et al., 2003; Hallé et al., 2004; Krishnan, Xu, Gandour, & Cariani, 2005; Wang, Behne, Jongman, & Sereno, 2004; Wong, 2002; Xu, Gandour, & Francis, 2006). Short-term training in tone identification has been shown to facilitate nonnative tone identification for tone language speakers, but the results for nontone language speakers are mixed (e.g., Francis et al., 2008; Wang et al., 1999; Wayland & Guion, 2004).

What about nontonal speakers who have exposure to a tone language beyond one or a few experimental sessions? A relatively small body of work has found that experienced/advanced L2 learners of a tone language perform well, or even close to native level, in tone perception on monosyllables (Hao, 2012; Pelzl, Lau, Guo, & DeKeyser, 2019; Wang et al., 1999; Zhang, 2011). This suggests that experienced/advanced L2 learners of a tone language appear to have little difficulty perceiving tones at the phonetic level.

Still, the ability to identify/distinguish different tone categories perceptually does not entail the ability to encode pitch patterns as abstract tone categories at the syllable level (difficulty [2]) or to use abstract tone categories as lexical cues (difficulty [3]). Wong and Perrachione (2007) stress the importance of examining the phonetic-phonological-lexical continuity in speech learning. Recent research in L2 speech learning has often shown a “discontinuity” between the phonetic level and the lexical level: the ability to identify/categorize L2 sounds phonetically does not necessarily predict success in lexical tasks (i.e., word learning in the target language) (e.g., Sebastián-Gallés & Díaz, 2012 at the segmental level). As for L2 tone processing and learning, while a few studies have demonstrated that nontonal speakers can in general make significant progress on learning to associate tonal words with pictures (instead of tonal labels only) after a few experimental sessions, large individual differences exist (e.g., Bowles, Chang, & Karuzis, 2016; Chandrasekaran, Sampath, & Wong, 2010; Perrachione, Lee, Ha, & Wong, 2011; Wong & Perrachione, 2007). Pelzl et al. (2019, p. 4) pointed out that participants in these studies learned “only a couple dozen words, all of which form minimal tone contrasts with other known words in the training vocabulary (e.g., vs. vs. ), thereby enhancing the salience of tones as a lexical feature. These studies thus are limited in their ability to reflect the realities of the much larger L2 Mandarin lexicon that experienced learners have acquired.” However, Malins and Joanisse (2010, 2012) used eye-tracking and event-related potential measures to demonstrate that the time course of lexical tone processing in native speakers is different from that of acoustic-phonetic perception of tones. Wiener, Ito, and Speer (2018) showed that L2 lexical tone processing draws on lexical information on which listeners cannot draw in a typical tone discrimination task. Also, while experienced L2 listeners have been found to perform comparably to native listeners on monosyllabic tone identification tasks (e.g., Lee, Tao, & Bond, 2009), a few studies have demonstrated that even advanced L2 learners struggle with tone perception in disyllabic or polysyllabic stimuli (e.g., Hao, 2012; Pelzl et al., 2019; Zhang, 2011). For example, Pelzl et al. (2019) found that L1 English advanced learners of Mandarin showed near-native performance in a tone identification task for isolated words, but struggled in a lexical decision task and a semantic judgment task that depended on the processing of tones on disyllabic words. As a speculation, the authors attributed this “discontinuity” to English learners’ inability to “repurpose” pitch patterns from intonational cues to lexical cues, but their study did not provide direct evidence for this speculation: alongside other confounds (e.g., the small sample size and the wide age range of the participants), the difficulty arising from tonal coarticulation could have made individual tones in the disyllabic words in their study hard to perceive (Chang & Bowles, 2015).

However, these studies on the “lexical learning” of tones conflate lexical tone processing at the phonological level (abstraction and categorization) with that at the lexical level (processing of word meaning), and assume that participants’ performance on lexical tasks directly reflects their ability to form phonological tonal contrasts. While successful “lexical learning” of tones (e.g., being able to contrast words/meanings with tones) may entail successful “phonological learning” of tones (i.e., possessing abstract tone categories), the absence of learning effects in relevant lexical or semantic tasks (despite success at the phonetic level) in those studies could have been due to difficulty at both the phonological and lexical levels, and their relative difficulty for learners remains unclear. In this article we tease apart the phonological level from the phonetic and lexical levels in tone learning and investigate learners’ ability to establish abstract tone categories at the syllable level.

PRIOR MUSICAL TRAINING AND THE RELATIONSHIP BETWEEN MUSIC AND SPEECH

Music and speech are commonly believed to be closely related, as both involve the use of pitch variation. Such overlap in the use of pitch raises interesting issues about the relationship between music and speech, specifically whether pitch processing is specific to the context in which it is learned or whether pitch processing advantages span different contexts. The study of music and speech has broader implications for the fundamental question in human cognition of how different “domains” in the mind may interact with one another. The majority of prior research focused on the perception of musical pitch and linguistic pitch by musicians and nonmusicians. A number of studies have demonstrated that musicians generally outperform nonmusicians in the identification of lexical tones (e.g., Gottfried, 2007; Lee & Hung, 2008; Mok & Zuo, 2012; Wayland, Herrera, & Kaan, 2010), suggesting that prior musical training may facilitate lexical tone perception. There is also neurological evidence that musicians display enhanced processing of fundamental frequency (the acoustic correlate of pitch) (e.g., Bidelman, Gandour, & Krishnan, 2011; Lee, Lee, & Shr, 2011; Wong et al., 2007). Further, Deutsch, Dooley, Henthorn, and Head (2009) found that tone language speakers performed significantly better on a test of absolute pitch than nontone language speakers.
A recent study also revealed that Mandarin-speaking children performed significantly better at processing relative pitch than English-speaking children, even though both groups performed similarly on a control music task (timbre discrimination), suggesting that L1 tone language experience may enhance musical pitch perception in children because it drives attention to pitch in nonlinguistic contexts (Creel, Weng, Fu, Heyman, & Lee, 2018). These findings suggest that there is overlap and interaction between the musical and linguistic domains, and that music and speech share very similar, if not the same, neural circuitry (Deutsch, Henthorn, & Dolson, 2004) and pitch processing mechanisms (e.g., Bradley, 2012; Perrachione, Fedorenko, Vinke, Gibson, & Dilley, 2013). The OPERA hypothesis was proposed to explain why prior musical training benefits the neural encoding of speech (see Patel, 2011 for details). Deutsch et al. (2009) even speculated that the acquisition of absolute pitch by tone language speakers utilizes the same mechanisms involved in the acquisition of a tone language.

However, there is also counterevidence suggesting the separation of the musical and linguistic domains in the use of pitch. For example, Jiang, Hamm, Lim, Kirk, and Yang (2010) and Nan, Sun, and Peretz (2010) found no transfer effects from experience with the linguistic use of pitch to musical pitch perception. Mok and Zuo (2012) found no facilitating effect of musical training on native lexical tone perception. Also, pitch range can vary considerably across individual talkers in their production of lexical tones, so success in acquiring L2 lexical tones hinges crucially on learners’ ability to normalize and abstract tonal pitch contours across multiple tone tokens produced by different talkers and to categorize them into different tone categories. In this regard, Wayland et al. (2010) found that although musicians performed better at pitch contour identification than nonmusicians, their pitch contour abstraction and categorization ability was comparable to that of nonmusicians, suggesting that prior musical training does not facilitate the abstraction and categorization of lexical tone categories.

Still, a crucial issue with regard to the relationship between music and speech remains unanswered: whether musical training may facilitate the formation of abstract tone categories at the syllable level by nontonal language speakers, which is the prerequisite for the use of tone categories as lexical cues for contrasting word meaning. This question has not been directly addressed in the L2 tone learning literature and will be the main research question of the present study. Findings will have implications for not only L2 tone learning but also the relationship between music and speech.

WHY STUDY L2 TONE LEARNING THROUGH THE INCIDENTAL LEARNING OF TONE-SEGMENT CONNECTIONS

At the methodological level, most previous studies on L2 tone perception and processing employed tasks that involved the use of explicit knowledge related to tones (e.g., tone/word identification, lexical decision, or semantic judgment based on minimal pairs that differ only in tone). However, Pelzl et al. (2019) found that accurate explicit knowledge of the tones and meanings of the target vocabulary items, as revealed in an explicit vocabulary knowledge test, was apparently enough for success in the tone identification task but not in the lexical decision task. They speculated that, for nontonal L1 speakers, L2 tones may be difficult to acquire at the automatized/implicit level.

The preceding review suggests that (a) the use of nonexplicit knowledge should be tested in L2 tone processing; and (b) we need a learning target that requires the encoding of abstract tone categories at the syllable level (the phonological level) while avoiding confounds such as tonal coarticulation effects (phonetic level), the processing of meaning, and prior lexical knowledge in the target language (lexical-semantic level). The present study addresses these issues with a novel approach involving the incidental learning of tone-segment connections.

In tone languages (e.g., Thai, Vietnamese, Cantonese), a given syllable may in principle carry different tones to contrast word meaning. In some tone languages, however, the segments (consonants and vowels) of a syllable may constrain the possible tone that syllable can carry (a kind of tonotactics/tonal phonotactics, or “tone-segment connections” hereafter). For example, in many Chinese languages such as Cantonese and Hakka, entering tones (Footnote 2) only appear in syllables with a stop consonant such as /p/, /t/, or /k/ in coda position (Bauer & Benedict, 1997; Lee & Zee, 2009; Sagart, 1999). In Thai, the tone of a syllable is determined by a complex interplay among initial consonant class, vowel length, and syllable type; for instance, the mid tone and the rising tone only occur in syllables ending with a nasal stop, a glide, or an open vowel (see Sladen, 2009 for details). The learning of tone-segment connections hinges on the perception and encoding of both segments and abstract tone categories at the syllable level, and thus tone-segment connections are suitable learning targets for testing learners’ ability to form novel abstract tone categories.

The incidental learning paradigm was used. Incidental learning generally refers to the process by which information in the environment is picked up without a conscious intention to learn (e.g., acquiring new words when reading a novel as a leisure activity) (Hulstijn, 2011). In contrast, explicit/intentional learning involves conscious intention and effort to learn and mainly results in conscious knowledge (Hulstijn, 2011). In laboratory settings, a typical incidental learning experiment involves subjects learning one aspect of the stimulus while paying attention to another; for instance, participants may pick up some grammar patterns while completing a meaning-focused task. If, in addition, the participants are unaware of the learning targets during the learning process, the situation is referred to as implicit learning (Williams, 2009). The present study focuses on the incidental learning mechanism in general.

It has been argued that implicit/incidental learning mechanisms, often postulated as domain-general, play a fundamental role in various social behaviors and everyday situations, including first language acquisition (Reber, 1993). Still, the extent to which incidental/implicit learning mechanisms are involved in SLA remains controversial (see Andringa & Rebuschat, 2015; Chan & Leung, 2018; Hulstijn & Ellis, 2005 for reviews). A major line of research on incidental/implicit language learning thus lies in determining the scope of the adult capacity for acquiring knowledge of L2/novel language patterns without intention and/or awareness. Recently, research on the incidental/implicit learning of various aspects of language has surged (e.g., Rebuschat & Williams, 2012, and Williams & Kuribara, 2008 on syntax; Leung & Williams, 2011, 2012, 2014 on form-meaning connections; Grey, Williams, & Rebuschat, 2014, and Rogers, Revesz, & Rebuschat, 2016 on morphology). There is a relatively small but growing body of work on phonological patterns. For instance, Dell, Reed, Adams, and Meyer (2000) demonstrated that implicit knowledge of novel artificial phonotactics ([f] always occurred as an onset and [s] as a coda), as revealed by speech errors, may be acquired through incidental exposure. Warker and Dell (2006) showed a similar incidental learning effect for second-order phonotactic constraints (e.g., if the vowel is [æ], [g] must occur as an onset and [k] as a coda, but if the vowel is [ɪ], [k] must occur as an onset and [g] as a coda). In the prosodic domain, recent studies have provided evidence that implicit knowledge of novel stress patterns may be acquired through incidental exposure (Chan & Leung, 2014, 2018; Graham & Williams, 2018). This raises the possibility of acquiring implicit knowledge of other prosodic patterns incidentally/implicitly, and makes the acquisition of implicit knowledge of tone-segment connections through incidental exposure a theoretically interesting research question. As discussed in the preceding text, the learning of tone-segment connections hinges on the perception and encoding of both segmental and tone categories at the syllable level. This may pose additional challenges to learners, especially those whose L1 is a nontone language. Therefore, another goal of this article is to determine whether implicit knowledge of novel tone-segment connections may be learned through incidental exposure, and whether such learning may be constrained by learners’ prior tonal experience and musical training.

A further question in implicit/incidental learning research concerns the nature of the resultant knowledge: specifically, whether incidental/implicit learning may lead to abstract implicit knowledge (see Chan & Leung, 2018 for a detailed discussion). Implicit knowledge refers to unconscious knowledge that one is unaware of possessing, while explicit knowledge refers to conscious knowledge that one is aware of possessing and may be able to verbalize (Hulstijn & Ellis, 2005). Previous research has revealed a complex relationship between the conscious state of the learning process and that of the resultant knowledge: implicit/incidental learning does not necessarily lead to implicit knowledge, and explicit/intentional learning does not necessarily lead to explicit knowledge. For instance, implicit learning may first result in implicit knowledge, but with continued exposure to the stimuli one may develop an “insight” and relevant explicit knowledge of the underlying patterns. Conversely, explicit learning may first lead to explicit knowledge, but with enough practice this may transform into implicit and automatized knowledge that influences behavior without consciousness (Williams, 2009). In fact, both implicit and explicit knowledge may develop regardless of the learning process (e.g., Godfroid, 2016; Rebuschat & Williams, 2012; Rogers et al., 2016). Therefore, it is important to assess the conscious status of the knowledge participants acquire after incidental exposure (Rebuschat, 2013). As for the abstractness of the resultant knowledge, controversy has centered on whether the knowledge resulting from incidental/implicit learning is abstract and potentially rule based, or is merely based on memorized chunks/fragments or details of particular exemplars (see Chan & Leung, 2018 for a review). In the present study, participants’ structural knowledge of tone-segment connections resulting from incidental exposure will be assessed using a source attribution task (Dienes, 2008), and the abstractness of the resultant knowledge will be evaluated based on transfer tests (e.g., Altmann, Dienes, & Goode, 1995) (see “Testing Phase” in the Methods section for details).

RESEARCH QUESTIONS

In sum, the research questions of the present study are as follows:

  (1) Can tone-segment connections be acquired through incidental exposure?

  (2) If so, is the resultant knowledge abstract and implicit?

  (3) What are the effects of prior musical training and tonal experience on the encoding of pitch patterns as abstract tone categories at the syllable level, as reflected in the learning of novel tone-segment connections?

METHODS

PARTICIPANTS

Four groups of participants (N = 80) aged 17–30 were recruited for the study (20 Cantonese musicians, 20 Cantonese nonmusicians, 20 English musicians, and 20 English nonmusicians); their background information is summarized in Table 1. All Cantonese participants spoke English and Mandarin as L2s, ranging from intermediate to advanced level for English and from beginner to advanced level for Mandarin based on self-report. Some L1 English participants spoke other nontonal languages – including German, Spanish, French, Hungarian, Dutch, Welsh, Swedish, and Persian – ranging from beginner to advanced level based on self-report, but none of them reported any knowledge of any tone language.

TABLE 1. Background information of each participant group.

Musicians and nonmusicians were defined as follows, based on Mok and Zuo (2012):

  • Musicians: participants with 6 or more years of formal training in singing or any instrument who have played music regularly in the past 2 years;

  • Nonmusicians: participants with no more than 2 years of casual musical experience/training who have not played music regularly in the past 2 years.

People with musical experience between the two categories were not recruited for this study as their musical background was too ambiguous.
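The two screening definitions above amount to a simple decision rule, with an in-between zone that leads to exclusion. A minimal sketch, assuming self-reported years of training and recent regular playing as inputs; the function and argument names are illustrative, not taken from the study's materials:

```python
# Illustrative sketch of the musician/nonmusician screening criteria
# (adapted from Mok & Zuo, 2012). Names are hypothetical.

def classify(years_of_formal_training: float,
             played_regularly_past_2_years: bool) -> str:
    """Return the participant group, or 'excluded' for ambiguous backgrounds."""
    if years_of_formal_training >= 6 and played_regularly_past_2_years:
        return "musician"
    if years_of_formal_training <= 2 and not played_regularly_past_2_years:
        return "nonmusician"
    return "excluded"  # in-between musical experience: not recruited

print(classify(8, True))   # a clear musician case
print(classify(4, True))   # ambiguous background: excluded
```

Note that under these criteria a participant must satisfy both conditions of a definition; for example, someone with many years of training who has not played recently falls into the excluded zone.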

LEARNING TARGETS

The learning targets involved two artificial rules on the mappings between onset consonant and tone as illustrated in the following text:

  (1) Words beginning with an aspirated stop (e.g., /pʰ/, /tʰ/, or /kʰ/) always carry a rising tone (R) (e.g., /pʰoːmR/, /tʰɔːŋR/, /kʰaːnR/).

  (2) Words beginning with an approximant (e.g., /l/, /w/, or /j/) always carry a falling tone (F) (e.g., /loːmF/, /wɔːŋF/, /jaːnF/).

Only monosyllabic words were used in the experiment, as tonal coarticulatory effects would be a confound for the study of tone processing if polysyllabic words were used. To fully learn the rules mentioned previously, participants had to be able to (a) perceptually distinguish the two classes of consonants (aspirated stops vs. approximants) and the two lexical tones (rising vs. falling); and, more importantly, (b) form the relevant segmental and abstract tone categories at the syllable level and pick up their connections. Because our focus was to determine whether participants could form novel tone categories at the syllable level, we deliberately chose tones that should be easily distinguishable (rising vs. falling) at the phonetic-acoustic level, and used phonological natural classes that are present in both English and Cantonese. No processing of meaning, which was a confound in many previous studies on lexical tone processing, was required.
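The two artificial rules form a deterministic mapping from onset class to tone. A minimal sketch, using ASCII stand-ins for the IPA symbols ("ph" for the aspirated stop /pʰ/, and so on); the function names are ours, not from the study:

```python
# Sketch of the two artificial tone-segment rules. "ph", "th", "kh" are
# ASCII stand-ins for the aspirated stops /pʰ/, /tʰ/, /kʰ/; names are hypothetical.

ASPIRATED_STOPS = {"ph", "th", "kh"}  # must carry the rising tone (R)
APPROXIMANTS = {"l", "w", "j"}        # must carry the falling tone (F)

def predicted_tone(onset: str) -> str:
    """Tone a rule-conforming nonce word must carry, given its onset."""
    if onset in ASPIRATED_STOPS:
        return "R"
    if onset in APPROXIMANTS:
        return "F"
    raise ValueError(f"onset {onset!r} falls outside the two rule classes")

def conforms(onset: str, tone: str) -> bool:
    """True if the (onset, tone) pairing obeys the artificial rules."""
    return predicted_tone(onset) == tone

print(conforms("ph", "R"))  # rule-conforming pairing
print(conforms("l", "R"))   # rule-violating pairing
```

The sketch makes explicit why full learning requires both components in (a) and (b): a learner must categorize the onset into one of the two natural classes and categorize the pitch contour into an abstract tone, before the mapping between them can be acquired.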

Participants’ prior linguistic knowledge may play a role in the learning of the target patterns. At the segmental level, /ph/, /th/, and /kh/ and /l/, /w/, and /j/ are phonological natural classes of aspirated stops and approximants respectively in both Cantonese and English. As such, all participants in the experiment have similar starting points for the segmental part of the learning targets.Footnote 3 At the prosodic level, Cantonese contrasts six phonemic tones: T1[55] high level, T2[25] high rising, T3[33] mid-level, T4[21] low falling, T5[23] low rising, and T6[22] low level (Bauer & Benedict, 1997). The Cantonese learners also spoke Mandarin as an L2 that contrasts four phonemic tones on monosyllables: T1[55] high tone, T2[25] rising tone, T3[214] dipping/low tone, and T4[51] falling tone (Norman, Reference Norman1988). The rich pitch contrasts in the Cantonese and Mandarin tone systems, and in particular the presence of both rising and falling tones, may facilitate their perception of novel rising and falling tones in the learning targets (e.g., they may perceptually assimilate the target tones to the Cantonese and/or Mandarin counterparts as predicted by the Perceptual Assimilation Model (So & Best, Reference So and Best2010). Also, native Cantonese speakers should be able to encode novel abstract tone categories at the lexical level as tones are used contrastively in Cantonese at the syllable level. However, native English speakers should have little problem distinguishing between a rising tone and a falling tone as the two pitch contours are frequently used in English intonation (Wells, Reference Wells2006). They could potentially assimilate the novel tones in the learning targets to their intonational rise and fall of the English tonal system (So & Best, Reference So and Best2010). 
However, although contrastive pitch patterns may also be found on monosyllabic words in English (e.g., the single word “me” may carry a rising tone or a falling tone to convey uncertainty and definiteness, respectively), intonation in English typically operates at the utterance level, involving more than one syllable, and English does not contrast word meaning through syllable-based pitch variation. Native English speakers may therefore not be able to encode abstract tone categories at the syllable level and thus may fail to learn the target tone-segment connections.

STIMULI AND PROCEDURE

All stimuli used in the present study were monosyllabic nonce words generated by the Salika 2011 Thai speech synthesizer,Footnote 4 and thus the phonetic realizations of the stimuli were based on Standard Thai. Thai was chosen to minimize the effects of prior linguistic knowledge for all the participants. None of the participants reported any knowledge of Thai. The experiment was administered on a computer using E-Prime.

The learning of the target tone-segment connections was assessed with a word learning task. Before that, an AX discrimination task was administered to test whether participants could perceptually distinguish the two tones in the learning targets (rising and falling).

AX Discrimination Task.

The goal of the task was to test whether participants could distinguish the two tones (rising and falling) in the learning targets. The monosyllabic nonce words used in the task had either a CV or a CVV structure, which were concatenations of phonemes and tones listed as follows:

  • Onset: /s/, /h/, or /f/

  • Nucleus: /i:/, /e:/, /ɯː/, /ɤː/, /u:/, /o:/, /ɔː/, /iːa/, /uːa/, or /ɯːa/

  • Tone: rising (R) or falling (F)

The consonants /s/, /h/, and /f/ were chosen to avoid overlap with the phonological natural classes of the consonants involved in the learning targets. The concatenations listed resulted in 60 different monosyllabic nonce words (3 × 10 × 2). In each trial, participants were auditorily presented with two monosyllabic words with an interstimulus interval (ISI) of 500 ms. They were instructed to indicate whether the two words had the same pitch pattern using a serial response box and to make their decision as quickly as possible. The two monosyllabic words in each trial formed either an AA pair (same-tone pair) or an AB pair (different-tone pair). Sixty AA pairs were formed by repeating the monosyllabic words (e.g., /fi:R/-/fi:R/; /hɤːF/-/hɤːF/). For the AB pairs, the two words shared the same segmental content and differed only in the tone they carried (e.g., /suːaR/-/suːaF/). The order of the AB pairs was counterbalanced (i.e., both AB and BA pairs were used), resulting in 60 AB pairs.
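The stimulus and pair construction just described can be sketched as follows. This is a minimal illustration only; the phoneme strings are plain-text stand-ins for the IPA transcriptions, and the variable names are our own.

```python
from itertools import product

# AX-task inventories as described in the text (plain-text IPA stand-ins)
onsets = ["s", "h", "f"]
nuclei = ["i:", "e:", "ɯ:", "ɤ:", "u:", "o:", "ɔ:", "i:a", "u:a", "ɯ:a"]
tones = ["R", "F"]  # rising, falling

# 3 onsets x 10 nuclei x 2 tones = 60 nonce words
words = [o + n + t for o, n, t in product(onsets, nuclei, tones)]
assert len(words) == 60

# AA pairs: each word paired with itself (same-tone trials)
aa_pairs = [(w, w) for w in words]

# AB pairs: same segments, different tone, both orders (AB and BA)
ab_pairs = []
for o, n in product(onsets, nuclei):
    ab_pairs.append((o + n + "R", o + n + "F"))
    ab_pairs.append((o + n + "F", o + n + "R"))
assert len(aa_pairs) == 60 and len(ab_pairs) == 60
```

The counterbalancing falls out naturally: each of the 30 segmentally distinct syllables contributes exactly one rising-first and one falling-first pair.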

A short practice session was given at the beginning, followed by the 120 trials of the AX discrimination task (60 AA pairs and 60 AB pairs). Both accuracy and reaction time data were collected. No feedback was given during the actual task. The two target tones were expected to be easy for all participants to distinguish. If Cantonese and English participants performed similarly well in this task, a further question would be whether the participant groups would show differential success in learning the target tone-segment connections.

Word Learning Task

The general design of the word learning task was adapted from Chan and Leung (Reference Chan and Leung2014, Reference Chan and Leung2018). All the monosyllabic words used in this task had a CVC structure. The task consisted of two phases: the training phase and the testing phase.

Training Phase

The goal of the training phase was to expose participants to the target tone-segment connections incidentally (i.e., without stating the target rules). The 72 (4 × 6 × 3) monosyllabic nonce words used in the training phase were concatenations of the following phonemes:

  • Onset: /ph/, /th/, /l/, or /w/

  • Nucleus: /i:/, /u:/, /o:/, /ɛː/, /ɔː/, or /aː/

  • Coda: /m/, /n/, or /ŋ/

Words with an aspirated stop (i.e., /ph/ and /th/) in the onset always carried a rising tone, whereas those with an approximant (i.e., /l/ and /w/) in the onset always carried a falling tone. By creating stimuli from concatenations of phonemes, the frequency of each phoneme in the nucleus and coda was the same for words with an aspirated stop onset and words with an approximant onset. This served to prevent participants from relating the tone a word carried to anything other than the nature of the onset consonant.
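The balance property of the training set (tone fully predictable from the onset class, but independent of nucleus and coda) can be sketched as follows. This is a hypothetical illustration; items are represented as tuples with plain-text stand-ins for the IPA symbols.

```python
from collections import Counter
from itertools import product

# Target rule: aspirated-stop onsets -> rising (R); approximant onsets -> falling (F)
onsets = {"ph": "R", "th": "R", "l": "F", "w": "F"}
nuclei = ["i:", "u:", "o:", "ɛ:", "ɔ:", "a:"]
codas = ["m", "n", "ŋ"]

training = [(o, n, c, tone)
            for (o, tone), n, c in product(onsets.items(), nuclei, codas)]
assert len(training) == 72  # 4 onsets x 6 nuclei x 3 codas

# Each nucleus occurs equally often under each tone (2 onsets x 3 codas = 6),
# so the tone is predictable only from the onset class, not from the rhyme.
per_tone_nucleus = Counter((t, n) for _, n, _, t in training)
assert set(per_tone_nucleus.values()) == {6}
```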

Participants were told that they were going to learn words in an unknown language but were not told anything about the language. In each trial, participants were auditorily presented with a word. They were asked to listen to the word carefully and repeat it aloud as accurately as possible. To encourage the participants to pay attention to the pronunciation of the words, they were told that their voice would be recorded. No visual information on the spelling or meaning of the word was provided. Importantly, no explicit information about the connections between the initial consonant and tone type (i.e., the learning targets) was provided. Participants’ pronunciation accuracy during the training phase was not the focus of the experiment and thus the production data were not analyzed. The 72 nonsense words were randomly presented and were repeated in four different blocks to form a total of 288 trials.

The design of the training phase was in line with the “noticing” hypothesis in implicit/incidental learning (Schmidt, Reference Schmidt1990, Reference Schmidt and Robinson2001, Reference Schmidt, Chan, Chi, Cin, Istanto, Nagami, Sew and Walker2010), which stipulates that “noticing” (conscious registration of attended input) but not “understanding” (a higher level of awareness that involves generalizations across instances, such as knowledge of rules and metalinguistic awareness) is required for the incidental/implicit learning of language patterns. In the training phase of the present study, incidental/implicit learning of the target rules was hypothesized to depend on attention to the pronunciations of the segments and tonesFootnote 5 through practice, but generalizations of tone-segment connections beyond individual training items and the emergence of relevant abstract implicit knowledge could take place using a basic associative learning mechanism in humans that automatically extracts patterns across instances, leading to a nonverbalizable and intuitive form of knowledge (Schmidt, Reference Schmidt, Chan, Chi, Cin, Istanto, Nagami, Sew and Walker2010).

Testing Phase

The testing phase consisted of a pronunciation judgment task and a source attribution task. In each trial of the pronunciation judgment task, participants were auditorily presented with two words and they were asked to choose the one that “sounds better” to them (instead of “choose the correct one”). This encouraged the participants to use their intuition rather than explicit knowledge in their judgment (Scott & Dienes, Reference Scott and Dienes2008).

Previous studies on incidental/implicit learning (e.g., Altmann et al., Reference Altmann, Dienes and Goode1995; Gomez and Gerken, Reference Gomez and Gerken1999; Reber, Reference Reber1993) have demonstrated that if participants possess abstract and potentially rulelike knowledge of the learning targets, instead of merely memory of the training items, they should be able to transfer their knowledge to novel words/items. In this study we included two kinds of items: critical items and extension items (Chan & Leung, Reference Chan and Leung2014, Reference Chan and Leung2018). Sound pairs for the critical and extension items (see details in the following text) differed only in the tone they carried (e.g., /phɯːmR/ vs. /phɯːmF/). As such, only if participants possessed knowledge of the target tone-segment mappings would they show a preference for words that follow the target rules (e.g., /phɯːmR/ in the preceding case).

The critical items were concatenations of the following phonemes, resulting in 36 novel words (4 × 3 × 3):

  • Onset: /ph/, /th/, /l/, or /w/

  • Nucleus: /ɯː/, /eː/, or /ɤː/

  • Coda: /m/, /n/, or /ŋ/

In other words, the critical items differed from the items in the training phase only in the vowel in the nucleus. If participants had acquired abstract knowledge of the tone-segment connections in the training phase, rather than merely memorizing the items encountered there, they should show a preference for novel words that follow the target rules in the critical trials despite the change in vowel.

The extension items, in contrast, were concatenations of the following phonemes, resulting in another 36 novel words (2 × 6 × 3). They differed from the items in the training phase only in their onset consonants:

  • Onset: /kh/, /j/

  • Nucleus: /i:/, /u:/, /o:/, /ɛː/, /ɔː/, or /aː/

  • Coda: /m/, /n/, or /ŋ/

Therefore, if participants had acquired abstract and potentially rulelike knowledge of the target patterns (e.g., if the onset consonant is an aspirated stop/approximant, the word should carry a rising/falling tone or other correlated rules; see the “Discussion” section for details), they should be able to transfer their knowledge to words with a new onset consonant and show a preference for words that followed the learning targets in their judgment.
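Under the same tuple representation, the relationship between the training, critical, and extension sets can be verified programmatically (a sketch with plain-text stand-ins for the IPA symbols):

```python
from itertools import product

codas = ["m", "n", "ŋ"]
trained_onsets = ["ph", "th", "l", "w"]
trained_nuclei = ["i:", "u:", "o:", "ɛ:", "ɔ:", "a:"]

training = set(product(trained_onsets, trained_nuclei, codas))
# Critical items: trained onsets, new nucleus vowels
critical = set(product(trained_onsets, ["ɯ:", "e:", "ɤ:"], codas))
# Extension items: new onsets, trained nucleus vowels
extension = set(product(["kh", "j"], trained_nuclei, codas))

assert len(critical) == 36 and len(extension) == 36  # 4x3x3 and 2x6x3
# Every test item is novel: no overlap with the 72 training items
assert critical.isdisjoint(training) and extension.isdisjoint(training)
```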

All stimuli in the filler trials were words previously encountered in the training phase. The purpose of having 50% of the trials as fillers was to disguise the purpose of the test and discourage the use of explicit knowledge (Keating & Jegerski, Reference Keating and Jegerski2015). To sum up, the pronunciation judgment task involved 144 trials: 36 critical trials, 36 extension trials, and 72 filler trials.

The conscious state of participants’ structural knowledge of the learning targets was also assessed through a source attribution task (Dienes, Reference Dienes2008). In each trial, after the pronunciation judgment, participants were instructed to state the basis of their judgment as “guess,” “intuition,” “recollection,” or “rule/rules,” defined as follows: “guess” – you were making a wild guess, like flipping a coin; “intuition” – your judgment was based on a feeling that you cannot explain; “recollection” – your judgment was based on a recollection; and “rule/rules” – your judgment was based on one or more rules or partial rules that you can state. “Guess” and “intuition” attributions reflect unconscious structural knowledge, while “recollection” and “rule/rules” attributions reflect conscious structural knowledge (Dienes & Scott, Reference Dienes and Scott2005).

RESULTS

AX DISCRIMINATION TASK

We first examined participants’ sensitivity to differences between sounds in the AB pairs (d’) and potential bias toward certain responses (decision criterion) (c) based on Signal Detection Theory (Macmillan & Creelman, Reference Macmillan and Creelman2005), with “hit,” “miss,” “false alarm,” and “correct rejection” defined as shown in Table 2.

TABLE 2. Definitions of “hit,” “miss,” “false alarm,” and “correct rejection” for participants’ responses in the AX discrimination task.

Both d’ and c were computed using the psycho package (Makowski, 2018) in R (R Core Team, 2013), and Figure 1 presents the average d’ (left panel) and c (right panel) of each group. Most participants had a d’ score over 3, suggesting that they were all highly sensitive to the differences between the sound pairs in the AB trials. A one-way ANOVA revealed no significant difference in sensitivity among the four groups, F(3, 76) = 0.154, p = 0.927. The four groups showed a small variation in response bias (c), although most participants’ c scores were close to zero, indicating little overall response bias. A one-way ANOVA revealed an overall significant difference in response bias, F(3, 76) = 4.301, p = 0.007. Post-hoc Tukey pairwise multiple comparisons (Table 3) revealed no significant differences except that English nonmusicians had a significantly higher criterion for responding “same” than Cantonese musicians. Overall, these results are not surprising, as the two tones in the stimuli were meant to be easy for all participants to distinguish perceptually.
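The d’ and c indices follow standard signal detection formulas; the sketch below is a minimal Python version for illustration. The hit/false-alarm convention and the log-linear correction for extreme rates are our assumptions here; the psycho package used in the actual analysis may apply a different correction.

```python
from statistics import NormalDist

def dprime_and_c(hits, misses, fas, crs):
    """Sensitivity (d') and criterion (c) from response counts.
    Illustrative convention: a 'hit' is responding 'different' to an AB
    pair; a 'false alarm' is responding 'different' to an AA pair."""
    z = NormalDist().inv_cdf
    # Log-linear correction avoids infinite z-scores at 0% or 100% rates
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (fas + 0.5) / (fas + crs + 1)
    return z(hit_rate) - z(fa_rate), -(z(hit_rate) + z(fa_rate)) / 2

# Hypothetical participant: 57/60 hits on AB pairs, 3 false alarms on AA pairs
d, c = dprime_and_c(57, 3, 3, 57)
assert d > 3      # highly sensitive, as most participants were
assert abs(c) < 1e-6  # symmetric errors give no response bias
```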

FIGURE 1. Sensitivity (d') (left) and response bias (c) (right) of the four participant groups in the AX discrimination task.

TABLE 3. Post-hoc Tukey pairwise multiple comparisons for the response bias (c) of the four participant groups

Because this task aimed to test participants’ ability to distinguish the two tones in the learning targets, we now focus only on participants’ responses to the AB pairs. The accuracy (number of correct responses) and reaction time data of the four participant groups were analyzed. For each participant, reaction time values beyond ±2.5 standard deviations from their mean were treated as outliers and excluded from analysis. The reaction time data were log-transformed for normalization before any further statistical analyses. Figure 2 shows the average accuracy and log-reaction time (logRT) of the four groups for the AB pairs in the AX discrimination task. In general, the four groups achieved very high accuracy (92.4% to 95%) and their average logRTs were very close (2.55 to 2.73).
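The reaction-time cleaning procedure (per-participant outlier exclusion at ±2.5 SD, then log-transformation) can be sketched as follows; the RT values are hypothetical.

```python
import math
from statistics import mean, stdev

def clean_log_rt(rts, cutoff=2.5):
    """Drop RTs beyond +/- cutoff SDs of the participant's mean, then
    log-transform the remainder (sketch of the procedure described above)."""
    m, s = mean(rts), stdev(rts)
    kept = [rt for rt in rts if abs(rt - m) <= cutoff * s]
    return [math.log(rt) for rt in kept]

# Hypothetical RTs in ms; the 5000 ms response is an extreme outlier
rts = [480, 500, 510, 520, 530, 540, 550, 560, 570, 5000]
log_rts = clean_log_rt(rts)
assert len(log_rts) == 9  # the outlier is excluded before log-transformation
```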

FIGURE 2. Accuracy and log-transformed reaction time of the four groups for the AB pairs in the AX discrimination task (C = Cantonese, E = English; error bar = 1 SE).

Mixed-effects models (R package lme4; Bates & Maechler, 2009) were used to determine the effects of tonal experience and musical background (fixed effects) on participants’ accuracy (generalized linear mixed-effects models, GLMMs) and reaction time (linear mixed-effects models, LMMs). Participants and items (AB pairs) were included as random effects, with by-participant and by-item random slopes for the effect under investigation. For example, a full model for testing the effect of tonal experience was coded in R as “full_model <- glmer(Accuracy ~ Musical Background + Tonal Experience + (Tonal Experience|Participants) + (Tonal Experience|Item)).” Each individual effect was tested by a likelihood ratio test of the full model against a reduced model that excluded the effect to be tested.Footnote 6 Interaction effects were tested by comparing models with and without the interaction (e.g., R code: Musical background * Tonal experience vs. Musical background + Tonal experience) with random intercepts for the random effects. Results are presented in Table 4. Musical background showed no significant effect on participants’ accuracy or reaction time. Tonal experience showed a significant effect on reaction time but not accuracy. The interaction between musical background and tonal experience was nonsignificant for accuracy but significant for logRT. These results suggest that participants’ accuracy was largely unaffected by prior musical training and tonal experience, but tonal experience may have a relatively subtle facilitative effect on reaction time, potentially at the automatized level. As expected, distinguishing a rising tone from a falling tone, as in the learning targets, was generally easy for our participants regardless of their tonal experience or musical background.
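The model-comparison logic reduces to a chi-square test on twice the log-likelihood difference between nested models. The sketch below illustrates the case of a single dropped effect (df = 1) with hypothetical log-likelihood values; the actual analyses compared fitted lme4 models in R.

```python
import math

def lrt_pvalue_df1(loglik_full, loglik_reduced):
    """Likelihood ratio test for one dropped fixed effect (df = 1).
    For df = 1 the chi-square survival function is erfc(sqrt(x / 2))."""
    chi2 = 2.0 * (loglik_full - loglik_reduced)
    return math.erfc(math.sqrt(max(chi2, 0.0) / 2.0))

# Hypothetical log-likelihoods: dropping the effect costs 3.2 units,
# giving a chi-square statistic of 6.4 on 1 df
p = lrt_pvalue_df1(-512.4, -515.6)
assert p < 0.05  # the effect would be judged significant
```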

TABLE 4. Summary of the statistics of the mixed model comparisons for the factors of interest for the AX discrimination task (* = significant effect).

PRONUNCIATION JUDGMENT TASK

Figure 3 shows the average accuracy of the four groups for the critical items and extension items. Impressionistically, for both critical items and extension items, English participants performed at around chance level (50%) but Cantonese participants performed considerably above chance level, regardless of their prior musical training.

FIGURE 3. Accuracy of the four groups for the critical items and extension items in the pronunciation judgment task (C = Cantonese, E = English; error bar = 1 SE).

GLMMs were constructed to determine the effects of tonal experience, musical background, and their interaction on participants’ accuracy on the critical items and extension items, with the same procedure as previously described. Again, “tonal experience” and “musical background” were treated as fixed factors and “item” and “participant” as random factors. Results are presented in Table 5: musical background had no significant effect on participants’ accuracy on either critical or extension items, but tonal experience had a significant effect on accuracy for both kinds of items. No significant interaction between musical background and tonal experience was found.

TABLE 5. Summary of the statistics of the mixed model comparisons for the factors of interest for the pronunciation judgment task (* = significant effect).

To determine whether individual groups performed significantly above chance level (i.e., showing a learning effect), further GLMMs were constructed with “Participant Group” (4 levels) as the fixed factor and by-item and by-participant random intercepts.Footnote 16 Ninety-five percent confidence intervals were generated for the prediction of each participant group based on GLMM using the “predict” function in R. Results are presented in Tables 6 and 7. For both critical items and extension items, the 95% confidence intervals of both Cantonese musicians and nonmusicians do not contain the chance level value (50% accuracy), revealing that they performed significantly above chance and their knowledge of tone-segment connections was abstract and potentially rulelike. However, the 95% confidence intervals for English musicians and nonmusicians contain the chance level value, suggesting that they did not perform significantly above chance level.
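The chance-level check operates on the probability scale after back-transforming logit-scale GLMM predictions. The sketch below uses hypothetical estimates and assumes a Wald-type interval (estimate ± 1.96 × SE on the logit scale); the actual intervals were produced with R’s predict function.

```python
import math

def logit_ci(estimate, se, z=1.96):
    """Back-transform a logit-scale estimate and its 95% interval
    to the probability (accuracy) scale."""
    inv_logit = lambda x: 1.0 / (1.0 + math.exp(-x))
    return inv_logit(estimate - z * se), inv_logit(estimate + z * se)

# Hypothetical group estimate on the logit scale
lo, hi = logit_ci(0.9, 0.3)
# The group performed above chance iff the interval excludes 0.5
assert not (lo <= 0.5 <= hi)
```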

TABLE 6. Accuracy (% correct) on and 95% confidence intervals (from upper bound to lower bound) of the critical items for the prediction of each participant group based on GLMM (chance level = 50%).

TABLE 7. Accuracy (% correct) on and 95% confidence intervals (from upper bound to lower bound) of the extension items for the prediction of each participant group based on GLMM (chance level = 50%).

To determine whether each participant group performed significantly differently for words with different onset consonants and tones, another set of GLMMs was constructed, with “Item Type” (2 levels: approximants vs. aspirated stops) as the fixed factor, a by-item random intercept, and a by-participant random slope for the effect of Item Type. GLMMs were built separately for critical items and extension items. Results are presented in Table 8. For all four participant groups, there was no significant difference in accuracy between approximant-initial words and aspirated-stop-initial words for either critical items or extension items, suggesting that in general our participants learned the two target rules on tone-segment connections similarly well.

TABLE 8. Summary of the statistics of the mixed model comparisons on item types in the pronunciation judgment task (* = significant effect).

STRUCTURAL KNOWLEDGE

A further question is whether the Cantonese musicians and nonmusicians, who showed a learning effect for the target tone-segment connections, possessed relevant unconscious structural knowledge. Table 9 shows the number and percentage of responses for each source attribution (guess, intuition, recollection, or rule[s]) for the critical items and extension items by the Cantonese musicians and nonmusicians. For both critical items and extension items, more than 60% of the source attribution responses were implicit attributions (i.e., guess or intuition), with “guess” being the most frequent attribution. This suggests that most of the time the Cantonese participants were simply guessing or using their intuition in their judgments in the critical and extension trials. In general, there were more implicit attributions for extension items than critical items, in line with the fact that the extension items involved new onset consonants that were not encountered in the training phase.

TABLE 9. Number and percentage of responses of each attribution for critical items and extension items (36 critical/extension items * 40 participants = 1,440 responses).

To test whether Cantonese participants showed a learning effect of the target tone-segment connections even when they stated they were guessing or using their intuition, another set of GLMMs were built to determine participants’ accuracy when they stated “guess” or “intuition” (implicit attributions) and 95% confidence intervals were generated in the same way as reported in the preceding text. Results are presented in Tables 10 and 11. For both critical items and extension items, the 95% confidence intervals of both Cantonese musicians and nonmusicians do not contain the chance level value (50% accuracy), revealing that they performed significantly above chance for both kinds of items even when they claimed to be guessing or using their intuition. This reveals that Cantonese participants possessed implicit, abstract, and potentially rulelike knowledge of the target tone-segment connections.

TABLE 10. Accuracy (% correct) on and 95% confidence intervals (from upper bound to lower bound) of the critical items for the prediction of Cantonese participants (with data from their implicit attributions only) based on GLMM (chance level = 50%).

TABLE 11. Accuracy (% correct) on and 95% confidence intervals (from upper bound to lower bound) of the extension items for the prediction of Cantonese participants (with data from their implicit attributions only) based on GLMM (chance level = 50%).

DISCUSSION

The present study sought to determine (a) whether tone-segment connections can be acquired through incidental exposure; if so, (b) whether the resultant knowledge may be abstract and implicit; and (c) the effects of prior musical training and tonal experience on the encoding of pitch patterns as abstract tone categories at the syllable level, as reflected in the learning of tone-segment connections. With regard to goals (a) and (b), we found that Cantonese speakers, regardless of their musical background, developed knowledge of novel tone-segment connections after only brief exposure without explicit feedback. They performed similarly well on the two target rules for both critical items and extension items. With regard to the conscious status of the resultant knowledge, Cantonese speakers possessed implicit structural knowledge of the target tone-segment mappings, as assessed by the source attribution task (Dienes, Reference Dienes2008). We conclude that it is possible for Cantonese learners (and potentially speakers of other tone languages) to acquire implicit knowledge of tone-segment connections through incidental exposure. This adds to an existing body of work demonstrating that different linguistic patterns, especially phonological patterns, can be acquired incidentally by adult L2 learners (Chan & Leung, Reference Chan and Leung2014, Reference Chan and Leung2018; Dell et al., Reference Dell, Reed, Adams and Meyer2000; Grey et al., Reference Grey, Williams and Rebuschat2014; Leung & Williams, Reference Leung and Williams2011, Reference Leung and Williams2012, Reference Leung and Williams2014; Rebuschat & Williams, Reference Rebuschat and Williams2012; Rogers et al., Reference Rogers, Revesz and Rebuschat2016; Warker & Dell, Reference Warker and Dell2006; Williams & Kuribara, Reference Williams and Kuribara2008).

The Cantonese speakers also appeared to have abstract and potentially rulelike knowledge of the target tone-segment connections. Given the strict control of the frequency of phonemes in the nucleus and coda positions for the stimuli in the training and testing phases, their above-chance performance on the critical items and the extension items should have been based on knowledge of the mapping between the onset consonants and the target tones. The Cantonese speakers displayed a learning effect for novel words whose onset consonants were encountered in the training phase (critical items), suggesting that their knowledge of the target tone-segment connections was at a sufficient level of abstraction to transfer to completely new lexical items, rather than relying solely on memory of the training items. Moreover, they also showed a learning effect for novel words with onset consonants not encountered in the training phase (extension items), suggesting that their knowledge is potentially rulelike. While they may have acquired rules described in terms of the phonological classes of the onset consonant and the tone types (i.e., words beginning with an aspirated stop always carry a rising tone, and words beginning with an approximant always carry a falling tone), they might also have learned rules about the mappings of tones with other salient byproducts of the target rules (correlated rules) (e.g., words beginning with aspiration or a “puff of air” always carry a rising tone, or words beginning with a consonant with vowellike quality [a property of approximants] always carry a falling tone).
Although the design of the present study did not allow us to determine whether their knowledge was based on the entire phonological classes or on associated features of those classes, our findings provide evidence that their knowledge of the target tone-segment connections was abstract and potentially rule-based rather than merely based on memory of the training items. Still, it should be noted that the Cantonese speakers’ performance on extension items was worse than that on critical items. This is in line with previous studies that used transfer tests (e.g., Altmann et al., Reference Altmann, Dienes and Goode1995): judgment performance was worse on test items with different surface features (e.g., a different letter set) or in different modalities (e.g., from visual to auditory). Specifically in our study, one main difference between the onset consonants in the training/critical items and those in the extension items lies in the feature [anterior]: the onsets /ph/, /th/, /l/, and /w/ in the training/critical items are [+anterior], whereas the onsets /kh/ and /j/ are [–anterior]. Participants might thus have been less able to apply the knowledge they had acquired in training to the extension items. This interpretation is consistent with the fact that all four participant groups showed lower accuracy on the extension items than on the critical items.

As for goal (c), our study demonstrates that incidental learning of tone-segment connections does not appear to be completely unconstrained. The English speakers in our study, be they musicians or nonmusicians, did not show any learning effect for the target tone-segment connections. This does not mean that tone-segment connections are unlearnable for them, as we cannot rule out the possibility that the connections could be learned by English speakers if more input were given (e.g., more trials in the training phase). Still, with the same amount of input and similar experiment duration, our results show that Cantonese speakers and English speakers exhibited differential success in learning the target tone-segment connections through incidental exposure. What could have caused such a difference in their performance? Our learning targets involved mappings between onset consonants (aspirated stops vs. approximants) and lexical tones (rising vs. falling). At the segmental level, both aspirated stops and approximants are phonological natural classes in English and Cantonese, so the Cantonese and English speakers in our study should share a similar starting point with regard to the segmental parts of the target rules. Moreover, both groups performed similarly well in distinguishing between the rising tone and the falling tone in the learning targets, as revealed by the AX discrimination task, implying that the observed differential performance was unlikely to be due to difficulty in perceiving tones at the phonetic-acoustic level. It is thus likely that the observed differential performance was due to how the tones in the learning targets were processed, which can be explained by the Perceptual Assimilation Model (Best, Reference Best, Goodman and Nusbaum1994).
Specifically, our Cantonese speakers, who also spoke Mandarin as an L2, may have perceptually assimilated the rising and falling tones in the learning targets to the Cantonese and/or Mandarin counterparts. As such, they could encode the novel pitch patterns as abstract tone categories at the word level, which is a prerequisite of learning novel tone-segment connections incidentally. In contrast, English learners may have perceptually assimilated the rising and falling tones in the learning targets to the intonational rise and fall of the English tonal system (So & Best, Reference So and Best2010). Although contrastive intonational patterns may also operate on monosyllabic words in English and other nontonal languages, intonation usually operates at the utterance level and is not used for contrasting word meaning. Therefore, English learners might not have been able to encode the novel pitch patterns as abstract tone categories at the syllable level and thus did not show a learning effect for the target tone-segment connections.

These results have implications for L2 lexical tone learning in general. L2 lexical tones are often reported to be difficult, especially for learners whose L1 is a nontonal language, but the nature of the long-term difficulty involved remains far from clear (Pelzl et al., Reference Pelzl, Lau, Guo and DeKeyser2019). Most previous studies have focused on the perception of L2 lexical tones at the phonetic-acoustic level (e.g., Francis et al., Reference Francis, Ciocca, Ma and Fenn2008; Hallé et al., 2004; Wayland & Guion, Reference Wayland and Guion2004). It is only in the past decade that the distinction between phonetic perception/processing of tones on the one hand and phonological and lexical perception/processing on the other has been highlighted in the L2 tone learning literature (e.g., Malins & Joanisse, Reference Malins and Joanisse2010, Reference Malins and Joanisse2012; Pelzl et al., Reference Pelzl, Lau, Guo and DeKeyser2019; Wiener et al., Reference Wiener, Ito and Speer2018). For instance, Pelzl et al. (Reference Pelzl, Lau, Guo and DeKeyser2019) found a discontinuity of success in L2 tone learning: L1 English advanced learners of Mandarin performed at a near-native level in identifying tones on isolated words, but struggled in a lexical decision task and a semantic judgment task that hinged on lexical tone processing. A possible reason lies in English learners’ difficulty in “repurposing” pitch patterns from intonational cues to lexical cues: learners must first form abstract tone categories at the word level, which can be a separate challenge for speakers of English and other nontonal languages, before they can use those categories as lexical cues in lexical tasks.
However, previous studies on the “lexical learning” of tones conflate lexical tone processing at the phonological level (abstraction and categorization) with the lexical level (processing of word meaning) by using lexical tasks to infer participants’ ability to establish novel phonological categories. As far as we are aware, this is the first study to tease apart the phonological level from the phonetic and lexical levels in tone processing; it demonstrates that forming abstract tone categories at the word level, a prerequisite of using abstract tone categories as lexical cues for contrasting word meaning, may itself be difficult for English speakers. Specifically, in the present study, English speakers achieved nativelike accuracy in distinguishing the rising tone from the falling tone in the learning targets but nevertheless failed to learn the target tone-segment connections. Such “discontinuity” is potentially due to their processing bias toward treating pitch patterns as intonational cues. Further research on L2 tone learning should examine how nontonal language speakers may overcome this processing bias and how to facilitate their encoding of pitch patterns as abstract tone categories at the word level.

In addition, our findings contribute to the long-standing effort to understand the relationship between music and speech. Previous work has provided evidence for both the overlap and the separation of the musical and linguistic domains (see the Introduction section for a detailed discussion). However, a crucial question about this relationship remains unexplored: whether prior musical training facilitates the formation of novel lexical tone categories. The present study shows that prior musical training had no significant effect on the learning of tone-segment connections by either Cantonese or English speakers, suggesting that musical training does not facilitate the encoding of pitch patterns as abstract tone categories at the syllable level. This is in line with a previous finding that, while musicians who speak a nontonal language were relatively more accurate than nonmusicians in pitch contour identification, their pitch contour abstraction and categorization ability was comparable to that of nonmusicians (Wayland et al., Reference Wayland, Herrera and Kaan2010), suggesting that musicians do not have an advantage in forming abstract categories based on pitch patterns, despite a better ability to perceive tones at the phonetic-acoustic level. Our results thus highlight an area of separation between the musical and linguistic domains.

At a broader level, our findings reveal that the incidental learning mechanism, despite being domain-general, does not seem to involve unconstrained associative learning but may interact with learners’ prior linguistic experience and processing biases. Specifically, the incidental learning of tone-segment connections appears to depend crucially on how pitch variation is used in the tone system(s) of the language(s) the learners speak, which shapes their processing bias on pitch patterns. The present study adds to a relatively small body of work that investigates the limits of and constraints involved in incidental/implicit learning (e.g., Leung & Williams, Reference Leung and Williams2011, Reference Leung and Williams2012, Reference Leung and Williams2014).

In the present study, it was our intention to (a) use a simple tonal contrast (i.e., rising vs. falling) in our stimuli that was easy for all participant groups to discriminate at the phonetic-acoustic level; (b) exert strict control over the frequency of all the different phonemes in the stimuli; and (c) make the tasks entirely auditory, without involving meaning in the stimuli. With such careful experimental control over the stimuli, participants were exposed to systematic data, and the results provide strong evidence that the differential success of Cantonese and English speakers in the incidental learning of tone-segment connections can only be attributed to how tones were processed at the phonological level. Given this quest for optimal experimental control, one might object that, because the tone-segment connections in our study were artificial and simple compared with tone-segment constraints found in natural languages such as Thai (Sladen, Reference Sladen2009), the findings have little apparent relevance to L2 speech learning in naturalistic settings. However, as in many psychological/psycholinguistic experiments, a plausible assumption is that participants brought to the laboratory the same processing resources that they use in everyday life, and thus their performance must reveal something about how their minds work. In terms of language learning, success in artificial language learning experiments has been found to correlate positively with indices of L2 learning (Ettlinger, Morgan-Short, Faretta-Stutenberg, & Wong, Reference Ettlinger, Morgan-Short, Faretta-Stutenberg and Wong2016). It is therefore reasonable to believe that our findings are also relevant to L2 tone learning in naturalistic settings.

CONCLUSION

The present study investigated the incidental learning of tone-segment connections by Cantonese and English speakers. Despite similar performance in distinguishing the rising tone from the falling tone perceptually, Cantonese speakers showed abstract and potentially rulelike knowledge of the target tone-segment connections, but English speakers did not, most likely because the English speakers failed to form tone categories at the syllable level. Moreover, prior musical training did not seem to facilitate the formation of lexical tone categories, providing evidence for the separation between music and speech.

Footnotes

We would like to thank Professor Susan Gass, the editor, and two anonymous reviewers for their constructive feedback on our work. We would also like to thank Scarlett Hao, Alison Lam, Sula Ross, and Bruce Wang for their research assistance, and Patchanok Kitikanan for her advice on the stimuli. Part of the data collection took place when the first author worked at Lancaster University, UK. The financial support from the Faculty of Arts and Social Sciences Research Fund, Lancaster University (project code: SZA1435) and the Small Project Funding, University of Hong Kong (project code: 201409176014) is gratefully acknowledged.

1 While other phonetic cues such as duration, loudness, and voice quality also play a role in tone perception and production, this article will focus on pitch, which is the primary cue of lexical tones.

2 From a phonological point of view, there were "four tones" in the Middle Chinese tonal system, namely, Ping (level), Shang (rising), Qu (departing), and Ru (entering). The entering tone occurs only in syllables ending with a stop, while the other three tones occur in syllables ending with a vowel, semivowel, or nasal (Sagart, Reference Sagart1999). Cantonese, for example, has three entering tones that have pitch heights and shapes similar to those of the three corresponding nonentering level tones but with shorter durations; they are often treated as allotones of the corresponding nonentering tones.

3 Pairs of aspirated and unaspirated stops (e.g., /p/ and /pʰ/) are allophones in English but different phonemes in Cantonese. However, this should not lead to any difference in learning the target tone-segment connections across the four groups of learners because, in the present study, stops appeared only at a fixed location (i.e., syllable onset in monosyllabic words) and were never preceded by /s/. Under these conditions, only phonetically aspirated, and not unaspirated, voiceless stops occur in English.

4 By PPA Innovation Co., Ltd. (see http://www.ppainnovation.com/salika/homeedition_en.html for details). Synthesized stimuli instead of recordings by a native speaker were used in the experiment because a native speaker is likely to vary in their fluency when trying to pronounce nonce words.

5 By performing the AX discrimination task first, participants will likely have been alerted to tones in the training phase. But this is a nonissue as long as no explicit information about the connection between initial consonants and tone types is provided.

6 For example, to test the effect of musical background on accuracy, the R code for the full model is "full_model <- glmer(Accuracy ~ Musical Background + Tonal Experience + (Musical Background | Participants) + (Musical Background | Item))"; the reduced model was coded as "reduced_model <- glmer(Accuracy ~ Tonal Experience + (Musical Background | Participants) + (Musical Background | Item))".
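The footnote's comparison of a full and a reduced model amounts to a likelihood-ratio test between nested models. A minimal sketch of the statistical logic in Python, assuming the two models differ by exactly one fixed effect (df = 1) and using hypothetical log-likelihood values (this is an illustration, not the authors' R code):

```python
import math

def lrt_pvalue_df1(loglik_reduced, loglik_full):
    """Likelihood-ratio test for nested models differing by exactly
    one parameter (df = 1), e.g., dropping a single fixed effect.
    Under the null, 2 * (LL_full - LL_reduced) ~ chi-square(1),
    whose survival function is erfc(sqrt(x / 2)).
    Assumes loglik_full >= loglik_reduced (true for nested fits)."""
    lr_stat = 2.0 * (loglik_full - loglik_reduced)
    p_value = math.erfc(math.sqrt(lr_stat / 2.0))
    return lr_stat, p_value

# Hypothetical log-likelihoods obtained from fitting the two models:
stat, p = lrt_pvalue_df1(loglik_reduced=-512.4, loglik_full=-508.1)
```

A significant p-value would indicate that the dropped factor (here, Musical Background) reliably improves model fit; in R, anova(reduced_model, full_model) reports the same chi-square comparison for lme4 models.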

7 Full model: glmer(Accuracy ~ Musical Background + Tonal Experience + (Musical Background | Participants) + (Musical Background | Item), family=binomial, control=glmerControl(optimizer="optimx", optCtrl=list(method="nlminb")))

8 Full model: lmer(LogRT ~ Musical Background + Tonal Experience + (Musical Background | Participants) + (Musical Background | Item), REML=FALSE)

9 Full model: glmer(Accuracy ~ Musical Background + Tonal Experience + (Tonal Experience | Participants) + (Tonal Experience | Item), family=binomial, control=glmerControl(optimizer="optimx", optCtrl=list(method="Nelder_Mead")))

10 Full model: lmer(LogRT ~ Musical Background + Tonal Experience + (Tonal Experience | Participants) + (Tonal Experience | Item), REML=FALSE)

11 Full model: glmer(Accuracy ~ Musical Background * Tonal Experience + (1 | Participants) + (1 | Item), family=binomial). Any model with a more complicated random slope structure failed to converge.

12 Full model: lmer(LogRT ∼ Musical Background * Tonal Experience + (1 | Participants) + (1 | Item)). A random intercept model was used instead of a random slope model so that the results are comparable with the results from the model on accuracy.

13 Full model: glmer(Accuracy ∼ Musical Background + Tonal Experience + (Musical Background | Participants) + (Musical Background | Item), family=binomial)

14 Full model: glmer(Accuracy ∼ Musical Background + Tonal Experience + (Tonal Experience | Participants) + (Tonal Experience | Item), family=binomial)

15 Full model: glmer(Accuracy ~ Musical Background * Tonal Experience + (1 | Participants) + (1 | Item), family=binomial). Any model with a more complicated random slope structure failed to converge.

16 R code: glmer(Accuracy ∼ Participant Group + (1| Participants) + (1 | Item), family=binomial). There is no theoretical motivation for any random slope structures in the model.

REFERENCES

Andringa, S., & Rebuschat, P. (Eds.). (2015). New directions in implicit and explicit language learning [special issue]. Studies in Second Language Acquisition, 37, 185–196.
Altmann, G. T. M., Dienes, Z., & Goode, A. (1995). Modality independence of implicitly learned grammatical knowledge. Journal of Experimental Psychology: Learning, Memory and Cognition, 21, 899–912.
Bates, D. M., & Maechler, M. (2009). lme4: Linear mixed-effects models using S4 classes. R package version 0.999375-32.
Bauer, R. S., & Benedict, P. K. (1997). Modern Cantonese phonology. Berlin: Walter de Gruyter.
Best, C. (1994). The emergence of native-language phonological influences in infants: A perceptual assimilation model. In Goodman, C. & Nusbaum, H. (Eds.), The development of speech perception (pp. 167–224). Cambridge, MA: MIT Press.
Bidelman, G. M., Gandour, J. T., & Krishnan, A. (2011). Musicians and tone-language speakers share enhanced brainstem encoding but not perceptual benefits for musical pitch. Brain and Cognition, 77, 1–10.
Bowles, A. R., Chang, C. B., & Karuzis, V. P. (2016). Pitch ability as an aptitude for tone learning. Language Learning, 66, 774–808.
Bradley, E. D. (2012). Crosslinguistic perception of pitch in language and music (Unpublished doctoral dissertation). Newark: University of Delaware.
Chan, R., & Leung, J. (2014). Implicit learning of L2 word stress regularities. Second Language Research, 30, 463–484.
Chan, R., & Leung, J. (2018). Implicit knowledge of L2 lexical stress rules: Evidence from the combined use of subjective and objective awareness measures. Applied Psycholinguistics, 39, 37–66.
Chandrasekaran, B., Sampath, P. D., & Wong, P. C. M. (2010). Individual variability in cue-weighting and lexical tone learning. Journal of the Acoustical Society of America, 128, 456–465.
Chang, C. B., & Bowles, A. R. (2015). Context effects on second-language learning of tonal contrasts. Journal of the Acoustical Society of America, 136, 3703–3716.
Creel, S. C., Weng, M., Fu, G., Heyman, G. D., & Lee, K. (2018). Speaking a tone language enhances musical pitch perception in 3–5-year-olds. Developmental Science, 21, e12503.
Dell, G. S., Reed, K. D., Adams, D. R., & Meyer, A. S. (2000). Speech errors, phonotactic constraints, and implicit learning: A study of the role of experience in language production. Journal of Experimental Psychology: Learning, Memory, and Cognition, 26, 1355.
Deutsch, D., Dooley, K., Henthorn, T., & Head, B. (2009). Absolute pitch among students in an American music conservatory: Association with tone language fluency. The Journal of the Acoustical Society of America, 125, 2398–2403.
Deutsch, D., Henthorn, T., & Dolson, M. (2004). Absolute pitch, speech, and tone language: Some experiments and a proposed framework. Music Perception: An Interdisciplinary Journal, 21, 339–356.
Dienes, Z. (2008). Subjective measures of unconscious knowledge. Progress in Brain Research, 168, 49–64.
Dienes, Z., & Scott, R. (2005). Measuring unconscious knowledge: Distinguishing structural knowledge and judgment knowledge. Psychological Research, 69, 338–351.
Ettlinger, M., Morgan-Short, K., Faretta-Stutenberg, M., & Wong, P. (2016). The relationship between artificial and second language learning. Cognitive Science, 40, 822–847.
Francis, A. L., Ciocca, V., Ma, L., & Fenn, K. (2008). Perceptual learning of Cantonese lexical tones by tone and non-tone language speakers. Journal of Phonetics, 36, 268–294.
Gandour, J., Dzemidzic, M., Wong, D., Lowe, M., Tong, Y., Hsieh, L., Satthamnuwong, N., & Lurito, J. (2003). Temporal integration of speech prosody is shaped by language experience: An fMRI study. Brain and Language, 84, 318–336.
Godfroid, A. (2016). The effects of implicit instruction on implicit and explicit knowledge development. Studies in Second Language Acquisition, 38, 177–215.
Gomez, R. L., & Gerken, L. (1999). Artificial grammar learning by 1-year-olds leads to specific and abstract knowledge. Cognition, 70, 109–135.
Gottfried, T. L. (2007). Music and language learning: Effect of musical training on learning L2 speech contrasts. In Bohn, O. S. & Munro, M. (Eds.), Language experience in second language speech learning (pp. 221–237). Philadelphia, PA: John Benjamins.
Graham, C. R., & Williams, J. N. (2018). Implicit learning of Latin stress regularities. Studies in Second Language Acquisition, 40, 3–29.
Grey, S., Williams, J. N., & Rebuschat, P. (2014). Incidental exposure and L3 learning of morphosyntax. Studies in Second Language Acquisition, 36, 1–34.
Hallé, P. A., Chang, Y.-C., & Best, C. T. (2004). Identification and discrimination of Mandarin Chinese tones by Mandarin Chinese vs. French listeners. Journal of Phonetics, 32, 395–421.
Hao, Y. C. (2012). Second language acquisition of Mandarin Chinese tones by tonal and non-tonal language speakers. Journal of Phonetics, 40, 269–279.
Hulstijn, J. H. (2011). Incidental learning in second language acquisition. In Chapelle, C. A. (Ed.), The encyclopedia of applied linguistics (pp. 2632–2640). Chichester, UK: Wiley-Blackwell.
Hulstijn, J. H., & Ellis, R. (Eds.). (2005). Implicit and explicit second-language learning [special issue]. Studies in Second Language Acquisition, 27.
Jiang, C., Hamm, J. P., Lim, V. K., Kirk, I. J., & Yang, Y. (2010). Processing melodic contour and speech intonation in congenital amusics with Mandarin Chinese. Neuropsychologia, 48, 2630–2639.
Keating, G. D., & Jegerski, J. (2015). Experimental designs in sentence processing research: A methodological review and user's guide. Studies in Second Language Acquisition, 37, 1–32.
Krishnan, A., Xu, Y., Gandour, J., & Cariani, P. (2005). Encoding of pitch in the human brainstem is sensitive to language experience. Cognitive Brain Research, 25, 161–168.
Leather, J. (1987). F0 pattern inference in the perceptual acquisition of second language tone. In Leather, J. & James, A. (Eds.), Sound patterns in second language acquisition (pp. 59–80). Providence, RI: Foris.
Lee, C. Y., & Hung, T. H. (2008). Identification of Mandarin tones by English-speaking musicians and nonmusicians. The Journal of the Acoustical Society of America, 124, 3235–3248.
Lee, C. Y., Lee, Y. F., & Shr, C. L. (2011). Perception of musical and lexical tones by Taiwanese-speaking musicians. The Journal of the Acoustical Society of America, 130, 526–535.
Lee, C. Y., Tao, L., & Bond, Z. S. (2009). Speaker variability and context in the identification of fragmented Mandarin tones by native and non-native listeners. Journal of Phonetics, 37, 1–15.
Lee, L., & Nusbaum, H. C. (1993). Processing interactions between segmental and suprasegmental information in native speakers of English and Mandarin Chinese. Perception and Psychophysics, 53, 157–165.
Lee, W. S., & Zee, E. (2009). Hakka Chinese. Journal of the International Phonetic Association, 39, 107–111.
Leung, J., & Williams, J. (2011). The implicit learning of mappings between forms and contextually-derived meanings. Studies in Second Language Acquisition, 33, 33–55.
Leung, J., & Williams, J. (2012). Constraints on implicit learning of grammatical form-meaning connections. Language Learning, 62, 634–662.
Leung, J., & Williams, J. (2014). Cross-linguistic differences in implicit language learning. Studies in Second Language Acquisition, 36, 733–755.
Macmillan, N. A., & Creelman, C. D. (2005). Detection theory: A user's guide. Mahwah, NJ: Lawrence Erlbaum.
Makowski, D. (2018). The psycho package: An efficient and publishing-oriented workflow for psychological science. Journal of Open Source Software, 3, 470.
Malins, J. G., & Joanisse, M. F. (2010). The roles of tonal and segmental information in Mandarin spoken word recognition: An eyetracking study. Journal of Memory and Language, 62, 407–420.
Malins, J. G., & Joanisse, M. F. (2012). Setting the tone: An ERP investigation of the influences of phonological similarity on spoken word recognition in Mandarin Chinese. Neuropsychologia, 50, 2032–2043.
Mok, P. P., & Zuo, D. (2012). The separation between music and speech: Evidence from the perception of Cantonese tones. The Journal of the Acoustical Society of America, 132, 2711–2720.
Nan, Y., Sun, Y., & Peretz, I. (2010). Congenital amusia in speakers of a tone language: Association with lexical tone agnosia. Brain, 133, 2635–2642.
Norman, J. (1988). Chinese. Cambridge, UK: Cambridge University Press.
Patel, A. D. (2011). Why would musical training benefit the neural encoding of speech? The OPERA hypothesis. Frontiers in Psychology, 2, 142.
Pelzl, E., Lau, E. F., Guo, T., & DeKeyser, R. (2019). Advanced second language learners' perception of lexical tone contrasts. Studies in Second Language Acquisition, 41, 59–86.
Perrachione, T. K., Fedorenko, E. G., Vinke, L., Gibson, E., & Dilley, L. C. (2013). Evidence for shared cognitive processing of pitch in music and language. PLoS One, 8, e73372.
Perrachione, T. K., Lee, J., Ha, L. Y. Y., & Wong, P. C. M. (2011). Learning a novel phonological contrast depends on interactions between individual differences and training paradigm design. Journal of the Acoustical Society of America, 130, 461–472.
R Core Team. (2013). R: A language and environment for statistical computing. R Foundation for Statistical Computing. Version 3.0.0. Retrieved from http://www.R-project.org.
Reber, A. S. (1993). Implicit learning and tacit knowledge: An essay on the cognitive unconscious. Oxford, UK: Oxford University Press.
Rebuschat, P. (2013). Measuring implicit and explicit knowledge in second language research. Language Learning, 63, 595–626.
Rebuschat, P., & Williams, J. N. (2012). Implicit and explicit knowledge in second language acquisition. Applied Psycholinguistics, 33, 829–856.
Repp, B. H., & Lin, H.-B. (1990). Integration of segmental and tonal information in speech perception: A cross-linguistic study. Journal of Phonetics, 18, 481–495.
Rogers, J., Revesz, A., & Rebuschat, P. (2016). Implicit and explicit knowledge of inflectional morphology. Applied Psycholinguistics, 37, 781–812.
Sagart, L. (1999). The origin of Chinese tones. In Proceedings of the Symposium/Cross-Linguistic Studies of Tonal Phenomena/Tonogenesis, Typology and Related Topics (pp. 91–104). Tokyo: Institute for the Study of Languages and Cultures of Asia and Africa, Tokyo University of Foreign Studies.
Schmidt, R. (1990). The role of consciousness in second language learning. Applied Psycholinguistics, 11, 129–158.
Schmidt, R. (2001). Attention. In Robinson, P. (Ed.), Cognition and second language instruction (pp. 3–32). Cambridge, UK: Cambridge University Press.
Schmidt, R. (2010). Attention, awareness, and individual differences in language learning. In Chan, W. M., Chi, S., Cin, K. N., Istanto, J., Nagami, M., Sew, J. W., & Walker, I. (Eds.), Proceedings of CLaSIC 2010 (pp. 721–737). Singapore: National University of Singapore, Centre for Language Studies.
Scott, R., & Dienes, Z. (2008). The conscious, the unconscious, and familiarity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 34, 1264–1288.
Sebastián-Gallés, N., & Díaz, B. (2012). First and second language speech perception: Graded learning. Language Learning, 62, 131–147.
Sladen, G. (2009). Central Thai phonology. Retrieved from http://www.thailanguage.com/resources/slayden-thai-phonology.pdf.
So, C. K., & Best, C. T. (2010). Cross-language perception of non-native tonal contrasts: Effects of native phonological and phonetic influences. Language and Speech, 53, 273–293.
Stagray, J. R., & Downs, D. (1993). Differential sensitivity for frequency among speakers of a tone and nontone language. Journal of Chinese Linguistics, 21, 143–163.
Wang, Y., Behne, D. M., Jongman, A., & Sereno, J. A. (2004). The role of linguistic experience in the hemispheric processing of lexical tone. Applied Psycholinguistics, 25, 449–466.
Wang, Y., Jongman, A., & Sereno, J. A. (2001). Dichotic perception of Mandarin tones by Chinese and American listeners. Brain and Language, 78, 332–348.
Wang, Y., Spence, M. M., Jongman, A., & Sereno, J. A. (1999). Training American listeners to perceive Mandarin tones. The Journal of the Acoustical Society of America, 106, 3649–3658.
Warker, J. A., & Dell, G. S. (2006). Speech errors reflect newly learned phonotactic constraints. Journal of Experimental Psychology: Learning, Memory, and Cognition, 32, 387.
Wayland, R., Herrera, E., & Kaan, E. (2010). Effects of musical experience and training on pitch contour perception. Journal of Phonetics, 38, 654–662.
Wayland, R. P., & Guion, S. G. (2004). Training English and Chinese listeners to perceive Thai tones: A preliminary report. Language Learning, 54, 681–712.
Wells, J. (2006). English intonation: An introduction. Cambridge, UK: Cambridge University Press.
Wiener, S., Ito, K., & Speer, S. R. (2018). Early L2 spoken word recognition combines input-based and knowledge-based processing. Language and Speech, 61, 632–656.
Williams, J. N. (2009). Implicit learning in second language acquisition. In Ritchie, W. C. & Bhatia, T. K. (Eds.), The new handbook of second language acquisition (pp. 319–353). Bingley, UK: Emerald Group Publishing.
Williams, J. N., & Kuribara, C. (2008). Comparing a nativist and emergentist approach to the initial stage of SLA: An investigation of Japanese scrambling. Lingua, 118, 522–553.
Wong, P. C. (2002). Hemispheric specialization of linguistic pitch patterns. Brain Research Bulletin, 59, 83–95.
Wong, P. C., Skoe, E., Russo, N. M., Dees, T., & Kraus, N. (2007). Musical experience shapes human brainstem encoding of linguistic pitch patterns. Nature Neuroscience, 10, 420.
Wong, P. C. M., & Perrachione, T. K. (2007). Learning pitch patterns in lexical identification by native English-speaking adults. Applied Psycholinguistics, 28, 565–585.
Xu, Y., Gandour, J. T., & Francis, A. L. (2006). Effects of language experience and stimulus complexity on the categorical perception of pitch direction. The Journal of the Acoustical Society of America, 120, 1063–1074.
Zhang, L. (2011). Meiguo liuxuesheng Hanyu shengdiaode yinwei he shengxue xinxi jiagong [Phonemic and acoustic processing of Mandarin tones by American students]. Shijie Hanyu Jiaoxue [Chinese Teaching in the World], 25, 268–275.
TABLE 1. Background information of each participant group.

TABLE 2. Definitions of "hit," "miss," "false alarm," and "correct rejection" for participants' responses in the AX discrimination task.

FIGURE 1. Sensitivity (d') (left) and response bias (c) (right) of the four participant groups in the AX discrimination task.

TABLE 3. Post-hoc Tukey pairwise multiple comparisons for the response bias (c) of the four participant groups.

FIGURE 2. Accuracy and log-transformed reaction time of the four groups for the AB pairs in the AX discrimination task (C = Cantonese, E = English; error bar = 1 SE).

TABLE 4. Summary of the statistics of the mixed model comparisons for the factors of interest for the AX discrimination task (* = significant effect).

FIGURE 3. Accuracy of the four groups for the critical items and extension items in the pronunciation judgment task (C = Cantonese, E = English; error bar = 1 SE).

TABLE 5. Summary of the statistics of the mixed model comparisons for the factors of interest for the pronunciation judgment task (* = significant effect).

TABLE 6. Accuracy (% correct) on and 95% confidence intervals (from upper bound to lower bound) of the critical items for the prediction of each participant group based on GLMM (chance level = 50%).

TABLE 7. Accuracy (% correct) on and 95% confidence intervals (from upper bound to lower bound) of the extension items for the prediction of each participant group based on GLMM (chance level = 50%).

TABLE 8. Summary of the statistics of the mixed model comparisons on item types in the pronunciation judgment task (* = significant effect).

TABLE 9. Number and percentage of responses of each attribution for critical items and extension items (36 critical/extension items * 40 participants = 1,440 responses).

TABLE 10. Accuracy (% correct) on and 95% confidence intervals (from upper bound to lower bound) of the critical items for the prediction of Cantonese participants (with data from their implicit attributions only) based on GLMM (chance level = 50%).

TABLE 11. Accuracy (% correct) on and 95% confidence intervals (from upper bound to lower bound) of the extension items for the prediction of Cantonese participants (with data from their implicit attributions only) based on GLMM (chance level = 50%).