Introduction
Little is known about the early language development of Mandarin–English-speaking children, especially those learning English between the ages of three and four. This is a particularly interesting area of study because the two languages differ greatly in their phonological, syllabic and morphological structures. Compared to most dialects of English, Mandarin has a small segmental inventory, with five phonemic vowels and no consonant clusters. It shares many segments with English, including the voiceless stops /p, t, k/, the nasals /m, n, ŋ/ and the fricatives /f, s/. One main difference is that Mandarin does not have a voicing contrast, so the voiced counterparts of these stops and fricatives are absent. Table 1 compares the segmental inventories of Mandarin (Duanmu, 2007) and Australian English (AusE; Cox & Palethorpe, 2007). Of particular interest for the present study is that Mandarin permits only the nasals /n/ and /ŋ/ in word-final coda position, whereas English permits a range of singleton and cluster codas. This difference is all the more important given the high morphological load of many English coda consonants (e.g., plural -s, past tense -ed). It raises many questions about when young Mandarin speakers exposed to English at preschool begin to acquire English phonology and morphology.
Table 1. Australian English and Mandarin Chinese segmental inventory.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160920223010454-0534:S1366728913000618:S1366728913000618_tab1.gif?pub-status=live)
Background
Studies of adult and older (5–16-year-old) Mandarin-speaking learners of L2 English suggest that both the phonology and the morphology of English are difficult to acquire. Voicing contrasts, which do not occur in Mandarin, present a continuing problem despite long exposure to English (Broselow & Zheng, 2004; Flege, Munro & Skelton, 1992; Hansen, 2001), whereas voiceless stop, fricative and affricate codas are produced relatively well. On the other hand, Mandarin-speaking adults and older children learning L2 English are reported to have persistent difficulties producing coda consonants that encode English inflectional morphology (Hawkins & Liszka, 2003; Xu & Demuth, 2012), even after many years of English immersion (Jia, 2003; Jia & Fuse, 2007). We might, then, expect that Mandarin-speaking early child second language learners (ECL2) of English would have little difficulty acquiring a variety of coda consonants, but more difficulty producing coda clusters that are morphemic. L1 phonotactic effects might also persist and influence these ECL2 learners’ use of coda consonants (see Paradis, Genesee & Crago, 2011, for discussion of residual L1 effects in older immigrant children). Alternatively, precisely because these children are younger, they may be better able to learn the morphological structure of English.
Little is known about how segmental and syllabic structure interact in the L2 acquisition of ECL2 learners, or how this interaction might influence the acquisition of inflectional morphemes. Yet many, if not most, bilinguals around the world are likely to face this situation: children begin to be exposed to other languages at 2–3 years, once they become more independent and are able to explore the larger world around them. In some cases this means entering preschools or larger peer groups where other community languages are spoken. In other cases there may be a change in the family's living situation, as in immigrant communities. Whatever the circumstances, the young child faces a language-learning challenge, and where a language disorder is present, this can lead to further difficulties for both the family and the community. It is therefore imperative to know more about the processes involved in learning a new L2, and about how and when ECL2 learners can reasonably be expected to exhibit the grammatical competencies expected of their peers.
Monolingual acquisition of segments and syllable structure
The acquisition of segmental phonology has a long tradition of study, with Jakobson (1941, 1963) proposing that segments that are typically considered unmarked crosslinguistically will be acquired before more marked segments. Although this has been a challenge to validate (see Demuth, 2011, for review), typically developing children tend to acquire stops before fricatives in most languages, and replace more marked /ʃ/ with less marked, crosslinguistically more widely found [s] in early productions (Edwards, 1974).
Evidence of markedness at higher levels of phonology, such as syllable structure and prosodic words, has also been noted across languages. That is, children tend to produce simple CV syllable structures before more complex CVC or CVCC structures (Demuth, 1995; Gnanadesikan, 2004; Levelt, Schiller & Levelt, 2000), and produce simple phonological words containing a disyllabic foot (e.g., kitty) before more complex phonological words (e.g., banana) (see Demuth, 1995, 1996, 2006). These crosslinguistic findings on the early acquisition of syllable and word structures support claims that early phonological structures will be simple, or “unmarked” (Prince & Smolensky, 2004). One question that then arises is how these unmarked structures emerge in bilingual acquisition. This question is all the more interesting because, all else being equal, high-frequency phonological structures tend to be acquired before lower-frequency ones (Levelt et al., 2000; Roark & Demuth, 2000). For example, the development of syllable structures in Dutch closely follows the frequencies with which they occur in child-directed speech, with more common syllable structures acquired before less common ones. Clements & Keyser (1983) suggest that the unmarked syllable is one with a flatter sonority profile within the rhyme. Thus, coda consonants that are more vowel-like (i.e., have greater sonority; Ladefoged, 1993) tend to be more common across languages. For example, the only coda consonants permitted in Mandarin are the nasals /n/ and /ŋ/.
In contrast, stops such as /p, t, k/, which have very low sonority, are permitted only in syllable onsets in Mandarin (e.g., /tʰan/ 'greed'). On markedness grounds alone, then, one might expect that even English-speaking children would first acquire more sonorant consonants (liquids and nasals) in coda position, and acquire less sonorant coda consonants (fricatives and stops) later. However, the first coda consonant that English-speaking children tend to acquire is /t/, the most frequent coda consonant in English (Kehoe & Stoel-Gammon, 1997; Stites, Demuth & Kirk, 2004). This suggests that both frequency and sonority play a role in determining the course of coda consonant acquisition in a given language. It also raises interesting questions about when Mandarin ECL2 learners of English will acquire simple English syllable structures with low-sonority coda consonants such as /t/ and /s/. On the one hand, /t/ is phonologically marked in coda position, so we might expect it to be acquired late. On the other hand, its unmarked (alveolar) place of articulation and its status as the highest-frequency coda in English might facilitate early acquisition.
The situation with /s/ is less clear: it is a later-acquired segment in English, but it occurs frequently as an inflectional morpheme (e.g., in plurals, possessives and the 3rd person singular: cats, Mike's, eats). English /s/ also often occurs as part of a morphologically complex word-final consonant cluster, as in cats and books. Thus, although the affricate /ts/ occurs word-initially in Mandarin, it is not clear whether or how this might facilitate the acquisition of the English /t+s/ cluster in coda position. Furthermore, monolingual children are notorious for variable grammatical morpheme production (Brown, 1973), and even when the morpheme is produced, cluster reduction may occur (e.g., cats /kæts/ > [kæs]) (Theodore, Demuth & Shattuck-Hufnagel, 2011).
Recent studies have also shown that English-speaking children are more likely to produce inflectional morphemes in utterance-final than in utterance-medial position. Words at the end of an utterance are typically longer in duration due to phrase-final lengthening (Oller, 1973). One explanation for children's better production of inflectional morphemes utterance-finally is that phrase-final lengthening provides more time to plan and articulate the coda consonant(s) (Song, Sundara & Demuth, 2009). Utterance-medial morphemes are therefore more difficult to produce: they are shorter in duration, leaving less time to coordinate the articulators to approximate the intended target. In addition, the speaker still has to plan and articulate the following word, which is not required when the morpheme occurs utterance-finally (Theodore et al., 2011). Thus, grammatical morphemes may be more likely to be produced when they occur in phonologically or prosodically simple, unmarked positions, such as simple codas and phrase-final position (see Demuth & McCullough, 2009).
Bilingual studies
There have been many studies of bilingual language learning, focusing on either perception or production. In infant speech perception, most studies have focused on the learning of segments (e.g., Bosch & Sebastián-Gallés, 1997, 2003; Mehler, Dupoux, Nazzi & Dehaene-Lambertz, 1996; Sundara, Polka & Molnar, 2008; Werker & Tees, 2002). In production, Vihman (1996) has explored the bilingual transition from babbling to first words, Leopold (1949) conducted early studies of his daughter Hildegard's bilingual development of English and German, and Deuchar and Clark (1996) explored voice onset time (VOT) in a child acquiring English and Spanish. Most of these have been case studies focusing on the acquisition of segmental contrasts. However, a few recent studies have begun to examine syllabic development in simultaneous bilinguals, finding that marked structures that are highly frequent in one language can stimulate earlier acquisition of similar, less frequent structures in the other language. For example, in a study of German–Spanish simultaneous bilinguals, Kehoe and Lleó (2003) found that coda consonants were produced earlier by the bilingual Spanish-speaking children than by their monolingual Spanish-speaking peers. This interesting result suggests that both markedness and frequency also play a role in the development of bilingual phonological systems, and raises the possibility that similar results may be found for our ECL2 children.
However, to better understand when Mandarin ECL2 learners begin to produce English codas during early L2 productions, it is critical to have a good understanding of the monolingual development of the languages under investigation.
Normative data on English and Mandarin monolingual language development
Normative studies have shown that English-speaking children produce most coda stops /p, b, t, d, k, g/ and nasals /m, n/ by the age of three. In contrast, coda fricatives such as /s/ are reportedly acquired sometime between 3;0 and 3;6 (Dodd, Holm, Hua & Crosbie, 2003) or later (Smit, Hand, Freilinger, Bernthal & Bird, 1990), depending on the criteria used. Although English permits an extensive inventory of coda clusters, relatively little is known about when these are acquired. One study found that two-year-olds produced few clusters, but that /ts/ was amongst the first to appear (Stoel-Gammon, 1987). Kirk and Demuth (2005) found that two-year-olds were able to produce stop+/s/ and nasal+/z/ clusters with reasonable accuracy, based on acoustic cues for stopping and frication. Furthermore, clusters with the same segments were produced more accurately as codas than as onsets. The authors suggest this asymmetry may be due to articulatory constraints: /s, z/ as the final member of a coda cluster (e.g., kicks) may be easier to produce than as part of a word-initial fricative+stop cluster (e.g., skate).
Among English coda consonants that can also encode morphological information, the plural morpheme -s is one of the earliest acquired, appearing regularly between 2;5 and 2;9, when children have an MLU (mean length of utterance) of around 3.0–3.5 morphemes/words per utterance (Brown, 1973). We might therefore expect it to be one of the first morphemes ECL2 children learn. However, given that Mandarin makes use of neither inflectional morphemes nor coda clusters, these morphemes might also be challenging to produce.
Little research is available on the phonological acquisition of Mandarin, and even less is known about the acquisition of its only permissible codas, /n, ŋ/. A normative study using phonetic transcription examined the acquisition of Mandarin by children in Beijing using two criteria: phoneme emergence (when most children started to use a phoneme) and phoneme stabilization (when most children achieved above 75% accuracy producing a phoneme in a target context) (Hua & Dodd, 2000). The results showed that a range of onset consonants had emerged by age three: the stops /t, k/ by 1;6–2;0, the fricatives /f, s/ by 2;1–2;6, and the stop and liquid /p, l/ by 2;7–3;0 (Hua & Dodd, 2000). A range of stop, fricative and nasal onsets, including /p, t, k, f, m, n/, had also stabilized by 3;6. However, /s/ stabilizes only at around 4;0–4;6 (Hua & Dodd, 2000).
The current study
Several predictions can be made by comparing the monolingual acquisition patterns of these two languages. Firstly, onset consonants appear to be acquired 6–12 months earlier in Mandarin than the same onset consonants in English (Dodd et al., 2003; Hua & Dodd, 2000). This may be due to different assessment criteria, differences in the size of the two phonological inventories, and/or the simple consonant–vowel (CV) syllable structure that predominates in Mandarin, compared to the more complex CVC(C) structure typical of high-frequency words in English (Roark & Demuth, 2000). Critically, however, by three years the same stops, fricatives and nasals have emerged in both languages. Secondly, the two Mandarin nasal codas are reported to emerge by two years (Hua & Dodd, 2000), suggesting that three-year-old Mandarin-speaking children should have no problem producing nasal codas in English. Thus, any difficulties with the singleton codas we investigated would point to difficulty with English codas rather than with the segments themselves. Thirdly, although Mandarin has a /ts/ affricate, it does not allow consonant clusters. Thus, even though English-speaking children can produce stop+fricative codas as early as 2;6 (Kirk & Demuth, 2005), we predicted that these would present a challenge for the ECL2 children in our study. Finally, English-speaking monolingual children appear to use phrase-final lengthening by at least the age of two (e.g., Snow, 1994; Song, Demuth & Shattuck-Hufnagel, 2012). If our ECL2 learners were sensitive to English phrase-final lengthening, we expected that they would also produce longer vowels and coda consonants in utterance-final than in utterance-medial position.
Method
Participants
In this study we examined the acquisition of English codas by three-year-old children who were exposed only to Mandarin at home and were learning English in a preschool setting. Twelve Mandarin-speaking children (six boys, six girls) with a mean age of 3;6 (SD = 0;4, range = 3;2–4;2) were recruited from preschools in Sydney, Australia. Their mean length of English exposure was 14 months (SD = 5 months, range = 6–21 months), and their mean age of initial exposure to English was 29 months (SD = 6.5 months, range = 20–40 months). Parents reported that Mandarin was the child's first language and the only language spoken at home. Preschool was the primary source of English exposure for all the children. However, although we recruited children from areas with a high concentration of Mandarin speakers, it would have been impossible for the parents to avoid speaking English entirely outside the home. The children would therefore also have heard their parents speaking English with other people in the community.
Stimuli
Given the limited concentration span of three-year-olds, we examined a subset of English codas using an elicited imitation task. These included the voiceless stop /t/, the fricative /s/ and the nasal /n/, all segments that are also present in the Mandarin segmental inventory. We also included the English consonant cluster /ts/, in which the -s carried plural morphology. Sixteen high-frequency, picturable monosyllabic CVC(C) nouns with onset consonants that typically emerge early in both Mandarin and English were selected for the experiment. Lexical frequencies were extracted with ChildFreq, which uses the CHILDES database to calculate how often children produce the target word per million words at age 3;0 (Bååth, 2010; MacWhinney, 2000). Each target word was represented by a picture that served as a visual prompt during the experiment. All pictures were real photographs with minimal background distractions.
Half of the words with stop and fricative codas contained the short mid to high front vowels [ɪ, e], and half contained the short low vowels [æ, ɐ], to control for vowel duration while providing some variety in the stimuli. However, the stimulus words with the nasal coda /n/ contained only long vowels, since it was difficult to find enough high-frequency words with short front mid-high and low vowels that also had a nasal coda. The long vowels used in the nasal coda words were the high and mid vowels [iː, oː] and the low vowel [æ] (in Australian English, /æ/ is a long vowel preceding alveolar nasals). The test words were placed in both utterance-medial and utterance-final contexts (see Table 2). All medial target words were followed by a high-frequency verb beginning with /b/. This avoided possible coarticulation with a following consonant at the same (alveolar) place of articulation, as well as possible resyllabification effects that might occur with a following vowel.
Table 2. Target words with each coda and corresponding sentences.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160921030841-60968-mediumThumb-S1366728913000618_tab2.jpg?pub-status=live)
All stimuli were recorded in a child-directed speech register by a female monolingual speaker of Australian English. This register is slower and generally more carefully articulated than adult-directed speech, providing optimal acoustic cues to the target words and sentences. Each sentence was extracted from the recordings using Praat (Boersma & Weenink, 2012) and paired with its picture for computer presentation. The prompts for the sixteen target words were randomized into two versions, which were counterbalanced across participants to control for order effects. Each participant was asked to produce four target-word sentences for each of the four codas (/t, s, ts, n/) in each position, for a total of 32 sentences: 16 with the target word in utterance-medial position (e.g., Her pets bark) and 16 with the target word in utterance-final position (e.g., She has pets).
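The design described above (four codas, four target words per coda, two utterance positions) can be sketched in Python as a trial list. This is a minimal illustration under stated assumptions: the word labels are hypothetical placeholders for the items in Table 2, and the counterbalancing scheme (version B is version A in reverse order, with versions alternating across participants) is an assumed scheme, since the text does not specify how the two versions were constructed.

```python
import random
from itertools import product

CODAS = ["t", "s", "ts", "n"]    # the four coda conditions
POSITIONS = ["medial", "final"]  # utterance position of the target word
WORDS_PER_CODA = 4               # four target words per coda, 16 in total


def build_trials():
    """Return all 32 coda x word x position trials as dictionaries."""
    return [
        # Hypothetical placeholder labels stand in for the real items in Table 2.
        {"coda": coda, "word": f"{coda}_word{i + 1}", "position": pos}
        for coda, i, pos in product(CODAS, range(WORDS_PER_CODA), POSITIONS)
    ]


def make_versions(trials, seed=0):
    """Two presentation orders: a seeded shuffle and its reverse (assumed scheme)."""
    rng = random.Random(seed)
    version_a = list(trials)
    rng.shuffle(version_a)
    return {"A": version_a, "B": version_a[::-1]}


def assign_version(participant_index):
    """Alternate versions across participants so each order is used equally often."""
    return "A" if participant_index % 2 == 0 else "B"


trials = build_trials()
versions = make_versions(trials)
assignments = [assign_version(i) for i in range(12)]  # 12 children, as in the study
print(len(trials), assignments.count("A"), assignments.count("B"))  # 32 6 6
```

Reversing a single random order is only one way to counterbalance; it guarantees that each trial's serial position is mirrored across the two groups of children.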
Procedure
Child and parent were brought into a sound-attenuated room, where the child was invited to play a computer language game. The child sat at a child-sized table, and pictures accompanied by the related pre-recorded stimulus sentences were presented on a computer monitor. After a two-sentence warm-up, the child was asked to repeat each sentence they heard. If a child was unable to produce the sentence after the initial presentation, the audio stimulus was repeated up to three more times before moving on to the next stimulus. The children typically found the task easy, and it was completed in 20 minutes. The children received a T-shirt or stickers for their participation in the study.
The parent was then asked to complete the short form of the MacArthur Communicative Development Inventory (CDI; Fenson, Pethick, Renda, Cox, Dale & Reznick, 2000) and to provide demographic information about the child's language background and exposure. The CDI was used to ensure that the children had some expressive vocabulary in English. Reported scores ranged from 36 to 80 (mean = 54) out of 100. A score of 54 falls in the bottom 10th and 5th percentiles for monolingual English-speaking 30-month-old boys and girls, respectively, and a score of 80 falls at the 40th and 20th percentiles for boys and girls, respectively. However, since the parents and children did not communicate in English, parental reports probably underestimated the children's actual command of English. Furthermore, the children had no difficulty communicating with the experimenter or understanding the English instructions for the task.
All participants’ productions were recorded with a Behringer C-2 directional microphone onto a computer using Pro Tools software, at a sampling rate of 44.1 kHz with 16-bit quantization. Uncompressed WAV files were then exported for later acoustic analysis in Praat.
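As a practical check that exported files match the reported recording settings, the sketch below reads a WAV header with Python's standard `wave` module. It is an illustration, not part of the study's pipeline; the helper names `describe_wav` and `matches_study_settings` are hypothetical, and the demo writes a synthetic one-second file in memory (mono is an assumption, as the channel count is not reported) so that it runs without any recordings.

```python
import io
import wave


def describe_wav(fileobj):
    """Return sampling rate (Hz), bit depth, channel count and duration (s) of a WAV file."""
    with wave.open(fileobj, "rb") as w:
        rate = w.getframerate()
        bits = 8 * w.getsampwidth()  # sample width is reported in bytes
        channels = w.getnchannels()
        duration = w.getnframes() / rate
    return {"rate_hz": rate, "bits": bits, "channels": channels, "duration_s": duration}


def matches_study_settings(info):
    """True if the file is 44.1 kHz, 16-bit, as reported for the recordings."""
    return info["rate_hz"] == 44100 and info["bits"] == 16


# Self-contained demo: write one second of silence to memory, then inspect it.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)      # mono is an assumption
    w.setsampwidth(2)      # 2 bytes = 16 bits
    w.setframerate(44100)
    w.writeframes(b"\x00\x00" * 44100)
buf.seek(0)
info = describe_wav(buf)
print(info, matches_study_settings(info))
```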
Acoustic coding
Acoustic analysis was used because it provides greater coding accuracy than impressionistic transcription: contrasts made by the child may not be detected by the listener (Scobbie, Gibbon, Hardcastle & Fletcher, 2000). Li, Edwards and Beckman (2009) and Theodore, Demuth and Shattuck-Hufnagel (2012) found that transcription alone could not accurately account for children's acquisition of onset and coda consonants. We therefore follow previous research in using acoustic analysis to measure children's use of coda contrasts.
The presence vs. absence and the duration of different acoustic cues were coded for each coda type. Each acoustic cue was identified by visual inspection of the waveform and spectrogram and by listening to the utterance. For the stop /t/, the acoustic events coded were (i) vowel duration: the interval between the onset and offset of F2 energy in a periodic waveform in the spectrogram; (ii) the presence of closure: an abrupt drop in amplitude at the end of the vowel and cessation of F3; (iii) closure duration: the interval between the termination of the vowel-formant transition and the onset of the coda burst(s); (iv) the presence of coda burst(s); and (v) the duration of post-release noise: the interval between the onset and offset of noticeable post-release noise following the closure or burst(s) (see Figure 1). In sentence-medial position, where /t/ is often unreleased or produced as a glottal stop, post-release noise (v) was not a sufficient criterion for coding the presence of coda /t/, so (i)–(iv) and the presence of glottalization were also coded (see also Song et al., 2012).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160921030841-60626-mediumThumb-S1366728913000618_fig1g.jpg?pub-status=live)
Figure 1. Representative waveform and spectrogram illustrating acoustic coding made for vowel, closure, burst and post-release noise durations for the word pet.
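The durational measures for /t/ are differences between hand-labelled landmarks. The sketch below shows that arithmetic, assuming each token's landmarks have already been read off the spectrogram (for example, from a Praat annotation); the class and field names are hypothetical, and the landmark times in the demo are made up for illustration. Measures whose landmarks were not observed, such as closure duration for an unreleased medial /t/, are simply left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StopCodaLandmarks:
    """Hand-labelled landmark times (seconds) for one token with coda /t/.

    Field names are hypothetical; None marks a landmark that was not observed.
    """
    vowel_onset: float                    # onset of F2 energy in the periodic waveform
    vowel_offset: float                   # offset of F2 energy
    transition_end: float                 # termination of the vowel-formant transition
    burst: Optional[float] = None         # onset of the (first) coda burst
    noise_onset: Optional[float] = None   # onset of post-release noise
    noise_offset: Optional[float] = None  # offset of post-release noise


def ms(start, end):
    """Interval length in milliseconds, rounded to 0.1 ms."""
    return round((end - start) * 1000, 1)


def stop_coda_durations(lm):
    """Derive vowel, closure and post-release noise durations from the landmarks."""
    durations = {"vowel": ms(lm.vowel_onset, lm.vowel_offset)}
    if lm.burst is not None:  # closure is only measurable up to an observed burst
        durations["closure"] = ms(lm.transition_end, lm.burst)
    if lm.noise_onset is not None and lm.noise_offset is not None:
        durations["post_release_noise"] = ms(lm.noise_onset, lm.noise_offset)
    return durations


# Made-up landmark times for illustration only (not measurements from the study).
token = StopCodaLandmarks(0.100, 0.210, 0.215, burst=0.290,
                          noise_onset=0.292, noise_offset=0.340)
print(stop_coda_durations(token))  # {'vowel': 110.0, 'closure': 75.0, 'post_release_noise': 48.0}
```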
For the fricative /s/, the acoustic measures included (i) vowel duration (as described above) and (ii) the duration of the frication noise following vowel offset (see Figure 2). For the /ts/ cluster, the acoustic measures included those for both the stop and the fricative. Here, frication noise was coded instead of post-release noise only when both an auditory percept of frication was heard and a greater spread of energy across the spectrum was observed (see Figure 3).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160921030841-80836-mediumThumb-S1366728913000618_fig2g.jpg?pub-status=live)
Figure 2. Representative waveform and spectrogram illustrating acoustic coding made for vowel and frication durations for the word bus.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160921030841-62710-mediumThumb-S1366728913000618_fig3g.jpg?pub-status=live)
Figure 3. Representative waveform and spectrogram illustrating acoustic coding made for vowel, closure, burst and frication durations for the word nets.
For the nasal /n/, the acoustic events coded included (i) vowel duration (as described above) and (ii) nasal duration: the voiced period of reduced amplitude showing either an absence of formants above F2 or F2 movement (downward for high vowels, upward for low vowels), anti-resonances indicated by alternating dark and light bands in the spectrogram, and simplification of the vowel (see Figure 4).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160921030841-03165-mediumThumb-S1366728913000618_fig4g.jpg?pub-status=live)
Figure 4. Representative waveform and spectrogram illustrating acoustic coding made for vowel and nasal duration for the word can.
The children produced a total of 355 tokens, which were coded by one trained coder. Ten tokens were subsequently excluded because of poor acoustic quality due to noise interference, and the target words of the remaining 345 tokens were used for further analysis. A second trained coder coded 10% of these remaining items. Inter-coder reliability on the presence or absence of an acoustic event was 96.5%, and mean durational measures were within 2 ms of each other.
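The reliability figures can be computed as simple percent agreement on presence/absence judgements plus a comparison of the two coders' mean durations. The sketch below assumes that "within 2 ms" refers to the difference between the coders' mean durations, which is one possible reading; the function names and toy inputs are hypothetical.

```python
from statistics import mean


def percent_agreement(coder1, coder2):
    """Percentage of items on which two coders agree about cue presence/absence."""
    if len(coder1) != len(coder2) or not coder1:
        raise ValueError("codings must be non-empty and of equal length")
    agreements = sum(a == b for a, b in zip(coder1, coder2))
    return 100 * agreements / len(coder1)


def mean_duration_difference(durations1, durations2):
    """Absolute difference (ms) between the two coders' mean durations."""
    return abs(mean(durations1) - mean(durations2))


# Toy codings (True = cue present); 7 of 8 judgements agree.
c1 = [True, True, False, True, False, True, True, True]
c2 = [True, True, False, True, True, True, True, True]
print(percent_agreement(c1, c2))  # 87.5
print(mean_duration_difference([100, 120, 80], [101, 118, 82]))
```

Note that percent agreement is not corrected for chance agreement.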
Results
Presence of coda as a function of utterance position
To examine children's performance, the number of codas “produced” was tallied. A coda was considered “produced” when at least one acoustic cue to its presence (closure, burst/release noise, frication or nasality) was observed. The raw counts for each coda were then converted to a percentage of the total number of tokens elicited. Figure 5 presents the percentage of each coda produced as a function of utterance position (medial vs. final).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160921030841-48567-mediumThumb-S1366728913000618_fig5g.jpg?pub-status=live)
Figure 5. Percentage of codas produced as a function of utterance position (medial, final) with standard error bars; *p < .05.
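The scoring rule above can be expressed as a small function: a token counts as produced if any of the listed cues was coded as present, and the percentage is taken over all elicited tokens. Representing each token as a dictionary mapping cue names to booleans is an assumption made for illustration, and the token list is a toy example.

```python
# Cue labels follow the text; the dictionary representation is an assumption.
CUES = ("closure", "burst_or_release_noise", "frication", "nasality")


def coda_produced(observed):
    """A coda counts as produced if at least one listed acoustic cue was observed."""
    return any(observed.get(cue, False) for cue in CUES)


def percent_produced(tokens):
    """Percentage of elicited tokens whose coda counts as produced."""
    if not tokens:
        raise ValueError("no tokens")
    return 100 * sum(coda_produced(t) for t in tokens) / len(tokens)


tokens = [
    {"closure": True},                                   # produced
    {"frication": False},                                # no cue: not produced
    {"closure": False, "burst_or_release_noise": True},  # produced
    {},                                                  # nothing coded: not produced
]
print(percent_produced(tokens))  # 50.0
```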
A two-way analysis of variance (ANOVA) was conducted with coda (four levels: /t, s, ts, n/) and sentence position (two levels: medial, final) as factors. With α set at .05, there was a significant main effect of coda, F(3,33) = 3.556, p = .025, ηp² = .244, but no significant main effect of sentence position. This suggests that the number of productions differed across coda types but not across utterance positions. Six post hoc pairwise comparisons of all coda pairs at an unadjusted α of .05 revealed three significant differences: more productions for /t/ than /n/ (p = .034), for /s/ than /n/ (p = .034), and for /t/ than /ts/ (p = .029). However, none of these differences was significant at the Bonferroni-adjusted α of .008.
The interaction between coda type and sentence position was, however, significant, F(3,33) = 5.494, p = .004, ηp² = .333. Post hoc comparisons of each coda across utterance position (medial vs. final), with a Bonferroni-adjusted α of .013, revealed a significant difference only for /t/, F(1,11) = 148.5, p < .001, ηp² = .931. This suggests that the coda-by-position interaction is largely driven by coda /t/, which was produced more often in the sentence-final than in the sentence-medial condition.
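Two quantities in the analyses above can be recomputed from the reported values: the Bonferroni-adjusted thresholds (α divided by the number of comparisons: six coda pairs, then four codas) and partial eta squared, which for an F test equals F·df_effect / (F·df_effect + df_error). The sketch below checks both against the reported figures; it is an arithmetic check, not a reanalysis, and the function names are illustrative.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_comparisons


def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


print(round(bonferroni_alpha(0.05, 6), 4))          # 0.0083 (reported as .008)
print(round(bonferroni_alpha(0.05, 4), 4))          # 0.0125 (reported as .013)
print(round(partial_eta_squared(3.556, 3, 33), 3))  # 0.244 (coda main effect)
print(round(partial_eta_squared(5.494, 3, 33), 3))  # 0.333 (coda x position)
print(round(partial_eta_squared(148.5, 1, 11), 3))  # 0.931 (/t/, medial vs. final)
```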
In sum, overall coda production was quite good. The pattern for /t/ was that expected of monolingual English speakers, with ceiling performance utterance-finally but significantly lower performance utterance-medially. This is the typical utterance position effect found in previous studies of monolingual English-speaking children (Song et al., 2009; Theodore et al., 2012). The generally high production of /t/ suggests that the high frequency of this coda in English may provide these ECL2 learners with ample evidence for using it in coda position. Interestingly, however, production of coda /s/ was also near ceiling in both utterance-medial and utterance-final positions. Perhaps frication was easier to produce utterance-medially than the cues to a stop, especially given that stops are often unreleased in medial position. It is also possible that closure is a less salient perceptual cue than frication for these ECL2 learners, especially in medial position, where the /t/ was followed by another word beginning with a stop. As predicted, performance on the /ts/ cluster was not as good as on either singleton /t/ or singleton /s/, though overall it was on par with /n/. This is better performance than we might have expected, with no difference in cluster production as a function of utterance position. Finally, we had expected that the nasal coda /n/ would be acquired easily by these children, whose L1 Mandarin has /n/ codas. However, overall performance on coda /n/ was lower than on /t/ or /s/, with better performance utterance-medially than utterance-finally (though this difference is not significant after Bonferroni correction). We discuss possible reasons for this below.
Since the cluster /ts/ contains both the coda /t/ and the morpheme /s/, we further analyzed the types of cluster reduction errors made (see Figure 6). A two-way chi-squared test with α set at .05 revealed no significant relationship between cluster realization and sentence position. However, collapsed across sentence positions, a one-way chi-squared test revealed a significant difference among cluster simplification types, χ²(2, N = 11) = 8.909, p = .012: /t/ was most frequently preserved (58%), followed by simplification to /s/ (24%) and complete omission (18%). Similar patterns have been found in studies of monolingual English-speaking children (Polite, 2011; Theodore et al., 2011), suggesting increased challenges with producing codas in phonologically and morphologically complex contexts.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160921030841-74976-mediumThumb-S1366728913000618_fig6g.jpg?pub-status=live)
Figure 6. Types of productions made for the /ts/ cluster as a percentage of total errors.
Acoustic evidence for phrase-final lengthening
Recall that previous studies of English-speaking children reported more accurate coda production utterance-finally, and that this could be attributed to the increased duration of the final syllable due to phrase-final lengthening. Perhaps these ECL2 learners have also acquired English phrase-final lengthening, facilitating their overall good performance on coda production. Table 3 reports the means and standard deviations of the durations of the preceding vowel, closure, release and frication for each coda /t, s, ts, n/ in utterance-medial and utterance-final positions. Figure 7 shows the mean durations of these segments in the two utterance positions. With α set at .05, two orthogonal planned comparisons were conducted for coda /t/ to examine vowel and coda (closure, post-release noise, and glottalization) duration by utterance position (medial vs. final). A significant comparison was found for closure, F(1,11) = 17.518, p = .002, ηp² = .613, but not for vowel duration. Thus, closure was longer utterance-finally than utterance-medially for coda /t/, but vowel duration did not differ significantly as a function of utterance position. These ECL2 children therefore appear to exhibit phrase-final lengthening, but only on the coda consonant. This differs from 2;6-year-old monolingual English-speaking children, who lengthen both the vowel and the coda consonant in CVC words where the coda is also /t/ (Song et al., 2012).
Table 3. Durations in milliseconds for vowel, closure, post-release noise, and frication noise for codas /t/, /s/, /ts/, and /n/ in utterance-medial and utterance-final positions.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160920223010454-0534:S1366728913000618:S1366728913000618_tab3.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160921030841-43261-mediumThumb-S1366728913000618_fig7g.jpg?pub-status=live)
Figure 7. Durations in milliseconds for vowel, closure, post-release noise, and frication noise for codas /t/, /s/, /ts/ and /n/ in utterance-medial and utterance-final positions; *p < .05.
Two orthogonal planned comparisons were conducted for /s/ to examine vowel and frication duration by utterance position (medial vs. final). Only frication duration differed significantly across utterance positions, F(1,11) = 21.51, p = .001, ηp² = .655. Again, phrase-final lengthening was exhibited only on the coda /s/: vowel duration did not differ significantly between utterance positions (Table 3, Figure 7).
Three orthogonal planned comparisons were then conducted for /ts/ to examine vowel, closure and frication duration by utterance position (medial vs. final). Significant comparisons were found for closure, F(1,11) = 8.596, p = .014, ηp² = .440, and frication duration, F(1,11) = 6.823, p = .024, ηp² = .381, but not for vowel duration (Table 3, Figure 7). Thus, both closure and frication were longer utterance-finally than utterance-medially, whereas vowel duration was not. Once again, these children lengthened the coda consonants utterance-finally, but not the vowel.
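Each planned comparison in this section contrasts a two-level within-child factor (medial vs. final) and is reported as F(1,11). Assuming each comparison is computed on the same children's per-position mean durations (the error term is not spelled out here), such a 1-df contrast is equivalent to a paired t-test on the final-minus-medial differences, with F = t² and error df = n − 1 for n children. The Python sketch below illustrates this with hypothetical durations, not values from Table 3; the function name is illustrative only.

```python
import math
from statistics import mean, stdev


def paired_contrast_f(medial, final):
    """F(1, n-1) for a two-level within-subject contrast, computed as t**2.

    medial[i] and final[i] are the same child's mean durations (ms)
    in the two utterance positions.
    """
    if len(medial) != len(final) or len(medial) < 2:
        raise ValueError("need two equal-length lists covering at least two children")
    diffs = [f - m for m, f in zip(medial, final)]
    n = len(diffs)
    # Paired t statistic: mean difference over its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t ** 2, (1, n - 1)


# Hypothetical durations (ms) for three children:
f_value, dfs = paired_contrast_f([100, 110, 120], [130, 150, 140])
print(round(f_value, 6), dfs)  # 27.0 (1, 2)
```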
Overall, the results from these durational analyses of /t/, /s/ and /ts/ suggest that these ECL2 children lengthen coda consonants in utterance-final position but have yet to learn to lengthen the preceding vowel systematically as well. Thus, aspects of phrasal prosodic structure are still to be fully acquired by these ECL2 learners, even in contexts where they otherwise use appropriate L2 segments and syllable structures.
We now turn to the acoustic realization of the nasals. Two orthogonal planned comparisons were conducted to examine vowel and coda nasal duration by utterance position (medial vs. final). Vowel duration, for the long vowels [iː, oː] as well as for [æ] (which is realized as long before nasals), was significantly longer in utterance-final than utterance-medial position, F(1,11) = 31.151, p < .001, ηp² = .739. However, nasal duration did not differ significantly between utterance-medial and utterance-final positions. These results therefore show phrase-final lengthening for vowels, but not for nasal codas.
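The effect sizes reported in these comparisons are partial eta squared. For an effect with df₁ numerator and df₂ error degrees of freedom, ηp² can be recovered from the F statistic as F·df₁ / (F·df₁ + df₂). The Python sketch below implements this standard identity (the function name is ours) and applies it to the vowel comparison above; values recomputed from rounded F statistics may not match every tabled value exactly.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared, SS_effect / (SS_effect + SS_error), from an F statistic.

    Since F = (SS_effect / df_effect) / (SS_error / df_error), this ratio
    equals F * df_effect / (F * df_effect + df_error).
    """
    num = f_value * df_effect
    return num / (num + df_error)


# Vowel duration before coda /n/, medial vs. final: F(1,11) = 31.151
print(round(partial_eta_squared(31.151, 1, 11), 3))  # 0.739
```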
The effect of age and length of exposure on coda production
For bilingual and L2 research with older children, age of exposure to the L2 and length of exposure to the L2 have typically been found to be important factors for L2 acquisition (Jia, 2003; Jia & Fuse, 2007; Paradis et al., 2011). For phonological acquisition, age at testing, frequency of lexical items and phonological complexity are important factors affecting both L1 and L2 child productions (Leonard & Ritterman, 1971; Sundara, Demuth & Kuhl, 2011; Theodore et al., 2011; Tyler & Edwards, 1993). To explore the possible effects of these factors on our ECL2 learners' coda productions, several logistic regression analyses were conducted.
A mixed-effects logistic regression was carried out using Stata with three fixed-effects predictors, (i) age of L2 exposure (AOL2), (ii) length of L2 exposure (LOL2) and (iii) age at testing (AgeAT), and two random-effects factors, (iv) word frequency and (v) utterance position. Word frequency was included to address whether productions were influenced by lexical frequency and not by the type of coda segment alone. A test of the full model with all five predictors against a constant-only model was not statistically reliable, χ²(5, N = 345) = 6.59, p = .253, indicating that the factors, as a set, did not reliably predict coda production.
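The full-versus-constant-only comparison is a likelihood-ratio test: twice the difference in log-likelihoods is referred to a χ² distribution with df equal to the number of added predictors (here 5). The log-likelihoods themselves are not reported, so the Python sketch below only shows how the p-value follows from the χ² statistic, using the closed-form series for integer df rather than any particular statistics package; `lr_statistic` and `chi2_sf` are illustrative names.

```python
import math


def lr_statistic(loglik_full, loglik_null):
    """Likelihood-ratio chi-squared statistic, 2 * (LL_full - LL_null)."""
    return 2.0 * (loglik_full - loglik_null)


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-squared variable with integer df.

    even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)**j / j!
    odd df:  erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2)
             * sum_{j=1}^{(df-1)/2} x**(j-1) / (1*3*...*(2j-1))
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j  # builds (x/2)**j / j!
            total += term
        return math.exp(-x / 2) * total
    term, total = 1.0, 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)  # builds x**(j-1) / (1*3*...*(2j-1))
        total += term
    tail = math.sqrt(2 * x / math.pi) * math.exp(-x / 2) * total
    return math.erfc(math.sqrt(x / 2)) + tail


# Reported full-vs-null test: chi2(5) = 6.59
print(round(chi2_sf(6.59, 5), 3))  # 0.253
```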
Correlation tests, however, revealed that AOL2 was highly correlated with both AgeAT (r = .588, p < .001) and LOL2 (r = –.745, p < .001). A forward stepwise analysis was therefore carried out. In the first step, each variable was considered for entry, and the one with the lowest p-value was entered into the model. In each successive step, each variable not already in the model was considered for entry (adjusted for the variables already in the model), and the one with the lowest p-value was entered. The results of this analysis were clear-cut: LOL2 was entered at the first step (p = .034). No other variable was significant at this first step, and none was significant after adjusting for LOL2, so the procedure terminated at that step. The final model therefore contained only LOL2, which reliably predicted coda production (odds ratio = 1.070, z = 2.12, p = .034): the odds of producing a coda increased by a factor of 1.07 with every additional month of exposure to English.
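The forward stepwise procedure described above can be sketched as a loop that, at each step, enters the candidate with the lowest (adjusted) p-value and stops when none falls below α. In the Python sketch below, `p_value_of` is a hypothetical stand-in for refitting the logistic model with a candidate added; the toy p-values are invented for illustration, except LOL2's first-step value of .034, which is taken from the text. The helper `odds_multiplier` shows how a per-month odds ratio of 1.070 compounds, assuming (as a logistic model does) that the effect on the log-odds is linear in months.

```python
def forward_select(candidates, p_value_of, alpha=0.05):
    """Forward stepwise selection by lowest p-value.

    p_value_of(candidate, selected) must return the p-value for `candidate`
    when added to a model that already contains the tuple `selected`.
    """
    selected = []
    remaining = list(candidates)
    while remaining:
        scored = [(p_value_of(c, tuple(selected)), c) for c in remaining]
        best_p, best = min(scored)
        if best_p >= alpha:  # no remaining candidate is significant: stop
            break
        selected.append(best)
        remaining.remove(best)
    return selected


def odds_multiplier(odds_ratio, months):
    """Factor by which the odds change over `months` months of exposure."""
    return odds_ratio ** months


# Toy p-values (hypothetical, except LOL2 = .034 at the first step).
toy_p = {
    ("AOL2", ()): 0.21, ("LOL2", ()): 0.034, ("AgeAT", ()): 0.47,
    ("AOL2", ("LOL2",)): 0.62, ("AgeAT", ("LOL2",)): 0.55,
}
print(forward_select(["AOL2", "LOL2", "AgeAT"], lambda c, s: toy_p[(c, s)]))
# ['LOL2']
print(round(odds_multiplier(1.070, 12), 2))  # 2.25
```

Note that the multiplier acts on the odds, not on the probability; the resulting change in the probability of producing a coda depends on a child's baseline rate.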
Discussion
Overall, the ECL2 learners in this study had little difficulty producing the non-native singleton codas /t/ and /s/, suggesting that the high frequency of these codas in the English these children hear facilitates early acquisition. Coda /t/ was produced at a level comparable to that of same-aged English-speaking children reported by Dodd et al. (2003) and Smit et al. (1990). An utterance position effect was also found for /t/, with more codas produced in the durationally longer utterance-final position. This is consistent with previous findings for monolingual English-speaking children (Song et al., 2009, 2012). However, future studies manipulating the following phonological context (i.e., vowel vs. consonant) are needed to determine whether different contexts facilitate or hinder L2 English coda acquisition. High accuracy on coda /s/ emerged earlier than has been reported for both monolingual English-speaking and Mandarin-speaking children (Dodd et al., 2003; Hua & Dodd, 2000; Smit et al., 1990; though see Song, Demuth, Evans & Shattuck-Hufnagel, 2013). These findings suggest that the high frequency of /t/ and /s/ codas, combined with experience of the same segments in both languages, may enhance the early acquisition of these non-native codas by ECL2 learners.
Surprisingly, these children found coda /n/ more challenging to produce, with a mean accuracy of 58%. This was unexpected given that nasals are the only codas permitted in Mandarin, and that English-speaking monolinguals produce coda /n/ at close to ceiling levels by three years (Dodd et al., 2003; Smit et al., 1990). However, although coda nasals are reported to emerge before two years in Mandarin-speaking children (Hua & Dodd, 2000), little is known about when Mandarin nasal codas are actually acquired. One possibility is that Mandarin-speaking children initially produce nasalized vowels instead of nasal codas, and only later transition to producing true nasal coda consonants. Indeed, many of the productions from these ECL2 learners had the percept of a nasal even when the traditionally recognized acoustic cues to a nasal consonant that we coded for were missing. Further investigation of how and when nasal codas are acquired by monolingual Mandarin-speaking children would therefore be most interesting to pursue. It was also interesting to find a trend toward better nasal coda production utterance-medially, contrary to our expectation. It is possible that the following medial context, which contained a voiced plosive /b/ (e.g., My pan burns), provided a continued voicing context that supported the realization of the nasal consonant (see similar suggestions by Ohala & Ohala, 1991, for the appearance of nasal consonants in Hindi). Future studies examining the acoustic properties of adult and child coda nasals in both languages, before different following contexts (i.e., vowel vs. consonant), will be needed to shed further light on this issue.
The coda cluster /ts/, however, proved challenging to acquire, with a production rate of 62% across sentence positions. The cluster was often reduced to /t/, and less frequently to /s/, whereas the opposite pattern is typically found among English-speaking monolinguals (Theodore et al., 2011). Since the /ts/ coda words in this study were all plurals, and were matched with a picture of two identical objects, this result suggests that these Mandarin-speaking ECL2 learners of English may not yet have acquired plural morphology, an interesting issue for further research. Taken together, our findings suggest that these ECL2 learners can acquire English L2 singleton codas rapidly, but that codas with more complex syllabic and/or morphological structure present more of a challenge.
The ECL2 learners in this study also exhibited phrase-final lengthening. However, for the stops and fricatives, longer durations were restricted to the coda consonant(s) and were not evident on the preceding vowel. In contrast, monolingual English-speaking children lengthen both the coda and the preceding vowel (Song et al., 2012, 2013). A different pattern emerged for coda /n/: phrase-final lengthening was observed on the vowel preceding coda /n/, but not on the coda nasal itself. While Mandarin does exhibit phrase-final lengthening (Shen, 1992), it is not clear which part of the syllable is lengthened in Mandarin (vowel, coda or both), or when this is acquired by monolingual Mandarin-speaking children. Future research on lengthening effects in monolingual adult and child Mandarin speech will help to address the nature of this contrast in these children's early English.
Finally, a range of predictors of coda production were examined: age at testing, age of L2 exposure, length of L2 exposure, lexical frequency, and utterance position. All of these factors have been found to be important, to varying degrees, in child L1 and L2 acquisition. In this study, however, only length of L2 exposure was a good predictor of coda production. Thus, at this very early age, length of exposure predicted the acquisition of English codas, but age of exposure (1;8–3;4 years) did not.
Conclusions
The results from this study show that Mandarin-speaking ECL2 learners can acquire some English coda consonants (/t, s/) rapidly, after only a short period of exposure to English. However, more complex syllable structures involving consonant clusters and inflectional morphology may take longer to acquire. The lower overall production accuracy for the cluster /ts/, together with the preference for reduction to /t/, suggests that this group of ECL2 learners may not yet have developed robust representations for English clusters and/or plural morphology. This is consistent with what we know about adult and older child L2 learners. Future studies designed specifically to tease apart phonological and morphological factors will be very informative. Our results suggest that ECL2 learners are a unique group of speakers, with similarities to and differences from both monolingual children and older child L2 learners. More research is required to clarify when ECL2 learners catch up to their English-speaking peers, and the possible implications for atypical L2 and bilingual development.