Introduction
Children first learn to produce voicing contrasts in word-onset position, eventually learning how to produce these in word-coda position as well (Jakobson, 1968; Stoel-Gammon and Buder, 1999). But due to their syllabic complexity, coda consonants often take some time to acquire (Demuth et al., 2006). This raises questions about when children learn to produce the acoustic cues needed to convey voicing contrasts in coda position, critical for distinguishing words in English such as dog vs. dock. Quantifying these cues in typically-developing children is also necessary for assessing language-delayed child populations where these contrasts may present long-term challenges (e.g., Baudonck et al., 2011; Markides, 1970; Miles et al., 2012; Xu Rattanasone and Demuth, 2014).
Most of our knowledge about child phonological acquisition comes from perceptual coding, where adults listen to and phonemically transcribe children's speech. These studies have shown that children's early word productions vary substantially from those of adults, and that phonological contrasts, such as voicing, may not be systematically conveyed (e.g., Smith, 1973, 1979). Transcriptions of child speech can be difficult to interpret with respect to actual phonological knowledge because children sometimes produce systematic acoustic distinctions that adult listeners cannot hear (Li et al., 2009; Macken and Barton, 1980; Scobbie et al., 2000). These ‘covert contrasts’ have been documented for phonological contrasts such as voicing in the speech of both typically-developing children (Macken & Barton, 1980; Weismer et al., 1981) and those with language delays (Forrest et al., 1990). Systematic acoustic analysis is therefore needed for a better understanding of children's developing phonological knowledge (Munson et al., 2005; Theodore et al., 2012).
Relatively little is known about the acquisition of acoustic cues to coda voicing contrasts, as acoustic investigations of the development of voicing contrasts have mostly focused on word-initial consonants and voice onset time (VOT). Between 2 and 4 years of age, children go through several stages of development with respect to these voicing contrasts (Imbrie, 2005; Kewley-Port & Preston, 1974; Koenig, 2001; Macken & Barton, 1980; Zlatin & Koenigsknecht, 1976). Around the age of 2, children first go through a period where their VOTs for voiced and voiceless stops are not distinct from one another, both falling within the adult range of voiced stops. During the second stage, their VOTs for voiced and voiceless stops systematically differ from one another but are still perceived as the same category (i.e., voiced) by adults. Finally, around the age of 3–4, children's VOTs begin to approximate adult acoustic values, with short VOTs for voiced stops and long VOTs for voiceless stops, though children's VOTs tend to be longer than those of adults. Thus, it takes several years for children to acquire adult-like voicing contrasts in onset position (e.g., Yu et al., 2015; Zlatin and Koenigsknecht, 1976). Since coda consonants are typically acquired later than onsets, we might expect children to take even longer to reach adult-like acoustic realizations for voicing contrasts in coda position.
Coda consonants are often omitted or variably realized in children's early speech (e.g., Demuth et al., 2006; Kirk and Demuth, 2006). Although codas are acquired later than onsets, children seem to have a coda consonant representation at an early age, even when the target coda itself is not actually produced. For instance, in spontaneous speech, 2-year-olds lengthen the vowel when the coda consonant is missing (e.g., /dɔg/ → /dɔː/; Song and Demuth, 2008). This raises questions about the acoustic cues children use in producing coda voicing contrasts, and how these become more adult-like over time.
There are multiple acoustic cues to oral stop coda voicing contrasts in English. The primary cue is vowel duration, where vowels preceding voiced codas are longer than those preceding voiceless codas (Fowler, 1992; de Jong, 2004; Penney et al., 2018; Raphael, 1972). Closure duration (e.g., Cox and Palethorpe, 2011; Lisker, 1957; Luce and Charles-Luce, 1985; Penney et al., 2018) and burst duration (Song et al., 2012) also vary as a function of coda voicing, with longer durations for voiceless than for voiced codas. Additional cues, such as the presence of a voice bar (VB) during closure preceding a voiced coda, and the presence of irregular pitch periods (IPP) at the end of the vowel preceding a voiceless coda, provide supplementary information to coda voicing (British English: Docherty and Foulkes, 1999; American English: Redi and Shattuck-Hufnagel, 2001; Australian English: Penney et al., 2018). IPP appears to be a strong correlate to coda voicelessness in Australian English (Penney et al., 2018), while it is less systematic in American English (Redi & Shattuck-Hufnagel, 2001; Song et al., 2012).
While many of the cues to coda voicing contrasts have been investigated in adult speech, relatively little is known about these cues in child speech, though some studies suggest early sensitivity. Acoustic analysis of the longitudinal spontaneous speech of four 1–3-year-olds from the Providence Corpus (Demuth et al., 2006) revealed that these American English-speaking children produced longer vowel durations before voiced than voiceless codas (Ko, 2007). Using the same corpus, Song et al. (2012) compared the speech of three mother-child dyads and found that children aged 1;6–2;6 were already using most of the acoustic correlates to coda voicing contrasts (i.e., vowel duration, burst duration, IPP and VB). For example, they produced longer vowel durations before voiced than voiceless codas, without any developmental trend. They also started to produce longer burst durations for voiceless stops than for voiced stops around the age of 2;0, matching the adult pattern. By 2;6 years, burst durations for both voicing categories had decreased, approximating adult values. These children did not differ in their use of IPP as a function of coda voicing, whereas their mothers tended to produce more IPP before voiceless than voiced codas. Finally, both children and mothers showed similar patterns for voice bar (VB), with more use before voiced codas than voiceless codas.
In another study, using elicited speech from two American children aged 2;5 and 3;2 years, Shattuck-Hufnagel et al. (2011) also found that children produced more IPP and post-release noise (bursts) when the coda was voiceless (though Song et al., 2012, had found no difference in slightly younger children). This suggests that the use of IPP as a cue to coda voicelessness in American English may emerge around 3 years, with children still refining control of the larynx and the degree of vocal fold tension until age 3;6 years (Imbrie, 2005). Finally, children as young as 2 years systematically produced more instances of VB before voiced than before voiceless codas (Shattuck-Hufnagel et al., 2011; Song et al., 2012). This suggests that the use of VB as a cue to voicing is in place relatively early.
The studies above suggest that children under 4 years of age can make coda voicing contrasts, but that their acoustic realizations remain less systematic and more variable than those of adults. Most of the above studies also contained data from only a few children, and examined only some places of articulation (PoA). To determine how children's use of cues to coda voicing contrasts comes to approximate that of adults, a larger, more systematic investigation of coda voicing contrasts at all PoAs is needed.
The present study therefore set out to examine the acoustic correlates to coda voicing contrasts in Australian English-speaking children aged 4–5 years and to compare these with those of adults. Using an elicited imitation task with systematically controlled stimuli at all PoAs, it aimed to provide a normative Australian English baseline of the acoustics of coda voicing contrasts in both children and adults that could then be used in future work examining early L2 learners and language-delayed populations.
Based on previous findings from both children and adults, we hypothesized that 4-year-olds would use systematic durational differences to mark coda voicing contrasts. These might include longer vowel durations before voiced codas, and longer closure and burst durations for voiceless codas. We also expected that children would produce overall longer durations than adults, with more inter- and intra-speaker variation. Given that IPP is a strong correlate to coda voicelessness in Australian English (Penney et al., 2018), we also predicted that children might produce higher rates of IPP before voiceless codas than before voiced codas, though perhaps lower than adults. Finally, we hypothesized that children would produce more instances of VB before voiced than before voiceless codas, though again, perhaps less often than adults.
Methods
Participants
A total of 20 pre-schoolers (aged 4;1–5;8 years, M = 4;10; 12 females, 8 males) were recruited, along with 20 adult controls (aged 20–35 years, M = 28; 15 females, 5 males). All participants were monolingual speakers of Australian English, born in Australia and brought up in Sydney. No participants reported any speech, hearing or cognitive difficulties. The study was approved by Macquarie University's Human Ethics Panel. Children received a $20 voucher and stickers for their participation; adults received course credit.
Stimuli
A total of 18 CVC picturable minimal pair words (see Table I) were selected by crossing word-final voicing (voiced vs. voiceless), PoA (bilabials vs. alveolars vs. velars) and three short–lax vowels (i.e., /ɪ/, /ɐ/ and /ɔ/; Cox and Palethorpe, 2007). All stimuli were high-frequency words, with a mean Zipf frequency of 4.5 in the Subtlex-UK CBeebies pre-schooler corpus (van Heuven et al., 2014). This is a corpus of subtitles taken from the BBC channel CBeebies, which is aimed at pre-school-aged children. Target words were embedded utterance-finally in the sentence “See this XXX”. All sentences were recorded by a 25-year-old female native speaker of Australian English in a sound-attenuated room (sampling rate: 44.1 kHz with 16-bit quantization). Three additional sentences with non-target CVC words were recorded to serve as practice trials. To make the task engaging for children, all recorded stimuli were then paired with a cartoon-like picture and presented as an interactive game on an iPad Air using the Keynote presentation software.
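For readers unfamiliar with the Zipf scale, the following Python sketch shows how a Zipf value relates to a raw corpus count, using the standard definition of the scale as log10 of the frequency per million words plus 3. The function name and the example counts are hypothetical, and the sketch omits the smoothing that published frequency norms typically apply, so it will not reproduce the Subtlex-UK values exactly.

```python
from math import log10


def zipf_from_count(count, corpus_size_tokens):
    """Approximate Zipf value: log10(frequency per million words) + 3.

    Simplified illustration only; no smoothing for unseen words is applied.
    """
    if count <= 0 or corpus_size_tokens <= 0:
        raise ValueError("count and corpus size must be positive")
    per_million = count * 1_000_000 / corpus_size_tokens
    return log10(per_million) + 3


# Hypothetical counts, not taken from the CBeebies corpus.
once_per_million = zipf_from_count(1, 1_000_000)
hundred_per_million = zipf_from_count(100, 1_000_000)
```

On this scale, a mean value of 4.5 corresponds to roughly 30 occurrences per million words.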
Procedure
All participants were recorded in a sound-attenuated room at Macquarie University, Sydney. Participants sat at a table in front of the iPad and 30 cm away from an AKG C535 EB microphone. The microphone was connected to a pre-amplifier (Sound Devices, USBPre2) and recordings were captured and encoded with Audacity (mono WAV files: 44.1 kHz sampling rate, 16-bit quantization). Pictures and paired audio sentences were presented one at a time on the iPad, starting with the three practice trials. When participants touched the screen, the audio file linked to the picture was played. Participants repeated the sentence, touched the iPad again to move on to the next picture, tapped on the screen to hear the next sentence, and so forth until all the stimuli had been repeated. All participants completed a total of five blocks, with all stimuli presented once in each block, in pseudo-randomized order. Each participant thus produced 90 tokens (18 target words x 5 repetitions), completing the task in 30 minutes for children and 10 minutes for adults.
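The design arithmetic can be checked with a short Python sketch. The factor levels and group sizes come from the Stimuli, Participants and Procedure sections; the variable names are illustrative only.

```python
from itertools import product

# Factor levels described in the Stimuli section (labels are illustrative).
VOICING = ["voiced", "voiceless"]
POA = ["bilabial", "alveolar", "velar"]
VOWELS = ["ɪ", "ɐ", "ɔ"]

REPETITIONS = 5   # one presentation of each word per block, five blocks
N_CHILDREN = 20
N_ADULTS = 20

# Fully crossed design: one target word per cell.
cells = list(product(VOICING, POA, VOWELS))
n_words = len(cells)
tokens_per_participant = n_words * REPETITIONS
total_tokens = tokens_per_participant * (N_CHILDREN + N_ADULTS)

print(n_words, tokens_per_participant, total_tokens)  # prints: 18 90 3600
```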
Acoustic coding and analysis
A total of 3600 recorded tokens (1800 from children, 1800 from adults) were inspected and manually annotated by the first author using Praat (Boersma & Weenink, 2019). Five acoustic cues (see Figure 1) were annotated as follows: (1) vowel duration was measured from the beginning to the end of a strong F2; (2) closure duration was measured from the end of a strong F2 to the first peak of the release burst of the following stop; (3) burst duration was measured from the first peak of the release burst to the end of strong energy on the spectrogram; (4) the presence of irregular pitch periods (IPP) was identified by the presence of irregularly spaced glottal pulses at the end of the vowel; and (5) the presence of voice bar (VB) was identified by the presence of a low frequency, low amplitude signal following the sudden drop of amplitude at the end of the vowel. Thus, vowel, closure and burst durations were durational cues, while the presence or absence of IPP and VB were binary cues. A phonetically trained research assistant independently coded 10% of the child and adult data (n = 360) to check reliability. Pearson correlations were high for each of the five cues (vowel duration: r = 0.93, p < .001; closure duration: r = 0.90, p < .001; burst duration: r = 0.88, p < .001; presence of IPP: r = 0.91, p < .001 and presence of VB: r = 0.91, p < .001).
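As an illustration of the reliability check, the sketch below computes a Pearson correlation between two coders' annotations in plain Python. All values are hypothetical and the function is a generic textbook implementation, not the authors' script. For the binary cues (IPP, VB), coding presence as 1 and absence as 0 lets the same formula apply.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical vowel durations (ms) measured by two coders on the same tokens.
coder_a_ms = [212.0, 148.0, 305.0, 190.0, 260.0]
coder_b_ms = [208.0, 155.0, 298.0, 197.0, 251.0]
r_duration = pearson_r(coder_a_ms, coder_b_ms)

# Hypothetical binary IPP judgements (1 = present, 0 = absent).
ipp_a = [1, 0, 1, 1, 0, 0]
ipp_b = [1, 0, 1, 0, 0, 0]
r_ipp = pearson_r(ipp_a, ipp_b)
```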
Statistical analysis
Some of the produced tokens were not released (especially bilabials; see Table II), which meant that burst and closure durations could not be measured. Only vowel duration, IPP and VB were therefore measured for those tokens. For all durational cues, tokens falling beyond two standard deviations from the mean were excluded as outliers (see Table III). The proportion of outliers per cue was similar for both participant groups. For children, a total of 1718 tokens were analysed for vowel duration, 1607 for closure duration and 1561 for burst duration. For adults, a total of 1724 tokens were analysed for vowel duration, 1752 for closure duration and 1653 for burst duration. For both groups, all tokens for IPP and VB were analysed, as these cues were binary, so outlier removal was not applicable.
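The exclusion rule can be sketched in Python as follows. The text does not state whether the mean and standard deviation were computed per group, per cue, or per speaker; this sketch assumes the rule is applied to one list of durations at a time (for example, one cue within one participant group). The data and the function name are hypothetical.

```python
from statistics import mean, stdev


def exclude_outliers(durations, n_sd=2.0):
    """Keep only values within n_sd sample standard deviations of the mean."""
    centre = mean(durations)
    spread = stdev(durations)
    lower = centre - n_sd * spread
    upper = centre + n_sd * spread
    return [d for d in durations if lower <= d <= upper]


# Hypothetical child vowel durations (ms) with one extreme token.
child_vowel_ms = [310.0, 295.0, 330.0, 305.0, 320.0,
                  290.0, 315.0, 300.0, 325.0, 900.0]
kept = exclude_outliers(child_vowel_ms)
n_excluded = len(child_vowel_ms) - len(kept)
```

Because the cut-offs are computed once from the full list, the rule is applied in a single pass rather than iteratively.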
Separate linear mixed-effects models were fitted with the lme4 package (Bates et al., 2015) in R (R Core Team, 2013) to each of the three durational measures. For the binary cues, separate generalized linear mixed-effects models were fitted using the same R package. Each model had the same fixed structure, which included all main effects of and interactions between Group (Children vs. Adults), Voicing (Voiced vs. Voiceless) and PoA (Bilabial vs. Alveolar vs. Velar). The random structure included by-subject and by-item intercepts, as well as by-subject random slopes for the effects of Voicing and PoA. Fixed factors were contrast-coded for Group (Children as -1 and Adults as 1) and Voicing (Voiced stops as 1 and Voiceless stops as -1) and Helmert-coded for the three-level factor PoA. The first contrast for this factor (i.e., PoA-1) corresponded to the difference between bilabials on the one hand and the combined mean of alveolars and velars on the other. This was motivated by the fact that alveolar and velar stops have a different primary articulator (i.e., the tongue) than bilabials (i.e., the lips), and that bilabials tend to be less often released. The second contrast (i.e., PoA-2) corresponded to the difference between alveolars and velars. Post-hoc Tukey tests were performed using the least-squares means method of the lsmeans package (Kuznetsova et al., 2017).
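The coding scheme can be made concrete with a small Python sketch. The Group and Voicing codes are those given above. The numeric values and signs of the two PoA contrasts are not reported in the text, so the scaling used here is an assumption, chosen so that each coefficient equals the stated difference in means. The column names and the lme4-style formula in the comment are likewise hypothetical.

```python
from fractions import Fraction as F

# Codes stated in the text.
GROUP = {"children": -1, "adults": 1}
VOICING = {"voiced": 1, "voiceless": -1}

# Assumed Helmert-style codes for PoA, as (PoA-1, PoA-2):
#   PoA-1: bilabials vs. the mean of alveolars and velars
#   PoA-2: alveolars vs. velars
POA = {
    "bilabial": (F(2, 3), F(0)),
    "alveolar": (F(-1, 3), F(1, 2)),
    "velar": (F(-1, 3), F(-1, 2)),
}


def code_token(group, voicing, poa):
    """Return the numeric predictor values for one token."""
    poa1, poa2 = POA[poa]
    return {"Group": GROUP[group], "Voicing": VOICING[voicing],
            "PoA1": poa1, "PoA2": poa2}


poa1_column = [codes[0] for codes in POA.values()]
poa2_column = [codes[1] for codes in POA.values()]

# Each contrast is centred (sums to zero) and the two are orthogonal.
poa1_sum = sum(poa1_column)
poa2_sum = sum(poa2_column)
dot_product = sum(a * b for a, b in zip(poa1_column, poa2_column))

# A model of the kind described might be written in lme4 syntax as
# (hypothetical variable names):
#   duration ~ Group * Voicing * PoA + (1 + Voicing + PoA | subject) + (1 | item)
example = code_token("children", "voiceless", "velar")
```

With this scaling, the PoA-1 coefficient is the bilabial mean minus the average of the alveolar and velar means, and the PoA-2 coefficient is the alveolar mean minus the velar mean.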
Results
Vowel duration
Figure 2 shows the vowel duration by voicing category, PoA and participant group (means are shown in Table IVa). The model fit showed significant effects of Voicing and PoA-1 (bilabials vs. non-bilabials) along with a two-way interaction between Group and Voicing, and a three-way interaction between Group, Voicing and PoA-1 (see Table V). This suggests that, in line with our predictions, vowel duration varied as a function of coda voicing in both children and adults, with longer vowel durations before voiced than before voiceless codas. In addition, it suggests that children had a larger difference between voiced and voiceless categories than the adults, and that this voicing difference was greater in non-bilabials than in bilabials.
* = p < .05; ** = p < .01; *** = p < .001
Closure duration
Figure 3 shows the closure duration, and Table IVb presents the means for both participant groups. Results of the model fit showed significant effects of Voicing, PoA-1 and PoA-2. There were also three two-way interactions, between Voicing and Group, between Voicing and PoA-1 and between Group and PoA-1, along with two three-way interactions between Voicing, Group and PoA-1, and between Voicing, Group and PoA-2 (see Table VI). This indicates that, as predicted, children and adults exhibited longer closure durations for voiceless than voiced stops and that children had longer closure durations than adults. Children also had a larger difference between voiced and voiceless categories than the adults, and this voicing difference was greater in non-bilabials than in bilabials. Finally, in children, the difference between voiced and voiceless stops was larger for alveolars than for velars.
Burst duration
The burst durations are displayed in Figure 4 by voicing category, PoA and participant group (means are shown in Table IVc). The results of the model fit (see Table VII) revealed significant effects of Voicing, PoA-1 and PoA-2 along with three two-way interactions between Voicing and Group, Voicing and PoA-1, and Group and PoA-2. In addition, there was one three-way interaction between Voicing, Group and PoA-1. In line with our predictions, children exhibited longer burst durations for voiceless than for voiced codas, as did adults, and children had overall longer burst durations than the adults. In addition, the difference between voiced and voiceless non-bilabials was smaller in children than in adults. Finally, both children and adults showed a pattern of longer burst durations for alveolar than for velar codas, with the durational difference more apparent in the children.
Irregular Pitch Periods
Figure 5 and Table VIIIa show the proportion of IPP at the end of the vowel preceding the coda consonant, by voicing category, PoA and participant group. Results of the model fit (see Table IX) showed a significant effect of Voicing, and a two-way interaction between Voicing and Group. As expected, both children and adults used more IPP before voiceless than voiced codas, though children produced more IPP before voiced codas than adults, and less IPP before voiceless codas than adults.
Voice Bar
The percentage of VB during the closure of the coda consonant, by voicing category, PoA and participant group, is presented in Figure 6 and Table VIIIb. The results of the model fit (see Table X) revealed a significant effect of Voicing and a two-way interaction between Voicing and Group. As predicted, both children and adults produced more VB before voiced than voiceless codas. However, children produced fewer instances of VB than adults before voiced codas.
Discussion
The present study investigated the acoustic realization of coda voicing contrasts in 4-year-old speakers of Australian English to determine if these children had acquired adult-like acoustic cues to these contrasts, critical for distinguishing minimal pair words. In line with our predictions, the children in the present study used vowel, closure and burst durations in an adult-like manner to distinguish voiced and voiceless stop codas. Children's vowels before voiced codas were longer than those before voiceless codas, consistent with previous findings in younger American English-speaking children (Ko, 2007; Song et al., 2012), and they produced shorter closure and burst durations for voiced than voiceless codas (cf. Luce & Charles-Luce, 1985; Penney et al., 2018; Song et al., 2012). Nonetheless, children's productions at this age were still systematically longer than those of adults. Figure 7 summarises these observations by voicing category and PoA.
The finding that children's durational cues are longer in absolute terms than those of adults raises the possibility that this might be due to children's slower speaking rate (e.g., Nip and Green, 2013), resulting in longer acoustic durational measures (Green et al., 2000, 2002; Kowal et al., 1975). To examine this possibility, we compared the children's data to those of the adults, with durational measures mean-centered by group (Enders & Tofighi, 2007): that is, for vowel, closure and burst durations separately, we subtracted the grand mean of each participant group from the individual values within this group. We then re-fitted the original linear mixed-effects models for vowel, closure and burst durations using these mean-centered values. This showed that children's durational cues to coda voicing contrasts remained significantly longer than those of adults, even after speech rate differences were taken into consideration. It is possible that these children's longer durational values arise from a lack of articulatory mastery, with children needing more time to fine-tune the use of these temporal cues to voicing contrasts. This would corroborate previous findings regarding the lack of precision in articulatory timing of speech segments in children below the age of 6 (Green et al., 2000; Lowenstein and Nittrouer, 2008; Nittrouer, 1993; Nittrouer et al., 1996).
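The centering step described above can be sketched in Python as follows. The durations are hypothetical and the function name is illustrative; the sketch shows only the subtraction of each group's grand mean, applied to one cue at a time, not the subsequent model re-fitting.

```python
from statistics import mean


def center_by_group(values_by_group):
    """Subtract each group's grand mean from that group's values."""
    centered = {}
    for group, values in values_by_group.items():
        grand_mean = mean(values)
        centered[group] = [v - grand_mean for v in values]
    return centered


# Hypothetical vowel durations (ms) for a handful of tokens per group.
vowel_ms = {
    "children": [410.0, 290.0, 395.0, 305.0],
    "adults": [165.0, 118.0, 158.0, 123.0],
}
centered_vowel_ms = center_by_group(vowel_ms)
```

After centering, each group's values average to zero, while differences between tokens within a group (for example, between voiced and voiceless contexts) keep their original size in milliseconds.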
Interestingly, our results also showed some effect of PoA on closure and burst durations, with group differences at different PoAs whereby children had a larger difference between bilabials and non-bilabials than adults. Similar results have previously been observed for word-initial voicing contrasts, where 4-year-olds tend to show more adult-like voicing contrasts for bilabials first, followed later by alveolars and velars (Barton & Macken, 1980). This is likely due to the lips being easier to control than the tip and body of the tongue. Given young children's smaller oral cavities, contrasts that occur further back in the mouth may need more time to develop (Green et al., 2000).
Finally, we found that the burst durations of alveolar codas were longer than those of velars, for both children and adults. In contrast, Song et al. (2012) reported that the burst durations of alveolars were systematically shorter than those of velars for both children and their mothers. This suggests that burst duration may be used somewhat differently in Australian English compared to American English, and that 4-year-olds are attuned to the cues used in their own dialect of English. This reinforces previous claims in the literature about the importance of systematically documenting the different acoustic implementations of various phonological contrasts for different dialects of English (Chodroff & Wilson, 2017; Scobbie, 2006; Stuart-Smith et al., 2015).
With respect to the binary cues to coda voicing contrasts, the present study found a higher occurrence of IPP before voiceless than voiced stop codas, for both children and adults. This is consistent with previous findings for Australian English-speaking adults, though not with those for American English-speaking children (Song et al., 2012), whose rates of IPP did not vary with voicing. Despite producing IPP less often than adults, children followed the adult pattern. As suggested in previous literature (Penney et al., 2018), it seems that IPP in Australian English is a strong correlate to coda voicelessness that is already in place by the age of 4.
In line with previous findings (e.g., Cole et al., 2007; Shattuck-Hufnagel et al., 2011; Song et al., 2012), the probability of VB during closure was higher for voiced than voiceless codas for both children and adults, though children produced VB less often than adults. The proportion of VB found for adults was also consistent with previous literature on both Australian English (Penney et al., 2018) and American English (Song et al., 2012).
In light of previous findings on 2-year-olds (Song et al., 2012), the results of the present study suggest that, as they become older, children refine their use of vowel duration to contrastively mark voicing toward the adult model. Note that in Song et al. (2012), the 2-year-olds’ vowel duration before voiceless codas was similar to that of the adults in the same study, estimated at ~200 ms. The vowel duration preceding voiced codas, on the other hand, was estimated at ~375 ms for the 2-year-olds and ~275 ms for the adults. This asymmetry in adult-likeness suggests that the finding of a larger voicing contrast (i.e., a larger difference between the vowel durations preceding voiced and voiceless codas) for the 2-year-olds than for the adults was the result of the children's exaggerated lengthening of the vowel before voiced codas. In the current study, however, children had longer vowel durations than adults before both voiced and voiceless codas: the mean vowel duration in children was about 400 ms for voiced and 300 ms for voiceless codas whereas in adults it was 160 ms for voiced and 120 ms for voiceless (see Table IV). Surprisingly, the 4-year-olds in the present study still exhibited a larger magnitude of vowel durational difference than adults for voicing contrasts. This suggests that it may take time for children to develop the fine articulatory timing control as discussed above in relation to speech rate (e.g., Green et al., 2002).
Although the burst duration of the 2-year-olds in Song et al. (2012) did not differ from that of the adults, the 4-year-olds in the current study produced longer burst durations than the adults. This difference might be related to different speech registers: lab speech in the present study vs. spontaneous speech in Song et al. (2012).
Both the 2;6-year-olds in Song et al. (2012) and the 4-year-olds in the current study produced VB for approximately 60% of all voiced codas, suggesting that the use of VB as a cue to coda voicing might have been established by around 2;6 years. Interestingly, these 2-year-olds produced more VB for voiceless codas than the 4-year-olds, which might be related to the different dialects these children were acquiring (i.e., American English vs. Australian English). The use of IPP also varies between the two studies. The (American English-speaking) 2-year-olds produced IPP for about 30% of all voiceless codas, whereas the (Australian English-speaking) 4-year-olds did so for over 70% of all voiceless codas. This difference might be due to the dialectal difference between Australian and American English, given that in both studies adults and children had a similar proportion of IPP. This suggests that the language-specific association of IPP with voicelessness in Australian English might limit any ambiguous use of VB to signal voicelessness.
The results of the present study thus build on previous studies of younger, American English-speaking children, showing that Australian English-speaking 4-year-olds can use adult-like acoustic cues to coda voicing contrasts, including both durational information (vowel, closure and burst durations) and binary cues (IPP and VB). However, even at this older age, children's acoustic implementation of the durational cues tends to be longer than those of adults, and they still use less IPP and VB. These findings contribute to our understanding of phonological development in typically-developing children and provide a much-needed acoustic baseline for evaluating the development of voicing contrasts in populations with language delay.
Unlike stop voicing contrasts in word onset position, coda stop voicing contrasts are still understudied. Although we have here made a start at remedying this situation by looking at the acoustic cues to coda voicing contrasts in production, it would be interesting in future to investigate the perception of coda contrasts. This could elucidate, for instance, whether children rely on different acoustic cues than adults when listening to voicing distinctions in coda position, and how the weighting of various cues develops over time. This would provide a more comprehensive picture of coda development, providing a baseline for understanding the perception and production of codas in other populations, such as those with hearing difficulties.
Conclusion
The goal of the present study was to determine if pre-school-aged children had acquired adult-like phonetic implementations for coda voicing contrasts, critical for distinguishing word meanings in English. Our results provide a much-needed acoustic understanding of children's ongoing phonological development. Since most language evaluations of clinical populations are transcription-based, systematic acoustic analysis is essential for providing complementary information about how and when voicing contrasts are acquired in these populations. The findings presented here will therefore provide a valuable baseline for contributing to our knowledge of voicing contrasts in Australian English, and for evaluating challenges faced by various populations with language delay.
Acknowledgments
We thank the members of the Child Language Lab and the Phonetics Lab at Macquarie University for helpful comments and feedback. We also thank Xin Cheng for assistance with data coding. This research was supported, in part, by a Macquarie University scholarship (2013225) to the first author, an Australian Research Council (ARC) Laureate Fellowship to the last author (#FL130100014), and by the ARC Centre of Excellence in Cognition and its Disorders (#CE110001021). The authors have no competing interests to declare.