Introduction
Although children's ability to convey prosodic functions is by and large established by the age of five, some functions, such as distinguishing compounds (e.g., ‘ice-cream’) from lists (e.g., ‘ice, cream’), continue to develop until the age of thirteen (Wells, Peppé & Goulandris, 2004). This involves ‘prosodic chunking’, or the use of prosody to delimit units within an utterance. Wells and colleagues examined prosodic chunking in British English. Age-related differences in comprehension suggested that children become better at utilizing the relevant acoustic cues between the ages of eight and thirteen. This is in line with reported age-related improvements in other prosodic phenomena, such as differentiating compound vs. phrasal stress (Vogel & Raimy, 2002). An age-related change was also found in production when Wells and colleagues analysed error data. Five-year-olds differed from older children in making more errors on two-item lists containing a compound than on three-item lists without one. Since the errors reported by Wells et al. (2004) were based on perceptual evaluation, it is not clear which acoustic cues the listener may have used. There is some evidence suggesting a correspondence between listener accuracy and acoustic realizations in other types of prosodic contrasts, such as distinguishing questions from statements (Patel & Grigos, 2006; Patel & Brayton, 2009). It is therefore necessary to examine children's acoustic realizations for a better understanding of how they implement acoustic cues during prosodic chunking. As children also exhibit individual variation in their acoustic implementations (cf. Dankovičová, Pigott, Wells & Peppé, 2004), it is also important to be able to replicate the observation about children's prosodic abilities (Wells et al., 2004) (cf. discussions in Open Science Collaboration, 2015). With that in mind, the present study examined how five-year-old Australian English-speaking children acoustically realize the prosodic distinction between compounds and lists, and the extent to which their use of some acoustic cues might differ from that of adults.
Adult use of acoustic cues to distinguish compounds from lists
Compounds are prosodically complex, composed of two prosodic words which combine to form a new prosodic word: [[ice]PW [cream]PW]PW with monomorphemic status (cf. Wheeldon & Lahiri, 2002; Wynne, Wheeldon & Lahiri, 2018). Wells et al. (2004) postulated two different structures for a noun + noun compound and two nouns respectively (Figure 1). The compound constitutes a single prosodic word and a phonological phrase, but the corresponding nouns constitute two prosodic words and two phonological phrases. These two structures differ in prominence pattern and temporal structure as reflected in different acoustic cues.
First, in terms of prominence, it is generally agreed that the first noun of the N + N compound is more prominent than the second (e.g., Liberman & Prince, 1977; Nespor & Vogel, 1986). Therefore, the compound ‘ice-cream…’ will receive more prominence on ‘ice’ than on ‘cream’, resulting in a strong-weak pattern, whereas the list ‘ice, cream…’ will receive equal prominence on both nouns, resulting in a strong-strong pattern (Liberman & Prince, 1977; Nespor & Vogel, 1986; Hayes, 1995). Prominence is often associated with a pitch accent. When prominence is measured in terms of F0 for every word in a compound, the strong-weak prominence pattern will be realized as an F0 drop. The strong-strong prominence pattern, however, will be realized as a relatively flat F0 between words in a list.
Secondly, compounds and the corresponding two nouns will show different temporal structures, because of the absence vs. presence of a boundary. Due to its monomorphemic status, a compound is unlikely to contain an internal boundary. As a prosodic word, a compound also does not allow a higher-level boundary (e.g., a phonological phrase boundary) to occur within it, because this would violate the Strict Layer Hypothesis (Nespor & Vogel, 1986). The absence of an internal boundary will foster tight temporal cohesion within a compound, resulting in short duration (cf. Farnetani, Torsello & Cosi, 1988). A list, however, can contain a phonological phrase boundary between its items. The presence of a postulated phonological phrase boundary in a list will then lead to pre-boundary lengthening. When a word occurs at the end of a sentence, as in ‘He is a good boy’, it is longer than when it occurs in utterance-medial position, as in ‘The boy is happy’. Note that the sentence-final position is also pre-pausal. Pre-boundary lengthening at the end of an utterance is often referred to as utterance/phrase-final lengthening (Streeter, 1978; Lehiste, 1972; Klatt, 1975, 1976; Scott, 1982). Pre-boundary lengthening and pauses are also found to correlate with phonological phrase boundaries and above (e.g., Price, Ostendorf, Shattuck-Hufnagel & Fong, 1991; Wightman, Shattuck-Hufnagel, Ostendorf & Price, 1992). The current study focused on pre-boundary lengthening in utterance-medial position.
Recent findings suggest that the scope of pre-boundary lengthening is not restricted to the final syllable of a word (Turk & Shattuck-Hufnagel, 2007), but can also extend as far as an initial unstressed syllable in a word such as ‘guitar’ (Cho, Kim & Kim, 2013). Therefore, overall word duration, rather than simply syllable or rhyme duration, was used as an indicator of pre-boundary lengthening in the current study.
The other cue to the presence of a boundary is a silent interval (i.e., a pause) (e.g., Cutler & Butterfield, 1990). Pauses, like pre-boundary lengthening, are also associated with higher-level prosodic boundaries such as phonological phrases and above (Price et al., 1991). The use of pauses in distinguishing compounds from lists was evident in Peppé, Maxim & Wells (2000), who reported the absence of pauses in 94% of compounds, and the presence of pauses in 88% of lists. Pause duration has also been observed to increase when the strength of the prosodic boundary goes up from an ‘intermediate phrase boundary’ (e.g., phonological phrase in the current study) to an ‘intonational phrase boundary’ (Choi, 2003). Thus, given the structural difference between compounds and lists presented in Figure 1, pause duration (if there is a pause) is likely to be longer in lists than in compounds. In short, adults can employ F0, pre-boundary lengthening, and a pause to distinguish these two structures in utterance-medial position.
Child use of acoustic cues to compounds and lists
Some perception studies have suggested that five-year-olds are sensitive to F0 and word duration patterns when distinguishing conjoined structures such as ‘pink and (green and white) socks’ from ‘(pink and green) and white socks’ (Beach, Katz & Skowronski, 1996), and structures such as ‘sunflower, pot’ from ‘sun, flowerpot’ (Yoshida & Katz, 2004; Yoshida, 2007). In her thesis, Yoshida (2007) examined American English-speaking five- and seven-year-old children's use of prosody in processing ambiguous grammatical structure in the information-integration framework. She conducted a perception experiment using a test pair of sentences: ‘sunflower, pot’ and ‘sun, flowerpot’. The main findings were that five- and seven-year-old children exhibited an adult-like cue-trading pattern between F0 and word/pause duration, with an age-related developmental shift in the use of F0 as a prosodic cue (cf. Yoshida & Katz, 2004). In other words, young children relied less on F0 than duration, relative to other age groups. This observation differs from that of Beach et al. (1996), who reported a developmental shift in children's use of durational cues in American English. This difference might be related to the fact that Yoshida (2007) manipulated both word and pause duration, whereas Beach et al. (1996) manipulated only word duration. Another study investigating compound vs. phrasal stress has also reported that American English-speaking five-year-olds were less accurate than older children and adults in using acoustic cues to differentiate the two types of stress, partly because of the different prosodic domains to which stress is assigned (Vogel & Raimy, 2002).
These differences in children's perceptual performance in different studies might be related to the different prosodic functions under investigation: prosodic disambiguation by means of the location of a boundary in Beach et al. (1996) and Yoshida & Katz (2004), vs. prosodic domains for stress assignment in Vogel & Raimy (2002). These perception studies suggest that, by the age of five, children are sensitive to various prosodic cues, but that their ability to employ some acoustic cues is still not adult-like. The developmental patterns reported might thus be related to children's learning to map acoustic cues to specific prosodic functions in production.
Of relevance to the current study are a few production studies examining the acoustic realization of compounds vs. lists (cf. Yoshida, 2007; Yoshida & Katz, 2006; Dankovičová et al., 2004). In Yoshida (2007), American English-speaking five- and seven-year-olds described a picture prompt to a blindfolded experimenter using one of two target sentences, ‘sunflower, pot’ or ‘sun, flowerpot’, for a total of seven repetitions. The F0 contour and word duration of ‘sun’ and ‘flower’ were measured, as well as two inter-word pause durations (one between ‘sun’ and ‘flower’ and the other between ‘flower’ and ‘pot’). The main findings were that these five- and seven-year-old children exhibited adult-like patterns for F0, word and pause duration to distinguish the two prosodic structures: ‘sunflower, pot’ vs. ‘sun, flowerpot’ (cf. Yoshida & Katz, 2006). In addition, an age-related difference in pause duration was found: the seven-year-old children used shorter inter-word pause durations than the five-year-olds and the adults between ‘sun’ and ‘flower’. Although a pause was expected in this inter-word location in the target sentence ‘sun, flowerpot’, most seven-year-olds did not use one. This suggests that the mapping of pause duration and incidence of pauses to the function of prosodic chunking takes time to master.
Although phrase-final lengthening emerges in children at around two years of age, at least at the ends of utterances (Snow, 1994), American English-speaking five-year-old children did not exhibit adult-like use of utterance-medial word and pause durations to delimit items according to different phrasal groupings in the ‘pink and green and white’ example mentioned above (Katz, Beach, Jenouri & Verma, 1996). This contrasts with findings from Yoshida (2007), who argued that Katz et al. (1996) might have underestimated five-year-old children's ability to use durational cues for prosodic grouping/chunking.
Examining children's use of durational cues (i.e., pre-boundary lengthening and pause duration) in British English, Dankovičová et al. (2004) focused on the data from the eight-year-olds who completed the Profiling Elements of Prosodic Systems – Child version (PEPS-C) in Wells et al. (2004). Although these children as a group utilized adult-like pre-boundary lengthening and longer pause durations as cues to the presence of a phonological phrase boundary, there was individual variation in the use of the two temporal/durational cues: while a third used both cues unambiguously, in line with the adult pattern, two thirds did not. This suggests that children are still not consistent in how they utilize durational cues at the age of eight.
Although these previous studies indicate that children can generally use prosodic cues to reflect phrasal groupings, there is variation in how these acoustic cues, particularly durational cues, are used across age groups and English dialects. To disentangle children's ability to use acoustic cues in prosodic chunking from those potential confounds, it is necessary to test whether the previous observations generalize to other English dialects when age, stimuli and experimental design are matched to those in previous studies (e.g., Yoshida, 2007; Yoshida & Katz, 2006; Wells et al., 2004).
The current study therefore examined Australian English and investigated the generalizability of previous observations as to how five-year-old children use acoustic cues for producing compounds (N1+N2) and lists (N1, N2) in utterance-medial position, and the extent to which these children are adult-like in their use of acoustic cues. We tested this using an elicited production task where we examined the acoustic realisation of each of these relevant acoustic cues in children's productions, and compared this to that of adults.
Hypotheses
Given previous reports regarding children's ability to use adult-like acoustic cues to delimit a compound (e.g., Yoshida & Katz, 2006; Yoshida, 2007), three hypotheses were formulated, one for each of the three cues.
• H1: Given that five-year-olds showed some degree of distinct F0 patterns in Yoshida (2007), we expected that five-year-olds might also be able to use F0 to reflect distinct prominence patterns for compounds and lists. A strong-weak prominence pattern within a compound would result in a fall in mean F0, and a strong-strong prominence pattern in lists would result in a relatively flat mean F0. These F0 patterns would hold for children and adults, if both groups share similar prosodic structures.
• H2: Given that children as young as two years can use utterance-final lengthening to mark an utterance boundary (e.g., Snow, 1994), and that five-year-olds lengthen the duration of a word preceding a phonological phrase boundary in ‘sun, flowerpot’ (e.g., Yoshida & Katz, 2004, 2006), we expected the five-year-olds to implement pre-boundary lengthening in lists with a phonological phrase boundary. As a result, N1 would be longer in lists than in compounds. Therefore, words in a list would have an overall longer duration than the same two words in a compound. The pattern of pre-boundary lengthening would hold for children and adults, if both groups share similar prosodic structures.
• H3a: Based on Cutler & Butterfield (1990) and Yoshida (2007), we expected fewer pauses in compounds than in lists, because it is less likely for a pause to occur within a compound than between words in a list. This pattern of pause distribution would hold for children and adults, if both groups share similar prosodic structures.
• H3b: Based on Yoshida (2007) and Dankovičová et al. (2004), we expected that pause duration between N1 and N2 would be longer in lists (N1, N2) than in compounds (N1+N2). This pattern of pause duration would hold for children and adults, if both groups share similar prosodic structures.
Method
Participants
Twenty-four typically developing monolingual Australian English-speaking (AusE) children (9 M, 15 F) participated in the study (Mean age = 5;8 years, Range = 5;0–6;7). An additional seven children were excluded for failure to satisfy the criterion of producing 80% of the stimuli (n = 4), Autism Spectrum Disorder (n = 1), and experimental error (n = 2).
Twenty monolingual AusE-speaking undergraduates (6 M, 14 F) from the Sydney area formed the adult baseline for this experiment (Mean age = 21 years, Range = 18–30). They received course credit for participation in the study. A further 18 speakers were excluded, 12 due to exposure to an additional language at home, and six due to the heavy use of creaky voice, where F0 could not be accurately tracked over the voiced region of each noun.
Stimuli
The target stimuli consisted of seven noun-noun items, where N1 and N2 occurred as compounds in one condition and as part of lists in another (see Table 1). In the compound condition, the compound (N1+N2) was embedded in a two-item list (N1+N2, N3). In the list condition, the target stimuli (N1, N2) were part of a three-item list (N1, N2, N3). All the nouns were identical in both experimental conditions. The target stimuli were vetted by all the authors to ensure they were culturally appropriate and familiar to children in Australia. We controlled the word frequency of the nouns using ChildFreq (Bååth, 2010). The N1 items had a combined word frequency of 2199 per million with a mean word frequency of 314.1 per million. The N2 items had a combined word frequency of 1076 per million with a mean word frequency of 153.7 per million. Frequency counts were not available in ChildFreq for three of the compound words. The combined frequency for the remaining four compound words was 370 per million with a mean frequency of 92.5 per million. The number of syllables in N1 was varied to allow for generalisation across word length. All target stimuli occurred sentence-medially and were followed by another noun. This allowed us to dissociate utterance-final lengthening from any other boundary-related lengthening effects on the target stimuli. We also took care to use target words which were picturable and contained segments that five-year-old children could produce (Priester, Post & Goorhuis-Brouwer, 2011). The stimuli were presented as coloured cartoon pictures of objects.
Procedure
At the beginning of the testing session, participants were presented with pictures of the objects to be used in the experiment and were asked to name the objects. This ensured that participants were familiar with both the target nouns and the corresponding visual stimuli. Three practice items were then used to familiarise the participants with the task and the carrier sentence: ‘I can see…’.
In the test phase, a female AusE native speaker played a language game with the participants where they were shown a set of two pictures for the compound condition or three pictures for the list condition and asked: ‘What can you see here?’ Participants were then instructed to respond by completing the carrier sentence using the names of the pictured objects in the order in which they appeared. No feedback was provided during elicitation. Thus, the response to a scene showing ice-cream and juice from left to right would be ‘I can see ice-cream and juice’. The response to a scene showing three items would be ‘I can see ice, cream and juice’. Compound and list items were pseudo-randomised to generate a test set. The item order of the test set was then reversed to generate a second test version. The two test versions were randomly assigned so that half the participants saw the pictures in one order, and the other half in the reverse order. The responses were audio-recorded in a sound insulated booth onto a computer using Audacity audio recording software at a sampling rate of 44.1 kHz, with a Behringer C2 condenser microphone.
Each participant produced 14 target sentences. This resulted in a possible total of 280 items from the adults and 336 items from the children. Nine items from the adult data and 36 items from the child data were discarded due to the insertion of ‘and’ between the two nouns of interest, misarticulation or naming errors (e.g., ‘ice, cubes’ produced as ‘ice, blocks’). Thus, a total of 271 items from the adults and 300 items from the children were included for acoustic coding.
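The item counts above follow from simple bookkeeping, which can be sketched as below. The function and constant names are illustrative and not part of the original study; only the participant numbers, the 14 sentences per participant, and the discard counts come from the text.

```python
# Bookkeeping sketch for the item counts reported in the text.
# Names are hypothetical; the figures are taken from the paragraph above.
SENTENCES_PER_PARTICIPANT = 14


def item_counts(n_participants, n_discarded):
    """Return (possible items, items kept for acoustic coding)."""
    possible = n_participants * SENTENCES_PER_PARTICIPANT
    return possible, possible - n_discarded


adults = item_counts(n_participants=20, n_discarded=9)
children = item_counts(n_participants=24, n_discarded=36)
print("adults:", adults)
print("children:", children)
```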
Acoustic coding
All remaining productions were annotated and segmented in Praat (Boersma & Weenink, 2011). Since the stimuli contained various segment types in the nouns of interest, we adopted the following criteria to identify the onset and offset of N1 and N2. For onsets: (a) when the noun contained no onset consonant, the beginning of clear F2 and voicing were used as cues to the noun onset, (b) when the onset consonant was a plosive or an affricate, the beginning was indicated by the onset of the burst release, (c) in nouns beginning with an approximant /w/, we used the intensity minimum and the lowest formant transition in F2 as the word onset, (d) when the onset contained the approximant /ɹ/, the F3 minimum was used as the point of demarcation, and (e) in case of a fricative onset consonant, the beginning of high energy noise was used. The criteria for offsets were: (a) minimal high energy noise to identify the end of a fricative coda, (b) the beginning of the burst release to signal the end of words with a plosive coda, (c) the end of nasal formants and voicing to indicate the end of words with a nasal coda, and (d) cessation of F2 to identify the end of word-final vowels.
Analysis
Using Praat (Boersma & Weenink, 2011), we extracted F0 and the duration values of the two nouns (N1 and N2), as well as the duration of the pause between them, in both the compound (N1+N2) and the list (N1, N2) conditions. F0 values within the voiced region of N1 and N2 were examined for creak, glottalization and pitch errors (e.g., pitch doubling and pitch halving) in Praat. Pitch errors were corrected manually before automatically extracting the mean F0 over the voiced region for each noun. Due to the difficulty in tracking F0 with glottalization, nine of the 271 items from adults and one of the 300 items from children were removed from the data. This resulted in 262 items for the adults and 299 items for the children being used for statistical analysis.
Mean F0 was employed to minimize any micro-prosodic perturbation at the beginning and the end of the voiced region for each noun. Due to differences in vocal tract size, F0 differs between children and adults (e.g., Lee, Potamianos & Narayanan, 1999; Vorperian, Wang, Chung, Schimek, Durtschi, Kent, Ziegert & Gentry, 2009). Therefore, F0 was transformed into normalized F0 for each group according to the following: (F0 – Mean of F0 group) / Standard deviation of F0 group. Since child speech is often characterised by a slower speaking rate than adult speech, N1 and N2 durations were also transformed into normalized durations for each group according to the following: (Duration – Mean of duration group) / Standard deviation of duration group. Normalization of F0 and duration was further motivated by Aoyama, Akbari and Flege (2016), who found that 10-year-old American English-speaking children produced longer utterances and higher F0 than adults in absolute terms, but that these differences diminished when proportional metrics were used.
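The within-group normalization described above is a standard z-score transformation, which can be sketched as follows. This is a minimal illustration rather than the authors' script: the function name and the example F0 values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev


def normalize_within_group(values):
    """Z-score each value against the mean and SD of its own group.

    Assumption: sample standard deviation (n - 1); the source does not
    specify whether the sample or population SD was used.
    """
    group_mean = mean(values)
    group_sd = stdev(values)
    return [(v - group_mean) / group_sd for v in values]


# Hypothetical mean-F0 values in Hz, one list per speaker group.
child_f0 = [250.0, 270.0, 290.0]
adult_f0 = [110.0, 120.0, 130.0]

# Each group is normalized separately, so the absolute F0 difference
# between children and adults does not enter the normalized scores.
print(normalize_within_group(child_f0))
print(normalize_within_group(adult_f0))
```

The same function applies unchanged to word durations and to pause durations, provided each group's values are passed in separately.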
Since some items contained the closure duration of a plosive consonant between N1 and N2, a closure duration threshold was first factored in as a reference criterion to determine the presence or absence of a pause between N1 and N2. The reference closure duration was based on a recent corpus study of closure duration in English stops in TIMIT (Ghosh & Narayanan, 2009). The reference closure duration for voiced bilabials was 83 ms (Mean of 63 ms plus SD of 20 ms), and that for voiceless velars was 74 ms (Mean of 54 ms plus SD of 20 ms). If the temporal interval between the end of N1 and the beginning of N2 exceeded the reference closure duration by segment type, i.e., 83 ms for bilabials and 74 ms for velars, a pause was coded as present. A pause was considered absent otherwise. The presence of a pause was then tallied from the productions of children and adults in both compound and list conditions for analysis. Since the same nouns were included in both conditions (i.e., compounds and lists) and the same procedure of determining pauses was applied to both conditions, the effect of a stop consonant on the pause metric was kept constant across conditions. For items coded with presence of pause, pause duration was calculated according to the following: Temporal onset of N2 – Temporal offset of N1. Since speech rate might influence pause duration (e.g., Goldman Eisler, 1968; Fletcher, 1987; Trouvain & Grice, 1999), we also normalized pause duration according to the following: (Pause duration – Mean of pause duration group) / Standard deviation of pause duration group. Normalized F0, normalized duration, number of pauses and normalized pause duration were treated as dependent variables in subsequent statistical analysis.
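The pause-coding rule above can be sketched as a small function. The thresholds (mean plus one SD of closure duration) are taken from the text; the function name, the segment-type labels, and the treatment of items with no intervening stop (any positive interval counts as a pause) are assumptions made for illustration only.

```python
# Reference closure durations in ms (mean + 1 SD), as given in the text.
CLOSURE_THRESHOLD_MS = {
    "bilabial": 63 + 20,  # voiced bilabial stops
    "velar": 54 + 20,     # voiceless velar stops
}


def code_pause(n1_offset_ms, n2_onset_ms, stop_type=None):
    """Return (pause_present, pause_duration_ms or None).

    A pause is coded as present when the N1-offset-to-N2-onset interval
    exceeds the reference closure duration for the intervening stop.
    Assumption: when no stop intervenes (stop_type is None), the
    threshold is 0 ms; the source does not describe this case.
    """
    interval = n2_onset_ms - n1_offset_ms
    threshold = CLOSURE_THRESHOLD_MS.get(stop_type, 0)
    if interval > threshold:
        # Pause duration = temporal onset of N2 - temporal offset of N1.
        return True, interval
    return False, None


# Hypothetical timestamps in ms.
print(code_pause(1000, 1090, "bilabial"))
print(code_pause(1000, 1080, "bilabial"))
print(code_pause(1000, 1075, "velar"))
```

Pause durations returned for items coded as containing a pause would then be normalized within each speaker group in the same way as F0 and word duration.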
Ten percent of the trials were randomly selected from children (n = 31) and adults (n = 36) for recoding by another annotator to check for coding consistency. These items covered a range of segment types in the target stimuli to ensure representative sampling. N1 and N2 durations were used as the dependent variable for correlation analysis, which showed high levels of reliability between the two coders for both the child (r = .957, p < .0001) and adult data (r = .976, p < .0001).
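The inter-coder reliability check reported above is a Pearson correlation between the two annotators' duration measurements. A minimal sketch is given below; the function name and the duration values are hypothetical and are not the study's data, and no significance test is computed.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of equal length (at least 2 items)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical word durations (ms) for the same items from two coders.
coder_a = [310.0, 420.0, 275.0, 505.0]
coder_b = [305.0, 428.0, 280.0, 498.0]
print(round(pearson_r(coder_a, coder_b), 3))
```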
Results
Data from the 24 children and 20 adults were analysed. The normalized F0, normalized duration and normalized pause durations were evaluated in R (R Core Team, 2015), using the lme4 package (Bates, Mächler, Bolker & Walker, 2015). The anova function in the lmerTest package (Kuznetsova, Brockhoff & Christensen, 2016), which provides Satterthwaite's approximation to degrees of freedom for estimating p-values, was used to test for statistical significance of the linear mixed effects models. Pairwise comparisons of multi-level factors were performed with Tukey-HSD adjustments, using the lsmeans package (Lenth, 2016). The number of pauses was evaluated in a mixed effects logistic regression model (binomial), using the car package (Fox & Weisberg, 2011).
F0
H1 predicted a strong-weak prominence pattern (i.e., F0 fall) from N1 to N2 in compounds, but a strong-strong prominence pattern (i.e., relatively flat F0) in lists. Figure 2 displays the overall pattern of normalized F0 for N1 and N2 in compounds and lists from children and adults.
A linear mixed effects model was fitted to normalized F0. The within-subject factors were Type (Compounds vs. Lists) and Noun position (N1 vs. N2), and the between-subjects factor was Group (Children vs. Adults). Random factors included by-subject and by-item intercepts and slopes. The results revealed significant main effects of Type and Noun position with a significant Type*Noun position interaction. No other interactions reached significance (Table 2). Figure 2 reveals how Type interacts with Noun position: compounds (N1+N2) exhibited a rising F0 pattern from N1 to N2, whereas lists (N1, N2) exhibited a falling F0 pattern from N1 to N2. This result runs counter to our predictions: neither the children nor the adults produced the expected F0 pattern. The unexpected F0 patterns might be related to the effect of intonation (cf. Morrill, 2011), an issue taken up in the Discussion. Nevertheless, compounds and lists exhibited distinct F0 patterns reflecting different structures.
Pre-boundary lengthening
According to H2, children, like adults, would use pre-boundary lengthening to indicate the absence vs. presence of a phonological phrase boundary in differentiating compounds (N1+N2) from lists (N1, N2), with longer duration for the N1 in the list condition. Figure 3 displays the patterns of normalized duration for compounds and lists in children and adults. Normalized duration of N1 and N2 in lists have positive values, suggesting lengthening. However, normalized duration of N1 in compounds has negative values, suggesting shortening (probably due to polysyllabic shortening within a compound). Normalized duration constituted the dependent variable in the linear mixed effects model, with Type, Noun position and Group as factors. The random structure of the model included by-subject and by-item intercepts and slopes.
The results showed significant main effects of Type and Noun position, with a significant Group*Noun position interaction (Table 3). As predicted, the overall normalized duration of N1 and N2 in lists was longer than that in compounds, and this pattern held for both children and adults, suggesting pre-boundary lengthening in lists. There was also a Group*Noun position interaction: the effect was due to a statistically significant difference between children and adults in the normalized duration of N1 in the compounds, with children showing less shortening of N1 than adults (Table 4).
Pauses
There were two H3 predictions. First, we expected a difference in the incidence of pauses between compounds and lists, because it is less likely for a pause to occur within a compound than in a list. Second, we expected pause duration within a compound to be shorter than that within a list (if there is any pause in a compound), because N1 in compounds is not in a pre-boundary position, whereas its N1 counterpart in lists is. Figure 4 displays the incidence of pauses in compounds and lists for both children and adults, and Figure 5 shows the respective normalized pause duration.
A mixed effects logistic regression model (binomial) was fitted to the number of pauses, with Type and Group as factors. The reported model included by-subject intercepts and slopes and by-item intercepts. There were significant effects of Type, Group and Type*Group interaction (Table 5).
Consistent with the prediction in H3a, children used significantly fewer pauses in compounds (N1+N2) than in lists (N1, N2). Adults exhibited a similar pattern, with significantly fewer pauses in compounds (15%) than in lists (88%). However, children used pauses in compounds roughly three times as often as adults did. The Type-by-Group interaction arose because children had a much higher use of pauses for compounds (48%) than adults.
To evaluate H3b, the normalized pause duration from items coded for the presence of pause was fitted in a linear mixed effects model, with Type and Group as factors. The model included by-subject intercepts and slopes and by-item intercepts. There was a significant effect of Type (Table 6). Thus, as predicted in H3b, pause duration in compounds was shorter than that in lists, for both children and adults.
As Wightman et al. (Reference Wightman, Shattuck-Hufnagel, Ostendorf and Price1992) reported pre-boundary lengthening to be larger for boundaries with pauses than those without, it is possible that the high usage of pauses in children's productions of compounds might be due to the presence of a major boundary. To explore this issue, we examined N1 durations in compounds and lists with vs. without pauses. Table 7 displays the means and standard deviations of the N1 durations in compounds and lists for children and adults. Children generally showed longer N1 durations in compounds and lists with pauses than those without. However, adults exhibited shorter N1 durations in compounds with pauses than those without. Since the N1 is part of a monomorphemic compound and is not in a pre-boundary position, pauses are not expected. If pauses occur in compounds, adults seem to compensate by shortening N1 duration. This suggests a compensatory relationship between the incidence of pauses and pre-boundary lengthening for adults, but this is not the case for children. Instead, children appear to have a longer N1 duration with a pause in compounds than without – possibly suggesting the presence of a major boundary.
Discussion
In this study we investigated whether five-year-olds can use prosodic cues to distinguish compounds from lists in their speech productions, and the extent to which their use of cues is acoustically adult-like. Regarding the use of F0 as a cue, our results show that, like adults, these children can associate distinct F0 patterns with compounds and lists, with F0 rising from N1 to N2 in compounds and F0 falling in lists. However, these distinct F0 patterns did not correspond to the expected prominence patterns, namely the strong-weak pattern for compounds vs. the strong-strong pattern for lists. The compound stimuli in the current study were selected to have lexical stress assigned to N1, whereas both N1 and N2 in the list attracted stress. If F0 is used as one of the acoustic correlates of stress (e.g., Fry, Reference Fry1958; Morrill, Reference Morrill2011), it was expected to be higher in N1 than N2 for compounds, but equal between N1 and N2 for lists. As Morrill (Reference Morrill2011) has shown, the intonational/prosodic context can affect how robustly the F0 cue serves as a correlate of compound stress in American English. For instance, a compound with primary stress on ‘Red’ in ‘Red Sox’ had a higher F0 on ‘Sox’ than ‘Red’ when the compound was produced with question intonation, which has an H% boundary tone. In our study, the five-year-old children and adults exhibited an F0 pattern similar to that reported in Morrill (Reference Morrill2011). This suggests that an H% boundary might have influenced how the expected prominence patterns of compounds could be realized. The short word length and the nature of the segments (for example, voiceless stops selected for ease of segmentation) in our study made it difficult to examine the continuous F0 contours of compounds and lists to shed light on the use of boundary tones, which deserves future investigation.
Recall that Vogel and Raimy (Reference Vogel and Raimy2002) found that children had difficulty using stress information to differentiate compounds from phrases during a listening task. The difficulty children faced in relying on F0 as a cue to compound stress in Vogel and Raimy (Reference Vogel and Raimy2002) might be related to the ambiguity of F0 as a stress cue in different intonation contexts.
Similar to the findings for American English in Yoshida and Katz (Reference Yoshida and Katz2006) and Yoshida (Reference Yoshida2007), our pre-boundary lengthening results indicate that, like the adults, Australian English-speaking children can employ duration to differentiate compounds from lists, with overall longer duration in lists. This suggests pre-boundary lengthening of a phonological phrase in lists. Children did not differ from adults in the duration of N1 and N2 in lists, indicating that the pattern of pre-boundary lengthening is adult-like by five years in both English dialects. This is also consistent with previous reports that children around the same age can use duration as a cue to demarcate other types of prosodic units in perception (Beach et al., Reference Beach, Katz and Skowronski1996).
Results from pause occurrence and pause duration indicate that both children and adults used more pauses and had longer pause durations in lists than compounds. The long pause duration patterns in the current study replicated those in Yoshida (Reference Yoshida2007). Children, like the adults, generally employed pauses and pause duration to signal a phonological phrase boundary in lists, consistent with the findings in Price et al. (Reference Price, Ostendorf, Shattuck-Hufnagel and Fong1991) and Wightman et al. (Reference Wightman, Shattuck-Hufnagel, Ostendorf and Price1992). However, children differed from adults in their use of pauses in compounds, with pauses used 48% of the time after the N1 in compounds, compared to only 15% for adults, though both groups used comparable rates of pauses in the lists. In other words, there is an age-related difference in the incidence of pauses within a compound.
The children's high usage of pauses in compounds is inconsistent with their pre-boundary lengthening pattern, whereby N1 and N2 in compounds are shorter than their counterparts in lists. On the one hand, the high incidence of pauses in compounds suggests some kind of boundary; on the other, N1 and N2 durations in compounds suggest no boundary. Price et al. (Reference Price, Ostendorf, Shattuck-Hufnagel and Fong1991) and Wightman et al. (Reference Wightman, Shattuck-Hufnagel, Ostendorf and Price1992) pointed out that pre-boundary lengthening increased with the presence of a pause for high-level prosodic boundaries. When we compared the N1 duration in compounds with and without a pause, we found that children lengthened only N1 durations in the compounds with pauses, a temporal pattern opposite to that of adults, who shortened N1 duration in compounds with pauses. This, together with the incidence of pauses, suggests that children might treat N1 in compounds as constituting a separate unit with a boundary. This inconsistent use of pauses and pre-boundary lengthening is in line with the inconsistent use of temporal cues by the eight-year-olds in Dankovičová et al. (Reference Dankovičová, Pigott, Wells and Peppé2004), who used stimuli similar to those used in the current study. Recall that the correct prosodic structure of a compound is a holistic prosodic word. Adults process compounds as a single prosodic unit during phonological encoding (Wynne et al., Reference Wynne, Wheeldon and Lahiri2018). Yet the five-year-olds in the current study did not seem to use word duration and incidence of pauses in a coherent manner to reflect that.
Our interpretation of these findings is that children are somewhat uncertain about the mapping between durational cues and the prosodic structure of compounds (N1+N2) in their acoustic realization. Perhaps they have a problem in suppressing lexical word structure in compounds. If children construct the high-level complex prosodic word (i.e., PW) of a compound by building from embedded prosodic words in a bottom-up manner, this will lead them to insert a boundary between ‘ice’ and ‘cream’ in [[ice]PW[cream]PW]PW. This tendency for five-year-olds to insert a pause in compounds then suggests that lexical structure might have (mis)guided children in formulating prosodic structure (cf. Vogel & Raimy, Reference Vogel and Raimy2002). Perhaps children first acquire the prosodic cues for simple prosodic words, and only later learn to use and weigh the word duration and pause cues consistently to reflect high-level prosodic units (e.g., PW) and different prosodic structures/domains (see Gerken, Reference Gerken2006; Demuth & McCullough, Reference Demuth and McCullough2009, and Tang, Yuen, Xu Rattanasone, Gao & Demuth, Reference Tang, Yuen, Xu Rattanasone, Gao and Demuth2019, for similar proposals for younger children in other domains). This interpretation is based on the observation that children do not compensate for the presence of a pause by shortening N1 duration in compounds. Since unfilled pauses might also reflect verbal planning functions in five-year-olds (cf. MacWhinney & Osser, Reference MacWhinney and Osser1977), the high incidence of pauses might also suggest that it is cognitively demanding for children to encode the recursive structure of compounds during planning in lab speech, resulting in the inconsistent use of durational cues. In spontaneous speech, it would then be even harder for children to do so.
Although five-year-old children can use F0, word duration, pause and pause duration to indicate the different prosodic structures of compounds and lists in Australian English, they are not yet adult-like in their use of pauses. Our findings are in partial agreement with previous findings. On the one hand, the findings in American English from Katz et al. (Reference Katz, Beach, Jenouri and Verma1996) do not fit neatly with our findings in Australian English. This might be related to the different linguistic structures of the stimuli and tasks. It might be more difficult and cognitively demanding to map durational cues to reflect the three phrasal groupings/structures of ‘pink and green and white’: (pink) (and green) (and white) vs. (pink) (and green and white) vs. (pink and green) (and white) in a spontaneous speech task (Katz et al., Reference Katz, Beach, Jenouri and Verma1996). In contrast, in our study there are only two structures to be disambiguated: ‘(ice-cream) (and juice)’ vs. ‘(ice), (cream), (and juice)’. Yoshida (Reference Yoshida2007), eliciting similar linguistic structures in American English, found a similar ability of five-year-old children to use acoustic cues for prosodic chunking.
On the other hand, the age-related difference in the use of pauses in Australian English was not observed in American English (Yoshida, Reference Yoshida2007). Perhaps this might be related to the number of test items. Yoshida (Reference Yoshida2007) tested a pair of phrases: ‘sun, flowerpot’ vs. ‘sunflower, pot’; whereas the current study examined seven different test pairs (see also Dankovičová et al. (Reference Dankovičová, Pigott, Wells and Peppé2004) who reported individual variation in the use of durational cues by eight-year-old children, using nine different test pairs).
Our data show that children do not have problems using acoustic cues to reflect different structures, but may have problems planning where and what kind of boundary to use for compounds. This raises further questions as to what kinds of structural frames (prosodic or lexical) children generate and use to guide their speech production and planning, and when this becomes adult-like. The current findings therefore have implications for assessing the prosodic abilities of atypical populations as well, such as children with autism spectrum disorder (ASD) who encounter problems in processing prosody (Peppé & McCann, Reference Peppé and McCann2003).
Conclusion
This study found that five-year-old children can utilize different acoustic cues to distinguish compounds from lists. Like adults, they employ pre-boundary lengthening and pauses to signal the presence of a phonological phrase boundary in lists. In other words, they do not have problems in using temporal cues to signal a boundary. However, these temporal cues were used inconsistently in compounds, suggesting that children do not have an adult-like mapping of acoustic cues to the prosodic structure of compounds during planning. It seems that children tend to preserve a lexical word representation of N1 in a compound, in competition with the status of a compound as a holistic complex prosodic word. This suggests that the challenge of prosodic chunking for five-year-old children may be related to the recursive prosodic word structure of compounds. These findings raise further questions regarding how and when children can construct adult-like prosodic structure for compounds, and how and when they can use these acoustic cues to distinguish compounds from lists. These issues can be tested in language comprehension as well, comparing the results to production in the same children.
Acknowledgments
This research was supported by Macquarie University and the following grants from the Australian Research Council (ARC): ARC CE110001021, awarded to Crain et al.; ARC FL130100014, awarded to Katherine Demuth. We would like to thank our participants for taking part in the research, Amit German for data collection and coding, and the Child Language Lab and Phonetics Lab for helpful feedback and suggestions.