INTRODUCTION
Prominences encode the hierarchical relationships between prosodic units and signal the edges of phrases (cf. Beckman, Hirschberg & Shattuck-Hufnagel, 2005; Cruttenden, 1986; Pierrehumbert, 1980). Prominences at the word level are due to lexical stress, and are conveyed via duration and amplitude changes (Fry, 1955; Kochanski, Grabe, Coleman & Rosner, 2005; Lehiste & Fox, 1990; Mo, 2008; Turk & Sawusch, 1996). Vowel quality and a high fundamental frequency (F0) are also frequently associated with lexical stress, but these correlates vary across lexical items and prosodic environment (Beckman & Edwards, 1994; Huss, 1978). Prominences at the phrase level are due to default or context-dependent pitch accenting, and are conveyed by F0 peaks or valleys (Beckman, 1986; Cooper, Eady & Mueller, 1985; Fry, 1958; Mo, 2009).
In adult speech, phrasal pitch accents land on stressed syllables (Bolinger, 1961; Hayes, 1984, 1995; Shattuck-Hufnagel, Ostendorf & Ross, 1994; Vanderslice & Ladefoged, 1972), giving rise to prominence integration. The alignment of prominences at the phonological level is conveyed by the temporal alignment of acoustic features associated with lexical stress and pitch accenting. Prominence integration is especially striking under conditions where lexical stress is shifted from a default position, for example, in a stress clash context (Hayes, 1984, 1995; Liberman & Prince, 1977). The stress clash context allows for the possibility of misalignments between lexical stress and pitch accent, which underscores the notion that word- and phrase-level prominences must be actively integrated at some level in phonological or phonetic planning. The current study focused on the questions of how word- and phrase-level prominences are organized in the language of school-aged children and whether the integration of prominences is as robust in child as in adult language.
Integrated word- and phrase-level prominences in adult language
Phrases where lexical stress shift is likely to occur may be used to examine how tightly word- and phrase-level prominences are integrated, because this stress shift may or may not be accompanied by a shift in the location of pitch accent. Stress shift reportedly occurs in clash contexts, that is, in contexts where word sequencing results in adjacent, lexically stressed syllables. Consider, for instance, Liberman and Prince's (1977) classic example of leftward stress shift in the phrase thirteen men, which is represented in Figure 1 using metrical grid notation. In the grid, ‘x’s mark each syllable at the lowest level, then lexical stress at the next level, and phrasal prominence at the highest level. The height of an ‘x’ bar represents cumulative prominence for each syllable. Cumulative prominences are reflected in the phonetic realm following the general assumption, well-articulated by Cho and Keating (2009), that ‘phonological categories of prominence can be translated into a single prominence scaling, and that perceivers can differentiate varying degrees of prominence along such a scale’ (p. 468). According to this assumption, then, the highest ‘x’ bar in the grid is perceived as the strongest prominence, and the strongest prominence is in turn a consequence of prominence integration, that is, the temporal alignment of lexical stress and phrasal accent.
Fig. 1. Models of lexical and phrasal prominences in two-word phrases produced by adults. Circles indicate prominence location, and arrows indicate prominence shifting.
The arrow in panel A of Figure 1 shows only a word-level shift in prominence. In Metrical Stress Theory (Liberman & Prince, 1977), phrasal prominence (main stress) does not move. Rather, stress shift is viewed as a word-level phenomenon driven by a clash-avoidance strategy and by language-specific preferences; in this case, an English preference for a trochaic pattern. This view of stress shift can be contrasted with the approach taken in Intonational-Metrical Theory (Bolinger, 1986; Gussenhoven, 1991; Shattuck-Hufnagel et al., 1994). In this theory, lexical stress shift at the word level is conditioned by phrase-level patterns as well as by language-specific preferences for some metrical patterns over others. The phrase-level patterns in English follow from a ‘template, with strong connections to rhythm, whereby a sort of annunciatory or attention-getting accent comes toward the beginning (of a phrase) and a “punch” accent comes toward the end’ (Bolinger, 1985: 85). Panel B in Figure 1 shows how shifts in prominence at both word and phrase levels contribute to the accumulation of prominence under conditions of stress clash in Intonational-Metrical Theory.
Supporting evidence for the Intonational-Metrical Theory comes from the observation that stress shift is dependent on phrase position (Grabe & Warren, 1995; Shattuck-Hufnagel, 1992; Shattuck-Hufnagel et al., 1994). For example, a leftward stress shift will occur in the phrase thirteen monkeys when it is in phrase-initial position (e.g. Thírteen mónkeys escaped from the zoo this morning), but not when it is in phrase-final position (e.g. Media reports focused on the health of those thirtéen mónkeys). In fact, the preference for a phrase-initial accent is so strong that stress shift may occur even in the absence of clash (e.g. Thírteen goríllas escaped from the zoo this morning) (Shattuck-Hufnagel et al., 1994).
Word- and phrase-level prominence patterns in child language
Previous research has established that English-speaking children realize lexical stress by age two, and control trochaic (strong–weak) patterns earlier than iambic (weak–strong) patterns (Allen & Hawkins, 1980; Gerken, 1991; Kehoe, Stoel-Gammon & Buder, 1995; Schwartz, Petinou, Goffman, Lazowski & Cartusciello, 1996), but very little is known about how or whether children subordinate word-level prominences in service of phrase-level patterns. Recent research suggests that children also realize major pitch accents and boundary tones by age two, but the repertoire and accuracy of tonal targets are more adult-like in trochaic than iambic patterns. For example, Snow (2007) found that young children's production of adult-like falling and rising intonational patterns in polysyllabic utterances is more restricted in a weak–strong context than in a strong–weak context. Similarly, Astruc, Payne, Post, Vanrell, and Prieto (2013) demonstrated that toddlers are less able to align pitch accent peaks and stressed syllables in words with ultimate stress than in words with penultimate or antepenultimate stress. These results could suggest that children's early preference for the trochaic pattern (i.e. at the one-word stage) affords them more practice with prominence integration in trochees than in iambs, and this then generalizes to short utterances.
Work with children between five and ten years old suggests that phrase-level patterns of prominence integration in child language are immature well into middle childhood. Immature patterns include a strong bias towards an early accent in phrases such as hot dóg, perceived as compound stress (Allen & Hawkins, 1980; Atkinson-King, 1970; Vogel & Raimy, 2002); undifferentiated, and therefore unadult-like, prominence patterns in prosodic units with different morphosyntactic structures (Goffman, 2004); and difficulties incorporating two lexically stressed words into a single intonational phrase (Wells, Peppé & Goulandris, 2004).
If immature phrase-level patterns in child language result in the poor integration of word- and phrase-level patterns for some structures, we might expect a mismatch between lexical stress placement and pitch accent placement. These immature phrase-level patterns may be most apparent under conditions that promote stress shift in that different contexts defined by word-level prominence patterns allow for the misalignment of word- and phrase-level prominences.
Current study
The current study investigated child and adult realizations of lexical stress and pitch accents in clash and non-clash contexts in an effort to better understand the acquisition of integrated word- and phrase-level prominence marking. Although this study may be the first to examine both lexical stress and pitch accent in these contexts, it is not the first to examine children's sensitivities to clash and non-clash contexts. Goffman, Heisler, and Chakraborty (2006) were interested in the possibility that children may exhibit a greater rhythmic readjustment in stress clash and gap contexts than adults due to a stronger preference for the strong–weak patterns in English. Goffman and colleagues elicited sentences with stress clash (e.g. Bób's púppup is falling) and stress gap (e.g. Bóbby's puppúp is falling) from four- to seven-year-old children using puppet props and skits. Kinematic data showed that children reproduced the strong–weak and weak–strong alternations in the nonword puppup (i.e. [ˈpʌpəp] and [pəˈpʌp]), but did not reorganize prominence patterns at the phrase level to avoid stress clash or stress gap contexts. Then again, neither did the adults in the study.
It turns out that stress shift is not always detected in production studies. This is because stress shift is probabilistic: the likelihood of its occurrence is influenced by several factors including metrical context (Hayes, 1995; Liberman & Prince, 1977; Quené & Port, 2002), phrase and word structures (Bolinger, 1961; Gussenhoven, 1991), word position in the phrase (Grabe & Warren, 1995; Shattuck-Hufnagel, 1992; Shattuck-Hufnagel et al., 1994), and speech rate (Quené & Port, 2002). These factors have not always been controlled in experiments that investigate the rhythmic organization of speech. For example, the stress clash context in Goffman et al.'s (2006) study used the phrase-initial word Bob, which contains only one syllable, thus making a leftward stress shift in this word impossible.
Moreover, stress shift is not a stress pattern reversal from weak–strong to strong–weak. Production studies that find stress shift in clash contexts, like thirtéen mén, indicate that stress is not so much shifted as equalized across a word (Cooper & Eady, 1986; Grabe & Warren, 1995; Horne, 1990; Shattuck-Hufnagel et al., 1994; Vogel, Bunnell & Hoskins, 1995). That is, the lexically unstressed (e.g. thir-) and stressed (e.g. -teen) syllables in a word are produced with more similar duration and amplitude values in stress clash contexts than in non-clash contexts. At least for duration patterns, the neutralization of relative differences may be attributed to the shortening of stressed syllables in stress clash contexts rather than to the lengthening of unstressed syllables (Vogel et al., 1995).
The current study was undertaken with full knowledge that stress shift can be difficult to detect. For this reason, we used a counting task that controlled for metrical context, phrase structure, information structure, target word position in the phrase, and speech rate. The task was designed to encourage highly rhythmic speech (i.e. speech with regularly timed intervals), albeit more naturally than in studies that have used continuous repetition of a particular phrase (i.e. cycling) or metronome-timing of speech (Cummins & Port, 1998; Quené & Port, 2002). Metrical context (clash vs. non-clash) was manipulated in our counting task by using intervening nouns with different lexical stress patterns. We also elicited a straight count (e.g. thirteen, fourteen, fifteen, etc.) to provide a baseline for the assessment of stress shift and prominence integration. We assumed that word- and phrase-level prominences would be temporally aligned in the lexically stressed -teen syllable in the straight count condition. This assumption was experimentally validated in the present study.
With respect to prominence integration, pitch accent placement was expected to be in the number word in both phrasal contexts, and not on the noun, because in a counting task the number represents new information. This expectation was confirmed in a perceptual judgment experiment. Briefly, eleven adult judges were asked to decide which word was emphasized in the adult and child productions of the N-teen barbeque and N-teen banana phrases. On average, seven out of eleven judges perceived the number word as the most prominent element in these phrases. This result held for both phrasal contexts and for adult and child speech, thereby validating our assumption that number words would be accented in number+noun phrases.
Two sets of predictions were made with regard to stress shifting and prominence integration in phrasal contexts. The first set of predictions was made with regard to adult speech (1 and 2 below), and the second set with regard to child speech (3 and 4 below):
1. In a clash context, the phrase-level prominence was expected to be on the initial syllable due to an early accent bias in a phrasal context (see the discussion of Intonational-Metrical Theory above). Word-level prominence was predicted to shift due to the clash context and the pressure to integrate word- and phrase-level prominences.
2. In a non-clash context, adults were also predicted to align prominences because these prominences are integrated during planning. The location of cumulative prominence was predicted to be on the second syllable according to Metrical Stress Theory, but on the first syllable according to Intonational-Metrical Theory.
3. In a clash context, children were predicted to behave like adults insofar as the conditioning context also interacts with their presumed preference for trochaic stress patterns and an early accent bias.
4. In a non-clash context, children were predicted to misalign word- and phrase-level prominences if they faithfully preserve the lexical stress pattern and yet still implement an early accent bias.
To summarize, we examined the integration of stress and pitch accent in a phrase, which would entail the alignment of lexical stress and pitch accent on a single syllable in the number word. Acoustic correlates of lexical stress (duration and amplitude) and pitch accents (F0) were measured to determine the extent to which word- and phrase-level patterns shifted in the clash and non-clash conditions, as compared to a straight count condition.
METHODS
Participants
Twenty-five children and twenty-five adults participated in the study. At the time of the study, children ranged in age from 6;2 to 7;3. The adults were all University of Oregon undergraduate students. All participants were monolingual native speakers of American English. Seventeen of the twenty-five participants in each group were female and the rest were male. Child participants were developing typically according to parental report. Children also had average to above average receptive vocabularies, as determined by the Peabody Picture Vocabulary Test (Dunn & Dunn, 2007), and normal hearing, as determined by a pure-tone hearing screen. Adult participants also reported normal hearing and language.
A separate set of twelve University of Oregon undergraduates provided auditory judgments of syllable prominence in number words.
Counting task
We asked children and adults to count from 1 to 20 under three conditions: a straight count (no phrasal context), a clash context, and a non-clash context. A straight count was elicited to provide information about the default location and baseline realization of lexical stress and phrasal accent during counting (e.g. thirtéen, fourtéen …). The task used to create the clash and non-clash context conditions was modeled after the playground counting method of ‘one Mississippi, two Mississippi, three Mississippi …’ (note the customary singular form of the noun). American children use this method to slow their counting pace in games such as hide-and-seek. In our task, the clash context condition was created by inserting the noun bárbeque in the count (e.g. thirtéen bárbeque, fourtéen bárbeque …). The non-clash context condition was created by inserting the noun banána in the count (e.g. thirtéen banána, fourtéen banána …). The order of the clash and non-clash context conditions was counterbalanced across participants.
The nouns barbeque and banana were chosen because these trisyllabic nouns are well matched in segmental and syllabic structure. We chose to use trisyllabic nouns rather than disyllabic nouns in order to avoid an effect of final phrase-boundary tones on the realization of word-level prominence patterns. The selected nouns were also highly familiar to six- and seven-year-old children, and so were easily depicted (see the task description below). While it is likely that our child participants had more lifetime practice producing the word banána than the word bárbeque, the counting task ensured that by the time children produced the first target phrase (thirtéen bárbeque), they had repeated the noun at least twelve times.
Participants performed the counting task in a quiet laboratory room in the presence of a tester. Whereas the task was simply demonstrated orally for adult participants, testers used a number line and a picture of either a barbeque or a banana to support their oral explanation of the task to children. The props were then used to pace children's counting by moving the picture across the number line. Pacing ensured that children maintained a consistent rate and rhythm across all numbers. Pilot work had indicated that children would otherwise race through the counting task and become unintelligible in so doing. Pacing also allowed the tester to back up on the number line so that a child could regain their counting rhythm if a disfluency occurred. The counting task was trivial for adult participants: all maintained a steady, natural speech rate; and their speech was highly intelligible and fluent. All participants' productions were digitally recorded for later acoustic analyses.
Prominence judgment task
In addition to the acoustic analyses, auditory judgments of syllable prominence in number words were collected to validate the use of the straight count condition as a baseline. Our manipulation of clash and non-clash contexts to examine phrasal prominence structures is only valid if -teen is the default prominent syllable in numbers produced during counting. The counting sequences were excised, blocked by group, and presented to twelve adult listeners. The listeners were asked to decide whether the first or the second syllable was emphasized in the number words. They responded by clicking one of three buttons displayed on a computer screen: ‘N-syllable’, ‘-teen syllable’, or ‘Unsure’.
Acoustic measurements
A total of 300 words and 600 two-word phrases were analyzed in Praat (Boersma & Weenink, 2011). These were the six disyllabic stress-shiftable number words (i.e. 13–16, 18–19) produced by every child and adult participant in a straight count, and in phrases with an intervening strong–weak noun (bárbeque) or an intervening weak–strong noun (banána). Productions of phrases that contained pauses or some other disfluency were excluded from further analysis. One child produced the majority of his phrases disfluently, and so the entirety of his data was excluded. Acoustic measurements were made on the sonorant rhymes of 289 words (adults, N=149; children, N=140) and 558 phrases (adults, N=295; children, N=263). Sonorant rhymes were chosen for measurement instead of nucleus vowels to avoid problems associated with separating vowels from sonorant codas (Grabe & Warren, 1995).
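The token counts above can be checked with a short calculation. The sketch below is illustrative only: the variable names are hypothetical, and the breakdown of the 300 and 600 totals into participants, number words, and contexts is inferred from the description rather than stated as a formula in the text.

```python
# Design totals inferred from the text: 25 children + 25 adults, six number
# words (13-16, 18-19), one straight count and two phrasal contexts.
participants = 25 + 25
number_words = [13, 14, 15, 16, 18, 19]
phrasal_contexts = 2  # barbeque (clash) and banana (non-clash)

total_words = participants * len(number_words)   # straight count tokens
total_phrases = total_words * phrasal_contexts   # number + noun tokens

# Tokens retained for acoustic measurement, as reported in the text.
measured_words = 149 + 140      # adults + children
measured_phrases = 295 + 263    # adults + children

excluded_words = total_words - measured_words
excluded_phrases = total_phrases - measured_phrases

print(total_words, total_phrases)        # 300 600
print(measured_words, measured_phrases)  # 289 558
print(excluded_words, excluded_phrases)  # 11 42
```

The differences (11 words and 42 phrases) include the full data set of the one excluded child as well as individual disfluent productions.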
Word-level prominence patterns
To investigate the realization of lexical stress, rhyme duration and the average root-mean-square (rms) amplitude were measured in both sonorant rhymes of the number words (e.g. r1 and r2 in fifteen, Figure 2) and in the first two sonorant rhymes of the noun words (r3 and r4 in barbeque, Figure 2). Duration and amplitude ratios were then computed for each word by dividing the duration or amplitude of the first rhyme by the second. In this way, we were able to capture relative duration and amplitude patterns across the same number word produced in a straight count and with intervening nouns.
Fig. 2. An oscillogram and a spectrogram of one child's production of the phrase fifteen barbeque. Measurements of duration, amplitude, and F0 were extracted based on a segmentation of sonorant rhymes (r).
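The ratio measures described in this section can be sketched as follows. This is a minimal illustration, not the procedure used in the study, where measurements were made in Praat; the function names, the computation of rms amplitude directly from waveform samples, and all input values are assumptions introduced for the example.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a sequence of waveform samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def prominence_ratios(r1_duration, r2_duration, r1_samples, r2_samples):
    """Return (duration ratio, amplitude ratio), first rhyme over second rhyme."""
    duration_ratio = r1_duration / r2_duration
    amplitude_ratio = rms(r1_samples) / rms(r2_samples)
    return duration_ratio, amplitude_ratio


# Hypothetical token: r1 lasts 100 ms and r2 lasts 150 ms; the sample values
# are invented so that r2 has twice the rms amplitude of r1.
dur_ratio, amp_ratio = prominence_ratios(
    0.100, 0.150,
    [0.2, -0.2, 0.2, -0.2],
    [0.4, -0.4, 0.4, -0.4],
)
print(round(dur_ratio, 3), round(amp_ratio, 3))
```

Under this definition, a ratio below 1 indicates that the second rhyme is longer (or louder) than the first, and a ratio above 1 indicates the reverse.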
Phrase-level prominence patterns
To investigate the realization of pitch accents, F0 was recorded at three temporal points in each rhyme: 10%, 50%, and 90% into the rhyme duration. F0 ratios were computed for the number words by dividing the midpoint F0 values of the first syllable by the second. In addition, F0 values were normalized across a number word by subtracting a grand mean of all F0 values in the word from each individual value ($F0_{i,\mathrm{norm}} = F0_i - \overline{M}$, where $\overline{M}$ is an average of six F0 values). These normalized F0 values were used for a comparison of pitch accent locations in the number words produced by children and adults. A high–low pitch accent was identified as an F0 peak followed by a significantly lower F0 value in the following syllable.
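The F0 ratio and the normalization step can be illustrated with a short sketch. The function names and the F0 values (in Hz) are hypothetical; the only details taken from the text are the three sampling points per rhyme, the use of midpoint values for the ratio, and the subtraction of the word-level mean of six values.

```python
def f0_ratio(r1_points, r2_points):
    """Midpoint F0 of the first rhyme divided by midpoint F0 of the second.

    Each argument holds the F0 values sampled at 10%, 50%, and 90% of a rhyme.
    """
    return r1_points[1] / r2_points[1]


def normalize_f0(r1_points, r2_points):
    """Subtract the mean of all six F0 values in the word from each value."""
    values = list(r1_points) + list(r2_points)
    word_mean = sum(values) / len(values)
    return [v - word_mean for v in values]


# Hypothetical low-high contour across a two-syllable number word.
r1 = [200.0, 210.0, 220.0]
r2 = [240.0, 250.0, 260.0]

ratio = f0_ratio(r1, r2)
normalized = normalize_f0(r1, r2)
print(ratio)
print(normalized)
```

Because the word mean is removed, the normalized values sum to zero within each word, which makes contour shapes comparable across speakers with different overall pitch ranges.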
Analyses
All data were analyzed using mixed effects modeling with speaker and item (number words) as random factors. Analyses of noun words were conducted to examine the characteristics of lexical stress in words with unambiguous lexical stress patterns. These analyses included group (children, adults) and stress pattern (strong–weak, weak–strong) as fixed factors. The analyses of number words included group (children, adults) and condition (straight count, clash, and non-clash) as fixed factors. An additional fixed factor of rhyme position (r1, r2) was included in an analysis of normalized F0 values. When the effect of condition was significant, differences between the straight count and two phrasal contexts were further investigated in pairwise comparisons. These tests were corrected for multiple comparisons using the Bonferroni method.
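As an illustration of the correction step only, a Bonferroni adjustment multiplies each raw p-value by the number of comparisons in the family and caps the result at 1. The function name and the p-values below are hypothetical, and the sketch makes no claim about the software or the number of comparisons used in the study.

```python
def bonferroni(p_values):
    """Return Bonferroni-adjusted p-values for one family of comparisons."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical raw p-values for three pairwise comparisons.
raw = [0.001, 0.020, 0.400]
adjusted = bonferroni(raw)
print(adjusted)
```

A comparison counts as significant after correction only if its adjusted value remains below the chosen alpha level.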
RESULTS
Prominence patterns in nouns and straight count numbers
The first set of analyses confirmed the expected duration and amplitude patterns for strong–weak and weak–strong nouns (bárbeque vs. banána). Duration and amplitude ratios in the noun words were much higher for the strong–weak pattern than for the weak–strong pattern [duration ratio, F(1, 531)=473·52, p <·001; amplitude ratio, F(1, 532)=135·87, p<·001]. The effect of group was not significant, nor was there an interaction between group and stress pattern. These results are evident in Table 1, which provides the means and standard deviations of ratios for child and adult productions of bárbeque and banána.
Table 1. Acoustic correlates of lexically specified stress patterns in context count nouns (barbeque, banana) and straight count numbers (N-teen)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:88240:20160412052206464-0901:S030500091300024X_tab1.gif?pub-status=live)
Next, duration and amplitude patterns of straight count numbers were examined to test the assumption that the -teen syllable is lexically stressed in this control condition. Table 1 shows that the patterns conformed neither to the canonical strong–weak nor to the canonical weak–strong pattern of the context nouns. Importantly, the duration ratios in the straight count number words were less than 1, which shows that the -teen rhyme was longer on average than the first rhyme, and thus that the weak–strong lexical pattern was maintained. On the other hand, the duration (but not amplitude) ratios in the straight count words were significantly different from those of the weak–strong nouns [F(1,571·32)=480·40, p<·001]. This difference can probably be attributed to the difference in the segmental composition of sonorant rhymes in the first syllable of the N-teen words compared to the highly reduced schwa rhymes in the banána nouns. Note that it is exactly this property that makes the N-teen words ‘stress-shiftable.’
Native listener perceptual judgments provided further support for the assumption that word- and phrase-level prominences would align in the second syllable in the straight count condition. The averaged proportion of ‘teen syllable prominent’ judgments was ·66 in this condition (children: 64%; adults: 68%), but dropped to ·32 in the phrasal context conditions (children: 35%; adults: 29%). For each category of auditory judgments, the difference between the child and adult productions was not significant (Mann–Whitney tests, p >·150). Taken together, the acoustic and auditory results validated the use of the straight count condition to assess prominence shifting in the phrasal conditions.
Word-level prominence patterns across conditions
The second set of analyses tested the effect of condition on duration ratios in number words. The first analysis showed a significant interaction between group and condition [F(2,824·07)=4·05, p=·018], and a nearly significant simple effect of condition [F(2,824·07)=2·93, p=·054] on duration ratios. The significant interaction between group and condition led us to split the data by group to test for an effect of condition in child and adult productions, respectively. These results are shown in Figure 3.
Fig. 3. Duration ratios associated with number words across the three conditions. The number was followed by bárbeque in the clash context and by banána in the non-clash context. Error bars indicate ±1 SE around the mean.
The second analysis showed that condition had no effect on adults' realization of lexical stress. This result indicates that adults did not stress shift in either of the phrasal contexts. In contrast, condition had an effect on children's realization of number words [F(2, 384·01)=7·21, p=·001]. The difference between ratios in the clash condition and the straight count was not significant, and neither was the difference between the clash and non-clash conditions. This result indicates that children did not stress shift in either of the phrasal contexts. The significant effect of condition was therefore due to a difference between the straight count and the phrasal contexts. Pairwise comparisons indicated that the only significant difference was a smaller duration ratio in the non-clash condition than in the straight count (p=·001).
The finding that children had significantly smaller duration ratios in the non-clash condition compared to the straight count condition was due to a lengthening of the second rhyme relative to the first in the phrasal context. In particular, the mean durations of the first rhyme were similar across the two conditions (r1: straight count=109 ms, SD=46; non-clash condition=106 ms, SD=45), but those of the second rhyme were quite different (r2: straight count=133 ms, SD=50; non-clash condition=166 ms, SD=54). This lengthening suggests a strengthening of second syllable prominence. Although the difference was not significant, the mean duration ratio of number words in the clash condition was also smaller than the mean duration ratio of number words in the straight count condition. Again, this was due to a longer second rhyme in the phrasal context (r2: straight count=133 ms, SD=50; clash condition=146 ms, SD=59). The mean durations of the first rhyme were similar across the two conditions (r1: straight count=109 ms, SD=46; clash condition=108 ms, SD=45). Note also that children's mean duration ratio in the straight count condition was more similar to the adult mean duration ratio than to their mean duration ratios in either of the phrasal contexts. Altogether these results suggest that children's realization of lexical stress changed as a function of phrase length rather than as a function of metrical context per se.
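The direction of this effect can be illustrated with the mean rhyme durations reported above. The sketch below divides the mean r1 duration by the mean r2 duration in each condition; this ratio of means is an approximation introduced here for illustration and will generally differ from the mean of per-token ratios that was analyzed and plotted in Figure 3. The names used are hypothetical.

```python
# Mean rhyme durations (ms) for child speech, as reported in the text:
# condition -> (mean r1 duration, mean r2 duration).
child_means_ms = {
    "straight count": (109, 133),
    "clash": (108, 146),
    "non-clash": (106, 166),
}

# Ratio of means (r1 / r2), rounded to two decimals for display.
ratio_of_means = {
    condition: round(r1 / r2, 2)
    for condition, (r1, r2) in child_means_ms.items()
}

for condition, ratio in ratio_of_means.items():
    print(condition, ratio)
```

Since the r1 means are nearly constant across conditions, the drop in the ratio from the straight count to the phrasal contexts in this sketch comes almost entirely from the longer second rhyme.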
Like the analysis on duration ratios, the analysis on amplitude ratios indicated a significant interaction between group and condition [F(2, 823·31)=4·25, p=·015], and no significant simple effects. These results are shown in Figure 4.
Fig. 4. Amplitude ratios associated with number words across the three conditions. The number was followed by bárbeque in the clash context and by banána in the non-clash context. Error bars indicate ±1 SE around the mean.
When the data were split by group, the effect of condition was found only for adults [F(2, 439·01)=4·43, p=·012]. A comparison of amplitude ratios in the straight count and two phrasal contexts indicated somewhat lower ratios in the straight count condition compared to the non-clash and clash contexts, but these differences were not significant after corrections for multiple comparisons. Nonetheless, the result suggests some effect of phrase length on adult speech, just as in child speech. In contrast to the child data, though, the slight increase in prominence was in the first syllable of the number word (i.e. absolute phrase-initial position).
Phrase-level prominence patterns across conditions
The next set of analyses investigated the effect of condition on the realization of pitch accents. Results from the omnibus analysis showed significant main effects of condition [F(2, 803·81)=51·35, p<·001] and group [F(1, 24·51)=5·41, p=·029] on F0 ratios in number word production, but no interaction between the factors. These results are shown in Figure 5.
Fig. 5. F0 ratios associated with number words across the three conditions. The number was followed by bárbeque in the clash context and by banána in the non-clash context. Error bars indicate ±1 SE around the mean.
Comparisons of F0 ratios within each group confirmed that these were significantly lower in the straight count condition than in the phrasal context conditions. The relatively low F0 ratio associated with the straight count suggests an initial low–high contour (LH), which could signal the presence of a low pitch accent on the first syllable (L*), or a high pitch accent on the second (H*). Of course, since the ratios were calculated using values averaged across the rhyme, the question of peak location cannot be directly addressed. However, the previously reported finding that listeners heard the second syllable as more prominent than the first in the straight count condition favors the LH* interpretation of the initial low–high contour for both child and adult speech.
In order to more accurately assess pitch accent location in number words, normalized F0 contours were reconstructed from the F0 measures to compare intonation patterns across the three conditions and across the two groups of speakers. These contours are shown in Figure 6.
Fig. 6. Reconstructed F0 contours of the number words in the straight count, the following bárbeque context (solid lines) and banána context (dashed lines). Error bars indicate ±1 SE around the mean.
Visual inspection of children's contours suggests the possibility of a leftward shift of the high tone in both phrasal contexts, whereas the adult low–high pattern appears to remain more or less constant across conditions. Analyses on normalized F0 values confirm the observation that intonation patterns differ across conditions more in child speech than in adult speech. An analysis on the F0 midpoint values in the two rhymes of the number words indicated a significant three-way interaction between group, rhyme position, and condition [F(2,1453·41)=4·47, p=·012]. When the data were split by group, the interaction between rhyme position and condition was significant in the child [F(2,704·74)=10·88, p<·001] and adult data [F(2,748·55)=3·80, p=·023], but for different reasons. Figure 6 shows that the interaction in the child data was due to a change in pattern from LH in the straight count to HL in the non-clash condition and to HH in the clash condition. We will refer to the change from LH to HL as a shift in pitch accent from the second to the first syllable, and to the change from LH to HH as the spreading of a high tone from the second to the first syllable. In contrast to the changes observed for children's speech, the interaction in the adult data was due to changes in the excursion from low to high, which at their maximum approximated the spreading of a high tone in the clash condition but never resulted in a pitch accent shift.
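The terminology introduced above (shift versus spreading) can be made concrete with a short sketch. It assumes that normalized F0 midpoints are z-scores, so that a value at or above 0 counts as high (H) and a value below 0 as low (L); the normalization method is not specified here, so this threshold is an illustrative assumption. The function names and midpoint values are hypothetical.

```python
def tone_label(z_value):
    """Label one rhyme midpoint as H or L.

    Assumption: z-scored F0, with 0 (the speaker mean) as the cut-off.
    """
    return "H" if z_value >= 0 else "L"


def tonal_pattern(first_mid, second_mid):
    """Two-letter pattern for the two rhymes of a number word."""
    return tone_label(first_mid) + tone_label(second_mid)


def change_from_baseline(baseline, phrasal):
    """Name the change relative to the straight count pattern."""
    if baseline == phrasal:
        return "no change"
    if (baseline, phrasal) == ("LH", "HL"):
        return "pitch accent shift"
    if (baseline, phrasal) == ("LH", "HH"):
        return "high tone spreading"
    return "other"


# Hypothetical normalized midpoints (first rhyme, second rhyme).
straight = tonal_pattern(-0.6, 0.5)   # LH
non_clash = tonal_pattern(0.4, -0.3)  # HL
clash = tonal_pattern(0.3, 0.4)       # HH

print(change_from_baseline(straight, non_clash))
print(change_from_baseline(straight, clash))
```

A categorical cut-off of this kind is only a labeling convenience; the analyses reported above rest on the continuous midpoint values.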
In sum, both children and adults produced an initial low–high contour in the straight count condition. Group differences emerged in the non-clash condition, where adults continued to produce an F0 pattern more similar to that of the straight count, whereas children shifted the high tone from the second to the first syllable.
DISCUSSION
The current study examined the integration of word- and phrase-level prominence patterns in children's speech by investigating the temporal alignment of lexical stress and pitch accents in two-word phrases. By varying metrical context, we created opportunities for the misalignment of word- and phrase-level prominence on N-teen number words. The predictions were that adults would always align lexical and phrasal prominences, but that the location of cumulative prominence would certainly be on the first syllable in the clash context (prediction 1), but possibly on the second in the non-clash context (prediction 2). Children were predicted to behave like adults in the clash context, albeit for the additional reason that they have a preference for trochaic patterns (prediction 3). We expected children to preserve the repeating iambic lexical stress pattern in the non-clash context, but predicted that a misalignment in prominences might occur if children also preserved a preference for early accent (prediction 4). Even though prediction 1 was not upheld, in that prominences were temporally aligned with the second syllable across conditions in adult speech, word- and phrase-level prominences were always integrated. By contrast, word- and phrase-level prominences were never fully integrated in phrasal contexts in children's speech. Relative to the straight count condition, children strengthened the lexical prominence of the second syllable regardless of the conditioning context. This result was at odds with prediction 3. The F0 results indicated, however, that children used an initial high pitch accent in the phrasal contexts, albeit more consistently in the non-clash context than in the clash context. This result suggests an early accent bias in children's speech, but one that does not precipitate stress shift.
The misalignment of prominences in child speech is illustrated in Figure 7. This figure shows how children organized prominence patterns in two-word phrases (Panels B and C) as compared to one-word phrases (Panel A). In longer phrases, children (1) strengthened lexical prominence on the second syllable of the number words, and (2) either shifted or spread the high pitch accent to the phrase-initial syllable. By contrast, lexical and phrasal prominence patterns in adults' productions were more invariant (robust) across all types of phrase, as indicated by the duration and F0 correlates. Taken together, these results suggest that prominence patterns are less stable in child language compared to adult language. In particular, the results suggest that the prosodic structures that condition prominence integration are still immature in six-year-old children. Below, we discuss these findings, first with reference to the notion of stress shifting, and then in broader terms.
Fig. 7. Prominence patterns in children's productions of number words (word-level: duration correlate; phrase-level: F0 correlate). Arrows indicate the spread in phrase-level prominence compared to the straight count.
Prominence shifting and prosodic templates
Stress shifting from non-initial stressed syllables to phrase-initial syllables in English is viewed as an adjustment of the prosodic template in stress clash contexts (Metrical Stress Theory; Liberman & Prince, Reference Liberman and Prince1977) or as the realization of a preferred prosodic template (Intonational-Metrical Theory; Shattuck-Hufnagel et al., Reference Shattuck-Hufnagel, Ostendorf and Ross1994). Previous studies investigating stress shift have compared clash and non-clash contexts to each other (e.g. Goffman et al., Reference Goffman, Heisler and Chakraborty2006; Vogel et al., Reference Vogel, Bunnell, Hoskins, Connell and Arvaniti1995). Cross-context differences in such studies are interpreted as an effect of stress clash on prominence location. Stress shifting is said to have occurred when an unstressed syllable is strengthened or a stressed syllable is neutralized through weakening (e.g. Goffman et al., Reference Goffman, Heisler and Chakraborty2006; Vogel et al., Reference Vogel, Bunnell, Hoskins, Connell and Arvaniti1995). We have already noted that, in our study, there was little effect of metrical context in adult speech, and the expected stress shift was not observed. Given that context effects were interpreted with reference to a baseline condition, the near invariant patterning in adult speech likely represents a basic prosodic template associated with number word production in the specific functional linguistic context of counting. Although first syllable amplitudes in adult speech were slightly higher in the phrasal conditions compared to the straight count condition, word-level stress and the location of the high pitch accent were invariantly on the second syllable, as indicated by the duration ratios and F0 values across conditions.
If the near invariant pattern in fact reflects a prosodic template for counting, then the findings fall outside of what is predicted by Metrical Stress Theory or Intonational-Metrical Theory with regard to stress shifting and phrase-initial pitch accenting.
As for the increased amplitude of phrase-initial syllables in two-word phrases in adult speech, we note that while amplitude typically covaries with duration (e.g. Fry, Reference Fry1955; Kochanski et al., Reference Kochanski, Grabe, Coleman and Rosner2005; Mo, Reference Mo2008), it also independently marks phrase-initial position (Cho & Keating, Reference Cho and Keating2009). For example, Cho and Keating found higher amplitude in phrase-initial vowels compared to phrase-medial vowels. This pattern of so-called initial strengthening corresponds to our results.
Phrasal context effect on children's prominence patterns
Adults exhibited minimal duration and F0 changes across counting conditions in our study, suggesting that they have a robust prosodic template for counting. By contrast, children exhibited substantial differences in their realization of prominence across conditions. In this section, we consider several explanations for this variability in child speech.
The six- and seven-year-old children in our study produced the same lexical prominence patterns as adults for trochaically and iambically stressed nouns and for number words in the straight count condition. The absence of between-group differences here is consistent with prior work showing that children phonetically distinguish lexical stress patterns using both duration and amplitude as early as age three or four years (Goffman & Malin, Reference Goffman and Malin1999; Kehoe et al., Reference Kehoe, Stoel-Gammon and Buder1995; Pollock, Brammer & Hageman, Reference Pollock, Brammer and Hageman1993). Where children differed from adults was in the production of number words in the phrasal context conditions. In particular, children strengthened the second syllable in number words when these were followed by nouns. This strengthening cannot be explained in terms of immature control over the correlates of lexical stress per se, or in classical terms of stress clash avoidance. Rather, the prominence adjustments observed in the current study suggest that children may focus on the production of lexical stress patterns at the expense of word- and phrase-level prominence integration. This possibility seems especially likely for the context with an iambic (weak–strong) alternation as described below.
It is possible that children were especially focused on word-level patterns in the weak–strong condition (e.g. non-clash context in this study) because the weak–strong pattern in a word with two full vowels (e.g. N-teen number words) may be more difficult for English-speaking children than the strong–weak pattern (e.g. bárbeque). As a consequence, children may ‘overdo’ the weak–strong pattern in contexts where adults do not. This suggestion is consistent with the documented preference for the trochaic metrical pattern in early childhood (Allen & Hawkins, Reference Allen, Hawkins, Yeni-Komshian, Kavanagh and Ferguson1980; Gerken, Reference Gerken1991; Kehoe et al., Reference Kehoe, Stoel-Gammon and Buder1995; Snow, Reference Snow2007), and with the developmental assumption that patterns that are acquired earlier are more easily accessed and controlled. Note also that this assumption is fully consistent with work by Goffman and colleagues who describe trochaic forms as requiring less modulation and iambic forms as requiring more, that is, a greater degree of articulatory weakening or strengthening of one of the syllables involved (e.g. Goffman et al., Reference Goffman, Heisler and Chakraborty2006; Goffman & Malin, Reference Goffman and Malin1999). In a phrasal context, the speaker can either default to the less modulated pattern through prominence neutralization (e.g. duration ratios closer to 1) or strive to maintain the weak–strong pattern (e.g. duration ratios closer to 0·5). Our suggestion is that in middle childhood children strive to maintain lexically specified patterns and, in a sense, end up ‘overdoing’ them, especially under the challenging condition of integrating the pattern into a phrase.
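The two reference points mentioned above (duration ratios near 1 for neutralized prominence and near 0·5 for a maintained weak–strong pattern) can be sketched as a simple nearest-reference comparison. The sketch assumes that the duration ratio is first-syllable duration divided by second-syllable duration, which is inferred from the two reference values and is not a confirmed description of the measure. The function names and durations are hypothetical.

```python
def duration_ratio(first_ms, second_ms):
    """First-syllable duration over second-syllable duration.

    Assumption: first-over-second ordering, inferred from the reference
    values 1 (neutralized) and 0.5 (weak-strong) given in the text.
    """
    return first_ms / second_ms


def nearest_reference(ratio):
    """Return the name of the reference value closest to the ratio."""
    references = {"neutralized": 1.0, "weak-strong": 0.5}
    return min(references, key=lambda name: abs(ratio - references[name]))


# Hypothetical syllable durations in milliseconds; not data from the study.
print(nearest_reference(duration_ratio(120.0, 240.0)))  # ratio 0.5
print(nearest_reference(duration_ratio(200.0, 210.0)))  # ratio about 0.95
```

On this reading, 'overdoing' the weak–strong pattern corresponds to ratios that move toward, or below, the 0·5 reference in phrasal contexts.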
With regard to phrasal prominence patterns (i.e. pitch accenting), both children and adults in our study produced a low–high F0 contour in the straight count condition. The high tone was in the second syllable of the number words as illustrated in Panel A of Figure 7. In the clash context, this high tone tended to spread from the second to the phrase-initial syllable as illustrated in Panel B. Group differences emerged in the temporal alignment of the high tone in the non-clash context. In children's productions, the high tone shifted leftward to the phrase-initial syllable as illustrated in Panel C. In contrast, adults produced the high tone on the second syllable of the number words. The similarity of children's productions across the two contexts suggests that F0 contours in child speech may be more influenced by the presence of a phrasal context than by the type of context (clash or non-clash).
There are at least two types of explanation – one structural and the other pragmatic – for why a tone might shift to the phrase-initial syllable in children's productions of phrases. The first type of explanation concerns intonational structure, which can be decomposed into phrase boundary tones and pitch accents (Cruttenden, Reference Cruttenden1986; Pierrehumbert, Reference Pierrehumbert1980; Snow, Reference Snow2007). A phrase-initial rise in F0 may represent an interpolation between a low boundary tone and a high-tone pitch accent (L% and H* in ToBI transcription conventions; Beckman et al., Reference Beckman, Hirschberg, Shattuck-Hufnagel and Jun2005; Pierrehumbert, Reference Pierrehumbert1980). It is possible that the boundary tone is overridden by the pitch accent in children's productions of longer or more complex phrases because the F0 rise is more difficult for children to produce than other contours (Patel & Grigos, Reference Patel and Grigos2006; Snow, Reference Snow2007). The articulatory difficulty of an initial F0 rise may have to do with its slow, non-iconic F0 change. Alternatively, the rise could represent an immature understanding of intonational phonology, resulting in the collapse of boundary tones and pitch accents.
Along these lines, it is also possible that children have acquired intonational phonology, and merely have difficulty with the implementation. That is, it could be that children correctly represent the pitch accent on the second syllable, but inadvertently spread the tone to the first syllable in production. While we acknowledge that it is difficult to completely ascertain the location of pitch accent placement, we assume, contra the possibility of a representation–production mismatch, that the underlying contour is H*L or H*H in the child data because the F0 values were higher in the first syllable. Moreover, an H*L or H*H target contour is consistent with perceptual and acoustic studies that suggest a strong early accent bias in child language (Allen & Hawkins, Reference Allen, Hawkins, Yeni-Komshian, Kavanagh and Ferguson1980; Atkinson-King, Reference Atkinson-King1970; Vogel & Raimy, Reference Vogel and Raimy2002).
A second type of explanation for children's pitch accent placement on the phrase-initial syllables is pragmatic. Note that the only new information in the counting task was the number word itself and, more specifically, the initial syllable of the N-teen number words (i.e. thir.teen, four.teen, fif.teen, and so on). The nouns barbeque and banana were repeated throughout the task and so always represented given information. According to the analysis in Pierrehumbert and Hirschberg (Reference Pierrehumbert, Hirschberg, Cohen, Morgan and Pollack1990), a high–low pitch accent (falling F0 contour) is used to highlight information that is new to the discourse. A low–high pitch accent (F0 rise) is used to mark information selected from a small well-known domain of alternatives. It could be that children used the first type of pitch accent and adults the second in our study. Then again, it could be that only children's behavior reflected pragmatic influences: adults may have found the counting task so mundane that phonological structure was privileged over information structure.
Phrasal prosody and compound noun prosody
Although we have been treating target productions in the phrasal contexts as generic numeral–noun phrases, one could also imagine them as compound noun phrases. In fact, there is a documented preference in early and middle childhood for compound stress patterns (Atkinson-King, Reference Atkinson-King1970; Vogel & Raimy, Reference Vogel and Raimy2002). This preference is observed in English-speaking children who, given a task of prosodic disambiguation between a compound noun interpretation and an adjective–noun interpretation of two-word collocations produced by adults (e.g. hót dog vs. hot dóg, gréenhouse vs. green hóuse, respectively), tend to favor the compound noun interpretation. In other words, children tend to misperceive the first word in two-word phrases as most prominent, which is reminiscent of a trochaic bias and an early accent bias in young children's productions. The acoustic analysis of stimulus materials in Vogel and Raimy (Reference Vogel and Raimy2002) showed an F0 rise in the first word of compound nouns, whereas F0 fell slightly in the first word of adjective–noun phrases. It is possible that children's misperception of the falling contour as also indicative of a compound noun is due to an immature understanding of boundary tones and pitch accents. This immature understanding may lead children to collapse the two tone types and treat the initial rises and falls as variable realizations of an underlying initial high tone. If this is the case, then one could imagine that children might also default to producing initial high tones in two-word phrases. Thus, a final explanation for children's behavior in the current study is similar to the first: children may collapse boundary tones and pitch accents in a phrasal context. This could be due to an early accent bias, as previously suggested, which may itself be due to the misapprehension of %L+H* as another instance of H*.
Future studies of prominence patterns in different types of tasks with different kinds of materials are clearly needed to further investigate the reasons for the phrasal context effects found in the present study.
Integrated prominence and speech rhythmicity
As a final matter, let us consider the integration between word- and phrase-level prominence patterns in relation to speech rhythmicity. We conceived of prominence integration as the temporal alignment of lexical stress and pitch accent in a single syllable, which is indexed by coordinated changes in duration, amplitude, and F0. Since amplitude also has the function of marking prosodic boundary strength (Cho & Keating, Reference Cho and Keating2009), we will consider the question of prominence integration in terms of its other two correlates.
Focusing on duration and F0, we find that lexical stress and pitch accents were aligned in all adult productions in our study. Prominences in child speech were not integrated in this way. The variation suggests that word- and phrase-level prominence patterns were poorly aligned and the structures only loosely integrated. Note that these group differences in prominence integration have consequences for rhythmicity. This is because rhythm is largely defined by the spacing between cumulative prominences (e.g. Arvaniti, Reference Arvaniti2009). Since adults' pitch accents were always aligned with the lexically stressed syllables, the well-defined cumulative prominences created a strong sense of alternating rhythm (strong–weak or weak–strong) in adult speech. Not so for child speech. The absence of well-defined cumulative prominences, due to the temporal misalignment of word- and phrase-level prominences, means that each prominence was weaker and more evenly distributed across the phrase. The more even distribution of weak prominences results in the percept of a more evenly timed rhythm. Thus, the local patterns of prominence production observed in the current study likely provide at least a partial explanation for developmental differences in global rhythm that have been remarked on in a separate but related literature (e.g. Allen & Hawkins, Reference Allen, Hawkins, Yeni-Komshian, Kavanagh and Ferguson1980; Grabe, Watson & Post, Reference Grabe, Watson, Post, Ohala, Hasegawa, Ohala, Granville and Bailey1999; Payne, Post, Astruc, Prieto & Vanrell, Reference Payne, Post, Astruc, Prieto and Vanrell2012; Prieto, Vanrell, Astruc, Payne & Post, Reference Prieto, Vanrell, Astruc, Payne and Post2012).
CONCLUSION
To conclude, word- and phrase-level prominence patterns are temporally aligned in adult speech, and invariant across phrases varying in length and metrical structure, suggesting a robust prosodic template for counting. School-age children have yet to fully acquire prosodic structures that are functionally independent of lexical constituent patterns. The development of adult-like phrasal prosody includes the use of both duration and F0 to signal prominence, and the structural coordination of pitch accents with phrase boundary tones. It could be that children acquire word- and phrase-level prominence patterns separately and then integrate them slowly over a long period of time. While lexical stress is acquired relatively early by children, the acquisition of adult-like phrasal prosody appears to require more substantial linguistic experience.