INTRODUCTION
Suprasegmentals are generally recognized to be acquired earlier than segmentals (i.e. consonants and vowels). In the literature on early phonetic development, prosodic development has received relatively little attention, especially for languages other than English. This study of infants learning a tonal language (Mandarin) had two goals: to examine whether the development of prosodic patterns in Mandarin-learning infants reflects universal or language-specific effects, and to examine whether the distribution of prosodic patterns changes from early (babbling) to later (early words) stages of speech development, i.e. whether prosodic development is continuous or discontinuous. We studied early prosodic development by measuring variations in fundamental frequency rather than investigating tone acquisition per se because the latter would require knowledge of the target utterance, which is not available in the case of babbling. Therefore, we adopted a broader phonetic approach to prosodic development (based on Hallé, de Boysson-Bardies & Vihman, 1991), applying the same methodology to babbling and to the transition to early words.
Universal characteristics in early development of prosodic patterns
Infants may passively acquire language using innate articulatory and auditory templates which are later modified by the linguistic environment (Locke, 1983; Locke & Pearson, 1992). These original universal patterns may be modified as children develop sensitivity to sensory stimulation from their own production and to linguistic information from the environment. Consistent with the claim that nature and nurture factors interact, common prosodic patterns have been reported at early developmental stages, and language-specific patterns have been found in studies on infants from different linguistic environments.
Systematic pitch control develops during the first year, and certain prosodic patterns are produced more often by children during this period than other prosodic patterns (Delack & Fowlow, 1978; Kent & Murray, 1982; Kent & Bauer, 1985; Robb, Saxman & Grant, 1989). Most available empirical data on prosodic patterns during the first year are based on f0 measurements of early non-cry vocalizations from a limited number of English-learning infants in longitudinal or cross-sectional studies. In these studies, the prosodic patterns most commonly found during the first year are rising–falling, falling and level contours. Among these, the rising–falling contour occurred most frequently in nineteen infants during the first year (Delack & Fowlow, 1978), in five infants aged 1 ; 1 (Kent & Bauer, 1985) and in seven infants aged 0 ; 8 to 2 ; 2 (Robb et al., 1989). In addition, level contour was most commonly produced in a cross-sectional study of twenty-one infants aged 0 ; 3, 0 ; 6 and 0 ; 9 (Kent & Murray, 1982) and second-most commonly found during the first two years of life (Robb et al., 1989). Falling contour was the second-most frequently produced pattern after rising–falling or level contours in all of these studies, except that Robb et al. (1989) reported no falling contour.
In general, prosodic patterns with falling endings (either rising–falling or falling) are more commonly found in infants' vocalizations during the first year than prosodic patterns with rising endings (either falling–rising or rising contours). Moreover, these studies do not concur on the relative order of occurrence for level and contour patterns.
Language-specific characteristics in early development of prosodic patterns
Regarding the influence of ambient language, both French and Japanese infants show language-specific prosodic patterns in their disyllabic vocalizations from ages 1 ; 3 to 1 ; 11 (Hallé et al., 1991). Specifically, French infants produced a majority of rising contours, and Japanese infants produced more level and falling contours than any other patterns. Furthermore, analysis of f0 contours showed a similar prominence of falling contours in Japanese spoken by adults and in the substitution patterns of Japanese-learning infants attempting adult words. Target falling contours were seldom replaced by rising contours (18%) although target words with rising contours were produced with 43% accuracy by Japanese infants. These findings suggested that children begin to acquire language-specific f0 contours between ages 1 ; 3 and 1 ; 11 (Hallé et al., 1991).
This review reveals conspicuous lacunae in the literature on the development of prosodic patterns. First, no study has addressed whether the relative frequency of occurrence of prosodic patterns reflects language-specific influences or universal patterns. Second, only one study (Robb et al., 1989) investigated the order of frequency of various f0 contours during the transition from prelinguistic vocalizations to early lexical production.
Developmental continuity in early prosodic patterns
Most theories of child phonology, except for Jakobson's (1941/68) discontinuity view, suggest continuity between the phonetic and phonological structures of infant babbling and early speech. The neurobiological approach (Locke, 1983) proposed three continuous stages in early phonological development: (1) the production of highly restricted, biologically given babbling patterns with communicative intent (ages 0 ; 6 to 1 ; 0); (2) the production of relatively stable perceived form from the linguistic environment (ages 1 ; 0 to 1 ; 6); and (3) the deviation from a babbling pattern to forms reflecting the adult system (after age 1 ; 6). Conveying a similar notion of developmental continuity, the biogenesis approach (Kent, 1992) indicated that children continuously use their currently available motoric resources to achieve target forms. However, few empirical studies have investigated continuity in the development of prosodic patterns from babbling to early speech.
In one study, average f0 was shown to change little in seven English-learning infants from the preword stage (roughly age 0 ; 8 to 1 ; 1) through meaningful speech (age 1 ; 7 to 2 ; 2) (Robb et al., 1989). Moreover, average f0 remains stable within 300–550 Hz from age 0 ; 1 to 0 ; 9 or 1 ; 0 (Delack & Fowlow, 1978; Kent & Murray, 1982; Kent & Bauer, 1985; Robb et al., 1989).
There are few studies addressing continuity in the acquisition of prosodic patterns in Mandarin, Taiwanese and other tonal languages. Most of the studies have focused on the development of lexical tones in the second year and beyond. Only two exceptions were found: Lin (1971) and Jeng (1979). Early prosodic development was mentioned briefly and generally in Lin's (1971: 194) cross-sectional, longitudinal study with six Taiwanese-learning infants aged 0 ; 0 to 2 ; 3: lexical tones were ‘always’ imitated correctly, except that the end of the lower even tone was replaced by a rising ending. In addition, Jeng (1979) asserted strongly that prosodic patterns are developed during the babbling stage and before the first word emerges (around age 1 ; 3).
Furthermore, Jeng suggested that after sufficient practice with various prosodic patterns during the babbling stage, children can immediately assign the correct tone to the first word acquired. This prediction was based on a longitudinal study of spontaneous vocalizations of Mandarin-learning infants aged 0 ; 2 to 1 ; 8 (Jeng, 1979). Although these infants did not acquire two of the four basic Mandarin tones – high–rising and falling–rising – until around age 1 ; 7 and tone sandhi rules in Mandarin were not acquired until around 2 ; 0, variations of prosodic patterns (falling–rising, level–rising–level, rising–falling, level–falling–level and falling contours) were observed before the first word was uttered. In addition, the children at ages 1 ; 3 to 1 ; 4 showed knowledge of tone phonemes for high–level, falling and rising tones by distinguishing either the production or perception of minimal word pairs differing only in tone. However, these two studies covering the first year did not systematically describe the frequency of occurrence of various prosodic patterns and the accuracy of these prosodic patterns in lexical items relative to the adult model. They offered only the general impression that infants produce various prosodic patterns quite early, before extensive acquisition of segments used to compose early lexical items.
Thus, further research is needed to systematically investigate the development of prosodic patterns at the transitional stage (from prelexical vocalizations to early speech). The findings from this research would provide information to test the continuity proposal, to clarify the relative developmental order of level and falling contour patterns, and to address the influence of ambient language on prosodic development.
This study examined the development of prosodic patterns in consonant–vowel (CV) utterances of twenty-four Mandarin-learning infants aged 0 ; 7 to 1 ; 6 at the transitional stages from the emergence of canonical babbling to the age of producing the first fifty words. The two primary research questions were: (1) Are prosodic patterns in Mandarin-learning infants identical to those in other language groups (reflecting universal constraints) or do these patterns reflect those of their major linguistic input? (2) Does the distribution of prosodic patterns change from the early through later stages of infant vocalizations?
METHODS
Design and subjects
For this cross-sectional study, twenty-nine infants were recruited by informal referral from the community surrounding Tainan City (Taiwan) and divided into two age groups: G1 (fifteen infants, 0 ; 7 to 1 ; 0, representing mostly babbling vocalizations) and G2 (fourteen infants, 1 ; 1 to 1 ; 6, mostly words). Of these twenty-nine infants, five (three in G1 and two in G2) did not meet criteria used to exclude extreme individual differences among participating infants (see Chen & Kent, 2005). Thus, the final sample included twenty-four infants (twelve in each age group). In addition to infant vocalizations, caregivers' speech in caregiver–child interactions was analyzed to study the phonetic characteristics of infants' language environments. Parents and caregivers used Mandarin when interacting with infants.
Data collection and analysis
Infants' vocalizations and adults' child-directed speech were audio-recorded with a DAT recorder (Sony TCD-D8) and a wireless lapel microphone (Telex ProStar R-10) during observations of their natural daily activities at home or in daycare centers. Both the infants' spontaneous vocalizations and adults' child-directed speech were transcribed, compiled and analyzed for frequency of occurrence of the major prosodic patterns.
Data were selected by five criteria. First, ‘speechlike’ sounds were broadly defined to exclude only vegetative or reflexive sounds (e.g. cries, coughs, breathing noises). Second, among all speechlike vocalizations, complex f0 contours were excluded by analyzing only those in short utterances. Third, all spontaneously produced f0 contours were transcribed and analyzed, while imitated data were not. Fourth, no distinction was made between babbling and early words, and infants' prosodic patterns were not analyzed in terms of accuracy with reference to attempted adult patterns. Fifth, all spontaneous f0 contours produced in sequences during all twenty-four infants' 45-minute recordings were transcribed. For longer recordings, the 45-minute section with the most f0 contours was transcribed and analyzed. Table 1 summarizes subject characteristics, words produced (based on parental report) and the number of f0 contours transcribed from the productions of infants and caregivers. Among infants, wide variations were found in the number of f0 contours transcribed (G1: median=118, mean=124·8, SD=78·5, range=37–279; G2: median=214, mean=291·2, SD=182·6, range=88–649) and of words produced (G1: median=3, mean=4·8, SD=4·7, range=0–13; G2: median=60, mean=51·8, SD=26·8, range=5–87).
TABLE 1. Subject characteristics including age, number of words produced and number of f0 contours per child and caregiver
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170207050707-65169-mediumThumb-S0305000908008878_tab1.jpg?pub-status=live)
The f0 contours in infants' vocalizations were perceptually transcribed using broad categories both in direction of major change (level/rising/falling) and in relative level (high/mid/low). The tonal categories of child-directed speech were also analyzed with the same broad categories. The adult Mandarin system has four basic tones: high–level, high–rising, low falling–rising and high–falling. In addition to these four basic tones, neutral tones are found in unstressed syllables, e.g. particles, suffixes, localizers. Neutral tones do not have tone value unless paired with other basic tones, and their tone values are determined by the preceding tones. However, the rules for neutral tones are not strictly followed in Mandarin spoken in Taiwan, where the Mandarin neutral tones commonly spoken are short mid–falling tones. Moreover, in Mandarin spoken in Taiwan (especially in reduplicated form in child-directed speech), a mid–level tone appears to be influenced by the tonal system of Taiwanese. Both the short mid–falling and mid–level tones occur much less often than the four basic tone categories in child-directed speech. Furthermore, the full-tone value of the low falling–rising tone is found only in slow and careful speech and only in sentence- or phrase-final positions. Its allophone, the low–falling tone, is more frequent than the full tone in the daily conversation of adults in Taiwan. These tonal categories of child-directed speech are generally grouped into three f0 contour patterns (level: high– and mid–level tones; rising: high–rising tone; falling: high–, mid– and low–falling tones) and three f0 levels (high: high–level, high–rising and high–falling tones; mid: mid–level and mid–falling tones; low: low–falling tone) for comparison with the broad categories of infants' early prosodic patterns.
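The grouping of tonal categories described above can be written out as a lookup table. The following Python sketch is illustrative only: the label strings, the `TONE_GROUPS` name and the `tally` helper are assumptions introduced here and are not part of the original transcription procedure, and the sample data are invented.

```python
from collections import Counter

# Broad categories for each tonal category of child-directed speech,
# stored as (contour pattern, f0 level). Label strings are hypothetical.
TONE_GROUPS = {
    "high-level":   ("level",   "high"),
    "mid-level":    ("level",   "mid"),
    "high-rising":  ("rising",  "high"),
    "high-falling": ("falling", "high"),
    "mid-falling":  ("falling", "mid"),
    "low-falling":  ("falling", "low"),
}


def to_percent(counts, total):
    """Convert raw category counts to percentages of the total."""
    return {category: 100.0 * n / total for category, n in counts.items()}


def tally(tones):
    """Return percentage distributions of contour patterns and f0 levels."""
    if not tones:
        raise ValueError("at least one transcribed tone is required")
    contours = Counter(TONE_GROUPS[t][0] for t in tones)
    levels = Counter(TONE_GROUPS[t][1] for t in tones)
    return to_percent(contours, len(tones)), to_percent(levels, len(tones))


# Hypothetical sample of transcribed tones, not data from the study.
sample = ["high-falling", "low-falling", "high-level", "high-rising"]
contour_pct, level_pct = tally(sample)
```

Because each tonal category maps to exactly one contour pattern and one f0 level, the two percentage distributions each sum to 100 for any sample.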
A randomly selected 5-minute sample (11% of the total recording) for each infant was transcribed by a second transcriber to determine inter-transcriber reliability. Another randomly selected 5-minute sample for each infant was transcribed by the original transcriber to check intra-transcriber reliability. The inter-transcriber reliability (1052 syllables) and the intra-transcriber reliability (1245 syllables) for categories of major f0 change were 84% and 92% respectively, and for categories of relative f0 level were 86% and 95% respectively.
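As an illustration of how such reliability figures may be obtained, the sketch below computes syllable-by-syllable percent agreement between two transcriptions. It assumes that reliability was calculated as simple point-by-point agreement, which the text does not state explicitly; the function name `percent_agreement` and the example labels are hypothetical.

```python
def percent_agreement(first, second):
    """Percentage of syllables given the same category in both transcriptions."""
    if len(first) != len(second) or not first:
        raise ValueError("transcriptions must be non-empty and aligned")
    matches = sum(a == b for a, b in zip(first, second))
    return 100.0 * matches / len(first)


# Hypothetical contour labels for five syllables, not data from the study.
original = ["falling", "level", "falling", "rising", "falling"]
recheck = ["falling", "level", "rising", "rising", "falling"]
agreement = percent_agreement(original, recheck)  # 4 of 5 labels match
```

Percent agreement does not correct for chance agreement, so a chance-corrected index could be reported alongside it where category frequencies are very unequal.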
Median differences in the frequency of occurrence of prosodic patterns were compared by one-sample Wilcoxon tests. The distribution patterns of the various f0 contours between the infants' and adults' systems in child-directed speech (research question one) and between developmental stages (research question two) were compared by two-sample Wilcoxon tests. Correlations between variables were determined by Spearman's r. Median differences were examined rather than mean differences because the former are less influenced by outliers and are more valid measures of patterns from small samples with large individual differences. However, mean values are included to confirm distribution patterns. The rationale for choosing and dividing the age range, the criteria used to exclude extreme individual differences, data collection procedure, data analysis methods and statistical analysis have been previously described (Chen & Kent, 2005).
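Two of the quantities used in the analysis can be sketched in a few lines of standard-library Python: the corrected significance threshold reported in the Results (0·05 divided by three comparisons, a Bonferroni-type adjustment) and Spearman's r computed as the Pearson correlation of ranks. This is a minimal sketch under stated assumptions, not the software used in the study; the function names and the example percentages are hypothetical, and the Wilcoxon tests are not reproduced here.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold after dividing alpha equally."""
    return alpha / n_comparisons


def average_ranks(values):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's r as the Pearson correlation of the two rank vectors.

    Raises ZeroDivisionError if either variable is constant.
    """
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Threshold for three pairwise comparisons among three categories.
threshold = bonferroni_alpha(0.05, 3)

# Hypothetical percentages of falling contours for four infant-caregiver
# pairs, invented for illustration only.
infant_pct = [40.0, 55.0, 35.0, 60.0]
caregiver_pct = [45.0, 50.0, 42.0, 58.0]
r = spearman_r(infant_pct, caregiver_pct)
```

Working on ranks rather than raw percentages makes the correlation insensitive to outlying values, which matches the rationale given above for preferring medians to means.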
RESULTS
Prosodic patterns in infant vocalizations and child-directed speech
In terms of f0 contour patterns (direction of major f0 change), falling contours occurred significantly more often in both infants' vocalizations and adults' child-directed speech than either rising or level contours (p<0·017, i.e. 0·05/3). Level contours were produced more frequently than rising contours in both infants' vocalizations and adults' child-directed speech. However, this difference was not significant in either the infants' or adults' data. The distribution patterns of these three f0 contours were similar in infants' vocalizations and adults' child-directed speech (Table 2). No significant difference was found between infants' and adults' data in any category of f0 contour patterns. However, no strong or significant correlations were found between the contour patterns of these two groups in any individual category.
TABLE 2. Median and mean percentages of the major prosodic patterns in infants and caregivers (N=48)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170207040006858-0933:S0305000908008878:S0305000908008878_tab2.gif?pub-status=live)
* p<0·017 (0·05/3).
a The sum of mean percentages for both contour patterns and f0 levels is 100.
b Spearman's r (1-tailed) for the correlation between infants' and caregivers' prosodic patterns.
c Two-sample Wilcoxon test comparing medians (2-tailed).
In terms of f0 levels, high prosodic patterns were produced significantly more often in both infants' vocalizations and adults' child-directed speech than mid and low prosodic patterns (p<0·017, i.e. 0·05/3). Infants produced significantly more mid prosodic patterns than low prosodic patterns, whereas low prosodic patterns occurred significantly more often than mid prosodic patterns in child-directed speech. The difference in these two categories is also reflected in the significant differences (p<0·017, i.e. 0·05/3) between infants' and adults' data.
In summary, infants' vocalizations reflected adults' child-directed speech in several aspects: (1) falling contours occurred more frequently than level and rising contours; (2) the frequencies of level and rising contours did not differ significantly; and (3) high prosodic patterns were produced much more frequently than mid and low patterns. However, infants used significantly more mid prosodic patterns and fewer low patterns than adults.
Developmental changes in the production of prosodic patterns
Infants in G1 (aged 0 ; 7 to 1 ; 0) and G2 (aged 1 ; 1 to 1 ; 6) were generally similar in terms of prosodic development; they did not differ significantly in any corresponding prosodic category (Table 3).
TABLE 3. Median and mean percentages of the major prosodic patterns in two infant age groups (N=24)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170207040006858-0933:S0305000908008878:S0305000908008878_tab3.gif?pub-status=live)
* p<0·017 (0·05/3).
a The sum of mean percentages for both contour patterns and f0 levels is 100.
b Two-sample Wilcoxon test comparing medians (2-tailed).
Among the three f0 contour patterns, falling contours occurred significantly more often than rising contours in both G1 and G2 (p<0·017, i.e. 0·05/3). Moreover, the occurrence of level and rising contours was not significantly different in either G1 or G2. However, although falling contours were produced significantly more often than level contours in G1, the difference found in G2 was not significant.
With respect to three f0 levels, high prosodic patterns were produced significantly more often than mid and low patterns in both G1 and G2 (p<0·017, i.e. 0·05/3). Moreover, the occurrence of mid and low patterns did not differ significantly in either G1 or G2.
In addition to these similarities, developmental changes were found between G1 and G2. The prosodic patterns produced by G1 and G2 infants showed a slight developmental tendency toward adult patterns. First, the high frequency of occurrence of falling contours in G1 decreased in G2, and the relatively low frequency of occurrence of level contours increased in G2. Second, low prosodic patterns were produced more often in G2 than in G1, showing a trend toward increased use of this category, similar to the adult model. Moreover, the mid prosodic pattern, often used in infants' productions in G1, was produced less frequently in G2, reflecting the low frequency of occurrence of this category in child-directed speech. Although these developmental changes were not supported by significant differences, possibly due to limited data and extensive individual differences, the trends observed could be verified in future studies.
DISCUSSION
Universal and language-specific patterns
To determine whether prosodic development in Mandarin-learning children reflects universal or language-specific effects, we compared our findings on the similarities and differences between infant and adult vocalizations with those of previous studies on other language groups. We found that prosodic development in Mandarin-learning children reflects both universal and language-specific effects.
The specific order of the frequency of prosodic patterns in our sample of infants suggests a developmental sequence observed in other language groups. In Mandarin-learning infants, falling contours occurred significantly more often than rising contours, similar to prosodic patterns found in English-learning infants during the first year (e.g. Delack & Fowlow, 1978; Kent & Bauer, 1985; Robb et al., 1989; Snow & Stoel-Gammon, 1994; Snow, 1995). This is one example of a potentially universal characteristic. Nevertheless, a strict universal order of prosodic development is difficult to determine because of the limited data available in different studies and the various methodologies and criteria employed.
Moreover, Mandarin-learning infants in this study produced falling contours more frequently than level contours. The relative frequency of these two prosodic patterns was either not previously reported or the findings were contradictory (e.g. Delack & Fowlow, 1978; Kent & Murray, 1982), but this pattern is consistent with the adult model in this study and in Cheng's data (1982). Thus, our findings on prosodic development also appear to reflect patterns in the ambient language.
Developmental continuity
Regarding the empirical evidence for developmental continuity, falling contours occurred significantly more often than rising contours in both G1 and G2. This characteristic of prosodic development reflects a universal characteristic found in previous studies. In addition, two quantitative changes were seen from G1 to G2. First, the frequency of occurrence of the prominent prosodic category in G1 – falling contours – decreased in G2. Second, the frequency of occurrence of the level contour increased from G1 to G2. These changes balanced the distribution of frequency patterns across the three f0 contour patterns in G2. The decreased frequency of falling contours and the increased frequency of level contours from G1 to G2 might be a developmental trend reflecting the frequency of occurrence of these two contour patterns in child-directed speech. Likewise, the trends toward more low prosodic patterns and fewer mid prosodic patterns from G1 to G2 also reflect the patterns in adults' speech.
In conclusion, we found no evidence of a sharp discontinuity (as proposed by Jakobson, 1941/68) between babbling (roughly corresponding to G1, with mostly babbling data) and the production of first words (roughly corresponding to G2, with more word production data). On the contrary, our quantitative measurements revealed close similarities between infant data in the two age ranges studied, with no significant difference in any corresponding prosodic category. In addition, differences between the two age groups were part of a continuum of developmental changes that gradually brought infants' vocalizations closer to the adult model, similar to the first two continuous stages in the neurobiological approach (Locke, 1983). The lack of a strong developmental effect may be due to the two groups not being very different in age, and some children in G2 resembling children in G1 in terms of word production. Furthermore, adult-like patterns of prosodic production were still developing in infants younger than age 1 ; 6 (the oldest age studied). Moreover, the variability in the frequency of occurrence of prosodic patterns did not decrease from G1 to G2 as shown by the standard deviations in Table 3.
As mentioned before, we examined median differences rather than mean differences, which are more influenced by outliers and are less valid measures of patterns in small samples with large individual differences. As shown in Tables 2 and 3, the median differences are more striking than the mean differences. Among these median differences, some are statistically significant, whereas others are not. These non-significant differences are likely due to the small samples and large standard deviations. These findings should be verified in further studies with larger samples.
In addition to these general findings, further longitudinal or cross-sectional studies are needed with smaller age ranges in each group to reveal more subtle developmental changes. For example, regression and reorganization of prosodic patterns have recently been found before age 1 ; 6 (Snow, 2006). In addition to the frequency of occurrence of f0 contour categories, other measures should be incorporated to reflect finer developmental changes, e.g. the accent range of intonations (Snow, 2006) and duration contrasts of tones (Tsay, 2001). Furthermore, distinguishing between the data for preintentional prosodic productions and for actively controlled prosodic patterns (or the accuracy of pitch control relative to target forms) would clarify the relationships between these two systems in the transitional stage.