INTRODUCTION
It has long been observed that onomatopoeia – that is, words which imitate real-world sounds, such as animal or engine noises – play a disproportionate role in many children's early words (Lewis, 1936; Stern & Stern, 1928). Historically it was believed that these words occurred as part of the ontogenetic unfolding of language (Werner & Kaplan, 1963); however, the basis for this view is exclusively theoretical. More recently, onomatopoeia have been discussed in relation to the sound symbolism bootstrapping hypothesis (Imai & Kita, 2014), where again onomatopoeia have been assumed to provide a learning advantage in the early stages of language development. Still, no empirical evidence has been put forward to support this theoretical discussion. A number of alternative proposals have been briefly considered, suggesting articulatory or phonetic motivations for the presence of these forms in infant speech (e.g. Kunnari, 2002). However, the discussion of onomatopoeia in infant language development has remained largely inactive since Werner and Kaplan's contribution over fifty years ago. Accordingly, their theory endures as the generally accepted view on this topic (Laing, 2014). This study will attempt to reinvigorate a dialogue on the presence of onomatopoeia in infant language from a new perspective, considering how onomatopoeia feature in the early input. Here we will observe the prosodic aspects of infant-directed speech with a specific focus on onomatopoeia in mothers’ speech to their prelinguistic infants. This analysis will shed light on the question of why infants often produce onomatopoeia among their early words (Laing, 2014), when they occur so rarely in the adult language.
ONOMATOPOEIA IN INFANT SPEECH
Since as early as the mid-nineteenth century it has been proposed that onomatopoeia lie at the very beginnings of human language (Bonvillian, Miller Garber & Dell, 1997). This early position corresponds to that of Werner and Kaplan (1963), whose work Symbol Formation remains one of the most influential explorations of infants’ “cognitive construction of the human world” (p. 13). Werner and Kaplan provided a detailed discussion of the importance of non-arbitrary sound–meaning links in the development of referential meaning, agreeing with early claims positing that onomatopoeia function as ‘stepping stones’ in language learning (Farrar, 1883). However, Ferguson (1964) rejected Werner and Kaplan's general thesis, stating that the assumption that “millions of children independently create items like choochoo and bow-wow instead of the hundreds of equally satisfactory onomatopoeias that could be imagined, is clearly unsatisfactory” (p. 104). Instead, Ferguson suggested that these forms are initiated by the adult during interactions with the infant.
We find Ferguson's theoretical position cogent. However, he does not attempt to account for the strikingly common occurrence of onomatopoeia in the early lexicon. Kern (2010) reports that onomatopoeia constitute over a third of French infants’ vocabularies between the ages of 0;8 and 1;4, and Menn and Vihman (2011) found that onomatopoeia accounted for 20% of the first five words of forty-eight infants acquiring a range of ten languages. In another cross-linguistic analysis, Tardif, Fletcher, Liang, Zhang, Kaciroti, and Marchman (2008) observed that up to 40% of Cantonese-speaking infants’ first ten words were onomatopoeic, compared with just under 30% and 8·7% of American-English and Mandarin-Chinese infants’ early words, respectively.
Despite the general acknowledgement that infants produce a large proportion of onomatopoeia in their early words, few studies have directly considered this aspect of infant speech. Moreover, onomatopoeic forms are often disregarded in the linguistic analysis of early infant data (for example, Behrens, 2006; Fikkert & Levelt, 2008), as they are considered to be meaningless or irrelevant when compared with the ‘conventional’ word forms of the developing infant, which continue to progress into the adult language; indeed, few suggestions alternative to that of Werner and Kaplan (1963) can be found in the developmental literature.
Onomatopoeia in the input
It is now widely accepted that language acquisition is led by the input. Phonological development has been shown to be driven by salient features of the ambient language (Vihman, 2010; Vihman & Keren-Portnoy, 2013) – that is, features which stand out from or draw attention to the speech stream, making certain segments “especially attractive to infants” (Fernald & Kuhl, 1987, p. 290) – as well as by statistical regularities in input speech (Ambridge, Kidd, Rowland & Theakston, 2015; Pierrehumbert, 2003). The effect of onomatopoeia in the input can be seen in the combined findings of two studies by Kauschke and her colleagues (Kauschke & Hofmeister, 2002; Kauschke & Klann-Delius, 2007). Kauschke and Hofmeister (2002) show how the infant output responds to changes in the input: a decrease in use of onomatopoeia can be seen in both mothers’ and infants’ outputs over time. The authors see the production of onomatopoeic words in infants’ early language as a passing phase, as they decrease as a proportion of the lexicon over the second year and are replaced by more conventional lexical items. Kauschke and Klann-Delius (2007) interpret this as being a result of the changing use of onomatopoeia in infant-directed speech: the vocabulary of German mothers was found to parallel that of their infants. Notably, Kauschke and Klann-Delius found that ‘personal-social words’, including onomatopoeia, decreased significantly in the infants’ input over time. The authors attribute this to the attention-getting function of these word forms, which is no longer needed once an infant can make use of a wider and more varied vocabulary.
These findings suggest an interaction between the production of onomatopoeia in the speech of the infant and of the caregiver: Kauschke and Klann-Delius refer to the social–pragmatic role of these words, which are reported to be important in establishing early conversations. Furthermore, in her analysis of syllabification in Finnish infants’ language development, Kunnari (2002) comments on the production of onomatopoeia, which are found in her analysis to be produced more accurately than other word forms, and as such distort her wider findings. She suggests that onomatopoeia may be particularly prominent in the infant input when compared with “proper words” (p. 133), positing that this may be due to the especially salient pragmatic or prosodic features of these word forms.
IDS in the literature
It appears to be unanimously accepted in the literature that infant-directed speech is an important and functional aspect of infant language development. Lewis (1936) describes the use of intonation to convey meaning in the absence of linguistic comprehension, stating that the “affective tone” (p. 121) of a word or phrase is what first establishes its meaning, prior to the development of lexical understanding. Even adults can correctly perceive communicative intent through the intonation contours of IDS (but not of adult-directed speech [ADS]; Fernald, 1989), demonstrating that “the melody carries the message in speech addressed to infants” (p. 1505).
While onomatopoeia are reported as being a lexical feature of IDS (Bornstein et al., 1992; Ferguson, 1964; Fernald & Morikawa, 1993), there has been no consideration of how these forms are presented to infants in the input. Indeed, much of the IDS literature focuses on the salient prosodic markers consistently found in IDS as compared with ADS (e.g. Fernald & Simon, 1984) – that is, those features which stand out more from the speech stream, and which are typical of ‘baby talk’ speech (higher pitch, wider pitch range, repetition, longer duration, and loudness). Many studies of IDS have found that adults routinely alter the prosodic features of their speech style when addressing young infants; this has been shown to be consistent across both mothers and fathers (Fernald, Taeschner, Dunn, Papousek, De Boysson-Bardies & Fukui, 1989), as well as adults without experience of speaking to infants (Fernald, 1989), and towards infants across a range of ages (Stern, Spieker, Barnett & Mackain, 1983). IDS appears to be ubiquitous in the early input, and is thought to benefit language development in its early stages not only through capturing infants’ attention (Vihman, 2014), but also through drawing the infant towards specific functional elements of the speech stream (Lee et al., 2008). Lewis (1936) remarks on the “strong affective character” (p. 42) of speech directed at young infants, and more recent empirical research supports Lewis’ claims: Smith and Trainor (2008) found that infants’ positive feedback to IDS reinforces their caregivers’ use of higher pitch.
Indeed, infants are known to prefer the salient features of IDS over ADS, including higher mean pitch (Fernald & Kuhl, 1987), wider pitch range, shorter utterances, longer pauses, and repetition (Fernald & Simon, 1984).
Furthermore, the features of IDS are claimed to facilitate word segmentation (Golinkoff & Alioto, 1995; Jusczyk, Hirsh-Pasek, Nelson, Kennedy, Woodward & Piwoz, 1992), and evidence linking experience of IDS with eventual word learning has shown an advantage for IDS: in a word segmentation task, Floccia and colleagues (2016) showed that British infants aged 0;10 were able to learn novel words when presented in an ‘exaggerated IDS style’ but not in typical, non-exaggerated IDS. Brent and Siskind (2001) found an important link between words presented in isolation and early production, as infants were shown to learn words which had been presented in isolation in the input earlier than non-isolated words. Finally, Golinkoff and Alioto (1995) went some way towards demonstrating bootstrapping effects of IDS for language learning with their findings on English-speaking adults, who were better able to learn Mandarin Chinese words in IDS than in ADS when these were presented utterance-finally, though target words in utterance-medial position showed no significant effect of speech style.
Taken together, this evidence demonstrates a role for IDS throughout the language development process. Moreover, IDS is thought to facilitate acquisition at all stages of language learning, and it has been found that the characteristics of IDS change as is appropriate to the infant's developing ability (Fernald & Morikawa, 1993). Evidence from the literature demonstrates how specific features of IDS can lead to language learning (Brent & Siskind, 2001; Golinkoff & Alioto, 1995), and so it seems pertinent to relate the use of IDS to features that are commonly found in infants’ early lexicons. Many studies in this field focus on infants’ perceptual preference for IDS (e.g. Fernald & Kuhl, 1987; Karzon, 1985), or on typical features of IDS as produced by the caregiver (Lee et al., 2008; McMurray, Kovack-Lesh, Goodwin & McEchron, 2013; Werker, Pons, Dietrich, Kajikawa, Fais & Amano, 2007); while these aspects of IDS are illuminating in themselves, they are somewhat abstracted away from the infant's eventual language production. Here we ask how the reality of the input can be related to our understanding of infants' early lexical development: Might it be the case that onomatopoeia are produced more saliently in the input than non-onomatopoeic words?
Onomatopoeia and IDS
Parallels have already been established between an infant's word production and the early input provided by the mother (Kauschke & Hofmeister, 2002; Kauschke & Klann-Delius, 2007), and it has been suggested that onomatopoeic word forms have particular prosodic characteristics due to the fact that they are intended as ‘sound-effect words’. These characteristics may cause onomatopoeia to gain infants’ attention more successfully. The present study considers the use of onomatopoeia in IDS, using acoustic analyses of mothers’ interactions with their infants to pinpoint the prosodic characteristics of onomatopoeia in relation to the rest of the input. The analysis will show that onomatopoeia are especially salient; through their limited context in use as a lexical feature of ‘baby talk’, onomatopoeia possess features that render them more salient in the infant input than those words which continue to develop as part of the adult language. These empirical findings prompt us to reconsider the theoretical perspectives posited by Werner and Kaplan (1963) and Imai and Kita (2014), and provide new evidence supporting an input-based approach to infants’ acquisition of onomatopoeia, which corresponds to findings from the wider developmental literature.
The current study
The goal of this study is to examine the nature of caregivers’ production of onomatopoeic words (OWs) in the early input, through an analysis of the relative salience of OWs in IDS. Based on a sample of parental input to 8-month-old infants, we analyze the prosodic features of onomatopoeic words (e.g. woof woof) in relation to their equivalent conventional words (CWs, e.g. dog). Here we hypothesize that the status of OWs as ‘sound-effect words’ leads them to be prosodically more salient than non-onomatopoeic words. Features that are often cited in the literature as being typical of IDS will be examined (Brent & Siskind, 2001; Fernald & Kuhl, 1987; Soderstrom, 2007); these features are expected to be especially exaggerated in the production of OWs. This includes the use of higher pitch and wider pitch range to imitate the sounds in question (for example, meow compared with cat), as well as longer vowels (as in moo or baa) leading to extended word duration. The presence of reduplication in OWs (Ferguson, 1983) is expected to increase the number of individual tokens of these forms in the input (for example, quack is often reduplicated while duck is not likely to undergo reduplication). Finally, the grammatical status of OWs, or rather, their lack of any clear syntactic role in speech, should lead these forms to be presented in isolation more often than their equivalent CWs. More precisely, we hypothesize that:
1. Pitch is modified to result in an increased salience of OWs over CWs: mean pitch is higher and pitch excursions wider in the production of OWs.
2. Word duration of OWs is longer than that of CWs.
3. OWs are produced more frequently than CWs owing to reduplication.
4. Pauses are longer and more frequent before and after the production of OWs than CWs; OWs will appear in isolation more frequently than CWs.
It is assumed that the combination of these features will lead OWs to be more salient across the board than their CW counterparts. This will provide an input-based perspective for the high number of OWs reported in early infant speech (Menn & Vihman, 2011; Tardif et al., 2008).
METHOD
Participants
Data collected for a previous study were used for this analysis (DePaolis, Keren-Portnoy & Vihman, 2010). Recordings of twelve British mothers interacting with their infants were analyzed. Participants were all based in Yorkshire, in the UK, and were recruited through an advert in a local magazine. At least one parent of each infant held the equivalent of an undergraduate degree from a college or university. The infants (four females) were aged 0;8 (mean age = 256·6 days) and had passed a newborn hearing screening; no hearing problems were reported. All infants were either first-born or had no pre-teen siblings.
Apparatus
Data were collected using a Language Environment Analysis (LENA) digital language processor – a recording device placed in a vest worn by the infant. The mother was asked to ‘read’ with the infant once each day over a weekend: two picture-books – Home (Priddy Books, 2009a) and Toys (Priddy Books, 2009b) – were supplied by the experimenters.
Stimuli
The recordings of the mothers reading the two picture-books were analyzed in this study. The mothers were asked to talk their infants through each of the books, which presented a series of colourful pictures and their corresponding labels (one word and picture per page). Text in the picture-books was minimal, allowing the mothers’ speech to be unscripted and spontaneous, while also providing some lexical consistency across participants. The original experiment did not target onomatopoeic forms in any way, and so mothers were not specifically prompted to use onomatopoeia in the book-reading activity: all onomatopoeic words were produced spontaneously. Importantly, none of the labels presented in the books were onomatopoeic words, though the books contained images of toys and household objects which could elicit onomatopoeic productions from the mothers, including a rubber duck, a train, a car, and a jigsaw featuring images of farmyard animals.
Analysis
OWs and their corresponding CWs produced by mothers during the book-reading task were analyzed. A word was considered to be onomatopoeic if it served to imitate the sound of an object in the context of the book-reading task. For example, the mothers used typical OWs such as meow to imitate a cat, but also used less typical forms such as boing and brrring to imitate a ball and a bicycle, respectively: in the context of the book-reading task these words were both considered to be onomatopoeic.
Every instance of an OW and its corresponding CW (e.g. woof and dog; see Table 1) was extracted from the recordings using Praat 4·5·02. Unpaired stimuli, whereby an OW was produced in the absence of at least one corresponding CW in the same recording, or vice versa (quack occurring without duck, or ball without boing), were excluded from the analysis, in order to ensure that pairwise comparisons could be made for each mother across matched OW and CW forms. Wherever both OW and CW forms appeared in the same recording, whether together or in separate contexts, they were considered a pair. The set of OW-CW pairings included in the study is detailed in Table 1, along with the stimulus name for each pairing (in small capitals).
As is typical in IDS (Sundberg, 1998), many instances of OWs were reduplicated in the recordings (e.g. woof woof). With this in mind, reduplicated OWs were analyzed as single units in cases where there was a pause of less than 200 ms between tokens, while pauses of more than 200 ms marked a new token even in cases of multiple reduplication. This is shown in (1), where numbers in brackets indicate pause duration (in seconds):
(1) M1| it's a duck (3·45)
M2| quack quack (2·32) quack quack (2·12)
Although quack is produced four times in this example, for the purposes of this analysis this counts as a repetition (or two tokens) of quack, each with an instance of reduplication. This approach takes into account the typical characteristics of established onomatopoeic sequences, which often include reduplicated segments (e.g. quack quack, woof woof), while also acknowledging reduplication as a typical feature of infant-directed speech (Sundberg, 1998). On a methodological level this also makes for a more conservative measure of word duration, as the presence of any pauses between repeated forms does not affect the duration measurement of individual (reduplicated) tokens.
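The token-counting rule described above can be sketched in a few lines of Python. This is an illustrative reading of the rule rather than the procedure actually used in the study: the names `Token` and `group_tokens` are hypothetical, the handling of a pause of exactly 200 ms is an assumption (the text specifies only "less than" and "more than"), and the within-pair pauses in Example (1) are not reported, so a placeholder value is used.

```python
from dataclasses import dataclass

# Threshold from the text: pauses under 200 ms keep productions in one token.
REDUP_PAUSE_S = 0.200


@dataclass
class Token:
    word: str
    n_productions: int  # 1 = no reduplication, 2 or more = reduplicated


def group_tokens(word, pauses_after):
    """Group consecutive productions of one OW into tokens.

    pauses_after[i] is the pause (in seconds) that follows production i.
    A pause of REDUP_PAUSE_S or more closes the current token. Treating a
    pause of exactly 200 ms as a boundary is an assumption.
    """
    tokens = []
    count = 0
    for pause in pauses_after:
        count += 1
        if pause >= REDUP_PAUSE_S:
            tokens.append(Token(word, count))
            count = 0
    if count:
        # Trailing productions that were never closed by a long pause.
        tokens.append(Token(word, count))
    return tokens


# Example (1), line M2: "quack quack (2.32) quack quack (2.12)".
# The pauses inside each pair are not given in the text; 0.05 s is a placeholder.
m2 = group_tokens("quack", [0.05, 2.32, 0.05, 2.12])
for token in m2:
    print(token)
```

Under these assumptions line M2 yields two tokens of quack, each containing two productions, which matches the count given in the text.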
Praat was used to measure mean pitch, pitch range, and duration for each of the stimuli, as well as pauses separating the stimuli from surrounding speech. Measurements were taken from word onset to offset, including aspiration of word-final consonants where appropriate. Pitch traces were cross-checked by the first author to ensure that they corresponded to the audio data, and any errors were corrected manually in Praat. Measurements for every individual OW and CW token were recorded. Transcriptions were also made of the utterances containing the OWs and CWs used in this analysis, and pauses were recorded in order to establish word use in isolation. As in Brent and Siskind's (2001) analysis, words were considered to be fully isolated if they were separated from other words in the speech stream by a pause of at least 300 ms on both sides. Partially isolated words were identified as words with a 300 ms pause preceding or following, but not both. Linear mixed effects models were generated in R (R Core Team, 2014) to analyze how word type (OW vs. CW) affects the prosody of mothers’ speech across the dataset. The lmer() function in the lme4 package (Bates, Maechler & Bolker, 2012) was used; this allowed us to consider the expected variability across speakers and stimuli, notably with regard to pitch (for example, a higher pitch is expected in the production of choo choo than woof woof). By-subject random slopes were included in all analyses, but by-item random slopes were omitted, since each mother produces a different set of OW-CW pairs. P-values were obtained using likelihood ratio tests comparing the full model, which includes the effect in question, against the model without that effect. Post-hoc t-tests were used to follow up these results where appropriate, to break down the analysis by subject or by item.
All reported t-tests are two-tailed, and all non-normally distributed data (both OW and CW tokens) were normalized using a log10 transformation. Parametric tests were therefore used for all analyses.
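As a rough illustration of the likelihood ratio comparison described above, the sketch below converts a likelihood ratio statistic into a p-value for the case of one degree of freedom, which is the case reported throughout the Results (χ²(1)). It is written in Python using only the standard library and is not the R code used in the study; the function names are hypothetical, and the log-likelihoods would in practice come from the fitted full and reduced models.

```python
import math


def lrt_statistic(loglik_full, loglik_reduced):
    """Likelihood ratio statistic: twice the gain in log-likelihood."""
    return 2.0 * (loglik_full - loglik_reduced)


def lrt_p_value_df1(chi_sq):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    For df = 1 the chi-square survival function reduces to
    erfc(sqrt(x / 2)), so no statistics library is needed.
    """
    if chi_sq < 0:
        raise ValueError("chi-square statistic must be non-negative")
    return math.erfc(math.sqrt(chi_sq / 2.0))


# Check against two statistics reported in the Results section.
p_mean_pitch = lrt_p_value_df1(4.507)   # reported as p = .034
p_pitch_range = lrt_p_value_df1(5.32)   # reported as p = .021
print(round(p_mean_pitch, 3), round(p_pitch_range, 3))
```

The shortcut holds only for a single added parameter; a comparison involving more than one degree of freedom would need the general chi-square distribution.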
RESULTS
OW production across mothers
On average, 20 minutes and 12 seconds of recording were available for each mother (min = 5 minutes, 25 seconds, max = 40 minutes, 20 seconds) from the book-reading task, from a total of 31 separate recordings (mean = 2·58 recordings per mother). The mother with the shortest recording produced 8 OWs in total and 10 corresponding CWs, while the mother with the longest recording produced 17 OWs and 39 CWs. Given the difference in recording time of almost 35 minutes across mothers, a Pearson product-moment correlation coefficient was used to analyze the distribution of OWs in the data; this indicated that there was no correlation between duration of recording and number of OWs produced by the mothers (r = ·012, n = 12, p = ·971).
The frequency of production of each OW and CW is detailed in Table 2. As shown here, the production frequency of OWs and CWs for each stimulus was almost identical, in terms of both the number of mothers that produced each of the forms and the number of times they produced them. While the use of OWs was highly variable across different mothers, all of the mothers produced at least two of the OW-CW pairs listed in Table 1 (max = 11, min = 2, mean = 5·17). Furthermore, seven of the twelve mothers produced at least five of the pairs, providing a large pool of stimuli for comparison. A Shapiro–Wilk test confirmed normality for word duration and mean pitch for both OW and CW stimuli across mothers (word duration: OW p = ·289, CW p = ·506; mean pitch: OW p = ·169, CW p = ·735), as well as for pitch range for CWs (p = ·735), though not for OWs (p = ·014).
notes: ‘Mothers’ relates to the number of mothers who produced each stimulus; ‘tokens’ relates to the number of times each stimulus occurred across all mothers’ data.
Pitch
A linear mixed effects model compared mean f0 values across OW and CW stimuli. Word type (OW or CW) was included as a fixed effect, with subject and item (target word) as random effects and by-subject random slopes for the effect of word type. Word type had a significant effect on mean pitch (χ²(1) = 4·507, p = ·034), with OW production increasing mean pitch by about 65 Hz (see Figure 1).
Pitch range was then compared across OW and CW stimuli, and OWs were found to be produced with a significantly wider pitch range (χ²(1) = 5·32, p = ·021), with an average increase of around 30·5 Hz in the OW condition (see Figure 2).
Word duration
It was expected that OWs would be longer than their respective CWs, due to the fact that OWs are commonly produced with reduplication (e.g. quack quack). Indeed, of the 216 instances of OWs produced, 84% (n = 181) were reduplicated, with all but two instances undergoing full reduplication. The dataset did not contain any reduplicated CWs. While there were some cases of extensive reduplication across tokens (for example, OW bee was reduplicated 25 times in one instance), the vast majority of OWs (71%) were reduplicated twice. Cat and horse were the only two OWs to feature no reduplication across the full dataset; in contrast, OWs dog and ball were always reduplicated.
A linear mixed effects model compared word duration across OWs and CWs. Duration was measured as the dependent variable, with word type as a fixed effect, subject and item as random effects, and by-subject random slopes for the effect of word type. OWs were found to be significantly longer in duration than CWs (χ²(1) = 15·165, p < ·001); mean duration values show the OW stimuli to be 659 ms longer than CW stimuli on average but, as shown in Figure 3, there is wide variability in OW duration. Median values show OWs to be only 69 ms longer than CWs. It is not clear whether this extended word duration is due to reduplication or to vowel or consonant lengthening.
An exploratory analysis considered OWs separately to observe whether the presence of reduplication had any effect on the duration of these forms. A linear mixed effects model with word duration as the dependent variable and reduplication as a fixed effect (including subject and item as random effects and by-subject random slopes) showed no significant effect of reduplication on the duration of OWs, though this result was close to significance (χ²(1) = 3·657, p = ·056). Reduplicated OWs were on average around 402 ms longer than non-reduplicated forms.
Finally, it was proposed that the observed wider pitch range of OWs may be related to their longer duration. A Pearson product-moment correlation coefficient revealed a highly significant correlation between pitch range and word duration across all OW and CW tokens in the dataset (r = ·251, n = 444, p < ·001). In order to account for this, rate of pitch change was calculated across all targets by dividing the pitch range of an individual token by its duration in milliseconds; this takes into consideration the change in pitch across a word in terms of its duration. A Shapiro–Wilk test showed a non-normal distribution for rate of pitch change across OWs (p < ·001), and so this measure was normalized in R using a log10 transformation. A linear mixed effects model with rate of pitch change as the dependent variable showed a significant difference between OW and CW production (χ²(1) = 7·375, p = ·007); rate of pitch change was significantly higher across CWs than OWs by around 400 Hz/second.
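The rate-of-pitch-change measure can be illustrated with a short Python sketch. The function names are hypothetical and the token values are invented for illustration; they are not data from the study. The sketch assumes that duration is recorded in milliseconds and that the rate is then expressed in Hz per second, since that is the unit in which the result is reported.

```python
import math


def rate_of_pitch_change(pitch_range_hz, duration_ms):
    """Pitch range divided by duration, expressed in Hz per second."""
    if duration_ms <= 0:
        raise ValueError("duration must be positive")
    return pitch_range_hz * 1000.0 / duration_ms


def log10_transform(values):
    """log10 of each value; every value must be strictly positive."""
    return [math.log10(v) for v in values]


# Invented tokens (not data from the study): equal pitch range,
# different durations.
short_cw = rate_of_pitch_change(200.0, 400.0)    # a short CW token
long_ow = rate_of_pitch_change(200.0, 1000.0)    # a longer OW token
print(short_cw, long_ow)
print(log10_transform([short_cw, long_ow]))
```

With an equal pitch range, the shorter token has the higher rate of pitch change. This shows how longer OWs can have a wider pitch range overall and still a lower rate of pitch change than CWs.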
Repetition and reduplication
It was proposed in Hypothesis 3 that OWs may occur more often than CWs, owing to the presence of reduplication. However, as noted above, many instances of OWs were found to be repeated, whether reduplicated or not. Repetition was thus considered alongside reduplication in order to account more fully for any frequency effects. The definition of reduplication used here (see above) does not account for the extent to which OWs are repeated in full within close temporal proximity. Fifty-eight percent (n = 126) of the OWs produced in the dataset – both reduplicated ‘clusters’ such as woof woof as well as those without reduplication such as meow – are repeated in immediate proximity to another token of the same OW (with or without reduplication), separated only by a pause. Furthermore, 87% of all OWs in the dataset occur with either reduplication or immediate repetition; that is, nearly all OWs occur directly next to another instance of the same word. Importantly, 45% of OWs are both reduplicated and repeated within the same utterance (see Example (1), M2, above), thus providing multiple tokens of the same word type, one after the other. In contrast, only one instance of direct repetition can be found across all 226 CWs, and there are no reduplicated CWs in the dataset.
A generalized linear mixed effects model was generated using the glmer() function in R to account for the binomial distribution of this data (repeated vs. non-repeated). Use of repetition was included as the dependent variable, with word type as the fixed effect, subject and item as random effects, and by-subject random slopes. Unsurprisingly, repetition featured significantly more often in OW production (χ²(1) = 28·61, p < ·001).
Multiple contiguous productions (including both repetition and reduplication, hereafter ‘repeats’) were then considered in terms of the mean pitch, pitch range, rate of pitch change, and duration of OWs, to determine whether the extensive use of repeats in OW production brought about any prosodic changes in the mothers’ production of these forms. Four linear mixed effects models considering the OW data only were carried out in R, with mean pitch, pitch range, rate of pitch change, and word duration as the four dependent variables, each with repeats as the fixed effect (repeat vs. no repeat) and target word and subject as random effects. By-subject random slopes were also included for the effect of repeats. No effect was found for any of the four measures (mean pitch: χ²(1) = 0·852, p = ·36; pitch range: χ²(1) = 0·674, p = ·41; rate of pitch change: χ²(1) = 0·51, p = ·48; word duration: χ²(1) = 0·041, p = ·84).
Isolated words
Pauses before and after all OWs and CWs in the dataset were analyzed to account for fully isolated (pauses before and after the word) and partially isolated words (pauses either before or after the word). As detailed above, a pause was considered for analysis if it measured 300 ms or more in duration.
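The isolation criterion can be written as a small classification function. The Python sketch below is illustrative only: `classify_isolation` is a hypothetical name, and the zero pause before duck is an assumption (the CW follows "it's a" in running speech, and no pause is reported there).

```python
# Threshold from the text: a pause counts if it lasts 300 ms or more.
ISOLATION_PAUSE_S = 0.300


def classify_isolation(pause_before_s, pause_after_s):
    """Return 'full', 'partial', or 'none' for one word token.

    'full'    : qualifying pause on both sides
    'partial' : qualifying pause on exactly one side
    'none'    : no qualifying pause on either side
    """
    before = pause_before_s >= ISOLATION_PAUSE_S
    after = pause_after_s >= ISOLATION_PAUSE_S
    if before and after:
        return "full"
    if before or after:
        return "partial"
    return "none"


# Example (1): "it's a duck (3.45)" / "quack quack (2.32) quack quack (2.12)".
# The pause before "duck" is assumed to be zero (running speech).
duck = classify_isolation(0.0, 3.45)
first_quack_token = classify_isolation(3.45, 2.32)
print(duck, first_quack_token)
```

On this reading, duck in Example (1) is partially isolated (a pause follows it but does not precede it), while the first quack quack token is fully isolated.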
OWs occur in isolation more often than CWs: 53% (n = 114) of OWs produced in the dataset appeared in full isolation, while only 5% of CWs (n = 11) were fully isolated. A generalized linear mixed effects model with isolation (isolated vs. non-isolated) as the dependent variable and word type as the fixed effect showed that OWs were produced in isolation significantly more often than CWs (χ²(1) = 15·306, p < ·001). A further 94 OWs (44%) were found to be partially isolated. The same generalized linear mixed effects model, this time with the inclusion of partial as well as full isolation in the dependent variable (full or partial isolation vs. no isolation), again showed OWs to be produced significantly more often in full or partial isolation than CWs (χ²(1) = 26·722, p < ·001). In total, 97% of OWs were produced in at least partial isolation, compared with 44% of CWs. Figure 4 shows the percentage distribution of use in isolation across OWs and CWs.
The distribution of word-initial and word-final pauses in partially isolated words can be accounted for in terms of trends in OW and CW production that are observed throughout the data. A breakdown of these pause types showed word-final pauses to be more common following CWs than OWs: on average, 44% of all CWs were produced with a word-final pause, compared with 23·5% of OWs. This trend can be attributed to a specific speech-style that the mothers use in addressing their infants, whereby both OWs and CWs are produced within syntactic ‘frames’. Some typical examples can be seen in (2) to (4) (CWs are highlighted in bold):
-
(2) Joshua
M1| a buzzy bee (0·26) bzbzbzbzbzbz (0·79)
M2| and a duck (0·69) quack quack (0·69) quack quack (1·69)
M3| and a cat (0·49) meow (1·31)
M4| and a dog
-
(3) Lily
M1| that's a duck (0·51) quack quack (0·27)
M2| and a sheep (0·19) baa (0·52)
M3| s'a pig (0·22) oink oink (0·82)
M4| s'a cow (0·63) moo (0·81) moo (1·59)
M5| there's a bowl
-
(4) Warren
M1| is that a duck (0·41) quack quack quack (0·76)
M2| quack quack (0·76) quack quack (3·6)
M3| it's a bicycle (1·83)
M4| bicycle (0·16) bring bring (.) bring bring (.)
M5| bring bring (0·57) there's a
As shown in these examples, all three mothers use the same syntactic structure when engaging with their infant in the picture-book reading activity. Word-final pauses appear to be common across CWs, as they occur after a repeated existential phrase (there's a, and a, [it]’s a) and are followed by a corresponding OW, which is produced in isolation on the back of the word-final pause. Furthermore, all three examples show the use of reduplication and repetition of OWs, whereas (4) is the only example containing repetition of a CW, which in this instance is produced in isolation – the only instance of direct CW repetition in the dataset. While our primary aim is to consider the prosodic features of OW production here, the apparent syntactic patterning of OWs and CWs as shown in these examples may be an important feature of OW-production in IDS. Accordingly, the distribution of OWs and CWs on a syntactic level will now be considered.
Proximity
Following the analysis of OWs and CWs produced in isolation we observed a pattern in mothers’ production of OW and CW combinations, as shown in examples (2) to (4) above. In many cases the mothers produced CWs in immediate proximity to their corresponding OWs; it seems that OWs are rarely produced without their corresponding CW. An analysis of OW-CW proximity, if it proves consistent across the dataset, might add an important insight into the use of OWs.
A ‘proximity score’ was calculated from every OW to its nearest corresponding CW, whereby the number of words produced between the OW and the CW was counted for each OW in the dataset. (For example “a train that goes choo choo” would have a proximity score of 2, as there are two words between the OW and the CW.) As some CWs were produced in a context without the OW counterpart in close proximity (but not vice versa), the initial analysis was based on OW rather than CW production.
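The proximity score can be illustrated with a short sketch. It assumes that an utterance is available as a list of word tokens, that a reduplicated OW cluster is treated as a single span, and that pauses are not counted as words; how the original analysis handled utterance boundaries is not stated, so this is one possible reading rather than the procedure actually used. The function name proximity_score and its parameters are hypothetical.

```python
from typing import Optional, Sequence


def proximity_score(tokens: Sequence[str],
                    ow_start: int,
                    ow_end: int,
                    cw: str) -> Optional[int]:
    """Count the words between an OW and its nearest corresponding CW.

    `ow_start` and `ow_end` are the inclusive token indices of the OW
    (identical for a single token, different for a cluster such as
    'choo choo'). Returns None if the CW does not occur in `tokens`.
    """
    best: Optional[int] = None
    for i, token in enumerate(tokens):
        if token != cw:
            continue
        if i < ow_start:
            gap = ow_start - i - 1   # CW precedes the OW
        elif i > ow_end:
            gap = i - ow_end - 1     # CW follows the OW
        else:
            continue                 # token lies inside the OW span
        if best is None or gap < best:
            best = gap
    return best


# The example from the text: two words separate 'train' from 'choo choo'.
train_tokens = "a train that goes choo choo".split()
train_score = proximity_score(train_tokens, 4, 5, "train")

# An OW produced immediately next to its CW scores 0.
duck_tokens = "and a duck quack quack".split()
duck_score = proximity_score(duck_tokens, 3, 4, "duck")

print(train_score, duck_score)
```

A score of 0 corresponds to an OW produced immediately next to its CW.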
Of the 216 OWs analyzed in the full dataset, 194 (90%) were found to occur within 10 words of the corresponding CW (M = 0·77 words), and over half (n = 127) were produced immediately next to the corresponding CW. Again this gives evidence of a routinized approach to OW production: these forms appear to depend on the presence of a CW. When the analysis is reversed to consider the proximity of each CW to its nearest corresponding OW, the figures are less illuminating but show the same trends. Seventy-four percent of CWs are produced within 10 words of a corresponding OW (M = 1·6 words), and 81 of these (36% of all CWs in the dataset) occur immediately next to the OW in the mothers’ speech. Here we see that CWs do not necessarily occur with their corresponding OW, but mothers do produce the accompanying OW form in the majority of cases.
Individual OW forms
Finally, we must acknowledge the variability across the mothers’ production of the individual OW forms. Since the production of OWs involves the stylized imitation of non-human sounds, prosodic effects vary in reference to individual word forms: in fact, a wider pitch range or higher pitch may not always be appropriate. As shown in Figures 5a–c, a particular pitch contour may be implicit in the production of a specific OW, such as monotonal high-pitched brring brring (telephone) compared with a rising variable pitch in ribbit (frog) or a falling variable pitch in neigh (horse): here we see pitch being used variably to represent the OW in question. This accounts for the variability observed in Figures 1 and 2, as well as, to some extent, the use of reduplication in some OWs (e.g. woof woof, quack quack) but not others (e.g. neigh, meow).
DISCUSSION
Our results confirm the four hypotheses set out in the ‘Introduction’: OWs were produced more saliently than their CW counterparts in relation to pitch (both mean pitch and pitch range – Hypothesis 1), duration (Hypothesis 2), frequency (Hypothesis 3), and word isolation (Hypothesis 4). This analysis has thus shown that mothers’ production of OWs is more salient across the board than their production of the corresponding CWs. Furthermore, we observed some important trends in the stylistic features of OW production: proximity of OW-CW pairings was found to be an important feature of OW production, as OWs occurred almost exclusively in close proximity to – often immediately next to – their CW counterpart. Finally, the idiosyncratic nature of individual OW forms and the sound effects that typically accompany them were found to influence the various prosodic features used in mothers’ production of these forms.
OWs were found to be more salient than their CW counterparts with regard to both f0 and pitch range, giving OWs special prominence in the infants’ input. However, the analysis of pitch range gave mixed results: while OWs featured wider pitch excursions than their CW counterparts, their increased duration appeared to account for this. Indeed, rate of pitch change was higher in the CW forms when duration was controlled for, demonstrating the dynamic effect of production on prosody, which was found to be dependent on multiple factors, not only on the lexical status of the word in question. Nevertheless, considering the infant's experience of OWs, absolute pitch may be a more appropriate measure to adopt here, since the combination of longer words and wider pitch excursions undoubtedly serves to increase their salience.
Word duration was also found to be more extended for OWs than CWs, although we were not able to identify the precise nature of this trend – both reduplication and vowel/consonant lengthening seemed likely to be playing a role. Reduplication was not consistent across all stimuli – no instance of cat or horse was reduplicated – yet all targets exhibited longer OW than CW forms. Two important features of OWs appear to be at play here: increased word duration, which is among the most commonly reported characteristics of IDS and which applies to an even greater extent to OWs than to CWs, and reduplication, which is typical of onomatopoeia in general. Together, the use of repetition and reduplication in the production of OWs brings about an increased presence in the input: repetition is reported as one of the typically salient features of IDS (Brent & Siskind, 2001; Fernald & Kuhl, 1987), yet there was only one example of CW repetition in the entire dataset. We also see here how OWs have a frequency advantage owing to the common reduplication and repetition of these forms. Frequency is cited as having an important role in language acquisition in general (Ambridge et al., 2015), and the close proximity of repeated or reduplicated OW tokens no doubt adds to this.
Taken together, these results provide a new perspective on onomatopoeia in early language development, which presents an alternative to the general approach positing an advantage for non-arbitrary sound–meaning correspondences (Imai & Kita, 2014; Werner & Kaplan, 1963). This study has presented empirical evidence to show that OWs stand out from the input more prominently than their CW alternatives; this can be assumed to contribute to infants’ early acquisition of these forms, as observed in numerous studies of early lexical development (Kern, 2010; Menn & Vihman, 2011; Tardif et al., 2008). Indeed, Werner and Kaplan's (1963) review overlooks the role of the input in infants’ early experience of language: Leopold's (1939) account of his daughter's language development is repeatedly cited in Werner and Kaplan's (1963) analysis, yet Werner and Kaplan fail to acknowledge the author's descriptions of his daughter's input. For example, they report Hildegard Leopold's use of “sch, sch, sch!” for both car and train (1939, p. 121), yet they do not mention the fact that her grandfather used this form in games relating to trains. While the proposal that infants are more easily able to connect sound and meaning in onomatopoeia may be theoretically appealing, it disregards the reality of language learning, which must heavily depend on infant experience of onomatopoeia in the input.
When these findings are considered with regard to the wider IDS literature we can establish a functional role for all of the features analyzed in this study. As Fernald and Kuhl (1987) show, young infants tend to prefer the exaggerated pitch contours of IDS, which have been found to attract infants’ attention more readily than the pitch features found in ADS (Fernald, 1985). Furthermore, an eye-tracking study by Laing (2015) shows how attention to OWs may be maintained as a result of their salient pitch features, as those OWs with the highest pitch were found to elicit longer looking times than OWs with less-distinctive pitch contours. On this basis it can be presumed that the further increase in salience of OWs in terms of mean pitch and perhaps also pitch range causes these forms to attract infants’ attention over the less-salient CWs.
Gervain, Macagno, Cogoi, Peña, and Mehler (2008) have shown that within-word repetition (or reduplication) is advantageous in language processing: neonates were able to distinguish between words which contained repetitions (AAB words, such as mubaba) and those that did not (ABC words, as in mubage), but the results did not hold when those repetitions were not directly sequential (i.e. when an ABA word such as bamuba was contrasted with an ABC word). The authors suggest that there may be a “perceptual repetition detector” (p. 14226) at work in early language processing, which may facilitate the acquisition of forms containing repetition. This is supported by numerous studies showing infants’ use of consonant harmony and reduplication in early production (e.g. Ferguson, 1983; Laing, 2015, Ch. 2; Vihman, 2016). Finally, in a longitudinal analysis tracing mothers’ use of IDS to their infants’ eventual word production, Brent and Siskind (2001) demonstrate that the use of isolated words in IDS impacts directly upon infants’ eventual word production, showing that framing words with pauses facilitates their acquisition.
We must also bear in mind, however, that this study is based on a sample of only twelve mother–infant dyads, interacting over a very short stretch of time. While the mothers made consistent use of OWs in using picture books to elicit interactions, it is impossible to ascertain just how common mothers’ production of OWs may be in infants’ input more generally. Longitudinal data which observes infants’ eventual word production would be required to make empirical claims regarding infants’ eventual OW production. Of course, the early input is just one of many aspects of the social, developmental, and production experience necessary for language development.
Why might OWs lend themselves to being produced with more salient prosody than CWs? The first point to consider is the nature of onomatopoeia as sound effects; in many cases, they are produced in an attempt to imitate a real-world sound. Thus, the use of more salient features such as high pitch and extended duration may be automatic in certain situations such as book-reading or toy play; these features may be unusually salient in human speech owing to the nature of the real-world sound in question (see Figures 5a and 5c). The fact that these forms are largely absent from the adult language could also be advantageous for IDS, since the prosodic conventions that normally govern adult-directed speech do not apply.
The consistency with which the mothers in this study paired OWs with the corresponding CWs may reflect doubts as to the status of OWs in the adult language and whether they are words in their own right. This may also explain their predominant use in isolation, as onomatopoeia have no conventional grammatical role, serving instead as embellishments to an appropriate phrase or word form.
Finally, in interactions with 8-month-olds, when the infant typically cannot respond verbally to the input, OWs provide caregivers with the lexical and prosodic variety with which to engage the infant. Positive infant engagement has been found to reinforce mothers’ use of higher pitch contours in IDS (Smith & Trainor, 2008), and the use of OWs in this study appears to have had a similar effect on mother–infant interactions. Accordingly, infants’ responses to the task during the data collection anecdotally demonstrate their engagement: although none of the infants were yet able to speak, many made noises and cries of excitement during the mothers’ production of OWs. One infant even appeared to produce the word quack when the mother was talking about the picture of the duck – the only comprehensible word produced by any of the infants in these recordings. This brings us back to the findings of Kauschke and colleagues (2002, 2007), and their acknowledgement of the attention-grabbing function of OWs. Our results show that onomatopoeia – considered to be a lexical feature of IDS (Ferguson, 1964; Fernald & Morikawa, 1993) – are produced with even more exaggerated features than is typical in this speech style when compared with their conventional equivalents; they can indeed be said to be “attention-getting” (Kauschke & Klann-Delius, 2007, p. 198).
CONCLUSION
This study has demonstrated a revealing yet unsurprising connection between onomatopoeia and IDS, with empirical evidence to contribute to our understanding of onomatopoeia in early language development. Our results show how OWs are made more salient (and thus more readily learnable) through the use of prosodic features that are particular to IDS, supported by the use of reduplication and isolation, features which no doubt make these forms easier to segment from the speech stream. Onomatopoeia stand out from the caregiver's speech significantly more than their conventional counterparts, providing an account of infants’ common production of onomatopoeia which differs from the assumption that onomatopoeia are intrinsically learnable because of their iconic properties (e.g. Imai & Kita, 2014). Indeed, their presence in early infant speech appears to be a product of the affective linguistic mechanisms that are unconsciously but effectively put into practice in the adult output.