One of the great puzzles of language acquisition is how infants learn phonetic contrasts at such an incredible speed. The phonetic learning that takes place during the first months is based on an infant's surrounding speech stimuli (Vallabha, McClelland, Pons, Werker, & Amano, 2007; Werker et al., 2007). It is widely believed that the speech infants receive has characteristics that facilitate learning, and we know that the speech register we use when interacting with an infant (infant-directed speech; IDS) is different from the one we use when interacting with an adult (adult-directed speech; ADS). Among other characteristics, the phonetic aspects of segments are different in IDS compared to ADS (for a review, see Cristia, 2013). For vowels, the vowel space is larger in IDS than in ADS, indicating extreme articulation (Burnham, Kitamura, & Vollmer-Conna, 2002; Lam & Kitamura, 2008; Uther, Knoll, & Burnham, 2007). However, not all studies reveal the same pattern (Cristia & Seidl, 2013).
Some have shown discrepant findings with a smaller vowel space in IDS, and a shift for some vowel qualities (Benders, 2013; Englund & Behne, 2005). Research on vowels in IDS is mostly restricted to point vowels, perhaps providing us with a biased understanding of the facilitating input infants are thought to receive. In addition, most studies of IDS adopt a methodological approach where IDS is recorded once or only a few times (Benders, 2013; Green, Nip, Wilson, Mefferd, & Yunusova, 2010; Kuhl et al., 1997). Recording situations differ across studies, as do the ages of the infants. This makes direct comparisons difficult, and there is a need for studies that expand analyses to denser data sets, where larger parts of phonological inventories are studied and where numerous recordings are made of the same mother while ensuring a natural interactive setting to capture the true nature of the input. This will clarify and broaden our view of the language environment that typically surrounds infants and from which they learn.
HYPERARTICULATION AND HYPOARTICULATION IN IDS
In the hyper–hypo theory, IDS is viewed as an adaptation to a receiver who cannot predict the message very well (Lindblom, 1990). Under optimal listening conditions, and when predictability of the message is high, speech is relaxed with more assimilation. This is termed hypospeech. When, in contrast, predictability is low and/or listening conditions are less than optimal, articulation becomes forceful with longer segments that are more audible, reducing ambiguity for the listener. This is called hyperspeech. A small infant does not have much linguistic experience, and as a result, predictability will almost always be low. When speaking to an infant, we will consequently use hyperspeech, manifested as IDS.
The hyper–hypo theory relates to one of the predominant theories on phonological development, the native language magnet theory (Kuhl, 1993; Kuhl et al., 2008), which says that some prototypical vowel exemplars function as magnets for the perception of other exemplars. It is assumed that IDS vowels represent prototypical exemplars of vowel categories. Evidence for this comes from a study in which American, Russian, and Swedish mothers’ IDS to their 2- to 5-month-old infants was analyzed (Kuhl et al., 1997). In the languages studied, the vowels /a, i, u/ had generally more extreme Formant 1 (F1) and Formant 2 (F2) in IDS than in ADS, implying more extreme articulations. From these results, it was suggested that extreme articulation makes IDS perceptually salient to the infant and aids language learning (Kuhl et al., 1997).
Hyperarticulated vowels are believed to aid language learning: Liu, Kuhl, and Tsao (2003) demonstrated a positive correlation between the size of mothers’ vowel spaces in IDS and their 6- to 12-month-old infants’ ability to discriminate vowels. An additional study has revealed that 21-month-old children learn words better from IDS than ADS (Ma, Golinkoff, Houston, & Hirsh-Pasek, 2011). Others have found strong evidence for the generality of vowel hyperarticulation as an instructive device for teaching language to learners (Uther et al., 2007). Hyperarticulation is modified by the degree of linguistic competence expected from the language learner (Xu, Burnham, Kitamura, & Vollmer-Conna, 2013, but see Burnham et al., 2015, who show that vowel space characteristics are consistent across the first 2 years of an infant's life). Despite the extensive findings of hyperarticulation of vowels in IDS (Burnham et al., 2002; Lam & Kitamura, 2008; Liu, Kuhl, & Tsao, 2003; Xu et al., 2013), not all research points in the same direction.
There are studies displaying patterns of results more compatible with the hypoarticulation of IDS (Benders, 2013; Cristia & Seidl, 2013; Dodane & Al-Tamimi, 2007; Englund & Behne, 2005, 2006; McMurray, Kovack-Lesh, Goodwin, & McEchron, 2013). Benders (2013) studied Dutch IDS using a paradigm in which mothers played freely with their 11- and 15-month-old infants using a set of selected toys to elicit words containing segments from the same phonetic surroundings. Findings included a smaller vowel space in IDS compared to ADS. In addition, mothers raised their F2 and Formant 3 (F3) in corner vowels in IDS compared to ADS. The author points to these as acoustic markers of positive affect, rendering the idea of hyperarticulation as beneficial to phonological learning secondary. A different finding was evident in Cristia and Seidl (2013), who conducted a study of American English IDS where the mothers’ task was to describe objects to their children and to adults. In the study, mothers were asked to talk about categories to their infants and were provided objects/pictures of category exemplars to show to their babies. The study displayed different results for point versus more central vowels. While vowel spaces for point vowels were expanded in IDS compared to ADS, [i–I] and [eI–ɛ] were not categorically separated but had more overlap in IDS, rather than the reduced overlap that would be expected under the view that mothers are trying to categorically separate speech segments. Therefore, although point vowels were produced with more peripheral acoustic characteristics in IDS than in ADS, hyperarticulation was not evident for phonemic differences other than place of articulation. The authors conclude that hyperarticulation is not a necessary feature of IDS.
In another study of American English IDS, by McMurray et al. (2013), parents were recorded while reading a story to an infant and to an adult. Findings revealed that while point vowels for the most part show a stretched vowel space, central vowels are not enhanced in IDS. The authors point to the large overlap between vowels and consequently question whether IDS enhances vowel category learning. They argue for a revision of the assumption that the content of IDS promotes language acquisition. In line with this are results from Dodane and Al-Tamimi (2007), who studied English, French, and Japanese child-directed speech. They did not find a stretched vowel space, but rather a shift in the vowel triangle on the high–low dimension. Central vowels were more open, with higher F1 in IDS than in ADS. More specific are the findings for Norwegian by Englund and Behne (2006), where hypoarticulation of point vowels was demonstrated with a reduced vowel space in IDS.
Due to the possibility that only some phonetic contrasts are enhanced in IDS, some researchers have pointed to the need for studying a broad range of contrasts in the same language (Martin et al., 2015; McMurray et al., 2013). This is also important in order to uncover whether there are adaptations in IDS that can only be seen from a larger pattern of results, and not from results based on only a few vowels. The approach of studying a broad set of vowels within the same language has the advantage of revealing patterns of results that may have been hidden in previous studies.
In addition, a naturally occurring recording situation will provide information on speech interaction that occurs with little or no instruction (Martin et al., 2015). However, Martin et al. (2015) used contrasts from the RIKEN corpus, which consists of IDS elicited when mothers were instructed to view picture books or engage in play. While being everyday activities, they were still induced by instruction, rendering less ecologically valid recording situations. Situations that are initiated by the participants themselves in the comfort of their own homes will display the language input that an infant normally encounters. Together, this calls for studies of a broad set of contrasts, recording mothers with their small infants where they may feel most comfortable, in their own homes. This study is therefore an important contribution to enhance our understanding of early language acquisition.
PREDICTIONS
The knowledge we have to date of vowels in an infant's ambient language is largely restricted to point vowels, and so does not provide a full picture of the input from which an infant learns language. In addition, data collection is often restricted to a few vowels and to low-density data, and relies on instructed elicitation of IDS. Higher density would increase precision of results and provide a better foundation for conclusions. This invites studies that analyze fuller vowel inventories where mothers are recorded in natural situations to capture the true nature of the input.
The present study was designed to do this by looking at the large vowel inventory of Norwegian IDS. While 3 point vowel qualities have been studied before, 6 vowel qualities are unexplored and will be analyzed in the current study. This will add to our knowledge of vowels in Norwegian IDS by covering the full vowel inventory. Most of the work on IDS is on vowel quality, but in Norwegian, vowel duration is a contrastive feature, corresponding to the term vowel quantity (Kristoffersen, 2000). With short and long vowels for each quality, Table 1 gives an overview of the 12 vowels included, exemplified by Norwegian minimal pairs.
From Table 1 it is clear that there are only two unrounded vowels /æ:, æ/ and /e:, ɛ/. The rest are rounded. Long and short vowels are represented within the same brackets; for example, when /y:, y/ is referred to throughout the paper, it is the vowel quality that is referred to, including both long /y:/ and short /y/. The studies showing hyperarticulation in IDS clearly outnumber those showing hypoarticulation (Cristia, 2013). If hyperarticulation is a general feature of IDS, then this should be evident also in the current study. As the vowels studied here are not point vowels, vowel space calculations are futile. Enhancement of contrastive features in vowels would ensure that one vowel does not run the risk of becoming too similar to another vowel. Accordingly, hyperarticulation should render less overlap between vowel qualities in IDS compared to ADS. Although the relationship between formant frequencies and articulator movement is not a direct one, in general, when F1 decreases, the tongue has moved to a higher position, and when F2 increases, the tongue has moved to a more front position. This means that increased F1 corresponds to a more open articulation, and increased F2 corresponds to a more front articulation. F3 corresponds to lip protrusion, which for rounded vowels would make them more distinct from unrounded vowels. The anterior–posterior dimension in the vocal tract has been closely tied to lip protrusion. The relationship between lip protrusion and F3 is inverse, with lower F3 with increasing protrusion (Kent & Read, 1992; Stevens, 1998). For the different vowel qualities, hyperarticulation therefore would imply the following:
• for /æ:, æ/: more open (higher F1) in IDS than in ADS;
• for /ø:, ɵ/: further back and more protruded lips (lower F2, lower F3) in IDS than in ADS;
• for /o:, ɔ/: further back, more closed and more lip protrusion (lower F2, lower F1, and lower F3) in IDS than in ADS;
• for /y:, y/: further back, more open and more lip protrusion (lower F2, higher F1, and lower F3) in IDS than in ADS;
• for /ʉ:, ʉ/: more closed and more lip protrusion (lower F1 and lower F3) in IDS than in ADS; and
• for /e:, ɛ/: more closed (lower F1) in IDS than in ADS.
Table 1. Overview of the vowels under study exemplified by minimal pairs
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171229032250358-0362:S0142716417000480:S0142716417000480_tab1.gif?pub-status=live)
Note: The first and fourth columns represent the two words in the minimal pair. English meanings are added in parentheses. Columns two and five represent the corresponding long and short vowels in these minimal pairs in IPA.
In addition, as Cristia (2013) points out, of 30 studies, 25 showed generally longer vowel duration or a reduced speech rate, and the longer duration of vowels in IDS is also expected here. Enhancement of contrastive phonetic features in IDS will additionally mean an interaction between speech type and quantity, where the difference between long and short vowels should be greater in IDS compared to ADS.
A generally higher fundamental frequency (F0) is an equally prevalent finding, with 33 out of 36 studies showing this in a metastudy (Cristia & Seidl, 2013), and a recent study confirming this for different language cultures (Broesch & Bryant, 2015); hence, it is also expected here.
METHOD
The data for the current study come from a large corpus of natural speech, and details about data collection have previously been described (Englund & Behne, 2005). In the current research, care was taken to make recording settings as unobtrusive as possible to enable the spontaneous interaction that takes place in day-to-day activity and at the same time ensure the elicitation of IDS. This was accomplished by using a recording setting for IDS with direct face-to-face interaction between a mother and infant. The experimenter was present and interacted with the mother to elicit ADS, but was not present during IDS recordings. The mothers initiated the recordings themselves so that the situation came as close to everyday activity as possible.
Participants
Participants were enrolled from maternity groups at local health care centers and recordings started after their infants were born. Six native Norwegian-speaking mothers with a mean age of 27 years (range = 26–28 years) participated in the study. Their infants ranged from almost 4 to 24 weeks old. All mothers reported levels of education at bachelor level or higher. Mothers and infants were generally healthy throughout the study. Upon introduction, mothers signed an informed consent, and after completing the study, they were briefed about the purpose of the study and received sound files with their own IDS recordings.
Procedure and equipment
Recordings were made over a 6-month period. A headset microphone (SHURE, model WH20) with a frequency response from 50 to 15000 Hz was connected to a Sony Digital Audio Tape recorder Walkman TCD-D8 for recording both ADS and IDS. For ADS, two headsets ran through a Behringer Eurorack MX602 mixer. Each recording session included both IDS and ADS recordings, and for a typical IDS recording, the mother and infant were alone in the room, while ADS was recorded in a conversation between the experimenter and the mother, usually in the living room. Recording time varied and ranged from approximately 10 to 45 min. A typical IDS recording was 15 min, and a typical ADS recording was 30 min. Each mother was instructed to change the infant's nappy, interacting with her child as she would normally do in an everyday situation. Other than that, no instructions were given. ADS recordings were natural conversations about anything the mother initiated as a topic. The development of her infant was a recurring topic, as were general news items from papers. At the beginning of an ADS recording, the mother was asked if she remembered any of the words she used while making the IDS recording. In this way, some words occurred in both IDS and ADS. Instructions were typically given only at the first and second recordings, because it seemed artificial to repeat them.
Acoustic analyses
The current research was conducted in order to explore a wider range of vowel qualities, therefore including the vowel qualities /æ:, æ, ø:, ɵ, o:, ɔ, y:, y, ʉ:, ʉ, e:, and ɛ/. A further aim was to explore a representative sample of phonetic contexts for each vowel. Consequently, all occurrences of target vowels in content words were from words in a focal position in a sentence. It has been found that compared to content words, vowel durations in function words are longer in IDS than in ADS (Bernstein Ratner, 1985). In addition, hyperarticulation may be different for words in a focal position (Martin, Utsugi, & Mazuka, 2014). Therefore, the same percentage of content and function words were sampled from ADS as from IDS, from the different vowel qualities and quantities. With a corpus of natural speech, the vowels used for measurements occurred in a variety of phonetic contexts. Cases where vowels preceded or followed a liquid, glide, or nasal, from which it can be difficult to distinguish the vowel, were kept to a minimum.
Praat (Boersma & Weenink, 2009) was used to conduct acoustic analyses. Visual inspection initiated determination of the beginning and end of a vowel, and was supplemented by auditory judgment. Duration was measured in milliseconds. For formants, each measurement was based on the mean for all frames whose centers lie within the selected time span. Means of the first, second, and third formant frequencies were calculated in Hertz for the total selected frame. If vowels were not visibly evident in the spectrum, in cases of background noise, where the speaker had a creaky voice, or when there was a heavy puff of air during articulation, the vowel was rejected from further analyses. From the hours of recordings available, 3,028 segments were analyzed. Selection depended on only one speaker being audible, as well as no noise on the recording. As everyday activity includes the use of objects, such as running water, this considerably reduced the number of words that were feasible for analyses. Sentences were transcribed for further analysis. Care was taken to include vowels where the start and end points could be determined from periods with considerable amplitude.
The study ran over a period of 6 months. Some mothers hesitated to start the study with their newborn infants, resulting in a varying start of the first recordings. In addition, Mothers 2 and 6 did not complete the last recording. Consequently, there was unevenness in data density for different time points. Because, in addition, a previous study from the same corpus of IDS showed no changes in vowel spectra in IDS over the first 6 months (Englund & Behne, 2006), the data were collapsed for analysis. Results were analyzed using the IBM SPSS (Version 21.0) statistical package. Duration was measured in milliseconds, but as duration is perceived logarithmically, a transformation was applied to duration values. The statistical software uses a natural logarithm, returning the base e logarithm of the duration values. Analyses were run for the recalculated variable (Kondaurova, Bergeson, & Dilley, 2012). The mel scale takes the nonlinearity of frequency perception into consideration (Stevens, Volkmann, & Newman, 1937). Fundamental frequency and formant frequencies were recalculated by using the formula from O'Shaughnessy (2000): $m = 2595 \log_{10}(1 + f/700)$, where $f$ is the frequency in hertz and $m$ the corresponding value in mels.
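The two transformations described in this paragraph can be sketched in a few lines of Python. This is only an illustration of the stated formulas, not the SPSS procedure used in the study; the function names `hz_to_mel` and `log_duration` are hypothetical.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in hertz to mels: m = 2595 * log10(1 + f/700)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def log_duration(duration_ms):
    """Natural (base e) logarithm of a duration given in milliseconds."""
    return math.log(duration_ms)
```

Applied to a measured token, `hz_to_mel` would be called once for each of F0, F1, F2, and F3, and `log_duration` once for the vowel duration. By construction of the formula, 1000 Hz maps to approximately 1000 mels.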
RESULTS
Independent variables were speech type with two levels (ADS or IDS); vowel quality with six levels (/æ:, æ/; /ø:, ɵ/; /o:, ɔ/; /y:, y/; /ʉ:, ʉ/; and /e:, ɛ/); and vowel quantity with two levels (long and short). The Mahalanobis procedure for detecting outliers was employed, and after removing extreme values at ±3 SD at either side of the mean, ADS had 1,529 tokens and IDS 1,236 tokens. N for the different mothers included in the analyses was as follows: for Mother 1: 648, Mother 2: 387, Mother 3: 495, Mother 4: 378, Mother 5: 401, and Mother 6: 456. This led to the following distribution over vowel qualities: for /æ:, æ/, n = 435; for /ø:, ɵ/, n = 229; for /o:, ɔ/, n = 505; for /y:, y/, n = 290; for /ʉ:, ʉ/, n = 494; for /e:, ɛ/, n = 812; with 1,046 long vowels and 1,719 short vowels. Means and standard deviations for variables F1, F2, and F3 in mels are presented in Table 2.
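The outlier screening mentioned above can be illustrated with a minimal sketch. The text does not state which variables entered the Mahalanobis procedure, so the version below assumes, purely for illustration, two variables per token (for example F1 and F2 in mels) and a cutoff of 3. The function names and the two-dimensional setup are hypothetical and are not the SPSS routine used in the study.

```python
import math
from statistics import mean


def mahalanobis_2d(points):
    """Mahalanobis distance of each (x, y) point from the sample centroid.

    Uses the sample covariance matrix (n - 1 denominator) of the points.
    """
    n = len(points)
    if n < 3:
        raise ValueError("need at least three points")
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # Quadratic form with the inverse of the 2x2 covariance matrix.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(max(d2, 0.0)))
    return distances


def flag_outliers(points, cutoff=3.0):
    """Indices of points whose distance exceeds the (assumed) cutoff."""
    return [i for i, d in enumerate(mahalanobis_2d(points)) if d > cutoff]
```

One design caveat: when the covariance is estimated from the same sample, no distance can exceed (n - 1)/√n, so a fixed cutoff of 3 is only meaningful for samples of adequate size, as is the case with the token counts reported here.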
Table 2. Means (standard deviations) for duration (ms), log duration, and F0–F3 (mels) for vowel qualities in infant-directed and adult-directed speech
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171229032250358-0362:S0142716417000480:S0142716417000480_tab2.gif?pub-status=live)
Note: F0, fundamental frequency; F1–3, Formants 1–3.
Table 2 shows means and standard deviations for F1–F3 in mels for vowel qualities in ADS and IDS. From Table 2, it is apparent that IDS has generally higher standard deviations than ADS. This was followed up by computing standard deviations into new variables for all vowel qualities in ADS and IDS, and running repeated measures analyses of variance for log duration, F0, F1, F2, and F3. Analyses revealed that in all dependent variables except log duration, standard deviations were higher for IDS than for ADS. Means, standard error, as well as results from repeated-measures analyses are presented in Table 3.
Table 3. Standard deviations for log duration and F0–F3 for adult-directed (ADS) and infant-directed speech (IDS) and results from repeated measures analyses with speech type and vowel quality as independent variables
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171229032250358-0362:S0142716417000480:S0142716417000480_tab3.gif?pub-status=live)
Note: F0, fundamental frequency; F1–3, Formants 1–3.
Table 3 shows that the standard deviations were significantly higher in IDS than in ADS for F1, F2, and F0. From the analyses, two interactions between speech type and vowel quality emerged. For F2, F (5, 25) = 25.87, p = .000, and for F3, F (5, 25) = 10.91, p = .000. Paired-samples t tests revealed that for F3, the only two vowel qualities where IDS had higher standard deviations than ADS were /y:, y/, t (5) = –4.19, p = .009, and /e:, ɛ/, t (5) = –7.23, p = .001. For F2, standard deviation was higher in IDS than in ADS for /y:, y/, /ʉ:, ʉ/, and /e:, ɛ/, t (5) = –3.49, p = .017, t (5) = –4.53, p = .006, and t (5) = –4.52, p = .006, respectively. For F2, there was one instance where ADS had significantly higher standard deviation than IDS, t (5) = 7.82, p = .001.
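The paired-samples comparisons reported above have 5 degrees of freedom, which follows from pairing one ADS value with one IDS value for each of the six mothers. A minimal sketch of that computation is given below. It assumes one standard deviation per mother and speech type, and it assumes the difference is taken as ADS minus IDS, which would be consistent with the negative t values reported where IDS was more variable. The function name is hypothetical.

```python
import math
from statistics import mean, stdev


def paired_t(ads_values, ids_values):
    """Paired-samples t statistic and degrees of freedom (ADS minus IDS)."""
    if len(ads_values) != len(ids_values):
        raise ValueError("both lists need one value per speaker")
    diffs = [a - b for a, b in zip(ads_values, ids_values)]
    n = len(diffs)
    # t = mean difference divided by its standard error.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

With six speakers the function returns `n - 1 = 5` degrees of freedom, and a negative t indicates larger values in IDS than in ADS under the assumed direction of subtraction.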
Figure 1 shows F1–F2 representation of all vowel qualities in ADS, and Figure 2 shows the same for IDS. From these figures, what seems to be a shift in vowel space appears with a vowel distribution that is higher on F2 in IDS than in ADS. In addition, a large variation for the different vowel qualities is evident.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171229032250358-0362:S0142716417000480:S0142716417000480_fig1g.gif?pub-status=live)
Figure 1. Formant 1–Formant 2 distribution (in mels) for adult-directed speech. Each point represents one segment.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171229032250358-0362:S0142716417000480:S0142716417000480_fig2g.gif?pub-status=live)
Figure 2. Formant 1–Formant 2 distribution (in mels) for infant-directed speech. Each point represents one segment.
Linear mixed models analyses were carried out for dependent variables F0, log duration, F1, F2, and F3. In the model, fixed effects were speech type with two levels, vowel quality with six levels, and vowel quantity with two levels. Subject was a cluster variable. Because interactions in a mixed model are tested against the highest level of the variable in question, each vowel quality was tested separately with a similar model wherever an interaction with vowel quality appeared. As natural speech is used in the current study, the degree of inherent variability is necessarily high and a 5% level of significance was used. Table 4 shows main effects (tests of fixed effects) of the dependent variables from mixed models analyses.
Table 4. F values and significance levels from mixed model analyses for main effects and interactions for vowel log duration and F0–F3
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20171229032250358-0362:S0142716417000480:S0142716417000480_tab4.gif?pub-status=live)
Note: F0, fundamental frequency; F1–3, Formants 1–3.
The last two predictions of a generally longer duration and higher F0 in IDS than in ADS were supported (Table 4). Although mean log duration was significantly higher in IDS than in ADS, Table 2 shows that the pattern differed for some vowel qualities. The significant interaction between speech type and vowel quality showed that for /æ:, æ/ and /o:, ɔ/, IDS vowels were longer than ADS vowels, F (1, 3025) = 23.82, p = .000 and F (1, 3025) = 12.34, p = .000, respectively. However, for /ø:, ɵ/, /y:, y/, /ʉ:, ʉ/, and /e:, ɛ/, it was the other way round: ADS vowels were longer than IDS vowels, F (1, 3031) = 102.2, p = .000; F (1, 3052) = 61.60, p = .000; F (1, 3050) = 67.89, p = .000; and F (1, 3030) = 49.65, p = .000, respectively. Log duration was reliably higher for long vowels (Table 2) than for short vowels, and this was stable across speech types, F (1, 3032) = 1.52, ns. F0 was significantly higher in IDS than in ADS, and further investigation into the interaction between speech type and vowel quality (Table 4) showed that only for /y:, y/ was F0 higher in IDS than in ADS, F (1, 3019) = 4.61, p = .032.
For /æ:, æ/, the prediction of a higher F1 in IDS (Table 2) was not supported. However, further analyses based on the significant interaction between speech type and vowel quality showed that for /æ:, æ/, the minimally higher F1 in ADS compared to IDS is significant, F (1, 3054) = 77.08, p = .000. F1 was also significantly different between IDS and ADS for /ø:, ɵ/, /o:, ɔ/, /y:, y/, and /e:, ɛ/, F (1, 3059) = 27.00, p = .000; F (1, 3051) = 16.09, p = .000; F (1, 3057) = 19.17, p = .000; and F (1, 3059) = 33.59, p = .000, respectively. It was higher in IDS for /o:, ɔ/, but for the other three vowel qualities it was lower in IDS. The only case where F1 did not differ between the speech types was /ʉ:, ʉ/, F (1, 3058) = 2.56, ns.
The second prediction of a lower F2 in IDS than ADS for /ø:, ɵ/ (Table 2) was not supported. On the contrary, a higher F2 in IDS was observed (Table 2), F (1, 3058) = 24.86, p = .000. For the other vowel qualities, the significant interaction between speech type and vowel quality revealed that F2 was not significantly higher in IDS for /æ:, æ/, /y:, y/, or /ʉ:, ʉ/, F (1, 3058) = 2.80, ns; F (1, 3056) = 0.60, ns; F (1, 3058) = 2.77, ns, respectively, but that it was higher in IDS for /o:, ɔ/ and /e:, ɛ/, F (1, 3058) = 6.74, p = .009; F (1, 3056) = 49.65, p = .000, respectively (see Table 2).
For /o:, ɔ/, the expected lower F1 and F2 in IDS were thereby not supported. Neither were the expected lower F2 and higher F1 in IDS than in ADS for /y:, y/. For /ʉ:, ʉ/, the expectation was a lower F1 in IDS than in ADS, but this was not evident from the analyses. However, the lower F1 for IDS compared to ADS in /e:, ɛ/ was supported.
A generally higher F3 appeared in IDS compared to ADS (Table 2), and further analyses of the significant interaction between speech type and vowel quality (Table 4) showed there to be no speech type difference in F3 for /e:, ɛ/, F (1, 3057) = 0.39, p = .528. However, a higher F3 in IDS appeared for /æ:, æ/, /ø:, ɵ/, /o:, ɔ/, /y:, y/, and /ʉ:, ʉ/, F (1, 3059) = 21.44, p = .000; F (1, 3059) = 22.19, p = .000; F (1, 3059) = 38.05, p = .000; F (1, 3057) = 10.41, p = .001; and F (1, 3058) = 16.24, p = .000, respectively (see Table 2).
DISCUSSION
Summary of results
This study aimed at broadening our knowledge of IDS to young infants by studying the large vowel inventory of Norwegian IDS and by recording mothers at home in everyday situations while their infants were very young. The following is an overview of predictions, with comments as to whether each was supported:
• more open /æ:, æ/ in IDS: not supported; the effect was the opposite;
• more back and protruded /ø:, ɵ/ in IDS: not supported; the effect was the opposite;
• more back, closed, and protruded /o:, ɔ/ in IDS: not supported; the effect was the opposite;
• more back, open, and protruded /y:, y/ in IDS: not supported; no effect (the opposite for protrusion);
• more closed and protruded /ʉ:, ʉ/ in IDS: not supported; no effect (the opposite for protrusion);
• more closed /e:, ɛ/ in IDS: supported;
• longer duration in IDS: supported; and
• higher F0 in IDS: supported.
The following effects seem evident from the current results: vowel qualities /ø:, ɵ/, /o:, ɔ/, and /e:, ɛ/ are more fronted; /o:, ɔ/ is more open; /ø:, ɵ/, /y:, y/, and /e:, ɛ/ are less open; and, in addition, vowel qualities /æ:, æ/, /ø:, ɵ/, /o:, ɔ/, /y:, y/, and /ʉ:, ʉ/ are less protruded in IDS compared to ADS. The results correspond to those of Benders (2013), with raised F2 and F3, and parts of McMurray et al. (2013) and Cristia and Seidl (2013), with a larger overlap between vowel contrasts. They also accord with Dodane and Al-Tamimi (2007) and Englund and Behne (2005), who observed a shift of some vowels in the front–back dimension. Based on previous and current findings, the questions become: what is the mother doing in IDS and why is she doing so?
Acoustic–articulatory relations
Some ask whether adaptations in IDS are secondary to higher fundamental frequency or reduced speaking rate (McMurray et al., Reference McMurray, Kovack-Lesh, Goodwin and McEchron2013). We know that infants’ preference for IDS (Cooper, Abraham, Berman, & Staska, Reference Cooper, Abraham, Berman and Staska1997) is related mainly to F0 (Segal & Newman, Reference Segal and Newman2015). Higher F0 in IDS could be the result of an attempt to mimic infant production (Cristia, Reference Cristia2013). Given that infants prefer to listen to utterances produced by other infants (Masapollo, Polka, & Ménard, Reference Masapollo, Polka and Ménard2015), mimicking infant speech would be one way to get an infant's attention. Because infants have smaller vocal tracts, such mimicry would lead to higher F0, but also to more open and more front vowels (Ménard, Schwartz, Boë, Kandel, & Vallée, Reference Ménard, Schwartz, Boë, Kandel and Vallée2002). However, both emotionality and mimicking speech would affect all vowel qualities in a similar way. The selective increase in formants for some qualities in this study does not support either explanation.
Speech rate has been used to explain the longer vowel duration usually found in IDS. Infants easily attend to slower IDS that is high in affect (Panneton, Kitamura, Mattock, & Burnham, Reference Panneton, Kitamura, Mattock and Burnham2006), and slow speech improves word recognition (Song, Demuth, & Morgan, Reference Song, Demuth and Morgan2010). This is not surprising, seeing that in slow speech there is a decreased probability of target undershoot (Gay, Ushijima, Hirose, & Cooper, Reference Gay, Ushijima, Hirose and Cooper1974). In addition, a slow speaking rate would lead to a more open jaw, and more time to reach articulatory extremes (Vanson & Pols, Reference Vanson and Pols1990). Although both are certainly part of what is going on in IDS, and vowel log duration was also longer in IDS here, neither higher pitch nor slower speech alone can account for the selective adjustment in formants observed for some vowel qualities in the present data.
Varying articulatory parameters does not always lead to the same acoustic result (Stevens, Reference Stevens1998), and one should be cautious not to interpret the connections between acoustic measures and articulatory movement too directly. Nevertheless, some connections are seen between the position and movement of the articulators and the acoustic outcome. Generally, the first formant frequency is linked to jaw opening. Jaw opening was generally comparable between speech types, but mothers articulated /o:, ɔ/ with a more open jaw in IDS. However, the fact that the opposite was true for /ø:, ɵ/, /y:, y/, and /e:, ɛ/ leads to the conclusion that vowels are articulated with neither a consistently more open nor a consistently more closed jaw in IDS.
The second formant is sensitive to placement of the tongue body (Kent & Read, Reference Kent and Read1992; Stevens, Reference Stevens1998), and the third formant is sensitive to placement of the tip of the tongue (Sundberg, Reference Sundberg1977). F2 and F3 can be considered together: if the tongue body is more front, it is probable that the tip of the tongue will be equally front. Therefore, where F2 is increased, it is likely that F3 is correspondingly increased and that lips are less protruded. The current results showed three vowel qualities to be more front and five vowel qualities to be less protruded in IDS than in ADS. The lack of more opening is surprising, seeing that infants have an F1 bias (i.e., the finding that young infants are better at discriminating vowel contrasts conveyed by F1 than contrasts that are associated with corresponding F2 changes; Lacerda & Sundberg, Reference Lacerda, Sundberg, Lacerda, Hofsten and Heimann2001). This, together with selective fronting and less rounding in vowels, paints a picture where IDS represents less-specified vowels.
Hypoarticulation as perceptual challenge
As observed in Table 3 and confirmed through analyses of standard deviations, variation is generally higher in IDS than in ADS, although more so for some vowel qualities. Together with lack of specification of vowel quantity, lack of jaw opening, and selective fronting as well as less-rounded vowels, the current results do not support hyperarticulation but rather point more in the direction of hypoarticulation. Hypoarticulation in IDS does not coincide well with the traditional idea that this speech type enhances speech category learning. Instead, it justifies the idea that IDS may be a perceptual challenge.
Large variation is also mentioned as a central finding in McMurray et al. (Reference McMurray, Kovack-Lesh, Goodwin and McEchron2013). It is possible that high variability is what makes IDS challenging, and that the hypoarticulation observed is a masked effect of variability, where large variability leads to overlapping phonetic categories. Statistical learning models are based on an estimate of both the mean and variability in values that characterize phonemes. If variability is high, this may outweigh the benefit of expanding the vowel space to establish prototypes. In this way, highly variable IDS may entail a perceptual challenge to an infant who is faced with the task of learning phonetic categories. This goes against findings that computer models learn speech contrasts better from hyperarticulated speech (Boer & Kuhl, Reference Boer and Kuhl2003). However, not all agree with what de Boer and Kuhl concluded (Kirchhoff & Schimmel, Reference Kirchhoff and Schimmel2005), and some have found the opposite, namely, that ADS outperforms IDS as a foundation for learning of some contrasts (McMurray et al., Reference McMurray, Kovack-Lesh, Goodwin and McEchron2013). We therefore have to consider the possibility that although IDS may be a perceptual challenge, it may still not be detrimental to phonetic learning. Instead, such a challenge may be beneficial to a speech-learning infant. How can it be beneficial? Work on categorization of meaning offers a useful analogue. At the heart of stimulus processing in infants is the idea that low levels of variability and complexity might lead to habituation, which in turn causes low attention and counteracts learning. In this way, variation is not necessarily harmful to categorization. A study of visual categorization has shown that variability is central in defining category membership (Mather & Plunkett, Reference Mather and Plunkett2011).
Note that habituation would be happening only if a stimulus was presented repeatedly, and the referred study used 10-month-olds and had a very different purpose. Still, a perceptual challenge hypothesis could be set forth for phonetic learning with the equivalent idea, presupposing variation to be necessary for attention and learning in phonetic category development. A new study supports this hypothesis, by using a mathematical teaching model. Although not all learners can profit from variability, large variability may lead to better learning by directing the learners’ inferences away from segments that are not good exemplars, and toward segments that are (Eaves, Feldman, Griffiths, & Shafto, Reference Eaves, Feldman, Griffiths and Shafto2016). Additional support comes from studies of second language acquisition in adults, where it is shown that spectrally more variable materials from different talkers lead to learning of more robust categories (Lively, Pisoni, Yamada, Tohkura, & Yamada, Reference Lively, Pisoni, Yamada, Tohkura and Yamada1994; Sadakata & McQueen, Reference Sadakata and McQueen2013; Wong, Reference Wong, Li and Ching2014). Infants learn words faster when presented with multiple speakers (Rost & McMurray, Reference Rost and McMurray2009), but also when presented with one speaker who varies duration, overall pitch, and pitch contour (Galle, Apfelbaum, & McMurray, Reference Galle, Apfelbaum and McMurray2015). Infants may search for invariant cues in the speech they encounter, whether auditory or visual, and invariants may become evident if variants outnumber them.
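The point that statistical learning models depend on both the mean and the variability of each category can be illustrated with a minimal sketch. It treats two vowel categories as one-dimensional Gaussian distributions over a single formant and computes how much they overlap. All means and standard deviations below are hypothetical values chosen for illustration; they are not taken from the present data, and the Gaussian assumption is a simplification.

```python
from statistics import NormalDist


def category_overlap(mean_a, sd_a, mean_b, sd_b):
    """Overlapping coefficient (0 to 1) of two Gaussian categories.

    0 means the categories are fully separable on this dimension,
    1 means they are indistinguishable.
    """
    return NormalDist(mean_a, sd_a).overlap(NormalDist(mean_b, sd_b))


# Hypothetical F2 values in Hz for two neighbouring vowel categories.
# Baseline register: category means 300 Hz apart, SD of 100 Hz.
ads = category_overlap(1800, 100, 2100, 100)

# Expanded vowel space, same variability: means 400 Hz apart.
ids_expanded = category_overlap(1750, 100, 2150, 100)

# Expanded vowel space, but with higher variability (SD of 180 Hz).
ids_variable = category_overlap(1750, 180, 2150, 180)

print(f"baseline:              {ads:.3f}")
print(f"expanded:              {ids_expanded:.3f}")
print(f"expanded and variable: {ids_variable:.3f}")
```

Under these assumed values, expanding the distance between category means reduces overlap, but a sufficiently large increase in variability more than cancels that benefit, which is the trade-off described above.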
Visual perceptual aspects
Some articulatory gestures may have acoustic effects that are more or less easy to perceive, and the characteristics of IDS may or may not be intended by the mother. Regardless, why would mothers speak with selectively more fronted vowels and less lip protrusion? Much of the speech that infants encounter is multimodal. We know that infants prefer to look at infant-directed faces (Kim & Johnson, Reference Kim and Johnson2014), and a study has shown that infants as young as 2 months old perceive the audiovisual aspects of sounds within syllables (Baier, Idsardi, & Lidz, Reference Baier, Idsardi and Lidz2007). If visual cues are important in language learning, they may also be fundamental to IDS, and although it would be expected as a general effect, the observed fronting in IDS could be motivated by enhancing visual speech cues to infants. One highly visible aspect of speech would be jaw opening, but this was not observed as an aspect of the current IDS. Another would be fronting, which could mean that vowel articulatory movements would in some way be easier for an infant to see. However, because only half of the vowel qualities were fronted (/ø:, ɵ/, /o:, ɔ/, and /e:, ɛ/), and at least two of these were already central vowels, it is not easy to grasp in what way fronting would make the visual task easier for an infant. A third highly visible aspect in vowel production is lip protrusion. The effect of protruding the lips is the lengthening of the vocal tract and lowering of formant frequencies. However, this was not the case here. In the current data, it seems most vowels are less protruded. A related possibility is that IDS represents increased visual contrastiveness between categories. Although some have questioned it (Ter Schure, Junge, & Boersma, Reference Ter Schure, Junge and Boersma2016), a study by Teinonen, Aslin, Alku, and Csibra (Reference Teinonen, Aslin, Alku and Csibra2008) shows that visual information aids phoneme learning in infancy.
Smiling
There is one explanation that may cover the selective fronting and less lip rounding observed in the current data. The mother could be smiling while producing IDS, thereby revealing emotional information. If a speaker is smiling, the tongue body is more front and lips are less protruded (Tartter, Reference Tartter1980). In addition to making speech cues visible, smiling could increase infant production, as shown in a study where the quantity of speechlike syllabic infant vocalizations increases if the mother is smiling during face-to-face interaction (Hsu, Fogel, & Messinger, Reference Hsu, Fogel and Messinger2001). Although done with adults, a study has also reported that listeners may attach more weight to visual input from a smiling rather than an austere speaker (Traunmuller & Ohrstrom, Reference Traunmuller and Ohrstrom2007). When smiling, the mouth widens and the lips retract, resulting in a shortened vocal tract and a corresponding increase in all formants. However, it seems to have different consequences for rounded versus unrounded vowels. A study has shown that for the vowel /u:/, a smile resulted in significantly higher F3, while for /a:/ and /i:/ it did not. Therefore, lip protrusion decreased more from a smile if the vowel was inherently more protruded (Fagel, Reference Fagel, Esposito, Campbell, Vogel, Hussain and Nijholt2010). This coincides well with the current data, where there was no difference in F3 for /e:, ɛ/.
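The link between vocal tract length and formant frequencies can be made concrete with the textbook approximation of the vocal tract as a uniform tube closed at the glottis and open at the lips, whose resonances are F_n = (2n − 1)c / 4L. The sketch below is only an illustration under that assumption; the tract lengths are hypothetical round numbers, not measurements from the mothers in this study.

```python
SPEED_OF_SOUND_CM_S = 35000.0  # approximate speed of sound in warm, moist air


def tube_formants(length_cm, n_formants=3, c=SPEED_OF_SOUND_CM_S):
    """Resonances (Hz) of a uniform tube closed at one end, open at the other.

    Quarter-wavelength resonator: F_n = (2n - 1) * c / (4 * L).
    """
    return [(2 * n - 1) * c / (4.0 * length_cm) for n in range(1, n_formants + 1)]


# Hypothetical vocal tract lengths in cm (illustrative assumptions only).
neutral = tube_formants(15.0)    # neutral lip position
protruded = tube_formants(15.5)  # lip protrusion lengthens the tract
smiling = tube_formants(14.5)    # lip retraction shortens the tract

for label, formants in [("protruded", protruded), ("neutral", neutral), ("smiling", smiling)]:
    print(label, [round(f) for f in formants])
```

In this idealized model a shorter tube raises every resonance and a longer tube lowers every resonance. Because the model scales all formants together, it cannot capture the vowel-specific effects of smiling reported above, which depend on how protruded a vowel is to begin with.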
The smiling explanation could be further supported by the choice of recording situation. The current approach used a face-to-face interactive setting, which may have encouraged smiling. This is not a common setting to use in IDS research and could explain the different results in a study of Swedish, where a play situation was used to elicit IDS (Kuhl et al., Reference Kuhl, Andruski, Chistovich, Chistovich, Kozhevnikova, Ryskina and Lacerda1997). Studying infants under 6 months makes it challenging to adapt interactive situations: with a 2-month-old it may be unnatural to play on the floor with toys, while with a 6-month-old it may be more natural. In this way, the recording situation should be adapted to the age and development of the infant. The recording situation selected for the current study ensured inclusion of infants from birth to 6 months. As the mother and infant were alone in the current setting, it is seen as highly natural and representative of naturally occurring IDS, but it may also have affected the kind of adaptations the mother is using in that particular situation.
Methodological aspects
Accounts of IDS rarely discuss the possibility that the recording situation may have a profound effect on experimental results. In Green et al. (Reference Green, Nip, Wilson, Mefferd and Yunusova2010), the point was made that the lack of more articulatory exaggeration can be due to self-consciousness while being observed. Although some have shown no difference (Stern, Spieker, Barnett, & MacKain, Reference Stern, Spieker, Barnett and MacKain1983), it has also been shown that some mothers speak more slowly in a home setting than in a laboratory setting (Stevenson, Leavitt, Roach, Chapman, & Miller, Reference Stevenson, Leavitt, Roach, Chapman and Miller1986). If this is so, one can question whether mothers use more extreme IDS at home, and this would mean that the slower speaking rate should be evident here, but the lack of longer vowels in IDS refutes this. With a lab-oriented approach with older infants, there might be less face-to-face interaction, and one may observe less focus on visual speech cues. Research is under way to test whether characteristics of Norwegian IDS change in line with recording situations.
Interpreting the impact of what the mother does during the IDS recordings depends on what she does during the ADS recording. Recent work has pointed to the possibility that the differences between speech registers are in part due to the nature of ADS recordings (Johnson, Lahey, Ernestus, & Cutler, Reference Johnson, Lahey, Ernestus and Cutler2013). Most studies of IDS use ADS speech where the adult is unfamiliar to the mother, typically an experimenter. Johnson et al. (Reference Johnson, Lahey, Ernestus and Cutler2013) have shown that differences between ADS to a familiar adult and IDS are smaller than those between ADS to an unfamiliar adult and IDS, which may lead to a bias when interpreting the characteristics of IDS. In the current methodological approach, each mother spoke to the same experimenter at 12 points in time in the families’ homes. Although the bias may have been relevant for the first couple of recording sessions, the experimenter and mothers became increasingly friendly throughout the study. In consequence, the effect is likely evened out by later recording sessions. While familiarity might explain reduced register effects (e.g., no hyperarticulation), it does not predict the significant differences that do occur. This gives the current study an advantage with a valid ADS condition, containing speech that comes close to what mothers would use with other familiar/friendly adults.
Summary
The early language environment may present considerable complexity to infants who are about to learn phonetic categories. The present study was designed as a thorough approach, studying an abundance of vowels collected in IDS and ADS in a natural interactive setting early in infant life. The data provide a striking picture, showing vowels to be hypoarticulated and selectively open, fronted, and less protruded in IDS. While hypoarticulation may complicate an infant's auditory language learning, it may also facilitate perception of visual aspects of speech and emotional aspects in communication. Results call for theoretical development in IDS research that acknowledges that within the emotional and attention-getting message of IDS lies a perceptual challenge for an infant.