Introduction
In many species, parents’ responses to their offspring's immature vocalizations play a key role in the development of communication. Results from vocal learning studies on humans (Goldstein & Schwade, 2008), songbirds (King, West, & Goldstein, 2005), and marmosets (Takahashi et al., 2015; Gultekin & Hage, 2018) indicate that adults who coordinate their vocalizations with those of their offspring create contingent social feedback that facilitates learning of more advanced vocal patterns. Human infants have a long period of vocal immaturity, during which vocal development seems particularly open to social input (e.g., Kuhl, Tsao, & Liu, 2003; Goldstein & Schwade, 2010; Ramírez-Esparza, García-Sierra, & Kuhl, 2017). By five months, infants have come to expect that their babbling (i.e., all speech-related prelinguistic vocalizations, excluding cries and vegetative sounds; Oller, 2000; Warlaumont, Richards, Gilkerson, & Oller, 2014) will reliably elicit an adult's response (Goldstein, Schwade, & Bornstein, 2009). Social influences continue to facilitate improvements in vocal learning throughout the babbling phase. By nine months, infants produce more speech-like vocalizations when caregivers’ responses are contingent on their vocalizations (Goldstein, King, & West, 2003). Contingent vocal responses from parents predict the content of infants’ vocal changes (e.g., vowel resonance and faster consonant–vowel transitions) (Goldstein & Schwade, 2008).
Findings from infant vocal learning studies clearly demonstrate the power of parents’ contingent responses in facilitating vocal production learning, but the statistical and linguistic structure of contingent parental speech is unknown. In what ways does babbling influence the statistics of parental speech? We know that infants can detect many forms of auditory statistics. Prelinguistic infants can segment sequences of phonemes based on their co-occurrence regularities (Saffran, Aslin, & Newport, 1996). Infants also pool phoneme information to construct phonological categories found in their ambient language (e.g., Mattys, Jusczyk, Luce, & Morgan, 1999). Children use co-occurrence statistics between heard words and seen objects to rapidly learn word–referent mappings (Smith, Suanda, & Yu, 2014). Infants’ sensitivity to the statistics of language input, and their ability to abstract structure from spoken language at multiple levels of linguistic organization, means that variation in parental speech plays a powerful role in infant language learning and development (Newman, Rowe, & Ratner, 2016).
Parental speech to infants is characterized by several acoustic and linguistic features that increase its signal value, organize infant attention, and simplify the language learning task. Infant-directed speech (IDS) generally consists of shorter utterances than are found in adult-directed speech (Newport, Gleitman, & Gleitman, 1977; Snow, 1977). While the linguistic content of IDS is simplified, the prosodic features (e.g., pitch contours) are elaborated. IDS has a higher overall pitch and larger dispersion of pitch compared to adult-directed speech (Fernald & Simon, 1984). Differences in the pitch of caregivers’ speech may facilitate infants’ attention to the co-occurrence of speech sounds over time (Thiessen, Hill, & Saffran, 2005).
Parents also tend to maintain a small set of consistent grammatical structures in their speech to infants, and those specific structures predict later child language (Cameron-Faulkner, Lieven, & Tomasello, 2003). Isolated words spoken to children are predictive of the words that are most likely to be produced by children later in development (Brent & Siskind, 2001). The simplicity of language spoken to children and infants may interact with the diversity or variability of words parents utter over extended time periods. Numerous findings from learning studies implicate the unique role of variability in the input, specifically the number of words and specific word types heard by children. Previous work demonstrates that exemplar variability promotes both language development (Huttenlocher, Waterfall, Vasilyeva, Vevea, & Hedges, 2010) and generalization in learning auditory statistics (Vukatana, Graham, Curtin, & Zepeda, 2015).
Parents’ speech and responsive behaviors to infants during free play with objects have been shown to facilitate word learning in toddlers between 12 and 24 months of age (e.g., Tamis-LeMonda, Bornstein, & Baumwell, 2001; Weisberg, Zosh, Hirsh-Pasek, & Golinkoff, 2013). Might babbling also influence parental speech in ways that facilitate language learning? The present research examines whether the structural patterns of parents’ infant-directed speech change in response to babbling. Past findings suggest that the complexity of parents’ interactions with their infants reflects infants’ current linguistic ability. When speaking to 4-month-olds, mothers’ utterances are characterized by more exaggerated pitch contours and lexical repetitiveness than when infants are newborns or 12 or 24 months of age (Stern, Spieker, Barnett, & MacKain, 1983). Parents also raise the pitch of their speech to their 4-month-old infants as a function of their infants’ responses to their speech (Smith & Trainor, 2008). Parents’ speech to 6-month-old infants utilizes more nonsense sounds to attract infants’ attention than when infants are 12 or 19 months (Fernald & Morikawa, 1993). The length of parents’ utterances to children in everyday learning environments may decrease until children learn specific target words within the utterances (Roy, Frank, & Roy, 2009). Thus, parents’ speech is sensitive to children's overall developmental progression, but the role of prelinguistic vocalizing in organizing parents’ speech has received little attention.
Recent findings suggest that the window of time after infants vocalize may prove crucial for the promotion of learning. Infants’ own vocalizations may serve to modulate their attention and afford learning the mapping of parents’ speech to attended objects in the environment (Albert, Schwade, & Goldstein, 2017). When infants babble while looking at an object, they create a state of receptivity for learning at the same moment caregivers are likely to provide specific information about the target object (Albert et al., 2017). These object-directed vocalizations (ODVs), coupled with an immediate label of the object by an adult, result in stronger word–object associations (Goldstein, Schwade, Briesch, & Syal, 2010). These findings suggest that contingent parental speech capitalizes on infant attentional focus and facilitates the mapping of heard words to seen objects.
We first compared the linguistic structure of parental speech as a function of its contingency on infant babbling. Parental speech structure was quantified in terms of three measures. To measure lexical diversity, we counted the number of unique words (types) parents said to infants. Parents’ speech was also analyzed with two measures of syntactic complexity. We determined the mean length of utterances in words (MLUw) (Parker & Brorson, 2005) and the proportion of utterances which contained only a single word.
Next, we examined the relations between prelinguistic vocalization quality and contingent parental speech. We used Oller's infraphonological framework in our classification of 9- and 10-month-old infants’ vocalizations to quantify the maturity of infant vocalizations. Over the first year of life, infants’ vocal production undergoes extensive changes (Oller, 2000). Infants between 1 and 4 months begin to produce immature vowel sounds characterized by minimal breath support and a closed vocal tract, which result in creaky and nasalized vocalizations called quasi-resonant vowels. Incrementally, between 3 and 6 months, infants begin to produce fully resonant vowels which have full breath support and are produced with an open vocal tract. By 9 months, infants begin to produce mature consonant–vowel alternations which resemble the sounds found in well-formed language (e.g., [ba], [da]; Oller, Eilers, & Basinger, 2001). Infants’ prelinguistic vocal repertoires are most diverse between 9 and 10 months, a time when they are beginning to incorporate phonological characteristics of their ambient language into their vocal productions.
Parental speech is sensitive to these developmental changes in infants’ vocalizations. Parents are more likely to respond to vocalizations which more closely resemble adult speech, such as consonant–vowel vocalizations (Abney, Warlaumont, Oller, Wallot, & Kello, 2016; Albert et al., 2017). Parents can also identify when infants’ babbling matures. Parents’ identification of qualitative variance in infants’ vocalizations has been documented (Oller, Eilers, & Basinger, 2001). In addition to responding differently based on the acoustic maturity (e.g., vowel resonance, consonant–vowel timing) of vocalizations, parents respond to the directedness of infants’ vocalizations (Albert et al., 2017). In response to infants’ object-directed vocalizations, parents responded more frequently, were more likely to respond sensitively to infants’ current attentional focus, and more often provided information about infants’ current attentional focus. However, Albert et al. investigated only the content of parents’ contingent speech to infant babbling. By conducting acoustic and linguistic analyses of both contingent and non-contingent speech to infants, we can better understand the role that babbling may play in influencing social interactions in ways that might facilitate the infant's own learning.
Methods
Participants
Thirty mother–infant dyads participated (infant mean age = 9.20 months, range: 9.4–10.14 months). Participants were recruited from birth announcements in local newspapers and through advertisements. Mothers received a T-shirt or a book as a gift for their participation. Participants were part of a larger corpus from a previously published study (Goldstein & Schwade, 2008).
Apparatus
Sessions were recorded in a naturalistic environment (a 12 ft. × 18 ft. playroom) with toys, a toy box, and posters of animals. Infants were free to roam around the room and explore. Interactions were video-recorded via three remote-controlled digital video cameras. To obtain detailed audio-recordings, each infant wore denim overalls concealing a wireless microphone (Telex FLM-22; Telex Communications, Inc., Burnsville, MN) and transmitter (Telex USR-100). Caregivers wore a wireless lapel microphone (Telex FMR-150) with a transmitter concealed in a pouch at their waist (Telex USR-100). Infant vocalizations and caregiver speech were recorded on distinct audio channels.
Procedure
Participants came to the lab for two 30-min play sessions, spaced approximately 24 hours apart. The first session and the first 10 min of the second session were unstructured, with parents instructed to play as they normally would at home.
Speech transcription
Parents’ speech during session 1 was transcribed in full. Caregiver utterances were segmented if they were bounded by silence longer than 2 sec and/or if they exhibited terminal pitch contours (Venker, Bolt, Meyer, Sindberg, Weismer, & Tager-Flusberg, 2015). Utterances from the parents were categorized as contingent if they occurred within 2 sec of the offset of infants’ vocalizations (including all vocalization categories as described by Oller, 2000, and defined below); all other parent utterances were categorized as non-contingent (Table 1). Caregiver responses to crying, fussing, and vegetative vocalizations (e.g., coughs) were excluded from our analysis.
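The contingency rule above can be sketched in a few lines of Python. This is a minimal illustration, not the coding procedure used in the study: the `Utterance` type, its field names, and the choice to measure the 2-sec window from the utterance onset are assumptions made here for the example.

```python
from dataclasses import dataclass

CONTINGENCY_WINDOW_S = 2.0  # window stated in the text


@dataclass
class Utterance:
    onset: float  # seconds from session start (hypothetical field)
    text: str


def is_contingent(utterance_onset, vocalization_offsets, window=CONTINGENCY_WINDOW_S):
    """Return True if the utterance starts within `window` seconds after
    the offset of any infant vocalization (assumed reading of the rule)."""
    return any(0.0 <= utterance_onset - off <= window for off in vocalization_offsets)


def split_by_contingency(utterances, vocalization_offsets):
    """Partition caregiver utterances into (contingent, non_contingent) lists."""
    contingent, non_contingent = [], []
    for u in utterances:
        target = contingent if is_contingent(u.onset, vocalization_offsets) else non_contingent
        target.append(u)
    return contingent, non_contingent
```

Offsets of cries, fusses, and vegetative sounds would be left out of `vocalization_offsets` before calling `split_by_contingency`, mirroring the exclusion described above.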
Table 1. Caregiver speech descriptive statistics
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220810100340343-0828:S0305000919000291:S0305000919000291_tab1.gif?pub-status=live)
Child-directed speech analyses
We computed three descriptive measures for caregivers’ contingent and non-contingent speech during session 1: (1) the number of unique words in caregivers’ speech, (2) the mean length of caregivers’ utterances in words (MLUw), and (3) the proportion of single-word utterances in caregivers’ speech. To measure inter-rater reliability, a second coder transcribed 5 min of the sessions for a subset (20%, N = 23) of the dyads. The intra-class correlation (ICC) for the number of unique contingent words was good (.85) as was reliability for the number of non-contingent unique words (.79). The ICC for the total number of words spoken contingently was good (.76) and moderate for the total number of non-contingent words (.71). The ICC for the mean length of utterances was good for both contingent utterances (.87) and non-contingent utterances (.79).
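As a rough illustration of the three descriptive measures, the sketch below computes them from a list of transcribed utterances. The tokenization (lowercasing, whitespace splitting, stripping surrounding punctuation) is an assumption for the example; the source does not describe how words were delimited.

```python
def tokenize(utterance):
    """Split an utterance into lowercase words (assumed tokenization)."""
    words = (w.strip(".,!?;:\"'()").lower() for w in utterance.split())
    return [w for w in words if w]


def speech_measures(utterances):
    """Return the number of word types, MLUw, and proportion of single-word utterances."""
    tokenized = [toks for toks in (tokenize(u) for u in utterances) if toks]
    if not tokenized:
        raise ValueError("no non-empty utterances")
    n_utts = len(tokenized)
    types = {w for toks in tokenized for w in toks}
    return {
        "types": len(types),                                        # unique words
        "mluw": sum(len(toks) for toks in tokenized) / n_utts,      # mean length in words
        "prop_single_word": sum(len(toks) == 1 for toks in tokenized) / n_utts,
    }
```

Running `speech_measures` separately on the contingent and non-contingent utterances of each caregiver would give the paired values compared in the Results.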
To obtain acoustic parameters of parents’ speech, we utilized a Praat (Boersma & Weenink, 2015) script that extracted a vector corresponding to the F0 (fundamental frequency) over time from a subset (15%) of the total utterances in our dataset. Utterances used for analysis of acoustic differences across contingent and non-contingent speech were duration-matched.
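Once an F0 vector has been exported for an utterance, the mean and range reported in the Results can be summarized as in the sketch below. It assumes, hypothetically, that unvoiced frames are coded as `None` or `0`; the actual export format of the Praat script is not described in the source.

```python
def f0_summary(f0_track):
    """Mean and range (max minus min) of F0 in Hz over voiced frames only."""
    voiced = [f for f in f0_track if f is not None and f > 0]  # assumed unvoiced coding
    if not voiced:
        raise ValueError("no voiced frames in F0 track")
    return {
        "mean_hz": sum(voiced) / len(voiced),
        "range_hz": max(voiced) - min(voiced),
    }
```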
Phonology of infants’ vocalizations
Infant vocalizations from the unstructured play period of session 2 were categorized into four different groups according to Oller's infraphonological acoustic classification system (Oller, 2000). This system considers both acoustic (e.g., timing of formant transitions) and qualitative (e.g., phonetic categories) features of infants’ vocalizations. Vocalization boundaries were segmented according to breath groups (see Oller & Lynch, 1992; Oller, 2000). A quasi-resonant nucleus is produced with a relatively closed throat, has little breath support and is qualitatively creaky, nasal, or both. A fully resonant nucleus is produced with an open vocal tract with normal phonation which yields a clear formant structure. A marginal syllable is a slow movement between consonant and vowel (> 250 ms). The slow movement of the articulators often distorts the vowel. Following Oller's (2000) classification system, we included as consonants stops and fricatives, and excluded glides. Glides were included in the vowel categories (Stoel-Gammon, 1989). A canonical syllable has a fully resonant vowel and a quick (< 250 ms) transition between consonant and vowel. We tested 9- to 10-month-olds because they typically have all four vocalization types in their repertoires. Infant vocalizations which contain consonant sounds are considered to be more developmentally advanced, as they are more speech-like. Reliability for infant vocalizations was calculated based on independent coding of 20% of the sample (ICC = .92).
We tallied infant vocalizations from session 2, after infants had become accustomed to the lab play room during session 1. We calculated the proportion of infants’ vocalizations with CV structure to the total number of vocalizations to assess the maturity of infants’ vocalizations. We included marginal and canonical syllables in our measure of CV vocalizations because over the course of the first year, the proportion of infant vocalizations containing consonants increases (Holmgren, Lindblom, Aurelius, Jalling, & Zetterström, 1986). We analyzed infant and caregiver speech from separate sessions to maximize our sampled data, as caregivers generated more utterances during session 1 and infants vocalized more during session 2. Analyzing caregiver and infant utterances from separate sessions also provided caregiver and infant variables that were more independent from each other.
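The vocal maturity measure described above is a simple proportion over coded vocalizations. The sketch below shows one way to compute it, assuming each vocalization has already been hand-coded into one of the four categories; the label strings are hypothetical names chosen for this example.

```python
from collections import Counter

# The four vocalization categories named in the text (label strings are hypothetical).
CATEGORIES = ("quasi_resonant", "fully_resonant", "marginal", "canonical")
# Categories counted as consonant-vowel (CV) vocalizations in the maturity measure.
CV_CATEGORIES = ("marginal", "canonical")


def cv_proportion(labels):
    """Proportion of an infant's coded vocalizations that have CV structure."""
    counts = Counter(labels)
    unknown = set(counts) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown vocalization labels: {sorted(unknown)}")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no vocalizations to score")
    return sum(counts[c] for c in CV_CATEGORIES) / total
```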
Results
Caregiver speech: linguistic comparisons
Parents produced less contingent than non-contingent speech, with significantly fewer unique words spoken as part of contingent utterances (Figure 1) (t(29) = –7.92, p < .001, d = –1.71). A binomial test showed that a significant number of caregivers (n = 28 of 30) uttered fewer unique words contingently on infant vocalizations than non-contingently (z = –4.52, p < .001).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220810100340343-0828:S0305000919000291:S0305000919000291_fig1g.gif?pub-status=live)
Figure 1. Boxplot of the number of unique words parents uttered as part of contingent and non-contingent utterances. *** p < .001.
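The paired comparisons reported in this section follow a common pattern: a paired t-test across caregivers, an effect size, and a binomial (sign) test on how many caregivers show the effect. The sketch below illustrates that pattern with the standard library only. It is not the authors' analysis code: the effect size shown is d_z (mean difference divided by the standard deviation of the differences), which is only one of several conventions and may not be the one behind the reported d values, and the sign test here is exact rather than the z approximation reported in the text.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired t statistic, degrees of freedom, and d_z for paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least two cases")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)
    if sd == 0:
        raise ValueError("differences have zero variance")
    n = len(diffs)
    t = mean(diffs) / (sd / math.sqrt(n))
    d_z = mean(diffs) / sd  # one effect-size convention (an assumption here)
    return t, n - 1, d_z


def sign_test_p(k, n):
    """Exact two-sided binomial p-value for k of n cases under chance (p = 0.5)."""
    extreme = max(k, n - k)
    tail = sum(math.comb(n, i) for i in range(extreme, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, `sign_test_p(28, 30)` gives the exact probability of a split at least as lopsided as 28 of 30 caregivers under chance.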
To compare syntactic complexity across speech types, caregivers’ MLUw was calculated for contingent and non-contingent utterances (Figure 2). Caregivers had significantly shorter contingent utterances than non-contingent utterances (t(29) = –4.76, p < .001, d = –0.75). A binomial test showed that a significant number of caregivers (n = 24) showed this pattern (z = 3.97, p < .001). Because tests of individual subject means violate independence assumptions, we used R (R Core Team, 2018) and the R package lme4 (Bates, Maechler, Bolker, & Walker, 2015) to perform a linear mixed effects analysis of the relationship between MLUw and speech contingency. We entered contingency of speech (without an interaction term) into the model as a fixed effect, and by-subject random intercepts and by-subject random slopes for the effect of speech contingency as random effects. Visual inspection of Q–Q plots of residuals did not reveal obvious deviations from homoscedasticity. P-values for the mixed effects model were obtained by a likelihood ratio test of the full model against a null model. The model estimated an average change in MLUw of –1.07 words from non-contingent to contingent speech (p < .001).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220810100340343-0828:S0305000919000291:S0305000919000291_fig2g.gif?pub-status=live)
Figure 2. Boxplot of the mean number of words per contingent and non-contingent parent utterances. *** p < .001.
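The linear mixed effects analysis of MLUw described above can be written out as follows. This is a sketch under stated assumptions: the unit of observation (here, utterance j of caregiver i), the 0/1 coding of contingency, and a null model that omits only the fixed effect of contingency are not spelled out in the source.

```latex
% y_{ij}: length in words of observation j from caregiver i (assumed unit)
% c_{ij}: 1 if the utterance is contingent, 0 if non-contingent (assumed coding)
\begin{align}
  y_{ij} &= \beta_0 + \beta_1 c_{ij} + u_{0i} + u_{1i}\, c_{ij} + \varepsilon_{ij}, \\
  (u_{0i}, u_{1i})^{\top} &\sim \mathcal{N}(\mathbf{0}, \Sigma), \qquad
  \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^{2}), \\
  \chi^{2} &= 2\left(\ell_{\text{full}} - \ell_{\text{null}}\right)
\end{align}
```

Here \beta_1 is the fixed effect of contingency (estimated at –1.07 words in the text), u_{0i} and u_{1i} are the by-subject random intercept and slope, and the last line is the likelihood ratio statistic comparing the log-likelihoods of the full and null models.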
To further test syntactic complexity, the proportion of utterances which contained a single word was calculated for contingent and non-contingent utterances (Figure 3). A significantly higher proportion of contingent than non-contingent utterances were a single word (t(29) = 5.45, p < .001, d = 1.20). A binomial test showed that a significant number of caregivers (n = 26) showed this pattern (z = 4.47, p < .001).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220810100340343-0828:S0305000919000291:S0305000919000291_fig3g.gif?pub-status=live)
Figure 3. Boxplot of the proportion of contingent and non-contingent parent utterances that were a single word in length. *** p < .001.
All caregiver speech and language measures were significantly inter-correlated (Table 2). As expected, diversity of words was positively correlated with utterance length. Conversely, proportion of single-word utterances was negatively correlated with both utterance length and diversity of words.
Table 2. Correlations between linguistic measures across contingent and non-contingent utterances. * p < .05, ** p < .01, *** p < .001.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220810100340343-0828:S0305000919000291:S0305000919000291_tab2.gif?pub-status=live)
Caregiver speech: acoustic comparisons
Analysis of caregiver fundamental frequency (F0) during contingent and non-contingent speech revealed no significant differences between speech types. The mean F0 for contingent speech (M = 275.54 Hz, SD = 58.26) was not significantly different from that for non-contingent speech (M = 286.26 Hz, SD = 43.62) (t(16) = 2.11, p = .42). The F0 range for contingent speech (M = 192.42 Hz, SD = 61.12) was likewise not significantly different from that for non-contingent speech (M = 210.19 Hz, SD = 57.64) (t(16) = 1.74, p = .31). The mean and range for F0 for both contingent and non-contingent speech are consistent with previous descriptions of naturally produced IDS (e.g., Fernald & Simon, 1984). To assess effects of individual differences in mean pitch on our pitch analyses, we used mixed effects modeling as in the above MLUw analysis. The model estimated an average change of –12 Hz when changing from non-contingent to contingent speech (p = .14).
Infant vocalizations and caregiver speech
We analyzed the relation between parents’ lexical diversity and infant vocal maturity (Figure 4). The number of unique words in contingent parental speech predicted vocal maturity in infants (r(28) = .40, p = .02). The number of unique words in non-contingent speech did not significantly predict vocal maturity (r(28) = .04, p = .80).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220810100340343-0828:S0305000919000291:S0305000919000291_fig4g.gif?pub-status=live)
Figure 4. a. Parents who used more lexically diverse contingent speech had infants with more advanced vocalizations (r(28) = .40, p = .02). The best outcomes for infant vocal development were associated with parents producing approximately 75–150 word types. b. The correlation between lexical diversity in non-contingent speech and infant vocalization quality was not significant (r(28) = .04, p = .80).
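The correlations reported above are Pearson correlations across dyads, with degrees of freedom n – 2 (28 for 30 dyads). As a minimal sketch, and not the authors' analysis code, the coefficient and its associated t statistic can be computed as follows; the function names are chosen for this example.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between paired samples x and y."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("x and y must be paired and contain at least three cases")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a variable with zero variance has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def r_to_t(r, n):
    """Convert a correlation from n pairs to a t statistic with n - 2 degrees of freedom."""
    df = n - 2
    return r * math.sqrt(df / (1 - r * r)), df
```

In this setting, `x` would hold each caregiver's number of unique contingent (or non-contingent) words and `y` each infant's proportion of CV vocalizations.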
Discussion
We found that parents simplified the statistical and syntactic structure of their speech in response to babbling. Contingent speech contained fewer unique words and contained both shorter utterances and more single-word utterances. In combination, these characteristics of parents’ contingent speech suggest a new form of influence of infants’ prelinguistic vocalizing on the ambient linguistic environment. Infants’ immature vocalizations may create language learning opportunities by eliciting responses from parents that contain simplified, more learnable information.
Linguistic simplification of parents’ contingent speech may provide particular benefits for infant learning, because infants often babble at times of focused attention and heightened arousal (Goldstein et al., 2010b). Infants more accurately remember the features of objects to which they had babbled, as compared to objects that received similar looking and handling but no babbling (Goldstein et al., 2010a). In addition, studies of infant vocal learning have shown that infants rapidly learn patterns of parental speech when they are uttered contingently on babbling (Goldstein & Schwade, 2008). When parents change their speech statistics in response to infant vocalizations, they provide more learnable input to infants at a time that infants seem best able to learn from it.
Parents simplified their speech in two characteristic ways that could facilitate language learning. First, contingent parental speech contained fewer unique words than non-contingent speech. Providing a narrower distribution of words in contingent speech might serve as simplified input that is more tuned to infants’ developmental level. Second, contingent parental speech had a higher proportion of single-word utterances and shorter utterances than non-contingent speech.
The higher proportion of single-word utterances in contingent speech may benefit infants. Single-word utterances simplify the task of finding word boundaries, facilitating statistical learning (Lew-Williams, Pelucchi, & Saffran, 2011). Parents’ production of single words predicted children's later production of those words (Brent & Siskind, 2001). Parents also used shorter utterances in contingent than non-contingent speech. Previous studies have shown that parents may make modifications to their utterance complexity over the course of infant language development that are tied to their infants’ learning progress. Evidence from large-scale recordings of a single family suggests that caregivers’ MLUw may decrease until children begin to combine words around 16 months (Roy et al., 2009). These changes in parent MLUw may correspond to a shift in children's language comprehension. This interpretation is consistent with previous hypotheses of parents adjusting their speech to their infants’ developmental level (Snow, 1995). The fine-tuning hypothesis (Snow, 1995) suggests that adults adapt the complexity of their speech to infants and children in response to properties of their immature speech and language. Taken together, these speech simplifications indicate that infants’ immature vocalizing serves to elicit more learnable speech from their caregiver. Because parents are sensitive to the maturity of infants’ vocalizations, future research will compare parents’ responses to infants’ cries and non-cry vocalizations, and their responsiveness to precanonical and canonical vocalizations.
In contrast to these linguistic changes, the prosodic features (e.g., pitch contour) of parental speech did not differ significantly across contingent and non-contingent utterances. These findings are consistent with past work demonstrating the ubiquity of pitch changes in infant-directed speech (Fernald & Morikawa, 1993). Stability of prosodic features may highlight changes in the underlying linguistic structure of parents’ contingent utterances that may be relevant for speech and language development.
We next examined relations between parents’ speech and their infants’ vocal development. We found that the lexical diversity of contingent parental speech predicted infants’ vocal maturity. In contrast, the lexical diversity of non-contingent speech did not predict vocal development. More studies are required to ascertain causality, but there are several reasons why the structure of contingent parental speech may influence infant vocal learning. Although more lexically diverse contingent speech was associated with more advanced infant vocal productions, the amount of lexical diversity in contingent speech was generally less than was present in non-contingent speech. Recent research has established that infants seem to seek information streams characterized by intermediate levels of complexity (e.g., Kidd & Hayden, 2015). Such information streams might be optimal for learning, as they are neither overly simple nor insurmountably complex. Infant learning might thus be optimized when parents provide an intermediate level of variability in their speech. Other research has established the crucial contribution of variability to learning. For example, infants’ construction of cross-modal associations is facilitated by variability in exemplars of category membership (Vukatana et al., 2015). Infants exposed to single exemplars of a novel animal were unable to learn pairings of the novel animal with a novel animal sound. In contrast, when infants were exposed to multiple exemplars of the animal category during training, they both learned the animal–sound pairing and generalized the animal–sound pairing to new category exemplars.
In our view, intermediate levels of variability in parents’ words could direct infants’ attention to shared features of the words (e.g., mature consonant–vowel alternations) and away from features irrelevant to the learning task (e.g., pitch fluctuation). A longitudinal study of children aged one to four years found that caregivers’ production of a larger variety of words over that time period was positively related to their children's vocabulary (Huttenlocher et al., 2010). To investigate longitudinal changes in how parents respond to infants’ vocalizations, we are currently investigating parents’ speech in response to vocalizations of infants younger and older than those in the present study. Partial variation in language content over successive utterances may perceptually highlight changes in language content and make them easier to learn (Goldstein et al., 2010a). The presence of intermediate levels of variation in contingent feedback from parents should make the structure of language more salient to the learner. To test the causal influence of exemplar variability on infant vocal learning, current studies in our lab are experimentally manipulating the level of speech variability infants hear and measuring subsequent in-the-moment changes in infant vocal production. If manipulations of the distributional properties of parents’ speech lead to changes in infants’ vocalizations, the findings would indicate that parental speech variability plays a causal role in infant vocal learning. Past findings regarding intermediate complexity and infant attention come from non-social stimuli (Kidd, Piantadosi, & Aslin, 2014; Kidd & Hayden, 2015). However, if, as posited, the effect of complexity on learning is a general principle of learning, then it should also hold for other contexts. Future analyses will also consider whether CV vocalizations (compared to vowel-only vocalizations) differentially influence parents’ contingent speech, as infant vocalization type influences other forms of parental responses (Albert et al., 2017).
In addition to the structural modifications in parents’ contingent speech, the contingent timing of parents’ speech on infant vocalizations may facilitate learning for multiple reasons. First, as found in previous studies of social influences on prelinguistic vocal learning, infants learn new phonological patterns and word–object associations better when information is presented contingently on babbling (Goldstein & Schwade, 2008; Goldstein et al., 2010a). Second, infants may find contingent responses themselves rewarding, which may facilitate learning. The affiliative bonds present in parent–infant relationships may have deep connections with reward and memory formation (Depue & Morrone-Strupinsky, 2005). Reward might also play a role in combination with activation of social motivation regions of the brain during learning (Syal & Finlay, 2010), which are highly interconnected with the motor circuitry of vocalizing (Theofanopoulou, Boeckx, & Jarvis, 2017). In addition, learning itself is hypothesized to be rewarding to infants (Kidd & Hayden, 2015). Thus, social interaction with mature vocalizers, which provides information not available in the absence of social interaction, is a primary contender for facilitating learning. Third, infants have limited memory and may need to capitalize on linguistic content within short time-windows when their attention is heightened and they are expecting a response from their parents (Kareev, 1995; Goldstein & Schwade, 2010).
In our view, immature vocalizations create learning opportunities by eliciting social responses that contain simplified, learnable information. These findings have important implications for current large-scale data collection and intervention studies on language development. Sophisticated and useful data on changes in linguistic structure as a function of contingent timing can be gleaned from home recording efforts that are currently focused on turn-taking and other forms of parent–infant interaction (Romeo et al., Reference Romeo, Leonard, Robinson, West, Mackey, Rowe and Gabrieli2018). Several interventions for at-risk families currently focus on increasing the number of words parents say (e.g., Providence Talks; <http://www.providencetalks.org>) or turn-taking interactions with infants (Leffel & Suskind, Reference Leffel and Suskind2013) but have not, to the best of our knowledge, focused on the relevance of learning distributional and temporal properties of parents’ speech to infants.
Our results suggest that infants, via their immature vocalizing, play an important role in shaping their own language environment. Infants bring curiosity about their environment to the continuous series of new situations they are exposed to (Moulin-Frier, Nguyen, & Oudeyer, Reference Moulin-Frier, Nguyen and Oudeyer2014; Kidd & Hayden, Reference Kidd and Hayden2015). Accurate prediction of environmental changes, an underlying learning mechanism in computational models of vocal learning, may also support infant learning in social contexts (Moulin-Frier et al., Reference Moulin-Frier, Nguyen and Oudeyer2014). Such models choose to learn from data over which they can minimize the error of their own predictions at the highest rate. By vocalizing, infants have the opportunity to observe the effects of their vocalizations on parents. Over their first year, infants quickly come to associate their immature vocalizations with responses from their parents (Goldstein et al., Reference Goldstein, Schwade and Bornstein2009). Eliciting mature speech sounds from caregivers may become the target of infants’ curiosity and subsequent guidance of vocal development. Parents’ infant-directed behaviors are multimodal in nature. Here we report the first indication (to our knowledge) that parents’ speech is sensitive to infant vocal behavior in real time; however, future research could investigate whether parents simplify their behaviors in other modalities. For a more advanced understanding of early infant learning, future experimental, large-scale observational, and computational research should incorporate the effects infants have on the temporal and distributional properties of parents’ speech.
Author ORCIDs
Michael H. Goldstein, 0000-0001-6672-3752.
Acknowledgments
Sofia Carrillo, Shelly Zhang, Kexin Zheng, and SoYoung Kwon transcribed parent speech and coded infant vocalizations. Data collection was supported by NSF grant BCS-0844015 to MHG. We thank the families who participated in the study.