Introduction
It has long been observed that the characteristic style of caregiver speech to infants known as infant-directed speech (IDS) changes as the infant gets older; it has been suggested in particular that these changes correlate with the development of the infant (Kitamura & Burnham, 2003; Niwano & Sugai, 2002; Paavola, Kunnari, Moilanen, & Lehtihalmes, 2005; Snow, 1972; Stern, Spieker, Barnett, & MacKain, 1983; but see also Newport, Gleitman, & Gleitman, 1977). Further, caregivers’ IDS changes with their communicative intent as well. Kitamura and Burnham tracked changes in F0 and pitch range in IDS as a function of the age of the infant, and correlated these modifications with mothers’ intent, claiming that “mothers differentially adjust mean F0 and pitch range to express various nuances of communicative intent … in response to outward signs of development in the infant” (2003, p. 102). Thus it seems that adults modify their IDS based on their understanding of the developmental stage of the infant (see Albert, Schwade, & Goldstein, 2017, for a direct test of this claim) and changes in their own communicative intent.
It is one thing for adults to make these adjustments, and quite another for infants to make use of them. A wide range of studies have demonstrated diverse connections between IDS and infants’ linguistic abilities: between IDS prosody and vowel discrimination (Trainor & Desjardins, 2002) and word-stream segmentation (Floccia et al., 2016; Thiessen, Hill, & Saffran, 2005); and between features of IDS and lexical comprehension (Gogate, Walker-Andrews, & Bahrick, 2001), comprehension and production (Goldstein & Schwade, 2008; Hartman, Bernstein Ratner, & Newman, 2017), and, particularly pertinent, word learning in older infants (Graf Estes & Hurley, 2013; Ma, Golinkoff, Houston, & Hirsh-Pasek, 2011).
Results from neuroimaging studies provide evidence for the recognition of communicative intent at a young age. Both IDS and infant-directed gesture enhance activation in areas of the brain associated with communicative stimuli in 6-month-old infants (Lloyd-Fox, Széplaki-Köllőd, Yin, & Csibra, 2015), in fact, in just those areas activated in adults being addressed in a communicative interaction (Kampe, Frith, & Frith, 2003). Woodward and Hoyne (1999) suggest that 13-month-old infants’ perception of an adult's intent-to-label supports their acceptance of labels as referential. The recognition of speaker intent also plays a crucial role in word learning, particularly for older infants (Bloom, 1997; Csibra, 2010; for a review, see Parish-Morris, Hennon, Hirsh-Pasek, Golinkoff, & Tager-Flusberg, 2007).
Thus, caregivers adjust their style of communicating with their infants to the perceived needs and abilities of those infants, whose language learning can be enhanced by those adjustments. Infants, on their part, appear to apprehend their caregivers’ various communicative intentions, and, particularly relevant to this work, make use of adults’ intention to refer in word learning.
Our aim in this work is to investigate whether 14-month-old infants can recognize and exploit task-appropriate intentions of an adult's IDS in associating minimal pair nonsense words and objects. A robust finding in the word–object association literature is that 14-month-old infants, despite their ability to discriminate minimal pairs like bin and din, cannot associate those nonsense words with novel objects in a habituation/dishabituation paradigm (Pater, Stager, & Werker, 2004; Stager & Werker, 1997). Follow-up studies to this work have identified conditions under which infants can be successful: when the words are familiar (Fennell & Werker, 2003); the objects are familiar (Fennell & Werker, 2004); the words are embedded in a sentential context (Fennell & Waxman, 2010); the experiment includes an initial orientation phase establishing the referential nature of the task (Fennell & Waxman, 2010); and the infant hears the words from a (live) experimenter (Fais et al., 2012).
In order to examine the effects of the use of contextualized, task-appropriate stimuli on 14-month-old infants’ ability to succeed in a minimal pair nonsense word–object association task, we tested two different kinds of stimuli, those that were task-appropriate, and those that were task-inappropriate.
Infants at 24 months of age are past their ‘vocabulary spurt’ (Goldfield & Reznick, 1990), even for ‘late spurters’ (Mervis & Bertrand, 1995), and thus are likely to be demonstrating word knowledge. We reasoned that a mother interacting with an infant at this age, in a context involving objects labeled with words new to the infant (i.e., bin and din), would intend to refer to the objects, and adjust her speech to make word learning easier for the infant (for a review of work supporting this suggestion, see Golinkoff, Can, Soderstrom, & Hirsh-Pasek, 2015, and work cited above). That is, her IDS would be appropriate for a word–object association task. On the other hand, at 5 months of age, infants tend not to be engaging even in recognizable babbling (de Boysson-Bardies, Hallé, Sagart, & Durand, 1989). Thus, a mother engaged in the same context (interacting with objects labeled bin and din) with her much younger infant, should not have the intention to teach these nonsense words, and thus should not modify her speech to this younger infant in the same way as she modifies her speech to her older infant. In particular, the characteristics of her IDS may be appropriate to expressions of affect but, crucially, not to word learning (Kitamura & Burnham, 2003).
Infants can make use of particular characteristics of IDS, including intent, to enhance language acquisition (e.g., Floccia et al., 2016; Parish-Morris et al., 2007; Thiessen et al., 2005; Trainor & Desjardins, 2002). We conjectured that stimuli taken from a potential word learning interaction between a mother and her 24-month-old infant would support success in a word–object association task, while the same words from a mother's interaction with her 5-month-old infant would not lead to success. To put this line of reasoning to a rigorous test, we used only single words bin and din excised from contextualized recordings, and tested 14-month-olds in the habituation/dishabituation Switch task (Stager & Werker, 1997).
Methods
Stimuli
We recruited a mother from our regular participants, who had sons aged 24 mon, 25 d and 5 mon, 17 d, to record stimuli at our center. She sat in a comfortable chair or on the floor and interacted with one infant at a time, using the same toys in each interaction. Three of the toys that were not immediately nameable with a single word (i.e., could not easily be labeled ‘doll’, ‘ball’, ‘truck’, etc.) were affixed with labels that said bin, din, and neem. The mother was instructed to behave with her infant and the toys as she would naturally, and was told that the purpose of the recording was to “investigate how mothers interact with their infants”. She was asked to use the words bin, din, and neem when interacting with those particular toys in order to establish a baseline for comparison to other mothers. Nothing further was said to restrict or influence the mother's speech in any way, and the instructions she received were identical for the recording with each infant. We also recorded the mother naming and describing the objects to the adult experimenter in order to get a sample of her adult-directed speech (ADS). After the recording, we debriefed the mother as to the actual purpose of the study, and she gave her permission for words from the recordings to be used as stimuli in the study.
The mother's speech was recorded directly onto a G4 Macintosh using the software program SoundEdit. All of the mother's productions of bin, din, and neem were excised from the recordings using Praat (Boersma & Weenink, 2005), and the six tokens of each word used in the present study were chosen from these for maximal clarity, and, as far as possible, to match utterance position (i.e., initial, middle, final, whether in declaratives or questions, or spoken in isolation). In addition, tokens were selected such that the overall intensities of the series of each type of stimulus were comparable. For ease of reference, tokens spoken by the mother during her interaction with the older infant will be referred to as ‘24-mon tokens’; those spoken to the younger infant will be referred to as ‘5-mon tokens’, noting that, of course, these are tokens spoken to the infant of that age, not by the infant.
Fixed-time habituation trials of 15 s for both 24-mon stimuli and 5-mon stimuli were constructed by repeating each of the six chosen tokens once (so that each occurred twice), in a semi-random order, and then replacing one of the tokens with a relatively ‘flat’ ADS token, for a total of 12 tokens. An ADS token was included in each of the 24-mon and 5-mon habituation stimulus sets in order to ‘prepare’ the infant for the test stimuli, which consisted entirely of ADS tokens. Test stimuli consisted of each of the four remaining ADS tokens, repeated twice, except for the token with the least flat prosodic contour, which was repeated only once. This resulted in a series of 11 ADS tokens presented in a semi-randomized order. Using ADS tokens allowed us to keep the test stimuli identical across both study conditions. The average duration, pitch, pitch range, and intensity of each of the habituation and test stimuli are shown in Table 1. Unlike in other studies of this kind, we did not match these properties across stimuli; the variations in duration, pitch, and intensity for the stimuli are typical of the natural speech for this mother and, as such, play a role in conveying her communicative intent in each condition.
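The token counts in the stimulus series described above can be checked with a minimal sketch. The token names, the use of a plain seeded shuffle in place of the unspecified ‘semi-random’ ordering, and the randomly chosen slot for the ADS token are all assumptions made for illustration; they are not details of the original stimulus construction.

```python
import random


def build_habituation_series(ids_tokens, ads_token, seed=0):
    """Each IDS token occurs twice; one slot is then replaced by an ADS token."""
    rng = random.Random(seed)
    series = list(ids_tokens) * 2      # six tokens, each repeated once -> 12 slots
    rng.shuffle(series)                # assumption: plain shuffle for "semi-random"
    series[rng.randrange(len(series))] = ads_token  # assumption: random slot
    return series


def build_test_series(ads_tokens, least_flat_token, seed=0):
    """Each ADS token occurs three times, except the least flat one (twice)."""
    rng = random.Random(seed)
    series = []
    for token in ads_tokens:
        copies = 2 if token == least_flat_token else 3
        series.extend([token] * copies)
    rng.shuffle(series)                # assumption: plain shuffle for "semi-random"
    return series


# Hypothetical token labels, used only to count the slots.
ids_tokens = [f"ids_{i}" for i in range(1, 7)]
ads_tokens = [f"ads_{i}" for i in range(1, 5)]

habituation = build_habituation_series(ids_tokens, "ads_flat")
test = build_test_series(ads_tokens, least_flat_token="ads_4")
print(len(habituation), len(test))  # prints: 12 11
```

Under this reading, ‘repeated once’ yields two occurrences per token and ‘repeated twice’ yields three, which reproduces the stated totals of 12 habituation tokens and 11 test tokens.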
Pre- and post-test stimuli were constructed in a way similar to that for the habituation stimuli, using 24-mon and 5-mon tokens of the word neem.
The visual stimulus for the pre- and post-trials consisted of a colorful, moving waterwheel, videotaped against a black background; the stimuli for the habituation and test trials were two brightly colored nonsense objects, photographed against a black background and animated to move slowly back and forth during the trials.
Participants
Thirty-two infants, 16 males, around 14 months of age (average age 437 days; range 413–457) were recruited via calls to participants in an infant database. An additional 57 babies participated, but were not included in the study for the following reasons: 10 looked in the mirror behind the parent during test trials; 10 did not habituate within the criterial number of trials; 23 were too fussy to complete the study; 8 looked at one or more of the test trials for less than 1 s; 4 were off-camera at test; and 2 could not be included because of experimental error. All infants were hearing at least 80% English in the home, and were full-term, healthy infants. Caregivers gave consent for their infants to participate, and the protocol for the experiment was approved by the university Behavioral Research Ethics Board.
Procedure
Infants sat on their caregiver's lap in a darkened and sound-attenuated room, in front of a 27-inch television monitor framed by a dark curtain that also concealed the speakers that delivered the auditory stimuli. The experiment was controlled using Habit (Cohen, Atkinson, & Chaput, 2000), and began with a 15 s pre-test during which the infant heard a recording of the word neem and saw the moving waterwheel. During the pre-test trial infants could become accustomed to the room and the sounds, and begin to direct their attention to the screen.
The habituation phase of the experiment followed the pre-test. A bright, animated attention-getter was projected on the monitor in silence and, once infants had looked toward that visual stimulus, one of two nonsense objects appeared, moving back and forth while the 15 s habituation stimulus for either bin or din was played. The changes in direction of the moving visual stimuli were asynchronous with the start of each auditory stimulus. During the habituation phase, infants were presented with a semi-randomized series of consistent object–word pairings: bin paired with either the rounded object or the object resembling a molecule, and din paired with the other object, for a maximum of 24 trials. In one habituation condition, infants heard the 24-mon stimuli; in the second, infants heard the 5-mon stimuli. A digital video camera, placed below the television monitor and obscured by the curtain except for the lens, recorded the baby's face. This allowed the experimenter, who was blind to the nature of the trials presented, to record infant looking time to the monitor from an external observation area by pressing a key on the computer running the Habit software. Reaching a criterial decrease in looking time (predefined as a block of four habituation trials in which the infant's looking time was 65% or less of the longest looking time registered in any previous four-trial window) ended the habituation phase. At the end of the habituation phase, Habit delivered the test stimuli.
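One way to read the habituation criterion is sketched below. The text does not say whether the four-trial windows overlap or exactly which earlier windows serve as the reference, so this sketch assumes a sliding window of four consecutive trials, compared against the largest summed looking time of any window that ended earlier. The function name and the example looking times are hypothetical, and this is not the Habit software's implementation.

```python
def trials_to_habituation(looking_times, window=4, criterion=0.65, max_trials=24):
    """Return the trial number at which the criterion is met, or None.

    looking_times: per-trial looking times in seconds, in presentation order.
    Assumption: sliding windows of `window` consecutive trials; the current
    window is compared with the longest earlier window.
    """
    times = list(looking_times)[:max_trials]
    longest_so_far = None
    for end in range(window, len(times) + 1):
        current = sum(times[end - window:end])
        if longest_so_far is not None and current <= criterion * longest_so_far:
            return end  # habituation phase ends after this trial
        if longest_so_far is None or current > longest_so_far:
            longest_so_far = current
    return None  # infant did not habituate within max_trials


# Hypothetical looking times (s) for one infant.
example = [15, 14, 13, 12, 6, 5, 5, 4]
print(trials_to_habituation(example))
```

An infant whose looking time never drops to the criterion within 24 trials returns `None`, corresponding to the exclusion of infants who did not habituate.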
Infants in both conditions were presented with test trials consisting of ADS tokens of bin and din paired with the two nonsense objects. The test phase consisted of two blocks of two test trials each. In each block, one test trial repeated one of the stimuli pairs just as in the habituation phase (the Same test trial); the second test trial presented a mis-pairing of word and object (the Switch trial). Switch trials consisted either of the same word as the word in the Same trial paired with the other object, or the same object as the one in the Same trial paired with the other word. Thus, in the Switch trial, although the infant was seeing a familiar visual stimulus and hearing a familiar word, these two stimuli were presented in a novel pairing. Whether the word or object was switched, the order of the Same and Switch trials, and the pairings of word and object were counterbalanced across the sex of the participants. One possible configuration of pairings is illustrated in Figure 1.
We had no basis for predicting how infants would respond to the marked difference between the ADS test tokens and the IDS habituation tokens. Including two blocks of test trials allowed us to investigate whether infants would: readily adapt to the nature of the stimuli (i.e., recognize the test trial tokens to be different-sounding tokens of the same words they had just been hearing) and show evidence of word association in both blocks of test trials; readily adapt but perform differentially in only the first block; take longer to adapt and only perform differentially in the second block; or not look differentially in either block.
After the test trials, during the post-test phase, participants were presented with the same waterwheel and neem stimulus as in the pre-test.
The recordings of the infants were digitized at approximately 30 fps, and infant looking time to and away from the visual stimuli was coded offline, frame-by-frame, by a coder who was blind to the status of the test trials. The total time that infants were considered to be looking toward the stimulus was recorded as the looking time for a given trial. The data from infants who did not habituate within 24 trials were not included (Werker, Cohen, Lloyd, Stager, & Casasola, 1998).
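The conversion from frame-by-frame codes to a per-trial looking time can be sketched as follows. The boolean coding scheme, the function name, and the use of exactly 30 fps (the recordings were digitized at approximately that rate) are assumptions for illustration only.

```python
def looking_time_seconds(frame_codes, fps=30.0):
    """Total looking time for one trial.

    frame_codes: one boolean per video frame, True when the coder judged the
    infant to be looking toward the stimulus (hypothetical coding scheme).
    """
    looking_frames = sum(1 for is_looking in frame_codes if is_looking)
    return looking_frames / fps


# Hypothetical trial: 45 frames looking, 15 frames looking away.
trial = [True] * 45 + [False] * 15
print(looking_time_seconds(trial))  # prints: 1.5
```

Summing frames rather than timing a single continuous look matches the description of looking time as the total time the infant was judged to be looking toward the stimulus.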
Results
Greater looking time to Switch trials than to Same trials is considered an indication that infants recognized the mis-pairing of the words and objects they experienced in the habituation phase, and thus as evidence that they succeeded in associating the words and objects. Sex is routinely included as a factor in the analysis for work done in our center. In addition, its inclusion is indicated by the sex differences found in the linguistic abilities of young infants (e.g., Bauer, Goldfield, & Reznick, 2002; Huttenlocher, Haight, Bryk, Seltzer, & Lyons, 1991; Paavola et al., 2005; Woodward, Markman, & Fitzsimmons, 1994), most pertinently, in this particular task and exact procedure (Werker et al., 1998). Infants’ performance in the test phase was assessed using a mixed 2 (test block: block 1 vs. block 2) × 2 (test trial type: same vs. switch) × 2 (sex: female vs. male) × 2 (stimuli: 5-month vs. 24-month stimuli) analysis of variance (ANOVA). There were no main effects. There were five interactions: between test block and stimuli (F(1,28) = 5.878, p = .022, ηp² = .173); between trial type and sex (F(1,28) = 4.831, p = .036, ηp² = .147); among test block, sex, and stimuli (F(1,28) = 14.438, p = .001, ηp² = .340); among test block, trial type, and sex (F(1,28) = 8.095, p = .008, ηp² = .224); and among test block, trial type, sex, and stimuli (F(1,28) = 4.882, p = .035, ηp² = .148).
The four-way interaction among test block, trial type, sex, and stimuli was further explored in two ANOVAs conducted on the data split by stimuli type. A mixed, 2 (test block: block 1 vs. block 2) × 2 (test trial type: same vs. switch) × 2 (sex: female vs. male) ANOVA for each stimuli type revealed no main effects or interactions for the 5-mon stimuli. For the 24-mon stimuli, there was a main effect of test block (F(1,14) = 5.675, p = .032, ηp² = .288), and three interactions: between test block and sex (F(1,14) = 14.319, p = .002, ηp² = .506); between trial type and sex (F(1,14) = 4.782, p = .046, ηp² = .255); and among test block, trial type, and sex (F(1,14) = 12.703, p = .003, ηp² = .476).
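For readers who wish to check the reported effect sizes, partial eta squared can be recovered from an F statistic and its degrees of freedom using the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below applies it to F values reported above; the function name is hypothetical, and small discrepancies are expected because the F values are themselves rounded.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from an F ratio and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F(1,28) values and the reported effect sizes from the four-way ANOVA.
reported = [
    (14.438, 0.340),  # test block x sex x stimuli
    (8.095, 0.224),   # test block x trial type x sex
    (4.882, 0.148),   # test block x trial type x sex x stimuli
]
for f_value, eta in reported:
    computed = partial_eta_squared(f_value, df_effect=1, df_error=28)
    print(f"F = {f_value}: computed {computed:.3f}, reported {eta:.3f}")
```

The same function applies to the follow-up analyses by substituting the appropriate error degrees of freedom.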
A follow-up analysis of the three-way interaction, split by block, revealed, in block 1, a marginal main effect of trial type for the infants hearing the 24-mon stimuli (F(1,14) = 4.542, p = .051, ηp² = .245), as well as an interaction between trial type and sex (F(1,14) = 16.056, p = .001, ηp² = .534). There were no significant effects in block 2 for the same group of infants.
A follow-up analysis of the two-way interaction in the first block of test trials for infants hearing the 24-mon stimuli split by sex revealed that females looked significantly longer at the Switch trials than Same trials (F(1,7) = 14.225, p = .007, ηp² = .670; M switch = 11.12 s, SD switch = 2.98, M same = 6.95 s, SD same = 2.31). There was no main effect for male participants (F(1,7) = 2.604, p = .151, ηp² = .271; M switch = 5.44 s, SD switch = 1.78, M same = 6.71 s, SD same = 2.19).
Female infants looked significantly longer to the Switch trial than to the Same trial, in the first block of test trials, when hearing the 24-mon stimuli, but not when hearing the 5-mon stimuli. This pattern of looking behavior, showing a significant difference in only the first block of two blocks of test trials, is consistent with previous studies focusing on word–object association (Yoshida, Fennell, Swingley, & Werker, 2009; Zamuner, Fais, & Werker, 2014). This pattern of results may indicate, in this case, that female infants, at least, readily recognized the test tokens in the first test block as ADS versions of the IDS tokens heard during habituation, but that this recognition was not robust enough to support continued word–object association across the Switched test trial encountered in the first test block and into the second block of test trials. Figure 2 shows the mean looking times for Same and Switch test trials, for males and females hearing 24-mon stimuli and hearing 5-mon stimuli, in block 1.
Discussion
Female infants who heard stimuli consisting of recordings of bin and din spoken to a 24-month-old infant in a potential word-learning context showed evidence of associating the minimal pair words bin and din to novel objects in the habituation/dishabituation Switch task. On the other hand, neither female nor male infants hearing the words spoken to a 5-month-old infant showed this ability. It seems that, for female infants, there was an informative difference between tokens spoken to a 24-month-old infant and those spoken to a 5-month-old infant such that the former supported minimal pair word–object association while the latter did not. We have suggested above that the tokens spoken to the 24-month-old infant were likely to have been appropriate to the task of word learning, while those spoken to the 5-month-old were not. This result provides the first indication that such task-appropriate stimuli might make a difference in infants’ ability to succeed in a word learning task.
Sex differences
The effects of task-appropriate stimuli on word learning at this age are shown only for female infants in this study. This raises the question of whether there are sex-correlated differences among the study participants that might shed some light on this differential effect. Analyses showed no statistical differences in age, percentage of exposure to English, vocabulary comprehension as measured by the MacArthur-Bates Communicative Development Inventories (Fenson et al., 2007), habituation time, or number of habituation trials across the sex of the infants. Thus, none of these factors are helpful in understanding the difference between female and male performance in this study. A number of previous studies have also shown female success and male failure for rapid word learning (e.g., Werker et al., 1998; Woodward et al., 1994), and we note that our results are consistent with these previous studies.
Though it is beyond the scope of this work to investigate the foundations for the sex difference we found, it does raise fascinating questions for further study. For example, a more finely tuned measure of attention or an analysis of gaze behavior might uncover sex differences in learning during habituation not revealed by time alone. On the other hand, it might be the case that maternal interaction history or other social factors could be at play. Or it may be that male infants can be affected by the task-appropriateness of the stimuli, but this effect would become apparent at a later developmental stage. In the absence of answers to these interesting questions, we restrict our claim concerning the importance of task appropriateness in word learning to its effects on female infants at this age.
Nature of the stimuli
No clear patterns of difference were discernible in the properties of each word across the two habituation conditions. Further investigation of this issue using a much larger dataset would be required in order to investigate the specific acoustic correlates of intention.
The role of the intent to refer
We know that adults change the nature of the speech they address to infants over time, conforming to their perceptions of the developmental stages of the infants, and that infants are capable of using features of IDS, particularly communicative intent, in language acquisition, specifically in word learning (Woodward & Hoyne, 1999). We propose that infants’ ability to associate minimal pair novel words with nonsense objects in this study was supported by the communicative intent of the stimuli recorded from a mother interacting with her 24-month-old infant, a context in which the mother might certainly make adjustments to her IDS with the intent to teach the infant a new word. The infants’ perception of this intent enabled 14-month-old female infants to link minimal pair nonsense words and objects in a task at which they fail given non-contextualized stimuli in an otherwise identical paradigm. On the other hand, the adjustments made by the mother to her IDS in the context of interacting with her 5-month-old infant carried no such word teaching-oriented intention, and thus the 14-month-old infants hearing the stimuli derived from this interaction failed at the task, just as they did with non-contextualized stimuli (Stager & Werker, 1997).
At least a part of the appropriateness of the 24-mon stimuli to a word-learning context is likely rooted in the intent of the mother to use the nonsense words to refer to the novel objects. Our findings, then, are consistent with studies indicating that 14-month-old infants can succeed in this task, even with non-contextualized stimuli, when they understand that the task crucially involves the referential relationship between the word and object, for example, when the stimuli tokens are embedded in typical phrases used to refer to objects, or when a training phase is included in which infants are shown familiar objects and hear their labels (Fennell & Waxman, 2010). Fennell and Waxman claimed that the referential ‘mindset’ supported 14-month-old infants’ success in the task. How this ‘mindset’ might differ for female and male infants is another unanswered question whose exploration could yield new insights into the interplay of factors that contribute to the early acquisition of vocabulary.
Infant word learning takes place in contexts that often include adult fluent language users who interact with infants in word labeling and learning situations. Those adults may adjust their utterances to be appropriate to the developmental level of the infant, and to the purpose of providing referential information to the infant. Infants, for their part, are prepared and able to perceive this intention. The results of our study suggest that young female infants, at least, are able to use word learning-appropriate input to boost their word–object association capabilities, and thus to build their early vocabularies. Along with other important auditory features of IDS, the task-appropriateness of adult input may make a foundational contribution to early infant vocabulary building.
Acknowledgements
Enormous thanks are due to Padmapriya Kandhadai for her help with the statistical analyses and to both Priya and Irene de la Cruz Pavía for their helpful and insightful comments about the ideas discussed in this work. Thanks as well to Kimberley Chan (now Ung) for her help with conducting the study. This work was carried out with generous intellectual support from Janet F. Werker, and partial financial support from a grant to Janet F. Werker from the Social Sciences and Humanities Research Council of Canada (410-2004-744) and financial support to Laurel Fais from the Social Sciences and Humanities Research Council of Canada (410-2007-1003).