One of the central debates in the literature on language acquisition surrounds the question of how children learn the correspondences between linguistic forms – be they words or rules – and concepts. According to one perspective, children's acquisition of these form–meaning correspondences is driven by mechanisms of pattern detection and associative learning (Smith, 2003). A different perspective argues that, in addition to associative mechanisms, children rely on mechanisms for reading intentions and drawing communicative inferences (Tomasello, 2003).
Regarding the acquisition of words, this theoretical debate is unresolved, as both perspectives have received substantial supporting evidence. Studies show that young infants (Schafer & Plunkett, 1998), dogs (Kaminski, Call & Fischer, 2004) and connectionist simulations (Samuelson, 2002) can learn the correspondence between words and referents, intimating that species-wide associative mechanisms might suffice for acquiring words. Researchers have challenged this conclusion, arguing that the ‘words’ learned in the above instances have limited meanings and communicative functions (Markman & Abelev, 2004; Werker, Cohen, Lloyd, Casasola & Stager, 1998). For instance, it is unclear to what extent infants can extend novel nouns learned in extremely controlled conditions to similar objects, and whether ‘word-learning animals’ can use the ‘words’ they learn in contexts different from the one in which they learned them. These questions support the claim that an understanding of speakers' intentions is necessary for acquiring meaningful and communicatively functional words (Bloom, 2000).
Consistent with this latter position, studies reveal that young children rely on speakers' intentions when learning words, even when intentional cues contradict associations. For instance, two-year-olds associate a novel name with the object a speaker is looking at, even if: (a) they are looking at a different object (Baldwin, 1991); (b) there is another attractive object available (Moore, Angelopoulos & Bennett, 1999); or (c) there is no temporal contiguity between the naming and the appearance of the target object (Tomasello & Barton, 1994). In general, young children seem to learn the correspondence between a novel word and an object only when intentionally, as opposed to accidentally, exposed to the correspondence (Diesendruck, Markson, Akhtar & Reudor, 2004).
While the debate between the above-mentioned theoretical perspectives has been very lively with regard to the acquisition of words, very few experimental studies have systematically investigated these mechanisms with regard to the acquisition of rules. Consistent with the sufficiency of associative mechanisms, both connectionist networks (Dienes, Altmann & Gao, 1999) and human infants (Gomez & Gerken, 1999) can acquire artificial grammars based solely on the statistical regularities of a linguistic input. In fact, studies demonstrate that children more easily learn artificial languages with statistical regularities – e.g. with predictive dependencies between language components – than languages without such statistical properties (Saffran, 2002). These findings indicate that pattern detection and associative capacities might suffice for detecting formal patterns in a stream of signs.
However, as Naigles (2002) persuasively argued, the above findings are inconclusive as to how children learn to which concepts different patterns correspond. That is, they do not decisively determine whether mechanisms of pattern detection are sufficient to account for the learning of meanings of linguistic patterns. Our hypothesis is that to acquire these correspondences between rules and aspects of their meanings – such as reference – children are helped by a sensitivity to speakers' intents. As Tomasello (2003) notes, intentions arguably provide causal links between the particular linguistic forms a speaker uses and the specific events present in the communicative context. That is, the noticing of intentions prompts children to expect a relevant relationship between what a person says and what the child experiences. Thus, if an adult uses linguistic form A and subsequently presents children with event type X, and then uses linguistic form B and subsequently presents children with event type Y, children are drawn to infer that these regular correspondences between linguistic forms and event types are not simply coincidental, but rather that they are principled. Given that they are principled and not coincidental, they are worthy of learning.
To test this hypothesis, we taught two-year-olds a particular morphological distinction in Hebrew between noun and verb forms that is typically acquired between three and four years of age (Berman, 2003). We believed that the use of a rule existent in their natural language, as opposed to an artificial rule, would make the learning easier for the children and more ecologically valid. We used two different training protocols that were identical in all parameters except in how children were introduced to the verbal and visual stimuli. In an Intentional condition, the introduction was within a communicative context in which the experimenter intentionally linked the training stimuli. In a Control condition, the link between training stimuli was not accompanied by an expressed communicative intention. We predicted that children would learn the morphological rule better in the Intentional than in the Control condition.
METHOD
Participants
Thirty-two two-year-old Israeli Hebrew-speaking children (M=2 ; 10, SD=3·6 months), 15 boys and 17 girls, participated in the study. They were recruited from – and tested at – daycares after signed parental consent. Based on reports from the daycare teachers and the parents, and on observations by the experimenter (who is a certified speech and language pathologist), children with language delay, hearing impairment or attentional problems were not included in the sample.
Materials
Twenty-five triads of computer-animated objects, each consisting of a target and two test objects, were used. One of the test objects was identical in appearance to the target but performed a different action (the appearance-match), whereas the other test object was different in its appearance from the target but performed the same action (the action-match) (see Figure 1). Twenty of the sets consisted of novel objects and actions, for which toddlers would not know a name. The other five sets consisted of objects and actions presumably familiar to two-year-olds. The 8-second animations were constructed in Maya®, and were inserted in a PowerPoint® presentation, displayed on a 14-inch laptop screen.
The experimenter referred to all animations with either a noun or a verb form derived from the same Hebrew morphological patterns. The novel noun and verb forms used were Hebrew-sounding words derived from the same made-up root, and had the same patterns as the familiar nouns and verbs. In Hebrew, as in other Semitic languages, all verbs and most nouns are constructed from (usually tri-) consonantal roots, which combine with a specifiable set of patterns in the form of vocalic infixes plus syllabic prefixes and/or suffixes. Whereas the root usually carries the core semantic meaning of the word, the word pattern defines its word class and other grammatical characteristics. In the present study, nouns were formed by the sequence ma-CCe-Ca, and verbs by the sequence mit-Ca-CeC or mit-Ca-CeaC (C stands for a root consonant). Examples of familiar words used are: madrega (‘[a] step in a staircase’) vs. mitparek (‘[it] disassembles’). Examples of novel words used are: maNGeLa (noun) vs. mitNaGeL (verb), and maLGeVa (noun) vs. mitLaGeV (verb).
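The two word patterns described above can be illustrated with a short sketch. The snippet below is only an illustration: it assumes a plain Latin transliteration and three-consonant roots, and the function names `noun_form` and `verb_form` are hypothetical rather than part of the study materials.

```python
def noun_form(root):
    """Apply the noun pattern ma-CCe-Ca to a three-consonant root."""
    c1, c2, c3 = root
    return f"ma{c1}{c2}e{c3}a"


def verb_form(root, variant="CeC"):
    """Apply the verb pattern mit-Ca-CeC (or mit-Ca-CeaC) to a root."""
    if variant not in ("CeC", "CeaC"):
        raise ValueError("variant must be 'CeC' or 'CeaC'")
    c1, c2, c3 = root
    vowel = "e" if variant == "CeC" else "ea"
    return f"mit{c1}a{c2}{vowel}{c3}"


# Roots taken from the novel-word examples in the text (lower-case transliteration).
for root in ("ngl", "lgv"):
    print(noun_form(root), verb_form(root))
```

Because both forms are built from the same root, only the pattern distinguishes the noun from the verb, which is the contrast children had to learn.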
Design
The experiment consisted of three stages – Baseline, Training and Evaluation – run in two separate meetings within ten days (first meeting: Baseline; second meeting: Training and Evaluation). Children were randomly assigned to an Intentional and a Control condition, equivalent in terms of the age (M Int=2 ; 95, SD=3·8 months; M Cont=2 ; 80, SD=3·3 months), gender (χ2 (1, N=32)=0·12, p>0·1) and daycare distribution of the participants (χ2 (5, N=32)=4·04, p>0·1). For each participant, half of the sets were accompanied by a noun form, and the other half by a verb form. The presentation order of the word types followed a constraint that the same type never occurred more than twice in a row. This order was counterbalanced between subjects.
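One way to produce a presentation order that satisfies the constraint above is sketched below. This is a minimal illustration that assumes simple rejection sampling; the original randomization procedure is not described in the text, the names `max_run` and `trial_order` are hypothetical, and counterbalancing across participants is not modeled.

```python
import random


def max_run(seq):
    """Return the length of the longest run of identical adjacent items."""
    longest = current = 1
    for prev, item in zip(seq, seq[1:]):
        current = current + 1 if item == prev else 1
        longest = max(longest, current)
    return longest


def trial_order(n_trials=8, rng=None):
    """Shuffle half noun and half verb trials until no type occurs more than twice in a row."""
    rng = rng or random.Random()
    order = ["noun"] * (n_trials // 2) + ["verb"] * (n_trials // 2)
    while True:
        rng.shuffle(order)
        if max_run(order) <= 2:
            return list(order)


print(trial_order(8, random.Random(0)))
```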
Procedure
In the first meeting – the Baseline stage – a female experimenter invited children to sit by a table, about 70 cm from the computer screen, and told them she wanted to show them a game. The goal of this stage was to establish children's lack of knowledge of the distinction between the pertinent noun and verb morphological patterns. To accustom children to the procedure, the first two sets shown to children were familiar. The experimenter activated the target object, and said, ‘This [familiar word – either a noun or a verb].’ The experimenter then revealed the two test objects, called children's attention to both objects and asked them while pointing at each object, ‘Is this [familiar word] or is this [familiar word]?’ The Hebrew sentences in which the words were embedded, both when introducing the target object and questioning about the test objects in all stages, were grammatically correct and did not include any syntactic markers as to whether the words were nouns or verbs. Thus, children's decisions about the lexical class of the words had to be based exclusively on the morphological form of the words. After the two familiar sets, the experimenter showed children eight novel sets, half accompanied by novel nouns (e.g. ze mangela), and half by novel verbs (e.g. ze mitnagel) (see Figure 1). They were presented in a similar manner as the familiar sets, except that now the experimenter repeated the novel words five times when introducing the target object. Given that the goal of this stage was to ensure children did not know the rule prior to being taught by us, five children who answered correctly on more than six of the eight Baseline trials – thus manifesting some knowledge of the morphological rule – were replaced by five other children.
The second meeting started with a color game on the computer, aimed at introducing children to the computer feedback. In this game, three color patches appeared on the computer screen, and the experimenter asked children to point to the referent of a color name. When children pointed to the correct referent, the experimenter clicked on a ‘pass’ box on the computer screen, and a Smiley face appeared clapping on the screen, saying, ‘well done!’ When children failed, the experimenter clicked on a ‘fail’ box on the computer screen, and the same face appeared with a sad look, voicing a disappointed expression, and children were encouraged to guess again. There were four trials in this game, and the color terms children were asked to identify varied in their familiarity to children. For instance, in the first trials children were asked to identify highly familiar terms (e.g. ‘red’ or ‘blue’), and in the last trials they were asked to identify less familiar terms (e.g. ‘gray’ or ‘purple’). This was done so as to make sure children would give both correct and incorrect answers, and thus would be exposed to both the positive and the negative feedback. The vast majority of children indeed got both types of feedback in response to their answers, and the few who answered correctly on all four trials were nonetheless shown the negative feedback as well.
The Training stage then ensued. The goal of this stage was to teach children the morphological rule distinguishing between nouns and verbs. Children were shown twelve novel sets (four had appeared in the Baseline stage) and four familiar sets, in a fixed order: two familiar, four novel, one familiar, four novel, one familiar, four novel. Given that this was a relatively long session, in which children were exposed to twelve sets of objects they had never encountered before and therefore could not be certain about the accuracy of their responses, we also presented children with familiar sets so as to attenuate their potential frustration with the task. It was in this stage that the experimental manipulation occurred.
In the Intentional condition, the procedure had the following steps: (1) as in the Baseline stage, the target object appeared and was labeled by the experimenter with either a novel noun form or a novel verb form; (2) as in the Baseline stage, the test objects appeared right below the target object; (3) the target object was highlighted automatically, without the experimenter's intervention. The experimenter labeled the target object again, said ‘I want to show you more’, and then distinctively clicked on the mouse, which caused the correct test object to be highlighted; (4) the experimenter asked children while pointing at each test object, ‘Is this [target word] or is this [target word]?’; (5) children responded and received feedback from the computer regarding their accuracy (see Figure 2 for a display of steps 3 and 4).
The Control condition was exactly the same as the Intentional condition except for two changes in step 3. First, in the presentation of the first two familiar sets, the correct test object automatically appeared highlighted on the screen, without the experimenter's intervention, and the experimenter expressed that she did not know why the computer did that. Second, in step 3 of the subsequent novel trials, the experimenter simply labeled the target object, and then the correct test object automatically appeared highlighted on the screen, without any comments from the experimenter regarding her intentions or why the object was highlighted.
The final Evaluation stage was identical to the Baseline stage, except that all eight trials – four including noun forms and four verb forms – consisted of novel sets (four had appeared in the Baseline stage). Thus, there was no highlighting of objects, and no computer feedback in this stage. The goal of this stage was to assess children's learning of the morphological rule.
RESULTS
The dependent measure was the number of correct responses made by children. When the word had verb morphology, the correct response was to pick the action-match test object, whereas when the word had noun morphology, the correct response was to pick the appearance-match test object.
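The scoring rule just described can be written out as a small sketch. The mapping follows the text; the name `score` and the sample trials are hypothetical and are not taken from the study's data.

```python
# Correct choice for each morphological form, as stated in the text.
CORRECT_CHOICE = {"verb": "action-match", "noun": "appearance-match"}


def score(trials):
    """Count correct responses; each trial is a (word_type, chosen_object) pair."""
    return sum(1 for word_type, choice in trials if CORRECT_CHOICE[word_type] == choice)


# Made-up responses for illustration only.
example_trials = [
    ("noun", "appearance-match"),
    ("verb", "appearance-match"),
    ("verb", "action-match"),
    ("noun", "action-match"),
]
print(score(example_trials))
```

Under this rule, a child who always picks the appearance-match scores correctly on every noun trial and on no verb trial.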
A repeated-measures three-way ANOVA, with condition (2: Intentional vs. Control) as a between-subjects variable, and stage (2: Baseline vs. Evaluation) and word type (2: Noun vs. Verb) as within-subjects variables, revealed a significant effect of condition (F(1, 30)=7·01, p=0·013, η2=0·19), such that children in the Intentional condition made more correct choices (M=4·78, SD=0·73) than children in the Control condition (M=4·12, SD=0·67). The analysis also revealed a significant effect of stage (F(1, 30)=10·44, p=0·003, η2=0·26), such that children made more correct choices in the Evaluation stage (M=5·53, SD=0·96) than in the Baseline stage (M=4·13, SD=1·26). Finally, there was also a main effect of word type (F(1, 30)=38·82, p<0·0001, η2=0·56), such that children made more correct choices on Noun trials (M=3·00, SD=0·75) than on Verb trials (M=1·45, SD=0·83).
More importantly, the analysis also revealed that the above main effects were qualified by a significant condition by stage interaction (F(1, 30)=10·44, p=0·003, η2=0·26) (see Figure 3), and a significant stage by word type interaction (F(1, 30)=6·42, p=0·017, η2=0·176) (see Figure 4). The interactions between condition and word type, and condition by stage by word type, were not significant (F(1, 30) <1).
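As a reading aid, the effect sizes reported for these F tests can be recomputed from the F values and degrees of freedom. The sketch below assumes that η2 denotes partial eta squared, which the text does not state explicitly; the function name `partial_eta_squared` is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F values as reported in the text, all with (1, 30) degrees of freedom.
reported_f = {
    "condition": 7.01,
    "stage": 10.44,
    "word type": 38.82,
    "stage x word type": 6.42,
}
for effect, f_value in reported_f.items():
    print(effect, round(partial_eta_squared(f_value, 1, 30), 3))
```

Under that assumption, the recomputed values agree with the reported effect sizes after rounding.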
Follow-up t-tests examining the source of the condition by stage interaction showed, first of all, that indeed there was no significant difference between conditions in the Baseline stage (t(30)=0·01). This finding was not too surprising, but it was important in showing that children in both conditions started the training protocols on level grounds. More importantly, and consistent with our hypothesis, the follow-up tests revealed: (a) a significant difference between conditions in the Evaluation stage, in favor of the Intentional condition (t(30)=3·79, p=0·001); (b) no improvement from the Baseline to the Evaluation stage in the Control condition (paired-t(15)=0·0); and (c) a significant improvement from the Baseline to the Evaluation stage in the Intentional condition (paired-t(15)=5·28, p<0·001). In fact, in the Evaluation stage, only children in the Intentional condition selected the correct test object more often than what would be expected by chance (chance=4) (t(15)=6·79, p<0·001).
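The comparison against chance can be sketched as follows. With two test objects per trial and eight trials, guessing yields four correct responses on average. The scores below are made up purely to show the computation and are not the study's data; `one_sample_t` is a hypothetical helper implementing the standard one-sample t statistic.

```python
import math
import statistics

N_TRIALS = 8
CHANCE = N_TRIALS * 0.5  # two test objects per trial, so 4 correct by guessing


def one_sample_t(scores, mu=CHANCE):
    """One-sample t statistic comparing the mean of scores with mu."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation (n - 1 denominator)
    return (mean - mu) / (sd / math.sqrt(n))


hypothetical_scores = [5, 6, 7, 6]  # made-up values, NOT the study's data
print(round(one_sample_t(hypothetical_scores), 2))
```

The resulting statistic is evaluated against a t distribution with n − 1 degrees of freedom, which corresponds to the t(15) reported for the sixteen children in each condition.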
Additionally, examining the source of the stage by word type interaction showed that correct responses for nouns were significantly higher than correct responses for verbs in the two stages (in the Baseline stage, paired-t(31)=6·50, p<0·001; in the Evaluation stage, t(31)=3·49, p=0·001). The interaction resulted from the fact that the only significant improvement from the Baseline to the Evaluation stage was on verb trials (t(31)=3·66, p=0·001), not on noun trials. In fact, only children in the Intentional condition demonstrated this improvement (t(15)=3·73, p=0·002).
In an additional analysis, we examined children's pattern of responses in the Training stage. Notice that to respond correctly in the Training stage, children did not have to learn the morphological rule; all they had to do was learn to pick the highlighted object. And indeed, we found no significant difference between conditions in the number of children's correct responses in the Training stage (t(30)=1·27, p>0·1). Thus, even though children in both conditions were equally correct in the Training stage, only children in the Intentional condition kept that high level of performance in the Evaluation stage.
To assess whether during the Training stage children in the Control condition nonetheless learned something beyond the sheer fact that the highlighted objects were the ‘correct’ ones, we analyzed all children's selection patterns in the Baseline and Evaluation stages. In particular, we conducted an ANOVA on the frequency of selection of the appearance-match – out of eight trials – in both conditions and stages. We found that while children in both conditions had a strong tendency to select the appearance-match in the Baseline stage (M Int=5·93, SD=1·88; M Cont=6·12, SD=1·75), this tendency significantly decreased in the Evaluation stage (M Int=5·19, SD=1·47; M Cont=5·0, SD=2·0) (F(1, 30)=5·45, p=0·003, η2=0·645). Importantly, there was no effect of condition, nor an interaction between condition and stage. In other words, the selection pattern of children in the Control condition changed in exactly the same manner as that of children in the Intentional condition. As the interactions reported above reveal, the difference is that only children in the Intentional condition managed to change in accordance with the morphological form they were asked about. In particular, children in the Intentional condition learned better how to move away from selecting appearance matches when the word morphology indexed a verb.
DISCUSSION
Two-year-olds exposed to identical statistical regularities associating novel linguistic patterns with referents were capable of generalizing the patterns to novel instances only when the exposure had been in an intentional communicative context. Children did not succeed in generalizing when the correspondence between linguistic patterns and event types was not embedded in an intentional referential context. Thus, merely being exposed to the form–referent correspondences was not enough for children to learn the abstract correspondences, but being exposed to them via a communicative intent was.
The way in which ‘intentions’ was manipulated in the present study was somewhat different than the way in which it has been manipulated in previous studies. For instance, a common procedure used by Tomasello and colleagues (e.g. Carpenter, Akhtar & Tomasello, 1998) has been to mark an intentional event with an expression such as ‘there’, and an accidental event with an expression such as ‘whoops’. In such a procedure, it is clear to the child that, from the speaker's perspective, one event is relevant whereas the other can be ignored. In our procedure, the experimenter in the Intentional condition marked the event as relevant, but in the Control condition, she was agnostic about the relevance of the event in the first two familiar sets, and pronounced no opinion about it in all the novel events. Presumably, children could decide for themselves whether the highlighting was relevant or not based on the feedback they received on their selection of objects. It would be interesting to examine in future work the effect on children's rule learning of positive intentional cues (i.e. when a speaker clearly states what he or she intended), neutral/agnostic cues (i.e. when a speaker is not clear about his/her intent) and negative cues (i.e. when a speaker clearly states what he or she did not intend). Such a study would complement the work on word learning using similar kinds of manipulations (e.g. Baldwin, 1991; Werker et al., 1998).
Importantly, the results of the Training stage indicate that the difference between conditions was not due to differential attention to the linguistic forms or their potential referents, nor to children's beliefs that the computer's actions in the Control condition were irrelevant. After all, not only did children in the two conditions perform equally well in the Training stage, they changed their selection patterns from the Baseline to the Evaluation stage – from appearance-based to also action-based – in a similar way. In other words, despite the experimenter's comments regarding the initial familiar sets, and her lack of comments regarding the novel sets, in the Training stage, children in the Control condition chose the highlighted object just as often as did the children in the Intentional condition, and detected differences in object types. Thus, it seems that the difference between conditions lies in what children extracted from their noticing of the linguistic forms and the visual stimuli.
Before we discuss how intentions might have produced such a difference, it is interesting to note how exactly children's pattern of responses changed from the Baseline to the Evaluation stage. In the Baseline stage – a stage where children still had no systematic understanding of the morphological distinction assessed here – children's selection pattern was consistent with a ‘noun bias’. Namely, they treated novel words as nouns, extending them based on appearance rather than action similarity. This bias has been reported in children who are native speakers of a number of languages, including Hebrew (e.g. Bornstein et al., 2004), and has been the subject of a lively theoretical debate regarding its source (Gentner, 1982; cf. Tardif, Shatz & Naigles, 1997). A major issue within this debate has to do with how easy it is for children to learn nouns versus verbs. We found that only verbs showed significant improvement in learning from the Baseline to the Evaluation stage. That is, even though children manifested a preference for treating novel words as nouns, under conditions in which nouns and verbs had identical morphological transparency, syntactic frames, salience and frequency, children exposed to an intentional communicative context managed to overcome this noun bias and learn verbs. This finding illustrates the potential dissociation between frequency of a form and ease of its acquisition (see Tardif et al., 1997, for arguments about the importance of pragmatics for the learning of verbs).
What exactly did intentions do to children's learning process? A plausible answer is that intentions might supply children with an explanation for the apparent correspondence between the specific linguistic form used by a speaker in a given communicative context and the form's visual referent (Tomasello, 2003). In other words, the experimenter's announcement of a communicative intent might have alerted children that they should pay attention to both the words and the objects. Following this reasoning, intentions may have led children to focus attention not so much on the separate visual and verbal elements present in the context, but on the actual correspondences between them.
Notice that this account does not preempt the importance of pattern detection capacities. Evidently, in order to learn that certain forms corresponded to verbs – and thus actions – while others to nouns – and thus objects – children had to detect certain regularities in the visual and linguistic input. Moreover, this account does not argue that pattern detection and sensitivity to intentions suffice for the learning of rules. In fact, they probably do not suffice even for the learning of words. For instance, in Moore et al.'s (1999) study, two-year-olds learned that a novel word uttered by a speaker likely referred to the boring object the speaker was looking at rather than to the attractive object next to it. That is, sensitivity to a cue to a speaker's intention helped children determine the referent of the word. However, it did not establish the meaning of the word – i.e. whether it was the proper name of the object, its category name, its color, etc. Analogously, in the present study, sensitivity to the experimenter's intention raised in children the expectation that there is some non-arbitrary important reason for the experimenter to utter a specific linguistic form and then highlight a specific object. The parsing of the linguistic form, the conceptual classification of the object and the noticing of an abstract similarity between objects are just some of the other capacities required for success in the task. Intention served as the catalyst for the deployment of these capacities.
The conclusion that an understanding of intentions helps children make sense of their communicative experiences is further consistent with its early-emerging and extensive centrality in children's cognition. Infants interpret behavior in terms of intentions or goals (Woodward, 1998), and young children seem to greatly extend this interpretive bias, employing it when reasoning about the origins of natural objects (Kelemen, 2004), categorizing artifacts (Diesendruck, Markson & Bloom, 2003) and analyzing inanimate objects acting contingently (Luo & Baillargeon, 2005). The main contribution of the present study is the finding that children efficiently recruit their understanding of intentions for inferring linguistic rules.