INTRODUCTION
The multimodal nature (linguistic and gestural) of oral communication is gaining increasing recognition today in several fields: linguistics and gesture studies (Kendon, 2004), psychology (McNeill, 2000, 2005), and cognition and computer science (Sales Dias, Gibet, Wanderley & Bastos, 2009). Gestural aspects of human communication also have an impact on the study of language acquisition. Use of gesture is not restricted to children whose communication is heavily reliant upon non-verbal means in the early stages of language development. Not only does the gestural mode (hand and head gestures, facial expressions, posture changes) not disappear at the end of the so-called ‘pre-linguistic’ period, but it constitutes an indispensable basis for later linguistic communication. It evolves in step with linguistic and cognitive acquisition (Capirci, Caselli & De Angelis, 2010; Capirci & Volterra, 2008). Here we examine the role of gesture in the later stages of language development, in the context of a narrative retelling task. Specifically, we examine the effect of age and language on children's speech and gesture production during this task.
Multimodal development of narratives is an area open to examination. There are tentative models of gesture and speech production in adults such as the Growthpoint Theory (McNeill, 2005), the Interface Hypothesis (Kita & Özyürek, 2003), or the Sketch model (de Ruiter, 2000). However, these models focus on the representational aspects of language, as shown for example by Özyürek, Kita, Allen, Brown, Furman, and Ishizuka (2008), and do not consider the role of gesture in discourse and pragmatic dimensions. Our study tackles the multimodal narrative abilities of children in three languages never compared before. We bring forward elements that argue for a tentative model of multimodal narrative development including discourse and pragmatic dimensions.
The role of gesture in the early development of multimodal communication
Gesture plays a key role in early language acquisition. Before saying their first word, infants use a repertoire of gestural signals that have communicative functions: they express their emotions through different facial expressions, designate objects with glances and gestures, know how to wave in greeting, negate with the head, etc. (Bates, Benigni, Bretherton, Camaioni & Volterra, 1979; Bates, Camaioni & Volterra, 1975; Blake, 2000; Guidetti, 2002, 2005; Volterra & Erting, 1990). The appearance of the pointing gesture constitutes an important milestone in acquisition as it marks the child's entry into referential communication in the first year (Butterworth, 2003; Camaioni, 1993; Pizzuto & Capobianco, 2005; Tomasello, Carpenter & Liszkowski, 2007). A relationship has been established between pointing gesture production at a particular age and later lexical acquisition (Carpenter, Nagell & Tomasello, 1998).
During the second year, new gestures appear, including gestures performed by an empty hand, which are endowed with representational (gestures that shape objects and characters) and pragmatic properties (gestures that mean ‘open’, ‘give’). These gestures are considered to be similar to first words (Caselli, 1990; Iverson, Capirci & Caselli, 1994). Furthermore, children start to combine referential (i.e. representational and pointing) gestures and words. Children use their capacity to combine two signals in bimodal messages (gesture + word) before being able to combine two words in a single message (Capirci, Iverson, Pizzuto & Volterra, 1996; Iverson & Goldin-Meadow, 2005; Özçaliskan & Goldin-Meadow, 2005, 2009; Volterra, Caselli, Capirci & Pizzuto, 2004). Whether used alone or in combination with words, early gestures allow children to express more fully the meaning they want to convey.
Gesture and language in the later development of multimodal communication
As they get older, children are faced with more complex language tasks. Narrative is an important example of such a task. It requires linguistic as well as social and cognitive abilities (Berman, 2004; Hickmann, 2003). First, the narrative presents a more constrained form than a single utterance, and the daily use of language to narrate events relies on the ability to understand and generate linguistic information organized at this level, such as in expository discourse (verbal explanations and reasoning). Second, the narrative displays specific properties of coherence and cohesion (Halliday & Hasan, 1976) that have no equivalent in the course of dialogue, which is constructed out of the sequencing of short speech turns. Third, the action of storytelling requires cognitive abilities such as expressing absent referents, contextualizing linguistic information, and cognitive decentration to read the interlocutor's or the reader's mind (Hickmann, 2003; Tolchinsky, 2004). Although narrative presents children with unique challenges, the role gesture plays in narrative production is not well established. Thus, our understanding of the gestures that children use in narrative production is limited, and even less is known about the factors that influence the gestures that accompany narratives.
The picture we get from the existing studies is that of a series of jointly related changes in gesture use and linguistic abilities. It is known that, from the third year onwards, the gestural repertoire is reorganized and new types of co-speech gestures appear in later stages of language development (Colletta, 2004; McNeill, 1992), such as beats (e.g. rhythmic gestures of the hand or the head that accompany certain syllables or words); metaphoric gestures that express abstract concepts (e.g. pointing behind to express the past or pointing in front to express the future, two hands forming a round shape to express the idea of completeness, separating the frontal space in two parts to express opposition); and gestures of discourse cohesion (e.g. gestures that accompany connectives, abstract pointing to specific and empty spots in the frontal space of the speaker to refer to the objects and characters he is talking about). [Footnote 1] As regards the gesture–speech relation, a study by Alibali, Evans, Hostetter, Ryan & Mainela-Arnold (2009) on co-speech representational gestures found children to be less redundant than adults when gesturing during a narrative task.
Second, the language task in which children are involved is a key predictor of the types of gesture children produce. For example, Reig Alamillo, Colletta & Guidetti (2013) compared oral narration vs. oral explanation and found that pragmatic gestures and subordinate markers were more frequent in explanations than in narratives, whereas cohesion markers were more often used in narratives. Moreover, gestures of the abstract are observable from the age of six years onwards in the child who formulates explanations (Colletta & Pellenq, 2009; Goldin-Meadow, 2003), yet this kind of gesture is hardly ever produced by six-year-olds in oral narrative tasks.
The studies on multimodal narratives from French children and adults (Colletta, 2004, 2009; Colletta, Pellenq & Guidetti, 2010), Italian children (Capirci, Cristilli, De Angelis & Graziano, 2011; Graziano, 2009), and Zulu children and adults (Kunene, 2010) revealed that the development of gestural behavior accompanies the development of narrative behavior. With age, children seem to produce longer and more detailed narratives, including reported speech and various types of commentaries. Gestures and facial expressions can contribute as markers to this information complexity, on both the structural and the pragmatic dimension. The more complex narratives are on the syntactic and pragmatic levels, the more gesture they include, specifically framing and cohesive gestures (Colletta et al., 2010). Here we examine the role of age in children's multimodal narrative production within a single study.
The role of language and culture in the development of multimodal communication
Past research has shown that the structure of the language itself influences gesture production. In terms of semantic differences, not all languages express space, location, and motion in the same way (Talmy, 1985). Speakers who express manner and path in satellite-framed languages such as English (e.g. to go + up/down/across) were found to produce different representational gestures in the accompanying gesture behavior to those who express manner and path in verb-framed languages such as French (e.g. monter/descendre/traverser) (Gullberg, Hendricks & Hickmann, 2008) or Turkish (Özyürek, Kita, Allen, Furman & Brown, 2005). In terms of syntactic differences, some languages require an explicit subject, such as English and French, whereas others are null-subject languages, such as Italian, Spanish, or Zulu. This characteristic requires distinct markings of referential continuity in the textual use of language, with less need to repeat anaphora in the latter case (Hickmann, 2003). It is also suspected to have an effect on the production of gesture; for example, co-speech gesture can compensate for the absence of linguistic anaphora in a null-subject language (Cristilli, Capirci & Graziano, 2010; Demir, So, Özyürek & Goldin-Meadow, 2011; Kunene, 2010; Yoshioka, 2009).
In addition to language differences, another key factor influencing multimodal communication is culture, understood as a set of values and norms that help shape the social behavior of individuals who belong to a cultural group as well as social interaction between them. Past research has shown that culture places restrictions on multimodal communication in all aspects: ritualized forms of interpersonal interaction, the use and form of speech acts, genres of monologues which are a part of narrative and expository texts (Saville-Troike, 1982). Furthermore, culture expresses itself in a non-verbal code during communication, including the use of emblems (Morris, 1994; Pika, Nicoladis & Marentette, 2009), facial expressions (Ekman, 1989) and co-speech gesture (Kendon, 2004; Kita & Özyürek, 2007; McNeill, 2000). It was reported that Italians use a great number of gestures when communicating (Kendon, 2004). Conversely, Western culture, including North American culture, has been described as a culture that is poor in body contact and gesture (Barnlund, 1975). Additionally, in the cultural adaptation process, gesture recognition plays an important role (see Molinsky, Krabbenhoft, Ambady & Choi, 2005). The postulate that originates from these observations is that, under the influence of socialization, children will mobilize these resources in different ways depending on their culture or origins. A study by Iverson, Capirci, Volterra & Goldin-Meadow (2008) on early communication showed that, from a very early age, Italian children use more gestures and have a bigger gesture repertoire than American children in referential communication.
Here we ask whether the language and culture that children are exposed to have an effect on their gesture production during narrative retelling.
Aims of the present study
The present study concentrates on speech and gestural production in the same narrative retelling task in three different linguistic groups, and questions the common characteristics of late multimodal development in these groups: French, American, and Italian children. To our knowledge, this is the first study comparing both the speech and gestural production of these three groups of children. In terms of hypotheses, on the developmental side, we expected narratives to get longer and more complex on the syntactic as well as pragmatic level, to have more gesture and, amongst gestures, to observe more cohesive and framing gestures with age (see Colletta et al., 2010). We also expected to find a similar developmental trajectory in the American, French, and Italian corpora, which would confirm the existence of a general developmental pattern of textual and monologue abilities, already highlighted on the linguistic aspects of spoken texts (Berman, 2004; Berman & Slobin, 1994; Hickmann, 2003), but not yet confirmed on the gestural dimension. On the cross-linguistic side, studying the development of narrative behavior in three different groups allows us to examine the effects of the constraints of different languages and cultural backgrounds on multimodal communication. Here we expected to see more gestures in narratives produced by Italian children compared to French and American children, and among them, more representational gestures for linguistic and cultural reasons. Italian is a highly elliptical language (Simone, 2008), which causes reference tracking in the narrative to be less explicit than in French or English. If the hypothesis of a compensation link between speech and gesture is applicable, a part of this marking should be completed through representational gestures that help construct and/or express the referent.
As a consequence, the proportion of those gestures should be higher in the Italian narratives than in the other narratives. Based on previously reported cultural differences, we also expect Italian children to use more gestures than French and American children during a narrative task, and American children to use the least.
METHOD
Participants
Ninety-eight participants belonging to three linguistic groups (French, American, and Italian) and two age groups (five and ten years of age) were observed with the same protocol. Gender was nearly equal for most of the groups (see the distribution of participants in Table 1). The three groups were made homogeneous with respect to both SES (predominantly upper-middle-class children) and ethnicity (mostly Caucasian). All children were L1 speakers of their country's dominant language. The younger children attended preschools and the older ones primary schools, where they were selected in the grades corresponding to their age.
Table 1. Distribution of participants by language and age (years;months)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:90297:20160412043718634-0460:S0305000913000585_tab1.gif?pub-status=live)
Procedure
The protocol was suitable for both age groups and consisted of videotaping the children in a semi-school environment (at school but out of the classroom), in narrative and explanatory tasks. Participants were asked to watch a video extract (2 minutes and 47 seconds) of a wordless cartoon, taken from the series Tom & Jerry, and to retell (constrained narrative) the story it depicted as well as answer some comprehension questions (explanations in a dialogue context) about the same story. In the present study, we will focus only on the narrative production task.
The cartoon starts with a mother bird leaving her egg in the nest. The egg accidentally falls out and rolls into Jerry's house. The egg hatches in Jerry's house and a baby woodpecker emerges. The baby bird then starts damaging Jerry's furniture. After a few failed attempts to calm the bird down, Jerry gets angry and decides to put the bird back in its nest.
The participants’ narratives were videotaped for later analysis. The data thus consisted of ninety-eight narratives told by the French (data collected in Grenoble and Toulouse), American (data collected in Chicago), and Italian children (data collected in Naples and Rome), collected with exactly the same procedure.
Coding
To analyze this cross-linguistic corpus, the research team defined a common procedure to transcribe and annotate the verbal and the gestural data. A multi-tier coding grid was conceived using the software ELAN (http://www.lat-mpi.eu/tools/elan/), and a coding manual accompanies this annotation system. [Footnote 2] The manual presents the conventions of transcription for utterances, adapted from the Belgian VALIBEL transcription system (http://www.uclouvain.be/cps/ucl/doc/valibel/documents/conventions_valibel_2004.PDF), defines the linguistic and gestural variables which are to be analyzed, explains the manner of coding tier by tier, and provides examples for each variable. An illustration of our coding system is provided in the ‘Appendix’ and shows an extract from an annotated file in ELAN.
The transcription of the speakers’ words appears on two tracks: one track for the interviewer and one for the child. The transcription is orthographical and presents the entirety of the remarks of the speakers. An example of a narrative produced by a French six-year-old is provided below with the corresponding English translation:
example: En premier c’était la maman, elle tricotait, et puis après elle est partie, et puis l’œuf il bougeait, et puis après il est tombé, il est arrivé dans la maison de la petite souris, et puis la petite souris elle s'est réveillée, et elle s'est réveillée, et puis elle était assise dessus l’œuf, et puis après elle est partie, parce qu'elle avait peur un peu, et puis après il commençait à craquer l’œuf, et puis après il commençait à marcher, et puis petite souris elle va enlever l’œuf, qui est resté en haut sur la tête, et puis le petit il a dit maman, et puis après il cassait tout, et après il l'a ramené chez lui la petite souris dans son nid.
translation: ‘First there was the mummy, she was knitting, and then she left, and then the egg, it moved, and then it fell down, it ended up in the little mouse's house, and then the little mouse, he woke up, he woke up, and then he was sitting on top of the egg, and then he left, because he was a bit scared, and then it started to crack, the egg, and then it started to walk, and then the little mouse, he went and took off the egg that was still on its head, and then the little one said mummy, and then it broke everything, and then he brought it back to its house, the little mouse, to its nest.’
Speech coding
As for the linguistic coding, we focused on three measures: clauses, connectives, and anaphora. We segmented the child's speech into clauses and words. The number of clauses contained in an account provides a good indication of its informational quantity, which is likely to grow with age. As for words, the elements coded as connectives in our corpus include the words that contribute to discourse structure by marking temporal relations (then, later, already, first, next, finally, now, etc., and the corresponding French and Italian connectives), logical or argumentative relations between utterances (because, since, so, therefore, but, or, though, if, given that, etc.), reformulation (in other words, in fact, etc.), conversational markers (well, there, etc.), and other connectives marking more than one of these relations (and, then).
The category of anaphors includes linguistic expressions that serve to maintain the identity of previously introduced referents throughout the text. Anaphoric expressions differ in their referential content: personal pronouns are one of the anaphoric expressions with less referential content, and their adequate use is conditioned by the speaker's judgment on the availability of the referent. Definite NPs, on the other hand, specify most of the information needed to identify their referent, but their use in circumstances where the referent is clearly identifiable is perceived as redundant. The referential expressions included under the category of anaphors in the analysis are: personal pronouns (e.g. and the egg [referent underlined] moves around and it [anaphor in bold] falls into a spider's web), relative pronouns (e.g. and unfortunately it ends up in the house that belongs to Jerry, who is asleep), and definite noun phrases, including definite NPs with or without lexical repetition (e.g. then the egg cracked, the fledgling cracked the egg; but the little bird still has the shell over its eyes … and then the woodpecker says mummy), and proper names (e.g. Jerry helped it a bit and as soon as the fledgling saw Jerry). These three types of anaphoric expression were included in the analysis because of their high frequency, and because their adequate use, due to the different discursive and cognitive status of their referent, has been pointed out as one of the landmarks of late language development (Hickmann & Hendricks, 1999).
Gesture coding
For the coding of co-speech gesture, we defined ways to identify and then code gestures and their relationship to speech on several dimensions. In Kendon's (2004) work, a pointing gesture, a representational gesture, or any other hand gesture (an excursion of the body during speech) is called a ‘gesture phrase’, and it possesses several phases including the preparation, the stroke (i.e. the meaningful part of the gesture phrase), the retraction, and the repositioning for a new gesture phrase. Yet some gestures are nothing but strokes: a head gesture or a facial expression, for instance, is meaningful right from the start until the end of the movement and has no preparatory or retraction phases. Our premise, therefore, was that a gesture was any co-speech gesture phrase or isolated gesture stroke that needed to be annotated.
To identify the gestures, each coder took into account the following three criteria (based on Kendon's, 2004, proposals):
(i) if the movement was easy to perceive, of good amplitude, or well marked by its speed (on a scale of 0 to 2, 2 being the strongest value);
(ii) if location was in the frontal space of the speaker, for the interlocutor (on a scale of 0 to 2, 2 being the strongest value);
(iii) if there was a precise hand shape or a well-marked trajectory (on a scale of 0 to 2, 2 being the strongest value).
Once a gesture had been identified (total score > 3), the coder annotated its phases using the above-quoted values based on Kendon (2004). The coder then attributed a function to each gesture stroke. Gesture researchers generally agree on the functions of gesture, although they do not always agree on terminology. Four main functions are consistently mentioned (Kendon, 2004; McNeill, 1992): referential gestures that help to identify (pointing gestures) or represent concrete and abstract referents; framing and pragmatic gestures that express social attitudes, mental states, and emotions and that help perform speech acts and comment on one's own speech as well as others’; gestures that mark speech and discourse, including discourse cohesion gestures; and interactive gestures that help to synchronize one's own behavior with the interlocutor's in social interaction. Our gesture annotation scheme relies mostly on Kendon's (2004) and Colletta's (2004) classifications and covers the whole range of these functions. The coders had to choose between: representational, discursive, framing, performative, interactive, and word searching (see ‘Appendix’ for detailed explanations).
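The identification rule described above (three criteria, each rated 0 to 2, with a movement retained when the total exceeds 3) can be sketched in a few lines of Python. This is only an illustration of the stated threshold, not the coders' actual tooling: the class name `MovementScores`, its field names, and the function `is_gesture` are hypothetical labels introduced here.

```python
from dataclasses import dataclass

# Threshold taken from the text: a movement is retained when total score > 3.
GESTURE_THRESHOLD = 3


@dataclass(frozen=True)
class MovementScores:
    """Hypothetical container for one coder's ratings of a single movement."""

    perceptibility: int  # criterion (i): ease of perception, amplitude, speed
    location: int        # criterion (ii): placement in the speaker's frontal space
    configuration: int   # criterion (iii): precise hand shape or marked trajectory

    def __post_init__(self) -> None:
        # Each criterion is rated on the 0-2 scale described in the text.
        for value in (self.perceptibility, self.location, self.configuration):
            if value not in (0, 1, 2):
                raise ValueError("each criterion must be scored 0, 1, or 2")

    def total(self) -> int:
        return self.perceptibility + self.location + self.configuration


def is_gesture(scores: MovementScores) -> bool:
    """Return True when the summed score exceeds the threshold (total > 3)."""
    return scores.total() > GESTURE_THRESHOLD
```

Under this reading, a movement rated 2, 1, 1 (total 4) would be annotated, whereas one rated 1, 1, 1 (total 3) would not.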
Gesture–speech relations
We also coded gesture–speech relations as reinforces, integrates, supplements, complements, contradicts, and substitutes (see ‘Appendix’ for detailed explanations).
Rates per clause
In order to ensure comparability across groups, we took the total number of each type of linguistic or gestural component (e.g. the number of anaphors, the number of framing gestures, etc.) and divided it by the number of clauses. These rates allowed us to account for individual and age group differences, as well as to compare the proportions of linguistic and gestural components in the different groups.
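As a minimal sketch of this normalization, the Python function below divides each raw count by the number of clauses in a narrative. The function name `rates_per_clause`, the dictionary keys, and the example counts are hypothetical and are not taken from the corpus.

```python
from typing import Dict


def rates_per_clause(counts: Dict[str, int], n_clauses: int) -> Dict[str, float]:
    """Divide each raw count by the number of clauses in the narrative."""
    if n_clauses <= 0:
        raise ValueError("a narrative must contain at least one clause")
    return {name: count / n_clauses for name, count in counts.items()}


# Hypothetical narrative: 25 clauses, 30 anaphors, 5 framing gestures.
example_rates = rates_per_clause({"anaphors": 30, "framing_gestures": 5}, 25)
```

With these illustrative numbers, the narrative has 1.2 anaphors and 0.2 framing gestures per clause, so narratives of different lengths can be compared on the same scale.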
Reliability
Reliability in the transcription and coding of the children's speech was established after three transcriptions of the speakers' words. In order to establish reliability in gesture coding, two separate coders identified the gesture units and attributed a function to each stroke. A third coder validated their annotations and settled any disagreements. To assess the level of agreement, we used the 2/3 agreement method (Colletta et al., 2009): there is agreement when at least two out of the three coders agree on the presence of a stroke or on the function to attribute to a stroke. Inter-rater agreement on the identification of gesture units was 87%, and agreement on the function attributed to each stroke was 99%.
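One plausible reading of the 2/3 agreement criterion is sketched below in Python: an item counts as agreed when at least two of the three coders give it the same label. The text does not specify how the reported percentages were computed, so the function names `majority_label` and `agreement_rate` and the sample labels are assumptions made for illustration only.

```python
from collections import Counter
from typing import Optional, Sequence


def majority_label(labels: Sequence[str]) -> Optional[str]:
    """Return the label chosen by at least two of three coders, else None."""
    if len(labels) != 3:
        raise ValueError("the 2/3 method expects exactly three coders")
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= 2 else None


def agreement_rate(items: Sequence[Sequence[str]]) -> float:
    """Proportion of items on which at least two of three coders agree."""
    if not items:
        raise ValueError("no items to evaluate")
    agreed = sum(1 for labels in items if majority_label(labels) is not None)
    return agreed / len(items)


# Hypothetical function labels assigned by three coders to four strokes.
sample = [
    ("representational", "representational", "representational"),
    ("framing", "discursive", "framing"),
    ("discursive", "framing", "interactive"),
    ("representational", "representational", "discursive"),
]
sample_rate = agreement_rate(sample)
```

In this invented sample, three of the four strokes reach two-out-of-three agreement, giving a rate of 0.75.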
RESULTS
Prior to the parametric analysis, we performed Levene's test to check the equality of variance in our results. The test was not significant for any of the measures, and all the data were processed with two-way ANOVAs: age groups (2: five- and ten-year-olds) × language groups (3: French, Italian, English). Age and language were regarded as between-groups factors. This section is organized to present data according to the two types of effects expected: age and language. The results for the linguistic measures are presented in the first subsection, followed by the analysis of the gestural measures. Tables 2 to 4 present the narrative, linguistic, and gestural measures for both age groups and for the three language groups.
Effects of age and language on linguistic measures
Table 2 shows the means and standard deviations (SDs) for the number of clauses and the rates of connectives and anaphors in the narratives produced by the two age groups in each language group.
Table 2. Means (SD) of linguistic measures for five- and ten-year-olds’ narratives in each linguistic group
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:78114:20160412043718634-0460:S0305000913000585_tab2.gif?pub-status=live)
note: Rates calculated by dividing the total number of each linguistic type by the number of clauses.
The results for the number of clauses indicated an effect of age, an effect of language group, and a significant interaction between age and language group. We found a significant effect of age on the number of clauses (F(1,92) = 15·85, p < ·001, ηp² = ·14), indicating that older children produced longer narratives overall than younger children. Also informative is the group effect (F(2,92) = 27·61, p < ·001, ηp² = ·37), confirmed by post-hoc tests (Tukey, p < ·001), showing that the French children had the longest narratives, with an average of 35 clauses per narrative, and the American children had the shortest (13 clauses); the Italian children were between the two, with an average of 25 clauses. The interaction between the two factors was also significant (F(2,92) = 6·29, p = ·002, ηp² = ·12), indicating that the observed effect of age depended on the group. In other words, the age effect was stronger for French children than for the two other groups, and in all cases the ten-year-olds’ narratives were significantly longer than those of the five-year-olds. The between-group differences were higher for French and Italian children than for American children.
Turning to discourse cohesion, the analysis of the connective rate (number of connectives per clause) showed a significant effect of group (F(2,92) = 6·13, p = ·003, ηp² = ·11), indicating, as confirmed by post-hoc tests (Tukey, p = ·002), that the use of connectives was higher in the French children's group than in the Italian children's group. The analysis also showed a significant interaction between language group and age (F(2,92) = 8·26, p = ·0004, ηp² = ·15), indicating that the effect of group depended on age: the narratives produced by the French younger children had a higher rate of connectives than those produced by the American younger children and all the Italian children. The effect was reversed for the connectives produced by the American children, whose rate was significantly higher in older children than in the younger group; these effects were confirmed by the post-hoc tests (Tukey, p < ·003).
The second measure accounting for discourse cohesion was the presence of anaphoric elements. The analysis of the anaphor rate (number of anaphors per clause) indicated that both age and language group effects were significant. The age effect (F(1,92) = 18·06, p = ·00005, ηp² = ·16) showed that, overall, there were more anaphors per clause in the discourses produced by the older children. The group effect (F(2,92) = 10·7, p = ·001, ηp² = ·11) showed that the highest anaphor rate was produced by the French children, followed by the Italian children and then by the American children. The post-hoc tests confirmed (Tukey, p < ·0001) that only the differences between the French and the American groups and between the American and the Italian groups were significant.
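For readers who want to relate the reported F statistics to the partial eta squared effect sizes, the standard identity partial η² = (F × df_effect) / (F × df_effect + df_error) can be applied. The article does not state how its effect sizes were computed, so the Python sketch below assumes this identity; the function name `partial_eta_squared` is introduced here for illustration.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and degrees of freedom positive")
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# F values and degrees of freedom copied from the clause-count analysis above.
clause_effects = [
    ("age", 15.85, 1),
    ("language group", 27.61, 2),
    ("age x language group", 6.29, 2),
]
for label, f_value, df_effect in clause_effects:
    print(f"{label}: {partial_eta_squared(f_value, df_effect, 92):.3f}")
```

For the three clause-count effects, this computation lands within 0.01 of the reported values (·14, ·37, and ·12); small gaps are expected because the published F values are themselves rounded.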
Effects of age and language on gestural measures
Table 3 shows, for both age groups and the three language groups, the means and standard deviations of the gesture measures included in the analysis: gesture rate (number of co-speech gestures per clause) and rates of representational gestures, discursive gestures, and framing gestures. The ‘other gestures’ category brought together interactive, performative, and word searching gestures.
Table 3. Means (SD) of gesture measures for five- and ten-year-olds’ narratives in each linguistic group
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:53890:20160412043718634-0460:S0305000913000585_tab3.gif?pub-status=live)
note: Rates calculated by dividing the total number of each gesture type by the number of clauses.
In the analysis of the gesture rate, only language group was found to have a significant effect (F(2,92) = 4·69, p = ·01, η p2 = ·09): the gesture rate was higher in the Italian group than in the French group (Tukey, p < ·007). The American children occupied an intermediate position between the French and the Italian children, but their differences from either group were not significant. A closer look at the different types of gesture yielded further information about the effects of age and language on the children's use of co-speech gestures.
For the representational gesture rate, only the language group effect was significant (F(2,92) = 9·69, p = ·0001, η p2 = ·17), indicating, as confirmed by the post-hoc tests (Tukey, p < ·003), that the Italian children produced more representational gestures than the French and the American children. Representational gestures were the most frequent type of gesture in all three groups, and were produced more frequently by the ten-year-olds than by the five-year-olds, although this age difference was not significant.
In the analysis of the discursive gesture rate, age and the language group effects were both significant (F(1,92) = 11·97, p = ·0008, η p2 = ·11; F(2,92) = 3·82, p = ·02, η p2 = ·07, respectively), indicating that the older children produced significantly more discursive gestures than younger children, and that, as confirmed by post-hoc tests (Tukey, p < ·01), the American children produced more discursive gestures (i.e. gestures helping to structure speech or mark cohesion) by clause than the French children.
For the framing gesture rate, only the language group effect was significant (F(2,92) = 4·08, p = ·02, η p2 = ·08), and showed that, as confirmed by post-hoc tests (Tukey, p < ·01), French children produced significantly more framing gestures than their Italian counterparts.
In the analysis of other gestures (performative, interactive, and word searching gestures gathered together), only the age effect was found to be significant (F(1,92) = 5·63, p = ·01, η p2 = ·05). It showed that the five-year-old children produced more of these types of gesture by clause than the ten-year-olds, except in the Italian group.
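As a reading aid, the effect sizes reported above can be related to their F statistics through the standard identity for partial eta squared, η p2 = (F × df_effect) / (F × df_effect + df_error). The sketch below applies that identity to F values quoted in this section; it assumes the reported η p2 values are ordinary partial eta squared, which the text does not state explicitly, and the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom.

    Uses the identity eta_p^2 = SS_effect / (SS_effect + SS_error),
    rewritten in terms of F = (SS_effect / df_effect) / (SS_error / df_error).
    """
    if df_effect <= 0 or df_error <= 0:
        raise ValueError("degrees of freedom must be positive")
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)

# F values quoted in the text (language group effects, df = 2 and 92).
gesture_rate = partial_eta_squared(4.69, 2, 92)      # reported as .09
representational = partial_eta_squared(9.69, 2, 92)  # reported as .17
framing = partial_eta_squared(4.08, 2, 92)           # reported as .08

print(round(gesture_rate, 2), round(representational, 2), round(framing, 2))
```

Recomputed values may differ from the published ones in the last digit because the F statistics are themselves rounded.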
Table 4. Means (SD) of gesture–speech relation types for five- and ten-year-olds’ narratives in each linguistic group
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:83375:20160412043718634-0460:S0305000913000585_tab4.gif?pub-status=live)
Finally, Table 4 presents the data concerning the gesture–speech relations (reinforce, integrate, supplement; ‘other relation’ included the scarce numbers for the complement, contradicts, and substitute categories). An ANOVA with repeated measures was carried out on these data, with age and language as between-groups factors and type of gesture–speech relation as a within-subjects factor. The analysis showed a language group effect (F(2,92) = 9·43, p = ·0001, η p2 = ·17), and all the interactions between type of gesture–speech relation and language and age groups were found to be significant (F(12,552) = 12·73, p < ·001, η p2 = ·21; F(6,552) = 7·12, p < ·001, η p2 = ·07; F(12,552) = 1·97, p = ·02, η p2 = ·04, respectively). In other words, the ‘integrate’ relation, in which the information provided by the gesture adds precision to the encoded linguistic information, was the most frequently produced, particularly by the Italian children (Tukey, p < ·002 in all cases) and by the older children in all language groups (Tukey, p < ·01 in all cases).
DISCUSSION
Based on a common method, this cross-linguistic investigation of multimodal narratives produced by children from two age groups added to the findings on the effect of age, language, and culture on multimodal language acquisition, and raised new questions.
Effects of age on multimodal narratives
With regard to age, similar differences occurred between younger and older children in all three language groups, which suggests a common developmental trend in multimodal discourse abilities despite the particularities of each language and culture. Interestingly, these differences appeared in the gestural aspects of narrative behavior as well as in their linguistic aspects.
First, older children in all three language groups produced longer narratives than their younger counterparts. This developmental change was put forward decades ago in the literature on narrative development. It shows in recent multimodal studies for French (Colletta, 2004, 2009; Colletta et al., 2010), as well as for Italian (Graziano, 2009) and Zulu (Kunene, 2010).
Second, as suggested by the increase in the rate of anaphora, older children's narratives contained more information. The above-mentioned multimodal studies also found qualitative age-related changes on the linguistic measures and a greater complexity in the linguistic markings. These changes may result in a more detailed account of the narrative, as in the studies on Italian and Zulu narratives (Graziano, 2009; Kunene, 2010); they may also result in a greater pragmatic complexity in the retelling of the story, with older children and adults adding more meta-narrative and para-narrative comments to their account (Colletta et al., 2010).
Importantly, we did not find an age-related increase in gesture rate (except for Italian children). However, the kind of gestures children produced in the context of narrative production varied with age. Older children relied more on discursive (cohesive) gestures, less on gestures that are not directly related to the narrative task (other gestures), and they favored the ‘integrate’ relation between speech and gesture, which results in densely packed bimodal information. All in all, this result confirms the changes put forward by Colletta (2009) and Colletta et al. (2010), by Graziano (2009), and by Kunene (2010) for French, Italian, and Zulu respectively, with an increase in the use of co-speech gesture directly associated with narration during childhood, and, at the same time, a modification of the child's gesture repertoire and use of gestures. The research on children's narratives and gestures is now well advanced and, together with these results, allows a tentative model of multimodal narrative development in which major changes in later language acquisition occur despite language and cultural differences. The following elements present the basic picture of these changes in narrative behavior.
Whatever the language, the co-speech gesture system evolves in later language acquisition in order to fulfill new specific functions or new communicative aims, such as, in our case, narrating fictitious – or real, as in the 2009 study reported by Colletta – events. In other words, language development and gesture development are tightly related during childhood. Younger, five-year-old children in the last year of preschool, who are typically more at ease with dialogue and interactive language formats, find the monologue production of narrating a story a difficult task, and produce short and elliptical narratives. They do gesture, and their gesture sometimes reflects their own difficulties in dealing with monologue language production, and with the adult who needs to prompt them and scaffold their narrative production. The high proportion of word searching gestures and of gestures expressing pragmatic and interactive functions in the younger children's gesture production indexes their constant move back towards a dialogue format.
Older, ten-year-old children on the way to secondary school have developed narrative abilities that show in the length of their linguistic production as well as in their linguistic and gestural aspects. They hold on to the monologue production task from start to finish, concentrate on the narrative, and deliver longer and more detailed accounts. This greater complexity in linguistic information goes along with an increasing use of co-speech gesture to represent and track the characters from the story, to illustrate the events, to mark the discourse progression – the breaks in the narrative thread and between telling the event and commenting on it – to express feelings towards the story or the task, to modify the illocutionary value of a clause, etc. (Colletta, 2009; Graziano, 2009). Children of this age who do gesture rely on gesture resources as well as on linguistic resources to accomplish the task. Their narratives show how closely intertwined the two types of sign system are in discourse production.
Our study has some limitations. As for this comparative study, we lack more refined age classes. When available (Colletta et al., 2010; Kunene, 2010), adult narratives show statistically significant changes from ten-year-olds’ with respect to narrative content and structure, and the pragmatics of narration. On the same ground, eight- and twelve-year-olds’ narratives show statistically significant changes from six- and ten-year-olds’ (Kunene, 2010). As for the most used type of gesture – representational gesture – there are developmental differences in their shape and meaning between the ages of four and ten years (Graziano, 2009) that need to be explored in greater detail, as they help track cognitive changes in the way children represent the characters, scenes, and actions in a story. These changes may be unseen when focusing solely on the children's words. Other types of gesture that have a framing or a discourse cohesion function also need to be investigated to help understand when and how pragmatic and discourse constraints evolve during childhood. As for the relationship between gesture and speech, it is found to be far more complex in the multimodal language production of children aged six years and over than in younger children. Starting with the categories put forward by researchers who have studied early language development (Capirci et al., 1996; Özçaliskan & Goldin-Meadow, 2005, 2009), we had to work out new categories (integrates, contradicts, substitutes) that need to be studied in greater detail in the future.
Effects of language on linguistic measures
Despite the common developmental trends, interesting language and cultural differences were observed. Concerning the effects of language on linguistic measures, we found that French children spoke longer and used more connectives than the other two groups. In all three countries, preschool favors the use of oral language in dialogue format and does not focus explicitly on training children to retell stories, whereas primary school favors the use of written and academic language without focusing on training oral narrative abilities. Learning how to retell stories and narrate from fictitious media or real life mostly remains as a language acquisition issue rather than as a formal learning one in all three countries. Future studies should examine factors such as the home environment in order to explain such a cultural difference.
The Italian group did not produce less anaphora than the other two groups because zero anaphora was coded along with other types of anaphora: unmarked anaphora is commonly used in Italian and represented 47% of all linguistic anaphora in the Italian data.
The shortness of American children's narratives compared to their Italian and French counterparts could be explained by grammatical constraints. For instance, translators are accustomed to English versions of French texts being shorter, as a number of grammatical features contribute to shortening English sentences compared to French: subject + verb contraction, verb + negation contraction, elision of the determiner, elision of the verb, the marking of the possessive, the verb + preposition construction. However, the anaphor rate was also found to be lower in the American narratives compared to the French and Italian narratives, and grammar does not account for this result. The explanation lies rather in the content of the narratives delivered by American children. Unlike their French and Italian counterparts, the American children produced narratives that included a lot of comments. The more meta-narrative and para-narrative comments there are, the fewer anaphoric marks there are, as these marks help track the main referents from the story plot, as shown by McNeill (1992).
Effects of language on gesture measures
Concerning the effects of language on gesture measures, first we found that the greater use of gestures by Italians was confirmed for Italian children in a monologue narrative task. Second, we found that French children, like American children, produced fewer representational gestures while narrating than their Italian counterparts. On the other hand, we found a higher rate of framing gestures for the French, and of discursive gestures for the American children. Studying the gestural production of two-year-old Italian and American children, Iverson et al. (2008) found a higher proportion of representational gestures in the Italian children's repertoire than in the American children's. The result from our study confirms this difference for older children attending primary school and is consistent with the fact that Italian children favored the ‘integrate’ gesture–speech relation which characterizes the use of representational gestures.
Although we did not look for direct evidence of the compensatory role of representational gestures towards linguistic reference tracking in a zero anaphora language, these results are in line with a study on the same set of data (Capirci, Colletta, Cristilli, De Angelis & Graziano, 2010), in which the Italian children were found to use a representational strategy to track and disambiguate referents in gesture. These new and conclusive results are in line with studies by Gullberg (2006) and Yoshioka (2009), and they confirm the crucial role played by gesture in reference tracking for young narrators of a zero anaphora language.
Studying narrative production in various languages also brings to light cultural differences. A comparison of French and Zulu narratives (Kunene, 2010) showed striking differences in the linguistic as well as in the gestural aspects of adults’ and children's narratives. Unlike the French older children and adults, the Zulu older children and adults delivered far more precise narratives and used a lot more gestures, rarely interrupted the telling of the events to insert a personal comment, behaved differently depending on gender (male or female), and the males used a wider gesture space expanding over the frontal space. Cultural particularities in literacy conceptions as well as in everyday social behavior may help to explain these differences. We can also hypothesize that they have an effect on the way people respond to data collection methods. A previous study on spontaneous narratives produced by French children aged six to eleven years (Colletta, 2009) showed advanced social abilities in the ten- and eleven-year-olds that did not show in the children's narratives collected for our study.
All in all, our comparative study on American, Italian, and French narratives brought some unexpected differences (length of narratives, cohesion, use of certain types of gesture) that may have to do with structural differences between the languages, and broader cultural differences in the three societies, if not with conceptions of narrative. Importantly, despite some linguistic and cultural peculiarities, the results of this study clearly argue for a developmental model of multimodal narrative production within three languages not yet compared in the literature. As children's speech became more complex in the context of narrative production, so did their gestures, specifically those that contribute to the narrative structure. These findings add evidence to the subtlety and strength of the relationship between speech and gesture in later language development and pave the way for future research. Further investigations on formal aspects of representational gestures, the use of framing gestures, and the gesture–speech relation of non-representational gestures would be of interest for a better understanding of the gesture–speech system development in these three languages and others.
APPENDIX
Detailed explanations for the coding system for gestures
For gesture functions, the coders had to choose between:
(i) Representational: hand or facial gesture, whether or not associated with other parts of the body, which represents an object or a property of this object, a place, a trajectory, an action, a character or an attitude (e.g. two hands drawing the form of the referent; hand or head gesture pointing to a spot that locates a virtual character or object in frontal space; hand or head moving in some direction to represent the trajectory of the referent; two hands or body mimicking an action), or which symbolizes, by metaphor or metonymy, an abstract idea (e.g. hand or head movement towards the left or the right to symbolize the past or the future; gesture metaphors for abstract concepts).
(ii) Discursive: cohesive gesture which aids in structuring speech and discourse by the accentuation or highlighting of certain linguistic units (e.g. beat gesture accompanying a certain word; repeated beats accompanying stressed syllables), or which marks discourse cohesion by linking clauses or discourse units (e.g. brief hand gesture or beat accompanying a connective; abstract pointing gesture with an anaphoric function, e.g. pointing to a spot to refer to a character or an object previously referred to and assigned to this spot).
(iii) Framing: gesture which expresses an emotional or mental state of the narrator (e.g. face showing amusement to express the comical side of an event; shoulder shrug or facial expression of doubt to express uncertainty about what is being asserted).
(iv) Performative: gesture which allows the gestural realization of a speech act (e.g. head nod as a ‘yes’ answer, head shake as a ‘no’ answer), or which co-expresses, together with the verbal utterance, the illocutionary value of a speech act (e.g. head nod accompanying a ‘yes’ answer, head shake accompanying a ‘no’ answer).
(v) Interactive: gesture accompanied by a gaze towards the interlocutor to express that the speaker requires or verifies his attention, or shows that he has reached the end of his speech turn or his narrative, or towards the speaker to show his own attention (e.g. nodding head while interlocutor speaks).
(vi) Word searching: hand gesture or facial expression which indicates that the speaker is searching for a word or expression (e.g. frowning, staring above, tapping fingers while searching for words).
For gesture–speech relation, the coders had to choose between:
(i) Reinforces: the information brought by the gesture is identical to the linguistic information it is in relation to, as when a head nod accompanies an affirmative ‘yes’.
(ii) Integrates: the information provided by the gesture adds precision to the encoded linguistic information as ‘she leaves’ + < shifting of the left hand towards the left side >, indicating the direction of the displacement; this annotation only concerns the representational gestures.
(iii) Supplements: the information brought by the gesture adds new information not coded in the linguistic content, as ‘he tries to come out’ + < vertical agitation of the hand > to represent the baby bird moving inside the egg.
(iv) Complements: the information provided by the gesture brings a necessary complement to the incomplete linguistic information provided by the verbal message: the gesture disambiguates the message, as when a pointing gesture accompanies a location adverb like ‘here’, ‘there’.
(v) Contradicts: the information provided by the gesture contradicts the linguistic information provided by the verbal message: pointing to the right while talking about the left direction; displaying the facial expression of anger while using soft words to describe a person's behavior or attitude.
(vi) Substitutes: the information provided by the gesture replaces linguistic information, as when nodding in affirmative response without speaking.
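For readers who wish to reuse this coding scheme in their own annotation pipeline, the two category sets can be written down as a small data structure. The sketch below is an assumption about how one might do so and is not part of the original annotation setup: the class names `GestureFunction`, `SpeechRelation`, and `GestureAnnotation` are hypothetical. The only constraint it enforces is the one stated above, namely that the ‘integrates’ relation only concerns representational gestures.

```python
from dataclasses import dataclass
from enum import Enum

class GestureFunction(Enum):
    """The six gesture functions of the coding scheme."""
    REPRESENTATIONAL = "representational"
    DISCURSIVE = "discursive"
    FRAMING = "framing"
    PERFORMATIVE = "performative"
    INTERACTIVE = "interactive"
    WORD_SEARCHING = "word searching"

class SpeechRelation(Enum):
    """The six gesture-speech relations of the coding scheme."""
    REINFORCES = "reinforces"
    INTEGRATES = "integrates"
    SUPPLEMENTS = "supplements"
    COMPLEMENTS = "complements"
    CONTRADICTS = "contradicts"
    SUBSTITUTES = "substitutes"

@dataclass(frozen=True)
class GestureAnnotation:
    """One coded gesture: its function and its relation to speech (hypothetical type)."""
    function: GestureFunction
    relation: SpeechRelation

    def __post_init__(self):
        # Per the appendix, 'integrates' only concerns representational gestures.
        if (self.relation is SpeechRelation.INTEGRATES
                and self.function is not GestureFunction.REPRESENTATIONAL):
            raise ValueError("'integrates' applies only to representational gestures")

# Example from the appendix: 'she leaves' + hand shifting towards the left side.
leaving = GestureAnnotation(GestureFunction.REPRESENTATIONAL, SpeechRelation.INTEGRATES)
```

Rejecting invalid combinations at construction time is one way to catch coding slips early; a real pipeline might prefer to log them for review by a second coder instead.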
Illustration of the coding system
The figure shows an extract from an annotated file on ELAN (French data, interviewer sitting on the left on the media window, child sitting on the right): words of the child (first track), annotated gestures (phrase and stroke, function, relation to speech respectively on second, third, and fourth track), and annotated speech (clauses on sixth track, words on eighth track, connectives and anaphoric marks on tenth track).