INTRODUCTION
Events of motion contain many aspects that can occur simultaneously. Take, for example, a person running downstairs. The runner is pumping his legs and moving downward at the same time. One of the key features of human language is that it can take a holistic event, break it into segments, and express the segments sequentially, allowing speakers to focus on each of the decomposed semantic elements of the event in turn. For example, in English, we describe the running event as run down, representing the manner of the movement (run) separately from its trajectory (down). But dividing holistic events into segments and then combining those segments into organized strings is not the only way we can represent events. We use maps, pictures, pantomimes, etc., which represent events more holistically; for example, a mime who is depicting an act produces movements that correspond, part-for-part, to the act, thus evoking the act as a whole. Segmented (but not holistic) representations have the potential to give rise to new combinations, one of the basic design features of language (Hockett, 1960) and a characteristic of all spoken languages (Talmy, 1985), as well as established (Supalla, 1990) and newly emerging (Senghas, Kita & Özyürek, 2004) sign languages. Moreover, segmented representations allow speakers and signers to focus on a single piece of an event (e.g. to highlight manner and not path) in a way that holistic representations do not (e.g. miming movement along a path inevitably brings with it a depiction of manner).
We ask in this paper whether segmenting and sequencing the pieces of a motion event is such a central and robust feature of human language and communication that it can be reinvented by a child who does not have access to a conventional language. We tackle this question by comparing the gestures produced by deaf children not exposed to accessible input from a conventional language (homesigners) to gestures produced by hearing speakers (adults, children, and their own hearing mothers) in the same community.
Componentialization in hearing and deaf children
Children exposed to conventional languages learn to componentialize elements of a motion event early in development, whether they are learning a spoken language (Allen et al., 2007; Choi & Bowerman, 1991; Özçalışkan & Slobin, 1999) or a sign language (Supalla, 1982). However, not all children are exposed to models of language. Deaf children whose hearing losses are so severe that they cannot acquire spoken language, and whose hearing parents have not exposed them to sign language, lack an accessible model for language. Nevertheless, these children communicate using gestures, called homesigns (Goldin-Meadow, 2003).
Homesigns are characterized by many, although not all, of the properties of natural language, including a stable lexicon (Goldin-Meadow, Butcher, Mylander & Dodge, 1994), structure at word (Goldin-Meadow, Mylander & Butcher, 1995; Goldin-Meadow, Mylander & Franklin, 2007) and sentence (Feldman, Goldin-Meadow & Gleitman, 1978; Goldin-Meadow & Feldman, 1977) levels, sentence-level negation and question modulators (Franklin, Giannakidou & Goldin-Meadow, 2011), nominal constituents (Hunsicker & Goldin-Meadow, 2012), grammatical categories such as subject (Coppola & Newport, 2005), and a distinction between noun and verb categories (Goldin-Meadow et al., 1994). The question we ask here is whether homesigners introduce a language-like segmentation strategy (e.g. a rolling gesture produced in place, followed by a gesture moving across space) into gesture sentences that express both Manner and Path of spontaneous motion.
The gestures that homesigners produce have, in fact, been found to display one type of segmentation and combination. For example, in order to describe putting down a round penny, a homesigner child first held up a ‘round’ handshape (thumb and index forming a circle, ‘penny’), followed by a flat palm moved downward (‘down’), thus producing two segmented gestures strung together (‘penny–down’) rather than a single holistic gesture that combined both semantic elements (i.e. moving the circle-shaped hand down, ‘penny + down’; Goldin-Meadow, 2003; Goldin-Meadow et al., 1995). Even hearing adults, when asked to use their hands without speech to describe an event, will produce segmented gestures, each representing a different semantic element. For example, when asked to describe with their hands a simple event in which a circle moves diagonally across the screen, hearing speakers behave like homesigners – they produce two separate gestures, one representing the circle (‘penny’) and one representing the diagonal downward movement (‘down’) (Gershkoff-Stowe & Goldin-Meadow, 2002; Goldin-Meadow, McNeill & Singleton, 1996). Segmenting semantic elements (in this case, figure and path) out of a motion event thus appears to be a basic aspect of cognition, easily incorporated into communication (see Goldin-Meadow, So, Özyürek & Mylander, 2008, for evidence that hearing adults can segment other semantic elements, e.g. the patient and the endpoint, out of motion events when using their hands to describe the event).
However, segmenting a figure from its path might be different from segmenting two movements that take place simultaneously (e.g. separating the act of running down the street into the path along which the runner moves and the manner of movement that propels the runner along the path). As mentioned earlier, despite the simultaneity of path and manner in the actual event, established languages rarely conflate manner and path into the same lexical item and instead encode manner and path in two separate lexical items, using different lexicalization patterns depending on the typology of the language (Talmy, 1985). For example, in English (a satellite-framed language), manner is expressed in the verb and path in a satellite, as in The child runs (manner) down (path) the street. In contrast, in Turkish (a verb-framed language), path is expressed in the main verb and manner in a subordinate verb, as in Çocuk koşarak tepeden aşağı indi – ‘child as running (manner) descended the hill (path)’. The sign languages in which path and manner have thus far been studied, American Sign Language (ASL) and the Sign Language of the Netherlands (SLN, also known as Nederlandse Gebarentaal, NGT), also convey manner and path in separate lexical items in verbal predicates. For example, Slobin and Hoiting (1994) have proposed that sign languages use serial-verb constructions (manner-path) and are best characterized as complex verb-framed languages (although manner and path can be combined within a single sign in some classifier constructions in sign languages; Supalla, 1990). Action segmentation of this sort has even been observed in newly emerging sign languages (e.g. Nicaraguan Sign Language; Senghas et al., 2004). Do we see action segmentation and sequencing in homesign?
Previous research has shown that American and Chinese child homesigners are able to segment manner and path into separate gestures when communicating about crossing space events – they produce gestures conveying path alone and gestures conveying manner alone, in addition to producing manner + path gestures (Zheng & Goldin-Meadow, 2002). However, the previous work did not ask whether manner and path gestures were combined within a single gesture sentence and, if so, how those sentences were structured. Nor did the study examine the gestures produced by hearing individuals in the community, gestures that might have served as input to the homesign system.
PRESENT STUDY
To determine whether young children who are creating their communication systems without benefit of a community of language users introduce action segmentation and combination into those systems, we asked Turkish homesigners, seven in Study 1 and five in Study 2, to gesture about a series of events designed to elicit manner and path descriptions.
Homesigners are not exposed to a conventional sign language. However, they do see the gestures that their hearing parents and other hearing people around them produce as they talk (cf. Iverson, Capirci, Longobardi & Caselli, 1999; Özçalışkan & Goldin-Meadow, 2005; Shatz, 1982). Thus, we compared the gestures produced by the homesigners to gestures that the children's hearing mothers and eighteen other hearing adults in Study 1 produced in response to the motion events to determine whether the gestures hearing speakers produce provide a model for the deaf children's homesigns.
Hearing children are also exposed to the gestures for motion events produced by adult speakers of their language. The difference, however, is that the hearing children experience other people's gestures, and produce their own gestures, in the context of speech. Thus, we examined the gestures that Turkish hearing children, fourteen in Study 1 and five in Study 2, produced in response to the motion events to determine whether experiencing and producing gesture in the context of speech matters compared to using gestures only.
Finally, we explore whether the patterns found in the deaf children's gestures, if not copied from hearing speakers' gestures, might be a response simply to the fact that the manual modality is the deaf child's sole means of communication. If so, requiring hearing adults to describe motion events using gesture without speech might result in gestures that resemble the deaf children's. As mentioned earlier, we know that when hearing adults are called upon to use the manual modality as their sole means of communication, they can segment semantic elements such as figure and path into separate gestures and combine those gestures into structured strings (Goldin-Meadow et al., 1996, 2008). To determine whether segmentation and combination of manner and path will also arise when hearing speakers rely solely on the manual modality, we asked the eighteen Turkish hearing adults who initially described the events in speech in Study 1 to describe them a second time in Study 3, this time using only their hands.
In sum, we investigate whether homesigns – gestural systems that develop in a deaf child without conventional language input – contain the roots of action segmentation and sequencing. We compare three conditions under which action segmentation and combination have the potential to arise in the manual modality: (i) when gestures have been a child's only means of communication throughout development (deaf homesigners); (ii) when gestures are produced along with speech (hearing adults, hearing children, hearing mothers of the deaf homesigners gesturing while talking); and (iii) when gestures are recruited on-the-spot to replace speech (hearing adults gesturing without talking). We focus on descriptions of spontaneous motion events (i.e. actors moving across space on their own). In these events, manner (e.g. rolling) takes place throughout the crossing-space event (e.g. moving down). Events of this sort can be represented in the manual modality either holistically (rotating the hand while moving it down), or componentially (rotating the hand in place, followed or preceded by moving the hand down), and thus provide fertile ground for exploring the conditions that give rise to action segmentation and combination.
STUDY 1
METHOD
Participants
Seven Turkish deaf children, ranging in age from 3;2 (years; months) to 4;9 (M = 4;2), participated in a longitudinal study and were videotaped at home every one to three months (Table 1). The children were congenitally deaf, with bilateral hearing losses (70–90 dB), and no other reported cognitive or physical disabilities (in Turkey, there are few opportunities for deaf children to be given normed cognitive evaluations; however, during our year-long observations, we did not notice any major cognitive or social deficiencies in the children in our study – they all performed our tasks without difficulty). The children's hearing parents had chosen to educate them using oral methods. None of the children had cochlear implants; all wore hearing aids, although not regularly, and had very little (if any) speech therapy. Although able to produce an occasional Turkish word, the children did not combine words into sentences. Moreover, none had been exposed to a conventional sign language or had contact with another deaf child or adult. The deaf children had not attended preschool of any sort during the observational period and spent their days at home with their mothers (at the time of our observations, deaf children did not begin school until age seven in Istanbul; the first preschool for deaf children was established after this study was conducted). The deaf children's hearing mothers also participated in the study.
In addition, fourteen Turkish hearing children, ranging in age from 3;0 to 6;10 (M = 4;9) and drawn from families of the same socioeconomic status as the deaf children, were videotaped at home, and eighteen Turkish adults, undergraduate students in Istanbul, were videotaped on the campus of Koç University. All hearing participants were native Turkish speakers.
Procedure
Hearing participants (adults, children, mothers of deaf children) were told that they would see a series of animated vignettes on a laptop, and were asked to tell the experimenter what happened after each vignette; the speakers gestured spontaneously while talking and it was these co-speech gestures that we compared to the homesigners' gestures. To elicit responses from the deaf homesigners, after each vignette, the experimenter produced a two-handed flip gesture along with a quizzical look and pointed at the screen.
All participants were shown six spontaneous motion events highlighting manner and path (Özyürek, Kita & Allen, 2001), along with thirty-six other action events (Goldin-Meadow et al., 2008), in random order. During the retelling, a still picture of the initial scene of the event, which included all objects in the event, was placed in front of the participants as a memory aid. The children, and occasionally the adults (particularly when asked to gesture without speech), pointed at the picture as a way to refer to an object in the event or traced a trajectory on the picture to refer to the path or the manner.
The deaf children were part of a longitudinal study and thus were shown the events six times at sessions taking place over the course of several months (see Table 1). The hearing children and the deaf children's hearing mothers described the events once. Four of the mothers told the vignettes to their own children; three told them to the experimenter. The mothers performed the task after the sixth session; thus all of the deaf children described the vignettes before their mothers did.
The hearing adults described the vignettes twice, first using speech along with whatever gestures they spontaneously produced, then using gesture without speech, always in that order. The gestures that the hearing adults produced without speech will be described in Study 3.
Materials
We focused on the six animations (each 6–15 seconds) designed to highlight simultaneous manner and path of spontaneous events: roll + ascend, roll + descend, rotate + ascend, rotate + descend, jump + ascend, jump + go around. For example, rotate + ascend involved an animated object turning on its horizontal axis as it ascended vertically in the air. Each clip involved a round red smiling character and a triangular-shaped green frowning character moving within a simple landscape. All the vignettes were designed so that the target event – which was always in the middle of the sequence of events – had a distinct beginning and a distinct end point (Figure 1); these events were used in Özyürek, Kita, Allen, Furman, and Brown (2005, 2008), Kita et al. (2007), and Allen et al. (2007) to investigate expressions of spontaneous motion events with simultaneous manner and path in hearing speakers. The path of the moving figure in the target event always followed a different direction from the path followed in the entry and closing events (e.g. from a horizontal path in the entry event in Figure 1, to a diagonal movement up the hill in the target event, followed by another horizontal path in the closing event). The manner also occurred only in the target event. This design made it easy to identify the boundaries of the target event in speech and in gesture during coding.
In creating the stimuli, we made sure that the target motion event was indeed a spontaneous (i.e. not caused) event. In two of the events, roll + descend and roll + ascend, the figure in the target event is given a bump during the entry event, but when the figure changes direction in the target event, it is clear that it is moving under its own steam. In the other four events, there is no bump during the entry event.
Coding
As in previous papers (Allen et al., 2007; Kita et al., 2007; Özyürek et al., 2005, 2008), we included only the gestures that displayed the direction of the figure's path during the target event. The stroke (meaningful phase) of the gesture (Kendon, 1980; McNeill, 1992) was used to segment gestures and determine their meaning. To determine onset and offset of gesture strokes, we considered changes in the parameters of shape, placement of the hand, trajectory of motion, and tension of the hands (for more on gesture phases and how to recognize and code them, see Kita et al., 1998).
Following Kita et al. (2007) and Özyürek et al. (2005, 2008), gestures were divided into three categories:
1. Path gestures depicted the trajectory that the moving object took (e.g. ascending movement of the hand representing moving upward).
2. Manner gestures depicted the manner by which the object moved as it changed its location (e.g. repetitive circular movement representing rolling).
3. Manner + Path gestures simultaneously depicted both manner and path within the gesture's stroke (e.g. hand moves repetitively in a circle as it ascends, representing rolling upward).

Single points to objects were not included in the analysis. However, points that traced either the trajectory of a path or the manner of movement were included.
We also coded gesture strings. Our goal was to analyze gestures in the same way for all participants. We therefore needed to consider gesture without regard to speech. We divided gestures into strings using motoric criteria. Following Goldin-Meadow and Mylander (1984), string breaks were coded when participants paused or relaxed their hands; on average, pauses lasting longer than 1·5 s constituted a break between gesture strings. A string could contain one or more gesture strokes, e.g. a circular movement of the hand (stroke 1, Manner) followed, without pause, by a downward movement of the hand (stroke 2, Path) was considered two gestures within a single string.
We categorized gesture strings into five types:
1. Path alone (no gestures referring to manner).
2. Manner alone (no gestures referring to path).
3. Conflated (both manner and path were produced within a single gesture stroke, i.e. Manner + Path, with no other gestures referring to manner or path in the string).
4. Sequenced (at least one Manner gesture conjoined with at least one Path gesture and no Manner + Path gestures in the string).
5. Mixed (a Manner + Path gesture combined with a Manner or a Path gesture, i.e. two gestures). When a Manner + Path gesture was combined with both a Manner and a Path gesture (i.e. three gestures), we also coded the string as Mixed (although this happened rarely).
Reliability was calculated on 30% of the participants' event descriptions in each group by two independent coders. Agreement between coders was 94% for categorizing both gestures and gesture strings (Cohen's Kappa score was 0·87 for both).
All of the statistical analyses reported below are omnibus ANOVAs followed by post-hoc tests exploring pairwise comparisons when needed.
RESULTS
Homesigners across sessions and ages
Homesigners are unique in that they are not able to make use of the spoken language that surrounds them, nor are they exposed to a conventional sign language. The homesigners were participating in a longitudinal study and were shown the vignettes at each of the six sessions at which they were observed (see Table 1). In order to compare their data to the data collected on the hearing participants, who responded to the vignettes only once, we collapsed the homesigners' scores across the six sessions, using the mean across sessions for each homesigner. The decision to use a single score for each homesigner was motivated by the fact that we did not find differences in the string types homesigners produced across sessions. For each type of gesture string, we performed a repeated measures ANOVA on the number of strings produced by each child with session (1 through 6) as the within-subject factor. Furthermore, to determine whether age of the child contributed to each effect, the age of each child at session 1 was included as a covariate in each ANOVA since the children began the study at different ages (see Table 1). The analyses revealed no main effect of session for any of the gesture string types: Path Only strings (M = 2·36, SE = 2·37; F(5,25) = 0·26, p = ·80, partial η² = ·05); Manner Only strings (M = 0·61, SE = 0·98; F(5,20) = 0·39, p = ·67, partial η² = ·09); Conflated strings (M = 2·52, SE = 2·15; F(5,25) = 0·19, p = ·80, partial η² = ·04); Mixed strings (M = 1·21, SE = 1·34; F(5,25) = 0·35, p = ·73, partial η² = ·07); Sequenced strings (M = 0·26, SE = 0·43; F(5,25) = 2·58, p = ·11, partial η² = ·34).
No significant interaction effects were found between mean number of gesture strings per session and age for any of the gesture string types: Path Only strings (F(5,25) = 0·26, p = ·78, partial η² = ·05); Manner Only strings (F(5,20) = 0·50, p = ·61, partial η² = ·11); Conflated strings (F(5,25) = 0·30, p = ·71, partial η² = ·06); Mixed strings (F(5,25) = 0·36, p = ·72, partial η² = ·05); Sequenced strings (F(5,25) = 2·65, p = ·10, partial η² = ·34).
Homesign compared to co-speech gesture
We looked first at the total number of manner and path gestures participants produced and found no significant differences across groups: (M = 9·9, SE = 1·1) per participant across the six vignettes for homesigners; (M = 6·2, SE = 0·6) for hearing adults; (M = 6·3, SE = 1·1) for hearing children; (M = 7·7, SE = 2·0) for hearing mothers; (F(3,42) = 2·00, p = ·13, partial η² = ·14), one-way ANOVA with group (homesigners, hearing adults, hearing children, hearing mothers) as the independent factor, and total number of manner and path gestures as the dependent factor. We then looked at the total number of gesture strings participants produced and also found no significant differences across groups: (M = 7·0, SE = 0·9) gesture strings per participant across the six vignettes for the homesigners; (M = 5·0, SE = 0·5) for the hearing adults; (M = 5·4, SE = 0·8) for the hearing children; (M = 6·4, SE = 1·4) for the hearing mothers; (F(3,42) = 1·15, p = ·34, partial η² = ·08), one-way ANOVA with group (homesigners, hearing adults, hearing children, hearing mothers) as the between-subjects independent factor, and number of gesture strings as the dependent factor.
We turned next to the types of gesture strings the participants produced. Figure 2 presents the number of gesture strings of each type produced by a participant, taken as a proportion of all gesture strings that the participant produced, and averaged across all of the participants within each of the four groups. In other words, the proportions in this figure (and all subsequent figures) were calculated by individual and then averaged to create a mean proportion per group.
We focused first on the gesture strings containing 1-event component strings, that is, strings containing either a Path or a Manner gesture (the two sets of bars on the left). A 4 × 2 ANOVA with group (homesigners, hearing adults, hearing children, hearing mothers) and string type (Manner only, Path only) as independent factors, and proportion of 1-event component strings as the dependent factor, revealed an effect of group (F(3,84) = 3·2, p = ·03, partial η² = ·1); an effect of string type (F(1,84) = 90·43, p < ·001, partial η² = ·53); and no interaction (F(3,84) = 2·45, p = ·07, partial η² = ·08). LSD pairwise post-hoc comparisons revealed that homesigners used 1-event component strings significantly less often (M = 0·43, SE = 0·04) than hearing adults (M = 0·84, SE = 0·05) (p = ·005) and hearing children (M = 0·80, SE = 0·07) (p = ·012), but not significantly less often than their mothers (M = 0·65, SE = 0·12) (p = ·199). All groups used more Path only strings than Manner only strings.
We then examined 2-component gesture strings in which both manner and path were conveyed. We again calculated the number of gesture strings of each type produced by a participant, taken as a proportion of all gesture strings that the participant produced, and averaged across all participants within each of the four groups. A 4 × 3 ANOVA with group (homesigners, hearing adults, hearing children, hearing mothers) and string type (Conflated, Mixed, Sequenced) as independent factors, and proportion of 2-component strings as the dependent factor, revealed an effect of group (F(3,126) = 8·9, p < ·001, partial η² = ·18) and string type (F(2,126) = 16·2, p < ·001, partial η² = ·21), and an interaction between string type and group (F(6,126) = 3·07, p = ·008, partial η² = ·13). We describe these effects for each of the string types in the next paragraphs.
LSD pairwise comparisons revealed that Conflated strings were used significantly more often by homesigners (M = 0·34, SE = 0·05) than by hearing adults (M = 0·09, SE = 0·03) (p < ·001) and hearing children (M = 0·09, SE = 0·04) (p = ·001). But homesigners did not differ significantly from their mothers (M = 0·22, SE = 0·06) (p = ·13). The hearing mothers produced approximately the same number of Conflated gestures whether or not they addressed their deaf child: the four hearing mothers who described the vignettes to their deaf child produced, on average, 1·8 Conflated gesture strings; the remaining three, who described the vignettes to the experimenter, produced 2·0. No other differences were found among the hearing groups.
Turning next to Mixed strings, LSD pairwise comparisons revealed that homesigners (M = 0·17, SE = 0·02) used significantly more Mixed forms than all three hearing groups: hearing children (M = 0·05, SE = 0·02) (p = ·002); hearing mothers (M = 0·07, SE = 0·05) (p = ·02); and hearing adults, who did not use any Mixed forms. Mothers also used Mixed forms significantly more often than the other hearing adults (p = ·05); however, only two hearing mothers produced Mixed combinations, one who described the vignettes to her deaf child and one who described them to the experimenter. In contrast, all seven deaf children produced at least one instance of a Mixed form. The participants created Mixed strings by combining Conflated gestures equally often with Manner gestures (19 in total) or with Path gestures (17 in total), and less often with both Manner and Path gestures (2 in total, both produced by homesigners).
No differences were found among the groups with respect to Sequenced strings (homesigners, M = 0·06, SE = 0·03; hearing adults, M = 0·07, SE = 0·02; hearing children, M = 0·05, SE = 0·02; and hearing mothers, M = 0·06, SE = 0·04). There was only one instance in which a Path gesture was sandwiched between two Manner gestures, and one in which a Manner gesture was sandwiched between two Path gestures, both produced by homesigners.
DISCUSSION
We have found that, when describing motion events, Turkish homesigners often mention both manner and path within a single gesture string, and do so significantly more often than hearing speakers do in the gestures they produce along with their speech to describe the same events. When homesigners mention both manner and path, they use two different forms: the Conflated form, in which manner and path are combined within a single gesture (manner + path), and the Mixed form, in which conflated gestures are combined with a segmented gesture for manner or path (e.g. manner + path − manner). Note that the Conflated form represents the motion event holistically and iconically since both the manner and the path take place simultaneously in the actual event. In contrast, the Mixed form segments out either the manner or the path and is thus a step away from iconicity. We also found that the homesigners produced the Mixed form significantly more often than hearing adults, hearing children, and their own hearing mothers, suggesting that this segmentation strategy is not directly copied from the gestural input that the children see. These patterns were present as early as age three in the homesigners.
STUDY 2
One difficulty with Study 1 is that the homesigners' data came from six observation sessions, whereas all of the hearing participants were observed only once. The fact that we found no differences across the deaf children's six sessions suggests that the relatively large numbers of Mixed gesture strings that the homesigners produced were not attributable to their being observed a number of times. However, to verify these findings, in Study 2 we asked an additional five Turkish homesigners and five Turkish hearing children to describe the same motion events only once, and examined their gestures.
In addition to replicating the patterns in Study 1, Study 2 had one other methodological goal. As described in Study 1, during the retellings of the vignettes, a still picture of the initial scene of the event, which included all objects in the event, was placed in front of the participants as a memory aid. All seven of the homesigners and eleven of the fourteen (·78) hearing children in Study 1 had referred to the pictures, and produced ·70 and ·51 of their manner and path gestures, respectively, on the pictures. In contrast, only four of the eighteen (·22) hearing adults and four of the seven (·57) hearing mothers in Study 1 referred to the pictures, producing ·09 and ·25 of their manner and path gestures, respectively, on the pictures. To determine whether the pictures had influenced the gestures that the children produced in Study 1, we modified the procedure in Study 2 and presented the vignettes without the pictures, thus making it likely that the children would produce their gestures in neutral space (i.e. at chest level) rather than on the pictures, as did the adults in Study 1.
METHODS
Participants
Five Turkish deaf children, ranging in age from 4;7 to 7;2 (M = 6;3), were videotaped once in their homes; none of the children had participated in Study 1. As in Study 1, all of the homesigners were congenitally deaf, with bilateral hearing losses (70–90 dB), did not have cochlear implants, used hearing aids, and had no other reported cognitive or physical disabilities. None had been exposed to a conventional sign language or had contact with another deaf child or adult. None of the children had attended preschool of any sort, and all spent their days at home with their mothers.
In addition, five Turkish hearing children, ranging in age from 3;9 to 6;8 (M = 5;3), and drawn from families of the same socioeconomic status as the deaf children, were videotaped once at home. All of the hearing children were native Turkish speakers.
Procedure, materials, coding
The procedure, materials, and coding for Study 2 were identical to Study 1, with the exception that the children were not given pictures to act as a memory aid during their retellings of the vignettes.
RESULTS
One of the hearing children produced no gestures and was therefore removed from the statistical comparisons. As in Study 1, we found that the deaf and hearing children did not differ significantly in the total number of manner and path gestures they produced: 10·2 (SE = 2·52) per participant for homesigners, 6·0 (SE = 2·12) for hearing children (F(1,7) = 1·52, p = ·26, partial η 2 = ·18), one-way ANOVA with hearing status as the independent factor and total number of manner and path gestures as the dependent factor. In addition, the deaf and hearing children also did not differ significantly in the total number of gesture strings they produced: 7·0 (SE = 1·18) gesture strings per participant for the deaf children, 4·75 (SE = 1·38) for the hearing children (F(1,8) = 1·55, p = ·25, partial η 2 = ·18), one-way ANOVA with hearing status (deaf, hearing) as the between-subjects independent factor, and number of gesture strings as the dependent factor.
Figure 3 presents the number of gesture strings of each type produced by a participant, taken as a proportion of all gesture strings that the participant produced, and averaged across the participants within each of the two groups. As in Study 1, we focused first on 1-component strings, containing either a Path or a Manner gesture. A two-way ANOVA with hearing status (deaf, hearing) and string type (Manner only, Path only) as independent factors, and mean proportion of 1-component strings as the dependent factor, revealed no main effect of hearing status (F(1,14) = 2·61, p = ·13, partial η 2 = ·16), a significant effect of string type (F(1,14) = 4·67, p = ·049, partial η 2 = ·25), and a significant interaction (F(1,14) = 6·96, p = ·02, partial η 2 = ·33). Further one-way ANOVA tests conducted to explore this interaction revealed that deaf children used fewer Path only strings than hearing children (F(1,7) = 10·05, p = ·02, partial η 2 = ·59). There were no differences in Manner only strings between groups (F(1,7) = 0·48, p = ·51, partial η 2 = ·06). Deaf children used Path only and Manner only strings equally often (F(1,8) = 0·21, p = ·66, partial η 2 = ·03). Hearing children, in contrast, produced more Path only than Manner only strings (F(1,6) = 6·84, p = ·04, partial η 2 = ·53).
Turning to the 2-component strings in which both manner and path were conveyed, we again calculated the number of gesture strings of each type produced by a participant, taken as a proportion of all gesture strings that the participant produced, and averaged across all participants within each of the two groups. The deaf children used all three types of 2-component strings: Conflated (M = 0·22, SE = 0·12), Mixed (M = 0·13, SE = 0·07), and Sequenced (M = 0·12, SE = 0·05). The hearing children produced only one type: Mixed (M = 0·06, SE = 0·06). A 2 × 3 ANOVA with group (homesigners, hearing children) and string type (Conflated, Mixed, Sequenced) as independent factors, and proportion of 2-component strings as the dependent factor, revealed an effect of group (F(1,21) = 5·3, p = ·03, partial η 2 = ·20), but no effect for string type (F(2,21) = 0·27, p = ·77, partial η 2 = ·03), and no interaction (F(2,21) = 0·57, p = ·58, partial η 2 = ·05). Thus, deaf children used significantly more Conflated, Mixed, and Sequenced string types than hearing children.
DISCUSSION
Study 2 paralleled the findings of Study 1. When describing motion events, the Turkish homesigners once again conveyed both manner and path within a single string, using Mixed, Conflated, and Sequenced forms to do so. Moreover, they produced fewer 1-component strings and more 2-component strings (of all types) than hearing children did in the gestures they produced along with their spoken descriptions of the same events. In addition, because we did not provide children in Study 2 with still pictures to use as memory aids, we can be certain that the patterns found in Study 1 (and replicated in Study 2) were not influenced by having pictures present during the retellings.
STUDY 3
Why do deaf homesigners use Conflated and Mixed forms so often to convey manner and path? As mentioned earlier, the homesigners use the manual modality as their sole means of communication. One possibility, then, is that Conflated and Mixed forms arise whenever communication is done with the hands alone. Alternatively, these forms may arise only in gestures that have been used for communication for many years and have transformed into a semi-structured system (as in the homesigners). Study 3 explores this possibility by observing the gesture strings that are produced when hearing speakers recruit the manual modality on-the-spot as their sole means of communication. We analyzed the gestures that the eighteen hearing adults produced when asked to describe the motion events without speech, comparing them first to the gestures that the same adults produced when describing the events with speech (analyzed in Study 1), and then to the gestures that the seven homesigners in Study 1 produced when describing the events.
METHODS
Participants
The participants were the eighteen hearing adults and the seven deaf homesigners who participated in Study 1. We used the homesigners in Study 1 (as opposed to Study 2) for this comparison because we found that twelve of the eighteen (·67) hearing adults referred to the pictures when asked to describe the events without speech, producing ·43 of their manner and path gestures on the pictures. As noted earlier, homesigners in Study 1 also produced many manner and path gestures on the pictures and, in this sense, are more comparable to the hearing adults in the no-speech condition than the homesigners in Study 2 who did not have the pictures available.
Procedure, materials, coding
The procedure, materials, and coding for Study 3 were identical to Study 1, with the exception that, in addition to describing the vignettes with speech (and spontaneous gesture), the hearing adults were asked to describe the vignettes a second time using only their hands and not their mouths. The order of retellings was always the same: first with speech, then without speech. We followed this order because putting the silent gesture condition first might have encouraged the participants to focus on gesture and, as a result, alter their subsequent co-speech gestures. The still pictures of the initial scene of each event were available to the participants when they described the events with and without speech, as was the case for all groups in Study 1.
RESULTS
Gesture without speech compared to co-speech gesture
The hearing adults produced, on average, 6·9 (SE = 0·2) total manner and path gestures across the vignettes when asked to gesture without speaking, a number that did not differ from the total number of manner and path gestures they produced while speaking (F(1,17) = 1·42, p = ·25, partial η 2 = ·08), repeated measures ANOVA with condition as the within-subjects factor and total number of manner and path gestures as the dependent factor. In contrast, the silent gesturers produced, on average, 6·0 (SE = 0·1) gesture strings across the six vignettes, significantly more gesture strings than they produced while speaking (F(1,17) = 4·63, p = ·046, partial η 2 = ·21), repeated measures ANOVA with condition as the within-subjects factor and total number of gesture strings as the dependent factor.
Figure 4 displays the number of gesture strings of each type produced by a participant, taken as a proportion of all gesture strings that the participant produced, and averaged across all of the participants within each of the two conditions: hearing adults when producing gestures with speech vs. without speech. We first focused on 1-component strings and conducted a 2 × 2 repeated measures ANOVA on the proportion of 1-component strings, with condition (gesture with speech, gesture without speech) as the within-subjects factor and string type (Manner only, Path only) as the between-subjects factor. We found main effects of condition (F(1,34) = 82·06, p < ·001, partial η 2 = ·71) and string type (F(1,34) = 77·28, p < ·001, partial η 2 = ·69), and an interaction between the two (F(1,34) = 42·25, p < ·001, partial η 2 = ·55). Adults used fewer 1-component strings in the gesture without speech condition than in the gesture with speech condition. The interaction reflected the fact that the difference in the proportion of Path only gesture strings adults produced with speech (M = 0·73, SE = 0·06) vs. without speech (M = 0·14, SE = 0·04) was greater (F(1,17) = 92·56, p < ·001, partial η 2 = ·85) than the difference in the proportion of Manner only gesture strings they produced with speech (M = 0·11, SE = 0·04) vs. without speech (M = 0·009, SE = 0·01; F(1,17) = 4·73, p = ·04, partial η 2 = ·22).
We conducted a similar analysis on the 2-component gesture strings, using a 3 × 2 ANOVA with condition (gesture with speech, gesture without speech) as the within-subjects factor and string type (Conflated, Mixed, Sequenced) as the between-subjects factor, and proportion of 2-component strings as the dependent factor. We found main effects of condition (F(1,51) = 102·1, p < ·001, partial η 2 = ·67) and string type (F(2,51) = 124·99, p < ·001, partial η 2 = ·83), and an interaction between the two (F(2,51) = 84·21, p < ·001, partial η 2 = ·77).
Adults used more 2-component strings in the gesture without speech condition than in the gesture with speech condition. Further post-hoc analyses (repeated measures) revealed that the adults produced significantly more Conflated gesture strings (F(1,17) = 157·27, p < ·001, partial η 2 = ·90) in the gesture without speech condition (M = 0·74, SE = 0·04) than in the gesture with speech condition (M = 0·07, SE = 0·03). No difference was found for the Sequenced strings (without speech, M = 0·03, SE = 0·02; with speech, M = 0·07, SE = 0·02) (F(1,17) = 0·85, p = ·37, partial η 2 = ·05). No Mixed forms were produced in the gesture with speech condition, and very few in the gesture without speech condition (M = 0·08, SE = 0·03).
When required to use only their hands to communicate, the hearing adults put all of the necessary information into their hands, producing gesture strings containing both manner and path. Moreover, they tended to combine manner and path within a single gesture, producing significantly more Conflated strings when they gestured without speech than when they gestured with it.
Gesture without speech compared to homesign
Next we compared the gestures that the hearing adults produced without speech to the gestures produced by the seven homesigners who participated in Study 1. Figure 5 displays the silent hearing adults' gesture strings in relation to the homesigners' gesture strings. We first focused on the proportion of 1-component strings the two groups produced. A 2 × 2 ANOVA with group (homesigners, silent gesturers) and string type (Path only, Manner only) as independent factors, and proportion of 1-component gesture strings as the dependent factor, revealed main effects of group (F(1,46) = 14·41, p < ·001, partial η 2 = ·24) and string type (F(1,46) = 25·23, p < ·001, partial η 2 = ·35), but no interaction between the two (F(1,46) = 2·56, p = ·12, partial η 2 = ·05). The homesigners conveyed more 1-component strings (M = 0·43, SE = 0·04) than the silent gesturers (M = 0·15, SE = 0·02) and thus did not appear to be as sensitive to the importance of conveying both manner and path information as the silent gesturers. Both groups produced more Path only than Manner only 1-component strings.
A similar analysis was conducted on the 2-component gesture strings. We conducted a 3 × 2 ANOVA, with group (homesigners, silent gesturers) and string type (Conflated, Mixed, Sequenced) as independent factors, and the proportion of 2-component strings as the dependent factor. We found main effects of group (F(1,69) = 8·69, p = ·004, partial η 2 = ·11) and string type (F(2,69) = 86·59, p < ·001, partial η 2 = ·72), and an interaction between the two (F(2,69) = 21·77, p < ·001, partial η 2 = ·39). Post-hoc tests revealed that the silent gesturers used Conflated gesture strings significantly more often than homesigners (F(1,23) = 27·13, p < ·001, partial η 2 = ·54). In contrast, homesigners used Mixed gesture strings marginally more often than the silent gesturers (F(1,23) = 3·6, p = ·06, partial η 2 = ·14). There were no differences between the groups in Sequenced strings (F(1,23) = 0·36, p = ·55, partial η 2 = ·02).
DISCUSSION
We have found that the gestures Turkish homesigners produce to describe spontaneous motion events look different from a hearing person's gestures, even when those gestures are produced without speech. When told to use gesture and not speech to describe motion events, Turkish hearing adults mentioned both manner and path within a single gesture string, and did so even more often than the deaf homesigners. Importantly, however, the hearing adults did not use the same forms to convey these pieces of information as the deaf homesigners – the hearing adults exclusively conflated manner and path into one gesture; the deaf children produced these Conflated forms, but also produced forms in which manner and/or path was segmented out into a separate gesture produced along with the Conflated gesture (i.e. the Mixed form). Thus, although the need to convey everything in the manual modality is likely to have encouraged both homesigners and silent gesturers to express manner and path within a single gesture string, this need did not dictate the form of the gestures: Silent gesturers relied exclusively on the Conflated form, whereas homesigners used the Mixed form as well as the Conflated form.
GENERAL DISCUSSION
Turkish homesigners, who are developing their communication systems without the benefit of a conventional language model, nevertheless express two basic elements of motion events – manner and path – in their gestures. These observations corroborate those of Zheng and Goldin-Meadow (Reference Zheng and Goldin-Meadow2002), who found that Chinese and American homesigners produced manner and path gestures when describing motion events in naturalistic interactions. However, our findings take the phenomenon several steps further by analyzing how these two elements are combined within a gesture string and comparing them to the gestures produced by hearing children and adults in the same cultural community.
Emphasis on both manner and path
We found, first, that in approximately half of their gesture strings (see the right-most bars in Figures 2 and 3), homesigners mention both manner and path within a single gesture string; moreover, they display this pattern as early as age three, with no further developmental change through age five. This finding is itself interesting given that many studies of preschool-aged hearing children describing motion events in speech (e.g. Allen et al., Reference Allen, Özyürek, Kita, Brown, Furman, Ishizuka and Fujii2007; Özyürek et al., Reference Özyürek, Kita, Allen, Brown, Furman and Ishizuka2008; Papafragou & Selimis, Reference Papafragou and Selimis2010) have found that children of this age tend to mention only one component of the motion event (typically the path, but see Bunger, Trueswell & Papafragou, Reference Bunger, Trueswell and Papafragou2012, and Papafragou, Massey & Gleitman, Reference Papafragou, Massey and Gleitman2006, for the effect of typology of the language), rather than mentioning both manner and path. We speculate that the homesigners' inclusion of both manner and path at these early stages may reflect the influence of modality. Recent work by Sümer, Zwitserlood, Perniss, and Özyürek (Reference Sümer, Zwitserlood, Perniss and Özyürek2013) has shown that deaf children learning Turkish Sign Language from their deaf parents frequently include both manner and path in their sign sentences beginning at age four, and do so significantly more often than age-matched hearing peers who are learning Turkish. It may be easier to convey both manner and path in the manual modality, which supports an iconic mapping between form and meaning.
Second, we found that when homesigners mention manner and path within a single gesture sentence, they often conflate the two components into one gesture (manner + path). Note that the conflated form portrays the motion event holistically. But homesigners go beyond holistic representation when they combine conflated gestures with a segmented gesture for manner or path – that is, when they produce the mixed form. Homesigners produce this form significantly more often than hearing adults, hearing children, and their own hearing mothers. Even when hearing adults are called upon to use only their hands to communicate, they rarely produce the mixed form, preferring instead to use the conflated form. Importantly, in another study of four Turkish homesigners (all of whom participated in Study 1) interacting with their hearing mothers in unscripted play sessions at home, we also found evidence of the mixed form; and, just as importantly, we found that the mothers did not produce the mixed form during these interactions (Goldin-Meadow, Namboodiripad, Mylander, Özyürek & Sancar, in press). We consider the implications of these findings for mixed and conflated gesture forms in the next sections.
The mixed form
The mixed form, which combines a conflated form with at least one segmented form, is interesting in large part because it represents a step towards segmentation and combination. Segmentation may not be difficult for Turkish homesigners to introduce into their action gestures simply because they routinely see hearing Turkish speakers produce decomposed gestures along with their descriptions of motion events (see Figure 1, and Kita & Özyürek, Reference Kita and Özyürek2003; Özyürek et al., Reference Özyürek, Kita, Allen, Furman and Brown2005, Reference Özyürek, Kita, Allen, Brown, Furman and Ishizuka2008). These segmented gestures could have served as a model for the homesigners' path alone and manner alone gestures (and even for their few sequenced gesture strings), but segmented gestures are not a good model for the homesigners' mixed form, which involves combining a segmented form with a conflated form. If the homesigners were merely taking their own conflated form and combining it with the most frequent segmented form they see (i.e. path), they should produce more mixed forms with path in the segmented slot (i.e. manner + path − path) than with manner in the segmented slot (manner + path − manner), and they do not.
But perhaps homesigners do not need to see segmentation and combination in order to use it; segmentation and combination offer a number of communicative benefits and thus may be easy for children to discover on their own, possibly facilitated by interaction sequences with their communicative partners. First, segmenting out one component of a simultaneous event allows the language user to combine that component with other elements, thus leading to new combinatorial possibilities not available with the conflated form alone. Second, segmenting out one component allows the language user to focus on one aspect of the event in a topic–focus construction. Importantly, this type of focusing is not possible with conflated or single-component constructions. As a result, the pressure to highlight certain components of an event might have led homesigners to produce mixed forms, and to do so more often than hearing speakers. Hearing speakers can use speech to accomplish this type of focusing and thus need not rely on co-speech gesture for this function. Interestingly, however, even when asked to use gesture on its own without speech, hearing speakers still do not use the mixed form as often as homesigners do – silent gesturers primarily produce conflated forms.
The mixed form represents a small step towards segmentation and combination within action gestures and it is interesting that the silent gesturers do not take it, particularly since they do exhibit other linguistic properties in the gestures they create on the spot (e.g. Goldin-Meadow et al., Reference Goldin-Meadow, So, Özyürek and Mylander2008). This finding suggests that some of the properties found in the homesigners' gestures may require time to develop. Note, however, that the onset of action segmentation and combination must have taken place prior to the onset of our study since the mixed form was present in the homesigners' earliest sessions and did not change in frequency over the six sessions (a fact that also suggests the form was not an adaptive response to the task demands per se). Generating a communication system over a period of years thus appears to be a process that is distinct from inventing gestures on the spot – although there are, of course, many differences between homesigners and silent gesturers (e.g. age, cognitive maturity), making time span only one of many potential factors that could account for the difference between the groups.
Turning to a longer time span (development over generations rather than over childhood), we note that our findings cohere well with Senghas et al. (Reference Senghas, Kita and Özyürek2004), who studied changes in how action is segmented in the newly evolving Nicaraguan Sign Language (NSL). Nicaraguan Sign Language was born thirty years ago when deaf children were brought together for the first time in an educational setting but with no sign language instruction. Every year, new students entered the school and the peers developed a common, rule-governed sign language (Kegl, Senghas & Coppola, Reference Kegl, Senghas, Coppola and DeGraff1999; Senghas & Coppola, Reference Senghas and Coppola2001). Senghas et al. (Reference Senghas, Kita and Özyürek2004) explored action segmentation across the first three cohorts of NSL, and found that each new cohort introduced more manner and path segmentation and sequencing (i.e. our sequenced forms) than the previous cohort. Interestingly, the co-speech gestures of the hearing community within which these individuals live displayed no segmentation; the gesturers conflate manner and path into a single gesture.
Our homesigners live in a very different cultural context, and are much younger, than the Nicaraguan gesturers and the NSL signers described by Senghas et al. (Reference Senghas, Kita and Özyürek2004), and we found that they developed a form not previously identified in either the Nicaraguan gesturers or the Nicaraguan signers. Prompted by the discovery of the mixed gesture string in our data, Senghas, Özyürek, and Goldin-Meadow (Reference Senghas, Özyürek, Goldin-Meadow, Smith, Schouwstra, de Boer and Smith2010, Reference Senghas, Özyürek, Goldin-Meadow, Botha and Everaert2013) reanalyzed the original Nicaraguan data reported in Senghas et al. (Reference Senghas, Kita and Özyürek2004) to determine whether and how often the mixed form was used. They discovered that all three cohorts of NSL signers, as well as the Spanish-speaking gesturers, produced the mixed form. However, the mixed form was the dominant response only for the first cohort of NSL signers, the transitional group between the speakers (whose gestures were not linguistically structured) and the second and third cohorts (whose signs were taking on more and more linguistic properties; cf. Senghas & Coppola, Reference Senghas and Coppola2001). Note, however, that the first cohort of NSL signers, who are older and therefore have spent more time in the emerging language community than the second and third cohorts, use segmentation and combination less often than any other group of signers (Senghas et al., Reference Senghas, Kita and Özyürek2004); time on task thus does not fully account for the rise of segmentation and combination. The age of the language creator/learner may also play an important role in allowing full segmentation and combinatorial possibilities to emerge (see Senghas, Reference Senghas2003).
We speculate on the basis of our Turkish homesign data that the original Nicaraguan homesigners who came together and created NSL were already producing some mixed sentences (containing conflated and segmented forms). The mixed gesture strings became the dominant form in the first cohort, setting the stage for the sequenced strings (containing only segmented forms) that have come to dominate the signs of the second and third cohorts. The fact that our homesigners produced the mixed form suggests that children who have not had contact with either an accessible conventional language model or other deaf individuals (i.e. children fashioning a communication system without a linguistic community) can introduce action segmentation and combination into their gestures. These children have thus taken a step towards segmentation not found in gesturers. However, the fact that our homesigners produced very few of the sequenced manner-path gesture sentences found primarily in the second or third cohorts of NSL suggests that they have not yet achieved the fully segmented and sequenced forms found in signers. Taken together, the findings suggest that the mixed form may constitute an early step in the emergence of manual communication systems – one that retains an element of iconicity and holistic representation (i.e. Conflated strings) while at the same time allowing the signer to single out and focus on a piece of the event. We might therefore expect to find the mixed form in homesigns developed around the globe, perhaps even in individuals who use homesign into adulthood.
The conflated form
The other dominant form in the homesigners was the conflated manner + path gesture used on its own. The homesigners used this form significantly more often than the hearing adults and hearing children did in their co-speech gestures. The hearing adults and children preferred instead to produce individual path gestures, as we would expect given the syntactic structures of Turkish, their spoken language (Kita & Özyürek, Reference Kita and Özyürek2003; Özyürek et al., Reference Özyürek, Kita, Allen, Furman and Brown2005, Reference Özyürek, Kita, Allen, Brown, Furman and Ishizuka2008). Given that the gestures speakers produce along with their descriptions of motion events tend to parallel the speech they accompany (Özyürek et al., Reference Özyürek, Kita, Allen, Furman and Brown2005, Reference Özyürek, Kita, Allen, Brown, Furman and Ishizuka2008), it is not surprising that the Turkish hearing speakers in Study 1 produced path-only speech descriptions in more than half of their responses. The fact that Turkish speakers used the conflated form in the silent condition, but relied primarily on decomposed gestures (path only and manner only, with a preference for the first) during speech also provides further evidence for the claim that gestures are shaped not directly by the imagery of the event but by the habitual linguistic packaging of event components (Kita et al., Reference Kita, Özyürek, Allen, Brown, Furman and Ishizuka2007).
The homesigners neither understood nor produced spoken Turkish. Their gestures were thus not constrained by Turkish and were free to assume whatever form the child chose. Their choice of the conflated form might have been motivated by iconicity, as this form represents the actual event more closely than the segmented forms. Another possibility is that the homesigners learned the conflated form from their hearing mothers. Recall that the hearing mothers used the conflated form more often than the other hearing adults and almost as often as their deaf children, possibly to be able to communicate with their child in the most iconic way possible. Moreover, when Turkish hearing adults are asked to communicate using only their hands, they increase the number of conflated gestures they produce. If the hearing mothers had addressed their children using gesture without speech, they might have produced an even greater number of conflated gestures. In this regard, it is important to note that the mothers of Turkish deaf homesigners rarely produce gestures without speech when addressing their children in spontaneous interactions (Flaherty & Goldin-Meadow, Reference Flaherty, Goldin-Meadow, Smith, Schouwstra, de Boer and Smith2010). Thus, although it is possible that the deaf children learned the conflated form from their hearing mothers, we cannot rule out the possibility that the hearing mothers produced their conflated gestures in response to their children's conflated gestures.
Gesture with speech and without it in the input to children
Our findings suggest that, aside from seeing their hearing mothers produce a slightly larger number of conflated gestures than hearing Turkish speakers typically produce, the Turkish deaf children in our study are likely to have been exposed to the same types of manner and path gestures as Turkish hearing children in their community (see also Goldin-Meadow & Saltzman, Reference Goldin-Meadow and Saltzman2000, who found few frequency differences between the gestures hearing mothers produce with their deaf vs. hearing children in China and the US). The difference between Turkish hearing and deaf children is that hearing children interpret the gestures they see in the context of speech – most severely to profoundly deaf children are unable to make effective use of the speech that surrounds them, even when provided with a hearing aid. This difference seems to affect how children use their gestural input. Hearing children integrate their gestures with the speech they hear, producing gestures comparable to those produced by hearing adults. In contrast, deaf children transform the gestures they see into a homesign system characterized by language-like structure; they use, for example, the mixed form found only rarely in any of the hearing speakers' gestures.
Our findings are the first to explicitly compare how deaf and hearing children respond to the gestures they see and, as such, they make it clear that there is no one ‘child’ gesture pattern. The findings also underscore the fact that gesture is part of an integrated gesture–speech system for hearing children, but must serve all of the functions of language for the deaf children and, as a result, needs transformation.
Gesture with speech and without it in the output
We found that the deaf children's gestures look different from a hearing person's gestures even when those gestures are produced without speech. When hearing adults are told to use only gesture to describe motion events, they (like the deaf children) find it essential to mention both manner and path within a single gesture sentence, presumably in response to the need to convey all of the relevant information in the manual modality. Importantly, however, this pressure does not dictate the form of the resulting gesture string – deaf homesigners, in addition to using the conflated form on its own, often add decomposed segments to the conflated gesture, thus creating the mixed form; hearing adults prefer to use the conflated form on its own.
One additional point is worth highlighting in this regard. Segmenting and sequencing the action components of an event appears to be less robust in communication than segmenting and sequencing an entity and the event in which it is involved. Hearing adults asked to communicate using only their hands routinely produce segmented and sequenced gestures representing the figure and path of an event (e.g. circle followed by path; Goldin-Meadow et al., 1996, 2008), as do homesigners. In contrast, neither group produces many segmented and sequenced gestures representing the manner and path of an event within a single gesture string (e.g. roll gesture, followed by down gesture). However, by producing a sizeable number of mixed forms (e.g. roll + down, followed by roll or down), the homesigners have taken a step towards action segmentation and combination that silent hearing adults do not take (see also Özçalişkan & Goldin-Meadow, 2013).
CONCLUSION
In sum, we have found that homesigners, who do not have access to a language model that they can process, introduce action segmentation and combination into their gestural communication systems even though the manual modality lends itself to holistic representation (e.g. rolling down a hill is a single act that is easily represented using a single gesture incorporating both manner and path). Homesigners do conflate manner and path in the same gesture, but they also combine those conflated gestures with segmented gestures for manner and/or path (the mixed gesture string), thus taking the first step toward a segmented representational form. In contrast, hearing speakers in the same community rarely combine conflated gestures with segmented gestures into a mixed string and, in fact, produce large numbers of conflated gestures only when they are forced to use gesture to communicate. Thus the mixed form may be indexing an intermediate stage in the development of manual language systems, one that bridges the transition from conflated forms that have no segmentation to sequenced forms that are fully segmented.
The segmentation patterns we observe in the homesigners' mixed form are consistent with patterns found in deaf children learning sign languages from their deaf parents. Deaf children have been found to display a preference for linear sequencing even in situations where adult signers use simultaneous constructions. For example, Meier (1987) found that children learning American Sign Language initially break complex verb expressions down into sequential morphemes, despite the fact that adult ASL signers produce these verb elements within a single simultaneous movement (see also Supalla, 1982; Newport, 1981). Our homesigners display similar tendencies even though they are not exposed to a conventional language model.
Our results are also in line with recent experimental and simulation studies of language emergence. The conflated representations of manner and path in homesign systems reveal an initial bias for iconic and holistic representation, corroborating claims about iconicity as the base out of which linguistic structures might have emerged (Garrod, 2007; Gasser, 2004; Theisen, Oberlander & Kirby, 2010) and as a feature that can still be found in modern-day languages, signed (Perniss, Thompson & Vigliocco, 2010) and spoken (Shintel, Nusbaum & Okrent, 2006). At the same time, our results underscore the fact that homesigning children are able to pull away from iconicity (even if not entirely), suggesting that children may be predisposed to prefer communication systems characterized by segmentation and combination. However, the fact that homesigners do not display the sequencing found in later cohorts of Nicaraguan Sign Language makes it clear that children cannot do it all, and that other forces (e.g. having a community within which the language is socially shared; transmitting the language from one generation to the next; Christiansen & Kirby, 2003; Fay, Garrod, Roberts & Swoboda, 2010; Goldin-Meadow, 2010; Senghas et al., 2010, 2013) must have collaborated to make human language what it is.