Co-speech gestures are hand and arm movements that are produced in conjunction with speech to convey some aspects of a speaker's message (Goldin-Meadow, 1999; Kendon, 1997; McNeill, 1992, 2005). The meaning of many co-speech gestures complements or supplements the meaning of the co-occurring speech, such that a speaker's message is conveyed by both words and gestures (Alibali, Evans, Hostetter, Ryan & Mainela-Arnold, 2009; Morrel-Samuels & Krauss, 1992). Some researchers have argued that speech and gesture are interactive: once initiated, both operate in conjunction (Krauss, Chen & Gottesman, 2000; see also Beattie, 2003). If so, then inhibiting either gesture or speech should impair the other modality (see Rauscher, Krauss & Chen, 1996, for effects of gesture restriction on verbal fluency). Most previous studies have focused on how people, in general, use gestures in relation to speech. Yet there are individual differences in people's verbal abilities as well as in their tendency to gesture. Researchers have started to explore some of the reasons for these individual differences, including language proficiency (particularly in a second language; Nagpal, Nicoladis & Marentette, 2011), personality (Hostetter & Potthoff, 2012), and cultural background (Smithson, Nicoladis & Marentette, 2011). In this study, we test whether individual differences in working memory capacity can predict differences in the frequency of gesture use in bilinguals and monolinguals.
As previous studies have pointed to the possibility that bilinguals and monolinguals might use working memory differently, we must first test whether there are differences in working memory architecture between bilinguals and monolinguals.
In hypothesizing that working memory capacity might predict gesture use, we rely on a cognitive model of speech-gesture production proposed by Krauss et al. (2000), referred to from here on as the Speech-Gesture Production model. Within this model it is proposed that iconic gestures and speech rely upon two production systems that operate jointly. Gesture and speech are both thought to emerge from the activation of representations in working memory. Working memory is a theoretical system which “underlies human thought processes” (Baddeley, 2003, p. 829) by temporarily storing and manipulating information. According to the multi-component working memory model (Baddeley, 2000, 2003), the phonological loop is the component of working memory that is responsible for the storage and rehearsal of language information, the visuospatial sketchpad is responsible for the storage and maintenance of visual and spatial information, and the central executive is responsible for the manipulation and modification of information in the phonological loop and the visuospatial sketchpad. Verbal short-term memory refers to the storage and rehearsal of language information in the phonological loop, whereas verbal working memory refers to the storage, rehearsal, and manipulation of language information requiring the use of both the phonological loop and the central executive. Visuospatial short-term memory refers to the storage and maintenance of visual and spatial information in the visuospatial sketchpad, whereas visuospatial working memory refers to the storage, maintenance, and manipulation of visual and spatial information requiring the use of both the visuospatial sketchpad and the central executive (Baddeley, 2000, 2003).
Krauss et al. (2000) assert that the contents of long-term memory are often multiply encoded in different representational formats (i.e. verbal and visuospatial) and that when one type of representational format is activated in working memory, it tends to activate related concepts in other formats. For example, remembering the image of a bird could activate the word “bird”. Within this model, visuospatial working memory gives rise to gesture production, while verbal working memory gives rise to speech articulation. Several processes are involved in the transformation of information held in visuospatial working memory into overt gestures. Similarly, several processes are involved in the transformation of information held in verbal working memory into overt speech. Although these two sets of processes are autonomous, they are critically interactive. In this model, iconic gestures can facilitate speech production by cross-modally priming the relevant word in the formulation stage of speech production (Krauss et al., 2000). This implies that if verbal working memory is not activating the relevant words for speech efficiently, gesture production may play an especially important role in cross-modally priming relevant lexical terms. In other words, gestures may play a crucial compensatory role in speech production processes.
Evidence suggesting a compensatory role of gesture production in speech
Across a number of different experimental methods, gesture production has been shown to facilitate access to words and linguistic constructions during speech (Alibali, Kita & Young, 2000; Frick-Horbury & Guttentag, 1998; Krauss et al., 2000; Morrel-Samuels & Krauss, 1992; Rauscher et al., 1996). Some studies have shown that the production of gestures can compensate for difficulties in speech production or can facilitate speech production when cognitive processing is difficult. For example, Iverson and Braddock (2011) reported that children with language impairments used a higher rate of gestures in comparison to typically developing children. They concluded that gesture production “takes on a compensatory role, conveying information that may be difficult for the speaker to encode or express in oral language” (Iverson & Braddock, 2011, p. 84). In a study by Melinger and Kita (2007), conceptual load was manipulated without altering the needs for speech formulation. Inducing a higher conceptual load led participants to use more gestures during speech. In a study by Tellier (2008), foreign language words were presented to children with accompanying gestures or pictures. Children who reproduced gestures as they repeated the words showed enhanced memory for these items.
Other research has shown that gesture rates are higher during fast speech (Rauscher et al., 1996); gesture rates are higher during extemporaneous speech (Chawla & Krauss, 1994); and tip-of-the-tongue states are more likely to be resolved when participants are allowed to gesture (Frick-Horbury & Guttentag, 1998; see also Beattie & Coughlan, 1999). Taken together, these studies suggest that when verbal resources are taxed, gestures can help with accessing and/or producing speech.
Verbal memory as a predictor of gesture production
According to the Speech-Gesture Production model proposed by Krauss et al. (2000), differences in verbal working memory ability may lead to differences in iconic gesture production. Because of the compensatory role that gestures play in speech production, difficulties with verbal working memory processing may be associated with an increase in gesture production. To date, research investigating the relationship between gesture production and verbal memory resources has not been conclusive. In a study by Hostetter and Alibali (2007), individual differences in verbal and spatial skill were assessed as predictors of gesture production. Participants’ verbal skill was assessed using two tasks: a phonemic fluency task and a semantic fluency task. Participants’ spatial skill was assessed using a paper folding task. Gesture production was collapsed across a narrative production task and a package wrapping task. The results revealed that lower verbal skill (as measured by the phonemic fluency task) and higher spatial skill were associated with greater gesture use. This study provides suggestive evidence that gesture use may be negatively associated with verbal memory and positively associated with visuospatial memory. However, since the tasks used in this study were designed to measure skill rather than memory, this conclusion is largely speculative.
In a separate study that investigated whether gestures were more strongly associated with verbal or visuospatial working memory, Wagner, Nusbaum and Goldin-Meadow (2004) assigned participants to a verbal working memory condition (wherein they were asked to remember a string of letters) or a visuospatial working memory condition (wherein they were asked to remember a pattern on a grid) while solving mathematical problems. Additionally, some participants were allowed to gesture while others were restricted from gesturing. The reasoning was that if gestures were more strongly associated with verbal working memory resources, gesture use would disrupt memory to a greater extent in the verbal working memory condition. Alternatively, if gestures were more strongly associated with visuospatial working memory resources, gesture use would disrupt memory to a greater extent in the visuospatial working memory condition. Participants performed better on both memory assessments when they were allowed to gesture. Importantly, this effect was only apparent when gesture meaning matched the verbal information that was conveyed in speech. This suggests that gesture production may reduce the cognitive load on verbal working memory. However, this study does not allow any conclusions to be drawn about whether individual differences in verbal working memory capacity predict gesture use. In sum, whether working memory resources can predict iconic gesture use remains unclear.
Verbal memory as a predictor of gesture production among bilinguals
Both of the studies mentioned above that suggest a role of verbal memory in predicting gesture production were conducted with monolinguals. Some research has shown that bilinguals may allocate their cognitive resources differently from monolinguals in processing language and cognitive tasks, particularly with regard to the central executive (Carlson & Meltzoff, 2008). Since verbal working memory requires the use of both the phonological loop and the central executive, differences in either of these contributing components would be expected to influence gesture production, given the Speech-Gesture Production model. The central executive is responsible for inhibition, among other functions (Miyake, Friedman, Emerson, Witzki & Howerter, 2000). Among bilinguals, when one language is being spoken, the other is active and accessible (Chee, 2006; Crinion, Turner, Grogan, Hanakawa, Noppeney, Devlin, Aso, Urayama, Fukuyama, Stockton, Usui, Green & Price, 2006; Kroll, Bobb & Wodniecka, 2006). Therefore, bilinguals must allocate attentional resources to ensuring that they select and communicate words from the relevant language rather than words from the competing language (Bialystok, 2009). As a result, inhibitory control may be required whenever bilinguals speak in one of their languages. Research has shown that bilinguals exhibit superior inhibitory control in comparison to monolinguals (Carlson & Meltzoff, 2008), suggesting that the frequent use of inhibitory control may strengthen this component of executive functioning among bilinguals.
The investigation of differences between monolinguals and bilinguals with respect to the central executive is complicated by the fact that bilingualism is often associated with variables that influence performance on these tasks, such as socioeconomic status (SES) (Bialystok, 2001). In a study by Morton and Harper (2007), bilinguals and monolinguals were compared on the Simon task (a measure of cognitive control). Previous researchers have shown that bilinguals are faster and more precise on this task than monolinguals (Bialystok, Craik, Klein & Viswanathan, 2004). Morton and Harper (2007) found that monolingual and bilingual children performed similarly on this task when comparing children from identical ethnic and SES backgrounds. It was argued that the bilingual advantage in cognitive control that has frequently been claimed in the literature may be attenuated by controlling for differences in SES. Though this study merits consideration, other research shows that bilingual executive advantages persist even when controlling for SES. For example, in a study by Barac and Bialystok (2012), monolingual and bilingual children with equivalent general cognitive level, psychomotor speed, and SES were compared on an executive control task requiring task switching. The results demonstrated that three different groups of bilinguals (Chinese–English, French–English, and Spanish–English) outperformed the monolingual children by showing smaller switching costs. In a study by Carlson and Meltzoff (2008), Spanish–English bilingual children, Spanish–English immersion children, and an English monolingual control group were assessed on a variety of executive functioning tasks. When the researchers controlled for the effects of SES, verbal ability and age, the bilingual group showed a significant advantage over both other groups on the executive function battery (Carlson & Meltzoff, 2008). Therefore, although still a topic of debate, it is widely believed that bilinguals have enhanced executive functioning in comparison to monolinguals.
Bilingualism may also influence the communication between the phonological loop and the visuospatial sketchpad. The bilingual dual-coding model has been proposed in order to explain how bilinguals use verbal and imaginal representations throughout speech production (Paivio & Desrochers, 1980, as cited in Paivio, Clark & Lambert, 1988). According to this model, bilinguals rely on two sets of verbal representations (one in each of their languages) and a single imagery system. The verbal representations from both languages are interconnected in such a way that translation equivalents rely upon the same imagery representations (Paivio et al., 1988). While there is some debate about whether bilinguals are more reliant than monolinguals on visuospatial abilities in general, or solely in the early stages of language learning (Leonard, Brown, Travis, Gharapetian, Hagler, Dale, Elman & Halgren, 2010), the structure underlying the processing of verbal and visuospatial information may be different for monolinguals and bilinguals. If so, then models that treat working memory as a predictor of gesture use, and that were developed with monolinguals, may not generalize to bilinguals.
The present study
The purpose of this study was to examine the association between gesture production and verbal working memory among monolinguals and bilinguals.
Two research questions were investigated:
Are there different architectures of WM in monolinguals and bilinguals?
The association between verbal working memory and iconic gesture production may differ between monolingual and bilingual groups, since differences have been documented concerning both the central executive (an integral component of verbal working memory; Barac & Bialystok, 2012; Carlson & Meltzoff, 2008; see also Morton & Harper, 2007) and the relative activation of verbal and visuospatial working memory during the processing of words (Leonard et al., 2010).
Verbal short-term memory, verbal working memory, visuospatial short-term memory, and visuospatial working memory were all measured and correlated using a standardized memory assessment in order to thoroughly evaluate the pattern of interrelations that exists between memory components among monolinguals and bilinguals. It was predicted that the interactions between the verbal and visuospatial components of working memory would be more pronounced among bilinguals in comparison to monolinguals.
Are there different predictors of gesture frequency among monolinguals and bilinguals?
The Speech-Gesture Production model suggests that a negative association may exist between verbal working memory and iconic gesture production. If an association exists between these factors, it would provide important insight into how iconic gestures serve a compensatory role in speech production processes. Since previous research is suggestive of a negative association between verbal memory and gesture production and a positive association between visuospatial memory and gesture production (Hostetter & Alibali, 2007), it was predicted that both verbal short-term and verbal working memory would be negative predictors of iconic gesture production and that both visuospatial short-term and visuospatial working memory would be positive predictors of iconic gesture production. No specific predictions were made with respect to whether short-term memory or working memory resources would show stronger associations with gesture production. It was also predicted that if the architecture of WM differed between monolinguals and bilinguals, differences would also emerge in the association between gesture production and working memory resources.
Method
Participants
Monolinguals
All monolingual participants were recruited from the University of Alberta in Edmonton, Alberta. A sample of 30 English monolingual adults originally participated in this study. These participants ranged in age from 18–75 years. Since working memory is thought to be influenced by age, only the 23 participants who were in their teenage years or in their twenties were included in the final analyses.
Participants were considered to be monolingual even if they had studied a foreign language for a year or if they had non-fluent knowledge of another language. Though non-fluent knowledge of another language has been shown to have effects on gesture viewpoint (Brown, 2008), no evidence has shown that this influences gesture frequency.
Participants in the final sample of 23 ranged in age from 18–28 years (M = 21.13, SD = 2.46). Both male and female participants were included in this study. The ratio of male to female participants was 8:15. No significant gender differences were found on any of the independent or dependent measures.
Bilinguals
All bilingual participants were recruited from the University of Alberta in Edmonton, Alberta. A sample of 30 English–French bilingual adults originally participated in this study. Two participants were excluded from the analyses since one participant scored two standard deviations below the mean on the assessment of verbal working memory and one participant was unable to relay a fluent narrative in both languages. Four additional participants were excluded from the analyses since they were 30 years old or older. Only participants in their teenage years and their twenties were included in the final analyses.
Among the 24 participants included in the analyses, French was the first language of 10 of the participants, English was the first language of 6 of the participants, and the remaining 8 were simultaneous bilinguals. Participants ranged in age from 18–28 years (M = 20.88, SD = 2.36). Both male and female participants were included in this study. The ratio of male to female participants among the bilinguals was 5:19. No significant gender differences were found on any of the independent or dependent measures.
Materials
Vocabulary assessments
In order to assess the English vocabulary of both the English monolinguals and the English–French bilinguals, the Peabody Picture Vocabulary Test – Third edition (PPVT–IIIA) was used (Dunn & Dunn, 1997). In order to assess the French vocabulary of the English–French bilinguals, the Echelle de Vocabulaire en Images Peabody (EVIP) was used (Dunn, Thériault-Whalen & Dunn, 1993).
Pink Panther cartoons
Two segments of Pink Panther cartoons were shown to the adults. No words are uttered by any of the characters in the selected cartoons. The first was entitled “In the Pink of the Night” and lasted four minutes and two seconds. The second video was entitled “Jet Pink” and lasted four minutes and 14 seconds. The cartoons were shown one right after the other. In the first video, Pink Panther is being woken up by a cuckoo bird. The Pink Panther tries desperately to silence the cuckoo bird. Eventually the Pink Panther ends up becoming friends with the bird. In the second video, Pink Panther decides that he wants to be a famous pilot. He gets into an airfield for military jet airplanes and proceeds to fly into the atmosphere and around a city until, finally, he gets ejected from the plane.
Automated Working Memory Assessment
A four-subtest working memory battery called the Automated Working Memory Assessment Short-Form (AWMA-S) (Alloway, 2007) was used to evaluate the adults’ verbal short-term memory, verbal working memory, visuospatial short-term memory and visuospatial working memory. The AWMA is a standardized, computerized testing assessment. The assessment scoring is automated and the testing sequence is pre-set. When participants were prompted for a response on the assessments, the experimenter checked the accuracy of their response using an AWMA answer manual in a location not visible to the participant. The experimenter then indicated whether the participant was correct or incorrect by using either the forward or backward arrow keys respectively. The indication of a correct or incorrect response by the experimenter using the arrow keys led to either a new test item or ended the task depending upon how many errors the participants had accumulated on that particular level of the assessment. Participants did not receive feedback as to whether their responses were correct or incorrect. Once the four tasks were completed, all working memory scores were available for the experimenter in an Excel file. Descriptions of each of the four tasks are outlined below. For additional detail concerning these tasks, see Alloway (2007).
Verbal short-term memory was assessed using a digit recall task. Participants heard a sequence of digits and were asked to recall the digits orally in the correct order. The test becomes progressively more difficult as the digit span increases on subsequent trials.
Verbal working memory was assessed using a listening recall task. Participants heard a series of spoken sentences. They were first asked to orally identify the sentence as being true or false, and they were subsequently asked to orally recall the last word of each sentence in the correct sequence. The task increases in difficulty as more sentences are added.
Visuospatial short-term memory was assessed using a task called the dot matrix. In this task participants were shown the location of a red dot in a series of 4 × 4 grids and were asked to recall the position by pointing to the squares on the computer screen that contained the red dot in the same order that the dot(s) appeared (no oral response was required on this task). The test becomes increasingly difficult as the number of dots to be remembered increases.
Visuospatial working memory was assessed using a spatial span task. This task requires participants to view a screen with two shapes. The shape on the right side of the screen had a red dot in one of three locations. Participants were asked to identify whether the shape on the right was the same or opposite to the shape on the left side of the screen orally by saying “same” or “opposite”. The shape with the red dot was rotated at various angles during each trial. Participants were asked to recall the location of each red dot on the shape in the exact sequence that it was presented by pointing to the locations of the red dots on an image with three dots in the form of a triangle. The shapes remained on the screen until participants identified whether the shape on the right was in the same orientation as the shape on the left. In addition, the points in the form of a triangle remained on the screen as the participants pointed to areas where the red dots had been presented.
Procedure
All participants gave informed consent to participate in the study by signing a consent form. Within the consent form it was noted that many aspects of their narrative production (including speech and gesture production) would be assessed. Bilingual participants completed an English session and a French session and the ordering of these sessions was counterbalanced. Monolinguals completed only an English session. Participants were thanked for their time and participation, and they were offered an honorarium of ten dollars for each visit.
English session procedure
An English-speaking experimenter conducted the English sessions and spoke exclusively in English throughout the entire duration of this session. Participants completed the PPVT vocabulary assessment. Subsequently, participants were asked to watch the Pink Panther cartoons alone in a testing room. When the videos were finished, participants were asked to retell the cartoons in narrative form to a native English-speaking experimenter as they were videotaped. They were not given a time limit for their retellings.
Subsequently, participants’ working memory was assessed using the AWMA-S. The experimenter sat next to the participant and used the arrow keys on the computer to indicate whether the participant responded correctly or incorrectly to each question posed by the assessment tool. Subsequently the program calculated the scores for each subtest automatically.
The order of the English session was constant in that participants would always complete the vocabulary assessment first, followed by the narrative production task, and subsequently they would complete the working memory assessment.
French session procedure
An English–French bilingual experimenter conducted the French sessions and spoke exclusively in French throughout the entire duration of this session. Participants completed the EVIP vocabulary assessment. Subsequently, participants were asked to watch the Pink Panther cartoons alone in a testing room. When the videos were finished, the participants were asked to retell the cartoons in narrative form to an English–French bilingual experimenter. They were not given a time limit for their retellings.
The vocabulary assessment always preceded the narrative task in this session.
Transcription and coding of speech
All English sessions were transcribed by a native English speaker and the French sessions were transcribed by an English–French bilingual speaker in orthographic words. For each session, the total number of words (i.e. word tokens) used to tell the story including false starts and repetitions was counted.
Coding gestures
Using the coding system developed by McNeill (1992), four types of gesture were coded during the analysis of the videos: iconic, deictic, conventional, and beat. Iconic gestures make use of shapes or actions to represent an object (e.g., flapping fingers to indicate a flying bird). Deictic gestures are pointing gestures towards a person or an object (e.g., extending the index finger towards an object). Conventional gestures are recognized by adults without the need of speech (e.g., thumbs up). Beat gestures are up and down movements of the hands. When a gesture could not be clearly identified, it was labeled as an unknown gesture. Only iconic gestures are reported here since they are the type of gesture thought to be most strongly associated with speech (Krauss et al., 2000). According to McNeill (1992, p. 79), “[a] gesture is iconic if it bears a close formal relationship to the semantic content of speech”. Iconic gestures may also be the type of gesture most strongly associated with imagery (Hadar & Butterworth, 1997; Morsella & Krauss, 2004; Wesp, Hesse, Keutmann & Wheaton, 2001). It has been argued that the imagery associated with iconic gesture production facilitates lexical retrieval (Hadar & Butterworth, 1997).
All of the speech was coded using CHAT conventions (MacWhinney, 2000) in order to be processed by software called CLAN (MacWhinney, 2000) to determine the number of word tokens produced by the participant. In order to be labeled as an iconic gesture, the gesture had to be produced with a clear preparation, stroke, and retraction. Additionally, the gesture had to depict an aspect of a referent. For each iconic gesture that was produced, a separate line in the transcript was created identifying the gesture clearly with an “(I)” and with a sentence describing the gesture, the hand(s) used to produce the gesture and also what motion was produced (e.g., gesture meaning: right hand moves upwards in the shape of a fist and then moves downwards quickly, perhaps describing the action of breaking the clock (I)).
Gesture rate was calculated as the number of iconic gestures produced divided by the number of word tokens (in order to control for the length of participant narratives). This rate was then multiplied by 100, for ease of interpretability.
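The rate calculation described above can be sketched in a few lines of Python. This is only an illustration: the function name and the example counts (19 iconic gestures, 400 word tokens) are hypothetical and are not taken from the study's data.

```python
def gesture_rate(iconic_gestures: int, word_tokens: int) -> float:
    """Return iconic gestures per 100 word tokens.

    Dividing by word tokens controls for narrative length;
    multiplying by 100 is only for ease of interpretation.
    """
    if word_tokens <= 0:
        raise ValueError("word_tokens must be positive")
    return 100.0 * iconic_gestures / word_tokens


# Hypothetical narrative: 19 iconic gestures over 400 word tokens.
rate = gesture_rate(19, 400)
print(f"{rate:.2f} iconic gestures per 100 words")  # 4.75 iconic gestures per 100 words
```

A speaker who produces the same number of gestures in a narrative twice as long would receive half the rate, which is the intended length correction.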
Results
The PPVT was used to assess English vocabulary and the EVIP was used to assess French vocabulary. On the PPVT, monolinguals had a mean score of 115.13 (SD = 9.89) and bilinguals had a mean score of 115.21 (SD = 9.72). On the EVIP, bilinguals had a mean score of 106.54 (SD = 12.69). Both measures of vocabulary among bilinguals are above the standard score of 100, suggesting fluency in both languages. It is important to note that bilinguals as a group tended to perform slightly better on the English vocabulary assessment than on the French vocabulary assessment.
Table 1 summarizes the means, standard deviations, and observed ranges of the relevant variables for this study among monolinguals and bilinguals. With respect to verbal short-term memory, monolinguals scored a mean of 39.57 (SD = 5.42) and bilinguals scored a mean of 37.08 (SD = 7.96). With respect to verbal working memory, monolinguals scored a mean of 19.83 (SD = 4.04) while bilinguals scored a mean of 19.13 (SD = 4.92). With respect to visuospatial short-term memory, monolinguals scored a mean of 31.87 (SD = 5.48) while bilinguals scored a mean of 31.79 (SD = 5.45). With respect to visuospatial working memory, monolinguals scored a mean of 30.70 (SD = 6.24) while bilinguals scored a mean of 27.29 (SD = 7.94). With respect to gesture rate, monolinguals had a gesture rate of 4.74 (SD = 2.55) in comparison to the bilingual French gesture rate of 2.66 (SD = 1.90) and the bilingual English gesture rate of 6.23 (SD = 3.00).
Independent samples t-tests were conducted on all memory measures and English narrative production measures. These comparisons revealed no significant differences between the monolinguals and bilinguals on any of the measures.
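As a rough consistency check, a pooled-variance t statistic can be recomputed from the summary statistics reported above. The sketch below makes several assumptions that are not stated in this section: group sizes of 23 monolinguals and 24 bilinguals (inferred from the regression degrees of freedom reported later), a pooled-variance (Student) test, and a two-tailed alpha of .05. The function name is hypothetical.

```python
import math


def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t for two independent groups, from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# ASSUMED group sizes, inferred from the regression degrees of freedom.
N_MONO, N_BI = 23, 24

# Visuospatial working memory: monolinguals vs. bilinguals (reported means, SDs).
t_vswm, df = pooled_t(30.70, 6.24, N_MONO, 27.29, 7.94, N_BI)

# Approximate two-tailed critical t at alpha = .05 for df = 45 (assumed alpha).
T_CRIT = 2.014

print(f"t({df}) = {t_vswm:.2f}, significant: {abs(t_vswm) > T_CRIT}")
```

Under these assumptions, the largest memory difference between the groups (visuospatial working memory) stays below the critical value, which agrees with the report of no significant group differences.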
Correlations were computed in order to determine whether the working memory architecture differs between monolinguals and bilinguals. Table 2 summarizes the correlations between all memory measures among the monolinguals. These correlations reveal that verbal short-term memory is significantly correlated with verbal working memory, and that visuospatial short-term memory is significantly correlated with visuospatial working memory. None of the correlations between the verbal and visuospatial memory measures were significant.
*p = .05
Table 3 summarizes the correlations between all memory measures among the bilinguals. These correlations reveal that all memory measurements are either significantly or marginally significantly associated with one another. All correlations between memory measures were significant at either the p = .01 or p = .05 level except for the association between visuospatial short-term memory and verbal working memory which was marginally significant at the p = .06 level. No analyses were carried out according to the first language of the participants, as the sample sizes were too small.
* p = .05; ** p = .01; *** p < .06
Regression analyses
Forward linear regression analyses were conducted in order to determine whether memory resources predict iconic gesture use among monolinguals and bilinguals. All four memory measures were used as possible predictors in the analyses.
Monolinguals
A forward linear regression led to a significant model, F(1, 21) = 11.489, p = .003. The adjusted R² value was .323, indicating that 32.3% of the variance in iconic gesture rate can be explained by verbal short-term memory (Beta = –0.595, p = .003). The regression equation is as follows: Y′ = 15.83 – 0.280(x1) (where x1 = verbal short-term memory).
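To illustrate how this equation is read, the sketch below evaluates it at the monolingual mean verbal short-term memory score and recovers the adjusted R² from the reported F statistic. It assumes a single-predictor model with an intercept, so that n equals the residual degrees of freedom plus 2 (that is, 23). The function names are hypothetical.

```python
def predict(intercept: float, slope: float, x: float) -> float:
    """Predicted gesture rate from a one-predictor regression equation."""
    return intercept + slope * x


def r2_from_f(f: float, df_model: int, df_resid: int) -> float:
    """Recover R squared from an F statistic and its degrees of freedom."""
    return (f * df_model) / (f * df_model + df_resid)


def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted R squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Reported monolingual model: Y' = 15.83 - 0.280 * (verbal short-term memory).
rate_at_mean = predict(15.83, -0.280, 39.57)  # 39.57 = reported group mean

# ASSUMPTION: one predictor plus intercept, so n = 21 + 2 = 23.
r2 = r2_from_f(11.489, 1, 21)
adj = adjusted_r2(r2, n=23, k=1)

print(f"predicted rate at mean: {rate_at_mean:.2f}")
print(f"R2 = {r2:.3f}, adjusted R2 = {adj:.3f}")
```

The predicted rate at the group mean is about 4.75, close to the reported monolingual mean gesture rate of 4.74, and the recovered adjusted R² rounds to .323. The negative slope means that each additional point of verbal short-term memory is associated with a gesture rate about 0.28 lower.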
Bilinguals
English session
No variables were included in the regression equation since none of the variables were significant predictors of iconic gesture production (note: only significant predictors are included in forward regression analyses).
French session
A forward linear regression led to a significant model, F(1, 22) = 5.263, p = .032. The adjusted R² value was .156, indicating that 15.6% of the variance in iconic gesture rate can be explained by verbal working memory (Beta = –0.439, p = .032). The regression equation is as follows: Y′ = 5.897 – 0.169(x1) (where x1 = verbal working memory).
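The reported coefficients for the French session can be cross-checked against one another. With a single predictor, the magnitude of the standardized coefficient equals the square root of R², which can itself be recovered from F. The sketch below relies on that single-predictor assumption; the function name is hypothetical.

```python
import math


def beta_magnitude_from_f(f: float, df_resid: int) -> float:
    """|standardized beta| for a one-predictor model: sqrt(F / (F + df_resid))."""
    return math.sqrt(f / (f + df_resid))


# Reported French-session model: F(1, 22) = 5.263.
beta_abs = beta_magnitude_from_f(5.263, 22)

# Reported equation Y' = 5.897 - 0.169 * (verbal working memory),
# evaluated at the bilingual mean verbal working memory score of 19.13.
rate_at_mean = 5.897 - 0.169 * 19.13

print(f"|Beta| = {beta_abs:.3f}")
print(f"predicted French gesture rate at mean: {rate_at_mean:.2f}")
```

Both values agree with the reported figures: the recovered magnitude is about 0.439, and the prediction at the group mean is about 2.66, the reported bilingual French gesture rate. The sign of Beta is not recoverable from F and is taken from the negative slope.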
Discussion
The Speech-Gesture Production model asserts that gestures and speech are critically interactive, since gesture can play a role in cross-modally priming relevant words during the formulation stage of speech production (Krauss et al., Reference Krauss, Chen, Gottesman and McNeill2000). This model also asserts that verbal working memory gives rise to speech, while visuospatial working memory gives rise to gestures. The purpose of the current study was to investigate the association between iconic gesture production and verbal working memory among two populations who are thought to differ with respect to their central executive, a critical component of verbal working memory.
The architecture of working memory among monolinguals and bilinguals
The first research question addressed was whether the architecture of working memory is consistent across monolinguals and bilinguals. This was an important consideration since the relations between working memory components allow for more precise interpretation of results concerning an association between verbal working memory and iconic gesture production.
We have found evidence that the working memory architecture differs between these groups. Among monolinguals, verbal memory resources and visuospatial resources were relatively independent. Among bilinguals, there was a positive association between verbal short-term memory and visuospatial short-term memory, and a positive association between verbal working memory and visuospatial working memory. According to Alloway, Gathercole and Pickering (Reference Alloway, Gathercole and Pickering2006, p. 1698), strong associations between verbal and visuospatial working memory would suggest that they are “supported by a common resource pool” as opposed to being “maintained by separable cognitive resources”. The strong associations between verbal and visuospatial working memory among bilinguals may indicate that tasks requiring the temporary storage, maintenance or manipulation of verbal and/or visuospatial information are reliant upon the combined efforts of the verbal and visuospatial processing components. More specifically, this may indicate that bilinguals transform verbal information into a visuospatial format and vice versa more frequently than monolinguals when temporarily storing information in memory (Paivio et al., Reference Paivio, Clark and Lambert1988). As the bilinguals in this study were fluent in both languages, this different memory architecture may outlast the earliest stages of language acquisition (see Paivio et al., Reference Paivio, Clark and Lambert1988).
It has been argued that by the age of six years “[t]he central executive is linked closely with both the phonological loop and the visuospatial sketchpad, which are themselves relatively independent” (Gathercole, Pickering, Ambridge & Wearing, Reference Gathercole, Pickering, Ambridge and Wearing2004, p. 188). The results from the current study suggest that the phonological loop and visuospatial sketchpad may only represent separate storage components among monolinguals. These results challenge the generalizability of Baddeley's (Reference Baddeley2000, Reference Baddeley2003) working memory model and suggest that bilingualism may alter the structure of this model by blurring the distinction between verbal and visuospatial storage systems.
Verbal working memory as a predictor of iconic gesture production among monolinguals and bilinguals
The second research question addressed was whether working memory resources predict gesture frequency among monolinguals and bilinguals. In support of the Speech-Gesture Production model, a negative association between iconic gesture production and verbal memory resources was predicted. It was also predicted that visuospatial memory would be positively associated with iconic gesture production, since this has been speculated previously (Hostetter & Alibali, Reference Hostetter and Alibali2007).
Among the monolinguals, verbal short-term memory was a negative predictor of iconic gesture production, and among bilinguals in the French session, verbal working memory was a negative predictor of iconic gesture production. No memory measure was a significant predictor of iconic gesture production among the bilinguals in the English session. No visuospatial measure was a significant predictor of iconic gesture production in either group.
These results suggest that verbal memory resources, rather than visuospatial memory resources, play a strong role in predicting individual differences in iconic gesture production. More specifically, they suggest that those who use iconic gestures more frequently tend to have weaker abilities to temporarily store verbal information in working memory. It may be the case that a compensatory relationship exists between gesture use and verbal memory, wherein the production of gestures enhances the efficiency of verbal memory in narrative tasks.
One way to understand how this might occur is to consider the task that participants were asked to complete. They were asked to watch cartoon videos and retell the stories to a listener. This requires participants to recall visuospatial information from the video and verbally describe that visuospatial information. Since a single imagistic representation can be verbally described in a variety of ways, the storage system that carries the cognitive load in this task is verbal memory. One possible function of gestures is to help speakers package spatial information in their message into verbalisable chunks (Alibali et al., Reference Alibali, Kita and Young2000). By expressing visuospatial information using the hands, gestures may help to package information into units for speech thereby reducing the load on verbal working memory (Wagner et al., Reference Wagner, Nusbaum and Goldin-Meadow2004).
In both groups, verbal memory was a negative predictor of gesture production and visuospatial memory did not predict iconic gesture production. However, the specific predictor differed: among the monolinguals it was verbal short-term memory, whereas among the bilinguals (in the French session) it was verbal working memory. Verbal working memory is thought to rely on the capacity of both the central executive and the phonological loop (Baddeley, Reference Baddeley2000, Reference Baddeley2003), and the central executive is thought to be an “attentional controller” (Baddeley, Reference Baddeley2003, p. 835). The observed negative relationship between verbal working memory and iconic gesture production among bilinguals may therefore reflect one of two possibilities. Gesture production may have been facilitating the storage and manipulation of language information. Alternatively, it may have been facilitating the storage of language information while at the same time facilitating the appropriate allocation of attentional resources to ensuring that the relevant language was being spoken.
Limitations and future research
With respect to our claim that bilingualism may alter the structure of working memory by weakening the distinction between verbal and visuospatial storage systems, alternative explanations could be proposed. It could be argued that the association between the verbal and spatial memory systems among the bilinguals indicates a stronger reliance on executive control among bilinguals in comparison to monolinguals. However, if this were the case, then we would anticipate a very strong association between verbal working memory and visuospatial working memory but no association between verbal short-term memory and visuospatial short-term memory, since short-term memory is not thought to require the involvement of central executive resources. Since our results show strong associations between verbal and visuospatial working memory resources, as well as between verbal and visuospatial short-term memory resources, we think this possibility is unlikely.
It could also be argued that rather than representing greater information exchange among bilinguals, the strong association between verbal and visuospatial short-term memory actually reflects the simultaneous use of a verbal and a visuospatial strategy in memory. However, the tasks used in this study for measuring verbal short-term memory and visuospatial short-term memory would necessitate information exchange for a simultaneous strategy to be applied. For example, in the verbal short-term memory task, participants were presented auditorily with a series of digits. A visuospatial strategy could be applied in which participants visualize the numbers as they are presented auditorily; however, this would require the auditory information to be transformed into a visuospatial representation first. Though both strategies could be maintained by working memory simultaneously (i.e., repeating the digits using subvocal rehearsal while simultaneously visualizing the digits), the visualization of the digits would nonetheless initially require a transformation of the auditory information.
The negative association between iconic gesture production and verbal memory among both monolinguals and bilinguals was taken as evidence that iconic gestures may enhance the efficiency of verbal memory in narrative tasks. As one of the reviewers of this manuscript noted, it is quite possible that working memory plays distinct roles in maintaining the story in mind and in gesture production during recall. Future research would benefit from examining how working memory may contribute distinctly to these two processes.
Another limitation of this study is that it is not entirely clear why memory resources predicted iconic gesture rate only in the French session among the bilinguals. Recall that, as a group, the bilinguals scored slightly lower on the French vocabulary test than on the English vocabulary test. It may be the case that the association between verbal working memory and iconic gesture production is contingent upon language proficiency. In support of this possibility, gesture production is largely contingent upon language proficiency; however, the nature of this association remains unclear, since some researchers have found that bilinguals use more iconic gestures in their weaker language (e.g., Nicoladis, Pika, Yin & Marentette, Reference Nicoladis, Pika, Yin and Marentette2007) whereas others have found that bilinguals use more iconic gestures in their stronger language (e.g., Gullberg, Reference Gullberg1998). Future studies that systematically control for first language may reveal important insights regarding this question.
Gullberg, de Bot and Volterra (Reference Gullberg, de Bot and Volterra2008) argue that it is often tacitly assumed that children and adults produce gestures primarily to bridge their communicative intentions with their expressive abilities. The current study offers insight into how this compensation may occur, both by showing an association between iconic gesture production and verbal memory and by showing that the specific nature of this compensation may differ between monolinguals and bilinguals. Future research is required in order to identify more precisely how hand movements can enhance the efficiency of verbal memory.