Introduction
A challenge for bilingual individuals is that words that are translations of one another often do not convey exactly the same conceptual information. That is, a word in one language often has no perfectly matching translation in another language in terms of conceptual meaning. For example, the word ball in English is typically translated as balle in French. However, the two words do not refer to an identical set of objects. The French word balle can only refer to small balls, such as tennis balls and baseballs, while its English translation ball can refer to all kinds of balls, including larger balls such as basketballs (Paradis, 1997). Differences in the conceptual information activated by translation word pairs can also come from contextual circumstances, especially when the two languages are used in different cultures. For example, the English word dragon is translated as 龙 in Chinese, but the concepts are dissimilar in the two languages. In Western culture, dragons are lizard-like creatures with wings, while dragons in Chinese culture are depicted as serpent-like creatures without wings.
Malt and her colleagues have explored the issue in the first example: namely, that apparently translation-equivalent words may capture somewhat different sets of exemplars. They have investigated cross-language differences in conceptual categories by examining how speakers of different languages label pictures of household objects (see Malt & Majid, 2013, for a review of other conceptual categories that differ across languages). For example, in Malt, Sloman, Gennari, Shi and Wang (1999), English, Spanish, and Chinese participants labeled pictures of objects which were predominantly labeled as jars, bottles, and containers in English. Results showed that speakers of different languages grouped the objects somewhat differently when assigning labels. Spanish speakers assigned 7 different labels for 16 objects that were named as bottles by English speakers, and Chinese speakers used a single label for 40 objects, which were labeled variously as jar, bottle, and container by English speakers. Thus, even though the English word bottle translates as ping in Chinese, English speakers and Chinese speakers may have somewhat different concepts of the objects that are labeled by the word in each language. Malt and colleagues then demonstrated that bilinguals differ from monolinguals in each of their languages in the set of objects given a particular word label (Ameel, Malt, Storms & Van Assche, 2009; Ameel, Storms, Malt & Sloman, 2005; Malt & Lebkuecher, 2017; Malt, Li, Pavlenko, Zhu & Ameel, 2015; Malt & Sloman, 2003; Pavlenko & Malt, 2011; Zinszer, Malt, Ameel & Li, 2014).
Furthermore, within bilinguals, although categories converged across languages (more so for simultaneous than sequential bilinguals), there were some differences in the sets of objects named with words that are considered to be translations.
The focus of the present investigation is on the issue in the second example – that is, whether the conceptual representations of translation word pairs differ in bicultural bilinguals. Our participants were individuals who had learned Mandarin Chinese in China and had then moved to Canada where they were immersed in English – therefore, they not only knew two languages but they had also lived and used those languages in two very different cultural contexts (hereafter we will use Mandarin to refer to the language and Chinese to refer to the culture). Specifically, the study examined whether printed words in each of a bicultural bilingual's languages activate culture-specific referents more strongly than referents from the other culture. Several models of conceptual representations in bilinguals provide an account of how conceptual representations for translation word pairs could differ. Next, we discuss those models and then we describe two studies that specifically examined bicultural bilinguals’ conceptual representations for translation word pairs.
Bilingual dual coding theory
Dual Coding Theory (Paivio, 1971, 1986, 2007) assumes that concepts are encoded in two systems: the nonverbal (imagen) and the verbal (logogen) systems. Imagens represent modality-specific perceptual information and logogens represent linguistic information. The Bilingual Dual Coding Theory (Paivio & Desrochers, 1980) proposed that bilinguals have two separate verbal systems. Words and their translations are connected to each other through links between logogens. The two verbal systems are each connected to the nonverbal system. A translation word pair can be connected either to shared or separate imagens in the nonverbal system, depending on the similarity of the concept across the two languages. If a bilingual learns his or her two languages at different times and in different cultural environments, it is more likely that the two logogens of a translation pair would develop connections to different imagens. For example, if a Mandarin–English bilingual learns Mandarin in China and then learns English in a Western country, he or she would be more likely to link the English word dragon to a lizard-like creature with wings, while the Mandarin word 龙 is linked to a serpent-like creature without wings. Figure 1 shows the model with another object that differs visually in the two cultures, mailboxes. The links between a picture and a word are depicted as stronger for culturally congruent pairs (Chinese mailbox and Mandarin word, Canadian mailbox and English word) than for incongruent pairs (Canadian mailbox and Mandarin word, Chinese mailbox and English word). The latter links acknowledge that bilinguals may not strictly speak one language in each context.
Feature models
Most recent models of semantic memory assume that concepts are represented by sets of semantic features. A bilingual semantic feature model, the Distributed Conceptual Feature model, was proposed by De Groot (1992). In this account, translation word pairs differ in the extent to which they activate the same semantic features (see Figure 2). Some translation pairs activate all or most of the same features, whereas others activate only some of the same features. Pairs would have more semantic features in common if the underlying concepts were similar in the two languages (e.g., dog in English and chien in French share most of their features: has four legs, barks, etc.). On the other hand, translation word pairs would each activate some language-specific features if the concepts were dissimilar in the two languages (e.g., dragon in English has some English-specific features, like having wings, that are not shared by 龙 in Chinese). Taking a step further, the Shared (Distributed) Asymmetrical model (Dong, Gui & MacWhinney, 2005) encompasses a developmental account regarding how the connection strengths between words and semantic elements change as bilinguals become more proficient in L2. The model assumes that, when first acquiring a second language, the L2 learner starts by assuming the representation of an L2 word has all the elements of its translation in L1. As the bilingual becomes more proficient in L2, the links between L2 words and L1-specific elements are gradually eliminated, and the links between L2 words and L2-specific elements are added and gradually strengthened. The acquisition of L2-specific elements can also result in bilinguals developing connections between L1 words and L2-specific elements.
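The notion of shared versus language-specific features can be made concrete with a minimal sketch. It treats each word's conceptual representation as a set of features and measures overlap with the Jaccard index. Both the overlap measure and the specific feature lists are illustrative assumptions; the models described above are stated verbally and do not commit to a particular metric or feature inventory.

```python
# Illustrative feature sets only. "has four legs", "barks", and the wing and
# body-shape features echo the examples in the text; the rest are hypothetical.
FEATURES = {
    ("English", "dog"): {"has four legs", "barks", "has a tail"},
    ("French", "chien"): {"has four legs", "barks", "has a tail"},
    ("English", "dragon"): {"is a mythical creature", "has scales",
                            "is lizard-like", "has wings"},
    ("Chinese", "龙"): {"is a mythical creature", "has scales",
                        "is serpent-like"},
}


def feature_overlap(a, b):
    """Jaccard index: shared features divided by all features of either word."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def language_specific(a, b):
    """Features activated by the first word but not by its translation."""
    return a - b


if __name__ == "__main__":
    dog, chien = FEATURES[("English", "dog")], FEATURES[("French", "chien")]
    dragon, long_ = FEATURES[("English", "dragon")], FEATURES[("Chinese", "龙")]
    print("dog/chien overlap:", feature_overlap(dog, chien))
    print("dragon/龙 overlap:", feature_overlap(dragon, long_))
    print("English-specific dragon features:", language_specific(dragon, long_))
```

In this toy example the dog/chien pair overlaps completely, whereas dragon/龙 shares only some features, which is the contrast the Distributed Conceptual Feature model is meant to capture.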
The Distributed Conceptual Feature model and the Shared (Distributed) Asymmetrical model are verbally described models of bilingual conceptual representations. More recently, however, Fang, Zinszer, Malt and Li (2016) developed a computational model of bilingual word processing. The model consists of three self-organizing maps, one of which is a semantic map that uses feature representations. The authors were able to simulate the finding by Ameel et al. (2005) that bilinguals’ naming patterns for objects converge across their two languages and are distinct from those of monolinguals of each language.
The above-mentioned authors do not discuss in detail the nature of the semantic features in their models, but they appear to be amodal (i.e., independent of perceptual modality). An example of an amodal feature would be “has a tail”. This feature would be associated with any animal that has a tail. However, there are a lot of different sizes, shapes, and colours of tails among different animals, even among exemplars of a specific animal, such as a dog. These bilingual semantic feature models do not elaborate on how this variety is captured. If features are amodal, then the specific perceptual experiences of bilinguals when acquiring words in each language may not be that important.
Grounded cognition
In contrast, embodied theories of cognition assume that cognition is grounded in bodily states, modal simulations, and situated action (e.g., Barsalou, 1999; Barsalou, Santos, Simmons & Wilson, 2008; Glenberg & Robertson, 2000). Research on semantic memory in monolinguals has now provided strong evidence that many semantic features are modality specific and are represented in brain areas that are responsible for perception and action (e.g., Kiefer & Pulvermüller, 2012; Martin, 2007; Patterson, Nestor & Rogers, 2007). Vigliocco, Meteyard, Andrews and Kousta (2009) argue that we learn word meanings through two qualitatively different types of information, experiential and language-based (verbal associations), which are integrated statistically. This latter view is not unlike Paivio's (1971, 1986, 2007) Dual Coding Theory. An implication of these grounded cognition views is that the specific perceptual experiences that bilinguals encounter when learning words in each language may have a substantial impact on the semantic features that are activated by those words. Translation word pairs that are learned in different cultural contexts may activate somewhat different semantic features, or conversely, some real-world referents may more strongly activate word representations in one language than the other, as hypothesized by Paivio and Desrochers (1980). More recently, Lupyan and Lewis (2019) also argued that word meanings are contextually dependent and vary across languages. As an example, they showed that word associations given by American and Dutch participants to cheese and jealousy differed substantially.
Studies of bicultural bilinguals
A number of studies have investigated the role of the sensorimotor system in L2 semantic processing (see Monaco, Jost, Gygax & Annoni, 2019, for a review) but only a few have specifically examined bicultural bilinguals. Jared, Poh and Paivio (2013) showed images of objects that are typical either in Canada or China to Mandarin–English bilinguals who were born in China and who moved to Canada in their teens. Participants were asked to name each object in English in one block of trials and in Mandarin in another block. For example, participants saw an image of a dragon, which was either a typical Western depiction or a typical Chinese depiction, and named the image either in English or Mandarin. Results showed that bilinguals responded faster when the visual image and the language of the task were culturally congruent (e.g., saw the Western dragon and named it in English) than when they were culturally incongruent (e.g., saw the Western dragon and named it in Mandarin). This study provides evidence that links from representations of objects to their verbal labels in each language differ in strength, presumably as a consequence of the different perceptual experiences when learning and using each language.
A similar study by Berkes, Friesen and Bialystok (2018) used Canadian and Korean culturally biased pictures to explore the links between translation words and culture-specific images with event-related potentials (ERPs). Korean–English bilinguals heard words (either in English or Korean) and simultaneously saw culturally biased pictures, and were asked to decide whether the word and picture matched. RT results showed that bilinguals matched the pictures and auditory words faster when they were culturally congruent compared to when they were culturally incongruent. For example, the Korean word 국 (soup) was matched faster with a picture of Korean soup than Canadian soup. On the other hand, the English word soup was matched faster with a picture of Canadian soup than Korean soup. These findings suggest that links between translation words and perceptual referents can differ. However, no significant results were found in the ERP data regarding the effect of cultural congruency in bilinguals.
These studies provide some evidence that the specific perceptual experiences that bilinguals encounter when learning words in each language have an impact on the semantic representations of those words. They therefore provide support for grounded cognition views in which conceptual representations encode modality-specific perceptual information, and suggest that bilingual theories, such as the Distributed Conceptual Feature model, could benefit from enhancing the description of their features to include modality-specific perceptual features. Jared et al. (2013) noted that a challenge for distributed feature theories is to explain how the features are integrated into conceptual wholes (the binding problem). To the extent that information about the relative size of features and about spatial relationships among features is needed to characterize differences between a bilingual's representations of an object in different cultures, these findings support bilingual theories of conceptual representations that provide an account of how parts are assembled into wholes, such as Bilingual Dual Coding Theory (Paivio & Desrochers, 1980). That theory (and its parent Dual Coding Theory) assumes that modality-specific features are holistic parts of larger wholes, which are organized into perceptual hierarchies or nested sets (see Jared et al., 2013, for a more detailed discussion). There are several limitations to the two aforementioned studies, however, and further evidence is needed to more firmly establish the claim that the specific perceptual experiences that bilinguals encounter when learning words in each language influence the semantic representations activated by those words. In particular, it remains to be demonstrated that such an influence can be observed in ERP waveforms.
The present study
The present study extended Jared et al.'s study (2013) of Mandarin–English bilinguals by examining the mapping from verbal labels to real-world referents, the reverse of the original study, using a word–picture matching task with ERP. Specifically, we investigated whether printed words in each of a bicultural bilingual's languages activate congruent culture-specific referents more strongly than referents from the other culture. As in Jared et al., critical pictures differed between Chinese and Canadian/Western culture, although a larger set was used here. The study is similar to Berkes et al.'s (2018) study of Korean–English bilinguals but a few methodological improvements were made. First, Berkes et al. presented the auditory word and the pictures to participants at the same time. Therefore, the recorded ERP signal in their study reflected participants’ brain responses from processing both the auditory word and the picture, which likely overlapped in time and could have obscured the ERP components for each type of stimulus. Here, a printed word was presented first, and then a picture, and ERP recording was time-locked to the presentation of the picture. Therefore, ERP results in the current study reflected participants’ brain responses to the picture without an interfering signal from the simultaneous presentation of the word. The hope was that this procedure would allow us to observe a cultural congruency effect in the ERP data.
A second limitation of the Berkes et al. (2018) study, as well as the Jared et al. (2013) study, is that bilinguals were tested in both of their languages in the same session. As a result, bilinguals might have kept the nontarget language activated to a higher degree when they were doing the task in one language than they typically would. In addition, including both of a bilingual's languages in one test session could give the participants some idea that they are participating in a bilingual study and that cultural differences between Korea or China and Canada matter in the experiment. This knowledge could have helped them generate and use some explicit strategies when doing the task. In the current study, bilinguals were tested in their two languages in separate sessions that were held at least a week apart. The testing environment was designed to match the language of the session. All conversation and the consent forms were in English in the English session. Similarly, all conversation and forms were in Mandarin in the Mandarin session.
A third limitation in Berkes et al.'s (2018) study, as well as in the Jared et al. (2013) study, was the repeated presentation of critical pictures. Bilinguals in the Berkes et al. study saw each critical picture four times, twice as a semantic match (once with a Korean word, once with an English word), and twice as a mismatch. This was the case for each cultural version of a picture, that is, there were eight presentations of a bowl of soup, all in the same experimental session. Bilinguals in the Jared et al. study saw each critical picture twice, once with an English word and once with a Mandarin word. This also was the case for each cultural version of a picture, so that there were four presentations of each concept, such as dragon, in the same experimental session. This repetition of critical pictures could have weakened the results in that responses to repeated pictures may have been attenuated, especially in the ERP data. Here, participants saw each picture only once, and the Canadian/Western and Chinese cultural versions were seen in separate sessions.
Relevant ERP components
Previous research gives us clues as to the ERP components that may be sensitive to our manipulation. It is a well-established finding that the N400 (a centroparietally distributed negative-going wave) is sensitive to whether or not semantic predictions are fulfilled. Several studies of monolinguals have examined ERP responses to pictures that either were congruent or incongruent with a preceding word, sentence, picture, or picture sequence (e.g., Barrett & Rugg, 1990; Chauncey, Holcomb & Grainger, 2009; Eddy, Schmid & Holcomb, 2006; Federmeier & Kutas, 2001; Ganis, Kutas & Sereno, 1996; Hamm, Johnson & Kirk, 2002; Holcomb & McPherson, 1994; Kiefer, 2001; McPherson & Holcomb, 1999; Nigam, Hoffman & Simons, 1992; Pratarelli, 1994; West & Holcomb, 2002; Willems, Özyürek & Hagoort, 2008), and have observed a difference between the two conditions in N400 and often in N300 components of the ERP waveform. The N300 tends to have a more frontal distribution than the N400 and is thought to reflect the semantic processing of non-verbal stimuli (e.g., West & Holcomb, 2002). In our study, the critical ERP data come from responses to pictures that all matched the previously presented word, although not always in culture, and therefore effects are likely to be more subtle than in the aforementioned studies in which there was a match and an obvious mismatch.
If there are stronger links between Mandarin words and Chinese-specific referents than between Mandarin words and Canadian/Western referents, then there should be reduced N300 and N400 responses to Chinese pictures compared to Canadian/Western pictures when they are preceded by Mandarin words. Similarly, if there are stronger links between English words and Canadian/Western referents than between English words and Chinese referents, then pictures from Canadian/Western culture should elicit less negative N300 and N400 components than Chinese pictures when they are preceded by English words.
In the study by Federmeier and Kutas (2001) noted in the previous paragraph, a condition was also included in which pictures were unexpected from a prior sentence context but came from the same semantic category as the expected picture (e.g., He journeyed to the African plains, hoping to get a photograph of the king of the beasts. Unfortunately, the whole time he was there he never saw a …, followed by a picture of a tiger). In high constraint sentences (i.e., those that led to a strong prediction for the ending), these pictures produced different ERP responses from expected pictures (e.g., a lion) in two early time windows (N100: 50–150 ms, P200: 150–250 ms) in addition to the N300 and N400. Their high constraint condition is relevant for our study because the word preceding the target picture would strongly constrain possible targets. The two early components were hypothesized to reflect attentional load and ease of feature extraction, with less attention needed and feature extraction easier for expected pictures than for unexpected pictures. Here, less attention might be needed, and feature extraction may be easier for pictures when the prior word is congruent with the culture of the picture than when it is incongruent. The Chauncey et al. (2009) study is most like ours in that pictures of real objects served as targets and these were preceded by single word primes. In contrast to Federmeier and Kutas (2001), a negative-going component that peaked just after 200 ms was observed and was more negative for pictures that were unrelated to the prime word than pictures that depicted the prime word, especially along midline sites. They suggested that this component reflects the preactivation of structural representations of pictures by word primes.
Therefore, we expected the manipulations in the current study to have an impact at 200 ms, but it was not entirely clear in advance whether the component would be positive-going or negative-going.
In some of the studies with picture targets, differences between related and unrelated pictures persisted as long as 1000 ms post stimulus onset, with related pictures showing larger positive/less negative responses than unrelated pictures, particularly in anterior electrodes. West and Holcomb (2002) suggested that the duration of the congruency effect may depend on the visual and semantic complexity of the target pictures. In the Chauncey et al. (2009) study, a significant congruity effect was observed in a 500–700 ms time window, and this effect was largest at anterior sites. Therefore, here a congruity effect (Word Type × Picture Type interaction) might be observed in the late positive component (LPC). In summary, prior research suggests five ERP components that are likely to be sensitive to the manipulation in our study.
Method
Participants
Fifty-three Mandarin–English bilinguals (mean age 19, range 18–24, 37 female) were tested. Participants received course credit or money for their participation. Data from 21 participants were excluded from analyses (three had lived in an English-speaking country for less than one year, eight had poor ERP recordings, and ten did not complete both experimental sessions), leaving 32 Mandarin–English bilinguals in the final sample. The first language of all bilinguals in the final sample was Mandarin. All bilinguals in the final sample were born in China (31) or Taiwan (1), had lived there for a mean duration of 14.4 years (range 8–20), and had lived in Canada for a mean duration of 5 years (range 2–11). The participants reported that they were currently exposed to English for a mean of 48.3% (range 15–95%) of the time, and they were exposed to Mandarin for a mean of 48.5% (range 5–85%) of the time in their daily activities. Participants’ ratings of their fluency in English and Mandarin are presented in Table 1.
Materials
Experimental task
Each trial consisted of a word followed by a picture. The word was in either English or Mandarin, depending on the language of the session. The critical pictures chosen for the study consisted of 60 pairs of culturally biased pictures. Each pair consisted of a Canadian/Western-biased picture, and a Chinese-biased picture of the same concept. All of the critical pictures were preceded by their correct label in either English or Mandarin, and therefore required a yes decision. Filler stimuli were created to include some no decisions. In addition, since all the critical stimuli were culturally biased, filler stimuli of culturally neutral pictures were included to mask the manipulation of the experiment. There were 60 pairs of culturally biased filler pictures, 30 Canadian-biased and 30 Chinese-biased, all of which had a preceding incongruent word (a no decision was correct). There were also 60 pairs of culturally neutral fillers (e.g., a red apple and a green apple) that were pictures of common objects that are the same in both cultures. Half of the pairs had a preceding word that was congruent (a yes decision was correct) and half had an incongruent word (a no decision was correct). All pictures were real life pictures that were presented in colour. Researchers are welcome to contact the authors to obtain the materials.
Two lists of 180 pictures each were created. Each picture appeared on only one list. Each list consisted of 60 culturally biased critical pictures (all yes), 60 culturally biased filler pictures (all no) and 60 culturally neutral filler pictures (30 yes and 30 no). Each member of a pair of pictures was on a different list. On each list, half of the culturally biased pictures were Canadian/Western (30 critical, 30 filler) and half were Chinese. The two lists of pictures were further combined with English and Mandarin words to create 4 sub-lists: A1, A2, B1, and B2 (A1 represents list A in English; A2 represents list A in Mandarin; B1 represents list B in English; B2 represents list B in Mandarin).
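The list structure described above can be checked with a short sketch that builds two lists from 60 pairs of each stimulus type and tallies the resulting cells. The alternation rule used to decide which list receives the Canadian/Western member of a pair, and the rule that the first half of the neutral pairs take a matching word, are assumptions made for illustration; the text specifies only the resulting counts.

```python
from collections import Counter

# Sub-list labels from the text: list letter plus language (1 = English, 2 = Mandarin).
SUBLISTS = {
    "A1": ("A", "English"), "A2": ("A", "Mandarin"),
    "B1": ("B", "English"), "B2": ("B", "Mandarin"),
}


def build_lists(n_pairs=60):
    """Split picture pairs across lists A and B so the two members never share a list."""
    lists = {"A": [], "B": []}
    for i in range(n_pairs):
        # Assumption: alternate which list gets the Canadian/Western member so
        # that half of each list's culturally biased pictures come from each culture.
        western_list, chinese_list = ("A", "B") if i % 2 == 0 else ("B", "A")
        for kind, answer in (("critical", "yes"), ("biased_filler", "no")):
            lists[western_list].append(
                {"pair": i, "kind": kind, "culture": "Canadian/Western", "answer": answer})
            lists[chinese_list].append(
                {"pair": i, "kind": kind, "culture": "Chinese", "answer": answer})
        # Neutral fillers: half of the pairs follow a matching word, half a mismatching one.
        neutral_answer = "yes" if i < n_pairs // 2 else "no"
        for name in ("A", "B"):
            lists[name].append(
                {"pair": i, "kind": "neutral_filler", "culture": "neutral",
                 "answer": neutral_answer})
    return lists


if __name__ == "__main__":
    for name, trials in build_lists().items():
        print(name, len(trials), Counter((t["kind"], t["answer"]) for t in trials))
```

Each list produced this way holds 180 pictures, with equal numbers of yes and no trials, which matches the composition given in the text.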
Picture-rating task
To test the validity of the cultural representation in the critical pictures, each of the participants completed a picture-rating task after they had completed the experiment. The critical pictures were displayed in a printed booklet, and instructions asked participants to rate each picture on a 1 to 7 scale, in which 1 represented “very Canadian”, 4 represented “culturally neutral”, and 7 represented “very Chinese”. Two versions of the task were created such that each member of a pair of critical pictures was included in a different version (e.g., the picture of a Chinese dragon was included in version A, while the Western dragon was included in version B). Each version contained equal numbers of Chinese-biased (30) and Canadian-biased (30) pictures. Half of the participants completed version A, and half of the participants completed version B. Chinese culturally biased critical pictures received a mean rating of 6.36 (SD = 0.73), and Canadian culturally biased critical pictures received a mean rating of 2.65 (SD = 0.94), indicating that the participants agreed that the critical pictures we chose were indeed biased to one culture or the other.
Language questionnaire
A questionnaire developed in our lab was used to collect the information about language history and the basic demographic data that are reported in the Participants section.
Procedure
A word–picture matching task was used. Participants first saw a 200 ms fixation cross, then a word (e.g., dragon in English; 龙 in Mandarin) for 500 ms, followed by an image that stayed on the screen until participants made a response. Participants were instructed to judge whether the image matched the word. After the participant made a response, the computer screen was blank for 2000 ms before the next trial began. The bilinguals were tested in both Mandarin and English in separate sessions. Communication with participants was done entirely in the language of the session. The second session was conducted at least 7 days after the first. Half of the participants were presented with list A first, and the other half were presented with list B first. Half of the participants did the Mandarin session first, and half did the English session first. Participants were evenly assigned to the four sub-list pairings: A1-B2, A2-B1, B1-A2, B2-A1. At the end of the first session, participants were asked to fill in the questionnaire about their language background. At the end of the second session, participants were asked to do the picture-rating task.
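The counterbalancing scheme and trial timing can be summarized in a brief sketch. The round-robin assignment is an assumption (the text states only that participants were evenly assigned), and the function names are hypothetical; the durations and sub-list pairings are taken from the procedure described above.

```python
from collections import Counter
from itertools import cycle

# (first session, second session) sub-list pairings; 1 = English, 2 = Mandarin.
PAIRINGS = [("A1", "B2"), ("A2", "B1"), ("B1", "A2"), ("B2", "A1")]


def assign_pairings(participant_ids):
    """Assign participants to pairings in rotation (an illustrative assumption)."""
    return {pid: pairing for pid, pairing in zip(participant_ids, cycle(PAIRINGS))}


def trial_timeline(response_time_ms):
    """Return (event, onset_ms, duration_ms) tuples for a single trial.

    The picture stays on screen until the response, so its duration equals
    the participant's response time on that trial.
    """
    events = [("fixation", 200), ("word", 500),
              ("picture", response_time_ms), ("blank", 2000)]
    timeline, onset = [], 0
    for name, duration in events:
        timeline.append((name, onset, duration))
        onset += duration
    return timeline


if __name__ == "__main__":
    assignment = assign_pairings(range(1, 33))  # 32 participants in the final sample
    print(Counter(assignment.values()))
    for event in trial_timeline(response_time_ms=650):  # 650 ms is a made-up response time
        print(event)
```

With this scheme the picture always appears 700 ms after trial onset, and across the four pairings both list order and language order are balanced.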
EEG recording and preprocessing
Continuous EEG activity was recorded at 32 scalp sites using ActiveTwo BioSemi active Ag/AgCl electrodes embedded in a custom elastic cap (BioSemi, Amsterdam, The Netherlands). The electro-oculogram (EOG) was recorded with electrodes placed above and below the right eye (vertical), and on the outer canthus of each eye (horizontal). Data were recorded using ActiView software (BioSemi) in the frequency range of 0.1–100 Hz at a sampling rate of 512 Hz. Offsets at each active electrode were kept within ±25 mV.
Off-line analysis was performed using the ERPLAB toolbox (Lopez-Calderon & Luck, 2014). All data were re-referenced to the mean electrical activity of the mastoids and bandpass filtered with cutoffs of 0.1 and 30 Hz. The epochs of interest for target images extended from 200 ms before to 800 ms after stimulus onset. Data were baseline corrected to the prestimulus baseline. Eye-movement artifacts were identified by running an independent component analysis (ICA) and were removed from the data. Eye artifact components (e.g., eyeblinks and horizontal eye movements) were identified upon visual inspection of the activity power spectrum and scalp topography of the component. If the scalp topography showed that the component affected the electrodes around the eyes, and it accounted for a large proportion of the data variance and lacked peaks in the power spectrum, then we identified it as an eye artifact. After removing eye artifact components, trials contaminated with activity exceeding ±75 microvolts (μV) were excluded from the analysis (7.6% of the trials).
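The epoching, baseline correction, and amplitude-based rejection steps can be illustrated with a minimal single-channel sketch. It is not the ERPLAB pipeline used in the study: re-referencing, filtering, and ICA are omitted, the function names are hypothetical, and the rejection rule (any sample whose absolute baseline-corrected amplitude exceeds 75 μV) is an assumption about how the threshold was applied.

```python
SAMPLING_RATE_HZ = 512                      # from the recording description
EPOCH_START_MS, EPOCH_END_MS = -200, 800    # relative to picture onset
REJECT_THRESHOLD_UV = 75.0                  # absolute amplitude criterion (assumed form)


def ms_to_samples(ms):
    """Convert a latency in milliseconds to the nearest sample count."""
    return round(ms * SAMPLING_RATE_HZ / 1000)


def extract_epoch(signal, onset_index):
    """Cut one channel's samples from -200 to 800 ms around picture onset."""
    start = onset_index + ms_to_samples(EPOCH_START_MS)
    stop = onset_index + ms_to_samples(EPOCH_END_MS)
    if start < 0 or stop > len(signal):
        raise ValueError("epoch extends beyond the recording")
    return signal[start:stop]


def baseline_correct(epoch):
    """Subtract the mean of the 200 ms prestimulus interval from every sample."""
    n_baseline = ms_to_samples(-EPOCH_START_MS)
    baseline_mean = sum(epoch[:n_baseline]) / n_baseline
    return [value - baseline_mean for value in epoch]


def is_clean(epoch):
    """Keep the trial only if no sample exceeds the amplitude threshold."""
    return all(abs(value) <= REJECT_THRESHOLD_UV for value in epoch)


if __name__ == "__main__":
    # Synthetic signal in microvolts: a flat 10 uV offset with one large deflection.
    signal = [10.0] * 1000
    signal[350] = 200.0
    epoch = baseline_correct(extract_epoch(signal, onset_index=300))
    print(len(epoch), "samples; clean:", is_clean(epoch))
```

At 512 Hz the epoch spans 512 samples, of which the first 102 form the baseline. The synthetic trial above would be rejected because the deflection remains well above threshold after baseline correction.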
Results
Data were analysed with linear mixed effects (LME) models in R (version 3.6.0, R Core Team, 2019) using the lme4 package (version 1.1–21, Bates, Mächler, Bolker & Walker, 2015). The significance of the fixed effects was determined with effect coding and type-II Wald tests using the Anova function provided by the car package (version 3.0–3; Fox & Weisberg, 2019). The latter are reported in the text. Full outputs from the models using the summary function appear in the Supplementary Materials. Post-hoc tests (with Tukey correction for multiple comparisons) were done with pooled t tests by using the emmeans function provided by the emmeans package (version 1.3.5.1; Lenth, Singmann, Love, Buerkner & Herve, 2019). Post-hoc tests were conducted separately for each language (all of the Berkes et al., 2018, analyses were done separately for each language). In such analyses, the pictures in the congruent and incongruent conditions necessarily differ, and therefore a potential issue with this type of comparison is that differences in responses in the two conditions could be due to physical differences between the picture pairs, rather than an effect of cultural congruency. However, if such physical differences were responsible for the congruency effect, the effect should be in opposite directions in the two languages because the same set of pictures is used for each language but the assignment of pictures to congruency conditions is reversed. Nonetheless, cultural picture pairs should be chosen carefully such that they have similar amounts of visual detail.
Alternatively, one could conduct post-hoc analyses separately for pictures of each culture and compare responses to pictures when they are preceded by a word in a culturally congruent language and when preceded by a word in a culturally incongruent language. In this case, the pictures would be identical in the congruent and incongruent conditions (although they would differ for each culture condition). However, the issue with such a comparison is that most bilinguals can extract information more quickly from printed stimuli in one of their languages than the other, and therefore differences in responses to pictures could be due to fluency differences in the languages rather than the cultural congruency of the picture. We opted for the former approach (i.e., as in Berkes et al., 2018) and, as we will show, there was no main effect of picture type for the pictures that we selected, suggesting that picture pairs were indeed well-matched.
Behavioural data
Incorrect responses (5.2%), as well as RTs that were shorter than 200 ms or longer than 1500 ms (0.5%), were excluded from the analyses of the latency data for critical trials. Table 2 shows the mean RTs and error rates for critical trials. LME models were first fitted with Test Language (Mandarin vs. English, sum coded), and Picture Type (Chinese-biased vs. Canadian-biased, sum coded) as fixed effects, participants and items as random intercepts, and by-participant random slopes for the effects of Test Language and Picture Type (with interaction). If this model failed to converge, the interaction between Test Language and Picture Type was dropped. If the model still failed to converge, the by-participants random slopes were dropped.
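The trial exclusion rule and the stepwise simplification of the random-effects structure can be sketched as below. The formulas are written in lme4 syntax but stored as plain strings, since the actual fitting was done in R. The column names (rt, language, picture, participant, item) are hypothetical, and the reading that the dropped interaction is the one among the by-participant random slopes is an assumption. The fit argument stands in for whatever routine fits a model and reports convergence.

```python
RT_MIN_MS = 200     # lower latency cutoff from the text
RT_MAX_MS = 1500    # upper latency cutoff from the text


def trim_latency_trials(trials):
    """Keep correct trials whose RT lies between the two cutoffs (inclusive)."""
    return [
        t for t in trials
        if t["correct"] and RT_MIN_MS <= t["rt_ms"] <= RT_MAX_MS
    ]


# Candidate specifications, from most to least complex (hypothetical column names).
FALLBACK_FORMULAS = [
    "rt ~ language * picture + (1 + language * picture | participant) + (1 | item)",
    "rt ~ language * picture + (1 + language + picture | participant) + (1 | item)",
    "rt ~ language * picture + (1 | participant) + (1 | item)",
]


def fit_with_fallback(fit, formulas=FALLBACK_FORMULAS):
    """Return the first (formula, model) pair whose fit reports convergence.

    fit: caller-supplied function mapping a formula to (model, converged).
    """
    for formula in formulas:
        model, converged = fit(formula)
        if converged:
            return formula, model
    raise RuntimeError("no specification converged")
```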
In the latency data, there was a main effect of Test Language, χ2(1) = 14.75, p < .001. Participants responded faster when tested in Mandarin than in English. No significant main effect of Picture Type was found, χ2(1) = 1.52, p > .21. Importantly, there was an interaction between Test Language and Picture Type, χ2(1) = 12.65, p < .001. When tested in English, bilinguals responded 31 ms faster to Canadian-biased pictures than to Chinese-biased pictures (t = 3.01, p = .003), but when tested in Mandarin, bilinguals responded 7 ms slower to Canadian-biased pictures than to Chinese-biased pictures (t = 0.72, p > .40). In the error data, there was a main effect of Test Language, χ2(1) = 14.36, p < .001. Participants responded more accurately (4.1%) when tested in Mandarin than in English. No main effect of Picture Type was found, χ2(1) = 0.01, p > .92. There was again an interaction between Test Language and Picture Type, χ2(1) = 3.65, p = .05. Consistent with the RT data, when tested in English, bilinguals made fewer errors (1.5%) to Canadian-biased pictures than to Chinese pictures, but when tested in Mandarin, bilinguals made more errors (1.2%) to Canadian-biased pictures than to Chinese-biased pictures – however, neither comparison was significant.
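The crossover pattern in the latency data can be expressed as a difference of differences. The sketch below uses only the two RT differences reported above (31 ms and 7 ms); the variable and function names are illustrative, and the arithmetic is a descriptive restatement, not a re-analysis of the model.

```python
# Picture Type effect = RT(Chinese-biased) - RT(Canadian-biased), in ms,
# taken from the differences reported in the text.
PICTURE_EFFECT_MS = {"English": 31, "Mandarin": -7}


def congruency_effect(language):
    """Incongruent minus congruent RT for one test language.

    Canadian-biased pictures are congruent with English and
    Chinese-biased pictures are congruent with Mandarin.
    """
    effect = PICTURE_EFFECT_MS[language]
    return effect if language == "English" else -effect


def interaction_contrast():
    """Difference between the Picture Type effects in the two languages."""
    return PICTURE_EFFECT_MS["English"] - PICTURE_EFFECT_MS["Mandarin"]


print(congruency_effect("English"), congruency_effect("Mandarin"))
print(interaction_contrast())
```

Expressed this way, the congruency effect is positive in both languages, which is the pattern that the Test Language by Picture Type interaction captures.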
ERP data
Three regions of interest were selected, and the response reported for each region is the mean response across its set of electrodes. The regions were (see Figure 3): anterior (AF3, AF4, F3, Fz, F4), central (C3, Cz, C4, FC1, FC2), and posterior (P3, Pz, P4, PO3, PO4, CP1, CP2). Based on the research cited in the Introduction, five ERP components were identified from grand-averaged data for all participants. The negative-going N100 component peaked at around 150 ms and was measured in the 125–175 ms time window. The negative-going N200 component peaked at around 200 ms and was measured in the 175–250 ms time window. The negative-going N300 component peaked at around 300 ms and was measured in the 250–350 ms time window. The N400 component peaked at around 450 ms and was measured in the 350–500 ms time window. The positive-going LPC was measured in the 500–700 ms time window. Figure 4 shows the grand average waveforms in microvolts (μV) evoked when bilinguals were tested in Mandarin and in English. Figure 5 shows voltage maps of the congruency effect (incongruent − congruent) on the N100, N200, N300, N400, and LPC components when bilinguals were tested in Mandarin and in English.
Mean amplitudes for the N100, N200, N300, N400, and LPC components were analysed with LME models. Models were fitted with Test Language (Mandarin vs. English, sum coded), Picture Type (Chinese-biased vs. Canadian-biased, sum coded), and Electrode Location (Anterior, Central vs. Posterior, sum coded) as fixed effects, participants as random intercepts, and by-participant random slopes for the effects of Test Language and Picture Type (without interaction). We also conducted a separate analysis with just the three midline electrodes (Fz, Cz, and Pz) to check whether results differed when electrodes which typically have a weaker signal were excluded. These models were fitted with Test Language (Mandarin vs. English, sum coded), Picture Type (Chinese-biased vs. Canadian-biased, sum coded), and Electrode Location (Fz, Cz vs. Pz, sum coded) as fixed effects, participants as random intercepts, and by-participant random slopes for the effects of Test Language and Picture Type (without interaction). If a model failed to converge, the by-participant random slopes were dropped. Although there were significant main effects of Electrode Location, none of the three-way interactions of Test Language × Picture Type × Electrode Location were significant, and therefore we do not report on this variable further in the text.
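The dependent measure, mean amplitude per region and time window, can be sketched as follows. The regions and windows are those listed above. The data layout (a dictionary mapping electrode names to lists of samples), the 512 Hz sampling rate with epochs starting at −200 ms, and the treatment of windows as half-open sample ranges are assumptions made for illustration.

```python
SAMPLING_RATE_HZ = 512
EPOCH_START_MS = -200

# Regions of interest and measurement windows as given in the text.
REGIONS = {
    "anterior": ["AF3", "AF4", "F3", "Fz", "F4"],
    "central": ["C3", "Cz", "C4", "FC1", "FC2"],
    "posterior": ["P3", "Pz", "P4", "PO3", "PO4", "CP1", "CP2"],
}
WINDOWS_MS = {
    "N100": (125, 175),
    "N200": (175, 250),
    "N300": (250, 350),
    "N400": (350, 500),
    "LPC": (500, 700),
}


def ms_to_sample(t_ms):
    """Convert a latency relative to picture onset into a sample index."""
    return round((t_ms - EPOCH_START_MS) / 1000 * SAMPLING_RATE_HZ)


def mean_amplitude(erp, region, component):
    """Average voltage over a component's window and a region's electrodes.

    erp: dict mapping electrode name to a list of samples in microvolts.
    The window is half-open so adjacent windows share no samples (assumption).
    """
    start_ms, end_ms = WINDOWS_MS[component]
    lo, hi = ms_to_sample(start_ms), ms_to_sample(end_ms)
    per_electrode = [sum(erp[e][lo:hi]) / (hi - lo) for e in REGIONS[region]]
    return sum(per_electrode) / len(per_electrode)
```

One value per participant, test language, picture type, region, and component would then enter the mixed models described above.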
N100 (125–175 ms)
The main effect of Test Language approached significance, all electrodes: χ2(1) = 2.98, p = .08, midline electrodes: χ2(1) = 3.27, p = .07. Bilinguals elicited a more negative N100 when tested in English than in Mandarin. No main effect of Picture Type was found, all electrodes: χ2(1) = 0.03, p = .80, midline electrodes: χ2(1) = 0.07, p = .79. Of particular interest, there was an interaction between Test Language and Picture Type, all electrodes: χ2(1) = 3.28, p = .06, midline electrodes: χ2(1) = 6.39, p = .01. When the test language was English, Canadian pictures elicited a less negative N100 than Chinese pictures, but when the test language was Mandarin, Canadian pictures elicited a more negative N100 than Chinese pictures. However, none of the individual comparisons reached significance, in either the analyses with all electrodes or just midline electrodes.
N200 (175–250 ms)
There were no significant main effects of either Test Language or Picture Type, all ps > .20. Importantly, there was an interaction between Test Language and Picture Type, all electrodes: χ2(1) = 3.94, p = .04, midline electrodes: χ2(1) = 6.01, p = .01. The two types of pictures did not differ when English was the test language. However, when the test language was Mandarin, Chinese-biased pictures elicited a less negative N200 than Canadian-biased pictures (all electrodes: t = 2.29, p = .02; midline electrodes: t = 2.31, p = .02).
N300 (250–350 ms)
Again, there were no significant main effects of either Test Language or Picture Type, all ps > .23. There was a robust interaction between Test Language and Picture Type, all electrodes: χ2(1) = 6.68, p = .009, midline electrodes: χ2(1) = 10.20, p = .001. When the test language was English, Canadian pictures elicited a less negative N300 than Chinese pictures, but when the test language was Mandarin, Canadian pictures elicited a more negative N300 than Chinese pictures. None of the comparisons reached significance, although most were close (English– all electrodes: t = 1.51, p = .13, midline electrodes: t = 1.86, p = .07; Mandarin– all electrodes: t = −1.72, p = .08, midline electrodes: t = −1.91, p = .06).
N400 (350–500 ms)
We analyzed the waveforms in this time window separately from the previous time window, although the N400 in our data may not be a distinct component from the N300 but rather a continuation of that component (see Draschkow, Heikel, Vo, Fiebach & Sassenhagen, 2018). No main effects of Test Language or Picture Type were found, all ps > .13. There was again a robust interaction between Test Language and Picture Type, all electrodes: χ2(1) = 5.98, p = .01, midline electrodes: χ2(1) = 9.05, p = .002. When the test language was English, Canadian-biased pictures elicited a less negative N400 than Chinese-biased pictures (all electrodes: t = 2.26, p = .02, midline electrodes: t = 2.07, p = .04). When the test language was Mandarin, the pattern was in the opposite direction but the comparison between the two picture types did not reach significance, either in the analysis with all electrodes or just midline electrodes.
LPC (500–700 ms)
No main effect of Test Language or Picture Type was found, all ps > .15. There was an interaction between Test Language and Picture Type, all electrodes: χ2(1) = 5.03, p = .02, midline electrodes: χ2(1) = 8.23, p = .004. When the test language was English, Canadian pictures elicited a less negative LPC than Chinese pictures, but when the test language was Mandarin, Canadian pictures elicited a more negative LPC than Chinese pictures. The comparison for Mandarin approached significance for all electrodes (t = −1.71, p = .09) and was significant in midline electrodes (t = −2.08, p = .04), but the comparison for English was not significant in either analysis.
Discussion
The current study examined bicultural bilinguals’ conceptual representations of words that are considered to be translations of one another. Specifically, we investigated whether printed words in each of a bicultural bilingual's languages activate congruent culture-specific referents more strongly than referents from the other culture. Response times and ERP data were collected as Mandarin–English bilinguals performed a word–picture matching task. Critical pictures were from Chinese or Canadian culture and were preceded by Mandarin words in one session and English words in another. Overall, participants made faster responses when tested in Mandarin than in English, suggesting that bilinguals tested in the present study were more proficient in Mandarin than in English, which was consistent with the background information reported by the participants. Importantly, there was no main effect of Picture Type in any analysis, providing evidence that Chinese and Canadian pictures were well-matched. Critically, an interaction between Test Language and Picture Type was observed in response times and error rates in our behavioural data, and in each of the five ERP components that we examined. Participants detected a match when the language of the word and the culture of the picture were congruent and a mismatch when they were incongruent. These results suggest that the lexical representations of words that were shown in the task more strongly activated conceptual representations that were culturally congruent with their language than those that were culturally incongruent. More broadly, the findings provide strong evidence that the specific perceptual experiences that bilinguals encounter when learning words in each language influence the semantic features that are activated by those words.
Our behavioural results are consistent with Berkes et al.'s (2018) finding that Korean–English bilinguals matched auditory words and pictures faster when they were culturally congruent than when they were culturally incongruent. It is important to show that the result generalises to words and pictures of another pair of languages and cultures. However, here we also observed the critical interaction in five components of the ERP waveform, whereas Berkes et al. did not find a cultural congruency effect for bilinguals in their ERP data. One possible explanation for the difference in ERP results is that Berkes et al. presented the words auditorily at the same time as the picture, which may have obscured the congruency effect in the waveforms. Another possible explanation is that they presented each picture four times, whereas here each picture was presented only once. ERP responses to pictures may be attenuated with repeated presentation. Our behavioural results are also consistent with those of Jared et al. (2013), who tested Mandarin–English bilinguals, and they extend that study's cultural congruency effect from the picture-to-label direction (producing a verbal label for a picture) to the opposite direction, that is, from a verbal label to a picture.
The crucial interaction between Test Language and Picture Type was observed in each component of the ERP data, despite the fact that all of the critical pictures did indeed depict the prior word. As noted in the introduction, most previous research compares a match and an obvious mismatch condition. Here the manipulation was more subtle, in that critical pictures all matched the prior word, but either matched or mismatched the culture associated with the language. Still, a robust interaction was observed between Test Language and Picture Type in the N300 and N400, components that reflect semantic fit. Priming effects in the N300 component have been found specifically for picture targets (e.g., Chauncey et al., 2009; Federmeier & Kutas, 2001; West & Holcomb, 2002), whereas priming effects in the N400 have been observed with both picture and word targets (see Kutas & Federmeier, 2011, for a review). Hamm et al. (2002) noted that theories of object perception propose that objects are first identified in a fairly general manner, such as the semantic category to which they belong, and then more specific identity information becomes available. They suggested that the N300 may reflect category-level semantic mismatches between word–picture pairs, whereas the N400 reflects mismatches at a specific level. Here, the N300 could reflect the detection of a mismatch between the expected and presented culture to which the picture belongs, which could be considered a broad category, whereas the N400 may reflect the detection of a mismatch between how a specific object was expected to look and how it actually appeared. Our observation that the N300 and N400 were not clearly distinct components suggests that there is an increasing depth of processing over time rather than two separate processing stages.
The interaction between Test Language and Picture Type was also significant in the next time window, 500–700 ms. This may be an extended N400 due to the complexity of the pictures that we used (West & Holcomb, 2002). Our results are consistent with those of Chauncey et al. (2009) in finding that incongruent word–picture pairs produced less positivity than congruent pairs in this time window, although we did not observe a stronger effect in the anterior region than in other regions as they did.
The earliest ERP component in which an interaction between Test Language and Picture Type was observed was the N100 component. This component was hypothesized by Federmeier and Kutas (2001) to reflect attentional load, with less attention needed for expected pictures than unexpected pictures. Their interpretation was based on a comparison of responses to pictures that were expected based on a highly constraining sentence context and those that were unexpected but from the same semantic category as expected pictures. Their interpretation of the N100 fits with our study. It is reasonable to assume that less attention was needed to process pictures from the expected cultural context than the unexpected cultural context even though both depicted the presented word. The next early component in which the interaction was significant was the N200. A similar negative-going component was observed in the Chauncey et al. (2009) study, but both of these findings differ from the positive-going component (P200) observed in the Federmeier and Kutas study. The difference in direction of the components across studies may be due to the differences in the experimental procedures, such as the presentation of a complex sentence vs. a single word before the target picture. Chauncey et al. suggested that the N200 component reflects the preactivation of structural representations of pictures by word primes.
Although the critical interaction between Test Language and Picture Type was significant in each of our analyses, the comparisons of Picture Type for each language were significant in only a few of the analyses. These comparisons were likely not more robust because, as previously noted, the manipulation was quite subtle, in that critical pictures all matched the prior word, but either matched or mismatched the culture associated with the language. There was an effect of Picture Type in the Mandarin session in the N200 and the effect approached significance in the N300, whereas in the English session the effect of Picture Type was in later measures, RT and the N400. Fluency differences between Mandarin and English may be a reason why these effects appeared slightly later in the English session. In the English session, there was likely slower and/or weaker activation of the meanings of the words that preceded the pictures than in the Mandarin session. Follow-up experiments with a larger sample size may reveal significant effects of Picture Type in more of the analyses for each language.
Theoretical implications
This study, along with those of Jared et al. (2013) and Berkes et al. (2018), provides clear evidence that the specific perceptual experiences that bilinguals encounter when learning words in each language have an impact on the semantic representations activated by those words. They therefore provide support for embodied theories of cognition that assume that cognition is grounded in bodily states, modal simulations, and situated action (e.g., Barsalou, 1999; Barsalou et al., 2008; Glenberg & Robertson, 2000). In Lupyan and Lewis's (2019) view, words do not merely map on to concepts but rather are cues, which along with perception and action help construct our semantic knowledge. Furthermore, the results are consistent with research on conceptual representations in monolinguals that has shown that many semantic features are modality specific and are represented in brain areas that are responsible for perception and action (e.g., Kiefer & Pulvermüller, 2012; Martin, 2007; Patterson et al., 2007).
According to feature-based theories of bilingual conceptual representations, such as the Distributed Conceptual Feature model (De Groot, 1992) and the Shared (Distributed) Asymmetrical model (Dong et al., 2005), translation words that are learned in different cultural contexts activate somewhat different semantic features. For example, the English word dragon is strongly connected to features like fierce and has wings, while the Mandarin word 龙 is strongly connected to features like emperor and has a serpent-like shape. When Mandarin–English bilinguals saw a word in the word–picture matching task, the word would have activated the most typical features of the concept in the culture of the language. Then when participants saw the target picture, there would be more overlap between the features activated from the word and the picture when they were culturally congruent than when they were incongruent. For example, the Mandarin word 龙 would have strongly activated features like emperor and serpent-like, which overlapped more with a picture of a Chinese-biased dragon than a picture of a Canadian-biased dragon. As a result, processing was facilitated for culturally congruent word–picture pairs compared to incongruent pairs. Although these feature theories are broadly consistent with our findings, the features themselves would need to be detailed enough to capture subtle differences between the referents of translation words that are learned in different cultural contexts. Further development of the notion of features in these bilingual theories could benefit from assuming that representations include modality-specific perceptual features. The theories also need to provide an account of how features are assembled into wholes (the binding problem).
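The feature-overlap account can be made concrete with a toy computation. The word features below are the examples given in the text; the picture feature sets and the use of Jaccard similarity as the overlap measure are illustrative assumptions and are not part of either model as published.

```python
# Features most strongly connected to each word (examples from the text).
WORD_FEATURES = {
    "dragon": {"fierce", "has wings"},
    "龙": {"emperor", "serpent-like"},
}

# Hypothetical feature sets for the two picture types.
PICTURE_FEATURES = {
    "Canadian-biased dragon": {"fierce", "has wings", "lizard-like"},
    "Chinese-biased dragon": {"emperor", "serpent-like", "no wings"},
}


def overlap(word, picture):
    """Jaccard similarity between a word's and a picture's features (assumed measure)."""
    a, b = WORD_FEATURES[word], PICTURE_FEATURES[picture]
    return len(a & b) / len(a | b)


for word in WORD_FEATURES:
    for picture in PICTURE_FEATURES:
        print(word, picture, round(overlap(word, picture), 2))
```

In this toy example each word overlaps more with the picture from its own culture, which is the pattern that the feature-based account predicts should facilitate processing.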
In the Bilingual Dual Coding theory (Paivio & Desrochers, 1980), the non-verbal system is assumed to have modality-specific representations. Words in each of a bilingual's languages may be connected to different representations in this non-verbal system, or may be connected to at least some of the same representations but with different strengths, depending on the contexts in which the words in each language were learned. When Mandarin–English bilinguals saw a word in the word–picture matching task, they would have most strongly activated non-verbal representations that were consistent with the perceptual environment in which they learned that word. When the picture target matched the word in terms of culture, participants would process that picture more easily and would be less surprised by it than by a picture from the other culture. For example, when Mandarin–English bilinguals saw the Mandarin word 龙, they would have activated a representation of a serpent-like creature without wings. Then when they saw the picture of a Chinese dragon, the consistency between this representation and the actual picture would facilitate processing compared to when they saw a picture of a fire-breathing dragon with wings. An advantage of this theory is that it offers an explanation of how parts of objects are assembled into wholes. This information is likely useful for capturing subtle differences between a bilingual's representations of an object in different cultures, such as those that involve differences in the relative size of features or differences in the spatial relationships among features.
A limitation of all of the theories just discussed is that they are verbal accounts and not implemented models. A popular model of bilingual word recognition, the Bilingual Interactive Activation (BIA+) model (Dijkstra & van Heuven, 2002), has recently been developed as a computational model called Multilink (Dijkstra et al., 2019). Although the authors acknowledge that it may be necessary for a model of bilingualism to make a distinction between language-dependent and language-independent semantic features, the model currently assumes holistic meaning representations that are fully shared or fully separate across languages. Our findings confirm that this is indeed an oversimplification that will have to be dealt with in future versions of the model.
Conclusion
The present study extended our understanding of bilingual conceptual representations for translation words, providing evidence that language-specific conceptual representations exist for words that are considered to be translations of one another. The present study also suggests that cultural differences play a significant role in bilingual conceptual representations. The referents for translation word pairs can differ across cultures. Links between culturally congruent words and conceptual representations are stronger than links between incongruent ones. Finally, the results of this study imply that a challenge of learning a new language is to acquire language- and culture-specific conceptual representations in L2, especially for bicultural bilinguals. Other research (e.g., Matsuki, Hino & Jared, 2021) suggests that it could take years or decades for bilinguals to develop native-like conceptual representations of L2 words through L2 cultural immersion.
Acknowledgments
This research was supported by a Discovery grant from the Natural Sciences and Engineering Research Council of Canada to Debra Jared.
Supplementary material
For supplementary material accompanying this paper, visit https://doi.org/10.1017/S1366728921000262
Data availability statement
The data that support the findings of this study are available by contacting Debra Jared (djjared@uwo.ca).
Competing interests
The authors declare none.