1. Introduction
The question of whether language affects cognition has generated vigorous debate in recent decades (Bylund & Athanasopoulos, 2014b; Lucy, 2016). The hypothesis of Linguistic Relativity (Whorf, 1956) postulates that cross-linguistic differences in semantic encoding affect cognitive processing, even when language is not actively involved in the decision-making process. Empirical evidence demonstrates two things. On the one hand, cross-linguistic differences in non-linguistic representations have been detected in various conceptual domains such as time (Boroditsky, Fuhrman & McCormick, 2011; Casasanto & Boroditsky, 2008), color (Athanasopoulos, 2009; Athanasopoulos, Damjanovic, Krajciova & Sasaki, 2011), objects (Cook, Bassetti, Kasai, Sasaki & Takahashi, 2006; Pavlenko & Malt, 2011) and motion (Ji & Hohenstein, 2018; Park, 2019). On the other, these effects are task-dependent and obtained only under certain conditions. For example, the influence of language on cognitive processing is most likely to appear when language is explicitly used during on-line thinking (Filipovic, 2018; Montero-Melis, Jaeger & Bylund, 2016; Trueswell & Papafragou, 2010), or when it is used as a strategy to solve a subsequent cognitive task (Lai, Rodriguez & Narasimhan, 2014; Lupyan, 2012). However, such effects may disappear when access to language is blocked by task manipulation (Gennari, Sloman, Malt & Fitch, 2002; Montero-Melis & Bylund, 2017).
These varied findings have motivated researchers to explore further how language affects cognition under different conditions. The hypothesis of thinking-for-speaking (Slobin, 1996) emphasizes the effects of language on on-line thinking when speakers are involved in language-driven activities. This approach has generated evidence that speakers of different languages demonstrate different non-linguistic patterns while engaged in language comprehension or production. In line with the explicit use of language, other studies propose that speakers may use language as a strategy to solve a high-level cognitive task, especially when the task lacks an objective answer or imposes a time limit (Finkbeiner, Nicol, Greth & Nakamura, 2002). This “thinking after language” effect, as termed by Wolff and Holmes (2011), emphasizes the spontaneous recruitment of linguistic resources to aid working memory and facilitate answer formulation.
Research on the effects of language on cognitive processing has mainly focused on monolingual speakers, although recent studies have started to extend the scope to bilingual speakers or L2 learners of various types (Athanasopoulos, Damjanovic, Burnand & Bylund, 2015b; Bylund & Athanasopoulos, 2014a; Cook & Bassetti, 2011; Pavlenko & Volynsky, 2015). Bilingualism research on language and cognition mainly focuses on the extent to which learning a new language reshapes one's thinking. Studies to date have demonstrated that conceptual representations within the bilingual mind are flexible and dynamic, such that the internalization of L2-specific associations may give rise to cognitive restructuring of L1-specific categories. This process is likely to be modulated by various predictors such as age of acquisition (Lai et al., 2014), language proficiency (Athanasopoulos et al., 2015b), and frequency of language use (Bylund & Athanasopoulos, 2014a).
With the exception of Bylund and Athanasopoulos (2014a) and Bylund, Athanasopoulos, and Oostendorp (2013), who took a grammatical perspective, very little attempt has been made to address how speakers of more than two languages (i.e., multilingual speakers) conceptualize motion events, particularly from a lexical perspective. The current study takes a first step towards investigating how Cantonese–English–Japanese multilingual speakers with three typologically different languages (equipollent-framed, satellite-framed, and verb-framed) encode caused motion and gauge event similarity. Specifically, we examine multilinguals’ linguistic and non-linguistic behaviours in the lexicalization and conceptualization of caused motion in a boundary-crossing situation, where the target motion contains a categorical change of location. Their processing efficiency in the decision-making process is measured by reaction time. In addition, the study addresses whether the amount of contact with each language affects their performance while controlling for the speakers’ proficiency in the target language.
2. Background
2.1 Conceptual representations in the bilingual mind: the account of cognitive grammar
The interplay between language and cognition in speakers with more than one language raises many intriguing questions. These questions concern 1) the degree to which bilinguals restructure their conceptualization patterns as a result of L2 acquisition; 2) transfer phenomena in linguistic and non-linguistic representations; and 3) the linguistic or extra-linguistic variables that modulate learners' cognitive behavior during L2 learning.
Empirical evidence shows that learning a new language means acquiring a new way of thinking. When speakers learn an additional language, they need to acquire not only new linguistic references or frames, but also the associated conceptual distinctions (Jarvis & Pavlenko, 2008; Pavlenko, 2011). This may give rise to a restructuring of the existing conceptual categories acquired through the L1. This process, termed conceptual or cognitive restructuring, refers to the conceptual changes that bilinguals undergo while learning a new language. It is a gradual process and occurs in bilinguals' verbal and non-verbal behaviors (Pavlenko, 2011).
The concept of cognitive restructuring is well discussed within the framework of cognitive grammar, which provides a solid background for the mechanism of language learning and has been successfully applied to explain relativistic effects and cognitive restructuring in the context of bilingualism or L2 learning (Bylund & Athanasopoulos, 2014b; Casasanto, 2008; Ji & Hohenstein, 2018; Kersten et al., 2010). According to cognitive grammar (Langacker, 1987, 1991, 2008), grammatical constructions are form-meaning pairings above the word level. Thus, language-specific ways of selecting and organizing information are directly related to how conceptualizations are represented in cognition. As a result, speakers of different languages construe the same event in conceptually different ways depending on the grammatical devices made available in their language (Athanasopoulos, Bylund, Montero-Melis, Damjanovic, Schartner, Kibbe & Thierry, 2015a; Bylund & Jarvis, 2011; Flecken, Carroll, Weimar & Von Stutterheim, 2015b; Von Stutterheim, Andermann, Carroll, Flecken & Schmiedtova, 2012; Von Stutterheim & Nuse, 2003).
For example, the presence or absence of linguistic devices for grammatical aspect in a language affects the degree to which speakers express and allocate their attention to event trajectory and endpoints (Bylund & Athanasopoulos, 2014a; Flecken et al., 2015b; Von Stutterheim, Andermann, Carroll, Flecken & Schmiedtova, 2012). Speakers of an aspect language (i.e., a language with grammatical means to present aspectual contrast) are less prone to mention the endpoint in lexicalization and focus more on the ongoing phase of the target event, whereas speakers of a non-aspect language (i.e., a language without grammatical means to encode aspect) tend to express the endpoint of an event more frequently and adopt a holistic perspective in event categorization. The findings can be interpreted through a psycholinguistic model in which “conceptual categories encoded in a grammatical system play an active role in the cognitive filter set up in processes of attention allocation and event construal when talking about events” (Von Stutterheim et al., 2012, p. 862). Thus, the linguistic structures highlighted by grammar (i.e., number, aspect marking, and finite verbs) tend to be given greater prominence in speakers’ mental representations.
For bilingual speakers, continuous exposure to novel events throughout their lifetime helps them form new form-meaning pairings of the same event based on statistical regularities of the language-specific patterns in different contexts. From the perspective of bilingualism and L2 learning, the main concern is how the internalization of L2-specific form-meaning pairings interacts with the L1-biased patterns, and whether it can trigger the restructuring of the existing categories in the bilingual mind (Athanasopoulos et al., 2015b). In fact, cross-linguistic differences in conceptual representations are affected by the degree of exposure to language-specific constructions: the more routinized an association becomes, the easier it is to be retrieved from memory and unitized for the purposes of categorization (Langacker, 2008).
As mentioned above, cross-linguistic differences in conceptualization demonstrate a linear and gradual process. This process is context-bound and highly open to individual differences in a learner's trajectory. One key factor is related to the frequency of exposure to a specific form-meaning association. Empirical evidence shows that conceptual representations are subject to constant changes as a function of exposure and language use (Athanasopoulos et al., 2011; Bylund & Athanasopoulos, 2014a; Bylund et al., 2013). In other words, frequent language use will strengthen the language-specific form-meaning associations whereas infrequent language use will weaken the associations. In addition, the degree of cognitive restructuring of language-specific associations can be modulated by other extra-linguistic factors such as age of L2 acquisition (Athanasopoulos, 2009; Boroditsky, 2001; Lai et al., 2014), L2 proficiency (Athanasopoulos et al., 2015b; Ji, 2017; Park & Ziegler, 2014), language context (Filipović, 2011; Montero-Melis et al., 2016) and length of immersion in an L2-speaking community (Cook et al., 2006; Daller, Treffers-Daller & Furman, 2011; Park, 2019).
2.2 Motion event descriptions in English, Japanese and Cantonese
The domain of motion serves as an ideal testing ground for the interplay between language and cognition, as the world's languages exhibit great variability in semantic encoding. Caused motion refers to a situation where an agent exerts an external force on an object, which directly causes its movement (Talmy, 2000). It is a complex type of motion and contains a number of semantic elements, such as path of motion (into, out of), cause (take, carry), manner of cause (push, pull) and manner of object (roll, slide).
Following Talmy's typological distinctions (1985, 2000), the world's languages fall into two broad categories based on where path of motion is encoded. In satellite-framed languages (S-languages) such as English and German, path is encoded outside the verb in a satellite, whereas cause of motion, like manner, is an external co-event that can be conflated with motion in the main verb. In contrast, in verb-framed languages (V-languages) such as Japanese and French, cause of motion is typically conflated with path in the main verb, leaving manner of cause either unexpressed (by default) or expressed via peripheral devices (i.e., subordination, gerunds, or adverbial clauses). For example, as an S-language, English most prototypically conflates manner of cause with motion in the main verb and encodes path in the satellite, as shown in example (1).
(1) The boy pushed [Manner of Cause] a box into [Path] a cave.
In contrast, as a V-language, Japanese conflates path in the main verb and either leaves manner of cause unexpressed, as shown in example (2a), or expresses it in a subordinate form (with the -te conjunctive marker), as illustrated in (2b).
(2)
a. 彼 は 荷を 上げた [Path + Cause] (Yiu, 2013).
Kare wa ni-o ageta
S/he TOP goods-ACC ascend PST
‘S/he moved up the goods.’
b. 彼 は 荷を 押して [Manner of Cause] 道を 渡りました [Path]
Kare wa ni-o oshite michi-o watarimashita
S/he TOP goods-ACC pushing road-ACC cross PST
‘S/he crossed the road pushing the goods.’
Talmy's classification has been useful in analyzing Indo-European languages, but does not easily apply to serial-verb languages such as Chinese, Tai, and other Sino-Tibetan languages. Thus, Slobin introduces a third type, known as equipollent-framed languages, in which “both Manner and Path are expressed by equipollent elements, that is, elements that are equal in formal linguistic terms, and appear to be equal in force or significance” (Slobin, 2004, p. 226). Cantonese, widely spoken in Hong Kong and Guangdong Province of China, is a serial-verb language (Matthews & Yip, 2011). A serial-verb construction in Cantonese usually consists of two or more verbs, each of which can stand alone as an independent element (Matthews, 2006). The most prototypical way to encode caused motion in Cantonese is the disposal construction, which consists of a disposal marker ‘zoeng1’ followed by two or more transitive verbs (Yiu, 2013, 2014). Example (3) shows that in caused motion, manner of cause and path are conflated in the form of a verb compound. In addition, like Japanese, Cantonese also allows the conflation of cause with path in the main verb, leaving manner either expressed through subordination or unexpressed altogether, as demonstrated in example (4). Given these typological features, Cantonese is classified as an equipollent-framed language incorporating typological features of both S- and V-languages (Lamarre, 2007; Yiu, 2013, 2014).
(3) 佢 將 一架車 拉上 [Manner of Cause + Path] 一個山 (Yiu, 2014).
Keoi5 zoeng1 jat1 gaa3 ce1 laai1 soeng5 jat1 go3 saan1
S/he DM a car pull-ascend a CL hill
‘S/he pulled a car up a hill.’
(4) 佢 上咗 [Cause + Path] 三箱貨 喺 個架 (度)
Keoi5 soeng5 zo2 saam1 soeng1 fo3 hai2 go3 gaa2 (dou6).
S/he ascend ASP three goods at the CL shelf (Localizer)
‘S/he moved three boxes of goods up onto the shelf.’
The typological status of Cantonese can be attributed to the diachronic change that Classical Chinese underwent from a V-language to an S-language (Peyraube, 2006); in some Chinese dialects, such as Cantonese, this typological transformation is not yet complete (Xu, 2006; Yiu, 2013, 2014). Thus, it has been proposed that the typological distinction between S- and V-languages should be viewed not as an absolute dichotomy, but as a continuum with various degrees of manner and path salience (Slobin, 2004; Zlatev & Yangklang, 2004).
Encoding manner or cause of motion via peripheral devices is more characteristic of V-language speakers when describing a boundary-crossing movement. According to the boundary-crossing constraint, manner verbs are not supposed to be used in a situation where there is a categorical change of location (Aske, 1989; Slobin & Hoiting, 1994). However, this constraint does not apply to S-languages or equipollent-framed languages (E-languages) (Slobin, 2004, 2006). Thus, it is suggested that a boundary-crossing movement serves as the clearest context in which to address cross-linguistic differences in motion event cognition (Slobin, 2006).
As reviewed above, the typological contrasts in motion event encoding make manner of cause less codable in Japanese than in English and Cantonese, as there is no obligatory syntactic slot to encode this information. In addition, as V-languages only license manner subordination in a boundary-crossing event, the high cognitive processing load tends to prevent V-language speakers from encoding manner as often as S-language speakers. According to the manner salience hypothesis (Slobin, 2004), speakers’ memory and attention are guided by variations in the lexical and grammatical patterns, such that speakers allocate more attention to the element that is made more available and salient by the language (Slobin, 2003, 2006).
2.3 Cross-linguistic differences in motion event encoding
Cross-linguistic research on motion event lexicalization mostly focuses on voluntary motion, whereas only a few studies have explored caused motion events (Choi & Bowerman, 1991; Hendriks, Hickmann & Demagny, 2008; Hickmann & Hendriks, 2010; Hickmann, Hendriks, Harr & Bonnet, 2018; Ji, Hendriks & Hickmann, 2011; Montero-Melis & Bylund, 2017). Some studies have reported language-specific features in caused motion encoding between English and Korean (Choi & Bowerman, 1991), Spanish and Swedish (Montero-Melis & Bylund, 2017), and English and Chinese (Ji et al., 2011). However, other studies have found no cross-linguistic differences in the linguistic encoding of manner of cause, path, and manner of object between English and French (Hickmann & Hendriks, 2010; Hickmann et al., 2018). The mixed results suggest that it is important to choose language pairs with the clearest typological contrasts when conducting cross-linguistic research.
With respect to L2 acquisition, some studies have demonstrated that bilinguals or L2 learners with typologically different languages may transfer L1-based lexicalization patterns into an L2 (Cadierno & Ruiz, 2006; Daller et al., 2011; Ochsenbauer & Engemann, 2011). However, other studies report that L2 learners are able to restructure their L1-based lexicalization patterns when describing motion events in an L2 (Ji & Hohenstein, 2014). The diverse results indicate that conceptual restructuring is a dynamic process and susceptible to individual differences, such as age of acquisition (Engemann, Harr & Hickmann, 2012; Hohenstein, Eisenberg & Naigles, 2006), L2 proficiency (Ji & Hohenstein, 2014; Treffers-Daller & Calude, 2015), and frequency of language use (Daller et al., 2011).
2.4 Cross-linguistic differences in motion event conceptualization
Moving beyond language use, cross-linguistic research on motion events has started to question whether language-specific patterns in lexicalization affect event conceptualization at a deeper level of cognition. Such effects have been well documented in children and adults, with different combinations of language pairs, and through a wide range of non-verbal measurements such as similarity judgements, recognition memory, attention allocation, reaction times and gestures (Brown, 2015; Filipović, 2011; Flecken et al., 2015b; Montero-Melis et al., 2016; Papafragou, Hulbert & Trueswell, 2008; Von Stutterheim et al., 2012).
Some studies utilize a triads-matching paradigm to tap into participants' potential bias in event categorization or similarity judgments. Results showed that effects of language on conceptualization were found when language was either explicitly or implicitly involved in the process of decision-making (Gennari et al., 2002; Montero-Melis & Bylund, 2017; Papafragou & Selimis, 2010). However, such effects disappeared when verbal interference (Ji & Hohenstein, 2017; Trueswell & Papafragou, 2010) or a task distraction was introduced (Filipović & Geva, 2012). Meanwhile, other studies use a recognition memory test to explore whether the different degrees of salience that a language attaches to manner and path affect the memorization and recall of the relevant elements (Filipović, 2011, 2018; Kersten et al., 2010). Results further illustrate that the effects of language on thought are obtained under conditions in which access to language is not blocked during cognitive processing.
In addition, many studies have applied eye-tracking and preferential-looking paradigms to examine participants’ attention allocation during event perception (Flecken et al., 2015b; Papafragou et al., 2008; Soroli & Hickmann, 2010; Von Stutterheim et al., 2012). For example, Papafragou et al. (2008) used eye-tracking to test whether native speakers of English (S-language) and Greek (V-language) allocated the same amount of attention to manner and path while viewing an unfolding event. Results showed that language-specific patterns in motion event lexicalization affected participants' attention allocation during speech production. However, such effects disappeared in a non-verbal condition when access to language was blocked by task manipulation. Similar results were reported by Von Stutterheim et al. (2012), who investigated how languages with a grammatical aspect system biased speakers’ attention towards event trajectory or endpoint. Based on eye-tracking data from adult speakers of seven languages, results illustrated that non-aspect language speakers encoded endpoints more often in verbalization and allocated more attention to endpoints in conceptualization, whereas aspect-language speakers were less prone to mention endpoints in verbalization and paid more attention to the ongoing phase of the same event. The findings are in line with the cognitive grammar view that attention is drawn towards the linguistic forms highlighted by grammar.
More recently, several studies have started to use reaction time as a subtle measurement of participants' cognitive behaviors in motion event perception (Flecken, Athanasopoulos, Kuipers & Thierry, 2015a; Ji, 2017; Ji & Hohenstein, 2017, 2018). In a recent study, Ji and Hohenstein (2018) examined how Chinese and English children and adults categorized and responded to caused motion in a non-verbal condition. Results showed that participants demonstrated an overall preference for path-match choices in categorization regardless of age and language group. However, their reaction times for manner and path preferences patterned with the linguistic properties of each language: English monolinguals reacted much more quickly in making manner-match choices than path-match choices, whereas Chinese monolinguals reacted equally quickly in making manner- and path-match choices. It was suggested that the form-meaning associations between the grammatical status of manner/path and non-linguistic behavior can be detected even if language use is blocked via verbal shadowing during decision-making.
Beyond studies of monolingual speakers, only a few studies have probed into the effects of language on thought in bi- or multilingual speakers. The core issue with bilingual speakers or second language learners is whether learning an additional language with contrastive typological features gives rise to cognitive restructuring, and which factors modulate the restructuring process (Bylund & Athanasopoulos, 2014b).
On the one hand, some studies suggest that the already established conceptual categories in the L1 are stable and resistant to change regardless of increased L2 proficiency (Aveledo & Athanasopoulos, 2016; Cadierno, 2010; Filipovic, 2018). For example, Filipovic (2018) examined the lexicalization patterns and recall memory in causation events (intentional vs non-intentional) with late English–Spanish and Spanish–English bilinguals. Results suggested that bilinguals consistently relied on their habitual L1 thinking patterns as an aid to memory even though the L2 was used in language production.
On the other hand, a few studies have demonstrated that learning an additional language means acquiring a new way of thinking, especially when participants' access to language use is not blocked in cognitive processing (Athanasopoulos, Bylund, et al., 2015a; Brown & Gullberg, 2011; Filipović, 2011; Hohenstein et al., 2006; Kersten et al., 2010; Lai et al., 2014). For example, Kersten et al. (2010) examined how late Spanish–English bilinguals classified novel objects by using a supervised learning paradigm. Results showed that bilinguals achieved better performance in manner recognition when tested in an English-instructed context than in a Spanish-instructed context. Similar results were reported by Lai et al. (2014): in event categorization, late English–Spanish bilinguals in a Spanish-priming condition were more likely to base their judgment on path of motion than those in an English-priming condition and English monolinguals. In addition, Athanasopoulos, Bylund, et al. (2015a) further demonstrated that German–English bilinguals switched their preferences between ongoingness and goal orientation in event categorization as a function of the language in operation.
To sum up, the overall results in L2 acquisition and bilingualism have provided evidence for cognitive restructuring when participants are involved in language use. The degree of cognitive restructuring is modulated by various predictors such as age of L2 acquisition (Filipović, 2011; Kersten et al., 2010; Lai et al., 2014), L2 proficiency (Athanasopoulos, 2009; Athanasopoulos et al., 2015b; Ji, 2017; Park & Ziegler, 2014), and the frequency of L2 use (Bylund & Athanasopoulos, 2014a; Bylund et al., 2013).
3. The present study
The current study aims to expand the sphere of event cognition from bilingualism to multilingualism from a lexical perspective and takes a first step in investigating how speakers of three typologically different languages gauge event similarity in caused motion. It aims to examine whether, and to what extent, the acquisition of an L2-English and an L3-Japanese recalibrates the lexicalization (event structures and semantic distributions) and conceptualization patterns (categorical preferences and reaction time) established in the L1-Cantonese when participants are actively involved in speaking the target languages (i.e., L2-English for bilinguals and L3-Japanese for multilinguals). Specific research questions are formulated as follows:
1. How do Cantonese–English–Japanese multilinguals lexicalize caused motion in their L3 compared with Cantonese–English bilinguals and monolinguals of each language?
2. How do Cantonese–English–Japanese multilinguals conceptualize caused motion compared with Cantonese–English bilinguals and monolinguals of each language?
3. To what extent does the amount of language contact with each language affect the restructuring process in the multilingual mind?
4. Method
4.1 Participants
A total of 150 university students took part in the study and were divided into five language groups (N = 30 per group). Native controls for Cantonese (Mage = 22.1, SD = 2.7), English (Mage = 23.7, SD = 1.9) and Japanese (Mage = 24.6, SD = 2.3) were recruited from local universities in China, the UK and Japan. Monolingual native controls in this study are speakers with limited proficiency in, and minimal exposure to, any foreign language; the dominant language in their daily communication is their native language. Cantonese–English bilinguals (Mage = 20.7, SD = 2.1) and Cantonese–English–Japanese multilinguals (Mage = 21.2, SD = 1.8) were from Hong Kong, where both Cantonese and English are official languages. According to the language policy in Hong Kong, students normally start learning L2 English at an average age of three as early bilinguals and pick up a third language as either a major or a minor at university.
For bilingual speakers, the mean age of onset of L2 learning was 3.7 (SD = 1.5); for multilingual speakers, it was 3.4 (SD = 1.7) for the L2 and 19.2 (SD = 1.4) for the L3. Owing to early exposure to and active use of the L2, speakers had already achieved a high level of proficiency in English. In line with previous studies (Athanasopoulos et al., 2015b; Montero-Melis et al., 2016; Park & Ziegler, 2014), participants' language proficiency was self-evaluated in a language history questionnaire. Participants rated their current proficiency in all languages they knew on a seven-point scale, where 7 is the maximum rating. In accordance with the Common European Framework of Reference for Languages (Council of Europe, 2011), bilinguals' proficiency in English (M = 6.41; SD = 0.51) and multilinguals' proficiency in English (M = 6.56; SD = 0.45) and Japanese (M = 6.26; SD = 0.67) were above the upper intermediate level (C rating), as measured by their self-rating scores. Thus, bilingualism and multilingualism in the current study are defined as the alternate use of two or more languages at high proficiency.
To measure multilinguals’ language contact with Cantonese, English and Japanese, daily language use over the preceding three months was estimated in hours. Participants self-reported the time they spent on daily activities (e.g., watching television, reading for school, and talking with native speakers) in each language. Detailed information about multilingual speakers’ language background is presented in Table 1.
Note: The frequency of language use in Cantonese, English and Japanese was estimated by hours/day.
4.2 Materials
4.2.1 Task 1: linguistic encoding of caused motion
Linguistic descriptions of caused motion were elicited with 48 animated cartoons, comprising 36 test items and 12 control items. Each animation was 6 seconds long. Following the model developed by Hickmann and Hendriks (2010), this study focuses on a specific type of complex motion in which different linguistic elements (i.e., manner of cause, manner of object and agent) can be encoded simultaneously. The test items depicted a boy (the agent) performing a certain action (i.e., push or pull) on an object, which directly caused its movement (i.e., roll or slide) along a certain trajectory (i.e., into, out of). Each animation had a clear destination (goal of motion). In addition, the agent moved together with the object throughout the course by walking. The control items shared the same types of actions with the test items but largely minimized the path information. The control items served two functions: 1) to distract participants from following the same lexicalization patterns throughout and 2) to test whether multilingual speakers had already mastered the vocabulary needed to describe various types of manner in the target language. Altogether, four types of manner of cause (pull, push, drag and kick) and four types of path (into, out of, across and along) were covered in the stimuli. The stimuli were fully randomized and counterbalanced across participants. A full list of stimuli is presented in Appendix A.
4.2.2 Task 2: Non-linguistic categorization of caused motion
The stimuli consisted of 18 sets of animated videos, including 12 sets of test triads and 6 sets of filler items. The test items had the same content as the stimuli used in the linguistic encoding task. This ensured that participants had described all scenes prior to event categorization. Each triad contained three animated videos: a target video (e.g., A boy pushes a box into the room) and its two alternates, with manner and path as the contrast of interest. In the manner-match alternate, manner of cause remained the same while path was changed (e.g., A boy pushes a box out of the room). In the path-match alternate, path of motion remained the same whereas manner of cause was different (e.g., A boy pulls a box into the room). In order to keep manner-path as the only contrast of interest, other semantic components of the caused motion (Figure, Ground, and Goal) remained consistent within each test triad. Following Loucks and Pederson (2011), 6 sets of filler items were introduced to mask the contrast of interest and to discourage participants from strategically using the same pattern throughout. Half of the fillers contrasted manner of cause with Ground while the other half contrasted path with Ground. Altogether, four sets of manner of cause contrasts (push-pull, push-drag, push-kick and drag-kick) and four sets of path contrasts (into-out of, across-along, out of-along and into-along) were used in the stimuli. All stimuli depicted horizontal motion, and the direction of the agent's movement (i.e., from left to right or from right to left of the screen) was counterbalanced across triads.
4.3 Procedure
4.3.1 Training section
Participants were tested individually by the experimenter in a quiet room at their universities. All stimuli were presented using Superlab 5.0 software on a MacBook laptop. A training session was given at the beginning of each experiment to familiarize participants with the test procedures.
4.3.2 Test section
Following other well-established studies within the framework of ‘thinking-for-speaking’ and ‘thinking-after-language’ (Filipović, 2011, 2018; Gennari et al., 2002; Lai et al., 2014; Montero-Melis, Jaeger & Bylund, 2016; Papafragou & Selimis, 2010), participants verbalized all categorization stimuli in a language production task immediately prior to the subsequent similarity judgements. This operationalization was intended to maximize the engagement of language during the decision-making process.
In the first experiment, participants were instructed to watch the categorization stimuli and describe “what happened” in each clip immediately after viewing it. Monolinguals narrated in their L1s. As the current study aims to investigate whether learning a new language means acquiring a new way of thinking, bilingual and multilingual participants were asked to narrate only in their L2-English and L3-Japanese, respectively. In order to establish a “monolingual mode” (Grosjean, 2001), all instructions were given in the language participants used in speech production.
After the linguistic encoding, participants moved on to the similarity judgement task. Following Ji and Hohenstein (2018), participants were informed that each triad would be presented in a fixed sequence: the target video played first at the bottom of the screen and disappeared on completion, and its two alternates then played simultaneously, side by side, at the top of the screen. A half-second black screen was placed between the target video and its two alternates within each triad, and a one-second black screen was placed between triads. The presentation order of the triads was counterbalanced across participants in each group. The location of the manner- and path-match alternates on the screen (right or left side) was counterbalanced across stimuli in a fixed order. Participants decided which alternate was more similar to the target by pressing one of two keys on the keyboard (A or L). They were asked to respond as quickly as possible, and their reaction time was recorded automatically.
After the experimental session, participants were instructed to complete a language background questionnaire.
4.4 Data coding
4.4.1 Linguistic data
The linguistic data were transcribed by three L1 speakers of each language and segmented into clauses. Only test items were coded for the analysis. Following Berman and Slobin (1994), a clause is defined as either a syntactically simple sentence or a complex sentence containing a unified predicate. The semantic coding of each clause followed the guidelines developed for English and Japanese (Brown & Gullberg, 2010; Hickmann, Taranne & Bonnet, 2009). Cantonese data were transcribed based on the guidelines adapted for Chinese data (Ji & Hohenstein, 2014). Descriptions without a specific focus on motion were excluded from the analysis (e.g., The river was frozen). Most of the target responses (98%) consisted of one clause. Within each clause, descriptions were coded from two perspectives: 1) whether participants mentioned path and manner of cause and 2) where the target elements were encoded (i.e., in the main verb or in other peripheral devices). To establish coding reliability, 20% of the data was re-coded by a second rater. Inter-coder reliability, measured with Cohen's kappa (κ = .97), showed high agreement between coders on the frequency and semantic distribution of each element. For the control items, one point was given when participants used the target manner verbs in their oral descriptions.
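To make the reliability statistic concrete, the following minimal Python sketch computes Cohen's kappa for two coders who label the same items. The function name `cohens_kappa` and the example labels are hypothetical illustrations, not the study's coding data or its actual software.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters' categorical codes of the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: share of items given the same code by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per code.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:  # both raters used a single identical code throughout
        return 1.0
    return (observed - expected) / (1 - expected)

# Hypothetical codes: "V" = element in the main verb, "OTH" = elsewhere.
coder_1 = ["V", "V", "OTH", "V", "OTH", "V", "OTH", "V", "V", "OTH"]
coder_2 = ["V", "V", "OTH", "V", "OTH", "V", "V", "V", "V", "OTH"]
kappa = cohens_kappa(coder_1, coder_2)
print(f"kappa = {kappa:.2f}")
```

In this toy example the coders agree on 9 of 10 items, but kappa is lower than 0.90 because part of that agreement is expected by chance.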
4.4.2 Non-linguistic data
Non-linguistic data in the similarity judgement task were coded as a binary dependent variable where “0” represented participants’ choice of the path-match alternate and “1” the cause-manner-match alternate. Participants’ reaction time (RT) in each triad was measured as a continuous variable, calculated from the onset of the alternate videos until participants made their decision. Theoretically, the longest RT to each triad is 6 seconds (the same length as the video clip). Extremely long and short values were treated as outliers: values more than two standard deviations (SD) above or below the mean were replaced by the value at two SDs from the mean. Altogether, 85 outliers out of 1,800 items were replaced in this way, so more than 95% of the data entered the data set unchanged.
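The replacement procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper name `clamp_outliers` and the RT values are hypothetical, the sample standard deviation is used, and the mean and SD are computed over the whole list passed in (the text does not specify whether trimming was done per participant, per group or overall).

```python
from statistics import mean, stdev

def clamp_outliers(rts, n_sd=2.0):
    """Replace values beyond mean +/- n_sd * SD with the boundary value."""
    m, s = mean(rts), stdev(rts)  # assumption: sample SD over all values
    lower, upper = m - n_sd * s, m + n_sd * s
    clamped = [min(max(rt, lower), upper) for rt in rts]
    n_replaced = sum(rt < lower or rt > upper for rt in rts)
    return clamped, n_replaced

# Hypothetical reaction times in seconds, with one extreme value.
rts = [1.2, 1.4, 1.3, 1.5, 1.1, 1.6, 1.4, 1.3, 1.2, 5.9]
clamped, n_replaced = clamp_outliers(rts)
print(n_replaced, round(max(clamped), 2))
```

For reference, 85 replaced values out of 1,800 correspond to about 4.7% of the items.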
5. Results
5.1 Linguistic encoding of caused motion event
5.1.1 Frequency of Manner and Path encoding across five groups
Altogether 5324 target descriptions were included in the final analysis. Monolingual descriptions were in Cantonese, English and Japanese. Bilingual descriptions were in L2-English whereas multilingual descriptions were in L3-Japanese. Participants’ selection of manner of cause and path in each utterance was transformed into percentage scores and compared as a function of participant group. As shown in Figure 1, participants in every group expressed Path at near-ceiling frequency (English: M = 97.22%, SD = 4.95%; Bilinguals: M = 95.83%, SD = 5.35%; Cantonese: M = 95.65%, SD = 5.24%; Multilinguals: M = 96.95%, SD = 3.45%; Japanese: M = 97.03%, SD = 3.17%). For Manner of Cause, however, encoding frequency decreased stepwise across participant groups (English: M = 98.15%, SD = 3.59%; Bilinguals: M = 97.31%, SD = 4.40%; Cantonese: M = 90.65%, SD = 11.49%; Multilinguals: M = 79.91%, SD = 17.79%; Japanese: M = 76.85%, SD = 13.56%). That is, English monolinguals, bilinguals in L2-English and Cantonese monolinguals predominantly encoded manner of cause in their oral descriptions, as shown in (5), (6) and (7), while Japanese monolinguals and multilinguals in L3-Japanese showed the lowest level of C-manner encoding, as shown in (8) and (9).
(5) English monolinguals:
A boy is pulling [C-manner] a metal chair into [Path] his bedroom (ENG12cau).
(6) Bilinguals:
A boy is pushing [C-manner] a suitcase across [Path] the street (BIL21cau).
(7) Cantonese monolinguals:
Keoi5 zoeng1 jat1 gaa3ce1 laai1jap6 [C-manner + path]
jat1go3saan1dung6
He DM a toy car drag-enter a cave
‘He dragged a toy car into a cave.’ (CAN6cau)
(8) Japanese monolinguals:
Kara-wa sūtsukēsu-o dōkutsu-ni
ireta [C-path in the main verb].
He-TOP suitcase-ACC cave-GOAL make enter-PST
‘He moved a suitcase into the cave.’ (JAP20cau)
(9) Multilinguals:
Kara-wa hoīru -o tento -ni ireta [C-path in the main verb].
He-TOP wheel-ACC tent-GOAL make enter-PST
‘He moved a wheel into the tent.’ (MUL21cau)
To assess whether speakers from each group differed in the likelihood of encoding Manner of Cause and Path, two separate logistic mixed-effect models (Footnotes 1 and 2) were fitted using the glmer function from the lme4 package (Bates, Maechler, Bolker & Walker, 2014) in R (R Development Core Team, 2013). Within each model, the binary dependent variable was the selection of the target semantic element (yes or no) and the fixed effect was participant group. For path encoding, the inclusion of participant group did not significantly improve the model fit compared with the null model (χ2 (4) = 7.29, p = 0.12), indicating no significant main effect of group. In other words, participants across groups were equally likely to encode path in their linguistic descriptions. For manner encoding, the fixed effect was participant group and the random effects were crossed random intercepts for participant and item. The inclusion of participant group significantly improved the model fit compared with the null model (χ2 (4) = 479.44, p < .001), indicating a significant main effect of participant group. Forward coding was used to compare the log-odds of manner of cause encoding in each group with that in the next group. Results showed that bilinguals encoded more manner of cause than Cantonese monolinguals (βBilinguals-Cantonese = 1.59, SE = 0.23, Wald z = 6.70, p < .001) but patterned with English monolinguals (βBilinguals-English = −0.39, SE = 0.29, Wald z = −1.32, p = 0.18). Multilinguals encoded less manner of cause than Cantonese monolinguals (βMultilingual-Cantonese = −1.00, SE = 0.13, Wald z = −7.43, p < .001) but patterned with Japanese monolinguals (βMultilinguals-Japanese = 0.21, SE = 0.11, Wald z = −1.88, p = 0.06).
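The model comparisons above are likelihood ratio tests: the statistic is twice the difference in log-likelihood between the model with participant group and the null model, referred to a χ2 distribution with 4 degrees of freedom (five groups contribute four extra parameters). As a minimal sketch, the p-values can be recovered from the reported statistics with a closed-form upper tail that holds for even degrees of freedom. The helper name `chi2_sf_even_df` is hypothetical, and this is only an illustration of the arithmetic, not the authors' R code.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable (even df only)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form used here requires positive even df")
    half = x / 2.0
    # P(X > x) = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
    return math.exp(-half) * sum(
        half ** k / math.factorial(k) for k in range(df // 2)
    )

p_path = chi2_sf_even_df(7.29, 4)      # path encoding: group vs. null model
p_manner = chi2_sf_even_df(479.44, 4)  # manner encoding: group vs. null model
print(f"path: p = {p_path:.2f}; manner: p < .001 is {p_manner < 0.001}")
```

With the reported statistics, the first test gives p of about 0.12 and the second a p-value far below .001, in agreement with the values in the text.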
5.1.2 Semantic distribution of Manner of Cause and Path across five groups
The semantic distribution of C-manner and path is in line with the typological status of each language (Table 2). As an S-language, English encodes manner of cause in the main verb (M = 97.8%, SD = 3.5) and path in the satellite (M = 97.6%, SD = 2.6). Because encoding path in the verb and manner in subordination is the default choice for V-language speakers in a boundary-crossing event, Japanese encodes path in the main verb (M = 95.0%, SD = 5.8) and manner in subordination (M = 96.3%, SD = 8.5). Cantonese, an equipollently-framed language standing midway on the continuum between S- and V-languages, encodes manner (M = 93.9%, SD = 9.8) and path (M = 96.1%, SD = 6.4) in verbs with equal grammatical status. Language-specific examples are given in (10), (11), and (12) below.
(10) English: C-manner in the main verb, path in the satellite
A boy is pushing [C-manner in the main verb] a box across [path in satellite] the road (ENG11cau).
(11) Japanese: C-manner in subordination, path in the main verb
Kara-wa sūtsukēsu-o oshite [C-manner in OTH]
michi-o watalimashita [Path in the main verb].
He-TOP suitcase-ACC pushing-GER
street-ACC cross PST.
‘He crossed the street pushing a suitcase.’ (JAP12cau)
(12) Cantonese: C-manner and path in main verbs
Go3 naam4 zai2 zoeng1 go3toi2 teui1fann1
[C-manner + path in the verb] seoi6 fong2 (CAN5cau).
A boy DM a table push-enter
the bedroom
‘A boy pushed a table into the bedroom.’
Notes: The sum of the first two columns within each language group does not always add up to 100% as manner of cause and path of motion can be double-encoded in V and OTH at the same time (e.g.: The boy is [dragging]Verb a box up the hill [slowly] OTH). The denominator is the overall frequency of manner and path encoding.
To further address whether participants in each group were equally likely to use a manner verb or a path verb, two separate logistic mixed-effect models (Footnotes 3 and 4) were fitted with the use of a Manner verb or a Path verb as the respective dependent variable. The fixed effect was language group. The random effects were crossed random intercepts for participant and item. For the use of manner verbs, forward coding was used to compare the log-odds of manner encoding in each group with that in the next group. Results showed that English monolinguals and bilinguals were equally likely to encode manner in the verb, and did so more frequently than Cantonese monolinguals (βBilingual-Cantonese = 0.87, SE = 0.24, Wald z = 3.60, p < .001). However, multilinguals used significantly fewer manner verbs than Cantonese monolinguals (βCantonese-Multilingual = 4.83, SE = 0.19, Wald z = 24.67, p < .001), although their pattern did not fully resemble that of Japanese monolinguals (βMultilingual-Japanese = −2.52, SE = 0.21, Wald z = −12.24, p < .001). For the use of Path verbs, bilinguals patterned with English monolinguals in not encoding path in the main verb. Multilinguals used more path verbs than bilingual speakers (βMultilingual-Bilingual = 9.52, SE = 0.36, Wald z = 26.1, p < 0.01) but fewer path verbs than Japanese monolinguals (βMultilingual-Japanese = −2.35, SE = 0.19, Wald z = −12.11, p < 0.01).
5.2 Similarity judgement of caused motion event
5.2.1 Categorical preferences of cause-manner-/path-match alternates across five groups
Figure 2 shows participants’ manner/path-match preferences in the subsequent categorization task. A logistic mixed-effect model (Footnote 5) was built with participants’ categorical choice as a binary dependent variable and participant group as the fixed effect. Random effects included crossed random intercepts for items and participants. The intercept represents the log-odds of choosing a Manner-match alternate, and its negative value shows that participants across language groups had an overall preference for the path-match alternate (β0 = −0.57, SE = 0.20, Wald z = −2.79, p = 0.005). Including participant group as the fixed effect did not significantly improve the model (χ2 (4) = 8.86, p = 0.07) compared with the null model, indicating no significant main effect of group: participants in each group were equally likely to choose a Path-match alternate in motion event categorization (English: M = 65.56%, SD = 30.93%; Bilinguals: M = 61.67%, SD = 30.13%; Cantonese: M = 62.50%, SD = 28.85%; Multilinguals: M = 62.78%, SD = 33.52%; Japanese: M = 70.56%, SD = 25.40%).
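To see what the reported intercept implies on the probability scale, the log-odds can be passed through the inverse logit function. The sketch below is illustrative only: the helper name `inv_logit` is hypothetical, and the conversion describes a typical participant and item (random effects at zero), so it need not equal the raw group means exactly.

```python
import math

def inv_logit(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))

beta_0 = -0.57  # reported intercept: log-odds of a manner-match choice
p_manner_match = inv_logit(beta_0)
p_path_match = 1.0 - p_manner_match
print(f"manner-match: {p_manner_match:.2f}, path-match: {p_path_match:.2f}")
```

Under these assumptions the intercept corresponds to roughly a 36% probability of a manner-match choice and a 64% probability of a path-match choice, which is in the range of the group percentages reported above.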
5.2.2 Reaction time to Manner-/Path-match alternates across five groups
Participants’ RT for manner- and path-match preferences was measured as a continuous variable and used to indicate participants’ efficiency in cognitive processing. The mean RT for manner- and path-match alternates in each participant group is presented in Table 3.
A mixed-effect model (Footnote 6) was built using the lmer function from the lme4 package (Bates et al., 2014) with log-transformed RT as the continuous dependent variable. Fixed effects included participant group, preference type (i.e., manner- or path-match preference) and their interaction. Random effects included crossed random intercepts for participant and item. The dependent variable was log-transformed to meet the assumption of normality of residuals. The details of the fixed-effect parameter estimates are given in Table 4. Results suggested that both participant group and its interaction with preference type had significant effects on RT.
Note: The intercept represents the log-transformed RT when the preference type is Path-match alternate and participant group is Cantonese.
To further address the interaction between participant group and preference type, five separate mixed-effect models were built, one per group, with log-transformed RT as the dependent variable and preference type as the fixed effect to address within-group differences. Random effects included crossed random intercepts for participant and item. Each model set the path-match alternate as the baseline for comparison. Results confirmed that, for Cantonese monolinguals, RTs for manner- and path-match alternates did not differ. For English monolinguals (β0 = −0.14, SE = 0.04, t = −3.55, p < .001) and bilinguals (β0 = −0.12, SE = 0.04, t = −3.26, p = 0.001), mean RT for the manner-match alternate was faster than for the path-match alternate. However, for Japanese monolinguals (β0 = 0.14, SE = 0.04, t = 4.05, p < .001) and multilinguals (β0 = 0.10, SE = 0.03, t = 3.34, p < .001), mean RT for the path-match alternate was faster than for the manner-match alternate, as visualized in Figure 3.
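Because the dependent variable is log-transformed, a coefficient can be read as an approximate proportional change in RT once it is exponentiated. The sketch below assumes a natural-log transform, which the text does not state explicitly; the helper name `percent_change` is hypothetical.

```python
import math

def percent_change(beta):
    """Percent change in RT implied by a coefficient on natural-log RT."""
    return (math.exp(beta) - 1.0) * 100.0

# Reported within-group coefficients (manner-match relative to path-match).
english = percent_change(-0.14)
japanese = percent_change(0.14)
print(f"English: {english:.1f}%, Japanese: {japanese:+.1f}%")
```

Under this assumption, the English coefficient corresponds to manner-match RTs about 13% shorter than path-match RTs, and the Japanese coefficient to manner-match RTs about 15% longer.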
In addition, two further mixed-effect models were built with log-transformed RT as the dependent variable and participant group as the fixed effect to address between-group differences. Random effects included crossed random intercepts for participant and item. Results suggested that English monolinguals made manner-match choices faster than bilinguals (β0 = −0.11, SE = 0.04, t = −2.89, p = .03), multilinguals (β0 = −0.19, SE = 0.04, t = −4.79, p < .001) and Japanese speakers (β0 = −0.11, SE = 0.04, t = −2.61, p = .04), while Japanese monolinguals made path-match decisions faster than all the other four groups (vs. English: β0 = −0.18, SE = 0.03, t = −6.46, p < .001; vs. Bilingual: β0 = −0.26, SE = 0.03, t = −8.98, p < .001; vs. Cantonese: β0 = −0.11, SE = 0.03, t = −4.01, p < .001; vs. Multilingual: β0 = −0.22, SE = 0.03, t = −7.96, p < .001).
5.3 Factors predictive of cognitive restructuring in the multilingual mind
We further investigated whether, for multilingual speakers, the amount of contact with each language affects the degree of cognitive restructuring in both linguistic and non-linguistic tasks. Following Athanasopoulos (2009), language contact is defined as the amount of use multilinguals make of each language, measured by totaling each participant's use of the L1, L2 and L3. On average, multilinguals used Cantonese 24.63% (SD = 9.22) of the time, English 26.20% (SD = 12.99) of the time and Japanese 49.16% (SD = 13.89) of the time. Thus, Japanese was the dominant language used in daily activities.
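One plausible way to arrive at such percentages from the self-reported hours per day is to divide each language's hours by the participant's total across the three languages. The sketch below illustrates that arithmetic; the function name `usage_shares` and the hours are hypothetical, and the exact computation used in the study is an assumption here.

```python
def usage_shares(hours_per_day):
    """Percentage of total daily language use accounted for by each language."""
    total = sum(hours_per_day.values())
    if total <= 0:
        raise ValueError("total hours of language use must be positive")
    return {lang: 100.0 * h / total for lang, h in hours_per_day.items()}

# Hypothetical self-report for one multilingual participant (hours/day).
hours = {"Cantonese": 2.0, "English": 2.0, "Japanese": 4.0}
shares = usage_shares(hours)
print(shares)
```

Shares computed this way sum to 100% for each participant, which is consistent with the three group means reported above adding up to approximately 100%.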
A logistic mixed-effect model (Footnote 7) was set up with the absence or presence of manner verbs as the binary dependent variable, the respective amounts of L1, L2 and L3 use as fixed effects, and random intercepts for items and participants. The use of manner verbs was positively associated with the amount of English use and negatively associated with the amount of Japanese use. The amount of Cantonese use, however, was not a significant predictor, as illustrated in Table 5.
Following Ji and Hohenstein (2018), a multiple linear regression (Footnote 8) was built with the mean RT difference (manner-match preference minus path-match preference) as the dependent variable and the respective amounts of L1, L2 and L3 use as explanatory variables. Positive values of the RT difference indicated longer reaction times for manner-match choices, whereas negative values indicated longer reaction times for path-match choices. Results showed that the more frequently English was used in daily communication, the faster participants reacted to manner-match choices, whereas the more frequently Japanese was used, the faster participants reacted to path-match choices (Footnote 9). However, Cantonese use was not a significant predictor, as shown in Table 6.
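The dependent variable of this regression can be sketched as a per-participant difference score. The code below is a minimal illustration with hypothetical trial data and a hypothetical helper name (`rt_difference_scores`); it only shows how the sign of the score maps onto the interpretation given above.

```python
from collections import defaultdict
from statistics import mean

def rt_difference_scores(trials):
    """Per-participant mean RT(manner-match) minus mean RT(path-match)."""
    by_choice = defaultdict(lambda: {"manner": [], "path": []})
    for participant, choice, rt in trials:
        by_choice[participant][choice].append(rt)
    scores = {}
    for participant, rts in by_choice.items():
        # A score is defined only if the participant made both choice types.
        if rts["manner"] and rts["path"]:
            scores[participant] = mean(rts["manner"]) - mean(rts["path"])
    return scores

# Hypothetical trials: (participant, chosen alternate, RT in seconds).
trials = [
    ("P01", "manner", 2.4), ("P01", "manner", 2.6), ("P01", "path", 2.0),
    ("P02", "manner", 1.8), ("P02", "path", 2.2), ("P02", "path", 2.4),
]
scores = rt_difference_scores(trials)
print(scores)
```

Here P01 has a positive score (slower on manner-match choices) and P02 a negative score (slower on path-match choices).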
6. General discussion
The current study goes beyond the bipartite classification of motion events with multilingual speakers of three typologically different languages and aims to investigate whether and to what extent acquiring an additional language gives rise to cognitive restructuring when the target language is actively involved in the decision-making process. It also addresses how the amount of language contact with each language modulates this process.
The first research question examined how multilingual speakers lexicalized caused motion in comparison with bilingual and monolingual controls of each language. Participants’ responses in event lexicalization were analyzed in terms of the frequency of manner and path selection and their semantic distribution (i.e., encoded by motion verb or satellite). Results of the monolingual data confirmed the typological constraints of each language. For manner encoding, English (S-language) expressed manner of cause more frequently than Cantonese (E-language). Meanwhile, Japanese (V-language) presented the lowest frequency of manner encoding. The differences in information selection can be attributed to the language-specific conflation patterns and availability of manner expressions in caused motion (Talmy, 2000). In Japanese, as cause of motion is frequently conflated with path in the main verb, there is no obligatory syntactic slot for the encoding of manner of cause. As a result, manner of cause can be easily added or dropped in the description. In addition, Japanese has a limited set of lexical devices for manner expressions (Matsumoto, 2017), such that speakers opted to use a more general expression to encode pure causation (i.e., put, take, carry). Being an equipollently-framed language, Cantonese most prototypically encodes manner and path in a verb-compound. Although Cantonese allows the conflation of cause with path in the main verb while not expressing manner at all, this construction is not used as frequently as that in Japanese.
For path encoding, all three monolingual groups reached a ceiling level, indicating that path is a central element in motion events (Talmy, 1985, 2000). The results are in line with the manner salience hypothesis that cross-linguistic differences in event lexicalization are only found in the likelihood of manner selection (Slobin, 1996, 2004). With regard to the semantic distribution of manner and path, English monolinguals predominantly encoded manner in the main verb like other S-languages (Slobin, 2004), whereas Japanese monolinguals encoded path in the main verb like other V-languages (Brown & Chen, 2013; Spring & Horie, 2013). Meanwhile, Cantonese monolinguals encoded manner and path in a verb-compound with equal grammatical salience (Francis & Matthews, 2006; Matthews, 2006).
Turning to bilingual speakers, results suggested that bilinguals in L2 English largely patterned with English monolinguals in both manner selection (i.e., with high frequency) and semantic distribution (i.e., manner verb + path satellite). This suggests that bilingual speakers have fully acquired the L2-based lexicalization patterns due to early exposure and active use of the L2 in daily communication (Aveledo & Athanasopoulos, 2016; Bylund & Athanasopoulos, 2014a; Bylund et al., 2013). For multilingual speakers in L3 Japanese, results indicate an ongoing cognitive restructuring towards the L3-based patterns as multilinguals with a high proficiency in Japanese presented a tendency of encoding manner less frequently, a typical characteristic of V-language speakers. We have ruled out the possibility that the lower frequency of manner encoding in L3 learners might be due to the incomplete acquisition of the target vocabulary or the use of avoidance as a communication strategy, because they have already mastered all target manner expressions in their descriptions of the control items.
In addition, multilingual learners presented a clear divergence from the L1- and L2-based patterns towards target L3-based patterns in using the “path verbs + manner subordinate” construction when describing a boundary-crossing event, a construction that has triggered difficulties for learners with contrastive linguistic features (Daller et al., 2011). Two reasons may account for this. First, as mentioned in the typology section, Cantonese is an E-language with properties of both S- and V-languages (Yiu, 2013). Although the most conventional construction in Cantonese is the serial-verb construction, encoding manner in a subordinate form and path in the main verb is also used in oral description. Therefore, the partial overlap between the L1 and L3 facilitates learners' acquisition of the target forms (Ji et al., 2011; Ji & Hohenstein, 2014). Second, as multilinguals have already achieved an advanced level in the L3 and used Japanese as the predominant language in their daily communication (cf. Table 1), the active use of the language facilitates the restructuring process towards the target linguistic forms (Bylund & Athanasopoulos, 2014a; Park & Ziegler, 2014), which is discussed in more detail under the third research question.
The second research question probed into how multilingual speakers conceptualized caused motion in comparison with bilingual and monolingual controls of each language. Two types of measurement were used: categorical preference and reaction time. Results suggested that participants preferred a path-match alternate in event categorization irrespective of language background. However, the RT for manner- and path-match alternates was closely associated with language-specific lexicalization patterns, demonstrating a ‘thinking-for-speaking’ effect. One possible explanation for the lack of language-specific properties in the overt selection might be that path is the core element in motion events (Talmy, 1985, 2000). Previous studies reported that children demonstrated a cognitive salience towards path in non-verbal behaviors before fully acquiring the language-specific patterns for motion event descriptions (Allen et al., 2007; Ji & Hohenstein, 2018). The second possible reason might be that the inter-typological distinctions across languages form a cline rather than discrete categories, such that the cross-linguistic differences in lexicalization might not be clear-cut enough for absolute distinctions in non-linguistic categorization (Ji & Hohenstein, 2017; Loucks & Pederson, 2011).
In contrast, the RT for manner- and path-match selection presented clear language-specific patterns: English monolinguals reacted much more quickly in making manner-match choices than path-match choices, and made manner decisions faster than bilinguals, multilinguals and Japanese monolinguals. Japanese monolinguals, however, reacted much more quickly in making path-match choices than manner-match choices, and made path decisions faster than the other language groups. Meanwhile, Cantonese monolinguals were equally efficient in making manner- and path-match choices. In line with the ‘thinking for speaking’ and ‘thinking after language’ accounts (Slobin, 1996; Wolff & Holmes, 2011), when given a categorization task, speakers tend to draw on all resources of representation available, including the linguistic ones, to facilitate decision-making. Thus, the language-specific regularities made available in the linguistic encoding task tend to mediate participants’ performances in a subsequent non-linguistic task in language-specific ways (Footnote 10; Gennari et al., 2002; Montero-Melis & Bylund, 2017; Wolff & Holmes, 2011). Previous studies have demonstrated that language effects are most likely to appear when the stimuli are complex or when the task has a time limitation (Filipovic, 2018; Trueswell & Papafragou, 2010). Thus, the differences in processing efficiency that participants showed in similarity judgements can be interpreted as a consequence of language mediation.
In English, manner is expressed in the main verb and used with high frequency, so its high codability may contribute to a higher cognitive salience in mental representations, which in turn increases its accessibility in cognitive processing (Slobin, Reference Slobin, Strömqvist and Verhoeven2004). According to cognitive grammar (Langacker, Reference Langacker2008), attention is drawn towards the form-meaning associations that grammar highlights, and speakers are more likely to access these highlighted linguistic elements when perceiving relevant information and retrieving it from memory. Thus, as manner of motion is prominently marked in English, English monolinguals may have attended to manner in the first instance because of its higher salience. Although these participants ultimately opted for the path-match alternate, their reaction time for manner-match choices was much shorter. In contrast, Japanese typically encodes path in the main verb and manner in a subordinate element with relatively low codability, so the easy access to path may have directed speakers’ attention to path in the first instance. This may facilitate the retrieval of path information and increase processing efficiency in making path-match choices. As for Cantonese, given that manner and path are typically expressed in a verb compound with equal salience, it is plausible to assume that manner and path were retrieved “in a parallel fashion”, with an equal amount of attention paid to both elements simultaneously (Ji & Hohenstein, Reference Ji and Hohenstein2017, Reference Ji and Hohenstein2018).
For bilingual speakers, the results suggested that bilinguals patterned with English monolinguals in reacting much faster to the manner-match alternate than to the path-match alternate, indicating that early exposure to an L2 gave rise not only to the internalization of novel linguistic frames, but also to the L2-specific way of ‘thinking for speaking’ in event perception. Turning to multilingual speakers, the results showed that proficient multilinguals tended to pattern with Japanese monolinguals in reacting much faster to the path-match alternate than to the manner-match alternate when Japanese was in operation, indicating an ongoing process of cognitive restructuring in the multilingual mind. This suggests that bi- and multilingual learners are able to restructure their conceptualization patterns towards the target language when speaking an L2 or L3. The results indicate that learning a new language means acquiring a new way of thinking, and that the L1 ‘thinking-for-speaking’ patterns are subject to restructuring in online thinking (Slobin, Reference Slobin, Gumperz and Levinson1996; Wolff & Holmes, Reference Wolff and Holmes2011). The findings are in line with previous studies in two respects. On the one hand, non-linguistic representations tend to be modulated by language-specific properties when access to the target language is not blocked during or prior to event categorization (Montero-Melis & Bylund, Reference Montero-Melis and Bylund2017; Trueswell & Papafragou, Reference Trueswell and Papafragou2010). On the other, bi- and multilinguals’ conceptualization patterns are dynamic and susceptible to change with the language in operation (Athanasopoulos, Bylund, et al., Reference Athanasopoulos, Bylund, Montero-Melis, Damjanovic, Schartner, Kibbe and Thierry2015a; Kersten et al., Reference Kersten, Meissner, Lechuga, Schwartz, Albrechtsen and Iglesias2010; Lai et al., Reference Lai, Rodriguez and Narasimhan2014).
The third research question examined whether multilingual speakers’ linguistic and non-linguistic behaviors were modulated by the amount of contact with each language. The results suggested that the degree of conceptual restructuring in both the verbal and the non-verbal task was associated with the amount of contact with L3-Japanese and L2-English. In other words, the more frequently participants used the L3, the more L3-based linguistic and non-linguistic patterns they produced. The results can be explained in terms of entrenchment and routinisation (Langacker, Reference Langacker2008): frequent use of the target forms may lead to an entrenchment of the corresponding conceptual categories, and the associations between language and conceptual representations can be strengthened by extensive exposure to, and frequent use of, the target language. The results are in line with previous studies showing that the more frequently a target language is used, the more likely participants are to exhibit the associated conceptualization patterns (Athanasopoulos et al., Reference Athanasopoulos, Damjanovic, Burnand and Bylund2015b; Bylund & Athanasopoulos, Reference Bylund and Athanasopoulos2014a; Bylund et al., Reference Bylund, Athanasopoulos and Oostendorp2013). Following this reasoning, given the typological contrast between L2-English and L3-Japanese, frequent use of English may hinder the restructuring process towards L3-based patterns in event lexicalization and conceptualization. However, the amount of L1 use did not serve as a core predictor in the current study, for two possible reasons. Firstly, as Cantonese is an E-language with properties of both S- and V-languages, its weaker typological contrast with the other two languages may eliminate the influence that language places on cognition. Secondly, as indicated by participants’ self-reported amount of language contact, the use of Cantonese accounted for only a small proportion of their language use compared with the predominant use of English and Japanese. Thus, the effect of the L1 might have been diminished by its limited involvement in daily communication.
7. Conclusion
The current study extends the scope of motion research by examining how Cantonese–English–Japanese multilinguals lexicalize and conceptualize caused motion in a boundary-crossing situation. Specifically, it explores how language-specific patterns in lexicalization affect different levels of cognitive processing by using two types of measurement: a categorical measure of similarity judgements and a continuous measure of reaction time. The findings showed that, in event lexicalization, multilingual speakers demonstrated a clear trend towards the target language, encoding path in the main verb and manner in a subordinate element when describing a boundary-crossing event. Although no cross-linguistic differences were found in the categorical preferences of event categorization, reaction times indicated an ongoing process of cognitive restructuring towards the L3, with multilingual speakers reacting much faster to the path-match alternate than to the manner-match alternate. In both tasks, the amount of language contact with the L2 and L3 served as the main predictors of the degree of cognitive restructuring in multilingual speakers.
The current findings demonstrate that learning an additional language may continue to shape bi- and multilinguals’ cognitive processing when the target language is actively involved in the decision-making process. In other words, learners are able to acquire relevant structures of the target language and the corresponding thinking patterns when provided with sufficient language-specific instances (Athanasopoulos et al., Reference Athanasopoulos, Damjanovic, Burnand and Bylund2015b; Bylund & Athanasopoulos, Reference Bylund and Athanasopoulos2014a; Cadierno, Reference Cadierno, Han and Cadierno2010; Park, Reference Park2019). On the whole, the current findings show that learning a new language means acquiring an alternative way of thinking, and that speakers can switch between distinct sets of thinking patterns depending on which language they are using. This finding contributes to the thinking-for-speaking hypothesis and sheds light on the complexity and diversity of how language shapes thought in the multilingual mind, and thereby on how multiple languages are learned.
Future research may combine the measurement of reaction time with eye-tracking to explore participants’ attention allocation patterns during event perception. In addition, other extra-linguistic factors, such as language proficiency and length of immersion, need to be taken into consideration when examining the dynamic relationship between progress in language learning and changes in cognitive state in the bi- or multilingual mind.
Supplementary material
For supplementary material accompanying this paper, visit https://doi.org/10.1017/S1366728921000018.
Acknowledgements
First and foremost, we would like to thank all the participants, without whom this study would not have been possible. Thanks also go to Dr. Zhaoliang Gu for his invaluable help with the video stimuli. We are extremely grateful to the editors and three anonymous reviewers for their insightful comments and feedback on earlier versions of this article.
Appendix A
Appendix B
A demonstration of the video stimuli used in the similarity judgement task (Item 1).