It has long been held that space is a domain where strong language-independent cognitive organization can be plausibly claimed, because our perceptual representation of space is universally constrained by vision and other highly structured biological systems (Bowerman, 1999, p. 387). Despite this universal biological heritage in spatial understanding, human languages vary strikingly, in some fundamental ways, in how they express spatial relations between entities and one’s spatial experience. Individual languages, with their specific lexicalization properties, seem to invite their speakers to pay attention to different aspects of their spatial experience, or to pay different degrees of attention to a given spatial component. This raises important questions for the field of second language (L2) acquisition: How does an L2 learner learn to express their spatial experience in a non-native language? To what extent does the native language influence his or her acquisition rhythms, and does an L2 learner undergo some reorganization or reformulation of his or her spatial experience at the cognitive level when encoding such experience in a second language? The present study aims to address these questions by examining how adult Chinese learners of English at different proficiency levels learn to express caused motion events as compared to Chinese and English native speakers.
1 The expression of motion events in first (L1) and second language (L2) acquisition
According to Talmy (1985), with respect to motion events, a more or less universal set of semantic information is expressed in all languages, which includes manner of motion (e.g., climbing, crawling, skipping), path of motion (e.g., up, across, along, towards), and cause of motion (e.g., pushing, kicking, rolling [transitive use]), apart from the entity in motion (i.e., Figure), the Ground with respect to which a motion event occurs, and the motion itself. Languages differ significantly in terms of the systematic association between given types of semantic element and surface expression. In most Germanic languages, such as English, the manner of motion is usually expressed in the main verb, whereas in most Romance languages, such as French, the path of motion is characteristically encoded in the main verb, with the manner information expressed (if at all) in the periphery of an utterance. In some other languages (e.g., serial verb languages), such as Chinese and Thai [Footnote 1], manner and path information can be represented simultaneously via grammatical elements such as compound verbs or co-verbs [Footnote 2]. Example (1) illustrates such a difference between languages.
(1) a. The ball rolled into the hole.
b. La balle est entrée dans le trou en roulant.
the ball has entered in the hole by rolling
‘The ball entered the hole by rolling.’
c. 球 滚进 了 小洞。
Qiu2 gun3-jin4 le xiao3dong4
ball roll-into [Footnote 3] asp [Footnote 4] hole
‘The ball rolled into the hole.’
Based on the different ways in which the semantic components for motion are expressed across an utterance, particularly with regard to the expression of Path, Talmy (2000) proposes a general typological framework within which languages mainly fall into two broad categories: satellite-framed (S-language hereafter) and verb-framed (V-language hereafter). In the former (example (1a)), the verb typically conflates Motion and Manner of motion and/or Cause of motion, while Path is encoded outside the verb in a satellite (e.g., through the use of particles and affixes), whereas in the latter (example (1b)), the verb typically conflates Motion and Path, while Manner and Cause are expressed separately in an adverbial or a gerund (when expressed at all).
An increasing number of cross-linguistic studies on spatial expressions have confirmed that the typological properties of a given language greatly influence the way speakers of this language select, express, and organize spatial information over an utterance and at the discourse level. Such language-specific impact has been detected in diverse subdomains of space such as frameworks of spatial reference, static spatial configurations, expressions of translocational motion events, and the overall narrative style of a discourse (Allen, Ozyurek, & Kita, 2007; Berman & Slobin, 1994; Hohenstein, Eisenberg, & Naigles, 2006; Hohenstein, Naigles, & Eisenberg, 2004; Ji, Hendriks, & Hickmann, 2011a, 2011b; Naigles & Terrazas, 1998; Slobin, 1996b, 2004, to name a few). To give an example, Hickmann (2006) and Hickmann and Hendriks (2010) examined the description of voluntary motion events by French-speaking and English-speaking children (ages four and six) as well as adults, and reported that typological properties (verb-framed versus satellite-framed) affected the semantic density of children’s utterances. Regardless of age, English-speaking children tended to express denser information than their French-speaking counterparts, owing to the availability in English of easily accessible Verb + Satellite constructions.
Findings such as these led some scholars to suggest that language-specific properties could even shape our spatial conceptualization at the deeper level of cognition (see, for instance, Bowerman, 1996, 1999; Bowerman & Choi, 2001; Hohenstein, 2005; Levinson, 2003; Lucy, 1992). Slobin’s (1996a) ‘thinking for speaking’ hypothesis represents a weak version of this language-specificity view. It claims that there is a kind of thinking that is intimately tied to language: the thinking carried out on-line in the process of speaking, writing, signing, or listening. ‘Thinking for speaking’ involves identifying those characteristics of entities and events that fit some conceptualization of the event and are readily encodable in the language. In this sense, a native language one learns in childhood is a system “that has trained its speakers from early on to pay different kinds of attention to events and experiences when talking about them; this training is carried out in childhood and is exceptionally resistant to restructuring in adult second language acquisition” (Slobin, 1996a, p. 89).
Compared to the vast bulk of research on motion expressions in first language acquisition, far fewer studies have systematically investigated motion expressions in the context of second language acquisition. In recent years, however, there has been increasing interest in L2 acquisition of motion language. Relevant studies have involved both satellite- and verb-framed languages, mostly concerning typologically contrasting L1s and L2s (e.g., Engemann, Harr, & Hickmann, 2012; Luk, 2012; Marotta & Meini, 2012; Vidakovic, 2012). These studies mainly focused on on-line speech in production tasks, but also touched upon the use of accompanying gestures in speech (see, for instance, Gullberg, 2008; Gullberg & McCafferty, 2008). It needs to be noted, however, that quite a number of studies on L2 acquisition of motion, caused motion in particular, have generated divergent results regarding the influence of the L1 typological pattern on the L2 acquisition process and rhythm (e.g., Cadierno, 2004; Cadierno & Ruiz, 2006; Hendriks, Hickmann, & Demagny, 2008; Navarro & Nicoladis, 2005).
Two issues are at the centre of this controversy. First, will L2 learners encounter greater acquisitional difficulties when encoding motion events in an L2 that is typologically contrastive to their L1? Hendriks et al.’s (2008) study seemed to provide an affirmative answer to this question. They examined how English learners of French at intermediate and advanced levels described caused motion events as compared to native speakers of English (satellite-framed) and French (verb-framed) in aspects such as the overall semantic density of their responses and the particular devices used to express motion components. Their results showed that L2 learners produced utterances that were less dense than those of native speakers, irrespective of proficiency level. Although they frequently expressed cause and manner in their responses, they did not systematically express path. In particular, English learners of French basically relied on an English organization pattern of motion components, resulting in idiosyncratic uses of French devices at the intermediate level (e.g., pousser en montant ‘to push ascending’) and in a failure to achieve target performance even at the advanced stage. Their results suggested that in describing complex motion events, the transfer of the L1 typological pattern was evident, especially at the level of discourse organization.
However, when Navarro and Nicoladis (2005) examined how English (S-language) native speakers acquired motion expressions in another typical V-language, Spanish, they arrived at a different conclusion. They focused on whether L2 learners were able to acquire the characteristic conflation pattern of frequently mapping path onto the main verb. Their findings showed that L2 learners did not differ significantly from native speakers in the number of path conflation verbs used, although they used bare path verbs to a lesser degree than native speakers. These results were largely consistent with those obtained from the studies of Cadierno (2004) and Cadierno and Ruiz (2006). The latter group of researchers systematically examined how Danish (S-language) learners of Spanish (V-language) expressed motion events. It was reported that although L2 learners showed some traces of the L1 typological pattern in using fewer manner verb types whilst relying on complex and elaborated path descriptions as compared to native speakers of Spanish, they did not use event conflation frequently (a construction commonly used in their L1 but not normally allowed in their L2), and, like Spanish native speakers, they provided more descriptions of the static setting for motion than of trajectories of motion (see Cadierno, 2008, for a detailed review of these studies). As commented by Cadierno (2010), these studies suggested that learners with a typologically different L1 and L2 had almost fully acquired the target lexicalization pattern of motion events, and had been able to restructure their ‘thinking for speaking’ patterns when talking about motion in an L2 (note that the learners in these studies were predominantly intermediate and advanced learners).
The second issue concerns whether the typological similarity between L1 and L2 can facilitate an L2 learner’s acquisition process. Again, previous studies produced contradictory results. To give an example, detailed description of manner of motion is a characteristic phenomenon in S-languages such as Danish but not in V-languages like Spanish. A study conducted by Cadierno and Ruiz (2006) showed that when asked to narrate a picture book presenting motion scenes, Italian learners of Spanish (both L1 and L2 were V-languages) elaborated on manner of motion to no lesser degree than did Danish learners of Spanish (L1 S-language versus L2 V-language). In contrast, Cadierno (2010) examined, in a similar study, how German and Russian learners of Danish (all satellite-framed languages) expressed boundary-crossing motion events as compared to Spanish learners of Danish (L1 V-language versus L2 S-language). Her results showed this time that the former group of learners used the characteristic Danish construction of manner verb + path particle to a significantly larger extent than the latter group, showing the facilitating impact of typological resemblance on the acquisition of the L2 system for motion expressions.
The divergence reviewed above in interpreting the role of the L1 in L2 acquisition of motion expressions can be attributed to variations along a number of dimensions, such as the experimental stimuli used (static motion pictures versus dynamic motion video clips), the foci of investigation (lexical means to express a given type of motion component versus discourse organizing devices to distribute varied motion components), the degree of simplicity of the L2 pattern (systematic and lucid pattern in S-languages versus variable and opaque pattern in V-languages), and the nature of motion events investigated (spontaneous versus caused; telic versus non-telic). To illustrate, Cadierno (2004), Cadierno and Ruiz (2006), and Navarro and Nicoladis (2005) focused, in their investigations, on L2 learners’ motion descriptions at a micro-level of semantics, such as the number of manner verb types, the elaboration on path information, and the conflation of path onto main verbs, whereas Hendriks et al. (2008) devoted more attention to the organization of motion information at a macro-level of the semantics–syntax interface: how a given motion component was encoded at different loci across an utterance and whether selected motion information was packaged in a compact or discursive way. The latter aspects involve more advanced skills of discourse organization on the part of L2 learners, which may take longer to adapt; therefore even advanced learners in Hendriks et al.’s (2008) study had difficulty shaking off their L1 typological pattern completely.
As is clear from the above review, the number of studies on L2 acquisition of motion expressions is still limited, and previous investigations diverged greatly on the role of the L1 in the L2 acquisition process. Most studies concerned Indo-European languages only (but see Chen & Ai, 2009, and Zeng, 2011, for exceptions) and involved learners at relatively advanced levels only. In this context, the present study aims to fill a gap in this line of research by extending the investigation to serial verb languages with distinctive typological properties (i.e., L1 Chinese versus L2 English; see details in Section 2), and by examining motion expressions among L2 learners at three different levels (low, intermediate, and advanced) with the aim of revealing more clearly the developmental path across proficiencies (if any).
2 The status of L1 Chinese and L2 English in motion-event typology
A large number of previous studies on spatial expressions have confirmed that English is a typical S-language. In encoding caused motion events, the characteristic lexicalization pattern is to combine cause and manner in the main verb whilst expressing path in verb-supporting elements such as particles (i.e., satellites), and this pattern has been found to be prevalent across different contexts, such as in controlled experimental situations and in spontaneous oral and written discourses [Footnote 5].
There has been, however, a great deal of controversy about the exact status of Chinese in motion-event typology (Chen, 2005; Chen & Guo, 2009; Ji et al., 2011c; Slobin, 2004; Talmy, 2000, 2009). In Chinese, motion events are mainly expressed in a Resultative Verb Compound (RVC), which typically consists of three parts, with the first part depicting manner of motion, the second part path of motion, and the (optional) third part the deixis of motion (e.g., pa-shang-lai ‘climb-up-towards the speaker’). Traditionally the second constituent in an RVC was considered a satellite, comparable to English verb particles, and Chinese was thus categorized as an S-language (Talmy, 1985, 2000). However, an increasing number of recent studies have argued that the typological properties of Chinese are much more complicated and flexible than assumed in previous research (e.g., Slobin, 2004; Zlatev & Yangklang, 2004). Slobin (2004) pointed out that the second constituent in an RVC is more akin to a verb than a satellite because, unlike English particles, this element can function as an independent verb in motion expressions (example (2) versus (3)). Therefore, he suggested that Chinese and other serial verb languages constitute a third type along the S-framed and V-framed continuum: an ‘equivalently-framed’ language in which both manner and path are encoded in elements that are equal not only in formal linguistic terms but also in the force of significance (e.g., pa-shang-lai ‘climb-ascend-towards the speaker’).
(2) a. 他 跑上 了 楼梯。
Ta1 pao3-shang4 le lou2ti1.
he run-up asp [Footnote 5] stairs
‘He ran up the stairs.’
b. 他 上 了 楼梯。
Ta1 shang4 le lou2ti1.
he ascend asp stairs
‘He went up the stairs.’
(3) a. He ran up the stairs.
b. *He up the stairs.
A series of recent studies by Hickmann and Hendriks (2010) and Ji, Hendriks, and Hickmann (2010, 2011c) went beyond the controversial issue of the part of speech of a given linguistic element and explored motion expressions in Chinese at the discourse level. Their findings revealed that, as far as caused motion events were concerned, two lexicalization patterns were used by native speakers with comparable frequencies. One pattern was to combine cause and manner in the verb whilst encoding path in ‘verb complements’ (as categorized by Talmy, 1985; example (4a)). In this case, Chinese showed clear properties of an S-language. The other pattern was to encode manner and cause in the periphery of an utterance via a subordinate clause whilst encoding path (or path plus manner via an RVC) in the main clause of an utterance (example (4b)). In this case, Chinese showed clear properties of a typical V-language.
(4) a. 他 把 球 推 过 马路。
Ta1 ba3 qiu2 tui1 guo4 ma3lu4
He ba ball push across street
‘He pushed the ball across the street.’
b. 他 推 着 球 过 马路。
Ta1 tui1 zhe qiu2 guo4 ma3lu4
He push-dur ball cross street
‘He, pushing the ball, went across the street.’
These findings were in line with what Slobin (2004) found in his ‘frog story’ narratives, that is, Chinese native speakers used manner verbs (e.g., flying out of) and path verbs (e.g., exited flying) in equivalent proportions. They also echoed Talmy’s (2009) recent proposal that Chinese is a language with ‘parallel systems’ in expressing motion events. In addition, in the latter typological pattern, the joint use of a verb compound and other related grammatical constructions facilitates simultaneous encoding of multiple motion components over an utterance, and meanwhile leads to a great degree of explicitness or elaboration in expressing a given component (see details in Section 4.2).
Seen in this way, Chinese represents a particularly interesting case for the study of the L2 acquisition of motion expressions. In comparison to English (S-language), the case of typological similarity (i.e., Chinese’s satellite-framing properties), and the case of typological difference (i.e., Chinese’s verb-framing properties) converge upon this individual language. It thus becomes meaningful to see whether and to what extent these distinctive typological properties of the L1 influence the L2 learners’ acquisition process. Will Chinese learners of English be more aware of the similarity and adopt the target system from an initial stage of their acquisition, or will they be more aware of the difference and have difficulties in fully adapting to the target pattern (e.g., at less advanced levels), especially when this target-deviant pattern best suits the specific requirement of the task they are asked to perform?
In the present study, we generally investigate the role that L1 plays in the process whereby L2 learners acquire the target system of caused motion expressions, and we are particularly interested to see which aspect(s) of the L1 typological pattern is attended to and selected by L2 learners and expressed in the target language. Specifically, we will examine: (a) whether and to what extent the caused motion expressions by English L2 learners resemble or differ from those of native speakers; and (b) whether and to what extent the caused motion expressions by L2 learners develop with proficiency. Two lines of investigation will be pursued:
(a) The expression of multiple information types: (i) what information types (cause, manner, path) are characteristically encoded in an utterance, and (ii) how many times a given information type (i.e., cause, manner, or path) is expressed across an utterance (i.e., degree of explicitness).
(b) The syntactic packaging of selected information types: (i) where a given motion component is placed in an utterance, and (ii) how selected motion components are syntactically organized over an utterance.
Given that both languages under discussion have readily accessible linguistic means to facilitate the encoding of multiple information types in an utterance, we predict that L2 learners, across three proficiency levels, will produce utterances of multiple information types (i.e., Cause + Manner + Path) as frequently as native speakers do. As to other aspects under examination, two hypotheses are possible. First, if L2 learners, from the very beginning, attend to how similar the L1 and L2 are in terms of caused motion expressions, they will choose to express the most important type of manner information (i.e., manner of cause such as pushing or rolling) in the main verb and the most important type of path information (i.e., trajectory itself) in verb particles in a syntactically simplex clause. In this case, the factor of proficiency will have little role to play in the acquisition process, and it is predicted that even L2 learners with low proficiency will mainly produce responses that are generally target-like. Alternatively, if L2 learners are more aware of the typological difference between the L1 and L2, and of how this difference can help meet the communicative challenge presented by the task, they will frequently opt for expressing more varied motion components in their responses and with a greater degree of explicitness, as is facilitated by the source language (e.g., manner elaboration, path elaboration). Further, they should distribute these components at different loci across an utterance (e.g., within and outside the main verb) in syntactically more discursive ways (e.g., coordinated or juxtaposed clauses, gerunds).
Such a propensity is predicted to be more visible in L2 learners of low proficiency than in more advanced learners, presuming that the influence of the L1 is stronger at the initial stage of language acquisition and that such an influence will disappear as the process of acquisition advances (see also Cadierno, Reference Cadierno, Han and Cadierno2010, p. 8).
3 Methodology
3.1 participants
There were sixty adult participants in the current study; all were university or technical school students, divided into five groups of twelve (six males, six females). The two control groups of Chinese and English monolingual speakers came from a Technical School of Telecommunication in China and Stanford University (US), respectively. The three English L2 learner groups consisted of students from Beijing University, whose English proficiency was at low, intermediate, and advanced levels respectively (see Table 1). The L2 learners’ proficiency levels were determined by three national tests of English as a second language in China: the College English Test Band 4 and Band 6 for non-English majors (CET-4 and CET-6, respectively), and the Test for English Major Band 8 (TEM-8) [Footnote 6]. These tests are administered by the Committee of Higher Education of China and held twice a year across Chinese degree-granting universities. All L2 participants in our study had taken the above-mentioned tests in the six months before the experiment. Students who passed CET-4 (but not any other tests indexing a higher level of proficiency) were considered to be at the low English proficiency level, those who passed CET-6 (but not any other tests indexing a higher level of proficiency) were categorized at the intermediate level, and those who passed TEM-8 were assessed as being at the advanced stage of English learning.
table 1. Groups of participants in the present study
All L2 learners (Mean age = 20.9 years) had similar learning backgrounds, with their first exposure to English at roughly the age of twelve. All learners acquired English in a predominately Chinese-speaking community and their English input came primarily from classroom teaching and some extracurricular activities involving English (e.g., English study focus groups, TV or radio English channels).
3.2 materials
The stimuli of the present study were consistent with models developed by Hickmann, Hendriks, and Champaud (2009) in the sense that both represented a specific type of caused motion involving accompanied movement on the part of the agent. However, unlike the models in previous studies, the current stimuli covered as many as eight paths of motion (see below), only four of which (up, down, across, and into) were investigated before. Our stimuli consisted of sixteen short video clips of caused motion events (5 seconds each). As shown in ‘Appendix 1’, they depicted a child (Bonny) doing a specific action to an object which changed its location due to the external force, and meanwhile the child accompanied the moving object (by walking) throughout its course of movement. These video clips presented eight paths of motion in all, falling into four major categories: the path of verticality (up and down), the path of boundary-crossing (across and into), the path of deixis (towards and away from), and the path with a course parallel to and close to the ground (along and around). The inclusion of more varied paths aims to make the findings of the present study more generalizable across all major path contexts. Following Hickmann et al. (2009), the video clips also presented four types of causative actions by the protagonist (i.e., the child): either pushing (pulling) or rolling (sliding [transitive use]), as a result of which the objects concerned either rolled or slid. For instance, A10 in ‘Appendix 1’ depicted the child rolling a barrel of beer around a round table, and the barrel rolled around the table whilst the child walked behind it (see illustration in ‘Appendix 2’). The sixteen video stimuli were presented to participants in two orders, A and B (A reversed), to counterbalance for order effects.
During the testing session, each participant produced sixteen responses to the stimuli, which led to a total of 192 responses per group (12 participants in each group).
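The design arithmetic above can be checked with a minimal sketch. The clip labels A1 to A16 are placeholders assumed here for illustration (only A10 is named in the text), and the overall total across the five groups is derived, not reported.

```python
# Minimal sketch of the design described above (labels are hypothetical).
N_STIMULI = 16               # video clips per session
PARTICIPANTS_PER_GROUP = 12
N_GROUPS = 5                 # two control groups + three L2 learner groups

order_a = [f"A{i}" for i in range(1, N_STIMULI + 1)]
order_b = list(reversed(order_a))  # order B is order A reversed

responses_per_group = N_STIMULI * PARTICIPANTS_PER_GROUP
total_responses = responses_per_group * N_GROUPS  # derived, not reported

print(responses_per_group)  # 192
print(total_responses)      # 960
```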
As analyzed by Hickmann et al. (2009), each stimulus contained three major categories of motion information, as presented in (5).
(5) a. Manner of motion [M]
– Manner of the Agent’s motion [Ma]: walking
– Manner of the Object’s motion [Mo]: rolling, sliding
– Manner of the Agent’s Causative Action [Mc]: pushing (pulling), rolling (sliding)
b. Path of motion [P] (regarding the Agent and/or the object):
up, down, across, into, towards, away from, along, around
c. Causality [C] (the causer–causee relation between the Agent and the object)
3.3 procedure
The testing was conducted individually in a quiet classroom at the participant’s university or school. Each session lasted, on average, for 20 minutes, during which the participant was asked to describe as completely as possible the caused motion scenes to an imaginary remote listener. The question asked, after each short video clip, was always “What happened?”, and the playing was paused so that the participant had time to respond. Each session began with a training item (see ‘Appendix 1’), which served to familiarize participants with the types of information (e.g., different manners of motion, path of motion) they were expected to provide during the experiment. All sessions were audiotaped and later transcribed by fluent speakers of English and/or Chinese.
3.4 coding
The methods of coding and analyzing the data partially followed those developed by Hickmann et al. (2009) for the English and French languages. However, our methods differ from theirs in several important aspects. First, we examine all clauses in one response in our quantitative analysis (e.g., including coordinated or juxtaposed clauses and subordinated clauses), whereas Hickmann et al. tended to single out one target clause in a multiple-clause response (based on principles such as ‘semantic richness’ or ‘path priority’), leaving one of the clauses in coordination or juxtaposition unattended to in their statistical analysis. In other words, the basic unit of analysis in our study is ‘utterance’ rather than ‘clause’. The main reason for this decision is that Chinese speakers, when describing a similar set of dynamic cartoons, tended to provide particularly rich information regarding caused motion and to distribute it beyond the boundary of a single clause, as revealed by previous studies (e.g., Ji et al., 2011a, 2011b, 2011c). Second, our analysis investigates not only information focus and locus in caused motion expressions, but also the particular syntactic constructions for which the participant opted in packaging selected motion components over an utterance, whereas Hickmann et al. (2009) were primarily interested in the former issues.
Finally, and most importantly, in coding path information, the method used by Hickmann et al. (2009) was such that the varied types of path information were all collapsed into one path component, no matter whether they depicted the trajectory itself or further elaborated on the path. This is quite different from their decision to subcategorize manner information into varied types (i.e., Ma, Mo, Mc). The present study holds that it is of great importance to differentiate between path information of varied natures: the trajectory itself (e.g., up, across, along) versus the path elaboration, the latter of which mainly refers to fine details on path information such as the source and/or the goal of motion (e.g., from … to), the deixis of motion (towards / away from the speaker), the direction of motion (forward, backward), and the route of motion (in a circle). Path elaboration also includes occasional cases where only a general location for motion is provided (e.g., beside a tunnel, on the ice). It is equally important not to lump path elaborations together, as such elements present path information from different perspectives and are thus worth considering individually. The difference in coding path information between our method and the system developed by Hickmann et al. can be seen most clearly from example (6).
(6) a. The boy pulled [C+Mc] the toy car across [P] the icy lake.
b. The boy pulled [C+Mc] the toy car across [P1] the icy lake from [P2] the right side to [P3] the left side onto [P4] the ground.
According to Hickmann et al. (2009), all sorts of path information in example (6b) were lumped into one path component and the utterance as a whole had the same overall semantic density as example (6a), viz., C + Mc + P. In contrast, in our coding scheme, example (6b) encoded not only boundary-crossing (i.e., across), but also fine details that further present this trajectory from three dimensions: the source of motion (i.e., from), the goal of motion (i.e., to), and the endpoint of motion (i.e., onto). The utterance as a whole was thus considered to encode four path components (i.e., P1-trajectory + P2-source + P3-goal + P4-endpoint), and therefore claimed a greater degree of path explicitness over example (6a).
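The contrast between the two ways of counting path information can be made concrete with a small sketch. It is only an illustration of the counting logic applied to the bracketed tags of example (6); the function names are hypothetical and it is not the coding procedure actually used in either study.

```python
import re

# Tags in square brackets follow the annotation style of example (6).
TAG = re.compile(r"\[([^\]]+)\]")

def path_tags(utterance):
    """Return the path tags (P, P1, P2, ...) found in an annotated utterance."""
    tags = []
    for group in TAG.findall(utterance):
        for tag in group.split("+"):
            tag = tag.strip()
            if re.fullmatch(r"P\d*", tag):
                tags.append(tag)
    return tags

def collapsed_path_count(utterance):
    """Collapsed coding: any amount of path information counts as one P."""
    return 1 if path_tags(utterance) else 0

def differentiated_path_count(utterance):
    """Differentiated coding: each distinct path component counts separately."""
    return len(set(path_tags(utterance)))

ex_6a = "The boy pulled [C+Mc] the toy car across [P] the icy lake."
ex_6b = ("The boy pulled [C+Mc] the toy car across [P1] the icy lake "
         "from [P2] the right side to [P3] the left side onto [P4] the ground.")

print(collapsed_path_count(ex_6a), differentiated_path_count(ex_6a))  # 1 1
print(collapsed_path_count(ex_6b), differentiated_path_count(ex_6b))  # 1 4
```

Under the collapsed count, (6a) and (6b) are indistinguishable; under the differentiated count, (6b) registers four path components against one for (6a).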
4 Results
4.1 The expression of multiple information types: an overview
A complete narration of our video clips requires the presence of three information types [IT], viz., cause (C), at least one type of manner information (M), and at least one type of path information (P). In this section we grouped responses into three categories according to the number of information types included: IT=3, IT=2, and IT=1. We calculated their frequency of occurrence over the total number of responses per group (192).
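The grouping into IT=3, IT=2, and IT=1 can be sketched as follows. This is a minimal illustration, assuming each response has already been coded as a list of tags such as C, Mc, Ma, Mo, or P; the function names and the four-response sample are hypothetical and are not data from the study, where the denominator was 192 responses per group.

```python
from collections import Counter

def information_types(tags):
    """Reduce fine-grained tags (e.g. 'Mc', 'P2') to the types C, M, and P."""
    types = set()
    for tag in tags:
        if tag == "C":
            types.add("C")
        elif tag.startswith("M"):
            types.add("M")
        elif tag.startswith("P"):
            types.add("P")
    return types

def it_distribution(responses):
    """Proportion of responses with IT=3, IT=2, and IT=1 over all responses."""
    counts = Counter(len(information_types(r)) for r in responses)
    total = len(responses)
    return {it: counts.get(it, 0) / total for it in (3, 2, 1)}

# Hypothetical mini-sample of four coded responses (not data from the study).
sample = [["C", "Mc", "P"], ["C", "P"], ["P"], ["C", "Mc", "Ma", "P"]]
print(it_distribution(sample))  # {3: 0.5, 2: 0.25, 1: 0.25}
```

Note that several manner or path tags in one response still count as a single information type, which is why the last sample response is IT=3 rather than IT=4.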
As shown in Figure 1, both Chinese and English native speakers encoded all three information types in their responses at near-ceiling frequency (98% in both cases). Similarly, English L2 learners at all proficiency levels predominantly produced utterances with all three information types (mean frequency: low 88%, intermediate 93%, and advanced 97%). A Kruskal–Wallis test was performed on this set of non-normally distributed data and confirmed that there was no significant difference in the frequency of utterances with IT = 3 across all five participant groups (χ2 (4,55) = 7.215, p =.125). Note, however, that the number of information types tended to increase from the low and intermediate levels to the advanced level, though this increase was not statistically significant. Learners in the former groups occasionally gave responses containing two information components (mostly cause and path, as shown in examples (7a) and (7b)), or rarely one component only, which was invariably path (examples (7c) and (7d)), thus confirming Talmy’s (1985, 2000) hypothesis that path is the most basic and central dimension of a motion event.
(7) a. He put a bag of money up the pyramid. [C+P] (Beg02A)Footnote 7
b. The boy move [moved] the trunk away [from] the tent. [C+P] (Int08B)
c. Bonny keep [kept] the toy car across the ice lake. [P] (Beg03B)
d. Bonny put the treasure bag into the pyramid. [P] (Int01A)
Fig. 1. The overall expression of information types by L2 learners as well as native speakers.
4.2 Elaboration on manner and path information
In this section we focus on the degree of explicitness of a given type of motion information: how often manner (conflated with cause) or path is encoded in an utterance, and at each occurrence, which facet of the semantics of manner or path is emphasized. Qualitative analysis was used to supplement our statistical findings.
4.2.1 Utterances with varied levels of manner elaboration
As mentioned in Section 3.4, manner information was coded as including three subcategories: manner of the Agent’s causative action [Mc], manner of the Agent’s motion [Ma], and manner of the Object’s motion [Mo]. Responses were thus classified as having different levels of manner elaboration (i.e., ME=0, ME=1, ME=2, and ME=3; see example (8) for illustration).
(8) a. The boy walked [Ma] along a row of trees pushing [Mc] a big ball and the ball rolled [Mo] forward. [ME=3]
b. The boy walked [Ma] along a row of trees kicking [Mc] a ball. [ME=2]
c. The boy is walking [Ma] along a row of trees. [ME=1]
d. The boy went around the table. [ME=0]
Figure 2 represents the frequency of responses with different levels of manner elaboration over the total number of responses within each group (192).
Fig. 2. Frequency of responses with different levels of manner elaboration in learner groups as well as native groups.
As shown in Figure 2, responses with ME=1 constituted the main response type within each group. Chinese native speakers seemed to produce responses encoding two types of manner components sometimes (ME=2). Participants across groups only occasionally provided utterances encoding no specific manner (ME=0). Semantically densest responses expressing three manner types were not found. Given that there was not enough variability in the dimension of utterances with ME=0 to warrant one more level of comparison, we conducted a two-way mixed ANOVA test with the participant group as a between-groups factor (CNS, Beg, Int, Adv, and ENS) and repeated measures on levels of manner elaboration (ME=2 and ME=1). The results revealed a significant interaction effect between manner elaboration and participant group (F(4,55) = 27.779, p =.000), apart from the significant main effects of manner elaboration (F(1,55) = 1267.519, p =.000) and participant group (F(4,55) = 2.561, p =.048), respectively. Two one-way ANOVA tests were performed on the relevant raw data to follow up the above findings, one for each manner elaboration level. The results further suggested that:
(a) There was a significant difference in the frequency of utterances with two manner components across five groups (F(4,55) = 43.013, p =.000). Post-hoc comparisons with Bonferroni correction further revealed that Chinese native speakers provided responses with ME=2 significantly more frequently as compared to their English counterparts as well as each of the three learner groups. No other significant differences between groups were detected.
(b) Similarly, there was a significant difference in the frequency of utterances with one piece of manner information across groups (F(4,55) = 13.119, p =.000). Post-hoc comparisons with Bonferroni correction showed that each of the three learner groups, in addition to English native speakers, produced responses with ME=1 significantly more frequently than did the native speakers of Chinese. No other significant contrasts were reported between any other pair of participant groups (cf. Figure 2).
The above results suggested, first of all, a strong effect of typology on language use. English native speakers followed satellite-framing properties of their language in systematically encoding manner of cause (Mc) only in their responses. In contrast, native speakers of Chinese tended to provide one more piece of manner information and gave utterances with ME=2 in a mean frequency of 33%, which was achieved by adopting some specific syntactic constructions that demonstrate the verb-framing facet of the Chinese language (see more details in Section 4.3.2). Two subcategories of manner information encoded in this circumstance were found to be manner of cause (Mc) and manner of the agent’s motion (Ma), as illustrated in example (9).
(9) 邦尼 推 着 一 捆 木柴
Bonny tui2 [Mc] zhe yi4 kun3 mu4chai2
Bonny push dur one cl wood
走 向 了 一 堆 篝火。(CNS04B)
zou3 [Ma] xiang4 le yi4 dui1 gou1huo3
walk towards asp one cl fire
‘Bonny, pushing a pile of wood, walked towards the fire.’ [ME=2: Mc+Ma]
Second, as to learner groups, the statistical results confirmed that L2 learners across proficiencies produced utterances with ME=1 equally frequently as compared to English native speakers. This finding was largely in line with our first hypothesis. L2 learners at low and intermediate proficiencies, in addition to an expectedly high proportion of advanced learners, resembled English native speakers versus Chinese native speakers in predominantly using a single manner–cause–verb in their narration. These results generally held across all eight types of path and across participants within a group. Example (10) provides illustrations of this general tendency.
(10) a. Bonny is pushing a car across a [an] icy lake. (Beg07B)
b. Bonny is pushing a bunch of wood far away from the fire. (Int06A)
c. He’s pushing the bundle of hay up the ladder. (Adv03B)
Further, our statistical report revealed no significant developmental progression concerning the frequency of responses with ME=1 between any pair of learner groups. The factor of proficiency, in this case, seemed to have a very limited role to play.
4.2.2 Utterances with varied levels of path elaboration
As mentioned earlier, path information was coded as varying in nature (i.e., trajectory itself versus fine details of path) in our study. Depending on how many facets of the path meaning were mentioned, utterances were grouped as having different levels of path elaboration (i.e., PE=0, PE=1, PE=2, and PE=3; see example (11)).
(11) a. The boy pulled the big box along [trajectory] the tunnel from the entrance near to us [source of motion] to the far end of the tunnel [goal of motion]. [PE=3]
b. The boy pulled the toy car across [trajectory] the iced lake onto [endpoint of motion] the right bank. [PE=2]
c. The boy pulled the car across [trajectory] the lake. [PE=1]
d. This is a tunnel and the boy is pulling a large box there. [PE=0]
Figure 3 shows the frequency of responses with different levels of path elaboration over the total number of responses within each group (192).
Fig. 3. Frequency of responses with different levels of path elaboration in learner groups as well as native groups.
As illustrated in Figure 3, participants across groups seemed to mainly produce utterances expressing one type of path information (PE=1). Chinese native speakers seemed to demonstrate more flexibility in providing responses with more varied levels of path elaboration (PE=2 and PE=3). Utterances encoding no specific path information were very rarely found among participants (PE=0). A further look at the data showed that responses with PE=3 were only used by native speakers of Chinese occasionally and there was not enough variability along this dimension to warrant a third level of comparison. Therefore, we performed a 5×2 mixed ANOVA test with the participant group as a between-groups factor (CNS, Beg, Int, Adv, and ENS) and levels of path elaboration as a within-subjects factor (PE=2 and PE=1). Our results indicated that there was a significant interaction effect between path elaboration and participant group (F(4,55) = 26.617, p =.000), in addition to significant main effects of path elaboration (F(1,55) = 1140.499, p =.000) and group (F(4,55) = 21.464, p =.000), respectively.
Two one-way ANOVA tests were conducted on the relevant raw data, one for each path elaboration level. Levene’s test indicated that equal variances could not be assumed for either dataset (PE=2 or PE=1); post-hoc comparisons with Tamhane correction were therefore used to narrow down the contrasts between groups. Specifically, there was a significant difference in the frequency of utterances with two types of path information across groups (F(4,55) = 18.511, p =.000). Post-hoc comparisons further revealed that Chinese native speakers gave responses with PE=2 significantly more frequently than any of the other groups did.
Similarly, there was a significant difference in the frequency of utterances with one piece of path information across groups (F(4,55) = 32.635, p =.000). Post-hoc comparisons indicated that both learner groups across proficiencies and English native speakers provided responses with PE=1 significantly more frequently than did Chinese native speakers. No other significant difference was detected between any other pair of participant groups: all learner groups produced utterances with PE=1 as frequently as did English native speakers, and there was no significant contrast in performance among the three learner groups across proficiencies (cf. Figure 3).
Consistent with findings regarding manner elaboration, English native speakers systematically used particles to express trajectory of motion, whereas Chinese native speakers tended to elaborate more on fine details of motion, as illustrated in example (12).
(12) 邦尼 沿着 一 排 椅子 把 蓝球
Bonny yan2zhe [trajectory] yi4 pai2 yi3zi ba3 lan2qiu2
Bonny along one cl chair ba basketball
从 这 端 推到 了 那 端。
cong2 [source] zhe4 duan1 tui1–dao4 [goal] le na4 duan1
from this end push–to asp that end
‘Bonny pushed the basketball from this end to that end along a row of chairs.’
[PE=3: trajectory + source of motion + goal of motion] (CNS05A)
The performance of L2 learners across proficiencies showed more resemblance to that of English native speakers than to that of Chinese native speakers (e.g., the path particles in example (10)). Such a propensity seemed to emerge from the initial stage of acquisition and did not develop significantly with the advance in learning.
4.2.3 A qualitative look at the data
A qualitative examination of the data revealed that the performance of English L2 learners at low and intermediate levels seemed to differ from both native speakers and advanced learners in two aspects. First, for manner expression, they sometimes produced utterances with general verbs only (e.g., put, move), that is, verbs without any indication of specific manner of motion (12% in frequency among low-proficiency learners; see example (13)).
(13) a. Bonny carry [carried] a gift box along the tunnel. (Beg02A)
b. Bonny is bringing the swimming ring down to the dune. (Beg03B)
c. Bonny is moved [moving] this bag near the elevator. (Beg06A)
Given that their provision of path information remained stable (low: 99% and intermediate: 94% in mean frequency), this fact meant that the increase in expressing multiple information types from the low and intermediate levels to the advanced level (cf. Section 4.1) was largely attributable to an increased expression of specific manner information.
Second, in encoding path information, although L2 learners at low and intermediate levels tended to encode one path component only in their responses, the percentage of this single component that corresponded to trajectory was actually lower than in both native speakers and advanced learners (see Table 2). Our coding method of differentiating path information of varied natures helped to enhance the visibility of this result. Learners at less advanced levels sometimes expressed fine details regarding path only, rather than using such information to supplement the trajectory. To illustrate, examples (14a) to (14c) encode source of motion only, endpoint of motion only, and boundary-crossing only, respectively. Occasionally they provided general locations for motion only rather than giving proper path information, thus failing to present the motion event concerned as a translocational one, viz., one involving a change of location (example (14d)).
(14) a. A swimming ring was pushed by Bonny from the sand dune. (Beg10A)
b. He pushed the barrel of hay onto the roof of the house. (Int03B)
c. He is pushing the luggage out of the tent. (Int04B)
d. Bonny [is] pulling a toy car on a [an] icy lake. (Beg04B)
Table 2. The expression of trajectory versus supplementary path information only in utterances with PE=1

Another phenomenon warrants mention as well. The advanced L2 learners (but not those at initial and intermediate stages of acquisition) sometimes made efforts to elaborate on manner and path information. They occasionally expressed one more type of manner, Ma (slide in example (15a)) or Mo (run in example (15b)), apart from Mc in their responses (ME=2, 6% in frequency). Similarly, they tended to depict the goal or endpoint of motion, particularly with the stimuli involving path of verticality (PE=2, 5% in frequency; examples (15c) and (15d)). Note that it seemed difficult to interpret this finding as a trace of L1 transfer; it could be an artefact of our experimental set-up, in which the participant felt compelled to provide particularly dense information to the remote addressee.
(15) a. He slides [Ma] across the icy lake while dragging [Mc] the car. (Adv07B)
b. He slightly rolls [Mc] the golf ball toward the puddle and it runs [Mo] into a small pond. (Adv09A)
c. He pushes the treasure bag up [trajectory] to the top [goal] of the pyramid. (Adv09A)
d. Bonny has pushed the barrel of hay up [trajectory] the ladder to the top [goal] of the roof. (Adv04B)
To summarize, in Section 4.2 we have examined in detail how manner and path are elaborated by three learner groups in comparison with Chinese and English native speakers. Our results suggested a potential influence from the partial typological resemblance between the L1 and L2. Learners across proficiencies had almost fully adopted the target system of caused motion expressions despite infrequent uses of general verbs (versus specific manner verbs) and supplementary path information (versus trajectory) among learners of low proficiency.Footnote 8 Note that the target-like performance of learners at less advanced levels can be explained in different ways. It might be possible that they have fully learned that only indispensable manner and path information (i.e., Mc and trajectory) were characteristically encoded in the target system; it is also likely that due to the constraint of low fluency, L2 learners at low and intermediate proficiencies are not yet fully capable of expressing and organizing extra information regarding caused motion (e.g., Ma, Mo, path elaboration) in their responses.
4.3 The syntactic organization of multiple information types
In this section we will examine the syntactic pattern of information organization across an utterance for learners across proficiency levels as compared to native speakers. Before generating syntactic patterns (Section 4.3.2), we will illustrate in detail how different loci were identified for various information types (Section 4.3.1).
4.3.1 Manner and path loci
Three information loci were distinguished in our analysis: within the main verb only (in Vm), outside the main verb only (outside Vm), and at both loci. ‘Main verb’ here was defined as an independent verb in English or Chinese, or the first constituent in a Chinese RVC (the second constituent is taken as a particle rather than a verb in the present analysis, as explained in footnote 3). Grammatical devices outside the main verb referred to satellites (e.g., verb particles, affixes), prepositional phrases, nouns, subordinated clauses, and similar devices that encode manner or path information. Utterances were thus differentiated with respect to the loci at which manner or path information occurred, as illustrated in (16) and (17).
(16) Manner at different loci across an utterance:
a. The boy slid the suitcase away. [M in Vm]
b. The boy went across the lake pulling a toy car. [M outside Vm]
c. The boy walked [M in Vm] across the lake pulling [M outside Vm] a toy car. [M in both loci]
(17) Path at different loci across an utterance:
a. The boy entered the pyramid pulling a treasure bag. [P in Vm]
b. The boy pushed the swimming ring down the sand dune. [P outside Vm]
c. The boy circled [P in Vm] around [P outside Vm] the table pulling a barrel of beer. [P in both loci]
A closer look at the data suggests that there seems to be a great difference between the source and the target language concerning the issue of information loci. English native speakers showed a clear-cut pattern of putting manner within the main verb and path in verb particles (i.e., M in Vm, P outside Vm). In contrast, Chinese native speakers demonstrated much more flexibility in locating motion information at different loci over an utterance, and such a propensity seemed to be more prominent in relation to manner than to path. A qualitative examination of the data revealed that three general patterns were frequently opted for by Chinese native speakers:
(18) a. M in Vm, P outside Vm
Bonny 把 财宝袋 拖进 了 金字塔。
Bonny ba3 cai2bao3dai4 tuo1 [M]-jin4 [P] le jin1zi4ta3
Bonny ba treasure bag pull–into asp pyramid
‘Bonny pulled the treasure bag into the pyramid.’ (CNS01A)
b. M outside Vm, P in Vm
邦尼 拉 着 小车 经过 小湖。
Bonny la1 [M] zhe xiao3che1 jing1guo4 [P] xiao3hu2
Bonny pull dur car cross lake
‘Bonny, pulling the car, went across the lake.’ (CNS11B)
c. M in both loci, P outside Vm
邦尼 推 着 木柴 走近
Bonny tui1 [M] zhe mu4chai2 zou3 [M]-jin4 [P]
Bonny push dur log walk–towards
了 篝火。
le gou1huo3
asp fire
‘Bonny, pushing the logs, walked towards the fire.’ (CNS10A)
As for L2 learners, they seemed to show a tendency of adopting the target pattern of ‘M in Vm, P outside Vm’, as exemplified in (19).
(19) a. Bonny pulled the treasure bag into the pyramid. (ENS01A)
b. Bonny is pushing some woods [wood] away from the fire. (Beg09A)
c. He rolled the bowling ball into the pool. (Int04B)
d. He pushed the heavy bag to the escalator. (Adv08B)
It is worth mentioning that the information loci and syntactic pattern of a given utterance are closely associated with one another. To illustrate, the frequent option of expressing at least one type of manner information outside the main verb in the periphery of an utterance among Chinese native speakers leads to a grammatical construction involving subordination, and thus a relatively discursive pattern of information packaging ((18b) and (18c)). In comparison, English native speakers tended to wrap up manner and path in a verb + satellite combination, and therefore their utterances were normally syntactically simplex and denoted a highly compact pattern of information distribution (example (19)). The issue of how an L2 learner attends to such differences will be investigated in detail in the following sections.
4.3.2 The syntactic constructions for caused motion expressions
Three major syntactic modes were distinguished, based on the above observations with respect to information loci: (a) Compact (e.g., English simplex clauses, Chinese BA constructions); (b) Semi-compact (e.g., English complex clauses, gerunds, Chinese ZHE constructions); and (c) Loose (e.g., coordinated and juxtaposed clauses). We added a fourth type ‘Others’ to include any other possible grammatical structures that may be opted for by learners in their caused motion expressions (e.g., infinitive construction as in (20d)). The English examples in (20) illustrate the difference between these syntactic structures.
(20) a. Compact
Bonny pushed the toy car across the icy lake. (ENS01A)
b. Semi-compact
Bonny walks across the lake, dragging a toy car. (Adv07B)
c. Loose
He pushed the golf ball and it rolled all the way to the puddle. (Adv01A)
d. Others
Bonny rolled the barrel of hay two times to get it up the ladder. (ENS01A)
Figure 4 shows the frequency of utterances with different syntactic structures over the total number of responses per group (192).
Fig. 4. Frequency of responses with different syntactic structures in learner groups as well as native groups.
As demonstrated in Figure 4, the Compact way of distributing information across an utterance was opted for by participants across groups, and Chinese native speakers seemed to frequently adopt the Semi-compact pattern as well. There were only sporadic occurrences of the Loose and the Others pattern across all groups. Since there was not sufficient variability regarding syntactic patterns except for the Compact to justify additional factors for comparison, we performed a one-way ANOVA test on responses with the Compact syntactic pattern only. The result suggested a significant difference in the frequency of syntactically Compact utterances across five groups (F(4,55) = 31.686, p =.000). The Tamhane post-hoc comparisons further revealed that each of the three learner groups, apart from English native speakers, adopted the Compact mode significantly more frequently than did native speakers of Chinese. In addition, L2 learners used this mode equally frequently, as compared to English native speakers, and no significant difference was attested across the three levels of proficiency.
Further, we conducted an additional paired samples t-test on the dataset of Chinese native speakers to determine whether the Compact (M = 8.67) and the Semi-compact syntactic patterns (M = 7.25) were used in comparable frequencies. The results corroborated our speculation: Chinese native speakers employed these two modes for information organization equally frequently (t(11) = 0.875, p =.410). A qualitative examination of the data further revealed that two main grammatical constructions were normally adopted: the BA construction (Compact) and the ZHE construction (Semi-compact). The former focuses on the affectedness of the object (i.e., how the object is disposed of) and typically expresses three information components (i.e., C, Mc, and P, as illustrated in example (21)). The latter emphasizes the temporal simultaneity between events in caused motion expressions (Li & Thompson, 1976, 1981) and can accommodate four (or even more) information components over an utterance (C, Mc, Ma, and P, as shown in example (22)).
(21) BA construction
邦尼 把 游泳 圈 推下 沙丘。
Bonny ba3 you2yong3 quan1 tui1 [C+Mc]-xia4 [P] sha1qiu1
Bonny ba swimming ring push–down dune
‘Bonny pushed the swimming ring down the dune.’ (CNS11B)
(22) ZHE construction
小 男孩 缓缓 地 推 着
xiao3 nan2hai2 huan3huan3 [M] de tui1 [C+Mc] zhe
little boy slowly assoc push dur
小 车 从 冰 湖 上 走过。
xiao3 che1 cong2 [P] bing1 hu2 shang4 zou3 [Ma]-guo4 [P]
small car from ice lake on walk-across
‘The little boy, pushing the small car slowly, walked across the iced lake on its surface.’ (CNS03B)
The ZHE construction, especially when used in conjunction with an RVC, is capable of encoding particularly dense semantic information across an utterance. Take example (22) as an illustration. The multiple pieces of semantic information (apart from C) in this response included three types of manner information: Mc (‘push’), Ma (‘walk’), and velocity of motion (‘slowly’); one type of path information: the trajectory of boundary-crossing (‘across’); and the general location for motion (‘on its surface’).
Again, in line with our first hypothesis, all L2 learners, irrespective of proficiency level, resembled English native speakers in adopting the Compact construction. Particular attention should be focused on the way that L2 learners abandoned the Semi-compact construction, which had been employed by Chinese native speakers in a mean frequency of 45% of all cases. This construction, with its capability of accommodating particularly rich information components, seemed most applicable to the context of participation in our study, in which speakers’ attention was directed to the explicitness of description. The reasons and the implications of L2 learners’ choice in this respect will be discussed in detail in Section 5.
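The 45% figure can be related to the mean counts reported for the paired samples t-test above. The sketch below rests on an assumption that the text does not state outright: that each group comprised 12 participants (inferred from t(11) and from the error degrees of freedom in F(4,55)), so that the 192 responses per group correspond to 16 responses per participant. The function name is hypothetical.

```python
RESPONSES_PER_GROUP = 192     # stated in the text
PARTICIPANTS_PER_GROUP = 12   # assumption: inferred from t(11), not stated directly

responses_per_participant = RESPONSES_PER_GROUP / PARTICIPANTS_PER_GROUP  # 16.0

def mean_count_to_percent(mean_count):
    """Convert a mean per-participant count into a percentage of responses."""
    return 100 * mean_count / responses_per_participant

print(round(mean_count_to_percent(7.25)))  # Semi-compact: 45
print(round(mean_count_to_percent(8.67)))  # Compact: 54
```

On this reading, the Semi-compact mean of 7.25 corresponds to the 45% reported here, and the Compact mean of 8.67 to roughly 54% of Chinese native speakers' responses.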
Finally, it is worth mentioning the performance of the advanced learners. In contradistinction to those at low and intermediate levels who almost entirely adapted to the target system, the advanced learners occasionally (4% in a mean frequency) employed the syntactically Loose structure to encode caused motion events, mostly in the form of coordinated clauses (example (23)). One explanation is that their non-target-like performance was driven by the need to be more explicit because such Loose structures normally allowed two types of manner information to be encoded, mostly Mc and Mo, as demonstrated in example (23). Another possibility is that their level of fluency has given them the ability to use richer description and thus to mirror the Chinese pattern.
(23) a. He slightly rolls [Mc] the golf ball toward the puddle and it runs [Mo] into a small pond. (Adv09A)
b. He pushes [Mc] the swimming ring and it rolls [Mo] down the sand dune. (Adv12B)
5 Discussion and conclusion
The present study investigated the expression of caused motion events by adult Chinese learners of English at three proficiency levels (low, intermediate, and advanced) in comparison with English and Chinese native speakers. Our general aim is to determine which aspect(s) of caused motion events in the L1 is attended to and expressed in the L2 when the two languages concerned share partial typological similarity. To this end, we examined in detail caused motion expressions by different groups of speakers along two dimensions: (a) the expression of multiple information types, i.e., what has been expressed and to what degree of explicitness; and (b) the syntactic organization of multiple information types, i.e., where to place a given information component and how to distribute multiple information components across an utterance via syntactic means.
First of all, our findings showed, as expected, strong language differences between the source and target languages. As a typical S-language, English manifested a highly systematic and lucid pattern of expressing the manner of causality in the main verb and the trajectory of motion in verb particles, and packaging these information types in a compact way via syntactically simplex structures. In striking contrast, Chinese showed a much more flexible pattern of encoding caused motion events, allowing for a fuller and more explicit narration of these events. Apart from the manner of the Agent’s causative action, the manner of the Agent’s motion (i.e., walking) was also frequently encoded. In a similar fashion, apart from the trajectory itself, fine details regarding path, such as the deixis, the source, and the goal or endpoint of motion, were also frequently provided. To organize these particularly rich types of information over an utterance, Chinese native speakers opted for syntactically complex constructions, resulting in a more discursive way of information distribution as compared to English.
As to the three learner groups, the results were consistent with our first hypothesis: L2 learners, from an initial stage of acquisition, produced predominately target-like responses. In this case, the factor of proficiency seemed to have very little role to play, as L2 learners’ acquisition of the target system did not seem to develop across proficiency levels, despite a minor increase in manner and path elaboration from low to intermediate and advanced levels.
These findings are somewhat surprising at first sight, especially when considering the fact that these L2 learners are living in an entirely Chinese-speaking environment in which their English input is rather limited, basically from classroom teaching. Despite this socio-cultural disadvantage, adult Chinese learners of English still managed to acquire the target pattern of caused motion expressions from an early stage of learning. One might propose that the typological similarity between L1 and L2 greatly facilitated the acquisition rhythm, but this poses the question of how this facilitating effect happens. In this respect, the theory of Structural Ambiguity (Müller, Reference Müller1998) may lend us some insight, although this theory is primarily used to explain the cross-linguistic influence between one’s L1 and L2 in simultaneous bilinguals. The gist of this theory can be summarized as follows: if more than one structure (i.e., ambiguous) exists in the L1 for a given expression when there is only one structure (i.e., unambiguous) available in the L2 for such an expression, the direction of transfer in simultaneous bilinguals is from L2 to L1, namely, from the language that is least ambiguous to the language that is more ambiguous. Let us return to our findings in the light of this theory. In English, there is only one characteristic lexicalization pattern for caused motion expressions (i.e., cause-and-manner verb + path satellite), whereas in Chinese, two main lexicalization patterns exist in parallel, depending on which aspect of the caused motion experience is emphasized. This means that L2 learners need to switch from a rather complicated, variable, and ambiguous system (Chinese) to a fairly simple, systematic, and unequivocal one (English). What they seem to have done is to simply opt for the simple and unambiguous system that is applicable in both languages.
As mentioned in Section 1, Slobin (Reference Slobin, Gumperz and Levinson1996a) predicts, in his ‘thinking for speaking’ hypothesis, that learning a second language basically means acquiring an alternative way of thinking, and the L1 ‘thinking for speaking’ pattern, which is ingrained from childhood, is “exceptionally resistant to restructuring in adult second language acquisition” (1996a, p. 89). In our study, the target-like performance of L2 learners at low proficiency makes one doubt whether they had really undergone a switch of ‘thinking for speaking’ patterns in their learning process. One possibility is that, in cases where L1 and L2 have some overlapping typological properties (for instance when the L2 ‘thinking for speaking’ pattern constitutes a subset of the L1 ‘thinking for speaking’ pattern), L2 learners simply activate the relevant part of the ‘thinking for speaking’ pattern in their native language and apply it directly onto L2 surface forms.
As mentioned earlier, L2 learners in our experiment face a communicative challenge: how to relate the caused motion scenes to the (imaginary) remote addressee in the most explicit way. Their native language offered readily available means to elaborate on different sorts of motion information (manner and path in particular), which cannot all be accommodated in L2 surface forms. The question thus becomes how to strike a balance between being maximally explicit (which is desirable in the immediate context) and being target-like (see also Hendriks & Hickmann, Reference Hendriks, Hickmann, Cook and Bassetti2011). To an L2 learner, a greater degree of explicitness can only be achieved by mastering more complex grammatical constructions in the L2 (e.g., those involving subordination such as The boy walked across the street pushing a big ball with him), and at the cost of violating the canonical pattern for caused motion expressions in the L2, thus making their narration sound non-native-like. In contrast, although adopting the target pattern of motion expressions means sacrificing some fine details in description (e.g., the manner of the Agent’s motion and complementary path information), it represents grammatical simplicity and a reasonable degree of explicitness (i.e., the most important type of manner information, the manner of cause, and the most important type of path information, the trajectory itself, are already there). Put simply, to an L2 learner who is potentially aware of this dilemma, being target-like seems to outweigh being maximally explicit in the particular context of our experiment.
It is worth remembering that, compared to learners at less advanced levels, the advanced group of English L2 learners in our study unexpectedly made more efforts to elaborate on manner and path of motion in their description via syntactically Loose structures (mainly coordinated clauses). The phenomenon seems to be an artefact of our experimental set-up. L2 learners probably never entirely give up their endeavour to be maximally explicit. At relatively low proficiency levels, this resulted in rarely occurring ungrammatical responses such as *Bonny pushed the swimming ring roll down the sand dune (Int10A). When L2 learners’ acquisition process advanced and they had a better command of the target language, they attempted from time to time to fully exploit grammatical devices to achieve the maximal degree of communicative explicitness, which resulted in grammatically correct yet non-target-like performance (i.e., responses involving coordination or subordination).
Note that our findings in the current study are task-specific (i.e., expressing complex caused motion events with multiple information components), context-specific (i.e., in controlled laboratory situations), and language(s)-specific (i.e., languages with overlapping typological properties). The communicative challenges inherent in the current task include: (a) being target-like; and (b) being maximally explicit. Some recent studies have attempted to explore whether and how L2 learners achieve communicative goals in tasks of more varied contexts. For instance, Hendriks et al. (Reference Hendriks, Hickmann and Demagny2008) and Hendriks and Hickmann (Reference Hendriks, Hickmann, Cook and Bassetti2011) examined how adult English learners of French expressed voluntary versus caused motion events in an experimental situation. It was reported that in depicting complex caused motion events, L2 learners relied heavily on the system provided in their source language (English), because it offered a more cost-effective way of expressing multiple information types as compared to the target language (French). In contrast, in narrating less complicated voluntary motion events, the same groups of L2 learners systematically adopted the target pattern and completely shook off their L1 pattern because the target language already provided an efficient way of expressing the motion events under discussion. It was thus speculated that when the task presented a double communicative challenge of being target-like and being explicit, the pattern of motion expressions opted for by L2 learners tended to be a relatively economical one: using the most cost-effective means to adequately get the message across.
An investigation of how adult English learners of Chinese learn to express the set of caused motion events used in the present study is in progress. If the above-mentioned speculation is valid, we should expect that the adult English learners of Chinese resort to the system in their source language for caused motion expressions, as the source pattern provides a terse yet effective way of packaging multiple information components. Note, however, that another phenomenon is equally likely: in this circumstance, the target language (Chinese) happens to be the language that allows for the maximal degree of explicitness; therefore, being target-like means being most explicit (this is quite different from the current context in which the two aims cannot be realized simultaneously), and this convergence may invite L2 learners across proficiency levels to produce more or less target-like responses.
To conclude, our results suggest that in expressing caused motion events in a second language, L2 learners’ acquisition can be facilitated when: (a) their L1 and L2 present some overlapping typological properties; and (b) they switch from a complicated and opaque language system to a simple and clear-cut system. Obviously, much ground has yet to be covered in this area of research. Future studies, when evaluating any L1 transfer, should take into account multiple factors such as the nature of motion events under investigation (e.g., spontaneous versus caused), the precise typological properties of the L1 and L2 (e.g., intra-typological differences, inter-typological similarities), the possible interaction between grammatical choices and experimental requirements, and the direction of the language switching (from a complex and ambiguous system to a simple and unequivocal one or vice versa).
Appendix 1 Description of sixteen caused motion stimuli (Order A)
Training item: Bonny pulled a boat out of the lake.
A1. Bonny pushed a swimming ring down the sand dune.
A2. Bonny pulled a treasure bag into the pyramid.
A3. Bonny pushed a bundle of wood away from the campfire.
A4. Bonny pulled a big gift box along the tunnel.
A5. Bonny pushed a treasure bag up the pyramid.
A6. Bonny pulled a toy car across the icy lake.
A7. Bonny pushed a bundle of wood towards the campfire.
A8. Bonny pulled a toy car around the icy lake.
A9. Bonny slid a heavy bag towards the escalator.
A10. Bonny rolled a barrel of beer around the round table.
A11. Bonny slid a toy car across the icy lake.
A12. Bonny rolled a barrel of hay up the ladder.
A13. Bonny rolled a basketball along a row of chairs.
A14. Bonny slid a suitcase away from the tent.
A15. Bonny rolled a golf ball into the puddle.
A16. Bonny rolled a swimming ring down the hill.