1. Introduction
To what extent does our conception of reality bear a relation to the language we speak? This question, known as the principle of linguistic relativity, or the Whorfian hypothesis (Whorf, 1956), has been subject to extensive research and fierce debate within the cognitive sciences, notably during the past two decades (for recent overviews, see Gleitman & Papafragou, 2012; Regier & Kay, 2009; Wolff & Holmes, 2011). Classic test beds of the Whorfian hypothesis comprise the domains of colour, time, space, and objects. Though traditionally under-represented, the domain of motion has attracted increasing attention over the past decade, with a growing number of studies investigating cross-linguistic differences in motion event cognition (Athanasopoulos et al., 2015; cf. Athanasopoulos & Bylund, 2013; Bylund, Athanasopoulos, & Oostendorp, 2013; Flecken, von Stutterheim, & Carroll, 2014; Gennari, Sloman, Malt, & Fitch, 2002; Loucks & Pederson, 2011; Malt, Ameel, Imai, Gennari, Saji, & Majid, 2014; Papafragou, Hulbert, & Trueswell, 2008; Papafragou, Massey, & Gleitman, 2002; Papafragou & Selimis, 2010; Trueswell & Papafragou, 2010; von Stutterheim, Andermann, Carroll, Flecken, & Schmiedtová, 2012).
While previous work on cross-linguistic differences in the conceptualization of motion has yielded mixed evidence, all studies to date have focused on a single type of event: voluntary motion, in which an entity travels along a trajectory by its own force (e.g., ‘the man walked into the barn’).
In this paper, we extend the scope of previous research by investigating a type of motion that hitherto has remained unexplored in the study of linguistic relativity: caused motion, that is, the motion of an entity as a result of an external force exerted by another entity (e.g., ‘The man pushed the tyre into the barn’). [Footnote 1] We ask to what extent our native language mirrors the way we conceptualize caused motion, which is arguably a conceptually more complex phenomenon than voluntary motion. Specifically, by implementing a novel methodological approach that combines the experimental principles used in previous motion cognition research, we ask which aspects of event representation differ and which ones are shared between speakers of typologically different languages. We focus on Spanish and Swedish, because these languages exhibit maximal contrast in their way of encoding motion, and thereby constitute an ideal test bed for assessing the relationship between linguistic structure and conceptualization.
2. Cross-linguistic differences in the linguistic packaging of motion
Caused motion involves a simple motion event and some external force that causes the simple event to happen (Talmy, 2000a), as in example (1):
(1) The man pushed the tyre into the barn.
The simple event in (1) involves the non-volitional motion of a figure (the tyre) with respect to a ground (the barn) following a certain path (inwards). In addition, the situation described in (1) includes a causing event, namely an agent (the man) causing the tyre to move by manipulating it in a certain manner (he pushes it). Compared to voluntary motion (e.g., ‘the man walked into the barn’), caused motion contains a larger number of event components that can be linguistically encoded. In (1), the tyre could move in different manners: it could roll or slide along the ground. In the present study, we manipulate the manner in which the agent causes an object to move, henceforth manner of cause, and the manner of motion of the object, henceforth manner of object. Both are illustrated in (2).
(2) a. He pushed the tyre into the barn. (manner of cause)
    b. He rolled the tyre into the barn. (manner of object)
Both examples involve an agent moving a tyre into a barn, but they profile information about different event components: (2a) highlights the manner of cause (he pushes the tyre, rather than pulling it), while (2b) highlights manner of object (the tyre rolls, rather than sliding).
As in voluntary motion, we can apply Talmy’s (1985, 2000b) well-known typological distinction between satellite-framed languages (S-languages) and verb-framed languages (V-languages) to caused motion. S-languages like Swedish or English encode path outside of the main verb root (e.g., in verb particles like into), and the verb typically expresses manner information (e.g., push/roll). In contrast, V-languages like Spanish encode path in the verb root (entró ‘(he) entered’). This is illustrated in examples (3) and (4), for Swedish and Spanish, respectively:
(3) Han  sköt/rullade                in    hjul-et   i   lada-n.
    He   pushed/rolled               in    tyre-def  to  barn-def
         manner-cause/manner-object  path
    ‘He pushed/rolled the tyre into the barn.’
(4) Entr-ó     en  el   granero  con   una  rueda  (empujándo-la/rodándo-la).
    Enter-3sg  in  the  barn     with  a    tyre   (pushing-it/rolling-it)
    path                                           (manner-cause/manner-object)
    ‘He entered the barn with a tyre (pushing it/rolling it).’
This contrasting lexicalization pattern makes manner information less codable in V-languages like Spanish, because there is no obligatory syntactic slot that encodes this information. Manner information can be expressed in Spanish, for example in a gerund as in (4), but it is often omitted without the description being ungrammatical or sounding odd. Swedish, on the other hand, represents an even more extreme case of S-language than English (Ragnarsdóttir & Strömqvist, 2004; see Cadierno & Ruiz, 2006, for Danish). For instance, Swedish lacks most of the English Latinate verbs encoding path, such as ascend, descend, enter, or exit. Moreover, Swedish has no generic motion verb (like the English go or the Spanish ir), which leads to a high use of various manner verbs in everyday discourse.
Presently, empirical investigations on cross-linguistic differences in the linguistic packaging of caused motion are scarce. In a pioneering study, Choi and Bowerman (1991) demonstrated that Korean lexicalizes caused and voluntary motion differentially, such that the former is encoded through a conflation of motion and path in the main verb (on a par with V-languages), whereas the latter is expressed with separate constituents for motion, path, and (optionally) manner.
More recently, Hickmann and colleagues investigated how child and adult native speakers of English and French (Hendriks, Hickmann, & Demagny, 2008; Hickmann & Hendriks, 2010), and English and Chinese (Ji, Hendriks, & Hickmann, 2011), express path, manner of cause, and manner of object. Interestingly, the two former studies found no cross-linguistic differences in the frequency of encoding of these motion components in English and French. The lack of cross-linguistic differences in the information conveyed may be due to the fact that French is not straightforwardly classified as a V-language, but has been argued to constitute a “mixed case” in Talmy’s typology (Stringer, 2005, p. 210). However, French speakers were found to exhibit greater variation in their structural patterns than English speakers, encoding path, manner of cause, and manner of object interchangeably in main verbs, subordinated verbs, or adverbial constructions. Ji et al. (2011), in contrast, found developmental differences in how speakers of English and Chinese (an equipollent framing language) encoded caused motion events. These results highlight the importance of choosing optimal language pairs, so as to avoid null effects that stem from a lack of clear-cut cross-linguistic differences in event descriptions.
3. Cross-linguistic differences in motion event cognition
Using Talmy’s typology as a starting point, several studies have set out to investigate the potential effects of the linguistic packaging of voluntary motion on the way we conceptualize motion events. A prominent notion is Slobin’s (1996, 2003) thinking-for-speaking hypothesis, which predicts that cross-linguistic differences in grammaticized and lexicalized concepts will lead speakers of different languages to attend to different features when preparing and producing speech. In one very clear demonstration of this, Papafragou et al. (2008) used an eye-tracking paradigm to probe whether speakers of English and Greek allocated visual attention to different components of an unfolding motion event (e.g., a man skating to a snowman on an ice rink). Unlike English, Greek is a V-language and typically encodes the path in the main verb. In the linguistic condition, participants had to inspect the events while preparing to describe them. Consistent with thinking-for-speaking, gaze allocation was distinct in the two groups and followed the specific lexicalization patterns of each language. However, this difference disappeared in the condition where participants freely inspected the events without having to describe them (see also Trueswell & Papafragou, 2010).
Moving beyond speech planning, researchers have probed the extent to which cross-linguistic differences in motion event encoding influence cognitive processes such as recognition memory, similarity judgements, and category learning. According to the salience hypothesis (cf. Papafragou, 2008), the repeated use of linguistically mediated conceptual categories, in this case the distinct lexicalization patterns of manner and path, will lead to non-linguistic conceptual schemata that are largely in agreement with linguistic patterns. In a series of groundbreaking studies, Gennari et al. (2002) and Papafragou and associates (Papafragou et al., 2002; Papafragou & Selimis, 2010) tested this hypothesis using a triads-matching paradigm with native speakers of English, Greek, and Spanish. In this paradigm, each experimental stimulus consists of a reference event (e.g., a man walking up the stairs), a same-path alternate in which the manner is changed (a man running up the stairs), and a same-manner alternate in which the path is changed (a man walking down the stairs). Gennari et al. (2002) found that Spanish speakers had a greater preference for same-path alternates compared to English speakers, but only if they had described the events before judging their similarity (a similar finding was also reported by Lai, Garrido Rodriguez, & Narasimhan, 2014). A possible flaw in Gennari et al.’s study is that the stimuli do not seem to have consistently discriminated between voluntary and caused motion (cf. Gennari et al., 2002, pp. 66–67).
Papafragou and Selimis (2010) similarly found that when the task instructions contained a linguistic bias, English speakers were more likely to base their similarity judgements on manner than Greek speakers. In these studies cross-linguistic differences disappeared in the absence of biasing instructions (Papafragou & Selimis, 2010) or prior verbalization (Gennari et al., 2002), as well as in the presence of a verbal interference task (Gennari et al., 2002).
The studies just reviewed all used a task in which participants had to make a forced choice between path and manner. By its very design, this task confounds path and manner preferences, because proportions of path and manner choices always add up to one. Thus, higher path preference becomes equivalent to lower manner preference and vice versa. However, both S-languages and V-languages consistently encode path information, albeit mapped onto different linguistic elements. The crucial difference lies in the fact that manner is less codable in V-languages and thus gets more often omitted in discourse. Therefore, a design that teases out path bias from manner bias more adequately reflects the linguistic state of affairs, which after all is what is hypothesized to drive any potential conceptual differences between language groups.
Following this logic, Kersten, Meissner, Lechuga, Schwartz, Albrechtsen, and Iglesias (2010) adopted a methodological approach which did not oppose manner and path. They implemented a supervised classification paradigm in which English and Spanish speakers had to learn to correctly identify alien species depending on their motion patterns. The diagnostic criterion for classification differed between subjects; it was either path or manner. A series of experiments consistently showed that English speakers had an advantage over Spanish speakers when the relevant classification criterion was manner, but not when it was path, in which case both groups performed equally well. One limitation of this study is that it did not manipulate the involvement of language, thus leaving open the question of how much participants were relying on verbal mediation to solve the task.
The evidence reviewed above provides mixed support for the salience hypothesis. If it were the case that conceptual categories of motion were invariably aligned with linguistic categories, Whorfian effects should not fluctuate as a function of experimental condition and task. On the one hand, the negative evidence lends support to the idea that motion cognition is mainly guided by our shared perceptual system with language exerting no influence on how we perceive motion events (Papafragou, 2008). On the other hand, the available positive evidence provides a qualified picture of language effects on motion event cognition. If language-specific labels are made salient in an experimental situation, they may modulate event categorization criteria. Linguistic labelling might be achieved by means of using instructions that encourage the use of language (Papafragou & Selimis, 2010), or by implementing a challenging task that promotes the use of labels (Kersten et al., 2010). These effects are consistent with what Wolff and Holmes (2011) have termed thinking-with-language effects, whereby linguistic and non-linguistic processes might be activated in tandem, with the consequence that linguistic categories mediate similarity judgements and facilitate category learning (for similar findings on grammatical aspect and goal-oriented motion, see Athanasopoulos & Bylund, 2013; Bylund & Athanasopoulos, 2014; Flecken et al., 2014).
4. The present study
The overall aim of the present paper is to extend the cross-linguistic study of motion event cognition to the domain of caused motion. To this end, we implemented a free arrangement task (Goldstone, 1994), in which participants had to arrange different scenes of caused motion on the basis of their perceived similarity. Unlike the commonly used forced triads-matching task, this paradigm does not confound path and manner preference, but instead captures the possibility that the relative importance of event components be weighted differently. The arrangement task was carried out under three different encoding conditions that manipulated the participants’ likelihood to use language to solve the task. Our method thus represents an improvement compared to previous studies, which either have manipulated linguistic task mediation but confounded path/manner preference (Gennari et al., 2002; Papafragou et al., 2002; Papafragou & Selimis, 2010), or vice versa (Kersten et al., 2010). The experimental approach, combined with the choice of prototypical S- and V-languages (Swedish and Spanish, respectively), thus maximizes the likelihood of detecting potential cross-linguistic differences.
Our first and narrower research question concerns whether speakers of Spanish and Swedish differ in their focus on manner of cause and manner of object when judging event similarity, reflecting how these events are described in their respective languages. Our second and broader question concerns the commonalities and differences in how speakers of different languages represent motion, and whether event representation varies as a function of verbal mediation. Experiments 1 through 3 speak to the first question; we address the second question in a subsequent compound analysis of all three experiments. [Footnote 2]
5. Experiment 1: similarity assessment under linguistic encoding
Under the assumption that cross-linguistic variability in describing caused motion implies corresponding differences in conceptualizing these events, such a pattern should be strongest when language-specific representations have been explicitly activated. For this reason, Experiment 1 tested how speakers of Spanish and Swedish judged event similarity after first having described the events in their respective languages.
5.1. Method
5.1.1. Participants
Twenty-four native Spanish speakers (mean age = 23.4 years, SD = 3.0) and twenty-two native Swedish speakers (mean age = 25.7 years, SD = 4.3) participated. Spanish speakers were students at the Complutense University of Madrid, and Swedish speakers were students at Stockholm University. Participants used their native language routinely since they lived in a monolingual context. Some had familiarity with other languages, but none of them had expert knowledge. Crucially, the Spanish speakers had no knowledge of Swedish, and vice versa. Participation was remunerated.
5.1.2. Materials
The stimuli were thirty-two animated cartoons (each approximately 7 seconds long) depicting caused motion events in which the same human-like agent displaced an object along a certain path. The animations were originally developed by Hickmann and associates (e.g., Hickmann & Hendriks, 2010). Three motion components were systematically crossed in the stimuli: path (four levels: up, down, across, into); manner of cause (two levels: push, pull); and manner of object (two levels: roll, slide). See Fig. 1 for examples. The sagittal direction of motion (left-to-right, right-to-left) was counterbalanced, so that each of the sixteen possible combinations of motion components corresponded to two target items, one for each direction. In addition, stimuli varied with respect to the ground in which the event took place (8 different grounds) and the object that was moved (16 different objects). This variation was necessary: for the path to vary (e.g., ‘into’ versus ‘down’) the ground necessarily has to vary as well; similarly, an object can either roll or slide, but typically cannot do both. A full description of the target items is given in the Supplementary Materials (available at <http://dx.doi.org/10.1017/langcog.2016.22>).
5.1.3. Norming study
We first established the degree to which speakers of Spanish and Swedish actually differed in their descriptions of caused motion as depicted in the videos. Eighteen native Spanish and nineteen native Swedish speakers, who did not take part in the main experiment but belonged to the same two participant populations, provided descriptions of the thirty-two stimulus videos and seven additional filler items presented in pseudo-randomized order. The instructions were: “Describe what happens in each scene after having watched it in its entirety.” Participants were not given any limitation regarding length. Descriptions were coded for whether they mentioned path (e.g., ‘into’, ‘across’), manner of cause (e.g., ‘push’, ‘pull’), and manner of object (e.g., ‘roll’, ‘slide’), and whether this information was expressed in a main verb root or outside of it (see Table 1). As expected, descriptions in both languages were very likely to express path (Spanish: 95% of all descriptions, Swedish: 99%), but Swedish descriptions more often than Spanish ones mentioned manner of cause (Spanish: 55%, Swedish: 78%) and manner of object (Spanish: 10%, Swedish: 29%). Lexicalization patterns followed each language’s typological status, with Spanish verbs mainly expressing path and Swedish verbs manner (Table 1). Comparison of these figures to previous studies confirms the suitability of the current language pair. [Footnote 3]
Notes to Table 1: The first two columns by language do not always add up to the total proportions (Total) because the same description can have redundant information in the main verb root and outside of it (e.g., sube para arriba ‘he ascends up’). The V columns need not add up to 100 because verbs can express neither of the components (e.g., Spanish va ‘he goes’) and because there can be two main verbs in a description (e.g., Swedish skjuter och rullar ‘pushes and rolls’).
We fitted three separate logistic mixed models (Jaeger, 2008) to assess whether Spanish and Swedish reliably differed in their likelihood to express each of the three components in Table 1. In each model the binary dependent variable was whether the component was expressed in a description or not. The sole predictor was language, which was dummy coded (Spanish = 0, Swedish = 1), so that the intercept in each model expresses the log-odds that Spanish speakers encoded a given component, while the language coefficient expresses the difference in log-odds between languages. Here we only report the latter, which provides the critical comparison (see Supplementary Materials for full model details including random effects structure). Results showed that Spanish and Swedish speakers did not reliably differ in their likelihood to express path ($\hat \beta_{\text{Swedish-vs.-Spanish}}$ = 3.03, p = .113), but that Swedish speakers were more likely than Spanish speakers to encode manner of cause ($\hat \beta_{\text{Swedish-vs.-Spanish}}$ = 5.01, p < .001) and manner of object ($\hat \beta_{\text{Swedish-vs.-Spanish}}$ = 2.55, p = .018). The norming study thus confirms the validity of our stimuli for the two tested language populations.
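To make the dummy coding concrete, the following minimal sketch converts the raw proportions reported for the norming study into the log-odds scale used by logistic regression. It is an illustration only: the helper names (`logit`, `dummy_coded_terms`) are hypothetical, and the values it produces are marginal log-odds computed from rounded percentages. They are not expected to match the mixed-model coefficients reported above, which are conditional on the random effects.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability p, with 0 < p < 1."""
    return math.log(p / (1.0 - p))


# Proportion of descriptions mentioning each component (norming study, rounded).
proportions = {
    "path": {"Spanish": 0.95, "Swedish": 0.99},
    "manner of cause": {"Spanish": 0.55, "Swedish": 0.78},
    "manner of object": {"Spanish": 0.10, "Swedish": 0.29},
}


def dummy_coded_terms(p_spanish: float, p_swedish: float) -> tuple:
    """With Spanish = 0 and Swedish = 1, the intercept is the Spanish log-odds
    and the language coefficient is the Swedish-minus-Spanish difference."""
    intercept = logit(p_spanish)
    language_coef = logit(p_swedish) - intercept
    return intercept, language_coef


raw_terms = {
    component: dummy_coded_terms(p["Spanish"], p["Swedish"])
    for component, p in proportions.items()
}

for component, (intercept, coef) in raw_terms.items():
    print(f"{component}: intercept = {intercept:.2f}, language coefficient = {coef:.2f}")
```

A positive language coefficient means that Swedish descriptions were more likely than Spanish descriptions to mention the component, which is the direction of all three comparisons.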
5.1.4. Procedure
Participants were tested individually by a native speaker of the relevant language. They first described the stimuli following the same procedure as in the norming study, and then moved on to the arrangement task. The exact instructions, translated into English, were: “Your task is to arrange the scenes on the screen depending on how similar they are. Video-clips showing similar actions should be placed near each other; if the action is different, the scenes should be placed far away from each other.” Participants were informed they would carry out three series of arrangements and that they would not be able to move a video-clip once it had been placed. The test phase, which started after a brief training phase, consisted of three arrangement blocks of twenty-two video-clips each. The progression per block was as follows: first, a video-clip was played on the screen in its entirety. The participant then moved to a screen where they had to place the scene by clicking with the mouse; upon clicking, a still of the video-clip appeared on the screen. This procedure was repeated until the end of the block. Participants could arrange items freely on the screen (they were not constrained to form piles or clusters). They moved forward by clicking on a centred message box to prevent spatial bias in the arrangements.
The specific items appearing in each block were randomized subject to the following constraints: no item appeared more than once in the same block; all 496 possible pairs of video-clips appeared at least once in some block; the number of videos appearing three times across blocks was minimized. We preferred this algorithm to the one used in Goldstone (1994), that is, choosing the items completely at random from the full set, since fully randomizing the choice of items would entail missing observations and lead to an unbalanced dataset. This resulted in a total of 693 pairwise similarity values per participant (231 for each block). The arrangement task was programmed in E-Basic and run in E-Prime (Schneider, Eschman, & Zuccolotto, 2002).
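The constraints above can be met by a simple allocation scheme, sketched below. This is not the authors' E-Basic implementation, which is not described in detail; `make_blocks` is a hypothetical helper. The sketch relies on a counting argument: three blocks of 22 give 66 slots for 32 items, so if every item appears at least twice, exactly two items must appear three times. Any two items that each occupy at least two of three blocks necessarily share a block, so all pairs are covered.

```python
import random
from itertools import combinations
from math import comb

N_ITEMS, N_BLOCKS, BLOCK_SIZE = 32, 3, 22


def make_blocks(seed=None):
    """Allocate 32 items to 3 blocks of 22 so that every pair co-occurs at least once.

    Ten items go to blocks 0 and 1, ten to blocks 0 and 2, ten to blocks 1 and 2,
    and the remaining two items go to all three blocks.
    """
    rng = random.Random(seed)
    items = list(range(N_ITEMS))
    rng.shuffle(items)  # which item lands in which group is random
    groups = [items[0:10], items[10:20], items[20:30], items[30:32]]
    membership = [(0, 1), (0, 2), (1, 2), (0, 1, 2)]
    blocks = [[] for _ in range(N_BLOCKS)]
    for group, block_ids in zip(groups, membership):
        for b in block_ids:
            blocks[b].extend(group)
    for block in blocks:
        rng.shuffle(block)  # randomize presentation order within each block
    return blocks


blocks = make_blocks(seed=1)

# Every unordered pair of items that shares at least one block.
covered = {frozenset(pair) for block in blocks for pair in combinations(block, 2)}
pairs_per_block = comb(BLOCK_SIZE, 2)

print(len(covered), pairs_per_block, N_BLOCKS * pairs_per_block)
```

The counts reproduce the figures in the text: 496 distinct pairs, 231 pairwise values per block, and 693 values per participant across the three blocks.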
5.1.5. Design and analysis
To assess whether speakers of Spanish and Swedish relied on different event components in their arrangements, we analyzed to what extent two events sharing a component led to increased similarity scores (Fig. 1). For example, if two events shared the manner of object (e.g., in both events the object was rolling), did this on average lead to an increase in similarity ratings with respect to events that did not share the manner of object? Crucially, was this increase different for Spanish and Swedish speakers?
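As a rough illustration of the predictors in this analysis, the sketch below enumerates the 32 stimuli from the crossing described in the Materials section and counts how many of the 496 event pairs share each critical component. Variable names are hypothetical, and the non-target variables ground and object are left out because their exact assignment to items is given only in the Supplementary Materials.

```python
from itertools import combinations, product

paths = ["up", "down", "across", "into"]
manners_of_cause = ["push", "pull"]
manners_of_object = ["roll", "slide"]
directions = ["left-to-right", "right-to-left"]

# 4 x 2 x 2 component combinations, each shown in two directions = 32 stimuli.
stimuli = list(product(paths, manners_of_cause, manners_of_object, directions))

# All unordered pairs of distinct stimuli.
pairs = list(combinations(stimuli, 2))

# For each component, count the pairs in which both events have the same level.
shared = {
    "path": sum(a[0] == b[0] for a, b in pairs),
    "manner_of_cause": sum(a[1] == b[1] for a, b in pairs),
    "manner_of_object": sum(a[2] == b[2] for a, b in pairs),
}

print(len(stimuli), len(pairs), shared)
```

Because path has four levels and each manner component has two, fewer pairs share a path (112) than share a manner of cause or a manner of object (240 each).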
The dependent variable was similarity between event pairs. Similarity was a continuous measure bounded between 0 (minimal similarity) and 1 (maximal similarity). It was computed from the similarity arrangements in two steps. First, we normalized the distance in pixels between all pairs of scenes for each participant–block combination to yield a measure between 0 and 1, as follows:
$$\mathit{Sim}_{ijbs} = 1 - \frac{\mathit{Dist}_{ijbs}}{\mathit{MaxDist}_{bs}}$$
where $\mathit{Sim}_{ijbs}$ denotes the normalized similarity score between scenes $i$ and $j$ in block $b$ for subject $s$, $\mathit{Dist}_{ijbs}$ denotes the distance in pixels between the coordinates of the centres of scenes $i$ and $j$ in arrangement $b$ for subject $s$, and $\mathit{MaxDist}_{bs}$ denotes the maximal distance between two scenes in block $b$ for subject $s$. Second, for each participant we averaged the normalized similarity between pairs of scenes across blocks, so as to yield one single measure per participant and event pair. [Footnote 4] The final number of observations per participant was 496, one for each pairwise combination of the stimuli.
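The two steps can be sketched in a few lines of Python. The sketch assumes that normalized similarity is one minus the pixel distance divided by the largest distance in that block, which is the form implied by the definitions above; the function names and the toy coordinates are invented for illustration and are not data from the study.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def block_similarities(coords):
    """Step 1: normalized similarity for every pair of scenes in one block.

    coords maps a scene id to the (x, y) pixel position of the scene's centre.
    Assumes at least two scenes were placed at different positions.
    """
    distances = {
        frozenset((i, j)): dist(coords[i], coords[j])
        for i, j in combinations(coords, 2)
    }
    max_dist = max(distances.values())
    return {pair: 1.0 - d / max_dist for pair, d in distances.items()}


def average_over_blocks(blocks):
    """Step 2: mean normalized similarity per pair across the blocks it appears in."""
    totals, counts = {}, {}
    for coords in blocks:
        for pair, sim in block_similarities(coords).items():
            totals[pair] = totals.get(pair, 0.0) + sim
            counts[pair] = counts.get(pair, 0) + 1
    return {pair: totals[pair] / counts[pair] for pair in totals}


# Invented toy arrangements for one participant (two small blocks).
block_1 = {"A": (0, 0), "B": (300, 400), "C": (0, 100)}
block_2 = {"A": (0, 0), "B": (0, 50), "D": (0, 200)}

sims = average_over_blocks([block_1, block_2])
print(sims[frozenset(("A", "B"))])
```

In the toy data, scenes A and B are the most distant pair in the first block (similarity 0) and close together in the second (similarity 0.75), so their averaged similarity is 0.375. Normalizing within each block keeps participants who use more or less of the screen on a common scale.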
We examined the effect of the critical event components (path, manner of cause, and manner of object) on similarity scores and their interactions with language using linear mixed effects regression models (Baayen, Davidson, & Bates, 2008). The analyses controlled for the effect of other non-target variables (ground, object, and left/right direction). Predictors were first centred so that the reported coefficients represent the estimated difference in similarity ratings between the two levels of each predictor. We included the maximal by-subject random effects structure and by-item random intercepts (Barr, Levy, Scheepers, & Tily, 2013). All analyses were run in R (R Development Core Team, 2013) using the lmer function of the lme4 library (Bates, Maechler, Bolker, & Walker, 2014). See Supplementary Materials for full specification of the linear mixed effects models.
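Why centring matters when interactions are present can be seen in a small worked example. The sketch below uses invented cell means for a balanced 2 × 2 design (component same vs. different, crossed with language) and the standard identities for a saturated model: under dummy coding the component coefficient is the effect in the reference group only, whereas under centred coding (−0.5 / +0.5) it is the effect averaged over both groups. The numbers are not data from the study, and the actual models also include control predictors and random effects.

```python
# Invented mean similarity per cell of a balanced 2 x 2 design (not study data).
cell_means = {
    ("different", "Spanish"): 0.40,
    ("same", "Spanish"): 0.42,
    ("different", "Swedish"): 0.40,
    ("same", "Swedish"): 0.46,
}


def simple_effect(language):
    """Same-minus-different difference in mean similarity within one language."""
    return cell_means[("same", language)] - cell_means[("different", language)]


# Dummy coding (Spanish = 0, Swedish = 1): the component coefficient is the
# effect for the reference group (Spanish) only.
component_coef_dummy = simple_effect("Spanish")

# Centred coding (-0.5 / +0.5 on both predictors): the component coefficient is
# the effect averaged over the two languages.
component_coef_centred = (simple_effect("Spanish") + simple_effect("Swedish")) / 2

# The interaction coefficient is the difference between the two simple effects
# and is the same under both coding schemes.
interaction_coef = simple_effect("Swedish") - simple_effect("Spanish")

print(round(component_coef_dummy, 2),
      round(component_coef_centred, 2),
      round(interaction_coef, 2))
```

With centred predictors, a component coefficient can therefore be read as a main effect across both language groups, and the interaction term as the cross-linguistic difference in that effect.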
5.2. Results and discussion
5.2.1. Reliance on event components
Results by language are plotted in Fig. 2A. There were main effects of path (Path$_{\text{same-vs.-different}}$: $\hat \beta$ = 0.18, t = 8.43, p < .001), manner of cause (MannerCause$_{\text{same-vs.-different}}$: $\hat \beta$ = 0.08, t = 4.74, p < .001) and manner of object (MannerObject$_{\text{same-vs.-different}}$: $\hat \beta$ = 0.04, t = 3.88, p < .001), indicating that all of these components were used to judge event similarity. We were especially interested in interactions of these components with language to assess cross-linguistic differences in similarity judgements (Fig. 2A). The analysis revealed both cross-linguistic similarities and differences. There was no difference in how much speakers of each language relied on path (Path$_{\text{same-vs.-different}}$ × Language$_{\text{Swedish-vs.-Spanish}}$: $\hat \beta$ = −0.02, t = −0.58, p > .10). Swedish speakers did, however, rely significantly more on manner of object than Spanish speakers (MannerObject$_{\text{same-vs.-different}}$ × Language$_{\text{Swedish-vs.-Spanish}}$: $\hat \beta$ = 0.04, t = 2.28, p < .05). Finally, with respect to manner of cause, while Swedish speakers did numerically rely more on this component than Spanish speakers, the difference was not significant (MannerCause$_{\text{same-vs.-different}}$ × Language$_{\text{Swedish-vs.-Spanish}}$: $\hat \beta$ = 0.05, t = 1.54, p > .10).
These results provide several insights. First, they show that participants in the current task and with our specific instructions were not just adopting the very simple strategy of labelling actions with verbs to gauge event similarity. In that case we should have observed that Spanish speakers relied almost exclusively on path (based on Spanish path verbs like subir ‘ascend’, entrar ‘enter’, etc.) and Swedish speakers on manner (based on Swedish manner verbs like skjuta ‘push’, rulla ‘roll’, etc.). Instead, participants seemed to engage in a complex decision-making procedure in which several components were used to assess event similarity. [Footnote 5] Second, the lack of a difference in reliance on path further confirms that this component is equally salient for speakers of both languages (recall that path was systematically encoded in both languages in the norming study; see also Kersten et al., 2010). Finally, one of the manner manipulations yielded cross-linguistic differences: Swedish speakers relied more than Spanish speakers on the manner of object motion (rolling or sliding); but no significant difference was found with respect to manner of cause (pushing or pulling). This suggests that speakers are doing more than just basing event representations on the typical descriptions in their language.
What the between-group analysis leaves open, however, is whether there is a tight link between how individual speakers describe the events and then judge their similarity. A strong link would be indicative that participants were using their descriptions to perform the task. In other words, it would support a thinking-for-speaking effect (as in Gennari et al., 2002). In contrast, the absence of such a link would suggest that the source of the influence was not coming from the linguistic descriptions per se, but was possibly an effect of habitual linguistic encoding on attention, as predicted by the salience hypothesis (cf. Kersten et al., 2010). We explore this in the next analysis.
5.2.2. Are descriptions predictive of similarity judgements?
To test the relation between descriptions and similarity judgements, we performed by-participant correlation analyses between the frequency with which an event component was mentioned in the descriptions during the linguistic encoding phase and the degree to which that component was used in the subsequent similarity arrangement task. [Footnote 6] If during the arrangement task speakers are mainly retrieving previously activated linguistic representations, the extent to which they mentioned an event component should offer a good index of how much they relied on that component for judging similarity.
Fig. 3 shows scatterplots with regression lines and 95% confidence intervals for each of the three semantic components in Spanish and Swedish. The large confidence intervals for the regression lines suggest that overall the link was not very strong. First, path showed very limited spread over the x-axis because speakers tended to systematically include this component in their descriptions, which makes inferences about the relation between descriptions and similarity judgements difficult (Fig. 3, left panel). Manner of cause and manner of object (middle and right panels) show a greater spread over the x-axis, but here again the confidence intervals are broad. Note that in the case of manner of cause in Swedish the correlation numerically even runs counter to expectations, with more mentions of manner of cause predicting less reliance on this component during the similarity task (dashed line in middle panel).
Results from the six separate by-speaker correlation analyses for each component and language are shown in Table 2. Only one of the correlations reached significance at the .05 level, namely manner of object in Spanish. These results suggest at best a very weak link between participant descriptions and performance on the similarity task. The low correlations between participants' descriptions and their similarity arrangements are suggestive of a thinking-with-language effect and thus open up the possibility that there may be cross-linguistic differences even in the absence of prior linguistic encoding. We tested this in the next experiment.
6. Experiment 2: similarity assessment under free encoding
Do people pay attention to aspects of the events typically encoded in their language, even when linguistic representations are not explicitly evoked as in Experiment 1? To test this, we let speakers of Spanish and Swedish carry out the same similarity arrangement task without providing prior event descriptions (free encoding condition). If it was the descriptions that drove the patterns in Experiment 1, we should find no differences between language groups in the current experiment. A lack of differences under free encoding would support a thinking-for-speaking type of effect and would replicate the findings in Gennari et al. (2002). If, on the other hand, the same cross-linguistic patterns persist (i.e., Swedish speakers still rely more on manner-related components than Spanish speakers), this would suggest a deeper relation between the language we speak and the concepts we form, one that does not depend on explicit verbal mediation.
6.1. Method
6.1.1. Participants
Twenty-four native Spanish speakers (mean age = 22.3 years, SD = 2.8) and twenty-five native Swedish speakers (mean age = 25.8 years, SD = 4.0) participated. Participants in each group were drawn from the same student populations as in Experiment 1. Participation was remunerated.
6.1.2. Procedure
The procedure was identical to that in Experiment 1 except for the familiarization phase preceding the similarity task: instead of describing the events, participants went through a familiarization phase where they silently watched the target stimuli presented in random order.
6.1.3. Materials, design, and analysis
The materials, design, and analysis were all the same as in Experiment 1.
6.2. Results and discussion
The results are plotted in Fig. 2B. As in Experiment 1, we found main effects of path (Path$_{\text{same-vs.-different}}$: $\hat \beta$ = 0.11, t = 6.66, p < .001), manner of cause (MannerCause$_{\text{same-vs.-different}}$: $\hat \beta$ = 0.12, t = 5.45, p < .001), and manner of object (MannerObject$_{\text{same-vs.-different}}$: $\hat \beta$ = 0.04, t = 3.11, p < .01), again indicating that all of these components were used to judge event similarity. Critically, the interactions of these components with language were qualitatively identical to those in Experiment 1. Once again, Swedish speakers relied significantly more on manner of object than Spanish speakers (MannerObject$_{\text{same-vs.-different}}$ × Language$_{\text{Swedish-vs.-Spanish}}$: $\hat \beta$ = 0.05, t = 2.27, p < .05). As expected, there was no difference in how much speakers of each language relied on path (Path$_{\text{same-vs.-different}}$ × Language$_{\text{Swedish-vs.-Spanish}}$: $\hat \beta$ = 0.03, t = 0.87, p > .10). Finally, while the numerical difference with respect to manner of cause persisted (greater mean reliance by Swedish speakers), it was far from significant (MannerCause$_{\text{same-vs.-different}}$ × Language$_{\text{Swedish-vs.-Spanish}}$: $\hat \beta$ = 0.03, t = 0.70, p > .10).
In Experiment 2 participants did not describe the events prior to carrying out the similarity arrangement task. Yet precisely the same qualitative patterns obtained as under linguistic encoding (Experiment 1). This outcome, together with the relative lack of correlations between speaker descriptions and their similarity judgements in Experiment 1, is suggestive of a general tendency for Swedish speakers to pay more attention to the manner of object than Spanish speakers, even when linguistic representations are not explicitly invoked, congruent with thinking-with-language and the salience hypothesis. However, the results leave open the possibility that speakers were covertly using language during the task. That is, to help them carry out the arrangement task, participants might have described the events subvocally. If so, the fact that the same cross-linguistic difference was found in Experiments 1 and 2 would have a simple explanation: participants might have been using language to the same extent in both experiments, only overtly in the first and covertly in the second. We tease apart these possibilities in two steps. First, we present the results of a third experiment in which the ability to subvocally describe the events was reduced (Experiment 3). Second, we run a compound analysis of the experiments to find out whether there is evidence that participants are solving the similarity task in equivalent ways irrespective of encoding condition.
7. Experiment 3: similarity assessment under verbal interference
Do the results in Experiments 1 and 2 reflect a deep cognitive bias that leads Swedish speakers to pay more attention than Spanish speakers to manner-related event components, even when the use of language is blocked? We tested this by letting speakers of both languages carry out the similarity arrangement task under verbal interference throughout both the encoding and the test phase. If the same cross-linguistic difference persists, this will constitute evidence that the effect is not mediated by the on-line recruitment of language. If the effect disappears, it will suggest that the difference was due to the on-line activation of linguistically mediated event representations.
7.1. Method
7.1.1. Participants
Twenty native Spanish speakers (mean age = 20.7 years, SD = 1.4) and twenty native Swedish speakers (mean age = 24.3 years, SD = 4.1) participated. In each group, participants were drawn from the same student populations as in Experiments 1 and 2. Participation was remunerated.
7.1.2. Procedure
The procedure was identical to that in Experiment 2, except that participants repeated out loud random series of three two-digit numbers throughout the experiment (cf. Trueswell & Papafragou, 2010). A new series was presented aurally before each trial during both the familiarization and the arrangement phase.
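For readers who want to set up a comparable interference manipulation, the sketch below draws one random series of three two-digit numbers per trial. It is only an illustration: the function name `interference_series`, the seed, the number of trials, and the decision to allow repeated numbers within a series are all assumptions, since the text does not describe how the series were generated.

```python
import random

def interference_series(rng, length=3):
    """Draw `length` random two-digit numbers (10-99).

    Assumption: repeats within a series are allowed; the source does not say.
    """
    return [rng.randint(10, 99) for _ in range(length)]

# A seeded generator keeps the illustration reproducible (seed is arbitrary).
rng = random.Random(0)

# One new series per trial; four trials shown purely as an example.
trial_series = [interference_series(rng) for _ in range(4)]
```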
7.1.3. Materials, design, and analysis
The materials, design, and analysis were all the same as in Experiments 1 and 2.
7.2. Results and discussion
Results for the interference condition are plotted in Fig. 2C. There were main effects of path (Path$_{\text{same-vs.-different}}$: $\hat \beta$ = 0.10, t = 6.33, p < .001), manner of cause (MannerCause$_{\text{same-vs.-different}}$: $\hat \beta$ = 0.07, t = 3.46, p < .001), and manner of object (MannerObject$_{\text{same-vs.-different}}$: $\hat \beta$ = 0.02, t = 2.46, p < .05), so all of these components were used to some extent to judge event similarity. Critically, none of these components interacted significantly with language: path (Path$_{\text{same-vs.-different}}$ × Language$_{\text{Swedish-vs.-Spanish}}$: $\hat \beta$ = −0.01, t = −0.32, p > .10), manner of cause (MannerCause$_{\text{same-vs.-different}}$ × Language$_{\text{Swedish-vs.-Spanish}}$: $\hat \beta$ = 0.00, t = −0.01, p > .10), manner of object (MannerObject$_{\text{same-vs.-different}}$ × Language$_{\text{Swedish-vs.-Spanish}}$: $\hat \beta$ = 0.00, t = 0.41, p > .10) (see footnote 7).
When participants’ ability to engage in linguistic mediation was reduced, the higher reliance on manner of object by Swedish speakers disappeared. The vanishing of cross-linguistic differences under verbal interference lends no support to the strong version of the salience hypothesis, according to which language can change our underlying perceptual machinery. The result also replicates previous findings under verbal interference (Gennari et al., 2002; Trueswell & Papafragou, 2010). The effects found in Experiments 1 and 2, however, are still consistent with two accounts. It could be the case either that Experiments 1 and 2 yielded the same results because linguistic mediation was equally active in both experiments, or that the difference persisted although language was not being used to the same extent or in the same fashion. The final compound analysis speaks to this question.
8. Compound analysis of Experiments 1–3
Experiments 1 to 3 had their main focus on cross-linguistic differences. This final section adopts a broader perspective. On the one hand, we aim to explore the weights given to the different event components, including the control variables for which we did not have explicit cross-linguistic predictions (direction, ground, and object); this will help understand how participants solved the kind of complex similarity task we gave them. On the other hand, a compound analysis lets us directly assess whether the weights assigned to each component varied as a function of encoding condition; this will be informative as to the nature of event representations that participants create depending on whether language is engaged or not.
8.1. Analysis
For this analysis we fitted a model very similar to those in Experiments 1 to 3, except that now we used the data from all three experiments and added the factor encoding condition, with three levels: linguistic, free, and interference (corresponding to Experiments 1 to 3, respectively). Encoding condition was forward coded, so that we could compare linguistic encoding against free encoding (Experiments 1 vs. 2), and free encoding against verbal interference (Experiments 2 vs. 3). As in Experiments 1–3, all other predictors were centred with a difference of 1 between their two levels, so that the reported coefficients represent the estimated difference in similarity ratings between the two levels of each predictor. See Supplementary Materials for full specification and output of the model.
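The coding scheme can be illustrated with a small sketch, on the assumption that "forward coded" refers to standard forward difference contrasts, in which each coefficient compares one level with the next. The function names and the cell means below are hypothetical and serve only to show why the coefficients come out as adjacent-level differences; the actual model specification is in the Supplementary Materials.

```python
from fractions import Fraction

def forward_difference_contrasts(k):
    """k x (k-1) forward difference contrast matrix, as a list of rows.

    Column j (1-based) compares level j with level j+1: rows up to j
    get (k-j)/k, the remaining rows get -j/k, so each column sums to 0.
    """
    return [[Fraction(k - j, k) if i <= j else Fraction(-j, k)
             for j in range(1, k)]
            for i in range(1, k + 1)]

def predicted_cell_means(cell_means):
    """Rebuild cell means from intercept + contrasts * adjacent differences."""
    k = len(cell_means)
    grand_mean = sum(cell_means) / k
    # Coefficients under forward difference coding: mean(level j) - mean(level j+1)
    diffs = [cell_means[j] - cell_means[j + 1] for j in range(k - 1)]
    contrasts = forward_difference_contrasts(k)
    return [grand_mean + sum(c * d for c, d in zip(row, diffs))
            for row in contrasts]

# Hypothetical cell means for linguistic, free, interference (NOT study data).
means = [Fraction(3, 10), Fraction(2, 10), Fraction(1, 10)]
rebuilt = predicted_cell_means(means)

# Two-level predictors centred with a difference of 1 between levels,
# so a coefficient equals the estimated same-vs.-different difference.
CENTRED_BINARY = {"same": Fraction(1, 2), "different": Fraction(-1, 2)}
```

Because the adjacent differences, combined with the grand mean and this contrast matrix, reproduce the cell means exactly, the first coefficient can be read as linguistic minus free encoding and the second as free encoding minus verbal interference.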
8.2. Results and discussion
Results by encoding condition are shown in Fig. 4. When considering all three experiments together, the analysis again found main effects of path (Path$_{\text{same-vs.-different}}$: $\hat \beta$ = 0.13, t = 11.01, p < .001), manner of cause (MannerCause$_{\text{same-vs.-different}}$: $\hat \beta$ = 0.09, t = 7.75, p < .001), and manner of object (MannerObject$_{\text{same-vs.-different}}$: $\hat \beta$ = 0.03, t = 4.71, p < .001). There were also main effects of the control variables: left/right direction (Direction$_{\text{same-vs.-different}}$: $\hat \beta$ = 0.10, t = 7.40, p < .001), ground (Ground$_{\text{same-vs.-different}}$: $\hat \beta$ = 0.03, t = 2.77, p < .01), and object (Object$_{\text{same-vs.-different}}$: $\hat \beta$ = 0.05, t = 2.44, p < .05), indicating that participants also used these components in their similarity judgements. Crucially, we found two significant interactions of event components with encoding condition: path was used more under linguistic encoding (Experiment 1) than free encoding (Experiment 2) (Path$_{\text{same-vs.-different}}$ × Encoding$_{\text{linguistic-vs.-free}}$: $\hat \beta$ = 0.07, t = 2.97, p < .01), while the opposite was true for left/right direction (Direction$_{\text{same-vs.-different}}$ × Encoding$_{\text{linguistic-vs.-free}}$: $\hat \beta$ = −0.10, t = −3.42, p < .001). No other event component interacted with encoding condition. As for interactions of event components with language, the analysis again yielded a greater reliance by Swedish speakers on manner of object (MannerObject$_{\text{same-vs.-different}}$ × Language$_{\text{Swedish-vs.-Spanish}}$: $\hat \beta$ = 0.03, t = 3.02, p < .01), an effect we know was driven by Experiments 1 and 2. Additionally, however, we found a reliable difference with respect to ground, indicating that Spanish speakers paid more attention than Swedish speakers to the particular landscapes shown in the stimuli (Ground$_{\text{same-vs.-different}}$ × Language$_{\text{Swedish-vs.-Spanish}}$: $\hat \beta$ = −0.04, t = −2.71, p < .01).
Finally, there were no significant three-way interactions between any of the event components, language, and encoding condition, suggesting that encoding condition had the same effect across languages (see footnote 8).
The compound analysis showed that numerically the most predictive component of event similarity across experiments was path. It was closely followed by manner of cause and direction, with only slightly lower mean effects. Manner of object, ground, and object had an overall lower, but significant, effect on similarity judgements. However, event components were not equally important across conditions: the effect of path was stronger if events had been previously described, whereas the effect of left/right direction was smaller under this condition. This implies that participants were not simply adopting the same strategy in the linguistic and the free encoding conditions: participants in free encoding do not seem to have silently (i.e., subvocally) formulated full-blown linguistic descriptions of the events. The decrease in the importance of path under free encoding and the concomitant increase in the importance of direction suggest that verbal mediation was stronger in the linguistic condition, boosting the importance of a component that is typically verbalized (path) and suppressing the effect of a visually salient but linguistically irrelevant category (left/right direction). In the free condition, linguistically relevant and non-relevant categories seemed to be activated in tandem.
Last, the finding that Spanish speakers relied more on the ground (i.e., landscape) than Swedish speakers was not predicted on the basis of the linguistic descriptions in the norming study (ground information was systematically and equally often encoded in both languages; see Supplementary Materials). Slobin (1996) has shown that the rhetorical style in Spanish is particularly prone to encoding static locative descriptions (i.e., grounds), but adducing this as an explanation remains post hoc. Given that the difference only arose in the compound analysis, but not in the individual experiments, we remain cautious about this unexpected finding.
9. General discussion
This study has extended research on motion cognition to the domain of caused motion. We used a paradigm that let participants choose among several event components when judging the similarity of dynamic scenes, and we explored the role of language in those choices. Our first and narrow question was about cross-linguistic differences: Do the components used by Spanish and Swedish speakers mirror the lexicalization patterns of their languages? We found the answer to be a partial ‘yes’. Since manner information is expressed more often in Swedish than Spanish (see norming study), Whorfian accounts predict a greater focus by Swedish speakers on this component when mentally representing events. We found this to be the case for only one of the two manner manipulations in our stimuli: Swedish speakers tended to rely more than Spanish speakers on manner of object (whether the object rolled or slid), but both groups relied equally on manner of cause (whether the agent pushed or pulled the object). The difference in manner of object was found when participants judged event similarity after having described the events (Experiment 1: linguistic encoding) or after having watched them silently (Experiment 2: free encoding). However, the difference disappeared under verbal interference (Experiment 3), lending no support to the view that language changes our underlying perceptual machinery. This seems to correspond to thinking-with-language, the hallmark of which is precisely that “it can be eliminated by having people engage in a verbal interference task” (Wolff & Holmes, 2011, p. 256).
Why not simply call it thinking-for-speaking? Recall that in the free encoding condition (Experiment 2) participants were not asked to produce verbal descriptions at any point, whereas producing such descriptions is the typical scenario for thinking-for-speaking, as formulated by Slobin (2003). They did have the possibility of using language, and it is even quite likely that they did so, as verbal encoding could offer a way of keeping the events in memory during a rather long and complex task (see footnote 9). However, we know from the compound analysis that the linguistic and free encoding conditions differed in ways that suggest more verbal mediation in the former than in the latter. This, coupled with the weak correlations between descriptions and similarity judgements in Experiment 1, suggests that the effect is one of thinking-with-language, which in turn is very similar to what has been referred to as language-as-strategy (Gennari et al., 2002). Overall, then, our results from the domain of caused motion replicate previous cross-linguistic differences found in voluntary motion (Gennari et al., 2002; Kersten et al., 2010; Lai et al., 2014; Papafragou & Selimis, 2010). We conclude that, independent of the degree of complexity of the events (which is greater for caused motion than for voluntary motion), language-specific labels may modulate event conceptualization when made salient in a given task.
A question that remains is the lack of cross-linguistic differences for manner of cause in the arrangements, even though this component was encoded significantly more often in Swedish than in Spanish event descriptions. This result poses a challenge to any simple Whorfian account and calls for a more complex model of the correspondence between linguistic and conceptual patterns. One possibility is that manner of cause has higher cognitive salience than manner of object, which would override any language-induced difference. This may be because pushing or pulling an object pertains to the agent and is thus linked to a highly prominent role in the event (i.e., the protagonist). Alternatively, pushing and pulling can be reduced to the relative position of the object with respect to the agent (in front or behind), a property that was arguably more visually salient in our stimuli than the way the object moved. Analyzing linguistic data, Malt et al. (2014) recently provided evidence that Spanish speakers make less fine-grained manner distinctions than speakers of S-languages, but that speakers of both language types converge for more general manner categories (see also Slobin, Ibarretxe-Antuñano, Kopecka, & Majid, 2014, for a similar argument). Pushing or pulling objects might correspond to one of these rather general manner categories. In any case, why the conceptual representation of some but not other manner categories reflects cross-linguistic differences remains an open question for future research.
From a broader perspective, our findings also contribute new evidence to the more general question of how people weigh different components when judging event similarity. Uncontroversially, we found that cross-linguistic differences in event representation were played out against a backdrop of commonalities. First, the results confirmed our expectation that speakers of S- and V-languages should not differ in their focus on path because this component is routinely encoded in both language types (see also Kersten et al., 2010). More interestingly, the compound analysis found that the effect of path, a category that is often held to be basic and universal (Talmy, 2000a), diminished when participants did not describe the events (Experiment 2 vs. 1). This change in the importance of path across encoding conditions held for both languages and was accompanied by an increase in reliance on left/right direction, a component that is hardly ever expressed in motion descriptions. Together this suggests that speakers changed how they represented the events depending on the degree of verbal mediation.
Recall that, at the other extreme of the Whorfian hypothesis, there is the idea that spatial cognition is guided by universal conceptual categories such that motion cognition is independent of language (Papafragou, 2008). In light of the results of the compound analysis, we submit that a theoretical framing that sets as its primary goal the identification of language-independent ‘core’ conceptual categories might miss out on certain crucial insights as to how humans conceptualize events. Concretely, should a category such as path – which differentiates between motion upwards, inwards, etc. – be part of this core conceptual repertoire? At the very least, the compound analysis suggests that certain aspects of how we construe path are susceptible to linguistic mediation, which resonates with the view that conceptual categories are formed dynamically as a function of the task (Barsalou, 1983; Casasanto & Lupyan, 2015), rather than being fixed categories. Recent theoretical accounts consider strict divisions between verbal and non-verbal cognition moot, arguing instead for a tight link between high-level cognitive representations and low-level conceptual categories (Lupyan, 2012; Lupyan & Clark, 2015). These accounts see cognitive processing as essentially flexible and task-dependent. Under this view, instead of dealing with language effects as superficial ‘intrusions’ into pure cognition (cf. Gleitman & Papafragou, 2012), we might conceive of language as one of the main ingredients that allows us to flexibly form the type of conceptual representations needed to understand events. In that process, it seems that the mental categories we form will at least partially be coloured by the particular language we happen to speak.
Supplementary Materials
For supplementary material for this paper, please visit http://dx.doi.org/10.1017/langcog.2016.22.