1. Introduction
Learning in multimedia environments where verbal and visual information is provided concurrently has been recognized as effective for receptive and productive second language (L2) vocabulary acquisition, particularly in English as a foreign language (EFL) (Lomicka, 1998; Sato & Suzuki, 2010; Yoshii, 2006). This is supported by the dual coding theory (Paivio, 1971), which claims that knowledge representation in verbal and visual modes may facilitate processing and therefore aid understanding and retention of knowledge more effectively than representations depending on a single mode. This theory is resonant with the generative theory of multimedia learning (Mayer, 1997) as such multimodal knowledge representation is feasible with the use of computers.
However, previous studies that support L2 vocabulary acquisition in a multimedia environment have discussed why visual aids facilitate L2 vocabulary learning from the perspective of multimedia learning theory rather than linguistic theory. As a result, they failed to mention different learning effects of different vocabulary items (Sato, 2016a). The present study draws on cognitive linguistics (CL) as a rationale for the use of visual aids to facilitate L2 learning and has developed visual aids based on a CL framework (Boers & Lindstromberg, 2008; Boers, Warren, Grimshaw & Siyanova-Chanturia, 2017; Gao, 2011). According to dual coding theory, which underpins CL-based pedagogies, the combination of verbal and visual input serves as a trigger to create mental imagery and improve understanding of word meanings. Several L2 vocabulary acquisition studies have found evidence to support this hypothesis (Boers, Píriz, Stengers & Eyckmans, 2009; Lam, 2009; Wong, Zhao & MacWhinney, 2018). The present study seeks to contribute to this field of research by investigating the efficacy of technology-enhanced visual aids in an experimental study with Japanese and Chinese learners of L2 English. The study will be presented in more detail in the following sections.
2. Theoretical framework
2.1 The literature on CL
CL theory relies on a usage-based model claiming that it is through metaphor that our bodily experiences with actual language use and interaction extend to abstract concepts (Langacker, 1987). This conceptual relatedness (Lakoff, 1987) can be observed in polysemic verbs (e.g. take) and prepositions (e.g. over), each of which has several meanings ranging from prototypical meanings (related to bodily experiences) to figurative ones (Lakoff, 1987; Langacker, 1987; Tyler & Evans, 2003). The patterns that emerge as a result of abstraction from our bodily experiences are image schemata (Johnson, 1987; Lakoff, 1987). As shown in Figure 1, an image schema is often represented as a form of a visual image (Dewell, 1994; Lakoff, 1987; Tyler & Evans, 2003). CL claims that the image schema systematically connects all the meanings of a word, from literal to figurative ones (Langacker, 1987), because the image schema may function as a device for a metaphorical extension, which maps our bodily experiences to metaphorical ideas that underlie our ways of thinking (Gibbs, 2005).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220413144959049-0328:S0958344021000288:S0958344021000288_fig1.png?pub-status=live)
Figure 1. Example of a cognitive-linguistics-based image schema of the preposition over (Dewell, 1994)
2.2 CL and L2 acquisition studies
The CL concept of image schema has been applied to L2 vocabulary acquisition, especially when the focus of L2 instruction is on meaning (Boers, 2013) as in understanding idioms, phrasal verbs, or colloquial expressions. Tasks that facilitate conceptual relatedness by relating concrete meanings to abstract meanings are thought to facilitate cognitive engagement with the target knowledge, leading to deeper processing and therefore better retention and retrieval. This process is known as semantic elaboration (Boers & Lindstromberg, 2008; Verspoor & Lowie, 2003). Many studies have reported positive effects for this type of CL-based instruction and materials on L2 vocabulary learning (Boers, 2000; Boers & Lindstromberg, 2008; Chen, 2009; Littlemore, 2009; Lu & Sun, 2017; Tyler & Evans, 2003; Verspoor & Lowie, 2003; Yasuda, 2010). Verspoor and Lowie (2003) show that providing the core sense or the most prototypical meaning of target English words in a newspaper article results in longer retention of the figurative sense as well. In a study by Yasuda (2010), Japanese learners who received CL-based instruction made conscious metaphorical extensions of the target English phrasal verbs, which contained the particles up, down, into, out, and off, and showed better retention than those who received non-CL-based instruction.
2.3 Applying the CL framework to dynamic L2 learning materials
The present study further hypothesizes that CL-based L2 learning materials presented in a multimedia environment lead to better retention of target vocabulary than either verbal or visual information alone. This effect is due not only to the simultaneous representation of the verbal and visual modes underpinned by the dual coding theory but also to the representation of dynamic images defined as “simulated motion picture[s] depicting movement of drawn (or simulated) objects” (Mayer & Moreno, 2002: 88). Roche and Scheller (2008) postulate that the visual aids used to support mental imagery may be usefully presented in dynamic form, because mental images in language processing inherently involve motion. Brett (1998) also demonstrated that the advantage of dynamic aids over verbal aids lies in the combination of visualization, sequence, motion, and trajectory. Dynamic aids thus help learners to develop mental representations of motions and processes (Höffler & Leutner, 2007), and instructions with dynamic aids have been found to be more effective in multimedia environments than static images (Craig, Gholson & Driscoll, 2002; Lin & Dwyer, 2010). Several L2 studies have supported this claim with respect to grammar (Roche & Scheller, 2008), reading (Huang & Chuang, 2016), and vocabulary acquisition (Aldera & Mohsen, 2013; Al-Seghayer, 2001). Rusli, Ardhana, Degeng and Kamdi (2014) explained this advantage by suggesting that dynamic images facilitate semantic elaboration more than static images do.
The present study hypothesizes that the advantage of dynamic aids, which can depict motions and processes, may be harnessed for the representation of image schemata as a visual aid for L2 learning, leading to better retention of target vocabulary. Littlemore (2009) claims that three-dimensional diagrams might be useful in CL-based L2 vocabulary acquisition if the image schema is displayed dynamically. For example, in the case of the preposition over illustrated by the trajectory image in Figure 1, dynamic-image technology can easily display motion. Roche and Scheller (2008) also concluded that the dynamic images developed along cognitive principles bring positive learning effects.
CL-based dynamic images can describe the relation between prototypical senses and schematic images. Langacker (1987) claims that the image schema derived from the schematization of a prototypical sentence could help connect all the meanings of the word, resulting in an organized semantic network. In theory, the dynamic image describing a transition from a prototypical sense to an image schema could help L2 learners understand that all the meanings of the word are systematically related via the image schema, be they literal or figurative (see Figure 4).
Because few studies have examined the advantages or the effectiveness of animation in L2 vocabulary learning (Mohsen & Balakumar, 2011), we previously conducted studies using dynamic image schemas (see Figure 2) for L2 prepositions and tested their effectiveness (Lai, Sato & Burden, 2021; Sato, 2016a, 2016b; Sato & Suzuki, 2010). All of these studies were based on the hypothesis that dynamic image schemas may provide more effective visual aids than verbal aids or static image schemas. Comparative analyses of Japanese undergraduates who worked with static or dynamic visual aids showed that both groups improved in choosing the appropriate target words. However, the animated image groups did not significantly outperform the static image groups.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220413144959049-0328:S0958344021000288:S0958344021000288_fig2.png?pub-status=live)
Figure 2. Dynamic image schema of over as a visual aid for L2 learning (Sato & Suzuki, 2010)
2.4 Individual factors in CL-based multimedia L2 learning
Considering that no significant differences were observed between static and dynamic image schemas despite the theoretical support for the effectiveness of CL-based dynamic image schemas in L2 vocabulary learning, the present study hypothesizes that the effectiveness of dynamic aids varies according to individual differences. Despite the fact that the impact of individual factors has been emphasized in the field of second language acquisition (Ehrman, Leaver & Oxford, 2003), CL-based L2 learning (Boers & Lindstromberg, 2008), and multimedia learning (Alwi & McKay, 2015; Yang, 2016), few computer-assisted language learning (CALL) studies have addressed such factors, especially in terms of the visual representation of knowledge. Based on the individual difference principle (Mayer, 1997; Mayer & Moreno, 2003), the present study hypothesizes that two individual learner factors will affect L2 language processing when multimodal treatments are employed. These factors are addressed in this section of the paper.
2.4.1 Learners’ information-processing styles
One individual factor foregrounded in our study is information-processing style. Learners who are better at conceptualizing knowledge with the help of visual information are called imagers (or high imagers), and those who are better at analyzing knowledge through verbal information are called verbalizers (or low imagers) (Boers & Littlemore, 2000; Riding & Rayner, 1998). Imagers tend to process information using visual representations and will learn easily through visual modes, whereas verbalizers tend to process information using words and will learn better if the information is presented only with a verbal mode (Ghinea & Chen, 2008). To identify learners’ information-processing styles, the Style of Processing (SOP) questionnaire (Childers, Houston & Heckler, 1985) is used; the questionnaire consists of 22 statements to be responded to on a 4-point scale to determine to what degree the respondents process information with visual images and with words respectively. Respondents are defined as verbalizers or imagers, as in the previous studies (Boers et al., 2009; Lee, 2017; Littlemore, 2004) related to L2 vocabulary learning.
The information-processing or cognitive styles in question have been addressed by several CL-based L2 studies (Boers, Eyckmans & Stengers, 2006; Littlemore, 2004). For example, Littlemore (2004) examined the effect of metaphoric extension strategies for L2 vocabulary learning among upper-intermediate L2 graduate students and found that imagers tended to use these strategies more successfully than verbalizers. Boers, Lindstromberg, Littlemore, Stengers and Eyckmans (2008) also verified the efficacy of CL-based pictorial aids with similar participants and found that the imagers outperformed the verbalizers in a productive translation task. Boers et al. (2006) also observed that imagers obtained better scores in multiple-choice tests for English idioms. This finding supports the main tenet of multimedia learning theory: enhanced ability to establish a mental model of a scene is associated with higher learning gains (Höffler & Leutner, 2011).
In spite of such findings regarding the importance of information-processing styles in CL-based L2 learning, this dimension has not been much discussed from a CALL perspective, particularly as regards the relative merits of simple/static versus dynamic images. Static images illustrate the spatial relationship between objects in a simple manner, whereas dynamic images display the concepts in various ways, such as with colored objects, pictures, and conceptual and dynamic motions. Therefore, our study hypothesizes that imagers, who process information preferentially via the visual channel, may obtain better L2 vocabulary learning effects from dynamic images than verbalizers.
2.4.2 Learners’ first language
The other possible individual factor affecting the effectiveness of dynamic aids is learners’ first language (L1). As Wolter (2006) points out, L1 lexical knowledge affects the L2 lexical network. On the one hand, L1 knowledge can help to construct meaning, as a similar lexical structure between L1 and L2 may facilitate more rapid development of learners’ L2 lexical knowledge. For example, Boers (2000) demonstrates the significant impact of L1-L2 similarity on retention of the target L2 vocabulary. On the other hand, L1 interference may also hinder learners in their understanding of the meanings of L2 vocabulary items (Ellis, 2006). In the case of Japanese L2 learners, who constitute one of the participant groups in our study, L1 interferes with L2 vocabulary development in learning L2 polysemous words (Tanaka, 1990). When L1 translations are added to each sense of a word, misunderstanding of the correlations among the senses may arise, causing misuse. To illustrate this, consider the spatial prepositions over/above/on in English, no ue ni/no ue wo in Japanese, and zai… shang/guo in Chinese, using examples from Tyler and Evans (2003). The examples show (a) the English sentence, (b) a literal Japanese translation, (c) a grammatically correct Japanese translation, (d) a literal Chinese translation, and (e) a grammatically correct Chinese translation. The asterisks in the following refer to ungrammatical sentences and the sentences in parentheses are the English translations according to the word order of Japanese and Chinese.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220413144959049-0328:S0958344021000288:S0958344021000288_tabu1.png?pub-status=live)
In English, the prototypical sense of above is used when one object not only is at a higher location than another but also has no contact with it and shows no movement; the prototypical sense of on entails contact between two objects; that of over implies an object moving across the space above the other objects. In Japanese, no ue ni(wo) refers to the spatial situation of something located in a higher place, without the English distinctions apparent in above, on, and over. Such semantic overlapping in the L2 mental lexicon tends to prevent L1 Japanese learners of L2 English from appropriately using above or on instead of over in context. Chinese has two expressions equivalent to above, on, and over: zai… shang and guo. In the first expression, zai is a locational preposition, while shang refers to the upper surface of an object, which together form a meaning similar to the primary senses of English prepositions on and above, as shown in sentences (2) and (3). However, sentence (1), which uses the word guo, involves the crossing sense, which only over has in English.
We hypothesize that Japanese L2 learners might experience more difficulty learning the target prepositions than Chinese L2 learners because the Chinese language makes a two-way distinction in the spatial frame covered by the three English prepositions, whereas Japanese makes no distinction at all. From the perspective of multimedia learning theory, learners’ prior knowledge is a critical factor in successful learning: multimodal representation will be more beneficial for those who have low prior knowledge, as those with high prior knowledge can generate their mental images without visual aids (Mayer, 1997; Mayer & Anderson, 1992). In our study, the relevant prior knowledge concerns the semantic distinctions present in the L1, which we have seen may or may not correspond to the appropriate L2 patterns. Considering these conflicting arguments, the objective of the present study is to test empirically whether the efficacy of dynamic aids for learning L2 spatial prepositions differs according to not only information-processing styles but also L1.
3. Research questions
Taking into account existing empirical studies of multimedia L2 vocabulary learning, based on the CL framework and individual differences, we hypothesize that effective use of dynamic visual aids when learning L2 prepositions will be affected by both information-processing styles and L1. To test this hypothesis, the current study used the same research protocol with L2 English learners in Japan and Taiwan, anticipating that the research findings in each context would differ, despite the use of the same visual aids. Our study seeks to answer the following research questions (RQ):
1. When L2 learners use dynamic aids to learn target prepositions, are learning gains greater than when static aids are provided?
2. When L2 learners use dynamic aids to learn target prepositions, does information-processing style (verbalizer versus imager) affect their learning gains?
3. When L2 learners use dynamic aids to learn target prepositions, does L1 (Chinese or Japanese) affect learning gains?
4. Method
4.1 Participants
In total, 109 students participated in the present study. All were students of the authors, and all the tasks were conducted in our English language classes. The L1 Japanese participants were 58 undergraduates from three different universities in Japan; the Chinese L1 speakers were 51 undergraduate students from a private university in Taiwan. All reported intermediate L2 English proficiency (TOEFL (PBT) 457–527; (CBT) 137–197; TOEIC 550–750; G-TELP (Level 2): Near Mastery). Participants in each country were randomly assigned to one of two intervention groups. Each group used a different type of visual aid to learn the target words. As previous studies have already demonstrated the advantage of visual aids over exclusively verbal stimuli (Boers et al., 2008; Lam, 2009; Sato, 2016b), no control group without visual aids was included, and the study focuses on the relative effectiveness of two types of visual aids: static or dynamic.
The pretest showed no significant difference, t(107) = –1.93, p > .05, between the static (n = 59, M = 19.42, SD = 3.28) and the dynamic groups (n = 50, M = 20.70, SD = 3.63), indicating similar English proficiency and knowledge of the target prepositions before the treatment.
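The pretest comparison above can be checked from the reported summary statistics alone. The following is a minimal sketch, assuming an independent-samples Student's t-test with pooled variance (the text does not state which variant was used); the function name `pooled_t` is hypothetical.

```python
import math

def pooled_t(n1, m1, sd1, n2, m2, sd2):
    """Student's t for two independent groups, from summary statistics.

    Assumes equal variances (pooled estimate); this is an assumption,
    not something the text specifies.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# Pretest values reported in the text: static group, then dynamic group.
t, df = pooled_t(59, 19.42, 3.28, 50, 20.70, 3.63)
print(f"t({df}) = {t:.2f}")  # t(107) = -1.93
```

Under that assumption, the sketch reproduces the reported degrees of freedom and t value.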
4.2 Treatment materials
The present study targeted three English spatial prepositions – above, on, and over – all of which refer to the situation in which one object is at a higher position than another object but reflect different positional relationships to that object. To learn these prepositions, three image-schema-based visual aids were presented to each group. For the static group, each visual aid was shown in the form of static pictures, which were visual glosses from an English-Japanese dictionary (Tanaka, Takeda & Kawade, 2003). Figure 3 shows the visual aid for on. This figure illustrates that one object is in direct contact with another object, but their spatial relation is not always vertical, which differentiates over and on.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220413144959049-0328:S0958344021000288:S0958344021000288_fig3.png?pub-status=live)
Figure 3. Static visual aid based on the image schema of the preposition on for L2 learning (Tanaka, Takeda & Kawade, 2003)
For the dynamic group, meanwhile, dynamic visual aids based on the image schemas were shown. Figure 4 shows a consecutive dynamic image of on, which is displayed for one minute. The image-schema-based dynamic images (originally developed as commercial materials on a web-based L2 learning system, now freely available on YouTube) start with a photograph depicting one of the prototypical situations involving prepositions with audio input: “The boy is putting ketchup on French fries.” The photo is then blacked out and some parts of it are redisplayed consecutively while the sentence is repeated twice. The final image of the sequence, shown in Figure 4, is displayed when “putting on” is announced. Blue dots and lines superimposed on the photograph are intended to help learners understand the positional relationship between the two objects in the image schema. The ketchup and French fries are then displayed once more. Dynamic aids of the same kind were used for the other prepositions (“The plane is flying above the clouds” and “The balloon is flying over the mountain”).
In addition to the visual aids, both groups used the same paper-based learning materials to consolidate their learning of the target prepositions. The materials consisted of indices of the literal and figurative preposition meanings and example sentences taken from the dictionary (Tanaka et al., 2003). Along with the example sentences, L1 translations (Japanese or Chinese) were added. For example, two indices were shown for on: (1) physically in contact with something; and (2) metaphorically in contact with something, followed by several example sentences such as (1) Pull the knob on the door; (2) We live on rice.
4.3 Test materials
This study involved one survey and three tests. The survey was the SOP questionnaire (Childers et al., 1985), used to divide the participants into verbalizers (n = 53) and imagers (n = 56) for the data analysis. The test was a cloze test, which was conducted as a pretest, posttest, and delayed posttest to examine comprehension of sentences with the target prepositions. The test consisted of 40 items (see Appendix 1 in the supplementary material), all of which involved selecting the correct target prepositions, developed according to the index of the meanings of each preposition (12 meanings for above questions, 14 for on questions, and 14 for over questions) listed in an English-Japanese dictionary (Tanaka et al., 2003), which includes the CL-based indexes with image-schema-based visual aids. Following Wong et al.’s (2018) cloze test for L2 English prepositions, the test covered literal and figurative meanings (e.g. “a full moon above the sea level” versus “a lecture above my comprehension”). The target items were developed by the third author, an L1 speaker of English, and 84.9% of all words used (523/616) belonged to the 2000 frequency band (Nation, 2001).
4.4 Procedures
Figure 5 shows the research procedures that were carried out in exactly the same manner in Japan and Taiwan during regular class meetings, with visual aids displayed to the appropriate class using a video projector and large screen. The protocol began with the processing styles questionnaire to identify verbalizers and imagers. Following the 40-item pretest, the schematic images for the target prepositions above, on, and over were displayed: one class viewed the static visual aids (Figure 3), whereas the other saw the dynamic aids (Figure 4). In both conditions, images were displayed for one minute and the audio was heard twice. After the visual aid input, all participants in both classes were given the paper-based learning materials described above. The participants were then asked to form a mental link by connecting the visual aid they had just seen with the literal and figurative meanings of the preposition given in the materials. When they had finished studying, they viewed the visual aid for another preposition. The time taken to learn using the materials depended on the number of meanings for each target preposition: three minutes for above and six minutes each for on and over.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220413144959049-0328:S0958344021000288:S0958344021000288_fig4.png?pub-status=live)
Figure 4. Dynamic visual aid based on the image schema of on for L2 learning (Sato, 2016a)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220413144959049-0328:S0958344021000288:S0958344021000288_fig5.png?pub-status=live)
Figure 5. Research procedures
Immediately following this learning phase, the posttest was administered using exactly the same questions, randomly reordered, and without feedback on participants’ responses. A delayed posttest took place two weeks later under the same conditions. The participants were given 15 minutes without dictionaries to complete each 40-item test. This time limit was judged to allow sufficient time such that time pressure would not be a confounding influence on their scores. No feedback was provided until the end of the intervention.
4.5 Analysis and scoring
Data were collected from the participants’ processing style questionnaires, pretest, posttest, and delayed posttests. The scores of the SOP questionnaire (ranging from 22 to 88) were used to classify each participant as either a verbalizer or an imager, based on whether his or her SOP score was respectively higher or lower than the average score of the class they belonged to. Because the SOP questionnaire has been widely utilized in L2 learning research (e.g. Boers et al., 2009) to gauge the efficacy of the integration of verbal and visual aids, it is used in the present study without adaptation.
The tests were scored according to the number of correct answers, with a maximum total of 40 points; several comparative analyses between the groups were then conducted. To answer RQ1, the learning gains over the two intervals (pretest to posttest; posttest to delayed posttest) were analyzed using t-tests. Then, to answer RQ2 and RQ3, two-way factorial ANOVAs (visual aid x information-processing style; visual aid x L1) were conducted for the same two learning intervals (pretest to posttest; posttest to delayed posttest) to verify the effect of individual factors between static and dynamic image groups. All the following findings were analyzed at a significance level of .05 (two-tailed). To check the assumptions of ANOVA for this study, we visually inspected graphs, including histograms and dot plots; only small deviations from normality and no extreme departures from equal variances were observed.
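The classification rule described above can be sketched as follows. This is an illustration only: the learner IDs and scores are hypothetical, the function name `classify_by_class_mean` is ours, and the handling of a score exactly equal to the class mean (here treated as an imager) is an assumption, since the text does not specify it.

```python
from statistics import mean

def classify_by_class_mean(sop_scores):
    """Label each learner relative to the mean SOP score of their class.

    Rule taken from the text: a score higher than the class mean gives
    'verbalizer', a lower score gives 'imager'. A score equal to the
    mean is labeled 'imager' here, which is an assumption.
    """
    avg = mean(sop_scores.values())
    labels = {}
    for learner, score in sop_scores.items():
        if not 22 <= score <= 88:  # 22 items on a 4-point scale
            raise ValueError(f"SOP score out of range for {learner}: {score}")
        labels[learner] = "verbalizer" if score > avg else "imager"
    return labels

# Hypothetical class of four learners (class mean = 51).
example_class = {"s01": 40, "s02": 61, "s03": 55, "s04": 48}
print(classify_by_class_mean(example_class))
```

Because the cut-off is the mean of each class rather than a fixed score, the same raw score could be labeled differently in different classes.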
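The two learning intervals used as dependent variables can be derived from the three test scores as in the sketch below, which also groups the gains into the visual aid x L1 cells that a two-way ANOVA would compare. The records, field names, and function name are hypothetical and do not reproduce the study's data.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical participant records (scores out of 40); not the study's data.
records = [
    {"aid": "static", "l1": "Japanese", "pre": 20, "post": 25, "delayed": 27},
    {"aid": "static", "l1": "Japanese", "pre": 22, "post": 26, "delayed": 29},
    {"aid": "static", "l1": "Chinese", "pre": 19, "post": 22, "delayed": 23},
    {"aid": "dynamic", "l1": "Japanese", "pre": 21, "post": 24, "delayed": 24},
    {"aid": "dynamic", "l1": "Chinese", "pre": 18, "post": 23, "delayed": 25},
]

def cell_mean_gains(rows):
    """Mean gain per (visual aid, L1) cell for both learning intervals."""
    cells = defaultdict(lambda: {"immediate": [], "retention": []})
    for r in rows:
        key = (r["aid"], r["l1"])
        cells[key]["immediate"].append(r["post"] - r["pre"])      # pretest to posttest
        cells[key]["retention"].append(r["delayed"] - r["post"])  # posttest to delayed posttest
    return {key: {k: mean(v) for k, v in gains.items()} for key, gains in cells.items()}

for cell, gains in cell_mean_gains(records).items():
    print(cell, gains)
```

The same grouping with information-processing style in place of L1 would give the cells for the second ANOVA design.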
5. Results
5.1 Tests and written tasks without individual factors
Table 1 shows the results of the three tests without considering processing style. Each group in Table 1 consists of both Japanese and Taiwanese participants, and imagers as well as verbalizers. A t-test between groups showed no significant difference, t(107) = 0.15, p > .05, in learning gains from pretest to posttest between the static image (M = 4.12, SD = 3.68) and dynamic image groups (M = 4.02, SD = 3.31). A second t-test also showed no significant difference, t(107) = 1.20, p > .05, in learning gains from the posttest to delayed posttest between the static image (M = 1.71, SD = 3.85) and dynamic image (M = 0.92, SD = 2.88) groups.
Table 1. Average scores of three (pretest, posttest, and delayed posttest) tests (N = 109)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220413144959049-0328:S0958344021000288:S0958344021000288_tab1.png?pub-status=live)
These results indicate comparable learning by both groups following the treatment with visual aids, whether static or dynamic: no significant difference in accurate use of the target prepositions across image conditions was revealed for the whole group. The remainder of our study therefore addresses individual factors: processing style and L1.
5.2 Tests with individual factors
Table 2 shows the average scores of the 109 participants organized according to information-processing style (verbalizers or imagers). As noted, there were 53 verbalizers and 56 imagers (RQ2) and 51 Taiwanese and 58 Japanese (RQ3).
Table 2. Average scores on three tests by visual aid and information-processing style
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220413144959049-0328:S0958344021000288:S0958344021000288_tab2.png?pub-status=live)
5.2.1 Effects of information-processing styles
Two-way factorial ANOVAs were conducted to verify the influence of the two between-subject factors (type of visual aid and information-processing style) on learning gains from pretest to posttest. Although no interaction effect was found between the factors, F(1,105) = 0.61, p > .05, a main effect was observed for information-processing style, F(1,105) = 13.92, p < .05, ηp² = 0.12. As the mean gain of the imagers (M = 5.23) was higher than that of the verbalizers (M = 2.85), it seems that imagers were better able to benefit from both static and dynamic visual aids, irrespective of L1. The same analysis was conducted on learning gains from posttest to delayed posttest and revealed no interaction effect, F(1,105) = 0.02, p > .05, and no main effect either for visual aid, F(1,105) = 1.47, p > .05, or for information-processing style, F(1,105) = 2.17, p > .05.
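For a between-subjects effect, partial eta squared can be recovered from the F ratio and its degrees of freedom, which offers a quick consistency check on reported effect sizes. The sketch below assumes the effect size is partial eta squared computed against the same error term as the F test; the function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared from F and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error),
    which holds when F and eta_p^2 share the same error term.
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Main effect of information-processing style, F(1,105) = 13.92.
print(round(partial_eta_squared(13.92, 1, 105), 3))  # 0.117
```

The same function can be applied to any of the F(1,105) values reported in this section.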
5.2.2 Effect of L1s
A further two-way ANOVA was applied to investigate the findings shown in Table 3. We analyzed the effect of the two between-subject factors (visual aid and L1) on the learning gains over the same two intervals.
Table 3. Average scores on three tests by visual aid and L1
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220413144959049-0328:S0958344021000288:S0958344021000288_tab3.png?pub-status=live)
The analysis of immediate learning gains from pretest to posttest revealed an interaction effect, F(1,105) = 4.54, p < .05, ηp² = 0.41. The follow-up analysis of simple main effects revealed contrasting tendencies: among learners shown static images, the Japanese speakers (M = 4.76) learned more than their Taiwanese counterparts (M = 3.50), whereas in the dynamic image group, the Taiwanese speakers (M = 4.95) outperformed the Japanese speakers (M = 3.34) (see Appendix 2 in the supplementary material).
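To make the crossover pattern concrete, the following Python sketch recomputes the simple differences and the interaction contrast from the four cell means reported above. The variable and function names are illustrative assumptions. The sketch works only from the published means and does not reproduce the ANOVA itself, which requires the raw scores.

```python
# Mean pretest-to-posttest gains reported in the text, keyed by (L1, visual aid).
gains = {
    ("Japanese", "static"): 4.76,
    ("Japanese", "dynamic"): 3.34,
    ("Taiwanese", "static"): 3.50,
    ("Taiwanese", "dynamic"): 4.95,
}


def static_minus_dynamic(l1):
    """Difference between static and dynamic aids within one L1 group."""
    return round(gains[(l1, "static")] - gains[(l1, "dynamic")], 2)


japanese_diff = static_minus_dynamic("Japanese")    # 1.42
taiwanese_diff = static_minus_dynamic("Taiwanese")  # -1.45

# The interaction contrast is the difference between the two differences.
interaction_contrast = round(japanese_diff - taiwanese_diff, 2)  # 2.87
```

The two within-group differences have opposite signs, which is what characterizes a crossover interaction: static aids were associated with larger immediate gains for the Japanese speakers and dynamic aids with larger immediate gains for the Taiwanese speakers.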
Another two-way factorial ANOVA was conducted on the retention (delayed) effect, that is, on gains from posttest to delayed posttest, and revealed an interaction effect between the two between-subject factors, F(1,105) = 6.66, p < .05, ηp² = 0.60. The follow-up analysis demonstrated the opposite tendency to the previous analysis, as shown in Appendix 3 (see supplementary material). Among the Japanese participants, the static image group (M = 0.17) did not learn as effectively as those who saw dynamic images (M = 2.52); among the Chinese speakers, those shown dynamic images (M = 0.93) did less well than those who were given the static version (M = 1.95). However, no main effect was observed either for type of visual aid, F(1,105) = 1.03, p > .05, or for L1, F(1,105) = 0.02, p > .05.
5.3 Summary of findings
We are now in a position to answer the three research questions posed in our study. Our first hypothesis – the learning of L2 prepositions is facilitated more by dynamic visual aids than by static images – is not supported. This result confirms the findings of our previous research, none of which found significant differences between static and dynamic aids for undifferentiated groups of learners (Sato, 2016a, 2016b; Sato, Lai & Burden, 2014; Sato & Suzuki, 2010). Regarding our second hypothesis – learners’ processing styles differentially affect their response to dynamic versus static visual aids – clear findings were not obtained because no interaction effect was found. However, an important result concerns an immediate learning effect for information-processing style: the superior performance by imagers compared to verbalizers indicates that information-processing style affected learning gains, whether the aids were static or dynamic, and also confirms the findings of previous L2 vocabulary studies (Boers et al., 2006; Lam, 2009). The present study demonstrates that the advantage of imagers is preserved in technology-enhanced contexts.
The third hypothesis predicted an advantage for Chinese over Japanese L1 learners for the particular preposition contrasts selected for the study. Here, the interaction between type of visual aid and L1 in terms of learning gains showed delayed gains for Japanese speakers, in that they retained more learning from dynamic than from static aids two weeks after the intervention. For the Chinese speakers, the effect was immediate and was obtained with dynamic aids. Since L2 English proficiency was held constant, these results suggest an effect of L1-L2 proximity, which supports our CL-inspired prediction.
6. Conclusion and discussion
Our study explored the effectiveness of CL-based dynamic visual aids in the acquisition of L2 prepositions and the impact of individual factors for the purpose of optimizing these visual aids. As few studies have been conducted to validate CL-based computer-assisted learning for L2 learners with different L1 backgrounds, our research design included two new variables: information-processing styles and L1-L2 linguistic proximity. To examine the influence of these factors, we conducted research with Japanese and Taiwanese learners of English L2 using the same visual aids and learning materials.
The findings of this research reveal the influence of learners’ L1 on the effectiveness of different types of visual aids. An immediate advantage for dynamic visual aids was detected among L2 English learners with L1 Chinese, where the semantic structures for the target prepositions are closer to English than in the case of Japanese. The L1 Japanese participants showed delayed learning gains from the use of dynamic aids. These results support one of our hypotheses concerning the influence of learners’ L1 in multimedia L2 vocabulary learning, and we suggest that the Chinese-speaking learners could readily integrate the dynamic images into their L2 English knowledge thanks to their L1-related prior knowledge and concepts of locative prepositions. Furthermore, the findings indicate that Japanese learners might require more time to benefit from dynamic aids because the situations encoded in their L1 are different from those in L2 English. This claim is also supported by Sato (2016a), which demonstrated a delayed effect for L1 Japanese learners using the same dynamic aids in an English writing task.
The findings of the present study suggest that dynamic images accelerate immediate L2 processing if the target spatial concepts are similar to those of learners’ L1. This processing could also facilitate deeper L2 processing (cf. Swain, 2000), which might produce a durable effect in memory triggered by the dynamic images. In this respect, our research could shed light on the impact of L1 as an individual difference, which few previous CALL studies have addressed or confirmed. Additionally, we showed that imagers were better able to benefit from CL-based visual aids, whether they were static or dynamic. This confirmed the advantage to imagers in this kind of CL-based L2 vocabulary learning (Boers & Lindstromberg, 2008) as well as its applicability to multimedia settings, another new result.
These findings also suggest pedagogical recommendations. One is the benefit of dynamic images in L2 classrooms. Although previous studies have expressed skepticism about their benefits (Höffler & Leutner, 2011; Lowe, 2003), our study examined the topic empirically. As it has become easier for L2 instructors to obtain dynamic images, from YouTube for example, showing them in class may facilitate certain learners’ comprehension of target L2 vocabulary. Having students produce their own dynamic images using mobile applications may also be an effective learning activity (Wong & Looi, 2010). It is, however, quite possible that dynamic aids will not always ensure greater L2 learning than static images (Boers et al., 2009). The advantage of dynamic visual input may be related to various factors such as developmental principles (Roche & Scheller, 2008) or the provision of additional verbal information (Niknejad & Rahbar, 2015). Students who fit the verbalizer profile, on the other hand, tend to learn L2 vocabulary better with a vocabulary list than with images (Sato & Burden, 2020).
Taken together, these studies suggest that successful L2 learning requires a careful selection of relevant verbal and visual aids, which should be integrated for meaningful learning to enhance learners’ language processing (Mayer & Moreno, 2002). The present study also points to a need for more attention to learners’ L1 backgrounds. In classes where learners have a range of different L1s, some may understand the target knowledge rather easily on the basis of visual aids, whereas others may not because of greater L1/L2 distance on the target feature. Additional materials or explanations may be necessary in such contexts.
However, our study has some limitations. Participants’ responses to the SOP questionnaire may not be a strong enough indication of their status as imagers or verbalizers, as our use of the group average as the cut-off for assigning them to one group or the other can only suggest relative visual or verbal orientations. Another possible limitation was the lack of a pilot. Piloting the materials we used before treatment might have given us some insights into how to perform the treatment more efficiently and what items on the questionnaires, for example, could have been worded more clearly. Educational differences between the countries also remain to be considered. It is possible that learners’ educational background, and not simply their L1, may have had some influence on the way in which they engaged with the dynamic aids. Although the research procedures were identical in both countries, our study cannot rule out the possibility of this factor having some effect on the results, and this is therefore a question for future research.
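The limitation concerning the group-average cut-off mentioned above can be illustrated with a minimal Python sketch of a mean split. The participant IDs and questionnaire totals are hypothetical, and the sketch assumes that a higher score indicates a stronger imagery orientation. Neither reflects the actual SOP data or its scoring direction.

```python
from statistics import mean


def mean_split(scores):
    """Label each participant relative to the group mean of the scores.

    Assumption: a higher score indicates a stronger imagery orientation.
    """
    cutoff = mean(scores.values())
    return {
        pid: "imager" if score > cutoff else "verbalizer"
        for pid, score in scores.items()
    }


# Hypothetical questionnaire totals for five participants (group mean = 60).
sop_scores = {"P1": 52, "P2": 61, "P3": 58, "P4": 70, "P5": 59}
labels = mean_split(sop_scores)
```

In this toy example, P2 and P5 differ by only two points yet fall on opposite sides of the cut-off, which shows why such labels indicate relative rather than absolute orientations.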
Further research should also be conducted in more teaching contexts and with different L2 targets and L1s in order to produce more generalizable results. Longer time frames should also be envisaged, as well as more work on the role of technology in this kind of L2 teaching and learning. There is certainly scope for further study of the efficacy of CL-based dynamic images in different types of language tasks to further explore the role of individual learner differences in this framework.
Supplementary material
To view supplementary material referred to in this article, please visit https://doi.org/10.1017/S0958344021000288
Acknowledgements
This work was supported by JSPS KAKENHI Grant Numbers 26370658 and 18K00778.
Ethical statement
The authors declare that they have no conflicts of interest. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional and national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. Informed consent was obtained from all individual participants involved in the study.
About the authors
Takeshi Sato is an associate professor in the Institute of Engineering at Tokyo University of Agriculture and Technology, Japan.
Yuda Lai is an associate professor in the Department of English Language, Literature, and Linguistics at Providence University, Taiwan.
Tyler Burden is an associate professor in the Faculty of Education at Meisei University, Japan.
Author ORCIDs
Takeshi Sato, https://orcid.org/0000-0003-4797-0234
Yuda Lai, https://orcid.org/0000-0002-8224-3086
Tyler Burden, https://orcid.org/0000-0003-1525-7872