1 Introduction
In recent years, online video sharing sites have received increasing attention in the educational domain, because they offer tremendous opportunities for educators to enhance their course content (Smith, 2009). For instance, YouTube and Google Videos are two of the resources that can be utilized for different pedagogical purposes. Other open educational resources are also rapidly becoming available (e.g., TeacherTube, Academic Earth and OpenCourseWare offered by some institutions establishing their own channels at YouTube such as National Public Radio at http://www.youtube.com/npr). Teachers can use free online video downloaders (e.g., Vixy.net and Keepvid) to convert online video into multimedia formats and store them in the computer beforehand for class use later. If a webcast is not captioned, teachers can use software such as iMovie or Adobe Premiere to generate on-screen text. Consequently, with the advancement of downloading technology and user-friendly interfaces, a major challenge for teachers does not lie in technophobia but in harnessing the potential of these expanding open source videos as teaching tools.
The context of this study was an English writing class, in which the EFL students’ productive/active vocabulary was relatively small as opposed to their receptive/passive vocabulary. In view of online video that offers additional channels for access to the target language, the teacher-researcher was motivated by concern about whether multiple exposures to English before writing would help learners to stretch their productive vocabulary. This research targeted non-basic English words (i.e., beyond the first 2000 most frequent word families) which may lie dormant in a learner's lexical repertoire and sought to determine which audiovisual format would trigger the greater use of latent productive vocabulary. The specific research question guiding the investigation was:
Is there any significant difference in the percentage of non-basic vocabulary use when learners get the same input but in different audiovisual modalities in a video-based writing task?
2 Literature Review
Since the present study concerns the development of free active vocabulary, the following literature review regarding lexical knowledge, multimedia applications and cognitive theory informs this research.
2.1 Receptive/passive vocabulary versus productive/active vocabulary
With vocabulary growth, knowledge of words may progress from superficial to deep levels at various stages of language learning. In a dichotomous way, Nation (2001) distinguished between receptive and productive knowledge of a word and referred to receptive vocabulary as understood in reading or listening and productive vocabulary as used in writing or speech. Laufer and Paribakht (1998: 369) instead termed the former passive vocabulary and the latter active vocabulary, which is further called free active vocabulary when it refers to “words learners voluntarily choose to use”.
With regard to word use, Laufer and Paribakht (op.cit.) defined passive knowledge as “understanding the most frequent and core meaning of a word” and free active knowledge as “spontaneous use of a word in a context generated by the user”. We consider the present research as a case of free active vocabulary, because our students were left to make their own selection of words while writing a paragraph on a designated topic. They were allowed to use words at their free will and could avoid words that they considered problematic or were unsure of.
Contrary to the dichotomy, Henriksen (1999) split lexical competence into three continuums: partial to precise knowledge, depth of knowledge, and receptive to productive ability. The first two are knowledge-related while the third reflects a control continuum, depicting how well the learner can access and use a word. Henriksen pointed out that if learners cannot use a word correctly or access it freely for production, this does not mean that they do not know the word, but that they have not yet achieved adequate control over word access.
As for vocabulary growth, Laufer (1998) reported that active vocabulary develops more slowly and less predictably than passive vocabulary. A moderate increase in passive vocabulary within a proficiency level would not necessarily result in greater free active vocabulary. This may help to explain why the gap between passive and active vocabulary of EFL learners appears to be relatively large, as shown in the present data results.
2.2 Previous studies on audiovisual modalities
Multimedia application has been highly advocated for language learning for over two decades, because it can create diverse modalities of input through various combinations of video, audio and text and thus accommodate different learning styles. To make multimedia materials more comprehensible, videos are often augmented with captions/subtitles. When L2 learners are watching a captioned/subtitled video in the target language, they have to attend to two types of visual input (images and texts) as well as audio input (sounds) and thereby gain broad access to the target language.
According to Markham and Peter (2003), captions refer to on-screen text in the L2 combined with the L2 soundtrack, while subtitles are on-screen text in the L1 along with the L2 soundtrack. The present study adopted L2 captions in lieu of L1 subtitles because it aimed at multiple channels to the target language and captions served the purpose of visual contact.
The issue of whether captions/subtitles support or hinder language learning is still inconclusive. Some researchers maintain that captions play a supporting role in L2 learning (Stewart, 2004; Taylor, 2005). They assume that the combination of visual, audio and written information allows the learner to select input from images, sounds and texts simultaneously to further organize and integrate them. Therefore captions are presumed to have a positive influence on the intake of language materials. In their study, Bird and Williams (2002) presented novel words to advanced learners of English under three aural and visual conditions: (1) text with sound, (2) text without sound and (3) sound without text, and examined the bimodal effects on vocabulary acquisition. The results indicated that vocabulary presented with text and sound resulted in better recognition memory for spoken words. They thus drew the conclusion that bimodal presentation aids new word learning.
Similarly, Sydorenko (2010) examined the effect of input modality on several aspects of vocabulary learning by playing the video with audio and captions, without captions but with audio, and with captions but without audio. Partially in line with Bird and Williams’ (2002) findings, her data results showed that more new word meanings were learned when videos were shown with both audio and captions than with either audio or captions.
Moreover, Sydorenko (op.cit.) found that video with audio tends to improve listening comprehension as it facilitates recognition and recall of aural vocabulary. This answers the question of whether learners attend to audio and provides some implications that some learners may process spoken words better when audio is present simultaneously with captions, rather than with captions only. It also suggests that aural vocabulary can be activated or learned from the audio input. The audio effect is relevant to the current study because it may explain why some students are able to write better, using a diverse vocabulary, after listening to audiovisual materials. Different from Sydorenko's investigation that targeted vocabulary acquisition with a specific focus on the recognition and recall of word form, the intention of this research was to expand productive vocabulary, as the learning context was a writing course.
In contrast to the advocacy of caption use, some researchers cast doubt on the effectiveness of captions due to an excessive cognitive burden (Mayer & Moreno, 1998), which may lead to what Sweller (2005) calls a split-attention effect. Pujola (2002) discovered that although some learners made progress in listening comprehension, they relied on caption reading instead of listening. By testing whether captioned video was helpful to novice learners of Spanish, Taylor (2005) found that captions were distracting and made it difficult for learners to pay attention to the concurrent input of sound, image and text.
Previous research has focused on overall vocabulary gain via video viewing, while recent studies have switched attention to higher-level vocabulary acquisition through viewing academic lectures (Vidal, 2011; Yang & Sun, 2011). By utilizing open educational resources, Yang and Sun (2011) proposed that apart from reading, academic listening is also a source of vocabulary acquisition. Their data results verified that technical vocabulary, low-frequency vocabulary and academic vocabulary can be acquired via watching academic lectures. This inspires the present research in that just like reading a passage, while listening to an aural paragraph, students may recognize or notice particular known words in addition to new words. As such, students should be encouraged to use and experiment with the words they hear, in particular less frequent vocabulary.
Generally, the studies reviewed have mainly surveyed the impact of video on vocabulary acquisition and comprehension. Most researchers designed their own listening tests and vocabulary tests on word recognition and recall based on the video content. Although the measures varied widely, most of them were used to calculate the gain of recognition/passive vocabulary. Few studies have investigated the growth of free active vocabulary. It is hoped that the present research may contribute to the literature of CALL in this regard.
2.3 Cognitive theory and multimedia
One of the strengths of multimedia is its dynamic audiovisual presentation. The positive outcomes of media synchronization demonstrate that better learning occurs when related information is presented simultaneously via auditory-verbal and visual-pictorial media than when the information is presented via auditory-verbal or visual-pictorial media alone. As such, the concept of dual coding in the field of cognitive science has often been borrowed as an explanation for multimedia learning performance (Najjar, 1995).
Dual coding, first advanced by Paivio (1971), attempts to give equal weight to verbal and non-verbal processing. The theory postulates that both visual and verbal information are processed differently and along distinct channels with the human mind creating separate representations for information processed in each channel. The visual-pictorial channel processes visual information such as image, animation and on-screen text, while the auditory-verbal channel processes verbal information such as narration (Mayer, 2001). Dual coding stresses the notion that all cognition involves associations between verbal and nonverbal systems and hence visual and verbal information do not compete with each other.
In support of dual coding as an explanation for the effect of multimedia learning, Mayer and Anderson (1991, 1992) undertook a series of experiments in order to test the relevant assumptions. In one of their tests, a group of mechanically naive college students viewed a narrated animation showing how a bicycle pump or automobile brakes worked and the other group of students simply listened to a verbal explanation. The students who heard the verbal explanation with the animation performed better on a creative problem-solving test than the students who heard a narration only. The data results proved that students learn more deeply from concurrent viewing and listening than from listening alone.
The proponents of dual-coding hypotheses emphasize the integration of dually-coded information in mental structures and assume that it can lead to a reduction of the cognitive load in either the visual-pictorial or the auditory-verbal channel. Nevertheless, a possible cognitive overload in one channel caused by redundant media deserves equal attention. For example, broadcasting an animation along with concurrent narration and captions is equivalent to presenting the same words in two formats (spoken and written). The added on-screen text may compete with the animation for cognitive resources in the visual-pictorial channel. Each channel has a limited capacity for holding and manipulating knowledge (Baddeley, 1986). When too many visual components are presented at a time, the visual-pictorial channel can become overloaded. Likewise, when spoken words and other sounds are presented at the same time, the auditory-verbal channel can also become overloaded. Placing a high cognitive load on the same channel may reduce the effectiveness of information presentation. When learners receive all sorts of stimuli, they have to use attention selectively (Wickens, 2007), because their attention capacity is limited.
In light of a cognitive overload and selective attention, teachers may need to raise their awareness of how video can be utilized to enhance writing performance, when considering the incorporation of video into a writing course.
3 Research method
3.1 The video-based writing course and the selection of video
The course English Composition was a 2-credit hour semester course for English-majoring freshmen at a university in Taiwan. The data was collected from the learners during class contact time. The researcher-teacher had one class of fifty students for the course. There were individual variations in language ability; however, all of them had been learning English for eight years.
To increase students’ exposure to English and to add variety to class activities, online video served as an add-on component that afforded students background knowledge for a writing theme. Before the writing phase, the researcher searched YouTube and Google Videos for film extracts for a writing topic. The video clips were selected on a level slightly above the students’ English proficiency. The matriculating students had a recognition vocabulary reaching the 4,000 to 5,000 word-family level on average (for the measure of vocabulary size, see Section 3.3). Therefore, the video as supplementary material had to contain some advanced words beyond this level. To measure the vocabulary levels of video clips, the transcripts were entered into the RANGE program (this instrument and the notion of vocabulary levels will be detailed in Section 3.3).
The four writing topics were in turn A Coke Fountain, My Lottery Experience, Living in the City on the Sea and Internet Addiction. Finally, four video clips that went with the four writing topics were selected. Table 1 provides some information about the video clips. One of the transcripts is provided in Section 4.4 as an example.
Table 1 Four video clips for four writing topics
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:75798:20160415231608123-0231:S0958344013000220_tab1.gif?pub-status=live)
In the literature, 95% lexical coverage has often been suggested as the vocabulary threshold for minimally acceptable comprehension. The video concerning the city on the sea contained more proper nouns than the other three videos and therefore had a lower percentage of words within the 2,000-word level.
As shown in Table 1, the four video clips were equivalent in terms of 95% of the words within the range of the 5,000-word level and a similar percentage of words within the first 2,000-word level, indicating that none of the videos was much more difficult or much easier than the other three.
3.2 The design of the study
The writing class took place in a language lab where each student had a desk installed with audiovisual equipment and a networked computer. Between the seats, there were glass partitions. Fifty students were divided into four groups, each having twelve to thirteen people. In total, four online videos were arranged for four viewing/listening-to-write tasks during the semester. To maintain class cohesion and fairness, the four groups of students took turns at using each of the four audiovisual modalities before writing a paragraph-composition: (1) captioned video, (2) non-captioned video, (3) silent video with captions and (4) video with screen off (audio only). Each student wrote four compositions in total and received a different video display mode each time. Since everyone had a headset and a computer in the lab, the same video but in four different audiovisual formats could be simultaneously delivered to the four groups (i.e., the whole class) each time before writing. Table 2 shows the design of the study.
Table 2 The design of the study
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:50360:20160415231608123-0231:S0958344013000220_tab2.gif?pub-status=live)
Mode A: captioned video; Mode B: non-captioned video; Mode C: silent video with captions; Mode D: video with screen off.
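The rotation of display modes across groups can be sketched as a simple Latin square. This is a minimal illustration, not the authors' own procedure: the function name `mode_for` is hypothetical, and the particular rotation (each group moving one mode forward per round) is an assumption inferred from the group-by-mode figures reported in Section 4.3.

```python
# Counterbalanced (Latin-square) rotation of display modes across groups.
# Assumption: each group shifts one mode forward per writing round, which is
# consistent with the group/mode pairings reported in Section 4.3.
MODES = ["A", "B", "C", "D"]  # A: captioned, B: non-captioned,
                              # C: silent with captions, D: screen off


def mode_for(group: int, writing: int) -> str:
    """Return the mode for a 1-based group number in a 1-based writing round."""
    return MODES[(group - 1 + writing - 1) % len(MODES)]


# Build the full schedule: one row of four modes (Groups 1 to 4) per round.
schedule = {w: [mode_for(g, w) for g in range(1, 5)] for w in range(1, 5)}

for w, row in schedule.items():
    cells = " ".join(f"G{g}={m}" for g, m in enumerate(row, start=1))
    print(f"Video-Writing{w}: {cells}")
```

Under this rotation every group meets each mode exactly once, and every mode is in use in every round.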
Except for the video phase, the same class routines such as lectures, writing and revision proceeded for the whole class. Therefore, the varying video delivery approaches were regarded as the explanatory variable. For the sake of the experiment, no dictionary was allowed. To avoid plagiarism, students were required to complete their drafts in class.
It needs mentioning that the students were used to video as part of routine class activities, since other English classes also used video as supplementary materials for integrated skills training. In the meantime, there was no hint that the compositions with a diverse or advanced vocabulary would be favorably graded. The students received general evaluative remarks (no grade) and indications of errors in red ink in the following week. They were required to make corrections. After revision, a holistic score in consideration of content, vocabulary, grammar and organization was given on the revised version.
Since this study targeted learners’ spontaneous use of non-basic vocabulary at will, the holistic composition scores were not adopted. Therefore, the first drafts before revision were utilized as a basis for analysis of active vocabulary size. According to Henriksen (1999), free active vocabulary shows a learner's attempts to use vocabulary and partial knowledge of receptive lexical items, signalling that s/he is in the process of acquiring and using them.
At the time of completion of the fourth composition, fifty questionnaires of five open-ended questions concerning perceptions of the video delivery modes were distributed to the students.
3.3 The instruments and measures
To determine what levels EFL learners’ productive vocabulary may reach as opposed to their receptive vocabulary size, the RANGE program (Nation & Heatley, 2005) and the Vocabulary Size Test (Nation & Beglar, 2007) were used.
The RANGE program is installed with a large collection of word family lists derived from the British National Corpus (BNC) and the Corpus of Contemporary American English (COCA). Nation (2012) and his colleagues compiled twenty-five 1,000-word-family lists and ranked them according to their frequency of occurrence, range and dispersion in the BNC/COCA. On this scale of word-frequency levels, the second 1,000 words are less frequent than the first 1,000 words and more frequent than the third 1,000 words. Through RANGE, the vocabulary levels of a text can be worked out by comparing the word lists made from the target text with the twenty-five 1,000-word-family lists.
For instance, if a 200-word composition contains 120 words belonging to the first 1,000 most frequent words, 40 words belonging to the second 1,000, 25 from the third 1,000 and 15 from the fourth 1,000, the ratio embracing the relative percentage of words in the composition that comes from different vocabulary levels would be 60%-20%-12.5%-7.5%. Because the research focus was the free active use of non-basic vocabulary, attention was paid to the vocabulary beyond the first 2,000.
In this study, the first 2,000 most frequent words were viewed as basic vocabulary while the words above 2,000 were non-basic and regarded as advanced vocabulary. Applied to the above case, the ratio of the first 2,000 words to the words beyond was 80%-20%. The condensed lexical frequency profile is more amenable to statistical analysis and some research has found it reliable and valid (Laufer & Nation, 1995). For further focus, the concern was with the twenty percent of vocabulary beyond the 2K level, which represents the learner's lexical richness in free expression. The increase of advanced vocabulary use was defined as growth of free active vocabulary.
To gauge the fifty students’ receptive/passive vocabulary, the Vocabulary Size Test was administered at the beginning of the course. Nation and Beglar (2007) made this test by sampling from the most frequent 14,000 word families of English out of the BNC. It consists of 140 items, ten from each 1,000-word-family band of the 14,000 word families. Using Rasch tests, Beglar (2010) has examined and verified the validity of the Vocabulary Size Test. Below are two sample items from the first and the fourteenth 1,000-word levels.
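The arithmetic of the worked example (a 200-word composition with 120, 40, 25 and 15 words in the first four 1,000-word bands) can be sketched as follows. This is only an illustration of the calculation; it does not reproduce the RANGE program, and the function names are hypothetical.

```python
def frequency_profile(counts_by_band):
    """Percentage of words falling in each 1,000-word band."""
    total = sum(counts_by_band)
    return [100 * count / total for count in counts_by_band]


def condensed_profile(counts_by_band, basic_bands=2):
    """Collapse the profile into (basic %, non-basic %).

    The first `basic_bands` bands (here the first 2,000 words) are treated
    as basic vocabulary and everything beyond them as non-basic.
    """
    total = sum(counts_by_band)
    basic = sum(counts_by_band[:basic_bands])
    return 100 * basic / total, 100 * (total - basic) / total


# Word counts per band from the worked example in the text.
counts = [120, 40, 25, 15]

print(frequency_profile(counts))   # [60.0, 20.0, 12.5, 7.5]
print(condensed_profile(counts))   # (80.0, 20.0)
```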
The word standard occurs more frequently than the word canonical. If one knows the word canonical, it is highly possible that s/he may also know the word standard.
3.4 Data processing
Fifty students’ compositions, totalling 200 paragraphs with each containing 150 words on average, were entered into the RANGE program. The notion of a word family was applied to the calculation. For example, glow, glows, glowed, glowing and glowingly belong to the same word family. Regardless of frequency, using any or all of them was regarded as one word family.
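The word-family counting rule can be illustrated with a toy lookup table. This is a sketch under stated assumptions: the table `FAMILY_OF` is hypothetical and tiny, whereas RANGE relies on the full BNC/COCA word-family lists, and treating unlisted forms as their own family is a simplification made here for illustration.

```python
# Hypothetical mini word-family table: inflected or derived form -> headword.
FAMILY_OF = {
    "glow": "glow", "glows": "glow", "glowed": "glow",
    "glowing": "glow", "glowingly": "glow",
    "erupt": "erupt", "erupted": "erupt",
}


def count_word_families(tokens):
    """Count distinct word families among the tokens.

    Forms missing from the table are treated as their own family
    (a simplification for this sketch).
    """
    families = {FAMILY_OF.get(t.lower(), t.lower()) for t in tokens}
    return len(families)


# All five forms of "glow" collapse into one family; "erupted" adds a second.
tokens = ["Glow", "glows", "glowed", "glowing", "glowingly", "erupted"]
print(count_word_families(tokens))  # 2
```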
In the process of data entering, we corrected minor spelling errors which did not distort the word in order to make the words recognizable by the computer program. If a word use involved minor errors that did not affect the meaning of the vocabulary item, we considered it as an attempt for vocabulary use and included it in the calculation.
However, when a word was clearly used incorrectly (e.g., wrong meaning and non-words), it was not typed into the computer, as it could not be regarded as known by the student. Thus it should not be counted as part of his/her productive lexicon.
For a coding reliability check, one of the researcher's colleagues, who taught the same subject, was requested to help. The consistency of the coding was mainly calculated by the coefficient of the simple percentage agreement. The percentage of agreement between two raters reached 95%.
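Simple percentage agreement is the share of coding decisions on which both raters coincide. The sketch below uses invented include/exclude decisions purely to show the calculation; the data are hypothetical and are not taken from the study.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of items on which two raters made the same decision."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of items.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * matches / len(rater_a)


# Hypothetical decisions for 20 word uses (True = counted, False = excluded).
rater_a = [True] * 18 + [False] * 2
rater_b = [True] * 17 + [False] * 3

print(percent_agreement(rater_a, rater_b))  # 95.0
```

Percentage agreement does not correct for chance agreement, which is worth bearing in mind when interpreting a figure such as 95%.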
Finally, a series of ANCOVA tests was conducted in SPSS 17.0.
4 Results and discussion
4.1 Pre-test without video
Based on the scores of the Vocabulary Size Test, the fifty students’ recognition vocabulary ranged from 3,700 to 6,900 words (Mean = 4,500, SD = 650). The vocabulary size of a majority of students converged at 4,000 to 5,000 words (42 out of 50 students), with 6 people beyond 5,000 words and one of them even reaching 6,900 words, and only two below 4,000. They were divided into four groups, each having a similar mean vocabulary size. Levene's test for homogeneity of variances on the vocabulary size scores (p > .05) indicated that the population variances for the four groups were approximately equal, as had been expected.
The following is an example of one paragraph written on the topic entitled Chinese Ghost Festival from a student whose receptive vocabulary was 6,900 words. This writing task served as a pilot study before any video delivery approach began. The words above the BNC/COCA 2,000-word level are in bold.
The ghost festival is an important holiday for Taiwanese people. The Gate of Hell is open in the seventh lunar month. Many spirits get the chance to visit the world. Ghosts are like human beings. Some are good and some are bad. The good ghosts do not hurt people. Instead, they help people to do good things. The bad ones are awful and they make people suffer and fear. A long time ago, many people were killed every seventh month of the lunar calendar. Consequently, people prepare a lot of food and things for the spirits and their ancestors, such as meat, wine, fruits and ghost money. They burn ghost money and daily necessities made of paper, for example, paper car, paper house, paper shoes and paper clothes for the dead people. In this way, at the end of lunar July, the ghosts would return to their own places and leave people in peace again.
In this 155-token composition, the total number of free active vocabulary was 81 word families excluding the proper noun Taiwanese. This student used 6 non-basic word families in her writing (in bold). The usage rate of advanced vocabulary was 7.4% (6/81), as opposed to 92.6% of the total word families which belonged to the first 2,000 most frequent words. This means that although this student occasionally used advanced words, most of her productive vocabulary centered upon the basic 2,000 word families. Compared with her vocabulary size (6,900 words), this student's productive proficiency had not yet reached a plateau in terms of advanced vocabulary use. Most of her receptive vocabulary remained dormant and needed activating.
As for the whole class, the mean usage rate of non-basic vocabulary on this pre-test writing was 4.86% (purely out of interest, non-basic vocabulary accounts for 19.73% of the total word families used in this paper). In practice, even though 48 out of the 50 students had reached the 4,000-word level, they used roughly 5 word-families above the 2K level for every 100 word-families generated while formulating their ideas, with a correspondingly overwhelming majority of 95 word-families below 2,000. The ratio of non-basic vocabulary to basic vocabulary (1:19) demonstrated that their everyday range of free active vocabulary was still confined to the basic 2,000-word level.
4.2 Quantitative analysis of video effects
To examine whether there is a video effect on the expansion of productive vocabulary, a series of one-way ANCOVAs was conducted separately for writing with video against the pre-test writing without video. The independent variable comprised the four video delivery modes and the dependent variable was the percentage of non-basic vocabulary use. The covariate was the pre-test writing involving a different writing topic.
Before conducting ANCOVA, the homogeneity of regression slopes assumption was first tested. The test evaluates the interaction between the covariate topics and the independent variable video modes in the prediction of dependent variable non-basic vocabulary use. The p-value >.05 indicates that the interaction between topics and video modes was not significant.
Table 3 presents the mean percentage of non-basic vocabulary use in the pre-test writing without video and in the four post-video compositions.
Table 3 Mean percentage of non-basic vocabulary use in the pre-test writing without video and four post-video writings
Mode A: captioned video; Mode B: non-captioned video; Mode C: silent video with captions; Mode D: video with screen off. Pre-test writing without video: A ghost festival; Video-Writing1: Coke fountains; Video-Writing2: Lottery; Video-Writing3: The city on the sea; Video-Writing4: Internet addiction
In Video-Writing1, the advanced vocabulary use of the four groups was 11.27%, 12.56%, 5.11% and 8.87% respectively (likewise, in Video-Writing2, 12.87%, 5.37%, 8.51% and 11.26%; in Video-Writing3, 5.19%, 8.5%, 11.54% and 12.26%; in Video-Writing4, 8.62%, 11.23%, 12.1% and 5.15%). Overall, regardless of video-writing topics, the students viewing video before writing in any presentation format outperformed their pre-test writing without video (all the mean percentages with video > those of pre-test writing without video, 4.866%, 4.861%, 4.864% and 4.862%). Through audiovisual exposure to English, the students seemed to enlarge their normal range of free active vocabulary by voluntary attempts to use more non-basic/advanced vocabulary.
In Table 4, a series of ANCOVA results (the between-subjects comparisons for four writings) show that there was a consistently significant audiovisual effect on the growth of free active vocabulary after controlling for the confounding variable topics (Video-Writing1, F = 194.9, p = .000; Video-Writing2, F = 239.5, p = .000; Video-Writing3, F = 186.9, p = .000; Video-Writing4, F = 229.4, p = .000). The four groups of students, receiving the same input but undergoing different channels of exposure, performed significantly differently in the use of non-basic vocabulary.
Table 4 A series of ANCOVA results for four writings against the pre-test writing
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:31888:20160415231608123-0231:S0958344013000220_tab4.gif?pub-status=live)
As shown in Table 4, the strength of relationship between video modes and the use of non-basic vocabulary (assessed by the Eta squared) was very strong with the mode factor accounting for 93% of the variance of dependent variable on average, holding constant the covariate (Eta squared = 0.929, 0.941, 0.926 and 0.939 respectively).
To sum up, the results have brought a beacon of hope in teaching English writing with the incorporation of video. Any format of access to target language may inspire students to go beyond their habitual domain and use a diverse vocabulary.
4.3 Pairwise comparisons of video modes
To further ascertain which video mode achieves the best effect (namely, to address the research question “Is there any significant difference in the percentage of non-basic vocabulary use when learners get the same input but in different audiovisual modalities in a video-based writing task?”), Holm's sequential Bonferroni correction was applied to control Type I error across the six pairwise comparisons for each video-writing.
Table 5 demonstrates that there were significant differences in the mean percentage of advanced vocabulary use between the groups receiving different video modes (nearly all the p-values <.05, except for p = .205 > .05 in Video-Writing3). The mean difference between non-captioned video (Mode B) and silent video with captions (Mode C) was consistently significant and the largest, followed by the significant difference between captioned video (Mode A) and silent video with captions (Mode C). Across the four rounds of video-writings with groups shifting formats alternately, the mean difference between Modes A and B was negative in every round, showing that non-captioned video fostered more growth of productive vocabulary than captioned video. Moreover, the difference between these two modes was significant in every round except Video-Writing3, where there was no apparent discrepancy (p = .205).
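For readers unfamiliar with the procedure, Holm's sequential Bonferroni correction can be sketched as follows. This is a generic illustration of the step-down method rather than the SPSS routine used in the study; the function name `holm_reject` and the six p-values are hypothetical and are not taken from Table 5.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Sort the p-values in ascending order and compare the k-th smallest
    (k = 0, 1, ...) with alpha / (m - k). Stop at the first failure; every
    comparison from that point on is retained (not rejected).
    Returns a list of booleans in the original order of `p_values`.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Hypothetical p-values for the six pairwise comparisons among four modes.
p_values = [0.001, 0.004, 0.012, 0.030, 0.300, 0.0005]
print(holm_reject(p_values))  # [True, True, True, False, False, True]
```

In this invented example the comparison with p = .030 falls below .05 but is not declared significant, because by that step the adjusted threshold is .05/2 = .025.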
Table 5 Pairwise comparisons for four video-writings
1, 2, 3, 4: Groups 1, 2, 3, 4; Mode A: captioned video; Mode B: non-captioned video; Mode C: silent video with captions; Mode D: video with screen off.
Returning to Table 3 for the mean percentage, the group receiving non-captioned video (Mode B) produced consistently greater advanced vocabulary than the other three groups having different formats (in Video-Writing1, Group 2 with Mode B, 12.56%; in Video-Writing2, Group 1 with Mode B, 12.87%; in Video-Writing3, Group 4 with Mode B, 12.26%; in Video-Writing4, Group 3 with Mode B, 12.1%). As to the within-group comparison, the results also reveal a clear-cut difference in the usage rate of non-basic vocabulary. For group 1, viewing video without captions (Mode B) in Video-Writing2 stimulated them to make attempts to use more non-basic vocabulary as opposed to the other three conditions (12.87% versus 11.27% viewing the captioned video (Mode A) in Video-Writing1; 5.19% viewing the video with sound off but with text on (Mode C) in Video-Writing3; 8.62% listening to the video with screen off (Mode D) in Video-Writing4). Likewise, when given non-captioned video (Mode B) in Video-Writing1, group 2 made the greatest progress in the increase of non-basic vocabulary use (12.56% versus 5.37% with silent but captioned video (Mode C) in Video-Writing2; 8.5% with soundtrack only (Mode D) in Video-Writing3, and 11.23% with captioned video (Mode A) in Video-Writing4).
As to an overall average of each mode over the four rounds of writing, the mean percentage of non-basic vocabulary use for captioned videos averaged 11.33% [ = (11.27% + 11.26% + 11.54% + 11.23%) divided by 4] and the average percentages for non-captioned videos, silent but captioned videos and soundtrack-only videos were 12.45%, 5.2% and 8.63% in turn.
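The averaging step can be checked with a short calculation. The following is a minimal sketch, not part of the original analysis: it uses only the per-round percentages quoted in the text for Modes A and B, and the function name `mean_percentage` and the round-half-up convention are assumptions made for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-round mean percentages of non-basic vocabulary use, as quoted in the text.
captioned = ["11.27", "11.26", "11.54", "11.23"]     # Mode A
non_captioned = ["12.56", "12.87", "12.26", "12.1"]  # Mode B

def mean_percentage(values):
    """Average the percentages and round half up to two decimals.

    Hypothetical helper; Decimal avoids binary floating-point
    surprises at the rounding boundary (e.g. 11.325).
    """
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

mode_a_mean = mean_percentage(captioned)
mode_b_mean = mean_percentage(non_captioned)
print(mode_a_mean, mode_b_mean)  # prints: 11.33 12.45
```

Both results agree with the averages reported above for captioned and non-captioned videos. The per-round values for Modes C and D are only partly quoted in the text, so they are not recomputed here.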
In a nutshell, concerning the audiovisual effect on the development of non-basic vocabulary use, the non-captioned video mode ranked top and the captioned video came second, followed by the video with screen off (soundtrack only) and the silent video with captions at the bottom. The learners were able to stretch their active vocabulary after exposure to English, regardless of the audiovisual modalities.
4.4 Qualitative analysis of video effects
To follow up on the quantitative findings, the students’ compositions were examined. Below is an example of one paragraph on the topic entitled A Coke Fountain from an ordinary student in the non-captioned group, who had a receptive vocabulary of 4,000 words on the Vocabulary Size Test.
What can Coke be used for? In addition to being a refreshing beverage, it can also be made an amazing experiment – a Coke fountain. All you need to do is to follow the four steps. First of all, purchase all the items you need at a convenience store: a roll of Mentos, a bottle of 2-liter diet Coke, a jumbo straw and a piece of cardboard. Second, bring all the items to a place outside like a park or garden, never do this indoors. Third, put the Mentos into the straw and aim it at the opening of the Coke with the cardboard between them to hold Mentos. Finally, remove the cardboard quickly and let the Mentos drop into the Coke. It is wise for you to prepare an umbrella or wear goggles in advance because a Coke fountain has just been created. It can erupt four meters high.
After viewing the Coke geyser video clip, the student indeed used more non-basic vocabulary (15.38% of word families, excluding the product name Mentos and the phrase Coke fountain given in the topic), compared with 5.16% in her pilot writing without video.
The reader may be interested to see what English words the student may have been exposed to, before she wrote this timed paragraph composition. The following is part of the transcript about how to make a Coke geyser.
…A marvelous Coke fountain can be created by mixing Mentos, mint candy and the best refreshing beverage, Coke. This activity is probably best done outside, in a wide open space or on a huge lawn. Prepare a roll of Mentos, a PVC tube or a jumbo straw that is big enough to hold Mentos, a piece of cardboard or a flashcard and a two-liter bottle of Coke. Either diet or regular Coke will work for this experiment, but Coke Zero works better than anything else. In other words, diet Coke erupts higher. Position the bottle on the ground so that it will not tip over. Put the flashcard under the straw to stop Mentos from dropping. Unwrap the whole roll of Mentos and fill the straw with the mint candy. Open the cap of the Coke. Put the straw with the flashcard under it on top of the two-liter bottle and align the straw with the opening of the bottle. Warn the spectators to stand back. Quickly remove the flashcard so that all of the Mentos drop into the bottle at once, and then move out of the way as fast as you can. A Coke fountain forms, gushing four meters high, probably as high as four meters.
(from http://www.youtube.com/watch?v=5taG_-sCAtQ&feature=fvsr)
Of the total words used in the video, 15.84% were advanced words, roughly 16 word families above the 2K level for every 100 word families uttered (excluding PVC and Mentos).
Table 6 compares the advanced vocabulary used in the video transcript and in the student writing. When watching the Coke geyser video, the student picked up eight advanced words from what she had heard (i.e., experiment, erupt, refreshing, straw, cardboard, liter, beverage and jumbo, excluding Coke and fountain given in the title) and used them in her paragraph writing. In addition to listening, the student took advantage of what she saw on the screen and experimented with the words she had learned before (i.e., umbrella, goggles and ‘convenience’ at a convenience store) to describe what she observed and associated with them. For instance, the student may have caught sight of the spectators with raincoats, umbrellas and goggles standing aside to watch the experiment of mixing Coke and Mentos. The funny scene may have impressed her so that she put forward her suggestion to the reader in her writing about it being wise to prepare an umbrella or wear goggles before doing a Coke fountain.
Table 6 Advanced vocabulary use in the student composition and the video transcript
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:87803:20160415231608123-0231:S0958344013000220_tab6.gif?pub-status=live)
It was very encouraging to see that even as little as eight minutes of exposure without captions could actually help the student to expand her productive vocabulary beyond the 2,000-word level. By inference, the repetition of vocabulary and lexical diversity in sustained audiovisual exposure may help as well in this regard. Even without captions, audiovisual activities can still promote free active vocabulary.
4.5 Student views on the video delivery modes
After a series of audiovisual experiments, fifty questionnaires of five open-ended questions were distributed to the class as a follow-up. The student responses were classified into several categories based on the gist of their statements. Questions 1 to 4 in the questionnaire were asked separately, “How did you feel about the video (1) with captions, (2) without captions, (3) with captions but sound off, and (4) with screen off?” The answers to these four questions showed a level of agreement and consistency with Question 5 in the questionnaire, “Which video delivery mode do you prefer?” Most students (31 out of 50 students) preferred the video without captions to the other three conditions. Very surprisingly, four of them indicated that they did not read the captions when there were captions. Two students even said that when the video was played, they intentionally closed their eyes in order to work with the audio better through the earphones. They gave their reason that through audio, they could attend better to the lexical items they may need to use for subsequent writing. The statements reveal what strategy the students were inclined to use while watching the video and suggest that the students’ allocation of attentional resources seemed to be selective.
As to listening to the video with screen off, eight students expressed anxiety in their responses. They reported that even though they strained their ears to listen, their mental images were still blurred or even blank. All they could do was to listen for the main idea and pick up the key words from a sequence of utterances for general comprehension. Like the video with screen off (which presents information through the auditory-verbal channel only), the video played with the sound off (which presents messages through the visual channel only) also made a majority of students uneasy. One classroom observation supported this inference. When a video clip showed a hilarious scene in which a reporter wearing over-sized goggles and a full-length raincoat introduced a Coke and Mentos experiment, the group watching with the sound off did not laugh, while the groups watching the video with and without captions burst into laughter. The sound-off group seemed to be so absorbed in the captions that they missed the amusing scenes. In other words, when the sound was off, they tended to give priority to processing the captions rather than the images. A similar finding was observed in Sydorenko's (Reference Sydorenko2010) study on the effect of input modality. Most of her students paid most attention to captions and then to video.
The above situation may concur with some cognitive principles about redundancy and dual coding. According to Mayer (Reference Mayer2002), the visual-pictorial channel can become overloaded when learners must use their visual cognitive resources both to read the on-screen text and to watch the image. In contrast, when words are presented verbally, they are processed in the auditory-verbal channel, which frees the visual-pictorial channel to focus on processing the image. Therefore, both the groups viewing the video with and without captions laughed out loud upon seeing the comic reporter fully equipped to prevent himself from getting wet, whereas the silent video with captions group did not laugh at all.
The added on-screen text seemed to compete with the image for cognitive capacities in the visual-pictorial channel, creating a split-attention effect. This may explain why the students viewing silent video with captions were not responsive. They had to pay attention visually to both the printed words and images, resulting in a detriment to their processing of input. As has been shown in the percentage of advanced vocabulary use, the worst performance by the group that watched silent video with captions supported the dual channel proposition that students may learn more deeply through both the auditory channel and the visual channel than from each alone.
In addition, the better performance by the non-captioned group than that by the captioned group in the higher usage rate of non-basic vocabulary may also sustain the cognitive redundancy principle that deeper learning happens when words are presented as spoken words rather than as both spoken words and on-screen text. When receiving audio only (with the screen turned off), the group performed even better than the group watching silent video with captions. Drawing upon the cognitive modality principle, one probable explanation is that some students may learn better when words are presented as spoken words rather than as on-screen text.
Despite this, the utility of the video display with captions cannot be denied. This delivery mode involves three elements: image, sound and text. The rationale for presenting the same words in two formats (sound and on-screen text) is that students will be able to choose the format that better suits their learning style. If students learn better from spoken words, they can pay attention to the narration; if they learn better from written words, they can pay attention to the captions.
The overall impression from the students was that video playing before writing provided a relaxing atmosphere to encourage writing freely. They expressed the view that video-based writing was helpful in organizing thoughts and in particular, in boosting inspiration. Sixteen students felt that video reduced class tension and panic, despite the fact that the composition class was more boring than other subjects on first impression. Generally speaking, the pedagogical framework appeared to be acceptable to the students.
5 Conclusion
As the sample size of student writing was small, the results can only be regarded as indicative rather than conclusive. This study demonstrated that incorporating video into a composition class may positively influence student writing performance.
Although the fifty students’ recognition vocabulary was over 2,000 words, ranging from 3,700 to 6,900 words, the extent of their productive vocabulary was within the basic 2,000 word-families, with a scant use of four to five word-families over the 2,000-word level as opposed to 95-96 word-families below 2,000 for every 100 word-families produced. Obviously, the majority of the passive/receptive words did not enter the active realm. This may be due to the fact that in an EFL context, limited English exposure and lack of practice hinder successful passage of words from receptive to productive vocabulary.
This could have been improved by providing direct access to the target language prior to writing. An increasing tendency regarding non-basic vocabulary use (from 4.86% without video to the averaged 12.45% with non-captioned videos, 11.33% with captioned videos, 5.2% with silent but captioned videos and 8.63% with soundtrack-only videos) demonstrated that some of the dormant vocabulary seemed to be aroused. Through various audiovisual exposures to English, the EFL students were inspired to use their passive vocabulary that was contextually appropriate for the topic in an immediate writing task.
This has some pedagogical implications. Writing teachers may need to choose audiovisual content in relation to the writing topic so as to allow students to make attempts at contextually-related words, especially infrequent vocabulary. They also need to recognize the vocabulary levels of their students in the first place and then find suitable audiovisual materials which are lexically challenging and manageable.
Admittedly, the present research has worked within a narrow focus on vocabulary. When viewing videos, students may pay little attention to syntax. It is not sufficient merely to attend to vocabulary in isolation. The audiovisual impact on the use of phrases and collocations in writing is also worth exploring.
Last but not least, the researcher would like to stress that the fulfillment of the goal for extending productive vocabulary requires no radical new teaching approach. The integration of audiovisual activities and writing can be exploited flexibly so that they complement each other. The aim of this research has been to raise that awareness.