1 Introduction
1.1 Grammar, lexis, and phraseology
John Sinclair, a pioneer in the field of corpus linguistics and phraseology, opens the eleventh chapter of his book, Trust the text (2004), with a telling example of just how elusive grammar can be. Reflecting on the letterhead of a European society founded to promote the topic of phraseology, Sinclair notes the ambiguity of grammatical correctness in the English version of the society’s name: The European Society of Phraseology. Because the society is European, its letterhead also appears in German and French, as Europäische Gesellschaft für Phraseologie and Société Européenne de Phraséologie, respectively. Sinclair (2004: 177) ruminates:
Notice that the preposition used in the English version is of, and when I first encountered this I felt it was, if not ungrammatical, certainly uncomfortable. In French the preposition is de and in German für. The regular translation of de in English is indeed ‘of’, but of für it is ‘for’. I wondered, does:
1. European Society for Phraseology
sound any better [than European Society of Phraseology]? Yes, I think it does, but I have no idea why.
Examples of this nature illustrate the advantage of viewing language at the syntactic or phrasal level as well as the lexical level. This is precisely where a phraseological view of language can prove helpful to language learners, as it eschews the traditional lexis/grammar dichotomy in favor of a more integrated view. Phraseology is predicated on Sinclair’s (1991: 110) idiom principle, which is encapsulated in the simple observation that “a language user has available to him or her a large number of semi-preconstructed phrases that constitute single choices, even though they might appear to be analyzable into segments”. When language is taken in segments, or multi-word units of meaning, much of its ambiguity dissolves.
1.2 Formulaicity and terminology
The phraseology, or formulaicity, of language can be illuminated by corpus linguistics. Römer (2009: 141) observes that “if there is one major finding of modern (computer) corpus linguistic research over the past 40 years, it is probably that language is highly patterned”. Hoey (2009: 36) takes this idea a step further, expounding that “grammar is the system that one falls back onto when the collocational and other patterns are not used”. Both statements allude to the pervasiveness of formulaicity in language. Erman and Warren (2000) estimate that 52.3% of written language and 58.6% of spoken language is formulaic. Considering the cognitive processing advantages associated with using patterned language over novel utterances, it is not surprising to see such substantial quantities of formulaic language in everyday discourse (Conklin & Schmitt, 2008; Jiang & Nekrasova, 2007; Tremblay, Derwing, Libben & Westbury, 2011).
For formulaic language, Ellis (2012: 27) identifies three broad qualities to keep in mind: frequency, association, and native norms. Frequency and strength of association are two measures typically applied to formulaic language. Frequency simply refers to how often words co-occur. One measure of the strength of association between co-occurring words is the mutual information (MI) value, which Kennedy (2008: 23) describes as comparing “the actual frequency of co-occurrence of two words with the predicted frequency of co-occurrence of the two words if each were randomly distributed in the corpus”. Hunston (2002) proposes that an MI value of three or higher indicates a relatively strong collocation.
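The comparison of observed with expected co-occurrence described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the common formulation MI = log2(observed / expected), where the expected frequency is the product of the two word frequencies divided by the corpus size. The function names and the counts are hypothetical, and the exact calculation used by any particular corpus interface (including any adjustment for the collocation span) may differ.

```python
import math


def mi_score(pair_freq: int, freq_x: int, freq_y: int, corpus_size: int) -> float:
    """Return log2(observed / expected) co-occurrence frequency for two words."""
    if min(pair_freq, freq_x, freq_y, corpus_size) <= 0:
        raise ValueError("all counts must be positive")
    # Expected co-occurrences if both words were randomly distributed in the corpus.
    expected = freq_x * freq_y / corpus_size
    return math.log2(pair_freq / expected)


def is_strong_collocation(mi: float, threshold: float = 3.0) -> bool:
    """Apply the 'MI of three or higher' rule of thumb mentioned in the text."""
    return mi >= threshold


# Hypothetical counts for illustration only; they are not taken from COCA.
score = mi_score(pair_freq=8, freq_x=1000, freq_y=1000, corpus_size=1_000_000)
print(round(score, 2), is_strong_collocation(score))
```

Because MI is a ratio-based measure, it tends to reward rare word pairs, which is one reason it is usually read alongside raw frequency rather than on its own.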
Just as language is replete with patterns, so is the field of applied linguistics with terminology for patterned language (Wray, 2002). For the purposes of this study we will adopt Wray’s terminology of formulaic sequence / language. Wray (2002: 9) defines a formulaic sequence (FS) as “a sequence, continuous or discontinuous, of words or other elements, which appears to be prefabricated: that is, stored and retrieved whole from memory at the time of use”. This is an often-cited definition of formulaic language, and with good reason. The inclusion of continuous or discontinuous affords the term a great deal of flexibility in accepting a wide range of multi-word units, “from formulaic phrase, to limited-scope slot-and-frame pattern, to fully productive schematic pattern” (Ellis, 2012: 18).
1.3 Corpora in foreign language learning curricula
There are two common pedagogical applications of corpora in second/foreign language teaching and learning: indirect and direct applications (Römer, 2011). Indirect applications include researchers and teachers consulting corpora to inform curriculum and materials development, which may lead to textbooks that use authentic rather than invented examples of language. Direct applications of corpora in language teaching and learning, on the other hand, typically involve learners accessing a corpus directly. This is perhaps most commonly identified with data-driven learning (DDL), a term coined by Tim Johns (Johns, 1986). Johns (1991: 30) defines DDL as “the attempt to cut out the middleman as far as possible and to give the learner direct access to the data”. The idea behind DDL is that learners act as language detectives, or researchers, investigating authentic examples of the target language on their own. Boulton (2010a: 535) explains that “learners are not taught overt rules, but they explore corpora to detect patterns among multiple language samples”. Hunston (2002: 170) contends that DDL supports learning because “students are motivated to remember what they have worked to find out”.
DDL appears to be generally well received by learners and theoretically sound as a language-learning tool. Ellis (2002: 144) reminds us that cognitive linguistic theory postulates that “all linguistic units are abstracted from language use”. In usage-based theories of language learning, frequency is crucial for acquisition because “‘rules’ of language, at all levels of analysis… are structural regularities that emerge from learners’ lifetime analysis of the distributional characteristics of the language input” (Ellis, 2002: 144). Gries (2008) suggests that there is a strong affinity between corpus linguistics and cognitive linguistics as they both rely heavily on frequency, and Boulton (2009: 39) maintains that “DDL… exploits processes that humans have evolved to be naturally good at: exposure to data, detection of patterns, extrapolation to other cases.” While it differs from naturalistic first-language acquisition, which is largely unconscious, DDL can be argued to be firmly grounded in cognitive linguistic theory, as learners analyze masses of input in a quest to become more familiar with structural regularities via inductive means.
Studies that have made quantitative comparisons of the efficacy of DDL with more traditional approaches to teaching suggest that DDL leads to results which are at least as good as, if not better than, other approaches (e.g., Boulton, 2009; Boulton, 2010b; Cobb & Boulton, forthcoming). For example, Frankenberg-Garcia (2012), working with a group of EFL learners in Portugal, compared the efficacy of dictionary definitions with that of corpus examples with respect to (1) learning the meaning of a target word, and (2) learning how to use a target word appropriately on a syntactic level. Two of Frankenberg-Garcia’s hypotheses in the study were that dictionary definitions would be more effective in the comprehension of novel words, while corpus examples would be more effective in learning the proper usage of familiar words. The findings supported both hypotheses (see also Frankenberg-Garcia, this volume).
Moving away from the experimental to the more qualitative, many studies investigating corpus-based learning have focused on student attitudes and beliefs toward the approach and/or the processes involved (Boulton, 2009: 38), and most pertain to EFL or ESL learners’ writing (Chambers & O’Sullivan, 2004; Yoon & Hirvela, 2004; Chambers, 2005; Yoon, 2008; Chen & Baker, 2010; Kennedy & Miceli, 2010). While these studies report largely positive findings related to outcomes of corpus-based learning and student attitudes toward DDL, a number of drawbacks consistently emerge as well, such as lack of confidence with respect to the grammaticality of corpus findings, the time-consuming nature of DDL, and the difficulty of interpreting the results of corpus investigations (Chambers, 2005; Chambers & O’Sullivan, 2004; Yoon & Hirvela, 2004). Complaints of this nature have led a number of scholars to recommend substantial introduction and training in how to use corpora properly (Kennedy & Miceli, 2010; Yoon & Hirvela, 2004). Bernardini (2004: 26), for instance, recommends starting students with convergent tasks, that is, tasks that guide learners to the same outcome. Once learners become familiar with the interface, they can then move on to more divergent, or independent, tasks.
A number of studies have investigated student attitudes and beliefs about corpus-based learning as a way to improve their writing. Some also delve into student attitudes and beliefs about corpus-based learning with respect to speaking (Aguado-Jiménez, Pérez-Paredes & Sánchez, 2012; Pérez-Paredes & Cantos Gómez, 2004), but there are notably fewer of them. The aim of this study is three-fold. The first aim is to outline a course that puts DDL at the center of the curriculum in order to increase learners’ repertoires of formulaic language and their ability to employ FSs in conversation; the second is to gauge how effective students are in employing their target phrases in a pragmatically appropriate manner; and the third is to investigate student attitudes toward this approach to language learning.
2 The study
2.1 Context
The course described below was a semester-long optional course open to third- and fourth-year students in the Department of International Communication (IC) at a private foreign language university in Japan. It met twice a week for 15 weeks, with each session lasting 90 minutes. Both rooms in which the class met were equipped with laptop computers, one with 30 and the other with 15, and a wi-fi Internet connection.
2.2 Population
The class consisted of 30 students, ranging in age from 20 to 22 years old, of whom 21 were female and 9 male. All were Japanese, and 29 of the 30 students agreed to participate in the study. In accordance with departmental policy, these students had taken the Test of English for International Communication (TOEIC) and had scores ranging from 540 to 860, and a mean of 736. This corresponds roughly to A2/B2 in the Common European Framework of Reference for Languages (Council of Europe, 2001), or intermediate to mid-advanced levels (Educational Testing Service 2013).
2.3 Course syllabus
The first three weeks of the course were used to explain course aims, do reading and discussion activities about DDL and inductive learning versus deductive learning, and train students in using the corpus via handouts with convergent tasks. The Corpus of Contemporary American English (COCA) (Davies, 2008-) was chosen for the course as it is a large, publicly available corpus consisting of 450 million words. Also, the instructor was from the United States and felt more comfortable commenting on instances of American usage as opposed to another variety of English.
Beginning in the fourth week, attention turned to three main components of the course: speaking journals (see section 2.3.1), student-led lessons, and a final project. Students used COCA to investigate and discover FSs they wished to use in their speaking journals and teach their peers in weekly student-led lessons. Finally, the students took part in a project that had them conduct a Behavioral Profile (BP) study of near-synonymous words and phrases (see section 2.3.3). Due to space limitations, the majority of attention in this article will be devoted to the speaking journals.
2.3.1 Speaking journals
The speaking journals formed the core of the course, and students were responsible for completing four throughout the semester. Students were given four class periods over the course of two weeks to complete each speaking journal. The speaking journals consisted of four distinct phases: (1) preparation; (2) corpus consultation; (3) a rehearsal conversation; and (4) the real conversation.
The speaking journal task is essentially based on input and interaction. An interactionist perspective on language acquisition posits that “the interactional ‘work’ that occurs when a learner and his/her interlocutor (whether a native speaker or more proficient learner) encounter some kind of communication breakdown is beneficial for L2 development” (Mackey, Abbuhl & Gass, 2012: 9). In the course reported on here, the learners’ task was to identify a potential communication breakdown before it happened by learning FSs that they did not have command of prior to the corpus consultation. The learners then took their FSs and used them in conversation with a more proficient speaking partner. This use of novel FSs can reasonably be likened to Swain’s (1995) output hypothesis as the learners ‘pushed themselves’ to create the opportunity to use their target phrases. Productive use of the target language, Swain (1995) contends, causes learners to process the language more deeply than input alone.
Phase 1 of the speaking journal had the students interact with authentic materials that they were free to choose. It was hoped that choosing materials and topics that interested them would increase motivation. Furthermore, formulaic language is more prevalent in authentic materials, such as television and movies, than in textbooks designed for language learning (Irujo, 1986: 237; Biber, Conrad, and Cortes, 2004: 379–380). Students were given a number of choices for authentic materials: English-language video news (e.g., CNN news video), news articles (e.g., an online news source, or print newspaper), magazines, TV shows, movies, and comic books, or they could bring their own ideas to the instructor for approval. The students had easy access to the first two options through the Internet, and easy access to the remaining options through the university’s library and self-access center. They were not allowed to choose any one form of materials more than twice in order to guarantee exposure to a wide variety of media. After the students had read or watched their material, their task was to write up a summary and two discussion questions. The summary and discussion questions were then used for small-group discussions in the subsequent class period.
Phase 2 of the speaking journal was the DDL component, with students using COCA to investigate words and phrases discovered in Phase 1. A useful analogy to describe the goal of this phase is Kennedy and Miceli’s (2010: 32) pattern-hunting, which “amounts to encouraging them [the learners] to use the corpus as an aid to the imagination and memory”. Learners were encouraged to investigate words that they anticipated would be of use in their planned topic of conversation. One student, for example, planned to discuss her search for a job as she was approaching the end of her university career. In her corpus investigations she began with the word work, as it was sure to come up in conversation. Working her way through the concordancing phase of the speaking journal she settled on the five-word phrase, work on a full-time basis. The student was then able to use this phrase in the ‘real conversation’ phase of her speaking journal. As it turned out, students in the class often chose to look up familiar words with the goal of finding novel ways to use them (recall Frankenberg-Garcia’s (2012) study; see also Frankenberg-Garcia, this volume). Upon choosing intriguing collocates, students noted the frequency and MI value, and combed through the concordance lines to find interesting patterns. If a student chose to investigate a phrase in COCA rather than an individual word, the collocate function of the corpus was not used. Figure 1 shows notes a student took in her speaking journal during her concordancing.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:94590:20160415231602477-0998:S0958344014000044_fig1g.jpeg?pub-status=live)
Fig. 1 Example of a student’s notes from concordancing session on COCA
In Phase 3 of the speaking journal, students used the phrases from their previous investigations in small-group ‘rehearsal conversations’ with their classmates. The main point of these rehearsals was to give students the opportunity to practice using their new phrases in the context of their chosen topic, and, perhaps more importantly, to learn how to manipulate a conversation in order to create an opportunity to use their target phrases.
Phase 4 was the final phase of the speaking journal where students were sent out to record their ‘real conversations’ with a native or more proficient speaker of English. Students were afforded opportunities to practice conversation in an open space at the university where international students often gather and teachers are available for informal conversations. There are mp3 recorders available for students to borrow to record their conversations, but most students simply used a personal smartphone. Students began their real conversations with a topic and general plan as to how they anticipated working their FSs into their conversations based on their rehearsal. After the conversations, students completed a reflection section in their speaking journal where they listened to their recording as a whole, noted when they used their FSs, and whether or not they were able to produce the FSs as planned. The sound files of their recordings were then uploaded to the class website, or emailed to the instructor, and the speaking journals were handed in.
2.3.2 Student-led lessons
The second major component of the class was the student-led lessons. Starting in the fourth week of class, and each week thereafter, a small group of students led the class in a 30-minute lesson featuring FSs discovered through their speaking journals. Ten groups of three were formed, and each student contributed two of their favorite FSs to the lesson, so each lesson featured six FSs. In addition to explaining their FSs, students led their classmates in an activity designed to give the class an opportunity to use the FSs. Example activities include variations of Pictionary-like games where the class draws pictures of the target phrases for the other students to guess, telephone-like games, hot-potato, creating skits or writing stories that use the phrases, and so on. The student-led lessons proved to be very popular, as illustrated in the questionnaire results presented in section 4.3. Some examples of FSs that students used in their speaking journals and then went on to teach their peers in class are:
(1) pave the way for
(2) behind the scenes
(3) put (personal pronoun) best foot forward
(4) poised on the brink of
(5) catch one’s eye
(6) fail to recognize
(7) place an emphasis on
2.3.3 Behavioral Profile study
The course culminated in students undertaking a Behavioral Profile (BP) study of near-synonymous words or phrases of their own choosing. Gries (2010) explains that BP studies allow for the fine-grained analysis of near synonyms, which can shed light on differences between near synonyms and polysemous words. The scope of the project was such that it is not possible to explore the details fully here, but briefly the project entailed students identifying near-synonymous words or phrases and embarking on a corpus analysis via COCA and web searches to reveal subtle differences in patterns of usage. Students wrote reports to present to classmates and finally hand in to the instructor. Table 1 provides examples of the type of near-synonymous words and phrases students investigated.
Table 1 Examples of near-synonymous words and phrases investigated by students
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:50795:20160415231602477-0998:S0958344014000044_tab1.gif?pub-status=live)
3 Data collection and analysis
In order to arrive at a clearer understanding about student attitudes toward this particular approach to corpus-based language learning, data were collected via a questionnaire and triangulated through follow-up interviews and student reflection logs at the end of each speaking journal.
The questionnaire consisted of 44 statements to which the respondents were asked to indicate their degree of agreement on a 6-point Likert scale. The majority of items were adapted from two published studies on using corpora in L2 writing (Yoon & Hirvela, 2004; Liu & Jiang, 2009). The researchers designed the remaining items specifically for this study. All statements were presented in English and Japanese. The questionnaire included negatively worded items to keep respondents from marking only one side of the questionnaire, and scores for such items were reverse-coded before analysis (Dörnyei & Taguchi, 2010). The researchers merged items into multi-item scales based on theoretical considerations. Categories were: (1) difficulty in using corpora; (2) positive impact of using corpora; (3) effectiveness of presentation and delivery of coursework; (4) completing speaking journals and incorporating phrases; and (5) attitudes and beliefs about data-driven learning and its potential. Six of the participants did not answer all 44 items on the questionnaire, missing an item either by choice or simple oversight (e.g., participant 11 did not answer item 14; participant 1 did not answer item 23). Therefore, the internal consistency, or reliability in how participants responded between items on the questionnaire, was based on the responses of the 23 individuals who responded to all 44 items and was measured via Cronbach’s alpha in the statistical package SPSS. The instrument showed a high level of internal consistency, with α = .870. Means, modes, and standard deviations were calculated based on all participants’ responses. For the purposes of presentation, the results from the questionnaire are presented simply in terms of agreement with the statements.
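The reliability procedure described above (reverse-coding negatively worded items, keeping only respondents who answered every item, then computing Cronbach’s alpha) can be illustrated with a minimal Python sketch. It is not the SPSS procedure used in the study: the function names and the response data are hypothetical, and the sketch assumes the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores).

```python
from statistics import variance

SCALE_MAX = 6  # 6-point Likert scale, as in the questionnaire


def reverse_code(score, scale_max=SCALE_MAX):
    """Flip a negatively worded item so that 1 <-> 6, 2 <-> 5, 3 <-> 4."""
    return scale_max + 1 - score


def complete_cases(rows):
    """Keep only respondents who answered every item (None marks a skipped item)."""
    return [row for row in rows if all(value is not None for value in row)]


def cronbach_alpha(rows):
    """Cronbach's alpha for a respondents-by-items table of scores."""
    k = len(rows[0])  # number of items
    item_variances = [variance(column) for column in zip(*rows)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 4 respondents x 3 items; the third item is negatively worded
# and the last respondent skipped the second item.
raw = [[5, 6, 2], [4, 4, 3], [6, 5, 1], [3, None, 4]]
usable = complete_cases(raw)
recoded = [[a, b, reverse_code(c)] for a, b, c in usable]
print(round(cronbach_alpha(recoded), 3))
```

Dropping incomplete respondents before computing alpha mirrors the decision reported above to base reliability on the 23 participants who answered all 44 items, while descriptive statistics can still draw on every available response.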
The follow-up interviews were semi-structured with lead questions based on the survey results and student reflection logs from the speaking journals (see the Appendix for interview questions). Interviewees were selected at random and included two males and three females. Each interview lasted about thirty minutes. Interviewees were given the choice of being interviewed in English or Japanese, and all of them chose to communicate in Japanese. The quotes presented in this paper were translated into English by the researchers.
Finally, one additional quantitative analysis was performed to investigate whether students were able to employ their target phrases in a contextually and/or pragmatically appropriate manner. The analysis entailed giving a sample of 114 phrases to four native speakers of English, who independently rated each phrase on a numerical scale of 1–4 (1 being ‘inappropriate’ and 4 being ‘appropriate’).
4 Results and discussion
4.1 Positive impact of corpus use
The findings concerning the impact of corpus use were quite encouraging and in general suggest students’ belief in the utility of DDL on a number of fronts, as Table 2 demonstrates. Students felt strongly that this approach to language learning increased their knowledge of collocations. Nearly all participants agreed with the statements that researching familiar vocabulary items in the corpus led to learning new phrases and new ways to use familiar vocabulary (see items 7 and 26). This provides qualitative support to Frankenberg-Garcia’s (2012) finding that concordances are useful for learning novel usages of familiar words. Perhaps most encouraging is that 28 of 29 participants, 97%, believed DDL to be helpful for language learning, with a mean score of 5.00. One student noted in her speaking journal log:
I had a good conversation with Shelley [a teacher]. It went as I planned. And I could learn new words from the conversation. I think it’s one of the best learning styles. I think she [Shelley] has different ideas from rehearsal conversation.
Table 2 Positive impact of using corpus
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160711135107-75747-mediumThumb-S0958344014000044_tab2.jpg?pub-status=live)
* Raw numbers and percentage in parentheses
** 1: strongly disagree, 2: disagree, 3: somewhat disagree, 4: somewhat agree, 5: agree, 6: strongly agree
*** Responses occurred equally.
Students also believed that COCA was helpful for writing (66%) and speaking (89%) (see items 22 and 23). The latter is particularly noteworthy because, as mentioned earlier, student beliefs about the benefits of corpus consultation have been investigated less for speaking than for writing. This finding is perhaps not too surprising, though, as the course focus was on speaking rather than writing. Nevertheless, the students did perceive corpus consultation to be a useful tool to improve their speaking.
4.2 Difficulty in using the corpus
A recurring theme in the literature is the set of difficulties that accompany DDL, and many common themes from previous studies emerged here (Table 3). One notable exception, and likely a sign of the times and location, is that very few participants believed a dearth of Internet access to be a hindrance. As noted earlier, a frequent complaint is the investment required to learn how to use a corpus effectively. In this study, too, a slight majority of students felt that learning to use COCA was difficult. However, once past the initial learning curve, fewer than half of the students felt that the actual concordancing was “difficult”. Interestingly, the participants were more or less split over the categorization of abundant concordance lines as a “difficulty”, which traditionally has been a common complaint about corpus consultation. One interviewee, however, did explicitly note a common complaint with DDL, citing difficulties he had in understanding the meaning of concordances due to cut-off sentences:
[In the concordance output] there are many example sentences. With the long sentences, I mean, I can’t see the entire sentence, just one part... so I can’t understand the “situation”. If sentences are cut off in the middle… I wonder what the following words will look like…
Table 3 Difficulty with using corpus
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:4266:20160415231602477-0998:S0958344014000044_tab3.gif?pub-status=live)
* Agreement that the category is difficult as opposed to easy
** Raw numbers and percentage in parentheses
*** 1: strongly disagree, 2: disagree, 3: somewhat disagree, 4: somewhat agree, 5: agree, 6: strongly agree
**** Responses occurred equally.
In addition to cut-off sentences, a mismatch of register was also pointed out as a difficulty by an interviewee. This is of paramount importance, and likely one of the greatest weaknesses of the course reported on here. Students were using COCA to find high-frequency FSs to incorporate into their conversations, which usually take place in semi-informal contexts. Yet much of the spoken language accumulated in COCA comes from formal contexts, such as news programs. This underscores the importance of raising student awareness of the genre and register from which concordance lines are gleaned, and of helping students decide how pragmatically appropriate a phrase from a news broadcast might be in a different context. This is a topic we will come back to in section 4.4.
4.3 Effectiveness of presentation and delivery of coursework
Given the unique nature of this class, the researchers wanted to gather data on the students’ attitudes and beliefs about the delivery of the coursework. For this reason, a number of statements specifically addressing aspects unique to the context were crafted, such as items concerning class time allotted to concordancing, the rehearsal conversations, and student-led lessons (Table 4).
Table 4 Effectiveness of presentation and delivery of coursework
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160711135107-31046-mediumThumb-S0958344014000044_tab4.jpg?pub-status=live)
* Raw numbers and percentage in parentheses
** 1: strongly disagree, 2: disagree, 3: somewhat disagree, 4: somewhat agree, 5: agree, 6: strongly agree
*** Responses occurred equally.
Because of the protracted nature of DDL, substantial time in class was allotted for concordancing. Typically students were given 60 minutes to concordance and then took part in a student-led lesson for the remaining 30 minutes. Occasionally, though, an entire 90-minute class period was devoted to concordancing and consulting with the teacher and their classmates about their findings. Even when given 90 minutes of class time, the majority of students felt that it was not enough. Interestingly, though, 83% believed that they did complete an adequate amount of concordancing for each speaking journal (items 16 and 17). Hopefully this indicates that students spent time concordancing outside of class, but it may be that while 90 minutes was not enough to achieve their concordancing goals for a speaking journal, students felt that additional concordancing would not have been beneficial.
The rehearsal conversations were viewed in an overwhelmingly positive light. This is apparent from the questionnaire items related to the ‘helpfulness’ of the rehearsal. 24 of the 29 participants agreed that the rehearsal was helpful to some degree, with a mean score of 4.72 and a mode of 5. Likewise, a common theme in students’ speaking journal logs was the usefulness of the rehearsal in helping hone their conversations in order to use their planned FSs. One student wrote in her speaking journal log:
When I did a rehearsal conversation I couldn’t use the phrases well and I felt some phrases were unnatural. Therefore, I made more examples to be able to choose and use the most natural one while doing this conversation. I expect my [real] conversation to go more naturally than my rehearsal conversation.
Having a teacher or native speaker of English explain what students found in the corpus was also perceived positively, as illustrated by item 34. Two students reported in the follow-up interviews that they felt strongly about the need to check the meaning and usage of target FSs with a native speaker, teacher, or friend whose English was more advanced, because the meanings of new words and phrases encountered in the corpus were sometimes not found in electronic dictionaries, or the nuance of the target words and phrases was lost when translated into Japanese.
The student-led lessons also proved to be a popular activity throughout the course (see items 39 and 40 in Table 4). Indeed, the instructor of the class noted that students responded well to peers in the role of teacher, and that the students made substantial efforts to create engaging lessons. It is worth noting that the student-led lesson was weighted at 20% of the final grade, which may in part explain the effort the students put in.
4.4 Completing speaking journals and incorporating phrases
Wray and Fitzpatrick (2010: 38) point out that “it would be easy to construe them [FSs] as a straitjacket for the user, rather than an opportunity”. However, when used correctly in the appropriate context, FSs can be a concise, economical, ‘native-like’ means of conveying one’s message. Indeed, the more adept user of FSs can cut short, rearrange, and come up with new combinations joined by individual lexical items or shorter phrases. Mastery of a large repertoire of FSs can thus be seen as an integral step in the journey to fluency in a language. Wray and Fitzpatrick (2010) and Kennedy and Miceli (2010) suggest that considering the context and anticipating the trajectory of a planned conversation when choosing target phrases (i.e., pattern-hunting) will ostensibly minimize the failure to employ pre-determined phrases in a conversation. The following student comment illustrates her success with this approach:
I visualized and simulated the trajectory of a conversation using the target FSs. In addition, I tried to visualize how to expand the conversation. I prepared everything. Of course, the conversation didn’t go exactly the way I had expected, but I intentionally selected the words I would use in explaining things, then I became able to manipulate the target FSs naturally.
Some students were able to manipulate their FSs on the spot during their conversations, demonstrating adeptness at noticing and mastering the more open slot-and-frame patterning of some FSs. One student, searching for the word increase in COCA, arrived at the phrase increase your lifespan. However, when the time came to actually use the phrase in the real conversation, circumstances dictated that she change the phrase to decrease your lifespan. The student acknowledged as much in her speaking journal reflection log, demonstrating a high level of performance and agency by appropriating the phrase and manipulating it to fit her needs.
In addition to the success stories, though, Table 5 illustrates that using novel FSs in a natural way is not always easy for students. Just as learning the proper usage of novel vocabulary items can be challenging at times, it should not be too surprising that learners will occasionally encounter difficulties working prefabricated material into conversations. In their speaking journal logs, some students noted times when they abandoned target phrases because they felt unable to work them into their conversations naturally. At other times, students wrote that they were so absorbed in their conversations that they forgot to include a target phrase. Conversations are, after all, inherently open and dynamic; even the most socially alert people cannot predict with 100% accuracy how a given conversation will unfold.
Table 5 Completing speaking journals and incorporating phrases
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:25818:20160415231602477-0998:S0958344014000044_tab5.gif?pub-status=live)
* Raw numbers and percentage in parentheses
** 1: strongly disagree, 2: disagree, 3: somewhat disagree, 4: somewhat agree, 5: agree, 6: strongly agree.
Perhaps the most common difficulty throughout the course was that students felt unable to capture the nuance, or precise meaning, of a target phrase and to use it in a pragmatically appropriate way in their own conversations. One student commented in the interviews:
I believe the nuance is different when I translate the target phrases into Japanese. It seems okay to use those phrases in a straightforward way in any situation, but I was sometimes told that I couldn’t use certain phrases in certain contexts because the nuance is slightly weird even though they were grammatically correct.
This comment is interesting in that corpus work lends itself to discovering more ‘natural’, frequently used language. While corpus-based language learning might help students discover frequently occurring sequences of words that will often sound natural in speech, we hypothesize that the situation described by the student above is the result of trying to shoehorn a more idiomatic phrase into the conversation. We conjecture that this unexpected use of idiomaticity sometimes struck the students’ conversation partners as odd.
In order to more thoroughly gauge the consistency with which students were able to nest their target phrases into a larger context in a pragmatically appropriate way, 114 items were collected and rated by four native speakers of English on a numerical rating scale measuring appropriateness, with 1 being least appropriate and 4 being most appropriate. Raters were given an Excel file with the target phrases underlined and embedded in the larger context of the conversation in one column, and a drop-down menu where they could select 1 to 4 in the adjacent column.
Consistency between raters was again calculated via Cronbach’s alpha, and was α = .816, indicating a high level of consistency (Table 6). The raw frequency of all phrases receiving a given rating is displayed in columns 1–4, which represent the numerical rating score, and the corresponding percentage is in parentheses. The mean score assigned by each rater is given in the last column; the overall mean score was 3.09. The mean number of each score assigned by all raters is given in the bottom row. In general, the scores suggest that students were able to employ their target phrases in a pragmatically appropriate manner. However, there was still a considerable number of phrases that were used in an inappropriate manner. This likely reflects the legitimacy of students’ concerns over not always being able to grasp the nuance of novel FSs, and possibly a lack of sufficient planning and preparation for their real conversations.
Table 6 Appropriateness of phrases
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:87824:20160415231602477-0998:S0958344014000044_tab6.gif?pub-status=live)
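For readers who wish to run a comparable consistency check on their own rating data, the following is a minimal sketch of Cronbach’s alpha, assuming that each rater is treated as an ‘item’ in the standard formula, α = (k / (k − 1)) × (1 − Σ rater variances / variance of summed scores), where k is the number of raters. The function name `cronbach_alpha` and the ratings shown are invented for illustration; they are not the data or the software used in this study.

```python
from statistics import variance


def cronbach_alpha(ratings):
    """Return Cronbach's alpha for a table of ratings.

    `ratings` is a list of rows, one row per rated phrase; each row holds
    one score per rater, in the same rater order throughout.
    """
    n_raters = len(ratings[0])
    if n_raters < 2 or len(ratings) < 2:
        raise ValueError("need at least two raters and two rated phrases")
    # Sample variance of each rater's scores across all phrases.
    rater_variances = [variance(column) for column in zip(*ratings)]
    # Sample variance of the summed score each phrase received.
    total_variance = variance([sum(row) for row in ratings])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all summed scores are equal")
    return (n_raters / (n_raters - 1)) * (1 - sum(rater_variances) / total_variance)


# Invented example: five phrases rated 1-4 by four hypothetical raters.
example_ratings = [
    [4, 4, 3, 4],
    [2, 3, 2, 2],
    [3, 3, 3, 4],
    [1, 2, 1, 1],
    [4, 3, 4, 4],
]
print(f"alpha = {cronbach_alpha(example_ratings):.3f}")
```

In practice, the rows could be read from a spreadsheet export of the kind given to the raters, with one column per rater.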
On the other hand, it is important to note that conversations can consist of as much listening as speaking, and some students noted an increase in ability to understand their interlocutors and authentic English input, such as television programs, due to the FSs they learned through their corpus consultation. One student commented:
As I mentioned earlier, for example, the phrase, “grab a bite” is a phrase that I couldn’t have encountered if I had studied in a regular way. It doesn’t appear in a textbook. I find words reading news articles then I try to search for FSs around those words… When I watch TV and encounter an FS that I learned in class, I would feel “I know this meaning!”
4.5 Attitudes toward DDL
With respect to student attitudes toward DDL, some students were sceptical about the grammaticality of the concordance data and believed it prudent to have a dictionary on hand to verify corpus findings. This scepticism is likely why they felt that a class of this nature is better suited to advanced learners of English than to novices.
On the other hand, there were many positive perceptions of corpus consultation as well. Item 35 in Table 7, for example, suggests that one of the major aims of the course, to increase learner awareness of the interdependence of lexis and grammar, was largely achieved, with 21 of 29 participants agreeing with the statement. Additionally, the majority of students believed that they would continue to use a corpus in future classes, would recommend DDL to other learners of English in Japan, and believed that corpus use should be taught more regularly in English classes. These numbers suggest that students in this study were convinced of the utility of corpora in language education.
Table 7 Attitudes and beliefs about data-driven learning
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160711135107-16095-mediumThumb-S0958344014000044_tab7.jpg?pub-status=live)
* Raw numbers and percentage in parentheses
** 1: strongly disagree, 2: disagree, 3: somewhat disagree, 4: somewhat agree, 5: agree, 6: strongly agree.
5 Conclusion
Chambers (2005: 111–112) notes that while there is an increasing body of research on corpus use by learners, there is “considerable scope for development, particularly in the area of course design and structure, concerning how one can successfully integrate corpus consultation into a programme of language study in higher education”. Addressing this gap was precisely one of the major aims of this paper. The course placed students in an interaction-rich environment in which they engaged with authentic materials in the form of news articles or videos, TV shows, etc. The students used that interaction to form a topic and choose key words, familiar or unfamiliar, that they believed would be useful in a conversation about their topic. Students then engaged in pattern hunting as they attempted to identify common patterns of usage regarding their key words. They endeavored to appropriate their newfound phrases by pushing themselves to produce them in their output in class with their classmates, and outside of class with more proficient speakers of English.
Based on the survey, interviews, and speaking journal reflection logs, students generally reported favorable impressions of the course, and perceived DDL as having a positive effect on their language learning. Participants also believed that corpus consultation should be taught more regularly in English classes, and planned to continue using the skills they learned in the class reported on here. The usual complaints about the tedious nature of learning to use the corpus surfaced, and some participants did express reservations about being able to use their newly discovered FSs in pragmatically appropriate ways. However, the sample of phrases rated in this study suggests that students were more successful than not in employing their phrases in an appropriate manner. Additionally, there is some evidence that familiarization with FSs in the course led to increased understanding of FSs encountered outside of the class. Beyond the speaking journals, students responded very positively to the student-led lessons and believed that corpora can be a good tool for discovering the difference between near-synonymous words and phrases.
There are, however, a number of limitations of this study that need to be addressed. First and foremost is the small number and homogeneity of participants. With only 29 participants, all of whom were Japanese, it is difficult to extrapolate the findings of this study to a wider array of contexts. Another serious limitation is the lack of longitudinal data. While students indicated they would continue to use the corpus-consulting skills they learned into the future and in other classes, no follow-up survey or contact was made to verify this. More longitudinal studies that track learners’ corpus use over an extended period of time are a worthwhile direction for further research (cf. Yoon, 2008). It would be especially interesting to track students for a number of years after they complete a corpus-training course, to see how long and to what extent they independently engage in corpus consultation.
This study has illustrated that, with training, learners can take advantage of the power of a corpus, and has provided qualitative evidence suggesting that students strongly believe that corpus consultation has the potential to facilitate the learning of novel usages of familiar lexical items, thus supporting the quantitative evidence provided by Frankenberg-Garcia (2012; this volume). Future research could perhaps investigate the effect of different corpus-based approaches in increasing learners’ functional knowledge of familiar lexical items. For example, in addition to investigating paper-based teacher-prepared DDL activities (Boulton, 2010a; Johns, 1991), one exciting avenue could be to explore their electronic counterparts via tablet computers that offer more interactive and tactile affordances.
A number of scholars note that corpus consultation may have its brightest future outside the classroom as it affords students a high degree of autonomy (Chambers, 2007; Yoon & Hirvela, 2004). To see this prediction come to fruition, we recommend focusing efforts on making already excellent resources such as COCA even more accessible to casual learners. Perhaps corpora designed for hands-on use by learners can afford to sacrifice some depth and functionality in exchange for accessibility and intuitiveness. Corpus-based language learning might see even wider adoption if vetted, principled corpora were as accessible and intuitive as, say, Google searches.