1 Introduction
In the course of its development as a discipline, corpus linguistics has had a significant effect on the field of applied linguistics (Hunston, 2002). Its effects can be seen in materials development (e.g., dictionaries, usage manuals, grammar books, and course books), in test development (Taylor & Barker, 2008), and in the fact that many introductory books on second language acquisition (SLA), language teaching, language testing, and materials development include sections on corpus linguistics or its applied domains (e.g., Loewen & Reinders, 2011; Long & Doughty, 2011; Mackey & Gass, 2012; Shohamy & Hornberger, 2008; Tomlinson, 2013).
Corpus research also has a deep connection with computer-assisted language learning (CALL). Among the pedagogical applications of corpora, such as the learning and teaching of vocabulary and phraseology, grammar, pragmatics, writing, reading, speaking, listening, and translation (e.g., Aijmer, 2009; Aston, 2001; Flowerdew, 2012; O’Keeffe, McCarthy & Carter, 2007; Reppen, 2010; Römer, 2010; Sinclair, 2004), direct applications in which learners themselves get hands-on experience of using a corpus for learning purposes, often with guided tasks or materials, are called “data-driven learning” (henceforth DDL). The term was coined by Johns (1990) more than 20 years ago, who suggested that the “language-learner is also, essentially, a research worker whose learning needs to be driven by access to linguistic data—hence the term ‘data-driven learning’ (DDL) to describe the approach” (p. 2). It should be noted here that we accept a broad definition of DDL (Römer, 2011), involving both hands-on uses (i.e., direct searches of corpora by learners) and hands-off uses (i.e., searches of corpora by teachers, who prepare paper-based materials for learners; so-called ‘paper-based’ DDL).
Research to date suggests that the benefits of DDL include input enhancement from multiple contexts in a concordancer, rich exposure to authentic language use, awareness raising (or noticing) of patterns and forms, cognitive and meta-cognitive development, improved skills and communicative ability, heightened motivation, student-centeredness, and inductive learning. All of these benefits could result in greater autonomy and life-long learning (Boulton, 2009, 2010; Gilquin & Granger, 2010; Lin & Lee, 2015; O’Sullivan, 2007; C. Yoon, 2011).
In the field of CALL, research papers on DDL have appeared in the major international journals (e.g., Boulton & Pérez-Paredes, 2014; Geluso, 2013). Along with other corpus-integrated activities, the DDL approach has come to be regarded as a language learning and teaching methodology. It has mainly been used to show concordance lines to students for the purpose of learning and teaching lexico-grammatical patterns and grammatical structures. Especially in the field of English for Academic or Specific Purposes (EAP/ESP), it has often been used as a second language (L2) writing and reference tool (e.g., Friginal, 2013; H. Yoon & Jo, 2014). Although it is generally assumed that DDL works only for advanced learners and requires extensive training (e.g., Kennedy & Miceli, 2001), researchers have found that this is not necessarily the case, arguing that, with the aid of teacher-prepared materials, DDL works equally well for lower-level learners without lengthy training in using a concordancer (e.g., Boulton, 2009; Chujo & Oghigian, 2012). Boulton (2010), for example, has shown that paper-based DDL activities prepared by teachers can bring immediate benefits to lower-level learners and can counter potential barriers that inhibit the use of DDL in the classroom.
Recent years have seen a growing body of research that examines the effects of DDL in the classroom. For example, DDL has proved effective in teaching phrasal verbs (Azzaro, 2012), in improving the use of linking adverbials, reporting verbs, and verb tenses in academic writing (Friginal, 2013), and in the acquisition of lexico-grammatical patterns, as shown in improved accuracy and complexity in L2 writing (Huang, 2014). Frankenberg-Garcia (2014) showed that DDL could be useful for both receptive and productive purposes. Using paper-based DDL, Smart (2014) reported positive results indicating that DDL helped learners improve their grammatical ability with the passive voice. It should be emphasized that all the studies cited above as examples of research into the effectiveness of DDL as a teaching methodology provide evidence that experimental groups with DDL outperformed control groups without DDL, supporting the use of DDL in the classroom over other teaching methods and techniques. Further, there are a number of narrative syntheses of DDL, such as Cheng (2010: 320), who concludes that “DDL has been found to be a useful language learning methodology, and there is evidence that learners can indeed benefit from being both language learners and language researchers.” However, it is plausible that only advocates of DDL or corpus-based teaching methodology report positive evidence for DDL. Thus, more reliable, quantitative evidence of the effectiveness of DDL as a teaching methodology, at the level of meta-analysis, is necessary in order to evaluate the effects of its use objectively.
Meta-analysis is a secondary research methodology for statistically combining research outcomes expressed in a quantifiable unit and measured with instruments such as tests and psychological scales (e.g., Norris, 2012). With meta-analysis, researchers can put forward more rigorous empirical evidence, as it is an established method for integrating the results of primary studies. Recently, Cobb and Boulton (2015) conducted a meta-analysis of DDL studies covering outcomes such as improving writing, learning vocabulary and grammar, reading comprehension, and noticing skills. Their meta-analysis included 21 DDL studies out of 116 (from 1989 to 2012) that reported the descriptive statistics essential for calculating effect sizes, such as the number of participants, means, and standard deviations. Cobb and Boulton reported that the combined effect size for the between-group contrasts (k = 13) was d = 1.04, 95% CI [0.83, 1.25], and for the pre-post or within-group contrasts (k = 8) it was d = 1.68, 95% CI [1.36, 2.00]. Both can be regarded as large effect sizes following the criteria of Plonsky and Oswald (2014). From these results, and especially by comparing them with results from meta-analyses of instructed SLA in general (Norris & Ortega, 2000; Spada & Tomita, 2010) and CALL in particular (Grgurović, Chapelle & Shelley, 2013), Cobb and Boulton concluded that corpus use in the classroom is effective and results in sizable gains in the outcome measures.
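To make the pooling of effect sizes concrete, the following is a minimal R sketch of a random-effects meta-analysis using the metafor package. The effect sizes and sampling variances are invented for illustration; they are not the data analysed by Cobb and Boulton (2015), and metafor itself is an assumption rather than the procedure used in that study.

```r
# Minimal sketch: pooling standardized mean differences (Cohen's d) from
# hypothetical primary studies with the 'metafor' package. The d values and
# sampling variances below are invented for illustration only.
library(metafor)

dat <- data.frame(
  study = paste0("Study ", 1:5),
  d     = c(0.85, 1.10, 0.60, 1.40, 0.95),      # hypothetical effect sizes
  vi    = c(0.040, 0.055, 0.030, 0.070, 0.045)  # hypothetical sampling variances
)

# Random-effects model (REML estimation)
res <- rma(yi = d, vi = vi, data = dat, method = "REML", slab = study)
summary(res)   # pooled d, 95% CI, and heterogeneity statistics
forest(res)    # forest plot of individual and pooled effects
```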
Despite its reported effectiveness, DDL has not become part of mainstream teaching practice. This may be because, as Gilquin and Granger (2010) rightly pointed out, problems and limitations of DDL exist in (a) the logistics, (b) the teacher’s point of view, (c) the learner’s point of view, and (d) the content of DDL. As for (a) the logistics, we tacitly assume that DDL involves the use of concordancing software in a classroom where computers are available. However, there are a few studies that report the effectiveness of paper-based materials as a variant form of DDL activity (e.g., Boulton, 2010; Chujo, Anthony, Oghigian & Uchibori, 2012; Chujo & Oghigian, 2012). Considering that beliefs and self-efficacy play a pivotal role in learning and teaching a second or foreign language (e.g., Dörnyei, 2005), (b) the teacher’s point of view and (c) the learner’s point of view toward DDL should be of paramount importance, especially when those new to the DDL approach have negative opinions about it or no idea what it is. It has been reported that teachers do not normally learn how to use corpora in teacher training courses (Granath, 2009; Römer, 2009). At the same time, teachers may not have adequate computer expertise or may find preparing DDL materials time-consuming (Lin & Lee, 2015). Learners share similar difficulties with teachers in utilizing the DDL approach, and without appropriate support, factors such as lack of confidence in using a concordancer, its time-consuming nature, and the difficulty of interpreting concordance lines may impede active use of DDL (Geluso & Yamaguchi, 2014). Yet studies that attempt to change the viewpoints of teachers or learners toward DDL have not been conducted thus far. Regarding (d) the content of DDL in an EAP or ESP course, although DDL could be used for communicative learning tasks (Cresswell, 2007) and such use in a communicative context is recommended (Aston, 2001), it is mostly either geared toward the learning and teaching of lexico-grammatical items or used as an L2 writing reference tool, as noted earlier. The specific content (of the corpus data and syllabuses) and the outcome measures in the DDL approach vary from study to study, which may limit the generalizability of the findings reported. Obviously, these concerns need to be further addressed and overcome: because research on DDL as a learning and teaching methodology will likely continue, we need to tackle each problem in the DDL approach more thoroughly and systematically.
In this study, of the four problems and limitations listed above, we focused on “the learner’s point of view.” In many of the empirical studies investigating DDL, attitudes toward the approach are reported and discussed (Boulton, 2009). However, no study to date has measured the learner’s point of view (i.e., the learner’s attitude toward DDL) with a psychometric “scale” (e.g., a Likert scale) comprising multiple items for each construct (also called a multi-item scale; Dörnyei & Taguchi, 2010: 23). The scale score, which can be obtained by adding (or averaging) item scores for similar questions, represents an underlying trait such as attitudes, beliefs, and other mental variables. DDL studies, in contrast, have conventionally employed a single-item scale, which supposedly taps into the target construct with only one item. For instance, H. Yoon and Hirvela (2004) asked students for their opinions (perceptions of the strengths and weaknesses) about using the Collins COBUILD Corpus in L2 writing. Although this study was highly revealing in that it provided an overview of students’ opinions about using DDL, only one item was used to measure each construct (e.g., “The corpus is more helpful than a dictionary for my English writing.”). As such, a student’s response to a specific item may have been unreliable due to measurement error. Because measurement error is virtually always present in any measurement, we need a scale consisting of multiple items rather than a single item for more accurate measurement. As an illustrative example, imagine a situation where a researcher intends to measure a learner’s grammatical ability with multiple-choice questions. Of course, the researcher would prepare more than one item to measure the learner’s grammatical ability, such as accurate use of the target irregular verbs, because measuring it with only one item would certainly entail measurement error (e.g., resulting from mood, fatigue, and the wording of the items). When it comes to questionnaires, however, researchers often fail to take measurement error into account. As Dörnyei and Taguchi (2010: 23) argue, “the notion of multi-item scales is the central component in scientific questionnaire design, yet this concept is surprisingly little known in the L2 profession.” They continue that “because of the fallibility of single items, there is a general consensus among survey specialists that more than one item is needed to address each identified content area” (p. 25). With multi-item scales, it is possible to calculate a reliability coefficient such as Cronbach’s α, which takes measurement error into consideration. Furthermore, it has been reported that “multi-item scales clearly outperform single items in terms of predictive validity” (Diamantopoulos, Sarstedt, Fuchs, Wilczynski & Kaiser, 2012). Note that some researchers argue against treating Likert scale items as interval scales because they are in fact ordinal (in which case adding or averaging item scores would be inappropriate). However, it is conventionally accepted that Likert scales can be treated as interval data when a wider range of response options is included (Hatch & Lazaraton, 1991).
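As a concrete illustration of the multi-item logic, the following R sketch simulates responses to a hypothetical four-item subscale on a 6-point scale, computes the scale score by averaging the items, and estimates Cronbach’s α with the psych package. The item names and data are invented; they are not items from the questionnaire developed in this study.

```r
# Minimal sketch: scale scores and Cronbach's alpha for a hypothetical
# four-item subscale with responses on a 1-6 scale. Data are simulated;
# item1-item4 are not items from the actual questionnaire.
library(psych)

set.seed(1)
n     <- 200
trait <- rnorm(n)                                  # latent attitude
make_item <- function(trait) {
  raw <- 3.5 + trait + rnorm(length(trait), sd = 0.8)
  pmin(pmax(round(raw), 1), 6)                     # clip responses to the 1-6 range
}
items <- data.frame(
  item1 = make_item(trait), item2 = make_item(trait),
  item3 = make_item(trait), item4 = make_item(trait)
)

scale_score <- rowMeans(items)   # averaging item scores gives the scale score
psych::alpha(items)              # Cronbach's alpha with item-total statistics
```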
Although studies employing a questionnaire with a single-item approach can shed light on learners’ attitudes toward DDL, for the reasons stated above we need to develop a psychometric scale to further investigate and advance research on DDL. This is especially true considering that DDL is claimed to foster autonomy and learner-centeredness in learning (Boulton, 2010; O’Sullivan, 2007). With a psychometrically valid and reliable scale for measuring learners’ attitudes toward DDL, sophisticated modern statistical analyses can be conducted to further understand the relationship between learners’ attitudes toward DDL and other potentially influencing variables in a more scientific manner. Thus, the aim of the current study was to develop and validate a reliable multi-item scale (in the form of a questionnaire) for measuring learners’ attitudes toward DDL (i.e., perceived preferences and benefits).
2 Method
2.1 Treatments in previous studies
Before describing how we developed an item pool for the questionnaire, in this section we explain the treatments in previous studies, in the course of which we collected open-ended responses about the preferences and benefits learners perceive in the DDL approach. The second author of this article and her colleagues have been incorporating DDL into their classroom teaching practice since 2004 (Chujo, Anthony, Oghigian & Uchibori, 2012), targeting beginner-level English as a Foreign Language (EFL) university learners (A1 to A2 in the Common European Framework of Reference for Languages; CEFR, Council of Europe, 2001). The learners are Japanese nationals whose first language (L1) is Japanese, and they are mostly science and engineering majors at a private university in the Tokyo metropolitan area of Japan. DDL applications began as a four-week activity (once a week; 90 minutes) for learning vocabulary items. They have since been conducted with pedagogical modifications made every year, alongside the development of user-friendly web-based bilingual concordancers (Anthony, Chujo & Oghigian, 2011). Since 2006, they have been expanded to 20 weeks (i.e., two semesters of ten weeks; 90 minutes per week) with DDL syllabuses such as those listed in Table 1.
Note. NP stands for noun phrases; VP stands for verb phrases. Reprinted with permission of Chujo, Anthony and Oghigian (2009).
The DDL syllabuses were developed based on the focus of the course: preparing these beginner-level learners to take the Test of English for International Communication (TOEIC). In other cases, the focus involved learning rudimentary grammar rules for remedial-level learners (Chujo, Anthony, Oghigian & Yokota, 2013). The concordancers and teaching procedures have also been refined each year to meet the needs of the evolving DDL syllabuses. Thus, we present the 2008 DDL syllabus here as an example and provide a detailed description of the treatment. In this course, each class had a vocabulary component and a DDL grammar component. The vocabulary was grouped into categories such as business, personnel, meetings, and travel, and learners studied a total of 400 words over two semesters. In each class, learners first used a CALL vocabulary program to study 20 vocabulary words. Seven of those 20 words were used as the focal point of the subsequent DDL grammar lesson.
Both the grammar and the vocabulary items covered in the 2008 syllabus (Table 1) were selected following research findings identifying the grammar and vocabulary items that best prepare beginner-level learners for the TOEIC test (for details, see Chujo & Oghigian, 2012).
In teaching grammar and vocabulary items, the teachers have utilized bilingual corpora such as a Japanese-English newspaper corpus (Utiyama & Isahara, 2003) for TOEIC preparation courses, and the Corpus of Beginner Level English (Chujo et al., 2013) for remedial-level courses aimed at learning junior- or high-school-level grammar items. Such bilingual corpora and parallel concordancers enable learners to understand the target-language concordance lines by showing the context in both languages in Key Word in Context (KWiC) format, which is a prerequisite for effective use of DDL (Chujo, Anthony & Oghigian, 2009). The rationale behind using bilingual corpora and parallel concordancers was that L1 translations would help overcome the common difficulties (e.g., lack of confidence and difficulty in interpreting concordance lines) that lower-level learners face in using monolingual concordancers. The following four-stage approach, developed in 2008 (Chujo & Oghigian, 2008), has been employed in the classroom DDL activities (Chujo et al., 2013: 73):
Stage 1: Hypothesis formation through inductive DDL with hands-on tasks
Stage 2: Explanations from the teacher to confirm or correct these hypotheses
Stage 3: Hypothesis testing through follow-up exercises (homework) and teacher feedback on homework
Stage 4: Production through follow-up exercises (in class) and teacher feedback on homework
In Stage 1, students work in pairs or groups, sharing their findings and offering support to each other, and in doing so they arrive at hypotheses about the form and usage of a particular lexico-grammatical pattern (Figure 1). In Stage 2, the teacher explains the target items so that students can confirm whether their hypotheses were correct. In Stage 3, students are assigned additional practice and consolidation exercises as homework. In Stage 4, students again work together to complete the production practice exercises in class and the teacher gives feedback on the exercises.
The four-stage approach may seem like a traditional class procedure in which a teacher-centered, deductive PPP (presentation, practice, and production) method is used. That, however, is not the case. In Stage 1, students play an active role in formulating their hypotheses, exchanging their ideas with peers in pairs or in groups, and discovering grammatical patterns by themselves. This type of interaction helps promote scaffolding in the Zone of Proximal Development (e.g., van Lier, 2004), wherein learners with different levels of proficiency and learning styles can help each other. The teacher’s role at this point is therefore that of a facilitator, encouraging discovery learning. From Stage 2, explicit explanation takes place, in which the deductive PPP teaching method is used. In this sense, our four-stage approach is a hybrid of an inductive DDL approach and a deductive grammar teaching method, one very similar to “guided induction” (Flowerdew, 2009; Smart, 2014).
Chujo and Oghigian (2012) report that, in their studies using the DDL approach described above, student gains in the target lexico-grammatical items and in overall proficiency, measured with a pre-post test design, have been consistently positive over the years. Moreover, Mizumoto and Chujo (2015) conducted a meta-analysis of their DDL studies with a pre-post design and concluded that the synthesized results, classified by outcome measure, showed that the DDL approach worked particularly well for learning vocabulary items (k = 4, d = 2.93, 95% CI [2.19, 3.67]). Other effects were calculated for basic grammar items (k = 9, d = 0.81, 95% CI [0.69, 0.93]), noun and verb phrases (k = 14, d = 0.86, 95% CI [0.73, 0.99]), and proficiency (k = 5, d = 0.40, 95% CI [0.22, 0.58]). Note that the proposed criteria for d values from pre-post or within-group contrasts are: 0.60 small, 1.00 medium, and 1.40 large (Plonsky & Oswald, 2014).
In addition to these sizable gains in learning outcomes, students have consistently reported that they enjoyed the DDL approach and found it useful and effective for learning grammar and vocabulary. For example, Figure 2 illustrates the accumulated feedback from students on the perceived effectiveness of DDL activities for grammar and vocabulary learning in the teaching practice of four consecutive years from 2006 to 2010 (N = 103).
As Figure 2 shows, a majority of students perceived the DDL activities for learning grammar and vocabulary items as effective, with 64.1% of the students feeling positive about DDL activities for learning grammar and 70.9% for learning vocabulary.
2.2 Development of the item pool
In their series of classroom applications of DDL stretching over ten years (described above), the second author of this article and her research team have amassed students’ reflective open-ended responses (i.e., perceived preferences and benefits) to questions about DDL at the end of each course. Some of their studies were concerned with organizing these reflective comments into categories and coding them accordingly. Chujo et al. (2009) performed a text analysis (i.e., text mining) of these open-ended responses and found that words such as can, word, usage, variety of, examine, observe, search, and immediately were used more frequently than other words, suggesting that the beginner-level learners in the primary studies appreciate that they can examine and observe a variety of examples of authentic usage immediately and easily. In writing the questionnaire items, we referred to the original contexts in which those words were used. Although negative keywords such as eyestrain, lack of time, and initial settings, and open-ended responses such as “It first required time to get used to the search operation,” were also reported (Chujo et al., 2009), we decided to focus on the perceived preferences and benefits of DDL and did not include the drawbacks. This is because, in unidimensional scale development (i.e., measuring only one overarching construct with an instrument), it would be theoretically and statistically unsound to attempt to measure mixed constructs. In other words, we could not, in theory, create a one-size-fits-all measurement instrument. While recognizing the potential pedagogical value of including drawbacks in the questionnaire items, we chose statistical and methodological rigor over practical utility, a trade-off that is inevitable in a scale development study like this one.
By referring to these open-ended responses and to items or categories in previous studies (e.g., H. Yoon & Hirvela, 2004), we developed a list of questionnaire items. The wording of the items was examined and modified where necessary by the authors of this article. In total, 18 items were developed as a result of this procedure and included in the questionnaire (see Table 3 for the items).
2.3 Field-testing the questionnaire
The questionnaire was field tested with 267 university EFL learners (science and engineering majors, 226 males and 41 females, aged 18–20) at a private university in Japan. Their proficiency, measured with the TOEIC Bridge test, was at the beginner level (n = 255, M = 131.73, SD = 14.08; the score is roughly equivalent to 350 on the TOEIC test), which, according to Educational Testing Service (2007), is classified as “Basic User” (A1 to A2) in the CEFR. We focused on learners at this proficiency level because approximately 80% of the target Japanese EFL learners (university undergraduates) are at that level (Tono & Negishi, 2012). Although the proficiency measures of the respondents in the previous studies were not available, the authors confirmed, based on observations and in-house examinations (and also from the fact that the DDL intervention had been conducted at the same university for the past ten years), that the participants who responded to the questionnaire were at about the same proficiency level as the samples in the previous studies.
The questionnaire was administered after a three-month DDL intervention (following the DDL syllabus and procedures explained in Chujo et al., 2012) as part of a compulsory English course at the participants’ university. The participants responded to the 18-item questionnaire on a 6-point scale: 1 (not at all true of me), 2 (not true of me), 3 (somewhat not true of me), 4 (a little true of me), 5 (mostly true of me), and 6 (very true of me), according to the degree of perceived preferences and benefits of DDL. The decision to use a 6-point scale (an even number, without a ‘middle’ option) was based on a suggestion by Dörnyei and Taguchi (2010: 28).
After the questionnaire was pilot tested, item analyses were carried out, consisting of the following steps: (a) examining the item-total correlations to determine whether the figures were over 0.3, a suggested criterion for this index of item analysis (Wintergerst, DeCapua & Itzen, 2001: 391); (b) using exploratory factor analysis to investigate which items belong together (i.e., construct validity); and (c) scrutinizing Cronbach’s α levels to verify the internal consistency of the subscales.
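A minimal R sketch of steps (a) and (c) might look as follows, using the psych package; the package choice and the data frame name ddl_items are assumptions for illustration, as the paper does not state which R packages were used.

```r
# Minimal sketch of item-analysis steps (a) and (c), assuming the 18 item
# responses are stored in a data frame 'ddl_items' (hypothetical name).
library(psych)

ia <- psych::alpha(ddl_items)

# (a) corrected item-total correlations; flag items below the 0.3 criterion
item_total <- ia$item.stats$r.drop
names(item_total) <- rownames(ia$item.stats)
item_total[item_total < 0.3]

# (c) internal consistency (Cronbach's alpha) of the item set
ia$total$raw_alpha
```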
R version 3.1.0 (R Core Team, 2014) was used for all the statistical analyses in this study. In the exploratory factor analysis, maximum likelihood extraction with promax rotation was performed. To decide on the number of factors, we first inspected the scree plot for any distinctive drop between factors and retained those with eigenvalues greater than 1.0, a commonly accepted criterion for extracting factors. Then, items showing factor loadings above 0.4 on only one factor were adopted. After these screening procedures, 16 of the 18 items were retained for the final version of the questionnaire.
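A sketch of this exploratory factor analysis in R might look as follows, again assuming the psych package and the hypothetical ddl_items data frame; the study itself reports only that R 3.1.0 was used.

```r
# Minimal sketch of the exploratory factor analysis: maximum likelihood
# extraction with promax rotation, scree plot, and the eigenvalue > 1 rule.
# 'ddl_items' is a hypothetical data frame of the 18 item responses.
library(psych)

# Scree plot and eigenvalues to help decide on the number of factors
scree(ddl_items)
eigen(cor(ddl_items, use = "pairwise.complete.obs"))$values

# Two-factor solution with ML extraction and promax rotation
efa <- fa(ddl_items, nfactors = 2, fm = "ml", rotate = "promax")
print(efa$loadings, cutoff = 0.4)  # retain items loading > .4 on one factor only
```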
Table 2 shows the results of the exploratory factor analysis. Two factors were extracted and, after extensive discussion among the authors, they were named after advantages of DDL reported in the literature (e.g., Johansson, 2009; Kennedy & Miceli, 2001; O’Sullivan, 2007). Table 3 lists the questionnaire items.
Note. Items are renumbered for clarity.
The first factor was “Clarity,” which consists of items such as “I can see the target sentences in real use” (Item 05) and “I can see many sentences that include the target structure” (Item 01). These items are intended to measure the advantages of DDL in clarifying the authentic use of target structures. The second factor was “Autonomy,” with items such as “this type of learning is not passive but active” (Item 12) and “I can search for and learn target sentences independently” (Item 11). These items seem to represent the extent to which the learners embrace the autonomy of learning in the DDL approach. Internal consistency reliability (Cronbach’s α) of the items representing each factor was high, at .91 for Clarity and .81 for Autonomy. The Pearson correlation coefficient between Clarity and Autonomy was .60, which is reasonably high, suggesting that these two subscales (i.e., Clarity and Autonomy) measure related constructs under an overarching theme, “perceived preferences and benefits of DDL.”
2.4 Administering the questionnaire
In order to validate the factor structure, the 16-item questionnaire targeting the perceived preferences and benefits of DDL was administered to another group of 147 EFL learners. The major and proficiency level of the participants were about the same as those of the learners in the pilot study (science and engineering majors, 123 males and 24 females, aged 18–20; TOEIC Bridge test score, n = 140, M = 131.00, SD = 15.82). We administered the questionnaire to learners who took the same courses incorporating the DDL approach at the same university in the year following the pilot study. As in the pilot study, the questionnaire was administered after a three-month DDL intervention as part of a compulsory English course at their university. The participants again responded to the 16-item questionnaire on a 6-point scale, from 1 (not at all true of me) to 6 (very true of me), according to the degree of perceived preferences and benefits of DDL.
The construct validity of the questionnaire was examined with confirmatory factor analysis (CFA) because the decisions about the factor models had been made a priori (from the pilot study). In order to ensure the unidimensionality (i.e., measuring only one underlying trait) of the subscales of the questionnaire, CFA was conducted for each subscale, Clarity and Autonomy. Based on a criterion for higher convergent validity (Hair, Black, Babin, Anderson & Tatham, 2006: 777), items with standardized loading estimates higher than 0.5 were used for further analysis. This procedure left five items each for the Clarity and Autonomy measures. In structural equation modeling (SEM), an umbrella term for a collection of methods including CFA, a set of observed variables (i.e., the questionnaire items in this study), latent variables (i.e., underlying traits that can be represented with the subscales of the questionnaire: Autonomy and Clarity, in this study), and measurement errors can be modeled simultaneously. Because SEM tests whether a hypothesized model is consistent with the observed data, several fit indices are examined to evaluate the overall fit of the structural model. In this study, we checked the comparative fit index (CFI), the Tucker-Lewis index (TLI), the root mean square error of approximation (RMSEA), and the standardized root mean square residual (SRMR). The acceptable criterion values for these fit indices are CFI and TLI > .90, and RMSEA and SRMR < .08. Although the chi-square value, from which a familiar p value can be computed, is the basis of these indices, evaluating goodness of fit with the chi-square value is not recommended because it is sensitive to sample size and other factors (Hair et al., 2006). Thus, it is common practice in SEM to examine and compare several goodness-of-fit indices, which do not come with p values. The one-factor model provided a good fit to the data for both the Autonomy subscale (CFI = .97, TLI = .94, RMSEA = 0.09 [90% CI = 0.01–0.17], SRMR = .04) and the Clarity subscale (CFI = 1.00, TLI = 1.00, RMSEA = 0.00 [90% CI = 0.00–0.11], SRMR = .02).
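The per-subscale CFAs could be specified as in the following sketch, using the lavaan package; lavaan, the data frame name ddl_data, and the item labels c1–c5 and a1–a5 are assumptions for illustration, since the paper states only that R was used.

```r
# Minimal sketch of the one-factor CFAs for each subscale with 'lavaan'.
# 'ddl_data', c1-c5, and a1-a5 are hypothetical names for illustration.
library(lavaan)

clarity_model  <- 'Clarity  =~ c1 + c2 + c3 + c4 + c5'
autonomy_model <- 'Autonomy =~ a1 + a2 + a3 + a4 + a5'

fit_clarity  <- cfa(clarity_model,  data = ddl_data)
fit_autonomy <- cfa(autonomy_model, data = ddl_data)

# Standardized loadings (items with estimates above .5 were retained)
standardizedSolution(fit_clarity)
standardizedSolution(fit_autonomy)

# Fit indices reported in the study
fitMeasures(fit_clarity,  c("cfi", "tli", "rmsea", "srmr"))
fitMeasures(fit_autonomy, c("cfi", "tli", "rmsea", "srmr"))
```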
In addition to the questionnaire asking about the perceived preferences and benefits of DDL, six items were prepared (Table 4) to measure task values, which can be defined as individual beliefs about the relative worth of tasks. These items were written by referring to items such as those in the Motivated Strategies for Learning Questionnaire (MSLQ; Pintrich, Smith, Garcia & McKeachie, 1993) and administered to the same participants. The rationale behind measuring task values is that we can expect a relationship between the perceived preferences and benefits of DDL and the task values. That is, if one favors and appreciates the benefits of DDL, it is likely that one’s task values will increase, and probably vice versa as well (i.e., high task values would correspond to higher degrees of perceived preferences and benefits of DDL). This would provide us with convergent evidence of the construct validity of the proposed scale (i.e., the external aspect of validity in Messick, 1995). With CFA, the one-factor model of task values with six items (see Appendix B) showed a good fit to the data (CFI = .97, TLI = .95, RMSEA = 0.12 [95% CI = 0.07–0.17], SRMR = .04). The relationship between the perceived preferences and benefits of DDL and task values was also explored with a path analysis using SEM, and the model fit was evaluated. For the purpose of transparent sharing of the data and analyses, the data and R code used in this study are available online (http://mizumot.com/files/ReCALL_DDL.html).
3 Results and Discussion
Internal consistency reliability (Cronbach’s α) and descriptive statistics for all the scales are displayed in Table 5. The Cronbach’s α reliability coefficients for the three scales, Clarity, Autonomy, and Task Value, were relatively high. The three scales also showed reasonably high correlation coefficients (i.e., all values above .50). According to Dörnyei (2007: 223), “in applied linguistics research we can find meaningful correlations of as low as 0.3–0.5 . . . and if two tests correlate with each other in the order of 0.6, we can say that they measure more or less the same thing.” Furthermore, a more recent meta-analysis of L2 studies by Plonsky and Oswald (2014) suggests that a correlation coefficient of .60 can be considered a large effect (i.e., a strong relationship). Applying these criteria, we can argue that the two subscales of perceived preferences and benefits of DDL (i.e., Clarity and Autonomy) had a strong correlation (r = .58). Each of these two subscales also showed a high correlation with Task Value (Clarity and Task Value, r = .58; Autonomy and Task Value, r = .62). That is, the more positive the learners’ attitudes toward the perceived preferences and benefits of DDL (i.e., Clarity and Autonomy), the higher their Task Value would be, and vice versa. These patterns reflected the hypothesized relationship between the perceived preferences/benefits of DDL and task values.
Note. Correlation between scale scores (below diagonal); SEM result (above diagonal). Possible range of item response (M): 1 to 6.
Figure 3 shows the result of the SEM analysis, which explored the relationship between the perceived benefits of DDL and task values. The model assumes that the higher-order factor (i.e., Perceived Benefits of DDL) affects the two factors of DDL (i.e., Clarity and Autonomy), and that the higher-order factor correlates with Task Value. In Figure 3, the variables are illustrated with boxes and circles: boxes represent observed variables (i.e., questionnaire items) and circles represent latent variables (i.e., constructs or underlying traits that affect the response to each questionnaire item). The arrows show the standardized path coefficients, with a possible range of –1 to +1. These paths represent the strength of the relationship between variables, with higher values indicating stronger relationships. The directions of the arrows indicate the hypothesized causality within the model. For each item and for the two factors (Clarity and Autonomy), measurement errors, which cannot be explained by the hypothesized model, are also displayed in Figure 3. The ability to incorporate measurement error in the model in this way is a well-known advantage of SEM in data analysis.
The fit indices revealed that the hypothesized model provided an adequate fit to the data (CFI = .93, TLI = .92, RMSEA = 0.08 [95% CI = 0.07–0.10], SRMR = .06). Because examining the higher-order factor (latent variable) and its relationship with other variables (in this case, Task Value), as well as evaluating model fit indices, is not possible with ordinary correlation analyses (Table 5), the hypothesized model in Figure 3 provides stronger statistical evidence that Perceived Benefits of DDL and Task Value positively affect each other (with a standardized correlation coefficient of .82).
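A structural model of the kind shown in Figure 3 could be specified in lavaan roughly as follows; lavaan and all variable names (c1–c5, a1–a5, t1–t6, ddl_data) are illustrative assumptions rather than the authors’ actual code, which is available at the URL given above.

```r
# Minimal sketch of a higher-order model like the one in Figure 3:
# Perceived Benefits of DDL loads on Clarity and Autonomy and correlates
# with Task Value. All variable names are hypothetical.
library(lavaan)

model <- '
  # first-order measurement models
  Clarity   =~ c1 + c2 + c3 + c4 + c5
  Autonomy  =~ a1 + a2 + a3 + a4 + a5
  TaskValue =~ t1 + t2 + t3 + t4 + t5 + t6

  # higher-order factor for the two DDL subscales
  PerceivedBenefits =~ Clarity + Autonomy

  # correlation between the higher-order factor and Task Value
  PerceivedBenefits ~~ TaskValue
'

fit <- sem(model, data = ddl_data)
summary(fit, standardized = TRUE, fit.measures = TRUE)
fitMeasures(fit, c("cfi", "tli", "rmsea", "srmr"))
```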
Taken together, with adequate internal consistency reliability coefficients, high inter-correlations, and convergent evidence of construct validity (i.e., the expected relationship between the subscales and task values), the two subscales of the questionnaire serve as a valid and reliable measure of learners’ perceived preferences and benefits of DDL. This study is unique in that it provides a research and pedagogical instrument for assessing the effectiveness of DDL (and possibly of CALL in general) from the viewpoint of learners. As such, the questionnaire developed in this study will be useful for systematic inquiry into learners’ attitudes toward DDL.
Although we can justifiably claim that we have achieved our initial research purpose of developing a psychometrically sound instrument for assessing learners’ perceived preferences and benefits of DDL, the current study has a few limitations that should be acknowledged. First, the participant groups (in both the pilot and the main study) were homogeneous, composed of false-beginner-level learners. Second, the study took place at only one university in Japan. It can be assumed that the socio-cultural context learners are in will influence the way they perceive DDL. Therefore, studies on the effectiveness of DDL with more proficient learners, at different institutions, and in L2 or other foreign language learning settings should be conducted to establish the generalizability of the instrument. Third, the questionnaire items were created based on a “guided induction approach” (Flowerdew, 2009; Smart, 2014), in which learners use self-explanatory guided worksheets created by teachers and mainly engage in exploring lexico-grammatical items in a classroom. As pointed out by Johansson (2009: 41) and Flowerdew (2012: 197), though DDL is usually associated with an inductive approach, in actual DDL classroom activities a “deductive” DDL approach is often used. Such use of DDL is in stark contrast to inductive DDL, where advanced learners consult the concordance lines themselves with much less help from the teacher (Boulton, 2010; Cresswell, 2007; Flowerdew, 2009). Because the scale developed in the current study is mainly intended for the former purpose (i.e., deductive DDL), and because it is possible that, given different tasks, learners would respond to each item differently, caution should be exercised when researchers employ the scale for the latter purpose (i.e., inductive DDL). Fourth, the developed scale deals with learners’ perceived preferences and benefits of DDL; problems or difficulties in corpus use (H. Yoon & Hirvela, 2004) were not included in the constructs it measures. This is, as described earlier in this paper, because it is not possible to create a one-size-fits-all measurement instrument, and we chose statistical and methodological rigor accordingly. Thus, researchers who are interested in the negative aspects of DDL will need to devise a similar psychometrically based multi-item scale to investigate that topic precisely and in more detail.
These limitations notwithstanding, the findings of the current study have several implications for research and practice involving DDL and CALL in general. First, the questionnaire could serve as a tool to investigate, with much greater precision of measurement, the relationship between attitudes toward DDL and outcomes (i.e., test scores) or other learner variables such as learning styles, learning strategies, motivation, and self-regulation (e.g., Dörnyei, 2005). In this sense, although the current study fits previous research on learners’ assessment of the benefits of DDL, it does not confirm or contradict previous findings in a directly comparable manner. In other words, it opens up a new, more sophisticated, and more desirable assessment dimension for researchers and practitioners involved in DDL. Second, the instrument developed in this study can be used for the evaluation of DDL practice, which in the past could be achieved only with test scores or open-ended responses from learners. With this instrument, it is possible to quantify attitudes toward DDL use in language learning and teaching. Third, and related to the second implication, meta-analysis or research synthesis of DDL in terms of how learners perceive it is now feasible, because the questionnaire developed here is a quantifiable, valid, and reliable scale of the trait in question (i.e., learners’ perceived preferences and benefits of DDL), making it possible to synthesize and evaluate learners’ attitudes toward DDL in a quantifiable fashion. As these implications illustrate, the newly developed scale has the potential to greatly expand the research possibilities of DDL and of CALL in general.
4 Conclusion
The primary purpose of this study was to develop and validate a psychometrically sound scale to measure learners’ perceived preferences and benefits of DDL, in order to provide a valid and reliable instrument for research and pedagogical purposes. Through the development phases, it was confirmed that the instrument possesses sound psychometric properties as a measure of learners’ perceived preferences and benefits of DDL.
Based on the findings, we suggest that researchers who investigate the role of DDL as a teaching methodology can now utilize the psychometrically justifiable multi-item scale developed in the current study. Researchers can also study the relationship between the subscales of the developed questionnaire and other variables such as learner autonomy, which can allegedly be fostered by engaging in DDL activities.
Given that the research momentum of DDL is likely to continue, we believe the scale developed in this study holds promise as a valid and reliable measure of learners’ attitudes toward DDL, and we anticipate that more studies using this type of scale will be conducted.
Acknowledgements
This study was supported by JSPS KAKENHI Grant Numbers 26704006 and 25284108. We would like to thank the anonymous reviewers for their constructive comments and feedback to improve the quality of the paper.