1. Introduction
Concept mapping (CM) is a cognitive learning strategy and a visual road map that represents the pathways connecting the meanings of concepts (Novak & Gowin, 1984; Novak, Gowin & Johansen, 1983). Two notions form the theoretical framework of CM: “schema theory” (Rumelhart & Norman, 1978; Winn & Snyder, 1996) and Ausubel’s (1968) “theory of human cognitive learning,” often referred to in the literature as meaningful learning theory. According to schema theory, when learners are confronted with new information, they react in one of three ways: accretion, tuning, or restructuring. In accretion, the learner assimilates new knowledge into an existing schema without changing the overall schema. In tuning, the learner realizes that the existing schema is inadequate for the new knowledge and modifies it. In restructuring, the learner creates a new schema that resolves inconsistencies between the new knowledge and the old schema.
Ausubel’s (1968) theory of human cognitive learning focuses on how meaningful learning is used to acquire knowledge. At the core of the theory is the distinction between rote and meaningful learning. Ausubel theorized that new meaning develops as previously acquired concepts and propositions are reorganized and merged with new information. Likewise, while a concept map is being constructed, concepts are rearranged as the learner obtains new knowledge and a deeper understanding of the topic. Extending Ausubel’s legacy, Novak’s (2002) “theory of meaningful learning” connects cognitive gains with emotional sensitivity. According to Novak, meaningful learning is dynamic and is modified over time as new information is acquired.
One means of developing students’ academic competence and performance is meaningful writing practice. As an essential language skill, writing compels second/foreign language (L2) learners to think, concentrate, and organize their ideas. It also cultivates their analytical abilities and demands highly meaningful cognitive practice. At the same time, students learning to write in an L2 face multiple challenges, namely, deciding on the content knowledge related to a writing topic, making proper lexical and grammatical choices to form correct sentences, and organizing sentences into a paragraph or an entire essay. Hence, it is vital to identify efficient learning strategies and constructive planning tools, as composing a well-developed multiple-paragraph essay can be a daunting task for learners (Graham, Collins & Rigby-Wills, 2017). According to Novak (2002), rote learning contributes very little to the learners’ knowledge structure and hampers creative thinking and innovative problem-solving. To counter this inefficiency, CM was proposed as a means of augmenting conceptual understanding so that new concepts can be meaningfully assimilated into the learners’ pre-existing knowledge base. Focusing on knowledge structure in this way is assumed to promote meaningful learning.
Although modern practices treat pre-writing as a key component of the writing process, instructors in many educational settings give it little substantive thought (Chang, Sung & Chen, 2001; Gardner, 2015). Teachers often note that enhancing learners’ writing ability is easier said than done, as it requires ample time, energy, and expertise (Zumbrunn & Krause, 2012). In this process, learners’ lexical competence can be an asset to their written production. An essential ingredient of lexical competence is lexical diversity (LD), which refers to the range, or breadth, of vocabulary used in a text (Malvern, Richards, Chipere & Durán, 2004). For a text to be highly lexically diverse, the writer has to produce a wide range of different words, with little repetition of the words already used (McCarthy & Jarvis, 2007). Accordingly, the current study investigated the effectiveness of computer-aided concept mapping (CACM) as a pre-writing strategy, compared with the traditional outlining approach, on English as a foreign language (EFL) learners’ LD as a key component of linguistic competence in written production. In addition, the study sought to explore the relationship between learners’ LD and their writing quality. The following research questions were therefore posed:
1. Is CACM as a pre-writing strategy more effective than the traditional outlining approach in developing EFL learners’ LD?
2. Is there a relationship between LD (represented by D values) and the writing quality scores of the two conditions (CACM vs. outlining)?
2. Literature review
To date, the effects of various pre-writing strategies (e.g. outlining, free writing, brainstorming, mind mapping) on various aspects of English as a second language (ESL) and EFL learners’ writing ability have been investigated (e.g. Ellis & Yuan, 2004; Lan, Sung, Cheng & Chang, 2015; Ojima, 2006). In principle, skills such as meaningful organization of the text, critical thinking, and idea generation are held to benefit from these pre-writing strategies. Furthermore, several studies (e.g. Blunt & Karpicke, 2014; Novak & Gowin, 1984; Redford, Thiede, Wiley & Griffin, 2012) have proposed that CM is an effective strategy that can improve both the quality and quantity of written compositions. Nevertheless, most studies to date have adopted a paper-and-pencil approach to CM, and this traditional approach has several drawbacks (Chang et al., 2001; Reader & Hammond, 1994; Schau, Mattern, Zeilik, Teague & Weber, 2001). First, constructing and modifying concept maps in this way is often tedious and frustrating for students, and teachers occasionally find the resulting maps inconvenient to evaluate because they tend to look cluttered. Second, the development process of the concept maps cannot be recorded; hence, the quality of learning can only be judged on the basis of learners’ final products.
2.1 CACM and aspects of writing
In light of advances in computer-aided educational applications, a number of software packages have been developed to address the limitations of the paper-and-pencil approach. Mapping software enables learners to construct, revise, and record their individual maps and receive instant feedback from an instructor; meanwhile, the instructor is able to trace and further examine the students’ idea-generation trajectories. Thus, CACM is seen to have the potential to overcome most of the shortcomings of the paper-and-pencil approach. A plethora of studies (e.g. Gardner, 2015; Kwon & Cifuentes, 2009; Liu, 2011; Liu, Chen & Chang, 2010) corroborate the positive effect of CACM on learners’ writing performance in various dimensions, including accuracy, fluency, coherence, and syntactic complexity. In a related line of research, most studies suggest that LD plays a crucial role in distinguishing between high and low writing proficiency levels (Crossley, Salsbury & McNamara, 2012; Gebril & Plakans, 2016; González, 2017; Olinghouse & Wilson, 2013). However, some studies of LD and writing proficiency have found little or no association between the two variables (e.g. Jarvis, 2002; Wang, 2014; Yu, 2010). Indeed, evidence suggests that moderator variables such as first language (L1) background and planning strategies may affect this relationship (Jarvis, 2002). Dujsik (2008), for example, examined the effects of computer-aided pre-writing strategy instruction on intermediate ESL students’ strategy use, writing quantity, and writing quality. The results of this sequential mixed-methods study demonstrated a significant impact of training on the learners’ strategy use but failed to detect tangible effects on the quantity and quality of their writing.
A number of studies have examined the role of CACM as an instructional strategy. Sung (2008) compared the effects of computerized CM versus no mapping on the writing performance of 125 ESL undergraduate students. The results indicated that participants in the CM group, who used a software program called Inspiration, outperformed the comparison group in the content and organization of their writing, owing to the ample opportunity visualization gave them for planning their assignments. Elsewhere, Liu (2011) examined the effect of different CM treatments (no mapping, individual mapping, and cooperative mapping) during the pre-writing phase on the writing performance of ESL learners across proficiency levels. The participants were 94 university freshmen divided into high-, middle-, and low-proficiency groups on the basis of their baseline writing scores. Participants received all three treatments, one for each of three writing assignments. The results indicated that the two CACM conditions were more productive than the no-mapping condition. Furthermore, the sophistication of the learners’ concept maps was positively correlated with their writing performance. In a multiple-baseline study, Evmenova et al. (2016) investigated the impact of a computer-based graphic organizer (CBGO) on the quantity and quality of persuasive essays. The participants were 6th- and 7th-grade ESL students with high-incidence disabilities who went through three conditions: baseline (writing on the computer without the graphic organizer), CBGO use (writing on the computer with the graphic organizer), and maintenance (writing on the computer without the graphic organizer).
Data were collected on the number of words, sentences, transition words, and essay parts, as well as a holistic writing quality score; both writing quality and quantity improved significantly more in the CBGO condition than in the baseline and maintenance conditions. The CBGO used in the second condition allowed students to approach planning and writing either vertically (i.e. brainstorm all parts first and then write complete sentences) or horizontally (i.e. brainstorm one idea and turn it into a complete sentence before moving to the next one). Furthermore, Gardner (2015) explored the effects of CACM versus so-called four-square graphic organizers on the quality of ESL students’ persuasive essays. The four-square organizer is a pre-writing strategy in which a complete topic sentence is written in the center of a large square that is divided into four smaller squares of equal size. Learners then write three sentences that develop the thesis statement. The results indicated that CACM had a stronger effect on persuasive content and engagement with content.
2.2 Lexical diversity and writing
Having discussed the capability of CACM at the pre-writing stage, we now turn to the role of lexical competence in the actual writing process. Research suggests that lexical knowledge is a crucial factor in writing proficiency. Several studies (Crossley & McNamara, 2012; Crossley et al., 2012; Crossley, Salsbury, McNamara & Jarvis, 2011; Engber, 1995; Gebril & Plakans, 2016; González, 2017; Goodfellow, Lamy & Jones, 2002) have observed the central role of LD in academic writing proficiency. Gebril and Plakans (2016), for example, investigated the influence of textual borrowing on LD and differences in LD across score levels on integrated tasks. To this end, 130 ESL students from a Middle Eastern university completed a reading-based integrated task that required them to read two passages on “global warming” and write an argumentative essay on the topic. LD in the writing samples was measured with the CLAN software, and the essays were scored using the Test of English as a Foreign Language (TOEFL) iBT integrated writing task rubrics. The results demonstrated a positive correlation between LD and integrated writing scores, with a fairly large effect size.
In another study, Crossley et al. (2012) explored how L2 texts can be classified using computational indices that characterize lexical competence. To do so, they analyzed writing samples from 100 L2 learners at various proficiency levels using the lexical indices available in Coh-Metrix 3.0. The writing samples were categorized as beginning, intermediate, or advanced based on the learners’ scores on the Institutional TOEFL or the TOEFL iBT and on the ACT ESL Compass reading and grammar tests. A discriminant function analysis was used to predict the level of each text from lexical indices related to breadth of lexical knowledge (i.e. word frequency and LD), depth of lexical knowledge (hyponymy, polysemy, and word meaningfulness), and access to core lexical items (word concreteness and familiarity). Word frequency, LD, and word familiarity were the strongest predictors of writing proficiency level. Further evidence came from González (2017), who investigated the extent to which lexical frequency and LD contribute to writing proficiency scores for academic compositions by monolingual English-speaking writers and advanced multilingual writers. The data comprised 172 essays, which three independent raters evaluated using the TOEFL iBT independent writing task rubrics. A binary logistic regression indicated that LD contributed significantly more to the writing scores than lexical frequency did, thereby situating LD as an essential component of academic writing proficiency.
2.3 Measure of textual lexical diversity and other measures of lexical diversity
The concept of LD has interested linguists since the 1930s, and several indices have been introduced to measure it. As a traditional measure of the range of a writer’s vocabulary, LD has often been calculated through the type-token ratio (TTR) (Malvern & Richards, 2002), which is the ratio of the number of different words (types) to the total number of words (tokens). TTR is considered inherently flawed because it varies as a function of sample length: as a text grows, frequent words keep recurring, so new types accumulate more slowly than tokens and TTR falls. To counter this problem, several researchers developed indices based on algebraic transformations of TTR (e.g. Guiraud, 1954; Maas, 1972). Nevertheless, these TTR variations did not resolve the problem of text length. Yule’s K (Yule, 1944), a probability model of the changes that take place in the lexical frequency spectrum of a text as the text becomes longer, likewise remained influenced by text length when it was tested on texts containing several thousand tokens.
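To make the length problem concrete, the following Python sketch computes TTR and the three classic alternatives mentioned above. It assumes a naive whitespace tokenizer (a hypothetical helper, `tokenize`) and uses the textbook formulas: Guiraud’s R = V/√N, Maas’s a² = (log N − log V)/(log N)², and Yule’s K = 10⁴(Σ i²Vᵢ − N)/N², where N is the number of tokens, V the number of types, and Vᵢ the number of types occurring exactly i times. It is an illustration under these assumptions, not the tooling used in the studies cited.

```python
import math
from collections import Counter


def tokenize(text):
    """Hypothetical naive tokenizer: lowercase, split on whitespace, trim punctuation."""
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,;:!?\"'()")
        if word:
            tokens.append(word)
    return tokens


def ttr(tokens):
    """Type-token ratio: distinct words (types) divided by total words (tokens)."""
    return len(set(tokens)) / len(tokens)


def guiraud(tokens):
    """Guiraud's R: types divided by the square root of tokens."""
    return len(set(tokens)) / math.sqrt(len(tokens))


def maas(tokens):
    """Maas's a^2 with natural logs (lower = more diverse); needs at least 2 tokens."""
    n, v = len(tokens), len(set(tokens))
    return (math.log(n) - math.log(v)) / math.log(n) ** 2


def yules_k(tokens):
    """Yule's K from the frequency spectrum (lower = more diverse)."""
    n = len(tokens)
    # spectrum[i] = number of types that occur exactly i times
    spectrum = Counter(Counter(tokens).values())
    m2 = sum(i * i * v_i for i, v_i in spectrum.items())
    return 1e4 * (m2 - n) / (n * n)


short = tokenize("The cat sat on the mat.")
longer = tokenize("The cat sat on the mat. The dog sat on the rug.")
print(round(ttr(short), 3), round(ttr(longer), 3))  # 0.833 0.583
```

Doubling the text lowers TTR even though the writing style is unchanged, which is exactly the length dependence described above. Some implementations of Maas use base-10 logarithms, which rescales the value by a constant factor.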
Computational linguistics has since introduced a new generation of LD measures that aim to produce length-invariant estimates without discarding any data. One such measure is vocd-D, which was developed by Malvern and Richards (1997) and computationally implemented by McKee, Malvern and Richards (2000). Derived from the vocd algorithm, this measure counteracts the problem of sample size through random sampling and reference to ideal curves; the procedure is run three times to reach the optimum diversity (D) value. However, vocd-D is sensitive to short samples that contain fewer than 150 tokens (Koizumi, 2012; McCarthy & Jarvis, 2007; Owen & Leonard, 2002). McCarthy and Jarvis (2007, 2010) subsequently proposed HD-D (HD stands for hypergeometric distribution) as an alternative to vocd-D. HD-D estimates LD from the probabilities of word occurrence in a language sample (McCarthy & Jarvis, 2010). HD-D is mathematically less demanding than vocd-D, although the two indices are highly correlated and both are sensitive to short samples of fewer than 150 tokens (deBoer, 2014).
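HD-D lends itself to a compact sketch. For each word type, the hypergeometric distribution gives the probability that the type appears at least once in a random draw of 42 tokens (the sample size commonly used for HD-D); summing these probabilities and dividing by 42 yields the expected TTR of such a draw. The function below follows that description under our own assumptions (pre-tokenized input, no lemmatization); published tools may differ in details, and some report the sum without dividing by the sample size.

```python
import math
from collections import Counter


def hdd(tokens, sample_size=42):
    """Expected TTR of a random sample of `sample_size` tokens drawn without replacement.

    For a type with frequency f in a text of n tokens, the hypergeometric probability
    of drawing zero copies is C(n - f, s) / C(n, s); its complement is the chance that
    the type shows up in the sample.
    """
    n = len(tokens)
    if n < sample_size:
        raise ValueError("text is shorter than the sample size")
    total_draws = math.comb(n, sample_size)
    score = 0.0
    for freq in Counter(tokens).values():
        # math.comb returns 0 when sample_size > n - freq (the type cannot be missed)
        p_absent = math.comb(n - freq, sample_size) / total_draws
        score += (1.0 - p_absent) / sample_size
    return score


repetitive = ("the cat sat on the mat and the dog sat on the rug " * 5).split()
print(len(repetitive), hdd(repetitive))
```

Because the probabilities are computed exactly rather than by repeated random sampling, the result is deterministic, which is one reason HD-D is described as less demanding than vocd-D. The `ValueError` guard reflects the short-sample limitation noted above.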
The measure of textual lexical diversity (MTLD), developed by McCarthy (2005), is another contemporary measure. MTLD computes TTR sequentially, word by word, until it falls to a threshold value of 0.72; the total number of tokens is then divided by the number of such segments (factors), including a partial factor for any leftover segment. Two MTLD values are routinely calculated, one for forward processing and one for reverse processing, and their average produces the final value (Fergadiotis, Wright & Green, 2015; Koizumi, 2012). According to several studies, MTLD presents several advantages: it is more robust to text length variations than vocd-D or HD-D (e.g. Fergadiotis, Wright & West, 2013; Treffers-Daller, 2013), and it shows no text length bias for samples of between 100 and 2,000 tokens (Crossley, Salsbury & McNamara, 2009; McCarthy, 2005). Further support comes from González (2017), who needed a reliable instrument for calculating the D values of essays that varied greatly in length and therefore used the MTLD index in Coh-Metrix (Version 3.0), which produced reliable estimates of LD.
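The factor-counting logic of MTLD can be sketched in a few lines. This follows McCarthy and Jarvis’s (2010) published description as we understand it: a factor is counted each time the running TTR drops to 0.72 or below, a leftover segment contributes a partial factor of (1 − TTR)/(1 − 0.72), and the score is the token count divided by the number of factors, averaged over forward and backward passes. Text Inspector’s internal implementation may handle tokenization and edge cases differently; the function names here are our own.

```python
def mtld_one_pass(tokens, threshold=0.72):
    """One directional MTLD pass: number of tokens divided by the number of TTR factors."""
    factors = 0.0
    segment_types = set()
    segment_len = 0
    for token in tokens:
        segment_types.add(token)
        segment_len += 1
        # A full factor ends once the running TTR has fallen to the threshold.
        if len(segment_types) / segment_len <= threshold:
            factors += 1
            segment_types = set()
            segment_len = 0
    # Any leftover segment counts as a partial factor.
    if segment_len > 0:
        leftover_ttr = len(segment_types) / segment_len
        factors += (1 - leftover_ttr) / (1 - threshold)
    if factors == 0:
        # No word was ever repeated, so the text never lost diversity.
        return float("inf")
    return len(tokens) / factors


def mtld(tokens, threshold=0.72):
    """Average of the forward and backward passes."""
    forward = mtld_one_pass(tokens, threshold)
    backward = mtld_one_pass(tokens[::-1], threshold)
    return (forward + backward) / 2


print(mtld(["a", "b"] * 50))  # 100 tokens / 33 factors in each direction, about 3.03
```

Because a factor is closed whenever diversity is "used up", the score reflects how many words, on average, a writer sustains before repeating heavily, rather than the raw length of the text.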
To validate MTLD, McCarthy and Jarvis (2010) compared it against basic (TTR and Yule’s K) and sophisticated (vocd-D, HD-D, and Maas) competing indices. Two corpora were used in the study, the MJ corpus and the M&C corpus, with each text originally comprising approximately 2,000 words. Following common practice in LD assessment (e.g. McCarthy & Jarvis, 2007; McKee et al., 2000), each text of each register was divided into smaller sections (i.e. one section of 2,000, two sections of 1,000, etc.) to examine the sensitivity of the LD indices to texts of varying lengths, yielding a total of 1,584 textual units. The comparisons involved assessments of convergent, divergent, internal, and incremental validity. The results suggested that MTLD correlates highly with all the established LD indices and satisfies convergent validity to at least the same degree as other sophisticated and established LD indices. As for divergent validity, MTLD, like Maas, K, vocd-D, and HD-D, did not correlate highly with TTR, which was considered a flawed index. In sum, MTLD is one of the most versatile LD indices. As noted, it is less sensitive to text length variations than traditional indices such as TTR (e.g. Fergadiotis et al., 2013; Treffers-Daller, 2013) and even than sophisticated measures such as vocd-D and HD-D.
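The segmentation step in that validation design can be mimicked as follows: a text is cut into equal contiguous sections, and an index is computed on each section so that its behaviour at different lengths can be compared. The helper names, the section counts in the example, and the synthetic text are our own choices for illustration; the sketch uses TTR only because its length sensitivity is easy to see, and it drops any tokens left over after equal division (an assumption, since the original handling of remainders is not described here).

```python
def split_into_sections(tokens, n_sections):
    """Cut a token list into n equal, contiguous sections (leftover tokens are dropped)."""
    size = len(tokens) // n_sections
    return [tokens[i * size:(i + 1) * size] for i in range(n_sections)]


def ttr(tokens):
    """Type-token ratio of one section."""
    return len(set(tokens)) / len(tokens)


def mean_index_by_section_count(tokens, index, section_counts):
    """Average an LD index over the sections produced for each section count."""
    results = {}
    for k in section_counts:
        sections = split_into_sections(tokens, k)
        results[k] = sum(index(s) for s in sections) / k
    return results


# Hypothetical 2,000-token text cycling through a 300-word vocabulary.
text = [f"w{(i * 7) % 300}" for i in range(2000)]
scores = mean_index_by_section_count(text, ttr, [1, 2, 4])
print({k: round(v, 3) for k, v in scores.items()})  # {1: 0.15, 2: 0.3, 4: 0.6}
```

The vocabulary of the synthetic text never changes, yet TTR quadruples as the sections shrink; a length-robust index such as MTLD should stay much more stable under the same procedure.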
Although a relatively large body of research has examined the effect of CACM on a number of variables (e.g. accuracy, text cohesion, vocabulary learning, writing motivation), one issue that has not received due attention from researchers is the extent to which CACM can affect LD in essay writing. Therefore, first, this study integrated the affordances of computer-assisted language learning into the exploration of LD. Second, even though most studies testify to the role of LD in distinguishing between high and low levels of writing proficiency (e.g. Crossley et al., 2012; Gebril & Plakans, 2016; González, 2017), further evidence from a process writing experiment may shed more light on this partly inconclusive association. In addition, as discussed earlier, the basic indices of LD (e.g. TTR, Guiraud, and Yule’s K) are inherently problematic to a greater or lesser extent, and even sophisticated measures (e.g. vocd-D and HD-D) have been criticized for their sensitivity to text length variations. MTLD was therefore chosen for this project.
3. Method
3.1 Participants
Initially, the sample included 55 mixed-gender sophomores, aged between 19 and 24 (M age = 20.3, SD = 1.59), from two intact classes at Vali-e-Asr University of Rafsanjan in Iran. They were native speakers of Farsi majoring in English language and literature who had gained entrance to the university on the basis of their ranking in a competitive national examination. The two groups came from two independent essay-writing classes, and the course was a core subject in their undergraduate program. One class was randomly chosen as the experimental group and the other as the comparison group. In addition, a mock Oxford Placement Test (OPT) (Version 1.1, 2001) was administered to establish whether the students were at a homogeneous level of language proficiency. Based on a rating descriptor accompanying the OPT, 53 students who scored between 30 and 47, the acceptable range for the intermediate level, were considered qualified to take part in this study, resulting in an experimental group of 27 (N = 27) and a comparison group of 26 (N = 26).
3.2 Instrumentation
To achieve the objectives of the study, the following instruments were used: (a) a writing task (as pre-test and post-test), (b) the Inspiration software (Version 9), (c) a writing analytic scale (Jacobs, Zinkgraf, Wormuth, Hartfiel & Hughey, 1981), and (d) MTLD.
3.2.1 Inspiration
Inspiration, a product of Inspiration Software, Inc., is a concept-mapping program designed for adolescent and adult students (Liu, 2011; Sung, 2008). This visual learning tool helps users organize their ideas by creating hierarchical concept maps and make immediate changes by adding concepts, creating links, labeling propositions, and moving items around. Inspiration also contains cross-curricular templates designed to help students and educators jump-start their writing process, and a built-in help system supports users who have questions about how to use it. Moreover, the software includes a presentation manager that can transform diagrams and concept maps into polished presentations for communicating ideas.
3.2.2 MTLD
MTLD (McCarthy, 2005), as implemented in Text Inspector (textinspector.com), was used in this study to calculate the learners’ LD scores. In addition to not varying as a function of text length (McCarthy & Jarvis, 2010), MTLD allows comparisons between text segments of considerably different lengths (at least 100 to 2,000 words) and produces reliable results while correlating strongly with other LD indices (McCarthy, 2005). MTLD scores have no fixed upper bound; however, validation studies identified typical scores as ranging from 70 to 120, with higher scores indicating greater diversity (McCarthy & Jarvis, 2010). Before the MTLD analysis, the texts were prepared according to the guidelines provided by Text Inspector, as errors in texts can have a confounding effect on LD ratings. First, the written texts were checked for spelling and typographical errors, and the identified misspellings were corrected manually. This principle rests on the premise that LD concerns how diverse the range of words is; to serve this focus, the words themselves must be in correct and recognizable forms. Second, contracted forms were separated (e.g. I’m was changed to I am, don’t to do not, I’d to I would, etc.). Third, all digits and numbers were excluded by Text Inspector (see supplementary material for the Text Inspector tool and the LD profile score, respectively).
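The two mechanical preparation steps (expanding contractions and excluding numbers) can be sketched as below. The contraction table is hypothetical and covers only the examples given in the text; Text Inspector’s own rules are not reproduced here, and spelling correction, which was done by hand, is not automated.

```python
import re

# Hypothetical contraction table limited to the examples above; a real study
# would need a fuller list (and care with ambiguous forms such as "I'd").
CONTRACTIONS = {
    "i'm": "i am",
    "don't": "do not",
    "i'd": "i would",
}


def prepare_for_ld(text):
    """Lowercase, expand listed contractions, and drop tokens that contain digits."""
    text = text.lower().replace("\u2019", "'")  # normalize curly apostrophes
    tokens = []
    for raw in text.split():
        word = raw.strip(".,;:!?\"()")
        if not word or re.search(r"\d", word):
            continue
        tokens.extend(CONTRACTIONS.get(word, word).split())
    return tokens


print(prepare_for_ld("I\u2019m sure I don't have 3 cats."))
# ['i', 'am', 'sure', 'i', 'do', 'not', 'have', 'cats']
```

Expanding contractions before counting matters because "don't" and "do not" would otherwise be scored as different types, inflating or deflating LD depending on a writer’s register rather than vocabulary range.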
3.3 Procedure
First and foremost, written consent to incorporate the intervention into the program was obtained in advance from the students and the university administrators. Prior to the treatment, a five-paragraph argumentative essay of approximately 300 words was assigned to both conditions as a pre-test, with a 45-minute time limit. Next, the experimental group underwent CACM treatment via Inspiration in a computer-equipped classroom over seven sessions of their regular class time. During the first session, students were taught how to sketch concept maps in Inspiration. The first part of the session introduced CM: the instructor¹ explained the notion and presented the associated terminology, including concepts, linking, propositions, and cross-links. The instructor also showed the class some sample concept maps and a video clip for additional support. During the second part of the session, the learners created their own concept maps in Inspiration under the instructor’s guidance. In each session, approximately 20–25 minutes were devoted to this procedure. At the outset of each session, the participants received a writing prompt, and each individual was required to construct a hierarchical map in the diagram view of the software, incorporating the main concepts and sub-concepts and labeling them with propositions. The instructor then examined the concept maps and had students draft their essays in Microsoft Word on their computers. The instructor also offered prompt-like advice on the lexical aspects of the learners’ drafts (e.g. elaboration, LD, relevance to the topic). This procedure was repeated in the subsequent sessions until the treatment was completed (see supplementary material for a hierarchical concept map constructed by one student).
Meanwhile, students in the comparison (outlining) group completed the same task of developing five-paragraph essays. Similarly, the instructor closely supervised the pre-writing phase, an essential component of process writing, in the comparison group as well. These students were given similar topics for their writing assignments but used an outlining strategy over seven weekly sessions. They received a topic each session and were instructed to generate an outline before composing a draft. To create a well-developed outline, the students were taught to categorize the main points and ideas relating to the introduction, body, and conclusion paragraphs. The instructor then checked the outline structures (i.e. clustering and ordering of ideas). The students were told to follow the outline content and transform their ideas into paragraphs by providing relevant details. The instructor also gave the students the necessary feedback to revise their drafts at each stage of the writing process before completing their texts (final products). It should be noted that the outlining group’s writing prompts, essay length, and administration timing were identical to those of the experimental group; the two were parallel classes held once a week.
3.4 Scoring
Immediately after the seven-week treatment period, a post-test identical in topic and genre to the pre-test was assigned to both groups. Two independent, trained raters² anonymously rated all 106 essays in their original (handwritten) format using Jacobs et al.’s (1981) analytic scoring scale. This multi-trait rubric has been used extensively in L2 writing studies to measure text quality (Polio & Friedman, 2017). Although it is labor intensive, we chose this scale because its component scores provide information beyond the total score. In this scale, scripts are rated on five aspects of writing, which are weighted differentially: content (30 points), language use (25 points), organization (20 points), vocabulary (20 points), and mechanics (5 points). Inter-rater reliability was within an acceptable range (r = .72). Finally, the compositions were entered into Text Inspector to calculate their D values. Note that only the lexical aspect of the essays is taken into account in evaluating LD; as noted above, MTLD scores typically range from 70 to 120.
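Because the five components carry different maximum points, the composite score is simply their sum, with a ceiling of 100. A small sketch with a sanity check follows; the component keys are our own labels, only the stated maxima are enforced, and the averaging of two raters is a hypothetical choice, since how the two ratings were combined is not stated in this section.

```python
# Maximum points per component of the Jacobs et al. (1981) scale, as listed above.
JACOBS_MAX = {
    "content": 30,
    "language_use": 25,
    "organization": 20,
    "vocabulary": 20,
    "mechanics": 5,
}


def jacobs_total(component_scores):
    """Sum the five component scores after checking each lies between 0 and its maximum."""
    if set(component_scores) != set(JACOBS_MAX):
        raise ValueError("expected exactly the five Jacobs components")
    for name, score in component_scores.items():
        if not 0 <= score <= JACOBS_MAX[name]:
            raise ValueError(f"{name} score {score} is outside 0-{JACOBS_MAX[name]}")
    return sum(component_scores.values())


def two_rater_mean(rater_a, rater_b):
    """Hypothetical aggregation: average the two raters' totals."""
    return (jacobs_total(rater_a) + jacobs_total(rater_b)) / 2


essay = {"content": 25, "language_use": 20, "organization": 16,
         "vocabulary": 15, "mechanics": 4}
print(jacobs_total(essay))  # 80
```

The weighting means that a one-band change in content moves the total far more than the same change in mechanics, which is part of the "added value" of reporting the component scores alongside the total.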
4. Results
To address the first research question, the data were submitted to a one-way analysis of covariance (ANCOVA) to explore the impact of CACM on the LD of the written essays. The independent variable was the type of intervention (CACM vs. outlining), the dependent variable was the LD score on the post-test, and the covariate was the writing pre-test. Preliminary checks confirmed that the assumptions of normality, linearity, homogeneity of variances, and homogeneity of regression slopes were not violated. Table 1 presents descriptive statistics for the LD scores obtained from Text Inspector.
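A one-way ANCOVA with a single covariate can be expressed as a comparison of two regression models: post-test on pre-test alone, and post-test on pre-test plus a group indicator. The drop in residual sum of squares when the group term is added is the adjusted treatment effect, tested with 1 and n − 3 degrees of freedom (1 and 50 for 53 participants). The dependency-free sketch below solves the least-squares problems directly; the function names and the data in the usage example are hypothetical, and in practice a statistics package would be used. Like the analysis described, it assumes homogeneous regression slopes (no group × covariate interaction term).

```python
def residual_ss(X, y):
    """Ordinary least squares via the normal equations; returns the residual sum of squares."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    a = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                for c in range(col, k + 1):
                    a[r][c] -= factor * a[col][c]
    beta = [a[i][k] / a[i][i] for i in range(k)]
    residuals = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    return sum(e * e for e in residuals)


def ancova_group_effect(post, pre, group):
    """F test for a two-level group factor adjusted for one covariate.

    group holds 0 (comparison) or 1 (treatment) for each participant.
    Returns (F, df_error, partial eta squared).
    """
    n = len(post)
    full = [[1.0, float(g), float(x)] for g, x in zip(group, pre)]
    reduced = [[1.0, float(x)] for x in pre]
    sse_full = residual_ss(full, post)
    ss_group = residual_ss(reduced, post) - sse_full
    df_error = n - 3  # parameters: intercept, group, covariate
    f_stat = ss_group / (sse_full / df_error)
    partial_eta_sq = ss_group / (ss_group + sse_full)
    return f_stat, df_error, partial_eta_sq


# Synthetic example: post-test depends on pre-test plus a 5-point group effect.
pre = [10, 12, 14, 16, 18, 11, 13, 15, 17, 19]
group = [0] * 5 + [1] * 5
noise = [0.5, -0.3, 0.2, -0.4, 0.1, -0.2, 0.3, -0.1, 0.4, -0.5]
post = [2 * x + 5 * g + e for x, g, e in zip(pre, group, noise)]
print(ancova_group_effect(post, pre, group))
```

Framing the test as a model comparison makes the "adjusting for the pre-test" step explicit: the group effect is whatever the group indicator explains beyond what the covariate already accounts for.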
Table 1. Descriptive statistics for lexical diversity scores
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210817113258354-0646:S095834402100001X:S095834402100001X_tab1.png?pub-status=live)
The ANCOVA (Table 2) revealed that, after adjusting for pre-test scores, there was a statistically significant difference between the LD scores of the two conditions, F(1, 50) = 5.08, p = .029, ηp² = .09. This effect size index suggests a medium-sized difference between the adjusted means. Overall, the results suggest that CACM affected LD scores more strongly than the outlining treatment did. Figure 1 illustrates the performance of both groups across the tests.
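Partial eta squared can be recovered from the reported F ratio and its degrees of freedom with ηp² = F·df_effect / (F·df_effect + df_error). As a quick arithmetic check of the value reported above (the function name is ours):

```python
def partial_eta_squared(f_stat, df_effect, df_error):
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    return (f_stat * df_effect) / (f_stat * df_effect + df_error)


eta = partial_eta_squared(5.08, 1, 50)
print(round(eta, 3))  # 0.092, which rounds to the reported .09
```

By the commonly cited benchmarks of roughly .01 (small), .06 (medium), and .14 (large), a value of about .09 falls in the medium range, consistent with the interpretation given for the ANCOVA result.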
Table 2. ANCOVA output for lexical diversity (LD) scores
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210817113258354-0646:S095834402100001X:S095834402100001X_tab2.png?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210817113258354-0646:S095834402100001X:S095834402100001X_fig1.png?pub-status=live)
Figure 1. Lexical diversity scores from pre-test to post-test
With regard to the second research question, Pearson product-moment correlations were used to explore the association between learners’ LD and writing quality scores (see Table 3 for descriptive statistics on writing quality before and after the treatments). The results, based on Jacobs et al.’s (1981) rubric (maximum score 100), indicated no significant relationship between the two variables on the post-test in either condition: CACM, r = .06, p = .77; outlining, r = –.01, p = .94. Figure 2 shows the CACM group: the scores are widely scattered, with no discernible trend. Similarly, LD and writing quality scores were not associated on the pre-test in either condition: CACM, r = –.16, p = .42; outlining, r = .26, p = .19.
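For readers less familiar with the statistic, the sketch below computes Pearson’s r between two score lists, together with the t statistic (df = n − 2) conventionally used to test whether r differs from zero. The function names are hypothetical; converting t to a p value requires the t distribution (e.g. from a statistics package), which is omitted here to keep the example self-contained.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two equal-length lists with at least 3 scores")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic with n - 2 degrees of freedom for testing H0: rho = 0 (|r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))
```

As a point of interpretation, the reported post-test value for the CACM group, r = .06, corresponds to r² ≈ .004, meaning LD would account for well under 1% of the variance in writing quality scores.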
Table 3. Descriptive statistics for writing quality
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210817113258354-0646:S095834402100001X:S095834402100001X_tab3.png?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210817113258354-0646:S095834402100001X:S095834402100001X_fig2.png?pub-status=live)
Figure 2. Association between lexical diversity and writing quality in the CACM group
5. Discussion
The first research question asked whether applying Inspiration as a graphic organizer would affect the D values in the learners’ written essays. The results revealed that learners in the CACM group scored significantly higher than those in the outlining group, suggesting that using Inspiration as a pre-writing strategy enabled learners to build more complex cognitive structures to organize their conceptual system. In particular, learners’ ability to vary their lexical choices and produce lexically diverse texts improved. The second research question explored the relationship between LD and overall writing quality. The findings revealed no significant correlation between the two variables on either the post-test or the pre-test.
To create concept maps, individual learners need to take charge of retrieving and restructuring the relevant concepts. This process helps them envision the key themes and constituents of the text and the relationships among them. In this way, the participants of the current study were able to assimilate the new concepts into their pre-existing knowledge base. The findings therefore support Ausubel’s (1968) theory of human cognitive learning, which underscores the role of meaningful learning in acquiring knowledge. Although few studies to date have focused specifically on the impact of CACM on LD in writing, the present findings are broadly consistent with a number of previous studies (e.g. Gardner, 2015; Kwon & Cifuentes, 2009; Liu, 2011; Liu et al., 2010) suggesting that CACM can serve as a cognitive tool to enhance learners’ thinking, problem-solving, and reasoning skills. Regarding the substantial impact of graphic organizers at the pre-writing stage, Gardner’s (2015) findings are comparable to ours. Although Gardner did not evaluate LD as a dependent variable, the greater contribution of CACM compared with the four-square and no-organizer conditions was evident. This convergence might be attributed to CACM’s propositional structure, which helps learners organize their ideas and connect their argument with the content they use to support it. Our findings also suggest that CACM instruction within a process-writing approach may have made learners more attentive to the drafting and revision stages of the writing process, thereby improving their compositions.
Furthermore, Liu’s (2011) findings revealed that different computerized concept-mapping treatments (individual mapping and cooperative mapping), applied as a pre-writing strategy, had a markedly greater impact on learners’ writing performance than the no-mapping treatment did.
It has been argued, however, that integrating computers into writing instruction may not always yield favorable outcomes, owing to factors such as classroom time constraints and a lack of teacher expertise. As mentioned earlier, Dujsik (2008) found no difference between his experimental and control groups in terms of writing quality and quantity. Many factors can produce such varying outcomes across studies. For instance, one possible cause of the discrepancy between Dujsik’s findings and ours is his shorter intervention span. The length of treatment, among other factors, may therefore be crucial.
Concerning the second research question, the results indicated that EFL students’ ability to produce diverse vocabulary may not always correlate with their mastery of composing texts. This finding is at odds with previous studies suggesting that LD is positively associated with writing proficiency (Crossley & McNamara, 2012; Crossley et al., 2012; Gebril & Plakans, 2016; González, 2017; Olinghouse & Wilson, 2013). Regarding the relationship between linguistic complexity features and L2 writing proficiency, our results did not align with those of Crossley and McNamara (2012), who observed that LD and lexical frequency can predict differences between low- and high-scoring learners. González (2017) also obtained results that differ from ours. Using MTLD, he investigated the extent to which LD and lexical frequency contribute to writing quality in English academic compositions by multilingual and monolingual writers, and concluded that LD had a greater impact on writing scores than lexical frequency. By contrast, our findings are in line with studies that found LD not to be a defining characteristic of text quality (e.g. Jarvis, 2002; Wang, 2014; Yu, 2010). Wang (2014), for example, found no association between LD and EFL writing proficiency in an analysis of email texts written by 45 Chinese high school students, with LD assessed using the TTR and vocd-D indices. The LD scores of email texts graded at higher proficiency levels did not differ significantly from those of texts graded at lower levels.
One possible explanation for this dissociation is that learners who used a more diverse range of vocabulary may also have made more usage errors, thus lowering their overall writing scores. However, given the differences in genre and LD measures between Wang’s study and ours, this explanation remains a conjecture that future research should test. In a similar vein, Yu (2010) investigated the relationship between LD and writing quality by using vocd-D to analyze 200 compositions on five different topics written by English learners from different L1 backgrounds. The results revealed that for samples from the two largest L1 groups (i.e. Filipino and Chinese), LD was not a significant predictor of the quality of learners’ written compositions. One possible explanation for the discrepancy between Yu’s study and ours is that Yu tested participants from several language backgrounds. On balance, our study provides further empirical evidence that the overall quality of a written essay may not always correlate with the use of a wider range of words or a decrease in lexical monotony.
6. Conclusion
Theoretically, our findings indicate that applying the CACM strategy before the drafting stage helped to facilitate and stimulate effective thinking. The strategy affected the learners’ diction by supporting the generation and organization of ideas. In other words, CACM may give writers more opportunities to view their own writing through visualization, which could explain the improvements in the D values of their compositions. The present study was a pioneering attempt in the Iranian context to determine whether EFL learners could retrieve a wider range of lexical items by using the Inspiration software. As previous studies have shown, concept maps portray relationships in a logical and hierarchical fashion and meaningfully connect branches of content ideas. Consistent with this, we found that applying CACM as a pre-writing strategy can help promote learners’ LD in writing. Accordingly, L2 writing instructors and practitioners can incorporate graphic organizer training into their lesson plans to encourage students to rehearse their ideas before focusing on the main draft. Future research should examine the role of CACM in boosting linguistic complexity in general, and LD in particular, across different genres of writing. Evaluating LD through a quantitative measure such as MTLD may give teachers a more accurate picture of their students’ lexical development and offer a useful benchmark for distinguishing writing proficiency levels (also see Vögelin, Jansen, Keller, Machts & Möller, 2019). We would also emphasize that CACM should play a more substantive role in academic writing courses.
The finding that this pre-writing strategy can affect LD more than a traditional technique such as outlining has worthwhile implications for teaching and assessing EFL compositions. Teachers could set up varied practice activities on how to diversify learners’ diction during drafting and revision and how to find alternatives for repetitive lexical items. We recommend that more attempts be made to introduce apps such as Inspiration into the L2 classroom. As noted, one merit of the study lies in its use of MTLD, which helped us avoid potential pitfalls associated with other indices because it is designed to be independent of text length. MTLD can also serve as a diagnostic tool for identifying students who are struggling with limited lexical knowledge.
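To make the contrast with length-sensitive indices concrete, the sketch below computes a plain type-token ratio (TTR) and a simplified MTLD following the commonly described procedure: tokens are read in order, and each time the running TTR of the current segment falls to a threshold (conventionally 0.72), one "factor" is counted and a new segment begins; a leftover segment contributes a partial factor, and text length divided by the number of factors is averaged over a forward and a backward pass. The function names, the whitespace tokenization in the example, and the choice to return infinity for texts whose TTR never falls are assumptions; Text Inspector’s implementation may differ.

```python
import math


def ttr(tokens):
    """Type-token ratio: distinct word forms divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def _mtld_pass(tokens, threshold=0.72):
    """One directional pass: text length divided by the number of TTR 'factors'."""
    factors = 0.0
    types, count = set(), 0
    for token in tokens:
        types.add(token)
        count += 1
        if len(types) / count <= threshold:
            factors += 1          # a full factor is complete; start a new segment
            types, count = set(), 0
    if count:                     # partial factor for the unfinished final segment
        factors += (1 - len(types) / count) / (1 - threshold)
    # If TTR never drops, no factors accrue and MTLD is undefined (inf here).
    return len(tokens) / factors if factors > 0 else math.inf


def mtld(tokens, threshold=0.72):
    """Simplified MTLD: mean of a forward and a backward pass."""
    forward = _mtld_pass(tokens, threshold)
    backward = _mtld_pass(tokens[::-1], threshold)
    return (forward + backward) / 2


tokens = "the cat sat on the mat the cat sat".split()
print(round(ttr(tokens), 3), mtld(tokens))  # prints 0.556 9.0
```

Duplicating the same text halves its TTR (5/9 becomes 5/18) even though nothing about the writer’s vocabulary has changed; MTLD avoids this particular artifact because it counts how long segments stay lexically varied rather than dividing by total length.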
The study also shed light on the extent of association between overall writing quality and LD. We acknowledge that, because the main thrust of the current study was to operationalize CACM as well as LD measures and software, we did not further examine the analytic writing subscores. The main purpose of the second research question was to compare LD with writing quality as an overall construct; hence, we reported only a representative mean score. Nevertheless, the analytic rubric yields an overall score as well as subscores describing learners’ performance in detail. Analytic scales such as Jacobs et al.’s (1981) can therefore be useful for formative assessment, where diagnostic information on learners’ writing performance is needed. The relevant data are available for sharing and further research, and we recommend that researchers analyze writing quality in terms of its subscale dimensions in relation to the other variables of their studies. Future studies might, for instance, focus on a range of textual features, including content, organization, accuracy, error production, style, and other contextual factors, and examine how these variables are affected by CACM or other computer-assisted pre-writing strategies.
6.1 Limitations
As LD mainly pertains to the breadth of lexical knowledge, it can only account for the range of words used in a text. It therefore cannot capture lexical quality, such as how appropriately the words have been used. Future studies could examine depth of vocabulary knowledge in addition to its size; Schmitt’s (2014) synthesis of studies on vocabulary depth is a useful resource for researchers and practitioners alike. Although the present study strove to open new directions for investigation, researchers should also evaluate more comprehensively the effect of various software applications on other lexical dimensions, namely lexical accuracy, lexical frequency, lexical sophistication, lexical density, and lexical repetition (see, e.g., Ballance, 2021). Because pre-writing strategy instruction is an under-researched area, future studies can explore different aspects of it. Care should be taken to consider the possible influence of learner variables such as motivation, learning styles, and language proficiency; aptitude-treatment interaction research (see Footnote 3) could be illuminating in this regard. We also suggest that future researchers draw on other computerized learning tools, such as idea generation and organization software, which were not used in the present study. Finally, our study was designed primarily as an experiment, and the sample was not large enough to allow the correlational findings to be generalized to the population. The findings pertaining to the second research question should therefore be treated with caution, perhaps as the equivalent of an action research project.
Supplementary material
To view supplementary material referred to in this article, please visit https://doi.org/10.1017/S095834402100001X
Ethical statement
The research involved no conflict of interest. All participants consented to contribute to the experiments and were personally informed about the outcome of their writing tasks. All individuals’ performance data remained anonymous throughout the project.
About the authors
Mohammad Hassanzadeh is an assistant professor of applied linguistics with the Department of English Language and Literature at Vali-e-Asr University of Rafsanjan, Iran. He is currently serving as a guest professor at Sharif University of Technology. His research interests involve instructed second language acquisition with a focus on technology-enhanced pedagogy.
Elahe Saffari is an MA graduate in applied linguistics from Vali-e-Asr University of Rafsanjan, Iran. Having taught English at several language institutes and schools in the city of Kerman, she has developed a keen interest in L2 vocabulary and writing instruction research.
Saeed Rezaei is currently an associate professor of applied linguistics at Sharif University of Technology. His main areas of interest are in social issues in TESOL and linguistics. His most recent publications have appeared in TESOL Quarterly, Teaching and Teacher Education, New Writing, Multilingua, Open Learning, Asian Englishes, and Higher Education.
Author ORCIDs
Mohammad Hassanzadeh, https://orcid.org/0000-0003-1510-1149
Elahe Saffari, https://orcid.org/0000-0001-8601-1437
Saeed Rezaei, https://orcid.org/0000-0003-0296-0382