1. INTRODUCTION
The decline in the numbers of pupils studying French beyond the age of 16 in England has been well documented (CILT, 2005; Fisher, 2001; Graham, 1997, 2002, 2004). Reasons for this are complex, but important factors are the difficulty learners experience, both at the General Certificate of Secondary Education (GCSE) and in the first year of post-16 study, and their perceived lack of progress when faced with the more stringent linguistic demands of the Advanced Level syllabus. The transition from Year 11 (GCSE) entails a steep increase in difficulty, arising partly from exposure to large amounts of new and more complex vocabulary as students move from a basic transactional and survival-based syllabus to more extended and abstract texts and topics. There are also indications that learners have insufficient awareness of how to make progress (Graham, 2004, 2006). In one of the few large-scale studies conducted into students' perceptions of language learning post-16 in England, Graham (1997) found that many students reported difficulties learning the large amounts of new lexis they encountered from the start of their A-level course. In addition, she found that students used a narrow range of strategies for learning vocabulary, consisting mainly of fairly shallow strategies such as list-making and ‘look-cover-write-check’.
Graham's (1997) study does not, however, attempt to chart the development of learners' vocabulary knowledge during advanced level study. Indeed, in comparison with what we know about how learners of English develop vocabulary, little research has been conducted into learners of French in schools in England in general and at A-level in particular. Milton (2006a) is probably the only study to address this question. Using data drawn from X_Lex (Meara and Milton, 2003), a vocabulary recognition test, he estimates that pupils learn on average 170 words per year in the first five years of secondary schooling (Years 7 to 11). More progress seems to occur in Year 12, however, where the mean number of words known rises from 852 in Year 11 to 1,555. Milton (2006a) also reports a strong relationship between vocabulary levels and the A-level grade learners achieve in French, which underlines the importance of learners' vocabulary levels for overall progress.
Milton's (2006a) study concludes that the rate of vocabulary growth among learners of French in England is slow compared with learners of other languages in different countries, even when the smaller number of lessons typically received by pupils in English schools is taken into account. Reasons for this are unclear, although Milton suggests that an analysis of the content of the textbooks used by English learners might provide insights.
Milton's research looks at vocabulary development under ‘normal’ classroom conditions, which presumably involve a combination of explicit and incidental vocabulary learning. Schmitt (2000: 116) defines the former as ‘the focused study of words’, and the latter as learning ‘through exposure when one's attention is focused on the use of language, rather than the learning itself’ (e.g. through reading or listening). That both explicit and incidental modes play some role in vocabulary acquisition is generally acknowledged (e.g. Min, 2008). Nevertheless, the relative importance of these two modes of learning has been the subject of controversy (Laufer, 2006), with a recent gulf emerging between research evidence and semi-official advice (see Milton's, 2006b, review of Harris and Snow's, 2004, book on vocabulary building, published by The National Centre for Languages, London).
A number of studies have been conducted into both processes. In terms of explicit learning, one area of research has been how learning can be enhanced through word-focused activities such as sentence construction, sentence completion and using target vocabulary in essays (Laufer, 2006). In addition, studies of instruction in vocabulary learning strategies have provided generally positive results (e.g. Burgos-Kohler, 1991, for learners of Spanish in a US university, and Lawson and Hogben, 1998, for secondary school learners of Italian in Australia; see also Nyikos and Fan, 2007, for a review of vocabulary intervention studies).
Regarding incidental learning, there is a sizeable body of research (reviewed in Pigada and Schmitt, 2006, and in Swanborn and de Glopper, 1999) which suggests that vocabulary growth can occur through extensive reading, although gains tend to be small. As both Pigada and Schmitt (2006) and Swanborn and de Glopper (1999) found, there is a lack of clarity surrounding such research, in terms of differences in the measurement instruments used and exactly which factors have led to the vocabulary gain. Is it frequency of occurrence of words in the text, for example? If so, to what extent is this dependent on textual support mechanisms such as marginal glosses or dictionary use (see, for example, Hulstijn, Hollander and Greidanus, 1996)? Few studies, however, have looked at incidental L2 vocabulary acquisition through listening. Vidal (2003) located only three such studies in non-laboratory settings, ranging from studies of young beginners (Schouten-van Parreren, 1989) to university-level Japanese learners of English (Toya, 1992), all of whom made gains in vocabulary through listening activities. Vidal herself conducted a study of vocabulary gains among ESP students at a Spanish university, to see whether vocabulary was gained through listening to academic lectures. Before watching and listening to video-taped lectures, learners were tested on their knowledge of 36 technical, academic and low-frequency words contained in the lectures. This test was repeated after listening, and again as a delayed post-test. Results showed significant gains in knowledge of the selected items at post-test, but these declined at delayed post-test.
Clearly, incidental learning of L2 vocabulary can occur through listening and reading. Nevertheless, it is also true that second language learners fail to learn words they encounter (Hulstijn et al., 1996) and that incidental learning is influenced by factors affecting the learner's ability to infer the meaning of unknown words, such as the ratio of known to unknown words (Laufer, 1989, 1992). Of particular relevance here is the ‘quality of information processing’ (Laufer and Hulstijn, 2001: 12) involved in the task through which incidental vocabulary learning may occur. Interventions that seek to improve learners' performance in listening or reading, and which typically involve greater depth of processing as learners consciously apply strategies to tasks, would therefore be likely to have an additional benefit for vocabulary acquisition. We located only one published study (Fraser, 1999), however, that looked at this by-product of strategy instruction, and that was in reading. Fraser investigated the impact of instruction in ‘lexical processing strategy’ (LPS) use on vocabulary learning among eight university-level Francophone learners of English. Participants received training over a period of two months in three forms of LPS: ignoring the word and continuing to read; consulting (a dictionary or another person); and inferring (via linguistic or contextual clues). Data were collected on eight occasions, with four measurement points to determine the extent of LPS use: a baseline period; after the metacognitive strategy training; after the focused language instruction; and one month after the treatment. Learners' use of LPS was ascertained through a think-aloud procedure, whereby they identified unknown words in a text they were asked to read and then explained how they had dealt with these unknown words. A week after this procedure, each learner was presented with ten of the words identified as unknown and asked to indicate whether they knew them or not. The mean ‘retention rate’ (Fraser, 1999: 238) was 28%, although there was much individual variation, with a standard deviation of 12%. Fraser argues that this retention rate is higher than those reported in previous studies of incidental vocabulary learning through reading. She also reports that learners' rate of ignoring unknown words decreased and their rate of success when making inferences increased. Thus the instruction seemed to have improved the underlying conditions for vocabulary learning. As there was no comparison group, however, and the number of participants was small, the results need to be interpreted cautiously.
The limited amount of research in this area thus suggests that the potential impact on vocabulary acquisition of strategy interventions in other skill areas requires further exploration. One possible argument might be that the whole process of strategy awareness-raising and a focus on alleviating students' problems in listening and writing would have beneficial spin-offs. In relation to listening tasks, these could result from increased incidental vocabulary learning through mechanisms such as improved word segmentation, identification and comprehension. In addition, students might benefit from multiple spaced repetition in listening texts (Nyikos and Fan, 2007) and the recycling of previously encountered words known only superficially. The use of writing strategies might benefit vocabulary through, among other things, deeper processing, more efficient use of feedback, redrafting, and vocabulary-focused planning and monitoring. On the other hand, a counter-argument might be that strategy training would not benefit lexical learning because a tight focus on other skill areas such as listening and writing would be at the expense of curricular time devoted to vocabulary learning and the promotion of vocabulary-specific strategies.
The study described here forms part of a project to evaluate the impact of strategy instruction on Year 12 learners' performance in these two skills. Results show a clear-cut positive effect of the intervention for listening (Graham and Macaro, 2008). The benefits for writing were less clear, however, with a much smaller effect (Macaro, Graham, Richards, Spelman-Miller and Vanderplank, 2006). Against this background, and in view of the paucity of studies looking at vocabulary development as a by-product of strategy instruction in other skill areas, we identified the following questions specific to vocabulary:
1. In view of the increased vocabulary demands post-GCSE and Milton's (2006a) finding of a vocabulary ‘spurt’ in Year 12 (see above), together with the difficulty of finding reliable indices of vocabulary growth over short periods (Tonkyn, 2006), can such progress be measured over two school terms in students' writing as well as in their recognition vocabulary?
2. What student variables at pre-test predict post-test scores and progress in vocabulary over this period?
3. If vocabulary progress is measurable (Q1), how much progress is made compared with listening and writing?
4. What is the effect of strategy intervention that targets listening and writing? There are three possibilities:
a. there are incidental benefits to vocabulary;
b. the extra attention paid to listening and writing is at the expense of other areas of development such as vocabulary learning;
c. there is no effect on vocabulary.
2. METHOD
2.1 Participants
The Year 12 students were aged 16 to 17 and were in the first year of post-compulsory education, having elected to continue with French following their GCSE. These participants were preparing for the Advanced Subsidiary (AS) examination at the end of Year 12, with the option of continuing their studies into Year 13 and sitting the Advanced Level (A2) examination. Typically they had already studied French for five years, receiving between 400 and 600 hours of instruction. At the outset, a total of 150 pupils (120 females, 30 males) took part. By the time of the post-test, however, the number of participants had fallen, and inevitably there was some absenteeism on the days when data were collected, leaving 107 pupils who completed both pre- and post-tests for listening, writing and vocabulary. Such a high attrition rate is not unusual at this level and reflects a number of factors, including pupils moving to different schools or colleges, changes of subject choice and, above all, the high drop-out rates for languages.
Background data on grades indicate that the 150 participants were high achievers in their GCSE French examinations at the end of the compulsory phase of their education: 42.2% obtained an A* grade (the highest), 39.1% an A, 12.5% B, 5.5% C and 0.8% (one student) was awarded a D. This reflects the typical profile of students in AS level classes.
2.2 The schools
Students attended 15 comprehensive schools. The comparison group consisted of four schools, with the remaining eleven schools allocated to one of two treatment groups (see below). We sought to obtain a stratified sample with matched pairs, i.e. allocation of schools was conducted in such a way as to obtain three groups that were as well-matched as possible for type, location and make-up of school (e.g. general level of pupil achievement). Randomised allocation was neither ethically nor logistically possible within the framework of a state school setting. The comparison schools were not located in the same counties as the treatment schools.
2.3 Strategy instruction
Schools receiving strategy instruction were placed in one of two groups: high scaffolding (HSG, five schools) or low scaffolding (LSG, six schools). Both groups received two initial one-hour sessions from researchers, in which modelling of selected strategies for listening and writing took place. Over the following six months, class teachers then led further modelling and practice activities (five main activities were provided for teachers in each skill) in normal class time. Additionally, students were encouraged to use the strategies that had been introduced whenever they were engaged in writing and listening tasks, through the use of strategy prompt sheets and record sheets. Detailed instruction notes were provided for teachers, along with briefing meetings to guide them through the implementation of the strategy instruction. For the HSG only, scaffolding of strategy use was provided in the form of additional awareness-raising and reflection about strategy use in the initial researcher-led sessions, a diary in which to record reflections on strategy use, and feedback from researchers both on these diary entries and on the strategy record sheets that learners submitted along with the language tasks that they accompanied (see Graham and Macaro, 2008). Students in the comparison group received no strategy instruction but simply followed their normal French classes.
Selection of strategies was based on the problems in strategy use exhibited by a different, but comparable, sample of students. For listening, these included: poor use of prediction and inferencing; lack of monitoring; and difficulties with identifying familiar words and word boundaries, the latter being a particular problem for English-speaking learners of French. Materials were therefore created for developing effective use of prediction, inferencing, monitoring and clusters of strategies, along with materials aimed at improving students' perception skills. For writing, students had exhibited problems at the ‘formulation’ stage of composing, i.e. the point at which they wanted to turn their ideas into French. Instruction materials therefore focused on formulation strategies (e.g. re-combining or restructuring known phrases) but also included planning (e.g. ‘mind-mapping’ or ‘brainstorming’ of known French that fitted the task requirements), monitoring and using feedback. For details and examples of materials in both skills, see Graham and Macaro (2007).
2.4 Assessing students
At the beginning of their AS course (pre-test) and after two school terms (post-test), students' listening was assessed through a written recall protocol after they had listened to four short passages on the theme of holidays. A different set of passages was used each time, with the difficulty level held constant in terms of length, percentage of unknown words, speech rate and judgements of a group of students who had listened to passages during piloting. Students were instructed to write down in English everything they had understood in each passage. Responses were written during the two hearings of each passage.
Recall protocols were scored by two raters, who used a banded rating scale (four bands) to assess how many idea units had been recalled (in the form of words or phrases) across all four passages. There was a high level of agreement between the total scores of the two raters: 0.95 at Time 1 and 0.96 at Time 2 (Pearson correlations). Differences in scores were resolved by discussion.
Students' performance in writing was assessed through a narrative writing task. They were given a six-picture narrative and asked to write a past-tense account of approximately 200 words in 30 minutes. A different but comparable set of pictures was used at each time point. Consulting a bilingual dictionary was allowed because bilingual dictionary use was included in the strategy training. Scoring was conducted using a six-dimensional analytical marking scheme adapted from Jacobs, Zingraf, Wormuth, Hartfiel and Hughey (1981) and Weir (1993): Content (max. = 20), Organisation (20), Local coherence (20), Vocabulary (15), Grammar (15), Mechanics (10). Organisation includes the control of genre conventions, Local coherence covers cohesion and the development and integration of ideas, and Mechanics focuses on spelling and punctuation. Each dimension was divided into bands of marks with descriptors for each band. For vocabulary, these were: ‘inadequate vocabulary, basically translation (0–2)’; ‘frequent lexical inappropriacies, circumlocution, and/or repetition (3–7)’; ‘some lexical inappropriacies and/or circumlocution (8–12)’; ‘almost no inadequacies in vocabulary for the task, effective range of vocabulary and appropriate register (13–15)’. Scoring was carried out independently by two expert raters and interrater reliability across the six categories ranged from 0.69 to 0.77 (Pearson correlations). Discrepancies were resolved by negotiation. While the reliability of writing assessment is often improved by including more than one task, the pressures of working in busy school contexts meant that this was out of the question. Fortunately, this constraint had no adverse effect on the total writing score, whose reliability (Cronbach's alpha) was 0.957 at pre-test and 0.955 at post-test.
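As an illustration of how the dimension scores combine into the total writing score and how the internal consistency of that total can be checked, the following is a minimal Python sketch; the marking maxima are taken from the scheme above, while the student ratings and the helper function cronbach_alpha are hypothetical and purely illustrative.

```python
import numpy as np

# Maximum marks per dimension in the analytic writing scheme described above (total = 100).
MAX_MARKS = {"Content": 20, "Organisation": 20, "Local coherence": 20,
             "Vocabulary": 15, "Grammar": 15, "Mechanics": 10}

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_students x k_dimensions) matrix of ratings."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_variances = scores.var(axis=0, ddof=1).sum()
    total_variance = scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

# Hypothetical ratings for three students across the six dimensions
# (illustrative values only, not the study data).
ratings = np.array([[14, 13, 12, 9, 8, 6],
                    [18, 17, 16, 12, 11, 8],
                    [10, 11, 9, 6, 7, 5]])
total_writing_score = ratings.sum(axis=1)   # total writing score out of 100
alpha = cronbach_alpha(ratings)             # internal consistency across the six dimensions
```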
In addition to vocabulary production in narratives, students' receptive vocabulary was assessed using X_Lex (Version 2.02), the Swansea Vocabulary Levels Test (Meara and Milton, 2003). This is a computerised ‘yes/no test’ that asks respondents to indicate whether they know the meaning of a series of 120 words that appear on screen. One hundred of these words are real words taken from five frequency bands (1K to 5K) based on Baudot's (1992) frequency count. In order to control for false positives resulting from guessing, overconfidence or cheating, 20 of the items are non-existent but plausible words that follow the phonotactic rules of French, for example ‘clabrer’ or ‘muce’. An adjusted score is calculated that takes account of guessing (see Milton, 2006a). A pilot study (Richards and Malvern, 2007) had shown that X_Lex was appropriate for Year 12 pupils and that their results across the five bands were sensitive to word frequency. In the intervention study reported here, students did two parallel forms of the test at both pre- and post-test. The intention was to use the second set of results because we feared that students would overstate their knowledge at their first attempt before realising how heavily this would be penalised in the adjusted score.
We have referred above to ‘vocabulary production’ and ‘receptive vocabulary’. As Read (2000: 154–157) has pointed out, however, there is much confusion about the distinctions between receptive and productive vocabulary. It is important, therefore, to note that our two measures actually address subsets of receptive and productive knowledge. Using Read's terminology, these entail recognition of pre-selected, decontextualised L2 words and ratings of contextualised use of L2 vocabulary in writing constrained only by the content of the picture stimuli.
3. RESULTS
3.1 Progress on vocabulary measures (the three groups combined)
This section addresses the first research question by examining whether, for the whole sample of students, their progress on both vocabulary indices is measurable. In doing so, we will need to consider the relative validity of the students' two attempts at X_Lex on each occasion.
It will be recalled that students had two attempts at X_Lex in order to benefit from a predicted practice effect. In practice, however, some students lost motivation on their second attempt, with several achieving surprisingly low scores and others failing to finish (see Ns and minimum scores in Table 1). Table 1 shows the results for the four attempts, including Cronbach's alpha coefficients computed from the raw totals for each of the five frequency bands as an indicator of internal consistency. These all indicate high reliability. Tests of normality indicate that the first three sets of results were normally distributed with no significant skew, but the second post-test attempt was strongly negatively skewed (skew = −0.558, s.e. skew = 0.228, z = −2.45, p < 0.05) and there were two outliers with low scores. Parallel-forms reliability was therefore estimated using Spearman rank-order correlations. These were also satisfactory, though less impressive: 0.636 (N = 130) for the two pre-test attempts and 0.687 (N = 112) for the two post-test attempts.
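A minimal sketch, assuming hypothetical arrays of adjusted X_Lex scores, of how the skewness check (sample skew divided by its standard error) and the Spearman parallel-forms correlation reported above could be computed; the function skew_z and the example values are ours, not part of the original analysis.

```python
import numpy as np
from scipy import stats

def skew_z(x):
    """z-statistic for skewness: sample skewness divided by its standard error."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    g1 = stats.skew(x, bias=False)
    se = np.sqrt(6.0 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))
    return g1 / se

# Hypothetical adjusted X_Lex scores for the same students' two attempts on one
# occasion (illustrative values only).
attempt1 = np.array([2100, 1850, 2600, 1400, 3050, 2250, 1950, 2800])
attempt2 = np.array([2050, 1900, 2500, 1550, 2900, 2300, 1800, 2750])

z = skew_z(attempt2)                           # |z| > 1.96 suggests significant skew (p < 0.05)
rho, p = stats.spearmanr(attempt1, attempt2)   # parallel-forms reliability (rank order)
```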
Table 1. Descriptive statistics for the students' four attempts at X_Lex (maximum possible score = 5,000) and internal consistency of scores across frequency bands (Cronbach's alpha)

A comparison of the average scores for the two attempts in Table 1 using Wilcoxon signed-rank tests shows that, contrary to expectations, there was no advantage for the second attempt at either time point (pre-test: z(129) = 0.469; post-test: z(111) = 0.169; ps > 0.05). However, on both attempts, the students performed substantially better at the second time point (first attempt: z(107) = 5.34, p < 0.001; second attempt: z(99) = 4.65, p < 0.001), thus demonstrating measurable progress in receptive vocabulary over the period of the study.
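The sketch below shows one way of obtaining a z statistic for the Wilcoxon signed-rank test via the normal approximation (without the tie correction), of the kind reported above; the function and the commented usage with hypothetical array names are illustrative only.

```python
import numpy as np
from scipy import stats

def wilcoxon_z(x, y):
    """Wilcoxon signed-rank test, returning a z statistic from the normal
    approximation (ignoring the tie correction) and a two-tailed p-value."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    d = x - y
    d = d[d != 0]                       # drop zero differences, as the test does
    n = len(d)
    ranks = stats.rankdata(np.abs(d))
    w_plus = ranks[d > 0].sum()         # sum of ranks of positive differences
    mean_w = n * (n + 1) / 4.0
    sd_w = np.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean_w) / sd_w
    p = 2 * stats.norm.sf(abs(z))
    return z, p

# Usage with hypothetical score arrays for the same students, for example:
# z, p = wilcoxon_z(posttest_attempt1, pretest_attempt1)   # progress over time
# z, p = wilcoxon_z(pretest_attempt2, pretest_attempt1)    # second vs. first attempt
```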
Because of concerns over loss of motivation and students failing to finish their second attempt, the analyses that follow include only the data from their first attempt at pre-test and post-test. This has the additional advantage of allowing the use of parametric statistics.
Raters' scores for vocabulary in students' writing at pre-test and post-test are shown in Table 2. Both variables are normally distributed and a paired samples t-test shows gains over the course of the study (t(112) = 5.75, p < 0.001, Eta2 = 0.228).
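For the paired comparison just reported, eta squared can be recovered from the t statistic and its degrees of freedom as t²/(t² + df); a minimal sketch (the function name is ours).

```python
from scipy import stats

def paired_t_eta2(pre, post):
    """Paired-samples t-test with eta squared computed as t^2 / (t^2 + df)."""
    t, p = stats.ttest_rel(post, pre)
    df = len(pre) - 1
    eta2 = t ** 2 / (t ** 2 + df)
    return t, p, eta2

# Consistency check against the values reported above:
# t = 5.75 with df = 112 gives eta2 = 5.75**2 / (5.75**2 + 112) ≈ 0.228.
```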
Table 2. Descriptive statistics for ratings of vocabulary in writing at pre-test and post-test (maximum possible score = 15)

Substantial progress can thus be reliably demonstrated for both productive and receptive vocabulary. For the latter, however, analysis of the two successive attempts suggests that students tend to demonstrate a fatigue effect rather than a practice effect.
3.2 Inter-relationships between variables at each time point
In this section we explore the relationships among the vocabulary, listening and writing variables and participants' success at GCSE as a precursor to identifying factors that predict success in vocabulary learning (Research Question 2). For the continuous variables we examine Pearson correlations at both time points. All are highly significant (Table 3), with inter-relationships being slightly weaker at the second point. Productive and receptive vocabulary are related, but of particular interest are the strong correlations between writing vocabulary and the total writing score (0.942 and 0.933), as well as the moderately strong correlations between writing vocabulary and listening (0.603 and 0.561) and between listening and writing scores (0.601 and 0.544). It must be remembered, of course, that the writing scores were not independent of vocabulary production in the sense that 15% of the total marks allocated were assigned to rating of vocabulary. A new variable was therefore created at both time points which excluded the vocabulary ratings from the total writing scores. While this does not ensure total independence because of possible halo effects between different dimensions that contribute to the total, it does provide mathematical independence. Correlations between the vocabulary rating and these new, more independent, writing variables were 0.919 at pre-test and 0.905 at post-test, consistent with a highly important contribution of vocabulary to success in writing at both time points.
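A small sketch of the step described above, assuming hypothetical per-student score arrays: the vocabulary rating is subtracted from the total writing score before the two are correlated.

```python
import numpy as np
from scipy import stats

# Hypothetical scores for five students at one time point (illustrative only):
writing_total = np.array([62, 48, 71, 55, 80])   # total writing score out of 100
vocab_rating = np.array([10, 7, 12, 8, 13])      # vocabulary dimension out of 15

# Remove the vocabulary component so that the writing score is mathematically
# independent of the vocabulary rating, then correlate the two.
writing_minus_vocab = writing_total - vocab_rating
r, p = stats.pearsonr(vocab_rating, writing_minus_vocab)
```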
Table 3. Intercorrelations between vocabulary, listening and writing variables at both time points

Note. All ps < .001.
When using examination results in educational research it is common to convert the grades into points and to treat the resulting scale as ordinal or even interval level of measurement (see, for example, Croll, 1995). The skewed distribution of GCSE grades in our data, however, makes this inappropriate and the restricted range makes even an analysis based on rank orders questionable. Participants were therefore divided into a high grade and low grade group. In practice this meant that the high group consisted entirely of those whose grade was A* while all other grades were allocated to the low group. Univariate ANOVAs were then carried out to determine relationships with the continuous variables that entered into the correlational analysis above. At pre-test all differences were substantial and highly reliable (ps < 0.001): for X_Lex means were 2083.8 for the low group and 2576.5 for the high group (F(1,120) = 17.64, Eta2 = 0.128); for writing vocabulary the means were 6.32 and 10.06 respectively (F(1,121) = 76.20, Eta2 = 0.386); for the total writing score the means were 40.55 versus 65.24 (F(1,121) = 82.59, Eta2 = 0.406); and for listening they were 14.54 versus 22.98 (F(1,121) = 48.69, Eta2 = 0.287). As can be seen from the Eta2 values the largest effect sizes are for the two writing variables and the smallest is for the receptive vocabulary test.
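For a two-group univariate ANOVA of this kind, eta squared can be obtained from F and its degrees of freedom; a minimal sketch, with hypothetical group arrays.

```python
from scipy import stats

def anova_two_groups(low, high):
    """One-way ANOVA for two groups, with eta squared = F*df1 / (F*df1 + df2)."""
    f, p = stats.f_oneway(low, high)
    df1, df2 = 1, len(low) + len(high) - 2
    eta2 = (f * df1) / (f * df1 + df2)
    return f, p, eta2

# 'low' and 'high' would be hypothetical arrays of, for example, pre-test X_Lex
# scores for the below-A* and A* GCSE groups. As a check against the values
# reported above, F(1,120) = 17.64 gives eta2 = 17.64 / (17.64 + 120) ≈ 0.128.
```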
3.3 Predictors of success in vocabulary learning
Having examined inter-relationships at each time point, the next step was to discover which variables predicted success in receptive and productive vocabulary after two terms and the amount of progress made (Research Question 2). Predicting success and predicting progress are, of course, entirely different procedures. Correlations between pre-test and post-test variables will predict success, but may only tell us that students who did well at Time 1 also did well at Time 2. Of more interest is progress and this needs to be assessed by the gain made from Time 1 to Time 2. The measurement of gains is problematic, however. Simple or ‘crude’ gain scores (T2 − T1) or percentage gain scores ((T2 − T1)/T1 × 100) are not independent of Time 1 scores, and tend to be negatively correlated with them (Barnes, Gutfreund, Satterly and Wells, 1983). A solution is to use residual gain scores (O'Connor, 1972), that is to say the difference between actual and predicted scores obtained from the regression of the post-test scores on the pre-test scores. By definition, these are independent of Time 1 scores. Unfortunately, residual gains are affected by similar problems to other gain scores, as their reliability is not only a function of the reliability of both the Time 1 and Time 2 scores but also of the correlation between them. The stronger the correlation between Time 1 and Time 2 the lower the reliability (see Ross, 1998). Ideally, therefore, our vocabulary scores need to be significantly correlated between the two occasions in order to justify the calculation of residuals, but not strongly enough to impair reliability. As can be seen from Table 4, this optimum level of correlation was the case for both X_Lex and writing vocabulary, and so residual gains were computed for these two variables.
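A minimal sketch of residual gain scores as just defined, i.e. the residuals from regressing post-test scores on pre-test scores (the function name and argument names are ours).

```python
import numpy as np
from scipy import stats

def residual_gains(pre, post):
    """Residual gain scores: residuals from the regression of post-test scores on
    pre-test scores. By construction these are uncorrelated with the pre-test scores."""
    pre, post = np.asarray(pre, dtype=float), np.asarray(post, dtype=float)
    slope, intercept, r, p, se = stats.linregress(pre, post)
    predicted = intercept + slope * pre
    return post - predicted

# Applied separately to each vocabulary measure, for example:
# xlex_gain = residual_gains(xlex_pre, xlex_post)   # hypothetical score arrays
```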
Table 4. Pearson correlations between pre-test variables and success and progress in receptive and productive vocabulary

Note. aCorrelations between residual gain scores and the independent variable from which they were calculated are always zero.
*p < .05. **p < .01. ***p < .001.
Pearson correlations were computed between pre-test and post-test continuous variables (Table 4). For GCSE grade, ANOVAs were conducted with GCSE group as the independent variable. As can be seen, all the pre-test variables are highly significant predictors of post-test receptive and productive vocabulary (all ps < 0.001), although predictors of writing vocabulary tend to be rather stronger than those of X_Lex. Correlations with residual gains indicate how strongly the pre-test variables predict later success over and above what would be predicted by the students' original pre-test status on that measure. These associations can therefore be expected to be weaker than those discussed above, and this is indeed the case. Nevertheless, both receptive and productive vocabulary at the first time point are significantly associated with later vocabulary gains, as is pre-test listening. However, there is no relationship between pre-test total writing scores and gains in vocabulary. This can be accounted for by the high correlation at pre-test between writing and writing vocabulary: in other words, writing does not account for a significant amount of additional variance beyond that already explained by pre-test writing vocabulary.
Results of univariate ANOVAs with GCSE group (upper versus lower) as the independent variable follow a similar pattern: GCSE predicts post-test X_Lex scores for receptive vocabulary (F(1,100) = 6.23, p = 0.014, Eta2 = 0.059) and, much more strongly, productive vocabulary in writing (F(1,106) = 47.47, p < 0.001, Eta2 = 0.309). GCSE also weakly predicts gain scores for writing vocabulary (F(1,102) = 5.66, p = 0.019, Eta2 = 0.053) but not for X_Lex (p > 0.05).
3.4 Comparing progress within and between groups
To compare how much progress the students made on vocabulary, listening and writing (Research Question 3), a series of repeated measures ANOVAs was conducted comparing pre-test and post-test scores. In order to remove any confounding effects of the intervention, these were performed on the comparison group only. Significant progress could be detected on all variables except receptive vocabulary: X_Lex (F(1,41) = 3.51, p = 0.068), writing vocabulary (F(1,45) = 9.24, p < 0.001, Eta2 = 0.170), listening (F(1,37) = 42.85, p < 0.001, Eta2 = 0.537), writing (F(1,44) = 5.66, p = 0.022, Eta2 = 0.114). Although we are not able to show that effect sizes differ significantly from each other, it is clear that by far the largest increase is for listening. Progress on writing and written vocabulary is similar with a slight advantage for writing. No effect size is reported for receptive vocabulary as there was no significant difference between pre-test and post-test for this group.
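A sketch of the pre-test/post-test repeated measures comparison, assuming a hypothetical long-format data set; with only two occasions and one within-subjects factor this is equivalent to a paired t-test (F = t²).

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical long-format data for a handful of comparison-group students
# (one row per student per occasion); 'score' might be the writing vocabulary rating.
df_long = pd.DataFrame({
    "student": [1, 1, 2, 2, 3, 3, 4, 4],
    "time":    ["pre", "post"] * 4,
    "score":   [6, 8, 9, 10, 7, 9, 8, 8],
})

# Repeated measures ANOVA with one within-subjects factor (time).
res = AnovaRM(df_long, depvar="score", subject="student", within=["time"]).fit()
print(res.anova_table)   # F value, degrees of freedom and p-value for the time effect
```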
In order to test whether the intervention programme had focused on listening and writing to the detriment of vocabulary or, conversely, whether it had provided positive spin-offs for vocabulary (Research Question 4), an ANCOVA was conducted with group as the independent variable, pre-test receptive vocabulary as a covariate, and post-test receptive vocabulary as the dependent variable. This was not statistically significant (F(2,104) = 1.31, p = 0.275). A corresponding analysis for writing was also carried out, again with group as the independent variable, but entering pre-test writing vocabulary as the covariate and post-test writing vocabulary as the dependent variable. There was no significant effect of group (F(2,110) = 0.556, p = 0.569), indicating that the intervention had neither positive nor negative effects on students' vocabulary.
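A minimal sketch of the ANCOVA design described above, using hypothetical column names and illustrative values rather than the study data.

```python
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Hypothetical per-student data (illustrative only): 'group' is comparison / LSG / HSG,
# 'pre' and 'post' are adjusted X_Lex scores at the two time points.
df = pd.DataFrame({
    "group": ["comparison", "comparison", "LSG", "LSG", "HSG", "HSG"],
    "pre":   [2100, 1800, 2300, 1950, 2050, 2400],
    "post":  [2450, 2100, 2700, 2300, 2500, 2850],
})

# ANCOVA: post-test score as dependent variable, group as a categorical factor,
# pre-test score as the covariate.
model = smf.ols("post ~ C(group) + pre", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))   # F test for the group effect, adjusting for pre-test
```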
4. DISCUSSION
The research reported above investigated four research questions about the measurement, characteristics and extent of development of vocabulary proficiency over a period of two terms in English comprehensive schools. The first question considered whether students' progress could be measured reliably over such a short period. Even though Year 12 is a period of increased vocabulary growth compared with Years 7 to 11 (Milton, 2006a), it was not clear that measures of general vocabulary proficiency, particularly the X_Lex test of receptive vocabulary, would be sensitive to developments across 15 schools using different textbooks and different examination syllabuses, whose varied range of topics and linguistic content leads to different expectations concerning vocabulary. Nevertheless, for our measures of both receptive and productive vocabulary, reliable scores were obtained from which significant progress could be measured even over a relatively short period.
One potential measurement problem was the well-known issue with yes/no tests of how to prevent respondents from wrongly claiming to know the meaning of a word. X_Lex attempts to control for this by interspersing the stimuli with highly plausible nonsense words and heavily penalising those who claim to know them. For each such ‘error’, the raw score out of 5,000 is adjusted downwards by 250 points. Thus a student who really did know all the genuine words but also claimed to know all 20 nonsense words would receive a raw score of 5,000 and an adjusted score of zero. We were concerned that students prone to guessing or risk-taking would fail to realise the impact of this lack of caution and obtain scores that greatly underestimated their knowledge. We therefore allowed two attempts on each occasion so that students could benefit from practice. In the event, this provided no advantage and second-attempt scores were no better than first-attempt scores. If anything, some students became bored and demotivated. Nevertheless, concerns remain about the effect of test-taking style or test-taker personality on the validity of such tests, and this is an area that needs further investigation.
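A small sketch of the adjustment rule as described above (100 real words scaled to a maximum of 5,000, i.e. 50 points per word claimed known, with 250 points deducted per pseudoword claimed known); flooring the adjusted score at zero is our assumption.

```python
def xlex_adjusted_score(real_yes, pseudo_yes):
    """Adjusted X_Lex score as described above: each of the 100 real words claimed
    known contributes 50 points towards a maximum of 5,000, and each of the 20
    pseudowords claimed known deducts 250 points."""
    raw = 50 * real_yes              # real_yes: number of real words marked 'known' (0-100)
    penalty = 250 * pseudo_yes       # pseudo_yes: number of pseudowords marked 'known' (0-20)
    return max(raw - penalty, 0)     # assumption: adjusted score is floored at zero

# Worked example from the text: claiming all 100 real words and all 20 pseudowords
# gives a raw score of 5,000 and an adjusted score of 5,000 - 20 * 250 = 0.
print(xlex_adjusted_score(100, 20))   # -> 0
```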
The second research question addressed predictors of success and progress in vocabulary. Firstly, however, concurrent intercorrelations at pre-test and post-test between measures of vocabulary, listening and writing were examined. These were all substantial and highly reliable, as were their relationships with GCSE grades at pre-test. Particularly striking, however, were the very high correlations (0.94 and 0.93) between productive vocabulary and writing at both time points, providing evidence of the central role of vocabulary in the writing process at this stage.
This research question made an important distinction between success after two terms and progress, that is, gains, over two terms, and asked which student pre-test variables predicted these for vocabulary. Both pre-test vocabulary measures and pre-test writing and listening scores were highly significant predictors of post-test vocabulary, as were GCSE grades (particularly of productive vocabulary). Our measure of progress used residual gain scores to obtain an index for each vocabulary measure that reflected progress over and above what would be predicted from the pre-test score on that measure. As would be expected, these gains were predicted less strongly, but were still significantly related to pre-test productive vocabulary, X_Lex and, especially, listening. GCSE grade was also a significant predictor of gains in productive vocabulary.
With regard to our third research question concerning the relative amount of progress made in vocabulary, listening and writing, this analysis was carried out on the comparison group only in order to control for the effects of the intervention programme. It was clear that students made progress in listening, writing and writing vocabulary. The result for X_Lex, however, was not significant. It seems to be more difficult to demonstrate measurable progress on receptive vocabulary, particularly when compared with listening, which had the largest effect size. It may well be that the skill element of listening undergoes rapid development in Year 12, even for students who do not receive strategy training, and that this has a large impact on test scores. Writing, on the other hand, which offers more opportunity for reflection, planning and self-direction, may progress more gradually. With regard to receptive vocabulary versus vocabulary in writing, it seems likely that students would be able to demonstrate progress more easily on the latter, where they are able to choose the lexical items they will use rather than relying on any chance correspondence between the vocabulary they have learnt and the items contained in X_Lex's dictionary.
Finally, we considered the three possible effects on vocabulary learning of an intervention that addressed listening and writing skills. The first possibility was that students' vocabulary in the intervention groups would benefit indirectly from the attention to listening and writing strategies, possibly through incidental learning from more effective listening, and through practice and depth of processing in writing. The second possibility was that the intervention could harm lexical development by diverting attention from it, and the third was that the intervention made no difference. Although we had expected vocabulary to benefit from the experimental programme, analyses found no difference between the groups. It may be, as Walters (2006: 238) argues, that the effects of strategy instruction on vocabulary development take time to emerge fully, perhaps because of the ‘incremental nature of incidental vocabulary development’. It is also possible that strategy instruction focused more sharply on word identification or inferencing might have led to greater vocabulary gains in the intervention groups. These are questions that need to be explored more fully in future research. Nevertheless, the fact that the strategy instruction had a positive impact on students' listening skills, while not jeopardising their vocabulary development, suggests that strategy instruction is a fruitful avenue to explore in terms of pedagogy and improving learners' attainment in French at lower-intermediate level.