A developmental perspective on productive lexical knowledge in L2 oral interlanguage (Footnote 1)

Published online by Cambridge University Press: 01 November 2008

ANNABELLE DAVID*
Affiliation:
Newcastle University
* Address for correspondence: Annabelle David, School of Modern Languages, Old Library Building, Newcastle University, Newcastle upon Tyne, NE1 7RU, UK. E-mail: annabelle.david@ncl.ac.uk

Abstract

This article reports on productive vocabulary development by instructed British learners of French over a five-year period (from age 13 to 18). Lexical diversity development was investigated through a semi-guided oral picture-based task. Results show that the students' lexical diversity (as measured by D) improved significantly throughout the five years, showing little sign of slower periods. Overall, more noun types than verb types were observed in the composition of the lexicon throughout the study, but the proportion of noun types decreased consistently after Year 10. Further results using the Limiting Relative Diversity measure indicate that learners vary their use of nouns to a much larger extent than verbs. The discussion focuses on the noun-bias hypothesis and the use of different elicitation tasks.

Type
Articles
Copyright
Copyright © Cambridge University Press 2008

INTRODUCTION

The aim of this paper is to describe and analyse the development of lexical diversity (the range of vocabulary used in a text or transcript) as well as the type of vocabulary used (from different grammatical categories) during semi-spontaneous oral production amongst instructed learners of French through cross-sectional and longitudinal data representing a five-year learning period. A brief background is first provided, describing the current literature in L2 vocabulary and outlining the need for the current study. Then, the data used as well as the unit of analysis and methods of analysis are presented. Finally, the details and results of the current study are laid out and discussed.

BACKGROUND

Vocabulary acquisition is one of the key basic aspects of language learning and current models of language competence all give a central role to the lexicon, including both Chomskyan minimalist accounts and connectionist and emergentist accounts (see Collentine, 2004 for discussion). Speakers continue to learn new words well into adulthood (Hall, Paus and Smith, 1993) even though other aspects of language (e.g. grammar) might have been fully acquired or have fossilised. Vocabulary is also one of the key predictors of school success (Verhallen and Schoonen, 1998).

There are very few studies in the current literature about L2 lexical development. L1 acquisition research, on the other hand, has produced a much larger number of studies. L1 researchers have focused, amongst other aspects, on profiling the developing lexicon of young children and overall, cross-linguistically, the same developmental lexical patterns have been described in the literature. Children acquire nouns first, verbs second and closed-class items later (see David, 2004, for a cross-linguistic discussion). However, there are differences in noun/verb productions across languages (Childers and Tomasello, 2006). Recent research has provided evidence against the noun-first prediction, showing that verbs can be learnt as early as nouns (e.g. Bassano, 2000) and that certain languages are less noun-biased. For example, Tardif, Shatz and Naigles (1997) found that Mandarin-speaking children produced more verbs than English- or Spanish-speaking children at earlier stages.

As far as L2 acquisition is concerned, there is a growing literature on the lexical development of English as a second language (see Nation, 2001). Key issues addressed by the literature include how words might be organised in the mind (e.g. Kroll and Tokowicz, 2001), how the lexicon might relate to other aspects of language (e.g. Clark, 1993) and how to measure lexical knowledge (e.g. Meara and Milton, 2003). Studies dealing with French are rarer. However, this is an area which is growing strongly, as the present issue shows. Lexical development amongst instructed second language learners of French in the UK educational setting has been the focus of a few studies. In a cross-sectional study, Milton (2006) used a test measuring receptive vocabulary knowledge (X-Lex) amongst secondary school students. He concludes that an A-level student (aged 18 and at the end of his/her secondary education) knows on average 2000 words (after learning French for seven years). He found that GCSE learners (aged around 16) have receptive knowledge of about 850 words while final year graduates know about 3300 words, according to the measure used. He also noticed a slower period of lexical growth between the second and fourth year of teaching. Milton and Meara (1998) report on a comparative study of French foreign language learners in Britain and EFL learners in Germany and Greece, finding a passive vocabulary of about 800 French words on average after four years of study, that is, a learning rate of some 200 words per year. All learners in their study appeared to gain between three and four words per contact hour on average, in line with other studies reviewed by Milton and Meara. These studies used vocabulary tests that assessed receptive knowledge as this is easiest to elicit (Fitzpatrick, 2007). But Waring (1997, cited in Daller, Milton and Treffers-Daller, 2007) claims that productive vocabulary is only about 50% of receptive vocabulary. As a result, even though measuring receptive vocabulary is interesting for other purposes and it is linked to productive vocabulary (Laufer and Paribakht, 1998), it is not an accurate indication of a learner's productive abilities.

Currently we have few independent descriptions of vocabulary use during semi-spontaneous oral production amongst instructed foreign language learners. We do not have a detailed picture of learners' access to their L2 lexicon when faced with an unprepared oral task with an unknown interlocutor, a context frequently adopted to represent communicative competence. Such descriptions have the potential to inform teaching, at both a macro-level (syllabus design, assessment criteria) and micro-level (classroom practice). Nation (2007) and Read (2000) also propose that we need to measure vocabulary in use (e.g. having a conversation about a holiday) to gain a complete picture of the learners' vocabulary. This type of task makes it possible to see how varied a learner's vocabulary is when the learner is carrying out an activity whose apparent sole purpose is not to assess vocabulary knowledge. Nation (2007) points out that if a learner is asked to write a piece on (or to talk about) his/her latest holidays, s/he is unlikely to be aware that lexical use will be the focus of the assessment. In contrast, when a learner answers a yes/no vocabulary test, s/he knows that vocabulary is the focus of the researcher's agenda. This issue is linked to incidental vocabulary learning (Gass, 1999). Learning lexical items when this is not the target of the activity (i.e. incidental learning) is a well-documented phenomenon. Producing lexical items when it is not the focus of the task could be referred to as incidental productions. Gass (1999: 322) suggests that learners are more likely to learn words incidentally. Similarly, we could argue that incidental productions are likely to be richer/larger than intentional productions. Although the aim of this study is not to compare the results that different tasks could provide, the oral task that will be used in this research calls upon incidental productions.

In terms of vocabulary in use, there has been considerable attention given to assessing learners' use of vocabulary in writing (see Malvern, Richards, Chipere and Durán, 2004, and see Fitzpatrick, 2007 for discussion). Nevertheless, little attention has been given to oral production. One of the reasons for the lack of studies is that it remains difficult to assess precisely the breadth and depth of the lexical knowledge of language users. Tidball and Treffers-Daller (2007) explore different measures of vocabulary richness in L2 university-level learners of French through a cartoon-based story telling task. They showed significant differences between learners at different levels using several different measures as well as correlations between lexical scores and general proficiency scores. They claim that the different measures used (or at least those that did demonstrate a difference between levels) are all valid. These are: D, Guiraud index and Guiraud Advanced (D and the index of Guiraud are further described in the methodology section of this paper). In addition, D and Guiraud did correlate strongly with each other. Other studies based on oral semi-spontaneous tasks for L2 learners exist but these do not deal with learners of French (e.g. Daller and Xue, 2007). This is supported by Tidball and Treffers-Daller (2007) who say that very few measures of lexical diversity/richness have been tried out on learners of French.

As illustrated above, a number of recent studies have focused on measurement and assessment methods (e.g. Eyckmans, van de Velde, van Hout and Boers, 2007; Brown, 2003). Few L2 studies have focused on aspects similar to those present in the L1 literature. As mentioned earlier, one of the focuses of L1 research has been the nature of the developing lexicon. The L2 developing lexicon is much more of a mystery to researchers. In her longitudinal study about the development of narrative abilities in French, Myles (2003) concludes that the learners (aged between 12 and 15) who were the best at telling a detailed story were also those with the richest vocabulary and at the more advanced syntactic stage. She also claims that lexical chunks and nouns appear first whilst verbs (produced outside of lexical chunks) come later. However, this claim is not backed up by any lexical analyses (other than the number of verbs used). Hence the development of the nature of the L2 lexicon remains under-studied.

AIM AND RESEARCH QUESTIONS

The aim of the study is to provide a developmental profile of the lexicon in secondary school students learning French by focusing on the kind of oral productive French vocabulary that is under-represented in the current literature. The corpus data used span five different school years.

In view of the present aim and the background literature, we identified the following research questions which this paper will attempt to answer:

  • Does learners' productive lexical diversity increase significantly over the course of five years of instruction?

  • What is the nature of their developing productive lexicon? Are nouns and verbs developing at the same rate? Or do verbs appear later?

METHODS

Below we highlight the methods used to answer the previous research questions.

The corpus used for the current study

The data are taken from the French Learner Language oral corpora (FLLOC). This dataset, constructed by Myles, Mitchell and their research teams (e.g. Rule, Marsden, Myles and Mitchell, 2003), is publicly available to the research community (http://www.flloc.soton.ac.uk/). The corpus contains digital audio files, related transcripts formatted using the CHILDES software and conventions and files tagged for parts of speech (see Rule, 2004; Myles, 2005, 2007 and Myles and Mitchell, 2004 for discussions of the issues relating to transcription and analysis of oral L2 data). The corpus holds data from a series of cross-sectional studies from British students learning French (the youngest are in their first year of secondary school and the oldest are in their final undergraduate year at university) and native speakers. These students perform a range of semi-spontaneous oral tasks on a one-to-one basis with a researcher. These oral tasks can be used to investigate a range of issues: from the emergence of aspects of morphosyntax to lexical development, learners' use of formulaic language, or aspects of discourse development (both monologic and dialogic). Here, we concentrate on lexical development.

Participants

This study describes and compares lexical richness as measured during an oral semi-guided conversation amongst learners of French in Years 9, 10, 11, 12 and 13 of the British school system. Students are aged 13 to 14 in Year 9 and 17 to 18 in Year 13. There are 20 learners in each year group. Most of the data is cross-sectional with the exception of Year 12 and 13. Those learners were tested in Year 12 and once again, a year later, in Year 13. Therefore, the total number of different learners is 80. In Year 9 (their third year of classroom learning) students will have received about 150 hours of instruction. By Year 13, they will have had a maximum of approximately 600 hours. The learners were all tested between December and March (so half-way through the school year). The learners are from different state schools in the U.K. Year 12 and 13 learners are from the North East of England and Years 9 to 11 are from the South (the area in and around Southampton).

The oral task

All participants carried out the same task which involved a conversation about a set of six different photos including questions relating to past, current and future activities. This task takes the form of a one-to-one semi-structured interview in French between individual learners and members of the research team. The task is in two parts. In the first part, the learners are shown two separate sets of stimulus photographs representing young people doing various activities and they have to find out as much information as they can about the young people shown in the pictures, the location, and so on, by asking questions. This task is therefore referred to as the Photos task. In the second part, the researcher asks the learner a range of questions about their current interests, their family life and (for the most advanced learners) their past holidays, and their plans for the future. For this second part, the photos only serve as a starting point to the conversation and the discussion is not solely based on them. This task is a combination of what Laufer and Paribakht (1998) call controlled active and free active tasks. The first part is controlled as the learners are told to ask questions about the pictures. Consequently, they are limited to the context of the pictures. The second part, however, is a free active one as they are more or less free to talk about whatever they like within the remit of the researcher's vague questions.

Unit of analysis

It is essential to define the unit of analysis that is used in this paper. According to Richards and Malvern (2007), this is one of the most crucial decisions to be taken by a researcher investigating vocabulary diversity. The ‘quick and dirty’ method (Richards and Malvern, 2007: 88) adopted by many researchers who do not define what counts as a word is not sufficient. In terms of productive vocabulary use, it has been shown by Vermeer (2004) that the lemma is the most valid unit of counting. Lemma will be defined here in the morphological sense of the word, as the canonical form of a word or lexeme. For example, in English, the lemma go represents the inflected forms go, goes, going, went and gone. Lemmas are especially significant in highly inflected languages such as French. There are several reasons for using lemmas as our unit of analysis. Firstly, lemmas will allow us to minimise transcription inconsistencies regarding inflected forms in particular. For example, learners often mispronounce forms such as un/une or petit/petite. These mispronunciations mean that the transcribers have to make decisions as to the best way of coding them. By using lemmas as the unit of counting, the problem will be eliminated. Secondly, using lemmas rather than words means that the data and transcripts have to be carefully prepared and reliability is, therefore, increased. Finally, using lemmas as the unit of counting allows, for example, for different collocations and grammatical constructions to be counted. Nation (2007) stresses that if a researcher were to use another classification, word families for example, these aspects of the lexicon would be masked. Wrong units of analysis could also lead to overestimation or underestimation of the learners' lexicon. An example would be a form-based word family where lexical items such as famille, familier, familiarité, etc. would all be counted as one.

Consequently, different morphological forms of the same stem (e.g. regardez and regarde, or le, la and les) were counted as one lemma. Words that have different meanings (e.g. avocat = avocado or lawyer/solicitor) or different grammatical functions depending on the context were also counted as one single lemma. For example, the word que can be a conjunction or a relative pronoun. No difference was made between those two uses and que was counted as one lemma. Derivational morphological forms (such as apprendre and apprenant), however, were counted as different lemmas. We did not take into account grammatical inaccuracies as the aim of the paper relates to vocabulary and not grammatical development. Fillers (ah, euh), imitations of the researcher's utterances or words, as well as words in languages other than French, were excluded. Proper nouns (e.g. of geographical areas and people's names) were also excluded from the analyses.

  • *P34: et ils font euh du scuba?

  • *JUL: ils font de la plongée.

  • *P34: de la plongée ou.

In the example above, euh was excluded (as a filler), as was de la plongée, since the learner is imitating the researcher's utterance and did not appear to have prior knowledge of the expression. In cases of repetitions and retracings (repetitions with corrections), only final repairs were counted.

  • *P47: quel a faire leur [//] le garçon?

In the example above, leur was excluded and only le was counted.
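The exclusion rules for fillers, repetitions and retracings can be illustrated with a minimal Python sketch. It is not the procedure used in the study, which relied on CLAN and on manual decisions. The function name, the filler list and the regular expressions are assumptions made for illustration, and the sketch does not lemmatise, nor does it detect imitations, proper nouns or non-French words.

```python
import re

# Assumption: a small illustrative filler list; the study names only "ah" and "euh".
FILLERS = {"ah", "euh", "ehm", "um", "oh"}

def countable_tokens(utterance: str) -> list[str]:
    """Return the word forms left after dropping fillers and retraced material."""
    # Drop the speaker tier, e.g. "*P47:".
    text = re.sub(r"^\*\w+:\s*", "", utterance)
    # Drop material that is repeated ([/]) or corrected ([//]): either an
    # <angle-bracketed group> or the single word before the code, so that
    # only the final repair is kept.
    text = re.sub(r"(<[^>]*>|\S+)\s*\[/{1,2}\]", " ", text)
    # Drop punctuation and the pause marker "#".
    text = re.sub(r"[?.!,#]", " ", text)
    return [t for t in text.lower().split() if t not in FILLERS]

print(countable_tokens("*P47: quel a faire leur [//] le garçon?"))
```

On the retracing example above, the sketch keeps le and drops leur. The remaining forms would still need to be mapped to lemmas before counting types.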

Measure used

As outlined before, there is a growing literature (see Daller, Milton and Treffers-Daller, 2007) on the quest for the best measure in terms of assessing a learner's lexical development and, in particular, lexical diversity. This paper focuses on one measure in particular, D, to assess lexical diversity in the production of the learners described above.

Probably the most common measure used is the Type-Token Ratio (TTR), the ratio of different words (Types) to the total number of words (Tokens). However, there is well-reported controversy over this measure (see e.g. discussion in Daller et al., 2007), as it does not account for the fact that the longer someone speaks (or writes) for, the less varied their language is likely to be, thus misrepresenting some learners' lexical richness. The texts (or transcripts) in our corpus are of very varied length as students tend to speak more as their proficiency increases. Thus, comparing students using TTR would prove unreliable. The Guiraud (1954) index is the ratio of types to the square root of tokens (Types/√Tokens). It is one of the alternatives to TTR put forward to minimise the impact of text length. Introducing the square root is equivalent to multiplying TTR by √N (the square root of the number of word tokens). However, this measure does not resolve the issue of text-length dependency (see Malvern et al., 2004, for a discussion).
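The two ratios can be made concrete with a short Python sketch. The function names and the five-token sample are hypothetical and serve only to show the arithmetic, including the way both values drop when the same material is simply produced twice.

```python
import math

def type_token_ratio(tokens: list[str]) -> float:
    """TTR: number of different items (types) over total items (tokens)."""
    return len(set(tokens)) / len(tokens)

def guiraud_index(tokens: list[str]) -> float:
    """Guiraud index: types over the square root of tokens (TTR times sqrt(N))."""
    return len(set(tokens)) / math.sqrt(len(tokens))

# Hypothetical lemmatised sample: 4 types, 5 tokens.
sample = ["le", "garçon", "regarder", "le", "chien"]
longer = sample * 2  # same vocabulary, twice the length

print(type_token_ratio(sample), type_token_ratio(longer))
print(guiraud_index(sample), guiraud_index(longer))
```

Doubling the sample adds no new types, so TTR halves and the Guiraud index also falls, although less steeply. This is the text-length dependency that motivates the use of D.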

Consequently, the measure used here will be D. D is a measure of lexical diversity created to avoid the inherent flaws in raw TTR and other mathematically related measures (like Guiraud). The approach taken is based on an analysis of the probability of new vocabulary being introduced into longer samples of speech (or writing). D uses random sampling of tokens in plotting the curve of TTR against an increasing number of tokens (see McKee, Malvern and Richards, 2000 for a more detailed description of the program). D has three main advantages: it is not text-length dependent; it uses all of the data in a single text (transcript); and it is more informative than TTR ‘as it is based on the TTR versus token curve calculated from data for the transcript as a whole, rather than a particular TTR value on it’ (MacWhinney, 2000). The measure has been validated across a wide range of language learners (Malvern et al., 2004). D has been integrated within CLAN (Computerised Language Analysis program available through CHILDES at http://childes.psy.cmu.edu/) and is computable through the VOCD program. Recent criticisms of D have emerged. In particular, McCarthy and Jarvis (2007) claim that D is affected by text length. However, they conclude by saying that D remains a very useful measure and that, even if researchers need to use it with caution, ‘D is undoubtedly a better performer than most alternative indices’ (McCarthy and Jarvis, 2007: 480).
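The curve-fitting idea behind D can be sketched in Python as follows. This is an illustration under stated assumptions, not the VOCD program itself. The model equation TTR = (D/N)(√(1 + 2N/D) − 1), the sample sizes of 35 to 50 tokens and the 100 random draws per size follow published descriptions of vocd as far as we are aware, and the grid search, function names and toy data are ours.

```python
import math
import random

def model_ttr(d: float, n: int) -> float:
    """Assumed theoretical TTR for a sample of n tokens given diversity d."""
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)

def estimate_d(tokens: list[str], sizes=range(35, 51), trials=100, seed=0) -> float:
    """Fit d to the empirical TTR-versus-sample-size curve (simple grid search)."""
    if len(tokens) < max(sizes):
        raise ValueError("the transcript is too short for the sampling range")
    rng = random.Random(seed)  # fixed seed so that the sketch is reproducible
    observed = []
    for n in sizes:
        # Mean TTR over random samples of n tokens drawn without replacement.
        mean_ttr = sum(len(set(rng.sample(tokens, n))) / n for _ in range(trials)) / trials
        observed.append((n, mean_ttr))

    def squared_error(d: float) -> float:
        return sum((model_ttr(d, n) - ttr) ** 2 for n, ttr in observed)

    # Assumption: candidate values from 0.1 to 200 in steps of 0.1.
    return min((k / 10 for k in range(1, 2001)), key=squared_error)

# Hypothetical transcripts of equal length: 10 types versus 60 types.
low = [f"w{i % 10}" for i in range(200)]
high = [f"w{i % 60}" for i in range(200)]
print(estimate_d(low), estimate_d(high))
```

Because every sample has the same size regardless of transcript length, the fitted value depends on how quickly new types appear rather than on how long the learner spoke. The more varied toy transcript therefore receives the higher value.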

Comparisons of different lexical categories

The present study includes comparisons of different lexical categories and the use of different types of words (nouns, verbs, adjectives, etc.). To do so, type/type ratios are used (see Malvern et al., 2004, for a discussion of different type/type ratios). We also calculated the Limiting Relative Diversity (LRD) measure, as proposed by Malvern et al. (2004). This is a type-type ratio that enables the study of the ratio of one category of words over another (e.g. nouns over verbs). The formula, in the case of the verbs/nouns ratio, is: $\mathrm{LRD} = \sqrt{D(\text{verbs})/D(\text{nouns})}$. This allows the researcher to examine the diversity of one word class compared to another word class and, as this measure is based on D, it is not a function of text length. One drawback of this measure is that it only works if the sample of each word class contains at least 50 tokens. Therefore, beginner learners who produce fewer than 50 verb tokens cannot be included in this measure.
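As a worked illustration of the formula, the following Python sketch computes LRD from two D values and applies the minimum sample size. The function name, the exact treatment of the 50-token threshold and the input values are hypothetical and are not taken from the data.

```python
import math

MIN_TOKENS = 50  # assumption: each word class needs at least this many tokens

def limiting_relative_diversity(d_verbs, d_nouns, verb_tokens, noun_tokens):
    """Return sqrt(D(verbs) / D(nouns)), or None when a sample is too small."""
    if verb_tokens < MIN_TOKENS or noun_tokens < MIN_TOKENS:
        return None
    return math.sqrt(d_verbs / d_nouns)

# Hypothetical learner: D = 10 for verbs and D = 40 for nouns.
print(limiting_relative_diversity(10.0, 40.0, verb_tokens=60, noun_tokens=120))
# Hypothetical beginner with only 30 verb tokens: no value can be computed.
print(limiting_relative_diversity(10.0, 40.0, verb_tokens=30, noun_tokens=120))
```

A value below 1 indicates that verbs are less diverse than nouns, and a value above 1 indicates the reverse.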

RESULTS

In this section we will first describe the overall development of the productive lexicon of the learners. Secondly, we will present an analysis of the types of words produced.

Overall developmental trend

The first developmental trend worth mentioning is the increase in the number of tokens in the productions of the learners. Figure 1 shows the apparently ever-increasing number of types and tokens from Year 9 until Year 13.

Figure 1: Mean number of types (number of different lemmas) and tokens (total number of lemmas) produced.

Performing the same semi-spontaneous task, learners do appear to produce more and more types and tokens. This might not be surprising as far as the number of tokens is concerned, but it is less obvious for the number of types. An ANOVA shows that there are significant differences between the year groups for tokens, F (4, 95) = 8.475, p < 0.001, and for types, F (4, 95) = 17.216, p < 0.001. Further details of the post hoc test are found in Table 1.

Table 1. Post hoc test (Tukey) for types and tokens

*The mean difference is significant at the .05 level.

Table 2 shows the mean scores and standard deviations for D per year group. It demonstrates that the five groups differ from each other in predictable ways: productive lexical diversity increases from one year group to the next up to Year 12. T-tests reveal that there are significant differences between Years 9 and 10 (p < 0.05) and Years 11 and 12 (p < 0.05), and a marginally significant difference between Years 10 and 11 (p < 0.1). There are no significant differences between Years 12 and 13.

Table 2. Mean scores and standard deviation for D per year group

The developmental trend emerging from D is further illustrated in Figure 2. One aspect of that chart that needs to be explained is the apparent decrease of D between Years 12 and 13. First of all, the decrease is not statistically significant. But one reason for it might be that the students who did the task in Year 12 were the same students who were asked to do it again in Year 13. It would appear then that these students might not have been trying as hard as in the previous year. This part of the data collection process was longitudinal and not cross-sectional. Therefore, there could be a task effect there (Footnote 2).

Figure 2: Mean D score per year group.

The following example illustrates the level of vocabulary and language a Year 13 student produced during the oral task. In this example, the student recalls her Christmas holidays with very little prompting from the researcher. She uses adverbs, adjectives and conjunctions to coordinate her story as well as a range of verbs to describe different events.

  • *P31: ehm j' ai passé les vacances avec ehm ma famille.

  • *P31: ehm <c' était> [/] c' était absol(ument) [//] ab(solument) [//] absolument super.

  • *P31: euh le matin euh j'ai visité ma grand_mère.

  • *P31: et nous avons ouvrir des cadeaux et euh tous les choses euh pour la famille.

  • *P31: et nous avons <mangé le> [/] mangé le déjeuner là aussi.

  • *P31: ehm j' ai passé ehm le soir encore avec ma famille avec ma mère ehm et mon frère.

  • *P31: et euh <ça c' est> [//] ça c' était tout pour < le jour> [/] euh le jour de Noël.

  • *P31: mais ehm le jour après je [//] ehm mon frère est venu en Angleterre parce qu'il [/] il [/] euh il habite en Irlande.

  • *P31: alors il [/] il [/] il est venu en Angleterre.

  • *P31: et je [/] euh je passer le jour après <avec il> [//] avec lui.

The D values (as shown in Table 2) are relatively low compared with those found by Malvern et al. (2004). They found that students taking their oral examination at the end of Year 11 had mean D values of 56.9 (Malvern et al., 2004: 102). The students in the current study have a mean D value of 23.01 in Year 11 and 28.41 in Year 12. However, the difference in scores is most likely due to the fact that the present study used lemmas as the unit of analysis and Malvern et al. counted inflected forms as different forms. The present lemmatised D values are higher, however, than those found by Tidball and Treffers-Daller (2007) when they report mean values of 18.78 for level 1 students (i.e. first year university students of French with A level qualifications). One factor that could explain this difference is the nature of the task undertaken. Tidball and Treffers-Daller gave their students a story-telling task based on cartoon strips. As pointed out by Laufer and Paribakht (1998), free active and controlled active tasks can yield different results. The nature of story-based tasks (which we could class as controlled active) means that students repeat certain words as they attempt to describe things happening to the same characters, for example, thus limiting their active vocabulary. We are of course aware that it is not possible to compare absolute values of D as the elicitation materials were different in all three studies discussed here. The comparisons are used here purely to illustrate the importance of the elicitation material and of the effect of lemmatisation on D-values.

Types of words used (Footnote 3)

We have established that lexical diversity develops more or less constantly from Year 9 to Year 13. Our next step is to find out the composition of the productive lexicon.

We chose to run type-type ratios. We followed a method proposed by Kauschke and Hofmeister (2002) where types belonging to each word class are analysed as a proportion of all word types.
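The proportion of types per word class can be sketched in Python as follows. The function name, the part-of-speech labels and the tagged input are hypothetical, and the sketch assumes that each lemma has already been assigned a single word class.

```python
from collections import defaultdict

def type_proportions(tagged_lemmas: list[tuple[str, str]]) -> dict[str, float]:
    """Share of all word types that belongs to each word class."""
    types_by_class = defaultdict(set)
    for lemma, word_class in tagged_lemmas:
        types_by_class[word_class].add(lemma)  # repeated lemmas count once
    total_types = sum(len(lemmas) for lemmas in types_by_class.values())
    return {wc: len(lemmas) / total_types for wc, lemmas in types_by_class.items()}

# Hypothetical tagged output for a beginner who asks questions without verbs.
tagged = [
    ("nom", "n"), ("de", "prep"), ("garçon", "n"),
    ("nom", "n"), ("de", "prep"), ("fille", "n"),
    ("quel", "det"), ("âge", "n"), ("le", "det"), ("garçon", "n"),
]
proportions = type_proportions(tagged)
print(proportions["n"], proportions.get("v", 0.0))
```

In this toy input, four of the seven types are nouns and none are verbs, so the noun proportion is 4/7 and the verb proportion is zero.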

Figure 3 shows the mean percentage of noun and verb types for each year group. What this shows is that, in the overall increasing vocabulary, the proportion of nouns increases between Years 9 and 10 and thereafter it consistently decreases. This means that, after Year 10, a growing proportion of the words that are acquired are no longer nouns. The proportion of verbs, on the other hand, shows the opposite pattern: it decreases between Years 9 and 10, and learners then appear to have an increasingly larger proportion of verb types in their lexicon from Year 10 onwards. It is possible that, as well as the increasing proportion of verbs, other categories (e.g. adjectives) increase too. This should be the subject of further analyses. When comparing means, an analysis of variance shows that the difference in noun type proportions is statistically significant across the different year groups (F (4, 95) = 5.749, p < 0.001). There is also a significant negative correlation between the proportion of noun types and verb types (r = −0.315, p = 0.001). This means that as the proportion of noun types decreases the proportion of verb types increases. Overall, however, there are always more nouns used than verbs, confirming a noun bias in the early stages of lexical production. This appears to be in line with L1 acquisition data (see Kern, 2007, for data on French, and Caselli, Casadio and Bates, 1999, for cross-linguistic comparisons). Below are examples of two learners' productions in Years 9 and 13 respectively, representing an apparent move from a more ‘nouny’-type production to a more advanced production.

Figure 3: Mean noun types and verb types as a proportion of all word types per year group.

Example 1:

  • *P11: oh euh # nom de garçon?

  • *FLO: c'est David.

  • *P11: nom de fille?

  • *FLO: elle s'appelle Lisa.

  • *P11: um quel âge le garçon?

Example 2:

  • *P24: ehm pourquoi est ce qu'elles euh sont allées là?

  • *VIV: elles travaillent comme bénévoles oui.

  • *P24: ehm qu'est ce qu'elles ont fait là?

  • *VIV: alors ici ehm elles ont plongé.

  • *P24: ehm qu'est ce qu'elles veulent faire à l' avenir?

From Example 1, we can see that student P11 (in Year 9) makes no use of verbs and simply uses nouns to ask questions. In contrast, student P24 (in Year 13) does use verbs in the right places and in morphologically and syntactically complex utterances. It is important to note that type-type ratios are dependent on text length. Consequently, it is possible that this result is influenced by the growing mean number of types and tokens produced by the learners across the year groups.

Secondly, we use the Limiting Relative Diversity measure (LRD) for verbs over nouns to allow us to compare the diversity of nouns and the diversity of verbs within the productive lexicon of the learners.

Overall, as Table 3 shows, LRD indicates, for all year groups, that the noun category is (token for token) more diverse than the verb category (overall mean ratio verbs/nouns: 0.345). Furthermore, no trend is evident and there is no statistically significant difference across the groups (F(4, 43) = 0.101, p = 0.981). This means that the relative diversity of the two categories remains stable. However, as Table 3 also shows, LRD could only be calculated for a small number of students, as most did not produce enough (lemma) tokens (a minimum of 50 is required by the program). The lack of a pattern could therefore simply be due to a lack of data. Alternatively, it could be that new verbs and nouns are developed in the same way as those learnt previously, so that the relative diversity of the two categories remains stable.
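To make the verbs/nouns ratio more concrete, the following sketch assumes that LRD is the square root of the ratio of the two categories' D values, and that D is the parameter of the curve TTR = (D/N)[(1 + 2N/D)^(1/2) − 1]; both are the definitions usually attributed to Malvern et al. (2004) and are not stated in this section. The D values used are hypothetical, chosen only so that the ratio falls close to the reported overall mean of 0.345.

```python
import math

def ttr_model(n_tokens, d):
    """Type-token ratio predicted by the D model for a sample of n_tokens."""
    return (d / n_tokens) * (math.sqrt(1 + 2 * n_tokens / d) - 1)

def lrd(d_a, d_b):
    """Limiting relative diversity of category A over category B (assumed form)."""
    return math.sqrt(d_a / d_b)

# Hypothetical D values for one learner's verbs and nouns.
d_verbs, d_nouns = 10.0, 84.0

ratio_limit = lrd(d_verbs, d_nouns)

# For equal, very large token counts the ratio of the two predicted TTRs
# approaches the same value, which is where the word "limiting" comes from.
n = 10 ** 8
ratio_large_n = ttr_model(n, d_verbs) / ttr_model(n, d_nouns)

print(f"LRD (verbs/nouns): {ratio_limit:.3f}")
print(f"TTR ratio at N = {n}: {ratio_large_n:.3f}")
```

Under these assumptions, a ratio of 0.345 corresponds to a verb D of roughly 12% of the noun D (0.345² ≈ 0.119), which is one way of reading the statement that nouns are, token for token, more diverse than verbs.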

Table 3. Mean, standard deviation, minimum and maximum Limiting Relative Diversity (LRD) for each year group

DISCUSSION AND CONCLUSION

Previous studies have shown that the overall (receptive/passive) vocabulary of instructed learners of French increases (e.g. Milton, 2006). This study shows that second language learners' lexical diversity increases in a fairly constant positive trend from near-beginner stages through to more advanced stages. Their ability to use the increasing knowledge observed by others also increases. The results (as calculated by D) show clear progressive improvement in lexical diversity and range. It could be said that there is a ‘slow down’ in or around Years 10 and 11, as the difference found between those two groups was only marginally significant (at the 10% level, whereas the others were significant at the 5% level). However, this ‘slow down’ in lexical diversity does not occur at the same time as that observed by Milton (2006), for example, who finds slower growth in receptive vocabulary acquisition between the second and fourth year of teaching (i.e. Years 8 and 10). Clearly this will have to be further investigated in a study looking at both receptive and productive vocabularies. The largest increase in our data was found between Years 11 and 12, but this is to be expected as Year 12 learners are those who have opted to further their learning of French to an advanced level. There is no observed difference between Years 12 and 13. This could be due to the test repetition effect or it could be an indication of a ceiling effect on that particular task. This will need to be further analysed through comparisons with native speakers' productions on the same task.

As far as the composition of the lexicon is concerned, the two measures used to assess the use of verbs and nouns enable us to present a more detailed picture of the lexical diversity of learners. Throughout the study period, more noun types are used than verb types. This noun-bias is more pronounced in the earlier period of study (between Years 9 and 10). The weakening of the noun-bias between Years 12 and 13 is not due to an increase in the proportion of verb types, as this remains largely constant. This suggests that learners are using more types of words which are neither nouns nor verbs. Further analyses taking into account more parts of speech need to be carried out. In spite of the greater rate of increase in verb types, the relative diversity of verbs to nouns remains stable. This indicates that, in this task, learners vary the nouns they use to a larger extent than their verbs. It has been suggested in the literature that nouns are acquired first because the nouns children use label concrete, individual and enduring objects (e.g. Gentner, 1982). However, this argument is not completely valid as far as L2 learners are concerned. Myles (1995, 2004) suggests that verbs take longer to be acquired, and therefore used, as they require more processing. Knowing a verb involves knowing its argument structure, that is, knowing what kind of complements and subjects it requires. This is often too complex for beginner learners. There is little morphology on French nouns (compared to verbs), which may make them easier to process and then use. An alternative explanation would be that learners are mostly taught nouns, at least in the earlier stages. A further study controlling input would be useful to test this hypothesis.

The picture of productive lexical development we have provided is far from complete. It represents only one aspect of vocabulary (productive) and it is limited by the semantic fields relating to the pictures. As with any measure of productive lexical knowledge, there is no indication of the limits of our informants' knowledge. It would be interesting to compare the production of learners in light of the distinction Laufer and Paribakht (1998) make (i.e. controlled active versus free active). A different, more controlled type of elicitation task (e.g. story-telling) might help in that respect. This study has highlighted the issue of comparisons based on the same measure (D or any other measure of lexical diversity) using different task types. This topic warrants further research.

Footnotes

1 The research reported here is based on data collected during the FLLOC project (directed by Florence Myles and Ros Mitchell) funded by the UK Economic and Social Research Council (ESRC) award numbers R000223421, RES000220070, the Arts and Humanities Research Council (AHRC) RE-AN9057/APN-15456, AR112118 and the British Academy SG 41141 since 2001, at the University of Southampton and Newcastle University. Special thanks go to all of the participants and native speakers for their help with data collection and transcription. The author would like to thank: Sarah Rule for her help with checking the lemmatisation of the data, Florence Myles, and three anonymous reviewers for their constructive comments on this paper.

2 Or as one reviewer suggested, ‘informant fatigue’ where the learner behaves differently in response to the same task being put forward again (manifested by a lack of motivation in some cases).

3 In this section of the results, lemmas are still used as the unit of analysis but they have been classified according to their grammatical category in the given context. Hence, the word grand, for example, could be counted as either an adjective or a noun depending on the context.

REFERENCES

Bassano, D. (2000). Early development of nouns and verbs in French: exploring the interface between lexicon and grammar. Journal of Child Language, 27: 521–559.
Brown, A. (2003). Interviewer variation and the co-construction of speaking proficiency. Language Testing, 20.1: 1–25.
Caselli, C., Casadio, P. and Bates, E. (1999). A comparison of the transition from first words to grammar in English and Italian. Journal of Child Language, 26: 69–111.
Childers, J. B. and Tomasello, M. (2006). Are nouns easier to learn than verbs? Three experimental studies. In: Hirsh-Pasek, K. and Golinkoff, R. (eds.), Action Meets Word: How Children Learn Verbs. Oxford: Oxford University Press.
Clark, E. (1993). The Lexicon in Acquisition. Cambridge: Cambridge University Press.
Collentine, J. (2004). The effects of learning contexts on morphosyntactic and lexical development. Studies in Second Language Acquisition, 26.2: 227–248.
Daller, H., Milton, J. and Treffers-Daller, J. (eds.). (2007). Modelling and Assessing Vocabulary Knowledge. Cambridge: Cambridge University Press.
Daller, H. and Xue, H. (2007). Lexical richness and the oral proficiency of Chinese EFL students. In: Daller, H., Milton, J. and Treffers-Daller, J. (eds.), pp. 150–164.
David, A. (2004). The developing bilingual lexicon. Unpublished PhD thesis, Newcastle University.
David, A. (2007). Do story (re-)telling tasks really tell us something about lexical development? Paper presented at the Models and Concepts: practical needs and theoretical approaches in modelling and measuring vocabulary knowledge (ESRC seminar series), Swansea, UK, 6–7 July 2007.
Eyckmans, J., van de Velde, H., van Hout, R. and Boers, F. (2007). Learners' response behaviour in yes/no vocabulary tests. In: Daller, H., Milton, J. and Treffers-Daller, J. (eds.), pp. 116–132.
Fitzpatrick, T. (2007). Productive vocabulary tests and the search for concurrent validity. In: Daller, H., Milton, J. and Treffers-Daller, J. (eds.), pp. 116–132.
Gass, S. (1999). Discussion. Incidental vocabulary learning. Studies in Second Language Acquisition, 21: 319–333.
Gentner, D. (1982). Why nouns are learned before verbs: Linguistic relativity versus natural partitioning. In: Kuczaj, S. A., II (ed.), Language Development: Vol. 2. Language, Thought, and Culture. Hillsdale, NJ: Erlbaum, pp. 301–334.
Guiraud, P. (1954). Les Caractéristiques statistiques du vocabulaire. Paris: Presses Universitaires de France.
Hall, J., Paus, C. and Smith, J. (1993). Metacognitive and other knowledge about the mental lexicon: do we know how many words we know? Applied Linguistics, 14.2: 189–206.
Kauschke, C. and Hofmeister, C. (2002). Early lexical development in German: a study on vocabulary growth and vocabulary composition during the second and third year of life. Journal of Child Language, 29: 735–757.
Kern, S. (2007). Lexicon development in French-speaking infants. First Language, 27.3: 227–250.
Kroll, J. and Tokowicz, N. (2001). The development of conceptual representation for words in a second language. In: Nicol, J. (ed.), One Mind Two Languages: Bilingual Language Processing. Cambridge, MA: Blackwell, pp. 49–71.
Laufer, B. and Paribakht, T. S. (1998). The relationship between passive and active vocabularies: effects of language learning contexts. Language Learning, 48.3: 365–391.
MacWhinney, B. (2000). The CHILDES Project: Tools for Analyzing Talk, 3rd edn, vol. 2. Mahwah, NJ: Erlbaum.
Malvern, D., Richards, B., Chipere, N. and Durán, P. (2004). Lexical Diversity and Language Development: Quantification and Assessment. Basingstoke: Palgrave Macmillan.
McCarthy, P. and Jarvis, S. (2007). Vocd: a theoretical and empirical evaluation. Language Testing, 24.4: 459–488.
McKee, G., Malvern, D. and Richards, B. (2000). Measuring vocabulary diversity using dedicated software. Literary and Linguistic Computing, 15.3: 323–337.
Meara, P. and Milton, J. (2003). X_Lex The Swansea Levels Test. Newbury: Express.
Milton, J. (2006). Language lite? Learning French vocabulary in school. Journal of French Language Studies, 16.2: 187–205.
Milton, J. and Meara, P. (1998). Are the British really bad at learning foreign languages? Language Learning Journal, 18: 68–76.
Myles, F. (1995). Interaction between linguistic theory and language processing in SLA. Second Language Research, 11.3: 235–266.
Myles, F. (2003). The early development of L2 narratives: a longitudinal study. Marges Linguistiques, 5: 40–55.
Myles, F. (2004). From data to theory: the over-representation of linguistic knowledge in SLA. In: Towell, R. and Hawkins, R. (eds.), Empirical evidence and theories of representation in current research in Second Language Acquisition: Special Issue of Transactions of the Philological Society, pp. 139–168.
Myles, F. (2005). Interlanguage corpora and second language acquisition research. Second Language Research, 21.4: 373–391.
Myles, F. (2007). Using electronic corpora in SLA research. In: Ayoun, D. (ed.), Handbook of French Applied Linguistics. Amsterdam and Philadelphia: John Benjamins, pp. 377–400.
Myles, F. and Mitchell, R. (2004). Using information technology to support empirical SLA research. Journal of Applied Linguistics, 1.2: 169–196.
Nation, I. (2001). Learning Vocabulary in Another Language. Cambridge: Cambridge University Press.
Nation, P. (2007). Fundamental issues in modelling and assessing vocabulary knowledge. In: Daller, H., Milton, J. and Treffers-Daller, J. (eds.), pp. 33–43.
Read, J. (2000). Assessing Vocabulary. Cambridge: Cambridge University Press.
Richards, B. and Malvern, D. (2007). Validity and threats to the validity of vocabulary measurement. In: Daller, H., Milton, J. and Treffers-Daller, J. (eds.), pp. 79–92.
Rule, S. (2004). French interlanguage oral corpora: recent developments. Journal of French Language Studies, 14: 343–356.
Rule, S., Marsden, E., Myles, F. and Mitchell, R. (2003). Constructing a database of French interlanguage oral corpora. In: Archer, D., Rayson, R., Wilson, E. and McEnery, T. (eds.), Proceedings of the Corpus Linguistics 2003 Conference, Vol. 16. University of Lancaster: UCREL Technical Papers, pp. 669–677.
Tardif, T., Shatz, M. and Naigles, L. (1997). Caregiver speech and children's use of nouns versus verbs: a comparison of English, Italian and Mandarin. Journal of Child Language, 24: 535–565.
Tidball, F. and Treffers-Daller, J. (2007). Exploring measures of vocabulary richness in semi-spontaneous French speech. In: Daller, H., Milton, J. and Treffers-Daller, J. (eds.), pp. 133–149.
Verhallen, M. and Schoonen, R. (1998). Lexical knowledge in L1 and L2 of third and fifth graders. Applied Linguistics, 19.4: 452–470.
Vermeer, A. (2004). The relation between lexical richness and vocabulary size in Dutch L1 and L2 children. In: Bogaards, P. and Laufer, B. (eds.), Vocabulary in a Second Language. Amsterdam and Philadelphia: John Benjamins, pp. 173–189.
Waring, R. (1997). A comparison of the receptive and productive vocabulary sizes of some second language learners. Immaculata: The Occasional Papers at Notre Dame Seishin University, 94–114.