1. INTRODUCTION
Since inflectional morphology is semantically predictable, obligatory and rule-bound (Bybee 1985), it starts to occur early in child language (Brown 1973). Noun plural (pl) [Footnote 1] is one of the first inflectional categories to appear and develop in child language, typically in the child's second year of life (e.g. Slobin 1985a, b; Stephany 2002; Bleses et al. 2008b). However, for many languages it takes several years before the noun pl inflectional system is fully acquired (e.g. Berko 1958, Bybee & Slobin 1982, Marcus et al. 1992, Bybee 1995, Plunkett & Nakisa 1997, Clahsen, Aveledo & Roca 2002).
The frequency of different linguistic structures in the input to children seems to influence how the child processes and produces language, and which aspects of language he/she acquires first (Bybee & Hopper 2001), for example regarding lexical acquisition (e.g. Goodman, Dale & Li 2008). Drawing on earlier research, we can make predictions about when certain linguistic structures might appear in different languages on the basis of frequency, everything else being equal. But everything else is not always equal (Demuth 2007). For example, languages differ in morphological complexity and sound structure. Morphological richness (a measure of the number of overtly marked grammatical categories and of distinctions within a category) may have an effect on how the child is tuned into morphology so that an inflectional system that is morphologically rich is acquired earlier than a system with a more sparse inflectional morphology (Slobin 1985a, b; Bates & MacWhinney 1987; Dressler 2010; Xanthos et al. 2011). It is generally assumed that phenomena with a ‘one form – one function’ relationship are more easily acquired than phenomena where one form has many functions, or one function has many forms (Operating Principles, e.g. Slobin 1985a, b).
Studies have furthermore shown that agglutinating languages, in which an affix typically only expresses one grammatical category, lead to earlier acquisition of the system compared to fusional languages, where one affix typically expresses a combination of specific features, and the system is thereby less transparent (Argus 2009, Dressler 2010).
Although numerous studies have suggested that different factors, such as morphological richness, sound structure and frequency, have an impact on the acquisition of noun pl, we still do not know exactly how these factors interact and what specific impact they have on the acquisition. The factors differ across languages and across inflectional systems, and there is also individual variation across children, for example with regard to age of acquisition of inflectional markers, even among children acquiring the same language. However, studies have found several similarities in the order of acquisition of inflectional markers across children acquiring the same language (Brown 1973, Pizzuto & Caselli 1994).
The aim of the present study is to examine the development of noun pl in Danish children from first appearance to the age of 10 years. To get a fuller picture of Danish children's development of the noun pl inflectional category we compare different kinds of data (i.e. dictionary data, naturalistic spontaneous child language input and output, semi-naturalistic/semi-experimental data, experimental data and reported data, see Section 2). The focus of the present study is the impact of sound structure and input frequency on the development of noun pl inflection in Danish children. For general trends of Danish children's development of the noun pl inflectional system based on the same data, see Kjærbæk (2013). The study takes the point of view of the language-acquiring child. Thus we adopt a sound perspective, namely phonology. Reduction processes in Danish conspire to make the syllable structure opaque with few and vague cues for identifying the suffix boundaries due to final consonant weakening and schwa-reductions. This is a challenge when a child is to grasp the phonological and then the morphological structure of the language, and we therefore expect specific properties of Danish to be difficult to acquire. In particular, suffix boundaries are opaque in Danish child language input (Bleses, Basbøll & Vach 2011). [Footnote 2]
1.1 Danish noun plurals
The analyses presented in this study take their point of departure in the phonologically-based description of the Danish noun pl inflectional system presented in Basbøll, Kjærbæk & Lambertsen (2011). We present a brief summary here. Danish is an inflectional-fusional language. The singular indefinite (sg indf) form is the basic form of the noun, morphologically speaking, and the pl form of a noun can be formed from the sg in four ways: [Footnote 3]
- (1)
a. pl suffix: e.g. bil [biːʔl] ‘car’ – bil-er [ˈbiːʔlɐ] ‘cars’
b. No change: e.g. mus [muːʔs] ‘mouse’ – mus [muːʔs] ‘mice’
c. Stem change: e.g. mand [manʔ] ‘man’ – mænd [mɛnʔ] ‘men’
d. Stem change + pl suffix: e.g. fod [foðʔ] ‘foot’ – fødd-er [ˈføðʔɐ] ‘feet’
We do not regard zeroes as morphemes in unmarked members of morphological categories – and especially not a sequence of morphological zeroes, as in kat ‘cat’ for sg + indf + non-possessive; see also Basbøll (2009). In this paper, we operate with a zero suffix, as in for example pl mus ‘mice’ (the marked member of the category of number), but not in sg mus ‘mouse’ (the unmarked member). The reason is that sg never has an overt suffix whereas pl does, in the large majority of cases. We also distinguish between overt pl suffixes and zero suffix since zero is different from a non-null suffix in being inaudible.
In Danish there are, phonologically speaking, two overt pl suffixes (a-schwa, as in bil [biːʔl] ‘car’ – bil-er [ˈbiːʔlɐ] ‘cars’, and e-schwa, as in blik [bleg] ‘gaze’ – blikke [ˈblegə] ‘gazes’), and a zero suffix (e.g. mus [muːʔs] ‘mouse’ – mus [muːʔs] ‘mice’). [Footnote 4] Table 1 shows the lexical frequencies of the pl suffixes in Danish according to our purely phonologically/phonetically-based categorization of the suffixes (with no attention paid to morphophonology, see Basbøll et al. 2011). Lexical frequencies are here taken from the OLAM database, which is our computational linguistic coding and analysis system for Danish. [Footnote 5] The pl suffix a-schwa is clearly dominant (lexical frequency 87.8%), whereas the zero suffix (8.3%) and the e-schwa suffix (3.9%) are infrequent.
The two overt native pl suffixes (a-schwa, e-schwa) consist of neutral – i.e. non-full – vowels and thus they constitute a natural phonological class (since they are the only neutral vowels of Danish), i.e. they can be defined by a shared set of distinctive features that are not found together in any other segments. The e-schwa suffix is, furthermore, very often reduced – assimilated or dropped – and it thereby contributes to opacity of the phonetic structure, as in for example hus-e ‘houses’ when pronounced [ˈhuːːs] (which is disyllabic) instead of [ˈhuːsə], and bjørn-e ‘bears’ when pronounced [ˈbjɶːn] or [ˈbjɶ] instead of [ˈbjɶnə].
Appendix Tables A1 and A2 show the 23 pl markers in Danish according to Basbøll et al. (2011). Each pl marker combines a pl suffix and a specific stem change (including no change). Items 7 and 8 are not pl markers in the strict sense. Noun pl forms with insertion of /r/ and /n/, as in fætter [ˈfɛdɐ] ‘cousin’ – fætre [ˈfɛdʁɐ] ‘cousins’ and øje [ˈʌjə] ‘eye’ – øjne [ˈʌjnə] ‘eyes’, have two possible analyses according to the principles we adopt. They can be considered as having a non-null pl suffix, i.e. a-schwa and e-schwa, respectively, combined with the phonemic stem change and syncope; this analysis is used in Laaha et al. (2011). Alternatively, they can be considered as having a zero pl suffix, in which case the segmental stem change (insertion of /r/ or /n/) is the only overt pl marker; this is the analysis chosen in the present paper (as in Basbøll et al. 2011).
It is clear from Appendix Table A1 that the pl marker ‘ɐ’ has the highest lexical frequency in Danish (35.4%), followed by ‘ɐ+’ (i.e. ɐ-suffix with stød addition, 20.1%), ‘(ə)ɐ’ (i.e. /ɐ/ with apocope of stem-final /ə/, 13.1%) and ‘ø’ (6.1%). Stød is a laryngeal syllable rhyme prosody with a grammatically complex distribution, see Basbøll 2005:82–87. The other nineteen pl markers are rare (0.005%–2.1%). Of the nouns, 17.5% have no pl forms, i.e. they only occur in sg.
Danish pluralization is transparent where a suffix is just added to the sg stem (with no stem change). It is on the other hand opaque when the stem changes from sg to pl (e.g. mand [manʔ] ‘man’ – mænd [mɛnʔ] ‘men’) without the addition of a suffix. Stem change is thus a factor which contributes to making the pl formation more opaque and the acquisition more complex.
1.2 Degrees of stem change
In this study we distinguish between three degrees of stem change. The first one is no change, where the pl formation involves no phonological change of the pl stem compared to the sg stem, e.g. banan [baˈnæːʔn] ‘banana’ – bananer [baˈnæːʔnɐ] ‘bananas’. In the category no change we include apocope, as in bamse [ˈbɑmsə] ‘teddy bear’ – bamser [ˈbɑmsɐ] ‘teddy bears’, a stem change which is automatic (when the first vowel is e-schwa) and thus with no alternative: *[ˈbɑmsəɐ] is an impossible structure. [Footnote 6]
The second one is prosodic change, where the pl formation involves a phonological change of the pl stem – compared to the sg stem – which is prosodic, i.e. involves syllabic and/or accentual structure: (i) stød addition, e.g. ballon [baˈlʌŋ] ‘balloon’ – balloner [baˈlʌŋʔɐ] ‘balloons’; (ii) stød drop, e.g. bord [boʔ] ‘table’ – borde [ˈboːɐ] ‘tables’; (iii) syncope, e.g. gaffel [ˈgɑfəl] [Footnote 7] ‘fork’ – gafler [ˈgɑflɐ] ‘forks’; and/or (iv) a combination of change in vowel length and a-quality, e.g. blad [blað] ‘leaf’ – blade [ˈblæːðə] ‘leaves’. We consider the change in (iv) prosodic because vowel length is prosodic in our view. Vowel length is unstable before vocoids (including [ð], which in Danish is not an obstruent), so the quality difference between [a] and [æ(ː)] is a signal for an underlying quantity – hence prosodic – difference.
The third degree of stem change is phonemic change, where the pl formation involves a phonological change of the pl stem – compared to the sg stem – which is segmental and non-automatic, i.e. ‘phonemic’. In other words, the sg stem and the pl stem differ with regard to (segmental) phonemes. This category contains (i) r-insertion, e.g. fætter [ˈfɛdɐ] ‘cousin’ – fætre [ˈfɛdʁɐ] ‘cousins’; (ii) n-insertion, e.g. øje [ˈʌjə] ‘eye’ – øjne [ˈʌjnə] ‘eyes’; and/or (iii) umlaut, e.g. mand [manʔ] ‘man’ – mænd [mɛnʔ] ‘men’.
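The three degrees of stem change can be read as an ordered scale. The following is a minimal Python sketch of that reading, not the coding scheme actually used in OLAM: the process labels, the `StemChange` type and the rule that a noun showing several processes receives the highest degree among them are all assumptions made for illustration.

```python
from enum import IntEnum


class StemChange(IntEnum):
    """Ordered degrees of stem change as described in Section 1.2."""
    NO_CHANGE = 0
    PROSODIC = 1
    PHONEMIC = 2


# Hypothetical ASCII labels for the processes named in the text.
PROCESS_DEGREE = {
    "apocope": StemChange.NO_CHANGE,           # automatic, counted as no change
    "stod_addition": StemChange.PROSODIC,
    "stod_drop": StemChange.PROSODIC,
    "syncope": StemChange.PROSODIC,
    "length_and_a_quality": StemChange.PROSODIC,
    "r_insertion": StemChange.PHONEMIC,
    "n_insertion": StemChange.PHONEMIC,
    "umlaut": StemChange.PHONEMIC,
}


def stem_change_degree(processes):
    """Return the highest degree among the processes (assumed tie-breaking rule).

    An empty list means the pl stem is identical to the sg stem.
    """
    return max((PROCESS_DEGREE[p] for p in processes),
               default=StemChange.NO_CHANGE)


# banan - bananer: no process at all
print(stem_change_degree([]).name)
# gaffel - gafler: syncope only
print(stem_change_degree(["syncope"]).name)
# mand - maend: umlaut
print(stem_change_degree(["umlaut"]).name)
```

Using an ordered enumeration makes it straightforward to compare a target form with a child's response, for instance to check whether an error moves towards a lower degree of stem change.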
1.3 Degrees of productivity of pl markers
We define productivity as the ability of the inflectional marker to occur on new words (Basbøll 2005:352). For the pl system this means the ability to add the pl marker – the term ‘pl marker’ refers to the 23 pl categories seen in Appendix Table A1 – to a new noun in order to create a pl form of this noun. We find the distinction between ‘regular’ and ‘irregular’ somewhat problematic. It is obvious that this distinction originates from studies of English, which is characterized by having one default inflectional marker for a grammatical category (e.g. the pl suffix -s) and a small number of exceptions to this default rule. But this is not the case for all languages, for example Danish (Basbøll et al. 2011) and even more so German (e.g. Laaha et al. 2006), which have several competing inflectional markers. In order to address this issue we have developed a scale with three degrees of productivity.
The first of these is fully productive pl markers where the pl formation involves addition of the a-schwa (/ɐ/) suffix without phonemic change (e.g. banan [baˈnæːʔn] ‘banana’ – bananer [baˈnæːʔnɐ] ‘bananas’, bamse [ˈbɑmsə] ‘teddy bear’ – bamser [ˈbɑmsɐ] ‘teddy bears’, baby [ˈbɛjbi] ‘baby’ – babyer [ˈbɛjbiːʔɐ] ‘babies’, sofa [ˈsoːfa] ‘sofa’ – sofaer [ˈsoːfæːʔɐ] ‘sofas’, [Footnote 8] bord [boʔ] ‘table’ – borde [ˈboːɐ] ‘tables’, gaffel [ˈgɑfəl] ‘fork’ – gafler [ˈgɑflɐ] ‘forks’).
The second one is semi-productive pl markers where the pl formation involves addition of the e-schwa (/ə/) suffix or the zero suffix without phonemic change (e.g. blik [bleg] ‘gaze’ – blikke [ˈblegə] ‘gazes’, bjørn [bjɶʔn] ‘bear’ – bjørne [ˈbjɶnə] ‘bears’, gamling [ˈgɑmleŋ] ‘oldie’ – gamlinge [ˈgɑmleŋ(ʔ)ə] ‘oldies’, blad [blað] ‘leaf’ – blade [ˈblæːðə] ‘leaves’, mål [mɔːʔl] ‘goal’ – mål [mɔːʔl] ‘goals’).
The third degree of productivity is unproductive pl markers where the pl formation involves phonemic change or addition of the foreign pl suffixes /s/, /a/ and /i/. [Footnote 9] Examples with phonemic change include bror [bʁo] ‘brother’ – brødre [ˈbʁœðʁɐ] ‘brothers’, ko [koːʔ] ‘cow’ – køer [ˈkøːʔɐ] ‘cows’, bonde [ˈbɔnə] ‘farmer’ – bønder [ˈbœnʔɐ] ‘farmers’, fætter [ˈfɛdɐ] ‘cousin’ – fætre [ˈfɛdʁɐ] ‘cousins’, datter [ˈdadɐ] ‘daughter’ – døtre [ˈdødʁɐ] ‘daughters’, finger [ˈfeŋʔɐ] ‘finger’ – fingre [ˈfeŋʁɐ] ‘fingers’, mand [manʔ] ‘man’ – mænd [mɛnʔ] ‘men’, øje [ˈʌjə] ‘eye’ – øjne [ˈʌjnə] ‘eyes’, gås [gɔːʔs] ‘goose’ – gæs [gɛs] ‘geese’. Examples with the foreign pl markers /s/, /a/ and /i/ include drink [dʁɛŋg] ‘drink’ – drinks [dʁɛŋgs] ‘drinks’, faktum [ˈfɑgtɔm] ‘fact’ – fakta [ˈfɑgta] ‘facts’, stimulus [ˈsdiːʔmulus] ‘stimulus’ – stimuli [ˈsdiːʔmuli(ːʔ)] ‘stimuli’.
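The productivity scale can be restated as a small decision rule over the pl suffix and the degree of stem change. The Python sketch below is only an illustration of the definitions in this section; the function name, the string labels and the order of the checks are assumptions, not part of the OLAM coding.

```python
# Hypothetical labels for suffixes and stem-change degrees.
NATIVE_SEMI = {"e-schwa", "zero"}
FOREIGN = {"s", "a", "i"}


def productivity(suffix, stem_change):
    """Classify a pl marker by degree of productivity.

    suffix:      'a-schwa', 'e-schwa', 'zero', or a foreign suffix 's', 'a', 'i'
    stem_change: 'none', 'prosodic' or 'phonemic'
    """
    # Phonemic change or a foreign suffix makes the marker unproductive.
    if stem_change == "phonemic" or suffix in FOREIGN:
        return "unproductive"
    # a-schwa without phonemic change is fully productive.
    if suffix == "a-schwa":
        return "fully productive"
    # e-schwa or zero without phonemic change is semi-productive.
    if suffix in NATIVE_SEMI:
        return "semi-productive"
    raise ValueError(f"unknown suffix label: {suffix!r}")


print(productivity("a-schwa", "none"))      # banan - bananer
print(productivity("a-schwa", "prosodic"))  # gaffel - gafler
print(productivity("e-schwa", "none"))      # blik - blikke
print(productivity("zero", "none"))         # mål - mål
print(productivity("zero", "phonemic"))     # mand - mænd
print(productivity("s", "none"))            # drink - drinks
```

Checking for phonemic change and foreign suffixes first mirrors the definition of the unproductive class, which takes precedence regardless of which native suffix is involved.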
1.4 Frequency measures
In this study we operate with three kinds of frequency measure in the analysis of our child language data. The first one is lemma frequency, which is the frequency of different lemmas, i.e. how many lemmas occur in a given category in relation to the number of all lemmas in the dataset. For example, the naturalistic spontaneous input data consist of 1,574 different noun lemmas (e.g. bold ‘ball’, kop ‘cup’, mål ‘goal’). Out of these noun lemmas, 456 occur in their pl form (e.g. kopper ‘cups’), which means that the pl nouns have a lemma frequency of (456/1,574) × 100, i.e. 29.0%.
The second frequency measure is (word form) type frequency, which is the frequency of different types (word forms), i.e. how many types occur in a given category in relation to the number of all types in the dataset. For example, the naturalistic spontaneous input data consist of 2,416 different noun types (e.g. kop, koppen ‘the cup’, kopper, kopperne ‘the cups’). Out of these noun types, 577 occur in their pl form (e.g. kopper, kopperne), which means that the pl nouns have a (word form) type frequency of (577/2,416) × 100, i.e. 23.9%.
The third frequency measure is token frequency, which is the frequency of actually occurring words, i.e. how many word tokens occur in a given category in relation to the number of all word tokens in the dataset. For example, the naturalistic spontaneous input data consist of 14,126 noun tokens (e.g. bolde ‘balls’, bolde, kop, kop, kop, kopper). Out of these noun tokens, 2,171 occur in their pl form (e.g. bolde, bolde, kopper), which means that the pl nouns have a token frequency of (2,171/14,126) × 100, i.e. 15.4%.
These three frequency measures are used throughout this study. When lexical frequency is given, it should be understood as the frequency of different lemmas in the Danish language – defined here as lexical entries in the OLAM database – in a given category in relation to all lemmas in the selected paradigm. For example, the pl marker ‘ɐ’ (e.g. in banan-er ‘bananas’, bil-er ‘cars’) occurs in 7,599 nouns and thus has a lexical frequency of (7,599/17,594) × 100, i.e. 35.4%, out of all 17,594 nouns.
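The lemma, type and token frequencies defined above are all simple percentages over different units of counting. The following Python sketch recomputes the three figures reported for the naturalistic spontaneous input data and applies the same measures to the toy token sample used in the token-frequency example. The helper names and the tuple layout `(word_form, lemma, is_plural)` are assumptions for illustration, and identifying a type by its word form alone is a simplification.

```python
def percentage(part, whole):
    """Share of `part` in `whole`, in percent, rounded to one decimal."""
    return round(part / whole * 100, 1)


# Counts reported for the naturalistic spontaneous input data.
print(percentage(456, 1574))     # lemma frequency of pl nouns
print(percentage(577, 2416))     # (word form) type frequency of pl nouns
print(percentage(2171, 14126))   # token frequency of pl nouns


def plural_frequencies(tokens):
    """Compute lemma, type and token frequency of pl nouns.

    tokens: list of (word_form, lemma, is_plural) tuples (hypothetical layout).
    """
    plural = [t for t in tokens if t[2]]
    return {
        "lemma": percentage(len({t[1] for t in plural}),
                            len({t[1] for t in tokens})),
        "type": percentage(len({t[0] for t in plural}),
                           len({t[0] for t in tokens})),
        "token": percentage(len(plural), len(tokens)),
    }


# Toy sample: bolde, bolde, kop, kop, kop, kopper
sample = [
    ("bolde", "bold", True),
    ("bolde", "bold", True),
    ("kop", "kop", False),
    ("kop", "kop", False),
    ("kop", "kop", False),
    ("kopper", "kop", True),
]
print(plural_frequencies(sample))
```

The toy sample shows how the three measures can diverge on the same data: both lemmas occur in the pl, two of the three word forms are pl, and half of the tokens are pl.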
1.5 Acquisition studies on Danish noun plurals
There are, to our knowledge, no published comprehensive studies on Danish children's acquisition of noun pl. We have succeeded in finding only a few sporadic results. The great Danish linguist Otto Jespersen described how Danish children tend to make errors when they are to produce irregular pl forms, e.g. *[ˈmɛnʔɐ] as an error form of mænd [mɛnʔ] ‘men’ (Jespersen 1923:100). Kim Plunkett conducted a longitudinal observational study of two Danish children, Anne and Jens (Plunkett 1985, 1986). According to Plunkett's study, the girl Anne started to produce noun pl forms at the age of 1;8 (i.e. one year and eight months) and the boy Jens started to produce noun pl forms around the age of 2;0 (Plunkett & Strömqvist 1992:524–525). According to the Danish cross-sectional Communicative Development Inventory study, 50% of Danish children have started to produce noun pl forms at the age of 2;1 (Bleses et al. 2007:117). Danish noun pl acquisition is treated as part of international projects in Ravid et al. (2008) and Laaha et al. (2011).
1.6 Predictions
The aim of the present study is to investigate the impact of sound structure and input frequency on Danish children's acquisition of noun pl. For this purpose we chose a multi-method research approach comparing (i) dictionary data, (ii) naturalistic spontaneous child language input and output, (iii) semi-naturalistic/semi-experimental data, (iv) experimental data, and (v) reported data. We will seek to answer the following research questions:
- (2)
a. Do aspects of the sound structure (phonetics, phonology and morpho-phonology) have impact on the acquisition of pl formation in Danish?
b. Does the lexical frequency have an impact on the input frequency, and does the input frequency have an impact on the output frequency, relating to the acquisition of pl formation?
We hypothesize that the transparency of the sound structure and input frequency have an impact on Danish children's acquisition of noun pl according to the principles that transparent forms are acquired before opaque ones and frequent forms are acquired before infrequent ones. Given our hypotheses and the above research questions, we make specific predictions, summarized in (3)–(6) at the end of this section, which we will seek to test in the present study. The predictions concern the acquisition of pl suffix (P1), pl stem (change) (P2) and pl marker (P3). Since the pl marker ‘ø’ (‘pure zero’, i.e. pl = sg) is special, a fourth prediction (P4) on pure zeroes is added.
A suffix which is subject to reduction – dropping or assimilation, according to phonological rules – contributes to opacity, i.e. makes the form less transparent, and is thus predicted to be acquired later than one which is not. Since the e-schwa suffix, but not the a-schwa suffix, is often reduced – dropped or assimilated (e.g. tov [tʌw] ‘rope’ – tove [ˈtʌw] ‘ropes’ rather than the distinct pl form [ˈtʌwə]) – we expect the a-schwa suffix to be acquired earlier than the e-schwa suffix. a-schwa is by far the most frequent pl suffix in the Danish lexicon, followed in frequency by zero and e-schwa suffix (Basbøll et al. 2011). We therefore predict the a-schwa suffix to be the most frequent pl suffix in child language input, accordingly also in child language output, followed in frequency by zero and e-schwa suffix. Furthermore, we expect the a-schwa suffix to have the highest number of correct responses and we expect it to be overgeneralized in both Task 1 and Task 2.
pl formation which involves change in the sequence of phonemes in the stem contributes to opacity and is thus predicted to be acquired later than pl formation with no stem change or with only deletion of a stem final e-schwa or change in word prosody. The rationale behind this prediction is that a prosodic pattern is a less inherent part of a lemma than its constituent segmental phonemes, in particular consonants. [Footnote 10] Since pl formation with change of phonemes in the stem has less transparency than pl formation without such a change, pl formation with umlaut, r-insertion and n-insertion should be late, as opposed to pl forms with other (i.e. prosodic) or no stem changes. No change is the most frequent ‘stem change condition’ in the Danish lexicon, followed in frequency by prosodic change and then by phonemic change (Basbøll et al. 2011). We therefore predict no change to be the most frequent ‘stem change condition’ in child language input and output, followed by prosodic change, which we expect to be more frequent than phonemic change. Moreover we predict the stem change error direction to go from phonemic change to prosodic change to no change, rather than in the opposite direction.
Productivity is in this study defined as the ability to add the pl marker to a new noun in order to create a pl form of this noun (see Section 1.3 above). We therefore predict Danish children to produce more correct pl forms of the fully productive pl markers than of the semi-productive, and more correct pl forms of the semi-productive than of the unproductive pl markers, in both Task 1 and Task 2. We predict the error direction in both tasks to go from unproductive to semi-productive to fully productive pl markers. Fully productive pl markers are the most frequent pl markers in the Danish lexicon, followed by the semi-productive and then by the unproductive pl markers (Basbøll et al. 2011). Therefore, we expect the fully productive pl markers to be the most frequent in child language input and hence also in child language output, followed by the semi-productive and then by the unproductive pl markers. Forms of pl dominant nouns are expected to be rote learned and thus to be acquired early compared to pl forms of sg dominant nouns. [Footnote 11]
Nouns taking the pl marker ‘ø’ (pure zeroes, sg = pl, e.g. mus ‘mice’ pronounced just like mus ‘mouse’) have high iconicity of the stem – not of the pl marker – in the sense that the identity of the stem is completely transparent, whereas the number is completely opaque in the isolated noun form. Reduction processes often make pl forms sound nearly like pure zeroes when the pl and sg are only slightly different. This leads us to predict overgeneralization of the pl marker ‘ø’, that is, we expect sg instead of pl to be a frequent error type among Danish children. Pure zero is not a very frequent morphological category in the Danish noun pl system, but since this category contains many child-relevant nouns, e.g. sko ‘shoe’, øre ‘ear’, ben ‘leg’, tog ‘train’, is ‘ice cream’, får ‘sheep’, mus ‘mouse’ (see also Appendix Table A1), we expect it to be a frequent category in spontaneous child language input and output.
(3) P1: pl suffixes
a. The a-schwa suffix will be acquired earlier than the e-schwa suffix.
b. The a-schwa suffix will be the most frequent suffix in child language input and output followed by the zero and e-schwa suffixes.
(4) P2: pl stems
a. pl forms with umlaut, r-insertion and n-insertion will be acquired later than pl forms with other (i.e. prosodic) or no stem changes.
b. No change will be the most frequent ‘stem change condition’ in child language input and output, then comes prosodic change and last phonemic change.
c. The stem change error direction will go from phonemic change to prosodic change to no change, rather than the opposite direction.
(5) P3: productivity of pl markers
a. Danish children will produce more correct pl forms of the fully productive pl markers than of the semi-productive, and more semi-productive than unproductive pl markers, in Task 1 and Task 2.
b. Fully productive pl markers will be the most frequent in child language input and hence also in child language output, then come the semi-productive and last the unproductive pl markers.
c. The error direction will go from unproductive to semi-productive to fully productive pl markers in both tasks.
d. Forms of pl dominant nouns will be rote learned and thus acquired early compared to pl forms of sg dominant nouns.
(6) P4: pure zeroes (pl = sg)
a. The children will overgeneralize the pl marker ‘ø’, i.e. sg instead of pl will be a frequent error type among Danish children.
b. The pl marker ‘ø’ (sg = pl) will be a frequent category in spontaneous child language input and output.
2. METHOD AND EMPIRICAL DATA
All children in the study were monolingual Danish-speaking children with no detected developmental or linguistic problems.
2.1 Dictionary data
The dictionary data come from the OLAM-search database consisting of about 43,000 lexical entries based on Gyldendals røde ordbog: Dansk udtale (Molbæk Hansen 1990) and include morphological and phonological information (see note 5).
2.2 Reported data
The reported data are based on an adaptation of an American instrument, the MacArthur–Bates Communicative Development Inventory (CDI; Fenson et al. 1993). This is a checklist which is completed by parents about their children's early communicative development. The CDI instrument consists of the two CDI reports, CDI: ord og gestikulation ‘CDI: Words and gestures’ (for children between the ages of 0;8 and 1;8 years, both perception and production) and CDI: ord og sætninger ‘CDI: Words and sentences’ (for children between the ages of 1;4 and 3;0 years, production only). In the CDI report CDI: ord og sætninger, parents are asked about their child's production of noun pl. In Section II B, the parents are asked to mark those of the nine pl forms listed (with semi-productive or unproductive pl markers) which their child uses: børn ‘children’, fødder ‘feet’, (flere) får ‘sheep (pl)’, heste ‘horses’, hunde ‘dogs’, (flere) mus ‘mice’, mænd ‘men’, skibe ‘ships’, (flere) sko ‘shoes’. In Section II C, the parents are asked to mark those of the 21 overgeneralizations supplied (pl error forms) of the seven inflected nouns listed, partly identical to the nine nouns just mentioned, which their child uses: børn, fødder, mænd, mus, sko, tænder ‘teeth’, tæer ‘toes’ (five with umlaut and two pure zeroes). Since the CDI data are in a written checklist format, it is not possible to study the children's pronunciation of the pl forms or to register prosodic stem change, like stød drop or stød addition; the distinction between no change and prosodic change is therefore not relevant for this kind of data.
Our corpus of reported data consists of the following:
- (7)
a. Cross-sectional CDI data from 6,112 randomly selected Danish children between the ages of 0;8 and 3;0 years (see Bleses et al. 2007; Bleses et al. 2008a, b).
b. Longitudinal CDI data from 182 randomly selected Danish children between the ages of 0;8 and 2;5 years (see Wehberg et al. 2007, 2008).
c. Longitudinal CDI data from two twin pairs (the same as those mentioned in Section 2.3; see Kjærbæk 2013 for a more detailed description):
i. a fraternal girl/girl twin pair (Ingrid and Sara) between the ages of 0;10 and 2;7
ii. a girl/boy twin pair (Cecilie and Albert) between the ages of 0;11 and 2;5
2.3 Naturalistic spontaneous child language input and output
Our corpus of naturalistic spontaneous child language input and output consists of the following:
- (8)
a. Data from the Odense Twin Corpus (OTC) (Basbøll et al. 2002; see Kjærbæk 2013 for a detailed description). The subpart used here consists of data from the two twin pairs described in (7c) above.
b. Data from the Danish Plunkett Corpus (DPC; Plunkett 1985, 1986), which consists of two singletons:
i. a girl (Anne) between the ages of 1;1 and 2;11
ii. a boy (Jens) between the ages of 1;0 and 3;11
The corpus is based on video and audio recordings of children interacting with their families in naturalistic settings – playing and dining situations – in their own home. The input is a mixture of child directed and adult directed speech, though the child is always present. The data are transcribed orthographically using the Child Language Data Exchange System (CHILDES) (MacWhinney 2000a, b) and coded morphologically and phonologically (according to the standard pronunciation) in OLAM (see note 5).
Table 2 shows the size of the corpus in raw numbers, with regard to word tokens, (word form) types and lemmas as well as noun tokens, noun types and noun lemmas. The input consists of 180,360 and the output of 40,987 word tokens.
2.4 Task 1: Elicitation through semi-structured interviews [Footnote 12]
Task 1 is a semi-naturalistic picture-based elicitation task formed as semi-structured interviews focusing on familiar routines. An investigator showed the child five pictures of e.g. a trip to the zoo and a birthday party while asking the child prepared questions for maximal elicitation of pl nouns, e.g. Hvad ser du når du går i zoologisk have? ‘What do you see when you go to the zoo?’. All recordings are transcribed orthographically in CHILDES and coded morphologically and phonologically – according to the standard pronunciation – in OLAM. All nouns are furthermore transcribed phonetically according to the actual pronunciation of the child.
Eighty children (41 girls, 39 boys) between the ages of three and nine years participated in Task 1. They all attended either kindergarten or primary school in a neighborhood in Odense with a middle/high socioeconomic population. Participants were divided into four age groups, consisting of 20 children each, with an almost equal number of boys and girls in each age group: three-year-olds (median age 3;5), five-year-olds (median age 5;4), seven-year-olds (median age 7;5) and nine-year-olds (median age 9;3). The children participating in Task 1 also participated in Task 2. Table 3 shows the size of the data. The whole dataset consists of 22,139 word tokens. The number of produced words and nouns increases from the three-year-olds to the seven-year-olds, then it decreases from the seven- to the nine-year-olds.
* Only common nouns are included (i.e. proper nouns are excluded); noun compounds are treated as distinct noun types.
2.5 Task 2: A picture-based elicitation task
Task 2 is a picture-based elicitation task inspired by Jean Berko's study on both real words and pseudo-words (Berko Reference Berko1958). Task 2 is based only on real words. The test material consists of 48 stimulus items. A complete list of the test items is given in Appendix Table A2, including information on pl marker, standard pronunciation and token frequency in child language input and output. The selected items were all easily imageable. In order to exclude rote learned pl forms, only nouns with low pl token frequency (N < 10 out of approximately 14,000 nouns) were chosen, with five exceptions (fingre ‘fingers’ (N = 81), mænd ‘men’ (N = 20), stole ‘chairs’ (N = 16), æbler ‘apples’ (N = 13), and øjne ‘eyes’ (N = 33)). Only items with an overt pl marker were included in the test, i.e. pure zeroes (e.g. mus [muːʔs] ‘mouse’ – mus [muːʔs] ‘mice’) were excluded because of the difficulty of distinguishing zero pl production from repetition of the sg form in a pl elicitation task. Since the pl suffixes /s/, /a/ and /i/ are very rare in child language, they have not been included in the experiment.
Children were tested orally and individually in their kindergarten/school. Each child was presented with a picture of an object whose name is an sg noun (e.g. bil ‘car’), and the investigator said: Her er en bil ‘Here is a car’. Then a second picture, of two instances of the same object, was shown to the child, and the investigator asked: Her er to hvad? ‘Here are two what?’, and the child's task was to provide the respective pl form. Test items were presented in different orders and were preceded by three training items.
A group of 160 children between the ages of three and ten years participated in Task 2. They had the same background as the participants in Task 1. Participants were divided into eight age groups, consisting of about twenty children each, with an almost equal number of boys and girls in each age group: three-year-olds (median age 3;5), four-year-olds (median age 4;8), five-year-olds (median age 5;4), six-year-olds (median age 6;6), seven-year-olds (median age 7;5), eight-year-olds (median age 8;5), nine-year-olds (median age 9;3), and ten-year-olds (median age 10;1). The 80 children in the three-, five-, seven- and nine-year age groups participating in Task 1 also participated in Task 2. The 80 children in the four-, six-, eight- and ten-year groups were recruited especially for Task 2. The two experiments were run in sequence, and when a child participated in both experiments, Task 2 was always first.
The children's responses were coded in OLAM by two independent researchers using a predetermined set of eight categories: (A) ‘Inaudible’; (B) ‘Other word/form’ (the child provided a lexical item or a morphological form other than the one intended, e.g. pige-r ‘girls’ instead of the pl søstr-e ‘sisters’); (C) ‘No answer’; (D) ‘sg’ (the child repeated the sg form given by the investigator); (E) ‘En/et ‘a/one’ + sg’ (the child repeated the sg form given by the investigator in an sg context); (F) ‘To ‘two’ + sg’ (the child repeated the sg form given by the investigator in a pl context); (G) ‘pl provided’; (H) ‘Missing’. The responses in category G were further coded in terms of correct or incorrect pl provided, correct or incorrect pl suffix provided, correct or incorrect pl stem provided, and coded phonologically for different types of errors regarding stem change and suffix.
2.6 Statistical analysis
We analysed the influence of pl suffix, stem change, and productivity of the pl marker, respectively, on the children's ability to produce the correct plural forms. This was done by multiple logistic regression controlling for age as well as the interaction of age and pl suffix, and token frequencies of pl and sg. The interaction as well as linearity of the covariates was tested using a Wald test. Age was treated as a discrete variable with values 3, 4, . . ., 10. The analyses for suffix and stem change were conducted in a similar manner, and these display the same overall picture. We only show the main effects from the fully adjusted models.
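The model described above can be sketched as an equation. The notation is ours and is only an assumption about how the model was parameterised, since the equation itself is not given; in particular, entering age and the token frequencies as categorical terms is an assumption. For child i and test item j, with Y = 1 for a correctly produced pl form:

```latex
\[
\operatorname{logit}\Pr(Y_{ij}=1)
  = \beta_0 + \alpha_{a(i)} + \gamma_{s(j)} + (\alpha\gamma)_{a(i)\,s(j)}
  + \delta_{p(j)} + \eta_{q(j)},
\qquad
\operatorname{logit}(\pi) = \ln\frac{\pi}{1-\pi}
\]
```

Here a(i) is the age (3 to 10) of child i, s(j) is the pl suffix of item j, and p(j) and q(j) are the pl and sg token frequency categories of item j. Under this reading, the exponentiated coefficients correspond to odds ratios of the kind reported in Section 3, and the models for stem change and productivity would replace the suffix term with the corresponding factor.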
3. RESULTS
3.1 Noun plurals in the Danish lexicon and in child language input and output
The distribution of pl markers in child language input differs from the one seen in the Danish lexicon (see Appendix Table A1, see also Kjærbæk Reference Kjærbæk2013). The pl markers ‘ɐ’ and ‘ɐ+’ have higher lexical frequencies (35.4% and 20.1%, respectively) than input lemma frequencies (22.1% and 4.4%, respectively), whereas the opposite applies for the pl marker ‘(ə)ɐ’ which has a lexical frequency of 13.1% and an input lemma frequency of 28.9%. The pl marker ‘ø’ (pure zero, sg = pl), likewise, has a higher frequency in child language input (17.7%) than in the Danish lexicon (6.1%).
Generally the distribution of pl markers is similar in child language input and output (see Appendix Table A1). The main difference is the fully productive pl marker ‘(ə)ɐ’, which is extremely frequent in child language output, and this leads to lower relative frequencies for the other pl markers. The pl marker ‘ɐ’ is the most frequent pl marker in Task 1 (25.7%), followed by ‘ø’ (23.7%) and ‘(ə)ɐ’ (21.3%). The remaining pl markers are not very frequent (0%–10.2%). This is not the exact pattern we found in spontaneous child language output where ‘(ə)ɐ’ clearly was the most frequent pl marker (token frequency 36.5%; see Appendix Table A1).
3.2 Semi-productive and unproductive pl markers in reported data
Figure 1 illustrates the results of the section in the CDI report Words and Sentences where parents are asked to mark those of the nine pl nouns listed that their child uses; the noun's pl marker is in parentheses. Figure 1 indicates that the pl form børn ‘children’ (øu) has the highest score, followed by hest-e ‘horses’ (ə), sko ‘shoes’ (ø), fødd-er ‘feet’ (ɐu), hund-e ‘dogs’ (ə–) (i.e. ə-suffix and stød drop), skib-e ‘ships’ (ə–), mus ‘mice’ (ø), får ‘sheep’ (ø) and mænd ‘men’ (øu) with the lowest score. pl dominant nouns (with more than 70% pl forms) score high, regardless of their pl marker, viz. børn (øu), sko (ø), whereas sg dominant nouns (with more than 70% sg forms) score low (except hest-e).
3.3 Correctly produced pl suffixes in Task 2
Figure 2 illustrates the proportion of correctly produced pl suffixes in Task 2. We see that the number of correctly produced pl suffixes increases with age. The younger children produce more correct pl suffixes when it comes to nouns taking the a-schwa suffix than nouns taking the e-schwa suffix, but this difference vanishes around the age of six (start of pre-school).
As shown in Table 4, the adjusted analysis confirms the general picture of Figure 2. The odds of producing a correct pl form are 50% lower for nouns taking the e-schwa suffix than for nouns taking the a-schwa suffix, and 75% lower for nouns taking the zero suffix than for nouns taking the a-schwa suffix. Compared to age three, the odds rise for each age group: 4–5 years Odds ratio = 2; 6–7 years Odds ratio = 4; 8, 9, and 10 years Odds ratio = 13, 16, and 18, respectively. Since the zero suffix category is here limited to forms with phonemic change, which are exceptional in the system, it is not representative of the large number of pure zeroes. The a-schwa and the e-schwa suffixes approach each other with age.
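Because statements such as "50% lower odds" are easily misread as "50% fewer correct answers", the following minimal sketch shows how an odds reduction translates into a probability. The baseline probability of 0.80 is purely hypothetical and chosen for illustration; only the 50% and 75% reductions come from the text, and the function names are ours.

```python
def odds_ratio_from_reduction(percent_lower):
    """Turn 'the odds are X% lower' into an odds ratio (e.g. 50 -> 0.5)."""
    return 1.0 - percent_lower / 100.0


def apply_odds_ratio(baseline_prob, odds_ratio):
    """Scale the odds of a baseline probability and convert back to a probability."""
    odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1.0 + new_odds)


# Hypothetical proportion of correct a-schwa plurals (NOT a figure from the study).
BASELINE = 0.80

or_e_schwa = odds_ratio_from_reduction(50)  # e-schwa vs a-schwa
or_zero = odds_ratio_from_reduction(75)     # zero suffix vs a-schwa

p_e_schwa = apply_odds_ratio(BASELINE, or_e_schwa)
p_zero = apply_odds_ratio(BASELINE, or_zero)

print(f"e-schwa: OR = {or_e_schwa:.2f}, implied probability = {p_e_schwa:.3f}")
print(f"zero:    OR = {or_zero:.2f}, implied probability = {p_zero:.3f}")
```

With this hypothetical baseline, halving the odds lowers the probability from 0.80 to about 0.67, not to 0.40; the size of the drop in probability depends on the baseline.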
3.4 Correctly produced pl stems in Task 2
Figure 3 shows the proportion of correctly produced pl stems in Task 2. The stem changes (including no stem change) seem to be acquired in roughly the following order: (i) no stem change (e.g. bil [biːʔl] ‘car’ – biler [ˡbiːʔlɐ] ‘cars’); (ii) syncope (e.g. gaffel [ˡgɑfəl] ‘fork’ – gafler [ˡgɑflɐ] ‘forks’); (iii) a-quality + vowel length (e.g. sofa [ˡsoːfa] ‘sofa’ – sofaer [ˡsoːfæːʔɐ] ‘sofas’); (iv) stød drop (e.g. ur [uʔ] ‘watch’ – ure [ˡuːɐ] ‘watches’); (v) stød addition (e.g. baby [ˡbɛjbi] ‘baby’ – babyer [ˡbɛjbiːʔɐ] ‘babies’); (vi) umlaut (e.g. mand [manʔ] ‘man’ – mænd [mɛnʔ] ‘men’); (vii) r-insertion (e.g. fætter [ˡfɛdɐ] ‘cousin’ – fætre [ˡfɛdʁɐ] ‘cousins’); and (viii) n-insertion (e.g. øje [ˡʌjə] ‘eye’ – øjne [ˡʌjnə] ‘eyes’).
It appears that the correctly produced pl stems fall into three categories: (i) no stem change, (ii) syncope, a-quality + vowel length, stød drop and stød addition, which are all prosodic stem changes, and (iii) umlaut, r-insertion, and n-insertion, which are all phonemic stem changes.
Figure 4 shows the proportion of correctly produced pl stems in Task 2 divided into the three degrees of stem change (no change, prosodic change, phonemic change). The children produce very few stem errors in the no change category, followed by prosodic change and phonemic change.
As shown in Table 5, the adjusted analysis gives a picture that corresponds to the crude proportions shown in Figure 4: the odds of producing a correct pl form are 99% lower for nouns exhibiting a prosodic change and 99.9% lower for nouns with phonemic change, both compared to nouns with no change. We also compared nouns with prosodic and phonemic change. The odds of producing the correct pl form for nouns with phonemic change are reduced by 91% (p < .001) compared to nouns with prosodic change. The interaction is significant (estimates not shown); thus the effect of stem change changes with age.
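As a rough consistency check (our own arithmetic, not part of the reported analysis), the two odds reductions relative to no change imply an odds ratio between phonemic and prosodic change. The sketch assumes that the rounded main-effect figures can simply be divided, which ignores both the rounding and the age interaction.

```python
def odds_ratio(percent_lower):
    """Odds ratio corresponding to 'X% lower odds' relative to the reference."""
    return 1.0 - percent_lower / 100.0


or_prosodic = odds_ratio(99.0)   # prosodic change vs no change
or_phonemic = odds_ratio(99.9)   # phonemic change vs no change

# Implied comparison of phonemic change against prosodic change.
implied_or = or_phonemic / or_prosodic
implied_reduction = (1.0 - implied_or) * 100.0

print(f"implied OR (phonemic vs prosodic) = {implied_or:.3f}")
print(f"implied reduction in odds = {implied_reduction:.1f}%")
```

The implied reduction of about 90% is close to the reported 91%; a small gap of this size would be consistent with the 99% and 99.9% figures being rounded.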
3.5 Correctly produced pl forms in Task 2 by productivity of the pl marker
Figure 5 shows the proportion of correctly produced pl forms by age and degree of productivity in Task 2. In the younger age groups, children produce more correct pl forms of nouns taking a fully productive pl marker compared to nouns taking a semi-productive pl marker, but they appear to coincide in the older age groups. On the other hand, unproductive pl markers have a much lower correctness rate in Task 2 compared to the other pl markers.
Table 6 shows the adjusted logistic regression of the outcome ‘correctly produced pl form’. The interaction is significant (estimates not shown); thus the effect of productivity changes with age. The impact of the covariates changes the picture in the adjusted analysis compared to the crude rates presented in Figure 5. We see that the odds of producing the correct pl form are reduced by 42% for items with semi-productive pl markers compared to fully productive pl markers, and by 92% for items with unproductive pl markers compared to fully productive pl markers. With respect to age, the odds increase with older age, especially when reaching school age, compared to the age of three years. Furthermore we observe that the effects of pl and sg token frequencies are somewhat similar in size. Compared to a pl token frequency of 0, we have a 1.9-fold increase in odds for frequencies between 1 and 9, a 3.7-fold increase for frequencies between 10 and 29, and a 2.7-fold increase for frequencies above 30. Compared to an sg token frequency of 0, the increases in odds are 3.4 for frequencies between 1 and 19, 3.3 for frequencies between 20 and 79, and 3.8 for frequencies above 80. Thus for both types of token frequencies something is better than nothing, but more is not necessarily better.
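The frequency categories and odds factors just described can be written as a small lookup, which makes the non-monotonic pattern easy to see. This is only an illustrative sketch: the names are ours, the treatment of the boundary values 30 and 80 (placed in the top category) is an assumption, and the example frequencies are the pl token frequencies quoted for the five high-frequency test items in Section 2.5.

```python
from bisect import bisect_right

# Lower bounds of the non-zero categories; category 0 is a frequency of 0.
PL_EDGES = [1, 10, 30]             # 0 | 1-9 | 10-29 | 30 and above (assumed)
PL_ODDS = [1.0, 1.9, 3.7, 2.7]     # fold change in odds relative to frequency 0
SG_EDGES = [1, 20, 80]             # 0 | 1-19 | 20-79 | 80 and above (assumed)
SG_ODDS = [1.0, 3.4, 3.3, 3.8]


def odds_factor(freq, edges, factors):
    """Return the fold change in odds for a token frequency, relative to 0."""
    if freq < 0:
        raise ValueError("token frequency cannot be negative")
    return factors[bisect_right(edges, freq)]


# pl token frequencies of the five exceptions listed in Section 2.5.
pl_frequencies = {"fingre": 81, "mænd": 20, "stole": 16, "æbler": 13, "øjne": 33}

for noun, freq in pl_frequencies.items():
    print(f"{noun}: N = {freq}, odds factor = {odds_factor(freq, PL_EDGES, PL_ODDS)}")
```

On these figures, fingre (N = 81) falls in a category with a smaller odds factor than stole (N = 16), which is the "more is not necessarily better" pattern.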
3.6 Incorrect responses
Figure 6 illustrates the produced pl error forms in Task 1 and Task 2. The responses are divided into four categories: (i) correct pl stem + wrong pl suffix, (ii) wrong pl stem + correct pl suffix, (iii) wrong pl stem + wrong pl suffix, and (iv) ø/sg form.Footnote 13 None of the children produced only sg forms. All age groups are collapsed because of the low number of examples in each age group in Task 1. The children produce a high percentage (22%) of correct stems + wrong suffixes in Task 1, but only 8% in Task 2. In both tasks the children produce only between 8% and 15% of the other error categories – except for the sg form, see Section 3.6.1 below. We see, furthermore, from Appendix Table A3 that the highest percentage of pl errors is with the pl marker ‘ɐu’ (50%), followed by ‘ə’ (28.9%), ‘øu’ (27.5%) and finally ‘ɐa+’ (20%). The remaining pl markers have few pl errors (0%–8.3%).
3.6.1 Overgeneralizations to the pl marker ‘ø’ (sg = pl) in Task 1 and Task 2
It can be seen from Figure 6 that the most frequent pl error form in both Task 1 and Task 2 is children producing an sg form instead of a pl form of the noun, i.e. what may be interpreted as an overgeneralization of the pl marker ‘ø’ (pure zero). In Task 1 they amount to 60%; in Task 2 to 64% of all error forms. This result has to be interpreted very cautiously, however, since it is, in the general case, impossible to distinguish an sg form from an incorrect zero pl form.
3.6.2 Overgeneralizations to pl markers other than ‘ø’ in Task 1 and Task 2
Figure 7 illustrates the results from a part of the cross-sectional CDI data where the parents are asked to mark those out of 21 pl error forms which seem similar to the ones that their child has been using lately. They include the pl error forms shown in (9):
- (9)
a. barn – børn ‘children’ (øu, i.e. up): *(flere) barn (*ø, i.e. sp), *barne (*ə, i.e. sp), *børne (*əu, i.e. up), *børner (*ɐu, i.e. up)
b. fod – fødder ‘feet’ (ɐu, i.e. up): *(flere) fod (*ø, i.e. sp), *fodde (*ə, i.e. sp), *fodder (*ɐ, i.e. fp), *fød (*øu, i.e. up), *fødde (*əu, i.e. up)
c. mand – mænd ‘men’ (øu, i.e. up): *(flere) mand (*ø, i.e. sp), *mande (*ə, i.e. sp), *mander (*ɐ, i.e. fp), *mænde (*əu, i.e. up), *mænder (*ɐu, i.e. up)
d. mus – mus ‘mice’ (ø, i.e. sp): *muse (*ə, i.e. sp), *muser (*ɐ, i.e. fp)
e. sko – sko ‘shoes’ (ø, i.e. sp): *skoe (*ə, i.e. sp), *skoer (*ɐ, i.e. fp)
f. tand – tænder ‘teeth’ (ɐu, i.e. up): *tander (*ɐ, i.e. fp)
g. tå – tæer ‘toes’ (ɐu, i.e. up): *tåer (*ɐ, i.e. fp)
According to the cross-sectional CDI data, children use very few of the error forms in the report. As illustrated in Figure 7, the most frequent of the 21 pl error forms are clearly the error forms *fodder (ɐu, i.e. up > *ɐ, i.e. fp) and *skoer (ø, i.e. sp > *ɐ, i.e. fp), which are both inflected with the fully productive pl marker (a-schwa suffix with no phonemic change). The most frequent error form is children choosing a more productive pl marker (a-schwa suffix) over a less productive one (increase of productivity). Children choose pure zero rather frequently too (decrease of productivity), see Section 3.6.1 above. The combination of the e-schwa suffix and umlaut is very rare. This agrees with the fact that, in the adult system, e-schwa is never combined with umlaut. Since the CDI data are based on checklist data, no prosodic stem changes (like stød drop or stød addition) could be registered, and the category ‘ə’ in Figure 7, for example, also represents ‘ə+’ as well as ‘ə–’; the relevant stem change distinction in the CDI data is, thus, with or without phonemic change.
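The notion of error direction used here can be made explicit with a small sketch. It assumes the ordering unproductive (up) < semi-productive (sp) < fully productive (fp); the function name and the labels "increase", "decrease" and "same" are ours, while the example forms and their productivity codes are taken from (9).

```python
# Assumed ordering of the productivity codes: up < sp < fp.
PRODUCTIVITY_RANK = {"up": 0, "sp": 1, "fp": 2}


def error_direction(target, produced):
    """Compare the productivity of the produced pl marker with that of the target."""
    diff = PRODUCTIVITY_RANK[produced] - PRODUCTIVITY_RANK[target]
    if diff > 0:
        return "increase"
    if diff < 0:
        return "decrease"
    return "same"


# (correct pl form, its productivity, error form, productivity of the error marker)
examples = [
    ("fødder", "up", "*fodder", "fp"),
    ("sko", "sp", "*skoer", "fp"),
    ("mænd", "up", "*mænder", "up"),
    ("mænd", "up", "*(flere) mand", "sp"),
]

for correct, target, error, produced in examples:
    print(f"{correct} > {error}: {error_direction(target, produced)} of productivity")
```

Under this coding, *fodder and *skoer both count as an increase of productivity, whereas *mænder keeps the same degree of productivity as the correct form.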
3.6.3 pl error forms of mand ‘man’
Figure 8 and Figure 9 illustrate the distribution of correct pl forms and five error types of the pl noun mand [manʔ] ‘man’ – mænd [mɛnʔ] ‘men’ (øu, i.e. up) in the longitudinal CDI data and Task 2, respectively.
If we compare the error forms of mænd in the cross-sectional CDI data (see Figure 7) to the longitudinal CDI data (see Figure 8), we find exactly the same pattern, namely that *mænder (*ɐu, i.e. up) is the most frequent pl error form of mænd, followed by *mander (*ɐ, i.e. fp), *(flere) mand (*ø, i.e. sp), *mande (*ə, i.e. sp) and last *mænde (*əu, i.e. up).
We see that in Task 2 (Figure 9) the three- and four-year-olds produce very few correct pl forms of mænd, but the number of correctly produced pl forms increases gradually until the age of ten, where 95% of the children produce the correct pl form. Increase of productivity from the pl marker ‘øu’ toward ‘ø’ (*mand, *ø, i.e. sp) is the most frequent error of this pl form in Task 2, followed by *mænder (*ɐu, i.e. up), *mande (*ə, i.e. sp), *mander (*ɐ, i.e. fp), *mænde (*əu, i.e. up). Again we find that the a-schwa suffix is much more frequent than the e-schwa suffix. And the error form *mænde, with umlaut and e-schwa, is very infrequent – it only occurs at age four and six – in agreement with the adult system where umlaut never combines with e-schwa.
The pl noun mænd does not occur in Task 1, and unfortunately it only occurs once in our corpus of spontaneous child speech. What is interesting, however, is that this one occurrence – produced by Jens at the age of 3;11 – is pronounced *mænder, which is the error form with a-schwa suffix and umlaut (*ɐu, i.e. up). Even though this error form has, according to our definitions (see Section 1.3 above), the same degree of productivity of the pl marker as the correct pl form mænd (øu, up), the pl suffix is the overt and very frequent a-schwa and not the covert and less frequent zero suffix. It seems an attractive option to keep the stem change but add the a-schwa suffix. The form *mænder (*ɐu, i.e. up) is also the most frequent pl error form of mand in both the longitudinal and cross-sectional CDI data and the second most frequent in Task 2.
4. DISCUSSION
Using a multi-method research approach of comparing results from different kinds of data (see Section 2 above), we have drawn a picture of the development of the noun pl inflectional category in Danish children. pl forms of nouns emerge early, typically around the age of two years or even earlier, but the noun pl inflectional system is still not fully acquired at the age of ten years. In our purely phonetics/phonology-based analyses, we have found striking common patterns across data types.
The large differences in the distribution of pl markers in the Danish lexicon compared to child language input and output are most likely due to the productivity of the pl markers ‘ɐ’ and ‘ɐ+’, which implies that these pl markers are added to new words, including the majority of derivatives and foreign loan words. These derivatives and foreign loan words are used very little in spontaneous speech and especially in child language. We have not been able to identify a clear default pl marker in Danish children's noun pl formation, but rather a number of competing pl markers (see P3 in Section 1.6 above). We now turn to a discussion of each of our predictions P1–P4, established in Section 1.6, in the light of our results.
4.1 P1: pl suffixes
The results of the study show that the pl suffix a-schwa has the highest frequency of correct pl forms in Task 2 for lower age groups (see Figure 2), and the study therefore indicates that the a-schwa suffix is acquired before the e-schwa suffix (and zero suffix).Footnote 14 The results furthermore show that the pl suffix a-schwa is the most frequent pl suffix in naturalistic spontaneous child language output as well as in Task 1 (see Appendix Table A1). Additionally, 18% of all error forms in Task 1 involve increased productivity with the addition of the fully productive a-schwa suffix (Appendix Table A3). These results support our first prediction (P1). It cannot be decided, though, whether the explanation is rooted in sound structure – a suffix which is subject to reduction (dropping or assimilation), viz. e-schwa, is less transparent and thus acquired later than a suffix which is not, viz. a-schwa – or frequency: the a-schwa suffix is by far the most frequent pl suffix in the Danish noun pl system and also in spontaneous naturalistic child language input (see Appendix Table A1). The present study thus gives two possible explanations for the fact that the a-schwa suffix seems to be acquired before the e-schwa and the zero suffix, but it will take further studies to approach a definitive answer.
4.2 P2: pl stems
The study indicates that the proportion of correctly produced pl stems decreases from transparent stems (pl stems identical to the sg stems) to opaque stems (pl stems different from the sg stems). The stem changes seem to fall into the following three categories: (i) no change, (ii) prosodic change (syncope, a-quality + vowel length, stød drop, stød addition), and (iii) phonemic change (umlaut, r-insertion, n-insertion), see Figure 3. The highest frequency of correctly produced pl stems occurs in the no change category, followed by prosodic change and then by phonemic change, see Figure 4. The study thereby suggests that stem change delays acquisition. The pl forms with stem change in the sequence of phonemes (r-insertion, n-insertion, umlaut) seem to be acquired later and thus seem more difficult than forms with no stem change or with more transparent stem changes, i.e. change in word prosody only. This explanation is related to sound structure, and agrees with earlier studies which indicate that it is easier for the child to segment morphologically and phonologically transparent pl markers, such as a pl suffix added to the stem with no change in the phonological form of the stem, than it is to segment pl markers with stem change (Peters & Menn Reference Peters and Menn1993, Dressler Reference Dressler, Kail and Hickman2010). Earlier studies have indicated that morphological transparency (higher salience) plays a role in error direction (Laaha et al. Reference Laaha, Ravid, Korecky-Kröll, Laaha and Dressler2006).
However, frequency may also have an impact: no change is by far the most frequent category in Danish, with regard to lexical frequency as well as token, (word form) type and lemma frequencies in both naturalistic spontaneous child language input and output (see Appendix Table A1), prosodic change is the second most frequent category, and phonemic change is the least frequent category.
Analysis of error direction showed that the majority of all overtly marked overgeneralizations (i.e. pl error forms with stem change and/or suffix addition) result either from shifts from less productive pl markers towards more productive ones or from competition between pl markers of the same degree of productivity. no change is overgeneralized in 16% of all pl error forms in Task 1, prosodic change in 7%, whereas phonemic change is not overgeneralized at all (see Appendix Table A3). This is in agreement with our second prediction (P2) and in accordance with earlier studies on other languages, such as German (e.g. Laaha et al. Reference Laaha, Ravid, Korecky-Kröll, Laaha and Dressler2006, see also Peters & Menn Reference Peters and Menn1993).
4.3 P3: Productivity of pl markers
In our study we found a correlation between productivity of the pl marker and correctly produced pl forms in Task 2. The children produce most correct pl forms of nouns taking a fully productive pl marker, then come semi-productive and last unproductive pl markers. This agrees with earlier studies which indicate that productivity has an impact on the number of correct scores, that is – in the terminology of Laaha et al. Reference Laaha, Ravid, Korecky-Kröll, Laaha and Dressler2006 – fully productive and productive pl patterns obtain higher correct scores than weakly productive and non-productive ones.
Cazden (Reference Cazden1968) claims that the clearest evidence for productivity comes from the wrong combination of stem and suffix. Studies have shown that on their way to fully mastering the pl inflectional system, children make overgeneralization errors. A study by Xu & Pinker (Reference Xu and Pinker1995) shows a regularization rate in spontaneous child speech below 5% – where irregular nouns are provided with a regular pl marker, e.g. *foots instead of feet (pl of the sg foot) – and an irregularization rate below 1% – where regular nouns are provided with an irregular pl marker (e.g. *weed instead of woods, pl of the sg wood). These findings are corroborated by other studies (e.g. Schaner-Wolles Reference Schaner-Wolles and Grünther1988, Clahsen et al. Reference Clahsen, Rothweiler, Woest and Marcus1992).
The results of the present study indicate that the odds of producing the correct pl form increase with older age, especially when reaching school age, compared to the age of three years. The interaction between age and productivity means that the effect of productivity changes with age. The effect decreases from the age of five years to the age of 10 years, and this is particularly clear when we compare semi-productive and fully productive pl markers.
To investigate the effect of token input frequency of the specific pl form we have calculated the percentage of correct answers – out of all 160 answers for each item – and compared it to their token frequency in the input (in OTC and DPC), see Appendix Table A2. We found a correlation between the token frequency of both the sg nouns and the pl nouns in child language input (OTC and DPC) and correct responses in Task 2. However, our analyses showed that for both types of token frequency something is better than nothing, but more is not necessarily better (see Table 6). Furthermore, we found a correlation between (word form) type frequency of the pl marker in child language input (OTC and DPC) and correct pl production in Task 2. This is in accordance with earlier studies which have shown that the token frequency of a specific marker also plays a role for the acquisition rate (Bybee Reference Bybee1995, Dąbrowska & Szczerbinski Reference Dąbrowska and Szczerbinski2006), and that type frequency in the language input to the child plays a role for the ease of acquisition of an inflectional paradigm. Bybee (Reference Bybee1995) argues that the (word form) type frequency of an inflectional marker has an impact on the acquisition of the specific marker since high (word form) type frequency facilitates the identification of the specific marker. Inflectional markers with a high type and token frequency in the linguistic input to the child seem to be acquired earlier than markers with only a high token frequency (Bybee Reference Bybee1995). A study by Laaha et al. (Reference Laaha, Ravid, Korecky-Kröll, Laaha and Dressler2006) did not find any significant effect of frequency values in child language input.
Even though nouns like børn ‘children’ and mænd ‘men’, for example, follow the same inflectional pattern (pl marker øu), the age of acquisition seems to differ significantly (see Figure 1). To find a reason for this we tested these two pl forms in our corpus of child language input. We found that the lemma barn ‘child’, pl børn ‘children’, is pl dominant (73% pl tokens of børn) and the lemma mand ‘man’, pl mænd ‘men’, is sg dominant (76% sg tokens of mand). This supports our prediction that the pl form of pl dominant nouns is rote learned and therefore acquired before the pl form of sg dominant nouns (P3). Gagarina & Voeikova (Reference Gagarina, Voeikova, Stephany and Voeikova2009) found that the first pl forms which children produce are usually pl forms of pl dominant nouns. This agrees with the assumption that children begin by storing morphological patterns of high token frequency (see Section 1.4 above for a definition of different frequency measures). This also suggests that, for example, the sg form of sg dominant nouns (e.g. mouth, sun) is acquired before the pl form of these nouns, and that the pl form of pl dominant nouns (e.g. shoes, feet) is acquired before the sg form of these nouns (Dressler Reference Dressler, Booij, Guevara, Ralli, Sgroi and Scalise2003). The more frequently a linguistic unit occurs, the harder it might be for the child to ignore it, i.e. high frequency of a particular form forces children to attend to a particular linguistic structure earlier than they otherwise might (Demuth Reference Demuth, Gülzow and Gagarina2007).
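The dominance criterion used above (more than 70% of a lemma's tokens in one number) can be sketched as a simple classifier. The function name and the label "balanced" for lemmas below the threshold in both directions are ours. The token counts are illustrative only: they are scaled to 100 tokens per lemma so as to reproduce the reported percentages (73% pl for barn, 76% sg for mand), and they are not the actual corpus counts.

```python
def dominance(sg_tokens, pl_tokens, threshold=0.70):
    """Classify a lemma as pl dominant, sg dominant or balanced by token share."""
    total = sg_tokens + pl_tokens
    if total == 0:
        raise ValueError("lemma has no tokens")
    pl_share = pl_tokens / total
    if pl_share > threshold:
        return "pl dominant"
    if 1.0 - pl_share > threshold:
        return "sg dominant"
    return "balanced"


# Illustrative counts per 100 tokens, matching the percentages reported in the text.
lemmas = {
    "barn": (27, 73),   # 73% pl tokens (børn)
    "mand": (76, 24),   # 76% sg tokens (mand)
}

for lemma, (sg, pl) in lemmas.items():
    print(f"{lemma}: {dominance(sg, pl)}")
```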
4.4 P4: Pure zeroes (pl = sg)
The pl marker ‘ø’ (pure zero, sg = pl) is far more frequent in child language input (21.9% tokens) and output (18.4% tokens) than in the Danish noun pl system (7.4% lexical frequency) (see Appendix Table A1).
The most frequent pl error form in both Task 1 and Task 2 is children producing an sg form instead of a pl form of the noun, i.e. overgeneralization of the pl marker ‘ø’ (pure zero) rather than any of the fully productive pl markers. In Task 1 these overgeneralizations of the pl marker ‘ø’ amount to 60% and in Task 2 to 64% of all error forms (see Figure 6). This result has to be interpreted very cautiously, however, since it is, in the general case, impossible to distinguish an sg form from an incorrect zero pl form. But some further results may give us a hint as to the explanation of the many apparent overgeneralizations of the pl marker ‘ø’. In Task 2, 48% of these sg forms are produced out of context, 1% are produced in an sg context (e.g. en bil ‘a car’) whereas 51% are produced in a pl context (e.g. *to bil ‘two car’). A possible explanation for the overgeneralization of the pl marker ‘ø’ (pure zero) may be that the children produce the sg form instead of the pl form generally for certain words, but the situation in the two experiments is different. In Task 2, the children may have trouble with the task and therefore simply repeat the sg form given by the investigator. This cannot be the case in Task 1, though, since the children are not given any sg forms in this task. A cross-linguistic study suggests that this phenomenon is rather exceptional for Danish. Danish children produce many sg forms instead of pl forms in Task 2. German-speaking children produce far fewer sg forms whereas Hebrew and Dutch children produce almost no sg forms (Gillis et al. Reference Gillis, Souman, Dhollander, Molemans, Kjærbæk, Rehfeldt, Basbøll, Lambertsen, Laaha, Bertl, Dressler, Lavie, Levie and Ravid2008). The reason could be that the pl marker ‘ø’ is a very important category in Danish. It exists in German but is not very important there, and it does not exist in Hebrew and Dutch.Footnote 15
Laaha et al. (Reference Laaha, Ravid, Korecky-Kröll, Laaha and Dressler2006) performed a similar experiment with German-speaking children, but only 31% of the error forms of the German-speaking children were overgeneralization of ‘ø’ (pure zero), i.e. roughly half the percentage compared to Danish in our study (Laaha et al. Reference Laaha, Ravid, Korecky-Kröll, Laaha and Dressler2006:293). We believe that the large number of produced sg forms in Danish children is not only due to the high iconicity of the stem (not of the pl marker, recall P4 in Section 1.6 above) when pl equals sg – an explanation which also holds for German. It is also due to the fact that pure zero is an important morphological category in the Danish noun pl inflectional system. It occurs in German but is less important there. A supplementary explanation is that the dropping of the e-schwa suffix often results in a pl form which is almost identical with the sg form, as e.g. tov [tʌw] ‘rope’ – tove [ˡtʌw], instead of the distinct form [ˡtʌwə] ‘ropes’; this results in an even higher frequency of pl forms that sound more or less like sg forms.
5. CONCLUSION
This study shows that a suffix which is subject to dropping, and is less frequent in child language input, viz. e-schwa, is acquired later than a suffix which is not subject to dropping, and is more frequent in child language input, viz. a-schwa. The pl stem changes fall into three groups which seem to be acquired in the following order: no change > prosodic change > phonemic change. Thus the stem changes where there is a change in the sequence of phonemes in the stem (umlaut, r-insertion, n-insertion) seem to cause a severe delay in acquisition whereas syncope, a-quality + vowel length, stød drop and stød addition (which only involve prosody – accentuation and syllable structure) have much less impact on the acquisition, especially in the early ages. sg instead of pl is a very frequent error type in Danish children; in fact Danish children seem to use sg as a default. Furthermore, we found that the pl form of pl dominant nouns is acquired early, whereas the pl form of sg dominant nouns is acquired late.
Two issues seem especially important to address in follow-up studies to the present investigation. First, we need to be able to differentiate between the effects of type and token frequencies by doing experiments with items where these two make different predictions. Second, we would like to conduct perception experiments, with both children and adults, on the different forms in the extremely complicated pl system of Danish.
ACKNOWLEDGEMENTS
We are very grateful to the children who took part in the study, to their families, and to the staff of the kindergartens and schools where the study was conducted. We would like to thank Katja Rehfeldt for contributing to the preparation and collection of data from the two tasks, and we gratefully acknowledge Claus Lambertsen's work with the computational tools, Sonja Wehberg for assistance on the CDI data, and Dorthe Bleses for valuable comments on an earlier version of the manuscript. We would also like to thank our international partners Wolfgang U. Dressler and Sabine Laaha (University of Vienna), Steven Gillis (University of Antwerp) and Dorit Ravid (Tel Aviv University). Finally, we want to thank three anonymous reviewers and Associate Editor Gunnar Ólafur Hansson for numerous highly valuable suggestions. This study was supported by The Carlsberg Foundation and by The Institute of Language and Communication, University of Southern Denmark.