Error patterns in young German children's wh-questions*

DANIEL SCHMERSE; ELENA LIEVEN; MICHAEL TOMASELLO

doi:10.1017/S0305000912000104


Published online by Cambridge University Press: 28 May 2012


DANIEL SCHMERSE: Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
ELENA LIEVEN: Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany
MICHAEL TOMASELLO: Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany


Abstract

In this article we report two studies: a detailed longitudinal analysis of errors in wh-questions from six German-learning children (age 2 ; 0–3 ; 0) and an analysis of the prosodic characteristics of wh-questions in German child-directed speech. The results of the first study demonstrate that German-learning children frequently omit the initial wh-word. A lexical analysis of wh-less questions revealed that children are more likely to omit the wh-word was (‘what’) than other wh-words (e.g. wo ‘where’). In the second study, we performed an acoustic analysis of sixty wh-questions that one mother produced during her child's third year of life. The results show that the wh-word was is much less likely to be accented than the wh-word wo, indicating a relationship between children's omission of wh-words and the stress patterns associated with wh-questions. The findings are discussed in the light of discourse–pragmatic and metrical accounts of omission errors.

Type: Brief Research Reports
Information: Journal of Child Language, Volume 40, Issue 3, June 2013, pp. 656–671

DOI: https://doi.org/10.1017/S0305000912000104
Copyright: © Cambridge University Press 2012

Children's production of errors in wh-questions provides an interesting test case for theoretical approaches to language acquisition and syntactic development. Construction-based accounts argue that children formulate questions using lexically specific frames (e.g. What is [THING] [PROCESS]? or What does [THING] [PROCESS]?) and that error patterns reflect the item-specific nature of children's wh-questions (Ambridge, Theakston, Lieven & Tomasello, 2006). Based on data from Norwegian, a Germanic V2 language, Westergaard (2009) argued that some non-target-consistent forms such as omission of verbs or wh-words are incompatible with constructivist accounts. According to Westergaard's account, omission errors disconfirm the constructivists' assumption that children primarily rely on frequent input patterns (specific wh-word+verb frames) when formulating interrogatives.

In the current study, we present data from German children's production of wh-questions to investigate whether wh-omission errors occur in lexically specific frames and whether input properties such as prosodic patterns or discourse givenness may influence their production.

In reports of German-learning children's wh-questions it has been found that children omit the utterance-initial wh-word, especially during early stages of development (Clahsen, Kursawe & Penke, 1995; Tracy, 1994). Interestingly, wh-word omission has also been reported to occur frequently in other Germanic V2 languages such as Dutch (Van Kampen, 1997), Swedish (Santelmann, 2004) and Norwegian (Westergaard, 2009), but, to our knowledge, has not been reported systematically for English. Other examples of errors in German children's early wh-questions include verb doubling (Penner, 1994), subject omission (Hamann, Penner & Lindner, 1998), verb omission (Steinkrauss, 2009) and non-inversion errors (Wode, 1975).

However, little is known overall about the frequency with which German children produce these different types of errors in formulating questions. Therefore, we performed a detailed longitudinal analysis of the different types and rates of errors in German children's wh-questions. Our hypothesis was that German children would show high error rates of verb omission and omission of wh-words, as these seem to be the most common errors in wh-questions across typologically similar languages.

The second objective of the current study concerned the factors that might explain wh-omission errors. We wanted to know whether wh-omission occurs in lexically specific frames and whether input properties such as prosodic characteristics of caregiver speech may have an influence on error patterns. We hypothesized that if lexical specificity constrains the production of errors, we would not expect lexical overlap between those questions with omission and those without. However, if lexical specificity alone does not constrain the production of errors, additional factors have to be considered in order to explain under what conditions utterance-initial wh-elements are omitted. Two possibilities are that omission errors might be influenced by pragmatic factors (given information in utterance-initial position being omitted more often than new information), and prosodic factors (unstressed elements being omitted more often than stressed).

STUDY 1

METHOD

Speech corpus

All wh-questions were extracted from the longitudinal data of six children from the Szagun corpus (Szagun, 2004), available from the CHILDES database (MacWhinney, 2000). The analyzed utterances were taken only from typically developing children. The recordings were made every six weeks during the children's third year of life. The results reported here are based on data collected when the children were between 2 ; 0 and 3 ; 0. The data for Emely were taken from recordings between 2 ; 0 and 3 ; 4 in order to obtain a higher number of utterances. The mean length of utterance (MLU) was calculated in words per utterance by the first author. MLUs for the children ranged between 1·03 and 2·10 at age 2 ; 0 and between 2·16 and 4·31 at age 3 ; 0 (Table 1). We classified all wh-questions according to MLU stages: stage I (1–1·99), stage II (2·0–2·49), stage III (2·5–3·0) and stage IV (>3·0).
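The MLU calculation and stage assignment described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' procedure: the function names are hypothetical, words are counted by splitting on whitespace (the text does not say how words were tokenized), and the small gaps between the published stage boundaries (e.g. between 1·99 and 2·0) are closed by treating the stages as adjacent intervals.

```python
def mean_length_of_utterance(utterances):
    """MLU in words: total number of words divided by number of utterances.

    Assumption: a "word" is any whitespace-separated token.
    """
    if not utterances:
        raise ValueError("need at least one utterance")
    return sum(len(u.split()) for u in utterances) / len(utterances)


def mlu_stage(mlu):
    """Map an MLU value onto the four stages named in the text.

    Assumption: stages are read as adjacent intervals
    I = [1, 2), II = [2, 2.5), III = [2.5, 3.0], IV = (3.0, inf).
    """
    if mlu < 1.0:
        raise ValueError("an MLU in words cannot be below 1")
    if mlu < 2.0:
        return "I"
    if mlu < 2.5:
        return "II"
    if mlu <= 3.0:
        return "III"
    return "IV"


# Hypothetical mini-session built from example utterances quoted in the article.
sample = ["wo is", "macht das pferd", "wo die pfanne"]
mlu = mean_length_of_utterance(sample)
print(f"MLU = {mlu:.2f}, stage {mlu_stage(mlu)}")
```

Under this reading, the lowest and highest MLUs reported at age 2 ; 0 (1·03 and 2·10) fall into stages I and II, and those reported at age 3 ; 0 (2·16 and 4·31) into stages II and IV.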

Table 1. Total numbers and percentages of errors produced in children's wh-questions

Error coding

The first author coded the following types of errors: wh-omission errors, verb-omission errors, subject-omission errors and non-inversion errors. Verb doubling errors, which have also been reported, were not found in the data.

(a) wh-omission. wh-omission errors were defined as interrogative question structures containing a verb in initial position (their status as questions was determined from the context, including intonation and the interlocutor's response).
(1) macht das pferd?
    doing the horse?
    ‘(What) is the horse doing?’ (Ann, 2 ; 5)
(b) Verb omission. wh-questions containing a wh-word but no verb were coded as verb-omission errors.
(2) wo die pfanne?
    where the pan?
    ‘Where (is) the pan?’ (Soe, 2 ; 6)
(c) Subject omission. wh-questions containing a wh-word and a verb, but no subject, were coded as subject-omission errors.
(3) wo is?
    where is?
    ‘Where is (X)?’ (Lis, 2 ; 4)
(d) Non-inversion errors. Non-inversion errors were determined by the position of the finite verb. Since finite verbs in German wh-questions occur in second position (i.e. after the wh-word), questions that deviated from the verb-second word order were coded as word order errors.
(4) warum eine frau das ist?
    why a woman that is?
    ‘Why is that a woman?’ (Eme, 2 ; 9)
(e) Interrogative contexts. In order to give percentages of error types, we summarized erroneous structures and correct questions as interrogative contexts. For the purposes of wh-word-specific analysis, we furthermore identified two subcategories of interrogative contexts: wo-contexts (‘where’-contexts) and was-contexts (‘what’-contexts). Single-word wh-questions, embedded wh-questions and fragments were excluded from the analysis (e.g. Welche X? (‘Which X?’), Wie bitte? (‘Pardon?’), Was für ein X? (‘What kind of X?’), Warum nicht? (‘Why not?’), [Wh] denn? (‘[Wh] denn [particle]?’)). We also excluded seventy-seven questions that contained neither a verb nor a wh-word. These structures were marked only with the modal particle denn, which is commonly used in German wh-questions.
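The coding categories above can be summarized as a small decision procedure. The Python sketch below is only an illustration under stated assumptions: the coding in the study was done by hand using context and intonation, the `Question` record and its hand-annotated fields are hypothetical, and the order in which the categories are checked is an assumption rather than something the text specifies.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Question:
    """Hypothetical hand-annotated features of one interrogative utterance."""
    has_wh_word: bool
    has_verb: bool
    has_subject: bool
    # False when the verb-second order is violated; None when not applicable.
    verb_second: Optional[bool] = None


def code_question(q: Question) -> str:
    """Assign one category; the precedence order is an assumption."""
    if not q.has_wh_word and not q.has_verb:
        return "excluded"          # e.g. structures marked only by 'denn'
    if not q.has_wh_word:
        return "wh-omission"       # verb-initial interrogative
    if not q.has_verb:
        return "verb-omission"
    if not q.has_subject:
        return "subject-omission"
    if q.verb_second is False:
        return "non-inversion"
    return "correct"


# Examples (1)-(4) from the text, annotated by hand for this sketch.
ex1 = Question(has_wh_word=False, has_verb=True, has_subject=True)    # macht das pferd?
ex2 = Question(has_wh_word=True, has_verb=False, has_subject=True)    # wo die pfanne?
ex3 = Question(has_wh_word=True, has_verb=True, has_subject=False)    # wo is?
ex4 = Question(True, True, True, verb_second=False)                   # warum eine frau das ist?
for ex in (ex1, ex2, ex3, ex4):
    print(code_question(ex))
```

Correct questions and coded errors together would then make up the interrogative contexts over which percentages are computed.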

Coding reliability

A second rater was trained in error coding by the first author and coded a total of 180 interrogative contexts from all six children (12% of the data). The level of agreement between coders was 97·2% (Cohen's Kappa=0·93, p<0·001, N=180).
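For readers unfamiliar with the statistic, Cohen's kappa corrects the raw agreement rate for the agreement expected by chance from each rater's category frequencies. The sketch below shows the standard formula on invented toy labels; the function name and the data are hypothetical, and the actual coding decisions behind the reported value are not available in the text.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item set")
    n = len(rater_a)
    # Observed agreement: proportion of items given identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions per label.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


# Invented toy codes for six utterances (not data from the study).
coder_1 = ["wh", "wh", "verb", "ok", "ok", "ok"]
coder_2 = ["wh", "verb", "verb", "ok", "ok", "ok"]
print(round(cohens_kappa(coder_1, coder_2), 3))
```

With N=180, an agreement level of 97·2% would correspond to 175 identically coded contexts (175/180).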

RESULTS

Overall error rates

Children produced errors in approximately 30% of their questions. Table 1 shows a clear pattern of these errors. First, the omission of utterance-initial elements (wh-words and verbs) is the most common error for all children. Verb-omission occurred in 15·2% and wh-omission occurred in 13·4% of all wh-questions, which is comparable to the findings of Clahsen et al. (1995), who reported a wh-omission rate of 19%. Second, non-inversion errors and subject-omission errors are extremely rare and do not occur in all children. Most of the children's non-inversion errors contained a finite verb in final position. However, there were also four instances in which children produced a non-finite verb in final position, but no finite verb in second position (e.g. Wo der passen? ‘Where this fit?’). Furthermore, there are large individual differences in both the production rate of wh-questions and the occurrence of particular types of errors. For example, Emely produced the lowest number of wh-contexts, but showed the highest rate of wh-omission errors (62·5%). However, a closer look revealed that 92·5% of Emely's wh-less questions consisted of only two lexical formulas: (Was) ist das? ‘(What) is that?’ and (Wie) heisst der? ‘(How) is he called?’, each of which occurred in a single but different recording session.

Figure 1 shows the distribution of error rates across MLU stages. The omission of wh-elements and verbs is a phenomenon that is particularly characteristic of the MLU stages I and II. One reason for the decline in error rates might be that omission of elements is bound to specific lexical frames that constitute the majority of question constructions in early phases, but the proportion of these decreases as children acquire more types of wh-questions.

Fig. 1. Error rates for wh-omission, verb-omission and non-inversion errors across MLU stages.

We compared the use of verb types in wh-overt and wh-less questions across MLU stages. Figure 2 shows the mean number of different verb types produced at each MLU stage. The results indicate that an increase in the use of verb types was seen only for wh-overt questions, but not for constructions in which the wh-word was omitted. A lexical analysis showed that children omit wh-words only with a limited set of verbs. All wh-omission errors occurred with one of the following eight verbs (including their inflectional forms): machen (‘do’), passen (‘fit’), kommen (‘come’), gehören (‘belong’), haben (‘have’), heissen (‘be called’), gehen (‘go’), and the copula. Furthermore, 78% of all wh-omission errors occurred in a set of five lexical frames: (Was) ist das ‘(What) is that’ (N=97), (Wie) heisst der/die/das ‘(How) is he/she/it called’ (N=28), (Was) machst du ‘(What) are you doing’ (N=17), (Was) macht der/die ‘(What) is he/she doing’ (N=8), and (Wo) kommt das hin ‘(Where) does this go’ (N=7). What these frames have in common is that they contain a pronoun in subject position. Next, we checked for each child individually whether wh-less questions and wh-overt questions differ with respect to lexical specificity in the verb position and the subject-NP.

Fig. 2. Mean number of verb types used in wh-less and wh-overt questions across MLU stages. Error bars show standard errors.

Verb use and type of subject-NP

From the sample of wh-overt questions, we extracted all non-subject questions and all instances of the construction [WH COP NP] and coded how many different verb types occurred in the position following the wh-word and whether the subject was realized pronominally or as a full NP. We applied the same coding procedure for wh-less questions.

Table 2. Total numbers of verb types and type of subject-NPs in wh-questions and wh-less questions

We found that all children use more verb types in overt wh-questions than in wh-less questions (see Table 2). For Ann, Fal, Lis and Soe this difference was more pronounced than for the other two children (Eme and Rah), who also showed the lowest production rate of interrogative contexts overall. However, although children tended to produce fewer verb types in wh-less questions, these verbs were not restricted to wh-less questions. Four of six children (Ann, Fal, Lis and Soe) used all verbs from their wh-less questions in wh-overt questions as well. Rahel produced two verb types in both structures (copula, kommen ‘come’) and Emely three verb types (copula, heissen ‘be called’, gehören ‘belong’). With respect to subject-NP type, children predominantly used pronoun subjects in wh-questions as well as in wh-less questions (71·4% and 91·3%, respectively). However, the type of subject realization did not distribute equally over the two structures with pronouns being more frequent in wh-less questions (χ²(1, N=1152)=28·59, p<0·001).

Omission as complexity reduction?

Bloom (1990) proposed that the omission of sentential elements could be explained as a general cognitive strategy to reduce the complexity of an utterance. He found that sentences with longer verb phrases (VP) tend to be produced less often with a subject than sentences with shorter verb phrases. Therefore, we tested whether such a VP length effect could also be found in the case of omission errors in wh-questions. The MLU was 4·28 for wh-questions and 2·93 for wh-less questions. Subtracting the wh-word from every overt wh-question yielded a VP length of 3·28, a number that was still significantly larger than the MLU of wh-less questions (t(293)=4·54, p<0·001). Therefore we conclude that children do not omit the wh-word in order to reduce the length of the utterance. This argument is further supported by the fact that 96% of all wh-less questions contain a semantically ‘light’ verb (so they are not more semantically difficult either).

wh-specific errors

When considering errors in children's language we must always note whether the incidence of errors patterns uniformly across different types of lexically specific structures. To answer this question, we analyzed the rates of omission errors in wo-contexts and was-contexts. Overall, there were 750 wo-contexts and 614 was-contexts, which together constituted 90·7% of all interrogative contexts.

Table 3 shows the number of omission errors for was-contexts and wo-contexts. The rates of verb-omission are very similar in both contexts, with 16·6% of all wo-contexts (n=124) and 15·1% of all was-contexts (n=93) missing a verb. Although it is not possible to determine which verb the child intended to produce, it seems that in the vast majority of cases it is the copula that has been dropped. It should be pointed out that in was-contexts the rate of copula omission might be overestimated. This is because the singular form of the copula ist and the wh-word was are often reduced to one contracted form, in which it is hard to tell whether the copula is present or not. But the key finding is that a large difference in error rates was found for the omission of different wh-words. The data show that the wh-word was is significantly more likely to be dropped than the wh-word wo (χ²(1, N=387)=75·13, p<0·001). The wh-word is missing in only 3·1% of all wo-contexts (n=23), compared to 23·9% of all was-contexts (n=147).

Table 3. Children's omission errors for different wh-contexts (Szagun corpus)
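The reported test statistic can be checked against the counts given in the text. The sketch below rests on a reconstruction that the article does not spell out: it assumes the test was run on a 2×2 table crossing wh-context (was, wo) with omission type (wh-omission, verb-omission), whose cells sum to N=387, and that Yates' continuity correction was applied. The function name is hypothetical and the calculation is written out in plain Python rather than taken from a statistics library.

```python
def yates_chi_square(table):
    """Pearson chi-square for a 2x2 table with Yates' continuity correction."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Yates: shrink each absolute deviation by 0.5 (not below zero).
            deviation = max(0.0, abs(observed - expected) - 0.5)
            chi2 += deviation ** 2 / expected
    return chi2


# Rows: was-contexts, wo-contexts; columns: wh-omission, verb-omission.
# Counts are those quoted in the text; the table layout is an assumption.
omissions = ((147, 93), (23, 124))
chi2 = yates_chi_square(omissions)

# wh-omission rates relative to all contexts of each type (614 was, 750 wo).
was_rate = 147 / 614
wo_rate = 23 / 750
print(f"chi2 = {chi2:.2f}, was: {was_rate:.1%}, wo: {wo_rate:.1%}")
```

Under these assumptions the statistic comes out at about 75·13, in line with the value reported above, and the two wh-omission rates round to 23·9% and 3·1%.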

DISCUSSION

Summarizing the results, we find a clear pattern of errors that young German-learning children produce when forming wh-questions. First, omission of utterance-initial elements such as the wh-word or the verb in second position can be identified as the main source of error. Taken together, in 28·6% of all interrogative contexts, either the wh-word or the verb is omitted. Second, non-inversion errors constitute a rare phenomenon. Interestingly, similarly low rates of word order errors have also been reported for other V2 languages (for Swedish, see Hansson & Nettelbladt, 2006).

An analysis at the lexical level revealed that wh-omission errors are restricted to particular lexical items. First, children mostly drop the wh-word was but preserve the wh-word wo. Second, wh-omission errors occur only with a small set of verbs throughout all MLU stages. Third, almost all wh-less questions contain a pronoun subject following the verb, indicating that the referent of the subject-NP is given in the discourse or even present in the interaction. Finally, the majority of children used verbs occurring in wh-less questions in wh-overt questions as well, suggesting that additional factors other than lexical specificity must be involved in the omission of wh-words.

Notice that the lexical pattern of wh-less constructions has important implications for the prosodic structure of these frames. In our data, the vast majority of wh-less questions have the lexical form [(WH) VERB Pn]. According to Lambrecht (1994), in constructions of the form [WH VERB NP], NPs whose referents have already been established in discourse (as indicated by their pronominal form) are unlikely to receive an accent. The sentence accent in these constructions therefore falls onto the verb, giving wh-less questions a strong–weak stress pattern. It is important to point out that this is true for semantically light as well as semantically heavy verbs, since utterances must have at least one accent to be informative (Lambrecht & Michaelis, 1998).

These information structure considerations, together with the observation that lexical specificity alone cannot explain wh-omission, point to the possibility that omission errors might result from a prosodic constraint, as proposed by Gerken (1991; 1994). According to Gerken's account, children tend to omit weakly stressed elements and favour the production of strong–weak sequences over weak–strong sequences. We investigated this hypothesis in a second study by analyzing the prosodic characteristics of different types of wh-questions in German child-directed speech (CDS).

STUDY 2

Gerken (1991) proposed a metrical account to explain English children's omission of sentential subjects. According to her hypothesis, the speech production system of English-learning children around the age of two is influenced by a prosodic constraint favouring strong–weak sequences over weak–strong sequences. More precisely, children tend to omit weakly stressed syllables that cannot be parsed into a trochaic foot. Furthermore, Gerken (1994) has shown in an imitation experiment that weak syllables in utterance-initial position are more likely to be dropped than weak syllables in utterance-internal or -final position. Since German, like English, shows a predominant trochaic stress pattern in multisyllabic words, it is likely that young German-learning children operate with similar production constraints. We reasoned that if the prosodic characteristics of the ambient language influence children's production, the different omission rates of wh-words in was- and wo-questions might be traced to specific stress patterns associated with wh-questions in German child-directed speech.