This study explores grammatical error detection, correction, and repetition ability in typically developing (TD) Swedish-speaking 10-year-olds. These tasks are commonly used as measures of explicit language knowledge, or metalinguistic ability (Bialystok, 2001; Bialystok & Ryan, 1985; Kuo & Anderson, 2008). Most definitions of metalinguistic ability converge on the notion that it involves awareness, reflection, attention, and sometimes manipulation of aspects of language form, such as grammar or phonology, but definitions vary across studies and researchers (Friesen & Bialystok, 2012; Gombert, 1992; Kuo & Anderson, 2008; van Kleeck, 1994). Hallin and Reuterskiöld (2017) have reported error detection accuracy results from the same group of TD children as in the current study, as well as a group of children with developmental language disorder (DLD). The results showed that error detection performance was affected by verb frequency (for past-tense errors) and by noun gender (for article omission errors): higher frequency was associated with increased error sensitivity. The authors suggested that these frequency effects implied that implicit language knowledge affects performance, rather than performance being a result of explicit reflection on morphosyntactic rules. This, in turn, challenged previous studies that have used the error detection tasks as a measure of explicit metalinguistic skills (see, e.g., Miller, Leonard, & Finneran, 2008; Montgomery & Leonard, 1998; Noonan, Redmond, & Archibald, 2014; Rice, Wexler, & Redmond, 1999).
The main aim of the present study is to build on the results from our earlier study by investigating whether frequency effects can also be found in error correction and repetition tasks, and by comparing the results from all three tasks across the same participants. Reaction time results from the error detection task will also be reported in the current paper, as error detection accuracy results in Hallin and Reuterskiöld (2017) for some of the children with TD approached ceiling.
The aspects of input frequency of particular interest in the current study are token frequency and type frequency, corresponding to how often a specific word occurs in the input and how many distinct words can be used in a specific morphosyntactic construction. This terminology comes from emergentist frameworks of language processing and learning (Bybee, 1985, 2010). There is mounting evidence that children are influenced by both type and token frequency when they learn, produce, and process grammatical patterns such as word order, transitive and sentential complement sentences, and verb morphology (see, e.g., Kidd, Lieven, & Tomasello, 2006; Marchman, 1997; Matthews, Lieven, Theakston, & Tomasello, 2005; Ragnarsdottir, Gram Simonsen, & Plunkett, 1999). In general, high-frequency words and grammatical patterns have been shown to facilitate language processing, which emergentist researchers argue indicates stronger linguistic representations and more accurate linguistic expectations (for a review, see, e.g., Lieven, 2010).
MODELS OF METALINGUISTIC ABILITY
Some researchers, for example, Gombert (1992), argue that the term metalinguistic awareness should be preserved for tasks where it is established that intentional and explicit reflection is utilized. In a model suggested by Gombert, there are three stages in metalinguistic development. In the epilinguistic stage, linguistic knowledge is applied more or less automatically, for example, when communicative breakdowns happen and the speaker attempts to repair them. This is part of typical language development, precedes the development of metalinguistic awareness, and often serves a pragmatic function. The subsequent development of metalinguistic awareness is optional, and involves explicit awareness and reflection on language form. After metalinguistic awareness has developed, the individual may acquire what Gombert calls automatic metaprocesses. This is when some of the previously conscious and explicit metalinguistic processes have become automatic. Even though both epilinguistic ability and automatic metaprocesses are implicit processes, the difference is that the latter can be replaced by explicit metalinguistic awareness by drawing attention to the task. In other words, Gombert suggests that tasks thought to measure (explicit) metalinguistic awareness sometimes may be solved without conscious reflection.
An influential model of metalinguistic ability, which has attempted to operationalize this graded nature of conscious reflection in different tasks, was proposed by Bialystok and Ryan (1985; see also Bialystok, 2001; Friesen & Bialystok, 2012; Ricciardelli, 1993). They defined metalinguistic abilities as a combination of language analysis and cognitive control, two equally central and interacting abilities. The dimension of analysis spans from intuitive and implicit language knowledge (e.g., the use of grammar or phonology in conversation) to explicit and objective language knowledge (e.g., using literacy-specific rules while reading or writing). Cognitive control involves executive functioning, including working memory skills, attention, inhibition, and monitoring, is responsible for selection and integration of information, and is related to the processing demands of the task (Friesen & Bialystok, 2012; Ricciardelli, 1993). These two dimensions are useful when describing different task demands as well as metalinguistic development. For example, early awareness of language form, such as self-correcting errors in speech, demands quite low levels of control, while still demanding some analysis (corresponding to epilinguistic ability in Gombert’s terminology).
In summary, metalinguistic ability is a complex construct that develops over time. Both the analysis and control framework and Gombert’s model acknowledge that not all metalinguistic situations carry the same demands of explicit awareness, and that the amount of explicit metalinguistic awareness employed in different tasks might vary within the individual.
Error detection
In grammatical error detection tasks participants hear grammatical and ungrammatical sentences and have to decide if the sentence is correct or incorrect. Error detection tasks have been used to measure both explicit metalinguistic awareness and implicit language processing (Clahsen, 2008). Researchers who view the error detection/grammatical judgment task as primarily capturing explicit metalinguistic skills have not considered potential effects of token frequency, as the assumption is that detecting an error involves conscious reflection on the target morphosyntactic rule (see, e.g., Montgomery & Leonard, 1998; Noonan et al., 2014; Rice et al., 1999). Several studies that have used the error detection task to measure implicit language processing have only emphasized effects of type frequency and not token frequency, acknowledging that the detection of errors that violate high-frequency morphosyntactic patterns should be faster and more accurate than errors violating a more infrequent pattern (e.g., Blackwell, Bates, & Fisher, 1996; Kail, Kihlstedt, & Bonnet, 2012). Many speeded error detection studies within this framework do not discuss the possible effects of token frequency, however, even when low-frequency words are included in the materials (see, e.g., Kail et al., 2012).
In terms of error detection accuracy, several studies have found that school-age children perform close to ceiling across different types of errors, which could be due to the error type choices, or reflect that the tasks were primarily designed to capture difficulties in children with DLD (Miller et al., 2008; Montgomery & Leonard, 1998; Purdy, Leonard, Weber-Fox, & Kaganovich, 2014; Wulfeck, Bates, Krupa-Wiatkowski, & Saltzman, 2004). As mentioned previously, Hallin and Reuterskiöld (2017) showed that lexical and structural frequency affected error sensitivity in a group of Swedish-speaking children with DLD and TD. Because there were potential ceiling effects in the TD group with smaller standard deviations as a result, the current study will report error detection response times for the same group of TD children, to expand on the results reported in our earlier paper.
Error correction and repetition
In an error correction task, a participant hears an ungrammatical sentence and is asked to correct the error. Morphosyntactic error correction has been argued to involve more metalinguistic demands (both regarding analysis and control) than error detection (e.g., Bialystok & Ryan, 1985; Cain, 2007; Cairns, Schlisselberg, Waltzman, & McDaniel, 2006; Kamhi & Koenig, 1985; Nation & Snowling, 2000; Plaza & Cohen, 2003; Tong, Deacon, & Cain, 2014). The explicit metalinguistic nature of the correction task has also been questioned, however, as simply making an ungrammatical sentence grammatical might be achieved without understanding or reflecting on the exact nature of the error, so-called automatic corrections (Cain, 2007; Gombert, 1992; Rubin, Kantor, & Macnab, 1990). These authors suggest that error correction results should be compared with results from a repetition task where the participant is asked to repeat an ungrammatical sentence verbatim, to see if automatic corrections occur. In the current study, both tasks are included. The correction studies referenced above have primarily included error correction as a blanket measure of explicit metalinguistic awareness, and a single score is reported without analyzing performance on different types of errors. Similarly to previous detection studies, the impact of token frequency has not been explored in correction, but frequency effects have been found in elicitation and sentence repetition tasks.
In elicitation tasks, where the child has to provide the target form after a prompt such as “He walks. Yesterday he _____” (often with picture support), token and type frequency have been found to affect past-tense production in Norwegian-speaking and Icelandic-speaking children with TD (ages 4–8; Ragnarsdottir et al., 1999), Dutch-speaking children with TD (ages 7–8; Rispens & De Bree, 2014), and in English-speaking children with TD (ages 4–14; Marchman, 1997). All studies included high- and low-frequency verbs that were selected to be familiar to all children. Ragnarsdottir et al. (1999) and Rispens and De Bree (2014) found that the past-tense productions of high-frequency regular and irregular verbs were on average more accurate than production of low-frequency verbs, and Marchman (1997) found an effect of type frequency in the type of error responses the children made. These authors argue that their results support an emergentist view of language learning and processing.
Grammatical structures containing high-frequency words also tend to be repeated more accurately than the same structures containing low-frequency words (Kidd et al., 2006; Matthews et al., 2005; Matthews, Lieven, Theakston, & Tomasello, 2007). For repetition of subject–verb–object (SVO) sentences with unusual word order (SOV or VSO), Matthews et al. (2005, 2007) found that sentences containing high-frequency verbs were automatically corrected by young TD children more often than those containing low-frequency verbs (e.g., “The dog the seal pushed” → “The dog pushed the seal”). Kidd et al. (2006) found similar effects of token frequency using sentences with complement-taking high- and low-frequency verbs in preschool TD children.
In summary, from a standpoint where error detection and correction are thought to involve explicit reflection on grammatical rules, an effect of token frequency is not expected, because the morphosyntactic error is the same across words of different frequencies. Based on previous emergentist research using other types of tasks, however, an effect of frequency is expected, where high-frequency morphosyntactic patterns and lexical items facilitate error detection and correction, and make the repetition of errors more challenging, given that these tasks reflect implicit language processes rather than explicit reflection on morphosyntactic rules.
Target grammatical structures
The target grammatical structures in the current study, simple past-tense and singular noun phrases, were chosen based on earlier research on Swedish-speaking preschool children with DLD. These structures have been found to be vulnerable in the expressive language of young Swedish-speaking children with DLD (Hansson & Leonard, 2003; Hansson, Nettelbladt, & Leonard, 2000, 2003; Leonard, Salameh, & Hansson, 2001). For all error types, verbs and nouns of high and low token frequency were contrasted.
Past tense
Swedish verb morphology is sparser than English, and finite verbs mark tense, but not person or aspect. There are two regular conjugations where past tense is marked with the suffixes –de or –te on the stem (see Table 1). In spoken language, this marking is only obligatory for the second conjugation, and because of this, studies investigating Swedish regular past-tense use in children with and without DLD typically only include verbs of the second conjugation. In the present study, errors involving regular verbs of the second conjugation are contrasted with irregular verbs in all experiments. The target error is the use of the unmarked infinitive (ending with –a) instead of the regular past tense (ending with –de or –te) or irregular past tense. This error occurs in the expressive language of Swedish-speaking preschool children with DLD significantly more often than in that of age- and mean length of utterance-matched TD children (Hansson & Leonard, 2003; Hansson et al., 2000). The vulnerability of the regular past tense in Swedish could be partly explained by input characteristics: in the first conjugation the past tense, the infinitive, and the imperative share the same surface form in spoken language, and this conjugation contains about two-thirds of all verbs (Holmes & Hinchliffe, 2003). If implicit knowledge and input frequency affect the performance of error detection and correction, both regular and irregular past-tense errors that end with –a might be more difficult to detect and correct, especially when involving low-frequency verbs.
Noun phrases
Swedish noun phrase morphology is relatively complex compared to English: there are two noun genders that have to be rote learned for each noun, common and neuter, with different articles and adjective inflections that change with number and definiteness. When a noun phrase contains an adjective, definiteness has to be marked on a preposed article and with a suffix on the noun. In addition, the adjective has to agree in gender, number, and definiteness with the noun. For indefinite noun phrases, the article is obligatory for singular (count) nouns, but not for plural nouns. See Table 2. Bare nouns also exist in a number of other contexts (Bohnacker, 2003), including when a noun signals a “quality” of another noun rather than an object (e.g., “Jackan har fint foder,” “The jacket has [a] beautiful lining”), and for mass nouns (e.g., “Vi drack gott kaffe,” “We drank tasty coffee”).
* Other plural markings exist for common gender, but this is not the focus of the current study. ** Not all neuter nouns have ø-marking for plural, but all ending with a consonant do.
Hansson et al. (2003) and Leonard et al. (2001) found that children with DLD omitted or produced the wrong indefinite article in obligatory contexts, in particular in noun phrases containing an adjective (ART+ADJ+N). In addition, both Leonard et al. (2001) and Hansson et al. (2003) found an effect of gender in children with DLD: the neuter indefinite article ett was omitted significantly more often than the common article en. These results could not be attributed to prosodic differences, as both indefinite articles are weak monosyllabic VC words in unstressed positions. Available corpus data from written sources (Korp: Borin, Forsberg, & Roxendal, 2012) do not support the possible explanation that the article would be omitted more often in indefinite neuter noun phrases containing an adjective than in common noun phrases. Rather, because the common gender is more than three times more frequent than the neuter gender in Swedish overall (Bohnacker, 2003), the difference could be a result of type frequency, but this possibility has not previously been discussed.
RESEARCH QUESTIONS AND HYPOTHESES
This study is guided by the following research questions:
1. Is speed of grammatical error detection, and accuracy of grammatical error correction and error repetition, in Swedish-speaking school-age children affected by
a. token frequency of a target verb or noun, and/or
b. the type of verb (regular or irregular), or
c. the noun gender (common or neuter) associated with the error?
2. What is the nature of the relationship between error detection, correction, and repetition, and does this relationship differ depending on type of error and/or token frequency?
The main hypothesis rests on the assumption that error detection and correction of early acquired morphosyntactic structures carry low explicit metalinguistic demands in 10-year-olds with TD, and type and token frequency should therefore influence the results. In addition, Hallin and Reuterskiöld (2017) showed that error detection accuracy was affected by token frequency (for verbs) and type frequency (for noun phrases) in the same group of children included in the current study. Thus, token frequency is expected to affect error detection speed, where target words with lower lexical frequency will be associated with slower response times (RTs). In addition, based on type frequency and findings from preschool children with DLD, neuter noun phrase errors are expected to be associated with slower RTs than common noun phrase errors.
For error correction, following the same line of reasoning, token frequency is expected to affect past-tense error correction accuracy, where errors involving high-frequency verbs will be easier to correct than errors involving low-frequency verbs. Regular verb errors are predicted to be easier to correct than irregular verb errors, and type frequency effects should make overgeneralizations of regular patterns frequent for irregular verbs. Common noun phrase errors are predicted to be easier to correct than neuter noun phrase errors, based on the differences in type frequency. Given that the error involves the article in the beginning of the noun phrase and not the noun itself, noun frequency is not expected to affect noun phrase error correction accuracy.
For error repetition, errors involving high-frequency words and structures are expected to be more difficult to repeat verbatim than low-frequency words and structures. In other words, there will be more automatic corrections of errors involving high-frequency regular and irregular verbs compared to low-frequency verbs. Furthermore, automatic corrections are expected to be more prevalent in noun phrase errors involving common nouns compared to those involving neuter nouns.
Finally, children are expected to detect more errors than they are able to correct, as error correction is believed to involve higher processing demands than error detection. If automatic corrections are confirmed in the error repetition task, however, this implies that the error correction task does not necessarily demand explicit reflection on the morphosyntactic structure itself. In addition, if all three tasks tap into children’s explicit metalinguistic ability, a positive significant correlation should be seen between the tasks, showing that children who perform well on one task also perform well on the other tasks.
GENERAL METHOD
Protection of human subjects
This study was approved by the Stockholm Regional Ethical Review Board (#2014/1849-31/5) and the University Committee on Activities Involving Human Subjects at New York University (#14-10405). All caregivers signed a consent form, and all children gave oral consent to participate. Participants received no monetary compensation for taking part in the study.
Participants
The participants were the same children (n=30) included as a TD control group in an earlier study on error detection in children with DLD (Hallin & Reuterskiöld, 2017). The children were all in Grade 4 with a mean age of 10;7 (10;2 to 11;1 years old) and were recruited from six schools in the larger Stockholm municipality. Based on a caregiver questionnaire, none of the 14 boys or 16 girls had a history of developmental or language delay (beyond earlier difficulties with the /r/-sound for two participants). They were monolingual native Swedish speakers with normal hearing (routinely tested in all Swedish elementary schools). On average, the caregivers had 6.2 years of education beyond Grade 9 (Swedish compulsory school), range 3–12 years, SD 3.47 years (n=23).
All children received a standardized test battery including the core language subtests from the Swedish version of the Clinical Evaluation of Language Fundamentals, Fourth Edition (CELF-4; Semel, Wiig, & Secord, 2013), the Peabody Picture Vocabulary Test, Third Edition (Dunn & Dunn, 1959–2007; Hedberg & Kellén Nilsson, 2003), and the Raven’s Coloured Progressive Matrices (Raven, Court, & Raven, 1990). Participants scored at or above –1 SD on all measures (CELF-4 composite index: M=103.9; Raven’s Coloured Progressive Matrices: M=105; the Peabody Picture Vocabulary Test, Third Edition: raw score mean equivalent to stanine 5 for Swedish 4th graders). In addition, to ensure the participants did not present with dyslexia, all children scored at or above –1 SD on two single word decoding tasks (one with real words and one with nonwords) from the test battery Logos (Høien, 2005).
Experimental stimuli
All three experiments included the same target sentences, which were simple 9–10 word declarative sentences with similar structure and complexity regardless of error type. All sentences described plausible actions and situations.
The past-tense errors involved the substitution of the simple past tense with the infinitive. Half of the sentences contained a regular verb of the second conjugation and half contained an irregular verb. The verb with the error always appeared as the third word of the sentence, preceded by a phrase that indicated past tense. An example of an error sentence including a regular LF verb, where * marks the infinitive verb, is “I torsdags *krympa pappan en gul vante i tvätten.” Translation: Last Thursday *shrink-INF dad-COM.SING.DEF a.COM yellow.COM.SING mitten in laundry-DEF.COM.
The noun phrase errors involved the omission of the obligatory indefinite article in noun phrases with an adjective and a singular countable and concrete noun. In half of the sentences, the target noun had common gender and in half of the sentences the target noun had neuter gender. The noun phrase containing the error was always the direct object of a verb, and the error could be detected after the third word. An example of an error sentence including an LF neuter noun, where * marks the missing obligatory article “ett,” is “Mamman plockade * surt plommon från trädet i lördags.” Translation: Mother-COM.SING.DEF pick-PAST * sour-NEUT.SING plum from tree-NEUT.SING.DEF last Saturday. For more examples of sentences with translations, see Hallin and Reuterskiöld (2017).
To select 1–2 syllable target verbs and nouns, the corpus LäSBarT was used (Mühlenbock, 2008). LäSBarT is a specialized corpus with about 1.1 million words compiled from easy-to-read texts such as fiction, informational texts, and news articles. Frequency data were obtained from the web user interface Korp (Borin et al., 2012). The high-frequency (HF) target nouns and verbs had a minimum lemma frequency of 100/million. In addition, for HF verbs the occurrence of the simple past form in the corpus was at or above 30/million. The low-frequency (LF) target nouns had a lemma frequency between 0/million and 10/million. The LF target verbs had a lemma frequency below 30/million and a simple past form occurrence between 0/million and 10/million. Adjectives included in all target noun phrases had a lemma frequency of 60/million or higher. All included words (HF and LF) were selected to be familiar to 4th graders. To confirm this, five Swedish elementary school teachers rated all target words for Grade 4 familiarity on a scale ranging from 1 (no familiarity) to 5 (very high familiarity): HF verbs had a mean rating of 4.97, HF nouns 4.98, LF verbs 3.93, and LF nouns 3.93. For a list of all target words with translations, see Appendix A.
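The frequency bands described above can be summarized as a small decision rule. The following is only an illustrative sketch in Python: the function names and the `per_million` helper are hypothetical and were not part of the study's materials, and the handling of values exactly at the band edges follows the wording of the paragraph above.

```python
from typing import Optional


def per_million(raw_count: int, corpus_size: int) -> float:
    """Convert a raw corpus count to occurrences per million words."""
    return raw_count / corpus_size * 1_000_000


def classify_noun(lemma_pm: float) -> Optional[str]:
    """Return 'HF', 'LF', or None for a noun lemma frequency (per million)."""
    if lemma_pm >= 100:
        return "HF"
    if 0 <= lemma_pm <= 10:
        return "LF"
    return None  # between the two bands: not eligible as a target


def classify_verb(lemma_pm: float, past_form_pm: float) -> Optional[str]:
    """Return 'HF', 'LF', or None for a verb.

    Both the lemma frequency and the frequency of the simple past form
    (per million) must fall inside the same band.
    """
    if lemma_pm >= 100 and past_form_pm >= 30:
        return "HF"
    if lemma_pm < 30 and 0 <= past_form_pm <= 10:
        return "LF"
    return None
```

Under this reading, a verb whose lemma is frequent but whose simple past form is rare would fall outside both bands and would not qualify as a target.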
Ten sentences of each target error type and frequency were created for a total of 80 target sentences. Sixty filler sentences with the same length as the target sentences, but with other types of errors and error placements, were also created. Two stimuli lists were created: A and B. The same sentences were used for both lists, but all sentences that were grammatical in List A were ungrammatical in List B and vice versa (similar to Miller et al., 2008). Half of the participants (n=15) were randomly assigned to List A, and half were assigned to List B (n=15). A series of independent samples t tests ensured that there were no significant differences in verbal or nonverbal scores between participants assigned to the different lists (all p values>.20).
The first author, who is a female native speaker with extensive vocal training, recorded the sentences with a Marantz Professional PMD670 Recorder at a sampling rate of 44.1 kHz, and a SHURE head mounted microphone in a sound-attenuated audiometric booth. The recordings were downsampled to 22.05 kHz, and root mean square normalized to 65 dB in PRAAT (Boersma & Weenink, 2012). One native speech-language pathologist listened to several versions of all tokens and selected sentence pairs for the final list based on prosody and naturalness.
A Thinkpad X230 tablet with Windows 7 and E-Prime 2.0 software (Psychology Software Tools, Pittsburgh, PA) was used for the presentation of all sentences (in headphones) for all three experiments. An E-prime Serial Response box™ was used for response collection.
Excluded target sentences
As a result of post hoc analyses of child responses, two target sentences were excluded from all analyses: one LF irregular verb: sjuda (“simmer”) and one LF neuter noun: ruckel (“shack”). A majority of the participants indicated that they were not familiar with these words by commenting, or by exchanging the word for a nonword in the correction task, which was not observed for the other LF words. One additional ungrammatical sentence with a target LF neuter noun was excluded as the adjective could be interpreted as a verb particle when the obligatory article was omitted. Because these exclusions led to different numbers of analyzed sentences for different target error types, all analyses were rerun with eight sentences per target error category, and the conclusions were the same as reported below.
Procedure
Each child was seen twice (5–15 days between each session). The detection and correction tasks were always administered at the first session, and the ungrammatical sentence repetition task at the second session. For a summary of the materials used for each experiment across sessions, see Table 3. The descriptive and standardized measures were administered in approximately the same order for all children divided between the two sessions. All children saw the same experimenter in a quiet room either at their school or in their home. Each session lasted for about 80 min including short breaks.
EXPERIMENT 1: ERROR DETECTION
Materials
All 140 sentences were included in the first experiment: half were ungrammatical and half were grammatical (see Table 3). Sentences were assigned to 10 blocks with 14 items in each block, with one sentence of each target sentence type (grammatical/ungrammatical) in each block. Sentences were randomly ordered within blocks with the constraints that the same types of sentences could not appear consecutively and no more than three consecutive grammatical or ungrammatical sentences were allowed.
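The within-block ordering constraints can be illustrated with a simple shuffle-and-check procedure. This is a minimal sketch, not the procedure actually used to build the lists: the item representation, the type labels, the choice to treat all fillers as a single type, and the rejection-sampling approach are all assumptions made for illustration.

```python
import random
from typing import List, Tuple

Item = Tuple[str, bool]  # (sentence type label, is_grammatical); hypothetical format


def is_valid_order(items: List[Item], max_run: int = 3) -> bool:
    """Check both constraints: no two consecutive items of the same type,
    and no more than `max_run` consecutive items with the same grammaticality."""
    run = 1
    for prev, cur in zip(items, items[1:]):
        if prev[0] == cur[0]:
            return False
        run = run + 1 if prev[1] == cur[1] else 1
        if run > max_run:
            return False
    return True


def order_block(items: List[Item], rng: random.Random,
                max_tries: int = 100_000) -> List[Item]:
    """Reshuffle until an order satisfying the constraints is found."""
    candidate = list(items)
    for _ in range(max_tries):
        rng.shuffle(candidate)
        if is_valid_order(candidate):
            return candidate
    raise RuntimeError("no valid order found within max_tries")


# Hypothetical 14-item block: one target per error type x frequency, plus six fillers.
TARGET_TYPES = [f"{kind}_{freq}"
                for kind in ("regular", "irregular", "common", "neuter")
                for freq in ("HF", "LF")]
demo_block: List[Item] = (
    [(t, i % 2 == 0) for i, t in enumerate(TARGET_TYPES)]
    + [("filler", i % 2 == 0) for i in range(6)]
)
```

Calling `order_block(demo_block, random.Random(0))` returns one admissible order for the demo block. Rejection sampling is easy to verify but can be slow when constraints are tight; a constructive algorithm would be an alternative.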
Procedure
Before the error detection task, all children participated in a button-press task to familiarize them with the experimental apparatus. The participants were subsequently instructed that they would hear sentences that would either sound correct or incorrect, and that they should press the green button if they thought that the sentence was correct, or the red button if they heard an error. They were instructed to press the relevant button as soon as they knew the answer, even if the sentence had not finished.
At the start of each trial, a red circle with a cross and a green circle with a checkmark appeared on the screen. After 1000 ms, a fixation-cross appeared between the circles (500 ms) before the sentence started. The circles stayed on the screen to prompt the participant to answer as quickly as possible. The participant had about 2.5 s to respond before the next trial. Before the experiment, three grammatical and three ungrammatical practice sentences with different types of errors compared to the rest of the experiment were presented, and the experimenter ensured that the child understood the instructions.
Participants received the 10 blocks of their assigned list in random order. During the experiment the participants only received nonspecific encouragement. After each block, they could take a short break. After 5 blocks all children had a longer break to avoid fatigue. All children were reminded of the instructions before the second half of the experiment.
Dependent measures
RT
RT was measured by E-prime. For past-tense errors, RT was calculated from the point immediately after the incorrect verb, and for noun phrase errors, immediately after the adjective. In the calculations of RT, only correctly rejected ungrammatical sentences with RTs between 100 and 3000 ms were included. The purpose of excluding RTs above 3000 ms was to exclude those trials where a participant was not focusing on the task. In addition, RTs below 100 ms were considered to be anticipatory responses. Furthermore, the data point for an error category for each subject had to represent the mean RT of at least two accurate rejections.
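The inclusion rules for the RT measure can be sketched as follows. The function name `mean_rt` and the trial format are hypothetical, and treating the 100 ms and 3000 ms boundaries as inclusive is an assumption, since the text does not specify this.

```python
from statistics import mean

RT_MIN_MS, RT_MAX_MS = 100, 3000  # boundaries assumed to be inclusive
MIN_TRIALS = 2                    # at least two accurate rejections per cell

def mean_rt(trials):
    """Mean RT for one child and one error category.

    trials: list of (correctly_rejected, rt_ms) pairs for ungrammatical items.
    Returns None when fewer than MIN_TRIALS trials survive the filters.
    """
    kept = [rt for ok, rt in trials if ok and RT_MIN_MS <= rt <= RT_MAX_MS]
    return mean(kept) if len(kept) >= MIN_TRIALS else None
```

For example, `mean_rt([(True, 900), (True, 1100), (True, 50), (False, 800)])` keeps only the 900 ms and 1100 ms trials, whereas a cell with a single surviving trial yields `None` and would be treated as missing.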
Accuracy
The accuracy measure A’ (error sensitivity: Grier, Reference Grier1971) was calculated and reported in Hallin and Reuterskiöld (Reference Hallin and Reuterskiöld2017). To enable comparisons between error detection and the two other tasks in the current paper, the proportion of accurately rejected sentences is reported in the Comparisons Between the Experiments section below.
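For readers unfamiliar with A’, the following sketch shows the standard nonparametric sensitivity formula attributed to Grier (1971), together with the commonly used symmetric extension for below-chance performance. The function name `a_prime` is hypothetical, and it is an assumption that a "hit" corresponds to rejecting an ungrammatical sentence and a "false alarm" to rejecting a grammatical one.

```python
def a_prime(hit_rate, fa_rate):
    """Nonparametric sensitivity A' (0.5 = chance, 1.0 = perfect).

    hit_rate: proportion of ungrammatical sentences rejected (assumed mapping)
    fa_rate:  proportion of grammatical sentences rejected (assumed mapping)
    """
    h, f = hit_rate, fa_rate
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))
    # Below-chance case: mirror image of the formula above.
    return 0.5 - ((f - h) * (1 + f - h)) / (4 * f * (1 - h))
```

With a hit rate of 0.8 and a false-alarm rate of 0.2, this gives A’ = 0.875; unlike the raw proportion of rejected errors, the measure corrects for a general bias toward pressing one of the two buttons.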
Results
Out of 2,310 responses to both grammatical and ungrammatical sentences, 31 (1.3%) had no registered response. Reasons for this included occasions when the participant responded after the trial had finished, a button press too weak to register on the response box, or technical issues. An additional 2.6% of the accurate responses were excluded because they fell outside the 100–3000 ms boundary. Furthermore, several participants had fewer than two accurate responses in one or more target error categories; the final number of participants included in the RT analysis was therefore 27 for the past-tense errors and 21 for the noun phrase errors.
Past-tense errors
Figure 1 shows the mean RT for accurately detected past-tense errors. A repeated measures analysis of variance (ANOVA) was carried out on RT with verb type (regular vs. irregular) and verb frequency (HF vs. LF) as within-subject factors, and stimuli list (A/B) as a between-subject factor (n=27). There was a significant effect of verb frequency, F (1, 25)=6.680, p=.016, ηp²=0.211. The mean RT for errors involving HF verbs (M=924, SD=243.8) was significantly shorter than the mean RT for errors involving LF verbs (M=1058, SD=384.5), d=0.21. There was no significant effect of verb type (p=.637), and no Verb Type × Frequency interaction (p=.372). No main effect or interactions with stimuli list were significant (all p values ≥ .400).
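The reported partial eta squared can be related to the F statistic and its degrees of freedom through a standard identity. The sketch below is a reader's check, not part of the original analysis, and the function name `partial_eta_squared` is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic:
    eta_p^2 = (F * df_effect) / (F * df_effect + df_error)."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Verb frequency effect on RT as reported above: F(1, 25) = 6.680
eta = partial_eta_squared(6.680, 1, 25)
print(round(eta, 3))  # 0.211
```

The same identity can be applied to the other F tests in the paper as a consistency check between each F value, its error degrees of freedom, and the reported effect size.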
Noun phrase errors
Figure 2 shows the mean RT for accurately detected noun phrase errors. A repeated measures ANOVA was carried out on RT with noun gender (common vs. neuter) and noun frequency (HF vs. LF) as within-subject factors, and stimuli list (A/B) as a between-subject factor (n=21). There was a significant main effect of noun gender, F (1, 19)=6.904, p=.017, ηp²=0.267. There was also a significant main effect of noun frequency, F (1, 19)=16.421, p=.001, ηp²=0.464. The mean RT for noun phrase errors involving common nouns (M=1287, SD=273.1) was significantly shorter than the mean RT for noun phrase errors involving neuter nouns (M=1477, SD=450.5), d=0.56. Furthermore, the mean RT for errors involving HF nouns (M=1261, SD=354.1) was significantly shorter than the mean RT for errors involving LF nouns (M=1503, SD=366.0), d=0.34. There was no significant interaction between noun gender and frequency (p=.380). No effects of stimuli list were significant (all p values ≥ .130).
Discussion
Verb target sentences
The RT data from the error detection task are in line with the error sensitivity results reported in Hallin and Reuterskiöld (Reference Hallin and Reuterskiöld2017): there were no significant differences between regular and irregular verbs, but HF verbs were associated with faster RTs and higher error sensitivity compared to LF verbs. The RT data from accurately detected past-tense errors likely reflect speed of lexical access. The accuracy data, however, imply that implicit knowledge gained through language exposure affects the results (see also discussion in Hallin & Reuterskiöld, Reference Hallin and Reuterskiöld2017). It is possible, however, that the input characteristics of Swedish make the detection of past-tense errors for LF verbs particularly challenging: there are two regular conjugations, and the marking of past tense is optional in spoken language for the first conjugation. A possible consequence is that a participant assigns the first conjugation to the infinitive verb, and categorizes the sentence as grammatical because of that. A future error detection study with a different design is needed to answer this question, although the results from Experiment 2 (error correction) may shed some light on this matter. If children make overgeneralizations using the pattern of the first conjugation while correcting past-tense errors involving LF verbs, it indicates that they accurately interpret the verbs as being in infinitive form, rather than in the unmarked, colloquial past tense form.
Noun phrase target sentences
For noun phrase target sentences, there was an effect of noun gender in the predicted direction: neuter noun phrase errors were associated with slower RTs compared to common noun phrase errors. Hallin and Reuterskiöld (Reference Hallin and Reuterskiöld2017) reported error sensitivity scores that follow the same pattern. The result may reflect input frequency (common nouns are three times as frequent as neuter nouns in Swedish) and mirrors findings for younger children with DLD, who have been found to omit the neuter indefinite article more often than the common indefinite article in their expressive language (Hansson et al., Reference Hansson, Nettelbladt and Leonard2003; Leonard et al., Reference Leonard, Salameh and Hansson2001). In addition, there was an effect of noun frequency on RT: detection of noun phrase errors involving HF nouns was faster, on average, than detection of those involving LF nouns. This was not in line with the results reported in Hallin and Reuterskiöld (Reference Hallin and Reuterskiöld2017), where noun frequency did not have a significant effect on noun phrase error sensitivity. The increased RT points to an increased processing challenge for sentences including an LF noun, even when the morphosyntactic error is accurately detected. A possible explanation could be that bare nouns do exist in certain constructions (most commonly for mass nouns), and that the processing demand or attentional control needed to rule these constructions out is higher for LF nouns, due to weaker representations and slowed lexical access.
The fact that school-age children with TD show a relative difficulty detecting neuter noun phrase errors confirms that this structure constitutes a challenge in Swedish. In English, omission of articles has been found to be easier to detect than morphological errors for school-age children (McDonald, Reference McDonald2008). This difference between Swedish and English might be explained by the nature of Swedish noun morphology. In Swedish, the adjective in a noun phrase has to agree in gender, number, and definiteness with the noun, and the indefinite article is only obligatory in singular contexts for count nouns (see Table 2). Consequently, the presence or absence of morphological suffixes on the adjective and noun, as well as the semantic content of the noun (count/mass), has to be processed in order to detect the omission of the singular indefinite article. There are at least two possible reasons why the omission of the neuter indefinite article might be particularly challenging. The first is the frequency effect mentioned above: neuter nouns are less frequent in Swedish overall. The second is the fact that most neuter nouns have Ø-marking for plural, including 19/20 neuter nouns in the current experiment. For phrases including a Ø-plural neuter noun, the adjective marking is the only cue to grammaticality when the article is omitted (see Table 2). In contrast, for common noun phrases, both the adjective and the noun always have an unambiguous explicit singular marking. The design of the current study does not allow for differentiation between effects due to morphological or semantic characteristics or due to type frequency, which all could lead to slower RTs and lower detection accuracy.
EXPERIMENT 2: ERROR CORRECTION
Materials
Sixty ungrammatical sentences (40 target, 20 filler) were included in the error correction experiment. Children received the same list (A/B) as they received in Experiment 1, so that they attempted to detect and correct the same errors (see Table 3). Sentences were randomly assigned and ordered in 6 blocks with 10 items in each block, with the constraint that the same type of error could not appear consecutively.
Procedure
The children were instructed that they would hear sentences that all included errors, and that they should repeat the sentence and try to correct the error. The child heard three practice sentences with feedback, and then the six blocks were presented in random order. Before each sentence, the text “Listen carefully” appeared on the screen. Children received nonspecific encouragement between each sentence block. The experimenter controlled the rate of presentation of the sentences, and the child’s responses were recorded on a digital recorder.
Dependent measure
Error correction accuracy
Responses were transcribed and scored by the first author. Participants got 1 point for each accurate correction of the target error (without any demands on accurate repetition of the rest of the sentence), and proportion correct for each target category was calculated. For two of the LF irregular verbs (one in List A and one in List B), corrections to either irregular or regular past tense were scored as correct, as both forms are accepted in Swedish spoken language (the verbs were “concealed” [accepted corrections: dolde (irregular) or döljde (regular)] and “grease” [accepted corrections: smorde (irregular) or smörjde (regular)]). Before statistical analysis, the proportions were arcsine transformed to meet the repeated measures ANOVA assumptions (Sokal & Rohlf, Reference Sokal and Rohlf1981).
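The arcsine transformation mentioned above is usually defined as the arcsine of the square root of the proportion. The sketch below assumes that variant, expressed in radians; some authors multiply the result by 2, and the text does not state which form was applied. The function name `arcsine_transform` is hypothetical.

```python
import math

def arcsine_transform(p):
    """Arcsine square-root transform of a proportion (result in radians).

    Assumed variant: asin(sqrt(p)). It stretches values near 0 and 1,
    where the variance of a proportion is compressed.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return math.asin(math.sqrt(p))

# Hypothetical per-child proportions correct for one target category.
transformed = [arcsine_transform(p) for p in (0.5, 0.8, 1.0)]
```

The transformation cannot remove a ceiling effect in which most children score exactly 1.00, since all such scores map to the same value.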
Results
The three ungrammatical sentences that were excluded in the error detection analysis were also excluded in the error correction analysis (see appendix A). Furthermore, sentences where the child asked to skip the sentence or said that he or she did not remember anything of the sentence (n=13) or where the child substituted the target word for another word (n=11) were also excluded from the analyses. The final number of analyzed trials was 1,086.
Past-tense errors
Figure 3 shows mean proportion accurately corrected past-tense errors. A repeated measures ANOVA was conducted on the arcsine transformed error correction accuracy measure with verb type (regular vs. irregular) and verb frequency (HF vs. LF) as within-subject factors and stimuli list (A/B) as a between-subject factor. There was a significant effect of verb type, F (1, 28)=17.892, p<.001, ηp²=0.390. There was also a significant effect of verb frequency, F (1, 28)=6.236, p=.019, ηp²=0.182. Errors involving regular verbs (M=0.937, SD=0.107) were significantly easier to correct than errors involving irregular verbs (M=0.752, SD=0.246), with a large effect size, d=1.05. Furthermore, errors involving HF verbs (M=0.878, SD=0.141) were significantly easier to correct than errors involving LF verbs (M=0.752, SD=0.246), with a small to moderate effect size, d=0.33. No interactions or effects of stimuli list were significant (all p values ≥ .187).
Noun phrase errors
Figure 4 shows mean proportion accurately corrected noun phrase errors. An ANOVA analysis approach was not appropriate because of violated assumptions due to ceiling effects (even with transformations): all four conditions had a median of proportion accurate corrections of 1.00, with a range between 0.5 and 1.0. Therefore, nonparametric alternatives were used instead.
A Friedman test showed no significant differences between error correction for common and neuter noun phrase errors involving HF or LF nouns, χ² (3)=3.750, p=.290. Comparisons between HF and LF nouns for each gender separately (Wilcoxon signed rank tests) approached but did not reach significance: the p value for the difference between HF and LF common gender nouns was .052, and for HF and LF neuter gender nouns the p value was .087.
Discussion
Past-tense errors
For correction of past-tense errors, the results are in line with the results from error detection: errors involving HF verbs were easier to correct than errors involving LF verbs, and the participants were essentially at ceiling for HF regular verbs. This effect of token frequency has not been reported or explicitly discussed in any previous error correction studies. Cain (Reference Cain2007) found that receptive vocabulary was a significant predictor for error correction ability in TD English-speaking 5th graders, which may indicate token frequency effects, but she did not discuss these results further.
Participants in the present study had significantly greater difficulty correcting irregular verbs compared to regular verbs, with a large effect size, even though the variability across participants was large. To investigate the difference between regular and irregular verbs further, an analysis of the types of inaccurate responses was conducted. This revealed an effect of type frequency: 86% of the inaccurate corrections of errors involving irregular verbs were overgeneralizations to the second or first conjugation, and 56% of the inaccurate corrections of errors involving LF regular verbs were overgeneralizations to the first conjugation. The remaining errors for both target types were verbatim repetitions of the error. In the instances where the participant repeated the error, it is impossible to know whether this was because of an inability to detect and correct the error, or if it was a consequence of the optional tense marking in the spoken version of the first conjugation. However, as participants did add an overt morphological marking in a majority of the inaccurate corrections, they did not seem to treat the marking as optional in this formal experiment.
A majority of the overgeneralization errors were to the second conjugation (see Table 1), and not the more frequent first conjugation. This is in direct parallel to Ragnarsdottir et al. (Reference Ragnarsdottir, Gram Simonsen and Plunkett1999), who found that their oldest age group showed the same pattern in Norwegian and Icelandic in an elicitation task. The fact that the most common overgeneralization pattern shown by the participants represented the second conjugation pattern, and that a verb frequency effect was still seen in the target regular verbs of the second conjugation, supports a view of past-tense processing that is governed by statistical features of the language (Marchman, Reference Marchman1997; Ragnarsdottir et al., Reference Ragnarsdottir, Gram Simonsen and Plunkett1999).
Noun phrase errors
There were no significant effects of noun gender for the correction of noun phrase errors, which went against the hypothesis. The effect of noun frequency approached but did not reach significance, possibly due to the lack of power in the nonparametric test and the ceiling effects. The ceiling effect is interesting in itself: although neuter noun phrase errors were associated with slower RTs and lower error sensitivity (as reported in Hallin & Reuterskiöld, Reference Hallin and Reuterskiöld2017), the participants were close to ceiling in correcting the same errors. This raises questions about whether some of the noun phrase corrections were automatic: participants may have corrected the error without explicit detection and application of a rule. This possibility will be explored in the third and last experiment.
EXPERIMENT 3: ERROR REPETITION
Materials
The same sentences and experimental design as in Experiment 2 (error correction) were used in the error repetition task. The error repetition task was always presented in a separate second session, to avoid priming and/or learning effects (see Table 3).
Procedure
The participants were instructed that they would hear sentences in headphones that contained errors once more, but that this time they had to repeat the sentence verbatim without correcting the error. The task started with three practice sentences, and then the six blocks were presented in random order, with a different randomization compared to the error correction task. Nonspecific encouragement was given throughout the task. The experimenter controlled the rate of presentation, and responses were recorded on a digital recorder.
Dependent measure
Error repetition accuracy
All responses were transcribed and scored by the first author. The child got 1 point if he or she repeated the grammatical error verbatim, and zero points if he or she corrected the error. For all verbs, any changes that involved overgeneralization were scored as automatic corrections, and received a score of zero. If the child excluded the adjective in the noun phrase, but still added the indefinite article and made the sentence grammatical, this also counted as an automatic correction. Before statistical analysis, the proportions of accurate repetitions were arcsine transformed to meet the repeated measures ANOVA assumptions (Sokal & Rohlf, Reference Sokal and Rohlf1981).
Results
Trials where the child asked to skip the sentence or said that he or she did not remember anything (n=4) or where the child substituted another word for the target word (n=15) were excluded from the analyses. There was also missing data from one participant who refused to do the error repetition task. The final number of analyzed trials was 1,054.
Past-tense errors
Figure 5 shows mean proportion of verbatim repeated past-tense errors. A repeated measures ANOVA on the arcsine transformed error repetition accuracy measure was conducted with verb type (regular vs. irregular) and verb frequency (HF vs. LF) as within-subject factors and stimuli list (A/B) as a between-subject factor. There was a significant effect of verb frequency, F (1, 27)=5.554, p=.026, ηp²=0.171. Errors involving LF verbs (M=0.961, SD=0.074) were significantly easier to repeat verbatim than errors involving HF verbs (M=0.910, SD=0.098), d=0.59. There was no significant effect of verb type (p=.159) and no significant interactions. No interactions with stimuli list were significant (all p values ≥ .187).
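The effect size for the verb frequency comparison above can be illustrated with one common convention for Cohen's d, in which the mean difference is divided by the root mean square of the two standard deviations. The text does not state which variant of d was computed, so this is an assumption; the function name `cohens_d` is hypothetical.

```python
import math

def cohens_d(mean_1, sd_1, mean_2, sd_2):
    """Cohen's d with the SDs pooled as sqrt((sd_1^2 + sd_2^2) / 2).

    Assumed convention; other variants (e.g., based on the SD of the
    paired differences) give different values for within-subject data.
    """
    pooled_sd = math.sqrt((sd_1 ** 2 + sd_2 ** 2) / 2)
    return (mean_1 - mean_2) / pooled_sd

# LF vs. HF verbs in error repetition, using the descriptives reported above.
d = cohens_d(0.961, 0.074, 0.910, 0.098)
print(round(d, 2))  # 0.59
```

Because the two conditions come from the same children, a variant based on the paired differences would also be defensible; it would require the raw data rather than the summary statistics.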
Noun phrase errors
Figure 6 shows mean proportion of verbatim repeated noun phrase errors. A repeated measures ANOVA on the arcsine transformed error repetition accuracy measure was conducted with noun gender (common vs. neuter) and noun frequency (HF vs. LF) as within-subject factors and stimuli list (A/B) as a between-subject factor. There were no significant effects of noun gender, noun frequency, or stimuli list (all p values ≥ .11), and no significant interactions.
Discussion
Past-tense errors
The mean proportions of automatic corrections of past-tense errors were quite low: in general, the children had no difficulty repeating past-tense errors. The patterns of inaccurate repetitions nevertheless supported the hypothesis that implicit language knowledge affected performance: HF verbs led to significantly more automatic corrections than LF verbs, indicating that it is more difficult to inhibit the natural tendency to produce a correct sentence when the error involves a high-frequency, everyday word. This finding is in line with sentence repetition results including French- and English-speaking children (Kidd et al., Reference Kidd, Lieven and Tomasello2006; Matthews et al., Reference Matthews, Lieven, Theakston and Tomasello2005, Reference Matthews, Lieven, Theakston and Tomasello2007). In addition, errors involving regular and irregular verbs had similar proportions of automatic corrections. Half of the automatic corrections of irregular past-tense errors involved the accurate past tense, and half were overgeneralizations.
Noun phrase errors
The noun phrase sentences showed a high proportion of automatic corrections. This finding might explain why there were ceiling effects in the correction of noun phrase errors (Experiment 2) despite weaker error detection using the same materials (Hallin & Reuterskiöld, Reference Hallin and Reuterskiöld2017): some of the corrections were likely automatic. Similarly to the correction results, there were no effects of noun gender or token frequency on repetition accuracy, which went against the initial hypothesis. The variation across participants when repeating noun phrase errors was large. Averaging the proportion accurate error repetitions for all noun phrase error sentences (all HF/LF common and neuter tokens), 9 participants had >90% accurate repetitions, 15 participants had 70%–90%, and 5 participants had <55%. A closer look at the 5 participants with the fewest accurate noun phrase repetitions revealed that they all had a standard score of 91 or lower on the total composite language score of CELF-4. A correlation between CELF-4 scores and noun phrase error repetition scores while controlling for stimuli list (n=29) yielded a partial correlation of r=.71 (p<.001), indicating that about 50% of the variance in noun phrase error repetition accuracy could be explained by CELF-4 scores. In other words, children with stronger language skills were better at inhibiting the natural tendency to correct this type of error. This might indicate that participants with weaker general language skills did not bring as much explicit awareness to the task (especially cognitive control: Bialystok & Ryan, Reference Bialystok and Ryan1985). Another possible explanation is that the sentence repetition itself challenged some children.
Riches (Reference Riches2012) argued that in order to repeat longer sentences, information is maintained in short-term memory in chunks, defined as larger units created with the help of linguistic patterns or representations in long-term memory. It is possible that the children with lower language skills in the current sample had difficulty noticing and remembering a brief, prosodically nonsalient error while simultaneously attempting to chunk the sentence in order to keep it in their memory. The connection between language processing and memory has also been investigated by Van Dyke and colleagues (Lewis, Vasishth, & Van Dyke, Reference Lewis, Vasishth and Van Dyke2006; Tan, Martin, & Van Dyke, Reference Tan, Martin and Van Dyke2017; Van Dyke & Johns, Reference Van Dyke and Johns2012; Van Dyke, Johns, & Kukona, Reference Van Dyke, Johns and Kukona2014). In their most recent paper, they used a reading task, in which adult participants read sentences and verbally responded to a comprehension question requiring a one-word response. Sentences were constructed to include embedded semantic and syntactic distractors, and the task required the participants to resolve the interference from distractors to correctly respond to the question. The researchers reported a relationship between syntactic interference in the reading task and general working memory performance, measured by complex span tasks. There was also a link between general working memory performance and the task requiring a verbal response to a comprehension question, when semantic interference was included. The authors speculated that attentional control might be an underlying factor influencing their results. In line with these results, it might be that in our task the children who demonstrated stronger language skills had access to additional cognitive resources to use for attentional control and inhibition while required to repeat sentences including errors.
COMPARISONS BETWEEN THE EXPERIMENTS
Error detection versus error correction
Using paired samples t tests, differences between proportion detected errors and proportion corrected errors were compared; see Table 4. For past-tense errors involving regular verbs (high and low frequency), there were no significant differences between the error detection and error correction performance (p>.05). For past-tense errors involving HF irregular verbs, children detected significantly more past-tense errors than they were able to correct, t (29)=4.19, p<.001, and the same was seen for LF irregular verbs, t (29)=4.32, p<.001. For the noun phrase errors, however, a very different pattern can be seen. On average, children corrected more noun phrase errors than they were able to detect. This difference reached significance for HF common noun phrases, t (29)=–2.69, p=.012, LF neuter noun phrases, t (29)=–4.07, p<.001, and HF neuter noun phrases, t (29)=–2.57, p=.016.
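The logic of the paired comparison between detection and correction can be sketched as follows. The function name `paired_t` and the example scores are hypothetical; the sketch only returns the t statistic and its degrees of freedom and does not compute a p value.

```python
import math
from statistics import mean, stdev

def paired_t(detected, corrected):
    """Paired samples t statistic for detection minus correction.

    detected, corrected: per-child proportions, in the same child order.
    A positive t means more errors were detected than corrected; a negative
    t means more errors were corrected than detected.
    """
    diffs = [d - c for d, c in zip(detected, corrected)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1

# Made-up scores for four children, for illustration only.
t, df = paired_t([5, 7, 6, 9], [3, 6, 5, 6])
```

With 30 children contributing to both tasks, the degrees of freedom would be 29, matching the t (29) values reported above.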
Correlations
Inspection of scatterplots showed no clear linear relationships between the three tasks, and Pearson’s correlations for 29 children, while controlling for stimuli list (A/B), showed only weak to moderate positive correlations for past-tense target errors between error detection and error correction (r=.38), and error correction and error repetition (r=.40). For noun phrase target errors correlations were weak. After controlling for multiple correlations, no relationships were statistically significant (all p values>.006). Additional correlations without averaging across verb type, noun gender, or token frequency were found to be nonsignificant.
SUMMARY AND CONCLUDING DISCUSSION
This study investigated error detection, correction, and repetition in Swedish-speaking typically developing school-age children, through three carefully designed experiments. The focus was effects of error type and token frequency. Four target errors were analyzed: replacement of the simple past tense of regular/irregular verbs with the infinitive, and the omission of the indefinite article in noun phrases with an adjective, including either a common or a neuter noun. The current study reports the first results showing an impact of token frequency on error detection, correction, and repetition for past-tense errors. It also shows an impact of noun gender and token frequency on response time for detection of noun phrase errors (for error detection accuracy results, see Hallin & Reuterskiöld, Reference Hallin and Reuterskiöld2017). A summary of the significant effects in all three experiments can be found in Table 5. Error detection and correction tasks have been used extensively, but effects of token frequency have not previously been investigated. The findings show that even familiar words of lower frequency can affect performance in these tasks. This extends existing research that has shown effects of verb token frequency on expressive verb morphology in elicitation tasks in both children with TD and children with DLD (e.g., Marchman, Reference Marchman1997; Ragnarsdottir et al., Reference Ragnarsdottir, Gram Simonsen and Plunkett1999; Rispens & De Bree, Reference Rispens and De Bree2014). The results also raise questions about the results of previous error detection and correction studies and highlight the importance of controlling for token frequency when designing error detection and correction experiments, even when children are expected to be familiar with all words.
a These data are presented in Hallin and Reuterskiöld (2017). b The difference approached but did not reach significance (HF>LF), possibly due to lack of power.
The patterns of results indicate that several factors affect the extent to which explicit metalinguistic processes are involved in these tasks. Many researchers have used error detection and correction tasks as measures of explicit metalinguistic awareness (e.g., Cain, Reference Cain2007; Cairns et al., Reference Cairns, Schlisselberg, Waltzman and McDaniel2006; Montgomery & Leonard, Reference Montgomery and Leonard1998; Nation & Snowling, Reference Nation and Snowling2000; Noonan et al., Reference Noonan, Redmond and Archibald2014; Plaza & Cohen, Reference Plaza and Cohen2003; Tong et al., Reference Tong, Deacon and Cain2014). Most studies that have included both error detection and correction have concluded that error correction is a more demanding task than error detection (e.g., Kamhi & Koenig, Reference Kamhi and Koenig1985; Smith-Lock, Reference Smith-Lock1995), although exceptions exist (Rubin et al., Reference Rubin, Kantor and Macnab1990). The patterns of performance regarding accuracy in the present study considered together with the results from Hallin and Reuterskiöld (Reference Hallin and Reuterskiöld2017) suggest that both error detection and correction, at least when errors involve the very basic types of morphosyntactic errors included here, demand less explicit metalinguistic awareness than several researchers have suggested. In addition, different target errors as well as target words of different frequency also seemed to be associated with different levels of conscious and explicit metalinguistic involvement.
For verbs, the patterns of results across the three tasks indicate that it takes more cognitive control to detect errors involving LF items compared to HF items, although the language analysis demands are similar. This can be related to the emergentist view of differing strengths of implicit representations, where the prediction is that a violation of an HF verb pattern yields a stronger and faster activation of the language system, and thus faster (and more accurate) error detection. In error correction, that same strong representation can facilitate correction of regular past-tense errors without much additional explicit metalinguistic effort. The correction of irregular past-tense errors, in contrast, points to greater demands in terms of both language analysis and cognitive control (Bialystok & Ryan, Reference Bialystok and Ryan1985) compared to regular verbs, as the child has both to inhibit the tendency to overgeneralize to a more frequent pattern (= control) and to retrieve the correct (but less frequent) irregular pattern (= analysis). Finally, for error repetition, the strong representation of an HF verb makes automatic corrections more frequent, and type frequency effects make automatic corrections to the regular form for irregular verbs common (see discussion in Ambridge, Kidd, Rowland, and Theakston, Reference Ambridge, Kidd, Rowland and Theakston2015, on how HF forms can both prevent errors and cause errors).
The pattern of performance for the detection, correction, and repetition of the noun phrase errors indicated a high proportion of unconscious or automatic corrections. In contrast to the past-tense errors, children corrected a higher proportion of errors than they managed to detect. This indicates that 10-year-old children can correct a majority of the article omission errors that they cannot reliably (or at least explicitly) detect. It is unclear if the automatic corrections of noun phrase errors fall under what can be defined as automatic metaprocesses, however. By definition, automatic metaprocesses can be shifted into metalinguistic awareness through conscious attention (Gombert, Reference Gombert1992). The comments from some participants during the error correction task indicated that they simply repeated the sentence as they thought they heard it. Thus, these corrections might better be described as epilinguistic: the children applied their language pattern knowledge while repeating a sentence, and happened to solve a metalinguistic task at the same time.
The challenge of processing the Swedish noun phrase has previously been shown in language production in young children with DLD (Hansson et al., Reference Hansson, Nettelbladt and Leonard2003; Leonard et al., Reference Leonard, Salameh and Hansson2001), but not in school-age children with TD. It is likely that the challenge is due to a combination of prosodic factors, input factors, and morphological aspects (see discussion for Experiment 1). The design of the current experiments does not allow for teasing these factors apart, but the results warrant a closer look at the Swedish noun phrase, and in particular the challenge of learning and processing neuter noun phrases.
Finally, the language ability of the individual seemed to be associated with the amount of explicit metalinguistic involvement, and might affect both the analysis and the control employed at any given moment. It is not surprising that a weaker language system leads to poorer performance in error detection and correction tasks, as previous studies including school-age children with DLD show (Hallin & Reuterskiöld, Reference Hallin and Reuterskiöld2017; Miller et al., Reference Miller, Leonard and Finneran2008; Montgomery & Leonard, Reference Montgomery and Leonard1998; Rubin et al., Reference Rubin, Kantor and Macnab1990; Smith-Lock, Reference Smith-Lock1995; Wulfeck et al., Reference Wulfeck, Bates, Krupa-Wiatkowski and Saltzman2004). A more interesting question is whether there is evidence that participants with different language abilities direct different amounts of explicit metalinguistic awareness to the tasks. Gombert (Reference Gombert1992) suggested that not all individuals reach the last stage of metalinguistic development (automatic metaprocesses). If one defines automatic metaprocesses from an emergentist viewpoint, a person with strong and efficient linguistic networks does not necessarily have to employ explicit metalinguistic skills to solve metalinguistic tasks. Instead, robust implicit knowledge can be used to accurately and quickly detect and correct errors. The results seemed to indicate that those participants who had a robust language system were better at exercising cognitive attention and control when needed. This was evident from the strong positive correlation (r = .71) between the accurate repetition of noun phrase errors and a composite language score. This suggests that the students with the strongest language abilities were able to utilize more cognitive control to inhibit the automatic correction of the error. This supports the notion that language analysis and cognitive control are linked and reinforce each other.
Limitations and conclusions
One potential weakness of the current study is that familiarity with the words in the current experiments was not explicitly checked with each child, but all words were rated as age appropriate by teachers, and included children had receptive vocabulary scores within normal range for their age group. In the final data set, two LF words were excluded to avoid skewing the results, because the responses and comments from several participants indicated that many were not familiar with those particular words. Even after excluding these words, token frequency affected error detection, correction, and repetition performance. It is important to acknowledge, however, that many statistical properties can affect language processing apart from token frequency. These include (but are not limited to) phonotactic frequency and phonological structure, neighborhood density, and prosodic patterns (see, e.g., Ambridge, Pine, Rowland, & Chang, Reference Ambridge, Pine, Rowland and Chang2012; Coady, Evans, & Kluender, Reference Coady, Evans and Kluender2010; Marchman, Reference Marchman1997; Sabisch, Hahne, Glass, von Suchodoletz, & Friederici, Reference Sabisch, Hahne, Glass, von Suchodoletz and Friederici2009). In the current experiments, words were selected to be of similar length, and no obvious systematic differences in phonological structure between high- and low-frequency words could be seen. Furthermore, two carefully balanced lists of the words were created. This lowers the risk that the final results were driven by any single item, or by factors such as lower familiarity or item characteristics that were not controlled for in the design.
In future studies, the results should be replicated with more participants and a wider age range. This is especially important as the effect of noun token frequency on error correction approached but failed to reach significance. This could have been a consequence of ceiling effects in combination with a possible lack of statistical power. Given that noun token frequency affected response time in error detection, this question warrants further examination.
The weak relationship between the results of the three experiments invites us to reexamine what these tasks measure, the different metalinguistic demands of the tasks, and the different demands associated with different types of errors. The results further challenge the view of grammatical knowledge as separate from the rest of the language system and show that input frequency affects language processing in a dynamic system, where different parts are in constant interaction, even in tasks that previously have mainly been used to capture grammatical knowledge or explicit metalinguistic awareness. The results indicate that implicit knowledge, gained through frequency of exposure, affects the performance on these tasks.
In conclusion, the metalinguistic demands of tasks traditionally viewed as tapping into explicit metalinguistic skills, involving any given morphosyntactic structure, are difficult to determine and seem to vary even within the same task. Thus, a general conclusion is that the use of error detection and correction tasks as blanket measures of metalinguistic awareness should be avoided if we aim to capture conscious, explicit reflection on language form (Gombert, Reference Gombert1992; Kuo & Anderson, Reference Kuo and Anderson2008). Furthermore, it is important to acknowledge that input frequency may affect performance in tasks that traditionally are viewed as capturing explicit metalinguistic skills in the morphosyntactic domain. This needs to be taken into account when designing error detection and correction tasks, as well as tasks and tests that are designed to measure language knowledge (Conti-Ramsden & Durkin, Reference Conti-Ramsden and Durkin2012; van Kleeck, Reference van Kleeck1994).
APPENDIX A
TARGET HIGH AND LOW FREQUENCY WORDS AND TRANSLATIONS
ACKNOWLEDGMENTS
The authors want to thank participating students, parents, schools, and teachers, and Röstkonsulten Speech and Language Clinic in Stockholm. Thank you to Susannah Levi, Richard Schwartz, and Sven Strömqvist for valuable input. This study was part of Anna Eva Hallin’s doctoral dissertation at New York University, Department of Communicative Sciences and Disorders, and was supported by an NYU Steinhardt School of Culture, Education, and Human Development Challenge Grant.