Introduction
Background
Studies of children's acquisition of subject-verb agreement often report high production accuracy early in development (Smoczyńska, 1985), low rates of commission errors (Hoekstra & Hyams, 1998), and cross-linguistic regularities in the order of acquisition, leading some accounts to emphasise maturational factors (Hoekstra & Hyams, 1998; Wexler, 1998) and universal features of grammatical structures (Ackema & Neeleman, 2018; Harley & Ritter, 2002) as the primary determinants of the learning process. However, the expanding range of research methods and cross-linguistic data continues to reveal asymmetries in children's performance, which varies significantly across inflectional contexts and languages. These asymmetries highlight the role of multiple language-specific factors.
One such factor is the distributional properties of the ambient language, the effects of which are often obscured in naturalistic studies because children self-select the words they attempt, making forms with low frequency and low familiarity difficult to assess. Once this is controlled for, even when overall accuracy is high, particular morphological endings exhibit much higher error rates, often inversely proportional to the input frequency of the given inflectional context and token (Aguado-Orea & Pine, 2015; Engelmann, Granlund, Kolak, Szreder, Ambridge, Pine, Theakston & Lieven, 2019; Räsänen, Ambridge & Pine, 2016). Indeed, these effects are also robust in the acquisition of noun morphology (Dąbrowska, 2008; Granlund, Kolak, Vihman, Engelmann, Ambridge, Pine & Lieven, 2019; Savičiūtė, Ambridge & Pine, 2018), as well as in many other areas of language development (Ambridge, Kidd, Rowland & Theakston, 2015; Bybee & Hopper, 2001; Lieven, 2010).
Another determinant of cross-linguistic variation in the acquisition of grammatical structures is cue reliability, meaning that highly consistent and distinctive form-function pairings are learnt more quickly (Bates & MacWhinney, 1989; MacWhinney, 1987). For example, the comparatively late acquisition of English 3rd person singular -s has long been attributed to its relative opacity (Brown, 1973; De Villiers & Johnson, 2007). Similarly, cue reliability has been invoked to account for differences in the development of agreement comprehension in English, Spanish, and French (Legendre et al., 2014). Cue reliability has also been shown to explain many cross-linguistic differences in the acquisition of the transitive construction, related to the relative weighting of word order, agreement, and case marking cues (Abbot-Smith & Serratrice, 2015; Chan, Lieven & Tomasello, 2009; Krajewski & Lieven, 2014).
Finally, phonological factors can also explain some of the differences in morphological development across languages. Children's production of functional morphemes has been shown to be affected by utterance position and phonological complexity (Song, Sundara & Demuth, 2009; Theodore, Demuth & Shattuck-Hufnagel, 2011), and word position (Kuczaj, 1979; Santelmann, 1998). This is further complicated by prosodic factors (Demuth, 2018; Lleó & Demuth, 1999), meaning that different languages lead children to become sensitive to different phonological cues. For example, word onsets are particularly salient in English, but often not so in languages with different accentual and phonotactic patterns, as evidenced in both children's word recognition (Hallé & de Boysson-Bardies, 1996; Vihman, Nakai, DePaolis & Halle, 2004) and production (Savinainen-Makkonen, 2007; Szreder, 2013).
The distributional, phonological, and typological properties of the ambient language thus have complex effects on learning: they affect the saliency and availability of the target structures, and they shape children's sensitivity to particular features. In order to understand how these language-specific factors interact in the acquisition of agreement, cross-linguistic evidence is needed. The current study sets out to provide such evidence from Emirati Arabic (EA), a so far understudied language, where agreement is expressed with highly reliable and phonologically salient syllabic prefixes and circumfixes.
Emirati Arabic
Emirati Arabic (EA) is a variety of Arabic spoken in the region of the United Arab Emirates (UAE), which belongs to a continuum of varieties spoken in the broader region, grouped under the dialectal name of Gulf Arabic. The latter is considered a koine dialect intelligible in and around both shores of the Arabian Gulf including the countries of Kuwait, Bahrain, Qatar, the United Arab Emirates, and parts of Saudi Arabia and Oman. This is quite a large area with considerable variation between local varieties, which has not received much discussion in the Arabic dialectal literature. It is thus perhaps more accurate to think of Gulf Arabic as a dialect continuum with some core similarities rather than as a single dialect. The term Emirati Arabic (as used here) refers to a roughly-defined group of varieties that share core characteristics with specific phonological, lexical, and morphosyntactic idiosyncrasies and a certain degree of intra-dialectal variation which is mostly geographically defined. It incorporates grammatical properties of smaller varieties within the UAE, mainly of tribal nature, including the broader varieties of Dubai and Abu Dhabi (cf. Johnstone, 1967), as well as a number of other varieties with minor, mainly phonological, distinctive properties.
As with other Arabic varieties and dialects, EA verbs are built on a root that is usually triliteral (three root consonants) and less frequently quadriliteral (four root consonants), to which one of nine possible templates (sequences of vowels and possibly affixes) is added, deriving nine forms with certain morphosyntactic properties, e.g., exhibiting causative, applicative/associative, inchoative, mediopassive, or reflexive/reciprocal interpretations (Leung, Ntelitheos & Al Kaabi, 2021). In addition, each of the nine verb forms appears in two different stems with distinct aspectual properties: a perfective stem for completed (past) actions and an imperfective stem for present and future ongoing actions.
Table 1 presents the EA verb forms for the perfective stem and the Modern Standard Arabic (MSA) forms for comparison. Following tradition, we represent different forms using Roman numerals and the triliteral root f-ʕ-l ‘to do’. It must be noted that the list of meanings of the forms is not exhaustive, as they exhibit some variability. Furthermore, many verbs do not appear in all forms. Form I is considered the basic form of each verb. MSA Form IV is not used in EA.
Table 1. EA verb forms for the root f-ʕ-l ‘to do’, with Modern Standard Arabic (MSA) equivalents for comparison.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_tab1.png?pub-status=live)
Based on Al Kaabi & Ntelitheos (2019).
EA verbs are further traditionally classified according to the phonological properties of the root. In keeping with the tradition, we distinguish strong verbs (which do not have a vowel as part of the root), doubled verbs (in which two consonants in the root are identical), and weak verbs (whose root involves a vowel, glide, or the glottal stop) (Feghali, 2008). While this classification may determine the vowel template of the verb, it does not affect affixation in the imperfective.
Subject agreement in EA expresses gender, number and person (Qafisheh, 1977; Holes, 1995). The two aspectual stems (perfective and imperfective) exhibit different subject agreement patterns with respect to the affixes they take. All subject agreement for the perfective stem of the verb in EA is expressed with syllabic suffixes. In contrast, the imperfective form is always marked by a syllabic prefix or circumfix.
Table 2 illustrates the verb inflection paradigm for the root k-s-r ‘to break’. While the imperfective affixes are not strictly analytical, the general pattern is that the prefix carries the information of person (with the exception of 3SG.F), and the suffix – number and gender. A separate set of circumfixes exists for 2PL.F and 3PL.F, but they are almost never used in the dialects, and instead the traditionally masculine affixes are used with both genders.
Table 2. Subject agreement affixes on the Imperfective and Perfective verb stem of k-s-r ‘to break’ in EA.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_tab2.png?pub-status=live)
There is some variability in the realisation of these forms in casual speech and some imperfective stem suffixes may be omitted. Some speakers drop the final /-n/ of the plural suffix and feminine singular suffix in the imperfective form. The final /-n/ of the plural and feminine singular suffix may also be deleted before the indirect object marker /l-/. The short vowels in affixes also tend to vary due to general instability of vowels in Arabic, and may be realised as more or less open. In addition, weak verbs with a glottal stop or glide root consonant exhibit stem alternation – one stem is used in front of third person suffixes and an alternate stem in front of all other affixes. However, these are less common and will not be investigated in the current study.
Previous studies in the acquisition of verb inflection in Arabic
While there is considerable work on the acquisition of noun plural morphology in Arabic (see Albirini, 2017, for an overview), the development of verbal morphology has received only limited attention. Furthermore, a great majority of the studies are naturalistic, rather than experimental, investigations. However, we have some knowledge from these studies regarding the age of acquisition and the types of errors observed, which we discuss below.
Age of acquisition
The most comprehensive overview of the acquisition of an Arabic dialect is perhaps the study of 37 children (ages 0;6–15;0 years) acquiring Egyptian Arabic by Omar (1973), based mainly on language samples from elicitation and imitation tasks. In her discussion of morphological development, Omar includes data only from the six children in the earlier stages of acquisition (1;11–3;6), following the babbling stage. While Omar's younger subjects (1;11, 2;3) produce few inflected verb forms apart from imitations, very few uninflected forms are found in the speech of the older toddlers (2;8, 3;6). Omar (1973) concludes that most inflections are acquired by the age of 2;0–2;6 years. Interestingly, Omar observes that the older children still tend to have a better mastery of verbal inflections in spontaneous production than in imitation trials, where verbal agreement is frequently dropped. For example, Child 1 (2;8) accurately produces 1st person agreement, but omits verbal agreement morphology frequently in imitation trials. The same pattern was observed in Child 2 (3;0) and Child 4 (3;6).
Similar findings are presented in another investigation of spontaneous Egyptian Arabic production by Fahim (2017), who examined the development of verb morphology in three children with language impairment (LI) and a group of six typically developing (TD) children with ages ranging between 2;03 and 4;06. The correct provision of subject agreement for person, number and gender appears to develop in the TD group between the ages of two and four years, with the youngest child producing 28% of their forms with agreement errors, and the oldest TD child at 4;06 producing fully grammatical verbal forms.
Only two studies investigate acquisition of agreement inflections using experimental data. The first one, by Basaffar and Safi (2012), discusses the acquisition of verb inflections in the speech of 32 monolingual children aged two to four years, acquiring Hijazi Arabic (a variety of Arabic spoken in parts of Saudi Arabia). The experimental data consist of comprehension and production of real and nonce inflected verbs. Spontaneous speech and data elicited using a video-clip description and a story retell task are also used. The results show steady improvement in performance with age, with a clear difference across conditions, such that error rates are lower in spontaneous speech than in elicited speech, and lower in real words than in nonce words. While spontaneous production shows near-perfect accuracy early on, elicited production reveals errors in 10%–25% of forms, depending on the person-number-gender context and the task, by the age of four. The second experimental study, by Moawad (2006), is an investigation of gender and number agreement in older, school-aged children, in the nominal and verbal systems of Saudi Arabic (in the Abha variety, spoken in the city of Abha in the southwest region of Saudi Arabia). Picture-based comprehension and production experiments with 98 children between the ages of 6 and 12, as well as a small longitudinal study of three children, reveal that even at this older age, accuracy is significantly lower than in adults.
Error types
The most commonly reported error types are errors of omission (that is, substitutions of an uninflected form for an inflected form) and gender confusion. Omar (1973) reports that most verb inflection errors were errors of omission, and that commission errors usually involved a masculine affix replacing a feminine affix, especially in imperative contexts. Fahim (2017) also reports mostly errors in gender, such that target feminine forms were erroneously produced for both naturally and grammatically gendered subjects. In addition, children produced both feminine and masculine forms in alternation for the same person or object in successive utterances.
Similarly, Moawad (2006), in her study of agreement between nouns and both adjectives and verbs, observed masculine forms substituting for feminine ones. In addition, Moawad (2006) reports that singular forms often substitute for plural and (for nouns) dual forms, and that plural forms substitute for dual ones. This is further confirmed by Abdalla (2002) and Abdou and Abdou (1986; as reported in Basaffar & Safi, 2012), who present a discussion (in Arabic) of morphological development of two children acquiring Palestinian Arabic, from the first words to five years of age. They find that the singular affixes appear before the plural, a difference they attribute to the complexity of the form. Furthermore, they observe a common developmental trajectory for both children, whereby 3rd person is acquired before 1st, which in turn is acquired before 2nd, although this pattern only holds for perfective forms.
By far the most detailed discussion of subject agreement substitution errors in an Arabic dialect can be found in Aljenaie's (2001, 2010) work on the development of Kuwaiti verbal forms, investigating spontaneous language samples in the speech of three children (ages 1;8–3;1). One of the main observations in her imperfective stem data is that masculine forms substitute quite often for feminine forms (19 such errors in the data) while the reverse substitution pattern is very infrequent (only three errors) in the imperfective form of the stem. A second finding in the Kuwaiti imperfective stem data was a pattern of pronoun reversal (Chiat, 1986; Clark, 1978), despite the fact that Arabic is a pro-drop language, in that the children frequently produced the 2nd person singular (feminine or masculine form, depending on the child's gender) instead of the 1st person singular. The two female children used the 2SG.F form in the place of 1SG (three substitutions), and the male child substituted the 2SG.M for the 1SG in 24 instances.
Summary
Overall, the findings from the acquisition of Arabic are compatible with those in other languages. Specifically, naturalistic studies report early high accuracy in spontaneous production, with apparent adult-like proficiency achieved by the age of four. Nonetheless, the increased error rates in imitated speech (Omar, 1973) and persistent non-adult-like performance in experimental conditions (Basaffar & Safi, 2012; Moawad, 2006) suggest that the high accuracy in spontaneous production may in fact obscure a much lower underlying proficiency, which only surfaces when children cannot self-select the forms they attempt.
Furthermore, the error patterns found in children's production of agreement in Arabic mirror the findings from other languages. Firstly, errors of omission are much more frequent than errors of commission, suggesting that most of children's difficulties result simply from their lack of familiarity with inflectional affixes, rather than from their incorrect application. Secondly, masculine forms are often used to replace feminine forms, a pattern also observed in studies of another Semitic language, Hebrew (Berman, 1985; Levy, 1983). Thirdly, singular forms appear before plural forms, and 3rd person before 2nd person forms. As regards the age of acquisition, despite the high regularity of the paradigm, the findings appear to differ depending on the study. However, none of the studies employs an experimental paradigm allowing for direct comparison with children acquiring other languages.
Research aims and hypotheses
Engelmann and colleagues (2019) reported the results of an experimental investigation of the acquisition of agreement in two highly inflected European languages, Finnish and Polish. Using an elicitation paradigm, they found effects of type and token frequency in both accuracy rates and substitution patterns (i.e., higher-frequency forms substitute for lower-frequency ones), as well as effects of phonological neighborhood density (PND), such that verb types from higher-density neighborhoods were produced with higher accuracy. We decided to use the same experimental paradigm to study the acquisition of agreement in Emirati Arabic, to enable a direct comparison between the developmental paths of children acquiring typologically different languages.
We had three main research aims. The first aim was to establish the distribution of error rates across the different inflectional contexts. We predicted that error rates would be low overall, as universally found in cross-linguistic research, but also that the errors would not be evenly distributed across inflectional contexts. Specifically, we expected to see higher accuracy in singular than in plural, higher accuracy in masculine than in feminine, and higher accuracy in 3rd person than 1st and 2nd person forms, as in previous studies of Arabic and other languages. Since the experimental paradigm is designed for relatively older children (three to six years), we did not aim to address the question of when agreement first develops. However, we predicted that at least some lasting effects of the asymmetries between different contexts would be observed even at the later ages.
The second aim was to determine whether differences in accuracy, if found, are related to the input frequency of inflectional contexts. We provide a first description of EA child-directed speech, and we predicted that we would find the same positive relationship between input frequency and accuracy as previously shown for Polish and Finnish, using the same methodology.
The third and final aim was to determine whether differences in accuracy are related to the input properties of individual verbs and word forms, i.e., type frequency, token frequency, or PND. We wanted to determine whether there are differences in the acquisition of Arabic vs. Polish and Finnish that could be attributed to the typological differences between the languages, affecting cue reliability and accessibility. We hypothesised that given the extremely regular inflectional paradigm, which is encoded in highly phonologically salient syllabic prefixes and circumfixes, children would be able to generalise their knowledge of the paradigm to all verbs early on, and verb identity would not play a role, especially in older children.
Underlying all the above aims is the intention to provide easily comparable data from a typologically different language, using a methodology developed in previous research of highly inflected European languages, to allow for reliable cross-linguistic comparisons.
Method
Participants
We recruited 48 participants, 19 males and 29 females, aged 2;7 to 5;9 years (mean = 4;7). All participants were native speakers of Emirati Arabic with no known language impairment. The children were tested by native Emirati Arabic-speaking experimenters at home or at school.
Due to the specificity of the local context, it was impossible to ensure that any of the children were monolingual, nor was it realistic to measure the children's exact exposure to different languages. In the UAE, non-native residents constitute 88% of the population, and English is the lingua franca of public life, meaning that all children have regular exposure to the language. Furthermore, many households employ domestic workers, often from South and Southeast Asia, and East Africa, which further affects the linguistic landscape in which the children grow up. Since this multilingual context is common to all Emirati children's experience, it should not be regarded as an interfering factor, but rather as a constant element, inherent to the process of acquisition of the dialect. In this sense, the participants constitute a fairly homogeneous group, in that they all have contact with multiple languages, but not in a systematic way (i.e., they do not attend an English-speaking school or otherwise receive explicit foreign language instruction).
Input-based predictor variables
We set out to investigate the effects of input frequencies of verb types and tokens, and of the different inflectional contexts. To establish the properties of the input, we used the Emirati Arabic Language Acquisition Corpus (EMALAC; Ntelitheos & Idrissi, 2017). The corpus consists of 24,695 utterances (78,326 words), and is based on 41 half-hour fortnightly recordings of naturalistic interactions collected over two years. The participants are an Emirati woman and six children, aged 1;8–3;10 at the start of the data collection and 3;4–5;9 at the end. The children and the adult are from the same family (first- and second-degree relations). Approximately one-third of the utterances come from the adult participant (8,512 utterances comprising 29,478 words), while the rest come from the children (cf. Ntelitheos & Idrissi, 2017, for detailed information about the composition of the corpus).
We include the sibling input in our measures of the input for three reasons. First of all, previous studies have suggested that sibling CDS plays an important part in language acquisition (Barton & Tomasello, 1994; Mannle & Tomasello, 1987; Snow, 1995), and it is also often included in CDS corpora, for example the CDS corpus of Polish used by Engelmann et al. (Haman et al., 2011). Second, we believe that the corpus provides the closest approximation of the actual language input for children in the UAE, who typically grow up in large, multi-generational families. This is especially true for the relatively older children, such as the participants in our study, who are unlikely to experience any significant amount of one-on-one interaction with adults on a regular basis. Third, and perhaps most importantly, unlike in studies of early infant development, our aim is not to link initial language learning with specific properties of CDS (which would necessitate a careful selection of age-matched adult CDS). Rather, our purpose is to estimate how what children hear often may affect the ease of retrieval or generalisation of structures that they have already learnt. Therefore, we believe that a naturalistic corpus based on family interactions provides the ideal approximation of input for our purposes, regardless of which family members contribute to those interactions. Since we are not comparing error rates between our participants and the EMALAC children, but rather comparing our participants’ error rates to how often certain forms are used around them, we are confident that this does not skew our results.
However, in our statistical analysis, we also include an adult corpus of Gulf Arabic, GUMAR (Khalifa et al., 2018), for comparison. GUMAR is the only currently available annotated corpus of adult EA (200,381 words), and it is based on an internet database of conversational novels, a genre specific to the region, written by internet users in casual language. Overall, we believe that in the absence of a dedicated adult CDS corpus, the EMALAC and GUMAR corpora provide a reliable approximation of the properties of the input.
Based on the corpora, we calculated the frequencies of verb types and tokens, as well as of person-number-gender contexts, each expressed as a proportion of all words in the respective corpus. In addition, we also used the EMALAC corpus to establish the PND of all tokens. Engelmann et al. (2019) used a measure of PND in Polish and Finnish that was based on the dictionary phonological classification of verbs. However, Arabic has only three very general phonological verb classes (strong, weak, doubled), which do not determine the inflection in the imperfective verb form. Furthermore, to ensure comparability, we restricted ourselves to the same animations as used by Engelmann et al. (2019), which made it impossible to fully control for the phonological similarities across verbs. PND was therefore instead calculated for each token based on the EMALAC verb lexicon, using the PND calculator CLEARPOND (Marian, Bartolotti, Chabal & Shook, 2012). The PND of each token was thus the number of all EMALAC verb forms which could be derived by means of addition, subtraction or substitution of a single phoneme. While such a measure is imperfect in that it partially correlates with type frequency (since different forms of the same verb are often also its phonological neighbors), it provides an estimate of the availability of phonologically similar words, which the child could potentially use as a basis for analogy across forms.
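The neighbor count described above can be illustrated with a minimal sketch. It is not CLEARPOND itself: the function names, the representation of each form as a tuple of phoneme symbols (one character per phoneme), and the toy lexicon are all assumptions made for illustration, and the strings are placeholders rather than EMALAC data. A form counts as a neighbor of the target if it can be reached by adding, deleting, or substituting exactly one phoneme.

```python
def is_neighbor(a, b):
    """Return True if b is one phoneme addition, deletion or substitution away from a."""
    if a == b or abs(len(a) - len(b)) > 1:
        return False
    if len(a) == len(b):
        # Substitution: exactly one position differs.
        return sum(x != y for x, y in zip(a, b)) == 1
    # Addition or deletion: removing one phoneme from the longer form
    # must yield the shorter form.
    short, long_ = sorted((a, b), key=len)
    return any(long_[:i] + long_[i + 1:] == short for i in range(len(long_)))


def neighborhood_density(target, lexicon):
    """Count the distinct forms in the lexicon that are neighbors of the target."""
    return sum(is_neighbor(target, form) for form in set(lexicon))


# Hypothetical toy lexicon (placeholder strings, not corpus data).
lexicon = [tuple("tabni"), tuple("jabni"), tuple("nabni"),
           tuple("tabnin"), tuple("jaksir")]
density = neighborhood_density(tuple("tabni"), lexicon)
```

In this toy lexicon most neighbors of the target are other inflected forms built on the same stem, which is why a count of this kind partly tracks how many forms of a verb occur in the lexicon.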
Thus, the four input-based predictor variables were: verb type frequency, token frequency, person-number-gender frequency (all calculated separately based on the EMALAC and GUMAR corpora) and PND (calculated based on EMALAC only, since the GUMAR data are not phonologically transcribed).
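As a rough illustration of the three frequency predictors, the sketch below expresses each count as a proportion of all words in a corpus. The annotation format, the helper name `proportions`, the toy tokens, and the corpus size are hypothetical; in particular, treating "type" as the verb lemma and "token" as the specific inflected form is an assumption about how the measures map onto the data.

```python
from collections import Counter

# Hypothetical annotated verb tokens: (lemma, inflected form, context).
verb_tokens = [
    ("build", "tabni", "3SG.F"),
    ("build", "tabni", "3SG.F"),
    ("build", "jabni", "3SG.M"),
    ("break", "jaksir", "3SG.M"),
]
total_words = 20  # hypothetical corpus size: all words, not only verbs


def proportions(keys, total):
    """Map each key to its count divided by the total number of words."""
    return {key: count / total for key, count in Counter(keys).items()}


verb_freq = proportions((lemma for lemma, _, _ in verb_tokens), total_words)
form_freq = proportions((form for _, form, _ in verb_tokens), total_words)
context_freq = proportions((ctx for _, _, ctx in verb_tokens), total_words)
```

The same function would be applied to each corpus separately, so that every predictor has one value per corpus.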
Design
The study used the elicited production paradigm developed by Engelmann et al. (2019), using a subset of the original testing materials available online at https://osf.io/uepz9/.
All of the stimuli were cross-checked with the EMALAC corpus. Out of the 32 verbs that were used for Polish in Engelmann et al. (2019), 14 were selected, based on their adaptability to the local context and EMALAC frequency, such that half were common (0.03–0.13%), and half fairly uncommon (0–0.003%) in the corpus. Almost all of the verbs were in the default form (Form I), with the exception of jəkwi ‘iron’, which is a Form VII verb.
Table 3 presents the verbs used in the study. All verbs were presented in seven person-number-gender combinations: 1SG, 2SG.F, 3SG.F, 3SG.M, 1PL, 2PL, and 3PL. 2nd person singular masculine (which is syncretic with 3SG.F) was excluded for practical reasons, as all experimenters were female, meaning that a pragmatically natural context could not have been created for the use of the inflection. 2nd and 3rd person plural feminine were excluded because of their very low frequency in modern spoken EA (0.0009% and 0.0076% in the EMALAC corpus, respectively).
Table 3. List of verbs used in the study. All verbs given in 3SG.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_tab3.png?pub-status=live)
We created eight pseudo-randomized lists, each containing 7 out of the 14 verbs. Each participant completed one list: that is, 7 verbs in all 7 person-number-gender combinations, amounting to 49 test items in total per participant.
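The list structure can be sketched as follows. The sampling scheme is an assumption, since the text does not state how verbs were allocated to lists (for instance, whether common and uncommon verbs were balanced within each list), and the verb labels are placeholders for the 14 verbs in Table 3.

```python
import random
from itertools import product

CONTEXTS = ["1SG", "2SG.F", "3SG.F", "3SG.M", "1PL", "2PL", "3PL"]
VERBS = [f"verb{i:02d}" for i in range(1, 15)]  # placeholder labels


def make_lists(verbs, n_lists=8, verbs_per_list=7, seed=0):
    """Build test lists: each crosses a sample of verbs with every context."""
    rng = random.Random(seed)  # fixed seed so the lists are reproducible
    lists = []
    for _ in range(n_lists):
        chosen = rng.sample(verbs, verbs_per_list)  # sample without replacement
        items = list(product(chosen, CONTEXTS))     # 7 verbs x 7 contexts
        rng.shuffle(items)                          # randomise presentation order
        lists.append(items)
    return lists


lists = make_lists(VERBS)
```

Each resulting list holds 49 (verb, context) pairs, matching the number of test items completed by one participant.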
Procedure
The experimental procedure was divided into three stages: training, practice and test. All three stages were completed in a single session, which lasted approximately 30–40 minutes. The first stage was intended to ensure that the participants were familiar with the target verbs. In this stage, the child was shown animations of all seven verbs, and the experimenter provided the gerund form of the verb (e.g., ‘Hatha jari’, This is running) and then elicited it back from the child. In the second stage, the test procedure was practiced, and the experimenter elicited all target person-number-gender combinations using animations of a verb not used in the test trial. This was to ensure that the child understood the task. In the test phase, the target verbs were presented in all person-number-gender combinations.
Figure 1 presents a still from one of the videos used in the study. The animations were presented on laptops, using Java-based Processing software (www.processing.org). For 1st and 2nd person contexts, the faces of the characters were digitally replaced with those of the child and the experimenter, to create a pragmatically valid context for the use of these inflections. Each context was thus presented in an animation with different actor/s:
- 1SG: child
- 2SG.F: experimenter (female)
- 3SG.M: male actor 1
- 3SG.F: female actor 1
- 1PL: child and experimenter
- 2PL: experimenter (female) and male actor 2
- 3PL: male actor 3 and female actor 2
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_fig1.png?pub-status=live)
Figure 1. A sample still from the animation for the verb ‘carry’.
After each animation, the child heard the pre-recorded pronoun (e.g., /hi/ she), and was asked to repeat it and complete it with the correct form of the verb (e.g., /hi tabni/ she builds). While the overt pronoun would not always be natural in spontaneous speech, it was nevertheless included to minimise the chances of the child misinterpreting the target person-number-gender context. This was compatible with the original procedure, as Finnish and Polish are also pro-drop languages.
Coding
The responses were recorded on-line on a scoring sheet by the experimenter, as well as audio-recorded using a mobile phone. We had to exclude 123 trials due to a child refusing to continue the experiment or technical problems. There were 2223 usable responses in total.
The responses were coded as correct, incorrect or unscorable. Unscorable responses were those in which the error could not easily be attributed to an issue with the person-number inflection. These included the child using an incorrect verb, incorrect aspectual form, or providing the English equivalent of the word. The rationale for counting these responses separately in the analysis is that it allows for the most conservative estimate of error rates, since only transparent inflectional errors (i.e., correct verb with incorrect agreement marking) are included in the count. The remaining responses were those which involved a correctly inflected form of the verb (correct) and those where the child used the correct verb in the correct aspectual form but with incorrect person-number-gender marking (incorrect).
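The three-way coding scheme can be summarised as a small decision rule. The sketch below is only an illustration of the logic described above: the function name `code_response` and its boolean arguments are hypothetical, and in the study the coding was carried out by the experimenters, not by software.

```python
def code_response(right_verb, right_aspect, right_agreement, in_english=False):
    """Return 'unscorable', 'correct' or 'incorrect' for a single response."""
    # Wrong verb, wrong aspectual form, or an English word: the error cannot
    # be attributed to person-number-gender marking, so it is set aside.
    if in_english or not right_verb or not right_aspect:
        return "unscorable"
    # Right verb in the right aspectual form: only agreement marking decides.
    return "correct" if right_agreement else "incorrect"


print(code_response(True, True, False))  # incorrect
print(code_response(False, True, True))  # unscorable
```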
Results and discussion
We first present descriptive analysis of response types and errors. We then report the results of the statistical analysis using a mixed-effects model, which investigated to what extent children's proficiency can be explained by input properties.
Response types
The proportions of the different response types (correct, incorrect and unscorable) were very close to those reported in Engelmann et al.'s (2019) study on Finnish and Polish. Unscorable responses constituted 34.5% (N = 767) of all data and most commonly involved the child using an incorrect verb (see Table 4). In contrast, errors involving inflectional affixes accounted for only 4% (N = 89) of the responses.
Table 4. Distribution of scorable and unscorable responses for Emirati Arabic (EA), Finnish (FI), and Polish (PL).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_tab4.png?pub-status=live)
Percentages for Finnish and Polish are taken from Engelmann et al. (Reference Engelmann, Granlund, Kolak, Szreder, Ambridge, Pine, Theakston and Lieven2019)
Table 4 presents the distribution of the different response types, together with the percentages from Finnish and Polish for comparison.
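As a quick arithmetic check, the percentages can be recomputed from the counts given in the text (2223 usable responses, 767 unscorable, 89 inflectional errors). The variable names are ours; the final figure uses scorable responses as the denominator, which is the basis for the error rate reported in the section on inflectional errors.

```python
total = 2223             # usable responses
unscorable = 767         # wrong verb, wrong aspectual form, English word
inflection_errors = 89   # right verb and aspect, wrong agreement marking

scorable = total - unscorable
pct_unscorable = 100 * unscorable / total
pct_errors_all = 100 * inflection_errors / total
pct_errors_scorable = 100 * inflection_errors / scorable

print(scorable)                        # 1456
print(round(pct_unscorable, 1))        # 34.5
print(round(pct_errors_all, 1))        # 4.0
print(round(pct_errors_scorable, 1))   # 6.1
```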
Previous research found that children were more likely to make errors of omission than commission. Although Arabic does not have infinitives per se, children are found to produce imperfective bare stems (Aljenaie, Reference Aljenaie2010) or imperative forms (Abdalla, Reference Abdalla2002; Qasem & Sircar, Reference Qasem and Sircar2017). As Table 4 shows, there were very few such errors in our data. Among the unscorable errors which involved the use of wrong aspectual form (N = 10), only four were substitutions of the imperative form, while the remaining six were of the perfective form. Therefore, the children in our study did not make many classic omission errors, and instead produced more errors of commission (all scorable errors).
As regards the unscorable responses involving the use of the wrong verb or an English equivalent, the difficulty in interpreting these errors is that it cannot be known to what extent they constitute an avoidance strategy, and to what extent they reflect issues with word retrieval, attention, or simply faulty experimental materials. However, it is worth noting that the number of unscorable responses for each verb did show a weak positive correlation with the accuracy rates based on scorable responses (r = 0.53, p < .05). That is, verbs which were often not attempted were also more likely to be produced incorrectly when they were attempted, suggesting that at least some proportion of unscorable responses might have been due to avoidance of challenging targets.
Table 5 presents the percentage of unscorable and accurate responses per verb. The unscorable responses are counted as a percentage of all responses, whereas the accuracy is calculated as a percentage of correct responses out of only the scorable responses. It is worth noting that there were two verbs (build and chase) which had rates of unscorable attempts that were very high (65% and 85%, respectively), but the scorable attempts showed 100% accuracy. In these cases, it is likely that the unscorable responses reflected genuine lack of familiarity with the verb (which was not remedied by training) or misinterpretation of the video.
Table 5. Types of responses by verb.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_tab5.png?pub-status=live)
There were small differences in the distribution of errors by age of participants. While age was not related to accuracy of scorable responses (r = 0.01, p > .01), it showed a weak but statistically significant positive correlation with absolute accuracy: that is, including unscorable responses (r = 0.32, p < .05).
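The two accuracy measures differ only in their denominator. The sketch below illustrates this with invented counts for two imaginary children; the numbers are hypothetical and are not taken from the data, and the function name `accuracy_rates` is an assumption.

```python
def accuracy_rates(correct, incorrect, unscorable):
    """Return (scorable accuracy, absolute accuracy) as proportions."""
    scorable = correct + incorrect
    total = scorable + unscorable
    # Scorable accuracy ignores unattempted targets; undefined if none attempted.
    scorable_acc = correct / scorable if scorable else None
    # Absolute accuracy counts every unscorable response as not correct.
    absolute_acc = correct / total
    return scorable_acc, absolute_acc


# Hypothetical counts out of 49 items each, for illustration only.
younger = accuracy_rates(correct=20, incorrect=1, unscorable=28)
older = accuracy_rates(correct=40, incorrect=3, unscorable=6)
print(younger)
print(older)
```

In this invented example the two children are almost indistinguishable on scorable accuracy, yet the younger child's absolute accuracy is roughly half that of the older child, because most of the younger child's responses were unscorable.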
Table 6 presents accuracy rates by age, with the participants grouped into age categories for ease of illustration. Younger children were more likely to not attempt the target, but when they did, they were no less accurate than their older peers. This further supports the conclusion that some proportion of unscorable errors were in fact an avoidance strategy, which masked lack of proficiency in the target inflections. We return to the differences in performance across the age groups in a later section.
Table 6. Accuracy rates by age group.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_tab6.png?pub-status=live)
Inflectional errors
To avoid speculation about the nature of children's errors in unscorable responses, all analyses in this section include only scorable responses: that is, responses in which the target verb was attempted (N = 1456). Since the lack of age effect on accuracy in scorable responses and the even distribution of errors among different age groups suggest that the children were at a similar level of proficiency in the use of the inflections, these analyses are performed on the entire cohort.
In order to understand the patterns of errors, we divided them by the target context and the substituted form, and then looked separately at children's proficiency in person, number and gender, and their markers (cf. Appendix 1 for the full list of errors.)
Errors were not distributed evenly across contexts: while the overall percentage of scorable errors was 6.1%, it ranged from 2.4% to 11.8% depending on the context, which appeared to be inversely related to context frequency. Furthermore, 73% of errors involved a substitution of a more frequent for a less frequent form.
Table 7 presents the percentage of errors in each person-number-gender combination, together with the context's frequency in the EMALAC and GUMAR corpora, ordered by % of errors. Singular contexts have higher frequency than plural contexts in both corpora, and this corresponds to generally lower error rates in singular than in plural contexts. 3SG.M and 3SG.F have high frequency in both corpora, while 1PL and 2PL are very infrequent, and this, too, is reflected in the inverse accuracy of these four contexts. In contrast, there are differences between the two corpora when it comes to the frequencies of 2SG.F and 3PL, such that these contexts are much more frequent in EMALAC than in GUMAR, while the opposite is true for 1SG. The sharp asymmetry between 1SG and 2SG in CDS and adult speech is cross-linguistically common (Ambridge et al., Reference Ambridge, Kidd, Rowland and Theakston2015), and confirms that the EMALAC corpus exhibits the properties typically associated with CDS. In our data, the children's error rates appear to reflect the pattern of CDS more closely than that of adult speech. Specifically, 2SG.F and 3PL have high accuracy, as predicted by their high frequency in EMALAC, despite the low frequencies in the adult corpus. Conversely, 1SG is the least frequent form in EMALAC, and the least accurate in the data, despite it being extremely common in GUMAR. These patterns were further confirmed by the statistical model, which we discuss in a later section.
Table 7. Errors by inflectional context.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_tab7.png?pub-status=live)
Person
Errors usually did not preserve the target person: only 30% of errors were person-congruent (e.g., 3PL for 3SG). Instead, the great majority of errors involved a substitution of 3rd person markings (plural or singular) for other forms. In fact, these constituted 85% of all errors, with 43% using 3PL as a substitute, and 21% each using 3SG.M and 3SG.F forms. One reason for this may be input frequency, as these forms are by far the most frequent in CDS (cf. Table 7).
Sixty percent of all person errors were due to a prefix error: that is, a substitution with a form which differed only by prefix. There were no cases of prefix omission. This is perhaps unsurprising, given that person is marked on the prefix in a mostly predictable manner, and all forms have prefixes.
Table 8 presents the percentage of all prefix errors. It is apparent that the errors tend to go both ways: that is, prefixes substitute for one another. However, there are differences in the direction of the errors, which appear to be attributable to the difference in frequency between the two forms. For example, 3PL substituting for 2PL is the single most common error, while the reverse error is only attested once in the data, which can be explained by the large difference in the frequency of the forms (cf. Table 7). A similar explanation can be applied to the substitutions between 1SG and 1PL. In general, the more frequent prefix most often substituted for the less frequent one.
Table 8. All errors due only to incorrect prefix.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_tab8.png?pub-status=live)
Gender
The only context tested in our study which distinguished between genders was 3SG. Based on previous studies, we expected the masculine form to show more accuracy than the feminine form, especially considering 3SG.F is also the only 3rd person form which takes the prefix /tə-/ instead of /jə-/. Indeed, errors were over twice as common in 3SG feminine than masculine targets (7.6% vs. 3.1%, respectively).
Table 9 presents all substitutions for both forms. Almost half (44%, N = 7) of the errors in 3SG.F targets were due to substitution with the masculine equivalent (i.e., /jə-/). However, the reverse pattern (3SG.F substituting for 3SG.M) was also found in 57% (N = 4) of 3SG.M errors. Since both forms are very frequent in the input and both were equally likely to be used as a substitution for other targets, it is likely that the slightly higher rate of masculine-for-feminine substitutions is primarily due to the feminine form's exceptional marking in the paradigm. The overall proportion of feminine to masculine forms outside of 3SG was not measured, as the gender among stimuli was not balanced (there were more feminine than masculine contexts). Finally, there was no effect of participant sex on the gender errors.
Table 9. Substitutions in 3rd person singular by gender.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_tab9.png?pub-status=live)
Number
Errors were almost twice as likely to affect plural targets (8.2%) as singular ones (4.6%), in line with the forms’ frequency in the input (cf. Table 7). However, children were equally likely to replace the plural target with another plural form (e.g., 3PL for 1PL) as they were to replace it with a singular form (51% vs. 49% of all plural errors, respectively), but were slightly more likely to substitute singular forms for other singular forms (63% of all singular errors).
Number (and sometimes gender) is encoded in the suffix. The suffix is different from the prefix in that it is not present in all forms. Rather, it is null for most singular forms (except 2SG.F /-i:n/) and present in most plural forms (except 1PL). In our study, there were only three contexts which called for the use of a suffix: 2SG.F, 2PL and 3PL. However, despite the relative opacity of the suffix, errors could not be easily attributed to problems with only the suffix, whether in instances of commission or omission.
Table 10 presents all errors that involved the child producing the wrong suffix, or adding a suffix where the target form had none. For each error, the ‘Target’ column presents the correct adult affix, and the ‘Substitution’ column presents the child's response. In contrast with the pattern of prefix errors, there were only two errors of commission that affected only the provision of an incorrect suffix, and three further instances of an error in both prefix and suffix.
Table 10. Errors of suffix commission.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_tab10.png?pub-status=live)
Table 11 presents the number of suffix omissions. As can be seen from the table, six of these errors were due to suffix omission without changing the prefix, four of which involved the substitution of 2SG.F for 2PL. Given that 2PL was generally the least accurate and least frequent context, it is unlikely that many errors could be attributed solely to problems with suffixation.
Table 11. Errors of suffix omission.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_tab11.png?pub-status=live)
Overall, the children's errors were perhaps qualitatively best characterised as reverting to 3rd person affixes – both singular and plural, masculine and feminine – as default forms. This can be attributed to both the very high relative frequency of these forms in the input, and to the pragmatic context. It could be argued that when describing the actions of a character in the video, using 3rd person is always to some extent justified. However, seeing as children did not generally maintain number in these cases, the pragmatic context cannot explain all of the errors. Rather, the input frequency provides a more reliable explanation, and indeed 73% of all errors involved a substitution of a higher-frequency for a lower-frequency form.
Statistical analysis of input-based and age predictors
In order to analyze the extent to which the characteristics of the input may affect children's proficiency in producing verbal morphology, we used Bayesian statistics. Bayesian models have several advantages over traditional approaches. Most importantly, Bayesian inference specifies the probability of an effect being true given the data. This is different from the traditional null hypothesis significance testing (NHST) approach, where the p-value specifies the probability of an effect being at least as large as observed, assuming that the null hypothesis is true. A challenge with the NHST approach is that non-significant results cannot be interpreted (see e.g., Dienes, Reference Dienes2011, Reference Dienes2014); the results cannot tell whether there is indeed no effect, or whether there is simply not enough data.
In our analysis, we fitted generalised linear mixed-effects models (GLMMs) in R (R Core Team, 2019), using the rstan package (Stan Development Team, 2020) with the rstanarm extension (Gabry & Goodrich, Reference Gabry and Goodrich2016; see also Nicenboim & Vasishth, Reference Nicenboim and Vasishth2016). Because the scorable responses are binary (correct/incorrect), we implemented the models with a binomial response distribution. We used the models to calculate credible intervals. Unlike confidence intervals, which are often misunderstood (see Morey et al., Reference Morey, Hoekstra, Rouder, Lee and Wagenmakers2016), credible intervals provide the range within which the true effect lies with a certain probability, given the data.
We fitted models with accuracy (0, 1) as the dependent variable, and with Age (in months), VerbFreq (verb type frequency), TokenFreq (token frequency), PNFreq (person-number-gender frequency) and PND (phonological neighborhood density), together with the interactions of Age and TokenFreq, Age and PND, and PND and TokenFreq, as fixed factors. All variables were scaled and centered. In addition, we included random intercepts for Participant and Verb, and by-participant slopes for Verb.
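Scaling and centering a predictor amounts to subtracting its mean and dividing by its standard deviation. The following sketch shows one common way of doing this; it assumes the sample standard deviation (the default of R's `scale()`), which the text does not specify, and the ages are invented for illustration.

```python
from statistics import mean, stdev


def scale_and_center(values):
    """Return z-scores: (value - mean) / sample standard deviation."""
    m = mean(values)
    sd = stdev(values)  # assumption: n - 1 denominator, as in R's scale()
    return [(v - m) / sd for v in values]


# Hypothetical ages in months, for illustration only.
ages = [36, 42, 48, 54, 60]
z_ages = scale_and_center(ages)
print([round(z, 3) for z in z_ages])
```

After this transformation each predictor has a mean of zero and a standard deviation of one, so the estimated coefficients are expressed on a common scale.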
We used weakly informative priors for fixed and random effects, which means that we did not make any strong assumptions about the data. Specifically, the priors for the intercept and slope were a Student t-distribution with two degrees of freedom and a mean of zero. For the random effects correlation matrix, a so-called LKJ prior was used (see Sorensen, Hohenstein & Vasishth, Reference Sorensen, Hohenstein and Vasishth2016, for a tutorial). We report the mean estimate β, the lower and upper limits of the 95% credible interval, and the probability P of the effect being smaller than zero (for negative estimates) or greater than zero (for positive estimates). Unlike with p-values, there is no binary decision threshold in the Bayesian approach to accept an effect as true. We use the following principles for interpretation:
• If the 95% credible interval does not span zero, we interpret this as strong evidence for an effect.
• If P is smaller than .95, we conclude there is no evidence for an effect.
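The reported quantities can all be derived from a set of posterior draws for a coefficient. The sketch below is an illustration under stated assumptions, not the authors' code: the draws are simulated from a normal distribution instead of being taken from a fitted model, the function names are ours, and the third label returned by `interpret` is our reading of the intermediate case (P of at least .95 with an interval that spans zero), which the two principles above do not name explicitly.

```python
import random
from statistics import mean, quantiles


def summarise_posterior(draws):
    """Mean, 95% credible interval, and P(effect has the sign of the mean)."""
    est = mean(draws)
    cuts = quantiles(draws, n=40, method="inclusive")  # cut points every 2.5%
    lower, upper = cuts[0], cuts[-1]                   # 2.5th and 97.5th percentiles
    if est < 0:
        p_dir = sum(d < 0 for d in draws) / len(draws)
    else:
        p_dir = sum(d > 0 for d in draws) / len(draws)
    return est, lower, upper, p_dir


def interpret(lower, upper, p_dir):
    """Apply the interpretation principles to one coefficient."""
    if lower > 0 or upper < 0:
        return "strong evidence for an effect"
    if p_dir < 0.95:
        return "no evidence for an effect"
    return "high probability of an effect, but interval spans zero"


# Simulated draws standing in for a real posterior sample (assumption).
rng = random.Random(1)
draws = [rng.gauss(0.5, 0.2) for _ in range(4000)]
est, lower, upper, p_dir = summarise_posterior(draws)
print(interpret(lower, upper, p_dir))
```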
We conducted the analyses separately on EMALAC and GUMAR frequencies, and separately for all responses and scorable responses only.
In the model based on EMALAC frequencies and including only scorable responses (Table 12), we found strong evidence for an effect only for person-number-gender frequency (PNFreq). There was no evidence for effects of Age, VerbFreq, TokenFreq or PND, but there was an interaction between TokenFreq and Age in the predicted direction: TokenFreq played less of a role for older than for younger children.
Table 12. Summary of the results of the EMALAC-based Bayesian linear mixed effects model using scorable responses only.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_tab12.png?pub-status=live)
To illustrate this effect, we again divided the children into age groups (see Table 6) and split the tokens into high- and low-frequency items using a mean split.
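A mean split simply labels each item according to whether its value lies above or below the mean of the variable. The sketch below illustrates this with invented centred token frequencies; the values and the function name `mean_split` are hypothetical.

```python
from statistics import mean


def mean_split(centred_freqs):
    """Label each value 'high' if it exceeds the mean, otherwise 'low'."""
    cutoff = mean(centred_freqs)  # close to zero for a centred variable
    return ["high" if f > cutoff else "low" for f in centred_freqs]


# Hypothetical centred token frequencies, for illustration only.
example = [-1.2, -0.4, -0.1, 0.3, 1.4]
print(mean_split(example))  # ['low', 'low', 'low', 'high', 'high']
```

Unlike a median split, a mean split does not guarantee groups of equal size, which is worth bearing in mind when frequency distributions are skewed.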
Figure 2 illustrates the accuracy rates by age group and centred token frequency, split into high- and low-frequency categories (mean-split), if only scorable responses are included. Counterintuitively, the oldest children are the least accurate, especially on higher-frequency tokens. However, the predicted effect of improvement with age becomes evident when all responses (i.e., both scorable and unscorable ones) are considered (N = 2223). In this model (Table 13), there is the same effect of PNFreq and the same interaction between Age and TokenFreq: younger children are much more accurate with high-frequency tokens compared to low-frequency tokens, whereas older children do not show this difference. In addition, the impact of age becomes more prominent, in that the model suggests a high probability of an effect, although the 95% credible interval does marginally span zero.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_fig2.png?pub-status=live)
Figure 2. Bar chart illustrating the interaction between Age and TokenFreq for different ages, using only scorable responses. Note that for illustration purposes, the centred token frequency variable has been dichotomised into high and low frequency tokens (mean split), and age is shown in three groups rather than as a continuous, centred variable.
Table 13. Summary of the results of the EMALAC-based Bayesian linear mixed effects model using all responses.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_tab13.png?pub-status=live)
Figure 3 presents the percentage of accurate, scorable responses out of all responses (both scorable and unscorable). It illustrates children's performance by age group and token frequency (as a binary variable, dividing all centred token frequencies into high or low using a mean-split) when unscorable responses are included. It is apparent that younger children were overall less accurate, and specifically less accurate on low-frequency tokens, but that when faced with such tokens they were more likely to not attempt the target at all and hence produce an unscorable response. The absence of an age effect and the poorer performance of older children on high-frequency tokens in the scorable-only model suggest that including only scorable responses may have, in fact, considerably favored younger children. In other words, the younger children produced fewer scorable responses, but those they produced were mostly accurate.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_fig3.png?pub-status=live)
Figure 3. Bar chart illustrating the interaction between age and token frequency for different ages, using scorable and unscorable responses.
Importantly, there was strong evidence for an effect of PNFreq in both models, further confirming that producing unscorable responses was related to proficiency in agreement markings.
Figure 4 presents performance by PN context by age, including unscorable responses. A comparison of the model based on all responses and the model based on only scorable responses suggests that excluding unscorable errors may not provide the intended benefits of revealing any additional information about the children's proficiency, as the effects of PNFreq are robust and evident regardless of which responses are counted. Instead, excluding those errors may obscure some age-related differences in performance, by significantly overestimating younger children's proficiency. Nonetheless, even in the all-response model, the main effect of Age was only a tendency, and the Age:TokenFreq interaction was more important.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_fig4.png?pub-status=live)
Figure 4. Accuracy by inflectional context and age group, based on scorable and unscorable responses.
Figure 5 presents the 95% credible intervals in the all-response EMALAC model. The main effect of PNFreq and the Age:TokenFreq interaction have intervals that do not span zero and P > .95. Although the interval for Age does span zero, it does so only marginally, suggesting that a tendency for a main effect of Age is likely when all responses are included, which confirms the trends found in the descriptive analysis.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_fig5.png?pub-status=live)
Figure 5. 95% credible intervals for the all-response EMALAC model.
For comparison, we repeated the analyses based on the adult corpus, again separately using only the scorable vs. all responses (Tables 14 and 15). In both the scorable-only model and the all-response model, all of the credible intervals spanned zero – in other words, none of the predictor variables had an unequivocal (positive or negative) effect on accuracy. However, in the all-responses model, there was again a high probability of an effect of both PNFreq and Age, with higher PNFreq and higher ages improving accuracy. The GUMAR all-responses model is thus more in line with the EMALAC models than the GUMAR scorable-only model.
Table 14. Summary of the results of the GUMAR-based Bayesian linear mixed effects model using scorable responses only.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_tab14.png?pub-status=live)
Table 15. Summary of the results of the GUMAR-based Bayesian linear mixed effects model using all responses.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_tab15.png?pub-status=live)
Figure 6 presents the 95% credible intervals in the GUMAR all-response model. The Age and PNFreq effects both have P > .95, and although the intervals marginally span zero, they mirror the results of the all-response EMALAC model (cf. Fig. 5).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_fig6.png?pub-status=live)
Figure 6. 95% credible intervals in the GUMAR all-response model.
The difference in the results between the EMALAC and the GUMAR models is most probably related to the different nature of the two corpora. This includes a slightly different distribution of PN frequencies (cf. Table 7), which are in line with commonly reported differences between child- and adult-directed speech, as well as different type and token frequencies. It is likely that the GUMAR corpus, as a collection of adult written (albeit casual) language, may be less representative of the language children are typically exposed to than the EMALAC corpus. However, the two are not entirely divergent, as is apparent from the fact that the all-response GUMAR model does closely mirror the effects found in the EMALAC models.
Overall, across the four models, we found robust evidence for effects of inflectional context frequency and age:token frequency interaction, and a marginal effect of age. In other words, while we did not find an absolute improvement with age, we found an improvement on low-frequency tokens with age. These results suggest that at the age that was tested in our study, all children were already familiar with the paradigm, and thus when they attempted the targets, they were highly accurate. However, their proficiency in the use of the affixes was still sensitive to the inflectional context's frequency, and token frequency – although only at the younger ages.
General discussion
We hypothesised that error rates would significantly vary across contexts, that the variability would be attributable to the contexts’ input frequency, and that it would not show effects of verb identity. The first hypothesis was confirmed: although the general error rate was low at 6%, it was almost double that for the least accurate person-number-gender context (2nd person plural, 11.8%), and less than half for the most accurate one (2nd person singular, 2.4%). This is in line with previous experimental findings (Aguado-Orea & Pine, Reference Aguado-Orea and Pine2015; Basaffar & Safi, Reference Basaffar and Safi2012; Engelmann et al., Reference Engelmann, Granlund, Kolak, Szreder, Ambridge, Pine, Theakston and Lieven2019; Räsänen et al., Reference Räsänen, Ambridge and Pine2016), which show that high overall accuracy often conceals unevenly distributed patterns of errors. The distribution of errors across the contexts was also compatible with previous studies on Arabic and other languages, which showed that 3rd person, singular, and masculine inflections show higher rates of accuracy than plural, feminine, 1st and 2nd person inflections (Abdou & Abdou, Reference Abdou and Abdou1986 as reported in Basaffar & Safi, Reference Basaffar and Safi2012; Aljenaie, Reference Aljenaie2010). The inflectional contexts previously shown to be acquired earlier showed higher accuracy in our data and had an effect on the pattern of substitutions. This was true especially of 3rd person forms, which substituted for other forms in 85% of the errors.
Our second hypothesis was that the uneven distribution of errors could be explained by the inflectional context frequency and this was also confirmed: the statistical analysis of the data revealed robust evidence for a main effect of context frequency on response accuracy, such that the contexts with the highest frequency were the most accurate. The effect held whether all or only scorable responses were included, and was also visible in the all-response model using the adult corpus frequencies, confirming that it was not an artifact of the input data selected. This result is in line with the findings of Engelmann et al. (Reference Engelmann, Granlund, Kolak, Szreder, Ambridge, Pine, Theakston and Lieven2019), and suggests that children's proficiency in verb inflections is related to their exposure to those inflections. In other words, children are better able to provide the affixes they hear most often.
Alternative explanations sometimes offered in the literature for the differences in the order of acquisition and accuracy across different contexts postulate that there are certain linguistic properties inherent in the inflections that determine the order in which they are acquired. Most of these accounts are rooted in the theory of linguistic markedness (Greenberg, Reference Greenberg1966; Jakobson, Reference Jakobson and Greenberg1963), according to which some inflectional contexts are more marked than others, and hence acquired later than the less marked ones. For example, the markedness hierarchy predicts that 3rd person contexts will be acquired before 1st person contexts due to the former being less marked than the latter (Ackema & Neeleman, Reference Ackema and Neeleman2018; Hanson, Reference Hanson2000). One of the arguments often used to provide support to these privative feature systems against competing input-frequency based accounts is the late acquisition of 2nd singular pronouns and agreement markers despite their relatively high frequency in child-directed speech (see, for example, Laakso & Smith (Reference Laakso and Smith2007), where the 2nd singular pronoun is shown to be the most frequent subject in child-directed speech). Despite this high frequency in the input, 2nd person singular pronouns and agreement markers seem to be acquired late in many languages (Ackema & Neeleman, Reference Ackema and Neeleman2018). What these accounts fail to predict are cases where high-frequency 2nd person pronominal agreement in the input corresponds to relatively accurate use of such agreement by children at early stages. In our study, frequency seems to make slightly better predictions with respect to 2nd person agreement development. The 2SG.F context exhibited the smallest number of errors (2.4%, equal to that of 3PL), which inversely corresponds to the second highest frequency in the input, as measured in the EMALAC corpus. 
This result cannot readily be explained by feature-geometric/privative feature bundle accounts of agreement development, which take 2nd singular and feminine agreements to be ‘marked’.
As Ambridge et al. (Reference Ambridge, Kidd, Rowland and Theakston2015) note, in the case of low accuracy of 2nd person pronouns and agreement in child data from other languages, longitudinal studies cannot distinguish between failure to learn specific forms despite their high frequency and usage choices in which children “have learned these forms, but find little use for them (e.g., young children are not interested in talking about what their listener is doing)” (p. 245). An interpretation based on pragmatic rather than grammatical factors may be better able to account for cross-linguistic differences in the acquisition of 2SG. Future studies may determine whether the cultural context in Emirati Arabic could have an effect on the increased use of 2SG by children, as compared to other languages studied.
Finally, it should be noted that while 2SG.F was produced with the highest accuracy, it was nonetheless very rarely used as a substitution for other forms. We propose that this further strengthens the argument for pragmatic factors playing a role in children's performance. That is, while children know the 2SG.F affixes well, they also understand that these affixes are not appropriate when describing an animation (which is naturally a 3rd person context). This explanation would account for the fact that even though 3SG.F and 3PL were not as accurately produced as 2SG.F, they were at the same time more likely to be used as replacements for other forms. In the context of describing the actions of characters on the screen, if the child is not able to provide the expected person-number-gender affixes, 3rd person affixes are the next most pragmatically justifiable solution. In contrast, it is possible that the higher number of 2nd person substitutions in naturalistic studies arises because 2nd person forms are a more pragmatically justified response in a one-to-one conversation between the child and the experimenter.
Our third hypothesis was that children would be able to generalise the acquired affixes to all verbs early on, since the high availability, reliability, and phonological saliency of EA Imperfective inflections would not require them to rely on item-by-item analogies. Contrary to previous findings in Polish and Finnish, this hypothesis was partially confirmed, as there was no main effect of type frequency, token frequency, or phonological neighborhood density on accuracy in our data. Children performed equally well with high- and low-frequency verb types, with dense as well as sparse phonological neighborhoods. Nonetheless, contrary to our predictions, there was also a strong interaction between age and token frequency – in scorable and unscorable responses, using CDS and adult corpus frequencies. Furthermore, the effect of age was marginal and was only observable when unscorable responses were considered. While unscorable responses cannot be easily interpreted, there is strong evidence in our data that at least some of them are avoidance errors – their prevalence in the data from younger participants and an inverse correlation between accurate and unscorable responses within each verb suggest that these errors are at least partially related to lower proficiency in the inflections.
These results suggest that all the children already knew the paradigm fairly well, as evident from their high accuracy in scorable responses, but the younger children were more likely to avoid attempting the target and less able to apply their knowledge to low-frequency tokens. This effect was hypothesised by Engelmann et al. (Reference Engelmann, Granlund, Kolak, Szreder, Ambridge, Pine, Theakston and Lieven2019) but not found in their Polish and Finnish data. It implies that even at a fairly high level of proficiency, in a language which provides a highly regular and accessible paradigm, children do not fully abstract the paradigm away from individual stored exemplars early on.
On the one hand, such long-lasting effects of token frequency are incompatible with maturational approaches, which propose that grammatical structures become fully productive as soon as they are learnt (Radford, Reference Radford1990; Wexler, Reference Wexler1998). Instead, they appear to confirm the predictions made by usage-based approaches which rely on exemplar learning (e.g., Bybee, Reference Bybee2010; Tomasello, Reference Tomasello2003). According to these approaches, even when children eventually generalise the grammatical patterns away from strictly item-based constructions, grammatical accuracy is expected to exhibit strong and long-lasting effects of frequency due to the process of analogy being conducted on-line, over all stored exemplars. On the other hand, the interaction of token frequency and age with a simultaneous lack of a main effect of token frequency does suggest that token frequency plays a smaller role in the acquisition of EA than in Polish and Finnish (where a main effect of token frequency is observed but no interaction with age). Indeed, token frequency does not have an effect for the oldest children in our study. Overall, since the token frequency:age interaction had mostly been hypothesised but not found in previous studies, and since this is the first study of the acquisition of agreement in EA, further research will be needed to investigate the reliability and the nature of this interaction. Nonetheless, it is possible that it indicates that the phonological saliency, reliability and regularity of the paradigm lead to earlier generalisations than in other languages, but later than predicted by maturational approaches.
As regards the effects of PND, while our results cannot illuminate the earliest stages of development, they do suggest that at least by age three children acquiring EA do not rely on phonological similarity in analogising across forms. We propose that this is due to the properties of the system – unlike in Polish, where phonological similarity determines the inflectional class and therefore the affix that needs to be used, the uniformity of affixes across the EA paradigm does not require reliance on the phonological form of the verb in determining the inflection. Furthermore, the high phonological saliency of the affixes, including the reliable use of syllabic prefixes and less reliable but longer suffixes (i.e., including a long vowel) could explain why EA inflections are learnt earlier than those from even more regular languages, such as Finnish, and why they are learnt independently from the phonological form of the verb. Our results demonstrate that children rarely omit either prefixes or suffixes: there were only four instances of children substituting an imperative form, only two cases of errors explainable entirely as suffix omission, and none of prefix omission. Further research is also needed to clarify whether an effect of phonological neighborhood density may be found if a different measure of similarity is used. It is possible that, due to Arabic's templatic morphology, analogies are based on units other than whole word forms, such as consonantal roots.
Our findings further illuminate how data collection and analysis methods can affect the conclusions we draw about child language proficiency, and therefore highlight the need for complementing naturalistic observations with experimental data. Experimental studies consistently reveal lower proficiency levels than the results obtained from raw production studies. In other words, when children do not self-select the targets they attempt, they make more errors, which further strengthens the argument for conducting experimental (rather than observational) studies in the acquisition of inflection. Furthermore, our data suggest that focusing exclusively on ‘pure’ inflectional errors may significantly overestimate the proficiency levels of younger children, while not necessarily providing additional insights into the learning process. The patterns of unscorable responses in our study strongly suggest that these ‘unscorable’ errors of omission or avoidance often do, in fact, reflect issues with proficiency in the target structures.
Conclusion
Since languages employ a range of grammatical forms and structures which present distinctive challenges, comparing cross-linguistic developmental paths can help us determine to what extent the learning process is shaped by the input. Our results from the acquisition of Emirati Arabic confirm previously reported effects of input frequency on the acquisition of verb inflection, while also suggesting that children are able to generalise learnt forms across types and tokens when the system being acquired is regular and easily accessible. While frequency is unquestionably important, high phonological saliency and regularity of the system may eventually lead to across-the-board generalisations which are no longer susceptible to frequency effects, although this was only true for our oldest participants. The results are compatible with an approach to language development which emphasises the properties of the input as factors determining the trajectory of the learning process, but does not preclude the child's ability to eventually fully generalise grammatical processes, although such generalisations may take a considerable time to develop.
Appendix
The raw numbers of all errors by target and substituted form.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220704031256956-0569:S0305000921000155:S0305000921000155_tab16.png?pub-status=live)