
Starting Big: The Effect of Unit Size on Language Learning in Children and Adults

Published online by Cambridge University Press:  29 June 2020

Naomi HAVRON*
Affiliation:
Hebrew University of Jerusalem, Israel; Department of Psychology, ENS, EHESS, CNRS, PSL University, France
Inbal ARNON
Affiliation:
Hebrew University of Jerusalem, Israel
*Address for correspondence: Naomi Havron, 29 rue d'Ulm, Paris, France. Email: naomi.havron@mail.huji.ac.il

Abstract

Multiword units play an important role in language learning and use. It has been proposed that learning from such units can facilitate mastery of certain grammatical relations, and that children and adults differ in their use of multiword units during learning, contributing to their different language-learning trajectories. Consistent with this proposal, adults learn gender agreement better when encouraged to learn from multiword units. Previous work has not examined two core predictions of this proposal: (1) that children also benefit from initial exposure to multiword units, and (2) that their learning patterns reflect a greater reliance on multiword units than adults' patterns do. We test both predictions using an artificial language. As predicted, both children and adults benefit from early exposure to multiword units. In addition, when exposed to unsegmented input, adults show better learning of nouns than of article-noun pairings, but children do not, a pattern consistent with adults' predicted tendency to focus less on multiword units.

Type
Article
Copyright
Copyright © The Author(s), 2020. Published by Cambridge University Press

Introduction

While words are often seen as the basic building blocks of language (e.g., Pinker, 1991), there is recent theoretical interest in, and empirical support for, the idea that multiword units also play an important role in language learning and use (Abbot-Smith & Tomasello, 2006; Arnon, 2010; Arnon & Christiansen, 2014; Bannard & Matthews, 2008; Croft, 2001; Goldberg, 2006; Lieven, Salomo & Tomasello, 2009; Wray, 1999). This insight is shared by construction-based approaches to language structure (e.g., Croft, 2001; Goldberg, 2006), usage-based approaches to language learning (e.g., Tomasello, 2003), and emergentist approaches to language processing and representation (e.g., Elman, 2009; McClelland, 2010). Under all of these accounts, multiword sequences are integral building blocks of language. We use the term multiword unit here to refer to sequences larger than a single lexical word. Using the term ‘unit’ for such sequences does not mean that they are stored holistically, but that there is a representation of both the larger sequence and the individual words (see Arnon & Cohen-Priva, 2014; Baayen, Hendrix & Ramscar, 2013, for two different possible implementations).

Such units are seen as important building blocks for learning grammatical relations and are predicted to impact first language learning (Abbot-Smith & Tomasello, 2006). Supporting this prediction, multiword information affects both children's correct and their erroneous early productions (Kirjavainen, Theakston & Lieven, 2009; Lieven et al., 2009). For instance, young children are more accurate at repeating higher-frequency four-word phrases (Bannard & Matthews, 2008), and slightly older children make fewer over-regularization errors in frequently encountered multiword phrases (Arnon & Clark, 2011). Moreover, like single words, early-acquired phrases show age-of-acquisition effects: when presented with phrases that are equally frequent in adult speech, adults respond faster to early-acquired phrases than to later-acquired ones (e.g., take them off, acquired earlier, vs. take time off, acquired later; Arnon, McCauley & Christiansen, 2017). This suggests that, like words, multiword units serve as building blocks in language learning, with early-acquired units showing privileged processing in adults.

Recent work relates the use of multiword units in learning to the differences between children and adults in learning language. The claim is that multiword units can facilitate learning of certain grammatical relations, and that children and adults differ in their use of such units in the learning process (Arnon, 2010; Arnon & Ramscar, 2012; Christiansen & Arnon, 2017). In brief, the proposal is that learning from multiword units can improve mastery of certain semantically opaque grammatical relations that hold between consecutive words (such as gender agreement, verb-preposition pairings, and collocations) by strengthening the association between the grammatical element and the word it modifies. For example, learning article-noun gender agreement in French should be easier when one first associates the entire article-noun sequence with an object (e.g., la-balle = ball) and only then learns the meaning and role of the individual words. Conversely, when learners first acquire vocabulary items (balle = ball) and only then learn which articles they appear with, the association between the article and the noun will be weaker (see Arnon & Ramscar, 2012, for a formal learning model demonstrating this). Multiword units can be formed either via under-segmentation (where a multiword sequence is initially not segmented) or via chunking (where words that often co-occur are also represented as a larger unit). Importantly, adults are also predicted to represent both multiword sequences and single words, with multiple factors (e.g., frequency, meaning, age of acquisition) affecting the degree to which a sequence is processed as a whole unit (Christiansen & Arnon, 2017).
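The order-of-acquisition argument can be illustrated with a toy error-driven learner. The sketch below uses the Rescorla-Wagner update rule, in which all cues present on a trial share one prediction error, and treats the object (the ball) as the outcome to be predicted. This is an illustrative assumption, not a reproduction of the model in Arnon and Ramscar (2012); the cue names, learning rate, and trial counts are hypothetical. When the noun is learned alone first, it absorbs most of the predictive value and the article is later largely blocked; when the sequence comes first, the two cues share it.

```python
def rescorla_wagner(trials, rate=0.1, max_assoc=1.0):
    """Toy Rescorla-Wagner learner.

    Each trial is a set of cue names. The outcome (the object) is present on
    every trial, and all cues on a trial share a single prediction error.
    Cues absent from a trial are not updated.
    """
    weights = {}
    for cues in trials:
        prediction = sum(weights.get(c, 0.0) for c in cues)
        error = max_assoc - prediction
        for c in cues:
            weights[c] = weights.get(c, 0.0) + rate * error
    return weights


# Hypothetical schedules: equal total exposure, differing only in order.
NOUN, SEQUENCE = {"balle"}, {"la", "balle"}
noun_first = rescorla_wagner([NOUN] * 20 + [SEQUENCE] * 20)
sequence_first = rescorla_wagner([SEQUENCE] * 20 + [NOUN] * 20)

print(f"article weight, noun first:     {noun_first['la']:.3f}")
print(f"article weight, sequence first: {sequence_first['la']:.3f}")
```

With these settings the article's association ends up about eight times stronger in the sequence-first schedule (roughly 0.49 vs. 0.06), even though both schedules contain the same trials.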

This proposal relates the facilitative role of multiword units in learning to adults' well-documented difficulty with mastering certain aspects of a second language. Adults' prior linguistic knowledge, in particular their knowledge of words and their experience with seeing words in written form, can reduce their reliance on multiword units in the learning process. This can have detrimental consequences for learning consistent grammatical relations (like gender agreement) that adults are known to struggle with (Nesselhauf, 2003; Siyanova & Schmitt, 2007; Wray, 1999). The idea that adults rely less on multiword units, and that such units can facilitate learning, is supported by several lines of research. Literate adults show increased attention to words as units of processing compared to illiterate adults (Havron & Arnon, 2017a), a pattern that is also found when comparing pre-literate and literate children (Havron & Arnon, 2017b). Moreover, not knowing how to read is associated with better learning of agreement patterns than of vocabulary items in an artificial language, consistent with a greater focus on multiword units (Havron, Raviv & Arnon, 2018). That is, experience with written words (in orthographies where words are separated by spaces), which adult learners have, seems to lead to increased reliance on single-word units.

In addition, there is growing evidence that unit size affects learning outcomes, with early exposure to multiword units facilitating adult learning of certain grammatical relations. A series of artificial language learning studies showed that adult learning of article-noun gender agreement can be facilitated when adults are first exposed to larger units (Arnon & Ramscar, 2012; Siegelman & Arnon, 2015). The domain of grammatical gender was chosen because gender agreement is notoriously difficult for adults to learn (Dewaele & Véronique, 2001; Holmes & de la Bâtie, 1999; Scherag, Demuth, Rösler, Neville & Röder, 2004) but is mastered by children with relative ease (Bassano, Maillochon & Mottet, 2008; Karmiloff-Smith, 1979; Slobin, 1985). Psycholinguistic findings further suggest that native speakers treat the article and the noun as a more cohesive unit than L2 learners do (Carroll, 1989; Chevrot, Dugua & Fayol, 2009; MacWhinney, 1978), and are better able to use the information on the article predictively (e.g., Lew-Williams & Fernald, 2007, 2010). If part of the difference between child and adult learners is related to the different building blocks that they use, then manipulating adults' input in a way that directs them to multiword units (promoting more “child-like” learning) should improve learning outcomes.

In line with this prediction, early exposure to multiword units has been shown to improve adult learners' mastery of grammatical gender agreement. Adults learned article-noun pairings in an artificial language better when exposed first to multiword utterances and only then to noun labels than in the reverse order (Arnon & Ramscar, 2012). A second study refined the experimental manipulation by presenting participants first with either unsegmented sentences (without pauses between words) or segmented sentences (with pauses), and added a more direct assessment of the units that learners were extracting (Siegelman & Arnon, 2015). Early exposure to unsegmented utterances, where participants had to learn segmentation and structure simultaneously, was beneficial for learning article-noun pairings. At the individual level, participants who treated the article-noun sequence as less segmented showed better learning. A similar advantage was found in adult English speakers learning classifier-noun associations in Chinese: outcomes were better when participants were exposed first to multiword sequences (Paul & Grüter, 2016).

The findings so far suggest that learning from multiword units can be beneficial, but they are based only on adult learners, leaving open the question of whether children also benefit from early exposure to multiword units. If such units play a facilitative role in learning more generally, we should see similar benefits in children. Only one previous study examined children's learning of a similar artificial language (Havron et al., 2018), but its focus was on the impact of literacy on learning, so it used only unsegmented input and did not compare learning from segmented and unsegmented input. That is, previous work did not test the prediction that children's learning will also be facilitated in the unsegmented condition compared to the segmented condition. That study found an effect of literacy on learning: pre-literate children showed better learning of the article-noun pairings than of the noun labels, and literate children showed a reduction of this “agreement advantage”. This pattern of results, the relatively greater success of pre-literate children on article-noun trials, stands in contrast with results from adults. Adults consistently showed a noun advantage in previous studies: they were better overall at learning the noun labels than the agreement patterns, even in the unsegmented condition (Arnon & Ramscar, 2012; Havron et al., 2018; Paul & Grüter, 2016; Siegelman & Arnon, 2015). The difference in the noun advantage between children and adults is consistent with studies showing that adults have less difficulty learning vocabulary items than grammatical relations in a second language (MacWhinney, 2005; Paul & Grüter, 2016), and may reflect a difference in their reliance on multiword units.
On this interpretation, literate children, like adults, focus more on single words (the noun labels) even in the unsegmented condition, leading to a reduced agreement advantage, while pre-literate children in the same input condition rely more on multiword units, leading to better learning of the article-noun pairings than of the nouns. These findings suggest that relative success at learning the article-noun pairings and the vocabulary items can capture developmental changes in the relative impact of words vs. multiword units in learning. If this is true, we should see differences between children and adults in the relative learning of the two. In particular, we expect that when exposed to unsegmented input, where learners can impose their own segmentation strategies, adults will show a noun advantage while children will not. This prediction has not been directly tested, since child and adult learning have not previously been compared.

The current study thus has two goals. The first is to ask whether children also show better learning of grammatical gender when exposed to unsegmented input than when exposed to segmented input, as has been found with adults in previous work. This is a central assumption of the model put forward by Arnon (2010) that has not yet been empirically tested. We do not predict a difference between children and adults in this respect: for both groups, learning of the article-noun pairings (but not of the nouns) should be better in the unsegmented condition than in the segmented condition. The second goal is to test whether children, unlike adults, fail to show a noun advantage when exposed to unsegmented input. Such a finding would parallel naturalistic findings in L2 learning and would be consistent with the idea that children and adults differ in their reliance on multiword units, leading to differential learning of vocabulary vs. grammatical relations. It is important to note that we do not predict that children will show better learning than adults overall. Artificial language learning studies comparing children and adults repeatedly find better learning among adults (e.g., Asher & Price, 1967; Ferman & Karni, 2010; Hudson Kam & Newport, 2005; Saffran, 2001). What we expect to find is not an overall child advantage, but the absence of a noun advantage for children in the unsegmented condition.

Method

Participants

A total of 123 participants took part in this study: 77 six-to-eight-year-old children (43 girls, mean age 7;2) and 46 university students from the Hebrew University of Jerusalem (28 women, mean age 24;5). Six additional children were tested but excluded from the analyses: three had learning disabilities, and three did not understand the task (as evidenced by their alternating between first-sentence and second-sentence answers regardless of trial type or correct answer). None of the remaining children had learning or language disabilities. Participants were randomly assigned to one of two conditions: they were exposed either to segmented input (23 adults, 38 children) or to unsegmented input (23 adults, 39 children). Children's mean age did not differ significantly between the groups (7;3 in the segmented condition and 7;0 in the unsegmented condition), and bilingual participants were equally distributed between groups (11 in the unsegmented and 12 in the segmented group). All children were tested at the Living Lab at the Bloomfield Science Museum in Jerusalem following parental consent. While we do not have SES measures, visitors to the science museum typically come from various SES and cultural backgrounds, so the population can be assumed to be more heterogeneous than a sample of children recruited to a university laboratory.

Materials

Children learned an artificial language similar to that used in Siegelman and Arnon (2015; see Figure 1 for a sample trial). We made several modifications for use with children: there were fewer words to learn (eight vs. twelve in the previous study), and the carrier phrase was shorter. In addition, unlike in previous studies, participants received only one block of exposure, either segmented or unsegmented. This was done to reduce the length of the experiment and enable children to complete it in full. The artificial language had eight novel labels for concrete nouns, two articles (“fo” and “se”), and a carrier phrase (“ferpel ti”; see Appendix A for the item list). The nouns were divided into two noun ‘classes’, and each noun appeared with only one article. Hebrew (L1) has grammatical gender but, unlike our artificial language, has no gender agreement on function words preceding (or following) nouns.

Figure 1. A sample exposure trial from the experiment

To ensure that learning was not affected by the gender of nouns in Hebrew, the noun ‘classes’ were balanced for the gender of the corresponding Hebrew nouns. There were no semantic, prosodic, or phonological cues to class membership; the only cue was distributional (which article the noun appeared with). All nouns were two syllables long. All objects were concrete and had high-frequency, early-acquired labels in Hebrew. The artificial language had a fixed word order: articles always followed the carrier phrase and preceded the nouns (see example 1).

(1) Ferpel ti         se        geesoo
    carrier phrase    article   noun
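One simple way to build two noun classes balanced for Hebrew gender is to split each Hebrew gender group evenly between the two articles. The sketch below assumes four nouns with masculine and four with feminine Hebrew labels; the noun names are placeholders rather than the study's items (listed in Appendix A), and the random split is an assumption about how such a balance could be achieved.

```python
import random


def assign_classes(hebrew_gender, articles=("fo", "se"), seed=0):
    """Assign each noun to one article so that every Hebrew gender is split
    evenly across the noun classes.

    hebrew_gender maps a (hypothetical) artificial noun to 'm' or 'f',
    the gender of the corresponding Hebrew object name.
    """
    rng = random.Random(seed)
    assignment = {}
    for gender in sorted(set(hebrew_gender.values())):
        nouns = sorted(n for n, g in hebrew_gender.items() if g == gender)
        if len(nouns) % len(articles):
            raise ValueError(f"cannot split {len(nouns)} '{gender}' nouns evenly")
        rng.shuffle(nouns)
        # Alternate articles so each class receives the same number of nouns of this gender.
        for i, noun in enumerate(nouns):
            assignment[noun] = articles[i % len(articles)]
    return assignment


# Placeholder nouns: four with masculine and four with feminine Hebrew labels.
genders = {f"noun{i}": ("m" if i < 4 else "f") for i in range(8)}
classes = assign_classes(genders)
print(classes)
```

Because class membership is determined only by this assignment, the article is the sole (distributional) cue to class, as in the study.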

The same recorded token of each noun, article, and carrier phrase was used throughout the experiment. The two articles had identical durations, to ensure that they were equally prominent. The only difference between the two conditions was the presence of pauses between sentence parts: in the unsegmented condition there were no pauses, while in the segmented condition 250 ms pauses separated the carrier phrase from the article and the article from the noun.
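The segmentation manipulation amounts to inserting fixed silences between the three recorded tokens. The sketch below computes token onsets for one sentence in each condition. Only the 250 ms pause and the equal article durations come from the text; the token durations themselves are invented placeholders.

```python
# Hypothetical token durations in milliseconds (placeholders, not the real recordings).
# The two articles share one duration, as in the study.
DURATIONS_MS = {"ferpel ti": 700, "se": 250, "fo": 250, "geesoo": 550}
PAUSE_MS = {"segmented": 250, "unsegmented": 0}  # silence between sentence parts


def onsets(parts, condition):
    """Return (part, onset_ms) pairs and the total sentence duration."""
    gap = PAUSE_MS[condition]
    t, schedule = 0, []
    for i, part in enumerate(parts):
        if i > 0:
            t += gap  # pause before the article and before the noun
        schedule.append((part, t))
        t += DURATIONS_MS[part]
    return schedule, t


sentence = ["ferpel ti", "se", "geesoo"]
for cond in ("unsegmented", "segmented"):
    schedule, total = onsets(sentence, cond)
    print(cond, schedule, f"total={total} ms")
```

With two pauses per sentence, segmented sentences are 500 ms longer than unsegmented ones regardless of the token durations chosen.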

Procedure

The study was approved by the ethics committee of the psychology department at the Hebrew University of Jerusalem. Children were told that they were going to learn an alien language. On each trial, they saw a cartoon alien point at an object and heard the alien say what the object is called in its language (see Figure 1 for an example). Children were asked to repeat the alien's utterance in order to enhance their learning. After they repeated each sentence, the next trial began. There were 32 exposure trials (each of the eight objects was described four times), each lasting about ten seconds.
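A 32-trial exposure list (eight objects, four repetitions each) can be generated as sketched below. The text does not say how trial order was determined, so the random shuffle and the constraint against presenting the same object on consecutive trials are assumptions; the object names are placeholders.

```python
import random


def exposure_list(objects, reps=4, seed=None, max_tries=10_000):
    """Shuffle objects * reps trials, rejecting orders with an object twice in a row."""
    rng = random.Random(seed)
    trials = [obj for obj in objects for _ in range(reps)]
    for _ in range(max_tries):
        rng.shuffle(trials)
        if all(a != b for a, b in zip(trials, trials[1:])):
            return list(trials)
    raise RuntimeError("no valid order found")


objects = [f"object{i}" for i in range(8)]  # hypothetical object names
trials = exposure_list(objects, seed=1)
print(f"{len(trials)} trials, about {len(trials) * 10 / 60:.1f} min at ~10 s each")
```

At about ten seconds per trial, 32 trials take a little over five minutes, in line with the reported length of the exposure phase.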

Following the exposure phase, the alien reappeared on the screen and participants were told that it would say two sentences: one correct in the language and one containing a mistake. They had to decide which of the sentences was correct. On each trial, the alien pointed at an image and said two sentences describing it. In half of the test trials, the incorrect sentence was wrong because the noun label did not match the image on the screen (noun trials); in the other half, the incorrect sentence had the wrong article (article-noun pairing trials). In all noun trials, the correct and incorrect noun labels came from the same noun ‘class’ (so the article was correct for both). After the children gave their oral response, the experimenter pressed the button corresponding to the answer given, and the next trial began. Each participant was tested on the same type of input heard in the exposure phase: unsegmented sentences in the unsegmented condition and segmented ones in the segmented condition. All correct sentences had already appeared in the exposure phase (no generalization was required). There were 16 test trials: one article-noun pairing trial and one noun trial for each of the eight objects. The order of correct and incorrect sentences was counterbalanced.
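The test-list constraints described above can be made explicit in a short generator: noun foils are drawn from the same class (so the article is correct in both sentences), pairing foils swap in the other class's article, and the position of the correct sentence is balanced. The noun names, the class assignment, and the foil-selection and counterbalancing schemes are hypothetical.

```python
import random

CARRIER = "ferpel ti"
# Hypothetical lexicon: four nouns per class; the article marks class membership.
CLASSES = {"fo": ["noun0", "noun1", "noun2", "noun3"],
           "se": ["noun4", "noun5", "noun6", "noun7"]}


def sentence(article, noun):
    return f"{CARRIER} {article} {noun}"


def make_test_trials(seed=0):
    """Return (trial_type, noun, correct, incorrect, correct_first) tuples."""
    rng = random.Random(seed)
    trials = []
    for article, nouns in CLASSES.items():
        other = next(a for a in CLASSES if a != article)
        for noun in nouns:
            correct = sentence(article, noun)
            # Noun trial: foil label from the same class, so the article is right in both.
            foil = rng.choice([n for n in nouns if n != noun])
            trials.append(("noun", noun, correct, sentence(article, foil)))
            # Article-noun pairing trial: same noun with the other class's article.
            trials.append(("article", noun, correct, sentence(other, noun)))
    rng.shuffle(trials)
    # Counterbalance: the correct sentence is played first on exactly half of the trials.
    first = [True] * (len(trials) // 2) + [False] * (len(trials) - len(trials) // 2)
    rng.shuffle(first)
    return [t + (cf,) for t, cf in zip(trials, first)]


trials = make_test_trials()
print(len(trials), "test trials")
```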

The exposure phase lasted about five minutes, and the test phase about three minutes. Students (the adult group) were tested in our laboratory after signing consent forms. They received course credit or were paid $2.50. Their procedure was the same as the children's, except that adults pressed the buttons themselves.

Results

Overall, participants performed above chance on both the nouns (69.41% correct) and the article-noun pairings (61.57% correct; p < .001 for both). The mean performance per condition and trial type for each age group is shown in Table 1 (significance codes refer to above-chance performance as tested by a t-test against chance, 50%). All cells showed above-chance learning except for children's learning of article-noun pairings in the segmented condition. See Figure 2 for the full distribution of scores across conditions, trial types, and age groups.
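Testing a cell against chance amounts to a one-sample t-test of per-participant proportions correct against 0.5, the chance level in a two-alternative choice. The sketch below computes the t statistic and degrees of freedom with the standard library only; the scores are invented for illustration, and the p-value would then be read from a t distribution (for example with SciPy's `scipy.stats.ttest_1samp`, which is not used here).

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(scores, chance=0.5):
    """t statistic and df for H0: mean proportion correct equals chance."""
    n = len(scores)
    t = (mean(scores) - chance) / (stdev(scores) / sqrt(n))
    return t, n - 1


# Hypothetical per-participant proportions correct (out of 8 trials of one type).
scores = [6/8, 5/8, 7/8, 4/8, 6/8, 5/8, 8/8, 5/8]
t, df = one_sample_t(scores)
print(f"t({df}) = {t:.2f}")
```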

Figure 2. Performance on each condition and trial type for the two age groups

Table 1. Means and standard deviations (in brackets) for the different groups and conditions.

Significance codes: ‘.’ .05 < p ≤ .1; ‘*’ p ≤ .05; ‘**’ p ≤ .01; ‘***’ p ≤ .001.

Figure 2 note: Each dot represents a participant's performance on one trial type (so there are two dots per participant, one for noun trials and one for article-noun pairing trials); bold lines represent the mean, and boxes cover one standard deviation from the mean.

The effect of unit size on learning in children and adults

To examine the effect of unit size on learning in children and adults, we ran a logistic mixed-effects model using the lme4 package in R (version 1.1-12; Bates, Mächler, Bolker & Walker, 2014). p-values were obtained from model comparisons using a “leave-one-out” method, whereby the full model is compared to a model excluding each factor in turn. Significance is based on a chi-square test comparing the full model with the nested model, which shows whether including the factor significantly improves the model. All fixed effects were sum coded (each level is compared to the grand mean). The dependent variable was accuracy on each trial (a binary measure), and the fixed effects were age group (adult vs. child), trial type (article-noun pairing vs. noun), condition (segmented vs. unsegmented), article (se vs. fo), and gender (male vs. female). We included an interaction between trial type and condition to test our prediction that the unsegmented condition will facilitate learning of article-noun pairings (more than of nouns). To examine whether this holds for both age groups, we included a three-way interaction between age group, trial type, and condition. If both groups show a differential effect of input on learning article-noun pairings vs. nouns, this interaction should not be significant (see Footnote 1). Given our prediction of a larger noun advantage for adults in the unsegmented condition, we predicted an interaction between age group and trial type, such that adults would show a larger noun advantage. The interaction between condition and age group was included for the sake of completeness, but we had no predictions about it. The model included random intercepts for items and participants (the maximal random-effects structure that allowed it to converge; Barr et al., 2013) and had low collinearity (all VIF values < 1.19; see Table 2).
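Two ingredients of this analysis can be sketched with the standard library: sum (±1) coding of a two-level factor, and the likelihood-ratio chi-square used to compare a full model with a nested one. In lme4 notation, our reading of the description is something like `accuracy ~ age_group * trial_type * condition + article + gender + (1 | participant) + (1 | item)` with a binomial family; the variable names are ours. The log-likelihoods below are made up, and the closed form used for the p-value holds only for 1 degree of freedom, which is what dropping one sum-coded two-level term costs.

```python
from math import erfc, sqrt


def sum_code(level, positive):
    """Sum (deviation) coding for a two-level factor: +1 / -1, so 0 is the grand mean."""
    return 1 if level == positive else -1


def lrt_p_value(loglik_full, loglik_reduced, df=1):
    """Likelihood-ratio test of a nested model (1 df only in this sketch)."""
    if df != 1:
        raise NotImplementedError("the closed form below holds for 1 df only")
    chi2 = 2 * (loglik_full - loglik_reduced)
    # Survival function of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))


codes = [sum_code(c, positive="unsegmented") for c in ("segmented", "unsegmented")]
# Hypothetical log-likelihoods of a full model and a model dropping one term.
chi2, p = lrt_p_value(-1200.0, -1203.32)
print(codes, f"chi2(1) = {chi2:.2f}, p = {p:.4f}")
```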

Table 2. Mixed effects logistic regression model for both conditions and age groups (effects of interest in bold)

Adults performed better than children overall (β = 0.45, SE = .06, p < .001), consistent with previous artificial language learning studies (e.g., Ferman & Karni, 2010). As in previous studies, performance was better overall on noun trials than on article-noun pairing trials (β = -0.28, SE = .06, p < .001). The effect of condition was not significant (β = 0.05, SE = .06, p = .4), but there was a significant interaction between condition and trial type (β = 0.14, SE = .06, p = .01), as found in previous uses of this paradigm. To explore this interaction, we ran two additional restricted models, one for each trial type. The restricted models had the same effect structure as the full model (without a random slope for trial type). As predicted, manipulating initial exposure had a different effect on learning nouns vs. article-noun pairings. Learning of the article-noun pairings was better in the unsegmented condition than in the segmented one (β = 0.2, SE = .08, p = .007), replicating previous findings of a facilitative effect of larger units on learning article-noun pairings (Arnon & Ramscar, 2012; Siegelman & Arnon, 2015). In contrast, learning of the nouns did not differ between the two conditions (β = -0.09, SE = .09, p = .33). The interaction between age group and trial type was significant in the full model (β = -0.21, SE = .06, p < .001). The restricted models described above (one including only article-noun pairing trials and the other including only noun trials) allow us to explore the nature of this interaction. The effect of age group is larger (as can be seen from the coefficient values) in the model run on noun trials (β = 0.66, SE = .1, p < .001) than in the model run on article-noun pairing trials (β = 0.25, SE = .08, p = .001): adults were better than children on both trial types, but the effect seems larger for noun trials.
The effect of gender in the full model was marginally significant, with male participants showing better performance (β = 0.22, SE = .12, p = .07), possibly due to the unequal numbers of male and female participants in our sample. None of the other effects reached significance.
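Because the two-level factors are sum coded as ±1, the full-model coefficients already imply approximate simple effects, which gives a rough consistency check on the restricted models. In the derivation below, c, t, and a denote condition, trial type, and age group. We infer, but the text does not state, that the unsegmented condition, article-noun trials, and adults are coded +1 (this is what the signs of the reported effects imply). The other sum-coded factors are held at 0 (the grand mean) and the non-significant three-way term is ignored.

```latex
\begin{aligned}
\operatorname{logit} P(\text{correct}) &= \beta_0 + \beta_a a + \beta_t t + \beta_c c
  + \beta_{ct}\,c\,t + \beta_{at}\,a\,t + \dots \\
\beta_{c \mid t} &= \beta_c + \beta_{ct}\,t, \qquad \beta_{a \mid t} = \beta_a + \beta_{at}\,t \\
t = +1 \text{ (article-noun)}:&\quad \beta_c + \beta_{ct} = 0.05 + 0.14 = 0.19,
  \qquad \beta_a + \beta_{at} = 0.45 - 0.21 = 0.24 \\
t = -1 \text{ (noun)}:&\quad \beta_c - \beta_{ct} = 0.05 - 0.14 = -0.09,
  \qquad \beta_a - \beta_{at} = 0.45 + 0.21 = 0.66
\end{aligned}
```

These values are close to the restricted-model estimates reported above (0.2 and -0.09 for condition; 0.25 and 0.66 for age group). Exact agreement is not expected, since the restricted models are fitted separately. Note also that, on the ±1 scale, the log-odds difference between the two levels of a factor is twice its coefficient.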

To ensure that the effect of condition on learning was not driven only by the adult learners, we ran a second analysis on the children's performance alone. Note that, although the three-way interaction between age group, condition, and trial type was not significant in the full model, this does not necessarily mean that both groups show the predicted interaction between condition and trial type. Since testing this interaction in children is one of the main goals of this study, we also analyzed the children's results separately. This model included the same fixed and random effects as the full model, all sum coded, except for age group, since only children were included (see Table 3). In addition, we controlled for children's age (as a centered variable).
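Ages in this article use the years;months notation (e.g., 7;2). A centered age covariate can be derived as sketched below; converting ages to months is our assumption, since the text does not state the unit used in the model, and the ages are hypothetical.

```python
from statistics import mean


def age_in_months(age):
    """Convert a 'years;months' string such as '7;2' to months."""
    years, months = age.split(";")
    return 12 * int(years) + int(months)


def center(values):
    """Subtract the sample mean so that 0 corresponds to the average age."""
    m = mean(values)
    return [v - m for v in values]


ages = ["6;4", "7;2", "7;9", "8;1"]  # hypothetical children
centered = center([age_in_months(a) for a in ages])
print(centered)
```

Centering leaves the age slope unchanged but makes the other coefficients refer to a child of average age rather than to an age of zero.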

Table 3. Mixed effects logistic regression for children

As in the full model, the effect of condition was not significant, but, as predicted, there was a significant interaction between condition and trial type, showing that the effect of condition differed for noun labels and article-noun pairings (β = 0.19, SE = .06, p = .003). A restricted model testing the effect of condition only on article-noun pairing trials showed that learning of article-noun pairings was better in the unsegmented condition (β = 0.18, SE = .09, p = .05), supporting our hypothesis that children also benefit from early exposure to unsegmented input. Taken together, these findings replicate previous results with adults and extend them to children: both age groups showed better learning of article-noun pairings in the unsegmented condition.

Differences between child and adult learning from unsegmented input

To test our prediction that children and adults will show different learning patterns given the same input, we analyzed the data from children and adults in the unsegmented condition. We focus on this condition because it does not impose a specific segmentation and is therefore better suited for exploring differences in segmentation biases. We predicted that adults would show a noun advantage in this condition, while children would not. We ran a mixed-effects logistic regression model to examine this prediction. The model had trial type, age group, article (se vs. fo), and gender as fixed effects (all sum coded), as well as the interaction between age group and trial type. The model included random intercepts for items and participants and had low collinearity (all VIF values < 3.18; see Table 4).

Table 4. Mixed effects logistic regression for unsegmented condition

Age group affected performance, with adults showing better learning overall (β = 0.53, SE = .09, p < .001). Participants performed marginally better on noun trials (β = -0.14, SE = .08, p = .08), but this was qualified by a significant interaction with age group (β = -0.26, SE = .08, p < .001). To explore this interaction, we ran two restricted models, which had the same effect structure as the full model but each included only one age group (and so had no random slope for age group). As predicted, adults were significantly better on nouns than on article-noun pairings (β = -0.41, SE = .14, p = .003), but children were not (β = 0.12, SE = .09, p = .19). That is, children did not show the noun advantage found in adults. These results show different learning patterns for children and adults, with adults succeeding more on vocabulary, as previously found in natural language learning.
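Because trial type and age group are sum coded as ±1, the trial-type effect within each age group can be approximated from this model as the main effect plus or minus the interaction coefficient. The helper below does that arithmetic with the reported coefficients. The assumption that adults and article-noun trials are coded +1 is inferred from the signs of the reported effects and is not stated in the text.

```python
def simple_effect(main, interaction, moderator_code):
    """Slope of a ±1-coded factor at one level (+1 or -1) of a ±1-coded moderator."""
    return main + interaction * moderator_code


# Reported coefficients from the unsegmented-condition model.
BETA_TRIAL_TYPE = -0.14   # assumes article-noun trials are coded +1
BETA_AGE_X_TYPE = -0.26   # assumes adults are coded +1

adults = simple_effect(BETA_TRIAL_TYPE, BETA_AGE_X_TYPE, +1)
children = simple_effect(BETA_TRIAL_TYPE, BETA_AGE_X_TYPE, -1)
print(f"trial-type effect, adults:   {adults:+.2f} (restricted model: -0.41)")
print(f"trial-type effect, children: {children:+.2f} (restricted model: +0.12)")
```

The implied values (-0.40 for adults, +0.12 for children) line up closely with the separately fitted restricted models, which is what one would expect if the interaction is driven by the adults' noun advantage.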

To summarize our results, we found that both adults and children learned article-noun pairings better when exposed to unsegmented input than when exposed to segmented input. Adults showed a noun advantage in the unsegmented condition, consistent with an increased focus on word units. Children, in contrast, showed similar learning of the article-noun pairings and the nouns.

Discussion

The current study was motivated by a recent theoretical proposal suggesting that one of the factors that make children better than adults at learning certain grammatical relations is their greater reliance on multiword units (Arnon, 2010; Arnon & Christiansen, 2014; Christiansen & Arnon, 2017). We asked two questions extending prior work: (1) Do children, like adults, show better learning of grammatical gender when exposed to unsegmented input than when exposed to segmented input? If multiword units play a facilitative role in language learning, then children should also show this effect. (2) Do children and adults show different patterns of learning vocabulary and article-noun pairings when learning from unsegmented input? Adults are better at learning vocabulary items than certain grammatical relations in natural language (MacWhinney, 2005; Paul & Grüter, 2016), a pattern that is replicated in artificial language learning studies (Arnon & Ramscar, 2012; Siegelman & Arnon, 2015). If children do not show a similar vocabulary advantage, this would be consistent with a greater reliance on multiword units on their part, since this way of learning is predicted to facilitate learning of article-noun pairings, but not of vocabulary.

Regarding our first question, we found that, like adults, children learn article-noun pairings better when learning from unsegmented input. Our findings replicate previous studies with adults (Arnon & Ramscar, 2012; Siegelman & Arnon, 2015) and extend them to children. These findings highlight the importance of larger building blocks in language learning (Arnon, 2010; Christiansen & Arnon, 2017) and support usage-based claims about the relation between the input learners receive and their learning trajectories (Tomasello, 2003). Performance on article-noun pairings was better for both children and adults in the unsegmented condition than in the segmented one. This condition was meant to simulate “child-like” language learning by having learners learn segmentation, meaning, and structure simultaneously, as infants normally do. Our findings thus complement existing evidence that children make use of multiword units in natural language learning (e.g., Arnon & Clark, 2011; Bannard, Lieven & Tomasello, 2009; Bannard & Matthews, 2008) by showing that such units can also facilitate learning of novel grammatical relations. The idea that the nature of early building blocks affects learning outcomes draws inspiration from two previous accounts of the difference between child and adult learning: Newport's less-is-more theory (1990) and Elman's starting-small theory (1993). However, our account differs from them in predicting that learning structure, segmentation, and meaning simultaneously, as children do, will result in more multiword building blocks than adults form.

Regarding our second question, whether children and adults show different learning patterns when exposed to unsegmented input, we found that adults showed a noun advantage in this condition, as in previous studies (Arnon & Ramscar, 2012; Paul & Grüter, 2016; Siegelman & Arnon, 2015), whereas children did not. Children were in fact numerically better on article-noun trials than on noun trials, although this difference was not significant. This finding is in line with our predictions, and is somewhat counter-intuitive: it seems easier to remember that balle means ‘ball’ in French than to remember that one should say la balle and not le balle. Yet, unlike adults, children found it slightly easier to report that an article-noun pairing was incorrect than to report that a noun label did not match the object on the screen. This pattern mirrors a recent study that found better learning of agreement patterns than of vocabulary items in an artificial language in preschoolers (Bulgarelli, 2018), and is consistent with the idea that children are more likely to initially treat the article-noun sequence as one unit, making it easier to learn the correct pairing (Arnon & Ramscar, 2012; Siegelman & Arnon, 2015). While the difference between children's and adults' learning patterns is telling, a clear limitation of our study is that it included no direct measure of segmentation, so it remains unclear what types of linguistic representations the children in our experiment extracted. One of the previous studies with adults used a typing measure that is not applicable to child participants (Siegelman & Arnon, 2015). Future studies need to assess children's segmentation using measures that can estimate the size of children's linguistic representations during learning.

While we treated children and adults as distinct groups, factors other than age are known to affect segmentation strategies. In particular, literacy has been shown to affect both segmentation and learning, with preliterate children showing different learning patterns from literate children. In a previous study, preliterate children showed an advantage for learning article-noun pairings over nouns, while literate children showed similar accuracy on both trial types (Havron et al., 2018). These findings give rise to three predictions for the unsegmented condition: (1) preliterate children should show an article-noun pairing advantage, (2) literate children should show no advantage for either trial type, and (3) adults should show a noun advantage. While we do not have information about participants' literacy status in the current study, our sample most likely included a mix of preliterate and literate children: the age range we tested, six- to eight-year-olds, is the age range in which literacy emerges in Israel. Most six-year-olds cannot read and most eight-year-olds can, but there are large individual differences that make age only a weak proxy for literacy. The pattern we observed, a weak (non-significant) article-noun pairing advantage, is what we would expect from a sample containing a mixture of preliterate children (who should show an article-noun pairing advantage) and literate children (who should not).
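The mixture argument above can be made explicit with a minimal expected-value sketch, assuming only the three predictions just listed. Let $\pi$ be the (unknown) proportion of preliterate children in the sample, and let $\Delta$ denote a group's expected accuracy on article-noun trials minus its expected accuracy on noun trials. The symbols $\pi$, $\Delta_{\text{pre}}$ and $\Delta_{\text{lit}}$ are introduced here for illustration only; they were not estimated in the study.

$$
\Delta_{\text{sample}} = \pi\,\Delta_{\text{pre}} + (1-\pi)\,\Delta_{\text{lit}},
\qquad \Delta_{\text{pre}} > 0,\; \Delta_{\text{lit}} \approx 0
\;\;\Longrightarrow\;\;
0 < \Delta_{\text{sample}} \approx \pi\,\Delta_{\text{pre}} < \Delta_{\text{pre}} \quad \text{for } 0 < \pi < 1.
$$

Under these assumptions, a mixed sample is expected to show an article-noun advantage in the predicted direction, but a smaller one than preliterate children alone would show, and a smaller effect is harder to detect reliably with a given sample size.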

Importantly, we do not suggest that children cannot or do not segment multiword sequences into their constituent parts. In fact, there is evidence that infants as young as six months can segment articles from unfamiliar nouns in their native language (Höhle & Weissenborn, 2003; Shi, Marquis & Gauthier, 2006). The claim put forward in this paper is that children initially learn article-noun pairs together and that, having initially learned these larger units as chunks, they retain a processing advantage over adults, who tend to learn them as separate units. Supporting this, experiments with young French-learning infants (who are already able to segment real articles from noun phrases) using pseudo-articles similar to the ones we used in the current study found that infants had not segmented them by the end of the experiment (Hallé, Durand & de Boysson-Bardies, 2008; Shi & Lepage, 2008). Similarly, it takes infants as long as 14 months to segment pronouns from verb phrases, perhaps because the co-occurrence of pronouns with verbs is less consistent than that of articles with nouns (i.e., verbs appear after a more varied set of function and context words than nouns; Shi & Melançon, 2010). There is therefore evidence that children do learn multiword units early on, and that only with accumulating experience are they able to extract smaller units (see also Soderstrom, 2007).

Another question is to what extent our study parallels natural language learning. The current study follows a long tradition of using artificial languages to test the acquisition of grammatical gender (e.g., Braine, Brody, Brooks, Sudhalter, Ross, Catalano & Fisch, 1990; Brooks, Braine, Catalano, Brody & Sudhalter, 1993). While such artificial languages do not do justice to the complexity of natural language, they nevertheless provide valuable information about learning mechanisms and biases. A related issue is the way our artificial gender system differs from natural gender systems. Our gender classes had no phonological or semantic regularities: this was done to test our prediction that the benefit of learning from multiword units manifests mainly when relations between adjacent words are not transparently associated with such cues. This is unlike many natural language gender systems, which tend to display some phonological and semantic regularities (e.g., Corbett, 1991; Mirkovic, MacDonald & Seidenberg, 2005). Interestingly, when languages have both phonological and semantic cues to class membership, children seem to rely more on phonological information, while adults rely more on semantic information (e.g., Gagliardi & Lidz, 2014). A recent study (Culbertson, Jarvinen, Haggarty & Smith, 2019) suggests that this reflects children's greater ease in learning local information (between linguistic elements) than mappings between linguistic and semantic information (i.e., from gender class to semantic cues or, in the case of the current study, from nouns to objects).
While grammatical gender systems tend to have regular aspects, they also have irregular and arbitrary aspects (as in the current study). In French, for example, a recent study documented adults' difficulty in using morpho-phonological cues to assign grammatical gender, suggesting that at least some cases are acquired on an item-by-item basis (Ayoun, 2018). In addition, any regularity in gender class assignment must be learned from specific examples or prototypes, which may be more easily mastered when learning from article-noun pairs. This is not to say that gender systems lack functional value: a recent study (Dye, Milin, Futrell & Ramscar, 2018) found that German gender-agreeing articles make the following nouns more predictable, a function achieved in English by prenominal adjectives (e.g., in the sequence “a nice cold beer”, nice cold makes beer predictable). These predictive relations, however, also need to be learned. Our interest here was in whether initial exposure to multiword units that encompass the relation to be learned can facilitate its mastery. We looked at gender agreement as a case study exemplifying such a relation. Similar semi-arbitrary relations are found in other domains of language beyond grammatical gender, such as verb-preposition pairings or idioms, which adult language learners also struggle with (e.g., Siyanova & Schmitt, 2007; Wray, 1999).
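The predictability point attributed above to Dye et al. (2018) can be illustrated with a toy calculation. The sketch below is a hypothetical illustration, not the measure used in that study: it assumes a made-up lexicon in which every noun belongs to exactly one gender class and each class has a single article, so hearing the article reveals the class. It compares the entropy of the noun on its own, H(N), with the conditional entropy of the noun given the article, H(N | article); lower entropy means the noun is more predictable. The function names and the toy lexicon are invented for this example.

```python
from collections import defaultdict
from math import log2


def entropy(probs):
    """Shannon entropy (in bits) of an iterable of probabilities."""
    return -sum(p * log2(p) for p in probs if p > 0)


def noun_entropies(lexicon):
    """Return (H(N), H(N | article)) for a lexicon mapping noun -> (gender, probability).

    Assumes one article per gender class, so conditioning on the
    article is the same as conditioning on the gender class.
    """
    h_noun = entropy(p for _, p in lexicon.values())

    # Group noun probabilities by gender class.
    by_gender = defaultdict(list)
    for gender, p in lexicon.values():
        by_gender[gender].append(p)

    # H(N | G) = sum over classes of p(g) * H(N | G = g).
    h_cond = 0.0
    for probs in by_gender.values():
        p_g = sum(probs)
        h_cond += p_g * entropy(q / p_g for q in probs)
    return h_noun, h_cond


# Hypothetical toy lexicon: 8 equally frequent nouns, 4 in each of two classes.
toy = {f"noun{i}": ("A" if i < 4 else "B", 1 / 8) for i in range(8)}
h_n, h_n_given_article = noun_entropies(toy)
print(f"H(N) = {h_n:.2f} bits, H(N | article) = {h_n_given_article:.2f} bits")
```

For this toy lexicon the script prints `H(N) = 3.00 bits, H(N | article) = 2.00 bits`: the article removes one bit of uncertainty about the upcoming noun. With realistic frequency distributions or unequal class sizes the numbers would differ; the toy values only show the direction of the effect.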

An additional concern might be that we tested participants whose L1 has grammatical gender, which may have affected their performance. Coming from a gender-marking language does affect the overall ease of learning a novel gender system in natural languages (Sabourin, Stowe & de Haan, 2006); however, since we compared two input conditions within the same population of speakers, this should not have affected the difference between conditions (whatever advantage comes from knowing a gender-marking language should be the same across conditions). Moreover, Hebrew does not mark gender on articles and does not treat articles and nouns as separate words (the definite article, ha, is a prefix attached to the noun and is not written as a separate word). The artificial language differed from Hebrew in both respects, so learners could not directly apply their existing gender-marking (or segmentation) knowledge to the novel language. The advantage of starting with unsegmented input has by now been documented in speakers of both a gender-marking language (Hebrew) and a non-gender-marking language (English). Nevertheless, it is important to systematically investigate the effect of different gender-marking systems on the unit-size manipulation. For instance, will exposure to unsegmented input facilitate learning in L1 speakers of Spanish, whose language marks gender on articles and treats articles and nouns as separate words? Or will the impact of prior knowledge override the impact of this manipulation?
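The logic of the within-population comparison can be written as a simple additive sketch. This is an assumption for illustration, not the regression model reported in the study: suppose a learner's performance $y_c$ in input condition $c$ (for example, on the log-odds scale) is the sum of a baseline $\beta_0$, a term $\beta_{\mathrm{L1}}$ reflecting prior experience with a gender-marking L1, and a condition term $\beta_c$.

$$
y_{c} = \beta_0 + \beta_{\mathrm{L1}} + \beta_{c}
\quad\Longrightarrow\quad
y_{\text{unseg}} - y_{\text{seg}} = \beta_{\text{unseg}} - \beta_{\text{seg}}
$$

Because all participants share the same L1, $\beta_{\mathrm{L1}}$ cancels from the difference between conditions. The cancellation holds only if L1 experience does not interact with input condition; an L1-by-condition term would not cancel, and that is what a comparison across speakers of different gender-marking systems would test.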

To conclude, the current study extends previous work in two ways. First, it shows that children, like adults, master article-noun pairings more easily when learning from unsegmented input. In doing so, it highlights the importance of larger units, and in particular of learning segmentation and meaning simultaneously, for learning grammatical relations. Second, it illustrates differences in segmentation biases between children and adults, and the utility of artificial languages in exploring them: children and adults learned differently from the same input, in ways that mirror natural language learning. That is, the type of input affects different aspects of language differently, and affects different groups of learners differently, a finding with implications for second language teaching (see also Barbir, Havron, Recht, Fiévet & Christophe, 2019). Although learning from unsegmented input facilitated mastery of article-noun pairings in both age groups, only adults showed a noun advantage when learning from this input. This pattern highlights the importance of taking into account the learning biases of the particular population in second language learning: what works for child learners may differ from what works for adult learners.

Acknowledgements

This work was supported by ISF Grant 52712 (to IA). We thank the research assistants at the Living Lab in the Bloomfield Science Museum, the museum staff, Limor Raviv (the Living Lab manager at the time of data collection), and the children and parents who participated in the study.

Appendix A: Stimuli for the artificial language learning task

Footnotes

1 A significant triple interaction could, however, also stem from a difference between the two age groups in their performance on noun trials across conditions; such a result would be orthogonal to our hypotheses.

References

Abbot-Smith, K., & Tomasello, M. (2006). Exemplar-learning and schematization in a usage-based account of syntactic acquisition. The Linguistic Review, 23(3), 275–290.
Arnon, I. (2010). Starting big: The role of multiword phrases in language learning and use. Unpublished doctoral dissertation, Stanford University.
Arnon, I. (2015). What can frequency effects tell us about the building blocks and mechanisms of language learning? Journal of Child Language, 42, 274–277. https://doi.org/10.1017/S0305000914000610
Arnon, I., & Christiansen, M. H. (2014). Chunk-based language acquisition. In P. J. Brooks & V. Kempe (Eds.), Encyclopedia of Language Development (pp. 88–90). Thousand Oaks, CA: Sage Publications.
Arnon, I., & Clark, E. V. (2011). Why brush your teeth is better than teeth – Children's word production is facilitated in familiar sentence-frames. Language Learning and Development, 7, 107–129. https://doi.org/10.1080/15475441.2010.505489
Arnon, I., & Cohen-Priva, U. C. (2014). Time and again: The changing effect of word and multiword frequency on phonetic duration for highly frequent sequences. The Mental Lexicon, 9(3), 377–400. https://doi.org/10.1075/ml.9.3.01arn
Arnon, I., McCauley, S. M., & Christiansen, M. H. (2017). Digging up the building blocks of language: Age-of-acquisition effects for multiword phrases. Journal of Memory and Language, 92, 265–280. https://doi.org/10.1016/j.jml.2016.07.004
Arnon, I., & Ramscar, M. (2012). Granularity and the acquisition of grammatical gender: How order-of-acquisition affects what gets learned. Cognition, 122(3), 292–305.
Asher, J. J., & Price, B. S. (1967). The learning strategy of the total physical response: Some age differences. Child Development, 38(4), 1219–1227.
Ayoun, D. (2018). Grammatical gender assignment in French: Dispelling the native speaker myth. Journal of French Language Studies, 28(1), 113–148. https://doi.org/10.1017/S095926951700014X
Baayen, R. H., Hendrix, P., & Ramscar, M. (2013). Sidestepping the combinatorial explosion: An explanation of n-gram frequency effects based on naive discriminative learning. Language and Speech, 56(3), 329–347. https://doi.org/10.1177/0023830913484896
Bannard, C., Lieven, E., & Tomasello, M. (2009). Modeling children's early grammatical knowledge. Proceedings of the National Academy of Sciences of the United States of America, 106(41), 17284–17289. https://doi.org/10.1073/pnas.0905638106
Bannard, C., & Matthews, D. (2008). Stored word sequences in language learning: The effect of familiarity on children's repetition of four-word combinations. Psychological Science, 19(3), 241–248. https://doi.org/10.1111/j.1467-9280.2008.02075.x
Barbir, M., Havron, N., Recht, S. A., Fiévet, A. C., & Christophe, A. (2019). When one learning method is both a propeller and an obstacle: The effect of translation on second language acquisition in children. The 44th Annual Boston University Conference on Language Development, Boston, USA.
Bassano, D., Maillochon, I., & Mottet, S. (2008). Noun grammaticalization and determiner use in French children's speech: A gradual development with prosodic and lexical influences. Journal of Child Language, 35(2), 403.
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2014). Fitting linear mixed-effects models using lme4. arXiv preprint arXiv:1406.5823.
Braine, M. D., Brody, R. E., Brooks, P. J., Sudhalter, V., Ross, J. A., Catalano, L., & Fisch, S. M. (1990). Exploring language acquisition in children with a miniature artificial language: Effects of item and pattern frequency, arbitrary subclasses, and correction. Journal of Memory and Language, 29(5), 591–610.
Brooks, P. J., Braine, M. D., Catalano, L., Brody, R. E., & Sudhalter, V. (1993). Acquisition of gender-like noun subclasses in an artificial language: The contribution of phonological markers to learning. Journal of Memory and Language, 32(1), 76–95.
Bulgarelli, F. (2018). Talker variability as desirable difficulty for language learning. Unpublished doctoral dissertation, Pennsylvania State University.
Carroll, S. (1989). Second-language acquisition and the computational paradigm. Language Learning, 39(4), 535–594.
Chevrot, J. P., Dugua, C., & Fayol, M. (2009). Liaison acquisition, word segmentation and construction in French: A usage-based account. Journal of Child Language, 36(3), 557–596.
Christiansen, M. H., & Arnon, I. (2017). More than words: The role of multiword sequences in language learning and use. Topics in Cognitive Science, 9(3), 542–551.
Corbett, G. (1991). Gender. Cambridge: Cambridge University Press.
Croft, W. (2001). Radical construction grammar: Syntactic theory in typological perspective. Oxford, England: Oxford University Press.
Culbertson, J., Jarvinen, H., Haggarty, F., & Smith, K. (2019). Children's sensitivity to phonological and semantic cues during noun class learning: Evidence for a phonological bias. Language, 95(2), 268–293.
Dewaele, J.-M., & Véronique, D. (2001). Gender assignment and gender agreement in advanced French interlanguage: A cross-sectional study. Bilingualism: Language and Cognition, 4(3), 275–297. https://doi.org/10.1017/S136672890100044X
Dye, M., Milin, P., Futrell, R., & Ramscar, M. (2018). Alternative solutions to a language design problem: The role of adjectives and gender marking in efficient communication. Topics in Cognitive Science, 10(1), 209–224. https://doi.org/10.1111/tops.12316
Elman, J. L. (1993). Learning and development in neural networks: The importance of starting small. Cognition, 48, 71–99. https://doi.org/10.1016/0010-0277(93)90058-4
Elman, J. L. (2009). On the meaning of words and dinosaur bones: Lexical knowledge without a lexicon. Cognitive Science, 33(4), 547–582. https://doi.org/10.1111/j.1551-6709.2009.01023.x
Ferman, S., & Karni, A. (2010). No childhood advantage in the acquisition of skill in using an artificial language rule. PLoS ONE, 5(10), e13648. https://doi.org/10.1371/journal.pone.0013648
Gagliardi, A., & Lidz, J. (2014). Statistical insensitivity in the acquisition of Tsez noun classes. Language, 90(1), 58–89.
Goldberg, A. (2006). Constructions at work: The nature of generalization in language. Oxford University Press, USA.
Hallé, P. A., Durand, C., & de Boysson-Bardies, B. (2008). Do 11-month-old French infants process articles? Language and Speech, 51(1–2), 23–44.
Havron, N., & Arnon, I. (2017a). Reading between the words: The effect of literacy on second language lexical segmentation. Applied Psycholinguistics, 38(1), 127–153. https://doi.org/10.1017/S0142716416000138
Havron, N., & Arnon, I. (2017b). Minding the gaps: Literacy enhances lexical segmentation in children learning to read. Journal of Child Language, 44(6), 1516–1538.
Havron, N., Raviv, L., & Arnon, I. (2018). Literate and preliterate children show different learning patterns in an artificial language learning task. Journal of Cultural Cognitive Science, 2(1–2), 21–33.
Höhle, B., & Weissenborn, J. (2003). German-learning infants' ability to detect unstressed closed-class elements in continuous speech. Developmental Science, 6(2), 122–127.
Holmes, V. M., & de la Bâtie, B. D. (1999). Assignment of grammatical gender by native speakers and foreign learners of French. Applied Psycholinguistics, 20(4), 479–506.
Hudson Kam, C. L., & Newport, E. L. (2005). Regularizing unpredictable variation: The roles of adult and child learners in language formation and change. Language Learning and Development, 1(2), 151–195. https://doi.org/10.1080/15475441.2005.9684215
Karmiloff-Smith, A. (1979). A functional approach to child language. Cambridge: Cambridge University Press.
Kirjavainen, M., Theakston, A., & Lieven, E. (2009). Can input explain children's me-for-I errors? Journal of Child Language, 36(5), 1091–1114. https://doi.org/10.1017/S0305000909009350
Lew-Williams, C., & Fernald, A. (2007). Young children learning Spanish make rapid use of grammatical gender in spoken word recognition. Psychological Science, 18(3), 193–198.
Lew-Williams, C., & Fernald, A. (2010). Real-time processing of gender-marked articles by native and non-native Spanish speakers. Journal of Memory and Language, 63(4), 447–464.
Lieven, E., Salomo, D., & Tomasello, M. (2009). Two-year-old children's production of multiword utterances: A usage-based analysis. Cognitive Linguistics, 20(3), 481–507. https://doi.org/10.1515/COGL.2009.022
MacWhinney, B. (1978). The acquisition of morphophonology. Monographs of the Society for Research in Child Development, 43(1/2), 1–123.
MacWhinney, B. (2005). Emergent fossilization. In Studies of Fossilization in Second Language Acquisition (pp. 134–156).
McClelland, J. L. (2010). Emergence in cognitive science. Topics in Cognitive Science, 2(4), 751–770. https://doi.org/10.1111/j.1756-8765.2010.01116.x
Mirkovic, J., MacDonald, M. C., & Seidenberg, M. S. (2005). Where does gender come from? Evidence from a complex inflectional system. Language and Cognitive Processes, 20(1–2), 139–167.
Nesselhauf, N. (2003). The use of collocations by advanced learners of English and some implications for teaching. Applied Linguistics, 24, 223–242+268. https://doi.org/10.1093/applin/24.2.223
Newport, E. L. (1990). Maturational constraints on language learning. Cognitive Science, 14(1), 11–28.
Paul, J. Z., & Grüter, T. (2016). Blocking effects in the learning of Chinese classifiers. Language Learning, 66(4), 972–999.
Pinker, S. (1991). Rules of language. Science, 253(5019), 530–535.
Sabourin, L., Stowe, L. A., & de Haan, G. J. (2006). Transfer effects in learning a second language grammatical gender system. Second Language Research, 22, 1–29.
Scherag, A., Demuth, L., Rösler, F., Neville, H. J., & Röder, B. (2004). The effects of late acquisition of L2 and the consequences of immigration on L1 for semantic and morpho-syntactic language aspects. Cognition, 93(3), B97–B108.
Shi, R., & Lepage, M. (2008). The effect of functional morphemes on word segmentation in preverbal infants. Developmental Science, 11(3), 407–413.
Shi, R., Marquis, A., & Gauthier, B. (2006). Segmentation and representation of function words in preverbal French-learning infants. In Proceedings of the 30th Annual Boston University Conference on Language Development (Vol. 2, pp. 549–560). Somerville, MA: Cascadilla.
Shi, R., & Melançon, A. (2010). Syntactic categorization in French-learning infants. Infancy, 15(5), 517–533. https://doi.org/10.1111/j.1532-7078.2009.00022.x
Siegelman, N., & Arnon, I. (2015). The advantage of starting big: Learning from unsegmented input facilitates mastery of grammatical gender in an artificial language. Journal of Memory and Language, 85, 60–75. https://doi.org/10.1016/j.jml.2015.07.003
Siyanova, A., & Schmitt, N. (2007). Native and nonnative use of multi-word vs. one-word verbs. IRAL – International Review of Applied Linguistics in Language Teaching, 45, 119–139. https://doi.org/10.1515/IRAL.2007.005
Slobin, D. I. (1985). The crosslinguistic study of language acquisition. London: Lawrence Erlbaum Associates Publishers.
Soderstrom, M. (2007). Beyond babytalk: Re-evaluating the nature and content of speech input to preverbal infants. Developmental Review, 27(4), 501–532.
Tomasello, M. (2003). Constructing a language: A usage-based theory of language acquisition. Cambridge, MA: Harvard University Press.
Wray, A. (1999). Formulaic language in learners and native speakers. Language Teaching, 32, 213–231. https://doi.org/10.1017/S027226310000629X
Figure 1. A sample exposure trial from the experiment

Figure 2. Performance on each condition and trial type for the two age groups

Table 1. Means and standard deviations (in brackets) for the different groups and conditions.

Table 2. Mixed effects logistic regression model for both conditions and age groups (effects of interest in bold)

Table 3. Mixed effects logistic regression for children

Table 4. Mixed effects logistic regression for unsegmented condition