An ongoing debate in the literature on children's syntactic competence concerns the abstractness of their early constructional representations. Researchers in this area have traditionally focused on the behavior of children younger than about three years old as they process constructions from their ambient language, with the point of contention being whether their representations for these constructions are couched in terms of the same abstract semantic and syntactic categories that adults' are (e.g. agent, patient, subject and object), or whether their representations are less abstract, or item-based.
A number of studies – using mostly production methodologies – have demonstrated that early representations are item-based, but that they become increasingly abstract over time (Braine, 1976; Baker, 1979; Bates & MacWhinney, 1982; MacWhinney, 1982; Schlesinger, 1982; Tomasello, 1992; Ingram & Thompson, 1996; Akhtar & Tomasello, 1997; Lieven, Pine & Baldwin, 1997; Brooks & Tomasello, 1999; Tomasello, 2000; Savage, Lieven, Theakston & Tomasello, 2003; Tomasello, 2003; Dittmar, Abbot-Smith, Lieven & Tomasello, 2008). For example, if two-year-olds learn a novel verb by hearing it used in an intransitive construction and are then given opportunities to use it transitively, they are highly likely to resist creatively causativizing it, and will instead conservatively use it only in the exposure construction. As children get older, however, they are increasingly likely to do what adults do: creatively turning It pilked into He pilked it. This sort of phenomenon has been taken to suggest that early learners' linguistic knowledge is represented only in lexically specific terms.
At the same time, however, recent work using preferential looking methodologies has suggested that two-year-olds may have some abstract constructional knowledge. Given, for example, an exemplar of the English transitive construction with a novel verb, participants are able to match it to the correct meaning in a two-alternative forced choice task (Fernandes, Marcus, Di Nubila & Vouloumanos, 2006; Gertner, Fisher & Eisengart, 2006). This has been argued to show that early learners do in fact have abstract syntactic knowledge, since above-chance performance in these studies would not be possible given only item-specific representations.
The comprehension results merit a somewhat more nuanced interpretation though. In Gertner et al. (2006), test trials were immediately preceded by practice trials that were designed to familiarize children with the task. All practice trials made use of transitive utterances – the same construction type examined at test – so it is possible that good performance at test was due to scaffolding provided during the practice. This interpretation is suggested by Dittmar et al. (2008), who replicated the Gertner et al. result, but only when a practice session was included in the procedure. Without preceding practice, two-year-olds showed no evidence of sensitivity to a general transitive construction. This indicates that while it may be possible for young children to act in a manner consistent with having acquired abstract representations, generalization appears to be quite tentative, which again suggests a less than fully abstract and robust understanding of the transitive construction.
More generally, the overall pattern of results from production and comprehension studies combined seems to be that evidence for early abstract knowledge is easier to find in comprehension, and that evidence for early lexically based knowledge is easier to find in production. This generalization is only partially accurate, however, since act-out comprehension tasks pattern like production tasks (Akhtar & Tomasello, Reference Akhtar and Tomasello1997). Thus, early learners' syntactic representations are functionally item-based whenever the task at hand requires them to produce something – be it utterances or actions. Moreover, it is arguably the case that item-based performance on any task suggests constructional representations that are less than fully abstract. Adult syntactic knowledge is characterized by even generalization across tasks, so whenever children show uneven performance – as in production versus comprehension tasks – this emphasizes potentially important differences between child and adult competence.
Regardless of the extent to which children's early syntactic representations are lexically based or abstract, an issue that has not been addressed in detail by either camp is how constructional generalization occurs at all. One proposal, known as the critical mass hypothesis (Marchman & Bates, 1994; Tomasello, 2000), suggests that better generalization by older children and adults may simply be due to the fact that they have had more exposure to the requisite data than younger children. This assumes a strong relationship between type frequency and generalization such that as learners experience an increasing number of lexical items in a constructional slot – different causative verbs, for example, in the transitive construction – they have increasingly strong evidence that the slot can be populated by causative verbs that did not co-occur with the transitive construction in the input.
Since two-year-olds have witnessed massive numbers of transitive utterances, it is not surprising that they may have formed at least a tentative generalization of the transitive construction (Goldberg, 2006; Abbot-Smith, Lieven & Tomasello, 2008). Indeed, one might wonder why their representation is still fragile after so much input. This raises the possibility that, in addition to the sheer amount of input experienced, other factors might play a pivotal role in constructional generalization. A prediction of this view is that young children may be more tentative with their generalizations than older children and adults, even when exposed to the same input. In order to investigate this possibility, it is necessary to do something that is not possible when studying the acquisition of constructions from the ambient language – unconfound learners' age with the amount of exposure they have to a construction.
In the present study this was achieved in the context of a novel construction learning paradigm, which allows one to hold the input constant across different groups of learners. Given this design, if older learners generalize better than younger learners, then their advantage has to be due to factors other than the input. Additionally, if younger learners fail to generalize, then this would provide additional evidence in favor of early item-based constructional representations.
Our method revolved around giving learners limited exposure to a wholly novel argument structure construction, involving both a novel form and a novel abstract meaning. Only novel verbs appeared in the construction, so knowledge of verb semantics could not be used to learn the construction's relational meaning. After exposure, participants performed a forced choice comprehension task to determine whether they were able to correctly interpret new instances of the construction, and distinguish it from a known construction type. Note that since our measure was practically identical to that used by researchers who have argued for early abstract constructional representations (e.g. Fernandes et al., 2006; Gertner et al., 2006), it should, if anything, favor finding generalization in young children.
In previous work, use of the novel construction learning paradigm has provided evidence that older children (M=6;4) and adults readily generalize on the basis of remarkably little data (Goldberg, Casenhiser & Sethuraman, 2004; Casenhiser & Goldberg, 2005). In fact, the learning by adults is quite robust. Both SOV and OSV orders can be learned and distinguished from one another. Moreover, undergraduates are willing to use novel constructions in production tasks, with evidence of retention over a seven-day delay (Boyd, Gottschalk & Goldberg, 2009). Additionally, an array of different control conditions has been employed to rule out spurious explanations for above-chance performance at test (Goldberg, Casenhiser & White, 2007). These have established that participants' ability to accurately map new exemplars of a novel construction to its correct meaning is due to learning that occurs during exposure.
Overall, previous results from the novel construction learning paradigm indicate that for learners above the age of six, minimal exposure to a novel construction is sufficient for the formation of generalizations that go well beyond the specific exemplars encountered in the input. These older learners readily form abstract constructional representations. What is unclear, however, is whether younger children do the same as easily. As a means of testing for item-specific as well as general constructional representations, the present experiment manipulated the novelty of the items that participants were tested on. Based on previous results (Casenhiser & Goldberg, 2005; Boyd et al., 2009), we predict that adults and older children will perform at high levels, even when test items are high in novelty. Younger children, on the other hand, may show decrements in performance as novelty increases.
METHOD
Participants
Eighteen five-year-olds (age range 4;6–5;9; M=5;2), 18 seven-year-olds (age range 6;9–8;1; M=7;6) and 18 undergraduates (age range 18;5–43;0; M=22;4) took part in the experiment. An additional 8 five-year-olds and 1 seven-year-old were tested, but their data were not included for reasons explained in the results section. Five-year-olds were recruited from summer recreational programs in the Princeton area, and seven-year-olds were recruited from local after-school programs. Each child was given a children's book in exchange for participation. Adults were either recruited through the use of online advertisements aimed at Princeton students, in which case they received a $12 payment for participation, or through the Princeton psychology subject pool, in which case they received course credit.
Novel construction
We created a novel construction that describes approach events in which one person moves towards another. The construction has the form NP1NP2V, where NP1 denotes the individual who is moving (the agent), NP2 denotes the individual who is being moved towards (the goal), and V is a novel verb that can be construed to encode a manner of motion. For example, The doctor the construction worker feigos denotes an event in which a doctor crawls towards a construction worker.
All instantiations of the construction contained two definite English NPs (e.g. the doctor and the construction worker) followed by a verb with an -o suffix, and a present or past tense marker (e.g. feigos, feigoed). The presence of determiners and inflectional markers served to make NPs and verbs simpler to identify. Our main interest was not in how difficult or easy it was to recognize constructional constituents as nouns or verbs, but in whether participants would be able to learn the form and function of the clausal construction as a whole.
Exposure trials
Participants were familiarized with sixteen different exemplars of the construction during an approximately three-minute exposure block. Since a number of studies have suggested that children have an easier time initially learning an abstract category when they are given input samples with less variability (Goldberg et al., 2004; Casasola, 2005; Casenhiser & Goldberg, 2005; Maguire, Hirsh-Pasek, Golinkoff & Brandone, 2008), we sought to reduce variability during exposure in two ways. First, all of the exposure sentences featured just two NPs – the doctor and the construction worker. These were balanced so that each one occurred as NP1 half of the time, and as NP2 the other half. Second, the frequency distribution of the five verbs that appeared in the exposure sentences was skewed to favor a single verb, since this has been found to enhance initial generalization of novel constructions (Goldberg et al., 2004; Casenhiser & Goldberg, 2005). Eight of the sixteen exposure sentences featured the verb moopo, while the remaining eight were evenly divided among four other verbs: keybo, feigo, suuto and vako.
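The composition of the exposure block can be made concrete with a short sketch. The verb counts and the two NPs come from the description above; how NP order was crossed with individual verbs is not stated, so the alternation used here is an assumption, and the function names are hypothetical.

```python
import random
from collections import Counter

# Verb frequencies as described in the text: one frequent verb plus four others.
VERB_COUNTS = {"moopo": 8, "keybo": 2, "feigo": 2, "suuto": 2, "vako": 2}
NPS = ("the doctor", "the construction worker")


def build_exposure_block(seed=None):
    """Return 16 (NP1, NP2, verb) exposure items in a random order."""
    items = []
    for verb, count in VERB_COUNTS.items():
        for i in range(count):
            # Assumption: NP order alternates within each verb, so each NP
            # is the agent (NP1) in exactly half of that verb's sentences.
            np1, np2 = NPS if i % 2 == 0 else NPS[::-1]
            items.append((np1, np2, verb))
    random.Random(seed).shuffle(items)  # a different order per participant
    return items


def present_tense(item):
    """Render an item as a present-tense NP1 NP2 V sentence."""
    np1, np2, verb = item
    return f"{np1.capitalize()} {np2} {verb}s"


block = build_exposure_block(seed=1)
print(Counter(verb for _, _, verb in block))
print(present_tense(block[0]))
```

Under these assumptions, half of the block uses moopo, and each NP fills the agent slot in eight of the sixteen sentences.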
The sixteen exposure sentences were embedded in short, 10-second movies that depicted their meanings. Each movie showed an agent approaching a goal, and used a voiceover that featured an exposure sentence with a present tense verb, followed by the same sentence with a past tense verb. For example, if a movie showed a doctor hopping towards a construction worker (see Figure 1), then the sentence The doctor the construction worker vakos would be heard simultaneously with the hopping motion. After the hopping had ended, the sentence The doctor the construction worker vakoed would play.

Fig. 1. An example exposure trial. The above still frames represent time slices in a short movie that associates the NP1NP2V form with approach semantics.
Test trials
In order to determine whether participants had learned anything during exposure, we administered a forced-choice comprehension task involving six approach trials, six intransitive trials and four linking trials. All trials required participants to listen to a voiceover sentence, then pick which of two simultaneously played movies depicted its meaning. The characteristics of the different trial types are described in more detail below.
Approach and intransitive trials
Approach and intransitive trials were designed to determine whether participants recognized that the NP1NP2V construction had a general approach meaning that was disjoint from the meaning of a more common construction type – the intransitive. The movie pairs that made up approach and intransitive trials depicted the same two types of events: one movie showed an agent character approaching a goal character (as in exposure trials), while the other showed the same two characters performing a repetitive intransitive motion in synchrony, e.g. clapping. These movie pairs were combined with different kinds of voiceover sentences to produce either approach or intransitive trials. Approach trials featured NP1NP2V voiceovers that described the approach movie in a movie pair. The correct answer on an approach trial was therefore a point to the approach movie. Intransitive trials, on the other hand, featured intransitive sentences with novel verbs that described the intransitive movie in a movie pair. This means that the correct answer on an intransitive trial was a point to the movie showing synchronous repetitive intransitive motion. Figure 2 gives an example of the way in which different types of voiceover sentences were combined with movies to create either approach trials or intransitive trials.

Fig. 2. Movies depicting approach and intransitive events were combined with different voiceovers to create either approach trials or intransitive trials. For instance, in an approach trial using the two movies shown above, participants would hear The princess the basketball player pookos, with the target movie to the right. In an intransitive trial they would hear The basketball player and the princess are zorping, with the target movie to the left.
Novelty
In addition to testing whether participants could identify approach scenes if and only if they heard instances of the NP1NP2V construction, we also wanted to assess the effects of novelty on participants' ability to correctly interpret the construction. To this end, we manipulated the amount of novelty that was present in approach trials.
Three levels of novelty were used – no novelty, novel verb and all novel. For trials with no novelty, the target movie and its associated voiceover sentence were simply repeated from exposure. For trials with a novel verb, the target movie showed the same characters from the exposure block – i.e. the construction worker and the doctor – enacting a novel manner of approach, which was labeled in the voiceover with a new novel verb form. In trials that were all novel, the target movie involved characters that had not been seen during exposure enacting a novel manner of approach. These trials were accompanied by voiceovers in which none of the constituents in the NP1NP2V exemplar had ever been heard during the course of the experiment. We hypothesized that increased novelty would generally be associated with worse performance, such that participants would do best on trials in the no novelty condition, and would show increasing decrements to performance in the novel verb and all novel conditions.
Overall, participants were tested on six approach trials and six intransitive trials. The six approach trials were further subdivided according to novelty, with two no novelty trials, two novel verb trials and two all novel trials.
Linking trials
Good performance is possible on approach trials given only a general understanding of the NP1NP2V construction. For example, a participant who knows only that two initial NPs are associated with the meaning approach could correctly pick out approach movies. But do participants know more specifically that the novel construction's first NP encodes the agent, and that its second NP encodes the goal? In order to answer this question, participants were tested on four linking trials immediately following the twelve approach and intransitive trials. In linking trials, participants chose between two approach movies in which the agent and goal roles were reversed. For example, if the voiceover sentence was The construction worker the doctor gippos (see Figure 3), then the trial featured one movie in which the construction worker approached the doctor (the target movie), and another in which the doctor approached the construction worker (a distracter).

Fig. 3. A linking trial. Because both movies show approach actions, above-chance performance is only possible if linking rules have been learned.
Procedure
Participants were tested individually. A pretest block was followed by an exposure block and then a test block. The pretest block consisted of six trials that familiarized participants with the testing procedure by having them match exemplars of known construction types (involving familiar verbs) to one of two simultaneously played movies. The exposure block consisted of sixteen exposure trials, played in different random orders for each participant. The instructions for the exposure block simply asked participants to pay attention to what they would see and hear. The test block was broken down into four sub-blocks. The first three sub-blocks each contained two approach trials randomly intermixed with two intransitive trials. In sub-block one, the approach trials had no novelty; in sub-block two they featured novel verbs; in sub-block three all constituents were novel. The fourth sub-block consisted of four randomly ordered linking trials.
All test trials looped indefinitely until a response was made. Participants were instructed to watch the pair of movies as many times as necessary before picking one. Children pointed to their choice, which the experimenter then recorded with a button press. Adults pressed the button themselves. Test materials were balanced for target side, with half of the target movies appearing on the left, and half on the right. Test trials were additionally counterbalanced across participants for target movie. For the trial shown in Figure 2, for example, half of all participants viewed the two movies while hearing The princess the basketball player pookos, while the other half heard The basketball player and the princess are pooking. Use of the first voiceover sentence makes the approach movie the target; use of the second makes the intransitive movie the target. This sort of counterbalancing served to reduce the possibility that some specific combination of voiceover sentences with movie pairs might skew our results.
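The structure of the test block described in this section can be sketched as follows. This is a minimal illustration that uses placeholder tuples rather than the actual movies and voiceovers; the function and variable names are hypothetical, and target-side balancing and counterbalancing across participants are left out.

```python
import random

# Novelty level of the approach movies in sub-blocks one to three.
NOVELTY_BY_SUBBLOCK = ("no novelty", "novel verb", "all novel")


def build_test_block(seed=None):
    """Return 16 test trials as (sub_block, trial_type, novelty) tuples."""
    rng = random.Random(seed)
    trials = []
    for sub_block, novelty in enumerate(NOVELTY_BY_SUBBLOCK, start=1):
        # Two approach and two intransitive trials, randomly intermixed.
        # For intransitive trials, novelty describes the approach distracter.
        chunk = [(sub_block, "approach", novelty)] * 2
        chunk += [(sub_block, "intransitive", novelty)] * 2
        rng.shuffle(chunk)
        trials.extend(chunk)
    # Fourth sub-block: four linking trials. The placeholders are identical,
    # so there is nothing to randomize in this sketch.
    trials.extend([(4, "linking", None)] * 4)
    return trials


for trial in build_test_block(seed=0):
    print(trial)
```

Because novelty is tied to sub-block order, novelty and position in the session increase together, which is why the intransitive trials are useful as a check on fatigue.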
RESULTS
There were eight child participants who showed no response variability in the first three test sub-blocks. In particular, six five-year-olds picked the intransitive movie on every trial, and one five-year-old and one seven-year-old picked the approach movie on every trial. This sort of behavior suggests strategic responding in which children decided to consistently pick a certain type of movie without taking into account a semantic interpretation of the voiceover sentences, as instructed. These participants were consequently excluded from further analysis, along with another five-year-old whose poor performance in the pretest block suggested an inadequate understanding of the task. This left us with analyzable data from eighteen participants in each of the three groups. These data are summarized as proportional accuracy scores on the different trial types – approach, intransitive and linking – in Table 1. Figure 4 provides a graphical summary of performance on approach trials.
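The variability-based exclusion criterion amounts to a simple check on each participant's choices. The sketch below assumes that responses are stored as the type of movie chosen on each of the twelve approach and intransitive trials; the data layout, function names and example participants are hypothetical.

```python
def shows_no_variability(choices):
    """True if the same movie type was picked on every trial."""
    return len(set(choices)) == 1


def split_by_response_variability(responses):
    """Split participant IDs into (kept, excluded).

    `responses` maps a participant ID to the list of movie types chosen
    ("approach" or "intransitive") in the first three test sub-blocks.
    """
    kept, excluded = [], []
    for participant, choices in responses.items():
        target = excluded if shows_no_variability(choices) else kept
        target.append(participant)
    return kept, excluded


# Made-up data for two illustrative participants.
example = {
    "p01": ["approach", "intransitive"] * 6,  # varied responses
    "p02": ["intransitive"] * 12,             # same choice on every trial
}
print(split_by_response_variability(example))
```

The additional exclusion based on pretest performance is not covered here, since the threshold used for it is not reported.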

Fig. 4. Approach trial results. Both child groups were significantly less accurate than adults, but only five-year-olds showed reliable novelty effects. Error bars indicate standard error.
Table 1. Mean accuracy (SE) on approach, intransitive and linking trials (for intransitive trials, novelty refers to the novelty associated with the distracter (approach) movie)

Statistical analysis of these results was carried out in the R computing language and environment (R Development Core Team, 2010) using logit mixed models (Baayen, 2008; Jaeger, 2008; Quené & van den Bergh, 2008; see footnote 2). For all mixed models appearing in the present work, the reported parameters were arrived at via model comparison: beginning with the simplest possible model, additional parameters were added one at a time to identify the model that provided the best possible balance between fit and complexity.
Intransitive trials
We focus first on the analysis of intransitive trials. Intransitive trials are important because they can potentially help to rule out certain interpretations of the data. For instance, participants might simply have chosen approach movies on every trial, regardless of whether the voiceover sentence they heard was an exemplar of the novel construction, or an intransitive. Good performance on intransitives would, however, count against this alternative interpretation, and strengthen the argument that participants matched approach movies to instances of the novel construction based on their understanding of the construction's meaning.
Indeed, we found that participants performed well on intransitive trials regardless of group, or the novelty of the distracter movie. Table 2 summarizes the best-fit logit mixed model of the intransitive data. The model specifies group and novelty as fixed effects, and allows for random participant intercepts. The addition of other random effects did not significantly improve model fit.
Table 2. Mixed model parameters for intransitive trials (parameter estimates indicate the likelihood of a correct response)

The model explicitly tests different pairwise contrasts. For example, the two group parameters listed in Table 2 represent comparisons between five-year-olds and adults, and seven-year-olds and adults. As indicated in Table 1, adults performed at ceiling on intransitive trials (i.e. 100% correct). We are thus testing to see whether there is any evidence that either of the child groups performed significantly worse than 100%. As the very large p-values for the two group parameters in Table 2 indicate, this was not the case: there is no evidence that either child group performed significantly worse than adults on intransitive trials.
The model shows similar null effects of distracter novelty on intransitive trial performance. Recall that in these trials, target movies showing synchronous repetitive intransitive motion were paired with distracter approach movies containing different levels of novelty, and that distracter novelty increased across the three test sub-blocks. The two novelty parameters listed in Table 2 test to see whether intransitive performance differed based on distracter novelty (or alternatively, across sub-blocks). The first novelty parameter represents performance on trials with novel verb distracters relative to no novelty distracters; the second novelty parameter represents performance on trials with all novel distracters relative to no novelty distracters. As the large p-values associated with each parameter indicate, we found no significant effects of distracter novelty.
The intransitive trial findings are important for a number of reasons. First, they help to constrain the interpretation of good approach trial performance, as noted above. Second, they suggest that even the youngest participants were fully attentive as the experiment wore on: the fact that no group experienced intransitive accuracy declines across sub-blocks suggests that fatigue did not play a significant role in performance. Finally, these data demonstrate that participants in all three groups, as expected, are fully capable of forming abstract linguistic categories. Recall that intransitive trials contained novel verbs. This means that good performance could not have been due to verb-specific knowledge. Instead, participants had to understand that the pattern NP1 and NP2 are Ving was associated with intransitive actions performed by two entities. That all groups had such knowledge should not be surprising given that even children in the five-year-old group had undoubtedly been exposed to hundreds of thousands of intransitive exemplars prior to taking part in the experiment. As discussed below, it is clear that children at age five are capable of generalizing constructions with sufficient time and input; our focus is on their facility with generalization relative to older learners.
Approach trials
While the intransitive data are useful because they restrict how the overall experiment can be interpreted, our main questions of interest concerned participant performance on trials involving the novel construction. Specifically, do the different age groups pattern equivalently? And if not, can decrements in child performance be attributed to their learning less general representations?
Table 3 provides a summary of the best-fit logit mixed model of the approach trial data. The model specifies group and novelty as fixed effects, and allows for random participant intercepts. The addition of other random effects did not significantly improve model fit, nor did the addition of a group-by-novelty interaction term.
Table 3. Mixed model parameters for approach trials (parameter estimates indicate the likelihood of a correct response)

As in the intransitive trial model, the different model parameters represent specific pairwise comparisons. The first group parameter refers to a comparison between children (five-year-olds and seven-year-olds grouped together), and adults. As indicated in Table 3, this effect was significant (p=0·014). The sign of the parameter estimate gives the direction of the effect – that is, −5·33 signifies that children as a group were significantly less likely than adults to respond correctly. It was not the case, however, that both of the child groups behaved uniformly. The second group parameter estimate represents a comparison between five-year-olds and seven-year-olds, and also shows reliable differences (p=0·016). Specifically, the estimate of −1·83 indicates that five-year-olds were significantly less likely to respond correctly than seven-year-olds.
The model's novelty parameters are also given in Table 3. These represent pairwise comparisons between the novel verb and no novelty conditions, and the all novel and no novelty conditions, respectively. The first novelty estimate of −1·04 shows that participants were marginally less likely to respond correctly when a trial contained a verb that had never been witnessed before (p=0·072). The second estimate of −1·66 indicates that participants were significantly less likely to respond correctly on trials in which none of the vocabulary had been witnessed before (p=0·0041).
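Because the estimates of a logit model are on the log-odds scale, exponentiating them gives odds ratios, which can be easier to interpret. The sketch below applies this to the two novelty estimates reported above. It assumes that each novelty parameter is a simple contrast against the no novelty condition, as the description suggests; the exact coding scheme is not given in the text.

```python
import math

# Novelty estimates reported in the text (log-odds scale).
NOVELTY_ESTIMATES = {"novel verb": -1.04, "all novel": -1.66}


def odds_ratio(log_odds_estimate):
    """Convert a logit-model coefficient into an odds ratio."""
    return math.exp(log_odds_estimate)


for condition, estimate in NOVELTY_ESTIMATES.items():
    ratio = odds_ratio(estimate)
    print(f"{condition}: odds ratio relative to no novelty = {ratio:.2f}")
```

On this reading, the odds of a correct response on novel verb trials are about 0·35 times the odds on no novelty trials, and about 0·19 times for all novel trials.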
Thus far, the approach trial results establish that there were indeed significant differences between groups. What is less clear, however, is whether the performance decrements noted in the child groups can be attributed to their having acquired less abstract constructional representations. The presence of a significant group-by-novelty interaction term would certainly be consistent with this interpretation – children with item-specific representations would be expected to perform well on no novelty trials, but worse on trials in which test items had less lexical overlap with exposure. But, as noted above, the addition of an interaction term did not significantly improve model fit. Should we take this to mean that the noted novelty effects were equivalent across groups?
The answer to this question is no. Figure 4 clearly indicates an interaction pattern in which novelty had no effect on adult behavior, but appears to have had increasingly large, age-related effects in children. This interaction is not an explicit part of the mixed model because the non-linear transformation of the model results given in Table 3 from logit space to proportions already produces the interaction pattern seen in Figure 4: a fixed shift in log-odds changes proportions very little near ceiling, where adults perform, and considerably more at intermediate levels of accuracy. The model is essentially able to represent interaction patterns without the need for an explicit interaction term.
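The effect of the logit-to-proportion transformation can be illustrated numerically. In the sketch below, the same novelty shift is applied to two baseline log-odds values. The shift is the all novel estimate reported above, but the two baselines are hypothetical values chosen for illustration and are not taken from Table 3.

```python
import math


def inv_logit(x):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


# Hypothetical baseline log-odds of a correct response on no novelty trials.
# These values are illustrative only and are not taken from Table 3.
BASELINE_LOG_ODDS = {"near ceiling": 5.0, "intermediate": 1.0}
NOVELTY_SHIFT = -1.66  # the all novel estimate reported in the text


def accuracy_drop(baseline):
    """Drop in predicted accuracy when the shift is added to a baseline."""
    return inv_logit(baseline) - inv_logit(baseline + NOVELTY_SHIFT)


for label, baseline in BASELINE_LOG_ODDS.items():
    print(f"{label}: accuracy drops by {accuracy_drop(baseline):.3f}")
```

With these illustrative baselines, an identical additive shift in logit space yields a drop of a few percentage points near ceiling but a much larger drop at intermediate accuracy, which is the kind of pattern an additive logit model can produce without an interaction term.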
This fact, however, does not obviate the need to provide positive proof of an interaction. As a means of supplying statistical support for the interaction pattern, we performed follow-up tests on the approach trial data using a conditional inference tree (Strobl, Malley & Tutz, 2009). Conditional inference is a non-parametric method developed in machine learning circles that is especially well suited to investigating interaction patterns. It works by recursively partitioning a dataset into increasingly homogeneous subsets. The end result is a tree structure of the different subsets, along with p-values representing the reliability of each of the hypothesized partitions. Figure 5 illustrates the tree that resulted from application of the conditional inference algorithm to the approach trial data.

Fig. 5. A conditional inference tree based on the approach trial data. The tree shows significant group and novelty effects, as well as a group-by-novelty interaction in which the novelty effect is primarily driven by five-year-old performance.
The tree confirms many of the results from the mixed model analysis. There are, for instance, significant effects of group. In particular, the partition that is posited at Node 1 demonstrates that five-year-olds are significantly less likely to answer correctly than seven-year-olds and adults, and the Node 5 partition shows that seven-year-olds are significantly less likely to answer correctly than adults as well. The tree also finds significant novelty effects, as evidenced by the Node 2 partition of all novel trials from no novelty and novel verb trials. Note, however, that this effect is confined to five-year-olds – there are no novelty-based partitions posited for the seven-year-old or adult data. This outcome indicates a significant interaction of group and novelty: the ostensibly general novelty effects reported in the mixed model are actually driven primarily by behavior in the five-year-old group (see footnote 3). The data thus offer positive evidence in favor of the hypothesis that decrements in five-year-old performance on approach trials are due to their having acquired less than fully abstract representations of the novel construction.
Linking trials
While the good performance of seven-year-olds and adults on approach trials indicates that they were able to learn the general association between NP1NP2V forms and approach semantics, it does not demonstrate that they acquired the novel construction's linking rules. In order to determine whether participants learned that NP1 links to the construction's agent role, and NP2 to its goal role, we fit a logit mixed model to the linking trial data, specifying group as a fixed effect, and allowing for random participant intercepts. Other random effects proved irrelevant insofar as they failed to significantly improve model fit. Table 4 summarizes the model.
Table 4. Mixed model parameters for linking trials (parameter estimates indicate the likelihood of a correct response)

As shown in Table 1, adults exhibited near-perfect mastery of the construction's linking rules. The two parameter estimates for group in Table 4 represent explicit tests of five-year-old and seven-year-old performance measured against the baseline provided by adults. These indicate that five-year-olds were significantly less likely than adults to respond correctly (p<0·0001), but that there were no significant differences between seven-year-olds and adults (p=0·65).
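For readers who prefer to see the model written out, a logit mixed model of the kind described above (group as a fixed effect with adults as the baseline, plus random participant intercepts) can be sketched as follows. The notation is our own assumption and does not appear in the original analysis: y_ij is the response of participant i on linking trial j (1 = correct), five_i and seven_i are indicator variables for group membership, and u_i is the participant-specific intercept.

```latex
\operatorname{logit} P(y_{ij} = 1)
  = \beta_0 + \beta_{5}\,\mathrm{five}_i + \beta_{7}\,\mathrm{seven}_i + u_i,
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2)
```

Under this parameterization, the intercept corresponds to adult performance, and the two group coefficients correspond to the comparisons of five-year-olds and seven-year-olds against the adult baseline.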
DISCUSSION
The findings of the present study demonstrate that younger children – exemplified in the present experiment by participants in the five-year-old group – are less likely to generalize than older children and adults when exposed to the same input. That is, children's failure to generalize a given construction in many previous experiments may not be due simply to a lack of exposure to enough exemplars of that construction, as suggested by the critical mass hypothesis. Instead, we hypothesize that young children fail to generalize because they are less adept at detecting patterns in the input. Item-based behavior is thus the result of younger children's lack of facility in identifying more abstract generalizations in language, perhaps because of their tendency to be distracted from abstract structural similarity by surface dissimilarity (Gentner & Medina, 1998).
Importantly, we are not claiming that five-year-old children are incapable of generalization. Many studies of considerably younger children have documented abstract knowledge of various constructions (e.g. Pinker, 1989; Tomasello, 2000; Bencini & Valian, 2008). Five-year-olds' performance on intransitives with novel verbs in the present experiment provides another example of relatively young children's ability to generalize. It is not the case that five years of age represents a critical threshold for an ability to form abstractions, but rather that younger children have relatively more difficulty in forming abstractions than older children, when the input is held constant.
It is important to keep in mind that exposure to the novel construction in the present study was limited and quite narrow: all of the sixteen exposure scenes involved the same two arguments, and only five novel verbs and corresponding manners of approach were witnessed. Ultimately, in order to fully generalize, there is clearly an advantage to variability, and we have no doubt that sufficient exposure to a wider variety of tokens would eventually lead young children to form a more abstract representation of the construction. But adults and seven-year-olds show evidence of quick and broad generalization with quite minimal and narrow input. In contrast, five-year-olds' generalization was much more limited.
Alternative interpretations
Before addressing the possible causes of item-based behavior in younger children, we seek to rule out some alternative interpretations of the current data. As noted above, one might argue that good performance on approach trials in the present experiment does not reflect actual learning of the NP1NP2V construction. Instead, both children and adults may have adopted a strategy in the first three test sub-blocks whereby they simply favored approach movies, without regard for the meaning of the voiceover sentences that they heard. Recall that two children – whose data were subsequently excluded from analysis – actually chose approach movies on every trial. In order to rule out the possibility that other participants executed a more nuanced version of this strategy wherein they probabilistically (instead of deterministically) favored approach movies, performance on intransitive trials is relevant. Recall that all groups were statistically at ceiling on intransitive trials, that distracter novelty had no effect on intransitive trial performance and that group and novelty did not interact (see Table 2). These findings are inconsistent with the notion that good approach trial performance was due to strategic responding, and instead suggest that participants picked movies – as instructed – based on semantic interpretation of the voiceover sentences.
Another possible strategy involves associating intransitive sentences with intransitive movies, then matching exemplars of the NP1NP2V construction to approach movies based on a process of elimination. If, for example, a participant is able to predict that the movie on the left in Figure 2 should be described with an intransitive structure like The basketball player and the princess are Ving, then they know that if they hear an NP1NP2V structure, it probably refers to the movie on the right. Note that this strategy, like the one outlined above, does not require any actual knowledge of the NP1NP2V construction. Previous work that included a no-sound control and a separate control in which only the two nouns were labeled during exposure has ruled out the idea that participants make use of this strategy in general (Goldberg et al., 2004; Casenhiser & Goldberg, 2005; Goldberg et al., 2007), and strongly indicates that without actual exposure to a novel construction, participants cannot successfully distinguish it from a known construction that has the same number of participants.
Moreover, the current data also argue against a process-of-elimination strategy because such a strategy incorrectly predicts that young children's performance should be insensitive to the novelty manipulation. Yet our results demonstrate that five-year-olds show significant novelty effects. This outcome is straightforwardly explained by the hypothesis that five-year-olds' knowledge of the NP1NP2V construction is, in some respects, item-based. As test items show less lexical overlap with exposure items, they are increasingly unsure about whether to interpret them as descriptions of approach movies or descriptions of intransitive movies.
Note also that both of these strategies are additionally ruled out by seven-year-old and adult performance on linking trials. Always choosing approach movies, or matching the NP1NP2V construction to approach movies based on a process of elimination, incorrectly predicts at-chance performance when the target and distracter movies both show approach events. Adults and older children, however, were statistically at ceiling on linking trials – an outcome that indicates they have learned the specific linking rules.
Finally, we address the fact that our test item blocks were ordered and that five-year-olds showed numerical declines in accuracy on later trials involving the novel construction. While it is normally legitimate to keep order constant in a between-participants design such as ours, it is possible that five-year-olds had less stamina than seven-year-olds and adults, and that this is responsible for their poorer performance as the experiment went on. However, as noted above, the fact that intransitive trials – which were interspersed with approach trials in each of the first three test sub-blocks – showed no decrements over time argues against the notion that the five-year-olds' data pattern is due to fatigue. Instead, intransitive performance suggests that all participants remained attentive to the task throughout. Moreover, the entire experiment took 10–12 minutes, of which approximately 5 minutes was devoted to the entire set of test trials. This short time span with children who are kindergarten-aged also argues against an explanation based on increasing fatigue.
When our results are put into a larger context, we find that they are not in fact unexpected. The following section reviews some relevant work about young children's ability to generalize.
Previous indications that younger children tend to miss generalizations
The idea that young children's generalizations may be more tentative or partial, or may need more contextual support than adults', is supported by a great deal of work in non-linguistic category formation (Munakata, McClelland, Johnson & Siegler, 1997; Rovee-Collier, 1997; Munakata, 2001; Fisher & Sloutsky, 2005). Moreover, the finding that young children are more conservative despite having the same input is reminiscent of other findings in research on children's memory and cognition (Brainerd & Mojardin, 1998; Brainerd & Reyna, 2004; Brainerd, Reyna & Ceci, 2008). Older children and adults are more likely to ‘fill in’ gaps in their experience than younger children. In one paradigm, for instance, a list of words is generated by compiling a set of associates for a target word – e.g. the word doctor could be used to generate the related words nurse, sick, hospital, ill, patient, cure, stethoscope and surgeon. After exposure to the list of related words, participants are asked whether the original (non-occurring) word, doctor, had appeared in the list. Adults are more likely than children – quite likely, in fact – to falsely recall that they had seen doctor in the original list (Brainerd & Reyna, 2004; Fisher & Sloutsky, 2005). This is arguably evidence that adults are responding based on a higher-level category – whether or not the probe word is a member of the semantic field exemplified by the associates of doctor – whereas children are responding in an item-based fashion, based on the words that were actually present in the input.
In related work, it has been shown that children have more trouble with systematic reversal shifts than adults do (Brainerd, Reyna & Forrest, 2002). For example, if participants are exposed to instances that exemplify a pattern (e.g. all blue objects are labeled ‘winners’ and all red objects are labeled ‘losers’), adults show an advantage when they need to reverse the pattern (making all blue objects losers, and all red objects winners), as compared with when they have to learn an entirely new rule (e.g. all square objects are winners, and all circular objects are losers). The advantage occurs because adults presumably learn abstract, color-based categories for the labels, and a reversal shift requires these categories to be relabeled, which is significantly easier than learning entirely new shape-based categories. In contrast, children show no advantage for reversal shifts (Tighe, Tighe & Schechter, 1975; Brainerd & Reyna, 2004) – a finding that suggests that they are performing the task in an item-based fashion, by associating each individual object with a label, rather than categorizing objects.
Overall then, there is a significant body of results from linguistic and non-linguistic tasks demonstrating that children often fail to learn or use abstract categories. Moreover, the results just discussed are crucially similar to our experimental findings in that they demonstrate that children's responses have an item-based character, even when children are exposed to input that is sufficient for category formation in adults. This suggests that the differences between child and adult behavior are not solely due to differences in the amount of input that each group receives.
Children are sometimes ready generalizers
The results reported here at first blush appear to be at odds with recent work by Newport and colleagues that has suggested that in certain types of situations, children are more likely to generalize than adults (Singleton & Newport, 2004; Hudson Kam & Newport, 2005, 2009). For example, a deaf child, referred to as Simon, who was exposed to obligatory motion classifiers in American Sign Language only 70% of the time by his parents, generalized the use of the classifiers to 90% of appropriate contexts, resulting in use that was indistinguishable from native signers (Singleton & Newport, 2004). Similarly, in an experimental study, hearing children who were taught an artificial language that used a determiner only 60% of the time showed a tendency to simplify the pattern either by always producing the determiner, or by omitting the determiner. Adults, on the other hand, were more likely to match the probabilities witnessed in the input, producing the determiner approximately 60% of the time (Hudson Kam & Newport, 2005).
How can children be at once more conservative than adults and more likely to generalize than adults? There are likely several factors at play in determining whether generalization occurs. The one that is most relevant in the present context is the idea that a pattern must be implicitly recognized in order to be generalized. Clearly, learners have to detect a generalization in order to take advantage of it. Children arguably generalize in Newport and colleagues' experimental studies because the pattern of determiner use was easy to identify, since the morpheme was consistent phonologically. Other studies that have investigated children's knowledge of concrete morphological patterns have also found quite early generalization (Dąbrowska & Szczerbiński, 2006; Dąbrowska & Tomasello, 2008).
The patterns required for phrasal construction learning are less easy to detect, however, in that they require children to identify commonalities based on word order (or grammatical relations), and propositional meaning. In order to recognize English argument structure constructions, for example, children have to overlook concrete differences in the various verbs and arguments involved; they need to recognize the common word order and extremely abstract function. Younger children may not recognize a fully abstract pattern as readily; without the shared commonality (in arguments, in the present study) new instances are not easily assimilated to the exemplars witnessed during exposure. Indeed, as reviewed above, studies of children's use of more abstract constructions have consistently shown a conservative bias.
There are other factors – beyond the ability to recognize the patterns involved – that likely lead children and adults to respond as they do in Newport and colleagues' work. We believe the following additional premises are required:
(1) Learners attempt to learn the input veridically; if it contains probabilistic variation, they will attempt to produce probabilistic variation.
(2) Unconditioned variation is harder to retain than conditioned variation (since conditioning factors serve to make variation predictable).
(3) Producing one form (generalization) is easier than producing many forms (variation).
In order to explain why children more readily generalize in Newport et al.'s studies than adults do, we need to focus on what it is that participants are trying to do. Most likely, all participants are attempting to reproduce the input veridically. If the input is probabilistic, the target response is probabilistic (see Premise 1). Participants have in fact been shown to match the probabilities in the input, even when it works against their economic self-interest (Myers, 1976).
In Hudson Kam and Newport's studies, the variation in the input was unpredictable; that is, the variation was unconditioned or inconsistent: the determiner (or classifier) appeared or did not appear, without being conditioned by a differing semantic interpretation or by different accompanying nouns. Premise 2 predicts that such variation should be more difficult to retain. Thus, children's tendency to generalize (or to omit the recalcitrant determiner altogether) may have stemmed from being unable to predict when the determiner should appear. In support of this idea are some of the findings from Hudson Kam and Newport (2009). In their Experiment 1, unconditioned variation was made quite complex – one determiner appeared 60% of the time and sixteen other determiners appeared 2·5% of the time each – and adults also tended to generalize, boosting the probability of the high-frequency determiner. But when the same very complex variation was made predictable in Experiment 2 – i.e. was conditioned such that each of the seventeen determiners now appeared only with certain nouns – the variation was learned by adults.
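A short calculation may help show why probability matching can run against a participant's self-interest in a prediction task. The sketch below is illustrative only: it assumes a two-outcome task in which each guess is made independently of the actual outcome, and it borrows the 60% figure from the determiner studies purely as an example value. The function name expected_accuracy is our own.

```python
def expected_accuracy(p_frequent, p_guess_frequent):
    """Probability of a correct prediction when the frequent outcome occurs
    with probability p_frequent and the learner independently guesses it
    with probability p_guess_frequent."""
    hit_frequent = p_frequent * p_guess_frequent
    hit_rare = (1 - p_frequent) * (1 - p_guess_frequent)
    return hit_frequent + hit_rare

p = 0.6  # illustrative rate of the more frequent outcome

# Matching: guess the frequent outcome at the same rate it occurs.
matching = expected_accuracy(p, p)

# Maximizing: always guess the frequent outcome.
maximizing = expected_accuracy(p, 1.0)

print(f"matching:   {matching:.2f}")
print(f"maximizing: {maximizing:.2f}")
```

Under these assumptions a matcher is correct on about 52% of trials, whereas always choosing the more frequent outcome is correct on 60%, so matching is the costlier strategy whenever rewards depend on accuracy.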
Thus, it seems that when learners are unable to predict variation (because it is unconditioned), they are likely to use some default strategy (e.g. “I'm going to stick with the one determiner that I can remember” or “Just forget those crazy little words, I have no idea what they're doing”). Understandably, with their better short-term memory and meta-cognitive skills, adults have an easier time keeping track of unconditioned probabilistic variation and reproducing it veridically. But when the language gets too complex, even adults rely on a simpler generalization.
The third premise is that generalization is easier than variation, which explains why the default strategy is to generalize when a pattern is recognized. This is in some sense trivial: all other things being equal, repeating a pattern is less effortful than producing several unrelated patterns. Repeated actions, whether linguistic or non-linguistic, become routinized and therefore require less effort. Evidence of this comes both from studies indicating that repetition or familiarity leads to increased perceptual fluency (Bornstein & D'Agostino, 1994), and from studies that demonstrate structural and lexical priming and a tendency to perseverate more generally (Ramage, Bayles, Helm-Estabrooks & Cruz, 1999). The reason that learners do not ultimately generalize or simplify a language completely, using only a single construction, is explained by Premise 1 (they aim to reproduce the target language veridically) in conjunction with the fact that ultimately, of course, multiple constructions exist because communication is improved by being able to utilize various constructions with differing functions. That is, constructions collectively provide the expressive power of a language.
Extant findings are consistent with the idea that both children and adults are subject to a tendency to generalize, because it is easier, and a tendency to reproduce the patterns in the input, because they aim to be ‘correct’ – i.e. they want to do what others do. The present study does not pit generalization against matching of the input; indeed, the only way to match the pattern in the input on the new items was to generalize. But it is only possible to generalize a pattern if there is some (implicit) recognition of the pattern. Older children and adults clearly have better working memory and better meta-cognitive skills, so they are more likely to implicitly or explicitly recognize a pattern when one exists. Conservatism, or initial hugging of the data, results from cases in which children fail to implicitly recognize a pattern.
CONCLUSION
We have presented clear evidence that younger children are more conservative than older learners, even when the younger learners have had just as much exposure to a novel construction as older learners. The results indicate that young children do not simply perform more poorly overall (though they do), but that they face a disproportionately greater challenge when presented with more novel items. That is, young children tend to rely quite closely on the input without taking liberties with it.
We infer that younger children's lack of broad generalization is due to the lack of implicit recognition of the pattern exemplified by the novel construction. Findings from other researchers indicate that children can be ready generalizers when the generalizations are more obvious. The present results are not unexpected when viewed in the context of the non-linguistic memory literature, which has also found that children often do not generalize as fully as older children or adults, but rather show evidence of hugging the data more closely.
Age differences always raise the issue of possible relevance to well-known sensitive period effects. Clearly we can only speculate on this point. It is possible that it is advantageous for young children to be initially conservative. Much recent work in linguistics has observed that far from being a system of a few simple, elegant rules, linguistic competence requires detailed, partially idiosyncratic, often probabilistic knowledge of a multitude of grammatical patterns (e.g. Lakoff, 1970; Williams, 1994; Culicover, 1999; Wray, 2002; Goldberg, 2006). There are thousands of collocations, idioms and minor constructions that often buck the trends of a language in unexpected ways. A mature learner that leapt too quickly to generalizations might fail to learn the nuanced subregularities that exist.
At the same time, we cannot say on the basis of the present experiment that adults and seven-year-olds failed to learn the specifics of the input, because they performed well across the board. It is quite possible that seven-year-olds and adults retained representations of the familiar items, but such lexically specific representations were not required for them to successfully identify new instances of the construction.
If in fact generalization does not eclipse exemplar-based knowledge, the present results would not shed light on children's ultimate advantage in language learning vis-à-vis adults, but instead they may help illuminate adults' and older children's initial outpacing of younger children in language learning tasks (Snow & Hoefnagel-Höhle, 1978). It is clear that while language is rife with semi-idiosyncrasy, it is also replete with regularities. The ability to generalize is required for the interpretation and production of novel utterances.
The present study thus does not determine whether children's failure to generalize represents a net gain or a net loss. But we can safely conclude that their failure is not simply due to a lack of sufficient exposure to a given construction.