Given the Chunk-and-Pass processing necessitated by the Now-or-Never bottleneck, Christiansen & Chater (C&C) propose that learning language is learning to process language. In C&C's conceptualization, the learning and prediction processes are general (henceforth, model-free), and knowledge used in prediction arises gradually. In discussing the consequences of this scenario, C&C impose a dichotomy between these prediction-based models that are the outcome of learning to process, and learning based on more specialized constraints on how linguistic information is processed (the “child as linguist” approach, henceforth, model-based). In this commentary, we leave aside discussion of the Now-or-Never bottleneck per se and focus on C&C's claims about its theoretical consequences for language acquisition.
C&C's perspective provides an interesting framework for guiding research and developing theories. However, we argue that it does not provide significant constraints on the broader theoretical debates with which the field is engaged: in particular, debates about the nature of constraints on learning. Our argument is based on theoretical necessity and empirical evidence. Theoretically, the model-free approach is destined to be misled by surface-level information. Specifically, the general-purpose learning procedure is underspecified with respect to the level of analysis given different problems: Information for particular problems may exist at different levels, and using the wrong level may lead the learner astray. Empirically, when the model-based and model-free approaches are computationally equivalent, the model-free approach simply may not coincide with human performance. To support these claims we cite two cases: one from syntax, and another from word learning.
Many arguments for model-based learning come from phenomena that require a specific level of analysis. An oft-cited example is the constraint on structure-dependence, which specifies that grammatical operations apply to abstract phrasal structures, not linear sequences. It accounts for the fact that the yes/no question in 1(b), following, is the correct form that is related to the declarative 1(a), but the question in 1(c) is not.
(1)
a. The girl who is smiling is happy.
b. Is the girl who is smiling happy?
c. *Is the girl who smiling is happy?
The distinction hinges superficially on which "is" is moved to the beginning of the sentence in the question. The grammatical principle that governs this operation is subject-auxiliary inversion; in 1(a), the subject is the complex noun phrase [the girl who is smiling], so the entire structure inverts with "is". The model-based argument is that young children's input lacks positive examples of complex embedded questions such as 1(b), and instead consists of simpler utterances such as 2(a) and 2(b); without the notion that syntactic operations operate over phrasal structures, why would a learner not conclude from 2(a) and 2(b) that the rule is simply to front the first "is"?
(2)
a. The girl is happy.
b. Is the girl happy?
Reali and Christiansen's (2005) model-free approach addresses this question. They demonstrated that a model-free learner that is sensitive to local bigram patterns could make the correct predictions about the structure of yes/no questions with complex noun phrases. This demonstration showed how attending to local sequential patterns could achieve the appropriate behavior despite not representing linguistic material at the level of syntactic hierarchies, as called for by model-based accounts. However, it turned out that the success of the model-free mechanism was an artifact of idiosyncrasies in English that had nothing to do with the syntactic structures in question (Kam et al. 2008). This does not rule out the possibility that a different model-free mechanism would succeed at learning the right generalizations, but adopting the view that learning language is learning to process language does not get around the fundamental challenges.
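To make the idea of a bigram-sensitive learner concrete, the following is a minimal sketch of how such a learner might score the two candidate questions in (1). The toy corpus, the add-one smoothing, and all function names are illustrative assumptions; this is not Reali and Christiansen's corpus or implementation.

```python
import math
from collections import Counter

# Toy input corpus (hypothetical). It contains simple questions and
# relative clauses, but no complex question of the form in 1(b).
CORPUS = [
    "the girl is happy",
    "is the girl happy",
    "the boy who is tall is nice",
    "is the boy nice",
    "the girl who is smiling waves",
]


def tokenize(sentence):
    """Lowercase, split on whitespace, and add sentence boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]


def train_bigrams(sentences):
    """Count bigrams and the contexts (first words) they start from."""
    context_counts = Counter()
    bigram_counts = Counter()
    vocab = set()
    for sentence in sentences:
        tokens = tokenize(sentence)
        vocab.update(tokens)
        context_counts.update(tokens[:-1])
        bigram_counts.update(zip(tokens, tokens[1:]))
    return context_counts, bigram_counts, len(vocab)


def log_prob(sentence, context_counts, bigram_counts, vocab_size):
    """Sentence log-probability under an add-one smoothed bigram model."""
    tokens = tokenize(sentence)
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        numerator = bigram_counts[(prev, cur)] + 1
        denominator = context_counts[prev] + vocab_size
        total += math.log(numerator / denominator)
    return total


model = train_bigrams(CORPUS)
score_1b = log_prob("is the girl who is smiling happy", *model)
score_1c = log_prob("is the girl who smiling is happy", *model)
print("prefers 1(b):", score_1b > score_1c)
```

In this toy corpus the two scores differ only through the bigram "who is" versus "who smiling"; the other bigrams that distinguish the two strings receive equal smoothed probabilities. The preference for 1(b) therefore rests on a single local surface pattern rather than on any representation of phrase structure.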
We now turn to an example from our own work in cross-situational word learning, where model-based and model-free versions of learning mechanisms can both work in principle (Yu et al. 2007). Cross-situational word learning refers to naturalistic situations where learners encounter words under referential ambiguity and learn the correct word-to-referent mappings via the accumulation of cross-situational statistics (Yu & Smith 2007, among others). The associative learning account of how cross-situational statistics are used proposes that learning is model-free, in that passive accumulation of the co-occurrence statistics between words and their possible referents suffices for learning word-referent mappings. In contrast, model-based word-learning accounts posit that, like a mini-linguist, learners have the overarching assumption that words are referential, and learners actively evaluate possible word-referent mappings (e.g., Trueswell et al. 2013; Waxman & Gelman 2009). Although both accounts are computationally plausible (Yu et al. 2007), we recently carried out an experiment showing the importance of learners' knowledge that words are referential – a model-based, top-down constraint (Wang & Mintz, under revision). We created a cross-situational learning experiment in which there was referential ambiguity within trials, but reliable cross-situational statistical information as to the word-referent mappings. In two different conditions, we held word and referent co-occurrence statistics constant but gave each group of participants different instructions. Both groups were instructed to perform a distractor task, and only one group was also told to learn word meanings.
Only the latter group successfully learned the mappings, even though both groups were exposed to the same word-to-referent co-occurrence patterns. Thus, although a model-free learner could succeed in the task, human learners required the notion that words refer for word learning. We take this as evidence that model-based hypothesis testing is required for word learning empirically, even though the model-free version could have worked in principle.
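The claim that a model-free learner could succeed in principle can be illustrated with a minimal sketch of passive co-occurrence accumulation. The novel words, the referents, the trial structure, and the function names below are hypothetical and are not the materials of the experiment described above; the sketch also does not model the instruction manipulation, only the associative account itself.

```python
from collections import defaultdict

# Each trial pairs the words heard with the referents in view (hypothetical
# stimuli). Every single trial is ambiguous: two words, two referents.
TRIALS = [
    (["bosa", "gasser"], ["ball", "dog"]),
    (["bosa", "manu"], ["ball", "cup"]),
    (["gasser", "manu"], ["dog", "cup"]),
]


def accumulate_cooccurrences(trials):
    """Passively count how often each word co-occurs with each referent."""
    counts = defaultdict(lambda: defaultdict(int))
    for words, referents in trials:
        for word in words:
            for referent in referents:
                counts[word][referent] += 1
    return counts


def best_mappings(counts):
    """Map each word to the referent it has co-occurred with most often."""
    return {word: max(refs, key=refs.get) for word, refs in counts.items()}


mappings = best_mappings(accumulate_cooccurrences(TRIALS))
print(mappings)
```

No trial in this toy set identifies a mapping on its own, yet the accumulated counts single out one referent per word. This is the sense in which the statistics alone suffice computationally, which is what makes the failure of the uninstructed group informative.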
In sum, although the Now-or-Never bottleneck presents interesting challenges for theories of language acquisition, the perspective C&C espouse does not solve problems that model-based approaches do, and empirically, model-free mechanisms do not apply to certain learning situations. Thus, casting acquisition as learning to process across levels of linguistic abstraction does not avoid the theoretical controversies and debates that inhabit the field. It simply shifts the debate from the nature of the constraints on linguistic knowledge acquisition to the nature of the constraints on “learning to process.” We do not believe that this shift has substantial theoretical consequences for understanding the nature of the constraints on language learning.