
Language acquisition is model-based rather than model-free

Published online by Cambridge University Press:  02 June 2016

Felix Hao Wang
Affiliation:
Department of Psychology, University of Southern California, 3620 McClintock Ave, Los Angeles, CA 90089-1061. wang970@usc.edu
Toben H. Mintz
Affiliation:
Department of Psychology, University of Southern California, 3620 McClintock Ave, Los Angeles, CA 90089-1061. tmintz@usc.edu http://dornsife.usc.edu/tobenmintz

Abstract

Christiansen & Chater (C&C) propose that learning language is learning to process language. However, we believe that the general-purpose prediction mechanism they propose is insufficient to account for many phenomena in language acquisition. We argue from theoretical considerations and empirical evidence that many acquisition tasks are model-based, and that different acquisition tasks require different, specialized models.

Type
Open Peer Commentary
Copyright
Copyright © Cambridge University Press 2016 

Given the Chunk-and-Pass processing necessitated by the Now-or-Never bottleneck, Christiansen & Chater (C&C) propose that learning language is learning to process language. In C&C's conceptualization, the learning and prediction processes are general (henceforth, model-free), and the knowledge used in prediction arises gradually. In discussing the consequences of this scenario, C&C impose a dichotomy between these prediction-based models, which are the outcome of learning to process, and learning based on more specialized constraints on how linguistic information is processed (the “child as linguist” approach; henceforth, model-based). In this commentary, we leave aside discussion of the Now-or-Never bottleneck per se and focus on C&C's claims about its theoretical consequences for language acquisition.

C&C's perspective provides an interesting framework for guiding research and developing theories. However, we argue that it does not provide significant constraints on the broader theoretical debates with which the field is engaged: in particular, debates about the nature of constraints on learning. Our argument is based on theoretical necessity and empirical evidence. Theoretically, the model-free approach is destined to be misled by surface-level information. Specifically, the general-purpose learning procedure is underspecified with respect to the level of analysis given different problems: Information for particular problems may exist at different levels, and using the wrong level may lead the learner astray. Empirically, when the model-based and model-free approaches are computationally equivalent, the model-free approach simply may not coincide with human performance. To support these claims we cite two cases: one from syntax, and another from word learning.

Many arguments for model-based learning come from phenomena that require a specific level of analysis. An oft-cited example is the constraint on structure-dependence, which specifies that grammatical operations apply to abstract phrasal structures, not linear sequences. It accounts for the fact that the yes/no question in 1(b) below is the correct form related to the declarative in 1(a), whereas the question in 1(c) is not.

  (1)
    a. The girl who is smiling is happy.
    b. Is the girl who is smiling happy?
    c. *Is the girl who smiling is happy?

The distinction hinges superficially on which "is" is moved to the beginning of the sentence in the question. The grammatical principle that governs this operation is subject-auxiliary inversion; in 1(a), the subject is the complex noun phrase [the girl who is smiling], so the entire phrase inverts with the main-clause "is". The model-based argument is that young children's input lacks positive examples of complex embedded questions such as 1(b) and instead consists of simpler utterances such as 2(a) and 2(b); without the notion that syntactic operations operate over phrasal structures, why would a learner not conclude from 2(a) and 2(b) that the rule is simply to front the first "is"?

  (2)
    a. The girl is happy.
    b. Is the girl happy?
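The contrast between the linear hypothesis and the structure-dependent rule can be made concrete with a minimal sketch. The function names and the hand-supplied subject/predicate split are illustrative assumptions, not part of any published model; the point is only that both rules agree on (2) but diverge on (1).

```python
def front_first_is(words):
    """Linear rule: move the first 'is' in the string to the front."""
    i = words.index("is")
    return ["is"] + words[:i] + words[i + 1:]


def front_main_is(subject, predicate):
    """Structure-dependent rule: invert the whole subject phrase with the
    auxiliary that heads the predicate. The phrase boundary is supplied by
    hand here (an assumption of this sketch); a real learner would need to
    represent it."""
    if predicate[0] != "is":
        raise ValueError("predicate must begin with the auxiliary 'is'")
    return ["is"] + subject + predicate[1:]


simple = "the girl is happy".split()
embedded = "the girl who is smiling is happy".split()

# Both rules give the same result for the simple declarative 2(a).
simple_linear = " ".join(front_first_is(simple))
simple_structural = " ".join(front_main_is(["the", "girl"], ["is", "happy"]))

# They diverge on 1(a): the linear rule yields the ungrammatical 1(c).
embedded_linear = " ".join(front_first_is(embedded))
embedded_structural = " ".join(
    front_main_is("the girl who is smiling".split(), "is happy".split())
)

print(embedded_linear)      # is the girl who smiling is happy
print(embedded_structural)  # is the girl who is smiling happy
```

Because input like (2) is compatible with both functions, the simple utterances alone do not tell a learner which one to adopt.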

Reali and Christiansen's (2005) model-free approach addresses this question. They demonstrated that a model-free learner that is sensitive to local bigram patterns could make the correct predictions about the structure of yes/no questions with complex noun phrases. This demonstration showed how attending to local sequential patterns could achieve the appropriate behavior despite not representing linguistic material at the level of syntactic hierarchies, as called for by model-based accounts. However, it turned out that the success of the model-free mechanism was an artifact of idiosyncrasies in English that had nothing to do with the syntactic structures in question (Kam et al. 2008). This does not rule out the possibility that a different model-free mechanism would succeed at learning the right generalizations, but adopting the view that learning language is learning to process language does not get around the fundamental challenges.
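To illustrate what a bigram-sensitive learner of this general kind computes, here is a minimal sketch. It is not a reconstruction of the published model or its corpus: the five-sentence corpus is invented, add-one smoothing is an assumption of the sketch, and the variable names are hypothetical. The toy corpus contains no relative clauses, so any preference it yields for 1(b) over 1(c) comes from surface co-occurrences alone.

```python
import math
from collections import Counter

# Hypothetical toy corpus: simple utterances only, no relative clauses.
corpus = [
    "the girl is happy",
    "is the girl happy",
    "who is smiling",
    "the girl is smiling",
    "is the boy happy",
]


def tokens(sentence):
    """Add sentence-boundary markers around the words."""
    return ["<s>"] + sentence.split() + ["</s>"]


context_counts = Counter()
bigram_counts = Counter()
for s in corpus:
    t = tokens(s)
    context_counts.update(t[:-1])         # every token that has a successor
    bigram_counts.update(zip(t, t[1:]))   # adjacent pairs

vocab = {w for s in corpus for w in tokens(s)}


def log_prob(sentence):
    """Log probability of a sentence under an add-one-smoothed bigram model."""
    t = tokens(sentence)
    total = 0.0
    for prev, cur in zip(t, t[1:]):
        p = (bigram_counts[(prev, cur)] + 1) / (context_counts[prev] + len(vocab))
        total += math.log(p)
    return total


grammatical = "is the girl who is smiling happy"      # 1(b)
ungrammatical = "is the girl who smiling is happy"    # 1(c)

prefers_grammatical = log_prob(grammatical) > log_prob(ungrammatical)
print(prefers_grammatical)  # True
```

In this toy setting the preference for 1(b) rests on the pairs "who is" and "is smiling", which enter the counts from a wh-question and a simple declarative. The sketch therefore gets the right answer without representing the subject phrase at all, which is the sense in which such success can depend on surface-level properties of the input.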

We now turn to an example from our own work in cross-situational word learning, where model-based and model-free versions of learning mechanisms can both work in principle (Yu et al. 2007). Cross-situational word learning refers to naturalistic situations where learners encounter words under referential ambiguity, and learn the correct word-to-referent mappings via the accumulation of cross-situational statistics (Yu & Smith 2007, among others). The associative learning account of how cross-situational statistics are used proposes that learning is model-free, in that passive accumulation of the co-occurrence statistics between words and their possible referents suffices for learning word-referent mappings. In contrast, model-based word-learning accounts posit that, like a mini-linguist, learners have the overarching assumption that words are referential, and learners actively evaluate possible word-referent mappings (e.g., Trueswell et al. 2013; Waxman & Gelman 2009). Although both accounts are computationally plausible (Yu et al. 2007), we recently carried out an experiment showing the importance of a model-based, top-down constraint: learners' knowledge that words are referential (Wang & Mintz, under revision). We created a cross-situational learning experiment in which there was referential ambiguity within trials, but reliable cross-situational statistical information as to the word-referent mappings. In two different conditions, we held word and referent co-occurrence statistics constant but gave each group of participants different instructions. Both groups were instructed to perform a distractor task, and only one group was also told to learn word meanings. Only the latter group successfully learned the mappings, even though both groups were exposed to the same word-to-referent co-occurrence patterns. Thus, although a model-free learner could succeed in the task, human learners required the notion that words refer for word learning. We take this as evidence that model-based hypothesis testing is required for word learning empirically, even though the model-free version could have worked in principle.
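The sense in which a model-free learner "could succeed in the task" can be sketched as a simple co-occurrence tally. The trials, nonsense words, and referent labels below are invented for illustration and are not the stimuli of any experiment cited here; the sketch only shows that passive counting suffices in principle when each trial is ambiguous but the cross-trial statistics are reliable.

```python
from collections import defaultdict

# Hypothetical trials: (words heard, referents in view). Each trial is
# ambiguous on its own because it pairs two words with two referents.
trials = [
    ({"blick", "dax"}, {"DOG", "BALL"}),
    ({"blick", "toma"}, {"DOG", "CUP"}),
    ({"dax", "toma"}, {"BALL", "CUP"}),
]

# Passive accumulation: count every word-referent co-occurrence.
counts = defaultdict(lambda: defaultdict(int))
for words, referents in trials:
    for w in words:
        for r in referents:
            counts[w][r] += 1

# For each word, pick the referent it co-occurred with most often.
lexicon = {w: max(refs, key=refs.get) for w, refs in counts.items()}

print(sorted(lexicon.items()))
# [('blick', 'DOG'), ('dax', 'BALL'), ('toma', 'CUP')]
```

Nothing in this tally encodes the assumption that words refer; it would run identically on any two streams of co-occurring events. The instruction manipulation described above bears on exactly that difference, since both groups of participants received the same statistics.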

In sum, although the Now-or-Never bottleneck presents interesting challenges for theories of language acquisition, the perspective C&C espouse does not solve the problems that model-based approaches do, and empirically, model-free mechanisms do not apply to certain learning situations. Thus, casting acquisition as learning to process across levels of linguistic abstraction does not avoid the theoretical controversies and debates that inhabit the field. It simply shifts the debate from the nature of the constraints on linguistic knowledge acquisition to the nature of the constraints on “learning to process.” We do not believe that this shift has substantial theoretical consequences for understanding the nature of the constraints on language learning.

References

Kam, X. N. C., Stoyneshka, I., Tornyova, L., Fodor, J. D. & Sakas, W. G. (2008) Bigrams and the richness of the stimulus. Cognitive Science 32(4):771–87.
Mintz, T. H., Wang, F. H. & Li, J. (2014) Word categorization from distributional information: Frames confer more than the sum of their (Bigram) parts. Cognitive Psychology 75:1–27.
Reali, F. & Christiansen, M. H. (2005) Uncovering the richness of the stimulus: Structure dependence and indirect statistical evidence. Cognitive Science 29(6):1007–28.
Trueswell, J. C., Medina, T. N., Hafri, A. & Gleitman, L. R. (2013) Propose but verify: Fast mapping meets cross-situational word learning. Cognitive Psychology 66(1):126–56.
Wang, F. H. & Mintz, T. H. (under revision) The limits of associative learning in cross-situational word learning.
Waxman, S. R. & Gelman, S. A. (2009) Early word-learning entails reference, not merely associations. Trends in Cognitive Sciences 13(6):258–63.
Yu, C. & Smith, L. B. (2007) Rapid word learning under uncertainty via cross-situational statistics. Psychological Science 18(5):414–20.
Yu, C., Smith, L. B., Klein, K. & Shiffrin, R. M. (2007) Hypothesis testing and associative learning in cross-situational word learning: Are they one and the same? In: Proceedings of the 29th Annual Conference of the Cognitive Science Society, Nashville, TN, August 2007, pp. 737–42, ed. McNamara, D. S. & Trafton, J. G. Cognitive Science Society.