Pickering & Garrod (P&G) argue forcefully that language production and language comprehension are richly interwoven, allowing for fluid, highly interactive discourse to unfold. They note that a key feature of language that makes such fluidity possible is the pervasive use of prediction. Speakers predict and monitor their own language as they speak, allowing them to plan ahead and self-correct, and listeners predict upcoming utterances as they listen. The authors in fact provide evidence for predictive strategies at every level of language use: from phonology, to lexical semantics, syntax, and pragmatics.
Given the ubiquity of prediction in language use, an interesting consideration that P&G touch on only briefly is how prediction may be involved in the initial formation of linguistic representations, that is, in language development. Indeed, the authors draw heavily from forward modeling, invoking the Wolpert models as a possible schematic for their dynamic, prediction-based system. And although their inclusion is surely appropriate for discourse and language use, these models are fundamentally models of learning (e.g., Wolpert 1997; Wolpert et al. 2001). Hence, the degree to which our predictions are fulfilled (or violated) might have enormous consequences for linguistic representations and, ultimately, for the predictions we make in the future. More generally, prediction has long been viewed as essential to learning (e.g., Rescorla & Wagner 1972).
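As a point of reference (the formalization is the standard statement of the Rescorla-Wagner model, not something developed in P&G's target article or in this commentary), the error-driven intuition can be written as a single update rule in which learning is proportional to prediction error:

\[
\Delta V_X = \alpha_X \, \beta \, \Bigl(\lambda - \sum_i V_i\Bigr)
\]

where \(V_X\) is the associative strength of cue \(X\), \(\alpha_X\) and \(\beta\) are learning-rate parameters for the cue and the outcome, \(\lambda\) is the asymptote supported by the outcome, and \(\sum_i V_i\) is the total prediction generated by all cues present on the trial. When prediction matches outcome, the error term is zero and nothing changes; representations are updated only to the extent that predictions fail.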
Prediction might play an important role in language development in several ways: in tracking transitional probabilities, in avoiding overgeneralizations, and in mapping form to meaning in novel phrasal constructions. Each of these three cases is described below.
Transitional probabilities
Extracting the probability of Q given P can be useful in initial word segmentation (Graf Estes et al. 2007; Saffran et al. 1996), word learning (Hay et al. 2011; Mirman et al. 2008), and grammar learning (Gomez & Gerken 1999; Saffran 2002). A compelling way to interpret the contribution of transitional probabilities to learning is that P allows learners to form an expectation of Q (Turk-Browne et al. 2010). In fact, sensitivity to transitional probabilities correlates positively with the ability to use word predictability to facilitate comprehension under noisy input conditions (Conway et al. 2010). Moreover, sensitivity to sequential expectations also correlates positively with the ability to successfully process complex, long-distance dependencies in natural language (Misyak et al. 2010). Simple recurrent networks (SRNs) rely on prediction error to correct connection weights, and appear to learn certain aspects of language in much the same way as children do (Elman 1991; 1993; Lewis & Elman 2001; French et al. 2011).
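For concreteness, the sketch below shows how the relevant statistic can be computed over a syllable stream. It is a minimal illustration of transitional probabilities, not code from any of the cited studies, and the toy stream and word inventory are invented.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Estimate P(Q | P) for each adjacent syllable pair in a stream.

    High values tend to occur inside words; dips suggest likely word
    boundaries, the cue exploited in statistical segmentation studies.
    """
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(p, q): n / first_counts[p] for (p, q), n in pair_counts.items()}

# Toy stream built by concatenating three invented trisyllabic "words".
stream = "bi da ku pa do ti go la bu bi da ku go la bu pa do ti bi da ku".split()
for pair, tp in sorted(transitional_probabilities(stream).items(), key=lambda kv: -kv[1]):
    print(pair, round(tp, 2))
```

In this toy stream, within-word pairs (e.g., bi followed by da) come out at 1.0, while pairs spanning a word boundary come out at 0.5 or lower, which is exactly the contrast an expectation-based learner can exploit.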
Statistical preemption
Children routinely make overgeneralization errors, producing foots instead of feet, or She disappeared the quarter instead of She made the quarter disappear. A number of theorists have suggested that learners implicitly predict upcoming formulations and compare witnessed formulations to their predictions, resulting in error-driven learning. That is, in contexts in which A is expected or predicted but B is repeatedly used instead, children learn that B, not A, is the appropriate formulation – B statistically preempts A. Preemption is well accepted in morphology (e.g., went preempts goed; Aronoff 1976; Kiparsky 1982).
Unlike went and goed, distinct phrasal constructions are virtually never semantically and pragmatically identical. Nonetheless, if learners consistently witness one construction in contexts where they might have expected to hear another, the former can statistically preempt the latter (Goldberg 1995; 2006; 2011; Marcotte 2005). For example, if learners expect to hear disappear used transitively in relevant contexts (e.g., She disappeared it), but instead consistently hear it used periphrastically (e.g., She made it disappear), they appear to readjust future predictions so that they ultimately prefer the periphrastic causative (Boyd & Goldberg 2011; Brooks & Tomasello 1999; Suttle & Goldberg forthcoming).
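The logic can be cashed out as a toy error-driven competition between the two candidate formulations. The sketch below is purely illustrative: the update rule, parameters, and starting strengths are our assumptions, not a model reported in the cited work.

```python
def preemption_sim(n_exposures=50, rate=0.2):
    """Toy error-driven competition between two formulations of one message.

    In a 'caused-disappearance' context the learner entertains A
    ("She disappeared it") and B ("She made it disappear").  Every
    exposure is to B, so B's strength is nudged toward 1 and A's
    toward 0; eventually B preempts A.
    """
    strength = {"A": 0.5, "B": 0.5}
    for _ in range(n_exposures):
        strength["B"] += rate * (1.0 - strength["B"])  # B was heard: positive error
        strength["A"] += rate * (0.0 - strength["A"])  # A was predicted but not heard
    return strength

print(preemption_sim())  # strength of B approaches 1.0, strength of A approaches 0.0
```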
Construction learning
Because possible sentences form an open-ended set, it is not sufficient to simply memorize utterances that have been heard. Rather, learners must generalize over utterances in order to understand and produce new formulations. The learning of novel phrasal constructions involves learning to associate form with meaning, such as the double object pattern with “intended transfer.” Note, for example, that She mooped him something implies that she intends to give him something, and this meaning cannot be attributed to the nonsense word moop. Phrasal constructions, moreover, appear to be at least as good predictors of overall sentence meaning as individual verbs (Bencini & Goldberg 2000; Goldberg et al. 2005).
We have recently investigated the brain systems involved in learning novel constructions. While undergoing functional magnetic resonance imaging (fMRI), participants were shown short audiovisual clips that provided the opportunity to learn novel constructions. For example, a novel “appearance” construction consisted of various characters appearing on or in another object, with the word order Verb-NP_theme-NP_locative (where NP is noun phrase). For each construction, there was a patterned condition and a random condition. In the patterned condition, videos were consistently narrated by the V-NP_theme-NP_locative pattern, enabling participants to associate the abstract form and meaning. In the random condition, the exact same videos were shown in the same order, but the words were randomly reordered; this inconsistency prevented successful construction learning. Most relevant to present purposes, we found an inverse relationship between ventral striatal (VS) activity and learning for patterned presentations only: Greater test accuracy on new instances (requiring generalization) was correlated with less ventral striatal activity during learning. In other tasks, VS gauges the discrepancy between predictions and outcomes, signaling that something new can be learned (Niv & Montague 2008; O'Doherty et al. 2004; Pagnoni et al. 2002). This activity may therefore suggest a role for prediction in construction learning: Better learning results in more accurate predictions of how the scene will unfold.
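The interpretation rests on a familiar reinforcement-learning idea: as a prediction improves, the error signal it generates shrinks. The sketch below illustrates only that general point; it is not the analysis used in the fMRI study, and the learning rate and outcome coding are arbitrary assumptions.

```python
def prediction_error_trace(n_trials=30, rate=0.3, outcome=1.0):
    """Track how a simple prediction-error signal shrinks over learning.

    V is the learner's prediction of how the scene will unfold.  On each
    patterned trial the same outcome occurs, so the error (outcome - V)
    decreases as V converges; a random condition would keep the error large.
    """
    V, errors = 0.0, []
    for _ in range(n_trials):
        error = outcome - V      # discrepancy between prediction and outcome
        errors.append(error)
        V += rate * error        # error-driven update of the prediction
    return errors

trace = prediction_error_trace()
print([round(e, 2) for e in trace[:5]], "...", round(trace[-1], 3))
```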
Such prediction-based learning may therefore be a natural consequence of making implicit predictions during language production and comprehension. Future research is needed to elucidate the scope of this prediction-based learning mechanism, and to understand its role in language. Such investigations would strengthen and ground P&G's proposal, and would suggest that predictions are central to both language use and language development.