Pickering & Garrod (P&G) argue forcefully that language production and language comprehension are richly interwoven, allowing for fluid, highly interactive discourse to unfold. They note that a key feature of language that makes such fluidity possible is the pervasive use of prediction. Speakers predict and monitor their own language as they speak, allowing them to plan ahead and self-correct, and listeners predict upcoming utterances as they listen. The authors in fact provide evidence for predictive strategies at every level of language use: from phonology, to lexical semantics, syntax, and pragmatics.
Given the ubiquity of prediction in language use, an interesting consideration that P&G touch on only briefly is how prediction may be involved in the initial formation of linguistic representations, that is, in language development. Indeed, the authors draw heavily from forward modeling, invoking the Wolpert models as a possible schematic for their dynamic, prediction-based system. And although their inclusion is surely appropriate for discourse and language use, these models are fundamentally models of learning (e.g., Wolpert 1997; Wolpert et al. 2001). Hence, the degree to which our predictions are fulfilled (or violated) might have enormous consequences for linguistic representations and, ultimately, for the predictions we make in the future. More generally, prediction has long been viewed as essential to learning (e.g., Rescorla & Wagner 1972).
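As a point of reference (the formalization is the standard statement of the Rescorla-Wagner model, not something developed in P&G's target article or in this commentary), the error-driven intuition can be written as a single update rule in which learning is proportional to prediction error:

\[
\Delta V_X = \alpha_X \, \beta \, \Bigl(\lambda - \sum_i V_i\Bigr)
\]

where \(V_X\) is the associative strength of cue \(X\), \(\alpha_X\) and \(\beta\) are learning-rate parameters for the cue and the outcome, \(\lambda\) is the asymptote supported by the outcome, and \(\sum_i V_i\) is the total prediction generated by all cues present on the trial. When prediction matches outcome, the error term is zero and nothing changes; representations are updated only to the extent that predictions fail.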
Prediction might play an important role in language development in several ways: in tracking transitional probabilities, in avoiding overgeneralizations, and in mapping form to meaning in novel phrasal constructions. Each of these three cases is described below.
Transitional probabilities
Extracting the probability of Q given P can be useful in initial word segmentation (Graf Estes et al. 2007; Saffran et al. 1996), word learning (Hay et al. 2011; Mirman et al. 2008), and grammar learning (Gomez & Gerken 1999; Saffran 2002). A compelling way to interpret the contribution of transitional probabilities to learning is that P allows learners to form an expectation of Q (Turk-Browne et al. 2010). In fact, sensitivity to transitional probabilities correlates positively with the ability to use word predictability to facilitate comprehension under noisy input conditions (Conway et al. 2010). Moreover, sensitivity to sequential expectations also correlates positively with the ability to successfully process complex, long-distance dependencies in natural language (Misyak et al. 2010). Simple recurrent networks (SRNs) rely on prediction error to correct connection weights, and appear to learn certain aspects of language in much the same way as children do (Elman 1991; 1993; Lewis & Elman 2001; French et al. 2011).
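For concreteness, the sketch below shows how the relevant statistic can be computed over a syllable stream. It is a minimal illustration of transitional probabilities, not code from any of the cited studies, and the toy stream and word inventory are invented.

```python
from collections import Counter

def transitional_probabilities(syllables):
    """Estimate P(Q | P) for each adjacent syllable pair in a stream.

    High values tend to occur inside words; dips suggest likely word
    boundaries, the cue exploited in statistical segmentation studies.
    """
    pair_counts = Counter(zip(syllables, syllables[1:]))
    first_counts = Counter(syllables[:-1])
    return {(p, q): n / first_counts[p] for (p, q), n in pair_counts.items()}

# Toy stream built by concatenating three invented trisyllabic "words".
stream = "bi da ku pa do ti go la bu bi da ku go la bu pa do ti bi da ku".split()
for pair, tp in sorted(transitional_probabilities(stream).items(), key=lambda kv: -kv[1]):
    print(pair, round(tp, 2))
```

In this toy stream, within-word pairs (e.g., bi followed by da) come out at 1.0, while pairs spanning a word boundary come out at 0.5 or lower, which is exactly the contrast an expectation-based learner can exploit.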
Statistical preemption
Children routinely make overgeneralization errors, producing foots instead of feet, or She disappeared the quarter instead of She made the quarter disappear. A number of theorists have suggested that learners implicitly predict upcoming formulations and compare witnessed formulations to their predictions, resulting in error-driven learning. That is, in contexts in which A is expected or predicted but B is repeatedly used instead, children learn that B, not A, is the appropriate formulation – B statistically preempts A. Preemption is well accepted in morphology (e.g., went preempts goed; Aronoff 1976; Kiparsky 1982).
Unlike went and goed, distinct phrasal constructions are virtually never semantically and pragmatically identical. Nonetheless, if learners consistently witness one construction in contexts where they might have expected to hear another, the former can statistically preempt the latter (Goldberg 1995; 2006; 2011; Marcotte 2005). For example, if learners expect to hear disappear used transitively in relevant contexts (e.g., She disappeared it), but instead consistently hear it used periphrastically (e.g., She made it disappear), they appear to readjust future predictions so that they ultimately prefer the periphrastic causative (Boyd & Goldberg 2011; Brooks & Tomasello 1999; Suttle & Goldberg forthcoming).
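The logic can be cashed out as a toy error-driven competition between the two candidate formulations. The sketch below is purely illustrative: the update rule, parameters, and starting strengths are our assumptions, not a model reported in the cited work.

```python
def preemption_sim(n_exposures=50, rate=0.2):
    """Toy error-driven competition between two formulations of one message.

    In a 'caused-disappearance' context the learner entertains A
    ("She disappeared it") and B ("She made it disappear").  Every
    exposure is to B, so B's strength is nudged toward 1 and A's
    toward 0; eventually B preempts A.
    """
    strength = {"A": 0.5, "B": 0.5}
    for _ in range(n_exposures):
        strength["B"] += rate * (1.0 - strength["B"])  # B was heard: positive error
        strength["A"] += rate * (0.0 - strength["A"])  # A was predicted but not heard
    return strength

print(preemption_sim())  # strength of B approaches 1.0, strength of A approaches 0.0
```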
Construction learning
Because possible sentences form an open-ended set, it is not sufficient to simply memorize utterances that have been heard. Rather, learners must generalize over utterances in order to understand and produce new formulations. The learning of novel phrasal constructions involves learning to associate form with meaning, such as the double object pattern with “intended transfer.” Note, for example, that She mooped him something implies that she intends to give him something, and this meaning cannot be attributed to the nonsense word moop. Phrasal constructions, moreover, appear to be at least as good predictors of overall sentence meaning as individual verbs (Bencini & Goldberg 2000; Goldberg et al. 2005).
We have recently investigated the brain systems involved in learning novel constructions. While undergoing functional magnetic resonance imaging (fMRI), participants were shown short audiovisual clips that provided the opportunity to learn novel constructions. For example, a novel “appearance” construction consisted of various characters appearing on or in another object, with the word order Verb-NP_theme-NP_locative (where NP is noun phrase). For each construction, there was a patterned condition and a random condition. In the patterned condition, videos were consistently narrated by the V-NP_theme-NP_locative pattern, enabling participants to associate the abstract form and meaning. In the random condition, the exact same videos were shown in the same order, but the words were randomly reordered; this inconsistency prevented successful construction learning. Most relevant to present purposes, we found an inverse relationship between ventral striatal (VS) activity and learning for patterned presentations only: Greater test accuracy on new instances (requiring generalization) was correlated with less ventral striatal activity during learning. In other tasks, VS gauges the discrepancy between predictions and outcomes, signaling that something new can be learned (Niv & Montague 2008; O'Doherty et al. 2004; Pagnoni et al. 2002). This activity may therefore suggest a role for prediction in construction learning: Better learning results in more accurate predictions of how the scene will unfold.
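The interpretation rests on a familiar reinforcement-learning idea: as a prediction improves, the error signal it generates shrinks. The sketch below illustrates only that general point; it is not the analysis used in the fMRI study, and the learning rate and outcome coding are arbitrary assumptions.

```python
def prediction_error_trace(n_trials=30, rate=0.3, outcome=1.0):
    """Track how a simple prediction-error signal shrinks over learning.

    V is the learner's prediction of how the scene will unfold.  On each
    patterned trial the same outcome occurs, so the error (outcome - V)
    decreases as V converges; a random condition would keep the error large.
    """
    V, errors = 0.0, []
    for _ in range(n_trials):
        error = outcome - V      # discrepancy between prediction and outcome
        errors.append(error)
        V += rate * error        # error-driven update of the prediction
    return errors

trace = prediction_error_trace()
print([round(e, 2) for e in trace[:5]], "...", round(trace[-1], 3))
```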
Such prediction-based learning may therefore be a natural consequence of making implicit predictions during language production and comprehension. Future research is needed to elucidate the scope of this prediction-based learning mechanism, and to understand its role in language. Such investigations would strengthen and ground P&G's proposal, and would suggest that predictions are central to both language use and language development.