
Seeking predictions from a predictive framework

Published online by Cambridge University Press:  24 June 2013

T. Florian Jaeger
Affiliation:
Department of Brain and Cognitive Sciences, University of Rochester, Rochester, NY 14627-0268. fjaeger@bcs.rochester.edu, http://www.hlp.rochester.edu/
Department of Computer Science, University of Rochester, Rochester, NY 14627.
Victor Ferreira
Affiliation:
Department of Psychology 0109, University of California, San Diego, La Jolla, CA 92093-0109. vferreira@ucsd.edu, http://lpl.ucsd.edu/

Abstract

We welcome the proposal to use forward models to understand predictive processes in language processing. However, Pickering & Garrod (P&G) miss the opportunity to provide a strong framework for future work. Forward models need to be pursued in the context of learning. This naturally leads to questions about what prediction error these models aim to minimize.

Type: Open Peer Commentary
Copyright: © Cambridge University Press 2013

Pickering & Garrod (P&G) are not the first to propose that comprehension is a predictive process (e.g., Hale 2001; Levy 2008; Ramscar et al. 2010). Similarly, recent work has found that language production is sensitive to prediction in ways closely resembling comprehension (e.g., Aylett & Turk 2004; Jaeger 2010). We believe that forward models (1) offer an elegant account of prediction effects and (2) provide a framework that could generate novel predictions and guide future work. However, in our view, the proposal by P&G fails to advance either goal because it does not take into account two important properties of forward models. The first is learning; the second is the nature of the prediction error that the forward model is minimizing.

Learning

Forward models have been a successful framework for motor control in large part because they provide a unifying framework not only for prediction, but also for learning. Since their inception, forward models have been used to study learning, both acquisition and adaptation throughout life. However, except for a brief mention of "tuning" (target article, sect. 3.1, para. 15), P&G do not discuss what predictions their framework makes for implicit learning during language production. Yet construing language processing as prediction in the context of learning readily explains otherwise puzzling findings from production (e.g., Roche et al. 2013; Warker & Dell 2006), comprehension (e.g., Clayards et al. 2008; Farmer et al. 2013; Kleinschmidt et al. 2012), and acquisition (Ramscar et al. 2010). If connected to learning, forward models can explain how we learn to align our predictions during dialogue (i.e., learning in order to reduce future prediction errors; Fine et al., submitted; Jaeger & Snider 2013; for related ideas, see also Chang et al. 2006; Fine & Jaeger 2013; Kleinschmidt & Jaeger 2011; Sonderegger & Yu 2010).
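To make the connection concrete, consider the following minimal, purely illustrative sketch (our own construal, not P&G's implementation and not any specific published model): a linear forward model maps an efference copy of a production command onto a predicted percept, and the resulting prediction error drives error-driven adaptation.

```python
import numpy as np

# Illustrative sketch (our construal, not P&G's implementation): a forward
# model maps an efference copy of a production command onto a predicted
# percept, and the prediction error drives learning (a simple delta rule).

rng = np.random.default_rng(0)
n_features = 8

# True mapping from production commands to percepts (stands in for the
# speaker's articulatory/auditory plant; unknown to the forward model).
true_mapping = rng.normal(size=(n_features, n_features))

# The learner's forward model, initialised naively and adapted over trials.
forward_model = np.zeros((n_features, n_features))
learning_rate = 0.05

for trial in range(500):
    efference_copy = rng.normal(size=n_features)       # copy of the production command
    predicted_percept = forward_model @ efference_copy
    actual_percept = true_mapping @ efference_copy      # what monitoring returns
    prediction_error = actual_percept - predicted_percept
    # Error-driven adaptation: reduce future prediction errors.
    forward_model += learning_rate * np.outer(prediction_error, efference_copy)

print("final mean squared prediction error:", np.mean(prediction_error ** 2))
```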

Prediction errors

Deriving testable predictions from forward models is integrally tied to the nature of the prediction error that the system is meant to minimize during self- and other-monitoring (i.e., the function of the model, cf. Guenther et al. 1998). P&G do not explicitly address this. They do, however, propose separate forward models at all levels of linguistic representation. These forward models seem to have just one function: to predict the perceived linguistic unit at each level. For example, the syntactic forward model predicts the "syntactic percept," which is used to decide whether the production plan needs to be adjusted (how this comparison proceeds and what determines its outcome is left unspecified).
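The following toy sketch captures our schematic reading of this architecture; all names, types, and the equality-based comparison are ours, introduced only for illustration: one forward model per linguistic level, each predicting the percept at its level, plus a comparator that flags the levels at which the production plan might need adjustment.

```python
from dataclasses import dataclass
from typing import Any, Callable

# Schematic reading of P&G's proposal (names and types are ours): one forward
# model per linguistic level, each predicting the percept at that level, plus
# a comparator that decides whether the production plan needs adjusting.

@dataclass
class LevelForwardModel:
    level: str                        # e.g. "semantic", "syntactic", "phonological"
    predict: Callable[[Any], Any]     # efference copy -> predicted percept at this level

def monitor(forward_models, efference_copies, actual_percepts):
    """Compare each level's predicted percept with the actual percept.

    Returns the levels at which a prediction error was detected. How the
    comparison proceeds is exactly what P&G leave unspecified, so strict
    equality is used here as a placeholder.
    """
    mismatched_levels = []
    for fm in forward_models:
        predicted = fm.predict(efference_copies[fm.level])
        if predicted != actual_percepts[fm.level]:   # placeholder comparison
            mismatched_levels.append(fm.level)
    return mismatched_levels

# Toy usage: a "syntactic" forward model that predicts a parse label.
syntactic = LevelForwardModel("syntactic", lambda plan: plan["expected_parse"])
errors = monitor(
    [syntactic],
    efference_copies={"syntactic": {"expected_parse": "NP-V-NP"}},
    actual_percepts={"syntactic": "NP-V-PP"},
)
print("adjust production plan at levels:", errors)   # ['syntactic']
```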

Minimizing communication error: A proposal

If one of the goals of language production is to be understood, or even to communicate the intended message both robustly and efficiently (Jaeger 2010; Lindblom 1990), then correctly predicting the intended linguistic units should matter only to the extent that failing to do so impedes being understood. The prediction error that forward models in production should aim to minimize is therefore not defined over the perception of linguistic units, but over the outcome of the entire inference process that constitutes comprehension. Support for this alternative view comes from work on motor control, work on articulation, and cross-linguistic properties of language.

For example, if the speaker produces an underspecified referential expression but is understood, there is no need to self-correct (as observed in research on conceptual pacts, Brennan & Clark 1996). This view would explain why only reductions of words with low confusability tend to enter the lexicon (e.g., "strodny," rather than "extrary," for "extraordinary"). If, however, the function of the forward model is to predict linguistic units, as P&G propose, no such generalization is expected. Rather, any deviation from the target phonology will cause a prediction error, regardless of whether it affects the likelihood of being understood. Similar reasoning applies to the reduction of morphosyntactic units, which is often blocked when it would cause systemic ambiguity (e.g., differential or optional case-marking, Fedzechkina et al. 2012; see also Ferreira 2008).
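A toy contrast makes the difference between the two error definitions explicit. The miniature lexicon and confusability judgments below are ours, for illustration only: a unit-level error fires whenever the produced form deviates from the target phonology, whereas a communication-level error fires only when the intended meaning is no longer recoverable.

```python
# Illustrative contrast (toy lexicon and confusability judgments are ours):
# unit-level error = any deviation from the target phonology;
# communication-level error = the listener fails to recover the intended word.

LEXICON = {
    "strodny": "extraordinary",   # low-confusability reduction: still recoverable
    "extrary": None,              # high-confusability reduction: not reliably recoverable
    "extraordinary": "extraordinary",
}

def unit_level_error(produced: str, target: str) -> bool:
    # Any deviation from the target phonology counts as a prediction error.
    return produced != target

def communication_level_error(produced: str, intended_meaning: str) -> bool:
    # Error only if the inferred meaning fails to match the intended one.
    inferred = LEXICON.get(produced)
    return inferred != intended_meaning

for form in ("extraordinary", "strodny", "extrary"):
    print(form,
          "| unit-level error:", unit_level_error(form, "extraordinary"),
          "| communication-level error:", communication_level_error(form, "extraordinary"))
```

On this construal, the reduced form "strodny" triggers a unit-level error but no communication-level error, which is exactly the pattern that would allow low-confusability reductions to enter the lexicon.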

Research on motor control finds that not all prediction errors are created equal: Stronger adaptation effects are found after task-relevant errors (Wei & Körding 2009). Indeed, in a recent perturbation study on production, Frank (2011) found that speakers exhibit stronger error correction if the perceived deviation from the intended acoustics makes the actual production more similar to an existing word (see also Perkell et al. 2004).

This view also addresses another shortcoming of P&G's proposal. At several points, P&G state that the forward models make impoverished predictions. Perhaps predictions are impoverished only in that they map the efference copy directly onto the predicted meaning (rather than the intermediate linguistic units).

Of course, the goal of reducing prediction error for efficient information transfer might best be achieved by reducing prediction errors at the levels assumed by P&G. In that case, the architecture P&G assume would follow from the more general principle described here. However, in a truly predictive learning framework (Clark 2013), there is no guarantee that the levels of representation such models would learn in order to minimize prediction errors map neatly onto those traditionally assumed (cf. Baayen et al. 2011).

Finally, we note that, in the architecture proposed by P&G, the production forward model seems to serve no purpose other than to provide the input to the comprehension forward model (sect. 3.1, Fig. 5; sect. 3.2, Fig. 6). Presumably, the output of, for example, the syntactic production forward model will be a syntactic plan. Hence, the syntactic comprehension forward model takes syntactic plans as input. The output of that comprehension forward model must be akin to a parse, as it is compared to the output of the actual comprehension model. Neither of these components seems to fulfill any independent purpose. Why not map straight from the syntactic efference copy to the predicted "syntactic percept"? If forward models are used as a computational framework, rather than as a metaphor, one of their strengths is that they can map efference copies directly onto the reference frame that is required for effective learning and minimization of the relevant prediction error (cf. Guenther et al. 1998).
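The architectural point can be sketched as follows (the function names and stand-in mappings are ours, introduced only for illustration): rather than chaining a production forward model into a comprehension forward model, a single forward model can map the efference copy directly onto the frame in which the communicatively relevant prediction error is defined.

```python
# Sketch of the architectural contrast (function names and stand-ins are ours):
# chaining a production forward model into a comprehension forward model versus
# mapping the efference copy directly onto the predicted interpretation.

def chained_prediction(efference_copy, production_fm, comprehension_fm):
    predicted_plan = production_fm(efference_copy)        # e.g., a syntactic plan
    predicted_percept = comprehension_fm(predicted_plan)  # e.g., a parse-like percept
    return predicted_percept

def direct_prediction(efference_copy, forward_model):
    # One mapping from the efference copy to the predicted interpretation,
    # i.e., the frame in which the communicatively relevant error is computed.
    return forward_model(efference_copy)

# Toy usage with stand-in mappings.
production_fm = lambda copy: {"plan": copy}
comprehension_fm = lambda plan: {"parse": plan["plan"]}
direct_fm = lambda copy: {"interpretation": copy}

print(chained_prediction("S -> NP VP", production_fm, comprehension_fm))
print(direct_prediction("S -> NP VP", direct_fm))
```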

References

Aylett, M. & Turk, A. (2004) The smooth signal redundancy hypothesis: A functional explanation for relationships between redundancy, prosodic prominence, and duration in spontaneous speech. Language and Speech 47(1):31–56.
Baayen, R. H., Milin, P., Filipović Durdević, D., Hendrix, P. & Marelli, M. (2011) An amorphous model for morphological processing in visual comprehension based on naive discriminative learning. Psychological Review 118:438–82.
Brennan, S. E. & Clark, H. H. (1996) Conceptual pacts and lexical choice in conversation. Journal of Experimental Psychology: Learning, Memory, and Cognition 22(6):1482–93.
Chang, F., Dell, G. S. & Bock, K. (2006) Becoming syntactic. Psychological Review 113(2):234–72.
Clark, A. (2013) Whatever next? Predictive brains, situated agents, and the future of cognitive science. Behavioral and Brain Sciences 36(3):181–253.
Clayards, M., Tanenhaus, M. K., Aslin, R. N. & Jacobs, R. A. (2008) Perception of speech reflects optimal use of probabilistic speech cues. Cognition 108(3):804–09.
Farmer, T. A., Brown, M. & Tanenhaus, M. K. (2013) Prediction, explanation, and the role of generative models in language processing. Behavioral and Brain Sciences 36(3):211–12.
Fedzechkina, M., Jaeger, T. F. & Newport, E. (2012) Language learners restructure their input to facilitate efficient communication. Proceedings of the National Academy of Sciences of the United States of America 109(44):17897–902.
Ferreira, V. S. (2008) Ambiguity, accessibility, and a division of labor for communicative success. Psychology of Learning and Motivation 49:209–46.
Fine, A. B. & Jaeger, T. F. (2013) Evidence for implicit learning in syntactic comprehension. Cognitive Science 37(3):578–91.
Fine, A. B., Jaeger, T. F., Farmer, T. & Qian, T. (submitted) Rapid linguistic adaptation during syntactic comprehension.
Frank, A. (2011) Integrating linguistic, motor, and perceptual information in language production. Doctoral dissertation, University of Rochester.
Guenther, F. H., Hampson, M. & Johnson, D. (1998) A theoretical investigation of reference frames for the planning of speech movements. Psychological Review 105(4):611.
Hale, J. (2001) A probabilistic Earley parser as a psycholinguistic model. In: Proceedings of the North American Chapter of the Association for Computational Linguistics (NAACL), vol. 2, pp. 1–8. Association for Computational Linguistics.
Jaeger, T. F. (2010) Redundancy and reduction: Speakers manage syntactic information density. Cognitive Psychology 61:23–62.
Jaeger, T. F. & Snider, N. (2013) Alignment as a consequence of expectation adaptation: Syntactic priming is affected by the prime's prediction error given both prior and recent experience. Cognition 127(1):57–83.
Kleinschmidt, D., Fine, A. B. & Jaeger, T. F. (2012) A belief-updating model of adaptation and cue combination in syntactic comprehension. Proceedings of the 34th Annual Meeting of the Cognitive Science Society (CogSci12), pp. 605–10.
Kleinschmidt, D. & Jaeger, T. F. (2011) A Bayesian belief updating model of phonetic recalibration and selective adaptation. Proceedings of the Cognitive Modeling and Computational Linguistics Workshop at ACL, Portland, OR, June 23rd, pp. 10–19.
Levy, R. (2008) Expectation-based syntactic comprehension. Cognition 106(3):1126–77.
Lindblom, B. (1990) Explaining phonetic variation: A sketch of the H&H theory. Speech Production and Speech Modelling 55:403–39.
Perkell, J. S., Guenther, F. H., Lane, H., Matthies, M. L., Stockmann, E., Tiede, M. & Zandipour, M. (2004) The distinctness of speakers' productions of vowel contrasts is related to their discrimination of the contrasts. The Journal of the Acoustical Society of America 116:2338.
Ramscar, M., Yarlett, D., Dye, M., Denny, K. & Thorpe, K. (2010) The effects of feature-label-order and their implications for symbolic learning. Cognitive Science 34(6):909–57.
Roche, J., Dale, R., Kreuz, R. J. & Jaeger, T. F. (2013) Learning to avoid syntactic ambiguity. Ms., University of Rochester.
Sonderegger, M. & Yu, A. (2010) A rational account of perceptual compensation for coarticulation. In: Proceedings of the 32nd Annual Meeting of the Cognitive Science Society (CogSci10).
Warker, J. A. & Dell, G. S. (2006) Speech errors reflect newly learned phonotactic constraints. Journal of Experimental Psychology: Learning, Memory, and Cognition 32(2):387.
Wei, K. & Körding, K. (2009) Relevance of error: What drives motor adaptation? Journal of Neurophysiology 101(2):655–64.