
The poor helping the rich: How can incomplete representations monitor complete ones?

Published online by Cambridge University Press: 24 June 2013

Kristof Strijkers
Affiliation:
Laboratoire de Psychologie Cognitive, CNRS and Université d'Aix-Marseille, 13331 Marseille, France. kristof.strijkers@gmail.com
Elin Runnqvist
Affiliation:
Departamento de Psicología Básica, Universitat de Barcelona, Barcelona 08035, Spain. elin_runnquist@yahoo.es
Albert Costa
Affiliation:
Universitat Pompeu Fabra, Center for Brain and Cognition, ICREA, Barcelona 08018, Spain. costalbert@gmail.com
Phillip Holcomb
Affiliation:
Department of Psychology, Tufts University, Medford, MA 02155. pholcomb@tufts.edu

Abstract

Pickering & Garrod (P&G) propose that inner speech monitoring is subserved by predictions stemming from fast forward modeling. In this commentary, we question this alignment of language prediction with the inner speech monitor. We wonder how the speech monitor can function so efficiently if it is based on incomplete representations.

Type
Open Peer Commentary
Copyright
Copyright © Cambridge University Press 2013 

Pickering & Garrod's (P&G's) integrated account of language production and comprehension brings forward novel cognitive mechanisms for a range of language processing functions. Here we would like to focus on the theoretical development of the speech monitor in P&G's theory and the evidence cited in support of it. The authors propose that we construct forward models of predicted percepts during language production and that these predictions form the basis to internally monitor and, if necessary, correct our speech. This view of a speech monitor grounded in domain-general action control is refreshing in many ways. Nevertheless, in our opinion, it raises a general theoretical concern, at least in the form in which it is implemented in P&G's model. Furthermore, we believe it is important to emphasize that the evidence cited in support of the use of forward modeling in speech monitoring is suggestive, but far from direct support for the theory.

In general terms, we question the rationale behind the proposal that incomplete representations constitute the basis of speech monitoring. A crucial aspect of P&G's model refers to timing. Because forward representations are computed faster than the actual representations that will be used to produce speech, the former serve to correct potential deviations in the latter representations. To ensure that the forward representations are available earlier than the implemented representations, P&G propose that the percepts constructed by the forward model are impoverished, containing only part of the information necessary to produce speech. But how can speech monitoring be efficient if it relies on “poor” representations to monitor the “rich” representations? For instance, a predicted syntactic percept could include grammatical category, but lack number and gender information.

In this example, it is evident that if the slower production implementer is erroneously preparing a verb instead of a noun, the predicted representation coming from the forward model might indeed serve to detect and correct the error prior to speech proper. However, if the representation prepared by the production implementer contains a number or gender error, information that is not specified in the predicted percept in this example, how do we avoid these errors when speaking? If the predicted language percepts are assumed to always be incomplete in order to be available early in the process, it is truly remarkable that an average speaker produces only about 1 error every 1,000 words (e.g., Levelt et al. 1999). Therefore, although prediction likely plays an important role in facilitating the retrieval of relevant language representations (e.g., Federmeier 2007; Strijkers et al. 2011) and hence could also serve to aid the speech monitor, a proposal that identifies predictive processes with the inner speech monitor seems problematic, or at least underspecified, for now.
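To make the concern concrete, the following minimal sketch illustrates the logic of the argument; it is our toy illustration, not P&G's implementation, and the feature labels and representation format are assumptions chosen purely for exposition. A monitor that can only compare the features an impoverished predicted percept actually specifies will catch a category error but let a number or gender error pass undetected.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SyntacticPercept:
    """Toy bundle of syntactic features; None marks a feature the percept leaves unspecified."""
    category: Optional[str] = None   # e.g., "noun" or "verb"
    number: Optional[str] = None     # e.g., "singular" or "plural"
    gender: Optional[str] = None     # e.g., "masculine" or "feminine"


def monitor(predicted: SyntacticPercept, implemented: SyntacticPercept) -> List[str]:
    """Return the features on which the implemented representation deviates from the
    prediction, but only for features the impoverished prediction specifies."""
    mismatches = []
    for feature in ("category", "number", "gender"):
        predicted_value = getattr(predicted, feature)
        if predicted_value is not None and predicted_value != getattr(implemented, feature):
            mismatches.append(feature)
    return mismatches


# Impoverished predicted percept: grammatical category only (as in the example above).
predicted = SyntacticPercept(category="noun")

# The production implementer erroneously prepares a plural form of a singular target noun.
implemented = SyntacticPercept(category="noun", number="plural", gender="feminine")

print(monitor(predicted, implemented))                         # [] -> the number error goes undetected
print(monitor(predicted, SyntacticPercept(category="verb")))   # ['category'] -> only category errors are caught
```

Any feature the prediction omits is simply invisible to such a comparison, which is the gap between impoverished predictions and near-errorless speech that we highlight here.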

Besides the theoretical concern above regarding the use of incomplete representations as the basis of speech monitoring, the strength of the evidence cited to support the use of forward modeling in speech production also seems insufficient at present. The three studies discussed by P&G to illustrate the use of efference copies during speech production (i.e., Heinks-Maldonado et al. 2006; Tian & Poeppel 2010; Tourville et al. 2008) demonstrate that shifting acoustic properties of linguistic elements in the auditory feedback given to a speaker produces early auditory response enhancements. Although these data are suggestive and merit further investigation, showing that reafference cancellation generalizes to self-induced sounds does not prove that forward modeling is used for language production per se. It merely highlights that a mismatch between predicted and actual self-induced sounds (linguistic or not) produces an enhanced sensory response, just as in other domains of self-induced action (e.g., tickling). As of now, no study has explored whether auditory suppression related to self-induced sounds is also sensitive to purely linguistic phenomena (e.g., lexical frequency) or to variables known to affect speech monitoring (e.g., lexicality). This leaves open the possibility that the evidence cited relates only to general sensorimotor properties of speech (acoustics and articulation) rather than to the monitoring of language proper.

In addition, the temporal arguments put forward by P&G to conclude that these data cannot be explained by comprehension-based accounts and instead support the notion of speech monitoring through prediction are premature. For instance, P&G take the speed with which auditory suppression for self-induced sounds occurs (around 100 ms after speech onset) as an indication that speakers could not be comprehending what they heard, and argue that this favors a role of forward modeling in speech production. But the speed with which lexical representations are activated in speech perception is still a debated issue, and some studies provide evidence for access to words within 100 ms (e.g., MacGregor et al. 2012; Pulvermüller & Shtyrov 2006). In a similar vein, P&G rely on Indefrey and Levelt's (2004) temporal estimates of word production to argue in favor of speech monitoring through prediction. However, this temporal map is still hypothetical, especially in terms of the latencies between the different representational formats (see Strijkers & Costa 2011). More generally, one may question why P&G choose to link the proposed model, intended to be highly dynamic (rejecting the "cognitive sandwich"), with temporal data embedded in fully serial models. Indeed, if one abandons the strictly sequential time course of such models and instead allows for fast, cascading activation of the different linguistic representations, not only do P&G's arguments become problematic, but the notion of a slow production/comprehension implementer being monitored by a fast but incomplete forward model also loses a critical aspect of its theoretical motivation.

To sum up, we believe that the theoretical development of the speech monitor in P&G's integrated account of language production and comprehension faces a major challenge, since it needs to explain how representations that lack certain dimensions of information can serve to detect and correct errors to such a high, almost errorless, degree. Furthermore, it is important to acknowledge that, as it stands, the evidence used in support of this proposal could just as easily be reinterpreted in other terms, highlighting the need for direct empirical exploration of P&G's proposal.

References

Federmeier, K. D. (2007) Thinking ahead: The role and roots of prediction in language comprehension. Psychophysiology 44:491–505.
Heinks-Maldonado, T. H., Nagarajan, S. S. & Houde, J. F. (2006) Magnetoencephalographic evidence for a precise forward model in speech production. NeuroReport 17(13):1375–79.
Indefrey, P. & Levelt, W. J. M. (2004) The spatial and temporal signatures of word production components. Cognition 92:101–44.
Levelt, W. J. M., Roelofs, A. & Meyer, A. S. (1999) A theory of lexical access in speech production. Behavioral and Brain Sciences 22(1):1–75.
MacGregor, L. J., Pulvermüller, F., van Casteren, M. & Shtyrov, Y. (2012) Ultra-rapid access to words in the brain. Nature Communications 3:711.
Pulvermüller, F. & Shtyrov, Y. (2006) Language outside the focus of attention: The mismatch negativity as a tool for studying higher cognitive processes. Progress in Neurobiology 79:49–71.
Strijkers, K. & Costa, A. (2011) Riding the lexical speedway: A critical review on the time course of lexical selection in speech production. Frontiers in Psychology 2:356.
Strijkers, K., Holcomb, P. & Costa, A. (2011) Conscious intention to speak facilitates lexical access during overt object naming. Journal of Memory and Language 65:345–62.
Tian, X. & Poeppel, D. (2010) Mental imagery of speech and movement implicates the dynamics of internal forward models. Frontiers in Psychology 1:166.
Tourville, J. A., Reilly, K. J. & Guenther, F. H. (2008) Neural mechanisms underlying auditory feedback control of speech. NeuroImage 39:1429–43.