Evidence for, and predictions from, forward modeling in language production

Published online by Cambridge University Press:  24 June 2013

F.-Xavier Alario
Affiliation:
Laboratoire de Psychologie Cognitive, Aix-Marseille Université & CNRS, 13003 Marseille, France. francois-xavier.alario@univ-amu.fr http://www.univ-provence.fr/wlpc/alario
Carlos M. Hamamé
Affiliation:
Laboratoire de Psychologie Cognitive, Aix-Marseille Université & CNRS, 13003 Marseille, France. carlos-miguel.hamame@univ-amu.fr http://www.researchgate.net/profile/Carlos_Hamame2/

Abstract

Pickering & Garrod (P&G) put forward the interesting idea that language production relies on forward modeling operating at multiple processing levels. The evidence currently available to substantiate this idea mostly concerns sensorimotor processes and not more abstract linguistic levels (e.g., syntax, semantics, phonology). The predictions that follow from the claim seem too general, in their current form, to guide specific empirical tests.

Type
Open Peer Commentary
Copyright
Copyright © Cambridge University Press 2013 

A central aspect of Pickering & Garrod's (P&G's) target article is that language production relies on forward modeling processes. These are explicitly described as involving semantic, syntactic, and phonological representations. Here we will attempt to challenge this aspect of their proposal on two grounds: the evidence available for such a claim, and the predictions that may follow from it.

P&G state that there is good evidence for the use of forward models during speech production. This statement must be qualified or clarified. The experimental evidence put forth relies on quite specific language production situations (e.g., vowel articulation or repeated production of very few simple and similar-sounding words). Such linguistic specificity in the stimuli means that the resulting data may not apply to processes other than articulatory motor control. It is dubious that the evidence provides much information about semantic, syntactic, and possibly phonological processes, either in the speech production implementer or in the feedforward model.

Neurophysiological evidence supporting forward modeling in speech production comes from intracranial-EEG studies showing auditory cortex suppression during vocalization (Flinker et al. 2010; Towle et al. 2008). In the visual system, well-identified motor-visual pathways subserve a similar suppression of sensory activity during eye movements (Sommer & Wurtz 2008). Consequently, auditory suppression during speech is attributed to an efference copy which exerts its influence from Broca's area to the auditory cortex via the inferior parietal lobe (Rauschecker & Scott 2009; Tourville & Guenther 2011). Although there is anatomical evidence for such a pathway (Frey et al. 2008), attempts to test the functionality of this connection during speech have proved inconclusive (Flinker et al. 2010; Towle et al. 2008). The largely unclear matter of which motor-auditory pathway could carry such an efference copy is not considered in the target article. More generally, it is difficult to foresee which pathways may underlie the transmission of an efference copy, should linguistic information be involved (semantics, phonology, and syntax).

P&G also refer to previous theoretical work to support their generalization of feed-forward models beyond sensorimotor processes into linguistic levels. They use as an example the previous generalization of a motor control theory (MOSAIC) to a hierarchical version (HMOSAIC) that controls complex sequences of actions. This parallel has not helped to clarify matters. “The HMOSAIC model suggests that there are multiple levels of representation within the sensorimotor system” (Haruno et al. 2003, p. 11). Hence, the HMOSAIC model seems specific to the sensorimotor system, and not necessarily applicable to more abstract levels of representation and processing, which is a core assumption of P&G's proposal.

Scalp-EEG evidence consistent with the hypothesis that language production is monitored by a general-purpose mechanism can be found in Riès et al. (2011). Using a grammatical gender decision task (a proxy for lexical access) and a standard picture naming task, those authors reported postresponse EEG waves very similar to those linked to response monitoring in nonlinguistic tasks (e.g., error-related negativity). In the speech task, the onset of these waves preceded the onset of overt response. Although this timing feature has sometimes been taken as a signature of efference copy (Gehring et al. 1993), it could also reflect the engagement of internal loop monitoring (Riès et al. 2011).

In the absence of strong evidence for some of P&G's claims on forward modeling in language production, it is appropriate to examine the predictions that follow from them, and to gauge how they might guide future empirical tests.

The only explicitly stated prediction regarding forward modeling in language production is worded in broad terms: “[speakers] should detect semantic errors before syntactic errors, and should [detect] syntactic errors before phonological errors” (target article, sect. 3.1, para. 23). Although this statement is clear, there are two requirements for testing the relative ordering of error occurrence: that the errors are unambiguously classified, and that a moment of error detection can be defined and measured. These requirements seem difficult to meet in the absence of (some form of) overt response. Yet the production of such an overt response would complicate the attribution of the detection to forward modeling versus external loop processes. For timing, a common reference time point is required that is available across utterances. A reasonable proxy in experimental setups is stimulus onset, but for narrative speech or dialogue such an event is not easily defined. Testing this prediction is further complicated because “it is not necessary that the predicted representations are computed sequentially […] the syntactic prediction need not be ready before the phonological prediction” (sect. 3.1, para. 10). This clearly opens the possibility of reordering the sequence in which the parallel outputs of the implemented production and the forward model are compared.

A different tentative prediction can be constructed from P&G's proposal. The feedforward model is hypothesized to involve “impoverished representations [that] leave out (or simplify) many components of the implemented representations” (sect. 3.1, para. 6). P&G provide various examples of components that “might” (sect. 3.1, para. 9 onwards) be left out. It is not stated whether such opt-out is circumstantial (i.e., whether a component is left out or not depends on the speech act) or systematic (i.e., a component is always omitted from the feedforward model). In the latter case, omitted components would be susceptible only to external error detection and monitoring, whereas included components should be internally detectable and correctable. Checking whether these general statements are amenable to a specific testable hypothesis, contrasting feedforward with inner and overt speech-monitoring performance (e.g., Oppenheim & Dell 2010), would require more space than this commentary can accommodate.

In short, we submit that the evidence presented by P&G for forward models in language production concerns only limited aspects of this behavior, these being primarily sensorimotor processes (i.e., articulatory processes for speech). No currently available evidence calls for a generalization to more abstract levels. On the other hand, the predictions that may follow from this aspect of P&G's proposal are, in their current form, too general or unconstrained to guide specific empirical tests. These specific points notwithstanding, P&G's proposal provides a stimulating impetus for combining psychological and neurophysiological evidence more closely.

ACKNOWLEDGMENT

Funding from the European Research Council under the European Community's Seventh Framework Program (FP7/2007–2013 Grant agreement 263575). Institutional support from “Fédération de Recherche 3C” and the “Brain and Language Research Institute,” both at Aix-Marseille Université. We thank Marieke Longcamp for comments and Dashiel Munding for comments and native proofreading.

References

Flinker, A., Chang, E. F., Kirsch, H. E., Barbaro, N. M., Crone, N. E. & Knight, R. T. (2010) Single-trial speech suppression of auditory cortex activity in humans. Journal of Neuroscience 30:16643–50.
Frey, S., Campbell, J. S., Pike, G. B. & Petrides, M. (2008) Dissociating the human language pathways with high angular resolution diffusion fiber tractography. Journal of Neuroscience 28:11435–44.
Gehring, W. J., Goss, B., Coles, M. G. H., Meyer, D. E. & Donchin, E. (1993) A neural system for error detection and compensation. Psychological Science 4:385–90.
Haruno, M., Wolpert, D. M. & Kawato, M. (2003) Hierarchical MOSAIC for movement generation. International Congress Series 1250:575–90.
Oppenheim, G. M. & Dell, G. S. (2010) Motor movement matters: The flexible abstractness of inner speech. Memory & Cognition 38(8):1147–60. DOI:10.1016/j.cognition.2007.02.006.
Rauschecker, J. P. & Scott, S. K. (2009) Maps and streams in the auditory cortex: Nonhuman primates illuminate human speech processing. Nature Neuroscience 12(6):718–24. DOI:10.1038/nn.2331.
Riès, S., Janssen, N., Dufau, S., Alario, F.-X. & Burle, B. (2011) General-purpose monitoring during speech production. Journal of Cognitive Neuroscience 23:1419–36.
Sommer, M. A. & Wurtz, R. H. (2008) Brain circuits for the internal monitoring of movements. Annual Review of Neuroscience 31:317–38.
Tourville, J. A. & Guenther, F. H. (2011) The DIVA model: A neural theory of speech acquisition and production. Language and Cognitive Processes 26:952–81.
Towle, V. L., Yoon, H. A., Castelle, M., Edgar, J. C., Biassou, N. M., Frim, D. M., Spire, J. P. & Kohrman, M. H. (2008) ECoG gamma activity during a language task: Differentiating expressive and receptive speech areas. Brain 131:2013–27.