Integrate, yes, but what and how? A computational approach of sensorimotor fusion in speech

Raphaël Laurent; Clément Moulin-Frier; Pierre Bessière; Jean-Luc Schwartz; Julien Diard

doi:10.1017/S0140525X12002634

Published online by Cambridge University Press: 24 June 2013

Raphaël Laurent: GIPSA-Lab – CNRS UMR 5216, Grenoble University, 38402 Saint Martin D'Hères Cedex, France; e-Motion team – INRIA Rhône-Alpes, 38334 Saint Ismier Cedex, France. Raphael.Laurent@gipsa-lab.grenoble-inp.fr; http://www.gipsa-lab.grenoble-inp.fr/page_pro.php?vid=1238
Clément Moulin-Frier: GIPSA-Lab – CNRS UMR 5216, Grenoble University, 38402 Saint Martin D'Hères Cedex, France; FLOWERS team – INRIA Bordeaux Sud-Ouest, 33405 Talence Cedex, France. clement.moulin-frier@inria.fr; http://www.gipsa-lab.grenoble-inp.fr/~clement.moulin-frier/cv_en.html
Pierre Bessière: e-Motion team – INRIA Rhône-Alpes, 38334 Saint Ismier Cedex, France; Laboratoire de Physiologie de la Perception et de l'Action – CNRS UMR 7152, Collège de France, 75005 Paris, France. Pierre.Bessiere@College-de-France.fr; http://www.Bayesian-Programming.org
Jean-Luc Schwartz: GIPSA-Lab – CNRS UMR 5216, Grenoble University, 38402 Saint Martin D'Hères Cedex, France. Jean-Luc.Schwartz@gipsa-lab.grenoble-inp.fr; http://www.gipsa-lab.grenoble-inp.fr/~jean-luc.schwartz/
Julien Diard: Laboratoire de Psychologie et NeuroCognition – CNRS UMR 5105, Grenoble University, 38040 Grenoble Cedex 9, France. Julien.Diard@upmf-grenoble.fr; http://diard.wordpress.com/

Abstract

We consider a computational model comparing the possible roles of “association” and “simulation” in phonetic decoding, demonstrating that these two routes can contain similar information in some “perfect” communication situations and highlighting situations where their decoding performance differs. We conclude that optimal decoding should involve some sort of fusion of association and simulation in the human brain.

Type: Open Peer Commentary
Information: Behavioral and Brain Sciences, Volume 36, Issue 4, August 2013, pp. 364–365

DOI: https://doi.org/10.1017/S0140525X12002634
Copyright: Copyright © Cambridge University Press 2013

In their target article, Pickering & Garrod (P&G) propose an ambitious model of language perception and production. It is centered on three main ingredients. First, it considers the complete hierarchy of layers of language processing, from message to semantics to syntax to phonology and, finally, to speech. Second, it features predictive forward models, so that temporally extended sequences, such as whole sentences and dialogues, can be processed. Third, it features dual processing routes, the “association” route and the “simulation” route, so that auditory and motor knowledge can be involved simultaneously; in doing so, it rejects the classic dichotomy between perception and action processes.

In this commentary, we set aside the temporal and hierarchical aspects and focus on the domain of speech perception and production, where sequences are typically short (e.g., syllable perception and production) and processing is limited to phonological decoding. Even in this more restricted field, the age-old debate between purely motor-based and purely sensory-based accounts of perception and production now appears to be a false dilemma (Schwartz et al. 2012). Indeed, neurophysiological and behavioral evidence strongly suggests a dual-route account of information processing in the central nervous system, with both a direct, associative route and an indirect, simulation route. The target article documents this evidence amply, so we do not repeat examples here.

In our view, the debate is now shifting toward the functional role of each route and how the two routes are integrated. In other words, the central questions are what is integrated, and how integration proceeds, in the human brain.

We would argue that conceptual models such as the one proposed in the target article will unfortunately have a difficult time shedding light on these questions. To support this argument, we consider the perceptual decoding of phonetic units, for which we have developed a computational framework (Moulin-Frier et al. 2012) based on Bayesian programming (Bessière et al. 2008; Colas et al. 2010; Lebeltel et al. 2004). With this framework, various models of speech perception can be simulated and quantitatively compared. One model is purely auditory, exploiting what P&G call “association.” A second model is purely motor, exploiting what they call “simulation.” A third one is sensory-motor, integrating the association and simulation processes.
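To make the three decoders concrete, here is one generic way to write them in Bayesian terms. This is a simplified illustration in our own notation, not the exact equations of Moulin-Frier et al. (2012): O denotes a phonetic unit, M a motor gesture, S an auditory input, and P_A(S | O) an auditory model learned by association. The sensory-motor line assumes, for simplicity, that the two routes supply conditionally independent evidence.

```latex
\begin{aligned}
\text{auditory (association):}\quad
  & P(O \mid S) \propto P(O)\, P_{A}(S \mid O) \\
\text{motor (simulation):}\quad
  & P(O \mid S) \propto P(O) \sum_{M} P(M \mid O)\, P(S \mid M) \\
\text{sensory-motor (fusion):}\quad
  & P(O \mid S) \propto P(O)\, P_{A}(S \mid O) \sum_{M} P(M \mid O)\, P(S \mid M)
\end{aligned}
```

If P_A(S | O) is trained on sounds generated through the same P(M | O) and P(S | M), and is expressive enough, it approaches Σ_M P(M | O) P(S | M), so the first two decoders coincide; they separate only when the listener's internal motor and sensory models differ from those that generated the sounds.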

All of these models can then be implemented and compared in various experimental configurations. Three major results emerge from such comparisons.

1. Under some hypotheses, with perfectly identified communication noise and no difference between the motor repertoires of the speaker and the listener (i.e., when conditions for speech communication are “perfect”), motor and auditory theories are indistinguishable. Therefore, the “association” and “simulation” routes provide exactly the same information in these perfect communication conditions. The reason is that, in our learning scenario, the auditory classifier is learned by association from data obtained through a motor production process, and has enough expressive power to represent what that process produces. This casts an interesting light on the question of what information is encoded in the association and simulation routes: Labeling a box as an “association” route in a conceptual model is not enough to guarantee that it differs, from an information-processing point of view, from another box of the model. Computational descriptions, however, must be precisely defined in rigorous mathematical notation, so their content can be systematically assessed. This also explains why behavioral evidence has historically been unable to discriminate between motor and auditory theories of perception and production: They are sometimes simply indistinguishable. Unfortunately, we believe this difficulty was not avoided in the target article, in particular when P&G detail experimental evidence for their model (e.g., target article, sect. 3.2.1, para. 7, “these four studies support forward modeling, but they do not discriminate between prediction-by-simulation and prediction-by-association”; and sect. 3.2.3, para. 6, “all of these findings provide support for the model of prediction-by-simulation […]. Of course, comprehenders may also perform prediction-by-association […].”).
2. In the general case where “perfect conditions” for communication are not met, mathematical comparison of the models highlights the respective roles of motor and auditory knowledge in speech perception under various adverse conditions. Therefore, the information provided by the “association” and “simulation” routes is more or less distinct and prominent depending on the communication conditions. In other words, adverse conditions provide leverage for discriminating between hypotheses about the perceptual and motor processes involved. This converges with recent findings from neuroimaging and transcranial magnetic stimulation (TMS) studies (D'Ausilio et al. 2012b; Meister et al. 2007; Zekveld et al. 2006), as well as from computational studies (Castellini et al. 2011).
3. In any case, sensory-motor fusion provides better perceptual performance than purely auditory or purely motor processes. Therefore, the complementary information provided by the “association” and “simulation” routes could be efficiently exploited in the framework of integrative theories such as those hinted at in the discussion of the target article. In the field of audiovisual perception, it is now well established that auditory and visual cues are complementary, and a great deal of work has already been done on sensor fusion. In our opinion, comparable work can now be done on how to integrate auditory and motor processes in speech perception. In this view, the proposal by P&G that “comprehenders emphasize whichever route is likely to be more accurate” (sect. 4, para. 6) can be regarded as a first candidate model, which would have to be made mathematically precise and compared with alternative explanations, possibly driven by neuroanatomical findings (e.g., both auditory and motor processes are performed automatically in parallel and compete, or they both bring information into an ongoing fusion process, etc.).
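The three results above can be illustrated with a deliberately tiny, hypothetical example. The sketch below is not the authors' Bayesian model; it assumes two phonetic units, one-dimensional motor and auditory variables on a shared grid, Gaussian motor repertoires, an identity articulatory-to-acoustic mapping with Gaussian noise, and a log-linear fusion rule, and every function name in it is made up for illustration. It shows that an association route derived from the speaker's productions and a simulation route using the listener's own repertoire give identical posteriors when the repertoires match (point 1), diverge when they do not (point 2), and can be combined by one simple candidate fusion rule (point 3).

```python
import math

# Hypothetical toy setting: two phonetic units O in {0, 1}, and 1-D motor (M)
# and auditory (S) variables sharing one discrete grid from -5.0 to 5.0.
GRID = [i * 0.1 for i in range(-50, 51)]


def gauss(x, mu, sigma):
    """Unnormalized Gaussian weight."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2)


def normalize(weights):
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def motor_repertoire(means, spread=0.8):
    """P(M | O): one discretized Gaussian over GRID per phonetic unit."""
    return [normalize([gauss(m, mu, spread) for m in GRID]) for mu in means]


def sensory_model(noise):
    """P(S | M) as rows indexed by M: assumed identity articulatory-to-acoustic
    map plus Gaussian communication noise."""
    return [normalize([gauss(s, m, noise) for s in GRID]) for m in GRID]


def sounds_given_unit(repertoire, p_s_given_m):
    """P(S | O) = sum over M of P(M | O) * P(S | M), as rows indexed by O."""
    n = len(GRID)
    return [[sum(p_m[j] * p_s_given_m[j][k] for j in range(n)) for k in range(n)]
            for p_m in repertoire]


def decode(p_s_given_o, k):
    """P(O | S = GRID[k]) under a uniform prior over units."""
    return normalize([row[k] for row in p_s_given_o])


def fuse(p_aud, p_mot, w_aud=1.0, w_mot=1.0):
    """Log-linear fusion of two route posteriors (uniform prior). Unit weights
    give the product rule for conditionally independent evidence; unequal
    weights emphasize the route assumed to be more reliable."""
    return normalize([a ** w_aud * b ** w_mot for a, b in zip(p_aud, p_mot)])


P_S_M = sensory_model(noise=1.0)  # noise assumed known to the listener
K = min(range(len(GRID)), key=lambda i: abs(GRID[i] - 0.3))  # heard sound S ~ 0.3

# Association route: auditory model derived from the speaker's productions
# (computed exactly here instead of being estimated from sampled tokens).
association = decode(sounds_given_unit(motor_repertoire([-1.0, 1.0]), P_S_M), K)

# Simulation route with the listener's own repertoire: identical to the
# speaker's ("perfect" conditions), then shifted (mismatched repertoires).
simulation_matched = decode(sounds_given_unit(motor_repertoire([-1.0, 1.0]), P_S_M), K)
simulation_shifted = decode(sounds_given_unit(motor_repertoire([-0.5, 1.5]), P_S_M), K)

fused = fuse(association, simulation_shifted)

print("perfect conditions:", association, simulation_matched)
print("mismatched repertoires:", association, simulation_shifted)
print("fused:", fused)
```

Computing the association route exactly, instead of training a classifier on sampled productions, isolates the informational argument of point 1: a sampled classifier with enough expressive power would approach the same values given enough data. The weights in the hypothetical `fuse` function are one possible way to express P&G's idea of emphasizing the more accurate route; this toy makes no claim about which fusion rule the brain uses.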

An obvious challenge, of course, is to bridge the gap between computational approaches such as ours, which are usually restricted to the production and perception of isolated syllables, and conceptual models such as the one proposed in the target article, which tackle continuous flows of speech and consider semantic, syntactic, and phonological layers of processing.

However, in our view, the main challenge for future studies is first to assess what kind of information is present in “association” and “simulation” routes, and second, to better understand how computational fusion models, describing the integration of these two routes, can account for experimental neurocognitive data.

References

Bessière, P., Laugier, C. & Siegwart, R., eds. (2008) Probabilistic reasoning and decision making in sensory-motor systems. Springer Tracts in Advanced Robotics, vol. 46. Springer-Verlag.

Castellini, C., Badino, L., Metta, G., Sandini, G., Tavella, M., Grimaldi, M. & Fadiga, L. (2011) The use of phonetic motor invariants can improve automatic phoneme discrimination. PLoS ONE 6(9):e24055.

Colas, F., Diard, J. & Bessière, P. (2010) Common Bayesian models for common cognitive issues. Acta Biotheoretica 58(2–3):191–216.

D'Ausilio, A., Bufalari, I., Salmas, P. & Fadiga, L. (2012b) The role of the motor system in discriminating degraded speech sounds. Cortex 48:882–87.

Lebeltel, O., Bessière, P., Diard, J. & Mazer, E. (2004) Bayesian robot programming. Autonomous Robots 16(1):49–79.

Meister, I. G., Wilson, S. M., Deblieck, C., Wu, A. D. & Iacoboni, M. (2007) The essential role of premotor cortex in speech perception. Current Biology 17:1692–96.

Moulin-Frier, C., Laurent, R., Bessière, P., Schwartz, J.-L. & Diard, J. (2012) Adverse conditions improve distinguishability of auditory, motor and perceptuo-motor theories of speech perception: An exploratory Bayesian modeling study. Language and Cognitive Processes 27(7–8):1240–63.

Schwartz, J.-L., Basirat, A., Ménard, L. & Sato, M. (2012) The perception-for-action-control theory (PACT): A perceptuo-motor theory of speech perception. Journal of Neurolinguistics 25(5):336–54.

Zekveld, A. A., Heslenfeld, D. J., Festen, J. M. & Schoonhoven, R. (2006) Top-down and bottom-up processes in speech comprehension. NeuroImage 32:1826–36.