The classical Lichtheim–Broca–Wernicke neurobiological model of language proposed distinct neuroanatomical pathways for language comprehension and production. Recent evidence suggests abandoning this model in its classical form, and although no replacement has yet been established (see Dick & Tremblay Reference Dick and Tremblay2012; Price Reference Price2010; Reference Price2012 for review), we think much of the data support P&G's proposal. However, we also think P&G could be clearer about whether there are situations in which their model does not apply. For example, they state that "comprehenders make whatever linguistic predictions they can" (target article, sect. 3.2, para. 1), but this claim is so broad as to be unfalsifiable.
Neurobiological evidence suggests that production and perception system interdependence occurs in specific situations. By highlighting emerging models and findings in the neurobiology of receptive language, we suggest that P&G's proposal could be fine-tuned to make more-specific, testable predictions.
Neurobiological evidence for the interdependence of receptive-expressive language in speech perception
The most widely adopted model of language neurobiology is a dual-stream model analogous to the visual system (Ungerleider & Haxby Reference Ungerleider and Haxby1994). Within this model, during receptive language, auditory speech sounds map to articulatory (motor) representations in a dorsal stream and to meaning in a ventral stream (Hickok Reference Hickok2009b; Hickok & Poeppel Reference Hickok and Poeppel2000; Reference Hickok and Poeppel2004; Reference Hickok and Poeppel2007; Rauschecker Reference Rauschecker2011; Rauschecker & Scott Reference Rauschecker and Scott2009; Rauschecker & Tian Reference Rauschecker and Tian2000; Rogalsky & Hickok Reference Rogalsky and Hickok2011). If this is correct, models like P&G's must account for the way these processing streams interact with the motor system involved in language production.
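To make the proposed division of labor concrete, consider the following deliberately simplified sketch in Python. It is purely illustrative: the lookup tables, feature labels, and the perceive function are our own hypothetical stand-ins, not an implementation of the dual-stream model or of any actual neural coding scheme.

```python
# Toy illustration of dual-stream routing: one auditory input is mapped
# in parallel to an articulatory (dorsal) code and a semantic (ventral)
# code. All entries are hypothetical examples.

# Dorsal stream: speech sound -> articulatory (motor) representation
DORSAL_MAP = {
    "ba": "bilabial closure + voiced release",
    "da": "alveolar closure + voiced release",
    "ball": "bilabial closure + voiced release + lateral",
}

# Ventral stream: speech sound -> meaning (defined only for words)
VENTRAL_MAP = {
    "ball": "round object used in play",
    "doll": "toy figure of a person",
}

def perceive(speech_input: str) -> dict:
    """Route one auditory input through both streams in parallel."""
    return {
        "articulatory": DORSAL_MAP.get(speech_input),  # dorsal outcome
        "semantic": VENTRAL_MAP.get(speech_input),     # ventral outcome
    }

print(perceive("ba"))    # articulatory code only; a bare syllable has no meaning
print(perceive("ball"))  # both an articulatory code and a meaning
```

The point of the sketch is simply that the same input feeds two concurrent mappings, which is why an account like P&G's must specify how the motor-directed (dorsal) mapping relates to the production system.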
This problem is easier to solve within the dorsal stream, because many of the same brain regions are active both during speech planning and execution and during speech perception (Callan et al. Reference Callan, Jones, Callan and Akahane-Yamada2004; Eickhoff et al. Reference Eickhoff, Heim, Zilles and Amunts2009; Hickok & Poeppel Reference Hickok and Poeppel2007; Pulvermüller et al. Reference Pulvermüller, Huss, Kherif, Moscoso del Prado Martin, Hauk and Shtyrov2006; Vigneau et al. Reference Vigneau, Beaucousin, Hervé, Duffau, Crivello, Houdé, Mazoyer and Tzourio-Mazoyer2006; Wilson et al. Reference Wilson, Saygin, Sereno and Iacoboni2004). In fact, the primary contention is not whether the motor system is recruited during speech perception but under what circumstances this recruitment occurs. Some argue that the motor system is essential (D'Ausilio et al. Reference D'Ausilio, Pulvermüller, Salmas, Bufalari, Begliomini and Fadiga2009; Iacoboni Reference Iacoboni2008; Meister et al. Reference Meister, Wilson, Deblieck, Wu and Iacoboni2007), whereas others argue that it is recruited only when auditory-only speech is difficult to parse (e.g., in noisy conditions, or when discriminating between similar phonemic units; Hickok Reference Hickok2009a; Hickok et al. Reference Hickok, Houde and Rong2011; Sato et al. Reference Sato, Tremblay and Gracco2009; Tremblay & Small Reference Tremblay and Small2011).
The latter appears to be the case for audiovisual speech perception, in which visual information from the talker's lips and mouth is also available. Moreover, a forward-modeling architecture consistent with P&G's proposal has been suggested to explain the neurobiology of audiovisual speech perception (Callan et al. Reference Callan, Jones, Callan and Akahane-Yamada2004; Skipper et al. Reference Skipper, Nusbaum and Small2005; Skipper et al. Reference Skipper, van Wassenhove, Nusbaum and Small2007b; van Wassenhove et al. Reference van Wassenhove, Grant and Poeppel2005; Wilson & Iacoboni Reference Wilson and Iacoboni2006). Here, visual information, which temporally precedes the auditory signal by several hundred milliseconds (Chandrasekaran et al. Reference Chandrasekaran, Trubanova, Stillittano, Caplier and Ghazanfar2009), provides a "forward model" of the upcoming speech sound. These models draw on the listener's own articulatory representations to generate possible phonetic targets of the talker's speech (Callan et al. Reference Callan, Jones, Callan and Akahane-Yamada2004; Skipper et al. Reference Skipper, van Wassenhove, Nusbaum and Small2007b; van Wassenhove et al. Reference van Wassenhove, Grant and Poeppel2005). Findings that visual speech influences the latency and amplitude of the auditory neural response (van Wassenhove et al. Reference van Wassenhove, Grant and Poeppel2005) and recruits motor-speech regions (Callan et al. Reference Callan, Jones, Callan and Akahane-Yamada2004; Dick et al. Reference Dick, Solodkin and Small2010; Hasson et al. Reference Hasson, Skipper, Nusbaum and Small2007; Sato et al. Reference Sato, Buccino, Gentilucci and Cattaneo2010; Skipper et al. Reference Skipper, Nusbaum and Small2005; Reference Skipper, van Wassenhove, Nusbaum and Small2007b; Watkins et al. Reference Watkins, Strafella and Paus2003) support predictive coding via forward models of the kind P&G propose.
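The logic of such a forward model can be expressed as simple Bayesian cue combination. The sketch below, again in Python, is purely illustrative: the phoneme set, the probabilities, and the error metric are assumptions of ours, not parameters drawn from the studies cited above.

```python
import numpy as np

# Visual speech arrives first and sets a prior over phonetic targets;
# the later auditory evidence is then evaluated against that prediction.
# All numbers here are invented for illustration.

PHONEMES = ["ba", "da", "ga"]

def normalize(p):
    p = np.asarray(p, dtype=float)
    return p / p.sum()

# Prior from visible articulation: a lip closure strongly predicts /ba/.
visual_prior = normalize([0.8, 0.1, 0.1])

# Later-arriving auditory likelihoods for an actual /ba/ token.
auditory_likelihood = normalize([0.6, 0.3, 0.1])

# Posterior over phonetic targets: auditory evidence weighted by the
# visually derived forward model.
posterior = normalize(visual_prior * auditory_likelihood)

# A crude "prediction error": how far the auditory input deviates from
# the forward model. A smaller value is the toy analogue of the reduced
# latency/amplitude of the auditory response to predictable speech.
prediction_error = float(np.sum(np.abs(auditory_likelihood - visual_prior)))

print(dict(zip(PHONEMES, posterior.round(3))))
print(round(prediction_error, 3))
```

On this toy reading, congruent visual input both sharpens the posterior over phonetic targets and shrinks the prediction-error term, a rough analogue of the latency and amplitude reductions reported by van Wassenhove et al. (Reference van Wassenhove, Grant and Poeppel2005).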
Neurobiological evidence for the interdependence of receptive-expressive language in language and gesture comprehension
Although the neurobiological evidence for receptive-expressive language interdependence is compelling in speech perception, it is mixed for higher-level language comprehension, which involves brain regions along a ventral language pathway (Binder et al. Reference Binder, Desai, Graves and Conant2009; Hickok & Poeppel Reference Hickok and Poeppel2007; Vigneau et al. Reference Vigneau, Beaucousin, Hervé, Duffau, Crivello, Houdé, Mazoyer and Tzourio-Mazoyer2006). There is evidence – for example, in the processing of verbs – that the motor system contributes to understanding, and this evidence is cited in support of "motor simulation" theories (Cappa & Pulvermüller Reference Cappa and Pulvermüller2012; Fischer & Zwaan Reference Fischer and Zwaan2008; Glenberg Reference Glenberg2011; Glenberg & Gallese Reference Glenberg and Gallese2012). Notably, some authors interpret these findings without adhering to motor simulation theories (Bedny & Caramazza Reference Bedny and Caramazza2011; Mahon & Caramazza Reference Mahon and Caramazza2009). Indeed, the contribution of the motor (production) system to language comprehension remains contentious (e.g., it was the topic of an organized debate at the 2011 Neurobiology of Language Conference).
Additional evidence suggests that motor system involvement is task-specific. For example, Tremblay et al. (Reference Tremblay, Sato and Small2012) applied repetitive transcranial magnetic stimulation (rTMS) to the ventral premotor cortex during a sentence comprehension task. The rTMS interfered with comprehension of sentences describing manual actions, but not of other sentence types, suggesting that predictive motor encoding is not always called upon. Another example is gesture comprehension. Some studies have shown that viewing gestures recruits areas associated with a putative "mirror neuron" system thought to covertly simulate others' actions (Green et al. Reference Green, Straube, Weis, Jansen, Willmes, Konrad and Kircher2009; Holle et al. Reference Holle, Gunter, Rüschemeyer, Hennenlotter and Iacoboni2008; Skipper et al. Reference Skipper, Goldin-Meadow, Nusbaum and Small2007a; Reference Skipper, Goldin-Meadow, Nusbaum and Small2009; Willems et al. Reference Willems, Özyürek and Hagoort2007; Xu et al. Reference Xu, Gannon, Emmorey, Smith and Braun2009), but others find no evidence that this recruitment correlates with comprehension (Andric & Small Reference Andric and Small2012; Dick et al. Reference Dick, Goldin-Meadow, Hasson, Skipper and Small2009; Reference Dick, Goldin-Meadow, Solodkin and Small2012; Straube et al. Reference Straube, Green, Bromberger and Kircher2011; Willems et al. Reference Willems, Özyürek and Hagoort2009).
In closing, we note that P&G's model may not require motor activation during receptive language. For example, P&G state that "embodied accounts assume that producers and comprehenders use perceptual and motor representations associated with the meaning of what they are communicating. Our account does not require such embodiment but is compatible with it" (sect. 4, para. 9). Hence, the model seems able to account for either the presence or the absence of motor activity during receptive language. If so, P&G should clarify which neurobiological findings could decide between competing accounts that call upon interdependent receptive and expressive language systems.