One of the most influential current theories suggests that higher-order socio-cognitive processes, such as mind/intention reading and the compound aspects of language, may be primarily “grounded” in the sensorimotor brain (Barsalou 2008).
Inspired by multiple-duty cells originally discovered in the monkey (di Pellegrino et al. 1992), neuroimaging studies have proposed that the human brain is equipped with specific, rapid, and automatic mechanisms that share action execution and perception in a common representational domain (Aglioti & Pazzaglia 2010; 2011; Van Overwalle & Baetens 2009). According to Pickering & Garrod (P&G), this inherent bidirectional, functional, and anatomical link seems to monitor the perception of other agents' actions through predictive mechanisms. The authors suggest that a simulative process might also run an internal generative representation that serves predictions in response to linguistic input. Despite nearly two decades of intensive research on the inextricable link between the perception and execution of action, there are two key problems with action prediction through simulation and, consequently, with its application to language processing.
The first bottleneck concerns neurophysiological and cognitive constraints, as revealed by action predictive coding. Inferring the intention of an action from a perceptual-motor code should imply accurate, one-to-one perceptual-motor mapping between the goal and its respective kinematics (Kilner 2011). This is evident, for example, in single-cell recordings in monkeys of a specific set of “action-constrained” neurons, which fired during grasping for eating but not during grasping for placing (Fogassi et al. 2005). Upon exploring the various ways in which humans can reach and grasp, it appears that the kinematics differ precisely with respect to compatibility, or incompatibility, with the goal (e.g., drinking vs. passing) (Tretriluxana et al. 2008). Crucially, activity in the inferior frontal cortex of onlooking human individuals is modulated differentially when a model exhibits different intentions associated with the grasping action (Iacoboni et al. 2005). In order to achieve this overall intention, an individual selects the most appropriate movement that is compatible with the purpose of the action. Within this framework, it is clear that the motor representations are comparatively stable and can be arranged in a limited, pre-wired motor chain that is functionally interpreted in terms of motor intention (Rizzolatti & Sinigaglia 2007).
Speech sound representations, however, are highly variable; the linguistic message can be achieved with many speech sounds and, more questionably, the same speech units vary with their position within a word (Mottonen & Watkins 2009). In language processing, prediction by a simulation mechanism is plausible for articulatory representations that consist of a limited set of uniform elements, differ mainly in their serial positions, require precise selection and timing, and are at least pre-wired in two units (noun and verb) to form a complete sentence. This type of mechanism was reported for the hearing or reading of motor-related words/sentences, for which a growing number of studies have proposed a constant matching of input–output processes, albeit one mediated by the action system (Buccino et al. 2005; Pulvermüller & Fadiga 2010; Tettamanti et al. 2005). Moreover, several studies have documented how a rapid simulation process supports motor-related speech/language in the frontal motor cortex, ranging from the spontaneous imagery mechanism of tracking articulatory gestures to the complex motor aspects of action verbs or tool words that grant them their meaning (Pulvermüller 2005). Hence, the motor counterpart enables the matching of production and comprehension, extending motor-related sound identification to language (the “what” of speech recognition), which in turn leads to predictive coding. Given the role of the motor system, it remains unclear whether language perception occurs in a more general cognitive-motor domain or is an independent representation interacting with the action system (Fadiga & Craighero 2006).
In addition, there is intense debate about the intimate link between language and action at the ontogenetic (Bates & Dick 2002) and phylogenetic (Toni et al. 2008; Zlatev 2008) levels. From an evolutionary perspective, studies have postulated that language initially evolved from manual gestures in the form of a system of manual skills, pantomime, and protosigns (Arbib 2005; Leroi-Gourhan 1964; 1965). The subsequent conventionalization of signs and the shift to vocal emblems have enabled the transition to more symbolic, alternative, and open systems of communication (Corballis 2009).
The second bottleneck refers to the controversial neural and functional evidence reported by neuropsychological analyses. In brain-damaged patients with significant deficits of execution, the ability to predict and understand an observed/heard action may be spared. Although recent studies indicated a positive correlation between deficits in perceived and performed actions (Buxbaum et al. 2005; Nelissen et al. 2010), many studies fail to provide straightforward evidence for direct matching between observed and executed actions (Hickok 2009a). Precise perceptual-motor coding, on which predictions must be planned, would explain the input–output associations of the impairment, but not the range of dissociations reported at both the group (Cubelli et al. 2000; Halsband et al. 2001; Negri et al. 2007; Pazzaglia et al. 2008a) and single-case (Pazzaglia et al. 2008b; Rumiati et al. 2001) levels. Moreover, the published studies do not clarify whether or not neurologic patients are still able to infer the intention of an observed action (Fontana et al. 2012), despite the disruption in the ability to mentally simulate movements (Pazzaglia et al. 2008a).
Although the dissociations between the neural and functional aspects of input–output matching still need to be clarified in aphasic and apraxic patients (Pulvermüller et al. 2005; Pazzaglia 2013), the plausible implications of anatomical and clinical divergences cannot be ignored. Dissociation, rather than association, of neuropsychological deficits in brain-damaged patients continues to be a highly sensitive verification technique that is necessary to exclude vitiating factors and define the reliability boundaries of empirically viable theories.
Therefore, the range of possible dissociations between production and comprehension, which can occur in both action and language, is rather multifarious. In this conception, such dissociations are reliant upon higher-order sensorimotor experiences manifesting in the computational brain, namely: the intention to act; stable memory traces for different types of percepts; and the ecological and cultural conditions in which gestural, linguistic, and affective communication are implemented. Such processes could probably also interact with unique, more basic, low-level motor-resonance mechanisms (Mahon & Caramazza 2008). In particular, this can include the automatic selection of symbolization on which judgments regarding communication and predictions of appropriateness are based.
P&G discuss and emphasize studies that interweave the production and recognition of actions. However, they too quickly exclude the limits of prediction via the simulation of action. Without a close look at the crucial roles of the physiological process (whereby predictions emerge through extremely fine-grained cognitive-motor operations) and the neurologic population (in which behavioral and anatomical disease is fully dissociable), the extension of such mechanisms to language may become unwarranted in situations where language does not call on cognitive-motor representations. A fruitful direction toward a complete theory of language processing must not only recognize the degree to which the processes underlying language and action are similar but also discuss the intertwined and integrated aspects of this relationship, at least in a conceptual sense.
ACKNOWLEDGMENT
Mariella Pazzaglia is supported by the International Foundation for Research in Paraplegie (IRP, P133).