Scientific writing, like any communication, can be a little bit like the game of ‘telephone’, in which a message is whispered from one person to the next and ends up transforming bit by bit as it is passed along. The problem, of course, is that language is an often ambiguous channel for the communication of complex ideas, subject to distortion by both the sender and receiver. Although we try to write clearly and read carefully, it is inevitable that the message becomes distorted here and there in any scientific piece. Much of David Kemmerer’s (DK) critique of The Myth of Mirror Neurons (TMoMN) (Hickok, 2014), I suggest, results from this sort of miscommunication. I take full responsibility for any lack of clarity in my exposition, so I am grateful to DK for taking the time to highlight points of confusion and for presenting me with an opportunity for clarification. There are some substantive disagreements as well, which I will also address.
At the outset it is worth underlining that DK fully endorses the central conclusion of TMoMN: that the story told by the group in Parma – that mirror neurons are the basis of action understanding, speech perception, theory of mind, imitation, empathy, and so on – is untenable. What we are discussing here is whether mirror neurons, or the motor system more generally, play some role in perceptual/conceptual processes. This is a theoretically viable and important debate. In what follows I will address DK’s major concerns in their order of appearance in his comment, and include some thoughts on what the motor system might be contributing to perception and understanding.
DK suggests that in Chapter 3 I paint “a mostly negative picture of the work that has been done during the past twenty years to delineate the mirror system in the human brain”. This is one of the shortest chapters in the book (thirteen pages if you don’t count the figure), most of which (ten pages) is devoted to discussing human research that was done in the 1990s when the mirror neuron theory of action understanding was being forged. The reason for this brevity and focus is that the chapter was intended to provide a historical perspective on the evidence that led to the development of the theory, not to provide a thorough review of all we know about the human mirror system from functional imaging. And the fact remains, the foundational arguments that led to the game-changing claims regarding language (Rizzolatti & Arbib, 1998), mind-reading (Gallese & Goldman, 1998), and empathy (Gallese, 2001) were based on very thin, indeed circular arguments. No one seemed to notice, though, which is an interesting observation and an important lesson that I hoped would come across: the ideas were so exciting that the weaknesses were overlooked. It’s interesting that the papers DK cites as providing strong evidence for the existence and organization of the human mirror system were published in 2009, 2010, and 2012, long after the action understanding doctrine was well-established and widely accepted. In short, I was making a point about the development of the theory and passed on providing a thorough review because, as I stated at the outset, “it is virtually a given that humans have mirror neurons” (p. 27). Then, at the end of the chapter, I backed up this assertion with a discussion of the most direct demonstration of the existence of mirror neurons in human Broca’s area (Kilner, Neal, Weiskopf, Friston, & Frith, 2009).
Next, DK challenges my claim that the human mirror system is highly plastic – that the mirror response can be trained to “counter-mirror” (Catmur, Walsh, & Heyes, 2007) – and therefore ill-suited for understanding. Focusing on the TMS (transcranial magnetic stimulation) evidence for plasticity, he makes the important point that recent research suggests that trained counter-mirror responses occur on a different timescale to mirror responses, which are unaffected by training, thus leaving open the possibility that mirroring is stable. But this is a controversial result with yet other new studies showing “significant effects of counter-mirror sensorimotor training at all timepoints […] indicating that mirror and counter-mirror responses follow the same timecourse” (Cavallo, Heyes, Becchio, Bird, & Catmur, 2014).
No matter how this mini-debate turns out, there is much more direct evidence for mirror neuron malleability (TMS is a blunt instrument) from monkey mirror neurons, discussed in Chapter 4 and not mentioned by DK. This concerns “tool-responding mirror neurons”, cells that acquire the ability to respond to observing grasping with tools after extended exposure, without a noticeable concomitant change in understanding (Ferrari, Maiolini, Addessi, Fogassi, & Visalberghi, 2005). There is also evidence for a large population of counter-mirror neurons in area F5, referred to as ‘logical relation’ mirror neurons in the original 1992 report (di Pellegrino, Fadiga, Fogassi, Gallese, & Rizzolatti, 1992). Interestingly, there were approximately the same number of these counter-mirror neurons as there were ‘congruent mirror neurons’. This shows that mirroring isn’t the only function of cells in the network. These facts alone don’t disprove the action understanding theory, but taken together they seem to fit much more neatly in a broader theory of motor selection (Hickok, 2014; Hickok & Hauser, 2010) or sensory-motor association (Heyes, 2010). What we have in monkey area F5 is a constellation of sensory-motor cells that respond to any number of sensory inputs: objects, sounds, object-directed biological actions, tool actions, and so on. It’s possible that most of them function to support good-old-fashioned motor selection, while mirror neurons are special. Or it could be that mirror neurons support action selection too, as I proposed. The bulk of the evidence supports the latter view, as I argued extensively in the book.
DK then transitions to a discussion of a range of studies argued to show that the inferior frontal cortex contributes to action perception. This is not a theoretically innocuous shift of focus. There is a dramatic difference between the foundational mirror neuron claim – mirror neurons are the basis of action understanding – and the claim that motor structures might contribute to action perception/understanding. One mechanism that DK proposes for such a contribution is via motor to sensory predictive coding that constrains sensory processing – an idea that I have proposed myself for speech (Hickok, Houde, & Rong, 2011), following others (Sams, Mottonen, & Sihvonen, 2005; van Wassenhove, Grant, & Poeppel, 2005). Notice that according to this view, the sensory systems form the hub of the perceptual/understanding network and motor systems modulate them, whereas in the Parma-based mirror neuron claims, the motor system is the hub. The point of my book was to argue specifically against the motor-centric Parma claims, so DK’s objections miss the point. In the final chapter of the book, I do address these modulatory models, including a discussion of some conceptual hurdles that these models (including my own!) need to deal with. I do not attempt an exhaustive review of this growing literature – that was not my aim.
Also along these lines, DK brings up the literature on the effects of expertise: dancers who activate their motor systems more than non-dancers while watching dance, expert soccer players who are better than novices at judging penalty kick outcomes, and so on. This literature is complicated by the fact that experience with an action builds sensorimotor expertise, not just motor expertise. (Some studies have attempted to control visual experience, but proprioceptive experience cannot be easily controlled.) As I argue extensively, the motor system is literally and figuratively blind without sensory systems and cannot function without them (the reverse is not true). Therefore, motor expertise necessarily drags along sensory expertise and builds stronger sensory-to-motor associations. It is therefore no surprise that dancers activate their motor systems more robustly: they have seen those moves before, they know what it feels like to perform them, and they have built stronger associations between the sensory states and motor plans for dance. This does not necessarily mean that the understanding is dependent on the motor system.
DK then shifts another gear and discusses work on intention understanding. He points to recent fMRI (functional magnetic resonance imaging) work showing that the mirror system activates more during observation of social actions compared to individual actions. This is important because, as DK notes, it could lead to a neurophysiological basis for mind-reading (mentalizing). There is a simpler explanation though: social actions are more relevant for action selection and therefore more likely to activate the observer’s motor system. Further, as I pointed out in TMoMN, this sort of finding raises the question: How does a motor simulation mechanism know ahead of time whether an action is social (more activation/simulation) or non-social (less activation/simulation)? Some other system must be noticing the difference and then activating the mirror system or not.
DK’s most interesting comments concern the role of the motor system in action semantics, a claim that often co-mingles inappropriately with mirror neurons. For example, it is often argued that primary motor cortex contributes to the meaning of body part-specific actions in a somatotopically organized fashion (Hauk, Johnsrude, & Pulvermuller, 2004), whereas primary motor cortex is not typically considered part of the mirror system (Gallese, Fadiga, Fogassi, & Rizzolatti, 1996). DK, on the other hand, has made a serious attempt to integrate action semantics and the mirror system by arguing that abstract representations of verb-class semantics (e.g., X causes Y to go to Z) are coded in the Broca’s area portion of the mirror system. He uses this theoretical work as an example to counter my claim that “the meanings simply aren’t in the movements”. However, DK simultaneously misses and makes my point, which is simply that the movement codes involved in, say, pouring water from a pitcher are themselves highly ambiguous and dependent on the non-motoric context (if there is no water, there is no pouring). Thus, as Csibra has pointed out (Csibra, 2007), simple motor mirroring cannot alone explain action understanding. DK gets around this problem by endowing mirror neurons with abstract semantic properties, that is, by agreeing that simple motor simulation isn’t doing the work. Notice too that by invoking abstract representations it is not at all clear that we are talking about motor plans at all. For example, an abstract frame such as X causes Y to go to Z applies to non-motoric events as well, such as The wind pushed the chair across the patio. Then we have to ask, if this kind of representation can be coded non-motorically, why do we need a motoric code for sentences like David pushed the chair across the patio?
DK delves deeper into action semantics, noting that most action verbs do not specify motor details in their meanings (thus questioning the motor system’s utility), but some do (e.g., pinch). He asks whether the motor system may contribute substantially to these “idiosyncratic verbs” as he calls them. For the sake of argument, let’s assume this to be true. What this would mean is that the motor system contributes meaning to only an idiosyncratic fraction of action verbs, which themselves are only a portion of the range of verb forms we can understand. If true, this would constitute relatively modest progress in understanding the neural basis of language.
DK notices that my skepticism regarding the role of motor cortex in action semantics seems contradictory to my statement (in another context) that sensory and motor systems are capable of performing complex, abstract computations. I can see where this would seem contradictory, but I was making a different point (in that other context) regarding the embodied cognition movement more broadly, namely, that (i) sensorimotor embodiment doesn’t simplify the complexity of the representational problem, it just pushes the complexity into sensorimotor systems (rightly or wrongly), and relatedly that (ii) letting sensorimotor systems do all the work doesn’t mean that the problem is any easier to solve, e.g., with simple ‘resonance’ mechanisms – sensorimotor networks are still very complex, computational systems.
DK next surveys a range of findings showing an association between processing action-related language and the motor system. The imaging and motor-evoked potential TMS studies are not without interpretive complication: is the hand area, for example, active when processing the word throw because the motor system is part of the meaning or because the meaning is associatively linked to hand actions? The evidence is quite mixed at best, with motor involvement variable across task and situational context, as DK acknowledges. How one interprets this variability depends on one’s threshold for accepting the motor-contribution hypothesis. My threshold is admittedly high, and therefore I find myself swayed by examples of dissociations between action word understanding and motor disruption/activation. If the effects are variable across tasks, I worry, doesn’t this indicate that peculiarities of the tasks themselves are driving the recruitment of the motor system or that the effect, if real, is so small as to be nearly inconsequential? DK’s threshold seems rather lower, in that he views the presence of these effects in at least some studies (and there are several to be sure) as evidence that under some circumstances the motor system is contributing something important. The debate may well come down to what one counts as an important contribution. In this eventuality let me suggest a metric. We can ask: How much of the semantic variance between action verbs is accounted for by motor features such as hand- vs. mouth- vs. foot-related action? Informally, I suspect very little. For example, the ‘hand action feature’ common to verbs like THROW, WRITE, ERASE, COMB, and SLICE, to name a few stimulus examples from a recent TMS study (Willems, Labruna, D’Esposito, Ivry, & Casasanto, 2011), seems to explain very little of the variance in meaning between these items.
DK then turns to whether the motor system plays a role in speech perception. In Chapter 5 of TMoMN I argue that a strong variant of the motor theory of speech perception, including its Parma promoted resurrection, is untenable. I did not intend to thoroughly address the possibility that the motor system may modulate a fundamentally auditory-based model of speech perception, which is the model DK seems to favor. I did discuss this possibility in Chapter 10, however, noting that I have proposed exactly this. Specifically, in 2011 my collaborators and I wrote:
… we suggest … under some circumstances forward predictions from the motor speech system can modulate the perception of others’ speech … forward predictions generated via motor commands can function as a top-down attentional modulation of sensory systems. Such attentional modulation may be important for sensory feedback control because it sharpens the perceptual acuity of the sensory system to the relevant range of expected inputs (see below). This ‘attentional’ mechanism might then be easily co-opted for motor-directed modulation of the perception of others’ speech, which would be especially useful under noisy listening conditions, thus explaining the motor speech induced effects of perception. (Hickok et al., 2011, p. 415).
Thus, I don’t disagree with DK that the motor system may contribute to speech perception under some circumstances within the context of a fundamentally auditory-based model of speech perception, i.e., the motor/mirror neuron theory is (still) wrong. It is worth reiterating, though, that much of the evidence for motor influence on speech perception is methodologically problematic, for reasons detailed in TMoMN. Notably, however, a more recent study (Schomers, Kirilina, Weigand, Bajbouj, & Pulvermuller, 2014), cited by DK, admirably addresses these concerns and is worthy of comment here.
The study follows up an influential report showing that stimulation of motor lip versus tongue areas differentially affects perception of syllables with lip- versus tongue-related sound onsets (D’Ausilio, Pulvermuller, Salmas, Bufalari, Begliomini, & Fadiga, 2009). Although the authors of this earlier study concluded that this demonstrates a causal role for the motor system in speech perception, the study did not control response bias and used a task that may not reflect normal speech recognition. The follow-up study avoided these problems by using a two-alternative forced-choice word comprehension task. The researchers again stimulated either lip or tongue motor areas and found a crossover interaction in the speed at which words starting with lip- or tongue-related sounds were recognized (via button press decisions to a matching picture). This is a nice improvement over previous work, in my view, but still not terribly convincing for three reasons: (i) there are some internal inconsistencies; (ii) it is ambiguous whether the motor system is driving the effect; and (iii) the magnitude of the effect calls its theoretical import into question.
Inconsistencies: A close look at the results shows that the effect holds in the reaction time (RT) but not the accuracy data. One can point to the RT data and claim that the motor system causally contributes to word comprehension. Or one can point out that motor stimulation has no effect on word comprehension accuracy, which is the ultimate metric for real-world communication. There is further inconsistency in the effects of stimulation site on performance in that a significant effect of stimulation was observed for tongue-related sounds but not lip-related sounds. These are not fatal problems but they do mirror the variability found in the general literature.
Ambiguity: Given the close proximity of motor and somatosensory cortex, and given that the stimulation location in most participants (10/13) relied on stereotaxic information derived from a group averaged fMRI study, it is unclear whether stimulation of motor or somatosensory representations is the basis of the effect. A somatosensory basis would, of course, raise new theoretical issues regarding the neural basis of speech perception – perhaps related to the role of somatosensory targets in speech production (Tremblay, Shiller, & Ostry, 2003) – but if true would minimize the role of the motor system.
Theoretical import: Effects of motor stimulation on perception have only been reported at near-threshold levels of detectability. This is true in this new study where baseline accuracy was reported to be 69%. Further, as noted, motor modulation did not affect accuracy and RT effects were only observed for some sounds. So, if the motor system indeed contributes to speech perception, it only does so when speech is barely intelligible, doesn’t in fact change the intelligibility but only speeds it up, and only does so for some words. This leaves the vast amount of computational workload in speech perception to non-motor systems.
Finally, DK concludes: “I think he has gone too far in his critique, to the extent that he has thrown the proverbial baby out with the bathwater.” I counter that I was quite careful to just drain the bathwater (the Parma mirror neuron theory of action understanding) and leave the question of the baby (the hypothesis that the motor system may play a modulatory role in perception) to future empirical work. I have no doubt that sensory and motor systems are tightly related, that they interact during both generation and perception. My own research on speech has demonstrated this repeatedly since 2001 when my then graduate students and I showed that a network of regions including Broca’s area, premotor cortex, the STS (superior temporal sulcus), and a region we have dubbed Spt exhibits auditory-motor response properties and supports auditory-motor integration (Buchsbaum, Hickok, & Humphries, 2001; Hickok, Buchsbaum, Humphries, & Muftuler, 2003; Hickok, Okada, & Serences, 2009; Isenberg, Vaden, Saberi, Muftuler, & Hickok, 2012; Pa & Hickok, 2008). I believe this auditory-motor network is the speech analogue of what the Parma researchers call the human mirror system. In some ways, then, my own work provides one of the best foundations for future research into the system. And, building on the work of others, I have laid out a fairly explicit model for how motor circuits may influence speech perception (Hickok, 2012; Hickok et al., 2011). But I remain skeptical of the motor modulation baby; appropriately so, I suggest, because it is intuitively appealing – predictive coding is all the rage – like the original mirror neuron claims.
Thus, in the final chapter of TMoMN I pointed out some conceptual problems with current ideas regarding predictive coding such as that promoted by many, including myself and DK. I clearly expressed my reservations – “I now believe that the motor system and mirror neuron prediction operate squarely within the dorsal stream and play little role in perceptual recognition” – but also clearly kept the baby in the tub: “But this is an empirical question. We’ll have to wait and see what the data tell us” (p. 239).