Ackermann et al.'s phylogenetic account of speech development hinges, in part, on premises related to the role of the basal ganglia (BG) in adult human speech production. It argues that in adults, the BG imbue speech with emotive content. While the model targets an important and neglected issue, we argue that it suffers from two structural weaknesses. First, it does not sufficiently consider studies of the role of the BG in auditory and emotive processing, such as those showing that BG damage does not disrupt emotive processing in speech. Second, it overlooks the possibility that the role attributed to the BG may be mediated, at least in part, by a different system: the cortico-ponto-cerebellar system. We believe the authors' account would be much strengthened if they addressed these points, which we detail in turn.
Viability of BG as a speech/emotion synthesizer
A principle incorporated in contemporary models of speech production is that production occurs under one or more levels of feedback, where potential production errors are monitored either after an utterance is produced (sensory feedback) or before it is produced (via internal models; e.g., Hickok 2012). Ackermann et al. do not couch their account in an existing speech-production model and leave the issue of feedback underspecified. Nonetheless, if the BG were responsible for imbuing speech with emotive content, they would be expected to have the capacity to monitor and correct related errors, that is, to evaluate whether the intended emotive tone or prosody was instantiated. However, the BG are a weak candidate for such a function. The authors ignore studies indicating (i) that the auditory response in the BG is temporally insufficient to provide feedback (Langers & Melcher 2011) and that the BG have limited functional connectivity with areas of the temporal cortex mediating language processing (Choi et al. 2012); (ii) that emotive speech processing is mediated mainly by lateral temporal systems and does not involve the BG (Kotz et al. 2013; Wildgruber et al. 2006); and, most importantly, (iii) that individuals with BG infarcts are as sensitive to emotional speech variations as control populations (Paulmann et al. 2008; 2011). These three points argue against the authors' claim that adding prosody to speech depends on the integrity of the striatum.
The suggested account relies on two additional premises that are not strongly supported by the literature. The first is that, in adults, the BG are free to code for emotion because perisylvian regions code for syllable motor programs independently of the BG. Empirical support for this premise is tenuous at best: studies manipulating syllable frequency have either reported null results (Brendel et al. 2011; Riecker et al. 2008) or documented effects in the anterior insula (Carreiras et al. 2006). The second is that the BG can merge emotional content into speech through cross-talk between cortico-striatal-thalamic circuits. Although there is anatomical evidence for cross-talk across BG circuits in animal models (Haber 2003), the functional significance of this cross-talk remains to be established.
On the consideration of alternatives
A BG-oriented account should address questions such as those raised above and, equally importantly, argue why the BG are the strongest neurobiological candidate for mediating the function in question. The authors do not make such an argument, which is unfortunate, since much of the neurobiological argument they make for the BG could be made equally effectively for other structures, such as the cerebellum.
The involvement of the cerebellum in emotional processing is well established. It is implicated in the self-generation of various emotional states (Damasio et al. 2000), with different emotions evoking distinct activity patterns in the structure (Baumann & Mattingley 2012). Damage to the cerebellum affects emotional processing. In animal models, early cerebellar lesions can disrupt emotional processing (Bobee et al. 2000), and in human adults, the Cerebellar Cognitive Affective Syndrome (CCAS; Schmahmann & Sherman 1998) is a recognized clinical entity associated with blunting of affect. CCAS has been attributed to damage to the posterior vermis, which reduces the cerebellar contribution to perisylvian cortical areas via cerebellar outflow to the ventral-tier thalamic nuclei (Stoodley & Schmahmann 2010).
Arguments used by Ackermann et al. in support of their BG hypothesis could also be applied to the cerebellum. For example, FOXP2 is expressed in the cerebellum as well as in the caudate (Lai et al. 2003; Watkins et al. 2002b), and, as shown by Ackermann et al. (1992), cerebellar lesions are associated with dysarthria. In addition, activity in the cerebellum, but not in the BG, discriminates emotive aspects of speech (Kotz et al. 2013). Furthermore, the cerebellum can generate an internal forward model of motor-to-auditory predictions of the sort needed to evaluate whether the intended emotive aspect has been communicated (Knolle et al. 2013). Although this issue has not been examined directly for the BG, work on motor control suggests that, functionally, the BG may implement open-loop rather than closed-loop control of motor actions (Gabrieli et al. 1997).
It is important to point out that these explanations are not mutually exclusive. Cerebellar and BG circuits involved in language converge at the ventral anterior nucleus of the thalamus, which has itself been implicated in language and can serve as a nidus for cortical feedback via cortico-thalamic projections (Crosson 2013). Further, cerebellar outflow can directly influence the BG, and vice versa (Bostan et al. 2013), suggesting that it may not be possible to attribute the emotional content of speech to either of these two systems in isolation. Given this connectivity, the cerebellum may drive emotion-carrying vocalizations by engaging the BG, or the BG may trigger emotional behavior that is ultimately modulated by the cerebellum, as would be consistent with CCAS. However, data on this issue are lacking.
Summary
Arguing that the BG can imbue speech with emotional content is a significant claim and, as such, requires additional evidence, accompanied by careful consideration of alternative accounts. We hope this commentary will prompt a more detailed examination of the issues raised above.