
Brain mechanisms of acoustic communication in humans and nonhuman primates: An evolutionary perspective

Published online by Cambridge University Press:  15 May 2014

Hermann Ackermann
Affiliation:
Neurophonetics Group, Centre for Neurology – General Neurology, Hertie Institute for Clinical Brain Research, University of Tuebingen, D-72076 Tuebingen, Germany. hermann.ackermann@uni-tuebingen.de www.hih-tuebingen.de/neurophonetik
Steffen R. Hage
Affiliation:
Neurobiology of Vocal Communication Research Group, Werner Reichardt Centre for Integrative Neuroscience, and Institute for Neurobiology, Department of Biology, University of Tuebingen, D-72076 Tuebingen, Germany. steffen.hage@uni-tuebingen.de www.vocalcommunication.de
Wolfram Ziegler
Affiliation:
Clinical Neuropsychology Research Group, City Hospital Munich-Bogenhausen, D-80992 Munich, and Institute of Phonetics and Speech Processing, Ludwig-Maximilians-University, D-80799 Munich, Germany. wolfram.ziegler@extern.lrz-muenchen.de www.ekn.mwn.de

Abstract

Any account of “what is special about the human brain” (Passingham 2008) must specify the neural basis of our unique ability to produce speech and delineate how these remarkable motor capabilities could have emerged in our hominin ancestors. Clinical data suggest that the basal ganglia provide a platform for the integration of primate-general mechanisms of acoustic communication with the faculty of articulate speech in humans. Furthermore, neurobiological and paleoanthropological data point to a two-stage model of the phylogenetic evolution of this crucial prerequisite of spoken language: (i) monosynaptic refinement of the projections of motor cortex to the brainstem nuclei that steer laryngeal muscles, presumably as part of a “phylogenetic trend” associated with increasing brain size during hominin evolution; (ii) subsequent vocal-laryngeal elaboration of cortico-basal ganglia circuitries, driven by human-specific FOXP2 mutations. This concept implies vocal continuity of spoken language evolution at the motor level, elucidating the deep entrenchment of articulate speech into a “nonverbal matrix” (Ingold 1994), which is not accounted for by gestural-origin theories. Moreover, it provides a solution to the question of the adaptive value of the “first word” (Bickerton 2009), since even the earliest and simplest verbal utterances must have increased the versatility of vocal displays afforded by the preceding elaboration of monosynaptic corticobulbar tracts, giving rise to enhanced social cooperation and prestige. At the ontogenetic level, the proposed model assumes age-dependent interactions between the basal ganglia and their cortical targets, similar to vocal learning in some songbirds. In this view, the emergence of articulate speech builds on the “renaissance” of an ancient organizational principle and, hence, may represent an example of “evolutionary tinkering” (Jacob 1977).

Type
Target Article
Copyright
Copyright © Cambridge University Press 2014 

1. Introduction: Species-unique (verbal) and primate-general (nonverbal) aspects of human vocal behavior

1.1. Nonhuman primates: Speechlessness in the face of extensive vocal repertoires and elaborate oral-motor capabilities

All attempts to teach great apes spoken language have failed – even in our closest cousins, the chimpanzees (Pan troglodytes) and bonobos (Pan paniscus) (Hillix 2007; Wallman 1992), despite the fact that these species have “notoriously mobile lips and tongues, surely transcending the human condition” (Tuttle 2007, p. 21). As an example, the cross-fostered chimpanzee infant Viki mastered fewer than a handful of “words” even after extensive training. These utterances were not organized as speech-like vocal tract activities, but rather as orofacial manoeuvres imposed on a (voiceless) expiratory air stream (Hayes 1951, p. 67; see Cohen 2010). By contrast, Viki was able to skillfully imitate manual and even orofacial movement sequences of her caretakers (Hayes & Hayes 1952) and learned, for example, to blow a whistle (Hayes 1951, pp. 77, 89).

Nonhuman primates are, nevertheless, equipped with rich vocal repertoires, related specifically to ongoing intra-group activities or environmental events (Cheney & Seyfarth 1990; 2007). Yet, their calls seem to be linked to different levels of arousal associated with especially urgent functions, such as escaping predators, surviving in fights, keeping contact with the group, and searching for food resources or mating opportunities (Call & Tomasello 2007; Manser et al. 2002; Seyfarth & Cheney 2003b; Tomasello 2008). Several studies indeed point to a more elaborate “cognitive load” of the vocalizations of monkeys and apes in terms of subtle audience effects (Wich & de Vries 2006), conceptual-semantic information (Zuberbühler 2000a; Zuberbühler et al. 1999), proto-syntactical call concatenations (Arnold & Zuberbühler 2006; Ouattara et al. 2009), conditionability (Aitken & Wilson 1979; Hage et al. 2013; Sutton et al. 1973; West & Larson 1995), and the capacity to use distinct calls interchangeably under different conditions (Hage et al. 2013). It remains, however, to be determined whether such communicative skills really represent precursors of higher-order cognitive–linguistic operations. In any case, the motor mechanisms of articulate speech appear to lack significant vocal antecedents within the primate lineage. This limitation of the faculty of acoustic communication is “particularly puzzling because [nonhuman primates] appear to have so many concepts that could, in principle, be articulated” (Cheney & Seyfarth 2005, p. 142). As a consequence, the manual and facial gestures rather than the vocal calls of our primate ancestors have been considered the vantage point of language evolution in our species (e.g., Corballis 2002, p. ix; 2003).

Tracing back to the 1960s, vocal tract morphology has been assumed to preclude production of “the full range of human speech sounds” (Lieberman 2006a; 2006b, p. 289) and, thereby, to constrain imitation of spoken language in nonhuman primates (Lieberman 1968; Lieberman et al. 1969). However, this model cannot account for the inability of nonhuman primates to produce even the most simple verbal utterances. The complete lack of verbal acoustic communication rather suggests more crucial cerebral limitations of vocal tract motor control (Boë et al. 2002; Clegg 2012; Fitch 2000a; 2000b). According to a more recent hypothesis, lip smacking – a rhythmic facial expression frequently observed in monkeys – might constitute a precursor of the dynamic organization of speech syllables (Ghazanfar et al. 2012; MacNeilage 1998). As an important evolutionary step, a phonation channel must have been added in order to render lip smacking an audible behavioral pattern (Ghazanfar et al. 2013). Hence, this theory calls for a neurophysiological model of how articulator movements were refined and, finally, integrated with equally refined laryngeal movements to create the complex motor skill underlying the production of speech.

1.2. Dual-pathway models of acoustic communication and the enigma of emotive speech prosody

The calls of nonhuman primates are mediated by a complex network of brainstem components, encompassing a midbrain “trigger structure,” located in the periaqueductal gray (PAG) and adjacent tegmentum, and a pontine vocal pattern generator (Gruber-Dujardin 2010; Hage 2010a; 2010b). In addition to various subcortical limbic areas, the medial wall of the frontal lobes, namely, the cingulate vocalization region and adjacent neocortical areas, also projects to the PAG. This region, presumably, controls higher-order motor aspects of vocalization such as operant call conditioning (e.g., Trachy et al. 1981). By contrast, the acoustic implementation of the sound structure of spoken language is bound to a cerebral circuit including the ventrolateral/insular aspects of the language-dominant frontal lobe and the primary sensorimotor cortex, the basal ganglia, and cerebellar structures in either hemisphere (Ackermann & Riecker 2010a; Ackermann & Ziegler 2010; Ackermann et al. 2010). Given the virtually complete speechlessness of nonhuman primates, the behavioral analogues of acoustic mammalian communication might not be sought within the domain of spoken language, but rather in the nonverbal affective vocalizations of our species such as laughing, crying, or moaning (Owren et al. 2011). Against this background, two separate neuroanatomic “channels” with different phylogenetic histories appear to participate in human acoustic communication, supporting nonverbal affective vocalizations and articulate speech, respectively (the “dual-pathway model” of human acoustic communication; see Ackermann 2008; Owren et al. 2011; for an earlier formulation, see Myers 1976).

Human vocal expression of motivational states is not restricted to nonverbal affective displays, but deeply invades articulate speech. Thus, a speaker's arousal-related mood such as anger or joy shapes the “tone” of spoken language (emotive/affective speech prosody). Along with nonverbal affective vocalizations, emotive speech prosody has also been considered a behavioral trait homologous to the calls of nonhuman primates (Heilman et al. 2004; Jürgens 1986; 2002b; Jürgens & von Cramon 1982). [Footnote 1] Moreover, one's attitude towards a person and one's appraisal of a topic have a significant impact on the “speech melody” of verbal utterances (attitudinal prosody). Often these implicit aspects of acoustic communication – how we say something – are more relevant to a listener than propositional content, that is, what we say (e.g., Wildgruber et al. 2006). The timbre and intonational contour of a speaker's voice, the loudness fluctuations and the rhythmic structure of verbal utterances, including the variation of speaking rate and the local distinctness of articulation, represent the most salient acoustic correlates of affective and attitudinal prosody (Scherer 1986; Scherer et al. 2009; Sidtis & Van Lancker Sidtis 2003). Unlike the propositional content of the speech signal – which ultimately maps onto a digital code of discrete phonetic-linguistic categories – the prosodic modulation of verbal utterances conveys graded/analogue information on a speaker's motivational states and intentional composure (Burling 2005). Most importantly, activity of the same set of vocal tract muscles and a single speech wave simultaneously convey both the propositional and emotional contents of spoken language. Hence, two information sources seated in separate brain networks and creating fundamentally different data structures (analogue versus digital) contribute simultaneously to the formation of the speech signal. Therefore, the two channels must coordinate at some level of the central nervous system. Otherwise these two inputs would distort and corrupt each other. So far, dual-pathway models of human acoustic communication have not specified the functional mechanisms and neuroanatomic pathways that participate in the generation of a speech signal with “intimately intertwined linguistic and expressive cues” (Scherer et al. 2009, p. 446; see also Banse & Scherer 1996, p. 618). This deep entrenchment of articulate speech into a “nonverbal matrix” has been assumed to represent “the weakest point of gestural theories” of language evolution (Ingold 1994, p. 302).

Within the vocal domain, Parkinson's disease (PD) – a paradigmatic dysfunction of dopamine neurotransmission at the level of the striatal component of the basal ganglia – predominantly gives rise to a disruption of prosodic aspects of verbal utterances. Thus, the “addition of prosodic contour” to articulate speech appears to depend on the integrity of the striatum (Darkins et al. 1988; see Van Lancker Sidtis et al. 2006). Against this background, structural reorganization of the basal ganglia during hominin evolution may have been a pivotal prerequisite for the emergence of spoken language, providing a crucial phylogenetic link – at least at the motor level – between the vocalizations of our primate ancestors, on the one hand, and the volitional motor aspects of articulate speech, on the other. [Footnote 2]

Comparative molecular-genetic data corroborate this suggestion: First, certain mutations of the FOXP2 gene in humans give rise to developmental verbal dyspraxia. This disorder of spoken language, presumably, reflects impaired sequencing of orofacial movements in the absence of basic deficits of motor execution such as paresis of vocal tract muscles (Fisher et al. 2003; Fisher & Scharff 2009; Vargha-Khadem et al. 2005). Individuals affected with developmental verbal dyspraxia show a reduced volume of the striatum, the extent of which is correlated with the severity of nonverbal oral and speech motor impairments (Watkins et al. 2002b). [Footnote 3] Second, placement of two hominin-specific FOXP2 mutations into the mouse genome (“humanized Foxp2”) gives rise to distinct morphological changes at the cellular level of the cortico-striatal-thalamic circuits in these rodents (Enard 2011). However, verbal dyspraxia subsequent to FOXP2 mutations is characterized by a fundamentally different profile of speech motor deficits as compared to Parkinsonian dysarthria. The former resembles a communication disorder which, in adults, reflects damage to fronto-opercular cortex (i.e., inferior frontal/lower precentral gyrus) or the anterior insula of the language-dominant hemisphere (Ackermann & Riecker 2010b; Ziegler 2008).

To resolve this dilemma, we propose that ontogenetic speech acquisition depends on close interactions between the basal ganglia and their cortical targets, whereas mature verbal communication requires far less striatal processing capacity. This hypothesis predicts different speech motor deficits in perinatal dysfunctions of the basal ganglia as compared to the acquired dysarthria of PD patients. More specifically, basal ganglia disorders with an onset prior to speech acquisition should severely disrupt articulate speech rather than predominantly compromise the implementation of speech prosody.

1.3. Organization of this target article

The suggestion that structural refinement of cortico-striatal circuits – driven by human-specific mutations of the FOXP2 gene – represents a pivotal step towards the emergence of spoken language in our hominin ancestors eludes any direct experimental evaluation. Nevertheless, certain inferences on the role of the basal ganglia in speech motor control can be tested against the available clinical and functional-imaging data. As a first step, the neuroanatomical underpinnings of the vocal behavior of nonhuman primates are reviewed in section 2 – as a prerequisite to the subsequent investigation of the hypothesis that in our species this system conveys nonverbal information through affective vocalizations and emotive/attitudinal speech prosody (sect. 3). Based upon clinical and neurobiological data, section 4 then characterizes the differential contribution of the basal ganglia to spoken language at the levels of ontogenetic speech acquisition (sect. 4.2.1) and of mature articulate speech (sect. 4.2.2), and delineates a neurophysiological model of the participation of the striatum in verbal behavior. Finally, these data are put into a paleoanthropological perspective in section 5.

2. Acoustic communication in nonhuman primates: Behavioral variation and cerebral control

2.1. Structural malleability of vocal signals

2.1.1. Ontogenetic emergence of acoustic call morphology

The vocal repertoires of monkeys and apes encompass noise-like and harmonic components (Fig. 1A; De Waal 1988; Goodall 1986; Struhsaker 1967; Winter et al. 1966). Vocal signals of both categories vary considerably across individuals, because age, body size, and stamina influence vocal tract shape and tissue characteristics, for example, the distance between the lips and the larynx (Fischer et al. 2002; 2004; Fitch 1997; but see Rendall et al. 2005). However, experiments based on acoustic deprivation of squirrel monkeys (Saimiri sciureus) and cross-fostering of macaques and lesser apes revealed that call structure does not appear to depend in any significant manner on species-typical auditory input (Brockelman & Schilling 1984; Geissmann 1984; Hammerschmidt & Fischer 2008; Owren et al. 1992; 1993; Talmage-Riggs et al. 1972; Winter et al. 1973). Thus, ontogenetic modifications of acoustic structure may simply reflect maturation of the vocal apparatus, including “motor-training” effects (Hammerschmidt & Fischer 2008; Pistorio et al. 2006), or the influence of hormones related to social status (Roush & Snowdon 1994; 1999). In contrast, comprehension and usage of acoustic signals show considerably more malleability than acoustic structure both in juvenile and adult animals (Owren et al. 2011).

Figure 1A. Acoustic communication in nonhuman primates: Call structure.

A. Spectrograms (left-hand section of each panel) and power spectra (right-hand section in each) of two common rhesus monkey vocalizations, that is, a “coo” (left panel) and a “grunt” (right panel). Gray level of the spectrograms codes for spectral energy. Coo calls (left panel) are characterized by a harmonic structure, encompassing a fundamental frequency (F0, the lowest and darkest band) and several harmonics (H1 to Hn). Measures derived from the F0 contour provide robust criteria for a classification of periodic signals, for example, peak frequency (peakF; Hardus et al. 2009a). Onset F0 seems to be highly predictive for the shape of the intonation contour, indicating the implementation of a “vocal plan” prior to movement initiation (Miller et al. 2009a; 2009b). Grunts (right) represent short and noisy calls whose spectra include more energy in the lower frequency range and a rather flat energy distribution.
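The acoustic measures named in this caption can be illustrated with a small numerical sketch. The Python code below is not taken from the article: the sampling rate, frame length, synthetic signals, and all function names are assumptions chosen for illustration. It synthesizes a harmonic, coo-like signal (F0 plus weaker harmonics) and a noisy, grunt-like signal, computes their power spectra with a naive discrete Fourier transform, reads off an F0 contour with its onset and peak values, and compares the flatness of the two spectra. White noise stands in for the grunt and reproduces only the flat energy distribution, not the low-frequency emphasis of real grunts.

```python
import cmath
import math
import random

SAMPLE_RATE = 8000   # Hz; assumed value, not taken from the article
FRAME_LEN = 400      # 50 ms frame, giving 20 Hz frequency resolution


def harmonic_frame(f0, n_harmonics=5):
    """Synthetic coo-like frame: F0 plus weaker harmonics (amplitude 1/k)."""
    return [
        sum(math.sin(2 * math.pi * k * f0 * t / SAMPLE_RATE) / k
            for k in range(1, n_harmonics + 1))
        for t in range(FRAME_LEN)
    ]


def noisy_frame(seed=0):
    """Synthetic grunt-like frame: Gaussian white noise (a simplification)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(FRAME_LEN)]


def power_spectrum(frame):
    """Power at DFT bins 1 .. N/2 - 1 (naive DFT, DC bin skipped)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * b * t / n)
                for t, x in enumerate(frame))) ** 2
        for b in range(1, n // 2)
    ]


def strongest_frequency(spectrum):
    """Frequency (Hz) of the most energetic bin; equals F0 for these toy coos."""
    best = max(range(len(spectrum)), key=spectrum.__getitem__)
    return (best + 1) * SAMPLE_RATE / FRAME_LEN


def spectral_flatness(spectrum, eps=1e-12):
    """Geometric mean over arithmetic mean of the power values."""
    log_mean = sum(math.log(p + eps) for p in spectrum) / len(spectrum)
    return math.exp(log_mean) / (sum(spectrum) / len(spectrum))


# A toy coo whose F0 rises and falls across four consecutive frames.
f0_contour = [strongest_frequency(power_spectrum(harmonic_frame(f0)))
              for f0 in (400, 500, 600, 500)]
peak_f = max(f0_contour)   # peak of the F0 contour
onset_f0 = f0_contour[0]   # F0 of the first frame

coo_flatness = spectral_flatness(power_spectrum(harmonic_frame(500)))
grunt_flatness = spectral_flatness(power_spectrum(noisy_frame()))
```

In this toy setting the flatness value of the harmonic frame lies close to zero, because its energy is concentrated in a few narrow bands, whereas the noise frame yields a much higher value. Real recordings would call for windowing, a dedicated pitch tracker, and an FFT library rather than the naive transform used here.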

2.1.2. Spontaneous adult call plasticity: Convergence on and imitation of species-typical variants of vocal behavior

Despite innate acoustic call structures, the vocalizations of nonhuman primates may display some context-related variability in adulthood. For example, two populations of pygmy marmosets (Cebuella pygmaea) of different geographic origin displayed convergent shifts of spectral and durational call parameters (Elowson & Snowdon 1994; see further examples in Snowdon & Elowson 1999 and Rukstalis et al. 2003). Humans may also match their speaking styles inadvertently during conversation (“speech accommodation theory”; Burgoon et al. 2010; see Masataka [2008a; 2008b] for an example). Such accommodation effects could provide a basis for the changes in call morphology during social interactions in nonhuman primates (Fischer 2003; Mitani & Brandt 1994; Mitani & Gros-Louis 1998; Sugiura 1998). Subsequent reinforcement processes may give rise to “regional dialects” of primate species (Snowdon 2008). Rarely, even memory-based imitation capabilities have been observed in great apes: Thus, free-living chimpanzees were found to copy the distinctive intonational and rhythmic pattern of the pant hoots of other subjects – even after the animal providing the acoustic template had disappeared from the troop (Boesch & Boesch-Achermann 2000, pp. 234f). Whatever the precise mechanisms of vocal convergence, these phenomena are indicative of the operation of a neuronal feedback loop between auditory perception and vocalization in nonhuman primates (see Brumm et al. 2004).

A male bonobo infant (“Kanzi”) reared in an enriched social environment spontaneously augmented his species-typical repertoire by four “novel” vocalizations (Hopkins & Savage-Rumbaugh 1991). However, these newly acquired signals can be interpreted as scaled variants of a single intonation contour (Fig. 3 in Taglialatela et al. 2003). Since Pan paniscus has, to some degree, a graded rather than discrete call system (Bermejo & Omedes 1999; Clay & Zuberbühler 2009), new behavioral challenges could give rise to a differentiation of the available “vocal space” – indicating a potential to modulate call structures within the range of innate acoustic constraints rather than the ability to learn new vocal signals. An alternative interpretation is that hitherto un-deployed vocalizations were recruited under those conditions (Lemasson & Hausberger 2004; Lemasson et al. 2005).

2.1.3. Volitional initiation of vocal behavior and modulation of acoustic call structure

It has been a matter of debate for decades to what extent nonhuman primates are capable of volitional call initiation and modulation. A variety of behavioral studies seem to indicate both control over the timing of vocal output and the capacity to “decide” which acoustic signal to emit in a given context. First, at least two species of New World primates (tamarins, marmosets) discontinue acoustic communication during epochs of increased ambient noise in order to avoid signal interference and, therefore, to increase call detection probability (Egnor et al. 2007; Roy et al. 2011). In addition, callitrichid monkeys obey “conversational rules” and show response selectivity during vocal exchanges (Miller et al. 2009a; 2009b; but see Rukstalis et al. 2003: independent F0 onset change). Such observations were assumed to indicate some degree of volitional control over call production. As an alternative interpretation, these changes in vocal timing or loudness could simply reflect threshold effects of audio-vocal integration mechanisms. Second, several nonhuman primates produce acoustically different alarm vocalizations in response to distinct predator species, suggesting volitional access to call type (e.g., Seyfarth et al. 1980). Again, variation of motivational states could account for these findings. For example, the approach of an aerial predator could represent a much more threatening event than the presence of a snake. To some extent, even dynamic spectro-temporal features resembling the formant transients of the human acoustic speech signal (see sect. 4.1 below) appear to contribute to the differentiation of predator-specific alarm vocalizations (“leopard calls”) in Diana monkeys (Cercopithecus diana) (Riede & Zuberbühler 2003a; 2003b; see Lieberman [1968] for earlier data). Yet, computer models suggest that larynx lowering makes a critical contribution to these changes (Riede et al. 2005; 2006; see critical comments in Lieberman 2006b), thus eliciting in a receiver the impression of a bigger-than-real body size of the sender (Fitch 2000b; Fitch & Reby 2001). Diana monkeys may have learned this manoeuvre as a strategy to mob large predators, a behavior often observed in the wild (Zuberbühler & Jenny 2007).

The question of whether nonhuman primates are able to decouple their vocalizations from accompanying motivational states and to use them in a goal-directed manner has been addressed in several operant-conditioning experiments (Aitken & Wilson 1979; Coudé et al. 2011; Hage et al. 2013; Koda et al. 2007; Sutton et al. 1973; West & Larson 1995). In most of these studies, nonhuman primates learned to utter a vocalization in response to a food reward (e.g., Coudé et al. 2011; Koda et al. 2007). Rather than demonstrating the ability to volitionally vocalize on command, these studies merely confirm, essentially, that nonhuman primates produce adequate, motivationally based behavioral reactions to hedonic stimuli. A recent study found, however, that rhesus monkeys can be trained to produce different call types in response to arbitrary visual signals and that they are capable of switching between two distinct call types associated with different cues on a trial-to-trial basis (Hage et al. 2013). These observations indicate that the animals are able – within some limits – to volitionally initiate vocalizations and, therefore, are capable of instrumentalizing their vocal utterances in order to accomplish behavioral tasks successfully. Likewise, macaque monkeys may acquire control over loudness and duration of coo calls (Hage et al. 2013; Larson et al. 1973; Sutton et al. 1973; 1981; Trachy et al. 1981). A more recent investigation even reported spontaneous differentiation of coo calls in Japanese macaques with respect to peak and offset of the F0 contour during operant tool-use training (Hihara et al. 2003). Such accomplishments may, however, be explained by the adjustment of respiratory functions and do not conclusively imply operant control over spectro-temporal call structure in nonhuman primates (Janik & Slater 1997; 2000).

2.1.4. Observational acquisition of species-atypical sounds

Few instances of species-atypical vocalizations in nonhuman primates have been reported so far. Allegedly, the bonobo Kanzi, mentioned earlier, spontaneously acquired a few vocalizations resembling spoken words (Savage-Rumbaugh et al. 2004). Yet, systematic perceptual data substantiating these claims are not available. As further anecdotal evidence, Wich et al. (2009) reported that a captive-born female orangutan (Pongo pygmaeus × Pongo abelii) began to produce human-like whistles at an age of about 12 years in the absence of any training. Furthermore, an idiosyncratic pant hoot variant (“Bronx cheer” – resembling a sound called “blowing raspberries”) spread throughout a colony of several tens of captive chimpanzees after it had been introduced by a male joining the colony (Hopkins et al. 2007; Marshall et al. 1999; similar sounds have been observed in wild orangutans: Hardus et al. 2009a; 2009b; van Schaik et al. 2003; 2006). Remarkably, these two acoustic displays, “raspberries” and whistles, do not engage laryngeal sound-production mechanisms, but reflect a linguo-labial trill (“raspberries”) or arise from oral air-stream resonances (whistles). Thus, the species-atypical acoustic signals in nonhuman primates observed to date spare glottal mechanisms of sound generation. Apparently, laryngeal motor activity cannot be decoupled volitionally from species-typical audiovisual displays (Knight 1999).

2.2. Cerebral control of motor aspects of call production

2.2.1. Brainstem mechanisms (PAG and pontine vocal pattern generator)

Since operant conditioning of the calls of nonhuman primates is technically challenging (Pierce 1985), analyses of the neurobiological control mechanisms engaged in phonatory functions relied predominantly on electrical brain stimulation. In squirrel monkeys (Saimiri sciureus) – the species studied most extensively so far (Gonzalez-Lima 2010) – vocalizations could be elicited at many cerebral locations, extending from the forebrain to the lower brainstem. This network encompasses a variety of subcortical limbic structures such as the hypothalamus, septum, and amygdala (Fig. 1B; Brown 1915; Jürgens 2002b; Jürgens & Ploog 1970; Smith 1945). In mammals, all components of this highly conserved “communicating brain” (Newman 2003) appear to project to the periaqueductal grey (PAG) of the midbrain and the adjacent mesencephalic tegmentum (Gruber-Dujardin 2010; see Footnote 4). Based on the integration of input from motivation-controlling regions, sensory structures, motor areas, and arousal-related systems, the PAG seems to gate the vocal dimension of complex multi-modal emotional responses such as fear or aggression. The subsequent coordination of cranial nerve nuclei engaged in the innervation of vocal tract muscles depends on a network of brainstem structures, including, particularly, a vocal pattern generator bound to the ventrolateral pons (Hage 2010a; 2010b; Hage & Jürgens 2006).

Figure 1B. Acoustic Communication in nonhuman Primates: Cerebral Organization.

Cerebral “vocalization network” of the squirrel monkey (as a model of the primate-general “communication brain”). The solid lines represent the “vocal brainstem circuit” of the vocalization network and its modulatory cortical input (ACC), the dotted lines the strong connections of sensory cortical regions (AC, VC) and motivation-controlling limbic structures (Ac, Hy, Se, St) to this circuit.

Key: ACC = Anterior cingulate cortex; AC = Auditory cortex; Ac = Nucleus accumbens; Hy = Hypothalamus; LRF = Lateral reticular formation; NRA = Nucleus retroambigualis; PAG = periaqueductal gray; PB = brachium pontis; SC = superior colliculus; Se = Septum; St = Nucleus stria terminalis; VC = Visual cortex (Unpublished figure. See Jürgens 2002b and Hage 2010a; 2010b for further details).

2.2.2. Mesiofrontal cortex and higher-order aspects of vocal behavior

Electrical stimulation studies revealed that both New and Old World monkeys possess a “cingulate vocalization region” within the anterior cingulate cortex (ACC), adjacent to the anterior pole of the corpus callosum (Jürgens 2002b; Smith 1945; Vogt & Barbas 1988). Uni- and bilateral ACC ablation in macaques had, however, a minor and inconsistent impact on spontaneously uttered coo calls, but disrupted the vocalizations produced in response to an operant-conditioning task (Sutton et al. 1974; Trachy et al. 1981). Furthermore, damage to preSMA – a cortical area neighboring the ACC in dorsal direction and located rostral to the supplementary motor area (SMA proper) – resulted in significantly prolonged response latencies (Sutton et al. 1985). Comparable lesions in squirrel monkeys diminish the rate of spontaneous isolation peeps, but the acoustic structure of the produced calls remains undistorted (Kirzinger & Jürgens 1982). As a consequence, mesiofrontal cerebral structures appear to predominantly mediate calls driven by an animal's internal motivational milieu.

2.2.3. Ventrolateral frontal lobe and corticobulbar system

Both squirrel and rhesus monkeys possess a neocortical representation of internal and external laryngeal muscles in the ventrolateral part of premotor cortex, bordering areas associated with orofacial structures, namely, tongue, lips, and jaw (Fig. 1 in Hast et al. 1974; Jürgens 1974; Simonyan & Jürgens 2002; 2005). Furthermore, vocalization-selective neuronal activity may arise at the level of the premotor cortex in macaques that are trained to respond with coo calls to food rewards (Coudé et al. 2011). Interestingly, premotor neural firing appears to occur only when the animals produce vocalizations in a specific learned context of food reward, but not under other conditions. Finally, a cytoarchitectonic homologue to Broca's area of our species has been found between the lower branch of the arcuate sulcus and the subcentral dimple just above the Sylvian fissure in Old World monkeys (Gil-da-Costa et al. 2006; Petrides & Pandya 2009; Petrides et al. 2005) and chimpanzees (Sherwood et al. 2003). Nevertheless, even bilateral damage to the ventrolateral aspects of the frontal lobes has no significant impact on the vocal behavior of monkeys (P. G. Aitken 1981; Jürgens et al. 1982; Myers 1976; Sutton et al. 1974). Electrical stimulation of these areas in nonhuman primates also failed to elicit overt acoustic responses, apart from a few instances of “slight grunts” obtained from chimpanzees (Bailey et al. 1950, pp. 334f, 355f). Therefore, spontaneous call production, at least, does not critically depend on the integrity of the cortical larynx representation (Ghazanfar & Rendall 2008; Simonyan & Jürgens 2005). Most likely, however, experimental lesions have not included the full extent or even the bulk of the Broca homologue of nonhuman primates as determined by recent cytoarchitectonic studies (Fig. 4 in Aitken 1981; Fig. 1 in Sutton et al. 1974). The role of this area in the control of vocal behavior in monkeys still remains to be clarified. Nonhuman primates appear endowed with a more elaborate cerebral organization of orofacial musculature as compared to the larynx, which, presumably, provides the basis for their relatively advanced orofacial imitation capabilities (Morecraft et al. 2001). As concerns the basal ganglia and the cerebellum, the lesion and stimulation studies available so far do not provide reliable evidence for a participation of these structures in the control of motor aspects of vocal behavior (Kirzinger 1985; Larson et al. 1978; Robinson 1967).

Prosimians and New World monkeys are endowed solely with polysynaptic corticobulbar projections to lower brainstem motoneurons (Sherwood 2005; Sherwood et al. 2005). By contrast, morphological and neurophysiological studies revealed direct connections of the precentral gyrus of Old World monkeys and chimpanzees to the cranial nerve nuclei engaged in the innervation of orofacial muscles (Jürgens & Alipour 2002; Kuypers 1958b; Morecraft et al. 2001) which, together with the aforementioned more elaborate cortical representation of orofacial structures, may contribute to the enhanced facial-expressive capabilities of anthropoid primates (Sherwood et al. 2005). Most importantly, the direct connections between motor cortex and nucleus (nu.) ambiguus appear restricted, even in chimpanzees, to a few fibers targeting its most rostral component (Kuypers 1958b), subserving the innervation of pharyngeal muscles via the ninth cranial nerve (Butler & Hodos 2005). By contrast, humans exhibit considerably more extensive monosynaptic cortical input to the motoneurons engaged in the innervation of the larynx – though still less dense than the projections to the facial and hypoglossal nuclei (Iwatsubo et al. 1990; Kuypers 1958a). In addition, functional imaging data point to a primary motor representation of human internal laryngeal muscles adjacent to the lips of the homunculus and spatially separated from the frontal larynx region of New and Old World monkeys (Brown et al. 2008; 2009; Bouchard et al. 2013). As a consequence, the monosynaptic elaboration of corticobulbar tracts during hominin evolution might have been associated with a refinement of vocal tract motor control at the cortical level (“Kuypers/Jürgens hypothesis”; Fitch et al. 2010; see Footnote 5).

2.3. Summary: Behavioral and neuroanatomic constraints of acoustic communication in nonhuman primates

The cerebral network controlling acoustic call structure in nonhuman primates centers around midbrain PAG (vocalization trigger) and a pontine vocal pattern generator (coordination of the muscles subserving call production). Furthermore, mesiofrontal cortex (ACC/adjacent preSMA) engages in higher-order aspects of vocal behavior such as conditioned responses. These circuits, apparently, do not allow for a decoupling of vocal fold motor activity from species-typical audio-visual displays (Knight 1999). The resulting inability to combine laryngeal and orofacial gestures into novel movement sequences appears to preclude nonhuman primates from mastering even the simplest speech-like utterances, despite extensive vocal repertoires and a high versatility of their lips and tongue. At best, modification of acoustic call structure is restricted to the “variability space” of innate call inventories, bound to motivational or hedonistic triggers, and confined to intonational, durational, and loudness parameters, that is, signal properties homologous to prosodic aspects of human spoken language.

3. Contributions of the primate-general “limbic communicating brain” to human vocal behavior

The dual-pathway model of human acoustic communication predicts the “limbic communication system” of the brain of nonhuman primates to support the production of affective vocalizations such as laughing, crying, and moaning in our species. In addition, this network might engage in the emotive-prosodic modulation of spoken language. More specifically, ACC and/or PAG could provide a platform for the addition of graded, that is, analogue information on a speaker's motivational states and intentional composure to the speech signal. This suggestion has so far not been thoroughly tested against the available clinical data.

3.1. Brainstem mechanisms of speech production

Ultimately, all cerebral control mechanisms steering vocal tract movements converge on the same set of cranial nerve nuclei. Damage to this final common pathway, therefore, must disrupt both verbal and nonverbal aspects of human acoustic communication. By contrast, clinical observations in patients with bilateral lesions of the fronto-parietal operculum and/or the adjacent white matter point to the existence of separate voluntary and emotional motor systems at the supranuclear level (Groswasser et al. 1988; Mao et al. 1989). However, these data do not further specify the course of the “affective-vocal motor system” and, more specifically, the role of the PAG, a major component of the primate-general “limbic communication system” (Lamendella 1977).

According to the dual-pathway model, the cerebral network supporting affective aspects of acoustic communication in our species must include the PAG, but bypass the corticobulbar tracts engaged in articulate speech. Isolated damage to this midbrain structure, thus, should selectively compromise the vocal expression of emotional/motivational states and spare the sound structure of verbal utterances. Yet, lesion data – though still sparse – are at variance with this suggestion. Acquired midbrain lesions restricted to the PAG completely interrupt both channels of acoustic communication, giving rise to the syndrome of akinetic mutism (Esposito et al. 1999). Moreover, comparative electromyographic (EMG) data obtained from cats and humans also indicate that the sound production circuitry of the PAG is recruited not only for nonverbal affective vocalizations, but also during speaking (Davis et al. 1996; Zhang et al. 1994). Likewise, a more recent positron emission tomography (PET) study revealed significant activation of this midbrain component during talking in a voiced as compared to a whispered speaking mode (Schulz et al. 2005).

Conceivably, the PAG contributes to the recruitment of central pattern generators of the brainstem. Besides the control of stereotyped behavioral activities such as breathing, chewing, swallowing, or yawning, these oscillatory mechanisms might, eventually, be entrained by superordinate functional systems as well (Grillner 1991; Grillner & Wallén 2004). During speech production, such brainstem networks could be instrumental in the regulation of highly adaptive sensorimotor operations during the course of verbal utterances. Examples include the control of inspiratory and expiratory muscle activation patterns in response to continuously changing biomechanical forces and the regulation of vocal fold tension following subtle alterations of subglottal pressure (see, e.g., Lund & Kolta 2006). From this perspective, damage to the PAG would interrupt the recruitment of basic adaptive brainstem mechanisms relevant for speech production and, ultimately, cause mutism. However, the crucial assumption of this explanatory model – spoken language engages phylogenetically older, though eventually reorganized, brainstem circuits – remains to be substantiated (Moore 2004; Schulz et al. 2005; Smith 2010).

3.2. Recruitment of mesiofrontal cortex during verbal communication

3.2.1. Anterior cingulate cortex (ACC)

There is some evidence that, similar to nonhuman primates, the ACC is a mediator of emotional/motivational acoustic expression in humans as well (see sect. 2.2.2). A clinical example is frontal lobe epilepsy, a syndrome characterized by involuntary and stereotyped bursts of laughter (“gelastic seizures”; Wild et al. 2003) that lack any concomitant adequate emotions (Arroyo et al. 1993; Chassagnon et al. 2003; Iannetti et al. 1997; Iwasa et al. 2002). The cingulate gyrus appears to be the most commonly disrupted site based on lesion surveys of gelastic seizure patients (Kovac et al. 2009). This suggestion was further corroborated by a recent case study in which electrical stimulation of the right-hemisphere ACC rostral to the genu of the corpus callosum elicited uncontrollable, but natural-sounding laughter – in the absence of merriment (Sperli et al. 2006). Conceivably, a homologue of the vocalization center of nonhuman primates bound to rostral ACC may underlie stereotyped motor patterns associated with emotional vocalizations in humans.

Does the ACC participate in speaking as well? Based on an early PET study, “two distinct speech-related regions in the human anterior cingulate cortex” were proposed, the more anterior of which was considered to be homologous to the cingulate vocalization center of nonhuman primates (Paus et al. 1996, p. 213). A recent and more focused functional imaging experiment by Loucks et al. (2007) failed to substantiate this claim. However, this investigation was based on rather artificial phonation tasks involving prolonged and repetitive vowel productions which do not allow for an evaluation of the specific role of the ACC in the mediation of emotional aspects of speaking. In another study, Schulz et al. (2005) required participants to recount a story in a voiced and a whispered speaking mode and demonstrated enhanced hemodynamic activation during the voiced condition in a region homologous to the cingulate vocalization center, but much larger responses emerged in contiguous neocortical areas of medial prefrontal cortex. It remains unclear, however, how the observed activation differences between voiced and whispered utterances should be interpreted, since both of these phonation modes require specific laryngeal muscle activity. One investigation explicitly aimed at a further elucidation of the role of medial prefrontal cortex in motivational aspects of speech production by analyzing the covariation of induced emotive prosody with blood oxygen level dependent (BOLD) signal changes as measured by functional magnetic resonance imaging (fMRI; Barrett et al. 2004). Affect-related pitch variation was found to be associated with supracallosal rather than pregeniculate hemodynamic activation. However, the observed response modulation may have been related to changes in the induced emotional states rather than pitch control. On the whole, the available functional imaging data do not provide conclusive support for the hypothesis that the prosodic modulation of verbal utterances critically depends on the ACC.

The results of lesion studies are similarly inconclusive. Bilateral ACC damage due to cerebrovascular disorders or tumours has been reported to cause a syndrome of akinetic mutism (Brown 1988; for a review, see Ackermann & Ziegler 1995). Early case studies found the behavioral deficits to extend beyond verbal and nonverbal acoustic communication: Apparently vigilant subjects with normal muscle tone and deep tendon reflexes displayed diminished or abolished spontaneous body movements, delayed or absent reactions to external stimuli, and impaired autonomic functions (e.g., Barris & Schuman 1953). By contrast, bilateral surgical resection of the ACC (cingulectomy), performed most often in patients suffering from medically intractable pain or psychiatric diseases, failed to significantly compromise acoustic communication (Brotis et al. 2009). The complex functional-neuroanatomic architecture of the anterior mesiofrontal cortex hampers, however, any straightforward interpretation of these clinical data. In monkeys, the cingulate sulcus encompasses two or even three distinct “cingulate motor areas” (CMAs), which project to the supplementary motor area (SMA), among other regions (Dum & Strick 2002; Morecraft & van Hoesen 1992; Morecraft et al. 2001). Humans exhibit a similar compartmentalization of the medial wall of the frontal lobes (Fink et al. 1997; Picard & Strick 1996). A closer look at the aforementioned surgical data reveals that bilateral cingulectomy for treatment of psychiatric disorders, as a rule, did not encroach on caudal ACC (Le Beau 1954; Whitty 1955; for a review, see Brotis et al. 2009, p. 276). Thus, tissue removal restricted to rostral ACC components could explain the relatively minor effects of this surgical approach (see Footnote 6). Conceivably, mesiofrontal akinetic mutism reflects bilateral damage to the caudal CMA and/or its efferent projections, rather than dysfunction of a “cingulate vocalization center” bound to rostral ACC. Instead, the anterior mesiofrontal cortex has been assumed to contribute to reward-dependent selection/inhibition of verbal responses in conflict situations rather than to motor aspects of speaking (Calzavara et al. 2007; Paus 2001). This interpretation is compatible with the fact that psychiatric conditions bound to ACC pathology such as obsessive-compulsive disorder or Tourette syndrome cause, among other things, socially inappropriate vocal behavior (Müller-Vahl et al. 2009; Radua et al. 2010; Seeley 2008).

3.2.2. Supplementary motor area (SMA)

Damage to the SMA in the language-dominant hemisphere may give rise to diminished spontaneous speech production, characterized by delayed, brief, and dysfluent, but otherwise well-articulated verbal responses without any central-motor disorders of vocal tract muscles or impairments of other language functions such as speech comprehension or reading aloud (“transcortical motor aphasia”; for a review of the earlier literature, see Jonas 1981; 1987; more recent case studies in Ackermann et al. 1996 and Ziegler et al. 1997; see Footnote 7). This constellation may arise from initial mutism via an intermediate stage of silent word mouthing (Rubens 1975) or whispered speaking (Jürgens & von Cramon 1982; Masdeu et al. 1978; Watson et al. 1986). Based on these clinical observations, the SMA, apparently, supports the initiation (“starting mechanism”) and maintenance of vocal tract activities during speech production (Botez & Barbeau 1971; Jonas 1981). Indeed, movement-related potentials preceding self-paced tongue protrusions and vocalizations were recorded over the SMA (Bereitschaftspotential; Ikeda et al. 1992). Calculation of the time course of BOLD signal changes during syllable repetition tasks, preceded by a warning stimulus, revealed an earlier peak of the SMA response relative to primary sensorimotor cortex (Brendel et al. 2010). These data corroborate the suggestion – based on clinical data – of an engagement of the SMA in the preparation and initiation of verbal utterances, that is, pre-articulatory control processes.

3.3. Summary: Role of the primate-general “limbic communication system” in human vocal behavior

In line with the dual-pathway model of human acoustic communication, the ACC seems to participate in the release of stereotyped motor patterns of affective-vocal displays, even in the absence of an adequate emotional state. Whether this mesiofrontal area also contributes to the control of laryngeal muscles during speech production still remains to be established. An adjacent region, the neocortical SMA, appears, however, to participate in the preparation and initiation of articulate speech. Midbrain PAG also supports spoken language and, presumably, helps to recruit ancient brainstem circuitries which have been reorganized to subserve basic adaptive sensorimotor functions bound to verbal behavior.

4. Contribution of the basal ganglia to spoken language: Vocal-affective expression and acquisition of articulate speech

The basal ganglia represent an ensemble of subcortical gray matter structures of a rather conserved connectional architecture across vertebrate taxa, including the striatum (caudate nucleus and putamen), the external and internal segments of the globus pallidus, the subthalamic nucleus, and the substantia nigra (Butler & Hodos 2005; Nieuwenhuys et al. 2008). Clinical and functional imaging data indicate a significant engagement of the striatum both in ontogenetic speech acquisition and subsequent overlearned speech motor control. We propose, however, a fundamentally different role of the basal ganglia at these two developmental stages: The entrainment of articulatory vocal tract motor patterns during childhood versus the emotive-prosodic modulation of verbal utterances in the adult motor system.

4.1. Facets of the faculty of speaking: The recruitment of the larynx as an articulatory organ

The production of spoken language depends upon “more muscle fibers than any other human mechanical performance” (Kent et al. 2000, p. 273), and the responsible neural control mechanisms must steer all components of this complex action system at a high spatial and temporal accuracy. As a basic constituent, the larynx – a highly efficient sound source – generates harmonic signals whose spectral shape can be modified through movements of the mandible, tongue, and lips (Figs. 2A & 2B). Yet, this physical source-filter principle is not exclusively bound to human speech, but characterizes the vocal behavior of other mammals as well (Fitch 2000a). By contrast to the acoustic communication of nonhuman primates, spoken language depends, however, on a highly articulated larynx whose motor activities must be integrated with the gestures of equally articulated supralaryngeal structures into learned complex vocal tract movement patterns (Fig. 2C). For example, virtually all languages of the world differentiate between voiced and voiceless sounds (e.g., /b/ vs. /p/ or /d/ vs. /t/), a distinction which requires fast and precise laryngeal manoeuvres and a close interaction of the larynx – at a time-scale of tens of milliseconds – with the tongue or lips (Hirose 2010; Munhall & Löfqvist 1992; Weismer 1980). During voiced portions, moreover, the melodic line of the speech signal is modulated in a language-specific meaningful way to implement the intonation patterns inherent to a speaker's native idiom or, in tone languages such as Mandarin, to create different tonal variants of spoken syllables.

Figure 2. Vocal tract mechanisms of speech sound production.

A. Source-filter theory of speech production (Fant 1970). Modulation of expiratory air flow at the levels of the vocal folds and supralaryngeal structures (pharynx, velum, tongue, and lips) gives rise to most speech sounds across human languages (Ladefoged 2005). In case of vowels and voiced consonants, the adducted vocal folds generate a laryngeal source signal with a harmonic spectrum U(s), which is then filtered by the resonance characteristics of the supralaryngeal cavities T(s) and the vocal tract radiation function R(s). As a consequence, these sounds encompass distinct patterns of peaks and troughs (formant structure; P(s)) across their spectral energy distribution.

B. Consonants are produced by constricting the vocal tract at distinct locations (a), for example, through occlusion of the oral cavity at the alveolar ridge of the upper jaw by the tongue tip for /d/, /t/, or /n/ (insert of left panel: T/B = tip/body of the tongue, U/L = upper/lower lips, J = lower jaw with teeth). Such manoeuvres give rise to distinct up- and downward shifts of formants: Right panels show the formant transients of /da/ as a spectrogram (b) and a schematic display (c); dashed lines indicate formant transients of syllable /ba/ (figures adapted from Kent & Read 2002).

C. Schematic display of the gestural architecture of articulate speech, exemplified for the word speaking. Consonant articulation is based on distinct movements of lips, tongue, velum, and vocal folds, phase-locked to more global and slower deformations of the vocal tract (VT) associated with vowel production. Articulatory gestures are assorted into syllabic units, and gesture bundles pertaining to strong and weak syllables are rhythmically patterned to form metrical feet. Note that laryngeal activity in terms of glottal opening movements (bottom line) is a crucial part of the gestural patterning of spoken words and must be adjusted to and sequenced with other vocal tract movements in a precise manner (Ziegler 2010).
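
The source-filter relation described in panel A can be written compactly. The caption names the four spectral terms but does not state the formula itself, so the expression below is a sketch that assumes the standard linear formulation of the theory, in which source, vocal tract transfer function, and radiation characteristic combine multiplicatively in the frequency domain:

```latex
% Assumed standard form of the source-filter relation (not stated explicitly in the caption):
% output spectrum = source spectrum x vocal tract transfer function x radiation characteristic
P(s) = U(s)\, T(s)\, R(s)
% On a logarithmic (dB) amplitude scale the product becomes a sum:
20\log_{10}\lvert P(s)\rvert = 20\log_{10}\lvert U(s)\rvert + 20\log_{10}\lvert T(s)\rvert + 20\log_{10}\lvert R(s)\rvert
```

Under this assumption, the formant peaks of P(s) stem from the resonances of T(s), whereas the harmonic fine structure is inherited from the laryngeal source U(s).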

Figure 3. Structural and functional compartmentalization of the basal ganglia.

A. Schematic illustration of the – at least – tripartite functional subdivision of the cortico-basal ganglia–thalamo–cortical circuitry. Motor, cognitive/associative, and limbic loops are depicted in different gray shades, and the two cross-sections of the striatum (center) delineate the limbic, cognitive/associative, and motor compartments of the basal ganglia input nuclei. Alternating reciprocal (e.g., 1–1) and non-reciprocal loops (e.g., subsequent trajectory 2) form a spiraling cascade of dopaminergic projections interconnecting these parallel reentrant circuits (modified Fig. 2.3.5. from Haber 2010b).

B. Within the basal ganglia, the motor loop segregates into at least three pathways: a direct (striatum – SNr/GPi), an indirect (striatum – GPe – SNr/GPi), and a hyperdirect (via STN) circuit (based on Fig. 1 in Nambu 2011 and Fig. 25.1 in Walters & Bergstrom 2010). The direct and indirect medium-sized spiny projection neurons of the striatum (MSN) differ in their patterns of receptor and peptide expression (direct pathway: D1-type dopamine receptors, SP = substance P; indirect pathway: D2, ENK = enkephalin) rather than their somatodendritic architecture.

Key: DA = dopamine; GPi/GPe = internal/external segment of globus pallidus; SNr = substantia nigra, pars reticulata; SNc = substantia nigra, pars compacta; VTA = ventral tegmental area; STN = subthalamic nucleus; SC = superior colliculus; PPN = pedunculopontine nucleus; PAG = periaqueductal gray.

Figure 4. Cerebral network supporting the integration of primate-general (gray arrows) and human-specific aspects of acoustic communication (black).

Clinical and functional-imaging observations indicate the “motor execution level” of speech production, that is, the adjustment of speed and range of coordinated vocal tract gestures, to depend upon lower primary sensorimotor cortex and its efferent pathways, the cranial nerve nuclei, the thalamus, the cerebellum – and the basal ganglia (Ackermann & Ziegler 2010; Ackermann & Riecker 2010a; Ackermann et al. 2010). More specifically, distributed and overlapping representations of the lips, tongue, jaw, and larynx within the ventral sensorimotor cortex of the dominant hemisphere generate, during speech production, dynamic activation patterns reflecting the gestural organization of spoken syllables (Bouchard et al. 2013). Furthermore, it is assumed that the left anterior peri- and subsylvian cortex houses hierarchically “higher” speech-motor-planning information in the adult brain required to orchestrate the motor execution organs during the production of syllables and words (see Fig. 2C for an illustration; Ziegler 2008; Ziegler et al. 2012). Hence, ontogenetic speech acquisition can be understood as a long-term entrainment of patterned activities of the vocal tract organs and – based upon practice-related plasticity mechanisms – the formation of a speech motor network which subserves this motor skill with ease and precision. In the following sections we argue that the basal ganglia play a key role in this motor-learning process and in the progressive assembly of laryngeal and supralaryngeal gestures into “motor plans” for syllables and words. In the mature system, this “motor knowledge” gets stored within ventrolateral aspects of the left-hemisphere frontal lobe, while the basal ganglia are, by and large, restricted to a fundamentally different role, that is, the mediation of motivational and emotional-affective drive into the speech motor system.

4.2. Developmental shifts in the contribution of the basal ganglia to speech production

4.2.1. The impact of pre- and perinatal striatal dysfunctions on spoken language

Insight into the potential contributions of the basal ganglia to human speech acquisition can be obtained from damage to these nuclei at a prelinguistic age. Distinct mutations of mitochondrial or nuclear DNA may give rise to infantile bilateral striatal necrosis, a constellation largely restricted to this basal ganglia component (Basel-Vanagaite et al. 2006; De Meirleir et al. 1995; Kim et al. 2010; Solano et al. 2003; Thyagarajan et al. 1995). At least two variants, both of them point mutations of the mitochondrial ATPase 6 gene, were associated with impaired speech learning capabilities (De Meirleir et al. 1995: “speech delayed for age”; Thyagarajan et al. 1995, case 1: “no useful language at age 3 years”). As a further clinical paradigm, birth asphyxia may predominantly impact the basal ganglia and the thalamus (eventually, in addition, the brainstem) under specific conditions such as uterine rupture or umbilical cord prolapse, while the cerebral cortex and the underlying white matter are less affected (Roland et al. 1998). A clinical study found nine children out of a group of 17 subjects with this syndrome completely unable to produce any verbal utterances at the ages of 2 to 9 years (Krägeloh-Mann et al. 2002). Six further patients showed significantly compromised articulatory functions (“dysarthria”). Most importantly, five children had not mastered adequate articulate speech at the ages of 3 to 12 years, though lesions were confined to the putamen and ventro-lateral thalamus, sparing the caudate nucleus and the precentral gyrus.

Data from a severe developmental speech or language disorder of monogenic autosomal-dominant inheritance with full penetrance extending across several generations of a large family provide further evidence of a connection between the basal ganglia and ontogenetic speech acquisition (KE family; Hurst et al. 1990). At first considered a highly selective inability to acquire particular grammatical rules (Gopnik 1990a; for more details, see Taylor 2009), extensive neuropsychological evaluations revealed a broader phenotype of psycholinguistic dysfunctions, including nonverbal aspects of intelligence (Vargha-Khadem & Passingham 1990; Vargha-Khadem et al. 1995; Watkins et al. 2002a). However, the most salient behavioral deficit in the afflicted individuals consists of pronounced abnormalities of speech articulation (“developmental verbal dyspraxia”) that render spoken language “of many of the affected members unintelligible to the naive listener” (Vargha-Khadem et al. 1995, p. 930; see also Fee 1995; Shriberg et al. 1997). Furthermore, the speech disorder was found to compromise voluntary control of nonverbal vocal tract movements (Vargha-Khadem et al. 2005). More specifically, the phenotype includes a significant disruption of simultaneous or sequential sets of motor activities to command, in spite of a preserved motility of single vocal tract organs (Alcock et al. 2000a) and uncompromised reproduction of tones and melodies (Alcock et al. 2000b).

A heterozygous point mutation (G-to-A nucleotide transition) of the FOXP2 gene (located on chromosome 7; coding for a transcription factor) could be detected as the underlying cause of the behavioral disorder (for a review, see Fisher et al. 2003; see Footnote 8). Volumetric analyses of striatal nuclei revealed bilateral volume reduction in the afflicted family members, the extent of which was correlated with oral-motor impairments (Watkins et al. 2002b). Mice and humans share all but three amino acids in the FOXP2 protein, suggesting a high conservation of the respective gene across mammals (Enard et al. 2002; Zhang et al. 2002). Furthermore, two of the three substitutions must have emerged within our hominin ancestors after separation from the chimpanzee lineage. Since primates lacking the human FOXP2 variant cannot even imitate the simplest speech-like utterances, and since disruption of this gene in humans gives rise to severe articulatory deficits, it appears warranted to assume that the human variant of this gene locus represents a necessary prerequisite for the phylogenetic emergence of articulate speech. Most noteworthy, animal experimentation suggests that the human-specific copy of this gene is related to acoustic communication (Enard et al. 2009) and directly influences the dendritic architecture of the neurons embedded into cortico-basal ganglia–thalamo–cortical circuits (Reimers-Kipping et al. 2011, p. 82).

4.2.2. Motor aprosodia in Parkinson's disease

A loss of midbrain neurons within the substantia nigra pars compacta (SNc) represents the pathophysiological hallmark of Parkinson's disease (PD; idiopathic Parkinsonian syndrome), one of the most common neurodegenerative disorders (Evatt et al. 2002; Wichmann & DeLong 2007). This degenerative process results in a depletion of the neurotransmitter dopamine at the level of the striatum, rendering PD a model of dopaminergic dysfunction of the basal ganglia, characterized within the motor domain by akinesia (bradykinesia, hypokinesia), rigidity, tremor at rest, and postural instability (Jankovic 2008; Marsden 1982). In advanced stages, functionally relevant morphological changes of striatal projection neurons may emerge (Deutch et al. 2007; see Mallet et al. [2006] for other nondopaminergic PD pathomechanisms). Recent studies suggest that the disease process develops first in extranigral brainstem regions such as the dorsal motor nucleus of the glossopharyngeal and vagal nerves (Braak et al. 2003). These initial lesions affect the autonomic-vegetative nervous system, but do not encroach on gray matter structures engaged in the control of vocal tract movements such as the nu. ambiguus.

A classical tenet of speech pathology assumes that Parkinsonian speech/voice abnormalities reflect specific motor dysfunctions of vocal tract structures, giving rise to slowed and undershooting articulatory movements (brady-/hypokinesia). From this perspective, the perceived speech abnormalities of Parkinson's patients have been lumped together into a syndrome termed “hypokinetic dysarthria” (Duffy 2005). Unlike in other cerebral disorders, systematic auditory-perceptual studies and acoustic measurements identified laryngeal signs such as monotonous pitch, reduced loudness, and breathy/harsh voice quality as the most salient abnormalities in PD (Logemann et al. 1978; Ho et al. 1999a; 1999b; Skodda et al. 2009; 2011; see Footnote 9). Imprecise articulation appears, by contrast, to be bound to later stages of the disease. In line with these suggestions, attempts to document impaired orofacial movement execution, especially hypometric (“undershooting”) gestures during speech production, yielded inconsistent results (Ackermann et al. 1997a). Moreover, a retrospective study based on a large sample of postmortem-confirmed cases found that PD patients predominantly display “hypophonic/monotonous speech,” whereas atypical Parkinsonian disorders (APDs) such as multiple system atrophy or progressive supranuclear palsy result in “imprecise or slurred articulation” (Müller et al. 2001). As a consequence, Müller et al. assume the articulatory deficits of APD to reflect non-dopaminergic dysfunctions of brainstem or cerebellar structures.

Much like early PD, ischemic infarctions restricted to the putamen primarily give rise to hypophonia as the most salient speech motor disorder (Giroud et al. 1997). In its extreme, a more or less complete loss of prosodic modulation of verbal utterances (“expressive or motor aprosodia”) has been observed following cerebrovascular damage to the basal ganglia (Cohen et al. 1994; Van Lancker Sidtis et al. 2006; see Footnote 10). These specific aspects of speech motor disorders in PD or after striatal infarctions suggest a unique role of the basal ganglia in supporting spoken language production in that the resulting dysarthria might primarily reflect a diminished impact of motivational, affective/emotional, and attitudinal states on the execution of speech movements, leading to diminished motor activity at the laryngeal rather than the supralaryngeal level. Similar to other motor domains, thus, the degree of speech deficits in PD appears sensitive to “the emotional state of the patient” (Jankovic 2008), which, among other things, provides a physiological basis for motivation-related approaches to therapeutic regimens such as the Lee Silverman Voice Treatment (LSVT; Ramig et al. 2004; 2007). This general loss of “motor drive” at the level of the speech motor system and the predominant disruption of emotive speech prosody suggest that the intrusion of emotional/affective tone into the volitional motor mechanisms of speaking depends on a dopaminergic striatal “limbic-motor interface” (Mogenson et al. 1980).

4.3. Dual contribution of the striatum to spoken language: A neurophysiological model

4.3.1. Dopamine-dependent interactions between the limbic and motor loops of the basal ganglia during mature speech production

In mammals, nearly all cortical areas as well as several thalamic nuclei send excitatory, glutamatergic afferents to the striatum. This major input structure of the basal ganglia is assumed to segregate into the caudate-putamen complex, the ventral striatum with the nucleus accumbens as its major constituent, and the striatal elements of the olfactory tubercle (e.g., Voorn et al. 2004). Animal experimentation shows these basal ganglia subcomponents to be embedded into a series of parallel reentrant cortico-subcortico-cortical loops (Fig. 3A; Alexander et al. 1990; DeLong & Wichmann 2007; Nakano 2000). Several frontal zones, including primary motor cortex, SMA, and lateral premotor areas, target the putamen, which then projects back via basal ganglia output nuclei and thalamic relay stations to the respective areas of origin (motor circuit). By contrast, cognitive functions relate primarily to connections of prefrontal cortex with the caudate nucleus, and affective states to limbic components of the basal ganglia (ventral striatum). Functional imaging data obtained in humans are consistent with such an at least tripartite division of the basal ganglia (Postuma & Dagher 2006) and point to a distinct representation of foot, hand, face, and eye movements within the motor circuit (Gerardin et al. 2003). Furthermore, the second basal ganglia output nucleus, the substantia nigra pars reticulata (SNr), projects to several hindbrain “motor centers,” for example, PAG, giving rise to several phylogenetically old subcortical basal ganglia–brainstem–thalamic circuits (McHaffie et al. 2005). A brainstem loop traversing the PAG could participate in the recruitment of phylogenetically ancient vocal brainstem mechanisms during speech production (see sect. 3.1; Hikosaka 2007).

The suggestion of parallel cortico-basal ganglia–thalamo–cortical circuits does not necessarily imply strict segregation of information flow. To the contrary, connectional links between these networks are assumed to be a basis for integrative data processing (Joel & Weiner 1994; Nambu 2011; Parent & Hazrati 1995). More specifically, antero- and retrograde fiber tracking techniques reveal a cascade of spiraling striato-nigro-striatal circuits, extending from ventromedial (limbic) via central (cognitive-associative) to dorsolateral (motor) components of the striatum (Fig. 3A; e.g., Haber et al. 2000; for reviews, see Haber 2010a; 2010b). This dopamine-dependent “cascading interconnectivity” provides a platform for a cross-talk between the different basal ganglia loops and may, therefore, allow emotional/motivational states to impact behavioral responses, including the affective-prosodic shaping of the sound structure of verbal utterances.

The massive cortico- and thalamostriatal glutamatergic (excitatory) projections to the basal ganglia input structures target the GABAergic (inhibitory) medium-sized spiny projection neurons (MSN) of the striatum. MSNs comprise roughly 95% of all the striatal cellular elements. Upon leaving the striatum, the axons of these neurons connect via either the “direct pathway” or the “indirect pathway” to the output nuclei of the basal ganglia (Fig. 3B; Albin et al. 1989; for a recent review, see Gerfen & Surmeier 2011; for critical comments, see, e.g., Graybiel 2005; Nambu 2008). In addition, several classes of interneurons and dopaminergic projection neurons impact the MSNs. Dopamine has a modulatory effect on the responsiveness of these cells to glutamatergic input, depending on the receptor subtype involved (David et al. 2005; Surmeier et al. 2010a; 2010b). Against this background, MSNs must be considered the most pivotal computational units of the basal ganglia that are “optimized for integrating multiple distinct inputs” (Kreitzer & Malenka 2008), including dopamine-dependent motivation-related information, conveyed via ventromedial–dorsolateral striatal pathways to those neurons. It is well established that midbrain dopaminergic neurons have a pivotal role within the context of classical/Pavlovian and operant/instrumental conditioning tasks (e.g., Schultz 2006; 2010). More specifically, unexpected benefits in association with a stimulus give rise to stereotypic short-latency/short-duration activity bursts of dopaminergic neurons which inform the brain on novel reward opportunities. Whereas, indeed, such brief responses cannot easily account for the impact of a speaker's mood such as anger or joy upon spoken language, other behavioral challenges, for example, longer-lasting changes in motivational state such as “appetite, hunger, satiation, behavioral excitation, aggression, mood, fatigue, desperation,” are assumed to give rise to more prolonged striatal dopamine release (Schultz 2007, p. 207). Moreover, the midbrain dopaminergic system is sensitive to the motivational condition of an animal during instrumental conditioning tasks (“motivation to work for a reward”; Satoh et al. 2003).

The dopamine-dependent impact of motivation-related information on MSNs provides a molecular basis for the influence of a speaker's actual mood and actual emotions on the speech control mechanisms bound to the basal ganglia motor loop. Consequently, depletion of striatal dopamine should deprive vocal behavior of the “energetic activation” (Robbins 2010) arising in the various cortical and subcortical limbic structures of the primate brain (see Fig. 1B). The different basic motivational states of our species – shared with other mammals – are bound to distinct cerebral networks (Panksepp 1998; 2010). For example, the “rage/anger” and “fear/anxiety” systems involve the amygdala, which, in turn, targets the ventromedial striatum. On the other hand, the cortico-striatal motor loop is engaged in the control of movement execution, namely, the specification of velocity and range of orofacial and laryngeal muscles. The basal ganglia have an ideal strategic position to translate the various arousal-related mood states (joy or anger) into their respective acoustic signatures by means of a dopaminergic cascade of spiraling striato-nigro-striatal circuits – via adjustments of vocal tract innervation patterns (“psychobiological push effects of vocal affect expression”; Banse & Scherer 1996; Scherer et al. 2009). In addition, spoken language may convey a speaker's attitude towards a person or topic (“attitudinal prosody”; Van Lancker Sidtis et al. 2006). Such higher-order communicative functions of speech prosody involve a more extensive appraisal of the context of a conversation and may exploit learned stylistic (ritualized) acoustic models of vocal-expressive behavior (Scherer 1986; Scherer et al. 2009). Besides subcortical limbic structures and orbitofrontal areas, ACC projects to the ventral striatum in monkeys (Haber et al. 1995; Kunishio & Haber 1994; Öngür & Price 2000). Since these mesiofrontal areas are assumed to operate as a platform of motivational-cognitive interactions subserving response evaluation (see above), the connections of ACC with the striatum, conceivably, engage in the implementation of attitudinal aspects of speech prosody (“sociolinguistic/sociocultural pull factors” as opposed to the “psychobiological push effects” referred to above; Banse & Scherer 1996; Scherer et al. 2009). Thus, both the psychobiological push and the sociocultural pull effects, ultimately, may converge on the ventral striatum, which then, presumably, funnels this information into the basal ganglia motor loops.

4.3.2. Integration of laryngeal and supralaryngeal articulatory gestures into speech motor programs during speech acquisition

The basal ganglia are involved in the development of stimulus-response associations, for example, Pavlovian conditioning (Schultz 2006), and the acquisition of stimulus-driven behavioral routines, such as habit formation (Wickens et al. 2007). Furthermore, striatal circuits are known to engage in motor skill refinement, another variant of procedural (nondeclarative) learning (see Footnote 11). For example, the basal ganglia input nuclei contribute to the development of “motor tricks” such as the control of a running wheel or the preservation of balance in rodents (Dang et al. 2006; Willuhn & Steiner 2008; Yin et al. 2009). Neuroimaging investigations and clinico-neuropsychological studies suggest that the basal ganglia contribute to motor skill learning in humans as well, though existing data are still ambiguous (e.g., Badgaiyan et al. 2007; Doya 2000; Doyon & Benali 2005; Kawashima et al. 2012; Packard & Knowlton 2002; Wu & Hallett 2005). The clinical observations referred to suggest that bilateral pre-/perinatal damage to the cortico-striatal-thalamic circuits gives rise to severe expressive developmental speech disorders which must be distinguished from the hypokinetic dysarthria syndrome seen in adult-onset basal ganglia disorders. Conceivably, thus, the primary control functions of these nuclei change across different stages of motor skill acquisition. In particular, the basal ganglia may primarily participate in the training phase preceding skill consolidation and automatization: The “engrams” shaping habitual behavior and the “programs” steering skilled movements, thus, may get stored in cortical areas rather than the basal ganglia (for references, see Graybiel 2008; Groenewegen 2003).

Yet, several functional imaging studies of upper-limb movement control failed to document a predominant contribution of the striatum to the early stages of motor sequence learning (Doyon & Benali 2005; Wu et al. 2004) or even revealed enhanced activation of the basal ganglia during overlearned task performance (Ungerleider et al. 2002) and, therefore, do not support this model. As a caveat, these experimental investigations may not provide an appropriate approach to the understanding of the neural basis of speech motor learning. Spoken language represents an outstanding “motor feat” in that its ontogenetic development starts early after or even prior to birth and extends over more than a decade. During this period, the specific movement patterns of an individual's native idiom are exercised more extensively than any other comparable motor sequences. A case similar to articulate speech can at most be made with educated musicians or athletes who have experienced extensive motor practice from early on over many years. In these subject groups, extended motor learning is known to induce structural adaptations of gray and white matter regions related to the level of motor accomplishments (Bengtsson et al. 2005; Gaser & Schlaug 2003). Such investigations into the mature neuroanatomic network of highly trained “motor experts” have revealed fronto-cortical and cerebellar regions (see Footnote 12) to be predominantly moulded by the effects of long-term motor learning with little or no evidence for any lasting changes at the level of the basal ganglia (e.g., Gaser & Schlaug 2003). Against this background, it might be conjectured that the basal ganglia engage primarily in early stages of speech acquisition but do not house the motor representations that ultimately convey the fast, error-resistant, and highly automated vocal tract movement patterns of adult speech. This may explain why pre-/perinatal dysfunctions of the basal ganglia have a disastrous impact on verbal communication and preclude the acquisition of speech motor skills.

How can the contribution of the basal ganglia to the assembly of vocal tract motor patterns during speech acquisition be delineated in neurophysiological terms? One important facet is that the laryngeal muscles should have gained a larger striatal representation in our species as compared to other primates. Humans are endowed with more extensive corticobulbar fiber systems, including monosynaptic connections, engaged in the control of glottal functions (see sect. 2.2.3 above; Iwatsubo et al. 1990; Kuypers 1958a). Furthermore, functional imaging data point to a significant primary-motor representation of human internal laryngeal muscles, spatially separated from the frontal “larynx region” of New and Old World monkeys (Brown et al. 2008; 2009). In contrast to other primates, therefore, a higher number of corticobulbar fibers target the nu. ambiguus. As a consequence, the laryngeal muscles should have a larger striatal representation in our species since the cortico-striatal fiber tracts consist, to a major extent, of axon collaterals of pyramidal tract neurons projecting to the spinal cord and the cranial nerve nuclei, including the nu. ambiguus (Gerfen & Bolam 2010; Reiner 2010). Apart from the nu. accumbens, electrical stimulation of striatal loci in monkeys, in fact, failed to elicit vocalizations. In the case of the nu. accumbens, however, the observed vocalizations most probably reflect evoked changes in the animals' internal motivational milieu rather than the excitation of motor pathways (Jürgens & Ploog 1970).

A more extensive striatal representation of laryngeal functions can be expected to enhance the coordination of these activities with the movements of supralaryngeal structures. Briefly, the dorsolateral striatum separates into two morphologically identical compartments of MSNs, which vary, however, in neurochemical markers and input/output connectivity (Graybiel 1990; for recent reviews, see Gerfen 2010; Gerfen & Bolam 2010). While the so-called striosomes (patches) are interconnected with limbic structures, the matrisomes (matrix) participate predominantly in sensorimotor functions. This matrix component creates an intricate pattern of divergent/convergent information flow. For example, primary-motor and somatosensory cortical representations of the same body part are connected with the same matrisomes of the ipsilateral putamen (Flaherty & Graybiel 1993). Conversely, the projections of a single cortical primary-motor or somatosensory area to the basal ganglia appear to “diverge to innervate a set of striatal matrisomes which in turn send outputs that reconverge on small, possibly homologous sites” in pallidal structures further downstream (Flaherty & Graybiel 1994, p. 608). Apparently, such a temporary segregation and subsequent re-integration of cortico-striatal input facilitates “lateral interactions” between striatal modules and, thereby, enhances sensorimotor learning processes.

Similar to other body parts, it must be expected that the extensive larynx-related cortico-striatal fiber tracts of our species feed into a complex divergence/convergence network within the basal ganglia as well. These lateral interactions between matrisomes bound to the various vocal tract structures might provide the structural basis supporting the early stages of ontogenetic speech acquisition. More specifically, a larger striatal representation of laryngeal muscles – split up into a multitude of matrisomes – could provide a platform for the tight integration of vocal fold movements into the gestural architecture of vocal tract motor patterns (Fig. 2C).

4.4. Summary: Basal ganglia mechanisms bound to the integration of primate-general and human-specific aspects of acoustic communication

Dopaminergic dysfunctions of the basal ganglia input nuclei in the adult brain predominantly disrupt the embedding of otherwise well-organized speech motor patterns into an adequate emotive- and attitudinal-prosodic context. Based upon these clinical data, we propose that the striatum adds affective-prosodic modulation to the sound structure of verbal utterances. More specifically, the dopamine-dependent cascading interconnectivity between the various basal ganglia loops allows for a cross-talk between the limbic system and mature speech motor control mechanisms. By contrast, bilateral pre-/perinatal damage to the striato-thalamic components of the basal ganglia motor loops may severely impair speech motor integration mechanisms, resulting in compromised spoken language acquisition or even anarthria. We assume that the striatum critically engages in the initial organization of “motor programs” during speech acquisition, whereas the highly automatized control units of mature speech production, that is, the implicit knowledge of “how syllables and words are pronounced,” are stored within anterior left-hemisphere peri-/subsylvian areas.

5. Paleoanthropological perspectives: A two-step phylogenetic/evolutionary scenario of the emergence of articulate speech

In a comparative view, the striatum appears to provide the platform on which a primate-general and, therefore, phylogenetically ancient layer of acoustic communication penetrates the neocortex-based motor system of spoken language production. Given the virtually complete speechlessness of nonhuman primates due to, especially, a limited role of laryngeal/supralaryngeal interactions during call production, structural elaboration of the cortico-basal ganglia–thalamic circuits should have occurred during hominin evolution. Recent molecular-genetic findings provide first specific evidence in support of this notion. More specifically, human-specific FOXP2 copies may have given rise to an elaboration of somatodendritic morphology of basal ganglia loops engaged in the assemblage of vocal tract movement sequences during early stages of articulate speech acquisition. We propose, however, that the assumed FOXP2-driven “vocal-laryngeal elaboration” of the cortico-striatal-thalamic motor loop should have been preceded by a fundamentally different phylogenetic-developmental process, that is, the emergence of monosynaptic corticobulbar tracts engaged in the innervation of the laryngeal muscles.

5.1. Monosynaptic elaboration of the corticobulbar tracts: Enhanced control over tonal and rhythmic characteristics of vocal behavior (Step 1)

In nonhuman primates the larynx functions as an energetically efficient sound source, but shows highly constrained, if any, volitional motor capabilities. Direct projections of the motor cortex to the nu. ambiguus (see sect. 2.2.3) should have endowed this organ in humans with the potential to serve as a more skillful musical organ and an articulator with a versatility similar to that of the lips and the tongue. Presumably, this first evolutionary step toward spoken language emerged independently of the presence of the human-specific FOXP2 transcription factor. Structural morphometric (Belton et al. 2003; Vargha-Khadem et al. 1998; Watkins et al. 1999; 2002b) and functional imaging studies (Liégeois et al. 2003) in affected KE family members demonstrate abnormalities of all components of the cerebral speech motor control system, except the brainstem targets of the corticobulbar tracts (cranial nerve nuclei, pontine gray) and the SMA (Fig. 4 in Vargha-Khadem et al. 2005; see footnote 13). As an alternative to FOXP2-dependent neural processes, the increase of monosynaptic elaboration of corticobulbar tracts within the primate order (see sect. 2.2.3) might reflect a “phylogenetic trend” (Jürgens & Alipour 2002) associated with brain volume enlargement. Thus, “evolutionary changes in brain size frequently go hand in hand with major changes in both structural and functional details” (Striedter 2005, p. 12). For example, absolute brain volume predicts – via a nonlinear function – the size of various cerebral components, ranging from the medulla to the forebrain (Finlay & Darlington 1995). The three- to four-fold enlargement of absolute brain size in our species relative to australopithecine forms (Falk 2007), therefore, might have driven this refinement of laryngeal control – concomitant with a reorganization of the respective motor maps at the cortical level (Brown et al. 2008; 2009). Whatever the underlying mechanism, the development of monosynaptic projections of the motor strip to nu. ambiguus should have been associated with an enhanced versatility of laryngeal functions.

From the perspective of the lip-smack hypothesis (Ghazanfar et al. 2012), the elaboration of the corticobulbar tracts might have contributed substantially to turning the visual lip-smacking display into an audible signal (see MacNeilage 1998; 2008). Furthermore, this process should have allowed for a refinement of the rather stereotypic acoustic structure of the vocalizations of our early hominin ancestors (Dissanayake 2009, p. 23; Morley 2012, p. 131), for example, the “discretization” of (innate) glissando-like tonal call segments into “separate tonal steps” (Brandt 2009) or the capacity to match and maintain individual pitches (Bannan 2012, p. 309). Such an elaboration of the “musical characteristics” (Mithen 2006, p. 121) of nonverbal vocalizations, for example, contact calls, must have supported mother–child interactions. In order to impact the attention, arousal, or mood of young infants, caregivers often use non-linguistic materials such as “interjections, calls, and imitative sounds,” characterized by “extensive melodic modulations” (Papoušek 2003). Furthermore, monosynaptic corticobulbar projections allow for rapid on/off switching of call segments and, thus, enable synchronization of vocal behavior, first, across individuals (communal chorusing in terms of “wordless vocal exchanges” as a form of “grooming-at-a-distance”; Dunbar 2012) and, second, with other body movements (dance). Such activities support interpersonal emotional bonds (“fellow-feeling”) and promote social cohesion/cooperation (Cross 2001; 2003; Cross & Morley 2009).
These accomplishments must have emerged after the separation of the hominin lineage since chimpanzees are unable to converge on a regular beat during call production (e.g., Geissmann 2000). More specifically, African apes engage in rhythmical behavior like drumming, but, apparently, lack the capacity of a mutual entrainment of such actions into synchronized group displays (Fitch 2012). Thus, monosynaptic elaboration of the corticobulbar tracts might have provided the phylogenetic basis both for the “communicative musicality” of human infants and for communal “wordless vocal exchanges,” preceding both articulate speech and more formal musical activities shaped by culture (Malloch & Trevarthen 2009; see footnote 14). As a further indication that these achievements are not bound to the presence of the human-specific FOXP2 transcription factor, reproduction of musical tones and tunes was found largely uncompromised in KE family members with articulatory disorders (Alcock et al. 2000b).

The Kuypers/Jürgens hypothesis (Fitch et al. 2010) assumes that the vocal-behavioral limitations of nonhuman primates are rooted in the absence of direct corticobulbar projections to the brainstem motoneurons engaged in the innervation of laryngeal muscles and housed within the nu. ambiguus. Indeed, this model explains the inability of nonhuman primates to produce sound patterns that impose particularly high demands on the coordination of laryngeal and supralaryngeal activities such as the rapid voiced–voiceless alternations characteristic of articulate speech. Yet, this suggestion cannot account for nonhuman primates' inability to imitate less challenging, fully voiced, speech-like vocalizations such as syllables comprising voiced consonants (see sect. 4.3.2).

5.2. FOXP2-driven vocal elaboration of the basal ganglia motor loop: Enhanced integration of laryngeal and supralaryngeal gestures (Step 2)

As a further prerequisite of spoken language, the vocal folds must serve as an “articulatory organ” that can be “pieced together” with equally versatile orofacial gestures into a tightly integrated meshwork of appropriately timed vocal tract movements. Conceivably, FOXP2-driven morphological changes at the level of the basal ganglia in our hominin ancestors provided the physiological basis for these sensorimotor capabilities to emerge as a second phylogenetic step toward articulate speech. More specifically, enhanced “lateral interactions” between striatal representations of vocal tract muscles based on a divergence/convergence architecture of information flow within the basal ganglia (Flaherty & Graybiel 1994) have the potential to support the linkage of vocal tract movements into language-specific syllabic and metrical patterns. This would represent a major step in sensorimotor verbal learning during ontogenetic speech acquisition. The role of the basal ganglia in this process seems to be confined to the phase where the entrainment and automatization of speech motor patterns takes place, while the persistent motor plans evolving during this process get stored within left-hemisphere peri- or subsylvian cortex. In the mature speech motor system, the contribution of the striatum to speech production appears predominantly restricted to dopamine-dependent, emotive-prosodic shading of the speech signal as a homologue to the vocalizations of nonhuman primates and a vestige of the ancient communication system.

Paleoanthropological data such as endocast traces of Broca's area (Holloway et al. 2004, pp. 15ff) or morphological features of the cranial base (Lieberman 2011) provide only indirect and ambiguous evidence on the evolution of spoken language. “Comparing our behavior and brain with those of other extant primates” (Ghazanfar & Miller 2006, p. R879) still represents the most robust approach to the investigation of the “biological mechanisms underlying the evolution of speech” (Ghazanfar & Rendall 2008, p. R457). Recently, however, molecular-genetic studies have shed light on the phylogeny of verbal communication in the hominin lineage and, more specifically, the contribution of the basal ganglia to the evolution of spoken language. Thus, molecular-genetic analyses found the human form of the FOXP2 protein in 43,000-year-old Neanderthal skeletal remains (Rosas et al. 2006) linked to the same haplotype as in our species (Krause et al. 2007; see footnote 15). Since large-scale analyses of the FOXP2 locus in humans failed to detect any amino acid polymorphisms (Enard et al. 2002), those speech-related mutations must have been the target of strong selection pressures, causing a relatively fast fixation within the human gene pool (“selective sweep”). Assuming modern humans and Neanderthals did not interbreed, positive selection of the relevant FOXP2 mutation(s) should have occurred in our most recent common ancestor (MRCA).
Sequence analyses both of nuclear and mitochondrial DNA “locate” the MRCA to the mid-Middle Pleistocene, around 400,000 to 600,000 years ago (Endicott et al. 2010; Green et al. 2010; Hofreiter 2011; Noonan 2010), and these data are compatible with the fossil record (Weaver et al. 2008). As an alternative scenario, gene flow could explain the presence of the human FOXP2 variant in Neanderthal bones (Coop et al. 2008). Under these conditions, a later emergence of the respective hominin mutations has been assumed – around 40,000 years ago (see Stringer 2012, pp. 190ff, for a recent discussion of interbreeding between modern humans and archaic populations, i.e., Neanderthals and Denisovans). A more recent molecular-genetic study, finally, points at a positive selective sweep of a regulatory FOXP2 element – affecting neuronal expression of this gene – within a comparable time domain, that is, during the last 50,000 years (Maricic et al. 2013).
In any case, whichever model proves true, FOXP2-driven speech-related modification of cortico-striatal circuits must have emerged in individuals characterized by a cerebral volume similar to that of extant modern humans (Rightmire 2004; 2007).

Assuming a gradual monosynaptic elaboration of corticobulbar projections in parallel with brain size increase across the hominin lineage (see above), the relatively late reorganization of cortico-basal ganglia loops driven by specific FOXP2 mutations should have occurred on top of a fully developed motoneuronal axis. It is tempting to relate the selective sweep of the hominin FOXP2 mutations to the evolution of speech and language functions (Enard & Pääbo 2004; Zhang et al. 2002). However, the benefits of full-fledged verbal communication cannot have been the driving force of the emergence of articulate speech. “If the first one or three or five protolanguage signs [such as syllable repetitions or simple words] didn't have a substantial payoff, no one would have bothered to invent any more” (Bickerton 2009, p. 165). The announcement of “displaced” objects such as perished large mammals and the subsequent recruitment of troop members for carcass exploitation has been assumed to provide the necessary “substantial payoff” (Bickerton 2009, pp. 167f). But individuals spending their whole – though often short – lives together in small and intimate troops should have been able to convey such simple messages to a sufficient extent by nonverbal, that is, gestural means (Coward 2010, p. 469).

Rather than semantic-referential functions, the earliest speech-like vocalizations could have served as refined contact calls and, thus, facilitated mother–child interactions (Falk 2004; 2009). Likewise, these vocalizations might have allowed for a vocal elaboration of group activities such as communal dancing or grooming, which consolidate intra-group cohesion and cooperation (Dunbar 1996; Mithen 2006, pp. 208f). In other words, the earliest verbal utterances further expanded and refined the space of versatile vocal displays afforded by the preceding development of monosynaptic corticobulbar projections to the nu. ambiguus. Besides other benefits (see above), these accomplishments should have enhanced a “speaker's” social prestige. Subsequent gradual “conventionalization” (Milo & Quiatt 1994) of speech-like acoustic signals then could have slowly created opportunities for the conveyance of environmental or social information by simply drawing attention to an actual event or situation (Dessalles 2007, p. 360).

6. A look beyond the primate lineage: Birdsong and human speech

In a broader comparative perspective, the emergence of articulate speech appears to have involved the convergent evolution in our species of rather ancient principles of brain wiring, documented already many years ago in songbirds. The avian “song production network” roughly separates into two circuits, that is, the vocal motor pathway (VMP) and the anterior forebrain pathway (AFP; e.g., Bolhuis et al. 2010; Jarvis 2004a; 2004b). Whereas VMP shares essential organizational principles with human corticobulbar tracts such as monosynaptic projections to the cranial nerve centers steering the peripheral vocal apparatus (Wild 2008; see also Ackermann & Ziegler 2013), there are striking similarities between AFP and the cortico-basal ganglia loops of mammals, including our species (Doupe et al. 2005). In zebra finches, area X – a major AFP component that includes both striatal and pallidal elements – shows, for example, specific interdependencies between FoxP2 level and the accuracy of tutor song imitation (Haesler et al. 2007) or juvenile/adult singing activity (Teramitsu et al. 2010; for an evolutionary perspective on this gene see Scharff & Haesler 2005). Whereas bilateral VMP damage significantly compromises vocal behavior at any stage of an individual's life history, AFP dysfunctions have, by contrast, a more subtle impact upon mature songs, but severely disrupt vocal learning mechanisms (e.g., Brainard & Doupe 2002).
Thus, (i) monosynaptic connections between upper and lower motoneurons engaged in the innervation of the sound source and (ii) cortico-striatal motor loops supporting vocal-laryngeal functions appear to represent common functional-neuroanatomic prerequisites both of spoken language and birdsong (for a review of the parallels between avian and human acoustic communication, see Doupe & Kuhl 1999; Bolhuis & Everaert 2013; Bolhuis et al. 2010). As a consequence, birdsong can serve as an experimental model for the investigation of the neural control of human speech – though, most presumably, syntactic and semantic aspects of verbal utterances elude such an approach (Beckers et al. 2012; Berwick et al. 2011). The hitherto underestimated role of the basal ganglia in spoken language should help to further elucidate the relationship between birdsong and human speech.

7. Conclusions

During recent years, a salient contribution of subcortical structures, including the basal ganglia, to language evolution has been assumed (Lieberman 2000; 2007). More specifically, FOXP2-driven modification of neural circuits traversing the basal ganglia must be considered a necessary prerequisite for “the emergence of proficient spoken language” (Vargha-Khadem et al. 2005). However, these suggestions do not account for the developmental dynamics of cortico-striatal interactions and the discrepancies between the sequelae of basal ganglia lesions in children and adults. Based upon behavioral–clinical and functional imaging data, in this article we have proposed (1) two successive phylogenetic stages of speech acquisition (monosynaptic refinement of corticobulbar tracts and laryngeal elaboration of cortico-striatal motor circuits), and (2) a functional reorganization of the cortico-striatal motor loops engaged in vocal tract control during ontogenetic speech development (Fig. 4).

It goes without saying that the model outlined here addresses only one out of several building blocks of a comprehensive theory of the evolution of spoken language. Most evidently, our approach still fails to account for the co-evolution of the described linguistic motor skills with the auditory skills underlying speech perception, and, as a consequence, the emergence of the auditory-motor network that underlies the phonological processing capacities of our species. Furthermore, we need to better understand how this elaborate auditory-vocal communication apparatus became overarched by the expanding conceptual-semantic and syntactic capabilities of humans. Thus, language evolution must be considered a multicomponent process, and the specific phylogenetic interactions of emergent speech production with these other traits await further elucidation. Presumably, any such phylogenetic account also needs to integrate, among other things, social and motivational contingencies (e.g., Dunbar 1996), “the desire to use the vocal tract to communicate” (Locke 1993, p. 322f), amodal mimetic capacities (Donald 1999), mirror neuron systems (Arbib 2006), and so-called executive functions (Coolidge & Wynn 2009) as relevant driving forces and prerequisites of spoken language evolution (for a comprehensive overview, see Tallerman & Gibson 2012).

Footnotes

1. Though predominantly depending on glottal source characteristics such as the fluctuations of pitch, loudness, and voice quality, vocal-affective prosodic expression may also be associated with changes in speech breathing patterns, alterations of speaking rate, and the degree to which speech sounds are hyper- or hypo-articulated. Thus, motivational factors have, more or less, an impact on all vocal tract subsystems. Affective-emotive speech prosody, that is, the expression of arousal-related mood states, has been considered as a behavioral trait homologous to the acoustic signals of nonhuman primates in addition to nonverbal affective vocalizations such as laughter (“push-effects” of affective-emotive prosody; see last paragraph in sect. 4.3.1). By contrast, attitudes like doubt or approval cannot unambiguously be expected in nonhuman primates. Thus, it is questionable whether attitudinal prosody, that is, appraisal-related “pull-effects,” can be assumed homologous to the vocal behavior of nonhuman primates. Besides arousal-related motivational/affective states (e.g., joy) or appraisal-based subjective attitudes (e.g., doubt), speech prosody may also convey linguistic information such as word accent (linguistic prosody) or contribute to the implementation of “speech acts” such as verbal intimidation of another subject (Sidtis & Van Lancker Sidtis 2003; Van Lancker Sidtis et al. 2006). Linguistic and pragmatic prosody are outside the scope of this article. In addition to a propositional message and affective/attitudinal states, the speech signal also conveys speaker-related (“indexical”) information on age, gender, and identity, simply because the size and tissue properties of laryngeal and supralaryngeal structures differ across individuals and change over lifetime (Kreiman & Sidtis 2011).

2. The more recent paleoanthropological literature applies the term hominin – rather than hominid – to the human clade (“family”), that is, the “bush” of all species tracing back to a common ancestor who diverged from the lineage encompassing modern chimpanzees (Lewin & Foley 2004, p. 9).

3. Nucleotide sequences are given in italics, proteins in regular letters; lower- and uppercase serve to distinguish human (FOXP2/FOXP2), murine (Foxp2/Foxp2), and other, for example, avian (FoxP2/FoxP2) variants of the forkhead family of genes (Kaestner et al. 2000).

4. The PAG and the adjacent mesencephalic tegmentum represent a functional-neuroanatomic entity (Holstege 1991). In the subsequent paragraphs, the term “PAG” will always refer to both subcomponents.

5. Monosynaptic projections of (the avian) motor cortex to brainstem nuclei have also been documented in songbirds (for a review see, e.g., Wild 2008), an often neglected prerequisite of vocal learning (see sect. 6).

6. Two cases of a constellation resembling transcortical motor aphasia following ACC infarction have been documented to date (Chang et al. 2007). Diffusion tensor imaging revealed additional disruption of efferent SMA fibers in one patient. Thus, a substantial contribution of premotor mesiofrontal cortex to the observed communication disorders must be considered.

7. Two case studies noted compromised speech prosody after mesiofrontal lesion (Bell et al. 1990; Heilman et al. 2004). In the absence of more detailed neuroanatomic data, such observations are difficult to interpret unambiguously.

8. Further alterations of the FOXP2 gene – such as a nonsense mutation giving rise to truncated protein products – have been found in association with developmental speech dyspraxia (MacDermot et al. 2005).

9. In contrast to other dysarthria variants, PD subjects show, as a rule, normal speaking rates. A subgroup of patients even displays an accelerated tempo (“hastening phenomenon”; e.g., Duffy 2005). This unique, but rarely studied, phenomenon may reflect a release of oscillatory basal ganglia activity (Ackermann et al. 1997b; Riecker et al. 2006).

10. Tracing back to the late 1970s (Ross & Mesulam 1979), a series of case studies assigned motor aprosodia – disrupted implementation of the “affective tone” of spoken language, concomitant with a preserved “ability to ‘feel emotion’ inwardly” and an unimpaired comprehension of other subjects' vocal expression of motivational states – to a dysfunction of right-hemisphere fronto-opercular cortex and/or anterior insula (e.g., Ross & Monnot 2008). However, the lesions in these cases appear to have encroached on the basal ganglia, including their connections to mesiofrontal cortex (see Cancelliere & Kertesz 1990).

11. In contrast to habit formation, that is, the incremental emergence of stimulus-driven behavioral routines, motor skill learning is characterized by the incremental refinement of movement execution as reflected in reaction time measurements: “Learning how to ride a bicycle is quite different from having the habit of biking every evening after work” (Graybiel 2008, p. 370).

12. As compared to the upper limbs, the specific contribution of the cerebellum to speech motor learning is less clear. Most noteworthy, the few reported cases of congenital cerebellar hypoplasia/aplasia, apparently, lack any significant disorders of spoken language (Ackermann & Ziegler 1992). Acquired dysfunctions of the cerebellum, nevertheless, compromise speech production, giving rise to, among other things, a slowed speaking rate and imprecise consonant articulation (Ackermann 2008; Duffy 2005).

13. These inferences must be considered with some caution: We can only conclude that the heterozygous(!) constellations observed so far in the KE family (Bolhuis et al. 2010, p. 753) do not significantly disrupt the corticobulbar pathway – unlike other components of the central motor system.

14. Although contemporary traditional societies of a predominantly hunter-gatherer mode of subsistence “are not necessarily like some form of pre-human and should not be used uncritically as models,” the respective ethnographic data, nevertheless, allow limited inferences on the behavioral repertoire of our hominin ancestors (Barnard 2011, p. 15). Thus, extensive communal dancing, often accompanied by rhythmic nonverbal utterances, represents a salient component of many ceremonies associated with important events in the life of an individual (e.g., circumcision rite; Turner 1967, pp. 186ff, 193) or the history of a group (war-/peace-related gatherings; e.g., Rappaport 2000, pp. 173ff). Since the coordination of vocal behavior and body movements may encourage a sense of “unity, harmony, and concord” among a group, social bonding should benefit from a vocal elaboration of ritual forms (Rappaport 1999, pp. 220, 252ff). It must be noted, however, that communal dancing often may include a competitive element aside from social bonding (James 2003, pp. 75f; for examples, see Rappaport 1999, p. 80; 2000, pp. 191ff; Turner 1967, p. 260). In principle, refined musical abilities could have supported referential communication to some extent. Spoken languages may include a broad range of nonverbal signals (Lewis 2009). For example, the Mbendjele people living in the dense equatorial forests of the Congo Basin, a habitat that severely impedes visual orientation, report an encounter with a dangerous animal to other group members by means of meticulous mimicry of the respective auditory scene.
These anthropological data support the suggestion that enhanced musicality of nonverbal vocalizations may provide communicative benefits, but do not necessarily imply the notion of a “musical protolanguage” or “musilanguage” (Brown 2000), that is, music-like learned communication systems preceding full-fledged spoken language, a hypothesis tracing back to Charles Darwin (1871).

15. As in the case of nonhuman primates, limitations of articulate speech due to vocal tract constraints have also been attributed to Neanderthals, giving rise to a reduced repertoire of speech sounds (for a critical discussion, see Barney et al. 2012; Clegg 2012).

References

Ackermann, H. (2008) Cerebellar contributions to speech production and speech perception: Psycholinguistic and neurobiological perspectives. Trends in Neurosciences 31(6):265–72. doi: 10.1016/j.tins.2008.02.011.
Ackermann, H., Hertrich, I., Daum, I., Scharf, G. & Spieker, S. (1997a) Kinematic analysis of articulatory movements in central motor disorders. Movement Disorders 12:1019–27.
Ackermann, H., Hertrich, I. & Ziegler, W. (2010) Dysarthria. In: The handbook of language and speech disorders, ed. Damico, J. S., Müller, N. & Ball, M. J., pp. 362–90. Wiley-Blackwell.
Ackermann, H., Hertrich, I., Ziegler, W., Bitzer, M. & Bien, S. (1996) Acquired dysfluencies following infarction of the left mesiofrontal cortex. Aphasiology 10:409–17.
Ackermann, H., Konczak, J. & Hertrich, I. (1997b) The temporal control of repetitive articulatory movements in Parkinson's disease. Brain and Language 56:312–19.
Ackermann, H. & Riecker, A. (2010a) Cerebral control of motor aspects of speech production: Neurophysiological and functional imaging data. In: Speech motor control: New developments in basic and applied research, ed. Maassen, B. & van Lieshout, P., pp. 117–34. Oxford University Press.
Ackermann, H. & Riecker, A. (2010b) The contribution(s) of the insula to speech production: A review of the clinical and functional imaging literature. Brain Structure and Function 214:419–33.
Ackermann, H. & Ziegler, W. (1992) Cerebellar dysarthria: A review. Fortschritte der Neurologie und Psychiatrie 60:28–40. (German).
Ackermann, H. & Ziegler, W. (1995) Akinetic mutism: A review of the literature. Fortschritte der Neurologie und Psychiatrie 63:59–67. (German).
Ackermann, H. & Ziegler, W. (2010) Brain mechanisms underlying speech motor control. In: The handbook of phonetic sciences, 2nd edition, ed. Hardcastle, W. J., Laver, J. & Gibbon, F. E., pp. 202–50. Wiley-Blackwell.
Ackermann, H. & Ziegler, W. (2013) A “birdsong perspective” on human speech production. In: Birdsong, speech, and language: Exploring the evolution of mind and brain, ed. Bolhuis, J. J. & Everaert, M., pp. 331–52. MIT Press.
Aitken, P. G. (1981) Cortical control of conditioned and spontaneous vocal behavior in rhesus monkeys. Brain and Language 13:171–84.
Aitken, P. G. & Wilson, W. A. Jr. (1979) Discriminative vocal conditioning in rhesus monkeys: Evidence for volitional control? Brain and Language 8:227–40.
Albin, R. L., Young, A. B. & Penney, J. B. (1989) The functional anatomy of basal ganglia disorders. Trends in Neurosciences 12:366–75.
Alcock, K. J., Passingham, R. E., Watkins, K. E. & Vargha-Khadem, F. (2000a) Oral dyspraxia in inherited speech and language impairment and acquired dysphasia. Brain and Language 75(1):17–33. doi: 10.1006/brln.2000.2322.
Alcock, K. J., Passingham, R. E., Watkins, K. E. & Vargha-Khadem, F. (2000b) Pitch and timing abilities in inherited speech and language impairment. Brain and Language 75:34–46.
Alexander, G. E., Crutcher, M. D. & DeLong, M. R. (1990) Basal ganglia-thalamocortical circuits: Parallel substrates for motor, oculomotor, “prefrontal” and “limbic” functions. In: The prefrontal cortex: Its structure, function and pathology, ed. Uylings, H. B. M., Eden, C. G. van, de Bruin, J. P. C., Corner, M. A. & Feenstra, M. G. P., pp. 119–46. Elsevier. (Elsevier Book Series on Neuroscience: Progress in Brain Research, vol. 85).
Arbib, M. A. (2006) The Mirror System Hypothesis on the linkage of action and language. In: Action to language via the mirror neuron system, ed. Arbib, M. A., pp. 3–47. Cambridge University Press.
Arnold, K. & Zuberbühler, K. (2006) Semantic combinations in primate calls. Nature 441(7091):303.
Arroyo, S., Lesser, R. P., Gordon, B., Uematsu, S., Hart, J., Schwerdt, P., Andreasson, K. & Fisher, R. S. (1993) Mirth, laughter, and gelastic seizures. Brain 116:757–80.
Badgaiyan, R. D., Fischman, A. J. & Alpert, N. M. (2007) Striatal dopamine release in sequential learning. NeuroImage 38:549–56.
Bailey, P., von Bonin, G. & McCulloch, W. S. (1950) The isocortex of the chimpanzee. University of Illinois Press.
Bannan, N. (2012) Harmony and its role in human evolution. In: Music, language, and human evolution, ed. Bannan, N., pp. 288–339. Oxford University Press.
Banse, R. & Scherer, K. R. (1996) Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology 70(3):614–36.
Barnard, A. (2011) Social anthropology and human origins. Cambridge University Press.
Barrett, J., Pike, G. B. & Paus, T. (2004) The role of the anterior cingulate cortex in pitch variation during sad affect. European Journal of Neuroscience 19:458–64.
Barney, A., Martelli, S., Serrurier, A. & Steele, J. (2012) Articulatory capacity of Neanderthals, a very recent and human-like fossil hominin. Philosophical Transactions of the Royal Society of London, Series B: Biological Sciences 367:88–102.
Barris, R. W. & Schuman, H. R. (1953) Bilateral anterior cingulate gyrus lesions: Syndrome of the anterior cingulate gyri. Neurology 3:44–52.
Basel-Vanagaite, L., Muncher, L., Straussberg, R., Pasmanik-Chor, M., Yahav, M., Rainshtein, L., Walsh, C. A., Magal, N., Taub, E., Drasinover, V., Shalev, H., Attia, R., Rechavi, G., Simon, A. J. & Shohat, M. (2006) Mutated nup62 causes autosomal recessive infantile bilateral striatal necrosis. Annals of Neurology 60:214–22.
Beckers, G. J. L., Bolhuis, J. J., Okanoya, K. & Berwick, R. C. (2012) Birdsong neurolinguistics: Songbird context-free grammar claim is premature. NeuroReport 23:139–45.
Bell, W. L., Davis, D. L., Morgan-Fisher, A. & Ross, E. D. (1990) Acquired aprosodia in children. Journal of Child Neurology 5:19–26.
Belton, E., Salmond, C. H., Watkins, K. E., Vargha-Khadem, F. & Gadian, D. G. (2003) Bilateral brain abnormalities associated with dominantly inherited verbal and orofacial dyspraxia. Human Brain Mapping 18:194–200.
Bengtsson, S. L., Nagy, Z., Skare, S., Forsman, L., Forssberg, H. & Ullén, F. (2005) Extensive piano practicing has regionally specific effects on white matter development. Nature Neuroscience 8:1148–50.
Bermejo, M. & Omedes, A. (1999) Preliminary vocal repertoire and vocal communication of wild bonobos (Pan paniscus) at Lilungu (Democratic Republic of Congo). Folia Primatologica 70:328–57.
Berwick, R. C., Okanoya, K., Beckers, G. J. L. & Bolhuis, J. J. (2011) Songs to syntax: The linguistics of birdsong. Trends in Cognitive Sciences 15(3):113–21.
Bickerton, D. (2009) Adam's tongue: How humans made language, how language made humans. Hill & Wang.
Boë, L. J., Heim, J. L., Honda, K. & Maeda, S. (2002) The potential Neandertal vowel space was as large as that of modern humans. Journal of Phonetics 30:465–84.
Boesch, C. & Boesch-Achermann, H. (2000) The chimpanzees of the Taï Forest: Behavioural ecology and evolution. Oxford University Press.
Bolhuis, J. J. & Everaert, M., eds. (2013) Birdsong, speech, and language: Exploring the evolution of mind and brain. MIT Press.
Bolhuis, J. J., Okanoya, K. & Scharff, C. (2010) Twitter evolution: Converging mechanisms in birdsong and human speech. Nature Reviews Neuroscience 11(11):747–59.
Botez, M. I. & Barbeau, A. (1971) Role of subcortical structures, and particularly of the thalamus, in the mechanisms of speech and language: A review. International Journal of Neurology 8:300–20.
Bouchard, K. E., Mesgarani, N., Johnson, K. & Chang, E. F. (2013) Functional organization of human sensorimotor cortex for speech articulation. Nature 495:327–32. doi:10.1038/nature11911.
Braak, H., Del Tredici, K., Rüb, U., de Vos, R. A. I., Jansen Steur, E. N. H. & Braak, E. (2003) Staging of brain pathology related to sporadic Parkinson's disease. Neurobiology of Aging 24:197–211.
Brainard, M. S. & Doupe, A. J. (2002) What songbirds teach us about learning. Nature 417:351–58.
Brandt, P. A. (2009) Music and how we became human – a view from cognitive semiotics: Exploring imaginative hypotheses. In: Communicative musicality: Exploring the basis of human companionship, ed. Malloch, S. & Trevarthen, C., pp. 31–44. Oxford University Press.
Brendel, B., Hertrich, I., Erb, M., Lindner, A., Riecker, A., Grodd, W. & Ackermann, H. (2010) The contribution of mesiofrontal cortex to the preparation and execution of repetitive syllable productions: An fMRI study. NeuroImage 50:1219–30.
Brockelman, W. Y. & Schilling, D. (1984) Inheritance of stereotyped gibbon calls. Nature 312:634–36.
Brotis, A. G., Kapsalaki, E. Z., Paterakis, K., Smith, J. R. & Fountas, K. N. (2009) Historic evolution of open cingulectomy and stereotactic cingulotomy in the management of medically intractable psychiatric disorders, pain and drug addiction. Stereotactic and Functional Neurosurgery 87:271–91.
Brown, J. W. (1988) Cingulate gyrus and supplementary motor correlates of vocalization in man. In: The physiological control of mammalian vocalization, ed. Newman, J. D., pp. 227–43. Plenum Press.
Brown, S. (2000) The “musilanguage” model of music evolution. In: The origins of music, ed. Wallin, N. L., Merker, B. & Brown, S., pp. 271–300. MIT Press.
Brown, S., Ngan, E. & Liotti, M. (2008) A larynx area in the human motor cortex. Cerebral Cortex 18:837–45.
Brown, S., Laird, A. R., Pfordresher, P. Q., Thelen, S. M., Turkeltaub, P. & Liotti, M. (2009) The somatotopy of speech: Phonation and articulation in the human motor cortex. Brain and Cognition 70:31–41.
Brown, T. G. (1915) Note on the physiology of the basal ganglia and mid-brain of the anthropoid ape, especially in reference to the act of laughter. Journal of Physiology 49:195–207.
Brumm, H., Voss, K., Köllmer, I. & Todt, D. (2004) Acoustic communication in noise: Regulation of call characteristics in a New World monkey. Journal of Experimental Biology 207(3):443–48.
Burgoon, J. K., Floyd, K. & Guerrero, L. K. (2010) Nonverbal communication theories of interaction adaptation. In: The handbook of communication science, 2nd edition, ed. Berger, C. R., Roloff, M. E. & Roskos-Ewoldsen, D. R., pp. 93–108. Sage.
Burling, R. (2005) The talking ape: How language evolved. Oxford University Press.
Butler, A. B. & Hodos, W. (2005) Comparative vertebrate neuroanatomy: Evolution and adaptation, 2nd edition. Wiley.
Call, J. & Tomasello, M., eds. (2007) The gestural communication of apes and monkeys. Erlbaum.
Calzavara, R., Mailly, P. & Haber, S. N. (2007) Relationship between the corticostriatal terminals from areas 9 and 46, and those from area 8A, dorsal and rostral premotor cortex and area 24c: An anatomical substrate for cognition to action. European Journal of Neuroscience 26:2005–24.
Cancelliere, A. E. B. & Kertesz, A. (1990) Lesion localization in acquired deficits of emotional expression and comprehension. Brain and Cognition 13:133–47.
Chang, C.-C., Lee, Y. C., Lui, C.-C. & Lai, S.-L. (2007) Right anterior cingulate cortex infarction and transient speech aspontaneity. Archives of Neurology 64:442–46.
Chassagnon, S., Minotti, L., Kremer, S., Verceuil, L., Hoffmann, D., Benabid, A. L. & Kahane, P. (2003) Restricted frontomesial epileptogenic focus generating dyskinetic behavior and laughter. Epilepsia 44:859–63.
Cheney, D. L. & Seyfarth, R. M. (1990) How monkeys see the world: Inside the mind of another species. University of Chicago Press.
Cheney, D. L. & Seyfarth, R. M. (2005) Constraints and preadaptations in the earliest stages of language evolution. The Linguistic Review 22:135–59.
Cheney, D. L. & Seyfarth, R. M. (2007) Baboon metaphysics: The evolution of a social mind. University of Chicago Press.
Clay, Z. & Zuberbühler, K. (2009) Food-associated calling sequences in bonobos. Animal Behaviour 77:1387–96.
Clegg, M. (2012) The evolution of the human vocal tract: Specialized for speech? In: Music, language, and human evolution, ed. Bannan, N., pp. 58–80. Oxford University Press.
Cohen, J. (2010) Almost chimpanzee: Searching for what makes us human, in rainforests, labs, sanctuaries, and zoos. Henry Holt.
Cohen, M. J., Riccio, C. A. & Flannery, A. M. (1994) Expressive aprosodia following stroke to the right basal ganglia: A case report. Neuropsychology 8:242–45.
Coolidge, F. L. & Wynn, T. (2009) The rise of Homo sapiens: The evolution of modern thinking. Wiley-Blackwell.
Coop, G., Bullaughey, K., Luca, F. & Przeworski, M. (2008) The timing of selection at the human FOXP2 gene. Molecular Biology and Evolution 25:1257–59.
Corballis, M. C. (2002) From hand to mouth: The origins of language. Princeton University Press.
Corballis, M. C. (2003) From mouth to hand: Gesture, speech, and the evolution of right-handedness. Behavioral and Brain Sciences 26:199–260.
Coudé, G., Ferrari, P. F., Rodà, F., Maranesi, M., Borelli, E., Veroni, V., Monti, F., Rozzi, S. & Fogassi, L. (2011) Neurons controlling voluntary vocalization in the macaque ventral premotor cortex. PLoS ONE 6:e26822.
Coward, F. (2010) Small worlds, material culture and ancient Near Eastern social networks. In: Social brain, distributed mind, ed. Dunbar, R., Gamble, C. & Gowlett, J., pp. 449–79. Oxford University Press. (Proceedings of the British Academy, vol. 158).
Cross, I. (2001) Music, mind and evolution. Psychology of Music 29:95–102.
Cross, I. (2003) Music, cognition, culture, and evolution. In: The cognitive neuroscience of music, ed. Peretz, I. & Zatorre, R., pp. 42–56. Oxford University Press.
Cross, I. & Morley, I. (2009) The evolution of music: Theories, definitions and the nature of the evidence. In: Communicative musicality: Exploring the basis of human companionship, ed. Malloch, S. & Trevarthen, C., pp. 61–81. Oxford University Press.
Dang, M. T., Yokoi, F., Yin, H. H., Lovinger, D. M., Wang, Y. & Li, Y. (2006) Disrupted motor learning and long-term synaptic plasticity in mice lacking NMDAR1 in the striatum. Proceedings of the National Academy of Sciences USA 103:15254–59.
Darkins, A. W., Fromkin, V. A. & Benson, D. F. (1988) A characterization of the prosodic loss in Parkinson's disease. Brain and Language 34:315–27.
Darwin, C. (1871) The descent of man, and selection in relation to sex. John Murray [2nd edition 1879 by John Murray, reprint 2004 by Penguin Books].
David, H. N., Ansseau, M. & Abraini, J. H. (2005) Dopamine–glutamate reciprocal modulation of release and motor responses in the rat caudate-putamen and nucleus accumbens of “intact” animals. Brain Research. Brain Research Reviews 50:336–60.
Davis, P. J., Zhang, S. P., Winkworth, A. & Bandler, R. (1996) Neural control of vocalization: Respiratory and emotional influences. Journal of Voice 10:23–38.
DeLong, M. R. & Wichmann, T. (2007) Circuits and circuit disorders of the basal ganglia. Archives of Neurology 64:20–24.
De Meirleir, L., Seneca, S., Lissens, W., Schoentjes, E. & Desprechins, B. (1995) Bilateral striatal necrosis with a novel point mutation in the mitochondrial ATPase 6 gene. Pediatric Neurology 13:242–46.
Dessalles, J.-L. (2007) Why we talk: The evolutionary origins of language. Oxford University Press.
Deutch, A. Y., Colbran, R. J. & Winder, D. J. (2007) Striatal plasticity and medium spiny neuron dendritic remodeling in Parkinsonism. Parkinsonism and Related Disorders 13 (Suppl. 3):S251–58.
De Waal, F. B. M. (1988) The communicative repertoire of captive bonobos compared to that of chimpanzees. Behaviour 106:183–251.
Dissanayake, E. (2009) Root, leaf, blossom, or bole: Concerning the origin and adaptive function of music. In: Communicative musicality: Exploring the basis of human companionship, ed. Malloch, S. & Trevarthen, C., pp. 17–30. Oxford University Press.
Donald, M. (1999) Preconditions for the evolution of protolanguages. In: The descent of mind: Psychological perspectives on hominid evolution, ed. Corballis, M. C. & Lea, S. E. G., pp. 138–54. Oxford University Press.
Doupe, A. J. & Kuhl, P. K. (1999) Birdsong and human speech: Common themes and mechanisms. Annual Review of Neuroscience 22:567–631.
Doupe, A. J., Perkel, D. J., Reiner, A. & Stern, E. A. (2005) Birdbrains could teach basal ganglia research a new song. Trends in Neurosciences 28(7):353–63.
Doya, K. (2000) Complementary roles of basal ganglia and cerebellum in learning and motor control. Current Opinion in Neurobiology 10:732–39.
Doyon, J. & Benali, H. (2005) Reorganization and plasticity in the adult brain during learning of motor skills. Current Opinion in Neurobiology 15:161–67.
Duffy, J. R. (2005) Motor speech disorders: Substrates, differential diagnosis, and management, 2nd edition. Elsevier Mosby.
Dum, R. P. & Strick, P. L. (2002) Motor areas in the frontal lobe of the primate. Physiology and Behavior 77:677–82.
Dunbar, R. I. M. (1996) Grooming, gossip, and the evolution of language. Harvard University Press.
Dunbar, R. I. M. (2012) On the evolutionary function of song and dance. In: Music, language, and human evolution, ed. Bannan, N., pp. 201–14. Oxford University Press.
Egnor, S. E. R., Wickelgren, J. G. & Hauser, M. D. (2007) Tracking silence: Adjusting vocal production to avoid acoustic interference. Journal of Comparative Physiology A: Neuroethology, Sensory, Neural, and Behavioral Physiology 193(4):477–83.
Elowson, A. M. & Snowdon, C. T. (1994) Pygmy marmosets, Cebuella pygmaea, modify vocal structure in response to changed social environment. Animal Behaviour 47:1267–77.
Enard, W. (2011) FOXP2 and the role of cortico-basal ganglia circuits in speech and language evolution. Current Opinion in Neurobiology 21:415–24.
Enard, W., Gehre, S., Hammerschmidt, K., Hölter, S. M., Blass, T., Somel, M., Brückner, M. K., Schreiweis, C., Winter, C., Sohr, R., Becker, L., Wiebe, V., Nickel, B., Giger, T., Müller, U., Groszer, M., Adler, T., Aguilar, A., Bolle, I., Calzada-Wack, J., Dalke, C., Ehrhardt, N., Favor, J., Fuchs, H., Gailus-Durner, V., Hans, W., Hölzlwimmer, G., Javaheri, A., Kalaydjiev, S., Kallnik, M., Kling, E., Kunder, S., Mossbrugger, I., Naton, B., Racz, I., Rathkolb, B., Rozman, J., Schrewe, A., Busch, D. H., Graw, J., Ivandic, B., Klingenspor, M., Klopstock, T., Ollert, M., Quintanilla-Martinez, L., Schulz, H., Wolf, E., Wurst, W., Zimmer, A., Fisher, S. E., Morgenstern, R., Arendt, T., de Angelis, M. H., Fischer, J., Schwarz, J. & Pääbo, S. (2009) A humanized version of Foxp2 affects cortico-basal ganglia circuits in mice. Cell 137:961–71.
Enard, W. & Pääbo, S. (2004) Comparative primate genomics. Annual Review of Genomics and Human Genetics 5:351–78.
Enard, W., Przeworski, M., Fisher, S. E., Lai, C. S., Wiebe, V., Kitano, T., Monaco, A. P. & Pääbo, S. (2002) Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418(6900):869–72.
Endicott, P., Ho, S. Y. W. & Stringer, C. (2010) Using genetic evidence to evaluate four palaeoanthropological hypotheses for the timing of Neanderthal and modern human origins. Journal of Human Evolution 59:87–95.
Esposito, A., Demeurisse, G., Alberti, B. & Fabbro, F. (1999) Complete mutism after midbrain periaqueductal gray lesion. NeuroReport 10:681–85.
Evatt, M. L., DeLong, M. R. & Vitek, J. L. (2002) Parkinson's disease. In: Diseases of the nervous system, vol. 1, 3rd edition, ed. Asbury, A. K., McKhann, G. M., McDonald, W. I. & Goadsby, P. J., pp. 477–89. Cambridge University Press.
Falk, D. (2004) Prelinguistic evolution in early hominins: Whence motherese? Behavioral and Brain Sciences 27(4):491–503.
Falk, D. (2007) Evolution of the primate brain. In: Handbook of palaeoanthropology, vol. 2: Primate evolution and human origins, ed. Henke, W. & Tattersall, I., pp. 1133–62. Springer-Verlag.
Falk, D. (2009) Finding our tongues: Mothers, infants and the origins of language. Basic Books.
Fant, G. (1970) Acoustic theory of speech production – with calculations based on X-ray studies of Russian articulations, 2nd edition. Mouton.
Fee, E. J. (1995) The phonological system of a specifically language-impaired population. Clinical Linguistics and Phonetics 9:189–209.
Fink, G. R., Frackowiak, R. S. J., Pietrzyk, U. & Passingham, R. E. (1997) Multiple nonprimary motor areas in the human cortex. Journal of Neurophysiology 77:2164–74.
Finlay, B. L. & Darlington, R. B. (1995) Linked regularities in the development and evolution of mammalian brains. Science 268:1578–84.
Fischer, J. (2003) Developmental modifications in the vocal behavior of non-human primates. In: Primate audition: Ethology and neurobiology, ed. Ghazanfar, A. A., pp. 109–25. CRC Press.
Fischer, J., Hammerschmidt, K., Cheney, D. L. & Seyfarth, R. M. (2002) Acoustic features of male baboon loud calls: Influences of context, age, and individuality. Journal of the Acoustical Society of America 111:1465–74.
Fischer, J., Kitchen, D. M., Seyfarth, R. M. & Cheney, D. L. (2004) Baboon loud calls advertise male quality: Acoustic features and their relation to rank, age, and exhaustion. Behavioral Ecology and Sociobiology 56:140–48.
Fisher, S. E., Lai, C. S. L. & Monaco, A. P. (2003) Deciphering the genetic basis of speech and language disorders. Annual Review of Neuroscience 26:57–80.
Fisher, S. E. & Scharff, C. (2009) FOXP2 as a molecular window into speech and language. Trends in Genetics 25:166–77.
Fitch, W. T. (1997) Vocal tract length and formant frequency dispersion correlate with body size in rhesus macaques. Journal of the Acoustical Society of America 102:1213–22.
Fitch, W. T. (2000a) The evolution of speech: A comparative review. Trends in Cognitive Sciences 4(7):258–67.
Fitch, W. T. (2000b) The phonetic potential of nonhuman vocal tracts: Comparative cineradiographic observations of vocalizing animals. Phonetica 57:205–18.
Fitch, W. T. (2012) The biology and evolution of rhythm: Unraveling a paradox. In: Language and music as cognitive systems, ed. Rebuschat, P., Rohrmeier, M., Hawkins, J. A. & Cross, I., pp. 73–95. Oxford University Press.
Fitch, W. T., Huber, L. & Bugnyar, T. (2010) Social cognition and the evolution of language: Constructing cognitive phylogenies. Neuron 65(6):795–814. doi: 10.1016/j.neuron.2010.03.011.
Fitch, W. T. & Reby, D. (2001) The descended larynx is not uniquely human. Proceedings of the Royal Society, B: Biological Sciences 268:1669–75.
Flaherty, A. W. & Graybiel, A. M. (1993) Two input systems for body representations in the primate striatal matrix: Experimental evidence in the squirrel monkey. Journal of Neuroscience 13:1120–37.
Flaherty, A. W. & Graybiel, A. M. (1994) Input-output organization of the sensorimotor striatum in the squirrel monkey. Journal of Neuroscience 14:599–610.
Gaser, C. & Schlaug, G. (2003) Brain structures differ between musicians and non-musicians. Journal of Neuroscience 23:9240–45.
Geissmann, T. (1984) Inheritance of song parameters in the gibbon song, analysed in 2 hybrid gibbons (Hylobates pileatus X H. lar). Folia Primatologica 42:216–35.
Geissmann, T. (2000) Gibbon songs and human music from an evolutionary perspective. In: The origins of music, ed. Wallin, N. L., Merker, B. & Brown, S., pp. 103–23. MIT Press.
Gerardin, E., Lehéricy, S., Pochon, J.-B., Tézenas du Montcel, S., Mangin, J.-F., Poupon, F., Agid, Y., Le Bihan, D. & Marsault, C. (2003) Foot, hand, face and eye representation in the human striatum. Cerebral Cortex 13:162–69.
Gerfen, C. R. (2010) Functional neuroanatomy of dopamine in the striatum. In: Dopamine handbook, ed. Iversen, L. L., Iversen, S. D., Dunnett, S. B. & Björklund, A., pp. 11–21. Oxford University Press.
Gerfen, C. R. & Bolam, J. P. (2010) The neuroanatomical organization of the basal ganglia. In: Handbook of basal ganglia structure and function, ed. Steiner, H. & Tseng, K. Y., pp. 3–28. Elsevier.
Gerfen, C. R. & Surmeier, D. J. (2011) Modulation of striatal projection systems by dopamine. Annual Review of Neuroscience 34:441–66.
Ghazanfar, A. A. & Miller, C. T. (2006) Language evolution: Loquacious monkey brains? Current Biology 16:R879–81.
Ghazanfar, A. A., Morrill, R. J. & Kayser, C. (2013) Monkeys are perceptually tuned to facial expressions that exhibit a theta-like speech rhythm. Proceedings of the National Academy of Sciences USA 110:1959–63.
Ghazanfar, A. A. & Rendall, D. (2008) Evolution of human vocal production. Current Biology 18(11):R457–60.
Ghazanfar, A. A., Takahashi, D. Y., Mathur, N. & Fitch, W. T. (2012) Cineradiography of monkey lipsmacking reveals the putative origins of speech dynamics. Current Biology 22:1176–82.
Gil-da-Costa, R., Martin, A., Lopes, M. A., Muñoz, M., Fritz, J. B. & Braun, A. R. (2006) Species-specific calls activate homologs of Broca's and Wernicke's areas in the macaque. Nature Neuroscience 9:1064–70.
Giroud, M., Lemesle, M., Madinier, G., Billiar, T. & Dumas, R. (1997) Unilateral lenticular infarcts: Radiological and clinical syndromes, aetiology, and prognosis. Journal of Neurology, Neurosurgery, and Psychiatry 63:611–15.
Gonzalez-Lima, F. (2010) Responses of limbic, midbrain and brainstem structures to electrically-induced vocalizations. In: Handbook of mammalian vocalization: An integrative neuroscience approach, ed. Brudzynski, S. M., pp. 293–301. Elsevier.
Goodall, J. (1986) The chimpanzees of Gombe: Patterns of behavior. Belknap Press/Harvard University Press.
Gopnik, M. (1990a) Feature-blind grammar and dysphasia. Nature 344(6268):715. doi: 10.1038/344715a0.
Graybiel, A. M. (1990) Neurotransmitters and neuromodulators in the basal ganglia. Trends in Neurosciences 13:244–54.
Graybiel, A. M. (2005) The basal ganglia: Learning new tricks and loving it. Current Opinion in Neurobiology 15:638–44.
Graybiel, A. M. (2008) Habits, rituals, and the evaluative brain. Annual Review of Neuroscience 31:359–87. doi: 10.1146/annurev.neuro.29.051605.112851.
Green, R. E., Krause, J., Briggs, A. W., Maricic, T., Stenzel, U., Kircher, M., Patterson, N., Li, H., Zhai, W., Fritz, M. H.-Y., Hansen, N. F., Durand, E. Y., Malaspinas, A.-S., Jensen, J. D., Marques-Bonet, T., Alkan, C., Prüfer, K., Meyer, M., Burbano, H. A., Good, J. M., Schultz, R., Aximu-Petri, A., Butthof, A., Höber, B., Höffner, B., Siegemund, M., Weihmann, A., Nusbaum, C., Lander, E. S., Russ, C., Novod, N., Affourtit, J., Egholm, M., Verna, C., Rudan, P., Brajkovic, D., Kucan, E., Gusic, I., Doronichev, V. B., Golovanova, L. V., Lalueza-Fox, C., de la Rasilla, M., Fortea, J., Rosas, A., Schmitz, R. W., Johnson, P. L. F., Eichler, E. E., Falush, D., Birney, E., Mullikin, J. C., Slatkin, M., Nielsen, R., Kelso, J., Lachmann, M., Reich, D. & Pääbo, S. (2010) A draft sequence of the Neandertal genome. Science 328(5979):710–22.
Grillner, S. (1991) Recombination of motor pattern generators. Current Biology 1:231–33.
Grillner, S. & Wallén, P. (2004) Innate versus learned movements – a false dichotomy? In: Brain mechanisms for the integration of posture and movement, ed. Mori, S., Stuart, D. G. & Wiesendanger, M., pp. 3–12. (Progress in Brain Research, vol. 143). Elsevier.
Groenewegen, H. J. (2003) The basal ganglia and motor control. Neural Plasticity 10:107–20.
Groswasser, Z., Korn, C., Groswasser-Reider, I. & Solzi, P. (1988) Mutism associated with buccofacial apraxia and bihemispheric lesions. Brain and Language 34:157–68.
Gruber-Dujardin, E. (2010) Role of the periaqueductal gray in expressing vocalization. In: Handbook of mammalian vocalization: An integrative neuroscience approach, ed. Brudzynski, S. M., pp. 313–27. Elsevier.
Haber, S. N. (2010a) Integrative networks across basal ganglia circuits. In: Handbook of basal ganglia structure and function, ed. Steiner, H. & Tseng, K. Y., pp. 409–27. Elsevier.
Haber, S. N. (2010b) Convergence of limbic, cognitive, and motor cortico-striatal circuits with dopamine pathways in primate brain. In: Dopamine handbook, ed. Iversen, L. L., Iversen, S. D., Dunnett, S. B. & Björklund, A., pp. 38–48. Oxford University Press.
Haber, S. N., Fudge, J. L. & McFarland, N. R. (2000) Striatonigrostriatal pathways in primates form an ascending spiral from the shell to the dorsolateral striatum. Journal of Neuroscience 20:2369–82.
Haber, S. N., Kunishio, K., Mizobuchi, M. & Lynd-Balta, E. (1995) The orbital and medial prefrontal circuit through the primate basal ganglia. Journal of Neuroscience 15:4851–67.
Haesler, S., Rochefort, C., Georgi, B., Licznerski, P., Osten, P. & Scharff, C. (2007) Incomplete and inaccurate vocal imitation after knockdown of FoxP2 in songbird basal ganglia nucleus area X. PLoS Biology 5:2885–97.
Hage, S. R. (2010a) Localization of the central pattern generator for vocalization. In: Handbook of mammalian vocalization: An integrative neuroscience approach, ed. Brudzynski, S. M., pp. 329–37. Elsevier.
Hage, S. R. (2010b) Neuronal networks involved in the generation of vocalization. In: Handbook of mammalian vocalization: An integrative neuroscience approach, ed. Brudzynski, S. M., pp. 339–49. Elsevier.
Hage, S. R., Gavrilov, N. & Nieder, A. (2013) Cognitive control of distinct vocalizations in rhesus monkeys. Journal of Cognitive Neuroscience 25:1692–701. doi:10.1162/jocn_a_00428.
Hage, S. R. & Jürgens, U. (2006) On the role of the pontine brainstem in vocal pattern generation: A telemetric single-unit recording study in the squirrel monkey. Journal of Neuroscience 26:7105–15.Google Scholar
Hammerschmidt, K. & Fischer, J. (2008) Constraints in primate vocal production. In: Evolution of communicative flexibility: Complexity, creativity, and adaptability in human and animal communication, ed. Oller, D. K. & Griebel, U., pp. 93–119. MIT Press.
Hardus, M. E., Lameira, A. R., Singleton, I., Morrogh-Bernard, H. C., Knott, C. D., Ancrenaz, M., Utami Atmoko, S. S. & Wich, S. A. (2009a) A description of the orangutan's vocal and sound repertoire, with a focus on geographic variation. In: Orangutans: Geographic variation in behavioral ecology and conservation, ed. Wich, S. A., Utami Atmoko, S. S., Setia, T. M. & van Schaik, C. P., pp. 49–64. Oxford University Press.
Hardus, M. E., Lameira, A. R., van Schaik, C. P. & Wich, S. A. (2009b) Tool use in wild orang-utans modifies sound production: A functionally deceptive innovation? Proceedings of the Royal Society B: Biological Sciences 276:3689–94.
Hast, M. H., Fischer, J. M., Wetzel, A. B. & Thompson, V. E. (1974) Cortical motor representation of the laryngeal muscles in Macaca mulatta. Brain Research 73:229–40.
Hayes, C. (1951) The ape in our house. Harper & Brothers.
Hayes, K. J. & Hayes, C. (1952) Imitation in a home-raised chimpanzee. Journal of Comparative and Physiological Psychology 45:450–59.
Heilman, K. M., Leon, S. A. & Rosenbek, J. C. (2004) Affective aprosodia from a medial frontal stroke. Brain and Language 89:411–16.
Hihara, S., Yamada, H., Iriki, A. & Okanoya, K. (2003) Spontaneous vocal differentiation of coo-calls for tools and food in Japanese monkeys. Neuroscience Research 45:383–89.
Hikosaka, O. (2007) GABAergic output of the basal ganglia. In: GABA and the basal ganglia: From molecules to systems, ed. Tepper, J. M., Abercrombie, E. D. & Bolam, J. P., pp. 209–26. (Progress in Brain Research, vol. 160). Elsevier.
Hillix, W. A. (2007) The past, present, and possible futures of animal language research. In: Primate perspectives on behavior and cognition, ed. Washburn, D. A., pp. 223–34. American Psychological Association.
Hirose, H. (2010) Investigating the physiology of laryngeal structures. In: The handbook of phonetic sciences, 2nd edition, ed. Hardcastle, W. J., Laver, J. & Gibbon, F. E., pp. 130–52. Wiley-Blackwell.
Ho, A. K., Bradshaw, J. L., Iansek, R. & Alfredson, R. (1999a) Speech volume regulation in Parkinson's disease: Effects of implicit cues and explicit instructions. Neuropsychologia 37:1453–60.
Ho, A. K., Iansek, R. & Bradshaw, J. L. (1999b) Regulation of Parkinsonian speech volume: The effect of interlocuter distance. Journal of Neurology, Neurosurgery, and Psychiatry 67:199–202.
Hofreiter, M. (2011) Drafting human ancestry: What does the Neanderthal genome tell us about hominid evolution? Commentary on Green et al. (2010). Human Biology 83:1–11.
Holloway, R. L., Broadfield, D. C. & Yuan, M. S. (2004) The human fossil record: Vol. III. Brain endocasts – the paleoneurological evidence. Wiley.
Holstege, G. (1991) Descending pathways from the periaqueductal gray and adjacent areas. In: Midbrain periaqueductal gray matter: Functional, anatomical, and neurochemical organization, ed. Depaulis, A. & Bandler, R., pp. 239–65. Plenum Press. (NATO ASI Series, Life Sciences, vol. 213).
Hopkins, W. D. & Savage-Rumbaugh, E. S. (1991) Vocal communication as a function of differential rearing experiences in Pan paniscus: A preliminary report. International Journal of Primatology 12:559–83.
Hopkins, W. D., Taglialatela, J. P. & Leavens, D. A. (2007) Chimpanzees differentially produce novel vocalizations to capture the attention of a human. Animal Behaviour 73(2):281–86.
Hurst, J., Baraitser, M., Auger, E., Graham, F. & Norell, S. (1990) An extended family with a dominantly inherited speech disorder. Developmental Medicine and Child Neurology 32:347–55.
Iannetti, P., Spalice, A., Raucci, U., Atzei, G. & Cipriani, C. (1997) Gelastic epilepsy: Video-EEG, MRI and SPECT characteristics. Brain and Development 19:418–21.
Ikeda, A., Lüders, H. O., Burgess, R. C. & Shibasaki, H. (1992) Movement-related potentials recorded from supplementary motor area and primary motor area. Brain 115:1017–43.
Ingold, T. (1994) Tool-using, toolmaking, and the evolution of language. In: Hominid culture in primate perspective, ed. Quiatt, D. & Itani, J., pp. 279–314. University Press of Colorado.
Iwasa, H., Shibata, T., Mine, S., Koseki, K., Yasuda, K., Kasagi, Y., Okada, M., Yabe, H., Kaneko, S. & Nakajima, Y. (2002) Different patterns of dipole source localization in gelastic seizure with or without a sense of mirth. Neuroscience Research 43:23–29.
Iwatsubo, T., Kuzuhara, S., Kanemitsu, A., Shimada, H. & Toyokura, Y. (1990) Corticofugal projections to the motor nuclei of the brainstem and spinal cord in humans. Neurology 40(2):309–12.
Jacob, F. (1977) Evolution and tinkering. Science 196:1161–66.
James, W. (2003) The ceremonial animal: A new portrait of anthropology. Oxford University Press.
Janik, V. & Slater, P. J. B. (1997) Vocal learning in mammals. In: Advances in the Study of Behavior, vol. 26, ed. Slater, P. J. B., Rosenblatt, J. S., Snowdon, C. T. & Milinski, M., pp. 59–99. Academic Press.
Janik, V. M. & Slater, P. J. B. (2000) The different roles of social learning in vocal communication. Animal Behaviour 60:1–11.
Jankovic, J. (2008) Parkinson's disease: Clinical features and diagnosis. Journal of Neurology, Neurosurgery, and Psychiatry 79:368–76.
Jarvis, E. D. (2004a) Brains and birdsong. In: Nature's music: The science of birdsong, ed. Marler, P. & Slabbekoorn, H., pp. 226–71. Elsevier.
Jarvis, E. D. (2004b) Learned birdsong and the neurobiology of human language. In: Behavioral neurobiology of birdsong, ed. Zeigler, H. P. & Marler, P., pp. 749–77. (Annals of the New York Academy of Sciences, vol. 1016). New York Academy of Sciences.
Joel, D. & Weiner, I. (1994) The organization of the basal ganglia-thalamocortical circuits: Open interconnected rather than closed segregated. Neuroscience 63:363–79.
Jonas, S. (1981) The supplementary motor region and speech emission. Journal of Communication Disorders 14:349–73.
Jonas, S. (1987) The supplementary motor region and speech. In: The frontal lobes revisited, ed. Perecman, E., pp. 241–50. Erlbaum.
Jürgens, U. (1974) On the elicitability of vocalization from the cortical larynx area. Brain Research 81:564–66.
Jürgens, U. (1986) The squirrel monkey as an experimental model in the study of cerebral organization of emotional vocal utterances. European Archives of Psychiatry and Neurological Sciences 236:40–43.
Jürgens, U. (2002b) Neural pathways underlying vocal control. Neuroscience and Biobehavioral Reviews 26:235–58.
Jürgens, U. & Alipour, M. (2002) A comparative study on the cortico-hypoglossal connections in primates, using biotin dextranamine. Neuroscience Letters 328:245–48.
Jürgens, U., Kirzinger, A. & von Cramon, D. (1982) The effects of deep-reaching lesions in the cortical face area on phonation: A combined case report and experimental monkey study. Cortex 18:125–39.
Jürgens, U. & Ploog, D. (1970) Cerebral representation of vocalization in the squirrel monkey. Experimental Brain Research 10:532–54.
Jürgens, U. & von Cramon, D. (1982) On the role of the anterior cingulate cortex in phonation: A case report. Brain and Language 15:234–48.
Kaestner, K. H., Knöchel, W. & Martínez, D. E. (2000) Unified nomenclature for the winged helix/forkhead transcription factors. Genes and Development 14:142–46.
Kawashima, S., Ueki, Y., Kato, T., Matsukawa, N., Mima, T., Hallett, M., Ito, K. & Ojika, K. (2012) Changes in striatal dopamine release associated with human motor-skill acquisition. PLOS ONE 7:e31728.
Kent, R. D., Kent, J. F., Weismer, G. & Duffy, J. R. (2000) What dysarthrias can tell us about the neural control of speech. Journal of Phonetics 28:273–302.
Kent, R. D. & Read, C. (2002) The acoustic analysis of speech, 2nd edition. Singular/Thomson Learning.
Kim, I.-S., Ki, C.-S. & Park, K.-J. (2010) Pediatric-onset dystonia associated with bilateral striatal necrosis and G14459A mutation in a Korean family: A case report. Journal of Korean Medical Science 25:180–84.
Kirzinger, A. (1985) Cerebellar lesion effects on vocalization of the squirrel monkey. Behavioural Brain Research 16:177–81.
Kirzinger, A. & Jürgens, U. (1982) Cortical lesion effects and vocalization in the squirrel monkey. Brain Research 233:299–315.
Knight, C. (1999) Sex and language as pretend-play. In: The evolution of culture: An interdisciplinary view, ed. Dunbar, R., Knight, C. & Power, C., pp. 228–47. Edinburgh University Press.
Koda, H., Oyakawa, C., Kato, A. & Masataka, N. (2007) Experimental evidence for the volitional control of vocal production in an immature gibbon. Behaviour 144:681–92.
Kovac, S., Deppe, M., Mohammadi, S., Schiffbauer, H., Schwindt, W., Möddel, G., Dogan, M. & Evers, S. (2009) Gelastic seizures: A case of lateral frontal lobe epilepsy and review of the literature. Epilepsy and Behavior 15:249–53.
Krägeloh-Mann, I., Helber, A., Mader, I., Staudt, M., Wolff, M., Groenendaal, F. & DeVries, L. (2002) Bilateral lesions of thalamus and basal ganglia: Origin and outcome. Developmental Medicine and Child Neurology 44:477–84.
Krause, J., Lalueza-Fox, C., Orlando, L., Enard, W., Green, R. E., Burbano, H. A., Hublin, J. J., Hänni, C., Fortea, J., De la Rasilla, M., Bertranpetit, J., Rosas, A. & Pääbo, S. (2007) The derived FOXP2 variant of modern humans was shared with Neandertals. Current Biology 17:15.
Kreiman, J. & Sidtis, D. (2011) Foundations of voice studies: An interdisciplinary approach to voice production and perception. Wiley-Blackwell.
Kreitzer, A. C. & Malenka, R. C. (2008) Striatal plasticity and basal ganglia circuit function. Neuron 60:543–54.
Kunishio, K. & Haber, S. N. (1994) Primate cingulostriatal projection: Limbic striatal versus sensorimotor striatal input. Journal of Comparative Neurology 350:337–56.
Kuypers, H. G. J. M. (1958a) Corticobulbar connection to the pons and lower brain-stem in man. Brain 81:364–88.
Kuypers, H. G. J. M. (1958b) Some projections from the peri-central cortex to the pons and lower brain stem in monkey and chimpanzee. Journal of Comparative Neurology 110:221–55.
Ladefoged, P. (2005) Vowels and consonants: An introduction to the sounds of languages, 2nd edition. Blackwell.
Lamendella, J. T. (1977) The limbic system in human communication. In: Studies in neurolinguistics, vol. 3, ed. Whitaker, H. & Whitaker, H. A., pp. 157–222. Academic Press (Perspectives in Neurolinguistics and Psycholinguistics Series).
Larson, C. R., Sutton, D. & Lindeman, R. C. (1978) Cerebellar regulation of phonation in rhesus monkey (Macaca mulatta). Experimental Brain Research 33:1–18.
Larson, C. R., Sutton, D., Taylor, E. M. & Lindeman, R. (1973) Sound spectral properties of conditioned vocalization in monkeys. Phonetica 27:100–10.
Le Beau, J. (1954) Anterior cingulectomy in man. Journal of Neurosurgery 11:268–76.
Lemasson, A. & Hausberger, M. (2004) Patterns of vocal sharing and social dynamics in a captive group of Campbell's monkeys (Cercopithecus campbelli campbelli). Journal of Comparative Psychology 118:347–59.
Lemasson, A., Hausberger, M. & Zuberbühler, K. (2005) Socially meaningful vocal plasticity in adult Campbell's monkeys (Cercopithecus campbelli). Journal of Comparative Psychology 119:220–29.
Lewin, R. & Foley, R. A. (2004) Principles of human evolution, 2nd edition. Blackwell.
Lewis, J. (2009) As well as words: Congo Pygmy hunting, mimicry, and play. In: The cradle of language, ed. Botha, R. & Knight, C., pp. 236–56. Oxford University Press.
Lieberman, D. E. (2011) The evolution of the human head. Harvard University Press.
Lieberman, P. (1968) Primate vocalizations and human linguistic ability. Journal of the Acoustical Society of America 44:1574–84.
Lieberman, P. (2000) Human language and our reptilian brain: The subcortical bases of speech, syntax, and thought. Harvard University Press.
Lieberman, P. (2006a) Limits on tongue deformation: Diana monkey formants and the impossible vocal tract shapes proposed by Riede et al. (2005). Journal of Human Evolution 50:219–21.
Lieberman, P. (2006b) Toward an evolutionary biology of language. Harvard University Press.
Lieberman, P. (2007) The evolution of human speech: Its anatomical and neural bases. Current Anthropology 48:39–66.
Lieberman, P., Klatt, D. H. & Wilson, W. H. (1969) Vocal tract limitations on the vowel repertoires of rhesus monkey and other nonhuman primates. Science 164:1185–87.
Liégeois, F., Baldeweg, T., Connelly, A., Gadian, D. G., Mishkin, M. & Vargha-Khadem, F. (2003) Language fMRI abnormalities associated with FOXP2 gene mutation. Nature Neuroscience 6:1230–37.
Locke, J. L. (1993) The child's path to spoken language, First edition. Harvard University Press.
Logemann, J. A., Fisher, H. B., Boshes, B. & Blonsky, E. R. (1978) Frequency and cooccurrence of vocal tract dysfunctions in the speech of a large sample of Parkinson patients. Journal of Speech and Hearing Disorders 43:47–57.
Loucks, T. M. J., Poletto, C. J., Simonyan, K., Reynolds, C. L. & Ludlow, C. L. (2007) Human brain activation during phonation and exhalation: Common volitional control for two upper airway functions. NeuroImage 36:131–43.
Lund, J. P. & Kolta, A. (2006) Brainstem circuits that control mastication: Do they have anything to say during speech? Journal of Communication Disorders 39:381–90.
MacDermot, K. D., Bonora, E., Sykes, N., Coupe, A. M., Lai, C. S. L., Vernes, S. C., Vargha-Khadem, F., McKenzie, F., Smith, R. L., Monaco, A. P. & Fisher, S. E. (2005) Identification of FOXP2 truncation as a novel cause of developmental speech and language deficits. American Journal of Human Genetics 76:1074–80.
MacNeilage, P. F. (1998) The frame/content theory of evolution of speech production. Behavioral and Brain Sciences 21(4):499–511.
MacNeilage, P. F. (2008) The origin of speech. Oxford University Press.
Mallet, N., Ballion, B., Le Moine, C. & Gonon, F. (2006) Cortical inputs and GABA interneurons imbalance projection neurons in the striatum of parkinsonian rats. Journal of Neuroscience 26:3875–84.
Malloch, S. & Trevarthen, C., eds. (2009) Communicative musicality: Exploring the basis of human companionship. Oxford University Press.
Manser, M. B., Seyfarth, R. M. & Cheney, D. L. (2002) Suricate alarm calls signal predator class and urgency. Trends in Cognitive Sciences 6:55–57.
Mao, C. C., Coull, B. M., Golper, L. A. & Rau, M. T. (1989) Anterior operculum syndrome. Neurology 39:1169–72.
Maricic, T., Günther, V., Georgiev, O., Gehre, S., Curlin, M., Schreiweis, C., Naumann, R., Burbano, H. A., Meyer, M., Lalueza-Fox, C., de la Rasilla, M., Rosas, A., Gajovic, S., Kelso, J., Enard, W., Schaffner, W. & Pääbo, S. (2013) A recent evolutionary change affects a regulatory element in the human FOXP2 gene. Molecular Biology and Evolution 30(4):844–52. doi: 10.1093/molbev/mss271.
Marsden, C. D. (1982) The mysterious motor function of the basal ganglia: The Robert Wartenberg Lecture. Neurology 32:514–39.
Marshall, A. J., Wrangham, R. W. & Arcadi, A. C. (1999) Does learning affect the structure of vocalizations in chimpanzees? Animal Behaviour 58:825–30.
Masataka, N. (2008a) The gestural theory of and the vocal theory of language origins are not incompatible with one another. In: The origins of language: Unraveling evolutionary forces, ed. Masataka, N., pp. 1–10. Springer.
Masataka, N. (2008b) Implication of the human musical faculty for evolution of language. In: The origins of language: Unraveling evolutionary forces, ed. Masataka, N., pp. 133–51. Springer.
Masdeu, J. C., Schoene, W. C. & Funkenstein, H. (1978) Aphasia following infarction of the left supplementary motor area: A clinicopathologic study. Neurology 28:1220–23.
McHaffie, J. G., Stanford, T. R., Stein, B. E., Coizet, V. & Redgrave, P. (2005) Subcortical loops through the basal ganglia. Trends in Neurosciences 28:401–407.
Miller, C. T., Beck, K., Meade, B. & Wang, X. (2009a) Antiphonal call timing in marmosets is behaviorally significant: Interactive playback experiments. Journal of Comparative Physiology, A: Neuroethology, Sensory, Neural, and Behavioral Physiology 195:783–89.
Miller, C. T., Eliades, S. J. & Wang, X. (2009b) Motor planning for vocal production in common marmosets. Animal Behaviour 78:1195–203.
Milo, R. G. & Quiatt, D. (1994) Language in the middle and late stone ages: Glottogenesis in anatomically modern Homo sapiens. In: Hominid culture in primate perspective, ed. Quiatt, D. & Itani, J., pp. 321–39. University Press of Colorado.
Mitani, J. C. & Brandt, K. L. (1994) Social factors influence acoustic variability in the long-distance calls of male chimpanzees. Ethology 96:233–52.
Mitani, J. C. & Gros-Louis, J. (1998) Chorusing and call convergence in chimpanzees: Tests of three hypotheses. Behaviour 135:1041–64.
Mithen, S. J. (2006) The singing Neanderthals: The origins of music, language, mind and body. Harvard University Press. (Original work published in 2005).
Mogenson, G. J., Jones, D. L. & Yim, C. Y. (1980) From motivation to action: Functional interface between the limbic system and the motor system. Progress in Neurobiology 14:69–97.
Moore, C. A. (2004) Physiologic development of speech production. In: Speech motor control in normal and disordered speech, ed. Maassen, B., Kent, R. D., Peters, H. F. M., van Lieshout, P. H. H. M. & Hulstijn, W., pp. 191–209. Oxford University Press.
Morecraft, R. J. & van Hoesen, G. W. (1992) Cingulate input to the primary and supplementary motor cortices in the Rhesus monkey: Evidence for somatotopy in areas 24c and 23c. Journal of Comparative Neurology 322:471–89.
Morecraft, R. J., Louie, J. L., Herrick, J. L. & Stilwell-Morecraft, K. S. (2001) Cortical innervation of the facial nucleus in the non-human primate: A new interpretation of the effects of stroke and related subtotal brain trauma on the muscles of facial expression. Brain 124:176–208.
Morley, I. (2012) Hominin physiological evolution and the emergence of musical capacities. In: Music, language, and human evolution, ed. Bannan, N., pp. 109–41. Oxford University Press.
Müller, J., Wenning, G. K., Verny, M., McKee, A., Chaudhuri, K. R., Jellinger, K., Poewe, W. & Litvan, I. (2001) Progression of dysarthria and dysphagia in postmortem-confirmed Parkinsonian disorders. Archives of Neurology 58:259–64.
Müller-Vahl, K. R., Kaufmann, J., Grosskreutz, J., Dengler, R., Emrich, H. M. & Peschel, T. (2009) Prefrontal and anterior cingulate cortex abnormalities in Tourette syndrome: Evidence from voxel-based morphometry and magnetization transfer imaging. BMC Neuroscience 10:47. Available at: www.biomedcentral.com/1471-2202/10/47
Munhall, K. & Löfqvist, A. (1992) Gestural aggregation in speech: Laryngeal gestures. Journal of Phonetics 20:111–26.
Myers, R. E. (1976) Comparative neurology of vocalization and speech: Proof of a dichotomy. In: Origins and evolution of language and speech, ed. Harnad, S. R., Steklis, H. D. & Lancaster, J., pp. 745–57. (Annals of the New York Academy of Sciences, vol. 280). New York Academy of Sciences.
Nakano, K. (2000) Neural circuits and topographic organization of the basal ganglia and related regions. Brain and Development 22:S5–16.
Nambu, A. (2008) Seven problems on the basal ganglia. Current Opinion in Neurobiology 18:595–604.
Nambu, A. (2011) Somatotopic organization of the primate basal ganglia. Frontiers in Neuroanatomy 5:26.
Newman, J. D. (2003) Vocal communication and the triune brain. Physiology and Behavior 79:495–502.
Nieuwenhuys, R., Voogd, J. & van Huijzen, C. (2008) The human central nervous system, 4th edition. Springer.
Noonan, J. P. (2010) Neanderthal genomics and the evolution of modern humans. Genome Research 20:547–53.
Öngür, D. & Price, J. L. (2000) The organization of networks within the orbital and medial prefrontal cortex of rats, monkeys and humans. Cerebral Cortex 10:206–19.
Ouattara, K., Lemasson, A. & Zuberbühler, K. (2009) Campbell's monkeys concatenate vocalizations into context-specific call sequences. Proceedings of the National Academy of Sciences USA 106(51):22026–31.
Owren, M. J., Amoss, R. T. & Rendall, D. (2011) Two organizing principles of vocal production: Implications for nonhuman and human primates. American Journal of Primatology 73(6):530–44.
Owren, M. J., Dieter, J. A., Seyfarth, R. M. & Cheney, D. L. (1992) “Food” calls produced by adult female rhesus (Macaca mulatta) and Japanese (M. fuscata) macaques, their normally-raised offspring, and offspring cross-fostered between species. Behaviour 120:218–31.
Owren, M. J., Dieter, J. A., Seyfarth, R. M. & Cheney, D. L. (1993) Vocalizations of rhesus (Macaca mulatta) and Japanese (M. fuscata) macaques cross-fostered between species show evidence of only limited modification. Developmental Psychobiology 26:389–406.
Packard, M. G. & Knowlton, B. J. (2002) Learning and memory functions of the basal ganglia. Annual Review of Neuroscience 25:563–93.
Panksepp, J. (1998) Affective neuroscience: The foundations of human and animal emotions. Oxford University Press.
Panksepp, J. (2010) Emotional causes and consequences of social-affective vocalization. In: Handbook of mammalian vocalization: An integrative neuroscience approach, ed. Brudzynski, S. M., pp. 201–208. Elsevier.
Papoušek, M. (2003) Intuitive parenting: A hidden source of musical stimulation in infancy. In: Musical beginnings: Origins and development of musical competence, ed. Deliège, I. & Sloboda, J., pp. 88–112. Oxford University Press.
Parent, A. & Hazrati, L. N. (1995) Functional anatomy of the basal ganglia: I. The cortico-basal ganglia-thalamo-cortical loop. Brain Research Reviews 20:91–127.
Passingham, R. (2008) What is special about the human brain? Oxford University Press.
Paus, T. (2001) Primate anterior cingulate cortex: Where motor control, drive and cognition interface. Nature Reviews Neuroscience 2:417–24.
Paus, T., Tomaiuolo, F., Otaky, N., MacDonald, D., Petrides, M., Atlas, J., Morris, R. & Evans, A. C. (1996) Human cingulate and paracingulate sulci: Pattern, variability, asymmetry, and probabilistic map. Cerebral Cortex 6:207–14.
Petrides, M., Cadoret, G. & Mackey, S. (2005) Orofacial somatomotor responses in the macaque monkey homologue of Broca's area. Nature 435:1235–38.
Petrides, M. & Pandya, D. N. (2009) Distinct parietal and temporal pathways to the homologues of Broca's area in the monkey. PLoS Biology 7:e1000170.
Picard, N. & Strick, P. L. (1996) Motor areas of the medial wall: A review of their location and functional activation. Cerebral Cortex 6:342–53.
Pierce, J. D. Jr. (1985) A review of attempts to condition operantly alloprimate vocalizations. Primates 26:202–13.
Pistorio, A. L., Vintch, B. & Wang, X. (2006) Acoustic analysis of vocal development in a New World primate, the common marmoset (Callithrix jacchus). Journal of the Acoustical Society of America 120:1655–70.
Postuma, R. B. & Dagher, A. (2006) Basal ganglia functional connectivity based on a meta-analysis of 126 positron emission tomography and functional magnetic resonance imaging publications. Cerebral Cortex 16:1508–21.
Radua, J., van den Heuvel, O. A., Surguladze, S. & Mataix-Cols, D. (2010) Meta-analytical comparison of voxel-based morphometry studies in obsessive-compulsive disorder vs other anxiety disorders. Archives of General Psychiatry 67:701–11.
Ramig, L. O., Fox, C. & Sapir, S. (2004) Parkinson's disease: Speech and voice disorders and their treatment with the Lee Silverman Voice Treatment. Seminars in Speech and Language 25:169–80.
Ramig, L. O., Fox, C. & Sapir, S. (2007) Speech disorders in Parkinson's disease and the effects of pharmacological, surgical and speech treatment with emphasis on Lee Silverman Voice Treatment (LSVT®). In: Parkinson's disease and related disorders, Part 1, ed. Koller, W. C. & Melamed, E., pp. 385–99. (Handbook of Clinical Neurology, vol. 83, 3rd series). Elsevier Press.
Rappaport, R. A. (1999) Ritual and religion in the making of humanity. Cambridge University Press.
Rappaport, R. A. (2000) Pigs for the ancestors: Ritual in the ecology of a New Guinea people, 2nd edition. Waveland Press.
Reimers-Kipping, S., Hevers, W., Pääbo, S. & Enard, W. (2011) Humanized Foxp2 specifically affects cortico-basal ganglia circuits. Neuroscience 175:75–84. doi: 10.1016/j.neuroscience.2010.11.042.
Reiner, A. (2010) Organization of corticostriatal projection neuron types. In: Handbook of basal ganglia structure and function, ed. Steiner, H. & Tseng, K. Y., pp. 323–39. Elsevier.
Rendall, D., Kollias, S., Ney, C. & Lloyd, P. (2005) Pitch (F0) and formant profiles of human vowels and vowel-like baboon grunts: The role of vocalizer body size and voice-acoustic allometry. Journal of the Acoustical Society of America 117:944–55.
Riecker, A., Kassubek, J., Gröschel, K., Grodd, W. & Ackermann, H. (2006) The cerebral control of speech tempo: Opposite relationship between speaking rate and BOLD signal changes at striatal and cerebellar structures. NeuroImage 29:46–53.
Riede, T., Bronson, E., Hatzikirou, H. & Zuberbühler, K. (2005) Vocal production mechanisms in a non-human primate: Morphological data and a model. Journal of Human Evolution 48:85–96.
Riede, T., Bronson, E., Hatzikirou, H. & Zuberbühler, K. (2006) Multiple discontinuities in nonhuman vocal tracts: A response to Lieberman (2006). Journal of Human Evolution 50:222–25.
Riede, T. & Zuberbühler, K. (2003a) Pulse register phonation in Diana monkey alarm calls. Journal of the Acoustical Society of America 113:2919–26.
Riede, T. & Zuberbühler, K. (2003b) The relationship between acoustic structure and semantic information in Diana monkey alarm vocalization. Journal of the Acoustical Society of America 114:1132–42.
Rightmire, G. P. (2004) Brain size and encephalization in early to mid-Pleistocene Homo. American Journal of Physical Anthropology 124:109–23.
Rightmire, G. P. (2007) Later middle Pleistocene Homo. In: Handbook of paleoanthropology, vol. 3: Phylogeny of hominids, ed. Henke, W. & Tattersall, I., pp. 1695–715. Springer.
Robbins, T. W. (2010) From behavior to cognition: Functions of mesostriatal, mesolimbic, and mesocortical dopamine systems. In: Dopamine handbook, ed. Iversen, L. L., Iversen, S. D., Dunnett, S. B. & Björklund, A., pp. 203–14. Oxford University Press.
Robinson, B. W. (1967) Vocalization evoked from forebrain in Macaca mulatta. Physiology and Behavior 2:345–54.
Roland, E. H., Poskitt, K., Rodriguez, E., Lupton, B. A. & Hill, A. (1998) Perinatal hypoxic-ischemic thalamic injury: Clinical features and neuroimagery. Annals of Neurology 44:161–66.
Rosas, A., Martínez-Maza, C., Bastir, M., García-Tabernero, A., Lalueza-Fox, C., Huguet, R., Ortiz, J. E., Julià, R., Soler, V., de Torres, T., Martínez, E., Canaveras, J. C., Sánchez-Moral, S., Cuezva, S., Lario, J., Santamaría, D., de la Rasilla, M. & Fortea, J. (2006) Paleobiology and comparative morphology of a late Neandertal sample from El Sidrón, Asturias, Spain. Proceedings of the National Academy of Sciences USA 103:19266–71.
Ross, E. D. & Mesulam, M.-M. (1979) Dominant language functions of the right hemisphere? Archives of Neurology 36:144–48.
Ross, E. D. & Monnot, M. (2008) Neurology of affective prosody and its functional-anatomic organization in right hemisphere. Brain and Language 104:51–74.
Roush, R. S. & Snowdon, C. T. (1994) Ontogeny of food-associated calls in cotton-top tamarins. Animal Behaviour 47:263–73.
Roush, R. S. & Snowdon, C. T. (1999) The effects of social status on food-associated calling behaviour in captive cotton-top tamarins. Animal Behaviour 58:1299–305.
Roy, S., Miller, C. T., Gottsch, D. & Wang, X. (2011) Vocal control by the common marmoset in the presence of interfering noise. Journal of Experimental Biology 214:3619–29.
Rubens, A. B. (1975) Aphasia with infarction in the territory of the anterior cerebral artery. Cortex 11:239–50.
Rukstalis, M., Fite, J. E. & French, J. A. (2003) Social change affects vocal structure in a callitrichid primate (Callithrix kuhlii). Ethology 109:327–40.
Satoh, T., Nakai, S., Sato, T. & Kimura, M. (2003) Correlated coding of motivation and outcome of decision by dopamine neurons. Journal of Neuroscience 23:9913–23.
Savage-Rumbaugh, S., Fields, W. M. & Spircu, T. (2004) The emergence of knapping and vocal expression embedded in a Pan/Homo culture. Biology and Philosophy 19:541–75.
Scharff, C. & Haesler, S. (2005) An evolutionary perspective on FoxP2: Strictly for the birds? Current Opinion in Neurobiology 15:694–703.
Scherer, K. R. (1986) Vocal affect expression: A review and a model for future research. Psychological Bulletin 99:143–65.
Scherer, K. R., Johnstone, T. & Klasmeyer, G. (2009) Vocal expression of emotion. In: Handbook of affective sciences, ed. Davidson, R. J., Scherer, K. R. & Goldsmith, H. Hill, pp. 433–56. Oxford University Press.
Schultz, W. (2006) Behavioral theories and the neurophysiology of reward. Annual Review of Psychology 57:87–115.
Schultz, W. (2007) Behavioral dopamine signals. Trends in Neurosciences 30:203–10.
Schultz, W. (2010) Dopamine signals for reward value and risk: Basic and recent data. Behavioral and Brain Functions 6:24.
Schulz, G. M., Varga, M., Jeffires, K., Ludlow, C. L. & Braun, A. R. (2005) Functional neuroanatomy of human vocalization: An H₂¹⁵O PET study. Cerebral Cortex 15:1835–47.
Seeley, W. W. (2008) Selective functional, regional, and neuronal vulnerability in frontotemporal dementia. Current Opinion in Neurology 21:701–707.
Seyfarth, R. M. & Cheney, D. L. (2003b) Signalers and receivers in animal communication. Annual Review of Psychology 54:145–73.
Seyfarth, R. M., Cheney, D. L. & Marler, P. (1980) Vervet monkey alarm calls: Semantic communication in a free-ranging primate. Animal Behaviour 28:1070–94.
Sherwood, C. C. (2005) Comparative anatomy of the facial motor nucleus in mammals, with an analysis of neuron numbers in primates. The Anatomical Record, Part A: Discoveries in Molecular, Cellular, and Evolutionary Biology 287(1):1067–79.
Sherwood, C. C., Broadfield, D. C., Holloway, R. L., Gannon, P. J. & Hof, P. R. (2003) Variability of Broca's area homologue in African great apes: Implications for language evolution. The Anatomical Record, Part A: Discoveries in Molecular, Cellular, and Evolutionary Biology 271:276–85.
Sherwood, C. C., Hof, P. R., Holloway, R. L., Semendeferi, K., Gannon, P. J., Frahm, H. D. & Zilles, K. (2005) Evolution of the brainstem orofacial motor system in primates: A comparative study of trigeminal, facial, and hypoglossal nuclei. Journal of Human Evolution 48:4584.Google Scholar
Shriberg, L. D., Aram, D. M. & Kwiatkowski, J. (1997) Developmental apraxia of speech: I. Descriptive and theoretical perspectives. Journal of Speech, Language, and Hearing Research 40:273–85.Google Scholar
Sidtis, J. J. & Van Lancker Sidtis, D. (2003) A neurobehavioral approach to dysprosody. Seminars in Speech and Language 24:93105.Google Scholar
Simonyan, K. & Jürgens, U. (2002) Cortico-cortical projections of the motorcortical larynx area in the rhesus monkey. Brain Research 949:2331.Google Scholar
Simonyan, K. & Jürgens, U. (2005) Afferent subcortical connections into the motor cortical larynx area in the rhesus monkey. Neuroscience 130:119–31.Google Scholar
Skodda, S., Grönheit, W. & Schlegel, U. (2011) Intonation and speech rate in Parkinson's disease: General and dynamic aspects and responsiveness to levodopa admission. Journal of Voice 25:199205.Google Scholar
Skodda, S., Rinsche, H. & Schlegel, U. (2009) Progression of dysprosody in Parkinson's disease over time – a longitudinal study. Movement Disorders 24:716–22.Google Scholar
Smith, A. (2010) Development of neural control of orofacial movements for speech. In: The handbook of phonetic sciences, 2nd edition, ed. Hardcastle, W. J., Laver, J. & Gibbon, F. E., pp. 251–96. Wiley-Blackwell.Google Scholar
Smith, W. K. (1945) The functional significance of the rostral cingulate cortex as revealed by its responses to electrical excitation. Journal of Neurophysiology 8:241–55.Google Scholar
Snowdon, C. T. (2008) Contextually flexible communication in nonhuman primates. In: Evolution of communicative flexibility: Complexity, creativity, and adaptability in human and animal communication, ed. Oller, D. K. & Griebel, U., pp. 7191. MIT Press.Google Scholar
Snowdon, C. T. & Elowson, A. M. (1999) Pygmy marmosets modify call structure when paired. Ethology 105:893908.Google Scholar
Solano, A., Roig, M., Vives-Bauza, C., Hernandez-Peña, J., Garcia-Arumi, E., Playan, A., Lopez-Perez, M. J., Andreu, A. L. & Montoya, J. (2003) Bilateral striatal necrosis associated with a novel mutation in the mitochondrial ND6 gene. Annals of Neurology 54:527–30.
Sperli, F., Spinelli, L., Pollo, C. & Seeck, M. (2006) Contralateral smile and laughter, but no mirth, induced by electrical stimulation of the cingulate cortex. Epilepsia 47:440–43.
Striedter, G. F. (2005) Principles of brain evolution. Sinauer.
Stringer, C. (2012) Lone survivors: How we came to be the only humans on earth. Times Books, Henry Holt.
Struhsaker, T. T. (1967) Auditory communication among vervet monkeys (Cercopithecus aethiops). In: Social communication among primates, ed. Altmann, S. A., pp. 281–324. University of Chicago Press.
Sugiura, H. (1998) Matching of acoustic features during the vocal exchange of coo calls by Japanese macaques. Animal Behaviour 55:673–87.
Surmeier, D. J., Day, M., Gertler, T., Chan, S. & Shen, W. (2010a) D1 and D2 dopamine receptor modulation of glutamatergic signaling in striatal medium spiny neurons. In: Handbook of basal ganglia structure and function, ed. Steiner, H. & Tseng, K. Y., pp. 113–32. Elsevier.
Surmeier, D. J., Day, M., Gertler, T. S., Chan, C. S. & Shen, W. (2010b) Dopaminergic modulation of striatal glutamatergic signaling in health and Parkinson's disease. In: Dopamine handbook, ed. Iversen, L. L., Iversen, S. D., Dunnett, S. B. & Björklund, A., pp. 349–67. Oxford University Press.
Sutton, D., Larson, C. & Lindeman, R. C. (1974) Neocortical and limbic lesion effects on primate phonation. Brain Research 71:61–75.
Sutton, D., Larson, C., Taylor, E. M. & Lindeman, R. C. (1973) Vocalization in rhesus monkeys: Conditionability. Brain Research 52:225–31.
Sutton, D., Trachy, R. E. & Lindeman, R. C. (1981) Vocal and nonvocal discriminative performance in monkeys. Brain and Language 14:93–105.
Sutton, D., Trachy, R. E. & Lindeman, R. C. (1985) Discriminative phonation in macaques: Effects of anterior mesial cortex damage. Experimental Brain Research 59:410–13.
Taglialatela, J. P., Savage-Rumbaugh, S. & Baker, L. A. (2003) Vocal production by a language-competent Pan paniscus. International Journal of Primatology 24:1–17.
Tallerman, M. & Gibson, K. R. (2012) The Oxford handbook of language evolution. Oxford University Press. (Oxford Handbooks in Linguistics Series).
Talmage-Riggs, G., Winter, P., Ploog, D. & Mayer, W. (1972) Effect of deafening on the vocal behavior of the squirrel monkey (Saimiri sciureus). Folia Primatologica 17:404–20.
Taylor, J. (2009) Not a chimp: The hunt to find the genes that make us human. Oxford University Press.
Teramitsu, I., Poopatanapong, A., Torrisi, S. & White, S. A. (2010) Striatal FoxP2 is actively regulated during songbird sensorimotor learning. PLoS ONE 5:e8548.
Thyagarajan, D., Shanske, S., Vazquez-Memije, M., DeVivo, D. & DiMauro, S. (1995) A novel mitochondrial ATPase 6 point mutation in familial bilateral striatal necrosis. Annals of Neurology 38:468–72.
Tomasello, M. (2008) Origins of human communication. MIT Press.
Trachy, R. E., Sutton, D. & Lindeman, R. C. (1981) Primate phonation: Anterior cingulate lesion effects on response rate and acoustical structure. American Journal of Primatology 1:43–55.
Turner, V. (1967) The forest of symbols: Aspects of Ndembu ritual. Cornell University Press.
Tuttle, R. H. (2007) Apes, intelligent science, and conservation. In: Primate perspectives on behavior and cognition, ed. Washburn, D. A., pp. 17–28. American Psychological Association.
Ungerleider, L. G., Doyon, J. & Karni, A. (2002) Imaging brain plasticity during motor skill learning. Neurobiology of Learning and Memory 78:553–64.
Van Lancker Sidtis, D., Pachana, N., Cummings, J. L. & Sidtis, J. J. (2006) Dysprosodic speech following basal ganglia insult: Toward a conceptual framework for the study of the cerebral representation of prosody. Brain and Language 97:135–53.
van Schaik, C. P., Ancrenaz, M., Borgen, G., Galdikas, B., Knott, C. D., Singleton, I., Suzuki, A., Utami, S. S. & Merrill, M. (2003) Orangutan cultures and the evolution of material culture. Science 299:102–105.
van Schaik, C. P., van Noordwijk, M. A. & Wich, S. A. (2006) Innovation in wild Bornean orangutans (Pongo pygmaeus wurmbii). Behaviour 143:839–76.
Vargha-Khadem, F., Gadian, D. G., Copp, A. & Mishkin, M. (2005) FOXP2 and the neuroanatomy of speech and language. Nature Reviews Neuroscience 6:131–38.
Vargha-Khadem, F. & Passingham, R. (1990) Speech and language defects. Nature 346(6281):226.
Vargha-Khadem, F., Watkins, K. E., Alcock, K. J., Fletcher, P. & Passingham, R. E. (1995) Praxic and nonverbal cognitive deficits in a large family with a genetically transmitted speech and language disorder. Proceedings of the National Academy of Sciences USA 92:930–33.
Vargha-Khadem, F., Watkins, K. E., Price, C. J., Ashburner, J., Alcock, K. J., Connelly, A., Frackowiak, R. S. J., Friston, K. J., Pembrey, M. E., Mishkin, M., Gadian, D. G. & Passingham, R. E. (1998) Neural basis of an inherited speech and language disorder. Proceedings of the National Academy of Sciences USA 95:12695–700.
Vogt, B. A. & Barbas, H. (1988) Structure and connections of the cingulate vocalization region in the rhesus monkey. In: The physiological control of mammalian vocalization, ed. Newman, J. D., pp. 203–25. Plenum Press.
Voorn, P., Vanderschuren, L. J. M. J., Groenewegen, H. J., Robbins, T. W. & Pennartz, C. M. A. (2004) Putting a spin on the dorsal–ventral divide of the striatum. Trends in Neurosciences 27:468–74.
Wallman, J. (1992) Aping language. Cambridge University Press.
Walters, J. R. & Bergstrom, D. A. (2010) Synchronous activity in basal ganglia circuits. In: Handbook of basal ganglia structure and function, ed. Steiner, H. & Tseng, K. Y., pp. 429–43. Elsevier.
Watkins, K. E., Dronkers, N. F. & Vargha-Khadem, F. (2002a) Behavioural analysis of an inherited speech and language disorder: Comparison with acquired aphasia. Brain: A Journal of Neurology 125(Pt. 3):452–64.
Watkins, K. E., Gadian, D. G. & Vargha-Khadem, F. (1999) Functional and structural brain abnormalities associated with a genetic disorder of speech and language. American Journal of Human Genetics 65:1215–21.
Watkins, K. E., Vargha-Khadem, F., Ashburner, J., Passingham, R. E., Connelly, A., Friston, K. J., Frackowiak, R. S., Mishkin, M. & Gadian, D. G. (2002b) MRI analysis of an inherited speech and language disorder: Structural brain abnormalities. Brain 125(Pt. 3):465–78.
Watson, R. T., Fleet, W. S., Gonzalez-Rothi, L. & Heilman, K. M. (1986) Apraxia and the supplementary motor area. Archives of Neurology 43:787–92.
Weaver, T. D., Roseman, C. C. & Stringer, C. B. (2008) Close correspondence between quantitative- and molecular-genetic divergence times for Neandertals and modern humans. Proceedings of the National Academy of Sciences USA 105:4645–49.
Weismer, G. (1980) Control of the voicing distinction for intervocalic stops and fricatives: Some data and theoretical considerations. Journal of Phonetics 8:427–38.
West, R. A. & Larson, C. R. (1995) Neurons of the anterior mesial cortex related to faciovocal activity in the awake monkey. Journal of Neurophysiology 74:1856–69.
Whitty, C. W. M. (1955) Effects of anterior cingulectomy in man. Proceedings of the Royal Society of Medicine 48:463–69.
Wich, S. A. & de Vries, H. (2006) Male monkeys remember which group members have given alarm calls. Proceedings of the Royal Society of London, Series B: Biological Sciences 273:735–40.
Wich, S. A., Swartz, K., Hardus, M. E., Lameira, A. R., Stromberg, E. & Shumaker, R. (2009) A case of spontaneous acquisition of a human sound by an orangutan. Primates 50:56–64.
Wichmann, T. & DeLong, M. R. (2007) Epidemiology of Parkinson's disease. In: Parkinson's disease and related disorders, Part 1, ed. Koller, W. C. & Melamed, E., pp. 3–18. (Handbook of Clinical Neurology, vol. 83, 3rd series). Elsevier Press.
Wickens, J. R., Horvitz, J. C., Costa, R. M. & Killcross, S. (2007) Dopaminergic mechanisms in actions and habits. Journal of Neuroscience 27:8181–83.
Wild, B., Rodden, F. A., Grodd, W. & Ruch, W. (2003) Neural correlates of laughter and humour. Brain 126:2121–38.
Wild, J. M. (2008) Birdsong: Anatomical foundations and central mechanisms of sensorimotor integration. In: Neuroscience of birdsong, ed. Zeigler, H. P. & Marler, P., pp. 136–51. Cambridge University Press.
Wildgruber, D., Ackermann, H., Kreifelts, B. & Ethofer, T. (2006) Cerebral processing of linguistic and emotional prosody: fMRI studies. In: Understanding emotions, ed. Anders, S., Ende, G., Junghofer, M., Kissler, J. & Wildgruber, D., pp. 249–68. (Series: Progress in Brain Research, vol. 156). Elsevier.
Willuhn, I. & Steiner, H. (2008) Motor-skill learning in a novel running-wheel task is dependent on D1 dopamine receptors in the striatum. Neuroscience 153:249–58.
Winter, P., Handley, P., Ploog, D. & Schott, D. (1973) Ontogeny of squirrel monkey calls under normal conditions and under acoustic isolation. Behaviour 47:230–39.
Winter, P., Ploog, D. & Latta, J. (1966) Vocal repertoire of the squirrel monkey (Saimiri sciureus), its analysis and significance. Experimental Brain Research 1:359–84.
Wu, T. & Hallett, M. (2005) A functional MRI study of automatic movements in patients with Parkinson's disease. Brain 128:2250–59.
Wu, T., Kansaku, K. & Hallett, M. (2004) How self-initiated memorized movements become automatic: A functional MRI study. Journal of Neurophysiology 91:1690–98.
Yin, H. H., Mulcare, S. P., Hilário, M. R. F., Clouse, E., Holloway, T., Davis, M. I., Hansson, A. C., Lovinger, D. M. & Costa, R. M. (2009) Dynamic reorganization of striatal circuits during the acquisition and consolidation of a skill. Nature Neuroscience 12:333–41.
Zhang, J., Webb, D. M. & Podlaha, O. (2002) Accelerated protein evolution and origins of human-specific features: FOXP2 as an example. Genetics 162:1825–35.
Zhang, S. P., Davis, P. J., Bandler, R. & Carrive, P. (1994) Brain stem integration of vocalization: Role of the midbrain periaqueductal gray. Journal of Neurophysiology 72:1337–56.
Ziegler, W. (2008) Apraxia of speech. In: Neuropsychology and behavioral neurology, ed. Goldenberg, G. & Miller, B. L., pp. 269–85. (Handbook of clinical neurology, vol. 88, 3rd series). Elsevier Press.
Ziegler, W. (2010) Apraxic failure and the hierarchical structure of speech motor plans: A nonlinear probabilistic model. In: Assessment of motor speech disorders, ed. Lowit, A. & Kent, R. D., pp. 305–23. Plural Publishing.
Ziegler, W., Aichert, I. & Staiger, A. (2012) Apraxia of speech: Concepts and controversies. Journal of Speech, Language, and Hearing Research 55:S1485–501.
Ziegler, W., Kilian, B. & Deger, K. (1997) The role of the left mesial frontal cortex in fluent speech: Evidence from a case of left supplementary motor area hemorrhage. Neuropsychologia 35:1197–208.
Zuberbühler, K. (2000a) Causal cognition in a nonhuman primate: Field playback experiments with Diana monkeys. Cognition 76(3):195–207.
Zuberbühler, K., Cheney, D. L. & Seyfarth, R. M. (1999) Conceptual semantics in a nonhuman primate. Journal of Comparative Psychology 113:33–42.
Zuberbühler, K. & Jenny, D. (2007) Interaction between leopard and monkeys. In: Monkeys of the Taï Forest: An African primate community, ed. McGraw, W. S., Zuberbühler, K. & Noe, R., pp. 133–54. Cambridge University Press.
Figure 1A. Acoustic communication in nonhuman primates: Call structure. A. Spectrograms (left-hand section of each panel) and power spectra (right-hand section in each) of two common rhesus monkey vocalizations, that is, a “coo” (left panel) and a “grunt” (right panel). Gray level of the spectrograms codes for spectral energy. Coo calls (left panel) are characterized by a harmonic structure, encompassing a fundamental frequency (F0, the lowest and darkest band) and several harmonics (H1 to Hn). Measures derived from the F0 contour provide robust criteria for a classification of periodic signals, for example, peak frequency (peakF; Hardus et al. 2009a). Onset F0 seems to be highly predictive for the shape of the intonation contour, indicating the implementation of a “vocal plan” prior to movement initiation (Miller et al. 2009a; 2009b). Grunts (right) represent short and noisy calls whose spectra include more energy in the lower frequency range and a rather flat energy distribution.

Figure 1B. Acoustic communication in nonhuman primates: Cerebral organization. Cerebral “vocalization network” of the squirrel monkey (as a model of the primate-general “communication brain”). The solid lines represent the “vocal brainstem circuit” of the vocalization network and its modulatory cortical input (ACC), the dotted lines the strong connections of sensory cortical regions (AC, VC) and motivation-controlling limbic structures (Ac, Hy, Se, St) to this circuit. Key: ACC = Anterior cingulate cortex; AC = Auditory cortex; Ac = Nucleus accumbens; Hy = Hypothalamus; LRF = Lateral reticular formation; NRA = Nucleus retroambigualis; PAG = periaqueductal gray; PB = brachium pontis; SC = superior colliculus; Se = Septum; St = Nucleus stria terminalis; VC = Visual cortex (Unpublished figure. See Jürgens 2002b and Hage 2010a; 2010b for further details).

Figure 2. Vocal tract mechanisms of speech sound production. A. Source-filter theory of speech production (Fant 1970). Modulation of expiratory air flow at the levels of the vocal folds and supralaryngeal structures (pharynx, velum, tongue, and lips) gives rise to most speech sounds across human languages (Ladefoged 2005). In case of vowels and voiced consonants, the adducted vocal folds generate a laryngeal source signal with a harmonic spectrum U(s), which is then filtered by the resonance characteristics of the supralaryngeal cavities T(s) and the vocal tract radiation function R(s). As a consequence, these sounds encompass distinct patterns of peaks and troughs (formant structure; P(s)) across their spectral energy distribution. B. Consonants are produced by constricting the vocal tract at distinct locations (a), for example, through occlusion of the oral cavity at the alveolar ridge of the upper jaw by the tongue tip for /d/, /t/, or /n/ (insert of left panel: T/B = tip/body of the tongue, U/L = upper/lower lips, J = lower jaw with teeth). Such manoeuvres give rise to distinct up- and downward shifts of formants: Right panels show the formant transients of /da/ as a spectrogram (b) and a schematic display (c); dashed lines indicate formant transients of syllable /ba/ (figures adapted from Kent & Read 2002). C. Schematic display of the gestural architecture of articulate speech, exemplified for the word speaking. Consonant articulation is based on distinct movements of lips, tongue, velum, and vocal folds, phase-locked to more global and slower deformations of the vocal tract (VT) associated with vowel production. Articulatory gestures are assorted into syllabic units, and gesture bundles pertaining to strong and weak syllables are rhythmically patterned to form metrical feet.
Note that laryngeal activity in terms of glottal opening movements (bottom line) is a crucial part of the gestural patterning of spoken words and must be adjusted to and sequenced with other vocal tract movements in a precise manner (Ziegler 2010).
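The source-filter relation described in panel A of Figure 2 can be restated compactly. The following is a minimal sketch, assuming the standard linear, time-invariant reading of the model (an assumption not spelled out in the caption), in which the output spectrum P(s) is the product of the source spectrum U(s), the vocal tract transfer function T(s), and the radiation function R(s):

```latex
% Source-filter relation, using the symbols of the Figure 2 caption.
% Assumption: source, filter, and radiation combine multiplicatively (linear model).
P(s) = U(s)\, T(s)\, R(s)

% Equivalent log-magnitude form: on a decibel scale the contributions add.
20\log_{10}\lvert P(s)\rvert
  = 20\log_{10}\lvert U(s)\rvert
  + 20\log_{10}\lvert T(s)\rvert
  + 20\log_{10}\lvert R(s)\rvert
```

On this reading, the formant peaks of P(s) reflect the resonances of T(s), whereas the harmonic fine structure is inherited from U(s).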

Figure 3. Structural and functional compartmentalization of the basal ganglia. A. Schematic illustration of the – at least – tripartite functional subdivision of the cortico-basal ganglia–thalamo–cortical circuitry. Motor, cognitive/associative, and limbic loops are depicted in different gray shades, and the two cross-sections of the striatum (center) delineate the limbic, cognitive/associative, and motor compartments of the basal ganglia input nuclei. Alternating reciprocal (e.g., 1–1) and non-reciprocal loops (e.g., subsequent trajectory 2) form a spiraling cascade of dopaminergic projections interconnecting these parallel reentrant circuits (modified Fig. 2.3.5. from Haber 2010b). B. Within the basal ganglia, the motor loop segregates into at least three pathways: a direct (striatum – SNr/GPi), an indirect (striatum – GPe – SNr/GPi), and a hyperdirect (via STN) circuit (based on Fig. 1 in Nambu 2011 and Fig. 25.1 in Walters & Bergstrom 2010). The direct and indirect medium-sized spiny projection neurons of the striatum (MSN) differ in their patterns of receptor and peptide expression (direct pathway: D1-type dopamine receptors, SP = substance P; indirect pathway: D2, ENK = enkephalin) rather than their somatodendritic architecture. Key: DA = dopamine; GPi/GPe = internal/external segment of globus pallidus; SNr = substantia nigra, pars reticulata; SNc = substantia nigra, pars compacta; VTA = ventral tegmental area; STN = subthalamic nucleus; SC = superior colliculus; PPN = pedunculopontine nucleus; PAG = periaqueductal gray.

Figure 4. Cerebral network supporting the integration of primate-general (gray arrows) and human-specific aspects of acoustic communication (black).