In the target article, Ackermann et al. present an interesting twist on the well-weathered hypothesis of a direct cortico-bulbar tract as a key step in the evolution of spoken language in humans, or song in vocal-learning birds. The authors seek to generate a new hypothesis that the basal ganglia, in particular, are functionally reorganized during human evolution for spoken language and also change in function during ontogeny with the learning of speech. Curiously, however, the basal ganglia, after supporting a language-learning role during child development, are proposed to revert to a seemingly more evolutionarily conserved functional role of supporting “emotive-prosodic” modulation in adult humans. This illustrates how the proposal flexes to encompass most data and risks being empirically untestable. Especially unclear is what similarities or differences are hypothesized to exist between humans and different animal models, where presumably homologous or analogous neurobiological mechanisms can be clarified.
Although we have little doubt that the basal ganglia were an evolutionary substrate for spoken language, one among many others, the current proposal requires considerable strengthening. We make two key suggestions. First, the hypothesis needs to be grounded in, or its key tenets distinguished from, certain cognitive and/or motor theories. Such theories have proposed that specific improvements occurred in vocal-learning systems or motor pathways of humans and some birds, including cortico-striatal-thalamic circuits (Arriaga & Jarvis 2013; Feenders et al. 2008; Fitch et al. 2010; Fitch & Jarvis 2012; Petkov & Jarvis 2012; Wild 1997). Second, we propose that the key tenets of the proposal, if clarified, can be comparatively tested in studies between, for instance, human and nonhuman primates, and songbirds and vocal-non-learning birds, and any of these species and rodents (see our Figure 1). Such comparative analyses have already been used in the past to test for the hypothesized differences in the cortico-striatal system between some of these species, and can still be used to comparatively test additional aspects of the current proposal.
Figure 1. Summary diagrams of vocal systems in songbirds, humans, monkeys, and mice. Modified from Arriaga and Jarvis (2013). Cortico-striatal-thalamic loops are schematized from data in humans and songbirds. Yellow dashed lines in macaque monkeys and mice show proposed cortico-striatal-thalamic connections for vocalization that need to be tested.
One issue is whether, and which, basal ganglia–dependent differences exist between humans and nonhuman primates or other mammals. There is little direct comparative evidence in the primate literature to suggest that the cortico-striatal-thalamic system is strikingly different in humans relative to nonhuman primates. In fact, as Ackermann et al. note, nonhuman primates and rodents are used as cellular model systems for human basal ganglia–related cognitive function in motor and procedural learning, habit forming, reward and decision-making, and sensory-motor timing relationships (Matell & Meck 2004; Schultz et al. 2000). Presumably, the proposal is that the basal ganglia, as part of a cognitive system, increased in capacity in humans to support language learning (Friederici 2011; Petkov & Jarvis 2012; Petkov & Wilson 2012). In this regard, it is possibly interesting that artificial grammar learning tasks, which were developed in the infant learning literature and tap into rule-based procedural learning, appear to show differences between different species of monkeys (Wilson et al. 2013) and between monkeys and humans (Fitch & Hauser 2004). These observations were predicted by cognitive theories on spoken language origins (Arriaga & Jarvis 2013; Petkov & Jarvis 2012).
Thus, the proposal lacks the strength of the specificity of the direct cortico-bulbar hypothesis, and at the same time suffers from the limitation of overemphasis on a region vital for cognition, whose function is lost without the context of the cortico-striatal-thalamic circuits that are formed in the brains of birds and mammals. As a historical example, the direct cortico-bulbar hypothesis is now seen to be grounded in motor theories of spoken language origins (Petkov & Jarvis 2012). It specifically proposes that a monosynaptic change allowed learned sensory patterns to be vocally produced. But its strength in specificity was also its Achilles heel, leaving unanswered how humans and other mammals differ in their neurobiological substrates for learned auditory patterns, and how these are linked to vocal motor output (via the nucleus ambiguus). Cognitive theories and the current proposal aim to address this shortcoming. Moreover, even the tenet of a presence versus absence of a direct cortico-bulbar tract is being challenged by recent data: Mice appear to have a sparse but still present direct cortico-bulbar projection to the nucleus ambiguus and greater vocal-production-plasticity capabilities than had been thought (Arriaga & Jarvis 2013; Arriaga et al. 2012), features that had been thought to be unique to humans and vocal-learning birds.
Notably, the more precise link that the authors are pursuing with regard to the origins of spoken language and basal ganglia function already has an evolutionary counterpart in vocal-learning and vocal-non-learning birds. The avian striatal vocal nucleus (called Area X in songbirds) sits within a cortico-striatal-thalamic loop, which is important for song learning (Jarvis 2004b; 2006; Jarvis et al. 2000), including covert-skill song learning (Charlesworth et al. 2012). Moreover, Feenders et al. (2008), by comparing the anterior-forebrain pathway in vocal-learning birds to this pathway in vocal-non-learning birds, found evidence to develop a motor theory of vocal-learning origin.
This theory proposes that the anterior-forebrain song pathway (including Area X) independently arose multiple times in vocal-learning birds from a set of regions that in vocal-non-learning birds control non-vocal motor actions. The discrete striatal Area X that sits within the cortico-striatal-thalamic vocal-learning loop (Fig. 1) is not present in vocal-non-learning birds. Motor striatal regions outside of Area X, or the comparable forebrain regions in vocal-non-learning birds, are more diffuse and relate to these animals' non-vocal motor learning abilities. Thus, considerable insights on the cortico-striatal-thalamic system have already been provided by avian models. These are only briefly alluded to but not meaningfully used to inform the current proposal.
In summary, Ackermann et al.'s proposal is an interesting review of the literature with an emphasis on the basal ganglia as an evolutionary substrate for spoken language. However, we found it heavy on conjecture and light on empirical hypotheses, which, as we have suggested, can be strengthened by (1) taking a broader evolutionary perspective that allows integrating data from birds and mammals, and (2) delineating more carefully how the current proposal can be integrated within or distinguished from other theories on spoken language origins.