
Vocal communication is multi-sensorimotor coordination within and between individuals

Published online by Cambridge University Press:  17 December 2014

Daniel Y. Takahashi
Affiliation: Princeton Neuroscience Institute and Department of Psychology, Princeton University, Princeton, NJ 08544. takahashiyd@gmail.com

Asif A. Ghazanfar
Affiliation: Princeton Neuroscience Institute, Department of Psychology, and Department of Ecology and Evolutionary Biology, Princeton University, Princeton, NJ 08544. asifg@princeton.edu; www.princeton.edu/~asifg

Abstract

Speech is an exquisitely coordinated interaction among effectors both within and between individuals. No account of the evolution of human communication that ignores its foundational multisensory character and cooperative nature will be satisfactory. Here, we describe two additional capacities – rhythmic audiovisual speech and cooperative communication – and suggest that they may use the same, or similar, circuits as those proposed for vocal learning.

Type: Open Peer Commentary
Copyright: © Cambridge University Press 2014

Both speech and nonhuman primate vocalizations are produced by the coordinated movements of the lungs, larynx (vocal folds), and the supralaryngeal vocal tract (Ghazanfar & Rendall 2008). During vocal production, the shape of the vocal tract can be changed by moving the various effectors of the face (including the lips, jaw, and tongue) into different positions. The different shapes, along with changes in vocal fold tension and respiratory power, are what give rise to different-sounding vocalizations. Different vocalizations (including different speech sounds) are produced in part by making different facial expressions. Thus vocalizations are inherently “multisensory” (Ghazanfar 2013).
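
To make the link between vocal tract shape and sound concrete, consider the textbook source–filter approximation in which the vocal tract is modeled as a uniform tube closed at the glottis and open at the lips, with resonances (formants) at F_n = (2n − 1)c / 4L. The sketch below is a standard acoustics illustration under these simplifying assumptions, not an analysis from the commentary itself.

```python
# Illustrative sketch: formants of a uniform tube closed at the glottis
# and open at the lips, F_n = (2n - 1) * c / (4 * L). A textbook first
# approximation of the vocal tract, not the authors' analysis.

C = 350.0  # approximate speed of sound in warm, humid air (m/s)

def tube_formants(length_m: float, n_formants: int = 3) -> list[float]:
    """Resonant frequencies (Hz) of a quarter-wavelength resonator."""
    return [(2 * n - 1) * C / (4 * length_m) for n in range(1, n_formants + 1)]

# A ~17 cm adult vocal tract yields formants near 500, 1500, 2500 Hz;
# moving the articulators changes the effective shape/length and
# therefore shifts the resonances, changing the resulting sound.
print(tube_formants(0.17))   # ~ [515, 1544, 2574]
print(tube_formants(0.15))   # shorter tract: all formants shift upward
```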

Given the inextricable link between vocal output and facial expressions, it is perhaps not surprising that nonhuman primates, like humans, readily recognize the correspondence between the visual and auditory components of vocal signals (Ghazanfar & Logothetis 2003; Ghazanfar et al. 2007; Habbershon et al. 2013; Jordan et al. 2005; Sliwa et al. 2011) and use facial motion to detect vocalizations more accurately and more quickly (Chandrasekaran et al. 2011). However, one striking dissimilarity between monkey vocalizations and human speech is that the latter has a unique bi-sensory rhythmicity: both the acoustic output and the movements of the mouth share a 3–8 Hz rhythm and are tightly correlated (Chandrasekaran et al. 2009; Greenberg et al. 2003). According to one hypothesis, this bimodal speech rhythm evolved through the linking of rhythmic facial expressions to vocal output in ancestral primates, producing the first babbling-like speech output (Ghazanfar & Poeppel 2014; MacNeilage 1998). Lip-smacking, a rhythmic facial expression commonly produced by many primate species, may have been one such ancestral expression. It is used during affiliative, often face-to-face, interactions (Ferrari et al. 2009; Van Hooff 1962); it exhibits a 3–8 Hz rhythmicity like speech (Ghazanfar et al. 2010); and both the coordination of effectors during its production (Ghazanfar et al. 2012) and its developmental trajectory (Morrill et al. 2012) are similar to those of speech.
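
One simple way to quantify such a bimodal rhythm is to bandpass both a mouth-movement trace and the acoustic amplitude envelope in the 3–8 Hz band and measure their correlation and phase alignment. The sketch below does this on synthetic signals; the signal names, sampling rate, and filter settings are illustrative assumptions, not the published analysis pipeline of Chandrasekaran et al. (2009).

```python
# Minimal sketch (not the published pipeline): quantify a shared
# 3-8 Hz rhythm between mouth movement and the acoustic envelope.
# Sampling rate, filter settings, and signals are illustrative.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 100.0                      # assumed common sampling rate (Hz)
t = np.arange(0, 10, 1 / fs)    # 10 s of synthetic data

# Synthetic stand-ins for real measurements: a ~5 Hz mouth-opening
# trace and an audio envelope sharing that rhythm, plus noise.
mouth = np.sin(2 * np.pi * 5 * t) + 0.5 * np.random.randn(t.size)
envelope = np.sin(2 * np.pi * 5 * t + 0.3) + 0.5 * np.random.randn(t.size)

def bandpass(x, lo=3.0, hi=8.0, fs=100.0, order=4):
    """Zero-phase Butterworth bandpass in the speech-rhythm band."""
    b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    return filtfilt(b, a, x)

m_band = bandpass(mouth)
e_band = bandpass(envelope)

# Pearson correlation of the band-limited signals...
r = np.corrcoef(m_band, e_band)[0, 1]
# ...and their mean phase alignment via the analytic signal.
dphi = np.angle(hilbert(m_band)) - np.angle(hilbert(e_band))
plv = np.abs(np.mean(np.exp(1j * dphi)))   # phase-locking value

print(f"3-8 Hz correlation: r = {r:.2f}, phase locking = {plv:.2f}")
```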

Very little is known about the neural mechanisms underlying the production of rhythmic communication signals in human and nonhuman primates. The mandibular movements shared by lip-smacking, vocalizations, and speech all require the coordination of muscles controlling the jaw, face, tongue, and respiration, and their foundational rhythms are likely produced by homologous central pattern generators in the brainstem (Lund & Kolta 2006). These circuits are modulated by feedback from peripheral sensory receptors. The neocortex may be an additional source influencing orofacial movements and their rhythmicity. Indeed, both lip-smacking and speech production are modulated by the neocortex in accord with social context and communicative goals (Bohland & Guenther 2006; Caruana et al. 2011). Thus, one hypothesis for the similarities between lip-smacking and visual speech (i.e., the orofacial component of speech production) is that they reflect the development of neocortical circuits influencing brainstem central pattern generators.
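
The central-pattern-generator idea can be illustrated with a classic half-center model: two units that inhibit each other and adapt will produce a self-sustained alternating rhythm, which a tonic descending drive can modulate. The sketch below is a minimal Matsuoka-style oscillator intended only to illustrate the concept; all parameters are illustrative assumptions, not a model of the brainstem circuits cited above.

```python
# A minimal half-center oscillator (Matsuoka-style) sketch: two units
# that inhibit each other and adapt, yielding an alternating rhythm.
# Intended only to illustrate the idea of a brainstem central pattern
# generator under descending drive; all parameters are illustrative.
import numpy as np

def half_center(drive=1.0, dt=0.0005, dur=2.0,
                tau=0.01, tau_a=0.12, beta=2.5, w=2.5):
    """Euler-integrate two mutually inhibiting, adapting units."""
    n = int(dur / dt)
    u = np.array([0.1, 0.0])       # membrane-like states (asymmetric start)
    v = np.zeros(2)                # adaptation ("fatigue") states
    out = np.zeros((n, 2))
    for i in range(n):
        y = np.maximum(u, 0.0)                         # rectified outputs
        du = (-u - beta * v - w * y[::-1] + drive) / tau  # mutual inhibition
        dv = (-v + y) / tau_a                             # slow adaptation
        u += dt * du
        v += dt * dv
        out[i] = np.maximum(u, 0.0)
    return out

# The time constants set the rhythm's frequency; the tonic drive
# (standing in for descending, e.g. neocortical, input) scales its
# amplitude -- one way modulation of a fixed brainstem rhythm could work.
rhythm = half_center(drive=1.5)
```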

One important neocortical node likely to be involved in this circuit is the insula, a structure that has been a target of selection in the primate lineage (Bauernfeind et al. 2013). The human insula is involved in, among other socio-emotional behaviors, speech production (Ackermann & Riecker 2004; Bohland & Guenther 2006; Dronkers 1996). Consistent with an evolutionary link between lip-smacking and speech, the insula also plays a role in generating monkey lip-smacking (Caruana et al. 2011). It is conceivable that, for both monkey lip-smacking and human speech, the development and coordination of effectors related to their shared orofacial rhythm are due to the socially guided development of the insula. However, a neural substrate is needed to link the production of lip-smack-like facial expressions to concomitant vocal output (the laryngeal source) in order to generate that first babbling-like vocal output. This link to laryngeal control remains a mystery. One scenario is the evolution of insular cortical control over the brainstem's nucleus ambiguus. The fact that gelada baboons produce lip-smacks concurrently with vocal output, generating a babbling-like sound (Bergman 2013), suggests that coordination between lip-smacking and vocal output may be relatively easy to evolve.

Human vocal communication is also a coordinated and cooperative exchange of signals between individuals (Hasson et al. 2012). Foundational to all cooperative verbal communicative acts is a more general one: taking turns to speak. Given the universality of turn-taking (Stivers et al. 2009), it is natural to ask how it evolved. Recently, we tested whether marmoset monkeys communicate cooperatively like humans (Takahashi et al. 2013). Among the traits marmosets share with humans are a cooperative breeding strategy and volubility. Cooperative care behaviors scaffold prosocial motivational and cognitive processes not typically seen in other primate species (Burkart et al. 2009a). We capitalized on the fact that marmosets are not only prosocial but also highly vocal, and that they readily exchange vocalizations with conspecifics. We observed that they exhibit cooperative vocal communication, taking turns in extended sequences of call exchanges (Takahashi et al. 2013) and following conversational rules strikingly similar to those of humans (Stivers et al. 2009). Such exchanges did not depend upon pair-bonding or kinship with conspecifics and are more sophisticated than the simple call-and-response patterns exhibited by other species. Moreover, our data show that marmoset turn-taking shares with human turn-taking the dynamics of coupled oscillators, with self-monitoring as a necessary component (Takahashi et al. 2013) – an example of convergent evolution.
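
The coupled-oscillator claim can be made concrete with a toy simulation: two phase oscillators whose coupling favors antiphase settle into alternation, so that "calls" (phase-wrap events) from the two simulated individuals interleave rather than overlap. This is a minimal sketch inspired by the coupled-oscillator account, not the model fit in Takahashi et al. (2013); it omits the self-monitoring component, and all rates and coupling values are illustrative assumptions.

```python
# Toy model of vocal turn-taking as antiphase-coupled phase oscillators.
# Inspired by, but not reproducing, Takahashi et al. (2013); the call
# rates and coupling strength below are illustrative assumptions.
import numpy as np

dt, dur = 0.01, 60.0                       # time step and duration (s)
n = int(dur / dt)
omega = 2 * np.pi * np.array([0.10, 0.12])  # preferred call rates (Hz)
k = 0.5                                     # coupling strength
theta = np.array([0.0, 1.0])                # initial phases
calls = [[], []]                            # call times per individual

for i in range(n):
    # Each caller is pulled toward antiphase with its partner: the
    # sin(theta_j - theta_i + pi) term vanishes when the two phases
    # are half a cycle apart, i.e., when the callers alternate.
    dtheta = omega + k * np.sin(theta[::-1] - theta + np.pi)
    new = theta + dt * dtheta
    for j in range(2):
        if np.floor(new[j] / (2 * np.pi)) > np.floor(theta[j] / (2 * np.pi)):
            calls[j].append(i * dt)         # phase wrapped: emit a call
    theta = new

# After a brief transient, the two simulated callers alternate rather
# than overlap -- the signature of cooperative turn-taking.
print(sorted((t, j) for j in range(2) for t in calls[j])[:10])
```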

The lack of evidence for such turn-taking (vocal or otherwise) in apes suggests that human cooperative vocal communication could have evolved in a manner very different from what gestural-origins hypotheses predict (Rizzolatti & Arbib 1998; Tomasello 2008). In this alternative scenario, existing vocal repertoires could have begun to be used in a cooperative, turn-taking manner when prosocial behaviors in general emerged. Although the physiological basis of cooperative breeding is unknown (Fernandez-Duque et al. 2009), the "prosociality" that comes with it would certainly require modifications to the organization of social and motivational neuroanatomical circuitry. This must have been an essential step in the evolution of both human and marmoset cooperative vocal communication – one that may, like vocal production learning, also include changes to cortical-basal ganglia loops as well as to socially related motivational circuitry in the hypothalamus and amygdala (Syal & Finlay 2011). These neuroanatomical changes would link vocalizations and response contingency to reward centers during development. Importantly, given the small encephalization quotient of marmosets, such changes may not require an enlarged brain.

References

Ackermann, H. & Riecker, A. (2004) The contribution of the insula to motor aspects of speech production: A review and a hypothesis. Brain and Language 89:320–28.
Bauernfeind, A. L., de Sousa, A. A., Avasthi, T., Dobson, S. D., Raghanti, M. A., Lewandowski, A. H., Zilles, K., Semendeferi, K., Allman, J. M., Craig, A. D., Hof, P. R. & Sherwood, C. C. (2013) A volumetric comparison of the insular cortex and its subregions in primates. Journal of Human Evolution 64:263–79.
Bergman, T. J. (2013) Speech-like vocalized lip-smacking in geladas. Current Biology 23(7):R268–69.
Bohland, J. W. & Guenther, F. H. (2006) An fMRI investigation of syllable sequence production. NeuroImage 32:821–41.
Burkart, J. M., Hrdy, S. B. & van Schaik, C. P. (2009a) Cooperative breeding and human cognitive evolution. Evolutionary Anthropology 18:175–86.
Caruana, F., Jezzini, A., Sbriscia-Fioretti, B., Rizzolatti, G. & Gallese, V. (2011) Emotional and social behaviors elicited by electrical stimulation of the insula in the macaque monkey. Current Biology 21:195–99.
Chandrasekaran, C., Lemus, L., Trubanova, A., Gondan, M. & Ghazanfar, A. A. (2011) Monkeys and humans share a common computation for face/voice integration. PLOS Computational Biology 7(9):e1002165.
Chandrasekaran, C., Trubanova, A., Stillittano, S., Caplier, A. & Ghazanfar, A. A. (2009) The natural statistics of audiovisual speech. PLOS Computational Biology 5:e1000436.
Dronkers, N. F. (1996) A new brain region for coordinating speech articulation. Nature 384(6605):159–61. doi:10.1038/384159a0.
Fernandez-Duque, E., Valeggia, C. R. & Mendoza, S. P. (2009) The biology of paternal care in human and nonhuman primates. Annual Review of Anthropology 38:115–30.
Ferrari, P. F., Paukner, A., Ionica, C. & Suomi, S. J. (2009) Reciprocal face-to-face communication between rhesus macaque mothers and their newborn infants. Current Biology 19:1768–72.
Ghazanfar, A. A. (2013) Multisensory vocal communication in primates and the evolution of rhythmic speech. Behavioral Ecology and Sociobiology 67(9):1441–48.
Ghazanfar, A. A., Chandrasekaran, C. & Morrill, R. J. (2010) Dynamic, rhythmic facial expressions and the superior temporal sulcus of macaque monkeys: Implications for the evolution of audiovisual speech. European Journal of Neuroscience 31:1807–17.
Ghazanfar, A. A. & Logothetis, N. K. (2003) Facial expressions linked to monkey calls. Nature 423(6943):937–38.
Ghazanfar, A. A. & Poeppel, D. (2014) The neurophysiology and evolution of the speech rhythm. In: The cognitive neurosciences V (5th edition), ed. Gazzaniga, M. S. & Mangun, G. R., pp. 629–38. MIT Press.
Ghazanfar, A. A. & Rendall, D. (2008) Evolution of human vocal production. Current Biology 18(11):R457–60.
Ghazanfar, A. A., Takahashi, D. Y., Mathur, N. & Fitch, W. T. (2012) Cineradiography of monkey lipsmacking reveals the putative origins of speech dynamics. Current Biology 22:1176–82.
Ghazanfar, A. A., Turesson, H. K., Maier, J. X., van Dinther, R., Patterson, R. D. & Logothetis, N. K. (2007) Vocal tract resonances as indexical cues in rhesus monkeys. Current Biology 17:425–30.
Greenberg, S., Carvey, H., Hitchcock, L. & Chang, S. (2003) Temporal properties of spontaneous speech – a syllable-centric perspective. Journal of Phonetics 31(3–4):465–85.
Habbershon, H. M., Ahmed, S. Z. & Cohen, Y. E. (2013) Rhesus macaques recognize unique multimodal face-voice relations of familiar individuals and not of unfamiliar ones. Brain, Behavior, and Evolution 81:219–25.
Hasson, U., Ghazanfar, A. A., Galantucci, B., Garrod, S. & Keysers, C. (2012) Brain-to-brain coupling: A mechanism for creating and sharing a social world. Trends in Cognitive Sciences 16(2):114–21.
Jordan, K. E., Brannon, E. M., Logothetis, N. K. & Ghazanfar, A. A. (2005) Monkeys match the number of voices they hear with the number of faces they see. Current Biology 15:1034–38.
Lund, J. P. & Kolta, A. (2006) Brainstem circuits that control mastication: Do they have anything to say during speech? Journal of Communication Disorders 39:381–90.
MacNeilage, P. F. (1998) The frame/content theory of evolution of speech production. Behavioral and Brain Sciences 21(4):499–511.
Morrill, R. J., Paukner, A., Ferrari, P. F. & Ghazanfar, A. A. (2012) Monkey lip-smacking develops like the human speech rhythm. Developmental Science 15:557–68.
Rizzolatti, G. & Arbib, M. A. (1998) Language within our grasp. Trends in Neurosciences 21:188–94.
Sliwa, J., Duhamel, J. R., Pascalis, O. & Wirth, S. (2011) Spontaneous voice-face identity matching by rhesus monkeys for familiar conspecifics and humans. Proceedings of the National Academy of Sciences USA 108:1735–40.
Stivers, T., Enfield, N. J., Brown, P., Englert, C., Hayashi, M., Heinemann, T., Hoymann, G., Rossano, F., de Ruiter, J. P., Yoon, K. E. & Levinson, S. C. (2009) Universals and cultural variation in turn-taking in conversation. Proceedings of the National Academy of Sciences USA 106:10587–92.
Syal, S. & Finlay, B. L. (2011) Thinking outside the cortex: Social motivation in the evolution and development of language. Developmental Science 14:417–30.
Takahashi, D. Y., Narayanan, D. Z. & Ghazanfar, A. A. (2013) Coupled oscillator dynamics of vocal turn-taking in monkeys. Current Biology 23:2162–68.
Tomasello, M. (2008) Origins of human communication. MIT Press.
Van Hooff, J. A. R. A. M. (1962) Facial expressions of higher primates. Symposia of the Zoological Society of London 8:97–125.