Ackermann et al. present an excellent overview of the neurocognitive architecture underlying primate vocal production, including a proposal for the evolution of articulated speech in humans. Multiple sources of evidence support the dual pathway model of acoustic communication. The evolution of volitional control over vocalizations might critically involve adaptations for rhythmic entrainment (i.e., a coupling of independent oscillators that have some means of energy transfer between them). Entrained vocal and non-vocal behaviors afford a variety of modern abilities such as turn-taking in conversation and coordinated music-making, in addition to refinements that lead to the production of speech sounds that interface with the language faculty.
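The parenthetical definition of entrainment as coupled oscillators with some means of energy transfer can be made concrete with a minimal simulation. The sketch below uses the Kuramoto model of two coupled phase oscillators; this is an illustrative choice on my part, not a model drawn from the target article, and all parameter values are arbitrary.

```python
import math

def simulate_entrainment(w1=2.0, w2=2.3, K=0.5, dt=0.01, steps=5000):
    """Two coupled phase oscillators (Kuramoto model): each oscillator
    nudges its phase toward the other's. Returns the wrapped phase
    difference (th1 - th2) after `steps` Euler steps."""
    th1, th2 = 0.0, math.pi  # start out of phase
    for _ in range(steps):
        d1 = w1 + K * math.sin(th2 - th1)  # coupling pulls th1 toward th2
        d2 = w2 + K * math.sin(th1 - th2)  # and th2 toward th1
        th1 += d1 * dt
        th2 += d2 * dt
    # wrap the phase difference into (-pi, pi]
    return math.atan2(math.sin(th1 - th2), math.cos(th1 - th2))

# With sufficient coupling the two oscillators phase-lock despite their
# different natural rates; with no coupling their phases drift apart.
locked = simulate_entrainment(K=0.5)
unlocked = simulate_entrainment(K=0.0)
```

Phase-locking occurs here because the coupling strength exceeds half the difference in natural frequencies; at the locked state the residual phase lag satisfies sin(Δφ) = (w1 − w2)/2K.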
Wilson and Wilson (Reference Wilson and Wilson2005) described an oscillator model of conversational turn-taking in which entrainment of syllable production allows for efficient interlocutor coordination with minimal gap and overlap in talk. The mechanisms underlying this ability might have been present in the hominin line well before language evolved, and could be closely tied to potential early functions of social signaling, including rhythmic musical behavior and dance (Bryant Reference Bryant2013; Hagen & Bryant Reference Hagen and Bryant2003; Hagen & Hammerstein Reference Hagen and Hammerstein2009). Research on error correction has revealed several design features of these entrainment mechanisms. Repp (Reference Repp2005) proposed distinct neural systems underlying different kinds of error correction in synchronous tapping: phase-related adjustments involve dorsal processes controlling action, while ventral perception and planning processes underlie period correction adjustments.
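The phase/period distinction can be illustrated with the kind of two-process linear error-correction model common in the sensorimotor synchronization literature. This is a toy sketch with arbitrary parameter values, not code from Repp (2005): phase correction shifts only the timing of the next tap, whereas period correction adjusts the tapper's internal interval.

```python
def tap_simulation(alpha=0.5, beta=0.1, S=0.5, T0=0.6, n=200):
    """Tapping with a metronome of interval S, starting with a mismatched
    internal period T0. alpha scales phase correction; beta scales period
    correction. Returns the asynchrony series and the final internal period."""
    t, T = 0.0, T0      # time of current tap, internal period
    s = 0.0             # time of current metronome onset
    asyncs = []
    for _ in range(n):
        a = t - s               # asynchrony: tap time minus onset time
        asyncs.append(a)
        T = T - beta * a        # period correction (planning process)
        t = t + T - alpha * a   # phase correction (action process)
        s += S
    return asyncs, T

asyncs, T_final = tap_simulation()
```

With phase correction alone (beta=0), a constant residual asynchrony of (T0 − S)/alpha remains; adding period correction drives both the asynchrony and the internal period error to zero. That the two processes leave different behavioral signatures is what makes them dissociable in tapping experiments.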
Bispham (Reference Bispham2006) and Phillips-Silver et al. (Reference Phillips-Silver, Aktipis and Bryant2010) have suggested that behavioral entrainment in humans involves the coupling of perception and action incorporating pre-existing elements of motor control and pulse perception. This coupling is plausibly linked to Ackermann et al.'s first phylogenetic stage including laryngeal elaboration and monosynaptic refinement of corticobulbar tracts. In order to implement proper error correction in improvised contexts of vocal synchrony, volitional control over articulators is necessary. While little comparative work has shown such an ability in nonhuman primates, there is some evidence suggesting control over vocal articulators in gelada baboons, with an ability to control, for example, vocal onset times relative to conspecific vocalizations (Richman Reference Richman1976). And recently, Perlman et al. (Reference Perlman, Patterson and Cohn2012) have found that Koko the gorilla exercises breath control in her deliberate play with wind instruments. Other evidence of this sort is certainly forthcoming, and will help us develop an accurate account of the evolutionary precursors to speech production in humans.
Laughter provides a window into the phylogeny of human vocal production as well. Laugh-like vocalizations first appeared prior to the last common ancestor of humans and great apes (Davila-Ross et al. Reference Davila-Ross, Owren and Zimmermann2009), and human laughter is likely derived from the breathing patterns exhibited during play activity (Provine Reference Provine2000). Bryant and Aktipis (Reference Bryant and Aktipis2014) found that perceptible proportions of inter-voicing intervals (IVIs) differed systematically between spontaneous and volitional human laughter, and that altered versions of the laughs were differentially perceived as human-made, in ways related to the IVI measures. Specifically, slowed spontaneous laughs were indistinguishable from nonhuman animal calls, while slowed volitional laughs were recognizable as human-produced. These data were interpreted as evidence for perceptual sensitivity to vocalizations originating from different production machinery – a finding consistent with the dual pathway model presented here by Ackermann et al.
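One plausible operationalization of an IVI-based measure is sketched below, assuming IVIs are simply the gaps between voiced bursts within a laugh; the actual measure used by Bryant and Aktipis (2014) may differ, and the numbers here are invented for illustration.

```python
def ivi_proportion(voiced_segments, total_duration):
    """Toy measure: the proportion of a laugh's duration taken up by
    inter-voicing intervals (gaps between voiced bursts).
    voiced_segments is a sorted list of non-overlapping (onset, offset)
    times in seconds; total_duration is the laugh's length in seconds."""
    voiced = sum(offset - onset for onset, offset in voiced_segments)
    return (total_duration - voiced) / total_duration

# e.g., three 100-ms voiced bursts within a 1.0-s laugh
p = ivi_proportion([(0.0, 0.1), (0.3, 0.4), (0.6, 0.7)], 1.0)
```

On this toy measure, spontaneous and volitional laughs would differ to the extent that their voiced bursts are packed differently in time.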
Interestingly, laughter seems to play a role in coordinating conversational timing. Manson et al. (Reference Manson, Bryant, Gervais and Kline2013) reported that convergence in speech rate was positively associated with how much interlocutors engaged in co-laughter. While the degree of convergence over a 10-minute conversation predicted cooperative play in an unannounced Prisoner's Dilemma game, the amount of co-laughter did not. The relationship between laughter and speech is not well understood, though evidence suggests that the two are integrated to some extent. The placement of laughter in the speech stream follows some linguistic patterns (i.e., a punctuation effect) (Provine Reference Provine1993), but laughter also manifests embedded within words and sentences (Bryant Reference Bryant2012). Co-laughter might serve in some capacity to help conversationalists coordinate their talk and, in early humans, perhaps coordinate other kinds of vocal behavior. Recent work has demonstrated that people can detect from very short co-laughter segments (<2 seconds) whether the co-laughers are acquainted or not (Bryant Reference Bryant2012), suggesting a possible chorusing function.
A surge of recent work shows that interpersonal synchrony involving entrainment fosters cooperative interactions (e.g., Kirschner & Tomasello Reference Kirschner and Tomasello2010; Manson et al. Reference Manson, Bryant, Gervais and Kline2013; Wiltermuth & Heath Reference Wiltermuth and Heath2009), and the effect seems immune to the negative consequences of explicit recognition. That is, while noticing synchrony does not undermine its benefits, when behavior matching is noticed but does not involve fine temporal coordination, interactants do not respond positively (e.g., Bailenson et al. Reference Bailenson, Yee, Patel and Beall2008). Manson et al. (Reference Manson, Bryant, Gervais and Kline2013) described interpersonal synchrony as a coordination game that affords no cheating opportunities, unlike mimicry and other behavior matching phenomena in which deceptive, manipulative strategies are potentially profitable. Coordinating vocal (and other) behavior provides a means for individuals to assess the fit of others as cooperating partners. Given the extremely cooperative nature of humans relative to other species, mechanisms for such assessment are not surprising, and indeed should be expected.
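The coordination-game logic can be sketched with a toy payoff matrix. The payoff values below are hypothetical, chosen only to encode the key property that tight temporal coordination pays off only when both parties produce it, so faking it unilaterally earns nothing; they are not data from Manson et al. (2013).

```python
from itertools import product

ACTIONS = ("entrain", "defect")
# (row payoff, column payoff); hypothetical values
PAYOFF = {
    ("entrain", "entrain"): (3, 3),  # successful synchrony benefits both
    ("entrain", "defect"):  (0, 1),  # one-sided entrainment effort is wasted
    ("defect",  "entrain"): (1, 0),
    ("defect",  "defect"):  (1, 1),
}

def is_nash(profile):
    """A profile is a Nash equilibrium if neither player gains by
    deviating unilaterally."""
    r, c = profile
    u_r, u_c = PAYOFF[profile]
    best_r = all(PAYOFF[(alt, c)][0] <= u_r for alt in ACTIONS)
    best_c = all(PAYOFF[(r, alt)][1] <= u_c for alt in ACTIONS)
    return best_r and best_c

equilibria = [p for p in product(ACTIONS, repeat=2) if is_nash(p)]
```

Against an entraining partner, defecting yields 1 rather than 3: there is no profitable "cheat", which is precisely what distinguishes synchrony from mimicry, where a deceptive matcher can exploit the matched party.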
Taken together, the findings described above point to an important component of human vocal communication: the independent and integrated action of emotional vocal production and speech production systems. Selection for articulatory control mechanisms underlying the entrainment of vocal behavior for within- and between-group communicative functions could have set the stage for conversational turn-taking – an ability that came to incorporate speech. Dual pathway models of acoustic communication should more seriously consider the neurocognitive underpinnings of vocal entrainment abilities and incorporate these adaptations into the phylogenetic history of human vocal behavior.