“Origins of Music in Credible Singing” by Mehr et al. and “Music as a Co-evolved System for Social Bonding” by Savage et al. attempt to explain the evolutionary origins of musicality. Although both papers mention gene–culture interaction, both fall short of a full evolutionary account. Such an account would need to consider the complex interactions among adaptations specifically for musicality (i.e., musical capacity itself increased inclusive fitness, leading to adaptive changes), exaptations (non-musical traits or capacities that evolved under other adaptive pressures but, once in place, helped enable musicality), and cultural creations (which could benefit fitness and, although not evolutionary adaptations themselves, could drive subsequent evolutionary adaptations). The evolution of complex cognitive capacities such as musicality and language almost certainly involves an interplay among these three factors (see Trainor, 2015, 2018, for a detailed discussion).
The papers by Mehr et al. and Savage et al. agree in treating social interaction as the adaptive function that drove the evolution of musicality. However, they differ markedly in how they conceptualize the role of musicality in promoting social interaction. For Mehr et al., music is a form of credible signaling that conveys information from sender to receiver about the sender's social intentions: music can signal the coalition strength of one's group to a competitor, or signal one's attention to an infant. Savage et al. postulate that social cohesion within groups has adaptive benefits independent of musicality, and that musicality originated as a human cultural creation; because music increased social cohesion, subsequent adaptation made it more effective in this regard.
One elephant in the room is that the two papers define musicality very differently. For Savage et al., musicality arose recently, initially through human culture, and involves a set of design features including predictive hierarchical beat structures and “synchronized, harmonized singing and dancing in groups.” For Mehr et al., music is ancient and is found in many nonhuman species in the form of territorial signals and contact calls. This makes the papers difficult to compare directly.
What both papers are missing is an evolutionary approach that addresses not only the possible adaptive functions of musicality, but also why and how musical capacities evolved in the particular way they did. As with other complex capacities, the evolution of musical capacities likely consists not of a single adaptation but of a long sequence of adaptive, exapted, and cultural influences that interact in complex ways. Human biology, including brain architecture, has diverged from that of our closest genetic relatives (see Savage et al.), so some adaptations enhancing musicality (whether music-specific or exaptations) have evolved fairly recently. At the same time, a full understanding requires consideration of precursor abilities that evolved over much longer timeframes.
Both papers describe multiple adaptive effects of musicality, ranging from intimate infant caretaking to social signaling or bonding among large groups of adults. Why would these disparate functions be served by the same musical faculty? Why did we not evolve three separate systems, for example one each for information communication (language), intimate emotional interaction, and group bonding? Because music evolved in a brain and body already adapted for other functions, there were likely severe constraints on the forms a communication system could take. In other words, music is, in part, an exaptation of these prior adaptations, or perhaps in some respects a “byproduct” of them. Here, I discuss three important constraints.
First, why is music based in the auditory-motor system? As Mehr et al. elaborate, signaling through sound production and perception relies on evolutionarily ancient adaptations found in animals as disparate as insects, birds, and mammals. Many would not call such signaling music per se, but it was available to be exapted in the creation of human musicality, and in this way it greatly constrained musical forms. Thus, musicality is, in part, an exaptation of adaptations for auditory production–perception signaling systems.
Second, why does music have the pitch structure it does? As elaborated by Trainor (2015, 2018; see also Huron, 2001), the perception of pitch itself is most likely a byproduct of auditory scene analysis (Bregman, 1990). The goal of auditory scene analysis is to identify and locate the multiple sound sources present in a typical environment, which has obvious evolutionary advantages for detecting predators, prey, and offspring. Most animal vocalizations have a harmonic structure, consisting of a fundamental frequency and harmonics at integer multiples of the fundamental. The harmonics of simultaneous sound sources overlap in frequency range, and the signal reaching the ear is their sum, a single complex waveform. The neural solution, preserved across many species, is first to decompose the complex waveform into its constituent frequencies, and then to group together frequencies that stand in harmonic relation to a common fundamental, as these likely originated from the same sound source. At the level of consciousness, individual frequencies are not perceived; we perceive only integrated percepts with particular pitches, timbres, and spatial locations. Thus, pitch perception is a byproduct of evolutionary pressures for auditory scene analysis, and was likely exapted in the creation of musicality. There may also have been more recent, musicality-specific adaptations that enhanced the perception and production of pitch and tonal structures, especially in group contexts.
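The grouping step described above can be illustrated with a toy “harmonic sieve.” This is a minimal sketch under stated assumptions, not a model of the auditory system: it takes as given a list of component frequencies (the output of the decomposition step), assumes that each source's fundamental is present as its lowest component, and uses a hypothetical 2% relative tolerance for deciding whether a frequency is an integer multiple of a candidate fundamental. The function name `group_by_harmonicity` and the example frequencies are illustrative only.

```python
def group_by_harmonicity(components, tolerance=0.02):
    """Greedily group component frequencies (Hz) into harmonic series.

    Toy assumption: the lowest ungrouped component is taken as a candidate
    fundamental, and every ungrouped component lying within `tolerance`
    (relative error) of an integer multiple of it joins that group.
    """
    remaining = sorted(components)
    groups = []
    while remaining:
        f0 = remaining[0]  # candidate fundamental for the next source
        group, leftover = [], []
        for f in remaining:
            n = round(f / f0)  # nearest harmonic number
            if n >= 1 and abs(f - n * f0) / (n * f0) <= tolerance:
                group.append(f)
            else:
                leftover.append(f)
        groups.append((f0, group))
        remaining = leftover  # f0 itself is always grouped, so this shrinks
    return groups


# Two hypothetical sources whose harmonics interleave in frequency.
source_a = [200 * k for k in range(1, 6)]  # 200, 400, ..., 1000 Hz
source_b = [310 * k for k in range(1, 5)]  # 310, 620, 930, 1240 Hz
mixture = source_a + source_b              # components of the summed waveform

for f0, partials in group_by_harmonicity(mixture):
    print(f0, partials)
# prints:
# 200 [200, 400, 600, 800, 1000]
# 310 [310, 620, 930, 1240]
```

Here the interleaved mixture separates back into its two harmonic series. The greedy design is deliberately crude: a component shared by two sources (e.g., 600 Hz, a harmonic of both 200 Hz and 300 Hz) simply goes to whichever fundamental is tried first, and a source with a missing fundamental would be mis-grouped.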
Third, why is music rewarding? Accurately predicting the future is crucial for fitness (Huron, 2006): a prediction error can mean being eaten or missing a mating opportunity. Indeed, the brain continually predicts the future and adjusts its internal models when those predictions are incorrect (predictive coding; e.g., Heilbron & Chait, 2018; Trainor & Zatorre, 2015). Across many species, prediction is intimately connected to brain reward centers (Schultz, 2013). The often-noted tonal and rhythmic regularities in music enable prediction of upcoming sounds. One argument for musicality being a cultural creation is that it is well designed to activate preexisting reward centers (Salimpoor, Zald, Zatorre, Dagher, & McIntosh, 2015). Although subsequent adaptations may have enhanced these effects, musicality is, in part, an exaptation of ancient adaptations for rewarding correct predictions.
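The claim that regularities enable prediction can be made concrete with a toy calculation. The sketch below assumes, purely for illustration, that a listener's internal model is a first-order Markov predictor over a hypothetical three-symbol rhythm alphabet, updated after every event; surprisal (the negative log probability of what actually occurred) stands in for prediction error. It says nothing about reward itself, only that a repeating pattern becomes far more predictable than an unpatterned sequence of the same symbols.

```python
import math
import random
from collections import defaultdict


def mean_surprisal(sequence, alphabet):
    """Average surprisal (bits per event) of `sequence` under an online
    first-order Markov predictor with add-one smoothing.

    Toy assumption: the "internal model" is just transition counts, updated
    after each event, so each event is predicted from everything heard so far.
    """
    if len(sequence) < 2:
        raise ValueError("need at least two events")
    counts = defaultdict(lambda: defaultdict(int))
    total_bits = 0.0
    for prev, nxt in zip(sequence, sequence[1:]):
        row = counts[prev]
        seen = sum(row.values())
        # Predicted probability of the event that actually occurred.
        p = (row[nxt] + 1) / (seen + len(alphabet))
        total_bits += -math.log2(p)  # large when the prediction was poor
        row[nxt] += 1  # adjust the model after the outcome
    return total_bits / (len(sequence) - 1)


alphabet = ["x", "o", "."]   # hypothetical symbols: strong beat, weak beat, rest
regular = list("xo.o" * 16)  # a repeating rhythmic pattern
rng = random.Random(0)
irregular = [rng.choice(alphabet) for _ in regular]  # same length, no pattern

print(f"regular:   {mean_surprisal(regular, alphabet):.2f} bits/event")
print(f"irregular: {mean_surprisal(irregular, alphabet):.2f} bits/event")
```

Because the model keeps updating its counts, surprisal for the repeating pattern falls as the pattern is learned, whereas for the unpatterned sequence it stays close to the roughly 1.58 bits per event of a uniform guess among three symbols.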
In sum, although it is not easy to reconstruct the evolutionary history of complex capacities, I propose that to understand the evolution of musicality we need to seriously consider complex interactions between music-specific adaptations, exaptations, and cultural creation over an extended evolutionary timeframe.
Financial support
This commentary was supported by grants from the Canadian Institutes of Health Research (MOP 153130), the Natural Sciences and Engineering Research Council of Canada (RGPIN-2019-05416), the Social Sciences and Humanities Research Council (435-2020-0442), and the Canadian Institute for Advanced Research.
Conflict of interest
None.