Mehr, Krasnow, Bryant, and Hagen (Mehr et al.) present a cogent argument that music evolved as a credible signal of coalitional formidability and, within parent–infant relationships, of caregiver attention. Their careful application of an adaptationist logic serves as a prime example of how to conduct work in evolutionary science, and they marshal a compelling case against both the mate quality and social bonding models of music's origins. That said, we do not believe Mehr et al. provide adequate grounds to dismiss the hypothesis that music is a by-product of adaptations for language. To illustrate, consider the six points which the authors use to dispel the by-product hypothesis.
The authors first suggest that the widespread convergent evolution of “song-like vocalizations” and the presence of “musical behaviors” across species demonstrate that “music-like adaptations” could have evolved in humans. But, as the authors acknowledge in a footnote, it's not clear what these vocalizations and other behaviors represent. Can we be sure that these phenomena are not instead “proto-language-like,” evidencing that “proto-language-like” adaptations can evolve? Calling animal vocalizations “musical” or “song-like” as a new category of phenomena might be unfounded, perhaps akin to calling running a separate adaptation from walking, despite the common entrainment of psycho-motor systems.
Mehr et al. then note that music is a human universal (so is, for instance, language), that music production and perception are complex (so is language), that it has a grammar-like structure (so does language), that it isn't random (neither is language), and that artificial intelligence (AI) engineers have difficulty replicating it (ditto for language). At this point, music is starting to look like a duck.
Mehr et al. offer that motivations and abilities to perceive music appear early, that specific neural circuitry underlies music perception, and that deficits to specific circuitry impair music perception. None of this is surprising. What would be surprising would be to find that impairments causing tone-deafness didn't also impair linguistic cadence/tonality perception. Mehr et al. cite Norman-Haignere, Kanwisher, and McDermott (2015), who report that music and speech are captured by different neural component profiles. But the label “music” could just as easily have been “prosody,” and their findings viewed as evidence that different components of language are processed by different cortical circuits, much like edges and depth in visual perception. Finally, Mehr et al.'s claim that music is culturally ancient again begs the question of whether we are merely talking about a by-product of, say, language, because adaptations for language, too, are generally regarded as ancient.
Although the authors concede that none of the six lines of evidence alone dismisses the by-product hypothesis, we suggest that, even together, all six do not adequately motivate the search for an evolved adaptation. Additional evidence and theoretical rationale are required to convincingly argue that music is a separate adaptation, either for signaling coalitional formidability or for signaling joint attention. Next, we examine issues specific to each of these two putative functions.
First, with respect to coalitions, it is unclear why signals of formidability need be credible. Predators don't signal prey from afar. As Sun Tzu in The Art of War states: “All warfare is based on deception.” In the context of coalitional antagonism, why should we expect coalitions to reliably signal their formidability when successful territorial defense (or, for that matter, appropriation) might best be accomplished by deceiving rather than informing the enemy?
An alternative function for music in the context of coalitional antagonism is suggested by Sun Tzu: “On the field of battle, the spoken word does not carry far enough: hence the institution of gongs and drums. Nor can ordinary objects be seen clearly enough: hence the institution of banners and flags.” In this sense, music may serve to coordinate coalitional members in the context of intergroup antagonism, but music (or, for that matter, flag-waving) does not itself function as a strong signal to enemies of the group's ability to coordinate and, in turn, to enforce its interests. One does not easily imagine that soldiers of the Chinese army, upon encountering enemy infantry massed before them on the plain, would mutter to themselves, “Oh, shit. They've got gongs and drums.” Nor would those soldiers overly concern themselves with the threat of banners and flags. Thus, we suspect that inferences of coalitional formidability from cues of coordination are not made as readily as the authors' coalitional signaling account suggests. Additional empirical evidence is needed, given that the only research cited in support (Fessler & Holbrook, 2016) relied upon indirect measures of perceived coalitional formidability.
Second, with respect to parent–infant interactions, it is once again unclear that music is decisively different from or superior to language in its ability to solve the adaptive problem of assessing caregiver attention. Cognitive mechanisms for inferring the direction and source of vocalizations, for inferring attention from vocal turn-taking, and for associating voice tones and volume with meaning and intent – all appear to be features of language and music alike. Consider motherese (not mentioned by Mehr et al.). Motherese solves the problem of infant-directed attention, but motherese does not represent a clear break from language into a distinctly musical realm. It is linguistic, although it emphasizes language's ability to exploit pitch and tone. As an available solution to the problem of infant-directed attention, motherese points to the strong overlap between language and music – the latter of which elaborates upon elements of the former. We suggest that the increasing complexity of human social structures over time enabled the production and perception of subtle shades of linguistic expressions, meanings, and intentions that could be variably deployed across an array of relationships, caregiver–infant interactions and coalitional allies being two prime examples. In short, the flexibility of language solves the problem of mental coordination.
To the claim that music represents a separate adaptation, we must therefore echo the words of Galileo, “E pur si quacks” (And yet it quacks).
Conflict of interest
None.