
Rapid dissonant grunting, or, but why does music sound the way it does?

Published online by Cambridge University Press:  30 September 2021

Beau R. Sievers
Affiliation:
Psychology Department, Harvard University, Cambridge, MA 02138, USA. beau@beausievers.com; www.beausievers.com
Thalia Wheatley
Affiliation:
Psychological and Brain Sciences, Dartmouth and Santa Fe Institute, Hanover, NH 03755, USA. thalia.p.wheatley@dartmouth.edu; www.wheatlab.com

Abstract

Each target article contributes important proto-musical building blocks that constrain music as-we-know-it. However, neither the credible signaling nor social bonding accounts elucidate the central mystery of why music sounds the way it does. Getting there requires working out how proto-musical building blocks combine and interact to create the complex, rich, and affecting music humans create and enjoy.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2021. Published by Cambridge University Press

The social bonding and credible signaling hypotheses share a basic strategy: To identify a set of proto-musical building blocks that, if connected in the right way, could be shaped by cultural evolution into music as-we-know-it. To this end, both hypotheses identify functional fixedness between features of sounds and features of the agents producing them: An uncooperative group cannot produce synchronized sound, a caregiver singing to an infant cannot also be talking to someone else, and so on. We endorse this approach, but argue that neither hypothesis fixes enough functions to explain a central mystery of musical evolution: But, why does music sound like that? Furthermore, we note that neither hypothesis explains how to get from the simple signaling of brute facts to the complex semantic playground of music as-we-know-it. We suggest the cognitive capacity for domain-general compositional thinking may have played an important role.

To understand why the account of functional fixedness needs to be elaborated, consider the case of the lullaby. On the credible signaling hypothesis, the form of the lullaby is fixed by the functional requirements of signaling attention, proximity, and responsiveness of the caregiver. But taken alone, this does not explain why lullabies across the world are slow and consonant (Mehr et al., 2019). Given the spare set of requirements, lullabies could just as well be mostly rapid dissonant grunting, yet to our knowledge no culture has adopted this strategy.

To better account for the form of the lullaby, we propose that connections between sound, movement, and emotion are an additional source of functional fixedness. If one function of emotion is to bias agents toward context-appropriate action (Frijda, Kuipers, & Ter Schure, 1989), then the path from action predisposition to functional fixedness is short. If high-arousal states make characteristically low-arousal movement difficult (and vice versa), and if high-arousal movements produce sounds that distinguish them from low-arousal movements, then sounds should credibly signal both the movement that produced them and the sound producer's state of mind. Rapid dissonant grunting is, therefore, bad lullaby material – not because it fails to signal attention, proximity, or responsiveness, but because it signals that the caregiver is in a high-arousal state inappropriate for the context (e.g., bedtime).

Supporting this account, we have shown that music and movement share a dynamic structure, such that the same combinations of music and movement features express the same emotions across cultures (Sievers, Polansky, Casey, & Wheatley, 2013). Furthermore, harsh timbres and spiky movement contours (both quantified using the spectral centroid) are reliably used in the expression and perception of high-arousal emotion (Sievers, Lee, Haslett, & Wheatley, 2019). Interestingly, Filippi et al. (2017) have shown that harsh timbres are used to express high arousal by many species of terrestrial vertebrate, suggesting the sound–emotion connection is evolutionarily ancient, fitting the proposed timeline of the credible signaling hypothesis.
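For readers unfamiliar with the spectral centroid mentioned above, the following is a minimal sketch of the standard textbook definition: the magnitude-weighted mean frequency of a signal's spectrum. It is not the analysis pipeline of the cited studies, whose details are not given here. The function names, the naive DFT, and the synthetic test signals are all illustrative assumptions.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Naive DFT magnitudes for the non-negative frequency bins (O(n^2))."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


def spectral_centroid(signal, sample_rate):
    """Magnitude-weighted mean frequency in Hz; returns 0.0 for silence."""
    n = len(signal)
    mags = magnitude_spectrum(signal)
    total = sum(mags)
    if total < 1e-12:
        return 0.0
    freqs = [k * sample_rate / n for k in range(len(mags))]
    return sum(f * m for f, m in zip(freqs, mags)) / total


def sine(freq_hz, sample_rate, n):
    """Synthetic sine wave with n samples (hypothetical test input)."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate)
            for t in range(n)]


# Synthetic signals, not data from any study: a pure low tone versus the
# same tone with an equally strong high-frequency component added.
sample_rate = 800
n = 200
smooth = sine(40, sample_rate, n)
harsh = [a + b for a, b in zip(sine(40, sample_rate, n),
                               sine(280, sample_rate, n))]

smooth_centroid = spectral_centroid(smooth, sample_rate)
harsh_centroid = spectral_centroid(harsh, sample_rate)
```

Under these assumptions, adding high-frequency energy raises the centroid, which is the sense in which the measure tracks "harshness" or "spikiness". The same function could in principle be applied to a sampled movement trajectory, though how the cited work does so is not specified here.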

The social bonding hypothesis faces a similar challenge, as predictability cannot do the job of fixing musical form all on its own. Because predictability is, in principle, separable from sonic features such as loudness and harshness, two pieces of music could be similar in terms of overall predictability but otherwise completely different. Our proposed approach should work here, too: Expand the account of inferential roles and functional fixedness to accommodate more of what makes music matter, even at the cost of weakening the claim that music has a singular function.

The credible signaling and social bonding hypotheses each describe building blocks to be shaped into music by cultural evolution. For the social bonding hypothesis, the building blocks are interlocking neurobiological reward-learning systems. For the credible signaling hypothesis, the building blocks are signaling systems selected to fit specific inferential roles. Both hypotheses face the same problem: If mere exposure to a sonic stimulus can provoke appropriate behavior, why is music so elaborate, such a parade of semantic excess? Although signals communicate simple meanings (the shake of the rattlesnake's tail means you are in danger), music communicates complexes of meaning, supporting a semantics with a richness different from but rivaling that of language (Schlenker, 2019). This may also be true of dance (Charnavel, 2019; Patel-Grosz, Grosz, Kelkar, & Jensenius, 2018). What could possibly get us from the mechanistically and computationally simple raw material of reward-learning and signaling to the confoundingly meaningful playground of music as-we-know-it?

Building blocks are meant to be combined. We suggest that the capacity for compositional thought – the ability to “make infinite use of finite means” (von Humboldt, 1836/1999) by recombining parts into novel arrangements – had an important role in the transition from proto-music to music. If so, the “shapes” of the proto-musical building blocks should matter – each block must be interoperable with the others. This interoperability could take many forms, ranging from accessibility to a global workspace (Baars, 1993) or a symbolic representational system (Fodor, 1975) to participation in a network of interacting cognitive maps (Bottini & Doeller, 2020; Parkinson, Liu, & Wheatley, 2014). Hinting at the latter possibility, we have shown that the brain represents emotional music and movement using a similar format, possibly to facilitate comparison across sensory modalities (Sievers et al., 2018).

Critically, the requirement of proto-musical interoperability poses different challenges for each hypothesis. The social bonding hypothesis must avoid the trap of behaviorism, showing how the simplistic stimulus-response characteristics of reward-learning systems could be bootstrapped to build a rich inferential semantics. By contrast, the credible signaling hypothesis must avoid the trap of massive modularity, showing how a motley of signaling systems, each evolved to narrowly serve different inferential roles, could be harmonized, placed in a shared context, and used to express a wider range of meanings.

Both target articles elucidate important proto-musical building blocks that functionally constrain music as-we-know-it. But neither explains why music sounds the way it does. Getting from proto-music to music as-we-know-it requires knowing not only what the building blocks are, but also how they fit together, combining and interacting to create the deeply affecting, complex, and semantically rich music humans enjoy.

Financial support

This short commentary was not funded by any institution.

Conflict of interest

None.

References

Baars, B. J. (1993). A cognitive theory of consciousness. Cambridge University Press.
Bottini, R., & Doeller, C. F. (2020). Knowledge across reference frames: Cognitive maps and image spaces. Trends in Cognitive Sciences, 8, 606–619.
Charnavel, I. (2019). Steps towards a universal grammar of dance: Local grouping structure in basic human movement perception. Frontiers in Psychology, 10, 1364.
Filippi, P., Congdon, J. V., Hoang, J., Bowling, D. L., Reber, S. A., Pašukonis, A., … Newen, A. (2017). Humans recognize emotional arousal in vocalizations across all classes of terrestrial vertebrates: Evidence for acoustic universals. Proceedings of the Royal Society B, 284, 1859.
Fodor, J. A. (1975). The language of thought. Thomas Y. Crowell.
Frijda, N. H., Kuipers, P., & Ter Schure, E. (1989). Relations among emotion, appraisal, and emotional action readiness. Journal of Personality and Social Psychology, 57, 212–228.
Mehr, S. A., Singh, M., Knox, D., Ketter, D. M., Pickens-Jones, D., Atwood, S., … Glowacki, L. (2019). Universality and diversity in human song. Science, 366(6468), 957–970.
Parkinson, C., Liu, S., & Wheatley, T. (2014). A common cortical metric for spatial, temporal, and social distance. Journal of Neuroscience, 34, 1979–1987.
Patel-Grosz, P., Grosz, P. G., Kelkar, T., & Jensenius, A. R. (2018). Coreference and disjoint reference in the semantics of narrative dance. Proceedings of Sinn und Bedeutung, 22, 199–216.
Schlenker, P. (2019). Prolegomena to music semantics. Review of Philosophy and Psychology, 10, 35–111.
Sievers, B., Lee, C., Haslett, W., & Wheatley, T. (2019). A multi-sensory code for emotional arousal. Proceedings of the Royal Society B, 286, 1906.
Sievers, B., Parkinson, C., Kohler, P. J., Hughes, J., Fogelson, S. V., & Wheatley, T. (2018). Visual and auditory brain areas share a neural code for perceived emotion. BioRxiv, 254961.
Sievers, B., Polansky, L., Casey, M., & Wheatley, T. (2013). Music and movement share a dynamic structure that supports universal expressions of emotion. Proceedings of the National Academy of Sciences, 110, 70–75.
von Humboldt, W. (1836/1999). On language: On the diversity of human language construction and its influence on the mental development of the human species. Cambridge University Press.