Although we generally agree with the credible signaling hypothesis and provide evidence for credible signaling in contemporary music, an important issue remains unaddressed by the theory. Recent research suggests that contemporary credible musical signals are less emotionally impactful than their vocal counterparts. Musical signals take far more time and energy to manufacture compared to vocal ones. The theory falls short of explaining the evolutionary added value of these more taxing and less affective musical signals. The credibility hypothesis should be extended to account for this counterintuitive observation by including a component regarding the motivation for, and consequences of, culturally decontextualizing a biologically contextualized signal (Frühholz, Trost, & Kotz, 2016). Specifically, we hypothesize that these affectively weaker musical signals communicate “emotional fiction” alongside their biological meanings and may have been motivated by the adaptive need for emotionally impactful storytelling.
Although we agree with the authors' claim that today's actual domain of music is far removed from its proper domain, recent findings of credible signals in contemporary music show that some ancient, vocal-inspired signals have resiliently persisted throughout the diverse cultural metamorphoses that music has undergone over centuries across the world. For example, one credible signal feature used in contemporary music to convey affective meaning is roughness, a harsh, buzzing, raspy sound quality (Vassilakis & Kendall, 2010). Roughness has a long evolutionary trajectory in human and animal alarm calls (Arnal, Flinker, Kleinschmidt, Giraud, & Poeppel, 2015; Engelberg & Gouzoules, 2019; Schwartz, Engelberg, & Gouzoules, 2019) and has been found to be present in terrifying excerpts from horror film music (Trevor, Arnal, & Frühholz, 2020). Another contemporary credible signal in music is the sigh, a vocal signal generated by both humans and animals that typically expresses sadness or frustration (Li & Yackle, 2017; Teigen, 2008). In music, sighs are mimicked by a falling narrow melodic motion with decreasing loudness, a standard device in Western classical music used to signal grief to the listener (Monelle, 2000). Music has also been found to imitate the staccato acoustic profile of laughter, a credible signal found in both humans and many animal species (Bryant, 2020), when communicating humor (Trevor & Huron, 2018). These instances indicate the continued presence of biologically rooted credible signals in music today, extending the reach of Mehr and colleagues' theory to present-day music.
Although such mimicry of vocal signals exists as predicted by the credible signaling theory, many cross-comparisons between music and voices have shown that affective meaning is signaled and perceived more poorly in music than in voices (Frühholz, Trost, & Grandjean, 2014; Juslin & Laukka, 2003; Paquette, Takerkart, Saget, Peretz, & Belin, 2018; Scherer, 1995). For example, Paquette et al. (2018) report overall lower recognition accuracies for fearful, sad, happy, and neutral emotions expressed in music compared to voices. Furthermore, one of our recent studies showed that vocal screams are perceived as significantly more intense and emotionally negative than horror film music excerpts that mimic human screams, even though both use the credible signal roughness (Trevor et al., 2020). Affective meaning thus seems to be less well signaled and recognized in music than in voices, a difference that is not accounted for in the credibility hypothesis and could therefore be a limitation of the theory.
To address these perceptual differences, we propose that the credibility hypothesis could be extended to include a component regarding culturally de-contextualized biological signals. A similar functional de-contextualization component has been described for the evolution of human reasoning (Stanovich & West, 2000). Vocal signals have biological significance, are largely triggered by situational cues, and have direct contextual meanings to listeners (Frühholz & Schweinberger, 2020; Frühholz et al., 2016). In contrast, musical imitations of these vocal signals are of a more “symbolic” and “fictional” nature, are voluntarily produced along musical principles and cultural rules, and are meant to capture the listener's attention and sway the listener emotionally. The weaker credibility of musically signaled affective meaning could result from this difference in signal goals and from the de-contextualization of the signal. What then is the evolutionary value of these musical signals? The de-contextualized nature of these signals results in the communication of two pieces of information: “emotional fiction” and the biological meaning of the natural signal being imitated. Music-induced emotions are sometimes regarded as “make-believe” emotions, as fictional tools in de-contextualized settings (Walton, 1990). In communicating “emotional fiction,” the musical signal tells the listener that the situation is not real; it is a simulation. That information might weaken the second part of the signal, the affective impression of the imitated vocal expression. Given this “emotional fiction” component, perhaps the creation of biologically rooted affective musical signals was motivated by an adaptive need for simulating emotional situations.
What evolutionary role do simulations of emotional situations serve? One theory holds that nightmares may have evolved to simulate threatening situations, increasing threat preparedness and survival chances in early humans (Revonsuo, 2000). Because nightmares induce fearful emotions, such threat preparedness would include emotional preparedness, that is, resilience and emotion regulation skills. Some research on other threat-simulating activities (horror films and violent video games) supports this theory. People who enjoy horror movies have been found to be more resilient in the face of real-life dangers, such as the COVID-19 pandemic (Scrivner, Johnson, Kjeldgaard-Christiansen, & Clasen, 2020). Similarly, people who play violent video games have fewer nightmares, suggesting that video game simulations actually fill that adaptive need for threat simulation (Bown & Gackenbach, 2016). In ancient human cultures, threat simulations were conveyed through storytelling. Storytelling is a universal human practice with ancient roots (Smith et al., 2017), and it often involved musical instruments (Pellowski, 1990). Perhaps storytellers were motivated to create sounds that would be similar to real-life signals but also clearly fictional, increasing the emotional impression of the stories and enabling listeners to rehearse the emotions of the tale in a safe, imaginary, and cooperative space.
Financial support
C.T. received funding from the European Union's Horizon 2020 research and innovation program under the Marie Skłodowska-Curie Grant Agreement (No. 835682). S.F. received funding from the Swiss National Science Foundation (Grant Nos. SNSF PP00P1_157409/1 and PP00P1_183711/1).
Conflict of interest
None.