Musicality consists of the (neuro)biological underpinnings needed to perceive and produce music. Research on the evolution of musicality needs cross-species evidence. As a parallel, to understand the evolution of bat wings, one asks why all other mammals lack wings and why other flying animals evolved them. Similarly, our species constitutes only one data point for constructing evolutionary hypotheses about musicality. Comparisons with other species are necessary to avoid post hoc explanations of evolutionary traits.
Four concepts discussed in Savage et al. are key for understanding musicality, both in humans and in other animals (Fig. 1). Isochrony describes metronomic temporal regularity, similar to the ticking of a clock (Merker, Madison, & Eckerdal, 2009; Ravignani & Madison, 2017). Synchrony is the perfect co-occurrence in time of two series of events, with no strong teleological or mechanistic focus (Kotz, Ravignani, & Fitch, 2018; Ravignani, 2017). Vocal learning is the ability to learn and modify non-innate vocalizations, including melodies (Lattenkamp & Vernes, 2018). Beat induction denotes a top-down capacity to induce a regular pulse from music and to move in synchrony with it (Grahn & Brett, 2007; Honing, 2012).
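To keep these definitions concrete in comparative work, they can be operationalized directly from event-onset times. The sketch below is a minimal illustration under my own assumptions – isochrony as a low coefficient of variation of inter-onset intervals, synchrony as the proportion of events in one series co-occurring with events in another within a small tolerance window – rather than measures prescribed by the target articles.

```python
# Illustrative operationalizations (my own assumptions, not the target articles' measures)
# of isochrony and synchrony from event-onset times, in seconds.
import numpy as np

def isochrony_index(onsets):
    """Coefficient of variation of inter-onset intervals: 0 means metronomic regularity."""
    intervals = np.diff(np.sort(onsets))
    return intervals.std() / intervals.mean()

def synchrony_rate(onsets_a, onsets_b, tolerance=0.05):
    """Proportion of events in series A co-occurring (within `tolerance` s) with an event in B."""
    onsets_b = np.sort(onsets_b)
    hits = sum(np.min(np.abs(onsets_b - t)) <= tolerance for t in onsets_a)
    return hits / len(onsets_a)

# Toy example: a clock-like series and a slightly jittered partner series
clock = np.arange(0.0, 10.0, 0.5)                                   # isochronous onsets
partner = clock + np.random.default_rng(1).normal(0.0, 0.02, clock.size)

print(isochrony_index(clock))          # 0.0: perfectly regular
print(isochrony_index(partner))        # small, but no longer exactly 0
print(synchrony_rate(partner, clock))  # close to 1: the two series co-occur in time
```

Note that the synchrony measure is agnostic about isochrony: two irregular but co-occurring series would still score high, in line with the definition above.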
Figure 1. Conceptualization of the four abilities partly explored in the target articles, plus a fifth one, vocal rhythms, which deserves to enter the discussion. Isochrony, when present in acoustic or motoric behaviors, may provide a clear, extremely predictable temporal grid, much like the squared notebooks that guide children learning to write. An isochronous pattern is, per se, neither musical nor demanding to produce or perceive. Isochrony has low entropy, markedly lower than expected for “musical” patterns (Milne & Herff, 2020; Ravignani & Madison, 2017). Production of isochrony can result from a motoric behavior entraining to a neural oscillator. Perception of isochrony requires, at least, comparing pairs of temporal intervals, an ability found in several species (e.g., Church & Lacourse, 1998; Heinrich, Ravignani, & Hanke, 2020; Ng, Garcia, Dyer, & Stuart-Fox, 2020). Whereas isochrony is characterized by equal timing within one series of events, synchrony requires pairwise coincidence of events from two series, neither of which needs to be isochronous (Ravignani, 2017). Given an acoustic sequence (black), beat induction consists of inferring an isochronous pulse (gray), which need not physically exist in the sequence (Honing, 2012; Kotz et al., 2018). Synchronization differs from beat induction in being independent of isochrony, relatively inflexible, achievable only for a narrow range of tempi, and unimodal (Patel, Iversen, Bregman, & Schulz, 2009). Vocal learning – here with emphasis on its spectral domain – includes, among other things, the capacity to copy (gray) a vocal signal (black) (Lattenkamp & Vernes, 2018; Wirthlin et al., 2019). A vocal rhythm (black) is a temporal pattern of events that conveys most of its information in the temporal domain (Ravignani et al., 2019) and can also be learnt or imitated (gray).
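The low-entropy claim in the caption above can be illustrated with a minimal sketch: the inter-onset intervals of a clock-like sequence fall into a single histogram bin, whereas the intervals of a more variable sequence spread across many bins and thus yield higher Shannon entropy. The binning and the toy sequences are arbitrary illustrative choices, not the analyses of Milne and Herff (2020) or Ravignani and Madison (2017).

```python
# A toy illustration of why an isochronous pattern has low entropy: its interval
# distribution collapses into one bin, while a variable sequence spreads over many.
import numpy as np

def interval_entropy(onsets, n_bins=10):
    """Shannon entropy (bits) of the binned inter-onset-interval distribution."""
    intervals = np.diff(np.sort(onsets))
    counts, _ = np.histogram(intervals, bins=n_bins)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(0)
isochronous = np.arange(0.0, 20.0, 0.5)                # clock-like onsets (s)
variable = np.cumsum(rng.uniform(0.2, 0.8, size=40))   # irregularly timed onsets (s)

print(interval_entropy(isochronous))  # 0.0 bits: every interval is identical
print(interval_entropy(variable))     # roughly 3 bits: intervals spread across bins
```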
Do other animals have these capacities supporting musicality? Isochrony appears in many species' communication (e.g., from lobster rattles to sea lion barks: Patek & Caldwell, 2006; Schusterman, 1977), autonomously regulated behavior, or (neuro)physiology. Synchrony is widespread but scattered across taxonomic groups (Ravignani, Bowling, & Fitch, 2014; Wilson & Cook, 2016). Vocal learning is rare but potentially arose multiple times in evolution because of different pressures across species (Garcia & Ravignani, 2020; Martins & Boeckx, 2020; Nowicki & Searcy, 2014). Beat induction has only been found in a few animals, as acknowledged by Savage and colleagues (Kotz et al., 2018; cf. Mehr et al., who claim its presence in many species).
Savage and colleagues briefly characterize these four abilities; this invites discussion of cross-species implications and of predictions as to how they evolved to support musicality. I add a fifth, still largely unexplored capacity: vocal rhythms, which consist of producing, perceiving, learning, or imitating signals with accuracy in the temporal – as opposed to the spectral – domain. Although this capacity to precisely time one's vocalizations is related to its spectral counterpart, vocal rhythms also have their own mechanistic and communicative value (Wirthlin et al., 2019). I argue that, across species, these five capacities are linked, and I map them onto Savage et al.'s framework.
The core of Savage et al.'s idea of melodic and rhythmic musicality features vocal learning and beat induction. These are also at the core of an influential hypothesis in evolutionary neuroscience (Patel, 2006), which in some cases predicts their joint co-occurrence across species. However, a few outlier species point to a mismatch between the current data and the hypothesis's predictions (Cook, Rouse, Wilson, & Reichmuth, 2013), requiring an updated theoretical framework.
Within Savage et al.'s framework, I argue that rhythm and melody may have gradually bootstrapped each other in humans and other species, especially in social interactions such as chorusing, turn-taking, and so forth (Christophe, Millotte, Bernal, & Lidz, 2008; Hannon & Johnson, 2005; Höhle, 2009; Ravignani et al., 2014). An isochronous sequence, such as the repetitive bark of a sea lion, provides a temporal grid of predictable sound events. Both the producer of an isochronous rhythm and its conspecifics can rely on this periodicity to learn and experiment in the spectral, hence melodic, domain during vocal learning: vocal emissions could be anchored to the onsets of the isochronous sequence (Merker et al., 2009). Hence, rhythmic isochrony may function as a temporal grid for rehearsing learnt vocalizations (and possibly for orienting attention; Bolger, Coull, & Schön, 2014; Cason, Astésano, & Schön, 2015; Jones, 2010; Norton, 2019). In turn, learnt, consolidated vocalizations may serve as a "spectral anchor" to segment conspecifics' temporal sequences (Hyland Bruno, 2017; Lipkind et al., 2013), also generating vocal rhythms. Therefore, melodic templates acquired via vocal learning can allow increased attentional or cognitive resources to be spent on the rhythmic domain, including temporal segmentation and regularization. This provides a bootstrapping mechanism for Savage et al.'s co-evolutionary dynamics to work, and a testbench for some signaling hypotheses in Mehr and colleagues.
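This verbal hypothesis can be caricatured in a toy simulation. In the hypothetical sketch below, vocal onsets locked to the pulses of an isochronous grid (plus some motor noise) become highly predictable relative to that grid, whereas freely timed onsets do not; phase concentration relative to the grid period serves as the predictability measure. All parameters – grid period, noise level, number of emissions – are invented for illustration.

```python
# Hypothetical toy simulation of the anchoring idea: grid-locked vocal onsets (with motor
# noise) are predictable relative to the isochronous grid; freely timed onsets are not.
import numpy as np

rng = np.random.default_rng(42)
period = 0.5  # grid period in seconds, e.g., the inter-bark interval of a sea lion

# Freely timed vocal emissions: onsets drift with no relation to the grid
free = np.cumsum(rng.uniform(0.2, 0.8, 200))

# Anchored emissions: one vocalization on every second grid pulse, plus small motor noise
anchored = np.arange(0.0, 100.0, 2 * period) + rng.normal(0.0, 0.03, 100)

def phase_concentration(onsets, period):
    """Mean resultant length of onset phases relative to the grid period:
    ~1 = locked to the grid, ~0 = timing unrelated to the grid."""
    phases = 2 * np.pi * (onsets % period) / period
    return float(np.abs(np.exp(1j * phases).mean()))

print(phase_concentration(free, period))      # low: onsets unpredictable from the grid
print(phase_concentration(anchored, period))  # close to 1: onsets inherit the grid's regularity
```

Measuring predictability as phase concentration relative to the grid, rather than as regularity of the vocal onsets themselves, reflects the claim that the isochronous sequence, not the learner's own output, provides the scaffold.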
This hypothesis generates several testable predictions. First, testing species along the vocal learning continuum (Martins & Boeckx, 2020), and extending this continuum to beat induction, should reveal that species with a stronger sense of beat are found among those with more developed vocal learning capacities. Chickens, great apes, parrots, and humans are examples of species predicted to show, in this order, increasing abilities in both domains. Second, isochrony should go hand in hand with synchrony but not with beat induction, so that species with developed isochrony should also synchronize. Third, empirical evidence for the rhythm–melody scaffolding process (Cason & Schön, 2012; Emmendorfer, Correia, Jansma, Kotz, & Bonte, 2020) could be obtained from large-scale developmental datasets, which should feature both humans and nonhuman animals and contain data on as many capacities as possible from Figure 1. As ontogeny sometimes recapitulates phylogeny (e.g., Heldstab, Isler, Schuppli, & van Schaik, 2020), one would test whether the same stepwise processes hypothesized above appear in the first years of human life (Höhle, 2009). Fourth, a partial neural dissociation between rhythm and melody may occur early in life and become less severe over development; the dynamics of this dissociation could be tested via longitudinal neuroimaging studies (Bengtsson & Ullén, 2006; Salami, Wåhlin, Kaboodvand, Lundquist, & Nyberg, 2016). Fifth, within Savage et al.'s framework, physiological evidence for the gradual rhythm–melody interplay could come from measurements or manipulations of the dopaminergic reward system and the endogenous opioid system, testing whether they provide complementary, alternating effects. Finally, most of these putative links can, following Savage et al., be modulated by species-specific social factors, such as group density and social networks. Similarly, their value as honest signals can be tested, providing empirical support for Mehr et al., using, among other approaches, methods from cultural evolution research (e.g., Lumaca et al., commentary on the target article by Mehr et al.; Miton, Wolf, Vesper, Knoblich, & Sperber, 2020).
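As a sketch of how the first prediction could eventually be tested once comparative data exist, the snippet below computes a rank correlation between ordinal scores for vocal learning and beat induction across the four example species. The scores are purely hypothetical placeholders, not data, and a real analysis would additionally have to correct for phylogenetic non-independence.

```python
# A sketch of a statistical test for the first prediction; species scores are hypothetical.
from scipy.stats import spearmanr

species = ["chicken", "great ape", "parrot", "human"]
vocal_learning_score = [1, 2, 3, 4]   # hypothetical ranks along the vocal learning continuum
beat_induction_score = [1, 3, 2, 4]   # hypothetical ranks for beat induction ability

for sp, vl, bi in zip(species, vocal_learning_score, beat_induction_score):
    print(f"{sp:>10}: vocal learning = {vl}, beat induction = {bi}")

rho, p = spearmanr(vocal_learning_score, beat_induction_score)
print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")
# Under the hypothesis, rho should be strongly positive; a real comparative analysis
# would also use phylogenetically informed methods rather than a plain rank correlation.
```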
To conclude, the frameworks proposed in both target articles can benefit from a finer dissection of core abilities for musicality (Fig. 1 and Honing, commentary on the target article by Savage et al.). These must then be tested across species to infer plausible evolutionary scenarios.
Acknowledgments
I am grateful to Henkjan Honing, Koen de Reus, Laura Verga, Massimo Lumaca, and Sonja Kotz for helpful discussion and feedback.
Financial support
Andrea Ravignani is supported by the Max Planck Society via an Independent Research Group Leader position.
Conflict of interest
None.