The target article by Mehr and colleagues provides a welcome critique of prevailing evolutionary theories of music while also advancing their own credible signaling proposal. We find many aspects of this promising. However, although adaptations for rhythm and melody seem plausible, we take issue with the claim that credible signaling resulted in a “grammar-like, combinatorially generative interface” based on the “hierarchical organization of meter and tonality.”
Coalition signaling provides plausible reasons to evolve the capacity to produce and discriminate rhythmically coordinated displays. This is supported in the cited data on birds (Hall & Magrath, 2007; Tobias et al., 2016) and primates (Geissmann, 2000). But, it does not, as far as we can see, provide reasons to evolve hierarchical means of doing so. And indeed, these data only show evidence of rhythmic coordination in terms of temporal precision or synchronization and provide no evidence for or against hierarchy.
Similarly, parent–infant signaling provides evolutionary reasons for melodic signals and infant sensitivity to them. A comparison of animal contact calls (Bouchet, Blois-Heulin, & Lemasson, 2013; Leighton, 2017) and data on genomic imprinting disorders in humans (Mehr, Kotler, Howard, Haig, & Krasnow, 2017) supports these claims. But, here too: why hierarchies? For the purpose of signaling attention to an infant, or for contact calls more generally, hierarchical organization poses no obvious advantage. There is also limited evidence for contact calls being hierarchically organized. Moreover, although some brain areas show differential responses to tonal structure from birth (Perani et al., 2010), behavioral sensitivity only begins to manifest at around 4 years of age before continuing to develop into the teenage years (Brandt, Gebrian, & Slevc, 2012; Corrigall & Trainor, 2014).
Taken together, although the hierarchical properties of meter and tonality are a design feature of the musical capacity, their presence is not so clearly motivated by credible signaling.
Hierarchies, however, are not unique to music. They are found in other cognitive domains such as language (Chomsky, 1957), vision (Bill, Pailian, Gershman, & Drugowitsch, 2020), metacognition (Frith, 2012), and action planning (Miller, Galanter, & Pribram, 1960). In nonhuman primates, they are found in social learning (Byrne & Russon, 1998) and tool use (Byrne, Sanz, & Morgan, 2013; Greenfield, 1991). Musical hierarchality may, therefore, be better conceived as using generic mechanisms evolved for reasons other than as a specific adaptation for credible signaling.
We have previously argued that the hierarchality of both musical and linguistic structures derives from mechanisms originally evolved for action planning (Asano & Boeckx, 2015; see also: Fitch & Martins, 2014; Jackendoff, 2009). The inspiration for much of this thinking was Karl Lashley's (1951) prescient insight that complex actions generally, and those for music and language specifically, control their sequential manifestation through hierarchical plans. Doing so, he argued, was necessary for flexibility and robustness, especially for more complex and abstractly motivated actions in which the limitations of control by linear associative chaining are laid bare.
The primary neurocognitive mechanism underlying this capacity is hierarchical cognitive control, which comprises a combination of executive functions (maintenance, selection, and inhibition). Maintenance is subserved by prefrontal areas (together with their parietal connections) and selection and inhibition by the basal ganglia. The orchestration of these functional areas through a number of distinct cortico-basal ganglia-thalamocortical circuits enables complex and flexible behavior (Badre & Nee, 2018). Consistent with Lashley's insight, these neural circuits are implicated not only in action planning, but also in processing musical and linguistic hierarchies (Asano et al., 2021; Fitch & Martins, 2014; Jeon, Anwander, & Friederici, 2014; Slevc & Okada, 2015).
Functional explanations of behavior are essential for understanding biological evolution. But, based on these alone, the determination of how they are translated into mechanisms is too underconstrained. Are new mechanisms evolved de novo? Or are existing ones tweaked and put to new use? And then how may these and other proximate mechanisms in turn constrain the space of ultimate reasons that guides selection in a reciprocal cycle? (Laland, Sterelny, Odling-Smee, Hoppitt, & Uller, 2011). As Tinbergen (1963) suggested, the biological study of behavior (and cognitive systems, in the current paper) should give equal attention to each of four questions: mechanism, ontogeny, phylogeny, and function. Each provides unique constraints whose combined consilience is the basis for robust theory.
One notable “so what?” of all this for the target article is that adaptations for credible signaling may also have implications for language. According to our proposal, the structural complexity of both music and language partly derives from generic hierarchical cognitive control mechanisms that interface with auditory and motor systems. Compared to nonhuman primates, humans have substantially greater white-matter connectivity both within the hierarchical control circuits and through the dorsal auditory pathway that links motor, auditory, and parietal areas with the prefrontal cortex (Barrett et al., 2020; Rilling et al., 2008). Adaptations for producing and perceiving rhythmically coordinated audio-motor displays and for fine-scale vocal control of pitch conceivably include an expansion of this shared connectome (Merchant & Honing, 2014; Patel & Iversen, 2014), thus entangling the evolution of both domains. This would also be consistent with claims about music-to-language transfer effects more generally in ontogeny (Patel, 2011; Zatorre, 2013).
To conclude, the credible signaling proposal of Mehr and colleagues is commendable. But, we suggest that it can be further improved by considering interactions of proximate and ultimate causes, and specifically how this may clarify the origins of musical hierarchies.
Note
The authors contributed equally to this commentary.
Financial support
This study was supported by MEXT/JSPS Grant-in-Aid for Scientific Research on Innovative Areas #4903 (Evolinguistics) [grant number JP17H06379] and the Spanish Ministry of Science and Innovation [grant number PID2019-107042GB-I00].
Conflict of interest
None.