Savage et al.'s article presents multidisciplinary evidence supporting their music and social bonding hypothesis. The authors emphasize that in situations where language is less effective, music enhances social bonding. Why language nonetheless outstripped music in many situations over the course of evolution, however, remains an open question. We propose that the answer lies in "displacement," one of the design features of language, which allows humans to communicate about events beyond the here and now. From the perspective of cognitive neuroscience, we focus on the contributions of subcortical brain structures to both music and language. In addition to the domain-general function of the basal ganglia, we propose that the functions of the hippocampus could underlie displacement, which makes language more effective in general. Furthermore, comparative studies have identified various subcomponents of music and language in nonhuman animals; the music and social bonding hypothesis therefore cannot explain why only humans use music and language for social bonding and communication.
First, it is well established that some aspects of music and language share a common neural basis (Brown, 2000). From a clinical perspective, Shi and Zhang (2020) highlight the role of the cortical-basal ganglia loop in rhythm processing in both cognitive domains. More specifically, we propose that the basal ganglia loop is responsible for converting hierarchical structure into linear sequences in music and language, a process supported by the mechanisms of temporal prediction, motor programming, and execution. However, this domain-general function of the basal ganglia loop cannot explain why language, rather than music, became the main means of communication.
Second, what makes language more effective than music in some situations most likely lies in its property of displacement, and we propose that the hippocampus provides the neural basis for this property. Displacement is one of the design features of language, enabling humans to "talk about things that are remote in space or time (or both) from where the talking goes on" (Hockett, 1960, p. 6). It has been regarded as the most salient property of human language (Bickerton, 2009). Displacement requires mental time and space travel, which has been proposed to depend on episodic memory (Tulving, 1983) and on the ability to project oneself into different time frames (Tulving, 2001). Research has shown that the hippocampus is critical for episodic memory (e.g., Dickerson & Eichenbaum, 2010; Ergorul & Eichenbaum, 2004). The hippocampus not only binds disparate elements across space and time, but can also compare already formed representations with current perceptual input (Olsen, Moses, Riggs, & Ryan, 2012). Covington and Duff (2016) proposed that the hippocampus supports the predictive processing shared by memory and language. In the case of language, this predictive processing links incoming words with semantic knowledge and builds an interface between episodic memory and communication. Consider megafauna scavenging by early humans: if one member of a group discovered a dead deinotherium, he would need to convey information such as where and when he had found it, because he could not exploit the carcass alone and had to persuade other group members to cooperate. It is this kind of high-end scavenging that distinguishes these human ancestors from the bone-crunching Australopithecus garhi and Homo habilis.
In this sense, displacement, subserved by the hippocampus, enhances the power of language for social bonding.
Third, from the bottom-up perspective of evolutionary biology (De Waal & Ferrari, 2010), analogous or homologous mechanisms implicated in language and music have been found in other animals (Fitch, 2015; Hauser, Chomsky, & Fitch, 2002). Comparative studies have shown that subcomponents of rhythm processing and episodic-like memory, supported by the basal ganglia and the hippocampus, respectively, are present in diverse species. Rhythm processing has been proposed to comprise four subcomponents; of these, beat perception and synchronization has been detected in vocal-learning birds and mammals, and entrainment of conspecific signals occurs in both vertebrates and invertebrates (Kotz, Ravignani, & Fitch, 2018). The involvement of basal ganglia circuitry in vocal learning in birds (Jarvis, 2007) and in rhythm processing in humans (Grahn, 2009) is in line with Patel's (2008) "vocal learning and rhythmic synchronization hypothesis." Damaging the basal ganglia in zebra finches produces stuttering-like songs, a behavior with disrupted rhythm that resembles stuttering in humans with impaired basal ganglia function (Ravignani et al., 2019). Episodic-like memory, defined by the behavioral "where-what-when" criteria, has been identified in scrub jays (Clayton & Dickinson, 1998), rodents (Crystal & Smith, 2014), and nonhuman primates (Martin-Ordas, Haun, Colmenares, & Call, 2010). Evidence has shown that the hippocampus is involved in episodic-like memory in rodents (Ergorul & Eichenbaum, 2004) and monkeys (Buckley & Gaffan, 2000).
Although no direct link between the hippocampus and episodic-like memory in birds has been reported, Gould et al. (2013) found that relative hippocampal size in birds is closely related to food caching, a behavior associated with episodic-like memory. Interestingly, hippocampal size may also be linked to song plasticity in open-ended vocal-learning birds and to language learning in human adults (Zhang & Alamri, 2016). It is also worth noting that the basal ganglia and the hippocampus are evolutionarily conserved brain structures; as species evolve, these structures may be recruited for more advanced cognitive abilities while retaining their conserved functions. The subcomponents of rhythm processing and episodic-like memory identified above, subserved by the basal ganglia and the hippocampus, all appear to contribute to social bonding in different species. However, the social bonding hypothesis proposed in the target article cannot explain the evolutionary trajectory of these subcomponents.
Financial support
This research received no specific grant from any funding agency, commercial, or not-for-profit sectors.
Conflict of interest
None.