Ackermann et al. propose a model of the evolution of neural adaptations related to the production of spoken language. Although we are convinced of the importance of such adaptations, and although the authors themselves state that “the model outlined here addresses only one out of several building blocks” (sect. 7, para. 2), we would nevertheless like to offer two reflections on their article. The first concerns the assumption that independent control over the vocal folds and the upper vocal tract is somehow a given; the second concerns the ability of apes to control vocalization voluntarily.
Following Fitch (2000a), the authors assume that animal vocalizations have a source and a filter. They also appear to assume that source and filter are independent as they are in modern humans, which is not necessarily the case. In many instances, the behavior of the source is in fact strongly coupled to that of the filter (e.g., in woodwind instruments). Source-filter theory was originally formulated in the context of human speech (Fant 1960). However, the fact that independence of source and filter is a good approximation for human speech does not mean it is universally valid.
Fletcher (1993) has investigated the theory of vibrating valves and found that the independence of source and filter depends on the precise shape and configuration of the source. In addition, it depends on the ratio of resonance frequencies of the source and the filter. Titze (2008) has adapted the theory to human-like vocal folds, and found that if the frequency at which the vocal folds vibrate is near the resonance frequencies of the vocal tract, strong coupling can occur. Apparently, modern human vocal folds and vocal tracts avoid strong coupling, but it is an open question whether this was the case in our evolutionary ancestors.
The little that we do know about ape vocal anatomy appears to argue against independence of source and filter. One instance of this is the large air sacs present in all great apes (Hewitt et al. 2002), which lower the resonance frequency of the upper vocal tract considerably (de Boer 2008) and would therefore increase coupling (as found in model experiments by Riede et al. 2008). In addition, chimpanzee vocal folds (the only ones about which we have anatomical data) have so-called vocal lips (Demolin & Delvaux 2006; Kelemen 1969), and thus a very different shape from human vocal folds. Although we do not know the function of these vocal lips, this difference between two closely related species underscores the point that we should not just assume similar behavior of their vocalization systems.
In systems where source and filter cannot behave independently, the set of signals that can be produced is necessarily more limited. This consequence is demonstrated in a modeling study showing that when source and filter are closely coupled, vocalization may be more chaotic, and thus it may be more difficult to time the onset of vocalization precisely (de Boer 2012). Given these observations, it may not just be a lack of neural control that makes precise vocalizations difficult for nonhuman primates. The anatomy of their vocal folds and vocal tracts may make such vocalizations much harder as well.
Our second point of commentary is to note evidence of at least one case in which a nonhuman primate appears to have some voluntary control over her larynx in the performance of learned, species-atypical vocalizations. Koko, a human-reared, female gorilla (Patterson & Linden 1981), has been video-recorded performing numerous instances from a repertoire of play behaviors involving voluntary control over her larynx and supralaryngeal vocal tract in coordination with various gestures and action routines (Perlman et al. 2011). This repertoire includes the production of breathy-voiced sounds and glottal stops in situations that are determined by the particular play routine.
Perlman et al. (2011) describe how Koko exhibits vocal control in her play behavior of “talking” into telephones, when she often directs breathy grunt-like vocalizations into the receiver, which she holds to her mouth (voicing was observed in 42 of 68 exhalations over 11 bouts). That she exercises voluntary control over her larynx in these vocalizations is suggested by the contrast of this behavior to her routine of huffing on the lenses of eyeglasses as if to clean them. As in the real human performance of cleaning eyeglasses, Koko produces, in this case, open-mouthed audible huffs that are distinctly and without exception voiceless (as exhibited in 12 video-recorded bouts involving 25 exhalations). Another dimension of vocal control is demonstrated in her voluntary performance of a mock “cough,” which involves a glottal stop, often in coordination with a gesture in which she covers her mouth with an open hand. In several instances, she produces this behavior on command, demonstrating clear voluntary control over the closure of her glottis.
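The contrast between the two routines can be made concrete with a small calculation. The following is only an illustrative sketch, not an analysis reported by Perlman et al.: it takes the counts quoted above (42 voiced of 68 telephone exhalations, 0 voiced of 25 eyeglass exhalations) and, under the simplifying assumption that exhalations are independent observations (which ignores their grouping into bouts), computes the voiced proportions and a one-sided hypergeometric (Fisher-style) probability. The function and variable names are our own.

```python
from math import comb

# Counts quoted in the text; treating exhalations as independent is an
# assumption made for illustration only (they are in fact grouped in bouts).
telephone_voiced, telephone_total = 42, 68
eyeglass_voiced, eyeglass_total = 0, 25


def voiced_proportion(voiced, total):
    """Fraction of exhalations that were voiced."""
    return voiced / total


def prob_at_most_voiced(k_max, group_total, voiced_all, n_all):
    """Hypergeometric probability that a group of `group_total` exhalations,
    drawn from `n_all` exhalations of which `voiced_all` are voiced,
    contains at most `k_max` voiced ones."""
    unvoiced_all = n_all - voiced_all
    favourable = sum(
        comb(voiced_all, k) * comb(unvoiced_all, group_total - k)
        for k in range(k_max + 1)
    )
    return favourable / comb(n_all, group_total)


p_tel = voiced_proportion(telephone_voiced, telephone_total)
p_eye = voiced_proportion(eyeglass_voiced, eyeglass_total)

# Probability of seeing no voiced eyeglass huffs if voicing were unrelated
# to the routine, given the pooled counts.
p_one_sided = prob_at_most_voiced(
    eyeglass_voiced,
    eyeglass_total,
    telephone_voiced + eyeglass_voiced,
    telephone_total + eyeglass_total,
)

print(f"voiced, telephone routine: {p_tel:.3f}")
print(f"voiced, eyeglass routine:  {p_eye:.3f}")
print(f"one-sided probability under independence: {p_one_sided:.2e}")
```

Because the exhalations within a bout are unlikely to be independent, the resulting probability should be read as a rough indication of how sharply the two routines differ, not as a formal significance level.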
These behaviors appear to be examples of voluntary control over laryngeal motor activity outside of a species-typical audiovisual display, something that Ackermann et al. say has not been attested yet in great apes. Apparently we should not discount the possibility that apes – and by implication our last common ancestor – have more (rudimentary) abilities to control vocalization voluntarily than is often assumed.
Given that (1) control over vocalization is not just limited by neural factors, but also by purely anatomical and physiological ones, and that (2) a gorilla has been shown to have some rudimentary voluntary control over vocalization, we conclude that in the evolution of speech, anatomical and physiological adaptations to the vocal folds and the vocal tract may have been as important as neural adaptations of their control.