In this thought-provoking, highly inspiring article, Burkart et al. explore the possibility of the existence of general intelligence in nonhuman animals. Given the evidence for g in humans, it is a reasonable and worthwhile endeavor to look for its existence in other taxa. However, to pursue a psychometric approach to nonhuman intelligence, it is necessary to obtain relevant and reliable data. As the authors themselves admit, evolutionary plausibility does not amount to empirical evidence.
Within-species comparisons
For more than a century, psychometricians have devised IQ tests to measure human intelligence. However, the breadth of test items is quite narrow. The tasks are, for the most part, administered in the same manner, with no or only modest variation of test-taking situation, motivation, or sensory domain (Locurto et al. Reference Locurto, Benoit, Crowley and Miele2006). For instance, the WAIS-IV (Wechsler et al. Reference Wechsler, Coalson and Raiford2008) comprises four index scores, focusing on verbal comprehension, perceptual reasoning, working memory, and processing speed. Such a paper-and-pencil test may suffice to represent major components of human intelligence, but it does not tap the most interesting cognitive abilities in nonhuman animals, especially in the technical and social domains.
A crucial question in the search for the influence of an underlying general mental ability is the rationale behind which tests are included in the test batteries and the reliability of those tests for uncovering cognitive abilities. Tests measure performance, not cognitive abilities per se. A huge number of possible noncognitive factors may influence performance, from anatomical to perceptual and motivational. Therefore, it is important to know which cognitive tasks and which controls are included in the test battery. Human IQ tests are often constructed in the manner of a best-case scenario, in that tasks are included in the final battery only if they correlate positively with other tasks and load positively on the first component. That is, the presence of g is assumed, and tasks are chosen that verify its presence (Locurto et al. Reference Locurto, Benoit, Crowley and Miele2006). Furthermore, human IQ tests are standardized on samples of several hundred to several thousand people of all age classes. This is not feasible with (most) nonhuman animals.
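The circularity described above can be illustrated with a toy simulation (entirely synthetic data, not drawn from any real test battery): if tasks are retained only when they load clearly on the first principal component, the apparent strength of that component is inflated by construction.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic task scores: 5 tasks share a common factor, 3 are pure noise.
g = rng.normal(size=n)
shared = np.column_stack([0.7 * g + 0.7 * rng.normal(size=n) for _ in range(5)])
noise = rng.normal(size=(n, 3))
scores = np.hstack([shared, noise])

def pc1_variance(data):
    """Fraction of variance captured by the first principal component."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    return eigvals[-1] / eigvals.sum(), eigvecs[:, -1]

full_var, loadings = pc1_variance(scores)
# Eigenvector sign is arbitrary; flip so dominant loadings are positive.
if loadings.sum() < 0:
    loadings = -loadings

# "Best-case" selection: keep only tasks loading clearly on the first component.
kept = loadings > 0.2
selected_var, _ = pc1_variance(scores[:, kept])

print(f"PC1 variance, all tasks:      {full_var:.2f}")
print(f"PC1 variance, selected tasks: {selected_var:.2f}")
```

The selected battery shows a markedly stronger first component than the full task set, even though nothing about the underlying abilities has changed; the selection procedure alone produces the "positive manifold."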
Between-species comparisons
Large data sets for valid comparisons are only possible if we collect data from different labs. But can we rely on data sampled in different labs, using (slightly) different methods (different stimuli, apparatuses, procedures, etc.) and groups of subjects differing in important features like housing and rearing conditions, individual experiences, age, and sex composition? This is both a practical and a theoretical problem. It would demand an enormous amount of labor, money, space, and other resources to test a large sample of species in one lab. Even if one has access to a zoo or game park, testing the abilities that tap reasoning in nonhuman cognition is a difficult and time-consuming business. Furthermore, if the tasks were designed to tap different response systems, sensory modalities, and motivations, the undertaking would be larger still.
Therefore, the evidence for general intelligence on the interspecific level so far rests on meta-analyses. This strategy is based on the assumption that the frequency of reported observations of complex traits associated with behavioral flexibility is a reflection of that species' intellectual capability. For instance, Reader and Laland (Reference Reader and Laland2002) used indices of innovation, tool use, and social learning for their correlations. But is innovation really a direct outcome of a cognitive trait of a species? The relation is vague and the behavioral definitions are rather slippery. Furthermore, most of these meta-analyses rely on observation frequency, which may deviate widely from the experimentally proven existence of a cognitive trait in a species. For instance, reports of true imitation in callitrichids are very rare, but rigorous laboratory tests have proven its existence (Voelkl & Huber Reference Voelkl and Huber2000; Voelkl & Huber Reference Voelkl and Huber2007). The same is true of invisible displacement in Callithrix jacchus (Mendes & Huber Reference Mendes and Huber2004). Tool use may be the best example of the problem with drawing conclusions about species differences in general intelligence based on publication counting. It is an important ability in chimpanzees, New Caledonian crows, and Galápagos woodpecker finches. However, these species have no clear, experimentally proven cognitive superiority over their non-tool-using relatives, bonobos, carrion crows, or tree finches, respectively (Gruber et al. Reference Gruber, Clay and Zuberbuhler2010; Herrmann et al. Reference Herrmann, Hare, Call and Tomasello2010a; Teschke et al. Reference Teschke, Cartmill, Stankewitz and Tebbich2011; Reference Teschke, Wascher, Scriba, von Bayern, Huml, Siemers and Tebbich2013).
This led to the conclusion that habitual tool use is not a clear predictor of general intelligence, nor even of physical intelligence (Emery & Clayton Reference Emery and Clayton2009). Although it would be unfair to dismiss the meta-analytical studies completely, they at least require substantiation by experimental data collected with similar methods across large samples of species (Healy & Rowe Reference Healy and Rowe2007). So far, such experimental comparisons are rare, and where available, they do not support the meta-analytical studies. All four experimental comparisons listed in Table 5 of Burkart et al.'s target article lack clear-cut evidence for G.
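The gap between report frequency and underlying capability is easy to illustrate with a deliberately artificial model (all numbers are invented for illustration): if published observations scale with both a species' trait strength and the observation effort it receives, a heavily studied species can out-"report" a more capable but rarely studied one.

```python
import numpy as np

rng = np.random.default_rng(42)

# Two hypothetical species: B has the stronger underlying trait,
# but A receives far more observation effort. All values are invented.
capability = np.array([1.0, 2.0])   # "true" trait strength (A, B)
effort = np.array([100.0, 5.0])     # e.g., cumulative observation hours (A, B)

# Model reported observations as Poisson counts scaling with both factors.
reports = rng.poisson(capability * effort)

# Publication counting ranks A above B, reversing the true ordering.
print("reports:", reports)
print("ranking reversed:", reports[0] > reports[1] and capability[0] < capability[1])
```

Under these assumptions the count-based ranking is dominated by effort, which is exactly the confound that experimentally controlled comparisons are meant to remove.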
Reasoning
Burkart et al. claim that “recent studies are consistent with the presence of general intelligence in mammals” (in the Abstract), which is defined as the ability to reason, plan, and think abstractly (Gottfredson Reference Gottfredson1997). However, the only cited reasoning study outside of rodents (Anderson Reference Anderson1993; Wass et al. Reference Wass, Denman-Brice, Light, Kolata, Smith and Matzel2012) found no evidence for g (Herrmann & Call Reference Herrmann and Call2012). The author of this commentary has found evidence for reasoning by exclusion in several nonhuman animals (Aust et al. Reference Aust, Range, Steurer and Huber2008; Huber Reference Huber, Watanabe, Blaisdell, Huber and Young2009; O'Hara et al. Reference O'Hara, Auersperg, Bugnyar and Huber2015; Reference O'Hara, Schwing, Federspiel, Gajdon and Huber2016), but so far, evidence for g in these species is lacking.
Finally, concerning the search for g or G in nonhuman animals, caution toward overgeneralization is warranted. The few supportive studies in rodents and primates, two taxa that together represent about 20% of mammalian species and only 2% of vertebrates, cannot be generalized to “nonhuman animals.” Primatologists especially may be at risk of overemphasizing cognitive continuity between humans and nonhuman animals, instead of seeing radiation of traits outward in all directions (Hodos & Campbell Reference Hodos and Campbell1969; Shettleworth Reference Shettleworth2010a). The search for (human-like) general intelligence (based on reasoning) should be complemented by an appreciation of convergent evolution (Emery & Clayton Reference Emery and Clayton2004; Reference Emery and Clayton2009; Fitch et al. Reference Fitch, Huber and Bugnyar2010; Güntürkün & Bugnyar Reference Güntürkün and Bugnyar2016).