The target article by Burkart et al. is a valuable study, bringing together lines of evidence that have heretofore seldom been considered together (Locurto 1997). I do have several concerns about the viability of marking a general factor in nonhumans using either species differences or individual differences. I also have a more minor quibble about the definition of general intelligence (g) itself. The authors, quoting Gottfredson (1997, p. 13), offer a rather complex definition of general intelligence that one might call unnecessarily impenetrable: “the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience.” The authors add: “It is thus not merely book learning, a narrow academic skill, or test-taking smarts. Rather it reflects a broader and deeper capability for comprehending our surroundings – ‘catching on,’ ‘making sense’ of things, or ‘figuring out what to do’” (sect. 1.1, para. 1; from Gottfredson 1997, p. 13).
I offer a simpler definition, based on Charles Spearman's original work in this area. It was Spearman (1904) who first developed the idea of a general faculty, based on his study of individual differences in the performance of schoolchildren across a variety of tasks, some sensory/perceptual, such as pitch discrimination, others more fully representative of cognitive functioning, such as school grades (see, in particular, Spearman 1904, p. 291). Spearman defined the general factor as tapping “the eduction of correlates” (or, more fully, “the eduction of relations and correlates,” Spearman 1927, pp. 165–66). I love the simplicity and sheer elegance of the “eduction of correlates” expression, and I think it suffices in place of more complex definitions. The implication of Spearman's definition was that g was better conceptualized as a single process – mental energy and the like – rather than as a series of strung-together mechanisms that functioned as a whole because of overlapping microprocesses (see Mackintosh 1998 for a presentation of the overlapping-mechanisms idea for g). Although the target article favors density in the definition of g, I think Spearman's original simplicity remains defensible.
The marking of a general factor by looking for systematic differences between nonhuman species (G) is potentially compromised by Euan Macphail's argument that species differences in cognitive performance may be the result of differences in what he called contextual variables (Macphail 1982; 1987) – that is, all of the sensory, motoric, motivational, and other factors that might differ between species and consequently might masquerade as cognitive differences. The end point of this argument is that we may not be able to reject Macphail's hypothesis that all nonhuman species are capable of all types of learning/cognition. This argument may appear easily rendered moot (after all, isn't a chimpanzee capable of more complex cognition than a frog?), but it has proven more resilient than initially expected. To their credit, the authors cite Macphail's argument, and they offer a reasonable rebuttal: perhaps not all tasks are affected by this problem to the same extent. Reversal learning tasks, for instance, adapt each species to the task during initial acquisition before measuring the rapidity of reversal. Tasks like this might therefore be seen as mitigating initial between-species differences in reaction to contextual variables. But the problem posed by contextual variables is more insidious than the authors recognize. To fully account for these contextual confounds, one would have to expose different species to rather strenuous parametric work, in which potential confounds are systematically examined across a given dimension – for example, studying species differences in reversal learning across a number of sensory modalities: visual, olfactory, tactile, and so forth. That kind of work is unlikely to be done, and, as a consequence, Macphail's argument remains a thorn in our collective side.
The study of within-species individual differences is a more promising avenue for identifying markers of a general process. Systematic individual differences have been observed in nonhumans, particularly in mice, and these differences are not confounded by differences in noncognitive factors such as overall activity levels (Locurto & Scanlon 1998; Locurto et al. 2006). However, an important, perhaps even critical limitation of such studies is that they lack something that is commonplace in studies of human g – namely, what is called predictive or criterion-related validity (Anastasi 1961). In psychometrics, validity refers to what a test measures. Predictive validity refers to the effectiveness of a test in forecasting behavior in domains outside of the test content per se. To assess it, there must be independent measures of what the test is designed to predict – independent in the sense of lying outside the province of the test items themselves. In the human literature, the predictive validity of intelligence tests is not at issue: g is a reasonably good predictor of various measures of life outcome, including school achievement, the probability of occupational success, social mobility, and even health and survival. g is better at predicting such variables than are specific cognitive abilities on their own (Locurto 1991). The many criteria external to the test itself that correlate with human g represent a powerful measure of real-life success.
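The logic of predictive validity described above can be made concrete with a small numerical sketch (all numbers and variable names here are hypothetical, not drawn from any cited study): simulate a battery of task scores that share a latent general factor, extract a g score as the first principal component of the battery, and then correlate that score with an external criterion generated outside the battery itself.

```python
import numpy as np

# Hypothetical illustration of predictive validity (not real data):
# a latent general ability drives performance on a battery of tasks,
# and also partly drives an external "life outcome" criterion.
rng = np.random.default_rng(0)
n_subjects, n_tasks = 500, 6

latent_g = rng.normal(size=n_subjects)            # latent general ability
loadings = rng.uniform(0.5, 0.9, size=n_tasks)    # assumed task loadings on g
battery = latent_g[:, None] * loadings + rng.normal(scale=0.7,
                                                    size=(n_subjects, n_tasks))

# Score g as the first principal component of the standardized battery.
z = (battery - battery.mean(axis=0)) / battery.std(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
g_score = z @ eigvecs[:, -1]                      # component with largest eigenvalue

# External criterion: influenced by latent g, but measured outside the battery.
criterion = 0.6 * latent_g + rng.normal(scale=0.8, size=n_subjects)

# Predictive validity = correlation between the g score and the criterion
# (absolute value, since the sign of a principal component is arbitrary).
predictive_validity = abs(np.corrcoef(g_score, criterion)[0, 1])
print(predictive_validity)
```

The point of the sketch is the structure, not the numbers: content validity would be shown by adding a seventh task that loads on the same component, whereas predictive validity requires the separate `criterion` variable, which never enters the battery.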
There is nothing similar in the nonhuman literature on g, although there have been important findings that stretch the initial g battery to include a number of additional processes that seem reasonably related to what g should measure, such as selective attention, working memory, and tests of reasoning (Matzel et al. 2011b; Sauce et al. 2014). These extensions are valuable, but they do not constitute extra-domain assays; they are simply additional cognitive tasks that load on the initial g. Adding tasks in this way establishes a type of validity called content validity, but it is not predictive validity. The authors recognize this issue, and in their Table 7 they offer a series of additional categories of evidence, some of which are forms of predictive validity, that would be useful going forward. The authors end by raising the critical question: does (nonhuman) g predict success in real life? Only if that question can be successfully addressed can we conclude that g is not uniquely human.