How generalizability claims should be justified has been a point of contention in psychology for decades. There are two well-established methodological perspectives on the issue. The falsificationist approach, a deductive strategy, consists of severely testing claims to discover the limits of their generalizability (Popper, 1959). The confirmationist approach, an inductive strategy, consists of accumulating single facts that collectively build partially confirmed generalizability claims (Carnap, 1936). Both approaches provide coherent and effective ways to evaluate generalizability claims in science. Yarkoni proposes a third approach built on the impossible ideal of verifying (i.e., conclusively confirming) generalizability claims through random-effect modeling. Unsurprisingly, he concludes that this approach is practically impossible to apply, because infinitely many factors exist that could moderate the generalizability of effects. Yarkoni's “crisis” narrative conflates the impossibility of his own approach to achieving a goal with the impossibility of achieving the goal itself. Generalizability claims are by definition based on extrapolation and go beyond the data (Shadish, Cook, & Campbell, 2001). Generalizations are therefore always either speculations based on tentative assumptions that await falsification, or beliefs incrementally strengthened through partial confirmations.
The falsificationist strategy is summarized by Mook (1983, p. 380): “We are not making generalizations, but testing them.” Falsificationists test predictions of a theory, along with a ceteris paribus clause which posits that “nothing else is at work except factors that are totally random” (Meehl, 1990, p. 111). If the ceteris paribus clause holds, the claim is generalizable. Yarkoni is correct that ceteris paribus is often not literally true (cf. Meehl, 1990). Systematic non-trivial factors exist. However, all theories are necessarily simplifications: A map is never meant to be the territory (Bateson, 1972, p. 459). The challenge is to identify, from an infinite set of possible factors that falsify the theory's generalizability claim, which do so in a way that actually matters (Box, 1976). For example, although temperature may have a tiny impact on the Stroop effect, nobody considers it plausible that this impact would be meaningful enough to warrant study.
If experiments yield data that are too heterogeneous to be explained by a theory, either the theory or the ceteris paribus clause is falsified. If the latter option is chosen, a less general theory is proposed. Even when well-corroborated by the data, generalizability claims are only tentatively accepted. Lakatos (1978) reminds us that theories will always have unresolved problems, which is acceptable as long as our theories are “good enough” (Meehl, 1990, p. 115).
The confirmationist strategy is summarized by Carnap (1936, p. 425): “We cannot verify the law, but we can test it by testing its single instances. … If in the continued series of such testing experiments no negative instance is found but the number of positive instances increases then our confidence in the law will grow step by step.” Within a confirmationist framework, researchers start by observing a single (often the most prototypical) instance of the investigated phenomenon. If subsequent observations enlarge the set of positive instances predicted by the theory, researchers increase their belief in its generalizability. Since verification is deemed impossible, confirmationists aim to specify the extent to which a generalizability claim is supported.
Yarkoni is not satisfied with either strategy and invents a third approach that we call neo-operationalism. Yarkoni's core argument is that generalizability claims need to be strictly data-driven, that is, based on random-effect modeling. He believes this is a feasible approach to “closely align” verbal and statistical hypotheses, which should lead to well-warranted generalizability claims.
His proposal cannot work for two reasons. First, we can only close the gap between concepts and their measures by stochastically sampling operationalizations from some underlying population if the meaning of the concept is identical to the population of its operationalizations. This is true for what MacCorquodale and Meehl (1948) call abstractive concepts, such as “color” in the Stroop task, which is identical to the colors in the visible spectrum. However, many concepts in psychology are hypothetical (e.g., “anger”), meaning they are semantically richer than and cannot be reduced to their operationalizations (e.g., anger is not just what anger measures measure). Thus, as long as psychologists want to theorize via hypothetical concepts, random-effect modeling cannot bridge the gap between verbal and statistical hypotheses, no matter how expansive the fitted model is (Green, 1992; Leahey, 1980).
Second, exhaustively defining a universe of operationalizations is impossible for hypothetical concepts (cf. the Bear & Phillips commentary in this treatment): Such a “universe” would be too vast, too theory-laden, and most probably too time-dependent to be definable. Together, these points imply that statistical hypotheses will never be perfectly aligned with verbal hypotheses involving hypothetical concepts. Yarkoni's proposed solution could work if scientists limited themselves to abstractive concepts, but since Yarkoni still recommends the use of concepts such as “anger” or “charitable donation” in titles, which go well beyond any specific operationalizations, limiting psychological science to abstractive concepts seems too big a sacrifice.
Yarkoni's inductive neo-operationalism clearly does not sing from the same songbook as the deductive methodological falsificationist approach of tentatively accepting the ceteris paribus clause. But neo-operationalism is also in conflict with confirmationism, although both accounts are inductivist. For Yarkoni, claims can only be generalized as far as they are aligned with the model that is fitted. By contrast, the confirmationist does not try to bridge the gap between verbal and statistical models. Partial confirmation is all one can get. To the extent that the generalizability claim is supported by novel data, the belief in it increases.
Yarkoni's recommendation to deal with the generalizability “crisis” is to clearly indicate that any extrapolation beyond the data is speculation (Sect. 6.3.1, para. 1). Rarely has there been a crisis solved more easily than by adding “going beyond the data” before a generalizability claim. Moreover, “going beyond the data” essentially means either tentative acceptance or partial confirmation. The diagnosis of a “crisis” is unwarranted when the two tried and tested approaches to justifying generalizability claims, the falsificationist and the confirmationist approach, already deliver what they promise. This leads us to conclude only one thing: There is no generalizability crisis.
Acknowledgment
Thanks to Leo Tiokhin for feedback on an earlier draft.
Funding
This work was funded by VIDI Grant 452-17-013 from the Netherlands Organisation for Scientific Research, and by the European Union and the Turkish Scientific and Technological Research Council under the Horizon 2020 Marie Skłodowska-Curie Actions Cofund program Co-Circulation2.
Conflict of interest
None.