I argue that the underlying problem reflected in the target article is that psychologists often apply inappropriate schemas when evaluating our own research. Applying the wrong evaluative schema can cause us to ignore questions of generalizability when they are crucial – but can also cause us to emphasize generalizability when it is of secondary importance.
In order to explicate my reasoning, I reinterpret Benjamin's (1941) taxonomy of common scientific “modes of explanation.” As I argue below, the issue of generalizability should be treated differently in the different modes.
Phase I: name
Sometimes, scientists simply identify and name a phenomenon, and provide us with a prescription for using their neologism ourselves. As an example, consider the effect of “hedonic adaptation” – the observation that individuals may adapt to an improvement/decrement in their life circumstances, and return to baseline levels of well-being (Frederick & Loewenstein, 1999).
In this “naming” mode (prevalent in social psychology), it is not crucial to demonstrate that the phenomenon is omnipresent. It is informative if some of the people, some of the time, exhibit “hedonic adaptation,” “ingroup bias,” or “pluralistic ignorance.” Surely, we will be interested to know who exhibits the phenomenon and when, but this is a matter for the next phase of the scientific process.
Phase II: analysis/description
Once we have identified an entity, we may start to characterize its properties. For example, the relation between A (e.g., Conservatism) and B (e.g., Happiness) has form C and is moderated by D. A scientist can advance our understanding by finding such regularities – even without providing us with a full-fledged theory.
To provide such analyses, scientists rely on induction. Principled induction entails specifying a model embodying our assumptions regarding the possible nature of variability in the world and in our study; when we omit sources of variability, we sweep them under the ceteris paribus rug (e.g., assuming stimuli will work the same, people will act the same). Sources of variability are infinite, and cannot be fully estimated or cogently assumed away. This means that all inductions entail a leap of faith – and are considered logically invalid (e.g., Hume, 2003).
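As a concrete illustration (my own schematic notation, not a model taken from the target article), such an inductive model might be a regression with explicit variance components for persons and stimuli:

\[
y_{ps} = \beta_0 + \beta_1 x_{ps} + u_p + w_s + \varepsilon_{ps}, \qquad u_p \sim \mathcal{N}(0,\sigma_u^2),\; w_s \sim \mathcal{N}(0,\sigma_w^2),\; \varepsilon_{ps} \sim \mathcal{N}(0,\sigma_\varepsilon^2),
\]

where $y_{ps}$ is the response of person $p$ to stimulus $s$. Omitting the stimulus term $w_s$ amounts to assuming that all stimuli behave alike – one of the infinitely many ceteris paribus assumptions described above.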
However, not all inductions are equal. As our observations become more comprehensive, and when our leaps of faith are relatively cogent, our inductions become better – in the sense that they are more likely to hold in novel contexts. Yarkoni is clearly right in saying that we'd be happy to find very general laws, or at least laws that apply to some well-known scope.
Nonetheless, I think Yarkoni is mistaken if he is saying that the way to evaluate any research that generalizes is by assessing the congruence between the scientists' summary of their results and what the results “actually” show (i.e., assessing the strength of their “inductive argument”). Litigating the congruency of evidence against belief is a proper way to test bona fide theories (see below). It is the wrong schema for evaluating purely descriptive generalizations, because the generalizability of a pattern is fully independent of the claims made about its generalizability. In a research paper, we (should) have what we need to independently assess the appropriate scope of generalization (indeed, the target article makes such assessments); therefore, it is irrelevant whether an author is grandstanding.
Surely, as reviewers, we should tell scientists to “tone it down” when they overstate their results. However, science shouldn't be personal – we should not confuse rhetoric/aesthetics/ethics with epistemology.
Phase III: causal ontology
Once we have observed the properties of entities, we can generate claims about causes that may have given rise to these properties. For example, we may argue that people's evaluation of an event is most affected by its ending (i.e., the “end rule,” Kahneman, Fredrickson, Schreiber, & Redelmeier, 1993) because people better remember recent events (Baddeley & Hitch, 1993).
Causes are never “out there” in the world and thus must be imagined rather than directly observed (Kant, 1908). After we have imagined a causal theory, we try to empirically justify it – and can often do so by deducing and testing the theory's entailments. One of the ways through which scientists justify their theories is by fiercely challenging them, giving rise to a gradual “survival of the fittest” (Popper, 1999).
Challenging a theory can be a lengthy process, pursued through various routes. We may seek the boundaries of a theory by testing it on broad, representative samples of individuals and stimuli. However, oftentimes, a good way to challenge a theory is by testing it in narrow, unusual contexts (e.g., does the “peak-end” effect occur in people with episodic amnesia?). Thus, gathering evidence for broad applicability is one means of severely challenging a theory – not its end goal.
Phase IV: synthesis
Once we have some ontology of causes, we can seek a full-fledged “synthesis” that explains how a phenomenon can be truly accounted for in terms of a set of causes and their relations. For example, Rutledge, Skandali, Dayan, and Dolan (2014) showed how people's momentary well-being is captured by a formula that describes precise relations between causes such as “prediction error” and “expected rewards.”
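In schematic form (simplified here for illustration – the published model tracks a recency-weighted history of several event types), such a synthesis might read:

\[
\text{Happiness}(t) \;\approx\; w_0 + w_1 \sum_{j \le t} \gamma^{\,t-j}\,\mathrm{EV}_j + w_2 \sum_{j \le t} \gamma^{\,t-j}\,\mathrm{RPE}_j,
\]

where $\mathrm{EV}_j$ is the expected reward on trial $j$, $\mathrm{RPE}_j$ the corresponding prediction error, $\gamma$ a forgetting factor, and the weights $w$ are estimated from participants' momentary happiness ratings.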
The typical way to conduct such research (prevalent in cognitivist “computational modeling” studies) is to pick an operationalization of a phenomenon, and gradually try to find the model that best fits the observed data. As such, those who conduct such research can pride themselves on severely challenging their models.
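Formally, “best fit” is usually adjudicated by maximizing each candidate model's likelihood and penalizing its complexity – one common recipe among several is an information criterion:

\[
\hat{\theta}_m = \arg\max_{\theta}\, \log p(\mathrm{data} \mid \theta, m), \qquad \mathrm{AIC}_m = 2k_m - 2\log p(\mathrm{data} \mid \hat{\theta}_m, m),
\]

where $k_m$ is the number of free parameters of model $m$; the model with the lowest $\mathrm{AIC}_m$ is retained as the current best account of the data.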
A caveat of such research is that the strong focus on a specific paradigm as a benchmark often leads us to forget that the operationalization was once merely a means to study the general phenomenon. We should (but often forget to) ask the question raised by Yarkoni – can our model generalize to additional manifestations of the phenomenon? However, we must also remember that scientific progress can be slow; it is possible that models that explain behavior in a specific paradigm will eventually develop into full, generalizable accounts of a phenomenon.
Summary
The abovementioned modes of explanation can all reduce puzzlement. A better appreciation of these different modes is critical for cogent discussions concerning replicability, generalizability, and the utility of psychological science.
Financial support
This research received no specific grant from any funding agency, commercial or not-for-profit sectors.
Conflict of interest
None.