Formalism is not associated with a strong focus on context
Yarkoni correctly points out that the practice of testing verbal models with statistical tools can be problematic. The article then elaborates, focusing almost exclusively on the context-dependence of human behavior and the overall complexity of the subject matter of social science, which make it difficult to generalize research findings. I wish to contribute to the discussion from the perspective of a different behavioral discipline, experimental economics, which uses mathematical language in its theory more frequently than psychology does. I draw lessons from that discipline to show that the generalizability issues Yarkoni elaborates, while relevant, are not directly related to the lack of formalism.
In experimental economics, theories are predominantly mathematical rather than verbal. Economics employs a set of principles (preferences, beliefs, optimization, and equilibrium) from which deductive models are constructed. Models and their associated properties are pitted against each other using data, while formal rigor facilitates a clear connection among models, underlying principles, and empirical methods. Prominent scholars have long regarded the assessment of competing models, not external validity, as the main focus of experiments (Plott, 1982; Smith, 1976). Schram (2005) argued that “external validity has received much more attention in psychology than in economics. To a large extent, psychological research is inductive and based on observed empirical regularities.”
Camerer (2011) clearly explains why experimental economics has traditionally had a weaker concern for generalizability to real-life settings: “all empirical methods are trying to accumulate regularity about how behavior is generally influenced by individual characteristics, incentives, endowments, rules, norms, and other factors. A typical experiment therefore has no specific target for ‘external validity’….” According to this view, called the “scientific” view, a theory-testing experiment helps choose between different theories and connects to our current understanding of the world.
Partly because of this methodological tradition, the issues that Yarkoni develops in the main text have not received major attention in experimental economics. As Loewenstein (1999) and Levitt and List (2007) argue, external validity and sampling concerns have not received more attention in experimental economics than in psychology (but see Exadaktylos, Espín, & Branas-Garza, 2013), and contextual variables are not regularly incorporated in models as Yarkoni envisions. Duflo (2017) argues: “details that we as economists might consider relatively uninteresting are in fact extraordinarily important in determining the final impact of a policy or a regulation, while some of the theoretical issues we worry about most may not be that relevant.”
A literature comparison indicates that experimental economists do not introduce and systematically vary contextual factors more frequently than psychologists (especially within a given study, as Yarkoni advocates). Because of their interest in general principles, economists place more emphasis on homogenizing important types of stimuli and removing context (Hertwig & Ortmann, 2001). However, Levitt and List (2007) argue that cross-situational consistency of behavior is lacking, a problem that theories and methodologies need to address (e.g., see Galizzi & Navarro-Martinez, 2019). Coinciding with a possible reproducibility crisis in science (see Ioannidis, 2005; Maniadis, Tufano, & List, 2014), theoretical interest in generalizability has recently increased (Kessler & Vesterlund, 2015; List, 2020; Zizzo, 2013).
To summarize the point: in experimental economics, decades of using mathematical theories have not been accompanied by a focus on the heterogeneity of stimuli and other contextual factors. Instead, formal theory-testing is considered a domain where generalizability concerns should apply less. The problem of context-dependence in psychology may deserve to be addressed by careful statistical models and advanced experimental designs. However, the verbal representation of theories does not seem to be the culprit.
Advantages of formal theory
I argue that the lack of formal theories in psychology is more problematic for another reason: it hampers clear theoretical predictions. In economics, formalism facilitates a relatively tight logical connection between theory and predictions. Accordingly, statistical research hypotheses follow naturally from theory. Hence, it is more difficult to use ad hoc arguments to account for experimental evidence that is inconsistent with a given theory. Muthukrishna and Henrich (2019) and Ortmann (2020) also advocate mathematical formalism to help us understand what a theory predicts and what it does not.
Contrary to the main connection made in Yarkoni, a formal framework grounded in a set of overarching principles may facilitate knowledge accumulation not by allowing an arbitrary number of moderators to be considered, but by restricting the set of questions that are considered reasonable. This aspect of theory in experimental economics is now attracting some attention in psychology (Muthukrishna & Henrich, 2019). However, one needs to be cautious: while formalism makes excessive ad hoc theorizing more difficult, it does not rule it out.
Experimental economics seems to fare better in terms of replicability (Camerer et al., 2016), and rigorous theory plays a role in this. However, this rigor improves replicability primarily via some of the secondary channels mentioned in Yarkoni: making riskier predictions and explicitly comparing competing theories. Predictions in economics tend to be much more quantitative, and often the objective is estimation rather than statistical hypothesis-testing alone.
A pragmatic approach
If the target is applicability to specific domains rather than theory-testing, another approach could be used. Randomized controlled trials in development and public economics examine the performance of interventions in natural environments. This methodological approach has been compared to that of plumbers, dentists, or engineers (Duflo, 2017; Roth, 2002, 2018), and may be useful as a partial remedy to a possible “generalizability crisis.” Variability-enhancing designs that examine a large number of psychological factors may not always be pragmatic or feasible. Instead, in many cases of interest, one could focus on specific policy domains and try to emulate them. A promising approach is to assess systematically whether the effect size of a given intervention is robust to the intervention being scaled up as a full policy (Al-Ubaydli, List, & Suskind, 2017). Acknowledging the importance of scalability in concrete policy domains could be a less ambitious, but potentially useful, approach for addressing a potential generalizability crisis.
Financial support
This research received no specific grant from any funding agency, commercial or not-for-profit sectors.
Conflict of interest
None.