Yarkoni says there's a generalizability crisis, and we largely agree. In some ways it's actually worse than he suggests, because of widespread ambiguity and imprecision in specifying theories. It's very hard to test theories if they are imprecise (Smaldino, 2019, 2020). This isn't just a matter of limitations in mapping specific experiments to more general verbal constructs. Rather, those verbal constructs themselves are often so poorly described that severe tests of their applicability become nearly impossible (Mayo, 2018; Popper, 1963). In this light, accounting for more sources of variance in the manner Yarkoni recommends might even be harmful if doing so props up poorly specified theories, further insulating them from scrutiny (Smaldino, 2016). It is plausible that failure to clearly specify the components, relationships, and processes in systems of interest leads to exactly those failures to align verbal and statistical models that characterize the generalizability crisis. If one cannot specify how system components influence one another, how could one begin to guess at how observed data might vary? We suggest that this difficulty with precision, which some have named the “theory crisis” (Oberauer & Lewandowsky, 2019), is inherently interlinked with the generalizability crisis. Theories based solely on statistical correlation are notoriously hard to evaluate (Fried, 2020; Meehl, 1990).
Mechanistic explanations and formal models can help by forcing the researcher to articulate their guiding assumptions, decomposing their study system into the parts, properties, and relationships critical for well-formed hypotheses (Kauffman, 1971; Smaldino, 2017, 2020). Mechanistic explanations also allow us to ask “what if things had been different” in a way that non-mechanistic explanations cannot (Craver, 2006). When mechanistic models are operationalized as mathematical or computational models, simulation experiments can be performed that mirror real-world experiments to understand, a priori, how certain outcome variables might be affected by treatment or contextual variables (Schank, May, & Joshi, 2014). Even purely verbal mechanistic explanations are a welcome, if marginal, improvement over psychological theories that are too often founded on a series of interesting correlations, each potentially a victim of non-replicability or overgeneralization. Mechanistic models also facilitate expanded qualitative analyses, since they can often simplify systems of interest to the point where they can be represented as simple box-and-arrow diagrams for rapid comprehension, critique, correction, and extension.
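To make the idea of a simulation experiment concrete, the sketch below implements a toy mechanistic model of evidence accumulation and runs it through a simulated 2 × 2 design. The model, its parameters, and the “treatment” and “context” manipulations are our own illustrative assumptions, not taken from any cited work; the point is only to show how an operationalized mechanism lets one ask, before collecting data, how an outcome variable might vary across conditions.

```python
# A minimal sketch (illustrative assumptions only): a hypothetical mechanistic
# model of evidence accumulation, used to run a simulated 2 x 2 experiment
# before any real data are collected.
import numpy as np


def simulate_trial(drift, noise, threshold=1.0, dt=0.01, rng=None):
    """Noisy accumulation of evidence to a threshold; returns decision time (s)."""
    if rng is None:
        rng = np.random.default_rng()
    evidence, t = 0.0, 0.0
    while abs(evidence) < threshold:
        evidence += drift * dt + noise * np.sqrt(dt) * rng.normal()
        t += dt
    return t


def simulated_experiment(n_trials=2000, seed=1):
    """Cross a hypothetical 'treatment' (which raises the drift rate) with a
    contextual noise level, mirroring a 2 x 2 behavioral design."""
    rng = np.random.default_rng(seed)
    conditions = {
        ("control", "low noise"): dict(drift=0.5, noise=0.8),
        ("control", "high noise"): dict(drift=0.5, noise=1.6),
        ("treated", "low noise"): dict(drift=0.9, noise=0.8),
        ("treated", "high noise"): dict(drift=0.9, noise=1.6),
    }
    for (treatment, context), pars in conditions.items():
        times = [simulate_trial(rng=rng, **pars) for _ in range(n_trials)]
        print(f"{treatment:8s} {context:11s} mean decision time = {np.mean(times):.2f} s")


if __name__ == "__main__":
    simulated_experiment()
```

Rerunning such a script under different contextual parameter values is the a priori analogue of asking how an effect might generalize across settings, and it makes explicit which assumptions the predicted pattern depends on.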
Expanding the use of mechanistic explanations in psychology would require a substantial restructuring of the operation and training of psychological scientists. In the long run, this will likely require fairly major institutional change (Smaldino, Turner, & Contreras Kallens, 2019). It is unclear at present whether the changes will come from within psychology and related departments, or by incursion from other disciplines better trained in formal methods and interested in the juicy problems previously guarded as the domain of psychologists (Smaldino, 2020). Time will tell.
For now, there are things individuals and institutions can do in the short term to kickstart improvements. Strengthening theoretical foundations in the ways needed to mitigate the generalizability and theory crises requires increased interdisciplinarity, technical expertise, and philosophical scrutiny of assumptions (Smaldino, 2020). Agencies could help by increasing funding for interdisciplinary work between researchers using different approaches to study related problems in the social and behavioral sciences, fostering deeper collaborations between experts in modeling and complex systems and topic experts more familiar with experimental or observational methods. Such projects could also include funding for research software developers and other professional research staff to assist with technical intricacies and provide the modeling expertise required to model and analyze complex systems mechanistically. These interdisciplinary teams could collaboratively produce new cyberinfrastructure tools that make mechanistic modeling easier for novices. Later iterations could even extend these tools to automate the generation of computational models and experiments that identify sources of variance, and to derive statistical models directly from the resulting computational model (Rand, 2019).
The study of cultural evolution provides some insight into how we might think about the spread of better practices (Gervais, 2021; Smaldino et al., 2019; Smaldino & O'Connor, 2020), including the use and improvement of mechanistic explanations in the psychological literature. One approach is to treat the failure to align verbal and statistical models as a communication strategy that has been culturally transmitted to generations of psychology trainees. The lack of specificity, and the resulting lack of alignment between theory and statistical model, may be an instance of deceptive signaling (in the ecological sense, which does not imply intent to deceive), in which a lack of theoretical rigor is covered up with statistical tests and a recitation of related observed correlations. This maps the generalizability crisis onto analogous problems for which models already exist as starting points, including models of signaling in collaborative environments (Smaldino & Turner, 2020; Smaldino, Flamson, & McElreath, 2018; Tiokhin et al., 2021), the evolution of scientific knowledge on networks (O'Connor & Weatherall, 2018, 2020; Zollman, 2007, 2010, 2013), and the effect of prevailing social power on individual choices (Bergstrom, Foster, & Song, 2016; Henrich & Boyd, 2008; Higginson & Munafò, 2016; O'Connor, 2019). With further development, these models could be used to conduct “what if things were different” computational experiments under a variety of assumptions, to understand what might happen if various interpersonal or institutional changes were instituted. If any of the approaches considered seem promising in silico, that will strengthen the case for expending resources to try them in the real world.
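As a deliberately simplified example of the kind of computational experiment we have in mind, the sketch below implements payoff-biased cultural transmission of a research practice. It is not one of the cited models; the population size, payoffs, and the institutional “rigor bonus” parameter are our own illustrative assumptions. The “what if” experiment consists simply of rerunning the model while varying the institutional reward for rigorous work.

```python
# A minimal sketch (hypothetical, not one of the cited models): payoff-biased
# cultural transmission of a research practice. The "what if" lever is an
# institutional parameter that changes the payoff of rigorous work.
import random


def run(n_agents=500, generations=200, rigor_bonus=0.1, seed=0):
    """Return the final share of agents using the 'rigorous' practice.

    rigor_bonus: hypothetical institutional reward for rigorous work
    (e.g., funding or hiring credit). The computational experiment is
    simply rerunning the model at different values of this parameter.
    """
    rng = random.Random(seed)
    # Start with a small minority using the rigorous practice.
    rigorous = [rng.random() < 0.05 for _ in range(n_agents)]
    for _ in range(generations):
        # Payoffs: sloppy work publishes faster; rigor pays only if rewarded.
        payoffs = [1.0 + rigor_bonus if r else 1.2 for r in rigorous]
        # Payoff-biased copying: each agent compares itself to a random peer
        # and adopts the peer's practice if the peer is doing better.
        new = []
        for i in range(n_agents):
            j = rng.randrange(n_agents)
            new.append(rigorous[j] if payoffs[j] > payoffs[i] else rigorous[i])
        rigorous = new
    return sum(rigorous) / n_agents


if __name__ == "__main__":
    for bonus in (0.0, 0.1, 0.3):
        print(f"rigor_bonus={bonus:.1f} -> final share rigorous = {run(rigor_bonus=bonus):.2f}")
```

In this toy setting, the rigorous practice spreads only when the institutional bonus makes it at least as rewarding as the sloppy alternative; richer versions of the cited signaling and network models would be needed before drawing any real-world conclusions.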
Doing all this is likely to be hard, but worth it. A lack of mechanistic modeling at least complicates the generalizability crisis and is perhaps partly to blame for it. While this problem is frustrating, it also provides a valuable opportunity to apply social science to an important problem: its own bad state of affairs. The recent, rapid adoption of better practices in psychology and across the sciences, including replication (even if sometimes misguided), registered reports, open science initiatives, data management plans, and more, indicates that many scientists are willing to change how they work. Changes to institutional incentives must follow.
Financial support
This research received no specific grant from any funding agency, commercial or not-for-profit sectors.
Conflict of interest
None.