The generalizability challenges outlined in the target article are not unique to psychology. Artificial intelligence (AI) – which also attempts to characterize and influence complex systems – is susceptible to many similar challenges. These include random effects of “subject” (random seeds), and unrecognized, unmeasured factors that affect conclusions (Bouthillier et al., 2021; Henderson et al., 2018; Weinberger, 2020). But the fields respond differently. Each field has its own established practices for publishing and disseminating research, and these practices immunize each field against some of these challenges in distinct ways. Could a publication strategy that incorporates elements from both fields be key to achieving generalizability?
In AI, publishing is rapid and multifaceted. Blog posts describe ideas before papers are written, and sharing pre-submission preprints on arXiv is standard practice. The vast majority of novel empirical findings, whether incremental or paradigm-altering, either remain as preprints or are rapidly published in the peer-reviewed proceedings of annual conferences, rather than journals.
Publishing fast accelerates progress in AI. It allows authors to get rapid, broad feedback, and encourages early discovery of the settings where ideas do or do not generalize. Faster publishing is also more inclusive – preliminary knowledge is shared with the entire community, rather than only those who happen to know the author, or who can afford to subscribe to the right journals or attend the right conferences.
In psychology, publishing is slower. Articles are longer, typically summarizing the results of a series of closely related experiments. In an even slower process, articles are aggregated into larger reviews and meta-analyses.
Publishing slowly allows psychology to carefully explore phenomena, and to integrate the results of many studies. While the writing in individual articles may elide important factors of variation, as cautioned by the target article, psychology studies include more carefully controlled manipulation of some factors than studies in AI. Meta-analyses and reviews attempt to fill the gaps, outlining the limits of a phenomenon and integrating related works, as do journals (like this one) that explicitly encourage debate. Psychology values broader analyses and summaries as an important part of scientific research. Summaries also increase inclusivity, by making the state of knowledge readily available to those who are not directly immersed in the literature or community.
However, publishing fast and slow need not be mutually exclusive. Their benefits are complementary, and each field could learn from the other.
Psychology should incorporate faster publishing, including early preprints, dataset sharing, and conference publications. If researchers shared more preliminary and negative results (as preprints and conference papers), the field could more rapidly learn which factors of variation might be important moderators of an effect. By contrast, relying on journal publications delays the dissemination of research, and increases the bias toward positive results. Fast publishing reduces pressure to present fully developed, distinct stories, instead favoring incremental developments and collaboration across the community. This relates to dataset sharing, which has helped AI to progress, as noted in the target article (section 6.3.7). Collecting a new dataset for each paper slows the development and sharing of research. Thus, we agree that shared datasets – as well as experiment code and materials – help improve the generalizability of research.
It may seem counterintuitive to suggest that psychology should accelerate publishing, given recent arguments for more careful deliberation, even including replication and meta-analysis within papers (McShane & Böckenholt, 2017). What AI shows, however, is that ideas are more thoroughly explored by engaging the broader research community. The ultimate construction of an overarching theory should aggregate across many papers, produced by many unique groups, each with their own biases, apparatuses, and experimental techniques. Fast publishing thus seeds slow publishing; rapidly producing varied studies around a conceptual theme provides the basis for more generalizable summaries. High-level hypotheses and arguments that are not yet sufficiently supported can be shared in blog posts. Thus, individual experimental papers (especially preliminary preprints) can state more conservative, descriptive conclusions, as the target article suggests, but broader speculation and extrapolation can nevertheless be shared.
AI should incorporate slower publishing, including integrative reviews and meta-analyses. Fast publishing in AI often leads to communal knowledge about which techniques are beneficial in which settings. But this knowledge is rarely integrated or made explicit. AI would benefit from more reviews and meta-analyses that quantify variance components across many experiments (target article, sections 6.3.4–6.3.5). There is increasing evidence that unmeasured factors affect conclusions in AI, for example, environmental realism and embodiment (Hill et al., 2020), or reward scales and random seeds (Henderson et al., 2018). These factors can be partially addressed by more careful experimentation and statistics (Henderson et al., 2018; Weinberger, 2020), especially accounting for random effects. However, no single paper can explore every factor, so aggregating research is critical to achieving generalizable understanding.
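To make this concrete, the minimal sketch below illustrates one way such a meta-analysis could quantify variance components: fitting a mixed-effects model in which the method is a fixed effect, the dataset is a random effect, and seed-to-seed noise falls into the residual. The scores, method names, and datasets are simulated purely for illustration, and the availability of statsmodels is an assumption; this is a sketch of the general approach, not a prescription.

```python
# Minimal sketch: estimating how much variation in reported scores is
# attributable to datasets versus random seeds, using simulated results.
# All numbers, method names, and datasets are invented for illustration.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Simulate scores for two methods across 4 datasets and 10 seeds each.
dataset_shift = {d: rng.normal(0.0, 0.03) for d in ["A", "B", "C", "D"]}
base_score = {"baseline": 0.70, "new_method": 0.72}

rows = []
for method, base in base_score.items():
    for dataset, shift in dataset_shift.items():
        for seed in range(10):
            rows.append({
                "method": method,
                "dataset": dataset,
                "seed": seed,
                # dataset-level shift plus seed-level noise
                "score": base + shift + rng.normal(0.0, 0.02),
            })
df = pd.DataFrame(rows)

# Mixed-effects model: method is a fixed effect, dataset is a random effect,
# and the residual absorbs seed-to-seed variation.
model = smf.mixedlm("score ~ method", df, groups=df["dataset"])
result = model.fit()

print(result.summary())
print("Between-dataset variance:", float(result.cov_re.iloc[0, 0]))
print("Residual (seed-level) variance:", result.scale)
```

A meta-analysis in this spirit would replace the simulated scores with results pooled from many published experiments, letting the field see how much of a technique's reported benefit survives variation in dataset and seed.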
While incorporating both fast and slow publishing would help, this strategy comes with challenges. Implementing it would require altering incentive structures. Psychology would need to recognize the value of imperfect preprints as research contributions. AI would need to value integrative summary articles, even if their primary contribution is to clearly articulate and evaluate common knowledge within the field, rather than to propose something new. Finally, the public and press would need to avoid overinterpreting preliminary results.
There are also research challenges. While shared datasets can accelerate publishing, a “dataset-as-fixed-effect” fallacy can reduce generalizability. For example, common techniques that improve ImageNet performance are detrimental on other datasets because they bias models to rely on texture (Hermann, Chen, & Kornblith, 2020). There must be sufficient dataset diversity to ensure that the entire community does not overfit (Grootswagers & Robinson, 2021). But it is easier to identify and correct these issues by exploring, sharing results quickly, and integrating knowledge across many studies.
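One minimal guard against the dataset-as-fixed-effect fallacy is simply to report a technique's effect separately per dataset rather than only its pooled average. The sketch below uses invented scores and dataset names purely to illustrate how a mean improvement can mask per-dataset regressions.

```python
# Minimal sketch: checking whether a technique's benefit holds across datasets.
# Scores and dataset names are invented for illustration only.
import pandas as pd

scores = pd.DataFrame([
    # dataset,          baseline, with_technique
    ("ImageNet-like",       0.76, 0.79),
    ("Sketches",            0.61, 0.55),   # texture bias hurts here
    ("Medical imaging",     0.68, 0.67),
    ("Satellite imagery",   0.71, 0.73),
], columns=["dataset", "baseline", "with_technique"])

scores["delta"] = scores["with_technique"] - scores["baseline"]

print(scores.to_string(index=False))
print("Mean improvement:", scores["delta"].mean().round(3))
print("Datasets where the technique hurts:",
      scores.loc[scores["delta"] < 0, "dataset"].tolist())
```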
In summary, generalizability in both psychology and AI would be improved if both fields embraced two “systems” of publication: one rapid and reactive, and the other slower and more deliberate. By rapidly exploring many variations on an idea, and then integrating the results through broad meta-analyses and reviews, both fields could more efficiently arrive at generalizable insights about their domains of inquiry.
Acknowledgments
We thank Julie Cachia and Yochai Shavit for comments on this manuscript.
Financial support
The authors are funded by DeepMind.
Conflict of interest
None.