The focal article by Murphy (2021) raises a number of excellent points about legacy statistical reporting practices that would be well addressed by increasing transparency in the research process. Here, we argue that a complementary solution to many of the ills Murphy identifies is to treat raw data and analysis code as a "product" of the research process on par with the manuscript that describes one's research efforts. That is to say, many of the problems outlined in the focal article could be (at least partially) solved if journals required researchers to submit their raw data and analysis code as a supplement to their manuscript, to be made publicly available upon publication. Murphy specifically notes three issues associated with the increasing complexity and diversity of data-analytic methods in organizational research: (a) incorrect applications and interpretations of analyses, (b) increasing reliance on significance testing, and (c) increasing difficulty of interpretation that widens the gaps between science and practice. We argue here that requiring open sharing of raw data and analysis code is at least a partial remedy for each of these issues and would also benefit the industrial-organizational (I-O) psychology literature more broadly by increasing the transparency (and thus the credibility) of our science.
To the first point, regarding the potential for incorrect applications and interpretations of analyses, we argue that one antidote to such errors is to provide readers with the raw data and the analysis code that gave rise to the reported results in the first place. Indeed, to the extent that readers can triangulate the results reported in a manuscript against the raw data and analysis code that produced them, they can place greater confidence in the conclusions of the research. Likewise, the potential to catch errors and/or fraud in analyses during the review process increases to the extent that editors and reviewers have access to these materials when evaluating such work (Simonsohn, 2013). Ultimately, with raw data and analysis code in hand, anyone can reproduce Table 1 (and Table 2, Table 3 … Table k), as the sketch below illustrates.
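The following is a minimal sketch of what such reproduction might look like, not any particular study's actual code; the file name and variable names are hypothetical placeholders. Given a shared raw data file and the accompanying script, any reader could re-create a typical "Table 1" of descriptive statistics and correlations.

```r
# Hypothetical shared raw data file and variable names (placeholders)
dat  <- read.csv("raw_data.csv")
vars <- c("job_satisfaction", "engagement", "turnover_intentions")

# Descriptive statistics (means and standard deviations)
descriptives <- data.frame(
  variable = vars,
  mean     = sapply(dat[vars], mean, na.rm = TRUE),
  sd       = sapply(dat[vars], sd,   na.rm = TRUE)
)

# Zero-order correlations among the focal variables
correlations <- cor(dat[vars], use = "pairwise.complete.obs")

print(descriptives)
round(correlations, 2)
```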
To the second point, regarding the limitations of (and overreliance on) null hypothesis significance testing (NHST), sharing raw data and analysis code helps to overcome two related issues. First, it facilitates meta-analytic techniques, which generally focus on effect size estimation and to some extent eschew NHST logic (e.g., Hunter & Schmidt, 2004; Schmidt & Hunter, 2002). It is true that meta-analyses are typically undertaken with summary data. However, we note that (a) it is often difficult to extract the information needed to compute appropriate effect sizes from summary data alone (e.g., Rudolph & Jundt, 2017) and (b) more advanced meta-analytic procedures would certainly benefit from access to both raw and summary data (e.g., meta-analytically testing nonlinear associations; see Katz et al., 2019). Second, sharing raw data facilitates "mega-analysis" (also called "integrative data analysis"; Curran & Hussong, 2009; Eisenhauer, 2021), in which raw data sets measuring the same phenomena are pooled across multiple samples: a high-n, and thus high-powered, technique that sidesteps some of the pitfalls of NHST logic. A sketch of this approach appears below.
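As an illustration only, the sketch below pools three hypothetical shared raw data sets that measure the same constructs and fits a pooled model. The file and variable names are placeholders, and the random-intercept model is just one of several ways of accounting for sample membership in integrative data analysis.

```r
library(lme4)

# Hypothetical openly shared raw data files measuring the same constructs
files  <- c("sample1.csv", "sample2.csv", "sample3.csv")

# Pool the raw data, tagging each observation with its source sample
pooled <- do.call(rbind, lapply(seq_along(files), function(i) {
  d <- read.csv(files[i])
  d$sample <- factor(i)
  d
}))

# Pooled ("high-n") analysis with a random intercept for sample membership
fit <- lmer(outcome ~ predictor + (1 | sample), data = pooled)
summary(fit)
```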
Finally, to the third point, about interpretation fueling science–practice gaps, open sharing of raw data and analysis code empowers consumers of science on both sides of the divide to answer their own questions about the results being presented. It facilitates the broad translation of increasingly complex analyses to consumers of research across different roles (i.e., students, researchers, and practitioners, but also editors and reviewers) by providing a reproducible pipeline from data to code to manuscript, as sketched below.
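One minimal, hypothetical version of such a pipeline is sketched here: a shared analysis script computes results from the raw data and saves them, and the manuscript is then rendered from those saved results rather than from hand-typed numbers. All file and variable names are placeholders.

```r
# 01_analysis.R: read the shared raw data, run the analyses, save the results
dat     <- read.csv("raw_data.csv")
results <- list(
  r_engagement_satisfaction = cor(dat$engagement, dat$job_satisfaction,
                                  use = "complete.obs")
)
saveRDS(results, "results.rds")

# 02_render.R: knit a manuscript file that reads "results.rds", so every
# reported statistic can be traced back through the code to the raw data
rmarkdown::render("manuscript.Rmd")
```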
Implementing these ideas will require a few things, not least of which is changing norms about what counts as a "product" of the research process. First, we need to normalize the expectation that papers will include online supplemental appendices containing raw data and analysis code. Indeed, various journals in our field already (at least tacitly) encourage this. For example, in its instructions for authors, the Journal of Applied Psychology suggests, "We recommend sharing data and materials via trusted repositories." Similarly, the European Journal of Work and Organizational Psychology suggests, "Authors are encouraged to share or make open the data supporting the results or analyses presented in their paper."
Second, the move toward open sharing of raw data and analysis code will require editors and reviewers to be willing to do "extra work" by examining and evaluating these materials as part of the peer review process. However, this could also save time in the long run, as papers with shoddy data- and code-sharing efforts would be quickly dispensed with in favor of those whose authors have made the effort to do so. Moreover, these suggestions open up the potential for new editorial roles focused on raw data and code review (i.e., the "methods editor" as a stand-alone editorial role).
Finally, moving toward required open access to raw data and analysis code would arguably necessitate a broader set of knowledge and skills that goes beyond understanding complex analyses alone, particularly an understanding of how complex raw data sets and analysis code are structured. To this end, we would argue that these are contemporary competencies required of top researchers, reviewers, and editors, as well as consumers of research (including scientists and practitioners).
We can also anticipate pushback on these ideas. For example, it could be argued that researchers cannot share their raw data and/or analysis code for various reasons (e.g., the raw data contain identifying information; the code contains proprietary algorithms). A couple of rejoinders bear consideration here. First, anonymized raw data can readily be shared and, depending on the models considered, summary statistics (e.g., correlation/covariance matrices) may suffice in place of raw data (e.g., to reproduce simple regression or path analyses). Second, there are advanced tools available to researchers for creating anonymous versions of data sets that maintain the original structure of one's data (e.g., the "synthpop" package for R; Nowok et al., 2016); both workarounds are sketched below. Finally, with the advent and use of more advanced statistical analysis techniques, researchers should question the reproducibility of studies based on proprietary analysis methods and demand open-source methodologies whenever possible.
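The sketch below illustrates, under hypothetical data and variable names, the two workarounds just described: releasing a synthetic version of a sensitive data set via synthpop, and sharing only a covariance matrix and sample size, which (here via the lavaan package) is enough for readers to refit simple regression or path models.

```r
library(synthpop)
library(lavaan)

# Hypothetical original (identifiable) raw data
dat <- read.csv("raw_data.csv")

# (1) Release a synthetic data set that preserves the structure of the original
synth <- syn(dat)                     # returns a "synds" object
write.csv(synth$syn, "synthetic_data.csv", row.names = FALSE)

# (2) Share only the covariance matrix and n; readers can still refit the model
S <- cov(dat[c("x", "m", "y")])       # placeholder variable names
model <- "
  m ~ x
  y ~ m + x
"
fit <- sem(model, sample.cov = S, sample.nobs = nrow(dat))
summary(fit)
```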
Moreover, it could also rightly be argued that requiring open access to raw data and analysis code does not solve all of the issues associated with research misconduct; for example, it would not necessarily prevent people from creatively manipulating their data before sharing it. We contend, however, that it is arguably easier to spot manipulated raw data than manipulated summary data (for an exceptionally notable example, see Levelt et al., 2012). Moreover, it is almost certainly easier to spot errors in analysis code than in the results derived from it (e.g., Poldrack & Poline, 2015).
Another possible critique of our suggestions is that others may use or "scoop" one's data (i.e., use it without credit or attribution). Although this issue has been debated at length in the open-science community (e.g., Bishop, 2015; Laine, 2017), we would argue that in the rare cases in which this occurs, it is more of an issue for the "scoop-er" than the "scoop-ee." Relatedly, there is the potential fear that opening up raw data and analysis code would invite reanalysis that could debunk one's original claims (e.g., if substantial errors are found in the way the data were analyzed). However, we would argue that this is simply "good science" and as such is a feature (not a "bug") of our proposition (e.g., Silberzahn et al., 2014).
Finally, and related to several of the points raised here, it could be argued that requiring open sharing of raw data and analysis code would add "hurdles" or layers of bureaucracy to the research process. To this point, we would suggest that the value of psychological research has to some extent suffered from issues of credibility, stemming in part from a lack of transparency at various levels (see Rudolph, 2021). The effort required to remove poorly conducted research from the literature is arguably much higher than the effort required to keep such work from entering our journals in the first place. As such, the suggestions we have offered represent a far easier means of curtailing "bad science" while encouraging "good science."
In closing, we ask, "How can we encourage the open sharing of raw data and analysis code?" Beyond the requirements laid out by journals to this end (which, so far, have come in the form of "encouragements"), there are creative ways to incentivize this practice. Indeed, funding agencies often require that data eventually be deposited in public repositories (e.g., ICPSR; https://www.icpsr.umich.edu). More immediately, programs such as the Center for Open Science's "badges initiative" (https://www.cos.io/initiatives/badges) give credit to authors who adopt open science practices, including the sharing of raw data and analysis code, by awarding "badges" that are displayed graphically at the top of published research articles. Despite widespread adoption in other fields of psychology, to our knowledge, no journals in I-O psychology currently participate in this initiative. By no means are the suggestions offered here a perfect answer to each of the challenges raised by Murphy (2021). However, we would argue that moving toward more open and transparent research practices in I-O psychology, especially the open sharing of raw data and analysis code, would go a long way toward addressing these challenges. We hope that researchers and journal editors will take us up on these suggestions, and we look forward to a future in which I-O psychology research (i.e., manuscripts, but also supporting raw data and analysis code) is more accessible to all.