Zwaan et al. (2017) provide an important and timely overview of the discussion as to whether direct replications in psychology have value. Along with others (see, e.g., Royal Netherlands Academy of Arts and Sciences 2018), we agree wholeheartedly that replication should become mainstream in psychology. However, we feel that the authors missed a crucial aspect in determining whether a direct replication is valuable. Here, we argue that it is essential to first verify the results of the original study by conducting an independent reanalysis of its data or a check of reported results, before choosing to replicate an earlier finding in a novel sample.
A result is successfully reproduced if independent reanalysis of the original data, using either the same or a (substantively or methodologically) similar analytic approach, corroborates the result as reported in the original paper. If a result cannot be successfully reproduced, the original result is not reliable and it is hard, if not impossible, to substantively interpret it. Such an irreproducible result will have no clear bearing on theory or practice. Specifically, if a reanalysis yields no evidence for an effect in the original study, it is safe to assume that there is no effect to begin with, raising the question of why one would invest additional resources in any replication.
Problems with reproducibility in psychology
Lack of reproducibility might seem like a non-issue; after all, it seems guaranteed that running the same analysis on the same data will give the same result. However, there is increasing evidence that the reproducibility of published results in psychology is relatively low.
Checking the reproducibility of reported results in psychology is greatly impeded by a common failure to share data (Vanpaemel et al. 2015; Wicherts et al. 2006). Even when data are available, they are often of poor quality or not usable (Kidwell et al. 2016). Yet some reproducibility issues can be assessed by scrutinizing the papers themselves. Studies have repeatedly shown that roughly half of all published psychology articles contain at least one inconsistently reported statistical result, in which the reported p value does not match the degrees of freedom and test statistic; in roughly one in eight articles this may have affected the statistical conclusion (e.g., Bakker & Wicherts 2011; Nuijten et al. 2016; Veldkamp et al. 2014; Wicherts et al. 2011). Furthermore, there is evidence that roughly half of psychology articles contain at least one reported mean that is inconsistent with the given sample size and number of items (Brown & Heathers 2017), that coefficients in mediation models often do not add up (Petrocelli et al. 2012), and that in 41% of psychology articles the reported degrees of freedom do not match the sample size description (Bakker & Wicherts 2014).
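To illustrate the kind of check involved, the following is a minimal stand-alone sketch in Python (not the statcheck package itself, which handles many more test types and reporting conventions): it recomputes the p value from a reported t statistic and degrees of freedom and compares it with the reported p value, allowing for rounding of the reported p. The reported values in the example are hypothetical.

```python
# Minimal sketch of a statcheck-style consistency check (not the statcheck
# R package itself): recompute the two-tailed p value from a reported t
# statistic and degrees of freedom, and compare it with the reported p value,
# allowing for rounding of the reported p.
from scipy import stats


def check_t_test(t_value, df, reported_p, decimals=2, alpha=0.05):
    """Flag a reported t test whose p value does not match t and df."""
    recomputed_p = 2 * stats.t.sf(abs(t_value), df)
    inconsistent = round(recomputed_p, decimals) != round(reported_p, decimals)
    # A "gross" inconsistency is one that changes the significance decision.
    gross = inconsistent and ((recomputed_p < alpha) != (reported_p < alpha))
    return recomputed_p, inconsistent, gross


# Hypothetical reported result: t(28) = 2.20, p = .02
recomputed_p, inconsistent, gross = check_t_test(2.20, 28, 0.02)
print(f"recomputed p = {recomputed_p:.3f}, "
      f"inconsistent = {inconsistent}, gross = {gross}")
```

For this hypothetical example, the check flags an inconsistency (the recomputed p is approximately .036 rather than the reported .02) that does not change the significance decision.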
Problems that can be detected without access to the raw data are arguably just the tip of the iceberg of reproducibility issues. Studies that set out to reanalyze data from published studies have also often run into problems (e.g., Ebrahim et al. 2014; Ioannidis et al. 2009). Besides the poor availability of raw data, papers usually do not contain details about the exact analytical strategy. Researchers often seem to make analytical choices that are driven by the need to obtain a significant result (Agnoli et al. 2017; John et al. 2012). These choices can be seemingly arbitrary (e.g., the choice of control variables or the rules for outlier removal; see also Bakker et al. [2012] and Simmons et al. [2011]), which makes it hard to retrace the original analytical steps to verify the result.
Suggested solution
Performing a replication study in a novel sample to establish the reliability of a certain result is time-consuming and expensive. It is essential that we avoid wasting resources on trying to replicate a finding that may not even be reproducible from the original data. Therefore, we argue that it should be standard practice to verify the original results before any direct replication is conducted.
A first step in verifying original results can be to check whether the results reported in a paper are internally consistent. Some initial screenings can be done quickly with automated tools such as “statcheck” (Epskamp & Nuijten 2016; http://statcheck.io), “p-checker” (Schönbrodt 2018), and granularity-related inconsistency of means (“GRIM”; Brown & Heathers 2017). Especially if such preliminary checks already flag several potential problems, it is crucial that data and analysis scripts are made available for more detailed reanalysis. One could even argue that if data are not shared in such cases, the article should be retracted.
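As a rough illustration of the GRIM logic (a simplified stand-in for the published procedure, not a reimplementation of it), the sketch below checks whether a reported mean is attainable at all given the sample size and the number of integer-scale items; the reported values are again hypothetical.

```python
# Rough sketch of a GRIM-style granularity check (after Brown & Heathers 2017):
# for integer-valued items, a mean over n respondents must equal some integer
# total divided by (n * n_items), so we test whether the reported mean, at its
# reported precision, can be reached by any such fraction.
def grim_consistent(reported_mean, n, n_items=1, decimals=2):
    """Return True if reported_mean is attainable as a mean of n * n_items integers."""
    denominator = n * n_items
    nearest = round(reported_mean * denominator) / denominator  # closest attainable mean
    return round(nearest, decimals) == round(reported_mean, decimals)


# Hypothetical descriptives: single-item integer scale, n = 17
print(grim_consistent(3.48, n=17))  # False: no integer total / 17 rounds to 3.48
print(grim_consistent(3.47, n=17))  # True: 59 / 17 = 3.4706, which rounds to 3.47
```

Published GRIM implementations handle additional subtleties (e.g., ambiguous rounding conventions and multi-item subscales), but the core granularity argument is the one shown here.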
If a result can be successfully reproduced with the original data and analyses, it is worth investigating its sensitivity to alternative analytical choices. One way to do so is to run a so-called multiverse analysis (Steegen et al. 2016), in which different analytical choices are compared to test the robustness of the result. When a multiverse analysis shows that the study result is present in only a limited set of reasonable scenarios, one may not want to invest additional resources in replicating such a study. Note that a multiverse analysis still does not require any new data and is therefore a relatively cost-effective way to investigate reliability.
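For concreteness, the sketch below runs a toy multiverse on simulated data: the same two-group comparison is repeated across a small grid of hypothetical analytical choices (outlier cutoff and whether to log-transform the outcome), and the share of specifications yielding p < .05 is tallied. It illustrates the general logic of a multiverse analysis rather than the specific procedure of Steegen et al. (2016).

```python
# Toy multiverse-style sensitivity analysis on simulated data: rerun the same
# two-group comparison under a grid of (hypothetical) analytical choices and
# summarize how often the result reaches p < .05.
from itertools import product

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
group = np.repeat([0, 1], 50)                          # two-group toy design
outcome = rng.lognormal(mean=group * 0.2, sigma=0.6)   # simulated skewed outcome

results = []
for cutoff, log_transform in product([2.0, 2.5, 3.0, None], [False, True]):
    y = np.log(outcome) if log_transform else outcome
    keep = np.ones_like(y, dtype=bool)
    if cutoff is not None:                              # optional outlier exclusion
        z = (y - y.mean()) / y.std(ddof=1)
        keep = np.abs(z) < cutoff
    _, p = stats.ttest_ind(y[keep & (group == 1)], y[keep & (group == 0)])
    results.append((cutoff, log_transform, p))

n_significant = sum(p < 0.05 for *_, p in results)
print(f"{n_significant}/{len(results)} specifications give p < .05")
for cutoff, log_transform, p in results:
    print(f"  cutoff={cutoff}, log={log_transform}: p = {p:.3f}")
```

In a real application, the outcome variable and the set of analytical choices would of course come from the original study rather than being simulated.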
Reanalysis of existing data is a crucial tool for investigating the reliability of psychological results, so sharing raw data and analysis scripts should become standard practice. Journal policies can be successful in promoting this (Giofrè et al. 2017; Kidwell et al. 2016; Nuijten et al. 2017), and we hope that more journals will start requiring raw data and scripts.
In our proposal, assessing replicability is a multistep process: first, check whether the original reported results are internally consistent; second, verify the original results through independent reanalysis of the data using the original analytical strategy; third, run a sensitivity analysis to check whether the original result is robust to alternative analytical choices; and only then collect new data.