The history of psychology is characterized by revolutions. This decade is marked by the replicability revolution. One prominent feature of the replicability revolution is the publication of replication studies with nonsignificant results. The publication of several high-profile replication failures has triggered a confidence crisis.
Zwaan et al. have been active participants in the replicability revolution. Their target article addresses criticisms of direct replication studies.
One concern is the difficulty of re-creating original studies, which may explain replication failures, particularly in social psychology. This argument fails on three counts. First, it does not explain why published studies have an apparent success rate greater than 90%. If social psychological studies were difficult to replicate, the success rate should be lower. Second, it is not clear why it would be easier to conduct conceptual replication studies that vary crucial aspects of a successful original study. If social priming effects were, indeed, highly sensitive to contextual variations, conceptual replication studies would be even more likely to fail than direct replication studies; however, miraculously they always seem to work. The third problem with this argument is that it ignores selection for significance. It treats successful conceptual replication studies as credible evidence, but bias tests reveal that these studies have been selected for significance and that many studies that failed are simply not reported (Schimmack 2017; Schimmack, Heene, & Kesavan 2017).
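To make the first and third points concrete, here is a minimal simulation sketch. The sample size, effect size, and number of attempts are assumed values chosen only for illustration, not data from any actual literature: when only significant results are published, the apparent success rate approaches 100% even though most attempted studies have low power and fail.

```python
# Minimal simulation sketch with assumed parameters (n = 20 per group, true d = 0.2):
# selection for significance yields a near-perfect published success rate
# even though most attempted studies are nonsignificant.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n_per_group = 20        # assumed typical cell size
true_d = 0.2            # assumed small true effect
n_attempts = 10_000     # studies actually run

p_values = np.empty(n_attempts)
for i in range(n_attempts):
    control = rng.normal(0.0, 1.0, n_per_group)
    treatment = rng.normal(true_d, 1.0, n_per_group)
    p_values[i] = stats.ttest_ind(treatment, control).pvalue

significant = p_values < 0.05
print(f"Share of all attempts that are significant (power): {significant.mean():.2f}")
# If only significant results are submitted and published, the apparent
# success rate in the journals is 100%, regardless of the true power.
print(f"Apparent success rate among published studies: {np.mean(p_values[significant] < 0.05):.2f}")
```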
A second concern about direct replications is that they are less informative than conceptual replications (Crandall & Sherman 2016). This argument is misguided because it assumes a successful outcome. If a conceptual replication study is successful, it increases the probability that the original finding was true and it expands the range of conditions under which an effect can be observed. However, the advantage of a conceptual replication study becomes a disadvantage when a study fails. For example, if the original study showed that eating green jelly beans increases happiness and a conceptual replication study with red jelly beans does not show this effect, it remains unclear whether green jelly beans make people happier or not. Even the nonsignificant finding with red jelly beans is inconclusive because the result could be a false negative. In contrast, a failure to replicate the green jelly bean effect in a direct replication study is informative because it casts doubt on the original finding. In fact, a meta-analysis of the original and replication study might produce a nonsignificant result and reverse the initial inference that green jelly beans make people happy. Crandall and Sherman's argument rests on the false assumption that only significant studies are informative. This assumption is flawed because selection for significance renders significance uninformative (Sterling 1959).
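A simple fixed-effect meta-analysis makes the jelly bean example concrete. The numbers below are hypothetical and chosen only for illustration: a just-significant original study combined with a larger, near-null direct replication yields a nonsignificant pooled estimate, reversing the inference suggested by the original study alone.

```python
# Hypothetical numbers, purely for illustration: a fixed-effect meta-analysis of a
# just-significant original study and a larger, nonsignificant direct replication
# of the same (green jelly bean) effect yields a nonsignificant pooled estimate.
import numpy as np
from scipy import stats

# (Cohen's d, total N) -- assumed values, not real data
original = (0.70, 40)      # just significant with a small sample
replication = (0.05, 400)  # near-zero effect with a much larger sample

def meta_fixed(studies):
    """Inverse-variance weighted fixed-effect meta-analysis of Cohen's d."""
    ds = np.array([d for d, _ in studies])
    # Approximate sampling variance of d for a two-group design with n/2 per group
    vs = np.array([4 / n + d**2 / (2 * n) for d, n in studies])
    w = 1 / vs
    d_pooled = np.sum(w * ds) / np.sum(w)
    se = np.sqrt(1 / np.sum(w))
    p = 2 * stats.norm.sf(abs(d_pooled / se))
    return d_pooled, se, p

for label, study in [("original", original), ("replication", replication)]:
    d, _, p = meta_fixed([study])
    print(f"{label}: d = {d:.2f}, p = {p:.3f}")

d_pooled, se, p = meta_fixed([original, replication])
print(f"pooled: d = {d_pooled:.2f}, SE = {se:.2f}, p = {p:.3f}")
# With these assumed inputs the pooled effect is no longer significant.
```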
A third argument against direct replication studies is that there are multiple ways to compare the results of original and replication studies. I believe the discussion of this point also benefits from taking publication bias into account. Selection for significance explains why the Reproducibility Project obtained only 36% significant results in direct replications of original studies with significant results (Open Science Collaboration 2015). As a result, the significant results of original studies are less credible than the nonsignificant results of direct replication studies. This generalizes to all comparisons of original and direct replication studies. Once there is suspicion or evidence that selection for significance occurred, the results of original studies are less credible, and more weight should be given to replication studies that are not biased by selection for significance. Without selection for significance, there is no reason why replication studies should be more likely to fail than original studies. If replication studies correct mistakes in original studies and use larger samples, they are actually more likely to produce a significant result than original studies.
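The following simulation sketch, again with assumed parameters rather than a reanalysis of the Reproducibility Project, illustrates the mechanism: when original studies are published only if they are significant, exact direct replications with the same sample size succeed far less often than the published record suggests.

```python
# Illustrative simulation with assumed effect-size mixture and sample size
# (not a reanalysis of the Reproducibility Project): selection for significance
# in originals implies a much lower success rate for exact direct replications.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 20  # assumed per-group sample size for originals and replications
true_effects = rng.choice([0.0, 0.2, 0.5], size=20_000, p=[0.4, 0.4, 0.2])

def significant_study(d, n):
    a = rng.normal(0.0, 1.0, n)
    b = rng.normal(d, 1.0, n)
    return stats.ttest_ind(b, a).pvalue < 0.05

originals = np.array([significant_study(d, n) for d in true_effects])
selected = true_effects[originals]  # only significant originals get published

replications = np.array([significant_study(d, n) for d in selected])
print(f"Share of all attempted originals that were significant: {originals.mean():.2f}")
print("Apparent success rate in the published record: 1.00 (by construction)")
print(f"Direct replication success rate: {replications.mean():.2f}")  # far below 100%
```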
Selection for significance also explains why replication failures are damaging to the reputation of researchers. The reputation of researchers is based on their publication record, and this record is biased in favor of successful studies. Thus, researchers' reputations are inflated by selection for significance. Once an unbiased replication produces a nonsignificant result, the unblemished record is tainted, and it becomes apparent that a perfect published record is illusory and not the result of research excellence (a.k.a. flair). Thus, unbiased failed replication studies not only provide new evidence; they also undermine the credibility of existing studies. Although positive illusions may be beneficial for researchers' eminence, they have no place in science. It is therefore inevitable that the ongoing correction of the scientific record damages the reputation of researchers, if that reputation was earned by selective publishing of significant results. In this way, direct replication studies complement statistical tools that use the reported results of original studies to reveal selective publishing of significant results (Schimmack 2012; 2014; Schimmack & Brunner, submitted for publication).
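The general logic of such bias tests can be sketched as follows. This is a simplified illustration of the excess-success idea, not the exact procedure in the cited papers: estimate post-hoc ("observed") power from each published test statistic and compare mean observed power with the rate of significant results; a success rate well above mean observed power implies that nonsignificant results are missing.

```python
# Simplified sketch of the general logic behind bias tests (not the exact
# procedure in the cited papers): compare the published success rate with the
# mean observed power implied by the published z-statistics.
import numpy as np
from scipy import stats

# Hypothetical z-statistics from a set of published studies (all "significant")
published_z = np.array([1.98, 2.02, 2.05, 2.10, 2.20, 2.31, 2.45, 2.60])

alpha_z = stats.norm.ppf(1 - 0.05 / 2)  # two-sided 5% criterion, z ~ 1.96
# Observed power: P(significant result | true effect equals the observed z)
observed_power = stats.norm.sf(alpha_z - published_z)

success_rate = np.mean(published_z > alpha_z)
print(f"Published success rate: {success_rate:.2f}")
print(f"Mean observed power:    {observed_power.mean():.2f}")
# A success rate well above mean observed power suggests that nonsignificant
# results are missing, i.e., selection for significance.
```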