
Three strong moves to improve research and replications alike

Published online by Cambridge University Press:  27 July 2018

Roger Giner-Sorolla
Affiliation:
School of Psychology – Keynes College, University of Kent, Canterbury, Kent CT2 7NP, United Kingdom. rsg@kent.ac.uk https://www.kent.ac.uk/psychology/people/ginerr/
David M. Amodio
Affiliation:
Department of Psychology, New York University, New York, NY 10003. david.amodio@gmail.com http://amodiolab.org/
Department of Social Psychology, University of Amsterdam, 1018 WS Amsterdam, The Netherlands.
Gerben A. van Kleef
Affiliation:
Department of Social Psychology, University of Amsterdam, 1018 WS Amsterdam, The Netherlands. G.A.vanKleef@uva.nl http://www.uva.nl/profile/g.a.vankleef/

Abstract

We suggest three additional improvements to replication practices. First, original research should include concrete checks on validity, encouraged by editorial standards. Second, the reasons for replicating a particular study should be more transparent and balance systematic positive reasons with selective negative ones. Third, methodological validity should also be factored into evaluating replications, with methodologically inconclusive replications not counted as non-replications.

Type
Open Peer Commentary
Copyright
Copyright © Cambridge University Press 2018 

Although we largely agree with Zwaan et al.'s analysis, we want to add to it, drawing on our experiences with replications as authors and editors. In recent years, the successful reforms in psychology have been based on concrete suggestions with visible incentives. We suggest three such moves that Zwaan et al. might not have considered.

Anticipate replication in design

In answering concerns about context variability, Zwaan et al. suggest that original authors' reports should be more detailed and acknowledge limitations. But these suggestions miss what lets us meaningfully compare two studies across contexts: calibration of methods, independent of the hypothesis test.

Often, suspicions arise that a replication is not measuring or manipulating the same thing as the original. For example, the Reproducibility Project (Open Science Collaboration 2015) was criticized for substituting an Israeli vignette's mention of military service with an activity more common to the replication's U.S. participants (Gilbert et al. 2016). All of the methods reporting in the world cannot resolve this kind of debate. Instead, we need to know whether both scenarios successfully manipulated the same underlying construct. Whether researchers have the skill to carry out a complex or socially subtle procedure is also underspecified in most original and replication research, surfacing only as a doubt when replications fail.

Unfortunately, much original research does not include procedures to check that manipulations affected the intended construct or to validate the original measures. Such steps can be costly, especially if concerns about participant awareness require a separate validation study. Nevertheless, the highest standard of research methodology should include validation that lets us interpret both positive and negative results (Giner-Sorolla 2016; LeBel & Peters 2011). Although the rules of replication should allow replicators to add checks on methods, such checks should also be part of original research. Specifically, under the Registered Report publication format (Chambers et al. 2015), evaluation of methods precedes data collection, making it essential to plan for the interpretation of negative results. More generally, publication decisions should openly favor studies that make the effort to validate their methods.

Discuss and balance reasons to replicate

Providing a rationale for studying a particular relationship is pivotal to any scientific enterprise, but there are no clear guidelines for choosing a study to replicate. One criterion might be importance: theoretical weight, societal implications, influence through citations or textbooks, mass appeal. Alternatively, replications may be driven by doubt in the robustness of the effect. Currently, most large-scale replication efforts (e.g., Ebersole et al. 2016a; Klein et al. 2014b; Open Science Collaboration 2015) have chosen their studies either arbitrarily (e.g., by journal dates) or by an unsystematic and opaque process.

Without well-justified reasons and methods for selection, it is easy to imagine doubt motivating any replication. Speculatively, many individual replications seem to be attracted by a profile of surprising results backed by weak theory and methods. But if replications hunt the weak by choice, conclusions about the robustness of a science will skew negative. This problem is compounded by the psychological reality that findings that refute the status quo (such as failed replications) attract more attention than findings that reinforce it (such as successful replications).

Replicators (like original researchers) should provide strong justification for their choice of topic. When replication is driven by perceptions of faulty theory or implausibly large effects, this should be stated openly. Most importantly, replications should also draw on a priori selection criteria based on positive traits, such as theoretical importance or diffusion in the academic and popular literature. Indeed, we are aware of one attempt to codify some of these traits, but it has not yet been finalized or published (Lakens 2016).

Although non-replication of shaky effects can be valuable, encouragement is also needed to replicate studies that are meaningful to psychological theory and literature. Importance could be one criterion of evaluation for single replication articles. Special issues and large-scale replication projects could be planned around principled selection of important effects to replicate. The Collaborative Replications and Education Project (2018), for example, chooses studies for replication based on a priori citation criteria.

Evaluate replication outcomes more accurately

The replication movement also suffers from an underdeveloped process for evaluating the validity of its findings. Currently, replication results are reported and publicized as a success or failure. But “failure” really represents two categories: valid non-replications and invalid (i.e., inconclusive) research. In original research, a null result could reflect a true lack of effect or problems with validity (a manipulation or measure not being operationalized precisely and effectively). Validity is best established through pilot testing, manipulation checks, and the consideration of context, sample, and experimental design, and evaluated through peer review. If validity is inadequate, then the results are inconclusive, not negative.

Indeed, most replication attempts try hard to avoid inconclusive statistical outcomes, often giving themselves greater statistical power than the original study. But less attention has been paid to identifying inconclusive methodological outcomes, such as when a replication's manipulation check fails or a method is changed in a way that casts doubt on the findings. One hindrance is the attitude, sometimes seen, that direct replications do not need to meet the same standards of external peer review as original research. For example, the methods of the individual replications in Open Science Collaboration (2015) were reviewed, before data collection, only by one or two project members and an original study author.

Conclusion and recommendations

Reasons for replicating a particular effect should be made transparent, with positive, systematic methods encouraged. Replication reports and original research alike should include evidence of the validity of measures and manipulations, with standards set before data collection. Methods should be externally peer reviewed for validity by experts, with clear consequences (revision, rejection) if they are judged as inadequate. Also, when outcomes of replication are simplified into “box scores,” they should be sorted into three categories: replication, non-replication, and inconclusive. By improving the validity of replication reports, we will strengthen our science, while offering a more accurate portrayal of its state.

References

Chambers, C. D., Dienes, Z., McIntosh, R. D., Rotshtein, P. & Willmes, K. (2015) Registered reports: Realigning incentives in scientific publishing. Cortex 66:A1–A2.
Collaborative Replications and Education Project (2018) Current study list and selection methods. Available at: https://osf.io/flaue/wiki/home/.
Ebersole, C. R., Atherton, O. E., Belanger, A. L., Skulborstad, H. M., Allen, J. M., Banks, J. B., Baranski, E., Bernstein, M. J., Bonfiglio, D. B. V., Boucher, L., Brown, E. R., Budima, N. I., Cairo, A. H., Capaldi, C. A., Chartier, C. R., Chung, J. M., Cicero, D. C., Coleman, J. A., Conway, J. G., Davis, W. E., Devos, T., Fletcher, M. M., German, K., Grahe, J. E., Hermann, A. D., Hicks, J. A., Honeycutt, N., Humphrey, B., Janus, M., Johnson, D. J., Joy-Gaba, J. A., Juzeler, H., Keres, A., Kinney, D., Kirschenbaum, J., Klein, R. A., Lucas, R. E., Lustgraaf, C. J. N., Martin, D., Menon, M., Metzger, M., Moloney, J. M., Morse, P. J., Prislin, R., Razza, T., Re, D. E., Rule, N. O., Sacco, D. F., Sauerberger, K., Shrider, E., Shultz, M., Siesman, C., Sobocko, K., Sternglanz, R. W., Summerville, A., Tskhay, K. O., van Allen, Z., Vaughn, L. A., Walker, R. J., Weinberg, A., Wilson, J. P., Wirth, J. H., Wortman, J. & Nosek, B. A. (2016a) Many Labs 3: Evaluating participant pool quality across the academic semester via replication. Journal of Experimental Social Psychology 67:68–82. Available at: http://doi.org/10.1016/j.jesp.2015.10.012.
Gilbert, D. T., King, G., Pettigrew, S. & Wilson, T. D. (2016) Comment on "Estimating the reproducibility of psychological science." Science 351(6277):1037. Available at: http://doi.org/10.1126/science.aad7243.
Giner-Sorolla, R. (2016) Approaching a fair deal for significance and other concerns. Journal of Experimental Social Psychology 65:1–6.
Klein, R. A., Ratliff, K. A., Vianello, M., Adams, R. B. Jr., Bahník, S., Bernstein, M. J., Bocian, K., Brandt, M. J., Brooks, B., Brumbaugh, C. C., Cemalcilar, Z., Chandler, J., Cheong, W., Davis, W. E., Devos, T., Eisner, M., Frankowska, N., Furrow, D., Galliani, E. M., Hasselman, F., Hicks, J. A., Hovermale, J. F., Hunt, S. J., Huntsinger, J. R., IJzerman, H., John, M.-S., Joy-Gaba, J. A., Kappes, H. B., Krueger, L. E., Kurtz, J., Levitan, C. A., Mallett, R. K., Morris, W. L., Nelson, A. J., Nier, J. A., Packard, G., Pilati, R., Rutchick, A. M., Schmidt, K., Skorinko, J. L., Smith, R., Steiner, T. G., Storbeck, J., Van Swol, L. M., Thompson, D., van't Veer, A. E., Vaughn, L. A., Vranka, M., Wichman, A. L., Woodzicka, J. A. & Nosek, B. A. (2014b) Data from investigating variation in replicability: A "Many Labs" Replication Project. Journal of Open Psychology Data 2(1):e4.
Lakens, D. (2016) The replication value: What should be replicated? Blog post. Available at: http://daniellakens.blogspot.co.uk/2016/01/the-replication-value-what-should-be.html.
LeBel, E. P. & Peters, K. R. (2011) Fearing the future of empirical psychology: Bem's (2011) evidence of psi as a case study of deficiencies in modal research practice. Review of General Psychology 15(4):371–79.
Open Science Collaboration (2015) Estimating the reproducibility of psychological science. Science 349(6251):aac4716. Available at: http://doi.org/10.1126/science.aac4716.