
Putting replication in its place

Published online by Cambridge University Press:  27 July 2018

Evan Heit
Affiliation:
Division of Research on Learning, Education and Human Resources Directorate, National Science Foundation, Alexandria, VA 22314. ekheit@nsf.gov
Caren M. Rotello
Affiliation:
Department of Psychological and Brain Sciences, University of Massachusetts, Amherst, MA 01003. caren@psych.umass.edu; https://www.umass.edu/pbs/people/caren-rotello

Abstract

Direct replication is valuable but should not be elevated over other worthwhile research practices, including conceptual replication and checking of statistical assumptions. As noted by Rotello et al. (2015), replicating studies without checking the statistical assumptions can lead to increased confidence in incorrect conclusions. Finally, successful replications should not be elevated over failed replications, given that both are informative.

Type
Open Peer Commentary
Copyright
Copyright © Cambridge University Press 2018 

What is the theoretical value of direct replication? In a recent paper, we (Rotello et al. 2015) described several cases where oft-replicated studies repeated the methodological flaws of the original work. In particular, we presented examples from research on reasoning, memory, social cognition, and child welfare in which the standard method of analysis was not justified and indeed could – and in at least two cases, did – lead to erroneous inferences. Repeating the study, along with the flawed analyses, could lead to yet greater confidence in these incorrect conclusions. Most of our examples concerned conceptual rather than direct replications, in the sense that there were various purposeful design and material changes across studies. Our point was about methodology, namely that inferential errors as a result of unjustified analyses can be magnified upon replication. Contrary to the implication of the target article, we would not argue that the theoretical value of direct, or for that matter conceptual, replications is limited.

Indeed, the target article makes a compelling case for the value of replication, as well as its mainstream role in psychology. Yet we would not elevate replication over other worthwhile research practices. Using an example from Rotello et al. (2015), we reported that, for three decades beginning with Evans et al. (1983), replication studies on the belief bias effect in reasoning employed analyses such as analyses of variance on differences in response rates without checking the assumptions of those analyses. (In this example, researchers could easily do so by collecting data that would allow them to plot receiver operating characteristic [ROC] curves to see whether the relationship between correct and incorrect positive response rates is linear or curvilinear.) Checking statistical assumptions is another worthwhile research practice, the results of which sometimes will contraindicate the strategy of simply running the same analyses again. Researchers should place a high priority on checking the assumptions of their statistical analyses and their dependent measures. Just as the Reproducibility Project: Psychology (Open Science Collaboration 2015) has launched a highly successful effort to crowdsource direct replication, other worthwhile research practices, such as checking statistics, could also be crowdsourced. In light of the potential problems with difference scores and analyses of variance that place so many reasoning and recognition memory studies at risk (see also Dubé et al. 2010; Heit & Rotello 2014; Rotello et al. 2008), we would like to see a large-scale effort to check statistical assumptions across a wide range of research domains. We point to statcheck (Nuijten et al. 2016) as a promising example along these lines, although its focus to date has been on checking p values. For some research domains, checking statistical assumptions may be a higher priority than direct replications.
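To make the statistical point concrete, consider a minimal sketch in Python (hypothetical parameter values; not code from any of the cited studies). It assumes an equal-variance signal detection model and shows that two conditions with identical sensitivity (d′) but different response criteria nonetheless yield different difference scores H − F, which is exactly the kind of artifact that plotting full ROCs exposes when the ROC is curvilinear.

from scipy.stats import norm

# Hypothetical equal-variance signal detection model: sensitivity (d')
# is identical in both conditions; only the response criterion differs.
d_prime = 1.0

for label, criterion in [("condition A (neutral criterion)", 0.5),
                         ("condition B (lenient criterion)", -0.5)]:
    hit_rate = norm.sf(criterion - d_prime)  # P("valid" | valid argument)
    fa_rate = norm.sf(criterion)             # P("valid" | invalid argument)
    print(f"{label}: H = {hit_rate:.3f}, F = {fa_rate:.3f}, "
          f"H - F = {hit_rate - fa_rate:.3f}")

# H - F is about 0.38 in condition A but about 0.24 in condition B, even
# though d' is constant. An analysis of variance on such difference scores
# could therefore report a spurious accuracy effect. Collecting confidence
# ratings and plotting the ROC (H against F across several criteria)
# reveals the curvilinear form and avoids the confound.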

Likewise, we would not elevate direct replication over conceptual replication. Philosophers of science have argued that researchers should be particularly confident in a conclusion that can be repeated across diverse contexts and methods (for a review, see Heit et al. 2005). For example, Salmon (1984) described how early twentieth-century scientists developed a diverse set of experimental methods for deriving Avogadro's number (6.02 × 10²³). These methods included Brownian movement, alpha particle decay, X-ray diffraction, black body radiation, and electrochemistry. Together, these diverse methods – these conceptual replications – provided particularly strong support for the existence of atoms and molecules, going well beyond what direct replications could have accomplished. Turning back to psychology, we pose the question of whether the field learns more from N direct replications of a study or from N conceptual replications of the same study. Perhaps when N is very low there is greater value from direct replications, but as N increases the value of conceptual replications becomes more pronounced.

Finally, we would not elevate replication “successes” over replication “failures,” namely, successes or failures in obtaining the same results as a prior study. Scientists learn something important from either outcome. This point is perhaps clearer in medical research – finding evidence that a once-promising medical treatment does not work should be just as important as a positive finding. To the degree that psychological research has an influence on health and medical practices, educational practices, and public policy, finding out which results do not replicate will be crucial. Although replication failures can be associated with fluctuating contexts and post hoc explanations, we note that in much research, context is varied purposefully from study to study. In a sense, context itself is an object of study, and failures are informative. Given that a drug is effective for men, does it work for women? Given that an educational intervention is successful for native English speakers, is it successful for English language learners? Here, addressing replication failures is central to the research enterprise rather than being a problematic matter.

To conclude, the pursuit of direct replication is potentially of high theoretical value, and indeed is becoming increasingly mainstream, for example, as psychology journals devote sections to direct replication reports. However, we would place direct replication alongside other worthwhile research practices, such as conceptual replication and careful evaluation of statistical assumptions. Likewise, we would place successful replications alongside failed replications in terms of their potential to inform the field.

Footnotes

1.

Parts of this commentary are a work of the U.S. Government and are not subject to copyright protection in the United States.

2.

This material includes work performed by Evan Heit while serving at the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

Dubé, C., Rotello, C. M. & Heit, E. (2010) Assessing the belief bias effect with ROCs: It's a response bias effect. Psychological Review 117:831–63.
Evans, J. St. B. T., Barston, J. L. & Pollard, P. (1983) On the conflict between logic and belief in syllogistic reasoning. Memory and Cognition 11:295–306.
Heit, E., Hahn, U. & Feeney, A. (2005) Defending diversity. In: Categorization inside and outside the laboratory: Essays in honor of Douglas L. Medin, ed. Ahn, W.-K., Goldstone, R. L., Love, B. C., Markman, A. B. & Wolff, P., pp. 87–99. American Psychological Association.
Heit, E. & Rotello, C. M. (2014) Traditional difference-score analyses of reasoning are flawed. Cognition 131:75–91.
Nuijten, M. B., Hartgerink, C. H. J., Van Assen, M. A. L. M., Epskamp, S. & Wicherts, J. M. (2016) The prevalence of statistical reporting errors in psychology (1985–2013). Behavior Research Methods 48(4):1205–26. Available at: http://doi.org/10.3758/s13428-015-0664-2.
Open Science Collaboration (2015) Estimating the reproducibility of psychological science. Science 349(6251):aac4716. Available at: http://doi.org/10.1126/science.aac4716.
Rotello, C. M., Heit, E. & Dubé, C. (2015) When more data steer us wrong: Replications with the wrong dependent measure perpetuate erroneous conclusions. Psychonomic Bulletin and Review 22:944–54.
Rotello, C. M., Masson, M. E. J. & Verde, M. F. (2008) Type I error rates and power analyses for single-point sensitivity measures. Perception and Psychophysics 70:389–401.
Salmon, W. C. (1984) Scientific explanation and the causal structure of the world. Princeton University Press.