
Strong scientific theorizing is needed to improve replicability in psychological science

Published online by Cambridge University Press:  27 July 2018

Timothy Carsel
Affiliation:
Department of Psychology, University of Illinois at Chicago, Chicago, IL 60607. timothy.carsel@gmail.com; www.timcarsel.wordpress.com
Alexander P. Demos
Affiliation:
Department of Psychology, University of Illinois at Chicago, Chicago, IL 60607. ademos@uic.edu; www.alexanderdemos.org/
Matt Motyl
Affiliation:
Department of Psychology, University of Illinois at Chicago, Chicago, IL 60607. matt.motyl@gmail.com; www.mattmotyl.com

Abstract

The target article makes an important case for making replicability mainstream. Yet, the proposal targets a symptom rather than the underlying cause of low replication rates. We argue that psychological scientists need to devise stronger theories that are more clearly falsifiable. Without strong, falsifiable theories in the original research, attempts to replicate that research are nigh uninterpretable.

Type: Open Peer Commentary
Copyright © Cambridge University Press 2018

We applaud Zwaan et al. for compiling many of the present concerns researchers have regarding replication and for their thoughtful rejoinders to those concerns. Yet, the authors gloss over an underlying cause of low replicability in psychological science and instead focus exclusively on a symptom: that the field does not make replications a centerpiece of hypothesis testing. An underlying cause is that psychologists do not actually propose "strong," testable theories. To paraphrase Meehl (1990a), null hypothesis testing of "weak" theories produces a literature that is "uninterpretable." This is because the qualitative hypotheses generated from weak theories are not formulated specifically enough; they predict only that "X and Y will interact." Thus, any degree and form of interaction can be used to support the frequentist statistical hypothesis. Further, it is important to remember that the statistical hypothesis, that is, the alternative to the null, is never actually true (Cohen 1994) and can address only the degree of the interaction, not its form. In other words, both a disordinal interaction in an original study and an ordinal interaction in a replication would yield statistical support for the interaction hypothesis. Had the theory been stronger, the hypothesis would have predicted a specific degree and form of interaction, and the second study would have counted as a non-replication of the first. This may explain, in part, how our own examination of research practices and replication metrics of published research (Motyl et al. 2017; Washburn et al. 2018) led us to conclude that the metrics of replicability seemed to support Meehl's prediction that a poorly theorized scientific literature would produce "uninterpretable" results. Thus, the authors' concern VI (sect. 5.6) regarding point estimates (e.g., effect sizes, p-values) and their confidence intervals implicitly assumes that the original study and the replication yielded interpretable results regarding the verisimilitude of the theory. To summarize this argument, consider the example of throwing darts at a dartboard. Zwaan et al. are concerned with whether the second dart lands near the first. However, given the way psychology often works, the bullseye may be the whole wall. Thus, replication can contribute to falsification only for a theory that is well defined.
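To make the interaction point concrete, the following brief simulation is a purely illustrative sketch (not part of the published commentary; the cell means, sample size, and contrast test are our hypothetical choices). It shows that a conventional test of a 2 × 2 interaction returns a "significant" result for both a disordinal (crossover) pattern and a merely ordinal pattern, so the bare prediction "X and Y will interact" cannot distinguish between them:

# Hypothetical illustration: two 2x2 datasets -- one disordinal, one ordinal --
# both yield a "significant" interaction, so form is not distinguished by the test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 50  # observations per cell (assumed for illustration)

def interaction_test(cell_means, sd=1.0):
    """Simulate a 2x2 design and test the interaction contrast
    (m11 - m12) - (m21 - m22) against zero."""
    cells = {k: rng.normal(m, sd, n) for k, m in cell_means.items()}
    contrast = (cells["a1b1"].mean() - cells["a1b2"].mean()) \
             - (cells["a2b1"].mean() - cells["a2b2"].mean())
    # standard error of the contrast, pooled across the four independent cells
    se = np.sqrt(sum(c.var(ddof=1) / n for c in cells.values()))
    t = contrast / se
    df = 4 * (n - 1)
    p = 2 * stats.t.sf(abs(t), df)
    return round(contrast, 2), p

# Disordinal (crossover): the simple effect of A reverses sign across levels of B.
print(interaction_test({"a1b1": 1.0, "a1b2": -1.0, "a2b1": -1.0, "a2b2": 1.0}))
# Ordinal: the effect of A merely shrinks across levels of B and never reverses.
print(interaction_test({"a1b1": 1.0, "a1b2": 0.2, "a2b1": 0.0, "a2b2": 0.0}))
# Both p-values fall well below .05, yet the two patterns tell very different
# theoretical stories -- only a theory that predicts the form can tell them apart.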

The current predicament of weak theorizing may be created in part by the belief that "humans are too complicated for strong theories." Zwaan et al. speak to a symptom of this problem by stating that "context" needs to be better described in our methods sections. Psychological theories often require substantial auxiliary theories and hypotheses to "derive" the qualitative hypotheses that motivate our studies (Meehl 1990b). In short, this leads to the problem of "theoretical degrees of freedom," whereby an ambiguous theory can be re-instantiated in such a way that any result we find will be taken as support for our, in fact, unfalsifiable theories. Zwaan et al. assert, "If a finding that was initially presented as support for a theory cannot be reliably reproduced using the comprehensive set of instructions for duplicating the original procedure, then the specific prediction that motivated the original research question has been falsified (Popper 1959/2002), at least in the narrow sense" (sect. 2, para. 3). The kind of falsification advocated by Zwaan et al., however, becomes increasingly difficult the further removed a statistical hypothesis is from the qualitative hypothesis (Meehl 1990a), and a finding is rendered uninterpretable when our statistical and qualitative hypotheses become couched in an increasing number of implicit auxiliary hypotheses. Indeed, if our theories are so weak that any contextual change negates them, then they are not theories; they are hypotheses masquerading as theories. Gray (2017) proposed a preliminary method for instantiating theories visually, which forces scientists to think through their theory's concepts and relationships. This is a stronger recommendation than those made by Zwaan et al., who suggest simply being more careful about statements of generalizability. Concerns II (sect. 5.2) and IV (sect. 5.4) would be resolved with stronger theorizing, more careful derivations and discussions of statistical and qualitative hypotheses, and both direct and conceptual replications to test the boundary conditions of those theories.

In summary, we agree with the target article authors that we need to make replication more mainstream, but we argue that we must go further and encourage stronger theorizing to make replications more feasible and meaningful.

References

Cohen, J. (1994) The earth is round (p < .05). American Psychologist 49:997–1003.
Gray, K. (2017) How to map theory: Reliable methods are fruitless without rigorous theory. Perspectives on Psychological Science 12(5):731–41.
Meehl, P. E. (1990a) Why summaries of research on psychological theories are often uninterpretable. Psychological Reports 66(1):195–244. Available at: https://doi.org/10.2466/pr0.1990.66.1.195.
Meehl, P. E. (1990b) Appraising and amending theories: The strategy of Lakatosian defense and two principles that warrant it. Psychological Inquiry 1:108–41. Available at: http://doi.org/10.1207/s15327965pli0102_1.
Motyl, M., Demos, A. P., Carsel, T. S., Hanson, B. E., Melton, Z. J., Mueller, A. B., Prims, J. P., Sun, J., Washburn, A. N., Wong, K., Yantis, C. A. & Skitka, L. J. (2017) The state of social and personality science: Rotten to the core, not so bad, getting better, or getting worse? Journal of Personality and Social Psychology 113(1):34.
Washburn, A. N., Hanson, B. E., Motyl, M., Skitka, L., Yantis, C., Wong, K., Sun, J., Prims, J., Mueller, A. B., Melton, Z. J. & Carsel, T. S. (2018) Why do some psychology researchers resist using proposed reforms to research practices? A description of researchers' rationales. Advances in Methods and Practices in Psychological Science. Published online March 7, 2018. Available at: https://doi.org/10.1177/2515245918757427.