
Generalizability, transferability, and the practice-to-practice gap

Published online by Cambridge University Press: 10 February 2022

Joshua R. de Leeuw
Affiliation: Department of Cognitive Science, Vassar College, Poughkeepsie, NY 12604, USA. jdeleeuw@vassar.edu; https://www.vassar.edu/faculty/jdeleeuw/

Benjamin A. Motz
Affiliation: Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, USA. bmotz@indiana.edu; https://motzweb.sitehost.iu.edu/

Emily R. Fyfe
Affiliation: Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, USA. efyfe@indiana.edu; https://psych.indiana.edu/directory/faculty/fyfe-emily.html

Paulo F. Carvalho
Affiliation: Human-Computer Interaction Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA. pcarvalh@andrew.cmu.edu; https://sites.google.com/view/paulocarvalho

Robert L. Goldstone
Affiliation: Department of Psychological and Brain Sciences, Indiana University, Bloomington, IN 47405, USA. rgoldsto@indiana.edu; https://psych.indiana.edu/directory/faculty/goldstone-robert.html

Abstract

Emphasizing the predictive success and practical utility of psychological science is an admirable goal, but it will require a substantive shift in how we design research. Applied research often assumes that findings are transferable to all practices, insensitive to variation between implementations. We describe efforts to quantify and close this practice-to-practice gap in education research.

Type
Open Peer Commentary
Copyright
Copyright © The Author(s), 2022. Published by Cambridge University Press

Yarkoni's call for a focus on “predictive practical utility” led us to think about how scientists could adapt their methodological practices to meet this goal. One approach a scientist might take is to shift toward applied work. For example, rather than running a learning experiment under the tight controls of a laboratory setting with the aim of establishing generalizable principles, researchers might instead run a learning experiment in a live classroom and test whether theoretical predictions improve educationally relevant measures of student performance. Moving from the lab to the classroom (or any applied field) requires extensive revision of a study's structure, and it forces researchers to specify, whether implicitly or explicitly, potentially relevant covariates that might otherwise be ignored. When translating from research to practice, these implementation variables could become useful signposts, indicating where an intervention's benefits might extend. If this strategy were adopted collectively, fields might converge on reliable predictions about which interventions work in which contexts.

Unfortunately, we think this description is more aspiration than reality. While researchers now routinely run learning experiments in live classes (Motz, Carvalho, de Leeuw, & Goldstone, 2018), the predictions inferred from these studies are almost never informed by moderating variables (Koedinger, Booth, & Klahr, 2013). Studies conducted in small numbers of classes are commonly assumed to apply to all classes. Moreover, when field research yields null findings, the failure is often attributed to constraints of implementation rather than limitations of theory. For example, classroom research on retrieval practice interventions is promising (Dunlosky, Rawson, Marsh, Nathan, & Willingham, 2013), but the benefits are not consistently observed (e.g., Gurung & Burns, 2019; see also Moreira, Pinto, Starling, & Jaeger, 2019; Yang, Luo, Vadillo, Yu, & Shanks, 2021). Such mixed evidence hardly justifies the bold recommendation that retrieval practice works for “all grade levels, all subject areas, and all students” (Agarwal, Roediger, McDaniel, & McDermott, 2020, p. 6). Sweeping practical recommendations like this are common in education; in some ways they constitute the very nature of the What Works Clearinghouse, a large evidence library of recommendations, generalized from individual studies, for how to intervene usefully in education settings. Yet recommendations in education are no more reliable than the rest of psychological science: two-thirds of US federally funded impact studies found no impact (Schneider, 2018), and 50% of independent replication attempts in education fail to find evidence consistent with the original findings (Makel & Plucker, 2014). When researchers plan to intervene on the world, the crisis of generalizability is no less potent than it is for laboratory studies.

Given that applied research presently shares most of the generalizability shortcomings of laboratory work, we predict that a shift toward testing theory in applied settings will not suffice on its own. Even with such a focus, psychological scientists still tend to seek narrow, under-specified evidence of abstract principles that are assumed to generalize across settings. Closing the research-to-practice gap does not necessarily close what we will call the practice-to-practice gap: the benefits of an intervention, even when supported by field research that made accurate predictions for one practical setting, may not transfer to other practical settings.

This concept of “transferability,” more commonly associated with qualitative research, refers to the extent to which an intervention's effectiveness could be achieved in another sample and setting, whereas generalizability refers to the extent to which a sample statistic applies to the whole population and its many situations. Although it may sound synonymous with “replicability,” transferability does not presume that an invariant global effect exists in the first place, waiting to be replicated or not (Lincoln & Guba, 1986). Rather, transferability presumes that an effect is conditionally dependent on context, analogous to state dependencies in a complex systems framework (Hawe, Shiell, & Riley, 2009).

Like Yarkoni, we believe that the transferability of an outcome in practice is contingent on variables that are typically not modeled, let alone articulated, in applied psychological science. Consider a teacher who hears that retrieval practice is an effective technique for improving student learning outcomes and decides to incorporate regular practice quizzes into the curriculum. This teacher's implementation will likely deviate from the field tests where the technique was originally applied. As long as the field tests were carried out in narrow contexts, the teacher's deviations represent unmodeled sources of variance. Failing to account for this variance necessarily causes us to underestimate our uncertainty about the benefit of applying evidence-based practices in the teacher's classroom.
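
To make the intuition concrete, here is a minimal simulation sketch of this point. It is our illustration, not an analysis from any cited study, and every number in it (the average effect, the between-implementation spread, the sampling error) is hypothetical:

```python
# Hypothetical illustration: ignoring between-implementation variance
# understates our uncertainty about the benefit a *new* classroom
# should expect from an evidence-based practice.
import numpy as np

rng = np.random.default_rng(0)

mean_effect = 0.30       # hypothetical average benefit (effect-size units)
between_class_sd = 0.25  # hypothetical variation across implementations
sampling_sd = 0.10       # hypothetical sampling error from the field tests

# Naive view: the new classroom gets the average effect, give or take
# sampling error only.
naive = (mean_effect - 1.96 * sampling_sd,
         mean_effect + 1.96 * sampling_sd)

# Fuller view: the new classroom's effect also varies because its
# implementation deviates from the original field tests.
total_sd = (between_class_sd**2 + sampling_sd**2) ** 0.5
fuller = (mean_effect - 1.96 * total_sd,
          mean_effect + 1.96 * total_sd)

# Simulate many possible implementations to estimate the chance that a
# given classroom sees no benefit at all.
true_effects = rng.normal(mean_effect, between_class_sd, size=100_000)

print(f"naive 95% interval:  ({naive[0]:.2f}, {naive[1]:.2f})")
print(f"fuller 95% interval: ({fuller[0]:.2f}, {fuller[1]:.2f})")
print(f"P(no benefit in a new classroom): {np.mean(true_effects <= 0):.2f}")
```

Under these invented numbers, the naive interval excludes zero while the fuller interval does not; the qualitative lesson, not the numbers, is the point.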

So what might an alternative approach look like? Yarkoni highlights the strategy of “design[ing] research with variation in mind,” which raises the question: which sources of variation? As Yarkoni points out, introducing variation makes a study considerably more resource-intensive to run, and at some point additional sources of variation will yield diminishing returns.

In the case of applied research, we think researchers could be guided by the natural variation that occurs in actual practice. This was the strategy for our ManyClasses study (Fyfe et al., 2021). We examined how the timing of feedback on student work affected learning performance in 38 different college classes. In each class, the experiment's parameters (difficulty, frequency, length, etc.) were allowed to vary according to instructors' preferences. While the resulting set of implementations is by no means exhaustive, it provides an estimate of the variance introduced when translating learning theory into normative instructional practice.
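
For readers who want to see the general shape of such an analysis, below is a minimal multilevel-model sketch in which classes are an explicit source of variation. The data file, column names, and model specification are hypothetical stand-ins for illustration; this is not the preregistered ManyClasses analysis:

```python
# Sketch of a multilevel model for a many-classes design: students are
# nested within classes, and both the baseline and the treatment effect
# are allowed to vary from class to class. Data and columns are
# hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

# Expected (hypothetical) columns: score, immediate_feedback (0/1), class_id
df = pd.read_csv("manyclasses_like_data.csv")

model = smf.mixedlm(
    "score ~ immediate_feedback",      # fixed effect of feedback timing
    data=df,
    groups=df["class_id"],             # random intercept per class
    re_formula="~immediate_feedback",  # random slope: effect varies by class
)
fit = model.fit()
print(fit.summary())
```

For transferability, the quantity of interest is less the fixed effect than the estimated variance of the random slope: it says how much the benefit itself moves as the implementation changes.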

Purposefully introducing wide variation along theoretically important variables (e.g., Baribault et al., 2018) makes sense in laboratory contexts, because it assesses generalizability to extreme and corner cases. In contrast, when one is concerned with the transferability of a phenomenon observed in the field, incorporating representative variation in settings will often be the better strategy, because the goal is to determine how likely it is that the phenomenon will be observed in naturally occurring situations.
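
As a toy contrast between the two strategies (the context variable, its natural distribution, and the effect function below are all invented for the example):

```python
# Hypothetical contrast: probing extreme/corner settings of a context
# variable x versus sampling x as it naturally occurs in classrooms.
import numpy as np

rng = np.random.default_rng(1)

def effect(x):
    # Invented context dependence: the benefit peaks at x = 1 and falls
    # off toward the extremes.
    return 0.4 - 0.5 * (x - 1.0) ** 2

# Strategy 1: purposeful wide variation (grid over the full range)
grid = np.linspace(0.0, 2.0, 9)
print("effects at extreme/corner settings:", np.round(effect(grid), 2))

# Strategy 2: representative variation (x as it occurs in practice)
natural_x = rng.normal(1.0, 0.25, size=10_000).clip(0.0, 2.0)
samples = effect(natural_x)
print(f"effect in natural settings: mean {samples.mean():.2f}, "
      f"sd {samples.std():.2f}")
```

The grid reveals where the effect breaks down; the representative sample estimates how often a practitioner will actually see it.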

Financial support

Not applicable.

Conflict of interest

None.

Footnotes

* Co-first authors.

References

Agarwal, P. K., Roediger, H. L., McDaniel, M. A., & McDermott, K. B. (2020). How to use retrieval practice to improve learning. St. Louis, MO: Washington University in St. Louis. Retrieved from http://www.retrievalpractice.org
Baribault, B., Donkin, C., Little, D. R., Trueblood, J. S., Oravecz, Z., van Ravenzwaaij, D., … Vandekerckhove, J. (2018). Metastudies for robust tests of theory. Proceedings of the National Academy of Sciences, 115(11), 2607–2612.
Dunlosky, J., Rawson, K. A., Marsh, E. J., Nathan, M. J., & Willingham, D. T. (2013). Improving students' learning with effective learning techniques: Promising directions from cognitive and educational psychology. Psychological Science in the Public Interest, 14(1), 4–58. https://doi.org/10.1177/1529100612453266
Fyfe, E., de Leeuw, J. R., Carvalho, P. F., Goldstone, R., Sherman, J., Admiraal, D., … Motz, B. (2021). ManyClasses 1: Assessing the generalizable effect of immediate versus delayed feedback across many college classes. Advances in Methods and Practices in Psychological Science, 4(3), 1–24. https://doi.org/10.1177/25152459211027575
Gurung, R. A., & Burns, K. (2019). Putting evidence-based claims to the test: A multi-site classroom study of retrieval practice and spaced practice. Applied Cognitive Psychology, 33(5), 732–743. https://doi.org/10.1002/acp.3507
Hawe, P., Shiell, A., & Riley, T. (2009). Theorising interventions as events in systems. American Journal of Community Psychology, 43(3–4), 267–276. https://doi.org/10.1007/s10464-009-9229-9
Koedinger, K. R., Booth, J. L., & Klahr, D. (2013). Instructional complexity and the science to constrain it. Science, 342(6161), 935–937. https://doi.org/10.1126/science.1238056
Lincoln, Y. S., & Guba, E. G. (1986). But is it rigorous? Trustworthiness and authenticity in naturalistic evaluation. New Directions for Program Evaluation, 1986(30), 73–84.
Makel, M. C., & Plucker, J. A. (2014). Facts are more important than novelty: Replication in the education sciences. Educational Researcher, 43(6), 304–316. https://doi.org/10.3102/0013189X14545513
Moreira, B. F. T., Pinto, T. S. S., Starling, D. S. V., & Jaeger, A. (2019). Retrieval practice in classroom settings: A review of applied research. Frontiers in Education, 4(5), 1–16. https://doi.org/10.3389/feduc.2019.00005
Motz, B. A., Carvalho, P. F., de Leeuw, J. R., & Goldstone, R. L. (2018). Embedding experiments: Staking causal inference in authentic educational contexts. Journal of Learning Analytics, 5(2), 47–59. https://doi.org/10.18608/jla.2018.52.4
Schneider, M. (2018). A more systematic approach to replicating research. IES Director's Blog, Institute of Education Sciences. Retrieved from https://ies.ed.gov/director/remarks/12-17-2018.asp
Yang, C., Luo, L., Vadillo, M. A., Yu, R., & Shanks, D. R. (2021). Testing (quizzing) boosts classroom learning: A systematic and meta-analytic review. Psychological Bulletin, 147(4), 399–435. https://doi.org/10.1037/bul0000309