Lee and Schwarz (L&S) present a “theory of grounded procedures” that aims to account for empirical findings relating to cleansing and other physical actions (henceforth “cleansing effects”). In sect. 1.2, they report two forms of evidence that they argue indicate that cleansing effects are robust: (a) meta-analytic research and (b) replication studies. Although we applaud their consideration of robustness issues, we argue that they have not provided convincing evidence for the existence of cleansing effects.
L&S summarize the results of a meta-analysis (currently unpublished, with data unavailable) of experimental studies of cleansing effects (Lee, Chen, Ma, & Hoang, 2020a) that estimates the overall effect size to be “in the small-to-medium range and highly significant” (sect. 1.2, para. 2). Moreover, they claim that converging evidence from fail-safe N, trim-and-fill, and normal quantile plots shows that “publication bias alone was unlikely to account for the existence of cleansing effects” (sect. 1.2, para. 2). However, we agree with Ropovik et al. (this treatment) that this conclusion is unwarranted because these bias detection methods rely on untestable assumptions and have been superseded by more sophisticated methods. In addition, we note that these methods are particularly inappropriate for assessing this literature because, as L&S note, effect sizes are “highly heterogeneous” (sect. 1.2, para. 2). Fail-safe N does not take heterogeneity in effect sizes into account at all (Iyengar & Greenhouse, 1988), whereas trim-and-fill provides misleading results when heterogeneity is present (Peters, Sutton, Jones, Abrams, & Rushton, 2007; Terrin, Schmid, Lau, & Olkin, 2003). Removing large positive effects identified in a normal quantile plot is also inappropriate because these large effects may be genuine if the studies are heterogeneous. Consequently, we encourage Lee and colleagues to re-examine the evidence for publication bias in their upcoming meta-analysis using state-of-the-art methods such as Bayesian fill-in meta-analysis (Du, Liu, & Wang, 2017), PET-PEESE (Stanley & Doucouliagos, 2014), and p-uniform* (van Aert & van Assen, 2020).
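To illustrate what such a re-analysis involves, below is a minimal Python sketch of the conditional PET-PEESE estimator (Stanley & Doucouliagos, 2014); the effect sizes and standard errors are hypothetical, and an actual re-analysis would of course use the full meta-analytic dataset and an established implementation.

```python
import numpy as np
from scipy import stats

def wls_intercept(y, x, w):
    """Weighted least squares of y on x; returns the intercept and a
    one-tailed p-value for the test that the intercept exceeds zero."""
    X = np.column_stack([np.ones_like(x), x])
    xtwx_inv = np.linalg.inv(X.T @ (w[:, None] * X))
    beta = xtwx_inv @ X.T @ (w * y)
    resid = y - X @ beta
    df = len(y) - 2
    sigma2 = (w * resid**2).sum() / df        # weighted residual variance
    se0 = np.sqrt(sigma2 * xtwx_inv[0, 0])    # standard error of intercept
    return beta[0], 1 - stats.t.cdf(beta[0] / se0, df)

def pet_peese(d, se, alpha=0.05):
    """Conditional PET-PEESE estimate of the bias-corrected mean effect.

    The intercept estimates the effect of an ideal study with zero
    sampling error, i.e., the mean effect purged of small-study bias.
    """
    d, se = np.asarray(d, float), np.asarray(se, float)
    w = 1.0 / se**2                              # inverse-variance weights
    pet_b0, pet_p = wls_intercept(d, se, w)      # PET: regress d on SE
    peese_b0, _ = wls_intercept(d, se**2, w)     # PEESE: regress d on SE^2
    # Conditional rule: use PEESE only if PET rejects "no effect"
    return peese_b0 if pet_p < alpha else pet_b0

# Hypothetical effect sizes and standard errors
d = np.array([0.55, 0.30, 0.62, 0.21, 0.48, 0.15])
se = np.array([0.24, 0.12, 0.28, 0.08, 0.22, 0.06])
print(pet_peese(d, se))
```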
Another serious concern is that the p-curve analysis conducted by Ropovik et al. (this treatment) indicates that the statistically significant replication effects reported in the target article contain no evidential value and that the large proportion of p-values just below 0.05 may have been caused by the opportunistic use of researcher degrees of freedom (Simonsohn, Nelson, & Simmons, 2014a).
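To unpack the logic of that diagnostic: under the null hypothesis of no true effect, statistically significant p-values are uniformly distributed between 0 and 0.05, whereas a true effect produces right skew (an excess of very small p-values). Here is a minimal sketch of the right-skew test of Simonsohn et al. (2014a), applied to hypothetical p-values clustered just below 0.05:

```python
import numpy as np
from scipy import stats

def pcurve_right_skew(p_values):
    """p-curve test for evidential value via Stouffer's method.

    Under H0 (no true effect), significant p-values are uniform on
    (0, 0.05), so pp = p / 0.05 is uniform on (0, 1). A right-skewed
    p-curve yields small pp-values and a negative Stouffer Z.
    """
    p = np.asarray(p_values, float)
    p = p[p < 0.05]                 # p-curve uses only significant results
    pp = p / 0.05                   # pp-value of each p under H0
    z = stats.norm.ppf(pp)          # convert pp-values to z-scores
    z_stouffer = z.sum() / np.sqrt(len(z))
    return z_stouffer, stats.norm.cdf(z_stouffer)  # small p => evidential value

# A hypothetical cluster of p-values just below .05: no evidential value
print(pcurve_right_skew([0.049, 0.041, 0.032, 0.044, 0.038]))
```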
We argue that the evaluation of evidence for cleansing effects should focus largely on preregistered studies. Preregistration is an effective approach for restricting researcher degrees of freedom and, thus, has an important role to play in resolving the replication crisis in psychology (Lakens, 2019; Nosek, Ebersole, DeHaven, & Mellor, 2018). Among other things, a high-quality preregistration includes a specification of a target sample size that prevents optional stopping, a description of primary and secondary outcomes that prevents outcome switching, and an analysis plan that constrains the use of other researcher degrees of freedom (Bakker et al., 2020; Wicherts et al., 2016). By contrast, meta-analytic methods that aim to correct for biases necessarily rely on untestable assumptions about the processes that generate those biases and about their magnitudes, which means we cannot be confident that the biases have been corrected (Carter, Schönbrodt, Gervais, & Hilgard, 2019). In other words, meta-analysis is no substitute for preregistered replication (van Elk et al., 2015).
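To make the sample-size component concrete: a preregistration fixes N before data collection, typically justified by a power analysis. A minimal sketch using the standard normal-approximation formula for a two-group comparison (the target effect size below is purely illustrative):

```python
from scipy.stats import norm

def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate per-group N for a two-sample t-test
    (normal approximation to the power function)."""
    z_alpha = norm.ppf(1 - alpha / 2)
    z_beta = norm.ppf(power)
    return 2 * ((z_alpha + z_beta) / d) ** 2

# Detecting a small-to-medium effect (d = 0.3) with 80% power
# requires roughly 175 participants per group
print(round(n_per_group(0.3)))
```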
We identified 22 replication studies (that reported results) in the target article and found that only four of them (from two publications) were preregistered (Camerer et al., 2018; Johnson, Cheung, & Donnellan, 2014b; see https://osf.io/7ehr8). Notably, each of these preregistered studies had a much larger sample (N = 219, N = 132, N = 123, and N = 286) than the studies they attempted to replicate (all N = 40) (Lee & Schwarz, 2010a; Schnall, Benton, & Harvey, 2008), and none of them found any evidence for the cleansing effects reported in the original studies. In fact, in all four studies the point estimate for the effect size was very close to zero (d = −0.01, d = 0.01, r = −0.07, and r = −0.05). In addition, we have identified a large multisite replication project (N = 7,001) not cited by L&S that included a test of a cleansing effect (Klein et al., 2018). This study attempted to replicate Study 2 of Zhong and Liljenquist (2006) (N = 27) across 50 sites and found no evidence for the predicted effect (d = 0.00). This fits a general pattern in the psychology literature: preregistered replication studies fail to reproduce original findings at a much higher rate than one would expect given the large effect sizes reported in the original studies (Camerer et al., 2018; Open Science Collaboration, 2015), including for effects that had been supported by meta-analyses of non-preregistered studies (Kvarven, Strømland, & Johannesson, 2020).
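The consistency of these null results can also be summarized quantitatively. The sketch below converts the two correlations to Cohen's d and pools all five preregistered estimates by inverse-variance weighting; it assumes two equal-sized groups within each study, which is an approximation rather than something we verified in the source articles:

```python
import numpy as np

def r_to_d(r):
    """Convert a point-biserial correlation to Cohen's d."""
    return 2 * r / np.sqrt(1 - r**2)

def var_d(d, n_total):
    """Sampling variance of d, assuming two groups of n_total / 2."""
    return 4 / n_total + d**2 / (2 * n_total)

# The five preregistered replication estimates cited above
d = np.array([-0.01, 0.01, r_to_d(-0.07), r_to_d(-0.05), 0.00])
n = np.array([219, 132, 123, 286, 7001])
w = 1 / var_d(d, n)                       # inverse-variance weights

pooled = (w * d).sum() / w.sum()          # dominated by Klein et al. (2018)
se = np.sqrt(1 / w.sum())
print(f"pooled d = {pooled:.3f} (95% CI +/- {1.96 * se:.3f})")
```

On these inputs the pooled estimate is essentially zero, with a confidence interval tight enough to exclude even small effects.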
Because researcher degrees of freedom are curtailed in preregistered studies (though not eliminated entirely; see Bakker et al., 2020; Claesen, Gomes, Tuerlinckx, & Vanpaemel, 2020), we suggest that Lee and colleagues could enhance the informativeness of their upcoming meta-analysis of cleansing effects by supplementing it with a targeted meta-analysis that includes only those studies that were preregistered. Finding meta-analytic evidence for cleansing effects in preregistered studies would considerably strengthen the case that cleansing effects are robust phenomena, whereas a failure to find such evidence would be cause for concern. A meta-analysis of the money priming effect provides an instructive example of how far the two sets of results can diverge (Lodder, Ong, Grasman, & Wicherts, 2019). The full meta-analysis of 246 money priming studies estimated an overall effect size of small-to-medium magnitude (g = 0.31; see Fig. 1, top-left plot, p. 701). By contrast, the targeted meta-analysis of the 47 preregistered studies found an average effect size that was non-significant (g = 0.01; see Fig. 1, middle-right plot, p. 701).
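Implementing such a targeted analysis requires nothing beyond the standard meta-analytic toolkit. Below is a minimal sketch of the DerSimonian-Laird random-effects estimator, run once on a full set of studies and once on the preregistered subset; the effect sizes, variances, and preregistration flags are hypothetical placeholders:

```python
import numpy as np

def dersimonian_laird(y, v):
    """DerSimonian-Laird random-effects pooled estimate and its SE."""
    y, v = np.asarray(y, float), np.asarray(v, float)
    w = 1 / v                                   # fixed-effect weights
    y_fe = (w * y).sum() / w.sum()
    q = (w * (y - y_fe) ** 2).sum()             # Cochran's Q
    c = w.sum() - (w**2).sum() / w.sum()
    tau2 = max(0.0, (q - (len(y) - 1)) / c)     # between-study variance
    w_re = 1 / (v + tau2)                       # random-effects weights
    return (w_re * y).sum() / w_re.sum(), np.sqrt(1 / w_re.sum())

# Hypothetical effect sizes (g), variances, and preregistration flags
g = np.array([0.45, 0.32, 0.51, 0.02, -0.03, 0.28, 0.01])
v = np.array([0.04, 0.05, 0.06, 0.01, 0.01, 0.05, 0.01])
prereg = np.array([False, False, False, True, True, False, True])

print(dersimonian_laird(g, v))                  # all studies
print(dersimonian_laird(g[prereg], v[prereg]))  # preregistered subset only
```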
In summary, we have argued that a scientific assessment of the evidence for cleansing effects requires the application of state-of-the-art publication bias methods and a meta-analysis of preregistered studies. As things stand, the empirical foundation for the theory of grounded procedures is tenuous.
Financial support
Robert M. Ross is supported by the Australian Research Council, Grant Number: DP180102384. Robbie C.M. van Aert and Olmo R. van den Akker are supported by the European Research Council, Grant Number: 726361 (IMPROVE).
Conflict of interest
None.