In their clearly reasoned target article, Zwaan et al. make a persuasive case that direct replication is essential for the health of psychological science. The principle of the primacy of internal validity (Cook et al. 1990) underscores the point that one must convincingly demonstrate a causal effect (internal validity) before generalizing it to similar settings, participants, measures, and the like (external validity). Some scholars appear to have overlooked the importance of this mandate. In an otherwise incisive article, my Ph.D. mentor David Lykken (1968) wrote that "Since operational replication [what Zwaan et al. term direct replication] must really be done by an independent second investigator and since constructive replication [what Zwaan et al. term conceptual replication] has greater generality, its success strongly impl[ies] that an operational replication would have succeeded also" (p. 159). Lykken, like many scholars, underestimated the myriad ways (e.g., p-hacking, file-drawering of negative results) in which conceptual replications can yield significant but spurious results (Lindsay et al. 2016). Hence, an apparently successful conceptual replication does not imply that a direct replication would have succeeded as well.
I build on Zwaan et al.'s well-reasoned arguments by extending them to a subdiscipline they did not explicitly address: clinical psychological science. Probably because recent replicability debates have been restricted largely to scholars in cognitive, social, and personality psychology (Tackett et al. 2017a), the implications of these discussions for key domains of clinical psychology, especially psychotherapy and assessment, have been insufficiently appreciated. I contend that an overemphasis on conceptual replication at the expense of direct replication can generate misleading conclusions that are potentially detrimental to clinical research and patient care.
In the psychotherapy field, attention has turned increasingly to the development and identification of empirically supported therapies (ESTs; Chambless & Ollendick 2001), which are treatments demonstrated to be efficacious for specific disorders in independently replicated trials. Their superficial differences notwithstanding, all EST taxonomies require these interventions to be manualized or at least delineated in sufficient detail to permit replication by independent researchers. Although direct replications of psychotherapy outcome studies are often impractical (Coyne 2016) given the formidable difficulties of recruiting comparable patients and ensuring comparably trained therapists, investigators can still undertake concerted efforts to ascertain whether a carefully described psychotherapy protocol that yields positive effects in one study does so in future studies. Herein lies the problem: Without an independently replicated demonstration that the original protocol generates positive effects, practitioners and researchers can interpret a successful conceptual replication of a modified protocol as evidence that the treatment is ready for routine clinical application. Such a conclusion would be premature and potentially harmful, because the original protocol has demonstrated its mettle in only a single study.
Conversely, practitioners and researchers may assume that a conceptual replication failure implies that the initial psychotherapy protocol was ineffective, but this conclusion could likewise be erroneous. Admittedly, research on the extent to which adaptations of EST protocols tend to degrade their efficacy is inconsistent (Stirman et al. 2017). Nevertheless, in certain instances, seemingly minor changes in psychotherapy protocols may produce detrimental effects. For example, studies of exposure therapy for anxiety disorders suggest that the commonplace practice of encouraging patients to engage in safety behaviors (e.g., practicing relaxation skills) during exposure often adversely affects treatment outcomes (Blakey & Abramowitz 2016). The same overarching conclusion may hold for self-help interventions. Rosen (1993) observed that even seemingly trivial changes to self-help programs can result in unanticipated changes in treatment compliance, effectiveness, or both. For example, in one study the addition of a self-reward contracting manipulation to an effective program for snake phobia decreased treatment compliance from 50% to zero (Barrera & Rosen 1977), perhaps because clients perceived the supplementary component as onerous. Consequently, failed conceptual replications can lead to the mistaken conclusion that effective treatment protocols are impractical, ineffective, or both.
In the clinical assessment field, an overemphasis on conceptual replication can contribute to what Pinto and I (Lilienfeld & Pinto 2015) termed the illusion of replication. This illusion can arise when investigators fail to delineate an explicit nomological network (Cronbach & Meehl 1955) of predictions for the construct validation of a measure, permitting them to engage in a program of ad hoc validation (Kane 2001). In such a research program, psychologists are free to hand-pick from an assortment of findings on diverse indicators to claim support for a measure's construct validity. In some cases, they may conclude that a measure has been validated for a given clinical purpose even in the absence of a single directly replicated finding.
Research on the widely used "Suicide Constellation" of the Rorschach Inkblot Test affords a potential illustration. Based on a meta-analysis of Rorschach variables, an author team concluded that the Suicide Constellation is a well-validated indicator of suicide risk (Mihura et al. 2013, p. 572). Nevertheless, this conclusion hinged on only four studies (see Wood et al. 2015 for a discussion): one on completed suicides, one on attempted suicides, one on ratings of suicidality, and one on levels of serotonin in cerebrospinal fluid (low levels of which have been tied to suicide risk; Glick 2015). As a result, the validity of the Suicide Constellation is uncertain given that its support rests on correlations with four ostensibly interrelated, but separable, indicators, with no evidence of direct replication.
Conversely, researchers may assume that a conceptual replication failure following a seemingly minor change to a measure calls into question the initial positive finding. For example, in efforts to save time, investigators frequently administer abbreviated forms of well-established measures, such as the Minnesota Multiphasic Personality Inventory–2. Nevertheless, such short forms often exhibit psychometric properties inferior to those of their parent measures (Smith et al. 2000). Hence, failed conceptual replications using such measures do not mean that the original result was untrustworthy.
When it comes to psychological treatments and measures, generalizability cannot simply be assumed. Direct replications of initial positive results, or at least close approximations of them, are not merely a research formality. They are indispensable for drawing firm conclusions regarding the use of clinical methods.