Although Yarkoni frames his primary concern with current psychology research paradigms in terms of the frequent mismatch between verbal and statistical expressions of their hypotheses, the main problem he uncovers is the failure to account for, measure, or control variance. Perhaps most important is the common failure to account for variation in the experimental materials themselves: the “stimulus-as-fixed-effect fallacy.” Here we would like to point out that these problems, which Yarkoni correctly identifies, are a necessary consequence of psychologists' abiding commitment to the stimulus-response (S-R) formula when constructing experiments. The assumptions behind this style of psychological investigation are the root cause of the operationalization and generalizability crisis.
The S-R formula assumes that a psychological phenomenon can be reduced to a simplified behavioral response to an isolated perceptual cue. For example, “attention” is measured by recording response times to selected letter presentations in the presence or absence of distractors, or, in Yarkoni's example, “recognition memory” is measured by asking participants to select a target photograph of a previously seen face under different dual-task interference conditions. This kind of reductive operationalization appears to offer experimental control, but in fact it hides real subject-induced variance (Dewey, 1896), removes the phenomenon from the actual contexts in which it manifests (Danziger, 1994, pp. 30–33), and in some cases may destroy the phenomenon entirely (Gibson, 1979, pp. 1–4). All of these are necessary consequences of the S-R model, and they lead directly to problems of generalizability. There is, however, a different way to proceed. As an additional remedy for the issues Yarkoni raises, we would like to draw attention to a class of methods, sometimes called perturbation experiments, that approach the study of perception and behavior differently.
A traditional S-R experiment asks questions of the form, “if I present this isolated cue to a participant, what response is elicited?” The aim is to establish a statistical link between the thing being experimentally varied (the “stimulus,” or independent variable) and the behavior being measured (the “response,” or dependent variable). Because the question is answered by statistical means, finding an experimental effect depends critically on controlling all variability other than the intended manipulation. S-R experiments are therefore necessarily vulnerable to failures to account for sources of variance, a problem that, as Yarkoni shows, can quickly become intractable.
In a perturbation experiment, the aim is different: to identify the precise variable or variables implicated in the ongoing control of a complete activity. A perturbation experiment asks questions of the form, “precisely which aspects of this ongoing activity do I need to disrupt in order to cause a qualitative shift in the behavior?” This kind of methodology has a long history in physiological psychology. Classic nineteenth-century studies of brain injury are a form of natural perturbation experiment (Damasio et al., 1994; James, 1890, Ch. 2). Modern transcranial magnetic stimulation studies in human subjects, and optogenetic methods in animal models, are perturbation experiments in which the experimenter introduces a physiological perturbation artificially.
Perturbation methods have long been used in behavioral studies too, notably in studies of motor control (e.g., Gibson & Walk, 1960). We would like to draw attention to their use in a motor development study of how infants negotiate slopes of varying inclination (Adolph, Eppler, & Gibson, 1993). This study found that, whereas crawling infants attempt to descend too-steep slopes head-first, older, more experienced toddlers modify their style of locomotion before attempting the descent (e.g., sliding down instead of attempting to walk down). The qualitative bifurcation in the toddlers' behavior, that is, the slope's perturbation of their default mode of locomotion, is unambiguous evidence that they have learned to attend to the visual cue for slope. Conclusions drawn from perturbation experiments do not depend on establishing statistical links between the phenomenon and its causes, and so they are not vulnerable to the stimulus-as-fixed-effect fallacy or to other failures to account for variance. Further, because this methodology lets behavior unfold completely and continuously, it keeps the phenomenon of interest and its context relatively intact. Note, too, that in contrast to typical S-R perceptual cues, the perceptual cue in this case (the inclination of the slope) is neither isolated nor fixed; and because this variable is changed systematically, these experiments do not sacrifice methodological rigor.
A challenge is how to scale the perturbation methodology from investigations of “online” motor control tasks to more cognitive tasks, such as the experiments Yarkoni discusses that putatively demonstrate the verbal overshadowing effect. In response, we would make two points. First, some cognitive abilities are more immediately amenable to the perturbation paradigm than others; decision-making and attention may be relatively amenable. In the Adolph et al. (1993) study, the bifurcation in the toddlers' behavior when the slope becomes too steep is evidence that they have learned to attend to the visual cue for slope (which necessarily means they perceive it) and that, as a result, they have decided to locomote in a different way. The study can thus be interpreted as measuring attention and decision-making in situ. Of course, more work is needed to extend this methodological approach to higher-order, symbolic forms of cognition (Baggs, Raja, & Anderson, 2020).
Second, by assuming that the only way to establish a psychological fact is through experiments set up in the S-R format, psychology unnecessarily constrains itself. The S-R methodology has been the dominant method in psychology labs for 150 years, and this dominance has repeatedly led to periods of crisis, of which the current one, focused on replicability and now on generalizability, is merely the latest iteration (Reed, 1996, pp. 3–5). Perhaps it is time to recognize that there are more methods in the psychologist's toolbox than are dreamt of in the S-R philosophy.
Financial support
This work was supported by a Canada Research Chair award to MLA (award # 950-231929 from SSHRC). The authors declare they have no conflicts of interest pertaining to the material presented here.