Firestone & Scholl (F&S) argue for the impenetrability of perception by cognition in part by relegating many well-established top-down effects to peripheral, trivial changes in sensory processing akin to willfully closing one's eyes. Drawing on a large neuroscience literature, we argue that many of the effects the authors dismiss are neither trivial nor peripheral, but rather reflect complex tuning changes implemented across multiple levels of the visual hierarchy. More generally, the authors' arguments betray a conception of the mind that is difficult to square with modern neuroscience. In particular, they view the visual brain as containing a sensory/input module, a perception module, and a cognition module. Although carving the mind into these boxes has, in the past, provided a convenient construct for thinking about vision, this construct is unsupported by what we now know about the organization of the visual brain.
Although primary visual cortex (V1) is a well-established input to other cortical areas, it is only an input in the sense that the vast majority of visual input to the cortex first passes through it. Importantly, however, it is not encapsulated with respect to subsequent areas or task demands. Activity in V1 has been shown to modulate in response to attention (e.g., Motter 1993; O'Connor et al. 2002), task demands (e.g., Li et al. 2006), and interpretation (Hsieh et al. 2010; Kok et al. 2012; Roelfsema & Spekreijse 2001; van Loon et al. 2015), and also to vary as a function of conscious experience (e.g., Lee et al. 2005; Wunderlich et al. 2005). Further, re-entrant activity in V1, long after the initial feedforward sweep of information, appears to be necessary for conscious perception (Pascual-Leone & Walsh 2001). These data indicate that V1 is not so much an early stage of vision as part of an ongoing, dynamic network of feedforward and feedback activity. Even the lateral geniculate nucleus (LGN), a subcortical region that passes information from the retina to V1, tracks changes in attention (O'Connor et al. 2002) and conscious experience (Wunderlich et al. 2005). In other words, early visual regions are better thought of as part of a dynamic network of visual areas than as an encapsulated sensory stage. Thus, even though it might be conceptually convenient to cleanly separate sensation from perception, the empirical data suggest that this dichotomy does not actually exist in the brain.
Even if, for the sake of argument, we equated early visual areas with an input or sensation module, other neural data do not support the authors' intuition that attention works by first modulating those early stages (i.e., their peripheral attention effects). The anatomical pathways through which attention initially modulates visual processing are relatively late in the visual hierarchy (Baldauf & Desimone 2014; Buffalo et al. 2005; Esterman & Yantis 2010; Moore & Armstrong 2003). In fact, the levels at which attention networks interface with visual cortex correspond more closely to a "perception box"; that is, later visual areas where activity seems to correlate more robustly with conscious perception and behavior (Hung et al. 2005; Leopold & Logothetis 1996; Logothetis & Schall 1989; Walther et al. 2009). Attention effects thus seem to feed back from later areas to earlier areas, rather than proceeding from input to perception as the authors' box model implies. Indeed, the authors' entire concept of peripheral attention effects is unsubstantiated by neural data.
Finally, the authors' intuition that attention effects are functionally similar to trivial changes in input also does not actually accord with the data. Although we agree that closing one's eyes is a trivial effect of cognition on vision, it is a mistake to take this intuition about the peripheral nervous system (i.e., the eye) and apply it to the brain. Instead of merely enhancing or gating the processing of stimuli (a simple gain model), attention also appears to cause large-scale tuning changes across the visual hierarchy. For example, when people view video clips and search for either humans or vehicles, the tuning of the entire ventral visual cortex, including V1, shifts toward the attended category and away from unattended categories (Çukur et al. 2013). Similarly, given identical stimuli, neurons in V1 can flexibly change whether they carry information about collinearity of lines or bisection of parallel lines depending on which task has been cued (Gilbert & Li 2013). Attention can even enhance orientations that are not actually present in the display (i.e., on either side of a target in orientation-tuning space) to more optimally discriminate the target from distractors (Scolari et al. 2012).
These attention effects are nothing like turning off the lights. Rather than simply gating or enhancing input, much of the neural hardware responsible for vision flexibly changes its function in complex ways depending on the goals of the observer. It is hard to imagine better evidence for the cognitive penetrability of vision than this.
The authors do seem aware of some of the neural data discussed here; they even admit that attention can work in "rich and nuanced ways" and "change the content of perception rather than merely influence what we focus on" (sect. 4.5, para. 6). Curiously, here they momentarily appear to back off their main thesis and restrict their criticisms to "peripheral sorts of attention – involving simple changes in which locations, features, or objects we focus on" (emphasis in original, sect. 4.5, para. 6). However, as we have discussed, the interactions among the cortical areas involved in vision are so extensive and result in such flexible representations throughout visual cortex that not only is it impossible to neatly separate sensation and perception, but also the concept of peripheral attention is rendered useless. Although these box models of the mind have great appeal and have facilitated both careful experimentation and fruitful theorizing in the past, the neural data are clear: It is time we move beyond a box model of the brain.