Firestone & Scholl's (F&S's) target article is an important and timely critique of the present zeitgeist, which favors a fundamental, even detailed, influence of cognition on perception. The notion that a high-level mental event such as a mere semantic label can change basic percepts does more than promise to revolutionize our understanding of the brain's modus operandi; it falls into the category of an “extraordinary claim.” And extraordinary claims require extraordinary evidence. The notion of cognitive penetration into perception is extraordinary because, as I show below, no feasible physical route exists for such penetration. Only by concentrating on psychophysical results, while refraining entirely from suggesting how cognitive penetration could be instantiated, can proponents of that view avoid confronting how extraordinary their claim is. Homeopathy, to take an extreme example, is recognized as making extraordinary claims because no plausible mechanism for its effects has been proposed.
My comments aim to complement the target article by pointing out that the anatomical and functional properties of the visual cortex preclude any significant, spatially specific cognitive penetration. To support this view, I present two fundamental mechanisms related to the organization and functionality of the visual cortex: the representation of space by primary visual cortex (V1), as evidenced by conscious perception, and the organization of top-down inputs.
Conscious perception of space is manifested first and foremost in the detailed, high-resolution, topographically precise depiction of spatial elements. When we see a face, for example, we perceive it as an integrated whole, yet at the same time we are consciously aware of all the minute spatial elements that make up that face. Indeed, it is our perception of details that enables us to discriminate between faces despite their inherent similarity. Elsewhere (Gur 2015), I show that because V1 is the only visual area that represents space with a resolution and topographical exactitude compatible with our perceptual abilities, its preintegration response patterns (the “V1 map”) must be the neural substrate of image representation. From these preintegration response patterns, information converges in successive hierarchical stages, from V1 orientation-selective cells through the downstream cortical areas V2, V4, and the inferotemporal cortex (IT). This successive convergence yields cells with increasingly large receptive fields (RFs) that cannot encode spatial details but may be selective for global features such as size, orientation, or category. At the pinnacle of the hierarchy are anterior IT “object-selective” and “face-selective” cells with very large (10°–50°) RFs, which respond to a considerable part of the visual field and require large (>5°) stimuli for their activation (Ito et al. 1995). Clearly, such cells cannot represent the fine spatial details to which we are so sensitive. Thus, in the visual cortex, objects in their full detail are represented by V1 response patterns, whereas the global information about each object that is required for comparing the acute image with its stored prototype (“recognition”) is extracted by information-integrating cells in the various visual areas. Note that only the acute image, small or large, tilted or not, bright or dark, is consciously perceived.
None of the processes that extract global spatial information, from the V1 preintegration activity patterns onward, is itself perceived. Note also that the perceptual ability to transform individual spatial elements into holistic objects operates instantaneously and in parallel, which argues for a feed-forward, encapsulated perception.
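The convergence argument can be illustrated with a deliberately crude toy model, offered as a sketch under stated assumptions rather than a model of real cortex. It assumes a 1-D “image”, box-car pooling at each stage, and hypothetical pooling widths (`STAGE_WIDTHS`) loosely standing in for V2, V4, posterior IT, and anterior IT; all function names are likewise hypothetical. Two images share the same global shape (a bright bar) and differ only in the position of a one-sample detail. In the input map the detail produces a full-size response difference; each pooling stage can only shrink it, because an average of differences is never larger than the largest difference.

```python
from typing import List


def pool(signal: List[float], width: int) -> List[float]:
    """Box-car pooling: each output unit averages `width` neighbouring inputs
    (fewer at the edges), so its effective receptive field is wider than
    that of the units feeding it."""
    half = width // 2
    n = len(signal)
    out = []
    for i in range(n):
        window = signal[max(0, i - half): min(n, i + half + 1)]
        out.append(sum(window) / len(window))
    return out


def hierarchy(signal: List[float], widths: List[int]) -> List[List[float]]:
    """Stage 0 is the input map; each later stage pools the previous one."""
    stages = [signal]
    for w in widths:
        stages.append(pool(stages[-1], w))
    return stages


def detail_visibility(widths: List[int], n: int = 64) -> List[float]:
    """Largest response difference, per stage, between two 1-D 'images' that
    share the same global shape (a bright bar) and differ only in where a
    one-sample detail sits (position 30 vs. 31)."""
    a = [1.0 if 10 <= i < 50 else 0.0 for i in range(n)]
    b = list(a)
    a[30] += 1.0
    b[31] += 1.0
    return [max(abs(x - y) for x, y in zip(sa, sb))
            for sa, sb in zip(hierarchy(a, widths), hierarchy(b, widths))]


# Hypothetical pooling widths, loosely standing in for V2, V4, posterior IT and anterior IT.
STAGE_WIDTHS = [3, 5, 9, 17]

if __name__ == "__main__":
    for stage, diff in enumerate(detail_visibility(STAGE_WIDTHS)):
        print(f"stage {stage}: max response difference = {diff:.3f}")
```

Under these assumptions the maximal difference between the two images is 1.0 in the input map, one-third after the first pooling stage, and one-fifteenth after the second, and it cannot grow thereafter, while the shared bar still drives the top stage strongly. In this toy setting, the detail that distinguishes the two images is carried reliably only by the fine-grained input map.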
Top-down inputs to V1
V1 is the only cortical area in which space is represented at high resolution, so for cognition to influence perception it must affect the V1 representation of space. However, top-down inputs to V1 do not match its topographic accuracy. Cognitive inputs, which presumably originate in nonvisual areas, travel through multisynaptic, multiarea paths before reaching V1. For example, inputs originating in the prefrontal cortex first reach the IT cortex; some outputs from the IT cortex reach V1 directly, and others reach it via the V4→V2→V1 route (Gilbert & Wu 2013). Almost all top-down information reaches V1 via layer 1 (see the summary in Fig. 1 of Gur & Snodderly 2008), where its synaptic connectivity is spatially widespread (Rockland 1994). Such connectivity rules out any direct, spatially circumscribed influence over the cells in the upper layers of V1 that represent the objects' details (Gur 2015; Gur et al. 2005). More important still, the route into the visual cortex passes through the large-RF cells of the IT cortex. Because these cells are insensitive to small spatial details, feedback from them cannot influence a selected part of the visual scene, such as a single stroke of a Chinese character or the image of a dog surrounded by shoes (Fig. 2 of the target article). We can thus conclude that because cognition must reach V1 through large-RF IT cells and ends in diffuse synaptic contacts in V1 layer 1, it can modulate processes, such as attention, that affect a large part of the visual field, but it cannot target a small part of the visual scene while leaving other parts unchanged. The nature and organization of this fuzzy top-down information thus rule out any direct, spatially accurate influence.
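The spatial argument about feedback can be put in similar toy terms. In this sketch, which is an illustration under assumptions and not a description of measured anatomy, a top-down signal aimed at one V1 position is blurred twice: once by the large RF of the relaying IT cell and once by the diffuse spread of layer 1 terminals. Both blurs are assumed to be Gaussian, so their widths combine as the square root of the summed variances. All widths (in units of the finest V1 sampling distance) and all function names are hypothetical.

```python
import math
from typing import List


def effective_sigma(*sigmas: float) -> float:
    """Cascaded Gaussian blurs combine into one Gaussian whose variance is the
    sum of the individual variances."""
    return math.sqrt(sum(s * s for s in sigmas))


def feedback_profile(n: int, center: int, sigma: float) -> List[float]:
    """Gain change across n V1 positions produced by one top-down signal aimed at `center`."""
    return [math.exp(-((i - center) ** 2) / (2 * sigma ** 2)) for i in range(n)]


def affected_positions(profile: List[float], threshold: float = 0.5) -> int:
    """Number of positions modulated by at least `threshold` of the peak gain change."""
    peak = max(profile)
    return sum(1 for g in profile if g >= threshold * peak)


def on_target_fraction(profile: List[float], target: int) -> float:
    """Share of the total modulation that lands on the single targeted position."""
    return profile[target] / sum(profile)


N, TARGET = 201, 100
# Hypothetical spreads, in units of the finest V1 sampling distance.
narrow = feedback_profile(N, TARGET, effective_sigma(0.5, 0.5))  # hypothetical point-to-point feedback
broad = feedback_profile(N, TARGET, effective_sigma(20.0, 5.0))  # relayed by a large-RF IT cell, diffuse layer 1

if __name__ == "__main__":
    for name, prof in (("point-to-point", narrow), ("IT-relayed", broad)):
        print(f"{name}: {affected_positions(prof)} positions at >= half-maximum, "
              f"{on_target_fraction(prof, TARGET):.1%} of modulation on target")
```

With these numbers, the hypothetical point-to-point feedback modulates a single position and delivers more than half of its effect there, whereas the IT-relayed signal modulates 49 positions at or above half-maximum and places under 5% of its effect on the target. Changing the assumed widths changes the numbers but not the qualitative point of the model: the feedback can never be sharper than the coarsest stage it passes through.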