Disagreements arise when people argue from different facts. But disagreements can also arise when people argue from different starting assumptions. F&S and I share the same facts; F&S reach the wrong conclusions because they start from the wrong assumptions.
Many of the studies F&S review indeed suffer from stimulus and experimenter-demand confounds, but many others are well-controlled investigations using gold-standard psychophysical methods. These studies show that expectations and knowledge affect virtually all aspects of visual perception. For example, knowledge of surface hardness affects amodal completion (Vrins et al. 2009), knowledge of bodies affects perceiving depth from binocular disparity (Bülthoff et al. 1998), expectations of motion affect motion perception (Sterzer et al. 2008), and knowledge of real-world size affects perceived speed of motion (Martín et al. 2015). Meaningfulness – a putatively late process – affects putatively earlier processes such as shape discrimination (Lupyan & Spivey 2008; Lupyan et al. 2010) and the recovery of 3-D volumes from two-dimensional images (Moore & Cavanagh 1998). Color knowledge affects the color appearance of images (Hansen et al. 2006) and even color afterimages (Lupyan 2015b). Hearing a word affects the earliest stages of visual processing (Boutonnet & Lupyan 2015; see also Landau et al. 2010; Pelekanos & Moutoussis 2011).
How can F&S, who are aware of all of this work (some of which they discuss in detail in the target article), still argue that there are no top-down effects on perception? They dismiss all of those studies on the grounds that they are “just” effects of attention, memory, or categorization/recognition. This “it's not perception, it's just X” reasoning assumes that attention, memory, and so forth can be cleanly split from perception proper. But attentional effects can be dismissed if and only if attention simply changes the input to a putatively modular visual system (sect. 4.5). Memory effects can be dismissed if and only if memory is truly an amodal “back-end” system. Recognition and categorization effects can be dismissed if and only if these processes are wholly downstream of “true” perception (sects. 3.4, 4.6). All of those assumptions are wrong.
Some aspects of attention really are a bit like changing the input to our eyes. Attending to one or another part of a Necker cube is somewhat like shifting one's eyes. If we dismiss the latter as an uninteresting sort of top-down effect on perception, we should likewise dismiss the former. But as we now know, attention is far richer. We can, for example, attend to people, or dogs, or the letter “T” (across the visual field) – a process of deploying complex priors within which incoming information is processed. In so doing, attention warps visual representations (e.g., Çukur et al. 2013; see sect. 5.2 in Lupyan 2015a for discussion). Aside from the simplest cases of spatial attention, attentional effects are not an alternative to top-down effects on perception, but rather one of the mechanisms by which higher-level knowledge affects lower-level perceptual processes (Lupyan & Clark 2015).
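To make “deploying priors” concrete, here is a schematic Bayesian gloss (my shorthand, not a formalism from the target article or the studies cited above). Let I be the incoming visual input and w a candidate interpretation of it (e.g., “a T is present here”). Perception can be cast as computing

P(w | I) ∝ P(I | w) · P(w).

On this reading, attending to a category such as “dog” or “T” raises the prior P(w), which changes the posterior – and hence what is perceived – even when the input I is held constant.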
Some top-down effects can be dismissed as effects on memory. Someone might remember a $20 bill as being larger than a $1 bill, but not see it as such. But F&S's “just memory” argument goes much further. For example, Lupyan and Spivey (2008) found that instructing participants to view two meaningless symbols [figures omitted] as meaningful – rotated numbers 2 and 5 – improved visual search efficiency. F&S argue that this might be merely an effect on memory, citing Klemfuss et al. (2012) as having shown that decreasing the memory load by showing participants a target preview caused the meaningfulness advantage to disappear. But in fact, the largest effect of the target preview was to slow search performance in the meaningful-number condition, bringing it in line with that of the meaningless-shape condition.
But suppose Klemfuss et al. had actually found that showing a target preview improved search as much as the instructional manipulation we had used. Would this mean that meaningfulness does not affect perception? Not at all! If telling people to think of the two symbols as 2s and 5s is as effective as showing a target preview in helping them find a completely unambiguous target in a singleton search, that would mean that a high-level instructional manipulation of meaning can affect visual search efficiency as much as an overtly visual aid. That a top-down effect can be partially ascribed to memory does not mean it is not (also) an effect on perception, because part of what we call memory – visual memory – appears to have a perceptual locus (D'Esposito & Postle 2015; Pratte & Tong 2014). This is why holding visual items in memory causes people to see things differently (e.g., Scocchia et al. 2013).
Visual memory is not a back-end system, as F&S assume. It is perceptual. This helps explain F&S's confusion about Lupyan and Ward's (2013) demonstration that hearing a word (e.g., “kangaroo”) can make visible an image of a kangaroo rendered invisible through continuous flash suppression. Lupyan and Ward's explanation was exactly the same as F&S's (sect. 4.6.2): hearing a word activates visual knowledge – knowledge that is visual – which, we argued, allows people to see otherwise weak and fragmented visual inputs. Even if this is “merely” an effect on back-end memory, the fact remains that hearing a word improves sensitivity in simply detecting objects. It helps people see. Like attention, memory is part of the mechanism by which knowledge affects perception.
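To spell out the sensitivity claim in signal detection terms (the standard textbook formula, not a derivation specific to Lupyan and Ward): sensitivity is

d′ = z(H) − z(FA),

where H is the hit rate, FA is the false-alarm rate, and z is the inverse of the standard normal cumulative distribution function. An increase in d′ after hearing a word means better discrimination of an object's presence from its absence, not merely a shift in response criterion (bias).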
Lastly, recognition. Vision scientists might be surprised to learn that, according to F&S, studying how people recognize a dog as a dog, or judge that two objects look the same, is studying the postperceptual back end, whereas studying animacy (Gao et al. 2009), causal history (Chen & Scholl 2016), and the reconstruction of shapes behind occluders (Firestone & Scholl 2014a) is studying true perception. Not all perceptual tasks require recognition, but most of the ones vision scientists care about do. If simply detecting an object as an object (Lupyan & Ward 2013) is “just” recognition and therefore not true perception, many vision scientists might want to find other employment.