R1. Introduction
We are clearly not the only ones with strong views about how seeing relates to thinking. We were driven to explore the influences of cognition on perception primarily because this issue is so foundational to so many areas of cognitive science, and the commentaries on our target article exemplified this breadth and importance in several ways – hailing from many different fields (from systems neuroscience, to social psychology, to philosophy), drawing on vastly different methods (from rodent electrophysiology, to hue adjustment, to computational modeling), and originating from diverse perspectives (from predictive coding, to embodied cognition, to constructivism).
All of this led to a staggering diversity of reactions to our six “pitfalls,” our conclusions about the state of the art, and our proposals for moving forward. Our approach was “theoretically unique” (Tseng, Lane, & Bridgeman [Tseng et al.]) but also “heavily recycled” (Vinson, Abney, Amso, Chemero, Cutting, Dale, Freeman, Feldman, Friston, Gallagher, Jordan, Mudrik, Ondobaka, Richardson, Shams, Shiffrar, & Spivey [Vinson et al.]); our recommendations constituted “an excellent checklist” (Esenkaya & Proulx) that was also “fundamentally flawed” (Balcetis & Cole); we gave a “wonderful exposé” (Block) that was also “not even wrong” (Lupyan); our critique was “timely” (Gur) but also “anachronistic” (Clore & Proffitt); we provided “a signal service to the cognitive psychology community” (Cutler & Norris) that was also “marginal, if not meaningless, for understanding situated behaviors” (Cañal-Bruland, Rouwen, van der Kamp, & Gray [Cañal-Bruland et al.]); we heard that “the anatomical and physiological properties of the visual cortex argue against cognitive penetration” (Gur), but also that our view “violates the functional architecture of the brain” (Hackel, Larson, Bowen, Ehrlich, Mann, Middlewood, Roberts, Eyink, Fetterolf, Gonzalez, Garrido, Kim, O'Brien, O'Malley, Mesquita, & Barrett [Hackel et al.]).
We are extremely grateful to have had so many of the leading lights of our field weigh in on these issues, and these 34 commentaries from 103 colleagues have given us a lot to discuss – so let's get to it. We first explore the foundational issues that were raised about the nature of seeing and its relation to thinking (sect. R2). Then, we take up the reactions to our article's empirical core: the six-pitfall “checklist” (sect. R3). Finally, we turn to the many new examples that our commentators suggested escape our pitfalls and demonstrate genuine top-down effects of cognition on perception (sect. R4).
R2. The big picture
Our target article was relentlessly focused on empirical claims, placing less emphasis on the broader theoretical landscape surrounding these issues. That focus was not an accident: We feel that purely theoretical discussions, though fascinating, have failed to move the debate forward. Nevertheless, many commentators raised issues of exactly this sort, and we have a lot to say about them.
R2.1. See for yourself: Isolating perception from cognition
Some commentators despaired over ever being able to separate seeing and thinking, denying that this distinction is real (Beck & Clevenger; Clore & Proffitt; Goldstone, de Leeuw, & Landy [Goldstone et al.]; Hackel et al.; Keller; Lupyan; Miskovic, Kuntzelman, Chikazoe, & Anderson [Miskovic et al.]; Vinson et al.) or well-defined (Emberson; Gerbino & Fantoni; Rolfs & Dambacher; Witt, Sugovic, Tenhundfeld, & King [Witt et al.]), and even claiming that “the distinction between perception and judgment, if there is one, is not clear and intuitive” in the first place (Keller).
R2.1.1. Seeing versus thinking
Speaking as people who can see and think (rather than as scientists who study perception and cognition), we find such perspectives baffling. One of the clearest and most powerful ways to appreciate that seeing and thinking must be different is simply to note that they often conflict: Sometimes, what you see is different from what you think. This conflict may occur not only because cognition fails to penetrate perception, but also because seeing is governed by different and seemingly idiosyncratic rules that we would never think to apply ourselves.
Perhaps nobody has elucidated the empirical foundations and theoretical consequences of this observation better than Gaetano Kanizsa, whose ingenious demonstrations of such conflict can, in a single figure, obliterate the worry that perception and cognition are merely “folk categories” (Hackel et al.) that “reify the administrative structure of psychology departments” (Gantman & Van Bavel) rather than carve the mind at its joints. For example, in Figure R1, you may see amodally completed figures that run counter to your higher-level intuitions of what should be behind the occluding surfaces, or that contradict your higher-level knowledge. Reflecting on such demonstrations, Kanizsa is clear and incisive:
The visual system, in cases in which it is free to do so, does not always choose the solution that is most coherent with the context, as normal reasoning would require. This means that seeing follows a different logic – or, still better, that it does not perform any reasoning at all but simply works according to autonomous principles of organization which are not the same principles which regulate thinking. (Kanizsa 1985, p. 33)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170928100721-22104-mediumThumb-S0140525X16000029_fig1g.jpg?pub-status=live)
Figure R1. Visual phenomena that contradict higher-level expectations. (A) Sandwiched by octagons, a partially occluded octagon looks to have an unfamiliar shape inconsistent with the scene (see also the discussion in Pylyshyn 1999). (B) An animal is seemingly stretched to an impossible length (and identity) when occluded by a wide surface. Adapted from Kanizsa and Gerbino (1982).
Notice that Kanizsa's treatment forces us to acknowledge a distinction between seeing and thinking even before offering any definition of those processes. Indeed, literal definitions that cover all and only the relevant extensions of a concept are famously impossible to generate for anything worth thinking much about; by those austere standards, most words don't even have definitions. (Try it yourself with Wittgenstein's famous example of defining the word “game.”) So, too, for perception: It can't be that the scientific study of perception must be complete before we can say anything interesting about the relationship between seeing and thinking. As Kanizsa's insights show, distinctions are what really matter. Such distinctions are thus precisely what our target article focused on, and we remain amazed that anyone looking at Figure R1 could deny the distinction between seeing and thinking.
Moreover, the concrete case studies we highlighted for our six pitfalls can serve a similar function as Figure R1. It can sometimes sound compelling in the abstract to question whether lines can be drawn between this or that process in the mind – for example, between perception and memory (e.g., Emberson; Gantman & Van Bavel; Goldstone et al.; Lupyan). But a concrete case study – for example, of “moral pop-out,” which anchored Pitfall 6 (“Memory and Recognition”) from our target article – wipes such abstract concerns away. And indeed, although the commentaries frequently discussed the distinction between perception and memory in the abstract – sometimes complaining that memory cannot be “cleanly split from perception proper” (Lupyan) – not a single commentator responded to this case study by rejecting the perception/memory distinction itself. (To the contrary, as we explore in sect. R3.6, Gantman & Van Bavel went to great lengths to argue that “moral pop-out” does not reflect semantic priming – presumably because they agreed that this alternative would indeed undermine their view.)
R2.1.2. Signatures of perception
Our target article focused primarily on case studies of phenomena that we argue don't reflect perception (instead involving processes such as higher-level judgment), but we have a robust theoretical and empirical interest in ways to show that various phenomena do reflect perception. Some commentators think our notion of perception is “extremely narrow” (Cañal-Bruland et al.) and “restrictive” (Clore & Proffitt), and that it “whittles the fascinating and broad domain of perception to sawdust” (Gantman & Van Bavel). We couldn't disagree more. Perception may be immune from cognitive influence, but it nevertheless traffics in a truly fascinating and impressively rich array of properties – not just lower-level features such as color, motion, and orientation, but also seemingly higher-level properties such as causality (Scholl & Nakayama 2002), animacy (Gao et al. 2010), persistence (Scholl 2007), explanation (Firestone & Scholl 2014a), history (Chen & Scholl 2016), prediction (Turk-Browne et al. 2005), rationality (Gao & Scholl 2011), and even aesthetics (Chen & Scholl 2014). (“Sawdust”!)
Indeed, the study of such things is the primary occupation of our laboratory, and in general we think perception is far richer and smarter than it is often given credit for. But we don't think that anything goes, and we take seriously the need to carefully demonstrate that such factors are truly extracted during visual processing, per se. It is often difficult to do so, but it can be done – empirically and decisively (as opposed to only theoretically, as in proposals by Halford & Hine and Ogilvie & Carruthers). Indeed, our new favorite example of this was highlighted by Rolfs & Dambacher, who have demonstrated that the perception of physical causality (as when one billiard ball is seen to “launch” another) exhibits a property associated exclusively with visual processing: retinotopically specific adaptation (Rolfs et al. 2013; see also Kominsky & Scholl 2016) of the sort that also drives certain types of color afterimages. This example illustrates how “perception” can be identified – not by abstract definitional wordplay, but rather by concrete empirical signatures, of which there are many (for extensive discussion, see Scholl & Gao 2013).
R2.2. What would it take?
Several commentators worried that our view (as expressed in our target article's title) could not be disproven even in principle, and that it was even an “unfalsifiable tautology” (Gerbino & Fantoni). De Haas, Schwarzkopf, & Rees (De Haas et al.) challenged our view most directly in this way: “Specifically, what type of neural or behavioural evidence could refute it?” We accept this challenge.
We take our view to involve the most easily falsifiable claims in this domain in decades, and this assumption is absolutely central to our aims. Gantman & Van Bavel got things exactly right when they wrote that “the crux of F&S's argument lies in their empirical re-explanations of a handful of case studies. These are falsifiable.” Similarly, we entirely agree with Witt et al. that our “claim that no top-down effects on perception exist can be felled with the demonstration that one effect survives all pitfalls.”
So, what would it take to falsify our view in practice? That's easy: Every single one of the case studies discussed in our target article could easily have counted against our thesis! It could have been that when you give a good cover story for wearing a backpack (to mask its otherwise-obvious purpose), the backpack still makes hills look steeper. It could have been that when you blur faces, observers who don't see race also don't see the relevant lightness differences. It could have been that, under conditions characterized by an El Greco fallacy, the relevant top-down effects (e.g., of emotion on perceived brightness or of action-capabilities on perceived aperture width) disappear entirely, as they should. It could have been that when you carefully ask subjects to distinguish perception from “nonvisual factors,” their responses clearly implicate perception rather than judgment. It could have been that “pop-out” effects in lexical decision tasks work for morality but not for arbitrary categories such as fashion. The list goes on.
Moreover, there's no “file-drawer problem” here; it's not that we've investigated dozens of alleged top-down effects and reported only those rare successes. Instead, every time we or others poke one of these studies with our pitfalls, it collapses. In other words, our view is eminently falsifiable, and indeed we ourselves – perhaps more so than any commentator – have tried our best to falsify it. We have simply failed to do so (and we engage with several new such claims in sect. R4).
R2.3. The perspective from neuroscience: Allowing everything but demanding nothing
Our discussion focused on what we see and what we think, and we suggested that perception is encapsulated from cognition. But many of the commentaries worried that in doing so we are living in the wrong century, harboring an “outdated view of the mind” (Hackel et al.). Instead, the more fashionable way to investigate what goes on in our heads is to consider “descending neural pathways” (O'Callaghan, Kveraga, Shine, Adams, & Bar [O'Callaghan et al.]), or “feedback projections” (Vinson et al.), or “reciprocal neural connections” (Clore & Proffitt), or a “dynamically reverberating loop” (Miskovic et al.), or a “continuum of brain modes” (Hackel et al.), or an “ongoing, dynamic network of feedforward and feedback activity” (Beck & Clevenger), or an “interconnected network of neurons, bathed in a chemical system, that can be parsed as a set of broadly distributed, dynamically changing, interacting systems” (Hackel et al.), or a “potpourri of synaptic crosstalk, baked into pluripotent cytocircuitry” (OK, we made that one up).
Many commentators noted, quite correctly, that we “readily dismiss the extensive presence of descending neural pathways” (O'Callaghan et al.) as having little to contribute to the core issue of how seeing and thinking interact. But we did so only in passing, in our zeal to focus on the relevant psychological experiments. And so in response, we will dismiss this work more comprehensively. We are, of course, aware of such ideas, but we think they are too often raised in these contexts in an uncritical way, and in fact are (some mixture of) irrelevant, false, and unscientific. Let's expand on this:
R2.3.1. “Unscientific.”
As far as we know, nobody thinks that every top-down effect of cognition on perception that could occur in fact does occur. For example, looking at Figure R2, you should experience the illusion of motion when you move your head from side to side (panel A) or forward and back (panel B), even though you can be morally certain that nothing is in fact moving in those images. (Indeed, to eliminate any doubt, you can view Figure R2 on a physical page, where a lifetime of experience and a vast body of knowledge about how ink and paper work can assure you that the images on the page are static.) Yet, as so often occurs with such phenomena, the illusion of motion persists. So, here is an example of what we know failing to influence what we see.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170928100721-42516-mediumThumb-S0140525X16000029_fig2g.jpg?pub-status=live)
Figure R2. Illusory motion in static images, defying our knowledge that these images are not, in fact, moving. (A) Moving one's eyes back and forth produces illusory motion in this illusion by Hajime Ouchi. (B) When the center is fixated, moving one's head toward and away from the image produces illusory rotary motion (Pinna & Brelstaff 2000; this version created by Pierre Bayerl and Heiko Neumann).
This sort of phenomenon invites a straightforward question for the perspectives articulated by so many of our neuro-inspired commentators: How does this happen? Given the overwhelming prevalence of loops and re-entrance and descending pathways and interconnected networks and continua of brain modes, how is seeing insulated from thinking in this particular instance? Apparently, these rhapsodic accounts of the complete flexibility of perception are no obstacle to the thousands of visual phenomena that aren't affected by what we know, believe, remember, and so forth. As Vinson et al. concede, “Admittedly, cognition cannot ‘penetrate’ our perception to turn straight lines into curved ones.”
In stark contrast with our view (which makes strong and necessary predictions; see sect. R2.2), these grand theories of brain function truly are unfalsifiable in the context of the present issues. Whenever there is an apparent top-down effect of cognition on perception, re-entrant/descending/recurrent … pathways/connections/projections get all of the credit. But whenever there isn't such an effect, nobody seems concerned, because that can apparently be accommodated just as easily. That is what an unfalsifiable theory looks like.
We have no doubt that these theories can contort themselves to explain away cases where thinking fails to affect seeing. (Maybe the solution has something to do with “the joint impact of priors and context-variable precision estimations” [Clark].) After all, many commentaries hedged their renditions of the pervasiveness of top-down processing in the brain, suggesting only that (with emphases added) “much of the neural hardware responsible for vision flexibly changes its function in complex ways depending on the goals of the observer” (Beck & Clevenger); that “the activity of neurons in the primary visual cortex is routinely modulated by contextual feedback signals” (Vinson et al.); and that “connectivity patterns within the cortex … often dominate” (Hackel et al.). But this is precisely the problem: Without independent constraints on “much of,” “routinely,” and “often,” these brain models can accommodate any result. In short, they allow everything, but demand nothing – and so they don't explain anything at all. And until they are rid of this property, these “theories” are difficult to take seriously.
R2.3.2. “False.”
One commentator did take such views very seriously, issuing a powerful critique that met the ideas on their own terms. Gur makes a simple but ingenious case against various neuroscientifically inspired claims of cognitive penetrability, reaching the very opposite conclusion of so many others writing on this topic.
Perception, Gur notes, often traffics in fine-grained details. For example, we can perceive not only someone's face as a whole, but also the particular shape of a freckle on their nose. The only brain region that represents space with such fine resolution is V1 – which, accordingly, has cells with tiny receptive fields. So, to alter perception at the level of detailed perceptual experience, any influence from higher brain regions must be able to match the same fine grain of V1 neurons. However, the only such connections – the “re-entrant pathways” trumpeted in so many commentaries – have much coarser resolution, in part because they pass through intermediate regions whose cells have much larger receptive fields (at least >5°, or about the size of your palm when held at arm's length). Therefore, influences from such higher areas cannot selectively alter the experience of spatial details smaller than the receptive fields of those cells. In other words, Gur concludes that the brain cannot even implement the top-down phenomena reported in the literature, making cognitive penetrability like “homeopathy … because no plausible mechanisms for its effects are suggested.”
The lesson here is critical: To appreciate the relevance (or lack thereof) of feedback connections in the brain, one must not only note their existence (as did so many commentaries), but also consider what they are doing. When you do only the former, you may hastily conclude that our view “violates the functional architecture of the brain” (Hackel et al.). But when you do the latter, you realize that “there is no feasible physical route for such a penetration” (Gur).
R2.3.3. “Irrelevant.”
Why is it so popular to leap from flexible models of brain function to a flexible relationship between seeing and thinking? Desseilles & Phillips may have shed some light on this issue: “Like the vast majority of professional neuroscientists worldwide, we consider that cognitions and perceptions are governed by specific patterns of electrical and chemical activity in the brain, and are thus essentially physiological phenomena.”
But this line of reasoning is and has always been confused (see Carandini 2012; Fodor 1974). After all, it is equally true that cognition and perception are “governed” by the movement and interaction of protons and electrons; does this entail, in any way that matters for cognitive science, that seeing and thinking are essentially subatomic phenomena? That we should study subatomic structures to understand how seeing and thinking work? Clearly not.
Similarly, some commentaries seemed to go out of their way to turn our view into an easily incinerated straw man, alleging that “F&S assume that the words cognition and perception refer to distinct types of mental processes … localized to spatially distinct sets of neurons in the brain” (Hackel et al.). However, we assumed no such thing. Just as Microsoft Word is clearly encapsulated from Tetris regardless of whether they are distinguishable at the level of microprocessor structure, so too can perception be clearly encapsulated from cognition regardless of whether they are distinguishable at the level of brain circuitry (or subatomic structure).
R2.4. But why?
The core of our approach has been empirical, not theoretical, and we have avoided purely abstract discussions of why perception should or should not be encapsulated. Still, it can be interesting (if historically less productive) to consider why perception might be encapsulated from the rest of the mind.
R2.4.1. Flexibility, stability, and the free press
Many commentaries seemed to take for granted that cognitively penetrable vision would be a good thing to have, for example suggesting that a thoroughly top-down architecture is “undoubtedly a highly adaptive mechanism” (O'Callaghan et al.). This is not a foregone conclusion, however, as Durgin emphasized. Whereas other commentaries suggested that our target article “neglects the fundamental question what perception is for” (Cañal-Bruland et al.) – and that action is the answer – Durgin noted how successful action benefits from perceptual stability, because “momentary destabilization of space perception by desire, fatigue, and so forth would tend to undermine the whole point of perception as a guide for action” (Durgin).
These ideas relate to what our colleague Alan Gilchrist has informally called the “free press” model of perception. In government, as in the mind, it may serve certain short-term interests to actively distort the information reaching the people (or cognitive systems) who rely on it. However, in both cases, it is ultimately preferable not to exert such top-down influence, and instead to support a “free press” that can report what is happening honestly, without concern for what would be expedient at one particular time or for one special interest.
One reason to support the free press is that one doesn't know in advance just how this information will be used, and so any such distortions may have unintended negative consequences. In the case of perception, we may view a hill with a momentary intention to climb it; but even with this intention, we may also have other purposes in mind, for example using the hill as a landmark for later navigation, or to escape a flood. If the hill's perceived slant or height constantly shifts according to our energy levels or the weight on our shoulders (Bhalla & Proffitt 1999), then its utility for those other purposes will be undermined. (For example, we may think the hill offers greater safety from a flood than it truly does.) Better for vision to report the facts as honestly as possible and let the other systems relying on it (e.g., action, navigation, social evaluation, decision-making) use that information as they see fit.
R2.4.2. Protecting seeing from thinking
Another advantage of encapsulated perception is the benefit of automation. As another colleague, Scott Grafton, informally notes, encapsulation may sometimes seem like an exotic or specialized view when considered in the context of the mind, but it is actually commonplace in many control systems, whether engineered by natural selection or by people – and for good reason:
Impenetrability … is the rule, not the exception. A pilot in a 787 gets to control a lot of things in his plane. But not everything. Much of it is now done by local circuits that are layered or protected from the pilot. A lot of plane crashes in modern planes arise when the pilot is allowed into a control architecture that is normally separate (fighting with the autopilot). Modern software in your computer keeps you out of the assembly code. For a human, how long do you think you would stay alive if you were allowed conscious control of your brainstem nuclei involved in blood pressure control, blood pH or cerebral perfusion pressure? Sensing and perception mechanisms likely operate with protocols that are not accessible by cognition. This should be the norm. (Grafton, personal communication)
In other words, by being encapsulated from thinking, seeing is protected from thinking. Our wishes, emotions, actions, and concerns are barred from altering visual processing so that they don't mess it up.
R3. The six pitfalls
The core of our target article explored how six concrete and empirically testable pitfalls can account for the hundreds of alleged top-down effects of cognition on perception reported in at least the last two decades – and how these pitfalls do account for many such effects in practice. Some commentaries accepted our recommendations, agreeing that “researchers should apply this checklist to their own work” (Witt et al.) and that “only research reports that pass (or at least explicitly address) F&S's six criteria can henceforth become part of the serious theoretical conversation” (Cutler & Norris). Other commentaries argued that our recommendations were “fundamentally flawed” (Balcetis & Cole). Here we respond to these many reactions.
R3.1. Pitfall 1: Uniquely disconfirmatory predictions
Although many commentators suggested that their favorite top-down effect escapes the El Greco fallacy, it was encouraging that nearly every commentary that discussed this pitfall seemed to accept its underlying logic: When the “measuring equipment” should be affected in the same way as whatever it's measuring, the effects must cancel out. Indeed, Xie & Zhang found this logic compelling enough to “fix” an El Greco fallacy that afflicted a previously reported top-down effect (Meier et al. 2007), though they ultimately implicated pupillary changes as the mechanism of the effect.
R3.1.1. An El Greco fallacy fallacy?
One commentary, however, contested our application of the El Greco fallacy to a particular top-down effect. Holding a wide pole across one's body reportedly makes doorway-like apertures look narrower, as measured not only by adjusting a measuring tape to match the aperture's perceived width (Stefanucci & Geuss 2009), but also by adjusting a second aperture (Firestone & Scholl 2014b) – even though, according to the underlying theory, this second aperture should also have looked narrower and so the effects should have canceled out. Hackel et al. objected: “[T]he first aperture is meant to be passed through, whereas the second is not …. To assume that the top-down influence on width estimates would be the same and therefore cancel out under these distinct conditions suggests a misunderstanding of top-down effects.”
However, it is Hackel et al. who have misunderstood both this top-down effect and the methodology of our El Greco fallacy studies. We did not blindly “assume” that the rod would influence both apertures equally – we actively built this premise into our study's design, anticipating exactly this concern. The original aperture-width study that inspired our own (Stefanucci & Geuss 2009) required subjects to imagine walking through the aperture before estimating its width, so as to engage the appropriate motor simulations. So, we made sure to ask our subjects to imagine walking through both apertures, on every trial, to ensure that both apertures would be “scaled” to the subject's aperture-passing abilities (Firestone & Scholl 2014b; Study 2). In other words, Hackel et al. simply have the facts wrong when they write that “the first aperture is meant to be passed through, whereas the second is not”; in truth, both apertures were viewed with passage in mind, just as the El Greco logic requires.
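The cancellation logic can be made concrete with a toy simulation. The sketch below is a hypothetical model for illustration only, not an analysis from any of the studies discussed: it assumes that holding the rod multiplies the perceived width of any aperture viewed with passage in mind by a single factor `k` (the value 0.8 is arbitrary), that a measuring tape is immune to this distortion, and that subjects adjust a comparison until its perceived width matches the target's. The function names, the 0.9 m width, and the bisection-style matching procedure are all illustrative assumptions.

```python
def adjust_until_match(perceived_target, perceive, lo=0.0, hi=10.0, tol=1e-9):
    """Bisect on the comparison's physical width (in meters) until its perceived
    width equals the perceived width of the target aperture.

    Assumes `perceive` is increasing in physical width over [lo, hi].
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if perceive(mid) < perceived_target:
            lo = mid  # comparison still looks too narrow: widen it
        else:
            hi = mid  # comparison looks at least as wide: narrow it
    return (lo + hi) / 2


def simulate(true_width, k):
    """Return (tape_setting, second_aperture_setting) under a hypothetical
    top-down distortion that scales perceived aperture width by k."""

    def perceive_aperture(w):
        return k * w  # assumed distortion, applied to every aperture scaled for passage

    def perceive_tape(w):
        return w  # assumed: the tape is not scaled to passability, so it is undistorted

    target = perceive_aperture(true_width)
    tape_setting = adjust_until_match(target, perceive_tape)
    aperture_setting = adjust_until_match(target, perceive_aperture)
    return tape_setting, aperture_setting


if __name__ == "__main__":
    for k in (1.0, 0.8):  # 1.0 = no rod; 0.8 = hypothetical rod-induced narrowing
        tape, aperture = simulate(true_width=0.9, k=k)
        print(f"k={k}: tape={tape:.3f} m, second aperture={aperture:.3f} m")
```

Under these assumptions, a genuine perceptual narrowing shows up on the tape measure (0.720 m with the rod versus 0.900 m without) but predicts no difference at all on the second aperture, because both apertures are narrowed alike and the subject stops adjusting when their physical widths are equal. An effect observed with the aperture-matching measure is thus exactly what the perceptual account cannot produce.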
The fact that this crucial methodological detail (which was explicitly stated in the experiment's procedures) escaped the notice of all 16 of this commentary's authors amplifies one of our core themes: The empirical literature on top-down effects has suffered from a shortage of attention to exactly these sorts of details. Moving forward, it will not be enough to simply report an effect of some higher-level state on some perceptual property and leave it at that, without care to rule out other, nonperceptual interpretations. If there is one unifying message running through our work on this topic, it is this: The details matter.
R3.2. Pitfall 2: Perception versus judgment
Our target article called for greater care in distinguishing perception from postperceptual judgment. For example, subjects who are asked how far, large, or fast some object is might respond not only on the basis of how the object looks, but also on the basis of how far or large or fast they think it is. Many commentators accepted this distinction and our suggestions for exploring it. However, multiple commentaries denied that this pitfall afflicts the research we discussed, because of special measures that allegedly rule out judgment conclusively.
R3.2.1. Does “action” bypass judgment?
At least two commentaries (Balcetis & Cole; Witt et al.) argued that so-called “action-based measures” can rule out postperceptual judgment as an alternative explanation of alleged top-down effects. For example, rather than verbally reporting how far away or how fast an object is, subjects could throw a ball (Witt et al. 2004) or a beanbag (Balcetis & Dunning 2010) to the object, or catch the object if it is moving (Witt & Sugovic 2013b). Witt et al. asserted that such measures directly tap into perception and not judgment: “Because this measure is of action, and not an explicit judgment, the measure eliminates the concern of judgment-based effects.”
But that assertion is transparently false: In those cases, actions not only can reflect explicit judgments, but also they often are explicit judgments. This may be easier to see in more familiar contexts where perception and judgment come apart. For example, objects in convex passenger-side mirrors are famously “closer than they appear,” and experienced drivers learn to account for this. Someone looking at an object through a mirror that they know distorts distances may see an object as being, say, 20 feet away, and yet judge the object to be only 15 feet away. Indeed, such a person might respond “15 feet” if asked in a psychology experiment how far away they think the object is. What about their actions? According to Witt et al.'s line of argument, once people are asked to throw a ball at the object, they will somehow forget everything they know about the mirror's distortion and simply throw the ball as far as the object looks, without correcting for that higher-level knowledge. But that seems absurd: Our object-directed actions can and do incorporate what we think, know, and judge – in addition to what we see – and there is no reason to think that the actions in Witt et al.'s various experiments are any different.¹
R3.3. Pitfall 3: Task demands and response bias
Certain points of disagreement with our commentators were not unexpected. For example, we anticipated having to defend the distinctions we drew between perception, attention, and memory (see sect. 3.5 and 3.6). We were genuinely surprised, however, that a few brave commentators rejected our recommendations about controlling for task demands (Balcetis & Cole; Clore & Proffitt). We suggested that many apparent top-down effects of cognition on perception arise because subjects figure out the purpose of such experiments and act compliantly (cf. Orne 1962), or otherwise respond strategically,[Footnote 2] and so we made what we thought were some mild recommendations for being careful about such things (e.g., actively asking subjects about the study's purpose, and taking measures to mask the purpose of the manipulations).
Balcetis & Cole rejected these recommendations as “fundamentally flawed,” and replaced them with “five superior techniques” of their own (see also Clore & Proffitt). Though we were happy to see these concrete details discussed so deeply, these new recommendations are no substitute for the nonnegotiable strategies of masking demand and carefully debriefing subjects about the experiment – and in many cases these supposedly “superior” techniques actually worsen the problem of demand.
R3.3.1. Asking…
A primary technique we advocated for exploring the role of demand in top-down effects is simply to ask the subjects what they thought was going on. This has been revealing in other contexts: For example, more than 75% of subjects who are handed an unexplained backpack and are then asked to estimate a hill's slant believe that the backpack is meant to alter their slant estimates (stating, e.g., “I would assume it was to see if I would overestimate the slope of the hill”; Durgin et al. 2012). It is hard to see what could be flawed about such a technique, and yet it is striking just how few studies (by our count, zero) bother to systematically debrief subjects as Durgin et al. (2009; 2012) show is necessary.
Clore & Proffitt suggested that asking subjects about their hypotheses once the experiment is over fails to separate hypotheses generated during the task from hypotheses generated post hoc. We are unmoved by that suggestion. First, the same studies that find most subjects figure out the experiment's purpose and change their estimates also find that those same subjects are driving the effect (Durgin et al. 2009). Second, Clore & Proffitt's observation actually makes debriefing a stronger test: If your effect is reliable even in subjects who never guess the experiment's purpose – whether during or after the experiment – then that is even more compelling evidence against a role for demand. There is no reason not to ask.
R3.3.2. …and telling
Balcetis & Cole specifically criticized our recommendation to use cover stories to mask the purpose of manipulations (e.g., telling subjects that a backpack carried electrodes or that a pole was for balance): “Alternative cover stories do not remove the opportunity to guess hypotheses, nor do they eliminate the possibility that participants will amend their responses in accordance with their conjectured suppositions. They simply introduce new task demands.”
We agree that there is no such thing as a demand-free environment, but what is the problem here? Alternative cover stories could be problematic only if the “conjectured suppositions” they imply would produce a directional bias in estimates. What is the implied direction in telling subjects that a backpack contains electrodes (Durgin et al. 2009) or that a pole is for keeping one's balance (Firestone & Scholl 2014b)? These cover stories eliminated the relevant effects; they didn't reverse them. Giving a cover story for the backpack, for example, led subjects to make the same slant estimates they made with no backpack at all. Is it really Balcetis & Cole's contention that when backpack-wearing subjects were given a cover story, they saw the slope as steeper but then intentionally lowered their estimates (for some unarticulated reason), and by precisely the amount required to make it look like there was no effect at all? That is the only possibility that would undermine the use of alternative cover stories, and accordingly we find the surprising resistance to this invaluable methodological technique to be uncritical and unfounded.
R3.3.3. Flawed alternatives
In place of cover stories and careful debriefing, Balcetis & Cole suggested five alternative techniques to combat demand. We briefly respond to each:
1. Accuracy incentives, including paying subjects extra money for correct responses: This technique sounds promising in principle, but it has foundered in practice. Balcetis and Dunning (2010), for example, told subjects they could win a gift card by throwing a beanbag closer to it than any other subject; subjects made shorter throws to a $25 gift card than a $0 gift card, suggesting that desirable objects look closer. But subjects care about winning valuable gift cards (and not worthless ones), and so they may have been differently engaged across these situations, or used different strategies. Indeed, follow-up studies showed that such strategic differences alone produce similar effects (Durgin et al. 2011a).[Footnote 3]
2. Counterintuitive behavioral responses, including standing farther from chocolate if it looks closer (Balcetis & Dunning 2010): Whether something is intuitive or counterintuitive is an empirical question, and one cannot be sure without investigation. Rather than potentially underestimating subjects' insights, we recommend asking them what they thought, precisely to learn just what is (counter)intuitive.
3. Between-subjects designs, so as not to highlight differences between conditions: Such designs can help, but on their own they are completely insufficient. The backpack/hill study, for example, employed a between-subjects design, and subjects readily figured out its purpose anyway (Bhalla & Proffitt 1999; Durgin et al. 2009; 2012).
4. Double-blind hypothesis testing: This is another good idea, but experimenter expectancy effects are different from task demands. Our concern is not that subjects may divine the study's purpose from the experimenter's behavior; it is that the task itself makes the purpose transparent. The simple act of giving subjects an unexplained backpack and asking them to estimate slant reveals the hypothesis no matter what the experimenter knows or doesn't know.
5. Dissociate measures from demand, for example by having subjects throw a beanbag to a $100 bill they can win in a later unrelated contest (Cole & Balcetis 2013): Again, this may be helpful in principle, but in practice it may cause more problems than it solves. Indeed, that same study also showed that subjects felt more excited or “energized” upon seeing the winnable $100 bill (compared with no bill) – and that sort of confounding factor could independently influence subjects' throws.
In short, we reject the contention that Balcetis & Cole's alternatives are “superior” – or even remotely sufficient. If an empirical paper implemented only their techniques, we would be entirely unconvinced – and you should be, too. The direct approach is the truly superior one: Cover stories have proven effective in exactly these circumstances, and they can and should be used in an unbiased way. And ever since Durgin et al. (2009), asking subjects what they think is simply mandatory for any such experiment to be taken seriously.
R3.4. Pitfall 4: Low-level differences (and amazing demonstrations!)
Low-level differences in experimental stimuli (e.g., shading, curvature) can be confounded with higher-level differences (e.g., race, being an animal/artifact; Levin & Banaji 2006; Levin et al. 2001), such that it is not always clear which is responsible for an apparent top-down effect. We showed that low-level factors must contribute to one such effect: African-American faces look darker than Caucasian faces even when the images are equated for mean luminance (Levin & Banaji 2006); however, when the faces are blurred, even subjects who do not appreciate race in the images still judge the African-American face to be darker than the Caucasian face (Firestone & Scholl 2015a), implying that low-level properties (e.g., the distribution of luminance) contribute to the effect.
Levin, Baker, & Banaji (Levin et al.) engaged with this critique in exactly the spirit we had hoped, and we thank them for their insightful and constructive reaction. However, we contend that they have misinterpreted both our data and theirs.
R3.4.1. Seeing race?
Levin et al.'s primary response was to suggest that subjects could still detect race after our blurring procedure, reporting above-chance performance in race identification in a two-alternative forced-choice (2AFC) between “Black” and “White.” But this response simply misunderstands the logic of our critique, which is not that it is completely impossible to guess the races of the faces, but rather that even those subjects who fail to see race in the images still show the lightness distortion. Our experiment gave subjects every opportunity to identify race in the images: We asked them to (a) describe the images in a way that could help someone identify the person; (b) explicitly state whether the races of the faces looked the same or different; (c) explicitly categorize the faces from a list of possible races; and (d) tell us if they ever thought about race but had been embarrassed to say so. Even those subjects who repeatedly showed no evidence of seeing race in the images (and indeed, even those subjects who explicitly thought the two images were of the same person) still judged the blurry African-American face to be darker.
Worse yet, 2AFC tasks are notoriously unreliable for higher-level properties such as race, which is why our own studies did not use them. Levin et al. concluded from above-chance performance in two-alternative racial categorization that “the blurring left some race-specifying information in the images.” But when you give subjects only two options, they can choose the “right” answer for the wrong reason or be prompted to look for information that hadn't previously seemed relevant – for example particular patterns of shading that they hadn't previously considered in a racial context.
For example, suppose that instead of blurring the images, we just replaced them with two homogeneous squares, one black and one white, and then we adapted Levin et al.'s paradigm to those images – so that the question was “Using your best guess, how would you differentiate these squares by race?” – and subjects had to choose which square was “African-American” or “Caucasian” (Fig. R3). In fact, we made this thought experiment an empirical reality, using 100 online subjects and the same parameters as Levin et al. All 100 subjects chose “African-American” for the black square and “Caucasian” for the white square.[Footnote 4] Do these results imply that subjects perceived race in these geometric shapes? Does it mean that “replacing the faces with homogeneous squares left some race-specifying information in the images”? Obviously not – but this is the same logic as in Levin et al.'s commentary. The mere ability to assign race when forced to do so doesn't imply that subjects actively categorized the faces by race; our data are still the only investigation of this latter question, and they suggest that even subjects who don't categorize the faces as African-American and Caucasian still experience distorted lightness.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170928100721-25606-mediumThumb-S0140525X16000029_fig3g.jpg?pub-status=live)
Figure R3. Two-alternative forced-choice judgments can produce seemingly reliable patterns of results even when subjects don't base their judgments on the property of interest. If you had to choose, which of these squares would you label “African-American,” and which would you label “Caucasian”?
R3.4.2. Other evidence
Levin et al. correctly note that we discussed only one of Levin and Banaji's (2006) many experiments in our target article, and they suggest that the other data (e.g., with line drawings equated for the distribution of luminance) provide better evidence. But we focused on Levin and Banaji's (2006) “demo” rather than their experiments not because it was easy to pick on, but rather because it was the most compelling evidence we had ever seen for a top-down effect – much more so than their other experiments, which suffered from an El Greco fallacy, weren't subjectively appreciable, and included a truly unfortunate task demand (in that subjects were told in advance that the study would be about “how people perceive the shading of faces of different races,” which may have biased subjects' responses). By contrast, their “demo” seemed like the best evidence they had found, and so we focused on it. A major theme throughout our project has been to focus not on “low-hanging fruit” but instead on the strongest, most influential, and best-supported cases we know of for top-down effects of cognition on perception. We happily include Levin and Banaji's (2006) inspiring work in that class.
R3.5. Pitfall 5: Peripheral attentional effects
Most commentaries agreed that peripheral effects of attention (e.g., attending to one location or feature rather than another) don't “count” as top-down effects of cognition on perception, because – like shifts of the eyes or head – they merely select the input to otherwise-impenetrable visual processing. (Most, but not all: Vinson et al. suggested that our wide-ranging and empirically anchored target article is undermined by the century-old duck–rabbit illusion. Believe it or not, we knew about that one already – and it, like so many other ambiguous figures, is easily explained by appeal to attentional shifts; Long & Toppino 2004; Peterson & Gibson 1991; Toppino 2003).
Other commentaries, however (especially Beck & Clevenger; Clark; Goldstone et al.; Most; Raftopoulos), argued that attention “does not act only in this external way” (Raftopoulos). Clark, for example, pointed to rich models of attention as “a deep, pervasive, and entirely nonperipheral player in the construction of human experience,” and asked whether attention can be written off as “peripheral.”
We are sympathetic to this perspective in general. That said, we find allusions to the notion that attention can “alter the balance between top-down prediction and bottom-up sensory evidence at every stage and level of processing” (Clark) to be a bit too abstract for our taste, and we wish that these commentaries had pointed to particular experimental demonstrations that they think could be explained only in terms of top-down effects. Without such concrete cases, florid appeals to the richness of attention are reminiscent of the appeals to neuroscience in section R2.3: They sound compelling in the abstract, but they may collapse under scrutiny (as in sect. R2.3.2).
In general, however, our claim is not that all forms of attention must be “peripheral” in the relevant sense. Rather, our claim is that at least some are merely peripheral, and that many alleged top-down effects on perception can be explained by those peripheral forms of attention. This is why Lupyan is mistaken in arguing that “Attentional effects can be dismissed if and only if attention simply changes input to a putatively modular visual system”; attention may be a genuine alternative explanation just as long as attention sometimes changes input to later visual processing – because then such attentional effects must be actively ruled out by careful experimental tests of the sorts sketched in our target article.
R3.5.1. On which side of the “joint” is attention?
So, what about those cases of attention that aren't like moving your eyes? To be sure, we think such cases are rarer than many commentaries imagine. For example, attending to features, rather than locations, may not be analogous to moving one's eyes, but it is importantly analogous to seeing through a tinted lens – merely increasing sensitivity to certain features rather than others. Across the core cases of attending to locations, features, and objects, both classical and contemporary theorizing understands that, fundamentally, “attention is a selective process” that modulates “early perceptual filters” (Carrasco 2011, pp. 1485–1486, emphasis added). That is what we mean when we speak of attention as constraining input: Attention acts as a filter that selects the information for downstream visual processing, which may itself be impervious to cognitive influence.
However, even if attention can go beyond this role and “alter the balance between top-down prediction and bottom-up sensory evidence at every stage and level of processing” (Clark), we find it odd to move from such sophisticated attentional processing to the further claim that perception is “cognitively penetrated” by attention (Raftopoulos). The controversy over top-down effects of cognition on perception is a controversy over the revolutionary possibility that what we see is directly altered by how we think, feel, act, speak, and so forth. But attention's role in perception simply cannot be revolutionary in this way: As Block noted in his commentary, “attention works via well-understood perceptual mechanisms” (emphasis his); and, as he has noted more informally, attention – unlike morality and hunger, say – is already extensively studied by vision science, and it fits comfortably within the orthodox framework of how the mind (in general) and perception (in particular) are organized. Our project concerns the “joint” between perception and cognition, and attention unquestionably belongs on the perception side of this joint. If some continue to think of attention as a nonperceptual influence on what we see, they can do so; but to quote Block out of context, “If this is cognitive penetration, why should we care about cognitive penetration?”
R3.6. Pitfall 6: Memory and recognition
Although many commentaries discussed the distinction between perception and memory – some suggesting that memory accounts for even more top-down effects than we suggested (Tseng et al.) – two commentaries in particular protested our empirical case studies of this distinction (perhaps unsurprisingly, given that those case studies involved their work; Gantman & Van Bavel; Lupyan). At the same time, these commentaries sent mixed signals: Both objected to our distinction between perception and memory, claiming that it “carves the mind at false joints” (Gantman & Van Bavel) because memory cannot be “cleanly split from perception proper” (Lupyan); but both then went to extraordinary lengths to try to rule out the memory-based interpretations we offered – apparently agreeing that such alternatives would undermine their claims. How compelling were these attempted rebuttals?
R3.6.1. “Moral pop-out” does not exist
Moral words are identified more accurately than random nonmoral words, which led Gantman and Van Bavel (2014) to claim that “moral concerns shape our basic awareness” (p. 29) – a claim that has since been upgraded to “human perception is preferentially attuned to moral content” (Gantman & Van Bavel 2015, p. 631; though see Firestone & Scholl 2016). However, the moral words in these studies were semantically related to each other (e.g., crime, punishment), whereas the nonmoral words were not (e.g., steel, ownership), which led us to suspect that semantic priming due to spreading activation – a phenomenon of memory rather than perception – might explain the effect. Sure enough, you can obtain “pop-out” effects with any arbitrary category of related words (Firestone & Scholl 2015b), including fashion (e.g., blouse, dress; pop-out effect: 8.6%) and transportation (e.g., car, bus; pop-out effect: 4.3%). (We also replicated the effect with morality; pop-out effect: 3.9%, which matched Gantman & Van Bavel's original report.) Moreover, although our experiments were not designed to test this (and our account does not require it), semantic priming was evident even at the trial-by-trial level, such that seeing a category word (whether fashion, transportation, or moral) on one trial boosted recognition of subsequent category words more than it boosted recognition of subsequent noncategory words (providing such a boost of 9% for fashion, 6% for transportation, and 5% for morality [which are the means that Gantman & Van Bavel requested, and which straightforwardly support our account]).
R3.6.2. Really, it doesn't
Whereas some of the empirical case studies we have explored turn on subtle details that may be open to interpretation, the “moral pop-out” case study has always seemed to us to be clear, unsubtle, and unusually decisive (and we have been pleased to see that others concur; e.g., Jussim et al. 2016). Gantman & Van Bavel disagreed, with three primary counterarguments. However, their responses respectively (1) mischaracterize our challenge, (2) cannot possibly account for our results, and (3) bet on possibilities that are already known to be empirically false. We briefly elaborate on each of these problems:
First, Gantman & Van Bavel write, “F&S recently claimed that semantic memory must be solely responsible for the moral pop-out effect because the moral words were more related to each other than the control words were.” We made no such claim, and we don't even think this claim makes sense: The relatedness confound alone doesn't mean that it “must be solely responsible” (emphasis added); it merely means that semantic priming could be responsible, such that Gantman and Van Bavel's (2014) original conclusions wouldn't follow. Nevertheless, we actively tested this alternative empirically: When we ran the relevant experiments, semantic relatedness in fact produced analogous pop-out effects. It was our experimental results, not the confound itself, that suggested that “moral pop-out” is really just semantic priming.
Second, they complain that subjects in our pop-out studies were not “randomly assigned” to the three experiments we ran (i.e., fashion, transportation, or morality). This was certainly true, insofar as these were three separate experiments. But surely that can't by itself be somehow disqualifying. After all, if this feature prevented our studies of morality and fashion from being interpreted as analogous, then by the same criteria, no two experiments conducted at different times or in different labs could ever be compared for any purpose – even just to suggest, as we do, that both experiments appear to be investigating the same thing.
More generally, the manner in which this second complaint was supposed to undermine our argument was completely unelaborated. So, let's evaluate this carefully: Just how could such “nonrandom” assignment undermine our interpretation that morality plays no role in “moral pop-out”? If we were claiming differences between the experiments, then nonrandom assignment could be problematic in a straightforward way: Perhaps one group of subjects was more tired or stressed out (etc.), and that factor explains the difference. But in fact we suggested that there is no evidence of any relevant differences among the various pop-out effects. Our explanation for this apparent equivalence is that the same underlying process (semantic priming) drives all of the effects, with no evidence that morality, per se, plays any role. Can a lack of random assignment explain this apparent equivalence differently? Such an explanation would have to assume that the “true” effect with morality in our experiments was in fact much larger than for the other categories (due to the morality-specific boost) but that this particular group of subjects (i.e., members of the Yale community tested in one month rather than another) somehow deflated this previously undiscovered “super-pop-out” down to … exactly the same magnitude (of 4%) that Gantman & Van Bavel (2014) previously reported. In other words, “random assignment” is a red herring here, and it cannot save the day for “moral pop-out.”
Third, Gantman & Van Bavel offer a final speculation about semantic priming to salvage their account, but in fact this speculation is demonstrably false. In particular, they suggest that semantic priming cannot explain moral pop-out because moral words cannot easily prime each other:
We suspect that moral words are not explicitly encoded in semantic memory as moral terms or as having significant overlapping content. For example, kill and just both concern morality, but one is a noun referring to a violent act and the other is an adjective referring to an abstract property. Category priming is more likely when the terms are explicitly identifiable as being in the same category or at least as having multiple overlapping semantic features (e.g., pilot, airport). (Gantman & Van Bavel, para. 8)
But this novel suggestion completely misconstrues the nature of spreading activation in memory. Semantic priming is a phenomenon of relatedness – not of being “explicitly identifiable as being in the same category” – and it works just fine between nouns and adjectives (though kill, of course, is more commonly a verb, not a noun). Our own fashion words, for example, included words from multiple parts of speech and varying levels of abstractness (e.g., wear, trendy, pajamas), and they had no difficulty priming each other. And the moral words included justice, law, illegal, crime, convict, guilty, jail, and so on – words so related as to practically constitute a train of thought. In short, Gantman & Van Bavel's speculation in this domain effectively requires that law and illegal would not activate each other via associative links in semantic memory, but this seems counter to everything we know about how semantic priming works.
R3.6.3. Labels and squiggles
Applying “labels” to meaningless squiggles (i.e., thinking of two unfamiliar symbols as a rotated 2 and a rotated 5) makes them easier to find in a search array (Lupyan & Spivey 2008). Is this a “conceptual effect on visual processing” (Lupyan 2012)? Or does thinking of the symbols as familiar items just make it easier to remember what you're looking for? Klemfuss et al. (2012) – highlighted as one of our case studies – demonstrated the latter: When the task is repeated with a copy of the target symbol on-screen (so that one needn't remember it), the “labeling” advantage disappears; moreover, such labeling fails to improve visual processing of other features of the symbols that don't rely on memory (e.g., line thickness).
Lupyan agreed that the on-screen cue eliminated the labeling advantage but also noted that it slowed performance relative to the no-cue condition. This is simply irrelevant: The cue display was more crowded initially, and it included a stronger orienting signal (a large cue vs. a small fixation cross), both of which may have affected performance. What matters is the interaction: holding fixed the presence of the cue, labels had no effect, contra Lupyan's account.
More generally, though, Lupyan's suggestion that our memory explanation and his “retuning of visual feature detectors” explanation (Lupyan & Spivey 2008; Lupyan et al. 2010; Lupyan & Ward 2013) are “exactly the same” is oddly self-undermining. If these effects really are explained by well-known mechanisms of memory as we suggested (see also Chen & Proctor 2012), then none of these new experiments needed to be done in the first place, because semantic priming has all of the same effects and has been well characterized for nearly half a century. By contrast, we think Lupyan's exciting and provocative work raises the revolutionary possibility that meaningfulness per se reaches down into visual processing to change what we see; but if this revolution is to be achieved, mere effects of memory must be ruled out.
R4. Whac-a-Mole
We find the prospect of a genuine top-down effect of cognition on perception to be exhilarating. In laying out our checklist of pitfalls, our genuine hope is to discover a phenomenon that survives them – and indeed many commentators suggested they had found one. On the one hand, we are hesitant to merely discuss (rather than empirically investigate) these cases, for the same reason that our target article focused so exclusively on empirical case studies: We sincerely wish to avoid the specter of vague “Australian stepbrothers” (Bruner & Goodman 1947; see sect. 5.1) that merely could explain away these effects, without evidence that they really do. What we really need are new empirical case studies (and we have plenty more in the works; e.g., see Firestone & Scholl 2015c). On the other hand, we have strong opinions about many of the cases raised in the commentaries – and it wouldn't be sporting to ignore them. So here, we'll discuss several of the most provocative, most compelling, best-supported cases that were raised.
In general, this part of the conversation feels a bit like the children's game of “Whac-a-Mole” (see Fig. R4): Even if you manage to whack one top-down effect, another immediately pops up to replace it. Our hope is that, by highlighting the six-pitfall checklist, such mole-whacking may occur preemptively, such that “only research reports that pass (or at least explicitly address) F&S's six criteria can henceforth become part of the serious theoretical conversation” (Cutler & Norris). For now, we'll play Whac-a-Mole – both for general phenomena (sect. R4.1) and specific studies (sect. R4.2).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170928100721-98252-mediumThumb-S0140525X16000029_fig4g.jpg?pub-status=live)
Figure R4. Some excited people playing Whac-a-Mole.
R4.1. General phenomena
Over and above particular studies that some commentators believed escape our pitfalls, many commentaries focused on general psychological phenomena that may or may not be top-down effects of cognition on perception.
R4.1.1. Inattentional blindness and emotion-induced blindness
Most suggested that failures to see what is right in front of us when our attention is otherwise occupied (by a distracting task or an emotional image) are examples of cognition penetrating perception (Most et al. 2001; 2005b; Most & Wang 2011). We think these phenomena are fascinating – so much so that we wish we studied them ourselves. (Well, one of us [CF] wishes that; the other [BJS] does work on this topic [e.g., Ward & Scholl 2015] and thinks the second author of Most et al. 2005b made a valuable contribution.) At any rate, both of us think that “inattentional blindness” is aptly named: It is clearly a phenomenon of selective attention, occurring when attention is otherwise occupied. As such, it is exactly the sort of input-level effect that does not violate the encapsulation of seeing from thinking, per section R3.5.
R4.1.2. Hallucinations and delusions
In looking for top-down effects of cognition on perception, Howe & Carter suggested that “hallucinations are one example that clearly meets this challenge.” However, Ross, McKay, Coltheart, & Langdon (Ross et al.) disagreed, arguing that two-factor theories of such abnormal psychological states “are not committed to perception being cognitively penetrable,” and, indeed, “are predicated on a conceptual distinction (and empirical dissociation) between perception and cognition.” We agree with Ross et al. In evaluating a candidate top-down effect on perception, it is important to consider exactly what the “top” is supposed to be. If anything, hallucinations show many of the hallmarks of inflexible processing: After all, many patients who experience hallucinations find them to be intrusive in their daily lives and unresponsive to the patient's wishes that they would disappear.
O'Callaghan et al. suggested that hallucinations must be examples of cognitive penetrability because they incorporate autobiographical information, including visions of “familiar people or animals” such as a “deceased spouse during a period of bereavement.” But this analysis conflates higher-level expectations with lower-level priming and long-term sensitivity changes; it is no coincidence, after all, that O'Callaghan et al. used “familiar” items as examples. Again, it is equally important to consider the content that hallucinations do not incorporate, and the states they are not sensitive to – including the very higher-level wishes and desires that would make these genuine top-down effects of cognition on perception.
R4.1.3. Motor expertise
Cañal-Bruland et al. observed that, for an unskilled baseball player facing a pitch, “the information you attune to for guiding your batting action would be crucially different from the information the expert attunes to and uses,” but then they assert without argument that “this is the perfect example for no change in visual input but a dramatic change in visual perception” (emphasis theirs). (See also Witt et al.'s colorful quote by Pedro Martinez.) Why? Why is this perception at all, rather than a change in action or attentional focus? This is exactly what remains to be shown. Our core aim is to probe the distinctions between processes such as perception, memory, attention, action, and so on; these are the distinctions Cañal-Bruland et al. simply ignore.
R4.1.4. Mental imagery
Howe & Carter suggested that, in mental imagery, “perception is obviously affected by top-down cognition” (see also de Haas et al. and Esenkaya & Proulx). We don't find this so obvious. First, Howe & Carter assert that “people actually see the mental images as opposed to merely being aware of them,” without acknowledging that this is one of the single most controversial claims in the last half-century of cognitive science (for a review in this journal, see Pylyshyn 2002). But second, so what? Even if mental imagery and visual perception share some characteristics, they differ in other ways, including vividness, speed, effortfulness, and so on, and these differences allow us to distinguish visual imagery from visual perception. As Block argues about imagery:
If this is cognitive penetration, why should we care about cognitive penetration? Mental imagery can be accommodated to a joint between cognition and perception by excluding these quasi-perceptual states, or alternatively, given that imagery is so slow, by excluding slow and effortful quasi-perceptual states. (Block, para. 5)
We agree.
R4.1.5. Sinewave speech
Seemingly random electronic-sounding squeaks can be suddenly and strikingly segmented into comprehensible speech when the listener first hears the unambiguous speech from which the squeaks were derived (Remez et al. 1981). Vinson et al. ask: “Is this not top-down?”
Maybe not. The role of auditory attention in such “sinewave speech” is still relatively unknown: Even Vinson et al.'s citation for this phenomenon (Darwin 1997) rejects the more robustly “top-down” interpretation of sinewave speech and instead incorporates it into a framework of “auditory grouping” – an analogy with visual grouping, which is a core phenomenon of perception and not an example of cognitive penetrability.
But maybe so. In any case, this is not our problem: As many commentaries noted, our thesis is about visual perception, “the most important and interesting of the human modalities” (Keller), and it would take a whole other manifesto to address the also-important-and-interesting case of audition. Luckily, Cutler & Norris have authored such a manifesto, in this very journal (Norris et al. 2000) – and their conclusion is that in speech perception, “feedback is never necessary.” Bottoms up to that!
R4.1.6. Multisensory phenomena
Though the most prominent crossmodal effects are from vision to other senses (e.g., from vision to audition; McGurk & MacDonald 1976), de Haas et al. and Esenkaya & Proulx pointed to examples of other sense modalities affecting vision as evidence for cognitive penetrability. For example, a single flash of light can appear to flicker when accompanied by multiple auditory beeps (Shams et al. 2000), and waving one's hand in front of one's face while blindfolded can produce illusory motion (Dieter et al. 2014). Are these top-down effects of cognition on perception?
We find it telling that none of these empirical reports themselves connect the findings with issues of cognitive penetrability. Indeed, these effects show the very same inflexibility that visual perception itself shows, and in fact they don't work with mere higher-level knowledge; for example, merely knowing that someone else is waving his or her hand in front of your face does not produce illusory motion (Dieter et al. 2014). Instead, these are straightforwardly effects of perception on perception.
R4.1.7. Drugs and “neurosurgery”
Some commentaries pointed to influences on perception from more extreme sources, including powerful hallucinogenic drugs (Howe & Carter) and even radical “neurosurgery” (Goldstone et al.). Whether raised sincerely or in jest, these cases may be exceptions that prove the rule: If the only way to get such spectacular effects on perception is to directly alter the chemical and anatomical makeup of the brain, then this only further testifies to the power of encapsulation and how difficult it is to observe such effects in healthy, lucid, un-operated-on observers.
R4.2. Particular studies
Beyond general phenomena that may bear on the relationship between seeing and thinking, some commentaries emphasized particular studies that they felt escaped our six pitfalls.
R4.2.1. Action-specific perception in Pong
Subjects judge a ball to be moving faster when playing the game Pong with a smaller (and thus less effective) paddle (e.g., Witt & Sugovic 2010). Witt et al. advertised this effect as “An action-specific effect on perception that avoids all pitfalls.” We admire many of the measures this work has taken to address alternative explanations, but it remains striking that the work has still failed to apply the lessons from research on task demands. To our knowledge, subjects in these studies have never even been asked about their hypotheses (let alone told a cover story), nor have they been asked how they make their judgments. This could really matter: For example, other work on action-specific perception in similarly competitive tasks has shown that subjects blame the equipment to justify poor performance (Wesp et al. 2004; Wesp & Gasper 2012); could something similar be occurring in the Pong paradigm, such that subjects say the ball is moving faster to excuse their inability to catch it?
R4.2.2. Perceptual learning and object perception
Emberson explored a study of perceptual learning and object segmentation showing that subjects who see a target object in different orientations within a scene during a training session are subsequently more likely to see an ambiguous instance of the target object as completed behind an occluder rather than as two disconnected objects (Emberson & Amso 2012).
We find this result fascinating, but we fail to see its connection to cognitive (im)penetrability. (And we also note in passing that almost nobody in the rich field of perceptual learning has discussed their results in terms of cognitive penetrability.) Emberson quotes our statement that in perceptual learning, “the would-be penetrator is just the low-level input itself,” but then seems to interpret this statement as referring to “simple repeated exposure.” But, as we wrote right after this quoted sentence, “the thesis of cognitive impenetrability constrains the information modules can access, but it does not constrain what modules can do with the input they do receive.” Indeed, we suspect that we have the same rich view of perceptual learning as Emberson does, such that perceptual learning may incorporate all sorts of sophisticated processing in extracting the statistics of the environment. Nevertheless, the “top” in this putative top-down effect is simply the statistical regularities of the environment.
R4.2.3. Energy and slant perception
Clore & Proffitt suggested that recent studies of energy and slant perception overcome demand characteristics in past research, pointing to studies of sugary beverages and estimated hill slant (Schnall et al. 2010), and quasi-experimental designs linking body weight with slant estimates of a staircase (Taylor-Covill & Eves 2016). Our target article already discussed studies of sugar and slant, in which subjects who drank a sugary beverage judged a hill to be less steep (Schnall et al. 2010). For reasons that remain unclear, all subjects in those studies also wore heavy backpacks, regardless of whether they drank a sugary beverage, and we suggested that the sugar manipulation may have interacted with the demand from the backpack. Clore & Proffitt wrote, “we are not aware of any data supporting glucose effects on susceptibility to demand.” But their own commentary cited those very data: Durgin et al. (2012) empirically demonstrated this by showing that instructing subjects to ignore the backpack not only eliminates the backpack's effect on slant estimates (which is not so surprising), but also eliminates the effects of the sugar manipulation – which is quite surprising indeed, if one thinks that sugar affects perceived slant all on its own.
In another study that Clore & Proffitt discussed, subjects were recruited at a train station and were visually classified as overweight (i.e., having a high body-mass index [BMI]) or healthy (i.e., having a normal BMI). Overweight subjects estimated a staircase as steeper (Taylor-Covill & Eves 2016), even though there was no overt manipulation to create experimental demand. However, this quasi-experimental design entailed nonrandom assignment of subjects (in a way that actually matters, due to a claimed difference; cf. Gantman & Van Bavel), and data about subjects' height and posture (etc.) were not reported, even though such variables correlate with BMI (Garn et al. 1986) and may alter subjects' staircase-viewing perspective. But more broadly, it's not clear which direction of this effect supports the view that effort affects perception. The purpose of stairs, after all, is to decouple steepness from effort, and in fact steeper staircases are not always harder to climb than shallower staircases, holding fixed the staircase's height of ascent. Indeed, the 23.4° staircase used in Taylor-Covill and Eves' (2016) study is actually less steep than the energetically optimal staircase steepness of 30° (Warren 1984), meaning that, if anything, perceiving the staircase as steeper (as the high-BMI subjects did) is actually perceiving it as easier to climb, not harder to climb. In other words, this effect is in the wrong direction for Clore & Proffitt's account!
R4.2.4. Categorization and inattentional blindness
Most reviewed evidence that categorization of a stimulus (e.g., as a number or a letter) can change the likelihood that we will see it in the first place (Most 2013). But this study manipulated categorization by changing the way the stimulus itself looked (in particular, its orientation) – the kind of low-level difference (Pitfall 4) that can really matter. Better to use a truly ambiguous stimulus (such as the B/13 stimulus employed in other top-down effects; e.g., Balcetis & Dunning 2006).
R4.2.5. Memory color
We are very impressed by Witzel, Olkkonen, & Gegenfurtner's (Witzel et al.'s) reports that the remembered color of an object alters its perceived color – such that, for example, subjects who must set a banana's color to be gray in fact make it a bit blue (Hansen et al. 2006). This work (unlike most recently alleged top-down effects) comes from our own field and applies the rigor of vision science in careful ways that are sensitive to many of our concerns. So, what explains it?
Even if gray bananas truly look yellow, that needn't imply cognitive penetrability; it could instead simply be a form of perceptual learning, as we explored earlier (see also Deroy 2013). Still, deep puzzles remain about the nature of this effect. For example, Hansen et al. (2006) themselves note that these memory-color effects are many times larger than established discrimination thresholds, and yet the effect fails to work as a subjectively appreciable demo: A gray banana all on its own just doesn't look yellow, and it certainly does not look as yellow as the results imply. This suggests that some kind of response bias could be involved. Because the subjects' task in these experiments is to adjust the banana's color to look gray, one possibility is that subjects are overcorrecting: They see a banana, they know that bananas are yellow, and so they try to make sure that all of the yellow is gone, which ends up producing a slightly blue banana. Another, less skeptical, possibility is that the effect of memory color is also an effect on memory – of gray. In other words, the gray standard that subjects have in mind as their adjustment goal may change depending on the object's identity, rather than the perceived color of the object itself (see Zeimbekis 2013; though see also Macpherson 2012).
Either way, we wonder whether this effect is truly perceptual, and we are willing to “pre-register” an experiment in this response. We suspect that, after adjusting a banana to look gray (but in fact be blue), subjects who see this bluish banana next to (a) an objectively gray patch and (b) a patch that is objectively as blue as the bluish (but supposedly gray-looking) banana will be able to tell that the banana is the same color as the blue patch, not the gray patch. Conversely, we suspect that subjects who see an objectively gray banana next to (a) an objectively gray patch and (b) a patch that is as yellow as the magnitude of the memory color effect will be able to tell that the banana is the same color as the gray patch, not the yellow patch (as we can in Witzel et al.'s figure; see Footnote 5). At any rate, Witzel et al.'s account makes strong predictions to the contrary in both cases.
R5. Concluding remarks
We have a strong view about the relationship between seeing and thinking. However, the purpose of this work is not to defend that view. Instead, it is to lay the empirical groundwork for discovering genuinely revolutionary top-down effects of cognition on perception, which we find to be a truly exhilarating possibility. We hope we have highlighted the key empirical tools that will matter most for discovering and evaluating such effects, and that we have further shown how these tools can be employed in concrete experiments. We contend that no study has yet escaped our pitfalls; but if this framework helps reveal a genuine top-down effect of cognition on perception, we will be thrilled to have played a part.
Target article
Cognition does not affect perception: Evaluating the evidence for “top-down” effects
Author response
Seeing and thinking: Foundational issues and empirical horizons