Firestone & Scholl (F&S) rely on an outdated view of the mind to argue against top-down influences on perception. We highlight three of their assumptions that are untenable given contemporary neuroscience evidence: that the brain is modular, reflexively stimulus-driven, and context-independent. This evidence undermines their leap from critiques of individual studies to the conclusion that cognition does not affect perception.
The brain is not modular
F&S assume that the words “cognition” and “perception” refer to distinct types of mental processes (“natural kind” categories; Barrett 2009) localized to spatially distinct sets of neurons in the brain, sometimes called modules or mental organs (Fodor 1983; Gall 1835; Pinker 1997). As an intuition pump, the authors ask readers to “imagine looking at an apple in a supermarket and appreciating its redness” (sect. 1, para. 3). “That is perception,” they suggest, in contrast with appreciating the apple's price, which they classify as “cognition.” This modular view assumes that the brain's visual processing is “encapsulated” from nonperceptual influences. For example, F&S propose that context effects on perception are fully encapsulated within the visual system and therefore are not meaningful examples of top-down effects.
This approach uses phenomenology as a guide to scientific insight, which epitomizes naive realism: the belief that one's experiences reveal the objective realities of the world (Hart et al. 2015; Ross & Ward 1996). The distinctive experiences of seeing and thinking do not reveal a natural boundary in brain structure or function. The idea that the brain contains separate “mental organs” stems from an antiquated view of neuroanatomy (see Finger 2001 for a history of neuroanatomy). Modern neuroanatomy reveals that the brain is better understood as one large, interconnected network of neurons, bathed in a chemical system, that can be parsed as a set of broadly distributed, dynamically changing, interacting systems (Marder 2012; Sporns 2011; van den Heuvel & Sporns 2013). These systems are domain general: Their interactions constitute mental phenomena that we consider distinct, such as perception, cognition, emotion, and action (for discussions, see Anderson 2014; Barrett 2009; Barrett & Satpute 2013; Lindquist & Barrett 2012; Pessoa 2014; Yeo et al. 2015). For example, Figure 1 displays a meta-analytic summary of more than 5,600 neuroimaging studies from the Neurosynth database (www.neurosynth.org; Yarkoni et al. 2011), showing brain “hot spots” with a consistent increase in activity across a wide variety of tasks spanning the domains of perception, cognition, emotion, and action (for other evidence, see Yeo et al. 2015).
Seemingly distinct mental phenomena are implemented as dynamic brain states rather than as individual, static mental organs, which violates the assumption that the mind has intuitive “joints.”
Figure 1. Results of a forward inference analysis, revealing hot spots in the brain that are active across 5,633 studies from the Neurosynth database. Activations are thresholded at FWE p < 0.05. Figure taken from Clark-Polner et al. (2016).
The brain is not reflexively “stimulus driven”
F&S assume that perception is reflexively driven by sensory inputs from the world, commonly referred to as “bottom-up input.” For example, they describe cross-modal effects and context effects on perception as occurring “reflexively” based on visual input alone. But again, neuroanatomy is inconsistent with claims of reflexiveness. Cortical cytoarchitecture is linked to information flow within the brain (see Barbas 2015; for discussion, see Chanes & Barrett 2016) and shows that representations of the past, created in the vast repertoire of connectivity patterns within the cortex (referred to as “top-down” and colloquially called “memory” or “cognition”), are always involved in perception and often dominate it. Vision is largely a top-down affair (e.g., Gilbert & Li 2013). For example, by most estimates, only about 10% of the synapses onto neurons in primary visual cortex originate in the thalamus, which relays sensory input from the retina; the remaining 90% originate in the cortex itself (Peters 2002). Indeed, a bottom-up, reactive brain would be metabolically expensive and anatomically infeasible (see, e.g., Sterling & Laughlin 2015). F&S dismiss top-down connections as irrelevant to their argument because knowledge of anatomical connections is “common ground for all parties” (sect. 2.4, para. 2), so these connections cannot be “revolutionary.” Their novelty is beside the point, however: F&S are arguing for a position that violates the functional architecture of the brain.
Top-down anatomical connections are consistent with a predictive brain that models the world through active inference (e.g., Bar 2007; Barrett & Simmons 2015; Clark 2013; den Ouden et al. 2012; Friston 2010; Rao & Ballard 1999). Predictive processing not only allows for, but is predicated on, the existence of top-down effects. Specifically, the brain generatively combines past experiences to continually construct predictions about the world, which serve as Bayesian priors that are weighed against incoming sensory input. The brain then refines its predictions according to the mismatch between prediction and input (prediction error). This means that top-down influences typically drive perception and are constrained or corrected by incoming sensory input, rather than the other way around. Indeed, when humans and nonhuman animals change their expectations, sensory neurons change their firing patterns (e.g., Alink et al. 2010; Egner et al. 2010; Makino & Komiyama 2015; for a discussion, see Chanes & Barrett 2016).
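The prediction-and-correction cycle described above can be illustrated with the simplest textbook case of Bayesian updating, in which both the prediction (prior) and the sensory input (likelihood) are assumed to be Gaussian. This is a toy sketch under that assumption, not a model of cortical processing or of any cited author's method; the function name `update_prediction` and all numbers are hypothetical.

```python
def update_prediction(prior_mean, prior_var, observation, sensory_var):
    """One step of precision-weighted updating with a Gaussian prior and likelihood.

    prior_mean, prior_var: the prediction and its uncertainty (variance).
    observation, sensory_var: the sensory sample and its noise (variance).
    Returns the revised prediction, its variance, and the prediction error.
    """
    prediction_error = observation - prior_mean
    # Fraction of the prediction error used to revise the prediction: small when
    # the prediction is precise relative to the input, large when the input is
    # more precise than the prediction.
    gain = prior_var / (prior_var + sensory_var)
    posterior_mean = prior_mean + gain * prediction_error
    posterior_var = (1.0 - gain) * prior_var
    return posterior_mean, posterior_var, prediction_error


# Hypothetical numbers: a fairly confident prediction (variance 1) meets a
# noisier sensory sample (variance 3).
mean, var, error = update_prediction(
    prior_mean=10.0, prior_var=1.0, observation=14.0, sensory_var=3.0
)
print(mean, var, error)  # 11.0 0.75 4.0
```

Here the prediction error of 4 moves the estimate only a quarter of the way toward the input, so the percept remains closer to the prediction than to the raw sample; the input constrains the prediction rather than replacing it.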
Although F&S acknowledge unconscious, reflexive inference in the visual system, they dispute the idea that “cognitive inferences” shape perception. Their distinction between reflexive visual inference and cognitive inference again draws a boundary, rooted in naive realism, between reflex and volition (Descartes 1649/1989). It has long been known that the distinction between automatic and controlled processing (or System 1 and System 2) is primarily phenomenological (for a discussion, see Barrett et al. 2004). The brain's control networks are involved in processing prediction error, applying attention to neurons to shape which inputs from the world are treated as information and which as noise (Barrett & Simmons 2015; Behrens et al. 2007; Gottlieb 2012; Pezzulo 2012). As a result, these networks are always engaged to some degree, and relative differences in their activity should not be confused with “activation” and “deactivation,” or with “on” and “off.” It is more consistent with neuroanatomy to assume a continuum of brain modes, with one end characterized by brain states constructed primarily with prediction (e.g., phenomena called “memory,” “daydreaming,” and “mind wandering”), the other end characterized by brain states where prediction error dominates (e.g., novelty processing and learning), and a range of gradations in between. Evidence that the brain is active, not merely reactive, undermines the idea that perception is a bottom-up reflex isolated from cognition.
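The idea of a graded continuum, rather than an on/off switch, can be sketched under a simple Gaussian model of Bayesian updating (an assumption made here purely for illustration). In that model, the weight given to prediction error depends on the relative precision of prediction and input, and it varies smoothly without ever reaching exactly 0 or 1. The function name and the variance values below are hypothetical and are not measurements of brain activity.

```python
def prediction_error_weight(prior_var, sensory_var):
    """Weight on prediction error under a Gaussian model.

    Values near 0: the state is built mostly from prediction.
    Values near 1: the state is dominated by prediction error.
    """
    return prior_var / (prior_var + sensory_var)


# Hypothetical sweep: the prediction becomes less precise (larger variance)
# relative to a fixed level of sensory noise.
prior_variances = (0.01, 0.1, 1.0, 10.0, 100.0)
weights = [prediction_error_weight(v, sensory_var=1.0) for v in prior_variances]

for v, w in zip(prior_variances, weights):
    print(f"prior variance {v:>6}: weight on prediction error = {w:.2f}")
# Weights printed: 0.01, 0.09, 0.50, 0.91, 0.99
```

Every setting mixes prediction and prediction error; what changes across the continuum is only their relative weighting.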
The brain is context-dependent
In discussing their “El Greco” fallacy, F&S implicitly assume that top-down effects would uniformly influence all elements in a visual field. This assumption reflects a fundamental misunderstanding of how the brain constructs Bayesian inferences and, in particular, neglects the role of context. The authors argue that if an aperture looks narrower to a participant who must pass through it while holding a wide rod (Stefanucci & Geuss 2009), then a second aperture, which the participant adjusts to match the perceived width of the first but is not required to pass through, should look similarly narrow, at least while the participant is holding the same rod. Following the authors' logic, these two distortions should cancel out, leaving no measurable effect of holding the rod on width estimates. However, the first aperture is meant to be passed through, whereas the second is not, and it is well known that requirements for action strongly shape moment-to-moment processing (Cisek & Kalaska 2010). To assume that the top-down influence on width estimates would be the same under these distinct conditions, and would therefore cancel out, is to misunderstand how top-down effects operate.
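The cancellation argument can be made concrete with a toy calculation. The sketch assumes, hypothetically, that perceived width equals physical width multiplied by a context-dependent scale factor, and that the participant adjusts the comparison aperture until the two perceived widths match. The widths, the scale value, and the function name `matched_width` are illustrative assumptions, not values from Stefanucci and Geuss (2009).

```python
def matched_width(target_width, target_scale, comparison_scale):
    """Physical width at which the comparison aperture looks as wide as the target.

    Hypothetical model: perceived width = physical width * scale.
    The participant adjusts the comparison until
        comparison_width * comparison_scale == target_width * target_scale.
    """
    return target_width * target_scale / comparison_scale


TARGET = 60.0     # hypothetical physical width of the target aperture (cm)
ROD_SCALE = 0.9   # hypothetical narrowing while holding the rod

# F&S's assumption: the rod distorts both apertures equally, so the effect cancels.
uniform = matched_width(TARGET, ROD_SCALE, ROD_SCALE)

# Context-dependent alternative: only the aperture that must be passed through
# is distorted; the comparison aperture, which carries no action requirement, is not.
task_specific = matched_width(TARGET, ROD_SCALE, 1.0)

print(f"uniform distortion:       comparison set to {uniform:.1f} cm")
print(f"task-specific distortion: comparison set to {task_specific:.1f} cm")
```

Under uniform distortion the comparison is set to the true width (60.0 cm) and the effect disappears; when the distortion depends on the action requirement, the comparison is set narrower (54.0 cm), so a measurable effect of the rod is expected rather than ruled out.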
Context shapes the Bayesian “priors” that inform the brain's sensory predictions (Clark 2013; Friston 2010). As a consequence, sensory neurons behave differently when involved in contextually distinct perceptual tasks (Gilbert & Sigman 2007). Requirements for action, dictated by the task context, exert a top-down influence on the processing of visual information. In light of these considerations, research questions should shift from the global, undifferentiated “Are there top-down effects on perception?” to the more nuanced “In what contexts, and at what level in a hierarchy of predictions, do different top-down effects emerge?”
We agree with F&S that some studies of top-down effects may capture processes that are not traditionally categorized as perception (e.g., the impact of demand characteristics on judgment), and that studies designed for disconfirmation are an important part of theory testing. But the main thrust of their critique rests on folk categories of perception and cognition, reified in a view of the brain as modular, reactive, and context-insensitive. These assumptions, and the conclusion they support, are untenable given the considerable neuroscientific evidence that processing in the human brain is distributed, active, and exquisitely sensitive to context.
ACKNOWLEDGMENTS
This manuscript was supported by a US National Institute on Aging grant (R01AG030311), a US National Institute of Child Health and Human Development grant (R21 HD076164), and contracts from the US Army Research Institute for the Behavioral and Social Sciences (contracts W5J9CQ12C0049 and W5J9CQ11C0046) to Barrett. The views, opinions, and findings contained in this article are those of the authors and should not be construed as an official position, policy, or decision of the US National Institutes of Health or Department of the Army unless so designated by other documents.