According to the predictive coding view, at every level of the visual/cortical hierarchy, there are two kinds of units: error units and representation units. Representations propagate downward in the visual hierarchy whereas error signals propagate upward. Error in this sense might be better called “discrepancy,” since it is the discrepancy between what the visual system predicts (at a given level) and what is represented at that level. Clark advertises the predictive coding (PC) framework as applying to a wide range of phenomena, including attention, which Clark says “is achieved by altering the gain (the ‘volume,’ to use a common analogy) on the error-units” (sect. 2.3, para. 6). We argue that for many attentional phenomena, the predictive coding picture either makes false predictions, or else it offers no distinctive explanation of those phenomena, thereby reducing its explanatory power.
Consider a basic result in this area (Carrasco et al. 2004), which is that attention increases perceived contrast by enhancing “the representation of a stimulus in a manner akin to boosting its physical contrast” (Ling & Carrasco 2006, p. 1243). A cross-modal study using auditory attention-attractors (Störmer et al. 2009) showed that the contrast-boosting effect correlated with increased activity in early stages of visual processing that are sensitive to differences in contrast among stimuli: the larger the cortical effect, the larger the effect on perceivers' judgments. Increasing the contrast of a stimulus increases the magnitude of perceptual adaptation to that stimulus, causing greater threshold elevation in the tilt after-effect and longer recovery time. Ling and Carrasco (2006) showed that attending to a stimulus while adapting to it has the same effect as increasing the contrast of the adapting stimulus. After attending to the adaptor (70% contrast), the contrast sensitivity of all observers was equivalent to that produced by adapting to an 81–84% contrast adaptor.
How do these results look from a PC perspective? Suppose that at time t₁ the perceiver is not attending to the left side of space but nonetheless sees a striped grid on the left with an apparent contrast of 70%. Because there is no movement or other change, the visual system predicts that at time t₂ the patch will continue at 70%. But at t₂ the perceiver attends to the patch, raising the apparent contrast to, say, 82%. Now at t₂ there is an error, a discrepancy between what is predicted and what is “observed.” Since the PC view says attention turns up the volume on the error representations, it predicts that at t₃ the signal (the represented contrast) should rise even higher than 82%. But that does not happen.
There are two important lessons. First, the initial changes due to attending come before there is an error (at t₂ in the example), so the PC viewpoint cannot explain them. Second, the PC view makes the false prediction that the changes due to attending will be magnified.
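The t₁ to t₃ argument reduces to a few lines of arithmetic. The sketch below is a toy, deliberately literal rendering of the reading the argument targets, under hypothetical assumptions introduced only for illustration: the represented contrast at the next step is the prediction plus the prediction error multiplied by a gain, and attention raises that gain above 1. It is not an implementation of any published PC model, and `represented_next` is an illustrative name.

```python
# Toy, deliberately literal rendering of the "gain on error units" reading.
# Hypothetical assumptions: the next represented value equals the prediction plus
# the prediction error (observed - predicted) scaled by a gain, and attention sets
# the gain above 1. Not an implementation of any published PC model.

def represented_next(predicted: float, observed: float, gain: float) -> float:
    """Prediction plus gain-weighted prediction error (the 'discrepancy')."""
    error = observed - predicted
    return predicted + gain * error

# Before attention shifts, prediction and observation agree (70% contrast):
# the error is zero, so no gain setting can produce a change.
print(represented_next(70.0, 70.0, gain=2.0))  # 70.0

# At t2 attention raises apparent contrast to 82% against a predicted 70%.
# With gain > 1, this reading pushes the represented contrast at t3 above 82%.
for gain in (1.0, 1.5, 2.0):
    print(f"gain={gain}: contrast at t3 = {represented_next(70.0, 82.0, gain):.1f}%")
```

With a gain of 1 the represented value simply tracks the 82% observation (82.0%); gains of 1.5 and 2.0 overshoot it (88.0% and 94.0%), which is the magnification the text argues is not observed. The zero-error case corresponds to the first lesson: before there is a discrepancy, gain on error units has nothing to act on.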
Sometimes PC theorists assume the error signal is equal to the input. Perhaps this identification makes some sense if the perceiver's visual system has no “expectations,” say because the eyes have just opened. But once the eyes have opened and things in the environment are seen, it makes no sense to take the error signal to be the sensory input.
The PC picture also seems to lack a distinctive explanation of why attention increases spatial acuity. Yeshurun and Carrasco (1998) showed that increased attention can be detrimental to performance when resolution is already close to too high for the scale of the texture, increasing acuity to the point where the subject cannot see the forest for the trees. Too little attention can also be detrimental, making it harder to see the trees. Yeshurun and Carrasco varied the resolution of perception by presenting textured squares (such as the one in Fig. 1) at different eccentricities (the more foveal, the better the resolution). They also varied resolution by manipulating the focus of spatial attention: with the eyes fixated at the center, they attracted attention to the left or to the right. Combining contributions to resolution from eccentricity and attention, they found that there was an optimal level of resolution for detecting the square, with detection falling off on both ends. Single-cell recordings in monkey visual cortex reveal shrinking receptive fields (the area of space to which a neuron responds) in mid-to-high-level vision, specifically in V4, MT, and LIP, and this shrinkage contributes to explaining the increase in acuity (Carrasco 2011).
Figure 1. A display of one of the textured figures (the square on the right) used by Yeshurun and Carrasco (1998). The square appeared at varying degrees of eccentricity. With low resolution in peripheral locations, attention improved detection of the square; but with high resolution in central locations, attention impaired detection.
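The inverted-U result can be illustrated with a toy model in which resolution is set only by receptive-field (RF) size in representation units, smaller RFs meaning higher resolution. Every number here is hypothetical and not fitted to Yeshurun and Carrasco's data: RF size is assumed to grow linearly with eccentricity, attention is assumed to shrink RFs to 70% of their neutral size, and detection is assumed to fall off as a Gaussian in log-RF size around a texture-matched optimum. The names `rf_size` and `detection` are illustrative.

```python
import math

# Hypothetical toy parameters (not fitted to Yeshurun & Carrasco's data).
RF_AT_FOVEA = 1.0       # RF size at 0 deg eccentricity (arbitrary units)
RF_GROWTH = 0.5         # linear growth of RF size per degree of eccentricity
OPTIMAL_RF = 2.0        # RF size best matched to the texture's scale
TUNING_WIDTH = 0.5      # width of the performance curve, in log-RF units
ATTENTION_SHRINK = 0.7  # attended RFs are 70% of their neutral size

def rf_size(eccentricity: float, attended: bool) -> float:
    """Neutral RF size grows with eccentricity; attention shrinks it."""
    size = RF_AT_FOVEA * (1.0 + RF_GROWTH * eccentricity)
    return size * ATTENTION_SHRINK if attended else size

def detection(eccentricity: float, attended: bool) -> float:
    """Inverted U: best when RF size matches the texture scale, worse on both sides."""
    mismatch = math.log(rf_size(eccentricity, attended) / OPTIMAL_RF)
    return math.exp(-mismatch ** 2 / (2 * TUNING_WIDTH ** 2))

for ecc in (0, 2, 4, 6):
    neutral, cued = detection(ecc, False), detection(ecc, True)
    effect = "helps" if cued > neutral else "hurts"
    print(f"ecc={ecc} deg: neutral={neutral:.2f}, attended={cued:.2f}, attention {effect}")
```

Under these assumptions attention lowers detection at 0 and 2 degrees (RFs pushed below the optimum) and raises it at 4 and 6 degrees (RFs pulled toward it). No prediction or error term appears anywhere in the sketch; the effect runs entirely through the size of receptive fields.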
Does the PC framework have a distinctive explanation of attentional effects on spatial acuity, in terms of “gain in error-units”? If, due to the level of acuity, one does not see the square, then the prediction of no square will be confirmed, and there will be no discrepancy (“error”) to be magnified. Since the gain in error units is the only distinctive resource of the PC view for explaining attentional phenomena, the view seems to have no distinctive explanation of this result either. Can the predictive coding point of view simply borrow Carrasco's explanation? That explanation is a matter of shrinkage in receptive fields of neurons in the representation nodes, not anything to do with prediction error, so the predictive coding point of view would have to concede that attention can act directly on representation nodes without a detour through error nodes.
Finally, attention to certain items – for example, random dot patterns – makes them appear larger. Anton-Erxleben et al. (2007) showed that the size of the effect is inversely related to the size of the stimulus, and explained the result in terms of receptive field shifts (such shifts are also observed in single-cell recordings in monkey visual areas; Womelsdorf et al. 2006). This explanation depends on the retinotopic and therefore roughly spatiotopic organization common to many visual areas – not on error units. Neurons whose receptive fields lie at the periphery of the pattern shift their receptive fields so as to include the pattern; as a result, a larger portion of spatiotopically represented space is allotted to the pattern, and the pattern is represented as occupying a larger area. Here too, predictive coding offers no distinctive explanation.
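The receptive-field-shift account can likewise be sketched without error units. The toy below is not Anton-Erxleben et al.'s model; it works in one spatial dimension under hypothetical assumptions: each neuron keeps its original position label for readout (a labeled-line code), attention shifts each RF center toward the attended center by at most a fixed distance `MAX_SHIFT`, and the stimulus is represented as spanning the labels of all neurons whose shifted RF centers fall on it. All names and values are illustrative.

```python
# Hypothetical 1-D toy: labeled-line readout with attention-induced RF shifts.
MAX_SHIFT = 5              # largest attention-induced RF displacement (arbitrary units)
LABELS = range(-100, 101)  # each neuron's labeled (pre-attention) RF center

def shifted_center(label: int, attended_center: int) -> int:
    """RF center after shifting toward the attended center by at most MAX_SHIFT."""
    distance = label - attended_center
    step = min(abs(distance), MAX_SHIFT)
    return label - step if distance > 0 else label + step

def perceived_width(stimulus_width: int, attended: bool, center: int = 0) -> int:
    """Width spanned by the labels of all neurons whose RF center lies on the stimulus."""
    half = stimulus_width / 2
    responding = [
        label for label in LABELS
        if abs((shifted_center(label, center) if attended else label) - center) <= half
    ]
    # Labeled-line readout: the stimulus is represented as spanning the labels
    # of every neuron that responds to it, not the neurons' shifted positions.
    return max(responding) - min(responding)

for width in (10, 20, 40):
    ratio = perceived_width(width, attended=True) / perceived_width(width, attended=False)
    print(f"stimulus width {width}: attended/unattended perceived size = {ratio:.2f}")
```

For stimuli 10, 20, and 40 units wide this gives attended-to-unattended size ratios of 2.00, 1.50, and 1.25. Because the shift is capped at a fixed distance, the added extent is constant, so the relative enlargement shrinks as the stimulus grows, which mirrors the inverse relation the text reports.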
The facts of attention and adaptation do not fit well with the predictive coding view or with any picture based on how “sensory neurons should behave” (Lochmann et al. 2012) rather than on how they actually do behave. Without a distinctive explanation of these facts, the explanatory promises of predictive coding are overdrawn.