
Visual prediction: Psychophysics and neurophysiology of compensation for time delays

Published online by Cambridge University Press:  14 May 2008

Romi Nijhawan
Affiliation:
Department of Psychology, University of Sussex, Falmer, East Sussex, BN1 9QH, United Kingdom. romin@sussex.ac.uk; http://www.sussex.ac.uk/psychology/profile116415.html

Abstract

A necessary consequence of the nature of neural transmission systems is that as change in the physical state of a time-varying event takes place, delays produce error between the instantaneous registered state and the external state. Another source of delay is the transmission of internal motor commands to muscles and the inertia of the musculoskeletal system. How does the central nervous system compensate for these pervasive delays? Although it has been argued that delay compensation occurs late in the motor planning stages, even the earliest visual processes, such as phototransduction, contribute significantly to delays. I argue that compensation is not an exclusive property of the motor system, but rather, is a pervasive feature of the central nervous system (CNS) organization. Although the motor planning system may contain a highly flexible compensation mechanism, accounting not just for delays but also variability in delays (e.g., those resulting from variations in luminance contrast, internal body temperature, muscle fatigue, etc.), visual mechanisms also contribute to compensation. Previous suggestions of this notion of “visual prediction” led to a lively debate producing re-examination of previous arguments, new analyses, and review of the experiments presented here. Understanding visual prediction will inform our theories of sensory processes and visual perception, and will impact our notion of visual awareness.

Type: Main Articles
Copyright: © Cambridge University Press 2008

1. Introduction

1.1. Time delays in the nervous system

Time delays are intrinsic to all neural processes. Helmholtz, an eminent physicist and neurophysiologist of the nineteenth century, was among the first scientists to provide a clear measure of the speed of signal transmission within the nervous system. Using a nerve–muscle preparation, he electrically stimulated the motor nerve at two different points and noted the time of contraction of the connected muscle. The distance between the stimulated points divided by the time difference with which the muscle responded, gave the speed of neural transmission along the given section of the motor nerve. Before this seminal experiment in 1850, many well-known scientists including Johannes Müller had speculated that transmission of neural signals would be too fast to allow experimental measurement. However, to the surprise of the scientific community, the experiment revealed not only that the speed of neural conduction was measurable, but also that it was an order of magnitude slower than the speed of sound through air! In this article, I focus on visual delays, the problem of measurement of visual delays, and the effect these delays have on neural representations of change – such as those that result from visual motion – and on perception and behavior. In addition, the focus is on potential mechanisms that might compensate for visual delays.
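Helmholtz's logic reduces to simple arithmetic: the extra distance between the two stimulation points, divided by the extra time the muscle took to respond, gives the conduction speed over the intervening nerve segment. A minimal sketch, with hypothetical numbers (not Helmholtz's actual data) chosen only so the result lands an order of magnitude below the speed of sound in air (~343 m/sec):

```python
# Sketch of Helmholtz's two-point method for measuring nerve conduction speed.
# Stimulate the motor nerve at two points at distances d1 and d2 from the
# muscle; the muscle responds after t1 and t2. The extra distance divided by
# the extra time is the conduction speed over the intervening segment.

def conduction_velocity_m_per_s(d1_m, t1_s, d2_m, t2_s):
    """Conduction speed over the nerve segment between the two points."""
    return (d2_m - d1_m) / (t2_s - t1_s)

# Hypothetical values: stimulating 3 cm further from the muscle
# delays the twitch by 1.1 msec.
v = conduction_velocity_m_per_s(d1_m=0.02, t1_s=0.0030, d2_m=0.05, t2_s=0.0041)
print(round(v, 1))  # about 27 m/s, an order of magnitude below sound in air
```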

Visual processing occurs in a series of hierarchical steps involving photoreceptors, retinal bipolar cells, ganglion cells, the lateral geniculate nucleus (LGN), the primary visual cortex (V1), and beyond. Neural delays have been extensively investigated at various levels within the visual system. For example, in response to retinal stimulation, significant neural delays (>10 msec) have been measured within the retina and at the optic nerve (Dreher et al. Reference Dreher, Fukada and Rodieck1976; Kaplan & Shapley Reference Kaplan and Shapley1982; Ratliff & Hartline Reference Ratliff and Hartline1959). Schiller and Malpeli (Reference Schiller and Malpeli1978) electrically stimulated axons of the optic nerve at the optic chiasm and measured the delay (≈2–3 msec) in the response of cells in the magno- and parvo-cellular layers of the LGN. Many studies on the macaque have recorded the delay with which neurons in the primary visual cortex (V1) respond to retinal stimulation (Raiguel et al. Reference Raiguel, Lagae, Gulyas and Orban1989; Maunsell & Gibson Reference Maunsell and Gibson1992; Schmolesky et al. Reference Schmolesky, Wang, Hanes, Thompson, Leutgeb, Schall and Leventhal1998). An estimate based on a large database yields an average delay of approximately 72 msec for V1 neurons (Lamme & Roelfsema Reference Lamme and Roelfsema2000). However, there is usually a range of delays with which neurons in a given area of the cortex respond. Different neurons in the macaque inferotemporal cortex, for example, respond to complex visual information with delays ranging from 100 to 200 msec (Nakamura et al. Reference Nakamura, Matsumoto, Mikami and Kubota1994).

1.2. Neuronal and behavioral delays for discrete stimuli

Discreteness of stimulation is key to defining time delays for both neural responses and in behavioral tasks. For visual neurons, time delays are typically measured in response to light flashes, light onsets/offsets, or short electrical pulses applied to some point in the visual pathway. Neural delay is defined as the time interval between the discrete change in stimulation and change in neural activity at the target site. The onset time of a discrete stimulus is easily determined. However, the definition of neural delays becomes more complex for time-varying stimuli that are a continuous function of time. For such stimuli, neural delay can only be defined with respect to an instantaneous value of the stimulus.

Discreteness of stimulation is also central to defining behavioral delays. The paradigm that formed the cornerstone of experimental psychology in the latter half of the nineteenth century with the work of Helmholtz, Donders, Wundt, and Cattell, involved measurement of simple reaction times (Meyer et al. Reference Meyer, Osman, Irwin and Yantis1988). In simple reaction-time tasks, the participant produces a prespecified overt response – for example a button press – in response to a discrete suprathreshold stimulus such as a flash of light, a sound burst, or a tactile stimulus. The stimulus–response interval for the light stimulus is 200–250 msec, whereas for sound or touch it is about 150 msec. The minimum latency for a voluntary learned motor response appears to be around 100–120 msec (Woodworth & Schlosberg Reference Woodworth and Schlosberg1954, p. 9). Simple reaction time is directly related to input–output delays of single neurons and of neural chains (see, e.g., DiCarlo & Maunsell Reference DiCarlo and Maunsell2005). Neural delays vary substantially across cell types and modalities. For example, the transduction of mechanical stimulation by the mechanoreceptors of the touch system, and the response of hair cells to mechanical movement of the cochlear fluid caused by sound, are much quicker than the speed with which light is transduced by the photoreceptors. The fact that reaction time to a light flash is longer than that to a sound burst, for example, is thought to be a direct consequence of the faster processing of auditory neurons. Likewise, a response requiring a longer neural pathway will be slower. Thus, reflex behaviors such as the knee jerk can be produced in about 40 msec via a shorter pathway involving only the spinal cord, whereas behaviors such as those in “choice reaction time” experiments, which involve additional cortical processing, are decidedly slower.
Additional variables that impact behavioral delays are: the size of the neurons involved, whether the neurons have myelinated axons or not, the number of synapses between the peripheral receptors and the central nervous system (CNS), the types of the intervening synapses, the strength of stimulation, and to an extent certain qualitative aspects of the stimulus.

Several investigators have directly studied the relationship between neuronal latency and reaction time. In measurements of reaction time, presentation of a discrete stimulus starts a time-counter, which the subject's overt motor response stops. In measurements of neuronal latency, however, discerning the neural event used to stop the time-counter can be more challenging. In one study, researchers recorded from single neurons in the monkey motor cortex while the animal performed a learned flexion/extension arm movement in response to a visual, somatosensory, or auditory stimulus (Lamarre et al. Reference Lamarre, Busby and Spidalieri1983). They reported a constant delay between the change in the firing rate of the motor cortical neuron and arm movement, irrespective of the stochastic and modality contingent variation in reaction times. Whether change in a neuron's spike rate following the stimulus, on a single trial, can be used in the measurement of neuronal latency (i.e., to stop the time-counter) is debatable. Individual spikes provide a sparse sample of the assumed underlying rate function of the neuron. Thus, even if one were to assume a step change in the underlying rate function triggered by a discrete stimulus, the variance in spike times dictates averaging over many trials to determine the precise time of the “true” change in the firing rate triggered by sensory stimulation (DiCarlo & Maunsell Reference DiCarlo and Maunsell2005).
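The averaging argument can be made concrete with a toy simulation: spike trains are generated from an underlying rate that steps up 70 msec after the stimulus, and latency is estimated from the trial-summed, binned counts. All numbers (rates, bin width) and the threshold rule are my own illustrative choices, not taken from the cited studies; the point is only that single trials sample the rate function too sparsely, while the trial average recovers the step.

```python
import random

random.seed(0)

DT = 0.001           # 1-msec bins
N_BINS = 300         # 300 msec of recording per trial
TRUE_LATENCY = 0.07  # hypothetical step in the underlying rate (70 msec)
BASE_RATE, DRIVEN_RATE = 10.0, 80.0  # spikes/sec, hypothetical values

def simulate_trial():
    """One spike train as a list of bin indices: a Bernoulli approximation
    of a Poisson process whose rate steps up at TRUE_LATENCY."""
    spikes = []
    for i in range(N_BINS):
        rate = DRIVEN_RATE if i * DT >= TRUE_LATENCY else BASE_RATE
        if random.random() < rate * DT:
            spikes.append(i)
    return spikes

def estimate_latency(n_trials):
    """Sum binned spike counts over trials; latency is the first bin whose
    boxcar-smoothed mean count exceeds the midpoint of the two rates."""
    counts = [0] * N_BINS
    for _ in range(n_trials):
        for i in simulate_trial():
            counts[i] += 1
    threshold = n_trials * DT * (BASE_RATE + DRIVEN_RATE) / 2
    for i in range(N_BINS):
        window = counts[max(0, i - 2):i + 3]  # 5-msec boxcar
        if sum(window) / len(window) > threshold:
            return i * DT
    return None

print(estimate_latency(5))    # sparse spikes: the estimate is unreliable
print(estimate_latency(500))  # the trial average recovers the 70-msec step
```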

1.3. Neuronal and behavioral delays for continuous stimuli

Animals encounter both discrete and continuous environmental events. At one extreme are discrete stimuli resulting from unexpected events in nature (Walls Reference Walls1942). At the opposite extreme are stationary stimuli, for example, nearby objects such as trees, that can be just as behaviorally relevant as changing stimuli, but for which the issue of neural delays arises only when the animal itself moves (see further on). Between the two extremes of the discrete–continuous continuum, there are a multitude of time-varying stimuli unfolding as smooth functions with respect to time. Consider the role of neural delays in a situation where a potential prey confronts a predator who has appeared in a surprise encounter. In relation to the speed at which changes occur in the prey's retinal images, the central transmission of retinal information can be extremely slow. In other words, a significant environmental event, such as the predator adopting the attack posture, can take a significant fraction of a second to be communicated to and be processed by the parts of prey's CNS that control the escape response.

Properties of neurons allowing for higher transmission speeds confer an enormous advantage to animals, and adaptations related to faster conduction of neural information are well known. For example, the myelination of nerve fibers, which is not found in invertebrates, allows for much faster conduction speeds in vertebrate nervous systems. However, slowness of neural response is not necessarily a disadvantage to the animal. In fact, some slower processes appear to be linked to more adaptive behavior. First, there is the well-known speed–accuracy tradeoff, where increased speed of actions necessarily leads to lower accuracy (Woodworth Reference Woodworth1899; Meyer et al. Reference Meyer, Osman, Irwin and Yantis1988). Second, although the visual response is slower than that of touch, this need not disadvantage the animal: vision allows objects to be detected at a distance, so animals are able to plan behaviors in advance (Sarnat & Netsky Reference Sarnat and Netsky1981). Within limits, animals are able to plan their behavior when conditions are stationary or changing smoothly, but they must rely directly on sensory input when there is a discrete change in conditions for which there are no priors.

A temperature drop over time and an object changing position over time are common time-varying stimuli. A closer look at neural delays in the context of such stimuli reveals some complexities and a puzzle. At the stimulus end, neural delays in the processing of a continuously unfolding stimulus can be meaningfully defined only in relation to an instantaneous value of a continuous variable (IVCV; e.g., 0° temperature). The IVCV at a given instant of time is what would, so to speak, start the time-counter. As discussed earlier, the question of determining the neural event that may be used to stop the time-counter is challenging enough for discrete events, but an even more difficult problem arises in discerning the neural event that should stop the time-counter for a continuous stimulus.

Suppose a scientist wants to determine the latency with which a spiking neuron responds to a continuous stimulus. Delay in the neural response can only be measured in relation to event IVCVt0 (IVCV at selected time t 0). Typically, a neuron tuned to a particular variable will respond to not just one value of the variable but to a range of values. Thus, the neuron will respond not just to IVCVt0 but also to IVCVs in the neighborhood of IVCVt0. Consider a hypothetical V1 neuron with narrow orientation selectivity. Suppose further that this neuron responds with a latency of 70 msec and produces a maximum spike rate to vertical line orientation (0°). If a line rotates continuously at speed ω, and is presented within the receptive field of this neuron, then the neuron's response will vary as a function of line orientation. Narrow orientation selectivity means that this cell will produce its strongest response when the line is vertical, and a somewhat weaker response to the line orientation that deviates from the vertical by a small angle (ω•70 msec). For a continuously rotating line, neural delay may be defined, for example, between the two following events: (1) the line reaching the vertical orientation, and (2) the moment the neuron's spike rate reaches maximum. In this experiment, the signal to stop the time-counter would be “spike rate to line orientation 0°” – “spike rate to line orientation 0° – (ω•70 msec),” where the negative sign means line orientation before the line reached vertical. Because spikes are collected in time bins, detecting such a signal can be challenging (Bialek et al. Reference Bialek, Rieke, de Ruyter van Steveninck and Warland1991).
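The offset in this example is just the product of the rotation speed and the latency: by the time the delayed response to the vertical orientation peaks, the line has rotated ω·70 msec past vertical. A one-line sketch using the text's 70-msec latency (the 100 deg/sec rotation speed is my own illustrative choice):

```python
LATENCY_S = 0.070  # the hypothetical V1 latency used in the text

def orientation_at_peak_response(omega_deg_per_s, latency_s=LATENCY_S):
    """Line orientation (degrees past vertical) at the moment the delayed
    peak response to the vertical orientation actually occurs."""
    return omega_deg_per_s * latency_s

# A line rotating at 100 deg/sec is about 7 degrees past vertical by the
# time the neuron's response to 0 deg (vertical) peaks:
print(round(orientation_at_peak_response(100.0), 3))  # 7.0
```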

However, even if it were in principle possible to pick up the neural signal for IVCVt0 in order to determine latency, there is a more basic issue. The method described here would require the following assumption to be true: Neural response to IVCVt0 (in the preceding example the vertical line orientation) is the same whether the stimulus is presented in isolation, or as part of continuous stimulus presentation. This assumption may be valid for “early” neurons in the vision pathway. However, such an assumption is almost certain to be invalid for neurons as one moves higher up in the visual hierarchy. This is simply because neurons in the “later” parts of the visual pathway are stimulated not just by direct sensory input, but also by ongoing perceptual and motor processes (Treue Reference Treue2003).

The question of where a neuron is located in the visual hierarchy and how its response properties change has been revisited recently. One can define a neuron as being “stimulation centered” or “perception centered,” depending on its response properties. Recent neurophysiological studies suggest that the activity of neurons earlier in the visual pathway corresponds more with sensory stimulation, whereas the behavior of later neurons seems to correspond more with perception (Treue Reference Treue2003; Williams et al. Reference Williams, Elfar, Eskandar, Toth and Assad2003). Depending on whether a given neuron is stimulation centered or perception centered, its sensory input delays in response to continuous stimuli may or may not be definable. For example, a perception-centered neuron's response may be related to what the animal anticipates the stimulus would do in the near future, rather than what the stimulus is doing at a given moment (Eskandar & Assad Reference Eskandar and Assad1999). Hence, in such cases there is the potential of relating a neural signal to the wrong IVCV – which suggests that one may meaningfully define neural delays for a stimulus event if and only if the IVCVt0 directly drives the neuron's response.

Interaction of animals with continuous stimuli, such as moving objects, reveals a puzzle: behaviors directed at moving objects seem subject to no appreciable delays. Among innumerable instances, the following three examples bring out this point clearly:

  1. In vision-based predator–prey interactions, there is often extreme pressure on each animal to produce behavior vital to its survival. On the basis of neural and behavioral delays, as revealed in experiments employing discrete stimulation, the prediction would be that actions should always lag behind targets in proportion to the magnitude of the delay, irrespective of whether the stimulus is discrete or continuous. The lags are expected to be large when the rate of change of the external stimulus, for example, the speed of a moving object, is large. The expected lags for smoothly changing stimuli are, however, never seen in typical predator–prey interactions.

  2. Scientists have observed the behavior of toads catching woodlice in extremely low levels of illumination. Close to the detection threshold of the toad's retinal ganglion cells there are one to two photoisomerizations per 100 rods per second. At such low levels of light intensity, the ganglion cells sum signals from 400–1000 rods over a period of 1.0–2.5 seconds, so vision necessarily becomes slow. Aho et al. (Reference Aho, Donner, Helenius, Larsen and Reuter1993) noted that when a worm-like stimulus was presented to a ganglion cell's receptive field, the cell's response occurred about 3 seconds after the presentation of the stimulus. Nonetheless, even under these extremely limited visibility conditions a toad could still produce an accurate tongue-snap response to catch a moving worm!

  3. The final example concerns the highly practiced behavior of humans in relation to fast-moving objects. In the game of cricket, a fast bowler can bowl at speeds of more than 90 mph. At these speeds, the batsman has a very short time to decide how to hit the ball, and an even shorter time in which to decide how to get his head (say) out of the line of the ball in case the ball is a bouncer heading directly towards his head. The required temporal accuracy could be as small as 5 msec (Tresilian Reference Tresilian1993) or even smaller (Land & McLeod Reference Land and McLeod2000; Regan Reference Regan1992). As delays in the output of photoreceptors can be appreciably longer than 5 msec, common observations in high-speed ball games immediately bring up the issue of compensation for neural delays.

2. Compensation of neural delays

The delay between the input and output of a black box – which is a widely used concept in engineering and systems neurobiology (DeYoe & Van Essen Reference DeYoe and Van Essen1988) – is known as phase-lag. During animal interactions phase-lags can be large, and appropriate compensation is necessary for adaptive behavior. According to Ghez and Krakauer (Reference Ghez, Krakauer, Kandel, Schwartz and Jessell2000, p. 657): “If [phase] lags are long and external conditions change rapidly, specific feedback corrections may not be appropriate by the time they are implemented.” One type of mechanism that could compensate for phase-lags, called feedforward control, relies on information that the nervous system acquired before the actual arrival of external input from the sensors (Ghez & Krakauer Reference Ghez, Krakauer, Kandel, Schwartz and Jessell2000).

It is perhaps not a coincidence that Helmholtz was also a pioneer in recognizing the need for compensation of delays and proposing feedforward control. Feedforward control is most clearly seen in situations in which sensory stimulation results as a consequence of movements produced by the animal itself (reafference). A famous example by Helmholtz is the “canceling” of the effects of retinal image motion during voluntary eye movements (Jeannerod et al. Reference Jeannerod, Kennedy and Magnin1979); when one voluntarily moves one's eyes, the visual world, which shifts on the retina, is not seen as moving. The notion of feedforward control suggests that a “comparator” receiving afferent signals from the retina due to image motion also receives a corollary of the motor command to the eye muscles (von Holst & Mittelstaedt Reference von Holst and Mittelstaedt1950; Sperry Reference Sperry1950). The copy of the motor command “cancels” the retinal input signals, resulting in perceptual stability of the visual scene. This view is supported by the fact that when a copy of the motor command to eye muscles is absent, such as during involuntary eye movements (e.g., due to externally forced movement of the eyeball), the visual environment does appear to shift.
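The comparator can be caricatured as a subtraction (a toy formulation of my own, not a model from the cited work): perceived scene motion is the retinal image motion left over after the corollary of the motor command is cancelled out.

```python
def perceived_scene_motion(retinal_slip_deg_s, efference_copy_deg_s):
    """Comparator output: retinal motion not accounted for by the copy of
    the motor command (the corollary discharge)."""
    return retinal_slip_deg_s - efference_copy_deg_s

# Voluntary eye movement at 10 deg/sec: the image slips at -10 deg/sec on
# the retina, the efference copy predicts exactly that, and the scene is
# perceived as stable:
print(perceived_scene_motion(-10.0, -10.0))  # 0.0

# Passive displacement of the eyeball: the same retinal slip arrives with
# no efference copy, so the scene appears to move:
print(perceived_scene_motion(-10.0, 0.0))  # -10.0
```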

Predictive or anticipatory control, which are concepts related to feedforward control, have been demonstrated at the level of single cells, in modules serving sensory-motor functions (e.g., the cerebellum), and in psychophysical tasks in humans. Single neurons in the parietal cortex (lateral intrapariental area) have been shown to respond in a predictive manner to changes in visual input caused by voluntary saccades (Duhamel et al. Reference Duhamel, Colby and Goldberg1992). Prediction has also been shown in psychophysical experiments investigating the grip force required to lift weights (Johansson & Westling Reference Johansson and Westling1988), in a pole-balancing task (Mehta & Schaal Reference Mehta and Schaal2002), and in weight unloading (Diedrichsen et al. Reference Diedrichsen, Verstynen, Hon, Lehman and Ivry2003). The cerebellum has been identified as a possible sensory-motor structure contributing to predictive control (Kawato et al. Reference Kawato, Furukawa and Suzuki1987; Wolpert et al. Reference Wolpert, Miall and Kawato1998). However, predictive mechanisms are unlikely to be localized in any one sub-system of the CNS. For example, in humans, damage to the cerebellum leaves predictive responses in the weight-unloading task intact (Diedrichsen et al. Reference Diedrichsen, Verstynen, Hon, Lehman and Ivry2003), which suggests that prediction is a distributed property of animal nervous systems.

The notion that sensory processes per se may be predictive is not widely recognized. In fact, previous suggestions of sensory prediction (Nijhawan Reference Nijhawan1994) have led to controversy (Cavanagh Reference Cavanagh1997; Krekelberg & Lappe Reference Krekelberg and Lappe2001; Nijhawan Reference Nijhawan2002; Schlag & Schlag-Rey Reference Schlag and Schlag-Rey2002). The position that sensory processes are predictive should not be surprising for the following reasons. First, there is debate as to whether the function of neurons in the parietal cortex, an area in which predictive processes appear to be prevalent, is sensory or motor. Second, sensory processes are known to contribute to prediction in sensory-motor tasks, such as in the canceling of retinal image motion signals during voluntary eye movements and the reduction in the latency of parietal neurons (Duhamel et al. Reference Duhamel, Colby and Goldberg1992). Therefore, at the very least, sensory processes participate in prediction. Finally, it is rather unparsimonious to suggest that although prediction is a general property of the nervous system found in many different areas, sensory areas are excluded. This would be particularly awkward for vision, given that more than 50% of the primate neocortex is involved with visual processes (Kandel & Wurtz Reference Kandel, Wurtz, Kandel, Schwartz and Jessell2000, p. 497). Indeed, the concept of the “internal model” of the motor system, which serves a predictive function, may be generalized to the domain of perception (Kawato Reference Kawato1999). The goal of this article is to place the notion of sensory prediction in general, and visual prediction in particular, on firmer ground by addressing the existing controversies, re-analyzing experimental results, and outlining the wide-ranging issues that consequently arise.

3. The problem: Topography of the early visual system, neural delays, and spatial lags for motion

For humans, moving objects can be targets for perception, orienting responses, interception, or avoidance. The concept of prediction as it exists in the literature reflects a somewhat limited point of view, resulting from a focus on motor or sensory-motor processes at the expense of visual processes. A majority of the literature suggests that predictive processes are one step removed from sensory processes. These processes are located in the parietal and frontal cortical areas, perhaps even beyond the sensory-motor integration areas, or in sub-cortical structures such as the cerebellum, with the immediate goal of producing movement. One corollary of the viewpoint that there is no visual compensation for delays is: the perceptual state of a changing visual stimulus should trail behind the actual current state of the stimulus. Changing patterns of illumination, especially those resulting from movement, stimulate the visual system strongly and frequently elicit a behavioral response from the animal (Barlow Reference Barlow and Rosenblith1961a). However, the quintessential stimulus of visual motion leads to the following conundrum. The retina is mapped topographically onto many maps of the brain, and particularly well known are the topographic maps found in V1 (Felleman & Van Essen Reference Felleman and Van Essen1991; Tootell et al. Reference Tootell, Silverman, Switkes and De Valois1982). As an object moves over the retina, the representation of the object (the wave of neural activity generated by the object) shifts over the cortical surface containing the topographic maps. Because of neural delays, the object's representation in the cortical map should lag behind the object's retinal representation (Figure 1a). As our perception is based in part on neural activity in cortical maps, this results in a perceptual lag in the position of a moving object in relation to its actual position (Figure 1b).

Figure 1. (a) The retina is shown as a one-dimensional coordinate system mapped topographically onto the cortex. For simplicity, the thalamic stage is not included. A moving ball's image travels from left to right at constant velocity. It is assumed that the neural delay between the retina and the cortex is equal to the time the ball takes to move from retinal coordinate x−1 to x0. At the instant depicted, the cortical coordinate showing the greatest activity is x'−1, which corresponds to retinal coordinate x−1, while the ball's actual retinal coordinate is x0. (b) The batsman in cricket views a ball heading toward him after bouncing some distance in front of him. The batsman's perception is assumed to depend on the cortical activity triggered by the moving ball. At the instant depicted, the physical position of the ball (filled circle) is ahead of the ball's perceived position (unfilled circle). The lag is proportional to the neural delay and to the object's velocity, which for a cricket ball can be more than 90 mph. For an assumed delay of 100 msec, the difference between the ball's perceived position and its actual position should be 13.2 ft. Adapted from Land and McLeod (2000).

The recent statements concerning the perceived lag of moving objects have led to a vigorous debate (Cavanagh Reference Cavanagh1997; Gegenfurtner Reference Gegenfurtner1999; Krekelberg & Lappe Reference Krekelberg and Lappe2001; Nijhawan Reference Nijhawan1994; Reference Nijhawan2002; Schlag & Schlag-Rey Reference Schlag and Schlag-Rey2002). What might be the magnitude of the lag? At a simplistic level the lag should equal the product of visual input delay and object velocity. However, to measure input delay in perception precisely, one would first have to make several assumptions. The most problematic assumption is that there is some specific brain site where neural processes culminate in perception. It has been possible to measure delays between two neural sites (Maunsell & Gibson Reference Maunsell and Gibson1992; Raiguel et al. Reference Raiguel, Lagae, Gulyas and Orban1989; Schmolesky et al. Reference Schmolesky, Wang, Hanes, Thompson, Leutgeb, Schall and Leventhal1998) and to show that these delays differ for different pathways (Schiller & Malpeli Reference Schiller and Malpeli1978). However, there are arguments that there is no specific brain site where the stream of visual processing ends to yield perception (Dennett & Kinsbourne Reference Dennett and Kinsbourne1992).

Despite the existing controversies, however, limits can be imposed on the “where” and “when” of perception. In response to retinal stimulation significant neural delays have been measured within the retina and the optic nerve (Dreher et al. Reference Dreher, Fukada and Rodieck1976; Kaplan & Shapley Reference Kaplan and Shapley1982; Ratliff & Hartline Reference Ratliff and Hartline1959). It is generally believed that neural activity at the level of photoreceptors is not sufficient to yield visual awareness. It follows that if the photoreceptor delay is 10 msec and object velocity is 10 m/sec then the lag in perceived position of the moving object should be at least 10 cm. The same logic holds for measurements at different points of the optic nerve, the LGN, and so on; in fact, it may be argued that observers are not directly aware of neural activity even in V1 (primary visual cortex) (Crick & Koch Reference Crick and Koch1995; He et al. Reference He, Cavanagh and Intriligator1996) which involves much longer delays. Therefore, the perceptual lag postulate may be re-stated as: Neural delays should cause the perceived position of a moving object to lag behind its physical position by at least a distance vΔt max, where v is object velocity and Δt max is the maximum cumulative delay in the vision pathway until a point in the pathway where neural activity can be said to not yield visual perception.
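The restated postulate is simple arithmetic: the lag is at least v times Δt max. A short check of the two numerical cases given in the article (a 10 m/sec object with a 10-msec photoreceptor delay, and the 90-mph ball with a 100-msec delay from the Figure 1b caption):

```python
def spatial_lag_m(velocity_m_s, delay_s):
    """Lower bound on the lag in perceived position: velocity * delay."""
    return velocity_m_s * delay_s

# 10 m/sec object with a 10-msec photoreceptor delay -> at least 10 cm:
print(spatial_lag_m(10.0, 0.010))  # 0.1 (metres)

# 90 mph is 132 ft/sec; over a 100-msec delay the ball travels 13.2 ft,
# matching the figure quoted for the cricket ball:
MPH_TO_FT_PER_S = 5280 / 3600
print(round(90 * MPH_TO_FT_PER_S * 0.100, 1))  # 13.2 (feet)
```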

Because of the slowness of vision, a large chunk of the delay during visual-motor interactions is due to visual processes per se. Which component(s) of the animal nervous system compensate for visual delays? Although late compensatory mechanisms are frequently invoked, an interesting possibility is that of contribution from the visual system itself (Ramachandran & Anstis Reference Ramachandran and Anstis1990; De Valois & De Valois Reference De Valois and De Valois1991). This possibility is related to a conjecture by David Marr, who, in his famous treatise on vision, considered the possibility of visual compensation in diving birds that feed on fish first seen from above the water's surface (Marr Reference Marr1982). Because of the water's refractive properties, the fish's position is optically shifted when it is viewed from the air, and the bird's nervous system must correct for this optical distortion. Although a correction is not present in humans, who continue to perceive submerged objects as shifted in relation to their true positions, Marr speculated that the nervous system of birds whose survival depends on correctly localizing the fish might visually correct for the optical distortion. Thus, the fish's true position would be given “directly” by the bird's visual system, as opposed to through an “inferential,” non-visual process. For humans, the case of moving objects (Fig. 1b) presents an analogous situation where the basis of error and impetus for correction is neural rather than optical.

4. The phenomenon: Concatenation of continuous and discrete visual events

Neurophysiological and psychophysical experiments reveal delays for discrete stimulus events. In a seeming contradiction, animal behaviors directed at continuous stimuli (e.g., moving objects) reveal virtually no delay. Does a similar contradiction exist for visual processes? In other words, are visual responses to discrete stimuli delayed, but visual responses to continuous stimuli not? Were this the case, the concatenation of discrete and continuous visual events in the same display should reveal an anomaly. An experimental paradigm known as the flash-lag effect combines a discrete stimulus (a light flash) with a continuous stimulus (a moving object). Indeed, when observers view a display in which the moving item is spatially and temporally aligned with the flashed item (Fig. 2a), the flashed item is seen as spatially lagging behind the moving item (Fig. 2b) (Hazelhoff & Wiersma 1924; Metzger 1932; Mateeff & Hohnsbein 1988; Nijhawan 1992).

Figure 2. (a) In a dark room, a single physical rod, made of three segments, rotates clockwise in the direction of the arrow. The segment labeled “C” is illuminated with a continuous light source, while the discrete segments labeled “D” are illuminated with a brief flash produced by a stroboscope. (b) The percept of the observers. Adapted from Nijhawan (1992).

There is a parallel between the notion of visual prediction emerging from the flash-lag anomaly and the notion of motor prediction that emerged from the behavioral contradiction (Nijhawan 1994): in both cases there is a discrete event and a continuous event. Visual prediction suggests that visually guided behaviors, such as hitting a fast-moving ball, require the observer to have information concerning the ball's physical position before contact. If there were no visual prediction, then for a 100 msec delay, the observer would see a ball traveling at 90 mph about 13.2 ft behind its actual instantaneous position (see Fig. 1b). Consistent with Marr's conjecture that the visual mechanisms of diving birds compensate for the optical displacement of submerged objects, the visual compensation account of the flash-lag effect (Nijhawan 1994) suggests that the expected lag in the perceived position of moving objects (resulting from neural delays) is compensated for by visual mechanisms. In contrast, discrete events such as flashes have no priors, and their perception can only be based on delayed sensory input. (There is probably no neural process that could overcome the delay in the registration of a flash [van de Grind 2002].) This results in the flash appearing in a position lagging behind the perceived (spatially extrapolated) position of the moving object. The goal of visual prediction is to use priors contained in the unfolding visual stimulus to create a perceived state of the world that matches, as far as possible, the actual state of the world.
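
The 13.2 ft figure follows directly from the unit conversion; a minimal sketch (the function name is invented for illustration):

```python
# Lag of a 90 mph baseball given a 100 msec visual delay, in feet.
# Illustrative numbers from the text; conversion: 1 mile = 5280 ft.

MPH_TO_FT_PER_S = 5280.0 / 3600.0  # 1 mph ~ 1.467 ft/sec

def lag_ft(speed_mph: float, delay_s: float) -> float:
    """Distance the ball travels during the visual delay."""
    return speed_mph * MPH_TO_FT_PER_S * delay_s

print(f"{lag_ft(90.0, 0.100):.1f} ft")  # -> 13.2 ft
```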

5. The flash-lag debate

There is intense debate as to whether the flash-lag effect does indeed provide prima facie evidence of a predictive process in vision, and more generally as to whether prediction in vision could serve any function. The controversy is multifaceted. First, the results of the flash-initiated and flash-terminated conditions of the flash-lag effect (Eagleman & Sejnowski 2000; Khurana & Nijhawan 1995; Khurana et al. 2000; Nijhawan 1992), both of which involve unpredictable changes to the moving stimulus, yield results that appear to contradict the notion of visual prediction. The second basis for the controversy is logical. How can prediction, which is typically considered a high-level or cognitive phenomenon (see, e.g., Thier & Ilg 2005), be part and parcel of perception, which is predominantly a result of processes in the feedforward vision pathway? On the other hand, how can visual percepts be influenced by higher-level (e.g., parietal) neurons known to be involved in prediction? The third source of the debate relates to doubts about the functional significance of visual prediction. There is unequivocal evidence and acceptance that “late” motor or sensory-motor processes serve a predictive role. These late processes are certainly capable of compensating for all the delays in the sensorimotor loop, including those incurred by visual processes per se. So what could be the functional significance of prediction in vision? Each of these challenges is now addressed in turn.

5.1. The standard view and its revision suggested by the flash-lag effect

I attribute to the “standard view,” which represents one side of the debate, the following statement: A moving object should be visible in a position that it occupied in the recent past. Figure 3 presents this standard view in space–time plots. One of the two plots in Figure 3 represents an object moving at constant velocity v, and a flash presented at position x_0 at time t_0. This plot represents the physical state of affairs. The second plot represents how a neuron “sees” these events. This plot, called the ds-error line, results from the standard view of neural delays in the processing of physical events.

Figure 3. Representation of the “standard” view with space–time plots of an object moving at constant velocity v (thick line) and how a neuron “sees” the object with some delay (thin line). A brief flash (filled gray square) presented in position x_0 at time t_0 is “seen” (outline square) by the neuron in position x_0 at time t_0 + Δt_p, where Δt_p is the input delay to perception. Two events, the arrival of the moving object in position x_0 (filled gray circle on the x = x_0 line) and the “seeing” of this event by the neuron (outline circle on the x = x_0 line), occur at different times because of the neural latency between the event and its perception. At a given time (say t_0), the physical position of the moving object (filled gray circle on the t = t_0 line) spatially leads the position in which the neurons “see” the object (outline circle on the t = t_0 line). The spatial gap between the physical position and the neurally represented position is referred to as the ds-error. The standard view suggests that the perceived object travels on the ds-error line. Adapted from Krekelberg and Lappe (2001).

If the scenario depicted in Figure 3 were true, then there would be no flash-lag effect (note, e.g., that at the intersection of the ds-error line and the x = x_0 line of Figure 3 the moving and flashed objects coincide). Since the flash-lag effect does occur, a revision of the view presented in Figure 3 is in order. Figure 4 shows the revised picture.

Figure 4. A revision of the standard view is forced by the flash-lag effect. This figure shows a new trajectory, the “reduced ds-error” line, which is parallel to the ds-error (and the physical) line and passes through the point at which the moving item is seen in the flash-lag effect. The distance along the x (space) axis between the reduced ds-error line and the ds-error line is Δσ, where Δσ is the experimentally determined flash-lag effect (see, e.g., Nijhawan 1994). The standard view needs revision because, on that view, the moving object is seen on the ds-error line throughout, and in particular at t_0, the time of the flash (open circle). The flash-lag effect shows that the moving object is seen ahead of the flash by distance Δσ at t_0 + Δt_p (filled black circle). Thus, for both the standard view to be true and the flash-lag effect to occur, the moving object would have to appear to speed up (which corresponds to a segment of a space–time trajectory, shown as the thin broken line of different slope). Because the moving object is assumed to travel at constant velocity, the standard view is untenable.

The reduced ds-error line in Figure 4 corresponds to perception. On the generalized visual prediction view (see below), there are multiple reduced ds-error lines, some of which will not be relevant for perception but only for visually guided action.

5.2. Experiments with unpredictable motion

In the flash-terminated condition, a smoothly moving object disappears unpredictably, simultaneously with the flash. In the flash-initiated condition, an unpredictable onset of the moving object occurs simultaneously with the flash, following which the motion is smooth. The results of these conditions (Nijhawan 1992) are as follows: When the flash-lag display “was separated into two temporal intervals ... [o]ne interval consisted only of events before the flash (past-interval) and the other interval only of events after the flash (future-interval). The observers reported no [flash-lag] effect in the past-interval but did so in the future-interval. The magnitude of the [flash-lag] effect for the future-interval was as strong as that for the past+future interval.”

5.2.1. Flash-initiated condition

On the spatial extrapolation view, the perceived position of the moving object is based on a type of “guess” as to the object's actual position. All guesses are a product of prior information. The crucial feature of the flash-initiated display is that the physical time (and position) at which the moving object appears is unpredictable. Prima facie, one would expect no flash-lag in the flash-initiated display. Yet the finding is that the flash-lag effect in the flash-initiated condition is undiminished (Khurana & Nijhawan 1995). This result has been obtained with real (analog) motion and extremely brief (100 µs) flashes (Nijhawan 1992), and with digital displays in which the moving and flashed objects are aligned in the first video frame for 10 or more milliseconds (Alais & Burr 2003; Brenner & Smeets 2000; Eagleman & Sejnowski 2000; Khurana et al. 2000; Sheth et al. 2000).

Since the visual prediction account is strictly spatial, it might seem necessary to assume a shift in the coordinates of the moving object occurring in a direction parallel to the time axis (see Fig. 4), which would necessitate an infinite speed of neural processing. Excluding the sub-millisecond photochemical responses to bright flashes (Wald 1968), which are unlikely to contribute to extrapolation, no process in the visual system is fast enough. However, this reasoning is flawed: the flash-lag effect is a relative judgment, which requires only that the time taken for the forward shift in the coordinates of the moving object be small relative to the time taken to process the flash. The all-important questions for visual prediction are therefore: When are the neural processes responsible for motion extrapolation initiated, and how long do they take to complete?

On the spatial extrapolation account, perception of the moving object in the flash-initiated condition is subject to an input delay (Δt_p) following its abrupt onset, the same as the delay for the flash. Say motion onset occurs at time t_0 (simultaneously with the flash); the moving object and the flash are then both first perceived at time t_0 + Δt_p. Further, suppose that retinal coordinate x_0 is represented by coordinate x′_0 on a cortical retinotopic map. If the abrupt onset of the moving object occurred in retinal position x_0, and there were no motion extrapolation, then the neural activity resulting from this event would occur in cortical position x′_0 after a delay of ≈100 msec (assuming a 100 msec latency). How much additional time, over and above the ≈100 msec delay, is required to generate a neural representation of the object that is shifted ahead of x′_0 by a distance corresponding to Δσ (see the reduced ds-error line in Figure 4)? For spatial extrapolation to be the correct account of the flash-initiated result, and for this mechanism to be behaviorally relevant (e.g., for producing quick actions), the additional time must be small.

The following analysis estimates the additional time for extrapolation to be less than 2% of the “baseline” delay of ≈100 msec between stimulus presentation and its perception (De Valois & De Valois 1991). The analysis rests on a prominent property of the vertebrate visual system, namely that it comprises two types of pathways (Dowling 1979; Tessier-Lavigne 2000): the vertical pathway and the horizontal pathway (Fig. 5). In the early visual system, the vertical pathway corresponds to information flow from photoreceptors to bipolar cells to ganglion cells to LGN cells, and so on. The horizontal pathway in the retina includes information flow from one retinal site to a neighboring one via horizontal cells, amacrine cells, and tangential extensions of dendrites and axon terminals. Analogous processes are present in the cortex (Bringuier et al. 1999). In evaluating the consequences of neural delays and compensation, it is best to treat these two pathways independently. Delays in signals traveling along the vertical pathway are the putative source of lagging retinotopic coordinates for visual motion. In contrast, the notion of visual prediction is concerned primarily with horizontal processes, which transmit neural information between two neighboring retinotopic sites (Barlow 1953; 1981; Berry et al. 1999).

Figure 5. Pathways in the retina, depicting the vertical (distal–proximal) and horizontal routes of information flow. Adapted from Dowling (1979) and Tessier-Lavigne (2000).

It is premature to commit to a specific neural mechanism that would cause a “forward” shift in the coordinates of moving objects. This shift could be based on the interaction of signals between the magnocellular and parvocellular systems (Barlow 1981) (see Fig. 6), on a spatial reorganization of receptive fields or of retinotopic maps, or on other, more specialized mechanisms (see further on). Nonetheless, any viable predictive mechanism would involve two time-consuming processes: “obtaining a motion sample” and “spatial extrapolation.” How soon after motion onset can the sampling of motion and the spatial extrapolation processes be completed to produce the required forward shift (Δσ)? At the outset it must be recognized that the correct time estimate must be based on the neural representation of Δσ (i.e., the retinotopic distance corresponding to Δσ) and the time taken by horizontal processes to cover this distance (see, e.g., Anderson & Van Essen 1987; Berry et al. 1999). In particular, a correct time estimate will be based neither on the time the moving object takes to travel the physical distance corresponding to Δσ, nor on the physical or perceived velocity of the object; rather, it will be based on neural distances and the speed of neural signals (see further on).

Figure 6. A hypothetical model based on a network with excitatory and inhibitory interactions. This network involves the layer 5 pyramidal cells in V1. The vertical processes of the pyramidal cells are the apical dendrites extending into layer 4. These dendrites receive horizontal connections from both the faster transient (Y) cells and the slower sustained (X) cells located in different sub-layers of layer 4. A moving ball's current retinal position is shown at the top of the figure. The resulting neural representations in the layer 4 X and Y maps are “misaligned” with respect to the retinal representation and with respect to each other (due to the different speeds of the sustained and transient channels). The excitatory and inhibitory horizontal connections cause the leftmost pyramidal cell to be excited, the rightmost pyramidal cell to be inhibited, and the middle cell's activity to remain unchanged (because it receives both excitatory and inhibitory inputs). Thus, the leftmost pyramidal cell behaves as if it is representing the “current” position of the ball. Adapted from Barlow (1981).

5.2.1.1. Motion sampling

In the flash-initiated display, the sudden motion onset causes neurons at early levels of the visual pathway, for example, a linear set of photoreceptors, to respond before the corresponding thalamic or cortical neurons do. A flash likewise stimulates photoreceptors before the thalamus or the cortex. However, there is a crucial difference between the motion-onset stimulus and the flashed stimulus. At ordinary speeds, the motion-onset stimulus will undergo a significant shift across several photoreceptors even prior to the completion of the early-receptor-potential phase triggered by the flash (Kirschfeld 1983), which occurs well before the flash affects the reduction of neurotransmitter (glutamate) released by the photoreceptors. Thus, the moving object will have stimulated an array of photoreceptors even before the first stimulated photoreceptor in the array has produced its output.

If the onset of the object moving at velocity v occurs in position x_0, the finding in the flash-initiated condition is that, at the instant the flash is perceived, the moving object is seen in a position corresponding to x_0 + Δσ. In the primate visual pathway, a directionally selective response is first observed in V1 (Hubel & Wiesel 1962). Input from two neighboring presynaptic neurons is sufficient to trigger a directionally selective response in a cortical simple cell (De Valois & Cottaris 1998). In humans, successive stimulation by a moving object of two neighboring foveal receptors separated by 2.8 µm will occur within a very small time window; for example, within 0.9 msec for an object traveling at 10 deg/sec (Fein & Szuts 1982). Hence, it may be argued that for objects moving at 10 deg/sec, early neurons will have acquired an adequate motion sample in just under one millisecond after motion onset. This estimate is in approximate agreement with previous estimates (Anderson et al. 1990; Westheimer & McKee 1977). Note that this does not imply that a full response to motion (e.g., a directional response) occurs in less than 1 msec, but simply that 0.9 msec after motion onset, there is motion information in the pathway capable of (later) producing a directionally selective neural response. This information could feed into a motion extrapolation mechanism either directly or indirectly, after the computation of a directional response.
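
The 0.9 msec figure can be reproduced with a short calculation. The 2.8 µm cone separation is from the text; the retinal magnification factor (~0.3 mm of retina per degree of visual angle) is an assumed textbook value, so this is an order-of-magnitude sketch:

```python
# Time for a moving retinal image to cross two neighboring foveal cones.
# Cone spacing (2.8 um) is from the text; the retinal magnification
# factor is an assumed approximate value (~300 um per degree).

UM_PER_DEG = 300.0      # assumed retinal magnification (um/deg)
CONE_SPACING_UM = 2.8   # foveal cone separation, from the text

def sampling_time_ms(speed_deg_per_s: float) -> float:
    """Time (msec) for the image to traverse one cone spacing."""
    retinal_speed_um_per_ms = speed_deg_per_s * UM_PER_DEG / 1000.0
    return CONE_SPACING_UM / retinal_speed_um_per_ms

print(f"{sampling_time_ms(10.0):.2f} ms")  # -> 0.93 ms
```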

5.2.1.2. Spatial extrapolation

The second, “spatial extrapolation” stage involves the speed of communication of neural signals from one retinotopic site to a neighboring one. A flash-lag effect of magnitude 0.5 degrees of visual angle corresponds approximately to 0.15 mm of retinal surface. Frequently cited speeds of signals traveling along neural pathways in the nervous system range from 0.2 to 120 m/sec. Lateral communication between two neighboring retinal locations separated by 0.15 mm, via neural signals traveling even at the slowest speed in this range, 0.2 m/sec, would occur within 0.75 msec. A similar scenario holds for other retinotopic maps, for example, a cortical map, after accounting for cortical magnification.
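
The 0.75 msec figure follows from the numbers in the text (0.5 deg ≈ 0.15 mm of retina; slowest cited conduction speed 0.2 m/sec); a minimal check, with an invented function name:

```python
# Lateral conduction time across the retinal distance corresponding
# to the flash-lag magnitude. Both numbers are from the text.

def conduction_time_ms(distance_mm: float, speed_m_per_s: float) -> float:
    """Time (msec) for a signal to travel distance_mm at speed_m_per_s."""
    return (distance_mm / 1000.0) / speed_m_per_s * 1000.0

print(f"{conduction_time_ms(0.15, 0.2):.2f} ms")  # -> 0.75 ms
```

Even at the slowest end of the cited range, the lateral shift is thus completed in well under a millisecond.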

According to the estimate provided here (see Note 1), for an object moving at 10 deg/sec, the required input sample and the resulting extrapolated output will together take only 1.65 msec. This time is less than 2% of the 100 msec baseline processing time (De Valois & De Valois 1991). Because the time required for spatial extrapolation is so small, this represents an extremely efficient mechanism, one that could start and finish virtually anytime during the required baseline (≈100 msec) delay after motion onset (Khurana & Nijhawan 1995) (see Fig. 7).
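
Combining the two component estimates from the text (~0.9 msec to obtain a motion sample, ~0.75 msec of lateral conduction) against the ~100 msec baseline:

```python
# Total extrapolation time versus baseline visual latency.
# All three numbers are the illustrative values from the text.

sample_ms, extrapolate_ms, baseline_ms = 0.9, 0.75, 100.0
total_ms = sample_ms + extrapolate_ms
fraction = total_ms / baseline_ms
print(f"total = {total_ms:.2f} ms ({fraction:.2%} of baseline)")
# -> total = 1.65 ms (1.65% of baseline)
```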

Figure 7. On the spatial extrapolation view, the latency for the moving object and the flash in the flash-initiated condition is Δt_i (input delay). The figure shows how this model applies when the motion trajectory before the flash is missing, as in the flash-initiated condition. The reduced ds-error line stops before intersecting the x = x_0 line, as the flash-lag effect occurs in the flash-initiated condition. The extrapolation model brings the moving object to the correct position on the reduced ds-error line through two processes: sampling of motion and spatial extrapolation. These two processes correspond to the thin broken line consisting of two segments of different slopes, enlarged in inset “a”. The first segment, parallel to the space–time plot of physical motion, depicts a motion sample taken by neighboring neurons in a retinotopic map. The height at which this segment intersects the x = x_0 line is arbitrary and depends on whether early or late sampling is assumed. The second segment, of greater slope, represents the travel of neural signals along the horizontal visual pathway, which can occur at higher speed. Inset “b” is made of three segments of different slopes. It depicts a situation of “over-extrapolation” that temporarily brings the extrapolated object closer to the thick continuous (physical) line. The object is brought back to the reduced ds-error line, as the neural processes receiving the over-extrapolated input are presumed incapable of compensating for delays (see the vertical segment in inset “b”, and see text).

To further clarify, the suggestion here is not that very early mechanisms accomplish extrapolation, or that extrapolated output is already available within 1.65 msec of motion onset. It is highly unlikely that extrapolated output is available before the hyperpolarization response of the photoreceptors, which has a latency of more than 10 msec. Rather, the suggestion is that an extrapolated output takes only a small fraction longer than the baseline delay for an “un-extrapolated” output. The added segment of horizontal pathway responsible for the lateral shift in coordinates can be inserted either at an “early” level (e.g., at the level of the retinal ganglion cells) or at a “late” cortical level. Two further points are worth emphasizing. First, the time required to generate an extrapolated representation is so short that visual mechanisms could “over-extrapolate” and place the neural representation of the moving object farther ahead of the reduced ds-error line (see Fig. 4) with little computational time cost. There is already some evidence for over-extrapolation (Berry et al. 1999; Khurana & Nijhawan 1995; and see further on in the target article). Over-extrapolation could be useful, for example, if time-consuming visual processes further downstream, to which the extrapolation mechanisms report, cannot compensate for delays (see Fig. 7, inset “b”). Second, on the present model, a further spatial shift than that depicted in Figure 7, for further reduction or even complete elimination of the ds-error, is possible. Such an outcome would be impossible on the differential latency view (Purushothaman et al. 1998; Whitney & Murakami 1998), as it would require motion to be processed with close to zero latency.

5.2.2. Flash-terminated condition

The foregoing discussion leaves little doubt that an undiminished flash-lag effect in the flash-initiated condition is not incompatible with the proposal of visual prediction. I now address another major empirical challenge for visual prediction, resulting from the flash-terminated condition, in which the moving object disappears simultaneously with the flash. On the face of it, spatial extrapolation should continue beyond the moment of the object's unpredictable disappearance. Yet these displays produce no flash-lag effect.

On the spatial extrapolation view, vision contributes to compensation for neural delays whenever a moving object is viewed; the flash serves only as a spatiotemporal reference. The correct space–time trajectory for the moving object is thus the reduced ds-error line. Why, then, does the moving object not appear to overshoot its disappearance point in this condition (Fig. 8)?

Figure 8. In this space–time plot, the moving object stops at position x_0 at time t_0, simultaneously with the presentation of the flash in position x_0. The fully delayed ds-error line (from the standard view) stops at the x = x_0 line, which is consistent with the result of “no effect” in the flash-terminated condition. However, spatial extrapolation can also explain this result. On this view, the cortical representation of the moving object (R_C), which is only partially maintained by retinal input, corresponds to the reduced ds-error (thick broken) line. Once the physical object is switched off, R_C quickly decays (shown by the lighter thick broken line) as retinal input stops. The switching off of the moving object triggers transient signals, which create a strong thalamic representation (R_T). Because of biased competition, and because R_T is the newer input, R_T wins and dominates R_C and the percept, overwhelming any predictive-overshoot of the moving object.

It is commonly assumed that the full-motion condition of the flash-lag display equals the flash-terminated condition plus the flash-initiated condition. Although this relationship correctly describes the methodology, the neural responses to the full-motion condition do not equal the sum of the neural responses to the flash-terminated and flash-initiated conditions. For example, because of the visual transients generated by the offset of the moving object, the moving object in the flash-terminated condition stimulates the visual system in a dramatically different manner than in the full-motion condition. The transient generates a strong retinal neural representation that opposes the previously established cortical representation supporting spatial extrapolation. It is proposed that the newer representation suppresses the expected “predictive-overshoot” of the moving object and acts as a “correction-for-spatial-extrapolation.”

The above proposal is inspired by the notion of “biased competition,” in which two neural representations compete for the control of cell activity (Desimone 1998; Desimone & Duncan 1995). The assumptions of biased competition are: that competing representations typically suppress each other; that competition can occur between two representations at a given level, or between low-level (e.g., retinal) and high-level (e.g., cortical) representations; and that the representation that wins the competition could be “stronger” for one or more of several reasons, including its recency, novelty, and behavioral significance to the animal (Desimone 1998). A related notion has recently been invoked to explain the perceptual phenomenon of visual masking (Keysers & Perrett 2002). On the spatial extrapolation view, after some initial motion input, the neural activity representing the moving object is less dependent on (external) retinal input and is supported more by partially “autonomous” (internal) neural representations. These highly processed cortical representations feed into representations that control limb movements and typically guide actions directed at moving objects. These representations could, however, produce their own localization errors if things in the world changed abruptly. Nonetheless, because these representations are based on prediction, new retinal input due to unpredictable change is given a higher relative weight by the visual system. In addition, greater weighting is assigned to retinal input because stimulus offsets strongly stimulate the retina (Macknik et al. 2000). It is suggested that localization errors (predictive-overshoots) are suppressed by the strong, transient-fed retinal representation generated by motion offset.
Thus, for example, in the competition between a high-level cortical representation (R_C) and a low-level thalamic representation (R_T), R_T dominates (Fig. 8). This leads to a “correction-for-extrapolation” via feedforward competition (Desimone 1998; Keysers & Perrett 2002). (In section 9.1 of the target article, I discuss interactions between non-competing representations.)
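
The competition logic just described can be illustrated with a toy sketch. The representation names echo R_C and R_T from the text, but the strength values and the winner-take-all rule are invented for illustration; the text commits to no specific scheme:

```python
# Toy sketch of biased competition between a cortical, extrapolation-fed
# representation (R_C) and a transient-fed thalamic one (R_T).
# Strength values and the winner-take-all rule are illustrative only.

def competition_winner(strengths: dict) -> str:
    """Winner-take-all: the stronger representation dominates the percept."""
    return max(strengths, key=strengths.get)

# Abrupt offset: a strong transient feeds R_T, so R_T wins and the
# predictive-overshoot is suppressed.
print(competition_winner({"R_C": 0.6, "R_T": 0.9}))  # -> R_T

# A gradual (ramped) offset weakens R_T, so R_C wins and the
# extrapolated position is revealed in perception.
print(competition_winner({"R_C": 0.6, "R_T": 0.3}))  # -> R_C
```

The second case mirrors the luminance-ramp manipulation of Maus and Nijhawan (2006), in which weakening the transient input unmasked the extrapolated position.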

The hallmark of competition models is that, when the strengths of the competing representations are changed, the representation that dominates will change (Keysers & Perrett 2002). Indeed, there are psychophysical results that produce the overshoot of moving objects predicted by the extrapolation model (Freyd 1987; Fu et al. 2001; Kanai et al. 2004; Whitaker et al. 1998). Interestingly, in one study an overshoot of the moving object was not found in human psychophysics but was revealed in monkey visual area V4 (Sundberg et al. 2006). The authors conjectured that processing at cortical areas beyond V4 must be engaged to fully account for the lack of overshoot in human psychophysics. I suggest an alternative point of view: Lower-level representations, possibly thalamic, compete with higher-level representations such as those found in V4. This competition leads to the suppression of the extrapolated V4 output.

Recently, Maus and Nijhawan (2006) empirically tested the correction-for-extrapolation hypothesis discussed above (Fig. 9). In the flash-terminated condition, the disappearance of the moving object is an all-or-none event. We weakened the retinal representation (and its thalamic counterpart R_T) by replacing this all-or-none event with a luminance ramp. The correction-for-extrapolation hypothesis predicts that in the absence of a strong R_T competing with R_C (the cortical representation), the R_C supporting the moving dot's extrapolated position will be revealed in perception. This was indeed the case: the experiments (Maus & Nijhawan 2006) revealed a forward extrapolation of v × 175 msec, where v is the moving object's velocity.
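
The v × 175 msec result translates into a spatial lead for any given velocity; a minimal sketch (the function name and the 10 deg/sec example velocity are invented for illustration):

```python
# Spatial extent of the forward extrapolation reported by Maus &
# Nijhawan (2006): perceived disappearance point = v * 175 msec ahead.
# The 175 msec constant is from the text; the example velocity is not.

def extrapolation_deg(v_deg_per_s: float, t_ms: float = 175.0) -> float:
    """Forward shift (deg) implied by velocity v and extrapolation time t."""
    return v_deg_per_s * t_ms / 1000.0

print(f"{extrapolation_deg(10.0):.2f} deg")  # -> 1.75 deg
```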

The main empirical challenges for spatial extrapolation were based on the flash-initiated and flash-terminated conditions. As has been shown, an unabated flash-lag effect in the flash-initiated condition is not inconsistent with visual prediction, as neural representations supporting extrapolation can be set up quickly relative to the input delay for motion. In fact, extrapolation times may be so small that, when required, some processes could “over-extrapolate.” Finally, the “missing predictive-overshoot” in the flash-terminated condition can be accommodated by the biased competition model (Desimone 1998), which suggests a correction for spatial extrapolation caused by the strong, transient-driven retinal input. This new analysis, based on empirical results, reinforces the general proposal that, when possible, the visual system attempts to minimize localization errors. Both spatial extrapolation and the correction for it serve this goal.

Figure 9. (a) The gray background is used only for illustration; the actual experiment was performed with a homogenous dark background. A neutral density filter, which decreased in transmission from 100% (at the 6 o'clock position) to 0.01% in the counterclockwise direction, was used. Thus, a moving stimulus (a white dot of fixed luminance) behind the filter appeared to vary in luminance; it was always visible in the 6 o'clock position and never visible past about the 9 o'clock position. In condition I, on different trials, the dot moved back and forth on a short trajectory behind the filter. The average position of the short trajectory dot was changed from trial to trial (the figure depicts a sequence of trials 1, 2, and 3). A reference line (white radial line segment of shorter length next to the filter's edge) was presented on each trial. Participants said “yes” or “no” depending on whether or not they saw the dot at the position of the reference line. Detection thresholds were determined (ThresholdposI, schematically depicted by the longer line segment). At ThresholdposI, “yes” and “no” responses were equally likely. (b) In condition II, the same white dot moved on a long trajectory from the 6 o'clock position into the filter's denser part until it disappeared, which again occurred around the 9 o'clock position. A reference line was presented (white radial line segment of shorter length). For this task, participants said “ahead” or “behind” depending on whether they perceived the dot as disappearing before it reached the reference line or after it had passed it. From trial to trial, the position of the reference line was changed to determine the disappearance threshold (ThresholdposII, schematically depicted by the longer segment of radial line). At ThresholdposII, “ahead” and “behind” responses were equally likely. Our main finding was that (ThresholdposII−ThresholdposI)/v=175 msec, where v is dot velocity. Adapted from Maus and Nijhawan (2006).
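The logic of the threshold comparison reduces to a short computation. The sketch below (Python; the function name and the threshold positions and velocity are hypothetical values of my choosing, selected only to reproduce the reported 175 msec figure) makes the arithmetic explicit:

```python
def extrapolation_time(threshold_pos_I, threshold_pos_II, velocity):
    """Extrapolation time (msec) implied by the two thresholds.

    threshold_pos_I  -- detection threshold position from condition I (deg)
    threshold_pos_II -- perceived-disappearance threshold from condition II (deg)
    velocity         -- dot velocity (deg/sec)
    """
    return (threshold_pos_II - threshold_pos_I) * 1000.0 / velocity

# Hypothetical example: a dot moving at 20 deg/sec whose disappearance
# threshold lies 3.5 deg beyond its detection threshold.
print(extrapolation_time(10.0, 13.5, 20.0))  # 175.0 msec
```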

6. Further challenges to the notion of visual prediction

6.1. Luminance contrast dependent variation in the flash-lag effect

It is known that luminance contrast can influence visual delay, which in turn can influence the perceptual localization and shape of moving objects, as several phenomena have shown. The Pulfrich pendulum effect occurs when the observer views a moving pendulum bob with one eye covered by a neutral density filter. As a result of reduced contrast in one eye, the observer perceives the motion of the swinging bob in a plane as movement along a curved path in depth (Burr & Ross Reference Burr and Ross1979; Morgan & Thompson Reference Morgan and Thompson1975). This finding can be explained in terms of longer visual processing delays for the eye with the filter, which receives lower image contrast relative to the other eye. The effect of variations in luminance contrast on perceptual distortions in the shape of moving objects has also been shown (Roufs Reference Roufs1963; Williams & Lit Reference Williams and Lit1983; Zanker et al. Reference Zanker, Quenzer and Fahle2001). In addition, the effect of luminance contrast on localization of moving objects has been studied using the flash-lag effect (Purushothaman et al. Reference Purushothaman, Patel, Bedell and Öğmen1998). Purushothaman et al. (Reference Purushothaman, Patel, Bedell and Öğmen1998) found that the value of Δσ/v changed from 20 to 70 msec as the luminance of the moving object was increased (with the luminance of the flash held constant) by 1 log unit above detection threshold. Zanker et al. (Reference Zanker, Quenzer and Fahle2001) found a luminance-dependent shape distortion for moving objects corresponding to 3 msec. In order for visual compensation to be effective, it must be flexible, accounting not just for the visual delay but also for variations in it. Since the flash-lag magnitude varies with luminance (Purushothaman et al. Reference Purushothaman, Patel, Bedell and Öğmen1998), this has been taken to imply that the visual system is not able to take account of variations in visual delays. This has led to the suggestion that visual processes do not contribute to compensation for neural delays (Purushothaman et al. Reference Purushothaman, Patel, Bedell and Öğmen1998; Zanker et al. Reference Zanker, Quenzer and Fahle2001).
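The quantity Δσ/v expresses the spatial offset Δσ as an equivalent time. A minimal sketch of this conversion (the function name and the assumed velocity are mine; the 20 and 70 msec figures are those reported by Purushothaman et al.):

```python
def flash_lag_offset(velocity, effective_latency_diff_ms):
    """Spatial lead (deg) of the moving object over the flash, given the
    effective latency advantage (msec) attributed to the moving object:
    the offset is simply velocity * Δt."""
    return velocity * effective_latency_diff_ms / 1000.0

# Illustrating the reported trend: Δσ/v grows from 20 to 70 msec as
# moving-object luminance rises 1 log unit above detection threshold.
v = 10.0  # deg/sec, an assumed velocity
low = flash_lag_offset(v, 20.0)   # near-threshold luminance: 0.2 deg lead
high = flash_lag_offset(v, 70.0)  # high luminance: 0.7 deg lead
print(low, high)
```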

This reasoning, however, is questionable. Consider a related question: How are actions such as catching or hitting a ball affected by a change in a ball's luminance contrast? Depending on the degree to which luminance contrast varies, performance will either be affected or not (Anstis et al. Reference Anstis, Smith and Mather2000). Suppose the change in luminance contrast is large enough that the catching ability of a player is affected; that is, at luminance contrast level 1 the player is able to catch the ball, but at luminance contrast level 2, he or she is unable to catch the ball. If we follow the reasoning that luminance contrast–dependent modulation of the flash-lag effect implies that the visual system does not compensate, then the luminance contrast–dependent modulation in behavior would suggest that the nervous system (as a whole) does not compensate for visual delays. This is clearly mistaken reasoning because it is all but certain that the nervous system (as a whole) cannot function without compensation for neural delays. Hence, variation in performance, measured either visually (e.g., via the flash-lag effect) or through action, cannot be taken to indicate that there is no compensation for visual delays; variation in performance simply means that the output of the system changes when conditions change. These considerations lead to what I call the fundamental assumption of compensation for visual delays,Footnote 2 which states: “In the absence of mechanisms compensating for visual delays, many behaviors in otherwise healthy animals would be disrupted.”

A further question, however, arises. If there is a visual compensation mechanism, then what might be the range of environmental conditions (and/or conditions internal to the animal) within which the mechanism produces an invariant output? All known neural mechanisms have limitations. Consider the well-known size constancy mechanism. This mechanism functions well (i.e., produces an invariant size estimate) only within a limited range of distances – imperfect size constancy for distances outside the range does not mean that size constancy does not occur at all. What might be the range of luminance contrasts for the moving object over which spatial extrapolation might produce an invariant output? It is premature to give a definitive answer to this question. In the study of size constancy, zero distance is the natural “anchor point.” One of the difficulties is determining what might be the anchor point for luminance contrast. There are three possibilities: (a) the stimulus in an “Off” state; (b) the observer's absolute detection threshold; or (c) some “average” photopic luminance level in which the observer's visual system evolved.

It is well known that in sports, for example, degradation of performance occurs outside certain limits of ambient illumination; play is sometimes stopped because of “bad light.” From failures of performance to meet task criteria (e.g., if one is forced to play in bad light), however, one cannot conclude that compensation for visual delays does not occur when light is “good” – this would violate the fundamental assumption of compensation for visual delays – nor can one conclude that the breakdown of performance reflects an absence of compensation for visual delays; behavior can break down because of partial, or inappropriate, compensation. The only conclusion one can draw from behavior not meeting task criteria is that compensation is not appropriate for the total (sensory+motor) delay in the system.

In producing goal-directed behavior, the flexibility of the animal's nervous system as a whole would be expected to be higher than the flexibility of the visual system per se. This is simply because the activity of single neurons located relatively late in the sensory-motor stream, for example, in the spinal cord, can be influenced by many more neurons than can the activity of neurons located early in the visual pathway. Indeed, spinal motor neurons could receive input from more than two thousand other neurons (Poritsky Reference Poritsky1969). This suggests that if the visual system contributes to compensation for neural delays, then this contribution may be less flexible in accommodating luminance contrast variations than the animal's nervous system as a whole. Therefore, for different visual delays, compensation could be adjusted dynamically between visual and motor processes. For example, for low luminance contrast, where performance still meets task criteria, the compensation could be carried out more by the motor than by the visual processes. This suggestion is consistent with the results of Purushothaman et al. (Reference Purushothaman, Patel, Bedell and Öğmen1998) and Zanker et al. (Reference Zanker, Quenzer and Fahle2001), and reconciles the contrast-dependent variation in the flash-lag effect with the notion of visual prediction.

6.2. Logical challenges to visual prediction

Compensation for all neural delays can be carried out by late components of the sensorimotor pathways. As suggested by Purushothaman et al. (Reference Purushothaman, Patel, Bedell and Öğmen1998), “accurate visually guided motor actions are likely to depend on motor instead of perceptual compensation” (p. 424). There is unequivocal evidence for compensation mechanisms that are located within the non-visual parts of the nervous system (Lacquaniti & Maioli Reference Lacquaniti and Maioli1989). The view challenging the role of visual prediction and assigning a compensatory role only to non-visual mechanisms seems parsimonious. Hence, it has been suggested that visual prediction is unusual in that, on this proposal, the visual system somehow attempts to compensate for its own input delays. In fact, Schlag and Schlag-Rey write: “Clearly, this is one of the most daring proposals of a top-down hypothesis. Here, ‘top-down’ means that somehow the brain is attempting to correct its own input from the senses” (Schlag & Schlag-Rey Reference Schlag and Schlag-Rey2002, p. 197).

On the fundamental assumption of compensation for visual delays, the delays must be compensated somewhere within the nervous system. For concreteness, suppose that the visual delay for a moving ball between the photoreceptors and the LGN is 50 msec, and the total delay (for which the nervous system must compensate) involved in catching the ball is 200 msec. If the 50 msec delay is not compensated, and if the required precision in performing the catch is greater than 50 msec, which is common in even moderately demanding tasks (Tresilian Reference Tresilian1993), then the animal's behavior will not meet the task criteria. If, on the other hand, the animal is able to produce behavior that meets the task criteria, then it follows that if compensation for 50 msec (visual delay) is not carried out in the vision pathway, then it must be carried out in later pathways before neural stimulation of muscles. If one assumes only non-visual compensation, then one must also assume that the non-visual part of the nervous system has information concerning the 50 msec delay, and that the nervous system as a whole is attempting to correct for the visual delay. Thus, the criticism that seemed specific to visual prediction in terms of “correcting its input from the senses” (Schlag & Schlag-Rey Reference Schlag and Schlag-Rey2002) is not specific at all, and would apply to any, and in fact all, predictive mechanisms within the CNS.
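The arithmetic of this argument can be stated as a toy delay budget. In the sketch below, the function name and the 30 msec precision figure are my own illustrative assumptions; only the 50 and 200 msec delays come from the example above:

```python
def meets_task_criteria(total_delay_ms, compensated_ms, required_precision_ms):
    """Behavior meets the task's temporal criteria only if the residual,
    uncompensated delay is no larger than the precision the task demands."""
    residual_ms = total_delay_ms - compensated_ms
    return residual_ms <= required_precision_ms

# The article's example: 200 msec total sensorimotor delay, of which 50 msec
# arises between the photoreceptors and the LGN; suppose (hypothetically)
# the catch demands 30 msec precision.
print(meets_task_criteria(200, 150, 30))  # only non-visual delays compensated: False
print(meets_task_criteria(200, 200, 30))  # the 50 msec visual delay also covered: True
```

The point of the sketch is that it is indifferent to *where* the 50 msec is compensated; only the residual matters, which is why the criticism applies to any predictive mechanism, not specifically to a visual one.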

Furthermore, there are clear examples in which the CNS does monitor its own input delays. Variables such as muscle fatigue impact the force a muscle can generate in response to a given neural signal, so the CNS must monitor such variables in order to produce a desired trajectory of limb movement (Bullock Reference Bullock and Arbib2003). Therefore, mechanisms compensating for delays in the motor system require information not only about delays resulting from the sluggish response of muscles to neural stimulation, but also about variations in these delays caused by varying degrees of muscle fatigue. Helmholtz's efference copy notion suggests another example. In order to maintain a motionless visual scene during voluntary eye movement, the efference copy of the motor command and the visual feedback must be available for comparison within some time window. As changes in luminance contrast change retinal input delays, the fact that we perceive visual scenes as remaining stable under a wide range of luminance contrasts already supports the view that the visual system is able to monitor its input delays.

6.2.1. Prismatic adaptation and visual prediction

A related challenge is posed by adaptation experiments involving optical distortions (Harris Reference Harris1963). Humans can adapt to optical displacements produced by wedge prisms. Upon first wearing prism goggles, the wearer will incorrectly reach for objects, but later he or she will adapt and reach accurately. Such results might suggest that errors in visual localization, whether resulting from optical or neural factors, can be dealt with by late sensory-motor mechanisms, and that compensation for delays in vision per se is unnecessary for interceptive or reaching actions.

There are, however, three issues that need further attention. First, there is debate as to whether the adaptation observed in displaced/inverted vision experiments is visual or motor (Harris Reference Harris and Harris1980). To the degree that adaptation is at the level of vision, the notion of visual prediction would be supported. Second, adaptation to inverted vision has been tested after days or, at the most, weeks of the subject's visual experience while wearing prism goggles. Although it is not known what the state of visual adaptation would be if the experience with displaced vision were lifelong, the expectation is that adaptation would be more complete with longer experience. Following his famous experiments, Stratton reported that the difficulty of seeing things as being upright with upright retinal images (as opposed to inverted retinal images) seemed entirely due to the “resistance offered by long-established previous experience” (Stratton Reference Stratton1896). There are many reports, including those from Stratton in the 1890s and Kohler in the 1950s and 1960s, indicating that with time, things appeared more “normal” and, therefore, that there was adaptation at the perceptual level (Harris Reference Harris and Harris1980). Clearly, the observer's interaction with visual objects improved with time, and this correlation between normalcy in appearance and improved interaction with things suggests that perception does impact action. I argue (see further on) that the improved level of perceptual adaptation feeds into more accurate action, and more accurate action feeds (via reafference) into improved perception. A lifetime of experience with neural delays should no doubt render the visual adjustment complete, and so in this case, action may be influenced strongly by perception. Finally, there are clear examples of some animal species that do not adapt at all to displaced vision (Hess Reference Hess1956). Such animals under displaced vision conditions would starve to death even though food was nearby. Therefore, if one were to follow the reasoning that ready adjustment in humans to optical distortion suggests late compensation, then, by the same token, a lack of adjustment would imply compensation at an early stage of visual processing. Because in evolution older systems are modified, not discarded, the lack of adaptation in some older species suggests that some form of visual compensation, perhaps less flexible than motor compensation, may also exist in more recent species such as humans.

7. Functions and mechanisms of spatial extrapolation

We have so far considered the effect of extrapolation on perception. This is indeed appropriate, as the primary empirical basis of the current viewpoint is psychophysical. Although extrapolation carried out by visual mechanisms could impact behavior directly with no perceptual consequences, here I consider only processes that can and do reveal themselves perceptually. Therefore, the discussion here will keep in view the human flash-lag results. Multiple retina-, eye-, head-, and body-centered coordinate systems that encode locations of objects for various functions exist within the CNS. One encounters the issue of neural delays and compensation not just when these coordinate frames are static, but also when they are moving, as, for example, when the eyes or the head track a moving object. Although this situation can be incorporated within the general framework presented here, detailed analysis of neural delays for moving reference frames is beyond the scope of this article – the reader is directed to previous empirical work on moving reference frames (Brenner et al. Reference Brenner, Smeets and van den Berg2001; Nijhawan Reference Nijhawan2001; Schlag & Schlag-Rey Reference Schlag and Schlag-Rey2002). A forthcoming article considers the consequences of delay compensation when the whole animal moves in relation to static objects (Changizi et al., in press).

The important function of compensation for neural delays is likely to be carried out by multiple spatial extrapolation mechanisms at different levels of processing; the requirements for spatial extrapolation of coordinates are very simple, and mechanisms that could potentially accomplish extrapolation are present at virtually all levels of neural processing. However, the immediate goal of the mechanisms at different levels may not be exactly the same. In addition, these mechanisms may be variants of one underlying process or could be significantly different from each other. I now consider these mechanisms.

There is a natural link between mechanisms of spatial extrapolation and their function. For example, a mechanism serving a high-speed response may be different from a mechanism for which speed is less relevant. The simplest function of visual mechanisms compensating for delays would be to remove errors in localization and support rapid behaviors such as orienting toward moving objects. Mechanisms serving this function might be located early in the visual pathway. Intermediate-level mechanisms also may serve the function of perception of objects and of planning of more complex actions that involve coordination of gross and fine bodily movements over a longer time frame. Finally, late mechanisms of visual prediction may be related to visual feedback to the animal about successes or failures of actions that have already been performed. The early-to-late hierarchy mimics the literal-to-abstract representation of space in the early to late representations in parietal and frontal cortical areas (Andersen et al. Reference Andersen, Snyder, Li and Stricanne1993). The distinction between early versus late compensation mechanisms may also be understood in terms of the requirements of speed, accuracy, and flexibility of actions; early mechanisms may feed into fast relatively stereotyped actions, whereas late mechanisms might feed into actions in which variability in sensory and motor delays is likely to be high, or in which information about delays in the feedforward vision pathways is less reliable.

The suggestion that compensation mechanisms may be ubiquitous within the CNS is not without support. There exists a great variety of potential mechanisms capable of spatial extrapolation (see, e.g., Baldo & Caticha Reference Baldo and Caticha2005). The required lateral shifts in coordinates could be based on shifts in peaks of population response, already shown in the retina (Berry et al. Reference Berry, Brivanlou, Jordan and Meister1999) and in V1 (Jancke et al. Reference Jancke, Erlhagen, Dinse, Akhavan, Giese, Steinhage and Schoner1999); on shifts in receptive fields; on lateral spread of neural facilitation (Erlhagen Reference Erlhagen2003; Grzywacz & Amthor Reference Grzywacz and Amthor1993; Kirschfeld & Kammer Reference Kirschfeld and Kammer1999; Sillito et al. Reference Sillito, Jones, Gerstein and West1994); or on other forms of distortion of retinotopic representations (Sundberg et al. Reference Sundberg, Fallah and Reynolds2006). Such mechanisms are known to exist not just in the visual system, but also in the auditory system (Witten et al. Reference Witten, Bergan and Knudsen2006). Other, more specialized, computational schemes, such as “shifter circuits,” can also produce a horizontal shift in neural array activity on the basis of motion information (Anderson & Van Essen Reference Anderson and Van Essen1987). Finally, there are mechanisms located in higher cortical areas or the cerebellum that integrate information from multiple sources. These mechanisms fall in the general category of internal models (Kawato Reference Kawato1999).
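The simplest of these schemes, a forward shift of the population-response peak, can be sketched in a few lines. This is only an illustrative caricature (the Gaussian profile, stimulus position, velocity, and delay are hypothetical values of mine), not a model of any specific circuit:

```python
import numpy as np

def shifted_population(preferred_pos, registered_pos, velocity, delay_s, sigma=1.0):
    """Gaussian population activity whose peak is displaced forward along the
    motion direction by velocity * delay, as in peak-shift accounts: the
    registered (delayed) position is extrapolated to the current position."""
    peak = registered_pos + velocity * delay_s  # extrapolated peak location
    return np.exp(-(preferred_pos - peak) ** 2 / (2 * sigma ** 2))

# Hypothetical values: a dot registered at 8 deg, moving at 10 deg/sec,
# with a 100 msec delay, so its current position is 9 deg.
x = np.linspace(0.0, 20.0, 201)  # neurons labeled by preferred position (deg)
r = shifted_population(x, registered_pos=8.0, velocity=10.0, delay_s=0.1)
print(x[np.argmax(r)])  # population peak near 9 deg, the dot's current position
```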

7.1. Prediction in feedforward vision pathways serves perception for online control of behavior

The goal of visual prediction cannot be solely to inform perception. Rather, visual prediction must impact the behavior of animals and ultimately contribute to a survival advantage. Barlow has argued that significant recoding of sensory information occurs within the retina itself (Barlow Reference Barlow1979), and that retinal processes per se contribute to transformations of information in preparation for use during behavior. Indeed, mechanisms for spatial extrapolation have been discovered in the retina (Berry et al. Reference Berry, Brivanlou, Jordan and Meister1999) and in other early stages of visual processing (Barlow Reference Barlow1953). Berry et al. (Reference Berry, Brivanlou, Jordan and Meister1999) used isolated retinas of the salamander and the rabbit in their studies. A moving or a flashed bar was projected on the retina, and spiking activity from a population of ganglion cells was recorded simultaneously. The cells responded with a latency of about 50 msec to the flash. However, the peak of the population cell response to the moving bar appeared close to the leading edge of the bar, instead of trailing behind the bar's center by 50 msec. (Note that full compensation requires the population response to be aligned with the bar's center. The forward shift of the population response in relation to the bar's center is noteworthy, and will be further considered later.) This extrapolation of the response of the ganglion cells may be explained in terms of spatio-temporal filtering, biphasic response of cells, and contrast gain control (Berry et al. Reference Berry, Brivanlou, Jordan and Meister1999). These mechanisms are not unique either to lower species or to the early parts of the vision pathway; analogous mechanisms exist at higher levels of processing in primates (Shapley & Victor Reference Shapley and Victor1978), which may explain the flash-lag results in humans.
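A minimal simulation can illustrate how a biphasic temporal filter, by itself, pushes the population peak toward a moving bar's leading edge. The filter shape, bar parameters, and velocity below are arbitrary choices of mine, and the sketch omits the contrast gain control that Berry et al. also invoke; it shows only the qualitative effect:

```python
import numpy as np

dt = 0.001                       # 1 msec time step
lag = np.arange(0.0, 0.3, dt)    # filter support (sec)
# Biphasic impulse response: fast positive lobe, slower balanced negative lobe,
# so sustained stimulation drives the response back toward zero.
filt = np.exp(-lag / 0.02) - 0.25 * np.exp(-lag / 0.08)

def rate(x, T, v=10.0, width=2.0):
    """Response at time T of a cell at position x (deg) to a bar of the given
    width moving rightward at v deg/sec (leading edge currently at v*T)."""
    s = np.arange(0.0, T, dt)                            # past times
    stim = ((v * s - width <= x) & (x <= v * s)).astype(float)
    weights = np.interp(T - s, lag, filt, right=0.0)     # causal weighting
    return np.sum(stim * weights) * dt

T = 1.0                          # snapshot: leading edge at 10 deg, center at 9 deg
xs = np.arange(0.0, 12.0, 0.1)
rates = np.array([rate(x, T) for x in xs])
peak_x = xs[np.argmax(rates)]
print(peak_x)  # peak lies ahead of the bar's center, near its leading edge
```

Cells well inside the bar have integrated both lobes of the filter and respond weakly, while recently recruited cells at the leading edge have integrated only the positive lobe, so the population peak sits forward of the bar's center rather than trailing it by the response latency.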

If one assumes that online behavior is controlled by perception of events (Davidson Reference Davidson and Rosenthal1970), then in humans the simplest possible function of extrapolation in the feedforward visual pathway, say, between the photoreceptors and V1, would be to aid interaction with moving objects. On this view, extrapolation reduces latency-based spatial error in the perceived position of moving objects such as tennis balls, which in turn aids successful interceptive actions such as hitting tennis balls. This possibility, however, cannot be entirely correct, as humans can perform goal-directed actions without awareness of the position of the visual objects (Goodale et al. Reference Goodale, Pelisson and Prablanc1986). Thus, it cannot be that the perception of the true position of objects is necessary for action. The general issue of whether perception is necessary for action has bearing on this point.

The vision pathway consists of two parallel streams. The magnocellular stream consists of larger, faster-conducting neurons; these neurons appear to be specialized for processing spatial and motion information. The parvocellular stream consists of smaller, slower-conducting neurons. These neurons seem more specialized for processing of object and color information. The functional specialization of the two streams, which starts with parvo- and magno-retinal ganglion cells, continues beyond V1 as the dorsal and ventral pathways (Mishkin & Ungerleider Reference Mishkin and Ungerleider1983). One point of view argues for substantial functional independence between the two streams captured by the dichotomy of “vision for perception” versus “vision for action” (Goodale & Milner Reference Goodale and Milner1992). The suggestion is that the dorsal stream provides visual information to action systems, whereas the ventral stream provides visual information for perception (Goodale & Milner Reference Goodale and Milner1992). Support for this position comes from findings in humans that damage to the dorsal stream produces a deficit (optic ataxia) in the action system but not the perception system, whereas damage to the ventral pathway impairs perception and recognition of objects but spares the ability to produce correct actions.

If perception and action were based on visual processes that were completely isolated in independent pathways, then one might be forced to come to the conclusion that perception does not affect action systems, and so compensation for delays at the perceptual level cannot contribute to adaptive behavior. There are, however, virtually no researchers who believe there to be a complete segregation of visual processes for perception versus action. For example, Goodale and Milner (Reference Goodale and Milner1992) have argued that the visual phenomenon of size constancy plays a role in the scaling of grasp size. Therefore, it may not be unreasonable to expect that the delay-compensated perceived position of moving objects, which is a type of constancy phenomenon (see further on), does play a role in interceptive actions.

7.2. Internal model for visual prediction

Much of the flash-lag debate has centered on the interpretation of the effect given in section 7.1; that is, spatial extrapolation is caused by visual mechanisms in the feedforward vision pathway serving perception. I now develop a generalized concept of visual prediction that assumes that spatial extrapolation is not based on mechanisms in the stream that directly feeds perception, that is, the parvocellular stream. A modular approach is adopted, with multiple mechanisms located at different levels within the CNS. The picture that emerges is consistent with the notion of internal models that have been useful in describing both motor and sensory systems. In outlining the internal model for visual prediction (IMVP), I begin by describing the various processes and component mechanisms that make up the IMVP.

7.2.1. Prediction in feedforward vision pathways serves online control of behavior: Impact of “motor maps” on perception

There are psychophysical (Nijhawan Reference Nijhawan1997) and neurophysiological (Sundberg et al. Reference Sundberg, Fallah and Reynolds2006) reports of color-specific effects that are easiest to explain in terms of spatial extrapolation within the ventral stream. Therefore, mechanisms for extrapolation may be located in both vision streams. However, if one assumes that the ventral stream is mainly responsible for visual awareness (Goodale & Milner Reference Goodale and Milner1992), then an extrapolation mechanism cannot be located only in the ventral stream. This is because goal-directed actions can be performed without awareness of the stimulus, as revealed by the pathological phenomenon of blindsight (Stoerig & Cowey Reference Stoerig and Cowey1997; Weiskrantz Reference Weiskrantz1996) and in normal observers (Milner & Goodale Reference Milner and Goodale1995). For simplicity, and for the purposes of addressing the debate directly, I assume that visual extrapolation mechanisms are located only in the sensory pathways feeding motor processes for control of behavior. Cortical and subcortical areas receiving sensory input that function mainly to control behaviors such as orienting and reaching towards, or withdrawing from moving objects, are common in the CNS.

7.2.1.1. Compensation for lag of moving objects in motor-oriented reference frames

Figure 1 depicted the effect of visual delays on a wave of neural activity in a retina-centered cortical frame and on human perception. In order to execute fast interceptive actions, the CNS requires not just spatial but also temporal information. I first consider neural transformations that provide action systems with spatial information. Visual input is initially encoded in a retina-centered frame. Interaction with objects requires that the location of objects be transformed from a sensory-based representation into a muscle-centered representation. Intermediate eye- and head-centered maps contribute to such transformations. In the lateral intraparietal area and the parietal reach region of the cortex, there are maps for controlling eye movements and reaching movements, respectively (Andersen & Buneo Reference Andersen and Buneo2002). The primary function of such maps is to specify the position of stimuli for action (Taira et al. Reference Taira, Mine, Georgopoulos, Murata and Sakata1990).

In addition to cortical maps, there are many subcortical retina-centered frames, for example, in the primate superior colliculus (SC), which serve mainly a motor function (Cynader & Berman Reference Cynader and Berman1972). Findings of various studies suggest that the SC represents a “motor map” of locations for the guidance of orienting responses such as saccades and head movements (DuBois & Cohen Reference DuBois and Cohen2000; C. Lee et al. Reference Lee, Rohrer and Sparks1988; Schiller Reference Schiller and Darien-Smith1984; Sparks & Jay Reference Sparks and Jay1986; Sparks et al. Reference Sparks, Lee and Rohrer1990). Many SC cells respond to moving visual stimuli (Dean et al. Reference Dean, Redgrave and Westby1989). An uncompensated delay in the retina to the SC pathway will, for example, cause a saccadic target in motion to lag in the retina-centered SC frame. This lag, in turn, would affect the direction, amplitude, and velocity of saccades, impeding the animal's ability to rapidly orient toward a moving target that might be of imminent interest or danger to the animal. This suggests a strong basis for compensation of visual delays in the retina–SC pathway. Indeed, mechanisms for extrapolation have recently (Witten et al. Reference Witten, Bergan and Knudsen2006) been uncovered in the owl's optic tectum (homolog of the mammalian superior colliculus).

There is also a wealth of data related to how the CNS obtains timing information for potential action. In order to hit a fast-moving tennis ball (say), the player's racket must arrive in the correct position at the correct time; so in addition to position information, the action systems of the hitter require precise timing information. One related line of research is based on the time-to-collision (TTC) paradigm (Bootsma & Oudejans Reference Bootsma and Oudejans1993; D. N. Lee Reference Lee1976; D. N. Lee & Reddish Reference Lee and Reddish1981; Regan Reference Regan1992; Tresilian Reference Tresilian1993; Wagner Reference Wagner1982). Gibson proposed that information for acting on the world is contained in the stimulus array in the form of invariants, and can be directly used by the animal without recourse to internal representations (Gibson Reference Gibson1961). D. N. Lee (Reference Lee1976) applied Gibson's approach to situations in which animals encounter moving objects. For an approaching object, TTC can be computed as the ratio of retinal image size to its rate of expansion. Recent evidence suggests that computation of TTC is considerably more complex than previously expected (Tresilian Reference Tresilian1999); however, the basic premise of much of the TTC research is that the CNS uses visual information directly to predict the time of collision. In particular, this point of view rarely considers internal representations and perception as necessary for action. The evidence that optical information is used in the computation of TTC in lower species, such as pigeons (Wang & Frost Reference Wang and Frost1992), gannets (D. N. Lee & Reddish Reference Lee and Reddish1981), and houseflies (Wagner Reference Wagner1982), underscores this point.
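Lee's ratio can be stated in a few lines. In the sketch below, the object size, distance, and approach speed are hypothetical, and the comments note the small-angle approximations under which the ratio recovers distance over speed:

```python
def time_to_collision(theta, theta_dot):
    """Lee's tau: the optical angle subtended by the object divided by its
    rate of expansion approximates time to collision at constant speed."""
    return theta / theta_dot

# An object of size s at distance d, approaching at speed v, subtends
# theta ≈ s/d (small-angle approximation), which expands at
# theta_dot ≈ s*v/d**2, so tau ≈ d/v without ever recovering d or v.
s, d, v = 0.5, 20.0, 10.0
theta, theta_dot = s / d, s * v / d ** 2
print(time_to_collision(theta, theta_dot))  # ≈ 2.0 sec, i.e., d / v
```

The design point, in Gibson's spirit, is that tau is computed from purely optical quantities; the distance and speed on the right-hand side of the approximation never need to be represented.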

Speeded actions directed at moving targets depend on precise spatial information. The TTC approach does not explicitly address where or how visual delays, which would contribute to spatial errors, are compensated. I argue that a corollary of the TTC approach is that visual mechanisms compensate for visual delays. Consider, for example, experiments revealing the ability of humans to estimate the future time of arrival, at a predetermined position, of visual objects traveling at constant speed. Observers are accurate to within 10 msec in estimating arrival times of objects at positions that are more than 2000 msec in the future (Bootsma & Oudejans 1993). As the visual processing delay is much longer than 10 msec, it is clear that in order to perform such tasks, this delay must be compensated. In addition, the visual processing delay is much smaller than 2000 msec. Thus, the mechanisms that purportedly predict a moving object's position 2000 msec into the future can certainly predict the object's position over a much shorter time, corresponding to a visual delay of, for example, 100 msec. Therefore, it is implicit in the TTC approach that the CNS uses optical information to compensate for visual delays. This compensation, however, may be temporal rather than spatial in nature. Thus, the compensation may not involve an explicit spatially extrapolated representation.
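The spatial form of such compensation reduces to shifting the delayed registered position forward along the motion path by velocity times delay. A hedged sketch of that arithmetic (all values illustrative):

```python
def extrapolated_position(registered_pos, velocity, delay_s):
    """Spatial extrapolation: shift the delayed registered position
    forward along the motion path by velocity * delay, so that the
    represented position matches the object's current (rather than
    past) position."""
    return registered_pos + velocity * delay_s

# A target moving at 20 deg/s, registered with a 100 msec visual
# delay, lags its true position by 2 deg; extrapolation removes
# that lag (registered 10.0 deg -> compensated 12.0 deg).
compensated = extrapolated_position(10.0, 20.0, 0.100)
```

A temporal rather than spatial compensation, as the text notes, would instead adjust the predicted arrival time without ever forming such a shifted position representation.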

Frost and colleagues (Sun & Frost 1998; Wang & Frost 1992) have found evidence of neurons that compute TTC in the nucleus rotundus of pigeons (the homolog of the pulvinar in the mammalian thalamus), a major midbrain area receiving input from the optic tectum. Interestingly, these researchers did not find evidence of neurons that compute TTC in the optic tectum itself (Wang & Frost 1992). Since mechanisms for spatial extrapolation have been found in the optic tectum (Witten et al. 2006), this suggests the interesting possibility that neurons in the optic tectum spatially extrapolate sensory input, and that this extrapolated output is sent to the nucleus rotundus for computations of timing.

7.2.1.2. Prediction minimizes the interference of non-veridical percepts with online action

The present hypothesis suggests that compensation for visual delays is not carried out in the feedforward ventral pathway serving perception. What, then, is the function of extrapolation in perception revealed by the flash-lag, and how does it occur? I suggest that extrapolation in perception occurs in two steps. The first step is extrapolation carried out by feedforward visual processes for online control of action. This extrapolation is not directly revealed in perception. The second step is the communication of the extrapolated visual data to ventral processes at a later stage of the vision pathway. Thus, the perceptual consequences of extrapolation arise from crosstalk between the dorsal and the ventral pathways at a late stage (Khurana et al. 2006).

The function of this mechanism is to remove conflict between object positions available to perception and action systems. Note that correct perceptual information being unnecessary for online action (as noted above) is one thing; the perceptual system delivering non-veridical position information is quite another. This point is brought out quite revealingly when one reconsiders the effect of delays in the ventral stream on the perception of moving objects. In the absence of compensation, the perceived position of moving objects would be non-veridical (Fig. 1b). Thus, in light of everyday observations of successful interceptive actions, a suggestion that there is no perceptual compensation for delays would imply not just that perceptual compensation is unnecessary for online actions, but also that online actions are immune to non-veridical percepts of moving objects. Contrary to the expectation that perception may be irrelevant for action, non-veridical percepts (e.g., geometric illusions) have been shown to influence actions (Franz et al. 2000). Therefore, a behavioral advantage would be gained by visual mechanisms that remove errors in the perceptual localization of moving objects.

7.2.2. Prediction resulting from processes beyond feedforward vision pathways: Extrapolated perception caused by feedback from motor processes

Early sensory mechanisms and extensions of these mechanisms found in later areas of the primate visual systems might explain the flash-lag effect in humans (Berry et al. 1999; Witten et al. 2006). There is a distinct possibility, however, that the extrapolation mechanisms revealed in the early processes of lower species feed only action systems. Thus, the connection between early mechanisms and human perception may be less straightforward than previously suggested (Gegenfurtner 1999).

The currently predominant view is that prediction exists mainly in the motor components of the nervous system (Wolpert & Flanagan 2001). In consonance with this view, I assume that extrapolation mechanisms exist in neither the action- nor the perception-oriented feedforward vision pathways, but are located in later brain areas serving action: Motor mechanisms receive delayed sensory data containing a spatial error, extrapolate, and send the compensated information to the muscles. Motor extrapolation is carried out during actual interactions, during planning of interactions, or during imagined interactions (mental rehearsal). Extrapolation in perception then results either because of predictive motor processes interacting with visual processes, or because of a deep-seated unity between motor and perceptual processes (see further on). The interaction could be supported by feedback from motor areas to the visual areas, analogous to the efference copy notion of feedforward control, or by communication between motor and visual neurons within a given area of the CNS. On the feedback view, the extrapolated motor output is sent not just to the muscles; a copy of this output is also fed back to visual cortical areas, influencing perception. A strong case for the impact of feedback from higher cortical areas on perception has been made (Lamme & Roelfsema 2000). In addition, it is known that in humans, frontal cortical areas can modulate neural activity in visual areas (Sterzer & Kleinschmidt 2007). The function of the interaction between motor and visual processes is to provide the animal with visual feedback resulting from actual actions directed at moving objects. Without this mechanism, there would be an offset between the time–space of the performed action and the time–space of the visual feedback resulting from the action.

Similarities between motor extrapolation and its counterpart in perception may be understood along a different line. Many influential researchers have suggested that despite apparent differences, perception and action are two sides of the same coin (see, e.g., Sperry 1952). Evidence for this position comes from multiple sources. For example, the perception and production of vocal sounds appear to be served by the same underlying neural structures (Liberman & Mattingly 1985; Williams & Nottebohm 1985); experiments requiring human subjects both to produce actions and to view events relatable to those actions (Müsseler & Prinz 1996; Parkinson & Khurana 2007; Stürmer et al. 2000) have found similarities in results across the two domains; and neurons in many parts of the CNS respond both when the animal performs an action and when the animal views an event relatable to that action. In one study, Port et al. (2001) recorded neural activity in the monkey motor cortex either while the animal acted to catch a moving object, or when it simply viewed the same object without performing an overt action. A significant proportion of the investigated neurons responded both when the monkey performed the action and when the monkey simply looked at the moving object.

Many neuroscientists (Arbib 1972; Rizzolatti et al. 1997; Sperry 1952) have attempted to diffuse the sharp boundary between the sensory and the motor systems on theoretical grounds. During evolution, there is no "jump" that can be discerned between the nervous systems of primitive species, whose primary function is to produce overt responses conducive to survival, and those of higher species, in which perceptual-cognitive systems seem to have become disengaged from action systems (Sperry 1952). On the single-system view, one should not be surprised to find similar mechanisms performing similar functions in apparently different parts of the CNS. Parsimony suggests that if there are mechanisms compensating for delays within the motor system, then similar mechanisms are also likely to be present in the visual system, and vice versa (Nijhawan & Kirschfeld 2003).

It is worth noting that the flash-lag effect in the flash-initiated condition is better explained in terms of visual mechanisms in the feedforward vision pathways. Nonetheless, the late mechanism described here could modulate the extrapolated output of feedforward visual mechanisms. As discussed in section 5.2.1, the time required for extrapolation in the feedforward vision pathway is vanishingly small, so a visual mechanism could "over-extrapolate." There is, in fact, evidence that the feedforward pathway does over-extrapolate. First, the flash-lag effect in the flash-initiated condition is larger in magnitude than in the complete motion condition (Khurana & Nijhawan 1995). Second, Berry et al. (1999) found the population ganglion-cell response to be shifted forward of the middle of the moving bar. (Note that precise compensation for delays would require the population response to be aligned with the center of the moving bar.) It is unlikely that early mechanisms could precisely compensate for the neural delays or deliver a reliable enough output for the animal to act on. Thus, over-extrapolation may be a result of the "rough and ready" computation provided by the early visual pathway. For the purposes of action, the over-extrapolated output would then require modulation, which would be undertaken by the late mechanism described here.

7.2.3. Forward models

There are many types of internal models. Internal models of the motor system are "neural mechanisms that can mimic the input/output characteristics, or their inverses, of the motor apparatus" (Kawato 1999, p. 718). There is also evidence for internal models within sensory systems; one such model within the vestibular system functions to represent a stable orientation of the head with respect to the world (Merfeld et al. 1999).

There are two varieties of internal models: forward models predict the next state of the motor apparatus given the current state and the motor command, and inverse models infer the motor command that caused a change in state on the basis of feedback from sensory data (e.g., vision and proprioception). Forward models are necessary when delays in the sensory-motor loops are significant. A generalized internal model that combines prediction and sensory feedback to estimate the current state of the motor apparatus is known as the "internal forward model" (Wolpert et al. 1995). In a study testing the existence of this model, participants moved their unseen arm in the presence of a null, an assistive, or a resistive force. When required to estimate their arm position during the initial stages of movement, participants overestimated the distance by which their arm had moved (Wolpert et al. 1995). The overestimation error diminished during the later parts of movement, suggesting the engagement of a forward model when sensory feedback from the limb is relatively weak at the beginning of movement. (An alternative account of this result is considered later.)
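The logic of blending prediction with delayed feedback can be sketched in a toy one-dimensional form. This is not Wolpert et al.'s model; the gain and weighting scheme are illustrative assumptions only:

```python
def forward_model_step(state_estimate, motor_command, gain=1.0,
                       sensory_feedback=None, feedback_weight=0.0):
    """Toy forward model: predict the next state from the current
    estimate plus an efference copy of the motor command; optionally
    blend in (delayed) sensory feedback. Early in a movement, when
    feedback is weak or absent, the estimate is pure prediction."""
    predicted = state_estimate + gain * motor_command
    if sensory_feedback is None:
        return predicted
    # Later in the movement: weighted combination of prediction
    # and sensed limb state.
    return (1.0 - feedback_weight) * predicted + feedback_weight * sensory_feedback

# Early: pure prediction; an over-eager gain (here 1.2) yields the
# kind of overestimation Wolpert et al. observed.
early = forward_model_step(0.0, 1.0, gain=1.2)
# Later: feedback pulls the estimate back toward the sensed state,
# shrinking the overestimation error.
late = forward_model_step(0.0, 1.0, gain=1.2,
                          sensory_feedback=1.0, feedback_weight=0.5)
```

The diminishing of the error across the movement falls out of the weighting alone: as `feedback_weight` grows, the estimate converges on the sensed state regardless of the prediction gain.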

Control and feedback are essential features of the internal models of the motor system. Muscles and limbs may be considered part and parcel of the internal model for the motor apparatus (Kawato 1999), as the cortex controls the movement of limbs, and the limbs in turn send sensory input to the cortex. An internal model for visual prediction (IMVP), in contrast, cannot have visual objects as its components. This is because visual objects are "external" entities that cannot be controlled by the internal model. Likewise, the IMVP cannot include the retina as a component. (There does not seem to be any evidence that feedback from higher brain areas can descend down to the retina.) However, such a model can include any or all components of the visual system up to and including the LGN. The LGN receives a massive number of descending fibers from the cortex, and is the lowest known level of the visual system at which cortical feedback can modulate ongoing neural activity (Cudeiro & Sillito 2006). Descending fibers affect the ongoing thalamocortical transfer of sensory information and may, indeed, serve a predictive function (Engel et al. 2001; Sillito et al. 1994).

I suggest that the concept of the internal model for the motor apparatus may be naturally extended to incorporate visual prediction, with the qualification that in the IMVP sensory input has a different role than it does in the motor model. The suggestion here is that neural representations in the LGN are both stimulus-driven sensory entities (based on input from the retina) and controlled entities (receiving descending cortical input), just as the neural limb representations (not the limbs per se) are in the motor models. What distinguishes the IMVP from the motor model is the retina-to-LGN segment, which cannot be controlled. In the motor model, there is no strictly sensory segment.

The present IMVP has another unique feature. In the IMVP, extrapolation can occur within the feedforward vision pathway, which may provide a rough and ready over-extrapolated output (Berry et al. 1999; Khurana & Nijhawan 1995). This suggests an alternative account of Wolpert et al.'s (1995) results, in which participants overestimated the distance by which their arm had moved during the initial stages of movement. This result may be due to over-extrapolation in the proprioceptive input pathway, and not to a greater engagement of a high-level forward model during the initial stages of movement. The overestimation is revealed during the early stages of movement because the late modulatory process that keeps over-extrapolation in check has not yet been engaged.

7.3. Support for IMVP

Without doubt, the motor system contributes to compensation for delays during action (Lacquaniti & Maioli 1989). It follows that if the flash-lag has something to do with compensation for neural delays, then one should certainly observe a flash-lag effect for action. Such an effect has indeed been observed. If during voluntary limb movement a flash is presented in alignment with the moving, invisible limb, then the observer sees the flash in a position lagging the registered position of the invisible limb (Nijhawan & Kirschfeld 2003; and see further on in the target article). One may attempt to explain this "motor flash-lag" result in terms of a shorter latency for limb proprioception relative to the latency for the flash. However, the average motor flash-lag corresponds to Δs/v = 128.5 msec (the spatial lag Δs divided by the limb velocity v). Because the input delay in the perception of the flash itself is likely to be in this time range, this explanation would have to assume an impossible, close-to-zero latency for limb proprioception.

Two suggestions follow from the motor flash-lag result. First, the finding of similar effects in perception and action reinforces previous suggestions that the sensory and the motor systems have an underlying unity (Arbib 1972; Rizzolatti et al. 1997; Sperry 1952). Second, there is little doubt that the two flash-lag effects, one measured in relation to a moving external object and the other in relation to the moving invisible limb under the observer's voluntary control, belong to the same category of phenomena. Hence, it is probable that the two effects have a common origin. Compensation for neural delays via feedforward (Georgopoulos 1986) or anticipatory motor control (Ghez & Krakauer 2000), which readily explains motor flash-lag, has the advantage of parsimony: the two effects, the standard flash-lag and the motor flash-lag, can be explained with one hypothesis.

We employed the flash-lag effect to test for the existence of an IMVP (Rojas-Anya et al. 2005). A previous study supporting the existence of internal models found that visually guided reach plans in posterior parietal cortex are represented in an eye-centered frame; in that study, neural responses were modulated by gaze direction irrespective of the retinal position of visual reach targets (Batista et al. 1999). Using the flash-lag effect, in our study we asked whether the instantaneous position of a moving item is represented in an eye-centered frame (Rojas-Anya et al. 2005). The moving item was either a visual target or the observer's invisible limb under voluntary control. During movement, a flash was presented in alignment with the moving item. The observers' gaze direction was manipulated such that at the time of the flash, the moving item (visual target or observer's limb) either approached the fovea or receded from it. We found that in both cases, a much stronger flash-lag effect occurred when the item approached the fovea than when the item receded from it. The asymmetric flash-lag for visual motion is consistent with previous results with visual stimuli (Mateeff & Hohnsbein 1988). However, the same asymmetry for the voluntarily moved limb (Rojas-Anya et al. 2005), in the absence of any retinal image motion, shows two things. First, consistent with previous observations (Batista et al. 1999), the nervous system uses an eye-centered frame for representing the location of moving limbs. Second, the similarity of gaze-contingent modulation of the flash-lag effect across the modalities suggests that the nervous system uses a common rule for localizing moving items. These results support the suggestion that the CNS uses an internal model to perform the important task of localizing both moving objects and moving limbs.

On a final note, it is likely that there is reciprocal influence between visual and motor processes – thus, motor processes not only inform but also receive feedback from visual processes, and vice versa. As Helmholtz argued (see Warren & Warren 1968, p. 112), the interactions between motor and visual representations may be trained by experiences such as the stimulation of the observer's own retinas caused by voluntary movements of his or her own limbs. The suggested interaction between voluntary movements and vision, leading to common localization codes for visual objects and limbs, would require cross-modal plasticity, for which there is ample evidence (Goel et al. 2006; Held & Freedman 1963).

8. Visual prediction: New, or another constancy?

Although the proposal of visual prediction has been deemed novel by some, this concept is in consonance with various constancy phenomena. Consider, for example, size constancy, which minimizes variation in the perceived size of objects despite variation in their retinal image size. Visual prediction may be thought of as a process that achieves "lag constancy" for moving objects. Just as the distance between the observer's eyes and objects is variable, so is the velocity of moving objects. Let an "early" and a "late" neural visual representation of a moving ball be denoted by RA and RB, respectively, with the goal of the visual compensation being to reduce the discrepancy between RA and RB. Suppose further that v_n and v_N are two different velocities of the ball, with v_N > v_n. If there were no visual compensation, then the difference between RA and RB for v_N would be greater than the difference between RA and RB for v_n. Furthermore, in the absence of any visual compensation, the difference between RA and RB would be expected to increase linearly with object velocity. Visual compensation for delays would achieve lag constancy for moving objects by minimizing the variation of the difference between RA and RB across different velocities. This suggestion is analogous to suggestions concerning motion deblurring (Burr 1980; Burr & Morgan 1997), in which the visual system removes the motion smear that would otherwise result from persistence of neural activity. Due to motion deblurring, a moving object appears relatively constant in shape, a process termed "sharpness constancy" (Ramachandran et al. 1974). In the absence of sharpness constancy, the same moving object, traveling at different speeds, would appear more or less elongated (smeared).
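The lag-constancy idea reduces to simple arithmetic: without compensation the spatial discrepancy between RA and RB is velocity times delay, growing linearly with speed; extrapolation by that same product cancels it at every speed. A minimal sketch (the delay and velocities are illustrative assumptions):

```python
def representation_discrepancy(velocity, delay_s, compensation_gain=0.0):
    """Spatial difference between an early (RA) and a late (RB)
    representation of a moving object. Uncompensated (gain 0), the
    discrepancy is velocity * delay, so it grows linearly with speed;
    full extrapolation (gain 1) yields the same zero lag at every
    speed, i.e., lag constancy."""
    return velocity * delay_s * (1.0 - compensation_gain)

delay = 0.050            # assumed 50 msec between RA and RB
slow, fast = 10.0, 30.0  # deg/s
uncompensated = (representation_discrepancy(slow, delay),
                 representation_discrepancy(fast, delay))        # 0.5 vs 1.5 deg
compensated = (representation_discrepancy(slow, delay, 1.0),
               representation_discrepancy(fast, delay, 1.0))     # 0.0 vs 0.0 deg
```

The velocity-dependent term in the uncompensated case is exactly the variation that lag constancy, on this account, is there to remove.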

9. Implications for neural prediction

Until now, the approach developed here has focused on visual processes leading to perception or on visual processes feeding into action systems, which nonetheless do ultimately manifest themselves perceptually. However, the current approach can easily be adapted for visual spatial extrapolation processes that exclusively supply controllers of end-effectors with no concomitant influence on perception. The current approach can, furthermore, be applied to the investigation of prediction in the CNS more generally – an approach that may be termed neural prediction. Such an approach would complement other approaches (Duhamel et al. 1992; Ghez & Krakauer 2000; Kawato 1999; Wolpert & Flanagan 2001) investigating prediction phenomena in the CNS. The general implication drawn from the present approach is that the function of predictive mechanisms, located anywhere in the CNS, is to coordinate neural activity between different parts of the CNS.

9.1. Coordination of neural representations

It is suggested that a basic goal of the CNS is coordination of neural activity across various levels within the sensory and the motor systems. During normal behaviors under sensory guidance, the animal's actions depend on many neural representations existing at multiple levels in different modalities. Information must be exchanged between representations, therefore necessitating the translation of information (Andersen et al. 1993). For example, during visually guided action, information must be exchanged between visual brain areas representing the position of objects in the environment and somatosensory/motor brain areas representing the position of the animal's effectors. Typically, the distances that separate the representations are large, so the information exchange between the representations is time consuming. Neural prediction suggests that the goal of any type of compensation is to produce coordination between the time-varying neural representations that exist in parts of the CNS separated by significant delays.

Suppose two neural representations, RA and RB, exist in two different parts of the CNS. As discussed in section 5.2.2, when there are inconsistencies, these representations might compete to control cell activity. For non-competing representations, there are three types of potential errors between RA and RB resulting from neural delays: (1) The moving coordinates of an item in one representation (say, RA) may lead relative to the coordinates of the same item in another representation (say, RB). For example, RA and RB could both represent the visual position of a moving ball, with one being a retinal representation and the other a representation in the LGN. In another situation, RA could be located in the visual system and RB in the motor system, with RB representing the position of the ball in torso-centered coordinates. In yet another situation, RA could belong to one item (e.g., the ball) and RB to a different item (e.g., the subject's arm). Which representations are created, and which representations interact during behavior, will depend on the specific behavior. In general, the distances separating the representations – and consequently the neural delays – will vary. The task of coordination would be to minimize the differences between RA and RB. (2) Different modalities process information at different speeds. Coordination would reduce the potential error between two representations (e.g., vision and touch) created by one object (e.g., an insect moving quickly on the observer's arm). (3) The neural latencies for different stimuli within a given modality vary depending on various properties (e.g., stimulus intensity). In the absence of a coordinating influence, for moving objects, the "rule" that connects the object location in retinal coordinates (say) to the object location in head-centered coordinates further downstream would vary as a function of the contrast level of the stimulus. In this case, the coordinating influence would act to minimize the required modification of the rule in different situations.

10. Summary and conclusions

It is clear that predictive phenomena could not have evolved without input to the CNS from continuous time-varying stimuli, such as those resulting from the movement of objects; they could not have evolved if the world consisted only of discrete aperiodic events and unchanging objects. It is also clear that the predictive responses seen in animals and neurons could not exist without compensation for the significant time delays that are intrinsic to all neural processes. Previous researchers have asked how the CNS reconstructs a continuous time-varying stimulus from discrete spike trains (Bialek et al. 1991). However, neurons located far from the receptor epithelia face the additional challenge of reconstructing the current state of the time-varying stimulus. This current state of the stimulus may be necessary data for decision processes, the outcome of which directly determines whether the animal acts or withholds action, or which action the animal selects. In addition, the current state of changing stimuli may be necessary for providing the animal with feedback about actions it has performed.

The reconstruction of the current state of the stimulus could be undertaken by a predictive process within the visual system; however, at the outset, prediction appears to be a high-level phenomenon. Hence, previous proposals of visual prediction led to wide debate. On the one hand, the debate is intertwined with the flash-lag findings based on unpredictable motion, and, on the other, with logical challenges to the notion of visual prediction. New analysis and data reveal that empirical results with unpredictable motion, and in particular the flash-terminated and flash-initiated conditions, are not inconsistent with visual prediction. In fact, the analysis shows that the times required by the putative extrapolation mechanism are so small that the mechanism may be more efficient than previously anticipated. In terms of logic, the fuel for the debate is the assumption that late (non-visual) processes compensate for all the delays during sensory-motor interaction. The corollary of this viewpoint is that the motor system receives spatially lagged visual input. Visual prediction suggests that the lag in the sensory data is compensated by visual spatial mechanisms in the feedforward vision pathways or by interactions of vision processes with late neural processes linked to action.

Interception of moving targets is thought to depend on the motor system's ability to produce an output whose features match the properties of the moving target. For example, when we attempt to catch a ball in flight, our motor system attempts to match the instantaneous position of our moving hand with the position of the moving ball. However, an opposite position that is not often entertained is that the goal of the visual system is to deliver to the animal visual information that is suitably shaped for action directed at a moving ball. Thus, during a catch, the visual system modifies the perceived position of the ball so that it matches the position of the moving hand (see Footnote 3). On this view, the goal of the visual system is to generate an output that has shared features with motor processes: in particular, predictions (Wilson & Knoblich 2005).

The proposal is that visual representations that receive feedback from higher cortical areas are susceptible to modification. Thus, these visual representations are controlled entities, just like neural limb representations. The descending visual signals cannot, of course, activate otherwise silent neurons, which is presumably only possible on the basis of stimulus-driven retinal input (Hupe et al. 1998). But the descending signals can, nonetheless, affect ongoing activity in many areas (e.g., the thalamus) and produce anticipatory spatial adjustments (Sillito et al. 1994).

Although my somewhat limited goal was to evaluate the feasibility of visual prediction, during the course of this endeavor it seems that prediction may be far more pervasive in the CNS than originally expected. The novel approach developed here may be easily adapted to investigate predictive phenomena in the CNS more generally. Visual prediction has a strong logical basis and seems consonant with other visual phenomena such as the various constancies and motion deblurring, as well as theoretical constructs such as neural competition. Prediction may be a multi-level, multi-modal phenomenon found in both sensory and motor systems. Furthermore, prediction may result from computations carried out by single neurons, or neural networks, or both. This general approach to the study of prediction suggests possibilities that could unify research from single cells to cognition.

ACKNOWLEDGMENTS

I would like to thank Beena Khurana for her continuous involvement with this work. I would also like to thank Raje Nijhawan, Mark Changizi, Wim van de Grind, Mike Beaton, Rich Ivry, Shin Shimojo, Kuno Kirschfeld, Gerrit Maus, and Raghbir Singh for stimulating discussions, and Barbara Finlay, Marcus Baldo and five referees for insightful feedback on an earlier draft of the article.

Footnotes

1. Since signals within the retina itself are transmitted via graded potentials, as opposed to action potentials, lateral interactions across 0.15 mm of retina could take significantly less time than the above estimate (Bruce Bridgeman 2005, personal communication).

2. There are many reasons why behaviors normally displayed by animals might break down. Breakdown of behavior is known to occur, for example, in nervous systems in which disease has altered neural processing delays: Multiple Sclerosis involves demyelination, which affects neural transmission delays. Behavior commonly observed in healthy animals leads to the following assumption: in the absence of mechanisms compensating for neural delays, many behaviors in otherwise healthy animals would be disrupted. An analogous assumption holds for visual delays in particular: the fundamental assumption of compensation for visual delays states that in the absence of mechanisms compensating for visual delays, many behaviors in otherwise healthy animals would be disrupted. Note that this last statement makes no assumption about whether visual or non-visual mechanisms perform the compensation.

3. One significant difference between the treatment of the visual position of a moving object and that of the sensed position of a moving limb is that we appear to have no conscious control over the position of the moving object, while we do have conscious control over the position of our limb. However, we are aware of only some of the internal representations that allow us to predict the future states of our limbs from current states during movement (Blakemore et al. 2002). Thus, many representations that allow for prediction of moving visual objects, and of limbs during movement, are not available to awareness.

References

Aho, A. C., Donner, K., Helenius, S., Larsen, L. O. & Reuter, T. (1993) Visual performance of the toad (Bufo bufo) at low light levels: Retinal ganglion cell responses and prey-catching accuracy. Journal of Comparative Physiology A 172(6):671–82.
Alais, D. & Burr, D. (2003) The “flash-lag” effect occurs in audition and cross-modally. Current Biology 13(1):59–63.
Andersen, R. A. & Buneo, C. A. (2002) Intentional maps in posterior parietal cortex. Annual Review of Neuroscience 25:189–220.
Andersen, R. A., Snyder, L. H., Li, C. S. & Stricanne, B. (1993) Coordinate transformations in the representation of spatial information. Current Opinion in Neurobiology 3(2):171–76.
Anderson, C. H. & Van Essen, D. C. (1987) Shifter circuits: A computational strategy for dynamic aspects of visual processing. Proceedings of the National Academy of Sciences USA 84(17):6297–301.
Anderson, C. H., Van Essen, D. C. & Gallant, J. L. (1990) Blur into focus. Nature 343(6257):419–20.
Anstis, S. M., Smith, D. R. & Mather, G. (2000) Luminance processing in apparent motion, Vernier offset and stereoscopic depth. Vision Research 40(6):657–75.
Arbib, M. A. (1972) The metaphorical brain: An introduction to cybernetics as artificial intelligence and brain theory. Wiley Interscience.
Baldo, M. V. & Caticha, N. (2005) Computational neurobiology of the flash-lag effect. Vision Research 45(20):2620–30.
Barlow, H. B. (1953) Summation and inhibition in the frog's retina. Journal of Physiology 119:69–88.
Barlow, H. B. (1961a) Possible principles underlying the transformations of sensory messages. In: Sensory communication, ed. Rosenblith, W. A. Wiley.
Barlow, H. B. (1979) Reconstructing the visual image in space and time. Nature 279(5710):189–90.
Barlow, H. B. (1981) The Ferrier Lecture, 1980. Critical limiting factors in the design of the eye and visual cortex. Proceedings of the Royal Society of London B: Biological Sciences 212(1186):1–34.
Batista, A. P., Buneo, C. A., Snyder, L. H. & Andersen, R. A. (1999) Reach plans in eye-centered coordinates. Science 285(5425):257–60.
Berry, M. J., Brivanlou, I. H., Jordan, T. A. & Meister, M. (1999) Anticipation of moving stimuli by the retina. Nature 398(6725):334–38.
Bialek, W., Rieke, F., de Ruyter van Steveninck, R. R. & Warland, D. (1991) Reading a neural code. Science 252(5014):1854–57.
Blakemore, S. J., Wolpert, D. M. & Frith, C. D. (2002) Abnormalities in the awareness of action. Trends in Cognitive Sciences 6(6):237–42.
Bootsma, R. J. & Oudejans, R. R. (1993) Visual information about time-to-collision between two objects. Journal of Experimental Psychology: Human Perception and Performance 19(5):1041–52.
Brenner, E. & Smeets, J. B. (2000) Motion extrapolation is not responsible for the flash-lag effect. Vision Research 40(13):1645–48.
Brenner, E., Smeets, J. B. J. & van den Berg, A. V. (2001) Smooth eye movements and spatial localization. Vision Research 41:2253–59.
Bringuier, V., Chavane, F., Glaeser, L. & Fregnac, Y. (1999) Horizontal propagation of visual activity in the synaptic integration field of area 17 neurons. Science 283(5402):695–99.
Bullock, D. (2003) Motoneuron recruitment. In: The handbook of brain theory and neural networks, ed. Arbib, M. A. MIT Press.
Burr, D. C. (1980) Motion smear. Nature 284(5752):164–65.
Burr, D. C. & Morgan, M. J. (1997) Motion deblurring in human vision. Proceedings of the Royal Society of London B: Biological Sciences 264(1380):431–36.
Burr, D. C. & Ross, J. (1979) How does binocular delay give information about depth? Vision Research 19(5):523–32.
Cavanagh, P. (1997) Predicting the present. Nature 386(6620):19, 21.
Changizi, M. A., Hsieh, A., Nijhawan, R., Kanai, R. & Shimojo, S. (in press) Perceiving-the-present and a systematization of illusions. Cognitive Science.
Crick, F. & Koch, C. (1995) Are we aware of neural activity in primary visual cortex? Nature 375(6527):121–23.
Cudeiro, J. & Sillito, A. M. (2006) Looking back: Corticothalamic feedback and early visual processing. Trends in Neurosciences 29(6):298–306.
Cynader, M. & Berman, N. (1972) Receptive-field organization of monkey superior colliculus. Journal of Neurophysiology 35(2):187–201.
Davidson, D. (1970) Mental events. In: The nature of mind, ed. Rosenthal, D. Oxford University Press.
De Valois, R. L. & Cottaris, N. P. (1998) Inputs to directionally selective simple cells in macaque striate cortex. Proceedings of the National Academy of Sciences USA 95(24):14488–93.
De Valois, R. L. & De Valois, K. K. (1991) Vernier acuity with stationary moving Gabors. Vision Research 31(9):1619–26.
Dean, P., Redgrave, P. & Westby, G. W. (1989) Event or emergency? Two response systems in the mammalian superior colliculus. Trends in Neurosciences 12(4):137–47.
Dennett, D. C. & Kinsbourne, M. (1992) Time and the observer: The where and when of consciousness in the brain. Behavioral and Brain Sciences 15:183–247.
Desimone, R. (1998) Visual attention mediated by biased competition in extrastriate visual cortex. Philosophical Transactions of the Royal Society of London B: Biological Sciences 353(1373):1245–55.
Desimone, R. & Duncan, J. (1995) Neural mechanisms of selective visual attention. Annual Review of Neuroscience 18:193–222.
DeYoe, E. A. & Van Essen, D. C. (1988) Concurrent processing streams in monkey visual cortex. Trends in Neurosciences 11(5):219–26.
DiCarlo, J. J. & Maunsell, J. H. (2005) Using neuronal latency to determine sensory-motor processing pathways in reaction time tasks. Journal of Neurophysiology 93(5):2974–86.
Diedrichsen, J., Verstynen, T., Hon, A., Lehman, S. L. & Ivry, R. B. (2003) Anticipatory adjustments in the unloading task: Is an efference copy necessary for learning? Experimental Brain Research 148(2):272–76.
Dowling, J. E. (1979) Information processing by local circuits: The vertebrate retina as a model system. In: The neurosciences: Fourth study program, ed. Schmitt, F. O. & Worden, F. G. MIT Press.
Dreher, B., Fukada, Y. & Rodieck, R. W. (1976) Identification, classification and anatomical segregation of cells with X-like and Y-like properties in the lateral geniculate nucleus of old-world primates. Journal of Physiology 258(2):433–52.
DuBois, R. M. & Cohen, M. S. (2000) Spatiotopic organization in human superior colliculus observed with fMRI. Neuroimage 12(1):63–70.
Duhamel, J.-R., Colby, C. L. & Goldberg, M. E. (1992) The updating of the representation of visual space in parietal cortex by intended eye movements. Science 255:90–92.
Eagleman, D. M. & Sejnowski, T. J. (2000) Motion integration and postdiction in visual awareness. Science 287(5460):2036–38.
Engel, A. K., Fries, P. & Singer, W. (2001) Dynamic predictions: Oscillations and synchrony in top-down processing. Nature Reviews Neuroscience 2(10):704–16.
Erlhagen, W. (2003) Internal models for visual perception. Biological Cybernetics 88(5):409–17.
Eskandar, E. N. & Assad, J. A. (1999) Dissociation of visual, motor and predictive signals in parietal cortex during visual guidance. Nature Neuroscience 2(1):88–93.
Fein, A. & Szuts, E. Z. (1982) Photoreceptors: Their role in vision. Cambridge University Press.
Felleman, D. J. & Van Essen, D. C. (1991) Distributed hierarchical processing in the primate cerebral cortex. Cerebral Cortex 1(1):1–47.
Franz, V. H., Gegenfurtner, K. R., Bulthoff, H. H. & Fahle, M. (2000) Grasping visual illusions: No evidence for a dissociation between perception and action. Psychological Science 11(1):20–25.
Freyd, J. J. (1987) Dynamic mental representations. Psychological Review 94(4):427–38.
Fu, Y. X., Shen, Y. & Dan, Y. (2001) Motion-induced perceptual extrapolation of blurred visual targets. Journal of Neuroscience 21(20):RC172.
Gegenfurtner, K. (1999) Neurobiology. The eyes have it! Nature 398(6725):291–92.
Georgopoulos, A. P. (1986) On reaching. Annual Review of Neuroscience 9:147–70.
Ghez, C. & Krakauer, J. (2000) The organization of movement. In: Principles of neural science, ed. Kandel, E. R., Schwartz, J. H. & Jessell, T. M. McGraw Hill.
Gibson, J. J. (1961) Ecological optics. Vision Research 1:253–62.
Goel, A., Jiang, B., Xu, L. W., Song, L., Kirkwood, A. & Lee, H. K. (2006) Cross-modal regulation of synaptic AMPA receptors in primary sensory cortices by visual experience. Nature Neuroscience 9(8):1001–1003.
Goodale, M. A. & Milner, A. D. (1992) Separate visual pathways for perception and action. Trends in Neurosciences 15(1):20–25.
Goodale, M. A., Pelisson, D. & Prablanc, C. (1986) Large adjustments in visually guided reaching do not depend on vision of the hand or perception of target displacement. Nature 320(6064):748–50.
Grzywacz, N. M. & Amthor, F. R. (1993) Facilitation in ON-OFF directionally selective ganglion cells of the rabbit retina. Journal of Neurophysiology 69(6):2188–99.
Harris, C. S. (1963) Adaptation to displaced vision: Visual, motor, or proprioceptive change? Science 140:812–13.
Harris, C. S. (1980) Insight or out of sight? Two examples of perceptual plasticity in the human adult. In: Visual coding and adaptability, ed. Harris, C. S. Erlbaum.
Hazelhoff, F. & Wiersma, H. (1924) Die Wahrnehmungszeit I. Zeitschrift für Psychologie 96:171–88.
He, S., Cavanagh, P. & Intriligator, J. (1996) Attentional resolution and the locus of visual awareness. Nature 383(6598):334–37.
Held, R. & Freedman, S. J. (1963) Plasticity in human sensorimotor control. Science 142:455–62.
Hess, E. H. (1956) Space perception in the chick. Scientific American 195:71–80.
Hubel, D. H. & Wiesel, T. N. (1962) Receptive fields, binocular interaction and functional architecture in the cat's visual cortex. Journal of Physiology 160:106–54.
Hupe, J. M., James, A. C., Payne, B. R., Lomber, S. G., Girard, P. & Bullier, J. (1998) Cortical feedback improves discrimination between figure and background by V1, V2 and V3 neurons. Nature 394(6695):784–87.
Jancke, D., Erlhagen, W., Dinse, H. R., Akhavan, A. C., Giese, M., Steinhage, A. & Schoner, G. (1999) Parametric population representation of retinal location: Neuronal interaction dynamics in cat primary visual cortex. Journal of Neuroscience 19(20):9016–28.
Jeannerod, M., Kennedy, H. & Magnin, M. (1979) Corollary discharge: Its possible implications in visual and oculomotor interactions. Neuropsychologia 17(2):241–58.
Johansson, R. S. & Westling, G. (1988) Programmed and triggered actions to rapid load changes during precision grip. Experimental Brain Research 71(1):72–86.
Kanai, R., Sheth, B. R. & Shimojo, S. (2004) Stopping the motion and sleuthing the flash-lag effect: Spatial uncertainty is the key to perceptual mislocalization. Vision Research 44(22):2605–19.
Kandel, E. R. & Wurtz, R. H. (2000) Constructing the visual image. In: Principles of neural science, ed. Kandel, E. R., Schwartz, J. H. & Jessell, T. M. McGraw Hill.
Kaplan, E. & Shapley, R. M. (1982) X and Y cells in the lateral geniculate nucleus of macaque monkeys. Journal of Physiology 330:125–43.
Kawato, M. (1999) Internal models for motor control and trajectory planning. Current Opinion in Neurobiology 9(6):718–27.
Kawato, M., Furukawa, K. & Suzuki, R. (1987) A hierarchical neural-network model for control and learning of voluntary movement. Biological Cybernetics 57(3):169–85.
Keysers, C. & Perrett, D. I. (2002) Visual masking and RSVP reveal neural competition. Trends in Cognitive Sciences 6(3):120–25.
Khurana, B., Carter, R. M., Watanabe, K. & Nijhawan, R. (2006) Flash-lag chimeras: The role of perceived alignment in the composite face effect. Vision Research 46(17):2757–72.
Khurana, B. & Nijhawan, R. (1995) Extrapolation or attention shift? Reply to Baldo and Klein. Nature 378:565–66.
Khurana, B., Watanabe, K. & Nijhawan, R. (2000) The role of attention in motion extrapolation: Are moving objects “corrected” or flashed objects attentionally delayed? Perception 29(6):675–92.
Kirschfeld, K. (1983) Are photoreceptors optimal? Trends in Neurosciences 6:97–101.
Kirschfeld, K. & Kammer, T. (1999) The Fröhlich effect: A consequence of the interaction of visual focal attention and metacontrast. Vision Research 39(22):3702–709.
Krekelberg, B. & Lappe, M. (2001) Neuronal latencies and the position of moving objects. Trends in Neurosciences 24:335–39.
Lacquaniti, F. & Maioli, C. (1989) The role of preparation in tuning anticipatory and reflex responses during catching. Journal of Neuroscience 9(1):134–48.
Lamarre, Y., Busby, L. & Spidalieri, G. (1983) Fast ballistic arm movements triggered by visual, auditory, and somesthetic stimuli in the monkey. I. Activity of precentral cortical neurons. Journal of Neurophysiology 50(6):1343–58.
Lamme, V. A. & Roelfsema, P. R. (2000) The distinct modes of vision offered by feedforward and recurrent processing. Trends in Neurosciences 23(11):571–79.
Land, M. F. & McLeod, P. (2000) From eye movements to actions: How batsmen hit the ball. Nature Neuroscience 3(12):1340–45.
Lee, C., Rohrer, W. H. & Sparks, D. L. (1988) Population coding of saccadic eye movements by neurons in the superior colliculus. Nature 332(6162):357–60.
Lee, D. N. (1976) A theory of visual control of braking based on information about time-to-collision. Perception 5(4):437–59.
Lee, D. N. & Reddish, P. E. (1981) Plummeting gannets: A paradigm of ecological optics. Nature 293:293–94.
Liberman, A. M. & Mattingly, I. G. (1985) The motor theory of speech perception revised. Cognition 21(1):1–36.
Macknik, S. L., Martinez-Conde, S. & Haglund, M. M. (2000) The role of spatiotemporal edges in visibility and visual masking. Proceedings of the National Academy of Sciences USA 97(13):7556–60.
Marr, D. (1982) Vision. W. H. Freeman.
Mateeff, S. & Hohnsbein, J. (1988) Perceptual latencies are shorter for motion towards the fovea than for motion away. Vision Research 28(6):711–19.
Maunsell, J. H. & Gibson, J. R. (1992) Visual response latencies in striate cortex of the macaque monkey. Journal of Neurophysiology 68(4):1332–44.
Maus, G. W. & Nijhawan, R. (2006) Forward displacements of fading objects in motion: The role of transient signals in perceiving position. Vision Research 46(26):4375–81.
Mehta, B. & Schaal, S. (2002) Forward models in visuomotor control. Journal of Neurophysiology 88(2):942–53.
Merfeld, D. M., Zupan, L. & Peterka, R. J. (1999) Humans use internal models to estimate gravity and linear acceleration. Nature 398(6728):615–18.
Metzger, W. (1932) Versuch einer gemeinsamen Theorie der Phänomene Fröhlichs und Hazelhoffs und Kritik ihrer Verfahren zur Messung der Empfindungszeit. Psychologische Forschung 16:176–200.
Meyer, D. E., Osman, A. M., Irwin, D. E. & Yantis, S. (1988) Modern mental chronometry. Biological Psychology 26(1–3):3–67.
Milner, A. D. & Goodale, M. A. (1995) The visual brain in action. Oxford University Press.
Mishkin, M. & Ungerleider, L. G. (1983) Object vision and spatial vision: Two cortical pathways. Trends in Neurosciences 6:414–35.
Morgan, M. J. & Thompson, P. (1975) Apparent motion and the Pulfrich effect. Perception 4(1):3–18.
Müsseler, J. & Prinz, W. (1996) Action planning during the presentation of stimulus sequences: Effects of compatible and incompatible stimuli. Psychological Research 59(1):48–63.
Nakamura, K., Matsumoto, K., Mikami, A. & Kubota, K. (1994) Visual response properties of single neurons in the temporal pole of behaving monkeys. Journal of Neurophysiology 71(3):1206–21.
Nijhawan, R. (1992) Misalignment of contours through the interaction of the apparent and real motion systems. Investigative Ophthalmology and Visual Science 33(Suppl. 4):1415.
Nijhawan, R. (1994) Motion extrapolation in catching. Nature 370(6487):256–57.
Nijhawan, R. (1997) Visual decomposition of colour through motion extrapolation. Nature 386(6620):66–69.
Nijhawan, R. (2001) The flash-lag phenomenon: Object-motion and eye-movements. Perception 30:263–82.
Nijhawan, R. (2002) Neural delays, visual motion and the flash-lag effect. Trends in Cognitive Sciences 6:387–93.
Nijhawan, R. & Kirschfeld, K. (2003) Analogous mechanisms compensate for neural delays in the sensory and the motor pathways: Evidence from motor flash-lag. Current Biology 13(9):749–53.
Parkinson, J. & Khurana, B. (2007) Temporal order of strokes primes letter recognition. Quarterly Journal of Experimental Psychology 60:1265–74.
Poritsky, R. (1969) Two- and three-dimensional ultrastructure of boutons and glial cells on the motoneuronal surface in the cat spinal cord. Journal of Comparative Neurology 135(4):423–52.
Port, N. L., Kruse, W., Lee, D. & Georgopoulos, A. P. (2001) Motor cortical activity during interception of moving targets. Journal of Cognitive Neuroscience 13(3):306–18.
Purushothaman, G., Patel, S. S., Bedell, H. E. & Öğmen, H. (1998) Moving ahead through differential visual latency. Nature 396(6710):424.
Raiguel, S. E., Lagae, L., Gulyas, B. & Orban, G. A. (1989) Response latencies of visual cells in macaque areas V1, V2 and V5. Brain Research 493(1):155–59.
Ramachandran, V. S. & Anstis, S. M. (1990) Illusory displacement of equiluminous kinetic edges. Perception 19(5):611–16.
Ramachandran, V. S., Rao, V. M. & Vidyasagar, T. R. (1974) Sharpness constancy during movement perception. Perception 3(1):97–98.
Ratliff, F. & Hartline, H. K. (1959) The responses of limulus optic nerve fibers to patterns of illumination on the receptor mosaic. Journal of General Physiology 42(6):1241–55.
Regan, D. (1992) Visual judgements and misjudgements in cricket, and the art of flight. Perception 21(1):91–115.
Rizzolatti, G., Fadiga, L., Fogassi, L. & Gallese, V. (1997) The space around us. Science 277(5323):190–91.
Rojas-Anya, H., Thirkettle, M. & Nijhawan, R. (2005) Flash-lag anisotropy for movement in three domains. Perception 34:219–20.
Roufs, J. A. J. (1963) Perception lag as a function of stimulus luminance. Vision Research 3:81–91.
Sarnat, H. B. & Netsky, M. G. (1981) Evolution of the nervous system. Oxford University Press.
Schiller, P. H. (1984) The superior colliculus and visual function. In: Handbook of physiology: The nervous system. Neurophysiology, ed. Darien-Smith, I. American Physiological Society.
Schiller, P. H. & Malpeli, J. G. (1978) Functional specificity of lateral geniculate nucleus laminae of the rhesus monkey. Journal of Neurophysiology 41(3):788–97.
Schlag, J. & Schlag-Rey, M. (2002) Through the eye slowly: Delays and localization errors in the visual system. Nature Reviews Neuroscience 3:191–200.
Schmolesky, M. T., Wang, Y., Hanes, D. P., Thompson, K. G., Leutgeb, S., Schall, J. D. & Leventhal, A. G. (1998) Signal timing across the macaque visual system. Journal of Neurophysiology 79(6):3272–78.
Shapley, R. M. & Victor, J. D. (1978) The effect of contrast on the transfer properties of cat retinal ganglion cells. Journal of Physiology 285:275–98.
Sheth, B. R., Nijhawan, R. & Shimojo, S. (2000) Changing objects lead briefly flashed ones. Nature Neuroscience 3(5):489–95.
Sillito, A. M., Jones, H. E., Gerstein, G. L. & West, D. C. (1994) Feature-linked synchronization of thalamic relay cell firing induced by feedback from the visual cortex. Nature 369(6480):479–82.
Snyder, L. (1999) This way up: Illusions and internal models in the vestibular system. Nature Neuroscience 2(5):396–98.
Sparks, D. L. & Jay, M. F. (1986) The functional organization of the primate superior colliculus: A motor perspective. Progress in Brain Research 64:235–41.
Sparks, D. L., Lee, C. & Rohrer, W. H. (1990) Population coding of the direction, amplitude, and velocity of saccadic eye movements by neurons in the superior colliculus. Cold Spring Harbor Symposium on Quantitative Biology 55:805–11.
Sperry, R. W. (1950) Neural basis of the spontaneous optokinetic response produced by visual inversion. Journal of Comparative and Physiological Psychology 43:482–89.
Sperry, R. W. (1952) Neurology and the mind-brain problem. American Scientist 40:291–312.
Sterzer, P. & Kleinschmidt, A. (2007) A neural basis for inference in perceptual ambiguity. Proceedings of the National Academy of Sciences USA 104(1):323–28.
Stoerig, P. & Cowey, A. (1997) Blindsight in man and monkey. Brain 120(Part 3):535–59.
Stratton, G. M. (1896) Some preliminary experiments on vision without inversion of the retinal image. Psychological Review 3:611–17.
Stürmer, B., Aschersleben, G. & Prinz, W. (2000) Correspondence effects with manual gestures and postures: A study of imitation. Journal of Experimental Psychology: Human Perception and Performance 26(6):1746–59.
Sun, H. & Frost, B. J. (1998) Computation of different optical variables of looming objects in pigeon nucleus rotundus neurons. Nature Neuroscience 1(4):296–303.
Sundberg, K. A., Fallah, M. & Reynolds, J. H. (2006) A motion-dependent distortion of retinotopy in area V4. Neuron 49(3):447–57.
Taira, M., Mine, S., Georgopoulos, A. P., Murata, A. & Sakata, H. (1990) Parietal cortex neurons of the monkey related to the visual guidance of hand movement. Experimental Brain Research 83(1):29–36.
Tessier-Lavigne, M. (2000) Visual processing by the retina. In: Principles of neural science, ed. Kandel, E. R., Schwartz, J. H. & Jessell, T. M. McGraw Hill.
Thier, P. & Ilg, U. J. (2005) The neural basis of smooth-pursuit eye movements. Current Opinion in Neurobiology 15(6):645–52.
Tootell, R. B., Silverman, M. S., Switkes, E. & De Valois, R. L. (1982) Deoxyglucose analysis of retinotopic organization in primate striate cortex. Science 218(4575):902–904.
Tresilian, J. R. (1993) Four questions of time to contact: A critical examination of research on interceptive timing. Perception 22(6):653–80.
Tresilian, J. R. (1999) Visually timed action: Time-out for “tau”? Trends in Cognitive Sciences 3(8):301–10.
Treue, S. (2003) Climbing the cortical ladder from sensation to perception. Trends in Cognitive Sciences 7(11):469–71.
van de Grind, W. (2002) Physical, neural, and mental timing. Consciousness and Cognition 11(2):241–64; discussion 308–13.
von Holst, E. & Mittelstaedt, H. (1950) Das Reafferenzprinzip. Naturwissenschaften 37:464–76.
Wagner, H. (1982) Flow-field variables trigger landing in flies. Nature 297:147–48.
Wald, G. (1968) The molecular basis of visual excitation. Nature 219(5156):800–807.
Walls, G. L. (1942) The vertebrate eye and its adaptive radiation. Cranbrook Press.
Wang, Y. & Frost, B. J. (1992) Time to collision is signalled by neurons in the nucleus rotundus of pigeons. Nature 356(6366):236–38.
Warren, R. M. & Warren, R. P. (1968) Helmholtz on perception: Its physiology and development. Wiley.
Weiskrantz, L. (1996) Blindsight revisited. Current Opinion in Neurobiology 6(2):215–20.
Westheimer, G. & McKee, S. P. (1977) Perception of temporal order in adjacent visual stimuli. Vision Research 17(8):887–92.
Whitaker, D., Pearson, S., McGraw, P. V. & Banford, M. (1998) Keeping a step ahead of moving objects. Investigative Ophthalmology and Visual Science 39(Suppl):S1078.
Whitney, D. & Murakami, I. (1998) Latency difference, not spatial extrapolation. Nature Neuroscience 1(8):656–57.
Williams, H. & Nottebohm, F. (1985) Auditory responses in avian vocal motor neurons: A motor theory for song perception in birds. Science 229(4710):279–82.
Williams, J. M. & Lit, A. (1983) Luminance-dependent visual latency for the Hess effect, the Pulfrich effect, and simple reaction time. Vision Research 23(2):171–79.
Williams, Z. M., Elfar, J. C., Eskandar, E. N., Toth, L. J. & Assad, J. A. (2003) Parietal activity and the perceived direction of ambiguous apparent motion. Nature Neuroscience 6(6):616–23.
Wilson, M. & Knoblich, G. (2005) The case for motor involvement in perceiving conspecifics. Psychological Bulletin 131(3):460–73.
Witten, I. B., Bergan, J. F. & Knudsen, E. I. (2006) Dynamic shifts in the owl's auditory space map predict moving sound location. Nature Neuroscience 9(11):1439–45.
Wolpert, D. M. & Flanagan, J. R. (2001) Motor prediction. Current Biology 11(18):R729–32.
Wolpert, D. M., Ghahramani, Z. & Jordan, M. I. (1995) An internal model for sensorimotor integration. Science 269(5232):1880–82.
Wolpert, D. M., Miall, R. C. & Kawato, M. (1998) Internal models in the cerebellum. Trends in Cognitive Sciences 2:338–47.
Woodworth, R. S. (1899) The accuracy of voluntary movement. Psychological Review 3(2), Whole No. 13.
Woodworth, R. S. & Schlosberg, H. (1954) Experimental psychology. Methuen.
Zanker, J. M., Quenzer, T. & Fahle, M. (2001) Perceptual deformation induced by visual motion. Naturwissenschaften 88(3):129–32.

Figure 1. (a) The retina is shown as a one-dimensional coordinate system mapped topographically onto the cortex. For simplicity, the thalamic stage is not included. A moving ball's image travels from left to right at constant velocity. It is assumed that the neural delay between the retina and the cortex equals the time the ball takes to move from retinal coordinate x−1 to x0. At the instant depicted, the cortical coordinate showing the greatest activity is x′−1, which corresponds to retinal coordinate x−1, while the ball's actual retinal coordinate at that instant is x0. (b) A batsman in cricket views a ball heading toward him after bouncing some distance in front of him. The batsman's perception is assumed to depend on the cortical activity triggered by the moving ball. At the instant depicted, the physical position of the ball (filled circle) is ahead of the ball's perceived position (unfilled circle). The lag is proportional to the neural delay and the object's velocity, which for a cricket ball can exceed 90 mph. For an assumed delay of 100 msec, the difference between the ball's perceived position and its actual position should be 13.2 ft. Adapted from Land and McLeod (2000).
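The caption's arithmetic (spatial lag = speed × neural delay) can be checked directly, using the 90 mph and 100 msec figures quoted above:

```python
# Spatial lag between a moving ball's physical and registered positions:
# lag = speed * neural delay.
MPH_TO_FT_PER_S = 5280.0 / 3600.0  # 1 mph = 5280 ft per 3600 s

def spatial_lag_ft(speed_mph, delay_s):
    """Distance (ft) the ball travels during the neural delay."""
    return speed_mph * MPH_TO_FT_PER_S * delay_s

# 90 mph delivery, 100 msec visual delay:
print(round(spatial_lag_ft(90.0, 0.100), 1))  # 13.2
```

This confirms the 13.2 ft figure: a 90 mph ball covers 132 ft/s, so a 100 msec delay corresponds to 13.2 ft of travel.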


Figure 2. (a) In a dark room, a single physical rod, made of three segments, rotates clockwise in the direction of the arrow. The segment labeled “C” is illuminated with a continuous light source, while the discrete segments labeled “D” are illuminated with a brief flash produced by a stroboscope. (b) The percept of the observers. Adapted from Nijhawan (1992).

Figure 3. Representation of the “standard” view with space–time plots of an object moving at constant velocity v (thick line) and how a neuron “sees” the object with some delay (thin line). A brief flash (filled gray square) presented in position x0 at time t0 is “seen” (outline square) by the neuron in position x0 at time t0 + Δtp, where Δtp is the input delay to perception. Two events, the arrival of the moving object in position x0 (filled gray circle on the x=x0 line) and the “seeing” of this event by the neuron (outline circle on the x=x0 line), occur at different times because of the neural latency between the event and its perception. At a given time (say t0), the physical position of the moving object spatially leads (filled gray circle on the t=t0 line) the position in which the neurons “see” the object (outline circle on the t=t0 line). The spatial gap between the physical position and the neurally represented position is referred to as the ds-error. The standard view suggests that the perceived object travels on the ds-error line. Adapted from Krekelberg and Lappe (2001).

Figure 4. A revision of the standard view is forced by the flash-lag effect. This figure shows a new trajectory, the “reduced ds-error” line, which is parallel to the ds-error (and the physical) line and passes through the point at which the moving item is seen in the flash-lag effect. The distance along the x (space) axis between the reduced ds-error line and the ds-error line is Δσ, where Δσ is the experimentally determined flash-lag effect (see, e.g., Nijhawan 1994). The standard view needs revision because, on this view, the moving object is seen on the ds-error line throughout, and in particular at t0 – the time of the flash (open circle). The flash-lag effect shows that the moving object is seen ahead of the flash by distance Δσ at t0 + Δtp (filled black circle). Thus, for both the standard view to be true and the flash-lag effect to occur, the moving object would have to appear to speed up (corresponding to a segment of the space–time trajectory shown as the thin broken line of different slope). Because the moving object is assumed to travel at constant velocity, the standard view is untenable.
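The relation among the physical, ds-error, and reduced ds-error lines can be checked with a small numerical sketch; all function names and parameter values here are illustrative, with the flash-lag offset Δσ treated as a free parameter:

```python
# Space-time positions under the standard view vs. the flash-lag result.
# ds-error line: the percept lags the physical position by v * delay.
# reduced ds-error line: shifted toward the physical line by the
# flash-lag offset dsigma (Δσ). All values below are illustrative.

def physical_position(t, v):
    return v * t  # object moving at constant velocity v

def seen_standard(t, v, delay_s):
    # standard view: percept lags the physical position by v * delay_s
    return physical_position(t, v) - v * delay_s

def seen_flash_lag(t, v, delay_s, dsigma):
    # reduced ds-error line: percept leads the standard line by dsigma
    return seen_standard(t, v, delay_s) + dsigma

v, delay, dsigma = 10.0, 0.1, 0.6  # arbitrary units
t0 = 1.0
gap_standard = physical_position(t0, v) - seen_standard(t0, v, delay)            # v * delay
gap_flash_lag = physical_position(t0, v) - seen_flash_lag(t0, v, delay, dsigma)  # v * delay - dsigma
print(gap_standard, gap_flash_lag)
```

The sketch shows that the reduced ds-error line shrinks the percept's spatial gap from v·Δtp to v·Δtp − Δσ while both lines keep the same slope, which is why a constant-velocity percept cannot lie on both.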

Figure 5. The pathways in the retina depicting the vertical (distal–proximal) and the horizontal pathways of information flow. Adapted from Dowling (1979) and Tessier-Lavigne (2000).

Figure 6. A hypothetical model based on a network with excitatory and inhibitory interactions. This network involves the layer 5 pyramidal cells in V1. The vertical processes of the pyramidal cells are the apical dendrites extending into layer 4. These dendrites receive horizontal connections from both the faster transient (Y) cells and the slower sustained (X) cells located in different sub-layers of layer 4. A moving ball's current retinal position is shown at the top of the figure. The resulting neural representations in layer 4 X and Y maps are “misaligned” with respect to the retinal representation and with respect to each other (due to the different speeds of the sustained and transient channels). The excitatory and inhibitory horizontal connections cause the leftmost pyramidal cell to be excited, the rightmost pyramidal cell to be inhibited, and the middle cell's activity to remain unchanged (because it receives both excitatory and inhibitory inputs). Thus, the leftmost pyramidal cell behaves as if it is representing the “current” position of the ball. Adapted from Barlow (1981).

Figure 7. On the spatial extrapolation view, the latency for the moving object and the flash, in the flash-initiated condition, is Δti (input delay). The figure shows how this model applies when the motion trajectory before the flash is missing, as in the flash-initiated condition. The reduced ds-error line stops before intersecting the x=x0 line as the flash-lag effect occurs in the flash-initiated condition. The extrapolation model brings the moving object to the correct position on the reduced ds-error line by two processes: sampling of motion and spatial extrapolation. These two processes correspond to the thin broken line consisting of two segments of different slopes, enlarged in inset “a”. The first segment, parallel to the space–time plot of physical motion, depicts a motion sample taken by neighboring neurons in a retinotopic map. The height at which this segment intersects the x=x0 line is arbitrary and will depend on whether an early or a late sampling is assumed. The second segment of greater slope represents the travel of neural signals along the horizontal visual pathway, which can occur at higher speed. Inset “b” is made of three segments of different slopes. It depicts a situation of “over-extrapolation” that temporarily brings the extrapolated object closer to the thick continuous (physical) line. The object is brought back to the reduced ds-error line, as neural processes receiving the over-extrapolated input are presumed not capable of compensating for delays (see vertical segment in inset “b”, and see text).
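The two-segment trajectory in inset “a” can be sketched as a piecewise function of time; the velocity, sampling duration, and speed-up factor k below are made-up illustrative values, not parameters from the model:

```python
# Two-segment "extrapolation" trajectory from the caption (illustrative).
# Segment 1 samples motion at the physical velocity v; segment 2 travels
# along the horizontal pathway at a higher speed k * v (the segment of
# greater slope in the space-time plot). All parameter values are made up.

def extrapolated_position(t, v, t_sample, k):
    """Position of the internal representation at time t (t >= 0)."""
    if t <= t_sample:
        return v * t  # motion-sampling segment, slope v
    # extrapolation segment, slope k * v
    return v * t_sample + k * v * (t - t_sample)

v, t_sample, k = 10.0, 0.05, 4.0  # arbitrary units
print(extrapolated_position(0.05, v, t_sample, k))  # end of motion sample: 0.5
print(extrapolated_position(0.10, v, t_sample, k))  # after fast extrapolation: 2.5
```

After the sampling interval, the representation advances faster than the stimulus, which is how the model brings the moving object onto the reduced ds-error line despite the missing pre-flash trajectory.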

Figure 8. In this space–time plot, the moving object stops at position x0 at time t0, simultaneously with the presentation of the flash presented in position x0. The fully delayed ds-error line (from the standard view) stops at the x0 line, which is consistent with the result of “no effect” in the flash-terminated condition. However, spatial extrapolation can also explain this result. On this view, the cortical representation of the moving object (RC), which is only partially maintained by retinal input, corresponds to the reduced ds-error (thick broken) line. Once the physical object is switched off, RC quickly decays (shown by lighter thick broken line) as retinal input stops. The switching off of the moving object triggers transient signals, which create a strong thalamic representation (RT). Because of biased competition, and because RT is newer input, RT wins and dominates RC and the percept, overwhelming any predictive-overshoot of the moving object.

Figure 9. (a) The gray background is used only for illustration; the actual experiment was performed with a homogenous dark background. A neutral density filter, which decreased in transmission from 100% (at the 6 o'clock position) to 0.01% in the counterclockwise direction, was used. Thus, a moving stimulus (a white dot of fixed luminance) behind the filter appeared to vary in luminance; it was always visible in the 6 o'clock position and never visible past about the 9 o'clock position. In condition I, on different trials, the dot moved back and forth on a short trajectory behind the filter. The average position of the short-trajectory dot was changed from trial to trial (the figure depicts a sequence of trials 1, 2, and 3). A reference line (white radial line segment of shorter length next to the filter's edge) was presented on each trial. Participants said “yes” or “no” depending on whether or not they saw the dot at the position of the reference line. Detection thresholds were determined (ThresholdposI, schematically depicted by the longer line segment). At ThresholdposI, “yes” and “no” responses were equally likely. (b) In condition II, the same white dot moved on a long trajectory from the 6 o'clock position into the filter's denser part until it disappeared, again around the 9 o'clock position. A reference line was presented (white radial line segment of shorter length). For this task, participants said “ahead” or “behind” depending on whether they perceived the dot as disappearing before it reached the reference line or after it had passed it. From trial to trial, the position of the reference line was changed to determine the disappearance threshold (ThresholdposII, schematically depicted by the longer segment of radial line). At ThresholdposII, “ahead” and “behind” responses were equally likely. Our main finding was that (ThresholdposII − ThresholdposI)/v = 175 msec, where v is dot velocity. Adapted from Maus and Nijhawan (2006).
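The final computation in the caption divides a spatial threshold difference by dot velocity to obtain a time; a minimal sketch, with illustrative velocity and threshold values chosen only so the result reproduces the reported 175 msec figure:

```python
# Convert the spatial offset between the two thresholds into a time:
# (Threshold_posII - Threshold_posI) / v, where v is dot velocity.
# The velocity and threshold positions below are hypothetical values,
# chosen so the result matches the reported 175 msec.

def threshold_offset_ms(thresh_pos_II: float, thresh_pos_I: float,
                        v_deg_per_s: float) -> float:
    """Spatial threshold difference (deg) expressed as time (msec)."""
    return (thresh_pos_II - thresh_pos_I) / v_deg_per_s * 1000.0

# e.g., thresholds 1.75 deg apart at v = 10 deg/s -> 175 msec
print(threshold_offset_ms(3.5, 1.75, 10.0))  # 175.0
```

Dividing by velocity expresses the spatial advance of the disappearance threshold as the equivalent duration of motion, which is what allows the result to be read as a compensation time.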