Leading theories of visual search postulate that search targets are found by deploying attention sequentially to individual objects (items). Hulleman & Olivers (H&O) reject such serial item-based accounts and propose an alternative in which fixations replace items as the conceptual unit of visual search. In their nascent computational model, individual search episodes start once the eyes have reached a new fixation location. Parallel processing of all items within a functional viewing field (FVF) results in a decision about target presence/absence. If no target is found, the eyes move to a different location, and a new search episode commences. This model performs remarkably well in simulating search slopes and the variability of search performance across different types of search tasks. However, questions remain about the mechanisms proposed for localizing targets and discriminating them from irrelevant objects during individual fixations. For example, because fixation duration is fixed at 250 ms and the visual slate is wiped clean with each new eye movement, the decision about the presence of a target within the FVF has to be made within this brief time window. Results from attentional dwell time and attentional blink experiments suggest that target identification processes may require at least 300–500 ms, and may therefore extend in time beyond individual fixation periods.
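The fixation-episode loop described above can be sketched in a few lines of code. The following is an illustrative simulation only, not H&O's actual implementation: the unit-square display, random selection of fixation locations, FVF radius, and stopping rule are all assumptions made for the sketch. It shows how a constant 250-ms fixation cost combined with parallel processing of everything inside the FVF produces longer search times for larger displays.

```python
import math
import random


def simulate_search(n_items, fvf_radius=0.25, fixation_ms=250,
                    target_present=True, max_fixations=100):
    """Illustrative fixation-based search loop (hypothetical parameters).

    Returns (elapsed_ms, target_found). Item 0 is the target when present.
    """
    # Items scattered at random positions in a unit square.
    items = [(random.random(), random.random()) for _ in range(n_items)]
    elapsed = 0
    inspected = set()
    for _ in range(max_fixations):
        # A new search episode begins at a new fixation location.
        fx, fy = random.random(), random.random()
        elapsed += fixation_ms  # constant fixation duration (250 ms)
        # Parallel check of all items falling inside the FVF.
        in_fvf = [i for i, (x, y) in enumerate(items)
                  if math.hypot(x - fx, y - fy) <= fvf_radius]
        inspected.update(in_fvf)
        if target_present and 0 in in_fvf:
            return elapsed, True   # target detected during this episode
        if len(inspected) == n_items:
            return elapsed, False  # whole display covered: respond "absent"
    return elapsed, False
```

Because each episode costs a fixed 250 ms and the FVF covers only part of the display, more items (or a smaller FVF, as in harder search) require more fixations on average, which is the source of the simulated search slopes.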
At a more fundamental level, it is difficult to see how objects can be replaced as conceptual units in visual search, given that the visual world is made up of objects, and finding a particular target object is the goal of a typical search task. H&O claim that processing within a fixation period is not item-based, because “all items are in principle selected and processed simultaneously” (sect. 6.3) by mechanisms that compute global area activations and pooled summary statistics across the FVF. This is plausible for easy search tasks where targets can be found on the basis of local feature discontinuities (singleton detection), and also for non-search tasks that require the rapid extraction of the gist of a scene. What remains unclear is whether such global area-based mechanisms can detect the presence or absence of targets even in moderately difficult search tasks where no diagnostic low-level saliency signals are available and distractors share features with the target. Furthermore, the spatially non-selective group-based account proposed by H&O seems at odds with neuroscientific insights into the control of visual search. During search for targets with known features, biases of visual processing towards target-matching objects emerge rapidly within the first 200 ms after the presentation of a search display, even outside of the current attentional focus (e.g., Bichot et al. 2005). These biases are elicited in a spatially specific fashion in retinotopic visual areas that match the location of possible target objects.
They can initially be triggered at multiple locations across the visual field, but gradually become more spatially focused, and may eventually result in the selective activation of one particular object representation (see Eimer 2014; 2015, for a more detailed discussion, and Duncan 2006, for related ideas on object-based integrated competition mechanisms in visual search). The important point here is that such task-dependent attentional biases of visual processing emerge in spatial visual maps that represent candidate target objects at particular locations. In this fundamental sense, attentional selection mechanisms and their neural basis remain irreducibly item-based. Crucially, this type of item-based selectivity does not imply serial selection. Spatially selective processing biases for target-matching objects can emerge in parallel across the visual field (e.g., Bichot et al. 2005; Saenz et al. 2002), and multiple target objects at different locations can be selected simultaneously and independently (e.g., Eimer & Grubert 2014).
Within the framework proposed by H&O, it may be useful to distinguish between the guidance of spatial attention during individual fixation episodes, and the guidance of eye movements. The selection of new fixation locations might indeed be informed by global area-based computations that are performed in parallel outside of the currently fixated region, and provide information about the likelihood of a target being present elsewhere in the visual field. In contrast, attentional control processes within the FVF during a fixation episode operate via spatially selective and thus essentially item-based modulations of visual processing. In fact, H&O acknowledge the existence of such spatial biases that gradually become more item-based for the case of compound search where target-defining and response-relevant features differ. Here, “a global search for the target-defining feature may be followed by a local search for the response-defining feature.” The question remains whether this type of item-based spatially selective attentional control is the exception or the rule during visual search. Although some real-world visual search tasks (e.g., the scanning of mammograms or security X-ray images) do not involve the well-defined objects that are used in lab-based search studies, one could argue that even here, search is still guided in a spatially selective fashion by image features that are relevant for the task at hand.
The new fixation-based search model proposed by H&O is useful not only because of its power to simulate behavioural results, but also because it invites us to think differently about visual search. Serial selection models have dominated the field for decades, and alternative concepts are sorely needed. H&O provide excellent arguments for abandoning strictly sequential item-by-item accounts of visual search. However, in their endeavour to reject serial selection, they may have thrown out the item-based baby with the serial bathwater. Attentional processes in visual search may indeed operate in a largely parallel fashion, but the item will remain a primary unit of selection.