The target article reviews a number of previous theories of visual search and argues that a better understanding of visual search is hampered by the assumption that individual items are the central units of visual search. To explain visual search behavior (VSB), Hulleman & Olivers (H&O) propose a compelling alternative framework in which fixations are the central units of VSB, and they suggest that the demise of the item in visual search is impending. We greatly appreciate that the target article draws attention to the importance of fixations in visual search and to other ways of conceiving VSB, but we object to the proposition that the item and item-based theories of VSB are flawed. In our opinion, whereas fixations are a necessary element of VSB, they are not sufficient for an integrative account of visual search. We argue that both peripheral and central processes contribute to VSB, and that an integrative account of VSB will therefore include fixations as well as elements of the established theories of VSB, such as feature integration and attention.
The authors acknowledge that the theories they call “item-based” recognize the existence of large amounts of parallel processing and that some of these theories are not based on individual items. Therefore, it is possible to consider these theories in ways other than strict serial processing of items.
H&O have claimed that “within fixations, items are processed in parallel.” We reconsider this claim by highlighting the role of attention in visual search. An obligatory relationship between eye movements and attentional shifts, in which an eye movement cannot be performed without an attentional shift, has long been identified (Fischer 1987). Fixations serve to cluster items together. We agree with the authors that subjects tend to move their eyes because “covert search is much harder” (sect. 6.4, para. 2). However, we emphasize that within each fixation, covert attention plays a critical role in the serial processing of individual items (Buschman & Miller 2009). In a recent study, Marti et al. (2015) used a unique strategy in which subjects had to report their fixations in a search task. The self-reports were then compared with the actual eye movements. In some cases, subjects reported eye movements that they had never made. Marti et al. concluded that item search was conducted by a covert attention strategy and that subjects had probably reported covert shifts of attention as eye movements. This indicates the importance of items in the search strategy within each fixation.
Regarding feature integration theory (FIT; Treisman & Gelade 1980), feature integration and fixations are reconcilable in our proposal. H&O reject FIT as a viable account of VSB because this theory has classically been used to advocate serial processing of items arising from the conjunctions of different features. Although conjunctions are necessary for full perception, it is not necessary to perceive full conjunctions from a full map of features, which is what leads to serial processing of the items. Feature extraction takes place at several levels, and it does not require complete scrutiny at every level; there is evidence that humans can recognize degraded images such as faces (Gilad-Gutnick & Sinha 2012). In our account, incomplete feature maps are made at the first fixation, and these give a gist of the whole scene. These maps are made randomly, though the most salient features (Xiaodi et al. 2012) have a higher chance of entering them. Rather than conjunctions that lead to a full perception of individual items, loose conjunctions and clusters of similarities among features are made (Oliva & Torralba 2006). Using these maps, parallel exclusions and inclusions guide attention covertly or overtly to the most informative areas of the visual scene. This guidance leads to a more detailed map. At this level, more detailed (though not necessarily complete) parallel feature maps are formed within each fixation. Whenever an item or a number of items passes a certain threshold of similarity with the template, those individual items might be examined serially within the fixation, which can lead to either a target-present response or continuation of the search task. This is especially true in real-world situations such as searching for a lesion in a radiographic image.
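The two stages described above, a parallel pass over incomplete feature maps followed by serial examination of the items that pass a similarity threshold, can be illustrated with a toy sketch. It is offered only as an illustration under strong simplifying assumptions, not as a model of the underlying processes: the `Item` class, the `salience` score, the choice of which feature dimensions are sampled, and the threshold value are all hypothetical, and the random sampling of feature maps is replaced by an explicit `sampled_dims` argument so that the example is deterministic.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    features: dict   # hypothetical feature dimensions, e.g. {"color": "red"}
    salience: float  # hypothetical salience score; higher is examined first


def coarse_match(item, template, sampled_dims):
    """Similarity on an incomplete feature map: share of sampled dimensions that match."""
    if not sampled_dims:
        return 0.0
    hits = sum(item.features.get(d) == template.get(d) for d in sampled_dims)
    return hits / len(sampled_dims)


def search_within_fixation(items, template, sampled_dims, threshold):
    """Return (target or None, number of items examined serially)."""
    # Parallel stage: every item is scored at once on the incomplete map,
    # and items below the similarity threshold are excluded.
    candidates = [
        it for it in items
        if coarse_match(it, template, sampled_dims) >= threshold
    ]
    # Serial stage: only the surviving candidates are checked one by one
    # against the full conjunction of template features.
    examined = 0
    for it in sorted(candidates, key=lambda it: it.salience, reverse=True):
        examined += 1
        if all(it.features.get(d) == v for d, v in template.items()):
            return it, examined
    return None, examined


template = {"color": "red", "orientation": "vertical"}
display = [
    Item("distractor_a", {"color": "red", "orientation": "horizontal"}, 0.9),
    Item("target", {"color": "red", "orientation": "vertical"}, 0.5),
    Item("distractor_b", {"color": "green", "orientation": "vertical"}, 0.7),
]

# Only color is sampled in the coarse map, so the green item is excluded
# in parallel and the two red items are examined serially.
found, examined = search_within_fixation(display, template, ["color"], 1.0)
```

In this example the green distractor never reaches the serial stage, so only two of the three items are examined individually. A return value of `None` corresponds to continuing the search with a further fixation.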
An important question is the size of each fixation or functional viewing field (FVF). The extent of feature extraction/integration depends on the size of the FVF. H&O argue that the difficulty of discriminating items determines the size of the FVF. We propose that the fixation size is determined by the perceptual load. Following earlier work of Kahneman and Treisman (1984), Lavie demonstrated that perceptual load is the major determinant of the locus of selection in visual attention (Lavie & Tsal 1994) and that perceptual load is necessary for early selection (Lavie 1995). According to the load theory of attention, the scope of perception will be stretched from the center of the fixation to the surrounding area to the extent that perception is loaded. It has to be noted that although this is a theory of attention, perceptual load, unlike cognitive load, involves the early sensory system. To enable feature integration, the size of the fixations is adjusted according to the perceptual load of a group of items. In larger FVFs (e.g., the initial fixation), the perceptual load is saturated with incomplete feature extraction. In our account, fixations are a measure, not the central unit, of feature integration at different levels.
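The idea that the FVF stretches outward from the center of fixation until perceptual capacity is used up can be sketched as follows. This is a minimal illustration under assumptions that are not part of load theory or of the target article: each item is given a hypothetical numerical load, capacity is a single hypothetical number in the same arbitrary units, and loads are assumed to add linearly.

```python
def fvf_radius(items, capacity):
    """Grow the viewing field outward from fixation until capacity is used up.

    items    -- list of (eccentricity_deg, load) pairs; both values are
                hypothetical quantities chosen for illustration
    capacity -- hypothetical perceptual capacity, in the same units as load
    Returns the eccentricity of the outermost item that still fits.
    """
    used = 0.0
    radius = 0.0
    # Items nearer the center of fixation are taken in first.
    for eccentricity, load in sorted(items):
        if used + load > capacity:
            break  # perception is fully loaded; the field stops growing
        used += load
        radius = eccentricity
    return radius


# The same four locations, once with high-load and once with low-load items.
high_load = [(1.0, 0.2), (2.0, 0.2), (4.0, 0.2), (6.0, 0.2)]
low_load = [(1.0, 0.1), (2.0, 0.1), (4.0, 0.1), (6.0, 0.1)]

small_fvf = fvf_radius(high_load, capacity=0.5)
large_fvf = fvf_radius(low_load, capacity=0.5)
```

With these illustrative numbers, the high-load display yields a field that covers only the two innermost items, whereas the low-load display yields a field that covers all four. A smaller field in turn implies that more fixations are needed to cover the same display.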
In conclusion, H&O present a powerful case to support a framework that unifies fixation-based studies of VSB. However, their arguments based on reaction times (RTs), which are used to invalidate item-based theories of VSB, need to be revisited. We argue that perceptual load determines the size of the fixations and, consequently, the number of fixations. As a step towards an integrative account of VSB, we propose an account in which the core elements of the item-based theories hold and fixations are included.
Acknowledgment
The authors thank Dr. Michael A. Lebedev for his helpful comments on an earlier version of the manuscript.