Eye movements are an important part of many laboratory search tasks, and most real-world search tasks. A complete account of visual search and visual attention will require an explanation of how eye movements are guided and how they contribute to selection. Furthermore, the ability to track eye movements has led to a succession of new insights into how attention is controlled. There are many examples, but one is Olivers et al.'s (2006) work investigating the working memory representations guiding search. Future eye tracking studies will almost certainly produce valuable new insights. Thus, in terms of both theory and method, eye movements have played a key role, and will continue to do so.
In the target article's framework, eye tracking data are combined with assumptions about the functional viewing field (FVF), the area within the visual field that is actively processed. The FVF is assumed to be small in difficult visual tasks, so that attentional processing during a single fixation is confined to a small region, and many fixations are necessary to search through a large array. The FVF can be expanded for easier tasks, allowing them to be completed with fewer fixations that are farther apart. As noted in the target article, the concept has been around for some time, but it is nonetheless difficult to demonstrate experimentally that the FVF is actually adjusted during search as Hulleman & Olivers (H&O) suggest; it cannot be measured as straightforwardly as eye movements can be tracked. However, H&O point out that gaze-contingent search experiments (Rayner & Fisher 1987; Young & Hulleman 2013) provide good evidence that information is taken in from smaller regions in more difficult tasks. The FVF concept has also been useful in interpreting attentional phenomena other than search. For instance, Chen and Cave (2013; 2014; 2016) found that patterns of distractor interference that did not fit with perceptual load theory (Lavie 2005) or with dilution accounts (Tsal & Benoni 2010; Wilson et al. 2011) could be explained by assuming that more difficult tasks induce subjects to adopt a narrower FVF (or, as it is called in those studies, attentional zoom).
Thus, I agree on the importance of combining eye tracking data with assumptions about variations in the FVF to build accounts of visual search. It is also clear that attentional theories need to be able to explain search in scenes that are not easily segmented into separate items. Earlier theories were clearly limited in these respects, in part because they were originally formulated at a time when it was more difficult to track eye movements. However, the proposed framework has other limitations. It appears to be built on the assumption that once the FVF size is set, there is nothing else for covert attention to do. That seems surprising, given the abundant evidence that covert attention can select locations based on color, shape, and other simple features within a fixation (Cave & Zimmerman 1997; Hoffman & Nelson 1981; Kim & Cave 1995). This selection can be done relatively quickly and efficiently (Mangun & Hillyard 1995). I am not trying to argue that attentional selection is fundamentally limited to one item at a time, but it is hard to believe that covert selection would not be employed during search to lower the processing load and limit interference within each fixation. In fact, shifts in covert attention can be tracked from one hemisphere to the other in the course of visual search (Woodman & Luck 1999). Given that covert attention can be adjusted more quickly than a saccade can be programmed and executed, it should be able to contribute substantially to investigating potential target regions and to choosing the next saccade, as suggested by a group of studies including Deubel and Schneider (1996) and Bichot et al. (2005).
Over the years, many attention researchers have tried to study visual search by focusing on covert attention and ignoring eye movements, while others have focused on eye movements while ignoring covert attention. If the H&O framework is truly to be a hybrid approach, it should allow for the possibility that many searches are accomplished through an interaction between eye movements and covert attention.
In considering the history of attention research, it is worth noting that the idea that attention can be adjusted between a broad distribution and a narrow focus has been explored in contexts other than Sanders' (1970) discussion of the FVF mentioned in the target article. There is, of course, Eriksen and St. James' (1986) zoom lens analogy, but perhaps even more relevant for this discussion is Treisman and Gormican's account of how attention makes information about stimulus location available. Here is their description:
Attention selects a filled location within the master map and thereby temporarily restricts the activity from each feature map to the features that are linked to the selected location. The finer the grain of the scan, the more precise the localization and, as a consequence, the more accurately conjoined the features present in different maps will be. (Treisman & Gormican 1988, p. 17)
Although they do not explicitly refer to the functional viewing field, it seems they had a similar concept in mind, as discussed in Cave (2012).
Another aspect of this framework is the move away from visual input that is organized into separate items. The motivation for this is clearly spelled out, but what is not explained is how the concept of object-based attention fits into this framework. There are some circumstances in which visual selection is apparently not shaped by the boundaries defining objects and groups (Chen 1998; Goldsmith & Yeari 2003; Shomstein & Yantis 2002), but they are rare, and the object organization of a display often affects attentional allocation even when it is not relevant to the task (Egly et al. 1994; Harms & Bundesen 1983). Is the claim in the target article that object and group boundaries play no role in visual search, even though their effects are difficult to avoid in other attentional tasks?