Hulleman & Olivers (H&O) make an interesting case for an approach that takes eye fixations, rather than individual items, as its central unit. Within the fixational “functional field of view” (FFV), items are processed in parallel. The size of the FFV is adjusted according to search (target discrimination) difficulty, determining the number of fixations and thus RTs. Although H&O's arguments, like earlier ones (e.g., Zelinsky 2008), that eye movements and the FFV play a role in realistic visual search are persuasive, their model leaves underspecified (1) the attentional process that detects targets and (2) the pre-attentive process that guides fixations. Here, we discuss point (1) in relation to Humphreys and Müller's (1993) “Search via Recursive Rejection” (SERR) model (discussed by H&O in sect. 3.2), which, arguably, anticipated some of the ideas advocated by H&O, and point (2) in relation to the need for a pre-attentive search-guidance mechanism in both SERR and H&O's model.
1. Like H&O's model, SERR deploys a sequence of parallel search steps to decide whether a target is present in the display. Although H&O are silent about the process that determines whether the target is present in each FFV region (a process their model treats as error-free), SERR, a connectionist implementation of Duncan and Humphreys' (1989) “Similarity Theory,” posits an error-prone mechanism. In SERR, the items (the target and the distractors) within some FFV of spatially parallel processing compete to activate their (higher-level) template representations. When there are multiple distractors of the same complex feature description in the FFV, they are likely to win the competition over the single target, whereupon they are top-down suppressed “as a group.” This process operates recursively until either (1) the target activates its template, triggering a target-present (TP) decision; or (2) all items are “removed” from the FFV, leading to a target-absent (TA) decision. These dynamics are influenced by target–distractor similarity: The more similar the target is to (some of) the distractors, the more likely it is to be rejected along with a distractor group, yielding increasing miss rates. To bring the rate of target misses down to acceptable levels (matching those exhibited by humans), SERR must make several rechecking “runs” at the items in the FFV, until the target is either detected or consistently not found. Importantly, SERR produces miss rates that accelerate positively with the number of items in the FFV (especially with multiple distractor groups), in which case the rechecking strategy can become prohibitively expensive. As discussed by Humphreys & Müller (1993, p. 105), “A solution is to limit SERR's functional field so that there is a balance between the first-pass miss rate and the time cost incurred by rechecking,” which provides an explicit, error-based “rule” for adjusting the size of the FFV.
The adjusted FFV would then have to be deployed serially across the display (whether by covert or overt attention shifts). This resembles some of H&O's central ideas concerning discriminability-dependent FFV adjustments, which would be reflected in the number of attention shifts necessary to perform the task. As an aside, H&O are not quite right in stating that “the…empirical work [associated with SERR] focused on relatively shallow search slopes” (sect. 3.2, para. 3): Müller et al. (1994) present simulations of human slopes (with slope estimates derived from simulated mean RTs and RT distributions) ranging, for example, in their Experiment 1, from about 30 to well over 200 ms/item.
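The balance between first-pass miss rate and rechecking cost can be illustrated with a deliberately simple sketch. It is not an implementation of SERR: the quadratic miss-rate function, the assumption that rechecking runs are independent, the 5% residual-miss criterion, and all function and parameter names (`first_pass_miss_rate`, `accel`, `target_miss`) are hypothetical choices made only for illustration.

```python
import math


def first_pass_miss_rate(ffv_items, accel=0.004):
    """Toy first-pass miss rate that accelerates positively with the
    number of items in the FFV (quadratic growth is an assumption)."""
    return min(0.99, accel * ffv_items ** 2)


def runs_needed(miss_rate, target_miss=0.05):
    """Smallest number of runs r such that miss_rate ** r <= target_miss,
    assuming (hypothetically) that runs miss independently."""
    if miss_rate <= target_miss:
        return 1
    return math.ceil(math.log(target_miss) / math.log(miss_rate))


def total_runs(display_items, ffv_items, accel=0.004, target_miss=0.05):
    """Total processing runs: serial FFV deployments needed to cover the
    display, times the rechecking runs required within each FFV."""
    deployments = math.ceil(display_items / ffv_items)
    miss = first_pass_miss_rate(ffv_items, accel)
    return deployments * runs_needed(miss, target_miss)


def best_ffv_size(display_items, accel=0.004, target_miss=0.05):
    """FFV size (in items) that minimises the total number of runs."""
    return min(
        range(1, display_items + 1),
        key=lambda k: total_runs(display_items, k, accel, target_miss),
    )


if __name__ == "__main__":
    n_items = 24
    for k in (1, 3, 6, 12, 24):
        print(k, total_runs(n_items, k))
    print("best FFV size:", best_ffv_size(n_items))
```

Under these assumptions, a very small FFV wastes time on serial deployments, a very large FFV wastes time on rechecking, and the cheapest setting lies in between. Raising `accel` (a stand-in for higher target–distractor similarity) can only shrink the preferred FFV, never enlarge it, in line with a discriminability-dependent adjustment.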
2. Given a need for overt or covert attention shifts, efficient search would require an element of pre-attentive “guidance” for the FFV to be directed to (only) the most “promising” regions of the display. In principle, guidance can be provided by a combination of bottom-up and top-down mechanisms, for example, through the computation of local feature-contrast signals and their summation, across dimensions, on some search-guiding “overall-saliency” or “priority” map of the field. Note that this map is generally conceived as a pre-attentive representation, even though it is subject to top-down (feature- and dimension- as well as memory-based) biasing. Notions of guidance are at the heart of models from the Guided-Search (GS) family, including our “Competitive GS” model (e.g., Liesefeld et al. 2016; Moran et al. 2013; 2016), and are well supported empirically. Although feature-contrast computations themselves are not necessarily “item-based” (see, e.g., Itti & Koch 2001), much of what is known about their workings stems from item-based search experiments! Arguably, then, as acknowledged by H&O (in sect. 6.6), their model (and SERR!) would need to incorporate some notion of “guidance” to fully account for human search performance, which would bring it closer into line with “traditional,” two-stage models of visual search like GS.
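The summation of feature-contrast signals onto a priority map can be made concrete with a minimal sketch. It is a toy illustration, not the computation used in GS, Competitive GS, or any other published model: contrast is taken over the whole display rather than a local neighbourhood, feature values are arbitrary numbers in a one-dimensional layout, and the names `feature_contrast`, `priority_map`, and `weights` are hypothetical.

```python
def feature_contrast(values):
    """Contrast of each item within one dimension: mean absolute
    difference from all other items (display-wide, for simplicity)."""
    n = len(values)
    return [sum(abs(v - w) for w in values) / (n - 1) for v in values]


def priority_map(dimensions, weights):
    """Sum the dimension-specific contrast signals at each location,
    scaled by (top-down) dimension weights."""
    contrasts = {name: feature_contrast(vals) for name, vals in dimensions.items()}
    n_locations = len(next(iter(dimensions.values())))
    return [
        sum(weights[name] * contrasts[name][i] for name in dimensions)
        for i in range(n_locations)
    ]


# Hypothetical display: six items, one colour singleton at index 3.
display = {
    "colour": [0.0, 0.0, 0.0, 1.0, 0.0, 0.0],
    "orientation": [0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
}
weights = {"colour": 1.0, "orientation": 1.0}

pm = priority_map(display, weights)
peak = max(range(len(pm)), key=pm.__getitem__)
print("priority map:", pm)
print("location of peak:", peak)
```

In this sketch the singleton produces the peak without any target template being specified, and changing `weights` mimics dimension-based top-down biasing of an otherwise bottom-up signal. An FFV-based model could then centre its next fixation on the peak location.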
Note that H&O “buy in” guidance from models such as Zelinsky's (2008) “Target Acquisition Model” (TAM) or Pomplun et al.'s (2003) “Area Activation Model.” In these types of model, guidance is exclusively top-down: target- (template- or feature-) based. In fact, Zelinsky (2008) finds it “arguable whether a model that combines both top-down [target-template-based] and bottom-up [saliency] signals would be more successful than TAM in describing human behavior, at least in tasks in which the top-down target information [is] highly reliable” (p. 825). Such models, however, fail to address what determines target detection in search for (feature or feature-conjunction) singleton targets, where there is no (reliable) target template to guide the search top-down (Müller et al. 1995; Weidner & Müller 2013). For example, is target “pop-out” based on a parallel attentive process operating over the whole display or on a pre-attentive, salience-based process? One interesting possibility is that, on TP trials, detection decisions are triggered directly by the salience map, consistent with studies showing pop-out detection with no or minimal target identity processing (e.g., Müller et al. 2004; Töllner et al. 2012b), whereas on TA trials some process of parallel distractor rejection takes place (e.g., Müller et al. 2007). On more difficult search trials, the pre-attentive guidance mechanism could direct the attentive process to sample an area that surrounds the location of the highest salience. Here, models such as H&O's may indeed add to the traditional item-based models.