
Contextual and social cues may dominate natural visual search

Published online by Cambridge University Press:  24 May 2017

Linda Henriksson
Affiliation:
Department of Neuroscience and Biomedical Engineering, Aalto University, 00076 AALTO, Finland. linda.henriksson@aalto.fi; https://people.aalto.fi/linda_henriksson
Riitta Hari
Affiliation:
Department of Neuroscience and Biomedical Engineering, Aalto University, 00076 AALTO, Finland; Department of Art, Aalto University, 00076 AALTO, Finland. riitta.hari@aalto.fi; https://people.aalto.fi/riitta_hari

Abstract

A framework in which only the size of the functional visual field of fixations can vary is hardly able to explain natural visual-search behavior. In real-world search tasks, context guides eye movements, and task-irrelevant social stimuli may capture the gaze.

Type
Open Peer Commentary
Copyright
Copyright © Cambridge University Press 2017 

Can visual search be explained by a model with only one free parameter, the size of the functional visual field (FVF) of a fixation, as suggested by Hulleman & Olivers (H&O)? Considering fixations, rather than individual items, as the primary unit of visual search agrees with the tight connection between eye gaze and information retrieval. H&O demonstrate that their framework successfully captures the variability of reaction times in easy, medium, and difficult searches for elementary visual features. However, beyond laboratory conditions (“find a specific item among very similar distractors”), visual-search strategies can hardly be explained by such a simple model, because the search space is poorly specified (e.g., “Where did I leave my keys?”, “Is my friend already here?”) and the search strategy is affected by, for example, experience, task, memory, and motives. Moreover, some parts of the scene may attract attention and eye gaze automatically because of their social, and not only visual, saliency.
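To make the single-parameter idea concrete, the following minimal sketch simulates fixation-based search in the spirit of H&O's framework: the only quantity that varies with difficulty is the number of items one fixation can inspect. All numerical values (display size, fixation duration, the FVF sizes per difficulty level) are our illustrative assumptions, not H&O's.

```python
import random

def simulate_search(n_items, fvf_size, fixation_ms=250, target_present=True):
    """Toy fixation-based search: each fixation inspects up to `fvf_size`
    items in parallel; search ends when the target is found or every item
    has been rejected. Reaction time = number of fixations x duration."""
    target_slot = random.randrange(n_items) if target_present else None
    inspected, fixations = 0, 0
    while inspected < n_items:
        fixations += 1
        batch = min(fvf_size, n_items - inspected)
        if target_present and inspected <= target_slot < inspected + batch:
            break  # target falls inside this fixation's FVF
        inspected += batch
    return fixations * fixation_ms

# Difficulty is modeled purely as FVF size: a large FVF yields fast,
# near-flat "easy" search; a one-item FVF yields slow, serial-like search.
for label, fvf in [("easy", 30), ("medium", 7), ("difficult", 1)]:
    rts = [simulate_search(30, fvf) for _ in range(2000)]
    print(f"{label:9s} mean RT: {sum(rts) / len(rts):.0f} ms")
```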

In real-life situations, the search targets are not a priori evenly distributed in the visual field, and the task given to the subject will affect the eye movements (Neider & Zelinsky 2006; Torralba et al. 2006; Yarbus 1967). Moreover, the scene context can provide spatial constraints on the most likely locations of the target(s) within the scene (Neider & Zelinsky 2006; Torralba et al. 2006). The viewing strategy is also affected by expertise: experienced radiologists find abnormalities in mammography images more efficiently than do less-experienced colleagues (Kundel et al. 2007); experts in art history and laypersons view paintings differently (Pihko et al. 2011); and dog experts view interacting dogs differently than do naïve observers (Kujala et al. 2012). Moreover, the fixation durations vary depending on the task and scene: Although all fixations may be of about the same duration for homogeneous search displays, short fixations associated with long saccades occur while exploring the general features of a natural scene (ambient processing mode) and long fixations with short saccades take place while examining the focus of interest (focal processing mode; Unema et al. 2005).
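The ambient/focal distinction is straightforward to operationalize on eye-tracking data. The sketch below labels each fixation by its duration and the amplitude of the saccade that follows it; the threshold values are illustrative assumptions of ours, not values reported by Unema et al. (2005).

```python
def label_processing_mode(fixation_durations_ms, next_saccade_amplitudes_deg,
                          dur_threshold_ms=180, amp_threshold_deg=5.0):
    """Label each fixation as 'ambient' (short fixation followed by a long
    saccade) or 'focal' (long fixation followed by a short saccade); other
    combinations are labeled 'mixed'. Thresholds are illustrative only."""
    labels = []
    for dur, amp in zip(fixation_durations_ms, next_saccade_amplitudes_deg):
        if dur < dur_threshold_ms and amp > amp_threshold_deg:
            labels.append("ambient")
        elif dur >= dur_threshold_ms and amp <= amp_threshold_deg:
            labels.append("focal")
        else:
            labels.append("mixed")
    return labels

# Example: fixation durations and the amplitude of the saccade that
# follows each fixation, as obtained from an eye-tracking recording.
durations = [120, 95, 310, 280, 140]     # ms
amplitudes = [8.2, 10.5, 1.3, 2.0, 7.1]  # degrees of visual angle
print(label_processing_mode(durations, amplitudes))
# -> ['ambient', 'ambient', 'focal', 'focal', 'ambient']
```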

H&O suggest that the concept of FVF would allow semantic biases in visual search by accommodating multiple parallel FVFs – for example, a small FVF for the target object and a larger FVF for recognizing the scene. This extension might account for processing within the fixated area, but could it also predict saccade guidance? Predicting eye movements in the real world would require a comprehensive model of the semantic saliency of the scene, which remains a major challenge. That said, recent advances in neural-network models of visual object recognition (Krizhevsky et al. 2012) could facilitate the modeling of the semantic and contextual features that guide the gaze (Kümmerer et al. 2014).
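As a rough illustration of that approach, the sketch below reuses ImageNet-trained convolutional features as predictors of fixation density, in the spirit of Deep Gaze I (Kümmerer et al. 2014). It assumes the PyTorch/torchvision libraries; the backbone choice and the 1×1 readout (left untrained here) are our simplifications, not the authors' implementation, in which the readout is fitted to measured human fixations.

```python
import torch
import torchvision.models as models

# ImageNet-trained AlexNet convolutional features as candidate predictors
# of fixation density (simplified illustration of the Deep Gaze I idea).
backbone = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).features
backbone.eval()

# 1x1 convolutional readout over the feature maps; in a real model its
# weights would be fitted to human fixation data (here they are random).
readout = torch.nn.Conv2d(256, 1, kernel_size=1)

def saliency_map(image_tensor):
    """image_tensor: (3, H, W), ImageNet-normalized -> fixation log-density."""
    with torch.no_grad():
        feats = backbone(image_tensor.unsqueeze(0))  # (1, 256, H', W')
        logits = readout(feats)[0, 0]                # (H', W')
    # Normalize to a log-probability map over spatial locations.
    return torch.log_softmax(logits.flatten(), dim=0).view(logits.shape)

# Usage on a dummy image tensor:
dummy = torch.randn(3, 224, 224)
print(saliency_map(dummy).shape)  # torch.Size([6, 6])
```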

Finally and importantly, social cues strongly affect natural visual processing. Faces and other social stimuli attract gaze so efficiently (Birmingham et al. 2008; Yarbus 1967) that a saccade toward a face can be difficult to suppress (Cerf et al. 2009; Crouzet et al. 2010). Thus, the mere presence of a task-irrelevant face can disrupt visual search by attracting more frequent and longer fixations than do other distractors (Devue et al. 2012). Such viewing behavior contrasts with conventional search tasks, which become more difficult when the resemblance between the distractors and the target increases. Whereas faces capture attention (and gaze) in healthy subjects, autistic individuals are less distracted by social stimuli in the search scene (Riby et al. 2012) and show reduced saliency of semantic-level features, especially faces and social gaze, during free viewing of natural scenes (Wang et al. 2015). Altogether, social stimuli have such a central role in human behavior and brain function (Hari et al. 2015) that they should not be neglected in models that aim to explain natural visual-search behavior. Peripheral vision can provide effective summary statistics of the global features of the visual field (Rosenholtz 2016), and thus social stimuli, such as faces, outside foveal vision could significantly affect visual search.
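One simple way to incorporate such social saliency, in the spirit of Cerf et al. (2009), who added a face channel to a bottom-up saliency model, is a weighted combination of maps. The sketch below is a minimal illustration; the weight and the toy maps are our assumptions, not fitted values from their study.

```python
import numpy as np

def combine_saliency(bottom_up, face_map, face_weight=0.5):
    """Combine a bottom-up saliency map with a face-detection channel.
    Both maps are 2-D arrays on [0, 1]; `face_weight` is illustrative."""
    combined = (1 - face_weight) * bottom_up + face_weight * face_map
    return combined / combined.max()  # renormalize to [0, 1]

# Example: a face in the periphery dominates the combined map even when
# its bottom-up (contrast-based) saliency is modest.
rng = np.random.default_rng(0)
bottom_up = rng.random((48, 64)) * 0.3  # weak low-level saliency everywhere
face_map = np.zeros((48, 64))
face_map[10:18, 50:58] = 1.0            # detected face region
combined = combine_saliency(bottom_up, face_map)
print(np.unravel_index(combined.argmax(), combined.shape))  # lands on the face
```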

Face recognition represents a special case of visual search – a natural search task could be, for example, to find a friend in a crowd. For (Western) faces, the optimal fixation location is just below the eyes (Peterson & Eckstein 2012), and two fixations can be enough to recognize an isolated face image (Hsiao & Cottrell 2008). Whether the same holds for faces in their natural context remains to be seen. Overall, it appears that the saccades to faces and to scenes are consistent across subjects during initial viewing and become less consistent during later saccades (Castelhano & Henderson 2008). In addition, the initial saccades are consistent across cultures, with saccade endpoints reflecting the optimal fixation locations in face-identification tasks (Or et al. 2015). These findings raise interesting questions about the neural underpinnings of natural visual search: How does the guidance of the initial saccades differ from that of later saccades? At what level of cortical processing do cultural background and expertise affect saccade guidance?

In conclusion, we doubt that “an overarching framework of visual search” can be built without implementing the effects of contextual and social cues. Building a model that can predict an observer's eye movements during natural search tasks in real-world visual environments remains a challenge.

References

Birmingham, E., Bischof, W. F. & Kingstone, A. (2008) Social attention and real-world scenes: The roles of action, competition and social content. Quarterly Journal of Experimental Psychology 61(7):986–98.
Castelhano, M. S. & Henderson, J. M. (2008) Stable individual differences across images in human saccadic eye movements. Canadian Journal of Experimental Psychology 62(1):1–14.
Cerf, M., Frady, E. P. & Koch, C. (2009) Faces and text attract gaze independent of the task: Experimental data and computer model. Journal of Vision 9(12):10, 1–15.
Crouzet, S. M., Kirchner, H. & Thorpe, S. J. (2010) Fast saccades toward faces: Face detection in just 100 ms. Journal of Vision 10(4):16, 1–17.
Devue, C., Belopolsky, A. V. & Theeuwes, J. (2012) Oculomotor guidance and capture by irrelevant faces. PLoS ONE 7(4):e34598.
Hari, R., Henriksson, L., Malinen, S. & Parkkonen, L. (2015) Centrality of social interaction in human brain function. Neuron 88(1):181–93.
Hsiao, J. H. & Cottrell, G. (2008) Two fixations suffice in face recognition. Psychological Science 19(10):998–1006.
Krizhevsky, A., Sutskever, I. & Hinton, G. E. (2012) ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems 25, ed. Pereira, F., Burges, C. J. C., Bottou, L. & Weinberger, K. Q., pp. 1097–105. Neural Information Processing Systems Foundation. Available at: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.
Kujala, M. V., Kujala, J., Carlson, S. & Hari, R. (2012) Dog experts' brains distinguish socially relevant body postures similarly in dogs and humans. PLoS ONE 7(6):e39145.
Kümmerer, M., Theis, L. & Bethge, M. (2014) Deep Gaze I: Boosting saliency prediction with feature maps trained on ImageNet. arXiv preprint arXiv:1411.1045. Available at: http://arxiv.org/abs/1411.1045.
Kundel, H. L., Nodine, C. F., Conant, E. F. & Weinstein, S. P. (2007) Holistic component of image perception in mammogram interpretation: Gaze-tracking study. Radiology 242(2):396–402.
Neider, M. B. & Zelinsky, G. J. (2006) Scene context guides eye movements during visual search. Vision Research 46(5):614–21.
Or, C. C., Peterson, M. F. & Eckstein, M. P. (2015) Initial eye movements during face identification are optimal and similar across cultures. Journal of Vision 15(13):12.
Peterson, M. F. & Eckstein, M. P. (2012) Looking just below the eyes is optimal across face recognition tasks. Proceedings of the National Academy of Sciences of the United States of America 109(48):E3314–23.
Pihko, E., Virtanen, A., Saarinen, V. M., Pannasch, S., Hirvenkari, L., Tossavainen, T., Haapala, A. & Hari, R. (2011) Experiencing art: The influence of expertise and painting abstraction level. Frontiers in Human Neuroscience 5:94.
Riby, D. M., Brown, P. H., Jones, N. & Hanley, M. (2012) Brief report: Faces cause less distraction in autism. Journal of Autism and Developmental Disorders 42(4):634–39.
Rosenholtz, R. (2016) Capabilities and limitations of peripheral vision. Annual Review of Vision Science 2:437–57.
Torralba, A., Oliva, A., Castelhano, M. S. & Henderson, J. M. (2006) Contextual guidance of eye movements and attention in real-world scenes: The role of global features in object search. Psychological Review 113(4):766–86.
Unema, P. J. A., Pannasch, S., Joos, M. & Velichkovsky, B. M. (2005) Time course of information processing during scene perception: The relationship between saccade amplitude and fixation duration. Visual Cognition 12(3):473–94.
Wang, S., Jiang, M., Duchesne, X. M., Laugeson, E. A., Kennedy, D. P., Adolphs, R. & Zhao, Q. (2015) Atypical visual saliency in autism spectrum disorder quantified through model-based eye tracking. Neuron 88(3):604–16.
Yarbus, A. L. (1967) Eye movements and vision. Plenum Press.