
Preparing to be punched: Prediction may not always require inference of intentions

Published online by Cambridge University Press: 24 June 2013

Helene Kreysa
Affiliation: Department for General Psychology and Cognitive Neuroscience, Friedrich Schiller University Jena, 07743 Jena, Germany. helene.kreysa@uni-jena.de; http://www2.uni-jena.de/svw/allgpsy/team/kreysa-h.htm

Abstract

Pickering & Garrod's (P&G's) framework assumes an efference copy based on the interlocutor's intentions. Yet, elaborate attribution of intentions may not always be necessary for online prediction. Instead, contextual cues such as speaker gaze can provide similar information with a lower demand on processing resources.

Type: Open Peer Commentary
Copyright © Cambridge University Press 2013

At several points in their target article, Pickering & Garrod (P&G) suggest that prediction by simulation is based on determining a conversational partner's intention through perceiving the unfolding action or speech, potentially combined with background knowledge. On this basis, an efference copy of the intended act is generated, enabling the prediction of upcoming behavior and the production of behavior or speech that complements it. But how explicit do the attributed intentions need to be to allow such prediction, and how might comprehenders derive them?

Unfortunately, P&G do not define clearly what they mean by “intention”: Whereas on the part of the actor or speaker, an intention seems to represent nothing more than an “action command” (e.g., in the legend to Figure 3; target article, sect. 2.2), recognizing intentions on the part of the comprehender is more complicated. According to P&G, it can involve considerations of past and present behavior and of the speaker's perceived state of mind, as well as ongoing modifications of this interpretation. In the literature, identifying others' intentions is generally taken to imply an additional step beyond action recognition, that of identifying the goal of this action (see, e.g., Levinson 2006; Tomasello et al. 2005). A similar view underlies the HMOSAIC architecture (Wolpert et al. 2003), which contains symbolic representations of the task in the form of goals or intentions.

Elaborate attributions of intention based on the interlocutor's potential motivations in the current situation and on general world knowledge are certainly useful for understanding speech. They help to generate expectations about how the conversation is likely to develop and can be used for what P&G call “offline prediction.” Yet although such expectations have, in turn, been shown to influence moment-by-moment language comprehension (e.g., Kamide et al. 2003; Van Berkum et al. 2008), they are presumably not computed on a moment-by-moment basis and remain relatively constant across extended periods of time.

The time-critical online simulations in P&G's account of real-time conversation must require intentions of a more basic kind – some form of heuristic for anticipating others' upcoming actions. To use their example: It is unquestionably valuable if I am quick to predict that someone is preparing to punch me rather than to shake my hand, so I can prepare to move appropriately. But if I start considering why they might wish to hurt me, I will probably be too late in responding.

For real-time online prediction of upcoming words and sentences, I would like to suggest that it may often be possible to rely on contextual cues to a speaker's upcoming actions that are directly perceivable: A particular tone of voice, a facial expression, a hand gesture, or a shift in the speaker's gaze direction can all be informative about how the speaker plans to continue a sentence. In this sense, such cues are closely connected to the speaker's intentions. At the same time, they are often produced unintentionally on the part of the speaker, and can be readily perceived on the part of the comprehender. This is exactly what makes them so efficient: They can help to disambiguate the linguistic signal without requiring deliberate consideration of intentions (cf. Shintel & Keysar 2009, who refer to such processes as “nonstrategic generic-listener adaptations”; p. 269).

A prime example of this type of contextual cue is the direction of other people's gaze. Gaze is a salient attentional cue that reliably causes viewers to shift their own attention in the same direction (Emery 2000). This occurs even in the visual periphery and without requiring conscious awareness (Langton et al. 2000; Xu et al. 2011). Additionally, because speakers tend to look at objects they are preparing to mention (e.g., Griffin & Bock 2000; Meyer et al. 1998), gaze will often directly reflect the speaker's action plan. Such referential gaze is therefore both easy to detect and informative about upcoming sentence content. In fact, comprehenders can and do make rapid use of the speaker's gaze direction to anticipate upcoming referents (Hanna & Brennan 2007; Staudte & Crocker 2011) and even to assign thematic role relations (Nappa et al. 2009; Knoeferle & Kreysa 2012).

These benefits of gaze-following in comprehension can be conceived of as a form of prediction about what will be mentioned next, similar to anticipatory fixations of objects in the visual world in eye tracking studies of spoken sentence processing (e.g., Altmann & Kamide 1999; Knoeferle & Crocker 2006; for recent reviews see Altmann 2011 and Huettig et al. 2011). It is interesting to consider P&G's classification and to speculate on whether this might be prediction by association (e.g., “people who look at objects often mention them shortly thereafter”) or even prediction by simulation (using one's own gaze behavior as a proxy, e.g., “if I had just looked at the kite, I'd refer to it next”). Alternatively, it might be sufficient that speaker gaze attracts the comprehender's attention to a location that is relevant for understanding the unfolding speech utterance. In all three cases, the end result is a coordination of the interlocutors' attention on the same objects in the visual world. This is known to benefit problem solving (Grant & Spivey 2003; Knoblich et al. 2005) and conversation in general (Richardson & Dale 2005; Richardson et al. 2007). Such benefits may well be due to a shared perspective and aligned representations of the situation, but they need not imply awareness of the interlocutor's intentions.

References

Altmann, G. T. M. (2011) The mediation of eye movements by spoken language. In: The Oxford handbook of eye movements, ed. Liversedge, S. P., Gilchrist, I. D. & Everling, S., pp. 979–1003. Oxford University Press.
Altmann, G. T. M. & Kamide, Y. (1999) Incremental interpretation at verbs: Restricting the domain of subsequent reference. Cognition 73(3):247–64.
Emery, N. J. (2000) The eyes have it: The neuroethology, function and evolution of social gaze. Neuroscience and Biobehavioral Reviews 24:581–604.
Grant, E. R. & Spivey, M. J. (2003) Eye movements and problem solving: Guiding attention guides thought. Psychological Science 14:462–66.
Griffin, Z. M. & Bock, K. (2000) What the eyes say about speaking. Psychological Science 11:274–79.
Hanna, J. E. & Brennan, S. E. (2007) Speakers' eye gaze disambiguates referring expressions early during face-to-face conversation. Journal of Memory and Language 57:596–615.
Huettig, F., Rommers, J. & Meyer, A. S. (2011) Using the visual world paradigm to study language processing: A review and critical evaluation. Acta Psychologica 137:151–71.
Kamide, Y., Altmann, G. T. M. & Haywood, S. L. (2003) Prediction and thematic information in incremental sentence processing: Evidence from anticipatory eye movements. Journal of Memory and Language 49:133–56.
Knoblich, G., Öllinger, M. & Spivey, M. J. (2005) Tracking the eyes to obtain insight into insight problem solving. In: Cognitive processes in eye guidance, ed. Underwood, G., pp. 355–75. Oxford University Press.
Knoeferle, P. & Crocker, M. W. (2006) The coordinated interplay of scene, utterance, and world knowledge: Evidence from eye tracking. Cognitive Science 30:481–529.
Knoeferle, P. & Kreysa, H. (2012) Can speaker gaze modulate syntactic structuring and thematic role assignment during spoken sentence comprehension? Frontiers in Psychology 3:538.
Langton, S. R. H., Watt, R. J. & Bruce, V. (2000) Do the eyes have it? Cues to the direction of social attention. Trends in Cognitive Sciences 4:50–59.
Levinson, S. C. (2006) On the human “interaction engine.” In: Roots of human sociality: Culture, cognition and interaction, ed. Enfield, N. J. & Levinson, S. C., pp. 39–69. Berg.
Meyer, A. S., Sleiderink, A. M. & Levelt, W. J. M. (1998) Viewing and naming objects: Eye movements during noun phrase production. Cognition 66:B25–33.
Nappa, R., Wessel, A., McEldoon, K. L., Gleitman, L. R. & Trueswell, J. C. (2009) Use of speaker's gaze and syntax in verb learning. Language Learning and Development 5:203–34.
Richardson, D. C. & Dale, R. (2005) Looking to understand: The coupling between speakers' and listeners' eye movements and its relationship to discourse comprehension. Cognitive Science 29:1045–60.
Richardson, D. C., Dale, R. & Kirkham, N. Z. (2007) The art of conversation is coordination. Psychological Science 18:407–13.
Shintel, H. & Keysar, B. (2009) Less is more: A minimalist account of joint action in communication. Topics in Cognitive Science 1:260–73.
Staudte, M. & Crocker, M. W. (2011) Investigating joint attention mechanisms through spoken human-robot interaction. Cognition 120:268–91.
Tomasello, M., Carpenter, M., Call, J., Behne, T. & Moll, H. (2005) Understanding and sharing intentions: The origins of cultural cognition. Behavioral and Brain Sciences 28:675–91.
Van Berkum, J. J. A., van den Brink, D., Tesink, C. M. J. Y., Kos, M. & Hagoort, P. (2008) The neural integration of speaker and message. Journal of Cognitive Neuroscience 20:580–91.
Wolpert, D. M., Doya, K. & Kawato, M. (2003) A unifying computational framework for motor control and social interaction. Philosophical Transactions of the Royal Society B: Biological Sciences 358(1431):593–602. DOI:10.1098/rstb.2002.1238.
Xu, S., Zhang, S. & Geng, H. (2011) Gaze-induced joint attention persists under high perceptual load and does not depend on awareness. Vision Research 51:2048–56.