When the simplest voluntary decisions appear patently suboptimal

Emilio Salinas; Joshua A. Seideman; Terrence R. Stanford

doi:10.1017/S0140525X18001474

Published online by Cambridge University Press: 10 January 2019

Emilio Salinas, Joshua A. Seideman, and Terrence R. Stanford
Affiliation: Department of Neurobiology & Anatomy, Wake Forest School of Medicine, Winston-Salem, NC 27157-1010.
esalinas@wakehealth.edu; jseidema@wakehealth.edu; stanford@wakehealth.edu

Abstract

Rahnev & Denison (R&D) catalog numerous experiments in which performance deviates, often in subtle ways, from the theoretical ideal. We discuss an extreme case, an elementary behavior (reactive saccades to single targets) for which a simple contextual manipulation results in responses that are dramatically different from those expected based on reward maximization – and yet are highly informative and amenable to mechanistic examination.

Type: Open Peer Commentary
Information: Behavioral and Brain Sciences, Volume 41, 2018, e240

DOI: https://doi.org/10.1017/S0140525X18001474
Copyright: Copyright © Cambridge University Press 2018

The conclusions drawn by Rahnev & Denison (R&D) rely on analyses spanning many tasks and experimental conditions in which perceptually guided decisions deviate, for a variety of reasons, from those of an ideal observer model. Indeed, they exhaustively build a convincing argument. But sometimes a single, powerful example can illustrate a general result with great eloquence. That is the case with an elegant paradigm known as the one-direction-rewarded, or 1DR, task.

The 1DR task is deceptively simple (Hauser et al. 2018; Hikosaka et al. 2006; Lauwereyns et al. 2002). The subject (a monkey, in this case) is instructed to perform an elementary action: to look at a lone, clearly visible stimulus. Each trial starts with the monkey briefly fixating a central spot on an otherwise blank screen. Then the fixation spot disappears and, at the same time, a target stimulus appears at one of four possible symmetric locations (or one of two locations, depending on the study). The target location varies randomly across trials. The monkey is rewarded for making a quick eye movement, a saccade, to the target – a reaction that is, in fact, quite natural.

But there is a catch. Correct saccades to one location yield a large reward, whereas correct saccades to the other locations yield either a small reward or no reward (this varies by monkey, but, importantly, the results are the same). The rewarded location stays constant for a block of trials. The spatial asymmetry in reward expectation leads to a conflict: The monkey wants to look in one direction but is often instructed to look elsewhere. Nevertheless, all trials must be completed, whether the reward on offer is large or small. There is no strategic advantage to responding differently in one condition compared with the other. Only one alternative is available, so deliberating is unnecessary. To maximize the reward rate, the monkey should look at the target as quickly and as accurately as possible each time, regardless of where it appears.
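The reward-maximization argument can be made concrete by computing a long-run reward rate. The sketch below is a hypothetical illustration, not an analysis from the studies cited here: it assumes that every attempt costs a fixed overhead (fixation, feedback, intertrial interval) plus the RT, that error trials earn nothing and are repeated until correct, and that reward sizes are in arbitrary units. The names `Condition` and `reward_rate` are illustrative. The RT and accuracy values in the usage block are the approximate means reported by Hauser et al. (2018); the reward sizes and the overhead are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    """One trial type in a hypothetical 1DR block (illustrative names)."""
    fraction: float   # fraction of trials of this type
    reward: float     # reward per correct saccade, arbitrary units (assumed)
    rt_s: float       # mean reaction time, in seconds
    p_correct: float  # probability that the saccade lands on the target


def reward_rate(conditions, overhead_s=1.0):
    """Long-run reward per second (renewal-reward ratio).

    Hypothetical assumptions: each attempt costs `overhead_s` seconds plus
    the mean RT; error trials earn nothing and are repeated until correct,
    so a trial type needs 1 / p_correct attempts on average.
    """
    reward_per_trial = 0.0
    time_per_trial = 0.0
    for c in conditions:
        mean_attempts = 1.0 / c.p_correct  # mean of a geometric distribution
        reward_per_trial += c.fraction * c.reward
        time_per_trial += c.fraction * mean_attempts * (overhead_s + c.rt_s)
    return reward_per_trial / time_per_trial


if __name__ == "__main__":
    # Four target locations, one of them highly rewarded (reward sizes assumed).
    congruent = Condition(fraction=0.25, reward=1.0, rt_s=0.150, p_correct=0.997)
    observed_incongruent = Condition(0.75, 0.1, rt_s=0.250, p_correct=0.90)
    ideal_incongruent = Condition(0.75, 0.1, rt_s=0.150, p_correct=0.997)

    observed = reward_rate([congruent, observed_incongruent])
    ideal = reward_rate([congruent, ideal_incongruent])
    print(f"observed: {observed:.3f} reward/s, ideal: {ideal:.3f} reward/s")
```

Because, under these assumptions, the reward for a completed trial does not depend on how quickly or how cleanly it is earned, slower or less accurate incongruent responses only add time, so they can only lower the reward rate.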

However, the observed behavior diverges drastically from this prescription. Saccades in congruent trials, in which the target and the highly rewarded location coincide, are initiated more quickly and are more accurate than those in incongruent trials, in which the target and the highly rewarded location differ. The effects are huge. For example, in our own data (Hauser et al. 2018), the reaction time (RT) went from about 150 ± 25 ms (mean ± standard deviation) in congruent trials to about 250 ± 80 ms in incongruent trials, and the error rate went from virtually zero (99.7% correct) to about 10% incorrect saccades. The monkeys' extraordinary sensitivity to the reward asymmetry also manifests in other, low-level behavioral metrics, such as peak saccade velocity, and in how swiftly the animals respond to changes in the asymmetry over time. When the rewarded location changes, which happens without warning, a single trial suffices for the spatial bias to switch accordingly (when only two locations are used). This rich phenomenology is highly consistent across animals, laboratories, and task variants, and it remains stable for months, even after many thousands of practice trials (Hauser et al. 2018; Hikosaka et al. 2006; Takikawa et al. 2002; Watanabe et al. 2001).

Such behavior runs counter to the expectation based on reward maximization outlined previously. Within the behavioral repertoire discussed by R&D, the spatial bias represents a particularly drastic breakdown of the speed-accuracy tradeoff (sect. 3.4), because one condition (congruent) leads to responses that are both more accurate and much faster than those in the other (incongruent). The 1DR behavior can also be considered a limiting case of a choice task in which different responses have different payoffs (sect. 3.3). Normally, in monkeys, such an asymmetry produces a shift in criterion (Feng et al. 2009; Stanford et al. 2010). In the 1DR task, however, there is no perceptual uncertainty about the correct option, so any such adjustment in criterion is grossly inappropriate. Either way, the underlying “cost function” guiding the behavior must be radically different from those that may be naively construed as optimal.
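One common way to formalize a criterion shift is a race between rise-to-threshold accumulators in which the response with the larger payoff starts closer to threshold. The sketch below applies that idea to the 1DR situation with a single competing response. It is a generic illustration under stated assumptions, not the FEF model of Hauser et al. (2018); the function names and all parameter values (head start `bias`, `drift`, noise `sigma`, afferent delay) are hypothetical and were not fit to any data.

```python
import math
import random
import statistics


def simulate_trial(congruent, rng, bias=0.3, drift=8.0, sigma=1.0,
                   threshold=1.0, afferent_delay_s=0.05, dt=0.001, max_t=1.0):
    """Race between a target accumulator and one competing accumulator.

    Hypothetical assumption: reward expectation gives the accumulator for the
    rewarded location a head start of `bias`. On congruent trials that is the
    target unit; on incongruent trials it is the competitor. Only the target
    unit receives stimulus drive, after an afferent delay.
    Returns (reaction_time_s, correct).
    """
    target = bias if congruent else 0.0
    competitor = 0.0 if congruent else bias
    noise_sd = sigma * math.sqrt(dt)  # Euler-Maruyama noise per time step
    t = 0.0
    while t < max_t:
        t += dt
        drive = drift * dt if t > afferent_delay_s else 0.0
        target += drive + rng.gauss(0.0, noise_sd)
        competitor += rng.gauss(0.0, noise_sd)
        if target >= threshold or competitor >= threshold:
            # The first unit to reach threshold determines the saccade.
            return t, target >= competitor
    return max_t, False  # no saccade within the window: count as an error


def summarize(congruent, n_trials=1000, seed=1):
    """Mean correct RT (s) and error rate for one trial type."""
    rng = random.Random(seed)
    results = [simulate_trial(congruent, rng) for _ in range(n_trials)]
    correct_rts = [rt for rt, ok in results if ok]
    error_rate = 1.0 - len(correct_rts) / n_trials
    return statistics.mean(correct_rts), error_rate


if __name__ == "__main__":
    for label, congruent in (("congruent", True), ("incongruent", False)):
        mean_rt, err = summarize(congruent)
        print(f"{label:>11}: mean RT {1000 * mean_rt:.0f} ms, errors {100 * err:.1f}%")
```

In this toy race, the head start makes congruent saccades faster and less error-prone than incongruent ones even though the stimulus leaves no doubt about where to look, which is one way to picture how a bias that helps under perceptual uncertainty becomes purely costly when that uncertainty is absent.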

It is not difficult to imagine why such a discrepancy arises. The capacity to discriminate and seek rewarding events must be critical for survival, so it is not surprising that reward drives or modulates numerous cognitive processes. In particular, reward expectation is intimately linked to attentional deployment and oculomotor control (Hikosaka et al. 2006; Maunsell 2004; Peck et al. 2009; Preciado et al. 2017). The conditions in the 1DR task likely set up a cognitive trap of sorts – the illusion of a choice – such that the monkeys never cease to strongly prioritize the rewarded location. In essence, they demonstrate persistent wishful thinking.

Regardless, the 1DR paradigm has been extremely useful, even though the behavior it elicits does not adhere to a normative theory. For many years, Hikosaka and colleagues have exploited it to investigate how cognition and motivation interact, seeking to identify and functionally characterize the oculomotor and reward-encoding neural circuits that mediate the biasing effects and their motor expression. Theirs is an impressive research program that has uncovered many such contributions and mechanistic components (e.g., Ding & Hikosaka 2006; Ikeda & Hikosaka 2003; Isoda & Hikosaka 2008; Tachibana & Hikosaka 2012; Takikawa et al. 2004; Yasuda & Hikosaka 2017). In this context, justifying the animals' behavior on the basis of an optimality principle or ideal observer model seems rather unnecessary. Furthermore, in our own laboratory, we recently developed a mechanistic model that replicates the monkeys' RT distributions as well as single-neuron activity in the frontal eye field (FEF) during performance of the 1DR task (Hauser et al. 2018). This model explains the observed behavior in great quantitative detail based on dynamical interactions found in FEF.

In summary, the results in the 1DR task exemplify one of the main conclusions drawn by R&D – that although a normative benchmark may provide useful interpretive guidance in many cases, it is by no means necessary for understanding a particular behavior, or for generating a complete mechanistic description of it.

References

Ding, L. & Hikosaka, O. (2006) Comparison of reward modulation in the frontal eye field and caudate of the macaque. Journal of Neuroscience 26:6695–703.

Feng, S., Holmes, P., Rorie, A. & Newsome, W. T. (2009) Can monkeys choose optimally when faced with noisy stimuli and unequal rewards? PLoS Computational Biology 5(2):e1000284. Available at: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=2631644&tool=pmcentrez&rendertype=abstract.

Hauser, C. K., Zhu, D., Stanford, T. R. & Salinas, E. (2018) Motor selection dynamics in FEF explain the reaction time variance of saccades to single targets. eLife 7:e33456.

Hikosaka, O., Nakamura, K. & Nakahara, H. (2006) Basal ganglia orient eyes to reward. Journal of Neurophysiology 95:567–84.

Ikeda, T. & Hikosaka, O. (2003) Reward-dependent gain and bias of visual responses in primate superior colliculus. Neuron 39:693–700.

Isoda, M. & Hikosaka, O. (2008) A neural correlate of motivational conflict in the superior colliculus of the macaque. Journal of Neurophysiology 100:1332–42.

Lauwereyns, J., Watanabe, K., Coe, B. & Hikosaka, O. (2002) A neural correlate of response bias in monkey caudate nucleus. Nature 418:413–17.

Maunsell, J. H. (2004) Neuronal representations of cognitive state: Reward or attention? Trends in Cognitive Sciences 8:261–65.

Peck, C. J., Jangraw, D. C., Suzuki, M., Efem, R. & Gottlieb, J. (2009) Reward modulates attention independently of action value in posterior parietal cortex. Journal of Neuroscience 29:11182–91.

Preciado, D., Munneke, J. & Theeuwes, J. (2017) Mixed signals: The effect of conflicting reward- and goal-driven biases on selective attention. Attention, Perception, & Psychophysics 79:1297–310.

Stanford, T. R., Shankar, S., Massoglia, D. P., Costello, M. G. & Salinas, E. (2010) Perceptual decision making in less than 30 milliseconds. Nature Neuroscience 13:379–85.

Tachibana, Y. & Hikosaka, O. (2012) The primate ventral pallidum encodes expected reward value and regulates motor action. Neuron 76:826–37.

Takikawa, Y., Kawagoe, R. & Hikosaka, O. (2004) A possible role of midbrain dopamine neurons in short- and long-term adaptation of saccades to position-reward mapping. Journal of Neurophysiology 92:2520–29.

Takikawa, Y., Kawagoe, R., Itoh, H., Nakahara, H. & Hikosaka, O. (2002) Modulation of saccadic eye movements by predicted reward outcome. Experimental Brain Research 142:284–91.

Watanabe, M., Cromwell, H. C., Tremblay, L., Hollerman, J. R., Hikosaka, K. & Schultz, W. (2001) Behavioral reactions reflecting differential reward expectations in monkeys. Experimental Brain Research 140:511–18.

Yasuda, M. & Hikosaka, O. (2017) To wait or not to wait—separate mechanisms in the oculomotor circuit of basal ganglia. Frontiers in Neuroanatomy 11:35.