The conclusions drawn by Rahnev & Denison (R&D) rely on analyses spanning many tasks and experimental conditions in which perceptually guided decisions deviate, for a variety of reasons, from those of an ideal observer model. Indeed, they exhaustively build a convincing argument. But sometimes a single, powerful example can illustrate a general result with great eloquence. That is the case with an elegant paradigm known as the one-direction-rewarded, or 1DR, task.
The 1DR task is deceptively simple (Hauser et al. 2018; Hikosaka et al. 2006; Lauwereyns et al. 2002). The subject (a monkey, in this case) is instructed to perform an elementary action: to look at a lone, clearly visible stimulus. Each trial starts with the monkey briefly fixating on a central spot on an otherwise blank screen. Then the fixation spot disappears, and, at the same time, a target stimulus appears at one of four possible symmetric locations (or one of two locations, depending on the study). The stimulus location varies randomly across trials. The monkey is rewarded for making a quick eye movement, a saccade, to the target – a reaction that is, in fact, quite natural.
But there is a catch. Correct saccades to one location yield a large reward, whereas correct saccades to the other locations yield either a small reward or no reward (this varies by monkey, but, importantly, the results are the same). The rewarded location stays constant for a block of trials. The spatial asymmetry in reward expectation leads to a conflict: The monkey wants to look in one direction but is often instructed to look elsewhere. Nevertheless, all trials must be completed, whether the reward on offer is large or small. There is no strategic advantage to responding differently in one condition compared with the other. Only one alternative is available, so deliberating is unnecessary. To maximize the reward rate, the monkey should look at the target as quickly and as accurately as possible each time, regardless of where it appears.
However, the observed behavior diverges quite drastically from this prescription. Saccades in congruent trials, in which the target and the highly rewarded locations coincide, are initiated more quickly and are more accurate than those in incongruent trials, in which the target and the highly rewarded locations differ. The effects are huge. For example, in our own data (Hauser et al. 2018), we found that the reaction time (RT) went from about 150 ± 25 ms (mean ± standard deviation) in congruent trials to about 250 ± 80 ms in incongruent trials, with the error rate changing from virtually zero (99.7% correct) to about 10% incorrect saccades. The extraordinary sensitivity of the monkeys to reward asymmetry also manifests in other, low-level behavioral metrics, such as the peak saccade velocity, as well as in the swiftness with which the animals respond to changes in the asymmetry over time. When the rewarded location changes, which happens without warning, it takes a single trial for the spatial bias to switch accordingly (when only two locations are used). This rich phenomenology is highly consistent between animals, laboratories, and task variants, and it remains stable for months, even after many thousands of trials of practice (Hauser et al. 2018; Hikosaka et al. 2006; Takikawa et al. 2002; Watanabe et al. 2001).
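The cost of this bias in terms of reward rate can be made concrete with a rough numerical sketch. The mean RTs and accuracies below are the approximate values quoted above; everything else is a hypothetical assumption introduced only for illustration and is not taken from the cited studies: the reward magnitudes (`LARGE`, `SMALL`), the fixed per-trial overhead (`OVERHEAD_S`), four equiprobable target locations, no reward and no repetition after an error, and an "unbiased" comparison policy in which incongruent trials are performed as quickly and accurately as congruent ones.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Condition:
    prob: float       # probability that a trial falls in this condition
    reward: float     # reward for a correct saccade (arbitrary units)
    rt_s: float       # mean reaction time, in seconds
    p_correct: float  # probability of a correct saccade


def reward_rate(conditions, overhead_s):
    """Expected reward per second: E[reward per trial] / E[trial duration]."""
    expected_reward = sum(c.prob * c.p_correct * c.reward for c in conditions)
    expected_duration = overhead_s + sum(c.prob * c.rt_s for c in conditions)
    return expected_reward / expected_duration


# Hypothetical task parameters (assumptions, not values from the source).
LARGE, SMALL = 1.0, 0.2   # reward sizes, arbitrary units
OVERHEAD_S = 2.0          # fixation, reward delivery, inter-trial interval
P_CONGRUENT = 0.25        # four equiprobable target locations

# Approximate means quoted in the text: 150 ms / 99.7% vs. 250 ms / 90%.
observed = [
    Condition(P_CONGRUENT, LARGE, 0.150, 0.997),
    Condition(1 - P_CONGRUENT, SMALL, 0.250, 0.90),
]

# Hypothetical unbiased policy: every trial handled like a congruent one.
unbiased = [
    Condition(P_CONGRUENT, LARGE, 0.150, 0.997),
    Condition(1 - P_CONGRUENT, SMALL, 0.150, 0.997),
]

rate_observed = reward_rate(observed, OVERHEAD_S)
rate_unbiased = reward_rate(unbiased, OVERHEAD_S)

print(f"observed (biased) policy: {rate_observed:.4f} reward units/s")
print(f"unbiased policy:          {rate_unbiased:.4f} reward units/s")
```

Under these assumptions the unbiased policy yields the higher rate, and the ordering does not depend on the particular numbers chosen: for any non-negative small reward and any overhead, the unbiased policy collects at least as much expected reward per trial in less expected time.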
Such behavior runs counter to the expectation based on reward maximization, as outlined previously. Within the behavioral repertoire discussed by R&D, the spatial bias represents a particularly drastic breakdown of the speed-accuracy tradeoff (sect. 3.4), because one condition (congruent) leads to more accurate and much faster responses than the other (incongruent). The 1DR behavior can also be considered as a limit case of a choice task in which different responses have different payoffs (sect. 3.3). Normally, in monkeys, such asymmetry produces a shift in criterion (Feng et al. 2009; Stanford et al. 2010). Here, the perceptual uncertainty about the right option is eliminated, and the adjustment in criterion is grossly inappropriate. Either way, the underlying “cost function” guiding the behavior must be radically different from those that may be naively construed as optimal.
It is not difficult to imagine why such a discrepancy arises. The capacity to discriminate and seek rewarding events must be critical for survival, so it is not surprising that reward drives or modulates numerous cognitive processes. In particular, reward expectation is intimately linked to attentional deployment and oculomotor control (Hikosaka et al. 2006; Maunsell 2004; Peck et al. 2009; Preciado et al. 2017). The conditions in the 1DR task likely set up a cognitive trap of sorts – the illusion of a choice – such that the monkeys never cease to strongly prioritize the rewarded location. In essence, they demonstrate persistent wishful thinking.
Regardless, the 1DR paradigm has been extremely useful, even though it does not adhere to a normative theory. For many years, Hikosaka and colleagues have exploited it to investigate how cognition and motivation interact, seeking to identify and functionally characterize the oculomotor and reward-encoding neural circuits that mediate the biasing effects and their motor expression. Theirs is an impressive research program that has uncovered many such contributions and mechanistic components (e.g., Ding & Hikosaka 2006; Ikeda & Hikosaka 2003; Isoda & Hikosaka 2008; Tachibana & Hikosaka 2012; Takikawa et al. 2004; Yasuda & Hikosaka 2017). In this context, justifying the animals’ behavior on the basis of an optimality principle or ideal observer model seems rather unnecessary. Furthermore, in our own laboratory, we recently developed a mechanistic model that replicates the monkeys’ RT distributions as well as single-neuron activity in the frontal eye field (FEF) during performance of the 1DR task (Hauser et al. 2018). This model explains the observed behavior in great quantitative detail based on dynamical interactions found in FEF.
In summary, the results in the 1DR task exemplify one of the main conclusions drawn by R&D – that although a normative benchmark may provide useful interpretive guidance in many cases, it is by no means necessary for understanding a particular behavior, or for generating a complete mechanistic description of it.