Anselme & Güntürkün (A&G) propose a new mechanism, “incentive hope,” inspired by both behavioral science and neuroscience, to explain how exposure to reward uncertainty could lead to increased reward-seeking effort and reward consumption. We agree with the authors that it is crucial to identify common motivational processes that bridge learning, decision making, and foraging behaviors. However, we cautiously suggest that their explanation, although sufficient, is not necessary. In other words, other motivating factors, not necessarily mutually exclusive with incentive hope, could lead to the same behavior.
In one-shot choices, animals should maximize expected reward, but in repeated gambles, animals ought to maximize long-term intake; this means actively sampling risky options when there is an opportunity to learn more about reward distributions (Daw et al. 2006; Pearson et al. 2009). Multiple factors contribute to uncertainty, including stochastic variation in prey quality, variation in the spatial and temporal distribution of prey (patchiness, density, etc.), and the rate of change of environmental factors (volatility) (Stephens & Krebs 1986). Therefore, after a fresh encounter with prey (or a failure to find expected prey), animals must assign credit; that is, they must decide which factor(s) contributed to the experience in order to update and infer (learn) a mental model of the environment and make predictions that guide future behavior (Noonan et al. 2010; Rushworth et al. 2011; Walton et al. 2010).
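The credit-assignment idea above can be illustrated with a minimal sketch: an estimator of prey-encounter probability that discounts old evidence so it can track a volatile environment. The decay rate, priors, and encounter sequence below are hypothetical illustrations, not a model proposed by A&G or by us.

```python
# Minimal sketch: tracking an uncertain prey-encounter rate in a volatile
# environment with a beta-Bernoulli estimator plus exponential forgetting.
# All parameter values (decay, priors, outcomes) are hypothetical.

def update_belief(alpha, beta, outcome, decay=0.95):
    """Update pseudo-counts after one encounter (1 = prey found, 0 = not).

    The decay discounts old evidence, so the estimate tracks a drifting
    (volatile) environment rather than the long-run historical average.
    """
    alpha = decay * alpha + outcome
    beta = decay * beta + (1 - outcome)
    return alpha, beta

alpha, beta = 1.0, 1.0  # uniform prior over encounter probability
for outcome in [1, 1, 0, 1, 0, 0, 0]:  # a short, hypothetical run of encounters
    alpha, beta = update_belief(alpha, beta, outcome)

mean = alpha / (alpha + beta)        # current estimate of encounter rate
uncertainty = 1.0 / (alpha + beta)   # shrinks as (recent) evidence accumulates
```

Because recent failures are weighted more heavily than older successes, the estimate falls quickly when the environment appears to have changed, which is the behavioral signature of crediting volatility rather than noise.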
Building a mental model of the environment yields more successful decisions in the long run, but it requires sampling to gather information. Information sampling often comes in the guise of locally suboptimal decisions, such as exploration (Daw et al. 2006; Pearson et al. 2009). Numerous species can exploit complex environments rich with uncertainty (e.g., Bateson & Kacelnik 1997; Blanchard et al. 2014; De Petrillo et al. 2015; Kacelnik & Bateson 1997). Many organisms will forgo primary reward to seek information that provides strategic benefits or satisfies curiosity (Blanchard et al. 2015a; Bromberg-Martin et al. 2010; Kidd & Hayden 2015; Wang & Hayden 2018). Animals also appear to be aware of their own uncertainty; their proclivity to sample information is affected by the ambiguity of that information and by their confidence in their current estimates (Kiani & Shadlen 2009; Kornell et al. 2007; Pouget et al. 2016).
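One standard formalization of this locally suboptimal sampling is an uncertainty bonus, as in the UCB1 bandit rule: options sampled less often receive a bonus, so the forager sometimes passes up the currently best-looking option in order to reduce uncertainty. This is a generic textbook sketch, not the specific mechanism debated here, and the payoff probabilities are hypothetical.

```python
import math
import random

def ucb_choice(counts, means, t, c=2.0):
    """Pick the option maximizing estimated value + uncertainty bonus (UCB1)."""
    best, best_score = 0, float("-inf")
    for i, (n, m) in enumerate(zip(counts, means)):
        if n == 0:
            return i  # sample every option at least once
        score = m + c * math.sqrt(math.log(t) / n)  # bonus shrinks with sampling
        if score > best_score:
            best, best_score = i, score
    return best

random.seed(0)
true_p = [0.6, 0.4]          # hypothetical reward probabilities of two options
counts = [0, 0]
means = [0.0, 0.0]
for t in range(1, 201):
    i = ucb_choice(counts, means, t)
    r = 1.0 if random.random() < true_p[i] else 0.0
    counts[i] += 1
    means[i] += (r - means[i]) / counts[i]  # incremental mean update
# Even the worse option keeps being sampled: information itself has value.
```

The point of the sketch is that continued engagement with an inferior or uncertain option falls directly out of a value-of-information rule, with no appeal to motivational excitement.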
This information-seeking perspective offers an alternative explanation for some of the behaviors that incentive hope explains. For example, A&G argue that sign-tracking behavior is motivated by incentive hope. Alternatively, sign tracking could reflect incomplete learning (a mental model of the environment still under construction, mid-transition from model-free to model-based control), high ambiguity of new reward information, and/or low confidence in the current mental model, whereas goal tracking could reflect complete learning (model-based control), low ambiguity of new reward information, and/or high confidence in the current mental model. In other words, the heightened reward-seeking behavior that A&G attribute to incentive hope could instead reflect sampling of information, either to form a better estimate of uncertain rewards or to build a mental model of the task or foraging environment for use in model-based reinforcement learning.
A&G use several findings about ventral tegmental area (VTA) dopamine neurons to make reverse inferences about the motivations behind behavior. However, recent empirical and theoretical work on dopamine responses suggests that dopamine neurons interact with a vast network of striatal, orbitofrontal, and prefrontal regions that play a critical role in learning and that represent task state to build a mental model of the current task and guide behavior (Abe & Lee 2011; Behrens et al. 2007; Blanchard et al. 2015a; Boorman et al. 2011; Bromberg-Martin et al. 2010; Chang et al. 2017; Gershman & Schoenbaum 2017; Hayden et al. 2009; Langdon et al. 2018; Sadacca et al. 2016; Takahashi et al. 2017; Wang & Hayden 2017). This means that the dopamine response the authors describe as reflecting the motivational power of incentive hope could instead reflect a learning signal that differentiates the various environmental sources of reward uncertainty (cf. Bromberg-Martin et al. 2010).
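The learning-signal reading can be made concrete with the classic delta-rule (reward-prediction-error) account: under probabilistic reward, the learned value converges toward the expected value, yet prediction errors persist on every trial because outcomes remain unpredictable. A sustained dopamine-like signal under uncertainty therefore need not index motivation per se. The learning rate, reward probability, and trial count below are hypothetical.

```python
import random

# Minimal sketch of a reward-prediction-error (delta-rule) learner under
# 50% reward uncertainty. Parameters are hypothetical illustrations.
random.seed(1)
lr = 0.1             # learning rate
V = 0.0              # learned value of the reward-predicting cue
errors = []
for trial in range(500):
    r = 1.0 if random.random() < 0.5 else 0.0  # uncertain (50%) reward
    delta = r - V       # prediction error: the dopamine-like learning signal
    V += lr * delta
    errors.append(delta)

# After learning, V hovers near the expected value (~0.5), while |delta|
# stays large on each trial because individual outcomes are unpredictable.
```

This is the standard Rescorla-Wagner/temporal-difference picture rather than A&G's proposal; we include it only to show that persistent phasic responses under uncertainty are exactly what an error-driven learning signal predicts.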
To be clear, we do not believe that all ostensible cases of incentive hope are really just cases of information seeking; we think the explanation proposed by A&G is often correct. Rather, we argue that neuroscientists need to move toward behaviorally rich but well-controlled tasks to disambiguate factors that are intractable in traditional low-dimensional lab tasks (Kaplan et al. 2017; Pitkow & Angelaki 2017; Schonberg et al. 2011). Incentive hope and information seeking, for example, are hard to isolate and manipulate in traditional lab paradigms, yet an at least somewhat controlled laboratory environment remains necessary to limit confounding variables.
Thus, we propose that a valuable path forward will be seminaturalistic, foraging-like decision paradigms. Careful analysis of behavior, guided by the tenets of foraging theory, can separate these various motivational factors (Blanchard et al. 2015b; Calhoun & Hayden 2015; Hayden 2018; Mobbs et al. 2018). Such tasks can reveal deeper coding of task variables than conventional tasks and analyses can (e.g., Killian et al. 2012; Musall et al. 2018; Strait et al. 2016; Wirth et al. 2017). It is likely that large-scale, multiple-unit recordings will be necessary to capture the rare occurrences of cognitively important events. We are especially sanguine about virtual reality settings, as these allow both full controllability and relatively immersive task environments (Kaplan et al. 2017; Minderer & Harvey 2016). Such tasks can support independent manipulation of a task's or foraging environment's richness, reward density, and patchiness, as well as the ambiguity and volatility of uncertain rewards.