Anselme & Güntürkün (A&G) propose a new mechanism, “incentive hope,” inspired by both behavioral science and neuroscience, to explain how exposure to reward uncertainty could lead to increased reward-seeking effort and reward consumption. We agree with the authors that it is crucial to identify common motivational processes that bridge learning, decision making, and foraging behaviors. However, we cautiously suggest that their explanation, although sufficient, is not necessary. In other words, other motivating factors, not necessarily mutually exclusive with incentive hope, could lead to the same behavior.
In one-shot choices, animals should maximize expected reward, but in repeated gambles, animals ought to maximize long-term intake; this means actively sampling risky options when there is an opportunity to learn more about reward distributions (Daw et al. 2006; Pearson et al. 2009). Multiple factors contribute to uncertainty, including stochastic variation in prey quality, variation in the spatial and temporal distribution of prey (patchiness, density, etc.), and the rate of change of environmental factors (volatility) (Stephens & Krebs 1986). Therefore, after a fresh encounter with prey (or a failure to find expected prey), animals must assign credit; that is, they must decide which factor(s) contributed to the experience in order to update and infer (learn) a mental model of the environment and make predictions that guide future behavior (Noonan et al. 2010; Rushworth et al. 2011; Walton et al. 2010).
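The credit-assignment idea above can be illustrated with a minimal sketch: an estimator of prey-encounter probability that discounts old evidence so it can track a volatile environment. The decay rate, priors, and encounter sequence below are hypothetical illustrations, not a model proposed by A&G or by us.

```python
# Minimal sketch: tracking an uncertain prey-encounter rate in a volatile
# environment with a beta-Bernoulli estimator plus exponential forgetting.
# All parameter values (decay, priors, outcomes) are hypothetical.

def update_belief(alpha, beta, outcome, decay=0.95):
    """Update pseudo-counts after one encounter (1 = prey found, 0 = not).

    The decay discounts old evidence, so the estimate tracks a drifting
    (volatile) environment rather than the long-run historical average.
    """
    alpha = decay * alpha + outcome
    beta = decay * beta + (1 - outcome)
    return alpha, beta

alpha, beta = 1.0, 1.0  # uniform prior over encounter probability
for outcome in [1, 1, 0, 1, 0, 0, 0]:  # a short, hypothetical run of encounters
    alpha, beta = update_belief(alpha, beta, outcome)

mean = alpha / (alpha + beta)        # current estimate of encounter rate
uncertainty = 1.0 / (alpha + beta)   # shrinks as (recent) evidence accumulates
```

Because recent failures are weighted more heavily than older successes, the estimate falls quickly when the environment appears to have changed, which is the behavioral signature of crediting volatility rather than noise.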
Building a mental model of the environment yields more successful decisions in the long run, but it requires sampling to gather information. Information sampling often comes in the guise of locally suboptimal decisions, such as exploration (Daw et al. 2006; Pearson et al. 2009). Numerous species can exploit complex environments rich with uncertainty (e.g., Bateson & Kacelnik 1997; Blanchard et al. 2014; De Petrillo et al. 2015; Kacelnik & Bateson 1997). Many organisms will forgo primary reward to seek information that provides strategic benefits or satisfies curiosity (Blanchard et al. 2015a; Bromberg-Martin et al. 2010; Kidd & Hayden 2015; Wang & Hayden 2018). Animals also appear to be aware of their own uncertainty; their proclivity to sample information is affected by the ambiguity of that information and by their confidence in their current estimates (Kiani & Shadlen 2009; Kornell et al. 2007; Pouget et al. 2016).
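One standard formalization of this locally suboptimal sampling is an uncertainty bonus, as in the UCB1 bandit rule: options sampled less often receive a bonus, so the forager sometimes passes up the currently best-looking option in order to reduce uncertainty. This is a generic textbook sketch, not the specific mechanism debated here, and the payoff probabilities are hypothetical.

```python
import math
import random

def ucb_choice(counts, means, t, c=2.0):
    """Pick the option maximizing estimated value + uncertainty bonus (UCB1)."""
    best, best_score = 0, float("-inf")
    for i, (n, m) in enumerate(zip(counts, means)):
        if n == 0:
            return i  # sample every option at least once
        score = m + c * math.sqrt(math.log(t) / n)  # bonus shrinks with sampling
        if score > best_score:
            best, best_score = i, score
    return best

random.seed(0)
true_p = [0.6, 0.4]          # hypothetical reward probabilities of two options
counts = [0, 0]
means = [0.0, 0.0]
for t in range(1, 201):
    i = ucb_choice(counts, means, t)
    r = 1.0 if random.random() < true_p[i] else 0.0
    counts[i] += 1
    means[i] += (r - means[i]) / counts[i]  # incremental mean update
# Even the worse option keeps being sampled: information itself has value.
```

The point of the sketch is that continued engagement with an inferior or uncertain option falls directly out of a value-of-information rule, with no appeal to motivational excitement.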
This information-seeking perspective offers an alternative explanation for some of the behaviors that incentive hope explains. For example, A&G argue that sign-tracking behavior is motivated by incentive hope. Alternatively, sign tracking could reflect incomplete learning (a mental model of the environment still under construction, mid-transition from model-free to model-based control), high ambiguity of new reward information, and/or low confidence in the current mental model, whereas goal tracking could reflect complete learning (model-based control), low ambiguity of new reward information, and/or high confidence in the current mental model. In other words, the heightened reward-seeking behavior that A&G attribute to incentive hope could instead reflect sampling of information, either to form a better estimate of uncertain rewards or to build a mental model of the task or foraging environment for use in model-based reinforcement learning.
A&G use several findings about ventral tegmental area (VTA) dopamine neurons to make reverse inferences about the motivations behind behavior. However, recent empirical and theoretical work on dopamine responses suggests that dopamine neurons interact with a vast network of striatal, orbitofrontal, and prefrontal regions that play a critical role in learning and that represent task state to build a mental model of the current task and guide behavior (Abe & Lee 2011; Behrens et al. 2007; Blanchard et al. 2015a; Boorman et al. 2011; Bromberg-Martin et al. 2010; Chang et al. 2017; Gershman & Schoenbaum 2017; Hayden et al. 2009; Langdon et al. 2018; Sadacca et al. 2016; Takahashi et al. 2017; Wang & Hayden 2017). This means that the dopamine response the authors describe as reflecting the motivational power of incentive hope could instead reflect a learning signal that differentiates the various environmental sources of reward uncertainty (cf. Bromberg-Martin et al. 2010).
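The learning-signal reading can be made concrete with the classic delta-rule (reward-prediction-error) account: under probabilistic reward, the learned value converges toward the expected value, yet prediction errors persist on every trial because outcomes remain unpredictable. A sustained dopamine-like signal under uncertainty therefore need not index motivation per se. The learning rate, reward probability, and trial count below are hypothetical.

```python
import random

# Minimal sketch of a reward-prediction-error (delta-rule) learner under
# 50% reward uncertainty. Parameters are hypothetical illustrations.
random.seed(1)
lr = 0.1             # learning rate
V = 0.0              # learned value of the reward-predicting cue
errors = []
for trial in range(500):
    r = 1.0 if random.random() < 0.5 else 0.0  # uncertain (50%) reward
    delta = r - V       # prediction error: the dopamine-like learning signal
    V += lr * delta
    errors.append(delta)

# After learning, V hovers near the expected value (~0.5), while |delta|
# stays large on each trial because individual outcomes are unpredictable.
```

This is the standard Rescorla-Wagner/temporal-difference picture rather than A&G's proposal; we include it only to show that persistent phasic responses under uncertainty are exactly what an error-driven learning signal predicts.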
To be clear, we do not believe that all ostensible cases of incentive hope are really just cases of information seeking; we think the explanation proposed by A&G is often correct. Rather, we argue that neuroscientists need to move toward behaviorally rich but well-controlled tasks to disambiguate factors that are intractable in traditional low-dimensional lab tasks (Kaplan et al. 2017; Pitkow & Angelaki 2017; Schonberg et al. 2011). Incentive hope and information seeking, for example, are hard to isolate and manipulate in traditional lab paradigms, yet an at least somewhat controlled laboratory environment remains necessary to limit confounding variables.
Thus, we propose that a valuable path forward will be seminaturalistic, foraging-like decision paradigms. Careful analysis of behavior, guided by the tenets of foraging theory, can separate these various motivational factors (Blanchard et al. 2015b; Calhoun & Hayden 2015; Hayden 2018; Mobbs et al. 2018). Such tasks can reveal deeper coding of task variables than conventional tasks and analyses can (e.g., Killian et al. 2012; Musall et al. 2018; Strait et al. 2016; Wirth et al. 2017). It is likely that large-scale, multiple-unit recordings will be necessary to capture the rare occurrences of cognitively important events. We are especially sanguine about virtual reality settings, as these allow both full controllability and relatively immersive task environments (Kaplan et al. 2017; Minderer & Harvey 2016). Such tasks can support independent manipulation of a task's or foraging environment's richness, reward density, and patchiness, as well as the ambiguity and volatility of uncertain rewards.