Imagine you are offered 1 million euros to work for a year on an extremely high-stakes job in a remote location. Although this may sound like a worthwhile reward, it entails spending a long time away from loved ones in a high-pressure, stressful environment. Despite the considerable rewards of this “once-in-a-lifetime” opportunity, you may pass on it because of its high costs. How did you make this decision?
Scientists across multiple fields, from psychology to neuroscience to robotics, have put a lot of effort into the challenge of defining motivation and its workings (ironically, one might add). However, as Murayama and Jach rightly emphasize, there is a pressing need for studies of motivation to move beyond the “black-box” approach and to provide more precise definition, quantification, and implementation of the many concepts that have been associated with motivation. We wholeheartedly support this constructivist view and contend that it applies not only to the rewards but also to the costs of a situation.
A fundamental principle of decision-making in economics, neuroscience, and psychology is that individuals generate adaptive behavior by making trade-offs between the benefits and costs of alternative options (Camerer, 2008; Silvestrini, Musslick, Berry, & Vassena, 2023; Silvetti, Vassena, Abrahamse, & Verguts, 2018; Westbrook & Braver, 2015). Illustrating this principle, Motivational Intensity Theory, a classic theory of motivation (Brehm & Self, 1989; Silvestrini, 2017; Silvestrini et al., 2023), posits that effort investment is proportional to the importance of the outcome and the difficulty of the task. This implies a trade-off whereby individuals discount the benefits by the costs entailed in obtaining them and, consequently, aim to minimize effort by selectively boosting it for a sufficiently valuable goal.
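To make this trade-off concrete, consider the minimal sketch below (in Python). The functional forms – a logistic success probability and a quadratic effort cost – are illustrative assumptions of ours rather than the formal statement of Motivational Intensity Theory; the point is only that effort is boosted when, and only when, the outcome is valuable enough to justify it.

```python
import numpy as np

# Minimal sketch of a cost-benefit effort choice. The functional forms
# (logistic success probability, quadratic effort cost) are illustrative
# assumptions, not the formal statement of Motivational Intensity Theory.

def success_probability(effort, difficulty):
    """Harder tasks require more effort to reach the same probability of success."""
    return 1.0 / (1.0 + np.exp(-(effort - difficulty)))

def net_value(effort, outcome_value, difficulty, cost_weight=1.0):
    """Expected benefit of succeeding, discounted by the cost of the effort invested."""
    return success_probability(effort, difficulty) * outcome_value - cost_weight * effort ** 2

def chosen_effort(outcome_value, difficulty, candidate_efforts=np.linspace(0.0, 5.0, 501)):
    """Pick the effort level with the highest net value."""
    values = [net_value(e, outcome_value, difficulty) for e in candidate_efforts]
    return candidate_efforts[int(np.argmax(values))]

# Effort is boosted only when the outcome is valuable enough to justify it.
print(chosen_effort(outcome_value=2.0, difficulty=2.0))   # low effort: not worth it
print(chosen_effort(outcome_value=20.0, difficulty=2.0))  # high effort: sufficiently valuable goal
```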
In the last decade, the idea of cost-benefit trade-offs was successfully married to the framework of reinforcement learning to explain how experience can drive learning of rewards as well as of costs (Sutton & Barto, 1998; Verguts, Vassena, & Silvetti, 2015). In reinforcement learning, expectations are updated whenever an outcome is better or worse than expected – that is, whenever a prediction error occurs. Importantly, by applying this learning to both costs and rewards, reinforcement learning can explain how decision-makers learn not only which actions lead to rewards, but also how much effort they need to exert to obtain the reward and how likely the reward is to arrive once the action is completed. However, it is important to note that learning to optimize the effort involved in a task entails not only the monitoring of external rewards, but also a meta-learning mechanism that monitors and regulates the decision-maker's own internal state. This is because the effort involved in a task depends critically on the internal computations entailed in performing it – such as the difficulty of attending to relevant features, planning ahead, and/or generating vigorous actions – as well as on the decision-maker's levels of fatigue and arousal (Bijleveld, 2023; Dora, van Hooff, Geurts, Kompier, & Bijleveld, 2022; Matthews et al., 2023; Müller, Klein-Flügge, Manohar, Husain, & Apps, 2021). In turn, selecting the best decision strategy also requires tracking more complex features of the environment, such as volatility (i.e., how stable the environment is), the average reward rate (i.e., how much reward is available in a given context), or the opportunity cost of time (i.e., whether time on this particular task is well spent or would be better allocated to an alternative task) (Kurzban, Duckworth, Kable, & Myers, 2013). In this framework, a decision-maker can adapt its decisions to the context, for example, by learning to be more flexible in a volatile situation, or by learning to exert more effort to obtain rewards in a favorable, reward-rich environment.
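A hedged sketch of this learning scheme is given below: simple delta-rule updates maintain separate estimates of the reward and the effort cost of each action, and choices follow the net (reward minus cost) estimate. The environment, parameters, and exploration rule are assumptions made for exposition, not the implementation of any of the cited models; a full meta-learner would additionally adapt quantities such as the learning rate to the volatility of the environment.

```python
import numpy as np

# Illustrative delta-rule learner (an assumption-laden sketch, not any of the
# cited models): separate running estimates of the reward and the effort cost
# of each action are updated from prediction errors, and choices follow the
# net (reward minus cost) estimate.

rng = np.random.default_rng(0)
n_actions = 2
expected_reward = np.zeros(n_actions)
expected_cost = np.zeros(n_actions)
alpha = 0.1  # learning rate; a meta-learner could itself adapt this to volatility

# Hypothetical environment: action 1 pays more but also demands more effort.
true_reward = np.array([1.0, 2.0])
true_cost = np.array([0.2, 1.5])

for trial in range(500):
    net_value = expected_reward - expected_cost
    # Mostly exploit the best net value, occasionally explore.
    action = int(np.argmax(net_value)) if rng.random() > 0.1 else int(rng.integers(n_actions))
    reward = true_reward[action] + rng.normal(0, 0.1)
    cost = true_cost[action] + rng.normal(0, 0.1)
    expected_reward[action] += alpha * (reward - expected_reward[action])  # reward prediction error
    expected_cost[action] += alpha * (cost - expected_cost[action])        # effort-cost prediction error

print(expected_reward - expected_cost)  # learned net values approach the true values (~0.8 vs ~0.5)
```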
A promising neuro-computational model of meta-level motivated behavior is the Reinforcement Meta-Learner (RML) developed by Silvetti et al. (2018), which situates mathematically precise computations of meta-learning value and costs within biologically plausible neural circuits. The RML postulates that the dorsal anterior cingulate cortex receives dopaminergic inputs conveying the rate of rewards in a task and, upon perceiving a decline in rewards (a “need for control”), calls for a boost of noradrenaline from the locus coeruleus to enhance the efficiency of cognitive computations. However, a noradrenaline boost is perceived as a cost and the system learns through experience to choose the level of boost that maximizes rewards while minimizing the cost. This multi-level cost-benefit optimization allows a remarkable level of cross-validation and falsification across tasks, contexts, and modalities. For example, the RML can capture trade-offs in motivated behavior in the context of working memory, physical effort, or attentional effort driven by the need to gain information (Silvetti, Lasaponara, Daddaoua, Horan, & Gottlieb, 2023; Silvetti et al., 2018). The RML also reproduces the sensitivity to reward volatility, producing higher learning rates in volatile relative to stable environments – that is, specifically when quickly updating beliefs is beneficial given the situation at hand (Silvetti, Seurinck, & Verguts, 2011, 2013). Finally, the RML conceptually squares with intriguing work in the motivation literature on persistence and giving up (goal disengagement; Gollwitzer, 2018; Kappes & Schattke, 2022), highlighting its ability to optimize effort exertion over longer time scales.
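The sketch below caricatures this boost-selection mechanism as a simple value learner over discrete boost levels: higher boost improves performance but carries an intrinsic cost, and the agent learns from experience which boost level yields the best reward-minus-cost outcome. The equations, parameters, and simulated environment are our own illustrative assumptions and should not be read as the RML equations of Silvetti et al. (2018).

```python
import numpy as np

# A caricature of boost selection (not the actual RML equations): each trial
# the agent picks a noradrenergic "boost" level, performance improves with
# boost, but boost also carries an intrinsic cost; a prediction-error update
# then teaches the agent which boost level pays off best on balance.
# All functional forms and parameters are illustrative assumptions.

rng = np.random.default_rng(1)
boost_levels = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
q_boost = np.zeros(len(boost_levels))  # learned net value of each boost level
alpha, boost_cost = 0.1, 0.4

for trial in range(2000):
    # Mostly pick the currently best boost level, occasionally explore.
    i = int(np.argmax(q_boost)) if rng.random() > 0.1 else int(rng.integers(len(boost_levels)))
    p_success = 1.0 - np.exp(-1.5 * boost_levels[i])         # performance saturates with boost
    outcome = float(rng.random() < p_success) - boost_cost * boost_levels[i]
    q_boost[i] += alpha * (outcome - q_boost[i])             # prediction-error update

print(boost_levels[int(np.argmax(q_boost))])  # the learner settles on an intermediate boost
```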
The RML thus offers a mathematically precise computation of the subjective benefits and costs involved in a task and implements this computation in a biologically plausible circuit. Because of its biological plausibility, the RML generates testable (falsifiable) predictions about brain and behavior, which are less prone to typical pitfalls of verbal predictions such as oversimplification and lack of specificity. Crucially, the RML can be used to simulate the effects of impairments of the system on motivation. Motivational impairments are consistently observed across neuropsychiatric disorders (Caligiore, Silvetti, D'Amelio, Puglisi-Allegra, & Baldassarre, 2020; Husain & Roiser, 2018; Silvetti, Baldassarre, & Caligiore, 2019). A mechanistic understanding of the impaired computations may reveal dissociable underlying disease profiles that are virtually indistinguishable at the surface symptom level, suggesting that motivation – if properly situated and specified – may be the key to capturing clinically relevant phenotypes.
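As an illustration of this last point – and again as an assumption-laden sketch rather than an RML simulation – the toy learner from the previous example can be parameterized so that two distinct “impairments,” a blunted reward signal and an inflated boost cost, converge on the same surface phenotype of persistently low effort investment.

```python
import numpy as np

# Illustrative follow-up to the boost-selection sketch above: two hypothetical
# "impairments" - a blunted reward signal versus an inflated boost cost - can
# converge on the same surface phenotype (persistently low boost), although
# the impaired computation differs. All parameters are illustrative assumptions.

boost_levels = np.array([0.0, 0.5, 1.0, 1.5, 2.0])

def learned_boost(reward_gain=1.0, boost_cost=0.4, alpha=0.1, trials=2000, seed=2):
    rng = np.random.default_rng(seed)
    q = np.zeros(len(boost_levels))  # learned net value of each boost level
    for _ in range(trials):
        i = int(np.argmax(q)) if rng.random() > 0.1 else int(rng.integers(len(boost_levels)))
        p_success = 1.0 - np.exp(-1.5 * boost_levels[i])  # performance improves with boost
        outcome = reward_gain * float(rng.random() < p_success) - boost_cost * boost_levels[i]
        q[i] += alpha * (outcome - q[i])                  # prediction-error update
    return boost_levels[int(np.argmax(q))]

print(learned_boost())                      # "intact" system: settles on an intermediate boost
print(learned_boost(reward_gain=0.3))       # blunted reward signal: boost collapses to low values
print(learned_boost(boost_cost=1.2))        # inflated effort cost: the same low-boost phenotype
```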
In sum, motivational constructs can be characterized as emergent phenomena that stem from the dynamic optimization of a cost-benefit trade-off over many decision-relevant variables within a reinforcement meta-learning framework (i.e., multivariate dynamic optimization). The meta-learning dimension captures not only momentary trade-offs but also explains how we flexibly adapt to our environment while taking our internal states into account. The meta-reinforcement learning framework (as implemented by the RML model) thus dismisses the “motivational homunculus” in favor of a highly integrated, situated neurocomputational solution whose building blocks are constructed from existing, validated psychological and neurobiological knowledge (Silvetti et al., 2018, 2023), which constitutes a significant advance toward the constructivist view advocated by M&J.
Financial support
Eliana Vassena was supported by an Open Competition Xs grant (NWO 406.XS.04.129) of the Netherlands Organisation for Scientific Research (NWO).
Competing interest
None.