The authors propose a promising link between social/cognitive and behavioral approaches to choice by arguing that the course of purely mental processes is determined by the same kind of reward that governs external goal pursuit (citing Murayama, 2022). This conclusion is acceptable to any behaviorist who is not still bound by the old Skinnerian dictum against dealing with mental processes. It perhaps marks the falling away of the last theoretical barrier between the two approaches. The target article does not pursue a remaining holdover, its odd distinction between motivation and reward, but rather calls for moving on to find “the type of information that is perceived as rewarding” (target article, sect. 4.3).
Certainly Figure 2 depicts a perfect reward-learning model. The authors argue that the various high-level goals described in social/cognitive psychology are not natural types but rather clusters of related outcomes. They are not elementary variables but “black boxes” (e.g., target article, sect. 3.1), whose building blocks are computed values. Thus, “priority should be given to understanding the underlying computational mechanisms” (target article, sect. 4.3). Behavioral reward theory has already provided candidates for such building blocks by analyzing the most elementary computations as sequences of binary choices. At least with external rewards, brain recordings show rehearsal of the sequences leading up to choices – vicarious trial and error – that looks like human deliberation (Redish, 2016). As with a computer, such binary choices should be able to form high-order processes of great complexity.
As for what makes information rewarding, the authors settle on curiosity that is induced by awareness of “information gaps” (target article, sect. 4.2) which lead to the experiences of “novelty, uncertainty, conflict, complexity, etc.” (Fig. 2); but even these will require some fleshing out to become “basic building blocks of rewarding value” (target article, sect. 5.2). Here is where a behavioral approach can make further contributions. Whatever the reward is for narrowing the knowledge gap or for satisfying curiosity, it should (a) perform like rewards that have been studied in other contexts; (b) have a variable effect over a time course; and (c) depend on some kind of appetite.
(a) As an example of intrinsic reward performing like other kinds: Opportunities to add up small numbers can be shown to reward human subjects' attention-paying responses, so that these responses follow Herrnstein's matching law, despite the absence of any physical behavior and of any feedback about getting the answers right (Heyman & Moncaleano, 2021).
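To make the matching relation in (a) concrete, a minimal Python sketch follows. Strict matching states that the share of responses allocated to an alternative equals that alternative's share of obtained reward, B1/(B1 + B2) = R1/(R1 + R2). The function names, the generalized form with its sensitivity and bias parameters, and the example rates are illustrative assumptions; they are not values or procedures taken from Heyman and Moncaleano's study, and the sketch predicts only how responding is divided, not what counts as an attention-paying response.

```python
def matching_share(r1: float, r2: float) -> float:
    """Strict matching: predicted share of responses to alternative 1,
    B1 / (B1 + B2) = R1 / (R1 + R2)."""
    if r1 < 0 or r2 < 0 or r1 + r2 == 0:
        raise ValueError("reward rates must be non-negative and not both zero")
    return r1 / (r1 + r2)


def generalized_matching_ratio(r1: float, r2: float,
                               sensitivity: float = 1.0, bias: float = 1.0) -> float:
    """Generalized matching: B1 / B2 = bias * (R1 / R2) ** sensitivity.
    With sensitivity = bias = 1 this reduces to strict matching."""
    if r1 <= 0 or r2 <= 0:
        raise ValueError("reward rates must be positive")
    return bias * (r1 / r2) ** sensitivity


if __name__ == "__main__":
    # Hypothetical rates: 30 vs. 10 opportunities to add numbers per hour.
    print(matching_share(30, 10))               # 0.75
    print(generalized_matching_ratio(30, 10))   # 3.0
```

A sensitivity below 1 would describe "undermatching", in which responding is less extreme than the reward ratio; the strict form is the one the matching law names.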
(b) High-order goals are clearly subject to nonexponential discounting – probably hyperbolic – which entails the overeffectiveness of near-term rewards. The pursuit of almost any of the authors' examples requires solving intertemporal conflicts: Long-term competence is threatened by present-paying laziness, self-esteem by impulsiveness, and so on. The formation of high-order goals may depend on identifying larger but more delayed rewards that can overcome faster-rewarding alternatives only by making common cause with similar delayed rewards, that is, by being perceived as serving a shared aspiration that is at stake in each relevant choice (Ainslie, 1992, pp. 144–162; 2005, pp. 5–9; Read, Loewenstein, & Rabin, 1999; but see also Rachlin, 1995).
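The two claims in (b) can be checked with a small numerical sketch, assuming the one-parameter hyperbolic form V = A/(1 + kD) and, for comparison, exponential discounting V = A·exp(−rD). All amounts, arrival times, the parameters k and r, and the spacing of repeated choices are hypothetical. The sketch shows that a hyperbolic discounter prefers a larger-later reward from a distance but switches to a smaller-sooner one as it draws near, whereas an exponential discounter ranks the pair the same way at every point; and that summing the values of a series of such choices, as if they were decided together in the spirit of the "common cause" argument above, can restore the larger-later preference at the moment of temptation.

```python
import math

# Hypothetical reward pairs: (amount, time at which it arrives), in arbitrary units.
SMALLER_SOONER = (50.0, 10.0)
LARGER_LATER = (100.0, 15.0)


def hyperbolic_value(amount: float, delay: float, k: float = 1.0) -> float:
    """Hyperbolic discounting, V = A / (1 + k*D)."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    return amount / (1.0 + k * delay)


def exponential_value(amount: float, delay: float, rate: float = 0.1) -> float:
    """Exponential discounting, V = A * exp(-rate * D)."""
    if delay < 0:
        raise ValueError("delay must be non-negative")
    return amount * math.exp(-rate * delay)


def prefers_larger_later(discount, now, ss=SMALLER_SOONER, ll=LARGER_LATER) -> bool:
    """True if, evaluated at time `now`, the larger-later reward outweighs the smaller-sooner one."""
    (a_ss, t_ss), (a_ll, t_ll) = ss, ll
    return discount(a_ll, t_ll - now) > discount(a_ss, t_ss - now)


def bundled_prefers_larger_later(now, n, spacing=10.0, k=1.0,
                                 ss=SMALLER_SOONER, ll=LARGER_LATER) -> bool:
    """Compare summed hyperbolic values of n identical choices spaced `spacing` apart,
    as if the whole series were chosen at once (a hypothetical 'bundle')."""
    (a_ss, t_ss), (a_ll, t_ll) = ss, ll
    v_ss = sum(hyperbolic_value(a_ss, t_ss + i * spacing - now, k) for i in range(n))
    v_ll = sum(hyperbolic_value(a_ll, t_ll + i * spacing - now, k) for i in range(n))
    return v_ll > v_ss


if __name__ == "__main__":
    # From a distance, the hyperbolic discounter prefers the larger-later reward...
    print(prefers_larger_later(hyperbolic_value, now=0.0))   # True
    # ...but close to the smaller-sooner reward, its preference reverses.
    print(prefers_larger_later(hyperbolic_value, now=7.0))   # False
    # The exponential discounter ranks the pair the same way at both times.
    print(prefers_larger_later(exponential_value, now=0.0),
          prefers_larger_later(exponential_value, now=7.0))  # True True
    # Treating a series of such choices as one bundle can restore the larger-later preference.
    print(bundled_prefers_larger_later(now=7.0, n=1))        # False
    print(bundled_prefers_larger_later(now=7.0, n=2))        # True
```

With these particular numbers a bundle of two choices already reverses the momentary preference; other parameter values can require much larger bundles, so the threshold here should not be read as a general result.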
(c) Appetite will be the key variable in rewards that do not depend on physical states, and in the goals built from them. Given the variety of goals that people have made central to their lives, including powers, loves, collections, knowledge, faiths, theories, and delusions, sources of intrinsic reward are probably not limited to inborn turn-key patterns – not the “inherently interesting or enjoyable” (Deci & Ryan, 1985) – but open to arbitrary choice. This author has elsewhere proposed that much reward is thus not just intrinsic but endogenous, available at will (Ainslie, 1992, pp. 243–263; 2013, pp. 8–13; 2017, pp. 178–184). Some attention-directing skill is apparently learned that promotes a goal above moment-to-moment satisfactions by holding off reward until appetite grows strong, and only then harvesting it. But this skill will itself be subject to hyperbolic impatience, so it must find criteria outside its own control for harvesting its investment. For a basic example without extrinsic incentives: A solitaire player is deterred from claiming a win before the cards occasion it by remembering that cheating has always made the investment worthless. The art of exploiting endogenous reward is to find occasions that are singular – distinct and infrequent – enough to prevent a reward's overuse and hence its inflation.
Repeated successes make some kinds of gambles lose value through habituation, but let others stand out by revealing related gambles that build appetite. Complex patterns of occasioning should sometimes proliferate into major preoccupations or lifestyles (outlined in Ainslie, 2013), and thus form high-level goals like those in the authors' examples (target article, sect. 4.2). Endogenous reward is a fiat currency in which agents can indulge freely, limited only by the depletion of their appetite for it. However, such reward is subject to competition not only from extrinsic incentives but also from different patterns of endogenous reward that build their appetite and harvest their reward on alternative timetables. Painful thoughts, for instance, would combine rapid reward for attention with inhibition of other sources of reward.
The authors have only begun to tap the radical potential of endogenous reward, which is, in effect, a behavior (Ainslie, 2023, pp. 19–22). If reward governs cognitive functions in general, it may be the universal selective factor in all modifiable mental processes. Of course, tendencies toward many responses are inborn – for example, in emotions such as anger after frustration or fear when facing danger, or in the authors' example of orienting attention to an object that suddenly appears (target article, sect. 5.2). Obedience to pre-existing tendencies is conventionally ascribed to unmotivated black-box factors such as incentive salience, simple pairing (“conditioning”), or the actor's having formed a habit. But many examples show pre-existing tendencies to be modifiable in the marketplace of motivation: They can be overcome by competing rewards of the right magnitude and timing, as in learned emotional control (Ainslie & Monterosso, 2005) or strong attentional focus (Beecher, 1948). After all, incentive salience is still incentive, emotions all have valences (Miller, 1969), and conditioning does not occur to neutral unconditioned stimuli (Goldwater, 1972, pp. 350–351). As for habits, even rats switch flexibly into and out of them (Keramati, Smittenaar, Dolan, & Dayan, 2016). Mental processes in general may be pulled by reward much more than they are pushed by prior stimuli.
Financial support
This material was supported by the Department of Veterans Affairs Medical Center, Coatesville, PA, USA. The opinions expressed are not those of the Department of Veterans Affairs or the US Government.
Competing interest
None.