
Timing models of reward learning and core addictive processes in the brain

Published online by Cambridge University Press:  29 July 2008

Don Ross
Affiliation:
Finance, Economics and Quantitative Methods, Department of Philosophy, University of Alabama at Birmingham, Birmingham, AL 35294-1260; Department of Economics, University of Cape Town, Rondebosch 7701, South Africa. don.ross@uct.ac.za
http://www.uab.edu/philosophy/ross.html
http://www.commerce.uct.ac.za/Economics/staff/dross/default.asp

Abstract

People become addicted in different ways, and they respond differently to different interventions. There may nevertheless be a core neural pathology responsible for all distinctively addictive suboptimal behavioral habits. In particular, timing models of reward learning suggest a hypothesis according to which all addiction involves neuroadaptation that attenuates serotonergic inhibition of a mesolimbic dopamine system that has learned that cues for consumption of the addictive target are signals of a high-reward-rate environment.

Type: Open Peer Commentary
Copyright: © Cambridge University Press 2008

Redish et al. superbly organize current knowledge of addiction into an elegant system of conceptual folders. Their claims that different people succumb to addiction under different pressures, and that we should expect variability in the way clinical populations respond to various interventions, are persuasive. However, their remark that “treatments aimed at these specific modes are more likely to be successful than general treatments aimed at the general addicted population” (sect. 5.5, para. 4), while presently true, invites an overly strong reading to the effect that we should not expect to find any core neural process that is always crucial to the distinctively addictive properties of some but not most suboptimal behavioral habits, and which neuropharmacological interventions could target. Certainly, we do not now know there is such a core process. However, the evidence that Redish et al. so ably review remains compatible with this hypothesis. What might lead them to underestimate its probability is their treating Vulnerabilities 2, 4, 6, 7 and 9 as separate, bypassing reasons for suspecting they may be faces of one process.

The dopamine circuit from the ventral tegmental area and substantia nigra pars compacta (SNpc) to the ventral striatum (especially the nucleus accumbens [NAcc]) is, as the authors emphasize, a learning system. Its learning function has sometimes been modeled as seamlessly integrating reward prediction, reward valuation, salience maintenance of perceptual targets that cue expectations of reward, and preparation of motor response (McClure et al. 2003). More abstractly, Panksepp (1998) has conceptualized this integrated circuit as a "seeking system." That theorists conceptually distinguish all of these functions does not mean that the circuitry of the brain does so.

Another theoretical distinction of importance here that may also lack a clear basis in neural functioning is that between classical and operant conditioning. It is undermined by Gallistel and Gibbon's (2000; 2002) timing models of conditioning phenomena, according to which animals represent the durations of intervals and the rates of events, and conditioned responding occurs as a function of the comparison of rates of reward. On these models, animals are drawn to environments with higher such rates by gradient climbing, rather than by forming explicit associations between stimuli and conditioned responses, or between behaviors and specific expected outcomes. Call such processes "G-learning." In addiction studies there has been a long-running and unresolved debate over the relationship between supposedly classically conditioned cravings and apparently instrumentally conditioned preparations for consumption of addictive targets. Perhaps the debate has been inconclusive because these are one and the same process so far as neural implementation is concerned. Addictions might then be conceptualized as high-reward-rate "environments" that lure organisms' attention and approach unless the dopamine system is opposed, as it appears to be (successfully in non-addicts) by serotonergic and GABAergic signals from the prefrontal cortex (PFC).
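To make the rate-comparison idea concrete, here is a minimal sketch (illustrative only; the environment names and parameters are invented, and Gallistel and Gibbon's actual models are far more sophisticated): an agent estimates reward rates from experienced inter-reward intervals and approaches whichever environment's estimated rate is higher, with no stimulus-response association ever formed.

```python
import random

def g_learning_choice(rates, samples=100, seed=0):
    """Pick an environment by comparing estimated reward rates.

    `rates` maps environment name to true reward rate (rewards per
    unit time). The agent estimates each rate from experienced
    inter-reward intervals; approach follows the higher estimate.
    No stimulus-response association is ever formed.
    """
    rng = random.Random(seed)
    estimates = {}
    for env, rate in rates.items():
        # Inter-reward intervals in a Poisson-like environment are
        # exponentially distributed with mean 1/rate.
        intervals = [rng.expovariate(rate) for _ in range(samples)]
        estimates[env] = samples / sum(intervals)
    return max(estimates, key=estimates.get)

# Hypothetical rates: the "casino" pays off four times as often.
print(g_learning_choice({"casino": 2.0, "home": 0.5}))  # drawn to "casino"
```

The point of the sketch is that nothing in it associates a stimulus with a response or a behavior with an outcome; only estimated rates drive the choice.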

Daw (2003) computationally models G-learning and temporal-difference (TD) learning, of the kind implicated in Redish et al.'s Vulnerability 7, as complementary. Suppose an animal has learned a function that predicts a reward at t, where the function in question decomposes into models of two stages: one applying to the interval between the conditioned and the unconditioned stimulus, and one applying to the interval between the unconditioned stimulus and the next conditioned stimulus. Then imagine that a case occurs in which at t nothing happens. Should the animal infer that its model of the world needs revision, perhaps to a one-stage model, or should it retain the model and regard the omission as noise or error? This is the problem that underlies Redish et al.'s Vulnerability 6. In Daw's account, the animal uses G-learning to select a world-model: whichever such model matches behavior that yields the higher reward rate will be preferred to alternatives. Given this model as a constraint, TD learning can then predict the temporal placement of rewards ("when"-learning). This hybrid approach allows Daw to drop unbiological features of the original model of TD learning by the dopamine system: tapped-line delay timing and exogenously fixed trial boundaries.
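The two stages of this scheme can be caricatured in a few lines of Python (a toy sketch under invented rates and parameters, not Daw's actual implementation): rate comparison selects among candidate world-models, and, with the winning model fixing the interval structure, TD(0) over time-since-cue states learns when the reward arrives.

```python
def select_model_by_rate(candidate_rates):
    """G-learning step: prefer whichever world-model's implied behavior
    yields the higher reward rate (the rates here are invented numbers)."""
    return max(candidate_rates, key=candidate_rates.get)

def td_when_learning(reward_time, horizon, episodes=200, alpha=0.1, gamma=0.95):
    """TD(0) over time-since-cue states: given a world-model that fixes
    the interval structure, learn *when* the reward arrives."""
    V = [0.0] * (horizon + 1)
    for _ in range(episodes):
        for t in range(horizon):
            r = 1.0 if t + 1 == reward_time else 0.0
            V[t] += alpha * (r + gamma * V[t + 1] - V[t])
    return V

model = select_model_by_rate({"one-stage": 0.8, "two-stage": 1.2})
V = td_when_learning(reward_time=5, horizon=10)
# Learned value peaks at the state just before reward delivery (t = 4).
```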

If this is on the right track, then the mesolimbic dopamine circuit's response to an addictive target does not involve a breakdown: it is functioning just as evolution intended. The casino, for example, is a high-reward-rate environment. Furthermore, because of its variable reward schedule, the casino continually challenges the system's "when"-learning and prevents the dopamine signaling, as represented by the TD algorithm, from settling down. Addictive drugs may all encourage the same response by interfering with the reliability of neural clocks, a possible vulnerability Redish et al. do not explicitly consider, but which might be expressed as changes in allostasis, their Vulnerability 2.
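The claim that a variable schedule keeps TD dopamine signaling from settling can be illustrated with a single-state TD learner (the reward probability and learning rate are arbitrary choices for the sketch): under probabilistic reward the prediction error stays large indefinitely, while under a deterministic schedule it decays toward zero.

```python
import random

def td_errors(p_reward, trials=500, alpha=0.1, seed=1):
    """Single-state TD learning of expected reward under a schedule that
    pays off with probability p_reward; returns |prediction error| per trial."""
    rng = random.Random(seed)
    V, errors = 0.0, []
    for _ in range(trials):
        r = 1.0 if rng.random() < p_reward else 0.0
        delta = r - V            # the TD (dopamine-like) prediction error
        V += alpha * delta
        errors.append(abs(delta))
    return errors

# Variable, casino-like schedule: late-trial errors remain large.
variable_late = td_errors(p_reward=0.3)[-100:]
# Deterministic schedule: errors decay toward zero.
fixed_late = td_errors(p_reward=1.0)[-100:]
```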

If addicts' dopamine circuits are working just as promised in the Darwinian user's manual, what has gone wrong, at the level of neural functioning, in their case? The answer may be Redish et al.'s Vulnerability 9: neuroadaptation in inhibitory (serotonergic and other) circuits resulting from continuous dopamine overload in NAcc. This vulnerability, as caused by the mechanism identified in Vulnerability 7, is expressed as Vulnerability 4. (Vulnerability 6 is simply an expression of Vulnerability 7 given consumption of addictive targets.)

The only evidence that, as far as I can see, Redish et al. provide against a common central role in all addictions for Vulnerability 7 is their claim that cravings must implicate the planning system, whereas the mesolimbic dopamine system is part of the habit system. This claim must be defended against alternative neural accounts of cravings. Seamans and Yang (2004) suggest that dopamine action gives rise to two possible states in the ventromedial prefrontal cortex (VMPFC), depending on which of two groups of receptors, D1 or D2, predominates. Where D2 reception predominates, multiple excitatory inputs promote VMPFC output to NAcc. Where D1 reception predominates, all signals below a high threshold are inhibited. In cocaine withdrawal, protein signaling to D2 receptors is reduced, thus inducing the animal to seek stimuli that can clear the high D1 threshold. Learned cues that such stimuli (e.g., cocaine) are at hand may then arouse the system. Here is a potential dopaminergic model of the mechanism by which habituation gives rise to cravings. A craving might simply be the uncomfortable phenomenology associated with the dopamine system's pulling attention away from motivators, alternative to the addictive target, on which frontal and prefrontal systems are "trying" to focus. That the person can, when probed, name what eliminates the discomfort, and that frontal cognition helps her seek its consumption, does not in itself show that goals set by a planning system are necessary for cravings.
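Seamans and Yang's two-state picture can be caricatured as a simple gating rule (the threshold value and input strengths are invented for illustration; their biophysical model is far richer): the D2-dominated state passes multiple excitatory inputs through to NAcc, while the D1-dominated state inhibits everything below a high threshold, so that only a strong learned cue gets through.

```python
def vmpfc_output(inputs, state, d1_threshold=0.8):
    """Gate VMPFC-to-NAcc signals in one of two dopamine-dependent states
    (a cartoon after Seamans & Yang 2004; the threshold is invented).

    "D2": D2 reception predominates; multiple excitatory inputs pass.
    "D1": D1 reception predominates; only signals clearing a high
          threshold get through, everything weaker is inhibited.
    """
    if state == "D2":
        return [x for x in inputs if x > 0]
    return [x for x in inputs if x >= d1_threshold]

signals = [0.2, 0.5, 0.9]            # hypothetical input strengths
print(vmpfc_output(signals, "D2"))   # all excitatory inputs pass
print(vmpfc_output(signals, "D1"))   # only the strong (e.g., drug) cue clears
```

On this cartoon, withdrawal pushes the system into the D1-dominated state, so the only inputs that can drive output are the ones the addictive target's cues supply.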

References

Daw, N. D. (2003) Reinforcement learning models of the dopamine system and their behavioral implications. Unpublished doctoral dissertation, Carnegie Mellon University, Pittsburgh, PA.
Gallistel, C. & Gibbon, J. (2000) Time, rate and conditioning. Psychological Review 107:289–344.
Gallistel, C. & Gibbon, J. (2002) The symbolic foundations of conditioned behavior. Erlbaum.
McClure, S. M., Daw, N. & Montague, R. (2003) A computational substrate for incentive salience. Trends in Neurosciences 26:423–28.
Panksepp, J. (1998) Affective neuroscience: The foundations of human and animal emotions. Oxford University Press.
Seamans, J. K. & Yang, C. R. (2004) The principal features and mechanisms of dopamine modulation in the prefrontal cortex. Progress in Neurobiology 74:1–57.