Introduction
Anorexia nervosa (AN) is a medically dangerous eating disorder characterized by extreme dietary restriction, an intense fear of weight gain, and disturbed body-related experience, resulting in severe weight loss and sustained low body weight (American Psychiatric Association, 2013). A growing body of research supports altered decision-making (Guillaume et al., Reference Guillaume, Gorwood, Jollant, Van den Eynde, Courtet and Richard-Devantoy2015; Wu et al., Reference Wu, Brockmeyer, Hartmann, Skunde, Herzog and Friederich2016) and reinforcement learning (Pike et al., Reference Pike, Sharpley, Park, Cowen, Browning and Pulcu2023; Ritschel et al., Reference Ritschel, Geisler, King, Bernardoni, Seidel, Boehm, Vettermann, Biemann, Roessner, Smolka and Ehrlich2017) in AN, which are thought to underlie core cognitive and behavioral symptoms, including the persistence of dietary restriction and compensatory weight loss behaviors (e.g., purging, excessive exercise) despite negative consequences. However, less is known about how decision-making and reinforcement learning may interact in AN across ill and remitted states.
The ability to flexibly adapt to changing environments depends on learning both to maximize rewarding outcomes and to avoid aversive outcomes, and requires outcome valuation and cognitive flexibility to update information and arbitrate between options (Dayan & Daw, Reference Dayan and Daw2008; Frank et al., Reference Frank, Seeberger and O’Reilly2004). Reinforcement learning (RL) models are based on the notion that the rate of learning is driven by violations of expectations, or prediction errors (PE), which reflect the difference between received and expected outcomes (Pearce & Hall, Reference Pearce and Hall1980; Rescorla & Wagner, Reference Rescorla and Wagner1972). Learning from experience occurs through updating expectations about the outcome in proportion to the PE, so that the expected outcome converges to the actual outcome. Deficits in reward processing in ill and remitted AN have been frequently observed (Haynos et al., Reference Haynos, Lavender, Nelson, Crow and Peterson2020; Kaye et al., Reference Kaye, Wierenga, Bischoff-Grethe, Berner, Ely, Bailer, Paulus and Fudge2020; O’Hara et al., Reference O’Hara, Schmidt and Campbell2015; Wierenga et al., Reference Wierenga, Bischoff-Grethe, Melrose, Irvine, Torres, Bailer, Simmons, Fudge, McClure, Ely and Kaye2015), with emerging evidence of altered aversive processing and increased punishment sensitivity in ill states (Bernardoni et al., Reference Bernardoni, Geisler, King, Javadi, Ritschel, Murr, Reiter, Rössner, Smolka, Kiebel and Ehrlich2018; Bischoff-Grethe et al., Reference Bischoff-Grethe, McCurdy, Grenesko-Stevens, (Zoe) Irvine, Wagner, Wendy Yau, Fennema-Notestine, Wierenga, Fudge, Delgado and Kaye2013; Harrison et al., Reference Harrison, O’Brien, Lopez and Treasure2010; Jonker et al., Reference Jonker, Glashouwer and de Jong2022; Jonker et al., Reference Jonker, Glashouwer, Hoekzema, Ostafin, de Jong and Hadjikhani2020; Monteleone et al., Reference Monteleone, Monteleone, Esposito, Prinster, Volpe, Cantone, Pellegrino, Canna, Milano, Aiello, Di Salle and Maj2017), thought to interfere with the ability to learn from experience. Consistent with this view, studies of RL for reward in AN tend to demonstrate decreased learning accuracy in acutely ill and weight-restored individuals (Foerde et al., Reference Foerde, Daw, Rufin, Walsh, Shohamy and Steinglass2021; Foerde & Steinglass, Reference Foerde and Steinglass2017; Wierenga et al., Reference Wierenga, Reilly, Bischoff-Grethe, Kaye and Brown2022). In contrast, increased rates of punishment learning during RL tasks that involved changing rules (e.g., reversal learning) or outcome probabilities (e.g., adaptive learning tasks) have been reported in adolescents acutely ill with AN (Bernardoni et al., Reference Bernardoni, Geisler, King, Javadi, Ritschel, Murr, Reiter, Rössner, Smolka, Kiebel and Ehrlich2018) and adults remitted from AN (Bernardoni et al., Reference Bernardoni, King, Geisler, Ritschel, Schwoebel, Reiter, Endrass, Rössner, Smolka and Ehrlich2021; Filoteo et al., Reference Filoteo, Paul, Ashby, Frank, Helie, Rockwell, Bischoff-Grethe, Wierenga and Kaye2014; Pike et al., Reference Pike, Sharpley, Park, Cowen, Browning and Pulcu2023), although one study reported that learning decreased following a rule change (Filoteo et al., Reference Filoteo, Paul, Ashby, Frank, Helie, Rockwell, Bischoff-Grethe, Wierenga and Kaye2014).
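For reference, one standard formulation of this delta-rule account (following Rescorla & Wagner, Reference Rescorla and Wagner1972) writes the prediction error and the value update as

$$\delta_t = r_t - Q_t, \qquad Q_{t+1} = Q_t + \eta\,\delta_t,$$

where $Q_t$ is the expected outcome on trial $t$, $r_t$ the received outcome, $\delta_t$ the prediction error, and $\eta$ the learning rate; larger values of $\eta$ mean that expectations are revised more strongly after each outcome.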
Model simulations conducted to help explain this observed decrease in learning showed that increasing a punishment sensitivity parameter accounted for the accelerated initial category learning observed in remitted AN, whereas the observed deficits in set shifting were explained by altering parameters representing rule selection and flexibility. These results highlight that both punishment sensitivity and cognitive flexibility may impact associative learning in AN (Filoteo et al., Reference Filoteo, Paul, Ashby, Frank, Helie, Rockwell, Bischoff-Grethe, Wierenga and Kaye2014), consistent with other findings of worse cognitive control during RL in women remitted from AN (Ritschel et al., Reference Ritschel, Geisler, King, Bernardoni, Seidel, Boehm, Vettermann, Biemann, Roessner, Smolka and Ehrlich2017).
We previously used a well-studied two-choice, feedback-based, probabilistic associative learning task (PALT; Bodi et al., Reference Bodi, Keri, Nagy, Moustafa, Myers, Daw, Dibo, Takats, Bereczki and Gluck2009; Mattfeld et al., Reference Mattfeld, Gluck and Stark2011) to investigate the influence of rewarding and punishing outcomes on instrumental RL over extended task exposure in individuals acutely ill with AN (Wierenga et al., Reference Wierenga, Reilly, Bischoff-Grethe, Kaye and Brown2022). Performance on the PALT paradigm has provided insights into parkinsonism (Bodi et al., Reference Bodi, Keri, Nagy, Moustafa, Myers, Daw, Dibo, Takats, Bereczki and Gluck2009), posttraumatic stress disorder (Myers et al., Reference Myers, Moustafa, Sheynin, VanMeenen, Gilbertson, Orr, Beck, Pang, Servatius and Boraud2013), major depressive disorder (Herzallah et al., Reference Herzallah, Moustafa, Natsheh, Abdellatif, Taha, Tayem, Sehwail, Amleh, Petrides, Myers and Gluck2013), and aging (Sojitra et al., Reference Sojitra, Lerner, Petok and Gluck2018), supporting the clinical validity of the PALT.
The PALT relies on the contingency between a participant’s response and outcome (i.e., whether or not they won or lost points) to facilitate learning (i.e., to select the optimal reward-based stimuli and avoid the non-optimal punishment-based stimuli); unlike reversal learning tasks or adaptive learning tasks, outcome contingencies and learning rules do not change over the course of the task (Bodi et al., Reference Bodi, Keri, Nagy, Moustafa, Myers, Daw, Dibo, Takats, Bereczki and Gluck2009; Mattfeld et al., Reference Mattfeld, Gluck and Stark2011; Myers et al., Reference Myers, Moustafa, Sheynin, VanMeenen, Gilbertson, Orr, Beck, Pang, Servatius and Boraud2013). Using a value-based computational model of RL, we found that individuals with acute AN had reduced learning rates when either positive (better than expected) or negative (worse than expected) PE occurred. Individuals with AN were also less likely than healthy controls to exploit what they had learned, suggesting they may make choices less decisively (Wierenga et al., Reference Wierenga, Reilly, Bischoff-Grethe, Kaye and Brown2022). A larger magnitude of negative PE was associated with worse treatment outcome, suggesting that poorer loss-related learning may be a mechanism of AN persistence.
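As an illustration of this class of model (a minimal sketch, not the exact implementation fitted in Wierenga et al., Reference Wierenga, Reilly, Bischoff-Grethe, Kaye and Brown2022; the function names and the inverse-temperature parameter beta standing in for the explore-exploit parameter are our own), separate learning rates for positive and negative PE combined with a softmax choice rule can be written as:

```python
import numpy as np

def update_value(q, outcome, eta_pos, eta_neg):
    """Delta-rule update with separate learning rates for positive and negative PE."""
    pe = outcome - q                       # prediction error
    eta = eta_pos if pe > 0 else eta_neg   # valenced learning rate
    return q + eta * pe, pe

def p_optimal_choice(q_opt, q_nonopt, beta):
    """Softmax probability of choosing the optimal stimulus; larger beta means
    the learned value difference is exploited more deterministically."""
    return 1.0 / (1.0 + np.exp(-beta * (q_opt - q_nonopt)))
```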
Typically, PE models of RL account only for accuracy data (Pedersen et al., Reference Pedersen, Frank and Biele2017) and use a choice rule with a single parameter that encompasses all of the decision processes leading to a choice. However, because many cognitive processes other than PE learning, such as decision-making, underlie performance on RL tasks, computational models that incorporate both response time and accuracy data from these tasks, such as a hybrid RL and drift diffusion model (DDM; Pedersen & Frank, Reference Pedersen and Frank2020; Pedersen et al., Reference Pedersen, Frank and Biele2017), may better determine which cognitive processes account for abnormal functioning. Notably, drift diffusion models track learning as changes in accuracy, response time, or a tradeoff between the two. The RLDDM includes the PE learning rule in its architecture (Rescorla & Wagner, Reference Rescorla and Wagner1972), but models choice probability using the DDM, which accounts for the time to complete non-decision processes, such as stimulus encoding and response execution, and the time to complete decisions. The DDM assumes that during the decision time, information is sequentially sampled from a noisy stimulus until one of two thresholds is reached; the choice associated with the exceeded threshold is then made. Decisions are determined by such processes as choice bias, information sampling rate, and the spread of the response thresholds for the two choices. Thus, the RLDDM provides a more thoroughly articulated account of the cognitive processes underlying decision-making than does the single-parameter choice rule (Pedersen & Frank, Reference Pedersen and Frank2020; Pedersen et al., Reference Pedersen, Frank and Biele2017). Pedersen et al. (Reference Pedersen, Frank and Biele2017) found that model parameters were strongly to very strongly related to medication effects in attention deficit hyperactivity disorder, supporting the clinical validity of RLDDM parameters.
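In summary form (using the parameter names introduced in the Methods; the exact parameterization in Pedersen et al., Reference Pedersen, Frank and Biele2017 may differ in detail), the hybrid model links learned expectancies to the diffusion process by setting the trial-wise drift rate to the scaled expectancy difference and letting boundary separation change as a power function of trial number:

$$v_t = m\bigl(Q_{\mathrm{opt},t} - Q_{\mathrm{nonopt},t}\bigr), \qquad a_t = a \cdot t^{\,i},$$

so that as the expectancy gap grows with learning, evidence drifts toward the optimal boundary more quickly, while $i < 0$ implies narrowing boundaries (less response caution) with repeated exposure.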
The current study aimed to clarify two key issues. First, to examine whether our previous findings of poorer learning following both positive and negative prediction errors reflect state-related correlates of the illness or are also present in remitted AN, we employed the same reinforcement learning paradigm in a sample of women remitted from AN. Second, we sought to better characterize cognitive processes related to decision-making that might impact RL by modeling both PALT accuracy and response time data using a hybrid RLDDM. Moreover, the hierarchical RLDDM allowed us to explore whether psychological processes underlying RL and decision-making changed with greater task exposure, as reflected in parameter changes across consecutively presented sets of stimuli. Lastly, we explored whether the learning rate and decision-making model parameters were associated with clinical variables in the remitted AN group.
Methods
Participants
Twenty-five women remitted from DSM-5 anorexia nervosa (rAN; 16 pure restricting type; 9 binge eating/purging type, with regular purging but no binge-eating behavior) and 22 community control women (cCN) participated. Consistent with prior work, remission from AN was defined as maintaining a weight above 85% of ideal body weight, regular menstrual cycles, and the absence of binge eating, purging (including excessive exercise), and restrictive eating for at least 1 year prior to the study, with no current psychological symptoms of AN (e.g., body dissatisfaction) (Wagner et al., Reference Wagner, Barbarich‐Marsteller, Frank, Bailer, Wonderlich, Crosby, Henry, Vogel, Plotnicov, McConaha and Kaye2006). Women who met criteria for a current Axis I diagnosis were excluded from both groups, and controls with any eating disorder history were excluded. The study was approved by the University of California San Diego Institutional Review Board and conducted in compliance with the Helsinki Declaration of 1975, as revised in 2008. All participants provided written informed consent (see Supplemental Materials for additional details regarding assessment tools and exclusion criteria).
Probabilistic associative learning task (PALT)
A probabilistic associative learning task (Figure 1) was used to assess trial-by-trial response-outcome instrumental learning to reward (wins) and punishment (losses). The task was administered using two stimulus sets containing different stimuli to examine generalization of learning and evaluate whether rAN demonstrate altered learning from rewarding or aversive feedback with additional exposure to these types of feedback, as seen in ill AN (Wierenga et al., Reference Wierenga, Reilly, Bischoff-Grethe, Kaye and Brown2022). The order of stimulus sets (A or B) was counterbalanced across participants; set 1 refers to the first set presented (A or B) and set 2 refers to the second set presented (A or B) (Bodi et al., Reference Bodi, Keri, Nagy, Moustafa, Myers, Daw, Dibo, Takats, Bereczki and Gluck2009; Mattfeld et al., Reference Mattfeld, Gluck and Stark2011; Wierenga et al., Reference Wierenga, Reilly, Bischoff-Grethe, Kaye and Brown2022).
Reinforcement learning drift diffusion model (RLDDM)
The hybrid RLDDM provides information about prediction errors while giving additional information about the decision process. We used a slight modification of Pedersen and colleagues’ (2017) best-fitting model to analyze our data (Supplemental Materials). Like Pedersen et al. (Reference Pedersen, Frank and Biele2017), we modeled learning using a delta rule and modeled decision-making using a drift diffusion (Wiener) process (Ratcliff et al., Reference Ratcliff, Smith, Brown and McKoon2016; Ratcliff & Tuerlinckx, Reference Ratcliff and Tuerlinckx2002). Pedersen et al.’s (Reference Pedersen, Frank and Biele2017) hierarchical Bayesian model estimated positive and negative prediction error learning rates (ηp and ηn), the scale parameter (m), a base boundary separation parameter (a), an exponent of a power function (i), and a non-decision parameter (ter), which represents the time needed to encode a stimulus and to prepare and execute responses. Larger values of the learning parameters indicate greater adaptation of outcome expectancies when the reward is greater than expected or punishment less than expected (ηp) or when the outcome is more punishing or less rewarding than expected (ηn). Parameter m reflects how sensitive participants are to differences in outcome values associated with the two choices (Pedersen et al., Reference Pedersen, Frank and Biele2017); differences in choice expectancies may matter more for some participants than for others. The boundary separation parameter a, which reflects the difference between the two response thresholds, is often interpreted as response caution, with higher values favoring accuracy over speed (Myers et al., Reference Myers, Interian and Moustafa2022). Dynamic changes in boundary separation over trials are assumed to follow a power function of trial number, with the parameter i reflecting the rate of change (Pedersen et al., Reference Pedersen, Frank and Biele2017). Values near zero represent little dynamic change, whereas larger values might represent less response caution with experience across trials. The psychological meaning of both boundary separation and the dynamic parameter i should be interpreted alongside the drift rate v, which is derived from the difference in the two response expectancies multiplied by the sensitivity parameter m (Pedersen et al., Reference Pedersen, Frank and Biele2017). The drift rate represents the average speed of information extraction from a stimulus (Ratcliff et al., Reference Ratcliff, Smith, Brown and McKoon2016). The information accumulation process is noisy and is influenced by task difficulty, sensory discriminability, attention, and speed of cognitive processes, among other neurobehavioral processes (Myers et al., Reference Myers, Interian and Moustafa2022; Ratcliff et al., Reference Ratcliff, Smith, Brown and McKoon2016). The combination of narrower boundary separation with higher drift rates can lead to fast and accurate responding, implying learning rather than caution. We also added a starting point parameter z to the model (see Supplemental Materials); values greater than 0.5 bias the participant toward making faster and more frequent optimal choices (Myers et al., Reference Myers, Interian and Moustafa2022). The RLDDM was fit to all valid trials of all participants within each group, modeling both between-block and within-block learning. The model hierarchy treated trial set as a higher-level variable under which trials were embedded.
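To make these parameters concrete, the following is a minimal simulation sketch of a single trial under the model as described above (an illustration only, not the hierarchical Bayesian estimation code used here; function and variable names are our own, the diffusion step size and noise scale are arbitrary assumptions, the example parameter values are only loosely in the range of the group estimates reported in the Results, and the expectancies q_opt and q_nonopt would be updated trial by trial with the valenced delta rule described earlier):

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_trial(q_opt, q_nonopt, m, a, i, t_er, z, trial, dt=0.001, noise=1.0):
    """Simulate one RLDDM trial: the drift rate is the scaled expectancy
    difference, boundary separation follows a power function of trial number,
    and evidence accumulates from the starting point until it reaches the
    upper (optimal) or lower (non-optimal) boundary."""
    v = m * (q_opt - q_nonopt)      # drift rate from the expectancy difference
    a_t = a * trial ** i            # boundary separation on this trial (i < 0 -> narrowing)
    x, t = z * a_t, 0.0             # starting point expressed as a proportion of the boundary
    while 0.0 < x < a_t:            # Euler simulation of the Wiener process
        x += v * dt + noise * np.sqrt(dt) * rng.normal()
        t += dt
    chose_optimal = bool(x >= a_t)
    rt = t + t_er                   # observed RT adds non-decision time
    return chose_optimal, rt

# Example: early in learning, a small expectancy gap yields slow, error-prone choices
print(simulate_trial(q_opt=0.6, q_nonopt=0.4, m=2.75, a=1.5, i=-0.08,
                     t_er=0.3, z=0.48, trial=10))
```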
See Table 2 for a summary of the parameters estimated and their descriptions and Supplemental Materials for more details.
Data analysis
The observed dependent variables were the median RT and the mean proportion of optimal choices made (accuracy), calculated over reward and punishment trials separately for stimulus sets 1 and 2. For observed data and modeled data involving trial type or prediction error (learning rate and drift rate), we used the general linear model with repeated measures (SPSS Version 28.0.1.0) to perform a diagnostic group (cCN vs. rAN) × stimulus set (set 1 vs. set 2) × trial type (reward vs. punishment) analysis of variance (ANOVA). To analyze model parameters that did not involve trial type (starting point bias, base boundary separation, boundary power, non-decision time, and scale), we performed a diagnostic group × set ANOVA. To explore interactions, we used SPSS multivariate simple main effect tests of marginal means (Garofalo et al., Reference Garofalo, Giovagnoli, Orsoni, Starita, Benassi and Evans2022). These tests provide linearly independent tests of pairwise comparisons among estimates of the marginal means.
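For illustration, the observed dependent variables could be computed from trial-level data as follows (a sketch with hypothetical file and column names; the actual analyses were conducted in SPSS):

```python
import pandas as pd

# One row per trial; hypothetical columns: 'subject', 'group' (cCN/rAN),
# 'set' (1 or 2), 'trial_type' (reward/punishment), 'rt' (seconds), and
# 'optimal' (1 if the optimal stimulus was chosen, else 0).
trials = pd.read_csv("palt_trials.csv")

dvs = (trials
       .groupby(["subject", "group", "set", "trial_type"])
       .agg(median_rt=("rt", "median"),
            prop_optimal=("optimal", "mean"))
       .reset_index())
# dvs holds one cell per participant x set x trial type, i.e., the values
# entered into the group x set x trial-type repeated-measures ANOVA.
```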
Data and model parameter estimates from three participants were excluded from analysis. One cCN participant had a large number of missing values on punishment trials and appeared to game the task. Two participants (1 cCN, 1 rAN) with poor convergence of some parameter estimates were also excluded. Power estimates for a range of effect sizes are presented in Supplemental Figure 6. Throughout the paper, 95% CIs are reported.
Exploratory clinical associations
Separate Pearson correlational analyses within the rAN group examined relationships between the two RL model values (learning rate for positive PE (ηp) and learning rate for negative PE (ηn)) and five AN clinical measures (current BMI, lowest BMI, age of AN onset, duration of illness, duration of remission), separately for each set, and between three DDM model values (base boundary separation (a), drift rate on reward trials (vr), drift rate on punishment trials (vp)) and the five AN clinical measures. Bonferroni correction for multiple comparisons was used to determine a family-wise p-value of .005 for the two RL values and the five clinical measures, and a family-wise p-value of .003 for the three DDM values and the five clinical measures, assuming p = .05 for each test. Correlational analyses were repeated for mood/personality measures (BDI, STAI, TCI Harm Avoidance, TCI Reward Dependence) and age.
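These thresholds follow from dividing the per-test alpha by the number of tests in each family (2 × 5 and 3 × 5 tests, respectively):

$$\frac{.05}{2 \times 5} = .005, \qquad \frac{.05}{3 \times 5} \approx .003.$$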
Results
Demographic variables
The rAN and cCN groups did not differ in education, current BMI, anxiety or depressive symptoms (Table 1). The rAN group was slightly older and had significantly lower historical BMI (p < .001). There were no correlations between parameter values and age by group and within stimulus set, suggesting age had little if any impact on parameter estimates.
Note: Student’s t-tests were used to assess statistical significance for between-group differences in continuous variables. Despite the age difference between groups, there were no correlations between parameter values and age by group and within stimulus set, suggesting age had little if any impact on parameter estimates. BDI = Beck Depression Inventory-Second Edition (BDI-2); BMI = body mass index; EDI-3 = Eating Disorder Inventory-3; STAI = Spielberger State-Trait Anxiety Inventory; TCI = Temperament and Character Inventory.
Observed task values
A priori tests of linear trends of response time and optimal choice over blocks of trials confirmed task learning for both groups (See Supplemental Materials for ANOVA results and plots).
Response time
The set × trial-type effect was significant, F(1,42) = 12.82, p < .001, ηp² = .23 (Figure 2 top row). Simple main effect tests on marginal means revealed that whereas the RT difference between set 1 and set 2 on reward trials was not significant (p = .102, ηp² = .06), RT was significantly slower on set 1 than set 2 on punishment trials (p < .001, ηp² = .46), a large effect. RTs for set 1 trials were significantly longer than those for set 2, F(1,42) = 26.05, p < .001, ηp² = .38. Participants responded more slowly on trials on which stimuli were associated with punishing outcomes than on trials on which stimuli were associated with rewarding outcomes, F(1,42) = 89.47, p < .001, ηp² = .68. The main effect of diagnostic group and its interactions were not significant (all ηp² < .07).
Proportion of optimal choices
There was a significant diagnostic group × set × trial-type interaction, F(1,42) = 4.78, p = .034, ηp² = .10 (Figure 3 top row). Simple main effects tests on marginal means revealed a significantly lower proportion of optimal choices on set 2 than set 1 only among rAN participants and only on reward trials (p = .007, ηp² = .16). Participants responded more accurately on trials associated with punishing outcomes than on reward trials, F(1,42) = 32.13, p < .001, ηp² = .43. The main effect of diagnostic group and its interaction with trial type were non-significant (both ηp² < .02).
Parameter values
See Table 2 for a summary of results.
Note: g refers to the trial grouping variable, either set 1 or set 2; s is the subject number; t is the trial number; PE is prediction error. Some parameters vary across trials within stimulus set and diagnostic group, whereas others are fixed across trials.
PE learning rates
A significant two-factor interaction of learning-rate valence with diagnostic group, F(1,42) = 12.28, p = .001, ηp² = .23, indicated a greater learning rate for negative PE in the rAN group compared to cCN (p < .001, ηp² = .25). Additionally, the difference between positive and negative learning-rate valence was greater for set 1 trials than for set 2 trials, F(1,42) = 4.59, p = .038, ηp² = .10 (Figure 4). These two-way interactions varied across the third factor, stimulus set, producing a three-factor learning rate × diagnostic group × set interaction, F(1,42) = 8.09, p = .007, ηp² = .16. Simple main effects tests on marginal means indicated a significant increase in negative PE learning rate from set 1 to set 2 trials for rAN participants, p < .001, ηp² = .27. No other simple effect tests within the three-factor interaction, including the apparent set difference in positive PE learning rate for rAN, were significant. The learning rate for positive PE (ηp) was greater than for negative PE (ηn), F(1,42) = 65.85, p < .001, ηp² = .61. Neither the main effect of diagnostic group nor its interaction with stimulus set was significant.
Drift rate
The diagnostic group × set interaction was significant, F(1,42) = 5.79, p = .021, ηp² = .12, driven by a reduction in mean drift rate from set 1 to set 2 in the rAN group (p = .012, ηp² = .14). Moreover, even though the diagnostic group × trial-type interaction was not significant, the diagnostic group × set × trial-type interaction was, F(1,42) = 4.42, p = .042, ηp² = .10. Simple main effect tests on marginal means revealed a significantly smaller drift rate on set 2 compared with set 1 reward trials within the rAN group, p = .012, ηp² = .14 (Figure 5), clarifying the group × set interaction. No other simple effects were significant (all other ηp² < .03). There were no significant main effects of diagnostic group, trial type, or set on drift rate.
Scale
There were no significant effects of group, set, or their interaction (all ηp² < .02) for the scaling parameter. The grand mean (M = 2.75, [2.40, 3.12]) was significantly greater than 1.0, the value at which the scale would have had no impact on the drift rate, t(43) = 9.32, p < .001, d = 1.40.
Boundary parameters
Base boundary separation
Stimulus set significantly interacted with diagnostic group, F(1,42) = 4.69, p = .036, ηp² = .10 (Figure 6). Although simple main effects tests on marginal means revealed that the base boundary separation estimated from set 1 trials was significantly greater than from set 2 for both cCN (p = .023) and rAN (p < .001), the effect size of the boundary separation difference across sets was greater for rAN (ηp² = .44) than for cCN (ηp² = .12). Overall, the base boundary separation estimated from set 1 trials was significantly greater than the separation estimated from set 2, F(1,42) = 32.06, p < .001, ηp² = .43. The main effect of diagnostic group was not significant, p = .285, ηp² = .027.
Boundary power
Boundary power values were negative for all participants when estimated from set 2 stimuli, and for all but two participants on set 1, producing smaller boundary separation widths as a stimulus was repeated. The mean boundary power value was significantly smaller, F(1,42) = 4.77, p = .035, ηp² = .10, for set 1 stimuli (M = −.09, [−.11, −.07]) than for set 2 (M = −.07, [−.07, −.06]). However, the mean differences across diagnostic group and the interaction of group with set were not significant (both ηp² < .01).
Starting point bias
The grand mean for starting point bias (M = .48, [.47, .49]) was significantly lower than the unbiased value of 0.50, t(43) = −4.15, p < .001, d = −.63, indicating a bias toward the non-optimal choice (lower boundary) at the start of a trial. There were no significant effects of group, set, or their interaction on starting point bias (all ηp² < .02).
Non-decision time
There were no significant effects of diagnostic group, set or their interaction on the non-decision time parameter.
Integrating parameter findings
Model-estimated non-decision time did not differ across the study conditions, implying that the time to encode stimuli and prepare responses did not significantly vary between groups. Rather, study conditions affected reinforcement learning and decision processes (see below).
The relative increase in negative PE learning rate from set 1 to set 2 among rAN implies that, with greater task exposure, rAN more readily learned than cCN when they received less reward or greater punishment than expected.
Findings involving decision processes included effects on the separation of the optimal and non-optimal choice boundaries (base boundary separation) and on the mean rate of information extraction from the stimulus (drift rate). Although boundary separation was smaller on set 2 than set 1 for both groups, the difference was larger for rAN. At the same time, the drift rate was slower for set 2 stimuli than set 1 on reward trials for rAN. When drift rate is fixed, less boundary separation causes the extracted information to reach a response boundary more quickly; however, drift rate slowed more for set 2 reward stimuli than for set 1 reward stimuli in rAN. Together, tighter boundary separation and slower drift rate could counterbalance one another and account for the absence of a significant interaction of group and set on RT. However, together these effects could explain the reduced accuracy seen in rAN on set 2, because responses would be based on less extracted information than on set 1.
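A simple simulation illustrates how these two changes can offset one another on RT while still reducing accuracy (an illustrative sketch only; the parameter values below are made up rather than fitted estimates, chosen so that mean decision time stays roughly constant while the boundary narrows and the drift slows):

```python
import numpy as np

rng = np.random.default_rng(1)

def simulate_ddm(v, a, n=1000, z=0.5, dt=0.001, noise=1.0):
    """Simulate n diffusion trials; return mean decision time and accuracy
    (proportion of trials ending at the upper, optimal boundary)."""
    rts, optimal = [], []
    for _ in range(n):
        x, t = z * a, 0.0
        while 0.0 < x < a:
            x += v * dt + noise * np.sqrt(dt) * rng.normal()
            t += dt
        rts.append(t)
        optimal.append(x >= a)
    return float(np.mean(rts)), float(np.mean(optimal))

# Wider boundary + faster drift (set 1-like) vs. narrower boundary + slower drift (set 2-like)
rt1, acc1 = simulate_ddm(v=2.0, a=1.5)
rt2, acc2 = simulate_ddm(v=1.0, a=1.25)
print(f"set 1-like: mean decision time = {rt1:.2f}s, accuracy = {acc1:.2f}")
print(f"set 2-like: mean decision time = {rt2:.2f}s, accuracy = {acc2:.2f}")
```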
The absence of significant main effects or interactions on the scale parameter suggests the two groups might not have differed in their sensitivity to differences in choice expectancies. Finally, our results did not support a group difference in global choice bias.
Analyzing posterior predictions of response time and proportion optimal choices
The modeled values appear to mirror the key interactions from the observed data quite well (compare the top and bottom rows of Figures 2 and 3). ANOVAs on the modeled RT and accuracy values supported their similarity to the observed data (Supplemental Materials).
Exploratory clinical associations
No associations between age, mood or personality variables and RLDDM parameters were detected in rAN (uncorrected p < .05). For illness-related variables, the only association to meet statistical significance after controlling for multiple comparisons was the association between age of AN onset and learning rate for negative PE on set 2 (r = .57, p = .004) and across both sets (r = .52, p = .009), demonstrating that women with a later age of onset had greater learning rates following negative PE.
Discussion
To better understand the latent processes involved in choice behavior during reinforcement-based decision-making in rAN, this study incorporated the drift diffusion model of decision-making as the choice function, instead of using only a single choice parameter. Three key findings emerged: (1) participants had greater learning rates for positive PE than negative PE, with no significant group differences observed for positive PE; (2) the rAN group had greater learning rates for negative PE than cCN; moreover, exploration of a three-way interaction indicated a significant increase in negative PE learning rate from set 1 to set 2 trials for rAN participants, and negative PE learning rate on set 2 trials was associated with later age of AN onset; and (3) within a three-way interaction, the rAN group had a lower drift rate for reward trials on set 2, suggesting that their poorer optimal-choice accuracy for set 2 reward trials was explained by less information uptake.
Findings that rAN do not differ from cCN in learning rate from positive PE, and have a greater learning rate than cCN from negative PE, contrast with our previous findings in women with acute AN, suggesting the impairment in the acute illness might be related to state-specific factors, such as psychological symptom severity and/or malnutrition. The current finding of a greater learning rate for negative PE in rAN is consistent with studies of reinforcement learning under changing learning rules (Bernardoni et al., Reference Bernardoni, Geisler, King, Javadi, Ritschel, Murr, Reiter, Rössner, Smolka, Kiebel and Ehrlich2018; Bernardoni et al., Reference Bernardoni, King, Geisler, Ritschel, Schwoebel, Reiter, Endrass, Rössner, Smolka and Ehrlich2021; Filoteo et al., Reference Filoteo, Paul, Ashby, Frank, Helie, Rockwell, Bischoff-Grethe, Wierenga and Kaye2014) and changing outcome contingencies (Pike et al., Reference Pike, Sharpley, Park, Cowen, Browning and Pulcu2023) demonstrating increased punishment learning in both ill and remitted states, and suggests increased punishment learning in more stable learning contexts as well.
AN is also characterized by cognitive inflexibility, both in ill and remitted states (Miles et al., Reference Miles, Gnatt, Phillipou and Nedeljkovic2020; Roberts et al., Reference Roberts, Tchanturia, Stahl, Southgate and Treasure2007; Wu et al., Reference Wu, Brockmeyer, Hartmann, Skunde, Herzog and Friederich2014). However, few studies have examined the degree to which cognitive inflexibility and difficulty set shifting in AN contribute to altered reinforcement learning. We previously found that women ill with AN had smaller explore-exploit parameter values during RL, suggesting that they were less decisive about exploiting what they had learned (Wierenga et al., Reference Wierenga, Reilly, Bischoff-Grethe, Kaye and Brown2022). However, since the single explore-exploit parameter encompasses all of the decision processes leading to a choice, the current study aimed to increase precision by examining whether the DDM provides a more fine-grained characterization of altered decision processes in rAN during instrumental learning. An examination of the DDM parameters related to choice behavior revealed no differences between cCN and rAN in starting point bias (both groups were biased toward the non-optimal choice at the start of each trial, before decision evidence was available), boundary power (groups did not differ in becoming less cautious when making a choice with greater practice), or scale (no group differences in the importance of expectancy differences). rAN and cCN groups also did not differ on non-decision time, suggesting no differences in the time spent encoding the stimuli and engaging in motor processes. The decision parameters that affected accuracy findings, especially on set 2 for rAN, were the slower drift rate and the narrower base boundary separation. These findings suggest the hypothesis that differences in the single decision parameter we previously reported might have been driven by slower information extraction rather than by inflexible or perseverative responding.
Like other investigators, we chose to use symbolic feedback in order to investigate reinforcement learning processes generally (Chan et al., Reference Chan, Ahn, Bates, Busemeyer, Guillaume, Redgrave, Danner and Courtet2014; Filoteo et al., Reference Filoteo, Paul, Ashby, Frank, Helie, Rockwell, Bischoff-Grethe, Wierenga and Kaye2014; Foerde & Steinglass, Reference Foerde and Steinglass2017). Assuming the present results and those from our previous study would generalize from symbolic to food feedback, some additional hypotheses based on our findings are possible. We previously found that the magnitude of negative PE when punishment was possible was most strongly associated with treatment outcome in ill AN; larger negative PEs predicted less weight gain over the course of treatment (Wierenga et al., Reference Wierenga, Reilly, Bischoff-Grethe, Kaye and Brown2022). Deficits in learning from punishment have been hypothesized to help explain the rigid persistence of disordered eating behaviors despite their negative consequences in AN. In the current study, greater learning rate for negative PE in rAN was associated with later age of AN onset, suggesting that worse negative PE learning may be an indicator of early neurodevelopmental disruption. The PE learning rate findings from the present study suggest that with increasing exposure to rewarding and punishing outcomes, rAN shift attention to learning from negative PE while slowing down extraction of information from rewarding stimuli due to limited attentional resources. The increased emphasis on negative over positive prediction error learning in rAN that was not observed in ill AN might be a marker of recovery (Wierenga et al., Reference Wierenga, Reilly, Bischoff-Grethe, Kaye and Brown2022). Hypotheses suggested by the current findings require testing in studies using food as feedback.
A key strength of this study is the innovative incorporation of the DDM as the choice mechanism within a PE reinforcement learning model, providing a richer characterization of the contribution of theoretically distinct aspects of decision-making to reinforcement learning in rAN. Other strengths include refinements to the PE model by modeling separate trial-specific positive and negative PE learning rates, and the use of hierarchical Bayesian analysis to simultaneously estimate individual and group parameters to ensure reliable and mutually constrained parameter estimates for complex models (Gelman et al., Reference Gelman, Carline, Stern, Dunson, Vehtari and Rubin2013; Kruschke, Reference Kruschke2010). As shown in Supplemental Materials, our models demonstrated good fit to the behavioral data. Moreover, the similarity of learning rate values among community controls estimated from the RLDDM in the current study to those from the RL model we previously published (Wierenga et al., Reference Wierenga, Reilly, Bischoff-Grethe, Kaye and Brown2022) supports the assumption that adding the DDM apparatus to the RL architecture does not greatly alter the reinforcement learning mechanism.
Despite these strengths, this study was limited by its cross-sectional design. Longitudinal studies are needed to determine the causal role of learning rate, particularly for negative PE, in clinical outcome and to determine how enduring our findings are. Similarly, retest data are needed to determine the reliability of task measures and model parameter values. The exclusion of individuals with a current Axis I diagnosis from the rAN group may limit the external validity of the study findings, though this is a strength with respect to internal validity. Small sample sizes precluded AN subtype analyses (e.g., restricting-only vs. purging). The rAN group was also older and less racially representative; sensitivity analyses suggest age did not contribute to group performance differences. Groups did not differ on reaction time or on non-decision time, suggesting the rAN group did not have slowed processing speed indicative of residual cognitive symptoms. Thus, it is unlikely that reduced accuracy on reward trials over time in rAN reflects broader cognitive sequelae. Moreover, our stringent remission criteria mitigated potential confounding influences of malnutrition, comorbid psychopathology, and medication effects on performance. Finally, direct comparisons of ill and remitted AN within the same study are needed to test hypotheses about the state versus trait status of findings and potential markers of recovery.
Conclusions
This is the first study to evaluate the contribution of theoretically distinct aspects of decision-making to reinforcement learning in remitted AN by integrating computational models of reinforcement learning with a drift diffusion model of decision-making. Using an instrumental probabilistic associative learning task that included positive and negative outcomes, we observed a greater negative PE learning rate with extended stimulus exposure, and slower information uptake with less cautious responding over time on reward trials in rAN, suggesting that with increasing exposure to rewarding and punishing outcomes, rAN shift attention to learning from negative PE while slowing extraction of information from rewarding stimuli. Simultaneous estimation of RL and DDM parameters provides a fine-grained analysis of the cognitive processes underlying speeded binary decisions during reinforcement learning. The increased emphasis on negative over positive feedback during trial-by-trial learning in rAN, which was not previously observed in ill AN, suggests this might reflect a marker of recovery and a potential target of treatment.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S1355617725000013.
Competing interests
There are no competing interests.
Funding statement
This research received no specific grant from any funding agency, commercial or not-for-profit sectors.