Introduction
Bipolar disorder (BD) is a chronic, cyclical mood disorder involving periods of elevated mood (mania/hypomania) and periods of depressed mood. Prospective longitudinal studies have indicated that patients experience mood symptoms for around half of the time they have the disorder, but while the characteristic feature of the disorder is (hypo)mania, depressive symptoms are far more prevalent (Judd et al., 2002, 2003). Its etiology is unknown, and considerable work in recent years has been undertaken to characterize the functional, cognitive, and social deficits associated with the illness (Bonnín et al., 2010; Fagiolini et al., 2005; Goetz, Tohen, Reed, Lorenzo, & Vieta, 2007; Green, Cahill, & Malhi, 2007; MacQueen, Young, & Joffe, 2001; Van Rheenen & Rossell, 2014b). Emotion processing in BD has received increasing attention in an attempt to understand whether some element of dysfunction in the processing of emotional stimuli plays a part in clinical mood symptoms (Van Rheenen & Rossell, 2013). Part of that endeavor has involved exploring facial expression recognition to capture emotion-decoding and labeling processes. Given the central importance of emotional expressions in day-to-day communication, deficits (reduced accuracy) or biases (greater sensitivity to specific emotions, or a tendency to consistently interpret emotional stimuli in a particular way) in emotion processing could be of relevance to the experience of mood episodes or to the impaired social functioning seen in BD (Miklowitz, 2011; Sanchez-Moreno et al., 2009).
The findings of studies exploring facial emotion processing in BD are characterized by variability rather than supporting a single deficit or bias in emotion processing (Kohler, Hoffman, Eastman, Healey, & Moberg, 2011; Van Rheenen & Rossell, 2013). In part, this may be due to differences in the methods used (e.g., facial image sets, emotion categories used/contrasted with one another, labeling versus discrimination tasks, stimulus display time, response format), the population studied (or pooled BD subtypes/samples), and sample size. Even in samples of euthymic BD patients there is considerable variability in the findings and conclusions of extant studies, with some reporting differences in the recognition of particular emotions, for example, enhanced recognition of disgust (Harmer, Grayson, & Goodwin, 2002), poorer recognition of fear (Martino, Strejilevich, Fassi, Marengo, & Igoa, 2011; Vederman et al., 2012; Venn et al., 2004), poorer recognition of sadness (Vederman et al., 2012), or poorer recognition of happiness and disgust (Yalcin-Siedentopf et al., 2014); others reporting difficulties with emotion discrimination generally (Addington & Addington, 1998; Bozikas, Tonia, Fokas, Karavatos, & Kosmidis, 2006); others reporting no significant differences in facial expression recognition (Addington & Addington, 1998; Lee et al., 2013; Lembke & Ketter, 2002; Rowland et al., 2012); and others reporting none specifically in patients without a history of psychotic illness features (Thaler et al., 2013). In symptomatic patients, the picture is no clearer, with some studies reporting no differences on one or other of recognition, discrimination, or sensitivity (Bellack, Blanchard, & Mueser, 1996; Edwards, Pattison, Jackson, & Wales, 2001; Summers, Papadopoulou, Bruno, Cipolotti, & Ron, 2006; Vaskinn et al., 2007). Here clinical heterogeneity is also an issue, with these studies variously including patients defined as being generally symptomatic (without specific depression or mania ratings), having "affective psychosis" (including some patients in mixed and manic states), or belonging to a sub-group with varying degrees of residual depressive symptoms.
Others have reported differences in recognition in manic patients, generally without exploring specific emotions (Getz, Shear, & Strakowski, 2003), or worse recognition of surprise but better recognition of disgust in patients compared to controls (Summers et al., 2006). In bipolar depression, differences specifically in sensitivity (i.e., the "amount" of any particular emotion that needs to be present for the emotion to be correctly recognized) have been reported in two relatively small samples (n=14 and n=21, respectively) (Gray et al., 2006; Schaefer, Baumann, Rich, Luckenbaugh, & Zarate, 2010).
To make sense of these disparate and contradictory findings, further studies are needed to develop our understanding of the extent to which emotional processing (specifically the processing and accurate labeling of different facial emotions) may be affected in BD. Studies in relatively large samples of well-characterized patients in clearly defined mood states, assessing alternative emotion processing/labeling paradigms, would go some way toward addressing this gap.
In a recent article in this journal, Van Rheenen and Rossell (2014a) used a series of face-processing paradigms in a pooled sample of patients with BD in different mood states. Three tasks were administered, each using four basic emotions (happy, sad, anger, and fear): emotion labeling of full-intensity dynamically morphed images (i.e., where static faces are presented rapidly through successive frames from a neutral to the final emotional expression, thereby being perceived as a moving image); emotion labeling of static images of different emotion intensities [high (100%), medium (75%), and low (50%)]; and emotion discrimination of static images using the same three intensity levels. When all three tasks were assessed simultaneously, patients with BD were significantly less accurate than controls overall, although the effect was not seen for all of the tasks when analyzed individually. However, significant differences between groups on individual emotions were not evident. This led the authors to conclude that there was evidence of a broad deficit in aspects of emotion processing in BD, with effect sizes in the small to medium range. The comprehensive set of tasks used is undoubtedly a strength of the study and serves to highlight the extent to which methodological variations in task demands may contribute to the varied findings in this field. The patient cohort included a mix of depressed, hypomanic, mixed, and euthymic states, which were pooled for the primary analyses. While follow-up analyses indicated no statistical differences between these mood states, the size of the subgroups and the complexity of the analyses in a repeated-measures design may have limited the statistical power of post hoc contrasts to detect differences, which the authors identify as relatively subtle in the group as a whole and which were not detected in all tasks (Van Rheenen & Rossell, 2014a).
To further explore the impact of current mood episode and task variations on emotion processing deficits in individuals with BD compared to healthy unaffected controls, the present investigation reports data from two independent studies, each comparing bipolar patients with healthy controls on a series of tasks designed to assess the perception/labeling of emotion from the human face. The first study was conducted in a well-characterized sample of prospectively verified euthymic BD patients and involved emotion labeling of static images of five basic emotions (angry, happy, fearful, sad, disgusted) at different intensity levels, and static facial expression recognition of complex emotions. The second study was conducted in a well-characterized sample of depressed bipolar patients, where it was anticipated that any group differences resulting from emotion processing deficits would be larger because patients were symptomatic (effectively "adding" state-related effects to the purported trait-related deficit). To maximize the ecological validity of the second study, the tasks involved emotion labeling of dynamic facial expressions (of the same five basic emotions used in the first study) displayed at up to four different intensity levels, in addition to a standardized task of processing more ambiguous expressions: labeling static images of "blended" emotions (Young et al., 2002). It was anticipated that emotion labeling deficits would be observed in euthymic patients compared to controls and that between-group differences would be significantly greater in symptomatic patients.
Study 1: Euthymia
To assess the mood-state independence of basic emotion recognition ability in bipolar disorder, Study 1 focused on testing patients when euthymic.
Methods
Participants
Sixty-six participants were recruited (n=38 bipolar patients and n=28 controls). Patients were recruited from secondary and tertiary psychiatric services throughout the North East of England. Inclusion criteria comprised: aged 18–65 years; a DSM-IV diagnosis of bipolar disorder, confirmed using the Structured Clinical Interview (SCID; First, Spitzer, Williams, & Gibbon, 1995); and currently euthymic [scores ≤7 on both the 17-item Hamilton Depression Rating Scale (Hamilton, 1960) and the Young Mania Rating Scale (Young, Biggs, Ziegler, & Meyer, 1978), prospectively verified for 4 weeks before testing; for details see Thompson et al. (2005)]. Exclusion criteria comprised: current alcohol misuse/dependence, history of head injury with loss of consciousness, neurological or major medical illness, electroconvulsive therapy within the last 6 months, learning disability, or difficulty with fluent use of the English language. Patients were not excluded for use of psychotropic medication or for comorbid anxiety disorders [comorbidities were assessed using the Mini-International Neuropsychiatric Interview (Sheehan et al., 1998)].
Control participants were recruited via local advertisements. They were subject to the same exclusion criteria as the patient sample with the addition of no personal history of psychiatric illness and no family history of BD in a first-degree relative. The study was approved by the Newcastle Research Ethics Committee. All participants gave written informed consent (see Table 1 for demographics).
Measures
Facial Expression Recognition Task – Static Images (FERT-static)
The task was based on versions used in earlier studies (Harmer et al., 2002; Montagne et al., 2007). Participants were presented with a black-and-white still photograph of a face showing one of five expressions (angry, disgusted, fearful, happy, or sad) or a neutral expression. The images were drawn from the Ekman series (Ekman & Friesen, 1976) and morphed with neutral (Tiddeman, Burt, & Perrett, 2001) to produce expressions that varied in intensity, before being masked from the bottom of the chin to the top of the forehead (thereby covering the hair and ears). Four individuals from the Ekman series were used (two male, two female), each posing the five expressions plus neutral. Each expression was therefore shown sixteen times: four times at each of four intensity levels (20%, 40%, 60%, 80%; 5 emotions × 16 presentations = 80 stimuli). The neutral expression was shown four times (once per individual; 84 stimuli in total).
The face was presented on a black background (333 × 482 pixels) on the left-hand side of the screen for 1000 ms (see Figure 1a). After presentation, a solid black mask covered the image and the participant was instructed to indicate the expression (see Figure 1b). The words "Angry," "Disgusted," "Fearful," "Happy," "Sad," and "Neutral" were presented on the side of the screen, listed in alphabetical order. It was not possible to respond while the face was still displayed.
Fig. 1 Screen shot of the response options for the Facial Expression Recognition Task using static images. Faces were presented in the black rectangle for 1 s.
The task began with practice trials so that participants could familiarize themselves with the procedure. These involved six presentations: each of the five emotions at 100% intensity plus one neutral face. The pictures were of the same individual, who was not used again in the task. The practice trials were presented in the same fixed order to all participants; the 84 experimental trials were presented in a random order to each participant.
Stimuli were presented using SuperLab 4.0 (Cedrus) and responses were recorded using a 15″ CTX resistive-touchscreen monitor. Responses were self-paced, with the next stimulus appearing only after the participant had responded to the previous one. The outcome measure of interest was the number of correct responses at each intensity level for each emotion. Reaction time was not analyzed, as participants were not instructed to respond as quickly as possible.
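For concreteness, the stimulus-set arithmetic can be expressed as a short Python sketch (illustrative only; the identity labels are hypothetical and the original task was programmed in SuperLab):

```python
import itertools
import random

# Illustrative reconstruction of the FERT-static trial list: 4 identities
# (2 male, 2 female; the labels below are hypothetical) x 5 emotions x
# 4 intensity levels = 80 emotional trials, plus 1 neutral trial per
# identity = 84 experimental trials in total.
IDENTITIES = ["f1", "f2", "m1", "m2"]       # hypothetical identity labels
EMOTIONS = ["angry", "disgusted", "fearful", "happy", "sad"]
INTENSITIES = [20, 40, 60, 80]              # % morph toward the full emotion

trials = [
    {"identity": ident, "emotion": emo, "intensity": level}
    for ident, emo, level in itertools.product(IDENTITIES, EMOTIONS, INTENSITIES)
]
trials += [{"identity": ident, "emotion": "neutral", "intensity": 0}
           for ident in IDENTITIES]

assert len(trials) == 84                    # 80 emotional + 4 neutral
random.shuffle(trials)                      # random order for each participant
```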
“Reading-the-Mind-in-the-Eyes” test
This task is described in detail by Baron-Cohen, Wheelwright, Hill, Raste, and Plumb (2001). It was used to assess identification of complex emotions. Although described as a measure of "theory of mind," the task shares features with facial expression recognition paradigms and is interpreted in this way here. Participants are shown a single picture of the eye region of a face presented on an A4 page. The picture is surrounded by four adjectives describing a mental state (e.g., perplexed, horrified, astonished). The participant is instructed to identify which of the words best describes what the person in the picture is thinking or feeling and to circle their choice on a separate answer sheet. After a single practice item, 36 experimental items are completed one after the other in a self-paced manner. Response time is not recorded. The outcome measure of interest was the number of correct responses.
Procedure
The tests were administered as part of a wider battery of neuropsychological tests (Robinson, 2010). Participants completed the FERT-static before the Eyes test, with unrelated tasks in between. The whole assessment took ~2 hr, and participants were able to take breaks.
Data analysis
Data were analyzed using SPSS v.17.0. A significance level of p<.05 was adopted. Patients and controls were compared using independent-samples t-tests, χ²-tests or, for tests that involved multiple levels or repetitions, repeated-measures analysis of variance (ANOVA). For t-tests, when Levene's F-test identified instances of unequal variance, corrected p-values were reported. Effect sizes were calculated using Cohen's d or, for factors from the ANOVA, partial eta squared (ηp²). For Cohen's d, positive effect sizes indicate higher scores by the control group. The proportion of the patient sample scoring below the 10th percentile of the control group was calculated for each emotion and each intensity level.
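For transparency, the effect-size and percentile metrics just described can be computed as in the following minimal Python sketch (illustrative only; the analyses themselves were conducted in SPSS, and the function names are our own):

```python
import numpy as np

def cohens_d(control_scores, patient_scores):
    """Cohen's d with pooled SD; positive values = higher control scores."""
    c = np.asarray(control_scores, dtype=float)
    p = np.asarray(patient_scores, dtype=float)
    pooled_sd = np.sqrt(((len(c) - 1) * c.var(ddof=1) +
                         (len(p) - 1) * p.var(ddof=1)) / (len(c) + len(p) - 2))
    return (c.mean() - p.mean()) / pooled_sd

def partial_eta_squared(f, df_effect, df_error):
    """Recover partial eta squared from a reported F ratio and its dfs."""
    return (f * df_effect) / (f * df_effect + df_error)

def prop_below_10th(patient_scores, control_scores):
    """Proportion of patients scoring below the controls' 10th percentile."""
    cutoff = np.percentile(np.asarray(control_scores, dtype=float), 10)
    return float(np.mean(np.asarray(patient_scores, dtype=float) < cutoff))

# Worked check against a value reported below: the Study 1 main effect of
# group, F(1,64) = 0.59, implies a partial eta squared of ~0.01.
print(round(partial_eta_squared(0.59, 1, 64), 3))   # -> 0.009
```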
Results
FERT-static
The results of the facial expression recognition task are shown in Table 2. A 5 (emotion) × 4 (intensity) × 2 (group: patient, control) repeated-measures ANOVA indicated no significant main effect of group (F(1,64)=0.59; p=.45; ηp²=0.01). There were significant main effects of emotion (F(4,256)=66.44; p<.001; ηp²=0.51) and intensity (F(3,192)=583.77; p<.001; ηp²=0.90). Follow-up t-tests indicated that the main effect of emotion reflected happy expressions being recognized significantly more easily than each of the other emotions (all p<.05) and anger being recognized significantly more poorly than the other emotions (all p<.05) except sadness (p=.097). There was a significant group × intensity interaction (F(3,192)=2.96; p=.034; ηp²=0.04), but follow-up independent-samples t-tests did not indicate a significant difference between the groups at any intensity level (all p>.084); the interaction could therefore not be localized to particular comparisons. The group × emotion interaction was not significant (F(4,256)=1.13; p=.34; ηp²=0.02), nor was the three-way interaction between group, intensity, and emotion (F(12,768)=0.51; p=.91; ηp²=0.01). An independent-samples t-test showed no significant difference between the two groups for recognition of neutral faces (t(64)=0.81; p=.42).
“Reading-the-Mind-in-the-Eyes” test
There were no significant differences between patients and controls on this task [patient mean (SD)=26.69 (4.03); control mean (SD)=26.79 (3.5); t(62)=0.10; p=.93].
Summary of Study 1
In this well-characterized, prospectively verified sample of euthymic BD patients, there were no significant differences from controls in the labeling of static facial expressions of (primary or complex) emotions. Images were presented at intensities as low as 20% and 40%, making the task more difficult (although not at floor level) and, therefore, more likely both to expose group differences and to avoid ceiling effects in the control group. Despite this, no statistically significant differences were observed. Small effects (0.2<d<0.5) were observed for recognition of angry, disgusted, and fearful expressions at the higher intensity levels, indicating poorer recognition by the patient sample, and a small effect size indicated better recognition of happiness at the lowest intensity by the patient group. Thus there may be subtle differences in processing/labeling emotions that become clearer when patients are symptomatic or when stimuli are more naturalistic or ambiguous.
Study 2: Depression
In a second study, we aimed to examine emotional expression labeling in bipolar patients in a current depressive episode. We also administered a dynamic version of the facial emotion recognition test, an approach that has been suggested to hold many advantages over typical static displays, including increased ecological validity (for a review, see Krumhuber, Kappas, & Manstead, 2013). In addition, we administered a standardized, well-validated "static" facial emotion labeling task from the Facial Expressions of Emotion: Stimuli and Tests (FEEST) battery (Young et al., 2002).
Methods
Participants
A total of 100 participants (n=53 bipolar patients and n=47 matched controls) were recruited. Recruitment was part of a larger research program into the effects of glucocorticoid receptor antagonists in bipolar depression, which involved a comprehensive baseline assessment of neuropsychological processing, including emotional processing (Gallagher, Gray, Watson, Young, & Ferrier, 2014) (Table 1).
Table 1. Demographic details of the patient samples
NART, National Adult Reading Test; HDRS-17, Hamilton Depression Rating Scale 17-item; YMRS, Young Mania Rating Scale; BDI, Beck Depression Inventory; AMRS, Altman Mania Rating Scale
Patients were aged 18–65 years with a diagnosis of BD, confirmed using the Structured Clinical Interview (SCID; First et al., 1995), and were recruited from secondary/tertiary care in North East England. All were out-patients currently in a SCID-defined depressive episode. Patients were excluded if they met criteria for any other current Axis I disorder or substance dependence/abuse. All were receiving medication at the time of testing (stable for ≥4 weeks). Healthy controls were recruited by general advertisement. All controls were screened to exclude personal or family (first-degree) history of psychiatric illness, significant medical/neurological illness, or history of drug/alcohol abuse. The study was approved by the Newcastle and North Tyneside Local Research Ethics Committee. Written informed consent was obtained from all participants.
Table 2. Results of the facial expression recognition task in euthymic patients using static stimuli.
Note. Means and standard deviations of % correct at each intensity level for each emotion.
a Of the control sample.
Measures
Facial Expression Recognition Task – Dynamic Images (FERT-dynamic)
Similar to the FERT-static, this version of the task uses faces from Ekman and Friesen (1976), cropped to isolate the face. Two male and two female faces were used (sets: jj, pe, pf, mo). The program rapidly displays the images (~50 ms per image), which change from neutral (0% intensity) to the full prototypical emotion (100% intensity) in 5% steps, producing a dynamic morphing effect. This 1000-ms "stream" can be terminated at any of these steps, allowing emotional morphs in 5% increments. For this study, after a short practice block, 80 trials were administered in random order, divided into four blocks with a rest permitted between each. In total there were 16 trials for each of five emotions (happy, sad, anger, disgust, fear), with four intensity levels per emotion (30%, 50%, 70%, 100%). Participants make their response by pressing one of the five emotion labels presented on the right of the screen; these are active only after the "morph" has completed and the face has disappeared.
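The timing arithmetic of the morph stream can be illustrated with a brief sketch, assuming the constant ~50-ms frame duration described above (illustrative only, not the presentation code itself):

```python
# Timing arithmetic of the dynamic morph "stream": intensity rises from
# neutral in 5% steps at ~50 ms per frame, so a full 100% morph spans
# ~1000 ms and lower-intensity trials terminate (and display) sooner.
FRAME_MS = 50   # assumed constant frame duration (~50 ms per image)
STEP_PCT = 5    # intensity increment per frame

def morph_schedule(target_intensity):
    """Return (intensity %, cumulative onset ms) for each frame of a trial."""
    levels = range(STEP_PCT, target_intensity + 1, STEP_PCT)
    return [(level, (i + 1) * FRAME_MS) for i, level in enumerate(levels)]

for target in (30, 50, 70, 100):
    frames = morph_schedule(target)
    print(f"{target}%: {len(frames)} frames, {frames[-1][1]} ms on screen")
# 30%: 6 frames, 300 ms ... 100%: 20 frames, 1000 ms
```

On this arithmetic, the 30% trials are on screen for the shortest time, a point we return to in the Discussion.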
Benton Facial Recognition Test (short form) (Benton, Sivan, Hamsher, Varney, & Spreen, 1983)
The BFRT was administered as a control task to examine general face recognition ability. The short form contains 13 trials (maximum score=27). On each item, participants are presented with a target black and white photograph and are asked to choose the target individual from six faces, presented simultaneously with the target photograph.
Emotional Hexagon test (FEEST)
The Emotional Hexagon test from the FEEST was administered according to the standardized instructions (Young et al., 2002). The test uses one actor (jj) from Ekman and Friesen (1976) displaying six emotional expressions (happiness, surprise, fear, sadness, disgust, anger). Each emotion is blended with the two it is most often confused with, resulting in blends over five continua: happiness–surprise, surprise–fear, fear–sadness, sadness–disgust, and disgust–anger; a final anger–happiness blend completes the hexagon. Blends are displayed in five different proportions of the two emotions: 90:10, 70:30, 50:50, 30:70, and 10:90%. This results in 30 unique stimuli, each displayed randomly 5 times over the course of the task, giving a total of 150 experimental trials. Participants respond by pressing the one of six emotion labels presented along the bottom of the screen that most closely matches the face they saw.
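The structure of the stimulus set can be summarized in a short sketch (illustrative only; the label strings are our own):

```python
import itertools

# Structure of the Emotional Hexagon stimulus set: six continua around
# the hexagon, each blended in five proportions = 30 unique images; each
# image is shown 5 times, giving 150 experimental trials.
CONTINUA = [
    ("happiness", "surprise"), ("surprise", "fear"), ("fear", "sadness"),
    ("sadness", "disgust"), ("disgust", "anger"), ("anger", "happiness"),
]
PROPORTIONS = [(90, 10), (70, 30), (50, 50), (30, 70), (10, 90)]

stimuli = [
    f"{a} {pa}% : {b} {pb}%"
    for (a, b), (pa, pb) in itertools.product(CONTINUA, PROPORTIONS)
]
assert len(stimuli) == 30     # 6 continua x 5 blend proportions
trials = stimuli * 5          # five presentations of each unique blend
assert len(trials) == 150
```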
Results
Two patients did not complete the emotion recognition tasks, so results are presented for the remaining 51 who had full valid data.
FERT-dynamic
The results of the facial expression recognition task using dynamic stimuli in depressed patients are shown in Table 3. A 5 (emotion) × 4 (intensity) × 2 (group) repeated-measures ANOVA indicated no significant main effect of group (F(1,96)=2.23; p=.14; ηp²=0.02). There was a significant main effect of emotion (F(4,384)=76.77; p<.001; ηp²=0.44), indicating differences in overall labeling accuracy (happy being the most easily detected, averaging 95.9% collapsed across group and intensity, and disgust the most difficult, at 58.4%), and a main effect of intensity (F(3,288)=104.30; p<.001; ηp²=0.52), with accuracy increasing with increasing intensity. There was no significant group × emotion interaction (F(4,384)=0.71; p=.59; ηp²=0.01) and no interaction between group, intensity, and emotion (F(12,1152)=1.15; p=.31; ηp²=0.01), although the group × intensity interaction was significant (F(3,288)=2.96; p=.033; ηp²=0.03), with patients performing worse than controls at the 30% intensity level.
Table 3. Results of the facial expression recognition task in depressed patients using dynamic stimuli.
Note. Means and standard deviations of number correct at each intensity level for each emotion.
a Of the control sample.
Effect sizes showed small differences for the recognition of disgust and happiness at the lowest intensity level, indicating poorer recognition by the patients. Small effects were also noted for poorer recognition of fear by the patients at the 30%, 50%, and 100% intensity levels, and there was a medium effect size (0.5≤d<0.8), again reflecting poorer performance by the patients, for the recognition of anger at the lowest intensity level. These effects are commensurate in magnitude with those noted in euthymic patients, not larger as anticipated. As in the euthymic sample, the majority of the calculated effect sizes were d<0.2.
BFRT
BD patients were significantly poorer than controls on the BFRT (t(98)=−2.41; p=.02), although this corresponded to only a 1-point difference in performance (BD: mean=22.8, SD=2.32; controls: mean=23.8, SD=1.72).
FEEST
Data from the Emotional Hexagon paradigm (Figure 2) were available in a sub-set of 51 participants (26 bipolar depressed patients and 25 controls). A 6 (emotion: angry, disgusted, fearful, happy, sad, surprised) × 2 (group: patient, control) repeated-measures ANOVA indicated no significant main effect of group (F(1,49)=1.56; p=.22; ηp²=0.03) and no group × emotion interaction (F(5,245)=0.31; p=.85; ηp²=0.01). A significant main effect of emotion was observed (F(5,245)=13.66; p<.001; ηp²=0.22). Pairwise comparisons revealed that accuracy for happy and sad faces did not differ between the two but was significantly higher than for all other emotions, whereas accuracy for disgusted, angry, and fearful faces did not differ among the three but was significantly lower than for the other emotions (p<.05).
Fig. 2 Results of the facial expression recognition task in depressed patients using static blended stimuli (emotional hexagon).
Exploratory Analyses (Studies 1 and 2)
Correlations
The relationships between emotion-labeling performance and age and, in patients, length of illness were examined. From Study 1 (FERT-static), in euthymic patients the only significant correlations between length of illness and accuracy were for 40% anger (rs=−0.39; p=.02) and 80% happy (rs=−0.41; p=.01). Significant correlations with age were found for anger at 40% and 60% (rs=−0.42; p<.01 and rs=−0.40; p=.01), fear at 60% and 80% (rs=−0.42; p=.01 and rs=−0.37; p=.02), and happy at 60% (rs=−0.35; p=.03). In controls, age was correlated with anger at 60% (rs=−0.45; p=.02), fear at 40% and 60% (rs=−0.52; p=.005 and rs=−0.44; p=.02), and sad at 20% (rs=−0.44; p=.02). No significant correlations were observed with the Eyes test. From Study 2 (FERT-dynamic), in depressed patients the only significant correlation was between length of illness and 50% disgust (rs=−0.31; p=.04). In controls, age was correlated negatively with fear at 50% (rs=−0.32; p=.03), 70% (rs=−0.46; p=.001), and 100% (rs=−0.33; p=.02), and happy at 30% (rs=−0.32; p=.03). For the Emotional Hexagon, the only significant relationship was a positive correlation between disgust accuracy and age in patients (rs=0.44; p=.02). The overall effect of age on FERT performance was examined by analysis of covariance (ANCOVA). Age was a significant covariate in both the depressed (F(1,95)=4.04; p<.05; ηp²=0.04) and euthymic (F(1,63)=9.03; p<.01; ηp²=0.13) analyses but did not affect the overall pattern of significant findings (i.e., the significant main effects of emotion and intensity and the group × intensity interaction).
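For illustration, a correlation screen of this kind could be run as in the following minimal sketch (the data are synthetic and the dictionary of accuracy scores and its keys are hypothetical; the real analysis used observed scores):

```python
import numpy as np
from scipy.stats import spearmanr

# Sketch of the exploratory screen: Spearman's rho between age and accuracy
# in each emotion x intensity cell, reporting only significant cells.
rng = np.random.default_rng(0)
age = rng.integers(18, 66, size=38)                        # synthetic ages
accuracy = {("anger", 40): rng.integers(0, 5, size=38)}    # correct of 4 trials

for (emotion, intensity), scores in accuracy.items():
    rho, p = spearmanr(age, scores)
    if p < .05:
        print(f"{emotion} {intensity}%: rs = {rho:.2f}, p = {p:.3f}")
```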
Impact of General Neuropsychological Performance
Alongside the emotion recognition tasks in both Studies 1 and 2, a broader battery of neuropsychological tests was administered (see Robinson, 2010; Gallagher et al., 2014). To explore the effect of more general (non-emotion-related) cognitive processes on performance, we repeated the analysis of FERT data from Studies 1 and 2 with the addition of a covariate (ANCOVA). Two commonly used measures administered in both studies were the "FAS" verbal fluency test, assessing executive function (Benton et al., 1983), and the Digit Symbol Substitution Test (DSST; Wechsler, 1981), assessing psychomotor/processing speed; these were examined independently. In Study 1 (euthymia), both the DSST (F(1,63)=6.94; p=.01; ηp²=0.10) and the "FAS" (F(1,62)=4.06; p<.05; ηp²=0.06) were significant covariates (FAS: euthymic patients mean=43.3, SD=11.97; controls mean=48.0, SD=11.97; DSST: euthymic patients mean=48.3, SD=11.87; controls mean=54.4, SD=11.53). Their inclusion did not affect the significant main effects of emotion or intensity; however, the addition of the "FAS" rendered the previously observed group × intensity interaction non-significant (F(3,186)=2.29; p=.09; ηp²=0.04). In Study 2, only the DSST was a significant covariate (F(1,95)=5.61; p=.02; ηp²=0.06), which again did not affect the significant main effects of emotion or intensity but rendered the group × intensity interaction non-significant (F(3,285)=2.41; p=.07; ηp²=0.025) (FAS: depressed patients mean=38.2, SD=8.88; controls mean=44.5, SD=10.33; DSST: depressed patients mean=48.0, SD=11.76; controls mean=56.4, SD=11.35).
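The logic of these covariate analyses can be illustrated with a simplified between-subjects sketch (the reported analyses were repeated-measures ANCOVAs conducted in SPSS; the data and column names below are synthetic):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

# Simplified between-subjects ANCOVA sketch with synthetic data: does a
# group difference in mean FERT accuracy survive covarying processing
# speed (DSST)?
rng = np.random.default_rng(1)
n = 50
dsst = rng.normal(50, 12, size=2 * n)
group = np.repeat(["patient", "control"], n)
accuracy = (60 + 0.4 * dsst + 2.0 * (group == "control")
            + rng.normal(0, 5, size=2 * n))
df = pd.DataFrame({"accuracy": accuracy, "group": group, "dsst": dsst})

model = smf.ols("accuracy ~ C(group) + dsst", data=df).fit()
print(anova_lm(model, typ=2))   # Type II SS: group effect adjusted for DSST
```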
General Discussion
There were no significant differences between patient and control groups on any of the emotional expression measures used in the present study, with the exception of a group × intensity interaction in facial emotion labeling. Contrary to expectations, neither overall group differences nor emotion-specific differences were observed in symptomatic patients, or with tasks using stimuli that were either more ecologically valid (the dynamic FERT) or more ambiguous (labeling complex emotions or blends of different emotions). The only significant differences observed were either not associated with emotional processing (i.e., matching facial identity in depressed BD) or did not remain once general neuropsychological functioning was accounted for (the FERT group × intensity interactions). This differs from the recent findings of Van Rheenen and Rossell (2014a), where a general deficit in emotion recognition and discrimination was observed. It is worth noting that, unlike their study, the present studies did not include measures of emotion discrimination. Nonetheless, Van Rheenen and Rossell (2014a) noted differences on emotion recognition measures that were not evident in the present studies on similar tasks (emotion recognition of static or dynamic images displayed at different intensities). Our sample included patients in either the euthymic or the depressed phase of illness, explored as two separate groups. Combining groups of patients in different symptomatic states, including patients in the manic or hypomanic state, could be one reason why the results differ. A recent meta-analysis focusing exclusively on euthymia reported a significant effect for the Eyes test, in contrast to the present finding, although the effect size of the deficit in patients was small (Hedges' g=0.27) (Samamé, Martino, & Strejilevich, 2015). Interestingly, in facial emotion perception studies, the labeling of several individual emotions (anger, sadness, disgust) did not differ significantly between patients and controls (Hedges' g=0.15–0.25), although recognition of surprise and fear was significantly worse, again with small effects (Hedges' g=0.22–0.29). After excluding one outlying study, significantly greater impairment was observed for disgust and fear recognition in patients (Hedges' g=0.39–0.43).
The relatively comprehensive set of emotion recognition tests, including paradigms generally considered more difficult and, therefore, more likely to expose a deficit or bias (e.g., static images of low-intensity emotions), combined with large samples of well-characterized patients, is a strength of the present study. As with many studies in patient samples, low statistical power is a concern. While the present analyses were adequately powered (1−β≥80%) to identify large effect size differences for main effects of group, power was lower for smaller effect sizes, especially for interactions. Indeed, the observed effect sizes indicated small effects (0.2<d<0.5) on some measures, although many were below this threshold (d<0.2). This study adds to others (Addington & Addington, 1998; Bellack et al., 1996; Edwards et al., 2001; Lembke & Ketter, 2002; Rowland et al., 2012; Vaskinn et al., 2007) that have not reported evidence of significant impairment in facial emotion recognition in BD. It is difficult to infer clinical significance directly from statistical effect sizes, but it seems this element of emotion processing (specifically the labeling of displayed emotion) may be of limited importance in understanding the presentation of those with this disorder. However, we reiterate the specificity of our findings: we address only one aspect of emotion processing, the perception and labeling of emotion transmitted by the face/facial features. Numerous other processes have been examined in mood disorders, such as attentional bias for emotional stimuli, go/no-go biases, and memory/recall of emotional information (Jongen, Smulders, Ranson, Arts, & Krabbendam, 2007; Rubinsztein, Michael, Underwood, Tempest, & Sahakian, 2006; Wessa & Linke, 2009). Our studies are concerned only with this labeling process and cannot speak to questions around other processes, although it is critical for future studies to determine that their findings are clearly attributable to the emotional process per se and not secondary to a more general neurocognitive deficit.
It is important to note that the patient samples in our study did show significant neuropsychological deficits, with large effect sizes in many domains of "cold" cognition (Gallagher et al., 2014; Robinson, 2010); therefore, the absence of differences is not a consequence of recruiting high-performing patients with BD. To derive a sense of the relative scale of "impairment," the proportion of the patient group falling below the 5th/10th percentile of controls can be examined (Gallagher et al., 2014; Thompson et al., 2005). In the euthymic sample, the proportion of patients scoring below the 10th percentile on cognitive measures (administered alongside the facial expression battery) ranged from 2.6% to 53.8% (Robinson, 2010). These tests included measures of executive function, verbal declarative memory, working memory, and psychomotor speed. The domains showing the largest proportion of low-scoring patients were executive measures (category fluency, 53.8%) and verbal declarative memory (list-learning total recall, 42.1%). In contrast, the proportion of patients scoring below the 10th percentile on the facial expression recognition test, separated by intensity level, ranged from 2.6% to 18.4% (for the total score, 2.6–15.8%), suggesting less evidence of potential impairment on these measures. Data for the depressed patients showed a similar pattern. The cognitive performance of the depressed sample is detailed elsewhere (Gallagher et al., 2014): patients performed significantly worse on 18 of 26 measures examined, with large effect sizes (d>0.8) on tests of speed of processing, verbal learning, and specific executive/working memory processes. Almost all tests produced at least one outcome measure on which ~25–50% of the BD sample performed more than 1 SD below the control mean, and the proportion of patients performing below the controls' 10th percentile on accuracy measures ranged from 11.3% to 47.2%. In the present study, however, for the facial expression recognition task, the corresponding proportions at the separate intensity levels ranged from 0% to 29.4% (for the total score, 7.8–13.7%). Importantly, our exploratory analyses showed that including measures of executive function or psychomotor/processing speed as covariates could account for the group × intensity interactions seen in both FERT tasks. This is in line with a previous study suggesting that deficits in theory of mind and emotion labeling may be partly mediated by attentional-executive deficits (Martino et al., 2011). However, several caveats should be noted. First, directly comparing "cold" and "hot" cognitive tasks is problematic if the discriminating power of the tasks differs; indeed, with tests of differing reliabilities, the measure with the higher reliability coefficient will record a greater performance decrement for less able participants (Chapman & Chapman, 1973).
There are also several possible interpretations of the observed group × intensity interaction. While we suggest that the effect is a consequence of generalized deficits leading to difficulties with the most difficult/ambiguous stimuli (i.e., stimuli with the lowest "information" content), we cannot rule out the possibility that it reflects a specific deficit in low-level emotional perception (for both positive and negative emotions). It is therefore important for future studies to explore this effect within the task design itself, rather than post hoc through statistical methods.
Given the extent of these neuropsychological deficits, it may be that where individuals with BD have previously shown performance deficits on tasks involving facial expression perception, some of these findings were secondary to general difficulties in performing (lab-based experimental) tasks, rather than to deficits in facial expression perception per se. However, the effect of such general deficits might be expected to be fairly small (since one would hope that assessments of facial expression perception have a good degree of specificity) and to emerge as significant in a fairly random manner in some experiments but not others and, within these experiments, in some conditions but not others (contingent upon the precise demands of the task/condition); this pattern seems to describe the BD literature reviewed previously. For example, where facial expression perception experiments and analyses overlap with cognitive domains in which individuals with BD have deficits, they would be more likely to report significant results with greater effect sizes. It is of interest that functional magnetic resonance imaging studies have demonstrated that patterns of activation differ according to the demands of the task: direct matching of emotional facial expressions increases amygdala activation, whereas selecting the label that matches the expression (e.g., "afraid") results in greater right prefrontal cortex activation (Hariri, Bookheimer, & Mazziotta, 2000). Therefore, tasks examining emotion discrimination versus labeling may tap different aspects of processing.
These methodological differences may partially account for some of the variability in findings to date (we refer specifically to accuracy decrements here, rather than bias). For example, tasks with a response format imposing a high memory load, complex instructions, or time-pressured responses may be more likely to show group differences. Indeed, the greatest proportion of the depressed sample scoring below the 10th percentile of the control group occurred at the 30% intensity level, the stimuli displayed for the shortest amount of time. Future studies should also consider how the specifics of the response format can affect the outcome of studies of this nature. For example, it is important to be mindful that the majority of studies use fixed-choice paradigms [i.e., there is no "don't know" option, as in standardized measures like the Ekman-60 (Young et al., 2002)]. Therefore, if stimuli are presented quickly or are ambiguous, participants still have to select one of the options to move to the next trial, and patients (who may simply be slightly slowed in general processing speed or decision making) are more likely to "miss" stimuli and select a random response to move on; this is not an emotional processing bias/deficit, although it may appear so if systematic factors influence the response chosen (e.g., the response option closest to the participant's hand). It should also be noted that, in tasks of this nature, the majority of the available responses are "negative" emotions, with "happy" typically the only overtly positive option. Any form of systematic response bias will therefore lead to an apparent "deficit" in the perception of one emotion and an increase in another, which will typically be another "negative" emotion.
A further point to consider is how findings in this area are interpreted. Results demonstrating reduced accuracy in labeling specific expressions have been interpreted as supporting the notion that emotion perception decrements are evident in BD (Vederman et al., 2012), while other studies have interpreted increased correct recognition of specific emotions (e.g., disgust) as possibly being linked to low self-esteem and other cognitive biases in BD (Harmer et al., 2002). It is, therefore, important to consider the precise nature of the task demands and the social processes being assessed, to avoid a situation in which both increased and decreased accuracy are considered to reflect a "negative outcome." It is also necessary to consider the potential difference between greater accuracy, which may reflect hypersensitivity to characteristic features of emotional expressions and hence "more accurate" social perception, and a "true" bias, where stimuli (especially ambiguous stimuli) are consistently interpreted as showing a particular emotion (Leppanen, 2006), suggesting that top-down influences affect the interpretation of incoming information such that the individual "sees" a particular emotion when it may not be present (Martino et al., 2011). It is worth noting that the two processes (mood-related bias and a general deficit in accurate responding) may work counter to each other in particular cases. Further work is needed to develop an understanding of the circumstances in which accuracy decrements occur and those in which hypersensitivity or bias may occur.
There are several limitations of the present study to consider. First, low statistical power for the interaction analyses has already been mentioned. This difficulty is commonly encountered in this area of investigation and is likely to contribute to the varied findings. More widespread reporting of effect sizes alongside inferential statistics would help clarify whether studies are broadly finding group differences of a similar magnitude or, if not, may help identify which methodological variations impact most markedly on group differences. Second, we did not administer the same tests to both patient groups, which raises the possibility that some measures may have shown differences had both groups received the same tasks. However, three of the tasks used the same image set and similar intensities of emotions, and all involved a range of difficulty in the stimuli presented, thereby offering the opportunity for even a subtle deficit to become evident. Moreover, the use of two different experimental expression recognition tests suggests that the lack of difference is not specific to a methodological feature of one particular task. Furthermore, the depressed sample were administered standardized measures [e.g., the Emotional Hexagon (Young et al., 2002)] alongside the other tasks and did show pronounced deficits in other aspects of cognitive function. Third, although we used a dynamic emotional expression task to increase ecological validity, some studies have suggested that dynamic facial movements play only a small role in the ability to identify emotion from facial expressions (Gold et al., 2013). Nonetheless, using different variants of facial emotion stimuli develops our understanding of the robustness or otherwise of any effect, irrespective of ecological validity. Recently, it has been demonstrated that impairments can be observed in dynamic (videotaped) displays of emotion and more complex aspects of social communication in BD, in the absence of differences in labeling static images of facial emotion (Rowland et al., 2012). Therefore, methodologies that capture the real-world complexities and subtleties of social interaction may prove important tools for future studies exploring emotional processing deficits in BD.
Based upon our current findings and the mixed findings of the literature, we conclude there is little evidence of abnormalities in explicit facial emotion identification in euthymic or depressed patients, within the parameters examined in the present studies. Future studies should address the methodological issues in this area of research—especially using paradigms with limited memory load and time pressure—to build a more complete picture of emotion processing in BD and how or whether it is of relevance in our understanding of this illness.
Acknowledgments
This work was supported by grant funding from the Stanley Medical Research Institute (REF: 03T-429) and the Medical Research Council (GU0401207). L.J.R., P.G., and I.N.F. received Research Capability Funding support from the Northumberland, Tyne and Wear NHS Foundation Trust. We thank the Mental Health Foundation (north-east) for supporting the purchase of test materials. We are grateful to the participants who contributed to the research and to those clinicians involved in the wider research programme, including recruitment and screening: Stuart Watson, David Watson, Allan Young, Niraj Ahuja, Sankalpa Basu, Jane Carlile, Louise Golightly, Thiyyancheri Harikumar, Patrick Keown, Samer Makhoul, Anuradha Menon, Gavin Mercer, Rajesh Nair, Bruce Owen, and Nanda Palanichamy. We thank Daniel Hedley for assistance with the analysis of data from the FEEST. The authors have no conflicts of interest to declare in relation to the manuscript.