Introduction
The prevalence of bipolar disorder in the U.S. has been estimated to be about 1% of the population for bipolar I disorder and another 1% for bipolar II disorder.Reference Merikangas, Akiskal and Angst1 Bipolar disorder can be conceptualized as a predominantly depressive disorder, based on the amount of time patients with bipolar disorder are symptomatic with depression.Reference Judd, Akiskal and Schettler2, Reference Judd, Akiskal and Schettler3 Moreover, on average, the ratio of the number of depressive episodes to manic/hypomanic episodes is 3:1 for bipolar I disorder,Reference Judd, Akiskal and Schettler2 and the ratio of depressive episodes to hypomanic episodes is 39:1 for bipolar II disorder.Reference Judd, Akiskal and Schettler3 None of the classic antidepressants, serotonin specific reuptake inhibitors, or serotonin-norepinephrine reuptake inhibitors have ever received regulatory approval by the U.S. Food and Drug Administration (FDA) as monotherapies for the treatment of bipolar depression. Astonishingly, up until the approval of olanzapine/fluoxetine combination (OFC) in 2003, there were no FDA-approved medications for the specific indication of acute bipolar depression. Today we have 3 different approved agents to select from: OFC, quetiapine (immediate or extended release), and lurasidone (monotherapy or adjunctive to lithium or valproate).Reference Citrome, Ketter, Cucchiaro and Loebel4
This narrative review outlines the definition of bipolar depression, makes the case for the importance of making an accurate diagnosis, provides an approach to the interpretation of clinical trials that test interventions for bipolar depression, reviews both approved and unapproved treatments for bipolar depression, and concludes with a discussion of maintenance treatment.
What Is Bipolar Depression?
In order to make a Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) diagnosis of bipolar I disorder, it is necessary to meet criteria for a current or past manic episode.5 The manic episode may have been preceded by and may be followed by hypomanic or major depressive episodes. For a diagnosis of bipolar II disorder, it is necessary to meet criteria for a current or past hypomanic episode and criteria for a current or past major depressive episode. The criteria for a major depressive episode associated with either bipolar I disorder or bipolar II disorder are identical to those for major depressive disorder. The major distinguishing feature between bipolar disorder and major depressive disorder is thus the presence of manic or hypomanic episodes. For the sake of brevity, it is common to call major depressive episodes associated with bipolar disorder “bipolar depression,” and to describe bipolar I disorder and bipolar II disorder as part of a “bipolar spectrum disorder.”
Making an Accurate Diagnosis
It can be difficult to differentiate between an acute episode of bipolar depression and an acute episode of major depressive disorder. Both can appear identical on cross-sectional mental status examination. Taking a longitudinal history is thus essential. However, among persons with bipolar disorder, there may be a lack of insight into the pathological nature of mania or hypomania,Reference Berk, Berk, Moss, Dodd and Malhi6 and it may not get reported by the patient; contacting a third party (family member or friend) will often be necessary in order to get a more accurate history.
The Mood Disorder Questionnaire (MDQ) can be a helpful diagnostic screen for bipolarity.Reference Hirschfeld, Williams and Spitzer7 The MDQ is completed by the patient and takes about 5 minutes. Part 1 of the MDQ consists of 13 items assessing areas such as irritability, sleep, racing thoughts, and speech. If the patient endorses at least 7 of the items, reports that several of them occurred during the same period of time, and rates the consequences (being unable to work; having family, money, or legal troubles; getting into arguments or fights) as “moderate” or “serious,” then the patient has screened positive and should receive a comprehensive medical evaluation for bipolar spectrum disorder.
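The screening rule described above can be sketched in code. The following Python snippet is only an illustration, assuming the patient's answers have already been collected; the function name, parameter names, and impairment labels are hypothetical and are not part of the MDQ itself. A positive screen is a prompt for a comprehensive evaluation, not a diagnosis.

```python
def mdq_screen_positive(items_endorsed: int, same_period: bool, impairment: str) -> bool:
    """Return True when the three-part screening rule described in the text is met.

    items_endorsed: number of the 13 Part 1 items the patient endorsed
    same_period:    whether several endorsed items occurred during the same period
    impairment:     reported level of consequences (labels assumed for illustration)
    """
    if not 0 <= items_endorsed <= 13:
        raise ValueError("Part 1 of the MDQ has 13 items")
    return (
        items_endorsed >= 7
        and same_period
        and impairment.lower() in {"moderate", "serious"}
    )


# Hypothetical patients
print(mdq_screen_positive(9, True, "moderate"))  # all three conditions met
print(mdq_screen_positive(9, True, "minor"))     # consequences below the threshold
```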
Unfortunately, misdiagnosis of bipolar disorder (and bipolar depression) is common. Up to 69% of persons with bipolar disorder are misdiagnosed initially (usually diagnosed as having major depressive disorder), with a mean of 3.5 other diagnoses proffered and evaluation or treatment received from 4 clinicians before the correct diagnosis of bipolar disorder is made.Reference Hirschfeld, Lewis and Vornik8 Comorbidity is a common confounding factor when evaluating patients, making assessments quite complex; 50%–70% of persons with bipolar disorder have at least one comorbid psychiatric or medical condition, such as anxiety, substance use, obesity, and cardiovascular disease.Reference Simon, Otto and Weiss9–Reference de Almeida, Moreira and Lafer12 It is estimated that as many as 1 in 5 primary care patients who have clinically significant depressive symptoms and are receiving antidepressant treatment actually have bipolar I or bipolar II disorder.Reference Hirschfeld, Cass, Holt and Carlson13
The consequences of misdiagnosing bipolar depression include the use of incorrect treatments, making incorrect prognoses, and increasing the likelihood of poor outcomes.Reference Das, Olfson and Gameroff14 The incorrect treatment of greatest concern is the use of antidepressant medications such as those routinely prescribed for the treatment of major depressive disorder. As noted, no antidepressant is approved for the treatment of bipolar depression (except for fluoxetine in combination with olanzapine). Antidepressant monotherapy can destabilize a person with bipolar depression by causing the induction of mania or hypomania and/or rapid cyclingReference Pacchiarotti, Bond and Baldessarini15; the emergence of a manic or hypomanic episode during antidepressant treatment is now recognized in DSM-5 as sufficient for the diagnosis of mania or hypomania.5 Moreover, when comparing groups of patients receiving adjunctive antidepressant treatment vs adjunctive placebo together with a mood stabilizer, antidepressants do not confer a treatment advantage for either transient or enduring response.Reference Sachs, Nierenberg and Calabrese16
In addition to the MDQ, additional clues that would increase one’s index of suspicion for bipolar disorder in a patient presenting with a major depressive episode are listed in Table 1.Reference Muzina, Colangelo, Manning and Calabrese17–Reference Citrome and Goldberg20
Table 1 Clues to avoid misdiagnosis: increase your index of suspicion for bipolar disorder if these items are present
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:99097:20160414095248082-0981:S109285291400056X_tab1.gif?pub-status=live)
How Can Clinical Trials Inform Us?
Although the randomized placebo-controlled clinical trials that are done for regulatory purposes enroll patients who may differ from those in our own clinical practice, they do provide an estimate of a medication’s potential therapeutic effect and can inform us about potential tolerability issues that may complicate medication use in the “real world” setting.
In many studies of acute bipolar depression, the primary efficacy outcome measure has been change in the Montgomery–Asberg Depression Rating Scale (MADRS),Reference Montgomery and Asberg21 a 10-item rater-administered scale. If the change observed with the test medication is statistically significantly larger than that observed with placebo, the study is considered “positive” and supportive of efficacy. However, statistical significance does not necessarily mean the result is clinically relevant or “clinically significant.” A result that is statistically significant at the threshold of P<.05 or P<.001 may be clinically irrelevant if the size of the treatment effect is small.Reference Citrome22
There are a number of different treatment effect size metrics that can be used to assess clinical significance,Reference Citrome23 but perhaps the most clinically intuitive one is called number needed to treat (NNT).Reference Citrome22–Reference Citrome and Ketter24 NNT can be defined as the number of patients you would need to treat with one medication instead of another intervention before you would expect to encounter one additional positive outcome of interest. Thus in “patient-units,” the NNT spells out the size of the treatment effect. The lower the NNT, the more robust the differences are between the 2 interventions. When examining adverse effects, the term number needed to harm (NNH) is used. The higher the NNH, the less likely that one will encounter the outcome one would rather avoid. The best treatments will have a low NNT (so benefits are encountered as often as possible) and a high NNH (so harms are encountered as seldom as possible). Therapeutic outcomes of interest include response (achievement of a reduction from baseline of at least 50% on a rating scale score, such as on the MADRS) and remission (achievement of a score no greater than a preset threshold on a rating scale, such as a score of 12 on the MADRS). Adverse events of interest include the occurrence of sedation/somnolence, weight gain of at least 7% from baseline, akathisia, and nausea.
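As a minimal sketch of how NNT and NNH follow from trial event rates, the Python snippet below takes the reciprocal of the absolute difference between two rates. The event rates are hypothetical, the function name is illustrative, and rounding up to the next whole patient is an assumed convention that the text does not state.

```python
import math


def number_needed(rate_a: float, rate_b: float) -> float:
    """NNT or NNH from two event rates given as proportions between 0 and 1.

    Computed as the reciprocal of the absolute rate difference, rounded up
    to the next whole patient (an assumed rounding convention).
    The sign of the difference is ignored, so the caller must check which
    arm had the higher rate. Returns math.inf when the rates are equal.
    """
    difference = abs(rate_a - rate_b)
    if difference == 0:
        return math.inf
    return math.ceil(1 / difference)


# Hypothetical rates: response in 52% on medication vs 30% on placebo,
# and an adverse event in 12% on medication vs 5% on placebo.
nnt_response = number_needed(0.52, 0.30)
nnh_adverse_event = number_needed(0.12, 0.05)
print(nnt_response, nnh_adverse_event)
```

With these made-up rates, the benefit would be encountered more often than the harm, which is the pattern of a low NNT and a high NNH described above.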
The ratio of NNH to NNT is called the likelihood to be helped or harmed (LHH).Reference Citrome and Ketter24 LHH can quantify trade-offs between benefits and harms. For example, for a hypothetical medication, if the NNT vs placebo is 6 for a clinically relevant therapeutic response and the NNH vs placebo for nausea is 12, the LHH is 12/6 or 2. This LHH of 2 for response vs nausea can be interpreted as meaning that “treatment was twice as likely to help (therapeutic response) than to harm (nausea) the patient.” Matching up the benefit to the specific harm that is of the most concern for the patient and clinician requires individualized decision-making based on the patient’s past experiences, values, and preferences. Not all harms (or benefits) are valued the same by all patients. For example, some patients may want to avoid sedation and/or weight gain, while others may be willing to accept this trade-off in the quest for a better therapeutic response.
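The hypothetical example above can be written out as a short calculation. This Python sketch simply divides NNH by NNT; the function name is illustrative and the inputs are the hypothetical values from the text (NNT 6 for response, NNH 12 for nausea).

```python
def likelihood_helped_or_harmed(nnt: float, nnh: float) -> float:
    """LHH as the ratio of NNH to NNT; both must be positive."""
    if nnt <= 0 or nnh <= 0:
        raise ValueError("NNT and NNH must be positive")
    return nnh / nnt


# Hypothetical medication from the text: NNT 6 (response), NNH 12 (nausea)
lhh = likelihood_helped_or_harmed(nnt=6, nnh=12)
print(lhh)  # 2.0
```

An LHH above 1 means the benefit is expected more often than the chosen harm; a value below 1 means the reverse. Which harm to pair with the benefit remains a clinical and patient-specific choice.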
In the absence of direct head-to-head controlled trials of the approved medications available for the treatment of acute bipolar depression, indirect comparisons can be made by examining the effect sizes (as measured by NNT and NNH) vs placebo for the interventions in question. In general, approved interventions have NNT values vs placebo for response and remission that are less than 10, indicating that fewer than 10 patients are required to be randomized to the test medication vs placebo before expecting to encounter 1 additional responder or remitter. The lower the NNT, the more powerful the treatment effect, but it is unusual for complex chronic illnesses to have interventions that carry NNT values vs placebo that are less than 4. On the other hand, desirable interventions should have NNH values vs placebo that are at least 10, so that these harms would be uncommon. There are sometimes exceptions when NNH values less than 10 can be acceptable, such as when the adverse event is mild or moderate, temporary in duration, easily managed, and does not necessarily lead to discontinuation.Reference Citrome and Ketter24
Approved Treatments for Bipolar Depression
There are currently 3 different treatments that are FDA-approved for the indication of acute bipolar depression: OFC, quetiapine monotherapy (immediate or extended release), and lurasidone (as a monotherapy or adjunctive to lithium or valproate). See Table 2 for the responder and remitter rates and the resultant NNTs and 95% confidence intervals (CI) for each of these interventions.Reference Citrome, Ketter, Cucchiaro and Loebel4, Reference Loebel, Cucchiaro and Silva25–Reference Tohen, Vieta and Calabrese31 The NNTs for response and remission for each of these interventions vs placebo range from 4 to 7 and 5 to 7, respectively. For both response and remission, the NNTs for each intervention are approximately the same and the 95% CIs overlap, suggesting negligible differences in efficacy among the interventions when comparing groups of patients vs placebo. Although there is no clear evidence suggesting that one medication would be better than the others regarding efficacy, there may be differences in efficacy that emerge when treating individual patients, as would be expected when treating heterogeneous disorders.
Table 2 Psychopharmacology of acute bipolar depression: response/remission rates from short-term placebo-controlled clinical trials and number needed to treat vs placebo
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160711024759-90460-mediumThumb-S109285291400056X_tab2.jpg?pub-status=live)
Data from references Reference Citrome, Ketter, Cucchiaro and Loebel4, Reference Loebel, Cucchiaro and Silva25–Reference Tohen, Vieta and Calabrese31.
NNT – number needed to treat; CI – confidence interval.
Response defined as a 50% or greater reduction from baseline on the Montgomery–Asberg Depression Rating Scale (MADRS) total score. Remission defined as an endpoint MADRS total score less than or equal to 12.
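The response and remission definitions used in Table 2 can be expressed as two simple checks. The Python sketch below is illustrative only; the function names and the example scores are hypothetical.

```python
REMISSION_THRESHOLD = 12  # endpoint MADRS total score, per the definition above


def is_responder(baseline: float, endpoint: float) -> bool:
    """Response: a reduction of 50% or more from the baseline MADRS total score."""
    if baseline <= 0:
        raise ValueError("baseline MADRS total score must be positive")
    return (baseline - endpoint) / baseline >= 0.5


def is_remitter(endpoint: float) -> bool:
    """Remission: an endpoint MADRS total score of 12 or less."""
    return endpoint <= REMISSION_THRESHOLD


# Hypothetical patient: baseline score 30, endpoint score 14
print(is_responder(30, 14), is_remitter(14))
```

This hypothetical patient meets the response criterion (a reduction of about 53%) but not the remission criterion, which illustrates why the two outcomes are reported separately.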
More distinct differences emerge when tolerability outcomes are examined. The product labels for antipsychotics generally include a list of spontaneously reported adverse events that occur in at least 5% of patients in the clinical trials, the percentage who gain at least 7% of their baseline body weight, and the percentage of patients who discontinue because of an adverse event. Tables 3–6 list the adverse events that meet the incidence threshold of 5%, together with the NNH values.Reference Citrome, Ketter, Cucchiaro and Loebel4, Reference Loebel, Cucchiaro and Silva25–34 Of potential concern regarding tolerability during routine clinical use, NNH values vs placebo of less than 10 were observed for OFC for weight gain (NNH 7) and diarrhea (NNH 9), and for quetiapine for somnolence (NNH 3) and dry mouth (NNH 4). No NNH values vs placebo were less than 10 for any of the adverse events observed with lurasidone monotherapy or adjunctive lurasidone. However, in general, NNH values vs placebo for lurasidone monotherapy were lower (ie, more problematic) for the dose range of 80–120 mg/day compared with 20–60 mg/day. For the outcome of weight gain of at least 7% from baseline, the NNH values vs placebo were 6 for OFC, 16 for quetiapine, 36 for adjunctive lurasidone, and 58 (not statistically significant) for lurasidone monotherapy (Table 7). Discontinuation due to an adverse event was not statistically significantly different from placebo for lurasidone monotherapy 20–60 mg/day or 80–120 mg/day, adjunctive lurasidone, or OFC, but was statistically significantly different for quetiapine vs placebo, with rates of 15.0% vs 4.1%, respectively, yielding a NNH of 10 (95% CI 8–13).
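The NNH for discontinuation quoted above can be reproduced from the two rates given in the text. The Python sketch below assumes that the result is rounded up to the next whole patient. The 95% CI is not recomputed because the group sizes are not given in the text.

```python
import math

# Discontinuation due to an adverse event, as quoted in the text
quetiapine_rate = 0.150
placebo_rate = 0.041

risk_difference = quetiapine_rate - placebo_rate
nnh = math.ceil(1 / risk_difference)  # rounding up is an assumed convention

print(round(risk_difference, 3), nnh)  # 0.109 10
```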
Table 3 Olanzapine/fluoxetine combination (6 and 25, 6 and 50, or 12 and 50 mg/day): spontaneously reported adverse events with incidence of at least 5% and number needed to harm vs placebo
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160711024759-94006-mediumThumb-S109285291400056X_tab3.jpg?pub-status=live)
Data from reference Reference Citrome30.
NNH – number needed to harm; CI – confidence interval.
Table 4 Quetiapine monotherapy (immediate or extended release, 300 or 600 mg/day): spontaneously reported adverse events with incidence of at least 5% and number needed to harm vs placebo
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160711024759-39282-mediumThumb-S109285291400056X_tab4.jpg?pub-status=live)
Data calculated from references Reference Calabrese, Keck and Macfadden27–Reference Suppes, Datto, Minkwitz, Nordenhem, Walker and Darko29, with somnolence data pooled from references 33 and 34.
NNH – number needed to harm; CI – confidence interval.
Table 5 Lurasidone monotherapy (20–120 mg/day): spontaneously reported adverse events with incidence of at least 5% and number needed to harm vs placebo
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:28412:20160414095248082-0981:S109285291400056X_tab5.gif?pub-status=live)
Data from reference Reference Citrome, Ketter, Cucchiaro and Loebel4.
NNH – number needed to harm; CI – confidence interval.
ns – not significant; the 95% confidence interval includes infinity.
Table 6 Lurasidone adjunctive therapy: spontaneously reported adverse events with incidence of at least 5% and number needed to harm vs placebo
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:28987:20160414095248082-0981:S109285291400056X_tab6.gif?pub-status=live)
Data calculated from reference 32.
NNH – number needed to harm; CI – confidence interval.
ns – not significant; the 95% confidence interval includes infinity.
Table 7 Weight gain of at least 7% from baseline and number needed to harm
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:99482:20160414095248082-0981:S109285291400056X_tab7.gif?pub-status=live)
Data from reference Reference Citrome30 and calculated from references 32–34.
NNH – number needed to harm; CI – confidence interval.
ns – not significant; the 95% confidence interval includes infinity.
For lurasidone monotherapy or lurasidone adjunctive therapy, the LHH is substantially higher than 1 when contrasting response or remission with any of the adverse events listed in Tables 5 and 6, or for weight gain in excess of 7% from baseline (Table 7). This is not the case for OFC, where for response vs weight gain of at least 7% the LHH is 1.5, and given the difficulty in managing weight gain, this trade-off may be problematic.
For quetiapine, for response vs somnolence, the LHH is 0.5, meaning that a patient is twice as likely to encounter an adverse event of somnolence vs a therapeutic response. Important additional considerations include the time to onset of the adverse event vs time to onset of a therapeutic response, as well as the severity and duration of the adverse event. The adverse event in question may be easily manageable if it is non-serious and short-lived for that individual. Thus, despite their tolerability challenges, OFC and quetiapine monotherapy may still have utility in high-urgency situations, particularly in persons who have demonstrated good outcomes with these interventions in the past, and where a pressing clinical need for efficacy mitigates their tolerability shortcomings. In addition, there may be a specific preference by the patient and the clinician for some degree of sedation, to help with heightened anxiety during the day and difficulty with sleep during the night, potentially obviating the need for additional medication. Nevertheless, lurasidone may ultimately prove to have utility in a broad spectrum of situations, independent of the degree of urgency, because of evidence suggesting not only adequate efficacy, but also adequate tolerability. An important limitation of these indirect comparisons is that the study populations (bipolar I with or without bipolar II, with or without psychosis) and durations (6 vs 8 weeks) differed in the clinical trials for OFC, quetiapine, and lurasidone.
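Because LHH is the ratio of NNH to NNT, the LHH and NNH values quoted above for OFC and quetiapine imply an NNT for response. The Python sketch below back-calculates those values. The results are an inference from the figures in the text rather than numbers read from Table 2, and the function name is illustrative.

```python
def implied_nnt(nnh: float, lhh: float) -> float:
    """Rearranges LHH = NNH / NNT to recover the NNT."""
    if nnh <= 0 or lhh <= 0:
        raise ValueError("NNH and LHH must be positive")
    return nnh / lhh


# OFC: NNH 6 for weight gain of at least 7%, LHH 1.5 for response vs weight gain
ofc_nnt_response = implied_nnt(nnh=6, lhh=1.5)

# Quetiapine: NNH 3 for somnolence, LHH 0.5 for response vs somnolence
quetiapine_nnt_response = implied_nnt(nnh=3, lhh=0.5)

print(ofc_nnt_response, quetiapine_nnt_response)  # 4.0 6.0
```

Both implied values fall within the NNT range of 4 to 7 reported for response across the approved agents.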
Unapproved Treatments for Bipolar Depression
Unapproved agents such as lamotrigine monotherapy and antidepressants are commonly used to treat acute bipolar depression. Despite their relatively weak treatment effect sizes (NNT vs placebo for response is 12 for lamotrigine and 29 for antidepressants), their risk/tolerability profiles may be more attractive for some patients than the agents that are currently approved.Reference Citrome, Ketter, Cucchiaro and Loebel4 On a cautionary note, although the risk of a mood switch with antidepressants is relatively low (NNH vs placebo for a mood switch is 200, as calculated from Sidor and MacqueenReference Sidor and Macqueen35), a switch to mania may have profound adverse psychosocial consequences.
AripiprazoleReference Thase, Jonas and Khan36 and ziprasidoneReference Sachs, Ice and Chappell37, Reference Lombardo, Sachs, Kolluri, Kremer and Yang38 have been tested in clinical trials for the treatment of bipolar depression but have not demonstrated adequate efficacy, with NNT values for response vs placebo of 44 (calculated from Thase et al Reference Thase, Jonas and Khan36) and 163 (calculated from Lombardo et al Reference Lombardo, Sachs, Kolluri, Kremer and Yang38), respectively.
After the Acute Episode, What’s Next? Maintenance Treatment
At present, 5 monotherapies (lithium, lamotrigine, olanzapine, aripiprazole, and long-acting injectable risperidone) and 3 combination therapies (quetiapine, ziprasidone, and long-acting injectable risperidone, with lithium or valproate) are approved for the longer-term treatment of bipolar disorder.Reference Ketter, Citrome, Wang, Culver and Srivastava39 Valproate, although never approved for maintenance treatment, is often used for that purpose. In general, the NNT vs placebo to avoid a relapse or recurrence is less than 10 for all of these options, within a range of 3 for olanzapine and 9 for lamotrigine.Reference Ketter, Citrome, Wang, Culver and Srivastava39 However, different treatment options have different profiles when comparing the prevention of mania vs depression. For example, for lithium, the NNT vs placebo for mania prevention is 8 and that for depression prevention is 49.Reference Ketter, Citrome, Wang, Culver and Srivastava39 For valproate and lamotrigine, the direction is reversed, with the values for NNT vs placebo for the prevention of depression being more robust (11 and 15, respectively) than the NNT for the prevention of mania (22 and 23, respectively). Adjunctive quetiapine is the only agent where the NNT vs lithium or valproate alone is less than 10 for both mania prevention (NNT 8) and depression prevention (NNT 6). The polarity index (PI) is a metric used to describe the relative antimanic vs antidepressive preventive efficacy of medications, and is calculated as the ratio of the NNT for the prevention of depression to the NNT for the prevention of mania.Reference Popovic, Reinares, Goikolea, Bonnin, Gonzalez-Pinto and Vieta40 Thus, a PI greater than 1.0 indicates relatively greater antimanic prophylactic efficacy, and a PI below 1.0 indicates relatively greater antidepressive prophylactic efficacy.
Table 8 provides the PI for selected agents.Reference Popovic, Reinares, Goikolea, Bonnin, Gonzalez-Pinto and Vieta40 The highest PI is for risperidone at 12.09, representing a 12-fold higher potency for the prophylaxis against mania than for depression. The lowest PI was for lamotrigine at 0.40, representing a 2.5-fold higher potency for the prophylaxis against depression than for mania.
Table 8 Polarity index for commonly used maintenance treatments for bipolar disorder
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:14333:20160414095248082-0981:S109285291400056X_tab8.gif?pub-status=live)
Data from reference Reference Popovic, Reinares, Goikolea, Bonnin, Gonzalez-Pinto and Vieta40; the polarity index may differ depending on which studies have been included when calculating the respective NNT values.Reference Ketter, Citrome, Wang, Culver and Srivastava39, Reference Popovic, Reinares, Goikolea, Bonnin, Gonzalez-Pinto and Vieta40
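As a sketch of the PI calculation, the Python snippet below applies the ratio (NNT for prevention of depression divided by NNT for prevention of mania) to the NNT values quoted in the maintenance discussion above. These inputs come from a different set of calculations than Table 8, so the resulting values are illustrative and are not expected to match the table; the function and variable names are hypothetical.

```python
def polarity_index(nnt_depression: float, nnt_mania: float) -> float:
    """PI = NNT for prevention of depression / NNT for prevention of mania."""
    if nnt_depression <= 0 or nnt_mania <= 0:
        raise ValueError("NNT values must be positive")
    return nnt_depression / nnt_mania


# NNT values as quoted in the text (vs placebo, or vs lithium/valproate alone
# for adjunctive quetiapine); not the inputs used for Table 8.
nnt_values = {
    "lithium": {"mania": 8, "depression": 49},
    "valproate": {"mania": 22, "depression": 11},
    "lamotrigine": {"mania": 23, "depression": 15},
    "adjunctive quetiapine": {"mania": 8, "depression": 6},
}

for agent, nnt in nnt_values.items():
    pi = polarity_index(nnt["depression"], nnt["mania"])
    if pi > 1.0:
        leaning = "relatively greater antimanic efficacy"
    elif pi < 1.0:
        leaning = "relatively greater antidepressive efficacy"
    else:
        leaning = "balanced"
    print(f"{agent}: PI = {pi:.2f} ({leaning})")
```

The direction of each result agrees with the text (lithium above 1.0, valproate and lamotrigine below 1.0), even though the magnitudes depend on which studies supply the NNT values.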
An important caveat regarding the PI, and examining maintenance studies in general, is that most maintenance trials have enrolled enriched populations of patients who were currently or recently manic or mixed; very few studies have enrolled patients with index depressive episodes. This introduces a bias, since it is thought that the polarity of the index episode tends to predict the polarity of relapse into a subsequent episode.Reference Calabrese, Vieta and El-Mallakh41 Unfortunately, there are no maintenance studies of OFC, quetiapine, or lurasidone vs placebo that have enrolled patients with an index episode of acute bipolar depression.
Adherence to long-term maintenance treatment can be a significant challenge in the face of tolerability problems. In spite of favorable NNTs, the tolerability limitations of the approved second-generation antipsychotics suggest that, in many instances, clinicians and patients may prefer to hold these agents in reserve for patients with inadequate efficacy or tolerability with mood stabilizers.Reference Ketter, Citrome, Wang, Culver and Srivastava39 A typical trade-off includes consideration of the prevention of bipolar episodes and the tolerability issue of weight gain. Lamotrigine and lithium were several times more likely to result in prevention of relapse/recurrence than weight gain of at least 7% from baseline, with NNHs of 25 or more.Reference Ketter, Citrome, Wang, Culver and Srivastava39 This relatively favorable tolerability profile was not shared by olanzapine, aripiprazole, risperidone, and quetiapine, which all had more problematic NNH values in the maintenance studies (8, 8, 12, and 13 for olanzapine, aripiprazole, risperidone, and quetiapine, respectively).
Adjunctive psychosocial or psychological interventions may also be helpful in the maintenance treatment of bipolar disorder, with NNT values less than 10 for prevention of relapse/recurrence similar to those of approved pharmacotherapies,Reference Ketter, Citrome, Wang, Culver and Srivastava39, Reference Popovic, Reinares and Scott42 and for some, calculated in the range of 4–6.Reference Ketter, Citrome, Wang, Culver and Srivastava39 The PI can also be calculated for these different psychological interventions, and although values were predominantly less than 1.0, they did range from a low of 0.33 for one study of cognitive behavioral therapyReference Lam, Watkins and Hayward43 to a high of 3.36 for brief technique-driven interventions.Reference Perry, Tarrier, Morriss, McCarthy and Limb44
Conclusions
A major challenge in the treatment of major depressive episodes associated with bipolar disorder is differentiating this illness from major depressive episodes associated with major depressive disorder. Mistaking the former for the latter will lead to incorrect treatment and poor outcomes. At present, there are only 3 FDA-approved medication treatments for bipolar depression: OFC, quetiapine (immediate or extended release), and lurasidone (monotherapy or adjunctive to lithium or valproate). All 3 have similar efficacy profiles, but they differ in terms of tolerability. NNT and NNH can be used to quantify these similarities and differences. Individualizing treatment decisions will require consideration of the different potential adverse events that are more likely to occur with each medication. The metric of LHH can illustrate the trade-offs inherent in selecting medications, and a more favorable LHH was noted for treatment with lurasidone. However, OFC and quetiapine monotherapy may still have utility in high-urgency situations, particularly in persons who have demonstrated good outcomes with these interventions in the past, and where a pressing clinical need for efficacy mitigates their potential tolerability shortcomings. In terms of maintenance therapy, adjunctive quetiapine is the only agent where the NNT vs lithium or valproate alone is less than 10 for both mania prevention and depression prevention.
Disclosures
In the past 12 months, Leslie Citrome has engaged in collaborative research with, or received consulting or speaking fees from: Alexza, Alkermes, AstraZeneca, Avanir, Bristol-Myers Squibb, Eli Lilly, Forest, Forum, Genentech, Janssen, Jazz, Lundbeck, Merck, Medivation, Mylan, Novartis, Noven, Otsuka, Pfizer, Reckitt Benckiser, Reviva, Shire, Sunovion, Takeda, Teva.