Introduction
A comprehensive meta-analysis of placebo-controlled randomized trials (RCTs) since the introduction of chlorpromazine in 1953 found that about twice as many patients with acute exacerbation of schizophrenia respond to antipsychotics as to inert placebo (Leucht et al., 2017). Double-blind RCTs are the gold standard for the assessment of treatments, but a major concern is the risk of unblinding due to antipsychotic side effects (Shader et al., 1964; Leucht et al., 2008, 2013). Such unblinding could lead to overestimates of efficacy (Jensen et al., 2017). In addition, one might argue that the mechanism of action of antipsychotics may be unspecific sedation rather than direct effects on positive symptoms (Moncrieff and Cohen, 2005), a message that, if exaggerated, could lead to non-adherence, unnecessary psychotic relapses, and their negative social consequences.
The use of ‘active placebos,’ i.e. drugs that induce side effects but are not efficacious on positive symptoms, could provide more convincing results. A widely discussed systematic review of active placebo RCTs questioned the efficacy of antidepressants based on inert placebo RCTs (Moncrieff et al., 2004; Jensen et al., 2017). Tricyclic antidepressants were found to be only slightly more efficacious for depression than atropine, a drug with anticholinergic side effects similar to those of tricyclic antidepressants (Moncrieff et al., 2004). In schizophrenia trials conducted in the 1960s, barbiturates and later benzodiazepines were used as active placebos because they can mimic antipsychotic-induced sedation (Casey et al., 1960b; Holden et al., 1968), and they both enhance GABA-A receptor function. However, the comparison of antipsychotics with barbiturates has not been evaluated systematically before, and reviews on benzodiazepines were focused on short-term sedation or adjunctive treatment (Dold et al., 2012). Therefore, this review aims to compare the efficacy of antipsychotics with that of barbiturates or benzodiazepines as active placebos for acute schizophrenia.
Material and methods
We followed the PRISMA statement (checklist in eAppendix-1) (Liberati et al., 2009), and the a priori written protocol was registered on PROSPERO (CRD42018086263).
Search strategy and selection criteria
RCTs comparing antipsychotics with barbiturates or benzodiazepines in patients with exacerbation of schizophrenia or schizophrenia-like psychosis were eligible. We used only the first phase of cross-over studies to avoid carry-over effects (Elbourne et al., 2002). As in our previous review on antipsychotics v. inactive placebo (Leucht et al., 2017), the minimum duration of follow-up was 3 weeks, a study duration that has been shown to be sufficient to show the full effects of antipsychotics (McMahon et al., 2008). Any antipsychotic, barbiturate or benzodiazepine was included at any dose range and administered via any form of application, except for short-term intramuscular injections, which are used for sedation purposes. We excluded promazine and mepazine post-hoc, when we found that earlier reviews had already clearly shown that they are less efficacious than other antipsychotics (Davis et al., 1989), but they were included in a sensitivity analysis of the primary outcome.
We searched ClinicalTrials.gov, the Cochrane Central Register of Controlled Trials, EMBASE, MEDLINE, PsycINFO, PubMed, and World Health Organization International Trial Registry on 9 January 2018 (search strategies in eAppendix-3). Two separate searches were conducted for barbiturates (no restriction in terms of publication date) and benzodiazepines [the search was restricted to literature published after 2010, and older records were identified from the reference list of our previous Cochrane review (Dold et al., 2012), for which extensive searches had been conducted]. Additional reviews (Klein and Davis, 1969; Wolkowitz and Pickar, 1991) and reference lists of included studies were inspected. At least two independent reviewers or contributors screened all titles/abstracts from the search (SS, GA), checked full texts against the predefined eligibility criteria (SS, GP), extracted data in electronic forms, and assessed the quality of included studies using the Cochrane Collaboration's risk of bias tool (Higgins and Green, 2011; The Cochrane Collaboration, 2014) (SS, GD, AC, SH, CM, and JS). Any disagreements at any stage were resolved by consultation with a third reviewer (SL). The strength of the evidence of the primary outcome was assessed using the GRADE approach (Schünemann et al., 2013).
Outcome variables
As in our previous meta-analysis of antipsychotics v. placebo, two response criteria were investigated, ‘good’ (primary outcome) and ‘any’ response (Leucht et al., 2017). ‘Good’ response was defined as either ‘at least much improvement’ in the Clinical Global Impression scale (CGI) (Guy et al., 1976) or ‘at least 50% reduction from baseline of overall symptoms’ as measured by published rating scales (Leucht et al., 2005, 2007, 2012; Levine et al., 2008), e.g. the older Lorr's Multidimensional Scale of Rating Psychiatric Patients (MSRPP) (Lorr et al., 1953) and the Psychotic Reaction Profile (PRP) (Lorr et al., 1960). ‘Any’ response was defined as at least minimal improvement in the CGI or at least 20% reduction of overall symptoms (Leucht et al., 2005, 2007, 2012; Levine et al., 2008). If these cut-offs were not available, then authors’ definitions of response were also accepted. It has been shown that pooling studies with different definitions of response is acceptable as long as the effect size is presented as relative risks or odds ratios (Furukawa et al., 2011).
When responder rates were not reported, they were imputed from mean overall symptoms applying a validated method (Samara et al., 2013).
Secondary outcomes were overall, positive, and negative symptoms as measured by published rating scales (Marshall et al., 2000). Change from baseline to endpoint was preferred to follow-up scores of these scales. Intention-to-treat data were used whenever available. Other secondary outcomes were premature discontinuation (dropouts) due to any cause, inefficacy, and adverse events. Missing standard deviations were estimated from reported statistics, or, if not possible, from other arms of the same study or from the average of other studies (Higgins and Green, 2011).
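To illustrate how a standard deviation can be estimated from reported statistics, the following is a minimal sketch that backs out a pooled standard deviation from a two-sided p value for a difference in means. It is not the extraction procedure documented in eAppendix-4: the function name `sd_from_p_value` and all input numbers are hypothetical, and a normal approximation to the t distribution is assumed, which is reasonable only for moderately large groups.

```python
from math import sqrt
from statistics import NormalDist


def sd_from_p_value(mean_diff, p_two_sided, n1, n2):
    """Approximate the pooled SD implied by a two-sided p value.

    Assumption: the test statistic is treated as standard normal
    (large-sample approximation to the t distribution).
    """
    z = NormalDist().inv_cdf(1 - p_two_sided / 2)  # test statistic implied by p
    standard_error = abs(mean_diff) / z            # SE of the mean difference
    return standard_error / sqrt(1 / n1 + 1 / n2)  # pooled within-group SD


# Hypothetical inputs, not taken from any included study.
sd_at_threshold = sd_from_p_value(mean_diff=5.0, p_two_sided=0.05, n1=50, n2=50)
sd_at_smaller_p = sd_from_p_value(mean_diff=5.0, p_two_sided=0.001, n1=50, n2=50)
print(round(sd_at_threshold, 2), round(sd_at_smaller_p, 2))
```

When only a significance threshold (e.g. p < 0.05) is reported, using the threshold itself as the p value gives a larger standard deviation than the true, smaller p value would, and therefore a smaller standardized effect.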
Statistical analysis
The rationale for presenting the results on barbiturates and benzodiazepines in one paper is that they were both used as active placebos and they act via GABA-A receptors. Nevertheless, in the primary analysis, separate meta-analyses for barbiturates and benzodiazepines were calculated throughout using random-effects models (DerSimonian and Laird, 1986). The effect size for dichotomous outcomes was the relative risk or response ratio (RR) and its 95% confidence intervals (95% CI). Event rates and number-needed-to-treat for an additional beneficial/harmful outcome (NNTB/NNTH) were estimated according to the Cochrane Handbook (Higgins and Green, 2011), using the relative risk and the occurrence of an outcome in control groups as assumed control risk (see eAppendix-5). If the original authors presented only the results of the per protocol population, we conservatively assumed that participants who were lost to follow-up would not have responded. The effect size for continuous outcomes was the standardized mean difference (SMD) expressed as Hedges’ g and its 95% CI.
Heterogeneity was assessed by visual inspection of forest plots and by applying a χ² test for homogeneity and the I² statistic (considerable heterogeneity when >50%) (Higgins and Green, 2011). The magnitude of heterogeneity was additionally evaluated using the empirical distribution of τ² of SMDs (Rhodes et al., 2015) (see eAppendix-5). Based on available data, we conducted the following predefined subgroup or meta-regression analyses of the primary outcome: follow-up duration (⩽3 months v. longer-term), chlorpromazine equivalents (Gardner et al., 2010), and baseline severity (MSRPP total score). Predefined sensitivity analyses of the primary outcome were the use of a fixed-effects model and exclusion of studies that presented per protocol data. Post-hoc sensitivity analyses were also conducted by including promazine/mepazine and by excluding studies with imputed responder rates, as were sensitivity analyses of overall symptoms using different estimates of standard deviations (see eAppendix-4). Following a reviewer request, we post-hoc pooled the results of the single benzodiazepine study with those of the barbiturate studies to obtain one estimate for the comparison of antipsychotics with GABAergic drugs. As data on the primary cut-off of ‘good’ response were not available, this analysis was only possible for the secondary cut-off ‘any’ response. Post-hoc analyses of the primary outcome were also conducted by comparing barbiturates with inert placebo and antipsychotics with mepazine. We investigated small study effects and publication bias with funnel plots if at least 10 studies were available (Egger et al., 1997; Higgins and Green, 2011).
The analyses were conducted with the meta v4.9-2/5 (Schwarzer, 2007) and metafor v2.0-0 (Viechtbauer, 2010) packages in the R statistical language v3.5 (R Core Team, 2018). The α was set at 0.05, except for heterogeneity at 0.1.
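The two study-level effect sizes named above can be sketched as follows. This is an illustrative calculation under stated assumptions, not the code used for the analyses: the function names `relative_risk` and `hedges_g` are hypothetical, the confidence intervals use the usual large-sample normal approximation, the variance of Hedges' g uses a common approximation, and the example counts and means are invented.

```python
from math import exp, log, sqrt


def relative_risk(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Relative risk with a 95% CI computed on the log scale."""
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    se_log_rr = sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    return rr, exp(log(rr) - z * se_log_rr), exp(log(rr) + z * se_log_rr)


def hedges_g(mean_trt, sd_trt, n_trt, mean_ctl, sd_ctl, n_ctl, z=1.96):
    """Standardized mean difference with Hedges' small-sample correction."""
    df = n_trt + n_ctl - 2
    pooled_sd = sqrt(((n_trt - 1) * sd_trt**2 + (n_ctl - 1) * sd_ctl**2) / df)
    d = (mean_trt - mean_ctl) / pooled_sd
    g = (1 - 3 / (4 * df - 1)) * d  # small-sample correction factor
    # Approximate standard error of g (an assumption; other variants exist).
    se = sqrt((n_trt + n_ctl) / (n_trt * n_ctl) + g**2 / (2 * (n_trt + n_ctl)))
    return g, g - z * se, g + z * se


# Hypothetical data: 40/100 responders v. 20/100 responders.
rr, rr_low, rr_high = relative_risk(40, 100, 20, 100)
# Hypothetical symptom change scores (more negative = more improvement).
g, g_low, g_high = hedges_g(-10.0, 10.0, 50, -5.0, 10.0, 50)
print(rr, rr_low, rr_high, g, g_low, g_high)
```

Working on the log scale for the RR keeps the confidence interval asymmetric around the point estimate and bounded above zero, which is also the scale on which RRs are usually pooled.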
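The random-effects pooling and heterogeneity statistics described in this section can be sketched in a few lines. This is a simplified illustration of the DerSimonian and Laird estimator, not the implementation in the meta or metafor packages: the function name `dersimonian_laird` is hypothetical and the input values are invented.

```python
from math import exp, log, sqrt


def dersimonian_laird(effects, variances):
    """Pool study effects (e.g. log RRs) with a DerSimonian-Laird model."""
    w = [1.0 / v for v in variances]  # inverse-variance (fixed-effect) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0  # I² in percent
    w_re = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se = sqrt(1.0 / sum(w_re))
    return {"pooled": pooled, "se": se, "q": q, "df": df, "tau2": tau2, "i2": i2}


# Hypothetical log relative risks and their variances, for illustration only.
log_rrs = [log(1.5), log(2.0), log(3.0)]
variances = [0.04, 0.05, 0.06]
result = dersimonian_laird(log_rrs, variances)
pooled_rr = exp(result["pooled"])
ci = (
    exp(result["pooled"] - 1.96 * result["se"]),
    exp(result["pooled"] + 1.96 * result["se"]),
)
print(pooled_rr, ci, result["i2"], result["tau2"])
```

Setting τ² to zero in the same function gives the fixed-effects estimate used in the predefined sensitivity analysis, which is why the two models agree when Q does not exceed its degrees of freedom.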
Results
Description of the included studies
Nine studies were included in the analysis, seven comparing barbiturates and two comparing benzodiazepines with antipsychotics for schizophrenia. The PRISMA flow diagram of the search is presented in Fig. 1 (Moher et al., 2009). Study characteristics are displayed in Table 1 and in eAppendix-4. The included studies were published between 1960 and 1968 (median 1961), and were conducted in the USA, four in Veteran Administration Hospitals (VAH) (Casey et al., 1960a, 1960b; Hollister et al., 1960; Vestre et al., 1962). No study concerning barbiturates was industry-sponsored, whereas both studies concerning benzodiazepines were sponsored by pharmaceutical companies manufacturing benzodiazepines. The median sample size was 80 (ranging from 24 to 805 participants) and the median follow-up duration was 12 weeks (ranging from 4 to 16 weeks). Two studies had a follow-up duration slightly longer than 3 months (16 weeks) (Hollister et al., 1960; Clark et al., 1961), and one had a follow-up shorter than 6 weeks (4 weeks) (Merlis et al., 1962). Longer- and shorter-term results are presented together in the manuscript and separately in graphs in eAppendix-5.
CHD, chlordiazepoxide; CPZ, chlorpromazine; DZ, diazepam; FLU, fluphenazine; MEP, mepazine; PERPH, perphenazine; PHEN, phenobarbital; PLAC, inert placebo; PROCH, prochlorperazine; PROM, promazine; THIOR, thioridazine; TPRE, trifluoperazine; TPRO, triflupromazine; TRIFLU, trifluperidol; VAH, Veteran Administration Hospitals; n, number of participants; DSM, Diagnostic and Statistical Manual of Mental Disorders; M/F, males/females; y, years; *, completers’ data when the randomized n was not available; n.i., not indicated; †, mepazine and promazine were excluded from the primary analysis.
All studies were randomized double-blind, involving inpatients. In the case of crossover studies, the duration of the first crossover phase is presented. Mean/median and range of age is presented.
The total number of participants (n) was 2099: 1328 on antipsychotics, 436 on barbiturates, 48 on benzodiazepines, and 287 on inert placebo or treatment combination. Almost all the studies included chronically ill patients, but not all were pretreated with antipsychotics (see Table 1 and eAppendix-4). No study exclusively examined treatment-resistant patients, first-episode patients, patients with predominant negative symptoms, or children and adolescents. The mean/median age of the participants ranged from 33 to 43 years. Five studies included only male patients (four of them from VAH), one included only females (Clark et al., 1961), and in the other three, the ratios between genders were 1:1 (Merlis et al., 1962; Gallant et al., 1965) and 1:2 (Kurland et al., 1961a). Ten antipsychotics were investigated: chlorpromazine (number of studies N = 6), fluphenazine (N = 1), triflupromazine (N = 3), trifluoperazine (N = 2), mepazine (N = 2), promazine (N = 2), thioridazine (N = 1), perphenazine (N = 2), prochlorperazine (N = 2), and trifluperidol (N = 1). Excluding mepazine and promazine, the median of chlorpromazine equivalents (Gardner et al., 2010) was 500 mg/day (ranging from 150 to about 1250 mg/day). All seven studies on barbiturates used phenobarbital. The two studies on benzodiazepines used chlordiazepoxide (N = 2) and diazepam (N = 1) (Merlis et al., 1962; Holden et al., 1968).
Assessment of risk of bias
The risk of bias of the included studies is displayed in Fig. 2. As all studies reported randomization without adequately describing the method of random sequence generation or allocation concealment, they were given a rating of ‘unclear’ for these items. All studies were double-blind with a low risk for performance and detection bias in six of them, and unclear detection bias in three (Casey et al., 1960a; Clark et al., 1961; Kurland et al., 1961a; Vestre et al., 1962; Gallant et al., 1965; Holden et al., 1968). Seven studies were judged to have an unclear risk of bias due to ‘incomplete outcome data,’ and two studies were assessed with a high risk of bias (Casey et al., 1960b; Kurland et al., 1961a, 1962). As is typical for older trials, the studies, in general, were not well reported, such that four of them were rated to be of high risk of bias for selective reporting (Hollister et al., 1960; Merlis et al., 1962; Gallant et al., 1965; Holden et al., 1968). For example, standard deviations sometimes had to be estimated, and the exact number of participants was not always clear. Conservative extraction decisions were made throughout (details are reported in the characteristics of included studies in eAppendix-4). Finally, three studies were judged to be of low risk of ‘other bias,’ and six studies of ‘unclear risk.’
Antipsychotics v. barbiturates
Response to treatment
Antipsychotics were more efficacious than barbiturates in terms of the primary outcome ‘good’ response [36.2% of the patients on antipsychotics v. 16.8% on barbiturates; N = 6; n = 1302; RR 2.15; 95% CI (1.36–3.41); I² = 48.9%; NNTB 5, 95% CI (2–17); low quality of evidence according to GRADE] (Fig. 3a). The results were not changed substantially in the sensitivity analyses using a fixed-effects model [RR 2.03 (1.57–2.62)] and including promazine and mepazine [N = 6; n = 1676; RR 1.98 (1.07–3.68); I² = 71.1%; NNTB 6 (2–84)]. After excluding studies with imputed response rates, few data remained; however, the results were still significant [N = 3; n = 193; RR 2.5 (1.07–5.84); I² = 13.5%; NNTB 7 (2–150)]. No significant difference between antipsychotics and phenobarbital was found in the small dataset after exclusion of studies with per protocol data for the primary outcome [N = 2; n = 153; RR 3.46 (0.44–27.05); I² = 56.2%; NNTB 4 (0, NNTH 18)]. Sensitivity and post-hoc analyses are presented in eAppendix-5.
In terms of the secondary cut-off, ‘any’ response, antipsychotics were more efficacious than phenobarbital and the response ratio was similar to that of ‘good’ response [57.4% v. 27.8%; N = 7; n = 1362; RR 2.07 (1.35–3.18); I² = 68.2%; NNTB 3 (2–10)] (Fig. 3b).
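The NNTB reported for the primary outcome can be reproduced from the relative risk and the control-group response rate. The sketch below assumes the conversion NNT = 1 / |ACR × (RR − 1)|, with the barbiturate response rate as the assumed control risk (ACR); the function name `nnt_from_rr` is hypothetical, and rounding to the nearest integer is an assumption that happens to match the reported figures.

```python
def nnt_from_rr(rr, assumed_control_risk):
    """Number needed to treat implied by a relative risk and a control risk."""
    risk_difference = assumed_control_risk * (rr - 1)  # absolute risk difference
    return 1 / abs(risk_difference)


# Values reported above: RR 2.15 (1.36-3.41), 16.8% 'good' response on barbiturates.
control_risk = 0.168
nntb = round(nnt_from_rr(2.15, control_risk))
nntb_lower = round(nnt_from_rr(3.41, control_risk))  # upper RR limit gives the smaller NNTB
nntb_upper = round(nnt_from_rr(1.36, control_risk))  # lower RR limit gives the larger NNTB
print(nntb, nntb_lower, nntb_upper)  # 5 2 17
```

Because the NNTB is the reciprocal of the risk difference, its confidence limits come from the opposite ends of the RR interval.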
Overall, positive, and negative symptoms
Antipsychotics decreased overall symptoms more than phenobarbital on the total morbidity scores of MSRPP or PRP [N = 4; n = 928; SMD (95% CI) −0.56 (−0.96 to −0.16); I² = 83.9%, τ² = 0.13, high magnitude of heterogeneity] (Fig. 3c). Standard deviations were estimated using the significance threshold of 0.05 reported in three studies (Casey et al., 1960a; Kurland et al., 1961a, 1961b; Vestre et al., 1962), which underestimated the difference, because the original p values were probably smaller. Therefore, we conducted post-hoc sensitivity analyses by using the smallest standard deviation within studies as well as the standard deviation of MSRPP derived from F-values in Casey et al. (1960b). In both scenarios, the SMDs for overall symptoms were larger [−0.73 (−0.95 to −0.50) and −0.82 (−1.01 to −0.62), respectively] and less heterogeneity was observed (I² = 49%, τ² = 0.02 and I² = 32%, τ² = 0.01, respectively) (eAppendix-5).
Antipsychotics reduced positive symptoms more than phenobarbital, in terms of the subscales ‘thinking disorder’ of PRP [N = 1; n = 82; SMD −0.55 (−1.03 to −0.07)], ‘perceptual disorganization’ [N = 1; n = 114; SMD −0.58 (−1.05 to −0.12)], ‘conceptual disorganization’ [N = 1; n = 114; SMD −0.55 (−1.01 to −0.08)], and ‘paranoid belligerence’ of MSRPP [N = 1; n = 114; SMD −0.54 (−1.01 to −0.08)]. No data were available for negative symptoms.
Premature discontinuation
There was no difference between antipsychotics and phenobarbital in dropouts due to any cause [21.3% v. 24.5%; N = 5; n = 1242; RR 0.87 (0.64–1.17); I² = 51%; NNTB 31 (11, NNTH 24)] (Fig. 4a). Fewer patients on antipsychotics than phenobarbital discontinued due to inefficacy [3.3% v. 8.3%; N = 5; n = 1242; RR 0.39 (0.16–0.95); I² = 62%; NNTB 20 (14–241)] (Fig. 4b), whereas more patients on antipsychotic drugs than phenobarbital discontinued due to side effects [3% v. 1%; N = 5; n = 1242; RR 2.98 (1.12–7.96); I² = 0%; NNTH 51 (14–833)] (Fig. 4c).
Subgroup analysis and meta-regressions of the primary outcome
Some heterogeneity was found on the primary outcome for the comparison between antipsychotics and barbiturates (I² = 48.9%; χ² = 9.79, df = 5, p = 0.08). The a priori defined subgroup and meta-regression analyses found no significant differences between studies with shorter and longer follow-up duration (N = 6; χ² = 0.19, df = 1, p = 0.66), while greater response ratios were associated with higher mean doses of chlorpromazine equivalents (N = 6; slope = 0.0021; z = 2.87, p = 0.004) and lower mean baseline severity, as measured by total score of MSRPP (N = 3; slope = −0.1209; z = −2.49, p = 0.013) (eAppendix-5).
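As a consistency check on the heterogeneity statistics reported for the primary outcome, I² can be recovered from the χ² statistic of 9.79 (Cochran's Q) and its 5 degrees of freedom, assuming the usual definition I² = (Q − df)/Q:

```latex
I^2 = \frac{Q - \mathrm{df}}{Q} \times 100\%
    = \frac{9.79 - 5}{9.79} \times 100\%
    \approx 48.9\%
```

This agrees with the reported I² of 48.9%.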
Small study effects and publication bias
The analysis of funnel plots was not meaningful because fewer than 10 studies were included in all comparisons.
Antipsychotics v. benzodiazepines
Data for this comparison were very limited. Of the two small studies available, only one provided useable data for response to treatment (Merlis et al., 1962). Data on ‘good’ response, overall, positive, and negative symptoms were not available. In terms of ‘any’ response, there was no difference between antipsychotics and benzodiazepines [74.7% on antipsychotics v. 65% on benzodiazepines; N = 1; n = 60; RR 1.15 (0.82–1.62); NNTB 10 (2; NNTH 9); low quality of evidence] (Fig. 3b). A post-hoc analysis pooling this benzodiazepine-controlled study with the barbiturate-controlled studies did not change the results materially [RR 1.81 (1.26–2.60)]. There was also no difference between antipsychotics and benzodiazepines in terms of dropouts due to any cause [N = 1; n = 16; RR 3 (0.14–63.74)], inefficacy [N = 1; n = 16; RR 3 (0.14–63.74)], and side effects (N = 1; n = 16; no dropouts due to side effects occurred) (Fig. 4). Due to the paucity of available data, subgroup, sensitivity, and meta-regression analyses were not meaningful for benzodiazepines.
Strength of the evidence for the primary outcome according to GRADE
We rated the strength of the evidence for both comparisons, antipsychotics v. barbiturates and v. benzodiazepines, as low, mainly due to concerns about the risk of bias and, for the latter, imprecision of the results (eAppendix-6).
Discussion
Antipsychotics were more efficacious than phenobarbital based on substantial evidence. In contrast, no difference compared to benzodiazepines was found based on one small study. Almost twice as many patients on antipsychotics in comparison to phenobarbital achieved a ‘good’ response or ‘any’ response. The number of included studies was small, but the results of the primary outcome were based on a considerable number of participants (n = 1302), which is higher than the threshold of 1000 participants suggested by Trikalinos et al. for the robustness of meta-analysis in psychiatry (Trikalinos et al., 2004). Excluding studies analyzed ‘per protocol’ was the only sensitivity analysis with no significant difference between antipsychotics and phenobarbital, but it was based only on two studies with 153 participants. For these studies, we employed a conservative approach by assuming that dropouts had not responded; therefore, our primary analysis could be reliable from that point of view. Antipsychotics also decreased overall and positive symptoms with a medium effect size, and fewer patients than in the phenobarbital groups discontinued due to inefficacy. Whereas more patients on antipsychotics than on phenobarbital discontinued due to adverse effects, there were no differences in dropouts due to any cause.
About 72% of the total number of participants in the primary outcome stemmed from two industry-independent studies conducted in VAHs (Casey et al., 1960a, 1960b). It is impressive that large, multicenter, double-blind RCTs in hospitalized patients were conducted in the 1960s without the support of modern technology, such as electronic communication (no facsimile, let alone e-mail) or statistical software, and without industry funding. These early RCTs were milestones in clinical psychopharmacology (Shen, 1999; Carpenter and Davis, 2012). However, as these studies are from an era when trial methodology had not yet been clearly established, they did not meet all current methodological standards. Antipsychotic drugs were clearly more efficacious than phenobarbital, but due to these shortcomings, we downgraded the evidence according to the GRADE approach, because there is uncertainty about the exact magnitude of the superiority of antipsychotics (see eAppendix-6).
Our results comparing antipsychotics with phenobarbital were similar to those observed in our previous meta-analysis of RCTs with inactive placebo (Leucht et al., 2017), where the response ratio was 1.96 (1.65–2.44) for ‘good’ response (23% of participants on antipsychotics had a ‘good’ response v. 14% on inert placebo) and 1.93 (1.72–2.19) for ‘any’ response. The populations in the reviews were also similar, i.e. chronically ill patients, although it is likely that compared to modern trials, at that time, there were more antipsychotic-naïve patients. For example, in the two VAH studies (Casey et al., 1960a, 1960b), only 65% of the participants had used ‘tranquilizers’ before the studies. Nevertheless, chronic patients might not respond as well to antipsychotics as first-episode patients (Zhu et al., 2017a, 2017b). Indeed, in an early large NIMH study, almost half of the participants had their first episode or were antipsychotic-naïve, and higher response rates were observed (61% of patients on antipsychotics were much improved v. 22% on inert placebo) (National Institute of Mental Health Psychopharmacology Service Center Collaborative Study Group, 1964). The meta-analysis with inactive placebo-RCTs included many antipsychotics other than phenothiazines, in particular, haloperidol and second-generation antipsychotics (Leucht et al., 2017). Despite these differences, phenobarbital and inert placebo appear to be similarly ineffective compared to antipsychotics.
Indeed, no difference between phenobarbital and inert placebo was found post-hoc in the primary outcome [N = 3; n = 517; RR 0.94 (0.68–1.30)] (eAppendix-5).
On the other hand, only two small trials were eligible for the comparison between antipsychotics and benzodiazepines (Merlis et al., Reference Merlis, Turner and Krumholz1962; Holden et al., Reference Holden, Itil, Keskiner and Fixk1968). There was no significant difference in response rates between antipsychotics and benzodiazepines based on one small study supported by the manufacturer of the benzodiazepines (Merlis et al., Reference Merlis, Turner and Krumholz1962). The response rates were high in all arms (65% in inert placebo and benzodiazepines, 75% in chlorpromazine), although the daily dose of chlorpromazine was about 150 mg/day, which may be insufficient (Davis and Chen, Reference Davis and Chen2004). Benzodiazepines are effective for sedating agitated patients with schizophrenia (Dold et al., Reference Dold, Li, Tardy, Khorsand, Gillies and Leucht2012), and a large, pragmatic RCT found that midazolam was effective more rapidly than combined haloperidol–promethazine in this regard (TREC Collaborative Group, 2003). However, as so few randomized data are available for longer-term outcomes, the results are inconclusive. We believe that further research is warranted, since benzodiazepines are used frequently in clinical practice, although they are liable for addiction, and observational studies suggest that benzodiazepine augmentation could be associated with increased mortality (Tiihonen et al., Reference Tiihonen, Suokas, Suvisaari, Haukka and Korhonen2012).
This review has some limitations. First, barbiturates and benzodiazepines induce sedation, and blinding could be jeopardized by antipsychotic-specific side effects, such as extrapyramidal symptoms (EPS). We did not analyze specific side effects, but four studies reported more EPS in the antipsychotic arms than under phenobarbital (Casey et al., Reference Casey, Lasky, Klett and Hollister1960b; Clark et al., Reference Clark, Ray, Paredes, Costiloe, Chappell, Hagans and Wold1961; Kurland et al., Reference Kurland, Hanlon, Tatom, Ota and Simopoulos1961a, Reference Kurland, Hanlon, Tatom and Simopoulos1961b; Gallant et al., Reference Gallant, Bishop, Nesselhof and Sprehe1965; Hollister, Reference Hollister1972), while one study reported an overall low incidence of EPS (six EPS events in 805 patients, one in phenobarbital) (Casey et al., Reference Casey, Bennett, Lindley, Hollister, Gordon and Springer1960a) and another one, no significant difference in the use of anticholinergic drugs between antipsychotics and phenobarbital (Vestre et al., Reference Vestre, Hall and Schiele1962). We found that more patients on antipsychotics than phenobarbital dropped out due to side effects, but dropout rates were low (3% v. 1%, respectively), in comparison to 20% observed in more recent antipsychotic trials (Huhn et al., Reference Huhn, Nikolakopoulou, Schneider-Thoma, Krause, Samara, Peter, Arndt, Backers, Rothe, Cipriani, Davis, Salanti and Leucht2019). Dropout rates in antipsychotic trials have increased since that early era of psychopharmacology (Wahlbeck et al., Reference Wahlbeck, Tuunainen, Ahokas and Leucht2001), which could reflect improvements in the quality of reporting, as well as changes in trial methodology, patient recruitment and ethical aspects (Wahlbeck et al., Reference Wahlbeck, Tuunainen, Ahokas and Leucht2001; Brunoni et al., Reference Brunoni, Tadini and Fregni2010).
From a theoretical point of view, drugs which before study start were supposed to have antipsychotic properties would better control for investigator bias resulting from unblinding. Mepazine can serve as an example. It was introduced as a chlorpromazine-like phenothiazine with potential superiority to other phenothiazines (Casey et al., Reference Casey, Lasky, Klett and Hollister1960b). It might inhibit slightly dopaminergic receptors (Heiss et al., Reference Heiss, Hoyer and Thalhammer1976), somewhat similar to promethazine, but the dopamine blockade mechanism of antipsychotics had not been established at that time. In a post-hoc analysis, the other phenothiazines were more efficacious than mepazine [n = 2; N = 704; RR 1.78(1.21–2.62)] (eAppendix-5), although it was investigated as a potential antipsychotic, reducing investigator bias (Casey et al., Reference Casey, Lasky, Klett and Hollister1960b; Kurland et al., Reference Kurland, Hanlon, Tatom, Ota and Simopoulos1961a). In a similar vein, the phenothiazine perazine was superior to the tricyclic antidepressant trimipramine in a double-blind RCT, which attempted to prove trimipramine's efficacy, inspired by its receptor-binding profile being similar to clozapine (Bender et al., Reference Bender, Olbrich, Fischer, Hornstein, Schoene, Falkai, Haarmann, Berger and Gastpar2003; Leucht et al., Reference Leucht, Helfer and Hartung2014). Finally, phenothiazines were also superior to chlorprophenpyridamine/scopolamine (an active placebo mimicking sedative and anticholinergic properties) in a double-blind RCT with 288 participants (Adelson and Epstein, Reference Adelson and Epstein1962).
Second, antipsychotics were not analyzed separately but were pooled together, since only small differences in terms of efficacy could be expected (Leucht et al., Reference Leucht, Cipriani, Spineli, Mavridis, Orey, Richter, Samara, Barbui, Engel, Geddes, Kissling, Stapf, Lassig, Salanti and Davis2013; Samara et al., Reference Samara, Cao, Helfer, Davis and Leucht2014). Third, the included studies were published before the CONSORT statement for RCTs (Begg et al., Reference Begg, Cho, Eastwood, Horton, Moher, Olkin, Pitkin, Rennie, Schulz, Simel and Stroup1996); therefore, their reporting did not follow current standards and their quality was downgraded. Fourth, because of this poor reporting, some values had to be estimated, but conservative decisions were made throughout, which should have led to an underestimation of differences (eAppendix-4). For example, responder rates were imputed in three studies (Casey et al., Reference Casey, Bennett, Lindley, Hollister, Gordon and Springer1960a, Reference Casey, Lasky, Klett and Hollister1960b; Kurland et al., Reference Kurland, Hanlon, Tatom, Ota and Simopoulos1961a, Reference Kurland, Hanlon, Tatom and Simopoulos1961b) from mean values using a validated methodology, which tends to narrow the difference between experimental and control groups (Samara et al., Reference Samara, Spineli, Furukawa, Engel, Davis, Salanti and Leucht2013). Standard deviations were also poorly reported and conservative estimations were used in three studies (Casey et al., Reference Casey, Bennett, Lindley, Hollister, Gordon and Springer1960a; Kurland et al., Reference Kurland, Hanlon, Tatom, Ota and Simopoulos1961a, Reference Kurland, Hanlon, Tatom and Simopoulos1961b; Vestre et al., Reference Vestre, Hall and Schiele1962), which underestimated the difference (see post-hoc sensitivity analyses of overall symptoms, eAppendix-5).
These estimations could also partly explain the considerable heterogeneity in some secondary analyses and outcomes for which no other clear reasons were found.
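The general idea of imputing responder rates from mean values can be sketched as follows, assuming that change scores are approximately normally distributed. This is only an illustration of a normal-approximation approach. Whether it matches the exact procedure documented in eAppendix-4 is an assumption, and the function name `imputed_responder_rate`, the cutoff and all numbers are hypothetical.

```python
from statistics import NormalDist

def imputed_responder_rate(mean_change, sd_change, cutoff):
    """Expected share of patients improving by at least `cutoff`,
    assuming normally distributed change scores (an assumption).
    Higher values of the change score mean more improvement."""
    return 1.0 - NormalDist(mu=mean_change, sigma=sd_change).cdf(cutoff)

# Hypothetical group means and SDs, not data from the included trials
drug = imputed_responder_rate(mean_change=30.0, sd_change=20.0, cutoff=20.0)
control = imputed_responder_rate(mean_change=15.0, sd_change=20.0, cutoff=20.0)

# Same means with a larger assumed SD
drug_wide = imputed_responder_rate(mean_change=30.0, sd_change=30.0, cutoff=20.0)
control_wide = imputed_responder_rate(mean_change=15.0, sd_change=30.0, cutoff=20.0)

print(f"SD 20: {drug:.2f} v. {control:.2f}")
print(f"SD 30: {drug_wide:.2f} v. {control_wide:.2f}")
```

In this sketch, where the cutoff lies between the two group means, a larger assumed standard deviation pulls both imputed rates toward 50% and so narrows the difference between groups.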
Last, the results of the predefined meta-regressions on baseline severity and antipsychotic dose should be interpreted with the utmost caution, because they were based on a maximum of six studies and were prone to ecological fallacy, potential outliers, and chance findings (eAppendix-5).
We conclude that antipsychotics were more efficacious than phenobarbital with medium effect sizes. This conclusion is based on old studies which are large but do not meet all current methodological standards, so that the strength of the evidence about the exact magnitude of the superiority was low according to GRADE (Schünemann et al., Reference Schünemann, Brożek, Guyatt and Oxman2013). The data on benzodiazepines as ‘active placebos’ were inconclusive, so further trials are still warranted, although they may be of less clinical importance given the risk of addiction and excess mortality associated with benzodiazepines (Tiihonen et al., Reference Tiihonen, Suokas, Suvisaari, Haukka and Korhonen2012).
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S003329171900285X.
Acknowledgements
We would like to thank Farhad Shokraneh for conducting the electronic database search and assisting with the retrieval of full texts, Ghazaleh Aali for conducting the double screening of titles and abstracts, and Makoto Wada for assisting with the translation of a Japanese full text. We would like to thank Dr Dimitris Mavridis for his statistical consultation regarding the empirical distributions of τ². SH was supported during his research internship at the Department of Psychiatry and Psychotherapy, School of Medicine, Technische Universität München, Germany by the European College of Neuropsychopharmacology (ECNP).
Author contributions
SL was the supervisor of the study. SL, SS, GP, and JD conceived and designed the study. SS, GM, AC, CM, SH, GP, JD, JS, and SL selected articles, extracted, and interpreted data. SS did the statistical analysis. SS and SL wrote the draft and the final version of the report. SL and AV obtained funding for the staff. SL, GP, and AV provided administrative, technical, or material support. All authors critically reviewed the report for important intellectual content and approved the final submitted version.
Financial support
This research received no specific grant from any funding agency, commercial, or not-for-profit sectors.
Conflict of interest
In the last 3 years, Stefan Leucht has received honoraria as a consultant/advisor and/or for lectures from LB Pharma, Otsuka, Lundbeck, Boehringer Ingelheim, LTS Lohmann, Janssen, Johnson & Johnson, TEVA, MSD, Sandoz, Sanofi-Aventis, Angelini, Recordati, Sunovion, and Gedeon Richter. In the last 3 years, Antonio Vita received support directly or indirectly for clinical studies or trials, conferences, consultancies, congress presentations, and advisory boards from Angelini, Boehringer Ingelheim, Eli Lilly, Fidia, Forum Pharmaceutical, Innovapharma, Janssen-Cilag, Lundbeck, Otsuka, Recordati, and Takeda. No other disclosures were reported.