Introduction
Humans have intentionally consumed psychoactive substances for thousands of years (Guerra-Doce, 2015). Psychedelic substances, in particular, figure prominently in indigenous medical and religious practices around the world (Samorini, 2019; Schultes, 1969). Scientific interest during the 1950s and 1960s in the therapeutic potential of both plant-based psychedelics (e.g. psilocybin) and synthetic psychedelics [e.g. lysergic acid diethylamide (LSD)] largely ceased following legislative changes during the 1970s and 1980s (Bonson, 2018). Research has resumed in the past two decades. While early work in this contemporary period focused on pharmacokinetics (e.g. Callaway et al., 1999) or the use of psychedelics as a model for psychiatric conditions (e.g. schizophrenia; Vollenweider, Vollenweider-Scherpenhuyzen, Bäbler, Vogel, and Hell, 1998), a growing number of studies are again evaluating the therapeutic potential of psychedelics (Reiff et al., 2020).
Classical psychedelics are a class of psychoactive substances that share both mode of action (agonism of the 5-HT2A receptor; Carhart-Harris, 2019) and psychoactive effects (marked cognitive, affective, and perceptual changes). Members of this class that have received recent scientific attention include psilocybin, ayahuasca, and LSD (dos Santos, Bouso, Alcázar-Córcoles, & Hallak, 2018). Psilocybin (4-phosphoryloxy-N,N-dimethyltryptamine) is a naturally occurring plant alkaloid used ritualistically for spiritual and healing purposes by indigenous cultures in Mexico and South America (Guzmán, 2008). Ayahuasca is a plant-based serotonergic psychedelic also used ritualistically by indigenous cultures in South America (McKenna, 2004). The psychoactive effects of ayahuasca are due to N,N-dimethyltryptamine coupled with reversible monoamine oxidase inhibitors (Ott, 1999). LSD is a synthetic psychedelic first synthesized in 1943 by Hofmann (1980) that is both a serotonin and dopamine receptor agonist (Giacomelli, Palmery, Romanelli, Cheng, & Silvestrini, 1998; Preller et al., 2017). Numerous studies in the 1960s investigated the therapeutic effects of LSD for the treatment of addiction (Krebs & Johansen, 2012) and other clinical applications (e.g. end-of-life distress; Ross, 2018). Research halted as LSD became associated with the countercultural revolution of the late 1960s and as concerns arose regarding its safety (Nutt, King, & Nichols, 2013).
Studies have begun reexamining the therapeutic potential of classical psychedelics for clinical conditions including depression (Carhart-Harris et al., 2016a, 2018a; Palhano-Fontes et al., 2019), anxiety (Gasser et al., 2014; Ross et al., 2016), and substance use (Bogenschutz et al., 2015; Johnson, Garcia-Romeu, Cosimano, & Griffiths, 2014). Often psychedelics are paired with behavioral interventions intended to maximize benefits by enhancing the mental ‘set’ and physical ‘setting’ (Carhart-Harris et al., 2018b). Other studies have examined effects in non-clinical samples on measures of well-being, personality, and associated constructs (e.g. mindfulness, spirituality; MacLean, Johnson, and Griffiths, 2011; Soler et al., 2018).
Several systematic reviews have examined the safety and efficacy of psychedelics for both clinical and non-clinical populations. These narrative reviews consistently suggest psychedelics can be safely administered (i.e. adverse effects are minimal and transient) and may reduce depression and anxiety symptoms (Muttoni, Ardissino, & John, 2019), provide psychological benefits in the context of life-threatening disease (Reiche et al., 2018), and induce mystical experiences associated with enduring changes in personality and attitudes (Aday, Mitzkovitz, Bloesch, Davoli, & Davis, 2020). Despite several well-conducted systematic reviews, only two quantitative reviews (i.e. meta-analyses) have characterized the efficacy of psychedelics. Krebs and Johansen (2012) meta-analyzed six randomized controlled trials (RCTs) published between 1966 and 1970 testing LSD for alcoholism, finding LSD substantially reduced substance misuse (odds ratio = 1.96). Goldberg, Pace, Nicholas, Raison, and Hutson (2020) found that psilocybin was associated with large reductions in depression and anxiety across four recent studies (Hedges' gs = 0.82 to 1.47).
The available reviews suggest psychedelics may have therapeutic potential. Yet, a clear quantitative depiction of the breadth of this literature is lacking. A comprehensive meta-analysis would be valuable for characterizing the magnitude and variability (i.e. heterogeneity) of the effect of psychedelics across psychological outcomes, including but not limited to psychiatric symptoms. Such a meta-analysis would be particularly valuable for clarifying effects that have been inconsistent in prior studies (e.g. effects on personality; Barrett, Doss, Sepeda, Pekar, & Griffiths, 2020; MacLean et al., 2011). The small sample size in many primary studies (e.g. mean n = 29.25; Goldberg et al., 2020) also recommends the use of meta-analysis, which allows aggregation across studies. Lastly, meta-analysis offers the opportunity to examine whether various study-level features (e.g. psychedelic type, behavioral support) moderate effects.
The current study sought to address this gap in the literature by quantitatively synthesizing psychological effects from experimental studies testing psilocybin, ayahuasca, or LSD. We focus on these three substances due to their shared mechanism of action (5-HT2A receptor agonism) and subjective effects. Other psychoactive compounds that produce partially overlapping effects through partially overlapping mechanisms were not considered [e.g. entactogens such as 3,4-methylenedioxymethamphetamine (MDMA); Reiff et al., 2020]. Given our interest in therapeutic applications, we focus on effects outside of the acute period of intoxication. To provide the most comprehensive depiction, we included studies with either clinical or non-clinical (i.e. healthy) samples. Likewise, we included both between-group (e.g. RCTs) and within-group (e.g. pre-post) designs. Four study-level characteristics (psychedelic type, clinical sample, presence of behavioral support, percentage female) were examined as moderators. We also assess adverse effects and risk of bias within and between studies.
Method
Protocol and registration
We followed the PRISMA guidelines (Moher, Liberati, Tetzlaff, & Altman, 2009). This meta-analysis was pre-registered through the Open Science Framework (https://osf.io/79y5v/). Upon reviewing the available studies, we made several deviations from the pre-registered protocol. First, we restricted our focus to post-acute effects given that the acute hallucinogenic effects have been well characterized (e.g. Studerus, Kometer, Hasler, and Vollenweider, 2011) and are less relevant for therapeutic purposes. Second, there were insufficient studies to test moderation by specific clinical condition (e.g. depression v. anxiety disorders). Instead, we report results restricted to clinical samples and to samples with depression. Third, no waitlist control conditions were available to compare with placebo-controlled studies. Fourth, we aggregated outcomes into conceptually coherent categories based on measures reported across studies. This led to the addition of some categories (e.g. adverse effects) and exclusion of some that were rarely reported (e.g. substance use).
Eligibility criteria
Eligible studies involved the administration of psilocybin, ayahuasca, or LSD within an experimental setting (i.e. not a naturalistic setting). Studies were required to report at least one psychological outcome. We maintained a broad definition of psychological to include psychiatric symptoms as well as non-clinical measures (e.g. well-being, spirituality). However, measures primarily focused on the acute psychedelic experience itself (e.g. altered states of consciousness; Studerus, Gamma, and Vollenweider, 2010) were excluded. Outcomes were assessed outside of the period of acute intoxication, which we operationalized as ⩾24 h post-administration of the psychedelic, consistent with prior studies (e.g. Schmid et al., 2015). Studies with and without behavioral support were eligible. Both single-group (e.g. within-group pre-post) and between-group designs (e.g. placebo-controlled RCT) were eligible. Both clinical and non-clinical samples were eligible. No restriction was placed on language or publication status. Studies were excluded if they were missing data necessary for computing effect sizes. Studies that only reported post-treatment data without a baseline measurement or a relevant control group (e.g. persisting effects at post-treatment for a single-group design; Nicholas et al., 2018) were excluded. Principal investigators of completed clinical trials were contacted regarding available results.
Information sources
We searched six databases: PubMed, CINAHL, PsycINFO, Web of Science, Scopus, and Cochrane. We restricted our search to studies from the contemporary period of psychedelic research (1990 or later). This window captured the period when research on classical psychedelics resumed (e.g. Strassman, Qualls, Uhlenhuth, and Kellner, 1994) but excluded early research (1950s to 1960s) conducted under sufficiently different methodological standards such that safety and efficacy data may not be interpretable (Bonson, 2018). The search was conducted between October 23rd and 31st, 2019. In addition, we hand searched recent systematic reviews (Aday et al., 2020; Bouso, dos Santos, Alcázar-Córcoles, & Hallak, 2018; dos Santos et al., 2018; Jungaberle et al., 2018; Muttoni et al., 2019; Reiche et al., 2018; Reiff et al., 2020; Schenberg, 2018).
Search
We paired search terms associated with the three psychedelics of interest (e.g. ‘psilocybin,’ ‘ayahuasca,’ ‘LSD,’ ‘psychedelic*’) with terms related to both clinical (e.g. ‘mental disorders,’ ‘depression,’ ‘anx*’) and non-clinical populations (e.g. ‘well-being,’ ‘quality of life,’ ‘healthy’). The full search terms for all six databases are shown in online Supplementary Table 1.
Study selection
Two authors independently reviewed each title and/or abstract of potential studies for inclusion. Full texts were reviewed for studies that passed initial screening. Disagreements were discussed with the first author until consensus was reached.
Data collection process
Standardized spreadsheets were developed for study- and effect size-level coding. The first and second authors independently extracted data. Inter-rater reliabilities were good to excellent (i.e. κs and ICCs ⩾ 0.74; Cicchetti, 1994).
Data items
In addition to data necessary for computing effect sizes (e.g. sample sizes, means, standard deviations), we extracted: (1) study design, (2) psychedelic type, dose, and control condition, (3) inclusion criteria, (4) adverse events, (5) post-treatment and follow-up timing, (6) behavioral support, (7) sample age and sex composition, (8) country, and (9) retention. We also extracted data necessary for coding risk of bias with the Cochrane tool (Higgins & Green, 2008). Outcomes were grouped into categories that were intended to be both parsimonious and conceptually coherent. This yielded 14 categories: adverse effects (i.e. symptoms potentially associated with negative drug effects such as psychotic symptoms or mania), targeted symptoms of psychiatric disorders (e.g. alcohol use for samples with alcohol use disorder), depression for samples with depression (as this was the most common psychiatric disorder studied), negative affect-related outcomes (e.g. negative mood, anxiety), positive affect-related outcomes (e.g. joy), social outcomes (e.g. altruism), behavior (e.g. observer-rated behavior change), existential and spiritual outcomes (e.g. death transcendence, lifetime mystical experience), mindfulness, and the big five personality traits (i.e. openness, neuroticism, extraversion, agreeableness, conscientiousness).
Risk of bias in individual studies
Risk of bias was evaluated using the Cochrane tool (Higgins & Green, 2008). Bias was assessed across five domains: selection bias (random sequence generation, allocation concealment), performance bias (blinding of participants and personnel), detection bias (blinding of outcome assessors), attrition bias (incomplete outcome data), and reporting bias (selective reporting). For each study, an evaluation of low, high, or unclear risk of bias was made.
Summary measures
Effect sizes in standardized units were calculated using standard meta-analytic methods (Cooper, Hedges, & Valentine, 2009). Specifically, a within-group pre-post and pre-follow-up Cohen's (1988) d was computed for all studies providing eligible data. The pre-post effect used baseline and the first available data collected post-treatment. To provide the most conservative estimate of effects at follow-up, pre-follow-up effects used data from the last available follow-up. For within-group effects, we assumed a correlation of r_xx = 0.50 between timepoints (Hoyt & Del Re, 2018). For controlled studies, a between-group effect size was also computed. When pre-post data were available for both the treatment and control conditions, within-group effects were computed for each group separately. Then, the between-group effect was computed as the difference between within-group effects (i.e. Becker's del; Becker, 1988). This effect size has the advantage of accounting for baseline data. When within-group effects were not available (e.g. outcomes like persisting effects assessed only at post-treatment; Griffiths, Richards, McCann, and Jesse, 2006), a between-group Cohen's d was computed. To provide the most conservative estimate of controlled effects, we used data from the last available follow-up timepoint. For randomized controlled cross-over designs in which both groups ultimately received the active treatment (e.g. Ross et al., 2016), we used data from the last timepoint prior to cross-over. For within-person RCTs that included multiple dosages (e.g. Bershad, Schepers, Bremmer, Lee, and de Wit, 2019), we compared the placebo condition with the highest dose condition.
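As an illustration of the within-group and between-group computations described above, the following is a minimal Python sketch and not the code used in this meta-analysis. It assumes the pre-treatment standard deviation as the standardizer and a commonly used large-sample approximation for the sampling variance of a repeated-measures d; neither choice is specified in the text. The function names and the input numbers are hypothetical.

```python
def within_group_d(mean_pre, mean_post, sd_pre, n, r=0.50):
    """Standardized change from baseline and its approximate sampling variance.

    Assumptions (not stated in the paper): the baseline SD is the
    standardizer, and var(d) ~= 2(1 - r)/n + d^2 / (2n), where r is the
    assumed correlation between timepoints.
    """
    d = (mean_post - mean_pre) / sd_pre
    var = 2.0 * (1.0 - r) / n + d ** 2 / (2.0 * n)
    return d, var


def beckers_del(treatment, control):
    """Between-group effect as the difference of two within-group effects.

    Each argument is a (d, variance) pair from independent groups, so the
    variances add.
    """
    d_t, v_t = treatment
    d_c, v_c = control
    return d_t - d_c, v_t + v_c


# Hypothetical symptom scores (lower = better), 12 participants per arm.
treat = within_group_d(mean_pre=20.0, mean_post=10.0, sd_pre=5.0, n=12)
control = within_group_d(mean_pre=20.0, mean_post=18.0, sd_pre=5.0, n=12)
delta, delta_var = beckers_del(treat, control)
# A negative delta here means a larger symptom reduction under treatment;
# the sign would be reversed so that positive values indicate improvement.
```

Because the variance depends on the assumed correlation r, a sensitivity check with other plausible values of r is a natural extension of this sketch.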
In order to decrease the influence of selective reporting bias (Higgins & Green, 2008), we attempted to represent all outcome measures that were assessed. Authors were contacted regarding measures described in the Method section but not included in the Results section. When data remained missing at the time of analysis, we represented effects described in the text as non-significant as d = 0.00. Authors were also contacted when adverse effects were not mentioned in the published report.
Synthesis of results
Using standard meta-analytic methods (Cooper et al., 2009), effects were aggregated first within measure [e.g. subscales of the Depression Anxiety and Stress Scale (Lovibond & Lovibond, 1995)] and then within study using the ‘MAd’ package (Del Re & Hoyt, 2014) in R (R Core Team, 2018). As noted previously, separate analyses examined effects for specific outcome domains. Meta-analytic effect sizes with an associated 95% confidence interval (CI) were computed when at least two studies were available for a specific estimate (Valentine, Pigott, & Rothstein, 2010). Summary effects were converted from Cohen's d to Hedges' g in order to account for small sample bias (Cooper et al., 2009). As appropriate, the sign for each effect was reversed so that a positive g always indicated improvement (e.g. decreased depression, increased well-being). Magnitude was interpreted based on Cohen's (1988) guidelines. Separate aggregate effect size estimates were computed for within-group effects at post-treatment and follow-up and for between-group effects at last available post-treatment assessment. Heterogeneity was characterized using I² (i.e. the proportion of total variability in effect estimates attributable to between-study heterogeneity) and interpreted based on Higgins, Thompson, Deeks, & Altman's (2003) guidelines. Random effects models with weighting based on the inverse of the variance of each study's effect size were implemented through the ‘metafor’ package (Viechtbauer, 2010).
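The small-sample correction, inverse-variance random effects pooling, and I² described above can be sketched as follows. This is a hedged Python illustration, not the ‘MAd’ or ‘metafor’ code used in the analysis. It assumes the DerSimonian-Laird estimator of between-study variance because it has a closed form; the text does not state which estimator was used. Function names and input values are hypothetical.

```python
import math


def hedges_g(d, var_d, df):
    """Apply the usual small-sample correction J = 1 - 3 / (4*df - 1)."""
    j = 1.0 - 3.0 / (4.0 * df - 1.0)
    return j * d, (j ** 2) * var_d


def random_effects(effects, variances):
    """Inverse-variance random effects pooling (DerSimonian-Laird, assumed)."""
    k = len(effects)
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    # Cochran's Q: weighted squared deviations from the fixed-effect mean.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random effects weights add tau^2 to each study's sampling variance.
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - (k - 1)) / q) * 100.0 if q > 0 else 0.0
    return {
        "estimate": pooled,
        "ci": (pooled - 1.96 * se, pooled + 1.96 * se),
        "tau2": tau2,
        "i2": i2,
    }


# Two hypothetical study-level effects with equal sampling variances.
g_example, g_var_example = hedges_g(1.0, 0.1, df=10)
result = random_effects([0.2, 0.8], [0.04, 0.04])
```

With only two studies, as in the hypothetical call, the heterogeneity estimate is very imprecise, which mirrors the caution the text attaches to analyses based on few studies.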
Risk of bias across studies
We assessed publication bias using trim-and-fill analyses in the ‘metafor’ package. When funnel plot asymmetry was detected, an adjusted effect size was computed with studies imputed to account for asymmetry. Due to the small number of studies in some analyses, which limits statistical power, these tests were considered exploratory. In addition, we calculated the fail-safe Ns to represent the number of non-significant results that would need to exist to nullify an observed effect (Rosenthal, 1979).
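To make the fail-safe N concrete, the sketch below computes Rosenthal's (1979) version from study-level effects and variances and checks it against the 5n + 10 robustness criterion applied in the Results. It is an illustrative Python sketch under stated assumptions (one-tailed α = 0.05, combined z-scores), not the routine used in the analysis; function names and inputs are hypothetical.

```python
import math

# Critical z for a one-tailed test at alpha = 0.05 (assumed threshold).
Z_ALPHA = 1.6448536269514722


def fail_safe_n(effects, variances):
    """Rosenthal's fail-safe N: (sum of z)^2 / z_alpha^2 - k, floored at 0."""
    k = len(effects)
    z_sum = sum(y / math.sqrt(v) for y, v in zip(effects, variances))
    return max(0, math.ceil(z_sum ** 2 / Z_ALPHA ** 2 - k))


def is_robust(fsn, k):
    """Robustness criterion: fail-safe N must exceed 5k + 10."""
    return fsn > 5 * k + 10


# Hypothetical inputs: four studies with g = 0.5 and sampling variance 0.04.
fsn_large = fail_safe_n([0.5, 0.5, 0.5, 0.5], [0.04] * 4)
# Hypothetical inputs: two studies with small, imprecise effects.
fsn_small = fail_safe_n([0.1, 0.1], [0.04, 0.04])
```

In the first hypothetical case the fail-safe N exceeds 5 × 4 + 10 = 30, so the result would be considered robust; in the second it is zero and would not.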
Additional analyses
We tested four study-level characteristics as moderators. These included the psychedelic type (coded as 1 = psilocybin, 0 = LSD or ayahuasca), whether the sample was clinical (i.e. required elevated symptoms of a medical/psychiatric diagnosis for inclusion) or non-clinical (i.e. healthy controls), whether behavioral support was provided (e.g. pre-treatment preparation), and percentage female. Psilocybin was compared with LSD or ayahuasca as the majority of studies investigated psilocybin (k = 14). Insufficient studies were available to adequately compare psilocybin with LSD (k = 4) and ayahuasca (k = 6) separately, or LSD and ayahuasca with each other. We also conducted sensitivity analyses with outliers excluded. There are several methods for identifying outliers in meta-analysis (Viechtbauer & Cheung, 2010). We used the ‘find.outliers’ function provided by Harrer, Cuijpers, Furukawa, and Ebert (2019), which defines an outlier as a study whose CI does not overlap the omnibus effect CI.
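The outlier rule described above (a study whose CI does not overlap the CI of the omnibus effect) can be sketched in a few lines. This Python sketch is an illustration of that rule only and is not the ‘find.outliers’ implementation; it assumes normal-theory 95% CIs for each study, and the function name and inputs are hypothetical.

```python
import math


def flag_outliers(effects, variances, pooled_ci, z=1.96):
    """Return indices of studies whose CI lies entirely outside pooled_ci."""
    pooled_low, pooled_high = pooled_ci
    flagged = []
    for i, (y, v) in enumerate(zip(effects, variances)):
        half_width = z * math.sqrt(v)
        study_low, study_high = y - half_width, y + half_width
        # No overlap: the study CI is wholly below or wholly above the pooled CI.
        if study_high < pooled_low or study_low > pooled_high:
            flagged.append(i)
    return flagged


# Hypothetical effects; the third study is far above the pooled interval.
outliers = flag_outliers(
    effects=[0.4, 0.6, 3.0],
    variances=[0.04, 0.04, 0.04],
    pooled_ci=(0.2, 1.0),
)
```

A sensitivity analysis would then refit the model without the flagged studies and compare the resulting estimate with the original one.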
Results
Study selection
Our search produced a total of 14 591 citations. After removing 4540 duplicates, 10 051 unique titles and/or abstracts were reviewed. After applying our exclusion criteria (Fig. 1), we retained 34 studies representing 24 unique samples and 549 participants (see online Supplementary Table 2 for a list of the 34 studies). Studies were published between 2006 and 2020.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210114085507226-0963:S003329172000389X:S003329172000389X_fig1.png?pub-status=live)
Fig. 1. PRISMA flow diagram.
Study characteristics
Study-level characteristics are reported in Table 1. Half of the studies used single-group pre-post designs (50.0%) with the remainder being within-group RCTs (i.e. participants received all conditions in random order; 16.7%), or between-group RCTs (33.3%). The majority of studies tested psilocybin (58.3%) with 25.0% testing ayahuasca and 16.7% testing LSD. Dosages of each psychedelic and placebo control conditions are listed in online Supplementary Table 3. Post-test assessment occurred on average at 5.54 weeks post-treatment (s.d. = 6.48, range = 0 to 26.00). Most studies (54.2%) included a follow-up assessment. For studies with a follow-up assessment, last follow-up occurred on average 53.34 weeks (s.d. = 64.25) post-treatment (range = 3 to 234.90). Retention at post-treatment was 94.5% (s.d. = 10.0) and 85.6% (s.d. = 16.9) at follow-up.
Table 1. Study characteristics
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210114085507226-0963:S003329172000389X:S003329172000389X_tab1.png?pub-status=live)
Behav, inclusion of behavioral support (e.g. preparation prior to psychedelic administration); N, sample size; tx, treatment; cont, control; Wkpost, week of post-treatment assessment; WkFU, week of follow-up assessment; Fem, female; Retpost, % of sample retained at post-treatment assessment; RetFU, % of sample retained at follow-up assessment; NA, not available; life-threat disease, life-threatening disease.
Sample sizes were generally small, on average 22.88 participants (s.d. = 17.42, range = 6 to 85). Mean age was 42.13 years old and the samples were 51.5% female. Among the studies that reported race/ethnicity (37.5% of studies), 74.6% were non-Hispanic white or Caucasian. Studies were conducted in the USA (45.8%), Europe (41.7%), and Brazil (12.5%). Approximately half of the studies (45.8%) included participants with clinical conditions. The most common clinical condition was depression (k = 4). Other clinical conditions included cancer/life-threatening diseases with comorbid anxiety and/or depression (k = 3), alcohol dependence (k = 1), smoking (k = 1), and AIDS (k = 1).
Risk of bias within studies
Risk of bias varied, often based on whether a single-group design was used (online Supplementary Table 4). Single-group designs lacked randomization and other features (e.g. blinding) that increase confidence that effects are associated with the active treatment. Risk of bias also varied across domains (Fig. 2). Blinding of participants and personnel and blinding of outcome assessment were the domains most at risk for bias. Selective reporting bias was commonly rated as unclear due to difficulty in determining whether the reported outcomes were planned.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210114085507226-0963:S003329172000389X:S003329172000389X_fig2.png?pub-status=live)
Fig. 2. Cochrane risk of bias assessment. Random sequence, random sequence generation; Allocat Concealment, allocation concealment; Blinding Person/Partic, blinding of personnel and participants; Blind Outcome, blinding of outcome assessment; Attrition bias, incomplete outcome data.
Results of individual studies
Effect size-level data are reported by study, domain, timepoint, and design in online Supplementary Table 5. The outcome measures included across studies are listed in online Supplementary Table 6 along with their corresponding domain.
Synthesis of results
Adverse effects
Adverse effects were available for 79.2% of studies (online Supplementary Table 3). Among those reporting adverse effects, none reported serious adverse effects (e.g. death, hospitalization). Commonly reported transient adverse effects included headache, anxiety, nausea, and increased blood pressure.
Several studies (29.2%) also included measures of longer-term adverse effects that could be used to quantify the magnitude of these effects (e.g. psychotic symptoms, mania, persisting negative effects; see online Supplementary Table 6). There was no evidence that psychedelics increased risk for adverse effects. In fact, within-group effects suggested decreased adverse effects at post-treatment and follow-up (gs = 0.40 and 0.50, respectively; Table 2). As noted above, a positive effect size indicates a reduction in adverse effects. Heterogeneity was low for within-group pre-post comparisons but moderate to high for within-group pre-follow-up and between-group comparisons.
Table 2. Meta-analytic estimates of effects of classical psychedelics across outcome domains
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210114085507226-0963:S003329172000389X:S003329172000389X_tab2.png?pub-status=live)
N, sample size; K, number of studies; CI, confidence interval; k imp, number of studies imputed for trim-and-fill adjustment; ESadj, trim-and-fill adjusted effect size; FSN, fail-safe N; Targeted sx, targeted symptoms within psychiatric samples; Depression, depression outcomes restricted to samples with depression; Neg, negative; Pos, positive; Exist/spirit, existential/spiritual; ES, effect size in Hedges' g units; FU, follow-up; NA, not available. Estimates based on k = 1 not included.
a Statistically significant result that was not robust to publication bias based on Rosenberg's (2005) guidelines (i.e. did not meet the criterion of fail-safe N > 5n + 10, where n = number of published studies).
Within-group effects
Psychedelics showed statistically significant within-group improvements across several outcome domains at both post-treatment and follow-up (Table 2, Fig. 3). Domains showing beneficial effects included targeted symptoms within psychiatric samples, depression within samples with depression, negative affect, positive affect, social outcomes, and existential/spiritual outcomes. Associated effect sizes ranged from gs = 0.44 (positive affect) to 2.06 (depression) and were fairly similar in magnitude at post-treatment and follow-up. Psychedelics showed improvements in behavior and mindfulness at post-treatment, although estimates were not available at follow-up. Psychedelics were not associated with changes in big five personality dimensions, with the exception of openness, which showed a small increase. Heterogeneity was generally high (I² > 50%).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20210114085507226-0963:S003329172000389X:S003329172000389X_fig3.png?pub-status=live)
Fig. 3. Forest plots displaying effects of classical psychedelics across psychological outcome domains. Each point represents an effect size estimate (Hedges' g units) and a corresponding 95% confidence interval. Targeted sx, targeted symptoms within psychiatric samples; Depression, depression outcomes restricted to samples with depression; Neg, negative; Pos, positive; Exist/spirit, existential/spiritual; ES, effect size in Hedges' g units; FU, follow-up; NA, not available.
Between-group effects
Moderate to large and statistically significant between-group effects favored psychedelics relative to placebo controls across several outcome domains at longest follow-up. These included targeted symptoms within psychiatric samples, negative affect, positive affect, social outcomes, behavior, and existential/spiritual outcomes. Effect sizes ranged from gs = 0.84 to 1.16. There was no evidence of between-group effects on personality. Heterogeneity was generally high (I² > 50%).
Risk of bias across studies
There was evidence of funnel plot asymmetry (i.e. publication bias) in eight models (Table 2). Statistical significance was not impacted by the trim-and-fill adjustment, with one exception [within-group pre-post effect on social outcomes, which became non-significant, g = 0.43 (−0.10, 0.97)]. Fail-safe Ns ranged from 0 to 803. Based on Rosenberg's (2005) guidelines (i.e. fail-safe N > 5n + 10, where n = number of published studies), within-group effects on adverse effects, social outcomes, openness, and mindfulness as well as between-group effects on behavior were not robust against publication bias.
Additional analyses
Due to insufficient studies, not all moderators could be tested for all models (see online Supplementary Table 7). Clinical samples were associated with larger improvements for some comparisons in the domains of negative affect, positive affect, adverse effects, existential/spiritual outcomes, and extraversion. Psychedelic type did not moderate effects, with the exception of within-group pre-post effects on mindfulness for which psilocybin produced larger increases. Presence of behavioral support did not moderate effects. Percentage female did not moderate effects, with the exception of within-group pre-follow-up effects on extraversion for which higher percentage female was associated with smaller increases.
Models with outliers removed are reported in online Supplementary Table 8. No significance tests changed as a result of this and effect sizes were similar in magnitude (change in g ⩽ 0.26).
Discussion
To our knowledge, this is the first comprehensive meta-analysis of experimental studies testing the post-acute effects of psychedelics.Footnote 1 Although based on a relatively small number of studies and participants (k = 34 studies and 24 unique samples, n = 549), results suggest psychedelics may produce beneficial effects. Most relevant for psychiatric samples, large and statistically significant effects were detected for targeted symptoms (g = 1.08) when psychedelics were compared with placebo controls in RCTs. As points of comparison, this effect is on par or larger than that achieved by psychotherapy relative to waitlist (e.g. d = 0.80; Wampold and Imel, Reference Wampold and Imel2015) and antidepressants relative to placebo (e.g. ds = 0.42 to 0.17; Cipriani et al., Reference Cipriani, Furukawa, Salanti, Chaimani, Atkinson, Ogawa and Geddes2018). Moreover, this effect appears robust to publication bias and not influenced by outliers. Psychedelics also compared favorably with placebo controls on measures related to negative and positive affect; on measures of social, behavior, and existential/spiritual outcomes; and on depression in samples with depression (although effect on behavior was not robust to fail-safe N). The superiority over placebo controls supports the possibility of specific effects, however this conclusion is necessarily uncertain given difficultly blinding psychedelics. Within-group effects were similar in magnitude and statistical significance, and support the notion that beneficial effects may persist at follow-up. Although adverse effects were not available for 20.8% studies, effects reported were transient and no serious adverse events occurred. Quantitative assessment of longer-term adverse effects similarly suggests that transient psychological effects do not typically remain elevated during the post-acute period and may even reduce in some instances. 
Evidence supporting the effects of psychedelics on personality and mindfulness was less compelling and less robust to tests of publication bias.
Due to the limited number of studies and variation across studies in design features, we were limited in our ability to test moderators. Nonetheless, it appears that some effects may be larger for clinical samples. Psychedelic type, presence of behavioral support, and percentage female generally did not moderate effects, although confounding with other design characteristics (e.g. amount of behavioral support, clinical sample) makes these null findings tenuous. It does appear that moderate to large reductions in psychiatric symptoms have been achieved in studies testing psilocybin with relatively little behavioral support (e.g. one to three sessions; Carhart-Harris et al., 2018a, b; Grob et al., 2011). Future clinical trials and meta-analyses should clarify the requisite dosage of behavioral support.
Although the most comprehensive quantitative review to date, our study remained limited in sample size and associated statistical power. Indeed, the sample available in the entire literature reviewed (n = 549) is considerably smaller than that from large-scale RCTs (e.g. n = 952 in Project MATCH; Project MATCH Research Group, 1998). This highlights the inherent uncertainty in conclusions drawn. An additional complication is the degree to which generalizations can be made from the individuals who chose to participate in the available experimental studies, given that psychedelics remain Schedule I substances in most study locations. While selection bias may have produced inflated effect size estimates (e.g. selecting individuals most open to the possibility of change through psychedelic treatments, higher expectancy), some studies included healthy controls with previous use of psychedelics, which could have created ceiling effects (i.e. therapeutic effects were achieved at baseline through prior use). A relatively modest amount of racial/ethnic diversity and a lack of reporting on sample race/ethnicity in the available studies are further important limitations that must be addressed (Michaels, Purdon, Collins, & Williams, 2018). While we attempted to aggregate effects in conceptually coherent ways, there remained methodological heterogeneity (e.g. psychedelic dose, provision of behavioral support) that was either not modeled or tested in underpowered ways. This makes it impossible to provide recommendations regarding the specific treatment characteristics most strongly linked to beneficial effects. Similarly, although results generally did not change when accounting for publication bias, trim-and-fill analyses were also likely underpowered.
A broader, potentially more pernicious limitation is risk of bias within the available studies. As noted, obviously psychoactive substances may be particularly difficult to adequately double-blind. However, several studies included features that may increase the strength of the placebo condition (e.g. using methylphenidate or other psychoactive agents, making specific treatment conditions and study aims ambiguous; Griffiths et al., 2006). Two potential sources of bias that would be relatively straightforward to address are risks associated with attrition and selective reporting. None of the included studies explicitly used an intention-to-treat analysis, although this would be a straightforward way to address attrition bias. Of note, studies rated here as low on attrition bias generally had no attrition. Selective reporting could be reduced through more consistent pre-registration of study hypotheses. While several included studies were pre-registered (e.g. clinicaltrials.gov), many were not, making it difficult to ascertain the degree to which the reported outcomes were specified a priori v. drawn from a larger number of unpublished outcomes (i.e. increasing risk for opportunistic bias; DeCoster, Sparks, Sparks, Sparks, & Sparks, 2015). It did not appear that any of the included studies published their hypotheses using the Open Science Framework or similar platforms (e.g. AsPredicted.Org). While this is perhaps unsurprising given that these platforms are relatively new (Foster & Deardorff, 2017) and that some contemporary research on psychedelics has been exploratory in nature and may not have had a priori hypotheses, explicit pre-registration of study hypotheses and analysis plans could help reduce selective reporting bias and increase confidence in this body of literature.
These limitations notwithstanding, the current study joins the two previous meta-analyses (Goldberg et al., 2020; Krebs & Johansen, 2012) suggesting that psychedelics are a class of substances worthy of further exploration (see Footnote 2). Careful, large-scale, placebo-controlled RCTs are especially needed to clarify the empirical status for specific clinical conditions (e.g. depression) as well as for non-clinical applications. Particularly promising applications may include the use of psilocybin for the treatment of anxiety and depression (Goldberg et al., 2020), although ayahuasca and LSD may also prove beneficial for these indications. While based on only one study each in the contemporary period, the use of psilocybin for smoking cessation and LSD for alcohol use are also promising avenues for future exploration, given the prevalence, health burden, and recalcitrance associated with both nicotine and alcohol use disorders. Future studies could pursue the pairing of psychedelics with behavioral interventions and non-psychotherapeutic approaches (e.g. meditation retreats; Smigielski et al., 2019) to enhance well-being and support flourishing in both clinical and non-clinical samples.
However, it is crucial that future work investigating clinical and non-clinical applications of psychedelics carefully evaluate adverse effects. While we found no clear evidence of persistent adverse effects, many of the included studies excluded individuals with personal or family histories of psychiatric conditions (e.g. bipolar disorder, psychotic disorders). Future studies using alternative designs (e.g. naturalistic and population-based surveys, case reports); extending long-term follow-up to measure protracted effects and naturalistic use in trial participants; and examining safety in previously excluded samples (e.g. contraindicated family histories; personality disorder) may help clarify potential risks.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S003329172000389X.
Acknowledgements
The authors are grateful to Drs Harriet de Wit, Frederick Barrett, Gitte Knudsen, and Rafael dos Santos for sharing data from their studies. They are grateful to Dr Brian Pace for his comments on study design.
Financial support
Research reported in this publication was supported by the National Center for Complementary & Integrative Health of the National Institutes of Health under Award Number K23AT010879 (SG). The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.
Conflict of interest
In the prior 12 months, Charles L. Raison has served as a consultant for Usona Institute, Alkermes and Shire. All other authors declare that there is no conflict of interest.