
Citation bias and selective focus on positive findings in the literature on the serotonin transporter gene (5-HTTLPR), life stress and depression

Published online by Cambridge University Press:  12 August 2016

Y. A. de Vries*
Affiliation:
Department of Psychiatry, Interdisciplinary Center Psychopathology and Emotion Regulation, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
A. M. Roest
Affiliation:
Department of Psychiatry, Interdisciplinary Center Psychopathology and Emotion Regulation, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
M. Franzen
Affiliation:
Department of Psychology, University of Groningen, Groningen, The Netherlands
M. R. Munafò
Affiliation:
MRC Integrative Epidemiology Unit (IEU) at the University of Bristol, Bristol, UK; UK Centre for Tobacco and Alcohol Studies, School of Experimental Psychology, University of Bristol, Bristol, UK
J. A. Bastiaansen
Affiliation:
Department of Psychiatry, Interdisciplinary Center Psychopathology and Emotion Regulation, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands
*Address for correspondence: Y. A. de Vries, M.Sc., Department of Psychiatry, University Medical Center Groningen, Hanzeplein 1, 9713 GZ Groningen, The Netherlands. (Email: y.a.de.vries@umcg.nl)

Abstract

Background

Caspi et al.'s 2003 report that 5-HTTLPR genotype moderates the influence of life stress on depression has been highly influential but remains contentious. We examined whether the evidence base for the 5-HTTLPR–stress interaction has been distorted by citation bias and a selective focus on positive findings.

Method

A total of 73 primary studies were coded for study outcomes and focus on positive findings in the abstract. Citation rates were compared between studies with positive and negative results, both within this network of primary studies and in Web of Science. In addition, the impact of focus on citation rates was examined.

Results

In all, 24 (33%) studies were coded as positive, but these received 48% of within-network and 68% of Web of Science citations. The 38 (52%) negative studies received 42 and 23% of citations, respectively, while the 11 (15%) unclear studies received 10 and 9%. Of the negative studies, the 16 studies without a positive focus (42%) received 47% of within-network citations and 32% of Web of Science citations, while the 13 (34%) studies with a positive focus received 39 and 51%, respectively, and the nine (24%) studies with a partially positive focus received 14 and 17%.

Conclusions

Negative studies received fewer citations than positive studies. Furthermore, over half of the negative studies had a (partially) positive focus, and Web of Science citation rates were higher for these studies. Thus, discussion of the 5-HTTLPR–stress interaction is more positive than warranted. This study exemplifies how evidence-base-distorting mechanisms undermine the authenticity of research findings.

Type
Original Articles
Copyright
Copyright © Cambridge University Press 2016 

Introduction

Major depressive disorder (MDD) is a complex illness, caused by a combination of genetic and environmental risk factors (Sullivan et al. 2000). One of the most robust risk factors for MDD is the experience of a stressor, such as a stressful life event or childhood abuse (Hammen, 2005). However, many who experience such a stressor do not develop depression. This individual variability has been suggested to be due, at least in part, to genetic variation (Caspi et al. 2010).

In 2003, Caspi et al. reported that a polymorphism in the serotonin transporter gene (5-HTTLPR) moderates the relationship between life stress and depression: while carriers of at least one short (S) allele had a similar risk of depression as people homozygous for the long (L) allele in the absence of stress, S carriers were up to twice as likely to develop depression after stressful life events or childhood abuse (Caspi et al. 2003). This study has since been highly cited (>3800 times, Web of Science, October 2015) and has become the seminal finding within the burgeoning field of gene–environment interactions (G × E). However, this finding also remains highly contentious. Even meta-analyses on this topic contradict each other, with some finding evidence of an effect (Karg et al. 2011; Sharpley et al. 2014), while others do not (Munafò et al. 2009; Risch et al. 2009).

Many issues complicate the interpretation of G × E findings and replications, such as publication bias (Duncan & Keller, 2011) and analytical flexibility (Zammit et al. 2010; Simmons et al. 2011), which increases the chance of false-positives due to the multitude of analyses performed (Heininga et al. 2015). The likelihood of false-positives is further increased by low power and by the low prior probability of associations in candidate gene studies (Duncan & Keller, 2011). Although replication has been suggested as the solution to false-positive findings (De Jonge et al. 2011), many G × E studies are imprecise replications of the original finding, and a loose definition of replication may still permit propagation of false-positives (Sullivan, 2007).

Additionally, researchers may emphasize positive findings while downplaying negative findings. Within the randomized controlled trial literature, reporting strategies that focus on positive (secondary) findings in spite of non-significant results for the primary outcome, and that may thereby distort the interpretation of results, are defined as ‘spin’, whether they are intentional or unintentional (Boutron et al. 2010; Roest et al. 2015). A focus on positive findings has also been demonstrated in observational studies (Park et al. 2014). As a consequence, the published literature on a topic may appear more convincing than is justified by the strength of the evidence.

Selective citation may also affect the quality of the evidence base (Greenberg, 2009). Statistically significant (positive) studies are cited more frequently than non-significant (negative) studies (Kjaergard & Gluud, 2002; Nieminen et al. 2007; Etter & Stapleton, 2009; Jannot et al. 2013), which may render non-supportive studies relatively invisible. Citation bias and focus on positive findings can also work synergistically to hide negative results from view. A previous examination of citation patterns on a related topic, that of 5-HTTLPR and amygdala activation (Bastiaansen et al. 2015), showed that negative studies that had been spun were cited at a similar rate as positive studies, while negative studies that had not been spun received almost no citations. The resulting invisibility of negative findings may create the impression that this effect has been proven beyond doubt, although meta-analyses have questioned its robustness (Murphy et al. 2013; Bastiaansen et al. 2014).

In the current study, we aimed to determine whether citation bias and selective focus on positive findings are also present in the literature on 5-HTTLPR, life stress and depression. Achieving a better understanding of the aetiology of depression is of vital importance to psychiatry, given the high burden of depression (Whiteford et al. 2013). Distortion of the evidence base could mislead researchers and clinicians and thus pose a major obstacle to this goal.

Method

Study selection

To establish the network of primary studies, we searched PubMed for the most recent meta-analysis on 5-HTTLPR, stress and depression (Sharpley et al. 2014), which included 81 studies. For each study, we determined the outcome for the effect of interest (i.e. 5-HTTLPR × stress). We included studies with continuous outcomes (e.g. score on a depression questionnaire), as well as studies with binary outcomes (depression diagnosis). We excluded studies in which the outcome was clearly a different construct from depression (e.g. cognitive dysfunction). Studies were included regardless of whether the 5-HTTLPR × stress interaction effect on depression was the primary outcome. No exclusion criteria were applied for stressors, which were very diverse.

Coding study outcomes

Coding was done in duplicate by two independent raters (Y.A.d.V. and M.F.), and disagreements were resolved by discussion with A.M.R. and J.A.B. Study outcome was coded as positive, negative or unclear. We coded a study outcome as unclear if we could not determine whether the 5-HTTLPR × stress interaction was significant, for instance because only the p value associated with a three-way interaction (e.g. 5-HTTLPR × stress × gender) was presented. Study outcome was coded as positive if the extracted p value was <0.05, provided that the interaction was in the expected direction (i.e. S allele associated with increased depression), and as negative otherwise.

p Values were extracted according to a hierarchical decision tree. We first determined whether the design of the study was ‘exposed-only’. In these studies, the entire sample was exposed to a stressor, such as a somatic illness. The effect of interest, in this case, is not an interaction but the main effect of 5-HTTLPR. Hence, we extracted the p value associated with the main effect for these studies. For all other studies, we determined whether a p value was reported for a two-way interaction between 5-HTTLPR and stress, consistent with Caspi et al. (2003).

If multiple relevant, independent outcomes or stressors were included in a study, we extracted all p values. Following Sharpley et al. (2014), we averaged these p values to arrive at a conclusion. If multiple non-independent outcomes were given (e.g. a continuous symptom scale and a dichotomized version thereof), we only included the continuous outcome. When studies provided p values for both biallelic and triallelic genotyping, we erred towards coding a study as positive by selecting the smallest p value, as it is unclear which genotyping approach should be preferred (Hu et al. 2005; Wendland et al. 2006; Martin et al. 2007). If both unadjusted and adjusted analyses were given, we also used the smallest p value. We preferentially extracted the p value of an overall test of interaction; however, if only post-hoc comparisons were available, we extracted the p value associated with the SS v. LL homozygotes comparison.
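The outcome-coding rule described above can be illustrated with a minimal sketch. This is not the authors' procedure as implemented (coding was done by hand by two raters); the function name, the use of None for a p value that was not reported numerically, and the rule that any such value makes the study unclear are simplifying assumptions made for illustration only.

```python
from statistics import mean
from typing import Optional, Sequence

ALPHA = 0.05  # significance threshold named in the text


def code_outcome(
    p_values: Sequence[Optional[float]],
    expected_direction: bool = True,
    use_smallest: bool = False,
) -> str:
    """Return 'positive', 'negative' or 'unclear' for one study.

    p_values holds one p value per independent outcome or stressor.
    None marks a value that was not reported numerically (an assumed
    convention, e.g. a result only described as 'non-significant').
    """
    # Simplifying assumption: any missing value makes the study unclear.
    if not p_values or any(p is None for p in p_values):
        return "unclear"
    # Main analysis: average the p values; sensitivity analysis: smallest.
    summary = min(p_values) if use_smallest else mean(p_values)
    if summary < ALPHA and expected_direction:
        return "positive"
    return "negative"


# Hypothetical study with two independent stressors.
print(code_outcome([0.03, 0.20]))                     # averaged p = 0.115
print(code_outcome([0.03, 0.20], use_smallest=True))  # smallest p = 0.03
```

The use_smallest flag mirrors the sensitivity analysis in which studies were recoded on the smallest rather than the averaged p value.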

Coding study abstracts

Two independent raters (Y.A.d.V. and M.F.) coded the abstract of each study, and discrepancies were resolved by discussion with A.M.R. and J.A.B. Abstracts were preferentially coded based upon their conclusions, but if these did not provide a clear statement, we used the results section of the abstract. In coding abstracts, we were interested in the way abstracts reported on how their findings reflected on the original result by Caspi et al. (2003). Abstracts were coded as positive if a claim was made that the results supported the existence and/or importance of the 5-HTTLPR × stress interaction. Abstracts were coded as partially supportive if a positive claim was made that was not directly related to the 5-HTTLPR × stress interaction (e.g. positive findings for a three-way interaction) or if the abstract mentioned findings for multiple outcomes or stressors and not all were positive. Abstracts that did not make a positive claim or that made an explicitly negative claim were coded as negative. If the abstract did not report on the effect of interest, the study was excluded (two studies).

Citation outcomes

We examined citations both within the network of primary studies and outside of the network in the broader literature (Bastiaansen et al. 2015). To examine within-network citations, we constructed a citation grid and marked for each study by which of the other included studies it was cited. Total citation counts for each study were calculated from the grid. To examine out-of-network citations, we looked up the citation counts for each study on Web of Science (Core Collection, October 2015). To create non-overlapping outcomes, we pruned the within-network citations from the Web of Science citations. While within-network citations represent citations by other experts working within the 5-HTTLPR × stress field, Web of Science citations also include citations by researchers not directly involved in this area.
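As a rough illustration of the citation grid and the pruning step, the sketch below tallies within-network citations from a mapping of citing study to cited studies and subtracts them from the Web of Science counts. The study identifiers and counts are hypothetical, and the subtraction assumes that every within-network citation is also indexed in Web of Science, which the text does not state.

```python
from typing import Dict, Set


def within_network_counts(cites: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, for each study, how many other network studies cite it.

    cites maps each study in the network to the set of network
    studies that appear in its reference list.
    """
    counts = {study: 0 for study in cites}
    for citing, cited_set in cites.items():
        for cited in cited_set:
            # Ignore self-citations and references outside the network.
            if cited in counts and cited != citing:
                counts[cited] += 1
    return counts


def prune(wos: Dict[str, int], network: Dict[str, int]) -> Dict[str, int]:
    """Remove within-network citations from Web of Science counts."""
    return {study: wos[study] - network.get(study, 0) for study in wos}


# Hypothetical three-study network, oldest study first.
grid = {"A": set(), "B": {"A"}, "C": {"A", "B"}}
network = within_network_counts(grid)
print(network)
print(prune({"A": 10, "B": 5, "C": 1}, network))
```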

Analyses

For our citation analysis, we first compared the citations received by studies with positive, negative or unclear outcomes (irrespective of abstract coding). The sum of citations was calculated and the percentage of all citations received by studies with a given outcome was determined. In examining within-network citations, we excluded the most recent study, as it could not have been cited within the network. We also examined the study by Caspi et al. (2003) separately, as we expected it to receive many citations.

To determine whether a (selective) focus on positive findings was present, we examined the number of negative studies with a negative abstract (studies without a positive focus), a partially supportive abstract (studies with a partially positive focus), or a positive abstract (studies with a positive focus). We then examined the impact of focus on citation rates by calculating the percentage of all citations to negative studies received by each type of negative study.

Within the network, we also examined whether positive studies, negative studies without a positive focus, and negative studies with a (partially) positive focus showed different citation patterns, that is, whether positive studies were more likely to cite other positive studies and negative studies more likely to cite other negative studies.

We performed several sensitivity analyses. First, since older studies have had more opportunities to be cited, we re-examined citation rates based on measures taking into account publication year. For within-network citations, the percentage of subsequent studies citing a given study was calculated; for out-of-network citations, the yearly citation rate was calculated. Second, as the distribution of citations is right-skewed, we examined the median number of citations to each study type. Third, we recoded the outcome for studies with multiple relevant p values based upon the smallest p value. As it is often unclear what should be considered the primary outcome, we used average p values in our main analysis; however, in some cases the smallest p value may have been associated with the outcome considered most important by the authors, which is why we performed this sensitivity analysis.
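The two age-adjusted measures can be sketched as follows. The publication ordering, the example counts, and the way elapsed years are counted (census year minus publication year, with 2015 taken from the Web of Science lookup date) are assumptions for illustration; the text does not specify how partial years were handled.

```python
from statistics import median
from typing import Dict, List, Optional, Set


def pct_subsequent_citing(
    study: str, pub_order: List[str], cites: Dict[str, Set[str]]
) -> Optional[float]:
    """Percentage of later network studies that cite the given study.

    pub_order lists the network studies from oldest to newest.
    Returns None for the most recent study, which has no successors.
    """
    later = pub_order[pub_order.index(study) + 1:]
    if not later:
        return None
    citing = sum(1 for s in later if study in cites.get(s, set()))
    return 100.0 * citing / len(later)


def yearly_rate(
    citations: int, pub_year: int, census_year: int = 2015
) -> Optional[float]:
    """Citations per year since publication (assumed whole-year counting)."""
    years = census_year - pub_year
    return citations / years if years > 0 else None


# Hypothetical network: B cites A, C cites only B.
order = ["A", "B", "C"]
grid = {"A": set(), "B": {"A"}, "C": {"B"}}
print(pct_subsequent_citing("A", order, grid))  # cited by 1 of 2 later studies
print(yearly_rate(120, 2003))                   # 120 citations over 12 years
print(median([1, 2, 50]))                       # robust to the skewed tail
```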

Since the included studies form the total population of studies on the effect of interest, we used descriptive analyses rather than statistical tests, which are designed to generalize from a sample to a hypothetical larger population (Bastiaansen et al. 2015).

Results

Coding of studies and abstracts

We excluded 10 of the 81 studies in Sharpley et al. (2014): eight studies were excluded because the outcome was not depression-related, no stressor was included, or the entire sample was depressed; one study was excluded because the abstract did not report on 5-HTTLPR; and one study was excluded because the abstract did not report on the depression outcome. Furthermore, we included two additional studies that had been excluded from the meta-analysis because the sample was a subset of those included in a later study (see flow chart in online Supplementary material). Consequently, we included 73 studies, of which 24 studies were coded as positive, 38 studies as negative, and 11 studies as unclear in terms of outcome. Of the 11 unclear studies, four studies were coded as unclear because of the inclusion of three-way interactions in the model (e.g. with social support), while another study was coded as unclear because the 5-HTTLPR × stress interaction was only tested in males and females separately. Four studies were coded as unclear because the 5-HTTLPR × stress interaction was not tested (e.g. only the main effect of 5-HTTLPR in the different stress groups was tested). Finally, two studies were coded as unclear because we could not determine whether the (averaged) p value was <0.05, as one p value was given as ‘non-significant’ while another was <0.05. Inter-rater agreement was moderate (κ = 0.49). Our agreement with Sharpley et al. (2014) was good: within the subset of studies included in both Sharpley et al. (2014) and our own paper and that we coded as positive or negative (rather than unclear), the percentage of positive studies was 38% (23 out of 60) by both our coding and Sharpley's coding; coding was identical for 54 out of 60 (90%) papers.

Of the 73 studies, we coded 40 abstracts as positive, 16 abstracts as negative and 17 abstracts as partially supportive. Inter-rater agreement for abstract coding was good (κ = 0.71). A full table of studies with characteristics and coding is given in the online Supplementary material.
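For readers unfamiliar with the κ statistic reported for inter-rater agreement, the following is a minimal sketch of Cohen's kappa for two raters. The ratings shown are hypothetical and are not the study's data; the paper does not state which software was used to compute κ.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa: agreement between two raters corrected for chance."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # degenerate case: both raters used a single category
    return (observed - expected) / (1.0 - expected)


# Hypothetical codings of four studies by two raters.
a = ["positive", "positive", "negative", "negative"]
b = ["positive", "negative", "negative", "negative"]
print(cohens_kappa(a, b))
```

In this toy example the raters agree on three of four studies (75%), but half of that agreement is expected by chance, which is why κ is lower than the raw percentage agreement.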

Citations by study outcome

Fig. 1 shows the percentage of citations to positive, negative and unclear studies (outer circle) compared with the percentage of studies of each type (inner circle).

Fig. 1. Percentage of citations received by positive, negative and unclear studies. The inner ring indicates the percentage of studies of each type. The outer ring indicates the percentage of total citations received by studies with positive, negative or unclear outcomes.

The total number of citations was 488 within the network and 9166 on Web of Science. Positive studies, comprising 33% of all studies, received 236 (48%) within-network citations and 6189 (68%) Web of Science citations. Negative studies (52% of all studies) received 205 (42%) within-network citations and 2115 (23%) Web of Science citations, while unclear studies (15% of all studies) received 47 (10%) within-network citations and 862 (9%) Web of Science citations. The study by Caspi et al. (2003) received a large share of the citations to positive studies, particularly in Web of Science. However, even after exclusion of this study, positive studies still received 40% of within-network and 45% of Web of Science citations, as compared with 48 and 39%, respectively, for negative studies.
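The percentages above follow directly from the reported counts. The short sketch below recomputes them from the numbers given in this paragraph and in the coding results; only the helper name is an assumption.

```python
from typing import Dict

# Counts reported in the text.
studies = {"positive": 24, "negative": 38, "unclear": 11}
within_network = {"positive": 236, "negative": 205, "unclear": 47}
web_of_science = {"positive": 6189, "negative": 2115, "unclear": 862}


def shares(counts: Dict[str, int]) -> Dict[str, int]:
    """Percentage of the total held by each category, rounded to integers."""
    total = sum(counts.values())
    return {k: round(100 * v / total) for k, v in counts.items()}


print(shares(studies))         # share of studies by outcome
print(shares(within_network))  # share of within-network citations
print(shares(web_of_science))  # share of Web of Science citations
```

Comparing the first line of output with the other two shows the mismatch the paper describes: positive studies make up about a third of the studies but receive a much larger share of the citations.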

On average, negative studies received 5.5 (s.d. = 9.3) within-network citations, while unclear studies received 4.3 (s.d. = 6.2) and positive studies received 9.8 (s.d. = 14.6). Positive studies other than Caspi et al. (2003) received 7.4 (s.d. = 8.9) within-network citations on average. For Web of Science, negative studies received, on average, 55.7 (s.d. = 72.4) citations, while unclear studies received 78.4 (s.d. = 61.8) and positive studies received 257.9 (s.d. = 765.7) citations. Positive studies other than Caspi et al. (2003) received 103.8 (s.d. = 132.4) citations on average.

Presence of positive focus in abstracts

Fig. 2 depicts the presence of a positive focus within the set of studies. Of the 24 positive studies, 21 (88%) abstracts were positive and three (13%) abstracts were partially supportive. These partially supportive abstracts focused on gender differences (two abstracts) or on a three-way interaction (one abstract). Of the 11 unclear studies, five (45%) abstracts were partially supportive and six (55%) abstracts were positive. Of the 38 negative studies, 16 (42%) abstracts were negative, nine (24%) abstracts were partially supportive, and 13 (34%) abstracts were positive (see online Supplementary Table S1 for a list of these 13 studies, with the relevant sentence(s) from the abstract). Thus, 22 out of 38 (58%) negative studies had a (partially) positive focus.

Fig. 2. Abstract coding by study outcome. The categories on the x-axis represent the outcome of the study, while the different sections of the bars indicate the abstract coding.

Effect of focus on citation

Fig. 3 shows the distribution of citations (outer circle) by the presence of a positive focus (inner circle) in negative studies. Studies without a positive focus, which comprised 42% of all negative studies, received 97 (47%) out of 205 within-network citations and 679 (32%) out of 2115 Web of Science citations to negative studies. Studies with a partially positive focus (24% of all negative studies) received 28 (14%) within-network citations and 367 (17%) Web of Science citations, while studies with a positive focus (34% of all negative studies) received 80 (39%) within-network citations and 1069 (51%) Web of Science citations.

Fig. 3. Percentage of citations received by negative studies without a positive focus, with a partially positive focus and with a positive focus. The inner ring indicates the percentage of studies of each type. The outer ring indicates the percentage of total citations received by studies of each type.

On average, a negative study without a positive focus received 6.1 (s.d. = 9.5) citations within the network, while a study with a partially positive focus received 3.1 (s.d. = 5.9) citations and a study with a positive focus received 6.7 (s.d. = 11.3) citations. For Web of Science, a study without a positive focus received 42.4 (s.d. = 44.8) citations on average, while a study with a partially positive focus received 40.8 (s.d. = 40.8) citations and a study with a positive focus received 82.2 (s.d. = 106.8) citations.

Citation patterns by study category

Within the network, both positive and negative studies showed preferential citation of positive studies. Although only 33% of all studies were positive, 55% of citations made by positive studies were to other positive studies, as were 45% of citations made by negative studies. Only negative studies without a positive focus (22% of all studies) additionally showed increased citation of other negative studies without a positive focus, allocating 30% of citations to these studies (online Supplementary Table S2).
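One way to tabulate such citation patterns is to count, for each category of citing study, how its outgoing within-network citations are distributed over the categories of cited studies. The sketch below uses hypothetical study identifiers and categories and is an illustration of the idea, not the authors' tabulation (which is reported in the online Supplementary material).

```python
from typing import Dict, Set


def citation_pattern(
    cites: Dict[str, Set[str]], category: Dict[str, str]
) -> Dict[str, Dict[str, float]]:
    """Percentage of citations made by each category to each category."""
    tallies: Dict[str, Dict[str, int]] = {}
    for citing, cited_set in cites.items():
        row = tallies.setdefault(category[citing], {})
        for cited in cited_set:
            row[category[cited]] = row.get(category[cited], 0) + 1
    # Convert raw counts to row percentages; skip categories that cite nothing.
    return {
        src: {dst: 100.0 * n / sum(row.values()) for dst, n in row.items()}
        for src, row in tallies.items()
        if row
    }


# Hypothetical four-study network.
category = {"A": "positive", "B": "negative", "C": "positive", "D": "negative"}
grid = {"A": set(), "B": {"A"}, "C": {"A", "B"}, "D": {"A", "B", "C"}}
print(citation_pattern(grid, category))
```

Each row of the result can then be compared with the share of studies in each category, as the text does when it contrasts 33% positive studies with the 55% and 45% of citations directed to them.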

Sensitivity analyses

Analyses examining the percentage of subsequent studies citing a study (within-network), the yearly Web of Science citation rate, or the median number of citations (rather than the mean) yielded similar results to our main analyses (online Supplementary Tables S3 and S4).

When we recoded studies based upon the smallest p value rather than the average p value, 10 negative studies and two unclear studies became positive. Of the smallest p values from these 12 studies, two were between 0.04 and 0.05, five were between 0.01 and 0.05, four were less than 0.01, and one was only given as <0.05. After recoding, 36 studies were positive, 28 studies were negative, and nine studies were unclear. The prevalence of a (partially) positive focus in the remaining negative studies decreased from 58 to 43% (12 out of 28). Recoding did not markedly affect citation patterns (see online Supplementary Figs S1 and S2).

Discussion

We examined citation patterns within the literature on 5-HTTLPR, life stress and depression. In line with previous research (e.g. Nieminen et al. 2007; Jannot et al. 2013), we found that positive studies received more citations than negative studies. This effect was present both within the network of primary studies and within the broader literature (as represented by Web of Science citations), but it was more pronounced within the broader literature. This more pronounced difference appeared to be largely driven by the study of Caspi et al. (2003), which was cited especially frequently, illustrating how such a premier finding may continue to exert considerable influence even as other studies accumulate. Excluding this study reduced, but did not eliminate, citation differences between positive and negative studies.

Furthermore, we found that a (partially) positive focus was present in the abstract of over half of the negative studies. Consequently, although the majority of studies (52%) were negative, these appeared to form a fairly small minority (22%), judging by the abstracts. A positive focus did not affect citation rates within the network, but it increased citation rates within the broader literature. This suggests that authors of other primary studies are not affected by a positive focus in abstracts. However, upon examining within-network citations to negative studies, we found that studies without a positive focus were overwhelmingly cited as negative (95%), while studies with a positive focus were usually cited as positive (56%) or partially supportive (38%), and only rarely as negative (6%). Thus, the positive focus was still propagated through these citations. Studies with a partially positive focus were actually cited less frequently than studies without a positive focus, particularly within the network. This may be because these studies, which often focused on three-way interactions, appear less relevant to the authors of primary studies on the two-way interaction itself.

Our results resemble those found previously for the literature on 5-HTTLPR and amygdala activation (Bastiaansen et al. 2015), although citation bias toward positive studies and in particular positive abstracts was more pronounced in the amygdala activation literature. This difference may be due to the controversy surrounding gene–environment interactions: both opponents and proponents may be more likely to cite negative studies when there is controversy, the former to cast doubt upon the value of gene–environment research, the latter to point out potential flaws in these negative studies. However, when we examined early citations (prior to 2010) and late citations separately, there was little evidence that citation bias toward positive studies has changed since the publication of critical meta-analyses in 2009 (Munafò et al. 2009; Risch et al. 2009), although there did seem to be a decrease in citation bias toward negative studies with a positive focus.

In this study, we extended the concept of spin, which originated within the clinical trial literature, to observational studies. Given the differences between observational studies and clinical trials, we use the term ‘positive focus’ instead of spin. Unlike clinical trials, which are usually narrowly focused on the efficacy of an intervention, observational studies tend to have more wide-ranging topics and often lack a clearly defined, a priori primary outcome. In this study, we specifically examined whether abstracts suggested that the results supported the 5-HTTLPR, life stress and depression hypothesis, although some studies had other (primary) hypotheses (e.g. three-way interactions). However, all studies were clearly inspired by Caspi et al. (2003) and have a bearing on the original finding. As discussed by Kapur et al. (2012), novel findings in biological psychiatry often become surrounded by a penumbra of subsequent studies with a multiplicity of measures and significant findings that are, at best, ‘approximate replications’. A finding thus appears to be supported, even though it has not been decisively replicated (or refuted) and even though some supportive findings may have been accompanied by negative findings on a more precise replication of the original finding. We therefore deemed it important to specifically investigate how papers report on their findings with respect to the original finding by Caspi et al. (2003).

Duncan & Keller (2011) have previously shown that negative replications of G × E findings were often published alongside positive findings. This tendency, which is distinct from, although related to, a focus on positive findings in the abstract, further illustrates that authors are inclined to present a positive message. The tendency for the hypothesis to expand, as reflected in the study of three- or even four-way interactions between 5-HTTLPR, life stress, and gender, other genes or environmental factors, may also be rooted, in part, in the search for positive findings. There is a consensus that negative results are difficult to publish, which is supported by the finding that the sample size of purely negative G × E studies was six times greater than that of positive studies (Duncan & Keller, 2011). Although cohort studies have not found a greater journal acceptance rate for positive papers compared with negative papers (Song et al. 2009), these studies often examined high-impact general medical journals, and authors may not submit negative studies that they judge to have little chance of acceptance to such journals. The perception that negative studies are unpublishable, as well as the conviction that the effect is real, may lead researchers to use motivated reasoning to justify presenting their findings in a positive light (without necessarily any conscious intention of doing so) (Nosek et al. 2012).

One of the strengths of our study is our examination of positive focus in abstracts and its influence on citation patterns, as the decision to cite a study and the manner of citation may be based on the abstract only. An additional strength is that we examined citations within the network of primary studies as well as in the broader literature, since authors of other primary studies are likely to have different citation motives from authors writing on a broader or different topic. We also corrected for differences in opportunity to be cited by looking at yearly rates and the percentage of studies citing a given study, which yielded similar results. Finally, we performed a sensitivity analysis based upon the smallest p values, when studies had multiple relevant stressors or outcomes. Using only the smallest p value accounts for studies in which the analysis considered most important by the authors is statistically significant, whereas other analyses are not. This lenient approach does not account for multiple testing, although many p values were not highly significant (only four out of 12 were smaller than 0.01). While this approach increased the proportion of positive studies, 43% of the remaining negative studies still had a (partially) positive focus in the abstract, and citation patterns were comparable, showing that the overall pattern remains the same even as some individual studies shift categories.
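The two opportunity-corrected measures and the lenient smallest-p coding described above can be illustrated with a minimal Python sketch. It is not the analysis code used in this study: the `Study` record, the function names, the toy network and the rule that only later-published studies count as eligible citers are all assumptions made for illustration, and the smallest-p coding shown here ignores the direction of the effect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Study:
    """Hypothetical record for one primary study (illustrative format only)."""
    key: str
    year: int
    cites: frozenset      # keys of primary studies cited by this study
    p_values: tuple = ()  # p values of the relevant interaction tests


def yearly_citation_rate(total_citations, pub_year, census_year):
    """Citations per year since publication; at least one year is counted."""
    years = max(census_year - pub_year, 1)
    return total_citations / years


def percent_citing(target, network):
    """Percentage of later-published studies in the network that cite `target`.

    Assumption: only studies published in a later year had the
    opportunity to cite the target.
    """
    eligible = [s for s in network if s.year > target.year]
    if not eligible:
        return 0.0
    citing = sum(1 for s in eligible if target.key in s.cites)
    return 100.0 * citing / len(eligible)


def positive_by_smallest_p(study, alpha=0.05):
    """Lenient coding: positive if the smallest reported p value is below alpha."""
    return bool(study.p_values) and min(study.p_values) < alpha


# Toy network with invented values, purely for illustration.
a = Study("A", 2004, frozenset(), (0.03, 0.40))
b = Study("B", 2006, frozenset({"A"}), (0.20,))
c = Study("C", 2008, frozenset({"A", "B"}), (0.06, 0.01))
d = Study("D", 2010, frozenset({"A"}), ())
network = [a, b, c, d]

rate_a = yearly_citation_rate(30, a.year, 2014)  # 30 citations over 10 years
share_a = percent_citing(a, network)             # cited by B, C and D
share_b = percent_citing(b, network)             # cited by C but not D
```

In this toy network, study B is cited by one of the two later studies, so its percentage is computed over those two studies only rather than over the whole network, which is the sense in which the measure corrects for opportunity to be cited.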

A limitation of our study is that the inter-rater agreement for coding study outcomes was only moderate. Although some disagreements were easily resolved, others reflect the opacity of some of the studies we included, which often included a multitude of stressors, outcomes, analyses and p values. Unfortunately, the G × E field is characterized by a proliferation of approaches, hampering easy interpretability and comparability. Pre-specification of a primary outcome and analytical approach, such as proposed in the protocol of a collaborative meta-analysis (Culverhouse et al. 2013), may help curb this proliferation and yield clear results.

A second limitation is that we did not incorporate meta-analyses, although citations are probably diverted from primary studies to meta-analyses once these are published. However, both the negative and positive meta-analyses in this field (Munafò et al. 2009; Risch et al. 2009; Karg et al. 2011) have been highly cited, suggesting that inclusion of meta-analyses would not undo the preferential citation of positive studies. Finally, we did not assess study quality. Arguably, high-quality studies should receive more citations, and it is possible, although not very likely (Duncan & Keller, 2011), that positive studies were of higher quality than negative studies.

Although we have examined a specific, highly prominent finding, selective focus on positive findings and citation bias are unlikely to be isolated problems, limited to this particular example. On the contrary, like other biases, they are probably widespread in many scientific disciplines. Our research therefore illustrates evidence-base-distorting mechanisms that may be at work in other areas as well. Consequently, our findings have broad implications. The frequent presence of positive conclusions in the abstracts of negative studies suggests that readers should endeavor to read the full study and personally assess its results whenever possible. Furthermore, researchers are well-advised to perform an independent search to obtain all relevant studies, as combing through reference lists may yield a disproportionate number of positive studies. Researchers should also be encouraged to cite all relevant studies, and peer reviewers may play a part in ensuring that relevant negative studies are cited and that abstracts provide an accurate and complete representation of the results.

Our study is not a meta-analysis and is not intended to provide a definitive answer to the question of whether 5-HTTLPR moderates the association between life stress and the development of depression. Instead, we examined whether there is a tendency within this literature to preferentially cite some studies over others. We have shown that positive studies receive a disproportionate amount of attention and that negative studies are frequently presented as positive, which distorts the apparent evidence base. In the G × E field, where individual studies often include a variety of analyses and p values, it is difficult for any reader to tell the forest from the trees. The presence of a selective focus on positive findings and citation bias further compounds this difficulty by hiding published negative results from view and rendering the ‘forest’ greener than it truly is.

Supplementary material

For supplementary material accompanying this paper visit http://dx.doi.org/10.1017/S0033291716000805

Acknowledgements

This research received no specific grant from any funding agency, commercial or not-for-profit sectors. M.R.M. is a member of the United Kingdom Centre for Tobacco and Alcohol Studies, a UK Clinical Research Collaboration (UKCRC) Public Health Research Centre of Excellence. Funding from the British Heart Foundation, Cancer Research UK, Economic and Social Research Council, Medical Research Council and the National Institute for Health Research, under the auspices of the UKCRC, is gratefully acknowledged.

Declaration of Interest

None.

References

Bastiaansen, JA, de Vries, YA, Munafò, MR (2015). Citation distortions in the literature on the serotonin-transporter-linked polymorphic region and amygdala activation. Biological Psychiatry 78, E35–E36.
Bastiaansen, JA, Servaas, MN, Marsman, JBC, Ormel, J, Nolte, IM, Riese, H, Aleman, A (2014). Filling the gap: relationship between the serotonin-transporter-linked polymorphic region and amygdala activation. Psychological Science 25, 2058–2066.
Boutron, I, Dutton, S, Ravaud, P, Altman, DG (2010). Reporting and interpretation of randomized controlled trials with statistically nonsignificant results for primary outcomes. JAMA 303, 2058–2064.
Caspi, A, Hariri, AR, Holmes, A, Uher, R, Moffitt, TE (2010). Genetic sensitivity to the environment: the case of the serotonin transporter gene and its implications for studying complex diseases and traits. American Journal of Psychiatry 167, 509–527.
Caspi, A, Sugden, K, Moffitt, TE, Taylor, A, Craig, IW, Harrington, HL, McClay, J, Mill, J, Martin, J, Braithwaite, AW, Poulton, R (2003). Influence of life stress on depression: moderation by a polymorphism in the 5-HTT gene. Science 301, 386–389.
Culverhouse, RC, Bowes, L, Breslau, N, Nurnberger, JI, Burmeister, M, Fergusson, DM, Munafò, MR, Saccone, NL, Bierut, LJ (2013). Protocol for a collaborative meta-analysis of 5-HTTLPR, stress, and depression. BMC Psychiatry 13, 304.
De Jonge, P, Conradi, HJ, Thombs, BD, Rosmalen, JGM, Burger, H, Ormel, J (2011). Prevention of false positive findings in observational studies: registration will not work but replication might. Journal of Epidemiology and Community Health 65, 95–96.
Duncan, L, Keller, M (2011). A critical review of the first 10 years of candidate gene-by-environment interaction research in psychiatry. American Journal of Psychiatry 168, 1041–1049.
Etter, J-F, Stapleton, J (2009). Citations to trials of nicotine replacement therapy were biased toward positive results and high-impact-factor journals. Journal of Clinical Epidemiology 62, 831–837.
Greenberg, SA (2009). How citation distortions create unfounded authority: analysis of a citation network. BMJ 339, b2680.
Hammen, C (2005). Stress and depression. Annual Review of Clinical Psychology 1, 293–319.
Heininga, VE, Oldehinkel, AJ, Veenstra, R, Nederhof, E (2015). I just ran a thousand analyses: benefits of multiple testing in understanding equivocal evidence on gene–environment interactions. PLOS ONE 10, e0125383.
Hu, X, Oroszi, G, Chun, J, Smith, TL, Goldman, D, Schuckit, MA (2005). An expanded evaluation of the relationship of four alleles to the level of response to alcohol and the alcoholism risk. Alcoholism: Clinical and Experimental Research 29, 8–16.
Jannot, A-S, Agoritsas, T, Gayet-Ageron, A, Perneger, TV (2013). Citation bias favoring statistically significant studies was present in medical research. Journal of Clinical Epidemiology 66, 296–301.
Kapur, S, Phillips, AG, Insel, TR (2012). Why has it taken so long for biological psychiatry to develop clinical tests and what to do about it? Molecular Psychiatry 17, 1174–1179.
Karg, K, Burmeister, M, Shedden, K, Sen, S (2011). The serotonin transporter promoter variant (5-HTTLPR), stress, and depression meta-analysis revisited: evidence of genetic moderation. Archives of General Psychiatry 68, 444–454.
Kjaergard, LL, Gluud, C (2002). Citation bias of hepato-biliary randomized clinical trials. Journal of Clinical Epidemiology 55, 407–410.
Martin, J, Cleak, J, Willis-Owen, SAG, Flint, J, Shifman, S (2007). Mapping regulatory variants for the serotonin transporter gene based on allelic expression imbalance. Molecular Psychiatry 12, 421–422.
Munafò, MR, Durrant, C, Lewis, G, Flint, J (2009). Gene x environment interactions at the serotonin transporter locus. Biological Psychiatry 65, 211–219.
Murphy, SE, Norbury, R, Godlewska, BR, Cowen, PJ, Mannie, ZM, Harmer, CJ, Munafò, MR (2013). The effect of the serotonin transporter polymorphism (5-HTTLPR) on amygdala function: a meta-analysis. Molecular Psychiatry 18, 512–520.
Nieminen, P, Rucker, G, Miettunen, J, Carpenter, J, Schumacher, M (2007). Statistically significant papers in psychiatry were cited more often than others. Journal of Clinical Epidemiology 60, 939–946.
Nosek, BA, Spies, JR, Motyl, M (2012). Scientific Utopia: II. Restructuring incentives and practices to promote truth over publishability. Perspectives on Psychological Science 7, 615–631.
Park, I-U, Peacey, MW, Munafò, MR (2014). Modelling the effects of subjective and objective decision making in scientific peer review. Nature 506, 93–96.
Risch, N, Herrell, R, Lehner, T, Liang, K, Eaves, L, Hoh, J, Griem, A, Kovacs, M, Ott, J, Merikangas, KR (2009). Interaction between the serotonin transporter gene (5-HTTLPR), stressful life events, and risk of depression. JAMA 301, 2462–2471.
Roest, AM, de Jonge, P, Williams, CD, de Vries, YA, Schoevers, RA, Turner, EH (2015). Reporting bias in clinical trials investigating the efficacy of second-generation antidepressants in the treatment of anxiety disorders: a report of 2 meta-analyses. JAMA Psychiatry 72, 500–510.
Sharpley, CF, Palanisamy, SKA, Glyde, NS, Dillingham, PW, Agnew, LL (2014). An update on the interaction between the serotonin transporter promoter variant (5-HTTLPR), stress and depression, plus an exploration of non-confirming findings. Behavioural Brain Research 273, 89–105.
Simmons, JP, Nelson, LD, Simonsohn, U (2011). False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science 22, 1359–1366.
Song, F, Parekh-Bhurke, S, Hooper, L, Loke, YK, Ryder, JJ, Sutton, AJ, Hing, CB, Harvey, I (2009). Extent of publication bias in different categories of research cohorts: a meta-analysis of empirical studies. BMC Medical Research Methodology 9, 79.
Sullivan, PF (2007). Spurious genetic associations. Biological Psychiatry 61, 1121–1126.
Sullivan, PF, Neale, MC, Kendler, KS (2000). Genetic epidemiology of major depression: review and meta-analysis. American Journal of Psychiatry 157, 1552–1562.
Wendland, JR, Martin, BJ, Kruse, MR, Lesch, K-P, Murphy, DL (2006). Simultaneous genotyping of four functional loci of human SLC6A4, with a reappraisal of 5-HTTLPR and rs25531. Molecular Psychiatry 11, 224–226.
Whiteford, HA, Degenhardt, L, Rehm, J, Baxter, AJ, Ferrari, AJ, Erskine, HE, Charlson, FJ, Norman, RE, Flaxman, AD, Johns, N, Burstein, R, Murray, CJL, Vos, T (2013). Global burden of disease attributable to mental and substance use disorders: findings from the Global Burden of Disease Study 2010. Lancet 382, 1575–1586.
Zammit, S, Owen, MJ, Lewis, G (2010). Misconceptions about gene–environment interactions in psychiatry. Evidence-Based Mental Health 13, 65–68.

Fig. 1. Percentage of citations received by positive, negative and unclear studies. The inner ring indicates the percentage of studies of each type. The outer ring indicates the percentage of total citations received by studies with positive, negative or unclear outcomes.

Fig. 2. Abstract coding by study outcome. The categories on the x-axis represent the outcome of the study, while the different sections of the bars indicate the abstract coding.

Fig. 3. Percentage of citations received by negative studies without a positive focus, with a partially positive focus and with a positive focus. The inner ring indicates the percentage of studies of each type. The outer ring indicates the percentage of total citations received by studies of each type.
