English is generally perceived to be the universal language of science (Reference Egger, Zellweger-Zähner and Schneider5;6). The top international medical journals, ranked by Journal Citation Reports impact factor, are English-language publications (9). However, systematic reviews that rely exclusively on English-language studies may miss important evidence on a health intervention. Comprehensive searches that identify all relevant studies and minimize bias are essential for systematic reviews (1;Reference Higgins and Green8). Papers reporting positive results are more likely to be published in English-language journals, while papers reporting negative results are more likely to be published in non-English-language journals. While one study found higher estimates of effectiveness in non-English-language trial reports (Reference Jüni, Holenstein, Sterne, Bartlett and Egger11), other studies found no significant difference between meta-analyses that included non-English trials and those restricted to English-language trials in conventional medicine, but did find a difference for trials of alternative medicine (Reference Moher, Pham, Lawson and Klassen12;Reference Pham, Klassen, Lawson and Moher19). Systematic bias due to the selection of studies published in a particular language is called language bias (Reference Egger, Juni, Bartlett, Holenstein and Sterne4). The potential for this type of bias when only English-language studies are selected has been called “Tower of Babel” bias (Reference Grégorie, Derderian and Le Lorier7) or “English-language” bias (Reference Egger, Zellweger-Zähner and Schneider5). Such bias may lead to over- or underestimation of an intervention's effectiveness and, ultimately, to inappropriate health policy decisions or patient care (Reference Grégorie, Derderian and Le Lorier7).
Barriers to including trials published in languages other than English (LOE) in systematic reviews include the time and cost required to obtain and translate them. Whether these additional resources are justified to minimize bias is unclear. Because health technology assessments usually involve systematic reviews or meta-analyses, examining English-language bias may be useful for researchers in this field.
OBJECTIVES
The objective of this work was to examine the impact of English-language restriction on systematic review-based meta-analyses (SR/MA). It is based on an earlier assessment of language restrictions in systematic reviews (Reference Morrison, Moulton and Clark16).
METHODS
Literature Search Strategy
The literature search covered bibliographic databases including MEDLINE, PubMed, The Cochrane Library, EMBASE, Biosis Previews, and CINAHL. Search terms included controlled vocabulary (e.g., “selection bias” and “publication bias”) and additional keywords (e.g., “non-English” and “LOE”). No language or study type limits were applied. Project team information specialists reviewed the search strategy. The search timeframe was from January 1990 until March 2011, and monthly update searches were run using Ovid AutoAlerts. The grey literature search included health technology assessment agency Web sites, meeting abstracts, Google, and bibliographies of relevant papers. Details are reported in Morrison et al. (Reference Morrison, Moulton and Clark16).
Selection Criteria
Studies were eligible for inclusion if they measured the effect of excluding randomized controlled trials (RCTs) in LOE on one or more outcomes in SR/MA of conventional medicine. Outcomes measured included summary treatment effect, methodological quality, and statistical heterogeneity.
Selection and Data Extraction
In the first screen, two reviewers (A.M., and K.M. or M.C.) independently reviewed titles to remove obviously irrelevant references. In the second screen, two reviewers (K.M., M.C.) scanned titles and abstracts and applied selection criteria. Information was extracted by two reviewers (K.M., A.M.) using a structured form, checked for discrepancies, and tabulated (Table 1). When necessary, reviewers contacted study authors for additional information. Differences were discussed and resolved by consensus.
CDSR, Cochrane Database of Systematic Reviews; CISCOM, Centralised Information Service for Complementary Medicine; CRD, Centre for Reviews and Dissemination; HTA, health technology assessment; LOE, languages other than English; MA, meta-analysis; NHS R&D, National Health Service Research and Development; NR, not reported; RCT, randomized controlled trial; SR/MA, systematic review-based meta-analyses.
Quality Assessment
A checklist (Reference Downs and Black3) validated for human analytic studies was adapted for this review and applied by two reviewers (A.M., J.P.). Questions were associated with domains of reporting and internal validity. Differences were discussed and resolved by consensus.
Data Analysis Methods
Studies were detailed in evidence tables and a structured discussion of the data was prepared.
RESULTS
Quantity of Research Available
The PRISMA flowchart (Figure 1) shows the selection process. From 26,551 unique citations identified in the literature search, twenty-five full-text papers were reviewed. Five reports (Reference Egger, Juni, Bartlett, Holenstein and Sterne4;Reference Jüni, Holenstein, Sterne, Bartlett and Egger11;Reference Moher, Pham, Lawson and Klassen12;Reference Moher, Pham and Klassen15;Reference Pham, Klassen, Lawson and Moher19) describing three unique studies (Reference Jüni, Holenstein, Sterne, Bartlett and Egger11;Reference Moher, Pham, Lawson and Klassen12;Reference Moher, Pham and Klassen15) were included.
Study Characteristics
Table 1 summarizes the characteristics of the five included reports.
Study Design. All reports identified meta-analyses through literature searches and application of selection criteria. For a binary outcome in each meta-analysis, the pooled odds ratio was compared with the odds ratio obtained when the same meta-analysis was re-analyzed after removing data from LOE trials. Bias was expressed as a ratio of odds ratios (ROR), which was then combined meta-analytically across all SR/MAs to give a summary measure.
Selection Criteria. In two reports, SR/MAs were included if they were published in English, their main data sources were RCTs, and the review stated whether only English-language trials were eligible or whether LOE trials were also considered (Reference Jüni, Holenstein, Sterne, Bartlett and Egger11;Reference Pham, Klassen, Lawson and Moher19). Pham et al. included English-language systematic reviews that contained at least one LOE trial for the meta-analytic outcomes of interest (Reference Pham, Klassen, Lawson and Moher19). Moher et al. considered meta-analyses that included between two and 99 trials and reported binary outcomes (Reference Moher, Pham and Klassen15). Their study included meta-analyses that excluded LOE trials and those that included them, whether or not the LOE trials were used in the analysis (Reference Moher, Pham and Klassen15). Jüni et al. (Reference Jüni, Holenstein, Sterne, Bartlett and Egger11) and Egger et al. (Reference Egger, Juni, Bartlett, Holenstein and Sterne4) included meta-analyses that reported enough information to allow them to be replicated.
Databases Searched. All reports searched the Cochrane Database of Systematic Reviews; four searched MEDLINE (Reference Egger, Juni, Bartlett, Holenstein and Sterne4;Reference Jüni, Holenstein, Sterne, Bartlett and Egger11;Reference Moher, Pham, Lawson and Klassen12;Reference Pham, Klassen, Lawson and Moher19), and three searched EMBASE (Reference Egger, Juni, Bartlett, Holenstein and Sterne4;Reference Moher, Pham, Lawson and Klassen12;Reference Pham, Klassen, Lawson and Moher19). Moher et al. (Reference Moher, Pham, Lawson and Klassen12) and Pham et al. (Reference Pham, Klassen, Lawson and Moher19) also searched the Centralised Information Service for Complementary Medicine (CISCOM). Searches covered literature from 1966 (Reference Moher, Pham and Klassen15) to 1999 (Reference Pham, Klassen, Lawson and Moher19).
Number of Studies Reviewed. The number of meta-analyses in the reports ranged from 42 (Reference Pham, Klassen, Lawson and Moher19) to 130 (Reference Moher, Pham, Lawson and Klassen12), and the number of RCTs ranged from 600 (Reference Jüni, Holenstein, Sterne, Bartlett and Egger11) to 783 (Reference Egger, Juni, Bartlett, Holenstein and Sterne4).
Languages of Studies. Systematic reviews in four of the reports considered RCTs published in German, French, Italian, and Spanish (Reference Egger, Juni, Bartlett, Holenstein and Sterne4;Reference Jüni, Holenstein, Sterne, Bartlett and Egger11;Reference Moher, Pham, Lawson and Klassen12;Reference Moher, Pham and Klassen15). Other languages included Chinese (Reference Egger, Juni, Bartlett, Holenstein and Sterne4;Reference Jüni, Holenstein, Sterne, Bartlett and Egger11;Reference Moher, Pham and Klassen15), Portuguese (Reference Egger, Juni, Bartlett, Holenstein and Sterne4;Reference Jüni, Holenstein, Sterne, Bartlett and Egger11;Reference Moher, Pham, Lawson and Klassen12), and Danish (Reference Jüni, Holenstein, Sterne, Bartlett and Egger11;Reference Moher, Pham, Lawson and Klassen12). Of the LOE trials in Jüni et al. (Reference Jüni, Holenstein, Sterne, Bartlett and Egger11) and Egger et al. (Reference Egger, Juni, Bartlett, Holenstein and Sterne4), forty-two (36.5 percent) were in German, twenty-nine (25.2 percent) in French, twelve (10.4 percent) in Italian, eight (7.0 percent) in Japanese, seven (6.1 percent) in Spanish, six (5.2 percent) in Portuguese, eight (7.0 percent) in four other European languages, and three (2.6 percent) in Chinese. Of the 1,383 trials included in these two reviews, 115 (8.3 percent) were in LOE. The other reports did not state the proportion of LOE trials (Reference Moher, Pham, Lawson and Klassen12;Reference Moher, Pham and Klassen15;Reference Pham, Klassen, Lawson and Moher19).
Disease Areas and Medical Specialties. Disease areas included infectious diseases (four reports) (Reference Egger, Juni, Bartlett, Holenstein and Sterne4;Reference Jüni, Holenstein, Sterne, Bartlett and Egger11;Reference Moher, Pham, Lawson and Klassen12;Reference Moher, Pham and Klassen15) and circulatory diseases (two reports) (Reference Moher, Pham, Lawson and Klassen12;Reference Moher, Pham and Klassen15). Two reports included systematic reviews of complementary and alternative medicine (Reference Moher, Pham, Lawson and Klassen12;Reference Pham, Klassen, Lawson and Moher19). None of the reports described the RCTs or patient populations of the included studies.
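The ROR approach described above can be sketched in Python. This is a minimal illustration, not the method of any included report: it assumes fixed-effect inverse-variance pooling of log odds ratios, Woolf variances, and a comparison of LOE trials against English-language trials within each meta-analysis (so the two subsets are disjoint and their variances add). The reports may instead have used random-effects models or meta-regression, and the `Trial` structure and all data are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Trial:
    """2x2 counts for a binary outcome (hypothetical structure for illustration)."""
    events_tx: int
    no_events_tx: int
    events_ctl: int
    no_events_ctl: int


def log_or_and_var(t: Trial) -> tuple[float, float]:
    """Log odds ratio with its Woolf variance; adds 0.5 to every cell if any cell is zero."""
    cells = [t.events_tx, t.no_events_tx, t.events_ctl, t.no_events_ctl]
    if 0 in cells:
        cells = [c + 0.5 for c in cells]
    a, b, c, d = cells
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def pool_fixed(estimates: list[tuple[float, float]]) -> tuple[float, float]:
    """Inverse-variance fixed-effect pooling of (log estimate, variance) pairs."""
    weights = [1 / v for _, v in estimates]
    total = sum(weights)
    mean = sum(w * y for w, (y, _) in zip(weights, estimates)) / total
    return mean, 1 / total


def log_ror(loe: list[Trial], english: list[Trial]) -> tuple[float, float]:
    """Log ROR = log(pooled OR of LOE trials) - log(pooled OR of English trials).
    The two subsets are disjoint, so their variances add."""
    y_loe, v_loe = pool_fixed([log_or_and_var(t) for t in loe])
    y_eng, v_eng = pool_fixed([log_or_and_var(t) for t in english])
    return y_loe - y_eng, v_loe + v_eng


def summary_ror(per_meta_analysis: list[tuple[float, float]]) -> tuple[float, float, float]:
    """Combine per-meta-analysis log RORs into a summary ROR with a 95% CI."""
    y, v = pool_fixed(per_meta_analysis)
    half_width = 1.96 * math.sqrt(v)
    return math.exp(y), math.exp(y - half_width), math.exp(y + half_width)


if __name__ == "__main__":
    # Two hypothetical meta-analyses, each with LOE and English-language trials.
    ma1 = log_ror([Trial(5, 95, 20, 80)], [Trial(10, 90, 20, 80)])
    ma2 = log_ror([Trial(12, 88, 15, 85)], [Trial(30, 170, 40, 160)])
    # Prints (summary ROR, lower 95% limit, upper 95% limit).
    print(summary_ror([ma1, ma2]))
```

Under this convention, with odds ratios below 1 favouring treatment, an ROR below 1 means LOE trials report more benefit than English-language trials.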
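The language breakdown above can be checked with a short Python calculation. It uses only the counts stated in the text: they sum to 115 LOE trials, which is 8.3 percent of 1,383. Computing each share to one decimal place gives 10.4 percent for Italian and 7.0 percent for Japanese and for the four other European languages. The grouping label is illustrative.

```python
# LOE trial counts by language as reported for Jüni et al. and Egger et al.
LOE_COUNTS = {
    "German": 42,
    "French": 29,
    "Italian": 12,
    "Japanese": 8,
    "Spanish": 7,
    "Portuguese": 6,
    "Four other European languages": 8,
    "Chinese": 3,
}
TOTAL_TRIALS = 1383  # all trials in the two reviews


def percent_breakdown(counts: dict[str, int]) -> tuple[int, dict[str, float]]:
    """Return the total count and each category's share, in percent to one decimal."""
    total = sum(counts.values())
    return total, {k: round(100 * n / total, 1) for k, n in counts.items()}


n_loe, shares = percent_breakdown(LOE_COUNTS)
loe_share = round(100 * n_loe / TOTAL_TRIALS, 1)
print(n_loe, loe_share)  # 115 8.3
print(shares)
```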
Country of Origin. Three reports were published in Canada (Reference Moher, Pham, Lawson and Klassen12;Reference Moher, Pham and Klassen15;Reference Pham, Klassen, Lawson and Moher19), one in the United Kingdom (Reference Jüni, Holenstein, Sterne, Bartlett and Egger11), and one in Switzerland (Reference Egger, Juni, Bartlett, Holenstein and Sterne4).
Source of Funding. Moher et al. received funding from the Medical Research Council of Canada (Reference Moher, Pham and Klassen15); other reports were funded by the UK National Health Service Research & Development Health Technology Assessment Program (Reference Egger, Juni, Bartlett, Holenstein and Sterne4;Reference Jüni, Holenstein, Sterne, Bartlett and Egger11;Reference Moher, Pham, Lawson and Klassen12;Reference Pham, Klassen, Lawson and Moher19). No reports declared a conflict of interest.
All reports were methodologically sound, and most met all quality assessment criteria for reporting (e.g., objectives, outcomes, study characteristics, confounders, and findings were clearly stated) and validity (e.g., estimates of random variability, probability values, and statistical tests for main outcomes). The quality assessment checklist is described elsewhere (Reference Morrison, Moulton and Clark16). We noted weaknesses in two areas: sample size (power) calculation and distribution of confounders. Only two reports (Reference Moher, Pham, Lawson and Klassen12;Reference Pham, Klassen, Lawson and Moher19) reported a power calculation, and one (Reference Pham, Klassen, Lawson and Moher19) did not describe the distribution of confounders, although it referred to another report containing this information.
Data Synthesis and Analyses
The effects of including or excluding LOE trials are presented below. Table 2 summarizes the findings of each report.
LOE, languages other than English.
Bias in Summary Treatment Effects. None of the reports found major differences in summary treatment effects between English-language only meta-analyses and LOE-inclusive meta-analyses (Reference Egger, Juni, Bartlett, Holenstein and Sterne4;Reference Jüni, Holenstein, Sterne, Bartlett and Egger11;Reference Moher, Pham, Lawson and Klassen12;Reference Moher, Pham and Klassen15;Reference Pham, Klassen, Lawson and Moher19).
Moher et al. (Reference Moher, Pham and Klassen15) found that language-restricted meta-analyses did not differ from language-inclusive meta-analyses in their estimates of intervention effectiveness (ROR 0.98, 95 percent confidence interval [CI] 0.81 to 1.17), suggesting an average difference of 2 percent between treatment estimates with and without language restrictions. Language-inclusive meta-analyses had narrower CIs (average width 0.79; 95 percent CI 0.51 to 1.07) than English-language only meta-analyses (average width 0.92; 95 percent CI 0.53 to 1.32; relative difference of 16 percent; p = .045), probably because meta-analyses without language restrictions typically include more trials.
Egger et al. (Reference Egger, Juni, Bartlett, Holenstein and Sterne4) and Jüni et al. (Reference Jüni, Holenstein, Sterne, Bartlett and Egger11) found that treatment effect estimates in LOE trials showed more benefit than those in English-language trials (ROR 0.84; 95 percent CI 0.74 to 0.97; p = .011). Significant heterogeneity was present between meta-analyses (p = .003), with pooled effect estimates of LOE trials ranging from 90 percent more to 147 percent less benefit compared with English-language trials. When LOE trials were excluded, changes in the pooled estimates of individual meta-analyses ranged from a 42 percent increase (less benefit) to a 22.7 percent decrease (more benefit) relative to the original pooled estimate. In 58 percent of the fifty meta-analyses, the change was less than 5 percent. Among the twenty-one remaining meta-analyses, excluding LOE trials produced more benefit in five and less benefit in sixteen, and the average precision of pooled estimates decreased from 8.34 to 7.68. The authors (Reference Egger, Juni, Bartlett, Holenstein and Sterne4;Reference Jüni, Holenstein, Sterne, Bartlett and Egger11) compared pooled estimates in cardiology and angiology (ROR 0.78, 95 percent CI 0.64 to 0.94), infectious disease (ROR 0.83, 95 percent CI 0.68 to 1.00), neurology (ROR 0.68, 95 percent CI 0.40 to 1.13), obstetrics and gynecology (ROR 1.00, 95 percent CI 0.61 to 1.65), psychiatry (ROR 0.63, 95 percent CI 0.39 to 1.02), rheumatology (ROR 1.02, 95 percent CI 0.80 to 1.30), and tobacco addiction (ROR 0.75, 95 percent CI 0.50 to 1.13). The extent to which LOE trials overestimated effect sizes (an ROR of less than one) varied by field, as did the proportion of LOE trials incorporated in the meta-analyses, which ranged from 10.1 percent (tobacco addiction) and 12.3 percent (obstetrics) to 35 percent (psychiatry) and 35.7 percent (rheumatology).
The LOE trials contributed an average of 17.5 percent of the weight in the meta-analyses (median 10.2 percent, range 1.2 to 81.1 percent).
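The two percentages in the Moher et al. findings can be reproduced with simple arithmetic. This Python sketch assumes that the 2 percent figure is read as 1 minus the ROR, and that the 16 percent relative difference in CI width uses the language-inclusive width (0.79) as the base; the report as summarized here does not state the base, and using the English-only width (0.92) instead gives about 14 percent.

```python
def percent_difference_from_ror(ror: float) -> float:
    """Percent by which one odds ratio differs from the other, given their ratio."""
    return (1 - ror) * 100


def relative_width_difference(reference: float, other: float) -> float:
    """How much wider `other` is than `reference`, in percent of `reference`."""
    return (other - reference) / reference * 100


print(round(percent_difference_from_ror(0.98)))       # 2
print(round(relative_width_difference(0.79, 0.92)))   # 16 (base: language-inclusive width)
print(round(relative_width_difference(0.92, 0.79)))   # -14 (base: English-only width)
```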
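The quantities reported for Egger et al. and Jüni et al. (percent change in a pooled estimate when LOE trials are dropped, the precision of pooled estimates, and the share of weight contributed by LOE trials) can be illustrated with a short Python sketch. It assumes fixed-effect inverse-variance weights and defines precision as 1/SE of the pooled log odds ratio. The reports may have used random-effects weights or different definitions, and all trial data here are hypothetical.

```python
import math
import statistics


def pool(estimates: list[tuple[float, float]]) -> tuple[float, float]:
    """Fixed-effect inverse-variance pool; returns (pooled log OR, standard error)."""
    weights = [1 / v for _, v in estimates]
    total = sum(weights)
    y = sum(w * e for w, (e, _) in zip(weights, estimates)) / total
    return y, math.sqrt(1 / total)


def exclusion_effect(trials, is_loe):
    """Percent change in the pooled OR after dropping LOE trials, and precision
    (1 / SE) before and after. A positive change means a larger OR, i.e. less
    benefit when an OR below 1 favours treatment."""
    y_all, se_all = pool(trials)
    y_eng, se_eng = pool([t for t, loe in zip(trials, is_loe) if not loe])
    pct_change = (math.exp(y_eng - y_all) - 1) * 100
    return pct_change, 1 / se_all, 1 / se_eng


def loe_weight_share(trials, is_loe) -> float:
    """Percent of the total inverse-variance weight contributed by LOE trials."""
    weights = [1 / v for _, v in trials]
    loe_weight = sum(w for w, loe in zip(weights, is_loe) if loe)
    return 100 * loe_weight / sum(weights)


# Hypothetical meta-analysis: (log OR, variance) per trial; only the last trial is LOE.
trials = [(-0.5, 0.04), (-0.3, 0.05), (-1.0, 0.2)]
flags = [False, False, True]
change, prec_all, prec_eng = exclusion_effect(trials, flags)
print(f"change {change:.1f}% (under 5%: {abs(change) < 5})")
print(f"precision {prec_all:.2f} -> {prec_eng:.2f}")
print(f"LOE weight share {loe_weight_share(trials, flags):.1f}%")

# Across meta-analyses, shares would be summarized as mean, median and range.
shares = [10.0, 20.0, 60.0]  # hypothetical per-meta-analysis shares
print(statistics.mean(shares), statistics.median(shares), min(shares), max(shares))
```

Dropping a trial always removes weight, so precision after exclusion can only fall under fixed-effect pooling, which is consistent in direction with the reported decrease from 8.34 to 7.68.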
Pham et al. (Reference Pham, Klassen, Lawson and Moher19) found that excluding LOE trials from meta-analyses did not affect results in conventional medicine. Bias was not detected in estimates of effectiveness in systematic reviews that excluded or included LOE (random effects ROR 1.02, 95 percent CI 0.83 to 1.26). English-language trials reported smaller effect sizes than LOE trials.
Number of Included Studies and Patients. Four reports (Reference Egger, Juni, Bartlett, Holenstein and Sterne4;Reference Jüni, Holenstein, Sterne, Bartlett and Egger11;Reference Moher, Pham, Lawson and Klassen12;Reference Moher, Pham and Klassen15) examined the number of patients and studies in meta-analyses that included LOE trials versus those that did not.
Moher et al. (Reference Moher, Pham, Lawson and Klassen12) found that language-inclusive systematic reviews included more trials (median 17, interquartile range [IQR] 9 to 25) and more participants (median 1,658, IQR 112 to 40,341) than English-language only reviews (median 11 RCTs, IQR 6 to 23; median 971 patients, IQR 112 to 52,869). Moher et al. (Reference Moher, Pham and Klassen15) reported a median of nine trials per meta-analysis (IQR 6.5 to 18) in language-inclusive reviews, compared with a median of six trials (IQR 4 to 9.25) in language-restricted reviews. Egger et al. (Reference Egger, Juni, Bartlett, Holenstein and Sterne4) noted that LOE trials had fewer participants than English-language trials but were more likely to show statistically significant results. Jüni et al. (Reference Jüni, Holenstein, Sterne, Bartlett and Egger11) found that English-language trials had significantly larger mean (269 ± 487 compared with 147 ± 195; p < .01) and median (116 compared with 88; p < .01) sample sizes than LOE trials.
Methodological Quality. Two reports (Reference Jüni, Holenstein, Sterne, Bartlett and Egger11;Reference Moher, Pham, Lawson and Klassen12) assessed the quality of the RCTs or the meta-analyses.
Moher et al. (Reference Moher, Pham, Lawson and Klassen12) found that language-inclusive systematic reviews were of higher quality and had more comprehensive searches than language-restricted reviews, and detected only small differences in reporting quality between English-language trials and those published in other languages.
Moher et al. (Reference Moher, Pham, Lawson and Klassen12) found no statistically significant differences between English-language and LOE trials in the likelihood of reporting a valid approach to patient randomization (90 percent compared with 83 percent; p = .13), accounting for patient withdrawals and losses to follow-up (64 percent compared with 57 percent; p = .43), or reporting the use of double-blinding (57 percent compared with 50 percent; p = .29). The authors compared RCT quality scores using the Jadad scale (Reference Jadad, Moore and Carroll10). The percentages of low-quality studies (Jadad score 0 to 2; 52 percent of English RCTs and 60 percent of LOE RCTs) and high-quality studies (Jadad score 3 to 5; 48 percent of English RCTs and 40 percent of LOE RCTs) were comparable (p = .23). Allocation concealment was inadequate or unclear in 87 percent (English) and 96 percent (LOE) of trials.
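The Jadad scale awards up to five points for reported randomization, double-blinding, and withdrawals. The Python sketch below is a simplified encoding of that scoring, with the 0 to 2 (low) and 3 to 5 (high) bands used in the text. Whether a method counts as "appropriate" remains a reviewer judgment supplied as input, and the `TrialReport` structure is hypothetical, not taken from Moher et al.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TrialReport:
    """What a trial report states; None means the method was not described."""
    randomized: bool
    randomization_appropriate: Optional[bool]
    double_blind: bool
    blinding_appropriate: Optional[bool]
    withdrawals_described: bool


def _method_points(appropriate: Optional[bool]) -> int:
    # +1 if the method is described and appropriate, -1 if described and
    # inappropriate, 0 if not described.
    if appropriate is None:
        return 0
    return 1 if appropriate else -1


def jadad_score(r: TrialReport) -> int:
    """Jadad score (0 to 5)."""
    score = 0
    if r.randomized:
        score += 1 + _method_points(r.randomization_appropriate)
    if r.double_blind:
        score += 1 + _method_points(r.blinding_appropriate)
    if r.withdrawals_described:
        score += 1
    return score


def quality_band(score: int) -> str:
    """Low quality for scores 0 to 2, high quality for 3 to 5, as in the text."""
    return "high" if score >= 3 else "low"


# A report stating randomization (method not given) and withdrawals, but not blinding.
example = TrialReport(True, None, False, None, True)
print(jadad_score(example), quality_band(jadad_score(example)))  # 2 low
```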
In contrast, Jüni et al. found that English-language trials tended to be of higher methodological quality than those published in other languages (Reference Jüni, Holenstein, Sterne, Bartlett and Egger11). Specifically, 88 English-language trials (35.7 percent) indicated adequate concealment of allocation compared with twelve LOE trials (25.0 percent) (p = .15), and 153 English-language trials (66.5 percent) were double- or assessor-blinded compared with twenty-three LOE trials (46.9 percent) (p = .016) (Reference Jüni, Holenstein, Sterne, Bartlett and Egger11).
Publication Status. Two reports (Reference Moher, Pham, Lawson and Klassen12;Reference Pham, Klassen, Lawson and Moher19) found no evidence of publication bias in English-language only meta-analyses or in LOE-inclusive meta-analyses, whether or not the LOE trials contributed to the quantitative analysis.
Statistical Heterogeneity. Moher et al. (Reference Moher, Pham, Lawson and Klassen12) used I² to compare the statistical heterogeneity of English-language restricted and LOE-inclusive meta-analyses. The I² statistic quantifies the percentage of variation across studies that is due to heterogeneity rather than chance; between-study heterogeneity is considered substantial if I² is 50 percent or more (Reference Deeks, Higgins, Altman, Higgins and Green2). They found that between-study heterogeneity increased by 2.4 percent when LOE RCTs were included in thirty-four systematic reviews of conventional medicine. Pham et al. (Reference Pham, Klassen, Lawson and Moher19) found no significant association between language-of-publication restrictions and statistical heterogeneity.
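I² is conventionally derived from Cochran's Q as (Q − df)/Q, expressed as a percentage and truncated at zero, where df is the number of studies minus one. The Python sketch below shows that calculation under a fixed-effect weighting assumption; the two-trial input is hypothetical and is not data from Moher et al. or Pham et al.

```python
def cochran_q(estimates: list[float], variances: list[float]) -> float:
    """Cochran's Q: weighted squared deviations from the fixed-effect pooled estimate."""
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    return sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))


def i_squared(estimates: list[float], variances: list[float]) -> float:
    """I^2 = (Q - df) / Q as a percentage, truncated at 0."""
    q = cochran_q(estimates, variances)
    df = len(estimates) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100


# Hypothetical log ORs and variances for two trials.
i2 = i_squared([0.0, 1.0], [0.1, 0.1])
print(f"I2 = {i2:.0f}% (substantial: {i2 >= 50})")  # I2 = 80% (substantial: True)
```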
DISCUSSION
One limitation of this review is that no included study examined a single field of medicine, which prevented analysis of LOE trials in particular specialties. Egger et al. showed that LOE trials are important in psychiatry, rheumatology, and orthopedics (Reference Egger, Juni, Bartlett, Holenstein and Sterne4). Pan et al. concluded that Chinese studies are crucial in molecular medicine (Reference Pan, Trikalinos, Kavvoura, Lau and Ioannidis18). These studies suggest that the influence of LOE trials may vary across specialties. Although the primary ROR analyses in several included reports did not identify significant changes in overall pooled measures of effectiveness, stratified analyses showed that the impact of LOE trials is heterogeneous across medical specialties and that LOE trials are more common in some areas of medicine (Reference Jüni, Holenstein, Sterne, Bartlett and Egger11;Reference Moher, Pham, Lawson and Klassen12;Reference Pham, Klassen, Lawson and Moher19).
There is conflicting evidence about the methodological and reporting quality of trials published in English versus those in LOE. Moher et al. (Reference Moher, Fortin and Jadad13;Reference Moher, Pham and Jones14) detected no differences in the reporting of randomization, double-blinding, dropouts, withdrawals, and allocation concealment. Previously, Moher et al. (Reference Moher, Pham and Jones14) found an association between poor reporting of methods and exaggerated estimates of efficacy. Jüni et al. (Reference Jüni, Holenstein, Sterne, Bartlett and Egger11), however, found English-language trials were of higher methodological quality than LOE trials. The discrepancy may be due to the different quality measures used and the inclusion of alternative medicine SR/MA in the Moher report.
Some studies included meta-analyses in which only one or two LOE trials were identified. This may not reflect all available LOE studies and may result from a lack of resources for translation. Thus, the true “exposure” of meta-analyses to LOE data may be limited.
Another limitation of this review is that the included reports are now relatively old, with literature searches ranging from 1996 to 1999. Publishing practices may have changed, and research methods have since improved, with greater adherence to guidelines for systematic reviews (Reference Higgins and Green8).
Two reports did not search EMBASE (Reference Jüni, Holenstein, Sterne, Bartlett and Egger11;Reference Moher, Pham and Klassen15). Because EMBASE covers more European journals than MEDLINE (Reference Wilkin, Gillies and Davies20), relevant studies may have been missed.
Searching for studies in LOE may have other benefits, including increasing the external validity for specific clinical specialties where LOE studies are known to be important, and increasing awareness of the number and quality of LOE studies.
CONCLUSIONS
We found no evidence of systematic bias from the use of language restrictions in SRs/MAs of conventional medicine. Findings on the methodological and reporting quality of English-language versus LOE trials were conflicting. These findings do not rule out the potential for language bias when language restrictions are used. When time and resources allow, searches should include LOE studies to minimize the risk of a biased summary effect. Further research in different medical specialties would provide better evidence on the effect of language restrictions on systematic reviews.
CONTACT INFORMATION
Andra Morrison, BSc, ACLIP, Environmental Scanning Officer, Julie Polisena, MSc, Scientific Adviser, Canadian Agency for Drugs and Technologies in Health, Ottawa, Ontario, Canada
Don Husereau, BScPharm, MSc, Senior Scientist, Department of Health and HTA, UMIT–Private Universität für Gesundheitswissenschaften, Medizinische Informatik und Technik GmbH, Hall in Tirol, Austria; Adjunct Professor, Department of Epidemiology and Community Medicine, University of Ottawa, Ottawa, Ontario, Canada
Kristen Moulton, BA, Clinical Research Assistant, Michelle Clark, BSc, Clinical Research Assistant, Canadian Agency for Drugs and Technologies in Health, Ottawa, Ontario, Canada
Michelle Fiander, MA, MLIS, Information Specialist, Cochrane Effective Practice and Organisation of Care (EPOC) Group, Ottawa, Ontario, Canada
Monika Mierzwinski-Urban, BA, MLIS, Information Specialist, Danielle Rabb, MLIS, Information Specialist, Canadian Agency for Drugs and Technologies in Health, Ottawa, Ontario, Canada
Tammy Clifford, PhD, Chief Scientist, Canadian Agency for Drugs and Technologies in Health; Adjunct Professor, Department of Pediatrics, and Epidemiology & Community Medicine, Faculty of Medicine, University of Ottawa, Ottawa, Ontario, Canada
Brian Hutton, PhD, Senior Methodologist, Ottawa Hospital Research Institute, Ottawa, Ontario, Canada
CONFLICTS OF INTEREST
Michelle Fiander is employed by the Cochrane Effective Practice and Organisation of Care (EPOC) Group. The other authors report they have no potential conflicts of interest.