Introduction
Systematic reviews can provide a wealth of information through meta-analyses of data and qualitative judgments about the certainty of the available evidence. They aim to provide information on the relative effects of treatments that are directly compared, including estimates of differences among treatments or between each treatment and a common comparator, together with information on the weight of each comparison (and the related certainty of evidence). When the effects of multiple treatments need to be assessed, network meta-analysis (NMA) may be used, besides multiple direct comparisons, to allow a quantitative estimation of the effect of different interventions using both direct and indirect comparisons against a common comparator [Reference Chaimani, Caldwell, Li, Higgins and Salanti1].
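As a brief illustration of the underlying logic (not part of the original assessment), the simplest anchored indirect comparison (the Bucher method) estimates the relative effect of treatments A and B from their direct comparisons with a common comparator C:

$$\hat{d}_{AB}^{\mathrm{ind}} = \hat{d}_{AC}^{\mathrm{dir}} - \hat{d}_{BC}^{\mathrm{dir}}, \qquad \operatorname{Var}\big(\hat{d}_{AB}^{\mathrm{ind}}\big) = \operatorname{Var}\big(\hat{d}_{AC}^{\mathrm{dir}}\big) + \operatorname{Var}\big(\hat{d}_{BC}^{\mathrm{dir}}\big),$$

where $\hat{d}_{XY}$ denotes the estimated relative effect of X versus Y; NMA extends this logic to a whole network of treatments, provided that transitivity can be assumed.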
Several NMA-specific tools are available to present graphical and numerical summaries of the analyses. Network plots (Figure 1) can show which direct comparisons are available within the network and how many trials or patients are available for each direct comparison [Reference Chaimani, Higgins, Mavridis, Spyridonos and Salanti2]; sometimes a judgment on the risk of bias associated with each direct comparison is also shown by coloring the edges.
League tables (Figure 2) report the meta-analytic estimate with confidence interval (CI) of the difference in effect sizes between each pair of treatments, derived from both direct and indirect comparisons [Reference Chaimani, Caldwell, Li, Higgins and Salanti1].
Summary of findings (SoF) tables (Supplementary Figure 1) report the meta-analytic estimate described above for single comparisons, generally between each treatment and a common reference treatment, adding the overall judgment of the certainty of evidence, which is derived from risk of bias, indirectness, inconsistency, and publication bias [Reference Schunemann, Higgins, Vist, Glasziou, Akl and Skoetz3; Reference Yepes-Nunez, Li, Guyatt, Jack, Brozek and Beyene4].
These reporting tools can provide information on the number of available comparisons, on estimates of their relative effects, and on the certainty of evidence associated with these estimates. However, such information is usually split among different figures and tables and cannot be easily displayed in a single synoptic graphic tool, especially in the presence of multiple comparisons among different health treatments and technologies.
Synthesizing such information is therefore challenging. When NMA cannot be used (e.g., when the anchor treatment differs systematically between trials and transitivity cannot be assumed), relying on multiple direct comparisons makes the interpretation of results particularly difficult. Whether NMAs or only direct comparisons are performed, the results should provide readers with quantitative and qualitative information facilitating the assessment of the relative effects of treatments, as well as of their clinical relevance and the certainty of the related evidence.
A specific approach to facilitate knowledge transfer, which can be particularly useful in the case of multiple comparisons, is proposed here, with examples from a complex multi-health technology assessment (HTA) of surgical techniques for the treatment of benign prostatic hyperplasia, carried out by a team of European HTA Agencies as part of the outputs of the EUnetHTA Joint Action 3 [5].
Our goal was to provide an easily interpretable overview of the available research, specifically: (1) which technologies had been compared, (2) estimates of the size of the differences for the available comparisons, (3) their clinical relevance, and (4) the certainty of the related evidence.
Methods and Results
When assessing twenty-one minimally invasive technologies as alternatives to the standard transurethral resection of the prostate (TURP) or to open prostatectomy (OP) (twenty-three technologies in total), NMA could not be performed on each of the twenty-seven selected outcomes, since the transitivity assumption could not be justified. We therefore needed an alternative way to comprehensively visualize all the available research, the quality of the included studies, and the effect of each technology in the context of all the included alternatives, so as to facilitate decisions about the likely effect (including its uncertainty) of using each technology in clinical practice and about the research that is still needed. From this example, we developed two different (although conceptually similar) approaches that could be applied in any situation where synthesis of information from pairwise comparisons is at stake: providing information about different pairwise treatment comparisons on a single outcome (Examples 1A and 1B), or about each treatment versus a common standard treatment on multiple outcomes (Example 2).
Example 1A
Table 1 shows all the comparisons, available from the studies selected within our HTA, assessing the International Prostate Symptom Score (IPSS) at 6 months. The rows display all the interventions considered in our scope, whereas the columns display all the possible controls considered for comparison, including the two standard comparators in our scope (TURP and OP) as well as the minimally invasive technologies listed among the interventions, since they could all be compared with each other. Following the row of a specific intervention, one can see how many other technologies it has been compared with and, looking at the corresponding cells, whether there are differences and in favor of which technology, whether any difference is statistically significant and clinically relevant (shown by different colors), and what the certainty of the related evidence is (shown by abbreviations).
Key to certainty of evidence: H = high; L = low; M = moderate; VL = very low certainty of evidence [Reference Hultcrantz, Rind, Akl, Treweek, Mustafa and Iorio6].
Notes on reasons for downgrading the evidence and key messages were provided in Ref. [5].
Key to quantitative differences
Intervention statistically significantly better than control (95% CI crossing MCID).
Intervention statistically significantly better than control (but 95% CI below MCID).
No difference.
Intervention statistically significantly worse than control (but 95% CI below MCID).
Intervention statistically significantly worse than control (95% CI crossing MCID).
MCID, minimal clinically important difference; CI, confidence interval.
Notes: Comparison of each technology (intervention, left column) by row to the other technologies (control, other columns). The colors denote the quantitative difference for each comparison, as shown in the key. Letters denote the quality of the evidence.
We were particularly concerned about how to represent judgments on clinical relevance, which are often overlooked or missing in graphical representations of comparative results. In this regard, we used the concept of minimal clinically important difference (MCID), first used by Jaeschke et al. and defined as "the smallest difference in score in the domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive costs, a change in the patient's management" [Reference Jaeschke, Singer and Guyatt7]. When available, MCIDs are generally obtained by considering patients' or clinicians' perspectives, for example by asking patients in ad hoc studies how relevant they consider observed changes, or through expert consensus [Reference Wells, Beaton, Shea, Boers, Simon and Strand8]. When assessing an estimate for a domain of interest (e.g., the estimated difference in IPSS between two interventions) in light of an MCID, we should also consider how uncertain that estimate is, and therefore the possibility that the true effect exceeds the MCID threshold even if the point estimate does not. In this regard, we used more or less intense colors: if the 95% CI was entirely below the MCID, a statistically significant difference was considered not clinically relevant and the blue (or red) color (depending on the direction of effect) was less intense; if the CI crossed the MCID, we could not exclude a clinically relevant difference and used a more intense color. We decided not to add quantitative estimates and CIs, to keep the table easier to read, relying on color intensity to indicate the level of uncertainty; however, CIs could be added to provide a more precise indication of it. As for judgments on the quality of evidence, notes on the reasons for downgrading were provided for each comparison, and final key messages synthesized the overall results.
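To make this rule explicit, the following minimal sketch (in Python; not the code used in the HTA) classifies one pairwise comparison into the color categories of the key above. The function name, the orientation flag, and the illustrative MCID of 3 IPSS points are our own assumptions for illustration.

```python
def classify(ci_low: float, ci_high: float, mcid: float,
             higher_is_worse: bool = True) -> str:
    """Classify one comparison from the 95% CI of the mean difference
    (intervention minus control) judged against the MCID."""
    # No statistically significant difference: the 95% CI includes zero.
    if ci_low <= 0.0 <= ci_high:
        return "no difference"
    # Orient the CI so that positive values mean "intervention better than control".
    low, high = (-ci_high, -ci_low) if higher_is_worse else (ci_low, ci_high)
    better = low > 0.0
    # A clinically relevant difference cannot be excluded if the CI crosses the MCID.
    crosses_mcid = max(abs(low), abs(high)) >= mcid
    if better:
        return ("better than control, CI crossing MCID (intense blue)" if crosses_mcid
                else "better than control, CI below MCID (light blue)")
    return ("worse than control, CI crossing MCID (intense red)" if crosses_mcid
            else "worse than control, CI below MCID (light red)")

# Illustrative use: IPSS at 6 months (lower scores are better), assuming an MCID of 3 points.
print(classify(ci_low=-6.0, ci_high=-2.2, mcid=3.0))  # -> intense blue category
```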
Since most technologies are listed among both interventions and controls, information on a specific comparison may be available in two different cells (intervention A × control B, and intervention B × control A), with colors reversed depending on the direction of effect. Even if this duplicates some information, the representation helps readers quickly grasp the quantity and quality of the evidence available for each intervention of interest simply by following the corresponding row.
Example 1B
The previous example refers to the assessment of continuous outcomes for which an MCID is available. The same method can also be used with dichotomous outcomes. In this case, color gradients can be used to highlight different strengths of relative risks, always considering the level of uncertainty (Table 2); a sketch of one possible reading of this gradient follows the key below. Judgments on the clinical relevance of the observed differences for binary outcomes are difficult, since no related MCIDs are available in the scientific literature. In such cases, the frequency and perceived relevance of the specific outcomes should be considered when discussing and providing such judgments.
Key to certainty of evidence: H = high; L = low; M = moderate; VL = very low certainty of evidence [Reference Hultcrantz, Rind, Akl, Treweek, Mustafa and Iorio6].
Notes on reasons for downgrading the evidence and key messages were provided in Ref. [5].
Key to quantitative differences
Intervention statistically significantly worse than control (95% CI for RR > 2 and includes 3, or 95% CI for RR > 3).
Intervention statistically significantly worse than control (95% CI for RR includes 2 but < 3).
Intervention statistically significantly worse than control (95% CI for RR between 1 and 2).
Intervention statistically significantly better than control (95% CI for RR < 1 and includes 0.5).
Intervention statistically significantly better than control (95% CI for RR < 0.5 and includes 0.33).
Intervention statistically significantly better than control (95% CI for RR < 0.33).
No difference.
Notes: Comparison of each technology (intervention, left column) by row to the other technologies (control, other columns). The colors denote the quantitative difference for each comparison, as shown in the key. Letters denote the quality of the evidence.
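Because the key above leaves some CI configurations implicit, the following Python sketch encodes only one possible reading of it; the band labels and the fall-through category are illustrative assumptions, not part of the HTA report.

```python
def classify_rr(ci_low: float, ci_high: float) -> str:
    """Map the 95% CI of a relative risk (intervention vs control) to a color band."""
    if ci_low <= 1.0 <= ci_high:
        return "no difference"
    if ci_low > 1.0:  # intervention statistically significantly worse than control
        if ci_low > 3.0 or (ci_low > 2.0 and ci_high >= 3.0):
            return "worse, dark red (CI > 2 and includes 3, or CI > 3)"
        if ci_low <= 2.0 <= ci_high < 3.0:
            return "worse, red (CI includes 2 but < 3)"
        if ci_high < 2.0:
            return "worse, light red (CI between 1 and 2)"
    else:  # ci_high < 1.0: intervention statistically significantly better than control
        if ci_high < 1 / 3:
            return "better, dark blue (CI < 0.33)"
        if ci_high < 0.5 and ci_low <= 1 / 3:
            return "better, blue (CI < 0.5 and includes 0.33)"
        if ci_high < 1.0 and ci_low <= 0.5:
            return "better, light blue (CI < 1 and includes 0.5)"
    return "configuration not explicitly covered by the key"

# Illustrative use: an RR with 95% CI 0.30 to 0.45 falls in the middle "better" band.
print(classify_rr(0.30, 0.45))
```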
Example 2
When the objective is to present parallel comparisons of each technology versus a single comparator, one table can display the results for each technology against that comparator, using columns to show the results for each outcome of interest (e.g., IPSS and Qmax at different time periods, Table 3); a sketch of how such a synopsis might be assembled follows the key below.
Key to certainty of evidence:
H = high; L = low; M = moderate; VL = very low certainty of evidence [Reference Hultcrantz, Rind, Akl, Treweek, Mustafa and Iorio6].
Notes on reasons for downgrading the evidence and key messages were provided in Ref. [5].
Key to quantitative differences:
Intervention statistically significantly better than control (95% CI crossing MCID).
Intervention statistically significantly better than control (but 95% CI below MCID).
No difference.
Intervention statistically significantly worse than control (but 95% CI below MCID).
Intervention statistically significantly worse than control (95% CI crossing MCID).
Notes: Comparison of each technology (intervention, left column) by row to the common comparator, with the other columns showing results for each outcome of interest. The colors denote the quantitative difference for each comparison, as shown in the key. Letters denote the quality of the evidence.
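To illustrate the layout only (not the statistical analysis), a minimal Python sketch of how such a synopsis might be assembled is given below; the technology names, outcomes, and cell contents are invented placeholders, and each cell simply carries a color category plus a certainty-of-evidence abbreviation.

```python
from typing import Dict, List

def build_synopsis(results: Dict[str, Dict[str, str]], outcomes: List[str]) -> str:
    """Render a plain-text synopsis: one row per technology (versus the common
    comparator), one column per outcome; empty cells expose missing comparisons."""
    width = 24
    lines = ["Technology".ljust(width) + "".join(o.ljust(width) for o in outcomes)]
    for tech, cells in results.items():
        row = tech.ljust(width)
        for outcome in outcomes:
            row += cells.get(outcome, "no evidence").ljust(width)
        lines.append(row)
    return "\n".join(lines)

# Invented example: each cell = color category / certainty of evidence abbreviation.
outcomes = ["IPSS 6 months", "IPSS 12 months", "Qmax 6 months"]
results = {
    "Technology A": {"IPSS 6 months": "no difference / M", "Qmax 6 months": "light red / L"},
    "Technology B": {"IPSS 6 months": "intense blue / VL"},
}
print(build_synopsis(results, outcomes))
```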
Discussion
Transferring information on benefits, risks, and certainty of the available evidence on different health interventions is challenging, especially when assessing multiple treatments. Graphic reporting tools used in the context of NMAs or of direct comparisons provide a set of relevant information but may be targeted at researchers rather than at different kinds of decision makers. For example, SoF tables provide a wealth of information, but they refer to one comparison at a time; league tables may appear as mathematical artifacts, providing quantitative estimates that are not easy to interpret in terms of direction of effect.
Different pieces of information need to be integrated to get an overall picture of what is particularly relevant for a decision maker, at least for interpreting trial results and identifying missing research. In other words, the overall picture should include information on which treatments have (or have not) been compared, estimates of differences (if present), their clinical relevance, and the certainty of the available evidence. This is important across different decision-making contexts, whether at the bedside, when deciding which technologies to acquire or implement in clinical practice, or when deciding which new research should receive resources.
Compressing data from available studies always entails the risk of overlooking "real-life" complexities, missing possibly relevant information related to the heterogeneity of patient populations, and ignoring the uncertainty of "average" estimates. While some compromise between completeness of the presented information and immediate comprehension may be required, further data (e.g., on the level of uncertainty through CIs and on subgroup analyses) and more in-depth layers of information can always be added through different tools, so that different kinds of decision makers and researchers can find all the data they need. In other words, the ultimate goal should not just be to synthesize (and thus reduce) information, but to provide progressively enriched content that helps to grasp the bigger picture, including all data relevant for decision making, without foregoing more in-depth analyses. It should also be noted that the clinical relevance of results is often overlooked in data synthesis, even though it may represent a key element for decision makers and (of course) patients. MCIDs are especially lacking for dichotomous outcomes and could be discussed in the context of HTAs by looking at the frequency of these outcomes and at their perceived relevance, which may also be discussed with patients' representatives.
As for the use of color gradients to indicate statistical significance and clinical relevance, issues of color vision deficiency, particularly affecting the vision of reds and greens, could be considered. In the examples displayed here, we used red-blue combinations, as they may be less problematic in this regard [9].
Conclusion
Knowledge translation requires easily interpretable tools and greater attention to the information needs of different target groups [Reference Formoso, Rizzini, Bassi, Bonfanti, Rizzardini and Campomori10]. Knowledge can be best promoted by paying due attention to information tools and to the characteristics of target populations [Reference Formoso, Marata and Magrini11], as the successful marketing of commercial products shows. The specific content should therefore allow readers to understand whether evidence-based information exists on a given topic, and to judge for themselves what can be expected from the use of a diagnostic, preventive, or therapeutic intervention.
We tried to implement these principles by proposing, for each of the technologies assessed in a multi-HTA, a synopsis of the information deemed relevant for both decision makers and researchers: in a single table, one can easily find all the available comparisons, see whether the related results are statistically significant and clinically relevant, and check the certainty of the related evidence. We think this is a feasible approach to easing the interpretation of complex scientific data, which should always be one of the objectives of evidence synthesis. Such an approach could also find a relevant field of application in electronic publishing, addressing the information needs of professionals seeking decision support from digital sources and related applications, as Richard Smith (former Editor of the BMJ) envisioned 25 years ago [Reference Smith12].
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0266462322000046.
Funding
The contents of this paper arise from the project “724130/EUnetHTA JA3,” which has received funding from the EU, in the framework of the Health Programme (2014–2020). Sole responsibility for its contents lies with the author(s) and neither the EUnetHTA Coordinator, the European Commission, nor any other body of the European Union is responsible for any use that may be made of the information contained therein.
Conflict of Interest
There are no conflicts of interest.