When a new health technology is more effective than its standard-care comparator, the methods to determine cost-effectiveness and make decisions are well established. For instance, if the new technology is more effective and less costly than standard care, it is considered a dominant strategy and is likely to be adopted. Conversely, if the new technology is less effective and more costly than standard care, it is considered a dominated strategy and is unlikely to be adopted. If the new technology is more effective and more costly, the ratio of incremental costs to incremental effectiveness can be calculated, and this ratio is sometimes compared with a willingness-to-pay (WTP) threshold. In theory, this threshold represents the maximum amount a payer is willing to pay for an additional unit of effect and is sometimes considered as the cutoff value for determining cost-effectiveness. There is debate regarding appropriate threshold values, with pronounced between-country variation, and the published literature addresses this topic extensively (Reference Thokala, Ochalek, Leech and Tong1;Reference Cameron, Ubels and Norstrom2).
However, when a new technology is less effective and less costly than standard care, decision making is more complex and has been infrequently studied. No two treatments are exactly identical, but the differences between them can be small or large. When the differences between the new technology and the standard care are sufficiently small, the new intervention may be termed non-inferior to the standard treatment. Otherwise, the new intervention is truly inferior to the standard treatment. Guidance for evaluating non-inferior technologies is important, because a lower relative effectiveness may be offset by other advantages, such as lower cost, greater availability, or patient preference. Each of these factors, and others, might be considered by decision makers.
Repetitive transcranial magnetic stimulation (rTMS) for treatment-resistant depression is an example of a technology that is marginally less effective than what might be considered the standard of care, electroconvulsive therapy (ECT), but that has potentially attractive offsetting advantages. ECT is sometimes used in patients who do not respond to antidepressant medications (Reference Enns, Reiss and Chan3). However, several barriers, such as geographic availability, lack of skilled operators, and safety concerns, have limited its uptake (Reference Delva, Graf and Patry4). Previous studies have shown that, although rTMS is more effective than sham therapy for treatment-resistant depression, it is less effective than ECT (Reference Knapp, Romeo and Mogg5–9). However, rTMS is a less invasive procedure with fewer short-term side effects than ECT. For these reasons, patients may prefer rTMS over ECT (Reference Walter, Martin, Kirkby and Pridmore10). We conducted an economic evaluation as part of a health technology assessment to inform a recommendation regarding publicly funding rTMS (8;9). Results from our earlier economic evaluation indicated that rTMS was less effective than ECT and also less costly (9).
This scenario is not unique. For instance, home-based sleep apnea tests are less accurate than laboratory-based polysomnography, but home-based tests also cost less. Home-based testing has other obvious advantages as well (Reference El Shayeb, Topfer, Stafinski, Pawluk and Menon11). The use of a conventional incremental cost-effectiveness ratio (ICER) can be misleading in this scenario. Although negative incremental effects and negative incremental costs yield a positive ICER, the direction of the decision based on this ratio is the opposite of that for an ICER calculated from positive incremental effects and costs. Net monetary benefit (NMB) analysis addresses some of the primary concerns with ICERs (Reference Glick, Briggs and Polsky12), but the use of traditional WTPs is problematic when considering new interventions that are less effective than the current standard treatment.
The concern about applying a traditional WTP to a loss of health is consistent with the principle of loss aversion, whereby individuals tend to prefer avoiding losses over obtaining equivalent gains (Reference Kahneman and Tversky13). Some studies have suggested the use of an alternative, often higher, WTP in such cases (Reference Eckermann14–Reference Claxton, Martin and Soares16). However, significant ethical, social, and political challenges may be associated with disinvesting in an effective standard treatment and replacing it with a substantially inferior technology (e.g., when the effectiveness of the technology is similar to that of a placebo or no treatment).
Borrowing from the concepts of non-inferiority clinical trials, we propose a new decision-making framework when faced with a new technology that is less costly and less effective than its comparator. We developed our approach for model-based economic evaluations that report outcome measures in quality-adjusted life-years (QALYs). The underlying assumptions of this framework are that (i) if the new intervention does not meet non-inferiority to its standard-care comparator, it should generally not be adopted, and (ii) when the difference in effectiveness between both interventions is clinically acceptable (i.e., the new intervention is judged to be non-inferior), the NMB of the two interventions is a key factor in decision making. This framework overcomes the potential limitations of the classic interpretations of the ICER and NMB. Next, we introduce the method of estimating the effectiveness preserved by the new intervention and the new framework for determining cost-effectiveness. Then, we illustrate the application of the framework using rTMS for treatment-resistant depression as an example.
Methods
Estimating the Fraction of Effectiveness Preserved by the New Technology
In the present study, we distinguish between health outcomes in an economic evaluation (e.g., QALYs) and health outcomes in a clinical trial by referring to the former as “effectiveness” outcomes and the latter as “effect” outcomes. We applied the concept of non-inferiority trials (17;Reference Wangge, Roes, de Boer, Hoes and Knol18) to model-based economic evaluations. Although a placebo control group is often not included in a non-inferiority trial, concluding non-inferiority conceptually requires synthesizing evidence from both the current non-inferiority trial, which compares the experimental therapy with the standard therapy, and historical data comparing the standard therapy with a placebo control (17;Reference Xie, Wang, Ng and Sikich19). In our approach, by contrast, we explicitly included the placebo control group in the model-based economic evaluation to determine the fraction of effectiveness preserved by the new intervention.
We used probabilistic sensitivity analysis (PSA) to simultaneously examine the effectiveness of a new technology, an active control (i.e., the standard treatment), and a placebo control (or no treatment). In PSA, distributions are assigned to the model parameters to reflect uncertainty in the decision model (Reference Briggs, Sculpher and Claxton20). Monte Carlo simulation is used to generate model parameter estimates and their associated uncertainty from these distributions and to generate probability distributions for the final results (i.e., model outcomes). These results are then used to determine the probability of non-inferiority and cost-effectiveness.
In our analysis, we used QALYs to evaluate treatment effectiveness, as QALYs are the most common and comprehensive measure of health outcomes in economic evaluations. To estimate the fraction of effectiveness preserved by the new intervention, we modeled the QALYs associated with the new intervention, an active control (or standard treatment), and a placebo control (or no treatment). We then calculated the fraction of effectiveness preserved for each simulation as follows (17;Reference Wangge, Roes, de Boer, Hoes and Knol18):
F = ∆ QIP / ∆ QAP = (QI − QP) / (QA − QP)
where
F = the fraction of effectiveness preserved by the new intervention, ∆ QIP = the difference in QALYs between the new intervention and the placebo control (or no treatment), ∆ QAP = the difference in QALYs between the active control (or standard treatment) and the placebo control (or no treatment), QI = the QALYs associated with the new intervention, QP = the QALYs associated with the placebo control (or no treatment), and QA = the QALYs associated with the active control (or standard treatment).
Our decision-making framework is guided by the following assumptions. First, if the new intervention is found to be more effective than the placebo control (i.e., ∆ QIP > 0), it is considered superior to a “do nothing” strategy. Second, if the new intervention preserves a certain fraction of the effect of the active control (or, equivalently, if we tolerate a certain percentage of effect loss), it is considered non-inferior to the active control. If there are multiple standard treatments, we can model the new intervention, the multiple active controls, and the placebo control together and estimate the effectiveness preserved by the new intervention relative to each active control. If there are multiple new interventions and one standard treatment, we can estimate the effectiveness preserved by each new intervention and the cost-effectiveness of each relative to the standard treatment. Further discussion of the non-inferiority margin in clinical trials and the fraction of effectiveness preserved in the economic model can be found in Supplementary Material 1.
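The per-simulation calculation of F and the resulting probability of non-inferiority can be sketched in Python as follows. This is a minimal illustration rather than the code used in the study: the function names and the toy QALY values are made up for the example, and raising an error when the active control does not outperform the placebo control is an assumption of the sketch, not a rule stated in the framework.

```python
from typing import List, Sequence


def fraction_preserved(q_new: float, q_active: float, q_placebo: float) -> float:
    """Return F = (QI - QP) / (QA - QP) for a single simulation."""
    delta_q_ap = q_active - q_placebo
    if delta_q_ap <= 0:
        # Assumption of this sketch: F is only interpretable when the
        # active control beats the placebo control in the same simulation.
        raise ValueError("active control must outperform the placebo control")
    return (q_new - q_placebo) / delta_q_ap


def prob_non_inferior(
    q_new: Sequence[float],
    q_active: Sequence[float],
    q_placebo: Sequence[float],
    threshold: float,
) -> float:
    """Share of simulations in which F meets or exceeds the threshold."""
    fractions: List[float] = [
        fraction_preserved(i, a, p) for i, a, p in zip(q_new, q_active, q_placebo)
    ]
    return sum(f >= threshold for f in fractions) / len(fractions)


# Toy (made-up) QALYs for three simulations; not values from the study.
toy_q_new = [1.5, 1.25, 1.0]
toy_q_active = [1.5, 1.5, 1.5]
toy_q_placebo = [0.5, 0.5, 0.5]

# F is 1.0, 0.75, and 0.5 in the three toy simulations.
toy_fractions = [
    fraction_preserved(i, a, p)
    for i, a, p in zip(toy_q_new, toy_q_active, toy_q_placebo)
]
# Two of the three simulations preserve at least 75 percent of effectiveness.
toy_share = prob_non_inferior(toy_q_new, toy_q_active, toy_q_placebo, 0.75)
```

In a real PSA, the three input sequences would hold the QALYs from each Monte Carlo simulation, drawn jointly so that the correlation between strategies is retained.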
Defining the Threshold of Effectiveness Preserved when Assessing Non-inferiority in Economic Evaluations
The threshold of effectiveness preserved is a key determining factor in assessing the non-inferiority of a new intervention. According to the U.S. Food and Drug Administration, several non-inferiority trials have used a threshold of 50 percent or 67 percent of effect preserved to determine non-inferiority (17;Reference Wangge, Roes, de Boer, Hoes and Knol18). Lower thresholds for non-inferiority would make it more feasible (i.e., in terms of sample size) to conclude non-inferiority in trials. Such trials typically deal with one (or just a few) specific outcomes related to the clinical effect of an intervention relative to its comparator(s). We recommend that a higher threshold be used for economic modeling studies than what is currently recommended for non-inferiority clinical trials, as a higher threshold indicates that a higher level of effectiveness must be preserved to achieve non-inferiority.
The underlying assumption for this recommendation is that the difference in clinical outcomes between two treatments would be larger than the difference measured in QALYs. Non-inferiority trials typically report results for a single primary health outcome, whereas economic modeling evaluations typically report results using QALYs, a summary measure of the combined effects of several health outcomes. As such, in a scenario in which two treatments have demonstrated substantially different primary outcomes but identical secondary outcomes, the difference in primary outcomes may become diluted in the combined effects of the QALY. Furthermore, model-based evaluations can incorporate an improved safety profile or other advantages of the new intervention into the QALY estimates, which can make the difference in QALYs even smaller. As such, the threshold of effectiveness preserved may need to be higher in a cost–utility analysis for an intervention to be considered non-inferior.
In addition, this framework can be used for scenarios where key clinical evidence is obtained from either non-inferiority or superiority trials. The conclusions of non-inferiority in a model-based economic evaluation and non-inferiority clinical trial are not necessarily consistent.
In the example below, we used a threshold of effectiveness preserved of 75 percent in the reference case analysis and thresholds of 50 percent and 90 percent in the sensitivity analyses. However, with the wider application of this framework, another value may be determined to be a more suitable threshold of effectiveness preserved.
Defining Cost-Effectiveness in the New Decision Framework
We consider an intervention cost-effective if two criteria are met: (i) the new intervention is non-inferior to the active control, and (ii) the new intervention has a positive NMB at a given WTP. For each simulated result, we calculated the effectiveness preserved and the NMB between the new intervention and the active control groups. Then, we simultaneously determined the non-inferiority and cost-effectiveness of the new intervention.
We calculated the NMB using the following formula:
NMB = WTP × ∆ Q − ∆ C
where NMB = net monetary benefit, WTP = willingness-to-pay threshold, ∆ Q = the difference in QALYs between the new intervention and the active control, and ∆ C = the difference in cost between the new intervention and the active control.
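As a quick arithmetic check, the NMB calculation and the two-part decision rule can be written out in Python. This is only a sketch: the function and variable names are hypothetical, and the inputs are the average incremental QALYs (−0.0175) and average incremental cost (−$738) for rTMS versus ECT reported in the Results, not the individual simulation draws.

```python
def net_monetary_benefit(wtp: float, delta_q: float, delta_c: float) -> float:
    """NMB = WTP x (incremental QALYs) - (incremental cost)."""
    return wtp * delta_q - delta_c


def meets_both_criteria(fraction: float, nmb: float, threshold: float) -> bool:
    """Criterion (i): non-inferiority; criterion (ii): positive NMB."""
    return fraction >= threshold and nmb > 0


# Average increments for rTMS versus ECT, as reported in the Results.
MEAN_DELTA_Q = -0.0175
MEAN_DELTA_C = -738.0

# NMB of the averages at three WTP values (dollars per QALY gained).
nmb_by_wtp = {
    wtp: net_monetary_benefit(wtp, MEAN_DELTA_Q, MEAN_DELTA_C)
    for wtp in (0, 50_000, 100_000)
}
```

At a WTP of $50,000 the result rounds to −$137, which agrees with the average NMB reported in the Results. Because NMB is linear in ∆ Q and ∆ C, the NMB of the averages equals the average of the per-simulation NMBs; the same is not true of the probability of meeting both criteria, which has to be counted simulation by simulation.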
It is uncertain whether the WTP in the non-inferiority framework should be lower than or the same as that in standard cost-effectiveness analyses. The WTP in a non-inferiority framework may be related to the threshold of effectiveness preserved. For example, if the threshold of effectiveness preserved is high (e.g., 90 percent), the demonstrated non-inferiority suggests that the health outcomes of the new intervention and the active control are very similar. Therefore, a lower WTP may be appropriate. When the WTP is $0, new interventions that are (i) non-inferior to the active control and (ii) cost-saving compared with the active control (which would lead to a positive NMB) would be considered cost-effective. How to select a WTP for different thresholds of effectiveness preserved is beyond the scope of the present study. In this study, we used WTPs ranging from $0 to $100,000 per QALY gained.
The results can be presented in a three-dimensional (3D) graph, incorporating a range of thresholds of effectiveness preserved (on the x-axis), a range of WTPs (on the y-axis), and the probability of the new intervention being non-inferior and cost-effective (on the z-axis). The original cost-effectiveness acceptability curve (CEAC) is calculated from the joint density of the incremental cost and incremental effectiveness and a range of WTPs using the formula for NMB above (Reference Fenwick, O'Brien and Briggs21). A positive NMB at a given WTP indicates that the new intervention is cost-effective. We added one additional criterion, that is, non-inferiority in effectiveness, to define cost-effectiveness for the new intervention so that the probability of the new intervention being cost-effective in the new framework is always equal to or lower than that in the original decision framework if the same WTP is used for both new and original frameworks. In addition to the 3D graph, the results can also be presented in a modified CEAC. In the modified CEAC, instead of using a range of WTPs, the threshold of effectiveness preserved (e.g., 50–100 percent) can be given on the x-axis, and the probability of the new intervention being cost-effective at a given WTP in the new framework can be given on the y-axis.
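The probability surface behind the 3D graph and the modified CEAC can be sketched as a simple grid calculation. The code below is an illustration under stated assumptions: the function names are hypothetical, and the four "simulations" are made-up triples of (F, ∆ Q, ∆ C) chosen so the arithmetic is easy to follow; they are not output from the rTMS model.

```python
from typing import List, Sequence, Tuple

Simulation = Tuple[float, float, float]  # (F, delta_q, delta_c)


def joint_probability(sims: Sequence[Simulation], threshold: float, wtp: float) -> float:
    """Share of simulations that are non-inferior AND have a positive NMB."""
    hits = sum(1 for f, dq, dc in sims if f >= threshold and wtp * dq - dc > 0)
    return hits / len(sims)


def probability_surface(
    sims: Sequence[Simulation], thresholds: Sequence[float], wtps: Sequence[float]
) -> List[List[float]]:
    """Rows follow thresholds (x-axis), columns follow WTPs (y-axis)."""
    return [[joint_probability(sims, t, w) for w in wtps] for t in thresholds]


def conventional_ceac(sims: Sequence[Simulation], wtps: Sequence[float]) -> List[float]:
    """Original CEAC: share of simulations with a positive NMB at each WTP."""
    return [sum(1 for _, dq, dc in sims if w * dq - dc > 0) / len(sims) for w in wtps]


# Made-up simulations, all less effective and less costly than the comparator.
toy_sims: List[Simulation] = [
    (0.9, -0.005, -800.0),
    (0.8, -0.010, -700.0),
    (0.6, -0.020, -750.0),
    (0.4, -0.030, -600.0),
]
grid_thresholds = [0.5, 0.75, 0.9]
grid_wtps = [0, 50_000, 100_000]

surface = probability_surface(toy_sims, grid_thresholds, grid_wtps)
ceac = conventional_ceac(toy_sims, grid_wtps)
```

Each row of `surface` is one modified CEAC read across WTPs, and each column is one modified CEAC read across thresholds. Every entry is at most the corresponding `ceac` value, which reflects the property that adding the non-inferiority criterion can only lower the probability of cost-effectiveness at a given WTP.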
An Example: rTMS for Treatment-Resistant Depression
We discuss rTMS only briefly, as details (e.g., model used, assumptions, parameter inputs, results) of a cost–utility analysis of rTMS versus ECT and a sham intervention have been reported previously (9). Instead of comparing rTMS to ECT and rTMS to a sham intervention separately, as was done in the earlier publication, we included the three strategies in a single decision-analytic model: rTMS as the new intervention, ECT as the active control, and a sham intervention as the placebo control. We performed a meta-analysis and network meta-analysis to obtain the relevant effect estimates for the model.
We obtained the data used for the meta-analysis and network meta-analysis from the earlier publication (9). We conducted the meta-analysis using a random-effects model to estimate the proportion of patients achieving response and remission using the “metafor” package, version 1.9-9 (Reference Viechtbauer22), for R, version 3.3.1 (R Foundation, Vienna, Austria). We then estimated the odds ratios of response and remission for the rTMS and sham groups versus ECT by means of a frequentist random-effects network meta-analysis using the “netmeta” R package, version 0.9-2 (Reference Rucker23). Based on the pooled proportions of response and remission in the ECT group and the aforementioned odds ratios, we derived the probabilities of response and remission in the rTMS and sham groups.
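The last step, deriving a probability from a reference-group proportion and an odds ratio, can be sketched as follows. The function name is hypothetical, and the example applies the standard odds conversion to the point estimates for response reported in the Results (pooled proportion of 0.544 for ECT, odds ratio of 0.563 for rTMS versus ECT). In the PSA this conversion would be applied to each sampled pair of values, so the figure below is illustrative and need not equal any value tabulated in the study.

```python
def prob_from_odds_ratio(p_reference: float, odds_ratio: float) -> float:
    """Convert a reference probability and an odds ratio into a probability."""
    if not 0 < p_reference < 1:
        raise ValueError("p_reference must lie strictly between 0 and 1")
    odds_reference = p_reference / (1 - p_reference)
    odds_comparison = odds_ratio * odds_reference
    return odds_comparison / (1 + odds_comparison)


# Point estimates for response taken from the Results section.
p_response_ect = 0.544
or_response_rtms_vs_ect = 0.563

# Works out to roughly 0.40 for rTMS.
p_response_rtms = prob_from_odds_ratio(p_response_ect, or_response_rtms_vs_ect)
```

Working on the odds scale keeps the derived probability inside the interval from 0 to 1 for any positive odds ratio, which is one reason to prefer it over applying a relative risk directly.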
All other model inputs used in our analyses were obtained from the original report (9). We conducted a PSA by running 10,000 Monte Carlo simulations to predict the costs and QALYs for rTMS, ECT, and the sham intervention. First, we used conventional cost-effectiveness analysis methods to compare rTMS with ECT. This included the use of an ICER (average incremental cost ÷ average incremental QALY) and a CEAC.
Next, we used the new non-inferiority and cost-effectiveness framework. We used the QALYs associated with rTMS, ECT, and the sham intervention groups to estimate the fraction of effectiveness preserved by rTMS. We presented the results as means (standard deviations [SDs]) and used a 3D graph to show the probabilities at which rTMS is cost-effective and non-inferior given a threshold of effectiveness preserved (ranging from 0 to 100 percent) and a WTP (ranging from $0 to $100,000 per QALY gained). We also conducted additional sensitivity analyses to examine the joint impact of response and remission in the rTMS group versus the ECT group (using odds ratios) on the results.
Results
Network Meta-analysis of rTMS
Our meta-analysis showed that the pooled probabilities of response and remission were 0.544 (95 percent confidence interval [CI]: 0.413, 0.670) and 0.366 (95 percent CI: 0.171, 0.617), respectively, in the ECT group (see Table 1). Our network meta-analysis showed that the pooled odds ratios of response were 0.563 (95 percent CI: 0.209, 1.515) for rTMS versus ECT, and 0.131 (95 percent CI: 0.040, 0.427) for the sham intervention versus ECT. Although this analysis did not provide a high degree of confidence that rTMS is less effective than ECT, 88 percent of the Monte Carlo simulations (assuming that the odds ratio of rTMS follows a log-normal distribution) were associated with lower effect (odds ratio < 1). The pooled odds ratios of remission were 0.390 (95 percent CI: 0.142, 1.069) for rTMS versus ECT, and 0.126 (95 percent CI: 0.038, 0.419) for the sham intervention versus ECT.
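The share of simulations with an odds ratio below 1 can be approximated analytically from the reported point estimate and confidence interval. The sketch below assumes, as the text does, that the odds ratio is log-normally distributed; recovering the standard error from the 95 percent CI with a z value of 1.96 is an additional assumption of the example, and the function name is hypothetical.

```python
import math


def prob_or_below_one(or_hat: float, ci_low: float, ci_high: float, z: float = 1.96) -> float:
    """P(OR < 1) when log(OR) is normal with mean log(or_hat)."""
    mu = math.log(or_hat)
    # Standard error on the log scale, recovered from the CI width.
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)
    # Normal CDF evaluated at log(1) = 0.
    return 0.5 * (1 + math.erf((0 - mu) / (se * math.sqrt(2))))


# Response, rTMS versus ECT: odds ratio 0.563 (95 percent CI: 0.209, 1.515).
p_rtms_less_effective = prob_or_below_one(0.563, 0.209, 1.515)
```

The analytic value comes out close to the 88 percent of Monte Carlo simulations reported above; small differences are to be expected from simulation noise and from rounding in the published estimates.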
CI, confidence interval; ECT, electroconvulsive therapy; QALY, quality-adjusted life-year; rTMS, repetitive transcranial magnetic stimulation; SD, standard deviation.
The Conventional Cost-Effectiveness Analysis
We presented the results of the Monte Carlo simulations in a conventional CEAC (Figure 1), which shows that the probability of rTMS being cost-effective decreases as the WTP increases. The probabilities were 100 percent, 39 percent, and 14 percent at WTPs of $0, $50,000, and $100,000 per QALY gained, respectively. Further results can be found in Supplementary Material 2, Supplementary Table 1, and Supplementary Figure 1. In Supplementary Figure 1, most of the dots are located in quadrant III, indicating that rTMS was found to be less effective and less costly than ECT in most simulations. Compared with ECT, rTMS leads to an average incremental QALY of −0.0175 (SD: 0.0092) and an average incremental cost of −$738 (SD: $228), corresponding to an ICER of $42,193 saved per QALY lost. (Note: When both the incremental cost and QALY values are negative, the ICER is interpreted as “cost saved per QALY lost” instead of “cost incurred per QALY gained.”) Importantly, the direction of the cost-effectiveness judgment for an ICER at a given threshold in quadrants I and III is opposite (i.e., ECT is cost-effective compared with rTMS at a threshold of $50,000 per QALY gained).
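The reversal of the decision rule between quadrants I and III can be made concrete with a short Python sketch. The function name and the handling of cases with no trade-off are choices made for this illustration. The inputs are the rounded averages reported above, so the ratio computed here (about $42,171) differs slightly from the published ICER of $42,193, presumably because the published value was calculated from unrounded averages.

```python
def new_is_cost_effective(delta_q: float, delta_c: float, wtp: float) -> bool:
    """Apply the ICER decision rule appropriate to each quadrant."""
    if delta_q > 0 and delta_c > 0:
        # Quadrant I: cost incurred per QALY gained must be BELOW the WTP.
        return delta_c / delta_q < wtp
    if delta_q < 0 and delta_c < 0:
        # Quadrant III: cost saved per QALY lost must be ABOVE the WTP.
        return delta_c / delta_q > wtp
    # No trade-off: adopt only if at least as good on both axes
    # and strictly better on one (dominance).
    return delta_q >= 0 and delta_c <= 0 and (delta_q > 0 or delta_c < 0)


# Rounded averages for rTMS versus ECT reported above.
delta_q_rtms = -0.0175
delta_c_rtms = -738.0
icer_rtms = delta_c_rtms / delta_q_rtms  # positive, although both inputs are negative

# Same ratio, opposite conclusions depending on the quadrant.
sw_at_50k = new_is_cost_effective(delta_q_rtms, delta_c_rtms, 50_000)    # False
ne_at_50k = new_is_cost_effective(-delta_q_rtms, -delta_c_rtms, 50_000)  # True
```

In both quadrants the rule is equivalent to asking whether the NMB is positive, which is why the NMB avoids the ambiguity of a bare ICER value.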
The New Non-inferiority and Cost-Effectiveness Framework
On average, rTMS preserved 60 percent (SD: 20 percent) of the active treatment effectiveness (measured in QALYs) of ECT (see Table 2). The average NMB was −$137 (SD: $520) for rTMS relative to ECT at a WTP of $50,000 per QALY gained. The probabilities of rTMS preserving 50 percent, 75 percent, and 90 percent of effectiveness were 70 percent, 23 percent, and 7 percent, respectively. Figure 2 shows that the probability of rTMS being non-inferior and cost-effective increased as both the threshold of effectiveness preserved and WTP decreased; the highest probabilities of non-inferiority and cost-effectiveness were reached when the threshold of effectiveness preserved was 0 percent and the WTP was $0. Given that the cost of rTMS was lower than that of ECT for almost all simulations, the effectiveness preserved largely determined the probability of rTMS being cost-effective and non-inferior when the WTP was $0.
ECT, electroconvulsive therapy; QALY, quality-adjusted life-year; rTMS, repetitive transcranial magnetic stimulation.
Note: Results are expressed as means (standard deviations).
a All costs are presented in 2016 Canadian dollars.
We present the modified CEAC with WTPs of $0, $50,000, and $100,000 per QALY gained in Supplementary Figure 2 of Supplementary Material 3. When a 75 percent threshold of effectiveness preserved and a positive NMB are used as the decision criteria, the probabilities of rTMS being non-inferior and cost-effective were 23 percent, 21 percent, and 13 percent at WTPs of $0, $50,000, and $100,000 per QALY gained, respectively (Supplementary Table 2). Supplementary Table 3 provides the results of the sensitivity analyses.
Discussion
Two previous economic evaluations of rTMS also found that rTMS was associated with lower costs and QALYs compared with ECT; in these evaluations, the authors elected to switch rTMS to the standard care and ECT to the new intervention when they estimated the ICER (Reference Zhao, Tor, Khoo, Teng, Lim and Mok24;Reference Kozel, George and Simpson25). Both studies reported a high ICER for ECT versus rTMS (S$311,024 [Singapore dollars] and $460,031 USD per QALY gained); the authors, therefore, concluded that rTMS was cost-effective compared with ECT. However, if rTMS was required to preserve at least 75 percent of the effectiveness of ECT, the probability of rTMS being cost-effective would become relatively low in the new decision framework.
To date, several health technology assessment agencies have produced both negative and positive funding recommendations for the use of rTMS for the treatment of depression, with different considerations provided (26–28). For instance, Australia's Medical Services Advisory Committee recommended against publicly funding rTMS owing to uncertainty regarding the clinical effect and cost-effectiveness of this intervention (Reference Galletly, Clarke and Fitzgerald27). Conversely, the Canadian Network for Mood and Anxiety Treatments made a positive funding recommendation for rTMS over ECT and other neurostimulation treatment comparators as first-line treatment for patients who have failed at least one antidepressant medication (Reference Milev, Giacobbe and Kennedy29). Furthermore, Health Quality Ontario recommended publicly funding rTMS for only a subset of patients for whom ECT is not an option; this recommendation thus considers the relatively higher patient preference and safety profile for rTMS versus ECT (26).
An intervention that is only slightly inferior (i.e., by a clinically acceptable difference) but cost-saving compared with alternative interventions would have a higher likelihood of being recommended for public funding based on our new framework. Our decision-making framework can also be extended to evaluations of the equivalence of effectiveness and/or cost jointly. For instance, if the incremental effectiveness is small, one could examine the equivalence between treatments in QALYs (e.g., the probability that the ratio of effectiveness between two interventions falls in a prespecified interval, such as 90–110 percent) and incremental costs. A similar framework has been suggested for economic evaluations conducted alongside equivalence trials (Reference Bosmans, de Bruijne, van Hout, Hermens, Ader and van Tulder30).
Currently, our framework considers only non-inferior interventions to be cost-effective. Literature exists regarding the decision rules for evaluating interventions that are substantially less effective. For example, O'Brien and colleagues introduced a “kinked cost-effectiveness threshold” to reflect the differences between individuals’ willingness to accept (a reduction in health) and their willingness to pay (for an increase in health), such that the threshold value in the southwest quadrant of the cost-effectiveness plane is greater than that of the northeast quadrant (Reference O'Brien, Gertsen, Willan and Faulkner15). Some authors have questioned the concept of a kinked cost-effectiveness threshold when maximizing health in society is the underlying goal of pharmacoeconomics (Reference Klok and Postma31;Reference Dowie32).
However, new developments in this concept continue to be made (Reference Eckermann14). For instance, Eckermann recently introduced decision rules for determining an appropriate threshold in the southwest quadrant of the cost-effectiveness plane (Reference Eckermann14). In addition, Kent and colleagues published an article on the concept of “acceptability trials,” whereby a maximally acceptable difference replaces the minimally important difference or non-inferiority margin frequently used in non-inferiority trials (Reference Kent, Fendrick and Langa33). We note the attractiveness of such concepts under severely constrained budgets; however, the acceptance and diffusion of interventions that exceed a clinically acceptable difference compared with standard care would likely face several ethical and political challenges.
In conclusion, the proposed new probabilistic decision framework overcomes the limitations of conventional approaches to estimate the cost-effectiveness of less effective interventions. It provides a different perspective for decision making with considerations of both non-inferiority and WTP thresholds. Decision makers could consider using the new framework in addition to, or instead of, the traditional framework. Doing so might aid in making recommendations or decisions about new treatments that are slightly less effective than the standard of care.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0266462319000576
Conflicts of interest
The authors declare that there are no conflicts of interest.