Those who use health economic evaluations for their decision making, such as health insurers or governmental reimbursement agencies, can gauge their quality by choosing from a wide array of guidelines, checklists, questionnaires, and scoring cards that have been published (Reference Philips, Bojke and Sculpher1;Reference Vemer, Corro Ramos and van Voorn2). However, these instruments often focus on methodologic aspects, to be rated by technically adept users, and many of these checklists and questionnaires do not address the “relevance” of economic evaluations (Reference McCabe and Dixon3;Reference Caro, Eddy and Kan4). Relevance refers to the applicability of the study results to actual clinical practice and the context, such as a country or state, of those who are directly interested in its results (Reference Caro, Eddy and Kan4). To be relevant, economic evaluations should, for example, include all clinical outcomes that are of importance with regard to the illness and treatment(s) under consideration, and use cohorts that reflect the target population of the drug.
Many health economic evaluations are model based, and are equipped to deal with issues of relevance by the nature of modeling. Data on clinical outcomes and costs used as input for model-based economic evaluations (MBEEs) may be missing or clouded by uncertainty. However, MBEEs are especially equipped to deal with missing data or uncertainties, for example, by incorporating data from other sources, making rationally founded assumptions, applying uncertainty distributions and implementing scenario analyses (Reference Grutters, Van Asselt and Chalkidou5;Reference Briggs, Weinstein and Fenwick6). These strengths ensure that MBEEs can yield information that reaches beyond what empirical studies can provide, and they could be used to make outcomes of MBEEs more relevant for decision makers (Reference McCabe and Dixon3;Reference O'Brien7;Reference Neyt, Cleemput and Thiry8).
As an example, we consider the case of dabigatran, a novel oral anticoagulant (NOAC) used for the prevention of stroke in patients with atrial fibrillation (AF). Dabigatran was the first NOAC and reached the market in 2010. Three NOACs quickly followed (apixaban, edoxaban, and rivaroxaban). In terms of efficacy, phase III trials on NOACs show that they are noninferior compared with the traditionally used vitamin K antagonists (VKAs), and are associated with a smaller risk of intracranial hemorrhage, a detrimental adverse event associated with anticoagulation (Reference Connolly, Ezekowitz and Yusuf9–Reference Giugliano, Ruff and Braunwald12). The evidence on the (cost-)effectiveness of NOACs has a great impact on public health and collective resources, because AF is a relatively common chronic disease with a large impact on quality of life, and one that is expected to become even more prevalent in the future (Reference Lozano, Naghavi and Foreman13;Reference Ball, Carrington and McMurray14).
To effectively assess the relevance of MBEEs on dabigatran, we need to distinguish between two different kinds of relevance factors: (i) factors that are independent of a decision context, and (ii) factors that are dependent on such a context. Context-independent factors refer to clinical outcomes and treatment durations; these are dependent on the illness and treatments, and not the context that surrounds them. Context-dependent factors relate to a country's or region's target population and clinical environment. In this systematic review, we assess all MBEEs on dabigatran as compared to VKAs on the basis of context-independent factors, and MBEEs performed for the United States regarding context-dependent factors. We also evaluate whether MBEEs on dabigatran have become more relevant through time, as subgroup analyses of the phase III trial on the drug have appeared since its market approval.
METHODS
Development of Questionnaire
To systematically assess the relevance of the MBEEs on dabigatran, we used part of the questionnaire developed by Caro et al., which is the product of a recent and collaborative initiative, specifically an ISPOR-AMCP-NPC Good Practice Task Force Report, to aid decision makers in making optimal resource allocation decisions (Reference Caro, Eddy and Kan4). This questionnaire was developed to evaluate economic evaluations on the basis of their relevance and credibility, and the final verdict for each study depends on how the answers to these questions run through a flow chart (with three possible outcomes of the flowchart: a fatal flaw occurred; results are, in general, unfavorable; and no flaws or “gaps” were found).
For the purpose of this systematic review, we used only those questions from Caro et al. that pertained to relevance, and not those pertaining to credibility. With regard to relevance, four general questions need to be answered: (i) “Is the population relevant?”; (ii) “Are any critical interventions missing?”; (iii) “Are any relevant outcomes missing?”; and (iv) “Is the context (settings and circumstances) applicable?” (Reference Caro, Eddy and Kan4). We specified these generic items to produce a questionnaire applicable to the case of oral anticoagulation for the prevention of stroke in patients with AF.
However, before we could specify such a questionnaire, we needed to infer what the relevant population characteristics, interventions, clinical outcomes, and contextual factors were. The relevant clinical outcomes and population characteristics stem from the RE-LY trial, a phase III randomized controlled trial that was performed by the manufacturer of dabigatran before its market introduction for AF in 2010. This trial was used because it provided us with comprehensive, reliable, and detailed information from a single study population (consisting of over 18,000 participants). See Supplementary Tables 1 and 2 for more details on the background information used to establish the relevance of clinical outcomes and population characteristics.
Our questionnaire was divided into two categories: questions pertaining to context-independent factors and questions related to context-dependent factors. Context-independent factors are not dependent on the context, such as the country, region or healthcare system of interest, but rather on the illness and treatment itself. We also analyzed context-dependent factors with regard to MBEEs performed for the United States. All studies that met the criteria of the literature search were systematically addressed by using the questionnaire that was developed (Table 1). A more detailed explanation of how this questionnaire was developed can be found in Supplementary Table 3.
Table 1. Questionnaire Used to Assess the Relevance of Model-Based Economic Evaluations Comparing the Cost-Effectiveness of Dabigatran with That of VKAs in the Prevention of Stroke for Patients with AF

a Renal function was not taken into account as a co-morbidity, because there is no information on the prevalence of renal impairment in the target population.
b If a subgroup analysis is performed on the basis of this factor, this should be answered with “yes”.
c If a subgroup analysis is performed on the basis of the CHADS2 score, this should be answered with “yes”.
AF, atrial fibrillation; DBG, dabigatran; GIH, gastrointestinal hemorrhage; INR, international normalized ratio; MI, myocardial infarction; nGIH-ECH, non-gastrointestinal extracranial hemorrhage; TIA, transient ischemic attack; U.S., United States; VKA, vitamin K antagonist.
Literature Search and Review of the Studies
A systematic literature search was performed in Embase and PubMed (including Medline), using the search command: “dabigatran AND ‘atrial fibrillation’ AND (‘cost effectiveness’ OR economic evaluation)”. Two reviewers (H.R. and J.G.) independently selected studies on the basis of title and abstract. Economic evaluations were included if they met the following requirements: (i) a full body text in English was available; (ii) dabigatran was compared with a VKA for stroke prevention in patients with AF; (iii) the modeled cohort could be divided into subgroups but was not dedicated to one or more specific subgroup(s); and (iv) the study yielded either an incremental cost-effectiveness ratio (ICER) or information that allowed for calculation of an ICER.
H.R. assessed all included economic evaluations on the basis of the developed questionnaire. J.G. independently assessed a random subset of studies, with the aim of finding differences in interpretation. Differences in interpretation were resolved through discussion with the other authors.
Analyses
Context-independent factors are related to the relevant clinical outcomes of treatment with dabigatran or VKAs for the prevention of stroke in patients with AF, as well as the appropriate time horizon of these treatments (Reference Caro, Eddy and Kan4). Relevant clinical events consisted of: ischemic stroke, systemic embolism, pulmonary embolism, myocardial infarction, intracranial hemorrhage, major extracranial hemorrhage, minor hemorrhage, and dyspepsia. We also evaluated whether studies accounted for the fact that some events lead to a permanent or long-lasting change in quality of life and/or healthcare costs (ischemic stroke, intracranial hemorrhage, myocardial infarction, and dyspepsia). When AF patients are eligible for oral anticoagulants, they usually take them for the remainder of their life; a lifetime horizon was therefore considered appropriate.
The context-dependent factors can be subsumed under the following categories distinguished by Caro et al.: the target population, the clinical environment, and relevant interventions (Reference Caro, Eddy and Kan4). The target population of the United States was specified with regard to socio-demographic factors (age, gender, ethnicity), medical histories (prior stroke, prior myocardial infarction), behavior (drug discontinuation, treatment adherence), and co-morbidities (diabetes mellitus, heart failure, hypertension). The clinical environment pertained to the frequency of monitoring of both treatments, and critical interventions to the time in therapeutic range of VKAs and the dose of dabigatran (which was 150 mg twice daily in the United States).
As we aimed to find out how the context-dependent factors in the MBEEs on dabigatran compared with the actual context of the United States, we needed to specify the context of the United States. Data from studies that used samples from Medicare and private insurance plans were used to establish the prevalence of different socio-demographic characteristics and medical conditions among the U.S. target population (Reference Lauffenburger, Farley and Gehi15–Reference Graham, Reichman and Wernecke17). We specified these characteristics as ranges, which were used as the backdrop for our systematic review of context-dependent factors (Table 2).
Table 2. Characteristics of Patients with Atrial Fibrillation, Eligible for Dabigatran or Vitamin K Antagonists: the RE-LY Study Population versus Patients Diagnosed with Atrial Fibrillation in the United States

a Given for each factor is the range or average value found in the literature (Reference Lauffenburger, Farley and Gehi15–Reference Graham, Reichman and Wernecke17).
CAD, coronary artery disease; CC, creatinine clearance; CHADS2 score, risk score based on the presence of Congestive heart failure, Hypertension, Age ≥ 75 years, Diabetes mellitus, and prior Stroke or TIA; MI, myocardial infarction; TIA, transient ischemic attack.
We evaluated whether the MBEEs on dabigatran have become more relevant through time, as subgroup analyses of the phase III trial on the drug and observational studies have appeared since its market approval, which could be used to make data input more relevant. For this purpose, we analyzed what the average share of incorporated relevance factors was for each year. When MBEEs pertained to the U.S. context, shares were calculated over both context-independent and -dependent factors. When MBEEs were performed for other countries, the shares were only calculated over the context-independent factors.
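The share calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' actual procedure: the function names are hypothetical, the factor counts (ten context-independent, seventeen context-dependent) are taken from the Results, and two details that the text leaves open are filled in by assumption, namely that U.S. studies are scored over the pooled set of twenty-seven factors and that factors rated "unknown" count as not incorporated.

```python
# Factor counts as reported in the Results section of this review.
N_INDEPENDENT = 10
N_DEPENDENT = 17


def relevance_share(independent_included, dependent_included=None):
    """Share of relevance factors incorporated by a single study.

    Pass dependent_included=None for non-U.S. studies, which are scored
    on context-independent factors only. For U.S. studies the share is
    taken over the pooled 27 factors (an assumption of this sketch).
    """
    if not 0 <= independent_included <= N_INDEPENDENT:
        raise ValueError("independent_included out of range")
    if dependent_included is None:
        return independent_included / N_INDEPENDENT
    if not 0 <= dependent_included <= N_DEPENDENT:
        raise ValueError("dependent_included out of range")
    return (independent_included + dependent_included) / (N_INDEPENDENT + N_DEPENDENT)


def mean_share_by_year(scored_studies):
    """Average the per-study shares within each publication year.

    scored_studies is an iterable of (year, share) pairs.
    """
    by_year = {}
    for year, share in scored_studies:
        by_year.setdefault(year, []).append(share)
    return {year: sum(s) / len(s) for year, s in sorted(by_year.items())}


# Hypothetical scores, for illustration only (not data from the review).
example = [
    (2011, relevance_share(5, 6)),   # a U.S. study: 11 of 27 factors
    (2011, relevance_share(7)),      # a non-U.S. study: 7 of 10 factors
    (2012, relevance_share(8)),      # a non-U.S. study: 8 of 10 factors
]
print(mean_share_by_year(example))
```

Pooling the two factor sets weights every factor equally; averaging the two category shares instead would give the ten context-independent factors more weight per factor, so the choice should be stated explicitly when the method is reused.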
RESULTS
Literature Search and Review
The literature search yielded a total of 195 hits with fifty-five duplicates. From the remaining 140 hits, thirty-eight studies were selected on the basis of title and abstract. The full texts of the studies in this set were assessed, leading to inclusion of twenty-nine studies (Figure 1). Studies were published between 2011 and 2015. Six studies were performed for the United States (Reference Freeman, Zhu and Owens18–Reference Clemens, Peng and Brand23), three for Canada (Reference Sorensen, Kansal and Connolly24–Reference Singh, Micieli and Wijeysundera26), seventeen for European countries (Reference Pink, Lane and Pirmohamed27–Reference Kongnakorn, Lanitis and Annemans43), and three for other countries (South Africa, Taiwan, and Singapore) (Reference Bergh, Marais and Miller-Janson44–Reference Wang, Xie and Kong46).

Figure 1. Flow chart of the search process and study exclusion. * Reviews are not restricted to reviews of cost-effectiveness analyses, and include conference reviews and reviews on anticoagulation. † Uetsuka Y. Cost-effectiveness of oral anticoagulant in patients with atrial fibrillation. Japanese Journal of Clinical Pharmacology and Therapeutics. 2011; 42: 321–32. Unavailable in all Dutch and German university libraries.
Relevance in Terms of Context-Independent Factors
Context-independent factors are factors that are universally relevant, regardless of the region or healthcare system of interest, in this case health outcomes associated with AF and the use of oral anticoagulation through either VKAs or dabigatran, as well as the duration of oral anticoagulant treatment (which is permanent in the case of atrial fibrillation). Ten context-independent factors were defined (Table 3). On average, 54 percent of these factors were included per study. On average, 47 percent of these factors were incorporated in studies performed for the United States, 67 percent in studies performed for Canada, 55 percent for Europe, and 50 percent for other countries.
Table 3. Overview of Whether the Model-Based Economic Evaluations on Dabigatran versus VKAs for the Prevention of Stroke in Patients with AF Are Relevant with Respect to Different Context-independent Factors

ECH, extracranial hemorrhage; GI, gastrointestinal; TIA, transient ischemic attack.
Cell legend: Incorporated; Not (completely) incorporated; Unknown.
The study that included the most context-independent factors is the one from Zheng et al. (Reference Zheng, Sorensen and Gonschior41), which was performed for the United Kingdom, and only excluded pulmonary embolism and dyspepsia from the model. The three studies with the lowest inclusion of relevant context-independent factors (Reference Shah and Gage19;Reference Krejczy, Harenberg and Marx36;Reference Wang, Xie and Kong46) covered only 30 percent of factors, although it was unclear whether one study adequately included dyspepsia in the model (Reference Shah and Gage19).
All studies included the health outcomes ischemic stroke and intracranial hemorrhage in their model. In all but one study (Reference Krejczy, Harenberg and Marx36), a lifetime horizon for treatment with dabigatran and VKAs was applied. Only in two studies was pulmonary embolism taken into account (Reference Coyle, Coyle and Cameron25;Reference Lanitis, Cotte and Gaudin37).
Relevance in Terms of Context-Dependent Factors
The context-dependent factors are dependent on the country of interest, i.e., patient population characteristics, relevant interventions and the clinical environment (such as the method and frequency of drug monitoring). We defined seventeen context-dependent factors. The United States formed the context, or country of interest, in this review, and six of the selected studies in this review were performed for the United States. When set against the actual U.S. context, 37 percent of the seventeen context-dependent factors were incorporated per study (Supplementary Table 4).
The study that included the most context-dependent factors is the one from Freeman et al. (2011) (Reference Freeman, Zhu and Owens18), which was the first MBEE that was performed for the United States. The study with the lowest inclusion of relevant context-dependent factors was performed by Harrington et al. (2013) (Reference Harrington, Armstrong and Nolan22). All U.S. studies used cohorts that reflected the study population of the RE-LY trial (the phase III trial on dabigatran), which suggests that all MBEEs used a baseline age which reflects the U.S. target population, but also, in contrast, that none of the modeled cohorts reflect the U.S. target population in terms of gender, ethnicity, share of patients with a previous myocardial infarction, the rate and/or costs of treatment discontinuation, and treatment adherence.
Relevance over Time
When taking both context-independent and -dependent factors into account for the U.S. MBEEs, and context-independent factors for the non-U.S. MBEEs, the average share of factors that are included per study is 56 percent. The share of relevant factors per study did not increase over time: shares were 58 percent in 2011 (four studies: 18;19;24;27), 61 percent in 2012 (four studies: 20;28–30), 59 percent in 2013 (nine studies: 21;22;25;26;31–34;44), 50 percent in 2014 (ten studies: 23;35–41;45;46), and 55 percent in 2015 (two studies: 42;43).
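As a rough consistency check, the reported overall averages can be recomputed from the group-level figures given in the text. The sketch below is an illustration under stated assumptions: the helper `weighted_mean` is a hypothetical name, the inputs are the rounded percentages and study counts quoted in this section and in the section on context-independent factors, and because those percentages are already rounded the recomputed means are approximate.

```python
def weighted_mean(groups):
    """Study-weighted mean of group percentages.

    groups is a list of (number_of_studies, average_percentage) pairs.
    """
    total_studies = sum(n for n, _ in groups)
    return sum(n * pct for n, pct in groups) / total_studies


# (studies, average share in percent) per publication year, 2011-2015,
# as reported above (all factors for U.S. studies, context-independent
# factors for non-U.S. studies).
by_year = [(4, 58), (4, 61), (9, 59), (10, 50), (2, 55)]

# (studies, average share in percent) per region for context-independent
# factors: United States, Canada, Europe, other countries.
by_region = [(6, 47), (3, 67), (17, 55), (3, 50)]

print(sum(n for n, _ in by_year), round(weighted_mean(by_year), 1))
print(sum(n for n, _ in by_region), round(weighted_mean(by_region), 1))
```

Both groupings account for all twenty-nine included studies, and the study-weighted means round to the reported 56 percent (by year) and 54 percent (by region).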
DISCUSSION
A little over half of the factors that were deemed relevant for MBEEs on dabigatran versus VKAs for the prevention of stroke in patients with AF were found to be appropriately included in the twenty-nine MBEEs analyzed in this systematic review. Although we focused on dabigatran as an example, our methods could be applied to any other intervention or disease area to assess whether relevant factors have been considered. Methodological flaws, which receive much attention from health economic researchers but were not reviewed here, are not the only threat: such a paucity in relevance will also undercut the usefulness of MBEEs for decision makers. Perhaps an even more disconcerting finding of this study is that no improvement in relevance was observed over time, although more recent subgroup analyses or other empirical findings could arguably have been incorporated in later studies to enhance their relevance. It is unclear to what extent a lack of relevance may have affected the cost-effectiveness results of the reviewed studies, although it is clear that there are extremely large differences in results.
Context-dependent relevance factors are strongly related to the concept of “external validity” (Reference Rothwell47). In many cases, MBEEs investigate the cost-effectiveness of novel interventions and directly derive their input from the phase III trial or other randomized clinical trials on the intervention. However, results from such trials have only “internal validity,” as they pertain to a selected study population studied under trial-like circumstances, and, therefore, often lack external validity, that is, results may not apply for the target population and within a context that is true to real-world conditions (Reference O'Brien7;Reference Neyt, Cleemput and Thiry8). When an MBEE suffers from a lack of relevance, its results may not be applicable to the decision context under consideration, and the studied intervention may, therefore, receive undeserved priority, or lack of priority, over other healthcare interventions. The resulting suboptimal allocation of treatments and associated resources may cause a loss rather than a gain in public health.
Therefore, we would like to emphasize that health economic researchers should promote the relevance of their MBEEs by specifying and implementing both context-independent and context-dependent factors, which can be done by incorporating data from other sources, making rationally founded assumptions, applying uncertainty distributions and implementing subgroup and/or scenario analyses (Reference Grutters, Van Asselt and Chalkidou5;Reference Briggs, Weinstein and Fenwick6). O'Brien, however, stipulated that there may be downsides to making an MBEE more externally valid with such instruments. Sometimes, in the attempt to reach external validity, researchers create a “Frankenstein's monster”: one “pulls together the many needed pieces of information from multiple sources and then stitches them together into a (hopefully) cohesive whole” (Reference O'Brien7). It is possible that when one creates a “Frankenstein” model, the multiple sources pertain to different, incomparable contexts, and that some assumptions may be inadequate. Researchers should be aware of this and carefully and critically select the sources that are useful for their MBEE.
With regard to the MBEEs on dabigatran performed for the United States, the briefing document from its manufacturer (48) is additionally informative for model inputs. Moreover, many subgroup analyses from the phase III trial on dabigatran have appeared since its market introduction (Reference Diener, Connolly and Ezekowitz49–Reference Brambatti, Darius and Oldgren57), and results from these studies could have been incorporated to make MBEEs more relevant without having to implement data from other sources or make, perhaps debatable, assumptions.
In our analysis, we investigate whether context-independent and context-dependent factors have or have not been considered in the economic evaluation. Unfortunately, we could not provide an indication of how important these factors are to the estimate of cost-effectiveness of dabigatran. For example, would the estimate of cost-effectiveness differ meaningfully had a model considered the differential rate of dyspepsia or systemic embolism or any of the numerous extracranial hemorrhages? Or, would a decision maker have made a different decision had these factors been considered? While MBEEs may have excluded some factors, this may be reasonable if they make no material difference to the estimate of cost-effectiveness.
However, it is impossible to make generic statements about the extent and direction of potential bias when excluding each factor, because it strongly depends on the results of the economic evaluation and further assumptions. A simple example to illustrate this point is the time horizon. If the (long-term) costs of dabigatran are higher, a longer time horizon will favor VKAs. If the model results in cost savings for dabigatran, a longer time horizon will favor dabigatran. In the latter case, the difference may be meaningful if dabigatran is not cost-effective in the MBEE; if dabigatran is already highly cost-effective, the difference will not be meaningful. In addition, it was not our intention to score or grade the MBEEs. Even if fewer than half of the factors are included in an MBEE, the results may still be correct. However, we do want to raise awareness that external validity is hampered in such cases.
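The time-horizon example above can be made concrete with a small sketch. It is purely illustrative and rests on assumptions that are not taken from any of the reviewed studies: a constant annual incremental cost of dabigatran relative to VKAs (plus or minus 500 currency units), a 3 percent annual discount rate, and the hypothetical function name `cumulative_incremental_cost`. Health effects are ignored, so this shows only the cost side of the argument.

```python
def cumulative_incremental_cost(annual_incremental_cost, horizon_years,
                                discount_rate=0.03):
    """Discounted incremental cost of dabigatran versus VKAs.

    Sums a constant annual incremental cost over the time horizon,
    discounting year t by (1 + discount_rate) ** t. All inputs are
    hypothetical and for illustration only.
    """
    return sum(
        annual_incremental_cost / (1 + discount_rate) ** year
        for year in range(horizon_years)
    )


horizons = (5, 10, 20)
scenarios = (
    ("dabigatran more costly", 500.0),   # hypothetical value
    ("dabigatran cost-saving", -500.0),  # hypothetical value
)
for label, annual in scenarios:
    totals = [round(cumulative_incremental_cost(annual, h)) for h in horizons]
    print(label, dict(zip(horizons, totals)))
```

Under these assumptions the sign of the incremental cost never changes, but its magnitude grows with the horizon, so a longer horizon amplifies whichever treatment is already favored on costs. Whether that shift changes a decision depends on how close the result is to the decision threshold, which is the point made in the text.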
Another limitation of our systematic review is that we specifically focused on dabigatran. This choice was based on the fact that dabigatran was the first NOAC that reached the market, and was, therefore, extensively researched against VKAs. Dabigatran was also chosen because we did not want to focus on a complete class of drugs, but rather on a single pharmaceutical. A consequence, however, of choosing one NOAC is that it remains unknown whether MBEEs on other NOACs show similar gaps in relevance, or less or more so. Also, we did not include drug–drug interactions (e.g., with anti-arrhythmic agents, statins, anti-platelets, or selective serotonin reuptake inhibitors) (Reference Andrus58–Reference Quinn, Singer and Chang60), as data on medication use by AF patients in the United States were entirely missing.
In addition, we chose to include only MBEEs that are readily available to decision makers, including clinicians, and therefore only included MBEEs published in scientific papers. It may well be that MBEEs published in HTA reports are more targeted to a decision and, therefore, include more relevance factors. Lastly, a limitation is that we focused on relevance only, whereas the questionnaire from Caro et al. addresses both relevance and credibility. We could, therefore, not use the flowchart as it was intended. We made this choice, however, because we wanted to bring the concept of relevance to the attention of health economics researchers and decision makers.
To summarize then, we propose that economic researchers need to specify the context for which the MBEE's results are supposed to be relevant, and data input and model assumptions need to be consistent with this choice. To this end, researchers need to answer two basic questions: first, what are the relevant clinical outcomes, treatment durations, characteristics of the target population, and healthcare characteristics of the intervention(s) under consideration? Second, how might these impact on health outcomes and costs incurred? We believe that, to answer these questions, economic researchers could use the relevance checklist as defined by Caro et al., in close collaboration with clinicians as they are best equipped to answer the questions related to both internal and external validity (Reference Grutters, Seferina and Tjan-Heijnen61).
We also believe that providing open access to health economic models by their makers could prove useful for policy makers and health economists, mainly because this would enable updating of the model when new information on clinical event rates or costs becomes available, and may help to prevent “wasteful research” (Reference Chalmers, Bracken and Djulbegovic62), as researchers worldwide would not need to develop a separate model for their country and could deliberate about ways to improve the model. It is also important that policy decision makers, journal editors and peer reviewers acknowledge that economic evaluations may suffer from a lack of relevance with regard to clinical practice, also in reimbursement dossiers, and assess whether this lack of relevance can be dealt with, before accepting such a study as the basis of a decision or for publication.
In conclusion, model-based economic evaluations on dabigatran vs. VKAs for the prevention of stroke in patients with atrial fibrillation show a paucity in relevance, not only because almost all of them are based solely on the phase III trial on dabigatran, which has limitations with regard to external validity, but also because researchers failed to include all relevant clinical events in their models. The relevance of these studies did not increase over time, although new studies after the phase III trial on dabigatran provided ample opportunity to increase relevance. Economic evaluations that are not representative of clinical practice may lead to a suboptimal allocation of resources, and cause a loss rather than a gain in the health of patients. We, therefore, urge economic researchers to optimize the relevance of their economic evaluations, possibly with the help of clinicians. This should be done by ensuring that at least all context-independent relevance factors are adequately incorporated and by promoting the accurate use of context-dependent factors, which may be possible without resorting to the use of an artificial “Frankenstein” model. Policy makers, clinicians, and journal editors/reviewers are advised to critically assess the relevance of health economic evaluations.
SUPPLEMENTARY MATERIAL
The supplementary material for this article (Supplementary Tables 1–4) can be found at https://doi.org/10.1017/S0266462318000211
CONFLICTS OF INTEREST
Dr. Grutters reports grants from ZonMw during the conduct of the study. Drs. Maas, Rolden, and Van der Wilt have nothing to disclose.