Published online by Cambridge University Press: 26 April 2005
Objectives: The aim of the Consensus on Health Economic Criteria (CHEC) project is to develop a criteria list for assessment of the methodological quality of economic evaluations in systematic reviews. The criteria list resulting from this CHEC project should be regarded as a minimum standard.
Methods: The criteria list has been developed using a Delphi method. Three Delphi rounds were needed to reach consensus. Twenty-three international experts participated in the Delphi panel.
Results: The Delphi panel achieved consensus over a generic core set of items for the quality assessment of economic evaluations. Each item of the CHEC-list was formulated as a question that can be answered by yes or no. To standardize the interpretation of the list and facilitate its use, the project team also provided an operationalization of the criteria list items.
Conclusions: There was consensus among a group of international experts regarding a core set of items that can be used to assess the quality of economic evaluations in systematic reviews. Using this checklist will make future systematic reviews of economic evaluations more transparent, informative, and comparable. Consequently, researchers and policy-makers might use these systematic reviews more easily. The CHEC-list can be downloaded freely from http://www.beoz.unimaas.nl/chec/.
Health-care professionals, consumers, researchers, and policy-makers can be overwhelmed by the sometimes unmanageably large number of trials and economic evaluations of health care interventions. Systematic reviews of these studies can help in making well-informed decisions on which intervention to adopt. For maximum usefulness, systematic reviews of economic evaluations should be transparent, that is, all relevant methodological information from the included studies should be described in a systematic way. However, there is no generally accepted criteria list for reviewing economic evaluations, possibly because most criteria lists are created single-handedly. The aim of the Consensus on Health Economic Criteria (CHEC) project was to develop a generally accepted criteria list, which should be regarded as a minimum standard.
The CHEC-list focuses only on the methodological quality of economic evaluation aspects, because existing criteria lists already focus on the methodological quality of more general aspects of effectiveness studies (20;23). The CHEC-list was developed for systematic reviews of full economic evaluations based on effectiveness studies (cohort studies, case-control studies, randomized controlled trials). The focus on economic evaluations alongside trials is due to practical considerations, because other methodological criteria are relevant when using other designs, for example, modeling studies or scenario analyses (27). The criteria list has been developed using a Delphi method. A similar method was used in the development of an instrument to assess the methodological quality of randomized controlled studies (30).
The study started with the creation of a large item pool, which was then reduced by means of a Delphi method. In this study, three Delphi rounds were sufficient to reach consensus (defined as general agreement of a substantial majority).
The project team was responsible for the construction and reduction of the item pool, the selection of participants of the Delphi panel, the construction of the Delphi questionnaires, the analysis of the response, and the formulation of the feedback.
For the development of the initial item pool, items were selected from various sources, and several search strategies were used to identify the relevant literature in the field of economic evaluations. First, a Medline search was performed for the period 1990 to 2000 using MeSH headings “cost and cost analysis,” “meta-analysis or review literature.” Additionally, PsycLIT and EconLit were screened for the same period using keywords “review or meta-analysis” and “economic or cost” in titles. Additional articles were identified by searching the Cochrane Library (2000, issue 3) and the National Health Service EED database using the terms “cost or economic.” Finally, handbooks on economic evaluations were consulted and a request to submit additional guidelines was presented to the HealthEcon Discussion List.
Three members of the team (S.E., M.G., and A.A.) developed a classification scheme, which included the various domains of economic evaluations (e.g., economic study question, economic study design, economic identification, outcome valuation) under which all items were ordered. The classification scheme consisted of nineteen categories. Details of the methods have been published elsewhere (2). The Delphi participants were given the opportunity in the first Delphi round to add additional categories and items they thought were missing.
Twenty-three international experts participated in our Delphi panel (see acknowledgments). We first formed a Task Force Group consisting of seven international experts in the field of economic evaluations. This Task Force Group assisted the project team with the composition of the final Delphi panel. Participants for the Delphi panel were selected if they were authors of guidelines or criteria lists or if they had special expertise in (systematic reviews of) economic evaluations. The project team made a first selection, keeping a balance between various countries and research groups. The inclusion of experts from different research settings all over the world was an explicit goal. The final Delphi panel was approved by the Task Force Group.
During the entire Delphi procedure, structured questions were combined with open questions to facilitate the Delphi process. For example, in the Delphi-1 questionnaire, participants were asked to indicate which categories should be incorporated in a minimum set of the criteria list and to give their arguments. Participants were always asked for other suggestions. In all the Delphi rounds, participants were asked to complete the questionnaire “bearing in mind the aim of the project.” After each Delphi round, feedback was given to inform the participants of decisions, opinions, and arguments of the other participants. The project team decided, on the basis of the majority of the answers and the arguments used, which categories and items should appear in the next Delphi round. Overall, a category or item was included based on an arbitrary cutoff point (if half of the participants plus one agreed on its inclusion).
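The cutoff rule can be illustrated with a minimal sketch. It assumes that "half of the participants plus one" is read as a simple majority (more than half), which is how the rule is described elsewhere in this article; for an even panel the two readings coincide, whereas for an odd panel such as ours a stricter literal reading would require one more vote. The function name and the vote counts are hypothetical and serve only as illustration.

```python
def meets_cutoff(votes_for: int, n_participants: int) -> bool:
    """Return True if more than half of the panel agreed on inclusion.

    Assumption: the cutoff is interpreted as a simple majority.
    """
    if n_participants <= 0:
        raise ValueError("the panel must have at least one participant")
    if not 0 <= votes_for <= n_participants:
        raise ValueError("votes_for must lie between 0 and n_participants")
    # Integer comparison avoids rounding issues for odd panel sizes.
    return 2 * votes_for > n_participants


# Hypothetical vote counts for a panel of 23 participants.
included = meets_cutoff(12, 23)      # 12 of 23 is more than half
not_included = meets_cutoff(11, 23)  # 11 of 23 is less than half
```

Under this reading, an item in a panel of twenty-three needs at least twelve votes to be carried forward.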
The decisions of the project team were also presented and justified in feedback reports (Delphi questionnaires, feedback reports, and the final CHEC-list can be downloaded from www.beoz.unimaas.nl/chec/).
In the first Delphi round, participants were asked to indicate which categories and subsequently which items should be included in the CHEC-list. In the second Delphi round, the participants received the results and comments of the panel together with the decisions of the project team. In this second Delphi round, participants were asked, “to select one item of each category that should be included in the final CHEC-list.” Based on the analysis of this second round, the project team phrased a concept version of the CHEC-list. In the third Delphi round, participants were given a final opportunity to suggest modifications to the concept version of the CHEC-list.
To pilot the applicability of the CHEC-list, two articles on economic evaluations were reviewed by all authors.
Twenty-six experts were invited to participate in the Delphi panel, of whom two did not accept the invitation and one who originally agreed to participate did not respond to our questionnaires. A total of twenty-three members completed all three Delphi rounds.
A total of twenty-five guidelines were initially identified (1;3–22;24–26;28;29). Five of these guidelines were excluded after reading, because they were inadequate for our goal (3;5;14;22;28). Furthermore, some guidelines were published more than once (4;7;10;11;24–26). This finding left a sample of fifteen guidelines available, resulting in an initial item pool of 218 items. The item pool was restructured and reduced using the classification scheme. We eliminated items on general methodology (covered by criteria lists of effectiveness studies), and skipped (almost) identical formulations (see Ament et al. [2]). This strategy reduced the initial item pool to 128 items spread over nineteen categories, which were used in the first Delphi round. In the first Delphi round, all nineteen categories were considered essential for the CHEC-list, that is, the lowest approval (83 percent) was on the category “Ethics and distributional effects.” The items to be included in each category, however, did show some major differences (agreement varied between 9 percent and 91 percent). We used a majority agreement as an arbitrary cutoff point to include an item within a category. If none of the items reached the cutoff point within a category, the project team decided on the most appropriate item based on the arguments of the participants in the first Delphi round.
On the basis of the results of the Delphi-1, the project team created a feedback report and a Delphi-2 questionnaire, in which all items that met the above criteria were presented to the participants. All twenty-three participants completed and returned the Delphi-2 questionnaire. In the analysis within each category, the item with the highest percentage of agreement was chosen. An important observation from the Delphi-2 round was that a large number of the participants indicated that they preferred items to be selected that give insight into the quality of the study performed, rather than into how the study is performed (e.g., “Is the economic study design appropriate to the stated objective?” instead of “What is the form of design used?”). The project team selected and rephrased items and developed a concept version of the CHEC-list, which was included in the Delphi-3 questionnaire. In addition, guidelines were developed on how to fill out the CHEC-list, which gave an explanation of the meaning of each item.
The Delphi-3 questionnaire was also completed and returned by all twenty-three participants. The majority of the participants accepted the concept CHEC-list. To give a taste of the final discussion, we present some examples of the remarks of the panel members. Some participants had specific comments regarding the category “Presentation of results.” Three items in this category were included in the Delphi-2 questionnaire, that is, “Are the methods and analysis displayed in a clear and transparent manner?,” “Do the conclusions follow from the data reported?,” and “Are the assumptions and limitations of the study discussed?” Agreement on these items was 39.1 percent, 69.6 percent, and 26.1 percent, respectively. We had therefore selected the second item for our final CHEC-list (see Table 1). In the second and the third Delphi round, three Delphi members suggested that all three items should be retained in the criteria list. The project team reconsidered this and decided, because this was suggested by only a minority, to keep to its original decision, which was based on the opinion of the majority of the panel members.
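The selection rule for this category (choose the item with the highest percentage of agreement) can be sketched as follows. The endorsement counts of 9, 16, and 6 are an assumption: they are not reported here, but they are the counts out of twenty-three participants that reproduce the reported percentages. The variable and function names are hypothetical.

```python
N_PARTICIPANTS = 23

# Assumed endorsement counts, inferred from the reported percentages.
endorsements = {
    "Are the methods and analysis displayed in a clear and transparent manner?": 9,
    "Do the conclusions follow from the data reported?": 16,
    "Are the assumptions and limitations of the study discussed?": 6,
}


def agreement_percentages(counts: dict[str, int], n: int) -> dict[str, float]:
    """Convert endorsement counts into percentages rounded to one decimal."""
    if n <= 0:
        raise ValueError("the panel must have at least one participant")
    return {item: round(100 * count / n, 1) for item, count in counts.items()}


def select_item(counts: dict[str, int]) -> str:
    """Return the item with the highest number of endorsements."""
    return max(counts, key=counts.get)


percentages = agreement_percentages(endorsements, N_PARTICIPANTS)
selected = select_item(endorsements)
```

With these assumed counts, the percentages come out at 39.1, 69.6, and 26.1, and the item on whether the conclusions follow from the data reported is selected.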
In the guideline of the CHEC-list, regarding the category “perspective,” the project team had stated the following: “If the study is performed from a societal perspective tick ‘yes,’ as all relevant costs and consequences of an intervention and disease are taken into account, if possible. Other narrower perspectives will only include certain components. The authors should motivate why a narrower perspective is valid.” Based on this statement, one participant remarked that “there is no single appropriate perspective. What is appropriate depends upon the decision/policy-maker. Thus, the answer to this item, if answerable at all, will be use-context specific and not intrinsic to the published study.” Although we agree to a certain extent with the remarks made by this Delphi member, it is a general limitation of any criteria list that it is difficult to design one in which all items are equally relevant for all studies. Based on the discussion in the project team, we decided not to alter the guidelines and the CHEC-list, as the majority of the Delphi panel agreed with the suggested phrasing.
Regarding the category “independence of the investigator” one participant also remarked that “A study should not be ‘marked down’ because there are acknowledged potential conflicts. If we do that, no studies published by the pharmaceutical industry or by organizations working for them would be acceptable.” Based on this, the project team did not change the item, but in the explanation, we tried to overcome this difficulty by stating that “If an external agency finances the study, a statement should explicitly be given about who finances the study to guarantee transparency in the relationship between the sponsor and the researcher. Whenever a potential conflict of interest is possible, a declaration should be given of ‘competing interest’.”
The pilot test of the CHEC-list showed a strong agreement among the assessments of the members of the project team when reviewing two articles. No items were changed or adapted.
This study is the first in which a broadly accepted criteria list for economic evaluations was developed based on a Delphi consensus procedure. In a consensus procedure, the choice of participants is crucial. In the selection of this Delphi panel, the project group tried to achieve a broad representation of experts on quality assessment of economic evaluations. In the Delphi-2 round, it became clear that the majority of the participants wanted the CHEC-list to include items that give insight into the quality of economic evaluations rather than into how the economic evaluation is performed. As a result, most of the items included are now subjective judgments, not simple statements of “fact.” In practical use, this may increase inter-rater variability. The project team, therefore, suggests using two or more reviewers when performing a systematic review and conducting a pilot test. Criteria lists, such as the CHEC, can be criticized because of their potential rigidity, which might prohibit further development of methodology. This criticism is only valid if criteria are used injudiciously. To prevent the CHEC-list from being methodologically rigid, the project group emphasizes that the CHEC-list is a minimum set and would like to stimulate researchers to add items appropriate for the specific subject under study, if relevant.
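When two reviewers apply the yes/no items independently, their agreement can be checked in a pilot test. The sketch below computes raw percentage agreement and Cohen's kappa; kappa is not part of the CHEC project and is named here only as one common chance-corrected measure. The ratings are hypothetical example data, not results of this study.

```python
import math


def percent_agreement(a: list[bool], b: list[bool]) -> float:
    """Share of items on which two reviewers gave the same yes/no answer."""
    if len(a) != len(b) or not a:
        raise ValueError("both reviewers must rate the same non-empty item set")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: list[bool], b: list[bool]) -> float:
    """Cohen's kappa for two reviewers and binary (yes/no) ratings."""
    observed = percent_agreement(a, b)
    p_yes_a = sum(a) / len(a)
    p_yes_b = sum(b) / len(b)
    # Agreement expected by chance, given each reviewer's share of "yes".
    expected = p_yes_a * p_yes_b + (1 - p_yes_a) * (1 - p_yes_b)
    if expected == 1.0:
        # Both reviewers gave one identical answer throughout: kappa is undefined.
        return math.nan
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of four checklist items (True = "yes", False = "no").
reviewer_1 = [True, True, False, False]
reviewer_2 = [True, False, False, False]

raw = percent_agreement(reviewer_1, reviewer_2)
kappa = cohen_kappa(reviewer_1, reviewer_2)
```

Items on which the reviewers disagree can then be discussed before the full review starts, which is the purpose of the pilot test suggested above.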
Finally, there are many examples of poor reporting of economic evaluations. It is often difficult to conclude what actually happened in the study, and this affects the methodological quality assessment. Part of the problem is that journals only accept a limited number of words in an article, thus making extensive explanations almost impossible. A solution might be to contact the authors of the original study, asking them for a more detailed description of the study design. An alternative would be to require the production of a standard technical report for every economic evaluation study. This technical report should be available to everyone, for example on the Internet. Such a technical report, describing the economic design and details of the study, could also be used in systematic reviews.
Notwithstanding these limitations, we believe that, by creating a criteria list consisting of a minimum set of items, this CHEC-list makes it possible for future systematic reviews of economic evaluations to become more transparent, informative, and comparable.
There was consensus among a group of international experts regarding a core set of items that can be used to assess the quality of economic evaluations in systematic reviews. The project team also provided a guideline to the criteria list items to standardize its interpretation and facilitate its use. Using this criteria list will make future systematic reviews of economic evaluations more transparent, informative, and comparable. Consequently, the CHEC-list can help researchers and policy-makers to interpret the results of systematic reviews more easily and can be of help by translating these results into policy implications.
Silvia Evers, PhD (S.Evers@beoz.unimaas.nl), Senior HTA Researcher, Department of Health Organization Policy and Economics, Mariëlle Goossens, PhD (m.goossens@dep.unimaas.nl), Assistant Professor, Department of Medical, Clinical, and Experimental Psychology, Maastricht University, P.O. Box 616, 6200 MD Maastricht, The Netherlands
Henrica de Vet, PhD (hcw.devet@vumc.nl), Professor, Maurits van Tulder, PhD (mw.vantulder@vumc.nl), Assistant Professor, Institute for Research in Extramural Medicine, VU University Medical Centre, Van der Boechorststraat 7, 1081 BT Amsterdam, The Netherlands
André Ament, PhD (a.ament@beoz.unimaas.nl), Associate Professor, Department of Health Organization Policy and Economics, Maastricht University, P.O. Box 616, 6200 MD Maastricht, The Netherlands
The authors thank the following persons for their participation in the Delphi panel: D. Banta, M. Buxton*, D. Coyle*, C. Donaldson, M. Drummond*, A. Elixhauser*, B. Jönsson, E. Jonsson*, K. Kesteloot, B. Luce, D. Menon, M. Mugford*, E. Nord, B. O'Brien (†February 13, 2004), J. Rovira, L. Russell, F. Rutten*, G. Simon, J. Sisk, R. Taylor, G. Torrance, A. Towse, and L. Vale. We thank J. van Emmerik for his technological assistance with the current study and E. Brounts for the literature review. *These members were also part of the Task Force Group.
Table 1. Final CHEC-List after Three Delphi Rounds