Health technology assessment (HTA) agencies and guidelines developers interested in the cost-effectiveness of healthcare technologies need access to evidence from economic evaluations to identify what is known about the cost-effectiveness of a technology and to find information to inform models.
Access to economic evaluations has improved over the past decade with the development of specialist databases including the National Health Service Economic Evaluation Database (NHS EED) (3) and the Health Economic Evaluations Database (HEED) (7). To augment these valuable resources, many technology assessment researchers also search the MEDLINE and EMBASE databases for economic evaluations. The main reason for this is likely to be the time lag between records being identified from databases such as MEDLINE and full abstracts being added to NHS EED or HEED. NHS EED flags records of studies that are being abstracted, but does not include a preliminary abstract. As a result, search efficiency for those records is reduced while the full record is being produced, and the corresponding records in MEDLINE or EMBASE tend to contain more information to assist retrieval.
When searching MEDLINE and EMBASE, technology assessment researchers typically use search filters: collections of search terms designed to retrieve records reporting particular study designs, such as economic evaluations. Many published search filters to identify economic evaluations are available (9). However, HTA researchers have perceived that identifying economic studies in MEDLINE and EMBASE using published filters can be problematic, because the filters are highly sensitive but often have low precision. Sensitivity measures the proportion of the known relevant records that a filter retrieves. Technology assessment usually values high sensitivity, to minimize the risk of bias. Precision, in information retrieval, measures the number of relevant records retrieved as a proportion of the total records retrieved. Achieving high sensitivity frequently means sacrificing precision, and vice versa (10).
Despite the number of available economic evaluation search filters, there is little evidence about their current performance in identifying economic evaluations in MEDLINE and EMBASE (13;15;17;19). This makes it difficult to choose between the available filters.
As part of a project funded by the Canadian Agency for Drugs and Technologies in Health (CADTH) to develop search filters to identify economic evaluations more efficiently, we were able to test the performance of search filters to assist HTA researchers to identify the best filters for their specific needs.
OBJECTIVES
The main objectives of this research were to develop a range of search filters to identify economic evaluations in the MEDLINE and EMBASE databases and to obtain data on the relative performance of those new filters and available published search filters in those databases. This study reports the relative performance of available published filters for MEDLINE and EMBASE. The performance of search filters was assessed using gold standard sets of records of known economic evaluations.
Ideal desired performance levels for search filters were established through discussion within the project team (J.G., D.K., S.M.) reflecting their experience of searching for economic evaluations and their assessments of the preferences of HTA researchers within CADTH. HTA research usually requires highly sensitive searches (ideally over 0.95 for CADTH researchers). Very high precision is desirable (over 0.80), but given the trade-off between sensitivity and precision, researchers tend to acknowledge that precision will be much lower than sensitivity. In some circumstances, such as scoping exercises where a rapid overview of the evidence is required, a reduction in sensitivity with an increase in precision might be acceptable. The following scenarios were chosen to assess the performance of filters to achieve different objectives: (i) a sensitivity maximizing approach (sensitivity 0.95, precision 0.2); (ii) a precision maximizing approach (sensitivity 0.8, precision 0.5); and (iii) a balance between sensitivity and precision.
METHODS
There are many approaches to search filter testing (1;5;10;15;17;20). Many filters are designed using a gold standard set of records and then tested against that gold standard (10). However, search filters are ideally validated on additional gold standards that have not been used to derive the filters, to reduce the chance that filters overperform on the gold standard from which they were derived (10). This research developed a gold standard set of records of known economic evaluations that had not been used to generate any of the search filters being tested.
Definition of Economic Evaluations
Researchers define “economic evaluations” differently. The focus of this research was economic evaluations as per CADTH's Guidelines for the Economic Evaluation of Health Technologies: Canada (http://www.cadth.ca/media/pdf/186_EconomicGuidelines_e.pdf):
• Cost-effectiveness analyses (CEA), where health outcomes were measured in natural units, for example life-years gained, lives saved, or clinical events avoided or achieved.
• Cost-utility analyses (CUA), where outcome is measured as health-related preferences, often expressed as quality-adjusted life-years (QALYs) gained.
• Cost-benefit analyses (CBA), which value costs and outcomes in monetary terms.
• Cost-minimization analyses (CMA), where the intervention and its alternatives are considered equivalent in terms of the factors relevant to the decision (other than cost), and so the lowest-cost option is selected.
• Cost-consequences analyses (CCA), where costs and outcomes are listed separately in a disaggregated format, without aggregating these results (for example, in an incremental cost-effectiveness ratio).
Studies reporting utility estimates and partial economic evaluations were not included in the gold standard. Other types of economic study such as cost of illness studies and costing studies were also excluded.
Identification of the Gold Standard
We identified a gold standard set (reference standard) of known database records meeting the desired criteria for economic evaluations. Gold standards are ideally identified by handsearching, but may also be developed using other methods such as relative recall (6;10;16;20). Resources did not allow for extensive handsearching, so we used relative recall methods to identify a gold standard (16). Records were identified by searching NHS EED (3). NHS EED is populated by extensive sensitive searches of several major databases and by handsearching journals (http://www.crd.york.ac.uk/crdweb/html/help.htm). The resulting records are assessed and categorized. Those records judged to be full economic evaluations according to the NHS EED definitions (http://www.crd.york.ac.uk/crdweb/html/help.htm) are abstracted by health economists. This sensitive search approach and detailed selection process mean that NHS EED records can form a relative recall gold standard.
Economic evaluations in NHS EED were identified by searching using coding in the TY (Economic Study Type) field (http://www.crd.york.ac.uk/crdweb/html/help.htm). The gold standard was created by downloading all records that were retrieved by the following search: (“economic evaluation”:ty or “provisional abstract”:ty) NOT (“partial”:ty or “outcome”:ty).
This retrieved all NHS EED records that had been coded as an economic evaluation and had received a full abstract (“economic evaluation”:ty), or were in the process of receiving a full abstract (“provisional abstract”:ty). This search excluded partial economic evaluations (“partial”:ty) and outcome evaluation studies (“outcome”:ty) that might otherwise have been retrieved by the other search terms.
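For illustration, the inclusion/exclusion logic of the TY-field search above can be sketched in code. This is a hypothetical simplification: the record representation and the lowercase code values are assumptions for the example, not the actual NHS EED export format.

```python
# Sketch of the TY (Economic Study Type) selection logic described above.
# Hypothetical record representation: each record carries a list of TY codes.

def in_gold_standard(ty_codes):
    """Apply ("economic evaluation" OR "provisional abstract")
    NOT ("partial" OR "outcome") to one record's TY codes."""
    codes = {code.lower() for code in ty_codes}
    included = codes & {"economic evaluation", "provisional abstract"}
    excluded = codes & {"partial", "outcome"}
    return bool(included) and not excluded

print(in_gold_standard(["economic evaluation"]))             # True
print(in_gold_standard(["provisional abstract"]))            # True
print(in_gold_standard(["economic evaluation", "partial"]))  # False
print(in_gold_standard(["outcome"]))                         # False
```

The NOT clause dominates, mirroring the Boolean search: a record coded both as an economic evaluation and as partial is excluded.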
Records were downloaded for publications that had been published in 3 years: 2000, 2003, and 2006. These years were chosen to provide adequate numbers of records and to span as much of the decade as possible. Year 2006 was chosen as the final year because it was the most recent year when all published articles were likely to have been identified and categorized in NHS EED. The records were downloaded from NHS EED in December 2008, and the database was unlikely to have had complete sets of publications recorded for 2007 and 2008 at that time.
NHS EED records were downloaded into EndNote reference management software (4). The records were checked against MEDLINE and EMBASE. Where corresponding records were available in those databases, the unique record identifier was noted and a search strategy was created within those databases to retrieve the records of the known economic evaluations. This created a MEDLINE gold standard and an EMBASE gold standard.
Identifying Search Filters
Search filters were identified from the InterTASC Information Specialists’ SubGroup (ISSG) Web site (http://www.york.ac.uk/inst/crd/intertasc/econ.htm) and from CADTH staff and reviewers. The following filters were tested: CADTH filter (8); Emory University (Grady) filter (21); McKinlay et al. filters (13); NHS Economic Evaluation Database filters (2); NHS Quality Improvement Scotland filters (12); Royle and Waugh filters (15); Sassi et al. filters (17); Scottish Intercollegiate Guidelines Network filter (18); and Wilczynski et al. filters (19).
The search filters by McKinlay et al. (13), Royle and Waugh (15), Sassi et al. (17), and Wilczynski et al. (19) had been developed using a range of methods and had been published in journals. The remaining filters were published, without detailed descriptions of their development methods, on Web sites and in reports (2;8;12;18;21).
The search terms were “translated” where necessary to run in Ovid, and the full strategies are provided as a supplementary table to this study, which can be viewed at www.journals.cambridge.org/thc.
Testing Search Filter Performance
Search filter performance in retrieving gold standard records was assessed in terms of sensitivity and precision. Sensitivity was calculated as: (number of gold standard records retrieved/total number of gold standard records).
Precision was calculated as: (number of gold standard records retrieved/total number of records retrieved).
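As a worked illustration of these two formulas, the following sketch computes sensitivity and precision from sets of record identifiers. The identifiers and counts are invented for the example, not taken from the study's data.

```python
# Sensitivity and precision of a search filter, computed from record sets.

def filter_performance(retrieved, gold_standard):
    """Return (sensitivity, precision) for a filter's retrieval set."""
    retrieved = set(retrieved)
    gold_standard = set(gold_standard)
    relevant_retrieved = retrieved & gold_standard
    sensitivity = len(relevant_retrieved) / len(gold_standard)
    precision = len(relevant_retrieved) / len(retrieved)
    return sensitivity, precision

# Toy example: 10 gold standard records; the filter retrieves 50 records,
# of which 9 are in the gold standard.
gold = {f"rec{i}" for i in range(10)}
hits = {f"rec{i}" for i in range(9)} | {f"noise{i}" for i in range(41)}
sens, prec = filter_performance(hits, gold)
print(sens, prec)  # 0.9 0.18
```

The toy numbers illustrate the trade-off discussed above: retrieving many irrelevant records keeps sensitivity high (0.9) while precision stays low (0.18).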
The search filters’ performance in Ovid was tested by two researchers independently (J.G., D.K.) using the strategy shown in Table 1. The retrieval results were restricted to the years of interest by using the following search line: (2000 or 2003 or 2006).yr.
.yr., Publication year; .pt., Publication Type; /, Subject heading; Exp, Explode (subject heading); .ti., Title; .ab., Abstract; .sh., Subject heading; Or/1–3, Combine sets 1 to 3 using OR.
Each filter was tested with and without the following exclusions (see Table 1, row C): publication types unlikely to yield reports of economic evaluations, animal studies unlikely to be required for most HTA research.
RESULTS
Gold Standard Records and Search Filters
A total of 2,070 full economic evaluations were identified from NHS EED for the years 2000, 2003, and 2006. Of these, 1,957 had corresponding records in MEDLINE, but only 1,955 were retrievable using the publication years of interest in Ovid MEDLINE. These 1,955 records constituted the MEDLINE gold standard set.
A total of 1,875 of the NHS EED evaluations had records in EMBASE. Restricting the search results to the 3 years of interest missed two gold standard records that had different publication years listed in EMBASE. The EMBASE gold standard set was reduced to 1,873 “retrievable” records in Ovid EMBASE. Thirteen MEDLINE search filters and eight EMBASE filters were identified.
Testing Search Filter Performance in the Ovid Interface to MEDLINE
The results of testing thirteen search filters are shown in Table 2 in order of highest sensitivity. No gold standard records were lost using the exclusion filter in MEDLINE.
The most sensitive filters in MEDLINE (over 0.990) were: NHS Quality Improvement Scotland (full and brief) (12), NHS EED (2), and Royle and Waugh (15). The most precise among these filters was the NHS EED filter (0.04 precision) (2). Achieving higher levels of precision required some sacrifice of sensitivity: the Wilczynski best optimization filter achieved 0.093 precision with 0.923 sensitivity (19).
The Emory University (Grady) filter (21) provided the highest precision with sensitivity greater than or equal to 0.80 (0.845 sensitivity and 0.133 precision).
Testing Search Filter Performance in the Ovid Interface to EMBASE
The results of testing eight published and unpublished search filters in Ovid EMBASE are shown in Table 3, in order of highest sensitivity. Using the exclusion filter resulted in the loss of gold standard records, so the testing is reported without the use of the exclusions.
The most sensitive EMBASE filters were NHS Quality Improvement Scotland (12), CADTH (8), Royle and Waugh (15), and NHS EED (2), each with sensitivity greater than 0.99. These filters showed precision ranging from 0.015 to 0.029.
The best precision maximizing filter with high sensitivity was the McKinlay best optimization filter (0.986 sensitivity and 0.064 precision) (13). The McKinlay best specificity filter offered precision greater than 0.23, but only 0.63 sensitivity (13).
Impact of the Exclusion Strategy in EMBASE
We examined the eight records lost by using the exclusion strategy in EMBASE. We found that seven records had the subject heading “Nonhuman,” and there was no specific mention of humans in the records. However, the records were relevant to humans as most focused on the economic evaluation of various types of medical tests. The eighth record was excluded due to the search line for specific animal names in the title, abstract, or subject heading. In this case, the abstract mentioned a “WCST-Cat score,” which excluded it from the results. When the animal names search line was removed, this record was retrieved.
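The “WCST-Cat” exclusion can be reproduced with a simple word-boundary match, approximating how a title/abstract term such as “cat” behaves when hyphens are treated as word separators. The abstract text below is invented for illustration.

```python
import re

# An animal exclusion term such as "cat" searched in titles/abstracts.
# Hyphens act as word boundaries, so "cat" matches inside "WCST-Cat".
animal_term = re.compile(r"\bcat\b", re.IGNORECASE)

abstract = "Performance was measured with the WCST-Cat score."
print(bool(animal_term.search(abstract)))              # True: record wrongly excluded
print(bool(animal_term.search("catheter insertion")))  # False: no boundary after "cat"
```

This is the general pitfall: whole-word animal terms still match abbreviations and compound names, so NOT-ing them out of a search can silently discard relevant human studies.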
DISCUSSION
Summary of Findings
This research provides valuable new performance data on current search filters to identify economic evaluations in MEDLINE and EMBASE. Many publicly available filters have not previously been validated (2;8;12;18;21). Little comparative performance data are available for the formally published search filters, so this research has enhanced the performance picture of those filters (13;15;17;19). Based on this analysis, long-standing filters such as those by CADTH, NHS EED, NHS Quality Improvement Scotland, and Royle and Waugh continue to perform with high sensitivity in MEDLINE. However, none of the filters tested offers both very high sensitivity (defined for this project as greater than 0.950) and a level of precision (0.200) that would be valued by researchers.
The gold standards used for this validation exercise were much larger than those reported in the original validation studies, which ranged from 23 economic evaluations used by Wilczynski and colleagues (19) to 129 economic evaluations used by Sassi and colleagues (17).
The low precision scores suggest that either the choice of text words and indexing terms for filters or their method of combination is not optimal. As currently used, those terms do not sufficiently discriminate economic evaluation records from the majority of other records that deal with economic issues in health care.
Although none of the published filters met the project's desired performance levels in terms of sensitivity and precision, this analysis does show which filters performed best in finding large gold standard sets of relevant records. Many of the MEDLINE filters reached 0.090 precision with sensitivity of 0.9 or greater. Searchers also now have performance data on concise filters compared with lengthy filters.
This analysis shows that economic evaluations cannot be identified efficiently using indexing terms applied by database producers. Using indexing terms specific to economic evaluations (such as “Cost-benefit Analysis/” in MEDLINE and “Cost Effectiveness Analysis/” in EMBASE) did not achieve high levels of sensitivity and precision. Reasons for this may include poor reporting by authors, indexer uncertainty, and indexing lapses. Researchers cannot rely on a few highly precise search terms to identify economic evaluations efficiently in large general biomedical databases, in the way that is possible for randomized controlled trials (6). This information is important for database producers seeking to make research evidence more accessible to researchers. It also falls to authors of papers to report economic evaluations clearly and use economic evaluation terminology as consistently as possible, to assist indexers and searchers in identifying reports of economic evaluations reliably. It also provides evidence for the continuing value of databases such as NHS EED and HEED, which sift economic evaluations from thousands of irrelevant records to provide efficient access to them.
Study Limitations
This study relies on a gold standard obtained from NHS EED. There is a risk that the NHS EED search filters (and others developed from the NHS EED filter such as the Scottish Intercollegiate Guidelines Network filter), which were used to build NHS EED, overperform in comparison to non-NHS EED filters (2;18). The issue centers on whether the additional database searches (CINAHL and PsycINFO) and handsearching of sixty journals and series used to populate NHS EED dilute this overperformance by identifying additional studies that were indexed in MEDLINE or EMBASE, but that were not identified by the NHS EED MEDLINE or EMBASE filters. Without a detailed retrospective analysis, it is difficult to judge if overperformance is an issue.
Another question raised by basing the research on NHS EED records is whether NHS EED can provide an adequate gold standard, in the sense of coming very close to representing all available economic evaluations. NHS EED has been created by handsearching and extensive sensitive database searching. There are high levels of duplication in the searches, which should reduce the risk of missing relevant economic evaluations. Even though NHS EED does not search all potentially relevant databases and journals, and so may miss some relevant studies, its retrieval is likely to be as extensive as most health technology assessment searches. The value of using NHS EED also lies in its clear inclusion definitions, extensive handsearching, and international coverage (which meant that there were few language restrictions). The gold standard is also limited if the NHS EED definitions of economic evaluations are considered to differ significantly from definitions used by other research teams. The gold standard relied on the consistency of the NHS EED production process over time. If economic evaluations have been missed by NHS EED researchers, or miscategorized, the gold standard might not have reflected the true number of economic evaluations available to be retrieved.
The search filters may have identified records of economic evaluations that had been missed by the NHS EED identification process. It was not possible, within the resources available, to assess the search filter results for additional studies or to compare how well each filter performed at that task. This remains an area for future investigation.
The MEDLINE gold standard set included some unindexed records, but the majority of records were indexed with Medical Subject Headings (MeSH) and Publication Type headings. This means that the search filters’ performance was tested for retrieving indexed MEDLINE records. The filters’ performance in finding “in process” and unindexed records was not specifically tested. This may be an issue if the search filters are being used to identify very current studies where indexing has not yet been assigned. This may be the case in situations where MEDLINE and EMBASE are being searched to compensate for perceived publication lags. This issue can be investigated in the future by searching for records using just the text and abstract words from the filters.
We found that using an exclusion strategy was efficient in MEDLINE (reducing the number of irrelevant records whilst retaining sensitivity) but unhelpful in EMBASE where improvements in precision resulted in the loss of relevant records. The exclusions of relevant records resulted from attempts to remove (irrelevant) animal studies from the results. EMBASE indexing policy seems to be to assign the “Nonhuman” index term to records about tests, even though they are relevant to humans, perhaps because the subjects of the research are deemed to be the tests and not human beings. Using “Nonhuman” as an exclusion option may result in missed relevant records. Using animal terms in the title and abstract, such as “cat,” can also remove relevant records where the term is used as an abbreviation for a disease or for equipment. The exclusion strategy we used was typical of limiting strategies used in searches to identify studies for HTA, but there are no agreed standards for such strategies. Exclusion strategies should be carefully tested.
Exclusion approaches also use limits provided by database indexers to remove specific publication types that are unlikely to contain reports of research, such as editorials. This approach works best with indexed records and does not usually assist with excluding unindexed records. It should be noted that sometimes letters and editorials may mention other relevant research and provide clues to identify ongoing or unpublished research. Therefore, excluding letters and editorials from searches that are intended to be highly sensitive, to minimize publication bias, should be considered with care (11).
Recommendations for Future Research
The filters have been tested against two large general gold standards, for MEDLINE and EMBASE, but they require continuing validation to maintain a detailed performance picture. Further validation tests might include:
• Carrying out a “relative recall” exercise on completion of a review of economic evaluations (14;16).
• Building handsearched gold standards from records identified by handsearching programs for NHS EED, HEED, or other projects.
• Using additional years of NHS EED or HEED records as abstracts are completed.
• Identifying other gold standards from other economic evaluation databases.
CONCLUSIONS
Searchers can choose between several highly sensitive but low precision filters to identify economic evaluations in MEDLINE. Increased precision can be achieved by sacrificing sensitivity (e.g., the Wilczynski best optimization of sensitivity and precision filter) (19). All the MEDLINE filters can be combined with exclusion search lines, if appropriate, to remove unwanted publication types and animal studies and so improve precision.
Searchers who require high sensitivity filters to identify economic evaluations in EMBASE can choose between the NHS Quality Improvement Scotland, CADTH, Royle and Waugh, and NHS EED filters (2;8;12;15). These filters have very low precision. The effect of exclusion strategies should be explored in the light of the loss of relevant economic evaluations from the specific exclusion strategy used in our research.
HTA researchers now have new comparative information on the performance of a range of filters for identifying economic evaluations. They have improved confidence that highly sensitive economic evaluation filters are available, but can also choose search filters with lower sensitivity and higher precision when resources are more limited or rapid scoping is required.
SUPPLEMENTARY MATERIALS
Supplementary Table 1: www.journals.cambridge.org/thc
CONTACT INFORMATION
Julie Glanville, MSc (jmg1@york.ac.uk), Project Director, Information Services, York Health Economics Consortium, University of York, Level 2, Market Square, York YO10 5NH, UK
Shaila Mensinkai, MA, MLIS (ShailaM@cadth.ca), Manager, IS Infrastructure & Services, David Kaunelis, MLIS (davidk@cadth.ca), Methods Specialist, Information Services, Canadian Agency for Drugs and Technologies in Health, 600–865 Carling Avenue, Ottawa, Ontario K1S 5S8, Canada