The appropriate use and dissemination of new healthcare technologies need to be supported by the provision of timely, high quality evidence. Assessment of the available evidence may help to influence or direct their uptake. For new interventional procedures, however, the evidence base is typically limited in quantity, quality, or both (5;27). The development of health technology assessment (HTA) systems for interventional procedures has lagged behind that for drug treatments. Such systems exist only in a small number of countries (19), and only a minority of HTA organizations have a special remit for assessing interventional procedures. Examples include NICE's Interventional Procedures Programme in the UK (15) and the Australian Safety and Efficacy Register of New Interventional Procedures-Surgical (ASERNIP-S) in Australia (1). There are substantial variations in the status, structure, scope, and modus operandi of HTA organizations with a remit for producing assessments of interventional procedures in different countries, and there is also variation in both the type and the format of their recommendations (19). For example, such organizations vary in their nature (whether public, academic, or professional bodies) and their funding sources; in the systems they use to select procedures for assessment (whether or not procedures are “self-selected” by the organizations themselves); in the types and sources of evidence used in the appraisal process (particularly in relation to unpublished evidence and use of registry data); in the people who appraise the evidence (notably in relation to industry representatives and lay people); in the arrangements for consultation on draft assessments (specifically whether an open consultation process exists or not); and in whether assessments produced in other countries are used (19).
Inevitably, any new interventional procedure may be assessed by several different national organizations. These organizations may appraise the procedure at different times and therefore in the context of different amounts of published evidence. This may result in different recommendations simply because of differences in the available evidence base, quite apart from any differences resulting from varying appraisal processes and interpretations of the evidence. Organizations publish their recommendations in a variety of formats, including “systematic reviews,” “technology reports,” and “guidance.” For the purposes of this study, all of these are referred to as assessments.
In recent years, initiatives have been set up to increase international collaboration in horizon scanning and other aspects of healthcare technology assessment methods, for example, the Euroscan initiative (www.euroscan.org.uk) (21). We know of no previous comparison of assessments of new interventional procedures produced by national HTA organizations in different countries. Comparing assessments produced for the same procedures in different countries can offer several potential benefits: for example, it could help motivate improvements to the “systems” used in different countries, based on the examples of others, and stimulate greater international collaboration in improving the generation and appraisal of relevant evidence. This study was based on a detailed comparison of the assessments published on five selected interventional procedures in countries around the world.
METHODS
Data
The study was conducted in early 2008. Given the limited resources and time available for this project, our aim was to select a small number of procedures that met the following criteria: (i) readily identifiable (i.e., unlikely to be confused with any other procedure); (ii) clearly “new” (i.e., introduced in recent years and not well established); (iii) potentially high impact (i.e., might become widely used); and (iv) related to different disease areas and specialties.
We identified five such procedures during unstructured discussions among the authors in the course of the present and a “parallel” project (19), based on the list of 250 procedures on which NICE's Interventional Procedures Programme had published guidance (as of early 2008) and on our direct knowledge of procedures that had been assessed by other healthcare technology assessment organizations but not by NICE. The selected procedures were: cryotherapy for prostate cancer; deep brain stimulation for Parkinson's Disease; radiofrequency ablation for atrial fibrillation (percutaneous approach); lung volume reduction surgery for emphysema; and vacuum assisted wound closure (the one procedure not assessed by NICE). We then searched the Web sites of HTA organizations to identify assessments of these procedures, specifically HTA organizations that were members of the International Network of Agencies for Health Technology Assessment (www.inahta.org/), Health Technology Assessment International (www.htai.org/), and the European Network for Health Technology Assessment (www.eunethta.net/). In addition, Web searches were carried out using relevant search terms for the procedures of interest. When an organization had produced assessments which had subsequently been updated, only the most recent assessment was included in the study. We considered only assessments available in full text or as an abstract in English (this resulted in the exclusion of three assessments).
For each assessment, the following information was extracted and recorded or summarized: (i) publication date; (ii) type of publication (e.g., “technology report” or “systematic review”); (iii) the inclusion of information about comparator interventions; (iv) whether primary RCTs were considered as part of the evidence, and their number; (v) use of expert clinical advice; (vi) whether or not the assessment stipulated “additional” recommendations, defined as recommendations beyond efficacy or safety per se, concerning: (a) patient selection, (b) patient consent, (c) operator training, (d) the possible need for (and nature of) involvement of a multidisciplinary team, (e) the type of healthcare setting/organization where the procedure should be done, and (f) future research; and (vii) whether cost or cost-effectiveness was considered.
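To make the extraction schema concrete, the sketch below shows one hypothetical way these items could be represented as a structured record. It is our own illustration in Python; the field names, types, and example values are assumptions for illustration only and are not taken from the authors' data collection forms or data.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Illustrative sketch only: a hypothetical record mirroring the data items
# extracted for each assessment (field names are our own, not the authors').
@dataclass
class AssessmentRecord:
    organization: str
    procedure: str
    publication_year: int
    publication_type: str            # (ii) e.g., "technology report", "systematic review"
    comparators_described: bool      # (iii) explicit comparator interventions mentioned
    n_primary_rcts: Optional[int]    # (iv) number of primary RCTs considered, if any
    expert_advice_used: bool         # (v) use of expert clinical advice
    additional_recommendations: List[str] = field(default_factory=list)
    # (vi) e.g., "patient selection", "consent", "training", "team",
    #      "setting", "future research"
    cost_considered: bool = False    # (vii) cost or cost-effectiveness considered

# Example entry; all values are illustrative, not taken from the study data.
example = AssessmentRecord(
    organization="NICE",
    procedure="deep brain stimulation for Parkinson's Disease",
    publication_year=2005,
    publication_type="guidance",
    comparators_described=False,
    n_primary_rcts=1,
    expert_advice_used=True,
    additional_recommendations=["patient selection", "consent", "future research"],
)
```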
The efficacy, safety, and overall “headline” recommendation of each assessment were summarized and recorded by the first author (J.P.) and independently verified by a co-author (H.G.). Three investigators (J.P., G.L., and B.C.) then independently scored these summary statements for their apparent judgment of the quality, quantity, and consistency of evidence in relation to efficacy, safety, and cost-effectiveness. A scoring scale of 1–5 was used, where 1 denoted a statement describing or suggesting the weakest evidence base and 5 the strongest. Three investigators (J.P., H.G., and G.L.) also independently scored the “degree of support” expressed in each assessment's headline statements about the efficacy and the safety of the procedure, using a scale of 1–5, where 1 denoted the weakest and 5 the strongest support. The purpose was to examine variation (if any) in the degree of expressed support, as opposed to evaluating the “quality” or “appropriateness” of the headline statement (the latter was beyond the objectives of the project). The three investigators subsequently met and agreed a single consensus score for each of these criteria for each assessment, derived from the mean of the three individual scores. (Overall there was a high level of consistency among the three raters, but we did not examine this statistically, as we do not think it would be meaningful in the context of this study.)
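As a purely illustrative reconstruction of the consensus step (not the authors' own analysis code), the snippet below shows how a consensus score could be derived as the mean of three raters' 1–5 scores; the function name and example values are ours.

```python
# Minimal sketch of the consensus-scoring step described above (our own
# reconstruction; not the code used in the study).
from statistics import mean

def consensus_score(rater_scores):
    """Return the consensus score as the mean of the individual 1-5 ratings."""
    if not all(1 <= s <= 5 for s in rater_scores):
        raise ValueError("all scores must be on the 1-5 scale")
    return round(mean(rater_scores), 1)

# Hypothetical example: three raters' scores for one assessment's
# statement on the quality of the efficacy evidence.
print(consensus_score([2, 3, 2]))  # -> 2.3
```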
RESULTS
Basic Characteristics of the Reviewed Assessments
In total, twenty-three individual assessments were initially identified for the five selected procedures, of which three were excluded from further analysis because they had subsequently been updated by the same organization. The remaining twenty assessments (range, 3–5 for each of the five procedures) had been produced by ten different organizations in nine different countries (Australia was represented by two organizations) (Table 1) (2;3;6;7;9–14;16–18;20;22–27). Three countries predominated: Australia (five), Canada (five), and the UK (four). For the same procedure, assessments from different countries had been produced within ranges of 3 years (cryotherapy for prostate cancer) to 5 years (lung volume reduction surgery for emphysema) (Supplementary Table 1, which is available at www.journals.cambridge.org/thc2010007).
Table 1. Countries and HTA Organizations with Assessments About Each of the Procedures
‘+’ denotes assessment produced; reference numbers in parentheses.
For one of the procedures (cryotherapy for prostate cancer), no RCT evidence was used in any of the three relevant assessments. For the other four procedures, the number of primary RCTs considered by individual assessments ranged from one (NICE, deep brain stimulation) to six (ASERNIP-S and AHRQ, vacuum assisted wound closure) (Supplementary Table 1). Overall, the number of RCTs was somewhat greater in assessments published in more recent years. Explicit comparator interventions (i.e., interventions which the new procedure might replace or with which it might reasonably be compared) were described in seven of the twenty assessments, but not in the other thirteen.
Guideline “Headline Statements” About the Quantity, Quality, and Consistency of the Evidence, and “Overall” Support
Overall, scores on the headline statements about the strength of the evidence base on both efficacy and safety were low (Table 2). On the scale of 1 (weakest) to 5 (strongest), only four assessments received a score of 4 on their statements about evidence on efficacy, and three of these were scored 4 for some aspect of their statements about evidence on safety.
Table 2. Scores on the Quantity, Quality, and Consistency of the Evidence of Efficacy and Safety as Described or Implied in Sentences in the Assessment
‘Consist.’, Consistency; CRY, cryotherapy for prostate cancer; DBS, deep brain stimulation for Parkinson's Disease; LVR, lung volume reduction surgery for emphysema; RFA, radiofrequency ablation for atrial fibrillation; VAC, vacuum assisted wound closure.
‘1’ = Very Poor to ‘5’ = Very Good; ‘—’ = Not Mentioned.
Statements about the quantity and quality of evidence on efficacy were identified and scored on twenty-nine occasions (73 percent of a possible 40), compared with twenty-one (53 percent) for safety. Statements about both efficacy and safety evidence tended to be more positive in more recently produced assessments. Comments about the consistency of the evidence were made on only four occasions (20 percent) for each of efficacy and safety. Statements about costs or cost-effectiveness were very scant (present in only 30 percent of assessments) and tended to suggest a very low strength of evidence.
Scores on headline statements about support for the procedures varied between “2” (second weakest) and “4” (second strongest possible support). Overall statements of support for the procedure tended to be more positive in more recently published assessments. Examples of statements receiving low and high scores for their judgments of the strength of the evidence are provided in the Box.
Box. Examples of Headline Statements That Were Scored High and Low (for Their “Degree of Support” About the Efficacy and Safety of the Procedures)
Presence or Absence of “Additional” Recommendations in Assessment “Headline Statements”
Recommendations in addition to the appraisal of efficacy and safety per se (i.e., about patient selection, consent, training requirements, teams and care settings for the procedure, future research, and registers) are shown in Table 3. Recommendations for further research were the most frequent, featuring in eleven assessments (55 percent of all assessments compared). Recommendations about patient selection occurred in seven assessments (35 percent), about consent in four (20 percent), and about the type of team which should be involved in the procedure in four (15 percent). NICE (UK) made additional recommendations of this kind more frequently than any other organization.
Table 3. Additional Recommendations in ‘Headline’ Statements of the Assessment Report
CRY, cryotherapy for prostate cancer; DBS, deep brain stimulation for Parkinson's Disease; LVR, lung volume reduction surgery for emphysema; RFA, radiofrequency ablation for atrial fibrillation; VAC, vacuum assisted wound closure.
‘+’ indicates the presence of an additional recommendation.
DISCUSSION
We have recently surveyed systems for producing assessments of interventional procedures, showing that such systems exist in at least eighteen countries around the world (19). The present study has shown how few countries had published assessments of five procedures selected on the basis of criteria including their “newness” and potentially high impact. Fourteen of the twenty assessments came from Australia, Canada, and the UK, and three were updates of previous assessments from these countries. Among the relatively few countries which had published assessments of any particular procedure, there was substantial variation, with no systematic pattern, in the statements about the quality, quantity, and consistency of the available evidence, and in the degree of support expressed for the efficacy and safety of the procedure.
A limitation of our study is the relatively small number of procedures chosen for study (because of resource constraints); generalizations should therefore be made cautiously. In addition, because we identified these procedures mainly from the list of procedures that had been assessed by NICE, the generalizability of the findings to other procedures may be further limited. Although thorough and systematic searches of organizational Web sites were undertaken, variation in publication practices between Web sites and the restriction to material available in English are potential limitations of our study. Nevertheless, the extraction of the statements that were subsequently rated was thorough and was verified by a second assessor. Another potential limitation is the relative subjectivity of our scoring system, which nevertheless captured the individual assessments of three independent assessors and was based on the empirical appraisal of assessment statements. We believe this is acceptable, given that there are no validated standard scales for measuring such statements in healthcare technology assessments.
Our scoring of assessment statements about the strength of the evidence base for these procedures concurs with the general observation that interventional procedures are often introduced on a poor research evidence base (28). Most statements about the quantity, quality, and (rarely) consistency of the evidence suggested that it was relatively poor. It was interesting that statements about the evidence were made more often with regard to efficacy (in nearly three quarters of all assessments) than to safety (in just over half). This is surprising, as the appraisal of safety for new procedures with uncertain evidence should be considered at least as important as (or more important than) the appraisal of efficacy. A single investigator summarized and reported the headline statements, which promoted consistency; however, the subjectivity of this approach is also a potential weakness.
It is logical that assessments undertaken at different times for any particular procedure will have had different amounts of published evidence available. Our sample was too small to examine the statistical significance of any apparent trend toward an increase in the evidence base over time. There was perhaps a suggestion of this in the scoring of the evidence, but there was no consistent year-on-year increase in the numbers of RCTs used (although this may reflect the use of secondary research, such as reviews and meta-analyses). Consideration of the likely increase in the evidence base with time raises a question about the “optimal” time to undertake and publish an assessment of a procedure. If this is done “too soon,” there may be insufficient evidence upon which to base any useful assessment. However, if publication is delayed until plentiful evidence is available, there may be unnecessary delay in the introduction of a new procedure, or the procedure may spread in an inappropriate and uncontrolled way. In principle, one approach to this question is to update assessments regularly, but this requires more resources, and there seems to be no consensus about the “triggers” for such updates, nor about their nature or frequency. Three of the reviewed assessments were “updates” of previously issued ones.
Publication of assessments in the early stages of a procedure's trajectory, when evidence is limited, provides an opportunity to recommend further research and to specify the uncertainties which need to be resolved. Recommendations for further research occurred in just over half of the assessments considered in this study. Some of these were nonspecific, simply suggesting that further research would be useful. Our own practice at NICE is now to specify the particular outcome measures which research should include, with the aim of filling the gaps in the evidence base and allowing a more confident assessment of the procedure in the future. Another potential means of gathering evidence (and of monitoring the introduction of procedures) is to recommend submission of details about all patients to well designed registers. This was not observed in the published assessments for any of the procedures in this study, but it may be a useful way of gathering more evidence on procedures for which controlled trials would be difficult to conduct (8).
Apart from recommendations about the need for research, “additional” recommendations (i.e., beyond the appraisal of efficacy and safety) occurred in a minority of the assessments studied. All the procedures apart from vacuum assisted wound closure represent technical developments in highly specialist fields, and yet recommendations about operator training, the type of specialist teams which should be involved, and the setting where procedures should be done were uncommon: they were most commonly found in the assessments from NICE. Recommendations about patient selection were more frequent, but still present in only approximately a third of assessments. Patient consent would seem to be a vital issue when offering new procedures to patients, and yet there was a specific reference to consent issues in only four of the assessments: three from NICE in the UK and one from MSAC in Australia (the latter about lung volume reduction surgery). Perhaps this reflects different requirements of the clinical governance systems in different countries, but it would seem to be a matter which merits more emphasis in assessments of new procedures for which the evidence on safety and/or efficacy is uncertain.
Operator training is an important aspect of the safe introduction of new procedures. At present it seems to occur in a random and piecemeal manner, being most organized where commercial companies market their novel devices only in conjunction with training courses and proctorship. Better definition of the need for, and organization of, systems and standards for training clinicians (specifically accredited, established clinicians) who wish to start using new procedures is required in the UK, and probably in other countries (4).
Statements or recommendations about cost-effectiveness occurred in very few assessments. Clearly their inclusion depends to a great extent on the remit of the organization publishing the assessment; some have a specific remit to consider cost (19). Detailed cost-effectiveness calculations require adequate data, which are unlikely to be available when the evidence on efficacy and safety is so sparse. It is also relevant to remember that costs may vary between healthcare systems, and cost-effectiveness data may therefore be less “transferable” from one country to another than other aspects of evidence appraisal.
Production of better evidence on new interventional procedures early in their development, through well designed trials, is long overdue, and there are many reasons why it has not happened; the increasing regulatory burden, which impedes the timely initiation and conduct of research studies, is one important issue. Appraisal of available evidence, tracking of the publication of new evidence, and review of the evidence base are all activities which might lend themselves to international collaboration, based on shared goals and agreed methodologies. Agreed research objectives and the collection of similar data in well designed registers are other areas in which progress could be made through dialogue between health technology assessment organizations and clinicians in different countries; the former should take a lead in guiding and assisting clinicians. These are lofty aims and require a degree of collaboration which is currently not routine, but the obstacles are not insuperable. The greater the degree of international collaboration that can be achieved in the production and appraisal of evidence on new procedures, the more rapidly they will become established in a safe and efficient manner, or be rejected, for the benefit of patients.
SUPPLEMENTARY MATERIALS
Supplementary Table 1 www.journals.cambridge.org/thc2010007
CONTACT INFORMATION
Jonathan Plumb, RN, DipHE, BSc(Hons), MPH (jonathan.plumb@mhra.gsi.gov.uk), Honorary Research Fellow (January to April 2008), Interventional Procedures Programme, National Institute for Health and Clinical Excellence, MidCity Place, 71 High Holborn, London WC1V 6NA, UK
Georgios Lyratzopoulos, MD, FFPH, MRCP, MPH, DTM&H, Ptychio Iatrikes (georgios.lyratzopoulos@nice.org.uk), Consultant Clinical Adviser, National Institute for Health and Clinical Excellence, MidCity Place, 71 High Holborn, London WC1V 6NA, UK; Clinical Senior Research Associate, Department of Public Health & Primary Care, University of Cambridge, Institute of Public Health, Forvie Site, Robinson Way, Cambridge CB2 0SR, UK
Helen Gallo, BSc(Hons), MSc, DLSHTM (helen.gallo@nice.org.uk), Technical Analyst, National Institute for Health and Clinical Excellence, MidCity Place, 71 High Holborn, London WC1V 6NA, UK
Bruce Campbell, MS, FRCP, FRCS (bruce.campbell@nice.org.uk), Chair, Interventional Procedures Advisory Committee, National Institute for Health and Clinical Excellence, MidCity Place, 71 High Holborn, London WC1V 6NA, UK; Professor, Peninsula College of Medicine & Dentistry, Universities of Exeter and Plymouth, UK