Introduction
Artificial intelligence (AI) encompasses various technologies based on advanced algorithms and learning systems. Different terms are used in connection with AI, such as machine learning, deep learning, and convolutional neural networks (Reference Hashimoto, Rosman, Rus and Meireles1). There is no universally agreed-upon definition of AI, but a suggested definition is a system capable of interpreting and learning from data to achieve a specific goal (Reference Vaisman, Linder and Lundin2).
Medical specialties working with medical imaging have seen a dramatic increase in the number of images produced over the past decade without an equivalent increase in the workforce (Reference Winder, Owczarek, Chudek, Pilch-Kowalczyk and Baron3). The resulting excessive workload and burnout among physicians contribute to more mistakes and prolonged response times (Reference Winder, Owczarek, Chudek, Pilch-Kowalczyk and Baron3). Promising results have been achieved and published across different AI technologies and healthcare areas, especially within pattern recognition (Reference Pesapane, Codari and Sardanelli4), which could significantly help medical staff and patients. However, it is important to recognize the low quality of the evidence and the potential pitfalls of AI technology, especially in a clinical setting (Reference Challen, Denny, Pitt, Gompels, Edwards and Tsaneva-Atanasova5). In addition, implementing advanced technology such as AI in a complex healthcare system can be difficult. A recent review of the scientific literature found a broad range of essential domains for assessing the impact of AI technologies; legal and ethical aspects were highlighted as important (Reference Fasterholdt, Naghavi-Behzad and Rasmussen6).
Although several reporting guidelines, frameworks, and checklists have been presented (Reference Cruz Rivera, Liu and Chan7–Reference Tsopra, Fernandez and Luchinat12), an evidence-based and holistic assessment tool for valuing AI technology is still needed. The abovementioned guidelines are either not evidence-based (8;Reference Omoumi, Ducarouge and Tournier11) or rather narrow, for example, focusing on the reporting of clinical outcomes (Reference Cruz Rivera, Liu and Chan7;Reference Liu, Cruz Rivera and Moher9) or on clinical performance metrics, validation, or robustness of the model (Reference Mongan, Moy and Kahn10;Reference Tsopra, Fernandez and Luchinat12). Health technology assessment (HTA) provides a broad framework for evaluating healthcare technologies, with several examples tailored to specific areas and digital healthcare services (Reference Haverinen, Keränen and Falkenbach13;Reference Kidholm, Ekeland and Jensen14). HTA is a multidisciplinary process that summarizes information collected in a systematic, transparent, unbiased, and robust manner (Reference Wild and Gartlehner15). One example is the HTA-based Model for ASsessment of Telemedicine (MAST), which has been widely accepted and used (Reference Kidholm, Clemensen, Caffery and Smith16). MAST has been used and adapted for the assessment of telemedicine projects in rural areas in Germany (Reference Allner, Wilfling, Kidholm and Steinhäuser17), a review of its use in European telemedicine projects has been described by Ekeland and Grøttland (Reference Ekeland and Grøttland18), and it has served as the assessment framework in several European telemedicine projects including more than 29,000 patients (Reference Kidholm, Jensen, Kjølhede, Nielsen and Horup19). Recently, MAST was chosen as an assessment framework within the area of AI (Reference Fournaise, Lauridsen and Bech20) despite not being adapted for this area, underlining the need for an AI assessment tool that includes assessment of safety, clinical outcomes, economic consequences, and organizational impact.
This study presents the development of a specialized HTA model for evaluating AI technologies within medical imaging: the Model for ASsessing the value of AI (MAS-AI). Medical imaging was chosen due to the maturity of AI in this area, ensuring a robust evidence-based model. The purpose of the framework is to support decision makers in deciding whether to invest in AI technologies in medical imaging.
Methods
MAS-AI was developed by a multidisciplinary group of experts and patient representatives from Denmark, that is, HTA experts (including health economists), clinicians, technical, legal, and ethical experts, and patients. A mixed-methods approach was used, combining data from different sources, and the MAS-AI guideline development was structured into three phases. First, we reviewed existing guides, evaluations, and assessments of the value of AI in the field of medical imaging. In total, 5,890 studies were assessed, of which eighty-six were included in the scoping review. Eleven essential domains were identified: (i) health problem and current use of technology, (ii) technology aspects, (iii) safety assessment, (iv) clinical effectiveness, (v) economics, (vi) ethical analysis, (vii) organizational aspects, (viii) patients and social aspects, (ix) legal aspects, (x) development of the AI algorithm, performance metrics, and validation, and (xi) other aspects. The frequency with which a domain was mentioned in the included papers varied from 20 to 78 percent. See the published study for more details (Reference Fasterholdt, Naghavi-Behzad and Rasmussen6). Next, we conducted interviews, each lasting 45 to 90 minutes, with six leading AI researchers in Denmark. The interviews added new subtopics to some of the eleven domains identified in the review, but no new domains were identified. The third phase consisted of two full-day workshops with decision makers, patient representatives, and researchers in Denmark. The multidisciplinary team revised the model between the workshops according to comments from the workshop participants.
Details about the Workshops and Model Development
On 20 September 2021, we held the first MAS-AI workshop in Odense, Denmark, with eighteen participants, who were divided into groups for the group work. Participants included five decision makers from hospitals or the regional healthcare sector, one patient representative, and twelve experts within various AI domains, that is, researchers and clinicians. The experts comprised radiology and nuclear medicine clinicians, three professors covering data science and ethical and health aspects of AI, a researcher in anthropology, and HTA experts. There were three facilitated group sessions. During the first two sessions, participants discussed crucial domains and topics for evaluating AI, based on the results of the review and the interviews. In the last session, overall advice for the model work was discussed. Based on comments from the workshop participants, the multidisciplinary team revised the model before the second workshop. For instance, at the first workshop, eleven domains were presented and discussed, and participants voiced a need for simplification and a step-wise approach; thus, at the second workshop, a model with nine domains and two steps was presented and discussed.
On 22 November 2021, the second MAS-AI workshop was held in Odense, Denmark, with a total of nineteen participants, who were again divided into groups for the group work. Participants included four decision makers from hospitals or the regional healthcare sector, two patient representatives, and thirteen experts within various AI domains, that is, researchers and clinicians. The experts comprised radiology and nuclear medicine clinicians, a professor in ethical aspects of AI, two representatives from the Danish Medicines Agency, a legal expert, and HTA experts. One facilitated group session was held, with several plenary discussions about the revised model. Again, the multidisciplinary team revised the model according to comments from the workshop participants. The model development was also supported by responses to a Delphi questionnaire indicating which topics and subtopics the participants considered most important. Lastly, the final model was circulated by e-mail to the workshop participants for their final comments. The following section presents the MAS-AI model.
Results
The MAS-AI model has three parts, and Figure 1 provides an overview of their content. An MAS-AI assessment comprises two steps covering nine domains, plus a set of process factors; the order of the domains has no particular significance. Step 1 contains a description of the patients, how the AI model was developed, and initial ethical and legal considerations. Completing the four domains in step 1 is a prerequisite for moving to step 2. In step 2, a multidisciplinary assessment of the outcomes of the AI application is performed for the five remaining domains: safety, clinical aspects, economics, organizational aspects, and patient aspects. The last part consists of five process factors to facilitate a good evaluation process.
Completing both steps constitutes a full MAS-AI assessment. Completing only the first step is considered an “early MAS-AI,” that is, an initial assessment at a stage when only limited data are available for a few domains. Hence, step 1 can be seen as a prescreening, and if the outcome of step 1 is positive, the assessment can proceed to step 2.
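To illustrate the gating logic between the two steps, a minimal sketch is given below. It is not part of the MAS-AI guideline itself; the domain labels are paraphrased from the description above, and the data structures and function names are hypothetical.

```python
from typing import Dict, List

# Step 1 (prescreening): descriptive domains that must all be completed
# with a positive outcome before step 2 is started.
STEP_1_DOMAINS: List[str] = [
    "Patients and health problem",
    "Development of the AI model",
    "Initial ethical considerations",
    "Initial legal considerations",
]

# Step 2 (full assessment): multidisciplinary assessment of outcomes.
STEP_2_DOMAINS: List[str] = [
    "Safety",
    "Clinical aspects",
    "Economics",
    "Organizational aspects",
    "Patient aspects",
]


def mas_ai_status(step_1_results: Dict[str, bool]) -> str:
    """Return the state of an assessment given the step 1 domain outcomes.

    step_1_results maps each step 1 domain to True if it has been completed
    with a positive outcome.
    """
    missing = [d for d in STEP_1_DOMAINS if d not in step_1_results]
    if missing:
        return "Early MAS-AI in progress; domains still missing: " + ", ".join(missing)
    if not all(step_1_results[d] for d in STEP_1_DOMAINS):
        return "Early MAS-AI completed with a negative outcome; do not proceed to step 2."
    return "Prescreening positive; proceed to step 2: " + ", ".join(STEP_2_DOMAINS)
```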
Summary of All Nine Domains
Table 1 shows a brief description of the content of all nine domains. It is important to mention that MAS-AI incorporates an existing checklist, the Checklist for Artificial Intelligence in Medical Imaging (CLAIM); see Mongan et al. (Reference Mongan, Moy and Kahn10). All forty-two items of the CLAIM guideline are incorporated into MAS-AI. The full description of all domains, including specific outcomes, can be found in Supplementary Material S1, which contains the complete MAS-AI guideline.
The information and data needed for the assessment of the nine domains will come from different sources. Information for the domains in the first step will often be available from the company that produces the AI solution, whereas the legal issues will typically require legal counsel from hospital staff. Data for the remaining domains in step 2 will primarily be supplied by the healthcare organization that is going to deploy the AI solution and/or by HTA experts. Supplementary Material S2 provides example cases illustrating how to use MAS-AI. An MAS-AI assessment will typically be around 5–10 pages, including a one-page executive summary.
Process Factors for an MAS-AI Assessment
The following five factors should be considered during the process of assessing an AI technology:
1. Assess the maturity: judge the potential for implementation in clinical practice through classification into development phases, that is, are we ready to move from step 1 (project phase) to step 2 (operation phase)?
2. Use multidisciplinary development with active participation from all stakeholders, and make a plan for when to involve which stakeholders.
3. Use a “devil’s advocate” process to counter hype and overpromising language in assessments of AI, for example, by including people in the assessment team who are skeptical toward the AI application.
4. The organization should have a guideline for implementation to ensure adaptation to, and integration into, existing real-world workflows and contexts.
5. Assessments should be repeated regularly during the AI deployment phase; decide in advance when the assessment should be revisited.
Discussion
To our knowledge, no evidence-based and holistic framework has yet been presented for assessing AI in medical imaging. We present MAS-AI as a structured approach to the assessment of AI technology in three parts: two steps covering nine domains, followed by process factors relevant to the MAS-AI assessment. Step 1 describes the patients, the development of the AI model, and initial ethical and legal considerations. Completing the four domains in step 1 is a prerequisite for moving to step 2. In step 2, a multidisciplinary assessment of the outcomes of the AI application is performed for the five remaining domains: safety, clinical aspects, economics, organizational aspects, and patient aspects. Lastly, the model includes five process factors to facilitate the evaluation process.
As stated in a recent review by our group (Reference Fasterholdt, Naghavi-Behzad and Rasmussen6), a multifaceted, structured process and tool are needed to facilitate the implementation of AI in the healthcare system and provide greater transparency. MAS-AI was developed based on HTA, a robust and well-known assessment approach for decision makers, with specific reference to the EUnetHTA framework (21). CLAIM, a similar and well-proven method, was also an important source of inspiration (Reference Mongan, Moy and Kahn10). Further, in contrast to other guidelines or frameworks (Reference Tsopra, Fernandez and Luchinat12;Reference Alami, Lehoux and Auclair22), the MAS-AI assessment model is built not only on concepts or viewpoints (e.g., experts’ opinions and consensus statements) but also on peer-reviewed evidence, interviews, and workshops. This approach ensures a high level of evidence combined with the relevant knowledge and expertise of stakeholders, decision makers, patients, and other experts. In addition, the workshop and interview participants were selected to reflect end-users and to support the interdisciplinary collaboration that AI evaluations call for.
In developing the model, we observed topic overlap, especially between the ethical, legal, and patient domains. Although significant efforts were invested in separating the domains, some overlap remains; a more structured approach, for example formal content mapping of the workshop outputs, could have reduced this problem. Further, health-related quality of life is considered a clinical effect/outcome in the HTA Core Model from EUnetHTA, although this outcome could also be placed in the patient domain, as in the Canadian “decision determinants” framework (Reference Krahn, Miller and Bayoumi23). Medical imaging is a broad term, which could be viewed as a limitation. However, in the field of telemedicine, which like AI covers a broad range of different technologies and approaches, it was possible to develop a common framework for valuing different types of telemedicine technologies (i.e., MAST). MAS-AI aims to be a broad framework, for example covering both supervised and unsupervised techniques. However, we acknowledge that local adaptation of the model could be necessary and that the model may need to be developed further for specific areas. The model is currently undergoing local validation as well as external validation in Canada.
One of the major strengths of MAS-AI is the team behind the model: an interdisciplinary group reflecting the complexity of AI (Reference Alami, Lehoux and Auclair22), covering all the identified domains in the model with dedicated experts within each field. Patients also took an active part in the development of MAS-AI, and one of them is a coauthor of this article. To our knowledge, MAS-AI is the first model that aims to cover all types of AI, including both supervised and unsupervised techniques.
Transferability and Perspectives
Medical imaging was chosen as the area of interest mainly because of the maturity of AI in medical imaging, ensuring a robust evidence-based model. Even so, most of the evidence was retrospective, with few prospective clinical studies, which limits the evidence base for the model’s clinical effectiveness, organizational, and economic domains. This could restrict the use of MAS-AI to medical imaging, although we believe that most domains have a high level of transferability to other AI healthcare areas. The domains with the least transferability are the ones including elements from CLAIM that are specific to medical imaging, that is, domains 1 and 2.
Further, decisions about which AI technologies to use and implement in health care may be structured differently and made at different decision levels across countries, which affects the transferability of MAS-AI. MAS-AI is primarily an assessment model whose main target group is decision makers in health care, for example, medical directors, heads of departments at hospitals, local or national treatment councils, and procurement organizations. However, developers, researchers, and clinicians could also use MAS-AI to guide the development, data collection, or research process. Further, the regulatory side, for example, policymakers from the government, HTA organizations, or other regional and national authorities, may also find parts of MAS-AI helpful. Thus, MAS-AI may provide input to evaluations across the entire lifespan of an AI technology. However, it is important to underline that MAS-AI is not intended as a “one-size-fits-all” evaluation model; if the AI application is less patient-critical, a less rigorous evaluation may be appropriate.
The next phase includes empirical tests of the usability of MAS-AI. A validation workshop has been conducted in Toronto with Canadian health care decision makers and policymakers, AI researchers, clinicians, and patient organizations. Preliminary, unpublished results from this workshop, based on a Delphi questionnaire on the perceived importance of the different types of information included in an MAS-AI assessment, indicate that MAS-AI is relevant in a Canadian context. Further research is planned to validate the framework in the Canadian context, exploring the context-specific aspects reflected in certain domains of the framework and the challenges of implementing it in the Canadian setting. Thus, the transferability of MAS-AI between Denmark and Canada will be thoroughly investigated. We also believe MAS-AI is sufficiently generic to be relevant for assessing other types of AI technologies in healthcare; however, this claim needs to be validated.
Conclusions
We present a holistic model for assessing artificial intelligence in medical imaging applications. This framework could provide a strong foundation for evaluation and help decision makers and other stakeholders make informed decisions when considering whether to implement AI technologies. Furthermore, we hope that MAS-AI will guide researchers and policymakers in conducting and evaluating AI research and help ensure that only technologies that provide value for money are implemented in healthcare systems globally.
Abbreviations
- AI, artificial intelligence
- CE, Conformité Européenne
- CLAIM, Checklist for Artificial Intelligence in Medical Imaging
- MAS-AI, Model for ASsessing the value of AI
- MDR, Medical Device Regulation
- QA, quality assurance
- QALY, quality-adjusted life year
- ROC, receiver operating characteristic curve
Acknowledgments
The authors thank the six interview respondents and the people who participated in the two workshops and contributed valuable input to MAS-AI, as well as Lise Kvistgaard Jensen for her contribution to the linguistic content of the article.
Supplementary Material
To view supplementary material for this article, please visit https://doi.org/10.1017/S0266462322000551.
Author Contributions
I.F., K.K., and B.S.B.R. conceived and designed the study. I.F., T.K., M.N.-B., K.K., and B.S.B.R. contributed to data collection, while all authors contributed to data analyses. All authors discussed the results and contributed to the final manuscript. All authors read and approved the final manuscript.
Funding Statement
This work was supported by an in-house fund at Odense University Hospital (Denmark) named “Konkurrencemidler” in Danish. The funder had no role in the design of the study and collection, analysis, and interpretation of data, in writing the manuscript, or in the decision to submit the article for publication.
Conflict of Interest
The authors declare that they have no conflicts of interest.
Data Availability Statement
The interviews and workshop material used during the current study are available from the corresponding author upon reasonable request.