Introduction
Clinical intuition is commonly characterised as a “feeling”. This feeling, a subconscious sense that a pattern diverges from what is expected, can be applied to the diagnosis of complex illnesses or to anticipating clinical deterioration. Clinical intuition is derived from repeated exposure to similar events that are stored in the human brain over time. This library of events fine tunes intuition so that, when a future clinical event occurs, the clinician can anticipate or predict what will happen next. Humans are relatively efficient at taking multidimensional data into account (e.g., laboratory results, monitor data, and diagnostic imaging). Similarly, machine learning can closely replicate human intuition and support the multilayered reasoning that goes into diagnosing complex illnesses or predicting clinical deterioration. Unlike clinicians, however, it is not swayed by the recent experiences and cognitive biases that so often cloud human judgement, although it can inherit biases present in the data used to train it.
Machine learning is a discipline at the intersection of mathematics, statistics, and computer science that provides a powerful catalogue of techniques for making predictions about future events. Machine learning involves training a computer algorithm on historical data stored in a large data set so that it “learns” how to make such predictions. Advances in computational power, which allow large and complex reference data to be processed at high speed, have driven the exponential growth of machine learning applications in healthcare in recent years. Such applications are conceptually different from computerised algorithms based on classical statistical modelling. The latter follow a rule-based logic in which a programmer writes a set of conditional statements, derived from domain knowledge, to automate “human-like” clinical decision making (e.g., if body mass index > 30 then class = “obese”). In contrast, a machine learning algorithm “learns” its modelling parameters from historical data to develop decision rules for future predictions. Because this learning process is not constrained by existing domain knowledge, it can uncover hidden patterns in the data that might not be obvious to humans. Furthermore, because machine learning algorithms use a data-driven logic independent of that of clinicians, they have been shown to outperform clinicians in some scenarios. Reference Gharehbaghi, Dutoit, Sepehri, Kocharian and Lindén1 Figure 1 summarises the machine learning pipeline, emphasising its role in “data-driven” decision making.
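The contrast between the two kinds of logic can be sketched in a few lines of Python. This is illustrative only: the function names, the toy “risk score” values, and their outcome labels are hypothetical and do not come from any study in this review. The first function hard-codes a threshold chosen by a programmer, as in the body mass index example; the second “learns” a threshold from labelled historical data by choosing the cut-off that misclassifies the fewest training cases (a one-variable decision stump, arguably the simplest learned model).

```python
def rule_based_class(bmi: float) -> str:
    """Rule-based logic: the threshold (30) is fixed in advance from domain knowledge."""
    return "obese" if bmi > 30 else "not obese"


def learn_threshold(values: list[float], labels: list[int]) -> float:
    """Data-driven logic: pick the cut-off t that minimises training errors
    for the learned rule 'value > t -> class 1'."""
    ordered = sorted(set(values))
    # Candidate cut-offs lie halfway between neighbouring observed values.
    candidates = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])] or ordered

    def training_errors(t: float) -> int:
        return sum((v > t) != bool(y) for v, y in zip(values, labels))

    return min(candidates, key=training_errors)


# Hypothetical historical data: a risk score and whether the outcome occurred (1) or not (0).
scores = [10, 12, 15, 21, 24, 30]
outcomes = [0, 0, 0, 1, 1, 1]
threshold = learn_threshold(scores, outcomes)
print(threshold)  # 18.0, halfway between the highest negative and lowest positive case
```

Real machine learning models learn many parameters at once from many variables, but the principle is the same: the decision rule comes from the data rather than from a programmer.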
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20211118182600908-0998:S1047951121004212:S1047951121004212_fig1.png?pub-status=live)
Figure 1. Summary of a typical Machine Learning Pipeline Application.
*Machine Learning (ML), Principal Component Analysis (PCA).
Machine learning has been successfully applied in many fields of medicine, such as advanced cardiac imaging. Reference Yasaka and Abe2 In this context, medical images are processed and compiled to extract features that the algorithm uses to fine tune its classification of outcomes. The majority of advances in machine learning applications for medical imaging (e.g., artifact removal and improved disease classification accuracy) have been developed in adult populations. Reference Bien, Rajpurkar and Ball3–Reference Zech, Badgeley, Liu, Costa, Titano and Oermann8
Neonatal and paediatric populations, especially the most vulnerable such as those with CHD, could benefit greatly from the improved diagnostic accuracy and earlier disease detection that machine learning may offer. Approximately 1 in 100 live births in the United States each year is affected by CHD, and nearly 7,200 of these infants have critical CHD. Reference Oster, Lee, Honein, Riehle-Colarusso, Shin and Correa9–12 These life-threatening structural malformations of the heart are present at birth and require intervention in the first year of life. 13,Reference Ailes, Gilboa and Riehle-Colarusso14 Delayed diagnosis in neonates with CHD, and limited access to specialised cardiac programmes, could result in preventable morbidity or mortality in some cases. Securing access to specialised neonatal or paediatric cardiac programmes pre-emptively, although challenging, has been shown to play an instrumental role in decreasing the risk of CHD infant mortality. Reference Udine, Burns, Pearson and Kaltman15 The benefits of machine learning observed elsewhere in healthcare are promising for optimising the timing and accuracy of CHD diagnoses, thereby providing early, targeted access to highly specialised cardiac care. The objective of this scoping review is to describe the application and clinical utility of machine learning techniques used for diagnosing and assessing underlying critical and non-critical CHD. In this review, we briefly define the emerging research areas in which machine learning has been applied in paediatric cardiology, describe the machine learning techniques used in each area, and summarise the specific applications used for the diagnosis and assessment of critical and non-critical CHD.
Materials and methods
Literature search strategy
We followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews guidelines. Reference Tricco, Lillie and Zarin16 All original, peer-reviewed studies indexed in the PubMed database and published between January 2015 and February 2021 that described the use of machine learning or predictive analytics for predicting diagnostic outcomes in patients with critical or non-critical CHD were included. The most recent search was conducted on 20 February 2021 using the search terms “(machine learning) AND ((congenital heart disease) OR (cardiovascular disease in children))”. Studies focused on populations without a CHD diagnosis were excluded, and the search was limited to English-language articles. The search yielded 219 journal articles. After screening titles and abstracts, we excluded 169 irrelevant articles. Of the 50 full-text articles reviewed, 20 focusing on CHD diagnosis and assessment were retained for inclusion in this scoping review. Figure 2 summarises the workflow for article search and selection.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20211118182600908-0998:S1047951121004212:S1047951121004212_fig2.png?pub-status=live)
Figure 2. Preferred Reporting Items for Systematic Reviews and Meta-Analyses Flow Diagram of Included Articles.
Data coding scheme
Full-text articles were analysed using the matrix method as recommended by Whittemore and Knafl. Reference Garrard17 A single reviewer first sorted the articles into a table in ascending chronological order using eight domains: journal/author information, purpose, design, sample, variables, results, limitations, and implications for future research. These domains were selected after discussion among the coauthors, and the information related to each domain was then abstracted by a single reviewer. In addition to these general domains, we defined a data dictionary for the machine learning elements abstracted for this study. These elements included the class of machine learning approach (supervised vs. unsupervised, regression vs. classification, traditional learning vs. deep learning); the specific algorithm used (logistic regression, support vector machine, etc.); the techniques used for testing and cross-validation; and the use of independent external validation.
Synthesis of findings
Results were reported using the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews guidelines. Reference Tricco, Lillie and Zarin16 Each article was assessed for its standard of documentation using the Minimum Information About Clinical Artificial Intelligence Modeling checklist. Reference Norgeot, Quer and Beaulieu-Jones18 Simple descriptive statistics and pie charts were used to report the frequencies of the different machine learning algorithms applied across eligible studies. Next, the content of the abstracted domains of eligible studies was qualitatively synthesised. Based on domain expertise and discussions among the coauthors, three categories of machine learning applications for the diagnosis and assessment of CHD emerged, and results were summarised for each category.
Results
There were 50 studies that broadly focused on the application of machine learning in paediatric cardiology research. These studies were categorised by their intended clinical use (Figure 3): diagnosis and assessment of underlying critical and non-critical CHD (n = 20), Reference Gharehbaghi, Dutoit, Sepehri, Kocharian and Lindén1,Reference Diller, Lammers and Babu-Narayan19–Reference Lv, Dong and Lei37 prediction and risk stratification of outcomes in CHD (n = 15), Reference Ruiz, Saenz and Lopez-Magallon38–Reference Cainelli, Bisiacchi and Cogo52 management of patients with CHD (n = 2), Reference Diller, Kempny and Babu-Narayan53,Reference Wolf, Lee and Nicolson54 medical device research (n = 4), Reference Wang, Javadekar and Rajagopalan55–Reference Liu, Aslan and Hess58 novel genetics and biomarkers in CHD (n = 5), Reference Liu, Zhao and Yuan59–Reference Qi, Zhang and Zhao63 CHD in pregnancy (n = 3), Reference Ren, Zhu and Gao64–Reference Dozen, Komatsu and Sakai66 and social media research (n = 1). Reference Klein, Sarker, Cai, Weissenbacher and Gonzalez-Hernandez67 The distribution of machine learning algorithms used in these studies is summarised in Figure 4. The two most common algorithms were deep neural networks (deep learning) and support vector machines, and the least common were hidden Markov models and linear discriminant analysis.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20211118182600908-0998:S1047951121004212:S1047951121004212_fig3.png?pub-status=live)
Figure 3. Distribution of Machine Learning Applications and Uses in Pediatric Cardiology Research.
*Congenital Heart Defect (CHD).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20211118182600908-0998:S1047951121004212:S1047951121004212_fig4.png?pub-status=live)
Figure 4. Distribution of Machine Learning Algorithms Used in Pediatric Cardiology Research in General.
Twenty studies focused on the diagnosis and assessment of critical and non-critical CHD. Table 1 summarises the details of these studies, including purpose, design, sample, machine learning technique, and primary findings. All studies were observational and generally had small sample sizes. None of the studies reported all of the Minimum Information About Clinical Artificial Intelligence Modeling checklist items (standard documentation guidelines). Reference Norgeot, Quer and Beaulieu-Jones18 The included studies applied machine learning to the auscultation of heart sounds in patients with CHD, the interpretation of transthoracic echocardiogram data, or the processing of medical images (cardiovascular MRI). As shown in Figure 5, deep neural networks and support vector machines were also the most commonly used classification algorithms in these studies. More importantly, Table 1 shows that, when evaluated with cross-validation on existing retrospective data, the overall accuracy of the various machine learning algorithms exceeded 80%, with some techniques reaching 95 to 100%. Only two of these studies validated their models externally on independent data sets. The remainder of this scoping review focuses on a qualitative synthesis of these 20 studies.
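To illustrate why cross-validated accuracy on retrospective data is not the same as external validation, the following Python sketch estimates accuracy with k-fold cross-validation: the data are split into k folds, the model is trained on k - 1 folds and tested on the held-out fold, and the process repeats until every case has been tested once. The one-variable threshold classifier and the toy data are hypothetical stand-ins for the far more complex models in Table 1. Every fold still comes from the same data set, so the estimate says nothing about performance at a different hospital or on a different scanner; that is what independent external validation tests.

```python
import random


def k_fold_indices(n: int, k: int, seed: int = 0) -> list[list[int]]:
    """Shuffle case indices once, then deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_threshold(xs, ys):
    """Hypothetical toy model: the cut-off with the fewest training errors."""
    ordered = sorted(set(xs))
    cuts = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])] or ordered
    return min(cuts, key=lambda t: sum((x > t) != bool(y) for x, y in zip(xs, ys)))


def cross_validated_accuracy(xs, ys, k=5, seed=0):
    correct = 0
    for test_idx in k_fold_indices(len(xs), k, seed):
        held_out = set(test_idx)
        train = [i for i in range(len(xs)) if i not in held_out]
        # The model never sees the held-out fold during training.
        t = fit_threshold([xs[i] for i in train], [ys[i] for i in train])
        correct += sum(int(xs[i] > t) == ys[i] for i in test_idx)
    return correct / len(xs)


# Hypothetical single-centre data: one feature per patient, label 1 = disease present.
features = [1, 2, 3, 4, 5, 11, 12, 13, 14, 15]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
print(cross_validated_accuracy(features, labels, k=5))  # 1.0 on this cleanly separated toy data
```

A perfect cross-validated score on clean, single-source data like this is exactly the situation in which external validation is most informative.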
Table 1. Summary of studies focusing on machine learning techniques applied to diagnosing and assessing CHD in neonates, infants, and children
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20211118182600908-0998:S1047951121004212:S1047951121004212_tab1.png?pub-status=live)
* Area Under the Curve (AUC), Atrial Septal Defect (ASD), Atrioventricular (AV), Bicuspid Aortic Valve (BAV), Cardiovascular Magnetic Resonance Imaging (CMR), Coarctation of the aorta (CoA), Confidence Interval (CI), Congenital Heart Defects (CHD), Convolutional Neural Network (CNN), Deep Neural Network (DNN), Interquartile Range (IQR), Left Ventricle (LV), Linear Discriminant Analysis (LDA), Mitral Regurgitation (MR), Negative Predictive Value (NPV), Patent Ductus Arteriosus (PDA), Positive Predictive Value (PPV), Principal Component Analysis (PCA), Right Ventricle (RV), Root Mean-Square Error (RMSE), Signal-to-Noise Ratio (SNR), Standard Deviation (SD), Structural Similarity Index (SSIM), Support Vector Machine (SVM), Time Growing Neural Network (TGNN), Tricuspid Regurgitation (TR), Ventricular Septal Defect (VSD).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20211118182600908-0998:S1047951121004212:S1047951121004212_fig5.png?pub-status=live)
Figure 5. Distribution of Machine Learning Algorithms Used in Pediatric Cardiology Research Focusing on the Diagnosis and Assessment of Critical and Non-Critical CHD
The most common neural network types were the convolutional neural network, time growing neural network, autoencoder, and generative adversarial network.
The most common kernel among the studies that used support vector machines was the Gaussian (radial basis function) kernel.
*Congenital Heart Defect (CHD).
Synthesis of literature
Machine learning in paediatric cardiology research is clearly an evolving field. Diagnosing and assessing patients with CHD can typically be done non-invasively, using examination findings and diagnostic tools based on auditory or visual pattern recognition. However, such medical images and signal data are considered unstructured: they are not stored as tabular data or in formatted fields. Reference Kagiyama, Shrestha, Farjo and Sengupta68 These types of data have historically required time-consuming clinician review and interpretation. Thus, in individuals with CHD, the machine learning techniques described in Table 1 have focused on accurate diagnosis and assessment based on the classification of auscultatory heart sounds, transthoracic echocardiograms, or cardiovascular magnetic resonance images.
Auscultation
The aortic valve is normally tricuspid and provides the outlet for blood flow from the left ventricle to the body. Bicuspid aortic valves occur in 0.5 to 2% of children. Reference Gharehbaghi, Dutoit, Sepehri, Kocharian and Lindén1,Reference Mahle, Sutherland and Frias69–Reference Spaziani, Ballo and Favilli72 A subset of these patients develop progressive valve disease and/or aortic dilatation, with a risk of life-threatening aortic aneurysm and dissection. Children with known bicuspid aortic valves must therefore be monitored throughout their lives for aortic aneurysm. Bicuspid aortic valves can be diagnosed using phonocardiography, which records the sounds produced by the heart over successive cardiac cycles. Reference Gharehbaghi, Dutoit, Sepehri, Kocharian and Lindén1 Patients with a bicuspid aortic valve typically have a systolic ejection click, but identifying this click through traditional auscultation can be limited by the examiner's expertise and skill, as well as by the rapid heart rates of young children. Gharehbaghi and colleagues collected phonocardiogram data prospectively. Reference Gharehbaghi, Dutoit, Sepehri, Kocharian and Lindén1 They created a statistical time growing neural network that automatically classified bicuspid aortic valves from the heart sounds recorded by the phonocardiogram. Reference Gharehbaghi, Dutoit, Sepehri, Kocharian and Lindén1 The phonocardiogram recordings were preprocessed so that each cardiac cycle could be segmented to recognise the additional heart sound (systolic ejection click). This segmentation was used to build a classifier that distinguished healthy subjects from those with a bicuspid aortic valve. The model classified the 865 cardiac cycles with 98.5% accuracy. Improved detection of this subtle examination finding could bring many previously unrecognised children and adults to appropriate cardiac care.
Several other investigators are also using non-invasively recorded heart sounds to diagnose cardiac disease. Reference Aziz, Khan, Alhaisoni, Akram and Altaf29,Reference Gharehbaghi, Sepehri and Babic30,Reference Gómez-Quintana, Schwarz and Shelevytsky34 For example, Elgendi and colleagues Reference Elgendi, Bobhate and Jain27 used digital auscultation to record the distinctive vibrations of the hypertensive pulmonary circulation and applied linear discriminant analysis to detect pulmonary arterial hypertension. Their model performed well, with a sensitivity of 84% and a specificity of 88.6% when using the entropy (a measure of disorder in the heart sound pattern) of the first sinusoid formant (a resonant frequency of the heart sounds). Sun and colleagues Reference Sun and Wang20 aimed to diagnose small, medium, and large ventricular septal defects based on features extracted from heart sounds, using classification boundary curves and ellipse models. The ellipse model outperformed the five other models in the study, with classification accuracies of 99%, 95.5%, 92.1%, and 96.2% for normal subjects and for small, medium, and large ventricular septal defects, respectively. Cardiac auscultation is nuanced, but these results suggest that machine learning holds promise for improving diagnostic accuracy for providers at all experience levels.
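The studies above report performance as sensitivity (the proportion of diseased patients the model detects), specificity (the proportion of healthy patients it correctly clears), and accuracy (the proportion of all cases classified correctly). A minimal Python sketch of how these are computed from confusion-matrix counts follows; the labels are invented for illustration and are not data from the cited studies.

```python
def diagnostic_metrics(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Sensitivity, specificity and accuracy for a binary test (1 = disease present)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)  # diseased, flagged
    fn = sum(t == 1 and p == 0 for t, p in pairs)  # diseased, missed
    tn = sum(t == 0 and p == 0 for t, p in pairs)  # healthy, cleared
    fp = sum(t == 0 and p == 1 for t, p in pairs)  # healthy, flagged
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }


# Hypothetical labels for 10 recordings.
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(diagnostic_metrics(truth, predicted))  # sensitivity 0.75, specificity ~0.83, accuracy 0.8
```

Because accuracy pools both classes, it can look high in a data set dominated by healthy subjects even when sensitivity is poor, which is one reason studies often report all three.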
Transthoracic echocardiogram
Echocardiography is the mainstay of non-invasive assessment for CHD, but it requires experienced reviewers, who remain susceptible to biases. Coarctation of the aorta, a common form of critical CHD characterised by narrowing of the thoracic aorta, is particularly well suited to echocardiographic diagnosis, although its images can be challenging to interpret. Reference Pereira, Bueno and Rodriguez23 Narrowing of the aorta obstructs normal blood flow to the body and forces the left ventricle to generate excessive pressure. If the obstruction is not diagnosed in a timely manner, it can lead to heart failure and poor systemic perfusion. Pereira and colleagues Reference Pereira, Bueno and Rodriguez23 retrospectively collected 2-dimensional echocardiographic images of the aortic arch to develop a fully automated algorithm for detecting coarctation of the aorta from standard view planes (suprasternal, apical, and parasternal windows). Static frames from a single cardiac cycle (end-diastolic and end-systolic phases) were pre-selected for model development. The sample included neonates born with isolated coarctation of the aorta and healthy neonates born without coarctation. A stacked denoising autoencoder neural network extracted features over predefined image regions (sectors), and a support vector machine classifier was trained on a random subset of the training data features. The parasternal long axis view had the lowest error rate for coarctation cases (7.7 at end diastole and 11.5 at end systole), and the apical view had the lowest error rate for healthy cases (20.0 at both end diastole and end systole). Reference Pereira, Bueno and Rodriguez23 Combining the views resulted in more undecided cases.
This increase in undecided cases when multiple views are included is perhaps not surprising: combining views increases model complexity, which can make it harder for the model to commit to a classification.
Further research has focused on image denoising, automatic detection, and clustering of subjects based on quantitative image data to support clinical decision making and the diagnosis of critical and non-critical CHD. Reference Wang, Liu and Wang33 Diller et al. Reference Diller, Lammers and Babu-Narayan19 aimed to remove the acoustic shadowing artifacts that occur during transthoracic echocardiography using a deep neural network and an autoencoder. Cross-entropy (a loss function that measures the difference between two probability distributions, here the original versus the reconstructed image) Reference Diller, Lammers and Babu-Narayan19,Reference Rubinstein73 and the sum of squared differences (a pixel-wise measure of the discrepancy between the original and reconstructed images) were the performance metrics evaluated on the test data set. The autoencoders extracted features and trained significantly better on CHD samples than on healthy samples, as reflected by lower values of both metrics (cross-entropy 0.2597 ± 0.0327; squared difference 118.86 ± 61.52). Reference Diller, Lammers and Babu-Narayan19 Finally, Meza et al. Reference Meza, Slieker and Blackstone21 used unsupervised hierarchical cluster analysis in 651 neonates with critical left heart obstruction to determine whether subjects could be grouped into more clinically meaningful clusters. Using data derived from transthoracic echocardiograms, three distinct groups emerged (n = 215, 338, and 98). Aortic valve atresia and left ventricular end-diastolic volume variables significantly distinguished the groups. Median left ventricular end-diastolic area for groups 1, 2, and 3 was 1.35, 0.69, and 2.47 cm² (p < 0.0001), and aortic atresia was present in 11%, 87%, and 8% of these groups, respectively (p < 0.0001). Reference Meza, Slieker and Blackstone21 The authors suggest that clustering analyses yield a more reliable delineation of subjects' heart structure characteristics. These data suggest that clustering approaches could support more accurate diagnosis of CHD.
The added complexity and volume of data in diagnostic imaging pose a challenge to machine learning, but these investigators demonstrate the potential value of machine learning for clinical pattern recognition and decision making.
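The two reconstruction metrics used by Diller and colleagues can be illustrated on a flattened image. The Python sketch below treats each image as a list of pixel intensities scaled to [0, 1] and computes the mean pixel-wise binary cross-entropy and the sum of squared differences between an original and a reconstructed image. This is one common way to define these quantities for autoencoders; the exact formulation, scaling, and averaging used in the cited study may differ, and the pixel values are made up.

```python
import math


def binary_cross_entropy(original, reconstructed, eps=1e-12):
    """Mean pixel-wise cross-entropy between target intensities and reconstructed
    intensities, both assumed to lie in [0, 1]. Lower is better."""
    total = 0.0
    for x, r in zip(original, reconstructed):
        r = min(max(r, eps), 1 - eps)  # clip to avoid log(0)
        total -= x * math.log(r) + (1 - x) * math.log(1 - r)
    return total / len(original)


def sum_squared_differences(original, reconstructed):
    """Sum of squared pixel-wise differences. Lower is better."""
    return sum((x - r) ** 2 for x, r in zip(original, reconstructed))


# Hypothetical 2 x 2 image, flattened row by row, and two candidate reconstructions.
image = [0.0, 1.0, 1.0, 0.0]
good = [0.1, 0.9, 0.8, 0.2]
poor = [0.5, 0.5, 0.5, 0.5]
for name, recon in [("good", good), ("poor", poor)]:
    print(name, binary_cross_entropy(image, recon), sum_squared_differences(image, recon))
```

Both metrics rank the closer reconstruction as better; cross-entropy penalises confident mistakes more heavily, while the squared difference grows with the overall size of the errors.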
Cardiovascular MRI
Cardiovascular MRI is considered the clinical gold standard for accurate assessment of ventricular volumes and function. Obtaining high-quality images requires repeated breath holding, which can be challenging for patients with CHD who may be short of breath at baseline or who are simply too young to follow breath-holding instructions. Hauptmann and colleagues explored alternative ways to denoise cardiovascular magnetic resonance images captured during free breathing. Specifically, they aimed to use real-time imaging while applying reconstruction techniques to remove artifact (artifact versus artifact-free images). Reference Hauptmann, Arridge, Lucka, Muthurangu and Steeden28 The performance of their convolutional neural network was evaluated across varying signal-to-noise ratios, acceleration factors, and image cropping. The root-mean-square error and the structural similarity index were used to evaluate the accuracy of reconstructed images in the test data set. The continuously rotating tiny golden angle sampling pattern had the lowest root-mean-square error and the highest structural similarity index compared with all other sampling methods (p < 0.0001). The signal-to-noise ratio decreased from 20 dB to 10 dB, and the acceleration factor increased from 10x to 16x. Finally, reconstructing all slices from raw data was 5.6 times faster with the convolutional neural network (22 seconds). Reference Hauptmann, Arridge, Lucka, Muthurangu and Steeden28 This study demonstrates that machine learning models have the potential to remove artifact while shortening reconstruction times, producing better-quality images and more accurate measurements in real time.
More importantly, if these models become widely used, they could improve patient comfort or reduce the need for sedation in infants and younger children undergoing cardiovascular MRI, because frequent breath holding would no longer be required.
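The image-quality measures mentioned above can be sketched in Python for two flattened images of equal size. The SSIM function here is a simplified, global version computed over the whole image at once, using the usual stabilising constants C1 = (0.01 L)² and C2 = (0.03 L)² for a dynamic range L; standard SSIM implementations instead average the index over small sliding windows, so their values will differ. The SNR function uses one common definition (mean signal power over mean noise power, in decibels), which the cited study may not share. All pixel values are hypothetical.

```python
import math


def rmse(reference, test):
    """Root-mean-square error between two equally sized images. Lower is better."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(reference, test)) / len(reference))


def global_ssim(reference, test, data_range=1.0):
    """Structural similarity computed once over the whole image (no sliding window).
    Returns 1.0 for identical images; higher is better."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    n = len(reference)
    mu_x = sum(reference) / n
    mu_y = sum(test) / n
    var_x = sum((x - mu_x) ** 2 for x in reference) / n
    var_y = sum((y - mu_y) ** 2 for y in test) / n
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(reference, test)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )


def snr_db(clean, noisy):
    """Signal-to-noise ratio in decibels: 10 * log10(signal power / noise power)."""
    noise = [y - x for x, y in zip(clean, noisy)]
    signal_power = sum(x * x for x in clean) / len(clean)
    noise_power = sum(e * e for e in noise) / len(noise)
    return 10 * math.log10(signal_power / noise_power)


reference = [0.2, 0.4, 0.6, 0.8]
reconstruction = [0.25, 0.35, 0.65, 0.75]
print(rmse(reference, reconstruction), global_ssim(reference, reconstruction))
print(snr_db([1.0, -1.0, 1.0, -1.0], [1.1, -0.9, 1.1, -0.9]))  # about 20 dB
```

RMSE only measures pixel-wise error, whereas SSIM also compares local means, contrast, and correlation, which is why reconstruction studies often report both.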
Further, clustering subjects based on their cardiovascular MRI data is a common approach in the literature. Although cardiac imaging is already established as a standard and definitive way to diagnose complex heart disease, investigators are now using machine learning applications that can unveil hidden patterns in these data to boost diagnostic accuracy. Reference Karimi-Bidhendi, Arafati, Cheng, Wu, Kheradvar and Jafarkhani31,Reference Lu, Fu, Li and Qi32,Reference Tandon, Mohan and Jensen35 Bruse and colleagues Reference Bruse, Zuluaga and Khushnood24 used agglomerative hierarchical clustering and principal component analysis to detect clinically meaningful shape clusters in anatomical cardiovascular MRI data. Subjects with and without surgically corrected coarctation of the aorta and subjects with healthy aortic arches were included. For each cross-validated run, 83% of healthy aortic arches were assigned to the healthy group, 85% of the coarctation shapes were correctly assigned, and 100% of the surgically corrected shapes were accurately assigned. Ideally, clinicians would use the results provided by the machine learning model to compare with or validate their own interpretation of the image data. This suggests that clustering techniques have the potential to inform clinical decision making at the time of diagnosis, thus improving accuracy and efficiency.
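A minimal Python sketch of agglomerative hierarchical clustering follows. Each case starts as its own cluster, and the two closest clusters are merged repeatedly until the requested number remains. Closeness here is measured by average linkage (the mean Euclidean distance between members of the two clusters); the linkage and distance choices used by Bruse and colleagues may differ, and this sketch does not reproduce their pipeline. The two-dimensional points are hypothetical and could be read as, for example, the first two principal component scores of each aortic arch shape.

```python
import math


def agglomerative_clustering(points, n_clusters):
    """Bottom-up clustering with average linkage; returns lists of point indices."""
    clusters = [[i] for i in range(len(points))]

    def average_linkage(a, b):
        dists = [math.dist(points[i], points[j]) for i in a for j in b]
        return sum(dists) / len(dists)

    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest average inter-point distance.
        _, i, j = min(
            (average_linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Hypothetical shape descriptors (e.g., two principal component scores per subject).
shapes = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
print(agglomerative_clustering(shapes, n_clusters=2))
```

This naive loop scales poorly with the number of cases and is meant only to show the mechanics; in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would normally be used, and the full merge history (dendrogram) would be inspected rather than fixing the number of clusters in advance.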
Finally, cardiac segmentation of cardiovascular magnetic resonance images takes a complex multidimensional image of the heart and separates it into its major structures, for example, the ventricles and coronary arteries. Segmentation is necessary so that each ventricle or vessel can be assessed quantitatively; in particular, ventricular mass, volume, and ejection fraction can then be measured. Segmentation driven by deep learning, specifically by convolutional neural networks, requires large amounts of training data. Research studies aiming to optimise deep learning-driven cardiac segmentation in patients with CHD face data accessibility challenges related to the extreme heterogeneity of cardiac anatomy and the rarity of disease within the CHD population. Generative adversarial networks learn from real images in order to generate synthetic image data. A generative adversarial network consists of two networks that compete with one another: a generator network creates synthetic images, and a discriminator network attempts to distinguish real images from synthetic ones. Reference Chen, Qin and Qiu74 Through this competition, new synthetic images are produced that can increase the size of training data sets in populations with limited image data. Reference Zeleznik, Weiss and Taron75 Investigators used these methods to create synthetic image data for patients with tetralogy of Fallot, a rare heart condition in the general population but one of the most common forms of critical CHD, and found the images to be anatomically accurate. Reference Diller, Vahle and Radke76 Generative adversarial networks may have major implications for CHD diagnostic imaging research, as they can potentially expand training data sets and allow models to predict rare and life-threatening diseases.
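The competition between the two networks is usually written as a minimax objective. In the standard formulation, the discriminator D outputs its estimated probability that an image x is real, and the generator G maps random noise z to a synthetic image G(z):

$$
\min_{G}\,\max_{D}\; V(D,G) \;=\; \mathbb{E}_{x\sim p_{\text{data}}}\!\left[\log D(x)\right] \;+\; \mathbb{E}_{z\sim p_{z}}\!\left[\log\!\left(1-D(G(z))\right)\right]
$$

The discriminator is trained to maximise V (labelling real images as real and synthetic images as synthetic), while the generator is trained to minimise it (producing images that the discriminator mistakes for real ones). Specific GAN variants, including any used in the cited CHD studies, may modify this objective.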
Discussion
This scoping review demonstrates that machine learning is a rapidly evolving area of paediatric cardiology research with many potential applications. The largest group of these applications focused on the diagnosis and assessment of underlying CHD, including the classification of auscultatory heart sounds, transthoracic echocardiograms, or cardiovascular magnetic resonance images.
Deep neural networks and support vector machines were the most commonly used algorithms for these tasks. Deep neural networks are popular for analysing human data because they are robust and can handle inconsistent data. They are loosely inspired by the way the human brain processes, stores, and retrieves information. Deep neural networks are extremely complex and often not interpretable by clinicians; therefore, their utility in clinical decision-making tasks is somewhat controversial. Support vector machines, on the other hand, are a popular technique for binary outcome classification (e.g., diseased versus healthy). A support vector machine separates classes by finding the boundary (hyperplane) that maximises the margin, that is, the distance between the boundary and the closest data points of each class. For healthcare-related classification problems, support vector machines can achieve high accuracy and are relatively robust to multicollinearity (highly correlated features), which is common in human data because many features are highly correlated (e.g., systolic, diastolic, and mean blood pressure values). Despite these advantages, support vector machines are computationally intensive, and non-linear (kernel) support vector machines are more demanding than linear ones.
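The margin idea can be made concrete with a linear soft-margin support vector machine. The Python sketch below minimises the usual objective, (λ/2)‖w‖² plus the mean hinge loss max(0, 1 - y(w·x + b)), by plain full-batch subgradient descent. This training method, the hyperparameter values, and the toy data are assumptions chosen for brevity; established libraries typically use dedicated quadratic-programming or coordinate-descent solvers, and a kernel SVM (e.g., with a Gaussian kernel) is not shown. Only points inside the margin or on the wrong side of the boundary contribute to each update, which is why the final boundary is determined by a small set of “support vectors”.

```python
def train_linear_svm(xs, ys, lam=0.01, lr=0.1, epochs=200):
    """Soft-margin linear SVM via full-batch subgradient descent.
    xs: list of feature vectors; ys: labels in {-1, +1}.
    Minimises lam/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))."""
    n, dim = len(xs), len(xs[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regulariser
        grad_b = 0.0
        for x, y in zip(xs, ys):
            margin = y * (sum(wj * xj for wj, xj in zip(w, x)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge loss is active
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical two-feature data: +1 = diseased, -1 = healthy.
X = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(X, labels)
print([predict(w, b, x) for x in X])
```

The regularisation weight λ trades margin width against training errors; a kernel SVM replaces the dot product with a kernel function to obtain non-linear boundaries, at the extra computational cost noted above.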
Implementing and translating machine learning results into clinically meaningful tools is a necessary path to improving diagnostic accuracy and efficiency.
Importantly, techniques exist to improve the interpretability of deep neural networks, one of the most commonly used algorithm types for diagnosing and assessing CHD. Heat maps are one example; they can help ensure that models are capturing valid image signatures, further expanding clinical usefulness. The integrated gradient method is another technique that explains how a deep neural network predicted an outcome by visualising the importance of each input feature. Local interpretable model-agnostic explanations (LIME) can be applied to convolutional neural networks to discover what the network has learned when deriving predictions, providing more interpretability to clinical end-users. Investigators used this technique after comparing the diagnostic performance of a convolutional neural network with that of trained cardiologists and MUSE (GE Healthcare) automated analysis; it highlighted the physiologically relevant electrocardiogram segments chosen by the convolutional neural network when predicting diagnostic classes. Reference Hughes, Olgin and Avram77 These techniques serve a purpose analogous to feature importance rankings in random forests or coefficients in linear support vector machines. They can be applied to validate and corroborate that a model is predicting from a clinically meaningful signal, further supporting interpretability for clinical end-users. To garner clinician support, algorithm outputs must be relevant and trustworthy. If these models are implemented, clinicians will use them to distinguish healthy individuals from those with disease, and they will want to understand how the classification results are derived. Achieving this understanding will support the translation of machine learning predictions into practice and their clinical uptake.
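As an illustration of the integrated gradient method, the Python sketch below attributes the output of a toy logistic model to its input features. Integrated gradients average the model's gradient along a straight path from a baseline input (here all zeros) to the actual input and multiply by the change in each feature; the attributions then sum, approximately, to the difference between the model's output at the input and at the baseline. The model, its weights, and the feature values are hypothetical, and the gradient is written out by hand only because the model is simple; applying the method to a deep neural network would rely on automatic differentiation in a deep learning framework.

```python
import math


def predict_proba(x, w, b):
    """Toy logistic model: probability of disease for feature vector x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def gradient(x, w, b):
    """Analytic gradient of predict_proba with respect to each input feature."""
    p = predict_proba(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(x, baseline, w, b, steps=200):
    """Midpoint Riemann-sum approximation of integrated gradients."""
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # position along the baseline-to-input path
        point = [bl + alpha * (xi - bl) for xi, bl in zip(x, baseline)]
        for i, g in enumerate(gradient(point, w, b)):
            avg_grad[i] += g / steps
    return [(xi - bl) * g for xi, bl, g in zip(x, baseline, avg_grad)]


weights, bias = [2.0, -1.0, 0.5], -0.5  # hypothetical model parameters
patient = [1.0, 2.0, 3.0]               # hypothetical feature values
baseline = [0.0, 0.0, 0.0]
attributions = integrated_gradients(patient, baseline, weights, bias)
print(attributions)
# Completeness check: attributions should sum to f(patient) - f(baseline).
print(sum(attributions), predict_proba(patient, weights, bias) - predict_proba(baseline, weights, bias))
```

Positive attributions push the prediction towards disease and negative ones away from it, which is the kind of per-feature explanation a clinician could check against physiological expectations.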
To date, machine learning has been explored and successfully applied to support clinicians in many ways, including early recognition of cardiorespiratory instability. Bose and colleagues found that, among 634 individuals with in-hospital cardiac arrest, 79% also had cardiorespiratory instability 4 to 24 hours before the arrest. Reference Bose, Hoffman and Hravnak78,Reference Kause, Smith and Prytherch79 Predicting the risk of cardiorespiratory instability matters because patients who experience this type of instability can, if it is not addressed, progress to in-hospital cardiac arrest. However, if cardiorespiratory instability is recognised early, cardiac arrest may be prevented.
Machine learning-based technologies have the potential not only to advance expert care at tertiary centres but also to provide access to quality care in remote areas without local subspecialty expertise. Udine and colleagues found that CHD infant mortality was associated with higher poverty levels. Reference Udine, Burns, Pearson and Kaltman15 The American Academy of Family Physicians' position paper on poverty and health states that poverty affects the built environment, which includes buildings, infrastructure, and services. Reference Czapp and Kovach80,Reference Macintyre, Ellaway and Cummins81 Where access to expert care is limited, machine learning can serve as a tool that connects specialised cardiac programmes with distant primary care providers. Machine learning can be applied systematically to clinical data to improve the accuracy and timeliness of CHD diagnosis for patients in various clinical settings. For patients in remote settings, diagnostic data could be sent to a tertiary care centre for consultation (machine learning output analysis and patient triage). Patients who already have access to specialised cardiac programmes may also benefit from having their clinical data processed by machine learning algorithms; for example, applying such algorithms to advanced cardiac imaging data could enable automated detection of cardiac disease. Machine learning models are likely to save time while boosting diagnostic accuracy.
Although machine learning appears to be an evolving and promising tool for CHD diagnostics, several limitations remain. General limitations of machine learning in healthcare include a lack of large and diverse data sets, poor standardisation across hospital systems, the expertise and time required for ground-truth labelling, a lack of comparable testing sets, poor transparency of algorithm design, and inadequate prospective integration into clinical workflow. The models addressed in this scoping review also have specific limitations. Deep neural networks require large amounts of data and behave as a “black box” (it is unclear how or why the network produced its output). Support vector machines are computationally expensive and are not well suited to problems with very many training examples. Principal component analysis reduces the number of variables and the dimensionality of the data to improve interpretability, but may sacrifice prediction accuracy in doing so. Cluster analyses are used when the outcome is unknown, so accuracy cannot be determined, and the results may not be representative of real-world problems. Hidden Markov models require a priori knowledge about the problem; otherwise, severe overfitting can result. Linear discriminant analysis assumes normally distributed predictors, and violations of this assumption can increase bias; it also suffers from issues with multicollinearity. Finally, decision trees are considered unstable classifiers, meaning that small changes in the data can cause large changes in the tree structure. Decision trees can also be expensive and time consuming to train, and they are prone to overfitting.
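The instability of decision trees can be illustrated with a small sketch. The Python below is a simplified, hypothetical example and not a model from any of the reviewed studies: it grows only a one-level tree (a decision stump) with a greedy CART-style search using Gini impurity, on eight made-up samples with two unnamed features. Flipping the label of a single sample is enough to move the root split from feature 0 to feature 1.

```python
from typing import List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of binary (0/1) labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def best_split(X: List[List[float]], y: List[int]) -> Tuple[int, float, float]:
    """Return (feature index, threshold, weighted Gini) of the best root split.

    Greedy CART-style search: every midpoint between consecutive distinct
    values of every feature is tried, and the split with the lowest
    size-weighted Gini impurity is kept (ties go to the first one found).
    """
    n = len(y)
    best = (-1, 0.0, float("inf"))
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y[i] for i in range(n) if X[i][j] <= t]
            right = [y[i] for i in range(n) if X[i][j] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (j, t, score)
    return best


# Hypothetical toy data: eight samples, two features.
X = [[1, 1], [2, 9], [3, 2], [4, 3], [5, 5], [6, 6], [7, 7], [8, 8]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

# The same data with a single label flipped (sample 1: 0 -> 1).
y_flipped = list(y)
y_flipped[1] = 1

print(best_split(X, y))          # (0, 4.5, 0.0): root splits on feature 0
print(best_split(X, y_flipped))  # (1, 4.0, 0.0): root now splits on feature 1
```

Both root splits fit their training labels perfectly, yet the variable chosen at the root changes; in a deeper tree, every subtree below the root would be rebuilt as well.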
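The point that principal component analysis may sacrifice prediction accuracy follows from PCA being unsupervised: it keeps the directions of greatest variance, not the directions that best separate outcomes. The sketch below assumes a made-up two-feature data set, computes the first principal component in closed form for the 2 × 2 case, and scores a simple one-threshold classifier (a hypothetical helper, `best_threshold_accuracy`). It is illustrative only; in many real data sets the leading components do retain most of the predictive signal.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def first_principal_component(points: List[Point]) -> Tuple[Point, Point, float]:
    """Return (mean, unit eigenvector, explained-variance ratio) of PC1.

    Uses the closed-form eigendecomposition of the 2 x 2 population
    covariance matrix [[a, b], [b, c]].
    """
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    a = sum((p[0] - mx) ** 2 for p in points) / n           # var(x)
    c = sum((p[1] - my) ** 2 for p in points) / n           # var(y)
    b = sum((p[0] - mx) * (p[1] - my) for p in points) / n  # cov(x, y)
    # Largest eigenvalue of the symmetric 2 x 2 covariance matrix.
    lam1 = (a + c) / 2 + math.sqrt(((a - c) / 2) ** 2 + b ** 2)
    if b != 0:
        vx, vy = lam1 - c, b      # solves (C - lam1 * I) v = 0
    elif a >= c:
        vx, vy = 1.0, 0.0         # covariance already diagonal
    else:
        vx, vy = 0.0, 1.0
    norm = math.hypot(vx, vy)
    return (mx, my), (vx / norm, vy / norm), lam1 / (a + c)


def best_threshold_accuracy(values: List[float], labels: List[int]) -> float:
    """Best training accuracy of a single-threshold rule on one variable."""
    n = len(labels)
    best = max(sum(labels), n - sum(labels)) / n  # majority-class baseline
    cuts = sorted(set(values))
    for lo, hi in zip(cuts, cuts[1:]):
        t = (lo + hi) / 2
        correct = sum(1 for v, lab in zip(values, labels) if (v > t) == (lab == 1))
        best = max(best, correct / n, (n - correct) / n)  # either orientation
    return best


# Hypothetical toy data: large spread along x, classes separated only along y.
points = [(x, -0.5) for x in (-3, -1, 1, 3)] + [(x, 0.5) for x in (-3, -1, 1, 3)]
labels = [0] * 4 + [1] * 4

(mx, my), (vx, vy), ratio = first_principal_component(points)
pc1_scores = [(px - mx) * vx + (py - my) * vy for px, py in points]

print(round(ratio, 3))                                          # 0.952 of variance kept
print(best_threshold_accuracy(pc1_scores, labels))              # 0.5 using PC1 only
print(best_threshold_accuracy([p[1] for p in points], labels))  # 1.0 using dropped direction
```

Here the first component keeps about 95% of the variance yet carries no information about the labels, while the discarded low-variance direction separates the classes perfectly. Whether PCA harms accuracy in practice depends on whether the outcome-relevant signal lies in the high-variance directions.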
A further limitation of this scoping review is its broad literature search strategy. Our goal was to describe comprehensively the applications of machine learning in paediatric cardiology research and then to focus on the machine learning techniques used for diagnosis and assessment. This approach required screening a large number of journal article titles and abstracts. Future work could explore the predictive capabilities of other non-invasive diagnostic technologies, such as the 12-lead electrocardiogram, because few paediatric-focused studies have considered the disease detection capabilities of this technology. Reference Du, Huang, Huang, Maalla and Liang82 In adult-focused machine learning cardiac research, 12-lead electrocardiograms are already being leveraged to detect acute coronary syndrome. Reference Bouzid, Faramand and Gregg83,Reference Al-Zaiti, Besomi and Bouzid84
In conclusion, these findings indicate that machine learning is a very promising tool for diagnosing and assessing critical and non-critical CHD, yet extensive research is still needed to build robust and generalisable models for clinical use, especially considering the extreme heterogeneity of complex CHD.
Acknowledgements
None.
Financial support
This work was supported by grants from the National Institutes of Health (S.H., grant numbers T32NR008857 and 1F31NR019725-01A1).
Conflicts of interest
The Role of Machine Learning Applications in Critical and Non-Critical Congenital Heart Disease Diagnosis and Assessment: A Scoping Review. None.
Ethical standards
This study was exempt from Institutional Review Board review.