There are approximately 2.5 million children and adults living in the United States with CHD today. Reference Gilboa, Devine and Kucik1 CHD is the most common congenital anomaly, affecting approximately 1% of births. Reference Hoffman and Kaplan2 An additional 40,000 children are born in the United States every year with heart defects, Reference Reller, Strickland, Riehle-Colarusso, Mahle and Correa3 and many more are born throughout the world, resulting in a significant public health issue. Improvements in medical and surgical management have greatly improved survival in children with heart disease Reference Best and Rankin4 ; since the majority of these children are now expected to survive into adulthood, research has increasingly focused on the comorbidities faced by this population.
Deficits in cognitive functioning were recognised as a potential problem for children and adults with CHD many years ago. Reference Stevenson, Stone, Dillard and Morgan5–Reference Clarkson, MacArthur, Barratt-Boyes, Whitlock and Neutze7 Children born with heart anomalies are exposed to anaesthesia, surgery, and cardiopulmonary bypass at a very early age, all of which could potentially result in damage to the developing brain. Further, children with critical CHD, defined as a heart defect requiring intervention in the first year of life, Reference Gilboa, Devine and Kucik1 are at higher risk for chronic hypoxaemia, Reference Kussman, Laussen, Benni, McGowan and McElhinney8 poor feeding and growth, Reference Mitting, Marino, Macrae, Shastri, Meyer and Pathan9 intrauterine growth restriction, Reference Wallenstein, Harper and Odibo10 and a high physiologic stress response in early development. Reference Anand, Hansen and Hickey11 These combined factors may potentially increase the risk for developing cognitive problems, which could lower academic success, Reference Bashir and Scavuzzo12 educational attainment, Reference Gathercole, Pickering, Knight and Stegmann13 and quality of life. Reference Pike, Evangelista, Doering, Eastwood, Lewis and Child14–Reference Eaton, Wang and Menahem16
Individuals with CHD may present with several neurodevelopmental vulnerabilities. Deficits in executive functions are among the most prevalent long-term morbidities in children and adolescents with CHD who underwent cardiac surgery in infancy. Reference Bellinger, Wypij and duPlessis17–Reference Kasmi, Calderon and Montreuil19 Executive functions refer to an umbrella of higher order neurocognitive processes that allow an individual to adapt to new situations. Executive functions can be categorised into core components including inhibitory control, working memory, and cognitive flexibility (or set-shifting), as well as higher order functions such as planning and problem-solving abilities. They help to temporarily retain information so it can be manipulated, inhibit attention to irrelevant stimuli, and help plan a complex series of thoughts or actions. Executive functions undergo a protracted development throughout childhood, and they continue to develop into early adulthood. They are essential to learning processes as well as social and emotional development. Reference Diamond, Barnett, Thomas and Munro20 Recent studies have reported significant deficits in these functions for children, adolescents, and adults with CHD as observed in standardised and experimental neuropsychological assessments. People with CHD performed especially poorly on measures of cognitive flexibility, Reference Cassidy, White, DeMaso, Newburger and Bellinger21 verbally mediated tasks, Reference Cassidy, White, DeMaso, Newburger and Bellinger21 and inhibition. Reference Calderon, Jambaque, Bonnet and Angeard22 Performance on measures of working memory was relatively preserved. Reference Calderon, Jambaque, Bonnet and Angeard22 Likewise, data from neuroimaging studies have corroborated these findings and have indicated that adolescents and young adults with CHD have altered brain activation patterns in regions related to executive function. Reference King, Smith and Burns23
The number of studies reporting measures of executive functions in people with CHD has greatly increased in the last several years. These studies vary widely in design, study population, and the clinical variables examined. We therefore chose to conduct a systematic review and use meta-analytic methods to assess whether executive functioning differs between people with CHD and controls, and whether we could identify variables, such as early surgical repair versus conservative management, that could explain some of the differences in scores between the CHD population and controls. We hypothesised that children and adults with CHD would score worse on measures of executive function than control groups or normative population samples. Specifically, we hypothesised that children would perform worst on measures of cognitive flexibility and inhibition, while a smaller difference would be identified in measures of working memory and planning/problem solving.
Materials and methods
Study selection
We conducted our systematic review and meta-analysis following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Statement and Checklist. Reference Moher, Liberati, Tetzlaff and Altman24 We searched PubMed, CINAHL, EMBASE, PsycInfo, Web of Science, and the Cochrane Library for results published between 1 January, 1986 and 15 June, 2020. Our final search term was developed in consultation with a trained research librarian to maximise search results (Supplemental Information, Figure S1).
We included randomised controlled trials, observational studies, and cross-sectional studies that included a measure of executive function in a cohort with CHD evaluated at >3 years of age. We chose the age of three as the lower limit where executive functions could be meaningfully tested. Reference Isquith, Crawford, Espy and Gioia25 We excluded reports that were published earlier than 1 January, 1986 in order to reflect more modern practice in paediatric cardiac surgery and paediatric cardiology. Finally, we chose to exclude studies that included heart transplant recipients due to the high proportion of patients who were not born with CHD or might have presented a different neuropsychological profile.
Screening of search results
Studies were screened by title and abstract by two authors independently (WMJ, JJL). Further, we screened the bibliographies of the screened manuscripts and abstracts. Once this initial screening was complete, two authors (WMJ and ND) independently read the screened manuscripts and abstracts in full to identify whether they met the full inclusion and exclusion criteria. Each potential entry was then discussed, and disagreements were adjudicated by a third author (LSS).
Data extraction
The scores for measures of executive functions that were reported in each manuscript were then extracted manually, and the measures were categorised into six main domains: planning and problem solving, inhibition, cognitive flexibility, working memory, summary measures, and reporter-based measures of executive function. Summary measures were defined as measures that combined multiple different domains into a single summary score, and reporter-based measures were self-report measures completed by either the subject or a caretaker. The included measures were categorised by consensus among authors. Different measures administered to the same cohort were included if the measures were different tests, tested different domains of executive function, or tested the same cohort at different ages. We allowed inclusion of longitudinal data because executive functions develop and change continuously throughout childhood. Reference Zelazo, Carlson, Kesek, Nelson and Luciana26 If there were multiple reports of identical data, we judged which data to use on an individual basis, with the goal of using the largest cohort reported. Authors were contacted if there was confusion regarding duplication, and if no response was received, those measures were discarded.
We extracted the following data from manuscripts: first author, year of publication, journal of publication, title, number of study subjects, number of control subjects if applicable, age at testing, CHD diagnosis, the name of the neuropsychological test, and the reported scores for the neuropsychological measures. The reported data were extracted as means with standard deviations or medians with ranges. The extracted data were checked twice more by one author (WMJ) in separate sessions to ensure correctness.
Bias assessment
Publication bias was assessed visually through the use of funnel plots. Assessment of the quality of evidence and other sources of bias were conducted by two authors independently (WMJ and NSF) using a modified Newcastle-Ottawa Scale for nonrandomised cohort studies. The results were discussed, and disagreements were adjudicated by a third party (LSS).
Dealing with dependent data
We chose to deal with correlation in the data using the method of Graham and Hebert, Reference Graham and Hebert27 which is an adaptation of Cooper’s shifting unit-of-analysis approach. Reference Cooper28 This method recommends an independent meta-analysis for each individual domain of a construct such as executive function, as opposed to calculating a combined overall effect size for all of the measures included in the analysis. We chose not to perform more complex methods of dealing with dependence, such as robust variance estimation Reference Hedges, Tipton and Johnson29 or three-level meta-analysis, Reference Cheung30 based on the conclusions of Scammacca et al, Reference Scammacca, Roberts and Stuebing31 who found that most methods of dealing with data dependence result in similar conclusions regarding the effect size. Further, since a weighted average of related but distinct executive functions domains would not be meaningful, we chose to employ Graham and Hebert’s approach in our analysis.
Data analysis
Analyses for each domain were performed with Comprehensive Meta-Analysis software (Biostat, Inc., Englewood, NJ). We chose a priori to use a random-effects model to estimate the effect size because we felt it was likely that age and CHD diagnosis would produce a range of effect sizes and thus between-study heterogeneity. Reference Zelazo, Carlson, Kesek, Nelson and Luciana26 Further, previous work by our group showed that measures of intelligence varied based on the CHD diagnosis of the study population. Reference Jackson, Davis, Calderon, Bellinger and Sun32 Therefore, a random-effects model seemed most appropriate. We assessed heterogeneity using the I2 statistic. We calculated a pooled standardised mean difference between scores of children with CHD and scores of healthy controls. Effect sizes were signed so that a negative standardised mean difference indicated worse performance on the measure in the CHD group compared to controls. A standardised mean difference of 0 to −0.3 indicated a mild decrease in performance, a standardised mean difference of −0.3 to −0.6 indicated a moderate deficit in performance, and a standardised mean difference of −0.6 or below indicated a severe deficit in performance in the CHD group relative to controls.
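The pooling step described above can be illustrated with a minimal Python sketch. It assumes Cohen's d as the standardised mean difference and the DerSimonian-Laird estimator for the between-study variance; the text does not state which estimator the software applied, so both choices, along with the function names and the example numbers, are illustrative assumptions rather than the analysis actually run.

```python
import math


def smd(mean_chd, sd_chd, n_chd, mean_ctl, sd_ctl, n_ctl):
    """Return Cohen's d (CHD minus control) and its approximate variance.

    Assumes higher scores mean better performance, so a negative d
    indicates worse performance in the CHD group.
    """
    pooled_sd = math.sqrt(
        ((n_chd - 1) * sd_chd**2 + (n_ctl - 1) * sd_ctl**2) / (n_chd + n_ctl - 2)
    )
    d = (mean_chd - mean_ctl) / pooled_sd
    var = (n_chd + n_ctl) / (n_chd * n_ctl) + d**2 / (2 * (n_chd + n_ctl))
    return d, var


def random_effects(effects, variances):
    """Pool effects with a DerSimonian-Laird random-effects model (assumed).

    Returns (pooled estimate, (ci_low, ci_high), I^2 in percent).
    """
    w = [1.0 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))  # Cochran's Q
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0  # between-study variance
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1.0 / sum(w_star))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), i2


# Hypothetical scaled scores (mean, SD, n) for three studies: CHD vs control.
studies = [
    (8.0, 3.0, 50, 10.0, 3.0, 50),
    (8.8, 2.9, 40, 10.1, 3.1, 45),
    (9.1, 3.2, 60, 10.0, 3.0, 60),
]
pairs = [smd(*s) for s in studies]
pooled, ci, i2 = random_effects([d for d, _ in pairs], [v for _, v in pairs])
print(f"SMD = {pooled:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f}), I2 = {i2:.1f}%")
```

When Q does not exceed its degrees of freedom, the between-study variance is truncated at zero and the model reduces to a fixed-effect estimate.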
We explored sources of heterogeneity if the I2 statistic exceeded 50% in any domain-level effect size calculation, indicating at least moderate between-study heterogeneity. Techniques we planned to use included meta-regression on clinical variables such as age and CHD type, “one-study-removed” analysis, and “one-measure-removed” analysis.
“One-study-removed” analysis is a technique for exploring each individual study’s impact on the effect size estimate and between-study heterogeneity. One study at a time is removed and the model is recalculated. If there are large differences in the effect size estimate or in the I2 statistic between the two models, one could explore differences in that study compared to others.
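As a rough illustration of this procedure, the Python sketch below drops every effect size contributed by one study at a time and recomputes the pooled estimate and I2. The DerSimonian-Laird estimator, the function names, and the study labels are assumptions made for the example, not details taken from the analysis software.

```python
def pool(effects, variances):
    """Return (pooled SMD, I^2 percent) under an assumed DerSimonian-Laird model."""
    w = [1.0 / v for v in variances]
    fixed = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    q = sum(wi * (y - fixed) ** 2 for wi, y in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * y for wi, y in zip(w_star, effects)) / sum(w_star)
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, i2


def one_study_removed(study_ids, effects, variances):
    """Recompute the model with each study's effect sizes left out in turn."""
    results = {}
    for sid in dict.fromkeys(study_ids):  # unique ids, original order
        keep = [i for i, s in enumerate(study_ids) if s != sid]
        results[sid] = pool(
            [effects[i] for i in keep], [variances[i] for i in keep]
        )
    return results


# Hypothetical data: study "D" reports an effect in the opposite direction.
ids = ["A", "B", "C", "D"]
effects = [-0.5, -0.5, -0.5, 0.5]
variances = [0.01, 0.01, 0.01, 0.01]
for sid, (est, i2) in one_study_removed(ids, effects, variances).items():
    print(f"without {sid}: SMD = {est:.3f}, I2 = {i2:.1f}%")
```

A study whose removal sharply lowers I2 while the other removals leave it high is a candidate for closer inspection. The same loop can be keyed on the neuropsychological measure instead of the study to obtain a "one-measure-removed" analysis.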
We also explored whether removing one individual measure, such as the Trail Making Test Part B or the DKEFS Sorting Test, and recalculating the model would result in significant changes in effect size estimate and I2. Significant changes could potentially indicate that the particular measure being tested may not have appropriate construct validity for the EF domain in our population of interest.
If fewer than 10 studies were included in any of the individual domain-level analyses, forest plots would be inspected visually for potential drivers of the heterogeneity. Potential moderator variables that would be considered for meta-regression included the specific measures used for each domain, the quality of evidence, age at evaluation, CHD diagnosis, and for the reporter-based measures analysis, the reporter.
We used the means, standard deviations, and sample sizes of control groups where presented. Data reported as medians with interquartile ranges or overall ranges were converted to means and standard deviations using standard methods. Reference Hozo, Djulbegovic and Hozo33 If data were reported as subgroups, the means and variances were pooled using the method reported by Hedges and Olkin. Reference Hedges and Olkin34 When no control group was included as part of the study, normative scores provided by the assessment battery were used.
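The two conversions described here can be sketched in Python as follows. The first function uses the commonly cited range-based rules attributed to Hozo et al.; the sample-size thresholds should be checked against the original paper, and interquartile ranges are not covered by this sketch. The second function uses the standard exact formula for combining subgroup means and standard deviations, which is assumed, not confirmed, to match the cited pooling method.

```python
import math


def hozo_mean_sd(low, median, high, n):
    """Estimate mean and SD from the median and range (rules as commonly cited)."""
    # Mean: weighted formula for small samples, the median itself otherwise.
    mean = (low + 2 * median + high) / 4 if n <= 25 else median
    if n <= 15:
        var = ((low - 2 * median + high) ** 2 / 4 + (high - low) ** 2) / 12
        sd = math.sqrt(var)
    elif n <= 70:
        sd = (high - low) / 4
    else:
        sd = (high - low) / 6
    return mean, sd


def combine_subgroups(groups):
    """Combine (mean, sd, n) subgroups into a single mean and SD.

    Uses the within-group plus between-group sums of squares, which
    reproduces the mean and SD of the merged raw data exactly.
    """
    total_n = sum(n for _, _, n in groups)
    grand_mean = sum(m * n for m, _, n in groups) / total_n
    ss = sum((n - 1) * sd**2 + n * (m - grand_mean) ** 2 for m, sd, n in groups)
    return grand_mean, math.sqrt(ss / (total_n - 1)), total_n


# Hypothetical report: median 10, range 4 to 17, in 12 participants.
print(hozo_mean_sd(4, 10, 17, 12))
# Hypothetical subgroups reported separately within one cohort.
print(combine_subgroups([(9.2, 2.8, 30), (8.1, 3.1, 25)]))
```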
Finally, we chose to consider the effect of normative data in our analysis. While normative data samples are large and validated in a broad sample, they often fail to take into account local variation in important confounders that could affect performance on measures of EF, such as socioeconomic status, parental education, and educational quality. Given that many of our studies contained small, geographically restricted convenience samples, and since many normative samples contain sample sizes in the thousands, an artificially large effect size in one cross-sectional study due to confounding variables could potentially drastically and incorrectly alter the overall effect size estimate. For this reason, we compared scores for participants in studies with no control group to a standardised control group utilising the expected scores for healthy children as per the testing manual (scaled score: mean 10, standard deviation 3; T-score: mean 50, standard deviation 10; standard score: mean 100, standard deviation 15), and limited the number of participants in each individual study’s control group to 200.
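A short Python sketch may help show why the cap matters. It compares a CHD sample against the manual's expected scores and limits the normative sample size to 200 before computing the variance of the effect size. The normative means and standard deviations come from the text above; Cohen's d, the function name, and the example sample sizes are illustrative assumptions.

```python
import math

# Expected scores for healthy children, as listed in the text: (mean, SD).
NORMS = {"scaled": (10.0, 3.0), "t": (50.0, 10.0), "standard": (100.0, 15.0)}


def smd_vs_norms(mean_chd, sd_chd, n_chd, scale, norm_n, cap=200):
    """Cohen's d against normative scores with the control n capped.

    norm_n is the (hypothetical) size of the normative sample; the cap
    keeps one study from being over-weighted by a very large sample.
    Returns (d, variance of d, control n actually used).
    """
    norm_mean, norm_sd = NORMS[scale]
    n_ctl = min(norm_n, cap)
    pooled_sd = math.sqrt(
        ((n_chd - 1) * sd_chd**2 + (n_ctl - 1) * norm_sd**2) / (n_chd + n_ctl - 2)
    )
    d = (mean_chd - norm_mean) / pooled_sd
    var = (n_chd + n_ctl) / (n_chd * n_ctl) + d**2 / (2 * (n_chd + n_ctl))
    return d, var, n_ctl


# Hypothetical study: 50 children with CHD, scaled-score mean 8.5, SD 3.
capped = smd_vs_norms(8.5, 3.0, 50, "scaled", norm_n=2000)
uncapped = smd_vs_norms(8.5, 3.0, 50, "scaled", norm_n=2000, cap=2000)
print("capped:  ", capped)
print("uncapped:", uncapped)
```

With the cap in place the variance of the effect size is larger, so the study receives less weight in an inverse-variance pooled estimate than it would if the full normative sample size were used.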
Results
Study characteristics
A total of 61,217 results were screened by title and abstract. Three hundred articles were identified for full-text screening from these search results. After independent review, 28 articles met the inclusion criteria for the analysis (Fig 1). Two hundred sixty-seven articles and abstracts were excluded due to the lack of a reported measure of executive function. Two manuscripts were excluded due to the failure to report a usable centrality and variance estimate of the executive function measure scores. Reference Daliento, Mapelli and Russo35,Reference Mittnacht, Choukair and Kneppo36 Two manuscripts were excluded because the data were reported in another included manuscript. Reference Gaynor, Gerdes and Nord37,Reference Miatton, De Wolf, Francois, Thiery and Vingerhoets38 Finally, one manuscript was rejected because it was not clear whether the measure included was a measure of executive function. Reference Sahu, Chauhan, Kiran, Bisoi, Ramakrishnan and Nehra39
The 28 included studies contained a total of 91 scores for measures of executive functions across the six domains (cognitive flexibility, inhibition, working memory, planning and problem solving, summary measures, reporter-based measures) that we planned to analyse. The characteristics of the included studies and measures are described in the Supplemental Information (Tables S1 – S6). A total of 16 scores on measures of executive function reported in eight manuscripts were excluded (Supplemental Information, Table S7). A brief description of the neuropsychological measures included in the analyses is presented in the Supplemental Information (Table S8).
Assessment of publication bias and quality of evidence
Funnel plots for each of the six analyses were generated and visually inspected. None of the plots indicated substantial publication bias. The quality of evidence was rated as moderate by the modified Newcastle-Ottawa scale and by qualitative impression (Supplemental Information, Tables S1 – S6).
Measures of cognitive flexibility and set shifting
Six manuscripts contained 21 measures of cognitive flexibility and set shifting. Reference Bellinger, Wypij and duPlessis17,Reference Kasmi, Calderon and Montreuil19,Reference Cassidy, White, DeMaso, Newburger and Bellinger21,Reference Calderon, Jambaque, Bonnet and Angeard22,Reference Bergemann, Hansen and Rotermann40,Reference Ilardi, Ono, McCartney, Book and Stringer41 A total of 2165 scores on measures of cognitive flexibility and set shifting were reported in 548 children and adults with CHD. A total of 2170 scores on measures were reported for 607 healthy controls. Meta-analysis showed a standardised mean difference of −0.628 (95% confidence interval: −0.726, −0.531) between populations with CHD and healthy controls in the full model (Fig 2). I2 was 49.62%, indicating a moderate amount of heterogeneity. Sensitivity analysis was performed to attempt to explain some of the between-study heterogeneity, including techniques such as meta-regression on age and type of CHD, “one-study-removed” analysis, and removal of one neuropsychological measure (e.g. Trails B or the DKEFS Sorting Test Confirmed Correct Sorts) from the analysis. None of those analyses decreased the I2 statistic by more than 10%, so the initial model was kept in place.
Measures of inhibition
Seven manuscripts contained 16 measures of inhibition. Reference Bellinger, Wypij and duPlessis17,Reference Calderon, Jambaque, Bonnet and Angeard22,Reference Calderon, Angeard, Moutier, Plumet, Jambaqué and Bonnet42–Reference Sterken, Lemiere, Van den Berghe and Mesotten46 In total, 1273 scores on measures of inhibition were reported in 1038 children with CHD, and 986 scores on measures of inhibition were reported in 759 healthy controls. Meta-analysis revealed a standardised mean difference of −0.469 (95% confidence interval: −0.606, −0.333, Fig 3a), and I2 was calculated to be 51.98%. Two of the measures reported were the Statue test, a test commonly used to assess motor inhibition, whereas the other measures were assessments of cognitive inhibition, similar to the Stroop task. Reference Stroop47 After eliminating the two Statue measures, the standardised mean difference was −0.486 (95% confidence interval: −0.617, −0.355, I2 = 34.62%, Fig 3b).
Measures of working memory
Twelve manuscripts reported 21 measures of working memory. Reference Kasmi, Calderon and Montreuil19,Reference Calderon, Jambaque, Bonnet and Angeard22,Reference King, Smith and Burns23,Reference Ilardi, Ono, McCartney, Book and Stringer41–Reference Calderon, Bonnet, Courtin, Concordet, Plumet and Angeard43,Reference Bellinger, Watson and Rivkin48–Reference Sommariva, Gortan and Liguoro53 Authors reported 835 scores on measures of working memory in 501 children with CHD and 1565 scores on measures of working memory in 1095 healthy controls. The standardised mean difference was −0.369 (95% confidence interval: −0.466, −0.273, Fig 4), indicating a mild-to-moderate decrease in performance on measures of working memory in subjects with CHD compared to healthy controls. I2 was 6.4%, indicating low between-study heterogeneity.
Measures of planning and problem solving
Three manuscripts contained 11 measures of planning and problem-solving abilities. Reference Cassidy, White, DeMaso, Newburger and Bellinger21,Reference Calderon, Bonnet, Courtin, Concordet, Plumet and Angeard43,Reference Quartermain, Ittenbach and Flynn54 Two manuscripts reported an overall score for a variant of the Tower of London task, while the other manuscript reported three different metrics calculated to measure planning and problem-solving abilities during the Delis-Kaplan Executive Function System Tower task. There were 1145 scores on measures of planning and problem solving reported in 419 children with CHD, and 1032 scores on measures of planning and problem solving reported in 144 healthy controls. The overall standardised mean difference was calculated as −0.334 (95% confidence interval: −0.546, −0.121, Fig 5a), indicating moderate decreases in planning and problem-solving abilities in populations with CHD compared to healthy controls. I2 was 82.83%, and upon visual inspection, measures of the Delis-Kaplan Executive Function System Tower Test move accuracy ratio appeared to vary substantially from the other measures. When these measures were removed, the standardised mean difference was −0.525 (95% confidence interval: −0.644, −0.407, I2 = 16.43%, Fig 5b).
Summary measures of executive functioning
Five scores for summary measures of executive functions were extracted from five different manuscripts. Reference Bellinger, Wypij and Rivkin18,Reference Bellinger, Watson and Rivkin48,Reference Bellinger, Rivkin and DeMaso55–Reference Miatton, De Wolf, Francois, Thiery and Vingerhoets57 The five measures were a summary score calculated by averaging five subtests on the Delis-Kaplan Executive Function System, used in three different manuscripts, and the Developmental NEuroPSYchological Assessment – Version 2 Attention/Executive Function core domain score, used in two manuscripts. A total of 626 scores in children with CHD and a total of 641 scores of control children were included in the analysis. Meta-analysis of the five measures resulted in a standardised mean difference of −0.361 (95% confidence interval: −0.576, −0.147), indicating a moderate decrease in scores on measures of global executive functions in children with CHD compared to controls (Supplemental Information, Figure S2a). I2 was 73.18%, indicating substantial heterogeneity. “One-study-removed” analysis showed that the measure reported by Fuller et al accounted for all of the heterogeneity. The standardised mean difference estimate after its removal was −0.457 (95% confidence interval: −0.585, −0.329, I2 = 0%, Supplemental Information, Figure S2b).
Reporter-based measures
Seven manuscripts reported 16 measures of either self-, parent-, or teacher report of executive function. Reference Bellinger, Wypij and Rivkin18,Reference King, Smith and Burns23,Reference Bellinger, Watson and Rivkin48,Reference Bellinger, Rivkin and DeMaso55,Reference Brosig, Bear and Allen58–Reference Sanz, Berl, Armour, Wang, Cheng and Donofrio60 A total of 1745 scores in the CHD group and 1793 scores in the healthy control group across the 16 measures were included for meta-analysis. We estimated a standardised mean difference of −0.444 (95% confidence interval: −0.614, −0.274, Supplemental Information, Figure S3a). I2 equaled 85.68%. Based on prior work reporting a significant difference in scores on the Behavior Rating Inventory of Executive Function between the reporter type, Reference Daunhauer, Fidler, Hahn, Will, Lee and Hepburn61 we chose to perform a meta-regression on the reporter variable. The inclusion of reporter in the model resulted in a statistically significant improvement in the effect size estimate (Q = 25.81, df = 2, p < 0.001, Supplemental Information, Figure S3b). The estimated standardised mean differences were −0.02 (95% confidence interval: −0.24, 0.21) for self-report measures, −0.44 (−0.7, −0.17) for parent-reported measures, and −0.83 (−1.15, −0.51) for teacher-reported measures. The R2 analog was 0.74, meaning that the model explained 74% of the estimated between-study variance.
Discussion
We found a moderate decrease in performance in six domains of executive functions in people with CHD compared to healthy controls. In all domains, people with CHD showed at least a mild decrease in scores. The most prominent decreases were in inhibition and cognitive flexibility, though these domains showed substantial between-study heterogeneity.
In the inhibition domain, between-study heterogeneity was reduced (I2 fell from 51.98% to 34.62%) when we removed the Developmental NEuroPSYchological Assessment – Version 2 Statue test from the analysis. This may be because the Statue test is designed to assess behavioural control, which may be a different construct than cognitive inhibition. Reference Germain and Collette62
There was a large amount of heterogeneity in effect sizes for measures of cognitive flexibility that could not be explained. Elimination of each of the different included measures (Trail Making Test Part B, Wisconsin Card Sort Test, Delis-Kaplan Executive Function System Category Switch, Delis-Kaplan Executive Function System Dot Switch, Delis-Kaplan Executive Function System Sorting Test) did not substantially reduce between-study heterogeneity, nor did any individual measure have an outsize effect on the standardised mean difference. We regressed on a variety of age groups and by CHD diagnosis, but none of the regression models were significant. This could be because the analysis was underpowered to detect age- and diagnosis-related differences in cognitive flexibility, or because of confounding introduced by related cognitive deficits, such as deficits in visual or auditory processing.
Children and adults with CHD showed significantly lower scores on measures of working memory and planning/problem solving as compared to healthy controls, albeit with lower effect sizes than other domains. Measures of working memory showed little between-study variance despite including multiple assessment tools (backward digit span, Corsi block tapping test, n-back) and broader subscales that incorporated multiple assessments (Wechsler Intelligence Scale for Children – Version 4 and Wechsler Adult Intelligence Scale – Version 3 working memory subscales).
The planning and problem-solving analysis showed decreased variability with the removal of the move accuracy ratio measure from the Delis-Kaplan Executive Function System Tower assessment. This measure is calculated as the total number of moves used to complete the task divided by the minimum number of moves required. The scores reported indicate that children with CHD had a lower move accuracy ratio compared to controls, which indicates better performance on the measure. This was in contrast to the rest of the measures used in the analysis, which showed decreased performance in the CHD group. A possible explanation is that a lower move accuracy ratio can have two interpretations: better performance or freezing. Children tested in the Tower test may become frustrated with later, more difficult tasks in the assessment. Often, they can become overwhelmed and perform no moves at all, leading to an artificially low move accuracy ratio and the appearance of better performance. The total achievement score helps distinguish the two possibilities. A higher total achievement score indicates better performance on the Tower task, and a lower total achievement score suggests a lack of moves due to frustration. In this case, the total achievement scores for the same samples were included in the analysis and were indeed lower than controls, suggesting the possibility that the move accuracy ratio scores in these CHD groups were artificially better.
Summary measures of executive functioning appear to be a reasonable proxy for executive function deficits in this population. Analysis of summary measures produced a similar effect size to the individual domain analyses, implying that the reporting of executive function summary scores may be sufficient for the estimation of the risk of general executive dysfunction in a population. However, domain-level information remains critical for more detailed clinical assessment and the development of possible interventions for cognitive remediation.
In a meta-regression of reporter-based measures, parent reports most closely matched the magnitude of deficits found on formal neuropsychological assessment of executive functioning. Self-reported measures underestimated the performance deficits, while teacher-reported measures overestimated them. This finding is limited by the low number of scores in the latter two groups; the estimates might improve with more data.
Our study has several important strengths. First, no current meta-analyses have examined performance on measures of executive functions in CHD. There has been a substantial body of literature examining the topic published in the past five years, and we believe this study represents an accurate synthesis of the data. Second, our methodology addresses two of the major problems encountered when synthesising neuropsychological data: collinearity and important confounders (e.g., age, severity of disease, reporter). Finally, while American Heart Association guidelines recommend screening for executive dysfunction starting around the age of six, Reference Marino, Lipkin and Newburger63 problems may present earlier in life. Our synthesis shows deficits in multiple domains of executive functions across the lifespan, including in younger children, and thus earlier screening for executive dysfunction may be beneficial.
We acknowledge some limitations in our work. A substantial proportion of our data comes from a small number of research groups. While these groups examined high-quality data in multiple cohorts in both observational and longitudinal designs, reports from different geographic locations would greatly strengthen the analysis. Further, while our analysis did account for correlation in the data, our methodology could not assess the impact it had on our effect size estimates. While analyses to address this issue were not feasible due to the small number of data sources, adjusting for collinearity might have explained some of the unexplained between-study variance.
In summary, children and adults with CHD show moderate deficits in measures of executive functions compared to healthy controls. The effect persists throughout childhood and into adulthood and exists in six common domains of executive functioning. These deficits can potentially be improved with cognitive remediation, Reference Calderon, Bellinger and Hartigan64 which makes executive functioning an important area of cognitive function to study in these patients. Further research should continue to quantify these deficits, and the field should continue to develop and encourage multidisciplinary CHD centres that include routine neuropsychological evaluation as part of the comprehensive care of individuals with CHD.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/S1047951121001074
Acknowledgements
None.
Financial support
None.
Conflicts of interest
Dr Sun works as a consultant for Merck, and she works as a co-editor-in-chief for UpToDate.