Introduction
Treatment research aimed at reducing neurologic morbidity from sickle cell disease (SCD) has been limited, in part, by a lack of well-established measures of cognitive functions to evaluate treatment outcomes. Measures of general cognitive ability have been used to measure neurocognitive effects and have shown sensitivity to a range of specific neurologic morbidities (Armstrong et al., 1996). However, measures of specific cognitive functions often yield effect sizes two to three times as large as those of general cognitive ability for detecting neurocognitive deficits (Schatz, Finke, Kellet, & Kramer, 2002). Treatment studies have begun using cognitive measures as outcomes for randomized trials (Casella et al., 2010; Wang et al., 2011). These studies used measures of general ability as outcomes, largely due to the lack of data indicating the relative merit of specific choices in measures of narrower abilities. The purpose of this report is to evaluate the utility of the Executive Abilities: Methods and Instruments for Neurobehavioral Evaluation and Research (EXAMINER) Battery to detect neurocognitive deficits in SCD. We also compare the EXAMINER Battery's performance to another measure of specific abilities that has shown good sensitivity to SCD-related deficits, the Woodcock-Johnson Cognitive Abilities, 3rd edition (WJ-III).
The EXAMINER Battery has several attractive features for SCD treatment research. EXAMINER was developed to create a flexible set of methods that can be used to define and measure executive functioning (Kramer et al., this issue). EXAMINER was designed largely using the Unity and Diversity model of executive function (Miyake et al., 2000), which posits that executive function represents the coordinated execution of multiple, discrete executive function skills. Based on this framework, both global performance and narrower skills can be important levels of analysis. The EXAMINER Battery provides scores for a composite executive function factor and narrow-band factors derived from multiple executive function tasks. Executive function is one of the major areas of deficit found with SCD-related morbidity (Berkelhammer et al., 2007; DeBaun et al., 1998; Schatz & Roberts, 2007; White, Saloria, Schatz, & DeBaun, 2000); however, individual tests of executive function do not always yield effects superior to IQ test batteries (Hijmans et al., 2011). Thus, the use of composite scores from a battery of executive function tests may be advantageous. The EXAMINER Battery also relies on both traditional paper-and-pencil measures and computer-controlled measures involving precision display and response time, which may be more sensitive to subtle processing deficits (Schatz, Hale, & Myerson, 1998).
SCD is a group of genetic blood disorders that result in the body producing abnormal, S-type hemoglobin. S-type hemoglobin undergoes structural changes when the blood becomes depleted of oxygen and results in a wide range of health effects (Serjeant, 1995), including effects on the brain (Adams, Ohene-Frempong, & Wang, 2001). There are many specific SCD genotypes worldwide. Among the most common genotypes in the United States (U.S.), homozygous sickle cell anemia (HbSS) and sickle cell beta-zero-thalassemia (HbS-beta-thal0) show the highest risk for neurologic complications (Adams et al., 2001). The most severe neurologic morbidity in SCD is stroke. Approximately 5% of children with SCD will develop stroke (Ohene-Frempong et al., 1998). The highest incidence rates are between two and nine years of age. Cognitive deficits vary with the severity of the stroke, but range from mild to severe (Schatz & Puffer, 2006).
Serious neurologic morbidities other than stroke occur in SCD and have associated neurocognitive effects. Silent cerebral infarcts (SCI) are regions of cerebral infarction evident on magnetic resonance (MR) imaging that occur without a history of stroke (Adams et al., 2001). SCI occur in approximately 15–20% of children with SCD, have an age of onset generally similar to that of overt stroke, and often result in mild cognitive syndromes and academic difficulties (Bernaudin et al., 2000; Pegelow et al., 2002; Schatz, Brown, Pascual, Hsu, & DeBaun, 2001). Cerebral blood flow can also be abnormal with no cerebral infarction present (Wang et al., 2000). The most common cerebral blood flow abnormality is elevated blood flow velocity in the cerebral arteries (Adams, 2005). Higher cerebral blood flow velocity in SCD is associated with deficits in executive function (Kral et al., 2003; Sanchez, Schatz, McClellan, & Roberts, 2010). Additionally, abnormally low blood flow or asymmetric blood flow is indicative of cerebral vascular concerns (Buchanan, James-Herry, & Osunkwo, 2013; Schmidt et al., 2003). Transcranial Doppler ultrasound (TCD) has become an important clinical indicator of abnormal cerebral blood flow that can be used to predict future risk of stroke (Adams, 2005).
Another indicator of problems with cerebral blood flow and stroke risk is stenosis of cerebral arteries, which may or may not occur along with abnormal TCD (Abboud et al., 2004). Although little research on cognitive functioning has focused on cerebral artery stenosis, the limited data suggest an impact on cognition, at least with more extensive abnormalities (Hogan, Kirkham, Isaacs, Wade, & Vargha-Khadem, 2005). SCD is also associated with high rates of sleep disordered breathing (SDB), which appears to have a prevalence of 20–25% (Goldstein et al., 2011; Strauss et al., 2012). Children with SDB have disrupted cerebral oxygen delivery, reduced brain tissue volumes, and mild cognitive syndromes (Lal, Strange, & Bachman, 2012; Macey et al., 2002; Morrell et al., 2003; Sforza & Roche, 2012).
The present study addresses the ability of the EXAMINER Battery to detect neurocognitive effects of SCD using several methods not used frequently in SCD. First, most research in this area has focused on identifying cognitive effects associated with specific forms of morbidity. Although this work has merit, several factors complicate this research. There is a high co-occurrence rate of neurologic morbidities, which makes identifying the unique contribution of specific morbidities difficult. The types of cognitive deficits appear to be very similar across forms of neurologic morbidity in SCD (Schatz & McClellan, 2006). Therefore, it may be useful to view neurologic morbidities as lying on a continuum of severity, rather than as categories. Here we took an ordinal approach by using stroke as the most severe neurologic outcome. We then characterized neurologic involvement according to the number of non-stroke morbidities (i.e., SCI, cerebral blood flow abnormalities, cerebral stenosis, SDB) to create ordinal groups from no neurologic morbidity to stroke.
As another measure of disease effects we evaluated the size of the corpus callosum (CC) using the area of the CC on the midsagittal view from MR. We have used this measure previously as an index of general white matter integrity to demonstrate decreased size of the CC with increasing neurologic disease in SCD (Schatz & Buzan, 2006). Decreases in healthy-appearing white matter tissue based on midsagittal CC area or total volume show a linear association with the degree of cognitive deficit in SCD, show medium-to-large correlations with the volume of cerebral infarction, and capture brain effects beyond visible cerebral infarction (Baldeweg et al., 2006; Schatz & Buzan, 2006). This measure also has a better distribution of scores than measures such as visible lesion volume in the present sample because the majority of children with SCD did not have visible tissue injury to the brain. We hypothesized that increasing neurologic involvement, as measured by clinical history and midsagittal CC area, would be associated with poorer performance on the EXAMINER factor scores.
In addition to evaluating neurologic effects, we compared the magnitude of effect sizes for EXAMINER variables to measures from the WJ-III (Woodcock, McGrew, & Mather, 2001). The WJ-III was designed to measure seven distinct domains of cognitive ability. We have shown that several of these domains have been consistently related to sickle cell neurocognitive deficits in school-age children. In particular, verbal comprehension (a.k.a. verbal ability), processing speed, and short-term/working memory are commonly affected (Schatz, Finke, & Roberts, 2004; Schatz, Puffer, Sanchez, Stancil, & Roberts, 2009; Schatz & Roberts, 2005). We hypothesized that the effect sizes between EXAMINER variables and neurologic severity variables would be at least as large as those found with WJ-III measures of verbal comprehension, processing speed, and short-term/working memory.
Method
Participants and Recruitment
Informed consent for research participation was obtained from a parent and assent was provided by the child as approved by the Institutional Review Board. The 32 participants with SCD and a parent were recruited at routine health maintenance visits at the Center for Children's Cancer and Blood Disorders at Palmetto Health Richland. Inclusion criteria were a genotype of either HbSS or HbS-beta-thal0 (as determined by routine birth screenings and subsequent hemoglobin electrophoresis) and age between 8 and 18 years. Thirty-one of the participants had HbSS and one participant had HbS-beta-thal0. Exclusion criteria were a co-morbid major medical, psychiatric, or developmental condition (e.g., autism, bipolar mood disorder, cancer, intellectual disability) or cognitive/motor limitations that precluded the ability to participate in the cognitive testing per medical records. On the day of enrollment, if a child had pain, had taken pain medication, or had other illness, testing was re-scheduled for a subsequent date. All children with SCD were offered the opportunity to complete an optional magnetic resonance (MR) imaging study. Eight participants completed the research MR exam. All children reported English as their primary language.
Medical record reviews were conducted for the children with SCD to determine clinical history after cognitive testing and research MR exams had been completed. All SCD participants had either clinical MR/MRA exams completed within eighteen months of cognitive testing or completed the research MR/MRA exams. Twenty-seven of the participants had MR exams within 3 months of cognitive testing and three participants had exams within 12 months of cognitive testing; the two participants with longer intervals were over the age of 12 years at the time of both the MR exam and cognitive testing. The mean interval between MR exams and cognitive testing was 3.6 months and the median interval was 2.4 months. Reports from clinical exams were coded for study variables; a neuroradiologist reviewed the eight research scans to code the same variables.
All SCD participants had been screened at routine health maintenance visits for self- or parent-reported snoring. If snoring was a frequent event, a more detailed history was collected regarding labored breathing during sleep, observed apnea, restless sleep, diaphoresis, enuresis, cyanosis, or excessive daytime sleepiness by the pediatric hematologist. Children with frequent snoring and any additional indicators of SDB, or a parent concern about snoring, were referred for overnight polysomnography conducted by a pediatrician who was certified in sleep medicine. Diagnosis of SDB and its severity were completed using standard clinical criteria (American Academy of Pediatrics, 2002).
Finally, most children had received annual TCD exams as part of routine health care per the STOP protocol (Adams, 2005). TCD exams were considered abnormal only if a routine exam was abnormal and the results were replicated in a subsequent exam per STOP procedures. In cases without TCD data, the child had suffered a stroke and was on chronic transfusion therapy. To form study groups, we relied on natural breaking points to provide relatively even distribution of children across neurologic severity groups. A summary of the clinical morbidities is provided in Table 1.
Table 1 Non-stroke neurologic morbidities and history of overt stroke event for children with neurologic disease
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160927090250789-0295:S1355617713001239:S1355617713001239_tab1.gif?pub-status=live)
Note. Sleep study categories are based on clinical criteria for mild, moderate, or severe obstructive sleep apnea based on the apnea-hypopnea index; Sleep study ranges are given if multiple exams were obtained with different categorization.
TCD = Transcranial Doppler Ultrasound; MRA = magnetic resonance angiography; MRI = magnetic resonance imaging; MCA = middle cerebral artery; ICA = internal carotid artery; ACA = anterior cerebral artery.
Eighty-five comparison children without SCD were recruited from local after-school care and summer care programs. Programs were targeted that were known to have demographics similar to those of our SCD clinic population, which is almost exclusively African-American and in which approximately half of the families are of lower socioeconomic status. Parents at these programs were sent a flyer with an accompanying letter, consent form, and demographic questionnaire for the study to return if interested. The inclusion criterion was ages 8 to 18 years. Exclusion criteria were a comorbid major medical, psychiatric, or developmental condition (e.g., autism, bipolar mood disorder, cancer, intellectual disability, sickle cell disease) or cognitive/motor limitations that precluded the ability to participate in the cognitive testing per parent report. All comparison children reported English as their primary language. Non-SCD comparison children were demographically similar to the SCD group with the exception of family income: the comparison group had a significantly higher representation of families earning more than $40,000 per year. We, therefore, randomly selected 25 comparison children from the highest income bracket to exclude from analyses to improve the demographic matching process, resulting in 60 demographic controls. Descriptive data for the final study sample are provided in Table 2.
Table 2 Descriptive information for primary study groups
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160927090250789-0295:S1355617713001239:S1355617713001239_tab2.gif?pub-status=live)
Note. If there are multiple parents in the home, parent education is based on the highest education of any one parent. n.a. = not available. Chi-square statistic is not provided for parent education, family income, and current therapy variables due to multiple cells containing less than five expected observations. Adults in the home includes anyone 18 years or older (other than participant) who resides in the same home as the participant. For hydroxyurea treatment status a child had to be taking the prescribed medication and show a therapeutic hematological response as judged by the treating hematologist to be considered to be on the therapy. One child with no neurologic morbidity was receiving chronic transfusion therapy due to recurrent acute chest syndrome. Routine CBC values were for the nearest routine CBC to the date of cognitive testing; in all cases this was within one month and for 27 of 32 cases the blood for the CBC was drawn within 48 hours of testing.
Procedures
Children completed one-on-one cognitive testing by a trained examiner. The study procedures occurred in a single session in all but six cases (two SCD cases and four non-SCD cases) for which the testing session was split across two sessions due to time constraints. Parents completed a demographic questionnaire and a behavioral report measure.
EXAMINER Battery
The EXAMINER Battery provides factor scores that are derived from transformations of raw scores as developed through the use of confirmatory factor analysis and Item Response Theory (IRT) methods (Kramer et al., this issue). There are 11 core variables used to compute a three-factor model (Fluency, Cognitive Control, Working Memory) and a Composite score based on a one-factor model. The process of transforming raw scores was initially conducted with data from English-speaking adults ages 18–64 and later tested for differential item functioning across language status and age groups. Each of the 11 continuous variables was re-coded into an ordinal score (up to 20 categories with at least 10 scores per category) that yielded ordinal scores roughly matching the distribution of raw scores. The ordinal scores were then used in the IRT analyses. The ltm package in R was used to fit a two-parameter graded response model. Item parameters were used to calculate examinee scores and standard errors using Empirical Bayes scoring. The three-factor model showed excellent fit that was superior to the one-factor model; however, the 11 loadings on the one-factor model were strong and also showed excellent fit. Factor scores are expressed in the form of raw scores from the IRT output. These scores are not age-adjusted. In addition, the same parameters were used across all ages, as analyses testing differential item functioning for different age groups did not substantially change the parameters for calculating the IRT scores. Tasks used to generate the factor scores are phonemic and category fluency tasks, a flanker task, a continuous performance test (CPT), a dot counting task, an n-back task, an anti-saccade task, a set-shifting task, and a behavioral rating form completed by the test administrator.
Test–retest reliability for the battery was evaluated in 122 normal adult controls at an average interval of 25 days (Kramer et al., this issue). The Executive Composite factor showed a test–retest reliability of r = .93.
Fluency factor
The fluency factor is based on Phonemic Fluency (2 trials) and Category Fluency (2 trials) tasks. Variables used are the total correct responses from the four trials. Test–retest reliability for the Fluency factor was r = .88.
Cognitive control factor
The Cognitive Control factor involves measures of inhibitory control, set shifting, and behaviors associated with the dysregulation of executive function. Variables that contribute to this factor are: total Flanker score, total Set Shifting score, anti-saccade total, and total dysexecutive errors. This is the only EXAMINER factor score for which higher scores indicate poorer performance. In the Flanker Task, the examinee is instructed to focus on a small cross in the center of the screen. After a short variable duration, a row of five arrows is presented in the center of the screen either above or below the fixation point. On half of the trials, the flanking arrows are congruent with the direction of the center arrow, and on the other half the flanking arrows are incongruent with the direction of the center arrow. The total Flanker score is derived from both error and response time data, which are used to contrast performance on incongruent versus congruent trials.
In the Set Shifting task, participants match a stimulus on the top of the screen to one of two stimuli in the lower corners of the screen. There are task-homogeneous blocks in which participants perform the task with only a single dimension (either classifying shapes or classifying colors). In heterogeneous blocks, participants alternate between the two tasks pseudo-randomly, with the target dimension expressed via the word “shape” or “color” presented at the bottom of the screen. Performance on the task-homogeneous and task-heterogeneous blocks is compared to measure the differences between heterogeneous and homogeneous blocks (expressed in latency and accuracy) and the differences between switch and non-switch trials within the heterogeneous block (also expressed in latency and accuracy). The total Set Shifting score is the sum of the accuracy and reaction time scores.
In the anti-saccade task there are three blocks of trials in which subjects look at a fixation point in the center of a computer screen and move their eyes in response to a peripherally presented stimulus. The first block consists of pro-saccade trials and the second and third blocks consist of anti-saccade trials. The primary outcome measure is the total number of correct responses on the two anti-saccade blocks as observed by the examiner.
Dysexecutive errors is a composite variable derived from: false alarm responses on a CPT, rule violations on the verbal fluency tasks, the tendency to make errors on Flanker incongruent trials relative to congruent trials, the tendency to make errors on the Set Shifting shift trials relative to the non-shift trials, and the total score on the Behavior Rating Scale completed by the test administrator. For the CPT, participants complete 100 experimental trials on a computer in which they push a button only for a target shape. The CPT is designed to elicit false alarm errors, with 80% of trials showing the target. The primary dependent measure is the total number of false alarm errors. The Behavior Rating Scale contains Likert-type ratings of the following behaviors: agitated, stimulus-bound, perseverative, decreased initiation, motor stereotypes, lack of social/emotional engagement, impulsiveness, and socially inappropriate behavior, with these behaviors defined in the EXAMINER manual. The combination of test responses and rating scale items in the Cognitive Control factor score is somewhat unusual in scale development; however, the test–retest reliability for the Cognitive Control factor was r = .88.
Working memory factor
The Working Memory factor is based on three variables derived from two tasks. The Dot Counting task involves counting the number of dots of a specified color among distracters over three to seven component displays on a screen, then stating the number of dots in each component display. Six trials are administered. The primary dependent variable is the total correct responses summed across the trials. For the n-back task, children complete a spatial 1-back test. Thirty trials are presented (10 “yes” trials and 20 “no” trials) in which the child presses one button whenever a square is presented in the same spatial location as on the previous trial (“yes” trials) and a different button if the square is presented in a different location from the previous trial (“no” trials). Both total correct and d-prime are computed. Test–retest reliability for the Working Memory factor was r = .78.
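As an illustration of the d-prime statistic mentioned above, the following is a minimal sketch using the conventional signal-detection definition, d' = z(hit rate) − z(false-alarm rate). It is not taken from the EXAMINER scoring software. The function name, the treatment of a “yes” response on a “yes” trial as a hit, the log-linear correction for extreme rates, and the example counts are all assumptions made for illustration.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Conventional sensitivity index: z(hit rate) - z(false-alarm rate)."""
    # Assumption (not from the source): a log-linear correction adds 0.5 to
    # each count so that rates of exactly 0 or 1 do not give infinite z-scores.
    hit_rate = (hits + 0.5) / (hits + misses + 1.0)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1.0)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical child on a 30-trial spatial 1-back (10 "yes", 20 "no" trials):
# 8 of 10 "yes" trials detected, 3 of 20 "no" trials answered "yes".
example = d_prime(hits=8, misses=2, false_alarms=3, correct_rejections=17)
print(f"d' = {example:.2f}")
```

Under this definition, a child who responds “yes” at the same rate on both trial types obtains a d-prime of zero, whereas total correct alone would not separate response bias from discrimination.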
WJ-III measures
Measures of verbal comprehension, processing speed, and short-term/working memory domains from the WJ-III were administered. Verbal Comprehension is a measure of crystallized knowledge and verbal reasoning involving four subtests: picture naming, generating synonyms, generating antonyms, and verbal analogies; this test is also the WJ-III measure of Verbal Ability. Visual Matching is a measure of processing speed in which the child identifies matching numbers among rows of numbers over a 3-min period. Numbers Reversed is a measure of short-term/working memory in which the child hears a series of digits and repeats back the sequence in reverse order. Each test was administered and scored according to the test manual and converted to W scores and age-adjusted standard scores. The W scale is a mathematical transformation of the Rasch model of data analysis (Woodcock et al., 2001). The W scale score is created by converting raw scores such that they become an equal-interval scale representing ability level with a center on a value of 500, which is set approximately at the average performance of children 10.0 years of age. The typical range of W abilities within a WJ-III test varies from approximately 430 to 550. This metric from the WJ-III is most similar to the scaling of the EXAMINER factor scores and allows the most direct comparison of the two measures. Age-adjusted standard scores are also provided for descriptive and comparative purposes because this scaling is most commonly used in clinical research and practice.
Magnetic resonance (MR) exams and CC measurement
Twenty-eight children had completed MR exams for clinical purposes and eight children completed research MR exams (four completed both types of scans). All scans were collected without sedation. T1-weighted scans in the sagittal plane were used for CC measures. All of the clinical scans were collected on 1.5 T scanners using a 2D T1-weighted spin echo sequence with a matrix size of 256 × 192. Repetition time (TR) varied from 500 to 600 ms and echo time (TE) varied from 15 to 20 ms. Thirty-two slices were collected with a slice thickness of 4 mm and a 10–20% gap. The voxel size was 1 × 1 × 4 mm. For research scans, a three-dimensional (3D) MP-RAGE sequence was used with a matrix of 256 × 256. TR was 2250 ms and TE was 4.15 ms. One hundred ninety-two contiguous slices were collected with a slice thickness of 1 mm. The voxel size was 1 × 1 × 1 mm. All participants also had axial T2-weighted and T2-weighted FLAIR sequences that were used to identify regions with apparent cerebral infarction and a 3D time-of-flight MRA sequence to visualize the major cerebral arteries.
Midsagittal CC measurements were completed as described previously (Schatz & Buzan, 2006). CC measurements were completed by two raters. The midsagittal slice for each participant was identified on T1-weighted scans by identifying the slice with the maximal view of the CC and the cerebral aqueduct. Images were rotated using ImageJ v.1.44 software so that the base of the CC was horizontal (Schneider, Rasband, & Eliceiri, 2012). The CC was then manually outlined using a computer mouse. A straight line was drawn between the anterior and posterior limits of the CC. The length of this line was used as a control measure given that tissue effects typically result in thinning of the CC, not decreases in overall length. The method for subdividing the CC was adapted from Witelson (1989; see Figure 1). Perpendicular lines were drawn from the posterior CC at one-fifth and one-third of the CC length, and from the anterior CC at one-third and one-half of the CC length. The five sections roughly corresponded to the rostrum, genu and rostral body (region 1), anterior midbody (region 2), posterior midbody (region 3), isthmus (region 4), and splenium (region 5). Across the two raters, total CC area was highly reliable (r = .98). For specific CC sections, inter-rater reliabilities varied from r = .90 to .98 (median r = .92).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160927090250789-0295:S1355617713001239:S1355617713001239_fig1g.jpeg?pub-status=live)
Fig. 1 Sample midsagittal image showing the corpus callosum (CC) measurement approach. Panel A shows the raw image. Panel B shows the image rotated so that the corpus callosum is horizontal. Black lines show manual tracing of the corpus callosum, a line representing the horizontal length of the corpus callosum, and the subdivisions that were measured as adapted from Witelson (1989). Only the anterior and posterior halves of the CC were retained for subdivisions in the final study measures.
Clinical and research scans showed comparable rank order and absolute size of the measurements for the participants completing both types of scans (e.g., total CC area was M = 471.5 mm² for the clinical scans and M = 467.3 mm² for the research scans). We calculated the percent error between scan protocols based on the mean values for the four participants. Percent error for the total CC area was 0.9%. For regions 1–5, the percent error was 3.1%, 0.9%, 11.8%, 2.8%, and 11.7%, respectively. We judged that regions 3 and 5 showed too large a percent error to combine the data across protocols. If data on specific CC regions were collapsed into the area for the anterior half and posterior half of the CC, the percent error was 2.6% for anterior and 5.8% for posterior, which we considered acceptable for combining data across MR protocols. Therefore, data are not reported for specific regions, but only for the anterior and posterior halves of the CC.
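The percent error for total CC area can be reproduced from the two reported means. The sketch below assumes percent error is the absolute difference between protocol means divided by one of the means; the text does not state which mean served as the denominator, so the choice of the research-scan mean here is an assumption (either choice rounds to 0.9% for these values). The function name is hypothetical.

```python
def percent_error(clinical_mean, research_mean):
    """Absolute difference between protocol means as a percentage.

    Assumption: the research-scan mean is used as the reference value.
    """
    return abs(clinical_mean - research_mean) / research_mean * 100.0


# Mean total CC area (mm^2) reported for the four participants with both scans.
total_cc = percent_error(clinical_mean=471.5, research_mean=467.3)
print(f"Total CC area percent error: {total_cc:.1f}%")
```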
Finally, the potential confound of normal demographic effects (e.g., age) impacting analyses of the CC variables was examined. Age- and gender-adjusted scores for the total CC area were computed based on the quadratic formula reported by Giedd et al. (1999) and indicated that age and gender accounted for little variance in CC size for our sample. Raw distance from the expected CC area was calculated by subtracting the observed CC area from the age- and gender-expected CC area. Total observed CC area and the distance from age- and gender-expected CC area were highly correlated. The Pearson and Spearman rank-order correlations for these two measures were r = .92. Therefore, observed CC area was used in all analyses without any age correction.
Statistical Methods
Hypothesis 1. Increasing neurologic involvement is associated with poorer performance on the EXAMINER variables
EXAMINER variables were evaluated in relation to clinical history via a series of one-way analysis of variance (ANOVA) procedures. The dependent variable was EXAMINER factor score (Executive Composite, Fluency, Cognitive Control, or Working Memory). The independent variable was neurologic history group (controls, no non-stroke morbidities, one-or-two non-stroke morbidities, three-or-four non-stroke morbidities, stroke). To examine the relationship between the four EXAMINER scores and midsagittal CC area, Pearson correlation coefficients were computed. For both sets of analyses, alpha was set at .0125 (.05/four dependent variables) to control for alpha inflation. In addition to these primary tests of Hypothesis 1, receiver operating characteristic (ROC) curves were examined to explore the ability of the EXAMINER to discriminate between SCD participants with neurologic morbidity and participants without such morbidity (controls, SCD without morbidity). Area under the ROC curve was used for this purpose as it represents the probability that a randomly chosen participant with neurologic morbidity would be correctly classified as more impaired on the measure than a randomly chosen participant without neurologic disease (Hanley & McNeil, 1982).
Hypothesis 2. The effect sizes between EXAMINER variables and neurologic severity variables will be at least as large as those found with WJ-III measures
For the three WJ-III cognitive variables, analogous one-way ANOVAs and Pearson correlation coefficients were computed for comparison with the EXAMINER variables. Eta-squared was computed for all one-way ANOVA procedures as a measure of effect size. ROC curves were also computed for the WJ-III measures for comparison with the EXAMINER measures.
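For readers less familiar with eta-squared, the following is a minimal sketch of how it relates to the one-way ANOVA sums of squares (between-group sum of squares divided by total sum of squares). The function name and the three small groups are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the study's data or software.

```python
def one_way_anova(groups):
    """Return (F, eta_squared) for a one-way between-groups design.

    eta_squared = SS_between / SS_total
    F = (SS_between / (k - 1)) / (SS_within / (N - k))
    """
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total

    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = ss_total - ss_between

    f_value = (ss_between / (k - 1)) / (ss_within / (n_total - k))
    eta_sq = ss_between / ss_total
    return f_value, eta_sq


# Hypothetical scores for three groups (illustration only).
example_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [4.0, 5.0, 6.0]]
f_value, eta_sq = one_way_anova(example_groups)
print(f"F = {f_value:.2f}, eta-squared = {eta_sq:.3f}")
```

Because eta-squared is a proportion of total variance, it can be compared across measures scored on different scales, which is what allows the EXAMINER and WJ-III effects to be placed side by side.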
Finally, given that EXAMINER scores were not age corrected, age was evaluated for its ability to account for unexplained variance in the analyses. We repeated the ANOVA and correlation procedures using age as a covariate to determine whether this changed the statistical significance of any findings or had an apparent impact on observed effect size. We also repeated the ROC analyses separately for children younger than versus older than 13 years of age (a value near the median split of the sample) to explore the extent to which narrowing the age range might improve the identification of participants with neurologic disease.
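Repeating a correlation with age as a covariate amounts to a partial correlation. The sketch below uses the standard first-order partial correlation formula; the function names and the small data set are hypothetical (not study data), and the study's own software and exact procedure are not specified here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y with the linear effect of z removed."""
    r_xy = pearson_r(x, y)
    r_xz = pearson_r(x, z)
    r_yz = pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical values (illustration only): age in years, CC area, score.
age = [6, 8, 10, 12, 14, 16]
cc_area = [480, 510, 500, 560, 590, 600]
score = [-0.9, -0.2, -0.6, 0.3, 0.5, 1.1]

print(f"zero-order r = {pearson_r(cc_area, score):.2f}")
print(f"partial r (age controlled) = {partial_r(cc_area, score, age):.2f}")
```

When the covariate is unrelated to both variables, the partial correlation equals the zero-order correlation; the two diverge to the extent that age is related to both measures.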
Results
EXAMINER scores differed by neurologic history. The ANOVA results for the groups are summarized in Table 3. Across all EXAMINER variables, poorer performance occurred with increasingly severe neurologic history. Specific contrasts showing significant post hoc tests were: (a) Children with SCD and stroke showed poorer performance than the non-SCD control group for all EXAMINER measures and showed poorer performance than the children with SCD and no morbidities for the Executive Composite and Fluency scores. (b) Children with SCD and three-or-four non-stroke neurologic morbidities showed poorer performance than the non-SCD control group on all four EXAMINER measures, and poorer performance than children with SCD and no morbidities for the Executive Composite and Cognitive Control factors. (c) Children with SCD and one-or-two non-stroke morbidities performed worse on the Cognitive Control factor than the non-SCD control group. Tests of the homogeneity of variance assumption using the Levene statistic indicated no concerns about this assumption.
Table 3 Mean scores (± SD) for continuous dependent variables according to clinical history
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160927090250789-0295:S1355617713001239:S1355617713001239_tab3.gif?pub-status=live)
Notes. WJ-III = Woodcock-Johnson Tests of Cognitive Ability, 3rd edition; CC = corpus callosum. For all cognitive scores, higher values represent better performance except for the EXAMINER Cognitive Control factor. All F tests were significant at p < .005 with the exception of CC total area (p = .127), CC posterior area (p = .536), and CC length (p = .885). Mean values that share a subscript did not differ in LSD post hoc tests (alpha set at p < .05). Missing data (due to children not being able to complete the tasks reliably) occurred for EXAMINER Cognitive Control (one child in the overt stroke group) and EXAMINER Working Memory (nine children in the control group, one child in the no morbidity group, three children in the 1–2 morbidities group, four children in the 3–4 morbidities group, and two children in the overt stroke group).
EXAMINER scores showed statistically significant correlations with midsagittal CC measurements (see Table 4). The total CC area was significantly associated with the EXAMINER Composite score. Anterior CC area was correlated with the Composite and Fluency scores. Other correlations varied in magnitude and many approached statistical significance. In all cases, larger CC area was associated with better performance.
Table 4 Correlations among MRI measurements and cognitive variables for SCD participants (n = 32)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160927090250789-0295:S1355617713001239:S1355617713001239_tab4.gif?pub-status=live)
Note. Correlations shown in bold type are statistically significant at the study alpha level of .0125. For all cognitive scores, higher values represent better performance except for the EXAMINER Cognitive Control factor.
ROC curves indicated fair discrimination between participants with versus without neurologic morbidity. The area under the curve was .796 for the EXAMINER Composite score, .760 for the Fluency factor, .782 for the Cognitive Control factor, and .760 for the Working Memory factor. All p-values for these curves were below .002, indicating statistically significant ability to discriminate between groups.
Comparison of the magnitude of the observed effects across the EXAMINER and WJ-III W scale measures suggested a slight trend toward larger effects for the EXAMINER battery, though the difference was well within confidence intervals for the magnitude of these effects. Based on neurologic history, the eta-squared values for the EXAMINER variables ranged from .183 to .275 with a median of .230, whereas for the WJ-III the effect sizes ranged from .150 to .217 with a median of .168 (see Table 3). With midsagittal CC area, R² values for the association with EXAMINER variables ranged from .08 to .31 with a median of .18, whereas with the WJ-III the R² values ranged from .08 to .15 with a median of .13 (see Table 4). Area under the ROC curve for WJ-III W scores also appeared to be slightly lower overall than for EXAMINER variables (Verbal Ability = .727; Visual Matching = .776; Numbers Reversed = .693).
Age created potential unexplained variance that could impact effect size measures for the EXAMINER and WJ-III variables. Controlling for age appeared to increase the association between neurologic disease and the cognitive measures in all instances except for the association of the WJ-III measures with CC area. The planned ANOVA analyses including age as a covariate showed that the inclusion of this covariate generally appeared to increase the η² value for neurologic severity group in explaining the EXAMINER scores (Executive Composite η² = .370, Fluency η² = .289, Cognitive Control η² = .272, Working Memory η² = .300) and the WJ-III scores (Verbal Ability η² = .258, Visual Matching η² = .323, Numbers Reversed η² = .164). Partial correlations between total CC area and the cognitive variables controlling for age showed minimal impact on the magnitude of these correlations for the EXAMINER (Executive Composite r = .57, Fluency r = .52, Cognitive Control r = −.37, Working Memory r = .43) but slightly lower values for the WJ-III variables (Verbal Ability r = .22, Visual Matching r = .21, Numbers Reversed r = .27). Finally, ROC curves computed separately for younger and older children suggested that narrowing the age range yielded slightly higher discrimination ability for both EXAMINER scores (range, .713 to .879; median .800) and WJ-III scores (range, .697 to .816; median .748).
Discussion
This study evaluated the ability of the EXAMINER Battery to detect SCD-related neurocognitive deficits and compared the magnitude of this relationship to more traditional cognitive measures. Overall, the EXAMINER Battery demonstrated: (a) an association with neurologic severity as assessed by clinical history or by brain tissue effects as measured by midsagittal CC area and (b) fair ability to discriminate between children with and without neurologic morbidity. The magnitude of the associations between EXAMINER factor scores and severity of neurologic disease was generally of a medium-to-large size using conventional descriptive labels (Cohen, Reference Cohen1988). The battery appeared to function at least as well as established measures from the WJ-III Tests of Cognitive Abilities for all of these comparisons.
The EXAMINER Battery offers a potentially useful alternative to more traditional paper-and-pencil cognitive testing with several advantages and disadvantages to consider. Many measures of executive function show low intercorrelations, suggesting the choice of what task to use may be more influential than the construct itself for determining study outcomes (Testa, Bennett, & Ponsford, Reference Testa, Bennett and Ponsford2012). The EXAMINER Battery uses a latent variable approach with multiple tasks used to generate factor scores that are less dependent on any single task. The option of using either a global composite score for executive function or narrower-band factor scores also allows for two complementary levels of analysis. We found the most robust associations with the Executive Composite score; however, it is possible that in intervention research changes in more discrete neurocognitive systems might be of more interest (King, DeBaun, & White, Reference King, DeBaun and White2008). Finally, in our data, the EXAMINER Battery showed a slight trend toward better performance across all three approaches used (neurologic history groups, midsagittal CC area, ROC curves differentiating children with or without neurologic morbidity). Although none of these differences was striking in isolation, the consistency with which this occurred is encouraging for the future of this measure.
Despite the strengths noted above, there are some limitations of the EXAMINER Battery in its current state. First, the factor scores generated from the battery are not age-corrected. For patient populations like pediatric SCD, most studies include a heterogeneous age range of participants. Age-corrected scores are useful to reduce this source of variance and, in fact, our analyses showed EXAMINER variables performed slightly better with post hoc attempts to reduce unexplained variance due to age differences. Most notably, for the ROC curves our relatively crude attempt to take age into account helped move the discrimination ability of EXAMINER variables into the range of classification accuracy often desired for clinical decision making (Murphy et al., Reference Murphy, Berwick, Weinstein, Borus, Budman and Klerman1987). Second, the reliability and validity of this measure across cultural groups (e.g., race/ethnicity) is not known. The development of the measure included data from people of differing cultural backgrounds in the United States, but specific evidence for the cultural validity of the instrument is needed. Third, the EXAMINER factor scores require completing a somewhat lengthy set of tasks that may not be feasible in some study designs. For example, we had some difficulties in our sample with children not being able to complete both working memory tasks; however, this was not a general pattern across sites involved in the development of the instrument.
The present study has several limitations that should be considered in interpreting the results. We believe the approach of measuring clinical history using ordinal classification makes sense given the complex set of neurologic morbidities in SCD; however, the sample size for individual groups was small. The statistical comparison of these groups yielded several significant differences; however, there is insufficient power to make inferences about null effects between groups. The sample was also too small to generate reliable cut-off scores for detecting neurologic morbidity that could be generalized to other samples; such work would best be conducted with a larger sample and age-corrected EXAMINER scores. In addition, the present SCD sample appeared to have predominant changes in the CC for the anterior half with minimal differences evident across groups in the posterior CC. This is not consistent with previous data suggesting more diffuse white matter effects and may be a sampling issue (Baldeweg et al., Reference Baldeweg, Hogan, Saunders, Telfer, Gadian, Vargha-Khadem and Kirkham2006; Schatz & Buzan, Reference Schatz and Buzan2006). Our use of midsagittal CC area as an outcome measure showed reliability and validity; however, the reliance on clinical MR scans also likely added to error variance, both from the somewhat thick 4-mm slices acquired in these scans and a greater time lag between cognitive testing and some of the MR exams. Despite these limitations, this study had sufficient power to detect group differences in cognitive performance, including differences between children with SCD and no known neurologic morbidity and those with significant neurologic morbidity. The MR measures also were sensitive enough to detect disease-related changes across the SCD study groups and showed robust correlations with several cognitive measures. 
The present study, despite some limitations, indicates the EXAMINER Battery is a promising tool for assessing SCD-related neurocognitive deficits.
Acknowledgments
The authors thank the families who participated in this research. This work was supported in part by a grant from the National Institutes of Health (NIH-NINDS-05-02; Kramer, P.I., Schatz, site P.I.) and an award from the University of South Carolina Institute for African American Research (Sanchez). There are no known financial or other relationships that could be interpreted as a conflict of interest for this manuscript.