INTRODUCTION
There is a growing consensus that working memory and inhibition are two core components of the executive function (EF) construct (Pennington & Ozonoff, 1996; Verte et al., 2006), and that they may be dissociable in children (Goldberg et al., 2005; Mahone et al., 2002c, 2005; Martinussen et al., 2005; Nyden et al., 2000; Ozonoff & Jensen, 1999). Working memory refers to the temporary retention of information that was just experienced but no longer exists in the environment. This information can be stored via active maintenance and made available for goal-directed behavior via manipulation (Sheridan et al., 2007). Working memory is essential to everyday functioning in children because it permits rules to guide decision-making and responses, so that behavior is not entirely governed by sensory cues in the environment (Martinussen et al., 2005). It is critical to classroom learning (Kibby et al., 2004) and has been described as a candidate endophenotype in disorders such as ADHD (Castellanos & Tannock, 2002). Verbal working memory in particular has been linked to reading comprehension, both in normal, highly experienced readers (Swanson & Alexander, 1997) and in impaired readers (Sesma et al., 2008).
A variety of performance-based behavioral measures purport to measure working memory in children, including digit, letter, sentence, and spatial span (Conway et al., 2005; Milner et al., 1991; Roid, 2003; Wechsler et al., 2004), self-ordered search (e.g., CANTAB; Cambridge Cognition, 2004), letter-number sequencing (Wechsler et al., 2004), and divided attention (McGrew & Woodcock, 2001; Schretlen, 1997). Working memory in children can also be measured using rating scales that assess executive functions. A commonly used parent rating scale of EF in children is the Behavior Rating Inventory of Executive Function (BRIEF; Gioia et al., 2000). Since its publication, several studies have examined the concurrent and predictive validity of the BRIEF (see Donders, 2002; Strauss et al., 2006, for reviews). Initial factor analytic studies of the BRIEF support two robust indices: a Behavioral Regulation Index (BRI), emphasizing inhibitory and emotional control, and a Metacognition Index (MCI), emphasizing working memory, planning, and strategic response preparation (Gioia et al., 2000). Five subscales comprise the MCI, with the Working Memory scale showing the highest correlation with the Index (r = .88). Items from the Working Memory scale include “Forgets what he/she was doing” and “Has trouble remembering things, even for a few minutes.”
Behavioral studies in primates, along with imaging and lesion studies in humans, have linked working memory to the lateral prefrontal cortex (Baldo & Dronkers, 2006; D’Esposito et al., 2000; Mull & Seyal, 2001) in a process-specific manner, such that the ventrolateral prefrontal cortex (VLPFC) is involved in the maintenance of information, whereas the dorsolateral prefrontal cortex (DLPFC) is associated with the manipulation of information (D’Esposito et al., 1999; Fletcher & Henson, 2001; Harley & Speer, 2000). While much of the work examining the functional dissociation between domain-specific components of working memory has used visual tasks (Courtney et al., 1998; Mohr et al., 2006), a similar dissociation has been reported for language-based information (D’Esposito et al., 1999; Smith & Jonides, 1999) and for information specific to the auditory domain (Rama et al., 2004).
Despite the promise of neuropsychological tests of EF, studies examining their ecological validity have yielded conflicting results (Wodka et al., 2008), possibly because the highly structured clinical testing setting may not place a high enough demand on EF, given the external constraints and supports necessarily imposed on the child by the examiner (Bernstein & Waber, 1990; Tarazi et al., 2007). Additionally, performance-based measures of EF may be less sensitive in children with above-average IQ (Mahone et al., 2002a). Given the difficulties associated with performance-based measures, there has been increased interest in methods to improve the ecological validity of comprehensive neuropsychological assessments (Chaytor et al., 2006; Sbordone, 2000; Spooner & Pachana, 2006), including the use of caregiver rating scales in conjunction with performance-based tests in clinical formulations (Cripe, 1996) and the development of tests with ecological validity in mind (Robertson et al., 1994; Wilson et al., 1985, 2004).
Although results have been mixed, many studies find that caregiver ratings of EF do not correspond directly to performance-based measures of similar constructs in children (Bodnar et al., 2007; Mahone et al., 2002b; Mahone & Hoffman, 2007; Niendam et al., 2007).
However, parent ratings on the BRIEF have shown considerable promise as predictors of independent (adaptive) skill development, including both deficits and strengths (Gilotty et al., 2002; Mangeot et al., 2002; Ries et al., 2003). For example, Waber et al. (2006) found strong correlations between parent BRIEF ratings and performance on statewide academic testing. There is also emerging evidence that the BRIEF may be more sensitive than some performance-based measures in identifying salient EF-dependent life skill deficits. For example, Mahone et al. (2002b) reported that children with ADHD were rated by parents on the BRIEF as having impairments in multiple components of EF, even when the performance-based measures yielded no group differences. Similarly, in a triple-blind (parent, child, examiner), placebo-controlled treatment study of guanfacine in children with Tourette syndrome, the treatment group showed significantly improved parent ratings on the BRIEF MCI, whereas no significant changes were detected on performance-based measures (Cummings et al., 2002). In children with spina bifida/hydrocephalus, the Working Memory scale showed the greatest effect size of all BRIEF scales in identifying executive dysfunction (Mahone et al., 2002c). Together, these results point to a common pattern across a variety of studies and multiple patient groups, in which the BRIEF Working Memory scale tends to be the most sensitive of the BRIEF scales.
Several studies have also identified links between biological markers and caregiver ratings. For example, in children with moderate to severe traumatic brain injury, Wozniak et al. (2007) reported a significant relationship between frontal white matter microstructure (i.e., reduced fractional anisotropy values from diffusion tensor imaging) and parent ratings on the BRIEF Emotional Control scale. Similarly, Anderson et al. (2002) reported elevations (relative to controls) on BRIEF scales among children with hydrocephalus, treated phenylketonuria (PKU), and frontal lesions; however, those with documented frontal lesions had the greatest effect sizes among the three groups, with the BRIEF Working Memory scale emerging as the most sensitive. In a sample of children with velocardiofacial syndrome (VCFS) and ADHD, Antschel et al. (2005) reported an association between the area of the splenium of the corpus callosum and the BRIEF Inhibit scale, with smaller callosal area associated with greater behavioral symptoms. Using the adult version of the BRIEF, Rabin and colleagues reported an increase in problematic scores (particularly on the Working Memory scale) as individuals progressed from concerns about cognitive function to Mild Cognitive Impairment (Rabin et al., 2006).
SUMMARY
Caregiver rating scales potentially add to the ecological validity of neuropsychological examinations in children. Ratings of working memory are particularly salient, given the relationship between working memory and academic development. The BRIEF has been shown to be sensitive in identifying “real life” difficulties with working memory in a variety of clinical populations, although to date it has had limited validation in conjunction with neuroimaging findings. While there is evidence among clinical populations suggesting an association between reduced regional gray (Bearden et al., 2004; Sowell et al., 2008) and white matter volumes (Carey et al., 2008; Semrud-Clikeman et al., 2000) and cognitive/behavioral dysfunction, few studies have explored whether the same relationships exist in healthy, typically developing children (Bigler et al., 2007; Wells et al., 2008), and whether they can be captured with caregiver rating scales.
The purpose of the present study was to investigate the construct validity (convergent/discriminant) of the BRIEF Working Memory scale, using a multitrait/multimethod design including neuroimaging, rating scales, and performance-based measures. Specifically, we hypothesized that the BRIEF Working Memory scale would be more highly correlated with performance-based measures of working memory than with performance-based measures of non-EF skills or with parent ratings of non-EF constructs. Similarly, we hypothesized that the BRIEF Working Memory scale would be more strongly correlated with frontal than with nonfrontal brain volumes.
METHODS
Participants
Participants were recruited from the Baltimore, Maryland, metropolitan area as a control group for a larger study examining late effects of cancer treatments in children. For the present study, 35 typically developing children (19 boys), ages 5–17 years, who met the study criteria were included. All participants and parents signed a consent form that met the Institutional Review Board standards of the Johns Hopkins Medical Institutions.
Study Procedures
Participants were initially screened via telephone interview with a parent and were excluded if there was a prior history of psychiatric or neurological disorder, mental retardation, or learning disability. Parents of all study participants also completed a semi-structured psychiatric interview (described below) before testing. Once enrolled, all participants received MRI scans and completed a neuropsychological assessment battery that included measures of attention, memory, language, visual, and motor skills. Parents also completed behavior rating scales at the time of neuropsychological testing. For the present study, we analyzed parent ratings of EF (emphasizing working memory) and of non-EF behavior, along with performance-based neuropsychological measures of working memory and of selected non-EF skills. MRI scans were obtained within one month of neuropsychological testing. Study measures are listed in Table 1.
Table 1. Measures examined in the study

Note.
BRIEF = Behavior Rating Inventory of Executive Function; WJ-III = Woodcock-Johnson III Cognitive Battery; CBCL = Child Behavior Checklist.
Screening Measures
Hollingshead Index
Socioeconomic status for each participant was estimated using a widely used four-factor index (i.e., gender, marital status, education, and occupation; Hollingshead, 1975).
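For readers unfamiliar with the index, its scoring logic can be sketched in a few lines of Python. This is a minimal illustration, assuming the conventional Hollingshead weighting (education coded 1–7 and weighted by 3, occupation coded 1–9 and weighted by 5, with the scores of two employed parents averaged). The coding tables and the rules for other household configurations are simplified here, and the function names are hypothetical.

```python
from typing import Sequence, Tuple

# Hypothetical helper names; assumes education (1-7) and occupation (1-9)
# have already been coded from the Hollingshead (1975) tables.
EDUCATION_WEIGHT = 3
OCCUPATION_WEIGHT = 5


def parent_score(education: int, occupation: int) -> int:
    """Weighted score for one employed parent (possible range 8-66)."""
    if not 1 <= education <= 7:
        raise ValueError("education must be coded 1-7")
    if not 1 <= occupation <= 9:
        raise ValueError("occupation must be coded 1-9")
    return education * EDUCATION_WEIGHT + occupation * OCCUPATION_WEIGHT


def household_ses(employed_parents: Sequence[Tuple[int, int]]) -> float:
    """Average the weighted scores of employed parents (simplified rule)."""
    if not employed_parents:
        raise ValueError("at least one employed parent is required")
    scores = [parent_score(edu, occ) for edu, occ in employed_parents]
    return sum(scores) / len(scores)


# Example: two employed parents with hypothetical codes.
print(household_ses([(5, 6), (6, 7)]))  # (45 + 53) / 2 = 49.0
```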
Diagnostic Interview for Children and Adolescents, Fourth Edition (DICA-IV; Reich et al., 1997)
Parents of children deemed eligible via telephone screen were administered the DICA-IV, which is based on the Diagnostic and Statistical Manual of Mental Disorders, Fourth Edition (DSM-IV; American Psychiatric Association, 1994). This semi-structured interview is designed to determine selected current and retrospective psychiatric diagnoses, including Attention Deficit Hyperactivity Disorder, Conduct Disorder, Oppositional Defiant Disorder, Major Depressive Disorder, Bipolar Disorders, Dysthymia, Separation Anxiety Disorder, Panic Disorder, Generalized Anxiety Disorder, Specific Phobia, Obsessive Compulsive Disorder, and Adjustment Disorders. The DICA-IV has been reported to be reliable for DSM-IV diagnoses. Participants who met DSM-IV criteria for any psychiatric disorder were excluded from the current sample.
Peabody Picture Vocabulary Test, Third Edition (PPVT-III; Dunn & Dunn, 1997)
The PPVT-III is a screening test of verbal ability and a measure of receptive (i.e., listening) single-word vocabulary for standard English; it was used here as an estimate of IQ. The child is shown a page with four pictures while the examiner says a vocabulary word, and the child is asked to identify the picture that best represents the word, either by pointing or by saying the number of the picture. Thus, the test requires little to no motor or expressive language output. Testing continues until the child misses 8 of the 12 items in an item set. Standard scores were used to describe the sample. The PPVT-III manual reports a high correlation (.90) with the Full Scale IQ of the Wechsler Intelligence Scale for Children, Third Edition (Wechsler, 1991). Children were excluded from the present study if they had PPVT-III standard scores lower than 70.
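The discontinue rule described above can be sketched as follows. This is a simplified illustration that assumes responses are recorded in administration order as correct/incorrect and grouped into consecutive 12-item sets. Basal rules, start points, and raw-score computation from the PPVT-III manual are not modeled, and the function name is hypothetical.

```python
from typing import Optional, Sequence

SET_SIZE = 12        # items per set, as described in the text
CEILING_ERRORS = 8   # errors within one set that end testing


def ceiling_set_index(responses: Sequence[bool]) -> Optional[int]:
    """Return the 0-based index of the first item set with at least 8 errors,
    or None if no set reaches the ceiling criterion.

    responses: True for a correct item, False for an error (hypothetical format).
    """
    for start in range(0, len(responses), SET_SIZE):
        item_set = responses[start:start + SET_SIZE]
        errors = sum(1 for correct in item_set if not correct)
        if errors >= CEILING_ERRORS:
            return start // SET_SIZE
    return None


# A child who passes the first set and misses 8 items in the second.
responses = [True] * 12 + [False] * 8 + [True] * 4
print(ceiling_set_index(responses))  # 1 (the second set)
```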
Neuropsychological Measures of Working Memory
Behavior Rating Inventory of Executive Function (BRIEF; Gioia et al., 2000)
The BRIEF is a caregiver-report questionnaire designed to assess the behavioral manifestations of executive functions in children. Parent and teacher report formats are available; the Parent Form was used in the present study. The BRIEF consists of 86 items drawn from parent and teacher comments collected during clinical interviews and arranged with respect to eight aspects of the executive function construct. Raters assess the child’s behavior on a 3-point Likert scale, and scores are obtained on the following scales: Initiate, Working Memory, Plan/Organize, Organization of Materials, Self-Monitor, Inhibit, Shift, and Emotional Control. Factor scores for two indexes (Metacognition and Behavioral Regulation) are provided, along with a Global Executive Composite. The Working Memory scale was the focus of the present study. Anderson and colleagues reported convergent (i.e., significant correlations with the Contingency Naming Task) and discriminant (i.e., nonsignificant correlations with word fluency) validity of the Working Memory scale in a sample of children with frontal lesions, phenylketonuria, or hydrocephalus (Anderson et al., 2002). T-scores were used in analyses.
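T-scores express a rating relative to a normative group (mean 50, SD 10), and on the BRIEF higher T-scores indicate more reported executive problems. The WJ-III standard scores used elsewhere in the study are on a mean-100, SD-15 metric. The sketch below shows the linear form of these transformations. It is illustrative only: published BRIEF and WJ-III scores are read from the instruments’ normative tables rather than computed this way, and the normative mean and SD in the example are hypothetical.

```python
def linear_norm_score(raw: float, norm_mean: float, norm_sd: float,
                      scale_mean: float, scale_sd: float) -> float:
    """Re-express a raw score on a new metric via a linear z-score transform."""
    if norm_sd <= 0:
        raise ValueError("normative SD must be positive")
    z = (raw - norm_mean) / norm_sd
    return scale_mean + scale_sd * z


def t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """T-score metric (mean 50, SD 10), as reported for BRIEF and CBCL scales."""
    return linear_norm_score(raw, norm_mean, norm_sd, 50.0, 10.0)


def standard_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Standard-score metric (mean 100, SD 15), as reported for WJ-III tests."""
    return linear_norm_score(raw, norm_mean, norm_sd, 100.0, 15.0)


# Hypothetical normative values: a raw score one SD above the normative mean.
print(t_score(20, 15, 5))         # 60.0
print(standard_score(20, 15, 5))  # 115.0
```

Because higher BRIEF T-scores reflect more problems, whereas higher WJ-III standard scores reflect better performance, a negative correlation between the two is the expected direction for measures of the same construct.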
Woodcock-Johnson III Auditory Working Memory (Woodcock et al., 2001)
This is a measure of short-term auditory memory span and auditory (verbal) working memory. The child listens to a series containing both digits and words (e.g., “dog,” “1,” “shoe,” “8”) and attempts to reorder it, repeating the objects first and the numbers second. The task requires the child to maintain the information in immediate awareness and to manipulate it by dividing the items into two groups. The manual reports a median test–retest reliability of .88 for the age range of interest in the present study. The Auditory Working Memory test is part of the Working Memory Clinical Cluster of the Woodcock-Johnson III battery and is considered separable from verbal learning in general (Leffard et al., 2006; Mather & Woodcock, 2001). Standard scores were used in analyses.
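The reordering the task requires can be illustrated with a short sketch. It assumes that the target response lists the objects and then the digits, each group in the order presented. Item content, discontinue rules, and scoring from the WJ-III manual are not modeled, and the function name is hypothetical.

```python
from typing import List, Sequence


def expected_response(sequence: Sequence[str]) -> List[str]:
    """Objects first, then digits; order within each group is preserved (assumed)."""
    objects = [item for item in sequence if not item.isdigit()]
    digits = [item for item in sequence if item.isdigit()]
    return objects + digits


print(expected_response(["dog", "1", "shoe", "8"]))  # ['dog', 'shoe', '1', '8']
```

The sketch makes the two demands explicit: the child must hold all items in mind (maintenance) and regroup them (manipulation) before responding.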
Neuropsychological Measures of Nonexecutive Skills
Woodcock-Johnson III Spatial Relations (Woodcock et al., 2001)
The Spatial Relations test is an untimed multiple-choice test in which the child is asked to mentally match fragments of shapes to a model. From an array of three to six choices, the child selects the pieces (either two or three) that can be assembled to replicate the displayed model. This task taps spatial perception of abstract forms and does not require a motor response. The WJ-III Technical Manual reports a median test–retest reliability of .81 for the age ranges under consideration. Factor analytic studies suggest that Spatial Relations loads with other tests involving visuospatial and constructional skills (e.g., Block Design; McGrew & Woodcock, 2001). Standard scores were analyzed in the present study.
Child Behavior Checklist (CBCL/6-18; Achenbach & Rescorla, 2000)
The CBCL/6-18 is a broadband parent rating scale that examines behavioral and adaptive functioning. The scale provides scores on three competence scales (Activities, Social, and School), Total Competence, eight syndrome scales, and Internalizing, Externalizing, and Total Problems. The syndrome scales are Aggressive Behavior, Anxious/Depressed, Attention Problems, Rule-Breaking Behavior, Social Problems, Somatic Complaints, Thought Problems, and Withdrawn/Depressed. The DSM-oriented scales are Affective Problems, Anxiety Problems, Somatic Problems, Attention Deficit/Hyperactivity Problems, Oppositional Defiant Problems, and Conduct Problems. The scales are based on factor analyses of parents’ ratings of 4,994 clinically referred children and are normed on 1,753 children aged 6 to 18, using a representative sample from the 48 contiguous states stratified for SES, ethnicity, region, and urban-suburban-rural residence. The Anxious/Depressed scale (T-scores) was used as a parent rating of nonexecutive behavior in the present study.
Magnetic Resonance Image Acquisition and Processing
High-resolution three-dimensional MRI images of each participant’s brain were acquired with a GE Signa 1.5 Tesla LX scanner (General Electric, Milwaukee, WI) using the standard birdcage quadrature head coil. Oblique-axial images were obtained with a 3-D volumetric radiofrequency spoiled gradient echo (SPGR) series partitioned into 124 contiguous 1.5-mm slices. Raw GE Signa-formatted image data were transferred over existing networks from the MRI scanner at Johns Hopkins Hospital to Apple Macintosh PowerPC workstations at SUNY Upstate Medical University. The image data were imported into BrainImage (Reiss, 1999; http://spnl.stanford.edu/tools/brainimage.htm) for visualization, processing, and quantitation (Subramaniam et al., 1997). The importation process creates a 124-slice image stack composed of spatially registered 8-bit images that have been processed to minimize signal artifacts related to RF field inhomogeneity. To prepare the stacks for measurement, nonbrain material (i.e., skull, scalp, and vasculature) was removed using a semi-automated edge detection routine that involves region growing as well as stepwise morphologic operations (Subramaniam et al., 1997). These “skull-stripped” images were resliced so that the interpolated slice thickness (z-dimension) equaled the x and y pixel dimensions, thereby converting the image stacks into cubic voxel data sets. The cubic voxel data sets were opened in the multiplanar visualization mode of BrainImage so that three orthogonal views of the data could be displayed simultaneously.
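The reslicing step can be illustrated with a minimal sketch that linearly interpolates a slice stack along z until the slice spacing equals the in-plane pixel size. Linear interpolation is an assumption (the interpolation method used by BrainImage is not described here), the in-plane pixel size is not reported above, and the function name and toy values are hypothetical.

```python
import math
from typing import List

Slice = List[List[float]]  # one 2-D image as rows of intensities


def reslice_to_cubic(stack: List[Slice], slice_thickness_mm: float,
                     pixel_size_mm: float) -> List[Slice]:
    """Resample a stack of 2-D slices along z so that slice spacing equals
    the in-plane pixel size, using linear interpolation between slices."""
    n = len(stack)
    if n < 2:
        raise ValueError("need at least two slices")
    extent_mm = (n - 1) * slice_thickness_mm           # first to last slice center
    n_out = math.floor(extent_mm / pixel_size_mm) + 1
    resliced = []
    for k in range(n_out):
        z = k * pixel_size_mm / slice_thickness_mm     # position in original slice units
        lower = min(int(z), n - 2)                     # index of the slice below z
        frac = z - lower                               # weight given to the slice above
        below, above = stack[lower], stack[lower + 1]
        resliced.append([
            [(1.0 - frac) * a + frac * b for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(below, above)
        ])
    return resliced


# Toy 1x1 "images": three slices 1.5 mm apart, resliced to 0.75-mm spacing.
toy = [[[0.0]], [[10.0]], [[20.0]]]
print([s[0][0] for s in reslice_to_cubic(toy, 1.5, 0.75)])  # [0.0, 5.0, 10.0, 15.0, 20.0]
```

For example, 124 slices at 1.5 mm span 184.5 mm between the first and last slice centers; with a hypothetical 0.9375-mm in-plane pixel, the function would return 197 interpolated slices.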
Image Measurement
Isolated brain tissue was subdivided into cerebral lobes, subcortical regions, brainstem, and cerebellar regions using the revised Talairach stereotaxic grid atlas (Talairach & Tournoux, 1988) adapted for measurement in pediatric study groups (Andreasen et al., 1993; Kaplan et al., 1997; Kates et al., 1997). This approach achieves high sensitivity and specificity for all revised Talairach-based calculations of lobar brain regions (Kates et al., 1999). Each region was then segmented to delineate and measure lobar volumes of the gray, white, and ventricular compartments, using a constrained fuzzy algorithm that assigns voxels to one or more tissue categories based on intensity values and tissue boundaries (Reiss et al., 1998). This segmentation method has been shown to be reliable for gray matter, white matter, and CSF volumes (Reiss et al., 1998). Gray and white matter volumes for the frontal, temporal, parietal, and occipital lobes were analyzed in the present study.
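The idea of assigning voxels fractionally to more than one tissue class, and then summing those fractions into volumes, can be sketched as below. This is not the constrained algorithm of Reiss et al. (1998), which also uses tissue-boundary information. It is a swapped-in, intensity-only illustration using the fuzzy c-means membership formula with fixed class centroids, and the centroids, intensities, and voxel size in the example are hypothetical.

```python
from typing import List, Sequence


def fuzzy_memberships(intensity: float, centroids: Sequence[float],
                      m: float = 2.0) -> List[float]:
    """Fuzzy c-means membership of one voxel in each class (sums to 1)."""
    distances = [abs(intensity - c) for c in centroids]
    for i, d in enumerate(distances):
        if d == 0.0:  # voxel sits exactly on a class centroid
            return [1.0 if j == i else 0.0 for j in range(len(centroids))]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
            for d_i in distances]


def class_volumes(intensities: Sequence[float], centroids: Sequence[float],
                  voxel_volume_mm3: float) -> List[float]:
    """Sum fractional memberships over voxels and scale by voxel volume."""
    totals = [0.0] * len(centroids)
    for value in intensities:
        for k, u in enumerate(fuzzy_memberships(value, centroids)):
            totals[k] += u
    return [t * voxel_volume_mm3 for t in totals]


# Hypothetical two-class example (e.g., darker vs. brighter tissue).
print(class_volumes([0.0, 50.0, 100.0], [0.0, 100.0], 1.0))  # [1.5, 1.5]
```

A voxel halfway between two class centroids contributes half its volume to each class, which is how partial-volume voxels at tissue boundaries are accommodated in this kind of scheme.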
Data Analysis
Left and right hemisphere volumes were highly correlated for each lobe (all r > .90; p < .01); therefore, left and right hemisphere measures were combined for all lobar volumes to reduce the number of comparisons. Following the procedure outlined by Kramer et al. (2007), lobar volumes were normalized to correct for variance in overall head size by multiplying each absolute lobar volume by the average total cerebral volume of the sample and then dividing by the individual’s total cerebral volume.
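In symbols, the head-size correction is V_norm,i = V_i × (sample mean total cerebral volume) / TCV_i, where TCV_i is participant i’s total cerebral volume. A minimal Python sketch of this step, together with combining hemispheres, follows. Summing left and right volumes is an assumption about how the hemispheres were “combined,” and the function names and example values are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def combine_hemispheres(left: Sequence[float], right: Sequence[float]) -> List[float]:
    """Bilateral lobar volume per participant (assumed here: left + right)."""
    return [l + r for l, r in zip(left, right)]


def normalize_by_head_size(lobar: Sequence[float],
                           total_cerebral: Sequence[float]) -> List[float]:
    """Scale each lobar volume by mean(TCV) / TCV_i."""
    if len(lobar) != len(total_cerebral):
        raise ValueError("one total cerebral volume is needed per participant")
    sample_mean_tcv = mean(total_cerebral)
    return [v * sample_mean_tcv / tcv for v, tcv in zip(lobar, total_cerebral)]


# Hypothetical volumes (cm^3) for two participants with different head sizes.
frontal = combine_hemispheres([100.0, 150.0], [100.0, 150.0])  # [200.0, 300.0]
print(normalize_by_head_size(frontal, [1000.0, 1500.0]))       # [250.0, 250.0]
```

Both hypothetical participants have frontal lobes occupying 20% of total cerebral volume, so after correction they receive the same normalized volume.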
Main analyses included correlations between the BRIEF Working Memory scale and performance-based assessments of working memory (WJ-III Auditory Working Memory) and non-working memory (WJ-III Spatial Relations) skills. In addition, because age was significantly correlated with most of the normalized regional white and gray matter volumes, partial correlations (controlling for age) were calculated between regional brain volumes and the working memory measures (BRIEF Working Memory and WJ-III Auditory Working Memory), as well as between regional brain volumes and the non-EF measures. Age-normed scores (standard scores or T-scores) were used for all performance-based measures and rating scales. To reduce skewness, outlying scores were truncated at three standard deviations. To reduce the probability of Type I error, the significance level for correlations (two-tailed) was set at α = .01.
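The outlier truncation and the age-controlled partial correlations can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: truncation bounds are taken from each variable’s own sample mean and SD (the reference values are not specified above), the first-order partial correlation formula is used for a single covariate, and all names and data are hypothetical. The residual-based helper is included only to show that the formula equals the correlation between age-residualized scores.

```python
import math
from statistics import mean, stdev
from typing import List, Sequence


def truncate_outliers(values: Sequence[float], n_sd: float = 3.0) -> List[float]:
    """Clip values lying beyond mean +/- n_sd sample SDs to that bound (assumed rule)."""
    m, s = mean(values), stdev(values)
    lo, hi = m - n_sd * s, m + n_sd * s
    return [min(max(v, lo), hi) for v in values]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x: Sequence[float], y: Sequence[float], z: Sequence[float]) -> float:
    """r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2) * (1 - r_yz^2))."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def residualize(y: Sequence[float], z: Sequence[float]) -> List[float]:
    """Residuals of y after simple linear regression on z."""
    mz, my = mean(z), mean(y)
    slope = (sum((a - mz) * (b - my) for a, b in zip(z, y))
             / sum((a - mz) ** 2 for a in z))
    return [b - my - slope * (a - mz) for a, b in zip(z, y)]


# Hypothetical data for six children (not study data).
age = [6, 8, 10, 12, 14, 16]
rating = [62, 55, 58, 49, 52, 45]
volume = [210, 220, 205, 230, 215, 240]
print(round(partial_corr(rating, volume, age), 3))
print(round(pearson(residualize(rating, age), residualize(volume, age)), 3))  # should match
```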
Fisher’s r-to-z transformations (Hays, 1988) were used to examine differences in the magnitude of correlations of interest, in particular contrasting the correlations between BRIEF Working Memory ratings and frontal lobe volumes with the correlations between BRIEF Working Memory ratings and nonfrontal volumes. Using the same method, the correlations between BRIEF Working Memory ratings and performance-based tests of EF were contrasted with the correlations between BRIEF Working Memory and nonexecutive performance-based tests.
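The comparison can be sketched as follows: each correlation is converted to z = atanh(r) = 0.5 × ln[(1 + r)/(1 − r)], and the difference between the two z values is divided by its standard error. This minimal sketch uses the independent-samples form, with n − 3 − k in the standard error for a partial correlation controlling for k covariates. Because both correlations in each contrast share the BRIEF Working Memory scale, a test for dependent correlations would be an alternative; the exact computation used in the study is an assumption here, and the function names and example values are hypothetical.

```python
import math
from typing import Tuple


def fisher_z(r: float) -> float:
    """Fisher r-to-z transformation: z = atanh(r) = 0.5 * ln((1 + r) / (1 - r))."""
    return math.atanh(r)


def compare_correlations(r1: float, n1: int, r2: float, n2: int,
                         n_covariates: int = 0) -> Tuple[float, float]:
    """Two-tailed z test for the difference between two correlations
    (independent-samples form; n_covariates > 0 for partial correlations)."""
    se = math.sqrt(1.0 / (n1 - 3 - n_covariates) + 1.0 / (n2 - 3 - n_covariates))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2.0))  # 2 x standard normal upper tail
    return z, p_two_tailed


# Hypothetical call: r = .50 versus r = .00, each from n = 103.
z, p = compare_correlations(0.50, 103, 0.0, 103)
print(round(z, 2))  # 3.88
```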
RESULTS
Demographic Information
Demographic characteristics of all 35 participants are summarized in Table 2. There were 19 boys and 16 girls in the sample, and the mean age was 11.9 years (standard deviation = 3.1). The racial composition was 43% Caucasian, 49% African American, and 8% biracial; 80% of the sample was right-handed and 20% left-handed.
Table 2. Study sample characteristics

Note.
* Gray and white matter volumes normalized to correct for total cerebral volume. CBCL = Child Behavior Checklist; Auditory Working Memory and Spatial Relations are subtests of the Woodcock-Johnson III Cognitive Battery. SS = Standard Score.
Correlations Between Neuropsychological and Imaging Variables
Means and standard deviations of parent ratings, neuropsychological measures, and normalized lobar volumes are listed in Table 2. Correlations between neuropsychological variables and normalized regional brain volumes are presented in Table 3 and Figure 1. The BRIEF Working Memory scale was not significantly correlated with WJ-III Auditory Working Memory (same construct, different method), the CBCL Anxious/Depressed scale (different construct, same method), or WJ-III Spatial Relations (different construct, different method). In contrast, after controlling for age, parent ratings on the BRIEF Working Memory scale were significantly correlated with normalized frontal gray matter volume (r = −.46; p = .006), but not with temporal, parietal, or occipital gray matter or with any regional white matter volume, although the correlation between BRIEF Working Memory and occipital white matter approached significance (r = .38; p = .024). The correlation between WJ-III Auditory Working Memory and frontal gray matter volume also approached significance (r = −.37; p = .030); however, neither WJ-III Spatial Relations nor ratings on the CBCL Anxious/Depressed scale were significantly correlated with any of the lobar volumes.

Fig. 1. Scatterplots of correlations between BRIEF Working Memory T-scores and lobar volumes (cm³): (A) frontal gray matter, (B) temporal gray matter, (C) parietal gray matter, and (D) occipital gray matter. Lobar volumes were normalized to correct for differences in overall head size. Frontal gray matter volume shows the strongest correlation with BRIEF Working Memory (r = −.463).
Table 3. Correlations between neuropsychological measures and volumetric MRI

Note.
BRIEF = Behavior Rating Inventory of Executive Function; CBCL = Child Behavior Checklist; WJ-III = Woodcock-Johnson III Auditory Working Memory Standard Score; SS = Standard Score. Lobar volumes are normalized to adjust for total cerebral volume. Rows 1–4 are zero-order correlations; rows 5–12 are partial correlations (controlling for age). The bolded value is significant at p < .01 (two-tailed).
Fisher’s r-to-z transformation was used to statistically compare the magnitude of the significant and nonsignificant correlations between BRIEF Working Memory ratings and lobar volumes. The partial correlation between BRIEF Working Memory and normalized frontal gray matter volume (r = −.46) was significantly larger than the correlations between BRIEF Working Memory and temporal gray (r = .05), frontal white (r = −.04), and parietal white matter volumes (r = .05), but not significantly larger than the correlations between BRIEF Working Memory and parietal gray, occipital gray, temporal white, or occipital white matter volumes. Using the same method, however, the zero-order correlation between BRIEF Working Memory and WJ-III Auditory Working Memory (same construct, different method; r = −.28) was not significantly larger than the correlation between BRIEF Working Memory and either the CBCL Anxious/Depressed scale (different construct, same method; r = .21) or WJ-III Spatial Relations (different construct, different method; r = −.10).
Prediction of Frontal Volumes
An exploratory hierarchical regression analysis was used to examine the added value of the BRIEF Working Memory scale (over and above the two performance-based measures and the non-EF parent rating scale) in predicting frontal lobe gray matter volume. Age and scores from the CBCL Anxious/Depressed scale, WJ-III Auditory Working Memory, and WJ-III Spatial Relations were entered in step 1; the BRIEF Working Memory scale was entered in step 2. Together, the step 1 variables accounted for 30% of the variance in frontal gray matter volume (p = .025); the BRIEF Working Memory scale accounted for an additional 13% of unique variance (p = .017) in step 2.
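The significance of a step 2 increment in hierarchical regression is conventionally tested with an F statistic for the change in R²: F = (ΔR²/Δk) / [(1 − R²_full)/(n − k_full − 1)], where k is the number of predictors. A minimal sketch follows. The function name is hypothetical, no missing data are assumed, and because the step 2 R² (about .43) is reconstructed from the rounded values reported above, the result is approximate.

```python
from typing import NamedTuple


class RSquaredChange(NamedTuple):
    delta_r2: float
    f_change: float
    df1: int
    df2: int


def r_squared_change(r2_reduced: float, r2_full: float, n: int,
                     k_reduced: int, k_full: int) -> RSquaredChange:
    """F test for the R^2 increment when predictors are added in a later step."""
    delta_r2 = r2_full - r2_reduced
    df1 = k_full - k_reduced   # number of predictors added at this step
    df2 = n - k_full - 1       # residual df of the full model
    f_change = (delta_r2 / df1) / ((1.0 - r2_full) / df2)
    return RSquaredChange(delta_r2, f_change, df1, df2)


# Approximate values: step 1 R^2 = .30 with 4 predictors (age + 3 measures);
# step 2 adds BRIEF Working Memory (R^2 ~ .30 + .13); n = 35.
result = r_squared_change(0.30, 0.43, n=35, k_reduced=4, k_full=5)
print(result.df1, result.df2, round(result.f_change, 2))  # 1 29 6.61
# The p-value comes from the upper tail of F(df1, df2),
# e.g. scipy.stats.f.sf(result.f_change, result.df1, result.df2).
```

An F(1, 29) of about 6.6 falls between the .02 and .01 critical values, which is consistent with the reported p = .017 given rounding.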
DISCUSSION
This is one of the first studies to directly examine neuroimaging correlates of parent ratings of executive function in healthy, typically developing children. Results from the present study provide preliminary support for the convergent and discriminant validity of parent ratings of working memory, as measured by the BRIEF. Parent ratings of working memory were significantly correlated with frontal gray matter volume, but not with nonfrontal gray or any white matter volumes. The performance-based measure of working memory showed a similar pattern of correlations with lobar volumes but was not significantly correlated with the BRIEF Working Memory scale, suggesting that the BRIEF captures unique variance in predicting frontal lobe development in children. Parent ratings of working memory were consistently uncorrelated with parent ratings and performance-based measures of non-EF skills, demonstrating strong divergent (discriminant) validity.
The results emphasize the utility of parent ratings in measuring brain–behavior relationships in children and extend previous work linking parent ratings of ADHD symptoms with regional brain volumes (Castellanos et al., 2001; Giedd et al., 1994; Schrimsher et al., 2002). In particular, rating scales (such as the BRIEF) that assess behavioral manifestations of executive function may have ecological validity not only in assessing “everyday life” function, but also as correlates of brain anatomy. These initial findings require replication in patient populations and with other skill areas (EF and non-EF). Nevertheless, given the relative time and cost efficiency of caregiver ratings (compared with performance-based assessments), they may play an important role in screening and as an integral part of comprehensive neuropsychological and developmental evaluations. Consistent with previous studies examining the validity of the BRIEF in predicting neuropsychological test performance (Anderson et al., 2002; Bodnar et al., 2007; Mahone et al., 2002b; Niendam et al., 2007; Vriezen & Pigott, 2002), the BRIEF appears to capture different elements of the EF construct than do performance-based measures. When used together with performance-based tests, the BRIEF may be a better predictor of the integrity of frontal lobe development.
As illustrated with data from the current study, the BRIEF Working Memory scale accounted for an additional 13% of unique variance, over and above that explained by the performance-based EF and non-EF measures.
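A figure such as the 13% above is presumably an incremental R² (ΔR²) from a hierarchical regression: the R² of a model containing the performance-based measures is subtracted from the R² of the same model with the BRIEF rating added. The following is a minimal pure-Python sketch of that computation, assuming ordinary least squares with an intercept. The function names, variable names, and synthetic data are hypothetical illustrations only; they do not reproduce the study's variables, data, or analysis.

```python
"""Hypothetical sketch: incremental variance (Delta R^2) in a hierarchical regression.

Data below are synthetic; they are NOT the study's measures or results.
"""
import random


def ols_r_squared(predictors, y):
    """Return R^2 of an ordinary least-squares fit with an intercept.

    predictors: list of columns, each a list of floats the same length as y.
    Solves the normal equations (X'X) b = X'y by Gauss-Jordan elimination
    (assumes predictors are not perfectly collinear).
    """
    n = len(y)
    # Design matrix rows: an intercept term followed by each predictor value.
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(X[0])
    # Augmented system [X'X | X'y].
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * x for b, x in zip(beta, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def delta_r_squared(base_predictors, added_predictor, y):
    """R^2 gained when added_predictor enters after base_predictors (step 2 minus step 1)."""
    r2_base = ols_r_squared(base_predictors, y)
    r2_full = ols_r_squared(base_predictors + [added_predictor], y)
    return r2_full - r2_base


if __name__ == "__main__":
    random.seed(0)
    n = 60
    perf_wm = [random.gauss(0, 1) for _ in range(n)]  # hypothetical performance-based score
    non_ef = [random.gauss(0, 1) for _ in range(n)]   # hypothetical non-EF score
    rating = [random.gauss(0, 1) for _ in range(n)]   # hypothetical parent rating
    # Hypothetical outcome (e.g., a normalized regional volume) built from all three.
    volume = [0.3 * p + 0.1 * q + 0.5 * r + random.gauss(0, 1)
              for p, q, r in zip(perf_wm, non_ef, rating)]
    print(round(delta_r_squared([perf_wm, non_ef], rating, volume), 3))
```

Because the step-1 model is nested within the step-2 model, ΔR² from an OLS fit cannot be negative; it is the share of outcome variance explained only once the added predictor enters.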
The present findings, considered together with prior research with school-aged children and adolescents, suggest that working memory difficulties may be validly observed by parents, perhaps because working memory overlaps less with overt behavior problems than do other components of executive function (e.g., inhibitory control; Cummings et al., 2002; Mahone et al., 2002c; Mangeot et al., 2002). As children develop, executive dysfunction does not necessarily result from the emergence of atypical behaviors, but rather from the lack of a developmentally appropriate reduction of certain behaviors that may be considered typical at an earlier stage of development (Tarazi et al., 2008). The BRIEF Working Memory scale appears particularly sensitive to this “failure to keep pace” observed among clinical groups during adolescence (Mahone et al., 2002c; Tarazi et al., 2008), and may be important in understanding the anomalous brain development associated with persistent academic, social, and self-care difficulties (Mangeot et al., 2002; Tarazi et al., 2007).
While the observed pattern of correlations between normalized gray matter volumes and neuropsychological measures fit broadly with study hypotheses, the pattern of correlations involving white matter volumes yielded some unexpected results (i.e., the moderate correlation between occipital white matter and BRIEF Working Memory versus the low correlation between BRIEF Working Memory and frontal white matter). These unexpected white matter findings may be a function of the age of participants (5–18 years) relative to the overall period of white matter development (Lenroot & Giedd, 2006). In other words, white matter may be relatively underdeveloped at these ages, and its volume may be less closely linked to frontal lobe functions than gray matter volume. Alternatively, the BRIEF may capture behaviors associated with cortical gray matter development better than it captures behaviors dependent on white matter growth. Both hypotheses should be explored in future studies.
Strengths of the current study include strict inclusion/exclusion criteria, the contrast of co-normed performance-based measures, and the inclusion of anatomic MRI measurement in the analyses. Nevertheless, while the current findings are encouraging, they will require replication with larger samples, a broader range of tests, and clinical groups; with MRI measurement of different brain regions (particularly basal ganglia, cerebellum, and lobar subdivisions); and ultimately with additional MRI technologies (e.g., fMRI, diffusion tensor imaging, spectroscopy). Examination of brain–behavior relationships using the teacher, adolescent self-report, adult, and preschool versions of the BRIEF is also warranted, as is exploration of nonlinear (e.g., quadratic, cubic) brain–behavior associations that can better highlight the trajectory of brain development in children.
The current study emphasized verbal working memory correlates of parent ratings and brain volumes, and the results cannot be assumed to extend to performance-based measures of visual (spatial, object) working memory. Additionally, because right and left lobar volumes were so highly correlated in the current sample, the hemispheres were combined for analyses, which precluded investigation of differential patterns of correlation between verbal working memory and left versus right regional brain volumes. Given the well-established regional specialization and functional dissociation of domain-specific components of working memory (D’Esposito et al., 1999; Mohr et al., 2006; Rama et al., 2004), future research should seek to determine whether BRIEF working memory ratings are differentially correlated with structural or functional MRI differences in frontal subdivisions related to working memory (i.e., dorsolateral and ventrolateral prefrontal cortex, DLPFC and VLPFC), or, more generally, whether BRIEF scales are differentially sensitive to brain regions shown to be anomalous among children with executive dysfunction (e.g., supplementary motor area, orbitofrontal cortex).
ACKNOWLEDGMENTS
A portion of this study was presented at the 34th Annual International Neuropsychological Society Conference, Boston, Massachusetts, February 3, 2006. The authors thank Melissa Matson, Ph.D., for her contribution to an earlier version of this work. Supported by R01 NS04285, HD-24061 (Developmental Disabilities Research Center), and Johns Hopkins University School of Medicine Institute for Clinical and Translational Research, an NIH/NCRR CTSA Program, UL1-RR025005.