Explaining Differences in Episodic Memory Performance among Older African Americans and Whites: The Roles of Factors Related to Cognitive Reserve and Test Bias

Denise C. Fyffe; Shubhabrata Mukherjee; Lisa L. Barnes; Jennifer J. Manly; David A. Bennett; Paul K. Crane

doi:10.1017/S1355617711000476

Published online by Cambridge University Press: 06 May 2011

Denise C. Fyffe*: Affiliation:
Spinal Cord Injury/Outcomes & Assessment Research Laboratory, Kessler Foundation Research Center, West Orange, New Jersey and Physical Medicine and Rehabilitation, New Jersey Medical School University of Medicine and Dentistry of New Jersey, Newark, New Jersey
Shubhabrata Mukherjee: Affiliation:
School of Medicine, University of Washington, Seattle, Washington
Lisa L. Barnes: Affiliation:
Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, Illinois
Jennifer J. Manly: Affiliation:
Taub Institute for Research on Alzheimer's Disease and the Aging Brain, Columbia University Medical Center, New York, New York
David A. Bennett: Affiliation:
Rush Alzheimer's Disease Center, Rush University Medical Center, Chicago, Illinois
Paul K. Crane: Affiliation:
School of Medicine, University of Washington, Seattle, Washington
*: Correspondence and reprint requests to: Denise C. Fyffe, Spinal Cord Injury/Outcomes & Assessment Laboratory, Kessler Foundation Research Center, 1199 Pleasant Valley Way, West Orange, New Jersey 07052. E-mail: dfyffe@kesslerfoundation.org

Abstract

Older African Americans tend to perform poorly in comparison with older Whites on episodic memory tests. Observed group differences may reflect some combination of biological differences, measurement bias, and other confounding factors that differ across groups. Cognitive reserve refers to the hypothesis that factors, such as years of education, cognitive activity, and socioeconomic status, promote brain resilience in the face of pathological threats to brain integrity in late life. Educational quality, measured by reading test performance, has been postulated as an important aspect of cognitive reserve. Previous studies have not concurrently evaluated test bias and other explanations for observed differences between older African Americans and Whites. We combined data from two studies to address this question. We analyzed data from 273 African American and 720 White older adults. We assessed differential item functioning (DIF) using an item response theory/ordinal logistic regression approach. DIF and factors associated with cognitive reserve did not explain the relationship between race and age- and sex-adjusted episodic memory test performance. However, reading level did explain this relationship. The results reinforce the importance of considering education quality, as measured by reading level, when assessing cognition among diverse older adults. (JINS, 2011, 17, 625–638)

Keywords

Mental recall; Ethnic groups; Psychometrics; Cognition; Education; Health status disparities

Type: Special Series
Information: Journal of the International Neuropsychological Society, Volume 17, Issue 4, 21 June 2011, pp. 625–638

DOI: https://doi.org/10.1017/S1355617711000476
Copyright: Copyright © The International Neuropsychological Society 2011

Introduction

Racially disparate outcomes on neuropsychological episodic memory tests have persistently been observed among older adults. Generally, older African Americans demonstrate lower scores on episodic memory tests than Whites (Fillenbaum, Peterson, Welsh-Bohmer, Kukull, & Heyman, 1998; Manly et al., 1998; Masel & Peek, 2009; McDougall, Vaughan, Acee, & Becker, 2007; Schwartz et al., 2004; Whitfield et al., 2000; Zsembik & Peek, 2001). Worse performance may represent poorer episodic memory functioning, measurement problems such as test bias, or a combination. Poor performance among African Americans due to measurement problems could lead to misdiagnosis of memory disorders (Gurland et al., 1999; Weiner, 2008; Whitfield, 2002; Whitfield et al., 2000). Inaccurate assessment and inappropriate diagnoses can have profound negative implications for quality of life, end-of-life decision making, and caregiver support (Dilworth-Anderson, Hendrie, Manly, Khachaturian, & Fazio, 2008; Parker & Philp, 2004).
Previous investigators have identified demographic characteristics including age and sex (Manly et al., 1998; McDougall et al., 2007; Mungas, Reed, Farias, & DeCarli, 2009; Zsembik & Peek, 2001), health conditions including hypertension and cardiovascular disease (Schwartz et al., 2004; Whitfield et al., 2000), and sociocultural variables including education, language, acculturation, and socioeconomic status (Boone, Victor, Wen, Razani, & Ponton, 2007; Manly, Byrd, Touradji, & Stern, 2004) as factors associated with observed score differences across groups.

Stern et al. suggested that educational experiences influence brain development and can be considered a proxy for cognitive reserve (Stern et al., 1994; Stern, 2009). Parental education (Kaplan et al., 2001; Rogers et al., 2009; Singh-Manoux, Richards, & Marmot, 2005), home experiences that stimulate childhood learning (Everson-Rose, Mendes de Leon, Bienias, Wilson, & Evans, 2003), and lifetime engagement in cognitive activities (Scarmeas & Stern, 2003; Wilson, Barnes, & Bennett, 2003; Wilson et al., 2005) are examples of factors found to influence late-life cognitive functioning. These experiences, conceptualized as cognitive reserve in the current manuscript, may preserve cognitive functioning in the face of brain pathology in later life (Jones et al., 2010; Scarmeas & Stern, 2003). The primary goal of this study is to examine factors associated with cognitive reserve concurrently for measurement bias and for their ability to explain differences in episodic memory performance between African Americans and Whites.

The association between education and reserve may be partially mediated by socioeconomic status and education quality (Brunner, 2005; Dotson, Kitner-Triolo, Evans, & Zonderman, 2009; Kaplan et al., 2001; Stern, Albert, Tang, & Tsai, 1999). Higher socioeconomic status may afford opportunities to engage in cognitively stimulating experiences, which may buffer against late life cognitive decline (Stern et al., 1994, 1999; Stern, 2006). Manly, Touradji, Tang, and Stern (2003) and Manly, Schupf, Tang, and Stern (2005) studied education quality as measured by performance on reading tests (Cosentino, Manly, & Mungas, 2007). Low reading levels (i.e., a proxy for poor education quality) were associated with more rapid rates of cognitive decline (Manly et al., 2003, 2005).

Demographic, health, and sociocultural factors that contribute to differential episodic memory ability may represent test bias (Brickman, Cabo, & Manly, 2006; Gasquoine, 2009; Pedraza & Mungas, 2008; Robertson, Liner, & Heaton, 2009; Rosselli & Ardila, 2003). Educational experiences that lead to the acquisition of test-taking strategies can increase “test wiseness” and may inflate test scores (Gasquoine, 2009; Manly, Jacobs, Touradji, Small, & Stern, 2002; Robertson et al., 2009; Rosselli & Ardila, 2003; Scruggs & Lifson, 1985). If test wiseness varies across groups, individuals in different groups with the same underlying level of the ability measured by the test would have unequal expected scores, which is a definition of differential item functioning (DIF) (Camilli & Shepard, 1994; Thissen, Steinberg, & Wainer, 1993).
Other factors representing test bias include reaction to test content (e.g., familiarity, interest) (Brickman et al., 2006; Flaugher, 1978; Stricker & Emmerich, 1999; Teng & Manly, 2005) and cultural factors including stereotype threat, language, or unrepresentative norms (Brickman et al., 2006; Gasquoine, 2009; Kit, Tuokko, & Mateer, 2008; Loewenstein, Arguelles, Arguelles, & Linn-Fuentes, 1994; Manly et al., 2002; Manly, 2008; Teng & Manly, 2005; Whitfield, 2002).

Meaningful comparisons of performance across groups necessitate attention to measurement equivalence (Teresi, Kleinman, & Ocepek-Welikson, 2000; Teresi, Stewart, Morales, & Stahl, 2006; Tuokko et al., 2009). Several researchers have applied DIF methodology to assess relationships between characteristics associated with test bias and performance on neuropsychological tests among racially diverse older adults (Crane, van Belle, & Larson, 2004; Crane et al., 2008; Jones, 2003; Pedraza et al., 2009; Ramirez, Teresi, Holmes, Gurland, & Lantigua, 2006; Teresi, Holmes, Ramirez, Gurland, & Lantigua, 2001; Teresi et al., 1995). Much of this previous work has found substantial DIF in global measures of cognition, such as the Mini-Mental State Examination (MMSE) (Crane, Gibbons, Jolley, & van Belle, 2006; Dorans & Kulick, 2006; Jones, 2006; Morales, Flowers, Gutierrez, Kleinman, & Teresi, 2006; Ramirez et al., 2006) or the Cognitive Abilities Screening Instrument (CASI) (Crane et al., 2004; Gibbons et al., 2009).
DIF has also been observed in specific cognitive domains, such as visual naming ability (Pedraza et al., 2009), fluency, and working memory (Crane et al., 2008). To our knowledge, this is the first study to examine DIF in African Americans and Whites on a measure of episodic memory.

DIF analyses determine whether individual characteristics exaggerate or attenuate the probability of successful responses to episodic memory items, given a particular level of episodic memory functioning. DIF analyses often focus on item-level findings. Crane, Gibbons, Narasimhalu, Lai, and Cella (2007) and Crane, Gibbons, Ocepek-Welikson, et al. (2007) suggest there may be different audiences for DIF analyses. Scale developers may be most interested in item-level findings. Clinicians may be primarily interested in individual-level DIF impact. Social scientists may be primarily interested in group-level DIF impact, which addresses the question, “Is it likely that DIF might impact mean scores for groups or relationships between covariates of interest?” (Crane, Gibbons, Ocepek-Welikson, et al., 2007; Crane, Gibbons, Narasimhalu, et al., 2007). In this study, we are primarily interested in group-level DIF impact. One research question being posed is: Does DIF impact the relationships between factors associated with reserve and episodic memory functioning across African American and White older adults?

Figure 1 depicts theorized relationships evaluated in this study. Observed variables (performance on episodic memory tests, demographics, indicators associated with reserve) are depicted in rectangles, while the unobserved factor (actual episodic memory functioning) is in an oval. The prior work of Manly et al. (2002, 2003, 2005) suggested that educational experiences were particularly important. Because these investigators did not test for DIF, its possible importance as an explanatory factor is unknown. In the current study we directly tested for DIF and depict DIF in a dashed box in Figure 1. The dashed box indicates that DIF is usually ignored but is included in the present study. Thus, the goal of this study is to better understand relationships between memory performance and demographic and cognitive reserve covariates, while accounting for DIF.

Fig. 1 Theoretical model of relationships between demographics, cognitive reserve and measurement bias on observed performance on episodic memory tests. Observed variables (i.e., performance on neuropsychological episodic memory tests, demographic characteristics, and indicators associated with cognitive reserve) are depicted in rectangles, while the unobserved factor (actual episodic memory functioning) is depicted in an oval. Observed episodic memory scores from neuropsychological tests (the box to the right of the figure) have only two inputs: episodic memory functioning itself (the oval at the left) and DIF. Demographics and cognitive reserve indirectly influence assessment performance through episodic memory functioning. DIF is depicted as a dashed box. DIF analyses allow us to assess whether measurement bias may be responsible for differences across groups in observed episodic memory scores. Any effect of demographics or cognitive reserve on observed performance that is not due to actual episodic memory functioning is depicted as a DIF effect.

Method

Participants

Study participants were identified from the Memory and Aging Project (MAP) and the Minority Aging Research Study (MARS) conducted by the Rush Alzheimer's Disease Center. MAP and MARS are ongoing longitudinal cohort studies among community-dwelling older adults in Chicago. MAP began enrollment in 1997 (Bennett et al., 2005). Consenting participants agreed to detailed annual evaluations, cognitive testing, and postmortem organ donation. MARS has a nearly identical design and began enrollment of African Americans in 2004. By April 2010, MAP included 1304 participants, and MARS 349. Recruitment strategies were so similar that a few African Americans are enrolled in both studies.

We evaluated baseline data from self-identified African Americans or Whites who were free of dementia, and had complete episodic memory and cognitive reserve data. The data from these studies were obtained in compliance with Rush's Institutional Review Board regulations.

Clinical Evaluations

Participants completed clinical evaluations including medical history, neurological examination, and neuropsychological assessment (Arvanitakis, Bennett, Wilson, & Barnes, 2010; Bennett et al., 2005). A clinician used clinical data and standard criteria to classify dementia and Alzheimer's disease (McKhann et al., 1984).

Neuropsychological Evaluations

Participants completed a 19-test battery assessing five cognitive domains. We evaluated episodic memory tests common across MAP and MARS. (a) Story recall (4 scores). Logical Memory Story A (Wechsler, 1987) is a fact-dense textual passage read aloud once; the participant is asked to recall elements immediately and after a delay. The East Boston Memory Test (Albert et al., 1991) is similar, and includes scores for immediate and delayed recall. (b) Word list (3 scores). The 10-word CERAD list (Morris et al., 1989) was administered in three learning trials that are summed (range, 0–30). After a distracter task, the participant is asked to recall the words (range, 0–10). Participants are then presented with ten trials of four words, and asked to identify the one on the CERAD list (range, 0–10).

Cognitive reserve

Cognitive reserve indicators included: years of personal, maternal, and paternal education, childhood cognitive activity frequency, income at age 40, and education quality, as measured by reading level (see below). We initially categorized self-reported personal years of education as (1) some primary (<grade 8); (2) primary (completed grade 8); (3) some high school (9–11); (4) high school (completed grade 12); or (5) post-secondary (13 or greater). For DIF analyses, we categorized education as <12 and ≥12 years to ensure adequate analytic sample sizes.

We calculated childhood cognitive activity from self-reported activities at ages 6 and 12. Participants were asked how often someone read to them, told them stories, or played games with them (age 6) and how often they read books and magazines or went to the library (age 12); response options ranged from less than once a year (1 point) to almost every day (5 points) and composite scores were obtained by averaging across the five items (Wilson et al., 2003). The scale has demonstrated adequate psychometric properties (Cronbach's α = 0.88; test–retest reliability of r = 0.79) in studies with older adults (Barnes, Wilson, de Leon, & Bennett, 2006; Wilson et al., 2005). We dichotomized average scores at ≤3 and >3 activities to ensure adequate analytic sample sizes.
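The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the study's own code: the function names and the example ratings are hypothetical, and the only details taken from the text are the 1-to-5 response scale, the averaging across five items, and the ≤3 versus >3 split.

```python
from statistics import mean

def childhood_activity_composite(item_scores):
    """Average five item ratings, each scored from 1 (less than once a year)
    to 5 (almost every day)."""
    if len(item_scores) != 5:
        raise ValueError("expected five item ratings")
    if any(not 1 <= s <= 5 for s in item_scores):
        raise ValueError("ratings must lie between 1 and 5")
    return mean(item_scores)

def is_high_activity(composite, cutoff=3):
    """Dichotomize: composites above the cutoff are 'high'; those at or below are 'low'."""
    return composite > cutoff

# Hypothetical ratings for the five items (not study data).
ratings = [4, 3, 5, 2, 4]
score = childhood_activity_composite(ratings)
print(score, is_high_activity(score))
```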

Income at age 40 was reported in one of six categories defined by a range of dollar amounts. We compared participant responses to the median U.S. family income for the appropriate year (United States Census Bureau, 2010). We categorized income as below or above median income at age 40.

Reading level was measured by reading tests. MAP participants were administered the National Adult Reading Test (NART) (Nelson, 1982), while MARS participants were administered the third edition of the Wide Range Achievement Test Reading subtest (WRAT-3) (Wilkinson, 1993). For each test, participants read aloud words of increasing complexity; correct pronunciation is required to obtain a point.

We analyzed NART and WRAT-3 data from the 10 individuals enrolled in both studies to co-calibrate this variable. We identified 23 data points where those individuals were evaluated by the two tests at least two times within a 6-month window. For those 23 occasions, we examined a scatterplot (Appendix 1) that confirmed Z scores on the two tests appeared to be roughly linearly related to each other. We identified the median Z score on the WRAT-3 and the median Z score on the NART for these individuals, and used those Z scores to categorize reading levels from the parent studies.

Data Analysis

Overview

We derived three different composite scores from the seven episodic memory test data points: a composite Z-score, an IRT score that ignored DIF (a “naive” score), and an IRT score that accounted for DIF with respect to all of the covariates. We performed linear regression analyses using standardized composite scores as dependent variables and race as the primary predictor. We included demographic factors, and factors associated with cognitive reserve, paying particular attention to reading level. We performed a series of sensitivity analyses to assess the robustness of our findings.

Composite Z score

We created the composite measure of episodic memory by converting raw scores on each test to Z scores using the baseline MAP mean and standard deviation. We averaged these Z scores (Wilson et al., 2003, 2005).
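As a minimal sketch of this calculation, the following Python function standardizes each raw score against reference values and averages the results. The test names, reference means, and standard deviations are hypothetical placeholders, not the baseline MAP values, and only three of the seven scores are shown for brevity.

```python
from statistics import mean

def composite_z(raw_scores, baseline_means, baseline_sds):
    """Average of per-test Z scores, standardized against baseline reference values."""
    z = [(raw_scores[t] - baseline_means[t]) / baseline_sds[t] for t in raw_scores]
    return mean(z)

# Hypothetical reference values and raw scores (not study data).
means = {"word_list_learning": 18.0, "word_list_recall": 6.0, "word_list_recognition": 9.0}
sds = {"word_list_learning": 4.0, "word_list_recall": 2.0, "word_list_recognition": 1.0}
raw = {"word_list_learning": 22, "word_list_recall": 5, "word_list_recognition": 9}

print(composite_z(raw, means, sds))
```

Because every test contributes equally to the average, this composite weights the seven scores identically, which is one way it differs from the IRT scores described next.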

Dimensionality

Both the naive IRT score and the IRT score accounting for multiple sources of DIF rely on an assumption of unidimensionality, that is, that the items can be conceptualized as measuring a single underlying construct. There is no single standard approach for determining whether a scale is sufficiently unidimensional. We used exploratory and confirmatory factor analyses.

Naive IRT scores

We obtained naive IRT scores with Parscale (Muraki & Bock, 2003), using Samejima's graded response model (Samejima, 1969) and expected a posteriori (EAP) scoring. The graded response model is a polytomous extension of the two-parameter logistic model (2PL) (Lord & Novick, 1968).
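To make the graded response model concrete, the sketch below computes category probabilities for a single item in its standard textbook form: each boundary between adjacent categories follows a 2PL curve, and a category's probability is the difference between adjacent cumulative curves. This is not Parscale code; the function name and parameter values are hypothetical, and any scaling constant that a particular program applies to the logit is omitted.

```python
import math

def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a graded response model.

    theta: latent ability; a: item discrimination;
    thresholds: increasing boundary locations b_1 < ... < b_{K-1}.
    Returns probabilities for the K ordered categories 0..K-1.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be increasing")
    # Cumulative probabilities P(Y >= k), padded with 1 (k = 0) and 0 (k = K).
    cum = [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds] + [0.0]
    return [cum[k] - cum[k + 1] for k in range(len(cum) - 1)]

# Hypothetical item with four ordered categories.
probs = grm_category_probs(theta=0.0, a=1.5, thresholds=[-1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

With a single threshold the function reduces to the 2PL model, which is the sense in which the graded response model extends it.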

IRT scores that accounted for all forms of DIF

We used a hybrid ordinal logistic regression/IRT approach to identify and account for DIF, using difwithpar software (Crane et al., 2006). We analyzed several covariates for DIF: self-reported race, sex, education, age, father's education, mother's education, childhood cognitive activities, income at age 40, and reading level. We were primarily interested in accounting for all sources of DIF. Detailed methods have been published previously (Crane et al., 2006, 2008).

Regression analyses

All regression models included an indicator term for race. We transformed each episodic memory composite score to have a mean of 0 and standard deviation of 1. We performed a series of regression analyses with the composite episodic memory scores as dependent variables: (1) Base: race; (2) Demographics: race plus demographics (sex and age); (3) Demographics and cognitive reserve except reading level: model 2 plus cognitive reserve factors other than reading level (years of education, father's education, mother's education, childhood cognitive activities, and income at age 40); (4) Demographics and cognitive reserve including reading level: model 3 plus reading level.
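The nested structure of the four models can be written down explicitly. The sketch below only lists the predictor sets and shows the standardization step; it does not fit any model. The variable names are hypothetical labels for the covariates named in the text, and the use of the population standard deviation is an assumption, since the text does not say which form was used.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Rescale scores to mean 0 and standard deviation 1 (population SD assumed)."""
    m, s = mean(scores), pstdev(scores)
    return [(x - m) / s for x in scores]

DEMOGRAPHICS = ["sex", "age"]
RESERVE_EXCEPT_READING = [
    "education", "father_education", "mother_education",
    "childhood_cognitive_activity", "income_at_40",
]

# Each model adds predictors to the one before it; race is always included.
MODELS = {
    1: ["race"],
    2: ["race"] + DEMOGRAPHICS,
    3: ["race"] + DEMOGRAPHICS + RESERVE_EXCEPT_READING,
    4: ["race"] + DEMOGRAPHICS + RESERVE_EXCEPT_READING + ["reading_level"],
}

for number, predictors in MODELS.items():
    print(number, predictors)
```

Because the dependent variable is standardized, the race coefficient in each model can be read in standard deviation units of the composite score.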

Sensitivity analyses

We performed several sensitivity analyses to determine whether assumptions made in our modeling affected our conclusions. We repeated DIF analyses related to race using Multiple Indicator Multiple Cause (MIMIC) modeling. These analyses were performed in two ways, using (1) a single factor model (analogous to the IRT approach used in the primary analysis); and, (2) a bi-factor model that does not rely on the assumption of unidimensionality.

We assessed multicollinearity between the covariates. We matched African Americans to Whites of similar age and education and the same sex, and repeated the regression analyses to control for cohort effects. We performed regression analyses with age, education, and childhood cognitive activity as continuous variables. The scores we used to co-calibrate the reading tests may lead to misclassifying high or low reading levels (Appendix 1). To ensure that misclassification of reading level was not driving the results, we performed a secondary analysis in which we omitted people whose reading scores were within 0.25 SD of the cutoff values, that is, people whose reading levels were most likely to be misclassified.
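The exclusion rule for the reading-level sensitivity analysis can be sketched as follows. The cutoff Z scores are the thresholds reported in the footnote to Table 1 (0.48 for the WRAT-3 and −0.94 for the NART). Everything else is an assumption made for illustration: the function names, treating the 0.25 SD margin as inclusive, and classifying a score exactly at the cutoff as high.

```python
# Z-score thresholds reported in the Table 1 footnote.
CUTOFFS = {"WRAT-3": 0.48, "NART": -0.94}

def reading_level(z, test):
    """Classify a reading Z score as 'high' or 'low' (a score at the cutoff counts as high here)."""
    return "high" if z >= CUTOFFS[test] else "low"

def near_cutoff(z, test, margin=0.25):
    """True when a score lies within the margin (in SD units) of its test's cutoff."""
    return abs(z - CUTOFFS[test]) <= margin

# Hypothetical participants: (reading Z score, test administered).
participants = [(0.60, "WRAT-3"), (1.40, "WRAT-3"), (-0.50, "NART"), (-1.00, "NART")]
retained = [(z, t) for z, t in participants if not near_cutoff(z, t)]
print(retained)
```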

We performed additional analyses to determine whether the reading level effect was unique, or whether using another cognitive test would have the same effect. We compared correlations between reading scores and Digit Span Forward, Digit Span Backward (Wechsler Memory Scale-Revised) (Wechsler, 1987), and Digit Ordering (Cooper & Sagar, 1993; Wilson et al., 2002). We used Digit Ordering, the test that had the lowest correlations with reading scores, to avoid confounding the domains. We dichotomized Digit Ordering so that the proportions classified as high or low were similar to those for reading level. We then repeated the final regression model replacing reading level with Digit Ordering.

Results

Demographics and Episodic Memory Scores

Data were available from 1644 participants. We performed our primary analyses on the 993 participants with complete data, including 273 African Americans and 720 Whites. Some participants who were included in the data set also self-identified as Hispanic: 5 (2%) of the African Americans and 77 (11%) of the Whites. Figure 2 provides an outline of the sample derivation. There were 83 participants excluded due to a diagnosis of Alzheimer's disease or other dementia and 12 participants excluded because they self-identified in a racial group other than African American or White. An additional 556 participants were excluded because they had missing data. Missing data were especially prevalent for three reserve indicators: mother's and father's education and income at 40. The demographic and episodic memory characteristics remained the same when we included participants with missing data. We also compared results from the 993 people with complete data on all covariates to results from the 1421 people with data on all covariates other than mother's education, father's education, and income at age 40, and all regression coefficients were within a few hundredths of each other.

Fig. 2 The derivation of study participants from the Memory and Aging Project (MAP) and Minority Aging Research Study (MARS) databases.
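The sample derivation can be checked with simple arithmetic. All counts below are taken from the text; the variable names are ours.

```python
total_available = 1644
exclusions = {
    "dementia_diagnosis": 83,
    "other_racial_group": 12,
    "missing_data": 556,
}
analytic_n = total_available - sum(exclusions.values())

groups = {"African American": 273, "White": 720}

# The exclusions and the two group sizes should both arrive at the analytic sample.
print(analytic_n, sum(groups.values()))
print(round(100 * groups["White"] / analytic_n))  # percent White in the analytic sample
```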

The 993 participants in our primary analyses had a mean age of 77.8 years (SD = 7.6) and a mean of 14.8 years of education (SD = 3.3); 71% were women and 73% were White. Further demographic details are provided in Table 1. On average African Americans were younger and had more years of formal schooling than Whites, had approximately the same levels of parental education and income at age 40, had higher childhood cognitive activity scores, and had lower reading levels. Mean scores for African Americans and Whites for the individual episodic memory tests are shown in Table 1. The tests used to make episodic memory scores demonstrated adequate reliability (α = 0.81) and bivariate correlations ranging from 0.23 to 0.85.

Table 1 Demographics, cognitive reserve, and episodic memory characteristics of sample stratified by race

^aSee the Methods section for details on calculation of the childhood cognitive activity score.

^bIncome at age 40 was reported in categories of dollars. We dichotomized this variable by looking at the median family income in the U.S. for the year in which the participant was 40. See methods section for details.

^cReading level was obtained from the WRAT-3 and NART in the MARS and MAP studies, respectively. As detailed in the methods section, we analyzed data from participants in both studies to identify threshold values for the two tests that could be considered to be equivalent. The values shown in this table represent the numbers of individuals above and below those thresholds, which were a Z score of 0.48 for the WRAT-3 in MARS and a Z score of −0.94 for the NART in MAP.

IRT and DIF Analyses

We calculated three composite episodic memory scores, which were highly correlated. The two IRT scores were more closely correlated with each other (r = 0.998) than with the composite Z score (r = 0.913 for the naive IRT score and r = 0.900 for the IRT score accounting for DIF). Results from exploratory and confirmatory factor analyses indicated that the episodic memory indicators were sufficiently unidimensional for use of IRT. Only a single eigenvalue was above 1, and the second factor had a negligible eigenvalue. A single-factor model did not fit well, so we fit a bi-factor model in which the three word list items formed a secondary factor and in which we allowed for residual correlation between the two Logical Memory items and similarly for the two East Boston items. This model fit well. Factor loadings between the single-factor model and the bi-factor model were very similar, and all of the loadings on the general factor in the bi-factor model were >0.30, which McDonald suggests is evidence of sufficient unidimensionality (McDonald, 1999).

The DIF analyses considered nine covariates: race, age, sex, education, income at age 40, early life cognitive activities, mother's education, father's education, and reading level. The difference between the IRT score accounting for all nine sources of DIF and the naive IRT score represents individual-level DIF impact. When DIF has a negligible impact, the difference will be close to zero. If DIF makes a big impact, this difference will be large. We compared differences to the median standard error of measurement for IRT scores in this data set, which was 0.3. Accounting for all sources of DIF led to changes larger than 0.3 for only six participants (<1%), which suggested the overall individual level DIF impact was negligible.
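The individual-level comparison described above amounts to differencing two scores per participant and counting how many differences exceed the median standard error of measurement. The sketch below illustrates this with hypothetical scores; only the 0.3 threshold comes from the text, and the function name is ours.

```python
def dif_impact(naive_scores, adjusted_scores, threshold=0.3):
    """Return per-person differences (adjusted minus naive) and the indices of
    participants whose absolute difference exceeds the threshold."""
    diffs = [adj - naive for naive, adj in zip(naive_scores, adjusted_scores)]
    flagged = [i for i, d in enumerate(diffs) if abs(d) > threshold]
    return diffs, flagged

# Hypothetical scores for four participants (not study data).
naive = [0.10, -0.50, 1.20, 0.00]
adjusted = [0.15, -0.90, 1.25, 0.02]

diffs, flagged = dif_impact(naive, adjusted)
print(flagged)
```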

We compared scores for African Americans and Whites when accounting for and ignoring DIF. The mean (SD) naive score for African Americans was −0.005 (0.88), and for Whites it was +0.002 (1.04), a difference of 0.007. The mean (SD) score accounting for DIF for African Americans was −0.036 (0.89), and for Whites it was +0.014 (1.04), a difference of 0.050. Ignoring DIF thus very modestly attenuated differences in mean episodic memory scores between Whites and African Americans.

Factors Associated With Episodic Memory Scores

Regression results are shown in Table 2. The cells show values for regression coefficients for each model. The four sections show results obtained from models with: (1) race only; (2) race and demographics; (3) race, demographics and measures of cognitive reserve except reading level; and (4) race, demographics, and all measures of cognitive reserve including reading level. The three columns show results for the three different dependent variables (naive IRT score, IRT score accounting for all sources of DIF, and composite Z-score) used for the regression models.

Table 2 Results of regression models across Episodic Memory Scores

Our primary focus in these analyses was on the coefficients associated with race, shown in the top row of each section of Table 2. The intercept term provides an estimate of the adjusted mean for the reference group, while the coefficient for race provides an estimate of the adjusted mean difference between African Americans and Whites. In unadjusted models, mean episodic memory scores were not different across race groups in our sample (Model 1). When we accounted for demographic differences across race groups by including age and sex, African Americans on average did worse than Whites (Model 2). These findings were consistent across the three composite episodic memory scores. We entered age and sex separately in the models and confirmed our suspicion that this effect was attributable to age. The third section in Table 2 summarizes regression findings from models that included race, demographics, and measures of cognitive reserve other than reading level (Model 3). The coefficient for race was not affected by including these factors in the model, suggesting that differences across racial groups in age- and sex-adjusted episodic memory performance were not due to these factors. Again, findings were very similar for the three dependent variables.
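The reading of the intercept and the race coefficient given above can be checked for the unadjusted case (Model 1) with a small sketch. It assumes a 0/1 group indicator with 0 marking the reference group; which group served as the reference is not stated in this section, and the scores below are invented for illustration. The function name `fit_binary_predictor` is hypothetical.

```python
def fit_binary_predictor(scores, indicator):
    """Ordinary least squares for score = intercept + slope * indicator,
    where indicator is 0 (reference group) or 1 (comparison group)."""
    n = len(scores)
    mean_x = sum(indicator) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in indicator)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(indicator, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented episodic memory scores for seven participants.
scores = [0.4, -0.2, 0.1, 0.5, -0.3, 0.0, -0.6]
group = [0, 0, 0, 0, 1, 1, 1]  # 0 = reference group, 1 = comparison group

intercept, slope = fit_binary_predictor(scores, group)

# Group means computed directly, for comparison with the fitted terms.
ref_mean = sum(s for s, g in zip(scores, group) if g == 0) / group.count(0)
cmp_mean = sum(s for s, g in zip(scores, group) if g == 1) / group.count(1)
print(intercept, ref_mean)
print(slope, cmp_mean - ref_mean)
```

With no other covariates, the intercept equals the reference group mean and the slope equals the difference in group means. Once age, sex, and the reserve measures are added (Models 2 to 4), the same terms become adjusted quantities, which this sketch does not cover.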

The fourth section in Table 2 summarizes findings from the full model including reading level. Adding reading level to Model 3 caused the coefficient associated with race to become nonsignificant, suggesting that reading level explained the differences across race groups in age- and sex-adjusted episodic memory scores. These results were consistent across the different composite episodic memory scores.

Sensitivity Analyses

There is a range of methods to detect and account for DIF (Millsap & Everson, 1993; Teresi, 2006) that might yield different results. Using single factor or bi-factor multiple indicators, multiple causes (MIMIC) models, we found results for race similar to those we report for the IRT approach. The consistency of findings across the two approaches (MIMIC vs. IRT) is reassuring, as is the consistency of findings when we relaxed the single factor assumption (single factor vs. bi-factor MIMIC).

We did not detect any multicollinearity. We assessed the variance inflation factors (VIFs) for models with age dichotomized and for models with age centered and treated as continuous; all of the VIFs were less than 4. We matched participants on age, years of education, and sex to derive a sample of 546. We repeated our regression models in this matched data set and confirmed our main findings observed in Model 4 of Table 2 (see Table 3).
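The multicollinearity check described above can be illustrated with a short, dependency-free sketch. It is not the authors' code. It uses the standard identity that the VIF of predictor j, 1 / (1 − R²_j), equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The helper names (`pearson`, `invert`, `variance_inflation_factors`) and the data are hypothetical; only the cut-off of 4 comes from the text.

```python
def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def invert(matrix):
    """Gauss-Jordan matrix inverse with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """Return one VIF per predictor column."""
    k = len(columns)
    corr = [[1.0 if i == j else pearson(columns[i], columns[j])
             for j in range(k)] for i in range(k)]
    inverse = invert(corr)
    return [inverse[j][j] for j in range(k)]


# Invented predictor values for eight participants.
age = [70, 75, 80, 85, 90, 72, 78, 88]
education = [16, 12, 14, 10, 12, 18, 13, 11]
female = [1, 0, 1, 1, 0, 0, 1, 0]

vifs = variance_inflation_factors([age, education, female])
print(vifs)
print(all(v < 4 for v in vifs))  # the rule of thumb applied in the text
```

A VIF of 1 means a predictor is uncorrelated with the others; larger values indicate that its coefficient's variance is inflated by overlap with the remaining predictors.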

Table 3 Regression results for the matched analyses (n = 546)

Note. Overall Regression model results based on participants matched on sex, age and years of education (n = 546). Findings are essentially identical to regression results from the whole sample shown in Table 2.

We performed additional regression analyses on the entire sample in which we treated age, years of education and childhood cognitive activity as continuous variables. Findings were essentially the same as our primary analyses.

We repeated analyses after excluding people whose reading test scores were close to the cutoff values. Using this approach, 67 participants were excluded from MARS and 48 from MAP. Results were very similar to those from the whole sample (Appendix 2).
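The exclusion rule used in this sensitivity analysis can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the cut-points are passed in as parameters, the example values are those reported in the table footnote (Z = 0.48 for the WRAT-3 in MARS and Z = −0.94 for the NART in MAP), the 0.25 margin is taken from the Appendix 2 title, and treating "within 0.25" as a strict inequality is an assumption. The function name and participant records are hypothetical.

```python
# Study-specific Z-score cut-points (values as reported in the table footnote).
CUTOFFS = {"MARS": 0.48, "MAP": -0.94}
# Half-width of the exclusion band around each cut-point (Appendix 2 title).
MARGIN = 0.25


def near_cutoff(study, z_score, cutoffs=CUTOFFS, margin=MARGIN):
    """True if a reading Z score lies within `margin` of its study's cut-point.

    Assumption: the comparison is strict (distance < margin).
    """
    return abs(z_score - cutoffs[study]) < margin


# Invented (study, reading Z score) records for six participants.
participants = [
    ("MARS", 0.50), ("MARS", 1.30), ("MARS", -0.40),
    ("MAP", -1.00), ("MAP", 0.20), ("MAP", -2.10),
]

kept = [(study, z) for study, z in participants if not near_cutoff(study, z)]
excluded = [(study, z) for study, z in participants if near_cutoff(study, z)]
print(len(kept), len(excluded))
```

Applied to the study sample, this kind of filter removed 67 MARS and 48 MAP participants, leaving 993 − 67 − 48 = 878 for the repeated regressions.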

We repeated the analyses of Model 4, substituting Digit Ordering for reading level. The coefficient for race in the model of the IRT score accounting for all forms of DIF was −0.16 (p = .04), in the model of the naive IRT scores it was −0.16 (p = .03), and in the model with composite Z-scores it was −0.10 (p = .18). These results suggest the ability of reading level to account for the effect of race on episodic memory is specific to reading level, because using a cognitive domain minimally correlated with reading level did not remove the effect of race.

Discussion

The goal of this study was to investigate several possible explanations for lower episodic memory test scores among older African Americans compared to older Whites. Measurement bias, as identified by DIF analyses, did not explain differences across race in age- and sex-adjusted episodic memory scores. Several variables used as proxies for reserve did not explain these differences. However, we confirmed the findings of Manly and colleagues (2002, 2003, 2005) that education quality, as measured by reading level, explained differences in age- and sex-adjusted scores between African Americans and Whites. This finding appears to be unique to reading level, as a measure of attention (Digit Ordering) did not have the same effect.

An important strength of this study is the evaluation of DIF. DIF analyses are common in educational testing, but still rare in neuropsychology. Without specific analyses, it is impossible to determine whether observed score differences across groups may be due to measurement bias or true group differences. We found that DIF was not responsible for differences in episodic memory test scores between African Americans and Whites. This finding is in contrast to DIF studies in other cognitive domains (Crane et al., 2008; Pedraza et al., 2009).

We used a hybrid IRT/OLR approach to DIF detection. There is a range of methods to detect and account for DIF (Millsap & Everson, 1993; Teresi, 2006) that might yield different results. We found similar results for race using a different DIF detection technique. The IRT approach used here relies on the assumption of unidimensionality. Methods for DIF assessment when this assumption is violated are not readily available, especially when the goal is to account for DIF with respect to a large number of covariates. We found the same item identified with DIF for race when we used single factor or bi-factor MIMIC models for episodic memory, suggesting that ignoring the bi-factor structure did not materially affect our DIF findings.

African Americans tend to score lower on episodic memory tests than Whites of similar age, but the differences are often due to differences in education, occupation, or income (Dotson et al., 2009; Manly et al., 1998; McDougall et al., 2007; Mungas et al., 2009; Zsembik & Peek, 2001). In the current study, mean scores for some memory tests were actually higher for African Americans than Whites (Table 1), but African Americans were younger on average (Table 1). Indeed, in unadjusted analyses (Model 1 in Table 2), composite episodic memory scores did not differ across race. In adjusted analyses, African Americans had poorer age- and sex-adjusted episodic memory scores (Model 2 in Table 2). In our study, reserve factors other than reading level did not explain differences across race groups in age-adjusted episodic memory scores (Model 3 in Table 2). Reading level itself did explain differences across race groups in age-adjusted episodic memory scores (Model 4 in Table 2). This effect was specific to reading level, as the race effect was still present in models that excluded reading level but included Digit Ordering.

Prior research has identified reading level as a proxy of educational quality associated with cognitive decline (Manly et al., 2002). This factor has been identified as particularly important to comparisons of neuropsychological testing results across groups of elders characterized by diverse languages and ethnic backgrounds (Cosentino et al., 2007; Manly et al., 2002). Two tests of reading level were used in the analyses: WRAT-3 and NART. MARS selected the WRAT-3 due to concerns about floor effects for the NART among minority elders. A cross-validation study found the WRAT-3 and NART to be comparable measures of premorbid intelligence (Johnstone, Callahan, Kapila, & Bouman, 1996). We are unaware of any formulas or other means of translating between the two measures. While other statistical methods (e.g., Bland-Altman plots) might prove useful to compare these tests, our sample size of 10 individuals with data from both tests was insufficient for these methods. Our categorization into high and low reading levels might be considered somewhat crude, with the distinct possibility of misclassification. The fact that this crude variable explained differences in age- and sex-adjusted episodic memory scores, while a series of other factors associated with reserve did not explain these differences, is remarkable. Our results remained unchanged when we omitted people with reading scores close to the cutoff used to distinguish between high and low scores, increasing our confidence in our findings. Results of additional sensitivity analyses in which we substituted Digit Ordering for reading level further buttress the impressive nature of this finding. There was no misclassification for Digit Ordering (the same test was used in both studies), but it did not explain age- and sex-adjusted episodic memory score differences between African Americans and Whites.

As noted above, the WRAT and NART have been conceptualized as measures of reading level indicating educational quality (as we have done here) and also as measures of premorbid intelligence (Johnstone, Callahan, Kapila, & Bouman, 1996). We have used these tests in models that have already adjusted for years of education, parental education, income at midlife, and childhood cognitive activities, all factors likely also associated with intelligence but none of which explained racial differences in episodic memory scores. Furthermore, the ability of reading level to explain racial differences was unique, as Digit Ordering did not explain racial differences in episodic memory scores. Digit Ordering is also correlated with intelligence (Luo, Chen, Zen, & Murray, 2010). While we cannot rule out the possibility that intellectual ability rather than educational quality explains differences across race in episodic memory scores, our analyses suggest that reading test scores alone, and not the other factors considered here, are able to explain these differences, suggesting that there is something unique about reading test scores not shared by these other factors.

As in any observational study, residual and/or unmeasured confounding variables may explain our findings. Unmeasured confounders (i.e., those not included in the current study) might include environmental factors (e.g., pollutants) and genetic differences. The complexities of race and culture are also unmeasured factors that may influence the performance of ethnically diverse older adults on neuropsychological tests. Aspects of culture such as acculturation contribute to older adults’ performances on episodic memory tests (Manly et al., 2004).

We used somewhat crude dichotomous indicators of each factor associated with cognitive reserve in our DIF assessments and in our regression models, which raises the possibility of residual confounding. For example, based on responses to the question regarding income at age 40, we dichotomized participants into those with incomes below the median family income in the year they were 40 versus those at or above the median income. It is possible that levels of wealth well over the poverty line confer no more brain protection than more modest levels of wealth, while levels of wealth close to or below the poverty line may be more linearly related to brain insults. By dichotomizing these variables, we necessarily group together individuals who may nevertheless vary in risk. When we treated the variables as continuous, our results were unchanged.

The generalizability of the results may be limited by the geographic location of the study population, the specific inclusion criteria used for the two studies, and the focus on African Americans and Whites. Furthermore, generalizability to other older African Americans may be limited by the relatively high education level in the current sample. Recall bias could possibly impact the measurement of some of our covariates such as income at age 40, childhood cognitive activity, and educational experience, though we do not expect this bias to be different across race groups. Analyses in which we matched on sex, age and years of education did not substantially change our findings. That result suggests that multiple linear regression is an adequate approach to determine the effect of race on episodic memory performance.

These results may also be limited to the specific cognitive domain, episodic memory, examined and the neuropsychological tests used to measure this domain. Indeed, Crane et al. (2008) found DIF was more important in explaining differences across race/ethnic groups for a fluency and working memory composite. The cross-sectional analyses we performed did not allow us to comment on rates of decline of episodic memory functioning over time. Thus, we cannot comment on whether rates of decline may differ by race, or whether any such difference may be due to DIF, demographic factors, or factors associated with reserve.

In conclusion, we found that, on average, older African Americans had lower age- and sex-adjusted mean episodic memory scores than Whites. Those differences are not due to ignoring DIF. We tested several factors related to reserve identified from previous research, and none of these explained differences across race groups in age- and sex-adjusted episodic memory scores. However, reading level, posited to be an indicator of the quality of educational experiences, did explain differences across race groups in age- and sex-adjusted mean episodic memory scores. This finding was not generalizable to other cognitive tests. These findings reinforce prior work (Manly et al., 2002, 2003, 2005) that stressed the importance of measuring and accounting for the quality of education (as measured by reading level) in studies of older individuals from racially diverse samples.

Acknowledgments

We thank the participants in the Rush Memory and Aging Project and the Minority Aging Research Study, and the staff of the Rush Alzheimer's Disease Center. Data collection was supported by the following National Institute on Aging grants: R01AG17917 (D Bennett, PI) and R01AG022018 (L Barnes, PI). Data analyses were supported by R01AG029672 (P Crane, PI). Parts of this manuscript were presented at the National Multicultural Conference & Summit 2011 in Seattle, Washington. No conflict of interest exists for the authors.

Appendix 1: WRAT-3 and NART analyses

Note. Scatter plot of nearly simultaneous WRAT-3 and NART Z scores for participants included in both data sets. Each symbol represents one of the ten participants common to the MAP and MARS databases. Horizontal and vertical lines are provided at the cut-points used in this study: +0.48 for the WRAT-3 in MARS, and −0.98 for the NART in MAP.

Appendix 2: Regression results excluding individuals with reading test Z scores within 0.25 of the cutpoint

Note. Regression results from a sensitivity analysis in which we omitted individuals with reading test scores close to the threshold values used to differentiate between low and high scores. The sample size reduced from 993 to 878 (67 participants were excluded from MARS and 48 participants from MAP). Regression findings are largely similar to those reported in the primary analyses.

References

Albert, M., Smith, L.A., Scherr, P.A., Taylor, J.O., Evans, D.A., Funkenstein, H.H. (1991). Use of brief cognitive tests to identify individuals in the community with clinically diagnosed Alzheimer's disease. The International Journal of Neuroscience, 57(3–4), 167–178. Retrieved from PM:1938160.

Arvanitakis, Z., Bennett, D.A., Wilson, R.S., Barnes, L.L. (2010). Diabetes and cognitive systems in older black and white persons. Alzheimer Disease and Associated Disorders, 24(1), 37–42. doi:10.1097/WAD.0b013e3181a6bed5 [doi]. Retrieved from PM:19568148.

Barnes, L.L., Wilson, R.S., de Leon, C.F., Bennett, D.A. (2006). The relation of lifetime cognitive activity and lifetime access to resources to late-life cognitive function in older African Americans. Neuropsychology, Development, and Cognition. Section B, Aging, Neuropsychology and Cognition, 13(3–4), 516–528. doi:K8005180774554M6 [pii]; 10.1080/138255890969519 [doi]. Retrieved from PM:16887787.

Bennett, D.A., Schneider, J.A., Buchman, A.S., Mendes de Leon, C., Bienias, J.L., Wilson, R.S. (2005). The Rush Memory and Aging Project: Study design and baseline characteristics of the study cohort. Neuroepidemiology, 25(4), 163–175. doi:NED2005025004163 [pii]; 10.1159/000087446 [doi]. Retrieved from PM:16103727.

Boone, K.B., Victor, T.L., Wen, J., Razani, J., Ponton, M. (2007). The association between neuropsychological scores and ethnicity, language, and acculturation variables in a large patient population. Archives of Clinical Neuropsychology, 22(3), 355–365. doi:S0887-6177(07)00017-0 [pii]; 10.1016/j.acn.2007.01.010 [doi]. Retrieved from PM:17320344.

Brickman, A.M., Cabo, R., Manly, J.J. (2006). Ethical issues in cross-cultural neuropsychology. Applied Neuropsychology, 13(2), 91–100. doi:10.1207/s15324826an1302_4 [doi]. Retrieved from PM:17009882.

Brunner, E.J. (2005). Social and biological determinants of cognitive aging. Neurobiology of Aging, 26(Suppl. 1), 17–20. doi:S0197-4580(05)00299-X [pii]; 10.1016/j.neurobiolaging.2005.09.024 [doi]. Retrieved from PM:16257477.

Camilli, G., Shepard, L.A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage.

Cooper, J.A., Sagar, H.J. (1993). Incidental and intentional recall in Parkinson's disease: An account based on diminished attentional resources. Journal of Clinical and Experimental Neuropsychology, 15(5), 713–731. Retrieved from PM:8276931.

Cosentino, S., Manly, J., Mungas, D. (2007). Do reading tests measure the same construct in multiethnic and multilingual older persons? Journal of the International Neuropsychological Society, 13(2), 228–236. doi:S1355617707070257 [pii]; 10.1017/S1355617707070257 [doi]. Retrieved from PM:17286880.

Crane, P.K., Gibbons, L.E., Jolley, L., van Belle, G. (2006). Differential item functioning analysis with ordinal logistic regression techniques. DIFdetect and difwithpar. Medical Care, 44(11 Suppl. 3), S115–S123. doi:10.1097/01.mlr.0000245183.28384.ed [doi]; 00005650-200611001-00017 [pii]. Retrieved from PM:17060818.

Crane, P.K., Gibbons, L.E., Narasimhalu, K., Lai, J.S., Cella, D. (2007). Rapid detection of differential item functioning in assessments of health-related quality of life: The functional assessment of cancer therapy. Quality of Life Research, 16(1), 101–114. doi:10.1007/s11136-006-0035-7 [doi]. Retrieved from PM:17111233.

Crane, P.K., Gibbons, L.E., Ocepek-Welikson, K., Cook, K., Cella, D., Narasimhalu, K., Teresi, J.A. (2007). A comparison of three sets of criteria for determining the presence of differential item functioning using ordinal logistic regression. Quality of Life Research, 16(Suppl. 1), 69–84. doi:10.1007/s11136-007-9185-5 [doi]. Retrieved from PM:17554640.

Crane, P.K., Narasimhalu, K., Gibbons, L.E., Pedraza, O., Mehta, K.M., Tang, Y., Mungas, D.M. (2008). Composite scores for executive function items: Demographic heterogeneity and relationships with quantitative magnetic resonance imaging. Journal of the International Neuropsychological Society, 14(5), 746–759. doi:S1355617708081162 [pii]; 10.1017/S1355617708081162 [doi]. Retrieved from PM:18764970.

Crane, P.K., van Belle, G., Larson, E.B. (2004). Test bias in a cognitive test: Differential item functioning in the CASI. Statistics in Medicine, 23(2), 241–256. doi:10.1002/sim.1713 [doi]. Retrieved from PM:14716726.

Dilworth-Anderson, P., Hendrie, H.C., Manly, J.J., Khachaturian, A.S., Fazio, S. (2008). Diagnosis and assessment of Alzheimer's disease in diverse populations. Alzheimers & Dementia, 4(4), 305–309. doi:S1552-5260(08)00077-0 [pii]; 10.1016/j.jalz.2008.03.001 [doi]. Retrieved from PM:18631983.

Dorans, N.J., Kulick, E. (2006). Differential item functioning on the Mini-Mental State Examination. An application of the Mantel-Haenszel and standardization procedures. Medical Care, 44(11 Suppl. 3), S107–S114. doi:10.1097/01.mlr.0000245182.36914.4a [doi]; 00005650-200611001-00016 [pii]. Retrieved from PM:17060817.

Dotson, V.M., Kitner-Triolo, M.H., Evans, M.K., Zonderman, A.B. (2009). Effects of race and socioeconomic status on the relative influence of education and literacy on cognitive functioning. Journal of the International Neuropsychological Society, 15(4), 580–589. doi:S1355617709090821 [pii]; 10.1017/S1355617709090821 [doi]. Retrieved from PM:19573276.

Everson-Rose, S.A., Mendes de Leon, C.F., Bienias, J.L., Wilson, R.S., Evans, D.A. (2003). Early life conditions and cognitive functioning in later life. American Journal of Epidemiology, 158(11), 1083–1089. Retrieved from PM:14630604.

Fillenbaum, G.G., Peterson, B., Welsh-Bohmer, K.A., Kukull, W.A., Heyman, A. (1998). Progression of Alzheimer's disease in black and white patients: The CERAD experience, part XVI. Consortium to Establish a Registry for Alzheimer's Disease. Neurology, 51(1), 154–158. Retrieved from PM:9674795.

Flaugher, R.L. (1978). The many definitions of test bias. American Psychologist, 33(7), 671–679.

Gasquoine, P.G. (2009). Race-norming of neuropsychological tests. Neuropsychology Review, 19(2), 250–262. doi:10.1007/s11065-009-9090-5 [doi]. Retrieved from PM:19294515.

Gibbons, L.E., McCurry, S., Rhoads, K., Masaki, K., White, L., Borenstein, A.R., Crane, P.K. (2009). Japanese-English language equivalence of the Cognitive Abilities Screening Instrument among Japanese-Americans. International Psychogeriatrics, 21(1), 129–137. doi:S1041610208007862 [pii]; 10.1017/S1041610208007862 [doi]. Retrieved from PM:18947456.

Gurland, B.J., Wilder, D.E., Lantigua, R., Stern, Y., Chen, J., Killeffer, E.H., Mayeux, R. (1999). Rates of dementia in three ethnoracial groups. International Journal of Geriatric Psychiatry, 14(6), 481–493. doi:10.1002/(SICI)1099-1166(199906)14:6<481::AID-GPS959>3.0.CO;2-5 [pii]. Retrieved from PM:10398359.

Johnstone, B., Callahan, C.D., Kapila, C.J., Bouman, D.E. (1996). The comparability of the WRAT-R reading test and NAART as estimates of premorbid intelligence in neurologically impaired patients. Archives of Clinical Neuropsychology, 11(6), 513–519. doi:0887-6177(96)82330-4 [pii]. Retrieved from PM:14588456.

Jones, R.N. (2003). Racial bias in the assessment of cognitive functioning of older adults. Aging & Mental Health, 7(2), 83–102. doi:10.1080/1360786031000045872 [doi]; 6NQP8W6NLFX55HFV [pii]. Retrieved from PM:12745387.

Jones, R.N. (2006). Identification of measurement differences between English and Spanish language versions of the Mini-Mental State Examination. Detecting differential item functioning using MIMIC modeling. Medical Care, 44(11 Suppl. 3), S124–S133. doi:10.1097/01.mlr.0000245250.50114.0f [doi]; 00005650-200611001-00018 [pii]. Retrieved from PM:17060819.

Jones, R.N., Fong, T.G., Metzger, E., Tulebaev, S., Yang, F.M., Alsop, D.C., Inouye, S.K. (2010). Aging, brain disease, and reserve: Implications for delirium. The American Journal of Geriatric Psychiatry, 18(2), 117–127. doi:10.1097/JGP.0b013e3181b972e8 [doi]; 00019442-201002000-00004 [pii]. Retrieved from PM:20104068.

Kaplan, G.A., Turrell, G., Lynch, J.W., Everson, S.A., Helkala, E.L., Salonen, J.T. (2001). Childhood socioeconomic position and cognitive function in adulthood. International Journal of Epidemiology, 30(2), 256–263. Retrieved from PM:11369724.

Kit, K.A., Tuokko, H.A., Mateer, C.A. (2008). A review of the stereotype threat literature and its application in a neurological population. Neuropsychology Review, 18(2), 132–148. doi:10.1007/s11065-008-9059-9 [doi]. Retrieved from PM:18415682.

Loewenstein, D.A., Arguelles, T., Arguelles, S., Linn-Fuentes, P. (1994). Potential cultural bias in the neuropsychological assessment of the older adult. Journal of Clinical and Experimental Neuropsychology, 16(4), 623–629. Retrieved from PM:7962363.

Luo, D.L., Chen, G., Zen, F., Murray, B. (2010). Modeling work memory tasks on the item level. Intelligence, 38(1), 66–82. doi:10.1016/j.intell.2009.07.003.

Lord, F.M., Novick, M.R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.

Manly, J.J. (2008). Critical issues in cultural neuropsychology: Profit from diversity. Neuropsychology Review, 18(3), 179–183. doi:10.1007/s11065-008-9068-8 [doi]. Retrieved from PM:18814033.

Manly, J.J., Byrd, D.A., Touradji, P., Stern, Y. (2004). Acculturation, reading level, and neuropsychological test performance among African American elders. Applied Neuropsychology, 11(1), 37–46. doi:10.1207/s15324826an1101_5 [doi]. Retrieved from PM:15471745.

Manly, J.J., Jacobs, D.M., Sano, M., Bell, K., Merchant, C.A., Small, S.A., Stern, Y. (1998). Cognitive test performance among nondemented elderly African Americans and whites. Neurology, 50(5), 1238–1245. Retrieved from PM:9595969.

Manly, J.J., Jacobs, D.M., Touradji, P., Small, S.A., Stern, Y. (2002). Reading level attenuates differences in neuropsychological test performance between African American and White elders. Journal of the International Neuropsychological Society, 8(3), 341–348. Retrieved from PM:11939693.

Manly, J.J., Schupf, N., Tang, M.X., Stern, Y. (2005). Cognitive decline and literacy among ethnically diverse elders. Journal of Geriatric Psychiatry and Neurology, 18(4), 213–217. doi:18/4/213 [pii]; 10.1177/0891988705281868 [doi]. Retrieved from PM:16306242.

Manly, J.J., Touradji, P., Tang, M.X., Stern, Y. (2003). Literacy and memory decline among ethnically diverse elders. Journal of Clinical and Experimental Neuropsychology, 25(5), 680–690. Retrieved from PM:12815505.

Masel, M.C., Peek, M.K. (2009). Ethnic differences in cognitive function over time. Annals of Epidemiology, 19(11), 778–783. doi:S1047-2797(09)00175-6 [pii]; 10.1016/j.annepidem.2009.06.008 [doi]. Retrieved from PM:19656690.

McDonald, R.P. (1999). Test theory: A unified treatment. Mahwah, NJ: Erlbaum.

McDougall, G.J. Jr., Vaughan, P.W., Acee, T.W., Becker, H. (2007). Memory performance and mild cognitive impairment in Black and White community elders. Ethnicity & Disease, 17(2), 381–388. Retrieved from PM:17682374.

McKhann, G., Drachman, D., Folstein, M., Katzman, R., Price, D., Stadlan, E.M. (1984). Clinical diagnosis of Alzheimer's disease: Report of the NINCDS-ADRDA Work Group under the auspices of Department of Health and Human Services Task Force on Alzheimer's Disease. Neurology, 34(7), 939–944. Retrieved from PM:6610841.

Millsap, R.E., Everson, H.T. (1993). Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17(4), 297–334.

Morales, L.S., Flowers, C., Gutierrez, P., Kleinman, M., Teresi, J.A. (2006). Item and scale differential functioning of the Mini-Mental State Exam assessed using the Differential Item and Test Functioning (DFIT) Framework. Medical Care, 44(11 Suppl. 3), S143–S151. doi:10.1097/01.mlr.0000245141.70946.29 [doi]; 00005650-200611001-00020 [pii]. Retrieved from PM:17060821.

Morris, J.C., Heyman, A., Mohs, R.C., Hughes, J.P., van Belle, G., Fillenbaum, G., … the CERAD Investigators (1989). The Consortium to Establish a Registry for Alzheimer's Disease (CERAD). Part I. Clinical and neuropsychological assessment of Alzheimer's disease. Neurology, 39(9), 1159–1165. Retrieved from PM:2771064.

Mungas, D., Reed, B.R., Farias, S.T., DeCarli, C. (2009). Age and education effects on relationships of cognitive test scores with brain structure in demographically diverse older persons. Psychology and Aging, 24(1), 116–128. doi:2009-03151-003 [pii]; 10.1037/a0013421 [doi]. Retrieved from PM:19290743.

Muraki, E., Bock, R.D. (2003). PARSCALE 4: IRT item analysis and test scoring for rating-scale data [computer program]. Chicago, IL: Scientific Software.

Nelson, H.E. (1982). The National Adult Reading Test (NART): Test Manual. Windsor, UK: NFER Nelson.

Parker, C., Philp, I. (2004). Screening for cognitive impairment among older people in black and minority ethnic groups. Age and Ageing, 33(5), 447–452. doi:10.1093/ageing/afh135 [doi]; afh135 [pii]. Retrieved from PM:15217776.

Pedraza, O., Graff-Radford, N.R., Smith, G.E., Ivnik, R.J., Willis, F.B., Petersen, R.C., Lucas, J.A. (2009). Differential item functioning of the Boston Naming Test in cognitively normal African American and Caucasian older adults. Journal of the International Neuropsychological Society, 15(5), 758–768. doi:S1355617709990361 [pii]; 10.1017/S1355617709990361 [doi]. Retrieved from PM:19570311.

Pedraza, O., Mungas, D. (2008). Measurement in cross-cultural neuropsychology. Neuropsychology Review, 18(3), 184–193. doi:10.1007/s11065-008-9067-9 [doi]. Retrieved from PM:18814034.

Ramirez, M., Teresi, J.A., Holmes, D., Gurland, B., Lantigua, R. (2006). Differential item functioning (DIF) and the Mini-Mental State Examination (MMSE). Overview, sample, and issues of translation. Medical Care, 44(11 Suppl. 3), S95–S106. doi:10.1097/01.mlr.0000245181.96133.db [doi]; 00005650-200611001-00015 [pii]. Retrieved from PM:17060840.

Robertson, K., Liner, J., Heaton, R. (2009). Neuropsychological assessment of HIV-infected populations in international settings. Neuropsychology Review, 19(2), 232–249. doi:10.1007/s11065-009-9096-z [doi]. Retrieved from PM:19455425.

Rogers, M.A., Plassman, B.L., Kabeto, M., Fisher, G.G., McArdle, J.J., Llewellyn, D.J., Langa, K.M. (2009). Parental education and late-life dementia in the United States. Journal of Geriatric Psychiatry and Neurology, 22(1), 71–80. doi:0891988708328220 [pii]; 10.1177/0891988708328220 [doi]. Retrieved from PM:19073840.

Rosselli, M., Ardila, A. (2003). The impact of culture and education on non-verbal neuropsychological measurements: A critical review. Brain and Cognition, 52(3), 326–333. doi:S0278262603001702 [pii]. Retrieved from PM:12907177.

Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph, 17.

Scarmeas, N., Stern, Y. (2003). Cognitive reserve and lifestyle. Journal of Clinical and Experimental Neuropsychology, 25(5), 625–633. Retrieved from PM:12815500.

Schwartz, B.S., Glass, T.A., Bolla, K.I., Stewart, W.F., Glass, G., Rasmussen, M., Bandeen-Roche, K. (2004). Disparities in cognitive functioning by race/ethnicity in the Baltimore Memory Study. Environmental Health Perspectives, 112(3), 314–320. Retrieved from PM:14998746.

Scruggs, T.E., Lifson, S.A. (1985). Current conceptions of test-wiseness: Myths and realities. School Psychology Review, 14(3), 339–350.

Singh-Manoux, A., Richards, M., Marmot, M. (2005). Socioeconomic position across the lifecourse: How does it relate to cognitive function in mid-life? Annals of Epidemiology, 15(8), 572–578. doi:S1047-2797(04)00323-0 [pii]; 10.1016/j.annepidem.2004.10.007 [doi]. Retrieved from PM:16118001.

Stern, Y. (2006). Cognitive reserve and Alzheimer disease. Alzheimer Disease and Associated Disorders, 20(2), 112–117. doi:10.1097/01.wad.0000213815.20177.19 [doi]; 00002093-200604000-00006 [pii]. Retrieved from PM:16772747.

Stern, Y. (2009). Cognitive reserve. Neuropsychologia, 47(10), 2015–2028. doi:S0028-3932(09)00123-7 [pii]; 10.1016/j.neuropsychologia.2009.03.004 [doi]. Retrieved from PM:19467352.

Stern, Y., Albert, S., Tang, M.X., Tsai, W.Y. (1999). Rate of memory decline in AD is related to education and occupation: Cognitive reserve? Neurology, 53(9), 1942–1947. Retrieved from PM:10599762.

Stern, Y., Gurland, B., Tatemichi, T.K., Tang, M.X., Wilder, D., Mayeux, R. (1994). Influence of education and occupation on the incidence of Alzheimer's disease. The Journal of the American Medical Association, 271(13), 1004–1010. Retrieved from PM:8139057.

Stricker, L.J., Emmerich, W. (1999). Possible determinants of differential item functioning: Familiarity, interest, and emotional reaction. Journal of Educational Measurement, 36(4), 347–366.

Teng, E.L., Manly, J.J. (2005). Neuropsychological testing: Helpful or harmful? Alzheimer Disease and Associated Disorders, 19(4), 267–271. doi:00002093-200510000-00016 [pii]. Retrieved from PM:16327357.

Teresi, J.A. (2006). Different approaches to differential item functioning in health applications. Advantages, disadvantages and some neglected topics. Medical Care, 44(11 Suppl. 3), S152–S170. doi:10.1097/01.mlr.0000245142.74628.ab [doi]; 00005650-200611001-00021 [pii]. Retrieved from PM:17060822.

Teresi, J.A., Golden, R.R., Cross, P., Gurland, B., Kleinman, M., Wilder, D. (1995). Item bias in cognitive screening measures: Comparisons of elderly white, Afro-American, Hispanic and high and low education subgroups. Journal of Clinical Epidemiology, 48(4), 473–483. doi:0895-4356(94)00159-N [pii]. Retrieved from PM:7722601.

Teresi, J.A., Holmes, D., Ramirez, M., Gurland, B.J., Lantigua, R. (2001). Performance of cognitive tests among different racial/ethnic and education groups: Findings of differential item functioning and possible bias. Journal of Mental Health and Aging, 7(1), 79–90.

Teresi, J.A., Kleinman, M., Ocepek-Welikson, K. (2000). Modern psychometric methods for detection of differential item functioning: Application to cognitive assessment measures. Statistics in Medicine, 19(11–12), 1651–1683. doi:10.1002/(SICI)1097-0258(20000615/30)19:11/12<1651::AIDSIM453>3.0.CO;2-H [pii]. Retrieved from PM:10844726.

Teresi, J.A., Stewart, A.L., Morales, L.S., Stahl, S.M. (2006). Measurement in a multi-ethnic society. Overview to the special issue. Medical Care, 44(11 Suppl. 3), S3–S4. doi:10.1097/01.mlr.0000245437.46695.4a [doi]; 00005650-200611001-00003 [pii]. Retrieved from PM:17060831.

Thissen, D., Steinberg, L., Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In Holland, P.W., Wainer, H. (Eds.), Differential item functioning (pp. 67–113). Hillsdale, NJ: Lawrence Erlbaum Associates.

Tuokko, H.A., Chou, P.H., Bowden, S.C., Simard, M., Ska, B., Crossley, M. (2009). Partial measurement equivalence of French and English versions of the Canadian Study of Health and Aging neuropsychological battery. Journal of the International Neuropsychological Society, 15(3), 416–425. doi:S1355617709090602 [pii]; 10.1017/S1355617709090602 [doi]. Retrieved from PM:19402928.

United States Census Bureau. (2010). Income: Historical income tables - families. Retrieved from http://www.census.gov/hhes/www/income/histinc/f07ar.html

Wechsler, D. (1987). Wechsler Memory Scale - Revised Manual. San Antonio, TX: Psychological Corporation.

Weiner, M.F. (2008). Perspective on race and ethnicity in Alzheimer's disease research. Alzheimer's & Dementia, 4(4), 233–238. doi:S1552-5260(07)00635-8 [pii]; 10.1016/j.jalz.2007.10.016 [doi]. Retrieved from PM:18631972.

Whitfield, K.E. (2002). Challenges in cognitive assessment of African Americans in research on Alzheimer disease. Alzheimer Disease and Associated Disorders, 16(Suppl 2), S80–S81. Retrieved from PM:12351919.

Whitfield, K.E., Fillenbaum, G.G., Pieper, C., Albert, M.S., Berkman, L.F., Blazer, D.G., Seeman, T. (2000). The effect of race and health-related factors on naming and memory. The MacArthur Studies of Successful Aging. Journal of Aging and Health, 12(1), 69–89. Retrieved from PM:10848126.

Wilkinson, G.S. (1993). Wide Range Achievement Test 3. Wilmington, DE: Wide Range, Inc.

Wilson, R., Barnes, L., Bennett, D. (2003). Assessment of lifetime participation in cognitively stimulating activities. Journal of Clinical and Experimental Neuropsychology, 25(5), 634–642. Retrieved from PM:12815501.

Wilson, R.S., Barnes, L.L., Krueger, K.R., Hoganson, G., Bienias, J.L., Bennett, D.A. (2005). Early and late life cognitive activity and cognitive systems in old age. Journal of the International Neuropsychological Society, 11(4), 400–407. Retrieved from PM:16209420.

Wilson, R.S., Beckett, L.A., Barnes, L.L., Schneider, J.A., Bach, J., Evans, D.A., Bennett, D.A. (2002). Individual differences in rates of change in cognitive abilities of older persons. Psychology and Aging, 17(2), 179–193. Retrieved from PM:12061405.

Zsembik, B.A., Peek, M.K. (2001). Race differences in cognitive functioning among older adults. The Journals of Gerontology. Series B, Psychological Sciences and Social Sciences, 56(5), S266–S274. Retrieved from PM:11522808.

Fig. 1 Theoretical model of relationships between demographics, cognitive reserve and measurement bias on observed performance on episodic memory tests. Observed variables (i.e., performance on neuropsychological episodic memory tests, demographic characteristics, and indicators associated with cognitive reserve) are depicted in rectangles, while the unobserved factor (actual episodic memory functioning) is depicted in an oval. Observed episodic memory scores from neuropsychological tests (the box to the right of the figure) have only two inputs: episodic memory functioning itself (the oval at the left) and DIF. Demographics and cognitive reserve indirectly influence assessment performance through episodic memory functioning. DIF is depicted as a dashed box. DIF analyses allow us to assess whether measurement bias may be responsible for differences across groups in observed episodic memory scores. Any effect of demographics or cognitive reserve on observed performance that is not due to actual episodic memory functioning is depicted as a DIF effect.

Fig. 2 The derivation of study participants from the Memory and Aging Project (MAP) and Minority Aging Research Study (MARS) databases.

Table 1 Demographics, cognitive reserve, and episodic memory characteristics of sample stratified by race

Table 2 Results of regression models across Episodic Memory Scores

Table 3 Regression results for the matched analyses (n = 546)

