INTRODUCTION
Global measures of cognitive functioning are important in both the clinical and research settings. They allow clinicians to efficiently gauge a patient’s overall level of functioning, and to communicate this information to other clinicians. In research studies, global measures of cognitive status provide investigators and readers a practical yardstick with which to judge participants’ disease severity. Global measures are particularly useful in cross-sectional and longitudinal intervention studies, where complex patterns of performance on different cognitive subtests and across individuals, and variable rates of changes on different cognitive subtests and across individuals, would otherwise be difficult to assess, scrutinize, and communicate. Thus, although global performance measures are necessarily associated with a loss of information which would be unacceptable in the diagnostic and differential diagnostic process, they represent an elegant solution to a variety of practical issues in the clinical and research environment.
Pharmacologic and non-pharmacologic intervention studies have, therefore, used simple and simplifying measures of global functioning to judge cognitive performance at baseline and over time. These include in particular the Mini-Mental State Examination (MMSE; Folstein, Folstein, & McHugh, 1975), the Alzheimer Disease Assessment Scale (ADAS-Cog; Rosen, Mohs, & Davis, 1984), and for severely impaired individuals the Severe Impairment Battery (SIB; Panisset, Roudier, Saxton, & Boller, 1994). The Consortium to Establish a Registry on Alzheimer’s Disease-Neuropsychological Assessment Battery (CERAD-NAB; Morris, Mohs, Rogers, Fillenbaum, & Heyman, 1988; Welsh et al., 1994) may provide the basis for an alternative global assessment measure. The CERAD-NAB has proven useful for the diagnosis of Alzheimer’s disease (AD), and a validated and normed version of this test battery has been established in German-speaking Europe as a minimal common screening battery for dementia syndromes (German version of the CERAD-NAB; available at Memory Clinic, Basel, Switzerland, http://www.memoryclinic.ch; Aebi, 2002; Thalmann et al., 2000).
The CERAD-NAB has also proven sensitive to cognitive impairments occurring in the early stages of dementia and to cognitive changes over long time spans (Fillenbaum, Unverzagt, Ganguli, Welsh-Bohmer, & Heyman, 2002; Morris et al., 1989, 1993; Welsh, Butters, Hughes, Mohs, & Heyman, 1991, 1992; Zehnder, Bläsi, Berres, Spiegel, & Monsch, 2007).
Chandler et al. (2005) recently developed a total score for the American version of the CERAD-NAB. This total score was created by summing six CERAD subtest scores (excluding MMSE and praxis recall; maximum score = 100) and submitting the sum to a regression analysis to correct for demographic status. They provide normative data (T-scores) for these demographically corrected CERAD-NAB total scores. Thus, this score has the advantage that it is easy to calculate and characterizes global cognitive performance within the patient’s demographic framework. As expected, this score discriminated normal control participants (NC) from patients with probable AD significantly better than the MMSE (Chandler et al., 2005). However, it is unclear whether the simple sum of CERAD-NAB variables provides the most powerful diagnostic discrimination between NC and AD. Moreover, it remains to be established whether Chandler et al.’s and other potential composite scores of CERAD-NAB performance retain their discriminatory power with AD individuals in very early stages of dementia as well as during longitudinal assessment.
The goals of the present study were, therefore, to (1) determine the discriminatory utility of Chandler et al.’s total score of German CERAD-NAB performance with groups of NC participants and AD patients in very mild stages of the disease (MMSE in both groups ≥ 24/30), (2) determine the relative diagnostic discriminabilities of three different global CERAD-NAB scores with groups of NC individuals and AD patients, (3) determine and compare the longitudinal discriminatory power of the different global CERAD-NAB scores, and (4) examine whether a combination of cross-sectional and longitudinal scores improves diagnostic classification.
MATERIALS AND METHODS
Healthy Aged Participants
A total of 1,100 healthy aged individuals (NC) participated (see Table 1). These individuals were a subset of BASEL study participants (Basel Study on the Elderly; Monsch et al., 2000) and formed the normative sample for the German version of the CERAD-NAB. Baseline testing took place between 1997 and 2001. Additional neuropsychological tests included the following: Trail Making Test (Army Individual Test Battery, 1944), nonverbal and phonemic fluency (Regard, Strauss, & Knapp, 1982; Thurstone, 1938), the modified Wisconsin Card Sorting Test (Nelson, 1976), digit span and Corsi blocks (Härting, Markowitsch, Neufeld, Calabrese, Deisinger, & Kessler, 2000). All participants were thoroughly medically screened and fulfilled the following inclusion criteria: German as first language; Z scores ≤ −1.96 (2.5th percentile) in no more than 1 of the 11 CERAD-NAB variables; and good health, that is, no current systemic illnesses, no current depression according to the Diagnostic and Statistical Manual of Mental Disorders, 4th Edition [American Psychiatric Association (APA), 1994] as assessed with a standardized questionnaire (Kühner, 1997), no current diseases interfering with the administration of neuropsychological tests (e.g., severe hearing or visual deficits), and no diseases of the central nervous system (CNS) at the time of testing. Moreover, participants had no diseases or events during life that could have negatively affected CNS activity, and had never been hospitalized for a psychiatric illness. This project was approved by the local Ethics Committee, and written informed consent was obtained from all participants.
Table 1. Characteristics of healthy aged participants (NC) and patients with probable Alzheimer’s disease (AD) in the cross-sectional sample (T1)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704215413-48883-mediumThumb-S1355617710000822_tab1.jpg?pub-status=live)
Note
y = years. MMSE = Mini-Mental State Examination (Folstein et al., 1975). *p < .001.
Due to resource limitations, only 549 participants were invited for follow-up circa 2 (T2) and 4 (T3) years after baseline. T2 data from participants who remained cognitively healthy at T3 (n = 524) were included in the longitudinal analyses [mean of 2.4 years (SD = 0.3) following baseline]. Table 2 lists differences between NC participants with and without follow-up at T2. The testing protocol at T2 was identical to that at T1.
Table 2. Characteristics of healthy aged participants (NC) and patients with probable Alzheimer’s disease (AD) in the longitudinal subsample (T1 and T2; follow-up data) and those with no follow-up data
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704215413-57982-mediumThumb-S1355617710000822_tab2.jpg?pub-status=live)
Note
T1 = Baseline; T2 = Follow-up; y = years. MMSE = Mini-Mental State Examination (Folstein et al., 1975). **p < .001; *p < .01; †p < .05.
AD Patients with Very Mild to Mild Dementia
A total of 352 patients from the Memory Clinic at the University Hospital Basel fulfilled the inclusion criteria for this study: (1) a diagnosis of probable AD according to the criteria outlined by the National Institute for Neurological and Communicative Disorders and Stroke and the Alzheimer’s Disease and Related Disorders Association (NINCDS-ADRDA; McKhann, Drachman, Folstein, Katzman, Price, & Stadlan, 1984) and DSM-IV (APA, 1994) and (2) an MMSE score ≥ 24/30 (see Table 1). The MMSE inclusion criterion ensured that patients were in the very early stages of the disease. All patients underwent a thorough interdisciplinary examination including a comprehensive neuropsychological assessment (same test battery as for NC sample), a medical examination including neurological status exam, structural brain imaging and laboratory tests (Monsch et al., 1995). We note that, although diagnosing clinicians were not provided with CERAD-NAB total scores, they were presented with individual CERAD-NAB subtest scores together with other neuropsychological test scores, all categorized by cognitive domain. Thus, NC-AD differences in CERAD total scores reported below are most likely artificially inflated. This artificial inflation affects all CERAD total scores. As such, the comparison of different CERAD total scores with one another (the primary goal of this study) is not confounded with the inclusion of CERAD data in the diagnostic process. Depressive symptoms were probed with the short version of the Geriatric Depression Scale (GDS; Sheikh & Yasavage, 1986). At T1, 85% of the AD patients had no depressive symptoms (GDS score 0–4), 11% had mild (GDS score 5–7), 3% moderate (GDS score 8–10), and 2% pronounced (GDS score > 11) depressive symptoms.
Forty-seven percent (n = 165) of patients agreed to participate in a follow-up assessment and were tested longitudinally an average of 1.2 years (SD = 0.3) after baseline. All patients retained their diagnosis of AD at follow-up, and all retested patient data were included in the longitudinal analyses (see Table 2). The sample of AD patients not available for follow-up included more females and older, less well-educated patients with lower MMSE scores (see Table 2). At T2, 82% of the patients had no depressive symptoms, 13% had mild, 5% moderate and less than 1% pronounced depressive symptomatology according to the GDS. In the interval between assessments (but not before baseline), n = 108 (65%) of the longitudinal patient group were treated with acetylcholine-esterase inhibitors only, n = 6 (4%) took part in group memory training only (Ermini-Fünfschilling & Meier, 1995), n = 21 (13%) patients received both forms of therapy, and n = 30 (18%) patients received no specific dementia therapy.
Material
NC participants and AD patients were administered the German version of the CERAD-NAB (Thalmann et al., 2000) by experienced neuropsychologists or trained psychology students. This test battery comprises subtests designed to measure those cognitive functions typically affected by AD, that is: “animal fluency” (60 s), a modified version of the “Boston Naming Test” (BNT; maximum score = 15), the MMSE (maximum score = 30), “word list – total” (the sum of words learned after three trials of the 10-word learning list; maximum score = 30), “figures – copy” (maximum score = 11), “word list – delayed recall” (maximum score = 10), “word list – recognition” (maximum score = 100%), and “figures – delayed recall” (maximum score = 11). Three additional variables were created: the number of word responses given during the three word list learning trials and word list delayed recall that were not on the original list was conceptualized as “word list – intrusions”; the proportion of correctly recalled words during the verbal delayed recall compared with verbal learning trial 3 was “word list – savings”; and similarly the proportion of correctly drawn figures during figural delayed free recall compared with the copy condition was “figures – savings”. A large, independent study with NC and AD patients demonstrated that these German CERAD-NAB variables have good to excellent discriminative validity (Aebi, 2002).
Statistical Analyses
Global CERAD-NAB scores
Three different global CERAD-NAB scores (all excluding the MMSE) were calculated for the NC and AD participants in the cross-sectional sample at T1 (cf. Table 1), and for the NC and AD participants in the longitudinal sample at T1 and T2 (cf. Table 2): Chandler’s total score, a new CERAD-NAB score derived from a principal components analysis of NC data and subsequent logistic regression (PCA-LR score), and a global score based on a logistic regression with jackknife procedure (LR score). The MMSE was not included in the global score calculation to more directly compare the performance of our total scores with that of Chandler et al. (2005), and because MMSE scores were used as an inclusion criterion for AD patients to participate in this study. To compare the abilities of each measure to discriminate between NC and AD participants, we produced receiver operating characteristics (ROC) curves for each global score and compared the corresponding areas under the curves (see Hanley & McNeil, 1983). For descriptive purposes only, we also report the sensitivity, specificity and correct classification rate (CCR; mean of sensitivity and specificity) of each score.
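The descriptive measures named above can be illustrated with a minimal sketch. It assumes that higher global scores indicate better cognition and that scores below a cutoff are classified as “demented”; the function names and the example values are hypothetical and are not study data. The area under the ROC curve is computed here through its equivalence to the Mann-Whitney statistic, which is one common way to obtain it and not necessarily the procedure used in this study.

```python
from typing import List, Tuple


def auc(scores_nc: List[float], scores_ad: List[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    Assumes higher scores mean better cognition, so a pair is ordered
    correctly when the NC score exceeds the AD score.
    """
    wins = 0.0
    for nc in scores_nc:
        for ad in scores_ad:
            if nc > ad:
                wins += 1.0
            elif nc == ad:
                wins += 0.5  # ties count as half a correctly ordered pair
    return wins / (len(scores_nc) * len(scores_ad))


def classification_rates(
    scores_nc: List[float], scores_ad: List[float], cutoff: float
) -> Tuple[float, float, float]:
    """Return sensitivity, specificity and CCR (their mean) at a cutoff.

    Scores below the cutoff are classified as "demented".
    """
    sensitivity = sum(s < cutoff for s in scores_ad) / len(scores_ad)
    specificity = sum(s >= cutoff for s in scores_nc) / len(scores_nc)
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Hypothetical global scores for illustration only.
nc = [98.0, 91.5, 88.0, 102.3, 84.0]
ad = [70.2, 86.0, 79.9, 65.4]
print(auc(nc, ad))
print(classification_rates(nc, ad, cutoff=85.0))
```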
Chandler score
The Chandler CERAD total score was calculated using a procedure identical to that of Chandler et al. (2005). Raw scores on six German CERAD-NAB variables (i.e., animal fluency with a maximum score of 24 words; modified BNT; word list – total; word list – delayed recall; word list – recognition, subtracting the number of false positives from the number of true positives; figures – copy) were summed, and an age, education and gender corrected regression formula was created with a multiple regression analysis based on the present participants’ data.
Principal Components Analysis – Logistic Regression (PCA-LR) score
A principal components analysis (PCA) was conducted to reduce the CERAD-NAB variables to one score while accounting for intercorrelations between variables and minimizing redundancies. CERAD-NAB variables were transformed to achieve normality of standardized residuals (Z scores) in regression models adjusting for covariates (Berres, Zehnder, Bläsi, & Monsch, 2007). The Z scores of the 10 CERAD-NAB variables of the NC group at T1, excluding MMSE, were used in the PCA. A three-factor PCA solution was selected because three factors achieved eigenvalues greater than 1. Three factor scores were calculated for each participant using coefficient values rounded to one decimal place. These three factor scores were submitted to a logistic regression analysis with backward elimination to produce a global score. The global score was adjusted in a linear model with age, education, and the square of these variables. This analysis resulted in a Z score which was then linearly transformed to a mean of 100 and SD of 15.
The PCA-LR score was also calculated for each AD patient. The cutoff score for the PCA-LR scores was determined with a binary logistic regression analysis. A case was classified as “demented” if the predicted probability for dementia was greater than the proportion of true AD patients in the sample. The cutoff value for the binary logistic regression analysis was, therefore, set to a probability of 0.25.
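The classification rule described above can be sketched as follows, using the sample sizes reported in this study (352 AD patients, 1,100 NC participants). The function names are hypothetical. The exact sample proportion is about 0.24; the cutoff actually used in the analyses was 0.25.

```python
def base_rate_cutoff(n_patients: int, n_controls: int) -> float:
    """Proportion of true patients in the combined sample."""
    return n_patients / (n_patients + n_controls)


def classify(predicted_probability: float, cutoff: float) -> str:
    """Label a case "demented" if its predicted probability of dementia
    exceeds the cutoff, otherwise "healthy"."""
    return "demented" if predicted_probability > cutoff else "healthy"


# Sample sizes as reported in the text; the study set the cutoff to 0.25.
sample_proportion = base_rate_cutoff(n_patients=352, n_controls=1100)
print(round(sample_proportion, 3))
print(classify(0.30, cutoff=0.25), classify(0.10, cutoff=0.25))
```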
PCA-LR scores for T2 CERAD-NAB performance were calculated using a procedure identical to the one described above.
Logistic regression (LR) score
Stepwise logistic regression analyses with backward elimination (exclusion criterion: p = .10; inclusion criterion: p = .05; cutoff value = 0.25) comparing NC participants to AD patients were performed on all 10 z-transformed baseline CERAD-NAB scores (excluding MMSE). Logistic regression estimates the optimal weighting of each subtest score, as determined by the best fit of predicted probabilities to observed outcomes. Thus, it tends to give high weights to variables with high measurement precision, similar to methods which directly weight by precision (see, e.g., Wouters, van Gool, Schmand, & Lindeboom, 2008), but has the added advantage of accounting for correlations between variables.
Cross-validation
To determine the diagnostic accuracies of each baseline CERAD-NAB total score, we first randomly split the NC and AD baseline samples (group A and group B) for cross-validation analyses (see Table 3 for demographic information). Thus, total scores were derived from group A and applied to group B, and in a second step, scores were derived from group B and applied to group A. The diagnostic discriminability of each set of scores was quantified with sensitivity, specificity, CCR, and AUC.
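The logic of this double cross-validation can be sketched as follows. The sketch is illustrative only: the derivation step is replaced by a deliberately simple stand-in (a cutoff at the midpoint between the two group means), the scores are simulated, and all function names are hypothetical. In the study itself, the full score formulae were derived in one half and applied to the other.

```python
import random
from statistics import mean


def split_half(items, rng):
    """Randomly split a list into two halves (group A, group B)."""
    shuffled = items[:]
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def derive_cutoff(nc, ad):
    """Stand-in for score derivation: midpoint between the group means."""
    return (mean(nc) + mean(ad)) / 2


def ccr(nc, ad, cutoff):
    """Correct classification rate: mean of sensitivity and specificity."""
    sensitivity = sum(s < cutoff for s in ad) / len(ad)
    specificity = sum(s >= cutoff for s in nc) / len(nc)
    return (sensitivity + specificity) / 2


def double_cross_validation(nc, ad, seed=0):
    """Derive in A and apply to B, then derive in B and apply to A."""
    rng = random.Random(seed)
    nc_a, nc_b = split_half(nc, rng)
    ad_a, ad_b = split_half(ad, rng)
    a_to_b = ccr(nc_b, ad_b, derive_cutoff(nc_a, ad_a))
    b_to_a = ccr(nc_a, ad_a, derive_cutoff(nc_b, ad_b))
    return a_to_b, b_to_a


# Simulated scores for illustration; these are not study data.
sim = random.Random(42)
nc_scores = [sim.gauss(95, 7) for _ in range(200)]
ad_scores = [sim.gauss(73, 9) for _ in range(80)]
a_to_b, b_to_a = double_cross_validation(nc_scores, ad_scores)
print(a_to_b, b_to_a)
```

Comparable values in both directions would indicate that the derived rule does not depend strongly on which half was used for derivation.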
Table 3. Characteristics of healthy aged participants (NC) and patients with probable Alzheimer’s disease (AD), randomly assigned to groups A and B for cross-validation
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704215413-99189-mediumThumb-S1355617710000822_tab3.jpg?pub-status=live)
Note
y = years. MMSE = Mini-Mental State Examination (Folstein et al., 1975).
Entire Sample
To foreshadow the cross-validation results, each set of analyses resulted in comparable diagnostic discriminabilities (see Table 4), indicating a high stability of the generated total scores. To produce a single set of total score formulae, we next calculated CERAD total scores on the entire participant population. For the LR score based on the entire sample, the stability of the regression model was tested with a cross-validation via a jackknife procedure (“leave one out” method). This procedure was used to calculate LR scores at T1 and T2. We report the results (sensitivity, specificity, CCR, AUC) of these analyses after the jackknife cross-validation.
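The “leave one out” principle can be illustrated with a short sketch. Each case is held out in turn, the classification rule is re-derived from the remaining cases, and the held-out case is then classified. As a simplifying assumption, the logistic regression model is replaced here by a midpoint cutoff between the group means; the function name and the example values are hypothetical.

```python
from statistics import mean


def leave_one_out_rates(nc, ad):
    """Jackknife ("leave one out") estimate of sensitivity, specificity, CCR.

    The rule re-derived for each held-out case is a simple stand-in:
    a cutoff at the midpoint between the remaining group means.
    """
    labelled = [(s, "NC") for s in nc] + [(s, "AD") for s in ad]
    hits = {"NC": 0, "AD": 0}
    for i, (score, label) in enumerate(labelled):
        rest = labelled[:i] + labelled[i + 1:]
        rest_nc = [s for s, group in rest if group == "NC"]
        rest_ad = [s for s, group in rest if group == "AD"]
        cutoff = (mean(rest_nc) + mean(rest_ad)) / 2
        predicted = "AD" if score < cutoff else "NC"
        if predicted == label:
            hits[label] += 1
    sensitivity = hits["AD"] / len(ad)
    specificity = hits["NC"] / len(nc)
    return sensitivity, specificity, (sensitivity + specificity) / 2


# Hypothetical scores for illustration only.
nc = [98.0, 91.5, 88.0, 102.3, 84.0]
ad = [70.2, 86.0, 79.9, 65.4]
print(leave_one_out_rates(nc, ad))
```

Because no case contributes to the rule that classifies it, the resulting rates are less optimistic than rates obtained by applying a rule to the same data it was derived from.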
Table 4. Results of the double cross-validation of randomly split groups
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704215413-57179-mediumThumb-S1355617710000822_tab4.jpg?pub-status=live)
Note
PCA-LR = logistic regression on principal components analysis scores; LR = logistic regression on demographically corrected (CERAD-NAB) variables; CCR = correct classification rate; AUC = area under the curve. The sensitivity, specificity, CCR, and AUC in the validation sample are based on application of score formulae derived from the training data.
Global Change Scores
To determine the extent to which the global scores discriminated between the longitudinal performance of NC and AD participants, difference scores were calculated by subtracting each score at T1 from the corresponding score at T2. Because difference scores based on Z scores are also influenced by demographic variables, difference scores were corrected for the effects of age, education, gender, baseline performance, and all possible interactions between these variables (Zehnder et al., 2007). Forty-five regression models were fitted for each global score. The most accurate model, that is, the model with the smallest standard deviation of the predicted residuals, was selected based on the results of the Predicted Residual Sum of Squares statistics (Berres et al., 2007).
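The idea behind model selection with the Predicted Residual Sum of Squares (PRESS) can be sketched as follows. Each observation is predicted from a model fitted without it, and the squared prediction errors are summed; the candidate with the smallest value is preferred. This sketch compares only two hypothetical candidates (an intercept-only model and a model with baseline performance as a single predictor) on invented data, whereas the study compared 45 models with several covariates and interactions.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def press(x, y, use_predictor=True):
    """PRESS computed by explicitly refitting without each observation."""
    total = 0.0
    for i in range(len(y)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        if use_predictor:
            a, b = fit_line(x_rest, y_rest)
            prediction = a + b * x[i]
        else:
            prediction = mean(y_rest)  # intercept-only model
        total += (y[i] - prediction) ** 2
    return total


# Invented baseline scores and T2 - T1 difference scores (not study data).
baseline = [90.0, 95.0, 100.0, 105.0, 110.0, 115.0]
change = [3.1, 2.0, 1.2, -0.1, -0.9, -2.2]
candidates = {
    "intercept only": press(baseline, change, use_predictor=False),
    "baseline as predictor": press(baseline, change, use_predictor=True),
}
best = min(candidates, key=candidates.get)
print(best)
```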
Combination of Cross-sectional and Longitudinal Data
To determine whether the correct classification rate based on combined information from (a) T1 and T2 and (b) T1 and (T2–T1) is superior to the diagnostic accuracy of T1 information alone, linear combinations of these data were calculated for the Chandler and PCA-LR global scores. These values were compared with scores from the logistic regression with backward elimination (exclusion criterion: p = .10, inclusion criterion: p = .05) based on all 10 z-transformed CERAD-NAB variables at timepoints T1 and T2.
Statements of significance refer to comparison-wise error rates and should be interpreted in a descriptive sense.
RESULTS
Cross-validation
The results of the cross-validation analyses are shown in Table 4. Because the diagnostic discriminabilities of scores in the validation and cross-validation samples were comparable, and to generate a single set of total score formulae, all ensuing analyses were conducted on the entire dataset.
Cross-sectional Sample (cf. Table 1)
The mean uncorrected baseline Chandler total score of the NC group was 81.3 (SD = 8.4; range = 53–99), and of the AD patients was 56.3 (SD = 10.0; range = 33–83). The mean adjusted Chandler total score for the NC group was 95.4 (SD = 7.3; range = 63.5–114.1), higher than that of the AD patients (mean = 73.0; SD = 9.4; range = 47.1–100.9) (t[494.3] = 40.9, p < .001, two-tailed), and the cutoff score was 85.11. The formula for the demographic correction was: Chandler raw total score – (−0.391*age + 0.886*education + 4.447*gender), where gender is coded as man = 0, woman = 1. This adjusted Chandler total score was highly correlated with the adjusted American version (r = 0.997).
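As a worked illustration of the demographic correction, the formula above can be written as a small function. The coefficients, gender coding, and cutoff are those reported in this paragraph; the function name and the example participant are hypothetical, and education is assumed to be expressed in years.

```python
CHANDLER_CUTOFF = 85.11  # cutoff for the adjusted score, as reported above


def adjusted_chandler(raw_total: float, age: float,
                      education: float, is_woman: bool) -> float:
    """Adjusted score = raw total - (-0.391*age + 0.886*education + 4.447*gender).

    Gender is coded man = 0, woman = 1; education is assumed to be in years.
    """
    gender = 1 if is_woman else 0
    return raw_total - (-0.391 * age + 0.886 * education + 4.447 * gender)


# Hypothetical participant: 75-year-old woman, 12 years of education, raw total 70.
score = adjusted_chandler(raw_total=70, age=75, education=12, is_woman=True)
print(round(score, 3), score < CHANDLER_CUTOFF)
```

For this hypothetical participant the correction term is −14.246, so the adjusted score is 84.246, which falls just below the cutoff of 85.11.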
The PCA-LR global CERAD-NAB score was based on a PCA and logistic regression. Table 5 contains the factor loadings resulting from the PCA of all CERAD-NAB subtest performance of the NC group at T1. Factor 1 represents “verbal memory,” factor 2 “nonverbal memory,” and factor 3 “non-memory functions.” The logistic regression of the raw PCA factor scores resulted in the following PCA-LR raw score: $\text{PCA-LR raw score} = 4.355 + 2.23 \cdot \text{factor1} + 0.466 \cdot \text{factor3}$. The PCA-LR raw scores were transformed with $t_{\text{PCA-LR}} = \operatorname{sign}(\text{PCA-LR raw score}) \cdot |\text{PCA-LR raw score}|^{1.15}$, with $\operatorname{sign}(x) = 1$ for $x \geq 0$ and $= -1$ for $x < 0$, in preparation for the adjustments for linear and quadratic effects of the demographic variables. The corrected PCA-LR global Z score was: $Z_{\text{PCA-LR}} = \left[ t_{\text{PCA-LR}} - \left( 4.779 + 0.00701 \cdot \text{age} + 0.05718 \cdot \text{education} - 0.00246 \cdot (\text{age} - 68.68)^{2} - 0.02592 \cdot (\text{education} - 12.47)^{2} \right) \right] / 3.204$. The NC data were then linearly transformed to a mean of 100 and SD of 15 (range = 51.65–144.04). The mean PCA-LR score for the AD patients (mean = 60.34; SD = 15.75; range = 17.2–109.85) was lower than that of the NC participants (t[1450] = 42.7; p < .001, two-tailed), and the resulting cutoff score was 80.13.
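The chain of transformations in this paragraph can be followed step by step in a short sketch. It assumes that the transformation raises the absolute raw score to the power 1.15 and that the two centered demographic terms are squared. The function name and the example values are hypothetical. The final linear rescaling to a mean of 100 and SD of 15 is omitted because its constants are not reported here.

```python
import math


def pca_lr_z(factor1: float, factor3: float,
             age: float, education: float) -> float:
    """Demographically corrected PCA-LR Z score from two factor scores."""
    # Step 1: raw score from the logistic regression on the factor scores.
    raw = 4.355 + 2.23 * factor1 + 0.466 * factor3
    # Step 2: signed power transformation, sign(raw) * |raw| ** 1.15.
    t = math.copysign(abs(raw) ** 1.15, raw)
    # Step 3: value expected from linear and quadratic demographic effects.
    expected = (4.779 + 0.00701 * age + 0.05718 * education
                - 0.00246 * (age - 68.68) ** 2
                - 0.02592 * (education - 12.47) ** 2)
    # Step 4: standardize the deviation from the expected value.
    return (t - expected) / 3.204


# Hypothetical inputs: factor scores of zero at the centering values
# of age (68.68) and education (12.47).
print(round(pca_lr_z(0.0, 0.0, age=68.68, education=12.47), 2))
```

Higher factor scores yield higher Z scores, and the same raw score is judged against a different expected value depending on age and education.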
Table 5. Factor loadings of the German CERAD-NAB variables resulting from a principal component analysis (PCA; 3-factor solution) of healthy aged participants’ performance (N = 1,100)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704215413-76424-mediumThumb-S1355617710000822_tab5.jpg?pub-status=live)
Note
CERAD-NAB = Consortium to Establish a Registry on Alzheimer’s Disease-Neuropsychological Assessment Battery.
a Number of words provided during the three learning trials and the Word List – Delayed Recall that were not on the Word List.
b Proportion of correctly recalled words during the Word List - Delayed Recall compared to the Word List learning trial 3.
c Proportion of correctly drawn figures during Figures - Delayed Recall compared to Figures - Copy.
The numerical values for the Chandler and PCA-LR scores cannot be directly compared because they underwent different transformations.
The logistic regression analysis resulted in a model containing all 10 z-transformed CERAD-NAB variables which provided the best possible discrimination between groups: 4.961 + (.704*animal fluency) + (.199*BNT) + (.719*word list – total) + (.293*word list – intrusions) + (.459*figures – copy) + (1.626*word list – delayed recall) – (.408*word list – savings) + (.295*word list – recognition) – (.679*figures – delayed recall) + (1.437*figures – savings).
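The model above can be evaluated for an individual profile of Z scores as in the following sketch. The coefficients are those reported in this paragraph, with the term for word list – delayed recall read as a product; the dictionary keys, function names, and example profiles are hypothetical. The text does not state which group membership the model output refers to, so only the linear predictor and the generic logistic transformation are shown, without a classification step.

```python
import math

INTERCEPT = 4.961
COEFFICIENTS = {
    "animal fluency": 0.704,
    "BNT": 0.199,
    "word list - total": 0.719,
    "word list - intrusions": 0.293,
    "figures - copy": 0.459,
    "word list - delayed recall": 1.626,  # read as a product (assumption)
    "word list - savings": -0.408,
    "word list - recognition": 0.295,
    "figures - delayed recall": -0.679,
    "figures - savings": 1.437,
}


def linear_predictor(z_scores: dict) -> float:
    """Weighted sum of the 10 z-transformed variables plus the intercept."""
    return INTERCEPT + sum(w * z_scores[name] for name, w in COEFFICIENTS.items())


def logistic(x: float) -> float:
    """Standard logistic function mapping a linear predictor to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


# Two hypothetical profiles: average on every variable, and Z = -2 throughout.
average = {name: 0.0 for name in COEFFICIENTS}
impaired = {name: -2.0 for name in COEFFICIENTS}
print(round(linear_predictor(average), 3), round(linear_predictor(impaired), 3))
```

For the uniformly impaired profile the weights sum to 4.645, so the linear predictor drops from 4.961 to 4.961 − 2 × 4.645 = −4.329.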
To estimate the ability of the global scores to discriminate between NC and AD participants, ROC curves were generated for each global score. The sensitivity and specificity, as well as the CCR of each global score were calculated from these ROC curves (see Table 6). Comparisons of the areas under the ROC curve revealed that the LR score showed trends to discriminate better than the PCA-LR score (Z = 1.96; p = .05) and the Chandler score (Z = 1.80; p = .07). The Chandler and PCA-LR scores performed comparably (Z = 0.77; p = .44). The distributions of the Chandler and PCA-LR scores for NC and AD participants with corresponding cut-off scores are shown in Figure 1a,b. These figures illustrate the excellent diagnostic discriminabilities of the Chandler and PCA-LR scores, as well as the number of false classifications associated with other, suboptimal, cut-off scores.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704215413-44601-mediumThumb-S1355617710000822_fig1g.jpg?pub-status=live)
Fig. 1. Distribution of (a) Chandler and (b) logistic regression on principal components analysis (PCA-LR) scores in healthy elderly individuals (NC) and patients with Alzheimer’s disease (AD) at baseline (T1). Each optimal cutoff score classifies individuals with higher scores as “healthy” and those with lower scores as “demented”.
Table 6. Comparisons of the discriminatory diagnostic characteristics of the Chandler score, PCA-LR score, and LR score in the cross-sectional (T1) and longitudinal (T1 and T2) samples
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704215413-72398-mediumThumb-S1355617710000822_tab6.jpg?pub-status=live)
Note
PCA-LR = logistic regression on principal components analysis scores; T1 = baseline; T2 = follow-up; CCR = correct classification rate; AUC = area under the curve; LR = logistic regression on demographically corrected (CERAD-NAB) variables, results after jackknife validation.
* Score differs from LR score (p = .05).
Longitudinal Subsample (cf. Table 2)
Baseline
At T1, the longitudinal subsample of NC participants had higher Chandler scores (mean = 95.42; SD = 6.95; range = 73.61–113.7) and PCA-LR scores (mean = 99.95; SD = 14.4; range = 51.65–134.74) than the AD patients (Chandler score: mean = 74.27; SD = 9.65; range = 47.11–100.91; t[220.1] = 26.1; p < .001, two-tailed; PCA-LR score: mean = 63.21; SD = 15.81; range = 28.19–109.85; t[687] = 27.9; p < .001, two-tailed). The area under the ROC curve of the Chandler score did not significantly differ from the area under the ROC curve of the PCA-LR score (Z = 0.23; p = .82) or the LR score (Z = 1.18; p = .24), and the areas under the ROC curves of the PCA-LR and LR scores also did not significantly differ (Z = 1.07; p = .28) (see Table 6).
Follow-up
At T2, the NC participants still had a greater Chandler score (mean = 98.01, SD = 6.31) and PCA-LR score (mean = 105.26, SD = 13.76) compared with the AD patients (Chandler score: mean = 73.09; SD = 11.64; t[195.3] = 26.31; p < .001; PCA-LR score: mean = 60.45; SD = 18.75; t[222.4] = 28.39; p < .001), as expected.
The LR analysis resulted in a model containing five z-transformed CERAD-NAB variables which provided the best possible discrimination between groups: 3.767 + (.908*animal fluency) – (1.219*word list – total) + (0.603*word list – delayed recall) + (.581*word list – recognition) + (.731*figures – delayed recall).
The Chandler and PCA-LR scores demonstrated similar discriminatory abilities (Z = 0.44; p = .66), as did the LR and Chandler scores (Z = 1.02; p = .31) as well as LR and PCA-LR scores (Z = 0.57; p = .57). In this subsample, the optimal cut-off score for the Chandler score was 85.89 at T1 and 88.1 at T2, for the PCA-LR score 81.3 at T1 and 84.82 at T2 (see Table 6).
Longitudinal analyses
Difference scores (T2–T1) were calculated for the Chandler and PCA-LR scores and standardized to a mean of 0 and an SD of 1 in the NC sample. Both difference scores for the AD patients were significantly different from zero (Chandler score: mean = −2.62; SD = 1.63; range = −6.33 to 2.16; t[203.7] = 19.42; p < .001; PCA-LR score: mean = −2.19; SD = 1.21; range = −5.07 to 1.63; t[238] = 21.06; p < .001), indicating a significant decline in CERAD-NAB performance over time. The sensitivities, specificities and CCRs for the Chandler difference score were 80.6, 87.8 and 84.2 (AUC = 0.91), respectively, and the corresponding values for the PCA-LR difference score were 85.5, 85.1 and 85.3 (AUC = 0.922), respectively. The LR difference score based on the z-transformed differences in Z score between T2 and T1 and following a jackknife procedure demonstrated a sensitivity of 85.5, a specificity of 90.3 and a CCR of 87.9 (AUC = 0.934). The three measures did not differ significantly in their ability to discriminate NC from AD participants’ longitudinal performance (AUC comparisons; Chandler vs. PCA-LR score: Z = 0.79; Chandler vs. LR: Z = 1.60; PCA-LR vs. LR: Z = 0.95).
Combination of cross-sectional and longitudinal information
The results of the binary logistic regression for both combinations [T1 and T2 as well as T1 and difference scores (T2 – T1)] for each global CERAD-NAB score are shown in Table 7. Both sets of scores were comparable in their ability to discriminate NC from AD participants according to comparisons of the corresponding areas under the curves (all Z < 1.49). While both combinations of PCA-LR scores demonstrated a higher diagnostic accuracy than PCA-LR scores based on T1 alone (n.b. areas under the curve identical for both PCA-LR combination scores; Z = 2.27, p = .02), there was only a trend for the Chandler combination scores to outperform the Chandler score for T1 (n.b. areas under the curve identical for both Chandler combination scores; Z = 1.88, p = .06), and the already excellent discriminatory power of the LR score at baseline did not improve when information from T1 and T2 was combined (Z = 1.51, p = .13).
Table 7. Discriminatory diagnostic characteristics of the Chandler score, PCA-LR score, and LR score for the combined cross-sectional and longitudinal data
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160704215413-34530-mediumThumb-S1355617710000822_tab7.jpg?pub-status=live)
Note
PCA-LR = logistic regression on principal components analysis scores; T1 = baseline; T2 = follow-up; difference = adjusted difference score (T2 – T1); CCR = correct classification rate; AUC = area under the curve; LR = logistic regression on demographically corrected (CERAD-NAB) variables, results after jackknife validation.
DISCUSSION
All global CERAD-NAB scores examined here demonstrated a comparably good ability to correctly classify NC participants and very early AD patients. Moreover, the consistency of the total score performance measures in the cross-validation analyses indicate that these measures are highly stable. The logistic regression represents an ideal method for diagnostic discrimination at the group level, and the supplementary jackknife procedure optimizes logistic regression scores for future samples. The present analyses, in which global scores were stringently compared with the logistic regression with jackknife procedure, showed that the Chandler score and the PCA-LR score both have excellent diagnostic discriminability for NC and early AD (we should note that the numerical values for the Chandler and PCA-LR scores cannot be directly compared because of the different methods of derivation and scalings). However, the relative ease of calculating the Chandler et al. score makes it a practical choice for the assessment of early Alzheimer’s disease patients with the German CERAD-NAB.
The CCRs were most likely affected by several competing factors. With respect to the change scores, stringent statistical analyses were used because the test–retest interval of the NC participants was nearly double that of the AD patients, who most likely would have shown an even more pronounced cognitive decline with an equally long test–retest interval. Thus, the relatively high CCRs between 84.2% and 87.9% most likely underestimate the potential of these scores, and thus also of the constructed combination scores, to correctly classify individuals on the basis of their change in cognitive performance over time. Similar to our global scores, Zehnder et al. (Reference Zehnder, Bläsi, Berres, Spiegel and Monsch2007) reported that individual CERAD-NAB subtest baseline and change scores both showed excellent diagnostic discriminability for NC and AD in groups who had the same test–retest intervals as in the present study. Moreover, 78% of AD patients with follow-up testing had received acetylcholinesterase inhibitor treatment between baseline and follow-up, which may also have weakened differences between the CERAD total scores of NC and AD participants. It would be interesting for further studies to quantify the potential effects of such treatments on longitudinal CERAD-NAB total score performance. AD patients available for follow-up also tended to be younger, better educated, and have higher MMSE scores than AD patients who were not followed up, factors presumably associated with higher CERAD-NAB scores and a decreased CCR. However, an additional factor may have led to an overestimation of the CCR: CERAD-NAB scores were available to clinicians during the diagnostic process. This partial circularity, which affected all total scores, most likely artificially increased their CCRs.
The combined baseline and longitudinal PCA-LR scores provided significantly greater diagnostic discriminability than baseline PCA-LR data alone, whereas the combined Chandler score information showed only a trend toward outperforming the corresponding baseline data. These findings suggest that the consideration of information from two testing sessions can be diagnostically relevant, especially in the early stages of a dementing illness or in cases where uncertainties surround the initial diagnosis (e.g., initially good test performance combined with caregiver report or clinical signs of impairments in activities of daily living). Indeed, the results of a recent study (Rosetti, Cullum, Hynan, & Lacritz, Reference Rosetti, Cullum, Hynan and Lacritz2010) support the utility of the CERAD total score to measure the progression of global neuropsychological impairment in AD.
Because the Chandler total score was also developed from NC data, it may additionally prove useful in discriminating NC individuals from those with other forms of dementia. Aebi (Reference Aebi2002) demonstrated that 7 of the 10 CERAD-NAB variables discriminated NC from patients with AD, vascular dementia, and mixed dementia with an accuracy between 81% and 86%. Moreover, a total score based on a newly developed, extended version of the German CERAD-NAB (“plus”) battery, which includes phonemic fluency (S-words; Thurstone, Reference Thurstone1938) and the TMT (Army Individual Test Battery, 1944), may prove especially useful in discriminating NC individuals from patients with subcortical forms of dementia.
The higher diagnostic discriminability (NC vs. AD) of the total score in Chandler et al.’s (Reference Chandler, Lacritz, Hynan, Barnard, Allen and Deschner2005) population (sensitivity 93.7% / specificity 92.6% vs. 92.0% / 89.1% in the present sample) may reflect differences in demographic correction formulae and the more advanced stage of AD in Chandler et al.’s population. When the original American demographic correction formula was applied to our sample, mean corrected CERAD total scores of our AD patients (71.6; SD = 9.3) were more comparable to those of Chandler et al.’s MCI sample (76.9; SD = 8.9) than their AD sample (60.2; SD = 11.9). These findings highlight the challenge of differentiating patients with (amnestic) MCI from those in a very early stage of dementia (Winblad et al., Reference Winblad, Palmer, Kivipelto, Jelic, Fratiglioni and Wahlund2004). This differentiation will depend in part on assessing complex instrumental activities of daily living as precisely as possible, although it remains unclear what these activities are and how they can be discriminated from basic activities of daily living. This is an important topic for further investigation, especially because patients with a minimal degree of symptomatology would be expected to profit from a sufficiently safe and tolerable treatment to delay the progression of the disease, for example, immunization therapy (Grimmer, Perneczky, & Kurz, Reference Grimmer, Perneczky and Kurz2008). For this reason, and in light of recent developments in amyloid β research, the early diagnosis of AD is of central importance (Forsberg et al., Reference Forsberg, Engler, Almkvist, Blomquist, Hagman and Wall2008; Pike et al., Reference Pike, Savage, Villemagne, Ng, Moss and Maruff2007). The CERAD neuropsychological assessment battery appears to be an excellent tool for this purpose.
Complex cognitive profiles composed of a large number of individual test scores can be a cumbersome way of communicating an individual’s cognitive status, especially in intervention or longitudinal studies where individual scores can show independent and in some cases divergent patterns over time. In many cases, instruments such as the ADAS-Cog have been used in addition to the MMSE (Brodaty, Corey-Bloom, Potocnik, Truyen, Gold, & Damaraju, Reference Brodaty, Corey-Bloom, Potocnik, Truyen, Gold and Damaraju2005; Rogers, Farlow, Doody, Mohs, & Friedhoff, Reference Rogers, Farlow, Doody, Mohs and Friedhoff1998; Seltzer et al., Reference Seltzer, Zolnouni, Nunez, Goldman, Kumar and Ieni2004; Tariot, Solomon, Morris, Kershaw, Lilienfeld, & Ding, Reference Tariot, Solomon, Morris, Kershaw, Lilienfeld and Ding2000). The CERAD-NAB has the advantage over the ADAS-Cog and MMSE of measuring both delayed recall and recognition from episodic memory, functions critically impaired in AD. The CERAD total score based on demographically corrected raw scores represents a practical alternative tool to assess global cognitive functioning. During repeated testing with the validated CERAD-NAB, this score allows easily communicable conclusions to be drawn about the course of cognitive functioning and, therefore, appears especially suited to intervention studies, for example, to assess the potential modification of disease progression with different kinds of pharmacotherapy.
For the German version of the CERAD-NAB, Chandler et al.’s (Reference Chandler, Lacritz, Hynan, Barnard, Allen and Deschner2005) method of calculating a total score adjusted for the influences of demographic variables can be recommended. This score is much simpler to calculate than the global CERAD-NAB score based on a principal component analysis (i.e., PCA-LR score) and provides an effective global measure of cognitive functioning.
ACKNOWLEDGMENTS
All authors report no conflicts of interest. Parts of this manuscript were presented at the International Conference on Alzheimer’s Disease, Chicago, July 2008. We gratefully acknowledge the help and support of all patients and volunteers as well as the staff of the Memory Clinic, Basel, Switzerland. In particular, we thank Ursi Kunze and Irene Täuber for their support in database management.