Published online by Cambridge University Press: 01 March 2004
We investigated the sensitivity of the P300 event-related brain potential (ERP) recorded during a memory-demanding task to memory function in subjects with dementia of the Alzheimer's type (DAT), those with mild cognitive impairment (MCI), and normal elderly controls. We also explored the ability of neuropsychological (delayed verbal memory), neuroanatomical (MRI-based hippocampal volume), and electrophysiological (memory search P300 amplitude) memory measures to distinguish between the three subject groups using discriminant function analyses. Fourteen patients with DAT, 16 with MCI, and 15 age- and education-matched controls were tested. P300 amplitude was reduced in DAT subjects at all levels of memory load; however, it did not differ between MCI and control subjects. Delayed verbal memory performance best discriminated DAT from MCI and control subjects, while delayed verbal memory and hippocampal volume best discriminated MCI subjects from controls. These results support the utility of neuropsychological and neuroanatomical measures in diagnosing dementia and do not support the notion that P300 amplitude is sensitive to mild memory dysfunction when measured using the current task. (JINS, 2004, 10, 200–210.)
Dementia of the Alzheimer type (DAT) is the most prevalent degenerative neurological disorder in the elderly and is characterized by marked changes in cognition and personality. Although there is approximately 90% concordance between diagnoses made following presentation to memory clinics and subsequent post-mortem clinico-pathological examination (Morris et al., 1988; Tierney et al., 1988), early diagnosis can be difficult. Episodic learning and memory impairments are considered the hallmark symptom (Brandt & Rich, 1995), presenting early in the disease. The main objective of this study was to evaluate the relative discriminative power of neuropsychological, neuroanatomical, and neurophysiological correlates of memory function in patients with DAT and those with mild memory loss due to mild cognitive impairment (MCI).
MCI subjects are individuals with subjective memory complaints who have evidence of mild memory loss but normal general cognitive function, as substantiated by quantitative neuropsychologic measures. The impairment is not of sufficient magnitude to warrant a diagnosis of dementia; that is, it must not result in meaningful functional impairment. Nevertheless, this population of individuals includes those considered at high risk for the development of DAT over long-term follow-up. Longitudinal studies of samples of similarly mildly impaired patients have shown that approximately one-half to two-thirds progress to a demonstrably demented state within two to five years (O'Connor, 1994; Petersen et al., 1999; Rubin et al., 1989). These persons are of great interest because they may represent a transitional state between normal cognitive aging and DAT.
Studies examining magnetic resonance imaging (MRI)-derived measures of medial temporal lobe structures have shown significant atrophy of the hippocampus and amygdala in DAT patients (Jack et al., 1992; Killiany et al., 1993). Such findings are consistent with evidence for the crucial role of the hippocampus in memory function (Milner, 1978) and with the observation that memory impairment is a defining feature of DAT (Brandt & Rich, 1995). Studies searching for similar changes in minimally impaired individuals who would meet criteria for MCI have yielded mixed results. Some studies have found cross-sectional differences in hippocampal volume between mildly impaired patients and normal controls (e.g., De Leon et al., 1993), while others have not (e.g., Murtha et al., 1994; Soininen et al., 1994).
However, it is quite possible that the very earliest brain changes in DAT (i.e., in those MCI patients who subsequently develop dementia) might not be detectable by gross structural measures such as the size of the hippocampus. It is reasonable to hypothesize that a functional abnormality of the hippocampus would be detectable prior to the appearance of observable atrophy on MRI. This study investigates whether the P300 event-related brain potential recorded during a memory-demanding task may represent a marker of hippocampal dysfunction.
Event-related brain potentials (ERPs) reflect the neuronal mass activity associated with perceptual and cognitive processing of stimulus information (Coles & Rugg, 1995). The P300 component is a member of a “family” of late-occurring scalp-recorded positivities which has been widely studied in clinical populations because it reflects aspects of cognitive processing in normal subjects (Donchin, 1981; Picton, 1992). The so-called oddball P300 is recorded during a paradigm in which subjects attend to and detect infrequent targets in a series of standard stimuli. Delayed P300 latency has been shown in DAT patients (Polich et al., 1990), elderly adults (Brown et al., 1983), and patients with depression or schizophrenia (Blackwood et al., 1987). Unfortunately, this non-specific latency abnormality limits the use of the oddball P300 diagnostically.
However, P300-like positivities can be elicited in different experimental paradigms. The Sternberg task (Sternberg, 1966) is a short-term recognition memory paradigm in which subjects memorize sets of stimuli that vary in the amount of information to be remembered. A subject's task is to decide whether subsequently presented probe stimuli had been contained in the memory set. A common finding in both healthy young adults (Adam & Collins, 1978; Pratt et al., 1989) and healthy elderly adults (Ford et al., 1979; Phillips & Hooper, 1997) is that P300 latency increases with increasing memory load. This latency increase is thought to reflect the neural processes associated with searching items in memory prior to response selection.
Variation in P300 amplitude with increasing memory load has been less commonly reported. Reduced P300 amplitude to auditory but not visual probes has been reported in patients with verbal auditory short-term memory deficits (Starr & Barrett, 1987), suggesting that P300 amplitude may be sensitive to disorders of memory function. DeToledo-Morrell et al. (1991) investigated this possibility in DAT patients (n = 6) and age-matched controls (n = 7). The memory scanning P300 was found to be differentially sensitive to the memory dysfunction of DAT patients relative to the controls. Specifically, memory scanning P300 amplitude in DAT patients decreased abnormally as the memory load increased, while oddball P300 amplitude did not differ between the two groups. DeToledo-Morrell and colleagues (1991) suggested this finding might provide a sensitive marker of memory impairment in DAT patients.
Although intriguing, these results were based on a small number of subjects. We set out to replicate and expand these results. One objective was to determine whether the memory search P300 is a marker of early memory dysfunction in patients with DAT and those with mild memory dysfunction at risk for developing DAT (i.e., MCI patients). We also studied the relationship between ERP amplitude and behavioral (neuropsychological test performance) and anatomical (MRI hippocampal volumes) measures of memory function to provide converging evidence for the proposed ERP memory abnormality. The following questions were tested: Is P300 amplitude differentially reduced in DAT patients relative to controls when memory load is stressed? Does P300 amplitude distinguish between MCI patients and healthy controls? Is there a relationship between P300 amplitude and MRI-based measures of hippocampal volume? Can P300 amplitude, hippocampal volume, and long-term memory performance be used to differentiate patients with DAT, patients with MCI, and normal controls?
MCI and DAT subjects were recruited on a physician referral basis from the Memory Clinic of the Sir Mortimer B. Davis–Jewish General Hospital (JGH), a tertiary care referral center of McGill University, Montreal. Their investigations included full medical, neuropsychological, and neuroradiological evaluations. ERP testing was conducted in addition to these standard assessments. All subjects gave informed consent for their participation. The study was approved by the Jewish General Hospital/McGill University and Concordia University Human Ethics committees.
Seventeen healthy elderly adults were recruited from the Herzl Family Medicine Clinic of the JGH and screened at the JGH Memory Clinic to ensure they had no symptoms of dementia and their neuropsychological profile was normal (Clinical Dementia Rating, CDR = 0; see also neuropsychological performance in Table 1). Two subjects were excluded from analyses due to technical difficulties during ERP testing. The final sample consisted of 15 subjects matched on age and education with the DAT and MCI patients described below (Table 1).
Table 1. Mean (and SD) of subject demographics and neuropsychological performance of all subjects
Eighteen patients diagnosed with probable DAT participated; four were excluded due to either task confusion or excessive fatigue. The remaining 14 subjects had a CDR of at least 1.0 and met the National Institute of Neurological and Communicative Disorders and Stroke (NINCDS) and the Alzheimer's Disease and Related Disorders Association (ADRDA) criteria for probable Alzheimer's Disease (McKhann et al., 1984).
Seventeen individuals identified as having mild cognitive impairment (MCI) participated; one subject was excluded due to technical difficulties during ERP testing. The remaining 16 subjects all received CDR scores of 0.5, indicating mild forgetfulness, minimal word finding difficulties, and a slight impairment in mental efficiency (Hughes et al., 1982). In all subjects, there was a reported decline (by either the individual or family) in memory function which was gradual and of at least 6 months' duration. This was documented by impaired performance (i.e., ≥1.5 SD below the normative mean) on objective neuropsychological tests with appropriate norms for age and/or education (see below). As indicated in Table 1, these subjects had documented mild memory dysfunction. None were considered to have significant impairment in activities of daily living and none met the criteria for dementia, as determined by the assessing physician in the Memory Clinic.
As part of their diagnostic clinical examination, the DAT and MCI participants underwent a neuropsychological evaluation which included the Mini-Mental State Exam (MMSE; Folstein et al., 1975), immediate and delayed verbal recall (Story A from the Logical Memory I and II subtests, respectively, of the Wechsler Memory Scale, Revised; WMS–R; Wechsler, 1987), confrontation naming (Boston Naming Test; Kaplan et al., 1983), orthographic (letters F and S) and semantic (category: animals) oral fluency in 60 s (Benton & Hamsher, 1989; Tombaugh et al., 1966, cited in Spreen & Strauss, 1998), and immediate memory (Digit Span subtest from the Wechsler Adult Intelligence Scale, Revised; WAIS–R; Wechsler, 1981).¹
¹ Although at least one other test of memory (e.g., the WMS–R Visual Reproduction subtest and the Rey Auditory Verbal Learning Test) was usually included in the diagnostic evaluation, these data were not consistently available for all participants and, therefore, were not used for the correlation and classification analyses reported below.
MRI scans were obtained on the majority (37/45) of subjects with ERP data. Scans for eight subjects were not available due to either technical problems or refusal to participate. Table 2 summarizes demographic and neuropsychological data for this MRI subsample and shows that the subject groups remained equated on age and education, and differed with respect to MMSE and Logical Memory I and II performance.
Table 2. Mean (and SD) of subject demographics and neuropsychological performance of subsamples with MRI data
Commercially available software (Neuroscan Inc.) controlled stimulus presentation and data acquisition. Stimuli were individual yellow consonants presented on a black computer screen. During a fixed-set Sternberg memory search paradigm (Sternberg, 1966) subjects were presented with a total of five memory sets, ranging in size from 1 to 5 letters. For each memory set, subjects were instructed to memorize the letter(s) which were presented for 60 s. Following presentation of a memory set, 100 single letter probe stimuli were presented one at a time for 700 ms; 24% were members of the memory set (positive probes) and the remaining probes were not (negative probes). Probe stimulus onset asynchrony was 4200 ms. Subjects were instructed to indicate as quickly and accurately as possible via button press whether or not each probe stimulus had been part of the preceding memory set. Thus, subjects were required to remember a given memory set and respond to probe stimuli over an approximate 5-min period. The order of memory set size presentation was randomized across subjects, with the exception that DAT subjects never began with a memory set size of 5. Button press responses (left or right thumb presses for positive or negative probe trials) were counterbalanced across subjects. An experimenter was present in the testing room to ensure that subjects paid attention to the task. Subjects were instructed to sit quietly and avoid blinking during the presentation of a probe stimulus.
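The block structure described above can be sketched in a few lines of Python. This is only an illustration of a fixed-set design with 100 probes, 24% of them positive; it is not the Neuroscan presentation script used in the study. The function name `make_block`, the treatment of Y as a consonant, and the random sampling of letters are all assumptions introduced for the example.

```python
import random
import string

# Assumption: Y is treated as a consonant; the article only says "consonants".
CONSONANTS = [c for c in string.ascii_uppercase if c not in "AEIOU"]


def make_block(set_size, n_probes=100, positive_fraction=0.24, rng=None):
    """Build one fixed-set block: a memory set plus a shuffled probe list.

    Returns (memory_set, probes), where probes is a list of
    (letter, is_positive) tuples.
    """
    rng = rng or random.Random()
    memory_set = rng.sample(CONSONANTS, set_size)
    foils = [c for c in CONSONANTS if c not in memory_set]

    n_positive = round(n_probes * positive_fraction)
    # Positive probes are drawn from the memory set, negative probes from the rest.
    probes = [(rng.choice(memory_set), True) for _ in range(n_positive)]
    probes += [(rng.choice(foils), False) for _ in range(n_probes - n_positive)]
    rng.shuffle(probes)
    return memory_set, probes
```

Calling `make_block(3, rng=random.Random(0))` yields a three-letter memory set and 100 probes, 24 of which are members of the set.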
Electroencephalogram (EEG) activity was recorded using tin electrodes embedded in a commercially available nylon cap from scalp locations Fz, F7, F8, Cz, T3, T4, Pz, P3, and P4 of the 10–20 system of electrode placement (Jasper, 1958), which were referenced to linked ears. Electrode impedance was ≤5 kΩ. EEG epochs were time-locked to the presentation of each probe stimulus, amplified in a 0.1–100 Hz bandwidth, and sampled at 200 Hz for 1100 ms (100 ms pre-stimulus baseline). Electro-oculogram activity (EOG) was recorded supra-orbitally and at the outer canthus of one eye; recordings were corrected off-line by regression analysis for EOG artifact (Semlitsch et al., 1986). The transmission coefficients computed for each correction were reviewed in every subject to ensure adequate and reliable correction. ERPs were averaged and scored for each subject according to probe type (positive or negative) and memory load (set size 1 to 5). Only trials on which subjects responded accurately were included in the averages. The memory scanning P300 was defined as the largest positivity occurring 300–800 ms post stimulus. Latency was measured relative to stimulus onset in ms and amplitude relative to the 100-ms pre-stimulus 0 μV baseline.
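The scoring rule above (largest positivity 300–800 ms post stimulus, measured against the 100-ms pre-stimulus baseline) can be sketched as follows. This is a minimal illustration, assuming an averaged waveform stored as a list of microvolt values sampled at 200 Hz with the first 100 ms preceding stimulus onset. The function name `score_p300` and the explicit subtraction of the mean pre-stimulus voltage are assumptions of the sketch, not a description of the scoring software actually used.

```python
SAMPLE_RATE_HZ = 200
PRESTIM_MS = 100
MS_PER_SAMPLE = 1000 / SAMPLE_RATE_HZ  # 5 ms between samples


def score_p300(epoch_uv, window_ms=(300, 800)):
    """Return (amplitude_uv, latency_ms) of the largest positivity in the window.

    epoch_uv: averaged waveform in microvolts, starting 100 ms before
    stimulus onset. Amplitude is expressed relative to the mean of the
    pre-stimulus samples; latency is relative to stimulus onset.
    """
    n_base = int(PRESTIM_MS / MS_PER_SAMPLE)
    baseline = sum(epoch_uv[:n_base]) / n_base

    best_amp, best_lat = None, None
    for i, value in enumerate(epoch_uv):
        t_ms = i * MS_PER_SAMPLE - PRESTIM_MS
        if window_ms[0] <= t_ms <= window_ms[1]:
            amp = value - baseline
            if best_amp is None or amp > best_amp:
                best_amp, best_lat = amp, t_ms
    return best_amp, best_lat
```

Peaks outside the 300–800 ms window are ignored, so an early positivity at 200 ms would not be scored as the P300 even if it were larger.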
Response accuracy was defined as correctly identifying a probe stimulus as being either a positive or negative member of the memory set. Reaction time was measured in ms from probe stimulus onset to button press response.
Since this study capitalized on the availability of radiological data obtained during subjects' clinical assessment, the MRI data were acquired in one of two locations, the Montreal General Hospital or the Montreal Neurological Institute. Scans from the Montreal General were acquired using a 1.5-Tesla system (SIGNA, GE Medical Systems, Milwaukee, Wisconsin). T1-weighted, 5 mm sagittal images were obtained through the brain using a fast gradient echo sequence (pulse sequence: TR = 300, TE = 22, number of excitations = 2, flip angle 90°; no interslice gap). T1-weighted, 4 mm, coronal images were obtained through the temporal lobes using the same pulse sequence. Sagittal and coronal images were then merged to occupy one standard voxel grid (181 × 217 × 181) with 1 mm cubic voxels to bring the scanned images into stereotaxic space (Neelin et al., 1993).
MRI scans at the Montreal Neurological Institute were acquired on a Philips Gyroscan ACS, 1.5 Tesla super-conducting magnet system. T1-weighted images were obtained using three-dimensional spoiled gradient-echo acquisition with sagittal volume excitation (TR = 18, TE = 10, flip angle 30°, 1 mm isotropic voxels, 140–180 sagittal slices). Fewer than 22% of subjects were scanned at the Montreal Neurological Institute; these were distributed equally throughout the control and DAT groups. This number was too small to analyze separately from the pool of subjects scanned at the Montreal General; however, examination of mean hippocampal volumes from both subsamples indicated the two protocols yielded comparable results.
All volumetric measurements were made using Display, an interactive 3-D imaging software package developed at the Brain Imaging Centre of the Montreal Neurological Institute (MacDonald et al., 1994). The contours of the hippocampus were measured (Watson et al., 1992) using manual segmentation by a single rater (H.P.) who was blind to group membership. In this process voxels within the hippocampus are “painted” using the three axes to help identify the boundaries of the hippocampus. Preliminary analyses revealed no significant differences between volumes of the left and right hippocampi [F(1,34) = 2.72, p = .11], and no interaction with group [F(2,34) = 0.23, p = .80]; therefore, the mean volume of the two hippocampi was computed and used in all subsequent analyses.
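With manual segmentation on a grid of 1 mm cubic voxels, a structure's volume follows directly from the number of painted voxels. The sketch below illustrates that arithmetic and the averaging of left and right hippocampi. The function names and the voxel counts in the usage note are hypothetical and are not taken from the Display software or from the study's data.

```python
def volume_mm3(n_voxels, voxel_dims_mm=(1.0, 1.0, 1.0)):
    """Volume of a segmented structure: voxel count times single-voxel volume."""
    dx, dy, dz = voxel_dims_mm
    return n_voxels * dx * dy * dz


def mean_hippocampal_volume(left_voxels, right_voxels):
    """Average of left and right hippocampal volumes, in cubic millimetres."""
    return (volume_mm3(left_voxels) + volume_mm3(right_voxels)) / 2
```

For example, hypothetical counts of 3000 and 3200 painted voxels would give a mean volume of 3100 mm³.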
Data were analyzed using one-way analyses of variance (ANOVAs) with significant differences explored using the Tukey A post-hoc test. Table 1 summarizes group neuropsychological test performance. As expected, DAT subjects had significantly worse performance on the MMSE, tests of short- and long-term memory (Logical Memory I and II, respectively), and language function (Boston Naming and Oral Fluency) when compared to controls. DAT participants were also impaired on the same tests relative to the MCI participants, with the exception of the Boston Naming Test. The performance of MCI participants differed significantly from that of normal controls on tests of short- and long-term memory function and semantic (animal) fluency.
ERP data were analyzed in a between (group) and within (memory load, probe type) mixed-factors ANOVA. Analyses were conducted on data from only the Pz electrode site to be consistent with previous reports (deToledo-Morrell et al., 1991) and with analyses conducted with MRI data, reported below. Analyses conducted with the larger array of electrodes yielded comparable results.
Figure 1 shows a significant main effect of group on P300 amplitude [F(2,42) = 5.11, p = .01]. Planned comparisons revealed that P300 amplitude was significantly reduced in DAT subjects (M = 8.8 μV, SD = 5.5) compared to controls [M = 13.0 μV, SD = 5.2; F(1,42) = 7.4, p = .009] and the MCIs [M = 12.2 μV, SD = 4.0; F(1,42) = 5.5, p = .02]; the latter two groups did not differ [F(1,42) = 0.2, p = .67]. There was a significant effect of memory load [F(4,168) = 9.1, p < .0001]. P300 amplitude was larger in Set Size 1 compared to all other set sizes and, as shown in Figure 2, this did not interact with group (ps > .1). Significantly larger P300s were elicited by positive probes (M = 12.5 μV, SD = 5.6) than negative probes [M = 10.4 μV, SD = 4.77; F(1,42) = 39.4, p < .0001]. Group did not interact with either memory load or probe type (ps > .1).
Figure 1. Grand ERP averaged waveforms recorded from the midline parietal electrode depicting the P300 at approximately 500 ms. Waveforms are presented for each subject group averaged across memory load and probe status.
Figure 2. Mean P300 amplitude plotted as a function of group and memory load (error bars denote standard error). P300 amplitude decreased with increasing memory load in all groups. Note the substantial overlap between MCI (open circles) and control subjects (closed circles).
P300 latency did not differ between groups [F(2,42) = 2.5, p > .05]. P300 was significantly earlier in Set Size 1 (M = 481 ms, SD = 99.4) relative to all other set sizes [Set Sizes 2–5: 539 ms, 531 ms, 543 ms, 542 ms; F(4,168) = 6.8, p = .0003; Tukey A test, all ps < .05]. Again, group did not interact with either memory load or probe type (ps > .1).
Reaction time and accuracy to respond to the probe stimuli were each analyzed using a mixed between (group) and within factors (memory load, probe type) ANOVA. Significant effects (ps ≤ .05) involving three or more means were further explored using the Tukey A post-hoc test (family-wise α ≤ .05).
Accuracy differed as a function of group [F(2,42) = 23.5, p < .0001]. Figure 3 (top panel) shows that, although accuracy was high, DAT patients were significantly less accurate than controls and MCIs, who did not differ from each other. Subjects were more accurate in identifying negative (M = 92%, SD = 11.1) than positive probes (M = 86%, SD = 14.7), probe type main effect [F(1,42) = 22.4, p < .0001]. Group did not interact with either within factor.
Figure 3. Recognition accuracy (top panel) and reaction time (bottom panel) to probe stimuli during the ERP recognition memory task. Error bars denote standard deviations.
Reaction time also differed as a function of group [F(2,42) = 10.8, p < .0001]. Figure 3 (bottom panel) shows that DAT patients were significantly slower in responding to probe stimuli than controls and MCIs, who did not differ from each other. Subjects responded more quickly to negative probes (M = 801 ms, SD = 306) than positive probes (M = 835 ms, SD = 312), probe type main effect [F(1,42) = 7.7, p = .009]. Reaction times were longer with increasing memory load [F(4,168) = 16.1, p < .0001]. Group did not interact with either within factor.
As shown in Table 2, hippocampal volumes were significantly smaller in DAT and MCI subjects compared to controls. Pearson product-moment correlations were conducted to evaluate the relationship between the neuropsychological, neuroanatomical, and electrophysiological memory indices. Across the three subject groups, there was a significant positive correlation between hippocampal volume and memory scanning P300 amplitude [r(35) = .41, p = .01], and between hippocampal volume and Logical Memory II performance [r(35) = .44, p = .01]. There was no significant relationship between P300 amplitude and Logical Memory II performance [r(35) = .20, p > .05].
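The significance of the correlations reported above can be checked by converting each r to a t statistic with df = n − 2, using t = r·sqrt(df / (1 − r²)). The sketch below does this for the three reported coefficients. The critical value of 2.030 (two-tailed α = .05, df = 35) is taken from standard t tables and is an assumption of this illustration, not a figure reported in the article.

```python
import math

# Assumption: two-tailed critical t for alpha = .05 and df = 35, from standard tables.
T_CRITICAL_DF35 = 2.030


def r_to_t(r, df):
    """Convert a Pearson correlation to its t statistic with df = n - 2."""
    return r * math.sqrt(df / (1.0 - r * r))


reported = {
    "hippocampal volume vs. P300 amplitude": 0.41,
    "hippocampal volume vs. Logical Memory II": 0.44,
    "P300 amplitude vs. Logical Memory II": 0.20,
}

for label, r in reported.items():
    t = r_to_t(r, df=35)
    verdict = "exceeds" if abs(t) > T_CRITICAL_DF35 else "does not exceed"
    print(f"{label}: t(35) = {t:.2f}, {verdict} the critical value")
```

Under these assumptions the first two coefficients exceed the critical value and the third does not, which matches the pattern of significance reported in the text.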
A discriminant function analysis was performed to determine whether electrophysiological, neuropsychological, and structural measures could be used to accurately distinguish between control, MCI, and DAT cases. Specifically, mean P300 amplitude at Pz was employed as the electrophysiological measure because P300 amplitude is maximal at this scalp location (Picton, 1992). Performance on the Logical Memory II subtest was employed because long-term memory is the most salient and reliable neuropsychological deficit in early DAT. Mean hippocampal volume was used as the structural measure. Initial inspection of the data revealed no multivariate outliers (p > .05) on the three variables.
There was statistically significant discrimination between the three groups on the basis of these predictors [F(6,64) = 11.52, p < .001]. After adjustment for the other two predictor variables, only Logical Memory II performance significantly separated the three groups [F(2,32) = 27.1, p < .05], with non-significant contribution from mean hippocampal volume [F(2,32) = 2.89, p < .1]. The contribution from P300 amplitude did not approach significance [F(2,32) = 1.23, p > .1]. Logical Memory II performance accounted for 79% of the group membership variance (squared semipartial correlation). Seventy-four percent of subjects were correctly classified by the discriminant function, based on a jackknife procedure which reduced bias in classification. As shown in Table 3, classification was most accurate for DAT subjects (91%), intermediate for controls, and poorest for the MCI group. As would be expected, misclassified cases were assigned to the more proximal group in terms of neurological integrity (e.g., the misidentified DAT case was wrongly assigned to the MCI group, not to the Control group).
Table 3. Jackknife classification of subjects using three-factor discriminant function analysis
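The jackknife (leave-one-out) procedure referred to above classifies each subject using a rule built from all remaining subjects, so that no case contributes to its own classification. The sketch below illustrates only that resampling logic. It swaps in a simple nearest-centroid rule in place of the discriminant function analysis used in the study, and the function names and toy feature values are hypothetical. In practice, predictors measured in different units would need to be standardized first.

```python
def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def nearest_group(x, centroids):
    """Label of the group whose centroid is closest to x (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda g: sq_dist(centroids[g]))


def jackknife_accuracy(samples):
    """Leave-one-out classification accuracy.

    samples: list of (group_label, feature_vector) pairs. Each case is
    classified from centroids computed without that case.
    """
    hits = 0
    for i, (label, x) in enumerate(samples):
        training = samples[:i] + samples[i + 1:]
        groups = {}
        for g, feats in training:
            groups.setdefault(g, []).append(feats)
        centroids = {g: centroid(rows) for g, rows in groups.items()}
        hits += nearest_group(x, centroids) == label
    return hits / len(samples)


# Toy, made-up standardized scores for two well-separated groups.
toy = [
    ("control", [1.0, 1.0]), ("control", [1.2, 0.8]), ("control", [0.9, 1.1]),
    ("DAT", [-1.0, -1.0]), ("DAT", [-1.1, -0.9]), ("DAT", [-0.8, -1.2]),
]
print(jackknife_accuracy(toy))
```

Because each held-out case is excluded from the centroids, the resulting accuracy is less optimistic than one obtained by classifying the same cases used to build the rule.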
The same three predictors were evaluated in a second discriminant function analysis to determine their ability to discriminate MCI subjects from controls. A significant discriminant function was obtained [F(3,22) = 5.9, p < .005]. With all three variables forced into the function, only Logical Memory II performance [F(1,22) = 7.0, p < .05] and mean hippocampal volume [F(1,22) = 5.7, p < .05] significantly discriminated the two groups. The squared semipartial correlation of the grouping variable with Logical Memory II performance was 18% and with hippocampal volume it was 15%. The jackknife procedure based on the discriminant function correctly classified 75% of the MCI and 64% of the Control subjects.²
² A logistic regression analysis conducted using Logical Memory II performance and hippocampal volume alone confirmed this result by yielding a significant model which correctly classified 75% of the MCI and 93% of the control participants.
Before elaborating the main findings, it is important to review the neuropsychological profiles of our subject samples. The DAT subjects were impaired in memory and language function relative to the normal controls. The MCI subjects had demonstrable memory deficits which were less severe than those of the DAT subjects. Moreover, the MCI subjects did not differ reliably from the controls in non-memory cognitive functions, with the exception of semantic fluency. Thus, the neuropsychological profiles of our samples appear representative of those reported in the literature (Flicker et al., 1991; Petersen et al., 1999).
It is also necessary to comment on the extent to which our basic electrophysiological data are consistent with findings in the literature. As stated in the Introduction, previous research has tended to focus on P300 latency findings in dementia samples using the oddball paradigm, as opposed to the short-term memory search task used here. We found delays in P300 latency with increasing memory load (Set Size 1 vs. all other memory loads) in all three groups, but no group differences in P300 latency. In this respect, our data accord well with other memory-search studies which also found delayed latencies with increasing set size (Adam & Collins, 1978; Ford et al., 1979; Pratt et al., 1989; Starr & Barrett, 1987). It is true that we did not find between-group differences in memory scanning P300 latency as other studies have found using the oddball paradigm (e.g., Goodin et al., 1978; Polich et al., 1990; although see Kraiuhin et al., 1990, and Patterson et al., 1988, for other negative findings). However, there are notable differences between the visual memory scanning task we employed and the typical oddball paradigm, including differences in modality, probability of the matching stimulus, and the underlying psychological processes. The average P300 latency for elderly control subjects during Set Size 1 in this study was 470 ms, which is substantially later than averaged latencies observed in standard oddball tasks (cf. Goodin et al., 1978; Kraiuhin et al., 1990; Patterson et al., 1988). Our latency data are in keeping with those reported by deToledo-Morrell et al. (1991), who also did not find that P300 latency discriminated their groups. Taking our latency and amplitude findings together, it appears that memory search processes have comparable timing in memory-impaired patients and age-matched controls, but differ in the extent to which they are evoked.
Brief comments on the RT and accuracy data are also warranted. Accuracy was generally high, showing that even the DAT participants understood the task. The expected prolongation of RT with increases in set size was observed and this effect was consistent across groups. Unexpectedly, we found that mean RT was faster for negative probes than positive probes, which stands in contrast to the more common facilitation for positive responses (Sternberg, 1966, 1975). However, this facilitation is not always observed (Boaz & Denny, 1993; Ford et al., 1979). The fact that we employed a fixed-set version of the task with relatively infrequent positive probes may also explain this finding, as this has been shown to reduce facilitation for positive probes (Sternberg, 1975).
We now turn to evaluating the three major goals of this study, which will be discussed in sequence. Our first goal was to determine if memory scanning P300 amplitude is a sensitive indicator of memory dysfunction in patients with DAT and MCI. Our results partially replicate those of deToledo-Morrell and colleagues (1991) in that P300 amplitude was significantly reduced in DAT subjects relative to controls. However, this was a group main effect observed at all levels of memory load and the group differences were not more pronounced when the memory load increased. It is not clear why we did not fully replicate the results of deToledo-Morrell et al. (1991). With respect to their DAT patients, we employed a larger sample size, our patients were similar to theirs in terms of age and education, and our patients had clearly demonstrated memory deficits. Moreover, both studies employed similar recording methods and experimental tasks. Nevertheless, our findings are more consistent with those of Swanwick and colleagues (1997) who also did not observe a specific memory-load decrease in P300 amplitude in mild DAT patients. Given that both this study and Swanwick et al. (1997) used larger sample sizes, the results do not support the hypothesis that P300 amplitude provides a more sensitive marker of hippocampal dysfunction in memory impaired patients when memory demands are increased.
P300 amplitude did not differ between MCI subjects and controls in this study, nor was a group difference evident when P300 was measured during larger memory loads. This is consistent with the behavioral data obtained during this short-term recognition memory task. The DAT subjects were slower and less accurate than controls and MCI subjects, but the performance of these latter groups did not differ. We used the present task to replicate previous methods and to determine whether a task which differentiates DAT patients from controls could do so for MCI subjects. Inspection of Figures 1 and 2 does not provide any evidence for the sensitivity of P300 amplitude in distinguishing MCI subjects from controls. However, this short-term recognition memory test may have been either too simple to reveal memory dysfunction in the MCI patients or may not have targeted appropriate aspects of memory function. ERP activity elicited during working memory or long-term episodic memory tasks might be more sensitive to dysfunction in MCI patients. With respect to the former, a task in which the memory set changed from trial to trial, or an n-back task, might have tapped into working memory processing and been more sensitive to deficits in MCI patients (Perry & Hodges, 1999). Alternatively, Olichney et al. (2002) found evidence that ERP measures of verbal recognition memory may be useful in identifying MCI subjects who subsequently develop dementia.
Our second goal was to evaluate converging evidence for the ERP memory marker. We observed a significant positive relationship between memory scanning P300 amplitude and hippocampal volume. This is consistent with findings from intracerebral recordings (Halgren et al., 1998; McCarthy et al., 1989) and studies of patients with medial temporal lobe pathology (Puce et al., 1989) which indicate that hippocampal and medial temporal lobe areas contribute significantly to oddball P300 activity (note, however, not all studies support this conclusion; e.g., Polich & Squire, 1993; Rugg et al., 1991). Our results indicate that there is a similar relationship between hippocampal size and memory scanning P300 amplitude.
The third goal was to determine whether a combination of electrophysiological, neuropsychological, and neuroanatomical measures would be able to discriminate between the DAT, MCI, and control groups. In the three-group analysis, only 74% of all subjects were correctly classified. Performance on the Logical Memory II subtest was the only significant predictor of group membership and neither hippocampal volume nor P300 amplitude were significant predictors once the contribution of the delayed verbal recall memory score was taken into account. Classification was very accurate for the DATs and poorest for the MCI subjects, 25% of whom were misclassified as controls and 17% as DATs. Better discrimination of the MCI subjects from the controls was achieved with a discriminant function computed for these two groups alone. In this case, both delayed verbal recall and mean hippocampal volume contributed significantly to the differentiation between groups. We are following these patients longitudinally; however, only two MCI subjects have progressed to dementia, a number too few to analyze separately. One cannot argue that the lack of sensitivity of the memory scanning P300 amplitude is due to the inclusion of a large proportion of MCI participants with non-progressing cognitive deficits, since episodic memory and hippocampal volume measures did discriminate this sample of MCIs from their normal controls. It may be that P300 amplitude can identify the MCI patients who decline to dementia, but the present data do not speak to this issue.
Several studies have demonstrated the utility of neuropsychological batteries which include tests of memory for distinguishing DAT subjects from elderly controls (Petersen et al., 1994; Storandt et al., 1984; Tierney et al., 1987). Moreover, psychometric memory performance has been shown to be a significant predictor of the development of DAT in minimally impaired subjects (Albert et al., 2001; Tierney et al., 1996; Tuokko et al., 1991). Studies have indicated that MRI measures of medial temporal lobe structures also can distinguish DAT subjects from controls (Erkinjuntti et al., 1993; Jack et al., 1992; Killiany et al., 1993). However, with the following exceptions, very few studies have compared the relative discriminative power of anatomical brain measures against neuropsychological measures directly and none have included electrophysiological data. Jack et al. (1999) found that hippocampal volume was a significant predictor of conversion to dementia in MCI patients even after controlling for neuropsychological performance using bivariate models; however, they did not evaluate the increase in predictive power of hippocampal volume measures over and above the use of free recall memory scores or against more than one neuropsychological measure at a time. De Leon et al. (1993) found that MRI-based measures of perihippocampal dilation were better predictors of decline to dementia in initially non-demented memory impaired subjects than was performance on a neuropsychological battery which included several memory measures. In contrast, Visser et al. (1999) showed that memory performance was a better predictor of the development of dementia in mildly impaired subjects than MRI-based measures of medial temporal lobe atrophy. Laakso et al. (2000) also reported superior discrimination between subjects with DAT and normal controls using a measure of delayed visual recall than that afforded by hippocampal volume measurements.
Our data suggest that performance on tests of long-term memory is a better predictor of group membership than the neuroanatomical measure and is notably better than the electrophysiological measure. However, we concentrated on hippocampal volume alone and may have obtained more powerful results by including measures of medial temporal lobe atrophy. Recent studies have shown that MRI measures of entorhinal cortex volume are more sensitive than hippocampal volume in identifying pre-clinical DAT (e.g., Dickerson et al., 2001; Killiany et al., 2002). Also, because our data were derived from the clinical examination of our patients, we were limited to using Logical Memory scores. Logical Memory performance was used as part of the larger diagnostic battery which may have positively influenced its classification rate, although diagnosis was made on the basis of a full neuropsychological evaluation (which included other memory measures) and a clinical medical and neurological assessment. Moreover, our results are in agreement with the literature on the sensitivity of neuropsychological measures of memory to group differences.
Although this study had relatively small sample sizes, it was unique in its evaluation of electrophysiological, neuroanatomical, and neuropsychological measures of memory in healthy elderly subjects, patients with DAT, and patients at risk for developing DAT. Our study does not provide evidence for a selective decrease in P300 amplitude under increasing memory load in patients with DAT, at least as recorded in this short-term recognition memory task. Amplitude reductions with increasing memory loads were observed in all three groups studied, but this effect did not differentiate MCI participants from controls and did not contribute to group discrimination above and beyond that afforded by long-term verbal memory performance and hippocampal volume estimates. It is likely that tasks requiring learning and retrieval from long-term memory or tasks that stress working memory are necessary to reveal electrophysiological deficits in subjects with mild memory dysfunction at high risk for developing Alzheimer's disease. In the meantime, this study and the extant literature provide strong support for the relatively greater sensitivity of neuropsychological memory testing over more technologically sophisticated methods in the diagnosis of MCI and DAT.
The authors are grateful to S. Solomon for help in subject recruitment, to the staff of the Jewish General Hospital Memory Clinic for their co-operation, and to the subjects for their participation. We are grateful to G. Whitaker and the Division of Clinical Neurophysiology, Department of Neurology, Jewish General Hospital for sharing of facilities. This work was supported by grants to N.A. Phillips from the Concordia University Faculty Research Development Program and the Alzheimer Society of Montreal, by grants to H. Chertkow from the Alzheimer Society of Canada, the Fonds de la Recherche de Santé en Québec, and the Medical Research Council of Canada, and by a post-doctoral fellowship awarded to S. Murtha from the Alzheimer Society of Canada.
Preliminary versions of these data were reported at the Rotman Research Institute 8th Annual Conference, March 19–20, 1998, Toronto, Ontario, and at the Seventh Cognitive Aging Conference, April 23–26, 1998, Atlanta, Georgia.
Mean (and SD) of subject demographics and neuropsychological performance of all subjects
Mean (and SD) of subject demographics and neuropsychological performance of subsamples with MRI data
Grand ERP averaged waveforms recorded from the midline parietal electrode depicting the P300 at approximately 500 ms. Waveforms are presented for each subject group averaged across memory load and probe status.
Mean P300 amplitude plotted as a function of group and memory load (error bars denote standard error). P300 amplitude decreased with increasing memory load in all groups. Note the substantial overlap between MCI (open circles) and control subjects (closed circles).
Recognition accuracy (top panel) and reaction time (bottom panel) to probe stimuli during the ERP recognition memory task. Error bars denote standard deviations.
Jackknife classification of subjects using three-factor discriminant function analysis