Introduction
A recent National Institute on Aging and Alzheimer's Association workgroup articulated the need to better define preclinical Alzheimer's disease (AD) for future clinical research studies (Sperling et al., 2011). Because some AD risk factors are potentially addressable (Hendrie et al., 2006), early identification of persons with preclinical AD may help to prevent or delay disease onset. The present study explores the use of translational computational approaches in cognitive testing to identify differences between populations at high and low risk of AD.
Cognitive Testing in Preclinical Alzheimer's Disease
Longitudinal cognitive testing has been used to identify asymptomatic persons thought to be at risk for AD. The Framingham Study, the Baltimore Longitudinal Study of Aging and the Nun Study described cognitive changes 10 or more years before disease onset in verbal learning, abstract reasoning, visual learning, and verbal narratives (Elias et al., 2000; Kawas et al., 2003; Snowdon et al., 1996). A meta-analysis of cognitive changes in preclinical AD showed statistical differences in areas such as episodic memory (Bäckman, Jones, Berger, Laukka, & Small, 2005). However, the lowest mean age of participants in these studies was 62 years and a third of the subjects were 75 years or older at the time of baseline testing. Current models suggest that AD pathogenesis may begin as early as 20 years before diagnosis (Braak & Braak, 1990); consequently, the search for subtle cognitive changes that may correlate with preclinical AD pathology has assumed a new importance (Sperling et al., 2011).
Previous work has focused on list learning tests for differentiating cognitively normal persons from those with AD. Although summary scores from such tests may differentiate Mild Cognitive Impairment (MCI) or AD from normal aging (Grundman et al., 2004; Welsh, Butters, Hughes, Mohs, & Heyman, 1991), additional performance measures that reflect specific aspects of the learning process have also proven useful. On lists composed of unrelated words, such as the Rey Auditory Verbal Learning Test (AVLT) (Lezak, Howieson & Loring, 2004; Rey, 1964), differences were noted between persons with AD and controls on measures including serial position effects and subjective organization. Persons with AD, even at mild stages, disproportionately recall words from the end of a supraspan list (“recency effect”) compared to words at the beginning of the list (“primacy effect”) (Capitani, Della Sala, Logie, & Spinnler, 1992). These differences, which may signal compensation for weak consolidation in episodic memory processing, are consistent with compromised function of the hippocampus (Hermann et al., 1996) and other mesial temporal brain regions, which is characteristic of this disease (Jack et al., 1999; Killiany et al., 2000). Persons with AD also are less likely to impose and consistently use subjective organization procedures, that is, combinations of words remembered together in subsequent trials (Ramakers et al., 2008).
Subjective organization reflects an idiosyncratic organization imposed by the individual, instead of responses to obvious semantic relationships (Bousfield & Bousfield, 1966; Tulving, 1962). The active organization that this requires may involve a combination of executive function (for example, working memory) and semantic memory. As a result, AD-related reductions in subjective organization may stem from changes in the prefrontal cortex (Ramakers et al., 2010) and/or temporoparietal regions (Wolk, Dickerson, & Alzheimer's Disease Neuroimaging Initiative, 2011).
Supplementary learning measures have also been studied in non-demented persons who were at increased risk of developing AD. Regarding serial position effects, Howieson et al. (2010) observed an intermediate level of primacy during AVLT testing for persons diagnosed with MCI compared to cognitively normal controls and persons with AD. La Rue et al. (2008) showed a small but statistically significant serial position effect in a middle-aged sample where asymptomatic persons with a parental family history of AD exhibited reduced primacy effects on the AVLT compared to controls without a parental family history of AD. Ramakers et al. (2010) measured subjective organization in the AVLT and found less subjective organization for patients diagnosed with MCI who progressed to AD than for patients diagnosed with MCI who did not progress to AD.
Family History and AD Risk
Epidemiologic research indicates that a first-degree family history of AD increases risk of developing the disease (Cupples et al., 2004; Lautenschlager et al., 1996). Accumulating laboratory and neuroimaging evidence shows that preclinical indicators of AD are disproportionately evident among asymptomatic relatives of AD patients (Bassett et al., 2006; Foldi, Brickman, Schaefer, & Knutelska, 2003; Johnson et al., 2006; La Rue et al., 2008; Mosconi et al., 2007; van Exel et al., 2009; van Vliet et al., 2009; Xu et al., 2009). Some studies reported family history interactions with the APOE genotype, whereas others reported independent family history effects, suggesting the presence of an unknown gene(s) that may be responsible. These findings suggest that asymptomatic children of persons with AD may be a particularly valuable cohort for prospective studies of preclinical AD (Jarvik et al., 2008).
Motivation and Purpose of this Study
In this study, we constructed a new scoring measure to differentiate asymptomatic persons with versus without a parental family history of AD. Our motivation for developing a new measure grew from the observation that primacy and subjective organization measures are computed over discrete binnings, that is, partitions, of the word list, ignoring the actual order of recall. This is shown in Figure 1. Although the sequence of recall is opposite for 1a) and 1b), the primacy score is the same. Additionally, although recall pairs 1a) and 1b) appear less similar than recall pairs 1b) and 1c), the subjective organization scores are the same. As these examples illustrate, the order of recall provides significant information about individual differences not captured by serial position or subjective organization measures.
Our long-term goal is to determine whether a machine learning framework can help identify individual patients with preclinical AD. Toward this aim, we first attempted to amplify the separation between high risk (family history of AD) and low risk (no family history of AD) populations. The main issue is whether more nuanced neuropsychological analyses that move beyond standard descriptive measures can uncover hidden signals that better separate these populations. We examine the benefits obtained from combining the pre-existing and new measures of AVLT performance to create a more informative aggregate measure. Our approach is novel in the clinical neuropsychology literature because it combines methods drawn from statistics and machine learning to evaluate psychometric testing. We demonstrate that this finer-grained analysis and the concept of combining measures for AVLT data yield far greater separation between family history and control populations than traditional clinical analytic approaches.
Methods
Participants
Participants were English-speaking middle-aged adults (40 to 65 years at baseline) enrolled in the Wisconsin Registry for Alzheimer's Prevention (WRAP), a prospective longitudinal study which began in 2001 (see Sager, Hermann, & La Rue, 2005). Family-history participants (FH+) had at least one parent with autopsy-confirmed or probable AD defined by NINCDS-ADRDA criteria (McKhann et al., 1984). Control participants (FH−) had mothers surviving to at least 75 years and fathers to at least 70 years without Alzheimer's disease, other dementia or significant memory deficits. The study protocol was approved by the institutional review board at the University of Wisconsin Hospital and Clinics and written informed consent was obtained from each participant.
Procedures
Clinical measures, health history, neuropsychological testing and laboratory tests such as APOE ε4 genotyping (Athena Diagnostics, Worcester, MA and the Laboratory for Endocrinology, Aging, and Disease, William S. Middleton Memorial Veterans Hospital, Madison, WI) were collected for all WRAP participants (see Sager et al., 2005, for greater detail). Multiple measures in the WRAP neuropsychological battery were reduced using factor analysis to six weighted, standardized summary scores (see Appendix A and Dowling, Hermann, La Rue, & Sager, 2010, for a description of factor procedures).
The AVLT, collected at baseline assessment, served as the primary neuropsychological measure for the present analyses. This test involves recalling 15 unrelated nouns across 5 learning trials. Repetitions or intrusions were not included in the recall sequence.
AVLT Measures
One of the most commonly reported AVLT scores is the total number of words recalled over the five learning trials. We examined two other standard AVLT measures and introduced a novel, more effective approach. To provide examples of how these measures were computed, we represented recall as a 15-element vector (i.e., ordered list). Each number represented the position in the order of presentation from the original list and the position of each element in the vector was the position in the order of recall. In other words, position one in the vector was the first word recalled while the number in that position was the word's location in the original list. We filled in zeros at the vector positions (up to the 15th element) after the last word that was recalled. Below are two examples of recall: trial a = (1,2,3,4,0,0,0,0,0,0,0,0,0,0,0) and trial b = (8,7,4,3,1,2,6,0,0,0,0,0,0,0,0). For instance, in trial b, word 8 in the original list was recalled first.
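The vector encoding described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `encode_recall` and the placeholder words `w1` to `w15` are hypothetical and are not the AVLT items or the authors' code. Repetitions and intrusions are skipped, as stated in the Procedures.

```python
from typing import List, Sequence

LIST_LENGTH = 15  # the AVLT list contains 15 words


def encode_recall(presented: Sequence[str], recalled: Sequence[str]) -> List[int]:
    """Return a 15-element vector of 1-based list positions, in order of recall.

    Hypothetical helper: intrusions (words not on the list) and repetitions
    are skipped, and unused trailing slots are filled with zeros.
    """
    position = {word: idx + 1 for idx, word in enumerate(presented)}
    vector: List[int] = []
    for word in recalled:
        pos = position.get(word)  # None when the word is an intrusion
        if pos is None or pos in vector:
            continue  # skip intrusions and repeated words
        vector.append(pos)
    return vector + [0] * (LIST_LENGTH - len(vector))


# Placeholder words (assumption), standing in for the 15 presented nouns.
words = [f"w{n}" for n in range(1, 16)]
trial_b = encode_recall(words, ["w8", "w7", "w4", "w3", "w1", "w2", "w6"])
print(trial_b)  # [8, 7, 4, 3, 1, 2, 6, 0, 0, 0, 0, 0, 0, 0, 0]
```

The printed vector reproduces trial b from the text: word 8 of the original list was recalled first.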
Serial position primacy
Serial position effects were calculated for the five learning trials. Primacy is defined as the proportion of the first four words from the list that were recalled (Foldi, Kneutelska, Winnick, Dahlman, & Andreeva-Cook, 2005; Hermann et al., 1996; La Rue et al., 2008). In the examples given above, the primacy score was 1.0 for both sequences. The middle region and recency were not examined because primacy was the only serial position score showing significant family history group differences in previous analyses (La Rue et al., 2008).
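As a minimal sketch of the primacy definition above, the following Python function computes the proportion of the first four list positions that appear anywhere in a recall vector. The function name `primacy` and the `region_size` parameter are illustrative assumptions.

```python
from typing import Sequence


def primacy(recall_vector: Sequence[int], region_size: int = 4) -> float:
    """Proportion of the first `region_size` list positions that were recalled."""
    region = set(range(1, region_size + 1))  # list positions 1..4
    return len(region.intersection(recall_vector)) / region_size


# The two example trials from the text, zero-padded to 15 elements.
trial_a = [1, 2, 3, 4] + [0] * 11
trial_b = [8, 7, 4, 3, 1, 2, 6] + [0] * 8
print(primacy(trial_a), primacy(trial_b))  # 1.0 1.0
```

Both trials score 1.0 because the score depends only on which of the first four words were recalled, not on when they were recalled.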
Subjective organization
A paired frequency measure of subjective organization (Bousfield & Bousfield, 1966; Ramakers et al., 2008; Tulving, 1962) was used. A higher subjective organization score indicates that more word pairs are recalled together in two sequential trials. We calculated this for the first five trials (i.e., trial 1 and trial 2, trial 2 and trial 3, trial 3 and trial 4, trial 4 and trial 5), which provided four measures. (See Appendix B for examples of subjective organization calculation.)
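The idea behind this score can be illustrated with a simplified sketch that counts the adjacent word pairs two trials have in common. This is not the formula from Appendix B: it assumes that a pair counts in either order of recall, and it omits any correction for the number of pairs expected by chance. The function names are hypothetical.

```python
from typing import FrozenSet, Sequence, Set


def adjacent_pairs(recall_vector: Sequence[int]) -> Set[FrozenSet[int]]:
    """Unordered pairs of list positions recalled next to each other."""
    recalled = [p for p in recall_vector if p != 0]  # drop zero padding
    return {frozenset(pair) for pair in zip(recalled, recalled[1:])}


def shared_pair_count(trial_x: Sequence[int], trial_y: Sequence[int]) -> int:
    """Raw count of adjacent pairs present in both trials (no chance correction)."""
    return len(adjacent_pairs(trial_x) & adjacent_pairs(trial_y))


trial_a = [1, 2, 3, 4] + [0] * 11
trial_b = [8, 7, 4, 3, 1, 2, 6] + [0] * 8
print(shared_pair_count(trial_a, trial_b))  # 2
```

Here the shared pairs are words 1 and 2 (same order in both trials) and words 3 and 4 (reversed order in trial b).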
Euclidean distance measure
A new measure to investigate AVLT recall strategy was derived from machine learning, a major subfield of computer science. One aspect of machine learning is the design of algorithms to classify new examples using previously collected data. For example, given a previously unencountered person, can we predict if they are at high risk of AD? A basic task is to select or engineer features that separate the classes of interest. This is a classic problem in machine learning and statistics known as feature selection (Guyon & Elisseeff, 2003). Creating novel features that accentuated the difference between FH+ (high risk of AD development) and FH− (low risk of AD development) was the motivation for formulating the new approach to extract data from AVLT performance.
We constructed a new measure using the Euclidean distance between trials i and i + 1, defined as:
$$d(t_i, t_{i+1}) = \sqrt{\sum_{j=1}^{15} \left(t_{i,j} - t_{i+1,j}\right)^2}$$
where $t_{i,j}$ was position j from trial i and $t_{i+1,j}$ was position j from trial i + 1. A higher Euclidean distance measure indicates greater word order variability between two recall trials. For example, the Euclidean distance between recall a and b is 10.8. (See Appendix B for Euclidean distance example calculation.) We calculated the measure between sequential trials from the five learning trials, which provided four measures (see Footnote 1).
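The worked value of 10.8 can be checked with a short Python sketch of the Euclidean distance between two zero-padded recall vectors. The function name `euclidean_distance` is an illustrative choice, not the authors' code.

```python
import math
from typing import Sequence


def euclidean_distance(trial_x: Sequence[int], trial_y: Sequence[int]) -> float:
    """Euclidean distance between two recall vectors of equal length."""
    if len(trial_x) != len(trial_y):
        raise ValueError("recall vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(trial_x, trial_y)))


# Example trials a and b from the AVLT Measures section.
trial_a = [1, 2, 3, 4] + [0] * 11
trial_b = [8, 7, 4, 3, 1, 2, 6] + [0] * 8
print(round(euclidean_distance(trial_a, trial_b), 1))  # 10.8
```

The squared differences sum to 117 (49 + 25 + 1 + 1 + 1 + 4 + 36), and the square root of 117 is approximately 10.8, in agreement with the text.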
Metric combination
We constructed a new aggregate measure:
$$A_{i,j,k} = \beta_1 \hat{P}_i + \beta_2 \hat{S}_j + \beta_3 \hat{E}_k$$
where $\hat{P}_i$ was the normalized primacy score on trial i, $\hat{S}_j$ was the normalized score of subjective organization between trials j and j + 1, and $\hat{E}_k$ was the normalized Euclidean distance between trials k and k + 1. Trials i, j, and k may or may not be distinct. We normalized each measure to lie in [0, 1] to eliminate arbitrary scaling differences in their scoring methodology. For example, the maximum value for primacy is 1.0, whereas the maximum Euclidean distance is approximately 41.53. The parameters $\beta_1$, $\beta_2$, and $\beta_3$ were set to $\beta_1 = \beta_2 = \beta_3 = 1$ (see Appendix C for parameter selection and its validity). This yielded a measure of:
$$A_{i,j,k} = \hat{P}_i + \hat{S}_j + \hat{E}_k$$
Using this method with $\beta_1 = \beta_2 = \beta_3 = 1$ to obtain the aggregate measure, we tested all combinations of significant measures (p < .05) identified in the single measures’ analysis of variance (ANOVA). As justified in Appendix C, we chose i = 1, j = 1, and k = 3. (See Appendix B for a sample calculation.) We also tested aggregate measures with only two significant measures. This included the three pairs: primacy and Euclidean measure, primacy and subjective organization, and Euclidean measure and subjective organization. The final aggregate measure chosen was:
$$A_{1,1,3} = \hat{P}_1 + \hat{S}_1 + \hat{E}_3$$
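A minimal sketch of the aggregate score follows: each component is rescaled to [0, 1] and the three are summed with unit weights. The rescaling method is an assumption here (min-max scaling across participants), since the text states only that each measure was normalized to lie in [0, 1]. The three-participant values are made up purely for illustration, and the function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def min_max_normalize(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1] across participants (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # no spread: map every value to 0
    return [(v - lo) / (hi - lo) for v in values]


def aggregate(
    primacy_t1: Sequence[float],
    so_t1_2: Sequence[float],
    dist_t3_4: Sequence[float],
    betas: Tuple[float, float, float] = (1.0, 1.0, 1.0),
) -> List[float]:
    """Weighted sum of the three normalized measures, one score per participant."""
    p = min_max_normalize(primacy_t1)
    s = min_max_normalize(so_t1_2)
    e = min_max_normalize(dist_t3_4)
    b1, b2, b3 = betas
    return [b1 * pi + b2 * si + b3 * ei for pi, si, ei in zip(p, s, e)]


# Made-up values for three hypothetical participants (not study data).
scores = aggregate([0.25, 0.5, 1.0], [0.0, 1.0, 2.0], [5.0, 10.0, 15.0])
print(scores)
```

With unit weights the score ranges from 0 to 3, so no single component dominates simply because of its original scale.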
Statistical Analysis
We examined the five trial total score (total words recalled), the primacy effect on each of the five learning trials, and the subjective organization and Euclidean measures between sequential pairs of trials from the five learning trials. Therefore, we had five measures for primacy while there were four measures for subjective organization and Euclidean measure. We calculated the mean and standard deviation in each trial for primacy and in each sequential trial pair for subjective organization and Euclidean measure. Type III sum of squares ANOVA was performed with family history, APOE ε4, age, sex, and education level as predictors and with the measures as the response variable. The same ANOVA analysis was performed with all aggregate measures. We also calculated the pairwise Pearson's correlation coefficients of the individual measures.
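The pairwise correlation step can be sketched with a plain-Python Pearson coefficient. This is an illustrative implementation with made-up scores, not the statistical software used in the study, and the function name `pearson_r` is an assumption.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


# Made-up scores for five hypothetical participants (not study data).
primacy_t1 = [0.25, 0.50, 0.75, 1.00, 0.50]
euclid_t3_4 = [12.0, 9.5, 14.0, 8.0, 11.0]
print(round(pearson_r(primacy_t1, euclid_t3_4), 3))
```

Applying the function to every pair of individual measures gives the kind of correlation matrix reported for the significant trial-specific measures.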
Comparison of p Values in Hypothesis Testing
Although it is conceptually and mathematically difficult to compare p values derived from the same dataset using different measures, we believe it is justified in this case. As is commonly described, the p value is defined as the probability, assuming the null hypothesis is true, of observing a value of the test statistic the same as or more extreme than what is actually observed (Wasserman, 2004). The comparison of p values from the same dataset is useful only when it provides additional insight into the problem at hand (Ott & Longnecker, 2001). In other words, comparing the value of the test statistic at p = .05 and p = .0005 is only meaningful when the process by which the p value is lowered is informative. Our method allows us to understand how lower p values were obtained. Namely, it tells us how using alternative measures or aggregate measures makes the hypothesis tests more or less powerful.
Results
Table 1 shows characteristics of the FH+ and FH− groups. On average, family history subjects had statistically (p < .05) lower age, less college education, and more APOE ε4. Regarding cognitive factor scores, there were no significant family history group differences on verbal learning and memory, working memory, speed and flexibility, visuospatial ability, or verbal ability. The lower performance of the FH+ group on immediate memory was consistent with performance differences examined in greater detail in the current analyses.
Note: sd = standard deviation, BA/BS = Bachelor of Arts/Bachelor of Science, se = standard error. *Satterthwaite approximation used due to heteroskedasticity. Bold entries represent significant FH differences, p < .05.
Using the total words recalled over five trials, ANOVA for family history (p = .596) and APOE ε4 (p = .125) were not significant. Using the same ANOVA methods, we tested the relationship of trial-specific primacy, subjective organization or Euclidean measure with family history and APOE ε4. Table 2 shows that family history was significant for primacy trial 1 (p = .0059), subjective organization across trials 1–2 (p = .0224) and trials 3–4 (p = .0434), and the Euclidean measure across trials 3–4 (p = .00051). The Euclidean measure across trials 3–4 was most significant. In trial 1, the FH+ group showed lower primacy relative to controls and less organization in recall in the transition from trials 1 to 2. In contrast, in the transition from trials 3 to 4, FH+ participants showed higher subjective organization and lower Euclidean measure.
Note. p values are shown from type III sum of squares ANOVA for primacy, subjective organization or Euclidean measure with family history (FH), APOE ε4 status, age, sex and education level as predictors. sd = standard deviation; p-val = p value; FH = family history; APOE = apolipoprotein ε4.
The Pearson's correlations between the significant trial-specific measures are shown in Table 3. Correlations ranged from −0.269 to 0.224 and were similar using Spearman's non-parametric correlation (data not shown). Although some reached statistical significance, all were considered small (Cohen, 1988). These low correlations indicated that the components of the aggregate measure may reflect different cognitive processes.
Note. SO = subjective organization.
Table 4 shows ANOVA results of family history and APOE ε4 for the combination measures. We selected the significant trial-specific measures from ANOVA, which included primacy trial 1, subjective organization trials 1–2, subjective organization trials 3–4, and Euclidean measure trials 3–4. There were three combination measures with a lower p value than the single Euclidean measure across trials 3–4. These included: primacy trial 1 and Euclidean measure trials 3–4 (p = 4.92 × 10⁻⁵); subjective organization trials 1–2 and Euclidean measure trials 3–4 (p = 4.38 × 10⁻⁵); and primacy trial 1, subjective organization trials 1–2, and Euclidean measure trials 3–4 (p = 1.44 × 10⁻⁵). Figure 2 contrasts the FH+ and FH− population separation using our aggregate measure compared to a standard clinical score of the total words recalled in trials 1–5. APOE ε4 was not a significant predictor for any single or aggregate measure.
Note: ANOVA = analysis of variance; SO = subjective organization; t = trial; FH = family history; p-val = p value; APOE = apolipoprotein ε4.
Discussion
The main contribution of this study is the use of a machine learning framework to derive an aggregate measure that amplifies a marginally statistically significant difference between asymptomatic middle-aged persons with and without a parental family history of AD, as seen in Figure 2. Our approach used feature construction and selection concepts from machine learning to combine three weakly correlated psychometric measures. We showed that these finer-grained and supplementary measures were not redundant and were likely signals for different aspects of learning occurring at various points across the learning trials. By deconstructing and combining these measures, we obtained significantly lower p values that did not indicate overfitting and provided a more accurate preclinical signal distinguishing AD family history and control groups.
Supplementary, Fine-Grained, and Aggregate Measures
Several previous studies of serial position effects and subjective organization (La Rue et al., 2008; Ramakers et al., 2010) examined average scores across learning trials. In contrast, by analyzing results on a trial-by-trial basis, we advocate combining different testing measures to capture different phenomena along a subject's learning curve for the AVLT, reflecting the dynamic processes occurring across trials.
We confirmed an earlier finding from the WRAP cohort that persons with a parental family history of AD showed reduced retrieval from the primacy region compared to those whose parents did not have AD (La Rue et al., 2008). However, we demonstrated that this serial position effect was significant only on the initial learning trial, with a trend toward family history group differences on trial 2. The finding that primacy differences and other serial position effects were most pronounced on the initial trial of learning tasks was reported in some prior studies, especially for cognitively intact persons (Howieson et al., 2010), but others found distinct serial position scores on later as well as earlier trials (Carlesimo, Sabbadini, Fadda, & Caltagirone, 1997). In our study, the high performance level overall on the AVLT suggested that ceiling effects may have reduced the likelihood of finding serial position differences on later trials. As shown in Table 2, both family history and control groups recalled over 80% of items from the primacy region by trials 4 and 5.
Comparing performance across adjacent trials provided significant insight into the strategies that subjects used to learn and recall additional words, complementing information from single-trial serial position scores. The most significant family history difference was seen with the Euclidean measure at trials 3–4, which had a p value for family history an order of magnitude lower than primacy in trial 1. Although several esoteric distance functions (e.g., Smith-Waterman alignment; Durbin, Eddy, Krogh, & Mitchison, 1998) had lower p values than Euclidean distance in this comparison, we selected Euclidean distance in this study for its simplicity. When analyzed trial-by-trial, learning tasks from another study also showed strongly significant signals between later trials that did not correspond to initial assumptions about how to interpret a standard neuropsychological test (Coen et al., 2009).
We found statistically significant family history group differences in subjective organization across trials 1–2 and trials 3–4. Controls had higher subjective organization in trials 1–2 than family history subjects. This result was similar in direction to the findings of Ramakers et al. (2010), which showed higher subjective organization for MCI patients who did not progress to AD compared to those that did progress to AD. By trials 3–4, however, persons in our sample with a family history of AD showed greater subjective organization in recall than controls, and they also had a smaller Euclidean measure for this pair of trials. Although we cannot be certain how to interpret these later-trial results, one possibility is that FH+ subjects may be consolidating their gains and committing to memory the items they know best at trial 3, whereas FH− participants may still be attempting to learn additional words at that point in the learning process.
Considering all results together, persons with a family history of AD exhibited a less efficient initial approach to list learning, for example, less ability to rehearse the first-presented items, and slowness in identifying subjectively related clusters within the list. In contrast, during later learning trials, family history participants were repeating subjectively organized units more consistently than controls and were drawing recalled items from less diverse serial positions in the list. The end result was that the two groups accomplished the same total learning but arrived there in different ways.
A recent structural MRI study of persons with mild AD found that early trial responses on the AVLT were more closely correlated with brain structures involved in working memory and semantic memory processes, while later learning trials and delayed recall were associated with medial temporal lobe structures involved in episodic memory (Wolk et al., 2011). In our clinically asymptomatic sample, it is possible that the family history group differences on early learning trials may have more to do with working memory and semantic memory than with secondary memory per se. Applying a brain-behavior approach such as that of Wolk et al. (2011) to asymptomatic at-risk samples could help identify possible anatomic substrates for diverse aspects of the learning process that may change preclinically.
Having demonstrated that different weakly correlated measures were sensitive to different aspects of the learning and recall process, we used an aggregate measure to show that a simple normalized linear combination of these measures increased the statistical significance of the separation between family history and control populations. The combination with the lowest p value used all three measures on different trials. Primacy was significant on trial 1, subjective organization was significant across trials 1–2, and the Euclidean measure was significant across trials 3–4. Our approach was to find all signals in the data that distinguished the two groups.
Because we demonstrated that each individual measure contributed something new based on the statistical results when combining AVLT measures, this approach enabled a far more informative method for distinguishing populations with and without a family history of AD. This fine-grained analysis is the first step toward clarifying the role of cognitive tests such as the AVLT in identifying preclinical AD.
Family History and APOE ε4 in Preclinical Alzheimer's Disease
Although our analyses showed these measures to be significantly related to family history, no measure was related to APOE ε4. A recent meta-analysis of studies of APOE effects on normal cognition (Wisdom, Callahan, & Hawkins, 2011) concluded that having an ε4 allele was associated with a modest negative influence on episodic memory but that the effect size was smaller in younger subjects and for ε4 heterozygotes as opposed to homozygotes. The WRAP sample was relatively young and only 11.7% of our APOE ε4-positive subgroup had two ε4 alleles. Ramakers et al. (2008) reported an APOE ε4-related reduction in subjective organization during AVLT recall in a middle-aged sample, but their participants were already diagnosed with MCI.
Another factor that may lead to inconsistencies in the literature on preclinical cognitive effects of APOE is the likelihood of confounding effects between ε4 positivity and family history of AD. Many studies that examined APOE effects did not report family history status of participants or analyze results in terms of family history, alone or in conjunction with APOE. However, some other investigators discussed the importance of both APOE ε4 and family history (Corder & Caskey, 2009; Hayden et al., 2009). A parental history of AD was linked to reduction in overall brain glucose metabolism (Mosconi et al., 2007), differences in BOLD signal reduction tasks (Xu et al., 2009), and a sharper trajectory of total cerebral brain volume decline in combination with APOE ε4 (Debette et al., 2009). The current findings, combined with previously reported serial position results (La Rue et al., 2008), suggest that parental family history of AD may also be linked to preclinical cognitive changes. These are subtle findings not reflected in traditional output measures of the AVLT, such as total words recalled.
Study Impact
Our results have implications both for AD research and for cognitive testing in general. Because the family history population is at increased risk of developing AD, observing enhanced differences between family history and control populations via traditional cognitive tests and a machine learning approach may help to signify a midlife indicator of preclinical AD. We believe these measures in combination with other predictors may be an integral component of predicting risk of future AD. By increasing clinical measures of separation between family history and control populations using this novel approach, it may become possible to identify individuals at risk of developing dementia who may benefit from cognitive rehabilitation (Hampstead, Sathian, Moore, Nalisnick, & Stringer, 2008; Verhaeghen, Marcoen, & Goossens, 1992) and lifestyle modifications (Barnes & Yaffe, 2011). The findings of this study are broadly applicable to neuropsychology research. In this initial study, we showed the best neuropsychological signal may not be easily developed from standard descriptive measures. Instead this signal may be derived via a combination of supplementary measures that themselves are best applied during different phases of the learning process.
There were limitations to this study. Although we hypothesized that family history and control differences may be predictive of future AD, these differences may relate to an entirely distinct phenomenon such as stable cognitive phenotypes (Greenwood, Lambert, Sunderland, & Parasuraman, Reference Greenwood, Lambert, Sunderland and Parasuraman2005). Few participants in this cohort developed AD during the time period of these analyses. We plan to analyze longitudinal data as the study progresses to evaluate the association of the aggregate measures with AD development. We acknowledge that our approach was data driven and was meant for exploratory analysis. Additional analyses with similar family history datasets and with datasets of differing AD risk, for example, MCI versus normal cognitive aging, must be done to confirm the significance of these methods. Furthermore, longitudinal data including disease outcome will be needed before the aggregate measure can be clinically useful as a predictive model.
While this study demonstrated clear benefits with respect to AVLT evaluation in particular, we believe the approach is quite general and can be applied to a variety of conventional neuropsychological tests. As such, it supports the view that more nuanced and sensitive performance measures can tell a different story about patients’ performance, and combining them can be far more informative than examining them individually.
Acknowledgments
This research was supported in part by grant 1UL1RR025011 from the Clinical and Translational Science Award (CTSA) program of the National Center for Research Resources (NCRR), National Institutes of Health (NIH), grant 2T32GM008962 from the Medical Scientist Training Program Award program of the National Institute of General Medical Sciences (NIGMS), NIH, and grant 5R01AG027161 from the National Institute on Aging (NIA), NIH. Additional support included the Helen Bader Foundation, Northwestern Mutual Foundation, Extendicare Foundation, Wisconsin Alumni Research Foundation, the Vilas Trust, and the School of Medicine and Public Health, the Department of Biostatistics and Medical Informatics, and the Department of Computer Sciences at the University of Wisconsin-Madison. The study protocol was approved by the institutional review board at the University of Wisconsin Hospital and Clinics and written informed consent was obtained from each patient (HSC H-2006-0202 on 3/2009 and HSC 2001-329 on 11/2009). We thank Grace Wahba and Marissa Phillips for helpful discussion and are especially grateful to the WRAP participants.
APPENDIX B: AVLT MEASURE EXAMPLES
The two examples of recall in the text were: trial a = (1,2,3,4,0,0,0,0,0,0,0,0,0,0,0) and trial b = (8,7,4,3,1,2,6,0,0,0,0,0,0,0,0).
The subjective organization measure was calculated for trial i to i + 1 as j − {[2c(c − 1)]/(hk)}, where j was the number of pairs of items recalled on trial i and i + 1 in adjacent positions, c was the number of common items recalled on both trials, h was the number of items recalled on trial i, and k was the number of items recalled on trial i + 1. For trials a and b, j = 2 (the pairs 1,2 and 3,4, counted in either order), c = 4, h = 4, and k = 7. Subjective organization between trial a and trial b was 1.14 and was calculated as follows: 2 − {[2 × 4 × (4 − 1)]/(4 × 7)} = 2 − 24/28 = 1.14.
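Euclidean distance between trial a and trial b was calculated position by position over the recall vectors as follows: √[(1 − 8)² + (2 − 7)² + (3 − 4)² + (4 − 3)² + (0 − 1)² + (0 − 2)² + (0 − 6)²] = √117 = 10.8.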
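As a hedged illustration, the subjective organization and Euclidean measures can be sketched in Python. The function names are hypothetical and are not taken from the study. Two readings are assumptions inferred from the reported values rather than stated in the text: adjacent pairs are counted regardless of order, and the Euclidean distance is taken position by position over the zero-padded recall vectors. Under these assumptions the sketch reproduces the value of 1.14 for trials a and b, and the values of 0.556 (trials 1–2) and 19.6 (trials 3–4) for the five-trial example given later in this appendix.

```python
import math


def recalled(trial):
    """Items recalled on a trial, in order; 0 marks an unused slot."""
    return [item for item in trial if item != 0]


def adjacent_pairs(trial):
    """Unordered pairs of items recalled in neighboring positions.

    ASSUMPTION: order within a pair is ignored, so (3, 4) matches (4, 3).
    """
    items = recalled(trial)
    return {frozenset(pair) for pair in zip(items, items[1:])}


def subjective_organization(trial_i, trial_next):
    """j - 2c(c - 1)/(hk) for two consecutive recall trials.

    Raises ZeroDivisionError if either trial has no recalled items.
    """
    j = len(adjacent_pairs(trial_i) & adjacent_pairs(trial_next))
    h = len(recalled(trial_i))
    k = len(recalled(trial_next))
    c = len(set(recalled(trial_i)) & set(recalled(trial_next)))
    return j - (2 * c * (c - 1)) / (h * k)


def euclidean_distance(trial_i, trial_next):
    """Position-wise Euclidean distance between two recall vectors.

    ASSUMPTION: the shorter vector is padded with zeros to equal length.
    """
    n = max(len(trial_i), len(trial_next))
    a = list(trial_i) + [0] * (n - len(trial_i))
    b = list(trial_next) + [0] * (n - len(trial_next))
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


trial_a = (1, 2, 3, 4, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0)
trial_b = (8, 7, 4, 3, 1, 2, 6, 0, 0, 0, 0, 0, 0, 0, 0)
trial_1 = (14, 2, 12, 11, 3, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0)
trial_2 = (1, 2, 14, 15, 6, 12, 8, 7, 11, 0, 0, 0, 0, 0, 0)
trial_3 = (13, 14, 9, 3, 11, 5, 6, 7, 8, 1, 2, 0, 0, 0, 0)
trial_4 = (5, 11, 3, 1, 10, 6, 7, 9, 12, 15, 8, 4, 0, 0, 0)

print(round(subjective_organization(trial_a, trial_b), 2))  # 1.14
print(round(subjective_organization(trial_1, trial_2), 3))  # 0.556
print(round(euclidean_distance(trial_3, trial_4), 1))       # 19.6
```

Representing adjacent pairs as sets of unordered pairs keeps the count of j to a single set intersection; a direction-sensitive count would give j = 1 for trials a and b and would not match the reported 1.14.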
An example calculation of the aggregate measure chosen, which used primacy trial 1, subjective organization trials 1–2 and Euclidean measure trials 3–4, is shown below. Assume a subject had the following recall for five trials.
Trial 1 = (14, 2, 12, 11, 3, 10, 0, 0, 0, 0, 0, 0, 0, 0, 0)
Trial 2 = (1, 2, 14, 15, 6, 12, 8, 7, 11, 0, 0, 0, 0, 0, 0)
Trial 3 = (13, 14, 9, 3, 11, 5, 6, 7, 8, 1, 2, 0, 0, 0, 0)
Trial 4 = (5, 11, 3, 1, 10, 6, 7, 9, 12, 15, 8, 4, 0, 0, 0)
Trial 5 = (12, 14, 13, 7, 6, 3, 4, 9, 2, 8, 10, 11, 5, 1, 15)
Primacy for trial 1 was 0.500; subjective organization for trials 1–2 was 0.556; and Euclidean measure for trials 3–4 was 19.6. The aggregate measure after individual measure normalization was:
APPENDIX C: AGGREGATE MEASURE
To find parameters β1, β2, and β3, we used the stochastic gradient descent method (Bertsekas & Nedic, Reference Bertsekas and Nedic2003). Stochastic gradient descent is a method to identify parameters by minimizing an objective function via iterative optimization. We used , where the objective function minimization was over the p value derived from an unpaired two sample t-test for FH+ and FH− using MAggregate(m). We noticed that the function appeared weakly convex over a wide range of values for parameters β1, β2, and β3, all of which provided extremely similar results. We, therefore, set β1 = β2 = β3 = 1, yielding a measure of
Because there was no model parameter fitting, the results were less likely to be overfit.
To address concerns about overfitting, we show that the choice of parameters in the aggregate measure had little effect on the calculated significance. Traditional machine learning approaches such as cross validation and receiver operating characteristic curves are applicable in the context of individual person predictions but are not readily applicable in the context of separating two populations. We therefore provide evidence via a geometric argument over the parameter space. We conducted a grid search, calculating the significance of family history in ANOVA for all parameter triplets with each parameter drawn from the range [−10, 10] in steps of 0.5. We show in Figure B1 all parameter triplets for which the ANOVA family history p value is less than or equal to the p value obtained when all parameters equal 1 (1.44 × 10⁻⁵). Of the parameter triplets, 11.1% would yield equal or more significant results, while over 30% would yield results in concordance with those we have presented, thereby alleviating concerns of overfitting.
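The grid search described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study. The aggregate is assumed to be a weighted sum of the three normalized measures, which is a hypothetical stand-in for the aggregate measure defined in this appendix. Group separation is scored with the absolute pooled two-sample t statistic in place of the ANOVA p value; at fixed group sizes, a larger |t| corresponds to a smaller p value. The data are synthetic placeholders, and all function names are hypothetical. With the inclusive range [−10, 10] and a step of 0.5, each parameter takes 41 values, giving 41³ = 68,921 triplets; the demonstration uses a step of 1.0 to keep the run short.

```python
import itertools
import math
import random


def t_statistic(x, y):
    """Unpaired two-sample t statistic with pooled variance; 0.0 if degenerate."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ss = sum((v - mx) ** 2 for v in x) + sum((v - my) ** 2 for v in y)
    pooled = ss / (nx + ny - 2)
    if pooled == 0:
        return 0.0
    return (mx - my) / math.sqrt(pooled * (1 / nx + 1 / ny))


def aggregate(measures, betas):
    # ASSUMPTION: weighted sum of the three normalized measures.
    return sum(b * m for b, m in zip(betas, measures))


def separation(group_pos, group_neg, betas):
    """Absolute t statistic between the two groups for one parameter triplet."""
    return abs(t_statistic([aggregate(m, betas) for m in group_pos],
                           [aggregate(m, betas) for m in group_neg]))


def grid_values(lo=-10.0, hi=10.0, step=0.5):
    """Inclusive, evenly spaced parameter values from lo to hi."""
    n = int(round((hi - lo) / step))
    return [lo + i * step for i in range(n + 1)]


def fraction_at_least_reference(group_pos, group_neg, values,
                                reference=(1.0, 1.0, 1.0)):
    """Share of triplets separating the groups at least as well as the reference."""
    ref = separation(group_pos, group_neg, reference)
    triplets = list(itertools.product(values, repeat=3))
    hits = sum(separation(group_pos, group_neg, b) >= ref for b in triplets)
    return hits / len(triplets)


# Synthetic placeholder data: three normalized measures per participant.
rng = random.Random(0)
fh_pos = [tuple(rng.gauss(0.3, 1.0) for _ in range(3)) for _ in range(40)]
fh_neg = [tuple(rng.gauss(0.0, 1.0) for _ in range(3)) for _ in range(40)]

coarse = grid_values(step=1.0)  # 21 values per parameter, 9,261 triplets
share = fraction_at_least_reference(fh_pos, fh_neg, coarse)
print(f"{len(coarse) ** 3} triplets; share at least as separating: {share:.3f}")
```

A small share would indicate that the reference setting sits near the best-separating region of the grid, whereas a large share would indicate that many parameter choices perform comparably, which is the sense in which the text argues that the result does not hinge on tuned parameters.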