Introduction
The dementias are behavioural syndromes by definition. Alzheimer's disease (AD) and all other brain diseases that cause dementia are characterized clinically by behavioural symptoms. In most of these diseases the core symptoms are cognitive and emotional, as in AD. Some diseases may start with motor symptoms, but cognitive and emotional symptoms also arise during the course of the disease, as in dementia associated with Parkinson's disease. The neuropathological processes that ultimately result in dementia are assumed to be active long before the first symptoms appear. Researchers are keen to find biomarkers of these neurodegenerative processes because such markers would create a window of opportunity for preclinical diagnostic testing and early treatment, and perhaps even for prevention of impairments. Therefore, considerable research effort has been invested in finding or improving neuroimaging and neurochemical biomarkers of dementia. By now there is an abundant literature, and several recent overviews express enthusiasm about the value of currently available biomarkers (Blennow & Hampel, 2003; de Leon et al. 2004, 2007a; Hampel et al. 2008), although some reviewers are more cautious (Shaw et al. 2007).
A fundamental assumption underlying these efforts is that the disease processes are detectable in the preclinical phase, long before behavioural symptoms arise, for example by an abnormal composition of cerebrospinal fluid (CSF) (Blennow & Hampel, 2003; de Leon et al. 2007a) or by neuronal loss visible as atrophy on magnetic resonance imaging (MRI) (see Fig. 1) (Zamrini et al. 2004; Smith et al. 2007; Carlson et al. 2008; Davatzikos et al. 2008).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160927014849-82955-mediumThumb-S0033291709991516_fig1g.jpg?pub-status=live)
Fig. 1. Assumptions on the sequence of events that were tested in this review. (1) Cerebrospinal fluid (CSF) composition becomes abnormal shortly after the beginning of the neuropathological processes that ultimately lead to dementia. (2) When enough neuronal loss has occurred, brain atrophy is detectable on magnetic resonance imaging (MRI). (3) With further degeneration, behavioural symptoms arise, most often memory impairment. The whole process is thought to take several decades.
The biomarkers of AD that have been investigated most extensively are medial temporal lobe (MTL) atrophy on structural MRI, hippocampal atrophy in particular, and levels of tau and beta-amyloid in CSF. The MRI measures that show the largest differences between AD and normal ageing are atrophy of the hippocampi and the amygdalae, with effect sizes (Cohen's d, i.e. the difference between group means divided by the pooled standard deviation) of the order of 1.7 and 1.8 respectively (Zakzanis et al. 2003). Even during the stage of mild cognitive impairment (MCI; Petersen et al. 2001), atrophy of the MTL predicts future AD fairly accurately (Twamley et al. 2006; Mosconi et al. 2007). The CSF biomarkers total tau (t-tau), phosphorylated tau (p-tau) and amyloid beta 42 (aβ42) have sensitivities between 81% (for t-tau) and 86% (for aβ42), both at 90% specificity with respect to the distinction between AD and normal ageing (Blennow & Hampel, 2003). These figures correspond to effect sizes of 2.0 and 2.3 respectively (Parker & Hagan-Burke, 2007). Studies that compared progressing MCI patients and healthy controls reported similarly impressive sensitivities of 60–90% at 90% specificity (Blennow & Hampel, 2003).
Although these findings are encouraging, they do not definitively prove that the underlying assumption as depicted in Fig. 1 is valid. In other words, they do not prove that these biomarkers can validly detect incipient brain disease that ultimately will lead to dementia long before the first symptoms arise. This claim would imply that the prognostic accuracy of these biomarkers is clearly superior to that of measures of behavioural symptoms and, if true, would have important consequences for clinical practice. To test this prediction, we have systematically reviewed longitudinal studies of subjects who were not demented at baseline, and of whom some declined to MCI or converted to AD during follow-up. We compared the available evidence for the CSF and MRI biomarkers with the prognostic accuracy of the assessments of behavioural symptoms that were used in the same studies. Memory impairment is usually the first symptom of AD (Zakzanis, 1998; Bäckman et al. 2005). Therefore, we compared the biomarkers with performance on memory tests.
Method
Search strategy
CSF biomarker studies
We searched PubMed, Medline, EMBASE and PsycINFO for papers with search terms ‘Alzheimer’ or ‘MCI’ or ‘aging’, and ‘CSF’, and ‘tau’ or ‘amyloid’, and ‘longitudinal’ or ‘follow-up’. We did not add ‘memory’ as a search term because memory was not a focus in most of these studies, and consequently it would result in a much smaller number of hits. The first longitudinal studies on CSF biomarkers of this kind appeared in 2003. Therefore, only papers that were published during the past 6 years (from 1 January 2003 to 30 November 2008) were included. Other inclusion criteria were (1) longitudinal design; (2) subjects had to be cognitively normal or in the MCI stage at baseline; (3) decline to MCI and/or AD at follow-up had to be observed in some of the subjects; (4) diagnosis of MCI or AD was made according to established criteria for MCI (Petersen et al. 1999, 2001) and AD (McKhann et al. 1984) or using the Clinical Dementia Rating (Morris et al. 1988); and (5) baseline data on CSF biomarkers for stable and declining subgroups were reported. If a study reported on several groups (e.g. normal subjects, MCI subjects and AD patients), only the data for normal and MCI subjects were used for the analysis. Studies that reported only on tau/amyloid-beta ratios without presentation of separate values for tau and amyloid-beta were excluded.
MRI studies of MTL atrophy
Studies were traced in the same databases with search terms ‘MRI’, and ‘Alzheimer’ or ‘MCI’ or ‘aging’, and ‘temporal’ or ‘hippocamp*’, and ‘longitudinal’ or ‘follow-up’. Inclusion and exclusion criteria were the same as for the CSF studies. Limitation of the search to the same time period of the past 6 years ensured that the scanning techniques met current standards. Papers that focused on the patterns of atrophy, or on atrophy rates in serial scans, were excluded unless they reported baseline data on MTL volumes of stable subjects and decliners/converters.
The retrieved CSF and MRI studies were screened for the inclusion and exclusion criteria by two authors (B.S. and W.A.v.G.) either on the basis of the title and abstract or, if this provided insufficient information, on the method section. Both authors independently selected the same sets of studies. If a paper reported on (almost) the same sample as another study, we included the one with the most complete data. Some studies reported on both CSF and MRI biomarkers; these studies were included twice (for each analysis; see below) as if they were separate studies.
Data extraction
Means and standard deviations or sensitivity and specificity figures at baseline of each study were extracted (by B.S., and checked by H.M.H.) for the stable subgroup and the subgroup that declined. MRI measures used were preferably quantitative measures (mostly by voxel-based morphometry) or visual ratings of atrophy of the hippocampus. If these were not given, comparable data on other MTL structures, such as the entorhinal cortex or ventricular volume, were used.
CSF measures used were t-tau, p-tau and aβ42. Ratios of tau and amyloid-beta were not analysed because these were reported less frequently.
The behavioural measure used was performance in the delayed recall condition of memory tasks, in which the subject has to recall information, such as a short list of words or a brief story, that was memorized earlier, typically 15 to 30 min before. Impaired delayed recall is usually the first symptom of AD (Zakzanis, 1998; Bäckman et al. 2005). The follow-up duration of the studies was also recorded.
If a study fulfilled all inclusion criteria but did not report all relevant data, we contacted the authors to obtain supplementary data. This was especially necessary for data on memory performance, which were collected but not reported by 42% of the studies. Unfortunately, some authors did not respond; others were unable or refused to provide the requested data.
Data analysis
The data were analysed using MetaWin software (version 2.1, release 5.10, 2007) (Rosenberg et al. 2000). The measure of interest was the effect size Cohen's d, which is generally calculated as the difference between group means divided by the pooled standard deviation. In the present analyses it was the standardized difference at baseline between the subjects who at follow-up were demented (or had declined to MCI) and those who remained cognitively stable. When the data were reported as sensitivity and specificity, Cohen's d was calculated as d = 2ϕ/√(1 − ϕ²), where ϕ is the phi coefficient of the corresponding 2×2 classification table (Parker & Hagan-Burke, 2007). Next, the effect sizes were expressed as Hedges' d, which is marginally different from Cohen's d but is not sensitive to bias due to small sample sizes, differences in sample sizes, or differences in variance between samples (Rosenberg et al. 2000). The studies were tested for heterogeneity with conventional Q-total tests, as well as with the H statistic (Higgins & Thompson, 2002), which is less sensitive to small numbers of included studies. Even if the test for heterogeneity was not significant, the effect sizes were analysed in a random effects model.
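The effect size conversions described above can be illustrated with a minimal sketch. It is not the MetaWin implementation: the function names are hypothetical, the phi coefficient is computed under the assumption of equally sized declining and stable groups unless group sizes are passed in, and the small-sample correction uses the common approximation J = 1 − 3/(4(n₁ + n₂ − 2) − 1).

```python
import math

def cohen_d_from_means(mean_a, mean_b, sd_a, sd_b, n_a, n_b):
    """Difference between group means divided by the pooled standard deviation."""
    pooled_var = ((n_a - 1) * sd_a ** 2 + (n_b - 1) * sd_b ** 2) / (n_a + n_b - 2)
    return (mean_a - mean_b) / math.sqrt(pooled_var)

def phi_from_sens_spec(sens, spec, n_decline=1.0, n_stable=1.0):
    """Phi coefficient of the 2x2 table implied by sensitivity and specificity.

    Assumption: equal group sizes by default (n_decline = n_stable).
    """
    tp = sens * n_decline          # decliners correctly flagged
    fn = (1.0 - sens) * n_decline  # decliners missed
    tn = spec * n_stable           # stable subjects correctly cleared
    fp = (1.0 - spec) * n_stable   # stable subjects wrongly flagged
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fn) * (fp + tn) * (tp + fp) * (fn + tn))
    return numerator / denominator

def cohen_d_from_phi(phi):
    """d = 2*phi / sqrt(1 - phi^2), the conversion quoted in the text."""
    return 2.0 * phi / math.sqrt(1.0 - phi ** 2)

def hedges_d(d, n_a, n_b):
    """Apply the approximate small-sample correction factor J to Cohen's d."""
    j = 1.0 - 3.0 / (4.0 * (n_a + n_b - 2) - 1.0)
    return j * d

# Sensitivities of 81% and 86% at 90% specificity, as cited in the Introduction.
d_ttau = cohen_d_from_phi(phi_from_sens_spec(0.81, 0.90))
d_abeta = cohen_d_from_phi(phi_from_sens_spec(0.86, 0.90))
print(round(d_ttau, 2), round(d_abeta, 2))
```

Under the equal-group-size assumption, the two sensitivities quoted in the Introduction convert to values close to the effect sizes of 2.0 and 2.3 reported there.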
The presence of publication bias was checked by inspection of funnel plots and by fail-safe analysis (Rosenthal's method). The fail-safe number is the number of negative studies that would have to be added to the analysis to render the effect size non-significant. This number should be substantially larger than the estimated number of unpublished negative studies, which is calculated as 5k+10 (k is the number of studies used to calculate the effect size) (Rosenthal, 1991).
The follow-up duration of the studies is potentially a critical variable, because effect sizes of prognostic markers can be expected to vary with the stage of progression of the disease when they are assessed. Therefore, the effect sizes were regressed on the duration of follow-up, in a maximum likelihood random effects meta-regression (Thompson & Higgins, 2002; van Houwelingen et al. 2002; Viechtbauer, 2006).
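The fail-safe logic can be sketched as follows. This is an illustration rather than the MetaWin routine: the function names are hypothetical, the fail-safe number uses the standard Rosenthal formula N = (ΣZ)²/z_α² − k with a one-tailed α of 0.05 (z_α = 1.645), and the per-study Z values in the example are invented.

```python
def rosenthal_fail_safe_n(z_scores, z_alpha=1.645):
    """Number of null studies needed to pull the combined result below significance."""
    k = len(z_scores)
    return sum(z_scores) ** 2 / z_alpha ** 2 - k

def rosenthal_tolerance(k):
    """Rosenthal's estimate of the number of unpublished negative studies: 5k + 10."""
    return 5 * k + 10

# Invented example: ten studies, each with Z = 2.0.
example_n = rosenthal_fail_safe_n([2.0] * 10)

# With k = 20 studies (the combined memory analysis), the tolerance level is 110.
memory_tolerance = rosenthal_tolerance(20)
print(round(example_n, 1), memory_tolerance)
```

For comparison, a tolerance of 110 set against the fail-safe number of 1857 reported for memory tests gives a ratio of roughly 1/17.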
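As a simplified illustration of regressing effect sizes on follow-up duration, the sketch below fits an inverse-variance weighted straight line. It is not the maximum likelihood procedure used in the review: the between-study variance `tau2` is supplied as a fixed input instead of being estimated, the function name is hypothetical, and the data are invented.

```python
import math

def weighted_meta_regression(x, y, variances, tau2=0.0):
    """Weighted least-squares fit of effect size y on covariate x.

    Weights are 1 / (within-study variance + tau2). Returns
    (intercept, slope, standard error of the slope).
    """
    weights = [1.0 / (v + tau2) for v in variances]
    total_w = sum(weights)
    x_bar = sum(w * xi for w, xi in zip(weights, x)) / total_w
    y_bar = sum(w * yi for w, yi in zip(weights, y)) / total_w
    sxx = sum(w * (xi - x_bar) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - x_bar) * (yi - y_bar) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    se_slope = math.sqrt(1.0 / sxx)  # valid when the variances are treated as known
    return intercept, slope, se_slope

# Invented data: follow-up in years, effect sizes, and their sampling variances.
years = [1.0, 2.0, 3.0, 4.0]
effects = [1.5, 2.0, 2.5, 3.0]
variances = [0.04, 0.04, 0.04, 0.04]
intercept, slope, se_slope = weighted_meta_regression(years, effects, variances)
print(intercept, slope, se_slope)
```

A larger `tau2` leaves the fitted line unchanged when all studies are equally precise, but widens the standard error of the slope, which is the main practical difference between fixed and random effects weighting in this setting.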
Results
CSF biomarkers
The search strategy retrieved 60 papers on CSF. Eighteen papers were excluded because they reported on prevalent AD only (nine) or did not report on a study with longitudinal design (nine). Twenty-eight studies were excluded for a variety of reasons (such as reporting on the same sample as another report, on other diseases, on abeta/tau ratios or patterns only, or on other CSF constituents, and reviews or comments). Fourteen studies satisfied all inclusion and exclusion criteria (Table 1). Tests for heterogeneity were not significant (Q total in random effects models ⩽9.91, p⩾0.40, H<1, n.s.). The weighted mean effect sizes in these studies were 0.91 [90% confidence interval (CI) 0.68–1.14] for t-tau, 0.92 (90% CI 0.69–1.14) for aβ42 and 1.11 (90% CI 0.88–1.34) for p-tau. Mean follow-up duration was 2.5 years.
Table 1. Longitudinal studies of CSF biomarkers in subjects that were normal or in MCI stage at baseline
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160927014849-00903-mediumThumb-S0033291709991516_tab1.jpg?pub-status=live)
CSF, Cerebrospinal fluid; MCI, mild cognitive impairment; ES, effect size (Hedges' d); memory, delayed recall memory test; t-tau, total tau; p-tau, phosphorylated tau; aβ42, amyloid beta 42; n.a., not assessed; n.r., not reported (but assessed); s, based on sensitivity and specificity; mean ES, mean effect size weighted for sample sizes; CI, confidence interval; CERAD, cognitive test battery of the Consortium to Establish a Registry for Alzheimer's Disease; RAVLT, Rey Auditory Verbal Learning Test; fig, CFT, Rey Complex Figure Test; WMS, Wechsler Memory Scale; WMSr dr, WMS revised, composite of delayed recall scores; MODA, Milan Overall Dementia Assessment; Informal, non-standardized word list; ?, paper mentions neuropsychological testing, but does not specify; authors did not answer request for further information.
Figures in italics were provided by the respective authors at our request.
The study by Fagan et al. (2007) deserves particular attention because it is one of the few studies that followed subjects (n=61) who were normal at baseline. Thirteen of these subjects (21%) declined to MCI or AD during a mean follow-up time of 3 to 4 years (range 1–8 years). Total tau, p-tau and aβ42 separately were not significant as predictors of decline (effect sizes between 0.5 and 0.7), but combinations of tau and aβ42 had significant hazard ratios (HRs). The highest HR was found for t-tau/aβ42 (5.21) with an effect size of 1.1. Another recent paper by Li et al. (excluded because it reported on ratios only) corroborated these results in a smaller sample of normal subjects (n=43), of whom 9% progressed to MCI after 42 months (Li et al. 2007). However, the effect size of the tau/aβ42 ratio was smaller than in the Fagan study (d=0.5).
MTL atrophy
We retrieved 233 papers on MRI. Of these, 212 papers were excluded for the following reasons: they concerned diseases other than AD (n=54), reported on prevalent AD only (15) or on healthy, non-declining subjects only (25), used techniques other than structural MRI (28; mostly functional MRI, 14), reported on regional patterns of atrophy (eight) or atrophy rates in serial MRI only (14), or on technical or statistical aspects of MRI only (seven), were not longitudinal (15), or reported no baseline data for stable and declining subgroups separately (12). Fifteen papers were reviews, two were comments, and 10 were excluded because they were animal studies or case reports. Seven papers duplicated other publications. Twenty-one MRI studies satisfied the inclusion and exclusion criteria; they are summarized in Table 2. (The Hall 2008 study was split in two because two different follow-up intervals were used.) The follow-up duration in these studies was 4.5 years on average. The right-most column of Table 2 shows the effect sizes of atrophy of the hippocampus (or other MTL structures). The weighted mean effect size was 0.75 (90% CI 0.61–0.89). Tests for heterogeneity of the studies were not significant (Q=21.93, p=0.40, H=1.02, n.s.).
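The relation between the Q and H statistics reported above can be checked with a short sketch. It assumes the Higgins and Thompson definition H = √(Q/(k − 1)) and assumes k = 22 effect sizes (21 studies, with the Hall 2008 study counted twice); the function names are hypothetical.

```python
import math

def q_statistic(effects, variances):
    """Cochran's Q: weighted sum of squared deviations from the weighted mean."""
    weights = [1.0 / v for v in variances]
    mean = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return sum(w * (e - mean) ** 2 for w, e in zip(weights, effects))

def h_statistic(q, k):
    """H = sqrt(Q / (k - 1)); values near 1 indicate little heterogeneity."""
    return math.sqrt(q / (k - 1))

# Reported Q for MTL atrophy, with k = 22 effect sizes assumed.
h_mtl = h_statistic(21.93, 22)
print(round(h_mtl, 2))
```

Under that assumption the computed value agrees with the reported H of 1.02, which indicates that the observed spread of effect sizes is about what sampling error alone would produce.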
Table 2. Longitudinal studies of MRI biomarkers (atrophy of the hippocampus or other medial temporal lobe structures) in subjects that were normal or in MCI stage at baseline
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160927014849-15696-mediumThumb-S0033291709991516_tab2.jpg?pub-status=live)
MRI, Magnetic resonance imaging; MCI, mild cognitive impairment; ES, effect size (Hedges' d); memory, delayed recall memory test; atrophy, atrophy of medial temporal lobe (MTL) structures (in most studies hippocampal volume); n.a., not assessed; n.r., not reported (but assessed); s, based on sensitivity and specificity; mean ES, mean effect size weighted for sample sizes; CI, confidence interval; CERAD, cognitive test battery of the Consortium to Establish a Registry for Alzheimer's Disease; Composite, composite score of California Verbal Learning Test (CVLT) and Rey Complex Figure Test (CFT) delayed recall; GMT pa, Guild Memory Test, paired associates; SRT, Selective Reminding Test; LM-II, logical memory II; NYU pr, New York University delayed paragraph recall; WMSr lm, Wechsler Memory Scale revised, logical memory test; word list, word list immediate recall; Williams, Williams Memory Assessment Scale word list learning.
Figures in italics were provided by the respective authors at our request.
a Ventricular volume.
b Grey-matter concentration.
Memory tests
The weighted mean effect size for memory tests in 15 of the MRI studies was d=1.04 (90% CI 0.84–1.24). Only four CSF studies reported psychometric results; one research group supplied additional data. The effect size for memory tests in these CSF studies was d=1.21 (90% CI 0.63–1.79). For the 20 CSF and MRI studies combined, the effect size was 1.06 (90% CI 0.89–1.24). Tests for heterogeneity were not significant (Q total=22.01, p=0.34; H=1.05, n.s.).
Publication bias
There was no clear indication of publication bias. Spearman's ρ was ⩽|0.37| (p⩾0.09) and the funnel plots were not skewed, except for MTL atrophy, which showed a trend towards larger effect sizes in smaller studies (p=0.09). Rosenthal's fail-safe numbers were 1108 for MTL atrophy, 249 for aβ42, 463 for t-tau, 474 for p-tau and 1857 for memory tests. The estimated numbers of unpublished negative studies ranged from 1/17 (for memory tests) to 1/4 (for aβ42) of these fail-safe numbers.
Correlation of effect size and follow-up duration
In Fig. 2 the effect sizes from Tables 1 and 2 are plotted against the duration of follow-up. None of the slopes were significantly different from zero, but the trends showed the expected pattern. The effect sizes of memory tests and MTL atrophy increased slightly with decreasing distance to the moment of diagnosis [slope (Δd/year±s.e.)=0.036±0.068 (p=0.60 two-tailed) for memory tests and 0.053±0.047 (p=0.26) for MTL atrophy] whereas they decreased for two of the three CSF markers [aβ42: −0.119±0.076 (p=0.11); t-tau: −0.122±0.085 (p=0.15); p-tau: −0.007±0.098 (p=0.95); this latter slope was −0.077±0.070 (p=0.29) when one outlying study (Parnetti et al. 2006) was dropped from the analysis]. The regression lines of CSF biomarkers crossed the regression line of memory tests around 4 years before the moment of follow-up when the diagnosis was made (see Fig. 2f). At 6 years before diagnosis (–72 months in Fig. 2) the intercepts of t-tau and aβ42 were still not significantly higher than for memory (p=0.40 for the estimated difference in intercept between t-tau and memory).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160927014849-68630-mediumThumb-S0033291709991516_fig2g.jpg?pub-status=live)
Fig. 2. Effect sizes as a function of follow-up duration in longitudinal studies with subjects who, at baseline, were in the mild cognitive impairment (MCI) stage (or normal), and who either remained stable or declined to Alzheimer's disease (AD) (or to MCI). (a)–(e) A graphical representation of the data given in Tables 1 and 2. The size of the circles is proportional to the inverse variance of the effect size for each study. (a) Total tau (t-tau); (b) phosphorylated tau (p-tau); (c) amyloid beta 42 (aβ42); (d) medial temporal lobe (MTL) atrophy; (e) memory (delayed recall); (f) regression lines from (a)–(e). Absolute values for effect sizes of t-tau and p-tau were taken for reasons of comparison.
Discussion
The results of this review suggest that memory tests with a delayed recall condition are better detectors of future AD in normal elderly and subjects with MCI than atrophy of the hippocampus or other MTL structures as assessed by MRI. The available data for CSF biomarkers show that the prognostic accuracy of CSF biomarkers (especially p-tau) is better than for MTL atrophy, and about equal to memory tests. However, CSF biomarkers tend to gain accuracy when they are assessed earlier in the disease process, whereas for memory tests and MTL atrophy the reverse is true; they tend to be most accurate when they are assessed closer to the moment of diagnosis (Fig. 2). Measures of memory impairment and CSF abnormalities are about equally predictive some 4 years before diagnosis.
Before general conclusions can be drawn, we need to address several caveats. First, our treatment of markers was ‘univariate’. The evidence for each marker was presented in isolation from its clinical context. This was necessary for a fair comparison, but it does not do justice to clinical reality. In the diagnostic work-up of suspected dementia patients, MRI, for example, is not used solely to examine atrophy of the hippocampus to detect AD. On the contrary, it may serve several differential diagnostic purposes at the same time, such as screening for tumours or cerebrovascular lesions. Something similar may be said of the neuropsychological evaluation (it evaluates much more than just delayed recall), and even of CSF assessments. In clinical practice a single parameter relevant to one diagnostic consideration is not investigated in isolation, but a pattern of signs and symptoms is examined, while considering a variety of diagnostic possibilities. Such patterns are more informative, but their diagnostic value is often much more difficult to investigate. A simple example is the combination of tau and beta-amyloid, which generally has better diagnostic properties for detecting AD than each parameter on its own (Shaw et al. 2007).
Second, the cited studies are based on the clinical diagnosis of AD (or MCI) without neuropathological confirmation. A substantial proportion of clinical AD diagnoses are wrong, even in centres of excellence, and conversely, the brains of many elderly people who are clinically normal show signs of neurodegenerative disease when they happen to come to post-mortem (Neuropathology Group of MRC CFAS, 2001; Zaccai et al. 2006). Thus, the accomplishments of diagnostic markers are necessarily reduced by these diluting influences.
Third, we did not select or weight the studies by methodological criteria. We disregarded, for example, that some methods to establish MTL atrophy may be more reliable than others, that some studies used memory tests of questionable quality, and that studies using externally established cut-off points may have greater validity than studies that use ad-hoc cut-off points. We considered that the number of studies was too small for such further refinements. However, the studies included in this review generally met high methodological standards, such as those of the STARD (Standards for Reporting of Diagnostic Accuracy) initiative (Bossuyt et al. 2003). Moreover, we did not find much evidence for publication bias. This is particularly important with respect to the memory data, because so many investigators failed to report these data. It is therefore reassuring (and another indication of absence of publication bias in the memory data) that the results of our analysis were comparable to an independent meta-analysis of 17 longitudinal studies in 3388 non-demented subjects of whom 453 converted to AD (Bäckman et al. 2005). In this analysis delayed memory had mean effect sizes of 1.2 in normal subjects, and 1.3 in MCI patients.
Fourth, we did not consider evidence from functional neuroimaging studies, such as positron emission tomography (PET) or functional MRI, because these techniques have been studied less frequently. Moreover, they are either invasive, relatively expensive, or require complex analyses, rendering it unlikely that they will find large-scale application in everyday clinical practice, outside the realm of research settings (Reagan Institute Working Group, 1998; Growdon, 1999).
Finally, most of the incorporated studies concerned groups that initially were diagnosed with MCI. MCI is defined on the basis of behavioural characteristics, just like dementia itself. This seems to cause some circularity, because the same cognitive impairments that define MCI, and thus future dementia, also define dementia. This circularity is only apparent, however. In fact, it creates a handicap for the memory tests. In a typical research project MCI is partly defined on the basis of neuropsychological test scores, which makes it harder for memory measures than for other types of markers to predict conversion to dementia, because the inclusion criteria of MCI have removed a large part of variance in memory performance from the data. This handicap must have been operative in many of the cited studies. Studies that do not select their subjects with neuropsychological tests, such as the population-based study by den Heijer et al. (2006), or do so only loosely, such as the clinic-based study of Devanand et al. (2007), do not suffer from this bias, or less so than standard MCI studies. The former study included elderly subjects who were normal at baseline, and of whom 5% developed AD during 6 years of follow-up (den Heijer et al. 2006). Baseline atrophy of the hippocampus had an effect size of 0.3; a test of delayed memory had an effect size of 1.2 (Table 2). The Devanand study investigated MCI patients who were followed for 3 years; 24% converted to AD. A compound measure of hippocampal and entorhinal atrophy had an effect size of 0.8, whereas a delayed recall test had an effect size that was about 25% larger (d=1.0). In both studies the memory evaluation performed better than the MRI.
Even in the study by Fleisher et al. (2008), which did select its MCI subjects in the usual way, and which used state-of-the-art MRI techniques, delayed recall was much more predictive for conversion than hippocampal atrophy (see Table 2). It is therefore conceivable that the effect sizes for the memory tests are systematically underestimated in this research field.
The hypothesis that preclinical disease processes are detectable by CSF and MRI assessments long before behavioural symptoms arise (Blennow & Hampel, 2003; Zamrini et al. 2004; de Leon et al. 2007a) requires that CSF and MRI assessments have non-zero effect sizes at a moment in time where memory effect size is zero. Figure 2 shows that there is no such point in time. The results from the present meta-regression suggest that once CSF composition starts to deviate from normal, memory starts to decline as well, while MTL atrophy lags behind (Fig. 3).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160927014849-59810-mediumThumb-S0033291709991516_fig3g.jpg?pub-status=live)
Fig. 3. Sequence of events as suggested by the results of this review. Cerebrospinal fluid (CSF) abnormalities and memory impairment arise at about the same moment in the course of the disease (1–2). Brain atrophy on magnetic resonance imaging (MRI) becomes detectable somewhat later (3).
The prognostic accuracy of CSF biomarkers tends to increase with the interval between assessment and diagnosis. This may be explained by their biochemical nature reflecting early neuropathological processes, which apparently need to be active a long time before they result in dementia. To this extent our review corroborates their claim of being early markers, but this does not necessarily make them accurate markers. Our analysis suggests that longitudinal studies in normal subjects with more than 6 years of follow-up are needed to decide whether the prognostic accuracy of CSF biomarkers indeed increases further when they are assessed even longer before the moment of diagnosis. The currently available data cannot answer this question. At least two studies of this kind are under way (Fagan et al. 2007; Li et al. 2007) with initial results after a mean follow-up of 3 to 4 years. The CSF biomarkers tau and aβ42 separately were not significantly related to follow-up diagnosis (Fagan et al. 2007). In one of these studies the tau/aβ42 ratios had an effect size that is at the level of CSF regression lines in Fig. 2 (Fagan et al. 2007), whereas in the other study this effect size was even smaller (Li et al. 2007).
Objections could be made that the developments in biotechnology move fast, and that current experimental methods, such as imaging of amyloid deposition in the brain by PET-PIB (PET with 11C-labeled Pittsburgh Compound-B) or serial MRI scans to assess rates of atrophy, will soon become available in many hospitals. Serial MRI may be a more promising biomarker than a single MRI scan. Furthermore, it is conceivable that mapping several regions and defining a pattern of atrophy increases the specificity of a single MRI scan. This may be true, but neuropsychological methods develop as well. It is now clear, for example, that memory tests with semantic encoding procedures (Buschke et al. 1997) and certain tests of associative learning (Lindeboom et al. 2002; Blackwell et al. 2004) have higher diagnostic and prognostic accuracy for AD than test paradigms that were used in most of the studies included in our meta-analyses. The present results encourage future studies that evaluate the diagnostic claims of these newer biomarkers by comparing them directly with modern memory tests, and with patterns of cognitive decline assessed serially. This requires longitudinal research projects with follow-up durations exceeding 6 years to allow CSF biomarkers to show their potential.
Irrespective of the value of future biomarkers, there will always remain an important role for neuropsychology in the diagnostic work-up. It is not very likely that MRI or CSF abnormalities will ever be the sole object of treatment, but patients will remain so. Even if effective preventive or disease-modifying therapy without serious side-effects becomes available, treatment of biomarker abnormalities associated with AD will have the obvious drawback that many persons, who will never reach the dementia stage because of competing risks or because of imperfection of the predictions, will be treated unnecessarily. We suppose that, in general, physicians will prefer to postpone treatment until the very first, subtle behavioural signs of dementia come to light as they can be detected with modern memory tests.
Acknowledgements
This work was funded by the Academic Medical Centre (AMC) and the University of Amsterdam. We are grateful to K. Zwinderman for assistance with the meta-regression. Our colleagues H. Smeding, P. Eikelenboom and G. Walstra gave valuable comments on earlier drafts of this paper. Drs L. Parnetti, H. Rusinek, A. Fagan, D. P. Devanand, M. Tabert, A. Fleisher and O. Hansson kindly provided supporting data that were not reported in their respective papers.
Declaration of Interest
None.