INTRODUCTION
Many Parkinson’s disease (PD) patients show a decline in cognitive functioning, often already early in the disease course (Aarsland et al., 2001; Hobson & Meara, 2004; Muslimovic et al., 2005). PD with mild cognitive impairment (PD-MCI) is predictive of progression to PD dementia (PDD; Aarsland et al., 2001; Caviness et al., 2007; Hoogland et al., 2017; Williams-Gray et al., 2007). It is important to accurately predict which patients will develop PDD as it may have implications for patient care, for example, choice of medication (such as avoiding anticholinergic drugs) and planning of assistance. Also, accurate prediction enables a more appropriate selection of patients for cognitive interventions or pharmaceutical trials.
Clinical criteria for PD-MCI have been proposed by a task force of the International Parkinson and Movement Disorder Society (MDS) (Litvan et al., 2012). In order to diagnose PD-MCI at Level II (i.e. the level with most diagnostic certainty), a PD patient should experience subjective complaints (or their relatives should report such complaints) and should be impaired on objective cognitive testing. Litvan et al. (2012) recommend administering at least two tests for each of five cognitive domains, thus a minimum of 10 tests, of which at least two tests need to indicate impairment for a PD-MCI diagnosis. Impairment is usually assessed by comparing the patient’s test scores to those of normative samples, often in the form of norm tables that accompany published test manuals.
There are several issues with comparing patients to such published norm tables. First, as the normative data for neuropsychological tests have been collected for each test separately, correlations between tests are usually unknown (except in case of co-normed tests). Second, because the correlations are unknown, they cannot formally be taken into account in neuropsychological assessment. This makes it hard to evaluate abnormal combinations of scores (i.e. an abnormal score profile; Huizenga et al., 2007). Third, norm tables do not always allow for correction for the influence of demographic variables, even though age, sex, and level of education are known to influence the scores on neuropsychological tests. Moreover, it is often not possible to simultaneously correct for age, sex, and level of education (Lezak et al., 2012). Also, when correction for age is possible, separate norms are presented for different age groups. When a patient gets older and shifts from one age group to the next, the interpretation of their test results can be different and may, for example, change from abnormal to normal (Zachary & Gorsuch, 1985). Fourth, when evaluating more than one test (at least 10 in the case of Level II PD-MCI diagnosis), the likelihood of obtaining an abnormal score by chance alone increases with the number of tests that have been administered (Binder et al., 2009).
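To make the last issue concrete, the following minimal sketch computes how quickly chance findings accumulate with the number of tests. It assumes, purely for illustration, that test scores are independent and normally distributed; real neuropsychological tests are correlated, so actual rates will differ. The function name is hypothetical and does not come from any of the cited work.

```python
from math import comb
from statistics import NormalDist


def prob_at_least_k_abnormal(n_tests: int, k: int, cutoff_z: float = -1.5) -> float:
    """Chance that a healthy person scores below cutoff_z on at least k of n_tests.

    Assumption (illustrative only): tests are independent and standard normal.
    """
    p = NormalDist().cdf(cutoff_z)  # per-test probability of an "abnormal" score
    # Binomial probability of fewer than k abnormal scores.
    p_fewer_than_k = sum(
        comb(n_tests, i) * p**i * (1 - p) ** (n_tests - i) for i in range(k)
    )
    return 1.0 - p_fewer_than_k


for n_tests in (1, 5, 10):
    print(n_tests, round(prob_at_least_k_abnormal(n_tests, k=1), 3))
```

Under these simplifying assumptions, roughly half of healthy individuals would obtain at least one score below −1.5 SD on a 10-test battery, and about 14% would obtain two or more.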
In this study, we apply a new statistical method that circumvents these problems, to detect cognitive abnormality in newly diagnosed PD patients and to predict PDD at later follow-up. This method uses an aggregated normative database of neuropsychological tests (de Vent et al., 2016). Because the database contains data of co-normed neuropsychological tests, correlations between tests can be taken into account. This allows for a so-called multivariate normative comparison (MNC), which evaluates a patient’s entire profile of test scores. MNC can detect abnormal combinations of high and low scores in a score profile, which are easily overlooked in a traditional, univariate normative comparison (Crawford & Garthwaite, 2002; Huizenga et al., 2007; Su et al., 2015). The database contains information about demographic variables and thus allows simultaneous correction for age, sex, and level of education. By using regression-based demographic corrections, drastic changes in the interpretation of test scores, when moving from one to another norm table, are prevented. The new statistical method keeps the false positive rate under control, because it entails a single statistical comparison.
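The idea that a profile can be abnormal even when no single score is abnormal can be illustrated with two tests. The sketch below is not the ANDI implementation: it assumes two standardized test scores with a known normative correlation r, and it uses a large-sample chi-square approximation in place of the small-sample statistic used in the MNC literature. It is also nondirectional, whereas the comparisons in this study are one-tailed. All names and values are hypothetical.

```python
from math import exp


def mahalanobis_sq_2d(z1: float, z2: float, r: float) -> float:
    """Squared Mahalanobis distance of (z1, z2) from (0, 0) for two
    standardized tests whose normative correlation is r."""
    return (z1**2 - 2 * r * z1 * z2 + z2**2) / (1 - r**2)


def chi2_sf_2df(x: float) -> float:
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return exp(-x / 2)


z1, z2 = 1.2, -1.2  # hypothetical patient: neither score is below -1.5
for r in (0.0, 0.8):
    d2 = mahalanobis_sq_2d(z1, z2, r)
    print(f"r = {r}: D^2 = {d2:.2f}, p = {chi2_sf_2df(d2):.4f}")
```

When the two tests are uncorrelated, this profile is unremarkable. When they normally correlate strongly, one high and one low score is a rare combination and the multivariate distance becomes large, even though a univariate comparison would flag neither score.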
In this article, we will discuss (1) using an aggregate normative database instead of norm tables and (2) using it to make MNCs. To examine whether these approaches are a good alternative to traditional (univariate) normative comparisons, we compare their performance to that of the PD-MCI criteria with traditional norms. We use existing data from a longitudinal study conducted by our group (Broeders et al., 2013). First, we compare the ability of the traditional PD-MCI criteria to predict PDD after 3 and 5 years to that of the same PD-MCI criteria when applied with a large normative database of co-normed tests. Second, we compare the traditional PD-MCI criteria to the new MNC when applied with the same large normative database. Finally, we explore whether the new approach can give insight into which cognitive domains in particular are impaired in PD-MCI patients who progress to PDD compared to those who do not.
METHOD
PD Patients
Participants were 123 patients with newly diagnosed idiopathic PD according to the Gelb criteria who at baseline were younger than 85 years, were not demented, had no history of stroke, and had a score of at least 24 on the Mini-Mental State Examination (MMSE; Folstein et al., 1975). The diagnosis of idiopathic PD was checked at study inclusion and was reevaluated on two occasions by a neurologist specialized in movement disorders (Broeders et al., 2013; Muslimovic et al., 2005). Patients whose diagnosis was revised were not included in the group of 123 patients. After 3 years, the clinical status was missing for 26 patients. After 5 years, information was no longer available for another 24 patients (see Table 1). The institutional review boards of the participating hospitals approved the original study by Muslimovic et al. (2005) and Broeders et al. (2013) in accordance with the Helsinki Declaration, and all patients gave written informed consent.
Table 1. Sample characteristics of the PD group from Broeders et al. (2013)

PD with mild cognitive impairment
Broeders et al. (2013) applied the PD-MCI Level II criteria (Litvan et al., 2012) as follows: (1) Patient has a PD diagnosis; (2) there is gradual cognitive decline as reported by the patient or observed by the caregiver or the clinician; (3) there is a cognitive deficit on neuropsychological testing; and (4) cognitive deficits do not significantly interfere with functional independence. With respect to the first criterion, all patients in the sample were newly diagnosed PD patients; the diagnosis was checked by the study neurologists at follow-up. With respect to the second criterion, gradual cognitive decline reported by the patient was assessed by two questions, asking whether the patient experienced memory problems or concentration problems. If participants answered either question with “yes” or “sometimes”, this was recorded as experiencing subjective complaints. With respect to the third criterion, a score of 1.5 SD below the demographically corrected mean on at least two tests was considered a cognitive deficit. To compensate for the fact that gradual cognitive decline as observed by the caregiver or the clinician from Criterion 2 was not available in our data, patients could also be diagnosed with PD-MCI if they reported no subjective complaints but had impairments (of at least 1.5 SD) on four or more tests. The reasoning behind this choice was that such broad impairments would be noticed by caregivers (Broeders et al., 2013). This choice does mean that more patients can be classified as having PD-MCI, including patients without subjective cognitive complaints.
Because this choice was nonstandard for the PD-MCI criteria (Litvan et al., 2012), we reran the analyses post hoc for the Advanced Neuropsychological Diagnostics Infrastructure (ANDI) data with the stricter original rule, excluding patients without subjective cognitive complaints from the PD-MCI classification (see Supplement 3). With respect to the fourth criterion, patients were excluded if they had a score <24 on the MMSE (Folstein et al., 1975).
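The decision rule for the second and third criteria, as described above, can be summarized in a short sketch. The function and argument names are hypothetical, and treating a score of exactly −1.5 as impaired is an assumption, since the text does not specify how boundary values were handled.

```python
from typing import Sequence


def classify_pd_mci(
    z_scores: Sequence[float],
    has_subjective_complaints: bool,
    cutoff: float = -1.5,
) -> bool:
    """Apply the adapted rule: two impaired tests are required when subjective
    complaints are present, four when they are absent."""
    # Assumption: a score exactly at the cutoff counts as impaired.
    n_impaired = sum(1 for z in z_scores if z <= cutoff)
    required = 2 if has_subjective_complaints else 4
    return n_impaired >= required


# Hypothetical demographically corrected z scores on five tests.
profile = [-1.6, -2.0, 0.1, 0.4, -0.3]
print(classify_pd_mci(profile, has_subjective_complaints=True))   # True
print(classify_pd_mci(profile, has_subjective_complaints=False))  # False
```

The stricter rule used in the post hoc analysis corresponds to returning False whenever subjective complaints are absent.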
PD Dementia
PDD was used as the outcome variable. PDD at 3- and 5-year follow-ups was diagnosed by MDS criteria (Emre et al., 2007). Criteria were as follows: (1) a diagnosis of PD prior to the onset of dementia, (2) MMSE score lower than 24, (3) no depression as measured by the Hospital Anxiety and Depression Scale (HADS; Zigmond & Snaith, 1983), (4) cognitive deficits that are severe enough to interfere with daily functioning, as measured by the Behavioral Assessment of Daily Living (Collin et al., 1988), Schwab and England Scale (Schwab & England, 1969), and Functional Independence Measure (Van der Putten et al., 1999), and (5) an abnormal score on at least two of the following tests: clock drawing (Lezak et al., 2012), pentagon copying or serial 7s of the MMSE (Folstein et al., 1975).
Materials
PD patients were tested on five cognitive domains: memory, language, executive functions, visuospatial skills, and attention. All test variables from the Broeders et al. (2013) study were used except the Modified Wisconsin Card Sorting Test (Nelson, 1976) as its score distribution was extremely skewed, violating the assumptions of the parametric normative comparisons that are used throughout this article. We replaced it by the Tower of London (Culbertson & Zillmer, 1998) as an alternative test for the executive functions domain. An overview of the tests can be found in Table 2.
Table 2. Characteristics of the neuropsychological test variables in ANDI

RAVLT = Rey Auditory Verbal Learning Test, RBMT = Rivermead Behavioral Memory Test, BNT = Boston Naming Test, WAIS-III = Wechsler Adult Intelligence Scale 3rd edition, COWAT = Controlled Oral Word Association Test, TOL = Tower of London, JOLO = Judgment of Line Orientation, WAIS-R = Wechsler Adult Intelligence Scale Revised edition, TMT = Trail Making Test.
a As explained elsewhere (de Vent et al., 2016), an Akaike Information Criterion (AIC) selection procedure was used to determine which of the three demographic variables to include in regression-based demographic corrections. In this column, A, S, and E indicate whether age, sex, and level of education were included for each variable.
Normative Control Sample
For normative comparisons, we used either the published norms of each neuropsychological test or the database of the ANDI (de Vent et al., 2016). ANDI is an online tool that can be used by clinicians and researchers to conduct normative comparisons. ANDI has a large aggregated normative database (N = 26,635) which consists of healthy individuals who either participated as control subjects in clinical studies, or participated in community-based studies. Since each participant completed only a subset of the tests that are included in ANDI, the number of participants per test varies between 62 and 5017 depending on the test. Table 2 gives an overview of the tests used for the present analyses and of the number of participants per test. All scores in the ANDI database have been transformed to normality and standardized to demographically corrected z scores. For most test variables, age, sex, and level of education had a significant effect, and thus were included in the demographic correction (de Vent et al., 2016).
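A regression-based demographic correction of this kind can be sketched as follows. The sketch is illustrative only: the coefficients and residual standard deviation are made up, sex is assumed to be coded 0/1, and the transformation to normality that precedes the correction in ANDI is omitted.

```python
def demographically_corrected_z(
    score: float,
    age: float,
    sex: int,
    education: float,
    coef: dict,
    residual_sd: float,
) -> float:
    """Standardize a score against the value predicted for a healthy person
    with the same age, sex, and level of education."""
    predicted = (
        coef["intercept"]
        + coef["age"] * age
        + coef["sex"] * sex
        + coef["education"] * education
    )
    return (score - predicted) / residual_sd


# Hypothetical regression weights for one test variable (not ANDI values).
coef = {"intercept": 60.0, "age": -0.3, "sex": 1.5, "education": 2.0}

z_64 = demographically_corrected_z(40, age=64, sex=1, education=4, coef=coef, residual_sd=8.0)
z_65 = demographically_corrected_z(40, age=65, sex=1, education=4, coef=coef, residual_sd=8.0)
print(round(z_64, 4), round(z_65, 4))
```

Because age enters as a continuous predictor, one additional year of age shifts the corrected score only slightly, rather than producing the jump that can occur when a patient moves from one age band of a norm table to the next.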
PD-MCI Criteria Applied with ANDI’s Normative Data
In applying the PD-MCI criteria, Broeders et al. (2013) followed typical neuropsychological practice and used normative data from test manuals and various other sources to judge whether a patient deviated from the norm. Here, we applied the PD-MCI Level II criteria in the same way but now with the ANDI database instead of the normative data accompanying each test. A difference with the usual way of working is that the ANDI data have been treated in a consistent manner across all tests (de Vent et al., 2016): For each test, the same procedures were followed for outlier removal, test score standardization, and selection of transformations to normality. Another difference is that for many tests a larger normative sample is available. Instead of z-values, ANDI uses t-values because these have better statistical properties (Crawford & Garthwaite, 2002). As the threshold for abnormality of test scores, we used the clinical convention of 1.5 standard deviations below the demographically corrected mean, which corresponds to a p value of .067 one-tailed. Because tests were one-tailed, only deviations in the negative direction were classified as impaired.
Abnormality as Defined by MNC
Finally, MNCs were used (Huizenga et al., 2007) to assess abnormal profiles. MNC compares the profile of the patient’s scores to the norm, that is, to the profile of scores that is predicted for a healthy participant of the same age, sex, and level of education (Agelink van Rentergem et al., 2017a, 2017b). The MNC procedure results in a p value, which indicates abnormality when it is below a certain threshold. We tested for impairment (one-tailed), that is, only deviations in the negative direction were classified as impaired. In univariate comparisons, if no subjective complaints were present, we required four instead of two significant results. In the MNC, this adaptation is not possible, as only one statistical test result is obtained. Therefore, we used different threshold values for those with and without subjective complaints, to determine whether results were significant. For patients without subjective complaints, we used the same threshold of p < .067 (corresponding to −1.5 SD). For patients with subjective complaints, a less strict criterion was chosen, mimicking the procedure for the univariate comparisons. Therefore, we used a threshold of p < 2 × .067 (i.e., .134) one-tailed for this group.
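The two thresholds can be related back to standard deviation units with a small calculation. The sketch below uses the standard normal distribution as an approximation; because ANDI works with t-values, the exact correspondence depends on the size of the normative sample. The helper name is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()

# One-tailed p value that corresponds to a score 1.5 SD below the mean.
p_strict = std_normal.cdf(-1.5)

# Doubling the threshold, and the z score that the doubled threshold corresponds to.
p_lenient = 2 * p_strict
z_lenient = std_normal.inv_cdf(p_lenient)

print(round(p_strict, 3), round(p_lenient, 3), round(z_lenient, 2))


def mnc_alpha(has_subjective_complaints: bool) -> float:
    """Threshold for the MNC p value, following the rule described in the text."""
    return 2 * 0.067 if has_subjective_complaints else 0.067
```

Under the normal approximation, the more lenient threshold for patients with subjective complaints corresponds to a deviation of roughly 1.1 SD below the mean.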
Analysis
We calculated whether the classification of cognitive impairment at baseline is predictive of progression to PDD. Sensitivity and specificity were compared across these three methods. Sensitivity was calculated by dividing the number of patients who were classified as impaired at baseline and progressed to PDD by the total number of patients who developed PDD. Specificity was calculated by dividing the number of patients who were classified as “not-impaired” at baseline and did not progress to PDD by the total number of patients who did not develop PDD. This was done separately for the progression to PDD after 3 years, and after 5 years.
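As an illustration of these definitions, the sketch below computes sensitivity and specificity from a 2 × 2 table and adds an Agresti–Coull confidence interval for a proportion. The counts are hypothetical and are not the study data; the function names are ours.

```python
from math import sqrt
from statistics import NormalDist


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> tuple:
    """tp: impaired at baseline and progressed to PDD; fn: not impaired but progressed;
    tn: not impaired and did not progress; fp: impaired but did not progress."""
    return tp / (tp + fn), tn / (tn + fp)


def agresti_coull(successes: int, n: int, level: float = 0.90) -> tuple:
    """Agresti-Coull confidence interval for a binomial proportion."""
    z = NormalDist().inv_cdf(0.5 + level / 2)
    n_adj = n + z**2
    p_adj = (successes + z**2 / 2) / n_adj
    half_width = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - half_width), min(1.0, p_adj + half_width)


# Hypothetical counts for one method at one follow-up.
tp, fn, tn, fp = 8, 2, 60, 20
sens, spec = sensitivity_specificity(tp, fn, tn, fp)
print("sensitivity", sens, agresti_coull(tp, tp + fn))
print("specificity", spec, agresti_coull(tn, tn + fp))
```

With only a handful of patients progressing to PDD, the interval around sensitivity is wide, which is why small differences between methods are hard to establish.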
RESULTS
Demographic Characteristics
In Table 3, demographic and clinical characteristics of the patients are given, where the patients are separated into cognitively normal and abnormal categories using each of the three methods.
Table 3. Demographic and clinical characteristics for the three groups (PD-MCI criteria, ANDI PD-MCI criteria, and ANDI MNC) at baseline

MMSE = Mini-Mental State Examination; LED = Levodopa Equivalent Dose; UPDRS = Unified Parkinson’s Disease Rating Scale; H&Y = Hoehn & Yahr scale; HADS = Hospital Anxiety and Depression Scale; SE-ADL = Schwab & England Activities of Daily Living; BADL = Behavioral Assessment of Daily Living.
Progression to PDD
Figure 1 gives an overview of progression to PDD for each method evaluated. According to the criteria used by Broeders et al. (2013), at baseline, 35% of the PD patients had PD-MCI. After 3 years, 16% of the PD-MCI patients had progressed to PDD and 65% had not. Of the group who did not have PD-MCI, 3% of patients nevertheless had progressed to PDD and 75% had not (the remaining patients were lost to follow-up). After 5 years, 23% of those with PD-MCI at baseline had progressed to PDD while 32% had not. Of the group who did not have PD-MCI, 9% had progressed to PDD while 53% had not.

Fig. 1. Progression of PD patients (n = 123) to PDD after 3 (n = 97) and 5 years (n = 73) for the three methods: PD-MCI criteria, PD-MCI criteria applied with ANDI, and MNCs applied with ANDI.
The PD-MCI criteria applied with ANDI show that at baseline 27% of the patients had PD-MCI. After 3 years, 18% of the PD-MCI patients had progressed to PDD and 61% had not. Of the group who did not have PD-MCI, 3% had progressed to PDD and 55% had not. After 5 years, 24% of the PD-MCI patients had progressed to PDD and 24% had not. Of the group who did not have PD-MCI, 10% of patients had progressed to PDD while 53% had not.
The MNC method applied with the ANDI normative data shows that at baseline 26% of PD patients were considered to be MNC abnormal versus 74% who were not. After 3 years, 25% of the MNC abnormal PD patients had progressed to PDD and 50% had not. Of the group who were not MNC abnormal, 1% had progressed to PDD and 79% had not. After 5 years, 38% of the MNC abnormal PD patients had progressed to PDD and 19% had not. Of the group who were not MNC abnormal, 5% of patients nevertheless had progressed to PDD while 54% had not.
Figure 1 does not show how much the three diagnostic methods overlap at baseline. For example, the 32 patients classified as MNC abnormal could theoretically be different patients from the 33 classified as having PD-MCI using the ANDI method. The overlap in diagnoses between pairs of classification methods is explored in Supplement 2. The three methods did indeed differ somewhat in the patients they classified as impaired, although the percentages of agreement were high (78–87%) and kappa values ranged from .49 to .68.
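Percentage agreement and Cohen's kappa between two classification methods can be computed as in the sketch below. The classifications shown are hypothetical and serve only to show the calculation; the function name is ours.

```python
from typing import Sequence


def agreement_and_kappa(a: Sequence[bool], b: Sequence[bool]) -> tuple:
    """Observed agreement and Cohen's kappa for two binary classifications
    (True = impaired) of the same patients."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    p_a = sum(a) / n  # proportion classified as impaired by method A
    p_b = sum(b) / n  # proportion classified as impaired by method B
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)  # agreement expected by chance
    if expected == 1.0:
        return observed, 1.0  # both methods give every patient the same label
    return observed, (observed - expected) / (1 - expected)


# Hypothetical classifications of ten patients by two methods.
method_a = [True, True, True, False, False, False, False, False, False, False]
method_b = [True, True, False, True, False, False, False, False, False, False]
observed, kappa = agreement_and_kappa(method_a, method_b)
print(round(observed, 2), round(kappa, 2))
```

Kappa is lower than raw agreement because most patients are classified as unimpaired by both methods, so a substantial share of the agreement would be expected by chance alone.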
Sensitivity and Specificity
Sensitivity and specificity of the three methods are given in Table 4. After 5 years, sensitivity decreased for all methods but specificity remained similar or increased. However, the confidence intervals are large due to the small size of the samples that remained after attrition. Although differences between the sensitivities and specificities for the three methods did not reach significance (all ps > .099), there is a trend showing higher specificity of the MNC method.
Table 4. Sensitivity and specificity for progression to PDD of each method (original PD-MCI criteria, PD-MCI applied with ANDI, and MNC method applied with ANDI), specified for 3- and 5-year follow-ups. In parentheses: 90% confidence interval (Agresti & Coull, 1998)

Comparing the results of the original PD-MCI criteria to those obtained with the same criteria using ANDI shows a trade-off: sensitivity is higher for the original PD-MCI criteria, and specificity is higher for the PD-MCI criteria applied with ANDI. Therefore, there does not seem to be a clear advantage for either method.
MNCs as applied with ANDI fare better than the other two methods. At both the 3-year and the 5-year follow-ups, sensitivity and specificity were higher for the multivariate method than for the two univariate methods.
Cognitive Domains
We explored which cognitive domains were most often impaired in PD patients who were MNC abnormal, and whether there was a distinct profile for the patients who progressed to PDD. Figure 2 shows the mean demographically corrected z scores at baseline. Negative z scores indicate worse performance than the norm. From the figure, it can be observed that those who were MNC abnormal at baseline (solid lines) mainly showed impairment on the Rivermead Behavioural Memory Test and were slightly more impaired on the Trail Making Test (TMT) part A and the Wechsler Adult Intelligence Scale Revised edition (WAIS-R) Digit Symbol Coding task compared to those who were not MNC abnormal at baseline (dashed lines). Performance on the WAIS-R Digit Symbol Coding task seemed to be low for both those who were MNC normal and MNC abnormal at baseline. This is probably due to PD pathology affecting motor performance.

Fig. 2. Mean demographically corrected z scores for PD patients at baseline. Black lines indicate PD patients who progressed to PDD after 5 years. Gray lines indicate no PDD after 5 years. The solid line is MNC abnormal, and dashed is not MNC abnormal.
No clear difference is visible between those who progressed to PDD after 5 years (black lines), and those who did not (gray lines). From the figure, we can see that the MNC abnormality is primarily driven by the Auditory Verbal Learning Test (AVLT), Rivermead Behavioral Memory Test (RBMT), and letter fluency variables, as this is where the deepest troughs are for the MNC abnormal group (solid lines). If we look at just these tests, those who eventually do develop PDD (black solid line) are those with low scores on AVLT and letter fluency. Therefore, these tests seem to be most sensitive. The figure in Supplement 1 plots a line for every individual patient, and thus provides more detailed information on individual differences.
DISCUSSION
We investigated three methods for detecting cognitive abnormalities in PD patients that predict progression to PDD. We compared the predictive performance of the PD-MCI criteria, applied either with traditional normative data (Broeders et al., 2013) or with the ANDI normative database, to the performance of MNC using the ANDI database. We found that the number of patients diagnosed with PD-MCI at baseline differed between these methods. The original PD-MCI criteria as applied by Broeders et al. (2013) resulted in 35% of the PD patients being diagnosed with PD-MCI. Using the same criteria but with ANDI normative data, this decreased to 27%. The MNC method applied with ANDI concluded that 26% of the patients were cognitively abnormal at baseline. In the literature, the frequency with which cognitive impairments in PD patients are reported differs greatly between studies (probably due to differences in methodology and in sample characteristics, such as disease duration or severity). Studies with comparable methods to ours (1.5 SD deviating on at least 2 out of 10 tests) show that between 21% and 60.5% of PD patients are diagnosed as PD-MCI (Domellöf et al., 2015; Galtier et al., 2016; Gasca-Salas et al., 2014; Hobson & Meara, 2015; Janvin et al., 2003; Pedersen et al., 2017; Santangelo et al., 2015). The new MNC technique yields a number that lies at the low end of this range.
In terms of prediction of progression to PDD, the MNC method applied with the ANDI database performed best. Although classification accuracy was not statistically different between methods, both sensitivity and specificity were higher for the multivariate method than for the two PD-MCI criteria methods. This suggests that the improvement is mainly due to use of a multivariate statistical technique and not due to use of a large aggregated database. This was true for both the prediction of PDD after 3 and after 5 years. Between the two PD-MCI criteria methods, there was little difference in terms of accuracy. The PD-MCI criteria applied with ANDI resulted in a slightly lower sensitivity and a slightly higher specificity compared to the PD-MCI criteria as applied by Broeders et al. (2013). Just using the ANDI database instead of traditional norms therefore does not seem to improve prediction.
Previous studies that also used 1.5 SD as a cutoff score reported a sensitivity of the PD-MCI criteria for PDD ranging from .52 (Pedersen et al., 2017) to .92 (Gasca-Salas et al., 2014) and specificity ranging from .46 (Galtier et al., 2016) to .94 (Hobson & Meara, 2015). Therefore, the sensitivity and specificity estimates obtained with the MNC are at the high end of the spectrum.
For all three methods studied in this article, a decrease in sensitivity can be observed between the 3-year and 5-year follow-ups. An explanation would be that with a short period between baseline and PDD diagnosis, most patients who progressed to dementia were already rather severely impaired, leading to a high sensitivity. With more time between baseline and PDD diagnosis, some patients who progressed to dementia may have been unimpaired at baseline, leading to a lower sensitivity. Similarly, a small increase in specificity between the 3-year and 5-year follow-ups can be observed. This may be explained by the time it takes to progress to dementia: patients who are impaired at baseline may still not progress to dementia in the first few years after baseline, leading to a low specificity. As more time passes, however, patients who were impaired at baseline will probably progress to dementia, leading to an increase in specificity.
In our sample, it appeared that the patients who were MNC abnormal at baseline mainly experienced difficulties in memory and attention. These findings are in line with earlier studies, where PD-MCI patients showed problems with memory (Aarsland et al., 2009; Galtier et al., 2016; Janvin et al., 2003; Mamikonyan et al., 2009; Pedersen et al., 2017) and attention (Elgh et al., 2009; Mamikonyan et al., 2009; Pedersen et al., 2017). Difficulties in other domains, such as executive functions (Elgh et al., 2009; Janvin et al., 2003; Pedersen et al., 2017), language (Hobson et al., 2004), and visuospatial abilities (Janvin et al., 2003; Williams-Gray et al., 2007), are also reported in the literature but were not found in our sample. This could be due to the heterogeneity of the PD population. In our patient sample, newly diagnosed PD patients were examined.
Since disease duration determines to a large extent which cognitive functions are impaired (Hobson et al., 2004; Hughes et al., 2000; Litvan et al., 2011), our patients might eventually also develop difficulties in these other cognitive domains (Muslimović et al., 2009). One difficulty with studying cognition in a movement disorder like PD is that motor problems may present a confound, as many of the tests used to examine cognition require motor skills to some extent. In our sample, patients with MCI also had more severe motor problems [as evidenced by their Unified Parkinson’s Disease Rating Scale (UPDRS) and Hoehn & Yahr scores]. They are indeed somewhat impaired on tests that require motor skills, but they are equally impaired on tests that do not require motor skills, like memory tests. Therefore, we argue that MCI is primarily a reflection of cognitive problems, not motor problems, in this sample.
There are several limitations to our study. First, the number of patients was not very large (n = 123) and loss to follow-up was quite high (21% at 3 years, and another 25% at 5 years). However, the numbers lost to follow-up are not different between those cognitively normal or abnormal at baseline (the tables in Supplement 2 specify which patients were lost to follow-up). Because a formal test of a difference between rates requires a very large sample size if the prevalence of a disease is low (Carley et al., 2005), the power to detect differences between the sensitivity and specificity of the different methods was low, and confidence intervals were broad. We recommend that future studies either collect a much larger sample, or find a way to synthesize the literature to obtain a better estimate of these rates. Second, the age range in the current sample is quite broad (35–84 years at baseline). Consequently, the sample may have been heterogeneous with regard to cognitive functioning. Third, even though all cognitive domains prescribed by the PD-MCI criteria (Litvan et al., 2012) were studied, tests outside the traditional domains of neuropsychology have been shown to be informative in this type of clinical research. For example, it is known that decision-making in PD patients is impaired (Kobayakawa et al., 2008; Mimura et al., 2006). Therefore, future research could also include other domains of cognition when evaluating PD patients, to better represent the impairments with which they struggle.
In diagnosing PDD, we applied the MDS consensus criteria (Dubois et al., Reference Domellöf, Ekman, Forsgren and Elgh2007; Emre et al., Reference Elgh, Domellöf, Linder, Edström, Stenlund and Forsgren2007). This procedure did not entail an in-depth cognitive assessment, which is a limitation. Moreover, we used the MMSE which, since the start of the study in 2003, has been criticized and is no longer considered the optimal choice for this purpose (Hoops et al., Reference Janvin, Aarsland, Larsen and Hugdahl2009; Skorvanek et al., Reference Skorvanek, Goldman, Jahanshahi, Marras, Rektorova, Schmand, van Duijn, Goetz, Weintraub, Stebbins and Martinez-Martin2018). For future research in which cognition needs to be assessed in a time-constrained setting, we advise using other instruments such as the Montreal Cognitive Assessment (MoCA) (Nasreddine et al., Reference Muslimović, Post, Speelman, De Haan and Schmand2005), or the Parkinson’s Disease-Cognitive Rating Scale (PD-CRS) (Pagonabarraga et al., Reference Nasreddine, Phillips, Bédirian, Charbonneau, Whitehead, Collin, Cummings and Chertkow2008, p. 14).
In this study, no correction was made for premorbid IQ, even though IQ influences the scores that patients achieve on neuropsychological tests (Testa et al., Reference Testa, Winicki, Pearlson, Gordon and Schretlen2009). Corrections for premorbid cognitive functioning were not performed here, because they were not included in the original determination of PD-MCI by Broeders et al. (Reference Broeders, De Bie, Velseboer, Speelman, Muslimovic and Schmand2013), and were not available from the ANDI database. We recommend that future work take premorbid IQ into account, for example, by correcting scores using a reading test like the reading subtest of the Wide Range Achievement Test (WRAT) (Wilkinson & Robertson, Reference Wilkinson and Robertson2006) or the Test of Premorbid Functioning (TOPF) (Pearson Assessment, Reference Nelson2009). Furthermore, in this study, we have focused on just PD and progression to PDD. For future studies, we recommend that the influence of more disorders, such as Parkinson’s plus disorders and psychiatric disorders, be taken into account.
Subjective complaints were used in both the PD-MCI criteria and the MNC. Therefore, subjective complaints played a large role in determining the diagnoses in this study, while they were established using only two questions. Possibly, higher specificity and sensitivity would have been obtained, had we established subjective complaints more formally, for example, with a longer, validated questionnaire, ideally including reports by relatives, caregivers, and clinicians. Instead, for patients without subjective complaints, deviation on at least four neuropsychological tests was used as a criterion for PD-MCI, and a stricter criterion was used for MNC.
It is not easy to apply MNCs in daily clinical practice. We therefore developed the user-friendly ANDI website. Currently, ANDI is only applicable to the Dutch-speaking population. However, we provide it as an open-source infrastructure. It is possible to copy ANDI and recreate the system in other countries, which would only require a local (aggregated) normative sample. Doing so would make MNCs easier to apply in the clinical setting.
In sum, using the large aggregated data set of the ANDI database circumvents limitations found in standard norm-referenced testing. The MNC method worked at least as well as the more conventional approach of the PD-MCI criteria and trended toward better classification accuracy for predicting future dementia.
ACKNOWLEDGEMENTS
We have no conflicts of interest to disclose. This study was funded by The Netherlands Organization for Scientific Research (NWO) (project number 480-12-015).
SUPPLEMENTARY MATERIAL
To view supplementary material for this article, please visit https://doi.org/10.1017/S1355617719000298