INTRODUCTION
There is now ample evidence that patients with Parkinson's disease (PD), even in the absence of clinically apparent dementia, can exhibit selective impairments in several cognitive domains (Dubois & Pillon, 1997). Some of these impairments, such as deficits in executive function and memory, are already observed in the early stages of the disease (Muslimovic et al., 2005), whereas others, such as deficits in visuospatial abilities, seem to occur later in the disease process (Della Sala et al., 1986; Levin et al., 1991). Current knowledge about cognitive changes in PD is largely obtained from cross-sectional studies of patients at various stages of the disease. Although cognitive decline is considered to be an integral part of the natural course of PD, there is little information regarding the exact pattern and degree of changes in specific cognitive domains over time. Most longitudinal studies examining non-demented PD patients have focused on the incidence and predictive factors of dementia rather than on the nature of cognitive deterioration itself.
Studies addressing the progression of cognitive deficits in PD have produced varying results with respect to the affected functional domains and the severity of the reported changes. For example, Bayles et al. (1996) reported significant deterioration in global cognitive ability among a subset of PD patients two years after the initial assessment, whereas another study described fairly stable performance even after four years (Growdon et al., 1990). Similarly, executive functions have been found by some to deteriorate (Caparros-Lefebvre et al., 1995) and by others to remain stable (Ramirez-Ruiz et al., 2005). These discrepant findings may be explained in part by differences in the methods of patient selection, the cognitive tests employed, and by the generally small sample sizes, which might have reduced the statistical power to detect subtle changes. Moreover, some studies do not explicitly distinguish between demented and non-demented patients at the inclusion in the study, using instead an unselected population. In light of these inconsistencies and limitations, we felt that a meta-analysis of the published data was needed to clarify the pattern and extent of cognitive changes that accompany the progression of PD.
The primary aim of this study was to determine the magnitude and pattern of cognitive decline in initially non-demented PD patients by quantitatively combining data from previous studies using meta-analytic techniques. Studies included in this review adopted a longitudinal design, and the outcome measure in the current analysis represents the difference between patients' cognitive performance at initial evaluation, when no clinical indications of dementia were present, and their own scores at follow-up assessment. The neuropsychological tests were grouped into functional domains to aid in the interpretation of data. In addition, we evaluated the influence of demographic variables, such as age and education, and of clinical characteristics, such as disease duration, on the magnitude of cognitive changes. Furthermore, we assessed the effect of the length of follow-up.
METHODS
The data included in this study were obtained in compliance with the guidelines of the Helsinki Declaration.
Literature Search
A comprehensive literature search of Medline (1966–2006), PsychInfo (1972–2006) and Biological Abstracts (1993–2006) was conducted to identify articles for inclusion in the review. The keywords used in the search strategy were Parkinson disease, or Parkinson's disease in combination with cognition, or cognitive impairment, or memory, or executive function, or neuropsychological tests, and longitudinal studies, or prognosis, or progression. The search was completed in January 2006 and was limited to English-language articles. Articles were inspected for relevant references to locate additional studies for inclusion in the review.
Inclusion Criteria
To be included in the analysis, studies had to meet the following criteria:
- The diagnosis of idiopathic PD had to be made according to validated clinical criteria (e.g., United Kingdom Parkinson's Disease Society Brain Bank criteria [UKPDSBBC]; Gibb & Lees, 1988). Studies that did not explicitly report diagnostic criteria were still considered for inclusion if the diagnosis was made or confirmed by a board-certified neurologist.
- Patients had to be examined prospectively on at least two occasions with the same neuropsychological tests (i.e., a longitudinal design).
- Patients had to be free of clinical dementia at initial evaluation. Patients were considered to be demented if their diagnoses were made according to standardized clinical criteria (e.g., Diagnostic and Statistical Manual of Mental Disorders [DSM]; American Psychiatric Association, 1994), or if their performance fell below a traditionally employed cutoff on a screening measure for dementia. If the study sample comprised both demented and non-demented patients at the initial evaluation, the results of non-demented patients had to be analyzed separately in the original article in order to be included.
- At least one standardized neuropsychological test had to be employed as a dependent variable.
- Test scores had to be presented for the PD group both at baseline and at follow-up (mean and standard deviations), or other statistics had to be reported that could be converted to effect sizes (e.g. exact p-values or t values). In case this information was not originally reported, we contacted the study authors to obtain the relevant statistics.
In cases where different papers reported data concerning the same group of patients, the study with the largest sample or with the longest follow-up period (provided that the difference in the number of participants was not substantial) was included in the analysis.
Exclusion Criteria
Reports published only in abstract form were excluded. Clinical trials were excluded for two reasons. First, patients entered into trials may not be representative of the population with the disorder (Laupacis et al., 1994). Second, we wanted to examine the natural course of cognitive decline in PD. However, studies that included a control sample of PD patients who were assessed serially but who did not undergo an experimental treatment (neither active nor placebo) during the test-retest interval, were considered eligible. If the study sample comprised PD patients with evidence of delirium either at baseline or at follow-up, the results obtained from these patients were not included in the meta-analysis.
Outcome Measures
A number of cognitive domains were assessed across studies and many different tests were used to measure these domains. Classification of tests into functional domains was based on the detailed descriptions of task characteristics and the corresponding area of cognitive functioning described in the two standard textbooks of neuropsychological assessment, which are widely accepted both in clinical practice and in cognitive research (Lezak et al., 2004; Strauss et al., 2006). Tests from the individual studies were categorized into the following eight functional domains: global cognitive ability, verbal ability, memory, verbal fluency, mental flexibility and reasoning, attention and processing speed, visuoperceptual functions, and visuoconstructive skills. Appendix A lists the tests that were included in each cognitive domain.
Calculation of Effect Sizes and Statistical Analysis
From the data reported in each study, the effect-size estimate Hedges' g (Hedges & Olkin, 1985) was calculated, indicating the mean difference between baseline and follow-up divided by the pooled standard deviation. When means and standard deviations were not available, effect sizes were calculated from other reported statistics using the methods described by Rosenthal (1991). Cognitive test results obtained at different measurement points are expected to be dependent, and effect sizes need to be corrected for this correlation. However, none of the studies in the current meta-analysis reported the correlation between baseline and follow-up test scores. Therefore, these correlation coefficients were estimated from a few studies that provided raw scores, t-tests, or corresponding p-values (Azuma et al., 2003; Bayles et al., 1996; Caparros-Lefebvre et al., 1995; Katsarou et al., 1998; Ramirez-Ruiz et al., 2005). These studies employed follow-up intervals similar to the average follow-up period of the review (see Table 1). Separate correlation coefficients were calculated for each cognitive domain. Subsequently, these correlation coefficients were used to adjust the variance (V) for repeated measurements using the formula:
where sb and sf are the standard deviations at baseline and follow-up, respectively, r is the correlation coefficient between baseline and follow-up test scores, and n is the sample size.
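The expression itself does not appear in the text above. One form consistent with the symbols just defined, assuming the adjustment is the usual sampling variance of a mean difference between two correlated measurements (an assumption, not confirmed by the source), would be:

```latex
% Assumed form: variance of the mean baseline-to-follow-up difference
% for n patients measured twice, with test-retest correlation r.
V = \frac{s_b^{2} + s_f^{2} - 2\,r\,s_b\,s_f}{n}
```

Under this form, a higher correlation r between baseline and follow-up scores yields a smaller variance, so a study's effect size receives more weight than it would if the two measurements were treated as independent.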
If no exact values were reported for non-significant results, we adopted a conservative approach, and set the effect size for that measure at zero (Rosenthal, 1991). This method was applied for four outcome measures derived from two studies (Growdon et al., 1990; Hovestadt, 1990). If the study reported a range of the significance level rather than exact values, the p-value used to calculate the effect size was set only marginally lower (i.e., for p < .05 it was set at .049, for p < .01 at .0099, and for p < .001 at .00099). This was the case in only one study (Caparros-Lefebvre et al., 1995).
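The handling of incompletely reported results can be sketched as follows. This is a minimal illustration, not the procedure actually run in the review: the conversion route (p to Z, Z to r, r to a standardized difference) is one commonly described approach and is assumed here, and the function names and the `tails` parameter are hypothetical.

```python
import math
from statistics import NormalDist

# p-values substituted in the text when only an upper bound was reported.
MARGINAL_P = {0.05: 0.049, 0.01: 0.0099, 0.001: 0.00099}


def effect_size_from_p(p: float, n: int, tails: int = 2) -> float:
    """Convert a p-value and sample size into a standardized difference.

    Assumed route (not confirmed by the source):
    p -> Z, r = Z / sqrt(n), d = 2r / sqrt(1 - r^2).
    """
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    z = NormalDist().inv_cdf(1 - p / tails)
    r = z / math.sqrt(n)
    if abs(r) >= 1:
        raise ValueError("sample too small for this approximation")
    return 2 * r / math.sqrt(1 - r ** 2)


def effect_size_for_unreported_nonsignificant() -> float:
    """Conservative rule from the text: a non-significant result without
    exact statistics contributes an effect size of zero."""
    return 0.0
```

Because the substituted p-values sit just below the reported bound, the resulting effect sizes are close to the smallest values compatible with the reported significance level.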
From the effect sizes obtained in individual studies (i.e., g-values), a combined d-value was calculated, expressing the magnitude of cognitive decline across studies. This d-value was weighted for the sample sizes of the individual studies in order to correct for upwardly biased estimation of the effect in small samples (Hedges & Olkin, 1985). Separate pooled effect sizes were calculated for each of the eight cognitive domains. In addition, 95% confidence intervals (CI) were computed around each mean weighted effect size to provide an estimate of the variability of d and to test whether the d-value was significantly different from zero.
Because the studies included in the present analysis are diverse with regard to both clinical variables and methods of assessment of cognitive functions, heterogeneity in their results is to be expected. Therefore, a random-effects model was considered appropriate to generate the pooled effect sizes (Raudenbush, 1994).
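The pooling step can be illustrated with a short sketch. The analyses in this review were run in MetaWin; the code below instead uses the DerSimonian-Laird estimator of between-study variance, which is an assumption made for illustration, and the function name and inputs are hypothetical.

```python
import math


def pool_random_effects(effects, variances):
    """Pool per-study effect sizes under a random-effects model.

    Uses the DerSimonian-Laird estimate of the between-study variance
    (an assumed choice, not taken from the source).
    Returns (pooled effect, (ci_low, ci_high), tau_squared).
    """
    k = len(effects)
    w = [1.0 / v for v in variances]                # fixed-effect weights
    sw = sum(w)
    fixed = sum(wi * gi for wi, gi in zip(w, effects)) / sw
    q = sum(wi * (gi - fixed) ** 2 for wi, gi in zip(w, effects))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    w_star = [1.0 / (v + tau2) for v in variances]  # random-effects weights
    sws = sum(w_star)
    pooled = sum(wi * gi for wi, gi in zip(w_star, effects)) / sws
    se = math.sqrt(1.0 / sws)
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2
```

A pooled estimate differs significantly from zero at the 5% level when the returned interval excludes zero. When the estimated between-study variance is zero, the result coincides with the fixed-effects estimate.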
For both individual and pooled effect sizes, a positive direction indicated worsening of cognitive performance over time. In accordance with Cohen's guidelines (1988) for interpreting effect sizes, values between .20 and .50 were considered as small, values between .50 and .80 as medium, and values above .80 as large.
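The interpretation bands can be written as a small helper. The treatment of the exact boundary values and the label for values under .20 are assumptions, since the text only names the three bands; `cohen_label` is a hypothetical name.

```python
def cohen_label(d: float) -> str:
    """Label the magnitude of an effect size using the bands in the text.

    The sign is ignored here; in this review a positive value indicates
    worsening. Boundary handling (lower bound inclusive) is an assumption.
    """
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"
```

For example, `cohen_label(0.40)` returns `"small"`.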
In studies that used more than one measure to assess a particular cognitive domain, an averaged effect size was calculated for the final analysis. Thus, each study was allowed to contribute only one effect size for each functional domain. This strategy was used to prevent any single study from dominating the results.
In studies in which the sample was divided in subgroups on the basis of certain variables (e.g., presence vs. absence of depression, Hoehn & Yahr score), effect sizes were calculated from the data of the whole group. If such findings were not reported, the effect sizes were calculated for each subgroup separately and a weighted average of these effect sizes was used in the final analysis.
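The two aggregation rules above can be sketched as follows. The record layout and function names are hypothetical, and weighting subgroups by sample size is an assumption, because the text does not state which weights were used.

```python
from collections import defaultdict


def one_effect_per_study_domain(records):
    """Average multiple effect sizes so that each (study, domain) pair
    contributes a single value.

    records: iterable of (study, domain, g) tuples.
    """
    grouped = defaultdict(list)
    for study, domain, g in records:
        grouped[(study, domain)].append(g)
    return {key: sum(values) / len(values) for key, values in grouped.items()}


def weighted_subgroup_average(subgroups):
    """Combine subgroup effect sizes into one value for the whole sample.

    subgroups: list of (g, n) pairs; weights are the subgroup sizes
    (an assumed weighting scheme).
    """
    total_n = sum(n for _, n in subgroups)
    return sum(g * n for g, n in subgroups) / total_n
```

Averaging within a study before pooling keeps a study that administered many tests in one domain from counting several times in that domain's mean.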
To determine whether there was variation in effect sizes across studies, the chi-square statistic Q was calculated. The Q-statistic quantifies the degree to which the studies contributing to each respective mean effect size can be regarded as homogeneous. A significant Q-value indicates heterogeneity among studies contributing to the particular mean, in which case a further search for potential moderator variables is needed. It is recommended, however, to perform additional analyses even when the Q-statistic is not significant, to determine whether the magnitude of the effect sizes covaries with some attributes of the study (Rosenthal, 1991).
The Q-statistic is believed to have limited value in detecting true heterogeneity among studies because of its rather low power, particularly when a meta-analysis encompasses a relatively small number of studies. Therefore, an alternative index, I², indicating the degree of inconsistency in the study results, was calculated using the equation I² = (Q − df)/Q × 100%, where Q is Cochran's heterogeneity statistic and df is the degrees of freedom (the number of studies − 1). The I² index reflects the percentage of total variation across studies that is caused by heterogeneity rather than chance (Higgins et al., 2003). A value of 0% indicates no observed heterogeneity, and larger values show increasing heterogeneity. In the moderator analysis, the QW statistic indicates the degree of heterogeneity of studies within categories. The QB statistic refers to a difference between categories of the moderator variable and, if statistically significant, suggests the influence of this moderating variable. All analyses were performed using the statistical software MetaWin version 2.0 (Rosenberg et al., 2000).
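As an illustration of the two heterogeneity measures, the sketch below computes Cochran's Q from inverse-variance weights and then applies the equation given above. Function names are hypothetical; setting negative values to zero follows the usual convention for this index and is an assumption about how the review's software reported it.

```python
def cochran_q(effects, variances):
    """Cochran's Q: weighted sum of squared deviations of the study
    effect sizes from their inverse-variance weighted mean."""
    w = [1.0 / v for v in variances]
    mean = sum(wi * gi for wi, gi in zip(w, effects)) / sum(w)
    return sum(wi * (gi - mean) ** 2 for wi, gi in zip(w, effects))


def i_squared(q: float, k: int) -> float:
    """I-squared in percent, (Q - df) / Q * 100 with df = k - 1.
    Values that would be negative are reported as 0."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q * 100.0)
```

With Q = 10 across six studies (df = 5), the index is 50%; whenever Q does not exceed its degrees of freedom, the index is 0%, which corresponds to no observed heterogeneity.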
Publication Bias
Extracting data from published studies may bias results in favor of a significant mean effect size, because studies with non-significant results are less likely to be published (Rosenthal, 1991). To examine the possibility of publication bias, the fail-safe N was calculated for each mean weighted effect size to estimate the number of negative studies which would be necessary to render the results non-significant (Rosenthal, 1991). In addition, Rosenthal (1991) recommends the value of 5k + 10 (where k is the number of studies used to calculate the effect size) as a reasonable, conservative estimate of existing unpublished or unretrieved studies against which to test a fail-safe calculation. If the fail-safe N is large relative to this estimated number of negative studies one could be fairly confident that the observed result is a reliable estimate of the true effect.
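A minimal sketch of the two quantities follows. The fail-safe formula shown is the commonly cited one based on summed standard normal deviates at a one-tailed .05 criterion; whether the software used in this review computes it in exactly this way is an assumption, and the function names are hypothetical.

```python
def fail_safe_n(z_values, z_crit: float = 1.645) -> float:
    """Number of additional null-result studies needed to bring the
    combined result down to the significance threshold.

    Assumed formula: (sum of Z)^2 / z_crit^2 - k.
    """
    k = len(z_values)
    return sum(z_values) ** 2 / z_crit ** 2 - k


def tolerance_level(k: int) -> int:
    """Benchmark of 5k + 10 unpublished or unretrieved studies."""
    return 5 * k + 10


def robust_to_publication_bias(z_values) -> bool:
    """True when the fail-safe N exceeds the 5k + 10 benchmark."""
    return fail_safe_n(z_values) > tolerance_level(len(z_values))
```

A single study lying exactly at the threshold has a fail-safe N of zero, since no further null studies are needed to reach the criterion.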
Moderator Variables
We assessed the potential influence of demographic and clinical variables on effect sizes using categorical models. The demographic variables included age of the patients and years of education. Among clinical variables, only duration of disease was included in the analysis. The potential influence of other demographic and clinical characteristics (e.g. sex, severity of disease, medication) on effect sizes could not be evaluated due to a rather small number of studies reporting exact information for these parameters.
With regard to study characteristics, we were particularly interested in the effect of length of follow-up period. Therefore, we conducted additional analyses to examine the extent to which changes in cognitive functioning varied as a function of follow-up interval. For all variables included in moderator analyses, the division of groups was based on a median split.
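The median split used to form moderator categories might look like the sketch below. The study labels and values are hypothetical, and placing studies that fall exactly on the median in the lower group is an assumption, since the text does not say how ties were handled.

```python
from statistics import median


def median_split(values_by_study):
    """Divide studies into two groups at the median of a moderator.

    values_by_study: dict mapping a study label to the moderator value
    (e.g., mean age, years of education, follow-up length in months).
    Returns (cut point, lower group, upper group).
    """
    cut = median(values_by_study.values())
    lower = sorted(s for s, v in values_by_study.items() if v <= cut)
    upper = sorted(s for s, v in values_by_study.items() if v > cut)
    return cut, lower, upper


# Hypothetical mean ages for four studies.
cut, younger, older = median_split({"A": 60, "B": 65, "C": 70, "D": 72})
```

The pooled effect size can then be computed separately within each group and the two groups compared.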
RESULTS
Studies Retrieval
The literature search identified 85 longitudinal studies on PD, excluding those that examined the effects of treatment. Of these, three reports were published in abstract form; 12 studies reported data from (almost) the same samples; 11 studies included a number of patients who either met the clinical criteria for dementia or were suspected of having dementia at the initial assessment; 24 studies did not employ neuropsychological tests as the outcome measure; one study employed an uncommon test that was not comparable with other measures; and nine studies did not report relevant statistics to calculate effect sizes. Twenty-five of the studies that had been identified met criteria and were included in this meta-analysis. Demographic and clinical characteristics of each of these studies are shown in Table 1.
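The selection counts can be checked with simple arithmetic. The dictionary keys are shorthand labels introduced here for illustration; the numbers are those given in the paragraph above.

```python
identified = 85

# Reasons for exclusion and the number of studies affected by each.
excluded = {
    "published in abstract form": 3,
    "same or overlapping samples": 12,
    "dementia present or suspected at baseline": 11,
    "no neuropsychological outcome measure": 24,
    "uncommon, non-comparable test": 1,
    "statistics insufficient for effect sizes": 9,
}

n_excluded = sum(excluded.values())
included = identified - n_excluded
excluded_share = n_excluded / identified
```

The six exclusion categories sum to 60 studies, leaving the 25 included studies, so roughly seven in ten of the identified longitudinal studies were excluded.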
Participant and Study Characteristics
In total, neuropsychological test results from 901 initially non-demented patients with PD who underwent both baseline and follow-up assessment were recorded in the meta-analysis. As can be seen in Table 2, patients were on average 64.7 years of age at initial evaluation and were affected by the disease for approximately 8 years. The mean Hoehn and Yahr score (Hoehn & Yahr, 1967) indicated a moderate degree of motor disorder. From the studies that provided sufficient information, it appeared that the majority of patients were receiving anti-Parkinson's disease medication at baseline. Follow-up assessments ranged from 2.4 months to 8 years, with a mean test-retest interval of 29 months.
Thirteen studies employed a dementia screening measure, whereas in 12 studies a more comprehensive clinical examination was used to assess the presence of dementia at baseline. In three studies the presence of depression was an exclusion criterion both at baseline and at follow-up. The majority of studies (76%) included patients who remained non-demented at the follow-up evaluation. One study comprised non-medicated de novo patients at baseline, whereas the remaining studies included patients on anti-Parkinson's disease medication who were in later stages of disease. With regard to follow-up assessment, 88% of the studies utilized one follow-up measurement point, 8% of the studies used two measurement points, and in one study patients were examined ten times after the baseline evaluation.
Effect Sizes
A total of 154 effect sizes were extracted covering eight cognitive domains. Mean weighted effect sizes (d-values) for each domain are shown in Table 3. In five domains (verbal ability, verbal fluency, mental flexibility/reasoning, attention and processing speed, and visuoperceptual functions) effect sizes were negligible (d ≤ .10). For global cognitive ability, visuoconstructive skills, and memory the effect sizes were small in magnitude, but statistically significant (p < .05). The greatest degree of decline was observed in global cognitive ability (d = .40).
The Q statistic indicated no significant heterogeneity for any of the effect sizes. This finding is further supported by the I² index, which revealed no heterogeneity whatsoever for seven functional domains and only small variability across the studies that contributed data for global cognitive ability.
Publication Bias
For cognitive domains that showed significant decline, the fail-safe N and an estimate of unpublished, non-significant studies were calculated (Table 3). The fail-safe N of 473 for global cognitive ability exceeded the estimate of 105 existing unpublished studies reporting non-significant results (tolerance level), which suggests that the observed effect cannot be explained by publication bias. Similarly, the fail-safe N of 78 studies for the memory domain exceeded the tolerance level of 70, whereas only 30 studies (below the tolerance level of 45) would be required to reduce the findings regarding visuoconstructive skills to a non-significant level.
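The comparison above can be laid out explicitly. The sketch uses only the values reported in this paragraph; the implied number of studies per domain is derived by inverting the 5k + 10 rule and is therefore an inference that holds only if the tolerance levels were computed with that rule.

```python
# Fail-safe N and tolerance level as reported in the text.
reported = {
    "global cognitive ability": {"fail_safe_n": 473, "tolerance": 105},
    "memory": {"fail_safe_n": 78, "tolerance": 70},
    "visuoconstructive skills": {"fail_safe_n": 30, "tolerance": 45},
}


def implied_k(tolerance: int) -> int:
    """Invert tolerance = 5k + 10 to recover the number of studies k."""
    if (tolerance - 10) % 5 != 0:
        raise ValueError("tolerance is not of the form 5k + 10")
    return (tolerance - 10) // 5


summary = {
    domain: {
        "exceeds_tolerance": v["fail_safe_n"] > v["tolerance"],
        "implied_k": implied_k(v["tolerance"]),
    }
    for domain, v in reported.items()
}
```

Read this way, the result for global cognitive ability clears its benchmark by a wide margin, the memory result clears it only narrowly, and the result for visuoconstructive skills falls short.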
Influence of Moderator Variables
Older age at baseline was related to a greater degree of deterioration on measures of global cognitive ability and memory (Table 4). The finding that older age resulted in improved performance on tests of attention and processing speed over time is surprising. Closer inspection of studies that contributed data for this domain reveals that this finding is primarily because of one study (Dujardin et al., 2004), which observed fairly large improvement on two outcome measures (i.e., Stroop test color naming, Stroop interference test) in a sample of newly diagnosed PD patients. When this study was excluded from the analysis, no relationship was found between age and performance on tests of attention and psychomotor speed.
As shown in Table 5, educational level was associated with decline in all cognitive domains assessed. Specifically, fewer years of formal education were associated with a greater degree of decline in all domains. The influence of education was most pronounced on mental flexibility and reasoning (p ≤ .0001), attention and processing speed (p ≤ .001), and memory (p ≤ .01). The duration of disease was not related to change in any of the cognitive domains assessed (data not shown).
Table 6 displays the results regarding the effects of the length of follow-up period on the magnitude of decline across cognitive domains. Studies with follow-up intervals longer than 2 years yielded larger effect sizes for global cognitive ability (p < .05). The influence of age and the length of test-retest interval on visuoperceptual functions could not be assessed due to an insufficient number of studies.
DISCUSSION
The results of the present meta-analysis indicate that non-demented PD patients show a relatively small decline in cognitive functions during an average follow-up period of 2.5 years. Specifically, the magnitude of decline in the various cognitive domains ranged from 0 to 0.4 standard deviations. Among the cognitive domains analyzed, global cognitive ability, visuoconstructive skills, and memory showed statistically significant declines. In all domains the homogeneity between studies was acceptable, which suggests that the reported effects can be considered reliable.
Our findings suggest that there are subtle differences in the extent of decline across the various cognitive domains. The most prominent changes over time were detected on global dementia screening measures (e.g., MMSE), which summarize multiple aspects of cognitive function. One would therefore expect to find corresponding changes in several other cognitive domains. This was indeed the case for memory and visuoconstructive skills. In contrast, changes in the remaining domains were negligible. However, one should be cautious when comparing effect sizes. For example, effect sizes found for global cognitive ability were based on a sample of 702 patients, whereas effects for all other domains were based on considerably fewer subjects. Furthermore, it is likely that some heterogeneity exists with regard to the individual trajectories of cognitive decline. In some patients, for example, executive function may be the first domain to show decline, whereas in others memory may deteriorate first. Consequently, effect sizes are smaller for specific cognitive domains and more pronounced on global cognitive measures, because the latter tests are likely to detect changes that occur in any of the specific domains.
In light of the generally held clinical view that PD is accompanied by progressive cognitive impairment, our finding of a minimal decline in a number of cognitive domains is puzzling. Perhaps the most remarkable result of our meta-analysis is the absence of clear evidence for deterioration in executive function, whereas it is widely believed that this type of decline occurs already in the early stages of the disease (Levin & Katzen, 2005). Executive dysfunction should therefore have been reflected in relatively large effect sizes for verbal fluency, mental flexibility and reasoning, and attention and processing speed. One possible explanation for this lack of decline in executive abilities is that it occurs in the very early stage and then tapers off during the mid stages of the disease. The patients in our analysis, having a mean disease duration of 7.7 years, were in a mid stage. However, comparison of baseline scores on tests of executive functions with normative data (Strauss et al., 2006) revealed that PD patients in the present review exhibited only moderate decrements (mean z-score = −0.6), whereas impairments of attention and processing speed were much more prominent (z-score = −1.1). Effects of medication might in part account for this finding, since there is some evidence that levodopa treatment may improve executive functions of PD patients (Lange et al., 1992).
Another explanation for the present results could be a selection bias. Loss to follow-up was considerable in some of the studies. In general, the oldest and cognitively most compromised patients are the ones to drop out, causing reductions of effect sizes in this type of research (Levin et al., 2000). This is underscored by the fact that only four studies (Bayles et al., 1996; Ebmeier et al., 1990; Stern et al., 1998; Tachibana et al., 1995) comprised patients with evidence of dementia at follow-up evaluation. A separate analysis of these studies revealed a mean effect size of 0.46 for global cognitive ability, whereas studies with non-demented patients yielded a mean effect size of 0.38. Moreover, only one study (Stern et al., 1998) including demented PD patients utilized a comprehensive neuropsychological test battery. Individual effect sizes generated from that study were in the small range (g = .1–.4) for visuoperceptual functions, naming ability, abstract reasoning and verbal fluency, whereas moderate to large declines (g = 0.5–1.5) were observed on constructive skills, global cognitive functioning and various aspects of memory. Thus, it seems likely that the absence of dementia in the majority of samples used in the present meta-analysis has led to smaller effect size estimates for some cognitive domains. Other meta-analyses that cross-sectionally compared neuropsychological test performance of PD patients and healthy control subjects have repeatedly demonstrated that impairments in the PD group are particularly robust when there is evidence of dementia (Henry & Crawford, 2004; Zakzanis & Freedman, 1999).
There are some methodological issues that might have affected our findings. This meta-analysis included longitudinal studies that used neuropsychological tests as dependent variables. An important issue related to repeated neuropsychological testing is practice or retest effects. With repeated exposure to the same task, novelty of testing is likely to diminish. Consequently, implicit knowledge of test demands and a reduction of potential anxiety associated with the examination procedure may enhance the subsequent test performance. These factors might have obscured real decrements in cognitive functions. Observations from nonclinical populations (Basso et al., 1999; Dikmen et al., 1999) suggest that measures with a problem-solving component or measures requiring generation of a strategy (e.g., Wisconsin Card Sorting Test) tend to be particularly susceptible to practice effects. Thus, it is possible that the absence of change in some aspects of executive function may be an artifact of practice effects compensating for real decline. Methods to control or reduce practice effects include the use of control groups, the use of alternate forms of tests, and longer test-retest intervals (McCaffrey & Westervelt, 1995).
In the present meta-analysis, only seven studies (Aarsland et al., 2004; Azuma et al., 2003; Bayles et al., 1996; Firbank et al., 2005; Hayashi et al., 1996; Katsarou et al., 1998; Serrano-Dueñas & Bleda, 2005) utilized a healthy control group. This was too small a number to allow comparison of cognitive decline in PD with normal cognitive aging (see also remarks below). The effect size for global cognitive ability obtained from studies that included a control group (d = .45) was similar to that for studies without control groups (d = .38). Furthermore, only three studies (Schmand et al., 2000; Smeding et al., 2006; Stebbins et al., 2000) used alternate forms of some tests. With regard to the length of test-retest interval, 40% of the studies conducted follow-up assessment after a rather short interval of 12 months or less after baseline. Thus, few investigators have attempted to minimize the impact of practice effects. Future research in this field should be conducted more carefully and take the potentially confounding effects associated with repeated testing into account.
Another issue concerns the statistical model applied in the present analysis. We used the random-effects meta-analytical model rather than the more commonly employed fixed-effects model to obtain estimates of the mean effect sizes. Random-effects models are based on the assumption that, in addition to sampling error, true heterogeneity among the estimated population values is another source of variability that contributes to the observed differences in effect sizes across samples (Raudenbush, 1994). In comparison to fixed-effects models, random-effects analyses yield wider confidence intervals around the average effect size, thereby reducing the likelihood of committing a type I error. Random-effects analyses are therefore considered more conservative than fixed-effects models (Cohn & Becker, 2003). Because we anticipated heterogeneous results across studies, we reasoned that a random-effects model would be more appropriate to calculate the pooled effect sizes. The advantage of random-effects analyses, however, is that they yield more generalizable parameter estimates, which extend beyond the studies included in the review (Cohn & Becker, 2003).
In order to better understand the relevance of the current findings, it is important to consider them in relation to the cognitive changes in normal ageing. In the present review, we could not reliably analyze cognitive decline in healthy elderly subjects because only a few studies used a control group. A recent systematic review of the literature suggests that there is almost universal cognitive decline in the general elderly population, although the exact magnitude of this decline could not be determined (Park et al., 2003). However, studies on cognitive ageing in healthy subjects of comparable age to our PD sample and over a comparable follow-up interval, in general, suggest stability of cognitive performance. For example, a population-based study of elderly subjects using the MMSE as outcome measure showed no cognitive decline in younger age groups (65–69 years) three and five years after the initial assessment (Jacqmin-Gadda et al., 1997). One of the studies (Azuma et al., 2003) reviewed in the present analysis included a control group comprising subjects with comparable mean age (68 years), who were re-tested after approximately the same period of time (26 months) as the patients in this review. Performance on only one measure deteriorated significantly, whereas the scores on the remaining tests remained the same or improved. Thus, decline of cognitive functions in non-demented PD patients, albeit small, appears to be more pronounced than in healthy elderly subjects. The extent of cognitive changes (d ≤ .40) in non-demented PD patients, however, if present, may be difficult to detect with small samples or with only clinical observation.
Although PD patients generally seem to show small changes in cognitive function over time, it is possible that certain subgroups of patients exhibit more profound and obvious deterioration. It has, for example, been suggested that later onset of disease (Reid, 1992) and the presence of depressive symptoms at baseline (Starkstein et al., 1992) are associated with greater cognitive decline. Although the moderator analyses in the present review should be interpreted cautiously because of the small number of contributing studies, there are some interesting findings regarding the influence of demographic variables and study characteristics. Older age was found to be associated with a greater degree of decline in global cognitive ability and memory. Although the effect of age was relatively small, it is consistent with previous studies examining the relation between demographic features and cognitive changes in PD (Palazzini et al., 1995; Portin & Rinne, 1986). Furthermore, our results indicate that education significantly influences changes in cognitive performance of PD patients over time. This suggests that high educational attainment may exert a protective effect against cognitive decline in PD, which is consistent with the cognitive reserve hypothesis (Stern, 2002). It has been reported before that low educational attainment is an important risk factor for the development of dementia in PD (Glatt et al., 1996). The present findings provide further evidence of the influence of education on decline in a number of major cognitive domains in PD. Our results show that duration of disease was not associated with the magnitude of cognitive changes.
Effect sizes for global cognitive ability were found to be greater in studies that employed longer follow-up intervals. This finding is consistent with the clinical experience that the progression of PD is rather slow, and corroborates the validity of the analysis. The apparent lack of effect of the follow-up interval on the remaining aspects of cognitive function is probably not a reliable finding, due to the small number of studies contributing to some analyses (e.g., verbal ability, attention and processing speed, and constructive skills).
This meta-analysis has several limitations, which are primarily related to the current state of the literature. First, although the number of patients in most comparisons was of reasonable size, a relatively small number of studies were included in the analyses of some domains. Second, most studies have focused on only one or two cognitive domains. Consequently, the analyses of potential moderating variables were typically based on a small number of studies. In addition, only a few studies have provided information about disease characteristics (e.g., severity of specific motor symptoms, medication, scores on depression rating scales), which prevented us from assessing the moderating effects of these variables. Third, when the same research group published multiple papers, it was not always clear whether and to what degree the samples overlapped. This may have introduced bias in our analysis. However, in cases where there were indications that two or more studies were drawn from the same or highly overlapping samples, only the article with the largest sample was included in the present review. Fourth, the classification of tests into cognitive domains is arbitrary to some degree. Fifth, about 70% of the longitudinal studies identified by the literature search did not meet the inclusion criteria for the present review. Closer inspection of demographic and study characteristics revealed that excluded studies had on average a longer follow-up interval (42 versus 29 months) and a shorter disease duration (5.1 versus 7.7 years) than studies included in the review. In light of our finding that a longer follow-up period was associated with a greater degree of cognitive decline, it could be argued that the exclusion of these studies might have resulted in some underestimation of the effect sizes. No differences were found between included and excluded studies with respect to the age and educational level of the patient samples.
Finally, several studies have recruited patients from tertiary care clinics specialized in movement disorders. Therefore, it remains uncertain whether the present findings can be generalized to the PD population at large.
In conclusion, our quantitative review indicates that in non-demented PD patients, changes in cognitive functions over time are generally quite subtle. However, some methodological issues, such as the possibility of selection bias and the impact of practice effects, should be taken into consideration when interpreting the current findings. Clearly, more prospective studies are needed before firm conclusions can be drawn on the evolution of cognitive changes in PD. Future studies should include larger numbers of patients and employ comprehensive neuropsychological test batteries that evaluate various aspects of cognitive functioning. To determine unequivocally the nature and magnitude of cognitive decline, they should also include a control group re-tested at the same interval and use alternate test forms. Finally, future studies should attempt to measure the effects of potential moderator variables, since these may be important in understanding the variability in the progression of cognitive changes in PD.
ACKNOWLEDGMENTS
This work was funded by the Prinses Beatrix Fonds (Grant PGO 01-0138 to B. Schmand). There are no financial or other relationships that may create a conflict of interest. The authors thank R.P.C. Kessels, PhD, for helpful comments on statistical analyses.