INTRODUCTION
There is an established mindset in clinical neuropsychology that demographic adjustments to test scores improve the accuracy of inferences based on neuropsychological assessment (American Academy of Clinical Neuropsychology Board of Directors, 2007; Ivnik, 2004; Lezak, Howieson, & Loring, 2004). For inferences regarding the presence of acquired impairment from brain injury or neurological disease, this emphasis on demographic adjustments is well supported (Lezak et al., 2004). However, sometimes the issue at hand is the adequacy of neuropsychological abilities for real-world activities. Many real-world activities such as paying bills, finding one’s way around town, job performance, and driving safely require the same level of cognitive abilities from anyone, regardless of their demographic characteristics. Adjusted scores are informative regarding whether patients have declined from their premorbid level of ability, but for the purpose of predicting competence in real-world activities dependent on those abilities, more accurate inferences may be based on unadjusted, raw scores reflecting the individual’s current level of ability. Prior to publication of large normative studies in the 1990s, studies predicting real-world functioning almost invariably employed unadjusted scores, except for studies using the Wechsler Adult Intelligence Scale (WAIS) tests. In their manual for demographically adjusted norms, Heaton and colleagues (2004) recommend the use of unadjusted scores for predicting everyday functioning. When predicting competence for real-world activities that require the same level of ability from all, adjusting scores for demographic factors may diminish the accuracy of inferences.
This proposition has received little empirical study. In a landmark investigation, Silverberg and Millis (2009) studied 52 adults approximately one year after traumatic brain injury (TBI). Performances on seven neuropsychological tests were converted to both absolute and adjusted scores. Absolute scores were referenced to a normative sample reflecting the general adult population. Adjusted scores were based on regression models derived from healthy subjects accounting for age, education, gender, and ethnicity. Overall test battery mean scores were calculated separately for absolute and adjusted scores. Outcome measures were participant or caregiver ratings for independence of living situation, community ambulation, employment, and global psychosocial functioning. Approximately one-third of the TBI participants showed nontrivial discrepancies between absolute and adjusted scores. When discrepancies were present, absolute scores were the better predictor for seven of eight outcome measures, and strongly so for three of the seven: employability and two global ratings of real-world functioning. Although demonstrating the superiority of absolute scores for predicting real-world capabilities in a demographically heterogeneous sample of individuals with TBI, the authors noted limitations and called for future research with larger and demographically distinct samples, other clinical conditions, additional outcome measures, and more reliable outcome measures.
The present study was designed to address limitations of the Silverberg and Millis (2009) study by investigating the effect of demographic adjustments on prediction of another important aspect of real-world functioning, driving. As the population of North America ages and dementia becomes more prevalent, the increasing number of elderly drivers with cognitive impairment presents a growing public health problem (Dawson, Anderson, Uc, Dastrup, & Rizzo, 2009a). However, accurate identification of unsafe drivers is important to avoid unnecessarily restricting the mobility and freedoms of competent drivers, a challenge heightened by the fact that not all patients with dementia are unsafe drivers (Hunt, Morris, Edwards, & Wilson, 1993; Tallman, Beattie, & Tuokko, 1994).
Withaar and colleagues (2000) reviewed studies of on-road driving in older adults with cognitive decline and concluded that tests of visual scanning, visual attention, and visual perception, especially the Trail-Making Test, are the most robust predictors of crashes and moving violations. Reger and colleagues (2004) conducted a meta-analysis of 27 empirical studies, separately for three types of driving assessment (on-road, off-road/simulator, and caregiver report), noting that on-road tests are generally considered the “gold standard.” Tests of visuospatial skills bore the strongest relationship to on-road driving performance, followed by tests of attention/concentration. Tests of mental status, language, and executive functions were only weakly related or unrelated. (There were insufficient studies to include memory tests in this analysis.) They concluded that tests of visual processing skills may be most helpful for identifying at-risk drivers.
Those observations have been borne out in more recent studies showing that, among patients with mild dementia, tests predicting on-road driving performance included the Trail-Making Test (Dawson et al., 2009a; Grace et al., 2005; Uc et al., 2007, 2009; Whelihan, DiCarlo, & Paul, 2005), the Rey Complex Figure Test copy condition (Dawson et al., 2009a; Grace et al., 2005), the Benton Visual Retention Test (Dawson et al., 2009a), and the Useful Field of View (Uc et al., 2009; Whelihan et al., 2005).
In the present study, we examined the accuracy of raw versus demographically adjusted test scores for prediction of driving ability on a prespecified, standardized route among older drivers with very mild Alzheimer’s or Parkinson’s disease, and healthy elderly controls. The clinical groups were selected for very mild cognitive deficits, because it is within these groups that identification of no-longer-safe drivers is especially challenging, but particularly important (Tallman, 1992). To address limitations of the Silverberg and Millis (2009) study, the sample is larger, with increased clinical heterogeneity. The outcome measure is a performance-based, reliable measure of driving ability. It was hypothesized that driving performance can be predicted with raw scores from relevant neuropsychological tests, and that the accuracy of prediction is diminished when scores are adjusted for demographic factors.
METHODS
Participants
Participants included 24 neurologically normal elderly controls (NC), 26 individuals with probable Alzheimer’s disease (AD), and 33 with Parkinson’s disease (PD). All were community-dwelling, independent-living, active drivers; were age 65 or older; and had a score of 26 or higher on the Mini-Mental State Examination (MMSE; Folstein, Folstein, & McHugh, 1975). Classification as an “active driver” required a valid license and weekly driving of ten miles or more. NC participants were volunteers recruited from the community (primarily by advertisement at senior centers), had no neurological diagnosis or complaints, and had family/collateral confirmation of the absence of abnormal cognitive decline. These were not matched groups as the study is not concerned with comparing their driving performance; rather, participants were a consecutive series of group members in the database with completed protocols and meeting criteria for the present study. AD participants were recruited from a registry maintained by the Department of Neurology, University of Iowa Carver College of Medicine. All were initially determined to have significant cognitive impairments and significant functional decline consistent with probable AD by clinical neuropsychological evaluation in the Benton Neuropsychology Laboratory (Tranel, 2009) prior to their identification as potential research participants, and admittance to the registry required meeting NINCDS-ADRDA criteria (McKhann et al., 1984) for diagnosis of probable Alzheimer’s disease as determined by a board-certified neurologist. PD participants were recruited from the Movement Disorders Clinics of the University of Iowa Hospitals and Clinics and the Iowa City Veterans Affairs Medical Center, with mild-moderate disease severity determined by the Hoehn-Yahr score.
PD patients were tested during their “on” periods (the period after taking medication during which PD symptoms such as tremor, rigidity, and bradykinesia are responding well, before the therapeutic effect begins waning). The sample was Caucasian other than two participants (a proportion that approximates the population of the state of Iowa): one was an African-American with AD and one a biracial individual with PD; both were highly similar to other members of their clinical group on characteristics presented below. Further details regarding these groups are available elsewhere (Dawson et al., 2009a; Uc et al., 2009).
Exclusion criteria included history of epilepsy, stroke, neoplasm, sleep disorders, depression, anxiety, schizophrenia, alcohol or substance abuse, diabetes, liver or kidney disease, severe arthritis, cancer, or motion sickness, or present treatment with cognitive-dulling medications. Inclusion and exclusion criteria were carried over from prior studies (Dawson et al., 2009a; Uc et al., 2009), with the exception of the criterion of MMSE of 26 or higher, a criterion added to provide a more stringent test of the study hypotheses (elaboration in Discussion). The study was approved by the University of Iowa Institutional Review Board, and informed consent was obtained in accord with federal guidelines for human subjects’ safety and confidentiality.
Materials and Procedure
Neuropsychological measures
Five neuropsychological measures were selected for (a) their sampling of visual scanning, visual information processing, and attention, areas found to correlate significantly with driving performance (Dawson et al., 2009a; Reger et al., 2004), and (b) being in widespread use in clinical practice. These included the Trail-Making Test A and B (TMT-A and TMT-B) seconds to completion; the Complex Figure Test, copy (CFT); the Benton Visual Retention Test, errors (BVRT); and the Wechsler Adult Intelligence Scale-III Block Design Test (BD). All tests are described in detail elsewhere (Lezak et al., 2004). Raw scores and demographically adjusted scores were obtained for each test. Raw scores were employed rather than the absolute scores used by Silverberg and Millis (2009) because the analyses were conducted with linear regression, which imposed no requirement for variables to be normally distributed or on a common metric. Additional transformation of scores would therefore have conferred no benefit, and would have yielded scores that are not necessarily available to clinicians. Demographically adjusted scores are expressed in T scores, calculated by referencing scores to published normative data adjusted for demographic factors as available from the following sources: TMT (Heaton, Grant, & Matthews, 1991), CFT (Van Gorp, Satz, & Mitrushina, 1990), BVRT (Benton, Eslinger, & Damasio, 1981), and BD (Wechsler, 1997). Regarding TMT, it is noted that normative data from the Heaton et al. system are only available up to age 80, so adjustments for the 75–80 age band were applied to ten subjects over age 80. The Heaton et al. (1991) normative system adjusts TMT scores for age, gender, and education; the normative data sets for the other tests correct only for age.
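The T-score metric mentioned above can be illustrated with a minimal sketch. It assumes the simplest case, in which a raw score is standardized against the mean and standard deviation of a demographically matched reference group and rescaled to mean 50, SD 10. The function name, the norm values, and the sign handling for timed tests are hypothetical; the cited normative systems may rely on lookup tables or regression-based corrections rather than this formula.

```python
def t_score(raw, norm_mean, norm_sd, higher_is_better=True):
    """Convert a raw score to a T score (mean 50, SD 10) against a reference group.

    norm_mean and norm_sd describe the reference group (for example, an age band).
    For tests where a lower raw score means better performance (such as seconds
    to completion), the z score is sign-flipped so a higher T is always better.
    """
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    z = (raw - norm_mean) / norm_sd
    if not higher_is_better:
        z = -z
    return 50.0 + 10.0 * z


# Hypothetical norms for illustration only (not taken from any published table):
# a completion time of 60 s against a reference mean of 45 s and SD of 15 s.
example_t = t_score(raw=60.0, norm_mean=45.0, norm_sd=15.0, higher_is_better=False)
print(example_t)
```

Under these made-up norms the completion time is one standard deviation slower than the reference mean, so the T score falls one SD below 50.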
Driving ability
An instrumented vehicle known as ARGOS was employed (Rizzo, McGehee, Petersen, & Dingus, 1997), and driving performance was assessed with an on-road driving test, a procedure with established validity (Reger et al., 2004). Participants drove an approximately 45-minute standardized route around Iowa City in daylight, excluding particularly challenging weather conditions. A trained researcher classified errors from the passenger seat. The researcher was usually blind to diagnostic group and neuropsychological test results, although group membership of PD participants with manifest motor symptoms would be unavoidably evident. Additionally, video of the drive was collected and reviewed a minimum of two times to assess interrater reliability (Dawson et al., 2009b). Driving ability was operationalized as the total number of errors made during the drive, with errors classified according to Iowa Department of Transportation definitions. There were 76 error types in 15 categories, such as starting and pulling away from the curb, traffic signals and signs, turns, lane observations, overtaking another vehicle, control of speed, reverse driving, parking maneuvers, curves, and miscellaneous events. Further details of the road test and scoring system are available elsewhere (Dawson et al., 2009b; Rizzo et al., 1997).
RESULTS
Background characteristics of NC, AD, and PD are presented in Table 1 for informational purposes only; adjustments were not made for background differences between groups, because the study goal was not to compare these groups but to investigate the effect of demographic adjustments on predictive accuracy in a clinically heterogeneous sample. Study groups differed in gender composition and in age, but did not differ in years of education, MMSE, or driving errors. AD had worse scores than NC (but not PD) on four measures, and PD had worse scores than NC (but not AD) on all five measures. Errors driving along the standardized route were rated with high intrarater and interrater reliability: .95 and .73, respectively (both p < .001). Driving errors were not significantly related to age in this elderly sample (r = .18, p = .101).
Table 1. Demographic, cognitive and driving characteristics of the study sample
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160714011907-06099-mediumThumb-S1355617710000470_tab1.jpg?pub-status=live)
Note
MMSE = Mini-Mental State Examination; TMT-A = Trail-Making Test, Trail A; TMT-B = Trail-Making Test, Trail B; CFT = Complex Figure Task, Copy; BVRT = Benton Visual Retention Test, errors; BD = Wechsler Adult Intelligence Scale-III, Block Design Test. Values reflect means and standard deviations, except for gender. AD = Alzheimer’s disease; PD = Parkinson’s disease; NC = normal elderly controls.
a Differences between study groups are tested by one-way analysis of variance (ANOVA), with the exception of gender, which is evaluated by chi-square with two degrees of freedom.
b Between-group differences on neuropsychological scores were evaluated post hoc with Fisher’s protected least significant difference (LSD) test.
c AD and PD performed worse than NC.
d PD performed worse than NC and AD.
e Driving errors are as defined by the Iowa Department of Transportation for assessment of driving performance.
Preliminary analyses indicated reasonably normal distributions of independent and dependent variables, and least-squares multiple linear regression was used to test hypotheses. As the study was concerned with comparing the predictive accuracy of neuropsychological measures in general when demographic adjustments are or are not applied (rather than comparing the predictive value of specific measures), all five measures were entered as a block in two corresponding predictive models, one with raw scores, one with demographic adjustments. These analyses revealed that the model with raw scores from all five neuropsychological measures was significantly predictive of driving errors (R² = .199, F = 3.68, p = .005). A parallel regression analysis with demographically adjusted scores did not account for a significant proportion of the variance in driving errors (R² = .113, F = 1.87, p = .107). A direct comparison of the R² values from the two predictive models was performed with the Z statistic, as described by Olkin (Kleinbaum, Kupper, Nizam, & Muller, 2008), to test for the significance of the difference between correlation coefficients. This statistic, Z = 2.09 (p = .037), indicated that the model with raw scores accounted for a significantly higher proportion of variance in driving errors than did adjusted scores.
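For readers who want to see how an overall F statistic relates to R² when k predictors are entered as a single block, the standard identity is F = (R²/k) / ((1 − R²)/(n − k − 1)). The sketch below applies it in Python. The sample size passed in is an assumption: 83 participants were enrolled, but the number of complete cases entering each model is not stated here, so the output need not reproduce the reported F values exactly.

```python
def f_from_r2(r2, n_obs, n_predictors):
    """Overall F statistic for a linear regression, computed from R-squared.

    Returns (F, df_numerator, df_denominator).
    """
    df1 = n_predictors
    df2 = n_obs - n_predictors - 1
    if df2 <= 0:
        raise ValueError("need more observations than predictors + 1")
    if not 0.0 <= r2 < 1.0:
        raise ValueError("r2 must be in [0, 1)")
    f_stat = (r2 / df1) / ((1.0 - r2) / df2)
    return f_stat, df1, df2


# R-squared values are those reported in the text; n_obs = 83 is an assumption
# (the enrolled sample), not a documented analysis n.
f_raw, df1, df2 = f_from_r2(0.199, n_obs=83, n_predictors=5)
f_adj, _, _ = f_from_r2(0.113, n_obs=83, n_predictors=5)
print(round(f_raw, 2), round(f_adj, 2), df1, df2)
```

Because F scales with the residual degrees of freedom, a slightly smaller analysis sample would yield slightly smaller F values for the same R².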
Table 2 presents the predictive relationship (Pearson correlations) of raw versus demographically adjusted scores from individual neuropsychological measures with driving errors. Raw scores were significantly correlated with driving errors for three of five neuropsychological measures, TMT-A, CFT, and BD, and approached significance for the other two measures, TMT-B and BVRT. In contrast, only one demographically adjusted score was significantly correlated with driving errors, TMT-A, and only two others approached significance, CFT and BD. The magnitude of the correlation was higher for raw scores than adjusted scores for all five neuropsychological measures, although the difference only reached significance for CFT.
Table 2. Bivariate relationships between neuropsychological measures and total driving errors
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160714011907-56601-mediumThumb-S1355617710000470_tab2.jpg?pub-status=live)
Note
TMT-A = Trail Making Test, Trail A; TMT-B = Trail Making Test, Trail B; CFT = Complex Figure Task, Copy; BVRT = Benton Visual Retention Test, errors; BD = Wechsler Adult Intelligence Scale-III, Block Design Test. For all tests for which lower scores indicate higher functioning, the direction of the correlations was reversed so that a positive correlation always indicates a positive relationship between better neuropsychological performance and better driving performance.
a The p value is based on the test statistic Z for the differences between the correlation coefficients associated with raw scores versus adjusted scores.
†p < .10; *p < .05; ***p < .001.
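The sign convention described in the table note can be sketched as follows. This is an illustrative Python implementation, not the authors' analysis code: the helper names and the toy data are hypothetical. Because the outcome is a count of driving errors (lower is better), a correlation is negated when higher test scores mean better performance, and left as computed when lower test scores mean better performance, so that a positive value always means better test performance goes with better driving.

```python
import math


def pearson_r(x, y):
    """Plain Pearson product-moment correlation for two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def aligned_r(test_scores, driving_errors, higher_is_better):
    """Correlation signed so that positive = better test score with fewer errors."""
    r = pearson_r(test_scores, driving_errors)
    return -r if higher_is_better else r


# Toy data, invented purely to show the sign handling.
errors = [20, 25, 35, 40]
seconds_to_complete = [30, 45, 60, 90]   # lower is better (timed test)
design_score = [40, 30, 25, 10]          # higher is better

r_timed = aligned_r(seconds_to_complete, errors, higher_is_better=False)
r_design = aligned_r(design_score, errors, higher_is_better=True)
print(r_timed > 0, r_design > 0)
```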
To determine whether the hypothesized effect of diminished predictive accuracy with demographic adjustments was present within each study group, separate regression analyses were conducted for each group. The R² values for raw versus demographically adjusted scores in predicting driving errors were, by study group: .217 versus .08 for NC, .268 versus .280 for AD, and .161 versus .05 for PD. In the context of small sample sizes and five predictor variables, none of these differences reached statistical significance.
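One way to see why five predictors strain groups of this size is the conventional adjusted R², 1 − (1 − R²)(n − 1)/(n − k − 1), which penalizes R² for the number of predictors. The sketch below is illustrative only: adjusted R² is not a statistic reported in this study, and the group sizes are the enrolled counts from the Participants section, assuming every participant contributed complete data.

```python
def adjusted_r2(r2, n_obs, n_predictors):
    """Conventional adjusted R-squared for a linear model with an intercept."""
    df_resid = n_obs - n_predictors - 1
    if df_resid <= 0:
        raise ValueError("need more observations than predictors + 1")
    return 1.0 - (1.0 - r2) * (n_obs - 1) / df_resid


# (group size, R2 with raw scores, R2 with adjusted scores); the R2 values are
# those given in the text, the group sizes are assumed to equal enrollment.
groups = {
    "NC": (24, 0.217, 0.08),
    "AD": (26, 0.268, 0.280),
    "PD": (33, 0.161, 0.05),
}

for name, (n, r2_raw, r2_demo) in groups.items():
    print(name,
          round(adjusted_r2(r2_raw, n, 5), 3),
          round(adjusted_r2(r2_demo, n, 5), 3))
```

With so few residual degrees of freedom, the penalty is large relative to the observed R² values, which is consistent with the caution that these within-group comparisons are underpowered.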
DISCUSSION
Results supported study hypotheses. Raw scores from a set of five neuropsychological measures predicted a significant 20% of the variance in driving errors in a clinically heterogeneous sample of elderly, independent-living, active drivers. Application of demographic corrections to neuropsychological scores diminished predictive accuracy: With demographic adjustments, the proportion of variance predicted by neuropsychological measures declined significantly – by almost half – to a nonsignificant 11%. The expected pattern was seen for each of the five neuropsychological measures individually, although the difference was statistically significant only for CFT. This provides modest support for the generality of the detrimental effect of demographic corrections to the predictive relationship of distinctive cognitive abilities with driving on a standardized route.
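The "almost half" figure follows directly from the two reported R² values; the short Python check below shows the arithmetic (variable names are illustrative).

```python
# R-squared values as reported in the Results section.
r2_raw = 0.199        # model with raw scores
r2_adjusted = 0.113   # model with demographically adjusted scores

absolute_drop = r2_raw - r2_adjusted
relative_drop = absolute_drop / r2_raw   # share of explained variance lost

print(f"explained variance: {r2_raw:.0%} -> {r2_adjusted:.0%}")
print(f"relative reduction: {relative_drop:.0%}")
```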
Our findings are highly consistent with those of Silverberg and Millis (2009), providing an independent demonstration that when discrepancies between unadjusted and demographically adjusted scores are present, unadjusted scores more accurately predict real-world functioning. Findings are also consistent with the observation of Reitan and Wolfson (2005) that corrections for age are less useful for accurate interpretation of neuropsychological performances of a wide range of brain-damaged patients, for whom scores have diminished associations with age and education. Similarly, Golden and van den Broek (1998) reported that, among brain-damaged subjects, neuropsychological scores undergoing age-, education-, and gender-corrections were less sensitive to impairment than were raw scores. They noted that by correcting for cognitive changes arising from aging, we may also be inadvertently adjusting for the effect of brain changes, “corrections” which make the individual appear normal when there may actually be subtle impairment.
Silverberg and Millis noted that the representativeness of their findings for populations with distinctive demographic characteristics and conditions other than TBI, and for functional capacities other than those included in their study, remained to be determined. The present study was designed to address the called-for extensions by examining a distinctive outcome in a larger, demographically distinct (elderly, more highly educated, and largely Caucasian), and clinically distinct sample with, or at risk for, mild cognitive decline. The AD and PD groups were selected for mildness of cognitive deficits for two reasons. First, individuals with mild cognitive decline are the most problematic with regard to determination of fitness for driving (Reger et al., 2004): There is ample evidence that patients with AD or PD and marked cognitive impairment are significantly worse drivers than healthy controls (Dawson et al., 2009a; Grace et al., 2005; Reger et al., 2004; Uc et al., 2009) and are at greater risk for crashes (Lundberg, Hakamies-Blomqvist, Almkvist, & Johansson, 1998; Stutts, Stewart, & Martel, 1998). However, even very mild cognitive decline may compromise safe driving (Withaar et al., 2000), and individuals with very mild dementia have been found to fail on-road driving tests at rates of 21% to 38% (Tallman et al., 1994, and Hunt et al., 1993, respectively).
Thus, factors enhancing or detracting from accurate prediction of driving safety in individuals with very mild cognitive decline are of particular clinical import (Tallman, 1992). Second, inclusion of participants with more severe cognitive impairment would be expected to inflate findings regarding prediction with raw scores and diminished accuracy with demographically adjusted scores, and it was decided that a stringent test would be a more compelling demonstration of the hypothesized effect.
It is important to emphasize that, although the AD group was selected for mild cognitive decline, they did have significant cognitive impairments (most frequently in memory and associative verbal fluency), and all met standard, accepted diagnostic criteria for probable Alzheimer’s disease. The MMSE cut-off was used strictly to identify those with very mild cognitive decline, not for diagnostic purposes, as it has been established that the MMSE has limited sensitivity to very mild cognitive impairment in AD (Nasreddine et al., 2007), especially in a highly educated sample such as ours, and in PD, as well (Hoops et al., 2009). In fact, patients with AD identified by rigorous diagnostic procedure can have MMSE scores of 30 (Shiroky, Schipper, Bergman, & Chertkow, 2007).
Given the goal of this study, AD and PD groups were not intended to be representative of patients with AD and PD in general (i.e., with a broad range of disease severity), and we emphasize that this study was not designed to compare the driving ability of AD, PD, and healthy elderly drivers. Rather, the follow-up analysis of the study groups was performed to examine the robustness of the superior predictive power of raw scores versus demographically adjusted scores across neurological status. The results of that analysis showed that diminished predictive accuracy resulting from demographic adjustments was found for healthy elderly and one neurodegenerative disease (PD), but not for AD. Post hoc analyses revealed that, in the AD group, age was completely uncorrelated with driving errors (r = .04), in contrast to modest, but nonnegligible, correlations among PD and NC (r = .29 and .27, respectively). It is possible that the restricted age range in this study and the generally later age of onset for AD contributed to this. In any event, the lack of association between age and driving errors naturally minimizes the impact of correcting for age (the primary demographic correction in this study) on the multiple correlations between neuropsychological performances (adjusted scores) and driving errors for this group. Although this study was not designed to compare the driving ability of the various clinical conditions, it must be noted that AD and PD participants did not have significantly more driving errors than the controls, and this result warrants comment. In part, the lack of significance reflects the considerable variability in driving performance within each group. 
Additionally, compromised driving performance is not a function of diagnostic status per se, but rather of the degree of cognitive weakness (Hunt et al., 1993; Tallman et al., 1994), and the inclusion criterion, designed here to select a sample with very mild cognitive decline, accordingly yielded an AD group that does not reflect the severity of cognitive impairment found in AD with more advanced disease. It may be further noted that impairment in early AD generally involves memory most prominently, and memory tends not to be significantly correlated with driving performance (Dawson et al., 2009a).
Findings also extend the range of outcomes for which adjusted scores show diminished predictive accuracy, from ratings of broad functional capacities in the study of Silverberg and Millis (2009; e.g., global functioning or employability) to a specific ability, driving, in the present study. In the earlier study, although the pattern of superiority for unadjusted scores was seen rather consistently across outcome measures, the magnitude of the discrepancy was weak for several measures. In the present study, the more impressive magnitude of the superiority for unadjusted scores (close to twice the predictive accuracy of demographically corrected scores) may reflect the advantage conferred by the objective, reliable, ecologically valid outcome measure (DeRaedt & Ponjaert-Kristoffersen, 2001). This is consistent with the observation that neuropsychological tests are more strongly related to on-road driving tests than to caregiver reports of driving ability (Reger et al., 2004).
Although this study was not designed to evaluate or compare the predictive abilities of individual neuropsychological tests, it may be noted that the correlations between the neuropsychological measures (raw scores), individually or in combination, and driving performance (individually, r = .19–.38; collectively, .44) indicate “low” to “moderate” correlations or effect sizes according to the widely accepted interpretive conventions of Cohen (1992). These values are consistent with previous findings. One review of on-road driving in older adults with cognitive decline noted inconsistent correlations between neuropsychological tests and on-road driving performance, ranging from negligible to “moderate” (Withaar et al., 2000). In a more recent meta-analysis, Reger and colleagues (2004) converted results from 27 empirical studies into mean Pearson product-moment correlation coefficients weighted by sample size for each of six cognitive domains. Among subjects with mild or very mild dementia, visuospatial skills and attention/concentration bore the strongest relationships to on-road driving performance (weighted mean correlations of .29 and .25, respectively), whereas tests of mental status, language, and executive functions had only low to negligible correlations (.13, .10, and –.06, respectively). Regarding specific tests, one recent study reporting correlation coefficients (Whelihan et al., 2005) found that, among patients with mild dementia, TMT-A and TMT-B correlated .32 and .46, respectively, with on-road driving performance. Additionally, the Brief Visuospatial Memory Test, a very close relative of the BVRT employed in the present study, correlated .17 with driving.
The generally low to moderate relationships (with the exception of occasionally large effect sizes for the less widely used Useful Field of View Test of Ball & Owsley, 1993) may appear somewhat disappointing. However, the present study was not designed to search for the optimal combination of neuropsychological measures, and the range of cognitive impairment was restricted, attenuating correlations between cognitive scores and driving performances. Furthermore, from a broader perspective, the correlations from the present study are also consistent with the general level of univariate correlations that are typically found between cognitive measures and complex real-world functioning in areas such as job performance, job-training performance, college GPA, and graduate school GPA, which range from .19 to .28 (Meyer et al., 2001).
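The effect-size reading used in this discussion can be made concrete with a small Python sketch. It assumes the commonly cited benchmarks from Cohen (1992) for correlations (.10 small, .30 medium, .50 large); the "negligible" label for values below .10 and the function name are additions for illustration, and the text's "low" and "moderate" correspond to Cohen's "small" and "medium". The multiple correlation is simply the square root of the model R².

```python
import math


def cohen_label(r):
    """Label a correlation using Cohen's (1992) benchmarks for r.

    The 'negligible' category below .10 is an illustrative addition,
    not one of Cohen's named benchmarks.
    """
    magnitude = abs(r)
    if magnitude >= 0.50:
        return "large"
    if magnitude >= 0.30:
        return "medium"
    if magnitude >= 0.10:
        return "small"
    return "negligible"


# Multiple correlation implied by the reported R-squared for raw scores.
multiple_r = math.sqrt(0.199)

for value in (0.19, 0.38, multiple_r):
    print(round(value, 2), cohen_label(value))
```

The square root of the rounded R² comes out marginally above the .44 given in the text, a difference attributable to rounding of the published values.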
This study is not without limitations. For three of the five neuropsychological measures (CFT, BVRT, and BD), only age corrections were available. Although normative data stratified by gender, education, and age are available for the CFT (Ardila, Rosselli, & Rosas, 1989), subjects in that study were Spanish-speaking and age bands did not extend beyond age 65, making these norms unsuitable for the present study. The effect of the absence of gender and education corrections for these three measures is unknown; however, Silverberg and Millis (2009) found age to be the most powerful factor contributing to discrepancies with absolute scores, making it most relevant to the issue at hand. Additionally, they noted that, to the extent that multiple demographic adjustments increased a discrepancy from the absolute score, the predictive accuracy of demographically adjusted scores tended to decline further. Thus, the unavailability of gender and education corrections for three predictors, if anything, likely attenuated the magnitude of the effect under study. The fact that the demographic adjustments came from different sources for each test is another methodologic weakness, as different normative samples may have differed from our study sample in divergent ways. However, the samples for the BVRT and CFT norms appear quite comparable to the study sample, and the TMT scores are corrected for each of the relevant factors, so sample differences should have minimal impact. The sample from which norms for the WAIS-III BD test were derived is less educated than the study sample. As this difference is not corrected for in the demographic adjustments for BD, it likely also leads to underestimation of the extent to which demographic adjustments would have reduced predictive accuracy had they included a correction for education.
In short, the limitation regarding the normative data available for some measures in the present study likely provided for a conservative test of the hypothesis.
Another possible limitation is the a priori restriction of the age range to 65 or older. It may be questioned whether a broader age range (e.g., age 60+) would have been more appropriate, especially as it was found that driving errors were not significantly related to age. This nonsignificant result is attributable, in part, to the sample’s restricted age range. More importantly, that finding is consistent with previous findings that, among elderly drivers, impaired on-road driving performance is not an effect of aging per se, but of cognitive decline in a subset of that group (Fitten et al., 1995; Tallman et al., 1994). Neither of these observations threatens the internal validity of the study. The question is whether restricting the age range may have altered the primary study findings. The rationale for the restrictive age cut-off was to have a more stringent test of the study hypothesis. The powerful association of age, especially advanced age, with driving safety is well established (e.g., Massie, Campbell, & Williams, 1995; Williams & Shabanova, 2003). Given a sample with a broader age range and a more powerful age effect in the raw data, demographic adjustments, which are designed to eliminate the effect of age (and other demographic factors), would necessarily reduce predictive accuracy vis-à-vis raw scores to a greater extent.
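The reasoning about a broader age range can be sketched with the semipartial correlation, which gives the correlation between a criterion and a predictor after the linear effect of a covariate has been removed from the predictor only. This is a simplified stand-in for norm-based adjustment, since published norms are derived from external samples rather than by in-sample residualization. The function name and all correlations below are hypothetical, and the test-criterion correlation is held fixed across scenarios purely for illustration.

```python
import math


def semipartial_r(r_xy: float, r_xa: float, r_ya: float) -> float:
    """Correlation of criterion y with predictor x after the linear
    effect of covariate a has been removed from x only."""
    if abs(r_xa) >= 1:
        raise ValueError("r_xa must be strictly between -1 and 1")
    return (r_xy - r_xa * r_ya) / math.sqrt(1 - r_xa**2)


# Hypothetical correlations, for illustration only.
# x = raw test score, y = driving errors, a = age.
r_xy = -0.40
narrow = semipartial_r(r_xy, r_xa=-0.10, r_ya=0.10)  # weak age effects
broad = semipartial_r(r_xy, r_xa=-0.40, r_ya=0.40)   # strong age effects
print(f"raw: {r_xy:.2f}, narrow range: {narrow:.2f}, broad range: {broad:.2f}")
```

In this sketch, the stronger the age effect on both the test score and the criterion, the more predictive variance the age adjustment removes. The pattern holds only when age relates to the criterion in the direction implied by its relationship to the test score; if age were unrelated to the criterion, removing it from the predictor would not reduce the correlation.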
Another limitation is the fact that driving ability was not assessed in terms of actual accidents and cited violations in day-to-day driving. It is possible that driving performance was affected, for better or worse, by ways in which an on-road driving test differs from natural driving activities (e.g., a researcher sitting in the car evaluating the driver). However, history of accidents is also an imperfect criterion, because accidents are very low-frequency events (Stutts et al., 1998); they are related to miles driven, and individuals with AD have been found to drive fewer miles (Massie et al., 1995; Trobe, Waller, Cook-Flannagan, Teshima, & Bieliauskas, 1996); and accidents are multidetermined, so that some crashes are not necessarily attributable to an involved driver’s ability (Lundberg et al., 1998; Withaar et al., 2000). On-road driving tests predict some types of at-fault crashes at levels comparable to the most predictive neuropsychological tests, including UFOV, the computerized neuropsychological test most highly correlated with crashes (Ball & Owsley, 1994; Owsley et al., 1998); at levels better than most such tests (DeRaedt & Ponjaert-Kristoffersen, 2001); and at levels higher than caregiver-reported driving skill (Reger et al., 2004). Another concern is that use of an on-road driving test limits experimental control over potentially important variables, such as amount of traffic and the behavior of other drivers, and this may increase error variance in analyses.
Nevertheless, active drivers do not drive in controlled settings, and on-road drives allow researchers to observe driving ability in real-world situations rather than in simulated drives in a controlled laboratory setting. This increases the ecological validity of study findings, an inherently valuable feature in research investigating important, complex real-world activities (Spooner & Pachana, 2006). In this study, driving errors were tallied according to well-specified criteria and were rated with good interrater reliability (.73), as is typical of such tests (DeRaedt & Ponjaert-Kristoffersen, 2001). Because of their advantages over actual crashes or caregiver-reported driving skill, on-road tests are generally considered the “gold standard” for assessing driving ability (Reger et al., 2004).
Demographic corrections are an important feature of neuropsychological assessment. However, present findings, in conjunction with those of Silverberg and Millis (2009), indicate that, when the issue at hand is an individual examinee’s ability to perform a real-world activity that demands adequate levels of cognitive ability from all, regardless of demographic characteristics, it is the examinee’s absolute levels of relevant ability that are pertinent. For such activities, basing predictions on demographically adjusted scores risks diminished accuracy of inferences regarding competency for that activity. Findings also demonstrate that compromised driving safety is not a function of increasing age per se, but of the degree to which relevant abilities are compromised in the individual driver. It is emphasized that this study was not designed to compare diagnostic groups, and so does not speak to the relative ability of drivers with AD, drivers with PD, or healthy elderly drivers. While the current study demonstrates that raw or absolute scores will be most useful, it will be important for future research to investigate identification of unsafe drivers at the individual level. One approach will be to identify cut-off points on relevant neuropsychological tests, which will require validation with bona fide measures of unsafe driving (e.g., at-fault crashes). However, as an individual may be an unsafe driver for a variety of reasons, or combinations of reasons, it will also be important to investigate multivariate predictive equations that incorporate multiple measures of relevant neuropsychological abilities and other factors, weighted by relative risk.
ACKNOWLEDGMENTS
This research was supported by grants NIA AG 17717, NIA AG 15071, and NINDS R01 NS044930.