Introduction
Intraindividual reaction time (RT) variability refers to within-person, trial-to-trial variability in response times on a given cognitive task. It is the focus of considerable research interest in the cognitive and clinical neuropsychology of aging, as well as in several other areas (e.g., schizophrenia, head injury), because it may provide a behavioral marker of neurobiological disturbance (Hultsch, Strauss, Hunter, & MacDonald, 2008; MacDonald, Nyberg, & Backman, 2006). Consistent with this proposition, greater intraindividual variability (IIV) is associated with mild dementia (Bielak, Hultsch, Strauss, MacDonald, & Hunter, 2010; Cherbuin, Sachdev, & Anstey, 2010; Hultsch, MacDonald, Hunter, Levy-Bencheton, & Strauss, 2000; Tales et al., 2012), Parkinson's disease (de Frias, Dixon, Fisher, & Camicioli, 2007), brain injury (Stuss, Pogue, Buckle, & Bonder, 1994), and mild psychopathology (Bunce, Handley, & Gaines, 2008), and it also increases in normal aging (Bielak, Cherbuin, Bunce, & Anstey, 2013; Hultsch, MacDonald, & Dixon, 2002).
As this brief review shows, measures of IIV have considerable potential in clinical contexts as they may aid identification and diagnosis of a range of neurobiological disorders. Additionally, they are quick to administer. However, although there has been considerable research investigating IIV, there is little consensus concerning how to measure the construct. This is evidenced by the literature, where several computations of IIV can be found using varying numbers of RT trials derived from a wide selection of neuropsychological tasks. Additionally, the relative merits of the different measures of IIV have received little attention. Three metrics predominate in the literature: the raw intraindividual SD, which does not control for any potential confounding influences; the coefficient of variation (i.e., intraindividual SD/raw intraindividual M), which adjusts IIV by the mean level of RT performance; and the intraindividual SD that statistically partials out influences that may artefactually inflate IIV, such as time-on-task effects (e.g., Hultsch et al., 2002; additionally, see Hultsch et al., 2008, for a statistical discussion of each method). Few studies have directly contrasted different measures of IIV, and those that have suggest that different IIV metrics produce converging results (e.g., Lovden, Li, Shing, & Lindenberger, 2007). Research also shows that caution should be observed when controlling for influences such as mean RT, as the method used can affect IIV associations with, for example, age (Dykiert, Der, Starr, & Deary, 2012b).
Importantly though, direct comparisons of the raw intraindividual SD to that adjusted for mean RT indicate that age-related increases in IIV are not accounted for by mean RT (Dykiert, Der, Starr, & Deary, 2012a), a finding suggesting that IIV captures sources of variance other than those attributable to age-related slowing. This is underlined by studies showing that relative to mean RT, IIV is particularly sensitive to mild cognitive impairment (e.g., Dixon et al., 2007) and mild psychopathology (e.g., Bunce et al., 2008).
As there is a need for additional information regarding the measures used to quantify IIV, in the present study we compared the three commonly used measures and asked (1) which measure provides the strongest concurrent prediction of frontal white matter hyperintensities, and (2) how many RT trials are required to achieve this? To answer these questions, we used data from a recent study (Bunce et al., 2010) in a sample of adults aged 44 to 48 years drawn from the population-based PATH Through Life Study. In that investigation, we described the association between within-person variability and left frontal cortex macroscopic white matter lesions (referred to as white matter hyperintensities: WMH) obtained from T1-weighted MRI scans. In other studies, WMH have been associated with cognitive decline in non-demented individuals, particularly in frontal executive function (Jokinen et al., 2011; van der Flier et al., 2005), with a variety of histopathological abnormalities (Gouw et al., 2011), and with future risk of dementia (Debette & Markus, 2010). These findings suggest WMH have potential as early markers of cerebral ill-health.
There are several features of our earlier study that make it well-suited for the present evaluative purposes. First, although the future clinical status of our participants is currently unclear, the presence of WMH in this sample of apparently healthy middle-aged adults is a possible early marker of age-related neuropathology. Second, the narrow age range minimizes the potential confounding influence of the between-subject factor age on the degree of within-person variability. Third, the RTs were drawn from a psychomotor task (choice RT), variations of which are typically used in aging and clinical neuropsychological research. Finally, 40 trials were administered, allowing estimations of the strength of association between WMH and IIV using varying numbers of RT trials to compute the latter variable. Although we also found an association between IIV and temporal WMH in the original investigation (Bunce et al., 2010), here we focus on the association between IIV and left frontal WMH because (a) theoretically, it is proposed that IIV is related to executive or attentional control (Bunce, MacDonald, & Hultsch, 2004; Bunce, Warr, & Cochrane, 1993; West, Murphy, Armilio, Craik, & Stuss, 2002) supported by the frontal cortex, and (b) there is functional MRI evidence that IIV is associated with task-related activity in the left middle prefrontal cortex (Bellgrove, Hester, & Garavan, 2004), and neuropsychological research showing frontal lesions are associated with increased IIV (Stuss, Murphy, Binns, & Alexander, 2003).
Method
Participants
Data for the present study were drawn from 2530 persons aged 44 to 48 years recruited from the electoral roll in Canberra and surrounding areas in Australia, who were participating in the PATH Through Life Project, a longitudinal population-based investigation of age, cognition and mental health (see Anstey et al., 2012). A randomly selected subsample of 656 persons was offered an MRI scan, 431 of whom eventually completed the scan. Detailed information on the sample from which the present subsample was drawn, and on image analysis and methods, can be found in the earlier report (Bunce et al., 2010). Here, 415 persons (M age = 46.70; SD = 1.43; 227 women; M years of education = 14.82; SD = 2.28; non-Caucasian = 16) were included in the analyses. This figure is slightly lower than in the earlier investigation, which used an imputation procedure to replace missing data for a minority of cases. Participants with or without left frontal white matter hyperintensities did not differ significantly on any of the demographic variables (ps > .55). All aspects of the project were approved by the Australian National University Human Research Ethics Committee.
Choice RT Task
As part of a wider neuropsychological battery, forty two-choice RT trials were administered using a small response box held with both hands, with left and right buttons at the top that were depressed using the index fingers. The front of the box had two red stimulus lights under the left and right buttons respectively and a green “get-ready” light in the middle beneath these. Following the “get-ready” light, one of the two red stimulus lights randomly illuminated to which participants were instructed to respond as quickly and as accurately as possible by pressing the corresponding response button. The interval between the “get-ready” light and the first light of the trial was 2.3 s, but the interval time for the remaining trials varied.
Computation of Intraindividual Variability Measures
Preprocessing of data for computation of IIV measures followed procedures commonly used in the literature (e.g., Hultsch et al., 2002). Initially, RTs for incorrect trials were removed together with unusually fast responses (<150 ms) and those greater than the individual mean + 3 individual SDs. The resulting missing values (<6.4%) were replaced using a regression substitution procedure where individual regression equations across all valid trials for each participant were computed to predict and replace the missing values. As this approach tends to reduce within-person variability, it represents a conservative approach to the study of IIV.
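The preprocessing steps above can be sketched as follows. This is a minimal illustration rather than the procedure actually used: the helper name `preprocess_rts`, the choice to compute the individual mean and SD after dropping incorrect and fast trials, and the use of a simple linear regression of RT on trial number for substitution are all assumptions, because the text does not specify the form of the individual regression equations.

```python
from statistics import mean, stdev


def preprocess_rts(rts, correct, fast_cutoff_ms=150.0, sd_multiplier=3.0):
    """Clean one participant's RT series (ms) and fill removed trials.

    Hypothetical helper: regressing RT on trial number is an assumed form
    of the "regression substitution" described in the text.
    """
    # Step 1: drop incorrect trials and unusually fast responses (<150 ms).
    kept = [(trial, rt)
            for trial, (rt, ok) in enumerate(zip(rts, correct), start=1)
            if ok and rt >= fast_cutoff_ms]

    # Step 2: drop slow outliers above individual mean + 3 individual SDs
    # (assumption: mean and SD are taken over the trials kept in step 1).
    values = [rt for _, rt in kept]
    upper = mean(values) + sd_multiplier * stdev(values)
    valid = {trial: rt for trial, rt in kept if rt <= upper}

    # Step 3: least-squares fit of RT = intercept + slope * trial on valid trials.
    xs, ys = list(valid.keys()), list(valid.values())
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar

    # Step 4: keep valid RTs and substitute predictions for removed trials.
    return [valid.get(trial, intercept + slope * trial)
            for trial in range(1, len(rts) + 1)]


# Illustrative data only: trial 3 is too fast, trial 7 is an error.
rts = [400, 410, 100, 430, 440, 450, 999, 470, 480, 490]
correct = [True, True, True, True, True, True, False, True, True, True]
cleaned = preprocess_rts(rts, correct)
```

Because the substituted values lie on the fitted line, they add no residual scatter, which is consistent with the remark that this approach tends to reduce within-person variability.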
The raw SD was computed as the standard deviation for each individual across the trials of the task. Additionally, two other commonly used measures of IIV were assessed. In the first, the coefficient of variation (referred to as CV) was computed as raw intraindividual SD/raw intraindividual M RT. For the final measure, a regression procedure was used to compute the intraindividual SDs (referred to as ISD) where residuals were saved having partialed out categorical trial effects (i.e., time-on-task effects). The residuals obtained were then standardized and converted into t scores, and finally an estimate of each individual's standard deviation across the trials was computed. Due to the narrow age range, neither age nor any other between-subject factor was taken into account in this residualization procedure. The three metrics were computed for the first 5, 10, 15, 20, 25, 30, 35, and 40 trials, resulting in 24 different measures for each individual (8 raw SDs; 8 CVs; 8 residualized ISDs).
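The three metrics can be illustrated with a short sketch. The function names and the toy data are hypothetical, and two details are assumptions not stated in the text: residuals are standardized over the pooled sample of residuals, and the T-score conversion uses the conventional 50 + 10z scaling.

```python
from statistics import mean, pstdev, stdev


def raw_sd(rts):
    """Raw intraindividual SD across one participant's trials."""
    return stdev(rts)


def coefficient_of_variation(rts):
    """CV: raw intraindividual SD divided by the intraindividual mean RT."""
    return stdev(rts) / mean(rts)


def residualized_isd(sample, n_trials):
    """ISD per participant after removing categorical trial effects.

    `sample` is a list of per-participant RT lists (hypothetical layout).
    """
    first = [rts[:n_trials] for rts in sample]
    # Regressing RT on trial as a categorical factor leaves residuals equal
    # to each RT minus the sample mean for that trial position.
    trial_means = [mean(column) for column in zip(*first)]
    residuals = [[rt - m for rt, m in zip(rts, trial_means)] for rts in first]
    # Standardize over all residuals (assumption) and convert to T scores.
    pooled = [r for row in residuals for r in row]
    centre, spread = mean(pooled), pstdev(pooled)
    t_scores = [[50 + 10 * (r - centre) / spread for r in row]
                for row in residuals]
    # Each individual's SD across trials on the T-score metric.
    return [stdev(row) for row in t_scores]


# Toy sample of three participants with five trials each (illustrative only).
sample = [
    [400, 410, 420, 430, 440],
    [500, 510, 520, 530, 540],
    [450, 490, 440, 500, 460],
]
isd = residualized_isd(sample, n_trials=5)
```

In this toy sample the first two participants have the same raw SD but different CVs, because the CV scales the SD by each person's mean RT. Repeating the three computations with `n_trials` set to 5, 10, 15, 20, 25, 30, 35, and 40 would yield the 24 measures per individual described above.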
MRI Image Acquisition and Processing
Full details of the imaging protocol and analysis of WMH can be found elsewhere (Bunce et al., 2010; Wen, Sachdev, Li, Chen, & Anstey, 2009). Briefly, MRI data were collected using a 1.5 Tesla Gyroscan scanner (ACS-NT, Philips Medical Systems, Best, The Netherlands). T1-weighted three-dimensional (3D) structural MRI images were acquired in the coronal plane using a Fast Field Echo sequence. The fluid-attenuated inversion recovery (FLAIR) sequence used to estimate WMH was acquired with TR = 11,000 ms, TE = 140 ms, TI = 2,600 ms, number of excitations = 2, matrix size = 256 × 256, and field of view = 230 × 230 mm. Slice thickness was 4.0 mm with no gap between slices, and in-plane spatial resolution was 0.898 × 0.898 mm/pixel. The FLAIR and 3D T1 structural images of the same subject were co-registered using well-established procedures (Wen et al., 2009).
Results
Table 1 presents descriptive data for raw SDs, CVs, and ISDs, and the average time to administer the task for the various numbers of trials. Consideration of mean scores for raw SD and CV measures suggests that IIV marginally declines with increasing numbers of trials. However, for the ISD measure, which controlled for time-on-task effects, it is noticeable that following a lower score for the 5-trial measure, values were relatively stable from 10 trials onward. This may reflect a practice effect for the opening trials. To assess the number of trials required to obtain a stable metric relative to the 40-trial estimate, we conducted a series of paired t tests that compared the measure obtained for each number of trials with that for 40 trials. Table 2, which details the paired t tests and corresponding bivariate correlations, shows that for raw SD, 30 trials were required before a nonsignificant result was obtained in comparison to the 40-trial metric, and for the CV measure, 35 trials were needed. For the ISD metric, however, only 10 trials were required before nonsignificant comparisons were obtained. This finding suggests that partialing out time-on-task effects may result in a stable measure of IIV after fewer trials than for the raw SD and CV computations. (Applying Bonferroni corrections to the above t tests changed only one result: the comparison for raw RT at 10 trials became nonsignificant.)
a T-score metric where the effects of trial are removed.
Note. r with 40: correlation of IIV estimate at n trials with estimate at 40 trials. t with 40: paired t-test comparing IIV estimate at n trials with estimate at 40 trials. p with 40: p-value from paired t-test comparing IIV estimate at n trials with estimate at 40 trials.
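The stability comparison described above pairs, for each participant, the IIV estimate from the first n trials with the estimate from all 40 trials. A minimal sketch of the two statistics follows. The function names and the four values per list are hypothetical, and the p value is omitted because it requires the t distribution with n − 1 degrees of freedom, which is not available in the Python standard library.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(estimates_n, estimates_40):
    """Paired t statistic and degrees of freedom for two matched lists."""
    diffs = [a - b for a, b in zip(estimates_n, estimates_40)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


def pearson_r(x, y):
    """Bivariate (Pearson) correlation between two matched lists."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical IIV estimates for four participants (illustrative only).
iiv_10_trials = [12.0, 15.0, 11.0, 14.0]
iiv_40_trials = [10.0, 13.0, 10.0, 11.0]
t_stat, df = paired_t(iiv_10_trials, iiv_40_trials)
r = pearson_r(iiv_10_trials, iiv_40_trials)
```

A high correlation alongside a significant t statistic would indicate that the shorter measure ranks participants similarly to the 40-trial measure while still differing from it in mean level.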
We then formally tested the IIV measures in relation to WMH. As WMH distributions were bimodal, the variability measures computed for varying numbers of RT trials were subjected to logistic regression where left frontal WMH volumes (present or absent) were regressed onto intracranial volume and white matter volume at Step 1 (to control for individual differences in neuroanatomical volumes), and at Step 2 the respective IIV measures. The model was repeated for each of the three versions of the IIV measure computed for the first 5, 10, 15, 20, 25, 30, 35, and 40 RT trials. The statistic B, together with the associated standard error and p value obtained from the respective logistic regression models, is reported in Table 3.
Note. The models adjusted for intracranial and white matter volumes.
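The two-step logistic model can be sketched as below. This is an illustration under stated assumptions, not the analysis reported here: the data are invented and standardized, the names `fit_logistic`, `predict`, and `log_likelihood` are hypothetical, and the coefficients are found by plain gradient ascent rather than the estimation routine of a statistics package. Standard errors and p values are not computed.

```python
from math import exp, log


def predict(weights, row):
    """Predicted probability that WMH are present for one participant."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], row))
    return 1.0 / (1.0 + exp(-z))


def fit_logistic(rows, outcomes, learning_rate=0.1, iterations=5000):
    """Fit logistic regression by gradient ascent (illustrative only)."""
    weights = [0.0] * (len(rows[0]) + 1)  # intercept first
    for _ in range(iterations):
        gradient = [0.0] * len(weights)
        for row, y in zip(rows, outcomes):
            error = y - predict(weights, row)
            gradient[0] += error
            for j, v in enumerate(row, start=1):
                gradient[j] += error * v
        weights = [w + learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights


def log_likelihood(weights, rows, outcomes):
    """Log-likelihood of the observed outcomes under the fitted model."""
    return sum(log(predict(weights, r)) if y else log(1.0 - predict(weights, r))
               for r, y in zip(rows, outcomes))


# Invented standardized data: (intracranial volume, white matter volume, IIV).
data = [
    (0.5, 1.0, -1.5), (-0.5, 1.0, -1.0), (0.5, -1.0, -0.5),
    (-0.5, -1.0, 0.5), (0.0, 0.0, 0.8),
    (0.5, 1.0, -0.2), (-0.5, 1.0, 0.3), (0.5, -1.0, 1.0),
    (-0.5, -1.0, 1.5), (0.0, 0.0, -0.8),
]
wmh_present = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# Step 1: neuroanatomical covariates only. Step 2: add the IIV measure.
step1_rows = [row[:2] for row in data]
step1 = fit_logistic(step1_rows, wmh_present)
step2 = fit_logistic(data, wmh_present)
b_iiv = step2[3]  # coefficient B for the IIV measure
```

In the full design, the Step 2 fit would be repeated for each of the 24 IIV measures, with B for the IIV term read from each model.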
With regard to the strength of association between the respective IIV measures and WMH as indicated by the coefficient B obtained from the logistic regression models, it can be seen that for all measures, consistently significant predictions of WMH were obtained from metrics of 20 trials and upward. The ISD measure that controlled for time-on-task effects produced the most conservative estimates. As there are several variables that may influence our findings, we repeated the logistic regressions adjusting for gender, years of education, ethnicity, depression, and history of head injuries. The results from these repeated models did not substantially differ from the original findings. Additionally, rerunning the models adjusting for individual mean RT attenuated some of the effects for IIV measures, but they all remained significant.
Next, we statistically compared the IIV metrics (raw SD vs. CV vs. ISD) by (a) repeating the logistic regressions for each trial number but including all three IIV measures in the models, and (b) computing the area under the curve (AUC) for each logistic regression model. The AUC estimates were then contrasted using PROC LOGISTIC in SAS v9.3 (SAS Institute Inc, Cary, NC, 2011) to examine whether there were differences in accuracy between the respective measures of IIV in predicting WMH for each number of trials. Neither procedure revealed any significant differences among the respective IIV measures (for AUC comparisons, ps > .34), suggesting the various metrics were similar in predicting WMH.
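For readers unfamiliar with the AUC, the sketch below computes it from predicted probabilities using its rank interpretation: the probability that a randomly chosen participant with WMH receives a higher predicted value than a randomly chosen participant without. The function name and values are hypothetical, and the statistical contrast between AUCs performed in SAS is not reproduced here.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Hypothetical predicted probabilities from one logistic model.
predicted = [0.1, 0.4, 0.35, 0.8]
wmh_present = [0, 0, 1, 1]
area = auc(predicted, wmh_present)  # 0.75: three of four pairs ordered correctly
```

An AUC of 0.5 corresponds to chance-level discrimination and 1.0 to perfect discrimination, so similar AUCs across raw SD, CV, and ISD models indicate similar predictive accuracy.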
Finally, it is worth considering the average time taken to administer the IIV measures for different numbers of trials (Table 1). For IIV metrics where a significant prediction of frontal WMH was obtained (20 trials upward), the average administration time for this CRT task was between 52 and 104 s. Together, these findings suggest that a behavioral marker of frontal white matter integrity may be obtained in less than 2 min in clinical settings.
Discussion
To our knowledge, no previous research has compared the number of trials that contribute to different measures of IIV in relation to neuroanatomical measures of potential clinical significance. Particular strengths of the study were that we investigated a narrow age range (44 to 48 years), thereby removing a major between-subject influence (i.e., age) that may confound computations of within-person variability, and assessed a large community-based sample of 415 persons. Findings suggested that using more trials to compute IIV metrics (i.e., 20 to 40 trials) produced stronger predictions of left frontal WMH. Our findings converged with those elsewhere (e.g., Lovden et al., 2007) in that there was little to distinguish the three IIV measures (raw SD, CV, ISD), although analyses suggested that some of the variation contributing to predictions may have stemmed from practice effects and later fatigue inflating the degree of within-person variability.
WMH reflect cerebrovascular disease and may provide an early marker of age-related neurological decline. Although the presence of WMH in this cognitively normal community-based sample is not necessarily indicative of future neurological status, our findings demonstrate that measures of IIV computed from relatively few RT trials are predictive of white matter integrity. Indeed, IIV measures computed from as few as 20 trials provided a significant prediction of WMH burden. As average administration times for the measures ranged from 51.94 s for 20 trials to 103.88 s for 40 trials, the potential use of these measures for assessment purposes in clinical settings is clear. However, there are several considerations in reaching this conclusion.
First, in the main, the strength of association grew with the number of trials administered. As random error reduces as the number of RT trials increases, the present findings suggest caution should be observed in the use of IIV measures computed from fewer than 20 trials. This is particularly true in relation to the raw SD and CV measures where, respectively, 30 and 35 trials were needed to obtain a stable estimate relative to the 40-trial metric. Moreover, it is clear from consideration of raw and partialed ISDs that some of the within-person variability in raw SDs that contributed to the association with WMH may have stemmed from time-on-task effects, as estimates for the ISD measures that controlled for this influence were more conservative. By statistically removing within-person variation related to factors such as practice and fatigue, it is likely that a closer approximation to the construct of interest (the IIV associated with macroscopic white matter lesions in the left frontal cortex) was obtained. Additionally, our findings relate to a psychomotor task, types of which are commonly used in cognitive and clinical neuropsychological research. It is important that future research conduct similar analyses in other typically used cognitive domains such as attention and memory, as well as systematically varying the number and order of trials grouped together in randomized experimental designs. Furthermore, given the potential for clinical screening demonstrated by numerous other investigations of IIV, an important next step for research is to develop a normative database of IIV measures taking into account age, education and neurological status.
A limitation we should acknowledge, however, is that, although the present analyses were based on a cognitively normal sample aged 44 to 48 years and, therefore, minimized the well-established influence of age on IIV, the findings may not generalize to older populations or persons suffering specific neurological disorders. Additionally, we do not know the future neurological status of those persons exhibiting WMH in the sample. However, the PATH Through Life study is a longitudinal investigation and we hope to shed light on these issues as the research progresses. Also, in the present study, we evaluated different numbers of trials relative to a maximum of 40 trials, identifying the minimum number required for each of the IIV measures to obtain a nonsignificant comparison. It is possible that had a greater total number of trials been administered (e.g., 100), the minimum number of trials required for a nonsignificant comparison would also have been greater. Our conclusions, therefore, should be viewed within the context of the total number of trials administered in this study. Finally, in using a succession of logistic regression models to test the predictive utility of the respective IIV metrics for different numbers of trials, it is possible that the findings are subject to Type I error. However, we have presented the exact p values obtained for each model in Table 3, and the consistency of the findings leads us to believe that this is unlikely.
To conclude, although we do not currently know the future neurological status of our participants, the present findings suggest that measures of IIV that have taken into account the influence of time-on-task effects may have potential in clinical contexts for assessment purposes. Moreover, the findings suggest that statistically significant predictions can be obtained from relatively few RT trials administered in as little as 52 s. We do not advocate the use of IIV measures as a stand-alone screening or diagnostic tool, but rather as a possible supplement to existing neuropsychological and biological assessment methods. Given their speed of administration, it is important that future research evaluates the comparative utility of IIV measures in relation to persons exhibiting clear neuropathology.
Acknowledgments
David Bunce was supported by a Leverhulme Trust (UK) Research Fellowship, Kaarin Anstey by National Health and Medical Research Council (NHMRC) Research Fellowship No. 366756, Nicolas Cherbuin by NHMRC Early Career Research Fellowship No. 471501, and Philip Batterham by NHMRC Early Career Research Fellowship No. 1035262. The research was also supported by NHMRC of Australia Unit Grant No. 973302, Program Grant No. 179805, and Project Grant No. 157125. We thank the study participants, and also Anthony Jorm, Bryan Rodgers, Helen Christensen, and PATH interviewers Patricia Jacomb and Karen Maxwell. The information in this manuscript and the manuscript itself have never been published either electronically or in print. The authors do not declare any conflicts of interest.