Introduction
Executive functions (EFs) enable individuals to plan, initiate, execute, and monitor goal-directed behavior in novel situations. Theoretical and factor analytic studies have identified several component skills which underpin goal-directed or executive behavior. These include planning, which involves analyzing a task and generating strategies to implement it, monitoring or regulating its execution and holding relevant information in mind to guide future behavioral responses, dealing with interference and multiple demands by switching flexibly between aspects of the task, inhibiting responses to inappropriate stimuli and checking that the task is executed as planned (Bennett, Ong, & Ponsford, 2005a; Bennett, Ong, & Ponsford, 2005b; Busch, McBride, Curtiss, & Vanderploeg, 2005; Chan, Shum, Toulopoulou, & Chen, 2008; Crawford & Henry, 2005; Cushman & Duffy, 2008; Damasio & Anderson, 2003; Duncan, Johnson, Swales, & Freer, 1997; Lezak, Howieson, & Loring, 2004; Stuss, 2007; Testa & Bennett, 2011). EF processes are commonly impaired in people with traumatic brain injury (TBI; Busch et al., 2005), resulting in difficulty performing daily activities and roles.
Neuropsychologists are frequently asked to predict patients’ everyday functional abilities, necessitating that neuropsychological assessments be sensitive and ecologically valid. However, numerous studies have demonstrated that people expected to perform poorly on executive tests may perform within normal limits (Alderman, Burgess, Knight, & Henman, 2003; Norris & Tate, 2000; Ord, Greve, Bianchini, & Aguerrevere, 2009), and several studies have reported either no relationship or weak to moderate relationships between performance on EF tests and measures of everyday functioning (Chaytor et al., 2006; Manchester, Priestley, & Jackson, 2004). Such findings may reflect the low verisimilitude of neuropsychological tests; that is, the low parity between the task demands and the demands of the everyday environment, which are complex and multi-factorial (Manchester et al., 2004; Morganti, 2004; Shallice & Burgess, 1991).
There is increasing recognition that neuropsychological assessment tools need to incorporate more complex and life-like scenarios, capable of taxing multiple executive processes simultaneously, to be more predictive of real-world performance (Burgess et al., 2006; Chan, Shum, Toulopoulou, & Chen, 2008; Manchester et al., 2004). The Multiple Errands Test (MET; Alderman et al., 2003; Burgess et al., 2006; Shallice & Burgess, 1991) addresses this need by requiring participants to perform errands in a real shopping centre according to certain rules. However, functional assessments such as the MET can be time consuming, costly, and difficult to replicate or standardize across settings (Rand, Rukan, Weiss, & Katz, 2009). It is also not always feasible for people with significant mobility, behavioral, or psychological difficulties to access the community (Knight & Alderman, 2002).
Virtual reality (VR) technology incorporates the principles of verisimilitude by creating three-dimensional, interactive computer-generated environments that simulate real world scenarios (Matheis, 2004). Virtual reality assessments offer the potential to address the ecological validity limitations inherent in traditional neuropsychological measures, as well as the reliability and utility issues associated with functional neuropsychological assessments such as the MET (Shallice & Burgess, 1991). A range of VR environments have been used to assess executive functioning, including a 3D virtual apartment (Zalla, Plassiart, Pillon, Grafman, & Sirigu, 2001), virtual supermarket (Klinger, Chemin, Lebreton, & Marie, 2006), virtual university (McGeorge et al., 2001), virtual office (Jansari, Agnew, Akesson, & Murphy, 2004), virtual beach (Elkind, Rubin, Rosenthal, Skoff, & Prather, 2001), and virtual street (Titov & Knight, 2005). Recently a VR version of the MET has been developed, but this has only been studied in nine post-stroke patients (Rand et al., 2009). Research on these tools has been largely exploratory and has lacked psychometric rigor (Knight & Titov, 2009). Validation of the relationship between VR and real life environments has also yet to be established (Lamberts, Evans, & Spikman, 2010).
The Virtual Library Task (VLT; Renison, Ponsford, Testa, Richardson, & Brownfield, 2008) is a VR measure of EF designed within a function-based framework. Assessment involves observing participants perform several library-based tasks in a virtual library (Spooner & Pachana, 2006). Participants are required to prioritize and complete multiple tasks while managing interruptions, in addition to the presentation of new information that necessitates a shift in their behavioral approach. The present study aimed to examine the construct validity and ecological validity of the VLT. Specifically, it aimed to:
(1) Examine the inter-rater and intra-rater reliability of the VLT and the same task performed in the real world environment; the Real Library Task (RLT).
(2) Compare performance on the VLT with performance on the RLT. It was hypothesized that scores on the VLT and RLT would be highly and significantly correlated.
(3) Examine whether the VLT and five other neuropsychological measures of EF were able to discriminate individuals with TBI with reported executive difficulties from healthy Controls. It was expected that the TBI participants would perform more poorly than Controls on the VLT and some measures of EF, but that group differences would be greater on the VLT task than on the neuropsychological measures.
(4) Examine the convergent validity of the VLT by examining correlations between the VLT and hypothetically related constructs. It was expected that the VLT would be moderately correlated with EF measures.
(5) Investigate the discriminant validity of the VLT by comparing correlations between the VLT and hypothetically related constructs (EF measures) with correlations between the VLT and a hypothetically unrelated construct, namely a measure of immediate attention. It was hypothesized that the VLT would be moderately correlated with EF measures, and not with immediate attention, and that the correlations of the VLT with EF measures would differ significantly from its correlation with immediate attention.
(6) Compare the ability of the VLT and neuropsychological measures of EF to predict executive functioning in everyday life. It was hypothesized that the VLT would be the strongest predictor of everyday EF.
Method
The study was approved by the Human Research Ethics committees of Epworth Hospital and Monash University.
Participants
Thirty participants with TBI and executive difficulties were recruited from Epworth Hospital (n = 25) and Osborn Sloan and Associates, a community based brain injury rehabilitation service (n = 5). Of the TBI participants, 86.67% sustained injuries in motor vehicle accidents, 10% in falls, and 3.33% in assaults. Duration of post traumatic amnesia (PTA), measured prospectively using the Westmead PTA Scale (Shores, Marosszeky, Sandanam, & Batchelor, 1986), ranged from 1 to 120 days (Median = 34.00; 6.67% PTA 0–7 days, 33.33% PTA 7–28 days; 60% PTA >28 days) and Glasgow Coma Scale (GCS) scores ranged from 3 to 15 (Median = 7.00; 63.33% GCS 3–8, 26.67% GCS 9–12, 10% GCS 13–15). PTA durations for the three TBI participants with GCS scores of 13–15 were 14, 14, and 35 days, respectively. Time since injury ranged from 1 to 28 years (Median = 5.00).
Participants were recruited if they, their families, or referring clinician considered they were experiencing EF problems, as per the definition of EF provided previously. All participants spoke English and were able to give informed consent. Exclusion criteria included a known psychiatric diagnosis, substance abuse, dysphasia, and physical or cognitive inability to use a computer.
A comparison group of 30 healthy adults with no history of neurological impairment was recruited from the general community.
Measures
Virtual Library Task (Renison et al., 2008)
The Virtual Library Task (VLT) is a non-immersive VR role-play task that can operate on any modern computer. It was built using a specially adapted version of the Genesis3D software program. It aims to measure EF in a reliable, functional, economical, and ecologically valid manner. The virtual environment (VE) models the exact dimensions and associated contents of two rooms in the Library at Epworth Hospital. This VE is navigated using an X-box and Playstation compatible handset. Participants were required to perform several specified tasks associated with the day to day running of the library, while adhering to predetermined rules. For example, to cool the library participants must “walk” to the air conditioner by manipulating the right hand joystick button. They turn the air conditioner on by moving the cursor over the air conditioner, via the left hand joystick button, and pressing the “select” button on the handset. They are informed via a visual prompt that the “air conditioner is out of order.” To problem solve an alternative method to cool the room participants pick up the fan by moving the cursor over it and pressing the “select” button. They must then carry the fan to the meeting room and then move the cursor to the power point and press “select,” which results in the fan being plugged in and automatically turned on. Another example is that of checking items that appear in the Intray, which participants do by using the right hand joystick control to “walk” within reach of the Intray and using the left hand joystick control to move the cursor over the Intray. The object is picked up by pressing the “select” button on the handset. To put the object down participants must “walk” to where they want to put the object and move the cursor to the desired location, that is, table in meeting room. Once they press the “select” button the object appears in the desired location. The VLT comprises functional tasks (Table 1) designed to reflect seven components of EF. 
These components were drawn from a survey of theoretical models and factor analyses of EF (Busch et al., 2005; Crawford & Henry, 2005; Damasio & Anderson, 2003; Duncan et al., 1997; Stuss, 2007; Testa & Bennett, 2011). Two prospective memory components were included as they were considered crucial in the coordination of complex behavior (Burgess, Alderman, Volle, Benoit, & Gilbert, 2009). Administration time typically ranges between nine and twenty minutes. Factors influencing the time taken include level of impulsivity and planning and problem solving abilities.
Table 1 VLT: Subtasks and scoring criteria
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160922012155-72043-mediumThumb-S1355617711001883_tab1.jpg?pub-status=live)
Trained raters used operationalized scoring criteria to rate how accurately the functional tasks were completed on a three-point scale (0, 1, 2). The functional tasks mapped on to seven components of EF, weighted proportionately according to the number of functional tasks that mapped onto them. Outcome measures included seven subtask scores ranging between 0 and 8 when weighted, summed to provide a Total Score from 0 to 56. Low scores reflected poor EF.
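The scoring scheme described above can be sketched as follows. This is a minimal illustration, not the authors' scoring software: it assumes that "weighted proportionately" means rescaling each component's summed ratings to a common 0 to 8 range, and the function names are hypothetical. The number of functional tasks per component is given in Table 1 and is not reproduced here.

```python
from typing import Dict, List

MAX_RATING = 2         # each functional task is rated 0, 1, or 2
MAX_SUBTASK_SCORE = 8  # each weighted subtask score ranges from 0 to 8


def weighted_subtask_score(ratings: List[int]) -> float:
    """Rescale the ratings for one EF component to the 0-8 range.

    Assumption: the weighting is a simple proportional rescaling by the
    number of functional tasks that map onto the component.
    """
    if not ratings:
        raise ValueError("a component needs at least one rated task")
    if any(r not in (0, 1, 2) for r in ratings):
        raise ValueError("ratings must be 0, 1, or 2")
    return MAX_SUBTASK_SCORE * sum(ratings) / (MAX_RATING * len(ratings))


def total_score(ratings_by_component: Dict[str, List[int]]) -> float:
    """Sum the weighted subtask scores (0-56 when there are seven components)."""
    return sum(weighted_subtask_score(r) for r in ratings_by_component.values())
```

Under this assumption, a component with every task rated 2 contributes 8 points regardless of how many tasks map onto it, so seven perfectly completed components give the maximum Total Score of 56.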
Real Library Task (Renison et al., 2008)
The Real Library Task (RLT) and scoring system was identical to the VLT, but was performed in the real library at the hospital and participants interacted with real objects.
Neuropsychological tests of intelligence, immediate memory, working memory, and verbal memory
Intelligence was estimated using the Wechsler Test of Adult Reading (WTAR). Verbal memory was assessed on the Logical Memory II subtest (scaled score) from the Wechsler Memory Scale – Third Edition (WMS-III). The Digit Span subtest from the Wechsler Adult Intelligence Scale-Third Edition (WAIS-III) was used to examine immediate memory and working memory. Z-scores were used for the longest digits forwards (immediate memory) and longest digits backwards (working memory).
Neuropsychological tests of EF
Traditional neuropsychological measures included: Verbal Fluency total number of correct words generated over three trials (Benton, 1968), Wisconsin Card Sorting Test 64: Computer version 2 percentage of perseverative errors (Heaton, 2005), the Brixton Spatial Anticipation Test raw score (Burgess & Shallice, 1996), and two tasks proposed to have greater ecological validity, the Zoo Map and Modified Six Elements Test (MSET) from the Behavioural Assessment of the Dysexecutive Syndrome (Wilson, Alderman, Burgess, Emslie, & Evans, 1996). Raw scores were also used due to limited variability of generated scaled scores. These commonly used measures are said to assess executive constructs similar to those which the VLT was designed to measure.
Everyday measure of EF
The Dysexecutive Questionnaire (DEX; Wilson et al., 1996) is a 20-item checklist designed to measure the frequency of executive difficulties manifested in everyday life on a five-point scale ranging from “never” to “very often.” Three executive factors are measured: cognitive, behavioral, and emotional. Higher scores on the DEX reflect poorer executive functioning. The independent-rater form was used as previous research has shown people with executive dysfunction may be poor informants of their own behavior due to reduced self-awareness (Bennett et al., 2005b; Burgess, Alderman, Evans, Emslie, & Wilson, 1998).
Experience with virtual reality technology
Participants self-rated their level of experience in using virtual reality technology on a four-point scale: 1 = never, 2 = < weekly, 3 = < 1 hour per day, 4 = > 1 hour per day. Those who rated their level of experience as 1 or 2 were coded as having “low VR experience”, whereas those who responded 3 or 4 were considered to have “high VR experience”.
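The dichotomization of the four-point rating can be written as a small helper. The function name is hypothetical and the sketch only restates the coding rule given in the text.

```python
def code_vr_experience(rating: int) -> str:
    """Collapse the four-point self-rating into the two groups used in the study.

    1 = never, 2 = < weekly        -> "low"
    3 = < 1 hour per day, 4 = > 1 hour per day -> "high"
    """
    if rating not in (1, 2, 3, 4):
        raise ValueError("rating must be 1, 2, 3, or 4")
    return "low" if rating <= 2 else "high"
```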
Procedures
Data collection occurred in the Library at Epworth Hospital over two 90-min sessions (including rest breaks) 1 week apart. Session one consisted of the VLT, Verbal Fluency, Zoo Map, and Brixton Spatial Anticipation Test. Session two consisted of the RLT, MSET, and Wisconsin Card Sorting Test. Administration order was counterbalanced to eliminate practice effects associated with performing the same task in the two different environments (virtual, real).
The VLT was run on a standard laptop and participants were trained in the navigation of the VE before its administration. Training time, ranging between three and fifteen minutes, was determined by the participant and was largely dependent on the participant's prior experience with VR software. Participants were provided a paper copy of the “Scenario Sheet” and “To Do List” and the researcher answered any questions before starting the task.
To obtain data regarding the intra- and inter-rater reliability of the VLT and RLT, the performance of 11 participants was videotaped and rated by two independent raters both at the time of task administration and 1 week later.
The DEX was completed and returned by mail by a “significant other” nominated by the participant, typically a family member or friend with whom they either lived or had at least weekly contact.
Data analyses were carried out using SPSS for Windows, version 18. Pearson's correlation coefficients were used to examine the intra- and inter-rater reliability of the VLT and RLT and to compare performance on the Library Task conducted in real and the VE. Independent samples t tests were used to compare the Control and TBI groups on the neuropsychological measures and VLT. Analysis of co-variance was conducted to examine the group difference in VLT scores after controlling for age, education, intelligence and verbal memory. Pearson's correlation coefficients were used to examine the relationship between (a) the VLT and executive measures, age, immediate attention, working memory and verbal memory, controlling for education and intelligence, and (b) the DEX and performances on the VLT and neuropsychological measures. One tailed tests and Fisher's transformations of r to Zr were used to test the significance of difference between two sets of correlations: (1) those between VLT and executive measures and (2) correlations between the VLT and a test of immediate attention. A standard multiple regression was performed to examine the relative contribution of executive measures, including the VLT to predict everyday EF as measured by the DEX. Alpha level was set at 0.05 throughout.
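As an illustration of the correlation comparison, the sketch below applies Fisher's r-to-z transformation and the textbook z test for the difference between two correlations. It is an assumption that this independent-samples form is what was computed; the study's correlations share the VLT variable and come from the same participants, for which a dependent-correlations test could also be considered. The function names and the example values are illustrative and are not taken from the study's tables.

```python
import math
from statistics import NormalDist


def fisher_z(r: float) -> float:
    """Fisher's transformation of r to Zr (the inverse hyperbolic tangent)."""
    return math.atanh(r)


def compare_correlations(r1: float, r2: float, n1: int, n2: int):
    """Return the z statistic and one-tailed p for the difference r1 - r2.

    Assumes the textbook formula for two independent correlations:
    z = (Zr1 - Zr2) / sqrt(1/(n1 - 3) + 1/(n2 - 3)).
    """
    standard_error = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / standard_error
    p_one_tailed = 1 - NormalDist().cdf(abs(z))
    return z, p_one_tailed


# Illustrative values only: r = .50 versus r = .10, each with N = 60.
z, p = compare_correlations(0.50, 0.10, 60, 60)
```

With these illustrative inputs the difference is significant at the .05 level one-tailed (z is approximately 2.40).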
Results
Intra- and Inter-rater Reliability of the VLT and RLT
Preliminary data investigating the reliability of the VLT and RLT showed strong inter-rater reliability (rVLT = 1.0; p < .001; rRLT = 1.0; p < .001) and strong intra-rater reliability (rVLT = 1.0; p < .001; rRLT = 1.0; p < .001).
Correlation Between VLT and RLT Performance
The mean Total Scores on the VLT and RLT were 40.28 (SD = 8.15) and 41.35 (SD = 8.48), respectively. The examiner needed to assist a small number of TBI and Control participants who experienced some navigational difficulties; however, the pattern of correlations between performances of the task in the two environments was similar for both groups. This suggests that the cognitive and physical impairments experienced by the TBI group did not impede their ability to use VR technology any more than Controls. As such, combined correlations for the two groups are presented. Performance in the real environment and the VE was significantly correlated for the VLT and RLT Total Score (r = .68; p < .01), and for six of the seven subtests: Task Analysis (r = .27; p = .04), Strategy Generation and Regulation (r = .77; p < .01), Prospective Working Memory (r = .53; p < .01), Response Inhibition (r = .54; p < .01), Time-based Prospective Memory (r = .48; p < .01), and Event-based Prospective Memory (r = .73; p < .01). Real and virtual performances were not significantly correlated for the Interference and Dual Task Management subtest (r = −.10; p = .47). Further investigation revealed that, whereas the groups did not significantly differ on the RLT Interference and Dual Task Management subtest, the Control group performed significantly better than the TBI group on the VLT version of this subtest (MTBI = 5.27; SDTBI = 1.86; MControl = 7.20; SDControl = 1.24; t(58) = −4.74; p < .01).
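The group comparison on the VLT Interference and Dual Task Management subtest can be approximately re-derived from the reported summary statistics. The sketch below is a generic pooled-variance independent samples t test, not the SPSS procedure itself; it assumes 30 participants per group, as described under Participants, and the function name is hypothetical.

```python
import math


def pooled_t(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int):
    """Independent samples t test (equal variances assumed) from summary data."""
    df = n1 + n2 - 2
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_variance * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


# TBI versus Control on the VLT Interference and Dual Task Management subtest.
t, df = pooled_t(5.27, 1.86, 30, 7.20, 1.24, 30)
```

This gives df = 58 and t of roughly −4.73, close to the reported t(58) = −4.74; the small gap is plausibly due to rounding of the published means and standard deviations.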
Comparison of TBI and Control Group Performance on Demographic Variables, Intelligence, Immediate Memory, Working Memory, and Verbal Memory Ability
The groups did not differ significantly in terms of age (TBI M = 37.57; SD = 12.24; Control M = 32.10; SD = 12.34; t(58) = 1.72; p = .09; r = .22), education (TBI M = 13.33; SD = 2.58; Control M = 13.57; SD = 2.76; t(58) = −.34; p = .74; r = .04), intelligence (TBI M = 101.60; SD = 10.59; Control M = 104.73; SD = 7.68; t(52.94) = −1.31; p = .20; r = .13), immediate memory (TBI M = .01; SD = .90; Control M = .33; SD = 1.00; t(58) = −1.31; p = .19; r = .34), working memory (TBI M = .09; SD = 1.16; Control M = .53; SD = 1.00; t(58) = −1.57; p = .12; r = .41), or verbal memory ability (TBI M = 10.00; SD = 2.95; Control M = 10.67; SD = 2.32; t(58) = .97; p = .34; r = .13).
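The text does not state how the effect size r was obtained. One common conversion from a t statistic is r = sqrt(t² / (t² + df)); the sketch below assumes that formula, and the function name is hypothetical. Under this assumption it reproduces the values reported for age and education.

```python
import math


def effect_size_r(t: float, df: float) -> float:
    """Convert a t statistic to the effect size r = sqrt(t^2 / (t^2 + df))."""
    return math.sqrt(t ** 2 / (t ** 2 + df))


r_age = effect_size_r(1.72, 58)         # age comparison, t(58) = 1.72
r_education = effect_size_r(-0.34, 58)  # education comparison, t(58) = -.34
```

Rounded to two decimals these are .22 and .04, matching the reported effect sizes for those two comparisons.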
Comparison of TBI and Control Group Performance on Neuropsychological Measures of EF
The performance of TBI and Control groups did not differ significantly on the EF tests, with the exception of the MSET, on which TBI participants performed more poorly than the control group.
Comparison of TBI and Control Group Performance on the Virtual Library Task of EF
The TBI group obtained significantly lower mean scores than the Control group on the VLT Total Score and on four of the VLT subtests, including VLT Prospective Working Memory, VLT Interference and Dual Task Management, VLT Time-based Prospective Memory and VLT Event-based Prospective Memory (Table 2). There were no significant group differences on the Task Analysis, Strategy Generation and Regulation, and Response Inhibition subtests of the VLT. Analysis of covariance (ANCOVA; Table 3) revealed that the TBI group obtained significantly lower scores on the VLT than the Control group after controlling for age, intelligence, working memory and verbal memory. Three of the four covariates made a significant contribution to the group differences in VLT scores including age, intelligence, and verbal memory performance.
Table 2 Mean differences between groups on neuropsychological measures of EF and on the VLT (N = 60)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20160922012155-99690-mediumThumb-S1355617711001883_tab2.jpg?pub-status=live)
Notes. VLT Strategy Gen. & Reg. = VLT Strategy Generation & Regulation, VLT Prospective WM = VLT Prospective Working Memory, VLT IDTM = VLT Interference & Dual Task Management, VLT Time PM = VLT Time-based Prospective Memory, VLT Event PM = VLT Event-based Prospective Memory, ^This df value was adjusted to take account of a Levene's test showing violation of the homogeneity of variance assumption.
Table 3 Analysis of covariance for VLT Total Score
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921003521412-0620:S1355617711001883:S1355617711001883_tab3.gif?pub-status=live)
Note. N = 60.
Comparison of TBI and Control Group Performance on VR Experience
Chi-square analysis revealed that the amount of participants’ previous VR experience (low vs. high) did not differ between the groups (χ²(1, N = 60) = 1.67; p = .20). Independent samples t tests revealed participants with low VR experience did not perform significantly worse on the VLT than participants with high VR experience (MlowVR = 40.06; SDlowVR = 8.56; MhighVR = 41.14; SDhighVR = 6.50; t(58) = −.41; p = .69).
Convergent and Discriminant Validity
Convergent validity was examined by investigating the relationships between scores on VLT and scores on the neuropsychological measures. Support for the convergent validity of the VLT is provided by the moderate correlations between the VLT and three of the EF measures; Verbal Fluency, Zoo Map, and Modified Six Elements Test, controlling for education and IQ (Table 4).
Table 4 Correlations between virtual library task (VLT) and executive measures, age, immediate attention, working memory and verbal memory controlling for education and intelligence (N = 60)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921003521412-0620:S1355617711001883:S1355617711001883_tab4.gif?pub-status=live)
Initial support for the discriminant validity of the VLT was provided by (1) the moderate correlations between the VLT and three of the EF tests, and (2) the non-significant correlation between the VLT and a test measuring immediate attention (Digits Forward). The correlation between the VLT and Digits Forward differed significantly from the correlation between the VLT and the Brixton Test (Z = 2.17; p < .05), and the correlation between the VLT and Digits Forward differed significantly from the correlation between the VLT and Zoo Map performance (Z = 2.57; p < .01). Moderate correlations were found between the VLT and age and measures of intelligence, working memory and verbal memory.
Comparison of TBI and Control Group Everyday EF Performance as Measured by the Dysexecutive Questionnaire (DEX)
Independent samples t tests revealed that, on average, the TBI group experienced significantly more everyday executive dysfunction problems (M = 30.00; SD = 14.15), as measured by DEX independent-rater responses, than the Control group (M = 13.55; SD = 10.58; t(57) = 5.04; p < .01; r = .56).
Relationships Between Scores on Neuropsychological Measures and the Virtual Library Task and Scores on the DEX
Pearson's correlation coefficients were calculated to examine the associations between scores on the DEX and scores on the EF measures and the VLT. The pattern of correlations was similar for each group; therefore, to maximize power, combined correlations are presented in Table 5.
Table 5 Pearson correlations between scores on neuropsychological measures of EF and the Virtual Library Task and scores on the DEX
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921003521412-0620:S1355617711001883:S1355617711001883_tab5.gif?pub-status=live)
Notes. VF = Verbal Fluency, WCS = Wisconsin Card Sorting Test computerized version, Brix = Brixton Spatial Anticipation Test, Mod SET = Modified Six Elements Test, VLT TA = VLT Task Analysis subtest, VLT SGR = VLT Strategy Generation and Regulation subtest, VLT PWM = VLT Prospective Working Memory subtest, VLT IDTM = VLT Interference and Dual Task Management subtest, VLT Inhi = VLT Inhibition subtest, VLT EPM = VLT Event-based Prospective Memory subtest, VLT TPM = VLT Time-based Prospective Memory subtest.
*p < .05, **p < .01.
Significant correlations were found between DEX scores and only one of the neuropsychological measures, namely the MSET. Regarding the VLT, there were significant associations between the DEX score and the VLT Total Score and three subtests: Task Analysis, Interference and Dual Task Management, and Time-based Prospective Memory. The significant negative correlations indicate that poorer performance on the MSET and the VLT was indicative of a high number and/or severity of everyday executive problems being reported by the participants’ significant other.
A standard multiple regression analysis was conducted to examine the relative contribution of all EF measures to scores on the DEX. According to Coakes and Steed (2003), at least five times more cases than independent variables are required to conduct multiple regression analysis. As such, only two predictor variables were included in the regression, on the basis that they were both designed with ecological validity in mind and that they correlated with the DEX: VLT Total Score and the MSET. Variables were entered simultaneously into the regression equation to ascertain the unique contribution of each to variance in DEX scores. A summary of the results of the regression analyses is presented in Table 6.
Table 6 Summary of standard regression of Modified Six Elements test and Virtual Library Task (VLT) Total Score on Dysexecutive Questionnaire Score
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921003521412-0620:S1355617711001883:S1355617711001883_tab6.gif?pub-status=live)
Note. N = 60, R Square = .25.
The total model was significant, F(2,56) = 8.56, p < .01, explaining 23.4% of the variance in DEX scores. At an individual level, both the MSET and the VLT were significant, with the Beta values indicating that the MSET was the stronger predictor of DEX scores.
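The overall F test for a regression model can be recovered from the proportion of variance explained using F = (R² / k) / ((1 − R²) / (n − k − 1)). The sketch below applies that standard identity; the function name is hypothetical, and the case count of 59 is an assumption inferred from the reported error degrees of freedom (56) with two predictors.

```python
def f_from_r_squared(r_squared: float, n_predictors: int, n_cases: int):
    """Overall F statistic for a multiple regression, computed from R squared."""
    df_model = n_predictors
    df_error = n_cases - n_predictors - 1
    f = (r_squared / df_model) / ((1 - r_squared) / df_error)
    return f, df_model, df_error


# 23.4% of variance explained by two predictors; 59 cases is inferred, not reported.
f, df_model, df_error = f_from_r_squared(0.234, 2, 59)
```

This yields F(2, 56) of about 8.55, in line with the reported F(2,56) = 8.56 once rounding of the variance figure is allowed for.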
Discussion
The current study sought to examine whether performance on a VR task was similar to performance of the same task in the real world environment. As expected, scores on the VLT and the RLT were highly positively correlated. This finding is consistent with the results of previous research (Cushman & Duffy, Reference Cushman and Duffy2008; Jansari et al., Reference Jansari, Agnew, Akesson and Murphy2004; McGeorge et al., Reference McGeorge, Phillips, Crawford, Garden, Della Sala and Milne2001; Rand et al., Reference Rand, Basha-Abu Rukan, Weiss and Katz2009; Zhang et al., Reference Zhang, Abreu, Seale, Masel, Christiansen and Ottenbacher2003), suggesting that performance on the VLT is similar to performance on the RLT. This finding therefore supports the use of VE's for testing of patients with TBI. This is important because VR assessment has greater clinical utility than assessment in real world settings. The only exception to this was the lack of significant correlation between the real and virtual performances for the Interference and Dual Task Management subtest. The TBI group performed significantly more poorly than the Control group on the virtual version of this subtest but not on the real life version. The real life version may have involved more familiar navigation, whereas virtual navigation demands may have added to the complexity of this subtask, placing greater burden on the TBI participants. The verbal prompt provided on this version may have been more salient to TBI participants than the visual prompt provided in the VLT version. It is important to note that both versions of the task showed strong inter-rater reliability. In the present study VR experience did not impact on performance on the VLT.
Support for the construct validity of the VLT as a measure of EF in participants with TBI was evidenced by the superior ability of the VLT to differentiate between patients with TBI and healthy Controls relative to the EF measures, even after controlling for age, education, intelligence, and verbal memory. Modest support for its convergent validity was also provided by the moderate correlations between the VLT and three of the five EF measures after controlling for education and IQ. These correlations, combined with the non significant correlation between the VLT and a measure of immediate attention and the significance of difference between pairs of correlations; namely the VLT and Digits Forward and (1) the VLT and the Brixton Test and (2) the VLT and Zoo Map performance, provide some support for the discriminant validity of the VLT. The moderate correlations between the VLT, and age, measures of intelligence, working memory and verbal memory supports previous studies which have reported that these constructs may influence performance on EF measures (Axelrod et al., Reference Axelrod, Goldman, Heaton, Curtiss, Thompson and Ghelune1996; Bennett et al., Reference Bennett, Ong and Ponsford2005a; Greve, Brooks, Crouch, Williams, & Rice, Reference Greve, Brooks, Crouch, Williams and Rice1997). They underscore the importance of controlliong for these variables which have not been considered in many previous studies (Duncan, Reference Duncan2005).
As expected, and consistent with previously reported findings (Burgess et al., 1998; Chan & Manly, 2002; Dawson et al., 2009; Knight & Alderman, 2002; Wilson et al., 1996), the TBI group were reported by significant others to have significantly more executive difficulties in everyday life than the control group. Four of the neuropsychological measures did not identify this and failed to significantly differentiate between the two groups. This is consistent with several studies that have reported no significant differences between the performances of control and brain-injured participants on Verbal Fluency (Alderman et al., 2003; Jovanovski, 2004), the Brixton Test of Spatial Anticipation (Draper & Ponsford, 2008), the WCST (Alderman et al., 2003; Dawson et al., 2009; Norris & Tate, 2000; Ord et al., 2009), and the Zoo Map Test (Evans, Chua, McKenna, & Wilson, 1997; Tyson, Laws, Flowers, Mortimer, & Schulz, 2008), although the Zoo Map Test has shown better discriminant validity in some previous research (Katz, Tadmor, Felzen, & Hartman-Maeir, 2007; Moriyama et al., 2002; Norris & Tate, 2000; Wilson, Evans, Emslie, Alderman, & Burgess, 1998).
These findings add to a growing body of evidence regarding the limited ecological validity of traditional EF tests (Bennett et al., 2005b; Chan, 2001; Chaytor & Schmitter-Edgecombe, 2003; Chaytor et al., 2006; Manchester et al., 2004; Norris & Tate, 2000). The disparity between the demands of real-life and testing environments most likely accounts for this (Manchester et al., 2004; Morganti, 2004; Shallice & Burgess, 1991). That said, the multifactorial nature of the tasks, and the consequent possibility that the TBI patients did not have impairments on the components of EF measured by these tests, cannot be ruled out.
The MSET did successfully differentiate the TBI and Control groups. Several studies have reported adequate construct validity of the MSET in differentiating healthy controls from clinical populations including alcoholics (Moriyama et al., 2002), TBI and multiple sclerosis (Norris & Tate, 2000), schizophrenia (Wilson et al., 1998), and acquired brain injury (Wilson et al., 1998). This task may be more sensitive to executive difficulties because it requires sustained cognitive effort over a 10-min period, incorporates subtasks simulating real-world scenarios, thereby providing good face validity, places significant demands on working memory, and requires an ability to multi-task and shift set.
The TBI group performed significantly worse on the VLT than the Control group overall, and on four out of the seven subtests: VLT Prospective Working Memory, VLT Interference and Dual Task Management, VLT Time-based Prospective Memory, and VLT Event-based Prospective Memory, suggesting that these tasks adequately taxed these executive sub-skills. The significant difference between the TBI and control groups in the Time and Event-based Prospective Memory subtests is consistent with the findings of previous studies using the Virtual Bungalow Task in patients with frontal lesions (Morris, Kotitsa, Bramham, Brooks, & Rose, 2002) and stroke (Brooks, Rose, Potter, Jayawardena, & Morling, 2004).
Both the MSET and the VLT were significantly and moderately correlated with the DEX, which supports the ecological validity of these tests. The tests together predicted 23.4% of the variance in DEX scores, providing support for a significant association between performance on these two tasks and everyday EF as measured on the DEX. Nevertheless, there remains a substantial proportion of unshared variance in DEX scores, suggesting additional factors are contributing to the reported everyday executive difficulties. These may include: the level of environmental cognitive demand and compensatory strategy use (Chaytor et al., 2006), how often the person's executive function impairments negatively affect the significant other (Joyner, Silver, & Stavinoha, 2009), the significant other's degree of acceptance of cognitive and behavioral changes in the injured participant and current stage of adjustment to the impact of the injury (Ponsford & Kinsella, 1991), emotional problems, pre-morbid functioning, health problems, and mobility status (Long & Collins, 1997). Future research examining ecological validity should attempt to control for the abovementioned sources of variation where possible.
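As a rough illustration of what the 23.4% figure implies, the short sketch below works through the arithmetic. Only the value 0.234 comes from the text; the helper name `describe_r_squared` is hypothetical, and the multiple correlation is simply the square root of the reported proportion rather than a statistic quoted from the study.

```python
from math import sqrt


def describe_r_squared(r_squared):
    """Return (unexplained proportion of variance, multiple correlation R)."""
    unexplained = 1.0 - r_squared   # share of variance not predicted by the model
    multiple_r = sqrt(r_squared)    # correlation between predicted and observed scores
    return unexplained, multiple_r


# 0.234 is the proportion of DEX variance jointly predicted by the MSET and the VLT.
unexplained, multiple_r = describe_r_squared(0.234)
print(f"unexplained variance: {unexplained:.1%}")  # 76.6%
print(f"multiple R: {multiple_r:.2f}")             # 0.48
```

Seen this way, roughly three quarters of the variance in informant ratings lies outside what the two tests capture, which is the gap the factors listed above are proposed to fill.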
Individually, the MSET made a significant and unique contribution to prediction of DEX scores. The failure of previous studies to find positive correlations between the MSET and overall DEX scores (Chan & Manly, 2002; Norris & Tate, 2000) may be due to the heterogeneity in their participants, who may not all have had executive dysfunction in the domains assessed by the tasks. However, the results of this study and those of Bennett et al. (2005b), who found a moderate, significant correlation (r = −.36) between the MSET scores of participants with TBI and clinician-rated DEX scores, suggest that this test is an ecologically valid measure in people with predominantly moderate to severe TBI.
The VLT also made a unique contribution to the prediction of DEX scores. The VLT appears to have greater face validity than the MSET. This is supported by the observation of Manly, Hawkins, Evans, Woldt, and Robertson (2002) that the MSET remains somewhat removed from everyday situations: “it is difficult to imagine many real situations where the parameters for switching from one task to another are so firmly set, and where the time allocated to each aspect is of such short duration” (2002, p. 279). Clinically, the importance of face validity should not be underestimated; patients are more likely to accept feedback regarding cognitive impairments if the tools used to make decisions regarding their impairments reflect everyday environments and scenarios. Another important advantage of the VLT over the MSET is that it attempts to provide operational definitions of the underlying components of EF which it purports to measure. Such information would be useful in assisting clinicians to tailor rehabilitation recommendations to a patient's specific EF impairments and strengths.
Several clinical implications arise from this study. We would caution against the use of Verbal Fluency, the WCST, the Brixton Spatial Anticipation Test, and the Zoo Map Test if a key function of the neuropsychological assessment is to extrapolate how cognitive behavioral deficits observed during assessment will impact on patients’ functional executive abilities. We suggest that the VLT, like the MSET, would be an appropriate measure for this purpose, potentially providing a more comprehensive assessment of EF than the MSET.
The present study is not without limitations. Given the moderate sample size, it was only possible to examine the relative sensitivity of a limited number of executive tests. Thus one cannot rule out the possibility that other measures not included in this study may have shown greater sensitivity to everyday executive difficulties. However, a broad range of executive tasks was included. Development of an alternate version of the task for retest purposes will be a useful next step. The DEX was chosen as the ecological comparator for the neuropsychological measures and the VLT based on the recommendation of Chaytor and Schmitter-Edgecombe (2003) that neuropsychological measures of EF should be measured against a measure of everyday executive skill rather than many different global outcome measures. All but one participant lived in the community and, therefore, family- and friend-rated DEX scores were the most appropriate for this sample. However, the DEX is not without limitations and, as previously discussed, ratings made by relatives or friends may be influenced by a range of factors.
In addition to demonstrating that virtual reality assessments can be successfully administered to people with moderate to severe TBI, the current study provides evidence for the construct and ecological validity of the VLT, a newly developed assessment which aims to measure multiple components of EF in an integrated and lifelike manner. The inability of four of the five neuropsychological measures to (1) distinguish between the TBI and control groups and (2) correlate significantly with a measure of everyday EF provides further evidence of the limited sensitivity and ecological validity of traditional pen-and-paper measures of EF. In contrast, the MSET and the VLT appear to be sensitive and ecologically valid assessment tools. The VLT has the potential advantage over the MSET of providing objective measurement of various components of EF.
Acknowledgments
We thank the participants in this study, who generously donated their time. We also thank Johnathan Wells for programming the virtual library environment. The development of the virtual library environment was supported by a grant from the Jack Brockhoff Foundation and completion of the study was supported by a grant from Monash University's Faculty of Medicine, Nursing and Health Sciences. There are no conflicts of interest relating to this manuscript.