Introduction
Telehealth and telemedicine technology has seen rapid growth over the past decade in many areas of healthcare. These include teleradiology, telepsychiatry, telepsychology, and teleneurology/telestroke, wherein consultations and some traditional clinical services (e.g., examinations and interventions) can be provided remotely using video teleconference (VTC) equipment. Research in telepsychiatry has shown generally good acceptance by consumers and providers, particularly when such specialty services are not available within the community (Hilty, Nesbitt, Kunneth, Crus, & Hales, 2007; Myers & Turvey, 2012; Shore, 2013). Because of their reliance on verbal and visual communication, telemental health applications are particularly well suited to this medium of patient contact (e.g., see Myers & Turvey, 2012). Research comparing clinical diagnostic interviewing conducted via VTC and traditional face-to-face conditions has suggested generally good agreement in a variety of conditions, including dementia and cognitive impairment of various causes as well as neuropsychiatric groups and healthy controls (Barton, Morris, Rothlind, & Yaffe, 2011; Loh, Donaldson, Flicker, Maher, & Goldswain, 2007; Shores et al., 2004; Temple, Drummond, Valiquette, & Jozsvai, 2010).
The implementation of telerehabilitation programs (e.g., McCue, Fairman, & Pramuka, 2010) and teleneurology (primarily telestroke) is also increasing (Larner, 2011), although the evidence base for VTC-based assessments beyond telestroke remains limited (Halley, Roine, Ohinmaa, & Dennett, 2013). These developments bring much-needed specialty services to people in remote and underserved areas, and are becoming more commonplace in view of greater availability and reduced costs of these technologies.
Along these lines, remote administration of several neuropsychological tests has yielded encouraging results, although no large-scale studies have been published to date. Neuropsychological tests relying upon verbal instructions and responses are particularly well suited to VTC administration, and good agreement between VTC and face-to-face testing has been seen in several studies. Montani et al. (1997) were among the first to explore the use of the Mini-Mental State Examination (MMSE) and other cognitive screening tests in elderly inpatients, and this was followed by other investigations using a variety of brief measures in psychogeriatric (Ball & Puffett, 1998; Menon et al., 2001) and healthy older subjects (Hildebrand, Chow, Williams, Nelson, & Wass, 2004). Vestal, Smith-Olinde, Hicks, Hutton, and Hart (2006) demonstrated good agreement between neuropsychological measures of language administered to a small group of patients with dementia in face-to-face and VTC conditions, and promising telehealth-based neuropsychological assessment programs have been described in military settings as well (Clement, Brooks, Dean, & Galaz, 2001).
Early studies generally examined feasibility and mean differences in scores from the same tests administered in VTC and/or face-to-face test conditions. Most focused on healthy control subjects or small samples of patients with a variety of neuropsychiatric diagnoses, with little emphasis on traditional and more detailed neuropsychological assessments, and few investigations used a counterbalanced design combined with alternate test forms.
In 2006, we presented VTC-based results from a brief battery of standard neuropsychological tests in a preliminary sample (N=33) of older adults with and without cognitive impairment (Cullum, Weiner, Gehrmann, & Hynan, 2006). Tests selected were commonly used in the evaluation of patients with known or suspected dementia. Criteria for the tests included brevity, reliability, availability of an alternate form, and amenability to the VTC environment using standard instructions. The sample consisted of 14 subjects with mild cognitive impairment and 19 with Alzheimer disease (AD). Good agreement was found across most tests examined, with intraclass correlations ranging from .58 to .88. Together with other encouraging findings from the literature using various brief assessments in small groups (Harrell, Wilkins, Connor, & Chodosh, 2014; Jacobsen, Sprenger, Andersson, & Krogstad, 2003; Kirkwood, Peck, & Bennie, 2000; Turner, Horner, VanKirk, Myrick, & Tuerk, 2012; Vestal et al., 2006), these results suggested that VTC-based neuropsychological assessment in older individuals with and without cognitive impairment was feasible and had promising reliability, although no large-scale studies establishing reliability more definitively had been conducted. Consumer acceptability of VTC-based neuropsychological assessment has also been explored in a recent investigation, with results suggesting good acceptance of VTC-based testing among healthy and cognitively impaired older individuals (Parikh et al., 2013).
A review of the teleneuropsychology literature and a list of tests used in this context can be found in Cullum and Grosch (2012).
Despite the encouraging results from several small studies suggesting the feasibility of VTC-based neuropsychological testing, questions remain regarding reliability and validity of various tests in larger samples, and the extent to which remote assessment can be done in patients with various levels of cognitive impairment. The purpose of this investigation was to compare psychometric results from a battery of common neuropsychological tests administered face-to-face and by VTC in a large group of urban and rural adults with and without cognitive impairment.
Method
Subjects
Subjects were recruited through the Alzheimer’s Disease Center (ADC) at the University of Texas Southwestern Medical Center in Dallas, Texas and its satellite clinic in Talihina, Oklahoma, the latter serving the Choctaw Nation, to obtain urban and rural-dwelling individuals diagnosed with mild cognitive impairment (MCI; Petersen et al., 1999), probable Alzheimer disease (AD; NINCDS/ADRDA guidelines), or normal cognition based upon multidisciplinary consensus diagnosis. Rural (Talihina) subjects were diagnosed on site by ADC personnel and/or by teleconference interviews augmented by more detailed on-site neuropsychological testing (which did not include the measures used in this study), CT or MRI scans, and appropriate blood tests (Weiner, Rossetti, & Harrah, 2011).
Participants (and caregivers, as indicated) provided verbal and written informed consent for this study on forms approved by the Institutional Review Boards of the University of Texas Southwestern Medical Center and the Choctaw Nation. All subjects were fluent in English and this was their declared primary language. All had adequate eyesight and hearing. A total of 202 participants (59% healthy controls and 41% with MCI or AD) were included. For the purpose of this investigation, all subjects were combined. Sixty-three percent (127/202) of the sample was female. Other demographic features of the cohort are provided in Table 1.
Table 1 Demographic Characteristics of the Total Sample (N=202)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921003716953-0025:S1355617714000873:S1355617714000873_tab1.gif?pub-status=live)
Videoconferencing Equipment and Setup
Dallas-to-Talihina communication
Since 2003, a 384 kilobits/s (Kbps) connection has been established for patient care and research through the commodity Internet between our campus in Dallas and the ADC outreach clinic in Talihina, Oklahoma. We used a “multi-point control unit” in Dallas and two endpoints, one in Dallas and one in Talihina. The data path for the VTC is through the commodity Internet. The videoconferencing communications standard used is the ITU-T H.323 packet-based multimedia communications protocol.
Dallas-to-Dallas communication
The videoconferencing equipment and procedures for the current project included an H.323 PC-based videoconferencing system (Polycom™ iPower 680 Series) that was set up in two non-adjacent rooms. Subjects viewed the examiner and test materials on a 26″ flat screen color monitor placed approximately 30″ away while seated at a desk. The examiner viewed subjects on a split screen that also showed the view from the subject’s perspective. For the presentation of visual stimuli, the examiner placed test materials in front of the camera in a fixed position so that stimuli were appropriately oriented and appeared similar in size on screen to the actual test items.
Procedure
Subjects were told that we were investigating two ways to administer our regular neuropsychological testing battery, and that one would be performed after the other on the same day. After signing informed consent, subjects were randomized to test condition using computer-generated random numbers. For VTC administration, subjects were seated in front of the computer screen by local staff, and were greeted and introduced to the remote examiner. Staff support at the remote site for dealing with the VTC equipment and subjects was available. No staff were in the VTC testing room with participants during assessments, as we observed that subjects adapted well to the VTC environment when provided with an introduction to the process by local staff. Tests were administered by experienced psychometrists or research assistants. Tests included measures of global cognitive status (MMSE), verbal episodic memory (Hopkins Verbal Learning Test-Revised; Benedict, Schretlen, Groninger, & Brandt, 1998), letter fluency (FAS), category fluency (Animals), confrontation naming (Boston Naming Test – 15 item; Mack, Freed, Williams, & Henderson, 1992), Digit Span forward and backward, and the Clock Drawing Test. The order of tests was fixed, and test form (standard or alternate version) was alternated for each consecutive subject. Alternate forms for all tests were used, with the exception of the MMSE, in which items were the same except for the use of an alternate three-word recall task. Standard test instructions were used for all tests. The only procedural difference for VTC administration was that subjects were asked to hold up their drawings to the camera so that the examiner could view each clearly. Scoring of drawings was done in real time while subjects held up their products to the camera.
Each week that subjects were tested at our remote site, the original packet of test forms was mailed to our lab for double-checking of scoring.
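The allocation scheme described in this section can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure actually used in the study: the function name `assign_sessions`, the seed value, and the assumption that the second session receives whichever form was not used in the first are all hypothetical.

```python
import random

CONDITIONS = ("VTC", "face-to-face")
FORMS = ("standard", "alternate")


def assign_sessions(n_subjects, seed=2014):
    """Return one (first_session, second_session) pair per subject.

    Each session is a (condition, form) tuple. Condition order is drawn from
    computer-generated random numbers; the form given first alternates for
    each consecutive subject (an assumed reading of the procedure).
    """
    rng = random.Random(seed)  # seeded only so the sketch is reproducible
    schedule = []
    for i in range(n_subjects):
        first_cond = rng.choice(CONDITIONS)
        second_cond = CONDITIONS[1] if first_cond == CONDITIONS[0] else CONDITIONS[0]
        first_form = FORMS[i % 2]         # alternates subject by subject
        second_form = FORMS[(i + 1) % 2]  # the form not used in session one
        schedule.append(((first_cond, first_form), (second_cond, second_form)))
    return schedule


schedule = assign_sessions(202)
```

Under these assumptions every subject completes both conditions and both forms, with condition order randomized and form order alternating across consecutive subjects.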
Statistical Analyses
Means (standard deviations [SD]) are provided for each test score. Intraclass correlation coefficients (ICCs) were used to assess the agreement between the two testing formats (face-to-face vs. VTC). The Bradley-Blackwood Procedure was used to examine the bias between the testing formats by simultaneously testing the equality of means and equality of variances; if the Bradley-Blackwood Procedure yielded a significant result, both the paired t test (significant result indicating that means are biased) and the Pitman Test (significant result indicating that variances are biased) were examined to determine the source of bias. In addition, Bland-Altman plots were examined to explore the magnitude of the mean differences between methods. IBM SPSS V20 was used for all analyses, all statistical tests were two sided, and p<.05 indicated significant results.
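For readers unfamiliar with the Bradley-Blackwood Procedure, the following Python sketch shows one common formulation: the paired differences are regressed on the paired sums, and a joint F test of zero intercept and zero slope serves as the simultaneous test of equal means and equal variances. It is an illustration under stated assumptions, not the SPSS analysis used here; the function name and the example scores are hypothetical, and the sketch does not handle the degenerate case of a zero residual sum of squares.

```python
def bradley_blackwood(x, y):
    """Simultaneous test of equal means and equal variances for paired scores.

    Regresses D = x - y on S = x + y and tests intercept = 0 and slope = 0
    jointly. Returns (F, p) with 2 and n - 2 degrees of freedom.
    """
    n = len(x)
    d = [a - b for a, b in zip(x, y)]
    s = [a + b for a, b in zip(x, y)]
    mean_d = sum(d) / n
    mean_s = sum(s) / n
    sxx = sum((si - mean_s) ** 2 for si in s)
    sxy = sum((si - mean_s) * (di - mean_d) for si, di in zip(s, d))
    slope = sxy / sxx
    intercept = mean_d - slope * mean_s
    # Residual sum of squares from the regression of D on S
    sse = sum((di - intercept - slope * si) ** 2 for si, di in zip(s, d))
    sum_d2 = sum(di ** 2 for di in d)
    f_stat = ((sum_d2 - sse) / 2) / (sse / (n - 2))
    # With 2 numerator degrees of freedom the F survival function is closed form.
    p_value = (1 + 2 * f_stat / (n - 2)) ** (-(n - 2) / 2)
    return f_stat, p_value


# Hypothetical scores, for illustration only
face = [10, 12, 15, 11, 18, 20, 14, 16]
noise = [0.5, -0.5, 0.3, -0.3, 0.2, -0.2, 0.4, -0.4]
vtc_similar = [f + e for f, e in zip(face, noise)]
vtc_shifted = [f + 5 + e for f, e in zip(face, noise)]

f_null, p_null = bradley_blackwood(vtc_similar, face)
f_shift, p_shift = bradley_blackwood(vtc_shifted, face)
```

In this toy example the first comparison is non-significant, whereas the constant five-point shift in the second is flagged. A significant omnibus result would then be followed by the paired t test and the Pitman Test to locate the source of bias.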
Results
All 202 subjects from both testing sites completed both test conditions, with no missing data. Mean testing time was 41.3 min (SD=8.8) for VTC testing and 36.3 min (SD=7.1) for the face-to-face condition, with a mean of 21.3 min (SD=8.9) between test sessions. Neuropsychological results from VTC and face-to-face testing conditions are presented in Table 2.
Table 2 Mean (SD) and ICC Results for VTC and Face-to-Face Conditions by Test
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921003716953-0025:S1355617714000873:S1355617714000873_tab2.gif?pub-status=live)
The means for all tests were highly similar across test conditions. To examine the agreement of subject performance across test conditions, ICCs are presented in Table 2. All ICCs (range = 0.55 to 0.91) were significant (p<.0001), suggesting good agreement across test administration conditions.
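As an illustration of how an ICC for two administrations can be obtained, the sketch below computes a two-way random-effects, absolute-agreement, single-measure coefficient, often written ICC(2,1). The specific ICC model used in this study is not stated, so this choice is an assumption, as are the function name and the example scores.

```python
def icc_absolute_agreement(x, y):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    x and y are paired scores for the same subjects under two conditions.
    """
    n = len(x)
    k = 2  # two administrations per subject
    rows = list(zip(x, y))
    grand = (sum(x) + sum(y)) / (n * k)
    row_means = [(a + b) / k for a, b in rows]
    col_means = [sum(x) / n, sum(y) / n]
    # Two-way ANOVA sums of squares (subjects = rows, conditions = columns)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for pair in rows for v in pair)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical scores, for illustration only
scores = [20.0, 22.0, 25.0, 19.0, 27.0, 30.0, 24.0]
icc_identical = icc_absolute_agreement(scores, scores)
icc_shifted = icc_absolute_agreement(scores, [v + 5 for v in scores])
```

Because this form indexes absolute agreement, a constant offset between conditions lowers the coefficient even when subjects keep exactly the same rank order, which is why identical scores give a value of 1 while the shifted scores do not.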
To examine statistical variations in scores across test conditions, results from the Bradley-Blackwood Procedure and the Bland-Altman plots were reviewed (Table 3). All tests had non-significant Bradley-Blackwood Procedures except the HVLT-R and BNT-15. For the HVLT-R, the paired t test was significant (p=.005) but the Pitman Test was non-significant (p=.785); means for VTC were slightly higher [23.4 (6.90)] than in the face-to-face condition [22.6 (6.98)]. For the BNT-15, the Pitman Test was significant and the paired t test was non-significant, indicating that the variances for the VTC and face-to-face conditions were statistically different (2.432 vs. 2.162) despite almost identical means (13.1 vs. 13.3). A review of the Bland-Altman plots for each measure showed very low to no bias, thereby underscoring the psychometric similarity of all results between test conditions (results not shown).
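The quantities summarized by a Bland-Altman plot (the mean difference, or bias, and the 95% limits of agreement) can be computed as in the following sketch. The function name and the example scores are hypothetical and are not data from this study.

```python
import statistics


def bland_altman(x, y):
    """Return (bias, lower_limit, upper_limit) for paired scores x and y.

    Bias is the mean of the paired differences; the limits of agreement are
    bias +/- 1.96 times the sample standard deviation of the differences.
    """
    diffs = [a - b for a, b in zip(x, y)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical scores, for illustration only
vtc = [24, 21, 26, 19]
face = [23, 22, 25, 20]
bias, lower, upper = bland_altman(vtc, face)
```

A bias near zero with narrow limits of agreement corresponds to the "very low to no bias" pattern described above; in a full plot, each difference would be graphed against the mean of the two scores for that subject.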
Table 3 Agreement Diagnostics Across Tests
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20160921003716953-0025:S1355617714000873:S1355617714000873_tab3.gif?pub-status=live)
Discussion
These data represent the largest study to date of VTC-based neuropsychological testing. We found good agreement between the VTC- and traditional face-to-face-administered neuropsychological tests examined. Mean differences between test conditions using alternate test forms were minimal, and intraclass correlations across all measures were significant, with most reflecting strong agreement between results obtained in both conditions. Intraclass correlations ranged from .55 on digit span backward to .91 on a global measure of cognitive function (MMSE), with a mean overall correlation of .74 across all tests administered. The highest correlation involved the MMSE, which is not surprising given the short time between test administrations and the fact that all items except the three-word recall stimuli were the same. The majority of the other results depict very similar performances across measures of verbal learning, fluency, simple attention, and naming. HVLT-R scores were slightly but statistically significantly higher in the VTC condition, and this difference was not explained by greater variability or test condition-by-form interaction. Although examiners were trained and instructed to administer tests in a standard manner regardless of test condition, it is possible that examiners enunciated more clearly or presented stimuli at a slightly different rate to adapt to the VTC condition, which may have resulted in slightly higher scores. However, the mean scores, variability, and range of scores between test conditions were highly similar, with all differences between test conditions being clinically negligible.
These overall results, which depict similar findings in both traditional face-to-face and VTC testing conditions, are consistent with most of the preliminary teleneuropsychology findings obtained in much smaller samples and with our initial pilot results (see Cullum & Grosch, 2012, for a review). This suggests that a brief battery of tests designed to assess adult patients with known or suspected cognitive impairment can be feasibly administered via VTC and that the tests examined produce very similar results regardless of administration condition in these subjects. Results also provide support for the feasibility of remote neuropsychological assessment in rural American Indian populations, which is consistent with our experience in this population using telepsychiatry (Weiner et al., 2011).
These findings also provide support for the similarity of the alternate forms of tests that were examined. Since test form was counterbalanced along with test condition, we were able to examine potential interaction effects, although none were found, suggesting that the specific alternate forms of tests selected minimized practice effects and yielded similar levels of performance. Of note was our selection of three-word recall items, which included two sets of simple nouns of similar length, concreteness, and frequency in the English language. Similarly, performance on digit span forward and backward trials was very similar between groups, indicating that our selection of stimuli for this test resulted in items of similar difficulty levels, although digit span backward and forward did have the lowest correlations across all tests examined. Thus, it is possible that our choice of alternate digit strings resulted in the lower correlations, and other versions (e.g., WAIS-IV, RBANS) may show higher correlations. Additionally, our use of Forms 1 and 4 of the HVLT-R demonstrated good correspondence (consistent with standard HVLT-R test–retest values), although t test results did reach significance despite similar mean scores of 22.55 versus 23.43. Scores on the 15-item short forms of the Boston Naming Test were also similar, as were scores on measures of verbal fluency, indicating that we were successful in selecting and developing tasks of similar levels of difficulty across most cognitive domains.
Limitations of our study include the small number of neuropsychological tests examined, as information remains lacking for the majority of measures in common clinical use. While the test battery was purposely designed to be brief and appropriate for dementia evaluations, further investigation of VTC-based neuropsychological assessment with a broader array of tools is needed. Although the current results are indeed strong in their support for VTC-based testing, it cannot be assumed that all neuropsychological tests can be administered validly in this medium. For the current project, tests were selected to be easily administered via VTC (i.e., predominantly verbal measures requiring verbal instructions and responses). As such, we did not find that instructions required modification for most measures. Administration of the Clock Drawing Test did require some additional instructions and procedures, but this did not alter actual administration procedures nor did it require scoring modifications. Further investigation with additional tests, other nonverbal measures, and tasks requiring more equipment is needed to explore feasibility, validity, and reliability among other assessment tools and in more comprehensive test batteries.
Additional limitations to generalizing beyond the current results include the relatively high level of education of the sample and the fact that reliability and validity were not examined by ethnicity in the current study. It is also not clear how level of cognitive impairment might impact the validity of VTC-based testing, although our range of MMSE scores went as low as 15. This suggests that even patients with rather severe levels of impairment can potentially be tested via VTC using similar procedures, although behavioral issues will dictate which subjects may not be appropriate for remote assessment, particularly in the absence of a staff member in the room with the individual.
Importantly, all 202 of our subjects were able to complete testing without additional remote assistance beyond the examiner, and there were no significant problems with audio or visual transmission, or the ability of subjects to understand and comply with test instructions. This is consistent with our recent report of good acceptability of VTC-based neuropsychological testing among a subgroup of the subjects in the current investigation wherein the majority (approximately two-thirds) of individuals with or without cognitive impairment expressed no preference between VTC and face-to-face testing (Parikh et al., 2013). A final limitation may relate to the connections between local and remote testing sites, as we had the advantage of a secure network. Whether more challenges to the testing environment and validity of assessments will occur with less secure systems or different connection speeds remains to be seen, although as telehealth technologies advance, this may become less of an issue. Data security is an issue, however, whether in research or clinical settings. Ensuring that connections are secure with appropriate levels of data encryption and privacy safeguards is essential, as is knowledge of relevant practice issues such as state licensing laws (e.g., see Grosch, Gottlieb, & Cullum, 2012). It has been difficult to determine the relative cost of VTC versus face-to-face testing (Weiner et al., 2011), but the ready availability of computers with video capacity and of communication by Internet is continually reducing equipment and communication costs.
In summary, VTC-based neuropsychological assessment appears to be a useful alternative to traditional face-to-face testing for the assessment tools studied to date. Similar results can be achieved with these predominantly verbally-oriented measures, and the entire battery of tests as outlined herein can be conducted in under an hour in most cases. While designed as a brief battery of tests for older individuals, these measures enjoy common use in the field and are applicable to adults with a wide array of known or suspected cognitive disorders, thereby suggesting applicability to other populations.
Future studies will need to demonstrate the ability of VTC-based testing to distinguish among various clinical groups and controls of various ages to demonstrate validity, although the current results are indeed promising, and additional analyses of our different diagnostic groups are underway. An important area for further development is the application of teleneuropsychology and telepsychology to children, wherein very little research has been done (e.g., see Sloan, Reese, & McClellan, 2012). Teleneuropsychology services require additional considerations for implementation in research and clinical settings, although initial guidelines and practice recommendations now exist (e.g., see American Psychological Association, 2013; Turvey et al., 2013; Grosch et al., 2012), and the future appears bright for the application of neuropsychological procedures in the rapidly evolving telehealthcare scene.
Acknowledgments
This work was supported by the National Institutes of Health (grant numbers 1-R01-AG027776-01A1 and 5-P30-AG12300-18). No conflicts of interest exist among co-authors affecting this manuscript. Thanks to Laura Graham, Kristin Martin-Cook, Hugo Pons, and Andy Guynn for their assistance with the project and to the Choctaw Nation of Oklahoma for their support and participation in this study.