INTRODUCTION
Telehealth is the remote delivery of healthcare services from one site to another (e.g., office, home, clinic). Telehealth has evolved over the past two decades into an important mode of healthcare delivery. From medicine to behavioral health, telehealth reaches individuals beyond the treatment room to deliver assessment, intervention, and follow-up services remotely. Within the field of psychological therapy, teletherapy has grown in popularity, due in part to an increasing number of studies demonstrating that evidence-based treatments can be delivered efficaciously through telehealth (Bashshur, Shannon, Bashshur, & Yellowlees, 2016; Berryhill et al., 2019; Egede et al., 2015). In contrast, research on cognitive and other standardized evaluations delivered remotely (i.e., teletesting), particularly for children and adolescents, remains limited.
Telehealth has the potential to increase access for populations who experience social, economic, geographical, and/or health-related barriers to care (Marcin, Shaikh, & Steinhorn, 2016; Weinstein et al., 2014). Telehealth may also inadvertently exacerbate economic disparities based on differences in access and technological literacy. However, access is growing rapidly; the American Community Survey documents that 90.3% of US households have a computer and 82.7% have broadband Internet subscriptions (see United States Census Bureau, 2019). In addition, the American Academy of Pediatrics is actively working to improve access to telehealth for groups impacted by these factors (see Jenco, 2020), and teleassessment should evolve with these considerations in mind. Remote cognitive assessment via teletesting requires additional consideration given the standardized nature of administration procedures, which are inherently changed when stimuli are presented over a screen (Hewitt, Rodgin, Loring, Pritchard, & Jacobson, 2020). Presentation of stimuli has been demonstrated to be equivalent when shown on an iPad versus a traditional printed booklet (Daniel, Wahlstrom, & Zhang, 2014). Some test publishers had previously released digital assessment tools (e.g., Q-Interactive iPad assessments via Pearson) designed to replace physical “pen and paper” testing materials, but these still require face-to-face administration. In response to the COVID-19 pause in clinic-based services, Pearson Assessments released a “Letter of No Objection,” dated March 20, 2020, permitting the use of copyrighted materials to assist in remote assessments. Given the pivot to “on screen” presentation of stimulus books and stimuli that have not yet been normed for remote administration, there is a critical need to assess the equivalence of remote assessments delivered via telehealth.
The majority of studies examining equivalency of performance between in-person and teletesting have focused on adult populations, with few studies among children and adolescents. Overall, the literature suggests no effect of administration modality (remote versus in-person) on neuropsychological test performance (for meta-analysis, see Brearly et al., 2017), though this conclusion is limited by the small number of available studies, selection bias related to participant age, and mixed designs. Comparability has been documented across a range of populations, such as those with cognitive impairment (Wadsworth et al., 2018), culturally diverse groups (Vahia et al., 2015), and individuals with intellectual disability (Temple, Drummond, Valiquette, & Jozsvai, 2010). Support for equivalence also exists across differing referral concerns, such as dementia (Cullum, Weiner, Gehrmann, & Hynan, 2006; Cullum, Hynan, Grosch, Parikh, & Weiner, 2014), speech–language (Waite, Theodoros, Russell, & Cahill, 2010), academic (Wright, 2016), learning disability (Hodge et al., 2019), demyelinating disorder (Harder et al., 2020), neurodegenerative disease (Ragbeer et al., 2016), and broader neuropsychological (Galusha-Glasscock, Horton, Weiner, & Cullum, 2016) evaluations. However, given that the majority of these studies included adult measures or those specific to neuropsychological assessment [e.g., Boston Naming Test, Clock Drawing, Mini-Mental Status Exam, Rey Auditory Verbal Learning Test, Repeatable Battery for the Assessment of Neuropsychological Status, Wechsler Adult Intelligence Scale (WAIS); see Brearly et al., 2017 for meta-analysis], the translation to pediatric care remains unclear.
The COVID-19 pandemic abruptly halted all nonessential services, thrusting teletesting into prominence before researchers could establish an evidence base. This left many stakeholders without a clear path for assessment services for referred patients as well as students. Several conflicting position papers were released, citing validity concerns and a lack of evidence regarding teleassessment for school-based evaluations. Given the need to adhere to timelines, the National Association of School Psychologists released updated guidelines specific to the school setting (National Association of School Psychologists, 2020), whereas others encouraged waiting for the return to in-person assessment.
As the situation continues to evolve, psychologists grapple with balancing safety, validity, and ethical responsibility. Farmer et al. (2020a) offered several considerations for the delivery of teleassessment with a lens toward implications for policy and practice. The authors argue that although evidence exists within the adult literature, the child and adolescent literature remains limited and requires unique considerations, particularly in relation to special education services. In a second paper, the authors provide a commentary on the limitations of validity for local educational agencies to consider (Farmer et al., 2020b). Others have outlined solutions for teleassessment and emphasize the importance of moving forward with remote administration, with appropriate caution, for both pediatric and adult groups (Hewitt et al., 2020).
More recently, the feasibility of teletesting has been demonstrated in pediatric patients using a wide variety of measures (specific cognitive measures included selected subtests from the Wechsler Abbreviated Scale of Intelligence – Second Edition and the Differential Ability Scales – Second Edition, and specific academic measures included selected subtests from the Bracken expressive form, the Comprehensive Test of Phonological Processing, and the Wechsler Individual Achievement Test – Third Edition; Ransom et al., 2020), yet comparison to in-person administration remains limited. Harder et al. (2020) examined teletesting versus in-person assessment via a test–retest design among pediatric patients recruited within a demyelinating disorders clinic. Findings did not reveal any differences in test scores between the two conditions on selected subtests of the California Verbal Learning Test (Children’s Version and Second Edition), the Symbol Digit Modalities Test, the Wechsler Intelligence Scale for Children – Fifth Edition (WISC-V), the WAIS-IV, the Beery-Buktenica Developmental Test of Visual-Motor Integration – Visual Perception, the Delis–Kaplan Executive Function System, or the Woodcock–Johnson Tests of Academic Achievement – Third Edition.
As discussed by Wright (2020) and Hewitt et al. (2020), remote assessment has the potential to alleviate many of the preexisting structural and systemic challenges to educational evaluations that have only been compounded by the COVID-19 crisis. With the uncertainty of timelines for school services resuming fully in person, teletesting also has the potential to address challenges related to social distancing and long waitlists.
In light of these concerns, Wright (2020) examined performance on the WISC-V (Wechsler, 2014), administered either in person or remotely with a proctor, among a sample of 256 school children. Results did not reveal a method effect for Index or Full-Scale IQ scores, or for any subtest except Letter–Number Sequencing. Although encouraging, this study did not include clinically referred children, who may show differences in performance by administration type. It also required a proctor to manage materials in the remote condition, which is not consistent with clinical practice, and the authors examined only the WISC-V. As such, there is a need to examine remote testing strategies among clinically referred children in a real-world setting, using both the cognitive and academic assessments that are crucial for educational evaluations.
Evaluations are often coupled with unique circumstances, such as time-sensitive referral questions or high-stakes eligibility determinations. Additionally, clinically referred children may perform differently due to suspected cognitive or learning needs. Given that some clinics have either been instructed (by institutional guidelines) or opted to convert to telehealth rather than, or in addition to, in-person visits, it is critical that research examine the equivalence of teletesting within this group. Thus, the goal of the present study was to expand upon the nascent pediatric teletesting literature by examining the equivalence of subtests of cognitive (WISC-V) and academic [Kaufman Test of Educational Achievement – Third Edition (KTEA-3); Kaufman & Kaufman, 2014] batteries administered via teletesting versus face to face within a clinically referred sample.
METHODS
Study Design, Inclusion Criteria, and Population
Participants in this retrospective cross-sectional study were referred for psychological/neuropsychological assessment at an urban outpatient testing service of a pediatric hospital in the Mid-Atlantic region of the US. To be included in this study, the participant: (a) must have been between 4 and 18 years of age; and (b) have received a psychological/neuropsychological assessment using any of the measures of interest between November 2019 and March 2020 (for in-person assessment) or April 2020 and August 2020 (for telehealth-based assessment). Evaluations were conducted by clinical psychologists and clinical neuropsychologists, utilizing the Q-Global platform for remote subtest administration. Data from clinical evaluations are routinely entered into the electronic medical record and de-identified records can be retrieved for analysis following appropriate approvals. The hospital’s Institutional Review Board approved this retrospective review.
Participants were included if they received at least one subtest on either measure; thus, some received the KTEA-3 but not the WISC-V, or vice versa. A total of 893 youth were included (mean age = 10.1 years, SD = 2.9 years); 61% were male and 35% of families were receiving Medical Assistance/public insurance. ADHD (61%) and anxiety/depression (22%) were the most common billing diagnoses. Slightly more than half were White (54%), with the remainder identifying as Black/African-American (31%) or “Other” races (15%). Within the “Other” racial group (n = 129), 41% were listed as “Other” race in the electronic health records system, 29% were Asian, 27% were Multiracial, and the remainder were Hispanic (7%), Native American (2%), and Asian Indian (2%); race was missing for 3% of the sample. See Table 1 for details.
Note (Table 1). * p < .05; † first and second billing diagnoses were included in the coding, thus patients could have multiple diagnoses billed.
Dependent Variables
Academic achievement
Educational screening was conducted using the KTEA-3 (Kaufman & Kaufman, 2014). The KTEA-3 is a psychometrically sound academic assessment designed for individuals aged 4–26, or grades prekindergarten through 12. The Letter and Word Recognition and Math Concepts and Applications subtests were chosen because they provide a brief screening of core academic skills. Most importantly for this study, these subtests are amenable to telehealth and had been available online for remote administration since the beginning of the COVID-19 stay-at-home order in our state. Standard scores for Letter and Word Recognition and Math Concepts and Applications were used in the analyses.
Intelligence
Core reasoning and brief attention were measured using the WISC-V (Wechsler, 2014). The WISC-V is a well-validated, psychometrically sound cognitive assessment for use in children aged 6–16. Because teletesting utilized the subtests most amenable to remote administration (i.e., those not requiring physical manipulation or written responses), the subtests examined were Similarities, Matrix Reasoning, Digit Span, Vocabulary, and Visual Puzzles.
Independent and Control Variables
Demographics
Date of appointment, mode of assessment (in-person vs. telehealth), age (in years), sex (male and female), race (White, Black/African American, Other), insurance type (private and public), and billing diagnosis were captured from the electronic health records system. Billing diagnoses, based on International Classification of Diseases codes, 10th edition, were classified as anxiety/depression (F41, F32, F33, F34, F39), adjustment disorders (F43), attention-based disorders (e.g., F90, R41), epilepsy (G40), oncology (C and D), encephalopathy (G93, G94, G95, G96), genetic conditions (Q), and other (less commonly billed medical and mental health diagnoses). Diagnoses were coded if they were billed as either primary or secondary. Parental education was also captured from the online pre-visit questionnaire (see below).
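As an illustrative sketch only (the mapping and field names below are hypothetical, not the actual extraction code), the prefix-based classification of billing codes described above can be expressed as:

```python
# Hypothetical sketch of the ICD-10 billing-code classification described
# above; category labels and code prefixes follow the text.
from typing import Optional

DIAGNOSIS_PREFIXES = {
    "anxiety/depression": ("F41", "F32", "F33", "F34", "F39"),
    "adjustment": ("F43",),
    "attention": ("F90", "R41"),
    "epilepsy": ("G40",),
    "encephalopathy": ("G93", "G94", "G95", "G96"),
    "oncology": ("C", "D"),
    "genetic": ("Q",),
}

def classify_icd10(code: str) -> str:
    """Map an ICD-10 billing code to a diagnostic category by prefix."""
    code = code.strip().upper()
    for category, prefixes in DIAGNOSIS_PREFIXES.items():
        if code.startswith(prefixes):
            return category
    return "other"  # less commonly billed medical/mental health diagnoses

def patient_categories(primary: str, secondary: Optional[str] = None) -> set:
    """A diagnosis counts if billed as primary or secondary, so a single
    patient can carry multiple categories."""
    codes = [primary] + ([secondary] if secondary else [])
    return {classify_icd10(c) for c in codes}
```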
Online pre-visit parent ratings
Parents of children scheduled for psychological or neuropsychological assessment were sent a letter providing information about their upcoming appointment. The letter included a weblink to an online pre-visit custom developmental history questionnaire hosted via a secure third-party data collection platform. The questionnaire included a series of embedded parent-reported rating scales (described below).
Most parent ratings (79%) were completed prior to the assessment (the median interval between questionnaire completion and assessment was 66 days), although parents were given the option to complete the questionnaire on the day of the evaluation, as needed. Parent ratings were more often available for children who completed in-person assessment (82%) compared to telehealth (71%).
In total, eight parent-rated measures were captured from the online pre-visit questionnaire. As with the demographic variables, scores on these measures were employed as control variables to account for any potential differences between children receiving in-person versus telehealth assessments. Internalizing problems were assessed via the Generalized Anxiety (6 items) and Major Depression (10 items) subscales from the Revised Children’s Anxiety and Depression Scale – Parent Version (RCADS; Ebesutani et al., 2010). Externalizing problems were evaluated using a subset of eight items tapping Oppositional Defiance and Conduct Disorder (VAN-Conduct) from the Vanderbilt ADHD Diagnostic Parent Rating Scale (Wolraich et al., 2003). The Colorado Learning Difficulties Questionnaire (CLDQ; Willcutt et al., 2011) was employed to identify potential academic problems in the areas of math (CLDQ – Math; five items) and reading (CLDQ – Reading; six items). The Attention Deficit Hyperactivity Disorder (ADHD) Rating Scale-5 (DuPaul et al., 2016), Home Version, was employed to evaluate ADHD symptoms based upon criteria from the Diagnostic and Statistical Manual of Mental Disorders – Fifth Edition (DSM-5; American Psychiatric Association, 2013). The ADHD Rating Scale-5 includes 18 items assessing the hyperactivity (ADHD-HY) and inattention (ADHD-IN) symptom criteria. The Impairment Rating Scale (IRS; Fabiano et al., 2006) was used to measure impairment across the social/peer relationship, caregiver relationship, academic progress, home life, and self-esteem domains of functioning. The 14-item Sluggish Cognitive Tempo (SCT; Penny, Waschbusch, Klein, Corkum, & Eskes, 2009) scale was used as a measure of cognitive processing speed. All parent-reported measures discussed above have demonstrated strong psychometric properties.
STATISTICAL ANALYSIS PLAN
The goal of this retrospective, cross-sectional study was to evaluate differences between in-person versus virtual administration of select cognitive and academic tests. The primary methodologic concern with this design, and thus this study, is confounding. Confounding is an important concern here because the validity of the comparison rests on the assumption that, after adjusting for numerous sociodemographic and clinical variables, the samples of children who received in-person versus virtual tests were exchangeable (i.e., similar on all observed and unobserved variables).
The first step in the analysis was to examine any differences between the teletesting and in-person groups on demographic and clinical characteristics, as well as subtests administered, using descriptive statistics and bivariate (t-test, χ2) analyses. The next step was to address missingness. For demographics and billing diagnoses, there was little to no missing data (<2%). However, there was substantial missingness for the parent-reported ratings (see Table 1). To address this, parent-reported mean raw summary scores were imputed using multiple linear regression methods; the imputation model included patient demographics (age, sex, insurance type) and billing diagnoses (see Table 1). This allowed for full sample inclusion, which has been shown to be less biased than complete case analysis (Wang & Rao, 2001). The imputed data were employed only in the regression analyses.
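A minimal sketch of this regression-based imputation, assuming a pandas DataFrame with one row per patient (the column names are illustrative assumptions, not the authors’ variables):

```python
# Hypothetical sketch of the imputation step described above: each
# parent-reported raw summary score is regressed on demographics and
# billing diagnoses, and missing values are replaced with predictions.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def impute_by_regression(df: pd.DataFrame, target: str,
                         predictors: list) -> pd.Series:
    """Fill missing values of `target` with linear-regression predictions."""
    X = sm.add_constant(df[predictors].astype(float))
    observed = df[target].notna()
    model = sm.OLS(df.loc[observed, target], X.loc[observed]).fit()
    filled = df[target].copy()
    filled.loc[~observed] = np.asarray(model.predict(X.loc[~observed]))
    return filled

# Example usage (column names are assumptions):
# df["sct_raw"] = impute_by_regression(
#     df, "sct_raw", ["age", "male", "public_insurance", "dx_adhd"])
```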
The third and final step was to examine the association between assessment type and scores on academic and cognitive subtests after adjusting for all demographic, parent-reported, and diagnostic variables. We took a test-wise approach, such that each analysis was conducted by subtest rather than by participant. This preserved as much data as possible, rather than requiring each participant to have completed all subtests of interest; for instance, only 15% of the sample received all seven subtests (see Table 1).
To identify differences in subtest scores between in-person versus teletesting methods, a doubly robust, inverse probability of exposure weighted (IPEW) linear regression model was employed. This model has two parts. The first is the IPEW itself, based on a propensity score: a single numerical summary representing the probability of exposure (here, assessment type) conditional on a set of baseline covariates (i.e., demographic and clinical differences). Weighting by the inverse probability of exposure, in effect, creates a synthetic sample in which assessment type is independent of, and thus balanced across, covariates (Joffe, Ten Have, Feldman, & Kimmel, 2004). The second part, which gives the model the term “doubly robust,” is that the regression also includes all the variables as covariates, along with the IPEW, so that any residual differences not captured by the weights are addressed. Robust standard errors were employed in all models to address any unobserved clustering or misspecification. When a significant difference was found (i.e., p < .05), effect sizes were calculated using Cohen’s d (Lakens, 2013).
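The analyses themselves were run in Stata (see below); purely for illustration, a minimal Python/statsmodels sketch of the two-part model, with hypothetical variable names, might look like:

```python
# Hypothetical sketch of the doubly robust IPEW regression described above.
# Part 1: propensity score = P(teletesting | covariates), via logistic
#         regression, converted to inverse-probability-of-exposure weights.
# Part 2: weighted outcome regression that also adjusts for the covariates,
#         with robust (sandwich) standard errors.
import numpy as np
import pandas as pd
import statsmodels.api as sm

def doubly_robust_ipew(df: pd.DataFrame, outcome: str, exposure: str,
                       covariates: list):
    # Test-wise approach: keep only children who received this subtest.
    d = df.dropna(subset=[outcome]).copy()

    # Part 1: propensity model and inverse-probability weights
    X_ps = sm.add_constant(d[covariates].astype(float))
    ps = sm.Logit(d[exposure], X_ps).fit(disp=0).predict(X_ps)
    weights = np.where(d[exposure] == 1, 1.0 / ps, 1.0 / (1.0 - ps))

    # Part 2: weighted regression including exposure AND all covariates
    X_out = sm.add_constant(d[[exposure] + covariates].astype(float))
    return sm.WLS(d[outcome], X_out, weights=weights).fit(cov_type="HC1")

# result.params[exposure] is the adjusted modality effect (beta);
# result.conf_int() gives the corresponding 95% CIs.
```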
A major benefit of the IPEW model, compared to least squares regression, is that this approach is not subject to multicollinearity. As such, a total of 21 variables were included in the model (i.e., demographics, billing diagnoses, parent-reported symptoms, and an indicator of missingness of the parent-report forms). Using all available information is the recommended approach in IPEW, since it maximizes exchangeability between groups (Austin & Stuart, 2015). The final step in the IPEW analysis is to ensure that the weighting procedure was effective in addressing differences between the groups. This is achieved by: (1) a chi-square test of any remaining differences between groups in the final model and (2) ensuring the standardized differences in means (interpreted the same as effect sizes) are relatively small (e.g., <.1, or a 10% difference; Austin & Stuart, 2015) after the weighting procedure is applied. All analyses were performed in Stata 15.0 (StataCorp, College Station, TX). The teffects package in Stata was employed to calculate the IPEW. Alpha was set at p < .05 for determining statistical significance.
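Purely for illustration, the post-weighting balance check can be sketched as follows (same hypothetical setup as above; the .1 flagging threshold follows Austin & Stuart, 2015):

```python
# Hypothetical sketch of the covariate-balance check described above: after
# weighting, the standardized difference in weighted covariate means between
# groups should be small (e.g., |d| < .1).
import numpy as np

def weighted_std_diff(x: np.ndarray, exposed: np.ndarray,
                      w: np.ndarray) -> float:
    """Standardized difference in weighted means between exposure groups."""
    def moments(vals, wts):
        m = np.average(vals, weights=wts)
        v = np.average((vals - m) ** 2, weights=wts)
        return m, v

    m1, v1 = moments(x[exposed == 1], w[exposed == 1])
    m0, v0 = moments(x[exposed == 0], w[exposed == 0])
    return (m1 - m0) / np.sqrt((v1 + v0) / 2.0)

# Flag covariates above the .1 threshold (names are assumptions):
# imbalanced = [c for c in covariates if abs(
#     weighted_std_diff(d[c].values, d["teletest"].values, weights)) > 0.1]
```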
RESULTS
Demographic and diagnostic differences between groups
There were few differences between the in-person versus teletesting groups. No demographic differences were found in age, sex, race, or insurance type (all p > .05; see Table 1). However, there were differences in billing diagnoses, such that those in the teletesting group had lower proportions of ADHD (χ2 = 6.45, p = .01) and anxiety/depression (χ2 = 25.10, p < .001) and greater proportions of encephalopathy (χ2 = 5.20, p = .02) and epilepsy (χ2 = 4.40, p = .04).
Completed subtests and parent-report ratings between groups
The in-person group was less likely to receive the KTEA-3 Letter and Word Recognition (χ2 = 6.49, p = .01) or Math Concepts and Applications (χ2 = 13.47, p < .001) subtests. Those in the in-person group, however, were more likely than the telehealth group to receive each of the five WISC-V subtests (Similarities, χ2 = 139.19, p < .001; Matrix Reasoning, χ2 = 118.31, p < .001; Digit Span, χ2 = 21.75, p < .001; Vocabulary, χ2 = 120.99, p < .001; Visual Puzzles, χ2 = 131.15, p < .001). Overall, a greater number of KTEA-3/WISC-V subtests were completed in the in-person versus telehealth group (χ2 = 148.35, p < .001; see Table 1 for details). Finally, there was a greater proportion of completed parent-report ratings among those in the in-person versus teletesting groups (χ2 = 12.40, p < .001), likely related to suspension of the requirement that ratings be completed prior to scheduling during the telehealth period.
Unadjusted subtest and parent-reported symptom differences between groups
Tables 1 and 2 display the unadjusted differences in KTEA-3 and WISC-V scores, as well as symptom ratings, between groups. For the KTEA-3, no differences were found in Letter and Word Recognition scores; however, Math Concepts and Applications scores were slightly higher in the teletesting group. The only difference in WISC-V scores was a slightly higher Visual Puzzles score in the teletesting group (see Table 2). For parent-reported symptoms, those in the teletesting group had lower SCT scores compared to the in-person group (t = 2.57, p = .01); no other differences were found.
Note (Table 2). Positive mean differences reflect greater teleassessment scores compared to in-person assessment.
Adjusted subtest differences between groups
After employing the doubly robust IPEW regression model, no differences were found in KTEA-3 Letter and Word Recognition scores (β = 1.12, 95% CI: −1.14, 3.37, p = .33); however, there remained a small difference in Math Concepts and Applications scores (β = 2.95, 95% CI: .24, 5.67, p = .03). For cognitive scores, no differences were found for the following WISC-V subtests: Similarities (β = .18, 95% CI: −.33, .69, p = .47), Matrix Reasoning (β = −.24, 95% CI: −.83, .35, p = .42), Digit Span (β = .42, 95% CI: −.20, 1.05, p = .18), and Vocabulary (β = .44, 95% CI: −.11, 1.00, p = .11). However, Visual Puzzles scores were slightly higher for the teletesting group (β = .96, 95% CI: .29, 1.63, p = .005). Effect sizes for both significant findings were small (WISC-V Visual Puzzles, d = .33; KTEA-3 Math Concepts and Applications, d = .18).
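For reference, Cohen’s d here is the group mean difference scaled by the pooled standard deviation (Lakens, 2013); a minimal sketch of that conversion (treating the inputs as observed group summaries, an assumption about the authors’ exact calculation):

```python
# Hypothetical sketch of the Cohen's d calculation: mean difference divided
# by the pooled standard deviation of the two groups (Lakens, 2013).
import numpy as np

def cohens_d(mean_tele: float, sd_tele: float, n_tele: int,
             mean_inperson: float, sd_inperson: float,
             n_inperson: int) -> float:
    pooled_var = (((n_tele - 1) * sd_tele ** 2 +
                   (n_inperson - 1) * sd_inperson ** 2) /
                  (n_tele + n_inperson - 2))
    return (mean_tele - mean_inperson) / np.sqrt(pooled_var)
```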
After the weighting procedure, the chi-square test for covariate balance was nonsignificant for all tests (all p > .50), demonstrating no statistically significant residual differences between the groups after the weights were applied. However, a few imbalances (>.10, or a 10% difference; Austin & Stuart, 2015) remained, albeit nonsignificant, for parent-reported symptoms. No covariate imbalances were observed for the KTEA-3 subtests. For the WISC-V, a few imbalances remained for the Similarities (Depression, Math, and SCT), Matrix Reasoning (Depression), Digit Span (Depression and SCT), Vocabulary (Depression and Math), and Visual Puzzles (Depression, Math, and Reading) subtests. A total of 21 variables were included as covariates across 7 tests (147 total adjustments). Given that only 7% were above the threshold for imbalance, and the chi-square test was highly nonsignificant for each test, these findings suggest that overall covariate balance was well achieved.
DISCUSSION
This study examined the equivalence of in-person versus teletesting within a referred pediatric sample. This question is of critical importance given the current COVID-19 pandemic and related changes in assessment methodology across both healthcare and school settings. The multivariate analyses found equivalency of performance on four of the five WISC-V subtests and one of the two KTEA-3 subtests across administration methods. For the two subtests that differed statistically, effect sizes were small in magnitude, and in both instances the teletesting group scored slightly higher than the in-person group. Given that there was less than a 1-point difference in Visual Puzzles scaled scores and less than a 3-point difference in KTEA-3 Math Concepts and Applications standard scores, these differences are not clinically meaningful. These findings, along with the equivalency of the five additional subtests, provide support for the use of these subtests via teletesting. Within the present study, the adoption of teletesting was left to clinician discretion; as such, some diagnostic group differences resulted from greater uptake of telehealth by neuropsychologists in certain clinics.
Although test publishers have released digital assessment tools that are well suited for use in teletesting, these have been critiqued as inherently different from remote administration (e.g., an iPad display replicates a testing booklet by lying flat on the testing table, whereas remote assessment often involves upright screens; Farmer et al., 2020a). Despite these concerns, our findings suggest that standardized materials are robust to these variations and that teletesting does not substantively impact scores in either direction.
These findings add to the nascent literature on the validity of remote administration in children and adolescents. Specifically, although prior studies have demonstrated the feasibility of teletesting, this study expands the literature by demonstrating the equivalence of teletesting relative to in-person assessment. We also replicate prior work investigating the WISC-V and extend prior work on academic measures by examining selected subtests of the KTEA-3. In addition, whereas prior work has included the use of a “proctor” or “assistant” for administration of the WISC-V on the participant’s end (Wright, 2016), this study documents that children and adolescents are generally capable of navigating the testing environment without that level of assistance. This greatly reduces the risk of viral exposure and also provides support for improving broader access to services. Further, this study demonstrates equivalency within a diverse sample that is representative of our city (Baltimore, Maryland) as well as the US overall (United States Census Bureau, 2019).
With due consideration to the technical, ethical, and legal factors related to service delivery (Farmer et al., 2020a, 2020b; Hewitt et al., 2020), providers now have additional evidence for the comparability of telehealth and traditional in-person assessment methods. With accumulating evidence for comparability, psychologists can have confidence in the validity of these measures and can now turn their focus to considering whether telehealth methods are appropriate for specific patients or students, based upon factors unique to each referral. Further, several recent papers have detailed models for clinical decision-making related to teletesting, such as tiered triage (Koterba et al., 2020; Peterson, Ludwig, & Jashar, 2020; Pritchard et al., 2020).
Beyond the unique circumstances that the COVID-19 pandemic has imposed on families, educators, schools, psychologists, and test publishers alike, teletesting has critical implications for reducing longstanding barriers related to distance/transportation and subsequent disparities in care. Evidence of equivalency provides a strong foundation from which providers can actively and confidently serve students and patients from underserved populations moving forward.
LIMITATIONS AND STRENGTHS
Several methodological strengths and limitations should be considered when interpreting the findings. This cross-sectional study employed a sequential cohort design in which the in-person and teletesting groups comprised different samples. One benefit of this design is that it avoids the practice effects and fatigue associated with repeat administration. However, it raises concerns about confounding between groups. The analysis addressed this issue in two ways: first, through robust measurement of demographic, clinical, and parent-reported characteristics of the child; and second, through modern statistical methods that account for these differences, including testing for any residual differences after adjustment.
Unfortunately, the sample size could not support the assessment of subgroup differences due to lack of power. This was particularly true for race, as the sample underrepresented minorities other than Black/African Americans; this will be an important area of future research. Cognitive assessments were obtained only for children 6 years or older (i.e., the WISC-V age range), so teleassessment of younger children should be explored. Another limitation is that additional factors not available for analysis, such as technological factors (e.g., Internet speed, screen size, setting/environment) and clinical factors (e.g., referral reasons may impact whether testing was completed and which subtests were administered), warrant further investigation.
Teletesting may be limited by economic barriers, and findings should be interpreted with attention to technological access and literacy. Our department’s care coordination center assessed interest/comfort in telehealth visits, as well as access to appropriate technology, at the start of the pandemic. These data are described in Pritchard et al. (2020): 94% of respondents were interested in telehealth appointments, whereas 74% had access to the needed technology. National rates of computer and Internet access exceed those observed in our sample (United States Census Bureau, 2019). For those without adequate technology, hotspots or tablets were provided as part of a grant. Our department saw an increase in the proportion of visits for those with medical assistance following the transition to telehealth.
Furthermore, not all subtests were administered via both assessment modalities, and thus current comparisons are limited to the subtests amenable to telehealth administration (i.e., those that do not require manipulatives/motor responses). For a review of novel telehealth triage models with considerations for in-person versus teletesting, see Koterba et al. (2020) and Peterson et al. (2020). In addition, results do not include composite and Full-Scale IQ scores (although the selected subtests do allow for calculation of several composite scores). Pearson offers guidance on calculating non-motor Full-Scale IQ and General Ability Index (GAI) scores, as well as non-motor processing speed and visual-spatial indexes, through Essentials of WISC-V Integrated Assessment (Raiford, 2017). Given the demonstrated equivalency, however, it is unlikely that differences would emerge when composites are computed. Future studies should explicitly explore motor-dependent subtests to inform teletesting practices. Of note, the field is beginning to adopt technology that may make motor tasks amenable to telehealth (e.g., Coding and Symbol Search administered through Q-Interactive).
Finally, this study employed a single-site sample, albeit with missing data, and thus findings may not be entirely generalizable across the US. Nevertheless, the sample was large, clinically and demographically heterogeneous, and assessed in a real-world setting.
CONCLUSION
There is limited evidence for the validity of cognitive and academic teletesting with children and adolescents. The present study fills this gap by offering timely results demonstrating equivalence between tele- and in-person assessment across select WISC-V and KTEA-3 subtests in a large, heterogeneous sample of referred children, using robust measurement and analytic procedures. The findings hold important implications for reducing disparities through expanding teleassessment in the era of COVID-19 and beyond.
ACKNOWLEDGMENTS
None.
FINANCIAL SUPPORT
None.
CONFLICT OF INTEREST
The authors have nothing to disclose.
ETHICAL STANDARDS
Data were collected as part of routine clinical care. The Johns Hopkins Medicine Institutional Review Board granted approval to extract the data from the electronic health record and to create a separate de-identified research database for this study.