The Tower of London (TOL) test (Shallice, 1982) is a neuropsychological instrument that evaluates difficulties with planning and non-verbal problem solving, which are associated with frontal lobe dysfunction, especially in the right dorsolateral prefrontal cortex, cingulate cortex, and basal ganglia (Tirapu, García, Luna, & Periañez, 2012). The planning sequence begins with proposing an objective, continues with mentally rehearsing and applying one’s chosen strategy, and ends with appraising whether or not the objective was achieved (Tirapu-Ustárroz, Muñoz-Céspedes, Pelegrín-Valero, & Albéniz-Ferreras, 2005).
Shallice (1982) created the original TOL. Most versions utilize two peg boards, each with three vertical pegs holding several colored beads. However, some test variants employ other physical objects, like the Monsters and Globes task (Kotovsky, Hayes, & Simon, 1985), and there are computerized versions like the Cambridge Neuropsychological Test Automated Battery (CANTAB) (Owen, Downes, Sahakian, Polkey, & Robbins, 1990).
The latest version of the TOL was created at Drexel University; as such, it is called the Tower of London-Drexel University 2nd Edition (TOLDXtm) (Culbertson & Zillmer, 2001). It includes a children’s version (7 to 15 years old) and an adult version (16 and up). Starting from the TOLDXtm test, the authors proposed a modification for people with intellectual disability (ID) (Culbertson & Zillmer, 2001) (Figure 2). The present study’s objective was to gather evidence for the validity of that version.
Due to the TOL’s sensitivity to executive function deficits, numerous studies have deployed it to study different populations: people with dementia (Paula, Neves, Levy, Nassif, & Malloy-Diniz, 2012), traumatic brain injury (Cockburn, 1995), schizophrenia (Zhu et al., 2010), or addictive behaviors (Davydov & Polunina, 2004), and children with focal frontal brain lesions (Jacobs & Anderson, 2002).
The TOLDXtm has various adaptations: for Alzheimer’s type dementia (Rainville et al., 2002) and Parkinson’s disease (Culbertson, Moberg, Duda, Stern, & Weintraub, 2004). The Neuronorma project has provided normative data in Spanish adult (Peña-Casanova et al., 2009) and young adult (Rognoni et al., 2013) populations.
As for populations with ID, the TOL has been the instrument of choice to study the relation between motor performance and executive functioning (Hartman, Wouwen, Scherder, & Visscher, 2010), as part of cognitive testing of children and adolescents with autism spectrum disorders (Robinson, Goddard, Dritschel, Wisley, & Howlin, 2009) and of adults with fragile X syndrome (Moore et al., 2004), Prader-Willi syndrome (Walley & Donaldson, 2005), and Down syndrome (Ball, Holland, Treppner, Watson, & Huppert, 2008).
Though it is considered a standard in evaluating processes like planning, the TOL has certain limitations. A ceiling effect has been observed in young subjects, depending on the scoring method used (Berg & Byrd, 2002). Also, its task complexity makes it unsuitable for populations with cognitive deficit or ID unless it is adapted. Furthermore, the process of administering it needs to be standardized, because variations in the instructions provided, assistance, and learning processes all influence test outcomes (Unterrainer, Rahm, Leonhart, Ruff, & Halsband, 2003). Simpler tasks and more flexible rules in TOL-type tests would increase efficacy in the diagnostic process and in the subsequent implementation of cognitive intervention programs (Ball et al., 2008).
Instruments administered as part of any neuropsychological assessment should draw on normative data from the reference population, and people with ID are no exception. They have specific cognitive needs, and require tests in keeping with their intellectual potential. Though there is a growing variety of neuropsychological tests for the adult population with ID, instruments still need to be adapted for cognitive tests to meet reliability and validity standards. Normative data in the Spanish adult population with ID have not yet been published for any version of the TOL. With that in mind, the present study’s objective was to examine reliability and validity evidence for the TOLDXtm in the adult population with ID.
The study of the psychometric properties of the TOLDXtm using theoretical neuropsychological models (Tirapu et al., 2012), its functioning in different populations, and its precision in the cognitive process it measures should reveal a robust structure and strong relation with other neuropsychological tests, especially tests that measure executive functions related to or involved in planning. Yet not all tests evaluate executive functions via the subject. Some measure executive functions based on parent/teacher reporting, and it would stand to reason that data collected through self-report, from the patients themselves, would differ in certain ways. For that reason, based on the tests employed in this study, we expect to find various factors, made up of variables from executive function measures taken from relatives, neuropsychological variables from strict executive functioning measures administered to patients, and a factor made up of the TOL’s weakest predictor variables. Even though we expect to find results partly consistent with the above in light of the literature (Culbertson & Zillmer, 2001), we cannot overlook the type of population utilized, with Down syndrome, so we also anticipate differences from findings in other populations.
Method
Participants
The sample was comprised of 63 adult participants (≥ 39 years; 33 men and 30 women) with Down syndrome (DS) resulting from trisomy 21 and confirmed by karyotype, and mild (IDMi) and moderate (IDMo) levels of ID according to DSM-5 criteria (IDMi n = 39, IDMo n = 24). Participants were selected from hospital units: the Adult Down Syndrome Unit (ADSU) in Internal Medicine at La Princesa University Hospital (Madrid); and the Specialized Mental Health Unit for Adults with ID (SMHU-ID) at Martí i Julìa Park Hospital (Girona). The units provided lists of possible candidates from among their patients, then participants were selected through simple random sampling. The research protocol was approved by the Research Ethics Committee at each institution. All participants (subjects and their guardians) signed the appropriate informed assent forms (subjects) or guardian consent forms.
All subjects had to meet the following inclusion criteria, and none of the exclusion criteria.
Inclusion criteria
People with DS confirmed by karyotype (including mosaicisms and translocations); aged ≥ 39 years with IDMi and/or IDMo according to DSM-5 criteria; with no symptomatology of minor neurocognitive disorder (DSM-5) or Alzheimer’s disease according to the dementia criteria established in CAMDEX-DS (which includes diagnostic criteria from the DSM-IV and ICD-10); receiving no prescription drug treatment that could interfere with the objectivity of data collected in clinical assessments.
Exclusion criteria
Patients with clinical hypo/hyperthyroidism or uncontrolled B9 or B12 hypovitaminosis; altered consciousness (delirium); severe uncorrected sensory alteration (auditory/visual) that would impede proper test completion; minor or major neurocognitive disorder; and not providing informed consent/assent in writing.
Materials
Kaufman Brief Intelligence Test, Second Edition (KBIT-2; Kaufman & Kaufman, 2004)
This instrument yields an intellectual quotient. The test is widely utilized for research in people with ID because it gives a standard base score of 40. Reliability coefficients are α = .83 for Matrices and α = .87 for Verbal Knowledge. The KBIT-2 has not been translated into Spanish. This study only utilized the Matrices part, which has no verbal component; as its authors stipulate in the manual, it can be used without the Verbal Knowledge component to obtain an IQ.
Adaptive Behavior Scale-Residential and Community, Second Edition (ABS-RC:2; Nihira, Leland, & Lambert, 1993)
This scale evaluates adaptive behavior. Internal consistency is α = .91 for part one (adaptive behavior) and α = .80 for part two (‘problem’ behaviors). This study utilized the Spanish-language version by Medina-Gómez and García-Alonso (2010).
Cambridge Examination for Mental Disorders of Older People with Down’s Syndrome and Others with Intellectual Disabilities (CAMDEX-DS; Esteba-Castillo, Novell, Vilá, & Ribas, 2014)
This diagnostic instrument detects dementia in people with DS and other causes of ID. Its internal consistency is α = .93.
Barcelona Test-Intellectual Disability (BT-ID; Esteba-Castillo, Caixas, Deus, & Peña-Casanova, 2015)
This neuropsychological test battery was adapted and validated in Spanish for adults with ID. Of its constituent subtests, this study utilized the following: Inverse Digits (working memory), Planning and Organization, Resisting Interference, Verbal Execution, and Semantic Fluency with Animals. Internal consistency by area (α orientation = .87; α attention = .85; α working memory = .91; α language = .96; α praxis = .87; α memory = .74; α executive functions = .85; α visual-construction = .83).
Weigl Color Form Sorting Test (WCFST; Goldstein & Scheerer, 1941)
This test assesses the ability to categorize across two dimensions: it means ignoring a dominant dimension (color) to categorize based on a second, less dominant dimension (shape). It allows experimenters to offer external clues to see if performance improves. The WCFST has demonstrated sensitivity in detecting brain damage, and it correlates strongly with other tests of frontal performance and cognitive deficit (MMSE r = .67, p < .0001; total CAMCOG-R r = .72, p < .0001; CAMCOG-R executive functions subtests r = .65, p < .0001), showing it can help diagnose cases of cognitive alteration (Hobson, Meara, & Taylor, 2007). The WCFST does not have a verbal component, so the original version was utilized, applying Strauss and Lewin’s (1982) criteria for test administration and correction.
Behavior Rating Inventory of Executive Function (BRIEF; Gioia, Isquith, Guy, & Kenworthy, 2000)
In this case, we used the parent form, the BRIEF-Parents (BRIEF-P), specifically a Spanish translation edited by the original authors (Gioia, Isquith, Guy, & Kenworthy, 2016). This interview with parents evaluates eight domains of executive functions: Inhibit, Shift, Emotional Control, Initiate, Working Memory, Plan/Organize, Organization of Materials, and Monitor. Internal consistency is α = .80 for the teacher form and α = .98 for the parent form; test-retest reliability is r = .88 for the teacher form and r = .82 for the parent form.
Tower of London-Drexel University: 2nd Edition (TOLDXtm; Culbertson & Zillmer, 2001)
We utilized the TOLDXtm adaptation for people with ID. It consists of two wooden peg boards (one for the patient, one for the examiner) with three vertical pegs of differing length, and three beads (one red, one green, and one blue). The maximum number of moves per item or problem is 20, and the maximum time is 120 seconds. The main difference between the two general-population versions of the TOLDXtm (for children and for adults) and the version of the TOLDXtm for ID is the minimum number of moves per problem, and hence problem complexity. The minimum number of moves ranges from 3 to 7 on the children’s version (7–15 years old), and from 4 to 7 on the adult version (≥ 16 years old). In the version for people with ID, meanwhile, the minimum number of moves ranges from 3 to 4. That noticeably simplifies task execution. It is not until the seventh item that examinees first encounter a problem with a minimum of 4 moves, so the first six problems provide training. Test-retest reliability coefficients: total moves (r = .81, p < .005); total time violations (r = .79, p < .005); total rule violations (r = .42, p < .05). Sensitivity and specificity indices were .76 and .81, respectively.
This test collects the following measures and scales, as a function of moves, time, and violations of the instructions: a) correct (Corr): number of problems solved with the minimum number of moves specified; b) total moves (Mov): the sum of excess moves used on each item. A move was understood as taking a bead off a peg and placing it on another, either at the bottom, or on top of another bead; c) initiation time (IT): the time elapsed between the signal to start executing, and making the first move; d) total time (TT): the time elapsed between initiation and stopping the stopwatch; e) execution time (ET): difference between total time and initiation time; f) violation type I (VTI): number of times the maximum beads per peg was exceeded; g) violation type II (VTII): number of times more than one bead was moved in a single turn; h) premise violations (PV): number of times the examinee had to be reminded of the premise.
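As an illustration of how the measures above combine into total scores, the following is a minimal Python sketch, not the scoring procedure from the test manual. The record layout and the names ProblemRecord and score_protocol are hypothetical. It assumes that total time runs from the start signal (so that ET = TT − IT, as defined above) and that excess moves are counted as moves made minus the minimum, floored at zero; how the manual scores unsolved problems is not stated here and is left out.

```python
from dataclasses import dataclass


@dataclass
class ProblemRecord:
    """One administered problem (hypothetical layout, for illustration only)."""
    min_moves: int              # minimum moves needed to solve the problem
    moves_made: int             # moves the examinee actually made
    solved: bool                # whether the goal configuration was reached
    initiation_time: float      # IT, seconds from start signal to first move
    total_time: float           # TT, seconds from start signal to stopwatch stop
    type1_violations: int = 0   # VTI
    type2_violations: int = 0   # VTII
    premise_reminders: int = 0  # PV


def score_protocol(records):
    """Sum the per-problem measures into total scale scores."""
    totals = {"Corr": 0, "Mov": 0, "IT": 0.0, "TT": 0.0, "ET": 0.0,
              "VTI": 0, "VTII": 0, "PV": 0}
    for r in records:
        # Corr: problem solved in exactly the minimum number of moves.
        if r.solved and r.moves_made == r.min_moves:
            totals["Corr"] += 1
        # Mov: excess moves over the minimum (assumed never negative).
        totals["Mov"] += max(r.moves_made - r.min_moves, 0)
        totals["IT"] += r.initiation_time
        totals["TT"] += r.total_time
        # ET: difference between total time and initiation time.
        totals["ET"] += r.total_time - r.initiation_time
        totals["VTI"] += r.type1_violations
        totals["VTII"] += r.type2_violations
        totals["PV"] += r.premise_reminders
    return totals


example = [
    ProblemRecord(3, 3, True, 2.0, 10.0),
    ProblemRecord(3, 5, True, 4.0, 20.0, type1_violations=1, premise_reminders=1),
]
print(score_protocol(example))
```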
The original test presents a scoresheet on which to report test performance: total correct, total moves, total initiation time, total execution time, total time, and total type I and type II violations. The test is over when in two consecutive problems, more than 20 moves are made or the maximum time is exceeded.
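The discontinuation rule can be sketched as follows. This is one reading of the rule as worded above (a problem counts against the examinee when more than 20 moves are made or the 120-second limit is exceeded, and testing stops after two such problems in a row); the function names are hypothetical and the manual remains the authority on edge cases.

```python
MAX_MOVES = 20      # maximum moves per problem, as stated in the text
MAX_SECONDS = 120   # maximum time per problem, as stated in the text


def limit_exceeded(moves_made, seconds):
    """True when a problem exceeded the move limit or the time limit."""
    return moves_made > MAX_MOVES or seconds > MAX_SECONDS


def discontinue_after(attempts):
    """Return the 1-based problem number after which testing stops, or None.

    attempts: sequence of (moves_made, seconds) pairs in administration order.
    """
    consecutive = 0
    for number, (moves, seconds) in enumerate(attempts, start=1):
        consecutive = consecutive + 1 if limit_exceeded(moves, seconds) else 0
        if consecutive == 2:
            return number
    return None


# Problems 2 and 3 both exceed a limit, so testing would stop after problem 3.
print(discontinue_after([(5, 30), (21, 50), (4, 130), (3, 10)]))
```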
Relation between the tests utilized
The subtests and tests chosen to gather evidence of validity involve tasks related to the executive functions (Esteba-Castillo & García-Alba, 2015): the WCFST requires set shifting, which is the ability to change strategy; the BRIEF-P gives two dysexecutive indexes, one cognitive (inhibit, shift, emotional control, initiate, and working memory) and one behavioral (plan/organize, organization of materials, and monitor); the subtests (of the executive component) of the BT-ID require one to summon processes such as working memory, planning and organization, resisting interference, and verbal fluency.
Procedure
Subjects who met all the inclusion criteria and none of the exclusion criteria took part in the study. They were administered all the tests described above. First, their ID levels were determined (KBIT-2, ABS-RC:2). Then, we determined if cognitive deficit resulting from dementia was present or absent (CAMDEX-DS and CAMCOG). Last, the neuropsychological tests were administered (BT-ID, BRIEF-P, WCFST, and TOLDXtm). Assessments were conducted by psychologists highly specialized in the neuropsychological assessment of ID, in rooms at the SMHU-ID at Martí i Julìa Park Hospital (Girona) and the Hermanos García Noblejas Medical Specialty Center (La Princesa University Hospital, Madrid). In Madrid as well as Girona, subjects were always evaluated by the same professional.
Data analysis
We computed descriptive statistics for all sociodemographic variables, and for the scales of the TOLDXtm. In addition to the eight scales proposed by Culbertson and Zillmer (2001) and discussed in the instruments section, we added a new variable called Hits. Given its simplicity, we considered it worth including in the study. This variable is the number of problems correctly solved out of the ten on the TOLDXtm, regardless of the number of moves used. This variable was included in our analyses.
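The distinction between Hits and Corr can be made concrete with a short Python sketch. The tuple layout and the function name hits_and_corr are hypothetical; the only assumptions taken from the text are that Hits counts every solved problem and Corr counts only problems solved in the minimum number of moves.

```python
def hits_and_corr(problems):
    """Return (Hits, Corr) for one examinee.

    problems: iterable of (solved, moves_made, min_moves) tuples.
    """
    problems = list(problems)
    # Hits: solved at all, regardless of how many moves were used.
    hits = sum(1 for solved, _, _ in problems if solved)
    # Corr: solved using exactly the minimum number of moves.
    corr = sum(1 for solved, made, minimum in problems
               if solved and made == minimum)
    return hits, corr


# Two problems solved, only one of them in the minimum number of moves.
print(hits_and_corr([(True, 3, 3), (True, 6, 3), (False, 20, 4)]))
```

By construction Corr can never exceed Hits, which is why Hits is the more lenient of the two scores.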
The descriptive data tables provide means, standard deviations, indicators of skewness and kurtosis (calculated as b1 and b2), and corrected item-total correlations on each measure in the 10 problems, for the sample all together as well as separately by group, IDMi and IDMo. We ran the appropriate comparisons between groups; in this case, since scores were not normally distributed, we opted for the Mann-Whitney test.
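For readers unfamiliar with the Mann-Whitney test, the following standard-library Python sketch shows the computation of the U statistic with midranks for ties and a two-sided p value from the tie-corrected normal approximation. It is an illustration only: the analyses in this study were run in R, the function names here are hypothetical, and no continuity correction is applied, so p values may differ slightly from those reported by statistical software, especially in small samples.

```python
import math


def midranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, two-sided p from the normal approximation)."""
    n1, n2 = len(group_a), len(group_b)
    pooled = list(group_a) + list(group_b)
    ranks = midranks(pooled)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    n = n1 + n2
    # Tie correction: sum of t^3 - t over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # every observation identical
        return u1, 1.0
    z = (u1 - n1 * n2 / 2) / math.sqrt(variance)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u1, p


u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u, round(p, 3))
```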
Internal consistency and factor structure
For all TOLDXtm scales, in the total sample as well as the two subgroups, we determined internal consistency using Cronbach’s alpha coefficient. We also compared the equality of alpha coefficients for independent groups (Feldt, Woodruff, & Salih, 1987) using the groups in this study (IDMi and IDMo). The factorial composition of the test’s 10 problems was examined through Exploratory Factor Analysis. For each scale, we calculated the number of optimal factors to extract according to classic parallel and MAP criteria, which are widely used in psychometric test validation. The results include goodness of fit indicators (RMSEA, RMSR, and TLI) obtained by extracting a single factor, for each scale. The Ordinary Least Squares (OLS) method of factor extraction was used throughout, and we worked from Pearson correlation matrices. For the variables Hits and Corr, since they were dichotomous, we worked with tetrachoric correlation matrices.
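Cronbach’s alpha for a scale made up of k problems can be written as α = k/(k − 1) × (1 − Σ item variances / variance of the total score). A minimal standard-library Python sketch of that formula follows; it is illustrative only (the study used the psych and cocron packages in R), the function name is hypothetical, and the toy data are invented to show the two extreme cases.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha from a list of rows (one row of k item scores per examinee)."""
    k = len(item_scores[0])
    # Sample variance of each item (column) across examinees.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of the total score across examinees.
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented toy data: items that rise and fall together give alpha = 1.
consistent = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]
print(round(cronbach_alpha(consistent), 3))

# Invented toy data: two unrelated dichotomous items give alpha = 0.
unrelated = [[1, 1], [0, 0], [1, 0], [0, 1]]
print(round(cronbach_alpha(unrelated), 3))
```

The function needs at least two items and a non-zero variance of the total score; both conditions are assumed rather than checked here.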
Relationship between the TOLDXtm and other neuropsychological tests
We examined relations between the TOLDXtm scales and other cognitive measures through Exploratory Factor Analysis. We worked with the Pearson correlation matrix, calculated using total scores on all the scales and tests in the study. As above, we opted for OLS estimation and Promax rotation. The number of factors to extract was determined by the aforementioned parallel, MAP, and VSS algorithms. The results section presents the pattern matrix, correlations between factors, and the following goodness of fit indexes: RMSEA, RMSR, and TLI.
Normative data table
To provide a possible clinical interpretation of test scores, a frequency table is included, with each of the nine scales of the TOLDXtm represented. Given the sample available, we chose to present quartiles only.
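A quartile table of this kind can be sketched in Python with the standard library. The text does not say which quantile definition was used, so the "inclusive" method below is an assumption, as are the function names and the convention that a score equal to a cutoff falls in the lower band. The higher_is_better flag reflects the fact that on some scales (for example moves, times, and violations) a higher raw score means poorer performance.

```python
from statistics import quantiles


def quartile_table(scores):
    """Minimum, quartile cutoffs, and maximum for one scale."""
    q1, q2, q3 = quantiles(scores, n=4, method="inclusive")
    return {"min": min(scores), "Q1": q1, "Q2": q2, "Q3": q3, "max": max(scores)}


def performance_quartile(score, table, higher_is_better):
    """Return a band from 1 (poorest performance) to 4 (best performance)."""
    cuts = [table["Q1"], table["Q2"], table["Q3"]]
    band = 1 + sum(score > c for c in cuts)  # band by raw score, 1 to 4
    # Reverse the band on scales where a high raw score is a poor result.
    return band if higher_is_better else 5 - band


table = quartile_table([1, 2, 3, 4, 5])  # invented scores, for illustration
print(table)
print(performance_quartile(5, table, higher_is_better=True))
print(performance_quartile(5, table, higher_is_better=False))
```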
All analyses were conducted using R statistical software with the psych and cocron packages.
Results
Difficulty and consistency of scales on the test
Though TOL items were constructed to progressively increase in difficulty, that was not apparent in the results (see Figure 1 and Appendices A through I). Instead we observed a staggered structure in which difficulty decreased from the first item to the second, increased in problems 3 and 4, then dipped again only to increase on item 7, and again on 10. This behavior was more or less generalized to all scales, but was less pronounced on Hits, which showed a more gradual rise in difficulty.
The meaning of these shifts, probably related to the cognitive components of each task, is explored in the Discussion section.
In general, and especially on the violations scales, there were notable discrepancies between indices of skewness and kurtosis. That was due to small sample size, and to the heterogeneity of patients with DS. With those considerations in mind, we elected to use nonparametric tests.
Total scores on the three movement-related scales (Corr, Mov, and Hits) showed significant differences in score between the IDMi and IDMo groups (Table 1), with the IDMo group performing worse across the board: Corr (p < .05), Mov (p < .05), and Hits (p < .01). Its values of internal consistency, gauged by Cronbach’s alpha, were found to be suitable, with the highest value found on the Hits scale (α = .89), followed by Corr (α = .75), and Mov having the lowest of the three (α = .52). Tests of the equality of alpha did not suggest differences between the Mild and Moderate groups on any scale, showing the test to be equally reliable in the two groups: Mov (p = .216), Corr (p = .066), and Hits (p = .159).
Sd: Standard deviation; Sk: Index of skewness; Ku: Index of kurtosis; alpha: Value of Cronbach’s α; [a] Mann-Whitney’s contrast for ID Mild and ID Moderate; [b] Test for equality of alpha coefficients between ID Mild and ID Moderate.
The time scales (initiation, execution, and total) had similar or greater values of internal consistency than previous scales (Table 2): initiation (α = .84), execution (α = .75), and total (α = .77). Meanwhile, though times were longer in the Moderate group than the Mild group, no statistically significant differences were detected: initiation (p = .147), execution (p = .475), and total (p = .581). Initiation time did reveal significant differences between groups in terms of alpha values – Mild (α = .73), Moderate (α = .91), p < .01 – but that is not especially worrisome since their values were still high. On the other two scales, differences in alpha were not found between groups; execution (p = .150), total (p = .200).
The third set of scales, violations, are peculiar in that violations decrease as one advances through problems (as in a Pareto distribution). These generally presented poorer psychometric properties than the scales above (Table 3). Their alpha indexes were lower: VTI (α = .42), VTII (α = .70), and PV (α = .67); and none yielded between-groups differences: VTI (p = .670), VTII (p = .311), and PV (p = .406). Some alpha values could not be calculated for lack of subjects with data.
Sd: Standard deviation; Sk: Index of skewness; Ku: Index of kurtosis; alpha: Value of Cronbach’s α; [a] Mann-Whitney’s contrast for ID Mild and ID Moderate; [b] Test for equality of alpha coefficients between ID Mild and ID Moderate; [c] This value could not be calculated due to missing data.
The test’s factor structure
In terms of factor structure, the TOLDXtm test was built with a single dimension. Confirmatory factor analysis could not be conducted due to particularities of the sample, but Table 4 is included to present parallel and MAP algorithms and descriptive data from an exploratory factor analysis of just one factor. The MAP algorithm indeed proposed a single-factor solution for all scales; but the other algorithm yielded less uniform results, especially for the scales Mov, initiation time, and the three violations scales. Goodness of fit indicators for one-factor extraction were not very acceptable on any scale, with all values of RMSEA and RMSR above .1. Corr was the exception; it had little explained variance (24%), but it presented an RMSEA less than .05 (RMSEA = .03). The scale that explained the highest percentage of variance was Hits, at 44%, while Mov and VTI explained the least variance, at 17%.
Relation between TOLDXtm test scales and other neuropsychological tests
Table 5 presents the sample’s descriptive statistics for the remaining tests in the study, and Table 6 conveys the results of exploratory factor analysis, which was done using total scores from the nine scales of the TOLDXtm, and said tests. The correct number of factors to retain, according to parallel and MAP criteria, was three, with 43% of variance explained and the following goodness of fit values: RMSR = .09; TLI = .057; and RMSEA = .15. Interpreting the matrix of factor loadings reveals a first factor (F1) associated with the scales of the BRIEF-P, a second factor (F2) comprised of variables from the BT-ID, WCFST, and two movement-related variables from the TOLDXtm (Corr and Hits), and a third and final factor (F3) that combines all TOLDXtm measures except the last two. Considering the matrix of correlations, these factors are independent of one another.
Sd: Standard deviation; Sk: Index of skewness; Ku: Index of kurtosis.
Each variable’s highest factor loading appears in bold; h²: communality of each scale. The variance explained by each factor and the goodness of fit indicators (RMSR, TLI, and RMSEA) appear below.
Normative data table
Table 7 was included to facilitate clinical interpretation. It presents minimum, maximum, and quartile scores on the nine scales, in the total sample as well as IDMi and IDMo groups separately. All scales (except Corr and Hits) were inverted, so the lower a subject’s quartile, the worse the performance. In other words, on every scale, a subject in the first quartile (Q1) performed worse than someone in the third (Q3).
Low quartile indicates poor performance. Worst and best scores report either the high or low score in the sample, depending on the direction of scoring of the scale.
Discussion
This study’s objective was to examine reliability and validity evidence for the version of the TOLDXtm created for people with ID (Culbertson & Zillmer, 2001): 2nd edition for adults with DS. Of the many existing versions of the TOL, this is the only one designed with problem difficulty appropriate for use in people with ID.
Assessing executive functions in people with ID is very important because of how they relate to behavioral aspects, like the behavioral disorders so common in these individuals. However, professionals in this field must make an effort to correct the present dearth of tests designed specifically for this population, and lack of adaptations. To fill that gap would enable precise evaluation and correct reporting of subjects’ level of functioning (Esteba-Castillo et al., 2013).
Often, evaluations are made using tests designed for the general population, and applying norms from the population at large, by mental age, to gauge a subject’s performance. That can be risky, and in any case diagnostic decisions should be made with caution (Esteba-Castillo & García-Alba, 2015). Being able to access normative data for reference is indispensable on any neuropsychological test, and for reference data to be valid and correct, it should represent the population to which the evaluee belongs (Rognoni et al., 2013).
Generally speaking, the psychometric properties of the TOLDXtm version for people with ID were satisfactory on all variables. It was also noteworthy in its ability to avoid the floor effect, which tests not adapted for the ID population tend to have.
As some authors have warned in past publications, there is an age effect on TOL performance. Peña-Casanova et al. (2009) observed a clear drop in performance in adults 60 to 70 years old, and much higher performance in young people aged 20 to 30. Although there are no TOLDXtm-ID data available in the child and/or adolescent population with DS, we do not expect they will perform the same as adults. On that note, the general-population version has two forms, one for children and one for adults. Scores on the original test show that on the children’s version (7–15 years), the various age ranges show statistically significant differences on all scales (p < .001) except total initiation time (p < .20). According to the authors, that is linked to cognitive development of processes needed to solve the problems (Culbertson & Zillmer, 2001).
This study’s population has specific features belonging to a very advanced stage of development among people with DS; in many of them, it marks the beginning of cognitive decline, depending on normative aging and the onset of Alzheimer’s type dementia (Nieuwenhuis-Mark, 2009). Given the high prevalence of Alzheimer’s disease in people with DS (Zigman & Lott, 2007) and frontal lobe neurodegeneration, especially in certain areas (Teipel et al., 2004), it is especially important to administer a test like the TOLDXtm-ID around this age. Therefore, future validations of the TOLDXtm, and similar tools, should account for these development stages and their characteristics before establishing appropriate age cutoffs.
The variables in this version of the TOLDXtm (correct, moves, initiation time, total time, execution time, type I violations, type II violations, premise violations) appeared on the original test, and measure different parameters – moves, times, and violations – to determine a subject’s level of performance. We observed that the three parameters do not work in the same way. In general, according to internal consistency indices and their significance across IDMi and IDMo groups, the variables Corr and Mov had the highest values. Total scores on Corr and Mov were significantly different in the IDMi and IDMo groups. This suggests the minimum and total moves made in the ten problems clearly differed according to the extent of ID. The IDMi group scored higher on Corr and lower on Mov than the IDMo group, both of which indicate higher performance in the IDMi group. However, those two variables’ internal consistency was stable across groups. Thus, both can distinguish ID levels and show similar levels of internal consistency. This aspect is especially relevant because it speaks to this TOL version’s sensitivity in detecting clearly differentiated levels of intellectual functioning. In that sense, the authors believe this adaptation is well suited to this population.
The new variable proposed, Hits, is dichotomous at the item level (was the subject able to finish the problem or not, regardless of how many moves they used) and it was observed to operate with very high levels of reliability, higher than the variables Corr and Mov. As in the case of Corr and Mov, Hits was consistent across ID levels; comparing alphas in the IDMi and IDMo groups produced no statistically significant differences. Outcomes on this variable suggest that as the problems become more difficult, fewer subjects can finish them; only 64% of subjects with IDMi and 38% of subjects with IDMo were able to complete problem 10. Nonetheless, the original test did not include Hits, so it is not counted in the final scores assessing examinee performance. With that in mind, and given its indices of reliability, stable consistency, and the information it provides about examinee performance, future versions of the test should take Hits into account.
The test addresses another parameter: time (initiation time, execution time, and total time). Measures of IT across the 10 problems in both groups were stable; differences were not observed except on problem 6, and comparing total IT scores, significant differences did not appear. In the IDMi group, there was a tendency for subjects, independently of item difficulty and the minimum moves needed to correctly complete it, to take a very similar amount of time to initiate. That is, they initiated the task without prior mental planning. The IDMo group, meanwhile, took longer to initiate tasks, and also displayed a similar effect as in the IDMi group: initiation time did not depend on task difficulty, and they did not seem to plan moves ahead of time. Regarding ET, differences were observed between the IDMi and IDMo groups such that the IDMi group executed tasks in far less time. However, that effect only occurred in items 1 to 5; from 6 onward, execution times were similar in the two groups. In a way, it seems the IDMo group learned over the course of problems 1–5 and, having overcome those, improved performance, gaining speed. Yet looking at total ET across groups, significant differences were not observed. In IT as well as ET, we observed the same effect as in Corr and Mov: comparing the two groups’ internal consistency on both variables, values were highly acceptable but not stable; the same level of consistency did not occur in IDMi as in IDMo. Fatigue is an important aspect to consider. Looking at time measures on the whole, it does not seem that fatigue influenced quality or ability in task execution. Execution time seemed more closely tied to learning, and thus more dependent on cognitive functioning level than on task difficulty or fatigue.
The third parameter refers to instances of the subject violating norms, and what type of norms. Both groups showed a clear tendency to make fewer violations as the test went on, especially fewer instances of VTII and PV. Thus, over the course of the test, examinees incorporate norms and can complete a given task with or without making a violation. In the case of VTI, the effect was weaker; when relaying instructions, strong emphasis should be placed on VTI, because it is hardest for examinees to incorporate. In view of the results, it seems clearly related to problem difficulty.
With respect to subjects’ performance on different problems, the authors observed staggered performance in terms of problem solving. The same effect was reported using other versions of the TOL in the population with ID (Masson, Dagnan, & Evans, Reference Masson, Dagnan and Evans2010). There are two clearly defined types, items 1–7 and 8–10, and their differentiation is the result of minimum moves shifting from 3 in 1–7, to 4 in 8–10. However, the data suggest there are certain problems (1, 4, and 7) on which the number of moves, execution times, and violations all increased. On problem 1, it seems to be because it is first, an effect of task novelty. Despite doing practice items beforehand, between those items and problem 1 there seems to be a jump in difficulty. The authors suggest adding a new practice item, one that is harder than the current ones and more similar to problem 1. In any case, with that factor in mind, it is very important for the examiner to ensure the subject correctly understood the instructions. On item 4, conversely, the “extra” difficulty seems more the result of move composition than any other factor. On 7, the increase in moves, times, and violations seems to be in response to the increase to four moves. Even though examinees solve it, that jump increases task difficulty considerably. The new variable, Hits, follows a pattern most in line with problem difficulty (the proportion of Hits decreases with each item), followed by the variables Corr and Mov.
The variables that indicated significant differences between the IDMi and IDMo groups were Corr, Hits, and Mov (those related to the subject’s movements). The ones that did not show significant differences between the two groups were IT, ET, and TT (time related), and VTI, VTII, and PV (violation related). Thus, it is best for clinicians to use Corr, Hits, and Mov to differentiate between the two groups.
Exploratory factor analysis of the nine TOLDXtm-ID variables indicated that none of the goodness of fit indices were especially satisfactory in explaining the variability in scores, so it seems we must assume that multiple variables and factors were involved in subjects’ actions. Nevertheless, the movement-related variables, Hits and Corr, explained the most variance in the one-factor solution (.71 and .46, respectively).
On another note, we examined the extent to which this test is related to others with an executive component. Factor analysis suggested a three-factor structure, which would be quite logical. First, there would be one factor composed of BRIEF-P variables; a second factor would encompass variables with an executive component on the BT-ID and the more predictive variables on the TOLDXtm (Corr and Hits); and the third factor would include the remaining measures on the TOLDXtm. This factor structure suggests the tests had high specificity. BRIEF-P scores seemed more closely related to one another than to the executive components they are meant to measure. That seems to be because parents’ impressions about the domains evaluated by the BRIEF-P were not consistent with outcomes on the executive functioning tests subjects completed (TOLDXtm, BT-ID, and WCFST). The same conclusion could be reached about TOLDXtm variables related to times and violations. The variables most predictive of outcomes on the TOLDXtm and the other tests subjects completed, all assessing executive functions, belong to the second factor. As such, there seems to be a strong functional relationship between subjects’ performance on the TOLDXtm-ID and the remaining executive tests. Analysis of the original test (TOLDXtm) using other tests (intelligence, memory, and executive functions) produced a similar factor structure (Culbertson & Zillmer, Reference Culbertson and Zillmer2001). Significant correlations were found with tests involving executive processes (r = +.57 to r = –.54), but as described above, correlations with intelligence tests were not significant. The original test presented four factors: (F1) tests of frontal performance, (F2) the TOLDXtm’s stronger predictor variables, (F3) memory tests, and (F4) psychometric intelligence tests. While that study did not use the same tests as the present one, the factor structure, at a cognitive functional level, bears strong similarity.
On a practical level, and to help clinically interpret the scores reported here, a barometer was created with quartiles to place subjects in normative groups. Table 7 presents all new score-generating variables on the TOLDXtm-ID test. However, please note that according to the scores obtained, our recommendation is to utilize only the Corr variable. Given the alpha values and scores on the ten problems, that seems best. If an examinee’s ID level can be determined, match with ID level. If not, match with ranking out of All subjects.
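The matching rule described above (use the examinee's ID-level norms when the level is known, otherwise the ranking for All subjects) can be sketched as follows. This is only an illustration under stated assumptions: the reference scores are hypothetical toy values rather than the normative data in Table 7, the names `place_in_quartile` and `toy_norms` are invented for the example, and the quartile cut points rely on the default method of Python's `statistics.quantiles`, which may differ from the convention used to build the table.

```python
from bisect import bisect_left
from statistics import quantiles


def place_in_quartile(score, norms, id_level=None):
    """Return the quartile (1-4) of a Corr score within a reference group.

    norms: dict mapping a group label to a list of reference scores.
    id_level: "IDMi" or "IDMo" if known; otherwise the "All" group is used.
    """
    group = id_level if id_level in norms else "All"
    q1_q2_q3 = quantiles(norms[group], n=4)  # three cut points
    # Scores at or below Q1 fall in quartile 1, above Q3 in quartile 4.
    return bisect_left(q1_q2_q3, score) + 1


# Hypothetical toy reference scores (NOT the values from Table 7).
toy_idmi = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_idmo = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5]
toy_norms = {"IDMi": toy_idmi, "IDMo": toy_idmo, "All": toy_idmi + toy_idmo}

known_level = place_in_quartile(3, toy_norms, "IDMo")
unknown_level = place_in_quartile(3, toy_norms)
```

The same raw score can land in different quartiles depending on the reference group, which is why the ID level should be used whenever it can be determined.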
We must emphasize that, given the limited number of subjects, the sample should be complemented by future studies to confirm these results. That said, this study is part of a much broader longitudinal study which, once complete, could provide an interesting opportunity to analyze test-retest reliability (baseline to 24 months).
In conclusion, the version of the TOLDXtm test for ID showed sufficient evidence of reliability and validity in adults with Down syndrome for it to be used in clinical practice as well as research. Its high correlations with other tests of frontal performance indicate it is especially suitable for assessing processes like planning, which is very important for people with ID.
Supplementary Material
To view supplementary material for this article, please visit https://doi.org/10.1017/S1138741617000300