INTRODUCTION
Many studies have investigated verbal memory performance based on the lateralization of temporal lobe epilepsy (TLE), with the expectation that left TLE (LTLE) patients will perform worse than right TLE (RTLE) patients (Baxendale, Reference Baxendale1998; Bowden, Simpson, & Cook, Reference Bowden, Simpson and Cook2016; Gow, Reference Gow2012; Reyes et al., Reference Reyes, Kaestner, Ferguson, Jones, Seidenberg, Barr and McDonald2020). This expectation is based on the premise that LTLE patients have more damage to structures in the left anterior temporal lobe that are involved in language processing, such as the hippocampus (Baxendale, Reference Baxendale1998; Bowden et al., Reference Bowden, Saklofske, van de Vijver, Sudarshan and Eysenck2016; Gow, Reference Gow2012; Reyes et al., Reference Reyes, Kaestner, Ferguson, Jones, Seidenberg, Barr and McDonald2020). Despite the multitude of studies that have investigated this hypothesis in pre-surgical TLE patients, results have been inconsistent. Some studies have shown significantly lower verbal memory performance for LTLE patients compared to RTLE patients prior to anterior temporal lobectomy (Busch, Frazier, Iampietro, Chapin, & Kubu, Reference Busch, Frazier, Iampietro, Chapin and Kubu2009; Kim, Yi, Son, & Kim, Reference Kim, Yi, Son and Kim2003; Soble et al., Reference Soble, Eichstaedt, Waseem, Mattingly, Benbadis, Bozorg and Schoenberg2014; Visser et al., Reference Visser, Forn, Lambon Ralph, Hoffman, Gomez Ibanez, Sunajuan and Avila2018). Other studies have not found significant differences based on TLE lateralization (Bouman, Hendriks, Schmand, Kessels, & Aldenkamp, Reference Bouman, Hendriks, Schmand, Kessels and Aldenkamp2016; Loring et al., Reference Loring, Strauss, Hermann, Barr, Perrine, Trenerry and Bowden2008; Raspall et al., Reference Raspall, Doñate, Boget, Carreño, Donaire, Agudo and Salamero2005; Wilde et al., Reference Wilde, Strauss, Chelune, Loring, Martin, Hermann and Hunter2001).
The basis for these inconsistencies is unresolved. Some researchers attribute the variable lateralizing value of verbal memory tests in TLE to differences in test characteristics, with previous research suggesting that certain verbal memory tests are more sensitive to lateralized TLE than others (Saling, Reference Saling2009; Umfleet et al., Reference Umfleet, Janecek, Quasney, Sabsevitz, Ryan, Binder and Swanson2015). Others have suggested that hippocampal sclerosis, the most common pathology underlying TLE, may represent a developmental lesion with less localizing impact than an acute lesion in the same location (Seidenberg et al., Reference Seidenberg, Hermann, Wyler, Davies, Dohan and Leveroni1998; Bowden et al., Reference Bowden, Saklofske, van de Vijver, Sudarshan and Eysenck2016).
Associative memory tests such as the verbal paired associates (VerbalPA) subtest of the Wechsler Memory Scales (WMS) have been hypothesized as useful in detecting left hippocampal damage characteristic of LTLE (Reyes et al., Reference Reyes, Kaestner, Ferguson, Jones, Seidenberg, Barr and McDonald2020; Saling, Reference Saling2009; Saling et al., Reference Saling, Berkovic, O’Shea, Kalnins, Darby and Bladin1993; Scorpio et al., Reference Scorpio, Islam, Kim, Bind, Borod, Bender, Kreutzer, DeLuca and Caplan2018; Suzuki, Reference Suzuki2008). Specifically, it has been proposed that LTLE patients perform worse than RTLE patients on items with arbitrary associative properties (such as “quiet-spoon”) compared to items with obvious semantic properties (such as “beach-ocean”; Reyes et al., Reference Reyes, Kaestner, Ferguson, Jones, Seidenberg, Barr and McDonald2020; Saling, Reference Saling2009; Saling et al., Reference Saling, Berkovic, O’Shea, Kalnins, Darby and Bladin1993; Suzuki, Reference Suzuki2008). Such findings suggest that overall performance on verbal memory tasks may be the result of the latent factors that impact item-level performance, such as the extent to which the association between words in each item is arbitrary versus semantically related.
Factor analysis can be used to model the relationships between observed item scores and latent factors (Brown, Reference Brown2015; Kline, Reference Kline2015). Examples of possible latent factors underlying verbal memory items include orthographic features, semantic category, or grammatical class of a word. Confirmatory factor analysis (CFA) is the preferred approach to test hypothesized factor structures by predefining the expected number of latent factors and the corresponding items expected to load onto each factor (Brown, Reference Brown2015; Kline, Reference Kline2015).
To date, no published studies have examined the latent structure of the VerbalPA in any version of the WMS at an item level. However, a previous unpublished study by Petrauskas (Reference Petrauskas2012) performed an item-level factor analysis on the VerbalPA in the WMS-Third Edition (WMS-III; Wechsler, Reference Wechsler1997). Data used in that study were from 490 heterogeneous neurosciences patients comprising consecutive referrals, including a subsample of pre-surgical TLE patients (n = 223) with established seizure onset laterality. The results revealed a latent structure that appeared to correspond to semantic features of words within each item pair. Importantly, all VerbalPA items in WMS-III are word pairs with arbitrary associations (hard items), meaning that the semantic structure implied by the results of Petrauskas (Reference Petrauskas2012) cannot be attributed to the difference between easy and hard items. Rather, these results suggest that the linguistic properties of the individual words in each item may be relevant to overall performance on the VerbalPA.
One theory that may be useful in guiding an item-level factor analysis of VerbalPA is the hub-and-spoke model of semantic processing (Patterson & Lambon Ralph, Reference Patterson, Lambon Ralph, Hickok and Small2016). The hub-and-spoke model proposes that modality-specific features of words are processed in distinct parts of the brain and that these features are subsequently combined in the anterior temporal lobe to give the words overall semantic meaning (Patterson & Lambon Ralph, Reference Patterson, Lambon Ralph, Hickok and Small2016). Using the hub-and-spoke model to interpret VerbalPA performance based on the semantic properties of items may optimize the clinical utility of the test by providing an interpretative framework that is consistent across different neuroscience disciplines (David, Fleminger, Kopelman, Mellers, & Lovestone, Reference David, Fleminger, Kopelman, Mellers and Lovestone2009; Patterson & Lambon Ralph, Reference Patterson, Lambon Ralph, Hickok and Small2016). For instance, clinicians and researchers may find interpretations based on hub-and-spoke more meaningful than interpretations based on a customary scoring model without clear theoretical rationale.
The latent structure of many cognitive tests has been modeled on the Cattell–Horn–Carroll (CHC) framework, a theory which has been empirically derived and refined by factor analytic studies across numerous community and clinical populations (Agelink van Rentergem, Reference Agelink van Rentergen, de Vent, Schmand, Murre, Staaks and Huizenngea2020; Jewsbury, Bowden, & Duff, Reference Jewsbury, Bowden and Duff2016; Reynolds, Keith, Flanagan, & Alfonso, Reference Reynolds, Keith, Flanagan and Alfonso2013; Schneider & McGrew, Reference Schneider and McGrew2018). Recent accounts of the CHC model recognize three broad abilities relevant to clinical memory assessment, including retrieval fluency, learning efficiency, and working memory (Schneider & McGrew, Reference Schneider and McGrew2018). Narrow abilities within learning efficiency include associative memory, the ability to memorize pairs of stimuli that have arbitrary associations, and meaningful memory, which is memory for semantically meaningful information (Schneider & McGrew, Reference Schneider and McGrew2018). Accordingly, these narrow abilities may be useful in interpreting the difference between performance on easy items and hard items in the current VerbalPA.
The hub-and-spoke model also fits neatly into CHC theory. Acquired knowledge is a broad CHC ability that describes the ability to understand and communicate knowledge and includes the nested abilities of language development and lexical knowledge, which both describe cognitive abilities related to understanding the semantic aspects of words (Schneider & McGrew, Reference Schneider and McGrew2018). Thus, the hub-and-spoke model could be considered as describing narrow abilities within acquired knowledge, providing semantic organizational features of language development and lexical knowledge. The hub-and-spoke model is also relevant to meaningful memory under learning efficiency, which is dependent on the semantic organization of verbal memory.
The present study sought to identify a well-fitting and theoretically justified latent variable structure for the WMS-IV VerbalPA subtest items to facilitate the ease and accuracy of score interpretations (Kamphaus, Winsor, Rowe, & Sangwon, Reference Kamphaus, Winsor, Rowe, Sangwon, Flanagan and McDonough2018). Using CFA, three theoretically motivated models were tested. The first, based on hub-and-spoke theory, tested whether performance is related to specific features of individual words in each item. The second, based on relevant CHC constructs, investigated whether the associative properties of easy items versus hard items are a factor in overall performance. The third model was a combination of the preceding models and tested whether performance is related to both semantic and associative features of items. Because latent variable analysis (CFA) requires larger samples to obtain stable parameter estimates, we first established a CFA model in a larger heterogeneous neurosciences sample and then tested the applicability of the model in subsets of patients with LTLE and RTLE.
METHODS
Participants
This study used archival data from 250 (138 women, 111 men) patients who attended St. Vincent’s Hospital in Melbourne and were administered the WMS-IV as part of a standard neuropsychological assessment. A retrospective sampling strategy was used because prospective funding for this type of study is difficult to obtain, and the data accumulation epoch exceeds the duration of a typical graduate student enrolment. Cases were retrieved retrospectively from a consecutive sample of all patients referred for neuropsychological evaluations. Of these, 74 were diagnosed with lateralized TLE (LTLE, n = 46; RTLE, n = 28). The patients with TLE had all undergone full multimodal neurological evaluation for seizure disorders in a comprehensive epilepsy program. Data from clinical investigations by epileptologists, magnetic resonance brain imaging using an epilepsy protocol, 7-day inpatient video telemetry, and volumetric hippocampal analysis were investigated for concordance with FDG-PET and subtraction ictal SPECT co-registered to MRI. The standard clinical protocol has been previously described (Loughman, Bowden, & D’Souza, Reference Loughman, Bowden and D’Souza2017; Murphy et al., Reference Murphy, Smith, Wood, Bowden, O’Brien, Bulluss and Cook2010). All TLE patients subsequently proceeded to surgery. Details of all diagnostic categories are displayed in Supplementary Table 1 and represent the heterogeneous range of diagnoses routinely referred to the neuropsychology service. No community controls were included in the sample. For the full sample, patients were aged from 17 to 69 (M = 44.16, SD = 13.84), with total education ranging from 3 to 18 years (M = 11.82, SD = 2.85). All data were de-identified before analysis, and the research project was approved by the Human Research and Ethics Committee at St. Vincent’s Hospital Melbourne (Approval number QA 051/19).
Measures
Data items were dichotomous item-level responses from the WMS-IV VerbalPA I and VerbalPA II (70 variables in total). VerbalPA is an associative memory task comprising 14 dichotomous items, including four semantically related word pairs such as “floor-ceiling” (hypothetical easy items) and 10 unrelated word pairs such as “table-tree” (hard items). Performance on the VerbalPA items depends on the ability to make new memory associations between the individual words in each pair, such that the presentation of a single word prompts recollection of its pair. VerbalPA II is a delayed recall trial of the same material. As the VerbalPA test items are copyright protected, copies of the items will not be provided in this article, but are available in the WMS-IV test manual (Wechsler, Reference Wechsler2009). Descriptive statistics for VerbalPA I and VerbalPA II raw scores in the TLE samples and the balance of the heterogeneous neurosciences sample are shown in Supplementary Table 2.
Procedure
All analyses were completed using Mplus Version 8 (Muthén & Muthén, Reference Muthén and Muthen2017). As the data were dichotomous (scored as correct or incorrect), the weighted least squares mean and variance adjusted (WLSMV) estimator was used in the estimation of all models. Goodness-of-fit was evaluated according to Hu and Bentler’s (Reference Hu and Bentler1998) guidelines for well-fitting models. Good fit was defined as root mean square error of approximation (RMSEA) values below 0.06, and comparative fit index (CFI) and Tucker–Lewis index (TLI) values above 0.95. Acceptable fit was defined as RMSEA values below 0.08, and CFI and TLI values above 0.90. Less emphasis was placed on the χ2 test of overall model fit, as this statistic is influenced by large numbers of variables (Brown, Reference Brown2015). Satorra and Bentler (Reference Satorra and Bentler2001) chi-squared (χ2) difference testing was used to compare model fit for nested CFA models.
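As an illustration, the fit-index thresholds above can be expressed as a simple decision rule. This is a hypothetical sketch only: the function name is ours, and Mplus reports these indices directly rather than classifying them.

```python
def classify_fit(rmsea: float, cfi: float, tli: float) -> str:
    """Classify model fit using the Hu & Bentler (1998) cut-offs adopted here."""
    if rmsea < 0.06 and cfi > 0.95 and tli > 0.95:
        return "good"
    if rmsea < 0.08 and cfi > 0.90 and tli > 0.90:
        return "acceptable"
    return "poor"

# Illustrative values only, not taken from Table 3
print(classify_fit(rmsea=0.05, cfi=0.97, tli=0.96))  # good
print(classify_fit(rmsea=0.07, cfi=0.93, tli=0.91))  # acceptable
```

Note that the rule is conjunctive: a model must satisfy all three criteria at a level to earn that classification.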
Three a priori factor structures were tested and compared using CFA. The first was a three-factor “Semantic-Clustering” model (see Supplementary Table 2) based on the cognitive principles described in hub-and-spoke theory. The second was a two-factor “Easy-Hard” model (see Supplementary Table 3) based on CHC theory. The third was a five-factor “Hybrid” model (see Table 1) based on integrating principles of hub-and-spoke theory with CHC theory, as described above. A one-factor model representative of general long-term retrieval ability was also tested to ensure that any high correlations between factors were not indicative of over-factoring. Apart from the one-factor model, further post hoc modifications were not pursued because they risk factor solutions of low replicability (Brown, Reference Brown2015; Kline, Reference Kline2015).
Table 1. Structure of the five-factor Hybrid model of WMS-IV verbal paired associates

The generalizability of viable, well-fitting models to TLE patients was tested using MIMIC modeling (“multiple causes, multiple indicators”; Brown, Reference Brown2015). A variable of “group membership” was defined, including three subgroups comprising LTLE patients, RTLE patients, and heterogeneous patients (remainder of the sample). Subgroups were dummy coded as 1 = LTLE, 2 = RTLE, and 0 = heterogeneous. Descriptive statistics for each subgroup are displayed in Table 2. To assess whether lateralization of TLE diagnosis is related to differences in VerbalPA performance characteristics, factor scores were compared separately for the LTLE subgroup and the RTLE subgroup, against the heterogeneous subgroup and against each other, for the best-fitting model.
Table 2. Descriptive statistics for demographic variables in the heterogeneous (n = 146), LTLE (n = 46), and RTLE (n = 28) subgroups

LTLE = left temporal lobe epilepsy; RTLE = right temporal lobe epilepsy.
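The group-membership coding described above can be sketched as follows. This is an illustrative snippet (variable and function names are ours; in practice the coding is prepared in the data file read by Mplus), showing both the single 0/1/2 code and the binary indicators implied when each subgroup is entered as a covariate with the heterogeneous subgroup as reference.

```python
def code_groups(diagnoses):
    """Return the 0/1/2 group code plus binary LTLE and RTLE indicators
    (heterogeneous patients are the reference category, coded 0 on both)."""
    codes = {"heterogeneous": 0, "LTLE": 1, "RTLE": 2}
    group = [codes[d] for d in diagnoses]
    is_ltle = [1 if d == "LTLE" else 0 for d in diagnoses]
    is_rtle = [1 if d == "RTLE" else 0 for d in diagnoses]
    return group, is_ltle, is_rtle

# Toy diagnosis list for illustration
group, is_ltle, is_rtle = code_groups(["LTLE", "heterogeneous", "RTLE"])
print(group, is_ltle, is_rtle)  # [1, 0, 2] [1, 0, 0] [0, 0, 1]
```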
RESULTS
Initial factor analyses produced warnings for several items (List A, items 7, 8, 11, 12, and 14; List B, item 13; List C, items 1 and 15; List D, items 1, 4, 5, 9, and 11; Delayed Recall, items 5, 11, 12, 13, and 14) due to empty cells in the cross-classification tables, indicating that, for certain pairs of items, at least one of the four possible pass–fail response combinations was never observed, despite the heterogeneous nature of the clinical sample. Accordingly, these items were removed from the final specifications of all CFA and MIMIC models.
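The screening condition that triggered these warnings can be sketched as follows. The helper name and data layout are illustrative (Mplus detects this condition automatically); `data` holds one 0/1 vector of item scores per patient.

```python
from itertools import combinations

def empty_cell_pairs(data):
    """Flag item pairs whose 2x2 pass/fail cross-classification table has at
    least one empty cell (data: one 0/1 score vector per patient)."""
    n_items = len(data[0])
    flagged = []
    for i, j in combinations(range(n_items), 2):
        observed = {(row[i], row[j]) for row in data}
        if len(observed) < 4:  # fewer than four cells are filled
            flagged.append((i, j))
    return flagged

# Toy data: item 0 is passed by every patient, so every pair involving
# item 0 has empty cells in its cross-classification table.
data = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 1]]
print(empty_cell_pairs(data))  # [(0, 1), (0, 2)]
```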
CFA Models
The two-factor Easy-Hard model and the three-factor Semantic-Clustering model were specified and compared using CFA. In addition, a one-factor model representing long-term retrieval ability was tested in anticipation of high correlation between factors in the a priori models. The fit statistics for these models are displayed in Table 3.
Table 3. Fit statistics for each of the CFA models of WMS-IV verbal paired associates

CFA = confirmatory factor analysis; WLSMV = weighted least squares mean and variance adjusted; RMSEA = root mean square error of approximation; CI = confidence interval.
*p < .001.
The Easy-Hard Model
Apart from the significant χ2 value, the RMSEA, CFI, and TLI values showed excellent fit for the two-factor Easy-Hard model (see Table 3). All items showed positive, significant loadings on the pre-specified factors and high R2 values. The two factors showed a significant, strong correlation (r = .72, p < .001, SE = .03). Although strong correlations between factors can be indicative of over-factoring in a model (Brown, Reference Brown2015), the Satorra and Bentler (Reference Satorra and Bentler2001) χ2 difference test showed that collapsing the two factors into the one-factor Gl model (also shown in Table 3) resulted in a significant loss of fit, difference χ2(1) = 91.06, p < .001.
The Semantic-Clustering Model
As shown in Table 3, the RMSEA, CFI, and TLI values for the three-factor Semantic-Clustering model also suggest excellent fit, despite the significant χ2 value. This model specified three factors, in which items were categorized based on the referent of the target word (“living”, “non-living”, or “abstract”). Again, all items showed significant loadings onto pre-specified factors and high R2 values.
All three factors were significantly correlated, with the strongest correlation observed between the “abstract” and the “living” factors (r = .84, p < .001, SE = .02). The “non-living” factor was also correlated with the “living” factor (r = .78, p < .001, SE = .03) and the “abstract” factor (r = .80, p < .001, SE = .03); however, all correlations were at least six standard errors below 1. Again, the χ2 difference test showed significantly poorer fit for the one-factor Gl model compared to the three-factor Semantic-Clustering model, χ2(3) = 157.70, p < .001.
The Five-factor Hybrid Model
The RMSEA, CFI, and TLI values suggested excellent fit for the five-factor Hybrid model, as shown in Table 3. All items showed significant loadings onto pre-specified factors and high R2 values. Standardized factor loadings, standard errors, and R2 values for this model are displayed in Supplementary Table 4. All factors were significantly correlated, but all factor correlations were significantly less than 1, that is, at least three standard errors below 1 (displayed in Supplementary Table 5).
Compared to the five-factor Hybrid model, the χ2 difference test showed significantly poorer fit for the three-factor Semantic-Clustering model (χ2(7) = 123.97, p < .001) and for the two-factor Easy-Hard model (χ2(9) = 130.21, p < .001). Thus, the Hybrid model was deemed the best-fitting CFA model for the present data set.
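The p-values for these reported difference statistics follow from the chi-squared survival function. Note that with the WLSMV estimator, Mplus computes the adjusted difference statistic itself (via its DIFFTEST procedure); this sketch only reproduces the final p-value lookup for the values reported above.

```python
from scipy.stats import chi2

# Difference statistics reported above: Hybrid model vs each simpler model
tests = {
    "Hybrid vs Semantic-Clustering": (123.97, 7),   # statistic, df
    "Hybrid vs Easy-Hard": (130.21, 9),
}

for label, (stat, df) in tests.items():
    p = chi2.sf(stat, df)  # upper-tail probability of the chi-squared distribution
    print(label, p < .001)  # both comparisons: True
```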
MIMIC Models
Group membership was specified as a covariate in which the diagnostic categories of the full sample were collapsed into three subgroups, including LTLE, RTLE, and the remaining heterogeneous patients. Again, items with empty cells in the cross-classification tables were excluded from the MIMIC model.
The group membership covariate was added to the five-factor Hybrid model in two separate MIMIC models. In the first MIMIC model, each of the five factors was regressed onto the LTLE subgroup to estimate the direct effect of LTLE on each of the five factors and the respective indicators compared to the RTLE and heterogeneous subgroups. Based on the expected detrimental effect of LTLE on indicators of verbal memory, a negative effect of group membership was expected for all factors. In the second MIMIC model, the five factors were regressed onto the RTLE subgroup to estimate the direct effect of RTLE on each of the five factors compared to the LTLE and heterogeneous subgroups. As RTLE is not hypothesized to have a significant effect on any indicators, no effect of group membership was expected in this model.
Both MIMIC models showed good overall fit for the data based on the RMSEA, CFI, and TLI values, indicating that adding a variable for clinical group membership did not lead to appreciable loss of fit (fit statistics for these models are displayed in Supplementary Table 6). The latent mean differences comparing the effect of group membership for both MIMIC models are displayed in Table 4. In line with theoretical expectations, the first MIMIC model showed a significant negative difference on two factors (the “Abstract/Hard” and “Non-living/Hard” factors) for LTLE compared to the heterogeneous subgroup. The results of the MIMIC analysis for the RTLE subgroup were also in line with expectations, with no significant effect for any factor compared to the heterogeneous subgroup. Thus, in line with expectations, the MIMIC models suggest that the five-factor Hybrid model is invariant for factor scores in the RTLE subgroup, whereas significantly lower latent means were observed on two factors in the LTLE subgroup (Table 4). Derived from the completely standardized variance–covariance matrix, the lower factor means in the LTLE group, while significant, represent small effects equivalent to Cohen’s d in the range of 0.1–0.2.
Table 4. Latent mean differences for LTLE vs Heterogeneous and RTLE vs Heterogeneous for each of the five factors in the Hybrid model of WMS-IV verbal paired associates

LTLE = left temporal lobe epilepsy; RTLE = right temporal lobe epilepsy; MD = mean difference; SE = standard error.
Finally, because the sample sizes were too small for stable latent variable estimation, the LTLE and RTLE groups were compared directly on mean raw scores for each factor (total number of items correct), displayed in Table 5. Independent-samples t tests revealed no significant differences between the groups on any of the five factors, and group-difference effects were small: four of the five Cohen’s d values were below 0.24, and the fifth was 0.415.
Table 5. Raw score composites corresponding to the items assigned to each factor. Scores are shown for the RTLE and LTLE groups, compared with independent-samples t tests

LTLE = left temporal lobe epilepsy; RTLE = right temporal lobe epilepsy.
# unequal variance t-test with adjusted degrees of freedom.
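The group comparison above can be sketched with illustrative data. The scores below are made up for demonstration (the real per-patient data are not published); the snippet shows Welch's unequal-variance t test, as in the Table 5 footnote, and Cohen's d from the pooled standard deviation.

```python
import numpy as np
from scipy.stats import ttest_ind

# Made-up factor-composite raw scores for illustration only
ltle = np.array([4.0, 5.0, 6.0, 5.0, 4.0])
rtle = np.array([5.0, 6.0, 7.0, 6.0, 5.0])

# Welch's t test (unequal variances, adjusted degrees of freedom)
t, p = ttest_ind(ltle, rtle, equal_var=False)

# Cohen's d using the pooled standard deviation
n1, n2 = len(ltle), len(rtle)
pooled_sd = np.sqrt(((n1 - 1) * ltle.var(ddof=1) + (n2 - 1) * rtle.var(ddof=1))
                    / (n1 + n2 - 2))
d = (ltle.mean() - rtle.mean()) / pooled_sd
print(round(float(d), 2))  # -1.2
```

With the real Table 5 composites, the corresponding d values were small (four below 0.24, one at 0.415) and none of the t tests was significant.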
DISCUSSION
The present study aimed to identify a well-fitting model for VerbalPA from the WMS-IV that can be interpreted in line with established cognitive theory. Initial inspection of results showed good overall fit for both the two-factor Easy-Hard model and the three-factor Semantic-Clustering model, the latter consistent with the semantic structure found by Petrauskas (Reference Petrauskas2012) for the VerbalPA in the WMS-III. Both of these models fit significantly better than a single-factor model. Further inspection showed best overall fit for a five-factor Hybrid model, which incorporated both the associative and semantic aspects of items that underlie the Easy-Hard and Semantic-Clustering models, respectively. In this model, items from different lists fell on different factors in line with the a priori classification of the hub-and-spoke model. For example, items assigned to the “Living/Hard” factor included the same item repeated in the immediate recall lists A, B, C, and D in ordinal positions 8, 8, 1, and 11, respectively. This model also conformed with theoretical expectations in the MIMIC models, which examined the direct effect of LTLE and RTLE diagnostic group membership on each of the five factors. Accordingly, the five-factor model was selected as the best-fitting model for the present data set.
In view of the common concerns for risk of verbal memory decline after surgical treatment of LTLE, valid and reliable assessment of verbal memory functioning by neuropsychologists is a critical aspect of managing patient wellbeing (Baxendale, Reference Baxendale1998; Bowden et al., Reference Bowden, Saklofske, van de Vijver, Sudarshan and Eysenck2016; Reyes et al., Reference Reyes, Kaestner, Ferguson, Jones, Seidenberg, Barr and McDonald2020). As previously noted, inconsistencies in the existing literature regarding the effect of TLE on verbal memory ability suggest that different verbal memory tasks may differ in sensitivity to left hippocampal damage (Bell, Lin, Seidenberg, & Hermann, Reference Bell, Lin, Seidenberg and Hermann2011). It has been proposed that arbitrary associative properties within VerbalPA items may confer sensitivity to left hippocampal damage (Saling, Reference Saling2009; Scorpio et al., Reference Scorpio, Islam, Kim, Bind, Borod, Bender, Kreutzer, DeLuca and Caplan2018; Suzuki, Reference Suzuki2008). However, the current findings suggest that it is the semantic content of VerbalPA items that contributes to performance differences relative to heterogeneous neurosciences patients.
Specifically, results of the MIMIC model for the LTLE group suggest that items containing “abstract” or “non-living” target words are the most sensitive to left hippocampal damage compared to the other two patient samples (Table 4). These results suggest that VerbalPA performance may be the result of latent semantic processing factors, reflecting principles identified in contemporary cognitive neuroscience models of semantic processing, and imply that VerbalPA target stimuli are not literally arbitrary despite the superficially “arbitrary” properties of the word pairs. Rather, the latent semantic structure of the VerbalPA suggests that all linguistic stimuli may be inherently semantic, to some extent, and that semantic processes operate in conjunction with other memory processes to facilitate encoding and retrieval.
When compared directly to the RTLE sample on the corresponding raw item totals, there were no statistically significant differences between the LTLE and RTLE samples, and again only small effects were observed (Table 5). This result is in line with our previous observations that in patients identified with TLE on the basis of volumetric MRI, VerbalPA total scores were not lateralizing (Bowden et al., Reference Bowden, Saklofske, van de Vijver, Sudarshan and Eysenck2016).
WMS-IV VerbalPA clinical interpretation is currently based on a total score that represents a single construct, referred to as associative memory in CHC theory (Schneider & McGrew, Reference Schneider and McGrew2018). However, the poorer fit observed for the one-factor model suggests that clinicians may overlook useful information if VerbalPA performance is interpreted using a single score (Kamphaus et al., Reference Kamphaus, Winsor, Rowe, Sangwon, Flanagan and McDonough2018). Scoring tests according to a latent structure model allows clinicians to refer to established theory and prior research in their reports, in turn making interpretations of performance more meaningful to non-neuropsychologist clinicians and researchers (Kamphaus et al., Reference Kamphaus, Winsor, Rowe, Sangwon, Flanagan and McDonough2018). Thus, an alternative scoring system based on a multifactorial model of VerbalPA performance may improve the interpretability of the test and potentially guide item-parceling in future test revisions for more rational scoring. The superior fit of the Hybrid model in the present study suggests that an alternative scoring model of VerbalPA performance incorporating scoring based on hub-and-spoke theory (Patterson & Lambon Ralph, Reference Patterson, Lambon Ralph, Hickok and Small2016) may enhance clinical interpretation.
Furthermore, it is likely that this model will generalize to other populations, with no reason to expect a different latent structure for healthy and clinical populations. This inference relies on the principle of measurement invariance which underlies the use of norms and validity research derived from a standardization sample when any test is applied to another clinical or community population (Bowden, Saklofske, van de Vijver, Sudarshan, & Eysenck, Reference Bowden, Saklofske, van de Vijver, Sudarshan and Eysenck2016; Widaman & Reise, Reference Widaman and Reise1997). Such inference is necessary for meaningful test interpretation and, if not supported by research, potentially requires the revalidation of a test in every population to which the test is applied. Although an advanced topic in clinical psychometrics, the assumption of measurement invariance runs counter to some historical lines of inference, namely that latent structures vary across clinical populations. However, this latter inference may owe more to incautious interpretation of observed correlations, or incautious use of exploratory factor analytic methods, than to actual variation in factor structures (see Bowden, Reference Bowden2004; Jewsbury and Bowden, Reference Jewsbury, Bowden and Bowden2017; Wilson et al., submitted).
Fortunately, to date, all research that adopts the recommended approach to evaluation of measurement invariance has been found to support the assumption of measurement invariance, that is, that latent cognitive ability structures measured by a wide variety of test batteries generalize across diverse community and clinical populations (Jewsbury, Bowden and Duff, Reference Jewsbury, Bowden and Duff2016; Wilson et al., submitted). However, we should seek to carefully evaluate all assumptions underlying measurement invariance, including generalization across ethnicity, age, and ability levels.
While the factor structure underlying VerbalPA reported from this study requires independent validation, other research suggests a high likelihood of successful replication because the hub-and-spoke model appears to provide a robust method of semantic categorization (Furey, in preparation; Petrauskas, Reference Petrauskas2012). To take an example from Wechsler Scale history, we no longer interpret scores from older intelligence tests in terms of Verbal and Performance scores, but rather in terms of the latent structure derived from careful factor analysis (Bowden, 2015; Holdnack, Zhou, Larrabee, Millis, & Salthouse, Reference Holdnack, Zhou, Larrabee, Millis and Salthouse2011).
Although we only demonstrated differences of small effect size in this study between patients with LTLE and RTLE, a revised scoring system deserves further scrutiny in patient populations in whom more pronounced lateralizing deficits may be apparent, for example, patients with acute onset lesions, to determine whether the revised scoring provides better patient classification compared to the conventional scoring. A revised scoring algorithm could be included with the Wechsler Advanced Clinical Solutions software for ease of use by clinicians.
The present findings suggest that the principles of hub-and-spoke theory may be useful in describing the semantic organization of knowledge in CHC theory, the most successful taxonomy of clinical cognitive abilities. The role of semantic organization may extend to other abilities that can be tested with linguistic stimuli, such as retrieval ability and working memory. Alternatively, semantic processing may be encompassed by the broad ability of acquired knowledge, which describes the ability to understand and communicate valued knowledge (Schneider & McGrew, Reference Schneider and McGrew2018). It is likely that acquired knowledge and learning efficiency abilities operate in conjunction to facilitate memory processing. Thus, the findings of the current study suggest that hub-and-spoke theory may be useful to describe the semantic organization of lexical knowledge in the CHC model and may elaborate understanding of narrow cognitive abilities such as meaningful memory.
While we propose a new semantic-clustering scoring system for future editions of the WMS, further research with community-normative samples is required to determine whether the revised scoring provides useful clinical classification accuracy (Bowden & Loring, Reference Bowden and Loring2009). In addition, the VerbalPA may be best used in combination with other verbal memory tests, including Logical Memory from the WMS-IV, because many studies suggest that these subtests share the majority of their reliable variance in common (Bowden et al., Reference Bowden, Saklofske, van de Vijver, Sudarshan and Eysenck2016; Holdnack et al., Reference Holdnack, Zhou, Larrabee, Millis and Salthouse2011; Reyes et al., Reference Reyes, Kaestner, Ferguson, Jones, Seidenberg, Barr and McDonald2020), a finding that is incompatible with the uncorrelated-modularity view of anterograde memory (Saling, Reference Saling2009).
A limitation of this study was the lack of cross-validation of viable models in an independent sample, which is necessary to demonstrate the generalizability of the models across populations. This limitation is inherent to any exploratory study, as obtaining the large participant numbers required for multiple samples is often challenging. Small sample size was also a limitation of the RTLE and LTLE groups, which were analyzed separately in the present study.
In summary, the focus of this study was to explore the item-level factor structure of the VerbalPA and to test the fit of a general semantic classification of its items, derived from hub-and-spoke predictions. The results demonstrated the success of item-level analysis and showed that the hub-and-spoke semantic classification fitted the data well, including the hard item pairs. This result indicates that inferences about arbitrary-associational memory, which assume the semantics-free character of hard pairs, may not be valid, and that all linguistic material may be subject to semantic organization in memory retrieval. Although we noted only small effect sizes for left versus right TLE differences, given the staged or developmental lesion underlying most cases of TLE, strong lateralizing effects were not expected (Ong, Reference Ong2020). However, application of the hub-and-spoke scoring to patients with acute-onset disease in one or the other temporal lobe may reveal larger effects.
The results of this study provide no evidence that the arbitrariness of VerbalPA stimuli, interpreted as semantics-free, is related to the sensitivity of the test for detecting left hippocampal damage. The latent factors reflect both associative and semantic performance characteristics that are consistent with the principles of contemporary cognitive neuroscience and integrate neatly with the well-established CHC model of cognitive abilities. Thus, as an alternative structure for VerbalPA scoring, the five-factor Hybrid model identified in the present study shows strong interpretability and clinical practicality.
SUPPLEMENTARY MATERIAL
To view supplementary material for this article, please visit https://doi.org/10.1017/S1355617721000709
FINANCIAL SUPPORT
Stephen Bowden has received research grant support from the Australian Brain Foundation and the Australian National Health and Medical Research Council, book royalties from Taylor and Francis and Oxford University Press, and editorial honoraria from Springer Nature. Wendyl D’Souza’s salary is part-funded by The University of Melbourne. He has received travel, investigator-initiated, scientific advisory board and speaker honoraria from UCB Pharma Australia & Global; investigator-initiated, scientific advisory board, travel and speaker honoraria from Eisai Australia & Global; advisory board honoraria from Liva Nova; educational grants from Novartis Pharmaceuticals, Pfizer Pharmaceuticals and Sanofi-Synthelabo; educational, travel and fellowship grants from GSK Neurology Australia; and honoraria from SciGen Pharmaceuticals.
CONFLICTS OF INTEREST
The authors have nothing to disclose.