INTRODUCTION
The term “learning potential” was first coined in 1964 by Budoff and colleagues (Budoff & Friedman, Reference Budoff and Friedman1964). Unlike static, single time-point assessments of cognitive ability, learning potential (LP) refers to the ability to improve cognitive performance as a result of training, and entails multiple time-point assessments with intervening training on the domain of interest. LP assessments were originally developed for use with educable mentally retarded (EMR) children in order to better evaluate which of those with low intelligence quotients (IQs) had the ability to benefit from training or instruction. More recently, LP assessments have also begun to be used in schizophrenia research. Interest in LP in schizophrenia developed as a result of its suggested role as a mediator between static neurocognition and functional outcome (Green, Kern, Braff, & Mintz, Reference Green, Kern, Braff and Mintz2000). Construct validity of LP assessment has been supported by studies indicating that groups characterized by differences in LP exhibit distinct cognitive profiles (Kurtz & Wexler, Reference Kurtz and Wexler2006; Rempfer, Hamera, Brown, & Bothwell, Reference Rempfer, Hamera, Brown and Bothwell2006; Vaskinn et al., Reference Vaskinn, Sundet, Friis, Ueland, Simonsen and Birkenaes2008; Wiedl, Schottke, & Calero-Garcia, Reference Wiedl, Schottke and Calero-Garcia2001), as well as distinct cerebral metabolite levels (Ohrmann et al., Reference Ohrmann, Kugel, Bauer, Siegmund, Kolkebeck and Suslow2008; Pedersen, Wiedl, & Ohrmann, Reference Pedersen, Wiedl and Ohrmann2009).
A number of studies also indicate that LP predicts functioning, including performance on medication management and problem-solving skill training (Wiedl, Reference Wiedl1999; Wiedl et al., Reference Wiedl, Schottke and Calero-Garcia2001), training gains in cognitive rehabilitation training (Wiedl & Wienobst, Reference Wiedl and Wienobst1999), work skill acquisition (Sergi, Kern, Mintz, & Green, Reference Sergi, Kern, Mintz and Green2005), readiness for rehabilitation (Fiszdon et al., Reference Fiszdon, McClough, Silverstein, Bell, Jaramillo and Smith2006), performance on a proxy measure of functioning (Kurtz & Wexler, Reference Kurtz and Wexler2006), and the outcome of vocational rehabilitation (Watzke, Brieger, Kuss, Schoettke, & Wiedl, Reference Watzke, Brieger, Kuss, Schoettke and Wiedl2008; Watzke, Brieger, & Wiedl, Reference Watzke, Brieger and Wiedl2009). It should be noted, however, that there are also several reports indicating that LP may not confer an advantage above and beyond the predictive value of static neurocognitive measures (Kurtz, Jeffrey, & Rose, Reference Kurtz, Jeffrey and Rose2010; Tenhula, Kinnaman, & Bellack, Reference Tenhula, Kinnaman and Bellack2007; Vaskinn et al., Reference Vaskinn, Sundet, Friis, Ueland, Simonsen and Birkenaes2008; Woonings, Appelo, Kluiter, Slooff, & van den Bosch, Reference Woonings, Appelo, Kluiter, Slooff and van den Bosch2003). Furthermore, although many researchers report that LP status is a good predictor of functioning, multiple approaches have been used to determine LP status, and no consensus has yet been reached about the best method for assessing and quantifying LP.
A number of different testing and computational approaches have been used to quantify LP. Studies of LP in schizophrenia have most frequently relied on test-train-test versions of the Wisconsin Card Sorting Test (WCST; Heaton, Reference Heaton1981), a test of executive function, though list-learning tasks that assess verbal learning and memory have also been used. Based on pre-training test performance and relative change in post-training performance, one approach to LP assessment involves classifying individuals into learner categories (see Schoettke, Bartram, & Wiedl, Reference Schoettke, Bartram, Wiedl, Hamers, Sijtsma and Ruijssenaars1993 for algorithm), namely, high scorers (those with good pre-training performance), learners (those whose post-training performance is above limits established by a reliable change criterion), and nonlearners (those whose post-training performance is within limits established by the reliable change criterion). While this approach has the advantage of separately examining distinct LP groups, as with any categorical approach, it reduces statistical power. An alternative to the categorical approach is the continuous gain approach, which relies on a ratio of actual pre-post improvement to maximum possible pre-post improvement (see Sergi et al., Reference Sergi, Kern, Mintz and Green2005 for algorithm). While this approach has the benefit of increasing statistical power, it assumes a linear continuum of learning potential ability.
Moreover, because gain scores are scaled relative to optimum performance, individuals whose initial performance is near optimum can receive very high or very low gain scores based on minute differences between pre- and post-training performance. Small decrements from an initially high score produce gain scores similar to those of individuals with low pre- and post-training performance, while small improvements produce gain scores similar to those of individuals with poor pre-training performance who make considerable gains following training. An alternative to these two LP indices is the optimized performance approach (see Woonings et al., Reference Woonings, Appelo, Kluiter, Slooff and van den Bosch2003). As the name suggests, this approach relies solely on post-training performance, without reference to pre-training performance or the amount of improvement resulting from training. While only a “static” post-training score is used, this approach appropriately falls within the realm of learning potential assessment, as the post-training score in effect incorporates the potential effects of the training intervention. While the optimized approach does provide an index of how well an individual can perform following training, it is limited by not directly incorporating pre-training performance, and thus does not take into account the amount of change resulting from training. Two additional indices have also been used in the assessment of LP: a simple difference score between pre-training and post-training performance, and a regression residual score, calculated by regressing post-training performance on pre-training performance (see Weingartz, Wiedl, & Watzke, Reference Weingartz, Wiedl and Watzke2008 for algorithm). While not unique to LP assessment in schizophrenia, the use of varied measurement strategies presents a serious problem for researchers for several reasons.
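The ceiling artifact described above can be made concrete with a brief sketch of the gain-score ratio, using hypothetical scores on the 0–80 scale of the list-learning totals used later in this study (the function and the specific score values are illustrative only, not part of any published algorithm):

```python
def gain_score(pre, post, max_score=80):
    """Ratio of actual pre-post improvement to maximum possible improvement."""
    return (post - pre) / (max_score - pre)

# An examinee near ceiling: tiny pre-post differences swing the ratio wildly.
print(gain_score(78, 76))  # -1.0 (a 2-point drop looks like severe decline)
print(gain_score(78, 80))  #  1.0 (a 2-point gain looks like maximal learning)

# A low scorer with a large absolute gain receives the same ceiling value:
print(gain_score(20, 80))  #  1.0
```

Note also that the ratio is undefined when pre-training performance is already at the maximum, a further limitation of scaling change relative to optimum performance.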
For one, methodological differences can lead to substantially different conclusions, even using the same data, as well as decrease the potential for the generalization of findings. Methodological differences may also complicate research conclusions and practical implications of LP assessment, as LP computational approaches may differ in basic psychometric (i.e., reliability) properties and inadvertently alter the construct being assessed.
We are aware of only two studies that have examined the psychometric properties of different approaches to LP assessment in schizophrenia. Waldorf and colleagues (Waldorf, Wiedl, & Schottke, Reference Waldorf, Wiedl and Schottke2009) examined the comparability of three different reliable change indices as applied to categorical LP classification using the WCST. While their results indicated particularly high (kappa > 0.90) classification concordance rates for two of the three indices, concordance rates among all three indices were still in the acceptable range (kappa > 0.70). As noted by the authors themselves, while these analyses allowed for a comparison of the three indices, they did not address questions of the validity of the resulting classifications. In a more extensive evaluation of LP indices obtained using the test-train-test administration of the WCST, Weingartz and colleagues (Weingartz et al., Reference Weingartz, Wiedl and Watzke2008) evaluated the test-retest stability, intercorrelations among, and construct validity of several continuous approaches to LP measurement, which included pre-post differences, gain scores, and regression residuals. Also examined was a categorical approach to LP classification, as well as raw pre- and post-training test values. These authors concluded that from among the indices examined, categorical LP classification, raw post-training test performance, and regression residuals had the highest stability and validity. Close examination of these indices, however, reveals low levels of one-year test-retest stability (0.32 to 0.51). Analyses ascertaining concurrent and prognostic validity (assessed with functioning scales and employment status) yielded significant but small (0.25 to 0.35) correlation coefficients for post-training test performance and regression residuals.
Both of these psychometric investigations relied on scores obtained from test-train-test versions of the WCST. An important characteristic of the WCST training is that it focuses on whether, and how quickly, the examinee learns the sorting principles. Once these sorting principles are known, they can be applied time and again to successive administrations of the same task. Particularly in cases where test-retest stability of LP is assessed, it is possible that WCST retest performance may be “spoiled” for individuals who learn the sorting principles during the initial LP training trial: the test may change from one of problem-solving ability to one measuring memory for test rules. In fact, there are data indicating that the construct validity of WCST performance changes as a result of training. This is demonstrated by changes in the pattern of intercorrelations between pre-training and post-training WCST performance and performance on other cognitive tasks, with a significant increase in the correlation between memory function and WCST performance following training (Wiedl, Schottke, Green, & Nuechterlein, Reference Wiedl, Schottke, Green and Nuechterlein2004). An alternative measure that has been used to assess learning potential is word list learning. These tests typically consist of multiple administrations of a word list, with instructions, additional practice, and feedback between the initial and final list recalls. While there is evidence of construct overlap between WCST-based LP and verbal memory assessed by list-learning tasks (Wiedl et al., Reference Wiedl, Schottke, Green and Nuechterlein2004), unlike with the WCST, there is no rule that, if known by the examinee, will result in perfect list-learning performance, and both pre- and post-training performance index the same cognitive domain, verbal learning and memory.
The aim of the current investigation was to examine the reliability and validity of LP indices based on a list-learning task. We evaluated several LP indices based on a test-train-test version of a list-learning task, in which the training phase focused on teaching the use of semantic grouping, an encoding strategy that is associated with greater list recall. Unlike the WCST LP administrations, different versions of the word lists were administered at pre-test, training, and post-test, so pre-post training differences are more apt to reflect the application of the trained strategy than simple practice effects associated with repeated exposure to the same stimuli. Like Weingartz and colleagues (Weingartz et al., Reference Weingartz, Wiedl and Watzke2008), our analyses focused on four psychometric issues: test-retest stability, intercorrelations of different LP indices, construct validity, and criterion validity.
METHODS
Participants
Participants in the study were 43 individuals with a DSM-IV (American Psychiatric Association & Task Force on DSM-IV, 1994) diagnosis of schizophrenia (n = 35) or schizoaffective disorder (n = 8), as assessed by the Structured Clinical Interview for DSM-IV, (SCID; First, Spitzer, Gibbon, & Williams, Reference First, Spitzer, Gibbon and Williams1996) administered by doctoral-level psychologists. All participants were enrolled in a larger, ongoing, randomized trial of cognitive remediation being conducted at a Veterans Administration Medical Center. For the current analyses, data were included only for individuals who completed the learning potential assessment at both study intake and at two-month follow-up. The study was approved by the local institutional review board (IRB), and all participants signed informed consent.
All participants met the following inclusion criteria: age between 18 and 65 years, no substance abuse in the past month, no known neurological condition that may affect cognition, no documented mental retardation, no medication changes in past month, and no housing changes in past month. The sample was 56% Caucasian and 75% male. Sixty-one percent of the participants had never been married. On average, age at illness onset was 22.08 (SD = 6.01), with age at study entry of 49.56 (SD = 8.47). Participants had completed an average of 12.53 (SD = 2.04) years of education and had a Wechsler Abbreviated Scale of Intelligence (WASI, Wechsler, Reference Wechsler1999) IQ estimate of 94.37 (SD = 14.94). They had had an average of 16.27 (SD = 25.82) hospitalizations, a Global Assessment of Functioning (GAF; American Psychiatric Association & Task Force on DSM-IV, 1994) score of 39.00 (SD = 6.38), and a Positive and Negative Syndrome Scale (PANSS; Kay, Fiszbein, & Opler, Reference Kay, Fiszbein and Opler1987) score of 54.19 (SD = 11.43).
Procedures
All participants in this sample of convenience completed comprehensive intake assessments consisting of psychiatric interviews and cognitive testing, after which they were randomized (1 to 2 ratio) to receive two months of either treatment as usual or cognitive remediation. Cognitive remediation consisted of up to 40 hours of both computerized and paper-and-pencil training, akin to that used in Neurocognitive Enhancement Therapy (NET; see Bell, Bryson, Greig, Corcoran, & Wexler, Reference Bell, Bryson, Greig, Corcoran and Wexler2001) and Cognitive Remediation Therapy (CRT; see Delahunty, Reeder, Wykes, Morice, & Newton, Reference Delahunty, Reeder, Wykes, Morice and Newton2001), respectively. At the end of the two-month active phase, all participants were again administered psychiatric interviews and cognitive testing.
Measures
Learning potential assessment
A modified test-train-test administration of the California Verbal Learning Test-II (CVLT-II; Delis, Kramer, Kaplan, & Ober, Reference Delis, Kramer, Kaplan and Ober2000) was used to assess learning potential. For details of this procedure, the reader is referred to Fiszdon and colleagues (Fiszdon et al., Reference Fiszdon, McClough, Silverstein, Bell, Jaramillo and Smith2006). Briefly, for the standard CVLT-II administration, the examinee is read and asked to recall a list of 16 words belonging to four semantic categories. Each correctly recalled word earns one point. The list is administered a total of five times, for a maximum score of 80. The LP-modified CVLT involves the administration of three different word lists (which we refer to as pre-training, training, and post-training lists). While the first administration uses standard procedures, the second list administration is preceded by a training component. The third CVLT list is again administered using standard procedures. Immediately prior to and during the administration of the second (training) list, participants were shown how using semantic grouping can lead to better recall, were asked to recall the specific list categories at the end of each recall trial, and were given corrective feedback about list categories as needed (training script available upon request from the first author, J.F.).
The following raw scores and learning potential indices were calculated based on trials 1–5 total scores (maximum of 80) for the pre-training and post-training list administrations: (1) pre-training score; (2) post-training score; (3) categorical LP index (“learner,” “nonlearner,” and “high-scorer” groups) based on pre-training performance and whether post-training performance is outside a 90% confidence interval set around a hypothetical parallel test score (for algorithm, refer to Schoettke et al., Reference Schoettke, Bartram, Wiedl, Hamers, Sijtsma and Ruijssenaars1993); (4) regression residual (obtained by regressing post-training performance on pre-training performance; see Weingartz et al., Reference Weingartz, Wiedl and Watzke2008 for algorithm); (5) post-training minus pre-training difference score; and (6) gain score (ratio of actual pre-post training improvement to maximum possible pre-post training improvement, with maximum possible pre-post improvement being 80 minus pre-training performance; for algorithm see Sergi et al., Reference Sergi, Kern, Mintz and Green2005). The modified CVLT-II was administered at study intake and at 2-month follow-up, and separate LP indices were computed for each time point.
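A minimal sketch of how these six scores might be computed from paired vectors of pre- and post-training totals is given below. The regression residual follows the residualized-change logic described above; the categorical rule uses illustrative placeholder cutoffs (`high_cut`, `rc_band`), which stand in for, and are not, the published reliable-change algorithm of Schoettke et al.

```python
import numpy as np

def lp_indices(pre, post, max_score=80, high_cut=60, rc_band=8):
    """Compute the six scores from paired pre-/post-training totals (0-80).

    high_cut and rc_band are hypothetical cutoffs standing in for the
    reliable-change criterion; they are not the published algorithm.
    """
    pre, post = np.asarray(pre, float), np.asarray(post, float)
    diff = post - pre                                    # (5) difference score
    gain = diff / (max_score - pre)                      # (6) gain score
    slope, intercept = np.polyfit(pre, post, 1)          # predict post from pre
    residual = post - (slope * pre + intercept)          # (4) regression residual
    category = np.where(pre >= high_cut, "high-scorer",  # (3) categorical index
               np.where(diff > rc_band, "learner", "nonlearner"))
    return {"pre": pre, "post": post, "diff": diff,
            "gain": gain, "residual": residual, "category": category}
```

Because the residual is computed from a regression fit to the sample itself, it is a relative index: a given examinee's residual depends on the pre-post relationship in the rest of the sample, unlike the difference and gain scores.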
Construct validity measure
The Wisconsin Card Sorting Test (WCST; Heaton, Reference Heaton1981) is a neuropsychological test of abstract problem solving that has previously been adapted to LP assessment and has been shown to share variance with list-learning performance (Wiedl et al., Reference Wiedl, Schottke, Green and Nuechterlein2004). The WCST was administered at intake and used in this analysis as a measure of concurrent validity of the learning construct. Standard scores for two of the more common indices, percent errors and percent perseverative errors, were used in the current analyses.
Criterion validity measures
The Quality of Life Scale (QLS; Heinrichs, Hanlon, & Carpenter, Jr., Reference Heinrichs, Hanlon and Carpenter1984) was administered at intake and at 2-month follow-up and was used to assess concurrent and predictive validity of the LP measures in relation to global functioning. The QLS is a well-validated scale consisting of 21 items organized into four domains: intrapsychic foundations, interpersonal relations, instrumental role function, and common objects and activities. Likert-type (0–6) ratings are made following a clinician-administered interview, with higher total scores indicating better overall functioning. All ratings were made by the first author (J.F.), who was blind to learning potential classification at the time of interview.
Data Analysis
All data were analyzed using the Statistical Package for the Social Sciences (SPSS 17). To determine test-retest stability, bivariate correlations were computed between intake and 2-month LP indices. To determine whether the main study intervention affected the measurement of LP at 2-month follow-up, test-retest analyses for the LP indices were repeated, controlling for the number of cognitive remediation sessions. Next, intercorrelations between the LP indices were computed. Construct validity of LP indices was evaluated using correlations of each LP index with WCST performance. Concurrent and predictive criterion validity was assessed by correlating intake LP indices with intake and 2-month QLS scores. For the categorical LP index variable, all correlations were computed using Kendall’s tau-b. For the remaining variables, Pearson correlations were used. Differences in the strength of correlations between each LP index and the construct and criterion validity measures (WCST, QLS) were tested using William’s T2 statistic for dependent correlations (Steiger, Reference Steiger1980). Using William’s formula, raw correlation coefficients obtained for each LP index were entered in a pair-wise contrast with the pre-training CVLT score. In this manner we sought to determine whether LP administration of the CVLT provides a stronger measure of association than standard administration, and, if so, which of the LP indices have superior predictive power. These tests were conducted using alpha = .05, two-tailed.
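The pair-wise contrasts described above can be sketched as a straightforward transcription of William's T2 formula for two dependent correlations that share one variable (Steiger, Reference Steiger1980). The correlation values in the usage example are hypothetical, chosen only to illustrate the calculation; the study's actual coefficients appear in Table 2.

```python
import math

def williams_t2(r_jk, r_jh, r_kh, n):
    """Compare dependent correlations r_jk vs. r_jh (variable j shared).

    r_kh is the correlation between the two predictors k and h; the
    statistic is referred to a t distribution with n - 3 df.
    """
    det = 1 - r_jk**2 - r_jh**2 - r_kh**2 + 2 * r_jk * r_jh * r_kh
    rbar = (r_jk + r_jh) / 2
    num = (r_jk - r_jh) * math.sqrt((n - 1) * (1 + r_kh))
    den = math.sqrt(2 * det * (n - 1) / (n - 3) + rbar**2 * (1 - r_kh) ** 3)
    return num / den

# Hypothetical example: r(criterion, LP index) = .45 vs. r(criterion, pre) = .10,
# with the two predictors correlated at .70, in a sample of n = 43.
t = williams_t2(0.45, 0.10, 0.70, 43)
```

As noted in the Discussion, the statistic incorporates the predictor intercorrelation r_kh, so its sensitivity differs for highly correlated predictors (pre- and post-training scores) versus uncorrelated ones (pre-training score and regression residual).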
RESULTS
Results for bivariate test-retest correlations were strikingly similar to results for partial correlations that controlled for the number of cognitive remediation sessions (see Table 1). Because we found no indication that the reliability of LP measures was affected by cognitive remediation in this sample, partial correlations are not reported, and all subsequent analyses were conducted without controlling for the number of cognitive remediation sessions. Data on test-retest reliability, intercorrelations among intake LP indices, construct and criterion validity of the LP indices are reported in Table 2.
Table 1. Intake to 2-month test-retest correlations
Note
Correlation coefficients of bivariate analyses with statistical significance indicated as follows: *p ≤ .05, **p ≤ .01, ***p ≤ .001.
Table 2. Test-retest correlations, intercorrelations, and construct validity of learning potential (LP) indices
Note
Correlation coefficients of bivariate analyses with statistical significance indicated as follows: + p ≤ .10, *p ≤ .05, ** p ≤ .01, *** p ≤ .001. Test-retest reliability for each score is analyzed in CVLT LP administrations conducted at intake and 2-month follow-up. Intercorrelations among LP indices were computed across intake data only. Concurrent validity was assessed in relation to Wisconsin Card Sort Test (WCST) % errors and % perseverative errors and Quality of Life Scale (QLS) ratings at intake. Predictive validity was assessed in correlations between intake LP indices and QLS ratings conducted at 2-month follow up.
1 Kendall’s tau b used for correlations with the categorical LP measure, Pearson r used for all other analyses.
Test-Retest Reliability
Correlations between intake and 2-month LP indices ranged from r = .105 to r = .749, with the highest values for the pre-training and post-training scores, significant, moderately sized correlations for the categorical LP index, regression residual, and pre-post difference score, and a small, nonsignificant correlation for the gain score.
Intercorrelations Among LP Indices
Post-training scores correlated significantly with all other LP indices with the exception of gain scores. Regression residuals correlated significantly with all other LP indices with the exception of pre-training score. Regression residuals, pre-post difference scores, and gain scores were highly correlated with each other.
Construct Validity
Construct validity was assessed in relation to cognition (WCST % errors and % perseverative errors). Only pre-training score, post-training score, and categorical LP index correlated significantly with WCST performance. A statistical comparison of the strength of these correlations determined that none of the LP indices produced coefficients that differed significantly from pre-training performance. Trend level effects were detected for the gain score method, indicating weaker association with both WCST % errors, t(40) = –1.79, p < .10, and WCST % perseverative errors, t(40) = –1.87, p < .10, than pre-training score.
Concurrent Criterion Validity
In contrast to the pattern of results for WCST scores, each of the LP indices correlated significantly with intake QLS, whereas the pre-training score showed only a small, nonsignificant correlation. Compared statistically, only one of these LP indices, post-training score, produced a correlation with intake QLS that was significantly higher, t(40) = 3.04, p < .01, than that of pre-training score. A nonsignificant trend in the same direction was detected for the regression residual method, t(40) = 1.88, p < .10.
Predictive Criterion Validity
This was assessed in relation to QLS ratings conducted 2 months following intake LP assessment. Across pre-training and LP indices, the pattern of correlations was similar to that observed with intake QLS, with the exception that the correlation between categorical LP grouping and QLS dropped to trend level. Compared statistically, only post-training score produced a correlation with 2-month QLS that was significantly higher, t(40) = 2.39, p < .05, than that of pre-training score.
DISCUSSION
The aim of the current study was to evaluate the reliability and validity of several LP indices, as applied to test-train-test administrations of a list-learning measure, the CVLT-II. With regard to test-retest reliability, our results indicate that raw CVLT test scores (pre-training score and post-training score) have high 2-month test-retest reliability (.70 range). In contrast, test-retest reliability of LP measures based on these scores is variable and modest overall, with coefficients ranging from r = .34 to r = .61 for the categorical LP index, regression residual, and pre-post difference score methods. A final method, the gain score, had the lowest test-retest reliability, with a coefficient only marginally exceeding 0. Given that all LP scores were computed from the same pre- and post-training CVLT scores, which themselves appeared to be highly reliable under repeat administration, this analysis demonstrates how the basic psychometric properties of LP scores can be altered by their computational method. In line with the well-known adage that reliability is a necessary but not sufficient condition for validity, our results suggest that gain scores are less than optimal indices of LP. However, we recognize that reliability can vary widely across samples, and our findings require replication in larger samples before definitive conclusions can be drawn about the usefulness of this particular LP index. Additionally, any conclusions based on these reliability coefficients assume that learning potential is a stable construct that can be assessed reliably over multiple dynamic test administrations, an assumption that has been questioned by some (Weingartz et al., Reference Weingartz, Wiedl and Watzke2008). Specifically, it has been suggested that LP may have state-like characteristics and may be influenced by variables such as psychopathology. If this were the case, one would not expect high test-retest correlations.
However, given that, from among all of the LP indices studied, only gain scores failed to show significant test-retest correlations, it is more likely that their instability reflects the psychometric properties of this particular method of LP calculation.
In the absence of a gold standard with which to correlate the various LP measures, it is difficult to draw conclusions about validity based on the pattern of intercorrelations, as they only indicate similarities between classifications, and not necessarily the degree of overlap with the latent construct of interest. It is of interest, however, that post-training performance most consistently correlates with all the other LP indices, suggesting that it best taps the shared variance of the measures and has the highest overlap with the construct being measured by the other indices. Also noteworthy is the high level of agreement among gain scores, pre-post training difference scores, and regression residuals. Given that all three of these measures are based on pre- and post-training performance, this high degree of relationship could be due to a measurement (psychometric) artifact, with high correlations due to shared measurement error across the measures (Linn & Slinde, Reference Linn and Slinde1977).
Concerning the construct and criterion validity of LP assessment, several patterns of association between LP indices and independent measures of neurocognitive (WCST) and community (QLS) functioning are notable. First, among the LP indices, only post-training performance and the categorical LP index correlated significantly with WCST performance. Because the WCST is a measure of abstract problem-solving ability in which the examinee can improve performance by using corrective feedback, one would expect overlap between WCST scores and learning capacity. Furthermore, correlations between pre-training CVLT score and both WCST scores were significant, suggesting some common variance in the neurocognitive domains assessed by these tests. However, no appreciable relationship with WCST performance was observed for LP indices computed by the regression residual, pre-post difference, or gain score methods. For our criterion validity measure, intake QLS, the use of LP indices did appear to add unique and meaningful variance in predicting functioning over the standard (pre-training) CVLT administration, a finding similar to that reported by Weingartz and colleagues (Weingartz et al., Reference Weingartz, Wiedl and Watzke2008). Correlations between intake QLS and post-training score, categorical LP classification, pre-post training difference score, regression residual, and gain score were all statistically significant, while the correlation between QLS and pre-training CVLT score was not. A very similar pattern of correlations with 2-month QLS scores suggests that the LP indices may improve the predictive power of functional outcome assessment over conventional CVLT administration. Taken together, these findings show a consistent pattern of association for post-training performance and the categorical LP index, while regression residuals, pre-post difference scores, and gain scores correlate only with QLS.
Given the patterns of relationships observed across the examinations of test-retest reliability, interrelatedness of LP measures, and association with independent measures of learning (i.e., WCST) and functioning, several conclusions can be drawn about the strengths and weaknesses of different methods of computing LP. Because test-retest reliability was remarkably poor for gain scores, we are cautious in suggesting this computational approach as a preferred measure of LP. While test-retest reliability was relatively high for the simple pre-post difference score, this method is also not without limitations. Since difference scores correlate negatively with pre-training scores, high difference scores are more apt to be obtained when pre-training performance is low (Linn & Slinde, Reference Linn and Slinde1977), an effect further confounded by possible regression to the mean (Klauer, Reference Klauer, Hamers, Sijtsma and Ruijssenaars1993; Schoettke et al., Reference Schoettke, Bartram, Wiedl, Hamers, Sijtsma and Ruijssenaars1993). There are also scaling issues, as difference scores do not necessarily have the same distributional characteristics as the raw pre- and post-training scores from which they are derived (Schoettke et al., Reference Schoettke, Bartram, Wiedl, Hamers, Sijtsma and Ruijssenaars1993). Although these issues are not completely mitigated by regression residuals or categorical LP classification, these methods are positively correlated with pre-training performance and have a relatively stronger association with post-training performance, suggesting that they are less dependent on initial ability and may better capture the capacity to benefit from the instructions provided during LP assessment.
Interestingly, the categorical LP index and regression residuals differed in their patterns of association with WCST and QLS validity measures, wherein the categorical LP classification correlated significantly with WCST and intake QLS, while regression residual correlated with intake and 2-month QLS, but not WCST. Therefore, computational differences between these approaches appear to determine the strength of relationship with other cognitive and functional outcome measures, a somewhat undesirable characteristic.
Considering together the test-retest stability, pattern of interrelationships with other LP scores, and correlations with validity measures, post-training performance appears to confer some advantages over other LP indices. Indeed, significance testing of the strength of correlation between LP indices and QLS ratings indicated that only the post-training score produced higher coefficients than would otherwise be obtained using a standard CVLT administration. However, it should be noted that post-training score did not produce the highest bivariate correlations with QLS, and that this finding may be influenced by the intercorrelation of pre-training with post-training score. Although the William’s T2 statistic is recommended as a preferred approach for contrasting the strength of dependent correlations (Steiger, Reference Steiger1980), this method takes into account the intercorrelation between predictors, and thus may be more sensitive for contrasts involving highly related predictors, such as pre-training and post-training score, and less sensitive when predictors share a zero-order correlation, as in the case of the regression residual. Furthermore, whether or not post-training performance is the preferred measure depends on exactly what the investigator wishes to capture – while the optimized post-training measure does provide an index of how well an individual can perform (which in turn is related to functioning), by not incorporating pre-testing performance, it fails to assess the amount of learning that has occurred. While strictly speaking, optimized scores do not index one’s ability to learn (hence, do not index learning potential per se), they do provide a relatively stable index of potential performance, which, as suggested by some (Woonings et al., Reference Woonings, Appelo, Kluiter, Slooff and van den Bosch2003), may be the best indicator of ability and the strongest predictor of functional outcomes.
ACKNOWLEDGMENTS
This work was supported in part by a grant from Veterans Administration Rehabilitation Research and Development (#D2356-R to J.F.).