Effects of using same- versus alternate-form memory tests during short-interval repeated assessments in multiple sclerosis

Published online by Cambridge University Press: 21 October 2005

RALPH H. B. BENEDICT
Affiliation:
State University of New York (SUNY) at Buffalo School of Medicine, Department of Neurology, and the Jacobs Neurological Institute, Buffalo General Hospital, Buffalo, New York

Abstract

Repeated neuropsychological testing gives rise to practice effects in that patients become familiar with test material as well as test-taking procedures. Using alternate forms prevents the learning of specific test stimuli, potentially mitigating practice effects. However, changing forms could diminish test-retest reliability coefficients. Our objective was to examine test-retest effects in multiple sclerosis (MS) patients randomly assigned to same- (SF) or alternate-form (AF) conditions. Thirty-four MS patients underwent neuropsychological evaluation. The battery included the California Verbal Learning Test II (CVLT-II) and the Brief Visuospatial Memory Test–Revised (BVMT-R), memory tests recommended by a recently convened consensus panel. Patients were randomly assigned to SF or AF groups and then tested at baseline and follow-up examination 1 week later. Analysis of variance tests (ANOVAs) revealed significant group × time interactions, with SF patients showing greater gain than AF patients. SF practice effects were often large, compromising test validity. Reliability coefficients were either equivalent or higher in the AF group, a finding attributed to ceiling effects and reduced variance in the SF group at retest. The generalizability of the findings may be limited to short test-retest intervals and the MS population. Nevertheless, I conclude that the use of CVLT-II and BVMT-R alternate forms likely helps preserve test validity without compromising test-retest reliability. (JINS, 2005, 11, 727–736.)

Research Article

© 2005 The International Neuropsychological Society

INTRODUCTION

Repeated neuropsychological (NP) testing has become commonplace in recent years. Increasingly, tests are used to establish a baseline against which the effects of neurologic disease (Heaton et al., 1995; Amato et al., 2001; DeCarli et al., 2004; Green et al., 2004), trauma (Iverson et al., 2003; Lovell et al., 2004), or treatment (Chelune et al., 1993; Collie et al., 2002; Krupp et al., 2004) may be assessed. There are, in turn, efforts to develop NP tests with good test-retest reliability and alternate forms (Benedict et al., 1996; Woodard et al., 1996; Benedict et al., 1998) so that testing enhances the power to detect reliable change (Temkin et al., 1999; Heaton et al., 2001; Iverson et al., 2003).

Practice effects are particularly salient when patients are asked to learn a problem-solving strategy or remember new information. There may be two distinct sources of learning across testing sessions. First, as with almost any cognitive test, patients are apt to develop test-taking strategies with repeated exposures to the same procedure. We (Benedict & Zgaljardic, 1998; Zgaljardic & Benedict, 2001) previously referred to this as “test-specific” practice. In contrast, “item-specific” practice refers to the learning of actual content (e.g., a word list) from one administration of the test to the next. Whereas test-specific practice is unavoidable in serial NP assessment, item-specific practice can be mitigated via the use of alternate test forms.

In our previous review of studies involving healthy subjects (Benedict & Zgaljardic, 1998), we compared effect sizes from studies of repeated memory testing in which same or alternate test forms were employed at retest. We calculated Cohen's (1988) d and found marked practice effects in studies repeating the same test forms (McCaffrey et al., 1992, 1993, 1995; Rapport et al., 1997). In contrast, studies using alternate forms generated much lower d values (Parker et al., 1995; Benedict et al., 1996). Surprisingly few studies had directly compared the effects of using same or alternate forms in the same sample. An exception was a study in which two forms of the Rey Auditory Verbal Learning Test (Rey, 1964) were compared (Crawford et al., 1989). Subjects were randomly assigned to testing with one of two test forms. A between-group baseline comparison indicated that the forms were of equivalent difficulty. Participants were then randomly reassigned to same- or alternate-form conditions. After 27 days, significant improvement was found for the same-form group on all measures, with mostly large effects (d range = 0.4 to 1.3). By comparison, significant practice effects were not apparent in the alternate-form group.

In our study (Benedict & Zgaljardic, 1998), we assessed 30 healthy volunteers divided into same- and alternate-form groups, matched on age, education, and baseline memory test performance. The tests employed were the Hopkins Verbal Learning Test–Revised (Brandt & Benedict, 2001) and the Brief Visuospatial Memory Test–Revised (Benedict, 1997). Participants tested with the same form every two weeks improved significantly over four sessions, whereas those completing alternate forms produced either small or nonsignificant practice effects. Taken together, these studies of normal volunteers show that in the domain of memory, practice effects are significantly attenuated when alternate forms are employed.

Practice effects have rarely been examined systematically in patient samples. Hawkins and Wexler (1999) administered the California Verbal Learning Test (CVLT; Delis et al., 1987) to 20 schizophrenia patients on three occasions: baseline, 10 weeks, and 14 weeks. As anticipated, statistically significant practice effects were observed on multiple measures. Baseline to week 14 effect sizes (d) ranged from 0.6 on Trial 5 to 1.0 on Delayed Recall. A larger and older (mean age = 59.4 years) schizophrenia sample was studied by Harvey et al. (2005), who used a large battery of tests spanning multiple cognitive domains. There were two assessments, with an 8-week test-retest interval. Of the 17 tests administered, only two had alternate forms. Significant improvement was recorded on three tests, and in each case the effects were small. The authors concluded that NP performance of older schizophrenia patients is stable over 8 weeks. Practice effects have also been documented in HIV samples. McCaffrey and colleagues (1995) found significant practice effects on the CVLT and the Paced Auditory Serial Addition Test (Gronwall, 1977). More recently, in 26 HIV+ symptomatic and 33 HIV+ asymptomatic patients, significant gain was observed on multiple CVLT indices over an interval of roughly 16 days (Duff et al., 2001). Altogether, these studies show that neurological and psychiatric patients, like healthy volunteers, often demonstrate significant practice effects, but that older patients with chronic neuropsychiatric illness may be less susceptible to these effects. To the best of our knowledge, however, no study has yet compared the effects of same versus alternate forms within the same sample of neurological or psychiatric patients.

Reliability is another issue to be weighed in considering the costs and benefits of alternate forms. Even if the equivalence of alternate forms is well established, test-retest reliability may be attenuated from one test session to another, thereby increasing error in statistical analysis and hindering estimates of reliable change in performance (Chelune et al., 1991; Jacobson & Truax, 1991; Sawrie et al., 1996). In the studies with clinical patients cited earlier, test-retest reliability coefficients were generally good. For example, in Harvey et al.'s schizophrenia study, correlations ranged from r = .52 to .93 (Harvey et al., 2005). In the more recent HIV work (Duff et al., 2001), reliability coefficients for CVLT recall measures for asymptomatic patients ranged from .52 to .76. It is conceivable that using alternate forms in longitudinal studies could increase error, thus lowering reliability coefficients and widening reliable change intervals.

The population of interest in this study is multiple sclerosis (MS). Roughly half of all MS patients are cognitively impaired (Rao et al., 1991; Heaton, 1985) and a higher frequency of impairment may exist among clinic attendees (Medaer et al., 1984; Rao et al., 1984). Deficits in processing speed and working memory are common (Franklin et al., 1988; Litvan et al., 1988; Beatty et al., 1989; Rao et al., 1989; DeLuca et al., 1993; Kujala, 1994; Camp et al., 1999; Demaree et al., 1999; Archibald & Fisk, 2000), as are impairments in new learning and memory (Grant et al., 1984; Rao et al., 1984; Fischer, 1988; DeLuca et al., 1994; Beatty, 1996; DeLuca et al., 1998). In recent years, medications that alter disease activity have significantly diminished the frequency of relapses in MS and delayed the onset of physical disability (Paty et al., 1993; Johnson et al., 1995; Jacobs et al., 1996). In some cases, interferon beta-1a has significantly delayed the onset of neuropsychological impairment (Fischer et al., 2000). In addition, donepezil was recently found to enhance memory function in MS patients (Krupp et al., 2004). As a result of such treatment successes, demand for serial NP testing in MS has increased, leading a consensus panel (Benedict et al., 2002) to conclude that minimal standards for NP testing in MS should emphasize alternate forms, as well as reliability and discriminative validity. This group acknowledged, however, that little is known about the stability of NP testing when alternate forms are used in clinical trials.

In summary, practice effects are problematic in NP assessment, and it appears that serial examinations over short time intervals are increasingly called for in studies concerning the natural history of cognitive impairment in neurologic disease and clinical trials. Memory tests are almost always included in NP assessments, yet the effects of using same/alternate forms of memory tests have not been compared in a neurologic or psychiatric sample. This study, therefore, was designed to assess test-retest effects in an MS sample, the general hypothesis being that alternate forms would protect against marked practice effects, and that the stability of NP testing would not be adversely affected by introducing alternate forms.

METHODS

Research Participants

Included were 34 MS patients who underwent repeat NP testing while participating in a study of the psychometric properties of a recently developed screening questionnaire (Benedict et al., 2004a). All were attending one of four MS clinics (Baird MS Center, Buffalo, NY; University of California at San Francisco; University of Colorado Health Sciences Center, Denver, CO; Gimbel Center at Teaneck, NJ). Inclusion criteria were (a) diagnosis of clinically definite MS (McDonald et al., 2001), (b) an informant having contact with the patient at least three times per week, (c) age 18 or older, (d) fluency in English, and (e) ability to provide informed consent to all procedures. Exclusion criteria were (a) neurological disorder other than MS, (b) psychiatric disorder (American Psychiatric Association, 1994) other than mood, personality, or behavior change following the onset of MS, (c) other medical condition that might influence cognition, (d) history of developmental disorder (e.g., ADHD, learning disability), (e) history of substance or alcohol dependence, (f) current substance abuse, (g) motor or sensory defect that might interfere with cognitive test performance, and (h) relapse and/or corticosteroid pulse within four weeks of assessment. All participants signed informed consent forms approved by institutional review boards prior to participating in the study.

Mean age (±SD) was 42.2 ± 8.9 years. On average, patients had completed 15.4 ± 2.5 years of education. The majority were Caucasian (n = 31 or 91%) and female (n = 28 or 82%), consistent with the MS population, which is primarily female (Jacobs et al., 1999). Scores on the Expanded Disability Status Scale (Kurtzke, 1983), which assesses neurologic/physical disability, were available for 28 patients; the mean was 2.5 ± 2.0, representing mild/moderate impairment. Most patients (n = 27 or 79%) had a relapsing-remitting rather than a progressive (4 secondary progressive, 3 primary progressive) course.

Materials

Each patient underwent NP testing in accordance with recent consensus panel recommendations (Benedict et al., 2002). This battery, known as the Minimal Assessment of Cognitive Function in MS (MACFIMS), recommends the use of alternate forms where possible, and includes the following tests with alternate forms: California Verbal Learning Test, second edition (CVLT-II; Delis et al., 2000), Brief Visuospatial Memory Test–Revised (BVMT-R; Benedict, 1997), Paced Auditory Serial Addition Test (PASAT; Gronwall, 1977), Controlled Oral Word Association Test (COWAT; Benton et al., 1994), and the Sorting Test from the Delis-Kaplan Executive Function System (Delis et al., 2001). The Sorting Test was excluded from the analysis because it was not administered uniformly (the cued sorting condition was not given to all subjects). Same/alternate form comparisons were thus restricted to four tests, two emphasizing memory (CVLT-II, BVMT-R) and two emphasizing processing speed or executive control (PASAT, COWAT).

The CVLT-II and BVMT-R are both learning and memory tests, with similar formats. Both present new material that is recalled immediately after each presentation. There is a 25-minute interval following the final learning trial, after which participants are asked to recall the information again without further exposure to the to-be-learned material. Delayed recall is followed by a yes/no, forced-choice recognition task. The CVLT-II comprises five learning trials. Examiners read 16 words and ask participants to repeat as many words as possible; the entire word list is repeated on each trial. For the BVMT-R, the stimulus material is a matrix of 6 visual designs, held before the participant for 10 seconds. Participants are asked to render the designs using paper and pencil, taking as much time as needed for reproduction. Each design receives a score of 0, 1, or 2, based on accuracy and location scoring criteria. There are three free-recall trials, each preceded by stimulus exposure. In this study, we considered the following measures for each test: trial 1 recall (Trial 1), final trial recall (CVLT-II Trial 5; BVMT-R Trial 3), total recall over all learning trials (Total Learning), recall after the delay interval (Delayed Recall), and the recognition discrimination index recommended in the test manuals, administered after the delay interval (Delayed Recognition). Z scores were calculated using a previously published (Benedict et al., 2004a) normative sample (n = 40; mean age = 40.0 ± 9.1; mean education = 14.9 ± 2.0; 70% female), and the means ranged from −1.2 ± 1.5 for BVMT-R Delayed Recall to −0.5 ± 1.4 for CVLT-II Total Learning.
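
For readers standardizing scores in the same way, the calculation is the usual one: subtract the normative mean from the raw score and divide by the normative SD. The following is a minimal sketch; the numeric values in it are placeholders for illustration, not the published norms.

```python
def z_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Standardize a raw test score against a normative sample."""
    return (raw - norm_mean) / norm_sd

# Hypothetical values for illustration only; the actual norms are those
# of the Benedict et al. (2004a) sample described above.
print(round(z_score(raw=45.0, norm_mean=52.0, norm_sd=9.5), 2))  # -0.74
```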

The PASAT and COWAT served as executive control tasks. We employed Rao's adaptation (Rao, 1991) of the PASAT, which includes 60 trials presented at interstimulus intervals of 3 and 2 seconds. This version was selected to be a component of the MS Functional Composite (MSFC), a clinical outcome measure composed of quantitative measures of leg, arm/hand, and cognitive function (Cutter et al., 1999; Fischer et al., 1999). The dependent measure was the mean number of correct responses from the two trials. The COWAT was administered in the standard manner, following the method of Arthur Benton (Benton et al., 1994). In successive one-minute trials, participants generated as many words as possible, beginning with each of three designated letters. The dependent measure was the total number of correct words over the three trials.

The test battery also included the Symbol Digit Modalities Test (SDMT; Smith, 1982), which measures working memory and processing speed. It presents a series of nine symbols, each of which is paired with a single digit in a key at the top of an 8.5 × 11 inch sheet. The remainder of the page presents a pseudo-randomized sequence of symbols. Participants respond by voicing the digit associated with each symbol as quickly as possible. As recommended by the MACFIMS panel (Benedict et al., 2002), we employed only the oral-response administration. The SDMT is strongly correlated with whole-brain atrophy in MS (Christodoulou et al., 2003; Benedict et al., 2004b). The alternate forms available are probably not equivalent (Boringa et al., 2001). Thus, only the standard form was used in this study. The data are nevertheless presented for descriptive purposes and to determine if the degree of practice was the same across groups.

Finally, patients also completed the Beck Depression Inventory–Fast Screen for Medical Patients (BDI-FS; Beck et al., 2000), which was recently validated in this population (Benedict et al., 2003).

Procedure

Participants were contacted by mail or approached during the course of their usual clinical care at an MS center. The tests were administered individually by a trained assistant or graduate student working under the supervision of a licensed psychologist. The entire test battery, requiring approximately 90 minutes, was repeated 6–8 days after the baseline examination. Participants were randomly assigned, in sequence, to same-form (SF) or alternate-form (AF) conditions. The former underwent NP testing using the same forms on each occasion, whereas the AF group was tested with an alternate version at retest. The CVLT-II, PASAT, and COWAT each have two equivalent forms (Rao, 1991; Benton et al., 1994; Ruff et al., 1996; Delis et al., 2000). For the BVMT-R, we employed the equivalent Forms 1 and 4 (Benedict et al., 1996). At baseline, all patients were examined with the CVLT-II Standard Form, BVMT-R Form 1, PASAT Form A, and COWAT Form CFL. At retest, only the AF group was examined with alternate forms (CVLT-II Alternate Form, BVMT-R Form 4, PASAT Form B, COWAT Form PRW).

Analysis

Pearson correlations, analyses of variance (ANOVA), and chi-square tests were used to examine associations and between-group effects. Reliability coefficients were compared using the Fisher r-to-z transformation. The primary analysis approach was mixed-factor ANOVA, with group (SF, AF) serving as the between-groups factor, and time (test, retest) serving as the repeated factor. The hypothesis that using alternate forms differentially affects retest performance was tested by the group × time interaction. The dependent variables included total correct indices from the SDMT, PASAT, and COWAT, and the following were derived from the CVLT-II and BVMT-R: Trial 1, Trial 5/3, Total Learning, Delayed Recall, Delayed Recognition. For the memory tests, we also examined the pattern of learning and forgetting via 3 × 2 repeated measures ANOVAs. Here, the within-subjects effects were trial (Trial 1, Trial 5/3, Delayed Recall) and time (test, retest). Post-hoc comparisons were accomplished via t tests. Throughout, alpha was set at p < .05. For descriptive purposes, we examined effect sizes using Cohen's d statistic, which is the difference between means divided by the pooled standard deviation (SD).
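
For readers who wish to reproduce this analytic logic, the sketch below uses synthetic placeholder data, not the study data. It exploits the fact that, in a 2 (group) × 2 (time) design, the group × time interaction F test is equivalent to an independent-samples t test on the retest-minus-test difference scores (F = t²), and it implements Cohen's d exactly as defined above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic data: 17 patients per condition, tested twice, with a
# simulated practice effect built into the same-form (SF) group only.
sf_test = rng.normal(50, 10, 17)
sf_retest = sf_test + rng.normal(5, 4, 17)   # gain at retest (SF)
af_test = rng.normal(50, 10, 17)
af_retest = af_test + rng.normal(0, 4, 17)   # no gain at retest (AF)

# Group x time interaction, tested via the groups' difference scores.
t, p = stats.ttest_ind(sf_retest - sf_test, af_retest - af_test)
print(f"interaction-equivalent t test: t = {t:.2f}, p = {p:.4f}")

def cohens_d(x, y):
    """Cohen's d: difference between means divided by the pooled SD."""
    nx, ny = len(x), len(y)
    pooled = np.sqrt(((nx - 1) * x.var(ddof=1) + (ny - 1) * y.var(ddof=1))
                     / (nx + ny - 2))
    return (y.mean() - x.mean()) / pooled

print(f"SF test-retest gain: d = {cohens_d(sf_test, sf_retest):.2f}")
```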

RESULTS

The SF and AF groups were well matched on demographics, disease features, and BDI-FS scores as demonstrated by nonsignificant univariate comparisons. The groups were also matched on baseline CVLT-II, BVMT-R, PASAT, and COWAT performance (all p values > .20).

Symbol Digit Modalities Test

The test-retest reliability coefficients were robust and similar for both groups (SF r = .98, p < .001; AF r = .97, p < .001). The performance of the SF and AF groups on the SDMT is presented in Figure 1. The ANOVA showed a significant effect for time [F(1,32) = 23.1, p < .001], but the group and group × time interaction effects were not significant. Collectively, the sample improved from a raw score of 51.9 ± 16.0 to 55.4 ± 17.4, a small effect of d = 0.2.

Fig. 1. Mean raw scores produced by the SF and AF groups on the Symbol Digit Modalities Test (SDMT). ANOVA reveals a significant gain (practice effect) in both groups, but no difference in overall performance or degree of gain.

Auditory/Verbal Memory

CVLT-II data are presented in Table 1. The mixed-factor ANOVAs revealed significant interaction effects for all measures. In every analysis, the SF group showed significantly higher scores on retest compared to baseline [T1 t(16) = −3.9, p < .01; T5 t(16) = −2.4, p < .05; Total Learning t(16) = −4.1, p < .01; Delayed Recall t(16) = −4.2, p < .01; Delayed Recognition t(16) = −3.6, p < .01], whereas there were no significant test-retest effects in the AF group. Analysis of effect sizes revealed that SF d's ranged from 0.5 to 1.0 and AF effects ranged from −0.1 to 0.1.

Table 1. Within-group data and interaction effects for the CVLT-II

Figure 2 describes the data as analyzed by the trial × time repeated measures ANOVA. The SF group (Fig. 2a) showed marked change over the test-retest interval, as demonstrated by a significant interaction [F(2,15) = 9.5, p < .01]. As noted earlier, t tests showed significant effects of time at Trial 1, Trial 5, and Delayed Recall, with higher scores being observed at retest. In an effort to further explain the interaction, Trial 5 and Delayed Recall scores were also compared. At baseline, there was a significant difference between the scores with the higher value being associated with Trial 5 [t(16) = 4.1, p < .01]. In contrast, the Trial 5/Delayed Recall comparison was not significant at retest [t(16) = 0.0].

Fig. 2. Raw scores produced by the SF and AF groups on the California Verbal Learning Test–II. Shown are the mean numbers of words recalled on Trial 1, Trial 5, and after the 25-min delay interval. For the same-form group (Fig. 2a), the repeated measures trial × time ANOVA reveals a significant interaction. Post hoc tests reveal a decline in performance from Trial 5 to Delayed Recall at baseline, but not at retest. For the alternate-form condition (Fig. 2b), the ANOVA reveals only a main effect of trial, indicating that performance was not affected by retesting.

The AF repeated measures ANOVA (Fig. 2b) revealed only a significant trial main effect [F(2,15) = 89.15, p < .001].

Visual/Spatial Memory

Similar findings emerged for the BVMT-R (Table 2). In all of the ANOVAs but one, there was again a significant interaction effect best explained by significant gain in the SF group [T1 t(16) = −6.5, p < .001; T3 t(16) = −2.7, p < .05; Total Learning t(16) = −6.1, p < .001; Delayed Recall t(16) = −2.9, p < .05]. This time, SF d's ranged from 0.7 to 1.6, whereas all AF effect sizes were again small (−0.2 to 0.2). There were no significant within-subjects effects in the AF group. For BVMT-R recognition, the ANOVA revealed no significant main or interaction effect.

Table 2. Within-group data and interaction effects for the BVMT-R

The BVMT-R across trial data are shown in Figure 3. Again, the SF group (Fig. 3a) showed marked change over the test-retest interval, as demonstrated by trial × time interaction [F(2,15) = 4.7, p < .05]. As noted earlier, there were significant test-retest effects at Trial 1, Trial 3, and Delayed Recall. When comparing Trial 3 and Delayed Recall scores, we observed a trend toward a significant difference at test [t(16) = 1.9, p = .07] and no effect at retest [t(16) = 1.7].

Fig. 3. Raw scores produced by the SF and AF groups on the Brief Visuospatial Memory Test–Revised. Shown are the mean recall scores (range 0–12) on Trial 1, Trial 3, and after the 25-min delay interval. For the same-form group (Fig. 3a), the repeated measures trial × time ANOVA reveals a significant interaction. Post hoc tests reveal a trend toward a decline in performance from Trial 3 to Delayed Recall at baseline, but not at retest. For the alternate-form condition (Fig. 3b), the ANOVA reveals only a main effect of trial, indicating that performance was not affected by retesting.

As was the case with the CVLT-II, the AF ANOVA showed only a significant trial main effect [F(2,15) = 49.7, p < .001].

Executive Control Tests

The interaction effects (Table 3) were not significant for either the PASAT [F(1,32) = 0.1] or COWAT [F(1,32) = 0.2]. There were no significant group effects, but both ANOVAs revealed significant effects of time [PASAT F(1,32) = 19.9, p < .001; COWAT F(1,32) = 8.8, p < .01].

Table 3. Within-group data and interaction effects for the PASAT and COWAT

Reliability

Same/alternate form pairings of test-retest reliability coefficients were analyzed for statistical significance using the Fisher r-to-z transformation. Significant differences were found on four measures: CVLT-II Delayed Recall [SF r = 0.54, AF r = 0.89, z = −2.2, p < .05], BVMT-R Trial 1 [SF r = 0.47, AF r = 0.88, z = −2.3, p < .05], BVMT-R Total Learning [SF r = 0.64, AF r = 0.91, z = −2.0, p < .05], and BVMT-R Delayed Recall [SF r = 0.32, AF r = 0.85, z = −2.5, p < .05]. In each case, higher reliability coefficients were found in the AF group, and, except for one case, there was substantially greater variance at retest in the AF group (Tables 1 and 2).
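
For reference, the comparison of two independent correlations via the Fisher transformation converts each r to z′ = arctanh(r) and divides the difference by √(1/(n₁ − 3) + 1/(n₂ − 3)). A minimal sketch using the BVMT-R Delayed Recall coefficients reported above (17 patients per group) follows; small discrepancies from the tabled z values reflect rounding of r.

```python
import math

def compare_correlations(r1: float, n1: int, r2: float, n2: int) -> float:
    """z test for the difference between two independent Pearson r
    values, via the Fisher r-to-z transformation."""
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    return (z1 - z2) / se

# BVMT-R Delayed Recall reliability: SF r = .32 versus AF r = .85
print(round(compare_correlations(0.32, 17, 0.85, 17), 2))  # approx. -2.45
```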

DISCUSSION

The general aim of this study was to investigate the effects of using alternate forms during repeated NP testing in MS. The common assumption that alternate forms protect against inter-examination learning effects was directly tested by randomly assigning patients to same-form (SF) and alternate-form (AF) conditions. Because practice effects can be attributed to the learning of both the procedural aspects of testing and test content, this study focused on tests of auditory/verbal and visual/spatial memory. When retested at one week, SF patients showed marked practice benefit, on the order of ½ to 1 SD in magnitude. Patients taking a different form of these memory tests showed no such practice. Thus, the data support the use of alternate forms, at least for these particular tests, in MS patients.

There is an additional benefit with the use of alternate forms. Figures 2 and 3 clearly show that the shape of the learning and delayed recall curve in the SF group is affected by the same/alternate form contingency (see also Tables 1 and 2). For example, at test 1 (or baseline), the asymptote of the CVLT-II learning curve is 12.3 ± 3.0. The mean recall score after the delay interval falls to 10.5 ± 3.5. Such a significant, if modest, drop in performance would be expected in a cerebral disease such as MS (Benedict et al., 2002; Delis et al., 2000). On retest, however, the Trial 5 and Delayed Recall values are precisely the same. Similar findings emerge for the BVMT-R, although the statistical test of Trial 3 versus Delayed Recall for the SF group at baseline shows only a trend toward significance. In other words, there is no longer evidence of modest forgetting after patients were exposed to the baseline examination. As such, the validity of the CVLT-II and BVMT-R is compromised when readministered in this way.

Is reliability compromised when alternate forms are used? In this study, test-retest correlations were calculated for each CVLT-II and BVMT-R measure independently for the SF and AF groups. The reliability coefficients were statistically different in four measures. In general, these significant findings were associated with higher reliability in the AF group. This finding was contrary to hypothesis. Error in test-retest analysis is often conceptualized (Crocker & Algina, 1986) as being attributed to the effects of time (e.g., fluctuations in mental state, motivation, etc.) or both time and content as may occur when alternate forms are used. Because the AF group's error variance may have stemmed from two different sources (time, content), lower r values were expected from these data. The most parsimonious explanation of this unexpected finding is that practice in the SF group restricted the range of retest scores, thereby compromising Pearson r statistics. The BVMT-R Delayed Recall measure provides a useful illustration of this point. At baseline, the mean scores and variances are nearly identical (SF = 8.8 ± 2.8, AF = 8.7 ± 2.6). One week later, the SF group mean is 10.7 but the range of possible scores extends only to 12. Therefore, it is not surprising to find that the SD drops to 1.3. In contrast, there is virtually no change in the AF group mean or variance.
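
The range-restriction account is easy to demonstrate in simulation. The sketch below uses entirely synthetic data (not the study sample): when a practice gain pushes retest scores against the 12-point ceiling, retest variance shrinks and the Pearson correlation is attenuated, even though the underlying test-retest relationship is unchanged.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

# Synthetic "true" delayed-recall ability plus measurement noise.
ability = rng.normal(8.8, 2.5, n)
test = np.clip(ability + rng.normal(0, 1.2, n), 0, 12)

# Retest with a practice gain; clipping enforces the 12-point ceiling.
retest_unbounded = ability + 2.0 + rng.normal(0, 1.2, n)
retest = np.clip(retest_unbounded, 0, 12)

print(f"r, retest not clipped: {np.corrcoef(test, retest_unbounded)[0, 1]:.2f}")
print(f"r, retest clipped:     {np.corrcoef(test, retest)[0, 1]:.2f}")
print(f"retest SD, not clipped / clipped: "
      f"{retest_unbounded.std(ddof=1):.2f} / {retest.std(ddof=1):.2f}")
```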

These data have a bearing on calculations of reliable change intervals that may be used in clinical trials to gauge the meaning of improved NP test performance. The familiar Reliable Change Index (RCI) is most simply calculated as the estimated practice effect ± a margin of error based on the standard error of the difference (SEdiff). The SEdiff is the square root of the sum of the squared standard errors of measurement (SEMs) for test and retest (Hageman & Arrindell, 1993; Iverson et al., 2003). SEMs are, in turn, directly related to the reliability coefficient. Thus, as can be seen in Tables 1 and 2, low test-retest reliability is associated with higher SEdiff values, which would invariably lead to larger RCIs. For example, returning to the SF group's BVMT-R Delayed Recall performance, the 80% confidence interval (CI), as derived from the SEdiff of 2.5, is 3.3. When this figure is added to the measured practice effect (+1.9), the 80% RCI stands at 5. Considering that the baseline mean is 8.8, the RCI then extends beyond the test's ceiling. This finding raises serious concerns about the test's utility when the same form is administered repeatedly.
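
This arithmetic can be reconstructed from the reported quantities, assuming the conventional formula SEM = SD × √(1 − r) and a two-tailed 80% z of about 1.28; the SDs below are those reported above for the SF group's BVMT-R Delayed Recall scores (2.8 at test, 1.3 at retest), with r = .32.

```python
import math

def sem(sd: float, r: float) -> float:
    """Standard error of measurement."""
    return sd * math.sqrt(1 - r)

se_diff = math.sqrt(sem(2.8, 0.32) ** 2 + sem(1.3, 0.32) ** 2)
margin = 1.28 * se_diff        # 80% confidence interval half-width
rci_upper = 1.9 + margin       # practice effect + margin of error

print(f"SEdiff = {se_diff:.1f}")          # ~2.5, as reported
print(f"80% margin = {margin:.1f}")       # ~3.3
print(f"upper bound = {rci_upper:.1f}")   # ~5.2; 8.8 + 5.2 exceeds the
                                          # test's 12-point ceiling
```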

It is noteworthy that while significant practice effects were found, group × time interactions did not emerge for either the PASAT or COWAT. The PASAT finding is perhaps most easily explained. Patients are asked to quickly add numbers while listening for new stimuli, and it would be nearly impossible for patients to memorize number combinations while the test is proceeding. Practice effects found on the PASAT are presumably a result of the development of a test-taking strategy that should not differ across same versus alternate-form conditions. The COWAT data are more difficult to explain. Using a two-week test-retest interval, we (Zgaljardic & Benedict, 2001) found similar results when comparing the CFL and FAS forms in healthy volunteers. Although a significant effect of practice was observed, there was no interaction. These and the present findings imply that procedural aspects of the test (e.g., retrieval strategy, learning to avoid repeating the same stem word, etc.) are more important than the examinee's familiarity with specific word content. The finding, if replicated, would challenge the common belief that alternate forms are an important consideration for tests of generative verbal fluency.

This study supports the reliability and validity of several tests proposed by a consensus panel for the minimal core assessment of MS patients (Benedict et al., 2002). The MACFIMS battery is a rationally derived, clinically oriented battery based on expert consensus regarding the cognitive domains most important to assess in MS. Test selection was based largely on psychometric properties of candidate measures and ease of administration. All of the recommended tests have acceptable test-retest reliability in this sample of 34 MS patients. If one assumes the use of CVLT-II and BVMT-R alternate forms, only one measure (CVLT-II Trial 1) showed a reliability coefficient less than 0.70. Furthermore, the construct validity of these memory tests is supported by the demonstration of a learning curve followed by modest forgetting, approximating data derived from healthy volunteers (Benedict, 1997; Delis et al., 2000).

There are several methodological limitations in this study that merit further comment. First, the lack of a normal-control group hindered determination of whether the magnitude of practice shown on these tests is “normal.” Second, the small sample size raises questions about external validity, and because so few men were enrolled in the study, analysis of gender interaction effects could not be pursued. The study is further limited by the one-week test-retest interval. These findings are useful for planning serial assessments for clinical trials where short test-retest intervals are needed. However, many clinical trials or natural history studies will require much longer test-retest intervals and very different findings might be obtained in such a research design. In addition, the patients studied here are mostly those with relapsing-remitting course and mild physical/neurological disability. Research with other clinical populations suggests that elderly patients with chronic psychiatric disease may not show practice with repeated exposure to the same test forms (Harvey et al., 2005). Unfortunately, our study sheds no light on the question of whether or not alternate forms are valuable in older MS patients with greater degrees of disability.

These concerns notwithstanding, the present data support the use of alternate forms when evaluating memory in MS repeatedly over time. When the test-retest interval is short, alternate forms may enhance, not hinder, test-retest reliability.

ACKNOWLEDGMENTS

I thank Drs. Darcy Cox, Laetitia Thompson, and Frederick Foley for their assistance with data collection. This research was supported, in part, by an unrestricted educational grant from Biogen Idec.

REFERENCES

Amato, M.P., Ponziani, G., Siracusa, G., & Sorbi, S. (2001). Cognitive dysfunction in early-onset multiple sclerosis: A reappraisal after 10 years. Archives of Neurology, 58, 1602–1606.
American Psychiatric Association (1994). Diagnostic and statistical manual of mental disorders (4th ed.). Washington, DC: American Psychiatric Association.
Archibald, C.J. & Fisk, J.D. (2000). Information processing efficiency in patients with multiple sclerosis. Journal of Clinical and Experimental Neuropsychology, 22, 686–701.
Beatty, W.W. (1996). Memory disturbance in multiple sclerosis: Reconsideration of patterns of performance on the Selective Reminding Test. Journal of Clinical and Experimental Neuropsychology, 18, 56–62.
Beatty, W.W., Goodkin, D.E., Monson, N., & Beatty, P.A. (1989). Cognitive disturbances in patients with relapsing remitting multiple sclerosis. Archives of Neurology, 46, 1113–1119.
Beck, A.T., Steer, R.A., & Brown, G.K. (2000). BDI–Fast Screen for Medical Patients: Manual. San Antonio, TX: Psychological Corporation.
Benedict, R.H.B. (1997). Brief Visuospatial Memory Test–Revised: Professional manual. Odessa, FL: Psychological Assessment Resources, Inc.
Benedict, R.H.B., Cox, D., Thompson, L.L., Foley, F.W., Weinstock-Guttman, B., & Munschauer, F. (2004a). Reliable screening for neuropsychological impairment in MS. Multiple Sclerosis, 10, 675–678.
Benedict, R.H.B., Fischer, J.S., Archibald, C.J., Arnett, P.A., Beatty, W.W., Bobholz, J., Chelune, G.J., Fisk, J.D., Langdon, D.W., Caruso, L., Foley, F., LaRocca, N.G., Vowels, L., Weinstein, A., DeLuca, J., Rao, S.M., & Munschauer, F. (2002). Minimal neuropsychological assessment of MS patients: A consensus approach. Clinical Neuropsychologist, 16, 381–397.
Benedict, R.H.B., Fishman, I., McClellan, M.M., Bakshi, R., & Weinstock-Guttman, B. (2003). Validity of the Beck Depression Inventory–Fast Screen in multiple sclerosis. Multiple Sclerosis, 9, 393–396.
Benedict, R.H.B., Schretlen, D., Brandt, J., & Groninger, L. (1998). Revision of the Hopkins Verbal Learning Test: Reliability and normative data. Clinical Neuropsychologist, 12, 43–55.
Benedict, R.H.B., Schretlen, D., Groninger, L., Dobraski, M., & Shpritz, B. (1996). Revision of the Brief Visuospatial Memory Test: Studies of normal performance, reliability, and validity. Psychological Assessment, 8, 145–153.
Benedict, R.H.B., Weinstock-Guttman, B., Fishman, I., Sharma, J., Tjoa, C.W., & Bakshi, R. (2004b). Prediction of neuropsychological impairment in multiple sclerosis: Comparison of conventional magnetic resonance imaging measures of atrophy and lesion burden. Archives of Neurology, 61, 226–230.
Benedict, R.H.B. & Zgaljardic, D.J. (1998). Practice effects during repeated administrations of memory tests with and without alternate forms. Journal of Clinical and Experimental Neuropsychology, 20, 339–352.
Benton, A.L., Sivan, A.B., Hamsher, K., Varney, N.R., & Spreen, O. (1994). Contributions to neuropsychological assessment (2nd ed.). New York: Oxford University Press.
Boringa, J.B., Lazeron, R.H., Reuling, I.E., Ader, H.J., Pfennings, L., Lindeboom, J., de Sonneville, L.M., Kalkers, N.F., & Polman, C.H. (2001). The brief repeatable battery of neuropsychological tests: Normative values allow application in multiple sclerosis clinical practice. Multiple Sclerosis, 7, 263–267.
Brandt, J. & Benedict, R.H.B. (2001). The Hopkins Verbal Learning Test–Revised: Professional manual. Odessa, FL: Psychological Assessment Resources, Inc.
Camp, S.J., Stevenson, V.L., Thompson, A.J., Miller, D.H., Borras, C., Auriacombe, S., Brochet, B., Falautano, M., Filippi, M., Herisse-Dulo, L., Montalban, X., Parreira, E., Polman, C.H., De Sa, J., & Langdon, D.W. (1999). Cognitive function in primary progressive and transitional progressive multiple sclerosis: A controlled study with MRI correlates. Brain, 122, 1341–1348.
Chelune, G.J., Naugle, R.I., Lueders, H., & Awad, I.A. (1991). Prediction of cognitive change as a function of preoperative ability status among temporal lobectomy patients seen at 6-month follow-up. Neurology, 41, 399–404.
Chelune, G.J., Naugle, R.I., Lueders, H., & Sedlak, J. (1993). Individual change after epilepsy surgery: Practice effects and base-rate information. Neuropsychology, 7, 41–52.
Christodoulou, C., Krupp, L.B., Liang, Z., Huang, W., Melville, P., Roque, C., Scherl, W.F., Morgan, T., MacAllister, W.S., Li, L., Tudorica, L.A., Li, X., Roche, P., & Peyster, R. (2003). Cognitive performance and MR markers of cerebral injury in cognitively impaired MS patients. Neurology, 60, 1793–1798.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Erlbaum.
Collie, A., Darby, D.G., Falleti, M.G., Silbert, B.S., & Maruff, P. (2002). Determining the extent of cognitive change after coronary surgery: A review of statistical procedures. Annals of Thoracic Surgery, 73, 2005–2011.
Crawford, J.R., Stewart, L.E., & Moore, J.W. (1989). Demonstration of savings on the AVLT and development of a parallel form. Journal of Clinical and Experimental Neuropsychology, 11, 975–981.
Crocker, L. & Algina, J. (1986). Introduction to classical and modern test theory. New York: Holt, Rinehart & Winston.
Cutter, G.R., Baier, M.L., Rudick, R.A., Cookfair, D.L., Fischer, J.S., Petkau, J., Syndulko, K., Weinshenker, B.G., Antel, J.P., Confavreux, C., Ellison, G.W., Lublin, F., Miller, A.E., Rao, S.M., Reingold, S., Thompson, A., & Willoughby, E. (1999). Development of a multiple sclerosis functional composite as a clinical trial outcome measure. Brain, 122, 871–882.
DeCarli, C., Mungas, D., Harvey, D., Reed, B., Weiner, M., Chui, H., & Jagust, W. (2004). Memory impairment, but not cerebrovascular disease, predicts progression of MCI to dementia. Neurology, 63, 220–227.
Delis, D.C., Kaplan, E., & Kramer, J.H. (2001). Delis-Kaplan Executive Function System. San Antonio, TX: Psychological Corporation.
Delis, D.C., Kramer, J.H., Kaplan, E., & Ober, B.A. (2000). California Verbal Learning Test manual: Second edition, adult version. San Antonio, TX: Psychological Corporation.
Delis, D.C., Kramer, J.H., Kaplan, E., & Ober, B.A. (1987). California Verbal Learning Test: Adult version. San Antonio, TX: Psychological Corporation.
DeLuca, J., Gaudino, E.A., Diamond, B.J., Christodoulou, C., & Engel, R.A. (1998). Acquisition and storage deficits in multiple sclerosis. Journal of Clinical and Experimental Neuropsychology, 20, 376–390.
DeLuca, J., Johnson, S.K., & Natelson, B.H. (1993). Information processing efficiency in chronic fatigue syndrome and multiple sclerosis. Archives of Neurology, 50, 301–304.
DeLuca, J., Barbieri-Berger, S., & Johnson, S.K. (1994). The nature of memory impairments in multiple sclerosis: Acquisition versus retrieval. Journal of Clinical and Experimental Neuropsychology, 16, 183–189.
Demaree, H.A., DeLuca, J., Gaudino, E.A., & Diamond, B.J. (1999). Speed of information processing as a key deficit in multiple sclerosis: Implications for rehabilitation. Journal of Neurology, Neurosurgery & Psychiatry, 67, 661–663.
Duff, K., Westervelt, H.J., McCaffrey, R.J., & Haase, R.F. (2001). Practice effects, test-retest stability, and dual baseline assessments with the California Verbal Learning Test in an HIV sample. Archives of Clinical Neuropsychology, 16, 461–476.
Fischer, J.S. (1988). Using the Wechsler Memory Scale–Revised to detect and characterize memory deficits in multiple sclerosis. Clinical Neuropsychologist, 2, 149–172.
Fischer, J.S., Priore, R.L., Jacobs, L.D., Cookfair, D.L., Rudick, R.A., Herndon, R.M., Richert, J.R., Salazar, A.M., Goodkin, D.E., Granger, C.V., Simon, J.H., Grafman, J.H., Lezak, M.D., O'Reilly-Hovey, K.M., Perkins, K.K., Barilla-Clark, D., Schacter, M., Shucard, D.W., Davidson, A.L., Wende, K.E., Bourdette, D.N., & Kooijmans-Coutinho, M.F. (2000). Neuropsychological effects of interferon beta-1a in relapsing multiple sclerosis. Multiple Sclerosis Collaborative Research Group. Annals of Neurology, 48, 885–892.
Fischer, J.S., Rudick, R.A., Cutter, G.R., & Reingold, S.C. (1999). The Multiple Sclerosis Functional Composite Measure (MSFC): An integrated approach to MS clinical outcome assessment. National MS Society Clinical Outcomes Assessment Task Force. Multiple Sclerosis, 5, 244–250.
Franklin, G.M., Heaton, R.K., Nelson, L.M., Filley, C.M., & Seibert, C. (1988). Correlation of neuropsychological and MRI findings in chronic/progressive multiple sclerosis. Neurology, 38, 1826–1829.
Grant, I., McDonald, W.I., Trimble, M.R., Smith, E., & Reed, R. (1984). Deficient learning and memory in early and middle phases of multiple sclerosis. Journal of Neurology, Neurosurgery & Psychiatry, 47, 250–255.
Green, M.F., Kern, R.S., & Heaton, R.K. (2004). Longitudinal studies of cognition and functional outcome in schizophrenia: Implications for MATRICS. Schizophrenia Research, 72, 41–51.
Gronwall, D.M.A. (1977). Paced auditory serial addition task: A measure of recovery from concussion. Perceptual and Motor Skills, 44, 367–373.
Hageman, W.J.J.M. & Arrindell, W.A. (1993). A further refinement of the reliable change (RC) index by improving the pre-post difference score: Introducing RC(ID). Behaviour Research and Therapy, 31, 693–700.
Harvey, P.D., Palmer, B.W., Heaton, R.K., Mohamed, S., Kennedy, J., & Brickman, A. (2005). Stability of cognitive performance in older patients with schizophrenia: An 8-week test-retest study. American Journal of Psychiatry, 162, 110–117.
Hawkins, K.A. & Wexler, B.E. (1999). California Verbal Learning Test practice effects in a schizophrenia sample. Schizophrenia Research, 39, 73–78.
Heaton, R.K., Grant, I., Butters, N., White, D.A., Kirson, D., Atkinson, J.H., McCutchan, J.A., Taylor, M.J., Kelly, M.D., & Ellis, R.J. (1995). The HNRC 500—Neuropsychology of HIV infection at different disease stages. HIV Neurobehavioral Research Center. Journal of the International Neuropsychological Society, 1, 231–251.
Heaton, R.K. (1985). Neuropsychological findings in relapsing-remitting and chronic-progressive multiple sclerosis. Journal of Consulting and Clinical Psychology, 53, 103–110.
Heaton, R.K., Temkin, N., Dikmen, S., Avitable, N., Taylor, M.J., Marcotte, T.D., & Grant, I. (2001). Detecting change: A comparison of three neuropsychological methods, using normal and clinical samples. Archives of Clinical Neuropsychology, 16, 75–91.
Iverson, G.L., Lovell, M.R., & Collins, M.W. (2003). Interpreting change on ImPACT following sport concussion. Clinical Neuropsychologist, 17, 460–467.
Jacobs, L.D., Cookfair, D.L., Rudick, R.A., Herndon, R.M., Richert, J.R., Salazar, A.M., Fischer, J.S., Goodkin, D.E., Granger, C.V., Simon, J.H., Alam, J.J., Bartoszak, D.M., Bourdette, D.N., Braiman, J., Brownscheidle, C.M., Coats, M.K., Cohan, S.L., Dougherty, D.S., Kinkel, R.P., Mass, M.K., Munschauer, F.E., Priore, R.L., Pullicino, P.M., Scherokman, B.J., Weinstock-Guttman, B., Whitham, R.H., & Multiple Sclerosis Collaborative Research Group. (1996). Intramuscular interferon beta-1a for disease progression in relapsing multiple sclerosis. Annals of Neurology, 39, 285–294.
Jacobs, L.D., Wende, K.E., Brownscheidle, C.M., Apatoff, B., Coyle, P.K., Goodman, A., Gottesman, M.H., Granger, C.V., Greenberg, S.J., Herbert, J., Krupp, L., Lava, N.S., Mihai, C., Miller, A.E., Perel, A., Smith, C.R., & Snyder, D.H. (1999). A profile of multiple sclerosis: The New York State Multiple Sclerosis Consortium. Multiple Sclerosis, 5, 369–376.
Jacobson, N.S. & Truax, P. (1991). Clinical significance: A statistical approach to defining meaningful change in psychotherapy research. Journal of Consulting and Clinical Psychology, 59, 12–19.
Johnson, K.P., Brooks, B.R., Cohen, J.A., Ford, C.C., Goldstein, J., Lisak, R.P., et al. (1995). Copolymer 1 reduces relapse rate and improves disability in relapsing-remitting multiple sclerosis: Results of a phase III multicenter, double-blind placebo-controlled trial. The Copolymer 1 Multiple Sclerosis Study Group. Neurology, 45, 1268–1276.
Krupp, L.B., Christodoulou, C., Melville, P., Scherl, W.F., MacAllister, W.S., & Elkins, L.E. (2004). Donepezil improves memory in multiple sclerosis in a randomized clinical trial. Neurology, 63, 1579–1585.
Kujala, P. (1994). Automatic and controlled information processing in multiple sclerosis. Brain, 117, 1115–1126.
Kurtzke, J.F. (1983). Rating neurologic impairment in multiple sclerosis: An expanded disability status scale (EDSS). Neurology, 33, 1444–1452.
Litvan, I., Grafman, J., Vendrell, P., & Martinez, J. (1988). Multiple memory deficits in patients with multiple sclerosis: Exploring the working memory system. Archives of Neurology, 45, 607–610.
Lovell, M., Collins, M., & Bradley, J. (2004). Return to play following sports-related concussion. Clinics in Sports Medicine, 23, 421–441.
McCaffrey, R.J., Cousins, J.P., Westervelt, H.J., & Martynowicz, M. (1995). Practice effects with the NIMH AIDS abbreviated neuropsychological battery. Archives of Clinical Neuropsychology, 10, 241–250.
McCaffrey, R.J., Ortega, A., & Haase, R.F. (1993). Effects of repeated neuropsychological assessments. Archives of Clinical Neuropsychology, 8, 519–524.
McCaffrey, R.J., Ortega, A., Orsillo, S.M., & Nelles, W.B. (1992). Practice effects in repeated neuropsychological assessments. Clinical Neuropsychologist, 6, 32–42.
McDonald, W.I., Compston, A., Edan, G., Goodkin, D.E., Hartung, H., Lublin, F., McFarland, H.F., Paty, D.W., Polman, C.H., Reingold, S.C., Sandberg-Wollheim, M., Sibley, W.A., Thompson, A., van der Noort, S., Weinshenker, B.G., & Wolinsky, J.S. (2001). Recommended diagnostic criteria for multiple sclerosis: Guidelines from the International Panel on the Diagnosis of Multiple Sclerosis. Annals of Neurology, 50, 121–127.
Medaer, R., De Smedt, L., Swerts, M., & Geutjens, J. (1984). Use of rating scales to reflect cognitive and mental functioning in multiple sclerosis. Acta Neurologica Scandinavica, Supplementum, 101, 65–67.
Parker, E.S., Eaton, E.M., Whipple, S.C., Heseltine, P.N.R., & Bridge, T.P. (1995). University of Southern California repeatable episodic memory test. Journal of Clinical and Experimental Neuropsychology, 17, 926–936.
Paty, D.W., Li, D., UBC MS/MRI Study Group, & IFNB Multiple Sclerosis Study Group (1993). Interferon beta-1b is effective in relapsing-remitting multiple sclerosis. Neurology, 43, 662–667.
Rao, S.M. (1991). A manual for the brief, repeatable battery of neuropsychological tests in multiple sclerosis. New York: National Multiple Sclerosis Society.
Rao, S.M., Hammeke, T.A., McQuillen, M.P., Khatri, B.O., & Lloyd, D. (1984). Memory disturbance in chronic progressive multiple sclerosis. Archives of Neurology, 41, 625–631.
Rao, S.M., Leo, G.J., Bernardin, L., & Unverzagt, F. (1991). Cognitive dysfunction in multiple sclerosis. I. Frequency, patterns, and prediction. Neurology, 41, 685–691.
Rao, S.M., Leo, G.J., Haughton, V.M., Aubin-Faubert, P.S., & Bernardin, L. (1989). Correlation of magnetic resonance imaging with neuropsychological testing in multiple sclerosis. Neurology, 39, 161–166.
Rapport, L.J., Axelrod, B.N., Theisen, M.E., Brines, D.B., Kalechstein, A.D., & Ricker, J.H. (1997). Relationship of IQ to verbal learning and memory: Test and retest. Journal of Clinical and Experimental Neuropsychology, 19, 655–666.
Rey, A. (1964). L'examen clinique en psychologie. Paris: Presses Universitaires de France.
Ruff, R.M., Light, R.H., Parker, S.B., & Levin, H.S. (1996). Benton Controlled Oral Word Association Test: Reliability and updated norms. Archives of Clinical Neuropsychology, 11, 329–338.
Sawrie, S.M., Chelune, G.J., Naugle, R.I., & Lueders, H.O. (1996). Empirical methods for assessing meaningful neuropsychological change following epilepsy surgery. Journal of the International Neuropsychological Society, 2, 556–564.
Smith, A. (1982). Symbol Digit Modalities Test: Manual. Los Angeles: Western Psychological Services.
Temkin, N.R., Heaton, R.K., Grant, I., & Dikmen, S.S. (1999). Detecting significant change in neuropsychological test performance: A comparison of four models. Journal of the International Neuropsychological Society, 5, 357–369.
Woodard, J.L., Benedict, R.H.B., Roberts, V.J., Goldstein, F.C., Kinner, K.M., Capruso, D.X., & Clark, A.N. (1996). The reliability and validity of alternate short forms for the Benton Judgement of Line Orientation Test. Journal of Clinical and Experimental Neuropsychology, 18, 898–904.
Zgaljardic, D.J. & Benedict, R.H.B. (2001). Evaluation of practice effects in language and spatial processing test performance. Applied Neuropsychology, 8, 218–223.