INTRODUCTION
Memory loss is probably the most consistent feature of cognitive impairment associated with brain damage and normal aging. It is certainly one of the most frequent complaints of the elderly and can seriously affect a person's quality of life. Nevertheless, until recently, retraining programs had been directed mainly at the memory problems of brain-damaged populations, with few programs designed specifically for the elderly (for review, see Glisky & Glisky, 1999). For these reasons, a priority objective in designing a rehabilitation program suitable for older adults was to improve our participants' memory performance.
Our approach was guided by encouraging results from the limited research into memory training in the elderly (Glisky & Glisky, 1999). For example, in the large ACTIVE study, Ball and colleagues (2002) compared three different cognitive training interventions (memory, reasoning, and speed of processing) with a no-contact control group. Each intervention improved the targeted cognitive ability, although the percentage of participants who showed reliable improvement after training was lower for memory (26%) than for reasoning (74%) and speed of processing (87%). Other studies have also demonstrated positive effects of memory training (e.g., Scogin et al., 1985), with improvement retained for periods ranging from months (e.g., Sheikh et al., 1986) to more than 3 years (Stigsdotter Neely & Bäckman, 1993).
Memory is a multidimensional construct, with different types of memory affected to varying degrees in old age. Our approach to rehabilitation was based on a theoretical framework that differentiates two major components of remembering: one more automatic and familiarity-based, the other more controlled and recollective (Jacoby, 1991). Aging adversely affects recollection, but leaves familiarity relatively intact (Jennings & Jacoby, 1993). Evidence suggests that training improves controlled recollection, but not the more automatic familiarity aspects (Jacoby et al., 1996). In the present study, we assessed recollective memory in several tests that were also designed to measure primary and secondary memory, as well as working memory.
The term “short-term memory” is used in different ways by cognitive researchers and by clinicians, so we avoid it in favor of Waugh and Norman's (1965) distinction between primary and secondary memory. In their usage, the short-term retention of information is mediated partly by the small amount of information that is maintained continuously in mind, and partly by further information that was perceived and encoded but then dropped out of conscious awareness, and so must be retrieved to be recalled. These components of short-term retention are referred to as primary memory and secondary memory, respectively. Primary memory is affected very little by normal aging (Craik, 1968) or even by amnesia (Baddeley & Warrington, 1970), so it seemed unlikely that the present rehabilitation procedures would benefit this form of memory. Secondary memory, by contrast, is the form of memory involved in everyday episodic remembering; its performance depends on the efficiency of encoding and retrieval processes, both of which should be sensitive to rehabilitation.
Working memory (Baddeley & Hitch, 1974; Engle & Kane, 2004) is similar to primary memory in that both terms refer to a small amount of material held in conscious awareness. The difference is that, whereas primary memory involves the verbatim reproduction of presented material, working memory involves the manipulation and transformation of the material held in mind. The efficiency of working memory operations, particularly when the task requires greater involvement of the “central executive,” is crucial to a wide range of cognitive activities.
In the present study, working memory was assessed with the Alpha Span Test (Craik, 1986), and primary memory was measured with both the Brown-Peterson Test (Floden et al., 2000) and the Hopkins Verbal Learning Test-Revised (HVLT-R; Benedict et al., 1998; Brandt, 1991). Measures of secondary memory were obtained from the Brown-Peterson Test, the HVLT-R, and the Logical Stories Test (Dixon et al., 1989). Together, these tests provide measures of immediate and delayed recall and recognition, as well as of strategic processing (subjective organization and semantic clustering).
METHODS
Participants and Design
Forty-nine healthy, independently living older adults (aged 71–87 years) with subjective complaints of cognitive or memory impairment participated (see Stuss et al., 2007, for details). The volunteers who met inclusion criteria were divided into an Early Training Group (ETG; N = 29, 14 men) and a Late Training Group (LTG; N = 20, 8 men). A blocked randomization procedure was used, constrained by the need to equalize the groups with respect to the Mini-Mental State Examination (MMSE; Folstein et al., 1975), education, sex, and age. The groups were well matched on demographic and neuropsychological variables, with one exception: on Logical Memory immediate recall (Wechsler Memory Scale-Revised; Wechsler, 1987), the ETG performed better than the LTG (Stuss et al., 2007). For the ETG, training began immediately after admission into the program. The LTG served as a control group, beginning training 3 months later. The study was approved by the Baycrest Research Ethics Board and conducted in accordance with the guidelines of the Declaration of Helsinki.
In brief, the complete program consisted of three 4-week modules: Memory Skills Training, Goal Management Training, and Psychosocial Training. There were four testing sessions, spaced 3 months apart, referred to as Assessments A, B, C, and D: the first session (Assessment A) provided pretraining baseline measures for both groups; Assessment B followed training for the ETG but not for the LTG; the ETG received no further training before Assessment C, but the LTG received training between Assessments B and C; finally, Assessment D was a long-term follow-up, occurring for each group 6 months after training. The training sessions were administered to groups of 5–6 participants. Absences were negligible, and make-up sessions were provided to ensure that all participants received the same amount of training. Homework completion was monitored to maximize comparability between groups. In this paper, we report the procedures and results for the Memory Training module.
Memory Training
The focus of the Memory Training module was on learning a variety of strategies and techniques to improve organizational and memory skills. Consistent with the other modules, the first memory training session began with a review of the rehabilitation program's goal, including its aims and objectives. This was followed by an interactive lecture that highlighted the complexity of memory and its relationship to brain function. Participants learned about the different types of memory and that forgetting can occur at different rates and for different reasons. Factors that affect successful encoding and retrieval (e.g., attention, fatigue, physical limitations, pain, medication, stress) were identified. Self-awareness of individual memory slips was promoted by encouraging participants to maintain a “slips” log throughout the module. At the end of the first session, participants were briefly introduced to the importance of external strategies and asked to log the external strategies they used during the following week.
The second training session continued the focus on external strategies. It began with a review and discussion of the memory slips that participants had logged during the week. The use of strategies such as date books/diaries, post-it notes, timers, and wall calendars was encouraged. Additional techniques and strategies for enhancing remembering were discussed. These included external self-talk (working through a problem or situation by thinking out loud), routines and habits (e.g., always placing car keys in the same location in the house so that they can be found when needed), and organization and planning in ways that reduce the number of memory slips. At the end of this session, participants were encouraged to use external strategies and to reduce their reliance on internal strategies and “just remembering.” As homework, they were asked to continue logging slips and to identify external strategies that might prevent slips from occurring. In addition, participants were given a list of internal strategies, with definitions and examples, as preparation for the final two memory seminars.
The primary focus of training sessions 3 and 4 was to practice a variety of internal strategies that help to encode information in a deep and meaningful way. Participants were shown a series of graphs demonstrating that deep levels of processing at encoding can significantly increase recall (Craik & Lockhart, 1972; Craik & Tulving, 1975). Building on that discussion, participants were introduced to the following six strategies: categorization, story making, visual imagery, association, motor movement, and spaced retrieval. Each strategy was discussed with examples. Participants were then given opportunities to practice using each strategy. As homework between sessions 3 and 4, participants were given article-reading and name-remembering assignments for practicing the various strategies.
Although participants were encouraged to learn and practice all strategies, it was recognized that different strategies work for different people in different situations and that combinations of strategies are commonly used. With this understanding, participants were presented with additional test opportunities in which they were instructed to use the strategy or combination of strategies that best suited them.
Dependent Measures
Alpha Span Test (Craik, 1986)
Four equivalent versions of the Alpha Span Test were used at Assessments A, B, C, and D. Each version consisted of 14 lists of common one-syllable words. The lists varied in length from two to eight words, with two lists at each length. Presentation started with lists of two words and proceeded to increasingly longer lists until the participant failed on both lists of the same length. On each trial, the participant's task was to rearrange the words mentally and recall them orally in alphabetical order. Two measures were taken: Total Score, the number of errorless trials (range 0–14), and Partial Score, in which participants were given 1 point for each item recalled as a member of a correctly recalled adjacent pair.
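The two Alpha Span scores and the discontinuation rule can be illustrated with a minimal Python sketch. It rests on our own reading of the scoring rules, not on a published scoring manual: a trial is errorless only if the response matches the alphabetized list exactly, and an item earns a Partial Score point when it and a neighbor in the response are also consecutive, in the same order, in the alphabetized list. The function names and example words are hypothetical.

```python
from typing import Dict, List, Sequence, Set, Tuple

Trial = Tuple[Sequence[str], Sequence[str]]  # (presented words, recalled words)


def pair_credited_positions(recalled: Sequence[str], target: Sequence[str]) -> Set[int]:
    """Response positions belonging to a correctly recalled adjacent pair:
    two neighbors in the response that are also neighbors, in the same
    order, in the alphabetized target list (assumed interpretation)."""
    rank = {word: i for i, word in enumerate(target)}
    credited: Set[int] = set()
    for i in range(len(recalled) - 1):
        a, b = recalled[i], recalled[i + 1]
        if a in rank and b in rank and rank[b] == rank[a] + 1:
            credited.update((i, i + 1))
    return credited


def score_alpha_span(trials: List[Trial]) -> Tuple[int, int]:
    """Return (Total Score, Partial Score) for trials in administration order,
    two lists per length. Scoring stops once both lists of one length fail."""
    total = partial = 0
    failures_by_length: Dict[int, int] = {}
    for presented, recalled in trials:
        target = sorted(presented)  # alphabetical order is the correct answer
        errorless = list(recalled) == target
        total += int(errorless)
        partial += len(pair_credited_positions(recalled, target))
        if not errorless:
            length = len(presented)
            failures_by_length[length] = failures_by_length.get(length, 0) + 1
            if failures_by_length[length] == 2:
                break  # both lists at this length failed: discontinue
    return total, partial


example = [
    (["dog", "cat"], ["cat", "dog"]),
    (["sun", "hat"], ["hat", "sun"]),
    (["pen", "ant", "fox"], ["ant", "fox", "pen"]),
    (["tree", "bell", "map"], ["bell", "tree", "map"]),  # one failure at length 3
    (["cup", "zoo", "egg"], ["cup", "egg", "zoo"]),
    (["rose", "king", "ball", "owl"], ["ball", "king", "rose", "owl"]),
    (["gum", "ink", "jar", "leg"], ["ink", "gum", "leg", "jar"]),  # second failure at 4
    (["fig", "elm", "dew", "cab", "bay"], ["bay", "cab", "dew", "elm", "fig"]),  # not reached
]
print(score_alpha_span(example))  # (4, 12)
```

In the example, the length-5 list is never scored because both length-4 lists were failed, so the correct response to it adds nothing to either score.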
Brown-Peterson Test (Floden et al., 2000, modification)
In this test, a list of four unrelated words was presented orally, and the participant recalled the words orally after a delay of 0, 3, 18, or 36 s. The 48 words used in the 12 scored trials were high-frequency, concrete nouns of high imagery value; they were grouped arbitrarily into sets of four with no obvious associative links among the members of each set. The delay intervals were filled with a rehearsal-preventing counting task in which participants subtracted 3, 2, or 1 from each of a series of visually presented two-digit numbers. To equate the difficulty of this distracter task across participants, the task was first titrated to yield approximately 60% accuracy at a presentation rate of one number per second. During the titration phase, two-digit numbers were presented at rates of 1.6, 1.2, and then 1.0 s per number, and participants performed the subtraction task so that we could determine the condition (amount subtracted) that allowed each participant to be 60% accurate at the 1-s rate.
In the test proper, lists of four words were presented orally at a 2-s rate. Participants repeated the words out loud to verify that they had heard each word correctly. Following presentation, participants were either cued immediately for recall (0 delay) or performed the subtraction task until the recall cue (a visual question mark) was presented for 2 s. The screen then remained blank for an additional 18 s, allowing a total of 20 s for recall. The entire experiment consisted of three practice trials at a 3-s delay, followed by three experimental trials at each of the four delay intervals, presented quasirandomly. The distracter task was presented visually at a rate of one number per second, and the amount subtracted from each number was set at the value attained in the titration phase. The same materials were presented on all four testing occasions.
The following dependent measures were analyzed:
a) Secondary memory—average correct word recall at the 36-s delay, on the assumption that primary memory is no longer available after 36 s of distracting activity.
b) Primary memory—average correct word recall over the 0-, 3-, and 18-s delays. At each delay, primary memory is calculated from the formula:
$$PM = \frac{\text{observed recall} - SM}{1 - SM}$$
in which PM = probability of primary memory recall, SM = probability of secondary memory recall (from the 36-s condition), and observed recall is the raw probability of recall at that delay interval (see Waugh & Norman, 1965, for fuller details).
c) Resource allocation. On the assumption that optimal performance involves allocating available resources to recalling the words, we calculated two measures, termed Performance and Strategy. Performance is the sum of the proportion of correct words and correct number subtractions at the 36-s delay. An increase in this measure indicates improved task performance. Strategy is the difference in proportions between correct words and correct numbers at the 36-s delay. An increase in this measure indicates that more resources are being allocated to recall.
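The Brown-Peterson measures defined above reduce to a few lines of arithmetic, sketched below in Python. The sketch assumes recall is expressed as a proportion of the 12 words presented at each delay (three scored trials of four words), applies the Waugh and Norman (1965) correction separately at the 0-, 3-, and 18-s delays, and computes Performance and Strategy from 36-s word recall and subtraction accuracy. The function names and example values are hypothetical.

```python
from typing import Dict

WORDS_PER_DELAY = 12  # three scored trials x four words at each delay


def primary_memory(observed: float, sm: float) -> float:
    """Waugh & Norman (1965): observed = PM + SM - PM*SM, solved for PM."""
    if not 0.0 <= sm < 1.0:
        raise ValueError("SM must lie in [0, 1) for the correction to be defined")
    return (observed - sm) / (1.0 - sm)


def brown_peterson_measures(words_recalled: Dict[int, int],
                            subtraction_accuracy_36: float) -> Dict[str, float]:
    """words_recalled maps delay in seconds (0, 3, 18, 36) to the number of
    words recalled correctly; subtraction_accuracy_36 is the proportion of
    correct subtractions during the 36-s delay."""
    p = {delay: n / WORDS_PER_DELAY for delay, n in words_recalled.items()}
    sm = p[36]  # assumes nothing remains in primary memory after 36 s of distraction
    measures = {f"PM_{d}s": primary_memory(p[d], sm) for d in (0, 3, 18)}
    measures["SM"] = sm
    measures["Performance"] = sm + subtraction_accuracy_36  # higher = better overall
    measures["Strategy"] = sm - subtraction_accuracy_36     # higher = more resources to recall
    return measures


print(brown_peterson_measures({0: 12, 3: 9, 18: 7, 36: 6}, 0.75))
```

A negative PM estimate can arise when recall at a short delay falls below 36-s recall; how such cases were handled is not specified here.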
Hopkins Verbal Learning Test-Revised (HVLT-R)
The HVLT-R (Benedict et al., 1998; Brandt, 1991) measures primary and secondary memory in addition to the rate of verbal learning over three successive learning and recall trials. A 12-item word list is presented orally at a rate of one word every 2 s, and the participant recalls as many words as possible in any order immediately after the presentation. Each list consists of three semantic categories with four words from each category. A different HVLT-R form was used on each testing occasion. After the participant could no longer recall any words, the list was presented again, followed by a further recall attempt. Three successive presentations and recall tests were administered. After the third learning trial, participants were instructed to remember the words because they would be asked to recall them again later in the session. After a 30-min delay in which other cognitive tests were administered, the participant again recalled the words in any order (delayed free recall). This delayed test was followed by a recognition test consisting of the 12 target words randomly mixed with 12 distracter words, with half of the distracter words being drawn from the same semantic categories as the targets (related distracters) and the other half from other categories (unrelated distracters).
The measures include correct immediate and delayed recall, and delayed recognition. For immediate recall, the total was broken down into primary and secondary memory scores. A recalled word was counted as coming from primary memory provided that no more than seven words intervened between the word's presentation and its recall (Tulving & Colotla, 1970). The remaining recalled words were judged to come from secondary memory. In addition, three different measures of strategic organization in free recall were calculated: serial ordering, semantic clustering, and subjective organization. For serial ordering, 1 point was given each time the participant recalled two correct words in the same order as presented. From this total, we subtracted the number of serially recalled pairs expected by chance and used this chance-adjusted score in the analyses. For semantic clustering, 1 point was given for each correct word following another word from the same semantic category. Subjective organization (Tulving, 1966) was defined as the number of correct word pairs recalled together (regardless of order) that had also been recalled together in the previous trial. All strategic measures were adjusted for the number of words recalled. Word repetitions and intrusions were also recorded. For the recognition test, the principal measure was the number of hits minus the number of false-positive errors made to distracter items.
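The primary/secondary split for HVLT-R immediate recall can be sketched in Python as follows. We assume, following Tulving and Colotla's (1970) intratrial retention interval, that the words intervening between presentation and recall are the words presented after the target plus all responses made before it (intrusions and repetitions included). How the authors treated intrusions is not stated, so that choice, and the function name, are hypothetical.

```python
from typing import Sequence, Set, Tuple


def split_primary_secondary(presented: Sequence[str], recalled: Sequence[str],
                            criterion: int = 7) -> Tuple[int, int]:
    """Classify each correct recall as primary or secondary memory.

    Intervening words = words presented after the target + responses made
    before it. At most `criterion` intervening words counts as primary memory.
    """
    position = {word: i for i, word in enumerate(presented)}
    seen: Set[str] = set()
    primary = secondary = 0
    for output_index, word in enumerate(recalled):
        if word not in position or word in seen:
            continue  # intrusions and repetitions are not correct recalls
        seen.add(word)
        presented_after = len(presented) - 1 - position[word]
        interval = presented_after + output_index  # every earlier response intervenes
        if interval <= criterion:
            primary += 1
        else:
            secondary += 1
    return primary, secondary


words = [f"word{i}" for i in range(1, 13)]  # placeholder 12-item list
print(split_primary_secondary(words, ["word12", "word11", "word10", "word1", "word6"]))  # (3, 2)
```

Recalling the last few list items first, as in the example, is what lets them qualify as primary memory; the same items recalled late in the output would be classified as secondary memory.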
Logical Stories Test
Four stories were selected from the original 25-story collection created by Dixon et al. (1989). The stories were designed to be semantically and structurally homologous, describing a single concrete event in the life of an older protagonist. In each story, we used the first paragraph—eight sentences containing approximately 100 words and an average of 53.8 propositions or idea units.
Each story was read aloud to participants who were then asked to recall as much information as they could, using as many of the same words as they could remember. Recall testing took place immediately (Immediate) and again approximately 30 min later (Delayed). If participants failed to recall the story during Delayed Recall, they were prompted with the gist of the story. Participants' oral recall was recorded and later transcribed for scoring. A different story was used on each test occasion, but stories were not counterbalanced over tests; that is, the same story was given to all participants on each specific test occasion.
Scoring was in terms of recollection of the specific and general points related in each story. These story items were designated as belonging to Level 1 (general gist of the story), Level 2 (details of the story), or Level 3 (highly specific details). Performance was calculated as a percentage of the possible total at each level. Measures of Levels 1, 2, and 3 plus the total number of story items recalled were calculated for both Immediate and Delayed Recall, making a total of eight measures overall.
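The level scores amount to simple proportions, as the minimal Python sketch below illustrates. It assumes each idea unit has been assigned to exactly one of the three levels and that scoring has already matched the transcript to idea units; the unit labels are hypothetical. Running it once for Immediate and once for Delayed Recall gives the eight measures.

```python
from typing import Dict, Set


def story_recall_scores(recalled: Set[str],
                        items_by_level: Dict[int, Set[str]]) -> Dict[str, float]:
    """Percentage of possible items recalled at each level, plus the total
    number of story items recalled (a count, not a percentage)."""
    scores: Dict[str, float] = {}
    for level in sorted(items_by_level):
        items = items_by_level[level]
        scores[f"Level {level} (%)"] = 100.0 * len(recalled & items) / len(items)
    all_items = set().union(*items_by_level.values())
    scores["Total items"] = len(recalled & all_items)  # ignores non-story content
    return scores


levels = {1: {"g1", "g2"}, 2: {"d1", "d2", "d3", "d4"}, 3: {"s1", "s2", "s3", "s4", "s5"}}
print(story_recall_scores({"g1", "d1", "d2", "s1", "x"}, levels))
```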
Statistical Analysis
The general approach to data analysis is described in the Introductory paper (Stuss et al., 2007). Performance at Assessment A was examined with analysis of variance (ANOVA) to determine whether the ETG and LTG differed on any measure before rehabilitation training. Analysis of covariance (ANCOVA) was used to examine performance at each of Assessments B and C, while statistically controlling for performance at the previous session (Assessments A and B, respectively). Repeated measures analysis of variance was used to investigate long-term performance changes between Assessments A and D. Effect sizes are presented for all significant group comparisons.
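The two kinds of test used below can be sketched in plain Python. This is a minimal illustration of the standard models, not the Introductory paper's exact specification: a one-covariate ANCOVA for two groups (post-test score adjusted for the previous assessment, with a common within-group slope), and a two-level within-group comparison, for which the repeated-measures F(1, n − 1) equals the squared paired t. Function names and data are hypothetical; p-values could then be obtained from the F distribution (e.g., `scipy.stats.f.sf(f_ratio, 1, df_error)`), which this sketch does not do.

```python
from statistics import mean
from typing import Sequence, Tuple


def _sums(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Sums of squares of x and y, and their cross-products, about the means."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxx, syy, sxy


def ancova_group_f(pre: Sequence[float], post: Sequence[float],
                   group: Sequence[int]) -> Tuple[float, Tuple[int, int]]:
    """F for the group effect on `post`, adjusting for `pre` (two groups)."""
    labels = set(group)
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two groups")
    sxx, syy, sxy = _sums(pre, post)
    ss_reduced = syy - sxy ** 2 / sxx  # residual SS for post ~ pre
    wxx = wyy = wxy = 0.0
    for label in labels:  # pool within-group sums for a common slope
        xs = [x for x, g in zip(pre, group) if g == label]
        ys = [y for y, g in zip(post, group) if g == label]
        gxx, gyy, gxy = _sums(xs, ys)
        wxx, wyy, wxy = wxx + gxx, wyy + gyy, wxy + gxy
    ss_full = wyy - wxy ** 2 / wxx  # residual SS for post ~ group + pre
    df_error = len(post) - 3  # intercept, group, covariate
    f_ratio = (ss_reduced - ss_full) / (ss_full / df_error)
    return f_ratio, (1, df_error)


def within_group_f(before: Sequence[float],
                   after: Sequence[float]) -> Tuple[float, Tuple[int, int]]:
    """Two-level repeated-measures F, equal to the squared paired t."""
    d = [b - a for a, b in zip(before, after)]
    n = len(d)
    md = mean(d)
    var_d = sum((x - md) ** 2 for x in d) / (n - 1)
    return n * md ** 2 / var_d, (1, n - 1)
```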
One exception to this general data analysis strategy was required for the HVLT-R data. Scores from a subset of 7 ETG participants at Assessment A had to be excluded from analysis because these participants had been allowed to write their responses rather than report them orally. To avoid an unacceptable loss of participants due to missing values at Assessment A, HVLT-R scores at Assessment B were analyzed with analysis of variance, without covarying Assessment A performance.
As indicated in the Introductory paper, 5 participants in the LTG did not complete Assessments C and D due to an outbreak of severe acute respiratory syndrome (SARS), which effectively prohibited research participants from entering the hospital.
Measures from the four memory tests fall into three general categories that we anticipated would respond differently to this rehabilitation training program: two that require relatively little recollection (working memory and primary memory) and one that depends more on controlled recollection (secondary memory, including the strategic processing that supports efficient secondary memory). Working memory was assessed using the Alpha Span Test. Both the Brown-Peterson Test and the HVLT-R provided measures of primary and secondary memory. Strategic processing was directly targeted by the program, and measures of it were built into the Brown-Peterson Test (the Strategy score), the HVLT-R (e.g., semantic clustering and subjective organization), and the Logical Stories Test.
RESULTS
Figures 1, 2, 3, and 4 show mean scores and standard errors for the participants who were tested at each session. Table 1 provides the means and standard deviations for the main measures at the four assessments. There were no significant group differences for any of the memory measures at Assessment A, indicating that the ETG and LTG were equivalent in terms of baseline memory functioning.
Figure 1. Number of words recalled correctly from secondary memory (36-s delay) by the Early (ETG) and Late (LTG) Training Groups at all assessments of the Brown-Peterson Test. The maximum possible score is 12. Error bars represent the standard error of the mean.
Figure 2. Number of words recalled correctly from secondary memory in immediate recall (maximum possible score 36) by the Early (ETG) and Late (LTG) Training Groups at all assessments of the Hopkins Verbal Learning Test-Revised (HVLT-R). Error bars represent the standard error of the mean.
Figure 3. Strategic and nonstrategic performance by the Early (ETG) and Late (LTG) Training Groups at all assessments (A, B, C, D) of the Hopkins Verbal Learning Test-Revised (HVLT-R). Subjective organization was measured by Pair Frequency Analysis (Sternberg & Tulving, 1977). This measure, which adjusts for the number of words recalled, tabulates the number of word pairs recalled together from one trial to the next. Semantic clustering was measured as the number of consecutively recalled words from the same semantic category. Serial ordering is the number of words recalled in the same order as presented. A proportional measure was obtained by dividing the number of serial order clusters by the theoretically maximal number of order clusters, which in turn depends on the total number of words recalled.
Figure 4. Percent delayed recall of Level 3 information by the Early (ETG) and Late (LTG) Training Groups at all assessments of the Logical Stories Test. Error bars represent the standard error of the mean.
Table 1. Means (and SD) for main measures at the four serial assessments
Alpha Span Test
ANCOVAs that took previous performance levels into account, as described in the Introductory paper, were performed on the Alpha Span Test scores at Assessments B and C. These analyses revealed no differences between the ETG and LTG. [ANOVA conducted on the working memory measures did not reveal significant group differences at any of the assessments.] Moreover, a series of within-group comparisons involving participants who completed successive pairs of test sessions also failed to reveal any significant effects. Memory training therefore appears to have had no effect on this measure of working memory performance.
Brown-Peterson Test
For the primary memory measures, there were no meaningful group effects, as revealed by ANCOVA. [The only exception among the 12 primary memory measures (0-, 3-, and 18-s delays × 4 assessments) was at 0 delay for Assessment B, where a significant difference (p < .04) favoring the ETG was observed. We consider this a chance finding. Additionally, ANOVA conducted on the primary memory measures did not reveal significant group differences at any of the assessments.] In the following, we report findings related to secondary memory, as assessed by the number of words recalled at the 36-s delay.
Training effects on ETG (Assessments A to B)
An ANCOVA applied to the Assessment B data did not reveal significant group effects.
Training effects on LTG (Assessments B to C)
There were no significant between-group differences at Assessment C. However, within-group comparisons across Assessments B and C showed that the LTG improved significantly on words recalled at the 36-s delay (F1,14 = 10.23, p = .006), whereas the ETG did not improve to the same degree (F1,25 = 3.19, p = .09; Figure 1). There was also a significant improvement in the LTG's strategy score at Assessment C, relative to Assessment B (F1,14 = 5.80, p = .03); the ETG did not improve to the same degree (F1,28 = 2.64, p = .12).
Long-term effects of training (Assessment D)
Within-group comparisons of scores between Assessments A and C revealed significant improvement in the ETG on words recalled at the 36-s delay (F1,25 = 8.59, p = .007); however, this difference was not significant between Assessments A and D. Improvement on the strategy measure was in the expected direction, but was not statistically significant.
Summary
At Assessment C, within-group comparisons indicated that the LTG improved relative to Assessment B on both secondary memory and strategy measures. Similar comparisons provided some evidence of relatively long-term benefits in secondary memory in the ETG.
HVLT
There were no significant between-group differences in words recalled from primary memory in immediate recall or in delayed recognition memory at any of the testing sessions, using ANCOVA. (ANOVA conducted on the primary memory measure did not reveal significant group differences at any of the assessments.)
Training effects on ETG (Assessments A to B)
As noted in the Statistical Analysis section, 7 participants were allowed to respond in writing at Assessment A, and they were able to use this written format as a kind of memory aid, which improved their performance. Rather than covary Assessment A performance and thereby drastically reduce the sample size, we compared HVLT-R scores for the ETG and LTG at Assessment B using analysis of variance. In immediate recall, there was a significant group difference in the number of words recalled from secondary memory (F1,47 = 4.22, p = .05, η2 = .082), reflecting an impact of rehabilitation. At Assessment B, group differences on the other recall measures (immediate total recall and delayed recall) were all in the predicted direction but just failed to reach significance (.05 < all ps < .08). Immediate recall scores are shown in Figure 2.
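The η2 values reported in the Results appear to be consistent with partial eta squared computed from each F ratio and its degrees of freedom, η2 = F·df_effect / (F·df_effect + df_error), which for one-degree-of-freedom group comparisons reduces to F / (F + df_error). This is our reconstruction from the reported numbers, not a formula stated by the authors; the short Python check below reproduces the five published values.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared recovered from an F ratio and its degrees of freedom."""
    return (f * df_effect) / (f * df_effect + df_error)


# (F, error df, published eta squared) for the group effects reported in the Results.
reported = [
    (4.22, 47, 0.082),  # HVLT-R words from secondary memory, Assessment B
    (4.68, 47, 0.091),  # HVLT-R semantic clustering, Assessment B
    (8.90, 47, 0.159),  # HVLT-R subjective organization, Assessment B
    (9.60, 45, 0.176),  # Logical Stories Level 3 immediate recall, Assessment B
    (4.77, 45, 0.096),  # Logical Stories Level 3 delayed recall, Assessment B
]
for f, df_error, published in reported:
    print(f, round(partial_eta_squared(f, 1, df_error), 3), published)
```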
Turning to measures of strategy use, the first to be considered is serial order. Given that the 12-word HVLT lists contained three semantic categories of 4 words each, an efficient strategy is to encode and retrieve the words in terms of these categories. It is, therefore, likely that good retrieval performance will be associated with a shift away from reproducing the randomized order of presentation (that is, a decline in serial ordering) and a shift toward retrieving same-category words together (that is, a rise in semantic clustering). If such strategic awareness is a consequence of memory training, we expect these shifts to occur predominantly between Assessments A and B for the ETG, and between Assessments B and C for the LTG.
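The contrast between serial ordering and semantic clustering can be made concrete with a minimal Python sketch of the three strategic measures defined in the Methods, applied to a hypothetical 12-word list of three categories. It computes raw counts only: the chance adjustment for serial ordering, the adjustment for the number of words recalled, and the Pair Frequency Analysis of Sternberg and Tulving (1977) are not reproduced. We also assume that intrusions and repetitions are dropped before adjacent pairs are formed. The word list and function names are hypothetical.

```python
from typing import Container, Dict, FrozenSet, List, Sequence, Set


def correct_unique(recalled: Sequence[str], presented: Container[str]) -> List[str]:
    """Drop intrusions and repetitions, keeping first correct recalls in output order."""
    seen: Set[str] = set()
    kept: List[str] = []
    for word in recalled:
        if word in presented and word not in seen:
            seen.add(word)
            kept.append(word)
    return kept


def serial_order_pairs(presented: Sequence[str], recalled: Sequence[str]) -> int:
    """Consecutive recalls that were also consecutive, in the same order, at study."""
    position = {w: i for i, w in enumerate(presented)}
    words = correct_unique(recalled, position)
    return sum(1 for a, b in zip(words, words[1:]) if position[b] == position[a] + 1)


def semantic_clustering(recalled: Sequence[str], category: Dict[str, str]) -> int:
    """Correct words recalled immediately after a word from the same category."""
    words = correct_unique(recalled, category)
    return sum(1 for a, b in zip(words, words[1:]) if category[a] == category[b])


def subjective_organization(previous: Sequence[str], current: Sequence[str],
                            presented: Container[str]) -> int:
    """Adjacent pairs (either order) in this trial that were adjacent in the previous one."""
    def pairs(words: List[str]) -> Set[FrozenSet[str]]:
        return {frozenset(p) for p in zip(words, words[1:])}
    earlier = pairs(correct_unique(previous, presented))
    return len(pairs(correct_unique(current, presented)) & earlier)


category = {
    "apple": "fruit", "pear": "fruit", "plum": "fruit", "grape": "fruit",
    "saw": "tool", "drill": "tool", "wrench": "tool", "hammer": "tool",
    "red": "color", "blue": "color", "green": "color", "pink": "color",
}
presented = ["apple", "saw", "red", "pear", "drill", "blue",
             "plum", "wrench", "green", "grape", "hammer", "pink"]
by_order = ["apple", "saw", "red", "pear"]                         # follows study order
by_category = ["apple", "pear", "plum", "saw", "drill", "red", "blue"]  # grouped by category
print(serial_order_pairs(presented, by_order), semantic_clustering(by_order, category))
print(serial_order_pairs(presented, by_category), semantic_clustering(by_category, category))
```

Reproducing the study order yields serial-order pairs but no category clusters, whereas grouping by category yields the reverse pattern, which is the shift expected after training.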
Serial order scores for participants who completed all four batteries are shown in Figure 3. Serial order clustering declined in the ETG at Assessment B, relative to Assessment A, while rising slightly, but not significantly, in the LTG. This finding resulted in a significant group difference on this measure, Z = −2.74, p = .01 (nonparametric analysis was used because the distribution of scores was not normal), favoring the ETG. Conversely, and as expected, the use of semantic clustering and subjective organization strategies increased in the ETG, relative to the LTG (Figure 3). ANOVA, applied to these scores, indicated significant group differences in semantic clustering (F1,47 = 4.68, p = .04, η2 = .091), and subjective organization (F1,47 = 8.90, p = .005, η2 = .159).
To a limited extent, within-group comparisons between Assessments A and B supported greater use of effective strategies by the ETG at Assessment B. This group showed a significant increase in semantic clustering (F1,20 = 4.39, p = .05) and a decrease in serial ordering that approached significance (F1,20 = 3.72, p = .07). Between Assessments A and B, there were no significant changes within the LTG on any of the strategic measures.
Training effects on LTG (Assessments B to C)
At Assessment C, ANCOVA did not show significant differences between the ETG and the LTG on any of the memory measures, or on measures of subjective organization or semantic clustering. There was a significant group difference on the serial order measure at Assessment C (Z = −2.12, p = .03; a nonparametric test was used because the scores were not normally distributed), indicating that the LTG made greater use of this inefficient strategy than the ETG (Figure 3). Within-group comparisons strengthened the suggestion that the LTG had benefited from rehabilitation training. Relative to Assessment B, the LTG improved significantly in immediate recall, both in total recall (F1,14 = 7.88, p = .01) and in recall from secondary memory (F1,14 = 4.68, p = .05), whereas the ETG showed negligible change in total recall (F1,23 = .70, p = .41) and a change in secondary memory recall that did not reach significance (F1,23 = 3.60, p = .07).
Comparisons between Assessments A and C yielded no significant differences for the LTG, but the ETG showed significant improvements in immediate recall for total correct words (F1,18 = 5.92, p = .03). For this comparison, the ETG also exhibited improvements in semantic clustering (F1,18 = 8.10, p = .01), a marginally significant improvement in subjective organization (F1,18 = 3.95, p = .06), and a significant decline in serial ordering (F1,18 = 10.73, p = .004; see Figure 3).
Long-term effects of training (Assessment D)
There were no significant differences between the ETG and the LTG at Assessment D. However, within-group comparisons indicated long-term benefits of rehabilitation training in the ETG. For example, relative to baseline testing (Assessment A), the ETG improved in total words recalled (F1,16 = 18.21, p < .001), and in words recalled from secondary memory (F1,16 = 10.46, p = .005). The ETG also exhibited an increased use of semantic clustering (F1,16 = 9.57, p = .007), a marginally significant increase in subjective organization (F1,16 = 4.12, p = .06), and a decreased use of serial ordering (F1,16 = 13.51, p = .002). Similar comparisons failed to yield significant changes in the LTG.
Summary
Both groups showed benefits of training on the HVLT-R at the assessment immediately following their training, in the number of words recalled as well as in the efficient use of strategies. The ETG, but not the LTG, continued to show improvement on these measures at Assessment D, indicating that the effects of training for this group were long-lasting.
Logical Stories Test
To simplify presentation, of the three levels of analysis only the results of Level 3 recall (specific details) are presented here, along with measures of total recall.
Training effects on ETG (Assessments A to B)
In line with our hypothesis that the beneficial effects of training would be especially apparent in strategic functioning, large effects were observed in recall of Level 3 information. In Level 3, there were significant group differences at Assessment B in immediate recall (F1,45 = 9.60, p = .003, η2 = .176), and in delayed recall (F1,45 = 4.77, p = .03, η2 = .096; the data for Delayed Recall of Level 3 information are provided in Figure 4). These differences favored the ETG and reflect the effect of rehabilitation training.
Between Assessments A and B, both groups showed significant improvement in total immediate recall (ETG: F1,28 = 28.80, p < .001; LTG: F1,18 = 11.79, p = .003) and total delayed recall (ETG: F1,28 = 55.16, p < .001; LTG: F1,18 = 18.01, p < .001). Despite this unexpected improvement in both groups, the ETG performed significantly better than the LTG at Assessment B in immediate recall (F1,45 = 4.52, p = .04). A nonsignificant difference in the same direction was found for delayed recall (F1,45 = 3.17, p = .08).
The overall pattern of results at Assessment B indicates benefits of rehabilitation training for the ETG in Level 3 information (recall of specific detail), but no specific benefit for total recall. The equivalent increase in total recall scores for both groups may reflect a general practice effect or differences in the difficulty of the stories used at the two assessments.
Training effects on LTG (Assessments B to C)
There was no evidence of improvement in the LTG, relative to the ETG, on any of the measures at Assessment C.
Long-term effects of training (Assessment D)
At Assessment D, the ETG recalled slightly more Level 3 information than the LTG in immediate recall, with the group difference being marginally significant (F1,34 = 3.65, p = .06). There was no significant difference on this measure at delayed recall (F1,34 = 2.03, p = .16). At this session, the ETG performed significantly better than the LTG in terms of total words recalled at immediate (F1,34 = 12.19, p = .001) and delayed (F1,34 = 5.92, p = .02) recall.
Within-group comparisons showed no significant differences in the scores of either group between Assessments C and D. However, both groups exhibited significant improvements from Assessment A to Assessment D (Level 3 immediate recall: ETG, F1,23 = 140.64, p < .001; LTG, F1,13 = 11.23, p = .005; Level 3 delayed recall: ETG, F1,23 = 83.82, p < .001; LTG, F1,13 = 25.53, p < .001; total immediate recall: ETG, F1,23 = 34.33, p < .001; LTG, F1,13 = 5.03, p = .04; total delayed recall: ETG, F1,23 = 25.27, p < .001; LTG, F1,13 = 23.86, p < .001). Any conclusion about long-term benefits of training on this measure must be qualified by the possible influence of practice effects.
Summary
In the Logical Stories Test, the ETG showed some benefits of training in immediate and delayed recall. Relative to baseline, both groups also improved in the amount of Level 3 information recalled, an indication that they were using more efficient strategies. The ETG retained training-derived benefits at Assessment D, most clearly in the total recall scores. LTG performance was also higher than baseline on all measures, but these results must be interpreted cautiously in view of apparent practice effects and possible differences in story difficulty across test sessions.
DISCUSSION
Before conducting the study, we assumed that training would have its greatest effects on those types of memory that involve more controlled and strategic processing, for example, working memory and free recall (e.g., Craik & Jennings, 1992). Contrary to expectations, our measure of working memory, Alpha Span, showed no effects of training. One speculative explanation is that the mental operations involved in holding and reordering word sequences are not really “strategies,” but operations that become more efficient only with extended practice, much as mental arithmetic improves with practice.
Primary memory, in the sense of Waugh and Norman (1965), was measured in the Brown-Peterson and HVLT-R tests and showed no meaningful changes as a function of training. This null result is expected, because primary memory is not sensitive to strategy manipulations; it is also in line with findings that this type of memory is affected very little by aging (Craik, 1968; Craik & Jennings, 1992). Recognition memory also shows relatively slight effects of aging (Craik, 1983; Craik & Jennings, 1992) and was likewise unaffected by training.
On the other hand, recall from secondary memory did show beneficial effects of training. This is an important result, given that secondary memory is the type of episodic memory in constant use in daily life. In the HVLT-R test, the ETG performed significantly better than the LTG at Assessment B on total words recalled. This group difference disappeared at Assessment C, largely because the LTG participants improved following their training. The benefits of training were long-lasting in the ETG, who continued to show improved performance at Assessment D relative to baseline at Assessment A. By comparison, the LTG did not lose its gains between training and Assessment D, but there was no evidence of continued improvement beyond Assessment C. Of note, the HVLT-R data also revealed changes in strategy use, especially by the ETG. The general pattern, which characterized the ETG's performance at all posttraining assessment sessions, was increased use of subjective organization and semantic clustering and decreased use of the less efficient serial ordering strategy. As noted, because Assessment A scores were not used as a covariate, the apparent baseline difference in strategy use at Assessment A may have contributed to the statistical differences; nevertheless, the pattern of increased use of efficient strategies and decreased use of inefficient ones would remain.
The results of the Logical Stories Test are somewhat ambiguous because of the practice effects seen in both groups. It is also possible that the stories varied in difficulty, although substantial variation is unlikely: the stories were taken from Dixon et al. (1989), who standardized the collection and reported the stories to be comparable in content, length, and difficulty. Nonetheless, recall of Level 3 information improved significantly following training in both groups, and the effect was consistently greater in the ETG. Given the strategic demands of recalling such information, this finding provides further support for targeting strategic processes in rehabilitation training.
Although both groups responded positively to training, the ETG, as in other domains (e.g., psychosocial), exhibited more consistent and long-lasting benefits than the LTG. This discrepancy was discussed in some detail in our report on psychosocial outcomes (Winocur et al., 2007). To reiterate, we believe that the most likely explanation lies in the groups' responses to the initial orientation, and in particular the failure of LTG participants to appreciate that there would be a delay before the start of their training program. We suspect that this factor translated into a negative reaction of which the participants themselves may not have been aware.
The current results should be interpreted cautiously because of the number of measures, the possible loss of statistical power resulting from the unavoidable attrition of participants at Assessments C and D, and the small size of the effects. On the other hand, it is encouraging that the results were consistently in the expected direction. It is also informative that the effects on memory (e.g., recall of word lists and stories) were seen primarily on tests that involve the greatest degree of strategic control during encoding and retrieval (e.g., Cavallini et al., 2002). Conversely, primary memory and recognition tests, which are “well supported” by ongoing processing and by the test environment (Craik, 1983) and are relatively unaffected by aging (Craik & Jennings, 1992), did not benefit from rehabilitation training.
This research adds to the literature showing that training can improve encoding and retrieval processes in the elderly, leading to better memory functioning (Glisky & Glisky, 1999) over extended periods of time (Stigsdotter Neely & Bäckman, 1993). Although we assume that the effects were due largely to the specific memory training module, the extent to which this is so remains uncertain. It is possible that other modules, a combination of modules, or simply participation in the study produced the effects. In addition, our data suggested that the training had some effect on real-life situations (Levine et al., 2007) and appeared to generalize to a cognitive domain not specifically targeted in training (i.e., language; see Winocur et al., 2007). Future research is needed to assess whether these benefits generalize to practical, everyday aspects of memory.
ACKNOWLEDGMENTS
The authors acknowledge the valuable contributions of the other co-investigators on this project: Drs. M. Alexander, S. Black, D. Dawson, B. Levine, and I.H. Robertson. In addition, the outstanding support provided by Maureen Downey-Lamb, Louise Fahy, Marina Mandic, and Tara McHugh is gratefully acknowledged. This study, and the experimental trial of which it is a part, was supported by the James S. McDonnell Foundation. D.T. Stuss holds the Reva James Leeds Chair in Neuroscience and Research Leadership at Baycrest and the University of Toronto. The information reported in this manuscript and the manuscript itself are new and original. The manuscript is not under review by any other journal and has never been published either electronically or in print. There are no financial or other relationships that could be interpreted as a conflict of interest affecting this manuscript.