
The utility of regression-based norms in interpreting the minimal assessment of cognitive function in multiple sclerosis (MACFIMS)

Published online by Cambridge University Press:  02 October 2009

BRETT A. PARMENTER
Affiliation:
Department of Psychology, Western State Hospital, Tacoma, Washington
S. MARC TESTA
Affiliation:
Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, Maryland
DAVID J. SCHRETLEN
Affiliation:
Department of Psychiatry and Behavioral Sciences, Johns Hopkins University School of Medicine, Baltimore, Maryland
BIANCA WEINSTOCK-GUTTMAN
Affiliation:
Department of Neurology, Jacobs Neurological Institute, State University of New York at Buffalo, School of Medicine and Biomedical Sciences, Buffalo, New York
RALPH H. B. BENEDICT*
Affiliation:
Department of Neurology, Jacobs Neurological Institute, State University of New York at Buffalo, School of Medicine and Biomedical Sciences, Buffalo, New York
*Correspondence and reprint requests to: Ralph H. B. Benedict, Ph.D., Department of Neurology, 100 High Street (D-6), Buffalo, New York 14203. E-mail: benedict@buffalo.edu

Abstract

The Minimal Assessment of Cognitive Function in Multiple Sclerosis (MACFIMS) is a consensus neuropsychological battery with established reliability and validity. One of the difficulties in implementing the MACFIMS in clinical settings is the reliance on manualized norms from disparate sources. In this study, we derived regression-based norms for the MACFIMS, using a unique data set to control for standard demographic variables (i.e., age, age², sex, education). Multiple sclerosis (MS) patients (n = 395) and healthy volunteers (n = 100) did not differ in age, level of education, sex, or race. Multiple regression analyses were conducted on the performance of the healthy adults, and the resulting models were used to predict MS performance on the MACFIMS battery. This regression-based approach identified higher rates of impairment than manualized norms for many of the MACFIMS measures. These findings suggest that there are advantages to developing new norms from a single sample using the regression-based approach. We conclude that the regression-based norms presented here provide a valid alternative for identifying cognitive impairment as measured by the MACFIMS. (JINS, 2010, 16, 6–16.)

Type
Research Articles
Copyright
Copyright © The International Neuropsychological Society 2009

INTRODUCTION

Two types of norms are typically used to interpret performance on neuropsychological measures: discrete and continuous (Zachary & Gorsuch, 1985). Discrete norms consist of sets of descriptive statistics for specific age groups (Klein, Foerster, & Hartnegg, 2007), although the age bands on which they are based can be rather arbitrary. For example, tables may be divided by five-year intervals (e.g., norms for individuals between the ages of 20 and 24 years, 25 and 29 years, and so on), ten-year intervals, or any other age range. These norms are appropriate if the mean and standard deviation of the normative sample approximate the mean and standard deviation of the true population and if the raw scores are normally distributed (Zachary & Gorsuch, 1985). One problem with discrete norms is that an individual's apparent performance can shift depending on which age band is used, even though the raw score remains the same. Zachary and Gorsuch (1985) noted that a person's IQ score on the Wechsler Adult Intelligence Scale–Revised (WAIS-R) could increase by up to six points with the passing of a single day, when the raw test scores move from comparison with the 25–34 year age group to comparison with the 35–44 year age group. Because of this instability, Zachary and Gorsuch recommended continuous norms as an alternative to discrete norms.
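To make the band-boundary problem concrete, the following sketch (with made-up band statistics, not actual WAIS-R norms) shows how the same raw score maps to different standardized scores on either side of a discrete age cut-off:

```python
# Illustrative sketch only: hypothetical band means/SDs, not published norms.
# Shows how an identical raw score yields different standard scores when a
# person's age crosses a discrete age-band boundary.

BANDS = {  # (low_age, high_age): (band mean, band SD) -- invented values
    (25, 34): (50.0, 10.0),
    (35, 44): (46.0, 10.0),
}

def z_from_discrete_norms(raw, age):
    """Look up the age band containing `age` and return a z score."""
    for (lo, hi), (mean, sd) in BANDS.items():
        if lo <= age <= hi:
            return (raw - mean) / sd
    raise ValueError("age outside normed range")

# A raw score of 48 looks average at age 34 but above average at age 35.
print(z_from_discrete_norms(48, 34))  # -0.2
print(z_from_discrete_norms(48, 35))  #  0.2
```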

An alternative approach is to derive continuous norms using multiple regression equations. Predictor variables can vary, but usually are specific demographic variables that have been shown to affect performance on neuropsychological measures. Such demographic variables almost always include age and education (Crawford & Allan, 1997; Heaton, Ryan, Grant, & Matthews, 1996; Leckliter & Matarazzo, 1989), as well as sex (Crawford & Allan, 1997; Leckliter & Matarazzo, 1989; Van der Elst, Van Boxtel, Van Breukelen, & Jolles, 2005, 2006a), although sex often has little impact on cognitive performance (Heaton et al., 1996; Sherrill-Pattison, Donders, & Thompson, 2000). Age-squared can also be added as a predictor variable to evaluate the influence of nonlinear age effects on normal cognitive performance (Van der Elst et al., 2006a, 2006b).
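In generic form (our notation; the coefficients b0–b4 are placeholders rather than published values), such a regression-based norm predicts a score from demographics and then standardizes the discrepancy between the observed and predicted scores:

```latex
\hat{y} = b_0 + b_1\,\mathrm{age} + b_2\,\mathrm{age}^2 + b_3\,\mathrm{sex} + b_4\,\mathrm{education},
\qquad
T = 50 + 10\,\frac{y - \hat{y}}{s_{\mathrm{residual}}}
```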

Several investigators argue that norms based on multiple regression equations are useful insofar as they allow an individual's predicted score on a measure to reflect specific demographic characteristics (Heaton, Avitable, Grant, & Matthews, 1999; Crawford & Howell, 1998). Identification of the most relevant demographic variables can be a challenge (Van Breukelen & Vlaeyen, 2005), but using such equations has reduced demographic biases in the raw data derived from the Boston Naming Test (Heaton et al., 1999), Rey Auditory Verbal Learning Test (Van der Elst, Van Boxtel, Van Breukelen, & Jolles, 2005), and Stroop Color-Word Test (Van der Elst et al., 2006b), to name a few. This method has also been used to minimize such demographic biases in a large cognitive battery consisting of 19 individual measures (Schretlen et al., 2007).

If the regression-based approach can control for the influence of demographic variables, might the same approach apply to control for physical or neurological disability factors? Indeed, a few authors have suggested including not only demographic variables, but also neurologic variables in “norms” for specific clinical samples (Vanderploeg et al., 1997), such as length of coma or Glasgow Coma Scale scores in interpreting traumatic brain injury data. Sherrill-Pattison, Donders, and Thompson (2000) found that the severity of brain injury, as defined by length of coma, was a significant predictor of performance on neuropsychological tests.

In the present study, we endeavored to calculate regression-based norms that would assist in the clinical evaluation of patients with multiple sclerosis (MS). Like Schretlen et al. (2007), we aimed to derive data that allow for the control (and assessment) of such confounding variables as age and education. In addition, we sought to evaluate the influence of other clinical factors such as depression, dysarthria, and upper-extremity motor function. We focused on the Minimal Assessment of Cognitive Function in Multiple Sclerosis (MACFIMS; Benedict et al., 2002), a collection of tests based on consensus opinion designed to briefly evaluate the “principle features of MS-related cognitive dysfunction” (p. 382).

Cognitive impairment affects 43–60% of patients with MS (Benedict et al., 2006; Rao, Leo, Bernardin, & Unverzagt, 1991), although it can be difficult to detect in an interview or a routine neurological visit (Benedict et al., 2002; Fischer et al., 1994; Peyser, Edwards, Poser, & Filskov, 1980). MS-related cognitive decline has been associated with depression (Arnett et al., 1999a; Arnett et al., 1999b; Thornton & Raz, 1997). The nature of this association is unclear, though, as depression might cause cognitive dysfunction, or it might result from an underlying disease process that also causes the cognitive impairment (Feinstein, 2006). In fact, Feinstein and colleagues (2004) found that MS patients with depression, compared to patients without depression, tend to have greater lesion load in the prefrontal cortex and anterior temporal lobe of the dominant hemisphere. Other symptoms of the disease may also affect performance on neuropsychological tests. For example, dysarthria or impaired oral agility appears to affect performance on measures dependent on rapid speech, such as the Controlled Oral Word Association Test (Arnett, Smith, Barwick, Benedict, & Ahlstrom, 2008). Similarly, upper extremity motor dysfunction may interfere with performance on cognitive measures that require some aspect of manual dexterity.

With this in mind, we constructed regression-based norms for the MACFIMS using data derived from healthy controls. The regression-based continuous norms were used to generate predicted scores for the MS patients and, in turn, to calculate T scores based on their raw test performance. We then compared these regression-based T scores to T scores based on published, discrete norms, and evaluated the rates of impairment obtained with each type of norm. Furthermore, we divided MS patients into groups based on neurological functioning, according to performance on tests of oral speed/agility and pegboard placement speed, as well as scores on the Beck Depression Inventory-Fast Screen (BDI-FS; Beck, Steer, & Brown, 2000). We hypothesized that the manual and regression-based standard scores would yield different interpretations, significantly affecting outcomes on the MACFIMS battery. In addition, we predicted that the severity of neurological abnormality would moderate these findings.

METHOD

Participants

All data were collected in compliance with institutional guidelines. The participants were 395 patients with clinically definite MS (Polman et al., 2005) who were assessed as research volunteers (n = 77) or were referred for clinical assessment (n = 318). The research participants were paid for their participation. Disease course (Lublin & Reingold, 1996) was determined by board-certified neurologists as follows: 294 relapsing-remitting (RR), 84 secondary progressive (SP), 10 progressive-relapsing, and 7 primary progressive (PP). Three hundred eight (308) patients were women and 87 were men. Three hundred sixty-six (366) patients were Caucasian, 23 were African-American, and 6 were classified as “other,” reflecting mixed or uncertain heritage. Mean age (±SD) was 46.28 ± 8.99 years and mean education was 14.28 ± 2.34 years. Exclusion criteria included history of neurologic disease other than MS, drug or alcohol dependence, and psychiatric disease other than psychological problems attributable to MS. Patients who had experienced a relapse or undergone steroid treatment within six weeks prior to participation were also excluded. Expanded Disability Status Scale (EDSS; Kurtzke, 1983) scores obtained within six months of testing were available for 176 patients. The median EDSS score was 3.50 (range 0–7.0).

One hundred healthy adults also participated in the study. These control participants were recruited via advertisements in local, suburban newspapers, and all were paid for their participation. All were screened for prior neurological and psychiatric illness using a standard screening interview developed in house (Benedict et al., 2006; Parmenter et al., 2007). Most (n = 79) were women, and 89 were Caucasian. On average, these participants were 44.79 years old (±9.43; range: 20–60; skewness: 0.214, standard error = 0.241; kurtosis: –0.673, standard error = 0.478) and had completed 14.47 years of school (±1.72; range: 12–18; skewness: 0.222, standard error = 0.241; kurtosis: –0.661, standard error = 0.478). The MS patients and healthy controls did not differ significantly on these demographic variables.

Neuropsychological Measures

Neuropsychological testing was conducted by trained assistants and students under the guidance of a board-certified clinical neuropsychologist (RHBB). Patients were administered the MACFIMS battery, as recommended by a consensus panel (Benedict et al., 2002) and recently validated in a large prospective study (Benedict et al., 2006). The battery included: the Judgment of Line Orientation Test (JLO; Benton, Sivan, Hamsher, Varney, & Spreen, 1994), Controlled Oral Word Association Test (COWAT; Benton & Hamsher, 1989), California Verbal Learning Test, 2nd Edition (CVLT2; Delis, Kramer, Kaplan, & Ober, 2000), Brief Visuospatial Memory Test–Revised (BVMTR; Benedict, 1997), Delis-Kaplan Executive Function System (DKEFS) Sorting Test (Delis, Kaplan, & Kramer, 2001), Symbol Digit Modalities Test, oral version (SDMT; Smith, 1982), and a modified Paced Auditory Serial Addition Test (PASAT; Rao, Leo, Bernardin, & Unverzagt, 1991). We included the Total Learning (TL) and Delayed Recall (DR) indices from the CVLT2 and BVMTR and both the Total Correct Sorts (CS) and the Description Score (DS) from the DKEFS Sorting Test. Two trials of the PASAT were administered, one with a 3.0-second interstimulus interval (ISI) and one with a 2.0-second ISI.

Three neurological and psychiatric measures were used to assess the influence of depression, dysarthria, and upper extremity weakness and spasticity. The Beck Depression Inventory–Fast Screen (BDI-FS; Beck, Steer, & Brown, 2000), which has been validated in patients with MS (Benedict, Fishman, McClellan, Bakshi, & Weinstock-Guttman, 2003), was used to quantify depression severity. The Maximum Repetition Rate of Syllables and Multisyllabic Combinations Test (MRR; Kent, Kent, & Rosenbek, 1987; cf. Arnett, Smith, Barwick, Benedict, & Ahlstrom, 2008), in which the respondent must repeat phonemes (e.g., “ba–ta–ka”) as rapidly as possible for 6 seconds, was used to quantify dysarthria. For this measure, we recorded the number of triplicate phonemes repeated correctly. The Rolyan 9-Hole Peg Test (9HPT; Mathiowetz, Weber, Kashman, & Volland, 1985) requires the participant to insert and then remove nine pegs from holes in a pegboard as quickly as possible. We recorded the average number of seconds required to complete the task twice with each hand, as well as the average across all four trials.

Published norms used in the current study included those provided in the CVLT2, BVMTR, and DKEFS manuals. For the JLO and the COWAT, we used the norms provided by Benton, Sivan, Hamsher, Varney, and Spreen (1994). For the PASAT and SDMT, we used the norms provided by Rao (1991).

Statistical Analyses

Throughout the study, the threshold for statistical significance was p < .05. For some analyses, effect sizes were calculated with the d statistic. Group differences in age, education, depression, MRR, 9HPT, and performance on the MACFIMS battery measures were evaluated using analyses of variance (ANOVA). Group differences for sex and ethnicity were examined using chi-square analyses.

Demographically adjusted T scores were calculated for MS patients based on the healthy group's scores. The general procedures are described elsewhere (Heaton et al., 2004; Ivnik et al., 1992; Testa et al., submitted). We first converted the control group's raw scores on each neuropsychological measure to scaled scores (M = 10, SD = 3) using the cumulative frequency distribution of each measure. This served to normalize all of the test score distributions (see Table 1). We then regressed the resulting scaled scores on age, age-squared, sex, and education, entered en bloc. Plots of regression-standardized residuals against predicted values showed that the assumption of homoscedasticity was not violated. Next, we converted the MS participants' raw test scores to scaled scores using the raw-to-scaled-score conversions derived from the healthy controls. We then applied the multiple regression equations derived from the healthy controls to compute demographically predicted scaled scores for each MS participant. These predicted scores were subtracted from each participant's actual scaled scores, and the differences were divided by the standard deviation of the control group's raw residuals for each measure (Table 2). Finally, the resulting standardized values were converted to T scores (M = 50, SD = 10).
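The Python sketch below illustrates the sequence of steps described above on simulated data; the variable names, simulated scores, and percentile-based raw-to-scaled conversion are our own assumptions, not the study's code or data.

```python
# Minimal sketch of the norming pipeline described above, with invented data.
import numpy as np
from statistics import NormalDist

rng = np.random.default_rng(0)

# --- hypothetical control data: raw test score plus demographics ----------
n = 100
age = rng.uniform(20, 60, n)
sex = rng.integers(1, 3, n)            # 1 = male, 2 = female (as in Table 4)
edu = rng.integers(12, 19, n)
raw = 30 + 0.2 * edu - 0.1 * age + rng.normal(0, 5, n)   # fake raw scores

# 1. Raw -> scaled scores (M = 10, SD = 3) via the cumulative distribution.
def raw_to_scaled(x, reference):
    """Percentile of x within the control distribution, mapped to M=10, SD=3."""
    n_ref = len(reference)
    count = np.searchsorted(np.sort(reference), x, side="right")
    pct = min(max(count / (n_ref + 1), 1 / (n_ref + 1)), n_ref / (n_ref + 1))
    return 10 + 3 * NormalDist().inv_cdf(pct)

scaled = np.array([raw_to_scaled(x, raw) for x in raw])

# 2. Regress scaled scores on age, age^2, sex, and education (entered together).
X = np.column_stack([np.ones(n), age, age**2, sex, edu])
beta, *_ = np.linalg.lstsq(X, scaled, rcond=None)
sd_resid = np.std(scaled - X @ beta, ddof=X.shape[1])     # residual SD

# 3. Score a hypothetical patient against the control-derived model.
p_age, p_sex, p_edu, p_raw = 49, 2, 14, 31
p_scaled = raw_to_scaled(p_raw, raw)
p_pred = np.array([1, p_age, p_age**2, p_sex, p_edu]) @ beta
T = 50 + 10 * (p_scaled - p_pred) / sd_resid
print(round(T, 1))
```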

Table 1. Raw score to scaled score conversions

Table 2. Standard deviation of the residual from healthy controls

Paired sample t tests were used to evaluate MS participants’ T scores based on our regression models and T scores derived from published norms. In addition, MS performance on each neuropsychological measure was classified as either intact (T > 35) or impaired (T ≤ 35) based on T scores derived from each norming method. McNemar tests of dependent proportions were then used to determine if the proportion of participants classified as impaired on each measure differed depending on which norms were used.
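As an illustration, the test of dependent proportions can be run on a 2 × 2 cross-classification of impairment calls under the two norming methods; the counts below are invented, and the statsmodels call is one possible implementation rather than the software actually used in the study.

```python
# Sketch of a McNemar test on a hypothetical 2x2 table of impairment calls
# (rows: published norms intact/impaired; columns: regression norms intact/impaired).
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

table = np.array([[220, 15],    # intact/intact,   intact/impaired
                  [60, 100]])   # impaired/intact, impaired/impaired
result = mcnemar(table, exact=False, correction=True)
print(result.statistic, result.pvalue)
```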

For the neurological and psychiatric disability measures, MS patients were assigned to one of three groups based on degree of pathology. For both the MRR and 9HPT, which were normally distributed, patients were classified as follows: Impaired = scores more than 1.5 SDs below the control group mean; Borderline = scores between 0.5 and 1.5 SDs below the control group mean; and Normal = scores less than 0.5 SDs below the control group mean, or better. For the BDI-FS, which was not normally distributed in the MS sample, the following cut-off scores were used, consistent with the test manual: Normal = BDI-FS < 3; Borderline = BDI-FS 3–8; Depressed = BDI-FS > 8.
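A minimal sketch of this three-level grouping rule follows, assuming hypothetical control statistics and that higher scores indicate better performance (for a timed measure such as the 9HPT, the sign of the deviation would be reversed before classification).

```python
# Classification rules from the text; example inputs are invented.
def classify(score, control_mean, control_sd):
    z = (score - control_mean) / control_sd
    if z < -1.5:
        return "impaired"
    if z < -0.5:
        return "borderline"
    return "normal"

def classify_bdi_fs(total):
    # Cut-offs from the text: < 3 normal, 3-8 borderline, > 8 depressed.
    if total > 8:
        return "depressed"
    if total >= 3:
        return "borderline"
    return "normal"

print(classify(score=38.0, control_mean=45.0, control_sd=5.0))  # borderline
print(classify_bdi_fs(5))                                       # borderline
```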

RESULTS

As shown in Table 3, MS patients reported greater depression than healthy controls on the BDI-FS, F(1, 493) = 51.46, p < .001. The patients also produced fewer triplicate phonemes on the MRR, F(1, 376) = 14.52, p < .001, and were slower to complete the 9HPT, F(1, 461) = 5.78, p < .05. The MS patients performed more poorly than healthy controls on all cognitive measures of the MACFIMS (all p's ≤ .05).

Table 3. MS patients compared to healthy controls

Table 4 shows the normal control regression models used to derive T scores for the MACFIMS. All models include age, age-squared, sex (male = 1; female = 2), and education. MS raw scores on each of the MACFIMS measures were converted to T scores based on the regression-based norms as described earlier. For example, consider a 49-year-old female patient with 14 years of education. Her predicted scaled score on the COWAT is 10.23 [7.456 + 49(–0.264) + 49²(0.003) + 2(1.921) + 14(0.333)]. Her actual COWAT raw score of 33 corresponds to a scaled score of 8, according to Table 1. We then divide the difference between her actual and predicted scaled scores (8.0 – 10.23 = –2.23) by the standard deviation of the residual (2.83566), and obtain a z score of –0.79, which equals a T score of 42.
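The arithmetic of this worked example can be reproduced directly from the quoted coefficients; the short script below is illustrative only and simply restates the calculation in code.

```python
# Reproduces the worked COWAT example using the coefficients quoted in the text
# (intercept 7.456; age -0.264; age^2 0.003; sex 1.921; education 0.333;
# residual SD 2.83566).
age, sex, edu = 49, 2, 14          # sex coded male = 1, female = 2
predicted = 7.456 + age * -0.264 + age**2 * 0.003 + sex * 1.921 + edu * 0.333
actual_scaled = 8                   # scaled score for a raw COWAT of 33 (Table 1)
z = (actual_scaled - predicted) / 2.83566
T = 50 + 10 * z
print(round(predicted, 2), round(z, 2), round(T))   # 10.23  -0.79  42
```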

Table 4. Final regression models for MACFIMS measures

Comparison of Norms

Table 5 shows mean T scores for MS patients calculated using each method. Compared to published norms, the regression-based norms resulted in higher T scores on the JLO (p < .01), CVLT2-TL (p < .001), PASAT 3.0 (p < .001), and PASAT 2.0 (p < .001). Conversely, regression-based norms resulted in lower T scores for the COWAT (p < .001), CVLT2-DR (p < .001), BVMTR-TL (p < .001), BVMTR-DR (p < .001), the SDMT (p < .001), the DKEFS-CS (p < .001), and the DKEFS-DS (p < .001).

Table 5. Comparison of MS patients’ mean T scores for the MACFIMS calculated from published norms versus regression-based norms

** p < .01.

*** p < .001.

Performance on each measure of the MACFIMS was classified as impaired or intact based on T scores derived from each set of norms. The proportion of intact to impaired MS patients was calculated for each measure. As seen in Table 6, the regression-based norms resulted in significantly more patients being classified as impaired on the CVLT2-DR (p < .05), BVMTR-TL (p < .001), BVMTR-DR (p < .001), SDMT (p < .001), DKEFS-CS (p < .001), and DKEFS-DS (p < .001). On the other hand, the published, discrete norms classified significantly more patients as impaired on the CVLT2-TL (p < .001), PASAT 3.0 (p < .001), and PASAT 2.0 (p < .001). Both sets of norms classified similar numbers of patients as impaired on the COWAT and JLO.

Table 6. Number of impaired patients classified by discrete and regression-based norms

* p < .05.

*** p < .001.

Comparisons of Groups Based on Neurologic Symptoms

On the MRR, 119 patients were classified as normal (group 1), 89 were classified as borderline (group 2), and 93 were classified as impaired (group 3). These subgroups did not differ on age, education, sex, or race. A multivariate analysis of variance (MANOVA) on MACFIMS T scores derived from the regression-based norms revealed significant group differences, F(22, 576) = 1.74, p = .020. Follow-up analyses were conducted and, as seen in Table 7, significant group differences were found on the CVLT2-TL, F(2, 298) = 4.02, p = .019, CVLT2-DR, F(2, 298) = 3.26, p = .040, PASAT 3.0, F(2, 298) = 3.66, p = .027, and SDMT, F(2, 298) = 11.991, p < .001. On the CVLT2-TL and SDMT, group 1 performed better than groups 2 and 3. On the PASAT 3.0, group 1 performed better than group 3 only. On the CVLT2-DR, group 1 performed better than groups 2 and 3, although these comparisons only approached significance (p’s < .09).

Table 7. MRR3 Group Performance on the MACFIMS (regression-based T scores)

On the 9HPT, there were 135 patients classified as normal (group 1), 94 as borderline (group 2), and 153 as impaired (group 3). These subgroups differed in age, F(2, 379) = 20.68, p < .001, with group 1 being significantly younger (42.45 ± 8.26 years) than groups 2 (47.52 ± 9.18 years) and 3 (48.80 ± 8.64 years). The groups did not significantly differ in education, sex, or race. A MANOVA revealed significant group differences on the MACFIMS, F(22, 738) = 7.280, p < .001. As seen in Table 8, the groups differed on all measures (all p's < .05). Group 1 performed significantly better than group 3 on all measures (all p's < .05). Additionally, group 1 performed significantly better than group 2 on the COWAT (p < .05), CVLT2-TL (p < .05), CVLT2-DR (p < .05), PASAT 2.0 (p < .01), and SDMT (p < .01). Group 2 performed significantly better than group 3 on the JLO (p < .05), CVLT2-TL (p < .01), CVLT2-DR (p < .05), PASAT 3.0 (p < .05), and SDMT (p < .001). When a multivariate analysis of covariance (MANCOVA) was conducted with age as a covariate, significant group differences were again found, F(22, 736) = 7.64, p < .001, and group differences were found for all measures (all p's < .01).

Table 8. 9HPT Group Performance on the MACFIMS (regression-based T scores)

On the BDI-FS, 200 patients were classified as normal (group 1), 159 as borderline (group 2), and 36 as depressed (group 3). These groups did not differ in age, education, or sex. Group 3 did, however, include fewer African-Americans and more patients of “other” racial background than groups 1 and 2, χ²(4) = 14.33, p = .006. Nevertheless, as shown in Table 9, a MANOVA revealed no group differences on the MACFIMS.

Table 9. BDI-FS Group Performance on the MACFIMS (regression-based T scores)

DISCUSSION

The purposes of this study were threefold: (1) to establish regression-based norms for use with the MACFIMS, (2) to compare these norms with the traditional manual-based norms published for each test, and (3) to assess the relationship between cognitive performance and neurological and psychiatric factors.

To our knowledge, this is the first application of regression-based normative techniques to the MACFIMS. This approach enables one to compare performance across tests directly, because the norms are derived from a uniform data set (i.e., they are co-normed), rather than from different standardization samples for each test. In the regression analyses we included the demographic variables traditionally used in regression-based norms: age, age², sex, and education. Including a term for age-squared allowed us to consider the nonlinear relationship between age and cognition. The regression-based approach to norms development enables one to (a) account for demographic influences on test performance, and (b) use the entire normative sample rather than divide it into smaller subgroups for the computation of age- or education-stratified means and standard deviations. Because the control group was demographically matched to the larger clinical sample, and because the clinical sample is representative of MS clinical patients, these norms may be applicable in other regions of the USA. A challenge to the regression approach, however, is determining which variables are reasonable predictors. There is ample evidence that age, education, and, to a lesser extent, sex correlate with neuropsychological test performance. Other variables can be included, but it is important to consider whether they make theoretical sense. For example, height might be a statistically significant predictor, but there is little theoretical basis for including it in regression-based normative equations. In the future, we may pursue similar regression-based approaches within the MS sample to account for the influence of peripheral or neurological factors, such as MRR and 9HPT performance.

Practical issues should also be considered when a clinician is deciding whether to use regression-based norms. As Van der Elst, Van Boxtel, Van Breukelen, and Jolles (2006a, 2006b) observed, many clinicians are unfamiliar with such norms and might find them cumbersome; these authors recommend implementing the equations in a computer spreadsheet to overcome this. On the other hand, an advantage of regression-based norms is that the clinician no longer has to compile separate norms from several sources to evaluate the patient.

Although several authors advocate using regression-based norms (Crawford & Howell, 1998; Heaton, Avitable, Grant, & Matthews, 1999; Zachary & Gorsuch, 1985), others (Reitan & Wolfson, 1995, 1997) argue that demographic variables predict cognitive test performance in healthy, neurologically intact individuals, but that these relationships are uncoupled by brain damage or dysfunction. Fastenau (1998) also cautioned against using regression-based norms, based on the view that such models can distort findings; for example, he reported that such models penalized highly educated adults on the Trail Making Test, Boston Naming Test, and Wisconsin Card Sorting Test (WCST), while overcorrecting for age on the WCST (Fastenau, 1998). These arguments have been challenged (Shuttleworth-Jordan, 1997; Vanderploeg, Axelrod, Sherer, Scott, & Adams, 1997) on both methodological and theoretical grounds. Indeed, while Heaton, Avitable, Grant, and Matthews (1999) agreed that regression-based norms can be misleading, they noted that this results from an inappropriate sample from which the models are derived and is not inherent to these types of norms. However, regression-based norms may not be appropriate if the referral question focuses on real-world functioning. In a recent paper, Silverberg and Millis (2009) argue that, while demographically adjusted norms can determine whether a patient has deteriorated from baseline (i.e., impairment), norms that do not account for demographic variables better predict a person's current functional abilities or lack thereof (i.e., deficiency), such as the ability to drive or live independently. Thus, the clinician will need to decide which type of norm is most appropriate, those that account for demographic variables or those that do not, depending on the referral question.

Our findings show that regression-based norms yield significantly different T scores than discrete norms published for each test. As a result, T scores were significantly altered for all test variables: COWAT, JLO, CVLT2-TL, CVLT2-DR, BVMTR-TL, BVMTR-DR, PASAT 3.0, PASAT 2.0, SDMT, DKEFS-CS, and the DKEFS-DS. This has important clinical implications because whether or not a person is diagnosed with a cognitive disorder or dementia can vary depending, in part, on which norms are used. Our results suggest that published norms may be inadequate for interpreting performance on the MACFIMS. Compared to our regression-based adjustments, published norms resulted in significantly lower rates of impairment for MS patients on the CVLT2-DR, the BVMTR, SDMT, and the DKEFS Sorting Test. Most of the published norms that we used were stratified by age and education, but accounted for these as categorical rather than continuous variables. Consequently, discrete norms might distort the effect of these variables, such as the impact of education on the DKEFS Sorting Test.

It should be borne in mind that our study was not designed to determine which norming method is most valid. The manual norms employed in this study were collected by different researchers at different times; some of these data were published in 1991 (e.g., for the PASAT and SDMT), while others were published more recently (e.g., the DKEFS). One of our goals was to collect and derive new normative data based on a single sample, so that comparisons across tests would be more valid. As we do not have a gold standard for comparison, we cannot determine which norms are most valid. However, we note a few observations that would seem to support the regression-based approach. First, only 4 to 5% of patients are impaired on the DKEFS using the manual norms, yet we and other researchers have previously shown that the DKEFS is as sensitive as the Wisconsin Card Sorting Test in MS (Beatty & Monson, 1996; Parmenter et al., 2007). For reasons that are not clear, the manual norms for this test appear to be generous, possibly compromising the test's sensitivity.

We also examined how neurological symptoms other than cognitive impairment might affect neuropsychological functioning. As mentioned previously, we hypothesized that patients with dysarthria, as measured by reduced performance on the maximum repetition rate test, would show reduced performance on measures reliant on speech (i.e., the COWAT, CVLT2, SDMT, and the PASAT). Additionally, we hypothesized that the severity of neurological abnormality, such as impaired performance on the MRR and 9HPT, would moderate our findings. For example, patients with poor manual dexterity, as measured by the 9HPT, would be expected to perform more poorly on measures dependent on manual dexterity, such as the BVMTR. Consistent with this, we found that patients impaired on the MRR performed significantly worse on several measures reliant on speech, such as the CVLT2, PASAT 3.0, and the SDMT (although no significant differences were found on the COWAT or PASAT 2.0). These findings suggest that neurological symptoms separate from cognition may influence performance on neuropsychological measures. The pattern differed for the other MS symptoms, however. MS patients with reduced manual dexterity not only performed more poorly on measures dependent on this function, but performed more poorly on all measures of the MACFIMS. Conversely, no differences on the MACFIMS were found for patients grouped according to depression. Reduced performance on the 9HPT may represent more advanced disease, related to global cognitive impairment. The reason for the absence of findings related to severity of depression is unclear, as these data are in direct contrast to previous findings obtained with similar methods for defining depression (Arnett et al., 1999a, 1999b; Thornton & Raz, 1997).

One limitation of our study is the relatively small number of healthy controls on which our regression models were based. Even though one advantage of using regression-based norms is that fewer people are needed for the normative sample, a larger number would provide the models derived for this analysis with greater stability. Thus, deriving models from a larger normative sample may be a useful future endeavor.

In sum, our findings suggest that existing discrete norms might overlook impairment in patients with MS, which could prevent them from receiving needed assistance. Additionally, by controlling the effects of demographic variables, we can better appreciate how other MS symptoms might contribute to neuropsychological performance, such as the effect of dysarthria on tests that require oral responses. Thus, these norms provide an alternative when using the MACFIMS to evaluate a patient’s cognitive functioning and to investigate how MS affects cognition.

ACKNOWLEDGMENTS

There were no sources of financial support for this article. The authors have no financial or other relationships that could be interpreted as a conflict of interest with regard to this article.

REFERENCES

Arnett, P.A., Higginson, C.I., Voss, W.D., Bender, W.I., Wurst, J.M., & Tippin, J.M. (1999a). Depression in multiple sclerosis: Relationship to working memory capacity. Neuropsychology, 13(4), 546–556.
Arnett, P.A., Higginson, C.I., Voss, W.D., Wright, B., Bender, W.I., Wurst, J.M., & Tippin, J.M. (1999b). Depressed mood in multiple sclerosis: Relationship to capacity-demanding memory and attentional functioning. Neuropsychology, 13(3), 434–446.
Arnett, P.A., Smith, M.M., Barwick, F.H., Benedict, R.H.B., & Ahlstrom, B.P. (2008). Oralmotor slowing in multiple sclerosis: Relationship to neuropsychological tasks requiring an oral response. Journal of the International Neuropsychological Society, 14, 454–462.
Beatty, W.W., & Monson, N. (1996). Problem solving by patients with multiple sclerosis: Comparison of performance on the Wisconsin and California Card Sorting tests. Journal of the International Neuropsychological Society, 2, 134–140.
Beck, A.T., Steer, R.A., & Brown, G.K. (2000). BDI-Fast Screen for Medical Patients: Manual. San Antonio, TX: Psychological Corporation.
Benedict, R.H.B. (1997). Brief Visuospatial Memory Test–Revised: Professional Manual. Odessa, FL: Psychological Assessment Resources.
Benedict, R.H.B., Cookfair, D., Gavett, R., Gunther, M., Munschauer, F., Garg, N., & Weinstock-Guttman, B. (2006). Validity of the minimal assessment of cognitive function in multiple sclerosis (MACFIMS). Journal of the International Neuropsychological Society, 12, 549–558.
Benedict, R.H.B., Fischer, J.S., Archibald, C.J., Arnett, P.A., Beatty, W.W., Bobholz, J., et al. (2002). Minimal neuropsychological assessment of MS patients: A consensus approach. The Clinical Neuropsychologist, 16(3), 381–397.
Benedict, R.H.B., Fishman, I., McClellan, M.M., Bakshi, R., & Weinstock-Guttman, B. (2003). Validity of the Beck Depression Inventory–Fast Screen in multiple sclerosis. Multiple Sclerosis, 9, 393–396.
Benton, A.L., Sivan, A.B., Hamsher, K., Varney, N.R., & Spreen, O. (1994). Contributions to Neuropsychological Assessment (2nd ed.). New York: Oxford University Press.
Crawford, J.R., & Allan, K.M. (1997). Estimating premorbid WAIS-R IQ with demographic variables: Regression equations derived from a UK sample. The Clinical Neuropsychologist, 11(2), 192–197.
Crawford, J.R., & Howell, D.C. (1998). Regression equations in clinical neuropsychology: An evaluation of statistical methods for comparing predicted and obtained scores. Journal of Clinical and Experimental Neuropsychology, 20(5), 755–762.
Delis, D.C., Kramer, J.H., Kaplan, E., & Ober, B.A. (2000). California Verbal Learning Test Manual: Second Edition, Adult Version. San Antonio, TX: Psychological Corporation.
Delis, D.C., Kaplan, E., & Kramer, J.H. (2001). Delis-Kaplan Executive Function System. San Antonio, TX: Psychological Corporation.
Fastenau, P.S. (1998). Validity of regression-based norms: An empirical test of the comprehensive norms with older adults. Journal of Clinical and Experimental Neuropsychology, 20(6), 906–916.
Feinstein, A. (2006). Mood disorders in multiple sclerosis and the effects on cognition. Journal of the Neurological Sciences, 245, 63–66.
Feinstein, A., Roy, P., Lobaugh, N., Feinstein, K., O'Connor, P., & Black, S. (2004). Structural brain abnormalities in multiple sclerosis patients with major depression. Neurology, 62, 586–590.
Fischer, J.S., Foley, F.W., Aikens, J.E., Ericson, G.D., Rao, S.M., & Shindell, S. (1994). What do we really know about cognitive dysfunction, affective disorders, and stress in multiple sclerosis? A practitioner's guide. Journal of Neurologic Rehabilitation, 8, 151–164.
Heaton, R.K., Avitable, N., Grant, I., & Matthews, C.G. (1999). Further cross validation of regression-based neuropsychological norms with an update for the Boston Naming Test. Journal of Clinical and Experimental Neuropsychology, 21(4), 572–582.
Heaton, R.K., Ryan, L., Grant, I., & Matthews, C.G. (1996). Demographic influences on neuropsychological test performance. In I. Grant & K. Adams (Eds.), Neuropsychological assessment of neuropsychiatric disorders. New York: Oxford University Press.
Kent, R.D., Kent, J.F., & Rosenbek, J.C. (1987). Maximum performance tests of speech production. Journal of Speech and Hearing Disorders, 52, 367–387.
Klein, C., Foerster, F., & Hartnegg, K. (2007). Regression-based developmental models exemplified for Wisconsin Card Sorting Test parameters: Statistics and software for individual predictions. Journal of Clinical and Experimental Neuropsychology, 29(1), 25–35.
Kurtzke, J.F. (1983). Rating neurologic impairment in multiple sclerosis: An expanded disability status scale (EDSS). Annals of Neurology, 13, 227–231.
Leckliter, I.N., & Matarazzo, J.D. (1989). The influence of age, education, IQ, gender, and alcohol abuse on Halstead-Reitan neuropsychological test battery performance. Journal of Clinical Psychology, 45(4), 484–512.
Lublin, F.D., & Reingold, S.C. (1996). Defining the clinical course of multiple sclerosis: Results of an international survey. National Multiple Sclerosis Society (USA) Advisory Committee on Clinical Trials of New Agents in Multiple Sclerosis. Neurology, 46, 907–911.
Mathiowetz, V., Weber, K., Kashman, N., & Volland, G. (1985). Adult norms for the nine hole peg test of finger dexterity. The Occupational Therapy Journal of Research, 5, 24–37.
McDonald, W.I., Compston, A., Edan, G., Goodkin, D.E., Hartung, H., Lublin, F., McFarland, H.F., Paty, D.W., Polman, C.H., Reingold, S.C., Sandberg-Wollheim, M., Sibley, W.A., Thompson, A., van der Noort, S., Weinshenker, B.Y., & Wolinsky, J.S. (2001). Recommended diagnostic criteria for multiple sclerosis: Guidelines from the international panel on the diagnosis of multiple sclerosis. Annals of Neurology, 50, 121–127.
Parmenter, B., Zivadinov, R., Kerenyi, L., Gavett, R., Weinstock-Guttman, B., Dwyer, M., et al. (2007). Validity of the Wisconsin Card Sorting and Delis-Kaplan Executive Function System (DKEFS) sorting tests in multiple sclerosis. Journal of Clinical and Experimental Neuropsychology, 29, 215–223.
Peyser, J.M., Edwards, K.R., Poser, C.M., & Filskov, S.B. (1980). Cognitive function in patients with multiple sclerosis. Archives of Neurology, 37, 577–579.
Polman, C.H., Reingold, S.C., Edan, G., Filippi, M., Hartung, H.P., Kappos, L., et al. (2005). Diagnostic criteria for multiple sclerosis: 2005 revisions to the “McDonald Criteria.” Annals of Neurology, 58, 840–846.
Rao, S.M. (1991). A manual for the brief repeatable battery of neuropsychological tests in multiple sclerosis. New York: National MS Society.
Rao, S.M., Leo, G.J., Bernardin, L., & Unverzagt, F. (1991). Cognitive dysfunction in multiple sclerosis. I. Frequency, patterns, and prediction. Neurology, 41, 685–691.
Reitan, R.M., & Wolfson, D. (1995). Influence of age and education on neuropsychological test results. The Clinical Neuropsychologist, 9(2), 151–158.
Reitan, R.M., & Wolfson, D. (1997). The influence of age and education on neuropsychological performances of persons with mild head injuries. Applied Neuropsychology, 4(1), 16–33.
Schretlen, D.J., Cascella, N.G., Meyer, S.M., Kingery, L.R., Testa, S.M., Munro, C.A., Pulver, A.E., Rivkin, P., Rao, V.A., Diaz-Asper, C.M., Dickerson, F.B., Yolken, R.H., & Pearlson, G.D. (2007). Neuropsychological functioning in bipolar disorder and schizophrenia. Biological Psychiatry, 62, 179–186.
Sherrill-Pattison, S., Donders, J., & Thompson, E. (2000). Influence of demographic variables on neuropsychological test performance after traumatic brain injury. The Clinical Neuropsychologist, 14(4), 496–503.
Shuttleworth-Jordan, A.B. (1997). Age and education effects on brain-damaged subjects: “Negative” findings revisited. The Clinical Neuropsychologist, 11(2), 205–209.
Silverberg, N.D., & Millis, S.R. (2009). Impairment versus deficiency in neuropsychological assessment: Implications for ecological validity. Journal of the International Neuropsychological Society, 15, 94–102.
Smith, A. (1982). Symbol Digit Modalities Test: Manual. Los Angeles: Western Psychological Services.
Testa, S.M., Winicki, J.M., Pearlson, G.D., Gordon, B., & Schretlen, D.J. (submitted). Accounting for estimated IQ in neuropsychological test performance with regression-based norms.
Thornton, A., & Raz, N. (1997). Memory impairment in multiple sclerosis: A quantitative review. Neuropsychology, 11(3), 357–366.
Van Breukelen, G.J.P., & Vlaeyen, J.W.S. (2005). Norming clinical questionnaires with multiple regression: The Pain Cognition List. Psychological Assessment, 17(3), 336–344.
Van der Elst, W., Van Boxtel, M.P.J., Van Breukelen, G.J.P., & Jolles, J. (2005). Rey's verbal learning test: Normative data for 1855 healthy participants aged 24–81 years and the influence of age, sex, education, and mode of presentation. Journal of the International Neuropsychological Society, 11, 290–302.
Van der Elst, W., Van Boxtel, M.P.J., Van Breukelen, G.J.P., & Jolles, J. (2006a). The Stroop Color-Word Test: Influence of age, sex, and education; and normative data for a large sample across the adult age range. Assessment, 13(1), 62–79.
Van der Elst, W., Van Boxtel, M.P.J., Van Breukelen, G.J.P., & Jolles, J. (2006b). The Concept Shifting Test: Adult normative data. Psychological Assessment, 18(4), 424–432.
Vanderploeg, R.D., Axelrod, B.N., Sherer, M., Scott, J., & Adams, R.L. (1997). The importance of demographic adjustments on neuropsychological test performance: A response to Reitan and Wolfson (1995). The Clinical Neuropsychologist, 11(2), 211–217.
Zachary, R.A., & Gorsuch, R.L. (1985). Continuous norming: Implications for the WAIS-R. Journal of Clinical Psychology, 41(1), 86–94.