Comparing short form and RAND physical and mental health summary scores: Results from total hip arthroplasty and high-risk primary-care patients

Published online by Cambridge University Press:  28 May 2004

Chris M. Blanchard
Affiliation:
American Cancer Society
Isabelle Côté
Affiliation:
Innovus Research Inc
David Feeny
Affiliation:
Institute of Health Economics; University of Alberta; Health Utilities Incorporated

Abstract

Objectives: Summary physical health scores for the Short Form (SF) measures are computed using positive weights for physical items and negative weights for mental health items. Mental health summary scores use positive weights for mental items and negative weights for physical items. The RAND Health Status Inventory (HSI) measures do not use negative weights. Do these different approaches to scoring matter? The objective was to compare summary scores using both the SF and RAND-HSI.

Methods: SF-36 and the Health Utilities Index Mark 3 (HUI3) were administered to a cohort of patients waiting for elective total hip arthroplasty (THA). SF-12 and HUI3 were administered to a cohort of high-risk primary-care patients. Summary scores were generated and compared. Single-attribute utility scores for emotion in HUI3 were also computed. Canadian and US norms for SF, RAND-HSI, and HUI3 were used to interpret results.

Results: For THA patients, mean physical health scores were 28 and 36 for SF and RAND-HSI. Mean mental health scores were 55 and 42. For the primary-care patients, the scores were 34 and 36 for physical and 46 and 40 for mental health.

Conclusions: SF and RAND-HSI provided somewhat similar summary scores in the THA study. However, SF and RAND-HSI mental health scores differed in the primary-care patient cohort and results from HUI3 corroborate the mental health deficits identified by the RAND-HSI. It may be wise for investigators to use both SF and RAND-HSI scoring systems.

Type
GENERAL ESSAYS
Copyright
© 2004 Cambridge University Press

Are physical and mental health distinct or do they interact? One prominent family of health status measures, the Medical Outcomes Study Short-Form (SF) set of measures (21), assumes that physical and mental health are independent and uncorrelated (orthogonal factor rotation). Another prominent and closely related set of measures are those of RAND. The RAND 1.0 (9) permits interdependence and correlation between physical and mental health (oblique factor rotation). In a subsequent version, the RAND Health Status Inventory (HSI) (10;13), scores permit interdependence and are also based on item-difficulty weights derived from item-response theory (IRT). (References on IRT include Hays (10–12) and Revicki and Cella (17).)

In computing the SF Physical Component Summary Score (PCS), positive weights are attached to scores for items on physical functioning, role-physical, bodily pain, and general health, and negative weights are assigned to items on vitality, social functioning, role-emotional, and mental health. Similarly, in computing the SF Mental Component Summary Score (MCS), positive weights are attached to scores for items on vitality, social functioning, role-emotional, and mental health, and negative weights are attached to items on physical functioning, role-physical, bodily pain, and general health. In contrast, no negative weights are applied in the RAND approach.
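The practical effect of the two sign patterns can be seen in a minimal Python sketch. It assumes that a summary score is a weighted sum of standardized domain scores rescaled to a mean of 50 and a standard deviation of 10. All weights below are hypothetical and chosen only to mimic the sign patterns described above; they are not the published SF or RAND-HSI coefficients, and the function and variable names are illustrative.

```python
# Eight domains: four physical, then four mental, in the order the text lists them.
PHYSICAL = ["physical_functioning", "role_physical", "bodily_pain", "general_health"]
MENTAL = ["vitality", "social_functioning", "role_emotional", "mental_health"]

# HYPOTHETICAL weights that only reproduce the sign patterns described in the text.
# They are NOT the published SF or RAND-HSI scoring coefficients.
SIGNED_WEIGHTS = {**{d: 0.30 for d in PHYSICAL}, **{d: -0.10 for d in MENTAL}}
NONNEGATIVE_WEIGHTS = {**{d: 0.25 for d in PHYSICAL}, **{d: 0.05 for d in MENTAL}}


def physical_summary(z_scores, weights):
    """Weighted sum of domain z-scores on an assumed mean-50, SD-10 metric."""
    return 50.0 + 10.0 * sum(weights[d] * z_scores[d] for d in PHYSICAL + MENTAL)


def patient(physical_z, mental_z):
    """Build a profile with one z-score for all physical and one for all mental domains."""
    return {**{d: physical_z for d in PHYSICAL}, **{d: mental_z for d in MENTAL}}


# Same physical deficit; the second profile adds a mental health deficit.
physical_only = patient(physical_z=-2.0, mental_z=0.0)
physical_and_mental = patient(physical_z=-2.0, mental_z=-1.5)

for label, weights in [("signed", SIGNED_WEIGHTS), ("non-negative", NONNEGATIVE_WEIGHTS)]:
    a = physical_summary(physical_only, weights)
    b = physical_summary(physical_and_mental, weights)
    print(f"{label:>12}: physical only = {a:.1f}, physical + mental = {b:.1f}")
```

Under the signed weights, adding a mental health deficit raises the physical summary score even though physical health is unchanged; under the non-negative weights it cannot.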

In many applications the SF and RAND approaches to scoring provide similar results. However, in some cases there appear to be important differences between the scores generated by the SF and RAND systems (1;2;16;18;19;20;23). This study reports on results using both the SF and the RAND-HSI to analyze data from two studies, one involving patients waiting for total hip arthroplasty (THA) and one involving primary-care patients at high risk for medication problems. The study examines whether or not results vary by scoring method.

METHODS

Elements common to the methods in both studies will be described first.

Measures

The Short-Form 36 (SF-36) and Short-Form 12 (SF-12) are among the most widely used questionnaires for the assessment of health status (21). SF-36 includes thirty-six items and generates a score for each of the eight domains of health status covered by the measure: physical functioning, role-physical, bodily pain, general health, vitality, social functioning, role-emotional, and mental health. The best score for each domain and summary score is 100, whereas the worst score is 0. SF-12 includes twelve items covering the eight domains.

The RAND-36 and RAND-12 use the same health-status assessment questionnaire as the SF-36 and SF-12 (9). The scoring system for the RAND-HSI, however, differs from the scoring system for the SF measures (10;13). RAND-HSI scores are based on an oblique factor rotation and item-difficulty weights derived using IRT.

The Health Utilities Index Mark 3 (HUI3) is a preference-based generic system (5;6;8). Health status is described according to eight dimensions: vision, hearing, speech, ambulation, dexterity, emotion, cognition, and pain and discomfort. There are five or six levels for each dimension, ranging from highly impaired (for vision, blind) to normal. Single-attribute utility scores are available for each dimension of health status. Normal is assigned a score of 1.00; the most impaired level (for vision, blind) is assigned a score of 0.00. Health states consist of eight-element vectors. Scores for overall health are on the dead = 0.00 and perfect health = 1.00 scale and are based on community preferences in Canada.

Data Analysis

Domain scores were calculated for the SF-36 and RAND36-HSI by using the algorithms outlined by the respective developers for each scale. Similarly, physical and mental health summary scores for the SF-12 and RAND-12 HSI were calculated by using the algorithms specified by the developers. HUI3 scores were derived by using standard HUI procedures.

Patients and Procedure, THA Study

The main study was concerned with documenting health-related quality of life (HRQL) while waiting for, undergoing, and recovering from THA. The study has been described previously (7;15) and will be described here briefly. Before commencing the study, approval was obtained from the local Human Ethics Committee. All patients who were referred for “hip disease” between November 1993 and 1996 to any of seven surgeons performing THA in London, Ontario, were potentially eligible to participate in the study. Eligible patients who provided consent were invited to attend an outpatient department for a baseline assessment. Upon arrival at the clinic, the following data were collected from the patients: (a) age, gender, home address, employment status, duration of hip disease symptoms, and presence of comorbid conditions; (b) SF-36; (c) Health Utilities Index Mark 2 (HUI2) and HUI3 systems; (d) several disease-specific measures of health status and health-related quality of life; and (e) the six-minute walk test. Patients who were put onto a waiting list for THA continued to participate in a longitudinal study examining HRQL after THA (15). For the purposes of the present study, which focused on comparing the SF-36 and RAND-36 HSI, we analyzed data from the baseline assessment (with a larger sample size than at follow-up) of the patients eventually put on the waiting list for THA to focus on patients with documented hip disease severe enough to warrant THA. Given that patients enrolled in the study were candidates for THA, one would expect that the HRQL measures would identify problems in the physical components of health status.

Patients and Procedures, High-Risk Primary Care Patients Study

The main study was concerned with an evaluation of the provision of primary care by multidisciplinary teams of providers (physicians, nurses, and pharmacists) for patients in the community at high risk for medication problems (3;4). The recruitment of the patients into the study occurred between October 1999 and March 2000 in Edmonton, Alberta. Patients were assessed at baseline (study entry) and at 3 and 6 months. Patients completed the SF-12, a questionnaire including the HUI2 and HUI3, the Morisky instrument on adherence, a questionnaire on their utilization of health-care services, a questionnaire on their satisfaction with the care they were receiving, and, at baseline, a questionnaire on their sociodemographic characteristics. Only results from the baseline assessment (n = 199) are used in the analyses reported in this study. The study was approved by the Research Ethics Board, Panel B, the University of Alberta. The heterogeneous nature of the cohort of high-risk primary care patients suggests that both physical and mental health problems would be identified by the HRQL instruments.

Population Norms

For the SF-36, Canadian norms have been established by Hopman et al. (14). According to these norms, the mean scores (standard deviation) for PCS and MCS in the general population are 50.5 (9.0) and 51.7 (9.1), respectively. Canadian and US norms for SF-36 are very similar. For the RAND-36, norms for the United States are available from Hays (10). According to those norms, a physical health composite (PHC) score lower than 43 suggests that individuals perceive physical health problems that impede life functioning, whereas a PHC score higher than 53 means that these individuals are less likely to have physical health problems that impede life functioning. Mental health composite (MHC) scores lower than 39 mean that individuals are likely to report psychological symptoms that might impede life functioning, whereas a score higher than 53 suggests that these individuals are less likely to perceive mental health problems that impede life functioning. Results to date suggest that published norms for SF-36 and RAND-36 can be used to interpret SF-12 and RAND-12 scores. Data on population norms for HUI3 single-attribute utility scores for emotion are from the Statistics Canada 1996–97 National Population Health Survey of persons residing in the community.
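As a rough illustration of how these norms can be applied, the Python sketch below expresses an SF summary score as a distance from the Canadian norm in standard deviation units and places a RAND composite score relative to the two cut-offs quoted above. The function names and the "intermediate" label for scores between the cut-offs are assumptions of this sketch, not part of either scoring manual; the inputs are the mean physical scores reported in the abstract.

```python
# Canadian SF-36 norms quoted in the text: (mean, standard deviation).
SF_NORMS = {"PCS": (50.5, 9.0), "MCS": (51.7, 9.1)}

# RAND-36 cut-offs quoted in the text:
# (below this value problems are suggested, above this value they are less likely).
RAND_CUTOFFS = {"PHC": (43.0, 53.0), "MHC": (39.0, 53.0)}


def sf_deviation(component, score):
    """Distance of an SF summary score from the Canadian norm, in norm SDs."""
    mean, sd = SF_NORMS[component]
    return (score - mean) / sd


def rand_band(component, score):
    """Place a RAND composite score relative to the two quoted cut-offs."""
    low, high = RAND_CUTOFFS[component]
    if score < low:
        return "problems likely"
    if score > high:
        return "problems less likely"
    return "intermediate"  # label assumed for this sketch only


# Mean physical scores from the abstract: (cohort, SF PCS, RAND PHC).
for cohort, pcs, phc in [("THA", 28.0, 36.0), ("primary care", 34.0, 36.0)]:
    print(
        f"{cohort}: PCS is {sf_deviation('PCS', pcs):+.1f} SD from the norm; "
        f"PHC band: {rand_band('PHC', phc)}"
    )
```

Because the reported means are rounded, a score that sits close to a cut-off should be read with caution rather than assigned firmly to one band.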

RESULTS

THA Study

Descriptive statistics are presented in Table 1. As expected, given that the patients were put on the waiting list for THA, both the SF and RAND measures highlight burdens in physical health. For the cohort, SF-36 does not identify problems with mental health. However, the RAND36-HSI results raise the suspicion that there may be mental health problems.

High-Risk Primary Care Patient Study

Descriptive statistics for SF-12 and RAND12-HSI are presented in Table 2. As expected, both measures identify problems with physical health. While the RAND12-HSI also indicates mental health problems, mental health according to the SF-12 is only marginally impaired.

Additional evidence on the mental health of the THA and high-risk patients is provided in Table 3. Single-attribute utility scores for emotion for HUI3 indicate that THA patients had emotion scores consistent with age-matched norms from the general population. Scores for HUI3 emotion for the high-risk primary-care patient population indicate mental health below norms for an age-matched cohort in the general population. For the high-risk patients, the HUI3 results corroborate the RAND12-HSI results for the mental health component.

DISCUSSION

As has been the case in many studies, SF-36 and RAND-36 HSI scores provide somewhat similar information in the cohort of hip replacement patients. The physical health problems of these patients were also documented by a wide variety of disease-specific, generic, and preference-based measures (15). That these patients did not on average have problems with emotional and mental health was also confirmed by other measures.

The results in the high-risk primary care patient cohort, however, are different. Many of these patients had both physical and mental health problems. That mental health problems were evident in the cohort is corroborated by an analysis of the single-attribute utility scores for emotion from HUI3. In this case, the SF-12 and RAND-12 mental health summary scores differ and evidence from HUI3 indicates that the RAND-12 scores are more plausible.

A limitation of the primary-care patient study is that SF-12 rather than SF-36 was used. As a result, corroborative evidence based on the scores for the four mental health domains is not available. A limitation of both studies is the reliance on US rather than Canadian norms for interpreting RAND-HSI scores. However, the Canadian and US norms for the SF measures are very similar, so it would seem unlikely that using US norms for the RAND would materially affect the interpretation of the results.

It is important to compare the results for the primary-care patients with those from several studies reported in the literature. In a prospective study involving patients initiating antidepressant therapy, Simon et al. (18) noted that, even though at the three-month follow-up patients reported modest improvements in the physical health subscales and large improvements in the mental health subscale, the PCS score was unchanged. The moderate physical improvements were completely offset by the large mental health improvements – the effect of the negative weights. Although the domain scores provided an accurate picture of the changes in health experience, the SF-36 physical summary score did not.
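The offsetting described by Simon et al. follows from the form of a weighted summary score. As a sketch, assume the physical summary score is a linear function of standardized domain scores $z_i$ with weights $w_i$ on a mean-50, SD-10 metric. The weights (0.30 and -0.10) and the domain changes (0.25 and 0.75 SD) below are hypothetical, chosen only to make the arithmetic visible; they are not taken from the SF scoring manual or from Simon et al.

```latex
\begin{align*}
\Delta\mathrm{PCS}
  &= 10\Bigl(\sum_{i \in \text{physical}} w_i\,\Delta z_i
     + \sum_{j \in \text{mental}} w_j\,\Delta z_j\Bigr),
  \qquad w_i > 0,\; w_j < 0 \\
  &= 10\bigl(4 \times 0.30 \times 0.25 \;+\; 4 \times (-0.10) \times 0.75\bigr) \\
  &= 10\,(0.30 - 0.30) = 0
\end{align*}
```

With non-negative weights both terms would be non-negative, so improvements in every domain could not cancel in the summary score.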

Wilson et al. (23) applied both orthogonal and oblique factor rotation to SF-36 data from an Australian population health survey and found problems with the PCS and MCS summary scores based on orthogonal factor rotation (see also 2;19;20;22). Wilson et al. note that summary scores calculated without the use of negative weights were consistent with the underlying individual domain scores.

Birbeck et al. (1), in a study of patients with epilepsy, found the SF-36 and SF-12 less responsive to change than the epilepsy-targeted measures. In contrast, the RAND-36 HSI was as responsive as the epilepsy-targeted measures.

In the context of multiple sclerosis (MS), Nortvedt et al. (16) found that the Expanded Disability Status Scale (EDSS) was correlated with each of the four mental health components of the SF-36 but not significantly correlated with the SF-36 MCS score. In contrast, the EDSS was correlated significantly with the RAND MHC summary score. Similar to results reported here for the primary-care cohort, SF mental health scores indicated that the cohort of MS patients was only slightly below the general population, while the RAND MHC indicated that they were substantially below the population norm.

POLICY IMPLICATIONS

One interpretation of the results reported in this study is that in contexts in which patients have both physical and mental health problems, and in particular, if there are interactions between those problems, the negative weights used to compute the summary scores in the SF systems may distort the results. In such circumstances, it may be useful for investigators to recompute the summary scores by using the RAND-HSI to assess the robustness of the results. It may also be wise to use the thirty-six item version, which provides scores for each of the eight domains. Investigators will then be able to examine whether the mental health summary score is consistent with its constituent parts and whether the physical health summary score is consistent with its constituent parts. Finally, investigators might want to consider using other measures of physical and mental health that can corroborate the results to mitigate any problems associated with distortions in the summary scores.

Financial Support: The study, “The Effect of Waiting for Elective Hip Arthroplasty on Health-Related Quality of Life,” was supported by a grant from Physician Services Incorporated (PSI) of Ontario to Dr. Jeffrey Mahon (grant 94–30). The analyses reported in this study were supported by grants from the Alberta Heritage Foundation for Medical Research (AHFMR; 199909) and the Institute of Health Economics (IHE) to David Feeny. PSI, AHFMR, and IHE played no role in the design, interpretation, or analysis of the project and have not reviewed or approved of this manuscript. The authors gratefully acknowledge the input of Dr. Jeffrey L. Mahon, Dr. Robert Bourne, Dr. Cecil Rorabeck, Larry Stitt, and Susan Webster-Bogaert to the Hip Study. The authors also gratefully acknowledge the help of the patients and surgeons who participated in the study (London, Ontario orthopedic surgeons Drs. Harvey Bailey, David Chess, Wayne Grainger, Paul Kim, and Richard McCalden). Financial support for the study of primary health-care teams was provided by a grant from the Health Transition Fund (HTF). Financial support for the analyses presented in this study was provided by a grant from the Merck Company Foundation to the Institute of Health Economics. The HTF, Merck Company Foundation, and IHE played no role in the design, interpretation, or analysis of the project reported here and have not reviewed or approved of this manuscript. The authors gratefully acknowledge the input of Dr. Karen B. Farris, Dr. Jeffrey A. Johnson, Dr. Ross T. Tsuyuki, Sherry L. Dieleman, Sandra Brilliant, Ross Bayne, Leslie Gardiner, Dr. David Moores, and Marj Sandilands to the Primary-Care Study. The authors acknowledge the constructive comments of two anonymous reviewers.

Conflict of interest: It should be noted that David Feeny has a proprietary interest in Health Utilities Incorporated (HUInc.), Dundas, Ontario, Canada. HUInc. distributes copyrighted Health Utilities Index (HUI) materials and provides methodological advice on the use of HUI. None of the other authors have any conflicts of interest.

References

1. Birbeck G, Kim S, Hays RD, Vickery BG. 2000 Quality of life measures in epilepsy. How well can they detect change over time? Neurology. 54: 1822–1827.
2. Coons SJ, Rao S, Keininger DL, Hays RD. 2000 A comparative review of generic quality-of-life instruments. Pharmacoeconomics. 17: 13–35.
3. Côté I, Farris KB, Feeny D, et al. 2002 Using multi-disciplinary teams to improve primary care: Quality of medication use in the community. Institute of Health Economics Working Paper No. 02-01, February.
4. Farris K, Côté I, Feeny D, Johnson JA, et al. Using multi-disciplinary teams to enhance primary health care: A demonstration project. Can Fam Physician. forthcoming.
5. Feeny D, Torrance GW, Furlong W. 1996 Health Utilities Index. In: Spilker B, ed. Quality of life and pharmacoeconomics in clinical trials. Philadelphia: Lippincott-Raven; 239–251.
6. Feeny D, Furlong W, Torrance GW, et al. 2002 Multi-attribute and single-attribute utility functions for the health utilities index Mark 3 system. Med Care. 40: 113–128.
7. Feeny D, Blanchard C, Mahon JL, et al. 2003 Comparing community-preference based and direct standard gamble utility scores: Evidence from elective total hip arthroplasty. Int J Technol Assess Health Care. 19: 361–371.
8. Furlong WJ, Feeny DH, Torrance GW, Barr RD. 2001 The Health Utilities Index (HUI) system for assessing health-related quality of life in clinical studies. Ann Med. 33: 375–384.
9. Hays RD, Sherbourne CD, Mazel RM. 1993 The Rand 36-item health survey 1.0. Health Econ. 2: 217–227.
10. Hays RD. 1998 R36 H.S.I. Rand 36 – 36 health status inventory. Orlando: The Psychological Corporation/Harcourt Brace & Company.
11. Hays RD. 1998 Item response theory models. In: Staquet MJ, Hays RD, Fayers PM, eds. Quality of life assessment in clinical trials: Methods and practice. Oxford: Oxford University Press; 183–190.
12. Hays RD, Morales LS, Reise SP. 2000 Item response theory and health outcomes measurement in the 21st century. Med Care. 38: II28–II42.
13. Hays RD, Morales LS. 2001 The RAND-36 measure of health-related quality of life. Ann Med. 33: 350–357.
14. Hopman WM, Towheed T, Anastassiades T, et al. 2000 Canadian normative data for the SF-36 health survey. Canadian Multicentre Osteoporosis Study Research Group. Can Med Assoc J. 163: 265–271.
15. Mahon JL, Bourne R, Rorabeck C, et al. 2002 The effect of waiting for elective total hip arthroplasty on health-related quality of life. Can Med Assoc J. 167: 1115–1121.
16. Nortvedt MW, Riise T, Myhr KI, Nyland HI. 2000 Performance of the SF-36, SF-12, and RAND-36 summary scales in a multiple sclerosis population. Med Care. 38: 1022–1028.
17. Revicki DA, Cella DF. 1997 Health status assessment for the twenty-first century: Item response theory, item banking, and computer adaptive testing. Qual Life Res. 6: 595–600.
18. Simon GE, Revicki DA, Grothaus L, Vonkorff M. 1998 SF-36 Summary scores. Are physical and mental health truly distinct? Med Care. 36: 567–572.
19. Taft C, Karlsson J, Sullivan M. 2001 Do SF-36 summary component scores accurately summarize subscale scores? Qual Life Res. 10: 395–404.
20. Taft C, Karlsson J, Sullivan M. 2001 Reply to Drs. Ware and Kosinski. Qual Life Res. 10: 415–420.
21. Ware JE, Snow KK, Kosinski M, Gandek B. 1993 SF-36 health survey manual and interpretation guide. Boston: New England Medical Center, The Health Institute.
22. Ware JE, Kosinski M. 2001 Interpreting SF-36 summary health measures: A response. Qual Life Res. 10: 405–413.
23. Wilson D, Parsons J, Tucker G. 2000 The SF-36 summary scales: Problems and solutions. Soz Praventivmed. 45: 239–246.

Table 1. Descriptives for the SF-36 and RAND-36 Domain and Summary Scores

Table 2. Physical and Mental Health Summary Scores of the SF-12 and RAND-12-HSI at Baseline for the High-Risk Primary-Care Patients Study

Table 3. Mean (Standard Deviation) Health Utility Index Mark 3 Single-Attribute Utility Scores for Emotion for Total Hip Arthroplasty Patients and Corresponding Age-Matched Canadian Population Norms