The rapid increase in the use of health-related quality of life (HRQoL) instruments in health care reflects the growing recognition of the importance of understanding the impact of healthcare interventions on patients' lives, rather than just on their bodies (1). HRQoL has become an important consideration in the allocation of resources for health care. Intermittent claudication is a common condition that has a major impact on patients' HRQoL (32). Claudication can become debilitating and mobility limiting, resulting in an impaired HRQoL and the loss of functional independence (15). Patients with claudication complain of leg muscle pain when walking, which can be relieved, usually in a few minutes, by resting. The pain is experienced distal to the arterial stenosis or occlusion. Atherosclerosis is the most common cause of lower limb atherosclerotic disease (LLAD) in Western nations (11).
During the past two decades, hundreds of HRQoL instruments have been developed, and they are being increasingly incorporated into clinical trials to evaluate the effectiveness of health care. Finding the appropriate instrument among so many is difficult. Head-to-head comparisons of the properties of different instruments have become increasingly important to establish the most appropriate one for a particular disease, in this case LLAD. It is important to identify the dimensions that are influenced by the severity and nature of the disease when selecting a suitable HRQoL instrument. Generic instruments cover a broad range of dimensions and allow for comparisons between different groups of patients. Disease-specific instruments are specially designed for a particular disease, patient group, or area of function. Currently, only a few disease-specific instruments exist for patients with LLAD. However, even these instruments are quite new and have only undergone a limited validation process (26).
When choosing the instruments for this study it was considered important that they were applicable to Finnish culture and were generic in nature. Several generic instruments have been used to measure HRQoL of LLAD patients, such as the Sickness Impact Profile (2;31), the EuroQoL (5;6;10;18), the Health Utilities Index Mark III (5–7;10), RAND 36-Item Health Survey 1.0 (4–6), the World Health Organization's Quality of Life Assessment Instrument-100 (8), the Nottingham Health Profile (NHP) (22;23;25;33;34), and the SF-36 (7;17;18;23;33). Previous studies (7;17;18;23;25;33;34) have shown that both the NHP and the SF-36 have advantages and disadvantages in the LLAD patient group. The 15D has been and is being used in a wide range of conditions (see www.15D-instrument.net), but according to a review of the literature, not among patients with LLAD. Therefore, the 15D was chosen to be compared against the NHP, because the latter had been used in Finland previously in this patient group (25). The 15D and the NHP instruments have the additional advantage of being inexpensive to administer.
Outcome measures need to satisfy different criteria to be useful for clinical practice. An ideal HRQoL instrument should be feasible: acceptable to patients, simple and easy to use, and preferably short. Construct validity is one of the most important characteristics, but validation is a lengthy and ongoing process. Construct validation involves gathering external empirical evidence, convergent or discriminant, so that meaningful inferences can be made with the measure. The sensitivity of a measure entails two aspects: first, the ability to distinguish between individuals and groups in different health states cross-sectionally (discriminatory power) and, second, the ability to detect changes in the health status of individuals or groups over time (responsiveness to changes).
Comparisons of HRQoL instruments and their psychometric properties are needed to provide recommendations about their usefulness as outcome measures for LLAD patients. The aim of this study was to compare the psychometric properties of two generic HRQoL instruments, the 15D and the NHP, in terms of feasibility, cross-sectional construct validity, discriminatory power, and responsiveness to change in patients with LLAD.
MATERIALS AND METHODS
Altogether 223 patients with LLAD were invited between 2001 and 2004 to participate in this study at the Clinic of Surgery at the Oulu University Hospital. The patients were informed and asked to participate through a covering letter sent or given at the hospital. Those who gave informed consent formed the study group. It consisted of LLAD patients classified as Fontaine II (intermittent claudication) and scheduled for endovascular treatment (85 patients) or elective surgery (31 patients). The conservatively treated patients (64 patients) were drawn from patient files.
The patients self-administered a set of questionnaires including the HRQoL questionnaires and questions dealing with age, sex, subjective health, and asymptomatic walking distance before and 12 months after the treatment. At baseline, the questionnaires were either mailed to the patients or handed out at the hospital before treatment. The patients completed the questionnaires while the researcher waited or they returned the questionnaires by mail. Approximately 1 year after treatment, the patients were invited to an extra hospital visit. Some of the patients completed the follow-up questionnaires either during the visit or at home, returning them by mail. Questionnaires were also mailed to patients who were unable to come to the extra visit and were asked to return the questionnaires by mail.
The ankle-brachial pressure index (ABI) is an objective measure of the resting ankle-brachial pressure (12). A change in the ABI of at least ±.15 is considered clinically important (35). A higher value signifies an improvement in arterial perfusion by means of collaterals. The ABI was measured at baseline and during the extra visit.
HRQoL was measured by the 15D and the NHP. The 15D is a generic, standardized instrument consisting of 15 dimensions: moving, vision, hearing, breathing, sleeping, eating, speech, elimination, usual activities, mental functioning, discomfort/symptoms, depression, distress, vitality, and sexual activity (30). Each dimension is divided into five levels that distinguish more or less of the attribute. The 15D can be used both as a profile and as a single index score measure. The single index score (15D score) is calculated from the health descriptive system through the use of an additive three-stage valuation model based on the multiattribute utility theory. The preference weights for the 15 dimensions and their levels have been elicited from representative population samples (29). The 15D score is on a 0–1 scale, where 1 stands for “full health” and 0 for being dead. Completing the 15D questionnaire takes 5–10 minutes, and it describes the respondent's current HRQoL profile. A change in the 15D score of ±.03 is clinically important in the sense that patients can on average feel the difference (13).
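The additive aggregation behind a single index score can be illustrated with a simplified sketch. It assumes that the index is the sum, over dimensions, of an importance weight multiplied by the value of the chosen level. The function name, the two-dimension toy instrument, and all weights below are hypothetical and are not the 15D valuation algorithm or its published weights (29;30).

```python
from typing import Mapping, Sequence


def additive_index(levels: Mapping[str, int],
                   importance: Mapping[str, float],
                   level_values: Mapping[str, Sequence[float]]) -> float:
    """Sum of importance weight x level value over all dimensions.

    levels: chosen level per dimension, 1 (best) to 5 (worst).
    A response is required on every dimension, otherwise no score is derived.
    """
    missing = set(importance) - set(levels)
    if missing:
        raise ValueError(f"no response for: {sorted(missing)}")
    return sum(importance[d] * level_values[d][levels[d] - 1]
               for d in importance)


# Hypothetical toy instrument with two dimensions (NOT real 15D weights).
importance = {"mobility": 0.6, "sleeping": 0.4}  # importance weights sum to 1
level_values = {
    "mobility": [1.0, 0.7, 0.5, 0.2, 0.0],
    "sleeping": [1.0, 0.8, 0.5, 0.3, 0.0],
}

best = additive_index({"mobility": 1, "sleeping": 1}, importance, level_values)
mixed = additive_index({"mobility": 2, "sleeping": 3}, importance, level_values)
print(round(best, 2), round(mixed, 2))
```

With importance weights that sum to 1 and level values between 0 and 1, the best level on every dimension yields an index of 1, which mirrors the 0–1 scale described above.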
The developers of the NHP have emphasized that it was made to measure distress caused by health problems and that the items intentionally measure relatively severe problems (20). It can be classified as a generic health measure because it has been made to measure the perceived health and the change in health with a large variety of diseases. The current version of the NHP was published in 1981 (19). The Finnish series of statements and the weights given to the items (24) were approved in 1991 at the Fourth European NHP Symposium in Gothenburg, Sweden.
Part I of the NHP consists of thirty-eight statements dealing with health problems. The statements make up six dimensions of subjective health: physical mobility, pain, sleep, energy, emotional reactions, and social isolation. The respondent is asked to reply “yes” if the statement applies to his or her current status or “no” if the statement does not. The final weight of each “yes” response is determined through a standardization procedure that is applied differently in every country. A patient who has no health problems on the pain dimension, for example, is given an index value of 0 on this dimension, while a patient who has all the health problems mentioned on that dimension receives an index of 100. Part II asks about any effects of health on seven areas of daily life: work, looking after the home, social life, home life, sex life, interests and hobbies, and holidays. The relative importance of the different dimensions has not been defined, and a health profile (e.g., a bar graph) is therefore made from the dimension scores.
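The dimension scoring described above can be sketched as a weighted sum of the “yes” responses. The item names and weights below are hypothetical placeholders, not the Finnish NHP weights (24). The only property taken from the text is that no “yes” responses give 0 and all “yes” responses give 100.

```python
from typing import Mapping


def dimension_score(answers: Mapping[str, bool],
                    weights: Mapping[str, float]) -> float:
    """Sum the weights of the items answered "yes" within one dimension.

    The weights of a dimension are assumed to sum to 100, so the score
    runs from 0 (no problems) to 100 (all listed problems present).
    """
    if set(answers) != set(weights):
        raise ValueError("every item of the dimension needs a yes/no answer")
    return sum(weights[item] for item, yes in answers.items() if yes)


# Hypothetical three-item dimension (NOT the real NHP items or weights).
weights = {"item_a": 40.0, "item_b": 35.0, "item_c": 25.0}

none_yes = dimension_score(
    {"item_a": False, "item_b": False, "item_c": False}, weights)
all_yes = dimension_score(
    {"item_a": True, "item_b": True, "item_c": True}, weights)
one_yes = dimension_score(
    {"item_a": True, "item_b": False, "item_c": False}, weights)
print(none_yes, one_yes, all_yes)
```

Repeating this for each of the six dimensions gives the profile of dimension scores. No overall index is formed, because the relative importance of the dimensions is undefined.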
The data were analyzed using the SPSS software package (version 12.0), and a p value of <.05 was taken to be statistically significant. To derive the 15D score with the valuation algorithm, there must be a response to each question (dimension). The missing data were replaced according to the instructions of the developer of the 15D instrument (30). At baseline, the researcher called twenty-six patients and inquired about the missing data in the NHP. Summary scores and scores for dimensions for both instruments were calculated according to their scoring algorithms (24;30). However, the construct validity analyses were carried out only among patients with complete data for both instruments at baseline (n = 151) and at 12 months (n = 126). The feasibility of the instruments' health state descriptive systems was compared by looking at the response and completion rates.
The cross-sectional construct validity was evaluated for aspects of convergent and discriminant validity by the multitrait–multimethod (MTMM) matrix (14) based on roughly comparable dimensions of the NHP and 15D. These were pain, physical mobility, sleep, emotional reactions, and energy in the NHP, and discomfort and symptoms, mobility, sleeping, depression, distress, and vitality in the 15D. According to Polit and Beck (28), an r of .70 is high for most psychosocial variables; correlations between such variables are typically in the .10–.40 range. For convergent evidence, different sets of scores (the 15D score, level values of 15D dimensions comparable to the NHP dimensions, and the NHP dimension scores) were correlated (Spearman) with the patients' own subjective health status. This was measured using a Visual Analogue Scale (VAS), in which 100 means the best health imaginable and 0 the worst health imaginable. The VAS scale was used as a whole when calculating the correlations of the patients' subjective health status with different sets of scores.
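The Spearman correlations mentioned above can be illustrated with a minimal sketch that ranks both variables (ties receive average ranks) and then computes a Pearson correlation on the ranks. The analyses in this study were run in SPSS; the function names and the example values below are hypothetical and serve only to show the calculation.

```python
from math import sqrt
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical index scores and VAS ratings for five patients.
index_scores = [0.72, 0.80, 0.85, 0.91, 0.95]
vas_ratings = [35, 50, 45, 70, 90]
rho = spearman(index_scores, vas_ratings)
print(round(rho, 2))
```

Because only ranks enter the calculation, the coefficient is unaffected by the different scales of the instruments (0–1 for the 15D score, 0–100 for the NHP dimensions and the VAS).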
As another test for convergent evidence for the 15D score, extreme group comparisons with the t-test were carried out to test the following a priori hypotheses: (i) Patients who report poor subjective health status have a lower average 15D score than patients who report good subjective health status, and the difference between the two groups is statistically significant and clinically important (≤−.03). For this comparison, the VAS scale was divided into three classes (good = VAS score >60, moderate = VAS score 40–60, and poor = VAS score <40). Only the good and poor classes were compared. (ii) Patients who have a severe disease (short asymptomatic walking distance <150 m) have a lower average score than patients who have a less severe disease (walking distance >150 m), and the difference is statistically significant and clinically important.
The same comparisons and hypotheses were also applied to the 15D dimensions comparable to the NHP dimensions. The differences between the groups were expected to be ≤−.03, even if this difference cannot be regarded as a clinically important change in the same sense as for the 15D score. Furthermore, the same comparisons and hypotheses were applied to the NHP dimensions. The differences between the groups were expected to be ≥3 on a 0–100 scale, even if such a difference cannot be regarded as a clinically important change, because it has not been defined for the NHP dimensions.
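The extreme group comparison based on subjective health status can be sketched as follows. The VAS cut-offs are those given above. The pooled-variance (Student) form of the t statistic is an assumption, because the text does not state which variant of the t-test was used, and the patient data are hypothetical. The sketch stops at the t statistic and degrees of freedom; the p value would come from a t distribution in a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def vas_class(vas: float) -> str:
    """Classify subjective health: good >60, moderate 40-60, poor <40."""
    if vas > 60:
        return "good"
    if vas < 40:
        return "poor"
    return "moderate"


def pooled_t(a, b):
    """Two-sample t statistic with pooled variance (assumed variant).

    Returns (mean(a) - mean(b), t statistic, degrees of freedom).
    """
    na, nb = len(a), len(b)
    diff = mean(a) - mean(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = diff / sqrt(sp2 * (1 / na + 1 / nb))
    return diff, t, na + nb - 2


# Hypothetical (VAS rating, index score) pairs; not study data.
patients = [(30, 0.70), (35, 0.74), (25, 0.78), (50, 0.82),
            (70, 0.88), (80, 0.90), (90, 0.92)]

poor = [score for vas, score in patients if vas_class(vas) == "poor"]
good = [score for vas, score in patients if vas_class(vas) == "good"]

diff, t_stat, df = pooled_t(poor, good)
clinically_important = diff <= -0.03  # threshold for the 15D score
print(round(diff, 2), round(t_stat, 2), df, clinically_important)
```

The moderate class is left out of the comparison on purpose, so that only the two extreme groups are contrasted.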
The discriminatory power was assessed at baseline and at 12 months by looking at the percentages of patients who were at the “ceiling” for different dimensions and for the 15D score (=1). The corresponding percentages at the “floor” indicate the range of health states used. Responsiveness to change was assessed at 12 months by effect sizes, calculated as the change in the mean score from baseline to 12 months divided by the standard deviation at baseline (21). Cohen (9) suggests that an effect size of .20 is small, .50 is moderate, and .80 is large. Responsiveness to change was also studied by observing the percentage of patients whose ABI index changed by ≥.15 at 1 year (a clinically important improvement), and similarly the percentage of patients whose level value on the 15D dimensions of mobility and discomfort/symptoms changed by ≥.03 and whose score on the NHP dimensions of physical mobility and pain changed by ≤−3.
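The three quantities described above (effect size, percentage at the ceiling or floor, and percentage of patients reaching a change threshold) can be sketched as follows. The use of the sample standard deviation (n − 1 denominator) is an assumption, the function names are illustrative, and the data are hypothetical.

```python
from statistics import mean, stdev


def effect_size(baseline, follow_up):
    """Change in mean score divided by the baseline standard deviation."""
    return (mean(follow_up) - mean(baseline)) / stdev(baseline)


def percent_at(scores, value):
    """Percentage of patients scoring exactly `value` (ceiling or floor)."""
    return 100.0 * sum(1 for s in scores if s == value) / len(scores)


def percent_changed(baseline, follow_up, threshold):
    """Percentage of patients whose individual change reaches the threshold.

    A positive threshold counts changes >= threshold (e.g. +0.03 or +0.15);
    a negative threshold counts changes <= threshold (e.g. -3 on the NHP,
    where lower scores mean fewer problems).
    """
    changes = [f - b for b, f in zip(baseline, follow_up)]
    if threshold >= 0:
        hits = sum(1 for c in changes if c >= threshold)
    else:
        hits = sum(1 for c in changes if c <= threshold)
    return 100.0 * hits / len(changes)


# Hypothetical paired scores on a 0-1 scale for four patients.
baseline = [0.80, 0.85, 0.90, 1.00]
follow_up = [0.85, 0.85, 0.96, 1.00]

es = effect_size(baseline, follow_up)
ceiling = percent_at(follow_up, 1.0)
improved = percent_changed(baseline, follow_up, 0.03)
nhp_improved = percent_changed([50, 30], [40, 30], -3)
print(round(es, 2), ceiling, improved, nhp_improved)
```

The sign convention in `percent_changed` reflects that improvement appears as an increase on the 15D and ABI but as a decrease on the NHP.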
Ethical Considerations
This study was approved by the Ethics Committee of the Faculty of Medicine, University of Oulu. The ethical principles of the Declaration of Helsinki were observed.
RESULTS
Feasibility
Response Rates
Of the 223 patients who were invited to participate in this study, 43 refused (23 women and 20 men). The final sample was 180 patients. Among women, the reasons for refusing were often suffering from other concurrent diseases, having to take care of sick relatives, and being too tired to answer. Some patients reported being too anxious to answer, and a small minority reported the questionnaires to be too complicated. The response rate at 12 months was 83.2 percent. Seven patients had died, and one patient had moved to another country and did not want to participate in the study anymore. Two of the conservatively treated patients had later undergone percutaneous transluminal angioplasty (PTA), and one patient had undergone surgery. The previous two patients were excluded from further analysis. Among patients who had been treated with endovascular treatment, eleven had later undergone repeated PTA and two had undergone surgery. These two were excluded from further analysis.
Completion Rates
At baseline, four of the 15D questionnaires had to be rejected for being blank. The 15D questionnaire was completed fully by 84.1 percent of patients. The completion rates by dimensions were 98–100 percent (except for 86.4 percent for sexual activity). The lower completion rate for the dimension of sexual activity may indicate that this dimension is slightly less acceptable than the others (30). After replacing missing data (29), the full 15D data were available for 176 of the 180 patients (97.8 percent). At 12 months, the 15D questionnaire was fully completed by 82 percent of the patients. The completion rates by dimensions were 95–99 percent (except for 84.3 percent for sexual activity). After replacing the missing data (29), the full 15D data could be used for all 153 patients (100 percent).
Among the NHP questionnaires, 75.8 percent were fully completed at baseline. The completion rates by dimensions were 77–97 percent. Four of the NHP questionnaires had to be rejected for being blank. They were not from the same patients whose 15D questionnaires were rejected as being blank. Some patients completed one HRQoL questionnaire better (usually the 15D) than the other. After inquiries about missing data, the full NHP data were available for 176 of the 180 patients (97.8 percent). At 12 months, the NHP questionnaire was fully completed by 79 percent of patients. The completion rates by dimensions were 82–94 percent.
Construct Validity
Cross-Sectional Construct Validity
The MTMM matrix in Table 1 shows that the correlations of the 15D dimensions with the comparable NHP dimensions are consistently higher (except vitality versus energy) than the correlations with the noncomparable scales measuring dissimilar attributes. This finding is a pattern that scales with convergent and discriminant validity are expected to exhibit, thus providing evidence for the construct validity of the 15D and the NHP measures. The correlations between different sets of scores (the 15D score, level values of 15D dimensions comparable to the NHP dimensions, and the NHP dimension scores) and the subjective health status ranged from .23 to .52.

As expected, there was a statistically significant and clinically important difference in the mean 15D score between patients with poor and good subjective health status (Table 2) and between patients with short and long asymptomatic walking distance (Table 3). The differences between the groups were also statistically significant on all 15D and NHP dimensions except one, and the differences were also of the expected magnitude (Tables 2 and 3). These comparisons provide further evidence of cross-sectional convergent construct validity.


Discriminatory Power
There was a higher percentage of “floor” scores at baseline and at 12 months for the NHP. For “ceiling” scores, this tendency was not so clear, apart from scores at 12 months. Both results suggest that the NHP has less discriminatory power than the 15D (Table 4).

Responsiveness to Changes in Health Status
The 15D dimensions of mobility and discomfort/symptoms showed moderate effect sizes at 12 months, whereas other dimensions of the 15D and the NHP dimensions showed small effect sizes (Table 4). On the other hand, 58.0 percent of patients had a change of ≥.15 in the ABI index at 12 months, whereas 38.9 percent and 38.0 percent of patients had experienced a change of ≥.03 on the 15D dimensions of mobility and discomfort/symptoms, and 55.6 percent and 48.1 percent a change of ≤−3 on the NHP dimensions of pain and physical mobility.
DISCUSSION
The 15D showed slightly higher completion rates than the NHP. However, the NHP was almost equally acceptable. According to other studies as well, the 15D (16;30) and the NHP (14;33) are user-friendly.
The MTMM matrix provided clear convergent and discriminant evidence of cross-sectional construct validity of the 15D and the NHP. The convergent validity correlations were quite high, being in the range of .40–.682 on roughly comparable dimensions. Comparisons of average scores between the groups with poor and good subjective health status and between the groups with short and long asymptomatic walking distances added to the evidence of cross-sectional construct validity. It is impossible to say, however, whether the differences between the groups were clinically important on the 15D and NHP dimensions, because the minimum clinically important differences have not been defined for them, only for the 15D score.
The results showed that the discriminatory power of the 15D on roughly comparable dimensions was superior to the NHP. Other studies have also reported higher “floor” and “ceiling” effects for the NHP, for example, compared against SF-36 in patients with LLAD (23;33), 15D in brain tumor patients (27), patients who had undergone bypass surgery, and even compared against the general public (29;30).
The 15D and the NHP were almost equally responsive in detecting changes in HRQoL over time in patients with LLAD. The 15D tended to show higher effect sizes than the NHP. The percentages of patients who experienced a change of ≤−3 points on the NHP dimensions of pain and physical mobility were quite similar to the percentage of patients who gained a clinically important change of ≥.15 on the ABI index. The percentages of patients who experienced a change of ≥.03 on the 15D dimensions of discomfort/symptoms and mobility were somewhat lower. However, the percentages on the 15D and NHP dimensions are not directly comparable because of the different anchor states of the scales. Bosch et al. (5) suggest that the measurement of clinical parameters alone has limitations and that the measurement of HRQoL provides more information about the patients' health status. An advantage of the 15D may be the use of a Likert-type response format with several different scores and its ability to detect positive as well as negative states of health, whereas the NHP items are dichotomous and describe more extreme ends of ill health.
The analyses in this study focused only on the comparable dimensions of physical and mental health identified by the WHOQOL group (36). However, it must be kept in mind that the dimensions of the 15D and NHP are not the same, but only roughly comparable. One limitation of this study could be that test–retest reliability was not assessed. There is increasing evidence that, in the case of chronic diseases, people do not assess their HRQoL against a fixed reference point, but against one that shifts in the light of experiences (3). HRQoL is a dynamic construct that changes over the course of the disease and its treatment.
CONCLUSION
Both instruments were feasible; neither was time-consuming, and both were easy to use. Both instruments had good cross-sectional construct and longitudinal validity.
POLICY IMPLICATIONS
This study provided clear evidence that both the 15D and the NHP are appropriate instruments for patients with LLAD.
CONTACT INFORMATION
Kirsi Koivunen, RN, MNSc (kirsi.koivunen@oulu.fi), Researcher, Department of Nursing Science and Health Administration, University of Oulu, P.O. Box 5000, FIN-90014, University of Oulu, Finland
Harri Sintonen, PhD (harri.sintonen@helsinki.fi), Professor of Health Economics, Department of Public Health, University of Helsinki, P.O. Box 41, FIN-00014, University of Helsinki, Finland
Hannele Lukkarinen, RN, PhD (hannele.lukkarinen@oulu.fi), Senior Lecturer, Department of Nursing Science and Health Administration, University of Oulu, P.O. Box 5000, FIN-90014, University of Oulu, Finland
This study was supported by grants from the Academy of Finland, the Finnish Association of Caring Sciences, the University Hospital of Oulu, and the Oulu University Scholarship Foundation.