Published online by Cambridge University Press: 01 November 2004
Objectives: This study tested the reliability and validity of the Western Canada Waiting List Project priority criteria score (PCS) for prioritizing patients waiting for hip and knee arthroplasty.
Methods: Sixteen orthopedic surgeons assessed 233 consecutive patients at consultation for hip or knee arthroplasty. Measures included the PCS, a visual analogue scale of urgency (VAS urgency), and maximum acceptable waiting time (MAWT). Patients completed a VAS urgency, an MAWT, the Western Ontario McMaster Osteoarthritis Index (WOMAC), and the EQ-5D. Using correlational analysis, convergent and discriminant validity was assessed between similar constructs in the priority criteria and WOMAC. Median MAWTs were determined for five levels of urgency based on PCS percentiles. Internal consistency reliability was assessed with Cronbach's alpha.
Results: The sample of 233 patients (62 percent female) ranged in age from 18 to 89 years (mean, 66.3 years). A total of 45 percent were booked for hip and 55 percent for knee arthroplasty. Correlations were strong between the PCS and surgeon VAS urgency (r=.79) and weaker between patient and surgeon measures of VAS urgency (r=.24) and MAWT (r=.44). Correlation coefficients between similar constructs in the priority criteria and WOMAC ranged from 0.24 to 0.32 and were higher than those measuring dissimilar constructs. For decreasing levels of urgency, the median MAWT ranged from 10 to 12 weeks for surgeons and 4 to 12 weeks for patients. Cronbach's alpha was 0.79.
Conclusions: Results support the validity of the PCS as a measure of surgeon-rated urgency. Patients might be ranked differently with different prioritization measures.
Many Canadians are on waiting lists of varying lengths for joint arthroplasty, with median waits of 11 to 29 weeks from assessment by a specialist to surgery (1;7;34). Although the effectiveness of arthroplasty has been well documented (31;37;41;56), the impact of the length of wait on health-related quality of life (HRQL) and health outcomes is less clear. Patients with worse preoperative status tend to show the largest improvement after surgery, partly due to regression to the mean (26;33;56). It has also been shown that patients with worse preoperative functional status may have comparatively worse pain and function up to 2 years after arthroplasty (22;33).
While some studies have linked length of wait to deterioration in pain and dysfunction (39;44), others have found no relationship (9;38). In addition, studies have shown no relationship between length of wait and postoperative HRQL (44;54;64). However, a recent study that adjusted for preoperative impairment and disability found that outpatient waits (from referral to consultation with the surgeon) of more than 6 months may have adverse effects on 12-month health outcomes of patients after hip replacement (26).
Waiting lists are managed by individual surgeons based on their clinical assessment of urgency of need. Generally, patients are not prioritized across multiple surgical lists (13;64). There is diversity among surgeons regarding indications for hip and knee replacement (45;61;66), and waiting times have been shown to have little relationship to the severity of patient pain or disability (30;64).
Priority setting is increasingly being considered to improve the fairness of wait list management by ensuring that patients with the greatest need are treated first (21;25;40;43;52;55). Priority scores are based on patient urgency and capacity to benefit from treatment (43;55). Although they are currently used for various procedures in parts of Canada, New Zealand, and the United Kingdom, evidence of their reliability and validity is minimal (43).
The Hip and Knee Replacement Priority Criteria Tool is one of five tools developed by the Western Canada Waiting List (WCWL) Project to assess patients' clinical urgency or priority (2;55). The tool was designed to rank patients in order of urgency, thus providing an explicit and transparent method of patient prioritization (2;24). It consists of seven criteria, each with three to four levels. Weighted scores result in a summary score, the priority criteria score (PCS), scaled from 0 (low urgency) to 100 (high urgency).
This study was designed to assess the reliability and aspects of validity of the PCS in a clinical setting. Validity is defined as a process of evaluating the degree to which the available evidence supports the interpretability, appropriateness, and usefulness of the PCS as a measure of patient urgency (51). We compared the tool with other measures of surgeon and patient urgency and HRQL. Our research questions were (i) What is the internal consistency reliability of the PCS? (ii) What is the degree of consistency and agreement between surgeon and patient ratings of urgency and maximum acceptable waiting time (MAWT)? (iii) What is the congruence between the PCS and surgeon and patient measures of urgency and HRQL? (iv) What is the convergent and discriminant validity of the PCS in relation to a patient-rated measure of pain and function? (v) What are physician- and patient-rated MAWTs for levels of urgency based on the PCS?
Consecutive patients 18 years and older who were assessed in an orthopedic surgeon's office and placed on a scheduled waiting list for hip or knee arthroplasty at two large tertiary-care centers in Alberta, Canada, were selected for recruitment to the study from December 2000 to June 2001. Patients were excluded if they were unable to read, write, or comprehend English, or if they suffered cognitive impairment. The study ran in parallel with the existing system of prioritizing and booking patients. For each patient, surgeons completed a priority criteria tool based on clinical assessment. All surgeons and their office staff received directed education on the correct use of the tool. Surgeons rated patients on a 100-point visual analogue scale (VAS) of urgency with anchors of “not urgent” and “extremely urgent”. They were also asked to estimate an MAWT for each patient with the question: “In your clinical judgment, what should be the appropriate maximum waiting time for this patient?” Waiting time was defined as the time from the surgery decision date until the date surgery was completed.
After examination, consenting subjects were asked to complete measures of self-reported urgency and health status. Patients assessed their urgency on a 100-point VAS urgency scale with anchors of “not urgent” and “extremely urgent.” An MAWT from the patient's perspective was assessed by the question: “In your judgment, what should be the appropriate maximum waiting time for you or a person like yourself?”
Self-reported health status measures included the Western Ontario McMaster Osteoarthritis Index (WOMAC) and the EuroQol (EQ-5D). The WOMAC was designed to measure function and symptoms for patients with osteoarthritis of the hip or knee (4). It is a twenty-four-item tool with three subscales measuring pain, stiffness, and function. Subscale scores were transformed to a scale of 0–100 (better to worse). The EQ-5D, a generic preference-based HRQL measure, was designed for use in the clinical and economic evaluation of health care (8). It consists of five items measuring mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. The EQ-5D index is a weighted summary score (range, −0.59 to 1) with weights derived from time trade-off measurements (19). Full health is assigned a value of 1, and death, 0. It can be combined with life years in the calculation of quality-adjusted life years (QALYs) used in cost-utility analysis (36). Evidence of construct validity and responsiveness of the EQ-5D has been shown for patients undergoing arthroplasty (6;12). Individuals also valued their health status on the EQ-VAS, scaled from 0 (worst imaginable health state) to 100 (best imaginable health state).
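For context on how a preference-based index such as the EQ-5D feeds into cost-utility analysis, the standard QALY calculation (a textbook formulation, not specific to this study) weights the time spent in each health state by its utility:

```latex
\text{QALYs} = \sum_{i} u_i \, t_i
```

where $u_i$ is the utility weight for health state $i$ (death $= 0$, full health $= 1$) and $t_i$ is the number of years spent in that state.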
Cronbach's coefficient alpha assesses the reliability of a composite score; it reflects the consistency with which the items measure the common factors underlying them. It ranges from 0 to 1.0, with higher coefficients indicating higher reliability. Values of alpha above 0.70 are generally regarded as indicative of acceptable reliability (57).
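As an illustration (not part of the original study), coefficient alpha can be computed from the item variances and the variance of the summed score; the sketch below uses synthetic data in which the seven items share a common factor, so alpha should be high:

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed score
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Synthetic example: seven items driven by one shared factor plus small noise,
# so the items are highly intercorrelated and alpha is close to 1.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
correlated = np.hstack([base + 0.1 * rng.normal(size=(200, 1)) for _ in range(7)])
alpha = cronbach_alpha(correlated)
```

Uncorrelated items would instead yield an alpha near 0, which is what makes the coefficient useful as an internal-consistency check.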
Correlations and paired t-tests were used to assess the consistency and average agreement between VAS urgency ratings and MAWTs for patients and surgeons. The congruence between the PCS and other surgeon and patient measures of urgency was assessed through correlational analysis. A correlation of 0.30 was used as a minimum relevant value (23). Based on earlier validity work, we hypothesized that the PCS would be strongly correlated (≥.8) with surgeon-rated measures of urgency and moderately correlated (≥.5) with patient-rated measures (11). To assess convergent and discriminant validity, we compared responses to similar constructs in the PCS and the WOMAC based on the multitrait-multimethod approach described by Campbell and Fiske (10). To determine comparable constructs, the content of the WOMAC items within each subscale was matched to that of the PCS items measuring conceptually similar constructs. It was hypothesized that correlations between variables measuring similar constructs would be positive (convergent validity) and higher than correlations between variables measuring different constructs (discriminant validity). For example, the correlation between two measures of pain at rest (surgeon-rated and patient-rated) should be higher than the correlation between pain at rest and pain on motion.
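The multitrait-multimethod logic can be sketched on synthetic data (hypothetical traits and noise levels, not the study's data): two raters observe the same latent trait with independent error, so ratings of the same construct by different raters should correlate more strongly than ratings of different constructs:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
pain_rest = rng.normal(size=n)    # latent trait 1
pain_motion = rng.normal(size=n)  # latent trait 2, independent of trait 1

# Surgeon and patient ratings: shared latent trait plus independent rater noise.
surgeon_rest = pain_rest + rng.normal(scale=1.5, size=n)
patient_rest = pain_rest + rng.normal(scale=1.5, size=n)
surgeon_motion = pain_motion + rng.normal(scale=1.5, size=n)

def r(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two vectors."""
    return float(np.corrcoef(a, b)[0, 1])

convergent = r(surgeon_rest, patient_rest)      # same construct, different raters
discriminant = r(surgeon_rest, surgeon_motion)  # different constructs
```

With large rater noise the convergent correlation is attenuated well below 1 (as in the study's 0.24–0.32 range), but it still exceeds the near-zero discriminant correlation, which is the pattern the validity analysis looks for.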
To distinguish the more-urgent from the less-urgent patients, individuals were grouped into five levels based on percentiles of the PCS distribution. Surgeon- and patient-rated MAWTs were compared for levels of urgency.
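Grouping patients into five levels by PCS percentiles can be sketched as follows (the PCS values here are hypothetical uniform scores on the 0–100 scale, not the study data):

```python
import numpy as np

rng = np.random.default_rng(2)
pcs = rng.uniform(0, 100, size=233)  # hypothetical priority criteria scores

# Cut points at the 20th/40th/60th/80th percentiles yield five urgency levels
# of roughly equal size.
cuts = np.percentile(pcs, [20, 40, 60, 80])
levels = np.digitize(pcs, cuts)          # 0 = least urgent ... 4 = most urgent
counts = np.bincount(levels, minlength=5)
```

Each level then contains about one fifth of the sample, so median MAWTs can be compared across equally sized urgency groups.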
The sample consisted of 233 patients (62 percent female, 38 percent male) ranging in age from 18 to 89 (mean, 66.31; SD 11.70). Two hundred thirty patients had complete data on all of the priority criteria, and 202 patients had complete surgeon and patient data for all measures. Forty-five percent of patients were booked for hip and 55 percent for knee arthroplasty. Sixteen orthopedic surgeons participated in the study.
Coefficient alpha was .79. Descriptive statistics for all measures are presented in Table 1. The distribution of the PCS was symmetric with a mean of 47.98 (SD, 21.02). Paired t-tests showed that patients rated their VAS urgency significantly higher (mean=70.13) than did surgeons (mean=56.86), while there was no significant mean difference in MAWT between patients (11.57 weeks) and surgeons (11.80 weeks).
Correlations between the surgeon-rated measures ranged from 0.79 (PCS and VAS urgency) to 0.38 (MAWT with PCS and VAS urgency). Correlations between the PCS and patient-rated measures were 0.26 (EQ-VAS), 0.33 (EQ-5D), 0.33 (WOMAC), and 0.38 (patient MAWT). Surgeon- and patient-rated MAWTs were moderately correlated (0.44), whereas correlations between surgeon and patient perceptions of VAS urgency were weaker (0.24). The WOMAC most strongly correlated with the EQ-5D (0.67), patient-rated VAS urgency (0.54), and the EQ-VAS (0.39). All of the above correlations were statistically significant (p<.001).
Table 2 shows the correlation coefficients for the convergent and discriminant analysis. Four items from the priority tool were matched with conceptually similar items from the WOMAC and the WOMAC functional subscale. Convergent validity coefficients (those measuring similar constructs) ranged from 0.24 to 0.32. The correlations between surgeon-rated pain at rest and patient-rated pain at night (0.32) and pain sitting or lying (0.25) were higher than those for dissimilar constructs, for example, surgeon-rated pain at rest and patient-rated walking (0.20) (discriminant validity). Surgeon-rated functional limitation (Question 4) was more highly related to WOMAC ascending and descending stairs (0.26) and function (0.28) than to WOMAC pain at night (0.15) and pain sitting or lying (0.08) (discriminant validity). The WOMAC function subscale was significantly related (correlations 0.22 to 0.35) to all four of the priority criteria items measuring pain and function.
For the 20 percent least-urgent patients (PCS<30), both surgeons and patients had a median MAWT of 12 weeks. For the 20 percent most-urgent patients (PCS>65), median surgeon MAWTs were longer (10 weeks) than those for patients (4 weeks).
This study examined the reliability and validity of a new tool designed to prioritize patients for arthroplasty. Internal consistency reliability was within acceptable limits. In the absence of a “gold standard” for prioritizing patients, one way of providing validity evidence is to test the new tool against other measures of urgency. Strong correlations between the PCS and surgeon-rated VAS urgency support the validity of the PCS. Weaker correlations between surgeon-rated MAWTs and the PCS may partly reflect different expectations of reasonable wait times based on current realities and varying wait times in practice (42).
Although the PCS was significantly related to all of the patient measures, correlations tended to be clinically weak. Our results are similar to those reported by Derrett et al. (16;17), who found correlations less than 0.3 between the New Zealand Clinical Priority Assessment Criteria and two measures of health status and function, the EQ-5D and a condition-specific tool, in patients waitlisted for arthroplasty. The low correlation between the WOMAC and the PCS may be partly explained by the different underlying constructs and intended uses of the tools. Although both tools measure patient pain and function, the WOMAC was not designed as a prioritization tool and, therefore, does not include a measure of expected capacity to benefit, as does the priority tool. Moderately strong correlations between the condition-specific WOMAC and generic measure of HRQL support the relevance of the EQ-5D as a preference-based measure for use in this population.
Although there was evidence to support convergent and discriminant validity, the convergent correlations between similar patient- and surgeon-rated constructs were fairly weak. Possible explanations include differences in item wording and a rater effect. For example, the priority tool measures pain on motion (e.g., walking, bending) and the WOMAC measures pain walking on a flat surface. The priority criteria item “pain at rest” includes sitting, lying down, or causing a sleep disturbance, whereas in the WOMAC, these items are measured separately. Other studies have shown that there are weak to moderate correlations between patient- and physician-rated perceptions of health status and pain (53;58). In addition, physicians tend to rate HRQL significantly better, and intensity of pain, significantly lower, than their patients (46;47;53;59;60).
The mean urgency rating was thirteen points higher for patients than for surgeons. This lack of agreement was not surprising, as the surgeon's frame of reference may be different from that of the patient. The weak correlation between surgeon- and patient-rated urgency indicates an inconsistency in the rank ordering of patients by surgeons and patients. In contrast, surgeon- and patient-rated MAWTs were more strongly correlated. Although patients above the 20th percentile of the PCS rated MAWTs lower than did surgeons, there was good agreement, on average, between surgeons and patients.
In non-life-threatening conditions, where there is no clear clinical evidence of the long-term effects of length of waiting on patient outcomes, patient, physician, and public input into decision making is essential to the perception of a fair process to establish standards for acceptable waiting times for elective procedures (21;48;65). To our knowledge, no other study has concurrently measured physician and patient MAWTs or linked MAWTs to levels of priority scores. Other methods used to establish appropriate or reasonable waiting times for arthroplasty include physician surveys (62), patient interviews (18), retrospective patient surveys (13;32), and a 3-month maximum waiting time guarantee for patients meeting priority criteria (28;29).
Median surgeon-rated MAWTs in our study were longer than the clinically reasonable wait times of 6.5 weeks reported by physicians responding to the Fraser Institute surveys (62). However, in contrast to our study, their results included all types of orthopedic surgery. Patient-rated MAWTs from our study are similar to other reports of patient perspectives on acceptable wait times. In a retrospective patient survey, Ho et al. reported an average acceptable wait time of 13.2 weeks for knee replacement surgery (32). However, a limitation of their study was recall bias, as patients were asked to recall waiting times for surgery that had occurred 2 to 7 years earlier. The one study that related patient-rated acceptable waiting times to three categories of priority based on surgeon assessment found that the majority of patients (42 percent of whom had waited longer than 1 year) would have desired their surgery in less than 6 months (18). However, the sample size was small (n=35), and 33 patients were in either the severe or the extreme category.
Weak to moderate correlations between physician and patient measures suggest that patients would be ranked differently depending on which measure is used to prioritize patients. Although priority criteria have been largely developed and based on physician ratings (2;25), patient-rated measures have also been advocated to determine priority (5;64). In addition, QALYs, which incorporate public values, have been proposed as a method of prioritization based on capacity to benefit (49;63). The issue of whether to prioritize based on patient, surgeon, or public values or some combination is unresolved (20;64).
In conclusion, waiting lists are a reflection of growing demands and constrained resources in a publicly funded health-care system. Priority scoring systems are one way of addressing the equitable allocation of resources. The main arguments in favor of priority scoring systems are transparency, explicitness, treatment in order of clinical need, and fairness (20;50). However, one of the criticisms has been the lack of validity evidence and of evaluation of their impact on the system as a whole (14;15;27;40). Results of this study offer support to the PCS as a measure of physician-rated urgency. However, the moderately weak relationship between patient and surgeon measures raises the issue of who should decide on the criteria and weights for prioritization.
The process of validation is one of gathering evidence over time as to the interpretability and usefulness of priority scores as measures of patient urgency. Our work continues with validation studies in different populations and with setting standards for MAWTs in clinical practice. Longer-term studies of the relationships between the PCS, patient outcomes, and length of waiting time would provide important support for the validity of priority scores. As tools are implemented, there is a need for continuous monitoring and evaluation of the effects over time on case mix, distribution of waiting times, patterns of resource use, patient outcomes, and gaming (3;20;35).
We are indebted to the twenty partner organizations from the four western Canadian provinces for their ongoing support throughout the project: British Columbia Medical Association; Vancouver Island Health Authority; Vancouver Coastal Health Authority; British Columbia Ministry of Health Services; University of British Columbia, Centre for Health Services and Policy Research; Alberta Medical Association; Capital Health Authority (Edmonton); Calgary Health Region; Alberta Health and Wellness; University of Calgary, Center for Health and Policy Studies; Saskatchewan Medical Association; Regina Qu'Appelle Health Region; Saskatoon Health Region; Saskatchewan Health; Health Quality Council; Winnipeg Regional Health Authority; Manitoba Health; University of Manitoba, Manitoba Centre for Health Policy; Canadian Medical Association; Health Canada.
The authors acknowledge the members of the hip and knee replacement panel who contributed to the development of the hip and knee replacement surgery priority criteria tool: Dr. Ted Findlay, Dr. Donald Garbuz, Dr. Robert Glasgow, Ms. Karin Greaves, Dr. David Hedden, Dr. Mary Hurlburt, Dr. Bill Johnston, Dr. Stewart McMillan, Dr. Jack Reilly, Dr. Anne Sclater, Dr. Kenneth Skeith, and Dr. Lowell van Zuiden. We thank the surgeons and patients who contributed data for the study and Ms. Susan Allebone and Ms. Anne-Marie Pedersen for their contributions to data collection.
Funding for the Western Canada Waiting List Project was provided by Manitoba Health, Saskatchewan Health, Alberta Health and Wellness, the British Columbia Ministry of Health Services, and Health Canada. The views expressed herein do not necessarily represent the official policy of federal, provincial, or territorial governments.