High costs are a major concern in almost all aspects of health care. Although pharmaceutical expenditures are the fastest growing expense, other costs, including those of diagnostic procedures, equipment, imaging, investigations, laboratory tests, or visits to hospital (collectively termed here “diagnostic and therapeutic items” or D&T items), account for a large share of healthcare spending. Many of these costs are incurred in the hospital setting. Hospitals are the largest overall cost in health care in most of the Organization for Economic Cooperation and Development (OECD) countries, ranging from 27.1 percent in Poland to 48.4 percent in Japan (28). Whereas some D&T items, such as a single blood test, are inexpensive in isolation, the volume of these services can increase costs to significant levels, and many are tied to the cost of personnel, which is the single largest expense in hospitals.
Although rarely available as a distinct cost, in one instance investigational services constituted almost 30 percent of expenditures for Australia's Medicare Services (Reference Conyers13). Several studies have shown that test ordering is frequently excessive in most clinical environments, including primary care (Reference Leurquin, Van Casteren and De Maeseneer24), emergency departments (Reference Smellie, Murphy and Galloway37), preoperative assessment (Reference Johnson and Mortimer23;Reference Munro, Booth and Nicholl27), hospitals (Reference Bareford and Hayling6), and specialized wards or clinics such as renal (Reference Gardner and Henderson19) or intensive care units (Reference Graat, Kröner and Spronk20). Furthermore, these tests generally have little impact on patient care (Reference Graat, Kröner and Spronk20;Reference Johnson and Mortimer23;Reference Munro, Booth and Nicholl27) and in some cases may be harmful (Reference Black and Welch8). Other studies have shown that cost increases of 60 percent across a variety of hospital and physician services do not improve patient outcomes and occasionally worsen them (Reference Fisher, Wennberg and Stukel17;Reference Fisher, Wennberg and Stukel18). If physicians were to reduce the use of unnecessary medical services, they could control costs without negatively affecting patient care. Physicians could also reduce costs by autonomously choosing the least costly diagnostic test, imaging technique, or investigation in cases where they have a choice and there are no significant differences in safety and effectiveness. Doctors must also consider costs to their patients, as approximately half of OECD countries require some form of direct payment for hospital services from patients, either as co-payments or deductibles (Reference Esmail and Walker15).
If physicians are going to take costs into consideration, they need to be cognizant of both the absolute cost of D&T items and the relative differences between prices, for example between the cost of a biochemistry panel including or excluding liver enzymes. However, in most places, this type of cost information is not easily available to doctors. To determine whether it is necessary to enhance both physicians' education about prices and the availability of that information, we undertook a systematic review to determine physicians' level of awareness of the cost of D&T items.
METHODS
The methods of this systematic review overlap those of the related systematic review of physician awareness of drug costs (Reference Allan, Lexchin and Wiebe3). Templates for systematic reviews of survey studies are not well established, but QUOROM (Reference Moher, Cook and Eastwood26) (normally reserved for systematic reviews of randomized controlled trials) is a good guide for most systematic reviews and was used here wherever possible.
Search
We searched the Cochrane Library (from 1966), EconLit (from 1969), EMBASE (from 1974), and MEDLINE (from 1950) up to May 31, 2005, using the search terms physician, doctor, medical student, house staff, intern or resident; medicine, medications, drug therapeutic, test, investigation or diagnostic test; cost or price; and knowledge, awareness, or understanding. (The results of the analysis of knowledge of drug costs are reported elsewhere (Reference Allan, Lexchin and Wiebe3).) The titles and abstracts, where available, were independently screened by both authors, and if either investigator thought that an article was potentially eligible, a complete copy was obtained. To identify additional studies, we searched the reference list of every potentially eligible article and contacted authors who had two or more publications in the area or who had published in the 10 years preceding the start of our review.
Eligibility
Articles were included if doctors, trainees (interns or residents), or medical students were surveyed; there were more than ten survey respondents; costs of D&T items were estimated; results were expressed quantitatively; there was a clear description of how authors defined Accurate Estimates; and there was a clear description of how the True Cost was determined. Because costs are variable and complex, we believed it was only reasonable to expect doctors to know the total cost of each D&T item in their local practice environment, whether that cost was borne partially or completely by the patient and/or the insurer (private or government). Therefore, “True Cost” was operationally defined as the actual cost the study authors verified from one or more locally relevant, reliable sources for each D&T item in their study. The definition of “Accurate Estimates” was taken from the authors and typically fell within a defined “accuracy range” (e.g., ±25 percent) around the true cost. Articles were excluded if they were not published in English or if participants were asked to estimate costs within ranges or cost increments only (for example “please estimate which $20 cost category/range is most appropriate for drug A”). Both authors independently assessed each potential article for eligibility. Differences in decisions about inclusion and exclusion were resolved through consensus.
Data Extraction
From each eligible article, both authors independently extracted a range of data including information about the types of items surveyed, the location of the study, and the demographics of the respondents. Where data were not reported in a way that allowed extraction in one of our categories, we attempted to calculate the information from available data (e.g., number of respondents calculated from the number of surveys distributed multiplied by the response rate). Comparisons within studies, such as differences between medical student and resident accuracy, were extracted when available. Authors were contacted for further data where necessary. After each investigator independently extracted the above information, the results were compared and differences resolved by consensus.
Data Analysis
The studies were too diverse to combine meta-analytically (different therapies, different cost estimation procedures, different groups of physicians). Mean accuracy (expressed as the percent of physicians who correctly estimated D&T costs) for each study was calculated by averaging the accuracy from each participant group or D&T item estimated, weighted by the number of estimation attempts. For example, if accuracy was 30 percent for D&T A (n = 100) and 50 percent for D&T B (n = 80), the average accuracy would be 39 percent: [(0.30 × 100) + (0.50 × 80)]/180. We calculated nonparametric summaries (median and ranges [minimum–maximum]) for the following outcomes: average cost accuracy (within defined percent margins of error), average percent of estimates over and under the true cost or the margins of error (as defined by the original authors) around the true costs, and average percent error (|estimate − true cost|/true cost).
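The weighted mean accuracy and the percent error defined above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code; the function names and the (accuracy, attempts) tuple layout are hypothetical choices made here for clarity.

```python
def weighted_mean_accuracy(groups):
    """Mean accuracy weighted by the number of estimation attempts.

    groups: list of (accuracy_proportion, n_attempts) tuples, one per
    participant group or D&T item (a hypothetical data layout).
    """
    total_attempts = sum(n for _, n in groups)
    if total_attempts == 0:
        raise ValueError("no estimation attempts")
    return sum(acc * n for acc, n in groups) / total_attempts


def absolute_percent_error(estimate, true_cost):
    """|estimate - true cost| / true cost, returned as a proportion."""
    return abs(estimate - true_cost) / true_cost


# Worked example from the text: 30 percent (n = 100) and 50 percent (n = 80).
print(round(100 * weighted_mean_accuracy([(0.30, 100), (0.50, 80)])))  # 39
```

Weighting by attempts means that an item estimated by many respondents pulls the study mean more than a rarely estimated one, which is what the 39 percent example reflects.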
Percent error is the statistic used to demonstrate the degree of estimation error. To be reliable, each estimation error (the amount above or below the true cost) must be converted to an absolute value. If not, high estimates will be positive numbers and low estimates will be negative numbers, and when summed they will partially cancel each other, giving a lower value and a false impression of accuracy. For example, if the true cost of a D&T item is $100 and two doctors estimate $50 and $150, respectively, the correct percent error would be 50 percent. However, if absolute values were not used, the percent error of the high estimate would be 50 percent and that of the low estimate −50 percent. This would make the combined percent error 0 percent, indicating no error in estimation and yielding a false representation of perfect accuracy.
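The cancellation problem described above can be reproduced directly with the $100 example. This short Python sketch (function names are illustrative, not from the reviewed studies) compares a signed mean percent error with a mean absolute percent error.

```python
def mean_signed_percent_error(estimates, true_cost):
    # Over- and underestimates carry opposite signs and can cancel out.
    return sum((e - true_cost) / true_cost for e in estimates) / len(estimates)


def mean_absolute_percent_error(estimates, true_cost):
    # Taking the absolute value first keeps every error non-negative.
    return sum(abs(e - true_cost) / true_cost for e in estimates) / len(estimates)


estimates = [50, 150]  # two doctors; true cost is $100
print(mean_signed_percent_error(estimates, 100))    # 0.0  (false "perfect accuracy")
print(mean_absolute_percent_error(estimates, 100))  # 0.5  (50 percent error)
```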
Additionally, a priori defined subgroups, such as year of publication (divided by median year of publication of studies), location of study, training level of participants, and specialty, were examined to determine whether these variables influenced the accuracy of the cost estimation. We also examined the influence of study quality on estimation accuracy by separating studies with a similar accuracy range into those of high and moderate–low quality. For this analysis, we used weaknesses of response rate (≤50 percent or unclear), sampling method (convenience or unclear), and survey distribution (unclear) as markers of quality. Although there is no defined adequate response rate, low response rates can bias surveys (Reference Barclay, Todd, Finlay, Grande and Wyatt5;Reference Templeton, Deehan, Taylor, Drummond and Strang38), and we believed 50 percent was generous. Non-probability sampling, such as convenience sampling, can bias studies because the sample is not representative of the population. Different modes of questionnaire administration have different inherent biases, and although there is no clearly superior method (Reference Bowling11), we believed the information was important in reviewing surveys. High quality studies had none of these weaknesses, moderate quality studies had one weakness, and low quality had two or more weaknesses. The use of these measures to assess quality has face validity as it was based on our understanding of the places where the greatest biases can occur in survey studies.
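The quality rating rule (count the weaknesses, then map zero, one, or two or more to high, moderate, or low) can be sketched as follows. The field names and the string labels for sampling and distribution methods are hypothetical; only the thresholds (response rate ≤50 percent or unclear, convenience or unclear sampling, unclear distribution) come from the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SurveyStudy:
    response_rate: Optional[float]  # proportion; None if not reported
    sampling: str                   # hypothetical labels, e.g. "random", "convenience", "unclear"
    distribution: str               # hypothetical labels, e.g. "mail", "in person", "unclear"


def count_weaknesses(study: SurveyStudy) -> int:
    weaknesses = 0
    # A response rate of 50 percent or less, or an unreported one, is a weakness.
    if study.response_rate is None or study.response_rate <= 0.50:
        weaknesses += 1
    if study.sampling in ("convenience", "unclear"):
        weaknesses += 1
    if study.distribution == "unclear":
        weaknesses += 1
    return weaknesses


def quality_rating(study: SurveyStudy) -> str:
    n = count_weaknesses(study)
    if n == 0:
        return "high"
    if n == 1:
        return "moderate"
    return "low"


print(quality_rating(SurveyStudy(0.72, "random", "mail")))          # high
print(quality_rating(SurveyStudy(0.40, "convenience", "unclear")))  # low
```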
We also performed two sensitivity analyses. First, to minimize the heterogeneity inherent in comparing studies of many different services, we compared the average cost accuracies for specific D&T items common to four or more studies. Second, when data cannot be combined and nonparametric statistics such as medians and ranges must be used, there is a concern that larger studies are weighted equally with smaller ones. To determine the potential influence of this "weighting," we performed sensitivity analyses in which the median was selected after weighting each study by the number of services it included, the number of physicians surveyed, or the total number of estimates made.
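One way to select a median "based on" study size is a weighted median, where each study's accuracy counts in proportion to its number of services, physicians, or estimates. The text does not specify the exact procedure, so this Python sketch assumes the lower weighted median (the smallest value at which cumulative weight reaches half the total); the accuracies and weights in the usage lines are made-up illustrative numbers, not data from the review.

```python
def weighted_median(values, weights):
    """Lower weighted median: smallest value whose cumulative weight
    reaches at least half of the total weight (one common convention)."""
    if not values or len(values) != len(weights):
        raise ValueError("values and weights must be non-empty and equal in length")
    total = sum(weights)
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= total / 2:
            return value
    return max(values)  # not reached with positive weights; kept as a safeguard


accuracies = [0.27, 0.33, 0.35]                  # hypothetical study accuracies
print(weighted_median(accuracies, [1, 1, 1]))     # 0.33 (equal weights: ordinary median)
print(weighted_median(accuracies, [400, 50, 60])) # 0.27 (one large study dominates)
```

If the weighted and unweighted medians are close, as the review reports, study size is unlikely to be distorting the summary.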
Lastly, where data on the true cost of individual D&T items could be extracted from studies, we examined the influence of the true cost in two dimensions: whether doctors are better able to estimate the cost of inexpensive items than of expensive ones (accuracy), and whether doctors underestimate the cost of expensive items and overestimate the cost of inexpensive ones (estimation pattern). We undertook this analysis because we have previously shown that the true cost has the largest influence on the accuracy and estimation pattern for drugs (Reference Allan, Lexchin and Wiebe3). To separate D&T items into high and low cost groups, we examined the costs of the items in each study to see whether there were distinct cost groups that might delineate high from low cost items (for example, if a study had twenty-five items, with ten items under $5 and the remaining fifteen over $40, we would separate the groups that way). If there were no distinct high and low cost groups, we used the median D&T item cost as the dividing point for that study.
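The text does not define what counts as a "distinct" cost group, so the Python sketch below assumes one operationalization: split at the widest multiplicative gap between neighbouring sorted costs if that gap is at least a hypothetical threshold (`gap_ratio`, here 4×), and otherwise fall back to the median, with items at the median placed in the low group. Both the threshold and the tie convention are assumptions for illustration only.

```python
from statistics import median


def split_high_low(costs, gap_ratio=4.0):
    """Return (low, high) lists of non-negative item costs.

    gap_ratio is a hypothetical threshold for a "distinct" gap between
    cost groups; it is not specified in the original study.
    """
    ordered = sorted(costs)
    if len(ordered) < 2:
        raise ValueError("need at least two items")
    best_i, best_ratio = None, 1.0
    for i in range(len(ordered) - 1):
        lo, hi = ordered[i], ordered[i + 1]
        if lo > 0:
            ratio = hi / lo
        else:
            ratio = float("inf") if hi > 0 else 1.0
        if ratio > best_ratio:
            best_i, best_ratio = i, ratio
    if best_i is not None and best_ratio >= gap_ratio:
        # A clear gap exists: split the sorted list there.
        return ordered[: best_i + 1], ordered[best_i + 1 :]
    # No distinct groups: split at the median cost.
    cut = median(ordered)
    return [c for c in ordered if c <= cut], [c for c in ordered if c > cut]


print(split_high_low([2, 3, 4, 45, 60, 80]))  # ([2, 3, 4], [45, 60, 80])
print(split_high_low([10, 12, 15, 18, 20]))   # ([10, 12, 15], [18, 20])
```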
Ethics approval was not required as the research involved publicly available material.
RESULTS
Literature Search and Selection
A study flow diagram is provided in Figure 1. Eleven authors were contacted to identify possible studies; six responded and identified two previously unidentified studies, neither of which was ultimately included. Of the 2,954 records identified, fourteen studies were included in the systematic review (Reference Allan and Innes1;Reference Allan and Innes2;Reference Bailey, Ruggier and Cashman4;Reference Conti, Dell'Utri and Pelaia12;Reference Dresnick, Roth, Linn, Pratt and Blum14;Reference Fairbass and Chaffe16;Reference Innes, Grafstein and McGrogan21;Reference Mills and Chaffe25;Reference Perrine29;Reference Ringenberg31;Reference Saunders, Divine and Weinberger33;Reference Schlunzen, Simonsen, Spangsberg and Carlsson34;Reference Skipper, Smith, Mulligan and Garg36;Reference Wynick and Jessop39).
Study Characteristics
The main characteristics and methodological aspects of each study are provided in Table 1. Studies were conducted from 1976 to 2004 in five countries, with the United States (n = 5), United Kingdom (n = 4), and Canada (n = 3) predominating. Seven studies included licensed physicians only, one involved house staff only, and five included a mixture of participants (licensed physicians, house staff, and medical students). Four studies involved general practitioners (GP) alone, five involved specific specialist groups, three a mix, and two were unclear as to the specialty of the doctors.
*Exact number surveyed unclear (“approximately 120”) so response rate approximate.
HS, house staff; Lic, licensed physicians; MS, medical students; GP, general practitioner or family physician; Mix, mixture of specialized physicians; ns, not specified.
Hospital-based studies in Canada, Denmark, Italy, the United Kingdom, and the United States defined true costs as acquisition costs (Reference Conti, Dell'Utri and Pelaia12;Reference Schlunzen, Simonsen, Spangsberg and Carlsson34), billing costs (Reference Dresnick, Roth, Linn, Pratt and Blum14;Reference Skipper, Smith, Mulligan and Garg36;Reference Wynick and Jessop39), costs obtained through surveys (Reference Innes, Grafstein and McGrogan21), and wholesale costs paid by the hospital (Reference Bailey, Ruggier and Cashman4;Reference Fairbass and Chaffe16;Reference Mills and Chaffe25). Three outpatient studies used surveys (Reference Perrine29;Reference Ringenberg31;Reference Saunders, Divine and Weinberger33) and two used a combination of billing and acquisition costs (Reference Allan and Innes1;Reference Allan and Innes2).
Studies looked at physicians' knowledge of three broad categories of D&T items—investigations (Reference Allan and Innes1;Reference Allan and Innes2;Reference Dresnick, Roth, Linn, Pratt and Blum14;Reference Innes, Grafstein and McGrogan21;Reference Perrine29;Reference Ringenberg31;Reference Saunders, Divine and Weinberger33;Reference Skipper, Smith, Mulligan and Garg36;Reference Wynick and Jessop39), medical supplies (Reference Bailey, Ruggier and Cashman4;Reference Conti, Dell'Utri and Pelaia12;Reference Dresnick, Roth, Linn, Pratt and Blum14;Reference Fairbass and Chaffe16;Reference Mills and Chaffe25;Reference Perrine29;Reference Schlunzen, Simonsen, Spangsberg and Carlsson34), and medical care (Reference Allan and Innes1;Reference Allan and Innes2;Reference Dresnick, Roth, Linn, Pratt and Blum14;Reference Wynick and Jessop39) (Supplementary Table S1, which can be viewed online at http://www.journals.cambridge.org/jid_thc). The number of different D&T items estimated per study ranged from one (cost of mammography) to forty.
Study Quality
The method of survey distribution was unclear in four studies (Reference Dresnick, Roth, Linn, Pratt and Blum14;Reference Fairbass and Chaffe16;Reference Perrine29;Reference Wynick and Jessop39), and sampling was convenience or unclear in eight studies (Reference Bailey, Ruggier and Cashman4;Reference Dresnick, Roth, Linn, Pratt and Blum14;Reference Fairbass and Chaffe16;Reference Innes, Grafstein and McGrogan21;Reference Mills and Chaffe25;Reference Perrine29;Reference Ringenberg31;Reference Schlunzen, Simonsen, Spangsberg and Carlsson34). The response rates were ≤50 percent or unknown in four studies (Reference Allan and Innes2;Reference Dresnick, Roth, Linn, Pratt and Blum14;Reference Perrine29;Reference Wynick and Jessop39). Only four (29 percent) of the fourteen studies (Reference Allan and Innes1;Reference Conti, Dell'Utri and Pelaia12;Reference Saunders, Divine and Weinberger33;Reference Skipper, Smith, Mulligan and Garg36) had none of these three weaknesses (Supplementary Table S2, which can be viewed online at http://www.journals.cambridge.org/jid_thc). In addition, of the nine studies attempting to quantify the degree of estimation error (for example, percent error) (Reference Allan and Innes1;Reference Allan and Innes2;Reference Bailey, Ruggier and Cashman4;Reference Fairbass and Chaffe16;Reference Innes, Grafstein and McGrogan21;Reference Mills and Chaffe25;Reference Ringenberg31;Reference Saunders, Divine and Weinberger33;Reference Schlunzen, Simonsen, Spangsberg and Carlsson34), six averaged estimates without regard for sign (i.e., averaging overestimates with underestimates) or inadequately described the calculation (Reference Bailey, Ruggier and Cashman4;Reference Fairbass and Chaffe16;Reference Mills and Chaffe25;Reference Ringenberg31;Reference Saunders, Divine and Weinberger33;Reference Schlunzen, Simonsen, Spangsberg and Carlsson34).
In total, eleven (79 percent) of the fourteen studies had one or more of these four weaknesses, and only three studies (Reference Allan and Innes1;Reference Conti, Dell'Utri and Pelaia12;Reference Skipper, Smith, Mulligan and Garg36) were free of all four. There was also large variation in study design: four methods were used to determine true costs, and accurate estimates were defined in nine different ways.
Estimation Accuracy
Table 2 summarizes cost accuracy outcomes. Using the more restrictive criterion of accuracy (75/80–120/125 percent), accuracy for all D&T items was 33 percent, increasing to 50 percent with the more liberal criterion of 50–150/200 percent. Investigations and medical care visits showed the same pattern of better percent accuracy with more liberal criteria. Except for the cost of a medical care visit, average cost accuracy was no more than 54 percent, whichever criterion was used. Except for medical care visits, underestimation tended to be more frequent than overestimation. Percent error was approximately 50 percent, but it could be measured in only three studies, all of which dealt with investigations.
a Percent of estimates above or below the accuracy ranges around the true costs. Ranges were the true cost (Reference Allan and Innes1;Reference Allan and Innes2), the range of true costs (Reference Innes, Grafstein and McGrogan21), 75–125 percent of the true cost (Reference Skipper, Smith, Mulligan and Garg36), and 50–200 percent of the true cost (Reference Wynick and Jessop39).
b |Estimate − True Cost|/True Cost.
In the sensitivity analysis of estimation accuracy (for studies using ±20 percent or ±25 percent), weighting by the number of D&T items, the number of physicians, or the number of estimations in each study did not change the median accuracy by more than 1 percent. Supplementary Figure S1 (which can be viewed online at http://www.journals.cambridge.org/jid_thc) compares estimation variability between studies for individual D&T items common to four or more studies, which removes some of the heterogeneity.
Subgroup Analysis
Table 3 presents nonparametric summaries for subgroups using the most commonly used margin of error (75/80–120/125 percent). The median percent accuracy clustered around 30 percent and was not significantly affected by the country where the study took place (Canada versus the United States), the median year of publication (≤1990 versus ≥1991), or the specialty of the respondents (general practitioners versus specialists). The studies in which doctors worked in a mix of community and hospital settings all involved general practitioners, whereas the studies in which doctors worked primarily in hospital settings all involved specialists. Therefore, the results comparing work settings (mix of community and hospital versus primarily hospital) are identical to those comparing general practitioners and specialists. Results using a wider margin of error (50–150/200 percent) were somewhat better, as would be expected, but again did not differ between subgroups, including the additional subgroup of training level (licensed doctor versus house staff).
There were insufficient data to present median accuracies and ranges for the low and moderate quality groups separately; therefore, we combined these two groups. Median percent accuracy was slightly worse for high quality studies (27 percent; range, 13–35 percent) than for those of moderate–low quality (33 percent; range, 30–41 percent).
Four studies (Reference Allan and Innes1;Reference Allan and Innes2;Reference Innes, Grafstein and McGrogan21;Reference Wynick and Jessop39) provided enough data (true cost and the percent of high/low estimations for each D&T item) to examine whether the true cost of each item affected the estimation pattern, that is, whether doctors consistently underestimated the cost of high-priced items and overestimated the cost of low-priced ones. Analysis of the 100 D&T items in these studies showed this was not the case (binomial test, 51/100, p = .92).
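The reported p value can be checked with an exact two-sided binomial test. This Python sketch assumes that "51/100" means 51 of the 100 items followed the hypothesized pattern, tested against a null proportion of 0.5; the function name is illustrative. The two-sided p value sums the probabilities of all outcomes no more likely than the observed one.

```python
from math import comb


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial test: total probability of all outcomes
    whose probability is no greater than that of the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # A small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-7)))


print(round(binomial_two_sided_p(51, 100), 2))  # 0.92
```

With p = 0.5, only the single most likely outcome (50 of 100) is excluded from the sum, which is why a near-even split gives such a large p value.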
Seven studies (Reference Allan and Innes1;Reference Allan and Innes2;Reference Dresnick, Roth, Linn, Pratt and Blum14;Reference Innes, Grafstein and McGrogan21;Reference Perrine29;Reference Ringenberg31;Reference Wynick and Jessop39) provided enough data (true cost and estimation accuracy for each D&T item) to examine whether doctors are better able to estimate the true cost of expensive items than of inexpensive ones. Expensive items were not estimated more accurately than inexpensive items. Relative to the mean estimation accuracy of each of these seven studies, thirty-six of seventy-nine (46 percent) inexpensive D&T items had a higher estimation accuracy, whereas thirty-six of sixty-seven (54 percent) expensive D&T items had a higher estimation accuracy (46 percent versus 54 percent, Fisher's exact test, p = .41).
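The comparison above corresponds to a 2×2 table: inexpensive items (36 above the study mean, 43 not) versus expensive items (36 above, 31 not). This Python sketch implements a two-sided Fisher's exact test from hypergeometric probabilities using only the standard library; it is an illustration of the test, not the authors' software, and the function name is hypothetical. The paper reports p = .41 for this table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed.
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    observed = prob(a)
    # Sum every table at least as extreme (no more probable) as the observed one.
    total = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, total)


# Rows: inexpensive, expensive; columns: above study mean accuracy, not above.
p = fisher_exact_two_sided(36, 79 - 36, 36, 67 - 36)
print(p)
```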
DISCUSSION
Physician awareness of the cost of D&T items is poor. Only 33 percent of estimates were within ±20 percent or ±25 percent of the true cost, and 50 percent were within ±50 percent or in the 50–200 percent range of the true cost. Country, year of study, level of training, and specialty seem to have limited impact across both primary accuracy ranges. For example, within the ±50 percent or 50–200 percent accuracy range, generalists seem to be more accurate than specialists (56 percent versus 46 percent, respectively), but at the ±20 percent or ±25 percent accuracy ranges generalists are less accurate than specialists (30 percent versus 34 percent, respectively). Comparing accuracy over different D&T items, the median accuracy (within ±50 percent or 50–200 percent) improves by approximately 10 percentage points from medical supplies (44 percent) to investigations (54 percent) to medical care visits (66 percent). However, the ranges of accuracy within these groups overlap: the accuracy for medical supplies in one study (60 percent) (Reference Mills and Chaffe25) was the same as that for medical care visits in another (60 percent) (Reference Allan and Innes1). Study quality may have influenced the results, as moderate and low quality studies had higher estimation accuracy than high quality studies, 33 percent (range, 30–41 percent) versus 27 percent (range, 13–35 percent), respectively. Unfortunately, there are too few studies in the different quality subgroups to allow further comparisons, so it remains uncertain how much lower quality studies may bias results favorably.
Compared with the corresponding study on drugs (Reference Allan, Lexchin and Wiebe3), estimation accuracy in this study was slightly better (33 percent for D&T versus 31 percent for drugs) and the median error was considerably smaller (54 percent for D&T versus 243 percent for drugs). This finding suggests that, although doctors have difficulty getting within a particular margin of error (e.g., ±20 percent or ±25 percent), they have a better sense of the approximate cost of D&T items than of drugs. Supplementary Figure S1 (which can be viewed online at http://www.journals.cambridge.org/jid_thc) shows that variability between studies persists even when focusing on the same tests. In the study on the accuracy of the estimation of the cost of drugs (Reference Allan, Lexchin and Wiebe3), the true cost of the drugs explained a good part of this variability. In this study, however, there was no consistent pattern. In Innes et al. (Reference Innes, Grafstein and McGrogan21), the true cost of a urine culture was almost twice as much as in some other studies, and the accuracy of doctors in Innes was higher than in other studies in which doctors also estimated urine culture costs. In Perrine (Reference Perrine29), the true cost of urinalysis was only one third of the cost in some other studies, but the accuracy of doctors in Perrine was still higher than in other studies in which doctors estimated urinalysis costs.
Looking over all studies with available data, the true cost of D&T items did not influence estimation accuracy or the pattern of estimation (whether doctors would guess high or low), whereas for drugs, the true cost appears to be the strongest predictor of estimation accuracy and estimation pattern (Reference Allan, Lexchin and Wiebe3). Because the acceptable margin of error is a fixed percentage, the absolute dollar margin of error grows as the true cost of an item increases. For example, in a study using ±25 percent as the acceptable margin of error, a doctor would have to estimate within $0.25 ($0.75 to $1.25) for an item costing $1, but only within $25 ($75 to $125) for an item costing $100. Therefore, it is surprising that accuracy was not significantly better for higher cost items.
Other studies that have focused on pharmaceuticals have shown that doctors do care about healthcare costs regardless of whether the patient or a third party pays (Reference Boiver, Martin and Perneger10;Reference Prosser and Walley30;Reference Shrank, Joseph and Choudhry35). Furthermore, when cost information is provided or when doctors receive feedback/education about costs (Reference Beilby and Silagy7;Reference Blackstone, Miller and Hodgson9;Reference Jamtvedt, Young, Kristoffersen, O'Brien and Oxman22;Reference Sachdeva, Jefferson and Coss-Bu32), they modify their behaviour and reduce costs. Clearly, more could be done to help doctors improve their ordering and reduce costs. We suggest providing cost information with lab and diagnostic imaging requisitions. Physicians should also receive feedback and education on the cost of the individual items they have ordered, how their ordering compares with that of their peers, and suggestions for modifying their practices.
Limitations
Although our review could potentially have been biased by the exclusion of non-English studies, our search strategy identified no studies in other languages. Only three of the fourteen studies are from 2000 or later, so it is possible that the findings of this review do not reflect physicians' current awareness of D&T costs. However, as physician awareness did not appear to change over the nearly 30-year span of these studies, it is highly likely that it remains poor. Finally, all but two of the fourteen studies came from Canada, the United Kingdom, or the United States, and results may be different in other countries.
CONTACT INFORMATION
G. Michael Allan, MD (michael.allan@ualberta.ca), Assistant Professor, Department of Family Medicine, University of Alberta, 901 College Plaza, Edmonton, Alberta T6G 2C8, Canada; Research Fellow, Institute of Health Economics, #1200, 10405 Jasper Avenue, Edmonton, Alberta T5J 3N4, Canada
Joel Lexchin, MSc, MD (jlexchin@yorku.ca), Professor, School of Health Policy and Management, York University, 4700 Keele Street, Toronto, Ontario M3J 1P3, Canada; Active Staff, Department of Emergency Medicine, University Health Network, 200 Elizabeth Street, Toronto, Ontario M5G 2C4, Canada