STATEMENT OF RESEARCH QUESTIONS
Variation in medical care utilization has been well documented for over six decades (Reference Glover8; Reference Wennberg16; Reference Wennberg and Gittelsohn17). Since Wennberg et al. (Reference Wennberg, Freeman and Culp18) documented two-fold differences in the use of common medical procedures between New Haven and Boston, evidence has suggested that demand and supply in metropolitan markets vary in idiosyncratic ways that defy market-driven explanations. In 1989, the U.S. federal government's Agency for Health Care Policy and Research (AHCPR) was created in part to redirect health services research resources to examine the sources of these variations and identify best clinical practices for dissemination. From 1991 to 1995, AHCPR invested several hundred million dollars in Patient Outcomes Research Team (PORT) projects that focused on a set of medical conditions and procedures in order to produce clinical guidelines and other information and tools to reduce unwarranted variation in clinical practice (Reference Freund, Lave and Clancy7).
Recent patient safety research, stimulated largely by the 1999 Institute of Medicine report “To Err is Human” and its 2001 companion “Crossing the Quality Chasm,” highlights the contribution of clinical variation to medical errors that may lead to deaths or adverse clinical events (10;11). The federal government and private research foundations have provided substantial funding for projects designed to identify and reduce unwarranted variation in clinical practice to improve patient safety. Patient safety research initiatives often apply rigorous systems approaches to minimize medical practice variation and thereby reduce error rates. Both the earlier PORTs and more recent patient safety research assume that a substantial fraction of observed variation in clinical practice is unacceptable.
A critical difference between medical practice variation research, on the one hand, and patient safety and medical appropriateness research, on the other, is how the reference point of a “correct” utilization rate is determined. Variation research implicitly assumes that the average rate of utilization of a procedure is the correct rate. In contrast, patient safety and appropriateness studies develop a target utilization rate based on expert opinion of best practices of medical care. For example, a practice variations analysis focused on depression would use hospital records to observe an average utilization rate (e.g., 1/1,000 admissions per person) to function as the “norm” for comparison. An appropriateness study would use expert opinion to develop a target hospitalization rate (e.g., 1.5/1,000 admissions per person) based on assessments of underlying disease prevalence, alternative therapies, and clinical effectiveness. Similar to appropriateness studies, patient safety analyses choose a threshold value based on expert opinion that serves as a goal rate (for example, Springfield Hospital seeks to achieve no more than a 1/10,000 fatal medical error rate per admission). The common thread of the variations, appropriateness, and safety literature is that once a central point of comparison is identified, estimates of variance around that point are created to determine the extent of “error” relative to an identified norm in clinical practice. In operation, data-driven practice variation research reveals the extent of variation in the absence of expert opinion and serves as a diagnostic tool for policy makers and managers. Appropriateness and patient safety research provides a next level of analysis once expert opinion has established a norm of practice and, as such, serves as an on-going performance measurement tool.
The objective of this study is to identify whether the federal government's investment in the PORT initiative reduced the variation in targeted clinical procedures and the associated welfare loss to society. A before-and-after quasi-experimental design is used in this evaluation. Using Medicare inpatient claims data from the 5 percent National Claims History from 1991 to 2000, we compare use rate variation and related welfare losses for medical and surgical admissions that were, and were not, directly targeted by the PORTs. This analysis permits us to address two research questions: (i) By 2000, did the hospital admissions for procedures targeted by the PORT initiative have smaller welfare losses than those not targeted? (ii) Did the PORT initiative reduce the welfare loss for the specific targeted procedures, in terms of inflation-adjusted dollars, between 1991 and 2000?
The results provide a crude market level assessment of whether a substantial government investment designed to diffuse throughout an entire industry had its desired impact on clinical care. While the PORT initiative is not identical in implementation to the current patient safety initiative, the fundamental premise is the same: unacceptable variation must be attenuated through outside intervention.
Although the PORT program centered on the United States, this research provides knowledge relevant to international medical technology assessment efforts as well. Currently, there are over two dozen countries with formal programs in technology assessment (19). Each of these programs has objectives similar to the intent of the PORT initiative. This research highlights how it is possible to evaluate the impact of a national health policy and provides some insights on the data required for such an assessment.
SIGNIFICANCE AND IMPACT
Many investigations have focused on variations in the use of different medical procedures. For example, one study reported that hysterectomy rates vary 3.5-fold across different regions of Maine (Reference Wennberg16). Other studies found similar variations in hundreds of medical procedures (Reference Paul-Shaheen, Clark and Williams12). Wennberg and Gittelsohn (Reference Wennberg and Gittelsohn17) described differences in use rates of common surgical procedures across different regions in New England and proposed using the coefficient of variation (COV) to express the degree of variation in a medical procedure. The COV is equal to the standard deviation of the utilization rate of a procedure divided by the mean rate of use of the procedure. A high COV indicates a large degree of variation because the dispersion (as measured by the standard deviation) is relatively large in comparison to the mean rate of use. In earlier literature, calculated COVs ranged from 0.32 to 0.70. While there is no common benchmark to determine whether a procedure's COV is too high, the statistic does provide a metric to gauge the relative degrees of variation and to order procedures by their degree of variation. Diehr and Grembowski (Reference Diehr and Grembowski4) showed through simulation that utilization differences that are completely random will generate a non-zero COV and established a natural lower bound to be approximately 0.10.
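The COV definition above can be illustrated with a minimal sketch. The function name and the example rates are hypothetical, and the use of the sample (n − 1) standard deviation is an assumption, since the text does not say which form was used.

```python
from statistics import mean, stdev


def coefficient_of_variation(rates):
    """Standard deviation of area-level use rates divided by their mean.

    Assumption: the sample (n - 1) standard deviation is used.
    """
    if len(rates) < 2:
        raise ValueError("need at least two areas to measure variation")
    return stdev(rates) / mean(rates)


# Hypothetical admissions per 1,000 residents in three areas (not study data).
example_rates = [2.0, 4.0, 6.0]
example_cov = coefficient_of_variation(example_rates)  # sd 2.0 / mean 4.0
```

Because the statistic is unit-free, procedures with very different mean use rates can be ranked on the same scale, which is how it is used in the rest of the analysis.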
To understand the financial impact of variations in use of medical procedures, Phelps and Parente (Reference Phelps and Parente14) developed an expression for welfare loss associated with highly variable procedure use rates. Welfare loss was calculated as a function of the volume (number of procedures times the fee of procedure), price elasticity of insurance and unexplained variation in procedure use. Few procedures were found to have a strong positive correlation between volume and COV. The original analysis was developed for a technology evaluation work group of the Institute of Medicine in 1988 (9). As a result, the welfare loss study's initial findings were considered in the development of the list of medical procedures most deserving of clinical guideline development by AHCPR in 1990.
From a public policy perspective, the Phelps and Parente analysis underscores the net financial impact of variations. Nearly $1 billion was estimated to be the welfare loss from variations in New York State hospitals for 1987 alone. Phelps and Mooney (Reference Phelps and Mooney15) found that the welfare loss associated with variations was roughly similar to the amount due to so called moral hazard in health insurance contracts. It is important to underscore that outpatient, physician office, and laboratory procedures were left out of that welfare loss calculation. Given that nonhospital expenditures account for roughly half the cost of the nation's health care, the additional welfare loss from ambulatory medical procedures may be quite significant. On the other hand, including ambulatory procedures that could substitute for inpatient procedures may yield less net variation and, consequently, less welfare loss. Either scenario only underscores the value of further research in this area.
The Phelps and Parente (Reference Phelps and Parente14) welfare loss analysis was designed as a priority assessment tool capable of being applied in different health care settings. Since its publication, over 70 citations have referenced the article, many of which are from publications in international journals (Footnote 1). Policy makers can use welfare loss analysis to gauge the effectiveness of the PORT initiatives as well as to identify medical conditions and procedures that may require more detailed analysis. If the clinical uncertainty suggested by the large variations in medical practice is a proxy for poor outcomes or medical errors, this analysis could identify and prioritize conditions and procedures for new initiatives in delivery system re-engineering being contemplated by both the public and private healthcare sectors.
CONCEPTUAL MODEL
The conceptual model for this work was developed in Phelps and Parente (Reference Phelps and Parente14) and is illustrated in Figure 1. We first assume there is a point in the consumption of medical care where the consumer's utility is maximized at U*. At this point, any more or any less treatment would lead to a welfare loss. For example, if a consumer received Xu of care, they would be receiving less than an optimal level of care, possibly due to rationing. At Xu, the marginal value of care to an individual in terms of its incremental health benefit exceeds the actual cost of the care, thus creating a welfare loss. If too much care is provided, as illustrated by Xo, then the cost of the care exceeds its incremental value to the patient, and they would be worse off. For example, adverse events of unnecessary medical care are welfare reducing in terms of efficient use of resources and could also lead to a health outcome that is debilitating to the patient.

Figure 1. Effect of medical practice variation.
Phelps and Parente (Reference Phelps and Parente14) and Phelps and Mooney (Reference Phelps and Mooney15) applied this economic model to develop an estimate of welfare loss for each hospital procedure using market specific admission rates. Solving for the area in the triangles, the welfare loss can be calculated as:

A key feature of this estimate is the use of the coefficient of variation as a critical parameter. This expression, described in detail in Phelps and Parente (Reference Phelps and Parente14), shows that expenditure combined with variation will drive the extent of welfare loss. In effect, the model provides a way to statistically weight procedure variation by expenditure using a construction that is consistent with economic theory. The model also allows for changes over time. Because it largely reflects a demand curve for medical care, temporal shocks such as the introduction of a new medical procedure can be interpreted as potential demand shifters. For example, a new surgical procedure that substitutes for an old procedure may prompt a reduction in price in the old procedure, leading to an inward shift in the demand curve of the old technology. Likewise, a new procedure with a lower price that complements an old procedure would lead to a demand shift out. Also, reductions in perceived incremental value of a procedure would lead to a demand shift in.
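The expression referred to above can be sketched as follows. This is a reconstruction under stated assumptions, not a quotation: it assumes the form implied by the surrounding description (expenditure weighted by squared variation and scaled by demand elasticity) and by the analogous bias term given in the Discussion. The symbols P (fee per procedure), X (number of procedures), and η (demand elasticity) are labels introduced here for illustration, and the absolute value is an assumption made so that a negative elasticity yields a positive loss.

```latex
\mathrm{WL} \;\approx\; \frac{1}{2}\,\frac{(P \cdot X)\,\mathrm{COV}^{2}}{\lvert \eta \rvert}
```

Under this form, doubling a procedure's COV quadruples its estimated welfare loss, whereas doubling its expenditure only doubles it.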
APPROACH AND METHODOLOGY
Replicating the methods used in Phelps and Parente (Reference Phelps and Parente14) and Phelps and Mooney (Reference Phelps and Mooney15), hospital procedure use rates are calculated for a given population area. In this analysis, we chose to use states as the level of analysis to avoid small cell size problems. Because there were sixty-two New York State counties in the original analysis, fifty states is a closer comparison of results to the original study than either thousands of counties or hundreds of metropolitan statistical areas.
With approximately 527 hospital admission procedures, in the form of diagnosis-related groups (DRGs), we chose to collapse certain DRGs to make more general categories of medical admissions. This representation of hospital admission procedures is common in the variations literature and was first used by Wennberg in 1984. This procedure leaves 146 possible modified DRGs (MDRGs) to represent hospitals' total inpatient product.
A review of the PORT literature was conducted to identify which of the 146 MDRGs were considered “PORT-affected”, meaning that they (or a group of them) were the focus of a PORT project. The typical PORT was a multi-million dollar, multi-year investigation conducted by one or more academic health centers with expertise in outcomes research, health economics, and cost-effectiveness analysis. We identified a total of 32 of the 146 MDRGs that were the direct focus of PORTs. Of those 32 PORT MDRGs, we considered a subset of 15 cardiac MDRGs to be conditional members of the PORT group because their results may be different due to technological change. Heart attacks and cardiac care were the focus of the Harvard-based PORT. However, several factors may mitigate the impact of the heart disease PORT on variation in care and on social welfare. First, cardiology and cardiothoracic surgery have historically been in the vanguard in use of strategies to reduce practice variation (Footnote 2). Second, the 1990s saw widespread introduction of cardiac stents and technological advances related to coronary artery bypass graft (CABG) surgery. While the heart disease PORT's focus on reduction in practice variation is perhaps more likely to strike a familiar chord with clinicians, the complexity and rapidity of technological advances could mitigate the potential PORT contribution.
Adjusted COVs were constructed by running MDRG-specific regressions by year with state-level demographic variables representing the age, gender, socioeconomic, and market conditions. The R-squares ranged from .10 to .75 as they did in earlier analysis (Reference Phelps and Parente14). Finally, the total expenditure for each MDRG is calculated by year. With this and the preceding estimates as components, welfare loss estimates are calculated using methods in Phelps and Mooney (Reference Phelps and Mooney15), where elasticity is assumed to be fixed. This estimate is based on the Rand Health Insurance Experiment medical care demand elasticity estimate of −0.15. Once the welfare loss estimates are generated, the 1991 estimates are adjusted to 2000 dollars for relevant comparison. For each MDRG, the difference between the two period welfare loss estimates is calculated to generate a net difference in welfare loss.
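The steps in this paragraph can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes that the adjusted COV is the standard deviation of the regression residuals divided by the mean rate, that a single covariate stands in for the full set of state-level variables, and that the welfare loss takes the form one-half times expenditure times squared COV divided by the absolute elasticity. All function names and the price-index arguments are hypothetical.

```python
from statistics import mean, stdev


def ols_residuals(y, x):
    """Residuals from a one-covariate least-squares fit y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]


def adjusted_cov(rates, covariate):
    """Unexplained variation: residual standard deviation over the mean rate."""
    return stdev(ols_residuals(rates, covariate)) / mean(rates)


def welfare_loss(total_expenditure, cov, elasticity=-0.15):
    """Assumed form: 0.5 * expenditure * COV^2 / |elasticity|."""
    return 0.5 * total_expenditure * cov ** 2 / abs(elasticity)


def to_2000_dollars(amount_1991, price_index_1991, price_index_2000):
    """Rescale a 1991 amount by the ratio of two price-index values."""
    return amount_1991 * price_index_2000 / price_index_1991


def net_difference(loss_1991, loss_2000, price_index_1991, price_index_2000):
    """Change in welfare loss, with the 1991 estimate restated in 2000 dollars."""
    restated = to_2000_dollars(loss_1991, price_index_1991, price_index_2000)
    return loss_2000 - restated
```

A positive value from `net_difference` indicates that welfare loss grew over the period; the choice of price index is left to the caller because the text does not name one.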
DATA
A five percent sample of recipients of the Medicare health insurance program was used for this analysis. The Medicare program provides health insurance to approximately 40 million elderly or disabled individuals. For this analysis, we chose to focus on the elderly population because the disabled population is quite small and because state Medicaid and social assistance policies may affect the benefits received by disabled individuals.
Using annual inpatient claims totaling over one hundred million records, we calculated MDRG utilization rates by state. The Medicare program's denominator file of all eligible Medicare participants was used to construct state- and year-specific denominators for use in the construction of MDRG-specific utilization rates. The mean and coefficient of variation for each MDRG were constructed using summarized state-level demographic data based on the accompanying Medicare program subscriber denominator files from the national claims history file. Total expenditures for each MDRG were also calculated by year.
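The rate construction described above amounts to dividing admission counts by eligible beneficiaries within each state. A minimal sketch follows; the record layout (one `(state, mdrg)` tuple per admission), the function name, and the per-1,000 scaling are illustrative assumptions, not the layout of the Medicare files.

```python
from collections import Counter


def utilization_rates(claims, enrollment, per=1000):
    """Admissions per `per` eligible beneficiaries, keyed by (state, mdrg).

    claims: iterable of (state, mdrg) tuples, one per inpatient admission.
    enrollment: dict mapping state to the number of eligible beneficiaries
        drawn from the same sample as the claims.
    """
    counts = Counter(claims)
    return {
        (state, mdrg): per * n / enrollment[state]
        for (state, mdrg), n in counts.items()
    }


# Hypothetical records for a single year (not study data).
claims = [("MN", "knee"), ("MN", "knee"), ("NY", "knee")]
enrollment = {"MN": 1000, "NY": 500}
rates = utilization_rates(claims, enrollment)
```

Because numerator and denominator come from the same 5 percent sample, the sampling fraction cancels and the rates need no further scaling.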
Regression-adjusted coefficient of variation statistics for each MDRG were constructed. The regressions used covariates from state-level demographic data from the denominator file, including age, gender, Medicare HMO market penetration rate, Medicaid participation rate, race, mortality, and Medicare supplemental coverage purchases. Data from 1991 and 2000 were used to generate pre- and post-PORT effects. In addition, similar data were prepared for the intervening years to provide insights on trends in procedure use rates within the 10 years under investigation.
The MDRGs were partitioned into three groups: not directly PORT affected, PORT affected, and cardiac. In the original analysis by Phelps and Parente (Reference Phelps and Parente14), medical and surgical procedures were identified as distinctively different in their care patterns. For this analysis, the rationale for the identification of a separate cardiac category is the dramatic technological innovations that have occurred within the past decade, particularly the widespread use of angioplasty and stents, new pharmaceutical interventions, and advances in surgical technique.
RESULTS
Welfare loss comparisons between 1991 and 2000 are presented in Tables 1 through 3. Each of these tables lists the MDRGs and their computed welfare losses, along with national Medicare admission rates, provider payment, and adjusted coefficient of variation. The Non-PORT Affected group is shown in Table 1. This group is the largest of the three in terms of number of procedures (n = 84), expenditure, and welfare loss. The adjusted COV in 1991 ranged from 13 percent to 126 percent, contrasting with 15 percent to 396 percent in 2000. The computed welfare loss in 2000 for each MDRG ranged from $3 million to $29 billion. In terms of 2000 dollars, the difference between 1991 and 2000 welfare loss ranged from a net savings of $453 million for Other Vascular Procedures w/Complicating Conditions to a net loss of $29.4 billion for Pulmonary Edema and Respiratory Failure. The net impact of the difference in welfare loss was $36.5 billion in 2000. This includes one MDRG with exceptionally high COV estimates in 2000 leading to extraordinarily high 2000 welfare loss estimates. If this MDRG were removed from the distribution, the net welfare loss difference between 1991 and 2000 would be $7.3 billion. Thus, the net welfare loss increased dramatically for this set of non PORT-affected MDRGs from 1991 to 2000.
Table 1. Hospital Admission Utilization Expenditure, Variation, and Welfare Loss Comparisons: No PORT Influenced Hospital Admissions

In Table 2, the results of the PORT-affected MDRGs show significantly fewer procedures, yet the use of these procedures is fairly substantial, and a comparison between the 1991 and 2000 adjusted COV ranges finds less variation in 2000. The range in 2000 welfare loss estimates is from $3.2 million to $1.5 billion. Comparing the net change in 2000 welfare loss, an increase in welfare loss occurred during the period. It would appear that the reduction in variation that occurred in some instances was insufficient to counter the increase in utilization of these procedures.
Table 2. Hospital Admission Utilization Expenditure, Variation, and Welfare Loss Comparisons: PORT Influenced Hospital Admissions

We also observe evidence of bias in the welfare loss calculation due to an underlying temporal change in the rate of use of a procedure. For example, lens operations had a 93 percent reduction in admissions between 1991 and 2000. This reduction was most likely due to a shift in practice toward performing cataract surgery in outpatient hospital settings and free-standing surgical centers. A similar reduction in prostatectomy procedures (−61 percent) is also observed. In this case, new drug therapies may have acted as substitutes for surgical procedures.
The cardiac MDRG welfare loss estimates are presented in Table 3. Compared with the other two categories, there is considerably less variation in 1991 and 2000, as measured by the adjusted COV. Consequently, there is also less welfare loss in 1991 and 2000, despite the scale of the expenditures for these MDRGs. The net welfare loss between 1991 and 2000 is $336 million, by far the lowest relative to the total expenditures of each of the three PORT categories.
Table 3. Hospital Admission Utilization Expenditure, Variation, and Welfare Loss Comparisons: Cardiac PORT Hospital Admissions

A summary of the three PORT category welfare loss analyses is shown in Table 4. The procedures not associated with a PORT initiative had the greatest increase in welfare loss in both 2000 dollars, and as a percent change of 318.8 percent, compared with the other two PORT-affected groups of admissions. This result would remain even if the two outlier admissions discussed earlier were removed from the analysis. The best category of results for the net impact in welfare loss was for the Cardiac PORTs, with a percent change of 9.9 percent. The directly PORT-affected group of admissions had the middle performance level in terms of welfare loss, with a percent increase in 2000 dollar welfare loss of 42.9 percent. A final comparative statistic is the ratio of total expenditure to welfare loss. Ideally, welfare loss will be 0, so policy makers would want to maximize this ratio. Welfare loss as a share of total expenditure is far greater for the admissions that were not PORT-affected, making this ratio lower than the ratio for the two other PORT-affected groups of admissions.
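The two summary statistics discussed above are simple ratios. The sketch below shows how they could be computed for one group of admissions; the function names are hypothetical and the inputs in the usage lines are placeholders, not values from Table 4.

```python
def percent_change(loss_1991_in_2000_dollars, loss_2000):
    """Percent change in welfare loss, with both amounts in 2000 dollars."""
    base = loss_1991_in_2000_dollars
    return 100.0 * (loss_2000 - base) / base


def expenditure_to_loss_ratio(total_expenditure, welfare_loss):
    """Dollars of spending per dollar of welfare loss; higher is better."""
    return total_expenditure / welfare_loss


# Placeholder inputs for illustration only.
change = percent_change(200.0, 250.0)            # 25.0 percent increase
ratio = expenditure_to_loss_ratio(1000.0, 50.0)  # 20.0
```

The ratio is undefined when welfare loss is exactly zero, which is the ideal case described in the text, so a real implementation would need to decide how to report that boundary.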
Table 4. Changes in Welfare Loss 1991–2000: Comparison of No PORT, PORT and Cardiac PORT Admissions

Figure 2 illustrates trends in overall welfare loss over time by each of the PORT-affected categories. By 2000, 84 percent of all welfare loss estimates were for non–PORT-affected procedures. This compares to only 62 percent in 1991.

Figure 2. Percentage of welfare loss between 1991 and 2000 by PORT-affected categories.
DISCUSSION
There are two key findings from this study that directly address the main research questions. First, hospital admission procedures targeted by the PORT initiative did have smaller welfare losses as a share of total expenditure. This result is also supported by generally smaller percentage increases in welfare loss between 1991 and 2000 for the PORT-affected admissions, in comparison to the non PORT-affected hospitalizations. Our second finding is that the net effect of the PORT was not a reduction in the welfare loss for those procedures, in terms of inflation-adjusted dollars between 1991 and 2000. For many MDRGs, there was a reduction in welfare loss. However, when these negative changes in welfare loss were added to those of the MDRGs where there was an increase in welfare loss, the net effect in all three PORT-affected categories was still an increase in welfare loss.
The results of this study suggest the PORT initiative may have had a welfare-improving effect on society. At the very least, the evidence suggests that, taken as a group, the directly PORT-affected and cardiac PORT procedures show that the variation situation documented by Wennberg and others might have deteriorated further had there been no PORT investigations. The cardiac PORT results were particularly interesting because of the overall lack of high levels of COV in 2000. Generally, when new medical innovations are introduced, one would expect a high level of variation in clinical practice as the new technology is diffused (Reference Eisenberg5). However, cardiac care was already governed by national clinical guidelines for the treatment of hypertension and lipid disorders by 1991, and many of the newer technologies introduced in the 1990s were introduced with a strong evidence base and with clear indications for appropriate use, factors which may be associated with inherently low levels of variation, as both the 1991 and 2000 results suggest.
This research has five limitations. First, we chose only to focus on the Medicare population, limiting the generalizability of these findings to the privately insured population. However, the selection of the Medicare population removes many confounding influences such as the effect of managed care and other financial incentives that were either introduced (e.g., performance based financial rewards for physicians) or removed (e.g., scaling back utilization review activities in favor of disease management). The Medicare reimbursement system largely remained unchanged for hospitals during the study period. The rise of Medicare managed care may have an influence, but to the extent possible, its effect was formally considered in our approach by including it as a covariate in the MDRG-specific regression models used to adjust the coefficient of variation.
The second limitation is a failure to control for the introduction of new medical technologies, which could affect the production of inpatient hospital admissions. While this could certainly influence our results, we see it as equally likely that these exogenous shocks to the market would influence both the PORT-affected and non-PORT admissions randomly. In addition, these shocks could simultaneously affect many clinical domains. For example, development and application of sophisticated laparoscopic surgical techniques affected not only the University of Pennsylvania PORT on gall bladder disease, but many other clinical domains not targeted by PORTs, such as orthopedic procedures, management of appendicitis, and neonatology. Similarly, advances in imaging technology improved diagnosis and, in many cases, management of conditions in numerous clinical domains (Reference Bass, Pitt and Lillemoe1; Reference Escarce, Bloom, Hillman, Shea and Schwartz6).
A third limitation is that the analysis does not explicitly account for substitution of outpatient services for inpatient care. However, like the case of the development of a new technology shock, the presence of substitutable procedures may fairly be assumed to be random in its influence on either PORT or non-PORT admissions.
The fourth limitation is that our estimates of the “correct” level of hospital utilization may be incorrect. For example, the average rate of knee procedures in our study was 8.6/1,000, but ideally, it should (hypothetically) be 7/1,000 based on the state of medical technology knowledge, disease prevalence, and hospital supply. This deviation from a correct rate would introduce a bias, formally expressed as an additional component to be added to the welfare loss expression: $\text{Additional WL} = \tfrac{1}{2}(\%\text{Bias})^{2}/\eta$, where $\eta$ is the demand elasticity (Reference Phelps and Mooney15). Thus, the welfare loss increases with the square of the percent bias of average use relative to the “correct” rate of use. This calculation is based on the statistical property that the mean squared error of a biased estimator contains two sources of error: variability about the mean and the squared bias. This additional welfare loss due to bias would occur for both under-use and over-use of a procedure in a manner similar to that displayed by the dead weight loss triangles in Figure 1. A more complete analysis would account for the additional welfare loss introduced by this bias, using meta-analysis to compare average and expert-determined “correct” rates of utilization. While this additional step is outside the scope of the analysis, our inability to account for bias leads us to assume the average rate of use is correct, and thus our estimates could be interpreted as a lower bound of the welfare loss possible for each type of admission.
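To give a sense of scale, the hypothetical knee-procedure example can be worked through numerically. The sketch below makes three assumptions that the text does not state outright: the percent bias is measured relative to the “correct” rate, the absolute value of the demand elasticity is used so the loss is positive, and the result is read as a multiple of spending on the procedure. The elasticity default of −0.15 is the value cited in the methodology; the function name is hypothetical.

```python
def additional_welfare_loss_share(observed_rate, correct_rate, elasticity=-0.15):
    """Assumed form: 0.5 * (%bias)^2 / |elasticity|, per dollar of spending."""
    pct_bias = (observed_rate - correct_rate) / correct_rate
    return 0.5 * pct_bias ** 2 / abs(elasticity)


# Hypothetical knee-procedure rates per 1,000 from the example in the text.
share = additional_welfare_loss_share(8.6, 7.0)
# pct_bias is about 0.229, so share is about 0.174
```

Because the bias enters as a square, an equally sized shortfall (5.4 rather than 8.6 per 1,000) would produce the same additional loss, matching the symmetric treatment of under-use and over-use described above.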
Finally, the analysis is done at the level of the nation, and does not drill down to identify specific mechanisms that mitigate or aggravate variation in practice that influence social welfare. In theory, the method applied in this analysis could be applied to pinpoint patterns of variation and welfare loss at the physician level by using the attending physician's unique identification number (UPIN). The attending physician serves the role of admitting and discharge coordinator for a patient, so their individual practice style should be carefully quantified if possible. Unfortunately, physician identification data are only of good quality in 2000 because the UPIN was not a required data field in 1991. Hospital identification might serve as a proxy, but it is not as clean a representation of practice style. Because hospitals are essentially the doctor's workshop (Reference Pauly and Redisch13), physicians, not institutions, may primarily determine admission rates. Ideally, an econometric analysis focused on some combination of both, with admitted physician effects accounted for by random effects and hospital by fixed effects, using a national database would provide the most comprehensive approach to addressing this issue.
From a policy perspective, PORTs may have been a good investment, although the Agency for Health Care Policy and Research did not provide a direct successor program of research. Practice guidelines developed through the PORT initiative were followed up by a host of clinical practice guidelines created by constituencies ranging from disease-specific societies to specialty groups whose recommendations were not always strongly evidence-based.
The success of recent research initiatives that aim to reduce errors and improve patient safety could be enhanced in some cases by adopting and further developing the measures of variation in care and social welfare loss using methods similar to those we have presented. An example of such an application may be investigations of drug prescribing practices. Patterns of drug prescribing exhibit wide variation, which may often translate directly into variation in clinical outcomes, costs of care, and risk of adverse events (3). Moreover, variation exists at several levels of the care system: patients, providers, facilities, and regions. This area of investigation promises to deepen and broaden our understanding of variation in care, and perhaps lead to novel intervention strategies built, to some degree, on the foundations laid by the pioneering PORT studies.
In summary, this is the first study to systematically evaluate the long-term effect of PORTs on practice variation and social welfare. Using an economic derivation of welfare loss attributable to medical practice variation, and using hospital admission claims data for 2 million elderly patients generalizable to the nation, we estimated the change in welfare in the 1990s for conditions that were or were not targeted by PORT investigations. Our results show that inpatient clinical domains targeted by the PORTs had less welfare loss relative to their total expenditure by 2000, but that there was not a net decrease in the welfare loss for all hospital admissions affected by the PORT. This research provides a novel approach to evaluating the impact of significant social science and medical interventions, and variation in the delivery of these interventions, on social welfare.
CONTACT INFORMATION
Stephen T. Parente, PhD (sparente@umn.edu), Associate Professor, Department of Finance, University of Minnesota, 321 19th Avenue South, Room 3-279; Director, Medical Industry Leadership Institute, Carlson School of Management, 321 19th Avenue South, Room 3-122, Minneapolis, Minnesota 55455
Charles E. Phelps, PhD (charles.phelps@rochester.edu), University Professor/Provost Emeritus, Economics/Comm. & Preventive Medicine, University of Rochester, 200 Wallis Hall, Rochester, New York 14627
Patrick J. O'Connor, MD, MPH (patrick.j.oconnor@healthpartners.com), Senior Clinical Investigator, HealthPartners Research Foundation, 8170 33rd Avenue South, Mail Stop: 21111R, Minneapolis, Minnesota 55440-1524