Hospital length of stay (LOS) is an important contributor to healthcare expenditures.Reference Polverejan, Gardiner and Bradley 1 Increased LOS is also a risk factor for adverse events.Reference Hauck and Zhao 2 , Reference Graffunder and Venezia 3 Moreover, many factors used to measure healthcare quality have been linked to prolonged LOS,Reference Bankowitz, Doyle and Duan 4 – Reference Thomas, Guire and Horvat 7 and LOS has also been used to measure quality.Reference Southern, Bellin and Arnsten 8 – Reference Edwards, Morris and Jenkins 10 In addition, LOS has been used to measure efficiency in hospitals.Reference Hollingsworth 11 , Reference McDermott and Stock 12 For these reasons, LOS is commonly used to study disease outcomes. However, LOS varies dramatically, not only for different procedures and diagnoses, but also among hospitals.Reference Yang, Peek-Asa and Allareddy 13 Thus, when using LOS as an outcome measure, adjusting for factors associated with hospital-level variation in LOS is important.
Many studies have analyzed excess LOS associated with adverse events, including postoperative hemorrhage or hematoma,Reference Zhan and Miller 5 falls,Reference Dunne, Gaboury and Ashe 14 , Reference Wong, Recktenwald and Jones 15 adverse drug events,Reference Classen, Pestotnik and Evans 16 and decubitus ulcers.Reference Graves, Birrell and Whitby 17 Healthcare-associated infections are also a frequently studied source of excess LOS. Examples include bloodstream infections,Reference Payne, Carpenter and Badger 18 methicillin-resistant Staphylococcus aureus infection,Reference Macedo-Vinas, De Angelis and Rohner 19 , Reference De Angelis, Allignol and Murthy 20 sepsis,Reference Rivard, Luther and Christiansen 21 and surgical site infections.Reference Monge Jodra, Sainz de Los Terreros Soler and Diaz-Agero Perez 22 A common healthcare-associated infection is Clostridium difficile infection (CDI), and CDI increases the LOS of patients with the disease.Reference Lipp, Nero and Callahan 23 – Reference Kyne, Hamel and Polavaram 25
Although CDI may increase an infected patient’s LOS, it is not known whether institution-level CDI is related to prolonged LOS in patients without CDI. To our knowledge, no study has described the association between CDI incidence and hospital-wide excess LOS in patients without CDI. However, a number of possible links exist between CDI incidence and excess LOS in patients without CDI. For example, hospital CDI rates may be a proxy for quality. Poor quality hospitals may foster more CDI and create conditions that lead to excess LOS (eg, other adverse events). Alternatively, patients may stay longer at hospitals that have administrative inefficiencies.Reference White, Statile and White 26 CDI cases acquired later in a patient’s stay may be more likely to be captured in the discharge records at hospitals with excess LOS. Thus, CDI incidence could be a proxy for hospital efficiency. The purpose of this study is to explore the relationship between CDI incidence rates and LOS for patients who do not have CDI, controlling for both hospital-level and patient-level characteristics.
METHODS
Data Source
We used the Healthcare Cost and Utilization Project Nationwide Inpatient Sample, 2009–2011. The Nationwide Inpatient Sample, maintained by the Agency for Healthcare Research and Quality, is the largest database of inpatient records in the United States. It contains records of roughly 8 million hospital stays each year and is a 20% stratified sample of US hospitals. The Nationwide Inpatient Sample contains data on patient demographic characteristics, diagnoses, and procedures, measures of comorbidity and severity, reasons for and sources of admission, discharge disposition, hospital characteristics, charges and payment sources, and LOS for each unique patient record. 27
To estimate the relationship between a hospital’s CDI rate and excess LOS, we excluded all patients with any CDI diagnosis (primary or secondary) from the analysis of LOS. Excluding such patients eliminates the direct connection that exists between CDI and the increased LOS associated with CDI. However, patients with CDI were used in calculating CDI incidence rates. Because we were interested in analyzing excess LOS, patients were excluded if they were admitted and discharged on the same day. Analysis was conducted at both an aggregated hospital level and a patient level. At the hospital level, all patients without CDI who had nonmissing values for LOS were included. For the patient-level analysis, patients were excluded if records contained missing values for any of the predictor variables described below. Table 1 provides a summary of the total number of hospitals and patients included at each stage of the analysis.
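The exclusion rules above can be sketched in a few lines of Python. This is an illustration only, not the authors' Stata code: the record layout and the function name are hypothetical, the CDI code is written as "00845" on the assumption that diagnosis codes are stored without the decimal point, and a same-day stay is assumed to be recorded as an LOS of 0 days.

```python
CDI_CODE = "00845"  # ICD-9-CM 008.45, assumed stored without the decimal point


def analysis_cohorts(records):
    """Return (hospital_level, patient_level) lists of records for the LOS models.

    Each record is a dict with hypothetical keys:
      "diagnoses":  list of diagnosis codes (primary first, then secondary)
      "los":        length of stay in days, or None if missing
      "covariates": dict of predictor values (None marks a missing value)
    """
    hospital_level, patient_level = [], []
    for rec in records:
        if CDI_CODE in rec["diagnoses"]:
            continue  # any CDI diagnosis: excluded from the LOS analysis
        if rec["los"] is None or rec["los"] == 0:
            continue  # missing LOS, or admitted and discharged on the same day
        hospital_level.append(rec)
        # The patient-level model additionally requires complete predictors.
        if all(value is not None for value in rec["covariates"].values()):
            patient_level.append(rec)
    return hospital_level, patient_level


# Made-up records for illustration only.
records = [
    {"diagnoses": ["486", "00845"], "los": 9, "covariates": {"age": 70}},
    {"diagnoses": ["5990"], "los": 0, "covariates": {"age": 55}},
    {"diagnoses": ["41071"], "los": 3, "covariates": {"age": 61}},
    {"diagnoses": ["486"], "los": 5, "covariates": {"age": None}},
    {"diagnoses": ["V3000"], "los": None, "covariates": {"age": 0}},
]
hospital_level, patient_level = analysis_cohorts(records)
```

In this sketch the CDI records are dropped only from the LOS cohorts; they would still be counted when computing each hospital's CDI incidence.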
TABLE 1 Study Population in Study of CDI as a Proxy for Length of Stay

NOTE. CDI, Clostridium difficile infection; NIS, Nationwide Inpatient Sample.
Outcome and Predictor Variables
Two outcome measures were used for this analysis. The first was each hospital’s average inpatient LOS, calculated as the average LOS across all patients without a CDI diagnosis. We modeled this outcome as a function of hospital-level characteristics. Our second outcome measure was individual patient-level LOS. We used this outcome in order to control for both hospital- and patient-level characteristics (Table 2). We compared the estimated effects of CDI incidence on LOS between these 2 outcomes in order to determine how much of the relationship between CDI incidence and LOS was due to patient characteristics.
TABLE 2 Summary Description of All Model Covariates in Study of CDI as a Proxy for Length of Stay

NOTE. APR, All Patient Refined; CDI, Clostridium difficile infection; DRG, diagnosis-related groups; OR, operating room; RN, registered nurse.
a The 29 comorbidity indicators were assigned by the Agency for Healthcare Research and Quality Comorbidity Software, version 3.7.
b The variables for APR DRGs along with APR DRG severity and APR DRG risk mortality were created using software developed by 3M Health Information Systems, version 27.0, for year 2009 and version 28.0 for years 2010 and 2011.
The primary explanatory variable of interest was each hospital’s annual CDI incidence rate. The incidence rate at each hospital was calculated as the ratio of the number of patient discharges with a CDI diagnosis to the total number of annual discharges. Patients with CDI were identified as those with either a primary or secondary diagnosis of CDI (International Classification of Diseases, Ninth Revision, Clinical Modification, code 008.45). This code has been previously validated as a measure for overall hospital CDI burden.Reference Dubberke, Reske and McDonald 28 – Reference Scheurer, Hicks and Cook 30
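As a minimal sketch of this calculation (not the authors' code), the following Python function computes the per-hospital ratio of CDI discharges to all discharges. The input layout and function name are hypothetical, and the code value "00845" assumes that diagnosis codes are stored without the decimal point.

```python
from collections import defaultdict

CDI_CODE = "00845"  # ICD-9-CM 008.45, assumed stored without the decimal point


def cdi_incidence_by_hospital(discharges):
    """Return {hospital_id: CDI discharges / total discharges}.

    `discharges` is an iterable of (hospital_id, diagnosis_codes) pairs, where
    diagnosis_codes lists the primary diagnosis followed by any secondary ones.
    """
    totals = defaultdict(int)
    cdi_counts = defaultdict(int)
    for hospital_id, diagnosis_codes in discharges:
        totals[hospital_id] += 1
        # A discharge counts as CDI if the code appears in any position.
        if CDI_CODE in diagnosis_codes:
            cdi_counts[hospital_id] += 1
    return {h: cdi_counts[h] / totals[h] for h in totals}


# Made-up records for illustration only.
example = [
    ("A", ["486", "00845"]),
    ("A", ["41071"]),
    ("A", ["5990"]),
    ("A", ["00845"]),
    ("B", ["486"]),
    ("B", ["V3000"]),
]
rates = cdi_incidence_by_hospital(example)
```

Multiplying each ratio by 100 gives the incidence in percent, the scale on which a "percentage point increase" is read.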
Two additional sets of explanatory variables were used for this analysis, which included hospital- and patient-level variables. Previous research has found LOS to be related to hospital-level factors such as bed size, teaching status,Reference McDermott and Stock 12 , Reference Freitas, Silva-Costa and Lopes 31 structure,Reference Cots, Mercade and Castells 32 and nurse staffing.Reference Thungjaroenkul, Cummings and Embleton 33 Thus, for the hospital-level-LOS analysis we controlled for a set of 12 hospital-level characteristics, including bed size, hospital ownership, region of the country, location (urban vs rural), and teaching status, along with the percentage of all licensed nurses who were registered nurses and the number of full-time nurses per 1,000 inpatient days. Note, the Nationwide Inpatient Sample contains a very limited number of hospital-level variables.
LOS has also been shown to be related to many patient-level characteristics, such as age,Reference Hoonhout, de Bruijne and Wagner 34 comorbidities,Reference Wen, He and Attenello 35 disease severity,Reference Horn, Sharkey and Buckle 36 and insurance status.Reference Yang, Peek-Asa and Allareddy 13 For the patient-level-LOS analysis, we controlled for all of the hospital-level variables along with a number of patient-level variables. Patient-level factors included patient demographic characteristics (eg, age, sex, primary payer, and ZIP-code-level income), inpatient-stay characteristics (eg, admission type, discharge quarter, weekend-admission indicator, discharge disposition, hospital-mortality indicator, neonatal and maternal indicators, along with the number of procedures, diagnoses, and chronic conditions), and disease characteristics (eg, All Patient Refined Diagnosis Related Groups indicators, severity, and risk of mortality categories; and 29 specific comorbidities). Table 2 provides a complete list of covariates, along with a description of each. In total, the patient- and hospital-level factors resulted in 429 separate covariates in the patient-level-LOS analysis.
Statistical Analysis
All statistical analyses were conducted using Stata SE, version 13.1 (StataCorp). Multivariate regression was used to estimate the effect of CDI incidence on inpatient LOS while controlling for the predictor variables described. Weighted least squares regression, with weights corresponding to the number of discharge records, was used to analyze average hospital-level LOS.
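To illustrate what the weighting does, here is a minimal weighted least squares sketch in plain Python. It is not the Stata model used in the study: it fits a single predictor (CDI incidence) rather than the full set of hospital characteristics, and the function name and data are hypothetical.

```python
def weighted_least_squares(x, y, w):
    """Fit y = intercept + slope * x by minimizing sum(w_i * residual_i ** 2)."""
    total_w = sum(w)
    # Weighted means of the predictor and the outcome.
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total_w
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total_w
    # Weighted cross-product and weighted sum of squares about the means.
    s_xy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    s_xx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Made-up hospital-level values for illustration only.
cdi_rate_pct = [0.0, 1.0, 2.0]    # CDI incidence, in percent of discharges
mean_los_days = [3.0, 5.0, 4.0]   # average LOS among patients without CDI
n_discharges = [1, 1, 2]          # weights: number of discharge records

intercept, slope = weighted_least_squares(cdi_rate_pct, mean_los_days, n_discharges)
```

Weighting by discharge counts lets hospitals that contribute more records pull the fitted line toward their values, so each hospital's average counts in proportion to the number of patients behind it.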
For the patient-level analysis, because patient LOS tends to be nonnormally distributed (ie, skewed), we compared 5 regression models to estimate the effect of a hospital’s CDI incidence on LOS: (1) ordinary least squares and 4 generalized linear models, each using a log link, with a (2) Gaussian, (3) gamma, (4) Poisson, or (5) negative binomial distribution. We compared the fit of these models using the Akaike information criterion, and we present estimates from the model with the lowest Akaike information criterion value.
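The selection rule can be sketched as follows. The Akaike information criterion is 2k − 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood, and the model with the lowest value is preferred. The model labels, log-likelihoods, and parameter counts below are arbitrary placeholders, not results from this study.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L); lower indicates better fit."""
    return 2 * n_params - 2 * log_likelihood


def best_by_aic(fits):
    """`fits` maps model name -> (maximized log-likelihood, number of parameters).

    Returns the name of the model with the lowest AIC and all AIC values.
    """
    scores = {name: aic(ll, k) for name, (ll, k) in fits.items()}
    return min(scores, key=scores.get), scores


# Placeholder values for illustration only, NOT values from the study.
fits = {
    "ols": (-1050.0, 4),
    "glm_gaussian_log": (-1040.0, 4),
    "glm_gamma_log": (-980.0, 4),
    "glm_poisson_log": (-1010.0, 3),
    "glm_negbin_log": (-985.0, 4),
}
best, scores = best_by_aic(fits)
```

Because every candidate is fitted to the same outcome on the same records, the criterion values are directly comparable across the 5 models.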
RESULTS
CDI Rates and Hospital-Level LOS
We first categorized hospitals into deciles on the basis of their CDI incidence and calculated the average LOS across hospitals in each of these deciles; Table 3 presents these results. Table 3 reports a significant (P<.001) positive correlation between LOS and CDI incidence, with LOS increasing by more than a day with a percentage point increase in CDI incidence. We then used multivariate regression to predict hospital-level LOS while controlling for hospital-specific characteristics. Results from the hospital-level regression analysis are presented in Table 4. The regression results mimic the findings of the bivariate comparisons between hospital deciles. For each year, a percentage point increase in a hospital’s CDI incidence rate was associated with an increase in average patient LOS of 1.19 to 1.61 days. For each year, CDI incidence was the strongest predictor of average LOS in terms of the absolute value of its coefficient estimate and test statistic. Furthermore, when CDI incidence was removed, the explanatory power of the model, as measured by the model’s R² value, dropped considerably: without CDI incidence, the R² values decreased from .45 to .11, from .51 to .12, and from .42 to .12 for each year from 2009 through 2011, respectively. Thus, CDI incidence explained the greatest amount of variation in average LOS between hospitals, of all the hospital-specific characteristics.
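The decile comparison can be sketched in Python as below. This is a simplified illustration with hypothetical names and made-up data: it ranks hospitals by CDI incidence, splits the ranking into 10 equal-sized groups, and uses a single weight per hospital (its number of discharges) for both averages, which is an assumption made for brevity.

```python
def decile_summary(hospitals):
    """Group hospitals into CDI-incidence deciles and average within each.

    `hospitals` is a list of (cdi_rate, mean_los, n_discharges) tuples.
    Returns a list of (decile, weighted_cdi_rate, weighted_mean_los).
    """
    ranked = sorted(hospitals, key=lambda h: h[0])
    n = len(ranked)
    groups = {}
    for rank, hospital in enumerate(ranked):
        decile = rank * 10 // n + 1  # 1 = lowest CDI incidence, 10 = highest
        groups.setdefault(decile, []).append(hospital)
    summary = []
    for decile in sorted(groups):
        members = groups[decile]
        weight = sum(h[2] for h in members)
        rate = sum(h[0] * h[2] for h in members) / weight
        los = sum(h[1] * h[2] for h in members) / weight
        summary.append((decile, rate, los))
    return summary


# Made-up hospitals for illustration only: LOS rises with CDI incidence.
example = [(i * 0.001, 4.0 + 0.1 * i, 100) for i in range(20)]
summary = decile_summary(example)
```

Each row of `summary` corresponds to one decile, with the weighted CDI rate and weighted average LOS of the hospitals assigned to it.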
TABLE 3 LOS and CDI Incidence Rates by Hospital CDI Decile (Weighted Average)

NOTE. CDI, Clostridium difficile infection; LOS, length of stay. LOS is calculated as the average patient LOS across all patients without CDI for hospitals in a given decile. CDI rates are calculated as weighted averages across hospitals in a given decile, weighted by each hospital’s total number of discharges. The correlation coefficient and P values correspond to the correlation between each hospital’s average LOS and CDI incidence.
TABLE 4 Hospital-Level Results From a Weighted Least Squares Regression With Weights Corresponding to the Number of Discharges per Hospital

NOTE. CDI, Clostridium difficile infection; RN, registered nurse. The dependent variable is average hospital length of stay for patients without CDI.
CDI Rates and Patient-Level LOS
Variation in average LOS between hospitals may reflect underlying differences in patient populations rather than excess LOS. Thus, to control for patient characteristics, we used multivariate regression to predict patient-level LOS as a function of both patient and hospital characteristics.
The top of Table 5 presents the results of the regression model with the smallest Akaike information criterion value, namely the generalized linear model using a log link and gamma distribution. Results for the additional models are not presented but mirror those of the gamma model. For every year from 2009 through 2011, CDI incidence was strongly associated with an increase in an individual’s LOS (P<.001). On the basis of the coefficients of the chosen regression model, a percentage point increase in a hospital’s CDI incidence was associated with an increase in a patient’s LOS of approximately 4.37%, 7.47%, and 5.87% for 2009, 2010, and 2011, respectively. Moreover, in terms of P values, CDI incidence was one of the strongest predictors of LOS. Among the 429 covariates included in this model, CDI incidence had the eighth lowest P value in 2009 and the seventh lowest in 2010 and 2011. Only a small set of variables, such as the number of procedures a patient underwent or the number of diagnoses on a patient’s record, were stronger predictors of LOS.
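The percentage interpretation follows from the log link. As a sketch in notation that is ours rather than the article's, let LOS for patient i in hospital j depend on that hospital's CDI incidence (in percentage points) and on the remaining covariates x with coefficients γ:

```latex
\[
\log \mathrm{E}[\mathrm{LOS}_{ij}]
  = \beta_0 + \beta_{\mathrm{CDI}}\,\mathrm{CDI}_j + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma}
\quad\Longrightarrow\quad
\frac{\mathrm{E}[\mathrm{LOS}_{ij} \mid \mathrm{CDI}_j + 1]}
     {\mathrm{E}[\mathrm{LOS}_{ij} \mid \mathrm{CDI}_j]}
  = e^{\beta_{\mathrm{CDI}}}
\]
\[
\%\Delta\,\mathrm{LOS}
  = 100\,\bigl(e^{\beta_{\mathrm{CDI}}} - 1\bigr)
  \approx 100\,\beta_{\mathrm{CDI}}
  \quad \text{for small } \beta_{\mathrm{CDI}}
\]
```

For an illustrative coefficient of 0.05 (not a value from the study), the exact change is 100(e^0.05 − 1) ≈ 5.13%, compared with 5% from the approximation, so the two readings differ only slightly for coefficients of this size.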
TABLE 5 Patient-Level Results From GLM (Gamma Family, Log Link) and OLS Models

NOTE. AIC, Akaike information criterion; CDI, Clostridium difficile infection; GLM, generalized linear models; OLS, ordinary least squares; SE, standard error. The dependent variable is length of stay for patients without CDI.
As a point of comparison with the hospital-level LOS results, the bottom of Table 5 also reports the results of the patient-level ordinary-least-squares model with untransformed LOS. In this model, a percentage point increase in a hospital’s CDI incidence was associated with a 0.64-, 1.05-, and 0.84-day increase in a patient’s LOS, on average, for 2009, 2010, and 2011, respectively. These estimates are smaller than those in the hospital-level model, suggesting that some of the variation in average LOS associated with CDI incidence is due to patient-disease characteristics (ie, non-excess LOS). However, the relative size of the estimates in the patient-level model suggests that CDI incidence rates capture a significant portion of the excess LOS between hospitals. After accounting for patient-level characteristics, the coefficient estimates of CDI incidence in the patient-level model were still greater than 50% of their value in the hospital-level model.
DISCUSSION
Our results demonstrate that hospital-level CDI rates are highly correlated with increased LOS in patients without a CDI diagnosis after controlling for patient and hospital characteristics. These results suggest that factors associated with high CDI rates in hospitals are also associated with excess LOS. Indeed, a 1 percentage point increase in an institution’s CDI rate, after controlling for all observable variables, is associated with an increase of between 4.37% and 7.47% in a non-CDI patient’s LOS. These findings translate to an increase in LOS of between 0.64 and 1.05 days. Thus, we believe that CDI acts as a proxy for hospital quality, efficiency, or perhaps both.
We found hospital CDI incidence to be a highly significant predictor of LOS at both the hospital and patient level. In fact, CDI rates were one of the strongest predictors in each model, stronger than all other hospital characteristics and most patient characteristics. Moreover, many common patient-level characteristics used in studies of patient outcomes (eg, age) had a far smaller impact on LOS than did CDI incidence in our model. These results suggest that unmeasured hospital characteristics that are captured by CDI incidence may play a greater role in determining a patient’s LOS than the patient’s and hospital’s underlying characteristics. In every model we estimated, the quality and fit (eg, Akaike information criterion or R²) of our model improved substantially when CDI incidence was included. This finding alone suggests that future research using LOS as an outcome measure should consider CDI incidence, or the factors it captures, as a proxy variable. Failure to account for CDI incidence may result in omitted variable bias leading to an incorrect interpretation of the effects of other variables related to LOS: the magnitudes, signs, and significance of many of the estimated coefficients included in our model changed dramatically when CDI incidence was removed. Given the abundance of existing research that has used LOS as an outcome measure, it may be worth revisiting factors previously associated with LOS.
A useful feature of CDI rates in comparing LOS between hospitals is that CDI rates appear to be a good proxy for excess LOS rather than LOS due to disease characteristics. A potential limitation of any measure used to make hospital-level comparisons is the need to properly adjust for the type of patients a hospital treats. For example, hospitals that treat a more severely ill patient population may appear to have longer LOS on average, even though such increased LOS would not be considered excessive. Our results suggest that CDI rates may discriminate between excess and ordinary LOS. After individual patient characteristics were added to the hospital-level model, greater than 50% of the effect of CDI on LOS remained. These findings suggest that more than half of the variation in LOS between hospitals that can be explained by CDI rates may be due to excess LOS. Thus, CDI rates may be useful when comparing excess LOS between hospitals.
CDI rates are easy to compute and to compare across hospitals, especially in relation to many measures of quality or other markers of patient safety. The Agency for Healthcare Research and Quality indicators have frequently been used as measures of hospital quality and patient safety. However, such indicators require many variables, and the coding algorithm for these indicators has changed over time. 37 Less complex measures, such as adherence to guidelines for acute myocardial infarction, often focus on only specific patient populations. In contrast, CDI rates are simple to calculate and easy to compare across hospitals. CDI rates are inherently important to measure, and our results provide another reason for tracking and considering CDI rates.
Although CDI rates are associated with longer LOS in patients who do not have CDI, we do not claim that higher CDI rates are necessarily causing longer LOS. Instead, we think that CDI rates may act as a proxy for unobserved/unmeasured hospital characteristics that are related to hospital efficiency and/or quality (eg, environmental cleanliness, hospital crowding, and inappropriate and excess use of antibiotics). We hypothesize that 2 main factors play a role in driving this relationship. First, hospitals that are of lower quality or less efficient may tend to have longer LOS and generate more hospital-associated CDI. Second, hospitals with longer LOS, due to either efficiency or quality characteristics, may also observe more hospital-associated CDI before discharge of patients. Thus, there is no guarantee that efforts to reduce CDI will affect LOS in uninfected patients. In addition, CDI rates may be dependent upon connections with other hospitals via patient transfers that we are unable to observe in this analysis.Reference Simmering, Polgreen and Campbell 38 Future investigations should focus on analyzing potentially causative factors driving the relationship between CDI rates and LOS in patients without CDI. For example, the additional isolation rooms needed for hospitals with a higher CDI incidence may result in ineffective transitions of care or misallocation of staffing resources. Unfortunately, we are unable to perform this analysis with our data.
The connection between CDI incidence and LOS may occur because a hospital is generating more hospital-associated CDI. One limitation of our study is we cannot directly determine whether a CDI diagnosis was hospital associated. Therefore, we performed a sensitivity analysis where we calculated CDI incidence using only CDI cases that were recorded as a secondary diagnosis. Although secondary CDI diagnoses have been shown to contain non–hospital-associated CDI cases,Reference Dubberke, Butler and Yokoe 29 by removing primary CDI cases, the calculated CDI incidence should contain a greater proportion of hospital-associated CDI cases. Results are reported in the Online Supplementary Appendix. When only secondary CDI cases were included, our findings became even stronger: both the estimated effect and significance of CDI incidence on LOS increased, and the fit of the model improved. Thus, the link between CDI incidence and LOS may occur via hospital-associated CDI.
There are other limitations to our study. First, we used administrative data rather than clinical microbiologic results to measure CDI. However, administrative codes for CDI have been demonstrated to be a relatively sensitive and specific marker for CDI.Reference Dubberke, Reske and McDonald 28 – Reference Scheurer, Hicks and Cook 30 Second, rates of CDI differ over time and region and depend upon different microbiologic testing approaches for CDI that undoubtedly differ across institutions. However, our study was conducted over a series of years and included many types of hospitals. Third, it would be ideal to have information regarding CDI cases attributable to hospital stays that occur after hospital discharge; the number of such cases may be nontrivial.Reference Kuntz and Polgreen 39 However, we have only inpatient data. Although this limitation does not detract from the effectiveness of CDI rates as a marker for hospitals with longer LOS, such information would be useful in further analyzing the connection between CDI rates and LOS. Fourth, in our patient-level analysis, we dropped a number of patients with missing values in the covariates. However, we are not concerned that this biased our results, given that our hospital-level model, which included all patients, showed a similar, yet stronger, association between CDI rates and LOS. Fifth, we included a large number of patient-level covariates in our analyses to avoid omitted variable bias. More parsimonious models, containing more patients but fewer patient-level variables, actually increased our observed effect (data not shown). Although the inclusion of a large number of variables could render the model susceptible to multicollinearity, we found no evidence of model instability or estimation inaccuracy. 
Finally, a limitation associated with all such modeling efforts is that the estimates we generate for excess LOS associated with higher CDI should be interpreted with caution: additional quality or efficiency measures are necessary to estimate the exact effect size.
In conclusion, CDI rates are an accurate predictor of LOS in patients without CDI, even after considering both individual- and institution-level factors. CDI incidence had greater explanatory power than any other hospital characteristic and almost all commonly used patient characteristics. Moreover, differences in CDI rates between hospitals appear to capture differences in excess, rather than ordinary, LOS. CDI rates are easy to measure and may provide an important marker for hospital efficiency and quality. Thus, our findings may provide another reason for policy makers, healthcare administrators, and clinicians to track CDI rates.
ACKNOWLEDGMENTS
Financial support. National Heart, Lung, and Blood Institute of the National Institutes of Health (grant K25HL122305); and the University of Iowa Health Care eHealth and eNovation Center.
Potential conflicts of interest. All authors report no conflicts of interest relevant to this article.
SUPPLEMENTARY MATERIAL
To view supplementary material for this article, please visit http://dx.doi.org/10.1017/ice.2015.340