Background
Clostridioides difficile infection (CDI) is an important contributor to the overall burden of healthcare-associated infections,Reference Miller, Chen, Sexton and Anderson1,Reference Dubberke and Olsen2 causing an estimated half million infections and 30,000 deaths in the United States annually.Reference Lessa, Winston and McDonald3,4 Patients with CDI demonstrate symptoms across the spectrum of clinical severity ranging from relatively mild and self-limiting diarrhea to intestinal perforation, sepsis, and death. Reports indicate that CDI incidence and indicators of disease severity, including ICU admission and death, increased between 2001 and 2012, likely related to the emergence and rapid spread of the hypervirulent, fluoroquinolone-resistant BI/NAP1/027 strain.Reference Ma, Brensinger, Wu and Lewis5–Reference Pépin, Saheb and Coulombe8
The observed increases in disease severity are troubling given that CDIs cause physical exhaustion, social isolation, severe emotional distress,Reference Guillemin, Marrel and Lambert9 and decreased quality of life and productivityReference Heinrich, Harnett, Vietri, Chambers, Yu and Zilberberg10 among patients. They also place a substantial burden on the healthcare system.Reference Zhang, Prabhu and Marcella11 One potential approach to mitigating the impact of more severe infections is to use clinical prediction rules to identify patients at highest risk of poor CDI outcomes and intervene. When carefully designed and evaluated, clinical prediction rules can be useful for guiding management decisions, targeting more intensive interventions to patients most likely to benefit from them and resulting in more resource-efficient care.Reference Reilly and Evans12 To be useful in clinical practice, a tool must use information available to the clinician at the time of management decisions, must be both internally and externally valid, and must be easy to implement.
In 2010, the Society for Healthcare Epidemiology of America (SHEA) and the Infectious Diseases Society of America (IDSA) clinical practice guidelines proposedReference Cohen, Gerding and Johnson13 defining severe CDI as leukocytosis or an increase in serum creatinine (SCr) value over patient baseline levels. In the 2018 guidelines,Reference McDonald, Gerding and Johnson14 the severity criteria were updated, changing the definition of renal dysfunction from a relative increase of 50% over baseline to an absolute SCr value >1.5 mg/dL. However, the proposed criteria are based on expert opinion, and guideline authors emphasized the need to validate these or other criteria.Reference McDonald, Gerding and Johnson14 Although the lack of validation of the SHEA/IDSA criteria is widely recognized, their applicability across healthcare settings has received relatively little attention. In recent years, CDI cases have increasingly been diagnosed in the outpatient setting among younger patients with fewer healthcare exposures.Reference Reveles, Pugh and Lawson15,Reference Russo, Kuntz and Yu16 Severity scoring rules that rely on laboratory values may be less useful in this growing segment of CDI cases. To address these gaps, we sought to evaluate the utility of the 2010 and 2018 versions of the SHEA/IDSA CDI severity criteria to predict poor outcomes in the outpatient, acute-care, and long-term care settings in a large, multicenter cohort of patients with CDI.
Methods
Study design and participants
We conducted a retrospective cohort study of inpatients, outpatients, and long-term care residents with CDI in the US Department of Veterans’ Affairs (VA) Health System from January 1, 2006, to December 31, 2016. We defined CDI as the presence of C. difficile toxin or toxin genes from a stool sample. Patients were included the first time they had a positive test during the study period. This study was reviewed and approved by the University of Utah Institutional Review Board and the Research and Development Committee at the VA Salt Lake City Health Care System.
Study data
Information on patient and CDI episode characteristics was extracted from the VA’s corporate data warehouse and was accessed using the VA informatics and computing infrastructure. Acute-care and long-term care visits were extracted from inpatient files. Laboratory values were extracted from the laboratory chemistry package.
The CDI episodes were classified according to surveillance criteria from the Centers for Disease Control and Prevention.Reference McDonald, Gerding and Johnson14,Reference McDonald, Coignard and Dubberke17 A CDI incident was defined as a positive test with no history of a positive test in the previous 56 days. Recurrent CDI was defined as CDI in a patient with a positive test in the previous 15–56 days. Duplicate episodes were positive tests with another positive test in the prior 14 days and were excluded from the analysis. Episodes were also defined according to the location of diagnosis. Tests performed in a long-term care facility or outpatient clinic were grouped together as subacute care. Tests performed in the emergency department, on a medical or surgical ward, or in the intensive care unit (ICU) were defined as acute care.
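The episode definitions above reduce to simple day-count windows around each positive test. The following is an illustrative Python sketch of that classification logic (the study itself used R; the function name and interface here are ours, not the authors'):

```python
def classify_episode(days_since_last_positive):
    """Classify a positive C. difficile test by the number of days since the
    patient's last positive test (None means no prior positive on record).

    Windows follow the surveillance definitions described in the text:
    incident  -> no positive test in the previous 56 days
    recurrent -> prior positive test 15-56 days earlier
    duplicate -> prior positive test within the past 14 days (excluded)
    """
    if days_since_last_positive is None or days_since_last_positive > 56:
        return "incident"
    if days_since_last_positive >= 15:
        return "recurrent"
    return "duplicate"
```

For example, a positive test 30 days after a prior positive would be counted as a recurrence, while one 7 days after would be dropped as a duplicate.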
Severity definition
Severity was defined according to the 2010 and 2018 SHEA/IDSA criteria. For the 2010 definition, a CDI episode was classified as severe if the patient experienced either (1) leukocytosis, defined as the maximum white blood cell (WBC) count ≥15,000 cells/μL or (2) a maximum SCr value ≥1.5 times the baseline level.Reference Cohen, Gerding and Johnson13 For the 2018 criteria, CDI episodes were considered severe if the patient had either (1) leukocytosis as previously defined, or (2) a maximum SCr value >1.5 mg/dL.Reference McDonald, Gerding and Johnson14 Maximum WBC and SCr data were recorded within the 3 days prior to diagnosis. Baseline SCr was calculated as the average of SCr values from the 4–90 days prior to diagnosis.Reference Stevens, Nelson and Schwab-Daugherty18 Patients lacking sufficient laboratory data were considered unclassifiable and were not included in the primary analysis. Severe complicated (or fulminant) CDI was not considered as a distinct severity category.
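As a concrete illustration, the two severity definitions can be sketched in code. This is our own Python rendering of the criteria as described above, under one plausible handling of partial missingness (an episode is called severe if any available component meets its threshold, and unclassifiable only when the available data cannot rule severity in or out); the study itself used R:

```python
def classify_severity_2010(max_wbc, max_scr, baseline_scr):
    """2010 SHEA/IDSA: severe if max WBC >= 15,000 cells/uL
    or max SCr >= 1.5x the patient's baseline SCr."""
    wbc_severe = None if max_wbc is None else max_wbc >= 15000
    scr_severe = (None if max_scr is None or baseline_scr is None
                  else max_scr >= 1.5 * baseline_scr)
    if wbc_severe or scr_severe:          # any met criterion -> severe
        return "severe"
    if wbc_severe is None or scr_severe is None:
        return "unclassifiable"           # insufficient laboratory data
    return "nonsevere"

def classify_severity_2018(max_wbc, max_scr):
    """2018 SHEA/IDSA: severe if max WBC >= 15,000 cells/uL
    or max SCr > 1.5 mg/dL (absolute value; no baseline needed)."""
    wbc_severe = None if max_wbc is None else max_wbc >= 15000
    scr_severe = None if max_scr is None else max_scr > 1.5
    if wbc_severe or scr_severe:
        return "severe"
    if wbc_severe is None or scr_severe is None:
        return "unclassifiable"
    return "nonsevere"
```

Note how the 2018 definition drops the baseline SCr requirement, which is one reason fewer episodes are unclassifiable under the 2018 criteria in the results below.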
Outcome definitions
Patients were followed from the time of diagnosis for poor outcomes. Poor outcome was defined as a binary composite outcome if any of the following events occurred: (1) hospital admission for outpatients and long-term care residents within 7 days of diagnosis, (2) ICU admission within 7 days, (3) colectomy within 14 days, or (4) 30-day all-cause mortality.Reference Chitnis, Holzbauer and Belflower19–Reference Wenisch, Schmid and Kuo21
Statistical analysis
Logistic regression was used to model poor outcome separately as a function of severe disease (yes/no) as defined by the 2010 and 2018 severity criteria. Overall model performance was evaluated using Brier scores and Nagelkerke’s R2. Brier scores range from 0 for a perfect model to 0.25 for a completely uninformative model (assuming an outcome incidence of 50%). The maximum Brier score decreases as the incidence of the outcome decreases. Nagelkerke’s R2 measures the percentage of variation in the outcome that can be explained by the predictors (in this case, severity classifications).Reference Steyerberg, Vickers and Cook22 Discrimination, or the probability of correctly predicting poor outcome in a randomly chosen pair of patients on the basis of the severity criteria, was measured by the area under the receiver operating characteristics (ROC) curve (AUC). Calibration, or the agreement between observed and expected cases, is usually measured across the range of patient risk. Due to the dichotomous nature of the severity scores, patients in this study fall into 1 of 2 risk categories (ie, high or low) rather than across a spectrum; therefore, we did not measure calibration or calculate Hosmer-Lemeshow goodness-of-fit statistics. Sensitivity, specificity, and positive and negative predictive values (PPV and NPV, respectively) and 95% confidence intervals (CIs) were also calculated. Categorized predictors are well known to degrade predictive performance. Reference Steyerberg, Uno and Ioannidis23,Reference Collins, Ogundimu, Cook, Manach and Altman24 We tested whether including WBC and SCr as continuous values (2010: maximum WBC and relative SCr change from baseline; 2018: maximum WBC and maximum SCr) would improve model performance characteristics. 
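For readers unfamiliar with these measures, the following minimal Python sketch shows how the confusion-matrix statistics, the AUC of a dichotomous predictor, and the Brier score are computed (toy code of our own, not the study's analysis, which used R and the predictABEL package):

```python
def binary_metrics(severe, poor_outcome):
    """Performance of a binary severity classifier against a 0/1 outcome."""
    pairs = list(zip(severe, poor_outcome))
    tp = sum(1 for s, o in pairs if s and o)          # severe, poor outcome
    fp = sum(1 for s, o in pairs if s and not o)      # severe, no poor outcome
    fn = sum(1 for s, o in pairs if not s and o)      # nonsevere, poor outcome
    tn = sum(1 for s, o in pairs if not s and not o)  # nonsevere, no outcome
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "ppv": tp / (tp + fp),   # positive predictive value
        "npv": tn / (tn + fn),   # negative predictive value
        # A dichotomous predictor yields a one-point ROC curve, so the
        # AUC reduces to the average of sensitivity and specificity.
        "auc": (sens + spec) / 2,
    }

def brier_score(pred_prob, outcome):
    """Mean squared difference between predicted probability and 0/1 outcome."""
    return sum((p - o) ** 2 for p, o in zip(pred_prob, outcome)) / len(outcome)
```

Because a single binary predictor produces only one interior point on the ROC curve, dichotomization inherently caps achievable discrimination, which motivates the continuous-predictor sensitivity analysis described above.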
Because missing laboratory values were more frequent and overall patient acuity was lower in the subacute-care setting, which may bias results, we conducted a second sensitivity analysis assuming that all missing values fell within normal ranges. Finally, we evaluated the performance of the scores among the subset of C. difficile–positive patients who also received treatment with metronidazole, oral vancomycin, or fidaxomicin to ensure that included patients were considered by the treating clinician to have clinically relevant CDI. All analyses were stratified by the setting of diagnosis (subacute care vs acute care). Statistical analyses were performed in R (R Foundation for Statistical Computing, Vienna, Austria). Model performance characteristics were estimated using the predictABEL package.Reference Kundu, Aulchenko, van Duijn and Janssens25 Sample R code is included in the Supplementary Material online.
Results
Between January 1, 2006, and December 31, 2016, our search identified 86,112 episodes of CDI that met eligibility criteria for inclusion in the analysis. Patient and CDI episode characteristics by location of diagnosis are shown in Table 1. The median (interquartile range [IQR]) patient age was 67.0 (19.0) and 68.0 (18.0) years in the subacute- and acute-care settings, respectively. Nearly all included episodes (>99.0%) were classified as incident, and most (60.7%) were diagnosed in the acute-care setting. Overall, 17,741 patients (20.8%) experienced poor outcomes; the most common of these were 30-day all-cause mortality (10.9%) and admission to acute care for those not hospitalized at diagnosis (8.3%). Poor outcomes were more common among patients diagnosed in the acute-care setting.
Table 1. Characteristics of Patients Diagnosed With CDI by Location of Diagnosis in the US Department of Veterans’ Affairs, 2006–2016
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201104161818013-0298:S0899823X20000082:S0899823X20000082_tab1.png?pub-status=live)
Note. CDI, Clostridioides difficile infection; WBC, white blood cell count; IQR, interquartile range; SCr, serum creatinine value; ICU, intensive care unit.
According to the 2010 and 2018 criteria, 25,743 (29.9%) and 37,879 (44.0%) episodes were classified as severe, respectively. For both the 2010 and 2018 criteria, more episodes were unclassifiable in the subacute-care group than the acute-care group (2010: 58.8% vs 13.9%; 2018: 49.2% vs 5.7%, respectively). A list of missing values by location of diagnosis, laboratory component, and criteria is provided in Table 2, and more detail is available in Supplementary Table 1 (online). The records of approximately 61% of subacute-care patients and 17% of acute-care patients were missing either the WBC or SCr information needed to determine severity using the 2010 criteria. The proportions missing decreased to 54% and 7% using the 2018 criteria. Figure 1 shows the distribution of severity determination based on the 2010 and 2018 criteria by location of diagnosis. Most cases in the outpatient setting were unclassifiable, with few severe episodes. In contrast, cases in the ICU were mostly classified as severe, with very few unclassifiable cases.
Table 2. Missingness of Data Elements to Determine Severity According to the 2010 and 2018 SHEA/IDSA Severity Criteria for Patients Diagnosed in Subacute and Acute Care
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201104161818013-0298:S0899823X20000082:S0899823X20000082_tab2.png?pub-status=live)
Note. WBC, white blood cell count; SCr, serum creatinine value.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201104161818013-0298:S0899823X20000082:S0899823X20000082_fig1.png?pub-status=live)
Fig. 1. Distribution of severity determination based on the 2010 and 2018 criteria by location of diagnosis.
Table 3 shows a direct comparison of the performance characteristics for both criteria by location based on logistic regression modeling. Overall model performance among classifiable cases was generally poor. Brier scores for the 2010 and 2018 criteria were near 0.15 for subacute care and 0.18 for acute care. Nagelkerke’s R2 indicated that only 3%–4% of variance in the outcome could be explained by either severity definition regardless of location. The AUCs were poor and similar (for both versions: subacute care, 0.60; acute care, 0.57), and the ROC curves for each set of criteria by location are shown in Figure 2. The 2018 severity criteria had a higher sensitivity than the 2010 criteria in both the subacute- and acute-care settings. NPVs were similar for both scores across settings (0.80–0.86). Including WBC and SCr measurements as continuous values resulted in slightly improved AUCs (0.59–0.62) but similar overall model performance measures. Treating missing values as normal and restricting the analysis to treated episodes generated similar results (Supplementary Tables 2–4 and Supplementary Figs. 1–3 online).
Table 3. Performance Characteristics of the 2010 and 2018 Severity Scores by Location of Diagnosis Based on Logistic Regression Modeling Among Classifiable Cases
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201104161818013-0298:S0899823X20000082:S0899823X20000082_tab3.png?pub-status=live)
Note. AUC, area under the curve; CI, confidence interval; PPV, positive predictive value; NPV, negative predictive value; OR, odds ratio.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201104161818013-0298:S0899823X20000082:S0899823X20000082_fig2.png?pub-status=live)
Fig. 2. Receiver-operating characteristics curve for the 2010 and 2018 SHEA/IDSA severity criteria for patients with Clostridioides difficile diagnosed in the subacute (panel A) and acute care (panel B) settings in the US Department of Veterans’ Affairs Health System, 2006–2016.
Discussion
Improving outcomes for patients who develop CDI depends on our ability to accurately identify those at greatest risk of poor outcomes. In an era of increasingly complex medical decision making and substantial cognitive overload, clinical prediction rules are an attractive way to facilitate treatment decision making and reduce cognitive burden. However, developing, validating, and promoting widespread uptake of these rules is a challenging process. Patients with CDI exhibit a broad range of symptoms and subsequent outcomes. Although distinguishing between mild-to-moderate and severe CDI is no longer recommended for initial treatment choice, the role of metronidazole in CDI treatment is still being discussed, Reference Fabre, Dzintars, Avdic and Cosgrove26,Reference McDonald, Johnson, Bakken, Garey, Kelly and Gerding27 and research into its optimal use continues. Reference Appaneal, Caffrey and LaPlante28 Given the frequency and severity of possible CDI outcomes, identifying patients most likely to require more intensive management remains an important goal.
Despite the plethora of predictive models, ranging from simple criteria based on expert opinion to complex algorithms derived from machine learning methods, Reference Kassam, Cribb Fabersunne and Smith29–Reference Li, Oh, Young, Rao and Wiens37 no single severity definition has gained widespread acceptance for the management of CDI. Furthermore, there is currently no clear consensus on whether a clinical prediction rule should focus on measuring the inflammatory process or on identifying those factors that might predict which patients will not respond to standard therapy so that adjustments to the management approach can be made. Existing severity scores have been based on a mix of inflammatory and other factors including age, comorbidities, serum biomarkers, treatment with systemic antibiotics, leukocyte count, and albumin and SCr as a measure of renal function. The SHEA/IDSA criteria, which suggest the use of leukocytosis and decreased renal function as markers of disease severity, are arguably the most visible criteria, but to date they have not been rigorously and broadly validated across multiple healthcare settings.
Earlier studies of the SHEA/IDSA criteria did not perform formal validation. Reference Gomez-Simmonds, Kubin and Furuya38,Reference Mulherin, Hutchison, Thomas, Hansen and Childress39 More recently, both versions of the SHEA/IDSA criteria were validated for the prediction of 30-day mortality in a tertiary-care facility in China among a small number of CDI patients (n = 401) with AUCs between 0.60 and 0.65. Reference Chiang, Huang and Chung40 We similarly observed that both criteria exhibited poor discrimination (ie, AUCs near 0.60) and low positive predictive values (ie, PPVs near 0.30) in both the acute-care and subacute-care settings. Overall model metrics provide further evidence of suboptimal performance. Nagelkerke’s R2 values indicate that <5% of the variance in poor outcomes can be explained by severity classifications. Including WBC and SCr parameters as continuous variables rather than dichotomous indicators did not appreciably improve model performance, suggesting the need to consider other factors in determining which patients are at greatest risk of hospital or ICU admission, colectomy, or death.
Despite the imprecision of the SHEA/IDSA criteria for measuring severity to predict poor outcomes, studies suggest that failure to stratify patients according to the 2010 guidelines was associated with worse outcomes, although the exact reasons why guideline-concordant therapy yielded better outcomes are unclear. Reference Patel, Wungjiranirun and Theethira41,Reference Crowell, Julian and Katzman42 Although clinical prediction rules are generally classified as poor performers when the C-statistic (or AUC) is <0.60 and as excellent performers when the C-statistic is >0.95, no theoretical foundations support the use of these cutoffs across all medical domains. Reference Caetano, Sonpavde and Pond43 If the C-statistic is >0.5, then the model performs better than chance alone at predicting the outcome. When treatment decisions are made based on an imperfect but still better-than-chance clinical prediction rule, patient outcomes may still improve. As with diagnostic tests, clinicians may place greater or lesser weight on false positives than false negatives, or it may instead be desirable to minimize overall error (eg, using the F1 score). We observed NPVs between 0.80 and 0.86, suggesting that the severity criteria may be more appropriately used to identify low-risk patients unlikely to experience poor outcomes than to identify patients at high risk. Further study is needed to determine how well a clinical prediction rule needs to perform to be useful for decision making.
Predicting poor outcomes among hospitalized patients with CDI has been the major focus of previously developed rules. However, an increasing proportion of CDI cases are occurring in the community. Reference Reveles, Pugh and Lawson15,Reference Russo, Kuntz and Yu16 Little attention has been given to how severity scores predict poor outcomes in this increasing subset of CDI patients, particularly among those without laboratory values. In our study, one-third and one-quarter of episodes were missing either WBC or SCr values for the 2010 and 2018 versions of the criteria, respectively, and 15% were missing both values. As expected, most of these unclassifiable cases occurred among patients diagnosed outside the acute-care setting.
A number of factors should be considered when building a usable clinical prediction rule. If the rule will be used to guide treatment and management decisions, it should only incorporate information available at the time of diagnosis. The acceptability of a clinical prediction or decision rule is influenced by trust in its validity, the likelihood of benefit to patients, and how easy it is to use. Reference Brehaut, Graham and Wood44 Proper internal and external validation are critical to establishing trust in a prediction rule. Although simple rules that can be encoded as heuristics are simplest to implement logistically, efforts to make prediction rules simpler (eg, dichotomization) often lead to substantial information loss and diminished predictive performance. Reference Steyerberg, Uno and Ioannidis23,Reference Collins, Ogundimu, Cook, Manach and Altman24 More complex algorithms may perform better, but they are more difficult to implement widely depending on site-specific ability to make changes to the electronic health record systems. The literature on CDI severity score development demonstrates the challenges common to clinical prediction rules in general: small sample sizes and even smaller event numbers, lack of internal and external validation, stepwise variable selection methods, use of highly correlated predictors, overfitting, conflation of predictors and outcome (eg, defining severity based on poor outcomes that occur after diagnosis), and dichotomization.
To the best of our knowledge, this study is the largest formal evaluation of the performance of the 2010 and 2018 SHEA/IDSA CDI severity criteria in a multicenter cohort and the first to include inpatients, long-term care residents, and outpatients. Our study included >80,000 episodes of CDI, and due to the integrated nature of the VA healthcare system, we were able to capture both inpatient and outpatient mortality and other outcomes of interest.
Our study has several limitations. We used retrospective rather than prospective validation of the criteria to predict poor outcomes. We were unable to separate severe complicated (or fulminant) cases in our assessment due to the unavailability of information on ileus and megacolon among the structured data elements. CDI was defined based on laboratory findings alone because we had no information on diarrheal symptoms. As recommendations for optimal testing algorithms have shifted over the years, VA laboratories moved from largely using enzyme immunoassay toward polymerase chain reaction after 2010. Reference Evans, Kralovic, Simbartl, Jain and Roselle45,Reference Reeves, Evans and Simbartl46 Changes in testing practices may have influenced the severity of patients diagnosed with CDI and included in our study. This potential dilution of our cohort with inaccurately diagnosed or asymptomatically colonized patients may have impacted the predictive performance of the severity criteria. However, the issue of possible overdiagnosis is not unique to the VA health system, and it is likely a consideration at other facilities considering the use of a clinical prediction rule to guide CDI management.
In conclusion, both the 2010 and 2018 versions of the SHEA/IDSA CDI severity criteria perform below widely accepted standards for clinical prediction rules. A nontrivial proportion of CDI cases may be unclassifiable based on these criteria, particularly for patients diagnosed in the outpatient or long-term care settings. Additional work is needed to determine the desirable characteristics of the rule, with particular attention to minimum performance standards, acceptability, applicability in CDI case subgroups, and ultimate plans for the implementation of prevention interventions.
Supplementary material
To view supplementary material for this article, please visit https://doi.org/10.1017/ice.2020.8
Acknowledgments
The authors thank Carrie Edlund for critical review and editing of the manuscript. The views expressed in this article are those of the authors and do not necessarily reflect the position or policy of the US Department of Veterans Affairs or the US government. All authors have no conflicts of interest to report.
Financial support
This work was supported in part by a Center of Innovation grant (no. CIN 13-414; primary investigator, Matthew Samore) from the Department of Veterans’ Affairs, Veterans’ Health Administration, Office of Research and Development, Health Services Research and Development.
Conflicts of interest
All authors report no conflicts of interest relevant to this article.