
Prediction of Recurrent Clostridium Difficile Infection Using Comprehensive Electronic Medical Records in an Integrated Healthcare Delivery System

Published online by Cambridge University Press:  24 August 2017

Gabriel J. Escobar*
Affiliation:
Kaiser Permanente Division of Research, Oakland, California
Jennifer M. Baker
Affiliation:
Contra Costa Public Health Clinic Services, Martinez, California
Patricia Kipnis
Affiliation:
Kaiser Permanente Division of Research, Oakland, California; Kaiser Permanente Northern California, Oakland, California
John D. Greene*
Affiliation:
Kaiser Permanente Division of Research, Oakland, California
T. Christopher Mast
Affiliation:
Merck Research Laboratories, North Wales, Pennsylvania
Swati B. Gupta
Affiliation:
Merck Vaccines, West Point, Pennsylvania
Nicole Cossrow
Affiliation:
Merck Research Laboratories, North Wales, Pennsylvania
Vinay Mehta
Affiliation:
Merck Research Laboratories, North Wales, Pennsylvania
Vincent Liu
Affiliation:
Kaiser Permanente Division of Research, Oakland, California; Santa Clara Medical Center and Medical Offices, Kaiser Permanente Northern California, Santa Clara, California
Erik R. Dubberke
Affiliation:
Washington University School of Medicine, St Louis, Missouri
*
Address correspondence to Gabriel J. Escobar, MD, Systems Research Initiative, Kaiser Permanente Division of Research, 2000 Broadway Ave (032 R01), Oakland, CA 94612-2304 (gabriel.escobar@kp.org) or John Greene, MA, Systems Research Initiative, Kaiser Permanente Northern California Division of Research, 2000 Broadway Ave, Oakland, CA 94612 (john.d.greene@kp.org).

Abstract

BACKGROUND

Predicting recurrent Clostridium difficile infection (rCDI) remains difficult.

METHODS

We employed a retrospective cohort design. Granular electronic medical record (EMR) data had been collected from patients hospitalized at 21 Kaiser Permanente Northern California hospitals. The derivation dataset (2007–2013) included data from 9,386 patients who experienced incident CDI (iCDI) and 1,311 who experienced their first CDI recurrences (rCDI). The validation dataset (2014) included data from 1,865 patients who experienced incident CDI and 144 who experienced rCDI. Using multiple techniques, including machine learning, we evaluated more than 150 potential predictors. Our final analyses evaluated 3 models with varying degrees of complexity and 1 previously published model.

RESULTS

Despite having a large multicenter cohort and access to granular EMR data (eg, vital signs and laboratory test results), none of the models discriminated well (c statistics, 0.591–0.605), had good calibration, or had good explanatory power.

CONCLUSIONS

Our ability to predict rCDI remains limited. Given currently available EMR technology, improvements in prediction will require incorporating new variables because currently available data elements lack adequate explanatory power.

Infect Control Hosp Epidemiol 2017;38:1196–1203

Type
Original Articles
Copyright
© 2017 by The Society for Healthcare Epidemiology of America. All rights reserved 

Clostridium difficile infection (CDI) is a serious illness whose presentation can range from loose stools to profuse watery diarrhea, leading to dehydration, life-threatening complications, and sometimes death. This illness is associated with substantial morbidity, mortality, excess health services utilization, and increased cost.Reference Lessa, Winston and McDonald 1 Reference Olsen, Young-Xu and Stwalley 3 The Centers for Disease Control and Prevention estimated that there were 453,000 cases of incident CDI (iCDI) in 2011, with 29,000 associated deaths and 83,000 first recurrences (rCDI).Reference Lessa, Winston and McDonald 1 Recurrences are common due to persistent or newly acquired bacterial spores.Reference Freedberg, Salmasian, Cohen, Abrams and Larson 4 After initial treatment and resolution of diarrhea, up to 35% of CDI patients experience rCDI.Reference Lessa, Winston and McDonald 1 , Reference McFarland 5 , Reference Bouza 6 Of those with a primary recurrence, 40% will have another CDI episode, and after 2 recurrences, the likelihood of an additional episode increases to as high as 65%.Reference McFarland, Elmer and Surawicz 7 However, due to recent advances, this estimate may be overstated.Reference Zilberberg, Reske, Olsen, Yan and Dubberke 8 , Reference Sheitoyan-Pesant, Abou Chakra, Pepin, Marcil-Heguy, Nault and Valiquette 9

Prevention of rCDI remains a critical unmet medical need, and it is desirable to predict which patients are at highest risk of recurrence. A number of research teams have developed predictive models for rCDI.Reference Hu, Katchar and Kyne 10 Reference D’Agostino, Collins, Pencina, Kean and Gorbach 13 These models have had limited sample size, have been restricted to data from a single center, have employed imprecise proxies for measures of disease severity, and have made limited use of electronic medical record (EMR) data.

A need exists for risk prediction models to address these gaps. As more healthcare systems in the United States transition to fully automated EMRs, it is important to take advantage of the increased granular clinical data that are becoming available. Although health systems are beginning to experiment with predictive models embedded in EMRs,Reference Kollef, Chen and Heard 14 Reference Escobar, Turk and Ragins 16 access to such capability remains limited. The overall incidence of CDI is affected by local factors such as antimicrobial stewardship efforts, patient case mix, varying antibiotic utilization patterns, C. difficile strain epidemiology, and prevention. Thus, models may not be completely generalizable and may need periodic updating. Although considerable interest in predicting rCDI exists, descriptions of the performance characteristics of existing models have been limited, and few have been sufficiently validated outside the populations in which they were developed. Now that treatments are available to prevent the recurrence of CDI (eg, fidaxomicin,Reference Watt, Dinh, Le Monnier and Tilleul 17 , Reference Nelson, Suda and Evans 18 bezlotoxumabReference Wilcox, Gerding and Poxton 19 ), it is advantageous to patients and healthcare providers to identify those at greatest risk for recurrence who may benefit from the most appropriate treatments.

To address these gaps, we developed and validated rCDI predictive models in a large and representative sample of adults. For our defined population, cared for by a single medical group within an integrated delivery system, Kaiser Permanente Northern California (KPNC), comprehensive EMR data were available. Our modeling process included comparing different models and externally validating a previously published model.

MATERIALS AND METHODS

This project was approved by the KPNC Institutional Review Board for the Protection of Human Subjects, which has jurisdiction over all the hospitals and clinics described in this report.

Our setting consisted of 21 KPNC hospitals described previously.Reference Escobar, Greene, Gardner, Marelich, Quick and Kipnis 20 Reference Escobar, Gardner, Greene, Draper and Kipnis 22 Under a mutual exclusivity arrangement, salaried physicians of The Permanente Medical Group care for 4.2 million Kaiser Foundation Health Plan members at facilities owned by Kaiser Foundation Hospitals. All KPNC facilities (21 hospitals and an additional 60 clinics) employ the same information systems with a common medical record number.Reference Selby 23 Comprehensive KPNC information systems permit tracking of patient information across the continuum of care, including some aspects of care outside KPNC.22,23 Deployment of the Epic EMR system (www.epicsystems.com), known internally as KP HealthConnect (KPHC), began in 2006 and was completed in 2010.

The eligible population (denominator) included adults ≥18 years of age with at least 1 positive test (the index test) for C. difficile toxins or DNA associated with a hospitalization between 2007 and 2014. The date-time stamp of the physician order for the index test was time zero (T0) for all study measurements. Details on KPNC assays and testing procedures are provided in the Appendix.

Measures

Primary study outcome

The dependent variable was rCDI, which could occur either in the inpatient or outpatient setting. To ensure that we distinguished between incident and recurrent episodes, T0 had to be preceded by an 84-day period with no evidence of CDI (Figure 1). A patient’s treatment period extended from the first known instance of antibiotic treatment to 48 hours after conclusion of such treatment. A positive test defining a patient as having rCDI had to occur within 84 days after the end of the treatment period. Tests that occurred within the treatment period were not included. Figure 1 also shows that predictors were included if available up to 4 days after T0, a clinically reasonable period for acquisition of information following a CDI testing order.

FIGURE 1 Time periods employed to define patient inclusion in cohort and patient data in predictive models. The T0 is defined by the date/time stamp of the physician order for the index test. In order for the patient to be included in the cohort, the T0 had to be preceded by 84 days with no positive test for Clostridium difficile (“clean” period). To be considered an outcome, an infection had to occur during the Recurrence period. This meant that a positive test result occurred within 84 days following the end of a variable treatment period (time between the T0 and completion of antibiotic treatment, ABX End). Patient data included in the predictive models had to be available within 4 days from the T0 (Predictor period). See text for additional details.
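The time windows described above can be sketched in a few lines of Python. This is only an illustration of the stated rules (84-day clean period before T0, treatment period ending 48 hours after the last antibiotic, 84-day recurrence period). The function names, the argument names, and the handling of tests that fall exactly on a window boundary are assumptions and are not taken from the study.

```python
from datetime import datetime, timedelta

CLEAN_DAYS = 84            # no positive test allowed in this window before T0
RECURRENCE_DAYS = 84       # outcome window after the treatment period ends
TREATMENT_TAIL_HOURS = 48  # treatment period runs 48 h past the end of antibiotics


def is_incident(t0, prior_positive_tests):
    """True when no positive test falls in the 84 days before T0."""
    window_start = t0 - timedelta(days=CLEAN_DAYS)
    return not any(window_start <= t < t0 for t in prior_positive_tests)


def has_recurrence(abx_end, later_positive_tests):
    """True when a positive test falls within 84 days after the treatment period.

    Tests inside the treatment period are ignored. Boundary handling
    (strictly after the period, up to and including day 84) is an assumption.
    """
    treatment_end = abx_end + timedelta(hours=TREATMENT_TAIL_HOURS)
    window_end = treatment_end + timedelta(days=RECURRENCE_DAYS)
    return any(treatment_end < t <= window_end for t in later_positive_tests)


# Hypothetical patient: antibiotics end 10 days after T0.
t0 = datetime(2013, 3, 1, 9, 0)
abx_end = datetime(2013, 3, 11, 9, 0)
tests = [datetime(2013, 3, 12, 8, 0),   # inside the treatment period, ignored
         datetime(2013, 4, 20, 14, 0)]  # inside the recurrence period
print(is_incident(t0, [datetime(2012, 11, 1)]), has_recurrence(abx_end, tests))
```

Because the treatment period varies by patient, the recurrence window is anchored to each patient's own antibiotic end time rather than to T0.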

Mortality

We ascertained mortality using KPNC patient demographic databases and publicly available files of deceased patients provided by the Social Security Administration, as described previously.Reference Escobar, Gardner, Greene, Draper and Kipnis 22

Model development

We assessed more than 150 potential predictors, including age, sex, and different configurations of historical variables (eg, antibiotic exposure, recent hospitalizations, and surgery). The final set of 23 predictors incorporated in the 3 models was based on clinical grounds, statistical performance, data abstraction burden in settings without EMRs, and (for the fully automated models) current KPNC data availability.Reference Escobar and Dellinger 15 , Reference Escobar, Turk and Ragins 16

Predictors fell into the following categories: demographic (age, sex), location of iCDI onset (either the inpatient setting or a skilled nursing facility), medication exposure (antibiotics, proton pump inhibitors), comorbidities (both as individual predictors as well as composite indices such as the Charlson comorbidity indexReference Deyo, Cherkin and Ciol 24 and the 12-month longitudinal COmorbidity Point Score, version 2, or COPS2Reference Escobar, Gardner, Greene, Draper and Kipnis 22 ), medical history (eg, recent surgery involving the gastrointestinal tract), and physiologic markers (ie, laboratory tests, vital signs, and a severity of illness score, the Laboratory-based Acute Physiology Score, version 2 [LAPS2]).Reference Escobar, Gardner, Greene, Draper and Kipnis 22 The LAPS2 employs 16 laboratory tests, vital signs, pulse oximetry, and neurological status checks. We categorized 24 antibiotics as high risk (eg, ciprofloxacin, clindamycin, and amoxicillin).Reference Zilberberg, Reske, Olsen, Yan and Dubberke 25 Reference Dubberke, Yan and Reske 27 A full list of the predictors examined is provided in the Appendix.

Based on statistical performance, the 3 best-performing models are described here: basic, enhanced, and automated. The basic model is a parsimonious model with components that could be easily populated in most medical settings. The enhanced model is a variant of the basic model to which a limited set of variables, which could be extracted from an EMR, were added. These variables, which are part of the LAPS2 severity of illness score,Reference Escobar, Gardner, Greene, Draper and Kipnis 22 were based on their statistical contribution using methods described below. Finally, the automated model is based on variables that could be generated in real time given existing systems in place in KPNC.Reference Escobar, Turk and Ragins 16

We elected to compare these final 3 models against the Zilberberg model, a previously published model by Zilberberg et al,Reference Zilberberg, Reske, Olsen, Yan and Dubberke 25 because it was based on a large cohort and the authors provided substantive detail on its statistical performance. For the Zilberberg model, we structured predictors to match the specifications of Zilberberg et al exactly. However, we did not employ their original coefficients, instead allowing these to emerge given our population. The 4 models, arranged according to increasing complexity, are summarized in Table 1.

TABLE 1 Predictors Used Within Each Model

NOTE. LAPS2, laboratory-based acute physiology score, version 2; COPS2, comorbidity point score, version 2; T0, time zero, the date-time stamp of the physician order for the index Clostridium difficile infection test; iCDI, incident Clostridium difficile infection (see text for how iCDI is defined); ICU, intensive care unit.

a See text for more detail on model selection.

b We replicated the model developed by Zilberberg et al.Reference Zilberberg, Reske, Olsen, Yan and Dubberke 25

c A patient’s immunosuppression status was defined using algorithmic rules using International Classification of Disease, Ninth Revision (ICD-9) diagnosis codes and immunocompromising medications and treatments used in the 6 mo prior to iCDI.

d Locus of iCDI onset is categorized as (1) community-onset, healthcare-facility–associated (iCDI diagnosed by a positive toxin test within 72 h of admission or iCDI diagnosed in any outpatient setting and a hospitalization in the prior 90 d); (2) community-onset, community-associated (reference group in model: iCDI diagnosed by a positive toxin test within 72 h of admission or in any outpatient setting and no hospitalization in the previous 90 d); or (3) hospital-onset, healthcare-facility–associated (CDI diagnosed >72 h after hospital admission). These definitions were also used by Zilberberg et al.Reference Zilberberg, Reske, Olsen, Yan and Dubberke 25

e We employed the same definitions as Zilberberg et al.Reference Zilberberg, Reske, Olsen, Yan and Dubberke 25

f LAPS2 is a composite severity of illness score and employs 16 laboratory tests, vital signs, pulse oximetry, and neurological status checks.Reference Escobar, Gardner, Greene, Draper and Kipnis 22

g COPS2 is a 12-month longitudinal comorbidity burden score that includes history elements (eg, recent surgery involving the gastrointestinal tract).Reference Escobar, Gardner, Greene, Draper and Kipnis 22

Statistical Methods

We divided cohort data into derivation (patients with iCDI between 2007 and 2013) and validation (iCDI in 2014) datasets. All analyses during model development were performed using the derivation dataset, with final coefficients applied once to the validation dataset. As a further precaution against overfitting, we divided derivation data into Derivation 1 (iCDI dates 2007–2012) and Derivation 2 (2013) datasets.Reference Hastie, Tibshirani and Friedman 28 Within the Derivation 1 dataset, we identified a set of candidate predictors by first performing univariate and bivariate analyses and then applying a random forest algorithm.Reference Hastie, Tibshirani and Friedman 28 , Reference Allison 29 We evaluated the performance and robustness of all models on the Derivation 2 dataset using 5-fold cross-validation.Reference Hastie, Tibshirani, Friedman and Franklin 30 We excluded multiple models because, although they performed well in the derivation dataset, performance deteriorated dramatically following cross-validation. This was particularly true with respect to models that incorporated multiple interaction terms.
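The temporal split and the 5-fold cross-validation described above can be sketched as follows. This is a minimal illustration and not the study's code: the record layout (a dictionary with an `icdi_year` key), the function names, and the random fold assignment are all assumptions.

```python
import random


def temporal_split(records):
    """Assign each record to a dataset by the calendar year of its iCDI."""
    parts = {"derivation_1": [], "derivation_2": [], "validation": []}
    for rec in records:
        year = rec["icdi_year"]  # hypothetical field name
        if 2007 <= year <= 2012:
            parts["derivation_1"].append(rec)
        elif year == 2013:
            parts["derivation_2"].append(rec)
        elif year == 2014:
            parts["validation"].append(rec)
    return parts


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs; each index is tested exactly once."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


# Toy usage with one made-up record per study year.
records = [{"icdi_year": y} for y in range(2007, 2015)]
parts = temporal_split(records)
print({name: len(rows) for name, rows in parts.items()})
for train_idx, test_idx in k_fold_indices(len(parts["derivation_1"])):
    pass  # fit on train_idx, score on test_idx
```

Holding out the most recent year, rather than a random sample, tests a model on patients seen after all model development decisions were made.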

We fit a simple logistic regression, excluding deaths prior to rCDI, for the basic, enhanced, and automated models. However, because patients with CDI have a substantial mortality risk and might die prior to developing rCDI, we evaluated several models (based on the enhanced model predictors) to address the possible impact of mortality on rCDI prediction. These included competing risk discrete survival modelsReference Allison 29 and Cox competing risk survival regression.Reference Hosmer and Lemeshow 31 We conducted sensitivity analyses in which we first assigned a probability of rCDI to all patients in a randomly selected portion of the derivation dataset. We then tested various models using the remaining records in which the dependent variable was not dichotomous but continuous (ie, patients who died were assigned a probability of rCDI, and then we modeled for rCDI as a continuous outcome), and we incorporated the conditional probability of mortality into the analyses. Additional details are provided in the Appendix.

We compared the discrimination of each model using the c statistic (area under the receiver operator characteristic curve),Reference Cook, Duke, Hart, Pilcher and Mullany 32 calibration through calibration plots,Reference Crowson, Atkinson and Therneau 33 the incremental contribution of additional predictors using integrated discrimination improvement (IDI), and net reclassification improvement as recommended by CookReference Cook 34 and Pencina et al.Reference Pencina, D’Agostino, D’Agostino and Vasan 35 As recommended by Cook,Reference Cook 34 we also included the Nagelkerke pseudo-R2 in our assessments of model performance. In standard linear regression models, the ratio of the mean-squared error to the variance of the dependent variable can be subtracted from 1 to define an R2 that is always between 0 and 1. In a validation sample, however, the mean-squared error may exceed the variance of the dependent variable, and the resulting R2 may be negative. A negative R2 indicates a very poor fit with the validation sample.Reference Estrella and Mishkin 36
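Two of the quantities in the paragraph above can be illustrated with a short sketch. The first function computes the c statistic as the share of case/non-case pairs in which the case receives the higher predicted risk. The second computes 1 minus the ratio of mean-squared error to outcome variance, which is the ratio described above and not the Nagelkerke formula itself. Function names and the toy inputs are assumptions.

```python
def c_statistic(y_true, y_score):
    """Share of (case, non-case) pairs ranked correctly; ties count one half."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def validation_r2(y_true, y_pred):
    """1 - MSE / Var(y); negative when predictions do worse than the sample mean."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    mse = sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / n
    var = sum((y - mean_y) ** 2 for y in y_true) / n
    return 1.0 - mse / var


# Made-up validation sample: 1 recurrence in 5 patients, every patient
# assigned a predicted risk of 0.30 (an over-prediction of the 0.20 rate).
y = [0, 0, 0, 0, 1]
pred = [0.3, 0.3, 0.3, 0.3, 0.3]
print(c_statistic(y, pred), validation_r2(y, pred))
```

In the toy sample the constant over-prediction gives a c statistic of 0.5 and a negative R2, which shows how a model that over-predicts in a validation sample can fall below zero on this measure.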

We also conducted sensitivity analyses in which we employed a 30-day (as opposed to an 84-day) period for outcome ascertainment.

RESULTS

We scanned KPNC databases from 2007 to 2014 and identified a total of 41,499 positive tests for Clostridium difficile. A total of 11,251 patients experienced iCDI. In the derivation dataset, a total of 9,386 patients with iCDI experienced 1,311 first recurrences (14.0%); 2,197 (23.4%) patients died prior to the end of the follow-up period; and 260 (2.8%) died following a recurrence. The corresponding numbers in the validation dataset were 1,865 iCDIs, 144 (7.7%) rCDIs, 376 (20.2%) deaths prior to the end of the follow-up period, and 27 (1.4%) deaths following rCDI. The Appendix provides a flow chart describing the cohort assembly. Excluding patients who died prior to the end of the follow-up period, Table 2 summarizes our cohort characteristics, which are fairly similar to the cohort described by Zilberberg et al.Reference Zilberberg, Reske, Olsen, Yan and Dubberke 25 However, in general, the KPNC cohort was older but healthier (eg, the proportion with Charlson scores <3 was 80%, while that in the Zilberberg et al cohort was ~55%). Furthermore, the KPNC cohort generally had lower risk (eg, only 24% were receiving high-risk antibiotics, compared to 40% in the Zilberberg cohort). Expanded versions of this table are provided in the Appendix.

TABLE 2 Incident Clostridium difficile (iCDI) Cohort Description

NOTE. iCDI, incident Clostridium difficile infection; LAPS2, laboratory-based acute physiology score, version 2; COPS2, comorbidity point score, version 2.

a Cohort consists of patients with iCDI. Patients who died during the follow-up period were removed from analysis.

b See Deyo et alReference Deyo, Cherkin and Ciol 24 for details on how this score was assigned.

c Locus of iCDI onset is categorized as (1) community onset, healthcare-facility associated (iCDI diagnosed by a positive toxin test within 72 h of admission or iCDI diagnosed in any outpatient setting and a hospitalization in the prior 90 d); (2) community onset, community associated (reference group in model: iCDI diagnosed by a positive toxin test within 72 h of admission or in any outpatient setting and no hospitalization in the previous 90 d); or (3) hospital onset, healthcare-facility associated (CDI diagnosed >72 h after hospital admission). These definitions were also used by Zilberberg et al.Reference Zilberberg, Reske, Olsen, Yan and Dubberke 25

d We employed the same antibiotic classifications as Zilberberg et al.Reference Zilberberg, Reske, Olsen, Yan and Dubberke 25

e For an extended definition of LAPS2 and COPS2, refer to the text and Escobar et al.Reference Escobar, Gardner, Greene, Draper and Kipnis 22 For both of these scores, increasing values are associated with increasing mortality risk. The univariate relationship of an admission LAPS2 with 30-d mortality is as follows: 0–59, 1.0%; 60–109, 5.0%; 110+, 13.7%; the univariate relationship of COPS2 with 30-d mortality is as follows: 0–39, 1.7%; 40–64, 5.2%; 65+, 9.0%.

We compared performance of the discrete time survival and competing risk Cox regression models against the simple logistic regression algorithm where we excluded patients who died prior to an rCDI. The simple logistic regression basic, enhanced, and automated models showed performance comparable to that of the competing risk survival models.

Table 3 summarizes performance characteristics of our models in the validation dataset. All models demonstrated modest discrimination, as shown by their areas under the receiver-operator characteristic curve, or c statistics (range, 0.591–0.605) and poor explanatory power, with negative Nagelkerke pseudo-R2s (−0.1033 to −0.0875). At a predicted risk of ≥15% the positive predictive value ranged from 11.0% to 12.1%; sensitivity ranged from 69.4% to 79.2%; and specificity ranged from 32.0% to 43.6% across the models. With this threshold, the number of patients needed to evaluate (NNE) to detect 1 case of rCDI ranged from 8.3 to 9.0 across the models. Figure 2 shows calibration of the Zilberberg model and the enhanced model; neither model was well calibrated.
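The threshold-based measures reported above follow from a 2-by-2 table of flagged versus recurrent patients. The sketch below is illustrative only: the function names are assumptions, the counts in the usage example are invented, and treating NNE as the reciprocal of the positive predictive value is an assumption based on the definition of NNE given in the Table 3 note.

```python
def confusion_counts(y_true, risk, threshold=0.15):
    """Count true/false positives and negatives at a predicted-risk cutoff."""
    tp = fp = fn = tn = 0
    for y, r in zip(y_true, risk):
        flagged = r >= threshold
        if flagged and y == 1:
            tp += 1
        elif flagged:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def threshold_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, PPV, NPV, and NNE (assumed to equal 1 / PPV)."""
    ppv = tp / (tp + fp)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": ppv,
        "npv": tn / (tn + fn),
        "nne": 1.0 / ppv,
    }


# Invented counts, chosen only to make the arithmetic easy to follow.
print(threshold_metrics(tp=30, fp=220, fn=10, tn=240))
```

Under the same assumption, the reported positive predictive values of 12.1% and 11.0% correspond to 1/0.121 ≈ 8.3 and 1/0.110 ≈ 9.1 patients evaluated per recurrence detected, close to the reported NNE range of 8.3 to 9.0.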

FIGURE 2 Model Calibration Using the Validation Dataset. For both plots, the X axis shows predicted rates of recurrent CDI in 5% increments, while the Y axis shows the actual observed rates (with associated 95% confidence intervals) in the validation dataset for all observations with that predicted level of risk. The dotted line shows what would be found were calibration to be perfect. For both the Zilberberg and Enhanced models, calibration is poor: calibration fails at levels above 10% predicted risk. Observed rates do not approach predicted rates, meaning that both models over-predict recurrent CDI. Additional calibration figures, including Hosmer-Lemeshow plots, are provided in the Appendix.

TABLE 3 Model Performance in the Validation Dataset (footnote a) at a Predicted Risk of ≥15%

NOTE. c statistic, area under the receiver operator characteristic curve; R2, Nagelkerke’s pseudo-R2; PPV, positive predictive value; NPV, negative predictive value; NNE, number of incident cases one would need to evaluate to detect one recurrence; NRI, net reclassification improvement; IDI, integrated discrimination improvement; iCDI, incident Clostridium difficile infection.

a The validation dataset consisted of 1,865 iCDI patients, of whom 144 developed rCDI. A total of 376 iCDI patients died (and thus could not be assessed for recurrence).

b See text for description of the 4 models. “Age ≥65 years” refers to a simple decision rule based on age alone. Sensitivity, PPV, NPV, NNE, NRI, and IDI are based on the model giving a predicted recurrence risk of ≥15% within 84 days.

c We conducted sensitivity analyses using predicted risk of ≥20%, ≥25%, and ≥30%. These results are provided in the Appendix.

Sensitivity analyses of the possible impact of mortality indicate that consideration of this issue (eg, by assigning a weighted probability of rCDI to patients who died and then modeling for rCDI as a continuous outcome) did not improve prediction. Sensitivity analyses using a 30-day (instead of 84-day) follow-up period resulted in worse model performance. Additional results are provided in the Appendix.

DISCUSSION

Using a large recent cohort, we developed and validated 3 rCDI predictive models using contemporary modeling techniques and EMR data. We also validated a previously published modelReference Zilberberg, Reske, Olsen, Yan and Dubberke 25 in a different population. However, despite including highly granular EMR data (eg, vital signs, laboratory tests, composite severity of illness scores, and longitudinal comorbidity), the models and underlying data had poor ability to predict rCDI. We formally tested a common assumption made by many investigators (ie, that deaths can simply be excluded from the numerator). We found that this approach is justified, and that including patients who die prior to the conclusion of the follow-up period did not improve prediction. Lastly, we found that shortening the length of follow-up to 30 days resulted in worse model performance.

Some authors have reported better model performance. Examination of these other studies paints a less optimistic picture. Hu et alReference Hu, Katchar and Kyne 10 report the use of machine-learning approaches and a c statistic of 0.80 in their validation dataset. However, this study had a very small sample size (N=110, with N=64 in the validation dataset) and did not employ cross-validation (ie, no formal assessment of the possibility that model performance in a different population might be poor). We were able to achieve c statistics that were this high in our derivation dataset, but these apparently successful models demonstrated considerable instability during cross-validation. We did not pursue them further and chose more parsimonious models.

Contrary to previous literature reports, some predictors (eg, specific antibiotic exposures) were of limited value, particularly in models that included severity of illness. This probably reflects the fact that severity of illness is highly correlated (and may, in fact, be the underlying risk factor) with other predictors (eg, intensive care and antibiotics known to predispose for CDI). We deliberately focused on predicting rCDI in iCDI cases, though previous CDI is a well-known risk factor for recurrence. It is possible that, had we included prior CDI as a predictor, we might have achieved better model performance. However, models that included the COPS2 score (a longitudinal comorbidity measure that captures information from the preceding 12 months) did not perform much better.

Multiple investigators, using a variety of statistical approaches, including machine-learning methods, have been unable to produce static models with better performance using the currently available set of predictors. While it is true that many predictors reach statistical significance in bivariate analyses (particularly when the sample size is large), the clinical significance may be muted because the relative proportions of patients with and without recurrence are not that different. Further, it is clear that the risk factors (age, antibiotic exposure, severity of illness) that place an individual at risk for iCDI are also risk factors for rCDI. Thus, future efforts ought to be placed on identifying better predictors rather than on using different statistical approaches with the currently available predictors. New predictors may include newer biomarkers (eg, indicators of underlying predisposition to recurrence), environmental factors (eg, proximity to other CDI patients, presence of C. difficile sporesReference Freedberg, Salmasian, Cohen, Abrams and Larson 4 ), behavioral aspects (eg, handwashing), and/or molecular markers (eg, information on specific C. difficile strains). It is also important to consider rCDI in an ecological context, and future predictive models may need to be explicit about including environmental and ecological predictors (eg, isolation rooms, who is roomed where, other family members' exposure), if such data become available.

One alternative that we did not explore, because it is currently not feasible with existing EMRs, was to develop dynamic models. In contrast to the static approach we and others have employed (ie, providing a single probability estimate based on a discrete set of predictors available at some T0), such models adjust posterior probabilities based on new information. In the case of rCDI, having additional information on both antibiotic treatment as well as other exposures (eg, proton pump inhibitors) could have dramatic effects on our ability to predict recurrence.Reference McDonald, Milligan, Frenette and Lee 37 , Reference Deshpande, Pasupuleti and Thota 38 The development of such models would require EMRs with greater capabilities than those currently available.
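The idea of adjusting a posterior probability as new information arrives can be illustrated with a simple odds-based Bayesian update. This is a conceptual sketch and not a model from this study: the function names are hypothetical, the likelihood ratios in the usage example are invented, and applying the updates one after another assumes the new pieces of information are conditionally independent.

```python
def update_probability(prior, likelihood_ratio):
    """Bayes' rule on the odds scale: posterior odds = prior odds x likelihood ratio."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    post_odds = prior / (1.0 - prior) * likelihood_ratio
    return post_odds / (1.0 + post_odds)


def dynamic_risk(baseline, likelihood_ratios):
    """Apply a sequence of updates to a baseline (static) risk estimate."""
    risk = baseline
    for lr in likelihood_ratios:
        risk = update_probability(risk, lr)
    return risk


# Invented example: a static estimate of 0.20 at T0, followed by one
# piece of new information with a likelihood ratio of 2.0.
print(dynamic_risk(0.20, [2.0]))
```

A static model stops at the baseline value; a dynamic model would keep revising it as exposures such as new antibiotic courses are recorded.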

Our study had several additional limitations. Due to resource limitations and sparse data, we limited our cases to inpatient iCDI. During this study, KPNC implemented aggressive efforts to reduce CDI. As a result, our data show that the incidences of iCDI and rCDI were decreasing in our study cohort. Despite these limitations, models to predict recurrence have value. They do permit identification of patient subsets with elevated or very low risk. In some scenarios, and in the context of discrete interventions, the use of these models might improve outcomes and decrease costs. In addition, existing models point to predictors that can be assessed in the future, such as the aforementioned ecological ones.

Compared to our ability to predict other outcomes (eg, death, unplanned transfer to intensive care),Reference Escobar, Turk and Ragins 16 , Reference Escobar, Greene, Gardner, Marelich, Quick and Kipnis 20 , Reference Escobar, Gardner, Greene, Draper and Kipnis 22 our ability to predict rCDI is limited and contrasts with much better ability to predict iCDI.Reference Kuntz, Johnson and Raebel 39 , Reference Kuntz, Smith and Petrik 40 Given the major consequences of rCDI on patient outcomes, our results support the need to expand research on the prevention and treatment of recurrence. Such research may also result in the identification of novel predictors that are currently unavailable even in the most comprehensive EMRs.

ACKNOWLEDGMENTS

This project was funded by a grant from Merck Sharp & Dohme Corporation, Whitehouse Station, New Jersey. The authors wish to thank Juan Carlos LaGuardia for help assembling the dataset, Dr Tracy Lieu for reviewing the manuscript, Vanessa Rodriguez for formatting the text for publication, Anna Cardellino for her assistance in drafting the protocol, and Mary Beth Dorr for her review and guidance in the analysis.

Financial support: Dr Vincent Liu was funded by a National Institutes of Health award (grant no. K23GM112018).

Potential conflicts of interest: The Kaiser Permanente authors Escobar, Kipnis, Liu, Greene, and Baker have no conflicts of interest to report. Dr Erik Dubberke has received grant support from Rebiotix, Merck, and Sanofi Pasteur; he also has consulting and advisory board relationships with Rebiotix, Summit, GSK, Valneva, and Sanofi Pasteur. The remaining coauthors Cossrow, Gupta, Mast, and Mehta are or were employees of Merck Sharp & Dohme Corporation, a subsidiary of Merck & Co., Kenilworth, New Jersey, and potentially own stock and/or hold stock options in the company.

SUPPLEMENTARY MATERIAL

To view supplementary material for this article, please visit https://doi.org/10.1017/ice.2017.176

REFERENCES

1. Lessa, FC, Winston, LG, McDonald, LC. Burden of Clostridium difficile infection in the United States. N Engl J Med 2015;372:2369–2370.
2. Kwon, JH, Olsen, MA, Dubberke, ER. The morbidity, mortality, and costs associated with Clostridium difficile infection. Infect Dis Clin North Am 2015;29:123–134.
3. Olsen, MA, Young-Xu, Y, Stwalley, D, et al. The burden of Clostridium difficile infection: estimates of the incidence of CDI from US administrative databases. BMC Infect Dis 2016;16:177.
4. Freedberg, DE, Salmasian, H, Cohen, B, Abrams, JA, Larson, EL. Receipt of antibiotics in hospitalized patients and risk for Clostridium difficile infection in subsequent patients who occupy the same bed. JAMA Intern Med 2016;176:1801–1808.
5. McFarland, LV. Renewed interest in a difficult disease: Clostridium difficile infections—epidemiology and current treatment strategies. Curr Opin Gastroenterol 2009;25:24–35.
6. Bouza, E. Consequences of Clostridium difficile infection: understanding the healthcare burden. Clin Microbiol Infect 2012;18(Suppl 6):5–12.
7. McFarland, LV, Elmer, GW, Surawicz, CM. Breaking the cycle: treatment strategies for 163 cases of recurrent Clostridium difficile disease. Am J Gastroenterol 2002;97:1769–1775.
8. Zilberberg, MD, Reske, K, Olsen, M, Yan, Y, Dubberke, ER. Risk factors for recurrent Clostridium difficile infection (CDI) hospitalization among hospitalized patients with an initial CDI episode: a retrospective cohort study. BMC Infect Dis 2014;14:306.
9. Sheitoyan-Pesant, C, Abou Chakra, CN, Pepin, J, Marcil-Heguy, A, Nault, V, Valiquette, L. Clinical and healthcare burden of multiple recurrences of Clostridium difficile infection. Clin Infect Dis 2016;62:574–580.
10. Hu, MY, Katchar, K, Kyne, L, et al. Prospective derivation and validation of a clinical prediction rule for recurrent Clostridium difficile infection. Gastroenterology 2009;136:1206–1214.
11. Eyre, DW, Walker, AS, Wyllie, D, et al. Predictors of first recurrence of Clostridium difficile infection: implications for initial management. Clin Infect Dis 2012;55:S77–S87.
12. Hebert, C, Du, H, Peterson, LR, Robicsek, A. Electronic health record–based detection of risk factors for Clostridium difficile infection relapse. Infect Control Hosp Epidemiol 2013;34:407–414.
13. D’Agostino, RB Sr., Collins, SH, Pencina, KM, Kean, Y, Gorbach, S. Risk estimation for recurrent Clostridium difficile infection based on clinical factors. Clin Infect Dis 2014;58:1386–1393.
14. Kollef, MH, Chen, Y, Heard, K, et al. A randomized trial of real-time automated clinical deterioration alerts sent to a rapid response team. J Hosp Med 2014;9:424–429.
15. Escobar, GJ, Dellinger, RP. Early detection, prevention, and mitigation of critical illness outside intensive care settings. J Hosp Med 2016;11:S5–S10.
16. Escobar, GJ, Turk, BJ, Ragins, A, et al. Piloting electronic medical record-based early detection of inpatient deterioration in community hospitals. J Hosp Med 2016;11:S18–S24.
17. Watt, M, Dinh, A, Le Monnier, A, Tilleul, P. Cost-effectiveness analysis on the use of fidaxomicin and vancomycin to treat Clostridium difficile infection in France. J Med Econ 2017;20:678–686.
18. Nelson, RL, Suda, KJ, Evans, CT. Antibiotic treatment for Clostridium difficile–associated diarrhoea in adults. Cochrane Database Syst Rev 2017;3:CD004610.
19. Wilcox, MH, Gerding, DN, Poxton, IR, et al. Bezlotoxumab for prevention of recurrent Clostridium difficile infection. N Engl J Med 2017;376:305–317.
20. Escobar, GJ, Greene, JD, Gardner, MN, Marelich, GP, Quick, B, Kipnis, P. Intra-hospital transfers to a higher level of care: contribution to total hospital and intensive care unit (ICU) mortality and length of stay (LOS). J Hosp Med 2011;6:74–80.
21. Liu, V, Kipnis, P, Rizk, NW, Escobar, GJ. Adverse outcomes associated with delayed intensive care unit transfers in an integrated healthcare system. J Hosp Med 2012;7:224–230.
22. Escobar, GJ, Gardner, MN, Greene, JD, Draper, D, Kipnis, P. Risk-adjusting hospital mortality using a comprehensive electronic record in an integrated health care delivery system. Med Care 2013;51:446–453.
23. Selby, JV. Linking automated databases for research in managed care settings. Ann Intern Med 1997;127:719–724.
24. Deyo, RA, Cherkin, DC, Ciol, MA. Adapting a clinical comorbidity index for use with ICD-9-CM administrative databases. J Clin Epidemiol 1992;45:613–619.
25. Zilberberg, MD, Reske, K, Olsen, M, Yan, Y, Dubberke, ER. Development and validation of a recurrent Clostridium difficile risk-prediction model. J Hosp Med 2014;9:418–423.
26. Dubberke, ER, Reske, KA, Yan, Y, Olsen, MA, McDonald, LC, Fraser, VJ. Clostridium difficile–associated disease in a setting of endemicity: identification of novel risk factors. Clin Infect Dis 2007;45:1543–1549.
27. Dubberke, ER, Yan, Y, Reske, KA, et al. Development and validation of a Clostridium difficile infection risk prediction model. Infect Control Hosp Epidemiol 2011;32:360–366.
28. Hastie, T, Tibshirani, R, Friedman, JH. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York: Springer Verlag; 2009.
29. Allison, PD. Logistic Regression Using SAS: Theory and Application. 2nd ed. Cary, NC: SAS Institute; 2012.
30. Hastie, T, Tibshirani, R, Friedman, J, Franklin, J. The elements of statistical learning: data mining, inference and prediction. Mathemat Intelligenc 2005;27:83–85.
31. Hosmer, DW, Lemeshow, S. Applied Survival Analysis: Regression Modelling of Time to Event Data. Hoboken, NJ: Wiley; 2008.
32. Cook, DA, Duke, G, Hart, GK, Pilcher, D, Mullany, D. Review of the application of risk-adjusted charts to analyse mortality outcomes in critical care. Crit Care Resusc 2008;10:239–251.
33. Crowson, CS, Atkinson, EJ, Therneau, TM. Assessing calibration of prognostic risk scores. Stat Method Med Res 2014;25:1692–1706.
34. Cook, NR. Use and misuse of the receiver operating characteristic curve in risk prediction. Circulation 2007;115:928–935.
35. Pencina, MJ, D’Agostino, RB Sr, D’Agostino, RB Jr, Vasan, RS. Evaluating the added predictive ability of a new marker: from area under the ROC curve to reclassification and beyond. Stat Med 2008;27:157–172; discussion 207–212.
36. Estrella, A, Mishkin, FS. Predicting US recessions: financial variables as leading indicators. Rev Econ Statist 1998;80:45–61.
37. McDonald, EG, Milligan, J, Frenette, C, Lee, TC. Continuous proton pump inhibitor therapy and the associated risk of recurrent Clostridium difficile Infection. JAMA Intern Med 2015;175:784–791.
38. Deshpande, A, Pasupuleti, V, Thota, P, et al. Risk factors for recurrent Clostridium difficile infection: a systematic review and meta-analysis. Infect Control Hosp Epidemiol 2015;36:452–460.
39. Kuntz, JL, Johnson, ES, Raebel, MA, et al. Predicting the risk of Clostridium difficile infection following an outpatient visit: development and external validation of a pragmatic, prognostic risk score. Clin Microbiol Infect 2015;21:256–262.
40. Kuntz, JL, Smith, DH, Petrik, AF, et al. Predicting the risk of Clostridium difficile infection upon admission: a score to identify patients for antimicrobial stewardship efforts. Perm J 2016;20:20–25.

FIGURE 1 Time periods employed to define patient inclusion in cohort and patient data in predictive models. The T0 is defined by the date/time stamp of the physician order for the index test. In order for the patient to be included in the cohort, the T0 had to be preceded by 84 days with no positive test for Clostridium difficile (“clean” period). To be considered an outcome, an infection had to occur during the Recurrence period. This meant that a positive test result occurred within 84 days following the end of a variable treatment period (time between the T0 and completion of antibiotic treatment, ABXEnd). Patient data included in the predictive models had to be available within 4 days from the T0 (Predictor period). See text for additional details.
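
The time windows described in this caption can be illustrated with a minimal Python sketch. It is not the study's code: the function names are hypothetical, and the handling of window boundaries (which endpoints are inclusive, and reading "within 4 days from the T0" as the 4 days following T0) is an assumption made only for illustration.

```python
from datetime import datetime, timedelta

CLEAN_DAYS = 84       # days before T0 that must contain no positive test
RECURRENCE_DAYS = 84  # days after ABXEnd in which a positive test is an outcome
PREDICTOR_DAYS = 4    # predictor data must be available within 4 days from T0


def is_eligible(t0, prior_positive_tests):
    """True if no positive test falls in the 84 days preceding T0 ("clean" period)."""
    window_start = t0 - timedelta(days=CLEAN_DAYS)
    return not any(window_start <= t < t0 for t in prior_positive_tests)


def is_recurrence(abx_end, positive_test):
    """True if a positive test falls within 84 days after the end of treatment."""
    # Assumption: the window opens strictly after ABXEnd and includes day 84.
    return abx_end < positive_test <= abx_end + timedelta(days=RECURRENCE_DAYS)


def in_predictor_period(t0, observed_at):
    """True if a data point was recorded within 4 days from T0."""
    # Assumption: "within 4 days from the T0" is read as T0 through T0 + 4 days.
    return timedelta(0) <= observed_at - t0 <= timedelta(days=PREDICTOR_DAYS)


t0 = datetime(2014, 3, 1)
print(is_eligible(t0, [datetime(2013, 12, 1)]))                   # prior test 90 days earlier
print(is_recurrence(datetime(2014, 3, 15), datetime(2014, 4, 1)))  # test 17 days after ABXEnd
```

Because the treatment period is variable, the Recurrence period is anchored to ABXEnd rather than to T0, so each patient's recurrence window starts on a different day relative to the index test.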

TABLE 1 Predictors Used Within Each Model

TABLE 2 Incident Clostridium difficile (iCDI) Cohort Description

FIGURE 2 Model Calibration Using the Validation Dataset. For both plots, the X axis shows predicted rates of recurrent CDI in 5% increments, while the Y axis shows the actual observed rates (with associated 95% confidence intervals) in the validation dataset for all observations with that predicted level of risk. The dotted line shows what would be found were calibration to be perfect. For both the Zilberberg and Enhanced models, calibration is poor: calibration fails at levels above 10% predicted risk. Observed rates do not approach predicted rates, meaning that both models over-predict recurrent CDI. Additional calibration figures, including Hosmer-Lemeshow plots, are provided in the Appendix.
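
The binned comparison behind a calibration plot of this kind can be sketched as follows. This is a hedged illustration, not the authors' analysis: the function name `calibration_table` is hypothetical, and the normal-approximation (Wald) 95% confidence interval is an assumption, since the caption does not state how the intervals were computed.

```python
import math


def calibration_table(predicted, observed, width=0.05):
    """Group predicted risks into bins of the given width and compare each
    bin's mean predicted risk with its observed event rate."""
    n_bins = int(round(1 / width))
    bins = {}
    for p, y in zip(predicted, observed):
        # The small epsilon guards against floating-point error at bin edges.
        idx = min(int(math.floor(p / width + 1e-9)), n_bins - 1)
        n, events, total_p = bins.get(idx, (0, 0, 0.0))
        bins[idx] = (n + 1, events + y, total_p + p)

    rows = []
    for idx in sorted(bins):
        n, events, total_p = bins[idx]
        rate = events / n
        # Assumption: Wald interval, clamped to the [0, 1] range.
        half = 1.96 * math.sqrt(rate * (1 - rate) / n)
        rows.append({
            "bin_low": idx * width,
            "n": n,
            "mean_predicted": total_p / n,
            "observed_rate": rate,
            "ci_low": max(0.0, rate - half),
            "ci_high": min(1.0, rate + half),
        })
    return rows


# Toy data for illustration only; these are not study results.
for row in calibration_table([0.02, 0.03, 0.12, 0.13, 0.14, 0.16],
                             [0, 0, 0, 1, 0, 0]):
    print(row)
```

A model over-predicts in a given bin when the observed rate, with its interval, lies below the mean predicted risk, which is the pattern the caption describes above 10% predicted risk.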

TABLE 3 Model Performance in the Validation Dataset at a Predicted Risk of ≥15%
