
Machine learning for the prediction of antimicrobial stewardship intervention in hospitalized patients receiving broad-spectrum agents

Published online by Cambridge University Press:  18 June 2020

Rachel J. Bystritsky*
Affiliation:
Department of Medicine, Infectious Diseases, University of California–San Francisco, San Francisco, California
Alex Beltran
Affiliation:
Department of Bioengineering, University of California–San Francisco, San Francisco, California
Albert T. Young
Affiliation:
School of Medicine, University of California–San Francisco, San Francisco, California
Andrew Wong
Affiliation:
School of Medicine, University of California–San Francisco, San Francisco, California
Xiao Hu
Affiliation:
Department of Bioengineering, University of California–San Francisco, San Francisco, California
Sarah B. Doernberg
Affiliation:
Department of Medicine, Infectious Diseases, University of California–San Francisco, San Francisco, California
*
Author for correspondence: Rachel J. Bystritsky, E-mail: Rachel.Bystritsky@ucsf.edu

Abstract

Objective:

A significant proportion of inpatient antimicrobial prescriptions are inappropriate. Post-prescription review with feedback has been shown to be an effective means of reducing inappropriate antimicrobial use. However, implementation is resource intensive. Our aim was to evaluate the performance of traditional statistical models and machine-learning models designed to predict which patients receiving broad-spectrum antibiotics require a stewardship intervention.

Methods:

We performed a single-center retrospective cohort study of inpatients who received an antimicrobial tracked by the antimicrobial stewardship program. Data were extracted from the electronic medical record and used to develop logistic regression and boosted-tree models to predict whether antibiotic therapy required stewardship intervention on any given day, compared to the criterion standard of a note left in the patient's chart by the antimicrobial stewardship team. We measured the performance of these models using the area under the receiver operating characteristic curve (AUROC) and evaluated them on a hold-out validation cohort.

Results:

Both the logistic regression and boosted-tree models demonstrated fair discriminatory power, with AUROCs of 0.73 (95% confidence interval [CI], 0.69–0.77) and 0.75 (95% CI, 0.72–0.79), respectively (P = .07). Both models demonstrated good calibration. The number of patients who would need to be reviewed to identify 1 patient requiring stewardship intervention was high for both models (41.7–45.5 for models tuned to a sensitivity of 85%).

Conclusions:

Complex models can be developed to predict which patients require a stewardship intervention. However, further work is required to develop models with adequate discriminatory power to be applicable to real-world antimicrobial stewardship practice.

Type
Original Article
Copyright
© 2020 by The Society for Healthcare Epidemiology of America. All rights reserved.

Antimicrobial resistance is a growing problem in the care of hospitalized patients, and it is driven by the overuse of antimicrobials. Prior studies have shown that ~30% of antibiotics prescribed in the inpatient setting are inappropriate—either unsuitable or unnecessary.1,2 Antimicrobial stewardship programs aim to improve the appropriate use of antimicrobial agents by promoting the selection of optimal antibiotic regimens.3 Postprescription review with feedback (PPRF), or real-time review of antibiotic prescriptions with feedback to prescribers, is a cornerstone of many antimicrobial stewardship programs. PPRF strategies have been shown to reduce inappropriate antibiotic use, resulting in decreased antibiotic resistance and improved clinical outcomes.3–7 A major drawback of PPRF is that it is labor intensive and requires a significant time commitment from experienced antimicrobial stewardship pharmacists and/or physicians.8 Much of the inefficiency stems from the fact that many prescriptions must be reviewed to identify targets appropriate for feedback. Thus, new approaches are needed to improve the efficiency of daily PPRF. One approach to improving the efficiency and effectiveness of PPRF is through computerized clinical decision support systems.9 However, these systems generally require programmed priorities for patients to be reviewed (eg, those on dual anaerobic coverage or those with positive blood cultures). More complex factors, or interactions of factors, may more accurately predict which patients are receiving inappropriate or unnecessary antibiotics.

Machine learning, the ability of a computer to learn without being explicitly programmed,10 is a complex set of approaches that are being applied in medicine for a wide variety of purposes. Machine-learning methods are increasingly being applied to infectious diseases, including to identify drug-resistance genes in multidrug-resistant tuberculosis11 and to predict recurrent Clostridium difficile infection.12 The goal of this study was to apply traditional statistical modeling and machine-learning methods utilizing information contained within the electronic health record (EHR) to determine which patients on antibiotics required stewardship intervention. Our ultimate goal was to develop a model that could be employed by antimicrobial stewardship programs (ASPs) to improve the efficiency and effectiveness of PPRF.

Methods

This project was approved by the UCSF Institutional Review Board for the Protection of Human Subjects.

Study sample

This study cohort included adult patients hospitalized between December 1, 2015, and August 1, 2017, at the University of California San Francisco Medical Center (UCSF), a 600-bed academic tertiary-care medical center. The eligible population included adults ≥18 years of age who received at least 1 antimicrobial from a list of those routinely tracked by the ASP (or “tracked antimicrobial,” listed in Appendix 1 online). Patients could be included multiple times if there were multiple admissions or courses of antimicrobials during the study period. The unit of analysis was patient days. Days on which patients were being seen by the infectious diseases consultation team were excluded (these patients are not reviewed by the ASP). Only weekdays were included in the data sets used for model development because our ASP does not perform reviews on weekends. New antibiotic courses were defined as courses that were started at least 72 hours after the most recent antibiotic dose administration. Multiple antibiotics given concurrently were defined as a single course.

PPRF description and study outcome

At UCSF, PPRF occurs every weekday and includes review of inpatients receiving any of the broad-spectrum antimicrobials being tracked. The ASP team marks patients on a flowsheet as their records are reviewed, and they make notes to other ASP providers and set a time for the next review. Not all charts are reviewed every day. A chart may not have been reviewed on a given day either because it had been reviewed on a prior day and follow-up had been set for a future date or because there was insufficient time available for the ASP to review all charts on that day. When the ASP suggests a change in antimicrobial management, a note is left in the chart in addition to direct feedback. Our primary outcome was whether the ASP left a note on any given patient day, a proxy measure for whether the ASP deemed antimicrobial management to require a stewardship intervention. Antimicrobial management could be deemed to require intervention by the ASP team if the antibiotic choice was felt to be inappropriate (ie, unnecessarily broad spectrum, drug–pathogen mismatch, redundant therapy) or unnecessary.

Data extraction and processing

Our hospital uses an electronic health record (EHR) system consisting of a combination of commercially available software (Epic, Epic Systems, Verona, WI) and locally developed databases for infection control purposes, which are derived from the EHR. We extracted data from both the Epic-based relational database management system (Clarity) and our infection control database.

For each day a patient was administered at least 1 tracked antimicrobial, we extracted >200 potential predictors from the EHR. We split the data into 2 main categories: time-invariant and time-varying variables. Time-invariant variables included patient demographics (eg, gender, age), admitting service, and statistics on prior admissions (eg, number of admissions in the prior 90 days). Time-varying variables extracted for each day included antimicrobial administration, length of stay, procedures (eg, whether an incision and drainage had been performed), and laboratory and vital-sign data. Daily laboratory and vital-sign data were extracted over the range from 9:00 am the day prior to 9:00 am the day of interest to replicate the data used by the stewardship team during their review.

The raw data set included highly granular data, including multiple vital-sign readings per day, which were simplified to include only the maximal or minimal value per day, depending on the variable (eg, maximum temperature, maximum creatinine, minimum absolute neutrophil count). We mapped all categorical data to binary features (eg, if a patient received a particular antimicrobial on a given day, the binary feature associated with that medication was set to "1," and to "0" if they did not). For many of the continuous variables, we dichotomized the values into binary features based on well-established reference ranges (eg, fever if maximum temperature was >38°C; see Appendix 2 online for all criteria).
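The feature-engineering steps described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the column names, example values, and the neutropenia cutoff are hypothetical stand-ins for the full criteria in the paper's Appendix 2.

```python
import pandas as pd

# Hypothetical daily extract: one row per patient-day, already reduced to
# the daily maximum/minimum of each vital sign or laboratory value.
daily = pd.DataFrame({
    "max_temp_c": [37.2, 38.6, 39.1],
    "min_anc": [2.1, 0.4, 1.8],        # absolute neutrophil count, x10^9/L
    "antibiotic": ["vancomycin", "pip-tazo", "vancomycin"],
})

# Dichotomize continuous variables against reference ranges (the fever
# threshold of >38 C is from the paper; the neutropenia cutoff is assumed).
daily["fever"] = (daily["max_temp_c"] > 38.0).astype(int)
daily["neutropenia"] = (daily["min_anc"] < 0.5).astype(int)

# Map categorical data to binary indicator columns (one per antimicrobial).
features = pd.get_dummies(daily, columns=["antibiotic"], prefix="abx")
```

Each tracked antimicrobial thus becomes its own 0/1 column (eg, `abx_vancomycin`), matching the per-day binary encoding the Methods describe.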

Statistical methods

Univariate analyses were performed using χ2 tests for categorical data and t tests for continuous variables. Data were split randomly into derivation (80%) and validation (20%) data sets by patient (based on medical record number). Values of variables varying over time (eg, vital signs, most lab values, and antibiotics administered) were analyzed on the day of interest. Fixed variables that were considered relevant to an entire course of antibiotics (eg, initial positive urinalysis or cultures, admission to intensive care unit, admitting service, or demographics) were reported by antibiotic course.

For the logistic regression model, we included a final set of 26 predictors based on a priori assumptions and statistical performance (P < .05 in univariate analysis). We used a stepwise logistic model and calculated a C statistic (area under the receiver operating characteristic curve [AUROC]) for comparison to the machine-learning models. Significance levels for removal from and addition to the model were 0.15 and 0.10, respectively.
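A minimal sketch of this modeling step is shown below, using scikit-learn and synthetic data in place of Stata and the EHR extract. Scikit-learn does not provide stepwise selection, so only the fit and C-statistic computation are illustrated; the coefficients and outcome prevalence are invented.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the extract: 26 candidate predictors and a
# rare binary outcome (whether the ASP left a note on that patient-day).
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 26))
logit = 0.8 * X[:, 0] - 0.6 * X[:, 1] - 2.5
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))

# 80/20 derivation/validation split. The study split by medical record
# number so one patient's days stayed in one set; synthetic rows here are
# independent, so a plain random split suffices.
X_dev, X_val, y_dev, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_dev, y_dev)
auc = roc_auc_score(y_val, model.predict_proba(X_val)[:, 1])  # C statistic
```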

For the machine-learning model, all ordinal and categorical predictors were binarized via one-hot encoding (ie, a new binary column was added to represent each possible predictor–value pair). Feature selection was handled internally by an approximate algorithm based on feature distributions. Each model consisted of an ensemble of shallow decision trees trained using 10-fold cross-validation on the derivation set across 160 different parameter combinations. All parameters were hard-coded except the number of trees, which was determined by an early stopping rule: trees were added to each model until the AUROC for predictions on the hold-out sets did not increase for 50 training rounds. The final model (ie, parameter set) was identified by the best average AUROC across all hold-out sets.
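The boosted-tree setup can be sketched as follows. The study used XGBoost 0.72; scikit-learn's gradient boosting is used here as a stand-in so the example stays self-contained, with synthetic data and invented parameters. Depth-1 trees and 50-round early stopping mirror the configuration the Methods and Results describe.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic data standing in for the binarized EHR features.
rng = np.random.default_rng(1)
X = rng.normal(size=(3000, 50))
y = rng.binomial(1, 1 / (1 + np.exp(-(X[:, 0] - X[:, 1]))))

X_dev, X_val, y_dev, y_val = train_test_split(
    X, y, test_size=0.2, random_state=1)

gbt = GradientBoostingClassifier(
    max_depth=1,            # shallow (depth-1) trees, as in the final model
    n_estimators=2000,      # upper bound; early stopping picks the count
    validation_fraction=0.1,
    n_iter_no_change=50,    # stop after 50 rounds without improvement
    random_state=1,
).fit(X_dev, y_dev)

auc = roc_auc_score(y_val, gbt.predict_proba(X_val)[:, 1])
```

After fitting, `gbt.n_estimators_` reports how many trees the early stopping rule actually kept, analogous to the 232 trees of the paper's final model.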

A Brier score,13 the mean squared prediction error, was calculated for each model. The Brier score measures differences between observed events and predicted probabilities; scores range from 0 to 1, with a score of 0 indicating perfect predictive performance.14 Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated via standard methods for the corner point of each model as well as at prespecified thresholds of sensitivity. The number needed to review (NNR), the number of patient days that must be reviewed to identify 1 patient requiring stewardship intervention, was calculated as 1/PPV. The logistic regression and boosted-tree models were compared using the McNemar test15 at the threshold determined by the corner point of the receiver operating characteristic curve on the test set.
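These evaluation metrics are straightforward to compute directly. The sketch below (with made-up labels and predicted probabilities) shows the Brier score and the threshold-based statistics, including NNR = 1/PPV.

```python
import numpy as np

def brier_score(y_true, p_pred):
    """Mean squared error between 0/1 outcomes and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.asarray(p_pred, dtype=float)
    return float(np.mean((p_pred - y_true) ** 2))

def diagnostic_stats(y_true, p_pred, threshold):
    """Sensitivity, specificity, PPV, NPV, and NNR at a decision threshold."""
    y_true = np.asarray(y_true).astype(bool)
    pred = np.asarray(p_pred) >= threshold
    tp = np.sum(pred & y_true)
    fp = np.sum(pred & ~y_true)
    fn = np.sum(~pred & y_true)
    tn = np.sum(~pred & ~y_true)
    ppv = tp / (tp + fp)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": ppv,
        "npv": tn / (tn + fn),
        "nnr": 1 / ppv,  # patient days reviewed per intervention found
    }

# Toy example; in practice these come from the validation set.
y = [1, 0, 0, 1, 0, 0, 0, 1]
p = [0.9, 0.2, 0.4, 0.7, 0.1, 0.3, 0.6, 0.8]
stats = diagnostic_stats(y, p, threshold=0.5)
```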

We used Stata version 15.1 software (StataCorp, College Station, TX) for statistical analyses. For the machine-learning analysis, XGBoost 0.72 was used with Python version 3.6.5 software (Python Software Foundation, Wilmington, DE). Sensitivity, specificity, and positive and negative predictive values were determined using Python, with output from Stata for the logistic regression model.

Results

Between December 15, 2015, and August 1, 2017, 9,651 adult hospitalized patients without infectious diseases consultations received at least 1 "tracked" antimicrobial, for a total of 18,275 antimicrobial courses (mean duration, 3.4 days) over 62,095 patient days. The ASP left recommendations during 684 antimicrobial courses (3.7%). Each patient received an average of 1.9 courses of antibiotics during the study period. By patient days, the most frequently administered antimicrobials in our cohort were vancomycin (43.7%) and piperacillin-tazobactam (20.5%). ASP recommendations were made, on average, on day 4.9 of therapy (SD, ±4.3). Baseline demographics of age and gender were similar between patients who did and did not receive an ASP note (Table 1). By univariate analysis, variables associated with whether an intervention was made included positive cultures, positive urinalysis, admission to the intensive care unit, lack of prior infectious diseases consultation, international normalized ratio (INR), and type of antibiotic (Table 2).

Table 1. Characteristics of the Study Population

Note. ASP, antimicrobial stewardship program; SD, standard deviation; ID, infectious diseases; WBC, white blood cell; ICU, intensive care unit.

Table 2. Predictors Associated with a Recommendation Being Made by the Stewardship Team

Note. OR, odds ratio; CI, confidence interval; GNR, gram-negative rod; ID, infectious diseases.

a Resistant GNR was defined as ESBL or Amp-C–producing gram-negative rod.

b Tachycardia was defined as a heart rate >110 beats per minute.

The 25 predictors (see Appendix 1 and the Supplementary Materials online) that had been selected a priori and identified as significant in the unadjusted analysis were included in a stepwise logistic regression model using the derivation data set. Using the prespecified criteria for addition or removal of predictor variables, 12 variables remained in the final model (Table 2).

All features extracted from the EHR were used to construct the boosted-tree model using an unbiased approach. The final boosted-tree model contained 232 trees, each with a depth of 1.

Both models were then applied to the validation data set, yielding C statistics (AUROCs) of 0.73 (95% CI, 0.69–0.77) and 0.75 (95% CI, 0.72–0.79) for the logistic regression and boosted-tree models, respectively (P = .07) (Fig. 1A and 1B). Decision thresholds were selected at the corner points for each model as well as at prespecified sensitivities of 0.85, 0.90, and 0.95, resulting in the diagnostic performance shown in Table 3. We tuned the models to higher sensitivity thresholds under the assumption that a stewardship program would want to capture a comprehensive list of patients on inappropriate antibiotics. The number needed to review ranged from 33.3 to 45.5 for the logistic regression model and from 27.8 to 50 for the boosted-tree model. For comparison, based on our data, the number needed to review under our current process would be 99, based on recommendations being made in 1.1% of cases.
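Selecting an operating point that achieves a prespecified sensitivity, as described above, can be sketched with scikit-learn's `roc_curve` on toy data; this is illustrative, not the study code.

```python
import numpy as np
from sklearn.metrics import roc_curve

def threshold_for_sensitivity(y_true, p_pred, target_sens):
    """Return the highest threshold whose sensitivity meets the target,
    along with the sensitivity and specificity achieved there."""
    fpr, tpr, thresholds = roc_curve(y_true, p_pred)
    # thresholds are returned in decreasing order, so tpr is nondecreasing;
    # the first index meeting the target is the least-permissive cutoff.
    idx = int(np.argmax(tpr >= target_sens))
    return thresholds[idx], tpr[idx], 1 - fpr[idx]

# Toy data; in practice y_true and p_pred come from the validation set.
y_true = [0, 0, 1, 1]
p_pred = [0.1, 0.4, 0.35, 0.8]
thr, sens, spec = threshold_for_sensitivity(y_true, p_pred, 0.85)
```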

Fig. 1. Area under the receiver operating characteristic curve (AUC) for logistic regression (dotted line) and boosted trees (solid line) for the derivation (left) and validation sets (right).

Table 3. Performance Characteristics of Logistic Regression and Boosted-Tree Modelsa

Note. AUC, area under the curve; PPV, positive predictive value; NPV, negative predictive value; NNR, number needed to review.

a The first line for each model shows the performance characteristics at the corner point with performance characteristics at prespecified sensitivities below.

The types of variables that were included in the logistic regression model compared to the boosted-tree model are shown in Table 4 (for full set of variables used in both models see Appendix 3 in the Supplementary Materials online).

Table 4. Predictors Used in Each Model

Note. GNR, gram-negative rod; ESBL, extended-spectrum beta-lactamase.

a Positive urinalysis defined as >10 white blood cells per high powered field.

b Resistant GNR was defined as ESBL or Amp-C producing gram-negative rod.

c Miscellaneous cultures is defined as sterile or nonsterile site cultures other than blood, urine, respiratory or cerebrospinal fluid culture.

Discussion

In this cohort, both the machine-learning (boosted-tree) and logistic regression models demonstrated modest performance for predicting which patients required stewardship intervention on a given day. The 2 methods exhibited similar performance, with AUROCs favoring the boosted-tree model, although this difference was not statistically significant. Both models had high NPVs and therefore performed well at identifying cases that do not need further manual review, though the paucity of outcomes limits the interpretation of the NPVs. However, both models showed low PPV and a high NNR to identify patients requiring stewardship intervention on a given day. The high NNR may diminish the utility of the models for making the PPRF process more efficient, although this may be mitigated in part by the fact that a larger number of patients can be screened with an automated algorithm.

This study had several limitations that may have compromised the performance of the predictive models. First, outcomes were sparse. Optimal tuning of machine-learning models requires a large number of outcomes for training and validation; although our data set included >60,000 patient days, this is a relatively small data set by machine-learning standards. Additionally, not all patient days in the data set were reviewed by the ASP team, even during the business week, due to limited resources, but they were still included in the model. These omissions may have compromised the ability of the models to discriminate patients requiring stewardship intervention because some cases may not have been intervened upon simply because resources were insufficient to review them. Because our stewardship team schedules dates for follow-up review of patients, a significant subset of patients not reviewed on a given weekday were not reviewed intentionally because the antibiotic course was deemed appropriate for that day a priori. For example, a patient on appropriate definitive therapy would be scheduled for follow-up review on the day therapy is scheduled to be completed to ensure appropriate duration, but the days prior to this would not have been reviewed further. Thus, including the days not reviewed in the model provided additional information about the appropriateness of antibiotics, despite the important drawback that some days lacked review not because of presumed appropriateness but because of limited ASP resources that day. Whether an intervention was made also depends on additional factors, such as the perceived probability that the prescribing team would accept recommendations and the availability of data on which a decision to intervene could be based. Pragmatically speaking, these considerations may be important when choosing whether a stewardship intervention should be made but may not be fully recognized by the model. We must also be careful to interpret the output of the model in light of the circumstances under which it was trained: during the study period, most recommendations were made by a small number of individuals staffing the ASP team (ie, 1–2), which further limits the generalizability of the model.

One limitation of machine learning in general is the "black box" nature of the models. The models provide probabilistic outputs but offer no explanation of how these predictions are made.16 Explainable models that incorporate hypothesis-driven experiments are being developed, with the limitation that they require large data sets (tens of thousands to millions of subjects).17

The advantages of machine learning over other methods include the ability to handle large numbers of variables and samples. In this study, we used a preprocessed data set, which is labor intensive to prepare and limits implementation feasibility. We also simplified the data set to include only the maximum or minimum value per day for each variable, which allowed for logistic regression modeling but may have eliminated additional data that would have been predictive in the machine-learning model. In addition, a well-characterized data set of patients on antibiotics and the interventions made will be critical and may require comprehensive point-prevalence–style review of patients on antibiotics to achieve. This study used a basic boosted-tree machine-learning model. Other models exist, such as learning-to-rank models,18 which model the relative ordering of inputs according to their true labels or scores and may be better suited to the practice of antimicrobial stewardship by allowing prioritization of patients by likelihood of antimicrobial inappropriateness. Future directions include building a model that can use highly granular raw data output from the EHR, which would allow larger sets of data for training as well as real-time updating of the model after implementation; such a model could then be integrated into the EHR to identify high-yield targets for manual review.
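A ranking-style use of model output, as suggested above, could look like the following sketch: instead of applying a hard decision threshold, patient-days are ordered by predicted probability so the stewardship team reviews the highest-yield charts first. The patient-day identifiers and probabilities are hypothetical.

```python
import numpy as np

# Hypothetical model outputs for four patient-days.
patient_days = ["pt1-d3", "pt2-d1", "pt3-d5", "pt4-d2"]
p_intervene = np.array([0.12, 0.61, 0.07, 0.34])

# Sort by descending predicted probability of requiring intervention.
order = np.argsort(-p_intervene)
review_queue = [patient_days[i] for i in order]
# review_queue -> ['pt2-d1', 'pt4-d2', 'pt1-d3', 'pt3-d5']
```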

PPRF has the potential to improve antimicrobial use and to improve outcomes, but optimal implementation remains hampered by the labor intensiveness of manual review. This study serves as an example of an approach to leveraging statistical and machine-learning models to identify patients who may require stewardship intervention and can be targeted for manual review. Our models suggest that machine learning may be able to outperform statistical models when dealing with complex data sets with large numbers of interacting variables. Further research is needed to develop advanced models to predict the need for stewardship intervention in hospitalized patients and to optimize the efficiency of postprescription review with feedback.

Acknowledgments

None.

Financial support

This study was supported by an NIH Research Training Grant (grant no. T32 2T32AI007641) to R.B.

Conflicts of interest

All authors report no conflicts of interest relevant to this article.

Supplementary material

To view supplementary material for this article, please visit https://doi.org/10.1017/ice.2020.213

References

1. Hecker MT, Aron DC, Patel NP, Lehmann MK, Donskey CJ. Unnecessary use of antimicrobials in hospitalized patients: current patterns of misuse with an emphasis on the antianaerobic spectrum of activity. Arch Intern Med 2003;163:972–978.
2. Cosgrove SE, Seo SK, Bolon MK, et al; CDC Prevention Epicenter Program. Evaluation of postprescription review and feedback as a method of promoting rational antimicrobial use: a multicenter intervention. Infect Control Hosp Epidemiol 2012;33:374–380.
3. Barlam TF, Cosgrove SE, Abbo LM, et al. Implementing an antibiotic stewardship program: guidelines by the Infectious Diseases Society of America and the Society for Healthcare Epidemiology of America. Clin Infect Dis 2016;62(10):e51–e77.
4. Honda H, Murakami S, Tagashira Y, et al. Efficacy of a postprescription review of broad-spectrum antimicrobial agents with feedback: a 4-year experience of antimicrobial stewardship at a tertiary care center. Open Forum Infect Dis 2018;5(12):ofy314.
5. Tamma PD, Avdic E, Keenan JF, et al. What is the more effective antibiotic stewardship intervention: preprescription authorization or postprescription review with feedback? Clin Infect Dis 2017;64:537–543.
6. Lesprit P, et al. Postprescription review improves in-hospital antibiotic use: a multicenter randomized controlled trial. Clin Microbiol Infect 2015;21(2):180.e1–e7.
7. Davey P, et al. Interventions to improve antibiotic prescribing practices for hospital inpatients. Cochrane Database Syst Rev 2017;2:CD003543.
8. Doernberg SB, Abbo LM, Burdette SD, et al. Essential resources and strategies for antibiotic stewardship programs in the acute care setting. Clin Infect Dis 2018;67:1168–1174.
9. Catho G, De Kraker M, Waldispul Suter B, et al. Study protocol for a multicentre, cluster randomized, superiority trial evaluating the impact of computerized decision support, audit and feedback on antibiotic use: the COMPuterized Antibiotic Stewardship Study (COMPASS). BMJ Open 2018;8(6):e022666.
10. Deo RC. Machine learning in medicine. Circulation 2015;132:1920–1930.
11. Huang H, Ding N, Yang T, et al. Cross-sectional whole-genome sequencing and epidemiological study of multidrug-resistant Mycobacterium tuberculosis in China. Clin Infect Dis 2018;ciy883.
12. Escobar GJ, Baker JM, Kipnis P, et al. Prediction of recurrent Clostridium difficile infection using comprehensive electronic medical records in an integrated healthcare delivery system. Infect Control Hosp Epidemiol 2017;38:1196–1203.
13. Brier GW. Verification of forecasts expressed in terms of probability. Mon Weather Rev 1950;78:1–3.
14. Cohen ME, Ko CY, Bilimoria KY, et al. Optimizing ACS NSQIP modeling for evaluation of surgical quality and risk: patient risk adjustment, procedure mix adjustment, shrinkage adjustment, and surgical focus. J Am Coll Surg 2013;217:336–346.
15. Dietterich TG. Approximate statistical tests for comparing supervised classification learning algorithms. Neural Comput 1998;10:1895–1923.
16. Watson DS, Krutzinna J, Griffiths CE, McInnes IB, Barnes MR, Floridi L. Clinical applications of machine learning algorithms: beyond the black box. BMJ 2019;364:l886.
17. Vu MT, Adalı T, Ba D, et al. A shared vision for machine learning in neuroscience. J Neurosci 2018;38:1601–1607.
18. Deng J, Yuan Q, Mamitsuka H, Zhu S. DrugE-Rank: predicting drug–target interactions by learning to rank. Methods Mol Biol 2018;1807:195–202.