Multidrug-resistant gram-negative (MDRGN) organisms represent a growing clinical threat. These bacteria can spread rapidly among vulnerable hospitalized populations, and MDRGN infections are associated with significant morbidity and mortality.Reference Livorsi, Chorazy and Schweizer1, Reference McDanel, Schweizer and Crabb2 Timely identification can limit nosocomial transmission and improve patient outcomes by facilitating prompt initiation of appropriate treatment.Reference Micek, Hampton and Kollef3, Reference Zhang, Micek and Kollef4 However, rapid diagnostics that can be readily incorporated into routine laboratory workflows are limited or lacking for many MDRGNs, posing clinical and epidemiological challenges. Extended-spectrum β-lactamase (ESBL)–producing bacteria, which can hydrolyze most β-lactam antibiotics other than carbapenems, are a representative example of these MDRGNs.
Currently, no phenotypic method has been endorsed by the Clinical and Laboratory Standards Institute (CLSI) for ESBL detection.5 Although molecular methods for identifying ESBL genes are commercially available, these assays do not include a comprehensive list of known ESBL genes and would require frequent panel updates to detect emerging ESBLs.Reference Ledeboer, Lopansri and Dhiman6, Reference Ward, Stocker, Begum, Wade, Ebrahimsa and Goldenberg7 Molecular diagnostics can also be resource-intensive and are often not cost-effective for laboratories in regions where ESBL prevalence is low, and they are cost-prohibitive for developing areas of the world where ESBL prevalence is high.
Statistical models for identifying MDRGN infections can provide important information in settings where rapid diagnostics are unavailable or are resource-impractical. One particular approach, generating a logistic regression–derived risk score, is common in the healthcare epidemiology literature. However, classification and regression tree (CART) analysis or “recursive partitioning,” a form of machine learning, is an alternative approach for developing this type of decision support tool. Our group previously developed a CART decision tree for predicting ESBL bloodstream infections.Reference Goodman, Lessler and Cosgrove8 Since publication, there has been interest in whether a risk score derived from the same population could achieve greater predictive accuracy while remaining sufficiently simple to incorporate into practice.
We performed a case study of the development of a risk score from the same ESBL dataset as our original decision tree to compare the predictive accuracy of these 2 methods and to illustrate the advantages and disadvantages of logistic regression risk scores versus CART decision trees. Our objective is to offer epidemiologists and researchers general guiding principles for deciding when to consider one prediction approach versus the other.
Methods
Cohort
The full description of the cohort has been previously reported.Reference Goodman, Lessler and Cosgrove8 Briefly, the study included adults hospitalized at the Johns Hopkins Hospital with bacteremia due to Escherichia coli or Klebsiella spp, from 2008 to 2015. Only the first episode of bacteremia per patient was included. Escherichia coli or Klebsiella spp with ceftriaxone minimum inhibitory concentrations (MICs) ≥2 μg/mL underwent testing for ESBL production. A decrease of ≥3 doubling dilutions in the MIC for a third-generation cephalosporin tested in combination with 4 μg/mL of clavulanic acid, versus its MIC when tested alone, was used to confirm ESBL status.
Patient data were collected via manual chart review from all available inpatient and outpatient medical records from facilities within the Johns Hopkins Health System, as well as from medical records for patients who previously received medical care at institutions in the Epic Care Everywhere Network (www.epic.com/CareEverywhere/). Patient data, collected for the period prior to day 1 of bacteremia (defined as the date the initial blood culture was collected), included the following: (1) demographic data; (2) preexisting medical conditions; (3) presumptive source of bacteremia (eg, catheter, pneumonia); (4) indwelling hardware; (5) multidrug-resistant organism (MDRO) colonization or infection (MDR Pseudomonas aeruginosa, MDR Acinetobacter baumannii, ESBL-producing Enterobacteriaceae, carbapenem-resistant Enterobacteriaceae, vancomycin-resistant Enterococcus species, and methicillin-resistant Staphylococcus aureus)9 in the prior 6 months; (6) days of antibiotic therapy with gram-negative activity in the prior 6 months; (7) length of stay in any healthcare facility in the prior 6 months; (8) post-acute care facility stay in the prior 6 months; and (9) hospitalization in another country in the prior 6 months (assessed by standard nursing intake questionnaire upon Johns Hopkins Hospital admission). International hospitalizations in the following regions were classified as ESBL “high-burden”: Latin America (excluding the Caribbean), the Middle East (including Egypt), South Asia, China, and the Mediterranean.Reference Kantele, Laaveri and Mero10, Reference Ostholm-Balkhed, Tarnberg and Nilsson11
Statistical methods
Descriptive statistics, univariable analyses, and decision tree derivation and validation have been described previously.Reference Goodman, Lessler and Cosgrove8 Briefly, a tree was derived using the following process: (1) identification of the single variable that, when used to split the dataset into 2 groups (“nodes”), best separated ESBL-positive from ESBL-negative patients, according to the Gini impurity criterionReference Duda, Hart and Stork12, Reference Breiman, Friedman, Stone and Olshen13; (2) repetition of this partitioning process in each daughter node and subsequent generations of nodes (“branching”); and (3) termination at “terminal” nodes (“leaves”) when no additional variables in the data sufficiently distinguished patients by their ESBL status. Terminal nodes in binary recursive partitioning trees predict ESBL status categorically, but by evaluating the node impurity (eg, the mixture of ESBL-positive and ESBL-negative patients), they also offer associated probabilities.
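For readers less familiar with recursive partitioning, the following minimal sketch shows how a classification tree of this type could be fit with the rpart package. The data frame (esbl_data), the outcome coding (esbl as a two-level 0/1 factor), and the predictor names are illustrative assumptions and do not reproduce the study code.

```r
# Illustrative sketch (not the study code): fitting a classification tree
# with rpart, which splits nodes using the Gini impurity criterion.
library(rpart)

# Hypothetical data frame: one row per bacteremia episode, with a two-level
# factor outcome 'esbl' (0/1) and candidate predictors (names assumed).
fit <- rpart(
  esbl ~ prior_esbl + intl_hospitalization + abx_weeks + cvc_source + age,
  data    = esbl_data,
  method  = "class",                        # classification tree
  parms   = list(split = "gini"),           # Gini impurity for node splits
  control = rpart.control(cp = 0.01, minbucket = 20)  # stopping controls
)

# Terminal nodes yield categorical predictions; node impurity also yields
# an associated probability of being ESBL positive for each patient.
pred_class <- predict(fit, type = "class")
pred_prob  <- predict(fit, type = "prob")[, "1"]
```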
We internally validated the performance of our tree using the leave-one-out cross-validation method,Reference Duda, Hart and Stork12 in which a single observation is held out and a new model is derived from a dataset containing the remaining n − 1 observations. The resulting model is used to predict the value of the held-out observation. This process is repeated for all observations in the dataset, and performance metrics (eg, error) can be averaged across the n fitted models (in this case, decision trees) to produce a single estimate. We evaluated the discrimination of the original and cross-validated models through the generation of receiver operating characteristic (ROC) curves and calculation of C statistics. Decision tree analyses were performed using the RPART (Recursive Partitioning and Regression Trees) package in R Studio version 4.1–90.99.902 software (R Foundation for Statistical Computing, Vienna, Austria).
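The leave-one-out procedure can be illustrated as follows, again assuming the hypothetical esbl_data data frame described above; the pROC package is used here for the C statistic and is an assumption rather than the study's exact tooling.

```r
# Illustrative leave-one-out cross-validation of a decision tree: each
# observation is held out once, a new tree is fit to the remaining n - 1
# rows, and the held-out observation is scored with that tree.
library(rpart)
library(pROC)   # ROC curves and C statistics (area under the curve)

n <- nrow(esbl_data)
cv_prob <- numeric(n)

for (i in seq_len(n)) {
  fit_i <- rpart(esbl ~ ., data = esbl_data[-i, ], method = "class")
  cv_prob[i] <- predict(fit_i, newdata = esbl_data[i, , drop = FALSE],
                        type = "prob")[, "1"]
}

# Cross-validated C statistic for the held-out predictions
roc_cv <- roc(esbl_data$esbl, cv_prob)
auc(roc_cv)
```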
To develop a risk score, continuous variables (eg, age and antibiotic days) were first converted into ordinal categories to reduce complexity, given the score’s anticipated manual application. A multivariable logistic regression model was derived using stepwise variable selection with backward elimination at an α level of 0.05. To create points, all coefficients in the final model were divided by 0.60, a scaling constant chosen relative to the smallest coefficient (antibiotic therapy, 0.15 per week), and rounded to the nearest integer; the antibiotic-therapy term was instead assigned 0.25 points per week (0.15/0.60), capped at 1 point (≥4 weeks), to simplify end-user calculations. Patient scores were calculated by summing their respective points (risk score model).
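A minimal sketch of this point-conversion step is shown below. R's step() function is used only as a stand-in for the p value–based backward elimination described above (step() selects variables by AIC), and the data frame and variable names (esbl_data, esbl) are assumptions.

```r
# Illustrative sketch of risk score point conversion (not the study code).
# Fit a multivariable logistic regression model on the candidate predictors.
full_model <- glm(esbl ~ ., data = esbl_data, family = binomial)

# Backward elimination; note step() uses AIC, whereas the study used
# p value-based elimination at alpha = 0.05 (shown here only conceptually).
final_model <- step(full_model, direction = "backward")

# Rescale coefficients to points: divide by 0.60 and round to integers.
coefs  <- coef(final_model)[-1]   # drop the intercept
points <- round(coefs / 0.60)
# Exception in the published score: the antibiotic-therapy coefficient
# (0.15 per week) was left unrounded at 0.15/0.60 = 0.25 points per week,
# capped at 1 point (>= 4 weeks of therapy).
points
```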
For both the multivariable regression model and the risk score model, discrimination was assessed with ROC curves and accompanying C statistics (ie, area under the curve). Risk score model calibration was evaluated using Hosmer-Lemeshow (HL) goodness-of-fit tests and graphical plots of observed proportion versus model-predicted ESBL probabilities by decile groups. Discrimination was internally validated with leave-one-out cross-validation. Risk score analyses were performed in Stata version 13.0 software (StataCorp, College Station, TX) and R Studio.
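A sketch of these discrimination and calibration checks is provided below. It assumes a numeric 0/1 outcome vector (esbl) and a vector of summed patient points (score), and it uses the pROC and ResourceSelection packages for convenience; none of these names reflect the study's actual code.

```r
# Illustrative discrimination and calibration checks (assumed inputs:
# 'score' = summed risk score points, 'esbl' = observed 0/1 outcome).
library(pROC)
library(ResourceSelection)   # hoslem.test() for the Hosmer-Lemeshow statistic

# Discrimination: ROC curve and C statistic for the point-based score
roc_score <- roc(esbl, score)
auc(roc_score)

# Calibration: model-predicted probabilities from the score, compared with
# observed proportions, and the Hosmer-Lemeshow goodness-of-fit test.
score_model <- glm(esbl ~ score, family = binomial)
pred <- fitted(score_model)
hoslem.test(esbl, pred, g = 10)

# Simple decile calibration plot (observed vs predicted, by risk decile)
decile <- cut(pred, breaks = quantile(pred, probs = seq(0, 1, 0.1)),
              include.lowest = TRUE)
plot(tapply(pred, decile, mean), tapply(esbl, decile, mean),
     xlab = "Predicted ESBL probability", ylab = "Observed proportion")
abline(0, 1, lty = 2)
```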
Results
Between 2008 and 2015, a total of 1,288 bacteremic patients met inclusion criteria, of whom 194 (15%) were ESBL positive. Patient and microbial characteristics have been reported previously.Reference Goodman, Lessler and Cosgrove8
Risk score
The multivariable model and resulting risk score included 14 variables (Table 1), which were broadly categorizable into 6 groups (Fig. 1):
1. Indwelling hardware on day of culture. Orthopedic hardware (2 points); chronic indwelling vascular hardware (1 point); nephrostomy tube or Foley catheter (2 points); gastrointestinal feeding tube (2 points).
2. Presumptive source of bloodstream infection. Central vascular catheter (2 points); pneumonia (2 points).
3. Patient characteristics. Structural lung disease (chronic obstructive pulmonary disease, emphysema, or tracheostomy dependency) (2 points); self-identification as Asian race (2 points).
4. Healthcare exposure within the previous 6 months. Post-acute care facility (2 points); ≥1 night of international hospitalization in an ESBL high-burden region (5 points).
5. MDRGN colonization or infection within the previous 6 months. ESBL (6 points); carbapenem-resistant Enterobacteriaceae (CRE) (6 points); MDR Pseudomonas spp (−4 points).
6. Antibiotic exposure within the previous 6 months. Weeks of therapy with gram-negative activity (0.25 points per week, up to a maximum of 1 point).
Table 1. Regression Model and Corresponding Points Scoring Systema for Predicting Extended-Spectrum β-Lactamase (ESBL) Status in a Cohort of Adult Patients with Escherichia coli and Klebsiella spp Bacteremia

a To create points, the smallest model coefficient (0.15, per week of antibiotic therapy) was identified. To simplify end-user calculations, antibiotic therapy was scaled to receive 0.25 points per week, up to a maximum of 1 point or ≥4 weeks, by dividing by 0.60 (0.15/0.60 = 0.25). All other coefficients were also divided by 0.60 and rounded to the nearest whole integer. Patient scores were calculated by summing their respective points (risk score model).
b Chronic obstructive pulmonary disease, emphysema, or chronic ventilator dependency.
c Latin America (excluding the Caribbean), the Middle East (including Egypt), South Asia, China, and the Mediterranean.

Fig. 1. A printable clinical risk score for bedside use to predict a bacteremic patient’s likelihood of infection with an extended-spectrum β-lactamase (ESBL)–producing organism at the time of organism genus and species identification. Risk-factor points are noted in parentheses and summed among the 14 variables to produce a patient’s risk score. Possible score cutoffs for ESBL-positive bacteremia, and associated sensitivities and specificities, are reflected in Table 2. aChronic obstructive pulmonary disease, emphysema, or chronic ventilator dependency. bLatin America (excluding the Caribbean), the Middle East (including Egypt), South Asia, China, and the Mediterranean. *This statement reflects the positive predictive value of the score at a cutoff point of 7.25 and should be modified by the facility to account for local prevalence of ESBL bacteremia. Note. MDRGN, multidrug-resistant gram-negative organism; CRE, carbapenem-resistant Enterobacteriaceae. Drug-resistant organisms were defined in accordance with the Centers for Disease Control and Prevention guidelines.9
Patient scores ranged from −3 to 18.75, with a median score of 2 points (interquartile range: 0–3.25). The C statistic for the clinical risk score was 0.87 (0.89 following cross-validation). The C statistic for the multivariable logistic regression model was also 0.87 (Fig. 2). The multivariable logistic regression model provided evidence of acceptable calibration (HL goodness-of-fit test P = .13). Following point conversion, however, the risk score model over- or underestimated the probability of ESBL infection at different points along the risk continuum, with the exception of very high-risk deciles (HL goodness-of-fit test P < .001) (Fig. 2). An ESBL-positive cutoff point of ≥7.25 maximized overall ESBL classification accuracy (92%). At this cutoff point, the risk score had a sensitivity of 49.5% and a specificity of 99.5%, and its positive and negative predictive values were 94.6% and 91.8%, respectively. Table 2 provides the risk score’s sensitivity and specificity at each possible ESBL-positive cutoff point.

Fig. 2. Discrimination and calibration metrics for the multivariable logistic regression model and resulting risk score model. (A) Receiver operating characteristic (ROC) curve for the logistic regression model, prior to risk score transformation. The area under the curve (AUC) was 0.87 which, after rounding, was unchanged following conversion to a point-based risk-score model. See Table 2 for exact sensitivity and specificity values at different score cutoff points. (B) Calibration plot of observed proportion versus ESBL probabilities predicted by the risk score model, by decile groups.
Table 2. Risk Score Sensitivity, Specificity, and Overall Classification Accuracy at Select Cutoff Points for Predicting Extended-Spectrum β-Lactamase (ESBL) Status in a Cohort of Adult Patients with Escherichia coli and Klebsiella Species Bacteremiaa

Note. CI, confidence interval.
a Cutoff points <0 and ≥9.5 were excluded because, respectively, they yielded equal sensitivity (100%) but inferior specificity, or inferior sensitivity but equal specificity (100%). Dark gray shading indicates the cutoff point that maximized overall classification accuracy (≥7.25 points).
Decision tree
The final decision treeReference Goodman, Lessler and Cosgrove8 included 5 predictors: central vascular catheter, age ≥43 years, and, in the prior 6 months, a history of ESBL colonization or infection, ≥1 night of hospitalization in an ESBL high-burden region, and/or ≥1 week of gram-negative active antibiotic therapy (Fig. 3). The C statistic of the decision tree was 0.77 (unchanged in cross-validation); the sensitivity and specificity were 51.0% and 99.1%, and the positive and negative predictive values were 90.8% and 91.9%, respectively. Table 3 presents a comparison of the performance metrics of the risk score versus the decision tree.

Fig. 3. A clinical decision tree to predict a bacteremic patient’s likelihood of infection with an extended-spectrum β-lactamase (ESBL)–producing organism at the time of organism genus and species identification, adapted from Goodman et al (2016).Reference Goodman, Lessler and Cosgrove8 Gray-shaded terminal nodes indicate that the tree would classify patients as ESBL positive, and accompanying percentages (derived from terminal-node impurities) reflect the probability that patients assigned to a given terminal node are ESBL-positive. Terminal node numbering (1–6) is included in parentheses. *Latin America (excluding the Caribbean), the Middle East (including Egypt), South Asia, China, and the Mediterranean.
Table 3. Comparative Performance Metrics of a Logistic Regression-Derived Clinical Risk Score and a Machine Learning-Derived Decision Tree to Predict Extended-Spectrum β-Lactamase (ESBL) Status

a Risk score values vary depending upon the selected cutoff point for dichotomization. Values reflected for the risk score are for the cutoff point of ≥7.25 points, which optimized overall classification accuracy.
Discussion
Despite advances in rapid diagnostics, timely identification of MDRGNs remains a clinical and epidemiological challenge. Diagnostic delays can prolong the period of ineffective antibiotic therapy and can increase the risk of nosocomial transmissions.Reference Micek, Hampton and Kollef3, Reference Zhang, Micek and Kollef4 Statistical models for predicting drug resistance can play an important role in settings where rapid diagnostic tests are unavailable or are resource-impractical. This case study of ESBL bloodstream infections explores 2 approaches for developing predictive models: traditional logistic regression-derived risk scores and machine learning-derived decision trees.
The risk score included 14 independent predictors, broadly classifiable into 6 categories: indwelling hardware, bloodstream infection source, patient characteristics, recent gram-negative antibiotic exposure, healthcare exposure, and MDRO history. Many of these variables (eg, antibiotic use, prior ESBL colonization or infection) were retained in the decision tree. They are also consistent with other studies examining risk factors for MDRGN bloodstream infectionsReference Tseng, Chen and Yang14 and recent scores for identifying community- and hospital-onset ESBL or third-generation cephalosporin-resistant bacteremia in other populations.Reference Rottier, van Werkhoven and Bamberg15, Reference Augustine, Testerman and Justo16 Taken together with the risk score’s similar C statistic following cross-validation (0.89), this evidence suggests that despite the inclusion of a large number of variables, the risk score was not overfit.
Given that risk scores for binary predictions are dichotomized at a cutoff point, in practice the risk score and the decision tree performed similarly: sensitivities of 49.5% and 51.0% and specificities of 99.5% and 99.1%, respectively. However, the risk score’s area under the curve was approximately 10 percentage points higher than that of the decision tree (C statistics: 0.87 vs 0.77). This higher AUC offers users more latitude to prioritize sensitivity over specificity, or vice versa, by changing the cutoff point (as discussed in more detail below). In theory, a decision tree could also be developed to optimize a different balance of sensitivity and specificity, but this would require deriving an entirely new tree. The risk score’s greater flexibility, however, came at the cost of reduced user-friendliness for manual application. Studies consistently demonstrate that incorporating decision support tools at the point of care is important to their success,Reference Kawamoto, Houlihan, Balas and Lobach17 but manual tabulation of 14 variables would encounter significant bedside utilization barriers. In contrast, decision-tree branching logic does not require end-user calculations and, at least in this ESBL case study, the final decision tree included far fewer (ie, 5) predictors.
The potential tradeoff between flexibility and user friendliness is an important consideration when evaluating whether risk scores or decision trees are a more suitable decision support tool for a given application. Additional considerations, however, may also help to guide researchers in selecting one option versus the other. Below, we summarize the relative strengths of risk scores and decision trees for model development and fitting, implementation, and adaptability. Of note, the CART analysis is the tree-fitting process (approach), and a decision tree is the result (output), just as logistic regression is a common (but by no means the only or necessarily even the preferred) approach for developing a risk score. Approach and output can differ in their strengths and limitations, and we distinguish these concepts in our discussion.
Methodological differences between logistic regression and CART influence the data assumptions and exploratory analyses required for model development and fitting. In general, the more complex or challenging the underlying data, the more utility a machine learning approach can provide. Specifically, logistic regression imposes important data requirements, including minimal collinearity (ie, correlation) among independent variables and a sufficient ratio of cases to predictors (ie, sufficient sample size; a general, although debatable, guideline is 10 expected cases per predictor evaluated).Reference Peduzzi, Concato, Kemper, Holford and Feinstein18, Reference Vittinghoff and McCulloch19 In contrast, CART is nonparametric and makes fewer data assumptions,Reference Breiman, Friedman, Stone and Olshen13 and it can accommodate collinear independent variables. It is also less sensitive to outliers and more robust to high-dimensional data, which possess many independent variables relative to outcomes. These features are appealing in MDRGN research, given the abundance of predictors in patient medical records but the relative rarity of clinical outcomes. Moreover, logistic regression requires a priori specification and evaluation of variable interactions, whereas CART identifies interactions without user input,Reference Breiman, Friedman, Stone and Olshen13 a potentially helpful feature when the understanding of variable relationships is generally limited.
The benefits of CART, however, can come with a steep learning curve for researchers without prior experience with these methods. In particular, decision trees are prone to overfitting, in which they fit the data “too well” (including its idiosyncrasies and noise) and may consequently perform poorly on new data.Reference Dietterich20 Sufficient expertise in pruning and/or stopping criteria during the tree-branching process is therefore critical to the utility and generalizability of the resulting tree, as is the use of internal validation methods (eg, cross-validation) when external testing datasets are unavailable. Although ensemble tree methods such as random forests analysis can address many of these challenges, these methods do not produce a single decision tree that can be used as a decision support tool (without automation).Reference Chen and Ishwaran21, Reference Strobl, Malley and Tutz22
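As an illustration of one common guard against overfitting, the sketch below grows a deliberately large tree, inspects rpart's internal cross-validated error at each candidate complexity parameter (cp), and prunes back to the cp value with the lowest cross-validated error. The setup (esbl_data, esbl) is assumed, as above.

```r
# Illustrative pruning step (not the study code): grow a full tree, then
# prune it back using rpart's built-in cross-validation of the complexity
# parameter (cp) to limit overfitting.
library(rpart)

fit_full <- rpart(esbl ~ ., data = esbl_data, method = "class",
                  control = rpart.control(cp = 0, xval = 10))

printcp(fit_full)   # cross-validated error ("xerror") at each candidate cp

# Prune to the cp value with the lowest cross-validated error
best_cp    <- fit_full$cptable[which.min(fit_full$cptable[, "xerror"]), "CP"]
fit_pruned <- prune(fit_full, cp = best_cp)
```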
Decision tree branching logic does not require calculations, and decision trees are generally intuitive and user-friendly. When manual bedside use is anticipated, these features are especially beneficial. As facilities incorporate automated decision support tools and algorithms into electronic health records (EHRs), these benefits attenuate. In this ESBL case study, because important variables required clinical judgment (eg, source of infection) or were not hard-coded in the EHR (eg, foreign country of recent hospitalization was only entered as natural language), automating the decision support tool would have been challenging. As a result, the decision tree’s simplicity for manual bedside use was highly valuable for this research application.
Finally, for applications in which decision support tool flexibility is paramount, risk scores are attractive because their cutoff points are modifiable by end users. Risk scores provide a range of score cutoffs, each with an associated sensitivity and specificity, which allow individual users to toggle the cutoff point to minimize the false-positive or false-negative rate (eg, depending upon infection severity or the clinical appearance of the patient). Using the current risk score, for example, a user seeking to increase sensitivity could choose a lower cutoff point of ≥3 points and reduce the risk of incorrectly classifying an ESBL infection as ESBL negative to <1 in 5 (sensitivity 83.5%, specificity 73.1%) (Table 2). This flexibility allows clinicians and hospital epidemiologists to maximize detection of cases (ie, ESBL-positive patients), though at the cost of attendant reductions in specificity and overall classification accuracy.
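As a concrete illustration of how such cutoff-specific operating characteristics can be tabulated, sensitivity, specificity, and overall accuracy can be computed at each candidate cutoff along the following lines (a sketch assuming numeric vectors score and esbl, not the study code).

```r
# Illustrative tabulation of operating characteristics at each possible
# cutoff point (assumed inputs: 'score' = patient points, 'esbl' = 0/1).
cutoffs <- sort(unique(score))

metrics <- t(sapply(cutoffs, function(k) {
  predicted_pos <- score >= k                       # classify "ESBL positive"
  c(cutoff      = k,
    sensitivity = mean(predicted_pos[esbl == 1]),   # true positives / cases
    specificity = mean(!predicted_pos[esbl == 0]),  # true negatives / non-cases
    accuracy    = mean(predicted_pos == (esbl == 1)))
}))

# Cutoff that maximizes overall classification accuracy
metrics[which.max(metrics[, "accuracy"]), ]
```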
We caution, however, that although enhanced flexibility is generally beneficial, a risk score’s utility depends upon users understanding the score and the implications of adjusting the cutoff point. Large score differences between patients may translate to minimal differences in risk, and vice versa. Moreover, cutoff-point positive and negative predictive values (ie, the probability that a patient does or does not have an ESBL-producing infection given a score that is respectively above or below the selected cutoff point) will vary by ESBL prevalence in the target population. Decisions about score thresholds for ESBL infection should therefore be guided by the table of cutoff-point sensitivities and specificities and by an understanding that the institution’s local disease prevalence will affect the positive and negative predictive values.
In contrast to risk scores, classification trees provide binary predictions (eg, “ESBL” or “not ESBL”), with a single sensitivity and specificity value for the tree as a whole. Terminal node percentages (eg, “37% probability of being ESBL positive”) can quantify these predictions but do not provide a formal mechanism for prioritizing sensitivity versus specificity. For research applications in which sensitivity is the priority, methods are available to impose a greater “cost” for case misclassification during the tree-fitting process.Reference Drummond and Holte23 The limitation, however, is that these mechanisms are not adjustable by end users after a tree is built. In other words, whereas the CART approach provides flexibility to optimize sensitivity or specificity, once a single, final tree (output) is developed and provided to clinicians, the ability to adjust sensitivity and specificity is limited.
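In rpart, for example, such a misclassification cost can be imposed through a loss matrix supplied at fitting time, as sketched below; the 5:1 cost ratio and the data setup are illustrative assumptions rather than values used in this study.

```r
# Illustrative cost-sensitive tree fitting (not the study code): a loss
# matrix imposes a greater penalty on missed ESBL cases during fitting.
library(rpart)

fit_weighted <- rpart(
  esbl ~ ., data = esbl_data, method = "class",
  parms = list(
    # Rows = true class (non-ESBL, ESBL); columns = predicted class.
    # Here a false negative (true ESBL predicted as non-ESBL) costs 5 times
    # a false positive; the 5:1 ratio is purely illustrative.
    loss = matrix(c(0, 1,
                    5, 0), nrow = 2, byrow = TRUE)
  )
)
```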
Although these considerations can help researchers to evaluate whether a risk score or a decision tree is preferable for a given research question (Table 4), a decision is rarely clear cut. In cases in which each model would at least partially meet stated goals, we encourage investigators to develop both support tools in parallel to compare their performance metrics. In particular, although model performance was comparable in this case study, other applications with more challenging data (eg, high-dimensionality, higher-order variable interactions) might more clearly favor a machine learning approach such as CART.
Table 4. Comparative Strengths and Limitations of Logistic Regression-Derived Risk Scores and Classification and Regression Tree (CART) Analysis-Derived Decision Trees for Predicting Drug-Resistant Infections in Clinical Settings

Our study has several limitations. It was conducted in a single center and, although we internally validated our models, lacked an external validation cohort. In addition, data may have been missing for patients treated outside of the Epic Care Everywhere network, although we do not expect such occurrences to have differed by ESBL status. As such, any resulting exposure misclassification would likely reduce predictive performance, and yet risk score discrimination remained robust, including in cross-validation. Nevertheless, we encourage others to evaluate and validate the risk score in their own patient populations, particularly in settings that differ from our academic, tertiary-care hospital cohort. Importantly, however, because study characteristics were constant across analyses, we expect decision tree and risk score comparisons to be unbiased. Finally, this case study was intended to offer a practical, high-level introduction to a relatively simple machine learning approach, but we note that many machine learning methodologies (eg, random forests, Super Learner) offer potential healthcare epidemiology utility. We refer interested readers to additional resources that address these approaches and the underlying algorithms in greater technical detail.Reference Strobl, Malley and Tutz22, Reference Tibshirani24, Reference Song and Lu26
Overall, timely identification of MDRGN infections remains a clinical and epidemiological challenge. Rapid detection enables isolation of infected patients and prompt initiation of appropriate antibiotic treatment. Statistical models for predicting drug resistance can provide important information in settings where laboratory diagnostics are challenging to implement. This case study explored 2 alternative decision support tools, logistic regression–derived risk scores and machine learning–derived decision trees, for predicting ESBL infection in an inpatient cohort of bacteremic patients. These methodologies offer different strengths and limitations, and we hope that their continued utilization in infectious disease research will assist with improving patient outcomes.
Author ORCIDs
Katherine E. Goodman, 0000-0003-2851-775X
Acknowledgments
Financial support
This work was supported by funding from the Agency for Healthcare Research and Quality (grant no. R36HS025089 to K.E.G.), from the Sherrilyn and Ken Fisher Center for Environmental Infectious Diseases (grants to K.E.G. and A.M.M.), and from the National Institutes of Health (grant no. K23-AI127935 to P.D.T.).
Conflicts of interest
None of the authors report any conflicts of interest.