Introduction
Surge capacity, or the “ability to manage a sudden, unexpected increase in patient volume (ie, numbers of patients) that would otherwise severely challenge or exceed the current capacity of the health care system,” is a fundamental necessity for hospital management of disasters such as mass-casualty incidents. Reference Hick, Hanfling and Burstein 1 This is particularly critical for the hospital emergency department (ED), which is usually the area of the hospital affected most during the initial phase of the disaster response. 2 , Reference Bayram, Sauer and Catlett 3
Quantification of surge capacity is difficult, however, as “there is no defined criterion standard metric that is consistently used across studies to determine when an ED is in a state of diminished or overwhelmed surge capacity.” Reference Handler, Gillam, Kirsch and Feied 4 In a recent Academic Emergency Medicine (Society for Academic Emergency Medicine, Des Plaines, Illinois USA) consensus conference, “The Science of Surge Capacity,” the authors noted: “…quantifying surge capacity is a complex task. The existing literature in disaster research is largely anecdotal, observational, and retrospective in nature.” Reference Kaji, Koenig and Bey 5 The authors also noted that no metrics for measurement of surge capacity are universally accepted. Reference Kaji, Koenig and Bey 5 Metrics used for measurement of ED operations have been published; however, recommended markers such as “percentage of ED stretchers per day occupied by inpatients” and “percentage of time the ED is above stated capacity” may not be relevant for measurement of surge capacity in the initial few hours of disaster management. Reference Schull, Guttmann and Leaver 6 Nevertheless, if hospitals are to strive to increase department surge capacity, some form of objective benchmarking is mandatory. Measurement of patient length-of-stay (LOS) may be a useful metric. Reference Bogucki 7
In addition to the lack of objective metrics to measure surge capacity, there is no standardized statistical method to evaluate these data. Process-control tools in statistics have been used for decades to investigate a wide variety of manufacturing processes. Reference Ott, Schilling and Neubauer 8 In general, these are simple tools (usually graphs and control charts) that are useful for identifying processes that are inconsistent or ineffective (lacking statistical control). Although widely used to investigate manufacturing and other business processes, they are not yet commonly used to evaluate the process of ED disaster management.
Direct measurement of an ED's surge capacity is difficult, as true disasters are rare events for which the department must prepare, often without having experienced such an event in the past. Since actual surge capacity during disaster situations can rarely be measured directly, simulation may offer a viable alternative. The authors of this research have previously studied computer simulation for use in ED disaster plan evaluation and for teaching of medical students and residents. Reference Franc-Law, Bullard and Della Corte 9 , Reference Franc-Law, Ingrassia, Ragazzoni and Della Corte 10 However, the authors are aware of no current studies that specifically detail simulation benchmarks for measurement of disaster surge capacity in the ED, or their evaluation using methods of statistical process control.
The objective of the current study was to develop a statistical method for the derivation of surge capacity metrics using a combination of computer simulation and simple process-control statistical tools. The use of this method was then demonstrated on a subsequent computer simulation of an ED response to a mass-casualty incident.
Methods
A MySQL (Oracle, Redwood Shores, California USA) database of simulated patients was assembled by a convenience chart review of ED patients from the University of Alberta Hospital in Edmonton, Alberta, Canada. Although detailed methods of performing chart review for Emergency Medicine have been published (and are needed when statistical inference is made based on the results of the chart review), this study did not use rigorous methods as no such inference was performed; patient histories were intended only as plausible patient scenarios for use in the simulations. Reference Gilbert, Lowenstein, Koziol-McLain, Barta and Steiner 11 Details including history, exam, past medical history, laboratory results, and imaging results were assembled by a single reviewer. No identifying patient data were retained. Histories were modified to place the patients into one of the disaster scenarios. Triage codes actually assigned at ED presentation by the computer-assisted Canadian triage and acuity scale (CTAS) were also documented, as this method of computerized triage assignment has previously been shown to be reliable. Reference Dong, Bullard and Meurer 12 , Reference Dong, Bullard and Meurer 13 Triage codes were also translated to the Simple Triage and Rapid Treatment (START) algorithm, as this method is used more frequently worldwide. 14 This translation to START was performed electronically with a simple database query that used the fields for vital signs to determine the START code according to published START guidelines. 14 The database also contains a selection of nondisaster patients to replicate baseline ED flow. Patient data were initially obtained in English and were also translated into Italian.
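The translation from vital signs to a START code can be illustrated with a short sketch. The study performed this step as a database query that is not reproduced in the text, so the Python function below is only an assumed equivalent: the parameter names are hypothetical, and the thresholds (respiratory rate above 30 per minute, capillary refill longer than 2 seconds or absent radial pulse, inability to obey commands) follow the commonly published adult START scheme rather than the authors' actual query.

```python
from typing import Optional


def start_category(can_walk: bool,
                   breathing: bool,
                   resp_rate: Optional[int],
                   radial_pulse: bool,
                   cap_refill_s: Optional[float],
                   obeys_commands: bool) -> str:
    """Return a START colour from basic findings (illustrative sketch only).

    `breathing` is assumed to mean "breathing after airway repositioning".
    All parameter names are hypothetical, not fields of the study database.
    """
    if can_walk:
        return "green"            # walking wounded
    if not breathing:
        return "black"            # not breathing despite airway repositioning
    if resp_rate is not None and resp_rate > 30:
        return "red"              # respiratory compromise
    if not radial_pulse or (cap_refill_s is not None and cap_refill_s > 2):
        return "red"              # inadequate perfusion
    if not obeys_commands:
        return "red"              # altered mental status
    return "yellow"               # delayed


# Example: non-ambulatory patient with normal respiration, perfusion, and mentation
print(start_category(False, True, 20, True, 1.5, True))
```

The checks are ordered as in the usual START flow (ambulation, respiration, perfusion, mental status), so the first abnormal finding determines the category.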
The simulated patient information was expanded further to a highly-complex, multi-dimensional database to account for changes in patient characteristics as a result of interventions and passage of time. In high-fidelity simulations, patients must show a response to many simulation variables. However, in real life, patients follow only one clinical course. Thus, some re-engineering of the patients’ actual course was needed to create this multi-dimensional database. For instance, a patient who in real life had a pneumothorax that was quickly identified and treated followed one course. However, the simulation must also account for the possibility that the pneumothorax is not identified, or that the wrong treatment is given, and the patient's history must be re-engineered based on educated estimations to account for these factors. Thus, in the end, although the patients in the database were loosely based on actual patient data, a generous amount of estimation was required to translate the patient histories into high-fidelity simulated patients.
A PHP computer program (PHP: Hypertext Preprocessor, PHP Group/Rasmus Lerdorf, Canada) was developed to allow creation of customized disaster patient datasets from the full dataset by specifying a number of initial parameters, including number of patients, disaster scenario, length of simulation, DeBoer acuity index, baseline ED patient flow, and delay to first patients. Reference de Boer and Debacker 15 A custom dataset was created using this software program to provide patients for ED simulation using the simulation software SurgeSim (MedStatStudio, Edmonton, Alberta, Canada). The software is a web-based simulation written in HTML (Hypertext Mark-up Language, World Wide Web Consortium W3C/MIT, Cambridge, Massachusetts USA) and PHP. The software is served by an Apache (Apache Software Foundation, Delaware USA) web server on a laptop computer using either the SUSE Linux (Novell, Provo, Utah USA) or the Mac OS X (Apple Inc, Cupertino, California USA) operating system. The simulation software, which resembles ED tracking software, is available in both English and Italian (Figure 1). The simulation software is highly customizable, including parameters such as ED layout, hospital resources, radiology resources, and delays for specific procedures. Because the simulation has been designed to be performed in real time, delays for various procedures and laboratory tests were estimated by observations at the University of Alberta Hospital (upon which the layout of the simulated ED was based). Patients in the simulation software develop over time and are responsive to participant actions; each simulated patient has essentially an unlimited number of potential outcomes. As a very simple example, a patient with a pneumothorax on arrival who has a chest tube placed will show improvement in vital signs and will then have an X-ray showing the chest tube placement and resolution of the pneumothorax. Conversely, the same patient may proceed to respiratory arrest or death if untreated over a certain period of time. During the simulation, thousands of data points are saved into the MySQL database, including markers of patient flow, resource use, procedures performed, and bed occupancy.
The scenario used for this study was a simulated airplane crash in which simulated patients (19 red, 36 yellow, and 130 green by START criteria, plus 31 nondisaster patients) arrived at a simulated urban hospital. Simulation sessions were performed using a standard web browser on participant laptop computers that connected to an ad-hoc wireless network. Typical simulation sessions began with a short lesson on command-and-control, followed by a tutorial session lasting approximately 45 minutes in which participants were given a brief lecture on the software use and then given ample time to practice with a sample set of patients. During the disaster simulation, participants worked as a team to manage the simulated disaster, each participant at their own laptop computer. Typically, teams included 10 to 20 participants who usually set their laptops on tables in a conference room, allowing for ease of face-to-face communication. Although each simulation group was given access to the printed disaster plan for the simulated hospital, actual disaster management was left entirely to the discretion of the participants, who developed their own command-and-control structure and overall approach to the incident. Trained moderators at each session simulated consultant physicians, administration, and hospital support staff using a preconstructed script, which was identical for all simulations. The exercise management staff was able to customize many parameters of the hospital disaster response during the simulation. For instance, participants could request manoeuvres such as increasing the number of beds in each ED room or transferring inpatients between wards.
Since there are no universally accepted metrics for surge capacity, a set of eight markers was defined a priori (before any of the derivation set simulations were performed). The markers were chosen because they represent clearly defined milestones in patient flow and should be easily obtainable in both simulation and real-life environments. These included four LOS markers and four patient-volume (PV) markers. Triage accuracy was also measured. The four LOS markers were time from patient arrival to: (1) triage; (2) room assignment; (3) medical doctor (MD) assignment; and (4) disposition. The four PV markers were the total number of patients during the simulation to be: (1) triaged; (2) assigned to a room; (3) assigned to an MD; and (4) disposed. In addition, triage accuracy was determined by comparing the triage codes assigned by participants with the CTAS or START values from the database. Although the simulations result in thousands of data points stored in the MySQL database, only the above-mentioned eight markers of surge capacity were evaluated to maintain a reasonable experiment-wide error rate.
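As an illustration of how the eight markers and triage accuracy could be computed for a single simulation run, the following Python sketch works from per-patient milestone times. It is not the authors' R code; the record layout, the field names, and the three example patients are hypothetical, and times are assumed to be minutes from the start of the simulation, with `None` marking a milestone the patient never reached.

```python
from statistics import median

# Hypothetical per-patient milestone times (minutes from simulation start);
# None means the patient had not reached that milestone when the run ended.
patients = [
    {"arrival": 0, "triage": 2, "room": 10, "md": 15, "disposition": 60},
    {"arrival": 5, "triage": 9, "room": 25, "md": 40, "disposition": None},
    {"arrival": 8, "triage": 14, "room": None, "md": None, "disposition": None},
]
MILESTONES = ("triage", "room", "md", "disposition")


def los_and_pv(records):
    """Return (median LOS per milestone, patient volume per milestone).

    The LOS median here is taken over patients who reached the milestone;
    patients still waiting at the end of the run are counted only in PV.
    """
    los, pv = {}, {}
    for m in MILESTONES:
        times = [r[m] - r["arrival"] for r in records if r[m] is not None]
        pv[m] = len(times)
        los[m] = median(times) if times else None
    return los, pv


def triage_accuracy(assigned, reference):
    """Fraction of participant-assigned codes that match the database codes."""
    matches = sum(a == b for a, b in zip(assigned, reference))
    return matches / len(reference)


los, pv = los_and_pv(patients)
print(los, pv)
print(triage_accuracy(["red", "green", "yellow", "green"],
                      ["red", "green", "green", "green"]))
```

Keeping LOS and PV in the same pass makes explicit that the two families of markers are derived from the same four milestones.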
Simulation data were analysed using a customized function (Radmac) written by the principal author (JMF) in the R statistical software language (R Development Core Team, Vienna, Austria), which directly queries the MySQL database of the simulation software. A useful feature of the statistical software is that it is cumulative; markers are adjusted with each simulation run and the software essentially “learns” what the benchmarks should be. The R code is freely available from the authors in an attempt to hold to the highest standards of reproducibility as suggested by Peng. 16 , Reference Peng 17
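The cumulative behaviour described above can be pictured with a small sketch. The internals of the Radmac function are not described in the text, so the Python class below is only an assumed illustration of the idea: it stores one PV value per simulation run (for example, the number of patients triaged) and recomputes the benchmark median whenever a run is added. The class name, method names, and numbers are hypothetical.

```python
from statistics import median


class CumulativeBenchmark:
    """Keeps every run's value and recomputes the benchmark on demand.

    Illustrative only; this is not the authors' R implementation.
    """

    def __init__(self):
        self._values = []

    def add_run(self, value):
        """Record the marker value from one completed simulation run."""
        self._values.append(value)

    @property
    def runs(self):
        return len(self._values)

    def median(self):
        return median(self._values)


# Hypothetical numbers of patients triaged in successive runs
bench = CumulativeBenchmark()
for triaged in (100, 120, 90):
    bench.add_run(triaged)
print(bench.runs, bench.median())

bench.add_run(130)          # a new run shifts the benchmark
print(bench.runs, bench.median())
```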
In the derivation phase, benchmarks for the eight chosen metrics were obtained. Throughout the study, median was used as the measure of central tendency in preference to mean. As in actual disaster response, occasional outliers are produced when a small number of patients are inadvertently treated after delays far longer than desirable. The median is much less sensitive to these outliers and gives a more reasonable estimate of the expected response. Simulation also introduces an additional complication: because the simulations were artificially stopped (censored) after a certain amount of time, it is impossible to know what would have happened to patients left in the ED. The median is largely insensitive to this factor, as it is unchanged so long as more than half of the patients reach the milestone before the simulation ends. Additionally, it is often easier to explain median to nonmathematicians by simple statements such as “one-half of patients were triaged in less than this time.” Likewise, interquartile range (IQR) was used as the measure of dispersion, again to avoid excessive influence by outliers. For the LOS markers, the third quartile (3Q) time was also obtained, as this is a useful marker of process control; time markers longer than 3Q warrant investigation. For the same reason, first quartile (1Q) was calculated for the four PV-based markers.
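The preference for median and quartiles over mean can be shown with a brief numerical sketch. The Python example below uses invented times (in minutes) and the standard-library `statistics` module; the quartile interpolation method (`"inclusive"`) is an assumption, since the text does not state which quartile definition was used, and different definitions give slightly different values. Censored patients are represented here as infinite times, which is one possible convention and not necessarily the one used in the study.

```python
import math
from statistics import mean, median, quantiles

# Hypothetical arrival-to-triage times with one extreme outlier (minutes)
times = [4, 5, 6, 7, 8, 9, 10, 95]

q1, q2, q3 = quantiles(times, n=4, method="inclusive")
iqr = q3 - q1

print("mean  :", mean(times))      # pulled upward by the single outlier
print("median:", median(times))    # barely affected by the outlier
print("1Q, 3Q, IQR:", q1, q3, iqr)

# Censoring: patients still waiting when the run was stopped are treated as
# having an unknown, arbitrarily long time (math.inf is an assumed convention).
censored = [4, 5, 6, 7, 8, math.inf, math.inf]
print("median with censored patients:", median(censored))
```

With these numbers the single 95-minute outlier more than doubles the mean relative to the median, and the median of the censored sample remains finite because more than half of the patients reached the milestone.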
In the application phase, the values of the eight metrics were applied to the experimental group (Group A). Thirteen students of the European Master in Disaster Medicine (EMDM) program were randomly chosen from the class of 26 students. As the EMDM is a second-level master's degree, the level of education was high among the members; most had completed a previous medical residency. The group performed the simulation using the same simulation software and scenario.
Process-control tools were used to compare the application phase simulation to the derived benchmarks in an easily interpretable, graphical manner. To assess LOS, simulation results for median time from arrival to each marker (triage, bed assignment, MD assessment, and disposition) were plotted against the median and 3Q from the derivation set. Conversely, PV for each of the same four markers was plotted against the median and 1Q from those simulations in the derivation set. Again, all calculations were performed using a customized and reusable R function. Triage accuracy of the experimental group was also obtained by comparison with the established database triage codes.
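The comparison against the derivation benchmarks amounts to a simple rule for each marker, sketched below in Python. This is an illustration under stated assumptions rather than the authors' R function: the function names, the category labels, and the example numbers are hypothetical. For LOS markers the 3Q acts as the control limit (longer is worse); for PV markers the 1Q acts as the control limit (fewer is worse).

```python
def classify_los(group_median, bench_median, bench_q3):
    """Compare a group's median LOS with the derivation median and 3Q."""
    if group_median > bench_q3:
        return "beyond control limit"      # warrants investigation
    if group_median > bench_median:
        return "slower than median"
    return "at or faster than median"


def classify_pv(group_count, bench_median, bench_q1):
    """Compare a group's patient volume with the derivation median and 1Q."""
    if group_count < bench_q1:
        return "beyond control limit"      # warrants investigation
    if group_count < bench_median:
        return "below median"
    return "at or above median"


# Hypothetical benchmark values: (median, 3Q) for LOS in minutes
los_bench = {"triage": (6, 10), "disposition": (90, 120)}
group_los = {"triage": 4, "disposition": 135}

for marker, (med, q3) in los_bench.items():
    print(marker, "->", classify_los(group_los[marker], med, q3))

# Hypothetical PV benchmark: median 40 and 1Q 30 patients reaching disposition
print("disposition PV ->", classify_pv(35, 40, 30))
```

A result beyond the control limit does not show that the process is faulty; it only flags the marker as a candidate for closer review.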
Ethical approval was obtained when necessary for all simulations in the derivation phase performed at the University of Alberta. Informed consent was obtained from each participant prior to the start of the study. Since all data were deidentified and reported in aggregate, the local ethics committee at the University of the Eastern Piedmont authorized the study without need for formal institutional review approval.
Results
In the derivation phase, 62 simulations were performed in five countries (Canada, Italy, Sweden, Poland, and Germany). In total, 357 participants performed 3,835 simulated patient encounters. Educational level of the participants in this phase varied widely, ranging from medical students and nurses to medical residents and medical specialists.
Length-of-stay benchmarks were obtained for each individual patient in the simulation database. For each patient, median, IQR, and 3Q were calculated. Patient-volume markers were obtained for simulations performed using the same simulation scenario and are presented in Figure 2. This represented the number of patients to reach each milestone during the simulation. Overall triage accuracy for 3,835 triaged patients from the derivation set is presented in Figure 3.
Triage accuracy for the application group (Group A) is shown in Figure 4. For Group A, median LOS from arrival to triage, bed assignment, MD assessment, and disposition for those patients assessed by Group A were compared to the median time and 3Q times for the same patients from the derivation set (Figure 5). In this case, median time from arrival to triage and arrival to room assignment was faster than the median from the derivation phase. Conversely, median time from arrival to initial medical assessment was slower than the median, although still faster than the 3Q of the derivation set. Lastly, time from arrival to disposition was slower than the 3Q time from the derivation set.
The volume of patients to reach each milestone in Group A was compared to the benchmark of median and 1Q derived from all previous simulation runs using the same simulation scenario in the derivation set (Figure 6). In this case, in Group A, the number of patients to be triaged, assigned to a room, and assessed by an MD was greater than the median of the derivation set. The number of patients to reach a disposition decision was less than the median from the derivation set, but still more than the 1Q.
Discussion
This study details the quantification of surge capacity based on derivation of LOS and PV benchmarks in a standardized simulation scenario. By applying simple statistical methods to a study group, it was possible to implement process-control tools to gain insight into surge capacity management.
Simple charts are displayed which allow for direct visualization of potentially missed opportunities for improvement of the surge response. The current study uses the median and 3Q for LOS benchmarks to assess for out-of-control processes. Although values above the 3Q are not proof that the process is faulty, they indicate an area that likely warrants further assessment. These methods are designed to allow initial studies of small samples to be used to indicate which parts of the process require further investigation. In contrast to traditional medical research, the objective in process-control studies is to expose areas of the process that may benefit from further investigation. For instance, a typical medical study to assess medication efficacy must prove with minimal doubt that a medication is efficacious (small alpha risk). To do this requires either very large studies, or accepting a large statistical risk of being unable to show the medication is efficacious even when it is (large beta risk). Process-control studies conversely attempt to lower the beta risk (risk that the process is claimed to be in control when it is not) by accepting a higher alpha risk (risk that the process is claimed to be out of control when it is actually in control). This is generally considered to be safe; investigating a process that is later shown to be in control is seldom dangerous, and must be balanced against the danger of “missed opportunity” if a process that is truly problematic is not investigated.
In the graph of LOS benchmarks (Figure 5), the vertical bars represent the median time for a patient to reach the benchmark in the experimental group. The lighter line indicates the 3Q for the same patients used in all simulations of the derivation phase and is used as the control limit. Figure 5 shows that Group A performed well (below the lighter line) on all metrics except disposition, where the median time to disposition was above the 3Q. This would indicate that investigating how the group made disposition decisions would likely be the most fruitful way to identify possibilities for increasing surge capacity. Conversely, it can be seen that Group A performed extremely quickly at triage, and further investment in training, education, or investigation of the triage process may be unlikely to have a major effect on capacity.
In the graph of PV benchmarks (Figure 6), the vertical bars indicate the number of patients to reach a certain milestone in Group A, while the light line marks the 1Q value for the number of patients to reach that milestone on all previous simulations using the same scenario in the derivation set, and it serves as the control limit. Here again, Group A performed well in the PV to reach the specific milestones, with the weakest point being the number of patients to reach disposition. Although the number of patients is still above the 1Q control line, the performance was below the median (dark line); this also supports that investigation of the disposition process may be recommended.
The application of the statistical process-control tools to the Group A data shows the advantage of process-control methods. Using the graphs, it is simple to visualize opportunities to improve the surge capacity. By including the median and 1Q/3Q control lines, it is obvious which processes are under statistical control without the need to resort to confidence intervals or P values that can be difficult for nonstatisticians to appreciate fully.
There are several advantages to the use of simulation to quantify surge capacity. Simulation can allow a reproducible and reusable instrument to measure surge capacity. Furthermore, since simulation techniques can involve the actual ED staff, it can allow quantification of the personnel component of disaster response rather than evaluation of the infrastructure only. Computer simulation, in particular, can be useful as it gives the advantage of precise numerical data. A standardized simulation scenario can be used to develop baseline metrics and the simulation protocol repeated under various conditions to quantify the relative changes in surge capacity. The ability to organize the simulation with minimal infrastructure involvement and with minimal influence on the hospital's day-to-day activity means that the simulation can be repeated easily. This provides a simple manner to evaluate the influence of a particular factor; for instance, the simulation could be performed both before and after employee training on the disaster plan and the results compared.
There are many advantages to the use of the combination of simulation and process-control tools for the quantification of ED surge capacity. First, the statistical methods described are reusable. Second, the software is developed such that benchmarks are dynamic; after each simulation, the benchmarks can be derived again, changing after each simulation. Unlike more traditional disaster management simulations, data from the computer simulation can be analyzed immediately after the exercise by the automated R function and results are available for immediate debriefing. Overall, the methodology is captured ideally in the philosophy for studies of process control: looking at the process in a small, simulated point in time to search for areas that need further investigation or refinement.
Limitations
A computer simulation of a complex system, such as an ED in its response to a complex and widely variable stimulus (a multi-trauma incident), requires numerous assumptions, and it would be unfounded to claim that computer simulation could predict real-world patient movement with absolute accuracy. There are several limitations to the simulated patient database. First, the patients are only loosely based on real data, and much re-engineering was needed to translate the simple linear patient histories to complex high-fidelity patients. Much of this relates, in general, to the use of high-fidelity simulation, as “the creation of patho-physiology using the simulator models is therefore subject to the biases and interpretation of the scenario writer.” Reference Maran and Glavin 18 The authors of this study consider this a minor limitation, as the overall exercise goal is evaluation of surge capacity and not evaluation of the clinical management of any individual patient. More important in the study methodology is consistency of the patients between simulations, which is guaranteed by the computer software. Another unfortunate by-product of high-fidelity patients is that the complex multi-dimensional database and the simulation software make it impossible to translate cases to a written summary; the patients are suitable only for use in computer simulation.
There are also limitations in the use of computer simulation software. First, there is no proof that simulation performance accurately predicts real-world performance. As eloquently related by Pierre-Nicolas Carron, “a medical simulation can never closely duplicate a real situation; a medical simulation is limited by interface realism as well as technical and financial limitations.” Reference Carron, Trueb and Yersin 19 In addition, each simulation is highly dependent on participant performance. Although this represents a major advantage of the simulation (allowing evaluation of personnel in addition to infrastructure), it also introduces the variable of participant performance, meaning that repeated simulations may be necessary to separate the effect of infrastructure from that of personnel. However, each simulation is labor intensive, requiring that participants be present for a session lasting approximately four hours. Although it is attractive to involve multiple simulations to assess changes to infrastructure, disaster plan, and personnel, it may be difficult to coordinate the human resources needed. Computer simulation, although it may be designed to reflect real-time management, is often dependent upon assumptions of procedure and laboratory delay times; again, in this study, the influence was minimized by ensuring all groups used the same simulation assumptions. In addition, the present number of simulations is not large enough to permit subgroup analysis, for instance, evaluation of performance by such features as participant education, ED size, or type of ED plan.
Although the simulation-based benchmarking method is attractive, further studies are needed. In particular, there is a need for many more repetitions of the simulation, including replications of the scenario and repetitions of the same patient data set to different ED layouts. This would allow more specific benchmarks to be developed, for instance, based on ED size or participant education.
Furthermore, applying the same statistical method to other simulation software or other types of exercises would be valuable to assess the robustness of the methodology. Additionally, further studies are currently underway by the authors of this study to evaluate more precisely time delays associated with ED procedures.
Conclusions
The present study demonstrates that LOS and PV benchmarks for quantification of surge capacity can be derived from computer simulation tools. These benchmarks are dynamic, with each new simulation contributing further information. This study also details how simple graphical tools can be used to compare results of a single simulation to the derived metrics, suggesting which areas of the surge response may require further investigation. Since simulation software is reusable, surge capacity could potentially be evaluated multiple times to assess the efficacy of changes in ED management on surge capacity. The presented statistical method can also be applied to many other types of exercises, such as other computer simulations and live exercises, to create a reusable tool for evaluation of surge capacity.