Introduction
Resources to assist in a surge of critically ill or injured individuals during a disaster or mass-casualty incident (MCI) may be limited, creating a discrepancy between needs and available resources. Reference Ghanbari, Ardalan, Zareiyan, Nejati, Hanfling and Bagheri1 Unlike triage within the hospital, disaster triage must perform the delicate task of balancing the need of each patient for medical care and the overarching needs of the health care system to avoid overwhelming health care capacity. This can be an unusual and frightening event for health care providers who often respond to disasters working in unfamiliar conditions and are forced to make uncomfortable, morally challenging, and critical clinical care decisions with minimal information.
To address these concerns, numerous disaster triage algorithms have been developed. Their goals are to provide an easily learnable and consistent tool to aid in these difficult decisions. Many are based on simple flow charts with the goal of facilitating difficult clinical decisions during times of provider stress while ensuring that victims of a disaster or MCI are prioritized based on their clinical needs. The ability of these disaster triage tools to accurately triage victims is important. Under-triage (poor sensitivity) can result in a failure to recognize victims who could benefit from urgent medical intervention. Conversely, over-triage (poor specificity) results in valuable resources being used prematurely or unnecessarily.
To assist in the triage of victims of a disaster or MCI, the Simple Triage and Rapid Treatment (START) tool was introduced in 1983 by the Newport Beach Fire and Marine Department (Newport Beach, California USA) and the Hoag Hospital (Orange County, California USA) using the now familiar groups of black (expectant), red (immediate), yellow (delayed), and green (minor) to prioritize care of disaster victims. Reference Hogan and Burstein2 As the tool is based on a simple flowchart, it has been targeted to providers of all groups (ie, physicians, nurses, trainees, and prehospital providers) as a simple way to promote consistent and reproducible triage.
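The flowchart logic behind these four categories can be sketched in a few lines. The decision sequence and thresholds below follow the commonly taught adult START steps (ambulation, respiration, perfusion, mental status) and are not taken from this article; the `Victim` class and its field names are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Victim:
    """Illustrative bedside findings; field names are assumptions."""
    can_walk: bool
    breathing: bool
    breathes_after_airway_repositioning: bool  # only consulted if not breathing
    respiratory_rate: int                      # breaths per minute
    radial_pulse_present: bool
    capillary_refill_seconds: float
    follows_commands: bool


def start_category(v: Victim) -> str:
    """Return 'green', 'yellow', 'red', or 'black' per the usual START sequence."""
    if v.can_walk:
        return "green"      # minor: walking wounded
    if not v.breathing:
        # Reposition the airway once; no spontaneous breathing means expectant.
        return "red" if v.breathes_after_airway_repositioning else "black"
    if v.respiratory_rate > 30:
        return "red"        # immediate: respiratory compromise
    if not v.radial_pulse_present or v.capillary_refill_seconds > 2:
        return "red"        # immediate: poor perfusion
    if not v.follows_commands:
        return "red"        # immediate: altered mental status
    return "yellow"         # delayed
```

Because each step is a single yes/no question, the same victim findings should always yield the same category, which is the reproducibility property that the accuracy studies reviewed here attempt to measure.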
Despite being one of the most commonly used and studied disaster triage systems world-wide, there are no published syntheses on the accuracy of the START tool. The purpose of this meta-analysis was to assess overall accuracy, as well as the proportion of under- and over-triage, for the START method when used by providers across a variety of backgrounds. In addition, specific estimates of accuracy were obtained for each of the four START categories: red, yellow, green, and black.
Methods
A study protocol was developed “a priori” and registered on PROSPERO (registration # CRD42020175457) to define the objectives, selection criteria, data collection, and analysis. This review conforms to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. Reference Page, McKenzie and Bossuyt3
Differences between Protocol and Review
Several amendments were made to the initial protocol. First, the initial plan was to employ the following tools to assess the risk of bias: (1) Cochrane risk of bias tool for studies enrolling patients in a true disaster, randomized controlled trials (RCTs); (2) before-after quality assessment checklist for before-after studies; (3) Newcastle-Ottawa Scale (NOS) for observational cohort/case-control studies; and (4) the Simulation Research Evaluation Rubric for simulation-based MCI studies. Reference Fey, Gloe and Mariani4–Reference Wells, Shea, O’Connell and Peterson8 However, all collaborators agreed before the implementation that it was more appropriate to use the Mixed Methods Appraisal Tool (MMAT; Montreal, Canada) as it is a single tool that allows for the assessment of multiple study designs. Reference Hong, Fàbregues and Bartlett9,Reference Pace, Pluye and Bartlett10 Second, several sub-group and sensitivity analyses proposed in the protocol will now be reported elsewhere. Third, due to infrequent and incomplete reporting, other proposed accuracy outcomes including specificity, sensitivity, likelihood ratios, as well as reliability/validity were not assessed as planned; the review instead focused on assessing triage accuracy as well as over- and under-triage outcomes.
Inclusion/Exclusion
To identify all relevant studies on the topic, the following inclusion criteria were used.
Population
Adults (≥17 years of age) being triaged in or out of hospital during a true or simulated disaster or MCI.
Intervention
Use of START triage assessment, either in a true or simulated disaster or MCI.
Comparator
Studies consisting of a single or multiple groups were eligible for inclusion, including those comparing the accuracy of START to a reference standard (eg, case examples that are defined through consensus), other triage tools, different triage assessors, or different disaster scenarios.
Outcomes
Studies were required to report the diagnostic accuracy of START to be included in the review. The primary outcome of the review was the triage accuracy of START, which was defined as the proportion of cases triaged to the correct category when compared to the reference standard. The overall triage accuracy of START was reported, as well as the accuracy of START based on the triage sub-groups (black, red, yellow, and green). Secondary outcomes of interest included over- and under-triage.
Study Design
Studies described as RCTs or non-RCTs, cohort descriptive studies, or mixed methods studies were eligible for inclusion. Reviews, editorials, and commentaries were not included.
There were no exclusion criteria based on who conducted the triage assessment using START. Studies that strictly reported the accuracy of a modified version of START were not eligible to be included. No other limits regarding publication status or language were applied.
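The outcome definitions above (triage accuracy as the proportion of cases assigned to the reference category, with over- and under-triage as secondary outcomes) can be illustrated with a minimal sketch. The function name, the acuity ranking, and the handling of the black category are assumptions made for this example: mismatches involving black are counted as errors but not classified as over- or under-triage, since the included studies may have handled that category differently.

```python
from collections import Counter

# Assumed acuity order for classifying the direction of an error.
# The black (expectant) category is deliberately left unranked.
ACUITY_RANK = {"green": 0, "yellow": 1, "red": 2}


def triage_outcomes(assigned, reference):
    """Return proportions of correct, over-, under-triage, and other errors."""
    if len(assigned) != len(reference) or not assigned:
        raise ValueError("assigned and reference must be non-empty and equal length")
    counts = Counter()
    for a, r in zip(assigned, reference):
        if a == r:
            counts["correct"] += 1
        elif a in ACUITY_RANK and r in ACUITY_RANK:
            # Higher assigned acuity than the reference is over-triage.
            direction = "over" if ACUITY_RANK[a] > ACUITY_RANK[r] else "under"
            counts[direction] += 1
        else:
            counts["other_error"] += 1  # mismatch involving black
    n = len(assigned)
    return {k: counts[k] / n for k in ("correct", "over", "under", "other_error")}


# Hypothetical five-victim exercise, not data from any included study.
example = triage_outcomes(
    ["red", "yellow", "green", "black", "red"],
    ["red", "red", "green", "red", "yellow"],
)
```

In this made-up example two of five victims are triaged correctly, one is under-triaged, one is over-triaged, and one error involves the black category.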
Search Methods for Identification of Studies
A search was executed by an expert health librarian (SC) on the following databases: OVID Medline (US National Library of Medicine, National Institutes of Health; Bethesda, Maryland USA); OVID EMBASE (Elsevier; Amsterdam, Netherlands); OVID Global Health (EBSCO Information Services; Ipswich, Massachusetts USA); EBSCO CINAHL (EBSCO Information Services; Ipswich, Massachusetts USA); Compendex (Engineering Village [Elsevier; Amsterdam, Netherlands]); SCOPUS (Elsevier; Amsterdam, Netherlands); ProQuest Dissertations and Theses Global (Ann Arbor, Michigan USA); Cochrane Library (The Cochrane Collaboration; London, United Kingdom); and PROSPERO (University of York; York, United Kingdom) using controlled vocabulary (eg, MeSH [Medical Subject Headings] and Emtree [Embase]) and keywords representing the concepts “START” and “triage” and “mass casualties.” The searches were complete up through March 2020 and search strategies were adjusted appropriately for different databases. For primary databases, searches were limited to 1983 through March 2020. Results of the searches were exported to the citation management system RefWorks (Version 2.1.0.1; ProQuest, LLC; Ann Arbor, Michigan USA) and also exported to the COVIDENCE systematic review program (Veritas Health Innovation Ltd; Melbourne, Australia). Detailed search strategies are available in Appendix 1 (available online only).
Additional searches of the grey literature were conducted to identify any studies missed from the search of the databases. Sources of the grey literature that were searched included Google Scholar (Google Inc.; Mountain View, California USA); clinical trial registries (clinicaltrials.gov, Cochrane Central Register of controlled trials [The Cochrane Collaboration; London, United Kingdom], and controlled-trials.com); Web of Science (Thomson Reuters; New York, New York USA); backward and forward SCOPUS searches of included studies; bibliographies from included studies and known reviews; as well as hand-searching of abstracts from emergency medicine conferences, including Canadian Association of Emergency Medicine (2018-2020; Ottawa, Ontario Canada), Society of Academic Emergency Medicine (2018-2020; Des Plaines, Illinois USA), and American College of Emergency Physicians (2018-2020; Irving, Texas USA). When possible, search results were imported into EndNote (Clarivate Analytics; Philadelphia, Pennsylvania USA) before being exported into the COVIDENCE systematic review program.
Study Selection and Data Extraction
Potentially eligible studies were selected using a two-stage screening process. At the first stage, titles and abstracts of all studies identified in the literature search were screened by two independent reviewers (UDW, SWK) against pre-determined inclusion and exclusion criteria to identify potentially eligible studies. At the second stage, the full-text manuscripts of all studies identified as potentially eligible by at least one of the reviewers were retrieved and reviewed for eligibility by two independent reviewers (UDW, SWK) using the pre-defined eligibility criteria. Disagreements between the reviewers regarding the eligibility of studies were mediated via a third-party adjudicator (JMF).
The data of all included studies were extracted independently by two of three available reviewers (UDW, SWK, JM). Pre-defined outcomes were extracted using standardized extraction forms. The completed data extraction was then verified for accuracy by the third reviewer (either UDW, SWK, or JM) who did not complete the initial assessment. Disagreements that could not be settled via discussion between the reviewers were mediated by a third-party adjudicator (JMF). Outcomes regarding study characteristics, characteristics of simulation or MCI, implementation of START, description of the reference standard, and primary/secondary outcomes of interest were extracted onto standardized forms.
Quality Appraisal
The MMAT was employed to assess the quality of the included studies. Reference Hong, Fàbregues and Bartlett9,Reference Pace, Pluye and Bartlett10 Two reviewers (SK, UDW) independently evaluated the MMAT level of evidence for each article and completed a data extraction table. Discrepancies were resolved through discussion.
Data Synthesis
Accuracy outcomes including triage accuracy, over-triage, and under-triage are presented as means with 95% confidence interval (CI) as calculated by the binomial method. Reference Devore11 A meta-analysis for the outcomes of triage accuracy, over-triage, and under-triage was performed using Stat59 (Build73a7cf; Stat59 Services Ltd.; Edmonton, Alberta Canada). Pooled estimates were calculated using the DerSimonian and Laird method for the random effects model and the inverse variance method for the fixed effects model. Reference Egger, Smith and Altman12 The heterogeneity statistic Q was calculated by the inverse variance method and P values for heterogeneity were based on the chi-square test of Q. Reference Egger, Smith and Altman12 A sensitivity analysis of the primary outcome (overall triage accuracy) based on the inclusion of studies in which triage accuracy had to be imputed due to incomplete outcome reporting was also completed.
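The pooling approach named above can be sketched as follows. This is an illustration only and not the Stat59 implementation, whose internals are not described here: it assumes pooling on the raw proportion scale with the normal-approximation variance p(1 - p)/n, and it uses hypothetical study counts. The P value for Q is omitted because it requires a chi-square distribution function outside the Python standard library.

```python
import math


def pool_proportions(events, totals):
    """Fixed-effect (inverse variance) and DerSimonian-Laird random-effects pooling."""
    props = [e / n for e, n in zip(events, totals)]
    variances = [p * (1 - p) / n for p, n in zip(props, totals)]
    if any(v == 0 for v in variances):
        # Proportions of exactly 0 or 1 need a correction or transformation.
        raise ValueError("a study has zero variance on the raw proportion scale")

    # Fixed-effect model: weights are inverse within-study variances.
    w = [1 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * pi for wi, pi in zip(w, props)) / sum_w

    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2.
    q = sum(wi * (pi - fixed) ** 2 for wi, pi in zip(w, props))
    k = len(props)
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0

    # Random-effects model: tau^2 is added to each within-study variance.
    w_re = [1 / (v + tau2) for v in variances]
    random = sum(wi * pi for wi, pi in zip(w_re, props)) / sum(w_re)

    def ci(estimate, se):
        return (estimate - 1.96 * se, estimate + 1.96 * se)

    se_fixed = math.sqrt(1 / sum_w)
    se_random = math.sqrt(1 / sum(w_re))
    return {
        "fixed": fixed, "fixed_ci": ci(fixed, se_fixed),
        "random": random, "random_ci": ci(random, se_random),
        "q": q, "tau2": tau2,
    }


# Hypothetical counts of correctly triaged cases in three studies.
result = pool_proportions(events=[30, 90, 140], totals=[100, 100, 200])
```

When studies disagree strongly, tau^2 is positive, the random-effects weights become more equal across studies, and the random-effects interval is wider than the fixed-effect interval.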
Results
Search Results
A total of 3,901 articles were identified in the search of the electronic databases and grey literature (Figure 1). After duplicates were removed, the title and abstracts of 1,820 studies were reviewed. A total of 1,471 studies were excluded for irrelevance and 349 were identified as potentially eligible. After full-text review, 317 studies were excluded for various reasons including ineligible study design (n = 111), did not assess START (n = 72), did not report eligible outcomes (n = 61), duplicate publications (n = 41), not set in a disaster/MCI (n = 20), included pediatric victims (n = 7), and in some cases, the full text could not be retrieved (n = 5). As a result, a total of 32 studies were included in the review (Table 1 Reference Arshad, Williams and Asaeda13–Reference Ersoy and Akpinar44 ; available online only).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220218081639037-0653:S1049023X2100131X:S1049023X2100131X_fig1.png?pub-status=live)
Figure 1. PRISMA Flow Diagram.
Abbreviations: START, Simple Triage and Rapid Treatment; MCI, mass-casualty incident.
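As a consistency check, the study flow reported above can be reproduced with simple arithmetic. All counts are taken from the text; the variable names are illustrative, and the number of duplicates is only implied by the difference between records identified and records screened.

```python
identified = 3901               # records from databases and grey literature
screened = 1820                 # titles and abstracts reviewed after de-duplication
excluded_at_screening = 1471    # excluded for irrelevance

full_text_exclusions = {
    "ineligible study design": 111,
    "did not assess START": 72,
    "did not report eligible outcomes": 61,
    "duplicate publications": 41,
    "not set in a disaster/MCI": 20,
    "included pediatric victims": 7,
    "full text could not be retrieved": 5,
}

implied_duplicates = identified - screened                  # 2081, implied only
full_text_reviewed = screened - excluded_at_screening       # potentially eligible
excluded_full_text = sum(full_text_exclusions.values())
included = full_text_reviewed - excluded_full_text

print(full_text_reviewed, excluded_full_text, included)
```

The totals agree with the text: 349 full texts reviewed, 317 excluded, and 32 studies included.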
Study Characteristics
A descriptive analysis of these 32 studies will be published as a scoping review elsewhere and a summary of the characteristics of the studies is provided in Table 1. Reference Wisnesky, Kirkland, Rowe and Franc45 Briefly, one-half of the studies were based in North America, with the remaining studies being conducted in various countries across Europe, Asia, the Middle East, and Oceania. The majority of studies involved triage of simulated patients. Based on the classification of simulation techniques developed by Alinier, the majority of studies used a Level 3 (live) simulation involving actors in moulage playing victims in various MCI scenarios. Reference Alinier46 The next most common simulation type was paper-based exercises (Level 0) or computer-based simulations including the use of virtual reality (VR) simulations (Level 2). The simulations most commonly involved motor vehicle collisions with some studies simulating a collision of trains or airplanes. Eleven studies did not specify the nature of the disaster or MCI simulated. Participants who completed the triage assessments included: paramedics; firefighters and other first responders; physicians; nurses; and students. All but two studies assessed the accuracy of START by comparing to a reference standard. Reference Kahn, Schultz, Miller and Anderson38,Reference Challen and Walter39 However, the majority of studies (n = 22) did not specify the reference standard used when assessing the accuracy of START. When specified, the reference standard was most commonly expert opinion. Reference Ferrandini Price, Escribano Tortosa and Nieto Fernandez-Pacheco15,Reference Ingrassia, Colombo, Barra, Carenzo, Franc and Della Corte16,Reference Bolduc, Maghraby, Fok, Luong and Homier25,Reference Sapp, Brice, Myers and Hinchey30,Reference Silvestri, Field and Mangalat32,Reference Buono, Lyon and Huang36,Reference Crews40,Reference Curran-Sills and Franc41,Reference Fink, Rega, Sexton and Wishner47
Quality of Studies
Five studies were assessed using the MMAT quality criteria for quantitative RCTs (Appendix 2; available online only). Reference Badiali, Giugni and Marcis14,Reference Ingrassia, Ragazzoni, Carenzo, Colombo, Gallardo and Corte18,Reference Jain, Sibley, Stryhn and Hubloue20–Reference Lee, Franc and Lee22 Across the studies, the method of randomization was not clear. Reference Badiali, Giugni and Marcis14,Reference Ingrassia, Ragazzoni, Carenzo, Colombo, Gallardo and Corte18,Reference Jain, Sibley, Stryhn and Hubloue20–Reference Lee, Franc and Lee22 In addition, in several studies, blinding was not adequately described. Reference Badiali, Giugni and Marcis14,Reference Jain, Sibley, Stryhn and Hubloue20–Reference Lee, Franc and Lee22 Assessments using MMAT, however, did identify complete outcome reporting across all five studies and that participants did adhere to the assigned groups.
Seventeen studies were assessed using the MMAT quality criteria for quantitative non-RCTs which included before-after studies and comparative observational cohorts (Appendix 3; available online only). Reference Arshad, Williams and Asaeda13,Reference Ferrandini Price, Escribano Tortosa and Nieto Fernandez-Pacheco15–Reference Ingrassia, Ragazzoni, Tengattini, Carenzo and Corte17,Reference Izumida, Kato and Shigeno19,Reference Loth, Cote and Shaafi Kabiri24,Reference Bolduc, Maghraby, Fok, Luong and Homier25,Reference Navin, Sacco and Waddell27,Reference Risavi, Lee, Terrell and Holsten28,Reference Sapp, Brice, Myers and Hinchey30,Reference Silvestri, Field and Mangalat32,Reference Wu, Shu and Chung34,Reference Buono, Lyon and Huang36,Reference Challen and Walter39,Reference Curran-Sills and Franc41,Reference Ellebrecht and Latasch43,Reference Waddell and Navin48 The majority of the studies did not provide adequate description of the study participants or how they were included in the study. The majority of studies utilized appropriate measurements and had complete outcome reporting.
Eight studies were assessed using the MMAT quality criteria for quantitative descriptive studies, which consisted of a single cohort (Appendix 4; available online only). Reference Lima, De-Vasc Oncelos and Queiroz23,Reference Schenker, Goldstein and Braun31,Reference Simões, Duarte Neto, Maciel, Furtado and Paulo33,Reference McCoy, Alrabah and Weichmann35,Reference McElroy, Steinberg, Keller and Falcone37,Reference Kahn, Schultz, Miller and Anderson38,Reference Djalali, Carenzo and Ragazzoni42,Reference Ersoy and Akpinar44 The majority of these studies reported a clear research question, relevant sampling strategy, and had low risk for nonresponse bias. Several studies, however, did not report how their statistical analysis was completed and several provided insufficient information on the study participants.
Finally, two studies were assessed using the MMAT quality criteria for mixed methods studies (Appendix 5; available online only). Reference Mills, Dykstra and Hansen26,Reference Crews40 Overall, the quality of the two studies varied with one study judged as not effectively integrating the mixed methods approach to answer their research question and having low quality for the quantitative and qualitative components. Reference Crews40
Primary Outcome
Overall Triage Accuracy
Twenty-four studies included the absolute numbers for overall correct triage and form the analysis group for the main study outcome (Figure 2). The reported proportion for correct triage in the included studies ranged from 0.27 to 0.99. Using the random effects model, the pooled estimate for proportion of correct triage was 0.73 (95% CI, 0.67 to 0.78). The pooled estimate was 0.70 (95% CI, 0.70 to 0.71) using the fixed effect model. There was significant heterogeneity among the included studies (P < .0001).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220218081639037-0653:S1049023X2100131X:S1049023X2100131X_fig2.png?pub-status=live)
Figure 2. Overall Triage Correct.
Five studies contained adequate information to be included in the assessment of accuracy of the triage category black (Figure 3). The proportion of cases correctly triaged ranged from 0.58 to 0.98. The pooled estimate for the proportion of victims correctly triaged was 0.85 (95% CI, 0.63 to 1.0) using the random effects model. Using the fixed effect model, the accuracy was 0.82 (95% CI, 0.80 to 0.84). There was significant heterogeneity among studies (P < .0001).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220218081639037-0653:S1049023X2100131X:S1049023X2100131X_fig3.png?pub-status=live)
Figure 3. Category BLACK Triage Correct.
Eight studies were included in the red triage category with accuracy ranging from 0.40 to 0.95 (Figure 4). Using the random effects model, the pooled estimate for proportion correctly triaged was 0.80 (95% CI, 0.75 to 0.86). With the fixed effect model, pooled triage accuracy was 0.83 (95% CI, 0.81 to 0.84). Again, there was evidence of significant heterogeneity among studies (P < .0001).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220218081639037-0653:S1049023X2100131X:S1049023X2100131X_fig4.png?pub-status=live)
Figure 4. Category RED Triage Correct.
The proportion of yellow triage category victims correctly triaged was reported in eight studies with accuracy ranging from 0.49 to 0.87 (Figure 5). The pooled estimate for proportion correctly triaged was 0.66 (95% CI, 0.55 to 0.77) with significant heterogeneity among studies (P < .0001) using a random effects model. Using a fixed effects model, the pooled estimate was 0.70 (95% CI, 0.69 to 0.71).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220218081639037-0653:S1049023X2100131X:S1049023X2100131X_fig5.png?pub-status=live)
Figure 5. Category YELLOW Triage Correct.
In the green triage category, eight studies reported an accuracy between 0.70 and 0.94 (Figure 6). The estimated pooled proportion correct was 0.87 (95% CI, 0.82 to 0.92) with significant heterogeneity among studies (P < .0001) using the random effects model. Using a fixed effects model, pooled accuracy was 0.86 (95% CI, 0.86 to 0.87).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220218081639037-0653:S1049023X2100131X:S1049023X2100131X_fig6.png?pub-status=live)
Figure 6. Category GREEN Triage Correct.
Sensitivity Analysis
In four studies that did not explicitly report the absolute number of simulated patients triaged correctly, the results could be calculated by imputation. Reference Ferrandini Price, Escribano Tortosa and Nieto Fernandez-Pacheco15,Reference Mills, Dykstra and Hansen26,Reference Wu, Shu and Chung34,Reference Challen and Walter39 Addition of these four studies resulted in minimal change to the pooled estimates. When added to the fully reported studies, 28 studies could be included for overall assessment of START accuracy. Among these studies, the proportion of cases triaged correctly ranged from 0.27 to 1.0. The pooled effect estimate for accuracy was 0.74 (95% CI, 0.70 to 0.79) using a random effects model. The pooled triage accuracy was 0.72 (95% CI, 0.71 to 0.72) using a fixed effects model. There was significant heterogeneity among studies (P < .0001).
Four studies initially included in the analysis group did not contain sufficient information to estimate triage accuracy and could not be included in the analysis of triage accuracy. Reference Loth, Cote and Shaafi Kabiri24,Reference Riza’i, Ade, Albar, Sulitio and Muharris29,Reference McCoy, Alrabah and Weichmann35,Reference Djalali, Carenzo and Ragazzoni42
Secondary Outcome
Over-Triage
The proportion of over-triage was assessed in 18 (56%) studies (Figure 7). Over-triage proportions ranged from 0.006 to 0.53. The pooled over-triage estimate was 0.14 (95% CI, 0.11 to 0.17) using the random effects model. Using the fixed effects model, the pooled over-triage proportion was 0.10 (95% CI, 0.09 to 0.10). There was significant heterogeneity among studies (P < .0001).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220218081639037-0653:S1049023X2100131X:S1049023X2100131X_fig7.png?pub-status=live)
Figure 7. Over Triage.
Under-Triage
Under-triage was assessed in 18 (56%) studies and the reported proportion of under-triage ranged from 0.0061 to 0.25 (Figure 8). Using the random effects model, the pooled proportion of under-triage was 0.10 (95% CI, 0.07 to 0.14). The pooled estimate was 0.06 (95% CI, 0.06 to 0.07) using the fixed effects model. There was evidence of significant heterogeneity among studies (P < .0001).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220218081639037-0653:S1049023X2100131X:S1049023X2100131X_fig8.png?pub-status=live)
Figure 8. Under Triage.
Discussion
Using a robust search of the literature and efforts to avoid publication and selection bias, this meta-analysis identified the available studies utilizing the START tool for the assessment of critically ill or injured individuals in a real or simulated disaster. The pooled triage accuracy of 73% suggests that the START method is not sufficiently accurate to serve as a reliable disaster triage tool.
In this meta-analysis, there was evidence of significant heterogeneity for each of the outcomes. As the included studies featured a number of different methodologies, disaster scenarios, and providers, this is not surprising. In fact, as disaster triage systems are expected to function across a wide range of disaster scenarios and provider experience, the heterogeneity of the studies represents one of the strengths of this meta-analysis: assessment of the true accuracy of START across a variety of scenarios. For a disaster triage system such as START to be useful, it must be reproducible. In addition, mass-casualty plans should contain a triage system that can be used by health care providers at a variety of levels.
Published studies of other disaster triage methods also show inconsistent and often low accuracy. For instance, recent studies of the Sort Assess Life-Saving Triage (SALT) method have found accuracy from 52% to 79%. Reference Fink, Rega, Sexton and Wishner47,Reference Hartman, Daines, Seto, Shimshoni, Feldman and Sort49–Reference McKee, Heffernan and Willenbring51 Accuracy of Careflight triage among recently published articles varied from 39% to 94%, while triage sieve accuracy varied from 16% to 90%. Reference McKee, Heffernan and Willenbring51–Reference Vassallo and Smith54 The Sacco Triage Method (STM) may hold promise of a higher rate of accuracy, however, there are few published studies employing this method. Reference Navin, Sacco and Waddell27,Reference Cross and Cicero55,Reference Cross and Cicero56 By comparison, START was found to have a pooled accuracy of 73% with similar wide ranges (27% to 99%).
Traditionally, most disaster triage systems have been designed primarily to avoid under-triage as this protects disaster victims from a dangerous situation where they are denied early access to necessary medical care. Unfortunately, only a few studies (n = 18) included in this review reported under-triage as an outcome. Analysis of those studies suggests that under-triage of START victims occurs in six percent to ten percent of victims.
Conversely, although over-triage represents little danger to the victim (victims are prioritized to get care quickly even if they do not need it), it represents a danger in overwhelming the health care response system. Again, only a small number of studies (n = 18) reported the outcome of over-triage, which was found to occur in ten percent to fourteen percent of victims.
This review also demonstrated that triage accuracy was lowest in the mid-acuity (yellow) patients at 66%. Accuracy was highest among the least acute (green) victims at 87%. Triage accuracy of black and red victims was 85% and 80%, respectively. Intuitively, having the highest accuracy among the most acute groups seems desirable as mistakes in triage among more acute victims are more likely to affect the victim’s outcomes. The risk of the mid-acuity (yellow) victims being triaged incorrectly, however, does have the potential to result in maldistribution of resources that may be limited.
This review investigated the accuracy of the START method across all levels of providers. By definition, disaster scenarios exist because there is a lack of adequate resources to care for victims in the usual manner. Disaster plans commonly provide operating procedures to increase surge capacity by repurposing spaces, increasing supplies, and creating special operating protocols. In addition, responding to increased surge capacity often involves repurposing of staff. Thus, during a disaster, the task of prioritizing patients may be assigned to personnel not usually accustomed to performing triage. To be effective for use in disaster situations, a structured triage protocol such as START should be adequately intuitive to be used across the entire range of providers: physicians, nurses, and prehospital providers. Triage systems that cannot be used by providers of all levels — while acceptable for normal operation of the emergency medical system — are suboptimal for disaster response.
Since the development of the START system in the 1980s, there has been a massive change in available technology. The ability to assess disaster victims using only a flowchart on a piece of paper and colored tags was clearly innovative forty years ago. As the use of hand-held electronic devices has become ubiquitous, electronic triage tools may hold promise. A recently published, large multi-center trial of 1,491 emergency department patients showed an increase in triage accuracy from 75.4% to 92.7% after introduction of the electronic Canadian Triage and Acuity Scale (eCATS) tool. Reference McLeod, McCarron and Ahmed57 Conversely, another study comparing head-to-head electronic versus non-electronic methods failed to find a significant improvement. Reference Bolduc, Maghraby, Fok, Luong and Homier25 Advancement in simulation training has also occurred over the years, from written summaries to live simulations with actors/mannequins with moulage to simulation exercises in VR. The impact of new technological advancements in triage assessment and simulation training on the accuracy of START is not well-understood and should be the subject of further investigation.
Limitations
The current study addresses the accuracy of START against self-reported reference standards. As many studies did not describe in detail how reference standards were developed or obtained, it is possible that study investigators introduced bias in the development of these standards. This may be most concerning when the group studying the triage system and the group setting the reference standards included the same people.
The majority of included studies were based on a reference standard of expert opinion. As such, most studies evaluated the reliability of the START method to provide consistent assignment of the victims to the four groups. However, the prognostic capacity of the triage system was usually not assessed. A vital question that should be a topic for further studies would be to assess the prognostic ability of the test. Does the START method predict actual need for treatment? Furthermore, the final and most important outcome would be to evaluate the efficacy of the triage system in actually providing the best balance of individual patient care to optimize the disaster response. Although logistically and ethically challenging to perform, these studies would help delineate the true role of the START method. It should be considered, however, that if a test cannot be reliably applied by the group of providers using it, it is unlikely that the test will have good prognostic performance.
The majority of studies included in this review involved triage of simulated patients. The fidelity of the simulations varied from pen-and-paper to simulated patients to VR scenarios. While simulations are commonly used for training and evaluation of disaster medicine scenarios, all simulations have limitations. It is possible that true performance of START in real-life scenarios is different from that in simulated scenarios. Further studies to assess START performance in true disaster situations may be helpful in quantifying this difference. However, such studies would likely be logistically very difficult to coordinate in a safe and ethical manner given the unpredictable nature of disasters.
Unfortunately, there are several limitations due to the quality of the included studies. Only five of the included studies were randomized trials. Many studies suffered quality issues such as inadequate description of randomization, lack of blinding, and inadequate description of participant selection.
This study did not address pediatric disaster triage. As injury patterns and clinical presentations in pediatric disaster victims may vary greatly from adult victims, this study should not be used as a judgement of the accuracy of START among pediatric patients.
Finally, while there is a risk of selection and publication bias in this review, steps were taken to minimize these risks, including using two independent reviewers to identify eligible studies with a third-party mediator, as well as conducting an extensive search of the published and unpublished literature.
Conclusions
The results of this systematic review and meta-analysis suggest that with an overall accuracy of 73%, START triage may not be sufficiently accurate to serve as a reliable disaster triage tool. Although meta-analysis revealed relatively low proportions of over- and under-triage, only approximately one-half of the included studies reported these outcomes. Victims in the mid-acuity (yellow) group appear to be the least accurately triaged. The vast majority of the included studies were simulation exercises and used expert opinion as the reference standard leading to potential biases in the results. In general, although the accuracy of START may be similar to other models of disaster triage, development of a more accurate triage method should be urgently pursued.
Conflicts of interest/funding
This work was supported by a Scoping and Systematic Review Grant from the Emergency Strategic Clinical Network (ESCN) at Alberta Health Services and the Emergency Medicine Research Group (EMeRG) in the Department of Emergency Medicine at the University of Alberta. Dr. Rowe’s research is supported by a Scientific Director’s Grant (SOP 168483) from the Canadian Institutes of Health Research (CIHR; Ottawa, ON). The content hereof is the sole responsibility of the authors and does not necessarily represent the official views of the funding agencies. The funders had no role in the design, implementation, analysis, and write-up of the study. Dr. Franc is CEO and Founder of Stat59. The remaining authors have no conflicts to declare.
Acknowledgement
The authors would like to thank Jillian Meyer (JM) for her role in assisting with the review.
Supplementary Materials
To view supplementary material for this article, please visit https://doi.org/10.1017/S1049023X2100131X