Introduction
Over the past decade, natural disasters have claimed hundreds of thousands of lives.1 As urbanization and climate change accelerate, the incidence of devastating natural disasters will likely increase.2 Medical responders have struggled to provide care in the immediate aftermath of these disasters, often with widely variable results.3 Improperly prepared response teams may engage in practices that undermine the long-term wellbeing of a population through inappropriate use of resources, inconsistent application of local standards of care, and alienation and ultimate disengagement of local health staff.4 To increase accountability, many governing bodies have mandated that humanitarian assistance be evidence-based and be followed by after-action reports that include measures of effectiveness (MOEs).5-9
Measures of effectiveness are clearly identified and agreed-upon targets that can be used to evaluate progress towards a desired goal. They facilitate communication among stakeholders and beneficiaries as to whether essential standards are being met, and link policy to action. Further, some MOEs can be assessed in real time, thereby facilitating immediate feedback and informing responders on the need for modification of practices.10 Prior efforts have been made to standardize reporting. The Sphere Project (International Council of Volunteer Agencies, Geneva, Switzerland) is among the most notable collaborative efforts to produce clear targets for aid agencies working to serve displaced populations. In its latest edition of Humanitarian Charter and Minimum Standards in Disaster Response (2011), the group outlines standards for a wide range of services, including water and sanitation, nutrition, and health care.11 Incorporating Sphere standards with measures used by other humanitarian agencies led to development of the Rapid Epidemiologic Assessment tool to identify major areas of need within displaced populations.12 Recent efforts have produced more detailed after-action reporting tools.13 These new tools are data-intensive, however, and are not suited for real-time assessment. Additionally, the information is less applicable to individual agency contributions, relies partially on subjective rating scales, and may be more applicable to resource-rich settings.
The international disaster community is in need of a reporting tool that can describe the ongoing efforts of a disaster response team providing medical care. The ideal tool should be accessible, able to facilitate real-time decision making, easy to understand, constructed of MOEs that are widely regarded as indicative of the quality of care provided, informative as to problem areas that need review, and reasonably completed within two hours, thereby not detracting attention from provision of medical care.12
There have been prior calls for monitoring and evaluation standards for disaster medical response.5,9,12-14 Measures of effectiveness have been difficult to construct because little data exists to link agency response with population outcomes and impact. Previous efforts to construct a standardized reporting template have produced disparate results, few of which have been adopted widely. In matters where no consensus exists for a difficult-to-test hypothesis, group consensus methods have been utilized to interpret and consolidate existing data and viewpoints.15
The goal of this study was to use group consensus methods to construct an MOE-based reporting template to describe acute phase medical care provided by agencies responding to a major natural disaster.
Methods
Literature Review
A literature review was conducted to provide participants with a summary of the current state of performance reporting in disaster medical response. Using the search terms listed in Table 1, a PubMed (National Center for Biotechnology Information, Bethesda, Maryland USA) search identified articles published from June 2001 through June 2011 describing medical care provided following the 20 disasters of any type with the highest casualty count during this period.1 PubMed was used as a primary source of article selection, as it provided access to a large number of peer-reviewed publications. While not exhaustive, it is a recognized starting point for disaster-related literature reviews.16,17
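For illustration, the following is a minimal sketch of a date-bounded PubMed query of the kind described above, written against Biopython's Entrez interface. The search term, contact address, and result limit are hypothetical assumptions; the study's actual search terms appear in Table 1, and its query tooling is not specified.

```python
# Sketch only: a date-bounded PubMed search via Biopython's Entrez module.
# The term and email below are placeholders, not the study's actual inputs.
from Bio import Entrez

Entrez.email = "reviewer@example.org"  # NCBI requires a contact address

term = '"Haiti earthquake" AND (medical care OR disaster response)'  # hypothetical
handle = Entrez.esearch(
    db="pubmed",
    term=term,
    datetype="pdat",        # filter on publication date
    mindate="2001/06/01",   # June 2001 ...
    maxdate="2011/06/30",   # ... through June 2011, as in the review
    retmax=500,
)
record = Entrez.read(handle)
handle.close()
print(record["Count"], "articles; first IDs:", record["IdList"][:5])
```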
A single reviewer (RD) screened these articles for measures describing provided medical care. Articles were excluded if they referred to care only in the nonacute phase of a response (more than two weeks following an event). An additional search using Google Scholar (Google, Mountain View, California USA) was conducted for other disaster literature that used consensus methodology to develop response standards. At the time of this review, only the work of the Sphere Project met this additional criterion.11
The selected articles were reviewed for all measures pertaining to the input, activities, outputs, outcomes, and impact of disaster medical response.5 All qualitative and quantitative descriptors were extracted by a single reviewer (RD) and compiled in a database. A second reviewer (AC) evaluated the database for appropriate grouping.
These measures were sorted (RD, AC) into a spreadsheet using the above domains, and then further sorted into major categories (evaluation, treatment, disposition, prevention and population [public health], and team [logistics]). The steering committee (FB, MC, DF, AG, GK, ER; Appendix A, available online) comprised persons with publications pertaining to disaster response or persons serving in leadership capacities for disaster response organizations. They selected measures according to the following prompt:
A single-impact natural disaster has struck a large community, resulting in mass-traumatic injury and incapacitation of existing medical infrastructure. Medical teams are deployed to assist with provision of medical care spanning 72 hours post event (T+72 hours) to the next two weeks (T+14 days). What clinical measures are predictive of reduction in morbidity and mortality during this time?
Those measures receiving approval from at least four out of six (66.7%) committee members were graduated into the Delphi stage. This target was chosen because it falls within the 55%-80% range commonly used to indicate consensus.18,19
Delphi Process
A subject matter expert panel (referred to as the Expert Panel), composed of persons recognized for contributions to disaster response, agency representatives, and those in leadership positions overseeing disaster-related functions, was nominated by the steering committee using pre-established criteria.18 The nominees were directed to an online survey site (Survey Monkey Inc., Palo Alto, California USA).
In Round 1, participants from the Expert Panel were presented with those measures identified by the steering committee. Participants were asked to identify which measures in each category applied to the above-mentioned prompt, and to suggest (“free-text”) additional measures. At that time, the steering committee was invited to submit free-text measures through the survey site. Contributed measures were compiled with those of the Expert Panel.
Three reviewers (AC, RD, ER) independently evaluated the consolidated free-text measures according to the following predetermined criteria: can be objectively measured, describes the acute response, and not already presented in Round 1. These reviewers then discussed, via phone conference, whether each measure met the above criteria. Unanimous consensus was required to approve measures or to consolidate redundant measures. Measures that could be reworded into a rate (numerator/denominator/time) were reworded, provided the revised measure received unanimous consent from the three reviewers. Measures not meeting unanimous consensus were eliminated, as were measures suggested by the Expert Panel that had already been reviewed in some form by the steering committee in an earlier round.
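As a concrete illustration of the unanimity rule just described, the following is a minimal Python sketch under assumed data structures; the Review class and the example values are hypothetical, not the study's instrument.

```python
# Sketch of the three-reviewer screen for free-text measures: a measure
# survives only if every reviewer approves it on all predetermined criteria.
from dataclasses import dataclass

@dataclass
class Review:
    objective: bool  # can be objectively measured
    acute: bool      # describes the acute response
    novel: bool      # not already presented in Round 1

    def approves(self) -> bool:
        return self.objective and self.acute and self.novel

def survives(reviews: list[Review]) -> bool:
    """Unanimous consensus: all three reviewers must approve."""
    return all(r.approves() for r in reviews)

# One reviewer judges the measure non-acute, so it is eliminated.
print(survives([Review(True, True, True),
                Review(True, True, True),
                Review(True, False, True)]))  # -> False
```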
Qualifying free-text measures were combined with those measures receiving >75% approval by participants in Round 1. These measures were organized into major categories for Round 2 and presented to those subjects who completed the prior round. In Round 2, participants were asked to identify which measures answered the initial prompt using dichotomous choices (“yes” or “no”). They were asked to approve only those measures meeting the National Quality Forum's Criteria for Measure Construction: important, valid, usable, and feasible.20 The survey site randomized category ordering for each participant.
As with the prior round, measures receiving >75% of Expert Panel approval in Round 2 advanced to Round 3. In this final round, measures were randomly ordered on a single electronic page for each subject. Subjects rated on a 5-point ordinal scale the value of each measure as it pertained to the initial prompt (1 = This measure has no value and should be eliminated, 3 = This measure has value in some situations, 5 = This measure has definite value in most situations and should be kept). Those individual measures with a median value equal to or greater than the median for all measures were selected as the final measures. While there is wide variability among studies in defining consensus, the use of central tendency techniques employing the median value to indicate majority consensus has been described, especially in use of ordinal scales in the modified Delphi format.15,18,21
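The Round 3 selection rule can be stated compactly in code. The sketch below uses hypothetical measures and ratings solely to show the mechanics: a measure is retained when its own median rating is at least the median of all ratings pooled across measures.

```python
# Sketch of the Round 3 rule: keep measures whose median rating (1-5)
# meets or exceeds the median across all ratings. Data are illustrative.
from statistics import median

ratings = {  # hypothetical measure -> Expert Panel ratings
    "time from activation to arrival":   [5, 5, 4, 4, 5],
    "needs assessment by advance team":  [4, 4, 5, 3, 4],
    "proportion of shelters winterized": [2, 3, 3, 2, 3],
}

overall = median(s for scores in ratings.values() for s in scores)
selected = [m for m, scores in ratings.items() if median(scores) >= overall]
print(f"overall median = {overall}; selected = {selected}")
```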
Traditionally, the final round of a Delphi study has respondents rank the most valuable items. Because the measures in this study had already approached a high level of consensus in the previous round (>75%), and because the remaining measures pertained to different categories, an ordinal scale was used so that an Expert Panel member could evaluate an individual measure on its own merit. Additionally, providing an ordinal scale, as opposed to the previous dichotomous choice, allowed more information to be collected regarding consensus on a measure. Surveys for all three rounds have been included (Appendices B, C, and D, available online).
Analysis
Fleiss kappa scores were calculated (StatsToDo Trading Pty Ltd, Brisbane, Queensland Australia) to assess agreement for measure evaluation by the steering committee. A chi-squared test was performed to determine statistical difference in response rate by subject type (OpenEpi, Emory University, Atlanta, Georgia USA). Median and mode scores were tabulated for all measures in Round 3.
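For readers wishing to reproduce these analyses without the web calculators cited above, the sketch below performs the same two computations with open-source Python libraries. The ratings matrix and contingency table are illustrative placeholders, not the study data.

```python
# Sketch: Fleiss kappa (statsmodels) and chi-squared test (scipy) on
# made-up data shaped like the study's (6 raters; reject/approve columns).
import numpy as np
from scipy.stats import chi2_contingency
from statsmodels.stats.inter_rater import fleiss_kappa

# Rows = candidate measures; columns = (reject, approve) counts from the
# six steering committee members. Each row sums to 6.
counts = np.array([[2, 4],
                   [5, 1],
                   [3, 3],
                   [1, 5]])
print("Fleiss kappa:", fleiss_kappa(counts))

# Response status (complete, partial, none) by respondent type.
table = np.array([[8, 4, 20],
                  [10, 6, 30],
                  [6, 5, 23]])
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi2 = {chi2:.2f}, dof = {dof}, P = {p:.2f}")
```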
The study was approved by the Baylor College of Medicine (Houston, Texas USA) Institutional Review Board.
Results
One hundred twenty-two participants were nominated by the steering committee (range 2-49 nominees/member). All were sent electronic invitations. Forty-nine people initiated the Delphi process (40.2%), of whom 39 (79.6%) completed Round 1. Twenty-six of thirty-nine (66.7%) completed Round 2, and 24/39 (61.5%) completed Round 3. Of those who completed all rounds, 33% represented international agencies and 42% represented a US government agency (Table 2; Appendix E, available online). There was no statistical difference between the composition of responders and that of partial responders and nonresponders (P = .95).
Abbreviations: IGO, intergovernmental organization; NGO, nongovernmental organization.
a. Agency type with which respondents report affiliation. Nonresponders were identified by web search.
b. Partial refers to those participants who opened the survey link but did not finish all three rounds.
c. No response indicates those persons who were sent invitations but did not open the survey.
Of the 220 articles initially screened in the literature review, 146 met inclusion criteria. Twelve hundred eighty-seven measures were extracted (Figure 1). These were consolidated into 397 unique measures, 116 (29.2%) of which were approved by the steering committee (Fleiss κ = 0.0513 [0.0259, 0.0767]). In the first round of the Delphi process, 25/116 (22%) measures were approved by the Expert Panel. Seventy-seven free-text measures (consolidated from 347 submissions) were added to this list of measures proceeding to the next round. In Round 2, 56/102 (55%) of measures were approved. In Round 3, each measure obtained a median score of 3 or greater, indicating a high level of approval. The median response for all measures combined was 4. Only 37/56 (66%) individual measures had a median score of 4 or higher and were included in the final reporting tool (Table 3). These measures describe team logistics (15), treatment (10), public health (6), disposition (4), and evaluation (2). Eleven of thirty-seven (30%) were quantitative rates.
Measures graduating through all three rounds of the group consensus process. Rate units are suggested in parentheses. Measures are sorted into major categories. Within each category, measures are presented in rank order by median score. Mean values are used only to sort within common median scores.
a. Quantitative rate.
Discussion
Using a modified Delphi method, the authors identified 37 MOEs for disaster medical response during the acute phase following a natural disaster. Categories pertaining to treatment and team logistics had the greatest number of measures. This may reflect the focus of response teams and the source of their greatest challenges in delivery of care. The measures with the highest median scores describe team organization (response times, incorporation of local medical staff, and establishment of clear chain of command) and assessment (number of operating health care facilities and needs assessment performed by an advanced team).
The majority of measures presented are qualitative; most of those can be answered dichotomously. Two measures (basic life support measures available and method of hand-off) are posed as open-ended questions. Only eleven rate measures graduated through all three rounds. Such measures require data-intensive reporting, which may not be feasible. These measures offer detail on severity of illness (number of pediatric patients requiring ventilation and rate of transport of critical patients), treatment performance (number of patients treated and treatment of contaminated wounds with debridement and antibiotics), and outcomes (mortality rates of critical patients and cause-specific mortality rates). Additional rate measures were suggested; most did not survive the consensus process.
These measures provide an opportunity for “real-time” evaluation during an acute response. It is critical that they balance the need for detailed information with the limitations of a medical response team working in austere conditions, and without the aid of data collection personnel. Such a reporting tool allows rapid assessment of effectiveness, which should be done systematically during provision of medical care.
Much has been written on the need for uniform reporting and establishing standards of care in humanitarian response, but little has been suggested in terms of objective measures that are indicative of impact. Group consensus methods provide a formalized process to elicit general opinion on a topic where little evidence exists and where sufficient experimental opportunities are not practical.15,22 The Delphi process was selected to elicit expert opinion in a manner that would allow self-validation of findings. While the identified measures are merely the product of expert deliberation and have yet to be tested in an actual disaster, they represent the next best step towards objective indicators. By providing performance targets for response agencies, attention can be placed on important considerations, such as prearrival coordination with overseeing agencies and transition of care upon completion of mission.
Without such targets, the community of disaster responders risks repeating the challenges of prior disaster efforts: limited coordination of responders, suboptimal utilization of aid, and impacted populations unable to access life-saving care in the critical period following a major event. Evaluation of outcomes should be used to compare organizational performance, thereby allowing funding to be directed to the most effective organizations. It could also enable governments of impacted countries to select organizations with better performance to participate in the response. The Sphere Project took a monumental step when it elaborated its Minimum Standards for disaster response. The present work seeks to extend those efforts and provide greater accountability in the acute phase of a disaster medical response.
Validation is required. The list of measures should be evaluated by agencies providing acute medical care in the days and weeks following the next major natural disaster. Additionally, these acute response measures should be linked to subsequent recovery efforts in an attempt to identify which measures are predictive of reductions in morbidity and mortality, and establishment of a sustainable and effective humanitarian effort.
Limitations
There were limitations to this study. Participants in the Expert Panel were nominated by the steering committee, potentially introducing selection bias. A larger panel of subjects may also have been beneficial to achieve thematic saturation; however, most studies of this design utilize 15-20 participants.18 United States experts were disproportionately represented, despite a large number of international experts being approached. Response rate was not correlated with international status. It is unclear why so few people completed all rounds of the study. This may have been secondary to the complexity or repetitiveness of an iterative survey process. Further investigation into this might be helpful if the study is to be replicated. A single reviewer extracted measures during a review of PubMed articles and a search through online grey and unpublished literature. This may have introduced classification bias, and the reliance on published sources may reflect publication bias. However, the impact of potential measure omissions may have been negated in part by the ability of both groups to add free-text measures. Key words were not defined using a common standard, and respondents may have interpreted items differently. The consolidation of redundant measures and classification of free-text measures could also have introduced classification bias. The authors attempted to decrease this risk by having three authors review measures using a predetermined process.
All measures in Round 3 received a median rating of 3 or greater, indicating that all measures were perceived as having "value in some situations." An overly long list of evaluation measures can be cumbersome, so a criterion was put in place to refine this group of 56 to a more manageable 37. In some cases, a single respondent rating a measure one point higher or lower could have changed the median score, thereby qualifying or eliminating a measure from inclusion. Selected measures reflect the views of the subject matter expert panel; a different panel may have selected others. For this reason, data on all measures from Round 3 have been included (Table 3; Appendix F, available online).
Conclusion
Group consensus methods were used to develop measures describing the functions of acute medical response to major natural disasters. These measures can facilitate detailed description of agency contributions and allow real-time assessment of performance. This is a crucial step in linking early actions and outputs to long-term outcomes and impact. Development of standards helps ensure that disaster care counts when it is most needed; this work is a step towards developing those standards.
Note: The views expressed in the paper are solely those of the authors and do not necessarily reflect the views, policies, or official position of the Royal Canadian Air Force, the Canadian Department of National Defence, the US Government, the US Department of Defense, or the US Department of Health and Human Services.
Acknowledgements
Funding was provided by the US Department of Health and Human Services, Emergency Medical Services for Children Grant. The authors thank Susan Torrey, Deborah Hsu, Manish Shah, and Thomas Kirsch for their mentorship and support.
Supplementary materials
To view supplementary material for this article, please visit http://dx.doi.org/10.1017/S1049023X14000922