Introduction
Managing the psychological impact of a disaster is a critical public health challenge. Reliable estimates of the incidence and prevalence of mental health disorders are essential to effective service planning in the aftermath of a disaster.Reference Dussaillant and Apablaza 1 , Reference McFarlane, Van Hoof and Goodhew 2 But, as recent academic literature discusses, there is a shortfall of reliable systems for translating available data into public health tools.Reference Ursano, Fullerton and Benedek 3 This gap must be filled, especially since such systems may be of significant use for emergency response planning. One of the biggest challenges faced by disaster researchers and by disaster management and prevention practitioners is identifying the population at risk as precisely as possible.Reference Samarasundera, Hansell, Leibovici, Horwell, Anand and Oppenheimer 4
In terms of mental health, posttraumatic stress disorder (PTSD) is probably one of the most frequent and debilitating consequences of a disaster.Reference Dominici, Levy and Louis 5 The disorder can become chronic and enduring, with lifelong effects that might even escalate over time.Reference Dussaillant and Apablaza 1 , Reference McFarlane, Van Hoof and Goodhew 2 But research has shown that early interventions have some effectiveness in preventing the disorder.Reference Galea, Nandi and Vlahov 6 - Reference Kliem and Kroger 8 Therefore, PTSD should be one of the main targets of emergency mental health interventions in the aftermath of a disaster. However, these interventions are costly. To improve their cost-effectiveness, the right choice of intervention targets is central, and making that choice requires predicting population risk.Reference Shalev, Ankri and Israeli-Shalev 9 The predictability of local aggregate measures of posttraumatic stress (PTS) symptoms is valuable in that it points to localities where the problem may become most widespread. Thus, it aids disaster management professionals in the challenge of assigning scarce mental health personnel to different geographic locations.
The enormous individual heterogeneity of response to the environmental shock, in terms of the emergence (or not) of PTS symptomatology, has not been duly understood, even now. Yet, some risk factors have been identified through abundant research, including several reviews.Reference Dussaillant and Apablaza 1 , Reference Dominici, Levy and Louis 5 , Reference Kessler, Rose and Koenen 10 - Reference Norris, Friedman, Watson, Byrne, Diaz and Kaniasty 13 For example, the development of PTSD has been consistently found to correlate with disaster exposure, type, and severity. Human loss (death of a relative) and physical injury also have been found to be associated with the symptoms. Material loss, especially (but not exclusively) housing damage, is related to the occurrence of post-disaster PTSD. Previous history of mental problems also has been found to be closely associated with the development of PTSD in the aftermath of a disaster. Females appear to be more prone to acquire the disorder, whereas the elderly seem to be more resilient. Socioeconomic status and poverty have been found to be risk factors for PTSD.
Although these risk factors have been identified, even after accounting for the standard sociodemographic controls or the abovementioned risk factors, it is still very difficult to predict whether a specific person will suffer PTS. Human beings are heterogeneous in multiple dimensions not accounted for in most studies, such as the biological predisposition to mental disorders, cognitive and emotional types, and personality. These factors influence mental health in ways that are only partially understood.Reference McFarlane, Van Hoof and Goodhew 2 , Reference Shalev, Ankri and Israeli-Shalev 9 , Reference Norris, Friedman, Watson, Byrne, Diaz and Kaniasty 13 This unobserved individual heterogeneity makes it difficult to predict the emergence of PTSD at the individual level using standard covariates.Reference Norris and Wind 14 But when predicting aggregate measures of PTSD (ie, local mean scores, prevalence), the within-group variation is removed and better predictive power may be obtained. This study tests whether that is the case.
This analysis aims to advance this line of research by deriving simple algorithms to predict the prevalence of PTSD and the distribution of symptoms in locations where an earthquake has struck. The starting point in the search for this “rule of thumb” is a very rich database with plenty of possibilities to model PTS prevalence. The same database would therefore also allow more complex models to be estimated.
But the main interest of this specific study is to derive simple but predictive algorithms to be applied in an emergency context. This, and not a complex algorithm requiring difficult-to-find data, is what is needed in a real-world setting to rapidly assign emergency mental health professional assistance to the different locations hit by a disaster. The objective is to derive a predictive algorithm under the assumption of data scarcity, and therefore, efforts will be made to obtain good prediction with the least possible information requirements. Data averaged at the local level are more immediately available after the disaster strikes, compared to individual-level information. This is because local poverty levels, local average educational levels, or local unemployment statistics are usually obtained through representative surveys where only a fraction of the population is interviewed. Therefore, individual-level data will probably not be readily available after the disaster strikes, but aggregate-level data may be easier to obtain. It is for that reason that this paper focuses on aggregate-level predictors.
Methods
Data
The Post-Earthquake Survey (EPT; Spanish acronym) database contains longitudinal (two-panel) data about the same persons before and after a major disaster. The database is anonymized, as committed to in the informed consent included in the interview protocols. It can be downloaded, together with accompanying manuals, from the Ministerio de Desarrollo Social Site.Reference Zubizarreta, Cerdá and Rosenbaum 15
It encompasses nationally representative data from a household survey gathered in November and December 2009, a few months before the 2010 earthquake and tsunami that hit Chile. The database was complemented by post-disaster follow-up information, since the Chilean government re-interviewed a representative subsample of 22,456 of the original 71,460 households between May and June 2010. The follow-up asked about several disaster-related and socioeconomic issues, and respondents were requested to complete the Davidson trauma battery, 16 a self-report instrument used to evaluate PTS symptoms.
This trauma battery consists of 17 items, each corresponding to a PTS symptom, as described by the Diagnostic and Statistical Manual of Mental Disorders, 4th edition.Reference Davidson, Book and Colket 17 The questions were worded so that PTSD symptoms were assessed specifically in relation to the earthquake/tsunami. Each item is rated twice on a five-point scale, once in terms of frequency (increasing from “not at all” to “every day”) and once in terms of severity (increasing from “not at all distressing” to “extremely distressing”). The Appendix (available online only) provides a list of the 17 items. Since the frequency scores for each item range from zero to four and the severity scores for each item also range from zero to four, the total score per item ranges from zero to eight points. When adding up the scores for the 17 items, a PTS total is obtained, which ranges from a minimum score of zero to a maximum of 136. Only respondents who were present at the moment of the interview were asked to answer the battery, and this resulted in 23,907 valid PTS score values for individuals aged 18 or older. At least one adult person from 21,059 of the households included in the sample responded to the battery.
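The scoring rule described above can be sketched in a few lines. This is a minimal illustration, not the survey's own processing code; the function name `pts_total` and the input layout (two lists of 17 ratings) are assumptions made here.

```python
from typing import Sequence

N_ITEMS = 17      # symptoms in the battery
MAX_RATING = 4    # frequency and severity are each rated from 0 to 4


def pts_total(frequency: Sequence[int], severity: Sequence[int]) -> int:
    """Add the frequency and severity ratings of the 17 items (range 0 to 136)."""
    if len(frequency) != N_ITEMS or len(severity) != N_ITEMS:
        raise ValueError("expected 17 frequency and 17 severity ratings")
    for rating in list(frequency) + list(severity):
        if not 0 <= rating <= MAX_RATING:
            raise ValueError("ratings must lie between 0 and 4")
    # Each item contributes between 0 and 8 points.
    return sum(frequency) + sum(severity)
```

A respondent who rates every item at the maximum on both scales reaches 17 × 8 = 136 points, the upper bound stated in the text.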
Several municipality-level variables were generated using the EPT. Poverty, unemployment, and rurality prior to the earthquake were constructed as the weighted average of individual indicator variables. The EPT defines as rural the zones with fewer than 1,000 inhabitants or the zones with between 1,000 and 2,000 inhabitants in which less than 50% of the active population work in the secondary or tertiary sectors. 18 Local inequality indexes (Gini Index and Theil Index) prior to the earthquake were estimated using a weighted measure of the total household income. The variables for the proportion of complete household destruction and the proportion of severe household damage were constructed using the responses to a question that asked each respondent whether their house had been completely destroyed by the earthquake/tsunami, had been severely damaged, had undergone some minor damage, or had no damage at all.
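As an illustration of how such municipality-level aggregates can be built from weighted survey records, the sketch below computes a weighted share (for indicators such as poverty or rurality) and a weighted Gini index. The function names are hypothetical, and the Gini estimator shown (trapezoid rule under the weighted Lorenz curve) is one common choice assumed here; the source does not state which estimator was used.

```python
from typing import Sequence


def weighted_share(indicator: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted average of a 0/1 indicator, e.g. the local poverty rate."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w for flag, w in zip(indicator, weights) if flag) / total


def weighted_gini(income: Sequence[float], weights: Sequence[float]) -> float:
    """Gini index from the weighted Lorenz curve (trapezoid rule)."""
    pairs = sorted(zip(income, weights))          # poorest household first
    total_w = sum(w for _, w in pairs)
    total_y = sum(y * w for y, w in pairs)
    if total_w <= 0 or total_y <= 0:
        raise ValueError("need positive total weight and total income")
    cum_prev = 0.0   # cumulative income share before the current household
    area = 0.0       # twice the area under the Lorenz curve
    for y, w in pairs:
        cum = cum_prev + y * w / total_y
        area += (w / total_w) * (cum_prev + cum)
        cum_prev = cum
    return 1.0 - area
```

With identical incomes the index is zero; with two equally weighted households earning 0 and 1 it is 0.5.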
The EPT data were complemented with information on the strength of the earthquake and the tsunami, the history (and intensity) of aftershocks, and death rate at municipality level (203 municipalities). The intensity of the earthquake was quantified through peak ground acceleration (PGA), a measure that describes, in a broad sense, how hard the earth shakes in a given geographic area. Using the values provided by the United States Geological Survey (USGS), a research team led by José Zubizarreta estimated the PGA in each of the municipalities where the EPT was collected.Reference Norris and Wind 14 They obtained one value for each municipality using the PGA grid provided by the USGS. Municipality values correspond to the inverse distance weighted average of the three closest grid estimates. Their interpolated data were used in the estimation.
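The interpolation step can be sketched as follows. This is only an illustration of inverse distance weighting over the three nearest grid points: the function name `idw_pga`, the planar (x, y) coordinates, and the distance exponent of 1 are all assumptions, since the source does not describe how distances were measured or which exponent was used.

```python
import math
from typing import List, Tuple

GridPoint = Tuple[float, float, float]  # (x, y, pga); planar coordinates assumed


def idw_pga(x: float, y: float, grid: List[GridPoint],
            k: int = 3, power: float = 1.0) -> float:
    """Inverse-distance-weighted mean of the k grid values nearest to (x, y)."""
    if not grid:
        raise ValueError("grid is empty")
    nearest = sorted(grid, key=lambda p: math.hypot(p[0] - x, p[1] - y))[:k]
    weights, values = [], []
    for gx, gy, pga in nearest:
        distance = math.hypot(gx - x, gy - y)
        if distance == 0.0:
            return pga  # the location coincides with a grid node
        weights.append(1.0 / distance ** power)
        values.append(pga)
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)
```

For real longitude/latitude grids, a great-circle distance would be a more appropriate replacement for `math.hypot`.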
To measure the intensity of the tsunami, local geo-referenced data on height of the waves and horizontal inundation from the Global Historical Tsunami Database at the National Geophysical Data Center (NGDC), part of the National Oceanic and Atmospheric Administration (NOAA; Silver Spring, Maryland USA), 19 were used. When more than one observation was documented, the highest wave registered on the coast of each municipality and the longest inundation record were used. In the case of locations with no information, the tsunami data were interpolated according to a north-south rank ordering of the coastal municipalities. In coastal locations where there was no tsunami information in any nearby municipality, it was assumed that there was no alteration of the sea, and a value of zero was assigned to the indicators. The same value of zero was assigned to non-coastal municipalities.
Local data on the intensity (measured in the Modified Mercalli Scale; MMS), date, and location of the earthquake aftershocks occurring between February 27 and May 1, 2010, when EPT fieldwork commenced, also were available for this study. These data were obtained from the USGS, 20 which gathered the data using the “Did You Feel It?” method. 21 , Reference Atkinson and Wald 22 For municipalities where there was no measurement, the mean intensity for the province (the administrative division that follows in size, grouping several municipalities) was assigned. Where there was no measurement at the provincial level, null intensity (ie, no aftershock) was assumed. Several variables that grouped aftershocks by intensity and counted them as they occurred between February 27 and May 1 were constructed.
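The fallback rule for municipalities without a measurement can be sketched as below. The function name and the dictionary layout are hypothetical, and the provincial mean is taken here over the municipalities of the province that do have a measurement, which is an assumption about how that mean was formed.

```python
from typing import Dict


def aftershock_intensity(municipality: str,
                         province_of: Dict[str, str],
                         measured: Dict[str, float]) -> float:
    """Measured MMS value, else the provincial mean, else 0 (no aftershock)."""
    if municipality in measured:
        return measured[municipality]
    province = province_of[municipality]
    # Measurements from other municipalities in the same province.
    peers = [value for name, value in measured.items()
             if province_of.get(name) == province]
    return sum(peers) / len(peers) if peers else 0.0
```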
Finally, the number of deaths per municipality (due to the earthquake/tsunami) was obtained from the Statistics Unit at the Chilean Forensic Services Department.Reference Wald, Quitoriano, Dengler and Dewey 23 These data were converted into death rates by using the municipality population figures obtained from the Chilean National Institute of Statistics (Santiago, Chile).Reference Nahuepan and Varas 24 Table 1 provides a list of variable names and explains their particular construction.
Table 1 Sets of Covariates Used in the Estimations.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170728025717-66785-mediumThumb-S1049023X17000206_tab1.jpg?pub-status=live)
Note: L=enters linearly; QP=enters as a quadratic polynomial; CP=enters as a cubic polynomial; Fn=enters as a factor variable with n factors. Preferred specification in grey.
PGA=peak ground acceleration.
Destruction=proportion of completely destroyed households.
Damage=proportion of households with severe damage (but not completely destroyed).
Rurality=proportion of the adult (≥18) population that lives in rural zones.
Death Ratio=deaths per 10,000 local inhabitants.
Horiz=length of entry of the sea into the land. In meters.
Horiz200=simplified version of horiz. The value of horiz is rounded to the nearest multiple of 200.
Water Height=height of the highest tsunami wave recorded in the coast of the locality. In meters.
Poverty=proportion of the population 18 or older that falls below the poverty line.
Aftershocks (7 vars)=seven variables each describing the number of aftershocks of a certain intensity (1 to 2 MMS; 2 to 3 MMS; 3 to 4 MMS; …; 7 to 8 MMS).
Aftershocks 4 to 8 MMS=total number of aftershocks from 4 MMS to 8 MMS between February 27 and May 1.
Unemployment=proportion of the local active population that is currently unemployed.
Gini=Gini Index of Inequality.
Theil=Theil index of inequality.
Statistical Methods
Several statistical analyses were performed using STATA 13.0 (StataCorp LLC; College Station, Texas USA). Different methods were used, depending on the variable to be predicted: PTS scores averaged at the municipality level, measures of the prevalence of PTS scores above thresholds, or centiles of the municipal PTS score distribution. To obtain these aggregate measures, the weights provided in the survey were utilized. Since only the household members present at the time of the interview responded to the PTS battery, some doubt may arise regarding the appropriateness of the sample weights used to generate the aggregates. As a robustness check, parallel analyses with un-weighted aggregates (not reported) were performed and returned similar results.
Predicting PTS Average Scores (within Municipality)
Linear models were estimated with ordinary least squares with robust errors. Equations 1 to 4 depict the estimation process, with $Y_{ij}$ representing individual PTS scores and $X_j$ a set of covariates that will be described in the Estimation section of this paper. Coefficients estimated for the specification described in Equation 1, $\hat{\alpha}$ and $\hat{\beta}$, were used to generate municipality-level aggregate predictions (Equation 2). These predictions were then regressed with the empirical municipality-level average values ($\bar{Y}$ as defined in Equation 3) to check the similarity between empirical and estimated aggregates, as shown in Equation 4. Coefficients of determination (R2) and root mean square errors (RMSE) for this last step are the measures of goodness of fit that were chosen to report.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170727094706501-0202:S1049023X17000206:S1049023X17000206_eqnU4.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170727094706501-0202:S1049023X17000206:S1049023X17000206_eqnU5.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170727094706501-0202:S1049023X17000206:S1049023X17000206_eqnU6.gif?pub-status=live)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170727094706501-0202:S1049023X17000206:S1049023X17000206_eqnU7.gif?pub-status=live)
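The two-stage procedure can be sketched as follows. This is a simplified illustration rather than the authors' Stata code: it uses a single municipality-level covariate entering linearly, the second stage is unweighted across municipalities, RMSE is computed without a degrees-of-freedom correction, and robust standard errors are omitted because they do not affect the coefficients or the fit statistics. The names `fit_line`, `two_stage_fit`, and the row layout are assumptions.

```python
import math
from collections import defaultdict
from typing import List, Sequence, Tuple

# (municipality, individual PTS score, survey weight, municipal covariate X_j)
Row = Tuple[str, float, float, float]


def fit_line(x: Sequence[float], y: Sequence[float],
             w: Sequence[float]) -> Tuple[float, float]:
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    b = sxy / sxx
    return my - b * mx, b


def two_stage_fit(rows: List[Row]) -> Tuple[float, float]:
    """Return (R2, RMSE) of empirical municipal means regressed on predictions."""
    # Stage 1: individual scores on the municipality-level covariate.
    a, b = fit_line([r[3] for r in rows], [r[1] for r in rows],
                    [r[2] for r in rows])
    # Aggregate: weighted empirical mean and model prediction per municipality.
    num, den, xj = defaultdict(float), defaultdict(float), {}
    for muni, score, weight, x in rows:
        num[muni] += weight * score
        den[muni] += weight
        xj[muni] = x
    munis = sorted(num)
    y_bar = [num[m] / den[m] for m in munis]
    y_hat = [a + b * xj[m] for m in munis]
    # Stage 2: empirical means on predicted means, one observation per municipality.
    c, d = fit_line(y_hat, y_bar, [1.0] * len(munis))
    resid = [yb - (c + d * yh) for yb, yh in zip(y_bar, y_hat)]
    ss_res = sum(e * e for e in resid)
    mean_y = sum(y_bar) / len(y_bar)
    ss_tot = sum((yb - mean_y) ** 2 for yb in y_bar)
    return 1.0 - ss_res / ss_tot, math.sqrt(ss_res / len(resid))
```

Because the covariates vary only across municipalities, every individual in a municipality receives the same first-stage prediction, so the aggregate prediction is simply the fitted value at $X_j$.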
Predicting PTS Prevalence
Prevalence was measured as the proportion of the sample that obtained a PTS score above a certain threshold. Thresholds were set at 20, 30, and 40 points on the Davidson scale. The team that validated the Spanish version of the battery proposed a cut-off score of 40 as the most efficient to determine clinical PTSD. 25 Nevertheless, several authors indicated that sub-syndromal PTSD does imply some form of disability (sometimes similar to that of the full-blown disorder), which deserves further study.Reference Dussaillant and Apablaza 1 , Reference Bobes, Calcedo-Barba and García 26 , Reference Stein, Walker, Hazen and Forde 27 It was, therefore, decided to study lower thresholds too (cut-off scores of 30 and 20). Models for prevalence were estimated at municipality level in one stage using ordinary least squares regressions with robust errors, as depicted in Equation 5. Here, Pj represents prevalence level in any of its definitions. Adjusted R2 and RMSE for these estimations are the goodness of fit measures that were chosen to report.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20170727094706501-0202:S1049023X17000206:S1049023X17000206_eqnU8.gif?pub-status=live)
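The dependent variable of these models, local prevalence, can be computed from individual scores as sketched below. The function name is hypothetical, and the use of a strict inequality (scores strictly above the threshold) is an assumption; the source does not say whether a score equal to the cut-off counts as a case.

```python
from typing import Sequence


def prevalence(scores: Sequence[float], weights: Sequence[float],
               threshold: float) -> float:
    """Weighted share of respondents scoring strictly above the threshold."""
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    above = sum(w for s, w in zip(scores, weights) if s > threshold)
    return above / total
```

Calling the function once per municipality with thresholds of 20, 30, and 40 would give the three prevalence definitions used in the analysis.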
Predicting Centiles of the Local PTS Score Distribution
Models for the 90th, 80th, 70th, and 60th percentiles of the distribution of PTS scores were estimated. Disaster exposure has a strong but heterogeneous effect on PTS symptoms. In representative samples, the distribution of PTS symptoms is highly skewed to the right, meaning that only a few individuals are high scorers. This is still the case even after a major disaster. The evidence indicates that PTS symptoms are dramatically but unevenly high among residents of strongly affected areas.Reference Norris and Wind 14 This is why an examination of the higher deciles of the score distribution might shed some light on the phenomenon. To achieve this, quantile regression was used.Reference Milliken, Auchterlonie and Hoge 28 The process is similar to that described above: in a first step, the model was estimated by quantile regression using individual-level PTS scores as the dependent variable and municipal-level covariates. A predicted value for the centile was obtained for each municipality. At the same time, the observed centile was obtained from the empirical distribution of each municipality. Finally, a regression of empirical centiles on predicted centiles was estimated. R2 and RMSE for this last estimation are the goodness of fit measures chosen to be reported.
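The empirical side of this comparison, the observed centile of each municipality's score distribution, can be sketched as below. This covers only the observed centile, not the quantile regression itself. The function name and the centile definition (the smallest score whose cumulative weight share reaches q) are assumptions; statistical packages offer several slightly different conventions.

```python
from typing import Sequence


def weighted_centile(scores: Sequence[float], weights: Sequence[float],
                     q: float) -> float:
    """Smallest score whose cumulative weight share is at least q (0 < q <= 1)."""
    if not 0.0 < q <= 1.0:
        raise ValueError("q must lie in (0, 1]")
    pairs = sorted(zip(scores, weights))
    if not pairs:
        raise ValueError("no observations")
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for score, weight in pairs:
        cumulative += weight
        if cumulative / total >= q:
            return score
    return pairs[-1][0]  # guard against rounding at q close to 1
```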
Estimation
Each of the aforementioned methods was applied to several sets of covariates, as shown in Table 1 (Table 1 also contains a brief description of each of the covariates used throughout the analysis and how they are measured). The objective was to identify a parsimonious model with the predictors which, in the context of a disaster, are standard and the easiest to find. Since the number of variables at hand was manageable, variable inclusion and exclusion was performed manually, and the assessment of the model was guided mainly by human expertise. Forward stage-wise regression, lasso methods, and least angle regressionReference Koenker 29 also were used for the intermediate assessment of whether any important feature of the data was being missed (results not reported). In these assessments, PGA and some household damage variables (Destruction, Damage, or Destruction+Damage) were always selected as the most informative, regardless of which of the methods was used. However, these statistical methods are not suitable for the final purposes, which include, in the choice of predictors, an assessment of the local availability of covariates.
Covariate sets described in Table 1 sometimes include subsets of variables that are highly collinear. This was observed while conducting the analyses. Variance inflation factors were estimated for each of the sets, with mean results that are frequently above standard thresholds.Reference Efron, Hastie, Johnstone and Tibshirani 30 High collinearity of the regressors is a problem in this type of analysis. To investigate the magnitude of the problem, accuracy of the models was studied using 10-fold cross-validationReference O’Brien 31 , Reference Geisser 32 of the R2 obtained in the first stage of each estimation. Results (not reported) indicate that the standard error of the R2 estimate is at or below 0.015 in most models.
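To illustrate what a variance inflation factor measures, the sketch below handles the simplest case of two regressors, where the factor for either one reduces to 1 / (1 − r²), with r their sample correlation. The function name is hypothetical. The example of a covariate and its own square is an illustration chosen here of how polynomial terms can be collinear; the source does not say which variables drove the high values.

```python
import math
from typing import Sequence


def vif_two_regressors(x: Sequence[float], z: Sequence[float]) -> float:
    """Variance inflation factor when the model has exactly two regressors."""
    n = len(x)
    if n != len(z) or n < 3:
        raise ValueError("need two equally long series with at least 3 points")
    mean_x, mean_z = sum(x) / n, sum(z) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    szz = sum((b - mean_z) ** 2 for b in z)
    sxz = sum((a - mean_x) * (b - mean_z) for a, b in zip(x, z))
    r_squared = sxz ** 2 / (sxx * szz)
    if r_squared >= 1.0:
        return math.inf  # perfectly collinear regressors
    return 1.0 / (1.0 - r_squared)
```

For instance, the values 1, 2, 3, 4 and their squares give a factor of 32.25, well above commonly used rule-of-thumb limits.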
Results
Table 2 provides the descriptive statistics of the dependent and independent variables used throughout the analyses. Although some of the analyses used individual data as dependent variables, for the sake of space, the table only describes the variables already aggregated to the municipality level. Also for the sake of space, descriptors for the quadratic forms of the variables or the addition of variables are not provided, although these transformations were included sometimes in the estimations. The table is divided into two blocks of variables. The first describes the set of dependent variables used throughout the analyses, and the second describes the set of covariates.
Table 2 Descriptive Statistics of the Variables Used in the Analyses (N=203)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170728025717-83981-mediumThumb-S1049023X17000206_tab2.jpg?pub-status=live)
Abbreviations: MMS, Modified Mercalli Scale; PTS, posttraumatic stress.
Table 3 shows results from the estimation of the different models. It shows the proportion of variance (adjusted R2) that was explained by the different sets of covariates described in Table 1. Adjusted R2 was selected as a measure of fit since it is widely used and its scale is the same regardless of the scale of the dependent variable. Table 4 reports the RMSE, an alternative measure of fit. Since RMSE is scale-dependent, model comparisons based on this statistic are possible only across cells that belong to the same column of Table 4. To get an idea of the magnitude of the statistic, it should be compared to the mean, maximum, and minimum empirical values shown in Table 2.
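For reference, the two fit statistics can be written out as follows. These are standard textbook definitions, not formulas given in the source; $n$ (number of municipalities), $k$ (number of regressors excluding the constant), and $P_j$, $\hat{P}_j$ (observed and fitted values) are notation assumed here. RMSE is shown with the residual degrees-of-freedom correction that regression software commonly applies; whether the reported values use that correction is not stated.

$$
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n-1}{n-k-1}, \qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n-k-1}\sum_{j=1}^{n}\left(P_j - \hat{P}_j\right)^2}
$$

Multiplying the dependent variable by a constant $c$ leaves $R^2$ unchanged but multiplies RMSE by $|c|$, which is why RMSE values for, say, a prevalence (between 0 and 1) and a mean score (between 0 and 136) cannot be compared with each other.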
Table 3 Variance Explained (Adjusted R2) for Models Aiming to Predict the Mean and Upper Centiles of the PTS Score Distribution, and PTS Prevalence Using 20, 30, and 40 Points on the Davidson’s Scale as Cut Scores
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170728025717-63727-mediumThumb-S1049023X17000206_tab3.jpg?pub-status=live)
Note: Preferred specifications in grey (N=203).
1 On a first stage Weighted OLS regressions with individual PTS score data were estimated (N=23907). On a second stage aggregate predicted values were compared to empirical aggregates through OLS regression at the municipality level. R2 reported are for second stage.
2 On a first stage weighted quantile regressions with individual PTS score data were estimated (N=23907). On a second stage aggregate predicted values were compared to empirical aggregates through OLS regression at the municipality level. R2 reported are for second stage.
3 OLS regressions with empirical local prevalence as dependent variable.
Table 4 Root Mean Square Error (RMSE) for Models Aiming to Predict the Mean and Upper Centiles of the PTS Score Distribution, and PTS Prevalence Using 20, 30, and 40 Points in the Davidson’s Scale as Cut Scores
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170728025717-18755-mediumThumb-S1049023X17000206_tab4.jpg?pub-status=live)
Note: Preferred specifications in grey (N=203).
1 On a first stage Weighted OLS regressions with individual PTS score data were estimated (N=23907). On a second stage aggregate predicted values were compared to empirical aggregates through OLS regression at the municipality level. RMSE reported are for second stage.
2 On a first stage weighted quantile regressions with individual PTS score data were estimated (N=23907). On a second stage aggregate predicted values were compared to empirical aggregates through OLS regression at the municipality level. RMSE reported are for second stage.
3 OLS regressions with empirical local prevalence as dependent variable.
Finally, Tables 5 and 6 show the estimated coefficients for two of the models estimated. These were the preferred specifications, given the constraints mentioned above. The choice of these preferred specifications is discussed in the next section of this document. Coefficients for the rest of the models are available from the authors upon request.
Table 5 OLS Regression Results, Average PTS Score, and Prevalence (Standard Errors between Parentheses)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170728025717-02847-mediumThumb-S1049023X17000206_tab5.jpg?pub-status=live)
Abbreviations: PGA, peak ground acceleration; PTS, posttraumatic stress.
* P<.05.
** P<.01.
*** P<.001.
Table 6 Quantile Regression Results: Quantiles 60, 70, 80, and 90 (Standard Errors between Parentheses)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary-alt:20170728025717-95722-mediumThumb-S1049023X17000206_tab6.jpg?pub-status=live)
Abbreviation: PGA, peak ground acceleration.
* P<.05.
** P<.01.
*** P<.001.
Discussion
Discussion of Main Results
A quick inspection of Table 3 shows that the different specifications are almost always capable of explaining more than 60% of the variance of the variables to be predicted.
Local average PTS score: the highest R2 (0.737) of all models is obtained when predicting the local average PTS score with the least restrictive of all the covariate sets of Table 1 (Covariate Set 0). But many other simpler specifications of the model for local average PTS score can explain more than two-thirds of its variance (R2>0.667).
Percentiles of PTS symptom distribution: fit improves for the higher percentiles (80th and 90th). This is good news since the main interest lies in predicting the right tail of the distribution. As discussed previously, a large majority of people will not present PTS symptoms even after a major natural disaster. The distribution of PTS symptoms is, therefore, highly skewed to the right, and its central values are not too informative about the real degree of the mental health problems that may have arisen with the disaster.Reference Norris and Wind 14
Prevalence: recall that a cut-off score of 40 is said to be the most efficient in the determination of clinical PTSD. 25 Nonetheless, alternative measures that include some subclinical scores were defined: three alternative prevalence values were constructed, one for the cut-score of 40 (prevalence40), another for the cut-score of 30 (prevalence30), and the last one for the cut-score of 20 (prevalence20). Table 3 shows that models for prevalence30 achieve better fit than the alternatives and, in many covariate specifications, reach R2 values at or above 0.66. Models for prevalence40 achieve lower R2 but still, in most covariate specifications, the statistic is at or above 0.6.
Choosing a Model
The task of choosing a model includes the need to take into account the potential availability or ubiquity of covariates: some covariates are more difficult to find or require more detailed data than others. For example, Horizontal Inundation is easier to measure in steps of 200 meters than down to the nearest meter (and therefore the variable Horiz200 is preferred to Horiz in Tables 1 and 2 as a covariate). Also, length of inundation (Horiz200 or Horiz in Tables 1 and 2) is preferred to height of the highest wave (water height), since the latter will not be observable in the aftermath of the disaster. Inequality indexes (Gini, Theil) and unemployment are more difficult to estimate than poverty, since the latter is constructed by the aggregation of simple indicators of whether an individual is or is not poor. Regarding the aftershock covariates, the sum of aftershocks that are clearly perceived by the population (MMS 4 and over; Tables 1 and 2) is preferred to a disaggregated group of variables each indicating the frequency of aftershocks of a certain intensity (MMS 1 to 2, MMS 2 to 3, and so on). Overall, variables related to the aftershocks are not the most preferred because their construction requires that some time elapse in the aftermath of the disaster. As already mentioned, the expectation is to make predictions as soon as possible after the disaster strikes. In addition, PGA is available worldwide from USGS at short notice after the earthquake has struck, and household destruction is evident immediately (although quantifying it in detail is more difficult, and therefore the variable that combines complete destruction and severe damage is preferred to considering them separately; Table 1). Household damage variables are preferred to death rate, since accurate information about the latter will become available only after a few days or weeks. Nevertheless, a rough estimation of death rate can be obtained with some speed.
With this in mind, inspection of Tables 3 and 4 can be performed in search of a set of covariates ensuring a reasonable fit while at the same time comprising information relatively easy to get in the aftermath of a disaster. In this analysis, Covariate Set 0 (Table 1) should be used as the reference, since it is the least restrictive in terms of covariate choice. As expected, Covariate Set 0 yields the best predictions, independently of the variable that is being explained. However, availability of these covariates in the aftermath of a disaster is unlikely. Something similar happens with the simpler specification that uses Covariate Set I. Some predictive power should be sacrificed in order to make the predictions more attainable.
A close inspection of Tables 3 and 4 indicates that Covariate Set III seems to work better than the less parsimonious Covariate Set II, independently (with few exceptions) of the variable to be explained. However, Covariate Set III is still too liberal for these purposes. When comparing the results from Covariate Sets IV to XII, a similar model fit is found, once again, independently of the variable being predicted. Covariate Set IX is especially interesting since it seems to dominate the rest regardless of the variable being explained (except when the dependent variable is the 90th percentile of the PTS score local distribution). Covariate Set IX includes PGA, percentage of households destroyed or severely damaged, death ratio, and local poverty levels. Of these, death ratio is perhaps the most difficult to find in the immediate aftermath of a disaster. If excluded, models can be estimated using Covariate Set XII without sacrificing too much predictive power. Covariate Sets XIII and XIV are much worse at predicting, meaning that it would not be advisable to use PGA and household destruction alone to guess PTS prevalence or score distribution.
Covariate Sets XV to XVII were included to check whether entering the main covariates in a linear (and not polynomial) fashion would suffice. A linear specification is preferred to the quadratic form since it would give a very straightforward rule of thumb for calculation. But Tables 3 and 4 show clearly that all these linear specifications are outstripped by the very simple Specification XIII, which only includes PGA and household destruction data, both entering in a linear plus quadratic form.
These considerations lead to the conclusion that the preferred models are those comprising Covariate Sets IX or XII. Linear specifications are not advisable since they are overcome by Covariate Set XIII, giving rise to a very simple two-covariate (PGA and Destruction+Damage) model where each covariate enters as a second-degree polynomial. But still, Specification XIII is easily improved when poverty is added as a predictor (and further improved with the death ratio).
The three covariate sets at the end of the list (XVIII, XIX, and XX) were devised to answer several questions that arise after having chosen Sets IX and XII as the preferred covariate sets. First of all, Covariate Set XVIII permits to check how model fit improves Specification IX when the death ratio enters the equations as a second-degree polynomial instead of only linearly. Results indicate that although fit is slightly improved with the new specification, improvement is low in absolute terms. Since death ratio data will probably be a very rough estimate (if they are available at all) in the immediate aftermath of a disaster, it may be preferable to insert them in the model only in a linear form.
Covariate Sets XIX and XX are alternative specifications devised to check how much model fit improves (compared to the preferred Specification XII) when the Destruction+Damage covariate is separated into its components. In Covariate Set XIX, both Destruction and Damage enter linearly, and in Covariate Set XX, both enter as polynomials. Specification XIX is worse than Specification XII across models. It is preferable, then, to use one rough measure of Destruction+Damage alone as long as it enters the model as a quadratic polynomial. Specification XX, in which each component of Destruction+Damage enters separately and as a polynomial, is superior to Specification XII only in predicting the average PTS score. The main interest of this research, though, lies in predicting the right tail of the distribution. With this in mind, the preferred specifications are still those that use Covariate Sets IX and XII.
Simple Algorithms
Tables 5 and 6 provide the coefficients that arise from the preferred specifications that have just been chosen. As discussed above, PGA, poverty, Destruction+Damage, and death rate are the only covariates included in the final choices. The importance of these specific covariates in the assessment of the risk of PTSD has been documented in previous literature. The exposure level has been documented as a fundamental determinant of mental health disorders.Reference Arlot and Celisse 33 , Reference North 34 Specifically, the relevance of earthquake intensity as an important predictor of PTSD and other mental disorders has been documented for disasters similar to Chile’s 2010 earthquake in other latitudes (eg, local effects were found in the analysis of the Christchurch 2010/2011 earthquakeReference Schultz, Espinel, Galea and Reissman 35 ). Also, severe, lasting, and pervasive mental health effects have been found to be associated with the degree of damage to property, loss of lives, and socioeconomic status.Reference Neria, Galea and Norris 12 , Reference North 34
The estimated coefficients in Tables 5 and 6 can be used as simple algorithms to predict PTS symptom prevalence and distribution. Estimation is straightforward when the data are available: the practitioner multiplies each coefficient by the corresponding variable and adds the results. For example, according to Table 5, to predict average PTS score in a location where PGA is 22 (g/100), rate of poverty is 0.5, and 20% of the households were destroyed or severely damaged, with no information about the death rate, the calculation would be: 0.947 + 0.072 × 25 + 0.006 × 25² + 69.297 × 0.2 - 65.658 × 0.2² + 48.070 × 0.25 - 87.580 × 0.25², and the prediction would render a local average of 39.4 PTS score points. Since the prediction is of average PTS score using Covariate Set IX, an RMSE of 5.823 PTS score points can be associated with this estimation from Table 4. But more interesting than the point estimates are the comparisons among locations that can be made using this tool. In other words, locations can be ranked by the relative importance of the mental health problem to be tackled, and such comparisons may be used to assign mental health personnel.
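The multiply-and-add rule and the comparison among locations can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation. The coefficients are those quoted in the calculation above, and their pairing with PGA, household destruction, and poverty is inferred from the order of the terms; Table 5 remains the authoritative source. The three localities and their covariate values are hypothetical.

```python
# Coefficients as quoted in the worked calculation in the text (average PTS
# score, no death-rate information). ASSUMPTION: the pairing of each
# coefficient with a variable is inferred from the order of the terms.
COEFFS = {
    "intercept": 0.947,
    "pga": 0.072,
    "pga_sq": 0.006,
    "destroyed": 69.297,
    "destroyed_sq": -65.658,
    "poverty": 48.070,
    "poverty_sq": -87.580,
}

def predict_avg_pts(pga, destroyed, poverty, coeffs=COEFFS):
    """Predicted local average PTS score.

    pga       -- peak ground acceleration in g/100
    destroyed -- share of households destroyed or severely damaged (0 to 1)
    poverty   -- local poverty rate (0 to 1)
    Each covariate enters as a second-degree polynomial.
    """
    return (
        coeffs["intercept"]
        + coeffs["pga"] * pga
        + coeffs["pga_sq"] * pga ** 2
        + coeffs["destroyed"] * destroyed
        + coeffs["destroyed_sq"] * destroyed ** 2
        + coeffs["poverty"] * poverty
        + coeffs["poverty_sq"] * poverty ** 2
    )

# Hypothetical localities, invented for illustration only.
LOCALITIES = {
    "A": {"pga": 30, "destroyed": 0.30, "poverty": 0.20},
    "B": {"pga": 10, "destroyed": 0.05, "poverty": 0.15},
    "C": {"pga": 20, "destroyed": 0.15, "poverty": 0.30},
}

def rank_localities(localities):
    """Return locality names ordered from highest to lowest predicted score."""
    scores = {name: predict_avg_pts(**x) for name, x in localities.items()}
    return sorted(scores, key=scores.get, reverse=True)

ranking = rank_localities(LOCALITIES)
print(ranking)
```

The ranking, rather than any single point estimate, is the output that would inform where to send personnel first. The predictions remain local aggregates and carry the RMSE reported in Table 4.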
It must be kept in mind that both the dependent variables and the covariates used in the analyses are local aggregates. Even though it was possible to make very accurate predictions, what was being predicted were aggregates and not individual outcomes. Therefore, if the algorithm indicates that a certain proportion of the local adult population will display PTS symptoms, individual subjects must still be screened using other tools. Estimation results contained in this document should not be used to make inferences about the predisposition of any particular individual(s) to have the condition. The fallacious nature of such inferences is extensively discussed in the literature.Reference Hogg, Kingham, Wilson, Griffin and Ardagh 36
Limitations and Suggestions for Future Research
The external validity of the results of this paper should be checked. The predictive capacity of the covariates identified in this paper and the stability of the coefficients found in the estimations should be assessed across disaster contexts and in other geographies. More research is needed to assess whether these simple algorithms can be applied in any setting.
Also, the results of this study should not be read as identifying causality, since the estimations are only intended to reflect correlational associations between variables. Along these lines, it is important to note that only insofar as pre-disaster PTS prevalence is uniform across locations can it be argued that this study of prevalence is actually informative about the incidence of the symptomatology after the earthquake. The assumption seems plausible since some homogeneity (of aggregate statistics) is observed when locations not struck by the disaster are studied. This feature of the sample is not reported in this paper due to space constraints. Detailed descriptive information on the sample is available from the corresponding author upon request. Moreover, although pre-disaster prevalence might have a role in the prediction of PTS symptoms in the aftermath, it is not likely that the high and significant coefficients for disaster-related variables in the estimations were only due to chance.
Good models to predict whether an individual will develop PTSD after a disaster are still required. Although some progress has been made towards that objective,Reference Shalev, Ankri and Israeli-Shalev 9 this strand of research is still developing. If it were possible to predict PTSD accurately and quickly at the individual level (without the need, costs, or time-intensiveness of professional screening), post-disaster mental health intervention could improve significantly.
Finally, PTSD is only one of several mental health problems that arise in the aftermath of a disaster.Reference Dussaillant and Apablaza 1 The estimates derived herein only point to PTSD prevalence and symptom distribution, since no data on other anxiety disorders, depression, substance abuse, panic disorder, or other mental disorders were available. Understanding the prevalence and local distribution of these other disorders is still an issue that deserves further research if the intention is to achieve an optimal design of treatment services.Reference Dussaillant and Apablaza 1 , Reference Piantadosi, Byar and Green 37 Nevertheless, there is evidence that some disorders such as depression, dysthymia, and substance abuse are frequently comorbid with (and therefore highly correlated with) PTSD.Reference McFarlane, Van Hoof and Goodhew 2 , Reference Piantadosi, Byar and Green 37 Although this point deserves more research, an estimation of PTSD prevalence might be a good proxy for the prevalence of the main mental health problems that arise due to a disaster.
Conclusions
After a major earthquake, the assignment of scarce mental health emergency personnel to different geographic areas is crucial to the effective management of the crisis. The scarce information that is available in the aftermath of a disaster may be valuable in helping predict where the populations most in need are located.
The analyses reported in this paper show that it is possible to devise simple algorithms to predict PTS prevalence and local PTS score distribution even in a setting in which information is limited, a scenario that is likely in the immediate aftermath of a large-scale disaster. When only including PGA, poverty rate, and household damage in linear and quadratic form, good predictive capacity was achieved. Simple algorithms to predict local prevalence and distribution of PTS symptoms using these variables were derived.
Developing algorithms that precisely identify individuals at high risk of PTSD or other mental disorders associated with disasters is one of the immediate challenges of research. It was not tackled in this study, which only examined local aggregates. Also, more research is needed to assess whether these simple algorithms can be applied in any setting.
Acknowledgments
The authors thank Dr. José Zubizarrieta who kindly shared his estimates of the earthquake’s PGA for each municipality. They also thank Dr. Anthony Rosellini for his helpful comments on an initial draft of this paper. This research used information from the Encuesta Post Terremoto. The authors thank the Chilean Ministerio de Desarrollo Social, intellectual proprietor of the survey, that made it available for this research. Results of this study are the sole responsibility of the authors and do not compromise the Ministerio de Desarrollo Social at all.
Supplementary Material
To view supplementary material for this article, please visit https://doi.org/10.1017/S1049023X17000206