Introduction
Concern has been expressed about violence committed by military personnel (Institute of Medicine, 2010; Department of the US Army, 2012). Between 305 and 399 violent felonies were committed per year during the years 2006–2011 for every 100 000 US Army soldiers (Department of the US Army, 2012), while close to 4% of soldiers in post-deployment surveys reported recent physical fights where they used a knife or gun (Thomas et al. 2010; Gallaway et al. 2012; Sundin et al. 2014; MacManus et al. 2015). The US Army has implemented several programs to address this problem (Department of Defense Instruction, 2014; Fort Lee Human Resources, 2014), but these programs are mostly universal interventions aimed at training all soldiers in basic violence prevention strategies. Cost-effective prevention sometimes also requires more intensive targeted interventions for individuals at high risk (Foster & Jones, 2006; Golubnitschaja & Costigliola, 2012). Actuarial methods are needed to determine who is at high risk of perpetrating violence (Skeem & Monahan, 2011; Fazel et al. 2012). A number of actuarial violence prediction tools have been developed for this purpose to screen psychiatric patients (Higgins et al. 2005; Monahan et al. 2005), incarcerated criminals (Berk & Bleich, 2014; Monahan & Skeem, 2014), and workers (LeBlanc & Kelloway, 2002; Meloy et al. 2013) for high-risk preventive interventions, but no such tool has been developed for Regular Army soldiers. One way to do so would be to use the administrative databases available for all soldiers to develop an actuarial model based on modern machine learning methods (Berk, 2008). Although it is unclear how well the variables in existing administrative databases could predict future violent crimes, these data were recently used successfully to develop an actuarial model for post-hospitalization suicides among US Army soldiers (Kessler et al. 2015). The current report presents the results of an attempt to develop a comparable model for violent crime perpetration among US Army soldiers. We focus on non-familial physical violent crimes, excluding family violence and sexual violence, based on evidence that their predictors are different from the predictors of non-familial physical violence (Marshall et al. 2005; Mohammadkhani et al. 2009; Elbogen et al. 2010a; Sullivan & Elbogen, 2014).
Method
Sample
The sample in which the model was developed (referred to in the machine learning literature as the ‘training sample’) was the Historical Administrative Data System (HADS) of the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS; Ursano et al. 2014). Army STARRS is an epidemiological-neurobiological study of risk and resilience factors for suicide and related outcomes in the US Army. The HADS was developed originally to provide administrative data on the correlates of soldier suicides. The HADS brings together data from 38 Army/Department of Defense administrative data systems (Supplementary Appendix Table S1) for the 975 057 Regular US Army soldiers serving between 1 January 2004 and 31 December 2009 (Kessler et al. 2013). The outcome variable was a dichotomous measure of the first accusation of a major physical violent crime, not occurring within the soldier's family, for which the Army found sufficient evidence to warrant a full investigation (although not necessarily enough for a conviction). Such an event was recorded in the administrative records of 5771 soldiers in the population.
As detailed below, we analyzed the HADS using discrete-time survival analysis (Willett & Singer, 1993) with person-month as the unit of analysis. That is, each month in the career of each soldier over the time interval between January 2004 and December 2009 was used as a separate observational record. We developed actuarial models to predict whether soldiers who had never been accused of committing a major physical violent crime were accused of doing so in each of those months. The independent variables in the models were administrative variables available for the soldier the month prior to the month of the accusation. Person-months were censored either at termination of Regular Army service or after the month when the crime occurred, whichever came first. There were approximately 37 million person-months in the HADS, 5771 of which were coded 1 on our dichotomous outcome variable. Rather than work with all possible control person-months (coded 0 on the outcome variable) in our analysis, we used the logic of case-control analysis (Schlesselman, 1982) to select a probability sample of control person-months that we weighted by the inverse of their probability of selection. Unbiased estimates of the odds ratios of significant independent variables in the model were obtained by analyzing all cases along with this weighted probability sample of control person-months. We then tested the model in an independent validation sample of 43 248 soldiers who participated in Army STARRS surveys carried out in 2011–2012 and who were subsequently followed in administrative records through 2013 (roughly 10 million person-months). The STARRS survey samples, which are described in detail elsewhere (Kessler et al. 2013), consisted of probability samples of soldiers at all phases of the Army career.
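The person-month layout and the weighted case-control sample described above can be illustrated with a minimal sketch. The record keys (`id`, `first_month`, `last_month`, `event_month`), the function names, and the simple Bernoulli sampling of controls are assumptions made for illustration. The study's control sample was a stratified probability sample, and the code below is not the authors' implementation.

```python
import random

def expand_person_months(soldiers, start, end):
    """Yield one record per soldier-month within [start, end].

    Each soldier is a dict with hypothetical keys 'id', 'first_month',
    'last_month' and 'event_month' (None if no event). Months are integer
    indices (e.g. 0 = January 2004). Follow-up stops after the event month.
    """
    for s in soldiers:
        lo = max(s["first_month"], start)
        hi = min(s["last_month"], end)
        for month in range(lo, hi + 1):
            is_case = s["event_month"] == month
            yield {"id": s["id"], "month": month, "y": int(is_case)}
            if is_case:
                break  # censor all later months for this soldier

def case_control_sample(records, control_fraction, seed=0):
    """Keep every case; keep each control with probability control_fraction.

    Sampled controls carry weight 1 / control_fraction (the inverse of their
    selection probability); cases carry weight 1.
    """
    rng = random.Random(seed)
    sample = []
    for r in records:
        if r["y"] == 1:
            sample.append({**r, "weight": 1.0})
        elif rng.random() < control_fraction:
            sample.append({**r, "weight": 1.0 / control_fraction})
    return sample

soldiers = [
    {"id": "A", "first_month": 0, "last_month": 5, "event_month": 3},
    {"id": "B", "first_month": 2, "last_month": 4, "event_month": None},
]
person_months = list(expand_person_months(soldiers, start=0, end=71))
analysis_sample = case_control_sample(person_months, control_fraction=0.5)
```

In expectation, the weighted control records sum to the number of control person-months in the full dataset, which is why weighted analyses of such a sample can reproduce population-level associations.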
Measures
Dependent variable
Data from five HADS datasets were combined to identify the date, type, and judicial outcome of all crimes that occurred over the study period. Crime types were classified using the Bureau of Justice Statistics National Corrections Reporting Program (NCRP) classification system (US Department of Justice, 2009). We then defined our outcome as first founded major physical violent crimes committed against someone other than a family member based on NCRP codes for murder-manslaughter, kidnapping, aggravated arson, aggravated assault, and robbery. A ‘founded’ crime is one for which the Army found evidence sufficient to warrant a full investigation. As noted above, 5771 soldiers met this definition. Such ‘founded’ cases exclude those that do not pass a test of probable cause based on review of the totality of the circumstances. This focus on founded offenses is consistent with other research (Army Suicide Prevention Task Force, 2010; Department of the US Army, 2012; Skeem et al. 2015; Steadman et al. 2015), which virtually always uses arrest rather than conviction as the dependent variable based on the fact that arrest records reflect actual violent behaviors much more closely than conviction records. Conviction records among founded cases, in comparison, largely reflect the vagaries of bureaucratic processing by the criminal justice system, including the fact that some soldiers with founded offenses escape conviction by accepting a Discharge Under Other Than Honorable Conditions (UOTHC) in lieu of court martial.
Our focus on first founded offenses is due to the fact that the vast majority of all founded major physical violent crimes in the US Army are first offenses (75% among men and 84% among women) and most repeat offenses are committed prior to initial apprehension. Recidivism in the classical sense (i.e. a repeat offense after being released from prison for the first offense) is rare in the military, as convicted major violent crime offenders typically receive a dishonorable discharge immediately after serving a sentence, while soldiers with founded crimes who accept UOTHC discharges are discharged immediately at the time of release from custody.
Our focus on non-familial physical violence (i.e. excluding familial violence and sexual assaults) was not based on the comparatively high prevalence of this type of crime. Indeed, the number of soldiers with founded non-familial major physical violence was smaller during the years of this study (n = 5771) than the number with familial physical violence (15 154), although non-familial sexual violent crime (6198) was much more common than familial sexual violent crime (718). However, as noted in the introduction, previous research suggests that the predictors of these different types of violent crime vary (Marshall et al. 2005; Mohammadkhani et al. 2009; Elbogen et al. 2010a; Sullivan & Elbogen, 2014), leading us to focus on each of them separately. While the current report presents the results of our model-building efforts to predict non-familial major physical violence, separate reports will present the results of attempts to build comparable models for the other types of violence.
Potential predictors
Numerous epidemiological studies have examined predictors of violence among active-duty military personnel (Killgore et al. 2008; Gallaway et al. 2012; MacManus et al. 2012a, b, 2013) and veterans (Jakupcak et al. 2007; Elbogen et al. 2010b, 2012, 2013, 2014a, b; Hellmuth et al. 2012; Sullivan & Elbogen, 2014). A recent review (Elbogen et al. 2010a) organized the significant predictors in these studies into four broad categories: socio-demographic and dispositional (e.g. sex, race-ethnicity, personality); historical (e.g. childhood experiences, military career experiences, prior violence); clinical (e.g. mental and physical disorders); and contextual-environmental (e.g. access to weapons). Given that our analysis was carried out opportunistically (i.e. selecting our measures of potential predictors from administrative data collected for other purposes), we were not able to operationalize all the significant predictors in previous studies. However, 446 HADS variables were found that could be used as indicators of previously documented predictors. These included 21 socio-demographic variables, 38 variables defining military career experiences, 66 variables representing prior crime perpetration and victimization, 282 clinical variables (treated mental and physical disorders and medications), and 39 contextual-environmental variables (e.g. unit characteristics defined at the battalion level, registered weapons). A complete description of the independent variables is available in Supplementary Appendix Tables S3–S6.
Analysis methods
Data analysis was carried out remotely by analysts from Harvard Medical School on the secure Army STARRS Data Coordination Center server at the University of Michigan. De-identified HADS analysis was approved by the Human Subjects Committees of the Uniformed Services University of the Health Sciences for the Henry M. Jackson Foundation (the primary Army STARRS grantee), the University of Michigan, and Harvard Medical School. The governing IRBs did not require obtaining informed consent from individual soldiers because the data were de-identified.
Cross-tabulations were used to calculate outcome incidence. Incidence was expressed as the number of founded accusations per 1000 person-years for descriptive purposes. However, model-building was not based on an incidence analysis, but rather on discrete-time person-month survival analysis (Willett & Singer, 1993). This is an important distinction because previous research has shown that the examination of risk factors based on incidence analysis can yield inaccurate results (Kraemer, 2009). It is noteworthy in this regard that our models examined risk factors for the first occurrence of a founded major physical violent crime in each month of the career of each soldier in the Army between January 2004 and December 2009. The models allowed for time-varying values of the risk factors as the vast majority of variables had values that changed over time (e.g. the soldier's rank, time in service, history of prior healthcare visits, etc.). Because of these time-varying values, the model could assign a different predicted probability of the outcome to a single soldier each month, allowing us to examine the possible existence of critical high-risk time periods in the careers of individual soldiers. This, coupled with the fact that the independent variables in our models are routinely updated for each soldier each month, means that the Army could use our models to generate a new predicted probability of committing a violent crime in the next 1, 6, or 12 months (or over any other desired future risk period) for each soldier each month. Given the existence of hypotheses suggesting that risk factors for violence are different for men and women (Whittington et al. 2013), separate sex-specific models were developed.
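One standard way to turn monthly discrete-time hazards into a risk over a longer horizon is to take the complement of the product of the monthly survival probabilities. The sketch below illustrates that identity only. The text does not state how multi-month probabilities would be computed in practice, and the function name `cumulative_risk` and the assumption that future monthly hazards are known (or held at their current value) are introduced here for illustration.

```python
def cumulative_risk(monthly_hazards):
    """Probability of a first event within the horizon covered by the
    monthly hazards: 1 - prod(1 - h_t) over the months t in the horizon."""
    survival = 1.0
    for h in monthly_hazards:
        if not 0.0 <= h <= 1.0:
            raise ValueError("each monthly hazard must lie in [0, 1]")
        survival *= 1.0 - h
    return 1.0 - survival

# Illustration: a constant monthly hazard of 16.7 per 100 000 held for
# 12 months gives a 12-month risk of roughly 2 per 1000.
twelve_month_risk = cumulative_risk([16.7 / 100_000] * 12)
```

For hazards this small the result is very close to the simple sum of the monthly hazards. The two diverge only when monthly risks are large.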
The major challenge in developing an actuarial prediction model from a data array of this type is that the existence of such a large number of predictors introduces the possibility of over-fitting, which would lead to poor performance of the model when applied to future time periods. Machine learning methods are designed to minimize this problem by using iterative cross-validation to select a stable and optimal subset of predictors (Kohavi, 1995). A six-step process was used to achieve this end.
(1) Bivariate associations of temporally prior independent variables with the subsequent occurrence of the outcome were examined in our person-month dataset using proc logistic in SAS version 9.3 (SAS Institute Inc., 2010). This step was conducted in a pooled dataset across the entire 72-month study period using a logistic link function and included control variables for time (i.e. month and year) to adjust for temporal trends in crime rates.
(2) The functional forms of significant bivariate associations involving non-dichotomous independent variables were transformed to capture substantively plausible nonlinearities.
(3) Multivariate associations were estimated in logistic models that included all independent variables that were significant in bivariate analyses.
(4) As coefficients in the multivariate models were unstable, the method of 10-fold cross-validated forward stepwise regression was used to select the optimal number of significant independent variables to maximize the proportion of observed crimes found among the 5% of soldiers with highest cross-validated predicted risk. Ten-fold cross-validation is a method that estimates 10 separate stepwise models, each time holding out a separate 10% of the population, and then uses the coefficients from each 90% subsample to generate a predicted probability only for the 10% of the population in the hold-out subsample (Kohavi, 1995). Changes in model fit associated with the number of independent variables were then inspected in the aggregation of the 10 hold-out subsamples to determine the smallest number of independent variables needed to achieve optimal cross-validated prediction accuracy, thus minimizing risk of the over-fitting that often occurs when using stepwise regression analysis (Anderssen et al. 2006).
(5) A search for stable interactions among independent variables in the optimal stepwise model was carried out using the R package randomForest (RF; Liaw & Wiener, 2002). RF is a tree-based method that uses simulation across many different subsampled trees (in our models, 500 trees) to generate a single stable summary predicted outcome score capturing the significant interactions among the independent variables (Svetnik et al. 2003). The incremental improvement in fit achieved by using RF was determined by adding a variable representing the RF predicted probability to the optimal regression equation estimated in the previous step and determining the extent to which this led to an increase in the proportion of crimes committed by the 5% of soldiers with highest cross-validated predicted risk.
(6) The R package glmnet (Friedman et al. 2010) was then used to estimate elastic net penalized regression models. Penalized regression models trade off a small amount of bias in coefficients to increase the efficiency and stability of estimates (Zou & Hastie, 2005).
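The hold-out logic of the 10-fold cross-validation used in step 4 can be sketched generically as follows. The `fit` and `predict` callables, the helper names, and the toy group-rate model are hypothetical stand-ins introduced for illustration. The analysis itself used stepwise logistic regression in SAS and R, which this sketch does not reproduce.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Randomly partition indices 0..n-1 into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_predictions(X, y, fit, predict, k=10, seed=0):
    """Predict each observation from a model fitted without its own fold."""
    preds = [None] * len(y)
    for holdout in k_fold_indices(len(y), k, seed):
        held = set(holdout)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in holdout:
            preds[i] = predict(model, X[i])
    return preds

# Toy model (an assumption for illustration): predicted risk is the
# observed outcome rate of the observation's group in the training folds.
def fit_group_rates(X, y):
    totals, events = {}, {}
    for g, yi in zip(X, y):
        totals[g] = totals.get(g, 0) + 1
        events[g] = events.get(g, 0) + yi
    rates = {g: events[g] / totals[g] for g in totals}
    return {"rates": rates, "overall": sum(y) / len(y)}

def predict_group_rate(model, g):
    return model["rates"].get(g, model["overall"])

X = ["junior"] * 50 + ["senior"] * 50
y = [1 if i % 5 == 0 else 0 for i in range(50)] + [0] * 50
cv_preds = cross_validated_predictions(X, y, fit_group_rates, predict_group_rate)
```

Because every prediction comes from a model that never saw the observation it scores, accuracy computed on the pooled hold-out predictions is less prone to optimistic bias than in-sample accuracy.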
The coefficients in the optimal models were used to calculate the predicted probability of the outcome for each observation (person-month) in the dataset. The association between this predicted probability and the observed occurrence of the outcome (i.e. a given soldier actually being accused of committing one of the crimes considered here) was then used to calculate the area under the receiver-operating characteristic curve (AUC) as an estimate of model accuracy. In order to visualize this association, person-months were ranked by predicted probability from highest to lowest risk and then grouped into 20 categories of equal size (ventiles). The proportion of true-positive observations in each ventile was then calculated and graphed. Given the active debate about identifying high-risk individuals using information about race-ethnicity (Berk, 2009), all analyses were carried out both with and without race-ethnicity among the independent variables.
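The two accuracy summaries described above, AUC and the share of observed events falling in each ventile of predicted risk, can be computed from scratch as in the following sketch. The function names are hypothetical, the example is unweighted (the case-control weights used in the study are ignored here), and the rank-based AUC formula is the standard Mann-Whitney formulation rather than anything specific to this analysis.

```python
def auc(scores, labels):
    """Rank-based (Mann-Whitney) AUC with average ranks for tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # 1-based average rank of the tie block
        for t in range(i, j + 1):
            ranks[order[t]] = average_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative")
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def ventile_event_shares(scores, labels, groups=20):
    """Share of all observed events in each equal-size risk group,
    ordered from highest to lowest predicted risk."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n, total = len(order), sum(labels)
    shares = []
    for g in range(groups):
        chunk = order[g * n // groups:(g + 1) * n // groups]
        shares.append(sum(labels[i] for i in chunk) / total)
    return shares

scores = [i / 40 for i in range(40)]
labels = [1 if i in (0, 1, 38, 39) else 0 for i in range(40)]
shares = ventile_event_shares(scores, labels)
top_ventile_share = shares[0]  # share of events among the top 5% of risk
```

The first element of the returned list corresponds to the "proportion of observed crimes committed by those in the top ventile of predicted risk" used throughout the Results.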
Ethical standards
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008.
Results
Incidence of perpetration by sex, time-in-service, and deployment status
Among soldiers who had never before been accused of one of the crimes considered here during their Army career, an average of 16.7 out of every 100 000 men and 7.5 out of every 100 000 women were accused of doing so in a given month over the study period. These numbers can be expressed equivalently as incidence rates of 2.0/1000 person-years among men and 0.9/1000 person-years among women. Incidence was significantly higher among men than women ($\chi^2_1$ = 329.9, p < 0.001) and inversely related to time-in-service (from highs of 3.8/1000 person-years among men and 1.6/1000 person-years among women in the second year of service to lows of 0.2–0.1/1000 person-years after >20 years of service; $\chi^2_7$ = 104.2–2002.6, p < 0.001) (Table 1). Over 50% of first occurrences of the outcome occurred in the first 3 years of service. Incidence was significantly lower among currently deployed (0.9–0.3/1000 person-years) than never-deployed (2.4–1.1/1000 person-years) and previously deployed (2.3–0.9/1000 person-years; $\chi^2_1$ = 26.1–562.4, p < 0.001) men and women and generally declined with time-in-service in subgroups defined by the conjunction of sex and deployment status (Supplementary Appendix Table S7).
* Significant at the 0.05 level, two-sided test.
a 5771 Regular Army soldiers had first onsets of major physical violence perpetration between 1 January 2004 and 31 December 2009 out of the 975 057 Regular Army soldiers (821 807 men, 153 250 women) in active-duty service over that time period.
b n = Number of soldiers with first founded accusations of non-familial major physical violent crime perpetration in the time interval represented by the row.
c Percent of the total population person-months in the time interval represented by the row. Men had a total of 31 721 734 population person-months and women had a total of 5 181 659 population person-months.
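The equivalence between the monthly rates per 100 000 and the annualized rates per 1000 person-years reported above is a unit conversion, which the short sketch below checks against the published figures. The function name is hypothetical.

```python
def monthly_per_100k_to_annual_per_1000(rate_per_100k_per_month):
    """Convert events per 100 000 persons per month into events per
    1000 person-years (12 person-months = 1 person-year)."""
    return rate_per_100k_per_month / 100_000 * 12 * 1000

men_rate = monthly_per_100k_to_annual_per_1000(16.7)    # about 2.0
women_rate = monthly_per_100k_to_annual_per_1000(7.5)   # about 0.9

# The sex-specific person-month totals in footnote c add up to the
# "approximately 37 million" person-months cited in the Method section.
total_person_months = 31_721_734 + 5_181_659
```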
Building the models
While the majority of the 446 HADS variables had significant (0.05 level, two-sided tests) bivariate associations with the outcome among men (82.7%) and women (56.8%) (Supplementary Appendix Tables S8–S21), fewer (112 among men, 81 among women) entered the unrestricted stepwise model at the 0.05 level and fewer yet (24 among men, 15 among women) resulted in cross-validated improvements in overall model fit. AUC and the proportion of observed crimes committed by those in the top ventile of predicted risk were similar whether or not the RF summary variable was added to the optimal cross-validated set of independent variables (Supplementary Appendix Table S22), leading us not to include RF as part of the final models. Fit statistics were very similar in unpenalized and optimal penalized models (AUC = 0.81 in both models among men and 0.80–0.82 among women; 36.2–36.4% of observed crime among men and 31.3–33.1% among women in the top 5% of predicted risk) (Fig. 1). Incidence in the top ventile of predicted risk (which, in the screening scale literature, would be referred to as the ‘positive predictive value’ of the model at the 5% cut-point) was 14.7/1000 person-years in both the unpenalized and penalized models among men (7.4 times the total sample incidence) and 5.8–6.3/1000 person-years among women (6.4–7.0 times the total sample incidence).
Coefficients in the optimal models
Five socio-demographic variables among men and one among women were significantly associated with elevated risk in the optimal models: young age, minority race/ethnicity (non-Hispanic Black was the only significant socio-demographic variable among women), and not having at least some college education (Table 2). Seven Army career variables among men and six among women were also significant. Three with elevated risk were associated with early career stages: junior enlisted rank (E1–E4, men and women); intermediate enlisted rank (E5–E6, women); and 0–10 years-in-service (men). Three other career-related variables discriminated among commands, with Forces Command (women; responsible for ground forces) and Area-based Component Commands (men and women; responsible for Army operations in specific regions of the world) having elevated risks and Training and Doctrine Command (men; responsible for recruiting and training) having low risk. The other four significant career-related variables associated with elevated risk were early age at enlistment (women), being an infantryman (only possible for men during the years of data collection), not being currently deployed (men and women), and recent demotion (men).
OR, Odds ratio; CI, confidence interval; VIF, variance inflation factor; TRADOC, training and doctrine command; FORSCOM, forces command; ODD, oppositional defiant disorder; NCO, non-commissioned officer.
a The analysis sample included all person-months with the outcome plus a probability sample of all other person-months in the population stratified by sex and marital status (total case-control sample of 201 121 person-months; 187 316 for men, 13 805 for women). All records in the control sample were weighted by the inverse of probability of selection. All independent variables shown here were significant at the 0.05 level (two-sided test).
b One temporal control variable also stepped into the model for men (year of observation being in 2009) but is not shown here.
c Variance inflation factor (VIF) for the coefficient associated with independent variable $X_i$ in the above equation equals $1/(1 - R_i^2)$, where $R_i^2$ is the coefficient of determination of a regression equation in which $X_i$ is the dependent variable and all the other independent variables in the model are included as predictors of $X_i$. VIF > 5.0 is typically considered an indicator of meaningful multicollinearity (Stine, 1995).
d This was a categorical variable coded 0–4 (0 = 0 visits, 1 = 1–2 visits, 2 = 3–5 visits, 3 = 6–10 visits, 4 = 11+ visits). The percentage reported reflects that 14.0% of men had one or more days with outpatient visits in the past 3 months.
e Pooled across unit officers/NCOs in the past 3 months. These variables were standardized to have a mean of 0 and standard deviation of 1.
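The VIF definition in footnote c implies a direct correspondence between the 5.0 threshold and the share of a predictor's variance explained by the other predictors. The sketch below works through that arithmetic. The function names are hypothetical, and the code only restates the formula given in the footnote.

```python
def variance_inflation_factor(r_squared):
    """VIF = 1 / (1 - R^2), where R^2 comes from regressing one predictor
    on all the other predictors in the model."""
    if not 0.0 <= r_squared < 1.0:
        raise ValueError("R^2 must lie in [0, 1)")
    return 1.0 / (1.0 - r_squared)

def r_squared_at_vif(vif):
    """Invert the formula: the R^2 that produces a given VIF."""
    if vif < 1.0:
        raise ValueError("VIF cannot be below 1")
    return 1.0 - 1.0 / vif

# A VIF of 5.0 means the other predictors explain 80% of the variance
# of the predictor in question.
threshold_r_squared = r_squared_at_vif(5.0)
```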
Four indicators of past 12- to 24-month criminality among men and three among women were associated with significantly elevated risk in the optimal models: perpetration of any crime (men and women); two or more types of crime (men); verbal violent crime (women; e.g. blackmail, intimidation); minor physical violent crime (men); and any crime victimization (women). Ten clinical factors (past 3–12 months) were also significant: any outpatient mental disorder treatment (women) and number of such visits (men); outpatient treatment of conduct/oppositional defiant disorder (men), stress-related disorder (men), and alcohol or drug-induced disorder (women); inpatient treatment of major depression with psychosis or for stressors/adversities (women); sedative-hypnotic prescriptions (men); and suicide attempts (men). Finally, two contextual variables were significant among men: there was an inverse association with tenure of unit officers (median time-in-service); and a positive association with deployment experience of non-commissioned officers (NCOs; median time deployed).
Sensitivity analysis
To investigate the value of sex-specific models, we applied the coefficients from the unpenalized male model to the female sample and vice versa. Both AUC (0.80–0.79 for men and women, respectively) and the proportion of observed crimes committed by those in the top ventile of predicted risk (31.6–27.7% for men and women, respectively) remained elevated, although somewhat lower than in the same-sex models, showing that core variables in the models are similar but not identical for men and women. Model-building was also repeated after excluding race-ethnicity as an eligible potential predictor. Results were quite similar to those in models that included race-ethnicity (Supplementary Appendix Table S22).
As the models were designed to predict perpetration in a given month, further analysis was needed to evaluate prediction accuracy over longer time periods. We calculated the proportion of observed crimes committed by those in the top ventile of predicted risk for all possible 1-month, 6-month, and 12-month follow-up periods from January 2004 through January 2009 and in 20-month (January 2004–August 2005, September 2005–April 2007, May 2007–January 2009) and 30-month (January 2004–June 2006, July 2006–January 2009) intervals. (February–December 2009 were excluded because we did not have 12 months of follow-up data after these months.) The proportions were highest over 1-month periods (29.5–35.3%) (Table 3). This is to be expected given that some risk factors could have come into being only later (e.g. a new demotion), thereby leading to an increase in predicted risk with shorter time lags between predictors and the outcome. Nonetheless, the proportions remained elevated over 6-month (22.7–29.1%) and 12-month (18.3–24.1%) periods, documenting that most significant predictors are stable over these intervals of time. Proportions were also consistent across the five 20-month and 30-month time intervals, indicating that model stability was quite good over the years 2004–2009.
a Estimates are based on the predicted probabilities from the final total sample penalized models. February-December 2009 were excluded because we did not have 12 months of follow-up data after these months.
Although time-in-service was strongly related to risk of being accused of committing the outcome, the fact that RF did not improve model fit meant that no interactions were found between time-in-service and other independent variables. However, this might have been because we lacked adequate statistical power to detect these interactions due to the high proportion of the outcome occurring in the early years of service. We evaluated this possibility by examining the proportion of observed crimes committed by those in the top ventile of predicted risk within subgroups defined by time-in-service (Table 4). Unsurprisingly, the proportion of soldiers in the top ventile of predicted risk varied inversely with time-in-service among both men and women ($\chi^2_7$ = 2310.4–94.0, p < 0.001). However, when cut-points were recalibrated to focus on the 5% of soldiers at highest predicted risk within each time-in-service subsample, the association between time-in-service and the proportion of observed crimes committed by those in the top ventile of predicted risk became non-significant (31.2% among men, $\chi^2_7$ = 8.0, p = 0.33; 27.0% among women, $\chi^2_6$ = 8.2, p = 0.23).
* Significant at the 0.05 level, two-sided test.
a Estimates are based on the coefficients from the penalized models [mixing parameter penalty (MPP) = 0.5 for men; MPP = 0.1 for women].
b Ventiles were re-classified independently within each time-in-service group so the top ventile of predicted risk includes 5% of the person-months within each time-in-service category.
c There were only two person-months coded yes for a first occurrence of major physical violence perpetration among women with >20 years-in-service. One of the two had a predicted probability in the top ventile, resulting in an unstable standardized rate of perpetration among women with >20 years-in-service (2400/1000 person-years). We consequently collapsed the 10–20 and >20 years-in-service groups to form a >10 years-in-service group.
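The difference between a single population-wide 5% cut-point and cut-points recalibrated within each time-in-service group (footnote b) can be illustrated with a minimal sketch. The function names, group labels, and scores are invented for illustration and are not taken from the study data.

```python
from collections import defaultdict

def flag_top_fraction(scores, fraction=0.05):
    """Flag the highest-scoring `fraction` of records (at least one)."""
    n_top = max(1, round(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    flags = [False] * len(scores)
    for i in order[:n_top]:
        flags[i] = True
    return flags

def flag_top_fraction_within_groups(scores, groups, fraction=0.05):
    """Apply the top-fraction rule separately inside each group."""
    by_group = defaultdict(list)
    for i, g in enumerate(groups):
        by_group[g].append(i)
    flags = [False] * len(scores)
    for members in by_group.values():
        local = flag_top_fraction([scores[i] for i in members], fraction)
        for i, is_top in zip(members, local):
            flags[i] = is_top
    return flags

# Hypothetical data: early-career records have uniformly higher scores.
groups = ["early"] * 20 + ["late"] * 20
scores = [0.5 + i / 100 for i in range(20)] + [i / 1000 for i in range(20)]

overall_flags = flag_top_fraction(scores)
within_flags = flag_top_fraction_within_groups(scores, groups)
```

With a single overall cut-point, both flagged records here come from the early-career group. Recalibrating within groups flags one record in each, which is what allows prediction accuracy to be compared across time-in-service strata on an equal footing.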
Validation of the model in the Army STARRS 2011–2013 survey sample
The coefficients estimated in the 2004–2009 HADS were applied to the sample of soldiers who participated in Army STARRS surveys in 2011–2012 (n = 43 248) and were followed administratively through the end of 2013 (10 165 562 person-months). Men and women were combined because of the small number of instances of the outcome in this sample (n = 16). AUC was 0.77 and the proportion of observed crimes committed by those in the top ventile of predicted risk was 50.5%.
Discussion
The administrative data used here, although broad in scope, were limited because they did not include indicators of some significant predictors found in previous studies (e.g. personality traits, social networks, early life experiences), because they had more missing and inconsistent values than would data collected for research purposes, and because they excluded perpetrators who eluded authorities. Within the context of these limitations, we showed that models could be developed with quite stable prediction accuracy across subgroups within the 2004–2009 training dataset and provisionally validated in the independent 2011–2013 dataset. Caution is needed in the latter regard, though, as the validation sample was small and a more complete validation is needed once HADS data become available for a more recent time period. It would be premature to use the tool in practice prior to a more thorough validation.
It is also important not to over-interpret the specific variables in our final models, as the stepwise selection method maximized overall prediction accuracy at the expense of individual coefficient accuracy. Three general observations about the variables in the final models are nonetheless noteworthy. First, these variables were highly consistent with previous military research in showing that violence is associated with young age and low rank, low socioeconomic and minority status, prior crime involvement, and mental disorders (Elbogen et al. 2012, 2014a, b; Gallaway et al. 2012; Hellmuth et al. 2012; MacManus et al. 2012a, 2013; Sullivan & Elbogen, 2014).
Second, our finding that never-deployed and previously deployed soldiers had comparably elevated violent crime risk is striking given that recent research has suggested that combat exposure leads to increased violence among soldiers returning from deployment (MacManus et al. 2015). Because the RF analysis found no evidence of meaningful interactions, we have no evidence that the strength of predictor associations differed among the previously deployed and never-deployed v. the currently deployed. We also carried out post-hoc analyses that added information on history and recency of deployment and the conjunction of combat arms occupation with deployment to the final model independent variables, but none of these were significantly associated with the outcome (Supplementary Appendix Table S23). These findings suggest that the significantly elevated rates of violence found in previous research among soldiers with a history of combat deployment are explained by other variables in our model. It would be useful to investigate this matter formally in future studies by beginning with the gross associations of deployment with violence and determining which of the variables in our final models explained those gross associations.
Third, the opposite-sign coefficients associated with unit leader tenure/experiences are noteworthy. The distinction between officers and NCOs is artifactual (illustrating the caution noted above against over-interpreting the coefficients associated with specific significant predictors), as analysis of bivariate associations showed that time-in-service of both NCOs and officers was negatively associated with unit member violence, while time deployed of both NCOs and officers was positively associated with unit member violence. This further analysis also showed that the opposite-sign bivariate associations were quite stable over time and existed among women as well as men (Supplementary Appendix Table S24). To put the magnitudes of these associations in perspective, a policy simulation based on the provisional assumption that the coefficients represent causal effects suggested that randomly assigning soldiers to units led by officers with time-in-service 1 s.d. above the Army-wide mean and by NCOs with time-deployed 1 s.d. below the Army-wide mean would decrease incidence of the crimes considered here by nearly 40%. Of course, it is unclear if a causal interpretation is appropriate or, if so, what underlying mechanisms might account for these associations. Suggestions exist in the literature, such as that longer tenure of unit leaders is associated with both improved unit discipline (Shamir et al. 2000) and reduced aggression of unit members (Bliese et al. 2002), that the disciplinary climate created by unit leaders influences violence rates within units (Millikan et al. 2012), and that unit leaders who experience repeated deployments might become more tolerant of violence (Parmak et al. 2012). But systematic multivariate analysis and subsequent experimentation would be needed to determine which of these or other processes might account for the associations found here between unit leader tenure/experiences and unit member violent crimes.
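The logic of a policy simulation of this kind can be sketched as follows, assuming a logistic model and standardized leader covariates. The coefficient values, variable names, and intercept in the usage note are hypothetical placeholders chosen for illustration; they are not the estimates from the models reported here.

```python
import math


def predicted_risk(intercept, coefs, row):
    """Logistic predicted probability for one person-month."""
    z = intercept + sum(coefs[name] * row[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


def simulated_relative_change(intercept, coefs, rows, counterfactual):
    """Relative change in mean predicted risk when selected covariates are
    overwritten with counterfactual values and all others are left as observed.
    Reading the result as a policy effect treats the coefficients as causal,
    which is an assumption rather than a finding.
    """
    baseline = sum(predicted_risk(intercept, coefs, r) for r in rows) / len(rows)
    shifted = sum(
        predicted_risk(intercept, coefs, {**r, **counterfactual}) for r in rows
    ) / len(rows)
    return shifted / baseline - 1.0
```

For example, with hypothetical coefficients `{"officer_tis_z": -0.2, "nco_deployed_z": 0.1}`, an intercept of `-7.0`, and rows at the Army-wide mean (z-scores of 0), passing `counterfactual={"officer_tis_z": 1.0, "nco_deployed_z": -1.0}` returns the proportional change in expected incidence implied by a 1 s.d. shift in each leader characteristic.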
It is interesting to compare the accuracy of our models with the accuracy of violence risk assessment tools developed in forensic and inpatient settings, even though the populations in these other studies are so different from the population considered here that such comparisons are no more than suggestive. Unlike our administrative data tool, these existing risk assessment tools are usually quite labor-intensive to administer in that they require clinicians to make in-depth assessments. Prediction accuracy is typically evaluated by calculating AUC. A recent comprehensive review of the 17 tools of this sort (including six that were developed specifically to predict sexual violence and one developed to predict domestic violence) evaluated in multiple settings found that 11 had mean AUC below 0.70 and the others had AUC in the range 0.70–0.79, with the highest AUC among instruments used in at least five studies being 0.73 (Whittington et al. 2013). Our models, in comparison, had AUC of 0.80–0.82 in the training dataset and 0.77 in the validation dataset. These levels of prediction accuracy were achieved based entirely on administrative predictors available on an ongoing basis for all soldiers. Furthermore, unlike typical violence risk assessment tools, which focus on individual differences in risk over a single risk period (e.g. risk of committing a violent act in the next 12 months), our approach allows us to look not only at between-person variation in risk but also at within-person variation in risk over time (i.e. detection of critical periods of risk for individual soldiers).
The US Army does not currently use actuarial methods to identify soldiers at high risk of committing violent crimes. However, the high AUC and high proportion of observed crimes committed by those in the top ventile of predicted risk in our models raise the possibility that our models, if they are validated in future studies that go beyond the provisional validation reported here, might be useful for determining which soldiers should receive more intensive risk evaluations or interventions (Naeem et al. 2009; Douglas et al. 2013; Shea et al. 2013). It is important to recognize, though, that the crimes considered here are uncommon even in the 5% of soldiers classified as being at high risk. This means that targeted preventive interventions would only be cost-effective if (i) the value of preventing even a single case of violent crime was determined to be high, (ii) the intervention was inexpensive, and/or (iii) the intervention was effective in preventing not only violent crime but also other adverse outcomes associated with high violence risk (e.g. depression, substance abuse, self-harm). Competing risks would have to be considered under each scenario. Although evaluation of these scenarios is outside of the scope of the current report, such an evaluation would have to be a central focus of any future efforts to determine the feasibility and desirability of using our models to target preventive interventions.
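The trade-off among conditions (i) and (ii) can be made concrete with simple break-even arithmetic. The sketch below is illustrative only: the function names are hypothetical, all inputs are supplied by the user, and no value used here is an estimate from this study.

```python
def number_needed_to_treat(baseline_risk, relative_risk_reduction):
    """High-risk soldiers who must receive the intervention to prevent one case.

    baseline_risk: probability of the outcome among those targeted, over the
        period in which the intervention is assumed to work.
    relative_risk_reduction: assumed proportional reduction in that risk.
    """
    absolute_reduction = baseline_risk * relative_risk_reduction
    if absolute_reduction <= 0:
        raise ValueError("the intervention must reduce risk")
    return 1.0 / absolute_reduction


def break_even_cost_per_person(baseline_risk, relative_risk_reduction,
                               value_per_case_prevented):
    """Largest per-person intervention cost at which expected benefit
    still covers expected cost."""
    return baseline_risk * relative_risk_reduction * value_per_case_prevented
```

Because the outcome is uncommon even in the top ventile, the break-even cost per person is a small fraction of the value assigned to each prevented case; it rises if the same intervention also prevents other adverse outcomes, as in condition (iii).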
Supplementary material
For supplementary material accompanying this paper visit http://dx.doi.org/10.1017/S0033291715001774.
Acknowledgements
The data analyzed in this report were collected as part of the Army Study to Assess Risk and Resilience in Servicemembers (Army STARRS). Army STARRS was sponsored by the Department of the Army and funded under cooperative agreement number U01MH087981 with the US Department of Health and Human Services, National Institutes of Health, National Institute of Mental Health (NIH/NIMH). This research was conducted by Harvard Medical School and is funded by the Department of Defense, Office of the Assistant Secretary for Defense for Health Affairs, Defense Health Program (OASD/HA), awarded and administered by the US Army Medical Research & Materiel Command (USAMRMC), at Fort Detrick, MD, under Contract Number (Award no.) W81XWH-12-2-0113.
The views, opinions and/or findings contained in this research are those of the authors and do not necessarily reflect the views of the Department of the Army, Department of Defense (DoD), Department of Health and Human Services, or NIMH and should not be construed as an official DoD/Army position, policy or decision unless so designated by other documentation. No official endorsement should be made.
Declaration of Interest
Dr Stein has been a consultant for Care Management Technologies, received payment for his editorial work from UpToDate and Depression and Anxiety, and had research support for pharmacological imaging studies from Janssen. In the past 3 years, Dr Kessler has been a consultant for Hoffman-La Roche Inc., Johnson & Johnson Wellness and Prevention, and Sanofi-Aventis Group. Dr Kessler has served on advisory boards for Mensante Corporation, Plus One Health Management, Lake Nona Institute, and US Preventive Medicine. Dr Kessler is a co-owner of DataStat Inc. Dr Monahan is a co-owner of the Classification of Violence Risk (COVR) Inc. The remaining authors declare no conflict of interest.