Introduction
Obsessive-compulsive disorder (OCD) is one of the most prevalent neuropsychiatric disorders, affecting about 2.5% of people throughout their lives (Kessler, Chiu, Demler, Merikangas, & Walters, 2005; Ruscio, Stein, Chiu, & Kessler, 2010). The World Health Organization (WHO) has ranked OCD among the most debilitating and disabling diseases (Murray, Lopez, & World Health Organization, World Bank & Harvard School of Public Health, 1996). Currently, the WHO lists anxiety disorders, including OCD, as the sixth largest contributor to non-fatal health loss (World Health Organization, 2017). Despite the high levels of suffering, disability, and impairment associated with the disorder, its high rates of comorbidity, and its low rate of response to treatment, suicidal behavior in patients with OCD has received relatively little attention (American Psychiatric Association, 2013; Eisen et al., 2006; Jacoby, Leonard, Riemann, & Abramowitz, 2014).
Suicide is a major public health problem and one of the top 10 causes of death and burden of disease worldwide, across all age groups (Lozano et al., 2012; World Health Organization, 2014). In addition, about 90% of suicide deaths are directly related to mental disorders (Arsenault-Lapierre, Kim, & Turecki, 2004; Hawton & van Heeringen, 2009; Nock, Hwang, Sampson, & Kessler, 2010). Suicide in OCD has been considered a rare phenomenon, with suicide death rates of about 1% described in the classic literature (Coryell, 1981; Goodwin, Guze, & Robins, 1969) and some studies finding a negative association between suicide and OCD (Kanwar et al., 2013).
Contrary to these findings, recent evidence has suggested that OCD suicide rates might be underestimated, with two systematic reviews indicating rates of suicide attempts (SA) in OCD ranging from 1% to 46.3% (Angelakis, Gooding, Tarrier, & Panagioti, 2015) and from 6% to 51.7% (Albert, De Ronchi, Maina, & Pompili, 2018). However, the results of previous studies raise a question: is suicide in OCD connected to factors intrinsic to the psychopathology of OCD (i.e. the direct effects of the core psychopathological aspects of the disorder), or is it the result of factors extrinsic to the psychopathological core of OCD, such as psychiatric comorbidities or sociodemographic factors? To the best of our knowledge, this research field has to date mainly used traditional statistical analysis, but the psychopathological heterogeneity of OCD can potentially reduce power and obscure the findings of these clinical studies, as traditional hypothesis testing methods are known to have several limitations when used to analyze multidimensional and heterogeneous data (Passos, Mwangi, & Kapczinski, 2019b).
In light of these findings, techniques that can analyze data from multiple biological levels at the individual level, such as machine learning, are essential (Passos et al., 2019a). Machine learning focuses on algorithms that can analyze, learn, and extract patterns from data in a non-linear and interactive way, thus transforming data into relevant information, exceeding the human brain's ability to understand it (Greenhalgh, Howick, & Maskrey, 2014; Huys, Maia, & Frank, 2016; Passos & Mwangi, 2018; Rajkomar, Dean, & Kohane, 2019).
Machine learning has been shown to be more accurate than traditional statistical analysis in predicting suicidality in several studies (de Ávila Berni et al., 2018; Larsen et al., 2015; Leonard Westgate, Shiner, Thompson, & Watts, 2015; Niculescu et al., 2015; O'Dea et al., 2015; Passos et al., 2016; Simon et al., 2018; Torous et al., 2018), producing discriminators of suicidal behavior [areas under the curve (AUCs) = 0.60–0.80] that exceed those using isolated risk factors (AUCs = 0.58) (Franklin et al., 2017). In a recent systematic review, Burke, Ammerman, and Jacobucci (2019) identified 35 articles assessing the use of machine learning techniques for predicting and identifying risk factors for suicide-related outcomes. The review found greater precision in the prediction of suicide outcomes in these articles than in articles that used traditional statistical methods. In addition, such studies replicated findings of known risk factors, identified new variables involved in the phenomenon of suicide, and allowed the identification of subgroups, helping to draw cut-off points for suicide risk that can be used in clinical practice (Burke et al., 2019).
To date, this type of analysis has not been used to predict the risk of SA in patients with OCD at the individual level. Our objective was therefore to explore sociodemographic and clinical factors associated with SA in adult OCD patients. Considering that the evaluation of a patient with OCD is extremely complex due to the heterogeneity of the disorder (several presentations and symptom dimensions) (Rosario-Campos et al., 2006), the precise identification of which clinical variables can predict suicidal behavior at the individual level is clinically essential for the development of preventive and therapeutic interventions.
Methods
Participants
The sample initially comprised 1001 OCD patients from seven university centers in six Brazilian cities, which made up the Brazilian OCD Research Consortium (C-TOC). Patients were interviewed between August 2003 and August 2009 and were included in the study if they met criteria for OCD, confirmed by the Structured Clinical Interview for DSM-IV Axis I Disorders (SCID-I) (First, Spitzer, Gibbon, & Williams, 1997). Patients with psychotic disorders or any other condition that impaired their understanding of the evaluation questions (e.g. schizophrenia, autism spectrum disorder, intellectual disability) were excluded. Each local Ethics Committee approved the research protocol. A detailed description of the evaluation protocol is available elsewhere (Miguel et al., 2008). For this paper, 42 (4.2%) patients were excluded due to missing data on suicide, resulting in a sample of 959 OCD patients.
Instruments
Subjects were interviewed by psychologists or psychiatrists trained in the application of the protocol and its structured interviews. The C-TOC protocol comprised sociodemographic data (sex, age, religion, marital status, occupation, socioeconomic classification, etc.), personal medical history, family psychiatric history, and other standardized instruments. Among them, the instruments below were relevant for this study:
– The SCID-I and additional modules for tic and impulse control disorders were used to assess psychiatric comorbidities (First et al., 1997). Attention deficit hyperactivity disorder (ADHD) and separation anxiety disorder were investigated through a module of the Kiddie Schedule for Affective Disorders and Schizophrenia (K-SADS) (Kaufman et al., 1997).
– The Yale-Brown Obsessive-Compulsive Scale (Y-BOCS) was used to measure the severity of OCD symptoms. It was translated into Portuguese by Asbahr, Lotufo, Turecki, and Miguel (1996).
– The Dimensional Yale-Brown Obsessive-Compulsive Scale (DY-BOCS) (Rosario-Campos et al., 2006) was used to evaluate the presence and severity of obsessive-compulsive symptoms (OCS) according to six specific dimensions, including obsessions and related compulsions, as well as avoidance, the time spent with OCS, the level of anxiety, and symptom burden. Its scores range from 0 to 5 (maximum of 15 for each dimension). The negative impact of OCS is also measured (maximum score of 30), and the therapeutic response can be evaluated according to specific dimensions. This instrument was simultaneously validated in Portuguese and English.
– The Beck Depression Inventory (BDI) (Beck, Ward, Mendelson, Mock, & Erbaugh, 1961) and the Beck Anxiety Inventory (BAI) (Beck, Epstein, Brown, & Steer, 1988) were used to assess depressive and anxious symptoms, respectively. These scales were translated into Portuguese by Gorenstein and Andrade (1996).
– The Yale OCD Natural History Questionnaire was used to examine OCS onset and course, including stressful life events.
As in previous publications with similar samples (Torres et al., 2011; Velloso et al., 2016), we used a clinical questionnaire about suicidal behaviors composed of seven questions with categorical answers (‘Yes’, ‘No’, ‘I do not know’). The questions included: ‘Have you ever thought that it was not worth living?’; ‘Did you ever wish you were dead?’; ‘Have you ever thought about taking your own life or committing suicide?’; ‘Have you planned to take your own life or commit suicide?’; ‘Have you ever tried to take your own life or commit suicide?’; ‘Did you need hospitalization/treatment at that time?’; ‘Have any of your relatives tried to commit suicide?’; ‘Have any of your relatives ever committed suicide?’. The question about suicide attempts (SA) was chosen as the outcome in this study.
Selection of predictor variables
The variables used in the model of this study were selected through a structured search on PubMed. We searched for articles with clinical relevance to risk factors for suicide in OCD. Thus, the predictor variables selected for ‘training’ the algorithm included demographic and clinical variables related to OCD and to other psychopathologies that were comorbid in our database. This way of selecting predictor variables has already been used in previous machine learning studies (Passos et al., 2016; Perlis, 2013). Given the extensive number of variables present in our dataset (n = 1680), we suggest that readers consult Miguel et al. (2008) for a more detailed description of all the data collected. After a meticulous review of the original C-TOC dataset [two authors, NAA and YAF, reviewed each variable of the dataset separately and reached a consensus using a best estimate diagnosis technique (Maziade et al., 1992)], the 89 remaining variables were entered into the analysis. Table S1 in the Supplementary Material summarizes these remaining variables.
Statistical analysis
Descriptive analyses were performed first. The Kolmogorov–Smirnov test was used to assess the normality of the data. Continuous variables were described as means and standard deviations (s.d.) and/or medians and minimum–maximum values (min–max) according to their distribution (normal or non-normal). Categorical variables were described as absolute values (n) and relative values (%). The Statistical Package for the Social Sciences (SPSS), version 18.0, was used to perform the analysis (SPSS Inc., 2009).
Machine learning analysis was performed with R software (Version R 3.3.1) (R Core Team, 2013) and R Studio (Version 0.99.902) with the R package caret (Version 6.0-73) (Kuhn, 2008). We ran experiments with the elastic net method (Zou & Hastie, 2005). We used machine learning techniques to address two problems in conventional multiple regression: (1) coefficients are unstable when high correlations exist among predictors, as is the case for the predictors included in the present study, leading to low replication of predictions in independent samples (Berk, 2008); (2) traditional regression assumes additivity, whereas the predictors considered here might have non-additive effects (DiGangi et al., 2013; Ozer, Best, Lipsey, & Weiss, 2003; Tolin & Foa, 2006). The elastic net method mitigates the first of these problems by introducing regularization factors while preserving the model interpretability provided by traditional regression. We leave the use of non-linear models for future work in the field, as they present additional challenges for generalization and interpretation.
The elastic net is a machine learning method that combines feature selection, regularization, and classification. In the fitting of linear or logistic regression models, the elastic net is a regularized regression method that linearly combines the L1 penalty of the lasso and the L2 penalty of ridge regression. The lasso (least absolute shrinkage and selection operator) performs both variable selection and regularization in order to enhance the prediction accuracy and interpretability of the statistical model it produces. Ridge regression is the most commonly used method for regularizing ill-posed problems; in machine learning it is also known as weight decay. Combining the two penalties enables the method to remove predictors with low impact on the outcome while regularizing for improved generalization. As our dataset is composed of several attributes, identifying the most important ones allows for wider applicability and more practical use of our risk calculators.
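The combination of the two penalties can be illustrated with a minimal sketch. It assumes the common parametrization with an overall strength `lam` and a mixing parameter `alpha` (alpha = 1 gives the lasso, alpha = 0 gives ridge); the text does not state which parametrization was used, and the function name and coefficient values are hypothetical.

```python
def elastic_net_penalty(coefs, lam, alpha):
    """Penalty added to the regression loss (assumed parametrization):
    lam * (alpha * sum|b| + (1 - alpha) / 2 * sum b^2)."""
    l1 = sum(abs(b) for b in coefs)    # lasso term: can shrink coefficients to exactly zero
    l2_sq = sum(b * b for b in coefs)  # ridge term: shrinks correlated coefficients together
    return lam * (alpha * l1 + (1.0 - alpha) / 2.0 * l2_sq)


# Hypothetical coefficients, for illustration only
coefs = [0.8, -0.5, 0.0]
lasso_only = elastic_net_penalty(coefs, lam=0.1, alpha=1.0)
ridge_only = elastic_net_penalty(coefs, lam=0.1, alpha=0.0)
mixed = elastic_net_penalty(coefs, lam=0.1, alpha=0.5)
```

Intermediate values of `alpha` trade off the sparsity induced by the L1 term against the stability under correlated predictors provided by the L2 term.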
To handle missing data, first, we removed all variables with more than 5% missing and imputed the remaining variables by the variable mean, for numerical variables, or by the variable mode, for categorical variables. For each analysis, we split our data into training (75% of the whole sample) and test datasets (25%). We deployed a standard machine learning protocol with 10-fold cross-validation, feature selection, hyperparameter tuning and class imbalance correction in the training dataset (Fig. S1 in the Supplementary Material) with model selection based on the area under the receiver operating characteristic (ROC) curve. Class imbalance usually leads to very different sensitivity and specificity scores in the model; to account for this, we balanced each class predominance with class weighting. Each class was weighted inversely proportional to its frequency in the training set, or:

where W1 is the weight for the positive class and F0 is the frequency of the negative class in the training set. Accordingly, W0 = 1 − W1. Class weighting allows us to use the whole training set instead of relying on down-sampling techniques.
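A minimal sketch of such a weighting scheme follows. It rests on an assumption: the weight of the positive class is taken to be the relative frequency of the negative class (W1 = F0), which is one reading consistent with the definitions given here but is not confirmed as the exact expression used. The function name and the counts in the example are hypothetical.

```python
def class_weights(labels):
    """Return (w0, w1) for binary labels coded 0/1.

    Assumed reading: W1 equals the relative frequency of the negative
    class, and W0 = 1 - W1 as stated in the text.
    """
    n = len(labels)
    n1 = sum(1 for y in labels if y == 1)
    n0 = n - n1
    if n0 == 0 or n1 == 0:
        raise ValueError("both classes must be present in the training set")
    w1 = n0 / n    # rarer positive class receives the larger weight
    w0 = 1.0 - w1
    return w0, w1


# Hypothetical training set: 78 positives among 720 cases
labels = [1] * 78 + [0] * 642
w0, w1 = class_weights(labels)
```

Under this reading the total weight carried by each class is equal (n1 × W1 = n0 × W0), which is what removes the advantage of the majority class without discarding any cases.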
We calculated individual-level predicted SA probabilities based on the elastic net algorithm, as well as the ROC curve and AUC to evaluate prediction accuracy. Additionally, we calculated sensitivity, specificity, balanced accuracy, positive predictive value (PPV), and negative predictive value (NPV) at a probability cut-off of 0.5. The probabilities predicted by the elastic net were then discretized into deciles (10 groups of equal size ordered by percentiles) and cross-classified with observed SA.
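The decile cross-classification can be sketched as follows. This is an illustration in Python rather than the R code used for the analysis; the function name and the toy data are hypothetical, and decile 1 is taken to be the highest-risk group.

```python
def decile_table(probs, outcomes):
    """Rank cases by predicted probability (highest first), split them into
    10 groups of near-equal size, and count observed outcomes per group."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    n = len(order)
    table = []
    for d in range(10):
        members = order[d * n // 10:(d + 1) * n // 10]
        positives = sum(outcomes[i] for i in members)
        table.append({
            "decile": d + 1,
            "n": len(members),
            "positives": positives,
            "negatives": len(members) - positives,
        })
    return table


# Toy data (hypothetical): 20 cases, positives concentrated at high probabilities
probs = [i / 20 for i in range(20)]
outcomes = [0] * 16 + [1, 0, 1, 1]
table = decile_table(probs, outcomes)
```

Reading the `positives` column from decile 1 downwards shows how strongly the observed events concentrate in the groups with the highest predicted risk.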
Of the variables selected by the elastic net algorithm to distinguish suicide attempters from non-attempters, only those with an importance weighting factor higher than 20% were considered clinically and epidemiologically relevant.
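A minimal sketch of this thresholding step is given below. It assumes that the importance weighting factor is the absolute model coefficient expressed as a percentage of the largest absolute coefficient, in line with the description of the weighting factors as normalized by the largest factor. The variable names and coefficient values are hypothetical and are not the fitted values from this study.

```python
def relative_importance(coefs):
    """Express each absolute coefficient as a percentage of the largest one."""
    largest = max(abs(b) for b in coefs.values())
    if largest == 0:
        raise ValueError("all coefficients are zero")
    return {name: 100.0 * abs(b) / largest for name, b in coefs.items()}


def relevant_predictors(coefs, threshold=20.0):
    """Names of predictors whose relative importance exceeds the threshold,
    ordered from most to least important."""
    ranked = sorted(relative_importance(coefs).items(),
                    key=lambda item: item[1], reverse=True)
    return [name for name, value in ranked if value > threshold]


# Hypothetical coefficients, for illustration only
coefs = {
    "suicide_plans": 2.0,
    "suicidal_thoughts": 1.5,
    "depressive_episode": 0.6,
    "sensory_phenomena": -0.2,
    "simple_phobia": 0.0,
}
selected = relevant_predictors(coefs)
```

Because the factors are relative to the strongest predictor, the 20% cut-off is a statement about relative rather than absolute effect size.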
Results
Among the 959 adult patients included, there were 413 (43.1%) males and 546 (56.9%) females. The mean (s.d.) age was 34.95 (12.94) years, with a median of 32 and a range of 9 to 82 years; the mean number of years of study was 14.51 (4.95), with a median of 14 and a range of 1 to 31 years. Regarding marital status, 520 (54.2%) were single or without stable partners; 378 (39.4%) reported having children; 560 (58.4%) were Catholic, while 298 (31.1%) belonged to other religions and 101 (10.5%) said they had no religion. Of the 858 (89.5%) patients who had a religion, 521 (60.7%) reported practicing it. Regarding occupation, 415 (43.3%) were working, 166 (17.3%) were students, 102 (10.6%) were housewives, 104 (10.8%) were retired, and 22 (15.6%) were unemployed; when questioned about paid employment, 457 (47.7%) had no remuneration. According to the ABIPEME socioeconomic classification, 157 (16.4%) were in class A, 369 (38.5%) in class B, 341 (35.6%) in class C, and 92 (9.6%) in classes D and E. Regarding clinical variables, 231 (24.1%) had never received psychiatric or psychological treatment up to the interview. Of the 728 (75.9%) who had received specialized care, 675 (92.7%) used a serotonin reuptake inhibitor, 138 (19.0%) used another type of antidepressant, 365 (50.1%) used a benzodiazepine, 105 (14.4%) used a mood stabilizer, 49 (6.7%) used lithium, and 192 (26.4%) used a neuroleptic. Moreover, 609 (83.7%) had undergone psychotherapy [148 (20.3%) cognitive-behavioral therapy for OCD]. There were 65 (8.9%) patients with a history of psychiatric hospitalization and 12 (1.7%) who had undergone electroconvulsive therapy.
From the psychopathological point of view, Table 1 shows the descriptive results of phenomena intrinsic to the symptomatological core of OCD, while Table 2 presents clinical characteristics extrinsic to the psychopathological nucleus of OCD.
Table 1. Descriptive results of the psychopathological variables intrinsic to the phenomenology of obsessive-compulsive disorder

s.d., standard deviation; n, absolute value; %, relative value; OCS, obsessive-compulsive symptoms; OCD, obsessive-compulsive disorder; DY-BOCS, Dimensional Yale-Brown Obsessive Compulsive Scale; Y-BOCS, Yale-Brown Obsessive Compulsive Scale; BABS, Brown Assessment of Beliefs Scale.
Table 2. Descriptive results of the psychopathological variables extrinsic to the phenomenology of obsessive-compulsive disorder

s.d., standard deviation; BDI, Beck Depression Inventory; BAI, Beck Anxiety Inventory; n, absolute value; %, relative value.
Figure 1 shows the prevalence rates of suicidality in patients with OCD. Of those who made an SA, 19 (1.99%) had a history of hospitalization: 15 (78.9%) in a general hospital and four (21.1%) in a psychiatric unit.

Fig. 1. Prevalence of suicidality phenomena in the sample of adults with obsessive-compulsive disorder. Panels 1a to 1g show the percentages of each of the aspects related to suicidality. Figure 1a, b = 915 patients with data available; Figure 1f, g = 958 patients with data available.
Machine learning analysis
The model showed an accuracy of 85% in distinguishing individual attempters from non-attempters. The elastic net algorithm selected 18 variables that could predict SA: (1) previous suicide plans; (2) history of suicidal thoughts; (3) lifetime depressive episode; (4) lifetime intermittent explosive disorder (IED); (5) lifetime substance use/dependence disorder; (6) lower socioeconomic class; (7) lifetime anorexia disorder; (8) lifetime ADHD; (9) mulatto ethnicity; (10) lifetime kleptomania disorder; (11) presence of any sensory phenomena; (12) lifetime simple phobia; (13) lifetime panic/agoraphobia disorder; (14) familial history of OCD; (15) familial history of alcohol dependence; (16) familial history of psychosis; (17) familial history of SA; and (18) having no occupation. Thus, the most relevant (higher than 20% of importance weighting factor) predictor variables that remained in the model were: (1) previous suicide plans, (2) history of suicidal thoughts, (3) presence of lifetime depressive episode, and (4) presence of lifetime IED. Figure 2 shows the absolute weighting factors of each clinical variable that remained in the model normalized by the largest factor.

Fig. 2. Bar graph showing weighting factors assigned to each clinical variable by elastic net algorithm based on their relevance in distinguishing suicide attempters from non-attempters. The vertical line ‘cuts’ the importance weighting factors at the 20% level (considered clinically and epidemiologically relevant for this study).
Sensitivity and specificity of the model were 84.61% and 87.32%, respectively, with a balanced accuracy of 85.97% and significant at χ2 p < 0.0001. The PPV was 44.89% and the NPV was 97.89%, with an AUC of 0.95, obtained by ROC curve analysis (Fig. 3).

Fig. 3. Receiver operating characteristic curve of the predictive model.
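The relationship between these metrics can be checked with a short sketch. The underlying confusion matrix is not reported; the counts used below are an inference, chosen only because they reproduce the published percentages and are compatible with a test set of about 25% of the 959 patients. They should be read as an assumption, not as reported data.

```python
def diagnostic_metrics(tp, fn, fp, tn):
    """Standard classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Inferred counts (assumption, not reported in the text)
metrics = diagnostic_metrics(tp=22, fn=4, fp=27, tn=186)
percentages = {name: 100 * value for name, value in metrics.items()}
```

With these counts the positive class is rare, so even a specificity above 87% leaves more false positives than true positives, which is why the PPV is far lower than the NPV.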
All samples were ranked based on their predicted probability of belonging to the positive class and separated into deciles. The percentage of positive and negative class samples was then analyzed at each decile based on their expected outcome. This showed that over half of the OCD sample with an SA was placed in the highest-risk decile, and 85% were placed in the highest or second-highest risk deciles. Likewise, 100% of the sample with an SA was categorized in the first three deciles of predicted risk. Figure 4 shows the concentration of risk for SA based on the model's output probabilities, illustrating how suitable the model would be for real-world scenarios. A perfectly calibrated model (i.e. one in which the predicted probability translates into the exact chance of SA) would have a smoother curve across the deciles, which was not the case. However, the high prevalence of SA in the first two deciles shows that the high probabilities predicted by the model concentrated most cases of SA. Therefore, instead of relying only on the probabilities predicted by the model, this chart can be used to provide a more realistic suicide risk for individuals, since it is grounded on previous out-of-sample predictions.

Fig. 4. Concentration of risk for suicide attempts among OCD patients.
Discussion
Describing suicidality in patients with OCD
In our study, 104 (10.8%) patients with OCD attempted suicide throughout their lives, a rate similar to the 10.3% average found in the systematic review by Angelakis et al. (2015) and identical to the median of 10.8% (mean of 14.2%) in clinical samples from the systematic review by Albert et al. (2018). Two previous studies by our group on suicide in OCD found SA rates of 11% in a subsample of 582 patients (n = 64) (Torres et al., 2011) and 19.4% in a subsample of 356 (n = 69) (Velloso et al., 2016). However, the sample sizes of these two studies were smaller than that of the present study.
In the WHO World Mental Health (WMH) Survey, in which 108 705 adults from 21 countries were evaluated, the prevalence of SA was 0.3% for developed countries and 0.4% for developing countries (Borges et al., 2010). In our study, the rates of SA were higher than in population-based samples, since we used a sample from tertiary psychiatric clinical services. Our findings, therefore, are in line with previous findings that suicidal behavior is more prevalent in the OCD population than in the general population (Albert et al., 2018; Angelakis et al., 2015; Fernández de la Cruz et al., 2017).
Predictive factors of suicide attempts in patients with OCD
To the best of our knowledge, this is the first study to use machine learning techniques to explore possible predictive factors for individual SA in patients with OCD. We found 18 possible predictive variables of SA, but only four were clinically and epidemiologically relevant. Regarding sociodemographic variables, our negative finding is in agreement with previous studies showing no association between suicidality and marital status, family status, level of education, employment, religion, quality of life, clinical course, age of onset, or family history of suicide or suicidality (Kamath, Reddy, & Kandavel, 2007; Torres et al., 2007, 2011).
Several studies have pointed out that suicidality in OCD may be associated with phenomena intrinsic to the psychopathological core of the disorder, whose effect seems to be mediated by the severity of OCD symptoms (Hung et al., 2010; Torres et al., 2007): sexual/religious/moral dimensions (Dell'Osso et al., 2018; Fernández de la Cruz et al., 2017; Torres et al., 2011), aggressiveness/catastrophe (Balci & Sevincok, 2010), and symmetry/organization and arrangement (Alonso et al., 2010). Obsessions involving sexual/religious/moral and aggressive/catastrophic content are considered ‘taboo thoughts’, and patients who have them could present a higher degree of shame, guilt, and responsibility for the content of their obsessions, which may lead to thoughts of death, suicide ideation, SA, and suicide (Angelakis et al., 2015; Balci & Sevincok, 2010; Gupta, Avasthi, Grover, & Singh, 2014; Kamath et al., 2007). Our findings, however, show that no variable intrinsic to OCD psychopathology is predictive of SA. Thus, suicidality in OCD does not appear to be related to the disorder itself. Similarly, Sareen et al. (2005), in a population study with traditional statistical analysis, did not find an independent association between OCD and suicidality after adjustment for sociodemographic variables and control for psychiatric comorbidities. The only variable intrinsic to the core phenomenology of OCD that remained in the machine learning model in our study was the presence of any sensory phenomenon, but it did not reach the pre-stipulated threshold of clinical-epidemiological relevance.
Comorbidities
The two lifetime comorbidities that remained predictive of SA were major depressive disorder and IED. In two previous systematic reviews, higher rates of suicidality were found in patients with OCD who presented concomitant comorbidities (especially higher severity of depressive and anxious symptoms), a previous history of suicidal behavior, and feelings of hopelessness (Albert et al., 2018; Fernández de la Cruz et al., 2017). On the other hand, Torres et al. (2006), after controlling for variables such as the presence of comorbidities, found that the risk of suicidal behavior in OCD remained significant, reaching a prevalence of up to 25%, with an independent effect of OCD symptoms on suicidality, since patients with OCD without comorbidities did not differ from those with comorbidities in terms of the prevalence of suicidal behavior. Such divergence may be due to differences in each study's sampling (clinical v. epidemiological) and/or analytical approach.
Depression as comorbidity predictive of attempted suicide in OCD
The presence of depressive symptoms has an important impact on the expression of suicidality among patients with OCD, since depression is one of the most common comorbid conditions in OCD (Kamath et al., 2007; Maina, Salvi, Tiezzi, Albert, & Bogetto, 2007; Tükel, Meteris, Koyuncu, Tecer, & Yazici, 2006). Depression alone is one of the major factors for suicidal behavior (Bertolote et al., 2005) and is associated with all suicidal outcomes (Scocco, de Girolamo, Vilagut, & Alonso, 2008).
We can hypothesize that the existence of comorbid depressive symptoms results in more severe and incapacitating OCS, with a higher occurrence of suicidal behaviors compared to patients with OCD without depressive symptoms (Hollander et al., 1996). Since mood changes appear to be consequences of chronic stress and injury associated with the severity of OCD symptoms (Angst et al., 2004), depression diagnosis and/or severity may be an indicator of OCD severity and could ‘link’ OCD and suicide outcomes, even if secondary to OCD (Torres et al., 2007).
IED as comorbidity predictive of attempted suicide in OCD
In another study by our group, with a smaller sample (n = 582), the authors found that impulse control disorders were independently associated with suicidal ideation (SI), suicide plans, and SA (Torres et al., 2011). Our finding, obtained with machine learning analysis, that IED comorbidity in patients with OCD may predict SA reinforces the relationship between these conditions. The association of IED with suicidality has been found in other studies, independently of OCD comorbidity (Gelegen & Tamam, 2018; Lee et al., 2007). A community-based survey found that impulse-control disorders were strongly associated with SA (Lee et al., 2007). Likewise, in a cross-sectional study, the lifetime prevalence of IED was 2.7 times higher among those with lifetime SA (Gelegen & Tamam, 2018). We hypothesize that impulsive and disruptive behavior can contribute to the association with suicidal phenomena in this group of patients. In our sample, the lifetime prevalence of IED (7.4%) was higher than the 12-month prevalence in the USA (2.7%) (American Psychiatric Association, 2013), which may have impacted our results.
Psychopathological aspects of suicidality as predictive factors of SA in patients with OCD
In our study, the risk of SA in OCD could be predicted with high accuracy by SI and suicide plans, which together accounted for about 85% of the prediction of SA. As expected, the results corroborate that SI, SA, and completed suicide represent different aspects of the same psychopathological phenomenon: the spectrum of suicidality. In two other studies by our group, suicidality in patients with OCD also showed a continuum of progressive severity, evolving from ideation to planning and attempting suicide, and culminating in suicide itself (Torres et al., 2011; Velloso et al., 2016).
According to the systematic review by Albert et al. (2018), individuals with OCD had odds ratios for lifetime SI ranging from 1.9 to 10.3 compared with the general population. The risk remained significant even after controlling for confounding variables (adjusted odds ratios between 3.8 and 5.58). In a prospective population-based study (36 000 patients with OCD), Fernández de la Cruz et al. (2016) reported that most patients with prior SA had at least one comorbid psychiatric disorder, while about 40% of those who died by suicide had OCD without comorbidities. Thus, the authors concluded that OCD without comorbidity is more associated with death by suicide and OCD with comorbidities is more associated with SA. After controlling for the impact of different comorbidities, the risk remained significant, which would reinforce the idea that OCD alone poses a higher risk of suicidal behavior, in contrast to our findings.
In the present study, SA in patients with OCD was not predicted by characteristics of the disorder itself, but by precursors of SA (suicidal thoughts and planning) and two comorbidities: major depression and IED. This finding seems to diverge from previous studies, in which OCD was associated with suicidal behavior even after controlling for confounding factors; the difference may be due to the use of a different analytic methodology. ML algorithms are best suited to scenarios where numerous variables must be considered simultaneously to estimate the probability of an event occurring. ML techniques do not discard small effects when making predictions or identifying patterns (Mwangi, Ebmeier, Matthews, & Steele, 2012; Perlis, 2013).
Machine learning for prediction of suicidality and future directions
Standard research, including work on suicide in OCD, has relied on traditional statistical approaches that explore linear relationships between variables in group-level data. In this context, machine learning approaches can be advantageous and have increasingly been used in prognostic psychiatry, as they can accommodate complex relationships between variables, including non-linear patterns, and are focused on the individual patient level (Dwyer, Falkai, & Koutsouleris, 2018). In the last 6 years, accordingly, several studies have proposed machine learning models to predict SA in the general population. A systematic review and simulation study of suicide prediction models in the general population, however, showed that machine learning for suicide prediction had good classification accuracy but low positive predictive value (PPV) (Belsher et al., 2019). A low PPV is to be expected for models of rare events, such as suicide in the general population. Accordingly, we suggest that future studies should address specific populations with higher rates of SA, such as patients with OCD or depressive disorders.
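The dependence of PPV on the base rate of the outcome follows directly from Bayes' rule. The minimal sketch below illustrates this; the sensitivity, specificity, and prevalence values are hypothetical numbers chosen for illustration and are not taken from Belsher et al. or from our sample.

```python
def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """PPV via Bayes' rule: probability of the outcome given a positive prediction."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1.0 - specificity) * (1.0 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical classifier performance (assumed values, for illustration only).
SENSITIVITY = 0.80
SPECIFICITY = 0.90

# Same classifier applied to a rare outcome and to a higher-risk group.
for label, prevalence in [("rare outcome, 0.1% base rate", 0.001),
                          ("higher-risk group, 10% base rate", 0.10)]:
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"{label}: PPV = {ppv:.3f}")
```

With these illustrative inputs, the same classifier yields a PPV below 1% when the outcome is rare and roughly 47% in the higher-risk group, which is the rationale for studying populations with higher rates of SA.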
Other studies have evaluated suicide outcomes in samples of patients with specific psychiatric disorders. Passos et al. (2016) identified a clinical signature using a relevance vector machine to classify lifetime SA history in patients diagnosed with major depression or bipolar disorder. The most accurate model showed an AUC of 0.77, sensitivity of 0.72, and specificity of 0.71. The most relevant predictor variables were previous hospitalizations for depression, lifetime history of psychosis, cocaine dependence, and comorbidity with post-traumatic stress disorder (Passos et al., 2016).
Another study (Oh, Yun, Hwang, & Chae, 2017) built artificial neural network models to identify SA within 1 month, within 1 year, and over the lifetime among individuals with depression and anxiety disorders, using self-reported psychiatric questionnaires. This report found an overall accuracy of 93.7% for 1-month, 90.8% for 1-year, and 87.4% for lifetime SA detection. In addition, a cross-sectional study developed four machine learning models to estimate the probability that a patient with schizophrenia would attempt suicide. The best-performing model achieved an AUC of 0.71 (Hettige et al., 2017).
Lastly, we hypothesize that future studies integrating data from multiple levels, such as genetics, digital health, and socio-environmental measures, in specific populations could help to build more accurate models.
Limitations
The limitations of this study include the clinically referred nature of the sample, which may therefore not be representative of patients with OCD in the community. It is a cross-sectional study, subject to recall bias and therefore not ideal for the identification of predictors. With respect to the machine learning analysis, the elastic net has some important limitations. It does not capture non-linear relationships between the outcome and the predictors, which may be present in a complex phenomenon like SA. Elastic net also does not model relationships (interactions) between predictor variables, as it relies on the same principles as traditional regression, albeit with improved out-of-sample generalization.
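For reference, the linearity underlying these limitations can be seen in the elastic net objective. The expression below is one common parameterization for a binary outcome; the logistic form and the symbols (coefficients β, overall penalty strength λ, mixing weight α between the lasso and ridge penalties) are assumptions made for illustration, and the exact form used by a given software package may differ.

```latex
\hat{\beta} = \arg\min_{\beta_0,\,\beta}\;
  -\frac{1}{n}\sum_{i=1}^{n}\left[\, y_i\,\eta_i - \log\!\left(1 + e^{\eta_i}\right) \right]
  + \lambda\left[\, \alpha\,\lVert\beta\rVert_1 + \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 \right],
\qquad \eta_i = \beta_0 + x_i^{\top}\beta
```

Because the linear predictor η_i is a weighted sum of the predictors, non-linear effects and interactions enter the model only if they are added explicitly as additional columns of x_i.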
Since the database was not specifically constructed to study the suicidality spectrum, a specific and validated questionnaire was not applied, which directly affects the premise of the present report. Moreover, aspects such as duration, variability, and persistence of the suicidal behaviors were not measured. In addition, we did not control for potential triggering variables at the time of SA, nor for the influence of OCD symptom severity and psychiatric comorbidities in leading to SA. A subset of our sample had previously been evaluated in two other studies by our group, which could be argued to generate circular knowledge; however, the methodology applied here differs fundamentally, confirming some results but yielding others that diverge from the previous literature. Finally, the predictor variables included in the algorithm were defined based on previous studies and may not ideally represent all possible predictors.
Conclusions
This is the first study to use an ML algorithm to explore the predictors of individual risk of SA in patients with OCD. Our study adds to the evidence that suicidality is a relevant phenomenon in patients with OCD and demonstrates that an ML algorithm can predict SA among these patients. Our results also indicate that suicidality appears as a continuum and that it is imperative to actively investigate manifestations of suicidality in all patients with OCD, and even more intensively among those with comorbid depressive symptoms. ML techniques are well suited to identifying patterns across multiple variables with complex combinations and can be used to stratify the risk of SA for a given person by comparing the probability of SA that the algorithm predicts from that person's variables with probabilities calculated from a dataset of subjects with known outcomes.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/S0033291720002329.
Acknowledgements
We thank Dr James Leckman for sharing unpublished findings with our team, which were used to evaluate the history of stressful events in patients with OCD.
Financial support
The present work was carried out with the support of the Coordination for the Improvement of Higher Education Personnel – Brazil (CAPES) – Finance Code 001 [Ygor Arzeno Ferrão and Neusa Agne were fellowship holders of the National Research Council (CNPq)].
Conflict of interest
The authors declare that there is no conflict of interest regarding the publication of this article.