Introduction
Personality disorder (PD) diagnoses have an important public health impact as they predict increased utilization of medical and mental health care services (Twomey et al., 2015; Tyrer et al., 2015; Huprich, 2018). Studies using structured diagnostic interviews have identified a PD diagnosis in 40–82% of psychiatric outpatient populations (Zimmerman et al., 2005; Newton-Howes et al., 2010; Beckwith et al., 2014) and in 64–74% of psychiatric inpatient populations (Grilo et al., 1998; Keown et al., 2005; Stevenson et al., 2011), further increasing utilization in these settings (Twomey et al., 2015).
The variability in these prevalence estimates points to the challenge of studying PDs in real-world settings. Despite heavy use of health care resources, high rates of polypharmacy and hospital admissions (Quirk et al., 2016), and the associated economic burden (Soeteman et al., 2008), evaluating personality dimensions is still not part of routine assessment in psychiatric inpatient units (Fok et al., 2014; Jacobs et al., 2015). Likewise, in administrative data sets, PDs may not be coded consistently, or may be treated as a single undifferentiated category (Jiménez et al., 2004; McLay et al., 2005; Compton et al., 2006; Jacobs et al., 2015; Newman et al., 2018). On the other hand, the current categorical diagnosis for PDs has been questioned as not scientifically valid, while PD clinical features are being increasingly understood as dimensional phenotypes (Bjelland et al., 2009; Haslam et al., 2012; Skodol, 2012; Tyrer et al., 2015). Accordingly, the DSM-5 and ICD-11 have both moved toward dimensional models of PD (Bach et al., 2018a, b), although these models remain to be studied. Novel approaches to explore personality dimensions in psychiatric cohorts are needed (Quirk et al., 2016).
To address this gap, we applied natural language processing (NLP) of electronic health records (EHRs) to characterize a large inpatient psychiatric cohort (Manning and Schütze, 1999). We hypothesized that EHR notes would capture relevant clinical descriptions as unstructured data, quantifiable by validated algorithmic tools that have been previously used for medical (Yu et al., 2014; Yim et al., 2016) and mental health research (Althoff et al., 2016; Can et al., 2016; McCoy et al., 2016; Birnie et al., 2018; McCoy et al., 2018; Afshar et al., 2019). In particular, we examined the relationship between these dimensions and sociodemographic and clinical features, as a means of more comprehensively characterizing personality psychopathology in a real-world setting.
Methods
Subjects
Sociodemographic and clinical data were extracted from the health records of patients in the adult psychiatry inpatient unit at Massachusetts General Hospital between 2010 and 2016. Sociodemographic data included age, sex, race, and type of insurance, as well as relevant clinical factors such as admission route (i.e. either via the emergency room or not), length of stay, and Charlson Comorbidity Index. Admission and discharge documentation were extracted for estimation of personality trait domains by NLP. These EHR data were managed as an i2b2 datamart (Murphy et al., 2010).
The Partners HealthCare Human Research Committee approved the study protocol, waiving the requirement for informed consent as detailed by 45 CFR 46.116 as no participant contact was required in this study based on secondary use of data arising from routine clinical care.
Generation of personality phenotypes
Building on our prior work in transdiagnostic psychiatric phenotypes, we developed personality-specific transdiagnostic phenotypes based on NLP (McCoy et al., 2018). This process seeds an NLP model using expert-defined, or curated, terms. As with our prior work, we consulted relevant texts to guide phenotypic seed term generation; in this case, the DSM-5 and ICD-11. The DSM-5 (section III) (American Psychiatric Association, 2013) and ICD-11 (Tyrer et al., 2015; Bach and First, 2018) assess PDs based on determining levels of functioning/impairment and stylistic traits organized in personality dimensions. In the DSM-5, these dimensions are Negative Affectivity, Detachment, Antagonism, Disinhibition, and Psychoticism. The ICD-11 includes the same dimensions, except Psychoticism, and adds Anankastia (or Compulsivity) as a new dimension. Definitions of overlapping dimensions are similar between the DSM-5 and ICD-11 (Bach et al., 2018b). These extracted trait domain definitions, according to Skodol (2018) and Tyrer et al. (2015), are shown in Table 1 along with the examples of personality features that comprise these dimensions. These DSM-5 and ICD-11 derived terms were then expanded using the Personality Inventory for DSM-5 items (Krueger et al., 2012), other personality trait studies (Ashton et al., 2004, 2012; Bach et al., 2018a, 2018b), and a thesaurus (Dictionary.com, LLC, 2019).
From the generated synonym list, a clinically refined set of NLP seed terms was selected based on expert consensus (S.A.B., R.H.P.; Table 1).
Table 1. Personality trait domains in DSM-5 and ICD-11
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201014124104747-0365:S0033291719002320:S0033291719002320_tab1.png?pub-status=live)
As these pre-selected term lists are unlikely to capture the full diversity of clinical vocabulary, we applied a previously reported method for expanding clinical vocabularies (McCoy et al., 2018). In this method, latent Dirichlet allocation (LDA) is used to fit a probabilistic topic model to all documents. Topic loadings have been used as LDA-determined phenotypes for computational phenotyping, as discussed in our prior research (McCoy et al., 2017, 2018). Briefly, with an LDA-based topic model, documents are probability distributions over topics, and each topic is a probability distribution over the full vocabulary (Blei et al., 2003; Blei, 2012). The posterior term-topic distributions are inspected to identify the topic under which the cumulative probability of the expert-selected personality tokens within each list is greatest. This total cumulative probability of the seed word list is used to identify the relevant topic. Thereafter, that topic's topic-document weights are used as the phenotype for the relevant domain. In essence, this approach asks which LDA topics capture the greatest number of curated tokens for a given PD, and then uses the ‘best’ topic to represent that disorder. The tokens (terms) incorporated in topics corresponding to each concept are listed in Table 1, and the entire process is outlined in Fig. 1. For the topic modeling, we used the R interface to a Gibbs sampler implementation of LDA (topicmodels v0.2), one of many widely used open source implementations of LDA licensed under free software licenses (McCallum, 2002; Řehůřek and Sojka, 2010; Grün and Hornik, 2018).
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201014124104747-0365:S0033291719002320:S0033291719002320_fig1.png?pub-status=live)
Fig. 1. Diagram of personality phenotype generation through the transfer of a human expert language model into a model learned through unsupervised machine learning. A probabilistic topic model is learned from patients’ clinical documentation through LDA. The learned topics are then matched to the personality symptom domains by linking each domain to the learned topic under which its expert-identified tokens are most common. Thereafter, the linked topics are used as the phenotype for the linked personality domain.
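The topic-matching step described above can be illustrated with a minimal sketch. It is not the authors' implementation (which used the R package topicmodels); it assumes a topic model has already been fitted and uses made-up probabilities, seed lists, and note identifiers. The names `topic_term`, `doc_topic`, `seed_terms`, `best_topic`, and `phenotype_scores` are hypothetical.

```python
# Toy illustration of seed-term topic matching; all values are invented.

# Hypothetical term-topic probabilities from a fitted LDA model:
# topic_term[k][w] = P(term w | topic k).
topic_term = [
    {"impulsive": 0.30, "reckless": 0.25, "withdrawn": 0.05,
     "isolated": 0.05, "anxious": 0.35},
    {"impulsive": 0.05, "reckless": 0.05, "withdrawn": 0.40,
     "isolated": 0.35, "anxious": 0.15},
]

# Hypothetical document-topic weights: doc_topic[d][k] = P(topic k | document d).
doc_topic = {
    "note_1": [0.8, 0.2],
    "note_2": [0.1, 0.9],
}

# Hypothetical expert-curated seed lists, one per personality domain.
seed_terms = {
    "disinhibition": ["impulsive", "reckless"],
    "detachment": ["withdrawn", "isolated"],
}


def best_topic(seeds, topic_term):
    """Return the index of the topic with the largest summed seed-term probability."""
    mass = [sum(dist.get(term, 0.0) for term in seeds) for dist in topic_term]
    return max(range(len(mass)), key=mass.__getitem__)


def phenotype_scores(seed_terms, topic_term, doc_topic):
    """Link each domain to its best topic, then read off each document's weight on it."""
    scores = {}
    for domain, seeds in seed_terms.items():
        k = best_topic(seeds, topic_term)
        scores[domain] = {doc: weights[k] for doc, weights in doc_topic.items()}
    return scores


scores = phenotype_scores(seed_terms, topic_term, doc_topic)
print(scores)
```

In this toy setup the disinhibition seeds carry most of their probability mass under the first topic and the detachment seeds under the second, so each note's weight on the linked topic serves as its domain score.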
Study design and analysis
We used robust clustering to account for individuals with multiple admissions. Linear regression modeling adjusting for sex, age, race, insurance type, Charlson Comorbidity Index, and route of admission was used to analyze personality domain loadings in different sociodemographic profiles. Linear regression adjusting for these sociodemographic variables, as well as for other personality trait domains, was used to explore the association between personality trait domains and hospital length of stay. Analyses utilized Stata/SE 13.1 (StataCorp, College Station, TX, USA).
Results
Characteristics of the full set of 4702 admissions for 3623 individuals are displayed in Table 2. Individual personality trait domains differed in their association with sociodemographic features (Table 3). Being male, non-white, having a low burden of medical comorbidity, being admitted through the emergency room, and having public insurance were independently associated with higher levels of disinhibition, detachment, and psychoticism. On the other hand, being female, white, and using private insurance were independently associated with increased levels of negative affectivity. Age was also associated with personality features: on average, patients with increased levels of disinhibition and psychoticism were younger, while patients with more negative affectivity were older.
Table 2. Cohort characteristics at admission
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201014124104747-0365:S0033291719002320:S0033291719002320_tab2.png?pub-status=live)
Table 3. Association between sociodemographic features and personality trait domains§
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201014124104747-0365:S0033291719002320:S0033291719002320_tab3.png?pub-status=live)
CI, confidence interval; ER, emergency room.
a. β (95% confidence interval) is equal to the variation (and its 95% CI) in days of length of stay, if the named personality domain score increased/decreased by 10%.
*p < 0.05; **p < 0.01; ***p < 0.001.
We next examined the association between personality trait domains extracted from clinical notes and length of inpatient stay. As shown in Table 4, the presence of disinhibition, psychoticism, and negative affectivity was significantly associated with a longer length of stay. In contrast, detachment was associated with a shorter length of stay. A 10% increase in the disinhibition domain score was associated with a ~2.7-day increase in length of stay. Similarly, a 10% increase in the psychoticism and negative affectivity domain scores was associated with an increase in length of stay of ~0.8 and ~0.7 days, respectively. On the other hand, having a 10% increase in detachment features was associated with a decreased length of stay by nearly 0.3 days.
Table 4. Regression model of personality trait domains and hospital length of stay (n = 4687 admissions)
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20201014124104747-0365:S0033291719002320:S0033291719002320_tab4.png?pub-status=live)
ER, emergency room.
a. β (95% confidence interval) is equal to the variation (and its 95% CI) in days of length of stay, if the named personality trait domain score increased/decreased by 10%.
Discussion
As anticipated based on studies using traditional personality measures, we observed an association between sociodemographic features and individual personality trait domains (Lynn and Martin, 1997; Kjelsås and Augestad, 2004). Demographic profiles are useful to predict certain behaviors (Krismayer et al., 2019), but their relationship with dimensional traits is less studied (Al-Halabí et al., 2010).
In particular, we found that greater scores in disinhibition, negative affectivity, and psychoticism were associated with a significantly longer length of stay, while a greater score in detachment was associated with a decreased length of stay. One way to interpret the effect sizes we observed is to compare our results to the US national average length of stay in inpatient psychiatric units, which is 6.6 days (Heslin et al., 2015). According to our results, an increase of 10% in the disinhibition dimension score may increase inpatient length of stay by 40% when compared to the national average. Likewise, patients scoring 10% higher in either psychoticism or negative affectivity may have a length of stay about 12% longer than the national average. Conversely, an increase of 10% in the detachment dimension score may decrease length of stay by 6% when compared to the national average. Given these results, personality may be a relevant factor to consider in terms of length of stay in the psychiatric inpatient setting.
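The comparison with the national average is a simple ratio: the change in days divided by the 6.6-day baseline. The sketch below works through that arithmetic using the approximate day values quoted in the Results text; these are rounded figures rather than the exact Table 4 estimates, so the output reproduces the percentages stated above only approximately. The names `change_in_days` and `relative_change_pct` are illustrative.

```python
NATIONAL_AVERAGE_LOS_DAYS = 6.6  # US national average cited in the text

# Approximate change in length of stay (days) per 10% increase in domain score,
# as rounded in the Results text (assumption: not the exact Table 4 estimates).
change_in_days = {
    "disinhibition": 2.7,
    "psychoticism": 0.8,
    "negative affectivity": 0.7,
    "detachment": -0.3,
}


def relative_change_pct(days, baseline=NATIONAL_AVERAGE_LOS_DAYS):
    """Express a change in days as a percentage of the baseline length of stay."""
    return 100.0 * days / baseline


for domain, days in change_in_days.items():
    pct = relative_change_pct(days)
    print(f"{domain}: {days:+.1f} days = {pct:+.1f}% of the national average")
```

With these rounded inputs, disinhibition works out to roughly 41% and psychoticism to roughly 12% of the baseline, while negative affectivity and detachment come out somewhat below the figures quoted in the text, which presumably rest on unrounded coefficients.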
While there is no doubt that PDs in general are associated with an increase in mental health services use in the outpatient setting (Twomey et al., 2015; Tyrer et al., 2015), this relationship has been less clear in terms of psychiatric inpatient services use. In contrast to our results, several epidemiological studies (Jacobs et al., 2015; Piccinelli et al., 2016; Pauselli et al., 2017; Newman et al., 2018) and service use studies (Jiménez et al., 2004; McLay et al., 2005; Compton et al., 2006; Leontieva and Gregory, 2013; Habermeyer et al., 2018) have shown that PDs do not necessarily increase, and may even shorten, length of stay. Consequently, personality may have been overlooked as an addressable factor in efforts to optimize services use. Only a few studies have found that personality was associated with an increased length of stay (Tyrer and Simmonds, 2003; Fok et al., 2014). However, neither of these studies explored which personality traits or diagnoses were associated with this outcome.
The only prior study we identified that similarly investigated the association between different personality types and use of services in the psychiatric inpatient setting is Keown et al. (2005). This study considered a cohort of 193 patients from a community served by a mental health team in the UK, who were assessed using a structured interview, diagnosed according to the ICD-10, and followed over a 4-year period. Keown et al. found that among non-psychotic patients, having paranoid, dependent, and emotionally unstable PD was associated with an increased length of stay by 150 days in the 4-year period when a patient had one PD, and up to 321 days for patients who had two or all of these PDs. Among psychotic patients, length of stay was associated with having more paranoid and anxious traits. Conversely, in the latter group of psychotic patients, the presence of anankastic traits was associated with a shorter length of stay.
The results of the Keown et al. study are in line with those from our study, since there is evidence of a correspondence between unstable personality and disinhibition, between paranoid personality and psychoticism, and between anxiety/dependence and negative affectivity (Skodol, 2018). On the other hand, unlike the Keown et al. study, we found that detachment, and not anankastic traits, was associated with a shorter length of stay. Interpersonal distance and restriction in the expression of affect may be associated with diminished expression of need for care, so when behavioral symptoms remit, these patients may be more likely to be discharged. However, the anankastia domain also shows a correlation with detachment (Skodol, 2018), ranging from 0.46 (Bach et al., 2018b) to 0.79 (Lugo et al., 2019).
Limitations
There are several limitations of our study to be considered. Extracting personality trait domains from EHR notes of psychiatry inpatients is limited by the fact that topics identified by the NLP process may account for state-related symptoms in the context of acute psychiatric syndromes like depression or psychosis, and not for stable personality traits. However, some studies show that trait assessments established during acute episodes (e.g. a major depressive episode) may be valid reflections of personality pathology rather than artifacts of symptomatic state (Morey et al., 2010; Sevilla-Llewellyn-Jones et al., 2017). Another alternative is that personality may itself influence symptom expression, and hence a clinical feature may be an expression of both symptoms and traits (von Gunten et al., 2009; Widiger, 2011). Personality dimensions and common psychiatric disorders also covary (Wright and Simms, 2015) and may be part of spectra, that is, larger constellations of syndromes sharing some common features (Kotov et al., 2017). The approach taken here does not distinguish trait from state effects, but it may still capture relevant clinical features at a given point in time.
Conversely, this study does address several key limitations in the prior evidence base. First, personality diagnosis tends to be overlooked by clinicians; some studies indicate that PD prevalence may be underestimated in psychiatry inpatient settings (Fok et al., 2014; Jacobs et al., 2015), especially in the absence of structured assessments (Zimmerman et al., 2008; Leontieva and Gregory, 2013; Newman et al., 2018). Second, when using only a clinical diagnostic approach, there may be variation in which disorders are more likely to be diagnosed and which may be overlooked. This may be based on factors such as symptom severity, expectation of response to treatment, or familiarity with particular PD diagnoses (Zimmerman and Morgan, 2013; Zimmerman, 2016). Finally, most prior personality studies have used a categorical diagnostic approach for PD diagnosis, which has been criticized for its questionable validity (Haslam et al., 2012; Skodol, 2012; Tyrer et al., 2015).
To address these limitations, we used NLP and machine learning as a novel method to overcome underdiagnosis, selective diagnosis, and lack of characterization of personality in the inpatient setting. In particular, our methodology allows access to extensive clinical information, deliberations, and clinicians' judgment that may not be reflected in coded diagnoses. Likewise, we used a dimensional model to assess personality, in contrast to previous studies that used a categorical approach. This method may account more realistically for specific and clinically significant personality features in the inpatient setting.
Conclusion
In aggregate, our study suggests that personality features can be systematically and scalably measured using NLP in the inpatient setting, and that these features may relevantly contribute to service utilization. Developing treatment strategies for patients scoring high in PD features may facilitate more efficient, targeted interventions, and may help reduce the impact on mental health service utilization.
Author contributions
Drs. Barroilhet and Perlis designed the study, analyzed the results, and drafted the manuscript. Dr. McCoy developed the natural language processing methodology and revised the manuscript. Ms. Pellegrini revised the manuscript. All authors have given final approval of this submission.
Financial support
This work was supported by the National Institute of Mental Health (R.H.P., grant number 1R01MH106577). The funder had no role in the design and conduct of the study; collection, management, analysis, and interpretation of the data; preparation, review, or approval of the manuscript; or decision to submit the manuscript for publication.
Conflict of interest
Dr. Barroilhet and Ms. Pellegrini report no conflicts of interest. Dr. McCoy receives research funding from the Brain and Behavior Research Foundation, National Institute of Aging, Telefonica Alfa, and The Stanley Center at the Broad Institute. Dr. Perlis holds equity in Psy Therapeutics and Outermost Therapeutics; serves on the scientific advisory boards of Genomind and Takeda; and consults to RID Ventures. Dr. Perlis receives research funding from NIMH, NHLBI, NHGRI, and Telefonica Alfa. Dr. Perlis is an associate editor for JAMA-Network Open.
Ethical standards
The authors assert that all procedures contributing to this work comply with the ethical standards of the relevant national and institutional committees on human experimentation and with the Helsinki Declaration of 1975, as revised in 2008. The study protocol has been approved by the Partners HealthCare Human Research Committee (protocol number 2016P002084). The requirement for informed consent was waived as detailed by 45 CFR 46.116 since no participant contact was required in this study based on secondary use of data arising from routine clinical care.