Background
The prognosis for mental illness requiring hospital treatment is often poor. Specifically, a substantial proportion of patients with mental illness i) develop physical comorbidities such as type 2 diabetes and cardiovascular disease (Katon, 2008; Momen et al., 2020), ii) progress towards more severe psychopathology, for example, from unipolar depression to bipolar disorder or schizophrenia (Musliner et al., 2017; Musliner & Østergaard, 2018), iii) are resistant to first-line treatments (Rush et al., 2006; Elkis & Buckley, 2016), iv) develop substance abuse (Prochaska et al., 2005; Messer et al., 2017), v) experience side effects of psychopharmacological treatment (Rothschild et al., 2007; Goldberg & Ernst, 2016), vi) are sentenced to treatment in forensic psychiatric settings (Tomlin et al., 2021), vii) require involuntary treatment (Rains et al., 2019; Salagre et al., 2020), or viii) self-harm or attempt suicide (Qin & Nordentoft, 2005; Leadholm et al., 2014). As a consequence, patients with mental disorders generally experience reduced quality of life and a shortened lifespan (Tiihonen et al., 2009; Lawrence et al., 2013; Laursen et al., 2016; Tanskanen et al., 2018; Plana-Ripoll et al., 2019a, 2019b; Desalegn et al., 2020). This imposes huge costs at both the individual and societal level and demands intervention (Thornicroft, 2013; Saxena, 2018).
Because not all patients with mental disorders experience the above-mentioned negative outcomes, targeted intervention is helpful to address problems at an early stage and reduce costs (Offord, 2000). To allow for such targeting, identification of elevated risk (outcome prediction) at the level of the individual is required (Collins & Varmus, 2015). However, as the risk/etiological factors contributing to the outcomes in question are multiple and likely interact, such prediction is not trivial (Mandelli & Serretti, 2013). To overcome this challenge, the emerging field of precision psychiatry aims to integrate data from several sources to produce personalised predictions of a diagnostic, prognostic, or predictive nature (Salazar de Pablo et al., 2020).
The emergence of precision psychiatry has been aided by advances in machine learning methods, mainly within the fields of deep learning and natural language processing (NLP). Deep learning, which applies non-linear processing to learn multiple layers of increasingly complex representations of data, has been shown to be extremely effective at a wide range of tasks (LeCun et al., 2015). For instance, deep learning models have defeated the human world champion at the game of Go (Silver et al., 2016), can generate convincing synthetic news stories (Brown et al., 2020), and have, in medical contexts, been used to predict specific outcomes such as diabetic retinopathy (Gulshan et al., 2016), severe diabetes (Miotto et al., 2016), readmissions (Rajkomar et al., 2018b), and in-hospital mortality (Rajkomar et al., 2018b). NLP deals with natural-language text and has made great strides since the introduction of the transformer neural network architecture (Vaswani et al., 2017) and Bidirectional Encoder Representations from Transformers (BERT) (Devlin et al., 2019). A very sizeable proportion of the clinical information in electronic health records (EHRs) in psychiatry, more so than in most other medical specialties, consists of clinical notes in natural language, and NLP may therefore be particularly useful for outcome prediction in this specific field (Rumshisky et al., 2016). Clinical notes offer practitioners a large degree of flexibility and freedom in describing the individual patients and thus provide more diverse and individual-specific insights than do structured data. For instance, early signs of diagnostic drift or treatment efficacy/resistance might be present in clinical notes, but due to time constraints and the subtlety of the information, clinicians might struggle to leverage it. Also, while conventional statistical methods are not suited for dealing with large amounts of text, methods from NLP have shown great potential in using clinical notes for predicting psychiatric readmission (Rumshisky et al., 2016; Boag et al., 2021) and diagnostic classification (Li et al., 2020). Despite increased interest and new developments in machine learning methods, however, precision psychiatry has yet to prove its value for clinical practice (Manchia et al., 2020).
Indeed, a recent systematic review of 584 prediction modelling studies in psychiatry found that the majority of studies lacked proper validation, and only a single study had resulted in implementation in clinical practice (Salazar de Pablo et al., 2020). Without appropriate validation, the performance of models on other data sets, and hence their generalisability, cannot be assessed, which makes them unsuitable for clinical implementation. For precision psychiatry to fulfil its promises, prediction modelling studies must place an explicit focus on model validation and clinical feasibility.
Using data from EHRs from the Central Denmark Region, we recently conducted a machine learning study that applied basic NLP methods to predict incident mechanical restraint of psychiatric inpatients (Danielsen et al., 2019). By using information from clinical notes readily available in the EHR, our model achieved an area under the receiver operating characteristic curve of 0.87, and a slightly modified version of this model is currently being implemented in clinical practice. Seeking to capitalise on these promising results and on more advanced methodologies, with the ultimate aim of improving a broad range of outcomes for individuals requiring hospital-based psychiatric treatment, we have established the PSYchiatric Clinical Outcome Prediction (PSYCOP) cohort. The aim of this paper is to describe this cohort, the planned analyses/studies, and the expected outcomes.
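For readers less familiar with the metric quoted above, the area under the receiver operating characteristic curve can be read as the probability that a randomly chosen patient with the outcome receives a higher risk score than a randomly chosen patient without it. The following is a minimal sketch of that pairwise definition on made-up labels and scores; the function name `auroc` and the toy data are illustrative assumptions and are not taken from the cited study.

```python
def auroc(y_true, scores):
    """Pairwise (Mann-Whitney) estimate of the area under the ROC curve.

    y_true: binary labels (1 = outcome occurred, 0 = it did not).
    scores: model risk scores, higher meaning higher predicted risk.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    # Each positive/negative pair counts 1 if ranked correctly, 0.5 if tied.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: three of the four positive/negative pairs are ranked correctly.
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.5, 0.1]
toy_auc = auroc(toy_labels, toy_scores)
```

The quadratic pairwise loop is chosen for readability only; for cohort-sized data a rank-based implementation from an established library would be the more practical choice.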
Methods
The PSYCOP cohort
The PSYCOP cohort consists of all individuals having at least one contact (emergency room, outpatient, or inpatient) with the psychiatric services of the Central Denmark Region from January 1, 2011 (initiation of the current EHR system (MidtEPJ)) to October 28, 2020, and covers 119 292 unique patients with a total observation time (days from first contact to last contact) of 231 262 years (median = 312 days, 25% quantile = 67 days, 75% quantile = 1073 days). The Central Denmark Region is one of five Danish Regions and has a population of approximately 1.3 million people. The psychiatric services of the Central Denmark Region consist of five psychiatric hospitals, which provide tax-supported (free) emergency room, outpatient and inpatient treatment to the inhabitants of the region.
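The observation-time summary above (days from first to last contact, reported as a total, a median, and quartiles) could be derived along the following lines. This is only a sketch on invented contact dates; the `contacts` mapping and the function name are hypothetical and do not reflect the actual MidtEPJ data model, and the quartile convention (`method="inclusive"`) is an assumption, since the paper does not state which one was used.

```python
from datetime import date
from statistics import quantiles


def observation_days(contact_dates):
    """Days between a patient's first and last contact (0 for a single contact)."""
    return (max(contact_dates) - min(contact_dates)).days


# Hypothetical toy data: patient identifier -> dates of psychiatric contacts.
contacts = {
    "p1": [date(2011, 1, 1), date(2011, 3, 1)],
    "p2": [date(2015, 6, 1)],
    "p3": [date(2012, 1, 1), date(2013, 1, 1), date(2014, 1, 1)],
    "p4": [date(2019, 5, 1), date(2019, 5, 11)],
}

days = sorted(observation_days(d) for d in contacts.values())

# Quartiles of the per-patient observation time, in days.
q25, median, q75 = quantiles(days, n=4, method="inclusive")

# Total observation time across the cohort, converted to years.
total_years = sum(days) / 365.25
```

Because many patients have a single short contact, the distribution is strongly right-skewed, which is why the median and quartiles are more informative here than the mean.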
Table 1 shows the distribution of the patients from the PSYCOP cohort across the diagnostic categories from the International Classification of Diseases – Tenth Revision (ICD-10) (World Health Organization, 1992), stratified on age (<18 years and ≥18 years) and sex.
Table 1. Number of patients by main ICD-10 category, age and sex. Only the most severe diagnosis per patient is reported *
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20211105044055431-0577:S0924270821000223:S0924270821000223_tab1.png?pub-status=live)
* If a patient had multiple diagnoses, the one ranked first in the following hierarchy is reported: F00–F09 > F20–F29 > F30–F39 > F40–F48 > F50–F59 > F60–F69 > F70–F79 > F80–F89 > F90–F98 > F10–F19 > Others.
As evident from Table 1, the sample size will suffice for analyses of some, but not all, specific diagnostic categories.
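The hierarchy in the footnote to Table 1 amounts to a simple ranking rule: each diagnosis is mapped to its ICD-10 block, and the block that comes first in the hierarchy wins. A minimal sketch of that rule is given below; the function names are hypothetical, and the assumption that codes are stored in the form "F32.1" (letter, two digits, optional subcode) may not match how diagnoses are actually coded in the EHR.

```python
# ICD-10 blocks in the order given in the Table 1 footnote (reported first -> last).
HIERARCHY = [
    ("F00", "F09"), ("F20", "F29"), ("F30", "F39"), ("F40", "F48"),
    ("F50", "F59"), ("F60", "F69"), ("F70", "F79"), ("F80", "F89"),
    ("F90", "F98"), ("F10", "F19"),
]


def category_rank(icd10_code):
    """Position of a code's block in the hierarchy (0 = reported first)."""
    block = icd10_code[:3].upper()
    for rank, (start, end) in enumerate(HIERARCHY):
        # Three-character blocks such as "F32" compare correctly as strings.
        if start <= block <= end:
            return rank
    return len(HIERARCHY)  # everything else falls into "Others"


def reported_diagnosis(codes):
    """Pick the single diagnosis that would be reported for a patient."""
    return min(codes, key=category_rank)
```

For example, a patient with both "F10.2" and "F32.1" would be counted under F30–F39, because substance-related diagnoses (F10–F19) are placed near the bottom of the hierarchy.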
Fig. 1 shows the cohort members’ age at their first contact with the psychiatric services in the Central Denmark Region, including stratification on the major diagnostic categories.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20211105044055431-0577:S0924270821000223:S0924270821000223_fig1.png?pub-status=live)
Fig. 1. Age at first contact by sex. Bins with fewer than 5 patients are set to 0 for data privacy purposes. (A) Age at first contact for any mental disorder. (B) Age at first contact by selected ICD-10 categories.
Here, the expected overrepresentation of young boys (in part due to autism and ADHD) and teenage girls (in part due to depression, anxiety and eating disorders) is evident (Dalsgaard et al., 2020).
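The privacy rule stated in the caption of Fig. 1, under which small bins are set to zero before plotting, can be sketched as follows. The threshold of 5 comes from the caption; the bin width, the function name, and the toy ages are assumptions made for illustration only.

```python
from collections import Counter


def suppressed_histogram(ages, bin_width=5, min_count=5):
    """Count patients per age bin and set bins below min_count to 0.

    Keys are the lower edge of each bin. bin_width is an illustrative
    choice; the width used for the published figure is not stated.
    """
    counts = Counter((age // bin_width) * bin_width for age in ages)
    return {
        start: (n if n >= min_count else 0)
        for start, n in sorted(counts.items())
    }


# Hypothetical toy data: the 20-24 bin holds only three patients and is suppressed.
toy_ages = [10] * 6 + [22] * 3 + [40] * 5
toy_histogram = suppressed_histogram(toy_ages)
```

Suppressed bins are kept in the output with a count of zero rather than dropped, so the shape of the age axis stays intact in the plot.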
By October 28, 2020, a total of 11 412 patients were deceased (median age (years) = 80.4, 25% quantile = 66.9, and 75% quantile = 87.7), enabling investigation of mortality.
Electronic health record (EHR) data
Data from the MidtEPJ EHR system are mirrored on a daily basis (initiated each day at 5 pm and finished at approximately 6 am the following day) to servers managed by the Business Intelligence Office (BI-servers) in the Central Denmark Region. The data from these BI-servers are available for all patients in the PSYCOP cohort. The BI-servers include both structured/quantitative data and text from clinical notes. The structured data cover all lab results, height, weight, diagnoses registered in accordance with the ICD-10, information on administered medication (during inpatient treatment) and prescribed medication from 2016 and onwards (during both in- and outpatient treatment), and other treatments such as electroconvulsive therapy, repetitive transcranial magnetic stimulation, and psychotherapy. Data on medications include the time of prescription, the name of the medication, its Anatomical Therapeutic Chemical (ATC) code (World Health Organization, 1976), and the route of administration. For medication administered during inpatient stays, information on whether the patient accepted or rejected the treatment is also stored. For inpatients treated for depression and bipolar disorder, ratings on the Hamilton Depression Rating Scale (Hamilton, 1960) and the modified Bech–Rafaelsen Mania Scale (Straszek & Licht, 2019) are also available. Furthermore, the BI-servers contain structured metadata, such as the unique personal identification number, age, sex, the department responsible for the treatment, time of admission and discharge, and the times at which outpatient treatment courses begin and end. The clinical notes contain text in natural language (e.g. descriptions of observed symptoms, treatment plans, and conclusions). Each clinical note has a predefined heading (Sundhedsfagligt indhold (SFI) code) describing the content (e.g. ‘Suicide Risk Assessment’, ‘Subjective Mental State’, ‘Objective Mental State’, and ‘Social Functioning’), which is also stored in the BI-servers. Each data point in the BI-servers contains a timestamp that indicates when the data point was added to the MidtEPJ system, allowing for longitudinal and time-sensitive analyses. Notably, although the focus of the PSYCOP studies is on psychiatric outcomes, the EHR data on the PSYCOP cohort members are not limited to information from the psychiatric hospitals but cover contacts with all hospital departments in the Central Denmark Region.
Planned analyses
Data from the cohort will be employed to predict multiple outcomes using the most recent advances in machine learning along with conventional statistical and epidemiological methods. At present, studies predicting the following outcomes are planned: i) transition to schizophrenia or bipolar disorder, ii) suicide and suicide attempts/intentional self-harm, iii) premature death, iv) treatment response, v) admission (including involuntary admission), vi) transition from general psychiatry to forensic psychiatry, vii) type 2 diabetes, viii) cardiovascular disease, ix) obesity/dyslipidaemia, and x) cancer. Fig. 2 illustrates the overall workflow of the studies based on the PSYCOP cohort.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20211105044055431-0577:S0924270821000223:S0924270821000223_fig2.png?pub-status=live)
Fig. 2. The machine learning workflow. Curated data from the electronic health records are fed to machine learning models, which output predictions for multiple psychiatric outcomes.
Because data in EHRs are highly interacting and heterogeneous, we will primarily rely on deep learning methods. A first step will be to train a language model specifically suited for Danish EHRs. Text content in EHRs is riddled with technical jargon, abbreviations, and non-standard grammatical structures, which can impair the performance of conventionally trained language models. Indeed, a recent study found that a transformer-based language model specifically trained for EHRs outperformed other language models in disease prediction by 8–13% (Li et al., 2020). We expect to obtain similar results for Danish data sets and to have the model serve as the foundation for further studies.
Prediction models will be trained using clinical variables, such as medications and diagnoses, as well as free text from the clinical notes. Analysis of such disparate types of information calls for multiple specialised models. Thus, text will be modelled using transformer-based language models such as BERT (Devlin et al., 2019), XLM-RoBERTa (Conneau et al., 2020), T5 (Raffel et al., 2020), and the Danish EHR-specific model described in the preceding paragraph. Tabular data might be modelled using statistical and machine learning methods such as logistic regression and XGBoost (Chen & Guestrin, 2016) or jointly with the text data using, for example, multimodal transformer models or ensembles. The specific methodology will depend on the research question at hand and will be fine-tuned for each individual study. Code, models, and aggregated data will be shared to the extent that this is possible and lawful according to Danish legislation.
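One simple way of combining a text model with a tabular model, of the kind alluded to by "ensembles" above, is late fusion: each model produces its own predicted probability and the two are averaged with a weight. The sketch below illustrates that idea only; the function name, the weighting scheme, and the example probabilities are assumptions, and the paper does not commit to this or any other specific ensembling strategy.

```python
def late_fusion(p_text, p_tabular, weight_text=0.5):
    """Weighted average of per-patient probabilities from two models.

    p_text:      probabilities from a (hypothetical) text-based model.
    p_tabular:   probabilities from a (hypothetical) tabular model.
    weight_text: share of the final prediction given to the text model.
    """
    if len(p_text) != len(p_tabular):
        raise ValueError("both models must score the same patients")
    if not 0.0 <= weight_text <= 1.0:
        raise ValueError("weight_text must lie between 0 and 1")
    return [
        weight_text * a + (1.0 - weight_text) * b
        for a, b in zip(p_text, p_tabular)
    ]


# Hypothetical probabilities for two patients from each model.
fused = late_fusion([0.8, 0.2], [0.6, 0.4])
```

In practice the weight would itself be chosen on held-out validation data rather than fixed in advance, so that it does not inflate the reported performance.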
Hardware and software
Training deep learning models requires significant computational resources. Whereas most conventional machine learning methods can be run on a consumer-grade central processing unit (CPU), deep learning models require large graphics processing units (GPUs) to handle the highly parallel computations of neural networks efficiently (LeCun, 2019). To accommodate this, two NVIDIA A100 40 GB GPUs have been purchased for the studies based on the PSYCOP cohort. This state-of-the-art hardware setup will allow for fast and efficient training of the most recent deep learning models. The majority of the analyses will use the programming language Python (Van Rossum & Drake, 2009) due to the availability of several mature deep learning frameworks, such as PyTorch (Paszke et al., 2019) and TensorFlow (Abadi et al., 2016).
Approvals
The studies based on the PSYCOP cohort are approved by the Legal Office under the Central Denmark Region in accordance with the Danish Health Care Act §46, Section 2. According to the Danish Committee Act, ethical review board approval is not required for non-interventional studies. All data are processed and stored in accordance with the European Union General Data Protection Regulation.
Discussion
Individuals with mental illness are at high risk for a range of adverse outcomes. Anticipating such outcomes before they arise has the potential to improve prognoses by allowing early preventive measures to be taken. Deep learning holds promise for this task, but requires large amounts of data for training. As a step in this direction, the PSYCOP cohort will allow for large-scale prediction modelling studies based on fine-grained data from the real-world EHRs of almost 120 000 Danish patients with mental illness. This represents a unique opportunity to obtain novel and actionable insights into an array of clinical outcomes. Our prior study of mechanical restraint (Danielsen et al., 2019), which was based on a limited data set and employed classical machine learning methods, highlights this potential.
Expected outcomes
Based on the proposed analyses, we expect to gain new knowledge and models for several important clinical outcomes, such as premature mortality and development of psychiatric and medical comorbidities. We are optimistic that inclusion of all types of data from the EHR will allow us to leverage information otherwise “hidden” in the clinical notes for high-fidelity prediction and interpretation. Accordingly, the overarching goal of the studies based on the PSYCOP cohort is to obtain results that will facilitate the development of decision-support systems for identifying at-risk patients. Decision-support systems such as the previously mentioned mechanical restraint model (Danielsen et al., 2019) provide clinicians with the opportunity to anticipate problems before they arise, thereby enabling interventions and, hopefully, prevention. In other instances, knowing whether a patient is likely to transition from unipolar depression to schizophrenia, for example, can lead to earlier diagnosis and thereby improved prognosis (Patton et al., 2014). Fig. 3 illustrates how the developed prediction models may be implemented in clinical practice.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20211105044055431-0577:S0924270821000223:S0924270821000223_fig3.png?pub-status=live)
Fig. 3. Implementation in clinical practice. The machine learning models take data from electronic health records as input and provide predictions and recommendations for the clinical team.
One of the main challenges facing precision psychiatry is the limited implementation of prediction models (Salazar de Pablo et al., 2020), which is complicated by several factors (Chekroud & Koutsouleris, 2018). For example, patients or clinicians might object to recommendations from the model. Therefore, estimates of uncertainty and guidance for interpretation should accompany outputs, and sufficient infrastructure for producing and acting on model recommendations must be established (Chekroud & Koutsouleris, 2018; Manchia et al., 2020; Salazar de Pablo et al., 2020). The successful implementation of precision psychiatry models requires keeping these factors in mind from the outset. To facilitate the implementation of the created prediction models, the studies based on the PSYCOP cohort will be conducted in close collaboration with both the providers of the EHR system and clinicians. Further, models will only be trained using data that are available at inference time. This requirement, which might seem self-evident, is often neglected, for instance, when a model is trained with a fixed time to onset of the outcome, which tends to provide overly optimistic evaluations of performance and reduce clinical validity (Lauritsen et al., 2021).
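The requirement that models only see data available at inference time can be enforced mechanically by filtering on the timestamp at which each data point was recorded. The following is a minimal sketch of such point-in-time feature extraction; the event structure, the field names `type` and `recorded_at`, and the one-year lookback window are hypothetical choices and do not describe the actual PSYCOP pipeline.

```python
from datetime import datetime, timedelta


def features_at(events, prediction_time, lookback_days=365):
    """Count events per type using only data recorded before prediction_time.

    events: iterable of dicts with a 'type' and a 'recorded_at' timestamp.
    Anything recorded at or after prediction_time is ignored, so the
    features never contain information from the future.
    """
    window_start = prediction_time - timedelta(days=lookback_days)
    counts = {}
    for event in events:
        if window_start <= event["recorded_at"] < prediction_time:
            counts[event["type"]] = counts.get(event["type"], 0) + 1
    return counts


# Hypothetical toy record for one patient.
toy_events = [
    {"type": "admission", "recorded_at": datetime(2019, 1, 10)},
    {"type": "lab_result", "recorded_at": datetime(2019, 6, 1)},
    {"type": "lab_result", "recorded_at": datetime(2019, 6, 15)},
    {"type": "admission", "recorded_at": datetime(2019, 7, 2)},   # after prediction time
    {"type": "admission", "recorded_at": datetime(2017, 3, 1)},   # outside lookback window
]
toy_features = features_at(toy_events, prediction_time=datetime(2019, 7, 1))
```

Anchoring the features to a prediction time, rather than to a fixed interval before the outcome, mirrors how the model would be queried in clinical use, where the timing of a future outcome is by definition unknown.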
Beyond the potential clinical benefits of the studies based on the PSYCOP cohort, the creation of a Danish EHR-specific language model will scaffold future work in the field. The Southern Denmark Region and the Northern Denmark Region will in 2021 and 2022, respectively, transition to the same EHR system currently in use in the Central Denmark Region. This will allow work and models derived from the PSYCOP cohort to be validated on data from these two regions and thereby provide an estimate of model robustness to population changes and data set shifts (Subbaswamy et al., 2021), as well as the opportunity for broader implementation.
Ethical considerations
Developing and implementing precision medicine requires vigilance because the consequences of misuse or erroneous predictions can be severe. Using an erroneous prediction to guide treatment can lead to improper short-term care, while also negatively influencing or postponing later interventions (Starke et al., 2020). Further, though a model might produce predictions with high accuracy, human errors in interpretation and misuse remain possible. For example, a model might predict the risk of newly admitted patients being subjected to mechanical restraint within the first 72 h of their admission (Danielsen et al., 2019). If, on this basis, an admitted patient with a high risk score is subjected to “preventive” mechanical restraint, the result is undue and illegal coercion and stress for the patient, as opposed to extra attention and care. Though perhaps slightly exaggerated, this example highlights the need to design clinical decision support with human factors in mind (Beeler et al., 2014).
To ensure equitable health outcomes, steps must be taken to mitigate biases such as those related to race, sex, and socio-economic status. If the training data include biases, machine learning models will, unless explicit measures are taken, learn and potentially perpetuate them. For example, in a study in which a machine learning model was trained to distinguish between images of benign and malignant moles, the model achieved an accuracy equivalent to that of dermatologists (Esteva et al., 2017). However, because the training data mainly included images from fair-skinned populations, the model may underperform on patients of colour (Adamson & Smith, 2018). In psychiatry, it is likely that biased trends, such as the underdiagnosis of ADHD in females (Quinn & Madhoo, 2014) and different treatment-seeking patterns among ethnic minorities (Roberts et al., 2011), can lead to biased prediction models. Ensuring fairness and reducing bias in machine learning for healthcare are ongoing research endeavours (Rajkomar et al., 2018a; Chen et al., 2020), although practical efforts have been limited in psychiatry. We will therefore evaluate all models in studies based on the PSYCOP cohort for biases related to sex, age, and other relevant factors.
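A basic form of the bias evaluation described above is to compute the same performance metric separately for each subgroup and compare the results. The sketch below does this for sensitivity (the share of true cases the model flags); the choice of metric, the function name, and the toy records are illustrative assumptions, and a real audit would typically cover several metrics and report uncertainty.

```python
from collections import defaultdict


def sensitivity_by_group(records):
    """Sensitivity (true-positive rate) per subgroup.

    records: iterable of (group, y_true, y_pred) with binary labels.
    Groups without any true cases are omitted, since sensitivity is
    undefined for them.
    """
    true_pos = defaultdict(int)
    false_neg = defaultdict(int)
    for group, y_true, y_pred in records:
        if y_true == 1:
            if y_pred == 1:
                true_pos[group] += 1
            else:
                false_neg[group] += 1
    groups = set(true_pos) | set(false_neg)
    return {g: true_pos[g] / (true_pos[g] + false_neg[g]) for g in groups}


# Hypothetical toy predictions: the model misses more true cases among females.
toy_records = [
    ("F", 1, 1), ("F", 1, 0), ("F", 1, 0), ("F", 0, 0),
    ("M", 1, 1), ("M", 1, 1), ("M", 1, 0), ("M", 0, 1),
]
toy_sensitivity = sensitivity_by_group(toy_records)
```

A gap such as the one in this toy example would not by itself show where the bias originates, but it would flag the model for closer inspection before any clinical use.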
Finally, several ethical considerations related to patient consent and autonomy must be addressed when using prediction models to assist clinicians (Starke et al., 2020). Should patients give informed consent before their data can be used to provide predictions? How interpretable do prediction models need to be to accommodate patients’ right to know the basis of their treatment? Respecting patient autonomy entails providing clear descriptions of the factors that led to a given treatment or diagnosis. However, these considerations may clash with the “black box” nature of many machine learning methods (Rudin, 2019). Thus, investigating ways to complement prediction models with explanations and uncertainty estimates is imperative for the models’ ethical deployment.
Perspectives
Although data from the PSYCOP cohort alone have multiple important applications, they can also be combined with data from other sources. All inhabitants of Denmark are registered with a unique personal identification number (Civil Personal Registration number (Pedersen, 2011)), which serves as an identifier in the EHR. This provides the possibility to link data from the EHRs to data from other sources to facilitate more fine-grained analyses. For instance, genetic information from the Danish iPSYCH 2012 (Pedersen et al., 2018) and 2015 cohorts (Bybjerg-Grauholm et al., 2020) has recently been used in conjunction with register data to investigate the association between polygenic risk scores and progression to bipolar or psychotic disorders from unipolar depression (Musliner et al., 2020). Similarly, the EHR data from the PSYCOP cohort can potentially be combined with genetic information for multimodal prediction of psychiatric outcomes. Likewise, the CROSS-TRACKS (Riis et al., 2020) and PSYCOP cohorts can be combined to investigate factors related to, for instance, the movement of patients between points of care or early risk factors for mental disorders in primary care. CROSS-TRACKS is a Danish population-based cohort that includes data from both primary and secondary care as well as socio-demographic register data. 
At present, the patient population included in CROSS-TRACKS does not overlap with the PSYCOP cohort; however, its planned expansion aims to include patients from the entire Central Denmark Region in the coming years.
Limitations
Although studies based on the PSYCOP cohort are likely to provide several novel insights and applications, a number of important limitations must be considered. First, the structure of the EHR was changed to fit the format of the Danish National Patient Registry, Version 3 (LPR3) from Version 2 (LPR2) in February 2019. LPR3 diverges from LPR2 in several ways, most notably in the way contacts and diagnoses are registered. Further, the reporting of several clinical variables has been modified, which might complicate efforts to apply models trained on data from LPR2 to data from LPR3. Fortunately, only a few clinical variables containing unstructured text were affected by the change, meaning that models using NLP are likely to be robust to it. Second, EHRs are primarily created and used for clinical, not research, purposes. Consequently, the data are not ‘clean’ or organised by research standards, are irregularly observed, contain a large number of missing values, and have complex temporal interactions (Goldstein et al., 2017; Ching et al., 2018; Goldstein, 2020). This poses significant challenges for prediction models, which must be specifically designed to handle these problems. Third, despite the significant size of the PSYCOP cohort, some outcomes might be too rare to be predicted with adequate power. This is a recurring issue in machine learning for health care that might require particular modelling strategies and evaluation metrics (Schoon et al., 2019; Wang, 2021). 
Fourth, insights gained from studies of the PSYCOP cohort are limited to individuals receiving treatment from psychiatric hospital services. Therefore, the models cannot be directly used to predict, for example, mental disorders in the general population. The first point of contact for patients is often their general practitioner, who makes decisions on first-line treatment and referrals to the psychiatric services. Patients treated for mental disorders by their general practitioner are thus not covered by the PSYCOP cohort, which will bias the cohort towards the most severe cases. However, as previously reviewed, patients requiring hospital-based treatment for mental illness are at high risk for a large number of comorbidities and adverse outcomes. Restricting our models to operate solely on this group provides a greater opportunity to produce useful prediction models, as the base rate of the outcomes of interest is likely vastly higher among these patients compared with the general population. Fifth, the PSYCOP cohort is limited to patients receiving psychiatric treatment in the Central Denmark Region and is subject to censoring. Specifically, if patients move to a different region or country during follow-up, we lack information on their continuing course of treatment. Lastly, although precision psychiatry holds great promise, successfully applying machine learning models to this field is challenging, as demonstrated by the low number of models implemented in clinical care (Salazar de Pablo et al., Reference Salazar de Pablo, Studerus, Vaquerizo-Serrano, Irving, Catalan, Oliver, Baldwin, Danese, Fazel, Steyerberg, Stahl and Fusar-Poli2020). The heterogeneous course of mental disorders makes them somewhat elusive targets for prediction and necessitates close collaboration between clinicians and developers in defining useful and actionable outcomes.
Additionally, mental disorders do not possess distinct biological markers to the same degree as diseases in other medical specialties. Indeed, for both man and machine, detecting the presence of cancer from a mammogram is inherently easier than inferring a patient’s risk of developing schizophrenia from a series of disjointed clinical notes.
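To illustrate the base-rate argument made under the fourth limitation, the following minimal Python sketch computes the positive predictive value (PPV) of a binary prediction model from Bayes' theorem. The sensitivity, specificity and base rates are hypothetical values chosen for illustration only; they are not estimates from the PSYCOP cohort.

```python
def positive_predictive_value(sensitivity, specificity, base_rate):
    """PPV via Bayes' theorem: P(outcome | positive prediction)."""
    true_pos = sensitivity * base_rate
    false_pos = (1.0 - specificity) * (1.0 - base_rate)
    return true_pos / (true_pos + false_pos)


# Hypothetical model performance (assumed, not taken from the PSYCOP cohort).
SENSITIVITY = 0.80
SPECIFICITY = 0.90

# Hypothetical base rates: general population versus hospital-treated patients.
for label, base_rate in [("general population", 0.01), ("hospital-treated", 0.10)]:
    ppv = positive_predictive_value(SENSITIVITY, SPECIFICITY, base_rate)
    print(f"{label}: base rate {base_rate:.2f}, PPV {ppv:.2f}")
```

Under these assumed values, the same model yields a PPV of roughly 0.07 at a base rate of 1% and roughly 0.47 at a base rate of 10%, which is why a higher base rate makes it easier to produce predictions that are actionable in clinical practice.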
Conclusion
The PSYCOP cohort provides an opportunity for novel research and findings related to important clinical outcomes in psychiatry. Clinical feasibility and usefulness will be guiding factors for the development of prediction models, thereby serving to bridge the gap from development to implementation of machine learning in precision psychiatry.
Acknowledgements
The authors are grateful to Bettina Nørremark (Aarhus University Hospital – Psychiatry and the Business Intelligence unit of the Central Denmark Region) for data management.
Author contributions
All authors have contributed/will contribute to the design of studies based on the PSYCOP cohort. The analyses for this paper were carried out by Hansen. The results were interpreted by all authors. Hansen and Østergaard wrote the first draft of the manuscript, which was subsequently revised for important intellectual content by the remaining authors. All authors approved the final version of the manuscript prior to submission.
Conflicts of interest
SDØ has received the 2020 Lundbeck Foundation Young Investigator Prize. The remaining authors declare no competing interests.
Funding
Studies based on the PSYCOP cohort are supported by the Lundbeck Foundation (grant number: R344-2020-1073), the Danish Cancer Society (grant number: R283-A16461), the Central Denmark Region Fund for Strengthening of Health Science (grant number: 1-36-72-4-20) and the Danish Agency for Digitisation Investment Fund for New Technologies (grant number: 2020-6720). Østergaard reports further funding from the Novo Nordisk Foundation (grant number: NNF20SA0062874), the Lundbeck Foundation (grant number: R358-2020-2341) and Independent Research Fund Denmark (grant number: 7016-00048B). These funding bodies were not involved in the study design; in the collection, analysis and interpretation of data; in the writing of the report or in the decision to submit the paper for publication.