
Commentary: the ethical challenges of machine learning in psychiatry: a focus on data, diagnosis, and treatment

Published online by Cambridge University Press:  12 May 2021

Daniel S. Barron*
Affiliation:
Department of Psychiatry, Yale University, New Haven, CT, USA Department of Anesthesiology and Pain Medicine, University of Washington, Seattle, WA, USA Department of Psychiatry, Brigham & Women's Hospital, Harvard University, Boston, MA, USA Department of Anesthesiology & Pain Medicine, Brigham & Women's Hospital, Harvard University, Boston, MA, USA
*
Author for correspondence: Daniel S. Barron, Email: daniel.s.barron@yale.edu


Type
Invited Commentary
Copyright
Copyright © The Author(s), 2021. Published by Cambridge University Press.

Which data are useful?

The clinical interview is the psychiatrist's data gathering procedure. However, the clinical interview is not a defined entity in the way that ‘vitals’ are defined as measurements of blood pressure, heart rate, respiration rate, temperature, and oxygen saturation. There are as many ways to approach a clinical interview as there are psychiatrists; and trainees can learn as many ways of performing and formulating the clinical interview as there are instructors (Nestler, 1990). Even in the same clinical setting, two clinicians might interview the same patient and conduct very different examinations and reach different treatment recommendations. From the perspective of data science, this mismatch is not one of personal style or idiosyncrasy but rather one of uncertain salience: neither the clinical interview nor the data thereby generated is operationalized and, therefore, neither can be rigorously evaluated, tested, or optimized.

Consider a standard psychiatric evaluation, wherein a thorough clinical interview will span a patient's biologic, psychologic, and social history. A clinical interview might yield thousands of datapoints: a patient's visible and audible behavior (posture, speech, and expression); their reported narrative and symptomatology; results from clinical tests such as blood work, urine toxicology, and electrocardiogram (EKG); collateral information from family members, legal authorities, or other health care providers; and the patient's socioeconomic status. Whether a clinical datapoint is useful is a testable hypothesis, one which depends on the specific use in question; for example, a patient's response to a selective serotonin reuptake inhibitor (SSRI) over 4–6 weeks (Chekroud et al., 2017) might be useful to an outpatient clinician but not to an emergency room psychiatrist assessing a patient's acute suicide risk (Just et al., 2017).

Defining and operationalizing which clinical data are useful for which decisions is no small matter, one that decades of research have been unable to settle. And yet, the very thing that machine learning (ML) algorithms offer is the ability to identify data that optimize some as-yet-undefined purpose. The question becomes which purpose to optimize. Two answers might lie in diagnosis and treatment.

Why diagnose?

Schizophrenia is not schizophrenia in the way that hypertension is hypertension. Hypertension is diagnosed in one way: measured blood pressure is greater than a defined value. Though schizophrenia is also a defined diagnosis, if we consider the criteria for schizophrenia (see Table 1), there are 7 696 580 419 045 sets of symptoms that meet both criteria A and B as defined in the Structured Clinical Interview for Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5) (SCID-5) (First, Williams, & Karg, 2016). Crucially, these sets do not differentiate symptom severity: for example, two patients might have so-called ‘tangential speech,’ but how tangential is irrelevant to diagnosis.

Table 1. Schizophrenia is not schizophrenia

Each case vignette from Starke et al. (2020) actually describes one of 7 696 580 419 045 possible types of schizophrenia, based on how many sets of symptoms meet SCID-5 criteria for schizophrenia (First et al., 2016).

a Based on the SCID-5, criterion A is met if at least two of the A-criteria symptoms are present, and at least one symptom is from A1, A2, or A3. Mathematically, the collection of symptom sets that meet criterion A can be represented as a power set. To compute the total number of symptom sets possible across A1–A5, we simply calculate 2^40. From this total, we subtract the number of unwanted or redundant symptom sets. A set can be unwanted in two ways: (1) if it includes only symptoms from A4 and A5 (2^17 sets); (2) if it involves symptoms from a single A group, some of which are already accounted for in (1), with the remaining from A1, A2, or A3: (2^13 − 1) + (2^6 − 1) + (2^4 − 1). So overall, the total number of symptom sets for criterion A is: 2^40 − [2^17 + (2^13 − 1) + (2^6 − 1) + (2^4 − 1)] = 1 099 511 488 435. Criterion B adds 2^3 − 1 = 7 sets. So there are a total of 1 099 511 488 435 × 7 = 7 696 580 419 045 sets of symptoms that meet SCID-5 criteria for schizophrenia. Of course, this number assumes that each individual symptom has a clear, monolithic meaning (which it does not). The author acknowledges and is grateful for the mathematical assistance of Drs. Leo A. Harrington (Department of Mathematics, UC Berkeley), W. Hugh Woodin (Department of Mathematics & Philosophy, Harvard University), and Gabriel Goldberg (Department of Mathematics, UC Berkeley) who, separately, helped me converge on the above solution.

Abbreviation: SCID-5, Structured Clinical Interview for DSM-5.
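The combinatorial argument in the footnote above can be checked directly. The sketch below, in Python, reproduces both totals; the symptom counts per criterion group (40 A-criteria symptoms in all, 17 across A4–A5, and 13, 6, and 4 in A1, A2, and A3, plus 3 B-criteria options) are taken from the footnote.

```python
# Count the symptom sets meeting SCID-5 schizophrenia criteria A and B,
# following the combinatorial argument in the Table 1 footnote.

total_a = 2**40                  # all subsets of the 40 A-criteria symptoms
only_a4_a5 = 2**17               # subsets drawn only from A4/A5 (fail criterion A)
single_group = (2**13 - 1) + (2**6 - 1) + (2**4 - 1)  # nonempty subsets confined
                                                      # to a single group among A1/A2/A3
criterion_a = total_a - (only_a4_a5 + single_group)
criterion_b = 2**3 - 1           # nonempty subsets of the 3 B-criteria options

print(criterion_a)               # 1099511488435
print(criterion_a * criterion_b) # 7696580419045
```

Running the sketch confirms the footnote's arithmetic: 1 099 511 488 435 criterion-A sets, and 7 696 580 419 045 sets once criterion B's seven options are factored in.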

Because no quantitative measures exist for the signs or symptoms of schizophrenia, ‘mild’ is the only modifier that Starke could apply to patient D's ‘psychotic symptoms’ (of which there could be many variants, see Table 1). Contrast ‘mild’ with a blood pressure of 200/120, which can be readily understood in relation to 120/80. It is not at all clear whether or to what extent D's symptoms overlap with R's or T's (and, indeed, it is statistically unlikely that they do). And yet, Starke accurately describes how each patient's unknown symptom profile would be represented in an ML study: schizophrenia.

The larger purpose of diagnosis in psychiatry remains unclear. Current psychiatric diagnoses are not motivated by etiology or treatment or symptom severity. Psychiatric diagnosis is a vestige of the pre-computer era: in a world of hand-written clinical notes, diagnosis's virtue was to tidily communicate and standardize the general flavor of a clinical interview (Lieberman & Ogas, 2015). In this sense, psychiatric diagnosis met (and meets) its mark: although schizophrenia can be diagnosed in 7 696 580 419 045 ways, most clinicians (and even non-clinicians) still have a notion of what is communicated by ‘schizophrenia’ and how this differs from, say, ‘PTSD’ (Young, Lareau, & Pierre, 2014). Diagnosis is a latent variable, a summary statistic of salient information from the clinical interview, varied as it might be. While data loss is necessary to define any summary statistic, data scientists are understandably suspicious of a latent variable that represents at least 7 696 580 419 045 sets of symptoms, each set possibly representing a unique etiology or pathophysiology.

Now that technology is relieving some of the burden of data creation, storage, and transmission, we might ask ourselves: if the best a diagnosis can offer is a latent variable summarizing a clinical interview, then why not produce a high-definition audio and video recording of the entire interview without any loss of data? Furthermore, given the complexity that arises in defining ground truth for a psychiatric diagnosis, ML analyses have begun to look for mechanistic understanding that might more ably pair clinical data with underlying biology or etiology (Bzdok & Ioannidis, 2019). It could be that ML classifiers might represent an evolution beyond psychiatric diagnostic groupings and that the question of which clinical data are most relevant might be better answered by data scientists than by clinicians.

How best to treat?

As Starke describes, many ML studies attempt to circumvent diagnoses entirely and inform treatment. This parallels what a clinician's brain does: gather and sift through clinical data primarily to inform treatment and, only later, to diagnose (Waszczuk et al., 2017). This makes sense given the lack of specificity between diagnosis and treatment; antipsychotics, antidepressants, and mood stabilizers are routinely used in treating psychosis or depression or mood instability. Furthermore, patients with the same diagnosis often receive different treatments: Starke's patient R might be prescribed an antipsychotic and antidepressant while patient T is prescribed only an antipsychotic.

ML studies very well might help clinicians optimize treatment, yet as Starke notes, examples should be taken with a grain of salt: there is no consensus on how to measure treatment outcome in psychiatry (Zimmerman, Morgan, & Stanton, 2018). For example, would antipsychotic treatment be ‘successful’ if patient R's hallucinations decrease by 50%? By 90%? What if the hallucinations stop entirely but, even though R no longer requires frequent hospitalization, R cannot return to university because the treatment itself is too sedating? Or what if R's hallucinations do not dissipate but they are able to return to university? There is no clear answer to these questions and, I suspect, any ML analysis attempting to optimize treatment selection would require not simply exhaustive phenotyping but also a ‘personalized tuning’ of the algorithm based on that patient's unique goals and expectations for treatment (Barron, 2021). There may very well be as many definitions of treatment success as there are patients in treatment and, even so tailored, that definition might change over time.

Overall, it remains possible that ML algorithms and the data scientists that produce them might bring clarity to the questions raised by decades of research. Starke's discussion of the ethical challenges for ML algorithms in psychiatry was a welcome addition to the growing dialogue. At base, the moral virtue of an algorithm is not simply whether it works but what it does, and for whom.

Acknowledgements

I thank the editors for inviting me to comment on Starke et al.'s (Starke, Clercq, Borgwardt, & Elger, 2020; hereafter, Starke) timely paper about the ethical challenges for ML in psychiatry. I hope to further magnify three challenges, which are fundamental questions about data, diagnosis, and treatment in psychiatric disorders.

References

Barron, D. (2021). Reading our minds: The rise of big data psychiatry. New York, NY: Columbia University Press, Columbia Global Reports.
Bzdok, D., & Ioannidis, J. P. A. (2019). Exploration, inference, and prediction in neuroscience and biomedicine. Trends in Neurosciences, 42, 251–262.
Chekroud, A. M., Gueorguieva, R., Krumholz, H. M., Trivedi, M. H., Krystal, J. H., & McCarthy, G. (2017). Reevaluating the efficacy and predictability of antidepressant treatments: A symptom clustering approach. JAMA Psychiatry, 74, 370.
First, M. B., Williams, J., & Karg, R. (2016). Structured Clinical Interview for DSM-5 Disorders (SCID-5), Clinician Version (SCID-5-CV).
Just, M. A., Pan, L., Cherkassky, V. L., McMakin, D. L., Cha, C., Nock, M. K., & Brent, D. (2017). Machine learning of neural representations of suicide and emotion concepts identifies suicidal youth. Nature Human Behaviour, 1, 911–919.
Lieberman, J. A., & Ogas, O. (2015). Shrinks: The untold story of psychiatry. New York, NY: Back Bay Books.
Nestler, E. J. (1990). The case of double supervision: A resident's perspective on common problems in psychotherapy supervision. Academic Psychiatry, 14, 129–136.
Starke, G., Clercq, E. D., Borgwardt, S., & Elger, B. S. (2020). Computing schizophrenia: Ethical challenges for machine learning in psychiatry. Psychological Medicine. https://doi.org/10.1017/S0033291720001683.
Waszczuk, M. A., Zimmerman, M., Ruggero, C., Li, K., MacNamara, A., Weinberg, A., … Kotov, R. (2017). What do clinicians treat: Diagnoses or symptoms? The incremental validity of a symptom-based, dimensional characterization of emotional disorders in predicting medication prescription patterns. Comprehensive Psychiatry, 79, 80–88.
Young, G., Lareau, C., & Pierre, B. (2014). One quintillion ways to have PTSD comorbidity: Recommendations for the disordered DSM-5. Psychological Injury and Law, 7, 61–74.
Zimmerman, M., Morgan, T. A., & Stanton, K. (2018). The severity of psychiatric disorders. World Psychiatry, 17, 258–275.