
Clinical wisdom in the age of computer apps: a systematic review of four functions that may complement clinical treatment

Published online by Cambridge University Press:  19 August 2022

Catalina R. Pacheco*
Affiliation:
Department of Psychiatry and Behavioral Sciences, Tulane University School of Medicine, New Orleans, LA, USA
Michael S. Scheeringa
Affiliation:
Department of Psychiatry and Behavioral Sciences, Tulane University School of Medicine, New Orleans, LA, USA
*Corresponding author. Email: cpacheco@tulane.edu

Abstract

Mental health clinicians perform complex tasks with patients that potentially could be improved by the massive computing power available through mobile apps. This study aimed to analyse commercially available mobile and computer applications (apps) focused on treating psychiatric disorders. Apps were analysed by two independent raters for whether they took advantage of computer power to process data in a fashion that augments four main elements of clinical treatment: (1) assessment/diagnosis, (2) treatment planning, (3) treatment fidelity monitoring, and (4) outcome tracking. The evidence base for each of these apps was also explored via PsycINFO, ResearchGate and Google Scholar. Searches of the Google Play Store, the Apple App Store, and the One Mind PsyberGuide found 722 apps labelled for mental health use, of which 163 apps were judged relevant to clinical work with patients with psychiatric disorders. Fifty-nine of these were determined to contain a computer-driven function for at least one of the four main elements of clinical treatment. The most common element was assessment/diagnosis (55/59 apps), followed by outcome tracking (34/59 apps). Six apps updated treatment plans using user input. Only one app tracked treatment fidelity. None of the apps contained computer-driven functions for all four elements. Twelve apps were supported by randomized clinical trials showing greater efficacy compared with either wait-list or other active treatments. Results showed that these four clinical elements can be meaningfully augmented, but the full potential of computer processing appears unreached in mental health-related apps.

Key learning aims

  (1) To understand what apps are currently available to treat clinical-level psychiatric problems.

  (2) To understand how many of the commercially available mental health-focused apps can be used for the treatment of clinical populations.

  (3) To understand how mental health services can be complemented by utilizing computer processing power within apps.

Type
Original Research
Copyright
© The Author(s), 2022. Published by Cambridge University Press on behalf of the British Association for Behavioural and Cognitive Psychotherapies

Introduction

In 1968, Lewis Goldberg was at the forefront of asking how the accuracy of ‘clinical wisdom’ might be improved. He noted the discouraging conclusion from studies that clinical judgements by individual clinicians tended to be unreliable (Goldberg, 1968). He concluded that if ‘complex clinical inferences’ are to be learned reliably by clinicians, there must be some form of feedback that includes whether clinical judgements were accurate or not. A more reliable methodology was needed to ‘substitute for the more ephemeral storage capacities of the unaided human brain’. Goldberg believed this substitute was research, but with the advent of computing power within widely available mobile mental health applications (apps), Goldberg’s recommendation may be taking on new possibilities.

Computer-facilitated mental health care has grown tremendously in recent years with the creation of software that provides one or more components of traditional office-based care. These include software programs for phones (mobile apps) or computers (basic websites and progressive web apps, which are websites that behave and feel like a mobile app) which can facilitate virtual mental health care by providing flexible hours for treatment, reducing logistical issues and potentially cost, making treatment more accessible to patients with low motivation or high anxiety, increasing discretion, and increasing patient engagement (Cartreine et al., 2010; Imel et al., 2017; Olff, 2015). Other types of software provide online self-assessments, provide psychoeducation, or track symptom change over time.

Although apps offer novel methods to deliver components of clinical care such as stress management, emotion tracking and mindfulness, it is unclear whether apps can fully replace or meaningfully improve the experience of clinical therapy. A recent, large review of mental health mobile applications by Lau and colleagues (2020) found that free mobile apps available on the iPhone or Android app stores covered 31 unique intervention and didactic content categories. The majority of apps were designed as self-help interventions and were not clearly intended for individuals with psychopathology, and thus are not necessarily analogous to clinical treatment. Only 4.7% of apps were designed specifically for psychological disorders. This review highlighted a gap in our understanding of the state of mobile mental health apps: most apps lack a clear clinical orientation, and there has not been a comprehensive review of apps intended for help-seeking individuals with clinical-level, impairing disorders.

Another gap is that no review that we are aware of has examined the extent to which computer power has been harnessed to complete clinical tasks. Here, ‘computer power’ refers to memory and data-processing capacities superior to those of the human brain, which can be used to collect and analyse data, and subsequently to perform tasks with those data. To date, most apps have been promoted as a way to improve accessibility to skills based in therapeutic theories, or as simple tracking tools to complement sessions with therapists. The potential of the processing power of computers seems untapped. For example, Imel and colleagues (2017) describe a hypothetical scenario in which machine learning might be applied to transcripts of therapy sessions to predict treatment outcome. Apps open the door for massive computer processing capabilities to use data to do clinical tasks faster or better than the minds of individual clinicians (Olff, 2015), but it is unclear how many, if any, apps harness these capabilities.

The purpose of this study is to review apps, both mobile and web-based, that utilize computer processing to enhance the treatment of clinical populations by using data collected by the app. Our goal was to characterize the content of these types of apps relevant to four main characteristics of traditional, in-person clinical treatment which could be augmented by computing power. Namely, these characteristics are (1) assessment/diagnosis, (2) treatment planning, (3) treatment fidelity tracking, and (4) tracking of treatment outcome. Each of these components has the potential to be significantly improved with the integration of existing technologies. The following is a discussion of several of the many possibilities in each component.

Assessment/diagnosis

Mental health apps could improve both the precision and accuracy of self-assessment. As mentioned by Imel et al. (2017), the anonymity provided by mobile apps may lead users to disclose information they might withhold from a therapist due to feelings of shame. Furthermore, Olff (2015) suggested that mobile apps might be used to allow ecological momentary assessments, which may promote accurate symptom monitoring. Olff (2015) also describes how the adaptability of computerized assessments enables an app to display only the assessment items that are relevant to the user, for example by asking only about events or symptoms that occurred in a specific time frame, or by terminating an assessment for a specific disorder if screening questions indicate its absence. Similarly, adaptive assessments can ask increasingly specific questions to parse out similar disorders and identify co-morbidities.
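As an illustration, the skip-logic described above can be expressed in a few lines of code. The following is a minimal sketch, not drawn from any reviewed app; the screening items, item sets and yes/no response format are all simplifying assumptions.

```python
# A minimal sketch of adaptive assessment skip-logic: screening items gate
# each disorder's full module, so users only see relevant questions.
# All items and module contents are hypothetical illustrations.

SCREENERS = {
    "depression": ["Little interest or pleasure in doing things?",
                   "Feeling down, depressed, or hopeless?"],
    "PTSD": ["Have you experienced a frightening or life-threatening event?"],
}

FULL_MODULES = {
    "depression": ["Trouble falling or staying asleep?",
                   "Feeling tired or having little energy?"],
    "PTSD": ["Repeated, disturbing memories of the event?",
             "Avoiding reminders of the event?"],
}

def ask(question: str) -> bool:
    """Stand-in for the app's interface; collects a yes/no answer."""
    return input(f"{question} (y/n) ").strip().lower() == "y"

def adaptive_assessment() -> dict:
    results = {}
    for disorder, screen_items in SCREENERS.items():
        # Terminate this disorder's assessment early if screening is negative.
        if not any(ask(q) for q in screen_items):
            results[disorder] = "screened out"
            continue
        # Otherwise branch into the full, disorder-specific item set.
        results[disorder] = [ask(q) for q in FULL_MODULES[disorder]]
    return results
```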

Treatment planning

Computing power could be used to automate more flexible and organized treatment planning. For example, apps could determine where to focus treatment depending on the results of an initial assessment and then recommend complete protocols, or pick and choose relevant modules from multiple protocols to build individualized treatment plans for patients with co-morbidities (Andersson, 2009). Additionally, computing power may help providers and patients collaboratively develop a treatment plan which focuses on areas that seem important to both individuals. Evidence shows that allowing patients to choose internet-delivered therapy modules produces comparable results to programs in which the clinician chooses the modules (Andersson et al., 2011). Whether apps are purely self-help or used with clinicians, providing a menu of modules from which to build a treatment plan may be an efficient form of treatment planning.
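To make the module-assembly idea concrete, here is a minimal sketch assuming a simple severity cut-off; the module library, syndrome names and cut-off are illustrative assumptions rather than any reviewed app's actual logic.

```python
# A minimal sketch of assessment-driven treatment planning: syndromes scoring
# at or above a cut-off contribute modules from their protocols, so co-morbid
# presentations yield one combined, individualized plan.

MODULE_LIBRARY = {
    "depression": ["behavioural activation", "cognitive restructuring"],
    "anxiety": ["psychoeducation", "graded exposure"],
    "insomnia": ["sleep hygiene", "stimulus control"],
}

def build_plan(scores: dict, cutoff: int = 10) -> list:
    """Assemble an individualized plan from per-syndrome assessment scores."""
    plan = []
    for syndrome, score in scores.items():
        if score >= cutoff:
            plan.extend(MODULE_LIBRARY.get(syndrome, []))
    return plan

# Example: co-morbid depression and anxiety pull modules from both protocols.
# build_plan({"depression": 14, "anxiety": 11, "insomnia": 3})
# -> ['behavioural activation', 'cognitive restructuring',
#     'psychoeducation', 'graded exposure']
```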

Furthermore, as treatment progresses and apps continue collecting data, apps can automatically suggest adaptations to treatment plans as necessary. For example, if one type of treatment (e.g. cognitive behavioural therapy) does not seem to be benefiting a patient, an app could identify this quickly through repeated assessments and then either make a modest suggestion to re-think the current strategy (Brattland et al., 2018) or a more specific suggestion of another type of treatment (e.g. dialectical behaviour therapy).
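A minimal sketch of such non-response detection follows; the improvement threshold and assessment window are assumptions chosen for illustration, not clinically validated values.

```python
# A minimal sketch of flagging apparent non-response from repeated
# assessments (higher scores = worse symptoms).

def suggest_plan_review(scores, window=4, min_improvement=3):
    """scores: symptom scores ordered from baseline to most recent.
    Returns a prompt when recent improvement falls below the threshold."""
    if len(scores) <= window:
        return None  # too few assessments to judge
    recent_change = scores[-window - 1] - scores[-1]
    if recent_change < min_improvement:
        return (f"No reliable improvement over the last {window} "
                "assessments; consider re-thinking the current strategy.")
    return None
```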

Treatment fidelity

Treatment fidelity is defined as the degree to which a clinical treatment was delivered as intended. Computers excel at presenting stored data in pre-programmed sequences through intuitive and attractive interfaces. These features may be well-suited to facilitate fidelity to and compliance with lengthy, manualized treatment plans. Once a manualized treatment protocol is written into the code of an app or computerized program, that treatment will be delivered with fidelity. If the app is designed to track the progression through a treatment protocol, both the patient and clinician, if relevant, can see exactly where they are in the treatment plan and which steps come next. Even if there is a human clinician implementing the therapy, this structure may increase the fidelity with which a treatment is implemented.
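Because the protocol sequence lives in the code, current position and next steps can be surfaced trivially. The sketch below assumes an invented five-session protocol purely for illustration.

```python
# A minimal sketch of a pre-programmed protocol tracker: the manualized
# session order is stored in code, so patient and clinician can always see
# where they are and what comes next. The session list is hypothetical.

from dataclasses import dataclass, field

@dataclass
class ProtocolTracker:
    sessions: tuple = ("psychoeducation", "coping skills training",
                       "exposure 1", "exposure 2", "relapse prevention")
    completed: list = field(default_factory=list)

    def complete_next(self) -> None:
        if len(self.completed) < len(self.sessions):
            self.completed.append(self.sessions[len(self.completed)])

    def status(self) -> str:
        done = len(self.completed)
        if done == len(self.sessions):
            return "Protocol complete."
        return (f"Completed {done} of {len(self.sessions)} sessions; "
                f"next step: '{self.sessions[done]}'.")
```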

Apps could also serve to increase the fidelity with which patients implement therapeutic tools in their daily lives. Many apps include descriptions of therapeutic techniques or activities, for example videos that guide the user through progressive muscle relaxation. By making these resources available at all times, apps can reinforce in real life the skills that patients learned in-session. Imel and colleagues (2017) theorize that access to these types of tools may increase patient participation in treatment.

Treatment outcome monitoring

Computing power delivered through phones and computers provides an avenue for both qualitative and quantitative outcome monitoring. Quantitatively, comparisons of pre- and post-treatment clinical assessments provide patients and therapists with clear evidence of changes in symptom severity and functional impairment. Moreover, use of apps to administer baseline and outcome measures may help ensure that the information is delivered to patients instead of only to clinicians. Qualitatively, if patients input specific goals at the beginning of treatment, apps can preserve these goals on a dashboard to minimize forgetting of difficult topics. In this way, the standardization provided by the app can ensure that these issues are followed up on more consistently than they might be in traditional therapy.
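A minimal sketch of this pairing of quantitative change with preserved qualitative goals follows; the scoring direction (lower = better) and the output format are assumptions.

```python
# A minimal sketch of outcome monitoring: report change from baseline and
# keep intake goals visible so they are consistently followed up.

def outcome_summary(baseline: int, latest: int, goals: list) -> str:
    change = latest - baseline  # assumes lower scores = fewer symptoms
    direction = "improved" if change < 0 else "not improved"
    lines = [f"Change from baseline: {change:+d} ({direction})",
             "Goals set at intake:"]
    lines += [f"  - {goal}" for goal in goals]
    return "\n".join(lines)

# print(outcome_summary(18, 9, ["sleep through the night",
#                               "return to work part-time"]))
```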

Furthermore, providers tend to over-estimate their own effectiveness, so regular feedback collected through an app may give therapists an important corrective (Walfish et al., 2012). Similar to Goldberg’s earlier recommendation (1968), Imel et al. (2017) discussed the usefulness of specific, real-time feedback about clinical decision-making for developing clinical expertise, and the lack of such feedback after licensure. Apps can be utilized to provide this type of feedback to therapists, thereby increasing the chances that therapy is provided in the way in which it was meant to be delivered.

As mentioned above, the purpose of this study is to review apps, both mobile and web-based, that utilize computer processing to enhance various aspects of treatment of clinical populations by using data collected by the app. Our research questions were as follows: (1) how many apps or websites utilize computer processing to perform or augment four main areas of clinical work, namely (a) assessment/diagnosis, (b) treatment planning, (c) treatment fidelity tracking, and (d) tracking of treatment outcome?; (2) how do apps gather data and provide feedback in these areas?; and (3) is there evidence that the available integrations of computer processing in these areas significantly enhances patient outcomes?

Method

From July to November 2020, apps and progressive web applications were extracted from several databases – the Google Play Store, the Apple App Store, and a virtual app guide curated by One Mind PsyberGuide. Mobile apps were extracted from the Google Play and Apple stores using the search terms ‘mental health’ and ‘mental health apps’. This was supplemented with Google searches using the search terms ‘progressive web apps mental health’ and ‘mental health web application’.

Information about the apps was extracted from the descriptions published in each database. If app descriptions did not provide enough information, the authors explored the app websites for the information of interest. If the details of apps were still unclear, or if apps did not have a website, the apps were downloaded. Only free apps or apps with a free trial were downloaded. If information could not be determined about paid apps, the authors requested the information from the app developers. If the app developers did not respond to the request, the aspect in question was recorded as ‘unclear’ in both the final count and in Table 1.

Table 1. Characteristics of included apps

ACT, acceptance and commitment therapy; BA, behavioural activation; DBT, dialectical behavioural therapy; HRV, heart rate variability; IPT, interpersonal therapy; OCD, obsessive compulsive disorder; tDCS, transcranial direct current stimulation.

* Lyssn is the only service in our review that monitored treatment fidelity.

Inclusion criteria were that the apps (1) included a clinical aspect that could be used to treat psychiatric problems, (2) included a sophisticated computer-driven element calculated from user input, and (3) were in English. An app was defined as ‘clinical’ if it included an assessment or taught users skills from typical psychotherapy modalities (such as cognitive behavioural therapy, dialectical behaviour therapy, acceptance and commitment therapy, etc.) to treat psychiatric disorders. Apps could also be considered clinical if they included linkages to licensed telehealth clinicians. Apps that were solely for meditation, mindfulness, relaxation techniques and insomnia were excluded because application of these techniques is not limited to psychiatric treatment.

One author (C.R.P.) rated each app discovered through the searches as ‘clinical’ or ‘not clinical’ with reasons to support the ratings. The second author (M.S.S.) reviewed the apps and made independent ratings about ‘clinical’ status. If M.S.S. was uncertain about C.R.P.’s ratings, then M.S.S. undertook independent assessments of these apps.

A computer-driven element was defined as the use of computer processing power to augment clinical therapy (through a baseline assessment, treatment fidelity tracking, and/or treatment outcome monitoring) in a way that would be difficult or time-consuming for clinicians to do themselves. Apps that simply provided and summed short (fewer than 10 items) clinical assessment measures did not meet the definition of ‘sophisticated’ and were excluded.

Both authors (C.R.P. and M.S.S.) rated every clinical app independently on sophisticated computer-driven elements, defined as computer processing of user input for at least one of these four typical activities of psychiatric treatment: (1) clinical assessment, (2) treatment planning, (3) treatment fidelity to a standardized protocol, or (4) tracking treatment outcomes. The app must have presented some sort of output as a result of user input. For example, an app could suggest lessons or activities to users based on the results of an assessment or augment the activities suggested based on the user’s previous ratings of an activity. Because of this requirement, apps that solely offered clinician-provided telehealth and did not offer some other computerized aspect of treatment were excluded. Apps that had at least one of the four computer-driven elements were retained. Discrepancies were resolved by discussion.

Next, both authors independently determined whether each app was self-help or therapist-assisted (or both), whether it offered multiple types of treatment, whether data collection was mandatory or optional, whether it was supported by empirical evidence, and how it gave feedback to users. Discrepancies were resolved by discussion.

The available empirical evidence of each app’s treatment effectiveness was found by exploring websites dedicated to the apps or by searching the names of the apps on Google Scholar, ResearchGate, or PsycINFO. If empirical evidence was not found by these methods, the authors requested any existing evidence directly from the app developers as listed in the respective app stores. ‘Empirical evidence’ was defined as positive results from a randomized clinical trial (RCT) on treatment outcome that included use of the app.

Results

Figure 1 shows the selection process. The search terms yielded 351 apps from the Google Play Store and 299 apps from the Apple App Store. Ninety-four apps were common to the two stores. One Mind PsyberGuide provided 198 apps, 33 of which were also in the Apple store and 43 of which were also in the Google Play Store. Duplicate apps were removed so that each app was only considered once in the final counts. The Google and literature searches yielded 16 apps not found through the previous searches. In combination, these searches identified 722 unique apps. Fourteen apps were not in English and were therefore excluded. After the initial search, 32 apps were no longer available for download on either store. This left a total of 676 apps that were reviewed, 76 of which were downloaded. The two authors’ ratings agreed on whether apps were clinical 99.0% of the time (669 out of 676 apps). After discrepancies were investigated and discussed, only one of the first author’s ratings was changed. Of the 676 apps, 513 were excluded for failure to meet our definition of having a clinical element, leaving 163 apps that were reviewed in the next step.

Figure 1. PRISMA flow diagram of the inclusion process.

Of these 163 apps, 104 were excluded due to a lack of a computer-driven element, resulting in a total of 59 apps with a clinical, computer-driven element which were included in the final qualitative review. A flow diagram of the inclusion process can be seen in Fig. 1. The authors’ ratings agreed about whether the app had a computer-driven element 77.9% of the time (127 out of 163 apps). A list of the included apps and their qualities can be found in Table 1. Twenty-nine of the apps were self-help, 15 were therapist-assisted, and 15 could be used either as self-help or with a therapist.

Self-assessment

Baseline clinical assessments were the most common computer-driven element of the apps with 55 out of 59 apps providing some sort of assessment. The content of measures in 53 of the 55 apps could be determined. The content of two apps could not be determined because they were behind paywalls and the developers did not respond to our requests for information.

Of the 53 apps for which content was known, 15 provided measures for a single syndrome: eight were for depression, four were for anxiety, two were for post-traumatic stress disorder (PTSD), and one was for obsessive-compulsive disorder (OCD). One app that assessed for depression and one app that assessed for anxiety also asked about the user’s stage of change.

Thirty-five apps provided measures for more than one syndrome. The most common combination was a depression and anxiety screen, with 13 apps providing both and four assessing anxiety, depression and stress. One app assessed for PTSD and depression, and one assessed for anxiety, OCD and phobias. Sixteen apps included a multitude of assessments, with a wide variety of content. For example, Clinicom claims to assess for over 55 mental health conditions, while Spring Health screens for many symptoms, psychiatric and family history, and social determinants of health. Two apps measured well-being, as opposed to symptoms; one of these also included a behavioural health scale. Lastly, one app used the Outcome Rating Scale and Session Rating Scale.

Twenty-two apps allowed the user to choose whether or not to take the assessment. Twenty-five apps required the assessment. For eight apps, it was unclear whether the assessments were mandatory or optional due to the measures being behind a paywall.

How did computer power analyse the data and present the results to users?

Thirty-four apps displayed summed scores of assessment measures. Of these, 15 included a qualitative severity rating (e.g. mild, moderate, severe). One app produced a score for a stress test but not for its anxiety and depression questionnaires. Six other apps provided a qualitative severity rating with no scores. One app did not give a score or interpretation but used an inventory of anxieties to develop a customized exposure ladder. Four apps did not give any feedback. Two apps gave feedback about which stage of change the user was in and how their symptoms compared with those of other children their age. One app gave a suggested diagnosis. For seven apps, the form of feedback was unclear from the available information.
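The most common pattern, a summed score paired with a severity band, amounts to a simple threshold mapping. The sketch below uses the widely published bands for a 0–27 depression measure (e.g. the PHQ-9) as an example; individual apps may use different instruments and cut-offs.

```python
# A minimal sketch of summed-score feedback with a qualitative severity
# rating. Bands follow common PHQ-9 conventions (total range 0-27).

def score_with_severity(item_responses: list) -> tuple:
    total = sum(item_responses)
    if total <= 4:
        severity = "minimal"
    elif total <= 9:
        severity = "mild"
    elif total <= 14:
        severity = "moderate"
    elif total <= 19:
        severity = "moderately severe"
    else:
        severity = "severe"
    return total, severity

# score_with_severity([2, 1, 3, 0, 2, 1, 2, 0, 1]) -> (12, 'moderate')
```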

Computer-guided treatment plan

Of the 53 apps that provided any treatment techniques, 11 provided more than one type of clinical treatment, and six of these updated the treatment plan depending on user input.

How did computer power use data to guide treatment plans?

Three of the apps that could guide treatment plans were chatbots (Woebot, Tess and Anxiety Test & Relief). These apps ask preliminary questions and then suggest exercises to help manage the identified issue. Tess follows up on a previously mentioned issue and asks for feedback. If the user expresses that the previous suggestion was not helpful, the chatbot suggests a different type of exercise. Anxiety Test & Relief uses the Tess algorithm. In contrast, Woebot does not check in to see how relevant or helpful the exercise was for the user.

The Trier Treatment Navigator uses the results of the baseline assessment to suggest strategies for treatment. Via repeated assessments, the system then identifies patients who are ‘not on track’ and suggests clinical exercises, worksheets and videos to the clinicians. MoodMission uses the results of its preliminary assessment to suggest a ‘mission’, which can incorporate cognitive behavioural therapy (CBT), behavioural activation (BA), meditation, or relaxation exercises. The app has users rate their distress before and after completing the mission and adapts its suggestions to favour the missions that most successfully lowered the distress score. Thrive, by Waypoint Health, offers therapy plans based on social skills training and CBT. According to its website, the algorithm gives specific recommendations depending on the user’s goals, past exercises, and experiences with depression.
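MoodMission’s published description implies a simple learn-from-feedback loop: record pre/post distress per mission type, then prefer the types with the largest average reduction. The sketch below is our reading of that description, not the app’s actual algorithm.

```python
# A minimal sketch of distress-guided suggestion: exercises whose past use
# produced the largest average drop in distress are suggested first.

from collections import defaultdict

class MissionSuggester:
    def __init__(self, mission_types):
        self.mission_types = mission_types
        self.reductions = defaultdict(list)  # mission -> list of pre-post drops

    def record(self, mission, pre_distress, post_distress):
        self.reductions[mission].append(pre_distress - post_distress)

    def suggest(self):
        # Prefer unexplored mission types, then the best average reducer.
        untried = [m for m in self.mission_types if m not in self.reductions]
        if untried:
            return untried[0]
        return max(self.reductions,
                   key=lambda m: sum(self.reductions[m]) / len(self.reductions[m]))
```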

Treatment fidelity

Only one out of the 59 programs tracked treatment fidelity.

How did computer power analyse the data and present the results to users?

Lyssn is described as artificial intelligence software that provides feedback about the treatment fidelity of therapy sessions. The software derives a transcript from psychotherapy sessions and uses that transcript to calculate an overall fidelity score, a percentage of non-adherent behaviours, scores for empathy and the ‘motivational interviewing spirit’, statistics on the amount of session time that the therapist spent talking, the number of open questions asked, and the number of reflections made. Data collection is mandatory.
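While Lyssn’s models are proprietary, the kinds of session statistics it is described as reporting can be illustrated with crude heuristics. The sketch below is emphatically not Lyssn’s method; the open-question patterns and the transcript format are assumptions for illustration only.

```python
# A minimal sketch of transcript-derived session metrics: therapist talk
# share and a rough count of open questions. Real systems use trained
# machine-learning models rather than keyword heuristics.

OPEN_QUESTION_STARTS = ("what", "how", "tell me", "describe")

def session_metrics(turns):
    """turns: (speaker, utterance) pairs, speaker in {'therapist', 'client'}."""
    therapist_words = client_words = open_questions = 0
    for speaker, text in turns:
        words = len(text.split())
        if speaker == "therapist":
            therapist_words += words
            if text.lower().startswith(OPEN_QUESTION_STARTS) and text.endswith("?"):
                open_questions += 1
        else:
            client_words += words
    total = therapist_words + client_words
    return {"therapist_talk_share": therapist_words / total if total else 0.0,
            "open_questions": open_questions}
```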

Symptom tracking/outcome monitoring

The second most common computer-driven component was symptom or outcome monitoring via repeated assessments (34/59 apps). The content of measures in 33 apps could be determined. Symptom tracking was voluntary for 14 of the apps and mandatory for 14 of the apps. It was unclear whether six apps required monitoring. It is worth noting that three apps, TruReach, OCD Challenge, and Dartmouth Path, offer repeated assessments but do not keep a log of previous scores.

How did computer power analyse the data and present the results to users?

Twenty-two apps graphed assessments over time. Nine of these apps included qualitative severity ratings for each score. One of these apps gave only qualitative severity ratings after the assessments and graphed these ratings over time. One app described how much the user’s most recent score differed from their baseline assessment score (e.g. +25 from baseline) but did not graph the changes. Four apps kept a log of scores but did not graph them or otherwise quantify change. One of these apps also kept a log of severity ratings. Three of the apps that graphed scores over time were used to predict treatment response and included alerts when repeated assessments indicated a negative change trajectory. For seven apps, it was unclear how feedback on repeated assessments was given.
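The off-track alerts mentioned above reduce, at their simplest, to detecting an adverse trend in the score series. The least-squares sketch below is only illustrative; systems such as the Trier Treatment Navigator compare patients against expected-recovery curves derived from large samples.

```python
# A minimal sketch of trajectory alerting: fit an ordinary least-squares
# slope to repeated assessment scores and alert if the trend worsens.

def trend_slope(scores):
    """OLS slope of scores against assessment index (0, 1, 2, ...)."""
    n = len(scores)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den if den else 0.0

def off_track(scores, higher_is_worse=True):
    """True when the fitted trend moves in the wrong direction."""
    slope = trend_slope(scores)
    return slope > 0 if higher_is_worse else slope < 0

# off_track([14, 15, 17, 18]) -> True  (symptoms worsening)
# off_track([14, 12, 11, 9])  -> False (improving)
```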

Evidence

Twelve out of the 59 apps (20.3%) have been shown to improve clinical outcomes in RCTs. Two of these 12 apps were sophisticated measurement apps without a treatment component, whereas the other 10 included a treatment component. See Table 2 for a list of the apps that are supported by empirical evidence. Eight of the 12 apps were tested against wait-list controls, and all were found more effective. Only four apps were tested against another active treatment that did not involve an app, and all four were found more effective (Brattland et al., 2018; Mahoney et al., 2014; Rose et al., 2013; Slade et al., 2008).

Table 2. Apps that are supported by randomized clinical trial evidence of treatment effectiveness

* The PCOMS method of outcome monitoring was tested in multiple RCTs but the graphs were created on paper before software methods were deployed with clinicians. Those studies are not included.

Most trials were conducted with patients who had not approached a clinic; they were recruited from advertisements in the community. Only four apps were tested in help-seeking clinic populations; three of these were tested against wait lists (Farrer et al., 2011; Sandoval et al., 2017; Slade et al., 2008; Twomey et al., 2014), and one was tested against another active intervention (Brattland et al., 2018).

Discussion

Out of the 676 apps identified through our search, 59 contained a computer-driven element, according to our definition, that performed or augmented at least one of the four main areas of clinical work, namely assessment, treatment planning, fidelity monitoring, and outcome monitoring. Of these 59 apps, 55 included a baseline assessment, six updated the treatment plan according to reassessments, one tracked treatment fidelity, and 34 monitored clinical outcomes with some form of reassessment. None of the apps included all four clinical elements. It is worth noting that we found a total of 163 apps that seemed applicable to individuals with clinical-level mental health problems. While this is still a large number of apps for consumers to sort through, it is far fewer than the 722 apps retrieved through our initial search. The vast majority of apps that can be found in app stores with mental health-related keywords were designed for wellness, everyday stressors, and non-clinical-level issues. This echoes previous research finding that the majority of mental health treatment apps are not developed by mental health professionals or researchers and rarely include clinically accurate, evidence-based information (Bry et al., 2018; Shen et al., 2015).

Assessments were by far the most common computer-driven element in apps. These appear to add valuable standardized information for single psychiatric syndromes, but this review raises two concerns. First, the majority of apps did not assess for conditions beyond anxiety and depression. Many psychiatric syndromes, if not most, co-exist with features of other psychiatric syndromes that may affect the choice of treatment strategies, duration of treatment, and prognosis for response. Second, approximately half of the apps (34/59) did not make a baseline assessment mandatory. Evidence has shown that questionnaires are often not completed when they are optional (Liu et al., 2019), but are completed close to 100% of the time when they are required (Scheeringa, 2020).

Monitoring outcomes with repeat assessments was the second most common computer-driven element we found. Some apps, such as PCOMS and the Trier Treatment Navigator, have demonstrated how automated computer processing can perform this function in a feasible and effective manner. These apps determine who is at risk of treatment failure and alert clinicians if their patients are ‘off-track’ of their expected treatment response. At least half of the apps that allow repeat assessments, however, make them optional and do not link the results into either therapists’ workflows or clients’ user experiences in a compelling manner.

With regard to treatment planning, the most common strategy used the results of self-assessments to suggest CBT protocols, and the suggestions generally were not updated with additional user input over the course of treatment. For example, Mayo Clinic’s Anxiety Coach used the user’s specific anxieties to create an exposure hierarchy, but there was no option to revise the hierarchy once it was created. The apps that did use data to update the ‘treatment’ generally did not create overall treatment plans, but rather suggested activities depending on the assessment. The exception was Tess, the chatbot that claims to update therapy modalities depending on user input; however, it was difficult to determine how many, or which, types of therapy the app offers. While the available apps may collectively provide a wide range of treatment techniques, consumers must sort through a bewildering number of apps to find them. Utilization of computer processing power to help users navigate the complex world of psychotherapy options appears to be at a very preliminary stage.

Not surprisingly, treatment fidelity monitoring was the least developed computer-driven element, consistent with the tendency of most therapists to avoid evidence-based therapies (EBTs) outside of academic trials. Despite consistent evidence of the superiority of EBTs over usual care (Weisz et al., 2013), most patients are not offered them in practice (Shafran et al., 2009). It is too early to tell whether apps will make an impact in this area. Lyssn, the single app we found that offers fidelity tracking, has yet to be tested.

Our review process highlighted a lack of standardization and transparency in how information about apps can be found. It was difficult to find accurate information on the apps in this review. The descriptions on both the app websites and in the app stores were often general and did not give specific details about the contents of the app. For example, it was often difficult to determine what type of assessment the app contained (i.e. whether it used standardized, validated measures, or asked general questions about psychiatric disorders) or what exactly the treatment consisted of. Additionally, an app’s description often claimed that the app was based in CBT or used CBT principles when it only included isolated techniques such as deep breathing or progressive muscle relaxation. This is understandable as the app field is relatively new and the major app stores have not agreed on a uniform reporting code.

We found that fewer than a quarter of the reviewed apps (12/59) have been shown to produce significant clinical outcomes in an RCT. Thus, the research evidence that might indicate these apps add value above and beyond traditional psychotherapy with a human therapist is promising but preliminary. Four apps outperformed another active intervention and eight apps outperformed wait-list control groups in randomized trials. However, most of this evidence may not generalize to traditional clinical work because only four of these trials were conducted with help-seeking samples. Only one app has been shown more effective than another non-app intervention in a clinical population (Brattland et al., 2018). This lack of evidence-based content in mental health apps is well documented (Lau et al., 2020; Olff, 2015). Overall, it was difficult to ascertain which apps had been tested in any type of empirical research. Although the websites for several of the apps included studies that the developers had conducted, many did not. Additionally, many studies did not specifically state the name of the app used, so it was difficult to know which app (or which version of that app) was used in published studies.

This raises a question, rather than a limitation: is traditional RCT evidence really the best metric by which to judge apps? Apps are mostly extensions of already-proven methods, except perhaps for chatbots and Lyssn, and it is not obvious that effectiveness RCTs are truly the next incremental step of science for this field. Furthermore, it is unclear whether a favourable performance in an RCT would necessarily translate into clinical benefits. Although several apps appear promising, more longitudinal research is needed to identify any beneficial effects of mental health applications. The added value of apps lies in their potential for harnessing computer power to augment clinical processes, disseminating treatments to larger populations who cannot or will not access in-person clinics, and enabling clients to have a more rewarding experience.

Technical metrics that are currently the most frequently available in app stores, such as privacy, security, interoperability, and when the app was last updated, are additional important metrics that complement the clinical components we reviewed. Satisfaction metrics, such as star ratings, number of downloads, and number of reviews posted, may serve as proxies of clinical effectiveness, but are unreliable. Star ratings and reviews can be skewed positively by fake reviews and skewed negatively by self-selection bias of unhappy customers being relatively more vocal. The number of downloads can be skewed by marketing and cost factors. One recent review demonstrated that most technical and satisfaction metrics showed no correlation with each other (Lagan et al., 2021). It is acknowledged that app quality is a multidimensional construct that includes clinical utility, privacy and user satisfaction.

Strengths of this review include its comprehensive coverage of the two main mobile app stores. All of the apps that were available at the time of this report were reviewed in some capacity. Weaknesses include that we only included English-language apps and that we did not explore in detail features that were behind a paywall or were restricted to employees whose company had purchased access. However, most of the basic information about those apps was available on developers’ websites, and it is unlikely that novel features were missed.

In summary, the majority of the more than 700 self-proclaimed ‘mental health’ apps currently available do not contain a truly clinical aspect according to our definition. Out of the 163 that do, a minority implement any computer-driven process to help with treatment. This finding is similar to that of a review of smartphone apps for anxiety, which found that attempts to take advantage of technological possibilities (e.g. sensors and ecological momentary assessment) were rare (Bry et al., 2018). Among the subset of apps that did include a computer-driven process in our review, several sophisticated apps have the potential to meaningfully augment psychotherapy. Apps have undoubtedly increased dissemination of valuable psychiatric and psychological information, but the full potential of computer processing appears unreached in mental health-related apps. Apps may be making a dent in Goldberg’s lament that clinical wisdom could stand improvement, but it is evident that computer processing is not easily applied via apps to the ‘complex clinical inferences’ that clinicians need to master.

Key practice points

  (1) Most commercially available, self-proclaimed mental health-related applications do not contain elements that make them relevant to clinical problems.

  (2) Most apps that can be used for clinical populations do not meaningfully gather user data to supplement assessment or treatment.

  (3) There are promising models for disparate elements of treatment, but none of the reviewed apps contained all four.

Data availability statement

Data are available upon request.

Acknowledgements

None.

Author contributions

Catalina Pacheco: Conceptualization (equal), Data curation (lead), Formal analysis (equal), Investigation (equal), Methodology (equal), Writing – original draft (equal), Writing – review & editing (equal); Michael Scheeringa: Conceptualization (equal), Data curation (equal), Methodology (equal), Supervision (equal), Writing – original draft (equal), Writing – review & editing (equal).

Financial support

None.

Conflicts of interest

The second author receives royalties from Guilford Press, Central Recovery Press, and Psychology Today.

Ethical standards

Authors have abided by the Ethical Principles of Psychologists and Code of Conduct as set out by the BABCP and BPS.

References

Further reading

Liu, F. F., Cruz, R. A., Rockhill, C. M., & Lyon, A. R. (2019). Mind the gap: considering disparities in implementing measurement-based care. Journal of the American Academy of Child and Adolescent Psychiatry, 58, 459–461. https://doi.org/10.1016/j.jaac.2018.11.015
Imel, Z. E., Caperton, D. D., Tanana, M., & Atkins, D. C. (2017). Technology-enhanced human interaction in psychotherapy. Journal of Counseling Psychology, 64, 385–393. https://doi.org/10.1037/cou0000213

References

Andersson, G. (2009). Using the internet to provide cognitive behaviour therapy. Behaviour Research and Therapy, 47, 175–180. https://doi.org/10.1016/j.brat.2009.01.010
Andersson, G., Estling, F., Jakobsson, E., Cuijpers, P., & Carlbring, P. (2011). Can the patient decide which modules to endorse? An open trial of tailored internet treatment of anxiety disorders. Cognitive Behaviour Therapy, 40, 57–64. https://doi.org/10.1080/16506073.2010.529457
Bakker, D., Kazantzis, N., Rickwood, D., & Rickard, N. (2018). A randomized controlled trial of three smartphone apps for enhancing public mental health. Behaviour Research and Therapy, 109, 75–83. https://doi.org/10.1016/j.brat.2018.08.003
Brattland, H., Koksvik, J. M., Burkeland, O., Gråwe, R. W., Klöckner, C., Linaker, O. M., Ryum, T., Wampold, B., Lara-Cabrera, M. L., & Iversen, V. C. (2018). The effects of routine outcome monitoring (ROM) on therapy outcomes in the course of an implementation process: a randomized clinical trial. Journal of Counseling Psychology, 65, 641–652. https://doi.org/10.1037/cou0000286
Bry, L. J., Chou, T., Miguel, E., & Comer, J. S. (2018). Consumer smartphone apps marketed for child and adolescent anxiety: a systematic review and content analysis. Behavior Therapy, 49, 249–261. https://doi.org/10.1016/j.beth.2017.07.008
Cartreine, J. A., Ahern, D. K., & Locke, S. E. (2010). A roadmap to computer-based psychotherapy in the United States. Harvard Review of Psychiatry, 18, 80–95. https://doi.org/10.3109/10673221003707702
Ellis, L., Campbell, A., Sethi, S., & O’Dea, B. (2011). Comparative randomized trial of an online cognitive-behavioral therapy program and an online support group for depression and anxiety. Journal of Cyber Therapy and Rehabilitation, 4, 461–467.
Farrer, L., Christensen, H., Griffiths, K. M., & Mackinnon, A. (2011). Internet-based CBT for depression with and without telephone tracking in a national helpline: randomised controlled trial. PLoS One, 6, e28099. https://doi.org/10.1371/journal.pone.0028099
Fitzpatrick, K. K., Darcy, A., & Vierhile, M. (2017). Delivering cognitive behavior therapy to young adults with symptoms of depression and anxiety using a fully automated conversational agent (Woebot): a randomized controlled trial. JMIR Mental Health, 4, e19. https://doi.org/10.2196/mental.7785
Fulmer, R., Joerin, A., Gentile, B., Lakerink, L., & Rauws, M. (2018). Using psychological artificial intelligence (Tess) to relieve symptoms of depression and anxiety: randomized controlled trial. JMIR Mental Health, 5, e64. https://doi.org/10.2196/mental.9782
Goldberg, L. R. (1968). Simple models or simple processes? Some research on clinical judgments. American Psychologist, 23, 483–496. https://doi.org/10.1037/h0026206
Imel, Z. E., Caperton, D. D., Tanana, M., & Atkins, D. C. (2017). Technology-enhanced human interaction in psychotherapy. Journal of Counseling Psychology, 64, 385–393. https://doi.org/10.1037/cou0000213
Kladnitski, N., Smith, J., Uppal, S., James, M. A., Allen, A. R., Andrews, G., & Newby, J. M. (2020). Transdiagnostic internet-delivered CBT and mindfulness-based treatment for depression and anxiety: a randomised controlled trial. Internet Interventions, 20, 100310. https://doi.org/10.1016/j.invent.2020.100310
Kuhn, E., Kanuri, N., Hoffman, J. E., Garvert, D. W., Ruzek, J. I., & Taylor, C. B. (2017). A randomized controlled trial of a smartphone app for posttraumatic stress disorder symptoms. Journal of Consulting and Clinical Psychology, 85, 267–273. https://doi.org/10.1037/ccp0000163
Lagan, S., D’Mello, R., Vaidyam, A., Bilden, R., & Torous, J. (2021). Assessing mental health apps marketplaces with objective metrics from 29,190 data points from 278 apps. Acta Psychiatrica Scandinavica. https://doi.org/10.1111/acps.13306
Lau, N., O’Daffer, A., Colt, S., Yi-Frazier, J. P., Palermo, T. M., McCauley, E., & Rosenberg, A. R. (2020). Android and iPhone mobile apps for psychosocial wellness and stress management: systematic search in app stores and literature review. JMIR mHealth and uHealth, 8, e17798. https://doi.org/10.2196/17798
Liu, F. F., Cruz, R. A., Rockhill, C. M., & Lyon, A. R. (2019). Mind the gap: considering disparities in implementing measurement-based care. Journal of the American Academy of Child and Adolescent Psychiatry, 58, 459–461. https://doi.org/10.1016/j.jaac.2018.11.015
Mackinnon, A., Griffiths, K. M., & Christensen, H. (2008). Comparative randomised trial of online cognitive-behavioural therapy and an information website for depression: 12-month outcomes. British Journal of Psychiatry, 192, 130–134. https://doi.org/10.1192/bjp.bp.106.032078
Mahoney, A. E., Mackenzie, A., Williams, A. D., Smith, J., & Andrews, G. (2014). Internet cognitive behavioural treatment for obsessive compulsive disorder: a randomised controlled trial. Behaviour Research and Therapy, 63, 99–106. https://doi.org/10.1016/j.brat.2014.09.012
Miner, A., Kuhn, E., Hoffman, J. E., Owen, J. E., Ruzek, J. I., & Taylor, C. B. (2016). Feasibility, acceptability, and potential efficacy of the PTSD Coach app: a pilot randomized controlled trial with community trauma survivors. Psychological Trauma: Theory, Research, Practice, and Policy, 8, 384–392. https://doi.org/10.1037/tra0000092
Moberg, C., Niles, A., & Beermann, D. (2019). Guided self-help works: randomized waitlist controlled trial of Pacifica, a mobile app integrating cognitive behavioral therapy and mindfulness for stress, anxiety, and depression. Journal of Medical Internet Research, 21, e12556. https://doi.org/10.2196/12556
O’Kearney, R., Gibson, M., Christensen, H., & Griffiths, K. M. (2006). Effects of a cognitive-behavioural internet program on depression, vulnerability to depression and stigma in adolescent males: a school-based controlled trial. Cognitive Behaviour Therapy, 35, 43–54. https://doi.org/10.1080/16506070500303456
Olff, M. (2015). Mobile mental health: a challenging research agenda. European Journal of Psychotraumatology, 6, 27882. https://doi.org/10.3402/ejpt.v6.27882
Possemato, K., Kuhn, E., Johnson, E., Hoffman, J. E., Owen, J. E., Kanuri, N., De Stefano, L., & Brooks, E. (2016). Using PTSD Coach in primary care with and without clinician support: a pilot randomized controlled trial. General Hospital Psychiatry, 38, 94–98. https://doi.org/10.1016/j.genhosppsych.2015.09.005
Powell, J., Hamborg, T., Stallard, N., Burls, A., McSorley, J., Bennett, K., Griffiths, K. M., & Christensen, H. (2013). Effectiveness of a web-based cognitive-behavioral tool to improve mental well-being in the general population: randomized controlled trial. Journal of Medical Internet Research, 15, e2. https://doi.org/10.2196/jmir.2240
Richards, D., Enrique, A., Eilert, N., Franklin, M., Palacios, J., Duffy, D., Earley, C., Chapman, J., Jell, G., Sollesse, S., & Timulak, L. (2020). A pragmatic randomized waitlist-controlled effectiveness and cost-effectiveness trial of digital interventions for depression and anxiety. npj Digital Medicine, 3, 85. https://doi.org/10.1038/s41746-020-0293-8
Richards, D., Timulak, L., O’Brien, E., Hayes, C., Vigano, N., Sharry, J., & Doherty, G. (2015). A randomized controlled trial of an internet-delivered treatment: its potential as a low-intensity community intervention for adults with symptoms of depression. Behaviour Research and Therapy, 75, 20–31. https://doi.org/10.1016/j.brat.2015.10.005
Rose, R. D., Buckey, J. C. Jr, Zbozinek, T. D., Motivala, S. J., Glenn, D. E., Cartreine, J. A., & Craske, M. G. (2013). A randomized controlled trial of a self-guided, multimedia, stress management and resilience training program. Behaviour Research and Therapy, 51, 106–112. https://doi.org/10.1016/j.brat.2012.11.003
Sandoval, L. R., Buckey, J. C., Ainslie, R., Tombari, M., Stone, W., & Hegel, M. T. (2017). Randomized controlled trial of a computerized interactive media-based problem solving treatment for depression. Behavior Therapy, 48, 413–425. https://doi.org/10.1016/j.beth.2016.04.001
Scheeringa, M. S. (2020). A different way to mind the gap: mandated versus voluntary collection of measures. Journal of the American Academy of Child and Adolescent Psychiatry, 59, 576–577. https://doi.org/10.1016/j.jaac.2019.06.021
Schure, M. B., Lindow, J. C., Greist, J. H., Nakonezny, P. A., Bailey, S. J., Bryan, W. L., & Byerly, M. J. (2019). Use of a fully automated internet-based cognitive behavior therapy intervention in a community population of adults with depression symptoms: randomized controlled trial. Journal of Medical Internet Research, 21, e14754. https://doi.org/10.2196/14754
Sethi, S., Campbell, A. J., & Ellis, L. A. (2010). The use of computerized self-help packages to treat adolescent depression and anxiety. Journal of Technology in Human Services, 28, 144–160. https://doi.org/10.1080/15228835.2010.508317
Shafran, R., Clark, D. M., Fairburn, C. G., Arntz, A., Barlow, D. H., Ehlers, A., Freeston, M., Garety, P. A., Hollon, S. D., Ost, L. G., Salkovskis, P. M., Williams, J. M., & Wilson, G. T. (2009). Mind the gap: improving the dissemination of CBT. Behaviour Research and Therapy, 47, 902–909. https://doi.org/10.1016/j.brat.2009.07.003
Shen, N., Levitan, M. J., Johnson, A., Bender, J. L., Hamilton-Page, M., Jadad, A. R., & Wiljer, D. (2015). Finding a depression app: a review and content analysis of the depression app marketplace. JMIR mHealth and uHealth, 3, e3713. https://doi.org/10.2196/mhealth.3713
Slade, K., Lambert, M. J., Harmon, S. C., Smart, D. W., & Bailey, R. (2008). Improving psychotherapy outcome: the use of immediate electronic feedback and revised clinical support tools. Clinical Psychology & Psychotherapy, 15, 287–303. https://doi.org/10.1002/cpp.594
Stech, E. P., Grierson, A. B., Chen, A. Z., Sharrock, M. J., Mahoney, A. E. J., & Newby, J. M. (2020). Intensive one-week internet-delivered cognitive behavioral therapy for panic disorder and agoraphobia: a pilot study. Internet Interventions, 20, 100315. https://doi.org/10.1016/j.invent.2020.100315
Twomey, C., O’Reilly, G., Byrne, M., Bury, M., White, A., Kissane, S., McMahon, A., & Clancy, N. (2014). A randomized controlled trial of the computerized CBT programme, MoodGYM, for public mental health service users waiting for interventions. British Journal of Clinical Psychology, 53, 433–450. https://doi.org/10.1111/bjc.12055
Walfish, S., McAlister, B., O’Donnell, P., & Lambert, M. J. (2012). An investigation of self-assessment bias in mental health providers. Psychological Reports, 110, 639–644. https://doi.org/10.2466/02.07.17.PR0.110.2.639-644
Weisz, J. R., Kuppens, S., Eckshtain, D., Ugueto, A. M., Hawley, K. M., & Jensen-Doss, A. (2013). Performance of evidence-based youth psychotherapies compared with usual clinical care: a multilevel meta-analysis. JAMA Psychiatry, 70, 750–761. https://doi.org/10.1001/jamapsychiatry.2013.1176
