Introduction
In 1968, Lewis Goldberg was at the forefront of asking how the accuracy of ‘clinical wisdom’ might be improved. He noted the discouraging conclusion from studies that clinical judgements varied unreliably across individual clinicians (Goldberg, Reference Goldberg1968). He concluded that if ‘complex clinical inferences’ are to be learned reliably by clinicians, there must be some form of feedback, and that feedback must include whether clinical judgements were accurate or not. A more reliable methodology was needed to ‘substitute for the more ephemeral storage capacities of the unaided human brain’. Goldberg believed this substitute was research, but with computing power now embedded in widely available mobile mental health applications (apps), Goldberg’s recommendation may be taking on new possibilities.
Computer-facilitated mental health care has grown tremendously in recent years with the creation of software that provides one or more components of traditional office-based care. These include programs for phones (mobile apps) or computers (basic websites and progressive web apps, which are websites that behave and feel like a mobile app) that can facilitate virtual mental health care by providing flexible hours for treatment, reducing logistical issues and potentially cost, making treatment more accessible to patients with low motivation or high anxiety, increasing discretion, and increasing patient engagement (Cartreine et al., Reference Cartreine, Ahern and Locke2010; Imel et al., Reference Imel, Caperton, Tanana and Atkins2017; Olff, Reference Olff2015). Other types of software offer online self-assessments, provide psychoeducation, or track symptom change over time.
Although apps offer novel methods to deliver components of clinical care such as stress management, emotion tracking and mindfulness, it is unclear whether apps can fully replace or meaningfully improve the experience of clinical therapy. A recent, large review of mental health mobile applications by Lau and colleagues (Reference Lau, O’Daffer, Colt, Yi-Frazier, Palermo, McCauley and Rosenberg2020) found that free mobile apps available on the iPhone or Android app stores covered 31 unique intervention and didactic content categories. The majority of apps were designed as self-help interventions and were not clearly intended for individuals with psychopathology, and therefore are not necessarily analogous to clinical treatment. Only 4.7% of apps were designed specifically for psychological disorders. This review highlighted a gap in our understanding of the state of mobile mental health apps: most apps lack a clear clinical orientation, and there has not been a comprehensive review of apps intended for help-seeking individuals with clinical-level, impairing disorders.
Another gap is that there has been no review that we are aware of that has examined the extent to which computer power has been harnessed to complete clinical tasks. Here, ‘computer power’ refers to the memory and data-processing capacities of computers, which exceed those of human brains and can be used to collect and analyse data, and subsequently to use those data to perform tasks. To date, most apps have been promoted as a way to improve accessibility to skills based in therapeutic theories, or as simple tracking tools to complement sessions with therapists. The potential of the processing power of computers seems untapped. For example, Imel and colleagues (Reference Imel, Caperton, Tanana and Atkins2017) describe a hypothetical scenario in which machine learning might be applied to transcripts of therapy sessions to predict treatment outcome. Apps open the door for massive computer processing capabilities to use data to perform clinical tasks faster or better than the minds of individual clinicians (Olff, Reference Olff2015), but it is unclear how many, if any, apps harness these capabilities.
The purpose of this study is to review apps, both mobile and web-based, that utilize computer processing to enhance the treatment of clinical populations by using data collected by the app. Our goal was to characterize the content of these types of apps relevant to four main characteristics of traditional, in-person clinical treatment which could be augmented by computing power. Namely, these characteristics are (1) assessment/diagnosis, (2) treatment planning, (3) treatment fidelity tracking, and (4) tracking of treatment outcome. Each of these components has the potential to be significantly improved with the integration of existing technologies. The following is a discussion of several of the many possibilities in each component.
Assessment/diagnosis
Mental health apps could improve both the precision and accuracy of self-assessment. As mentioned by Imel et al. (Reference Imel, Caperton, Tanana and Atkins2017), the anonymity provided by mobile apps may lead users to disclose information they might otherwise withhold from a therapist due to feelings of shame. Furthermore, Olff (Reference Olff2015) suggested that mobile apps might be used to allow ecological momentary assessments, which may promote accurate symptom monitoring. Olff (Reference Olff2015) also describes how the adaptability of computerized assessments enables an app to display only the items on an assessment that are relevant to the user, for example, by asking only about events or symptoms that occurred in a specific time frame or by terminating an assessment for a specific disorder if screening questions indicate the absence of that disorder. Similarly, this adaptability enables assessments to ask increasingly specific questions to parse out similar disorders and identify co-morbidities.
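To illustrate the kind of branching Olff describes, the following minimal sketch (in Python) shows a screener module that terminates early when a gate item is not endorsed; the question texts, response scale and skip-out rule are hypothetical and chosen purely for illustration, not drawn from any specific app.

```python
from dataclasses import dataclass, field

@dataclass
class AdaptiveScreener:
    """Illustrative adaptive screener: a single gate item decides whether the
    remaining disorder-specific items are displayed at all."""
    gate_item: str                      # hypothetical screening question
    follow_up_items: list               # asked only if the gate item is endorsed
    responses: dict = field(default_factory=dict)

    def administer(self, ask):
        # `ask` is any callable that presents a question and returns an int rating (e.g. 0-3)
        gate_score = ask(self.gate_item)
        self.responses[self.gate_item] = gate_score
        if gate_score == 0:
            # Screening item not endorsed: end this module early, sparing
            # the user items that are not relevant to them.
            return self.responses
        for item in self.follow_up_items:
            self.responses[item] = ask(item)
        return self.responses
```

A real app would, of course, draw items from validated instruments and feed the responses into its other modules rather than simply returning them.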
Treatment planning
Computing power could be used to automate more flexible and organized treatment planning. For example, apps could determine where to focus treatment depending on the results of an initial assessment and then recommend complete protocols, or pick and choose relevant modules from multiple protocols to build individualized treatment plans for patients with co-morbidities (Andersson, Reference Andersson2009). Additionally, computing power may help providers and patients collaboratively develop a treatment plan which focuses on areas that seem important to both individuals. Evidence shows that allowing patients to choose internet-delivered therapy modules produces comparable results to programs in which the clinician chooses the modules (Andersson et al., Reference Andersson, Estling, Jakobsson, Cuijpers and Carlbring2011). Whether apps are purely self-help or used with clinicians, providing a menu of modules from which to build a treatment plan may be an efficient form of treatment planning.
Furthermore, as treatments progress and apps continue collecting data, apps can automatically suggest adaptations to treatment plans as necessary. For example, if one type of treatment (e.g. cognitive behavioural therapy) does not seem to be benefiting a patient, an app could identify this quickly through repeated assessments and then either make a modest suggestion to re-think the current strategy (Brattland et al., Reference Brattland, Koksvik, Burkeland, Gråwe, Klöckner, Linaker, Ryum, Wampold, Lara-Cabrera and Iversen2018) or make a more specific suggestion to try another type of treatment (e.g. dialectical behaviour therapy).
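As a concrete sketch of this kind of monitoring-and-adaptation loop, the function below compares recent scores with baseline and, when improvement is small, returns the modest prompt described above; the three-assessment window, improvement threshold and wording are illustrative assumptions, not values drawn from any reviewed app or trial.

```python
def suggest_plan_update(baseline_score, repeated_scores, current_modality,
                        min_improvement=4.0):
    """Illustrative only: flag a possibly non-responding treatment trajectory.

    Higher scores indicate more symptoms; `min_improvement` is an arbitrary
    illustrative threshold, not a clinical standard.
    """
    if len(repeated_scores) < 3:
        return "Too few repeated assessments to evaluate the current plan."
    recent_mean = sum(repeated_scores[-3:]) / 3
    if baseline_score - recent_mean >= min_improvement:
        return f"Scores are improving; continue the current {current_modality} plan."
    # Modest prompt first; a more specific alternative is offered only as a suggestion.
    return (f"Scores have changed little from baseline; consider re-thinking the "
            f"current {current_modality} strategy or discussing an alternative "
            f"treatment approach.")
```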
Treatment fidelity
Treatment fidelity is defined as the degree to which a clinical treatment was delivered as intended. Computers excel at presenting stored data in pre-programmed sequences through intuitive and attractive interfaces. These features may be well-suited to facilitate fidelity to and compliance with lengthy, manualized treatment plans. Once a manualized treatment protocol is written into the code of an app or computerized program, that treatment will be delivered with fidelity. If the app is designed to track the progression through a treatment protocol, both the patient and clinician, if relevant, can see exactly where they are in the treatment plan and which steps come next. Even if there is a human clinician implementing the therapy, this structure may increase the fidelity with which a treatment is implemented.
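A minimal sketch of what such progression tracking might look like is given below, assuming the protocol can be represented as an ordered list of sessions; the session names in the example are hypothetical.

```python
class ProtocolTracker:
    """Tracks position within an ordered, manualized protocol so that patient
    and clinician can both see completed steps and what comes next."""

    def __init__(self, sessions):
        self.sessions = list(sessions)
        self.completed = 0

    def mark_session_complete(self):
        if self.completed < len(self.sessions):
            self.completed += 1

    def status(self):
        return {
            "completed": self.sessions[:self.completed],
            "next_step": self.sessions[self.completed:self.completed + 1],
            "remaining": self.sessions[self.completed:],
        }

# Hypothetical example protocol
tracker = ProtocolTracker(["Psychoeducation", "Cognitive restructuring",
                           "Exposure planning", "Exposure practice",
                           "Relapse prevention"])
tracker.mark_session_complete()
print(tracker.status()["next_step"])   # ['Cognitive restructuring']
```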
Apps could also serve to increase the fidelity with which patients implement therapeutic tools in their daily lives. Many apps include descriptions of therapeutic techniques or activities, for example by including videos to guide the user through progressive muscle relaxation. By making these resources available at all times, apps can reinforce skills in real life that patients learned in-session. Imel and colleagues (Reference Imel, Caperton, Tanana and Atkins2017) theorize that access to these types of tools may increase patient participation in treatment.
Treatment outcome monitoring
Computing power delivered through phones and computers provides an avenue for both qualitative and quantitative outcome monitoring. Quantitatively, comparisons of pre- and post-treatment clinical assessments provide patients and therapists with clear evidence of changes in symptom severity and functional impairment. Moreover, use of apps to administer baseline and outcome measures may help ensure that the information is delivered to patients instead of only to clinicians. Qualitatively, if patients input specific goals at the beginning of treatment, apps can preserve these goals on a dashboard to minimize forgetting of difficult topics. In this way, the standardization provided by the app can ensure that these issues are followed up on more consistently than they might be in traditional therapy.
Furthermore, providers tend to over-estimate their effectiveness, so encouraging regular outcome monitoring may give therapists important feedback about their own effectiveness (Walfish et al., Reference Walfish, McAlister, O’Donnell and Lambert2012). Similar to Goldberg’s earlier recommendation (Reference Goldberg1968), Imel et al. (Reference Imel, Caperton, Tanana and Atkins2017) discussed the usefulness of specific, real-time feedback about clinical decision-making for developing clinical expertise, and the lack of such feedback after licensure. Apps can be utilized to provide this type of feedback to therapists, thereby increasing the chances that therapy is delivered in the way in which it was meant to be delivered.
As mentioned above, the purpose of this study is to review apps, both mobile and web-based, that utilize computer processing to enhance various aspects of treatment of clinical populations by using data collected by the app. Our research questions were as follows: (1) how many apps or websites utilize computer processing to perform or augment four main areas of clinical work, namely (a) assessment/diagnosis, (b) treatment planning, (c) treatment fidelity tracking, and (d) tracking of treatment outcome?; (2) how do apps gather data and provide feedback in these areas?; and (3) is there evidence that the available integrations of computer processing in these areas significantly enhances patient outcomes?
Method
From July to November 2020, apps and progressive web applications were extracted from several databases – the Google Play Store, the Apple App Store, and a virtual app guide curated by One Mind PsyberGuide. Mobile apps were extracted from the Google Play and Apple stores using the search terms ‘mental health’ and ‘mental health apps’. This was supplemented with Google searches using the search terms ‘progressive web apps mental health’ and ‘mental health web application’.
Information about the apps was extracted from the descriptions published in each database. If app descriptions did not provide enough information, the authors explored the app websites for the information of interest. If the details of apps were still unclear, or if apps did not have a website, apps were downloaded. Only free apps or apps with a free trial were downloaded. If information could not be determined about paid apps, the authors requested the information from the app developers. If the app developers did not respond to the request for information, the unclear aspect was recorded as ‘unclear’ in both the final count and in Table 1.
Table 1. Characteristics of included apps
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220818122720701-0695:S1754470X22000368:S1754470X22000368_tab1.png?pub-status=live)
ACT, acceptance and commitment therapy; BA, behavioural activation; DBT, dialectical behavioural therapy; HRV, heart rate variability; IPT, interpersonal therapy; OCD, obsessive compulsive disorder; tDCS, transcranial direct current stimulation.
* Lyssn is the only service in our review that monitored treatment fidelity.
Inclusion criteria were that the apps (1) included a clinical aspect that could be used to treat psychiatric problems, (2) included a sophisticated computer-driven element calculated from user input, and (3) were in English. An app was defined as ‘clinical’ if it included an assessment or taught users skills from typical psychotherapy modalities (such as cognitive behavioural therapy, dialectical behaviour therapy, acceptance and commitment therapy, etc.) to treat psychiatric disorders. Apps could also be considered clinical if they included linkages to licensed telehealth clinicians. Apps that were solely for meditation, mindfulness, relaxation techniques and insomnia were excluded because application of these techniques is not limited to psychiatric treatment.
One author (C.R.P.) rated each app discovered through the searches as ‘clinical’ or ‘not clinical’ with reasons to support the ratings. The second author (M.S.S.) reviewed the apps and made independent ratings about ‘clinical’ status. If M.S.S. was uncertain about C.R.P.’s ratings, then M.S.S. undertook independent assessments of these apps.
An app was considered to have a computer-driven element if it used computer processing power to augment clinical therapy, either through a baseline assessment, treatment fidelity tracking, and/or treatment outcome monitoring, in a way that would be difficult or time-consuming for clinicians. Apps that simply provided and summed short (fewer than 10 items) clinical assessment measures did not meet the definition of ‘sophisticated’ and were excluded.
Both authors (C.R.P. and M.S.S.) rated every clinical app independently on sophisticated computer-driven elements, defined as computer processing of user input for at least one of these four typical activities of psychiatric treatment: (1) clinical assessment, (2) treatment planning, (3) treatment fidelity to a standardized protocol, or (4) tracking treatment outcomes. The app must have presented some sort of output as a result of user input. For example, an app could suggest lessons or activities to users based on the results of an assessment or augment the activities suggested based on the user’s previous ratings of an activity. Because of this requirement, apps that solely offered clinician-provided telehealth and did not offer some other computerized aspect of treatment were excluded. Apps that had at least one of the four computer-driven elements were retained. Discrepancies were resolved by discussion.
Next, both authors independently determined whether the apps were self-help or therapist-assisted (or both), offered multiple types of treatments, made data collection mandatory or optional, were supported by empirical evidence, and how they gave feedback to users. Discrepancies were resolved by discussion.
The available empirical evidence of each app’s treatment effectiveness was found by exploring websites dedicated to the apps or by searching the names of the apps on Google Scholar, ResearchGate, or PsycINFO. If empirical evidence was not found by these methods, the authors requested any existing evidence directly from the app developers, as listed in the respective app stores. ‘Empirical evidence’ was defined as positive results from a randomized clinical trial (RCT) on treatment outcome that included use of the app.
Results
Figure 1 presents a visual summary of the selection process. The search terms resulted in 351 apps from the Google store and 299 apps from the Apple store. Ninety-four apps were common to the two stores. One Mind PsyberGuide provided 198 apps, 33 of which overlapped with the Apple store and 43 of which overlapped with the Google Play Store. Duplicate apps were removed so that each app was only considered once in the final counts. The Google and literature searches yielded 16 apps not found through the previous searches. Through the combination of these searches, 722 unique apps were found. Fourteen apps were not in English and were therefore excluded. After the initial search, 32 apps were no longer available for download on either store. This resulted in a total of 676 apps that were reviewed, 76 of which were downloaded. The two authors’ ratings agreed on whether apps were clinical 99.0% of the time (669 out of 676 apps). After discrepancies were investigated and discussed, only one of the first author’s ratings was changed. Out of the 676 apps, 513 were excluded for failure to meet our definition of having a clinical element, leaving 163 apps that were reviewed in the next step.
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220818122720701-0695:S1754470X22000368:S1754470X22000368_fig1.png?pub-status=live)
Figure 1. PRISMA flow diagram of the inclusion process.
Of these 163 apps, 104 were excluded due to a lack of a computer-driven element, resulting in a total of 59 apps with a clinical, computer-driven element which were included in the final qualitative review. A flow diagram of the inclusion process can be seen in Fig. 1. The authors’ ratings agreed about whether the app had a computer-driven element 77.9% of the time (127 out of 163 apps). A list of the included apps and their qualities can be found in Table 1. Twenty-nine of the apps were self-help, 15 were therapist-assisted, and 15 could be used either as self-help or with a therapist.
Self-assessment
Baseline clinical assessments were the most common computer-driven element of the apps with 55 out of 59 apps providing some sort of assessment. The content of measures in 53 of the 55 apps could be determined. The content of two apps could not be determined because they were behind paywalls and the developers did not respond to our requests for information.
Of the 53 apps for which content was known, 15 provided measures for a single syndrome: eight were for depression, four were for anxiety, two were for post-traumatic stress disorder (PTSD), and one was for obsessive-compulsive disorder (OCD). One app that assessed for depression and one app that assessed for anxiety also asked about the user’s stage of change.
Thirty-five apps provided measures for more than one syndrome. The most common pattern was to provide both a depression and an anxiety screen, with 13 apps giving an anxiety and depression screen and four assessing anxiety, depression and stress. One app assessed for PTSD and depression, and one assessed for anxiety, OCD and phobias. Sixteen apps included a multitude of assessments, with a wide variety of content. For example, Clinicom claims to assess for over 55 mental health conditions, while Spring Health screens for many symptoms, psychiatric and family history, and social determinants of health. Two apps measured well-being, as opposed to symptoms. One of these apps also included a behavioural health scale. Lastly, one app used the Outcome Rating Scale and Session Rating Scale.
Twenty-two apps allowed the user to choose whether or not to take the assessment. Twenty-five apps required the assessment. For eight apps, it was unclear whether the assessments were mandatory or optional due to the measures being behind a paywall.
How did computer power analyse the data and present the results to users?
Thirty-four apps displayed summed scores of assessment measures. Of these, 15 included a qualitative severity rating (e.g. mild, moderate, severe). One app produced a score for a stress test but not for its anxiety and depression questionnaires. Six other apps provided a qualitative severity rating with no scores. One app did not give a score or interpretation but used an inventory of anxieties to develop a customized exposure ladder. Four apps did not give any feedback. Two apps gave feedback about which stage of change the user was in and how their symptoms compared with other children their age. One app gave a suggested diagnosis. For seven apps, the form of feedback could not be determined from the available information.
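For readers unfamiliar with how such scoring works in practice, the sketch below sums item responses and attaches a qualitative severity band; the cut-points follow the widely published PHQ-9 conventions and are included only as an example of a severity mapping, not as a description of any specific app in Table 1.

```python
def score_with_severity(item_responses):
    """Sum 0-3 item responses (PHQ-9 style) and attach a qualitative severity band.

    Cut-points follow published PHQ-9 conventions and are illustrative here.
    """
    total = sum(item_responses)
    if total <= 4:
        band = "minimal"
    elif total <= 9:
        band = "mild"
    elif total <= 14:
        band = "moderate"
    elif total <= 19:
        band = "moderately severe"
    else:
        band = "severe"
    return {"score": total, "severity": band}

# e.g. score_with_severity([2, 1, 3, 2, 1, 0, 2, 1, 0]) -> {'score': 12, 'severity': 'moderate'}
```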
Computer-guided treatment plan
Of the 53 apps that provided any treatment techniques, 11 provided more than one type of clinical treatment, and six of these updated the treatment plan depending on user input.
How did computer power use data to guide treatment plans?
Three of the apps that could guide treatment plans were chatbots (Woebot, Tess and Anxiety Test & Relief). These apps ask preliminary questions and then suggest exercises to help manage the identified issue. Tess follows up on a previously mentioned issue and asks for feedback. If the user expresses that the previous suggestion was not helpful, the chatbot suggests a different type of exercise. Anxiety Test & Relief uses the Tess algorithm. In contrast, Woebot does not check in to see how relevant or helpful the exercise was for the user.
The Trier Treatment Navigator uses the results of the baseline assessment to suggest strategies for treatment. Via repeated assessments, the system then identifies patients who are ‘not on track’ and suggests clinical exercises, worksheets and videos for the clinicians. MoodMission uses the results of its preliminary assessment to suggest a ‘mission’, which can incorporate cognitive behavioural therapy (CBT), behavioural activation (BA), meditation, or relaxation exercises. The app has users rate their distress before and after completing the mission and adapts its suggestions to match the missions that most successfully lowered the distress score. Thrive, by Waypoint Health, offers therapy plans based on social skills training and CBT. According to their website, the algorithm gives specific recommendations depending on the user’s goals, past exercises, and experiences with depression.
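The distress-rating adaptation described for MoodMission could be sketched roughly as follows; this is our illustrative reconstruction, not the app’s actual algorithm, and the category names and selection rule are assumptions.

```python
from collections import defaultdict
import random

class MissionRecommender:
    """Illustrative: prefer activity categories whose completed 'missions'
    produced the largest average drop in self-rated distress."""

    def __init__(self, categories=("CBT", "behavioural activation", "relaxation")):
        self.categories = list(categories)
        self.distress_drops = defaultdict(list)   # category -> list of (before - after)

    def record_mission(self, category, distress_before, distress_after):
        self.distress_drops[category].append(distress_before - distress_after)

    def suggest_category(self):
        if not self.distress_drops:
            # No data yet: start with a random category.
            return random.choice(self.categories)

        def mean_drop(cat):
            drops = self.distress_drops.get(cat, [])
            return sum(drops) / len(drops) if drops else 0.0

        return max(self.categories, key=mean_drop)
```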
Treatment fidelity
Only one out of the 59 programs tracked treatment fidelity.
How did computer power analyse the data and present the results to users?
Lyssn is described as artificial intelligence software that provides feedback about the treatment fidelity of therapy sessions. The software derives a transcript from psychotherapy sessions and uses that transcript to calculate an overall fidelity score, a percentage of non-adherent behaviours, scores for empathy and the ‘motivational interviewing spirit’, statistics on the amount of session time that the therapist spent talking, the number of open questions asked, and the number of reflections made. Data collection is mandatory.
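To make the kinds of metrics listed above concrete, the toy sketch below computes two of the simpler ones (therapist talk-time share and a count of open questions) from a transcript; this is not Lyssn’s implementation, which relies on trained speech and language models, and the keyword heuristic shown is an obvious oversimplification.

```python
OPEN_QUESTION_STARTS = ("what", "how", "why", "tell me", "describe")   # crude heuristic

def simple_fidelity_metrics(transcript):
    """`transcript` is a list of (speaker, utterance) pairs, with speaker in
    {"therapist", "client"}. Purely illustrative; not Lyssn's method."""
    therapist_words = sum(len(u.split()) for s, u in transcript if s == "therapist")
    total_words = sum(len(u.split()) for _, u in transcript) or 1
    open_questions = sum(
        1 for s, u in transcript
        if s == "therapist" and u.lower().startswith(OPEN_QUESTION_STARTS)
    )
    return {
        "therapist_talk_share": therapist_words / total_words,
        "open_questions": open_questions,
    }
```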
Symptom tracking/outcome monitoring
The second most common computer-driven component was symptom or outcome monitoring via repeated assessments (34/59 apps). The content of measures in 33 apps could be determined. Symptom tracking was voluntary for 14 of the apps and mandatory for 14 of the apps. It was unclear whether six apps required monitoring. It is worth noting that three apps, TruReach, OCD Challenge, and Dartmouth Path, offer repeated assessments but do not keep a log of previous scores.
How did computer power analyse the data and present the results to users?
Twenty-two apps graphed assessments over time. Nine of these apps included qualitative severity ratings for each score. One of these apps gave only qualitative severity ratings after the assessments and graphed these ratings over time. One app described how much the user’s most recent score differed from their baseline assessment score (i.e. +25 from baseline) but did not graph the changes. Four apps kept a log of scores but did not graph them or otherwise quantify change. One of these apps also kept a log of severity ratings. Three of the apps that graphed scores over time were used to predict treatment response and included alerts when repeated assessments indicated a negative change trajectory. It was unclear how seven of the apps gave feedback on their repeated assessments.
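A minimal sketch of baseline-relative change reporting and an ‘off-track’ alert, of the kind described above, is shown below; the alert rule (the two most recent scores both above baseline and not improving) is an illustrative assumption, not the rule used by any reviewed app.

```python
def outcome_feedback(baseline, scores):
    """Report change from baseline and flag a possible negative trajectory.

    Assumes higher scores mean more symptoms. Illustrative only.
    """
    if not scores:
        return {"change_from_baseline": 0, "off_track_alert": False}
    change = scores[-1] - baseline              # e.g. reported as '+25 from baseline'
    last_two = scores[-2:]
    off_track = (len(last_two) == 2
                 and all(s > baseline for s in last_two)
                 and last_two[1] >= last_two[0])   # not improving across the last two
    return {"change_from_baseline": change, "off_track_alert": off_track}
```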
Evidence
Twelve out of the 59 apps (20.3%) have been shown to improve clinical outcomes in RCTs. Two of these 12 apps were sophisticated measurement apps without a treatment component, whereas the other 10 included a treatment component. See Table 2 for a list of the apps that are supported by empirical evidence. Eight of the 12 apps were tested against wait-list controls, and all were found more effective. Only four apps were tested against another active treatment that did not involve an app, and all four were found more effective (Brattland et al., Reference Brattland, Koksvik, Burkeland, Gråwe, Klöckner, Linaker, Ryum, Wampold, Lara-Cabrera and Iversen2018; Mahoney et al., Reference Mahoney, Mackenzie, Williams, Smith and Andrews2014; Rose et al., Reference Rose, Buckey, Zbozinek, Motivala, Glenn, Cartreine and Craske2013; Slade et al., Reference Slade, Lambert, Harmon, Smart and Bailey2008).
Table 2. Apps that are supported by randomized clinical trial evidence of treatment effectiveness
![](https://static.cambridge.org/binary/version/id/urn:cambridge.org:id:binary:20220818122720701-0695:S1754470X22000368:S1754470X22000368_tab2.png?pub-status=live)
* The PCOMS method of outcome monitoring was tested in multiple RCTs but the graphs were created on paper before software methods were deployed with clinicians. Those studies are not included.
Most trials were conducted with patients who had not approached a clinic for treatment; they were recruited through advertisements in the community. Only four apps were tested in help-seeking clinic populations; three of these were tested against wait lists (Farrer et al., Reference Farrer, Christensen, Griffiths and Mackinnon2011; Sandoval et al., Reference Sandoval, Buckey, Ainslie, Tombari, Stone and Hegel2017; Slade et al., Reference Slade, Lambert, Harmon, Smart and Bailey2008; Twomey et al., Reference Twomey, O’Reilly, Byrne, Bury, White, Kissane, McMahon and Clancy2014), and one was tested against another active intervention (Brattland et al., Reference Brattland, Koksvik, Burkeland, Gråwe, Klöckner, Linaker, Ryum, Wampold, Lara-Cabrera and Iversen2018).
Discussion
Out of the 676 apps identified through our search, 59 contained a computer-driven element, according to our definition, that performed or augmented at least one of the four main areas of clinical work, namely assessment, treatment planning, fidelity monitoring, and outcome monitoring. Of these 59 apps, 55 included a baseline assessment, six updated the treatment plan according to reassessments, one tracked treatment fidelity, and 35 monitored clinical outcomes with some form of reassessment. None of the apps included all four clinical elements. It is worth noting that we found a total of 163 apps that seemed applicable to individuals with clinical-level mental health problems. While this is still a large number of apps for consumers to sort through, it is far fewer than the 722 apps retrieved through our initial search. The vast majority of apps that can be found in app stores with keywords related to mental health were designed for wellness, everyday stressors, and non-clinical-level issues. This finding is consistent with previous research showing that the majority of mental health treatment apps are not developed by mental health professionals or researchers and rarely include clinically accurate, evidence-based information (Bry et al., Reference Bry, Chou, Miguel and Comer2018; Shen et al., Reference Shen, Levitan, Johnson, Bender, Hamilton-Page, Jadad and Wiljer2015).
Assessments were by far the most common computer-driven element in apps. These appear to add valuable standardized information for single psychiatric syndromes, but this review raises two concerns. First, the majority of apps did not assess for conditions beyond anxiety and depression. Many psychiatric syndromes, if not most, co-exist with features of other psychiatric syndromes that may impact choice of treatment strategies, duration of treatment, and prognosis for response. Second, approximately half (34/59) of the apps offered baseline assessments but made them optional. Evidence has shown that questionnaires are often not completed when they are optional (Liu et al., Reference Liu, Cruz, Rockhill and Lyon2019), but they are completed close to 100% of the time when they are required (Scheeringa, Reference Scheeringa2020).
Monitoring outcomes with repeat assessments was the second most common computer-driven element we found. Some apps, such as PCOMS and Trier Treatment Navigator, have demonstrated how automated computer processing can perform this function in a feasible and effective manner. These apps determine who is at risk of treatment failure and alert clinicians if their patients are ‘off-track’ of their expected treatment response. At least half of the apps that allow repeat assessments, however, make it optional and do not link the results into either therapists’ workflows or clients’ user experiences in a compelling manner.
With regard to treatment planning, the most common strategy used the results of self-assessments to suggest CBT protocols, and the suggestions generally were not updated from additional user input throughout treatment. For example, Mayo Clinic’s Anxiety Coach used the user’s specific anxieties to create an exposure hierarchy, but there was no option to revise the hierarchy once it was created. The apps that did use data to update the ‘treatment’ generally did not create overall treatment plans, but rather suggested activities depending on the assessment. The exception was Tess, the chatbot that claims to update therapy modalities depending on user input; however, it was difficult to determine how many, or which, types of therapy are offered by the app. While the available apps, taken together, may provide a wide range of treatment techniques, consumers must sort through a bewildering number of apps to find them. Utilization of computer processing power appears to be at a very preliminary stage for helping users navigate the complex world of psychotherapy options.
Not surprisingly, treatment fidelity monitoring was the least developed computer-driven element, consistent with the tendency of most therapists to avoid evidence-based therapies (EBTs) outside of academic trials. Despite consistent evidence of the superiority of EBTs versus usual care (Weisz et al., Reference Weisz, Kuppens, Eckshtain, Ugueto, Hawley and Jensen-Doss2013), most patients are not offered them in practice (Shafran et al., Reference Shafran, Clark, Fairburn, Arntz, Barlow, Ehlers, Freeston, Garety, Hollon, Ost, Salkovskis, Williams and Wilson2009). It is too early to tell if apps will make an impact in this area. Lyssn, the single app we found that offers fidelity tracking, has yet to be tested.
Our review process highlighted a lack of standardization and transparency in ways to search for information about apps. It was difficult to find accurate information on the apps in this review. The descriptions on both the app websites and in the app stores were often general and did not give specific details about the contents of the app. For example, it was often difficult to determine what type of assessment an app contained (i.e. whether it used standardized, validated measures or asked general questions about psychiatric disorders) or what exactly the treatment consisted of. Additionally, an app’s description often claimed the app was based in CBT or used CBT principles when the app only included isolated techniques such as deep breathing or progressive muscle relaxation. This is understandable as the app field is relatively new and the major app stores have not agreed on a uniform reporting code.
We found that less than a quarter of the reviewed apps (12/59) have been shown to improve clinical outcomes significantly in an RCT. Thus, the research evidence that might indicate these apps add value above and beyond traditional psychotherapy with a human therapist is promising but preliminary. Four apps outperformed another active intervention and eight apps have outperformed wait-list control groups in randomized trials. However, most of this evidence may not be generalizable to traditional clinical work because only four of these trials were conducted with help-seeking samples. Only one app has been shown to be more effective than another non-app intervention in a clinical population (Brattland et al., Reference Brattland, Koksvik, Burkeland, Gråwe, Klöckner, Linaker, Ryum, Wampold, Lara-Cabrera and Iversen2018). This lack of evidence-based content in mental health apps is well-documented (Lau et al., Reference Lau, O’Daffer, Colt, Yi-Frazier, Palermo, McCauley and Rosenberg2020; Olff, Reference Olff2015). Overall, it was difficult to ascertain which apps had been tested in any type of empirical research. Although the websites for several of the apps included studies that the developers had conducted, many did not. Additionally, many studies did not specifically state the name of the app used, so it was difficult to know which app (or which version of that app) was used in published studies.
This raises a question, rather than a limitation: is traditional RCT evidence really the best metric by which to judge apps? Apps are mostly extensions of already-proven methods, except perhaps for chatbots and Lyssn, and it is not obvious that effectiveness RCTs are truly the next incremental step of science for this field. Furthermore, it is unclear whether a favourable performance in an RCT would necessarily translate into clinical benefits. Although there are several apps that appear promising, more longitudinal research is needed to identify any beneficial effects of mental health applications. The added value of apps lies in their potential for harnessing computer power to augment clinical processes, disseminating treatments to larger populations who cannot or will not access in-person clinics, and enabling clients to have a more rewarding experience.
Technical metrics that are currently the most frequently available in app stores, such as privacy, security, interoperability, and when the app was last updated, are additional important metrics that complement the clinical components we reviewed. Satisfaction metrics, such as star ratings, number of downloads, and number of reviews posted, may serve as proxies of clinical effectiveness, but are unreliable. Star ratings and reviews can be skewed positively by fake reviews and skewed negatively by self-selection bias of unhappy customers being relatively more vocal. The number of downloads can be skewed by marketing and cost factors. One recent review demonstrated that most technical and satisfaction metrics showed no correlation with each other (Lagan et al., Reference Lagan, D’Mello, Vaidyam, Bilden and Torous2021). It is acknowledged that app quality is a multidimensional construct that includes clinical utility, privacy and user satisfaction.
Strengths of this review include its comprehensive coverage of the two main mobile app stores. All of the apps that were available at the time of this report were reviewed in some capacity. Weaknesses include that we only included English-language apps and that we did not explore in detail features that were behind a paywall or were restricted to employees whose company had purchased access to the apps. However, most of the basic information about those apps was available on developers’ websites, and it is unlikely that novel features were missed.
In summary, the majority of the more than 700 self-proclaimed ‘mental health’ apps currently available do not contain a truly clinical aspect according to our definition. Out of the 163 that do, only a minority implement any computer-driven process to help with treatment. This finding is similar to those of a review of smartphone apps for anxiety, which found that attempts to take advantage of technological possibilities (e.g. sensors and ecological momentary assessment) were rare (Bry et al., Reference Bry, Chou, Miguel and Comer2018). Within the subset of apps that did include a computer-driven process in our review, there are several sophisticated apps that have the potential to meaningfully augment psychotherapy. Apps have undoubtedly increased dissemination of valuable psychiatric and psychological information, but the full potential of computer processing appears unreached in mental health-related apps. Apps may be making a dent in Goldberg’s lament that clinical wisdom could stand improvement, but it is evident that computer processing is not easily applied via apps to the ‘complex clinical inferences’ that clinicians need to master.
Key practice points
(1) Most commercially available, self-proclaimed mental health-related applications do not contain elements that make them relevant to clinical problems.

(2) Most apps that can be used for clinical populations do not meaningfully gather user data to supplement assessment or treatment.

(3) There are promising models for disparate elements of treatment, but none of the reviewed apps contained all four.
Data availability statement
Data are available upon request.
Acknowledgements
None.
Author contributions
Catalina Pacheco: Conceptualization (equal), Data curation-Lead, Formal analysis (equal), Investigation (equal), Methodology (equal), Writing – original draft (equal), Writing – review & editing (equal); Michael Scheeringa: Conceptualization (equal), Data curation (equal), Methodology (equal), Supervision (equal), Writing – original draft (equal), Writing – review & editing (equal).
Financial support
None.
Conflicts of interest
The second author receives royalties from Guilford Press, Central Recovery Press, and Psychology Today.
Ethical standards
Authors have abided by the Ethical Principles of Psychologists and Code of Conduct as set out by the BABCP and BPS.